---
base_model: Snowflake/snowflake-arctic-embed-m
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:200
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
What measures should be taken to ensure that automated systems are safe
and effective before deployment?
sentences:
- >2
AI BILL OF RIGHTS
FFECTIVE SYSTEMS
ineffective systems. Automated systems should be
communities, stakeholders, and domain experts to identify
Systems should undergo pre-deployment testing, risk
that demonstrate they are safe and effective based on
including those beyond the intended use, and adherence to
protective measures should include the possibility of not
Automated systems should not be designed with an intent
reasonably foreseeable possibility of endangering your safety or the
safety of your community. They should
stemming from unintended, yet foreseeable, uses or
SECTION TITLE
BLUEPRINT FOR AN
SAFE AND E
You should be protected from unsafe or
developed with consultation from diverse
concerns, risks, and potential impacts of the system.
identification and mitigation, and ongoing monitoring
their intended use, mitigation of unsafe outcomes
domain-specific standards. Outcomes of these
deploying the system or removing a system from use.
or
be designed to proactively protect you from harms
impacts of automated systems. You should be protected from inappropriate
or irrelevant data use in the
design, development, and deployment of automated systems, and from the
compounded harm of its reuse.
Independent evaluation and reporting that confirms that the system is
safe and effective, including reporting of
steps taken to mitigate potential harms, should be performed and the
results made public whenever possible.
ALGORITHMIC DISCRIMINATION PROTECTIONS
You should not face discrimination by algorithms and systems should be
used and designed in
an equitable way. Algorithmic discrimination occurs when automated
systems contribute to unjustified
different treatment or impacts disfavoring people based on their race,
color, ethnicity, sex (including
pregnancy, childbirth, and related medical conditions, gender identity,
intersex status, and sexual
orientation), religion, age, national origin, disability, veteran
status, genetic information, or any other
classification protected by law. Depending on the specific
circumstances, such algorithmic discrimination
may violate legal protections. Designers, developers, and deployers of
automated systems should take
proactive
and
continuous
measures
to
protect
individuals
and
communities
from algorithmic
discrimination and to use and design systems in an equitable way. This
protection should include proactive
equity assessments as part of the system design, use of representative
data and protection against proxies
for demographic features, ensuring accessibility for people with
disabilities in design and development,
pre-deployment and ongoing disparity testing and mitigation, and clear
organizational oversight. Independent
evaluation and plain language reporting in the form of an algorithmic
impact assessment, including
disparity testing results and mitigation information, should be
performed and made public whenever
possible to confirm these protections.
5
- >
You should be protected from abusive data practices via built-in
protections and you should have agency over how data about
you is used. You should be protected from violations of privacy through
design choices that ensure such protections are included by default,
including
ensuring that data collection conforms to reasonable expectations and
that
only data strictly necessary for the specific context is collected.
Designers, de
velopers, and deployers of automated systems should seek your
permission
and respect your decisions regarding collection, use, access, transfer,
and de
letion of your data in appropriate ways and to the greatest extent
possible;
where not possible, alternative privacy by design safeguards should be
used.
Systems should not employ user experience and design decisions that
obfus
cate user choice or burden users with defaults that are privacy
invasive. Con
sent should only be used to justify collection of data in cases where it
can be
appropriately and meaningfully given. Any consent requests should be
brief,
be understandable in plain language, and give you agency over data
collection
and the specific context of use; current hard-to-understand no
tice-and-choice practices for broad uses of data should be changed.
Enhanced
protections and restrictions for data and inferences related to
sensitive do
mains, including health, work, education, criminal justice, and finance,
and
for data pertaining to youth should put you first. In sensitive domains,
your
data and related inferences should only be used for necessary functions,
and
you should be protected by ethical review and use prohibitions. You and
your
communities should be free from unchecked surveillance; surveillance
tech
nologies should be subject to heightened oversight that includes at
least
pre-deployment assessment of their potential harms and scope limits to
pro
tect privacy and civil liberties. Continuous surveillance and
monitoring
should not be used in education, work, housing, or in other contexts
where the
use of such surveillance technologies is likely to limit rights,
opportunities, or
access. Whenever possible, you should have access to reporting that
confirms
your data decisions have been respected and provides an assessment of
the
potential impact of surveillance technologies on your rights,
opportunities, or
access.
DATA PRIVACY
30
- >
APPENDIX
Lisa Feldman Barrett
Madeline Owens
Marsha Tudor
Microsoft Corporation
MITRE Corporation
National Association for the
Advancement of Colored People
Legal Defense and Educational
Fund
National Association of Criminal
Defense Lawyers
National Center for Missing &
Exploited Children
National Fair Housing Alliance
National Immigration Law Center
NEC Corporation of America
New America’s Open Technology
Institute
New York Civil Liberties Union
No Name Provided
Notre Dame Technology Ethics
Center
Office of the Ohio Public Defender
Onfido
Oosto
Orissa Rose
Palantir
Pangiam
Parity Technologies
Patrick A. Stewart, Jeffrey K.
Mullins, and Thomas J. Greitens
Pel Abbott
Philadelphia Unemployment
Project
Project On Government Oversight
Recording Industry Association of
America
Robert Wilkens
Ron Hedges
Science, Technology, and Public
Policy Program at University of
Michigan Ann Arbor
Security Industry Association
Sheila Dean
Software & Information Industry
Association
Stephanie Dinkins and the Future
Histories Studio at Stony Brook
University
TechNet
The Alliance for Media Arts and
Culture, MIT Open Documentary
Lab and Co-Creation Studio, and
Immerse
The International Brotherhood of
Teamsters
The Leadership Conference on
Civil and Human Rights
Thorn
U.S. Chamber of Commerce’s
Technology Engagement Center
Uber Technologies
University of Pittsburgh
Undergraduate Student
Collaborative
Upturn
US Technology Policy Committee
of the Association of Computing
Machinery
Virginia Puccio
Visar Berisha and Julie Liss
XR Association
XR Safety Initiative
• As an additional effort to reach out to stakeholders regarding the
RFI, OSTP conducted two listening sessions
for members of the public. The listening sessions together drew upwards
of 300 participants. The Science and
Technology Policy Institute produced a synopsis of both the RFI
submissions and the feedback at the listening
sessions.115
61
- source_sentence: How does the document address algorithmic discrimination protections?
sentences:
- >2
SAFE AND EFFECTIVE
SYSTEMS
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint
for the development of additional
technical standards and practices that are tailored for particular
sectors and contexts.
Ongoing monitoring. Automated systems should have ongoing monitoring
procedures, including recalibra
tion procedures, in place to ensure that their performance does not fall
below an acceptable level over time,
based on changing real-world conditions or deployment contexts,
post-deployment modification, or unexpect
ed conditions. This ongoing monitoring should include continuous
evaluation of performance metrics and
harm assessments, updates of any systems, and retraining of any machine
learning models as necessary, as well
as ensuring that fallback mechanisms are in place to allow reversion to
a previously working system. Monitor
ing should take into account the performance of both technical system
components (the algorithm as well as
any hardware components, data inputs, etc.) and human operators. It
should include mechanisms for testing
the actual accuracy of any predictions or recommendations generated by a
system, not just a human operator’s
determination of their accuracy. Ongoing monitoring procedures should
include manual, human-led monitor
ing as a check in the event there are shortcomings in automated
monitoring systems. These monitoring proce
dures should be in place for the lifespan of the deployed automated
system.
Clear organizational oversight. Entities responsible for the development
or use of automated systems
should lay out clear governance structures and procedures. This
includes clearly-stated governance proce
dures before deploying the system, as well as responsibility of specific
individuals or entities to oversee ongoing
assessment and mitigation. Organizational stakeholders including those
with oversight of the business process
or operation being automated, as well as other organizational divisions
that may be affected due to the use of
the system, should be involved in establishing governance procedures.
Responsibility should rest high enough
in the organization that decisions about resources, mitigation, incident
response, and potential rollback can be
made promptly, with sufficient weight given to risk mitigation
objectives against competing concerns. Those
holding this responsibility should be made aware of any use cases with
the potential for meaningful impact on
people’s rights, opportunities, or access as determined based on risk
identification procedures. In some cases,
it may be appropriate for an independent ethics review to be conducted
before deployment.
Avoid inappropriate, low-quality, or irrelevant data use and the
compounded harm of its
reuse
Relevant and high-quality data. Data used as part of any automated
system’s creation, evaluation, or
deployment should be relevant, of high quality, and tailored to the task
at hand. Relevancy should be
established based on research-backed demonstration of the causal
influence of the data to the specific use case
or justified more generally based on a reasonable expectation of
usefulness in the domain and/or for the
system design or ongoing development. Relevance of data should not be
established solely by appealing to
its historical connection to the outcome. High quality and tailored data
should be representative of the task at
hand and errors from data entry or other sources should be measured and
limited. Any data used as the target
of a prediction process should receive particular attention to the
quality and validity of the predicted outcome
or label to ensure the goal of the automated system is appropriately
identified and measured. Additionally,
justification should be documented for each data attribute and source to
explain why it is appropriate to use
that data to inform the results of the automated system and why such use
will not violate any applicable laws.
In cases of high-dimensional and/or derived attributes, such
justifications can be provided as overall
descriptions of the attribute generation process and appropriateness.
19
- |
TABLE OF CONTENTS
FROM PRINCIPLES TO PRACTICE: A TECHNICAL COMPANION TO THE BLUEPRINT
FOR AN AI BILL OF RIGHTS
USING THIS TECHNICAL COMPANION
SAFE AND EFFECTIVE SYSTEMS
ALGORITHMIC DISCRIMINATION PROTECTIONS
DATA PRIVACY
NOTICE AND EXPLANATION
HUMAN ALTERNATIVES, CONSIDERATION, AND FALLBACK
APPENDIX
EXAMPLES OF AUTOMATED SYSTEMS
LISTENING TO THE AMERICAN PEOPLE
ENDNOTES
12
14
15
23
30
40
46
53
53
55
63
13
- >
APPENDIX
Systems that impact the safety of communities such as automated traffic
control systems, elec
-ctrical grid controls, smart city technologies, and industrial
emissions and environmental
impact control algorithms; and
Systems related to access to benefits or services or assignment of
penalties such as systems that
support decision-makers who adjudicate benefits such as collating or
analyzing information or
matching records, systems which similarly assist in the adjudication of
administrative or criminal
penalties, fraud detection algorithms, services or benefits access
control algorithms, biometric
systems used as access control, and systems which make benefits or
services related decisions on a
fully or partially autonomous basis (such as a determination to revoke
benefits).
54
- source_sentence: >-
What legislation is referenced in the context that became effective on
October 3, 2008, regarding biometric information?
sentences:
- >2
HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE
Real-life examples of how these principles can become reality, through
laws, policies, and practical
technical and sociotechnical approaches to protecting rights,
opportunities, and access.
The federal government is working to combat discrimination in mortgage
lending. The Depart
ment of Justice has launched a nationwide initiative to combat
redlining, which includes reviewing how
lenders who may be avoiding serving communities of color are conducting
targeted marketing and advertising.51
This initiative will draw upon strong partnerships across federal
agencies, including the Consumer Financial
Protection Bureau and prudential regulators. The Action Plan to Advance
Property Appraisal and Valuation
Equity includes a commitment from the agencies that oversee mortgage
lending to include a
nondiscrimination standard in the proposed rules for Automated Valuation
Models.52
The Equal Employment Opportunity Commission and the Department of
Justice have clearly
laid out how employers’ use of AI and other automated systems can result
in
discrimination against job applicants and employees with disabilities.53
The documents explain
how employers’ use of software that relies on algorithmic
decision-making may violate existing requirements
under Title I of the Americans with Disabilities Act (“ADA”). This
technical assistance also provides practical
tips to employers on how to comply with the ADA, and to job applicants
and employees who think that their
rights may have been violated.
Disparity assessments identified harms to Black patients' healthcare
access. A widely
used healthcare algorithm relied on the cost of each patient’s past
medical care to predict future medical needs,
recommending early interventions for the patients deemed most at risk.
This process discriminated
against Black patients, who generally have less access to medical care
and therefore have generated less cost
than white patients with similar illness and need. A landmark study
documented this pattern and proposed
practical ways that were shown to reduce this bias, such as focusing
specifically on active chronic health
conditions or avoidable future costs related to emergency visits and
hospitalization.54
Large employers have developed best practices to scrutinize the data and
models used
for hiring. An industry initiative has developed Algorithmic Bias
Safeguards for the Workforce, a structured
questionnaire that businesses can use proactively when procuring
software to evaluate workers. It covers
specific technical questions such as the training data used, model
training process, biases identified, and
mitigation steps employed.55
Standards organizations have developed guidelines to incorporate
accessibility criteria
into technology design processes. The most prevalent in the United
States is the Access Board’s Section
508 regulations,56 which are the technical standards for federal
information communication technology (software,
hardware, and web). Other standards include those issued by the
International Organization for
Standardization,57 and the World Wide Web Consortium Web Content
Accessibility Guidelines,58 a globally
recognized voluntary consensus standard for web content and other
information and communications
technology.
NIST has released Special Publication 1270, Towards a Standard for
Identifying and Managing Bias
in Artificial Intelligence.59 The special publication: describes the
stakes and challenges of bias in artificial
intelligence and provides examples of how and why it can chip away at
public trust; identifies three categories
of bias in AI – systemic, statistical, and human – and describes how and
where they contribute to harms; and
describes three broad challenges for mitigating bias – datasets, testing
and evaluation, and human factors – and
introduces preliminary guidance for addressing them. Throughout, the
special publication takes a socio-
technical perspective to identifying and managing AI bias.
29
Algorithmic
Discrimination
Protections
- >2
ENDNOTES
85. Mick Dumke and Frank Main. A look inside the watch list Chicago
police fought to keep secret. The
Chicago Sun Times. May 18, 2017.
https://chicago.suntimes.com/2017/5/18/18386116/a-look-inside-the-watch-list-chicago-police-fought
to-keep-secret
86. Jay Stanley. Pitfalls of Artificial Intelligence Decisionmaking
Highlighted In Idaho ACLU Case.
ACLU. Jun. 2, 2017.
https://www.aclu.org/blog/privacy-technology/pitfalls-artificial-intelligence-decisionmaking
highlighted-idaho-aclu-case
87. Illinois General Assembly. Biometric Information Privacy Act.
Effective Oct. 3, 2008.
https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004&ChapterID=57
88. Partnership on AI. ABOUT ML Reference Document. Accessed May 2,
2022.
https://partnershiponai.org/paper/about-ml-reference-document/1/
89. See, e.g., the model cards framework: Margaret Mitchell, Simone Wu,
Andrew Zaldivar, Parker
Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah
Raji, and Timnit Gebru.
Model Cards for Model Reporting. In Proceedings of the Conference on
Fairness, Accountability, and
Transparency (FAT* '19). Association for Computing Machinery, New York,
NY, USA, 220–229. https://
dl.acm.org/doi/10.1145/3287560.3287596
90. Sarah Ammermann. Adverse Action Notice Requirements Under the ECOA
and the FCRA. Consumer
Compliance Outlook. Second Quarter 2013.
https://consumercomplianceoutlook.org/2013/second-quarter/adverse-action-notice-requirements
under-ecoa-fcra/
91. Federal Trade Commission. Using Consumer Reports for Credit
Decisions: What to Know About
Adverse Action and Risk-Based Pricing Notices. Accessed May 2, 2022.
https://www.ftc.gov/business-guidance/resources/using-consumer-reports-credit-decisions-what
know-about-adverse-action-risk-based-pricing-notices#risk
92. Consumer Financial Protection Bureau. CFPB Acts to Protect the
Public from Black-Box Credit
Models Using Complex Algorithms. May 26, 2022.
https://www.consumerfinance.gov/about-us/newsroom/cfpb-acts-to-protect-the-public-from-black
box-credit-models-using-complex-algorithms/
93. Anthony Zaller. California Passes Law Regulating Quotas In
Warehouses – What Employers Need to
Know About AB 701. Zaller Law Group California Employment Law Report.
Sept. 24, 2021.
https://www.californiaemploymentlawreport.com/2021/09/california-passes-law-regulating-quotas
in-warehouses-what-employers-need-to-know-about-ab-701/
94. National Institute of Standards and Technology. AI Fundamental
Research – Explainability.
Accessed Jun. 4, 2022.
https://www.nist.gov/artificial-intelligence/ai-fundamental-research-explainability
95. DARPA. Explainable Artificial Intelligence (XAI). Accessed July 20,
2022.
https://www.darpa.mil/program/explainable-artificial-intelligence
71
- >2
ENDNOTES
12. Expectations about reporting are intended for the entity developing
or using the automated system. The
resulting reports can be provided to the public, regulators, auditors,
industry standards groups, or others
engaged in independent review, and should be made public as much as
possible consistent with law,
regulation, and policy, and noting that intellectual property or law
enforcement considerations may prevent
public release. These reporting expectations are important for
transparency, so the American people can
have confidence that their rights, opportunities, and access as well as
their expectations around
technologies are respected.
13. National Artificial Intelligence Initiative Office. Agency
Inventories of AI Use Cases. Accessed Sept. 8,
2022. https://www.ai.gov/ai-use-case-inventories/
14. National Highway Traffic Safety Administration.
https://www.nhtsa.gov/
15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
Professional Engineers and NHTSA. Public
Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
https://www.jstor.org/stable/976213?seq=1
16. The US Department of Transportation has publicly described the
health and other benefits of these
“traffic calming” measures. See, e.g.: U.S. Department of
Transportation. Traffic Calming to Slow Vehicle
Speeds. Accessed Apr. 17, 2022.
https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow
Vehicle-Speeds
17. Karen Hao. Worried about your firm’s AI ethics? These startups are
here to help.
A growing ecosystem of “responsible AI” ventures promise to help
organizations monitor and fix their AI
models. MIT Technology Review. Jan 15., 2021.
https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
Disha Sinha. Top Progressive
Companies Building Ethical AI to Look Out for in 2021. Analytics
Insight. June 30, 2021. https://
www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for
in-2021/
https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
Disha Sinha. Top
Progressive Companies Building Ethical AI to Look Out for in 2021.
Analytics Insight. June 30, 2021.
18. Office of Management and Budget. Study to Identify Methods to Assess
Equity: Report to the President.
Aug. 2021.
https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985
Implementation_508-Compliant-Secure-v1.1.pdf
19. National Institute of Standards and Technology. AI Risk Management
Framework. Accessed May 23,
2022. https://www.nist.gov/itl/ai-risk-management-framework
20. U.S. Department of Energy. U.S. Department of Energy Establishes
Artificial Intelligence Advancement
Council. U.S. Department of Energy Artificial Intelligence and
Technology Office. April 18, 2022. https://
www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council
21. Department of Defense. U.S Department of Defense Responsible
Artificial Intelligence Strategy and
Implementation Pathway. Jun. 2022.
https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/
Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation
Pathway.PDF
22. Director of National Intelligence. Principles of Artificial
Intelligence Ethics for the Intelligence
Community.
https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for
the-intelligence-community
64
- source_sentence: >-
How does the Blueprint for an AI Bill of Rights relate to existing laws
and regulations regarding automated systems?
sentences:
- >2
About this Document
The Blueprint for an AI Bill of Rights: Making Automated Systems Work
for the American People was
published by the White House Office of Science and Technology Policy in
October 2022. This framework was
released one year after OSTP announced the launch of a process to
develop “a bill of rights for an AI-powered
world.” Its release follows a year of public engagement to inform this
initiative. The framework is available
online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights
About the Office of Science and Technology Policy
The Office of Science and Technology Policy (OSTP) was established by
the National Science and Technology
Policy, Organization, and Priorities Act of 1976 to provide the
President and others within the Executive Office
of the President with advice on the scientific, engineering, and
technological aspects of the economy, national
security, health, foreign relations, the environment, and the
technological recovery and use of resources, among
other topics. OSTP leads interagency science and technology policy
coordination efforts, assists the Office of
Management and Budget (OMB) with an annual review and analysis of
Federal research and development in
budgets, and serves as a source of scientific and technological analysis
and judgment for the President with
respect to major policies, plans, and programs of the Federal
Government.
Legal Disclaimer
The Blueprint for an AI Bill of Rights: Making Automated Systems Work
for the American People is a white paper
published by the White House Office of Science and Technology Policy. It
is intended to support the
development of policies and practices that protect civil rights and
promote democratic values in the building,
deployment, and governance of automated systems.
The Blueprint for an AI Bill of Rights is non-binding and does not
constitute U.S. government policy. It
does not supersede, modify, or direct an interpretation of any existing
statute, regulation, policy, or
international instrument. It does not constitute binding guidance for
the public or Federal agencies and
therefore does not require compliance with the principles described
herein. It also is not determinative of what
the U.S. government’s position will be in any international negotiation.
Adoption of these principles may not
meet the requirements of existing statutes, regulations, policies, or
international instruments, or the
requirements of the Federal agencies that enforce them. These principles
are not intended to, and do not,
prohibit or limit any lawful activity of a government agency, including
law enforcement, national security, or
intelligence activities.
The appropriate application of the principles set forth in this white
paper depends significantly on the
context in which automated systems are being utilized. In some
circumstances, application of these principles
in whole or in part may not be appropriate given the intended use of
automated systems to achieve government
agency missions. Future sector-specific guidance will likely be
necessary and important for guiding the use of
automated systems in certain settings such as AI systems used as part of
school building security or automated
health diagnostic systems.
The Blueprint for an AI Bill of Rights recognizes that law enforcement
activities require a balancing of
equities, for example, between the protection of sensitive law
enforcement information and the principle of
notice; as such, notice may not be appropriate, or may need to be
adjusted to protect sources, methods, and
other law enforcement equities. Even in contexts where these principles
may not apply in whole or in part,
federal departments and agencies remain subject to judicial, privacy,
and civil liberties oversight as well as
existing policies and safeguards that govern automated systems,
including, for example, Executive Order 13960,
Promoting the Use of Trustworthy Artificial Intelligence in the Federal
Government (December 2020).
This white paper recognizes that national security (which includes
certain law enforcement and
homeland security activities) and defense activities are of increased
sensitivity and interest to our nation’s
adversaries and are often subject to special requirements, such as those
governing classified information and
other protected data. Such activities require alternative, compatible
safeguards through existing policies that
govern automated systems and AI, such as the Department of Defense (DOD)
AI Ethical Principles and
Responsible AI Implementation Pathway and the Intelligence Community
(IC) AI Ethics Principles and
Framework. The implementation of these policies to national security and
defense activities can be informed by
the Blueprint for an AI Bill of Rights where feasible.
The Blueprint for an AI Bill of Rights is not intended to, and does not,
create any legal right, benefit, or
defense, substantive or procedural, enforceable at law or in equity by
any party against the United States, its
departments, agencies, or entities, its officers, employees, or agents,
or any other person, nor does it constitute a
waiver of sovereign immunity.
Copyright Information
This document is a work of the United States Government and is in the
public domain (see 17 U.S.C. §105).
2
- >2
ENDNOTES
12. Expectations about reporting are intended for the entity developing
or using the automated system. The
resulting reports can be provided to the public, regulators, auditors,
industry standards groups, or others
engaged in independent review, and should be made public as much as
possible consistent with law,
regulation, and policy, and noting that intellectual property or law
enforcement considerations may prevent
public release. These reporting expectations are important for
transparency, so the American people can
have confidence that their rights, opportunities, and access as well as
their expectations around
technologies are respected.
13. National Artificial Intelligence Initiative Office. Agency
Inventories of AI Use Cases. Accessed Sept. 8,
2022. https://www.ai.gov/ai-use-case-inventories/
14. National Highway Traffic Safety Administration.
https://www.nhtsa.gov/
15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
Professional Engineers and NHTSA. Public
Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
https://www.jstor.org/stable/976213?seq=1
16. The US Department of Transportation has publicly described the
health and other benefits of these
“traffic calming” measures. See, e.g.: U.S. Department of
Transportation. Traffic Calming to Slow Vehicle
Speeds. Accessed Apr. 17, 2022.
https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow
Vehicle-Speeds
17. Karen Hao. Worried about your firm’s AI ethics? These startups are
here to help.
A growing ecosystem of “responsible AI” ventures promise to help
organizations monitor and fix their AI
models. MIT Technology Review. Jan 15., 2021.
https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
Disha Sinha. Top Progressive
Companies Building Ethical AI to Look Out for in 2021. Analytics
Insight. June 30, 2021. https://
www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for
in-2021/
https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
Disha Sinha. Top
Progressive Companies Building Ethical AI to Look Out for in 2021.
Analytics Insight. June 30, 2021.
18. Office of Management and Budget. Study to Identify Methods to Assess
Equity: Report to the President.
Aug. 2021.
https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985
Implementation_508-Compliant-Secure-v1.1.pdf
19. National Institute of Standards and Technology. AI Risk Management
Framework. Accessed May 23,
2022. https://www.nist.gov/itl/ai-risk-management-framework
20. U.S. Department of Energy. U.S. Department of Energy Establishes
Artificial Intelligence Advancement
Council. U.S. Department of Energy Artificial Intelligence and
Technology Office. April 18, 2022. https://
www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council
21. Department of Defense. U.S Department of Defense Responsible
Artificial Intelligence Strategy and
Implementation Pathway. Jun. 2022.
https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/
Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation
Pathway.PDF
22. Director of National Intelligence. Principles of Artificial
Intelligence Ethics for the Intelligence
Community.
https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for
the-intelligence-community
64
- >2
12
CSAM. Even when trained on “clean” data, increasingly capable GAI models
can synthesize or produce
synthetic NCII and CSAM. Websites, mobile apps, and custom-built models
that generate synthetic NCII
have moved from niche internet forums to mainstream, automated, and
scaled online businesses.
Trustworthy AI Characteristics: Fair with Harmful Bias Managed, Safe,
Privacy Enhanced
2.12.
Value Chain and Component Integration
GAI value chains involve many third-party components such as procured
datasets, pre-trained models,
and software libraries. These components might be improperly obtained or
not properly vetted, leading
to diminished transparency or accountability for downstream users. While
this is a risk for traditional AI
systems and some other digital technologies, the risk is exacerbated for
GAI due to the scale of the
training data, which may be too large for humans to vet; the difficulty of
training foundation models,
which leads to extensive reuse of limited numbers of models; and the
extent to which GAI may be
integrated into other devices and services. As GAI systems often involve
many distinct third-party
components and data sources, it may be difficult to attribute issues in a
system’s behavior to any one of
these sources.
Errors in third-party GAI components can also have downstream impacts on
accuracy and robustness.
For example, test datasets commonly used to benchmark or validate models
can contain label errors.
Inaccuracies in these labels can impact the “stability” or robustness of
these benchmarks, which many
GAI practitioners consider during the model selection process.
Trustworthy AI Characteristics: Accountable and Transparent, Explainable
and Interpretable, Fair with
Harmful Bias Managed, Privacy Enhanced, Safe, Secure and Resilient,
Valid and Reliable
3.
Suggested Actions to Manage GAI Risks
The following suggested actions target risks unique to or exacerbated by
GAI.
In addition to the suggested actions below, AI risk management
activities and actions set forth in the AI
RMF 1.0 and Playbook are already applicable for managing GAI risks.
Organizations are encouraged to
apply the activities suggested in the AI RMF and its Playbook when
managing the risk of GAI systems.
Implementation of the suggested actions will vary depending on the type
of risk, characteristics of GAI
systems, stage of the GAI lifecycle, and relevant AI actors involved.
Suggested actions to manage GAI risks can be found in the tables below:
•
The suggested actions are organized by relevant AI RMF subcategories to
streamline these
activities alongside implementation of the AI RMF.
•
Not every subcategory of the AI RMF is included in this document.13
Suggested actions are
listed for only some subcategories.
13 As this document was focused on the GAI PWG efforts and primary
considerations (see Appendix A), AI RMF
subcategories not addressed here may be added later.
- source_sentence: >-
What proactive steps should be taken during the design phase of automated
systems to assess equity and prevent algorithmic discrimination?
sentences:
- >2
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint
for the development of additional
technical standards and practices that are tailored for particular
sectors and contexts.
Any automated system should be tested to help ensure it is free from
algorithmic discrimination before it can be
sold or used. Protection against algorithmic discrimination should
include designing to ensure equity, broadly
construed. Some algorithmic discrimination is already prohibited under
existing anti-discrimination law. The
expectations set out below describe proactive technical and policy steps
that can be taken to not only
reinforce those legal protections but extend beyond them to ensure
equity for underserved communities48
even in circumstances where a specific legal protection may not be
clearly established. These protections
should be instituted throughout the design, development, and deployment
process and are described below
roughly in the order in which they would be instituted.
Protect the public from algorithmic discrimination in a proactive and
ongoing manner
Proactive assessment of equity in design. Those responsible for the
development, use, or oversight of
automated systems should conduct proactive equity assessments in the
design phase of the technology
research and development or during its acquisition to review potential
input data, associated historical
context, accessibility for people with disabilities, and societal goals
to identify potential discrimination and
effects on equity resulting from the introduction of the technology. The
assessed groups should be as inclusive
as possible of the underserved communities mentioned in the equity
definition: Black, Latino, and Indigenous
and Native American persons, Asian Americans and Pacific Islanders and
other persons of color; members of
religious minorities; women, girls, and non-binary people; lesbian, gay,
bisexual, transgender, queer, and inter-
sex (LGBTQI+) persons; older adults; persons with disabilities; persons
who live in rural areas; and persons
otherwise adversely affected by persistent poverty or inequality.
Assessment could include both qualitative
and quantitative evaluations of the system. This equity assessment
should also be considered a core part of the
goals of the consultation conducted as part of the safety and efficacy
review.
Representative and robust data. Any data used as part of system
development or assessment should be
representative of local communities based on the planned deployment
setting and should be reviewed for bias
based on the historical and societal context of the data. Such data
should be sufficiently robust to identify and
help to mitigate biases and potential harms.
Guarding against proxies. Directly using demographic information in the
design, development, or
deployment of an automated system (for purposes other than evaluating a
system for discrimination or using
a system to counter discrimination) runs a high risk of leading to
algorithmic discrimination and should be
avoided. In many cases, attributes that are highly correlated with
demographic features, known as proxies, can
contribute to algorithmic discrimination. In cases where use of the
demographic features themselves would
lead to illegal algorithmic discrimination, reliance on such proxies in
decision-making (such as that facilitated
by an algorithm) may also be prohibited by law. Proactive testing should
be performed to identify proxies by
testing for correlation between demographic information and attributes
in any data used as part of system
design, development, or use. If a proxy is identified, designers,
developers, and deployers should remove the
proxy; if needed, it may be possible to identify alternative attributes
that can be used instead. At a minimum,
organizations should ensure a proxy feature is not given undue weight
and should monitor the system closely
for any resulting algorithmic discrimination.
26
Algorithmic
Discrimination
Protections
- >2
HUMAN ALTERNATIVES,
CONSIDERATION, AND
FALLBACK
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint
for the development of additional
technical standards and practices that are tailored for particular
sectors and contexts.
Equitable. Consideration should be given to ensuring outcomes of the
fallback and escalation system are
equitable when compared to those of the automated system and such that
the fallback and escalation
system provides equitable access to underserved communities.105
Timely. Human consideration and fallback are only useful if they are
conducted and concluded in a
timely manner. The determination of what is timely should be made
relative to the specific automated
system, and the review system should be staffed and regularly assessed
to ensure it is providing timely
consideration and fallback. In time-critical systems, this mechanism
should be immediately available or,
where possible, available before the harm occurs. Time-critical systems
include, but are not limited to,
voting-related systems, automated building access and other access
systems, systems that form a critical
component of healthcare, and systems that have the ability to withhold
wages or otherwise cause
immediate financial penalties.
Effective. The organizational structure surrounding processes for
consideration and fallback should
be designed so that if the human decision-maker charged with reassessing
a decision determines that it
should be overruled, the new decision will be effectively enacted. This
includes ensuring that the new
decision is entered into the automated system throughout its components,
any previous repercussions from
the old decision are also overturned, and safeguards are put in place to
help ensure that future decisions do
not result in the same errors.
Maintained. The human consideration and fallback process and any
associated automated processes
should be maintained and supported as long as the relevant automated
system continues to be in use.
Institute training, assessment, and oversight to combat automation bias
and ensure any
human-based components of a system are effective.
Training and assessment. Anyone administering, interacting with, or
interpreting the outputs of an auto
mated system should receive training in that system, including how to
properly interpret outputs of a system
in light of its intended purpose and in how to mitigate the effects of
automation bias. The training should reoc
cur regularly to ensure it is up to date with the system and to ensure
the system is used appropriately. Assess
ment should be ongoing to ensure that the use of the system with human
involvement provides for appropri
ate results, i.e., that the involvement of people does not invalidate
the system's assessment as safe and effective
or lead to algorithmic discrimination.
Oversight. Human-based systems have the potential for bias, including
automation bias, as well as other
concerns that may limit their effectiveness. The results of assessments
of the efficacy and potential bias of
such human-based systems should be overseen by governance structures
that have the potential to update the
operation of the human-based system in order to mitigate these effects.
50
- >2
Applying The Blueprint for an AI Bill of Rights
SENSITIVE DATA: Data and metadata are sensitive if they pertain to an
individual in a sensitive domain
(defined below); are generated by technologies used in a sensitive
domain; can be used to infer data from a
sensitive domain or sensitive data about an individual (such as
disability-related data, genomic data, biometric
data, behavioral data, geolocation data, data related to interaction
with the criminal justice system, relationship
history and legal status such as custody and divorce information, and
home, work, or school environmental
data); or have the reasonable potential to be used in ways that are
likely to expose individuals to meaningful
harm, such as a loss of privacy or financial harm due to identity theft.
Data and metadata generated by or about
those who are not yet legal adults is also sensitive, even if not
related to a sensitive domain. Such data includes,
but is not limited to, numerical, text, image, audio, or video data.
SENSITIVE DOMAINS: “Sensitive domains” are those in which activities
being conducted can cause material
harms, including significant adverse effects on human rights such as
autonomy and dignity, as well as civil liber
ties and civil rights. Domains that have historically been singled out
as deserving of enhanced data protections
or where such enhanced protections are reasonably expected by the public
include, but are not limited to,
health, family planning and care, employment, education, criminal
justice, and personal finance. In the context
of this framework, such domains are considered sensitive whether or not
the specifics of a system context
would necessitate coverage under existing law, and domains and data that
are considered sensitive are under
stood to change over time based on societal norms and context.
SURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or
services marketed for
or that can be lawfully used to detect, monitor, intercept, collect,
exploit, preserve, protect, transmit, and/or
retain data, identifying information, or communications concerning
individuals or groups. This framework
limits its focus to both government and commercial use of surveillance
technologies when juxtaposed with
real-time or subsequent automated analysis and when such systems have a
potential for meaningful impact
on individuals’ or communities’ rights, opportunities, or access.
UNDERSERVED COMMUNITIES: The term “underserved communities” refers to
communities that have
been systematically denied a full opportunity to participate in aspects
of economic, social, and civic life, as
exemplified by the list in the preceding definition of “equity.”
11
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.7
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9666666666666667
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19333333333333338
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.10000000000000003
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9666666666666667
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8478532019852957
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7983333333333333
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7983333333333333
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.7
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.9
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.9666666666666667
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 1
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.7
name: Dot Precision@1
- type: dot_precision@3
value: 0.3
name: Dot Precision@3
- type: dot_precision@5
value: 0.19333333333333338
name: Dot Precision@5
- type: dot_precision@10
value: 0.10000000000000003
name: Dot Precision@10
- type: dot_recall@1
value: 0.7
name: Dot Recall@1
- type: dot_recall@3
value: 0.9
name: Dot Recall@3
- type: dot_recall@5
value: 0.9666666666666667
name: Dot Recall@5
- type: dot_recall@10
value: 1
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.8478532019852957
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.7983333333333333
name: Dot Mrr@10
- type: dot_map@100
value: 0.7983333333333333
name: Dot Map@100
---

SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
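The Pooling module keeps only the CLS token embedding (pooling_mode_cls_token=True), and the final Normalize() module L2-normalizes it, which is also why the cosine and dot-product metrics reported below are identical. The following is a minimal sketch of the same three steps using the raw transformers API; the example text is illustrative, and it assumes the fine-tuned weights load directly through AutoModel (the base checkpoint id works the same way).

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "rgtlai/ai-policy-ft"  # assumption: same repository id as used in the usage example below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

texts = ["What protections does the AI Bill of Rights describe?"]  # illustrative input

# (0) Transformer: tokenize, truncate to the 512-token limit, and encode
batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # [batch, seq_len, 768]

# (1) Pooling with pooling_mode_cls_token=True: keep only the [CLS] token embedding
cls_embeddings = token_embeddings[:, 0]  # [batch, 768]

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(cls_embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 768])
```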
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("rgtlai/ai-policy-ft")
# Run inference
sentences = [
'What proactive steps should be taken during the design phase of automated systems to assess equity and prevent algorithmic discrimination?',
' \n \n \n \n \n \n \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nAny automated system should be tested to help ensure it is free from algorithmic discrimination before it can be \nsold or used. Protection against algorithmic discrimination should include designing to ensure equity, broadly \nconstrued. Some algorithmic discrimination is already prohibited under existing anti-discrimination law. The \nexpectations set out below describe proactive technical and policy steps that can be taken to not only \nreinforce those legal protections but extend beyond them to ensure equity for underserved communities48 \neven in circumstances where a specific legal protection may not be clearly established. These protections \nshould be instituted throughout the design, development, and deployment process and are described below \nroughly in the order in which they would be instituted. \nProtect the public from algorithmic discrimination in a proactive and ongoing manner \nProactive assessment of equity in design. Those responsible for the development, use, or oversight of \nautomated systems should conduct proactive equity assessments in the design phase of the technology \nresearch and development or during its acquisition to review potential input data, associated historical \ncontext, accessibility for people with disabilities, and societal goals to identify potential discrimination and \neffects on equity resulting from the introduction of the technology. The assessed groups should be as inclusive \nas possible of the underserved communities mentioned in the equity definition: Black, Latino, and Indigenous \nand Native American persons, Asian Americans and Pacific Islanders and other persons of color; members of \nreligious minorities; women, girls, and non-binary people; lesbian, gay, bisexual, transgender, queer, and inter-\nsex (LGBTQI+) persons; older adults; persons with disabilities; persons who live in rural areas; and persons \notherwise adversely affected by persistent poverty or inequality. Assessment could include both qualitative \nand quantitative evaluations of the system. This equity assessment should also be considered a core part of the \ngoals of the consultation conducted as part of the safety and efficacy review. \nRepresentative and robust data. Any data used as part of system development or assessment should be \nrepresentative of local communities based on the planned deployment setting and should be reviewed for bias \nbased on the historical and societal context of the data. Such data should be sufficiently robust to identify and \nhelp to mitigate biases and potential harms. \nGuarding against proxies. Directly using demographic information in the design, development, or \ndeployment of an automated system (for purposes other than evaluating a system for discrimination or using \na system to counter discrimination) runs a high risk of leading to algorithmic discrimination and should be \navoided. In many cases, attributes that are highly correlated with demographic features, known as proxies, can \ncontribute to algorithmic discrimination. In cases where use of the demographic features themselves would \nlead to illegal algorithmic discrimination, reliance on such proxies in decision-making (such as that facilitated \nby an algorithm) may also be prohibited by law. 
Proactive testing should be performed to identify proxies by \ntesting for correlation between demographic information and attributes in any data used as part of system \ndesign, development, or use. If a proxy is identified, designers, developers, and deployers should remove the \nproxy; if needed, it may be possible to identify alternative attributes that can be used instead. At a minimum, \norganizations should ensure a proxy feature is not given undue weight and should monitor the system closely \nfor any resulting algorithmic discrimination. \n26\nAlgorithmic \nDiscrimination \nProtections \n',
' \n \n \nApplying The Blueprint for an AI Bill of Rights \nSENSITIVE DATA: Data and metadata are sensitive if they pertain to an individual in a sensitive domain \n(defined below); are generated by technologies used in a sensitive domain; can be used to infer data from a \nsensitive domain or sensitive data about an individual (such as disability-related data, genomic data, biometric \ndata, behavioral data, geolocation data, data related to interaction with the criminal justice system, relationship \nhistory and legal status such as custody and divorce information, and home, work, or school environmental \ndata); or have the reasonable potential to be used in ways that are likely to expose individuals to meaningful \nharm, such as a loss of privacy or financial harm due to identity theft. Data and metadata generated by or about \nthose who are not yet legal adults is also sensitive, even if not related to a sensitive domain. Such data includes, \nbut is not limited to, numerical, text, image, audio, or video data. \nSENSITIVE DOMAINS: “Sensitive domains” are those in which activities being conducted can cause material \nharms, including significant adverse effects on human rights such as autonomy and dignity, as well as civil liber\xad\nties and civil rights. Domains that have historically been singled out as deserving of enhanced data protections \nor where such enhanced protections are reasonably expected by the public include, but are not limited to, \nhealth, family planning and care, employment, education, criminal justice, and personal finance. In the context \nof this framework, such domains are considered sensitive whether or not the specifics of a system context \nwould necessitate coverage under existing law, and domains and data that are considered sensitive are under\xad\nstood to change over time based on societal norms and context. \nSURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or services marketed for \nor that can be lawfully used to detect, monitor, intercept, collect, exploit, preserve, protect, transmit, and/or \nretain data, identifying information, or communications concerning individuals or groups. This framework \nlimits its focus to both government and commercial use of surveillance technologies when juxtaposed with \nreal-time or subsequent automated analysis and when such systems have a potential for meaningful impact \non individuals’ or communities’ rights, opportunities, or access. \nUNDERSERVED COMMUNITIES: The term “underserved communities” refers to communities that have \nbeen systematically denied a full opportunity to participate in aspects of economic, social, and civic life, as \nexemplified by the list in the preceding definition of “equity.” \n11\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
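Because this model is trained with MatryoshkaLoss over the dimensions 768/512/256/128/64, the embeddings produced above can, in principle, be truncated to a leading prefix of dimensions and still remain useful. A minimal sketch using plain NumPy slicing; the choice of 256 dimensions is illustrative, and `embeddings` is the array produced by the snippet above:

```python
import numpy as np

# Keep only the first 256 Matryoshka dimensions of each embedding,
# then re-normalize so that dot products are cosine similarities again.
truncated = embeddings[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

truncated_similarities = truncated @ truncated.T
print(truncated_similarities.shape)
# (3, 3)
```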
## Evaluation

### Metrics

#### Information Retrieval

- Evaluated with `InformationRetrievalEvaluator`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7 |
cosine_accuracy@3 | 0.9 |
cosine_accuracy@5 | 0.9667 |
cosine_accuracy@10 | 1.0 |
cosine_precision@1 | 0.7 |
cosine_precision@3 | 0.3 |
cosine_precision@5 | 0.1933 |
cosine_precision@10 | 0.1 |
cosine_recall@1 | 0.7 |
cosine_recall@3 | 0.9 |
cosine_recall@5 | 0.9667 |
cosine_recall@10 | 1.0 |
cosine_ndcg@10 | 0.8479 |
cosine_mrr@10 | 0.7983 |
cosine_map@100 | 0.7983 |
dot_accuracy@1 | 0.7 |
dot_accuracy@3 | 0.9 |
dot_accuracy@5 | 0.9667 |
dot_accuracy@10 | 1.0 |
dot_precision@1 | 0.7 |
dot_precision@3 | 0.3 |
dot_precision@5 | 0.1933 |
dot_precision@10 | 0.1 |
dot_recall@1 | 0.7 |
dot_recall@3 | 0.9 |
dot_recall@5 | 0.9667 |
dot_recall@10 | 1.0 |
dot_ndcg@10 | 0.8479 |
dot_mrr@10 | 0.7983 |
dot_map@100 | 0.7983 |
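A hedged sketch of how a comparable retrieval evaluation can be reproduced with sentence-transformers. The `queries`, `corpus`, and `relevant_docs` dictionaries below are illustrative placeholders rather than the actual evaluation split, and the model path is hypothetical:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: query id -> text, doc id -> text, query id -> relevant doc ids.
queries = {"q1": "What measures should be taken to ensure that automated systems are safe and effective?"}
corpus = {"d1": "Automated systems should undergo pre-deployment testing, risk identification and mitigation, and ongoing monitoring."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("path/to/this/model")  # hypothetical local path or Hub id

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="ai-bill-of-rights-eval")
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100 scores
```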
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 200 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 200 samples:
|         | sentence_0                                         | sentence_1                                           |
|:--------|:---------------------------------------------------|:-----------------------------------------------------|
| type    | string                                              | string                                                |
| details | min: 12 tokens, mean: 22.34 tokens, max: 38 tokens  | min: 21 tokens, mean: 447.96 tokens, max: 512 tokens  |
- Samples (three examples; each pair is a `sentence_0` question and a `sentence_1` context passage):

**Sample 1**
sentence_0: What is the purpose of the AI Bill of Rights mentioned in the context?
sentence_1:
BLUEPRINT FOR AN
AI BILL OF
RIGHTS
MAKING AUTOMATED
SYSTEMS WORK FOR
THE AMERICAN PEOPLE
OCTOBER 2022

**Sample 2**
sentence_0: When was the Blueprint for an AI Bill of Rights published?
sentence_1:
BLUEPRINT FOR AN
AI BILL OF
RIGHTS
MAKING AUTOMATED
SYSTEMS WORK FOR
THE AMERICAN PEOPLE
OCTOBER 2022

**Sample 3**
sentence_0: What is the purpose of the Blueprint for an AI Bill of Rights as published by the White House Office of Science and Technology Policy?
sentence_1:
About this Document
The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was
published by the White House Office of Science and Technology Policy in October 2022. This framework was
released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered
world.” Its release follows a year of public engagement to inform this initiative. The framework is available
online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights
About the Office of Science and Technology Policy
The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology
Policy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office
of the President with advice on the scientific, engineering, and technological aspects of the economy, national
security, health, foreign relations, the environment, and the technological recovery and use of resources, among
other topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of
Management and Budget (OMB) with an annual review and analysis of Federal research and development in
budgets, and serves as a source of scientific and technological analysis and judgment for the President with
respect to major policies, plans, and programs of the Federal Government.
Legal Disclaimer
The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper
published by the White House Office of Science and Technology Policy. It is intended to support the
development of policies and practices that protect civil rights and promote democratic values in the building,
deployment, and governance of automated systems.
The Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It
does not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or
international instrument. It does not constitute binding guidance for the public or Federal agencies and
therefore does not require compliance with the principles described herein. It also is not determinative of what
the U.S. government’s position will be in any international negotiation. Adoption of these principles may not
meet the requirements of existing statutes, regulations, policies, or international instruments, or the
requirements of the Federal agencies that enforce them. These principles are not intended to, and do not,
prohibit or limit any lawful activity of a government agency, including law enforcement, national security, or
intelligence activities.
The appropriate application of the principles set forth in this white paper depends significantly on the
context in which automated systems are being utilized. In some circumstances, application of these principles
in whole or in part may not be appropriate given the intended use of automated systems to achieve government
agency missions. Future sector-specific guidance will likely be necessary and important for guiding the use of
automated systems in certain settings such as AI systems used as part of school building security or automated
health diagnostic systems.
The Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of
equities, for example, between the protection of sensitive law enforcement information and the principle of
notice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and
other law enforcement equities. Even in contexts where these principles may not apply in whole or in part,
federal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as
existing policies and safeguards that govern automated systems, including, for example, Executive Order 13960,
Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020).
This white paper recognizes that national security (which includes certain law enforcement and
homeland security activities) and defense activities are of increased sensitivity and interest to our nation’s
adversaries and are often subject to special requirements, such as those governing classified information and
other protected data. Such activities require alternative, compatible safeguards through existing policies that
govern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and
Responsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and
Framework. The implementation of these policies to national security and defense activities can be informed by
the Blueprint for an AI Bill of Rights where feasible.
The Blueprint for an AI Bill of Rights is not intended to, and does not, create any legal right, benefit, or
defense, substantive or procedural, enforceable at law or in equity by any party against the United States, its
departments, agencies, or entities, its officers, employees, or agents, or any other person, nor does it constitute a
waiver of sovereign immunity.
Copyright Information
This document is a work of the United States Government and is in the public domain (see 17 U.S.C. §105).
- Loss: `MatryoshkaLoss` with these parameters: `{"loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [768, 512, 256, 128, 64], "matryoshka_weights": [1, 1, 1, 1, 1], "n_dims_per_step": -1}`
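A hedged sketch of how this loss configuration can be constructed with sentence-transformers; the base checkpoint name comes from this card's metadata, and the remaining arguments mirror the parameters listed above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Start from the base checkpoint named in this card's metadata.
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Wrap MultipleNegativesRankingLoss in MatryoshkaLoss so the model is also
# trained to produce useful truncated embeddings at the listed dimensions.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)
```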
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 5
- `multi_dataset_batch_sampler`: round_robin
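A hedged sketch of how these non-default values could be wired into a sentence-transformers v3 training run. Here `model` and `loss` are the objects from the MatryoshkaLoss sketch above, `evaluator` is an `InformationRetrievalEvaluator` like the one in the evaluation sketch, and the output directory plus the one-row dataset are illustrative placeholders:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

# Illustrative 1-row dataset with the same two columns as the real 200-pair dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the purpose of the AI Bill of Rights mentioned in the context?"],
    "sentence_1": ["BLUEPRINT FOR AN AI BILL OF RIGHTS: MAKING AUTOMATED SYSTEMS WORK FOR THE AMERICAN PEOPLE, OCTOBER 2022"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",                       # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="steps",
    multi_dataset_batch_sampler="round_robin",  # only matters when training on multiple datasets
)

trainer = SentenceTransformerTrainer(
    model=model,          # SentenceTransformer from the loss sketch above
    args=args,
    train_dataset=train_dataset,
    loss=loss,            # MatryoshkaLoss from the sketch above
    evaluator=evaluator,  # drives the step-wise retrieval evaluation
)
trainer.train()
```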
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
### Training Logs

Epoch | Step | cosine_map@100 |
---|---|---|
1.0 | 13 | 0.7303 |
2.0 | 26 | 0.7356 |
3.0 | 39 | 0.7828 |
3.8462 | 50 | 0.7817 |
4.0 | 52 | 0.7817 |
5.0 | 65 | 0.7983 |
### Framework Versions

- Python: 3.11.10
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
## Citation

### BibTeX

#### Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
#### MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
#### MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}