---
base_model: Snowflake/snowflake-arctic-embed-m
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:200
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      What measures should be taken to ensure that automated systems are safe
      and effective before deployment?
    sentences:
      - >2
         AI BILL OF RIGHTS
        FFECTIVE SYSTEMS

        ineffective systems. Automated systems should be 

        communities, stakeholders, and domain experts to identify 

        Systems should undergo pre-deployment testing, risk 

        that demonstrate they are safe and effective based on 

        including those beyond the intended use, and adherence to 

        protective measures should include the possibility of not 

        Automated systems should not be designed with an intent 

        reasonably foreseeable possibility of endangering your safety or the
        safety of your community. They should 

        stemming from unintended, yet foreseeable, uses or 
         
         
         
         
          
         
         
        SECTION TITLE

        BLUEPRINT FOR AN

        SAFE AND E 

        You should be protected from unsafe or 

        developed with consultation from diverse 

        concerns, risks, and potential impacts of the system. 

        identification and mitigation, and ongoing monitoring 

        their intended use, mitigation of unsafe outcomes 

        domain-specific standards. Outcomes of these 

        deploying the system or removing a system from use. 

        or 

        be designed to proactively protect you from harms 

        impacts of automated systems. You should be protected from inappropriate
        or irrelevant data use in the 

        design, development, and deployment of automated systems, and from the
        compounded harm of its reuse. 

        Independent evaluation and reporting that confirms that the system is
        safe and effective, including reporting of 

        steps taken to mitigate potential harms, should be performed and the
        results made public whenever possible. 

        ALGORITHMIC DISCRIMINATION PROTECTIONS

        You should not face discrimination by algorithms and systems should be
        used and designed in 

        an equitable way. Algorithmic discrimination occurs when automated
        systems contribute to unjustified 

        different treatment or impacts disfavoring people based on their race,
        color, ethnicity, sex (including 

        pregnancy, childbirth, and related medical conditions, gender identity,
        intersex status, and sexual 

        orientation), religion, age, national origin, disability, veteran
        status, genetic information, or any other 

        classification protected by law. Depending on the specific
        circumstances, such algorithmic discrimination 

        may violate legal protections. Designers, developers, and deployers of
        automated systems should take 

        proactive 

        and 

        continuous 

        measures 

        to 

        protect 

        individuals 

        and 

        communities 

        from algorithmic 

        discrimination and to use and design systems in an equitable way. This
        protection should include proactive 

        equity assessments as part of the system design, use of representative
        data and protection against proxies 

        for demographic features, ensuring accessibility for people with
        disabilities in design and development, 

        pre-deployment and ongoing disparity testing and mitigation, and clear
        organizational oversight. Independent 

        evaluation and plain language reporting in the form of an algorithmic
        impact assessment, including 

        disparity testing results and mitigation information, should be
        performed and made public whenever 

        possible to confirm these protections. 

        5
      - >
        You should be protected from abusive data practices via built-in 

        protections and you should have agency over how data about 

        you is used. You should be protected from violations of privacy through 

        design choices that ensure such protections are included by default,
        including 

        ensuring that data collection conforms to reasonable expectations and
        that 

        only data strictly necessary for the specific context is collected.
        Designers, de­

        velopers, and deployers of automated systems should seek your
        permission 

        and respect your decisions regarding collection, use, access, transfer,
        and de­

        letion of your data in appropriate ways and to the greatest extent
        possible; 

        where not possible, alternative privacy by design safeguards should be
        used. 

        Systems should not employ user experience and design decisions that
        obfus­

        cate user choice or burden users with defaults that are privacy
        invasive. Con­

        sent should only be used to justify collection of data in cases where it
        can be 

        appropriately and meaningfully given. Any consent requests should be
        brief, 

        be understandable in plain language, and give you agency over data
        collection 

        and the specific context of use; current hard-to-understand no­

        tice-and-choice practices for broad uses of data should be changed.
        Enhanced 

        protections and restrictions for data and inferences related to
        sensitive do­

        mains, including health, work, education, criminal justice, and finance,
        and 

        for data pertaining to youth should put you first. In sensitive domains,
        your 

        data and related inferences should only be used for necessary functions,
        and 

        you should be protected by ethical review and use prohibitions. You and
        your 

        communities should be free from unchecked surveillance; surveillance
        tech­

        nologies should be subject to heightened oversight that includes at
        least 

        pre-deployment assessment of their potential harms and scope limits to
        pro­

        tect privacy and civil liberties. Continuous surveillance and
        monitoring 

        should not be used in education, work, housing, or in other contexts
        where the 

        use of such surveillance technologies is likely to limit rights,
        opportunities, or 

        access. Whenever possible, you should have access to reporting that
        confirms 

        your data decisions have been respected and provides an assessment of
        the 

        potential impact of surveillance technologies on your rights,
        opportunities, or 

        access. 

        DATA PRIVACY

        30
      - >
        APPENDIX

        Lisa Feldman Barrett 

        Madeline Owens 

        Marsha Tudor 

        Microsoft Corporation 

        MITRE Corporation 

        National Association for the 

        Advancement of Colored People 

        Legal Defense and Educational 

        Fund 

        National Association of Criminal 

        Defense Lawyers 

        National Center for Missing & 

        Exploited Children 

        National Fair Housing Alliance 

        National Immigration Law Center 

        NEC Corporation of America 

        New America’s Open Technology 

        Institute 

        New York Civil Liberties Union 

        No Name Provided 

        Notre Dame Technology Ethics 

        Center 

        Office of the Ohio Public Defender 

        Onfido 

        Oosto 

        Orissa Rose 

        Palantir 

        Pangiam 

        Parity Technologies 

        Patrick A. Stewart, Jeffrey K. 

        Mullins, and Thomas J. Greitens 

        Pel Abbott 

        Philadelphia Unemployment 

        Project 

        Project On Government Oversight 

        Recording Industry Association of 

        America 

        Robert Wilkens 

        Ron Hedges 

        Science, Technology, and Public 

        Policy Program at University of 

        Michigan Ann Arbor 

        Security Industry Association 

        Sheila Dean 

        Software & Information Industry 

        Association 

        Stephanie Dinkins and the Future 

        Histories Studio at Stony Brook 

        University 

        TechNet 

        The Alliance for Media Arts and 

        Culture, MIT Open Documentary 

        Lab and Co-Creation Studio, and 

        Immerse 

        The International Brotherhood of 

        Teamsters 

        The Leadership Conference on 

        Civil and Human Rights 

        Thorn 

        U.S. Chamber of Commerce’s 

        Technology Engagement Center 

        Uber Technologies 

        University of Pittsburgh 

        Undergraduate Student 

        Collaborative 

        Upturn 

        US Technology Policy Committee 

        of the Association of Computing 

        Machinery 

        Virginia Puccio 

        Visar Berisha and Julie Liss 

        XR Association 

        XR Safety Initiative 

         As an additional effort to reach out to stakeholders regarding the
        RFI, OSTP conducted two listening sessions

        for members of the public. The listening sessions together drew upwards
        of 300 participants. The Science and

        Technology Policy Institute produced a synopsis of both the RFI
        submissions and the feedback at the listening

        sessions.115

        61
  - source_sentence: How does the document address algorithmic discrimination protections?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
        SAFE AND EFFECTIVE 

        SYSTEMS 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Ongoing monitoring. Automated systems should have ongoing monitoring
        procedures, including recalibra­

        tion procedures, in place to ensure that their performance does not fall
        below an acceptable level over time, 

        based on changing real-world conditions or deployment contexts,
        post-deployment modification, or unexpect­

        ed conditions. This ongoing monitoring should include continuous
        evaluation of performance metrics and 

        harm assessments, updates of any systems, and retraining of any machine
        learning models as necessary, as well 

        as ensuring that fallback mechanisms are in place to allow reversion to
        a previously working system. Monitor­

        ing should take into account the performance of both technical system
        components (the algorithm as well as 

        any hardware components, data inputs, etc.) and human operators. It
        should include mechanisms for testing 

        the actual accuracy of any predictions or recommendations generated by a
        system, not just a human operator’s 

        determination of their accuracy. Ongoing monitoring procedures should
        include manual, human-led monitor­

        ing as a check in the event there are shortcomings in automated
        monitoring systems. These monitoring proce­

        dures should be in place for the lifespan of the deployed automated
        system. 

        Clear organizational oversight. Entities responsible for the development
        or use of automated systems 

        should lay out clear governance structures and procedures.  This
        includes clearly-stated governance proce­

        dures before deploying the system, as well as responsibility of specific
        individuals or entities to oversee ongoing 

        assessment and mitigation. Organizational stakeholders including those
        with oversight of the business process 

        or operation being automated, as well as other organizational divisions
        that may be affected due to the use of 

        the system, should be involved in establishing governance procedures.
        Responsibility should rest high enough 

        in the organization that decisions about resources, mitigation, incident
        response, and potential rollback can be 

        made promptly, with sufficient weight given to risk mitigation
        objectives against competing concerns. Those 

        holding this responsibility should be made aware of any use cases with
        the potential for meaningful impact on 

        people’s rights, opportunities, or access as determined based on risk
        identification procedures.  In some cases, 

        it may be appropriate for an independent ethics review to be conducted
        before deployment. 

        Avoid inappropriate, low-quality, or irrelevant data use and the
        compounded harm of its 

        reuse 

        Relevant and high-quality data. Data used as part of any automated
        system’s creation, evaluation, or 

        deployment should be relevant, of high quality, and tailored to the task
        at hand. Relevancy should be 

        established based on research-backed demonstration of the causal
        influence of the data to the specific use case 

        or justified more generally based on a reasonable expectation of
        usefulness in the domain and/or for the 

        system design or ongoing development. Relevance of data should not be
        established solely by appealing to 

        its historical connection to the outcome. High quality and tailored data
        should be representative of the task at 

        hand and errors from data entry or other sources should be measured and
        limited. Any data used as the target 

        of a prediction process should receive particular attention to the
        quality and validity of the predicted outcome 

        or label to ensure the goal of the automated system is appropriately
        identified and measured. Additionally, 

        justification should be documented for each data attribute and source to
        explain why it is appropriate to use 

        that data to inform the results of the automated system and why such use
        will not violate any applicable laws. 

        In cases of high-dimensional and/or derived attributes, such
        justifications can be provided as overall 

        descriptions of the attribute generation process and appropriateness. 

        19
      - |
        TABLE OF CONTENTS
        FROM PRINCIPLES TO PRACTICE: A TECHNICAL COMPANION TO THE BLUEPRINT 
        FOR AN AI BILL OF RIGHTS 
         
        USING THIS TECHNICAL COMPANION
         
        SAFE AND EFFECTIVE SYSTEMS
         
        ALGORITHMIC DISCRIMINATION PROTECTIONS
         
        DATA PRIVACY
         
        NOTICE AND EXPLANATION
         
        HUMAN ALTERNATIVES, CONSIDERATION, AND FALLBACK
        APPENDIX
         
        EXAMPLES OF AUTOMATED SYSTEMS
         
        LISTENING TO THE AMERICAN PEOPLE
        ENDNOTES 
        12
        14
        15
        23
        30
        40
        46
        53
        53
        55
        63
        13
      - >
        APPENDIX

        Systems that impact the safety of communities such as automated traffic
        control systems, elec 

        -ctrical grid controls, smart city technologies, and industrial
        emissions and environmental

        impact control algorithms; and

        Systems related to access to benefits or services or assignment of
        penalties such as systems that

        support decision-makers who adjudicate benefits such as collating or
        analyzing information or

        matching records, systems which similarly assist in the adjudication of
        administrative or criminal

        penalties, fraud detection algorithms, services or benefits access
        control algorithms, biometric

        systems used as access control, and systems which make benefits or
        services related decisions on a

        fully or partially autonomous basis (such as a determination to revoke
        benefits).

        54
  - source_sentence: >-
      What legislation is referenced in the context that became effective on
      October 3, 2008, regarding biometric information?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
          
         
         
         
         
        HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE

        Real-life examples of how these principles can become reality, through
        laws, policies, and practical 

        technical and sociotechnical approaches to protecting rights,
        opportunities, and access. 

        The federal government is working to combat discrimination in mortgage
        lending. The Depart­

        ment of Justice has launched a nationwide initiative to combat
        redlining, which includes reviewing how 

        lenders who may be avoiding serving communities of color are conducting
        targeted marketing and advertising.51 

        This initiative will draw upon strong partnerships across federal
        agencies, including the Consumer Financial 

        Protection Bureau and prudential regulators. The Action Plan to Advance
        Property Appraisal and Valuation 

        Equity includes a commitment from the agencies that oversee mortgage
        lending to include a 

        nondiscrimination standard in the proposed rules for Automated Valuation
        Models.52

        The Equal Employment Opportunity Commission and the Department of
        Justice have clearly 

        laid out how employers’ use of AI and other automated systems can result
        in 

        discrimination against job applicants and employees with disabilities.53
        The documents explain 

        how employers’ use of software that relies on algorithmic
        decision-making may violate existing requirements 

        under Title I of the Americans with Disabilities Act (“ADA”). This
        technical assistance also provides practical 

        tips to employers on how to comply with the ADA, and to job applicants
        and employees who think that their 

        rights may have been violated. 

        Disparity assessments identified harms to Black patients' healthcare
        access. A widely 

        used healthcare algorithm relied on the cost of each patient’s past
        medical care to predict future medical needs, 

        recommending early interventions for the patients deemed most at risk.
        This process discriminated 

        against Black patients, who generally have less access to medical care
        and therefore have generated less cost 

        than white patients with similar illness and need. A landmark study
        documented this pattern and proposed 

        practical ways that were shown to reduce this bias, such as focusing
        specifically on active chronic health 

        conditions or avoidable future costs related to emergency visits and
        hospitalization.54 

        Large employers have developed best practices to scrutinize the data and
        models used 

        for hiring. An industry initiative has developed Algorithmic Bias
        Safeguards for the Workforce, a structured 

        questionnaire that businesses can use proactively when procuring
        software to evaluate workers. It covers 

        specific technical questions such as the training data used, model
        training process, biases identified, and 

        mitigation steps employed.55 

        Standards organizations have developed guidelines to incorporate
        accessibility criteria 

        into technology design processes. The most prevalent in the United
        States is the Access Board’s Section 

        508 regulations,56 which are the technical standards for federal
        information communication technology (software, 

        hardware, and web). Other standards include those issued by the
        International Organization for 

        Standardization,57 and the World Wide Web Consortium Web Content
        Accessibility Guidelines,58 a globally 

        recognized voluntary consensus standard for web content and other
        information and communications 

        technology. 

        NIST has released Special Publication 1270, Towards a Standard for
        Identifying and Managing Bias 

        in Artificial Intelligence.59 The special publication: describes the
        stakes and challenges of bias in artificial 

        intelligence and provides examples of how and why it can chip away at
        public trust; identifies three categories 

        of bias in AI  systemic, statistical, and human  and describes how and
        where they contribute to harms; and 

        describes three broad challenges for mitigating bias  datasets, testing
        and evaluation, and human factors  and 

        introduces preliminary guidance for addressing them. Throughout, the
        special publication takes a socio-

        technical perspective to identifying and managing AI bias. 

        29

        Algorithmic 

        Discrimination 

        Protections 
      - >2
         
         
        ENDNOTES

        85. Mick Dumke and Frank Main. A look inside the watch list Chicago
        police fought to keep secret. The

        Chicago Sun Times. May 18, 2017.

        https://chicago.suntimes.com/2017/5/18/18386116/a-look-inside-the-watch-list-chicago-police-fought­

        to-keep-secret

        86. Jay Stanley. Pitfalls of Artificial Intelligence Decisionmaking
        Highlighted In Idaho ACLU Case.

        ACLU. Jun. 2, 2017.

        https://www.aclu.org/blog/privacy-technology/pitfalls-artificial-intelligence-decisionmaking­

        highlighted-idaho-aclu-case

        87. Illinois General Assembly. Biometric Information Privacy Act.
        Effective Oct. 3, 2008.

        https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004&ChapterID=57

        88. Partnership on AI. ABOUT ML Reference Document. Accessed May 2,
        2022.

        https://partnershiponai.org/paper/about-ml-reference-document/1/

        89. See, e.g., the model cards framework: Margaret Mitchell, Simone Wu,
        Andrew Zaldivar, Parker

        Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah
        Raji, and Timnit Gebru.

        Model Cards for Model Reporting. In Proceedings of the Conference on
        Fairness, Accountability, and

        Transparency (FAT* '19). Association for Computing Machinery, New York,
        NY, USA, 220–229. https://

        dl.acm.org/doi/10.1145/3287560.3287596

        90. Sarah Ammermann. Adverse Action Notice Requirements Under the ECOA
        and the FCRA. Consumer

        Compliance Outlook. Second Quarter 2013.

        https://consumercomplianceoutlook.org/2013/second-quarter/adverse-action-notice-requirements­

        under-ecoa-fcra/

        91. Federal Trade Commission. Using Consumer Reports for Credit
        Decisions: What to Know About

        Adverse Action and Risk-Based Pricing Notices. Accessed May 2, 2022.

        https://www.ftc.gov/business-guidance/resources/using-consumer-reports-credit-decisions-what­

        know-about-adverse-action-risk-based-pricing-notices#risk

        92. Consumer Financial Protection Bureau. CFPB Acts to Protect the
        Public from Black-Box Credit

        Models Using Complex Algorithms. May 26, 2022.

        https://www.consumerfinance.gov/about-us/newsroom/cfpb-acts-to-protect-the-public-from-black­

        box-credit-models-using-complex-algorithms/

        93. Anthony Zaller. California Passes Law Regulating Quotas In
        Warehouses – What Employers Need to

        Know About AB 701. Zaller Law Group California Employment Law Report.
        Sept. 24, 2021.

        https://www.californiaemploymentlawreport.com/2021/09/california-passes-law-regulating-quotas­

        in-warehouses-what-employers-need-to-know-about-ab-701/

        94. National Institute of Standards and Technology. AI Fundamental
        Research – Explainability.

        Accessed Jun. 4, 2022.

        https://www.nist.gov/artificial-intelligence/ai-fundamental-research-explainability

        95. DARPA. Explainable Artificial Intelligence (XAI). Accessed July 20,
        2022.

        https://www.darpa.mil/program/explainable-artificial-intelligence

        71
      - >2
         
        ENDNOTES

        12. Expectations about reporting are intended for the entity developing
        or using the automated system. The

        resulting reports can be provided to the public, regulators, auditors,
        industry standards groups, or others

        engaged in independent review, and should be made public as much as
        possible consistent with law,

        regulation, and policy, and noting that intellectual property or law
        enforcement considerations may prevent

        public release. These reporting expectations are important for
        transparency, so the American people can

        have confidence that their rights, opportunities, and access as well as
        their expectations around

        technologies are respected.

        13. National Artificial Intelligence Initiative Office. Agency
        Inventories of AI Use Cases. Accessed Sept. 8,

        2022. https://www.ai.gov/ai-use-case-inventories/

        14. National Highway Traffic Safety Administration.
        https://www.nhtsa.gov/

        15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
        Professional Engineers and NHTSA. Public

        Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
        https://www.jstor.org/stable/976213?seq=1

        16. The US Department of Transportation has publicly described the
        health and other benefits of these

        “traffic calming” measures. See, e.g.: U.S. Department of
        Transportation. Traffic Calming to Slow Vehicle

        Speeds. Accessed Apr. 17, 2022.
        https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow­

        Vehicle-Speeds

        17. Karen Hao. Worried about your firm’s AI ethics? These startups are
        here to help.

        A growing ecosystem of “responsible AI” ventures promise to help
        organizations monitor and fix their AI

        models. MIT Technology Review. Jan 15., 2021.

        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top Progressive

        Companies Building Ethical AI to Look Out for in 2021. Analytics
        Insight. June 30, 2021. https://

        www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for­

        in-2021/
        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top

        Progressive Companies Building Ethical AI to Look Out for in 2021.
        Analytics Insight. June 30, 2021.

        18. Office of Management and Budget. Study to Identify Methods to Assess
        Equity: Report to the President.

        Aug. 2021.
        https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985­

        Implementation_508-Compliant-Secure-v1.1.pdf

        19. National Institute of Standards and Technology. AI Risk Management
        Framework. Accessed May 23,

        2022. https://www.nist.gov/itl/ai-risk-management-framework

        20. U.S. Department of Energy. U.S. Department of Energy Establishes
        Artificial Intelligence Advancement

        Council. U.S. Department of Energy Artificial Intelligence and
        Technology Office. April 18, 2022. https://

        www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council

        21. Department of Defense. U.S Department of Defense Responsible
        Artificial Intelligence Strategy and

        Implementation Pathway. Jun. 2022.
        https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/

        Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation­

        Pathway.PDF

        22. Director of National Intelligence. Principles of Artificial
        Intelligence Ethics for the Intelligence

        Community.
        https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for­

        the-intelligence-community

        64
  - source_sentence: >-
      How does the Blueprint for an AI Bill of Rights relate to existing laws
      and regulations regarding automated systems?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
         
         
        About this Document 

        The Blueprint for an AI Bill of Rights: Making Automated Systems Work
        for the American People was 

        published by the White House Office of Science and Technology Policy in
        October 2022. This framework was 

        released one year after OSTP announced the launch of a process to
        develop “a bill of rights for an AI-powered 

        world.” Its release follows a year of public engagement to inform this
        initiative. The framework is available 

        online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights 

        About the Office of Science and Technology Policy 

        The Office of Science and Technology Policy (OSTP) was established by
        the National Science and Technology 

        Policy, Organization, and Priorities Act of 1976 to provide the
        President and others within the Executive Office 

        of the President with advice on the scientific, engineering, and
        technological aspects of the economy, national 

        security, health, foreign relations, the environment, and the
        technological recovery and use of resources, among 

        other topics. OSTP leads interagency science and technology policy
        coordination efforts, assists the Office of 

        Management and Budget (OMB) with an annual review and analysis of
        Federal research and development in 

        budgets, and serves as a source of scientific and technological analysis
        and judgment for the President with 

        respect to major policies, plans, and programs of the Federal
        Government. 

        Legal Disclaimer 

        The Blueprint for an AI Bill of Rights: Making Automated Systems Work
        for the American People is a white paper 

        published by the White House Office of Science and Technology Policy. It
        is intended to support the 

        development of policies and practices that protect civil rights and
        promote democratic values in the building, 

        deployment, and governance of automated systems. 

        The Blueprint for an AI Bill of Rights is non-binding and does not
        constitute U.S. government policy. It 

        does not supersede, modify, or direct an interpretation of any existing
        statute, regulation, policy, or 

        international instrument. It does not constitute binding guidance for
        the public or Federal agencies and 

        therefore does not require compliance with the principles described
        herein. It also is not determinative of what 

        the U.S. government’s position will be in any international negotiation.
        Adoption of these principles may not 

        meet the requirements of existing statutes, regulations, policies, or
        international instruments, or the 

        requirements of the Federal agencies that enforce them. These principles
        are not intended to, and do not, 

        prohibit or limit any lawful activity of a government agency, including
        law enforcement, national security, or 

        intelligence activities. 

        The appropriate application of the principles set forth in this white
        paper depends significantly on the 

        context in which automated systems are being utilized. In some
        circumstances, application of these principles 

        in whole or in part may not be appropriate given the intended use of
        automated systems to achieve government 

        agency missions. Future sector-specific guidance will likely be
        necessary and important for guiding the use of 

        automated systems in certain settings such as AI systems used as part of
        school building security or automated 

        health diagnostic systems. 

        The Blueprint for an AI Bill of Rights recognizes that law enforcement
        activities require a balancing of 

        equities, for example, between the protection of sensitive law
        enforcement information and the principle of 

        notice; as such, notice may not be appropriate, or may need to be
        adjusted to protect sources, methods, and 

        other law enforcement equities. Even in contexts where these principles
        may not apply in whole or in part, 

        federal departments and agencies remain subject to judicial, privacy,
        and civil liberties oversight as well as 

        existing policies and safeguards that govern automated systems,
        including, for example, Executive Order 13960, 

        Promoting the Use of Trustworthy Artificial Intelligence in the Federal
        Government (December 2020). 

        This white paper recognizes that national security (which includes
        certain law enforcement and 

        homeland security activities) and defense activities are of increased
        sensitivity and interest to our nation’s 

        adversaries and are often subject to special requirements, such as those
        governing classified information and 

        other protected data. Such activities require alternative, compatible
        safeguards through existing policies that 

        govern automated systems and AI, such as the Department of Defense (DOD)
        AI Ethical Principles and 

        Responsible AI Implementation Pathway and the Intelligence Community
        (IC) AI Ethics Principles and 

        Framework. The implementation of these policies to national security and
        defense activities can be informed by 

        the Blueprint for an AI Bill of Rights where feasible. 

        The Blueprint for an AI Bill of Rights is not intended to, and does not,
        create any legal right, benefit, or 

        defense, substantive or procedural, enforceable at law or in equity by
        any party against the United States, its 

        departments, agencies, or entities, its officers, employees, or agents,
        or any other person, nor does it constitute a 

        waiver of sovereign immunity. 

        Copyright Information 

        This document is a work of the United States Government and is in the
        public domain (see 17 U.S.C. §105). 

        2
      - >2
         
        ENDNOTES

        12. Expectations about reporting are intended for the entity developing
        or using the automated system. The

        resulting reports can be provided to the public, regulators, auditors,
        industry standards groups, or others

        engaged in independent review, and should be made public as much as
        possible consistent with law,

        regulation, and policy, and noting that intellectual property or law
        enforcement considerations may prevent

        public release. These reporting expectations are important for
        transparency, so the American people can

        have confidence that their rights, opportunities, and access as well as
        their expectations around

        technologies are respected.

        13. National Artificial Intelligence Initiative Office. Agency
        Inventories of AI Use Cases. Accessed Sept. 8,

        2022. https://www.ai.gov/ai-use-case-inventories/

        14. National Highway Traffic Safety Administration.
        https://www.nhtsa.gov/

        15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
        Professional Engineers and NHTSA. Public

        Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
        https://www.jstor.org/stable/976213?seq=1

        16. The US Department of Transportation has publicly described the
        health and other benefits of these

        “traffic calming” measures. See, e.g.: U.S. Department of
        Transportation. Traffic Calming to Slow Vehicle

        Speeds. Accessed Apr. 17, 2022.
        https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow­

        Vehicle-Speeds

        17. Karen Hao. Worried about your firm’s AI ethics? These startups are
        here to help.

        A growing ecosystem of “responsible AI” ventures promise to help
        organizations monitor and fix their AI

        models. MIT Technology Review. Jan 15., 2021.

        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top Progressive

        Companies Building Ethical AI to Look Out for in 2021. Analytics
        Insight. June 30, 2021. https://

        www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for­

        in-2021/
        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top

        Progressive Companies Building Ethical AI to Look Out for in 2021.
        Analytics Insight. June 30, 2021.

        18. Office of Management and Budget. Study to Identify Methods to Assess
        Equity: Report to the President.

        Aug. 2021.
        https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985­

        Implementation_508-Compliant-Secure-v1.1.pdf

        19. National Institute of Standards and Technology. AI Risk Management
        Framework. Accessed May 23,

        2022. https://www.nist.gov/itl/ai-risk-management-framework

        20. U.S. Department of Energy. U.S. Department of Energy Establishes
        Artificial Intelligence Advancement

        Council. U.S. Department of Energy Artificial Intelligence and
        Technology Office. April 18, 2022. https://

        www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council

        21. Department of Defense. U.S Department of Defense Responsible
        Artificial Intelligence Strategy and

        Implementation Pathway. Jun. 2022.
        https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/

        Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation­

        Pathway.PDF

        22. Director of National Intelligence. Principles of Artificial
        Intelligence Ethics for the Intelligence

        Community.
        https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for­

        the-intelligence-community

        64
      - >2
         
        12 

        CSAM. Even when trained on “clean” data, increasingly capable GAI models
        can synthesize or produce 

        synthetic NCII and CSAM. Websites, mobile apps, and custom-built models
        that generate synthetic NCII 

        have moved from niche internet forums to mainstream, automated, and
        scaled online businesses.  

        Trustworthy AI Characteristics: Fair with Harmful Bias Managed, Safe,
        Privacy Enhanced 

        2.12. 

        Value Chain and Component Integration 

        GAI value chains involve many third-party components such as procured
        datasets, pre-trained models, 

        and software libraries. These components might be improperly obtained or
        not properly vetted, leading 

        to diminished transparency or accountability for downstream users. While
        this is a risk for traditional AI 

        systems and some other digital technologies, the risk is exacerbated for
        GAI due to the scale of the 

        training data, which may be too large for humans to vet; the difficulty of
        training foundation models, 

        which leads to extensive reuse of limited numbers of models; and the
        extent to which GAI may be 

        integrated into other devices and services. As GAI systems often involve
        many distinct third-party 

        components and data sources, it may be difficult to attribute issues in a
        system’s behavior to any one of 

        these sources. 

        Errors in third-party GAI components can also have downstream impacts on
        accuracy and robustness. 

        For example, test datasets commonly used to benchmark or validate models
        can contain label errors. 

        Inaccuracies in these labels can impact the “stability” or robustness of
        these benchmarks, which many 

        GAI practitioners consider during the model selection process.  

        Trustworthy AI Characteristics: Accountable and Transparent, Explainable
        and Interpretable, Fair with 

        Harmful Bias Managed, Privacy Enhanced, Safe, Secure and Resilient,
        Valid and Reliable 

        3. 

        Suggested Actions to Manage GAI Risks 

        The following suggested actions target risks unique to or exacerbated by
        GAI. 

        In addition to the suggested actions below, AI risk management
        activities and actions set forth in the AI 

        RMF 1.0 and Playbook are already applicable for managing GAI risks.
        Organizations are encouraged to 

        apply the activities suggested in the AI RMF and its Playbook when
        managing the risk of GAI systems.  

        Implementation of the suggested actions will vary depending on the type
        of risk, characteristics of GAI 

        systems, stage of the GAI lifecycle, and relevant AI actors involved.  

        Suggested actions to manage GAI risks can be found in the tables below: 



        The suggested actions are organized by relevant AI RMF subcategories to
        streamline these 

        activities alongside implementation of the AI RMF.  



        Not every subcategory of the AI RMF is included in this document.13
        Suggested actions are 

        listed for only some subcategories.  
         
         
        13 As this document was focused on the GAI PWG efforts and primary
        considerations (see Appendix A), AI RMF 

        subcategories not addressed here may be added later.  
  - source_sentence: >-
      What proactive steps should be taken during the design phase of automated
      systems to assess equity and prevent algorithmic discrimination?
    sentences:
      - >2
         
         
         
         
         
         
         
        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Any automated system should be tested to help ensure it is free from
        algorithmic discrimination before it can be 

        sold or used. Protection against algorithmic discrimination should
        include designing to ensure equity, broadly 

        construed.  Some algorithmic discrimination is already prohibited under
        existing anti-discrimination law. The 

        expectations set out below describe proactive technical and policy steps
        that can be taken to not only 

        reinforce those legal protections but extend beyond them to ensure
        equity for underserved communities48 

        even in circumstances where a specific legal protection may not be
        clearly established. These protections 

        should be instituted throughout the design, development, and deployment
        process and are described below 

        roughly in the order in which they would be instituted. 

        Protect the public from algorithmic discrimination in a proactive and
        ongoing manner 

        Proactive assessment of equity in design. Those responsible for the
        development, use, or oversight of 

        automated systems should conduct proactive equity assessments in the
        design phase of the technology 

        research and development or during its acquisition to review potential
        input data, associated historical 

        context, accessibility for people with disabilities, and societal goals
        to identify potential discrimination and 

        effects on equity resulting from the introduction of the technology. The
        assessed groups should be as inclusive 

        as possible of the underserved communities mentioned in the equity
        definition:  Black, Latino, and Indigenous 

        and Native American persons, Asian Americans and Pacific Islanders and
        other persons of color; members of 

        religious minorities; women, girls, and non-binary people; lesbian, gay,
        bisexual, transgender, queer, and inter-

        sex (LGBTQI+) persons; older adults; persons with disabilities; persons
        who live in rural areas; and persons 

        otherwise adversely affected by persistent poverty or inequality.
        Assessment could include both qualitative 

        and quantitative evaluations of the system. This equity assessment
        should also be considered a core part of the 

        goals of the consultation conducted as part of the safety and efficacy
        review. 

        Representative and robust data. Any data used as part of system
        development or assessment should be 

        representative of local communities based on the planned deployment
        setting and should be reviewed for bias 

        based on the historical and societal context of the data. Such data
        should be sufficiently robust to identify and 

        help to mitigate biases and potential harms. 

        Guarding against proxies.  Directly using demographic information in the
        design, development, or 

        deployment of an automated system (for purposes other than evaluating a
        system for discrimination or using 

        a system to counter discrimination) runs a high risk of leading to
        algorithmic discrimination and should be 

        avoided. In many cases, attributes that are highly correlated with
        demographic features, known as proxies, can 

        contribute to algorithmic discrimination. In cases where use of the
        demographic features themselves would 

        lead to illegal algorithmic discrimination, reliance on such proxies in
        decision-making (such as that facilitated 

        by an algorithm) may also be prohibited by law. Proactive testing should
        be performed to identify proxies by 

        testing for correlation between demographic information and attributes
        in any data used as part of system 

        design, development, or use. If a proxy is identified, designers,
        developers, and deployers should remove the 

        proxy; if needed, it may be possible to identify alternative attributes
        that can be used instead. At a minimum, 

        organizations should ensure a proxy feature is not given undue weight
        and should monitor the system closely 

        for any resulting algorithmic discrimination.   

        26

        Algorithmic 

        Discrimination 

        Protections 
      - >2
         
         
         
         
         
         
         
        HUMAN ALTERNATIVES, 

        CONSIDERATION, AND 

        FALLBACK 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Equitable. Consideration should be given to ensuring outcomes of the
        fallback and escalation system are 

        equitable when compared to those of the automated system and such that
        the fallback and escalation 

        system provides equitable access to underserved communities.105 

        Timely. Human consideration and fallback are only useful if they are
        conducted and concluded in a 

        timely manner. The determination of what is timely should be made
        relative to the specific automated 

        system, and the review system should be staffed and regularly assessed
        to ensure it is providing timely 

        consideration and fallback. In time-critical systems, this mechanism
        should be immediately available or, 

        where possible, available before the harm occurs. Time-critical systems
        include, but are not limited to, 

        voting-related systems, automated building access and other access
        systems, systems that form a critical 

        component of healthcare, and systems that have the ability to withhold
        wages or otherwise cause 

        immediate financial penalties. 

        Effective. The organizational structure surrounding processes for
        consideration and fallback should 

        be designed so that if the human decision-maker charged with reassessing
        a decision determines that it 

        should be overruled, the new decision will be effectively enacted. This
        includes ensuring that the new 

        decision is entered into the automated system throughout its components,
        any previous repercussions from 

        the old decision are also overturned, and safeguards are put in place to
        help ensure that future decisions do 

        not result in the same errors. 

        Maintained. The human consideration and fallback process and any
        associated automated processes 

        should be maintained and supported as long as the relevant automated
        system continues to be in use. 

        Institute training, assessment, and oversight to combat automation bias
        and ensure any 

        human-based components of a system are effective. 

        Training and assessment. Anyone administering, interacting with, or
        interpreting the outputs of an auto­

        mated system should receive training in that system, including how to
        properly interpret outputs of a system 

        in light of its intended purpose and in how to mitigate the effects of
        automation bias. The training should reoc­

        cur regularly to ensure it is up to date with the system and to ensure
        the system is used appropriately. Assess­

        ment should be ongoing to ensure that the use of the system with human
        involvement provides for appropri­

        ate results, i.e., that the involvement of people does not invalidate
        the system's assessment as safe and effective 

        or lead to algorithmic discrimination. 

        Oversight. Human-based systems have the potential for bias, including
        automation bias, as well as other 

        concerns that may limit their effectiveness. The results of assessments
        of the efficacy and potential bias of 

        such human-based systems should be overseen by governance structures
        that have the potential to update the 

        operation of the human-based system in order to mitigate these effects. 

        50
      - >2
         
         
         
        Applying The Blueprint for an AI Bill of Rights 

        SENSITIVE DATA: Data and metadata are sensitive if they pertain to an
        individual in a sensitive domain 

        (defined below); are generated by technologies used in a sensitive
        domain; can be used to infer data from a 

        sensitive domain or sensitive data about an individual (such as
        disability-related data, genomic data, biometric 

        data, behavioral data, geolocation data, data related to interaction
        with the criminal justice system, relationship 

        history and legal status such as custody and divorce information, and
        home, work, or school environmental 

        data); or have the reasonable potential to be used in ways that are
        likely to expose individuals to meaningful 

        harm, such as a loss of privacy or financial harm due to identity theft.
        Data and metadata generated by or about 

        those who are not yet legal adults is also sensitive, even if not
        related to a sensitive domain. Such data includes, 

        but is not limited to, numerical, text, image, audio, or video data. 

        SENSITIVE DOMAINS: “Sensitive domains” are those in which activities
        being conducted can cause material 

        harms, including significant adverse effects on human rights such as
        autonomy and dignity, as well as civil liber­

        ties and civil rights. Domains that have historically been singled out
        as deserving of enhanced data protections 

        or where such enhanced protections are reasonably expected by the public
        include, but are not limited to, 

        health, family planning and care, employment, education, criminal
        justice, and personal finance. In the context 

        of this framework, such domains are considered sensitive whether or not
        the specifics of a system context 

        would necessitate coverage under existing law, and domains and data that
        are considered sensitive are under­

        stood to change over time based on societal norms and context. 

        SURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or
        services marketed for 

        or that can be lawfully used to detect, monitor, intercept, collect,
        exploit, preserve, protect, transmit, and/or 

        retain data, identifying information, or communications concerning
        individuals or groups. This framework 

        limits its focus to both government and commercial use of surveillance
        technologies when juxtaposed with 

        real-time or subsequent automated analysis and when such systems have a
        potential for meaningful impact 

        on individuals’ or communities’ rights, opportunities, or access. 

        UNDERSERVED COMMUNITIES: The term “underserved communities” refers to
        communities that have 

        been systematically denied a full opportunity to participate in aspects
        of economic, social, and civic life, as 

        exemplified by the list in the preceding definition of “equity.” 

        11
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.7
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9666666666666667
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19333333333333338
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.10000000000000003
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9666666666666667
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8478532019852957
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7983333333333333
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7983333333333333
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.7
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.9
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.9666666666666667
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 1
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.7
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.3
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.19333333333333338
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.10000000000000003
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.7
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.9
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.9666666666666667
            name: Dot Recall@5
          - type: dot_recall@10
            value: 1
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.8478532019852957
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.7983333333333333
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.7983333333333333
            name: Dot Map@100

SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-m
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
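
To make the pooling explicit: the stack above takes the final-layer CLS token embedding and L2-normalizes it. The sketch below reproduces that with plain transformers; it assumes the checkpoint exposes the underlying BertModel and tokenizer in the usual Sentence Transformers repository layout, and the query string is only an illustration.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the BertModel weights and tokenizer live at the repo root, as is
# standard for Sentence Transformers checkpoints.
tokenizer = AutoTokenizer.from_pretrained("rgtlai/ai-policy-ft")
encoder = AutoModel.from_pretrained("rgtlai/ai-policy-ft")

batch = tokenizer(
    ["What counts as a sensitive domain?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, 768)

cls = hidden[:, 0]                                            # pooling_mode_cls_token=True
embeddings = torch.nn.functional.normalize(cls, p=2, dim=1)   # Normalize() module
print(embeddings.shape)                                       # torch.Size([1, 768])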

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rgtlai/ai-policy-ft")
# Run inference
sentences = [
    'What proactive steps should be taken during the design phase of automated systems to assess equity and prevent algorithmic discrimination?',
    ' \n \n \n \n \n \n \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nAny automated system should be tested to help ensure it is free from algorithmic discrimination before it can be \nsold or used. Protection against algorithmic discrimination should include designing to ensure equity, broadly \nconstrued.  Some algorithmic discrimination is already prohibited under existing anti-discrimination law. The \nexpectations set out below describe proactive technical and policy steps that can be taken to not only \nreinforce those legal protections but extend beyond them to ensure equity for underserved communities48 \neven in circumstances where a specific legal protection may not be clearly established. These protections \nshould be instituted throughout the design, development, and deployment process and are described below \nroughly in the order in which they would be instituted. \nProtect the public from algorithmic discrimination in a proactive and ongoing manner \nProactive assessment of equity in design. Those responsible for the development, use, or oversight of \nautomated systems should conduct proactive equity assessments in the design phase of the technology \nresearch and development or during its acquisition to review potential input data, associated historical \ncontext, accessibility for people with disabilities, and societal goals to identify potential discrimination and \neffects on equity resulting from the introduction of the technology. The assessed groups should be as inclusive \nas possible of the underserved communities mentioned in the equity definition:  Black, Latino, and Indigenous \nand Native American persons, Asian Americans and Pacific Islanders and other persons of color; members of \nreligious minorities; women, girls, and non-binary people; lesbian, gay, bisexual, transgender, queer, and inter-\nsex (LGBTQI+) persons; older adults; persons with disabilities; persons who live in rural areas; and persons \notherwise adversely affected by persistent poverty or inequality. Assessment could include both qualitative \nand quantitative evaluations of the system. This equity assessment should also be considered a core part of the \ngoals of the consultation conducted as part of the safety and efficacy review. \nRepresentative and robust data. Any data used as part of system development or assessment should be \nrepresentative of local communities based on the planned deployment setting and should be reviewed for bias \nbased on the historical and societal context of the data. Such data should be sufficiently robust to identify and \nhelp to mitigate biases and potential harms. \nGuarding against proxies.  Directly using demographic information in the design, development, or \ndeployment of an automated system (for purposes other than evaluating a system for discrimination or using \na system to counter discrimination) runs a high risk of leading to algorithmic discrimination and should be \navoided. In many cases, attributes that are highly correlated with demographic features, known as proxies, can \ncontribute to algorithmic discrimination. In cases where use of the demographic features themselves would \nlead to illegal algorithmic discrimination, reliance on such proxies in decision-making (such as that facilitated \nby an algorithm) may also be prohibited by law. 
Proactive testing should be performed to identify proxies by \ntesting for correlation between demographic information and attributes in any data used as part of system \ndesign, development, or use. If a proxy is identified, designers, developers, and deployers should remove the \nproxy; if needed, it may be possible to identify alternative attributes that can be used instead. At a minimum, \norganizations should ensure a proxy feature is not given undue weight and should monitor the system closely \nfor any resulting algorithmic discrimination.   \n26\nAlgorithmic \nDiscrimination \nProtections \n',
    ' \n \n \nApplying The Blueprint for an AI Bill of Rights \nSENSITIVE DATA: Data and metadata are sensitive if they pertain to an individual in a sensitive domain \n(defined below); are generated by technologies used in a sensitive domain; can be used to infer data from a \nsensitive domain or sensitive data about an individual (such as disability-related data, genomic data, biometric \ndata, behavioral data, geolocation data, data related to interaction with the criminal justice system, relationship \nhistory and legal status such as custody and divorce information, and home, work, or school environmental \ndata); or have the reasonable potential to be used in ways that are likely to expose individuals to meaningful \nharm, such as a loss of privacy or financial harm due to identity theft. Data and metadata generated by or about \nthose who are not yet legal adults is also sensitive, even if not related to a sensitive domain. Such data includes, \nbut is not limited to, numerical, text, image, audio, or video data. \nSENSITIVE DOMAINS: “Sensitive domains” are those in which activities being conducted can cause material \nharms, including significant adverse effects on human rights such as autonomy and dignity, as well as civil liber\xad\nties and civil rights. Domains that have historically been singled out as deserving of enhanced data protections \nor where such enhanced protections are reasonably expected by the public include, but are not limited to, \nhealth, family planning and care, employment, education, criminal justice, and personal finance. In the context \nof this framework, such domains are considered sensitive whether or not the specifics of a system context \nwould necessitate coverage under existing law, and domains and data that are considered sensitive are under\xad\nstood to change over time based on societal norms and context. \nSURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or services marketed for \nor that can be lawfully used to detect, monitor, intercept, collect, exploit, preserve, protect, transmit, and/or \nretain data, identifying information, or communications concerning individuals or groups. This framework \nlimits its focus to both government and commercial use of surveillance technologies when juxtaposed with \nreal-time or subsequent automated analysis and when such systems have a potential for meaningful impact \non individuals’ or communities’ rights, opportunities, or access. \nUNDERSERVED COMMUNITIES: The term “underserved communities” refers to communities that have \nbeen systematically denied a full opportunity to participate in aspects of economic, social, and civic life, as \nexemplified by the list in the preceding definition of “equity.” \n11\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
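
Because the model is tuned for retrieval over AI-policy text, a common pattern is to embed a passage corpus once and rank passages per query by cosine similarity. A small sketch follows; the corpus strings are illustrative placeholders, not the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Placeholder passages; in practice these would be your document chunks.
corpus = [
    "Automated systems should undergo pre-deployment testing and ongoing monitoring.",
    "Sensitive domains include health, employment, education, criminal justice, and personal finance.",
]
query = "Which domains are considered sensitive?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity between the query and every passage, shape (1, len(corpus))
scores = model.similarity(query_embedding, corpus_embeddings)
best = int(scores.argmax())
print(corpus[best])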

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.7
cosine_accuracy@3 0.9
cosine_accuracy@5 0.9667
cosine_accuracy@10 1.0
cosine_precision@1 0.7
cosine_precision@3 0.3
cosine_precision@5 0.1933
cosine_precision@10 0.1
cosine_recall@1 0.7
cosine_recall@3 0.9
cosine_recall@5 0.9667
cosine_recall@10 1.0
cosine_ndcg@10 0.8479
cosine_mrr@10 0.7983
cosine_map@100 0.7983
dot_accuracy@1 0.7
dot_accuracy@3 0.9
dot_accuracy@5 0.9667
dot_accuracy@10 1.0
dot_precision@1 0.7
dot_precision@3 0.3
dot_precision@5 0.1933
dot_precision@10 0.1
dot_recall@1 0.7
dot_recall@3 0.9
dot_recall@5 0.9667
dot_recall@10 1.0
dot_ndcg@10 0.8479
dot_mrr@10 0.7983
dot_map@100 0.7983
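
Metrics of this kind are what Sentence Transformers' InformationRetrievalEvaluator reports. A toy sketch of how such an evaluation is wired up is shown below; the queries, corpus, and relevance judgments are placeholders, and the k values are shrunk to fit the tiny corpus rather than matching the @10/@100 cutoffs above.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Placeholder evaluation data: query id -> text, doc id -> text, query id -> relevant doc ids.
queries = {"q1": "What counts as sensitive data?"}
corpus = {
    "d1": "Data and metadata are sensitive if they pertain to an individual in a sensitive domain.",
    "d2": "Surveillance technology refers to products or services marketed for monitoring individuals.",
    "d3": "Underserved communities have been systematically denied a full opportunity to participate.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    accuracy_at_k=[1, 3], precision_recall_at_k=[1, 3],
    mrr_at_k=[3], ndcg_at_k=[3], map_at_k=[3],
)
results = evaluator(model)
print(results)  # e.g. cosine_accuracy@1, cosine_ndcg@3, cosine_map@3, ...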

Training Details

Training Dataset

Unnamed Dataset

  • Size: 200 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 200 samples:
    • sentence_0: string; min 12 tokens, mean 22.34 tokens, max 38 tokens
    • sentence_1: string; min 21 tokens, mean 447.96 tokens, max 512 tokens
  • Samples:
    sentence_0: What is the purpose of the AI Bill of Rights mentioned in the context?
    sentence_1:
    BLUEPRINT FOR AN
    AI BILL OF
    RIGHTS
    MAKING AUTOMATED
    SYSTEMS WORK FOR
    THE AMERICAN PEOPLE
    OCTOBER 2022
    sentence_0: When was the Blueprint for an AI Bill of Rights published?
    sentence_1:
    BLUEPRINT FOR AN
    AI BILL OF
    RIGHTS
    MAKING AUTOMATED
    SYSTEMS WORK FOR
    THE AMERICAN PEOPLE
    OCTOBER 2022
    sentence_0: What is the purpose of the Blueprint for an AI Bill of Rights as published by the White House Office of Science and Technology Policy?
    sentence_1:
    About this Document
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was
    published by the White House Office of Science and Technology Policy in October 2022. This framework was
    released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered
    world.” Its release follows a year of public engagement to inform this initiative. The framework is available
    online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights
    About the Office of Science and Technology Policy
    The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology
    Policy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office
    of the President with advice on the scientific, engineering, and technological aspects of the economy, national
    security, health, foreign relations, the environment, and the technological recovery and use of resources, among
    other topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of
    Management and Budget (OMB) with an annual review and analysis of Federal research and development in
    budgets, and serves as a source of scientific and technological analysis and judgment for the President with
    respect to major policies, plans, and programs of the Federal Government.
    Legal Disclaimer
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper
    published by the White House Office of Science and Technology Policy. It is intended to support the
    development of policies and practices that protect civil rights and promote democratic values in the building,
    deployment, and governance of automated systems.
    The Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It
    does not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or
    international instrument. It does not constitute binding guidance for the public or Federal agencies and
    therefore does not require compliance with the principles described herein. It also is not determinative of what
    the U.S. government’s position will be in any international negotiation. Adoption of these principles may not
    meet the requirements of existing statutes, regulations, policies, or international instruments, or the
    requirements of the Federal agencies that enforce them. These principles are not intended to, and do not,
    prohibit or limit any lawful activity of a government agency, including law enforcement, national security, or
    intelligence activities.
    The appropriate application of the principles set forth in this white paper depends significantly on the
    context in which automated systems are being utilized. In some circumstances, application of these principles
    in whole or in part may not be appropriate given the intended use of automated systems to achieve government
    agency missions. Future sector-specific guidance will likely be necessary and important for guiding the use of
    automated systems in certain settings such as AI systems used as part of school building security or automated
    health diagnostic systems.
    The Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of
    equities, for example, between the protection of sensitive law enforcement information and the principle of
    notice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and
    other law enforcement equities. Even in contexts where these principles may not apply in whole or in part,
    federal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as
    existing policies and safeguards that govern automated systems, including, for example, Executive Order 13960,
    Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020).
    This white paper recognizes that national security (which includes certain law enforcement and
    homeland security activities) and defense activities are of increased sensitivity and interest to our nation’s
    adversaries and are often subject to special requirements, such as those governing classified information and
    other protected data. Such activities require alternative, compatible safeguards through existing policies that
    govern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and
    Responsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and
    Framework. The implementation of these policies to national security and defense activities can be informed by
    the Blueprint for an AI Bill of Rights where feasible.
    The Blueprint for an AI Bill of Rights is not intended to, and does not, create any legal right, benefit, or
    defense, substantive or procedural, enforceable at law or in equity by any party against the United States, its
    departments, agencies, or entities, its officers, employees, or agents, or any other person, nor does it constitute a
    waiver of sovereign immunity.
    Copyright Information
    This document is a work of the United States Government and is in the public domain (see 17 U.S.C. §105).
    2
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
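
Because training used MatryoshkaLoss over nested dimensions (768, 512, 256, 128, 64), embeddings can usually be truncated to one of the smaller prefixes and re-normalized with only a modest drop in retrieval quality. A minimal sketch of that truncation follows; the query string is illustrative.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Full 768-dimensional embeddings; the Normalize() module already L2-normalizes them.
embeddings = model.encode(["What protections should automated systems provide?"])

# Keep only the first 256 dimensions and re-normalize. MatryoshkaLoss trains the
# model so these truncated prefixes remain useful for similarity search.
truncated = embeddings[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (1, 256)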
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin
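
The non-default settings above translate into a fairly standard Sentence Transformers training loop. A rough sketch of how such a run is set up is shown below; the toy dataset, output directory, and omitted evaluation wiring are assumptions, not the exact script used for this model.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Toy stand-in for the 200 (question, passage) pairs used in this run.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "What counts as a sensitive domain?",
        "What is surveillance technology?",
    ],
    "sentence_1": [
        "Sensitive domains are those in which activities can cause material harms.",
        "Surveillance technology refers to products or services marketed for monitoring individuals.",
    ],
})

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="ai-policy-ft",          # assumed output path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    # the actual run also used eval_strategy="steps" with an evaluation set / IR evaluator
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()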

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step cosine_map@100
1.0 13 0.7303
2.0 26 0.7356
3.0 39 0.7828
3.8462 50 0.7817
4.0 52 0.7817
5.0 65 0.7983

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}