---
base_model: Snowflake/snowflake-arctic-embed-m
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:200
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      What measures should be taken to ensure that automated systems are safe
      and effective before deployment?
    sentences:
      - >2
         AI BILL OF RIGHTS
        FFECTIVE SYSTEMS

        ineffective systems. Automated systems should be 

        communities, stakeholders, and domain experts to identify 

        Systems should undergo pre-deployment testing, risk 

        that demonstrate they are safe and effective based on 

        including those beyond the intended use, and adherence to 

        protective measures should include the possibility of not 

        Automated systems should not be designed with an intent 

        reasonably foreseeable possibility of endangering your safety or the
        safety of your community. They should 

        stemming from unintended, yet foreseeable, uses or 
         
         
         
         
          
         
         
        SECTION TITLE

        BLUEPRINT FOR AN

        SAFE AND E 

        You should be protected from unsafe or 

        developed with consultation from diverse 

        concerns, risks, and potential impacts of the system. 

        identification and mitigation, and ongoing monitoring 

        their intended use, mitigation of unsafe outcomes 

        domain-specific standards. Outcomes of these 

        deploying the system or removing a system from use. 

        or 

        be designed to proactively protect you from harms 

        impacts of automated systems. You should be protected from inappropriate
        or irrelevant data use in the 

        design, development, and deployment of automated systems, and from the
        compounded harm of its reuse. 

        Independent evaluation and reporting that confirms that the system is
        safe and effective, including reporting of 

        steps taken to mitigate potential harms, should be performed and the
        results made public whenever possible. 

        ALGORITHMIC DISCRIMINATION PROTECTIONS

        You should not face discrimination by algorithms and systems should be
        used and designed in 

        an equitable way. Algorithmic discrimination occurs when automated
        systems contribute to unjustified 

        different treatment or impacts disfavoring people based on their race,
        color, ethnicity, sex (including 

        pregnancy, childbirth, and related medical conditions, gender identity,
        intersex status, and sexual 

        orientation), religion, age, national origin, disability, veteran
        status, genetic information, or any other 

        classification protected by law. Depending on the specific
        circumstances, such algorithmic discrimination 

        may violate legal protections. Designers, developers, and deployers of
        automated systems should take 

        proactive 

        and 

        continuous 

        measures 

        to 

        protect 

        individuals 

        and 

        communities 

        from algorithmic 

        discrimination and to use and design systems in an equitable way. This
        protection should include proactive 

        equity assessments as part of the system design, use of representative
        data and protection against proxies 

        for demographic features, ensuring accessibility for people with
        disabilities in design and development, 

        pre-deployment and ongoing disparity testing and mitigation, and clear
        organizational oversight. Independent 

        evaluation and plain language reporting in the form of an algorithmic
        impact assessment, including 

        disparity testing results and mitigation information, should be
        performed and made public whenever 

        possible to confirm these protections. 

        5
      - >
        You should be protected from abusive data practices via built-in 

        protections and you should have agency over how data about 

        you is used. You should be protected from violations of privacy through 

        design choices that ensure such protections are included by default,
        including 

        ensuring that data collection conforms to reasonable expectations and
        that 

        only data strictly necessary for the specific context is collected.
        Designers, de­

        velopers, and deployers of automated systems should seek your
        permission 

        and respect your decisions regarding collection, use, access, transfer,
        and de­

        letion of your data in appropriate ways and to the greatest extent
        possible; 

        where not possible, alternative privacy by design safeguards should be
        used. 

        Systems should not employ user experience and design decisions that
        obfus­

        cate user choice or burden users with defaults that are privacy
        invasive. Con­

        sent should only be used to justify collection of data in cases where it
        can be 

        appropriately and meaningfully given. Any consent requests should be
        brief, 

        be understandable in plain language, and give you agency over data
        collection 

        and the specific context of use; current hard-to-understand no­

        tice-and-choice practices for broad uses of data should be changed.
        Enhanced 

        protections and restrictions for data and inferences related to
        sensitive do­

        mains, including health, work, education, criminal justice, and finance,
        and 

        for data pertaining to youth should put you first. In sensitive domains,
        your 

        data and related inferences should only be used for necessary functions,
        and 

        you should be protected by ethical review and use prohibitions. You and
        your 

        communities should be free from unchecked surveillance; surveillance
        tech­

        nologies should be subject to heightened oversight that includes at
        least 

        pre-deployment assessment of their potential harms and scope limits to
        pro­

        tect privacy and civil liberties. Continuous surveillance and
        monitoring 

        should not be used in education, work, housing, or in other contexts
        where the 

        use of such surveillance technologies is likely to limit rights,
        opportunities, or 

        access. Whenever possible, you should have access to reporting that
        confirms 

        your data decisions have been respected and provides an assessment of
        the 

        potential impact of surveillance technologies on your rights,
        opportunities, or 

        access. 

        DATA PRIVACY

        30
      - >
        APPENDIX

        Lisa Feldman Barrett 

        Madeline Owens 

        Marsha Tudor 

        Microsoft Corporation 

        MITRE Corporation 

        National Association for the 

        Advancement of Colored People 

        Legal Defense and Educational 

        Fund 

        National Association of Criminal 

        Defense Lawyers 

        National Center for Missing & 

        Exploited Children 

        National Fair Housing Alliance 

        National Immigration Law Center 

        NEC Corporation of America 

        New America’s Open Technology 

        Institute 

        New York Civil Liberties Union 

        No Name Provided 

        Notre Dame Technology Ethics 

        Center 

        Office of the Ohio Public Defender 

        Onfido 

        Oosto 

        Orissa Rose 

        Palantir 

        Pangiam 

        Parity Technologies 

        Patrick A. Stewart, Jeffrey K. 

        Mullins, and Thomas J. Greitens 

        Pel Abbott 

        Philadelphia Unemployment 

        Project 

        Project On Government Oversight 

        Recording Industry Association of 

        America 

        Robert Wilkens 

        Ron Hedges 

        Science, Technology, and Public 

        Policy Program at University of 

        Michigan Ann Arbor 

        Security Industry Association 

        Sheila Dean 

        Software & Information Industry 

        Association 

        Stephanie Dinkins and the Future 

        Histories Studio at Stony Brook 

        University 

        TechNet 

        The Alliance for Media Arts and 

        Culture, MIT Open Documentary 

        Lab and Co-Creation Studio, and 

        Immerse 

        The International Brotherhood of 

        Teamsters 

        The Leadership Conference on 

        Civil and Human Rights 

        Thorn 

        U.S. Chamber of Commerce’s 

        Technology Engagement Center 

        Uber Technologies 

        University of Pittsburgh 

        Undergraduate Student 

        Collaborative 

        Upturn 

        US Technology Policy Committee 

        of the Association of Computing 

        Machinery 

        Virginia Puccio 

        Visar Berisha and Julie Liss 

        XR Association 

        XR Safety Initiative 

         As an additional effort to reach out to stakeholders regarding the
        RFI, OSTP conducted two listening sessions

        for members of the public. The listening sessions together drew upwards
        of 300 participants. The Science and

        Technology Policy Institute produced a synopsis of both the RFI
        submissions and the feedback at the listening

        sessions.115

        61
  - source_sentence: How does the document address algorithmic discrimination protections?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
        SAFE AND EFFECTIVE 

        SYSTEMS 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Ongoing monitoring. Automated systems should have ongoing monitoring
        procedures, including recalibra­

        tion procedures, in place to ensure that their performance does not fall
        below an acceptable level over time, 

        based on changing real-world conditions or deployment contexts,
        post-deployment modification, or unexpect­

        ed conditions. This ongoing monitoring should include continuous
        evaluation of performance metrics and 

        harm assessments, updates of any systems, and retraining of any machine
        learning models as necessary, as well 

        as ensuring that fallback mechanisms are in place to allow reversion to
        a previously working system. Monitor­

        ing should take into account the performance of both technical system
        components (the algorithm as well as 

        any hardware components, data inputs, etc.) and human operators. It
        should include mechanisms for testing 

        the actual accuracy of any predictions or recommendations generated by a
        system, not just a human operator’s 

        determination of their accuracy. Ongoing monitoring procedures should
        include manual, human-led monitor­

        ing as a check in the event there are shortcomings in automated
        monitoring systems. These monitoring proce­

        dures should be in place for the lifespan of the deployed automated
        system. 

        Clear organizational oversight. Entities responsible for the development
        or use of automated systems 

        should lay out clear governance structures and procedures.  This
        includes clearly-stated governance proce­

        dures before deploying the system, as well as responsibility of specific
        individuals or entities to oversee ongoing 

        assessment and mitigation. Organizational stakeholders including those
        with oversight of the business process 

        or operation being automated, as well as other organizational divisions
        that may be affected due to the use of 

        the system, should be involved in establishing governance procedures.
        Responsibility should rest high enough 

        in the organization that decisions about resources, mitigation, incident
        response, and potential rollback can be 

        made promptly, with sufficient weight given to risk mitigation
        objectives against competing concerns. Those 

        holding this responsibility should be made aware of any use cases with
        the potential for meaningful impact on 

        people’s rights, opportunities, or access as determined based on risk
        identification procedures.  In some cases, 

        it may be appropriate for an independent ethics review to be conducted
        before deployment. 

        Avoid inappropriate, low-quality, or irrelevant data use and the
        compounded harm of its 

        reuse 

        Relevant and high-quality data. Data used as part of any automated
        system’s creation, evaluation, or 

        deployment should be relevant, of high quality, and tailored to the task
        at hand. Relevancy should be 

        established based on research-backed demonstration of the causal
        influence of the data to the specific use case 

        or justified more generally based on a reasonable expectation of
        usefulness in the domain and/or for the 

        system design or ongoing development. Relevance of data should not be
        established solely by appealing to 

        its historical connection to the outcome. High quality and tailored data
        should be representative of the task at 

        hand and errors from data entry or other sources should be measured and
        limited. Any data used as the target 

        of a prediction process should receive particular attention to the
        quality and validity of the predicted outcome 

        or label to ensure the goal of the automated system is appropriately
        identified and measured. Additionally, 

        justification should be documented for each data attribute and source to
        explain why it is appropriate to use 

        that data to inform the results of the automated system and why such use
        will not violate any applicable laws. 

        In cases of high-dimensional and/or derived attributes, such
        justifications can be provided as overall 

        descriptions of the attribute generation process and appropriateness. 

        19
      - |
        TABLE OF CONTENTS
        FROM PRINCIPLES TO PRACTICE: A TECHNICAL COMPANION TO THE BLUEPRINT 
        FOR AN AI BILL OF RIGHTS 
         
        USING THIS TECHNICAL COMPANION
         
        SAFE AND EFFECTIVE SYSTEMS
         
        ALGORITHMIC DISCRIMINATION PROTECTIONS
         
        DATA PRIVACY
         
        NOTICE AND EXPLANATION
         
        HUMAN ALTERNATIVES, CONSIDERATION, AND FALLBACK
        APPENDIX
         
        EXAMPLES OF AUTOMATED SYSTEMS
         
        LISTENING TO THE AMERICAN PEOPLE
        ENDNOTES 
        12
        14
        15
        23
        30
        40
        46
        53
        53
        55
        63
        13
      - >
        APPENDIX

        Systems that impact the safety of communities such as automated traffic
        control systems, elec 

        -ctrical grid controls, smart city technologies, and industrial
        emissions and environmental

        impact control algorithms; and

        Systems related to access to benefits or services or assignment of
        penalties such as systems that

        support decision-makers who adjudicate benefits such as collating or
        analyzing information or

        matching records, systems which similarly assist in the adjudication of
        administrative or criminal

        penalties, fraud detection algorithms, services or benefits access
        control algorithms, biometric

        systems used as access control, and systems which make benefits or
        services related decisions on a

        fully or partially autonomous basis (such as a determination to revoke
        benefits).

        54
  - source_sentence: >-
      What legislation is referenced in the context that became effective on
      October 3, 2008, regarding biometric information?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
         
          
         
         
         
         
        HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE

        Real-life examples of how these principles can become reality, through
        laws, policies, and practical 

        technical and sociotechnical approaches to protecting rights,
        opportunities, and access. 

        The federal government is working to combat discrimination in mortgage
        lending. The Depart­

        ment of Justice has launched a nationwide initiative to combat
        redlining, which includes reviewing how 

        lenders who may be avoiding serving communities of color are conducting
        targeted marketing and advertising.51 

        This initiative will draw upon strong partnerships across federal
        agencies, including the Consumer Financial 

        Protection Bureau and prudential regulators. The Action Plan to Advance
        Property Appraisal and Valuation 

        Equity includes a commitment from the agencies that oversee mortgage
        lending to include a 

        nondiscrimination standard in the proposed rules for Automated Valuation
        Models.52

        The Equal Employment Opportunity Commission and the Department of
        Justice have clearly 

        laid out how employers’ use of AI and other automated systems can result
        in 

        discrimination against job applicants and employees with disabilities.53
        The documents explain 

        how employers’ use of software that relies on algorithmic
        decision-making may violate existing requirements 

        under Title I of the Americans with Disabilities Act (“ADA”). This
        technical assistance also provides practical 

        tips to employers on how to comply with the ADA, and to job applicants
        and employees who think that their 

        rights may have been violated. 

        Disparity assessments identified harms to Black patients' healthcare
        access. A widely 

        used healthcare algorithm relied on the cost of each patient’s past
        medical care to predict future medical needs, 

        recommending early interventions for the patients deemed most at risk.
        This process discriminated 

        against Black patients, who generally have less access to medical care
        and therefore have generated less cost 

        than white patients with similar illness and need. A landmark study
        documented this pattern and proposed 

        practical ways that were shown to reduce this bias, such as focusing
        specifically on active chronic health 

        conditions or avoidable future costs related to emergency visits and
        hospitalization.54 

        Large employers have developed best practices to scrutinize the data and
        models used 

        for hiring. An industry initiative has developed Algorithmic Bias
        Safeguards for the Workforce, a structured 

        questionnaire that businesses can use proactively when procuring
        software to evaluate workers. It covers 

        specific technical questions such as the training data used, model
        training process, biases identified, and 

        mitigation steps employed.55 

        Standards organizations have developed guidelines to incorporate
        accessibility criteria 

        into technology design processes. The most prevalent in the United
        States is the Access Board’s Section 

        508 regulations,56 which are the technical standards for federal
        information communication technology (software, 

        hardware, and web). Other standards include those issued by the
        International Organization for 

        Standardization,57 and the World Wide Web Consortium Web Content
        Accessibility Guidelines,58 a globally 

        recognized voluntary consensus standard for web content and other
        information and communications 

        technology. 

        NIST has released Special Publication 1270, Towards a Standard for
        Identifying and Managing Bias 

        in Artificial Intelligence.59 The special publication: describes the
        stakes and challenges of bias in artificial 

        intelligence and provides examples of how and why it can chip away at
        public trust; identifies three categories 

        of bias in AI  systemic, statistical, and human  and describes how and
        where they contribute to harms; and 

        describes three broad challenges for mitigating bias  datasets, testing
        and evaluation, and human factors  and 

        introduces preliminary guidance for addressing them. Throughout, the
        special publication takes a socio-

        technical perspective to identifying and managing AI bias. 

        29

        Algorithmic 

        Discrimination 

        Protections 
      - >2
         
         
        ENDNOTES

        85. Mick Dumke and Frank Main. A look inside the watch list Chicago
        police fought to keep secret. The

        Chicago Sun Times. May 18, 2017.

        https://chicago.suntimes.com/2017/5/18/18386116/a-look-inside-the-watch-list-chicago-police-fought­

        to-keep-secret

        86. Jay Stanley. Pitfalls of Artificial Intelligence Decisionmaking
        Highlighted In Idaho ACLU Case.

        ACLU. Jun. 2, 2017.

        https://www.aclu.org/blog/privacy-technology/pitfalls-artificial-intelligence-decisionmaking­

        highlighted-idaho-aclu-case

        87. Illinois General Assembly. Biometric Information Privacy Act.
        Effective Oct. 3, 2008.

        https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004&ChapterID=57

        88. Partnership on AI. ABOUT ML Reference Document. Accessed May 2,
        2022.

        https://partnershiponai.org/paper/about-ml-reference-document/1/

        89. See, e.g., the model cards framework: Margaret Mitchell, Simone Wu,
        Andrew Zaldivar, Parker

        Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah
        Raji, and Timnit Gebru.

        Model Cards for Model Reporting. In Proceedings of the Conference on
        Fairness, Accountability, and

        Transparency (FAT* '19). Association for Computing Machinery, New York,
        NY, USA, 220–229. https://

        dl.acm.org/doi/10.1145/3287560.3287596

        90. Sarah Ammermann. Adverse Action Notice Requirements Under the ECOA
        and the FCRA. Consumer

        Compliance Outlook. Second Quarter 2013.

        https://consumercomplianceoutlook.org/2013/second-quarter/adverse-action-notice-requirements­

        under-ecoa-fcra/

        91. Federal Trade Commission. Using Consumer Reports for Credit
        Decisions: What to Know About

        Adverse Action and Risk-Based Pricing Notices. Accessed May 2, 2022.

        https://www.ftc.gov/business-guidance/resources/using-consumer-reports-credit-decisions-what­

        know-about-adverse-action-risk-based-pricing-notices#risk

        92. Consumer Financial Protection Bureau. CFPB Acts to Protect the
        Public from Black-Box Credit

        Models Using Complex Algorithms. May 26, 2022.

        https://www.consumerfinance.gov/about-us/newsroom/cfpb-acts-to-protect-the-public-from-black­

        box-credit-models-using-complex-algorithms/

        93. Anthony Zaller. California Passes Law Regulating Quotas In
        Warehouses – What Employers Need to

        Know About AB 701. Zaller Law Group California Employment Law Report.
        Sept. 24, 2021.

        https://www.californiaemploymentlawreport.com/2021/09/california-passes-law-regulating-quotas­

        in-warehouses-what-employers-need-to-know-about-ab-701/

        94. National Institute of Standards and Technology. AI Fundamental
        Research – Explainability.

        Accessed Jun. 4, 2022.

        https://www.nist.gov/artificial-intelligence/ai-fundamental-research-explainability

        95. DARPA. Explainable Artificial Intelligence (XAI). Accessed July 20,
        2022.

        https://www.darpa.mil/program/explainable-artificial-intelligence

        71
      - >2
         
        ENDNOTES

        12. Expectations about reporting are intended for the entity developing
        or using the automated system. The

        resulting reports can be provided to the public, regulators, auditors,
        industry standards groups, or others

        engaged in independent review, and should be made public as much as
        possible consistent with law,

        regulation, and policy, and noting that intellectual property or law
        enforcement considerations may prevent

        public release. These reporting expectations are important for
        transparency, so the American people can

        have confidence that their rights, opportunities, and access as well as
        their expectations around

        technologies are respected.

        13. National Artificial Intelligence Initiative Office. Agency
        Inventories of AI Use Cases. Accessed Sept. 8,

        2022. https://www.ai.gov/ai-use-case-inventories/

        14. National Highway Traffic Safety Administration.
        https://www.nhtsa.gov/

        15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
        Professional Engineers and NHTSA. Public

        Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
        https://www.jstor.org/stable/976213?seq=1

        16. The US Department of Transportation has publicly described the
        health and other benefits of these

        “traffic calming” measures. See, e.g.: U.S. Department of
        Transportation. Traffic Calming to Slow Vehicle

        Speeds. Accessed Apr. 17, 2022.
        https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow­

        Vehicle-Speeds

        17. Karen Hao. Worried about your firm’s AI ethics? These startups are
        here to help.

        A growing ecosystem of “responsible AI” ventures promise to help
        organizations monitor and fix their AI

        models. MIT Technology Review. Jan 15., 2021.

        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top Progressive

        Companies Building Ethical AI to Look Out for in 2021. Analytics
        Insight. June 30, 2021. https://

        www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for­

        in-2021/
        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top

        Progressive Companies Building Ethical AI to Look Out for in 2021.
        Analytics Insight. June 30, 2021.

        18. Office of Management and Budget. Study to Identify Methods to Assess
        Equity: Report to the President.

        Aug. 2021.
        https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985­

        Implementation_508-Compliant-Secure-v1.1.pdf

        19. National Institute of Standards and Technology. AI Risk Management
        Framework. Accessed May 23,

        2022. https://www.nist.gov/itl/ai-risk-management-framework

        20. U.S. Department of Energy. U.S. Department of Energy Establishes
        Artificial Intelligence Advancement

        Council. U.S. Department of Energy Artificial Intelligence and
        Technology Office. April 18, 2022. https://

        www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council

        21. Department of Defense. U.S Department of Defense Responsible
        Artificial Intelligence Strategy and

        Implementation Pathway. Jun. 2022.
        https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/

        Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation­

        Pathway.PDF

        22. Director of National Intelligence. Principles of Artificial
        Intelligence Ethics for the Intelligence

        Community.
        https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for­

        the-intelligence-community

        64
  - source_sentence: >-
      How does the Blueprint for an AI Bill of Rights relate to existing laws
      and regulations regarding automated systems?
    sentences:
      - >2
         
         
         
         
         
         
         
         
         
         
         
         
         
         
        About this Document 

        The Blueprint for an AI Bill of Rights: Making Automated Systems Work
        for the American People was 

        published by the White House Office of Science and Technology Policy in
        October 2022. This framework was 

        released one year after OSTP announced the launch of a process to
        develop “a bill of rights for an AI-powered 

        world.” Its release follows a year of public engagement to inform this
        initiative. The framework is available 

        online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights 

        About the Office of Science and Technology Policy 

        The Office of Science and Technology Policy (OSTP) was established by
        the National Science and Technology 

        Policy, Organization, and Priorities Act of 1976 to provide the
        President and others within the Executive Office 

        of the President with advice on the scientific, engineering, and
        technological aspects of the economy, national 

        security, health, foreign relations, the environment, and the
        technological recovery and use of resources, among 

        other topics. OSTP leads interagency science and technology policy
        coordination efforts, assists the Office of 

        Management and Budget (OMB) with an annual review and analysis of
        Federal research and development in 

        budgets, and serves as a source of scientific and technological analysis
        and judgment for the President with 

        respect to major policies, plans, and programs of the Federal
        Government. 

        Legal Disclaimer 

        The Blueprint for an AI Bill of Rights: Making Automated Systems Work
        for the American People is a white paper 

        published by the White House Office of Science and Technology Policy. It
        is intended to support the 

        development of policies and practices that protect civil rights and
        promote democratic values in the building, 

        deployment, and governance of automated systems. 

        The Blueprint for an AI Bill of Rights is non-binding and does not
        constitute U.S. government policy. It 

        does not supersede, modify, or direct an interpretation of any existing
        statute, regulation, policy, or 

        international instrument. It does not constitute binding guidance for
        the public or Federal agencies and 

        therefore does not require compliance with the principles described
        herein. It also is not determinative of what 

        the U.S. government’s position will be in any international negotiation.
        Adoption of these principles may not 

        meet the requirements of existing statutes, regulations, policies, or
        international instruments, or the 

        requirements of the Federal agencies that enforce them. These principles
        are not intended to, and do not, 

        prohibit or limit any lawful activity of a government agency, including
        law enforcement, national security, or 

        intelligence activities. 

        The appropriate application of the principles set forth in this white
        paper depends significantly on the 

        context in which automated systems are being utilized. In some
        circumstances, application of these principles 

        in whole or in part may not be appropriate given the intended use of
        automated systems to achieve government 

        agency missions. Future sector-specific guidance will likely be
        necessary and important for guiding the use of 

        automated systems in certain settings such as AI systems used as part of
        school building security or automated 

        health diagnostic systems. 

        The Blueprint for an AI Bill of Rights recognizes that law enforcement
        activities require a balancing of 

        equities, for example, between the protection of sensitive law
        enforcement information and the principle of 

        notice; as such, notice may not be appropriate, or may need to be
        adjusted to protect sources, methods, and 

        other law enforcement equities. Even in contexts where these principles
        may not apply in whole or in part, 

        federal departments and agencies remain subject to judicial, privacy,
        and civil liberties oversight as well as 

        existing policies and safeguards that govern automated systems,
        including, for example, Executive Order 13960, 

        Promoting the Use of Trustworthy Artificial Intelligence in the Federal
        Government (December 2020). 

        This white paper recognizes that national security (which includes
        certain law enforcement and 

        homeland security activities) and defense activities are of increased
        sensitivity and interest to our nation’s 

        adversaries and are often subject to special requirements, such as those
        governing classified information and 

        other protected data. Such activities require alternative, compatible
        safeguards through existing policies that 

        govern automated systems and AI, such as the Department of Defense (DOD)
        AI Ethical Principles and 

        Responsible AI Implementation Pathway and the Intelligence Community
        (IC) AI Ethics Principles and 

        Framework. The implementation of these policies to national security and
        defense activities can be informed by 

        the Blueprint for an AI Bill of Rights where feasible. 

        The Blueprint for an AI Bill of Rights is not intended to, and does not,
        create any legal right, benefit, or 

        defense, substantive or procedural, enforceable at law or in equity by
        any party against the United States, its 

        departments, agencies, or entities, its officers, employees, or agents,
        or any other person, nor does it constitute a 

        waiver of sovereign immunity. 

        Copyright Information 

        This document is a work of the United States Government and is in the
        public domain (see 17 U.S.C. §105). 

        2
      - >2
         
        ENDNOTES

        12. Expectations about reporting are intended for the entity developing
        or using the automated system. The

        resulting reports can be provided to the public, regulators, auditors,
        industry standards groups, or others

        engaged in independent review, and should be made public as much as
        possible consistent with law,

        regulation, and policy, and noting that intellectual property or law
        enforcement considerations may prevent

        public release. These reporting expectations are important for
        transparency, so the American people can

        have confidence that their rights, opportunities, and access as well as
        their expectations around

        technologies are respected.

        13. National Artificial Intelligence Initiative Office. Agency
        Inventories of AI Use Cases. Accessed Sept. 8,

        2022. https://www.ai.gov/ai-use-case-inventories/

        14. National Highway Traffic Safety Administration.
        https://www.nhtsa.gov/

        15. See, e.g., Charles Pruitt. People Doing What They Do Best: The
        Professional Engineers and NHTSA. Public

        Administration Review. Vol. 39, No. 4. Jul.-Aug., 1979.
        https://www.jstor.org/stable/976213?seq=1

        16. The US Department of Transportation has publicly described the
        health and other benefits of these

        “traffic calming” measures. See, e.g.: U.S. Department of
        Transportation. Traffic Calming to Slow Vehicle

        Speeds. Accessed Apr. 17, 2022.
        https://www.transportation.gov/mission/health/Traffic-Calming-to-Slow­

        Vehicle-Speeds

        17. Karen Hao. Worried about your firm’s AI ethics? These startups are
        here to help.

        A growing ecosystem of “responsible AI” ventures promise to help
        organizations monitor and fix their AI

        models. MIT Technology Review. Jan 15., 2021.

        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top Progressive

        Companies Building Ethical AI to Look Out for in 2021. Analytics
        Insight. June 30, 2021. https://

        www.analyticsinsight.net/top-progressive-companies-building-ethical-ai-to-look-out-for­

        in-2021/
        https://www.technologyreview.com/2021/01/15/1016183/ai-ethics-startups/;
        Disha Sinha. Top

        Progressive Companies Building Ethical AI to Look Out for in 2021.
        Analytics Insight. June 30, 2021.

        18. Office of Management and Budget. Study to Identify Methods to Assess
        Equity: Report to the President.

        Aug. 2021.
        https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985­

        Implementation_508-Compliant-Secure-v1.1.pdf

        19. National Institute of Standards and Technology. AI Risk Management
        Framework. Accessed May 23,

        2022. https://www.nist.gov/itl/ai-risk-management-framework

        20. U.S. Department of Energy. U.S. Department of Energy Establishes
        Artificial Intelligence Advancement

        Council. U.S. Department of Energy Artificial Intelligence and
        Technology Office. April 18, 2022. https://

        www.energy.gov/ai/articles/us-department-energy-establishes-artificial-intelligence-advancement-council

        21. Department of Defense. U.S Department of Defense Responsible
        Artificial Intelligence Strategy and

        Implementation Pathway. Jun. 2022.
        https://media.defense.gov/2022/Jun/22/2003022604/-1/-1/0/

        Department-of-Defense-Responsible-Artificial-Intelligence-Strategy-and-Implementation­

        Pathway.PDF

        22. Director of National Intelligence. Principles of Artificial
        Intelligence Ethics for the Intelligence

        Community.
        https://www.dni.gov/index.php/features/2763-principles-of-artificial-intelligence-ethics-for­

        the-intelligence-community

        64
      - >2
         
        12 

        CSAM. Even when trained on “clean” data, increasingly capable GAI models
        can synthesize or produce 

        synthetic NCII and CSAM. Websites, mobile apps, and custom-built models
        that generate synthetic NCII 

        have moved from niche internet forums to mainstream, automated, and
        scaled online businesses.  

        Trustworthy AI Characteristics: Fair with Harmful Bias Managed, Safe,
        Privacy Enhanced 

        2.12. 

        Value Chain and Component Integration 

        GAI value chains involve many third-party components such as procured
        datasets, pre-trained models, 

        and software libraries. These components might be improperly obtained or
        not properly vetted, leading 

        to diminished transparency or accountability for downstream users. While
        this is a risk for traditional AI 

        systems and some other digital technologies, the risk is exacerbated for
        GAI due to the scale of the 

        training data, which may be too large for humans to vet; the difficulty of
        training foundation models, 

        which leads to extensive reuse of limited numbers of models; and the
        extent to which GAI may be 

        integrated into other devices and services. As GAI systems often involve
        many distinct third-party 

        components and data sources, it may be difficult to attribute issues in a
        system’s behavior to any one of 

        these sources. 

        Errors in third-party GAI components can also have downstream impacts on
        accuracy and robustness. 

        For example, test datasets commonly used to benchmark or validate models
        can contain label errors. 

        Inaccuracies in these labels can impact the “stability” or robustness of
        these benchmarks, which many 

        GAI practitioners consider during the model selection process.  

        Trustworthy AI Characteristics: Accountable and Transparent, Explainable
        and Interpretable, Fair with 

        Harmful Bias Managed, Privacy Enhanced, Safe, Secure and Resilient,
        Valid and Reliable 

        3. 

        Suggested Actions to Manage GAI Risks 

        The following suggested actions target risks unique to or exacerbated by
        GAI. 

        In addition to the suggested actions below, AI risk management
        activities and actions set forth in the AI 

        RMF 1.0 and Playbook are already applicable for managing GAI risks.
        Organizations are encouraged to 

        apply the activities suggested in the AI RMF and its Playbook when
        managing the risk of GAI systems.  

        Implementation of the suggested actions will vary depending on the type
        of risk, characteristics of GAI 

        systems, stage of the GAI lifecycle, and relevant AI actors involved.  

        Suggested actions to manage GAI risks can be found in the tables below: 



        The suggested actions are organized by relevant AI RMF subcategories to
        streamline these 

        activities alongside implementation of the AI RMF.  



        Not every subcategory of the AI RMF is included in this document.13
        Suggested actions are 

        listed for only some subcategories.  
         
         
        13 As this document was focused on the GAI PWG efforts and primary
        considerations (see Appendix A), AI RMF 

        subcategories not addressed here may be added later.  
  - source_sentence: >-
      What proactive steps should be taken during the design phase of automated
      systems to assess equity and prevent algorithmic discrimination?
    sentences:
      - >2
         
         
         
         
         
         
         
        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Any automated system should be tested to help ensure it is free from
        algorithmic discrimination before it can be 

        sold or used. Protection against algorithmic discrimination should
        include designing to ensure equity, broadly 

        construed.  Some algorithmic discrimination is already prohibited under
        existing anti-discrimination law. The 

        expectations set out below describe proactive technical and policy steps
        that can be taken to not only 

        reinforce those legal protections but extend beyond them to ensure
        equity for underserved communities48 

        even in circumstances where a specific legal protection may not be
        clearly established. These protections 

        should be instituted throughout the design, development, and deployment
        process and are described below 

        roughly in the order in which they would be instituted. 

        Protect the public from algorithmic discrimination in a proactive and
        ongoing manner 

        Proactive assessment of equity in design. Those responsible for the
        development, use, or oversight of 

        automated systems should conduct proactive equity assessments in the
        design phase of the technology 

        research and development or during its acquisition to review potential
        input data, associated historical 

        context, accessibility for people with disabilities, and societal goals
        to identify potential discrimination and 

        effects on equity resulting from the introduction of the technology. The
        assessed groups should be as inclusive 

        as possible of the underserved communities mentioned in the equity
        definition:  Black, Latino, and Indigenous 

        and Native American persons, Asian Americans and Pacific Islanders and
        other persons of color; members of 

        religious minorities; women, girls, and non-binary people; lesbian, gay,
        bisexual, transgender, queer, and inter-

        sex (LGBTQI+) persons; older adults; persons with disabilities; persons
        who live in rural areas; and persons 

        otherwise adversely affected by persistent poverty or inequality.
        Assessment could include both qualitative 

        and quantitative evaluations of the system. This equity assessment
        should also be considered a core part of the 

        goals of the consultation conducted as part of the safety and efficacy
        review. 

        Representative and robust data. Any data used as part of system
        development or assessment should be 

        representative of local communities based on the planned deployment
        setting and should be reviewed for bias 

        based on the historical and societal context of the data. Such data
        should be sufficiently robust to identify and 

        help to mitigate biases and potential harms. 

        Guarding against proxies.  Directly using demographic information in the
        design, development, or 

        deployment of an automated system (for purposes other than evaluating a
        system for discrimination or using 

        a system to counter discrimination) runs a high risk of leading to
        algorithmic discrimination and should be 

        avoided. In many cases, attributes that are highly correlated with
        demographic features, known as proxies, can 

        contribute to algorithmic discrimination. In cases where use of the
        demographic features themselves would 

        lead to illegal algorithmic discrimination, reliance on such proxies in
        decision-making (such as that facilitated 

        by an algorithm) may also be prohibited by law. Proactive testing should
        be performed to identify proxies by 

        testing for correlation between demographic information and attributes
        in any data used as part of system 

        design, development, or use. If a proxy is identified, designers,
        developers, and deployers should remove the 

        proxy; if needed, it may be possible to identify alternative attributes
        that can be used instead. At a minimum, 

        organizations should ensure a proxy feature is not given undue weight
        and should monitor the system closely 

        for any resulting algorithmic discrimination.   

        26

        Algorithmic 

        Discrimination 

        Protections 
      - >2
         
         
         
         
         
         
         
        HUMAN ALTERNATIVES, 

        CONSIDERATION, AND 

        FALLBACK 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Equitable. Consideration should be given to ensuring outcomes of the
        fallback and escalation system are 

        equitable when compared to those of the automated system and such that
        the fallback and escalation 

        system provides equitable access to underserved communities.105 

        Timely. Human consideration and fallback are only useful if they are
        conducted and concluded in a 

        timely manner. The determination of what is timely should be made
        relative to the specific automated 

        system, and the review system should be staffed and regularly assessed
        to ensure it is providing timely 

        consideration and fallback. In time-critical systems, this mechanism
        should be immediately available or, 

        where possible, available before the harm occurs. Time-critical systems
        include, but are not limited to, 

        voting-related systems, automated building access and other access
        systems, systems that form a critical 

        component of healthcare, and systems that have the ability to withhold
        wages or otherwise cause 

        immediate financial penalties. 

        Effective. The organizational structure surrounding processes for
        consideration and fallback should 

        be designed so that if the human decision-maker charged with reassessing
        a decision determines that it 

        should be overruled, the new decision will be effectively enacted. This
        includes ensuring that the new 

        decision is entered into the automated system throughout its components,
        any previous repercussions from 

        the old decision are also overturned, and safeguards are put in place to
        help ensure that future decisions do 

        not result in the same errors. 

        Maintained. The human consideration and fallback process and any
        associated automated processes 

        should be maintained and supported as long as the relevant automated
        system continues to be in use. 

        Institute training, assessment, and oversight to combat automation bias
        and ensure any 

        human-based components of a system are effective. 

        Training and assessment. Anyone administering, interacting with, or
        interpreting the outputs of an auto­

        mated system should receive training in that system, including how to
        properly interpret outputs of a system 

        in light of its intended purpose and in how to mitigate the effects of
        automation bias. The training should reoc­

        cur regularly to ensure it is up to date with the system and to ensure
        the system is used appropriately. Assess­

        ment should be ongoing to ensure that the use of the system with human
        involvement provides for appropri­

        ate results, i.e., that the involvement of people does not invalidate
        the system's assessment as safe and effective 

        or lead to algorithmic discrimination. 

        Oversight. Human-based systems have the potential for bias, including
        automation bias, as well as other 

        concerns that may limit their effectiveness. The results of assessments
        of the efficacy and potential bias of 

        such human-based systems should be overseen by governance structures
        that have the potential to update the 

        operation of the human-based system in order to mitigate these effects. 

        50
      - >2
         
         
         
        Applying The Blueprint for an AI Bill of Rights 

        SENSITIVE DATA: Data and metadata are sensitive if they pertain to an
        individual in a sensitive domain 

        (defined below); are generated by technologies used in a sensitive
        domain; can be used to infer data from a 

        sensitive domain or sensitive data about an individual (such as
        disability-related data, genomic data, biometric 

        data, behavioral data, geolocation data, data related to interaction
        with the criminal justice system, relationship 

        history and legal status such as custody and divorce information, and
        home, work, or school environmental 

        data); or have the reasonable potential to be used in ways that are
        likely to expose individuals to meaningful 

        harm, such as a loss of privacy or financial harm due to identity theft.
        Data and metadata generated by or about 

        those who are not yet legal adults is also sensitive, even if not
        related to a sensitive domain. Such data includes, 

        but is not limited to, numerical, text, image, audio, or video data. 

        SENSITIVE DOMAINS: “Sensitive domains” are those in which activities
        being conducted can cause material 

        harms, including significant adverse effects on human rights such as
        autonomy and dignity, as well as civil liber­

        ties and civil rights. Domains that have historically been singled out
        as deserving of enhanced data protections 

        or where such enhanced protections are reasonably expected by the public
        include, but are not limited to, 

        health, family planning and care, employment, education, criminal
        justice, and personal finance. In the context 

        of this framework, such domains are considered sensitive whether or not
        the specifics of a system context 

        would necessitate coverage under existing law, and domains and data that
        are considered sensitive are under­

        stood to change over time based on societal norms and context. 

        SURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or
        services marketed for 

        or that can be lawfully used to detect, monitor, intercept, collect,
        exploit, preserve, protect, transmit, and/or 

        retain data, identifying information, or communications concerning
        individuals or groups. This framework 

        limits its focus to both government and commercial use of surveillance
        technologies when juxtaposed with 

        real-time or subsequent automated analysis and when such systems have a
        potential for meaningful impact 

        on individuals’ or communities’ rights, opportunities, or access. 

        UNDERSERVED COMMUNITIES: The term “underserved communities” refers to
        communities that have 

        been systematically denied a full opportunity to participate in aspects
        of economic, social, and civic life, as 

        exemplified by the list in the preceding definition of “equity.” 

        11
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.7
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9666666666666667
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19333333333333338
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.10000000000000003
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9666666666666667
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8478532019852957
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7983333333333333
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7983333333333333
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.7
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.9
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.9666666666666667
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 1
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.7
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.3
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.19333333333333338
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.10000000000000003
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.7
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.9
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.9666666666666667
            name: Dot Recall@5
          - type: dot_recall@10
            value: 1
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.8478532019852957
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.7983333333333333
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.7983333333333333
            name: Dot Map@100

SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-m
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
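
To make the pooling explicit: the stack above takes the final-layer CLS token embedding and L2-normalizes it. The sketch below reproduces that with plain transformers; it assumes the checkpoint exposes the underlying BertModel and tokenizer in the usual Sentence Transformers repository layout, and the query string is only an illustration.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the BertModel weights and tokenizer live at the repo root, as is
# standard for Sentence Transformers checkpoints.
tokenizer = AutoTokenizer.from_pretrained("rgtlai/ai-policy-ft")
encoder = AutoModel.from_pretrained("rgtlai/ai-policy-ft")

batch = tokenizer(
    ["What counts as a sensitive domain?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, 768)

cls = hidden[:, 0]                                            # pooling_mode_cls_token=True
embeddings = torch.nn.functional.normalize(cls, p=2, dim=1)   # Normalize() module
print(embeddings.shape)                                       # torch.Size([1, 768])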

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rgtlai/ai-policy-ft")
# Run inference
sentences = [
    'What proactive steps should be taken during the design phase of automated systems to assess equity and prevent algorithmic discrimination?',
    ' \n \n \n \n \n \n \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations for automated systems are meant to serve as a blueprint for the development of additional \ntechnical standards and practices that are tailored for particular sectors and contexts. \nAny automated system should be tested to help ensure it is free from algorithmic discrimination before it can be \nsold or used. Protection against algorithmic discrimination should include designing to ensure equity, broadly \nconstrued.  Some algorithmic discrimination is already prohibited under existing anti-discrimination law. The \nexpectations set out below describe proactive technical and policy steps that can be taken to not only \nreinforce those legal protections but extend beyond them to ensure equity for underserved communities48 \neven in circumstances where a specific legal protection may not be clearly established. These protections \nshould be instituted throughout the design, development, and deployment process and are described below \nroughly in the order in which they would be instituted. \nProtect the public from algorithmic discrimination in a proactive and ongoing manner \nProactive assessment of equity in design. Those responsible for the development, use, or oversight of \nautomated systems should conduct proactive equity assessments in the design phase of the technology \nresearch and development or during its acquisition to review potential input data, associated historical \ncontext, accessibility for people with disabilities, and societal goals to identify potential discrimination and \neffects on equity resulting from the introduction of the technology. The assessed groups should be as inclusive \nas possible of the underserved communities mentioned in the equity definition:  Black, Latino, and Indigenous \nand Native American persons, Asian Americans and Pacific Islanders and other persons of color; members of \nreligious minorities; women, girls, and non-binary people; lesbian, gay, bisexual, transgender, queer, and inter-\nsex (LGBTQI+) persons; older adults; persons with disabilities; persons who live in rural areas; and persons \notherwise adversely affected by persistent poverty or inequality. Assessment could include both qualitative \nand quantitative evaluations of the system. This equity assessment should also be considered a core part of the \ngoals of the consultation conducted as part of the safety and efficacy review. \nRepresentative and robust data. Any data used as part of system development or assessment should be \nrepresentative of local communities based on the planned deployment setting and should be reviewed for bias \nbased on the historical and societal context of the data. Such data should be sufficiently robust to identify and \nhelp to mitigate biases and potential harms. \nGuarding against proxies.  Directly using demographic information in the design, development, or \ndeployment of an automated system (for purposes other than evaluating a system for discrimination or using \na system to counter discrimination) runs a high risk of leading to algorithmic discrimination and should be \navoided. In many cases, attributes that are highly correlated with demographic features, known as proxies, can \ncontribute to algorithmic discrimination. In cases where use of the demographic features themselves would \nlead to illegal algorithmic discrimination, reliance on such proxies in decision-making (such as that facilitated \nby an algorithm) may also be prohibited by law. 
Proactive testing should be performed to identify proxies by \ntesting for correlation between demographic information and attributes in any data used as part of system \ndesign, development, or use. If a proxy is identified, designers, developers, and deployers should remove the \nproxy; if needed, it may be possible to identify alternative attributes that can be used instead. At a minimum, \norganizations should ensure a proxy feature is not given undue weight and should monitor the system closely \nfor any resulting algorithmic discrimination.   \n26\nAlgorithmic \nDiscrimination \nProtections \n',
    ' \n \n \nApplying The Blueprint for an AI Bill of Rights \nSENSITIVE DATA: Data and metadata are sensitive if they pertain to an individual in a sensitive domain \n(defined below); are generated by technologies used in a sensitive domain; can be used to infer data from a \nsensitive domain or sensitive data about an individual (such as disability-related data, genomic data, biometric \ndata, behavioral data, geolocation data, data related to interaction with the criminal justice system, relationship \nhistory and legal status such as custody and divorce information, and home, work, or school environmental \ndata); or have the reasonable potential to be used in ways that are likely to expose individuals to meaningful \nharm, such as a loss of privacy or financial harm due to identity theft. Data and metadata generated by or about \nthose who are not yet legal adults is also sensitive, even if not related to a sensitive domain. Such data includes, \nbut is not limited to, numerical, text, image, audio, or video data. \nSENSITIVE DOMAINS: “Sensitive domains” are those in which activities being conducted can cause material \nharms, including significant adverse effects on human rights such as autonomy and dignity, as well as civil liber\xad\nties and civil rights. Domains that have historically been singled out as deserving of enhanced data protections \nor where such enhanced protections are reasonably expected by the public include, but are not limited to, \nhealth, family planning and care, employment, education, criminal justice, and personal finance. In the context \nof this framework, such domains are considered sensitive whether or not the specifics of a system context \nwould necessitate coverage under existing law, and domains and data that are considered sensitive are under\xad\nstood to change over time based on societal norms and context. \nSURVEILLANCE TECHNOLOGY: “Surveillance technology” refers to products or services marketed for \nor that can be lawfully used to detect, monitor, intercept, collect, exploit, preserve, protect, transmit, and/or \nretain data, identifying information, or communications concerning individuals or groups. This framework \nlimits its focus to both government and commercial use of surveillance technologies when juxtaposed with \nreal-time or subsequent automated analysis and when such systems have a potential for meaningful impact \non individuals’ or communities’ rights, opportunities, or access. \nUNDERSERVED COMMUNITIES: The term “underserved communities” refers to communities that have \nbeen systematically denied a full opportunity to participate in aspects of economic, social, and civic life, as \nexemplified by the list in the preceding definition of “equity.” \n11\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
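
Because the model is tuned for retrieval over AI-policy text, a common pattern is to embed a passage corpus once and rank passages per query by cosine similarity. A small sketch follows; the corpus strings are illustrative placeholders, not the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Placeholder passages; in practice these would be your document chunks.
corpus = [
    "Automated systems should undergo pre-deployment testing and ongoing monitoring.",
    "Sensitive domains include health, employment, education, criminal justice, and personal finance.",
]
query = "Which domains are considered sensitive?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity between the query and every passage, shape (1, len(corpus))
scores = model.similarity(query_embedding, corpus_embeddings)
best = int(scores.argmax())
print(corpus[best])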

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.7
cosine_accuracy@3 0.9
cosine_accuracy@5 0.9667
cosine_accuracy@10 1.0
cosine_precision@1 0.7
cosine_precision@3 0.3
cosine_precision@5 0.1933
cosine_precision@10 0.1
cosine_recall@1 0.7
cosine_recall@3 0.9
cosine_recall@5 0.9667
cosine_recall@10 1.0
cosine_ndcg@10 0.8479
cosine_mrr@10 0.7983
cosine_map@100 0.7983
dot_accuracy@1 0.7
dot_accuracy@3 0.9
dot_accuracy@5 0.9667
dot_accuracy@10 1.0
dot_precision@1 0.7
dot_precision@3 0.3
dot_precision@5 0.1933
dot_precision@10 0.1
dot_recall@1 0.7
dot_recall@3 0.9
dot_recall@5 0.9667
dot_recall@10 1.0
dot_ndcg@10 0.8479
dot_mrr@10 0.7983
dot_map@100 0.7983
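
Metrics of this kind are what Sentence Transformers' InformationRetrievalEvaluator reports. A toy sketch of how such an evaluation is wired up is shown below; the queries, corpus, and relevance judgments are placeholders, and the k values are shrunk to fit the tiny corpus rather than matching the @10/@100 cutoffs above.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Placeholder evaluation data: query id -> text, doc id -> text, query id -> relevant doc ids.
queries = {"q1": "What counts as sensitive data?"}
corpus = {
    "d1": "Data and metadata are sensitive if they pertain to an individual in a sensitive domain.",
    "d2": "Surveillance technology refers to products or services marketed for monitoring individuals.",
    "d3": "Underserved communities have been systematically denied a full opportunity to participate.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    accuracy_at_k=[1, 3], precision_recall_at_k=[1, 3],
    mrr_at_k=[3], ndcg_at_k=[3], map_at_k=[3],
)
results = evaluator(model)
print(results)  # e.g. cosine_accuracy@1, cosine_ndcg@3, cosine_map@3, ...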

Training Details

Training Dataset

Unnamed Dataset

  • Size: 200 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 200 samples:
    • sentence_0: string; min 12 tokens, mean 22.34 tokens, max 38 tokens
    • sentence_1: string; min 21 tokens, mean 447.96 tokens, max 512 tokens
  • Samples:
    sentence_0: What is the purpose of the AI Bill of Rights mentioned in the context?
    sentence_1:
    BLUEPRINT FOR AN
    AI BILL OF
    RIGHTS
    MAKING AUTOMATED
    SYSTEMS WORK FOR
    THE AMERICAN PEOPLE
    OCTOBER 2022
    sentence_0: When was the Blueprint for an AI Bill of Rights published?
    sentence_1:
    BLUEPRINT FOR AN
    AI BILL OF
    RIGHTS
    MAKING AUTOMATED
    SYSTEMS WORK FOR
    THE AMERICAN PEOPLE
    OCTOBER 2022
    sentence_0: What is the purpose of the Blueprint for an AI Bill of Rights as published by the White House Office of Science and Technology Policy?
    sentence_1:
    About this Document
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was
    published by the White House Office of Science and Technology Policy in October 2022. This framework was
    released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered
    world.” Its release follows a year of public engagement to inform this initiative. The framework is available
    online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights
    About the Office of Science and Technology Policy
    The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology
    Policy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office
    of the President with advice on the scientific, engineering, and technological aspects of the economy, national
    security, health, foreign relations, the environment, and the technological recovery and use of resources, among
    other topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of
    Management and Budget (OMB) with an annual review and analysis of Federal research and development in
    budgets, and serves as a source of scientific and technological analysis and judgment for the President with
    respect to major policies, plans, and programs of the Federal Government.
    Legal Disclaimer
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper
    published by the White House Office of Science and Technology Policy. It is intended to support the
    development of policies and practices that protect civil rights and promote democratic values in the building,
    deployment, and governance of automated systems.
    The Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It
    does not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or
    international instrument. It does not constitute binding guidance for the public or Federal agencies and
    therefore does not require compliance with the principles described herein. It also is not determinative of what
    the U.S. government’s position will be in any international negotiation. Adoption of these principles may not
    meet the requirements of existing statutes, regulations, policies, or international instruments, or the
    requirements of the Federal agencies that enforce them. These principles are not intended to, and do not,
    prohibit or limit any lawful activity of a government agency, including law enforcement, national security, or
    intelligence activities.
    The appropriate application of the principles set forth in this white paper depends significantly on the
    context in which automated systems are being utilized. In some circumstances, application of these principles
    in whole or in part may not be appropriate given the intended use of automated systems to achieve government
    agency missions. Future sector-specific guidance will likely be necessary and important for guiding the use of
    automated systems in certain settings such as AI systems used as part of school building security or automated
    health diagnostic systems.
    The Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of
    equities, for example, between the protection of sensitive law enforcement information and the principle of
    notice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and
    other law enforcement equities. Even in contexts where these principles may not apply in whole or in part,
    federal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as
    existing policies and safeguards that govern automated systems, including, for example, Executive Order 13960,
    Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020).
    This white paper recognizes that national security (which includes certain law enforcement and
    homeland security activities) and defense activities are of increased sensitivity and interest to our nation’s
    adversaries and are often subject to special requirements, such as those governing classified information and
    other protected data. Such activities require alternative, compatible safeguards through existing policies that
    govern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and
    Responsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and
    Framework. The implementation of these policies to national security and defense activities can be informed by
    the Blueprint for an AI Bill of Rights where feasible.
    The Blueprint for an AI Bill of Rights is not intended to, and does not, create any legal right, benefit, or
    defense, substantive or procedural, enforceable at law or in equity by any party against the United States, its
    departments, agencies, or entities, its officers, employees, or agents, or any other person, nor does it constitute a
    waiver of sovereign immunity.
    Copyright Information
    This document is a work of the United States Government and is in the public domain (see 17 U.S.C. §105).
    2
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
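
Because training used MatryoshkaLoss over nested dimensions (768, 512, 256, 128, 64), embeddings can usually be truncated to one of the smaller prefixes and re-normalized with only a modest drop in retrieval quality. A minimal sketch of that truncation follows; the query string is illustrative.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rgtlai/ai-policy-ft")

# Full 768-dimensional embeddings; the Normalize() module already L2-normalizes them.
embeddings = model.encode(["What protections should automated systems provide?"])

# Keep only the first 256 dimensions and re-normalize. MatryoshkaLoss trains the
# model so these truncated prefixes remain useful for similarity search.
truncated = embeddings[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (1, 256)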
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin
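
The non-default settings above translate into a fairly standard Sentence Transformers training loop. A rough sketch of how such a run is set up is shown below; the toy dataset, output directory, and omitted evaluation wiring are assumptions, not the exact script used for this model.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Toy stand-in for the 200 (question, passage) pairs used in this run.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "What counts as a sensitive domain?",
        "What is surveillance technology?",
    ],
    "sentence_1": [
        "Sensitive domains are those in which activities can cause material harms.",
        "Surveillance technology refers to products or services marketed for monitoring individuals.",
    ],
})

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="ai-policy-ft",          # assumed output path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    # the actual run also used eval_strategy="steps" with an evaluation set / IR evaluator
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()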

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step cosine_map@100
1.0 13 0.7303
2.0 26 0.7356
3.0 39 0.7828
3.8462 50 0.7817
4.0 52 0.7817
5.0 65 0.7983

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}