metadata
base_model: Snowflake/snowflake-arctic-embed-m
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:40
- loss:CosineSimilarityLoss
widget:
- source_sentence: What role does NIST play in establishing AI standards?
sentences:
- >-
provides examples and concrete steps for communities, industry,
governments, and others to take in order to
build these protections into policy, practice, or the technological
design process.
Taken together, the technical protections and practices laid out in the
Blueprint for an AI Bill of Rights can help
guard the American public against many of the potential and actual harms
identified by researchers, technolo
- >-
provides examples and concrete steps for communities, industry,
governments, and others to take in order to
build these protections into policy, practice, or the technological
design process.
Taken together, the technical protections and practices laid out in the
Blueprint for an AI Bill of Rights can help
guard the American public against many of the potential and actual harms
identified by researchers, technolo
- >-
Acknowledgments: This report was accomplished with the many helpful
comments and contributions
from the community, including the NIST Generative AI Public Working
Group, and NIST staff and guest
researchers: Chloe Autio, Jesse Dunietz, Patrick Hall, Shomik Jain,
Kamie Roberts, Reva Schwartz, Martin
Stanley, and Elham Tabassi.
NIST Technical Series Policies
Copyright, Use, and Licensing Statements
NIST Technical Series Publication Identifier Syntax
Publication History
- source_sentence: What are the implications of AI in decision-making processes?
sentences:
- >-
The measures taken to realize the vision set forward in this framework
should be proportionate
with the extent and nature of the harm, or risk of harm, to people's
rights, opportunities, and
access.
RELATIONSHIP TO EXISTING LAW AND POLICY
The Blueprint for an AI Bill of Rights is an exercise in envisioning a
future where the American public is
protected from the potential harms, and can fully enjoy the benefits, of
automated systems. It describes princi
- >-
state of the science of AI measurement and safety today. This document
focuses on risks for which there
is an existing empirical evidence base at the time this profile was
written; for example, speculative risks
that may potentially arise in more advanced, future GAI systems are not
considered. Future updates may
incorporate additional risks or provide further details on the risks
identified below.
- >-
development of automated systems that adhere to and advance their
safety, security and
effectiveness. Multiple NSF programs support research that directly
addresses many of these principles:
the National AI Research Institutes23 support research on all aspects of
safe, trustworthy, fair, and explainable
AI algorithms and systems; the Cyber Physical Systems24 program supports
research on developing safe
- source_sentence: >-
How are AI systems validated for safety and fairness according to NIST
standards?
sentences:
- >-
tion and advises on implementation of the DOE AI Strategy and addresses
issues and/or escalations on the
ethical use and development of AI systems.20 The Department of Defense
has adopted Artificial Intelligence
Ethical Principles, and tenets for Responsible Artificial Intelligence
specifically tailored to its national
security and defense activities.21 Similarly, the U.S. Intelligence
Community (IC) has developed the Principles
- >-
GOVERN 1.1: Legal and regulatory requirements involving AI are
understood, managed, and documented.
Action ID
Suggested Action
GAI Risks
GV-1.1-001 Align GAI development and use with applicable laws and
regulations, including
those related to data privacy, copyright and intellectual property law.
Data Privacy; Harmful Bias and
Homogenization; Intellectual
Property
AI Actor Tasks: Governance and Oversight
- >-
more than a decade, is also helping to fulfill the 2023 Executive Order
on Safe, Secure, and Trustworthy
AI. NIST established the U.S. AI Safety Institute and the companion AI
Safety Institute Consortium to
continue the efforts set in motion by the E.O. to build the science
necessary for safe, secure, and
trustworthy development and use of AI.
Acknowledgments: This report was accomplished with the many helpful
comments and contributions
- source_sentence: How does the AI Bill of Rights protect individual privacy?
sentences:
- >-
match the statistical properties of real-world data without disclosing
personally
identifiable information or contributing to homogenization.
Data Privacy; Intellectual Property;
Information Integrity;
Confabulation; Harmful Bias and
Homogenization
AI Actor Tasks: AI Deployment, AI Impact Assessment, Governance and
Oversight, Operation and Monitoring
MANAGE 2.3: Procedures are followed to respond to and recover from a
previously unknown risk when it is identified.
Action ID
- >-
the principles described in the Blueprint for an AI Bill of Rights may
be necessary to comply with existing law,
conform to the practicalities of a specific use case, or balance
competing public interests. In particular, law
enforcement, and other regulatory contexts may require government actors
to protect civil rights, civil liberties,
and privacy in a manner consistent with, but using alternate mechanisms
to, the specific principles discussed in
- >-
civil rights, civil liberties, and privacy. The Blueprint for an AI Bill
of Rights includes this Foreword, the five
principles, notes on Applying the The Blueprint for an AI Bill of
Rights, and a Technical Companion that gives
concrete steps that can be taken by many kinds of organizations—from
governments at all levels to companies of
all sizes—to uphold these values. Experts from across the private
sector, governments, and international
- source_sentence: How does the AI Bill of Rights protect individual privacy?
sentences:
- >-
57
National Institute of Standards and Technology (2023) AI Risk Management
Framework, Appendix B:
How AI Risks Differ from Traditional Software Risks.
https://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Appendices/Appendix_B
National Institute of Standards and Technology (2023) AI RMF Playbook.
https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
National Institute of Standards and Technology (2023) Framing Risk
- >-
principles for managing information about individuals have been
incorporated into data privacy laws and
policies across the globe.5 The Blueprint for an AI Bill of Rights
embraces elements of the FIPPs that are
particularly relevant to automated systems, without articulating a
specific set of FIPPs or scoping
applicability or the interests served to a single particular domain,
like privacy, civil rights and civil liberties,
- >-
harmful uses. The NIST framework will consider and encompass principles such
as transparency, accountability, and fairness during pre-design, design and
development, deployment, use, and testing and evaluation of AI technologies
and systems. It is expected to be released in the winter of 2022-23.
21
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: val
type: val
metrics:
- type: pearson_cosine
value: 0.6585006489314952
name: Pearson Cosine
- type: spearman_cosine
value: 0.7
name: Spearman Cosine
- type: pearson_manhattan
value: 0.582665729755017
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.6
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.6722783219807118
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.7
name: Spearman Euclidean
- type: pearson_dot
value: 0.6585002582595083
name: Pearson Dot
- type: spearman_dot
value: 0.7
name: Spearman Dot
- type: pearson_max
value: 0.6722783219807118
name: Pearson Max
- type: spearman_max
value: 0.7
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: test
type: test
metrics:
- type: pearson_cosine
value: 0.7463407966146629
name: Pearson Cosine
- type: spearman_cosine
value: 0.7999999999999999
name: Spearman Cosine
- type: pearson_manhattan
value: 0.7475379067038609
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.7999999999999999
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.7592380598802199
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.7999999999999999
name: Spearman Euclidean
- type: pearson_dot
value: 0.7463412670178408
name: Pearson Dot
- type: spearman_dot
value: 0.7999999999999999
name: Spearman Dot
- type: pearson_max
value: 0.7592380598802199
name: Pearson Max
- type: spearman_max
value: 0.7999999999999999
name: Spearman Max
SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
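Because the final Normalize() module returns unit-length embeddings, dot-product and cosine similarity yield the same scores, which is why the pearson_dot and pearson_cosine figures in the evaluation tables are nearly identical. A quick sketch of that check (the query sentence is just an example from this card):
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gmedrano/snowflake-arctic-embed-m-finetuned")
emb = model.encode(["How does the AI Bill of Rights protect individual privacy?"])
print(np.linalg.norm(emb[0]))  # expected to be ~1.0 because of the Normalize() module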
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("gmedrano/snowflake-arctic-embed-m-finetuned")
# Run inference
sentences = [
'How does the AI Bill of Rights protect individual privacy?',
'principles for managing information about individuals have been incorporated into data privacy laws and \npolicies across the globe.5 The Blueprint for an AI Bill of Rights embraces elements of the FIPPs that are \nparticularly relevant to automated systems, without articulating a specific set of FIPPs or scoping \napplicability or the interests served to a single particular domain, like privacy, civil rights and civil liberties,',
'harmful \nuses. \nThe \nNIST \nframework \nwill \nconsider \nand \nencompass \nprinciples \nsuch \nas \ntransparency, accountability, and fairness during pre-design, design and development, deployment, use, \nand testing and evaluation of AI technologies and systems. It is expected to be released in the winter of 2022-23. \n21',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
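Beyond pairwise scoring, the same API supports a simple semantic-search pattern. This sketch uses an illustrative query and a two-passage corpus drawn from the examples above, not a real index:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gmedrano/snowflake-arctic-embed-m-finetuned")
query = "What role does NIST play in establishing AI standards?"
corpus = [
    "NIST established the U.S. AI Safety Institute and the companion AI Safety Institute Consortium ...",
    "The Blueprint for an AI Bill of Rights includes this Foreword, the five principles ...",
]
query_emb = model.encode([query])
corpus_emb = model.encode(corpus)
scores = model.similarity(query_emb, corpus_emb)  # shape [1, len(corpus)]
best = scores[0].argmax().item()
print(corpus[best], scores[0][best].item())  # highest-scoring passage and its similarity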
Evaluation
Metrics
Semantic Similarity
- Dataset: val
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.6585 |
spearman_cosine | 0.7 |
pearson_manhattan | 0.5827 |
spearman_manhattan | 0.6 |
pearson_euclidean | 0.6723 |
spearman_euclidean | 0.7 |
pearson_dot | 0.6585 |
spearman_dot | 0.7 |
pearson_max | 0.6723 |
spearman_max | 0.7 |
Semantic Similarity
- Dataset: test
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.7463 |
spearman_cosine | 0.8 |
pearson_manhattan | 0.7475 |
spearman_manhattan | 0.8 |
pearson_euclidean | 0.7592 |
spearman_euclidean | 0.8 |
pearson_dot | 0.7463 |
spearman_dot | 0.8 |
pearson_max | 0.7592 |
spearman_max | 0.8 |
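The scores above come from sentence-transformers' EmbeddingSimilarityEvaluator, which embeds each sentence pair and correlates the predicted similarities with gold labels. A hedged sketch of running it yourself; the two pairs and gold scores below are placeholders, not the actual val or test split:
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("gmedrano/snowflake-arctic-embed-m-finetuned")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "How does the AI Bill of Rights protect individual privacy?",
        "What role does NIST play in establishing AI standards?",
    ],
    sentences2=[
        "principles for managing information about individuals have been incorporated into data privacy laws ...",
        "NIST established the U.S. AI Safety Institute and the companion AI Safety Institute Consortium ...",
    ],
    scores=[0.7, 0.6],  # placeholder gold similarity labels in [0, 1]
    name="val",
)
print(evaluator(model))  # dict of pearson/spearman values for cosine, dot, manhattan, euclidean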
Training Details
Training Dataset
Unnamed Dataset
- Size: 40 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 40 samples:
 | sentence_0 | sentence_1 | label |
---|---|---|---|
type | string | string | float |
details | min: 12 tokens, mean: 14.43 tokens, max: 18 tokens | min: 41 tokens, mean: 80.55 tokens, max: 117 tokens | min: 0.53, mean: 0.61, max: 0.76 |
- Samples:
sentence_0 | sentence_1 | label |
---|---|---|---|
What should business leaders understand about AI risk management? | 57 National Institute of Standards and Technology (2023) AI Risk Management Framework, Appendix B: How AI Risks Differ from Traditional Software Risks. https://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Appendices/Appendix_B National Institute of Standards and Technology (2023) AI RMF Playbook. https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook National Institute of Standards and Technology (2023) Framing Risk | 0.5692041097520776 |
What kind of data protection measures are required under current AI regulations? | GOVERN 1.1: Legal and regulatory requirements involving AI are understood, managed, and documented. Action ID Suggested Action GAI Risks GV-1.1-001 Align GAI development and use with applicable laws and regulations, including those related to data privacy, copyright and intellectual property law. Data Privacy; Harmful Bias and Homogenization; Intellectual Property AI Actor Tasks: Governance and Oversight | 0.5830958798587019 |
What are the implications of AI in decision-making processes? | state of the science of AI measurement and safety today. This document focuses on risks for which there is an existing empirical evidence base at the time this profile was written; for example, speculative risks that may potentially arise in more advanced, future GAI systems are not considered. Future updates may incorporate additional risks or provide further details on the risks identified below. | 0.5317174553776045 |
- Loss: CosineSimilarityLoss with these parameters:
  { "loss_fct": "torch.nn.modules.loss.MSELoss" }
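CosineSimilarityLoss embeds sentence_0 and sentence_1, computes the cosine similarity of the two embeddings, and regresses it onto the float label with the MSELoss shown above. A minimal sketch with an illustrative pair and label (not the actual training data):
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")
loss = CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())  # matches the loss_fct above

# What the loss computes, spelled out manually for one pair:
u = model.encode("What are the implications of AI in decision-making processes?", convert_to_tensor=True)
v = model.encode("state of the science of AI measurement and safety today.", convert_to_tensor=True)
label = torch.tensor(0.53)  # placeholder gold similarity
cos = torch.nn.functional.cosine_similarity(u, v, dim=0)
print(torch.nn.functional.mse_loss(cos, label))  # per-pair CosineSimilarityLoss value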
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin
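For readers who want to reproduce a comparable run, here is a hedged sketch that wires the non-default hyperparameters above into SentenceTransformerTrainingArguments and SentenceTransformerTrainer. The output_dir and the tiny inline dataset are placeholders, not the original 40-pair training data:
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")
# Placeholder (sentence_0, sentence_1, label) data in the same column layout as above.
pairs = Dataset.from_dict({
    "sentence_0": ["What are the implications of AI in decision-making processes?"],
    "sentence_1": ["state of the science of AI measurement and safety today."],
    "label": [0.53],
})
args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler="round_robin",
    num_train_epochs=3,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    eval_dataset=pairs,                         # placeholder eval split
    loss=CosineSimilarityLoss(model),
)
trainer.train()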
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | test_spearman_max | val_spearman_max |
---|---|---|---|
1.0 | 3 | - | 0.6 |
2.0 | 6 | - | 0.7 |
3.0 | 9 | 0.8000 | 0.7 |
Framework Versions
- Python: 3.11.9
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.2.2
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}