language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:1K<n<10K
  - loss:CosineSimilarityLoss
base_model: distilbert/distilbert-base-uncased
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A man shoots a man.
    sentences:
      - The target was being shot with bullets.
      - Two women compete in a contest.
      - Kittens are eating from dishes.
  - source_sentence: A man is spitting.
    sentences:
      - A man is crying.
      - The cougar is chasing the bear.
      - A slow loris hanging on a cord.
  - source_sentence: A man jumping rope
    sentences:
      - The man without a shirt is jumping.
      - Suicide bomber strikes in Syria
      - Two women sitting in lawn chairs.
  - source_sentence: A woman is reading.
    sentences:
      - The woman is pencilling on eye shadow.
      - Bombings kill 19 people in Iraq
      - A man with his dog on the beach.
  - source_sentence: A cat is on a robot.
    sentences:
      - A cat is pouncing on a trampoline.
      - Two men are standing in a room.
      - The two men are wearing jeans.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on distilbert/distilbert-base-uncased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8697331501677178
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8685180246535534
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8437823469562609
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8453821992823211
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8437006142247849
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8452387041309848
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8131236162716029
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8170086888260258
            name: Spearman Dot
          - type: pearson_max
            value: 0.8697331501677178
            name: Pearson Max
          - type: spearman_max
            value: 0.8685180246535534
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8385072210953088
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8381420978910276
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8314551294633353
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8311067092857745
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8321704746684249
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.831638857612135
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7504803996099798
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7471600293342772
            name: Spearman Dot
          - type: pearson_max
            value: 0.8385072210953088
            name: Pearson Max
          - type: spearman_max
            value: 0.8381420978910276
            name: Spearman Max

SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model fine-tuned from distilbert/distilbert-base-uncased on the sentence-transformers/stsb dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
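
The Pooling module above has only pooling_mode_mean_tokens enabled, so each sentence vector is the average of its token vectors. The following is a minimal NumPy sketch of that idea, not the library's implementation. The function name mean_pool and the toy shapes are illustrative assumptions; the real model produces 768-dimensional token vectors.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) output of the Transformer module.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against empty rows
    return summed / counts

# Toy input: 2 sentences, 3 token slots, 4 dimensions (hypothetical sizes).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # first sentence has one padding slot
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

Masking before averaging matters because padded positions would otherwise pull short sentences toward the padding vector.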

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pierreinalco/distilbert-base-uncased-sts")
# Run inference
sentences = [
    'A cat is on a robot.',
    'A cat is pouncing on a trampoline.',
    'Two men are standing in a room.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
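
To make the similarity step concrete, here is a small NumPy sketch of a pairwise cosine similarity matrix. It assumes the model's similarity function is cosine similarity, which this card does not state explicitly. The helper name cosine_similarity_matrix and the 2-dimensional toy vectors are illustrative only.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Toy embeddings standing in for the output of model.encode(...).
emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sims = cosine_similarity_matrix(emb, emb)

# Rank the other sentences by similarity to the first one (index 0 is itself).
ranking = np.argsort(-sims[0])
print(sims.round(4))
print(ranking)
```

Row i of the matrix can be sorted in descending order to retrieve the sentences closest to sentence i, which is the basis of semantic search.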

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric | Value
pearson_cosine | 0.8697
spearman_cosine | 0.8685
pearson_manhattan | 0.8438
spearman_manhattan | 0.8454
pearson_euclidean | 0.8437
spearman_euclidean | 0.8452
pearson_dot | 0.8131
spearman_dot | 0.8170
pearson_max | 0.8697
spearman_max | 0.8685

Semantic Similarity

  • Dataset: sts-test

Metric | Value
pearson_cosine | 0.8385
spearman_cosine | 0.8381
pearson_manhattan | 0.8315
spearman_manhattan | 0.8311
pearson_euclidean | 0.8322
spearman_euclidean | 0.8316
pearson_dot | 0.7505
spearman_dot | 0.7472
pearson_max | 0.8385
spearman_max | 0.8381
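
Each metric above is a correlation between a predicted similarity (cosine, Manhattan, Euclidean or dot product) and the gold score. The sketch below shows how Pearson and Spearman correlations can be computed with NumPy on made-up numbers. The gold and pred values are hypothetical, and the simple rank transform assumes there are no tied values; a library routine such as scipy.stats.spearmanr handles ties properly.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Linear correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Rank correlation: Pearson applied to ranks (assumes no ties)."""
    def rank(v: np.ndarray) -> np.ndarray:
        return np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

# Hypothetical gold scores and predicted cosine similarities for 5 pairs.
gold = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
pred = np.array([0.10, 0.05, 0.40, 0.90, 0.95])

print(round(pearson(gold, pred), 4))
print(round(spearman(gold, pred), 4))
```

Spearman only looks at the ordering of the pairs, so it is unaffected by any monotonic rescaling of the predicted similarities. In both tables the pearson_max and spearman_max rows equal the cosine rows, which is consistent with them reporting the best value across the four measures.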

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 10.0 tokens, max 28 tokens
    • sentence2 (string): min 5 tokens, mean 9.95 tokens, max 25 tokens
    • score (float): min 0.0, mean 0.54, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    A plane is taking off. | An air plane is taking off. | 1.0
    A man is playing a large flute. | A man is playing a flute. | 0.76
    A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
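
As a rough illustration of this loss configuration: the cosine similarity of the two sentence embeddings is compared with the gold score using mean squared error. The NumPy sketch below shows that computation on toy vectors. The function name cosine_similarity_loss and the input values are hypothetical, and the real training code operates on PyTorch tensors with gradients.

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared error between cos(u_i, v_i) and the gold score for each pair."""
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels) ** 2))

# Two toy sentence pairs: identical vectors and orthogonal vectors.
u = np.array([[1.0, 0.0], [1.0, 0.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([1.0, 0.5])  # hypothetical gold scores in [0, 1]

loss = cosine_similarity_loss(u, v, labels)
print(loss)
```

The first pair is predicted perfectly (cosine 1.0 against a label of 1.0), so the whole loss comes from the second pair, where a cosine of 0.0 misses the label of 0.5.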
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 15.1 tokens, max 45 tokens
    • sentence2 (string): min 6 tokens, mean 15.11 tokens, max 53 tokens
    • score (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0
    A young child is riding a horse. | A child is riding a horse. | 0.95
    A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
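
To show what warmup_ratio: 0.1 means in practice, the sketch below computes the learning rate at a given step for a linear schedule with warmup. It uses the learning rate (5e-05) and linear scheduler listed under All Hyperparameters and the 1440 total steps shown in the training logs. The function name linear_warmup_lr and the rounding of the warmup step count are assumptions, not the trainer's exact code.

```python
import math

def linear_warmup_lr(step: int, base_lr: float = 5e-5,
                     total_steps: int = 1440, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, mid-warmup, the peak, mid-decay, and the end.
for s in (0, 72, 144, 792, 1440):
    print(s, linear_warmup_lr(s))
```

Under these assumptions the rate climbs to its peak over the first 144 steps and then decays linearly until training ends.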

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine
0.2778 | 100 | 0.0836 | 0.0399 | 0.8029 | -
0.5556 | 200 | 0.0334 | 0.0301 | 0.8481 | -
0.8333 | 300 | 0.0282 | 0.0262 | 0.8578 | -
1.1111 | 400 | 0.0204 | 0.0273 | 0.8593 | -
1.3889 | 500 | 0.0138 | 0.0281 | 0.8589 | -
1.6667 | 600 | 0.0118 | 0.0276 | 0.8566 | -
1.9444 | 700 | 0.0128 | 0.0263 | 0.8614 | -
2.2222 | 800 | 0.0077 | 0.0259 | 0.8685 | -
2.5 | 900 | 0.0057 | 0.0254 | 0.8661 | -
2.7778 | 1000 | 0.0059 | 0.0261 | 0.8677 | -
3.0556 | 1100 | 0.0054 | 0.0258 | 0.8682 | -
3.3333 | 1200 | 0.0039 | 0.0261 | 0.8668 | -
3.6111 | 1300 | 0.0039 | 0.0261 | 0.8678 | -
3.8889 | 1400 | 0.0037 | 0.0259 | 0.8685 | -
4.0 | 1440 | - | - | - | 0.8381
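
The Epoch and Step columns can be reproduced from the numbers reported elsewhere in this card: 5,749 training samples, a per-device batch size of 16, and 4 epochs. The short calculation below assumes a single device; gradient_accumulation_steps: 1 and dataloader_drop_last: False are taken from the hyperparameter list. Variable and function names are illustrative.

```python
import math

train_samples = 5749   # "Size: 5,749 training samples"
batch_size = 16        # per_device_train_batch_size, assuming one device
epochs = 4             # num_train_epochs

# The last partial batch is kept (dataloader_drop_last: False), hence ceil.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)  # 360 1440
print(epoch_at(100), epoch_at(900))  # 0.2778 2.5
```

These values match the table: evaluation ran every 100 steps, and the final row at step 1440 corresponds to the end of epoch 4.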

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.1
  • PyTorch: 2.3.0
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}