SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
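
The modules above can be inspected directly on the loaded model. A minimal sketch (the load call mirrors the Usage section below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("srikarvar/fine_tuned_model_11")

# Transformer -> Pooling (mean) -> Normalize, as listed above
print(model)

print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 384
print(model.similarity_fn_name)                  # "cosine"

Because the final Normalize() module L2-normalizes each embedding, cosine similarity and dot product give identical scores, which is why the cosine and dot metrics in the Evaluation section match exactly.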

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/fine_tuned_model_11")
# Run inference
sentences = [
    'What is the time now?',
    'Current time',
    'Guide to starting a small business',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
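
Beyond pairwise similarity, the same embeddings can drive the semantic search and paraphrase mining use cases mentioned in the introduction. A small sketch using the library's utility functions (the corpus and query strings are illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("srikarvar/fine_tuned_model_11")

corpus = [
    "Current time",
    "Guide to starting a small business",
    "How do I get a credit card?",
]
query = "What is the time now?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode([query], convert_to_tensor=True)

# Semantic search: rank corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, ...]

# Paraphrase mining: find the most similar pairs within the corpus
pairs = util.paraphrase_mining(model, corpus)
print(pairs[:3])  # [[score, i, j], ...]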

Evaluation

Metrics

Binary Classification (dataset: pair-class-dev)

Metric Value
cosine_accuracy 0.9213
cosine_accuracy_threshold 0.8385
cosine_f1 0.9404
cosine_f1_threshold 0.8385
cosine_precision 0.9371
cosine_recall 0.9437
cosine_ap 0.9872
dot_accuracy 0.9213
dot_accuracy_threshold 0.8385
dot_f1 0.9404
dot_f1_threshold 0.8385
dot_precision 0.9371
dot_recall 0.9437
dot_ap 0.9872
manhattan_accuracy 0.9167
manhattan_accuracy_threshold 8.6584
manhattan_f1 0.9392
manhattan_f1_threshold 9.5941
manhattan_precision 0.9026
manhattan_recall 0.9789
manhattan_ap 0.9872
euclidean_accuracy 0.9213
euclidean_accuracy_threshold 0.5683
euclidean_f1 0.9404
euclidean_f1_threshold 0.5683
euclidean_precision 0.9371
euclidean_recall 0.9437
euclidean_ap 0.9872
max_accuracy 0.9213
max_accuracy_threshold 8.6584
max_f1 0.9404
max_f1_threshold 9.5941
max_precision 0.9371
max_recall 0.9789
max_ap 0.9872
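
In practice, the reported thresholds can be applied directly: a sentence pair is predicted to be a duplicate/paraphrase (label 1) when its similarity is at or above the threshold. A minimal sketch using the cosine threshold from this table (0.8385) on a pair taken from the training samples below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("srikarvar/fine_tuned_model_11")

threshold = 0.8385  # cosine_accuracy_threshold reported above
embeddings = model.encode([
    "How do I apply for a credit card?",
    "How do I get a credit card?",
])
score = float(model.similarity(embeddings, embeddings)[0, 1])
print(score, score >= threshold)  # True if the pair scores at or above the duplicate threshold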

Binary Classification (dataset: pair-class-test)

Metric Value
cosine_accuracy 0.9306
cosine_accuracy_threshold 0.857
cosine_f1 0.9485
cosine_f1_threshold 0.8532
cosine_precision 0.9262
cosine_recall 0.9718
cosine_ap 0.9898
dot_accuracy 0.9306
dot_accuracy_threshold 0.857
dot_f1 0.9485
dot_f1_threshold 0.8532
dot_precision 0.9262
dot_recall 0.9718
dot_ap 0.9898
manhattan_accuracy 0.9352
manhattan_accuracy_threshold 8.2998
manhattan_f1 0.9517
manhattan_f1_threshold 8.2998
manhattan_precision 0.9324
manhattan_recall 0.9718
manhattan_ap 0.9895
euclidean_accuracy 0.9306
euclidean_accuracy_threshold 0.5348
euclidean_f1 0.9485
euclidean_f1_threshold 0.5419
euclidean_precision 0.9262
euclidean_recall 0.9718
euclidean_ap 0.9898
max_accuracy 0.9352
max_accuracy_threshold 8.2998
max_f1 0.9517
max_f1_threshold 8.2998
max_precision 0.9324
max_recall 0.9718
max_ap 0.9898

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,936 training samples
  • Columns: label, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • label: int; 0: ~35.30%, 1: ~64.70%
    • sentence1: string; min: 6 tokens, mean: 16.19 tokens, max: 98 tokens
    • sentence2: string; min: 4 tokens, mean: 15.75 tokens, max: 98 tokens
  • Samples:
    • label: 1 | sentence1: How do I apply for a credit card? | sentence2: How do I get a credit card?
    • label: 1 | sentence1: What is the function of a learning rate scheduler? | sentence2: How does a learning rate scheduler optimize training?
    • label: 0 | sentence1: What is the speed of a rocket? | sentence2: What is the speed of a jet plane?
  • Loss: OnlineContrastiveLoss
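
A rough sketch of how a pair dataset with these columns can be fine-tuned from the base model with OnlineContrastiveLoss (the two example pairs stand in for the actual 1,936-pair training set, which is not published with this card):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Stand-in for the real training pairs (columns: sentence1, sentence2, label)
train_dataset = Dataset.from_dict({
    "sentence1": ["How do I apply for a credit card?", "What is the speed of a rocket?"],
    "sentence2": ["How do I get a credit card?", "What is the speed of a jet plane?"],
    "label": [1, 0],
})

# Pulls positive pairs together and pushes negative pairs apart, mining hard cases within each batch
loss = losses.OnlineContrastiveLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()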

Evaluation Dataset

Unnamed Dataset

  • Size: 216 evaluation samples
  • Columns: label, sentence1, and sentence2
  • Approximate statistics based on the first 216 samples:
    • label: int; 0: ~34.26%, 1: ~65.74%
    • sentence1: string; min: 6 tokens, mean: 15.87 tokens, max: 87 tokens
    • sentence2: string; min: 4 tokens, mean: 15.61 tokens, max: 86 tokens
  • Samples:
    • label: 0 | sentence1: What is the freezing point of ethanol? | sentence2: What is the boiling point of ethanol?
    • label: 0 | sentence1: Healthy habits | sentence2: Unhealthy habits
    • label: 0 | sentence1: What is the difference between omnivores and herbivores? | sentence2: What is the difference between omnivores, carnivores, and herbivores?
  • Loss: OnlineContrastiveLoss
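
The binary classification metrics reported in the Evaluation section can be computed from a pair dataset like this with the library's BinaryClassificationEvaluator. A sketch with illustrative pairs drawn from elsewhere on this card (the real 216-pair evaluation set is not published):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("srikarvar/fine_tuned_model_11")

sentences1 = ["What is the freezing point of ethanol?", "Healthy habits", "What is the time now?"]
sentences2 = ["What is the boiling point of ethanol?", "Unhealthy habits", "Current time"]
labels = [0, 0, 1]  # 1 = paraphrase/duplicate, 0 = not (illustrative labels)

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="pair-class-dev")
results = evaluator(model)
print(results)  # accuracy, F1, precision, recall, AP, and thresholds per similarity function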

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | pair-class-dev_max_ap | pair-class-test_max_ap |
|--------|------|---------------|-----------------|-----------------------|------------------------|
| 0      | 0    | -             | -               | 0.8705                | -                      |
| 0.3279 | 10   | 1.3831        | -               | -                     | -                      |
| 0.6557 | 20   | 0.749         | -               | -                     | -                      |
| 0.9836 | 30   | 0.5578        | 0.2991          | 0.9862                | -                      |
| 1.3115 | 40   | 0.3577        | -               | -                     | -                      |
| 1.6393 | 50   | 0.2594        | -               | -                     | -                      |
| 1.9672 | 60   | 0.2119        | -               | -                     | -                      |
| 2.0    | 61   | -             | 0.2753          | 0.9898                | -                      |
| 2.2951 | 70   | 0.17          | -               | -                     | -                      |
| 2.6230 | 80   | 0.1126        | -               | -                     | -                      |
| 2.9508 | 90   | 0.0538        | -               | -                     | -                      |
| 2.9836 | 91   | -             | 0.3222          | 0.9864                | -                      |
| 3.2787 | 100  | 0.1423        | -               | -                     | -                      |
| 3.6066 | 110  | 0.066         | -               | -                     | -                      |
| 3.9344 | 120  | 0.0486        | 0.3237          | 0.9872                | 0.9898                 |
  • The saved checkpoint is the one selected by load_best_model_at_end (lowest validation loss).

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
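
To approximate this environment, the listed versions can be pinned at install time (PyTorch 2.1.2 with CUDA 12.1 is typically installed separately following the official PyTorch instructions):

pip install sentence-transformers==3.1.0 transformers==4.41.2 accelerate==0.34.2 datasets==2.19.1 tokenizers==0.19.1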

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}