SentenceTransformer based on sentence-transformers/paraphrase-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-mpnet-base-v2 on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-mpnet-base-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: csv

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
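
The pooling line above means token embeddings are mean-pooled into a single sentence vector; the CLS token is not used. As a quick sanity check, the loaded model exposes the two key settings directly (a minimal sketch using standard Sentence Transformers accessors):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Gurveer05/paraphrase-mpnet-eedi-2024")

# Inputs longer than 512 tokens are truncated by the Transformer module.
print(model.max_seq_length)                      # 512
# Mean pooling produces one 768-dimensional vector per input text.
print(model.get_sentence_embedding_dimension())  # 768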

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/paraphrase-mpnet-eedi-2024")
# Run inference
sentences = [
    'Question:\nUnderstand key loci terms like equidistant and perpendicular. A set of axes: x-axis from -4 to 4, y-axis from -4 to 4. A red line is drawn from (-2,2) to (2,2). The red line is _________ to the y axis..\nAnswer: Equidistant',
    'Does not know the meaning of perpendicular',
    'Believes squaring a negative number just changes the sign',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
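
Since the training pairs match question/answer texts (sentence1) to misconception descriptions (sentence2), a natural use is ranking candidate misconceptions for a student response. A minimal sketch, reusing misconceptions that appear in the training samples further down this card:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Gurveer05/paraphrase-mpnet-eedi-2024")

query = (
    "Question:\nFind missing angles using angles around a point. "
    "What is the size of angle x ? Angles around a point split into two parts, "
    "one is labelled 290 degrees and the other x.\nAnswer: 45°"
)
candidates = [
    "Does not know that angles around a point sum to 360",
    "Does not know that angles in a triangle sum to 180 degrees",
    "Believes the inverse of square rooting is halving",
]

# Embed the query and all candidates, then rank candidates by cosine similarity.
query_emb = model.encode([query])
cand_embs = model.encode(candidates)
scores = model.similarity(query_emb, cand_embs)[0]  # shape: [len(candidates)]

for score, text in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.3f}  {text}")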

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 2,940 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 20 tokens, mean: 61.48 tokens, max: 312 tokens
    • sentence2: string; min: 4 tokens, mean: 14.98 tokens, max: 38 tokens
  • Samples:
    • sentence1: Question: Find missing angles in a scalene triangle. What is the size of angle p ? A triangle with angles labelled 49 degrees, 51 degrees and p [not to scale]. Answer: Not enough information
      sentence2: Does not know that angles in a triangle sum to 180 degrees
    • sentence1: Question: Solve quadratic equations using balancing. A student wishes to solve the equation below. Which of the following is a correct next step? Equation: (d+3)^2 - 25 = 0. Step 1: (d+3)^2 = 25. Answer: d+3=12.5
      sentence2: Believes the inverse of square rooting is halving
    • sentence1: Question: Find missing angles using angles around a point. What is the size of angle x ? Angles around a point split into two parts, one is labelled 290 degrees and the other x. Answer: 45°
      sentence2: Does not know that angles around a point sum to 360
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
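
A minimal sketch of how this loss is constructed in Sentence Transformers; the scale and similarity function match the parameters listed above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

# Symmetric variant of MultipleNegativesRankingLoss: in-batch negatives are
# scored in both directions (sentence1 -> sentence2 and sentence2 -> sentence1).
loss = losses.MultipleNegativesSymmetricRankingLoss(
    model, scale=20.0, similarity_fct=util.cos_sim
)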
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 20
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
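
These map directly onto SentenceTransformerTrainingArguments in Sentence Transformers 3.x. A minimal sketch (output_dir is an illustrative placeholder; the remaining values are the non-defaults listed above):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",                       # placeholder, not from the card
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=20,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)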

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.25 23 1.4213
0.5 46 1.0924
0.75 69 0.9586
1.0 92 0.8914
1.25 115 0.579
1.5 138 0.5327
1.75 161 0.4746
2.0 184 0.4323
2.25 207 0.3155
2.5 230 0.2763
2.75 253 0.2408
3.0 276 0.2677
3.25 299 0.1763
3.5 322 0.1815
3.75 345 0.1536
4.0 368 0.1789
4.25 391 0.1331
4.5 414 0.119
4.75 437 0.1183
5.0 460 0.1423
5.25 483 0.0979
5.5 506 0.0894
5.75 529 0.0816
6.0 552 0.0853
6.25 575 0.0779
6.5 598 0.0632
6.75 621 0.0618
7.0 644 0.0798
7.25 667 0.0536
7.5 690 0.0615
7.75 713 0.0473
8.0 736 0.0536
8.25 759 0.0392
8.5 782 0.0551
8.75 805 0.0405
9.0 828 0.0519
9.25 851 0.0299
9.5 874 0.0355
9.75 897 0.0337
10.0 920 0.0324
10.25 943 0.0283
10.5 966 0.0293
10.75 989 0.0248
11.0 1012 0.0281
11.25 1035 0.0142
11.5 1058 0.022
11.75 1081 0.0159
12.0 1104 0.0188
12.25 1127 0.0078
12.5 1150 0.0142
12.75 1173 0.0148
13.0 1196 0.0126
13.25 1219 0.0077
13.5 1242 0.0115
13.75 1265 0.0119
14.0 1288 0.0086
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}