
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on a CSV dataset of math question/answer texts paired with misconception names. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
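
The three modules correspond to a BERT encoder, mean pooling over non-padding tokens, and L2 normalization. As an illustration (a sketch, not part of the upstream card), the same embeddings can be computed with plain transformers:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Gurveer05/all-MiniLM-eedi-2024")
encoder = AutoModel.from_pretrained("Gurveer05/all-MiniLM-eedi-2024")

batch = tokenizer(["Rounds up instead of down"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")  # max_seq_length: 256

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 384])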

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/all-MiniLM-eedi-2024")
# Run inference
sentences = [
    'Construct:  Solve coordinate geometry questions involving ratio.\n\nQuestion:  A straight line on squared paper. Points P, Q and R lie on this line. The leftmost end of the line is labelled P. If you travel right 4 squares and up 1 square you get to point Q. If you then travel 8 squares right and 2 squares up from Q you reach point R. What is the ratio of  P Q: P R  ?\n\nOptions:\nA. 1: 12\nB. 1: 4\nC. 1: 2\nD. 1: 3\n\nCorrect Answer: 1: 3\n\nIncorrect Answer: 1: 2\n\nPredicted Misconception: Misunderstanding the ratio calculation by not considering the correct horizontal and vertical distances between points P, Q, and R.',
    'May have estimated when using ratios with geometry',
    'Thinks x = y is an axis',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
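
Since the training data pairs question/answer texts with misconception names, a natural downstream use is ranking a bank of candidate misconceptions against a new incorrect answer. A short sketch (the candidate pool and query below are illustrative, not the model's actual data pipeline):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Gurveer05/all-MiniLM-eedi-2024")

# Illustrative candidate pool; in practice this would be the full misconception bank.
misconceptions = [
    "Rounds up instead of down",
    "Answers as if there are 60 hours in a day",
    "Thinks x = y is an axis",
]
query = "Question: What is 20.15349 rounded to 3 decimal places? Incorrect Answer: 20.154"

query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(misconceptions, convert_to_tensor=True)

# Embeddings are L2-normalized, so cosine similarity ranks the candidates.
hits = util.semantic_search(query_emb, cand_embs, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {misconceptions[hit['corpus_id']]}")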

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 12,210 training samples
  • Columns: qa_pair_text, MisconceptionName, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • qa_pair_text: min 54 tokens, mean 121.45 tokens, max 256 tokens
    • MisconceptionName: min 4 tokens, mean 15.16 tokens, max 39 tokens
    • negative: min 7 tokens, mean 14.49 tokens, max 40 tokens
  • Samples:
    • Sample 1:
      • qa_pair_text: Construct: Construct frequency tables. Question: Dave has recorded the number of pets his classmates have in the frequency table on the right. Number of pets / Frequency: 0 / 4, 1 / …
      • MisconceptionName: …
      • negative: …
    • Sample 2:
      • qa_pair_text: Construct: Convert between any other time periods. Question: To work out how many hours in a year you could do... Options: A. 365 x 7, B. 365 x 60, C. 365 x 12, D. 365 x 24. Correct Answer: 365 x 24. Incorrect Answer: 365 x 60. Predicted Misconception: Multiplying days by hours per minute instead of hours per day.
      • MisconceptionName: Answers as if there are 60 hours in a day
      • negative: Confuses an equation with an expression
    • Sample 3:
      • qa_pair_text: Construct: Given information about one part, work out other parts. Question: Jess and Heena share some sweets in the ratio 3 : 5. Jess gets 15 sweets. How many sweets does Heena get? Options: A. 17, B. 9, C. 5, D. 25. Correct Answer: 25. Incorrect Answer: 17. Predicted Misconception: Misunderstanding the direct proportionality between the ratio and actual quantities.
      • MisconceptionName: Thinks a difference of one part in a ratio means the quantities will differ by one unit
      • negative: Believes dividing two positives will give a negative answer
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
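
With this loss, each qa_pair_text acts as the anchor, its MisconceptionName as the positive, and the negative column plus every other in-batch candidate as negatives; cosine similarities are multiplied by scale=20.0 before the softmax cross-entropy. A minimal wiring sketch (assuming the triplets sit in a local train.csv with the columns above; this is not the author's exact training script):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Column order matters to the loss: (anchor, positive, hard negative).
train_dataset = load_dataset("csv", data_files="train.csv", split="train") \
    .select_columns(["qa_pair_text", "MisconceptionName", "negative"])

# scale=20.0 sharpens the softmax over the cosine similarities
loss = MultipleNegativesRankingLoss(model, scale=20.0)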
    

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 9,640 evaluation samples
  • Columns: qa_pair_text, MisconceptionName, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • qa_pair_text: min 56 tokens, mean 119.35 tokens, max 256 tokens
    • MisconceptionName: min 6 tokens, mean 14.51 tokens, max 39 tokens
    • negative: min 6 tokens, mean 13.86 tokens, max 40 tokens
  • Samples:
    • Sample 1:
      • qa_pair_text: Construct: Identify when rounding a calculation will give an over or under approximation. Question: Tom and Katie are discussing how to estimate the answer to 38.8745 / 7.9302. Tom says 40 / 7.9302 would give an overestimate. Katie says 38.8745 / 8 would give an overestimate. Who is correct? Options: A. Only Tom, B. Only Katie, C. Both Tom and Katie, D. Neither is correct. Correct Answer: Only Tom. Incorrect Answer: Neither is correct. Predicted Misconception: Rounding both numbers up leads to an overestimate.
      • MisconceptionName: Believes that the larger the dividend, the smaller the answer.
      • negative: Does not know how to calculate the mean
    • Sample 2:
      • qa_pair_text: Construct: Substitute negative integer values into expressions involving no powers or roots. Question: Amy is trying to work out the distance between these two points: (1,-6) and (-5,2). She labels them like this: x_1, y_1, x_2 …
      • MisconceptionName: …
      • negative: …
    • Sample 3:
      • qa_pair_text: Construct: Round numbers to three or more decimal places. Question: What is 20.15349 rounded to 3 decimal places? Options: A. 20.153, B. 20.15, C. 20.154, D. 20.253. Correct Answer: 20.153. Incorrect Answer: 20.154. Predicted Misconception: Rounding up the fourth decimal place without considering the fifth decimal place.
      • MisconceptionName: Rounds up instead of down
      • negative: When dividing decimals, does not realize that the order and position of the digits (relative to each other) has to remain constant.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 8
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 40
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {'num_cycles': 20}
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • batch_sampler: no_duplicates
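
These non-default values map onto the Sentence Transformers 3.x trainer roughly as follows; a sketch for reproduction (output_dir is a hypothetical placeholder, not from the card):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-eedi-2024",          # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,              # effective train batch size: 32 * 8 = 256
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=40,
    lr_scheduler_type="cosine",
    lr_scheduler_kwargs={"num_cycles": 20},
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,                # keeps the checkpoint with the lowest validation loss
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts in a batch (important for MNRL)
)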

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 40
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {'num_cycles': 20}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch     Step   Training Loss   Validation Loss
0.5026    12     2.2789          -
1.0052    24     2.1642          1.9746
1.4974    36     2.0463          -
2.0       48     1.8955          1.6808
2.4921    60     1.7692          -
2.9948    72     1.6528          1.4532
3.4869    84     1.5298          -
3.9895    96     1.4338          1.2853
4.4817    108    1.3374          -
4.9843    120    1.3084          1.2465
5.4764    132    1.2921          -
5.9791    144    1.2143          1.1766
6.4712    156    1.1689          -
6.9738    168    1.1656          1.1518
7.4660    180    1.1172          -
7.9686    192    1.0737          1.1080
8.4607    204    1.0373          -
8.9634    216    1.0445          1.0874
9.4555    228    0.9707          -
9.9581    240    0.9644          1.0649
10.4503   252    0.9252          -
10.9529   264    0.9211          1.0367
11.4450   276    0.8645          -
11.9476   288    0.8635          1.0297
12.4398   300    0.8279          -
12.9424   312    0.819           1.0161
13.4346   324    0.7684          -
13.9372   336    0.7842          1.0016
14.4293   348    0.7448          -
14.9319   360    0.7321          0.9951
15.4241   372    0.7064          -
15.9267   384    0.7161          0.9835
16.4188   396    0.6692          -
16.9215   408    0.6594          0.9774
17.4136   420    0.6405          -
17.9162   432    0.638           0.9723
18.4084   444    0.6             -
18.9110   456    0.6122          0.9706
19.4031   468    0.5763          -
19.9058   480    0.5787          0.9732
20.3979   492    0.5432          -
20.9005   504    0.5599          0.9618
21.3927   516    0.5245          -
21.8953   528    0.5278          0.9626
22.3874   540    0.4989          -
22.8901   552    0.509           0.9583
23.3822   564    0.4674          -
23.8848   576    0.4854          0.9573   ← saved checkpoint
24.3770   588    0.4619          -
24.8796   600    0.4631          0.9615
25.3717   612    0.4339          -
25.8743   624    0.4427          0.9593
26.3665   636    0.4225          -
26.8691   648    0.4245          0.9694
27.3613   660    0.3936          -
27.8639   672    0.4168          0.9586
28.3560   684    0.3835          -
28.8586   696    0.3921          0.9629
  • The marked row denotes the saved checkpoint: with load_best_model_at_end enabled, the kept checkpoint is the one with the lowest validation loss.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}