base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets:
  - sentence-transformers/quora-duplicates
language:
  - en
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:323432
  - loss:OnlineContrastiveLoss
widget:
  - source_sentence: >-
      How do I have a successful career in animation industry with all distance
      mode of education (from schooling)?
    sentences:
      - >-
        The LINE app is blocked in China. I bought a VPN, but it's still not
        working. Can someone help me?
      - What is independent?
      - How do I find all distance education schools in any city?
  - source_sentence: How can I get the funding for my startup without revealing my idea?
    sentences:
      - How has demonetization affected big business people like Mukesh Ambani?
      - How should I go about getting funding for my idea?
      - What are the advantages and disadvantages of studying an MBBS in China?
  - source_sentence: >-
      I am an okay looking young women but I am always feeling ugly since I'm
      not extremely beautiful. How can I stop those thoughts?
    sentences:
      - >-
        Whenever I think about my failures in life, I always feel that I lack
        some qualities. But which are those qualities, I am not able to find
        out. How can I find which qualities I lack?
      - What songs make you cry?
      - What does histrionic personality disorder feel like physically to you?
  - source_sentence: >-
      What do you think of Prime Minister Narendra Modi's decision to introduce
      new INR 500 and INR 2000 currency notes?
    sentences:
      - >-
        What do you think of the decision by the Indian Government to replace
        1000 notes with 2000 notes?
      - How do you find volume from density and mass?
      - What are the consequences of having a blood sugar level over 300?
  - source_sentence: Why do complementary angles have to be adjacent?
    sentences:
      - What is an AEG airsoft gun?
      - How can I get rid of my bad habits?
      - Can two adjacent angles be complementary?
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.8683618194860125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.7981455326080322
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.8292439905343131
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7598952651023865
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.7746589487768696
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8921046460992195
            name: Cosine Recall
          - type: cosine_ap
            value: 0.8822291610822541
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.8359964382003018
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 17.112058639526367
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.7914425390403506
            name: Dot F1
          - type: dot_f1_threshold
            value: 16.083341598510742
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.7294350282485875
            name: Dot Precision
          - type: dot_recall
            value: 0.8649716946370549
            name: Dot Recall
          - type: dot_ap
            value: 0.8438654629805356
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.8568230725469341
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 46.94310760498047
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.8144082547946494
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 50.51482391357422
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.7656268427880646
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8698288279234918
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.8636170591577621
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.8568849093472507
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 3.0017127990722656
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.8143016129285076
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 3.2429399490356445
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.7652309686542541
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8700968076910194
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.8637642883474006
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.8683618194860125
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 46.94310760498047
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.8292439905343131
            name: Max F1
          - type: max_f1_threshold
            value: 50.51482391357422
            name: Max F1 Threshold
          - type: max_precision
            value: 0.7746589487768696
            name: Max Precision
          - type: max_recall
            value: 0.8921046460992195
            name: Max Recall
          - type: max_ap
            value: 0.8822291610822541
            name: Max Ap

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 on the sentence-transformers/quora-duplicates dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
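
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings, with padded positions left out. A minimal NumPy sketch of that idea follows; the function name `mean_pool` and the toy 2-dimensional inputs are illustrative assumptions, not part of the library.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    tokens = np.asarray(token_embeddings, dtype=float)        # (seq_len, dim)
    mask = np.asarray(attention_mask, dtype=float)[:, None]   # (seq_len, 1)
    summed = (tokens * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # guard against an all-padding input
    return summed / count

# Toy example: two real tokens and one padding token.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
pooled = mean_pool(toy_tokens, toy_mask)
print(pooled)
```

In the real model the token vectors have 384 dimensions and at most 128 tokens are kept per input (`max_seq_length`).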

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DashReza7/paraphrase-multilingual-MiniLM-L12-v2_QuoraDuplicateDetection_FINETUNED")
# Run inference
sentences = [
    'Why do complementary angles have to be adjacent?',
    'Can two adjacent angles be complementary?',
    'How can I get rid of my bad habits?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
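
To turn a similarity score into a duplicate / not-duplicate decision, one option is to compare the cosine similarity against the `cosine_accuracy_threshold` reported under Evaluation. The sketch below assumes that approach; the helper names `cosine_similarity` and `is_duplicate` are hypothetical, and toy 2-dimensional vectors stand in for real embeddings so the snippet runs without downloading the model.

```python
import numpy as np

# Threshold taken from the cosine_accuracy_threshold metric in this card.
COSINE_ACCURACY_THRESHOLD = 0.7981

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_duplicate(emb_a, emb_b, threshold=COSINE_ACCURACY_THRESHOLD):
    """Label a pair as duplicate when its cosine similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy vectors standing in for model.encode(...) outputs.
near_pair = is_duplicate([1.0, 0.0], [1.0, 0.1])   # almost parallel vectors
far_pair = is_duplicate([1.0, 0.0], [0.0, 1.0])    # orthogonal vectors
print(near_pair, far_pair)
```

With real embeddings, pass two rows of the `embeddings` array instead of the toy vectors. The threshold was tuned on the evaluation split of the Quora data, so it may need adjusting for other domains.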

Evaluation

Metrics

Binary Classification

| Metric | Value |
|--------|-------|
| cosine_accuracy | 0.8684 |
| cosine_accuracy_threshold | 0.7981 |
| cosine_f1 | 0.8292 |
| cosine_f1_threshold | 0.7599 |
| cosine_precision | 0.7747 |
| cosine_recall | 0.8921 |
| cosine_ap | 0.8822 |
| dot_accuracy | 0.836 |
| dot_accuracy_threshold | 17.1121 |
| dot_f1 | 0.7914 |
| dot_f1_threshold | 16.0833 |
| dot_precision | 0.7294 |
| dot_recall | 0.865 |
| dot_ap | 0.8439 |
| manhattan_accuracy | 0.8568 |
| manhattan_accuracy_threshold | 46.9431 |
| manhattan_f1 | 0.8144 |
| manhattan_f1_threshold | 50.5148 |
| manhattan_precision | 0.7656 |
| manhattan_recall | 0.8698 |
| manhattan_ap | 0.8636 |
| euclidean_accuracy | 0.8569 |
| euclidean_accuracy_threshold | 3.0017 |
| euclidean_f1 | 0.8143 |
| euclidean_f1_threshold | 3.2429 |
| euclidean_precision | 0.7652 |
| euclidean_recall | 0.8701 |
| euclidean_ap | 0.8638 |
| max_accuracy | 0.8684 |
| max_accuracy_threshold | 46.9431 |
| max_f1 | 0.8292 |
| max_f1_threshold | 50.5148 |
| max_precision | 0.7747 |
| max_recall | 0.8921 |
| max_ap | 0.8822 |
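
The F1 values above are the harmonic mean of the precision and recall reported for the same similarity function. As a quick consistency check, the sketch below recomputes F1 from the unrounded precision and recall in the card metadata; the helper name `f1_score` is an illustrative assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Unrounded precision / recall values from the model-index metadata.
cosine_f1 = f1_score(0.7746589487768696, 0.8921046460992195)
dot_f1 = f1_score(0.7294350282485875, 0.8649716946370549)

print(round(cosine_f1, 4), round(dot_f1, 4))
```

Both results agree with the reported `cosine_f1` and `dot_f1` to four decimal places.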

Training Details

Training Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 323,432 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | label |
    |---------|-----------|-----------|-------|
    | type    | string | string | int |
    | details | min: 6 tokens; mean: 16.39 tokens; max: 80 tokens | min: 4 tokens; mean: 16.2 tokens; max: 71 tokens | 0: ~62.10%; 1: ~37.90% |

  • Samples:

    | sentence1 | sentence2 | label |
    |-----------|-----------|-------|
    | Which are the best compilers for C language (for Windows 10)? | Which is the best open source C/C++ compiler for Windows? | 0 |
    | How much does YouTube pay per 1000 views in India? | How much does youtube pay per 1000 views? | 0 |
    | What parts do I need to build my own PC? | I want to build a new computer. What parts do I need? | 1 |
  • Loss: OnlineContrastiveLoss
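
An online contrastive loss works on pair distances within a batch and only penalizes the hard pairs: positives that are farther apart than the closest negative, and negatives that are closer than the farthest positive. The NumPy sketch below is a simplified illustration of that idea, not the library's code. It assumes cosine distance as input and a margin of 0.5, neither of which is stated in this card, and it assumes the batch contains both labels.

```python
import numpy as np

def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss restricted to hard positive and hard negative pairs.

    distances: per-pair distances (assumed here: 1 - cosine similarity).
    labels:    1 for duplicate pairs, 0 for non-duplicates.
    margin:    assumed value; negatives farther than this contribute nothing.
    """
    dist = np.asarray(distances, dtype=float)
    lab = np.asarray(labels)
    negs = dist[lab == 0]
    poss = dist[lab == 1]

    # Hard negatives: closer than the farthest positive.
    neg_cut = poss.max() if len(poss) > 1 else negs.mean()
    # Hard positives: farther apart than the closest negative.
    pos_cut = negs.min() if len(negs) > 1 else poss.mean()
    hard_negs = negs[negs < neg_cut]
    hard_poss = poss[poss > pos_cut]

    positive_loss = (hard_poss ** 2).sum()
    negative_loss = (np.maximum(margin - hard_negs, 0.0) ** 2).sum()
    return float(positive_loss + negative_loss)

# Toy batch: two duplicate pairs and two non-duplicate pairs.
loss = online_contrastive_loss([0.1, 0.6, 0.2, 0.9], [1, 1, 0, 0])
print(loss)
```

In the toy batch only the positive at distance 0.6 and the negative at distance 0.2 count as hard; the easy pairs (0.1 and 0.9) are ignored.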

Evaluation Dataset

sentence-transformers/quora-duplicates

  • Dataset: sentence-transformers/quora-duplicates at 451a485
  • Size: 80,858 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | label |
    |---------|-----------|-----------|-------|
    | type    | string | string | int |
    | details | min: 6 tokens; mean: 16.48 tokens; max: 79 tokens | min: 6 tokens; mean: 16.76 tokens; max: 101 tokens | 0: ~63.90%; 1: ~36.10% |

  • Samples:

    | sentence1 | sentence2 | label |
    |-----------|-----------|-------|
    | How many stories got busted on Quora while being anonymous? | Can what I say on Quora anonymously be used against me legally? | 0 |
    | What are innovative mechanical component designs? | What is the Innovation design? | 0 |
    | What is the best way to learn phrasal verbs? | Why should I learn phrasal verbs? | 1 |
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
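
These settings determine the length of the training run. The short calculation below derives the number of optimizer steps and warmup steps from the dataset size and batch size, assuming a single device, no gradient accumulation, and rounding up for the last partial batch and for the warmup count. The variable names are illustrative.

```python
import math

train_samples = 323_432   # size of the training dataset
batch_size = 256          # per_device_train_batch_size
num_epochs = 1            # num_train_epochs
warmup_ratio = 0.1        # warmup_ratio

# The last partial batch still counts as a step (dataloader_drop_last is False).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
# Assumption: the warmup step count is rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Fraction of the epoch completed at step 100.
epoch_at_step_100 = round(100 / steps_per_epoch, 4)

print(steps_per_epoch, warmup_steps, epoch_at_step_100)
```

The epoch fraction at step 100 matches the first row of the Training Logs (0.0791), which supports the single-device assumption.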

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

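| Epoch | Step | Training Loss | Validation Loss | max_ap |
|-------|------|---------------|-----------------|--------|
| 0.0791 | 100 | - | 8.0607 | 0.8164 |
| 0.1582 | 200 | - | 7.3012 | 0.8445 |
| 0.2373 | 300 | - | 6.9626 | 0.8582 |
| 0.3165 | 400 | - | 6.7901 | 0.8639 |
| 0.3956 | 500 | 7.5229 | 6.6498 | 0.8694 |
| 0.4747 | 600 | - | 6.5315 | 0.8736 |
| 0.5538 | 700 | - | 6.4686 | 0.8766 |
| 0.6329 | 800 | - | 6.4027 | 0.8787 |
| 0.7120 | 900 | - | 6.3108 | 0.8797 |
| 0.7911 | 1000 | 6.4636 | 6.2862 | 0.8812 |
| 0.8703 | 1100 | - | 6.2449 | 0.8818 |
| 0.9494 | 1200 | - | 6.2344 | 0.8822 |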

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}