
BGE SITGES CAT

This is a sentence-transformers model fine-tuned from projecte-aina/ST-NLI-ca_paraphrase-multilingual-mpnet-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
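
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence vector is the average of the token vectors produced by the Transformer module. The following is a minimal, dependency-free sketch of that idea, not the library's implementation; the function name mean_pool and the toy 4-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input: three token positions, the last one is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

Padding positions are excluded through the attention mask, so the sentence vector does not depend on how much padding a batch happens to add.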

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/SITGES-aina4")
# Run inference
sentences = [
    "Mitjançant aquest tràmit la persona interessada posa en coneixement de l'Ajuntament de Sitges l'inici d'un espectacle públic o activitat recreativa de caràcter extraordinari...",
    'Quin és el paper de la persona interessada en la llicència per a espectacles públics o activitats recreatives de caràcter extraordinari?',
    "Quin és el paper del Registre de Sol·licitants d'Habitatge amb Protecció Oficial en la gestió d'habitatges?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
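
The training logs in this card report metrics at 64, 128, 256, 512 and 768 dimensions, and the card cites MatryoshkaLoss, which suggests that shortened embeddings remain usable. A minimal sketch of how an embedding could be shortened, assuming truncation to the leading components followed by L2 re-normalisation; the helper name truncate_and_normalize and the 4-dimensional toy vector are hypothetical.

```python
import math
from typing import List


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Toy vector standing in for a 768-dimensional model output.
full = [3.0, 4.0, 1.0, 2.0]
short = truncate_and_normalize(full, 2)
print(short)
```

Recent Sentence Transformers releases also expose a truncate_dim argument when constructing a SentenceTransformer; check the documentation of your installed version before relying on it.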

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.0733
cosine_accuracy@3 0.1573
cosine_accuracy@5 0.2177
cosine_accuracy@10 0.3944
cosine_precision@1 0.0733
cosine_precision@3 0.0524
cosine_precision@5 0.0435
cosine_precision@10 0.0394
cosine_recall@1 0.0733
cosine_recall@3 0.1573
cosine_recall@5 0.2177
cosine_recall@10 0.3944
cosine_ndcg@10 0.2013
cosine_mrr@10 0.1439
cosine_map@100 0.171

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.0733
cosine_accuracy@3 0.1509
cosine_accuracy@5 0.2177
cosine_accuracy@10 0.3944
cosine_precision@1 0.0733
cosine_precision@3 0.0503
cosine_precision@5 0.0435
cosine_precision@10 0.0394
cosine_recall@1 0.0733
cosine_recall@3 0.1509
cosine_recall@5 0.2177
cosine_recall@10 0.3944
cosine_ndcg@10 0.2016
cosine_mrr@10 0.1444
cosine_map@100 0.1716

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.0733
cosine_accuracy@3 0.1487
cosine_accuracy@5 0.2112
cosine_accuracy@10 0.4009
cosine_precision@1 0.0733
cosine_precision@3 0.0496
cosine_precision@5 0.0422
cosine_precision@10 0.0401
cosine_recall@1 0.0733
cosine_recall@3 0.1487
cosine_recall@5 0.2112
cosine_recall@10 0.4009
cosine_ndcg@10 0.2021
cosine_mrr@10 0.1434
cosine_map@100 0.1697

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.069
cosine_accuracy@3 0.1466
cosine_accuracy@5 0.2177
cosine_accuracy@10 0.3815
cosine_precision@1 0.069
cosine_precision@3 0.0489
cosine_precision@5 0.0435
cosine_precision@10 0.0381
cosine_recall@1 0.069
cosine_recall@3 0.1466
cosine_recall@5 0.2177
cosine_recall@10 0.3815
cosine_ndcg@10 0.1954
cosine_mrr@10 0.1398
cosine_map@100 0.166

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.056
cosine_accuracy@3 0.1379
cosine_accuracy@5 0.194
cosine_accuracy@10 0.3685
cosine_precision@1 0.056
cosine_precision@3 0.046
cosine_precision@5 0.0388
cosine_precision@10 0.0369
cosine_recall@1 0.056
cosine_recall@3 0.1379
cosine_recall@5 0.194
cosine_recall@10 0.3685
cosine_ndcg@10 0.1823
cosine_mrr@10 0.1269
cosine_map@100 0.1543
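
In every table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example 0.1573 / 3 ≈ 0.0524 and 0.3944 / 10 ≈ 0.0394). That pattern is what one would expect if each query has exactly one relevant document, although the card does not state this. Under that assumption, the metrics can be sketched as follows; the function name retrieval_metrics and the toy rankings are illustrative, not the evaluator used for the card.

```python
import math
from typing import Dict, List


def retrieval_metrics(rankings: List[List[str]], relevant: List[str], k: int) -> Dict[str, float]:
    """Metrics at cutoff k, assuming exactly one relevant document per query."""
    n = len(rankings)
    hits = 0
    reciprocal_rank_sum = 0.0
    ndcg_sum = 0.0
    for ranked_ids, gold in zip(rankings, relevant):
        top_k = ranked_ids[:k]
        if gold in top_k:
            rank = top_k.index(gold) + 1  # 1-based position of the relevant document
            hits += 1
            reciprocal_rank_sum += 1.0 / rank
            # With a single relevant document the ideal DCG is 1,
            # so nDCG reduces to 1 / log2(rank + 1).
            ndcg_sum += 1.0 / math.log2(rank + 1)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # each hit contributes 1/k
        "recall": accuracy,         # one relevant document, so recall is 0 or 1 per query
        "mrr": reciprocal_rank_sum / n,
        "ndcg": ndcg_sum / n,
    }


# Four toy queries; the last one has a relevant document that is never retrieved.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
relevant = ["a", "a", "a", "d"]
metrics = retrieval_metrics(rankings, relevant, k=2)
print(metrics)
```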

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 6
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
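
To make the interaction of these settings concrete, the sketch below computes the effective batch size and the learning rate over training. It assumes a single device (the card does not state the device count), takes the 96 total optimizer steps from the training logs, and uses the common linear-warmup plus half-cosine decay shape; the rounding of the warmup length is also an assumption.

```python
import math

PER_DEVICE_BATCH = 16
GRAD_ACCUM = 16
LEARNING_RATE = 2e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 96   # last optimizer step listed in the training logs
NUM_DEVICES = 1    # assumption: not stated in the card

# Examples contributing to each optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
# Assumption: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0.
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, warmup_steps)
for step in (0, 5, 10, 53, 96):
    print(step, lr_at(step))
```

With in-batch negatives, a larger per-device batch gives each anchor more negatives, whereas gradient accumulation only averages gradients across micro-batches, so the two settings are not interchangeable.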

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
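
The no_duplicates batch sampler is relevant to the MultipleNegativesRankingLoss cited in this card: that loss treats every other positive in the batch as a negative, so a duplicated text would be penalised as if it were a wrong answer. A simplified, dependency-free sketch of such an in-batch-negatives loss follows; the function names and the scale of 20 are assumptions chosen for illustration, not the training code behind this model.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_negatives_loss(anchors: List[List[float]],
                            positives: List[List[float]],
                            scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i should rank positive i above all other positives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Distinct pairs: each anchor matches only its own positive, so the loss is near zero.
clean = in_batch_negatives_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Duplicated pair: the copy acts as a false negative, so the loss cannot drop below log(2).
duplicated = in_batch_negatives_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
print(clean, duplicated)
```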

Training Logs

Epoch | Step | Training Loss | loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100
0.3065 | 5 | 3.3947 | - | - | - | - | - | -
0.6130 | 10 | 2.6401 | - | - | - | - | - | -
0.9195 | 15 | 2.0152 | - | - | - | - | - | -
0.9808 | 16 | - | 1.3404 | 0.1639 | 0.1577 | 0.1694 | 0.1503 | 0.1638
1.2261 | 20 | 1.4542 | - | - | - | - | - | -
1.5326 | 25 | 1.0135 | - | - | - | - | - | -
1.8391 | 30 | 0.8437 | - | - | - | - | - | -
1.9617 | 32 | - | 0.9436 | 0.1556 | 0.1596 | 0.1600 | 0.1467 | 0.1701
2.1456 | 35 | 0.7676 | - | - | - | - | - | -
2.4521 | 40 | 0.5126 | - | - | - | - | - | -
2.7586 | 45 | 0.4358 | - | - | - | - | - | -
2.9425 | 48 | - | 0.7852 | 0.1650 | 0.1693 | 0.1720 | 0.1511 | 0.1686
3.0651 | 50 | 0.4192 | - | - | - | - | - | -
3.3716 | 55 | 0.3429 | - | - | - | - | - | -
3.6782 | 60 | 0.3025 | - | - | - | - | - | -
3.9847 | 65 | 0.2863 | 0.7401 | 0.1646 | 0.1706 | 0.1759 | 0.1480 | 0.1694
4.2912 | 70 | 0.2474 | - | - | - | - | - | -
4.5977 | 75 | 0.2324 | - | - | - | - | - | -
4.9042 | 80 | 0.2344 | - | - | - | - | - | -
4.9655 | 81 | - | 0.7217 | 0.1663 | 0.1699 | 0.1767 | 0.1512 | 0.1696
5.2107 | 85 | 0.2181 | - | - | - | - | - | -
5.5172 | 90 | 0.2116 | - | - | - | - | - | -
5.8238 | 95 | 0.1926 | - | - | - | - | - | -
5.8851 | 96 | - | 0.7154 | 0.166 | 0.1697 | 0.1716 | 0.1543 | 0.171
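  • The bold row denotes the saved checkpoint. Bold formatting is not preserved in this text; the final row (step 96) is the one whose cosine_map@100 values match the metrics reported in the Evaluation section.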
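
The fractional epoch values in the log can be reproduced from the step number and the gradient accumulation setting. The sketch below assumes 261 micro-batches per epoch; this figure is inferred from the logged fractions and is not stated in the card. It also assumes that the number of optimizer steps per epoch is the floor of micro-batches divided by accumulation steps, which is consistent with the 96 total steps shown.

```python
BATCHES_PER_EPOCH = 261  # inferred from the logged epoch fractions, not stated in the card
GRAD_ACCUM = 16          # gradient_accumulation_steps from the hyperparameters
NUM_EPOCHS = 6           # num_train_epochs from the hyperparameters


def epoch_at_step(step: int) -> float:
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * GRAD_ACCUM / BATCHES_PER_EPOCH


# Assumption: a trailing partial accumulation window does not count as a step.
steps_per_epoch = BATCHES_PER_EPOCH // GRAD_ACCUM
total_steps = steps_per_epoch * NUM_EPOCHS

print(steps_per_epoch, total_steps)
for step in (5, 16, 65, 96):
    print(step, round(epoch_at_step(step), 4))
```

With a per-device batch size of 16, 261 micro-batches would correspond to roughly 4,170 training examples per epoch, again under the inferred batch count.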

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 278M params (Safetensors, tensor type F32)