	---
	base_model: intfloat/multilingual-e5-small
	language:
	- multilingual
	library_name: sentence-transformers
	license: apache-2.0
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	pipeline_tag: sentence-similarity
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:94
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	widget:
	- source_sentence: 서울여자대학교 수시모집 지원자에게 필요한 최초합격자 발표 정보는 다음과 같습니다. 최초합격자 발표는 2024년 11월
	    8일부터 12월 13일까지입니다. 합격자는 본교 입학처 홈페이지에서 합격 여부를 확인하여야 하며, 등록기간 내에 등록을 마쳐야 합니다.
	  sentences:
	  - SWU의 SI(Social Innovation)교육에 대해 알려줘.
	  - 학교생활기록부 교과성적 반영방법을 설명해 주세요.
	  - 서울여자대학교 수시모집 지원자에게 필요한 최초합격자 발표 정보를 알려줘.
	- source_sentence: 고등학교 졸업(예정)자의 경우 학교생활기록부 제출 방법은 다음과 같습니다. 원본 대조필 및 학교장 직인 날인 후
	    제출하여야 합니다. 외국 고등학교 졸업(예정)자의 경우는 한국어나 영어로 번역 공증받은 문서를 제출하여야 합니다.
	  sentences:
	  - 언론영상학부-저널리즘전공의 졸업 후 진로는 무엇입니까?
	  - 서울여자대학교에 있는 박물관학전공의 교육 내용을 설명해줘.
	  - 고등학교 졸업(예정)자의 경우 학교생활기록부 제출 방법을 설명해줘.
	- source_sentence: 심리·인지과학학부-인지학습과학전공의 졸업 후 진로는 교육프로그램 개발자, 교육기업 데이터 분석 업무, 인지학습 치료사,
	    인지행동 치료사, 교육컨설턴트, 국가연구소, 이러닝 관련 산업분야 등입니다.
	  sentences:
	  - 서울여자대학교에 있는 예술심리치료전공의 목표를 설명해줘.
	  - 서울여자대학교 수시모집 지원자에게 필요한 교과성적 산출 방법을 설명해줘.
	  - 심리·인지과학학부-인지학습과학전공의 졸업 후 진로를 설명하세요.
	- source_sentence: 2024학년도 서울여자대학교 수시모집 지원자에게 필요한 정보는 다음과 같습니다. 수시모집 지원기간은 2024년 9월
	    10일부터 9월 13일까지입니다. 지원자는 인터넷 입학원서접수 사이트에 접속하여 원서접수를 완료해야 하며, 전형료 결제는 신용카드, 계좌이체
	    등으로 가능합니다. 또한, 지원자는 제출서류를 등기우편으로 제출하여야 하며, 서류제출 마감일은 2024년 9월 13일입니다.
	  sentences:
	  - 박물관학전공의 교육 목표는 무엇입니까?
	  - 2024학년도 서울여자대학교 수시모집 지원자에게 필요한 정보를 알려줘.
	  - 학생부종합 전형으로 지원할 수 있는 전형의 유형을 모두 알려줘
	- source_sentence: 학교생활기록부 교과성적 대체 점수(비교내신) 대상자는 논술(논술우수자전형), 실기/실적(실기우수자전형_체육) 지원자
	    중 고등학교 졸업학력 검정고시 출신 지원자 및 교과성적 산출 불가자입니다.
	  sentences:
	  - 고등학교 학교생활기록부 제출 방법을 설명하세요.
	  - 청소년학전공의 교육 내용은 무엇입니까?
	  - 학교생활기록부 교과성적 대체 점수(비교내신) 대상자를 알려줘.
	model-index:
	- name: Multilingual base SWU Matryoshka
	  results:
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 256
	      type: dim_256
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.6363636363636364
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.9090909090909091
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.6363636363636364
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.30303030303030304
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.6363636363636364
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.9090909090909091
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.8475878017079786
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.7954545454545454
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.7954545454545454
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 128
	      type: dim_128
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.6363636363636364
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.9090909090909091
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.6363636363636364
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.30303030303030304
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.6363636363636364
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.9090909090909091
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.8475878017079786
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.7954545454545454
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.7954545454545454
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 64
	      type: dim_64
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.6363636363636364
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.9090909090909091
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.6363636363636364
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.30303030303030304
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.6363636363636364
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.9090909090909091
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.8356850968378461
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.7803030303030302
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.7803030303030302
	      name: Cosine Map@100
	---

	# Multilingual base SWU Matryoshka

	This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) on the json dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) <!-- at revision fd1525a9fd15316a2d503bf26ab031a61d056e98 -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 384 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset:
	  - json
	- Language: multilingual
	- License: apache-2.0

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
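
The last two modules above are mean pooling followed by L2 normalization. The following is a minimal pure-Python sketch of those two steps on toy data, not the library's implementation; the helper names `mean_pool` and `l2_normalize` and the 2-dimensional token vectors are illustrative assumptions.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output is unit-length, cosine similarity between two sentence embeddings reduces to a plain dot product.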

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("ValentinaKim/Multilingual-base-SWU-Matryoshka")
	# Run inference
	sentences = [
	    '학교생활기록부 교과성적 대체 점수(비교내신) 대상자는 논술(논술우수자전형), 실기/실적(실기우수자전형_체육) 지원자 중 고등학교 졸업학력 검정고시 출신 지원자 및 교과성적 산출 불가자입니다.',
	    '학교생활기록부 교과성적 대체 점수(비교내신) 대상자를 알려줘.',
	    '청소년학전공의 교육 내용은 무엇입니까?',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 384]

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# [3, 3]
	```
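
Since the model was trained with Matryoshka dimensions 256, 128 and 64, its embeddings can plausibly be shortened to one of those sizes. The sketch below shows the idea on toy vectors in pure Python: keep the leading components, then re-normalize before comparing. The helper names and the 8-dimensional vectors are illustrative assumptions; recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, which is worth checking against the documentation of the installed version.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0
    return [v / norm for v in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)


# Toy 8-dimensional "embeddings" standing in for the 384-dimensional output.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.01, 0.02, 0.03]
passage = [0.8, 0.2, 0.4, 0.1, 0.01, 0.04, 0.03, 0.02]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Smaller dimensions reduce storage and search cost; the evaluation tables in this card indicate how much retrieval quality changes at each size.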

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Information Retrieval
	* Dataset: `dim_256`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.6364 |
	| cosine_accuracy@3   | 0.9091 |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.6364 |
	| cosine_precision@3  | 0.303  |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.6364 |
	| cosine_recall@3     | 0.9091 |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.8476 |
	| cosine_mrr@10       | 0.7955 |
	| cosine_map@100      | 0.7955 |

	#### Information Retrieval
	* Dataset: `dim_128`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.6364 |
	| cosine_accuracy@3   | 0.9091 |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.6364 |
	| cosine_precision@3  | 0.303  |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.6364 |
	| cosine_recall@3     | 0.9091 |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.8476 |
	| cosine_mrr@10       | 0.7955 |
	| cosine_map@100      | 0.7955 |

	#### Information Retrieval
	* Dataset: `dim_64`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.6364 |
	| cosine_accuracy@3   | 0.9091 |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.6364 |
	| cosine_precision@3  | 0.303  |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.6364 |
	| cosine_recall@3     | 0.9091 |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.8357 |
	| cosine_mrr@10       | 0.7803 |
	| cosine_map@100      | 0.7803 |
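
To make the metric definitions concrete, the sketch below computes them from the rank at which each query's relevant passage was retrieved. It assumes exactly one relevant passage per query, which is consistent with the reported numbers (recall equals accuracy, precision@k equals accuracy@k divided by k, and MAP@100 equals MRR@10). The rank list is hypothetical: it is one distribution over 11 queries that reproduces the `dim_256` values, not the evaluator's actual output, and `ir_metrics` is an illustrative helper rather than part of Sentence Transformers.

```python
import math


def ir_metrics(ranks, k_values=(1, 3, 5, 10)):
    """Retrieval metrics when each query has exactly one relevant passage.

    `ranks` holds the 1-based position of that passage for every query.
    """
    n = len(ranks)
    metrics = {}
    for k in k_values:
        hits = sum(1 for r in ranks if r <= k)
        metrics[f"accuracy@{k}"] = hits / n
        metrics[f"precision@{k}"] = hits / (n * k)  # at most one hit among k results
        metrics[f"recall@{k}"] = hits / n
    # Reciprocal rank, and DCG with an ideal DCG of 1 (single relevant passage).
    metrics["mrr@10"] = sum(1.0 / r for r in ranks if r <= 10) / n
    metrics["ndcg@10"] = sum(1.0 / math.log2(r + 1) for r in ranks if r <= 10) / n
    return metrics


# Hypothetical ranks: 7 queries answered at rank 1, 3 at rank 2, 1 at rank 4.
hypothetical_ranks = [1] * 7 + [2] * 3 + [4]
metrics = ir_metrics(hypothetical_ranks)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With so few evaluation queries, a single query moving by one rank shifts these scores noticeably, so small differences between dimensions should be read with caution.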

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### json

	* Dataset: json
	* Size: 94 training samples
	* Columns: <code>positive</code> and <code>anchor</code>
	* Approximate statistics based on the first 94 samples:
	|         | positive | anchor |
	|:--------|:---------|:-------|
	| type    | string   | string |
	| details | <ul><li>min: 24 tokens</li><li>mean: 89.93 tokens</li><li>max: 272 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 19.18 tokens</li><li>max: 35 tokens</li></ul> |
	* Samples:
	| positive | anchor |
	|:---------|:-------|
	| <code>서울여자대학교 수시모집에서 평가하는 요소는 다음과 같습니다. 1. 서류 평가(학업역량 40%, 진로역량 35%, 공동체역량 25%) 2. 면접 평가(인성 및 의사소통능력, 발전가능성) 3. 학교생활기록부에 학교폭력 관련 기재사항이 있을 경우, 정성평가로 반영합니다.</code> | <code>서울여자대학교 수시모집에서 평가하는 요소를 알려줘.</code> |
	| <code>서울여자대학교 학생부종합전형 지원자에게 필요한 지원자격 정보는 다음과 같습니다. 지원자격은 기초생활수급자, 차상위계층, 한부모가족 지원대상자, 국가보훈대상자, 자립지원 대상 아동, 농어촌학생 등입니다. 각 지원자격에 따라 필요한 제출서류가 다르므로, 지원자격에 따라 필요한 제출서류를 확인하여야 합니다.</code> | <code>서울여자대학교 학생부종합전형 지원자에게 필요한 지원자격 정보를 알려줘.</code> |
	| <code>SWU의 SI(Social Innovation)교육은 사회적 가치 확산을 위해 혁신적인 방법론을 적용하여 긍정적인 사회 변화를 유도하는 서울여자대학교만의 차별화된 교육입니다. 바롬종합설계프로젝트는 유네스코한국위원회가 인증한 유네스코지속가능발전교육공식프로젝트입니다.</code> | <code>SWU의 SI(Social Innovation)교육에 대해 알려줘.</code> |
	* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
	```json
	{
	    "loss": "MultipleNegativesRankingLoss",
	    "matryoshka_dims": [
	        256,
	        128,
	        64
	    ],
	    "matryoshka_weights": [
	        1,
	        1,
	        1
	    ],
	    "n_dims_per_step": -1
	}
	```
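
The loss configuration above can be read as: apply a ranking loss with in-batch negatives to the embeddings truncated to each Matryoshka dimension, then add the results up with the given weights. The following pure-Python sketch illustrates that idea on toy vectors. It is not the Sentence Transformers implementation; the function names are hypothetical, and the cosine similarity with `scale=20.0` is an assumption about the base loss settings, which the card does not state.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates: positives[i] is the correct
    match for anchors[i]; every other positive acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims=(256, 128, 64), weights=(1, 1, 1)):
    """Weighted sum of the base loss on embeddings truncated to each dimension."""
    return sum(
        weight * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, weight in zip(dims, weights)
    )


# Toy 4-dimensional batch, so the dims are scaled down to (4, 2) for the demo.
anchors = [[1.0, 0.0, 0.1, 0.0], [0.0, 1.0, 0.0, 0.1]]
aligned = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
swapped = [aligned[1], aligned[0]]

aligned_loss = matryoshka_loss(anchors, aligned, dims=(4, 2), weights=(1, 1))
swapped_loss = matryoshka_loss(anchors, swapped, dims=(4, 2), weights=(1, 1))
print(aligned_loss, swapped_loss)
```

Correctly paired batches give a loss near zero at every truncation, while mismatched pairs are penalized at each dimension, which is what encourages the leading components to carry most of the meaning.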

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: epoch
	- `gradient_accumulation_steps`: 16
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 4
	- `lr_scheduler_type`: cosine
	- `warmup_ratio`: 0.1
	- `tf32`: False
	- `load_best_model_at_end`: True
	- `optim`: adamw_torch_fused
	- `batch_sampler`: no_duplicates
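
As a rough illustration of how these settings interact, the sketch below works out the effective batch size and traces a cosine learning-rate schedule with linear warmup. It assumes a single device and uses the per-device batch size of 8 listed under "All Hyperparameters"; `cosine_with_warmup` is a hypothetical helper written for this example, and the step counts passed to it are illustrative values rather than figures taken from the training run.

```python
import math

train_samples = 94          # dataset size reported in this card
per_device_batch_size = 8   # from "All Hyperparameters"
gradient_accumulation_steps = 16

# Samples contributing to one optimizer update (single-device assumption).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
# Forward/backward passes per epoch when the last partial batch is kept.
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch_size)


def cosine_with_warmup(step, total_steps, warmup_steps, base_lr=2e-5):
    """Linear warmup to `base_lr`, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
print("micro-batches per epoch:", micro_batches_per_epoch)

# Illustrative schedule: 4 optimizer steps with 1 warmup step.
for step in range(5):
    print(step, cosine_with_warmup(step, total_steps=4, warmup_steps=1))
```

With 94 samples and an effective batch of 128, one epoch holds less than a single full accumulated batch, so the run performs very few optimizer updates in total.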

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: epoch
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 8
	- `per_device_eval_batch_size`: 8
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 16
	- `eval_accumulation_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 4
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: False
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: False
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `dispatch_batches`: None
	- `split_batches`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	| Epoch | Step | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_64_cosine_map@100 |
	|:-----:|:----:|:----------------------:|:----------------------:|:---------------------:|
	| 1.0   | 1    | 0.7955                 | 0.7955                 | 0.7803                |
	| 2.0   | 2    | 0.7955                 | 0.7955                 | 0.7803                |
	| 3.0   | 4    | 0.7955                 | 0.7955                 | 0.7803                |

	* The bold row denotes the saved checkpoint.

	### Framework Versions
	- Python: 3.10.14
	- Sentence Transformers: 3.1.1
	- Transformers: 4.41.2
	- PyTorch: 2.1.2+cu121
	- Accelerate: 0.34.2
	- Datasets: 2.19.1
	- Tokenizers: 0.19.1
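
When reproducing results, it can help to compare the local environment against the versions listed above. The snippet below is a small sketch using only the standard library; the PyPI distribution names in `CARD_VERSIONS` (for example `torch` for PyTorch) are assumptions, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "3.1.1",
    "transformers": "4.41.2",
    "torch": "2.1.2+cu121",
    "accelerate": "0.34.2",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (expected, installed or None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


report = compare_versions()
for package, (wanted, installed, matches) in report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where behavior might differ from the training environment.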

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	title={Matryoshka Representation Learning},
	author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	year={2024},
	eprint={2205.13147},
	archivePrefix={arXiv},
	primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	title={Efficient Natural Language Response Suggestion for Smart Reply},
	author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	year={2017},
	eprint={1705.00652},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->