SentenceTransformer based on Geotrend/bert-base-sw-cased

This is a sentence-transformers model fine-tuned from Geotrend/bert-base-sw-cased on the Mollel/swahili-n_li-triplet-swh-eng dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Geotrend/bert-base-sw-cased
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- Mollel/swahili-n_li-triplet-swh-eng

Model Sources

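Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)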

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
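
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy inputs are hypothetical, and the real module works on batched tensors of width 768.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: three token vectors of width 4; the last one is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 2.0, 2.0, 2.0]
```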

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mollel/MultiLinguSwahili-bert-base-sw-cased-nli-matryoshka")
# Run inference
sentences = [
    'Mwanamume na mwanamke wachanga waliovaa mikoba wanaweka au kuondoa kitu kutoka kwenye mti mweupe wa zamani, huku watu wengine wamesimama au wameketi nyuma.',
    'mwanamume na mwanamke wenye mikoba',
    'tai huruka',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
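
Because the model was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64, the leading components of an embedding are meant to remain useful on their own. The sketch below shows one way to shorten an embedding after encoding: keep the first `dim` values and rescale to unit length. It is a pure-Python illustration on toy vectors; the helper names `truncate_and_normalize` and `cosine` are hypothetical and not part of the library.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vector: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" truncated to 2 dimensions.
full_a = [3.0, 4.0, 1.0, 2.0]
full_b = [4.0, 3.0, -2.0, 1.0]
short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)
print(round(cosine(short_a, short_b), 4))  # 0.96
```

With real embeddings, `dim` would be one of the trained sizes, for example 256. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which should apply the truncation at encode time; check the documentation for the installed version.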

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test-768
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6937
spearman_cosine	0.6873
pearson_manhattan	0.6672
spearman_manhattan	0.6578
pearson_euclidean	0.6672
spearman_euclidean	0.6578
pearson_dot	0.5235
spearman_dot	0.5126
pearson_max	0.6937
spearman_max	0.6873

Semantic Similarity

Dataset: sts-test-512
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6899
spearman_cosine	0.6847
pearson_manhattan	0.6678
spearman_manhattan	0.658
pearson_euclidean	0.6673
spearman_euclidean	0.6573
pearson_dot	0.4953
spearman_dot	0.4872
pearson_max	0.6899
spearman_max	0.6847

Semantic Similarity

Dataset: sts-test-256
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6873
spearman_cosine	0.6817
pearson_manhattan	0.6674
spearman_manhattan	0.6558
pearson_euclidean	0.6675
spearman_euclidean	0.656
pearson_dot	0.4566
spearman_dot	0.4533
pearson_max	0.6873
spearman_max	0.6817

Semantic Similarity

Dataset: sts-test-128
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6836
spearman_cosine	0.6795
pearson_manhattan	0.6664
spearman_manhattan	0.6535
pearson_euclidean	0.6664
spearman_euclidean	0.6537
pearson_dot	0.431
spearman_dot	0.4315
pearson_max	0.6836
spearman_max	0.6795

Semantic Similarity

Dataset: sts-test-64
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6715
spearman_cosine	0.6691
pearson_manhattan	0.6571
spearman_manhattan	0.6456
pearson_euclidean	0.6599
spearman_euclidean	0.6472
pearson_dot	0.3676
spearman_dot	0.3678
pearson_max	0.6715
spearman_max	0.6691
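
The `spearman_cosine` rows in the tables above report a rank correlation between the model's cosine similarity for each sentence pair and the gold similarity label. The sketch below illustrates that computation in pure Python on made-up data. It is not the EmbeddingSimilarityEvaluator code, and the toy pairs and gold scores are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical embedding pairs and gold similarity labels (higher = more similar).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [-1.0, 0.2]),
]
gold = [4.8, 3.0, 1.5, 0.2]
predicted = [cosine(a, b) for a, b in pairs]
print(round(spearman(predicted, gold), 4))  # 1.0, the two orderings agree exactly
```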

Training Details

Training Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 1,115,700 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 9 tokens; mean: 16.73 tokens; max: 71 tokens	min: 6 tokens; mean: 19.74 tokens; max: 45 tokens	min: 6 tokens; mean: 19.0 tokens; max: 49 tokens

Samples:

anchor	positive	negative
`A person on a horse jumps over a broken down airplane.`	`A person is outdoors, on a horse.`	`A person is at a diner, ordering an omelette.`
`Mtu aliyepanda farasi anaruka juu ya ndege iliyovunjika.`	`Mtu yuko nje, juu ya farasi.`	`Mtu yuko kwenye mkahawa, akiagiza omelette.`
`Children smiling and waving at camera`	`There are children present`	`The kids are frowning`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
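
To make the configuration above concrete, the sketch below computes a Matryoshka-style loss in pure Python: the inner ranking loss is evaluated on embeddings truncated to each entry of `matryoshka_dims`, and the results are combined using `matryoshka_weights`. The inner loss follows the usual in-batch-negatives formulation, where each anchor must pick its own positive out of all positives and negatives in the batch. This is an illustration only: the function names are hypothetical, the toy vectors are made up, and the cosine similarity with `scale=20.0` is an assumption about the library defaults that the card does not state.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def mnr_loss(anchors: List[Vector], positives: List[Vector],
             negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where the i-th positive is the target for anchor i."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors: List[Vector], positives: List[Vector],
                    negatives: List[Vector], dims: Sequence[int],
                    weights: Sequence[float]) -> float:
    """Weighted sum of the inner loss over embeddings cut to each dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        def cut(batch: List[Vector]) -> List[Vector]:
            return [v[:dim] for v in batch]
        total += weight * mnr_loss(cut(anchors), cut(positives), cut(negatives))
    return total


# Toy batch of two triplets with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
negatives = [[-1.0, 0.0, 0.0, 0.3], [0.0, -1.0, 0.3, 0.0]]
print(matryoshka_loss(anchors, positives, negatives, dims=[4, 2], weights=[1, 1]))
```

With `n_dims_per_step` set to -1, every listed dimension contributes at each step, which is what the loop over `dims` mirrors.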

Evaluation Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 13,168 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 7 tokens; mean: 28.25 tokens; max: 82 tokens	min: 5 tokens; mean: 14.16 tokens; max: 55 tokens	min: 5 tokens; mean: 15.55 tokens; max: 46 tokens

Samples:

anchor	positive	negative
`Two women are embracing while holding to go packages.`	`Two woman are holding packages.`	`The men are fighting outside a deli.`
`Wanawake wawili wanakumbatiana huku wakishikilia vifurushi vya kwenda.`	`Wanawake wawili wanashikilia vifurushi.`	`Wanaume hao wanapigana nje ya duka la vyakula vitamu.`
`Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.`	`Two kids in numbered jerseys wash their hands.`	`Two kids in jackets walk to school.`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
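
The `learning_rate` and `warmup_ratio` above, together with the `linear` scheduler type listed under All Hyperparameters, imply a learning rate that ramps up linearly and then decays linearly to zero. The sketch below is a rough pure-Python model of that schedule, not the trainer's scheduler code. It assumes the warmup length is the ceiling of `warmup_ratio * total_steps`, takes the total of 17,433 steps from the Training Logs, and uses the hypothetical function name `linear_schedule_lr`.

```python
import math


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 17_433  # final step in the Training Logs
for step in (0, 872, 1744, 9000, 17_433):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak of 2e-05 is reached at step 1,744, roughly 10% of the way through the single epoch.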

All Hyperparameters

overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	sts-test-128_spearman_cosine	sts-test-256_spearman_cosine	sts-test-512_spearman_cosine	sts-test-64_spearman_cosine	sts-test-768_spearman_cosine
0.0057	100	19.9104	-	-	-	-	-
0.0115	200	15.4038	-	-	-	-	-
0.0172	300	12.4565	-	-	-	-	-
0.0229	400	11.8633	-	-	-	-	-
0.0287	500	11.0601	-	-	-	-	-
0.0344	600	9.7725	-	-	-	-	-
0.0402	700	8.8549	-	-	-	-	-
0.0459	800	8.0831	-	-	-	-	-
0.0516	900	7.9941	-	-	-	-	-
0.0574	1000	7.6537	-	-	-	-	-
0.0631	1100	7.9303	-	-	-	-	-
0.0688	1200	7.5246	-	-	-	-	-
0.0746	1300	7.7754	-	-	-	-	-
0.0803	1400	7.668	-	-	-	-	-
0.0860	1500	6.7171	-	-	-	-	-
0.0918	1600	6.347	-	-	-	-	-
0.0975	1700	6.0	-	-	-	-	-
0.1033	1800	6.4314	-	-	-	-	-
0.1090	1900	6.7947	-	-	-	-	-
0.1147	2000	6.9316	-	-	-	-	-
0.1205	2100	6.6304	-	-	-	-	-
0.1262	2200	6.132	-	-	-	-	-
0.1319	2300	5.8953	-	-	-	-	-
0.1377	2400	5.6954	-	-	-	-	-
0.1434	2500	5.6832	-	-	-	-	-
0.1491	2600	5.2266	-	-	-	-	-
0.1549	2700	5.0678	-	-	-	-	-
0.1606	2800	5.4733	-	-	-	-	-
0.1664	2900	6.0899	-	-	-	-	-
0.1721	3000	6.332	-	-	-	-	-
0.1778	3100	6.4937	-	-	-	-	-
0.1836	3200	6.2242	-	-	-	-	-
0.1893	3300	5.8023	-	-	-	-	-
0.1950	3400	5.0745	-	-	-	-	-
0.2008	3500	5.5806	-	-	-	-	-
0.2065	3600	5.5191	-	-	-	-	-
0.2122	3700	5.3849	-	-	-	-	-
0.2180	3800	5.4828	-	-	-	-	-
0.2237	3900	5.9982	-	-	-	-	-
0.2294	4000	5.6842	-	-	-	-	-
0.2352	4100	5.1627	-	-	-	-	-
0.2409	4200	5.154	-	-	-	-	-
0.2467	4300	5.7932	-	-	-	-	-
0.2524	4400	5.5758	-	-	-	-	-
0.2581	4500	5.5212	-	-	-	-	-
0.2639	4600	5.5692	-	-	-	-	-
0.2696	4700	5.2699	-	-	-	-	-
0.2753	4800	5.4919	-	-	-	-	-
0.2811	4900	5.0754	-	-	-	-	-
0.2868	5000	5.1514	-	-	-	-	-
0.2925	5100	5.0241	-	-	-	-	-
0.2983	5200	5.2679	-	-	-	-	-
0.3040	5300	5.3576	-	-	-	-	-
0.3098	5400	5.3454	-	-	-	-	-
0.3155	5500	5.2142	-	-	-	-	-
0.3212	5600	4.8418	-	-	-	-	-
0.3270	5700	4.9597	-	-	-	-	-
0.3327	5800	5.1989	-	-	-	-	-
0.3384	5900	5.2624	-	-	-	-	-
0.3442	6000	5.0705	-	-	-	-	-
0.3499	6100	5.232	-	-	-	-	-
0.3556	6200	5.2428	-	-	-	-	-
0.3614	6300	4.755	-	-	-	-	-
0.3671	6400	4.7266	-	-	-	-	-
0.3729	6500	4.6452	-	-	-	-	-
0.3786	6600	5.1431	-	-	-	-	-
0.3843	6700	4.5343	-	-	-	-	-
0.3901	6800	4.698	-	-	-	-	-
0.3958	6900	4.6944	-	-	-	-	-
0.4015	7000	4.6255	-	-	-	-	-
0.4073	7100	5.0211	-	-	-	-	-
0.4130	7200	4.6974	-	-	-	-	-
0.4187	7300	4.9182	-	-	-	-	-
0.4245	7400	4.652	-	-	-	-	-
0.4302	7500	5.1015	-	-	-	-	-
0.4360	7600	4.5249	-	-	-	-	-
0.4417	7700	4.455	-	-	-	-	-
0.4474	7800	4.8153	-	-	-	-	-
0.4532	7900	4.7665	-	-	-	-	-
0.4589	8000	4.3413	-	-	-	-	-
0.4646	8100	4.4697	-	-	-	-	-
0.4704	8200	4.6776	-	-	-	-	-
0.4761	8300	4.2868	-	-	-	-	-
0.4818	8400	4.7052	-	-	-	-	-
0.4876	8500	4.4721	-	-	-	-	-
0.4933	8600	4.6926	-	-	-	-	-
0.4991	8700	4.9891	-	-	-	-	-
0.5048	8800	4.4837	-	-	-	-	-
0.5105	8900	4.8127	-	-	-	-	-
0.5163	9000	4.3438	-	-	-	-	-
0.5220	9100	4.4743	-	-	-	-	-
0.5277	9200	4.6879	-	-	-	-	-
0.5335	9300	4.3593	-	-	-	-	-
0.5392	9400	4.3023	-	-	-	-	-
0.5449	9500	4.8188	-	-	-	-	-
0.5507	9600	4.6142	-	-	-	-	-
0.5564	9700	4.7679	-	-	-	-	-
0.5622	9800	4.6224	-	-	-	-	-
0.5679	9900	4.9154	-	-	-	-	-
0.5736	10000	4.7557	-	-	-	-	-
0.5794	10100	4.6395	-	-	-	-	-
0.5851	10200	4.7977	-	-	-	-	-
0.5908	10300	4.915	-	-	-	-	-
0.5966	10400	4.4854	-	-	-	-	-
0.6023	10500	4.3973	-	-	-	-	-
0.6080	10600	4.6964	-	-	-	-	-
0.6138	10700	4.8853	-	-	-	-	-
0.6195	10800	4.786	-	-	-	-	-
0.6253	10900	4.5482	-	-	-	-	-
0.6310	11000	4.4857	-	-	-	-	-
0.6367	11100	4.7415	-	-	-	-	-
0.6425	11200	4.2596	-	-	-	-	-
0.6482	11300	4.8578	-	-	-	-	-
0.6539	11400	4.5471	-	-	-	-	-
0.6597	11500	4.8337	-	-	-	-	-
0.6654	11600	4.2244	-	-	-	-	-
0.6711	11700	4.9619	-	-	-	-	-
0.6769	11800	4.9369	-	-	-	-	-
0.6826	11900	4.2697	-	-	-	-	-
0.6883	12000	4.2711	-	-	-	-	-
0.6941	12100	4.6396	-	-	-	-	-
0.6998	12200	4.5626	-	-	-	-	-
0.7056	12300	4.5767	-	-	-	-	-
0.7113	12400	4.6449	-	-	-	-	-
0.7170	12500	4.4217	-	-	-	-	-
0.7228	12600	4.0203	-	-	-	-	-
0.7285	12700	4.5381	-	-	-	-	-
0.7342	12800	4.5865	-	-	-	-	-
0.7400	12900	4.4203	-	-	-	-	-
0.7457	13000	4.3761	-	-	-	-	-
0.7514	13100	4.093	-	-	-	-	-
0.7572	13200	5.9235	-	-	-	-	-
0.7629	13300	5.4098	-	-	-	-	-
0.7687	13400	5.3079	-	-	-	-	-
0.7744	13500	5.0946	-	-	-	-	-
0.7801	13600	4.7098	-	-	-	-	-
0.7859	13700	4.9471	-	-	-	-	-
0.7916	13800	4.5742	-	-	-	-	-
0.7973	13900	4.6178	-	-	-	-	-
0.8031	14000	4.4516	-	-	-	-	-
0.8088	14100	4.429	-	-	-	-	-
0.8145	14200	4.3812	-	-	-	-	-
0.8203	14300	4.3739	-	-	-	-	-
0.8260	14400	4.3821	-	-	-	-	-
0.8318	14500	4.4396	-	-	-	-	-
0.8375	14600	4.2667	-	-	-	-	-
0.8432	14700	4.1963	-	-	-	-	-
0.8490	14800	4.1298	-	-	-	-	-
0.8547	14900	4.1843	-	-	-	-	-
0.8604	15000	4.0735	-	-	-	-	-
0.8662	15100	3.9319	-	-	-	-	-
0.8719	15200	4.1544	-	-	-	-	-
0.8776	15300	4.105	-	-	-	-	-
0.8834	15400	4.014	-	-	-	-	-
0.8891	15500	4.0345	-	-	-	-	-
0.8949	15600	3.9127	-	-	-	-	-
0.9006	15700	4.1002	-	-	-	-	-
0.9063	15800	3.8564	-	-	-	-	-
0.9121	15900	3.9297	-	-	-	-	-
0.9178	16000	3.8487	-	-	-	-	-
0.9235	16100	3.7099	-	-	-	-	-
0.9293	16200	3.8545	-	-	-	-	-
0.9350	16300	3.8122	-	-	-	-	-
0.9407	16400	3.8951	-	-	-	-	-
0.9465	16500	3.6996	-	-	-	-	-
0.9522	16600	3.9081	-	-	-	-	-
0.9580	16700	3.8603	-	-	-	-	-
0.9637	16800	3.8534	-	-	-	-	-
0.9694	16900	3.8145	-	-	-	-	-
0.9752	17000	3.9858	-	-	-	-	-
0.9809	17100	3.8224	-	-	-	-	-
0.9866	17200	3.7469	-	-	-	-	-
0.9924	17300	3.9066	-	-	-	-	-
0.9981	17400	3.6754	-	-	-	-	-
1.0	17433	-	0.6795	0.6817	0.6847	0.6691	0.6873
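
The Epoch column in the log is the step number divided by the total number of steps, and the total can be related to the dataset size and batch size. The sketch below works through that arithmetic. The device count is an assumption: the card does not state it, but 17,433 steps for 1,115,700 samples is consistent with an effective batch of 64, for instance two devices at `per_device_train_batch_size` 32. The `no_duplicates` batch sampler can also change the number of batches, so treat this as an approximation. The function names are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, per_device_batch_size: int, num_devices: int) -> int:
    """Number of optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(num_samples / (per_device_batch_size * num_devices))


def epoch_fraction(step: int, total_steps: int) -> float:
    """Fraction of the epoch completed at `step`, rounded as in the log table."""
    return round(step / total_steps, 4)


print(steps_per_epoch(1_115_700, 32, 1))  # 34866
print(steps_per_epoch(1_115_700, 32, 2))  # 17433, matches the final logged step
print(epoch_fraction(100, 17_433))        # 0.0057, matches the first log row
print(epoch_fraction(17_400, 17_433))     # 0.9981, matches the last numbered row
```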

Framework Versions

Python: 3.11.9
Sentence Transformers: 3.0.1
Transformers: 4.40.1
PyTorch: 2.3.0+cu121
Accelerate: 0.29.3
Datasets: 2.19.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
