---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
license: mit
language:
- fr
- en
---

# Solon Embeddings — Base 0.1

State-of-the-art open-source French embedding model.

**Instructions:**

Prepend "query : " to each *query* to improve retrieval performance.

No prefix is needed for *passages*.

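The prefix rule above can be sketched as follows. This is a minimal illustration, not the authors' reference code: `prepare_query`, `cosine`, `rank_passages` and `load_encoder` are hypothetical helper names, and loading the checkpoint through the `sentence-transformers` library is an assumption that this card does not confirm.

```python
import math
from typing import Callable, List, Sequence

QUERY_PREFIX = "query : "  # prefix given in the instructions above


def prepare_query(text: str) -> str:
    """Prepend the query prefix once; passages must not go through this."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(query: str, passages: List[str], encode: Callable) -> List[int]:
    """Return passage indices sorted from most to least similar to the query.

    `encode` is any callable mapping a list of strings to a list of vectors.
    """
    query_vec = encode([prepare_query(query)])[0]  # prefixed query
    passage_vecs = encode(passages)                # passages stay unprefixed
    scores = [cosine(query_vec, vec) for vec in passage_vecs]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


def load_encoder(model_name: str = "OrdalieTech/Solon-embeddings-base-0.1"):
    """Assumption: the checkpoint loads with sentence-transformers."""
    from sentence_transformers import SentenceTransformer  # third-party, lazy import
    return SentenceTransformer(model_name).encode
```

With the library installed, `rank_passages("Quelle est la capitale de la France ?", passages, load_encoder())` would return the passage indices ordered by similarity to the prefixed query.
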
| Model | Mean Score |
| --- | --- |
| **OrdalieTech/Solon-embeddings-large-0.1** | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| **OrdalieTech/Solon-embeddings-base-0.1** | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained on 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS):

- AmazonReviewsClassification (MTEB)
- MassiveIntentClassification (MTEB)
- MassiveScenarioClassification (MTEB)
- MTOPDomainClassification (MTEB)
- MTOPIntentClassification (MTEB)
- STS22 (MTEB)
- MiraclFRRerank (Miracl)
- OrdalieFRSTS (Ordalie)
- OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to broaden the benchmarks available for French STS and reranking.

(Evaluation script available at github.com/OrdalieTech/mteb)