---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
license: mit
language:
- fr
- en
model-index:
- name: Solon-embeddings-base-0.1
  results:
    - task:
        type: sentence-similarity
        name: Passage Retrieval
      dataset:
        type: unicamp-dl/mmarco
        name: mMARCO-fr
        config: french
        split: validation
      metrics:
        - type: recall_at_500
          name: Recall@500
          value: 90.9
        - type: recall_at_100
          name: Recall@100
          value: 80.6
        - type: recall_at_10
          name: Recall@10
          value: 52.5
        - type: map_at_10
          name: MAP@10
          value: 27.4
        - type: ndcg_at_10
          name: nDCG@10
          value: 33.5
        - type: mrr_at_10
          name: MRR@10
          value: 27.9
---

# Solon Embeddings — Base 0.1  
State-of-the-art open-source French embedding model.

**Instructions:**  
Prepend "query : " to each *query* to improve retrieval performance.  
No prefix is needed for *passages*.
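
As a minimal usage sketch of this prefix convention (assuming the model loads with the sentence-transformers library; the example sentences are illustrative, not from the benchmark data):

```python
# Minimal sketch, assuming `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

# Queries take the literal "query : " prefix; passages are encoded as-is.
query_emb = model.encode("query : Quelle est la capitale de la France ?")
passage_embs = model.encode([
    "Paris est la capitale de la France.",
    "Berlin est la capitale de l'Allemagne.",
])

# Rank passages by cosine similarity to the query.
print(util.cos_sim(query_emb, passage_embs))  # the first passage should score highest
```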


| Model | Mean score (9 French benchmarks) |
| --- | --- |
| **OrdalieTech/Solon-embeddings-large-0.1** | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| **OrdalieTech/Solon-embeddings-base-0.1** | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained across 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS):
- AmazonReviewsClassification (MTEB)
- MassiveIntentClassification (MTEB)
- MassiveScenarioClassification (MTEB)
- MTOPDomainClassification (MTEB)
- MTOPIntentClassification (MTEB)
- STS22 (MTEB)
- MiraclFRRerank (Miracl)
- OrdalieFRSTS (Ordalie)
- OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to strengthen benchmarking of French STS and reranking.

(Evaluation script available at github.com/OrdalieTech/mteb.)
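
For reference, a hedged sketch of how one of the MTEB tasks listed above might be run against this model; it assumes the legacy `mteb` package interface (the `MTEB` class with a `task_langs` argument), and the linked fork may differ in task names and options:

```python
# Sketch only: reproducing one listed MTEB benchmark, assuming
# `pip install mteb sentence-transformers` and the legacy MTEB API.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

# Restrict the task to its French split, matching the benchmarks above.
evaluation = MTEB(tasks=["MassiveIntentClassification"], task_langs=["fr"])
evaluation.run(model, output_folder="results/solon-embeddings-base-0.1")
```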