|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
- mteb |
|
model-index: |
|
- name: mmlw-e5-large |
|
results: |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: PL-MTEB/8tags-clustering |
|
name: MTEB 8TagsClustering |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 30.623921415441725 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/allegro-reviews |
|
name: MTEB AllegroReviews |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 37.683896620278325 |
|
- type: f1 |
|
value: 34.19193027014284 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: arguana-pl |
|
name: MTEB ArguAna-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 38.407000000000004 |
|
- type: map_at_10 |
|
value: 55.147 |
|
- type: map_at_100 |
|
value: 55.757 |
|
- type: map_at_1000 |
|
value: 55.761 |
|
- type: map_at_3 |
|
value: 51.268 |
|
- type: map_at_5 |
|
value: 53.696999999999996 |
|
- type: mrr_at_1 |
|
value: 40.043 |
|
- type: mrr_at_10 |
|
value: 55.840999999999994 |
|
- type: mrr_at_100 |
|
value: 56.459 |
|
- type: mrr_at_1000 |
|
value: 56.462999999999994 |
|
- type: mrr_at_3 |
|
value: 52.074 |
|
- type: mrr_at_5 |
|
value: 54.364999999999995 |
|
- type: ndcg_at_1 |
|
value: 38.407000000000004 |
|
- type: ndcg_at_10 |
|
value: 63.248000000000005 |
|
- type: ndcg_at_100 |
|
value: 65.717 |
|
- type: ndcg_at_1000 |
|
value: 65.79 |
|
- type: ndcg_at_3 |
|
value: 55.403999999999996 |
|
- type: ndcg_at_5 |
|
value: 59.760000000000005 |
|
- type: precision_at_1 |
|
value: 38.407000000000004 |
|
- type: precision_at_10 |
|
value: 8.862 |
|
- type: precision_at_100 |
|
value: 0.991 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 22.451 |
|
- type: precision_at_5 |
|
value: 15.576 |
|
- type: recall_at_1 |
|
value: 38.407000000000004 |
|
- type: recall_at_10 |
|
value: 88.62 |
|
- type: recall_at_100 |
|
value: 99.075 |
|
- type: recall_at_1000 |
|
value: 99.57300000000001 |
|
- type: recall_at_3 |
|
value: 67.354 |
|
- type: recall_at_5 |
|
value: 77.881 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/cbd |
|
name: MTEB CBD |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 66.14999999999999 |
|
- type: ap |
|
value: 21.69513674684204 |
|
- type: f1 |
|
value: 56.48142830893528 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/cdsce-pairclassification |
|
name: MTEB CDSC-E |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 89.4 |
|
- type: cos_sim_ap |
|
value: 76.83228768203222 |
|
- type: cos_sim_f1 |
|
value: 65.3658536585366 |
|
- type: cos_sim_precision |
|
value: 60.909090909090914 |
|
- type: cos_sim_recall |
|
value: 70.52631578947368 |
|
- type: dot_accuracy |
|
value: 84.1 |
|
- type: dot_ap |
|
value: 57.26072201751864 |
|
- type: dot_f1 |
|
value: 62.75395033860045 |
|
- type: dot_precision |
|
value: 54.9407114624506 |
|
- type: dot_recall |
|
value: 73.15789473684211 |
|
- type: euclidean_accuracy |
|
value: 89.4 |
|
- type: euclidean_ap |
|
value: 76.59095263388942 |
|
- type: euclidean_f1 |
|
value: 65.21739130434783 |
|
- type: euclidean_precision |
|
value: 60.26785714285714 |
|
- type: euclidean_recall |
|
value: 71.05263157894737 |
|
- type: manhattan_accuracy |
|
value: 89.4 |
|
- type: manhattan_ap |
|
value: 76.58825999753456 |
|
- type: manhattan_f1 |
|
value: 64.72019464720195 |
|
- type: manhattan_precision |
|
value: 60.18099547511312 |
|
- type: manhattan_recall |
|
value: 70.0 |
|
- type: max_accuracy |
|
value: 89.4 |
|
- type: max_ap |
|
value: 76.83228768203222 |
|
- type: max_f1 |
|
value: 65.3658536585366 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/cdscr-sts |
|
name: MTEB CDSC-R |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 93.73949495291659 |
|
- type: cos_sim_spearman |
|
value: 93.50397366192922 |
|
- type: euclidean_pearson |
|
value: 92.47498888987636 |
|
- type: euclidean_spearman |
|
value: 93.39315936230747 |
|
- type: manhattan_pearson |
|
value: 92.47250250777654 |
|
- type: manhattan_spearman |
|
value: 93.36739690549109 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: dbpedia-pl |
|
name: MTEB DBPedia-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 8.434 |
|
- type: map_at_10 |
|
value: 18.424 |
|
- type: map_at_100 |
|
value: 26.428 |
|
- type: map_at_1000 |
|
value: 28.002 |
|
- type: map_at_3 |
|
value: 13.502 |
|
- type: map_at_5 |
|
value: 15.577 |
|
- type: mrr_at_1 |
|
value: 63.0 |
|
- type: mrr_at_10 |
|
value: 72.714 |
|
- type: mrr_at_100 |
|
value: 73.021 |
|
- type: mrr_at_1000 |
|
value: 73.028 |
|
- type: mrr_at_3 |
|
value: 70.75 |
|
- type: mrr_at_5 |
|
value: 72.3 |
|
- type: ndcg_at_1 |
|
value: 52.75 |
|
- type: ndcg_at_10 |
|
value: 39.839999999999996 |
|
- type: ndcg_at_100 |
|
value: 44.989000000000004 |
|
- type: ndcg_at_1000 |
|
value: 52.532999999999994 |
|
- type: ndcg_at_3 |
|
value: 45.198 |
|
- type: ndcg_at_5 |
|
value: 42.015 |
|
- type: precision_at_1 |
|
value: 63.0 |
|
- type: precision_at_10 |
|
value: 31.05 |
|
- type: precision_at_100 |
|
value: 10.26 |
|
- type: precision_at_1000 |
|
value: 1.9879999999999998 |
|
- type: precision_at_3 |
|
value: 48.25 |
|
- type: precision_at_5 |
|
value: 40.45 |
|
- type: recall_at_1 |
|
value: 8.434 |
|
- type: recall_at_10 |
|
value: 24.004 |
|
- type: recall_at_100 |
|
value: 51.428 |
|
- type: recall_at_1000 |
|
value: 75.712 |
|
- type: recall_at_3 |
|
value: 15.015 |
|
- type: recall_at_5 |
|
value: 18.282999999999998 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: fiqa-pl |
|
name: MTEB FiQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 19.088 |
|
- type: map_at_10 |
|
value: 31.818 |
|
- type: map_at_100 |
|
value: 33.689 |
|
- type: map_at_1000 |
|
value: 33.86 |
|
- type: map_at_3 |
|
value: 27.399 |
|
- type: map_at_5 |
|
value: 29.945 |
|
- type: mrr_at_1 |
|
value: 38.117000000000004 |
|
- type: mrr_at_10 |
|
value: 47.668 |
|
- type: mrr_at_100 |
|
value: 48.428 |
|
- type: mrr_at_1000 |
|
value: 48.475 |
|
- type: mrr_at_3 |
|
value: 45.242 |
|
- type: mrr_at_5 |
|
value: 46.716 |
|
- type: ndcg_at_1 |
|
value: 38.272 |
|
- type: ndcg_at_10 |
|
value: 39.903 |
|
- type: ndcg_at_100 |
|
value: 46.661 |
|
- type: ndcg_at_1000 |
|
value: 49.625 |
|
- type: ndcg_at_3 |
|
value: 35.921 |
|
- type: ndcg_at_5 |
|
value: 37.558 |
|
- type: precision_at_1 |
|
value: 38.272 |
|
- type: precision_at_10 |
|
value: 11.358 |
|
- type: precision_at_100 |
|
value: 1.8190000000000002 |
|
- type: precision_at_1000 |
|
value: 0.23500000000000001 |
|
- type: precision_at_3 |
|
value: 24.434 |
|
- type: precision_at_5 |
|
value: 18.395 |
|
- type: recall_at_1 |
|
value: 19.088 |
|
- type: recall_at_10 |
|
value: 47.355999999999995 |
|
- type: recall_at_100 |
|
value: 72.451 |
|
- type: recall_at_1000 |
|
value: 90.257 |
|
- type: recall_at_3 |
|
value: 32.931 |
|
- type: recall_at_5 |
|
value: 39.878 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: hotpotqa-pl |
|
name: MTEB HotpotQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 39.095 |
|
- type: map_at_10 |
|
value: 62.529 |
|
- type: map_at_100 |
|
value: 63.425 |
|
- type: map_at_1000 |
|
value: 63.483000000000004 |
|
- type: map_at_3 |
|
value: 58.887 |
|
- type: map_at_5 |
|
value: 61.18599999999999 |
|
- type: mrr_at_1 |
|
value: 78.123 |
|
- type: mrr_at_10 |
|
value: 84.231 |
|
- type: mrr_at_100 |
|
value: 84.408 |
|
- type: mrr_at_1000 |
|
value: 84.414 |
|
- type: mrr_at_3 |
|
value: 83.286 |
|
- type: mrr_at_5 |
|
value: 83.94 |
|
- type: ndcg_at_1 |
|
value: 78.19 |
|
- type: ndcg_at_10 |
|
value: 70.938 |
|
- type: ndcg_at_100 |
|
value: 73.992 |
|
- type: ndcg_at_1000 |
|
value: 75.1 |
|
- type: ndcg_at_3 |
|
value: 65.863 |
|
- type: ndcg_at_5 |
|
value: 68.755 |
|
- type: precision_at_1 |
|
value: 78.19 |
|
- type: precision_at_10 |
|
value: 14.949000000000002 |
|
- type: precision_at_100 |
|
value: 1.733 |
|
- type: precision_at_1000 |
|
value: 0.188 |
|
- type: precision_at_3 |
|
value: 42.381 |
|
- type: precision_at_5 |
|
value: 27.711000000000002 |
|
- type: recall_at_1 |
|
value: 39.095 |
|
- type: recall_at_10 |
|
value: 74.747 |
|
- type: recall_at_100 |
|
value: 86.631 |
|
- type: recall_at_1000 |
|
value: 93.923 |
|
- type: recall_at_3 |
|
value: 63.571999999999996 |
|
- type: recall_at_5 |
|
value: 69.27799999999999 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: msmarco-pl |
|
name: MTEB MSMARCO-PL |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 19.439999999999998 |
|
- type: map_at_10 |
|
value: 30.264000000000003 |
|
- type: map_at_100 |
|
value: 31.438 |
|
- type: map_at_1000 |
|
value: 31.495 |
|
- type: map_at_3 |
|
value: 26.735 |
|
- type: map_at_5 |
|
value: 28.716 |
|
- type: mrr_at_1 |
|
value: 19.914 |
|
- type: mrr_at_10 |
|
value: 30.753999999999998 |
|
- type: mrr_at_100 |
|
value: 31.877 |
|
- type: mrr_at_1000 |
|
value: 31.929000000000002 |
|
- type: mrr_at_3 |
|
value: 27.299 |
|
- type: mrr_at_5 |
|
value: 29.254 |
|
- type: ndcg_at_1 |
|
value: 20.014000000000003 |
|
- type: ndcg_at_10 |
|
value: 36.472 |
|
- type: ndcg_at_100 |
|
value: 42.231 |
|
- type: ndcg_at_1000 |
|
value: 43.744 |
|
- type: ndcg_at_3 |
|
value: 29.268 |
|
- type: ndcg_at_5 |
|
value: 32.79 |
|
- type: precision_at_1 |
|
value: 20.014000000000003 |
|
- type: precision_at_10 |
|
value: 5.814 |
|
- type: precision_at_100 |
|
value: 0.8710000000000001 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 12.426 |
|
- type: precision_at_5 |
|
value: 9.238 |
|
- type: recall_at_1 |
|
value: 19.439999999999998 |
|
- type: recall_at_10 |
|
value: 55.535000000000004 |
|
- type: recall_at_100 |
|
value: 82.44399999999999 |
|
- type: recall_at_1000 |
|
value: 94.217 |
|
- type: recall_at_3 |
|
value: 35.963 |
|
- type: recall_at_5 |
|
value: 44.367000000000004 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 72.01412239408205 |
|
- type: f1 |
|
value: 70.04544187503352 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 75.26899798251513 |
|
- type: f1 |
|
value: 75.55876166863844 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nfcorpus-pl |
|
name: MTEB NFCorpus-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 5.772 |
|
- type: map_at_10 |
|
value: 12.708 |
|
- type: map_at_100 |
|
value: 16.194 |
|
- type: map_at_1000 |
|
value: 17.630000000000003 |
|
- type: map_at_3 |
|
value: 9.34 |
|
- type: map_at_5 |
|
value: 10.741 |
|
- type: mrr_at_1 |
|
value: 43.344 |
|
- type: mrr_at_10 |
|
value: 53.429 |
|
- type: mrr_at_100 |
|
value: 53.88699999999999 |
|
- type: mrr_at_1000 |
|
value: 53.925 |
|
- type: mrr_at_3 |
|
value: 51.342 |
|
- type: mrr_at_5 |
|
value: 52.456 |
|
- type: ndcg_at_1 |
|
value: 41.641 |
|
- type: ndcg_at_10 |
|
value: 34.028000000000006 |
|
- type: ndcg_at_100 |
|
value: 31.613000000000003 |
|
- type: ndcg_at_1000 |
|
value: 40.428 |
|
- type: ndcg_at_3 |
|
value: 38.991 |
|
- type: ndcg_at_5 |
|
value: 36.704 |
|
- type: precision_at_1 |
|
value: 43.034 |
|
- type: precision_at_10 |
|
value: 25.324999999999996 |
|
- type: precision_at_100 |
|
value: 7.889 |
|
- type: precision_at_1000 |
|
value: 2.069 |
|
- type: precision_at_3 |
|
value: 36.739 |
|
- type: precision_at_5 |
|
value: 32.074000000000005 |
|
- type: recall_at_1 |
|
value: 5.772 |
|
- type: recall_at_10 |
|
value: 16.827 |
|
- type: recall_at_100 |
|
value: 32.346000000000004 |
|
- type: recall_at_1000 |
|
value: 62.739 |
|
- type: recall_at_3 |
|
value: 10.56 |
|
- type: recall_at_5 |
|
value: 12.655 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nq-pl |
|
name: MTEB NQ-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 26.101000000000003 |
|
- type: map_at_10 |
|
value: 39.912 |
|
- type: map_at_100 |
|
value: 41.037 |
|
- type: map_at_1000 |
|
value: 41.077000000000005 |
|
- type: map_at_3 |
|
value: 35.691 |
|
- type: map_at_5 |
|
value: 38.155 |
|
- type: mrr_at_1 |
|
value: 29.403000000000002 |
|
- type: mrr_at_10 |
|
value: 42.376999999999995 |
|
- type: mrr_at_100 |
|
value: 43.248999999999995 |
|
- type: mrr_at_1000 |
|
value: 43.277 |
|
- type: mrr_at_3 |
|
value: 38.794000000000004 |
|
- type: mrr_at_5 |
|
value: 40.933 |
|
- type: ndcg_at_1 |
|
value: 29.519000000000002 |
|
- type: ndcg_at_10 |
|
value: 47.33 |
|
- type: ndcg_at_100 |
|
value: 52.171 |
|
- type: ndcg_at_1000 |
|
value: 53.125 |
|
- type: ndcg_at_3 |
|
value: 39.316 |
|
- type: ndcg_at_5 |
|
value: 43.457 |
|
- type: precision_at_1 |
|
value: 29.519000000000002 |
|
- type: precision_at_10 |
|
value: 8.03 |
|
- type: precision_at_100 |
|
value: 1.075 |
|
- type: precision_at_1000 |
|
value: 0.117 |
|
- type: precision_at_3 |
|
value: 18.009 |
|
- type: precision_at_5 |
|
value: 13.221 |
|
- type: recall_at_1 |
|
value: 26.101000000000003 |
|
- type: recall_at_10 |
|
value: 67.50399999999999 |
|
- type: recall_at_100 |
|
value: 88.64699999999999 |
|
- type: recall_at_1000 |
|
value: 95.771 |
|
- type: recall_at_3 |
|
value: 46.669 |
|
- type: recall_at_5 |
|
value: 56.24 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: laugustyniak/abusive-clauses-pl |
|
name: MTEB PAC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 63.76773819866782 |
|
- type: ap |
|
value: 74.87896817642536 |
|
- type: f1 |
|
value: 61.420506092721425 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/ppc-pairclassification |
|
name: MTEB PPC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 82.1 |
|
- type: cos_sim_ap |
|
value: 91.09417013497443 |
|
- type: cos_sim_f1 |
|
value: 84.78437754271766 |
|
- type: cos_sim_precision |
|
value: 83.36 |
|
- type: cos_sim_recall |
|
value: 86.25827814569537 |
|
- type: dot_accuracy |
|
value: 75.9 |
|
- type: dot_ap |
|
value: 86.82680649789796 |
|
- type: dot_f1 |
|
value: 80.5379746835443 |
|
- type: dot_precision |
|
value: 77.12121212121212 |
|
- type: dot_recall |
|
value: 84.27152317880795 |
|
- type: euclidean_accuracy |
|
value: 81.6 |
|
- type: euclidean_ap |
|
value: 90.81248760600693 |
|
- type: euclidean_f1 |
|
value: 84.35374149659863 |
|
- type: euclidean_precision |
|
value: 86.7132867132867 |
|
- type: euclidean_recall |
|
value: 82.11920529801324 |
|
- type: manhattan_accuracy |
|
value: 81.6 |
|
- type: manhattan_ap |
|
value: 90.81272803548767 |
|
- type: manhattan_f1 |
|
value: 84.33530906011855 |
|
- type: manhattan_precision |
|
value: 86.30849220103987 |
|
- type: manhattan_recall |
|
value: 82.45033112582782 |
|
- type: max_accuracy |
|
value: 82.1 |
|
- type: max_ap |
|
value: 91.09417013497443 |
|
- type: max_f1 |
|
value: 84.78437754271766 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/psc-pairclassification |
|
name: MTEB PSC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 98.05194805194806 |
|
- type: cos_sim_ap |
|
value: 99.52709687103496 |
|
- type: cos_sim_f1 |
|
value: 96.83257918552036 |
|
- type: cos_sim_precision |
|
value: 95.82089552238806 |
|
- type: cos_sim_recall |
|
value: 97.86585365853658 |
|
- type: dot_accuracy |
|
value: 92.30055658627087 |
|
- type: dot_ap |
|
value: 94.12759311032353 |
|
- type: dot_f1 |
|
value: 87.00906344410878 |
|
- type: dot_precision |
|
value: 86.22754491017965 |
|
- type: dot_recall |
|
value: 87.8048780487805 |
|
- type: euclidean_accuracy |
|
value: 98.05194805194806 |
|
- type: euclidean_ap |
|
value: 99.49402675624125 |
|
- type: euclidean_f1 |
|
value: 96.8133535660091 |
|
- type: euclidean_precision |
|
value: 96.37462235649546 |
|
- type: euclidean_recall |
|
value: 97.2560975609756 |
|
- type: manhattan_accuracy |
|
value: 98.05194805194806 |
|
- type: manhattan_ap |
|
value: 99.50120505935962 |
|
- type: manhattan_f1 |
|
value: 96.8133535660091 |
|
- type: manhattan_precision |
|
value: 96.37462235649546 |
|
- type: manhattan_recall |
|
value: 97.2560975609756 |
|
- type: max_accuracy |
|
value: 98.05194805194806 |
|
- type: max_ap |
|
value: 99.52709687103496 |
|
- type: max_f1 |
|
value: 96.83257918552036 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_in |
|
name: MTEB PolEmo2.0-IN |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 69.45983379501385 |
|
- type: f1 |
|
value: 68.60917948426784 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_out |
|
name: MTEB PolEmo2.0-OUT |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 43.13765182186235 |
|
- type: f1 |
|
value: 36.15557441785656 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: quora-pl |
|
name: MTEB Quora-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 67.448 |
|
- type: map_at_10 |
|
value: 81.566 |
|
- type: map_at_100 |
|
value: 82.284 |
|
- type: map_at_1000 |
|
value: 82.301 |
|
- type: map_at_3 |
|
value: 78.425 |
|
- type: map_at_5 |
|
value: 80.43400000000001 |
|
- type: mrr_at_1 |
|
value: 77.61 |
|
- type: mrr_at_10 |
|
value: 84.467 |
|
- type: mrr_at_100 |
|
value: 84.63199999999999 |
|
- type: mrr_at_1000 |
|
value: 84.634 |
|
- type: mrr_at_3 |
|
value: 83.288 |
|
- type: mrr_at_5 |
|
value: 84.095 |
|
- type: ndcg_at_1 |
|
value: 77.66 |
|
- type: ndcg_at_10 |
|
value: 85.63199999999999 |
|
- type: ndcg_at_100 |
|
value: 87.166 |
|
- type: ndcg_at_1000 |
|
value: 87.306 |
|
- type: ndcg_at_3 |
|
value: 82.32300000000001 |
|
- type: ndcg_at_5 |
|
value: 84.22 |
|
- type: precision_at_1 |
|
value: 77.66 |
|
- type: precision_at_10 |
|
value: 13.136000000000001 |
|
- type: precision_at_100 |
|
value: 1.522 |
|
- type: precision_at_1000 |
|
value: 0.156 |
|
- type: precision_at_3 |
|
value: 36.153 |
|
- type: precision_at_5 |
|
value: 23.982 |
|
- type: recall_at_1 |
|
value: 67.448 |
|
- type: recall_at_10 |
|
value: 93.83200000000001 |
|
- type: recall_at_100 |
|
value: 99.212 |
|
- type: recall_at_1000 |
|
value: 99.94 |
|
- type: recall_at_3 |
|
value: 84.539 |
|
- type: recall_at_5 |
|
value: 89.71000000000001 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scidocs-pl |
|
name: MTEB SCIDOCS-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 4.393 |
|
- type: map_at_10 |
|
value: 11.472 |
|
- type: map_at_100 |
|
value: 13.584999999999999 |
|
- type: map_at_1000 |
|
value: 13.918 |
|
- type: map_at_3 |
|
value: 8.212 |
|
- type: map_at_5 |
|
value: 9.864 |
|
- type: mrr_at_1 |
|
value: 21.7 |
|
- type: mrr_at_10 |
|
value: 32.268 |
|
- type: mrr_at_100 |
|
value: 33.495000000000005 |
|
- type: mrr_at_1000 |
|
value: 33.548 |
|
- type: mrr_at_3 |
|
value: 29.15 |
|
- type: mrr_at_5 |
|
value: 30.91 |
|
- type: ndcg_at_1 |
|
value: 21.6 |
|
- type: ndcg_at_10 |
|
value: 19.126 |
|
- type: ndcg_at_100 |
|
value: 27.496 |
|
- type: ndcg_at_1000 |
|
value: 33.274 |
|
- type: ndcg_at_3 |
|
value: 18.196 |
|
- type: ndcg_at_5 |
|
value: 15.945 |
|
- type: precision_at_1 |
|
value: 21.6 |
|
- type: precision_at_10 |
|
value: 9.94 |
|
- type: precision_at_100 |
|
value: 2.1999999999999997 |
|
- type: precision_at_1000 |
|
value: 0.359 |
|
- type: precision_at_3 |
|
value: 17.2 |
|
- type: precision_at_5 |
|
value: 14.12 |
|
- type: recall_at_1 |
|
value: 4.393 |
|
- type: recall_at_10 |
|
value: 20.166999999999998 |
|
- type: recall_at_100 |
|
value: 44.678000000000004 |
|
- type: recall_at_1000 |
|
value: 72.868 |
|
- type: recall_at_3 |
|
value: 10.473 |
|
- type: recall_at_5 |
|
value: 14.313 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/sicke-pl-pairclassification |
|
name: MTEB SICK-E-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 82.65389319200979 |
|
- type: cos_sim_ap |
|
value: 76.13749398520014 |
|
- type: cos_sim_f1 |
|
value: 66.64355062413314 |
|
- type: cos_sim_precision |
|
value: 64.93243243243244 |
|
- type: cos_sim_recall |
|
value: 68.44729344729345 |
|
- type: dot_accuracy |
|
value: 76.0905014268243 |
|
- type: dot_ap |
|
value: 58.058968583382494 |
|
- type: dot_f1 |
|
value: 61.181080324657145 |
|
- type: dot_precision |
|
value: 50.391885661595204 |
|
- type: dot_recall |
|
value: 77.84900284900284 |
|
- type: euclidean_accuracy |
|
value: 82.61312678353036 |
|
- type: euclidean_ap |
|
value: 76.10290283033221 |
|
- type: euclidean_f1 |
|
value: 66.50782845473111 |
|
- type: euclidean_precision |
|
value: 63.6897001303781 |
|
- type: euclidean_recall |
|
value: 69.58689458689459 |
|
- type: manhattan_accuracy |
|
value: 82.6742763962495 |
|
- type: manhattan_ap |
|
value: 76.12712309700966 |
|
- type: manhattan_f1 |
|
value: 66.59700452803902 |
|
- type: manhattan_precision |
|
value: 65.16700749829583 |
|
- type: manhattan_recall |
|
value: 68.09116809116809 |
|
- type: max_accuracy |
|
value: 82.6742763962495 |
|
- type: max_ap |
|
value: 76.13749398520014 |
|
- type: max_f1 |
|
value: 66.64355062413314 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/sickr-pl-sts |
|
name: MTEB SICK-R-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 81.23898481255246 |
|
- type: cos_sim_spearman |
|
value: 76.0416957474899 |
|
- type: euclidean_pearson |
|
value: 78.96475496102107 |
|
- type: euclidean_spearman |
|
value: 76.07208683063504 |
|
- type: manhattan_pearson |
|
value: 78.92666424673251 |
|
- type: manhattan_spearman |
|
value: 76.04968227583831 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (pl) |
|
config: pl |
|
split: test |
|
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 39.13987124398541 |
|
- type: cos_sim_spearman |
|
value: 40.40194528288759 |
|
- type: euclidean_pearson |
|
value: 29.14566247168167 |
|
- type: euclidean_spearman |
|
value: 39.97389932591777 |
|
- type: manhattan_pearson |
|
value: 29.172993134388935 |
|
- type: manhattan_spearman |
|
value: 39.85681935287037 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scifact-pl |
|
name: MTEB SciFact-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 57.260999999999996 |
|
- type: map_at_10 |
|
value: 66.92399999999999 |
|
- type: map_at_100 |
|
value: 67.443 |
|
- type: map_at_1000 |
|
value: 67.47800000000001 |
|
- type: map_at_3 |
|
value: 64.859 |
|
- type: map_at_5 |
|
value: 65.71900000000001 |
|
- type: mrr_at_1 |
|
value: 60.333000000000006 |
|
- type: mrr_at_10 |
|
value: 67.95400000000001 |
|
- type: mrr_at_100 |
|
value: 68.42 |
|
- type: mrr_at_1000 |
|
value: 68.45 |
|
- type: mrr_at_3 |
|
value: 66.444 |
|
- type: mrr_at_5 |
|
value: 67.128 |
|
- type: ndcg_at_1 |
|
value: 60.333000000000006 |
|
- type: ndcg_at_10 |
|
value: 71.209 |
|
- type: ndcg_at_100 |
|
value: 73.37 |
|
- type: ndcg_at_1000 |
|
value: 74.287 |
|
- type: ndcg_at_3 |
|
value: 67.66799999999999 |
|
- type: ndcg_at_5 |
|
value: 68.644 |
|
- type: precision_at_1 |
|
value: 60.333000000000006 |
|
- type: precision_at_10 |
|
value: 9.467 |
|
- type: precision_at_100 |
|
value: 1.053 |
|
- type: precision_at_1000 |
|
value: 0.11299999999999999 |
|
- type: precision_at_3 |
|
value: 26.778000000000002 |
|
- type: precision_at_5 |
|
value: 16.933 |
|
- type: recall_at_1 |
|
value: 57.260999999999996 |
|
- type: recall_at_10 |
|
value: 83.256 |
|
- type: recall_at_100 |
|
value: 92.767 |
|
- type: recall_at_1000 |
|
value: 100.0 |
|
- type: recall_at_3 |
|
value: 72.933 |
|
- type: recall_at_5 |
|
value: 75.744 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: trec-covid-pl |
|
name: MTEB TRECCOVID-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.22 |
|
- type: map_at_10 |
|
value: 1.693 |
|
- type: map_at_100 |
|
value: 9.281 |
|
- type: map_at_1000 |
|
value: 21.462999999999997 |
|
- type: map_at_3 |
|
value: 0.609 |
|
- type: map_at_5 |
|
value: 0.9570000000000001 |
|
- type: mrr_at_1 |
|
value: 80.0 |
|
- type: mrr_at_10 |
|
value: 88.73299999999999 |
|
- type: mrr_at_100 |
|
value: 88.73299999999999 |
|
- type: mrr_at_1000 |
|
value: 88.73299999999999 |
|
- type: mrr_at_3 |
|
value: 88.333 |
|
- type: mrr_at_5 |
|
value: 88.73299999999999 |
|
- type: ndcg_at_1 |
|
value: 79.0 |
|
- type: ndcg_at_10 |
|
value: 71.177 |
|
- type: ndcg_at_100 |
|
value: 52.479 |
|
- type: ndcg_at_1000 |
|
value: 45.333 |
|
- type: ndcg_at_3 |
|
value: 77.48 |
|
- type: ndcg_at_5 |
|
value: 76.137 |
|
- type: precision_at_1 |
|
value: 82.0 |
|
- type: precision_at_10 |
|
value: 74.0 |
|
- type: precision_at_100 |
|
value: 53.68000000000001 |
|
- type: precision_at_1000 |
|
value: 19.954 |
|
- type: precision_at_3 |
|
value: 80.667 |
|
- type: precision_at_5 |
|
value: 80.80000000000001 |
|
- type: recall_at_1 |
|
value: 0.22 |
|
- type: recall_at_10 |
|
value: 1.934 |
|
- type: recall_at_100 |
|
value: 12.728 |
|
- type: recall_at_1000 |
|
value: 41.869 |
|
- type: recall_at_3 |
|
value: 0.637 |
|
- type: recall_at_5 |
|
value: 1.042 |
|
language: pl |
|
license: apache-2.0 |
|
widget: |
|
- source_sentence: "query: Jak dożyć 100 lat?" |
|
sentences: |
|
- "passage: Trzeba zdrowo się odżywiać i uprawiać sport." |
|
- "passage: Trzeba pić alkohol, imprezować i jeździć szybkimi autami." |
|
- "passage: Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu." |
|
|
|
--- |

<h1 align="center">MMLW-e5-large</h1>

MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") is a family of neural text encoders for Polish.
This is a distilled model that generates embeddings for tasks such as semantic similarity, clustering, and information retrieval. It can also serve as a base for further fine-tuning.
It maps texts to 1024-dimensional vectors.
The model was initialized from a multilingual E5 checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We used [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.

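
The distillation objective can be illustrated with a small, dependency-free sketch. It assumes the formulation from the linked paper: the student is trained so that its embeddings of an English sentence and of its Polish translation both match the teacher's embedding of the English sentence, measured by mean squared error. The vectors and the `distillation_loss` helper are hypothetical and only illustrate the idea; the actual training code, batching, and hyperparameters used for this model are not described here.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en, student_en, student_pl):
    """Loss for one translation pair (hypothetical helper).

    Both student embeddings are pulled towards the teacher's
    embedding of the English sentence.
    """
    return mse(teacher_en, student_en) + mse(teacher_en, student_pl)


# Toy 4-dimensional embeddings (made-up values, not model outputs).
teacher_en = [0.5, -0.25, 0.0, 1.0]
student_en = [0.5, -0.25, 0.0, 1.0]  # already matches the teacher
student_pl = [0.0, -0.25, 0.0, 0.5]  # differs in two coordinates

loss = distillation_loss(teacher_en, student_en, student_pl)
print(loss)
```

The loss falls to zero only when the student places both the English and the Polish sentence at the teacher's embedding, which is how the Polish side inherits the teacher's vector space.
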
## Usage (Sentence-Transformers)

⚠️ Our embedding models require the use of specific prefixes and suffixes when encoding texts. For this model, queries should be prefixed with **"query: "** and passages with **"passage: "**. ⚠️

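
One way to avoid forgetting the prefixes is to route every input through a small helper. The sketch below is only an illustration: the names `with_prefix`, `prepare_queries`, and `prepare_passages` are hypothetical and not part of any library; the two prefix strings are the ones required above.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def with_prefix(text, prefix):
    """Prepend `prefix` unless the text already starts with it."""
    return text if text.startswith(prefix) else prefix + text


def prepare_queries(texts):
    """Return the texts with the query prefix applied exactly once."""
    return [with_prefix(t, QUERY_PREFIX) for t in texts]


def prepare_passages(texts):
    """Return the texts with the passage prefix applied exactly once."""
    return [with_prefix(t, PASSAGE_PREFIX) for t in texts]


# The second input already carries the prefix and is left unchanged.
queries = prepare_queries(["Jak dożyć 100 lat?", "query: Jak dożyć 100 lat?"])
passages = prepare_passages(["Trzeba zdrowo się odżywiać i uprawiać sport."])
print(queries)
print(passages)
```

The resulting lists can be passed directly to the encoder in place of manually concatenated strings.
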

You can use the model with [sentence-transformers](https://www.SBERT.net) like this:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Every input must carry the matching prefix.
query_prefix = "query: "
answer_prefix = "passage: "

# Query: "How to live to 100?"
queries = [query_prefix + "Jak dożyć 100 lat?"]
# Passages, in order:
#   "You need to eat healthily and play sports."
#   "You need to drink alcohol, party and drive fast cars."
#   "During the campaign, politicians promised to deal with the Sunday trading ban."
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

model = SentenceTransformer("sdadas/mmlw-e5-large")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Pick the passage with the highest cosine similarity to the query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# Trzeba zdrowo się odżywiać i uprawiać sport.
```

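
The ranking step in the example comes down to cosine similarity followed by an argmax. The sketch below reproduces that logic without any dependencies, using made-up 3-dimensional vectors instead of real embeddings; `cosine` and `best_match` are hypothetical helper names chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(query_vec, passage_vecs):
    """Return (index of the most similar passage, all similarity scores)."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy vectors (made-up values, not model outputs).
query_vec = [1.0, 0.0, 1.0]
passage_vecs = [
    [2.0, 0.0, 2.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # partially aligned
]

best_index, scores = best_match(query_vec, passage_vecs)
print(best_index, scores)
```

Note that `argmax()` without a dimension argument works in the example because there is a single query. With several queries, the similarity matrix has one row per query, so a per-row argmax (for example `argmax(dim=1)` on a PyTorch tensor) would be needed instead.
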
## Evaluation Results

- The model achieves an **Average Score** of **61.17** on the Polish Massive Text Embedding Benchmark (MTEB). See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves an **NDCG@10** of **56.09** on the Polish Information Retrieval Benchmark. See the [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.

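
For readers unfamiliar with NDCG@10, the sketch below shows one common formulation with linear gain and a logarithmic rank discount. It is an illustration under stated assumptions, not the evaluation code behind the figures above: benchmark implementations may differ in details, and the helper names and relevance labels are made up. It also assumes the ranked list contains every relevant document, so the ideal ordering can be derived from the list itself.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each retrieved passage, in ranked order (made-up labels):
# the 1st and 3rd results are relevant, the others are not.
ranked_relevance = [1, 0, 1, 0, 0]

score = ndcg_at_k(ranked_relevance, k=10)
print(round(score, 4))
```

A perfect ranking scores 1.0, and the score drops as relevant passages appear lower in the list.
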
## Acknowledgements

This model was trained with the A100 GPU cluster support delivered by the Gdansk University of Technology within the TASK center initiative.

## Citation

```bibtex
@article{dadas2024pirb,
  title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
  year={2024},
  eprint={2402.13350},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```