library_name: peft
license: mit
language:
- en
pipeline_tag: sentence-similarity
tags:
- text-embedding
- embeddings
- information-retrieval
- beir
- text-classification
- language-model
- text-clustering
- text-semantic-similarity
- text-evaluation
- text-reranking
- feature-extraction
- sentence-similarity
- Sentence Similarity
- natural_questions
- ms_marco
- fever
- hotpot_qa
- mteb
model-index:
- name: LLM2Vec-Llama-2-supervised
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 82.22388059701493
    - type: ap
      value: 47.788307673555714
    - type: f1
      value: 76.49604943193079
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 89.69365
    - type: ap
      value: 86.10524801582373
    - type: f1
      value: 89.68072139277054
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 48.472
    - type: f1
      value: 47.393562374719444
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 29.942999999999998
    - type: map_at_10
      value: 47.233999999999995
    - type: map_at_100
      value: 48.031
    - type: map_at_1000
      value: 48.033
    - type: map_at_3
      value: 42.307
    - type: map_at_5
      value: 45.269
    - type: mrr_at_1
      value: 30.797
    - type: mrr_at_10
      value: 47.53
    - type: mrr_at_100
      value: 48.327
    - type: mrr_at_1000
      value: 48.329
    - type: mrr_at_3
      value: 42.662
    - type: mrr_at_5
      value: 45.564
    - type: ndcg_at_1
      value: 29.942999999999998
    - type: ndcg_at_10
      value: 56.535000000000004
    - type: ndcg_at_100
      value: 59.699999999999996
    - type: ndcg_at_1000
      value: 59.731
    - type: ndcg_at_3
      value: 46.397
    - type: ndcg_at_5
      value: 51.747
    - type: precision_at_1
      value: 29.942999999999998
    - type: precision_at_10
      value: 8.613
    - type: precision_at_100
      value: 0.9939999999999999
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 19.417
    - type: precision_at_5
      value: 14.252999999999998
    - type: recall_at_1
      value: 29.942999999999998
    - type: recall_at_10
      value: 86.131
    - type: recall_at_100
      value: 99.431
    - type: recall_at_1000
      value: 99.644
    - type: recall_at_3
      value: 58.25
    - type: recall_at_5
      value: 71.26599999999999
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 43.136536817000525
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 42.37552764639677
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 63.13252095544898
    - type: mrr
      value: 75.23721584663414
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_spearman
      value: 82.13259433844514
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 88.16558441558442
    - type: f1
      value: 88.1065214360906
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 35.88158182824787
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 34.80880955757979
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/android
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 35.793
    - type: map_at_10
      value: 48.413000000000004
    - type: map_at_100
      value: 50.112
    - type: map_at_1000
      value: 50.212999999999994
    - type: map_at_3
      value: 44.656
    - type: map_at_5
      value: 46.577
    - type: mrr_at_1
      value: 44.921
    - type: mrr_at_10
      value: 55.16
    - type: mrr_at_100
      value: 55.886
    - type: mrr_at_1000
      value: 55.915000000000006
    - type: mrr_at_3
      value: 52.861000000000004
    - type: mrr_at_5
      value: 54.113
    - type: ndcg_at_1
      value: 44.921
    - type: ndcg_at_10
      value: 55.205000000000005
    - type: ndcg_at_100
      value: 60.62800000000001
    - type: ndcg_at_1000
      value: 61.949
    - type: ndcg_at_3
      value: 50.597
    - type: ndcg_at_5
      value: 52.261
    - type: precision_at_1
      value: 44.921
    - type: precision_at_10
      value: 10.73
    - type: precision_at_100
      value: 1.6809999999999998
    - type: precision_at_1000
      value: 0.208
    - type: precision_at_3
      value: 24.701999999999998
    - type: precision_at_5
      value: 17.339
    - type: recall_at_1
      value: 35.793
    - type: recall_at_10
      value: 67.49300000000001
    - type: recall_at_100
      value: 89.74499999999999
    - type: recall_at_1000
      value: 97.855
    - type: recall_at_3
      value: 52.586
    - type: recall_at_5
      value: 58.267
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/english
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 35.989
    - type: map_at_10
      value: 47.61
    - type: map_at_100
      value: 48.956
    - type: map_at_1000
      value: 49.074
    - type: map_at_3
      value: 44.563
    - type: map_at_5
      value: 46.181
    - type: mrr_at_1
      value: 45.096000000000004
    - type: mrr_at_10
      value: 53.583999999999996
    - type: mrr_at_100
      value: 54.242000000000004
    - type: mrr_at_1000
      value: 54.277
    - type: mrr_at_3
      value: 51.73
    - type: mrr_at_5
      value: 52.759
    - type: ndcg_at_1
      value: 45.096000000000004
    - type: ndcg_at_10
      value: 53.318
    - type: ndcg_at_100
      value: 57.541
    - type: ndcg_at_1000
      value: 59.30800000000001
    - type: ndcg_at_3
      value: 49.725
    - type: ndcg_at_5
      value: 51.117000000000004
    - type: precision_at_1
      value: 45.096000000000004
    - type: precision_at_10
      value: 10.032
    - type: precision_at_100
      value: 1.559
    - type: precision_at_1000
      value: 0.201
    - type: precision_at_3
      value: 24.331
    - type: precision_at_5
      value: 16.777
    - type: recall_at_1
      value: 35.989
    - type: recall_at_10
      value: 62.759
    - type: recall_at_100
      value: 80.353
    - type: recall_at_1000
      value: 91.328
    - type: recall_at_3
      value: 51.127
    - type: recall_at_5
      value: 55.823
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/gaming
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 44.277
    - type: map_at_10
      value: 57.699
    - type: map_at_100
      value: 58.718
    - type: map_at_1000
      value: 58.754
    - type: map_at_3
      value: 54.04
    - type: map_at_5
      value: 56.184999999999995
    - type: mrr_at_1
      value: 50.658
    - type: mrr_at_10
      value: 61.245000000000005
    - type: mrr_at_100
      value: 61.839999999999996
    - type: mrr_at_1000
      value: 61.85699999999999
    - type: mrr_at_3
      value: 58.797999999999995
    - type: mrr_at_5
      value: 60.35
    - type: ndcg_at_1
      value: 50.658
    - type: ndcg_at_10
      value: 63.788
    - type: ndcg_at_100
      value: 67.52
    - type: ndcg_at_1000
      value: 68.12
    - type: ndcg_at_3
      value: 57.923
    - type: ndcg_at_5
      value: 60.976
    - type: precision_at_1
      value: 50.658
    - type: precision_at_10
      value: 10.257
    - type: precision_at_100
      value: 1.303
    - type: precision_at_1000
      value: 0.13799999999999998
    - type: precision_at_3
      value: 25.705
    - type: precision_at_5
      value: 17.718
    - type: recall_at_1
      value: 44.277
    - type: recall_at_10
      value: 78.056
    - type: recall_at_100
      value: 93.973
    - type: recall_at_1000
      value: 97.946
    - type: recall_at_3
      value: 62.578
    - type: recall_at_5
      value: 70.03
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/gis
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.101
    - type: map_at_10
      value: 36.775000000000006
    - type: map_at_100
      value: 37.901
    - type: map_at_1000
      value: 37.97
    - type: map_at_3
      value: 33.721000000000004
    - type: map_at_5
      value: 35.641
    - type: mrr_at_1
      value: 29.153000000000002
    - type: mrr_at_10
      value: 38.951
    - type: mrr_at_100
      value: 39.896
    - type: mrr_at_1000
      value: 39.946
    - type: mrr_at_3
      value: 36.102000000000004
    - type: mrr_at_5
      value: 37.96
    - type: ndcg_at_1
      value: 29.153000000000002
    - type: ndcg_at_10
      value: 42.134
    - type: ndcg_at_100
      value: 47.499
    - type: ndcg_at_1000
      value: 49.169000000000004
    - type: ndcg_at_3
      value: 36.351
    - type: ndcg_at_5
      value: 39.596
    - type: precision_at_1
      value: 29.153000000000002
    - type: precision_at_10
      value: 6.508
    - type: precision_at_100
      value: 0.966
    - type: precision_at_1000
      value: 0.11499999999999999
    - type: precision_at_3
      value: 15.367
    - type: precision_at_5
      value: 11.096
    - type: recall_at_1
      value: 27.101
    - type: recall_at_10
      value: 56.447
    - type: recall_at_100
      value: 80.828
    - type: recall_at_1000
      value: 93.171
    - type: recall_at_3
      value: 41.087
    - type: recall_at_5
      value: 48.888999999999996
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/mathematica
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 19.227
    - type: map_at_10
      value: 28.965000000000003
    - type: map_at_100
      value: 30.383
    - type: map_at_1000
      value: 30.494
    - type: map_at_3
      value: 26.157999999999998
    - type: map_at_5
      value: 27.794
    - type: mrr_at_1
      value: 23.756
    - type: mrr_at_10
      value: 33.728
    - type: mrr_at_100
      value: 34.743
    - type: mrr_at_1000
      value: 34.799
    - type: mrr_at_3
      value: 31.074
    - type: mrr_at_5
      value: 32.803
    - type: ndcg_at_1
      value: 23.756
    - type: ndcg_at_10
      value: 34.772
    - type: ndcg_at_100
      value: 41.041
    - type: ndcg_at_1000
      value: 43.399
    - type: ndcg_at_3
      value: 29.776000000000003
    - type: ndcg_at_5
      value: 32.318999999999996
    - type: precision_at_1
      value: 23.756
    - type: precision_at_10
      value: 6.505
    - type: precision_at_100
      value: 1.107
    - type: precision_at_1000
      value: 0.14400000000000002
    - type: precision_at_3
      value: 14.594
    - type: precision_at_5
      value: 10.671999999999999
    - type: recall_at_1
      value: 19.227
    - type: recall_at_10
      value: 47.514
    - type: recall_at_100
      value: 74.378
    - type: recall_at_1000
      value: 90.615
    - type: recall_at_3
      value: 33.995
    - type: recall_at_5
      value: 40.361000000000004
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/physics
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 34.164
    - type: map_at_10
      value: 45.943
    - type: map_at_100
      value: 47.321999999999996
    - type: map_at_1000
      value: 47.426
    - type: map_at_3
      value: 42.485
    - type: map_at_5
      value: 44.440000000000005
    - type: mrr_at_1
      value: 41.577999999999996
    - type: mrr_at_10
      value: 51.373000000000005
    - type: mrr_at_100
      value: 52.176
    - type: mrr_at_1000
      value: 52.205999999999996
    - type: mrr_at_3
      value: 49.07
    - type: mrr_at_5
      value: 50.451
    - type: ndcg_at_1
      value: 41.577999999999996
    - type: ndcg_at_10
      value: 52.071
    - type: ndcg_at_100
      value: 57.467999999999996
    - type: ndcg_at_1000
      value: 59.068
    - type: ndcg_at_3
      value: 47.053
    - type: ndcg_at_5
      value: 49.508
    - type: precision_at_1
      value: 41.577999999999996
    - type: precision_at_10
      value: 9.461
    - type: precision_at_100
      value: 1.425
    - type: precision_at_1000
      value: 0.17500000000000002
    - type: precision_at_3
      value: 22.425
    - type: precision_at_5
      value: 15.823
    - type: recall_at_1
      value: 34.164
    - type: recall_at_10
      value: 64.446
    - type: recall_at_100
      value: 86.978
    - type: recall_at_1000
      value: 96.976
    - type: recall_at_3
      value: 50.358999999999995
    - type: recall_at_5
      value: 56.825
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/programmers
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 30.988
    - type: map_at_10
      value: 43.293
    - type: map_at_100
      value: 44.64
    - type: map_at_1000
      value: 44.735
    - type: map_at_3
      value: 39.041
    - type: map_at_5
      value: 41.461999999999996
    - type: mrr_at_1
      value: 39.498
    - type: mrr_at_10
      value: 49.763000000000005
    - type: mrr_at_100
      value: 50.517
    - type: mrr_at_1000
      value: 50.556
    - type: mrr_at_3
      value: 46.747
    - type: mrr_at_5
      value: 48.522
    - type: ndcg_at_1
      value: 39.498
    - type: ndcg_at_10
      value: 50.285000000000004
    - type: ndcg_at_100
      value: 55.457
    - type: ndcg_at_1000
      value: 57.062999999999995
    - type: ndcg_at_3
      value: 43.795
    - type: ndcg_at_5
      value: 46.813
    - type: precision_at_1
      value: 39.498
    - type: precision_at_10
      value: 9.486
    - type: precision_at_100
      value: 1.403
    - type: precision_at_1000
      value: 0.172
    - type: precision_at_3
      value: 21.081
    - type: precision_at_5
      value: 15.434000000000001
    - type: recall_at_1
      value: 30.988
    - type: recall_at_10
      value: 64.751
    - type: recall_at_100
      value: 86.496
    - type: recall_at_1000
      value: 96.86200000000001
    - type: recall_at_3
      value: 46.412
    - type: recall_at_5
      value: 54.381
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 29.636000000000003
    - type: map_at_10
      value: 40.15091666666667
    - type: map_at_100
      value: 41.47933333333333
    - type: map_at_1000
      value: 41.58425
    - type: map_at_3
      value: 36.98025
    - type: map_at_5
      value: 38.76483333333333
    - type: mrr_at_1
      value: 35.3525
    - type: mrr_at_10
      value: 44.62258333333334
    - type: mrr_at_100
      value: 45.47491666666667
    - type: mrr_at_1000
      value: 45.52275
    - type: mrr_at_3
      value: 42.18574999999999
    - type: mrr_at_5
      value: 43.608333333333334
    - type: ndcg_at_1
      value: 35.3525
    - type: ndcg_at_10
      value: 45.935333333333325
    - type: ndcg_at_100
      value: 51.185249999999996
    - type: ndcg_at_1000
      value: 53.07075
    - type: ndcg_at_3
      value: 40.893416666666674
    - type: ndcg_at_5
      value: 43.272916666666674
    - type: precision_at_1
      value: 35.3525
    - type: precision_at_10
      value: 8.118
    - type: precision_at_100
      value: 1.2704166666666667
    - type: precision_at_1000
      value: 0.16158333333333333
    - type: precision_at_3
      value: 18.987000000000002
    - type: precision_at_5
      value: 13.416083333333335
    - type: recall_at_1
      value: 29.636000000000003
    - type: recall_at_10
      value: 58.38899999999999
    - type: recall_at_100
      value: 81.08758333333334
    - type: recall_at_1000
      value: 93.93433333333333
    - type: recall_at_3
      value: 44.1485
    - type: recall_at_5
      value: 50.43808333333334
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/stats
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 25.102999999999998
    - type: map_at_10
      value: 33.822
    - type: map_at_100
      value: 34.77
    - type: map_at_1000
      value: 34.862
    - type: map_at_3
      value: 31.305
    - type: map_at_5
      value: 32.714999999999996
    - type: mrr_at_1
      value: 28.221
    - type: mrr_at_10
      value: 36.677
    - type: mrr_at_100
      value: 37.419999999999995
    - type: mrr_at_1000
      value: 37.49
    - type: mrr_at_3
      value: 34.407
    - type: mrr_at_5
      value: 35.510999999999996
    - type: ndcg_at_1
      value: 28.221
    - type: ndcg_at_10
      value: 38.739000000000004
    - type: ndcg_at_100
      value: 43.4
    - type: ndcg_at_1000
      value: 45.759
    - type: ndcg_at_3
      value: 34.076
    - type: ndcg_at_5
      value: 36.153999999999996
    - type: precision_at_1
      value: 28.221
    - type: precision_at_10
      value: 6.227
    - type: precision_at_100
      value: 0.9339999999999999
    - type: precision_at_1000
      value: 0.122
    - type: precision_at_3
      value: 14.979999999999999
    - type: precision_at_5
      value: 10.306999999999999
    - type: recall_at_1
      value: 25.102999999999998
    - type: recall_at_10
      value: 50.924
    - type: recall_at_100
      value: 72.507
    - type: recall_at_1000
      value: 89.869
    - type: recall_at_3
      value: 38.041000000000004
    - type: recall_at_5
      value: 43.139
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/tex
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 19.284000000000002
    - type: map_at_10
      value: 27.632
    - type: map_at_100
      value: 28.811999999999998
    - type: map_at_1000
      value: 28.937
    - type: map_at_3
      value: 24.884
    - type: map_at_5
      value: 26.479999999999997
    - type: mrr_at_1
      value: 23.641000000000002
    - type: mrr_at_10
      value: 31.716
    - type: mrr_at_100
      value: 32.644
    - type: mrr_at_1000
      value: 32.717
    - type: mrr_at_3
      value: 29.284
    - type: mrr_at_5
      value: 30.697000000000003
    - type: ndcg_at_1
      value: 23.641000000000002
    - type: ndcg_at_10
      value: 32.805
    - type: ndcg_at_100
      value: 38.229
    - type: ndcg_at_1000
      value: 40.938
    - type: ndcg_at_3
      value: 28.116999999999997
    - type: ndcg_at_5
      value: 30.442999999999998
    - type: precision_at_1
      value: 23.641000000000002
    - type: precision_at_10
      value: 6.05
    - type: precision_at_100
      value: 1.0250000000000001
    - type: precision_at_1000
      value: 0.14400000000000002
    - type: precision_at_3
      value: 13.478000000000002
    - type: precision_at_5
      value: 9.876
    - type: recall_at_1
      value: 19.284000000000002
    - type: recall_at_10
      value: 44.257999999999996
    - type: recall_at_100
      value: 68.475
    - type: recall_at_1000
      value: 87.362
    - type: recall_at_3
      value: 31.09
    - type: recall_at_5
      value: 37.13
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/unix
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 30.301000000000002
    - type: map_at_10
      value: 40.65
    - type: map_at_100
      value: 41.934
    - type: map_at_1000
      value: 42.025
    - type: map_at_3
      value: 37.482
    - type: map_at_5
      value: 39.364
    - type: mrr_at_1
      value: 35.728
    - type: mrr_at_10
      value: 44.836999999999996
    - type: mrr_at_100
      value: 45.747
    - type: mrr_at_1000
      value: 45.800000000000004
    - type: mrr_at_3
      value: 42.335
    - type: mrr_at_5
      value: 43.818
    - type: ndcg_at_1
      value: 35.728
    - type: ndcg_at_10
      value: 46.199
    - type: ndcg_at_100
      value: 51.721
    - type: ndcg_at_1000
      value: 53.751000000000005
    - type: ndcg_at_3
      value: 41.053
    - type: ndcg_at_5
      value: 43.686
    - type: precision_at_1
      value: 35.728
    - type: precision_at_10
      value: 7.836
    - type: precision_at_100
      value: 1.179
    - type: precision_at_1000
      value: 0.146
    - type: precision_at_3
      value: 18.781
    - type: precision_at_5
      value: 13.245999999999999
    - type: recall_at_1
      value: 30.301000000000002
    - type: recall_at_10
      value: 58.626999999999995
    - type: recall_at_100
      value: 82.245
    - type: recall_at_1000
      value: 96.177
    - type: recall_at_3
      value: 44.533
    - type: recall_at_5
      value: 51.449
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/webmasters
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 29.203000000000003
    - type: map_at_10
      value: 38.988
    - type: map_at_100
      value: 40.986
    - type: map_at_1000
      value: 41.198
    - type: map_at_3
      value: 36.069
    - type: map_at_5
      value: 37.547000000000004
    - type: mrr_at_1
      value: 35.178
    - type: mrr_at_10
      value: 43.858999999999995
    - type: mrr_at_100
      value: 44.938
    - type: mrr_at_1000
      value: 44.986
    - type: mrr_at_3
      value: 41.535
    - type: mrr_at_5
      value: 42.809999999999995
    - type: ndcg_at_1
      value: 35.178
    - type: ndcg_at_10
      value: 45.025
    - type: ndcg_at_100
      value: 51.397999999999996
    - type: ndcg_at_1000
      value: 53.419000000000004
    - type: ndcg_at_3
      value: 40.451
    - type: ndcg_at_5
      value: 42.304
    - type: precision_at_1
      value: 35.178
    - type: precision_at_10
      value: 8.538
    - type: precision_at_100
      value: 1.755
    - type: precision_at_1000
      value: 0.249
    - type: precision_at_3
      value: 18.906
    - type: precision_at_5
      value: 13.241
    - type: recall_at_1
      value: 29.203000000000003
    - type: recall_at_10
      value: 55.876999999999995
    - type: recall_at_100
      value: 83.234
    - type: recall_at_1000
      value: 96.056
    - type: recall_at_3
      value: 42.472
    - type: recall_at_5
      value: 47.78
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/wordpress
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 24.202
    - type: map_at_10
      value: 32.021
    - type: map_at_100
      value: 33.217999999999996
    - type: map_at_1000
      value: 33.323
    - type: map_at_3
      value: 29.359
    - type: map_at_5
      value: 30.792
    - type: mrr_at_1
      value: 26.802
    - type: mrr_at_10
      value: 34.577999999999996
    - type: mrr_at_100
      value: 35.65
    - type: mrr_at_1000
      value: 35.724000000000004
    - type: mrr_at_3
      value: 32.286
    - type: mrr_at_5
      value: 33.506
    - type: ndcg_at_1
      value: 26.802
    - type: ndcg_at_10
      value: 36.882999999999996
    - type: ndcg_at_100
      value: 42.321
    - type: ndcg_at_1000
      value: 44.906
    - type: ndcg_at_3
      value: 31.804
    - type: ndcg_at_5
      value: 34.098
    - type: precision_at_1
      value: 26.802
    - type: precision_at_10
      value: 5.7860000000000005
    - type: precision_at_100
      value: 0.9079999999999999
    - type: precision_at_1000
      value: 0.125
    - type: precision_at_3
      value: 13.494
    - type: precision_at_5
      value: 9.464
    - type: recall_at_1
      value: 24.202
    - type: recall_at_10
      value: 49.516
    - type: recall_at_100
      value: 73.839
    - type: recall_at_1000
      value: 92.995
    - type: recall_at_3
      value: 35.502
    - type: recall_at_5
      value: 41.183
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 12.651000000000002
    - type: map_at_10
      value: 21.773
    - type: map_at_100
      value: 23.901
    - type: map_at_1000
      value: 24.096999999999998
    - type: map_at_3
      value: 18.012
    - type: map_at_5
      value: 19.979
    - type: mrr_at_1
      value: 28.143
    - type: mrr_at_10
      value: 40.772999999999996
    - type: mrr_at_100
      value: 41.735
    - type: mrr_at_1000
      value: 41.768
    - type: mrr_at_3
      value: 37.458999999999996
    - type: mrr_at_5
      value: 39.528
    - type: ndcg_at_1
      value: 28.143
    - type: ndcg_at_10
      value: 30.705
    - type: ndcg_at_100
      value: 38.554
    - type: ndcg_at_1000
      value: 41.846
    - type: ndcg_at_3
      value: 24.954
    - type: ndcg_at_5
      value: 27.12
    - type: precision_at_1
      value: 28.143
    - type: precision_at_10
      value: 9.622
    - type: precision_at_100
      value: 1.8030000000000002
    - type: precision_at_1000
      value: 0.242
    - type: precision_at_3
      value: 18.654
    - type: precision_at_5
      value: 14.567
    - type: recall_at_1
      value: 12.651000000000002
    - type: recall_at_10
      value: 37.24
    - type: recall_at_100
      value: 63.660000000000004
    - type: recall_at_1000
      value: 81.878
    - type: recall_at_3
      value: 23.205000000000002
    - type: recall_at_5
      value: 29.081000000000003
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 10.075000000000001
    - type: map_at_10
      value: 23.344
    - type: map_at_100
      value: 33.219
    - type: map_at_1000
      value: 35.165
    - type: map_at_3
      value: 15.857
    - type: map_at_5
      value: 19.195999999999998
    - type: mrr_at_1
      value: 74.5
    - type: mrr_at_10
      value: 81.056
    - type: mrr_at_100
      value: 81.281
    - type: mrr_at_1000
      value: 81.285
    - type: mrr_at_3
      value: 79.667
    - type: mrr_at_5
      value: 80.529
    - type: ndcg_at_1
      value: 62.125
    - type: ndcg_at_10
      value: 48.416
    - type: ndcg_at_100
      value: 52.842999999999996
    - type: ndcg_at_1000
      value: 60.318000000000005
    - type: ndcg_at_3
      value: 52.381
    - type: ndcg_at_5
      value: 50.439
    - type: precision_at_1
      value: 74.5
    - type: precision_at_10
      value: 38.975
    - type: precision_at_100
      value: 12.046999999999999
    - type: precision_at_1000
      value: 2.3369999999999997
    - type: precision_at_3
      value: 55.833
    - type: precision_at_5
      value: 49.2
    - type: recall_at_1
      value: 10.075000000000001
    - type: recall_at_10
      value: 29.470000000000002
    - type: recall_at_100
      value: 59.09100000000001
    - type: recall_at_1000
      value: 82.555
    - type: recall_at_3
      value: 17.058
    - type: recall_at_5
      value: 22.148
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 51.70999999999999
    - type: f1
      value: 46.808328210555985
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 80.026
    - type: map_at_10
      value: 86.856
    - type: map_at_100
      value: 87.04899999999999
    - type: map_at_1000
      value: 87.062
    - type: map_at_3
      value: 85.964
    - type: map_at_5
      value: 86.53699999999999
    - type: mrr_at_1
      value: 86.169
    - type: mrr_at_10
      value: 91.569
    - type: mrr_at_100
      value: 91.619
    - type: mrr_at_1000
      value: 91.619
    - type: mrr_at_3
      value: 91.12700000000001
    - type: mrr_at_5
      value: 91.45400000000001
    - type: ndcg_at_1
      value: 86.169
    - type: ndcg_at_10
      value: 89.92599999999999
    - type: ndcg_at_100
      value: 90.565
    - type: ndcg_at_1000
      value: 90.762
    - type: ndcg_at_3
      value: 88.673
    - type: ndcg_at_5
      value: 89.396
    - type: precision_at_1
      value: 86.169
    - type: precision_at_10
      value: 10.530000000000001
    - type: precision_at_100
      value: 1.107
    - type: precision_at_1000
      value: 0.11399999999999999
    - type: precision_at_3
      value: 33.303
    - type: precision_at_5
      value: 20.528
    - type: recall_at_1
      value: 80.026
    - type: recall_at_10
      value: 94.781
    - type: recall_at_100
      value: 97.209
    - type: recall_at_1000
      value: 98.38
    - type: recall_at_3
      value: 91.34299999999999
    - type: recall_at_5
      value: 93.256
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 26.222
    - type: map_at_10
      value: 42.833
    - type: map_at_100
      value: 44.935
    - type: map_at_1000
      value: 45.079
    - type: map_at_3
      value: 37.016
    - type: map_at_5
      value: 40.264
    - type: mrr_at_1
      value: 50.617000000000004
    - type: mrr_at_10
      value: 58.799
    - type: mrr_at_100
      value: 59.455999999999996
    - type: mrr_at_1000
      value: 59.48
    - type: mrr_at_3
      value: 56.172999999999995
    - type: mrr_at_5
      value: 57.724
    - type: ndcg_at_1
      value: 50.617000000000004
    - type: ndcg_at_10
      value: 51.281
    - type: ndcg_at_100
      value: 57.922
    - type: ndcg_at_1000
      value: 60.141
    - type: ndcg_at_3
      value: 46.19
    - type: ndcg_at_5
      value: 47.998000000000005
    - type: precision_at_1
      value: 50.617000000000004
    - type: precision_at_10
      value: 14.321
    - type: precision_at_100
      value: 2.136
    - type: precision_at_1000
      value: 0.253
    - type: precision_at_3
      value: 30.503999999999998
    - type: precision_at_5
      value: 22.685
    - type: recall_at_1
      value: 26.222
    - type: recall_at_10
      value: 59.241
    - type: recall_at_100
      value: 83.102
    - type: recall_at_1000
      value: 96.318
    - type: recall_at_3
      value: 41.461999999999996
    - type: recall_at_5
      value: 49.389
  - task:
      type: Retrieval
    dataset:
      type: hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 38.379000000000005
    - type: map_at_10
      value: 65.397
    - type: map_at_100
      value: 66.347
    - type: map_at_1000
      value: 66.39699999999999
    - type: map_at_3
      value: 61.637
    - type: map_at_5
      value: 63.966
    - type: mrr_at_1
      value: 76.77199999999999
    - type: mrr_at_10
      value: 82.797
    - type: mrr_at_100
      value: 83.011
    - type: mrr_at_1000
      value: 83.018
    - type: mrr_at_3
      value: 81.711
    - type: mrr_at_5
      value: 82.405
    - type: ndcg_at_1
      value: 76.759
    - type: ndcg_at_10
      value: 72.987
    - type: ndcg_at_100
      value: 76.209
    - type: ndcg_at_1000
      value: 77.137
    - type: ndcg_at_3
      value: 67.655
    - type: ndcg_at_5
      value: 70.6
    - type: precision_at_1
      value: 76.759
    - type: precision_at_10
      value: 15.645000000000001
    - type: precision_at_100
      value: 1.813
    - type: precision_at_1000
      value: 0.193
    - type: precision_at_3
      value: 44.299
    - type: precision_at_5
      value: 28.902
    - type: recall_at_1
      value: 38.379000000000005
    - type: recall_at_10
      value: 78.224
    - type: recall_at_100
      value: 90.628
    - type: recall_at_1000
      value: 96.691
    - type: recall_at_3
      value: 66.448
    - type: recall_at_5
      value: 72.255
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 85.77920000000002 | |
- type: ap | |
value: 81.04289405069312 | |
- type: f1 | |
value: 85.73430221016837 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.178 | |
- type: map_at_10 | |
value: 34.122 | |
- type: map_at_100 | |
value: 35.337 | |
- type: map_at_1000 | |
value: 35.38 | |
- type: map_at_3 | |
value: 29.933 | |
- type: map_at_5 | |
value: 32.342999999999996 | |
- type: mrr_at_1 | |
value: 21.791 | |
- type: mrr_at_10 | |
value: 34.681 | |
- type: mrr_at_100 | |
value: 35.832 | |
- type: mrr_at_1000 | |
value: 35.869 | |
- type: mrr_at_3 | |
value: 30.592000000000002 | |
- type: mrr_at_5 | |
value: 32.946999999999996 | |
- type: ndcg_at_1 | |
value: 21.791 | |
- type: ndcg_at_10 | |
value: 41.455 | |
- type: ndcg_at_100 | |
value: 47.25 | |
- type: ndcg_at_1000 | |
value: 48.307 | |
- type: ndcg_at_3 | |
value: 32.963 | |
- type: ndcg_at_5 | |
value: 37.238 | |
- type: precision_at_1 | |
value: 21.791 | |
- type: precision_at_10 | |
value: 6.701 | |
- type: precision_at_100 | |
value: 0.96 | |
- type: precision_at_1000 | |
value: 0.105 | |
- type: precision_at_3 | |
value: 14.202 | |
- type: precision_at_5 | |
value: 10.693 | |
- type: recall_at_1 | |
value: 21.178 | |
- type: recall_at_10 | |
value: 64.13 | |
- type: recall_at_100 | |
value: 90.793 | |
- type: recall_at_1000 | |
value: 98.817 | |
- type: recall_at_3 | |
value: 41.08 | |
- type: recall_at_5 | |
value: 51.312999999999995 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 95.56543547651619 | |
- type: f1 | |
value: 95.18113603357101 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 82.81121751025992 | |
- type: f1 | |
value: 68.10945432103077 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 78.05985205110962 | |
- type: f1 | |
value: 75.94480942195571 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 81.3483523873571 | |
- type: f1 | |
value: 81.12756796889384 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 32.22549249333914 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 31.367740973522007 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 31.341185395073968 | |
- type: mrr | |
value: 32.38730713652477 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 6.02 | |
- type: map_at_10 | |
value: 15.265999999999998 | |
- type: map_at_100 | |
value: 19.737 | |
- type: map_at_1000 | |
value: 21.468 | |
- type: map_at_3 | |
value: 10.929 | |
- type: map_at_5 | |
value: 12.839999999999998 | |
- type: mrr_at_1 | |
value: 50.464 | |
- type: mrr_at_10 | |
value: 59.622 | |
- type: mrr_at_100 | |
value: 60.028999999999996 | |
- type: mrr_at_1000 | |
value: 60.06700000000001 | |
- type: mrr_at_3 | |
value: 57.018 | |
- type: mrr_at_5 | |
value: 58.550000000000004 | |
- type: ndcg_at_1 | |
value: 49.226 | |
- type: ndcg_at_10 | |
value: 40.329 | |
- type: ndcg_at_100 | |
value: 37.002 | |
- type: ndcg_at_1000 | |
value: 45.781 | |
- type: ndcg_at_3 | |
value: 45.165 | |
- type: ndcg_at_5 | |
value: 43.241 | |
- type: precision_at_1 | |
value: 50.464 | |
- type: precision_at_10 | |
value: 30.372 | |
- type: precision_at_100 | |
value: 9.663 | |
- type: precision_at_1000 | |
value: 2.305 | |
- type: precision_at_3 | |
value: 42.208 | |
- type: precision_at_5 | |
value: 37.771 | |
- type: recall_at_1 | |
value: 6.02 | |
- type: recall_at_10 | |
value: 20.48 | |
- type: recall_at_100 | |
value: 37.554 | |
- type: recall_at_1000 | |
value: 68.953 | |
- type: recall_at_3 | |
value: 12.353 | |
- type: recall_at_5 | |
value: 15.497 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 36.073 | |
- type: map_at_10 | |
value: 53.227999999999994 | |
- type: map_at_100 | |
value: 54.13400000000001 | |
- type: map_at_1000 | |
value: 54.147999999999996 | |
- type: map_at_3 | |
value: 48.861 | |
- type: map_at_5 | |
value: 51.473 | |
- type: mrr_at_1 | |
value: 40.701 | |
- type: mrr_at_10 | |
value: 55.667 | |
- type: mrr_at_100 | |
value: 56.306 | |
- type: mrr_at_1000 | |
value: 56.315000000000005 | |
- type: mrr_at_3 | |
value: 52.245 | |
- type: mrr_at_5 | |
value: 54.39000000000001 | |
- type: ndcg_at_1 | |
value: 40.701 | |
- type: ndcg_at_10 | |
value: 61.244 | |
- type: ndcg_at_100 | |
value: 64.767 | |
- type: ndcg_at_1000 | |
value: 65.031 | |
- type: ndcg_at_3 | |
value: 53.248 | |
- type: ndcg_at_5 | |
value: 57.538999999999994 | |
- type: precision_at_1 | |
value: 40.701 | |
- type: precision_at_10 | |
value: 9.93 | |
- type: precision_at_100 | |
value: 1.187 | |
- type: precision_at_1000 | |
value: 0.121 | |
- type: precision_at_3 | |
value: 24.343 | |
- type: precision_at_5 | |
value: 17.092 | |
- type: recall_at_1 | |
value: 36.073 | |
- type: recall_at_10 | |
value: 83.017 | |
- type: recall_at_100 | |
value: 97.762 | |
- type: recall_at_1000 | |
value: 99.614 | |
- type: recall_at_3 | |
value: 62.529 | |
- type: recall_at_5 | |
value: 72.361 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 66.678 | |
- type: map_at_10 | |
value: 81.26100000000001 | |
- type: map_at_100 | |
value: 81.972 | |
- type: map_at_1000 | |
value: 81.987 | |
- type: map_at_3 | |
value: 78.05199999999999 | |
- type: map_at_5 | |
value: 80.01599999999999 | |
- type: mrr_at_1 | |
value: 76.73 | |
- type: mrr_at_10 | |
value: 84.178 | |
- type: mrr_at_100 | |
value: 84.31 | |
- type: mrr_at_1000 | |
value: 84.311 | |
- type: mrr_at_3 | |
value: 82.91 | |
- type: mrr_at_5 | |
value: 83.75399999999999 | |
- type: ndcg_at_1 | |
value: 76.73 | |
- type: ndcg_at_10 | |
value: 85.59 | |
- type: ndcg_at_100 | |
value: 87.041 | |
- type: ndcg_at_1000 | |
value: 87.141 | |
- type: ndcg_at_3 | |
value: 82.122 | |
- type: ndcg_at_5 | |
value: 83.975 | |
- type: precision_at_1 | |
value: 76.73 | |
- type: precision_at_10 | |
value: 13.241 | |
- type: precision_at_100 | |
value: 1.537 | |
- type: precision_at_1000 | |
value: 0.157 | |
- type: precision_at_3 | |
value: 36.233 | |
- type: precision_at_5 | |
value: 23.988 | |
- type: recall_at_1 | |
value: 66.678 | |
- type: recall_at_10 | |
value: 94.512 | |
- type: recall_at_100 | |
value: 99.516 | |
- type: recall_at_1000 | |
value: 99.995 | |
- type: recall_at_3 | |
value: 84.77900000000001 | |
- type: recall_at_5 | |
value: 89.89399999999999 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 61.0961342812016 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 64.523271835229 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 4.7379999999999995 | |
- type: map_at_10 | |
value: 12.540999999999999 | |
- type: map_at_100 | |
value: 15.012 | |
- type: map_at_1000 | |
value: 15.339 | |
- type: map_at_3 | |
value: 8.809000000000001 | |
- type: map_at_5 | |
value: 10.774000000000001 | |
- type: mrr_at_1 | |
value: 23.400000000000002 | |
- type: mrr_at_10 | |
value: 35.175 | |
- type: mrr_at_100 | |
value: 36.345 | |
- type: mrr_at_1000 | |
value: 36.393 | |
- type: mrr_at_3 | |
value: 31.867 | |
- type: mrr_at_5 | |
value: 33.742 | |
- type: ndcg_at_1 | |
value: 23.400000000000002 | |
- type: ndcg_at_10 | |
value: 21.05 | |
- type: ndcg_at_100 | |
value: 30.087999999999997 | |
- type: ndcg_at_1000 | |
value: 35.421 | |
- type: ndcg_at_3 | |
value: 19.819 | |
- type: ndcg_at_5 | |
value: 17.576 | |
- type: precision_at_1 | |
value: 23.400000000000002 | |
- type: precision_at_10 | |
value: 11.01 | |
- type: precision_at_100 | |
value: 2.393 | |
- type: precision_at_1000 | |
value: 0.367 | |
- type: precision_at_3 | |
value: 18.767 | |
- type: precision_at_5 | |
value: 15.72 | |
- type: recall_at_1 | |
value: 4.7379999999999995 | |
- type: recall_at_10 | |
value: 22.343 | |
- type: recall_at_100 | |
value: 48.545 | |
- type: recall_at_1000 | |
value: 74.422 | |
- type: recall_at_3 | |
value: 11.428 | |
- type: recall_at_5 | |
value: 15.952 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 83.00728009929533 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 78.85484854952163 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 86.84017260596792 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 84.04244912638237 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 88.71661848841296 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 86.79243876108002 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 90.63340320875899 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 67.55467310427919 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 88.7218677688666 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 84.03370829809433 | |
- type: mrr | |
value: 95.8981740844486 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 61.594 | |
- type: map_at_10 | |
value: 72.482 | |
- type: map_at_100 | |
value: 72.89 | |
- type: map_at_1000 | |
value: 72.905 | |
- type: map_at_3 | |
value: 69.694 | |
- type: map_at_5 | |
value: 71.552 | |
- type: mrr_at_1 | |
value: 64.333 | |
- type: mrr_at_10 | |
value: 73.449 | |
- type: mrr_at_100 | |
value: 73.68599999999999 | |
- type: mrr_at_1000 | |
value: 73.70100000000001 | |
- type: mrr_at_3 | |
value: 71.5 | |
- type: mrr_at_5 | |
value: 72.76700000000001 | |
- type: ndcg_at_1 | |
value: 64.333 | |
- type: ndcg_at_10 | |
value: 77.304 | |
- type: ndcg_at_100 | |
value: 78.82400000000001 | |
- type: ndcg_at_1000 | |
value: 79.143 | |
- type: ndcg_at_3 | |
value: 72.85000000000001 | |
- type: ndcg_at_5 | |
value: 75.24 | |
- type: precision_at_1 | |
value: 64.333 | |
- type: precision_at_10 | |
value: 10.233 | |
- type: precision_at_100 | |
value: 1.107 | |
- type: precision_at_1000 | |
value: 0.11299999999999999 | |
- type: precision_at_3 | |
value: 28.666999999999998 | |
- type: precision_at_5 | |
value: 18.933 | |
- type: recall_at_1 | |
value: 61.594 | |
- type: recall_at_10 | |
value: 90.967 | |
- type: recall_at_100 | |
value: 97.667 | |
- type: recall_at_1000 | |
value: 100.0 | |
- type: recall_at_3 | |
value: 78.889 | |
- type: recall_at_5 | |
value: 84.678 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.87029702970297 | |
- type: cos_sim_ap | |
value: 96.83157940825447 | |
- type: cos_sim_f1 | |
value: 93.43358395989975 | |
- type: cos_sim_precision | |
value: 93.66834170854271 | |
- type: cos_sim_recall | |
value: 93.2 | |
- type: dot_accuracy | |
value: 99.74059405940594 | |
- type: dot_ap | |
value: 92.64621145397966 | |
- type: dot_f1 | |
value: 86.92614770459082 | |
- type: dot_precision | |
value: 86.75298804780877 | |
- type: dot_recall | |
value: 87.1 | |
- type: euclidean_accuracy | |
value: 99.86336633663366 | |
- type: euclidean_ap | |
value: 96.65013202788877 | |
- type: euclidean_f1 | |
value: 93.05835010060363 | |
- type: euclidean_precision | |
value: 93.62348178137651 | |
- type: euclidean_recall | |
value: 92.5 | |
- type: manhattan_accuracy | |
value: 99.86435643564356 | |
- type: manhattan_ap | |
value: 96.66170584513262 | |
- type: manhattan_f1 | |
value: 93.11903566047214 | |
- type: manhattan_precision | |
value: 93.54187689202826 | |
- type: manhattan_recall | |
value: 92.7 | |
- type: max_accuracy | |
value: 99.87029702970297 | |
- type: max_ap | |
value: 96.83157940825447 | |
- type: max_f1 | |
value: 93.43358395989975 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 67.98137643571387 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 33.203165154741 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 51.023136529441835 | |
- type: mrr | |
value: 51.78392379679144 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 30.996218041439295 | |
- type: cos_sim_spearman | |
value: 28.49337441341285 | |
- type: dot_pearson | |
value: 28.69511068705681 | |
- type: dot_spearman | |
value: 28.738712641821696 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.23500000000000001 | |
- type: map_at_10 | |
value: 2.07 | |
- type: map_at_100 | |
value: 13.056999999999999 | |
- type: map_at_1000 | |
value: 32.87 | |
- type: map_at_3 | |
value: 0.662 | |
- type: map_at_5 | |
value: 1.0630000000000002 | |
- type: mrr_at_1 | |
value: 86.0 | |
- type: mrr_at_10 | |
value: 91.286 | |
- type: mrr_at_100 | |
value: 91.286 | |
- type: mrr_at_1000 | |
value: 91.286 | |
- type: mrr_at_3 | |
value: 91.0 | |
- type: mrr_at_5 | |
value: 91.0 | |
- type: ndcg_at_1 | |
value: 82.0 | |
- type: ndcg_at_10 | |
value: 79.253 | |
- type: ndcg_at_100 | |
value: 64.042 | |
- type: ndcg_at_1000 | |
value: 59.073 | |
- type: ndcg_at_3 | |
value: 80.235 | |
- type: ndcg_at_5 | |
value: 79.353 | |
- type: precision_at_1 | |
value: 86.0 | |
- type: precision_at_10 | |
value: 84.39999999999999 | |
- type: precision_at_100 | |
value: 65.92 | |
- type: precision_at_1000 | |
value: 26.05 | |
- type: precision_at_3 | |
value: 86.0 | |
- type: precision_at_5 | |
value: 84.39999999999999 | |
- type: recall_at_1 | |
value: 0.23500000000000001 | |
- type: recall_at_10 | |
value: 2.26 | |
- type: recall_at_100 | |
value: 16.271 | |
- type: recall_at_1000 | |
value: 56.074999999999996 | |
- type: recall_at_3 | |
value: 0.694 | |
- type: recall_at_5 | |
value: 1.1280000000000001 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 1.629 | |
- type: map_at_10 | |
value: 6.444999999999999 | |
- type: map_at_100 | |
value: 12.561 | |
- type: map_at_1000 | |
value: 14.183000000000002 | |
- type: map_at_3 | |
value: 3.1780000000000004 | |
- type: map_at_5 | |
value: 4.0649999999999995 | |
- type: mrr_at_1 | |
value: 20.408 | |
- type: mrr_at_10 | |
value: 31.601000000000003 | |
- type: mrr_at_100 | |
value: 33.33 | |
- type: mrr_at_1000 | |
value: 33.337 | |
- type: mrr_at_3 | |
value: 27.891 | |
- type: mrr_at_5 | |
value: 29.626 | |
- type: ndcg_at_1 | |
value: 19.387999999999998 | |
- type: ndcg_at_10 | |
value: 16.921 | |
- type: ndcg_at_100 | |
value: 31.762 | |
- type: ndcg_at_1000 | |
value: 43.723 | |
- type: ndcg_at_3 | |
value: 15.834999999999999 | |
- type: ndcg_at_5 | |
value: 15.158 | |
- type: precision_at_1 | |
value: 20.408 | |
- type: precision_at_10 | |
value: 15.714 | |
- type: precision_at_100 | |
value: 7.306 | |
- type: precision_at_1000 | |
value: 1.539 | |
- type: precision_at_3 | |
value: 16.326999999999998 | |
- type: precision_at_5 | |
value: 15.101999999999999 | |
- type: recall_at_1 | |
value: 1.629 | |
- type: recall_at_10 | |
value: 12.283 | |
- type: recall_at_100 | |
value: 45.867999999999995 | |
- type: recall_at_1000 | |
value: 83.557 | |
- type: recall_at_3 | |
value: 3.801 | |
- type: recall_at_5 | |
value: 5.763 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 71.01119999999999 | |
- type: ap | |
value: 14.776705879525846 | |
- type: f1 | |
value: 54.96628145160803 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 61.114883984153934 | |
- type: f1 | |
value: 61.250947755016604 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 51.03991134069674 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 88.13256243666925 | |
- type: cos_sim_ap | |
value: 80.69819368353635 | |
- type: cos_sim_f1 | |
value: 73.49014621741895 | |
- type: cos_sim_precision | |
value: 70.920245398773 | |
- type: cos_sim_recall | |
value: 76.2532981530343 | |
- type: dot_accuracy | |
value: 86.08809679918936 | |
- type: dot_ap | |
value: 74.41500765551534 | |
- type: dot_f1 | |
value: 69.3204365079365 | |
- type: dot_precision | |
value: 65.39541413196069 | |
- type: dot_recall | |
value: 73.7467018469657 | |
- type: euclidean_accuracy | |
value: 88.15640460153782 | |
- type: euclidean_ap | |
value: 80.31937915172527 | |
- type: euclidean_f1 | |
value: 73.57214428857716 | |
- type: euclidean_precision | |
value: 70.02861230329042 | |
- type: euclidean_recall | |
value: 77.4934036939314 | |
- type: manhattan_accuracy | |
value: 88.15044406032068 | |
- type: manhattan_ap | |
value: 80.30776043635841 | |
- type: manhattan_f1 | |
value: 73.54741971760589 | |
- type: manhattan_precision | |
value: 69.85521006408734 | |
- type: manhattan_recall | |
value: 77.65171503957784 | |
- type: max_accuracy | |
value: 88.15640460153782 | |
- type: max_ap | |
value: 80.69819368353635 | |
- type: max_f1 | |
value: 73.57214428857716 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 89.37982691038926 | |
- type: cos_sim_ap | |
value: 86.5585074386676 | |
- type: cos_sim_f1 | |
value: 79.1182953710507 | |
- type: cos_sim_precision | |
value: 75.66048341765037 | |
- type: cos_sim_recall | |
value: 82.90729904527257 | |
- type: dot_accuracy | |
value: 87.75177552683665 | |
- type: dot_ap | |
value: 82.73501819446388 | |
- type: dot_f1 | |
value: 76.31569570639587 | |
- type: dot_precision | |
value: 71.02871924122837 | |
- type: dot_recall | |
value: 82.45303356944872 | |
- type: euclidean_accuracy | |
value: 89.30220825086352 | |
- type: euclidean_ap | |
value: 86.43839637395196 | |
- type: euclidean_f1 | |
value: 79.12071479307637 | |
- type: euclidean_precision | |
value: 76.89848121502799 | |
- type: euclidean_recall | |
value: 81.4752078842008 | |
- type: manhattan_accuracy | |
value: 89.30997011681609 | |
- type: manhattan_ap | |
value: 86.43582668119362 | |
- type: manhattan_f1 | |
value: 79.11144297181258 | |
- type: manhattan_precision | |
value: 76.79205624411104 | |
- type: manhattan_recall | |
value: 81.57530027717893 | |
- type: max_accuracy | |
value: 89.37982691038926 | |
- type: max_ap | |
value: 86.5585074386676 | |
- type: max_f1 | |
value: 79.12071479307637 | |
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

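The three steps of the recipe can be illustrated with a small, dependency-free sketch. This is not the `llm2vec` library's code. The helper names (`causal_mask`, `bidirectional_mask`, `mntp_targets`, `info_nce`) are hypothetical, and the temperature of 0.05 is an assumed SimCSE-style default, not a value taken from this card. The MNTP helper assumes, following the paper's description, that a masked token at position i is predicted from the output at position i - 1.

```python
import math


def causal_mask(n):
    # Default decoder-only behaviour: token i attends only to tokens 0..i.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    # Step 1: the causal mask is replaced so every token attends to every token.
    return [[1] * n for _ in range(n)]


def mntp_targets(token_ids, masked_positions):
    # Step 2 (masked next token prediction, assumed formulation): the original
    # token at masked position i becomes the target for the output at i - 1.
    # Position 0 has no predecessor, so this sketch skips it.
    return {i - 1: token_ids[i] for i in masked_positions if i > 0}


def info_nce(sim, temperature=0.05):
    # Step 3 (SimCSE-style contrastive loss): sim[i][j] is the cosine similarity
    # between view 1 of sequence i and view 2 of sequence j (two dropout passes).
    # Row i's positive is column i; the other columns are in-batch negatives.
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        top = max(logits)
        # Numerically stable log-sum-exp over the row.
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_z - logits[i]  # cross-entropy with target class i
    return total / len(sim)


print(causal_mask(3))  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(mntp_targets([5, 8, 2, 9], [2]))  # {1: 2}
```

In this sketch the contrastive loss is small when each sequence is most similar to its own second view. It grows when the diagonal pairs are less similar than the off-diagonal ones.
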
## Installation

```bash
pip install llm2vec
```

## Usage

```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Loading the base Llama-2 model, along with custom code that enables bidirectional attention in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Loading the supervised model. This loads the trained LoRA weights on top of the MNTP model.
# Hence the final weights are: base model + MNTP (LoRA) + supervised (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encoding queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encoding documents. Instructions are not required for documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.5417, 0.0780],
        [0.0627, 0.5726]])
"""
```

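To make the pooling and similarity steps of the usage example concrete, here is a dependency-free sketch of what mean pooling and the cosine-similarity matrix compute on toy vectors. It assumes that mean pooling averages the final hidden states of non-padding tokens. The helper names (`mean_pool`, `l2_normalize`, `cosine_matrix`, `rank_documents`) are hypothetical, and this is not the `llm2vec` implementation.

```python
import math


def mean_pool(token_vectors, attention_mask):
    # Average the vectors of real tokens (mask == 1) and ignore padding.
    # Assumes at least one token in the sequence is unmasked.
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    dim = len(kept[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


def l2_normalize(v):
    # Scale a vector to unit length (like torch.nn.functional.normalize with p=2).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(queries, docs):
    # Normalize both sides, then take dot products: rows are queries, columns are documents.
    qn = [l2_normalize(q) for q in queries]
    dn = [l2_normalize(d) for d in docs]
    return [[sum(a * b for a, b in zip(q, d)) for d in dn] for q in qn]


def rank_documents(sim_row):
    # Document indices ordered from most to least similar.
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)


# Toy example: the third token is padding, so it does not affect the mean.
query = mean_pool([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]], [1, 1, 0])
print(query)  # [2.0, 0.0]
sims = cosine_matrix([query], [[1.0, 0.0], [0.0, 1.0]])
print(sims)  # [[1.0, 0.0]]
print(rank_documents(sims[0]))  # [0, 1]
```

Each row of the matrix corresponds to one query, so sorting a row ranks the documents for that query. In the `cos_sim` printout above, each query scores highest against its matching document.
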
## Questions

If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).