---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
- juanluisdb/triviaqa-bge-m3-logits
- juanluisdb/nq-bge-m3-logits
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---
# Model Card for MiniLM-L-6-rerank-m3
This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) cross-encoder using the KL-divergence distillation technique described [here](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
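A minimal sketch of what KL distillation looks like for a cross-encoder student is shown below; the variable names, toy passages, and teacher scores are illustrative assumptions, not the exact training code used for this model.
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Student: the MiniLM cross-encoder being fine-tuned.
student = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

def kl_distillation_loss(student_logits, teacher_scores, temperature=1.0):
    """KL divergence between the teacher's and student's score distributions
    over the candidate passages of one query."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# One (toy) training example: a query, candidate passages, and precomputed
# teacher scores for each (query, passage) pair.
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
teacher_scores = torch.tensor([[7.2, -4.1]])  # e.g. bge-reranker-v2-m3 logits

features = tokenizer([query] * len(passages), passages, padding=True, truncation=True, return_tensors="pt")
student_logits = student(**features).logits.view(1, -1)  # (1, n_candidates)
loss = kl_distillation_loss(student_logits, teacher_scores)
loss.backward()  # in training this would be followed by optimizer.step()
```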
# Usage
## Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")

# Each (query, passage) pair is scored independently, so the query is repeated
# once per candidate passage.
queries = ["How many people live in Berlin?"] * 2
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits  # one relevance logit per pair
print(scores)
```
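The model emits one raw logit per (query, passage) pair; higher means more relevant. Continuing the snippet above, a sigmoid maps the logits to (0, 1) if bounded scores are preferred (a common convention for single-logit cross-encoders, not something this card prescribes):
```python
probabilities = torch.sigmoid(scores)  # raw logits -> (0, 1) relevance scores
```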
## Usage with SentenceTransformers
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
# predict takes (query, passage) pairs and returns one relevance score per pair.
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
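Recent versions of sentence-transformers also provide a `CrossEncoder.rank` helper that scores and sorts candidate passages in one call; a minimal sketch:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
results = model.rank(
    "How many people live in Berlin?",
    [
        "Berlin has a population of 3,520,031 registered inhabitants.",
        "New York City is famous for the Metropolitan Museum of Art.",
    ],
    return_documents=True,  # include the passage text in each result
)
for hit in results:  # sorted by descending relevance score
    print(hit["score"], hit["text"])
```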
# Evaluation
## BEIR (NDCG@10)
I ran tests on several BEIR datasets. Each cross-encoder reranks the top-100 BM25 results; a sketch of the reranking protocol follows the first table.
| | bm25 | jina-reranker-v1-turbo-en | bge-reranker-v2-m3 | mxbai-rerank-base-v1 | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3 |
|:---------------|:-------:|:----------------------------:|:---------------------:|:-----------------------:|:-------------------------:|:------------------------------:|
| nq* | 0.305 | 0.533 | **0.597** | 0.535 | 0.523 | 0.580 |
| fever* | 0.638 | 0.852 | 0.857 | 0.767 | 0.801 | **0.867** |
| fiqa | 0.238 | 0.336 | **0.397** | 0.382 | 0.349 | 0.364 |
| trec-covid | 0.589 | 0.774 | 0.784 | **0.830** | 0.741 | 0.738 |
| scidocs | 0.150 | 0.166 | 0.169 | **0.171** | 0.164 | 0.165 |
| scifact | 0.676 | 0.739 | 0.731 | 0.719 | 0.688 | **0.750** |
| nfcorpus | 0.318 | 0.353 | 0.336 | **0.353** | 0.349 | 0.350 |
| hotpotqa | 0.629 | 0.745 | **0.794** | 0.668 | 0.724 | 0.775 |
| dbpedia-entity | 0.319 | 0.421 | **0.445** | 0.416 | 0.445 | 0.444 |
| quora | 0.787 | 0.858 | 0.858 | 0.747 | 0.825 | **0.871** |
| climate-fever | 0.163 | 0.233 | **0.314** | 0.253 | 0.244 | 0.309 |
\* The training splits of NQ and FEVER were used as part of the training data.
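For reference, the protocol behind the numbers above is: BM25 retrieves the top-100 candidates, the cross-encoder rescores them, and NDCG@10 is computed on the reordered list. The `rerank` helper and `bm25_hits` structure below are illustrative names; in practice BEIR's own tooling handles retrieval and metric computation.
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)

def rerank(query, bm25_hits):
    """Rescore BM25 candidates with the cross-encoder.

    bm25_hits: list of (doc_id, passage_text) pairs, the top-100 BM25 results.
    Returns (doc_id, score) pairs sorted by descending cross-encoder score.
    """
    scores = model.predict([(query, text) for _, text in bm25_hits])
    ranked = sorted(zip(bm25_hits, scores), key=lambda pair: pair[1], reverse=True)
    return [(doc_id, float(score)) for (doc_id, _), score in ranked]
```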
Comparison with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-m3-ablated) trained only on MS MARCO:
| | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3-ablated |
|:---------------|:-------------------------:|:--------------------------------------:|
| nq | 0.5234 | **0.5412** |
| fever | 0.8007 | **0.8221** |
| fiqa | 0.3490 | **0.3598** |
| trec-covid | **0.7410** | 0.7331 |
| scidocs | **0.1638** | 0.1630 |
| scifact | 0.6880 | **0.7376** |
| nfcorpus | 0.3493 | **0.3495** |
| hotpotqa | 0.7235 | **0.7583** |
| dbpedia-entity | **0.4445** | 0.4382 |
| quora | 0.8251 | **0.8619** |
| climate-fever | 0.2438 | **0.2449** |
# Datasets Used
~900k queries with 32-way triplets (one query plus 32 candidate passages scored by the teacher) were used from these datasets; a sketch of the record shape follows the list:
* MSMarco
* TriviaQA
* Natural Questions
* FEVER
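For illustration, each training record can be pictured as a query, its 32 candidate passages, and the teacher's score for each passage. The field names below are assumptions, not the published dataset schema:
```python
# Hypothetical shape of one 32-way training example (field names are
# illustrative; see the linked datasets for their actual schema).
example = {
    "query": "how many people live in berlin",
    "passages": [
        "Berlin has a population of 3,520,031 registered inhabitants.",
        "New York City is famous for the Metropolitan Museum of Art.",
        # ... 30 more candidate passages
    ],
    # bge-reranker-v2-m3 logits, one per candidate passage
    "teacher_scores": [7.2, -4.1],  # ... 30 more scores
}
```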