Cross-Encoder for MS Marco - ONNX
ONNX versions of Sentence Transformers Cross Encoders to allow ranking without heavy dependencies.
The models were trained on the MS Marco Passage Ranking task.
The models can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order. See SBERT.net Retrieve & Re-rank for more details.
Models Available
Model Name | Precision | File Name | File Size |
---|---|---|---|
ms-marco-MiniLM-L-4-v2 ONNX | FP32 | ms-marco-MiniLM-L-4-v2-onnx.zip | 70 MB |
ms-marco-MiniLM-L-4-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-4-v2-onnx-int8.zip | 12.8 MB |
ms-marco-MiniLM-L-6-v2 ONNX | FP32 | ms-marco-MiniLM-L-6-v2-onnx.zip | 83.4 MB |
ms-marco-MiniLM-L-6-v2 ONNX (Quantized) | INT8 | ms-marco-MiniLM-L-6-v2-onnx-int8.zip | 15.2 MB |
Usage with ONNX Runtime
import onnxruntime as ort
from transformers import AutoTokenizer
model_path="ms-marco-MiniLM-L-4-v2-onnx/"
tokenizer = AutoTokenizer.from_pretrained('model_path')
ort_sess = ort.InferenceSession(model_path + "ms-marco-MiniLM-L-4-v2.onnx")
features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'], padding=True, truncation=True, return_tensors="np")
ort_outs = ort_sess.run(None, features)
print(ort_outs)
Performance
TBU...