The intfloat/multilingual-e5-small model converted to ONNX format for use with Vespa embedding. This repository contains:
- intfloat-multilingual-e5-small.onnx
- intfloat-multilingual-e5-small_fp16.onnx
- intfloat-multilingual-e5-small_quantized.onnx
  - int8-quantized; running it in Python produces slightly different results (see the comparison sketch below)
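As a rough illustration (not part of the original card), the drift introduced by int8 quantization can be measured by embedding the same text with the fp32 and quantized files and comparing the cosine similarity. The `embed` helper here is only for this sketch; the loading pattern mirrors the full Python example further below.

```python
import torch
import torch.nn.functional as F
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Hypothetical comparison sketch: how far do the int8 embeddings drift from fp32?
repo = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
tokenizer = AutoTokenizer.from_pretrained(repo)
batch = tokenizer("query: 日本の首都は?", return_tensors="pt")
if "token_type_ids" not in batch:
    batch["token_type_ids"] = torch.zeros_like(batch["input_ids"])


def embed(file_name: str) -> torch.Tensor:
    model = ORTModelForSequenceClassification.from_pretrained(repo, file_name=file_name)
    model.output_names["logits"] = 0  # expose last_hidden_state as "logits"
    hidden = model(**batch).logits
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling
    return F.normalize(pooled, p=2, dim=1)


fp32 = embed("intfloat-multilingual-e5-small.onnx")
int8 = embed("intfloat-multilingual-e5-small_quantized.onnx")
print(F.cosine_similarity(fp32, int8))  # expect close to, but not exactly, 1.0
```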
Python can also produce the same vectors as Vespa's embeddings. Note that `normalize` must be set to `true` in Vespa's services.xml for the embedding output to match Python. Example services.xml configuration:
<component id="me5_small_q" type="hugging-face-embedder">
<transformer-model path="me5/intfloat-multilingual-e5-small_quantized.onnx" />
<tokenizer-model path="me5/tokenizer.json" />
<normalize>true</normalize>
<pooling-strategy>mean</pooling-strategy>
</component>
<component id="me5_small" type="hugging-face-embedder">
<transformer-model path="me5/intfloat-multilingual-e5-small.onnx" />
<tokenizer-model path="me5/tokenizer.json" />
<normalize>true</normalize>
<pooling-strategy>mean</pooling-strategy>
</component>
Alternatively, the models can be referenced by URL:
<component id="me5_small_fp16" type="hugging-face-embedder">
<transformer-model
url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/intfloat-multilingual-e5-small_fp16.onnx" />
<tokenizer-model
url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/tokenizer.json" />
<normalize>true</normalize>
<pooling-strategy>mean</pooling-strategy>
</component>
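Once a component is configured, Vespa can call the embedder at query time with the `embed` expression. The sketch below is not part of the original card: the `embedding` tensor field, the `semantic` rank profile, and the endpoint are hypothetical placeholders for your own application, and the `query:` prefix is required by E5 models.

```python
import requests

# Hypothetical usage sketch: ask Vespa to embed the query text with the
# "me5_small" component and run a nearest-neighbor search against a tensor
# field. "embedding", "semantic", and the endpoint are placeholders.
body = {
    "yql": "select * from sources * where {targetHits: 10}nearestNeighbor(embedding, q)",
    "input.query(q)": 'embed(me5_small, "query: 日本の首都は?")',
    "ranking": "semantic",
    "hits": 10,
}
response = requests.post("http://localhost:8080/search/", json=body)
print(response.json())
```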
The following Python snippet reproduces the same embeddings outside Vespa:

```python
import torch
import torch.nn.functional as F
from optimum.onnxruntime import ORTModelForSequenceClassification
from torch import Tensor
from transformers import AutoTokenizer

model_name = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
onnx_file_name = "intfloat-multilingual-e5-small.onnx"

model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name=onnx_file_name
)
# Override so that the "logits" output exposes last_hidden_state
model.output_names["logits"] = 0
tokenizer = AutoTokenizer.from_pretrained(model_name)


def average_pool(last_hidden_state: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


input_texts = [
    "query: What is the capital of Japan?",
    "query: 日本の首都は?",  # "What is the capital of Japan?" in Japanese
    "passage: ニューヨークは大きな都市です",  # "New York is a big city" in Japanese
    "passage: 東京は良い場所です",  # "Tokyo is a good place" in Japanese; Tokyo is the capital of Japan
]

batch_dict = tokenizer(
    input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
)
if "token_type_ids" not in batch_dict:
    batch_dict["token_type_ids"] = torch.zeros_like(batch_dict["input_ids"])

# "logits" is last_hidden_state (see the output_names override above)
last_hidden_states = model(**batch_dict).logits
embeddings = average_pool(last_hidden_states, batch_dict["attention_mask"])
# Normalize so the vectors match Vespa's embeddings (normalize=true)
embeddings = F.normalize(embeddings, p=2, dim=1)

# Similarity scores between queries and passages
print(embeddings[:2] @ embeddings[2:].T)
```
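As an optional sanity check (not part of the original card), the ONNX output can be compared to the original intfloat/multilingual-e5-small PyTorch checkpoint; for the fp32 ONNX file the two should agree to within floating-point tolerance. This sketch continues from the snippet above:

```python
from transformers import AutoModel

# Compare against the original PyTorch checkpoint (reuses batch_dict,
# average_pool, and embeddings from the snippet above).
pt_model = AutoModel.from_pretrained("intfloat/multilingual-e5-small")
with torch.no_grad():
    pt_hidden = pt_model(**batch_dict).last_hidden_state
pt_embeddings = F.normalize(
    average_pool(pt_hidden, batch_dict["attention_mask"]), p=2, dim=1
)
print(torch.max(torch.abs(pt_embeddings - embeddings)))  # expect a value near zero
```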
License
Same as E5 (MIT).
Attribution
All credit for this model goes to the authors of Multilingual-E5 and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.