---
license: mit
---
The [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) model converted to ONNX format for use with [Vespa Embedding](https://docs.vespa.ai/en/embedding.html). The repository contains the following files:
- intfloat-multilingual-e5-small.onnx
- intfloat-multilingual-e5-small_fp16.onnx
- intfloat-multilingual-e5-small_quantized.onnx (int8-quantized; running it in Python produces slightly different results; see the comparison sketch after the Python example below)
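Any of the ONNX files above can be loaded in Python through Optimum by passing its name via the `file_name` argument. A minimal sketch (the fp16 file name is just one of the files listed above; the full example further below does the same for the fp32 file):
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load one of the ONNX variants from this repository by file name.
model = ORTModelForSequenceClassification.from_pretrained(
    "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small",
    file_name="intfloat-multilingual-e5-small_fp16.onnx",  # or the fp32 / quantized file
)
```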
Python can also produce the same vectors as Vespa's embedder output; see the example below.
Note: `normalize` must be set to `true` in Vespa's `services.xml` for the embedder output to match the Python output:
```xml
<component id="e5" type="hugging-face-embedder">
  <!-- paths are illustrative; point them at the files deployed with your application -->
  <transformer-model path="models/intfloat-multilingual-e5-small.onnx"/>
  <tokenizer-model path="models/tokenizer.json"/>
  <normalize>true</normalize>
  <pooling-strategy>mean</pooling-strategy>
</component>
```
Or, referencing the model files by `url` instead of `path`:
```xml
<component id="e5" type="hugging-face-embedder">
  <!-- URLs are illustrative; point them at the ONNX and tokenizer files in this repository -->
  <transformer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/intfloat-multilingual-e5-small.onnx"/>
  <tokenizer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/tokenizer.json"/>
  <normalize>true</normalize>
  <pooling-strategy>mean</pooling-strategy>
</component>
```
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
from torch import Tensor
import torch
import torch.nn.functional as F

model_name = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
onnx_file_name = "intfloat-multilingual-e5-small.onnx"

model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name=onnx_file_name
)
# Remap the "logits" output so it returns the last_hidden_state tensor
model.output_names["logits"] = 0

tokenizer = AutoTokenizer.from_pretrained(model_name)


def average_pool(last_hidden_state: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


input_texts = [
    "query: What is the capital of Japan?",
    "query: 日本の首都は?",  # "What is the capital of Japan?" in Japanese
    "passage: ニューヨークは大きな都市です",  # "New York is a big city" in Japanese
    "passage: 東京は良い場所です",  # "Tokyo is a good place" in Japanese; Tokyo is the capital of Japan.
]

batch_dict = tokenizer(
    input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
)
if "token_type_ids" not in batch_dict:
    batch_dict["token_type_ids"] = torch.zeros_like(batch_dict["input_ids"])

# "logits" is actually last_hidden_state (see the output remapping above)
last_hidden_states = model(**batch_dict).logits
embeddings = average_pool(last_hidden_states, batch_dict["attention_mask"])
# L2-normalize so the vectors match Vespa's embeddings (normalize=true)
embeddings = F.normalize(embeddings, p=2, dim=1)

# similarity scores between the queries and the passages
print(embeddings[:2] @ embeddings[2:].T)
```
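To see how much the int8-quantized file drifts from the fp32 model, here is a minimal comparison sketch. It reuses `model_name`, `tokenizer`, `average_pool`, `batch_dict`, and `embeddings` from the example above; the quantized file name is the one listed at the top of this card.
```python
# Rough sketch: compare the int8-quantized ONNX file against the fp32 embeddings above.
quantized_model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name="intfloat-multilingual-e5-small_quantized.onnx"
)
quantized_model.output_names["logits"] = 0  # same last_hidden_state remapping as above

q_hidden = quantized_model(**batch_dict).logits
q_embeddings = F.normalize(
    average_pool(q_hidden, batch_dict["attention_mask"]), p=2, dim=1
)

# Cosine similarity between fp32 and int8 embeddings for each input text;
# values close to 1.0 mean the quantized model is a close approximation.
print((embeddings * q_embeddings).sum(dim=1))
```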
## License
Same as the original E5 model (MIT):
- https://huggingface.co/intfloat/multilingual-e5-small
## Attribution
All credit for this model goes to the authors of Multilingual-E5 and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.