---
license: mit
---

Converted [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) model in ONNX format, for use with [Vespa embedding](https://docs.vespa.ai/en/embedding.html).

- intfloat-multilingual-e5-small.onnx
- intfloat-multilingual-e5-small_fp16.onnx
- intfloat-multilingual-e5-small_quantized.onnx
  - int8 quantized; running it in Python produces a somewhat different result (see the appendix at the end of this README)

Python can also output the same vectors as Vespa's embeddings.

Note: `normalize` must be set to `true` in Vespa's `services.xml` for the embedder output to match the Python example below. For example, with the model and tokenizer bundled in the application package (the component `id` and the file locations are placeholders; the important settings are `normalize` and `pooling-strategy`):

```xml
<component id="e5" type="hugging-face-embedder">
  <transformer-model path="models/intfloat-multilingual-e5-small.onnx"/>
  <tokenizer-model path="models/tokenizer.json"/>
  <normalize>true</normalize>
  <pooling-strategy>mean</pooling-strategy>
</component>
```

or with `url` instead of `path`:

```xml
<component id="e5" type="hugging-face-embedder">
  <transformer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/intfloat-multilingual-e5-small.onnx"/>
  <tokenizer-model url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/tokenizer.json"/>
  <normalize>true</normalize>
  <pooling-strategy>mean</pooling-strategy>
</component>
```

Python example:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
from torch import Tensor
import torch
import torch.nn.functional as F

model_name = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
onnx_file_name = "intfloat-multilingual-e5-small.onnx"

model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name=onnx_file_name
)
# The ONNX graph's first output is last_hidden_state; expose it as "logits"
# so it can be read via model(...).logits below.
model.output_names["logits"] = 0
tokenizer = AutoTokenizer.from_pretrained(model_name)


def average_pool(last_hidden_state: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


input_texts = [
    "query: What is the capital of Japan?",
    "query: 日本の首都は?",  # "What is the capital of Japan?" in Japanese
    "passage: ニューヨークは大きな都市です",  # "New York is a big city" in Japanese
    "passage: 東京は良い場所です",  # "Tokyo is a good place" in Japanese; Tokyo is the capital of Japan
]

batch_dict = tokenizer(
    input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
)
if "token_type_ids" not in batch_dict:
    batch_dict["token_type_ids"] = torch.zeros_like(batch_dict["input_ids"])

# "logits" is actually last_hidden_state (see the output override above)
last_hidden_states = model(**batch_dict).logits
embeddings = average_pool(last_hidden_states, batch_dict["attention_mask"])
# L2-normalize so the vectors match Vespa's output (normalize=true)
embeddings = F.normalize(embeddings, p=2, dim=1)

# similarity scores between the queries and the passages
print(embeddings[:2] @ embeddings[2:].T)
```

## License

MIT, same as the original E5 models.

- https://huggingface.co/intfloat/multilingual-e5-small

## Attribution

All credit for this model goes to the authors of Multilingual-E5 and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.
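
## Appendix: comparing fp32 and int8 outputs

As noted above, the int8 quantized model does not return exactly the same vectors as the fp32 model. The snippet below is a minimal sketch, not part of this repository's conversion scripts, of one way to measure how close the two variants are; the `load_model` and `encode` helpers are defined here only for illustration and otherwise reuse the same calls as the example above.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)


def load_model(file_name: str):
    # Same loading pattern as the main example: the first ONNX output is
    # last_hidden_state, exposed here under the name "logits".
    model = ORTModelForSequenceClassification.from_pretrained(
        model_name, file_name=file_name
    )
    model.output_names["logits"] = 0
    return model


def encode(model, texts):
    # Tokenize, mean-pool the last hidden state, and L2-normalize,
    # exactly as in the main example.
    batch = tokenizer(
        texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
    )
    if "token_type_ids" not in batch:
        batch["token_type_ids"] = torch.zeros_like(batch["input_ids"])
    hidden = model(**batch).logits
    mask = batch["attention_mask"]
    hidden = hidden.masked_fill(~mask[..., None].bool(), 0.0)
    emb = hidden.sum(dim=1) / mask.sum(dim=1)[..., None]
    return F.normalize(emb, p=2, dim=1)


texts = ["query: 日本の首都は?", "passage: 東京は良い場所です"]
emb_fp32 = encode(load_model("intfloat-multilingual-e5-small.onnx"), texts)
emb_int8 = encode(load_model("intfloat-multilingual-e5-small_quantized.onnx"), texts)

# Per-text cosine similarity between the fp32 and int8 embeddings;
# values slightly below 1.0 reflect the quantization error.
print((emb_fp32 * emb_int8).sum(dim=1))
```

Since both sets of embeddings are L2-normalized, the element-wise product summed per row is the cosine similarity between the two variants' vectors for the same text.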