jupyterjazz committed
Commit: ad320ec
Parent(s): 521abc0

adjust readme

Signed-off-by: [email protected] <[email protected]>
README.md
CHANGED
@@ -21524,7 +21524,7 @@ model-index:
 
 
 <p align="center">
-<b>The embedding
+<b>The embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
 </p>
 
 <p align="center">
@@ -21555,7 +21555,7 @@ Additionally, it features 5 LoRA adapters to generate task-specific embeddings e
 - **Matryoshka Embeddings**: Supports flexible embedding sizes (`32, 64, 128, 256, 512, 768, 1024`), allowing for truncating embeddings to fit your application.
 
 ### Supported Languages:
-While the foundation model supports
+While the foundation model supports 100 languages, we've focused our tuning efforts on the following 30 languages:
 **Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek,
 Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
 Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
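The Matryoshka bullet in the hunk above refers to truncating embeddings to a smaller size. As a rough sketch of what that usually amounts to (keep the leading dimensions, then re-normalize; this is the standard recipe and an assumption here, not a statement about the model's internals), with a placeholder batch:

```python
import torch
import torch.nn.functional as F

# Placeholder batch standing in for the 1024-dimensional embeddings
# produced by the snippets later in this diff.
embeddings = torch.randn(4, 1024)

truncate_dim = 256  # one of the supported sizes: 32, 64, 128, 256, 512, 768, 1024
truncated = F.normalize(embeddings[:, :truncate_dim], p=2, dim=1)
print(truncated.shape)  # torch.Size([4, 256])
```

The `truncate_dim` argument that appears in the context of the last hunk below performs this truncation for you.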
@@ -21598,9 +21598,11 @@ tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
 model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
 
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
-
+task = 'retrieval.query'
+task_id = model._adaptation_map[task]
+adapter_mask = torch.full((len(sentences),), task_id, dtype=torch.int32)
 with torch.no_grad():
-    model_output = model(**encoded_input,
+    model_output = model(**encoded_input, adapter_mask=adapter_mask)
 
 embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
 embeddings = F.normalize(embeddings, p=2, dim=1)
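The snippet in this hunk calls a `mean_pooling` helper that is defined earlier in the README and is untouched by this commit. A minimal sketch of the conventional masked mean-pooling implementation, assuming the standard pattern rather than the README's exact code:

```python
import torch

def mean_pooling(model_output, attention_mask):
    # model_output[0] is the last hidden state: (batch, seq_len, hidden_dim).
    token_embeddings = model_output[0]
    # Broadcast the attention mask over the hidden dimension so that padding
    # tokens contribute nothing to the sum.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts
```

Masking before averaging keeps padding tokens from diluting the sentence embedding.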
@@ -21661,9 +21663,6 @@ embeddings = model.encode(['Sample text'], truncate_dim=256)
 ```
 
 
-Note that the `truncate_dim` could be any integer between 1 and 1024 for the `separation`, `classification`, and `text-matching` tasks. As for the `retrieval.passage` and `retrieval.query` tasks, the value must be larger than the length of the instruction prompt. By default, the value must be larger than 9 for the `retrieval.passage` task and larger than 12 for the `retrieval.query` task.
-
-
 The latest version (3.1.0) of [SentenceTransformers](https://github.com/UKPLab/sentence-transformers) also supports `jina-embeddings-v3`:
 
 ```bash