Commit 79bfe60 by michael-guenther (parent: 0eb77d9)

Update README.md

README.md CHANGED

</p>
</details>

You can use Jina Embedding models directly from the `transformers` package.

First, make sure that you are logged in to Hugging Face. You can either log in with the huggingface-cli tool (installed together with the `transformers` package), passing your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens) when prompted:

```bash
huggingface-cli login
```

Alternatively, you can provide the access token as an environment variable in the shell:

```bash
export HF_TOKEN="<your token here>"
```

or in Python:

```python
import os

os.environ['HF_TOKEN'] = "<your token here>"
```
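
As a further option not shown above, you can also log in programmatically with the `login()` helper from the `huggingface_hub` library (installed as a dependency of `transformers`); a minimal sketch:

```python
# Minimal sketch: programmatic login via huggingface_hub (a dependency of transformers).
from huggingface_hub import login

# Saves the token locally so that later from_pretrained() calls can authenticate.
login(token="<your token here>")
```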

Then, you can load and use the model via the `AutoModel` class:

```python
!pip install transformers
from transformers import AutoModel
# ... (lines unchanged by this commit are not shown in the diff) ...
print(cos_sim(embeddings[0], embeddings[1]))
```
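
For orientation, here is a sketch of what the full example plausibly looks like, reconstructed from the context visible in this diff (the `model.encode` call and the example sentences appear in the hunk header); the checkpoint name and the `cos_sim` helper are assumptions rather than the verbatim elided lines:

```python
# Sketch only: a plausible reconstruction of the elided example, not the verbatim README code.
from numpy.linalg import norm
from transformers import AutoModel

# Cosine-similarity helper; the README defines a cos_sim like this earlier on (assumed).
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code=True lets transformers load the custom jina-bert implementation from the Hub.
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)

# encode() is added by the model's custom code and returns numpy embeddings.
embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'])
print(cos_sim(embeddings[0], embeddings[1]))
```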

Since its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face here as well):

```python
!pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-de",  # switch to en/zh for English or Chinese
    trust_remote_code=True
)

# control your input sequence length up to 8192
model.max_seq_length = 1024

embeddings = model.encode([
    'How is the weather today?',
    'Wie ist das Wetter heute?'
])
print(cos_sim(embeddings[0], embeddings[1]))
```
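
One note on the `max_seq_length` setting in the example above: the model accepts inputs of up to 8192 tokens, and lowering `model.max_seq_length` (to 1024 here) truncates longer inputs, trading maximum context length for lower memory use and faster encoding.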

## Alternatives to Using Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).

[... unchanged lines not shown in this diff ...]

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">

## Troubleshooting

**Loading of Model Code Failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by `transformers` falling back to creating a default BERT model instead of a jina-embeddings model:

```bash
Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
```
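
To make the fix concrete, here is a minimal before/after sketch (the checkpoint name is the one from the warning above):

```python
from transformers import AutoModel

# Without trust_remote_code, transformers falls back to a plain BertModel and
# reports unused checkpoint weights, as in the warning shown above:
# model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en')

# Correct: allow the custom jina-bert model code from the Hub to be executed.
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
```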

## Contact