Commit 79bfe60 by michael-guenther (parent: 0eb77d9)

Update README.md

README.md CHANGED

</p>
</details>

You can use Jina Embedding models directly from the `transformers` package.

First, make sure that you are logged in to Hugging Face. You can either log in with the huggingface-cli tool (installed together with the `transformers` package), passing your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens) when prompted:

```bash
huggingface-cli login
```

Alternatively, you can provide the access token as an environment variable in the shell:

```bash
export HF_TOKEN="<your token here>"
```

or in Python:

```python
import os

os.environ['HF_TOKEN'] = "<your token here>"
```
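
As a further option not shown above, you can also log in programmatically with the `login()` helper from the `huggingface_hub` library (installed as a dependency of `transformers`); a minimal sketch:

```python
# Minimal sketch: programmatic login via huggingface_hub (a dependency of transformers).
from huggingface_hub import login

# Saves the token locally so that later from_pretrained() calls can authenticate.
login(token="<your token here>")
```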

Then, you can load and use the model via the `AutoModel` class:

```python
!pip install transformers
from transformers import AutoModel
# ... (lines unchanged by this commit are not shown in the diff) ...
print(cos_sim(embeddings[0], embeddings[1]))
```
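
For orientation, here is a sketch of what the full example plausibly looks like, reconstructed from the context visible in this diff (the `model.encode` call and the example sentences appear in the hunk header); the checkpoint name and the `cos_sim` helper are assumptions rather than the verbatim elided lines:

```python
# Sketch only: a plausible reconstruction of the elided example, not the verbatim README code.
from numpy.linalg import norm
from transformers import AutoModel

# Cosine-similarity helper; the README defines a cos_sim like this earlier on (assumed).
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code=True lets transformers load the custom jina-bert implementation from the Hub.
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)

# encode() is added by the model's custom code and returns numpy embeddings.
embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'])
print(cos_sim(embeddings[0], embeddings[1]))
```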

Since its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face here as well):

```python
!pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-de",  # switch to en/zh for English or Chinese
    trust_remote_code=True
)

# control your input sequence length up to 8192
model.max_seq_length = 1024

embeddings = model.encode([
    'How is the weather today?',
    'Wie ist das Wetter heute?'
])
print(cos_sim(embeddings[0], embeddings[1]))
```
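
One note on the `max_seq_length` setting in the example above: the model accepts inputs of up to 8192 tokens, and lowering `model.max_seq_length` (to 1024 here) truncates longer inputs, trading maximum context length for lower memory use and faster encoding.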

## Alternatives to Using Transformers Package

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).

[... unchanged lines not shown in this diff ...]

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">

## Troubleshooting

**Loading of Model Code Failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by `transformers` falling back to creating a default BERT model instead of a jina-embeddings model:

```bash
Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
```
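
To make the fix concrete, here is a minimal before/after sketch (the checkpoint name is the one from the warning above):

```python
from transformers import AutoModel

# Without trust_remote_code, transformers falls back to a plain BertModel and
# reports unused checkpoint weights, as in the warning shown above:
# model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en')

# Correct: allow the custom jina-bert model code from the Hub to be executed.
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
```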

## Contact