Update README.md
README.md
CHANGED
@@ -1082,6 +1082,10 @@ model-index:
It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
We have designed it for high performance in monolingual & cross-language applications and trained it specifically to support mixed Chinese-English input without bias.

+ `jina-embeddings-v2-base-zh` is a bilingual Chinese-English text embedding model that supports encoding text of up to 8192 tokens.
+ The model is based on the BERT architecture (JinaBERT), a modification of BERT that is the first to apply [ALiBi](https://arxiv.org/abs/2108.12409) to an encoder architecture in order to support longer sequences.
+ Unlike previous monolingual/multilingual embedding models, we designed this bilingual model to better support both monolingual (Chinese-to-Chinese) and cross-lingual (Chinese-to-English) document retrieval.
+
The embedding model was trained using a 512 sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi.
This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search, etc.
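As a rough illustration of the usage described above, here is a minimal sketch rather than the model card's official snippet: it assumes the Hugging Face checkpoint id is `jinaai/jina-embeddings-v2-base-zh`, that the custom ALiBi encoder loads through `transformers` with `trust_remote_code=True`, and it uses plain mean pooling instead of any pooling helper the model may ship with. It embeds a Chinese query and an English passage and compares them with cosine similarity; `max_length` can be raised toward 8192 for long documents.

```python
# Minimal sketch (assumed checkpoint id and generic mean pooling, not the official snippet).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v2-base-zh"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

def embed(texts, max_length=8192):
    # ALiBi lets the encoder extrapolate beyond its 512-token training length,
    # so max_length can be set well above 512 for long documents.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**batch).last_hidden_state
    # Mean pooling over non-padding tokens, then L2-normalize.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(embeddings, p=2, dim=1)

query = embed(["如何使用向量模型做跨语言检索?"])
passage = embed(["Bilingual embedding models map Chinese and English text into one vector space."])
print(F.cosine_similarity(query, passage))  # higher score = more similar
```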
@@ -1175,7 +1179,7 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
## Plans

1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
- 2. Multimodal embedding models enable
+ 2. Multimodal embedding models that enable multimodal RAG applications.
3. High-performance rerankers.

## Contact