Update README.md
Browse files
README.md
CHANGED
@@ -22,7 +22,7 @@ This is a text encoder based on [stella_en_1.5B_v5](https://huggingface.co/dunzh
|
|
22 |
- In the first step, we adapted the model for Polish with [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) using a diverse corpus of 20 million Polish-English text pairs.
|
23 |
- The second step involved fine-tuning the model with contrastrive loss using a dataset consisting of 1.4 million queries. Positive and negative passages for each query have been selected with the help of [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) reranker. The model was trained for three epochs with a batch size of 1024 queries.
|
24 |
|
25 |
-
The encoder transforms texts to 1024 dimensional vectors. The model is optimized specifically for Polish information retrieval tasks. If you need a more versatile encoder, suitable for a wider range of tasks such as semantic similarity or clustering, you probably use the distilled version from the first step: [sdadas/stella-pl](https://huggingface.co/sdadas/stella-pl).
|
26 |
|
27 |
## Usage (Sentence-Transformers)
|
28 |
|
|
|
22 |
- In the first step, we adapted the model for Polish with [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) using a diverse corpus of 20 million Polish-English text pairs.
|
23 |
- The second step involved fine-tuning the model with contrastrive loss using a dataset consisting of 1.4 million queries. Positive and negative passages for each query have been selected with the help of [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) reranker. The model was trained for three epochs with a batch size of 1024 queries.
|
24 |
|
25 |
+
The encoder transforms texts to 1024 dimensional vectors. The model is optimized specifically for Polish information retrieval tasks. If you need a more versatile encoder, suitable for a wider range of tasks such as semantic similarity or clustering, you should probably use the distilled version from the first step: [sdadas/stella-pl](https://huggingface.co/sdadas/stella-pl).
|
26 |
|
27 |
## Usage (Sentence-Transformers)
|
28 |
|