sdadas/stella-pl-retrieval


sdadas committed on Oct 2

Commit a6d84d0 (1 parent: acef991)

Update README.md

Files changed (1)

README.md +1 -1

@@ -22,7 +22,7 @@ This is a text encoder based on [stella_en_1.5B_v5](https://huggingface.co/dunzh
 - In the first step, we adapted the model for Polish with [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) using a diverse corpus of 20 million Polish-English text pairs.
 - The second step involved fine-tuning the model with contrastrive loss using a dataset consisting of 1.4 million queries. Positive and negative passages for each query have been selected with the help of [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) reranker. The model was trained for three epochs with a batch size of 1024 queries.
-The encoder transforms texts to 1024 dimensional vectors. The model is optimized specifically for Polish information retrieval tasks. If you need a more versatile encoder, suitable for a wider range of tasks such as semantic similarity or clustering, you probably use the distilled version from the first step: [sdadas/stella-pl](https://huggingface.co/sdadas/stella-pl).
+The encoder transforms texts to 1024 dimensional vectors. The model is optimized specifically for Polish information retrieval tasks. If you need a more versatile encoder, suitable for a wider range of tasks such as semantic similarity or clustering, you should probably use the distilled version from the first step: [sdadas/stella-pl](https://huggingface.co/sdadas/stella-pl).
 ## Usage (Sentence-Transformers)