Patch Sentence Transformers integration
#2
by tomaarsen - opened
Hello!
Congratulations on your release! Well done!
Pull Request overview
- Patch Sentence Transformers integration, in particular:
  - Rename "1_Pool" to "1_Pooling": the latter is referenced in `modules.json` and will be used to load the pooling configuration (see the first sketch after this list).
  - Update the pooling configuration to also include the prompt in the pooling. This previously resulted in a slight difference between `transformers` and `sentence-transformers`.
- Simplified the code snippet:
  - `max_seq_length` is now defined in `sentence_bert_config.json`.
  - A `Normalize` module is added in `modules.json`, which means that all outputs will be normalized even without specifying `normalize_embeddings=True`.
- Add instructions to the `prompts` dictionary in `config_sentence_transformers.json`. This allows for `model.encode(my_texts, prompt_name="nq")` (illustrated in the second sketch after this list).
- Add a `sentence-transformers` tag, making the model easier to find when searching for embedding models under https://huggingface.co/models?library=sentence-transformers&sort=trending.
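As a first sketch: after the "1_Pooling" rename and the `Normalize` addition, loading the model should reveal the full module pipeline. This is a minimal illustration under assumptions, not code from this PR; `author/model` is a placeholder for the repository the PR targets.

```python
from sentence_transformers import SentenceTransformer

# Placeholder model id; substitute the repository this PR was opened against.
model = SentenceTransformer("author/model")

# With "1_Pool" renamed to "1_Pooling" and a Normalize module listed in
# modules.json, the printed pipeline should show all three stages:
#   (0) Transformer -> (1) Pooling -> (2) Normalize
print(model)
```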
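As a second sketch: the `prompts` dictionary and the `Normalize` module together simplify encoding. Again a hedged illustration; the model id and query text are assumptions, not taken from this PR.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("author/model")  # placeholder model id

# The "nq" entry in the prompts dictionary of config_sentence_transformers.json
# is prepended automatically when prompt_name is given:
embeddings = model.encode(["What is the capital of France?"], prompt_name="nq")

# Because modules.json now includes a Normalize module, the outputs are
# unit-normalized even without passing normalize_embeddings=True:
print(np.linalg.norm(embeddings, axis=1))  # approximately [1.0]
```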
Details
I ran the updated script in the README, and it gave me `[[0.35365450382232666, 0.18592746555805206]]`, which is the same as what I get when running the `transformers` snippet. A sketch of this kind of parity check follows below.
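For context, a parity check like the one described might look as follows. This is a hedged sketch: the query, passages, and model id are assumptions, not quoted from the README, and only the shape of the output mirrors the `[[...]]` result above.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("author/model")  # placeholder model id

# Hypothetical query and passages; the PR does not quote the README inputs.
query = "What is the capital of France?"
passages = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]

query_emb = model.encode([query], prompt_name="nq")
passage_embs = model.encode(passages)

# Embeddings are already normalized, so a dot product is cosine similarity.
# This yields a 1x2 score matrix, matching the [[...]] shape quoted above.
scores = query_emb @ passage_embs.T
print(scores.tolist())
```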
- Tom Aarsen
tomaarsen changed pull request status to open
Kaguya-19 changed pull request status to merged
Thank you!
Thank you for your helpful work!