Snowflake/snowflake-arctic-embed-xs · Sentence Transformers integration

tomaarsen

Apr 16

edited Apr 16

Hello!

Pull Request overview

Add Sentence Transformers integration.

Details

This PR adds proper support for this model in Sentence Transformers, a package commonly used in third-party embedding applications. It abstracts most of the transformers code away from the user by moving it into the model configuration. As a result, users can simply write:

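from sentence_transformers import SentenceTransformer

# Load the model together with the settings stored in the repository configuration
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-xs")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Queries are encoded with the configured "query" prompt; documents are encoded without a prompt
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)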

instead of manually loading both the model and the tokenizer, prepending the query prompt themselves, computing the token embeddings, taking the CLS embedding, and then normalizing it.
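For comparison, the manual route described above might look roughly like the following sketch. It assumes `torch` and `transformers` are installed. The helper names `add_query_prefix` and `embed_manually` are hypothetical, and the `QUERY_PREFIX` string is an assumption that should be checked against the model card or the prompts stored in the model configuration.

```python
# A minimal sketch of the manual steps, not the exact code this PR replaces.

# Assumed query prompt; verify against the model card before relying on it.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def add_query_prefix(queries, prefix=QUERY_PREFIX):
    """Prepend the query prompt to every query string."""
    return [prefix + query for query in queries]


def embed_manually(texts, model_name="Snowflake/snowflake-arctic-embed-xs", is_query=False):
    """Load tokenizer and model, then return L2-normalized CLS embeddings."""
    # Imported lazily so the prompt helper above works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Step 1: load both the tokenizer and the model.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Step 2: add the query prompt by hand (documents get no prompt).
    if is_query:
        texts = add_query_prefix(texts)

    # Step 3: compute the token embeddings.
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**tokens).last_hidden_state

    # Step 4: take the CLS (first token) embedding of each sequence.
    cls_embeddings = token_embeddings[:, 0]

    # Step 5: L2-normalize each embedding.
    return torch.nn.functional.normalize(cls_embeddings, p=2, dim=1)
```

Calling `embed_manually(queries, is_query=True)` and `embed_manually(documents)` would then play the role of the two `model.encode(...)` calls, with every step written out by hand.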

P.S. Sentence Transformers is maintained by Hugging Face.

Tom Aarsen

Add Sentence Transformers integration + README (7b408708)

tomaarsen changed pull request status to open Apr 16

spacemanidol changed pull request status to merged Apr 16