Update README.md
README.md
CHANGED
@@ -25,4 +25,33 @@ pipeline_tag: text-classification

Embeddings version of the base model [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual).

The `lm_head` layer of this model has been removed, so it can be used for embeddings. Out of the box it will not perform well; it needs further fine-tuning, as demonstrated by [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct).

Additionally, instead of a normalization layer, the hidden states are followed by both a classical weight and a bias, each a 1-dimensional array of 4096 values.

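For illustration, a minimal sketch of such an element-wise affine transform (hypothetical names and placeholder values, not code or weights from this checkpoint):

```python
import torch

hidden_size = 4096  # the model's hidden dimension

# Hypothetical stand-ins for the learned 1-dimensional weight and bias described above
weight = torch.nn.Parameter(torch.ones(hidden_size))
bias = torch.nn.Parameter(torch.zeros(hidden_size))

def affine(hidden_states: torch.Tensor) -> torch.Tensor:
    # Element-wise scale and shift, applied where a normalization layer would usually sit
    return hidden_states * weight + bias

x = torch.randn(2, 8, hidden_size)  # (batch, sequence, hidden)
print(affine(x).shape)  # torch.Size([2, 8, 4096])
```
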
Further research is needed to determine whether this architecture will fully function when a classification head is added on top while using the transformers library.

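For reference, a minimal, untested sketch of what attaching such a head might look like (`num_labels` and the auto-class loading path are assumptions; whether this checkpoint loads cleanly this way is exactly the open question above):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "ssmits/Falcon2-5.5B-multilingual-embed-base"

# Untested: assumes transformers can map this checkpoint's config to a
# sequence-classification class and will initialize a fresh head on top
model = AutoModelForSequenceClassification.from_pretrained(repo, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Example sentence to classify.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # expected: torch.Size([1, 2])
```
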
## Inference

```python
from sentence_transformers import SentenceTransformer
import torch

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("ssmits/Falcon2-5.5B-multilingual-embed-base")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate embeddings by calling model.encode()
# convert_to_tensor=True returns a torch.Tensor, which the unsqueeze() calls below require
embeddings = model.encode(sentences, convert_to_tensor=True)
print(embeddings.shape)
# torch.Size([3, 4096])

# 3. Calculate the embedding similarities
# Using torch to compute the pairwise cosine similarity matrix
similarities = torch.nn.functional.cosine_similarity(embeddings.unsqueeze(0), embeddings.unsqueeze(1), dim=2)
print(similarities)
# tensor([[1.0000, 0.7120, 0.5937],
#         [0.7120, 1.0000, 0.5925],
#         [0.5937, 0.5925, 1.0000]])
```
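
As an alternative to building the cosine matrix by hand, the library's `util.cos_sim` helper should produce the same result:

```python
from sentence_transformers import util

# Pairwise cosine similarity matrix, equivalent to the manual computation above
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
```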