omarelshehy committed
Commit 7768f9a • 1 Parent(s): 8425037
Update README.md
README.md CHANGED
@@ -102,23 +102,34 @@ model-index:
 value: 82.18717939041626
 task:
 type: STS
 ---

 # SentenceTransformer based on FacebookAI/xlm-roberta-large

-This

-

-### Model Description
 - **Model Type:** Sentence Transformer
 - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 tokens
 - **Similarity Function:** Cosine Similarity
-
-
-


 ## Usage
@@ -136,12 +147,13 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer

 # Download from the 🤗 Hub
-
 # Run inference
 sentences = [
-
-
-
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -102,23 +102,34 @@ model-index:
 value: 82.18717939041626
 task:
 type: STS
+language:
+- ar
+- en
 ---

 # SentenceTransformer based on FacebookAI/xlm-roberta-large

+🚀 This is **v2.0** of the previously released [omarelshehy/arabic-english-sts-matryoshka](https://huggingface.co/omarelshehy/arabic-english-sts-matryoshka)

+📊 MTEB metrics in this version are better, especially on **ar-en**, but again, don't rely on them alone: test the model yourself and see if it fits your needs! ✅
+
+# Model description
+
+This is a **Bilingual** (Arabic-English) [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for **semantic textual similarity, semantic search, paraphrase mining, text classification, clustering**, and more.
+
+The model handles both languages separately 🌐, but also **interchangeably**, which unlocks flexible applications for developers and researchers who want to further build on Arabic models! 💡

 - **Model Type:** Sentence Transformer
 - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 tokens
 - **Similarity Function:** Cosine Similarity
+
+## Matryoshka Embeddings 🪆
+
+This model supports Matryoshka embeddings, allowing you to truncate embeddings into smaller sizes to optimize performance and memory usage, based on your task requirements. Available truncation sizes include: **1024, 768, 512, 256, 128, and 64**.
+
+You can select the appropriate embedding size for your use case, ensuring flexibility in resource management.


 ## Usage
@@ -136,12 +147,13 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer

 # Download from the 🤗 Hub
+matryoshka_dim = 768
+model = SentenceTransformer("omarelshehy/arabic-english-sts-matryoshka", truncate_dim=matryoshka_dim)
 # Run inference
 sentences = [
+    "She enjoyed reading books by the window as the rain poured outside.",
+    "كانت تستمتع بقراءة الكتب بجانب النافذة بينما كانت الأمطار تتساقط في الخارج.",
+    "Reading by the window was her favorite thing, especially during rainy days."
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
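
The model card names cosine similarity as the similarity function, so a natural next step after `model.encode` is to compare the sentences pairwise. The sketch below shows that computation on toy 4-dimensional vectors; `cosine_similarity`, `similarity_matrix`, and `toy_embeddings` are hypothetical names and values used for illustration only, not output of the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity with a zero vector is undefined; report 0.0 by convention.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities as a list of rows."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for sentence embeddings (assumed values, not model output).
toy_embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.2],
    [0.1, 0.9, 0.3, 0.0],
]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

With real embeddings, the rows of `embeddings` would take the place of `toy_embeddings`; `sentence_transformers.util.cos_sim` computes the same quantity on arrays or tensors.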