omarelshehy committed on
Commit
7768f9a
1 Parent(s): 8425037

Update README.md

  1. README.md +22 -10
README.md CHANGED
@@ -102,23 +102,34 @@ model-index:
  value: 82.18717939041626
  task:
  type: STS
  ---

  # SentenceTransformer based on FacebookAI/xlm-roberta-large

- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

- ## Model Details

- ### Model Description
  - **Model Type:** Sentence Transformer
  - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
  - **Maximum Sequence Length:** 512 tokens
  - **Output Dimensionality:** 1024 dimensions
  - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->


  ## Usage
@@ -136,12 +147,13 @@ Then you can load this model and run inference.
  from sentence_transformers import SentenceTransformer

  # Download from the 🤗 Hub
- model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
-     'في حين أن كل عام يجلب جولة جديدة من المعارك الحزبية في واشنطن حول قانون الضرائب، برنامج EITC هو اقتراح واحد الذي يرضي قطعة واسعة من الطيف السياسي.',
-     'اقتراح برنامج EITC يرضي قسم واسع من الطيف السياسي.',
-     'The proposal of the EITC program satisfies a very narrow section of the political spectrum.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
 
  value: 82.18717939041626
  task:
  type: STS
+ language:
+ - ar
+ - en
  ---

  # SentenceTransformer based on FacebookAI/xlm-roberta-large

+ 🚀 This is **v2.0** of the previously released [omarelshehy/arabic-english-sts-matryoshka](https://huggingface.co/omarelshehy/arabic-english-sts-matryoshka) model.

+ 📊 MTEB metrics are better in this version, especially on the **ar-en** tasks. Still, don't rely on benchmark scores alone: test the model yourself and see if it fits your needs! ✅
+
+ # Model description
+
+ This is a **bilingual** (Arabic-English) [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for **semantic textual similarity, semantic search, paraphrase mining, text classification, clustering**, and more.
+
+ The model handles each language on its own 🌐 and also both **interchangeably** (for example, comparing an Arabic sentence with an English one), which unlocks flexible applications for developers and researchers who want to build further on Arabic models! 💡

  - **Model Type:** Sentence Transformer
  - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
  - **Maximum Sequence Length:** 512 tokens
  - **Output Dimensionality:** 1024 dimensions
  - **Similarity Function:** Cosine Similarity
+
+ ## Matryoshka Embeddings 🪆
+
+ This model supports Matryoshka embeddings: you can truncate embeddings to a smaller size to reduce memory usage and speed up downstream computation, depending on your task requirements. Available truncation sizes are **1024, 768, 512, 256, 128, and 64**.
+
+ Smaller sizes typically trade some accuracy for lower storage and compute cost, so pick the size that fits your use case.


  ## Usage
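
As a rough illustration of what Matryoshka truncation means: a truncated embedding keeps only the leading components of the full vector. The sketch below uses plain Python lists and a made-up 8-dimensional vector rather than real model output, and `truncate_embedding` is a hypothetical helper, not part of sentence-transformers. Re-normalizing after truncation is an assumption here; it matters mainly if you compare vectors with a dot product instead of cosine similarity.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


# Toy 8-dimensional vector standing in for a real 1024-dimensional embedding.
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.2, 0.3, 0.0]
small = truncate_embedding(full, 4)
print(len(small))
```

In practice, passing `truncate_dim` when loading the model performs the truncation for you, so a helper like this is only useful for embeddings that were stored at full size.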
 
  from sentence_transformers import SentenceTransformer

  # Download from the 🤗 Hub
+ matryoshka_dim = 768  # one of the supported sizes: 1024, 768, 512, 256, 128, 64
+ model = SentenceTransformer("omarelshehy/arabic-english-sts-matryoshka", truncate_dim=matryoshka_dim)
  # Run inference
  sentences = [
+     "She enjoyed reading books by the window as the rain poured outside.",
+     "كانت تستمتع بقراءة الكتب بجانب النافذة بينما كانت الأمطار تتساقط في الخارج.",
+     "Reading by the window was her favorite thing, especially during rainy days."
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
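
The model card lists cosine similarity as the similarity function. As a minimal sketch of how two embedding vectors could be compared, the snippet below implements cosine similarity in plain Python on toy 3-dimensional vectors; `cosine_similarity` and the vector values are illustrative assumptions, not output from this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (hypothetical values).
english = [1.0, 2.0, 0.0]
arabic = [2.0, 4.0, 0.0]      # same direction as `english`
unrelated = [0.0, 0.0, 3.0]   # orthogonal to `english`

print(cosine_similarity(english, arabic))
print(cosine_similarity(english, unrelated))
```

With real output, the rows of `embeddings` could be passed to a function like this in the same way; because cosine similarity ignores vector length, it is unaffected by whether truncated embeddings were re-normalized.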