omarelshehy/arabic-english-sts-matryoshka-v2.0

@@ -102,23 +102,34 @@ model-index:
       value: 82.18717939041626
     task:
       type: STS
 ---
 # SentenceTransformer based on FacebookAI/xlm-roberta-large
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-## Model Details
-### Model Description
 - **Model Type:** Sentence Transformer
 - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
 ## Usage
@@ -136,12 +147,13 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
-model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-    'في حين أن كل عام يجلب جولة جديدة من المعارك الحزبية في واشنطن حول قانون الضرائب، برنامج EITC هو اقتراح واحد الذي يرضي قطعة واسعة من الطيف السياسي.',
-    'اقتراح برنامج EITC يرضي قسم واسع من الطيف السياسي.',
-    'The proposal of the EITC program satisfies a very narrow section of the political spectrum.',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)

       value: 82.18717939041626
     task:
       type: STS
+language:
+- ar
+- en
 ---
 # SentenceTransformer based on FacebookAI/xlm-roberta-large
+🚀 This is **v2.0** of the previously released [omarelshehy/arabic-english-sts-matryoshka](https://huggingface.co/omarelshehy/arabic-english-sts-matryoshka) model.
+📊 MTEB metrics are better in this version, especially on **ar-en**, but don't rely on them alone: test the model yourself and see if it fits your needs! ✅
+# Model description
+This is a **Bilingual** (Arabic-English) [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for **semantic textual similarity, semantic search, paraphrase mining, text classification, clustering**, and more.
+The model handles both languages separately 🌐, but also **interchangeably**, which unlocks flexible applications for developers and researchers who want to further build on Arabic models! 💡
 - **Model Type:** Sentence Transformer
 - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
+## Matryoshka Embeddings 🪆
+This model supports Matryoshka embeddings, so you can truncate embeddings to smaller sizes to reduce memory use and speed up downstream computation, depending on your task requirements. Available truncation sizes are **1024, 768, 512, 256, 128, and 64**.
+Choose the size that fits your use case: smaller embeddings are cheaper to store and compare, usually at some cost in accuracy.
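To make the truncation idea concrete, here is a minimal sketch of what shortening a Matryoshka embedding amounts to: keep the leading components and re-normalize. It is an illustration only, not the library's internal implementation. The helper `truncate_embeddings` is hypothetical, and the random array stands in for real `model.encode(...)` output so the snippet runs without downloading the model.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then L2-normalize the rows.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


# Assumption: random vectors stand in for three 1024-dimensional model outputs.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 1024)).astype(np.float32)

small = truncate_embeddings(full, 256)
print(small.shape)
```

With `float32` values, going from 1024 to 256 dimensions cuts storage per vector from 4096 bytes to 1024 bytes.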
 ## Usage
 from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
+matryoshka_dim = 768
+model = SentenceTransformer("omarelshehy/arabic-english-sts-matryoshka-v2.0", truncate_dim=matryoshka_dim)
 # Run inference
 sentences = [
+    "She enjoyed reading books by the window as the rain poured outside.",
+    "كانت تستمتع بقراءة الكتب بجانب النافذة بينما كانت الأمطار تتساقط في الخارج.",
+    "Reading by the window was her favorite thing, especially during rainy days."
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)