sentence-transformers/clip-ViT-B-32-multilingual-v1


nreimers committed on Nov 2, 2021

Commit 584f97e • 1 Parent(s): fc29aef

Update README.md

Files changed (1)

README.md +6 -0

@@ -88,6 +88,12 @@ For a demo of multilingual image search, have a look at: [Image_Search-multiling
 For more details on image search and zero-shot image classification, have a look at the documentation on [SBERT.net](https://www.sbert.net/examples/applications/image-search/README.html).
+## Training
+This model was created using [Multilingual Knowledge Distillation](https://arxiv.org/abs/2004.09813). As the teacher model, we used the original `clip-ViT-B-32`; as the student model, we trained a [multilingual DistilBERT](https://huggingface.co/distilbert-base-multilingual-cased). Using parallel data, the multilingual student model learns to map text in many languages into the teacher's vector space. As a result, you get a text embedding model that works for 50+ languages.
+The image encoder from CLIP is unchanged, i.e. you can use the original CLIP image encoder to encode images.
+See the [SBERT.net - Multilingual-Models documentation](https://www.sbert.net/examples/training/multilingual/README.html) for more details and for **training code**.
 ## Full Model Architecture
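
The Training section added in this commit describes the student learning to reproduce the teacher's vectors from parallel data. As a rough illustration only, the objective from the linked Multilingual Knowledge Distillation paper can be sketched in plain Python: for each (source, translation) pair, the student's embeddings of both sentences are pulled toward the teacher's embedding of the source sentence using mean squared error. The names `distillation_loss`, `toy_teacher`, and `toy_student` and the two-dimensional toy vectors are hypothetical stand-ins, not part of this repository or of the actual training code, which lives in the SBERT.net documentation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher: Encoder,
    student: Encoder,
    parallel_batch: Sequence[Tuple[str, str]],
) -> float:
    """Average over (source, translation) pairs of
    MSE(teacher(src), student(src)) + MSE(teacher(src), student(tgt))."""
    if not parallel_batch:
        raise ValueError("parallel_batch must not be empty")
    total = 0.0
    for src, tgt in parallel_batch:
        target = teacher(src)  # the teacher only encodes the source sentence
        total += squared_error(target, student(src))
        total += squared_error(target, student(tgt))
    return total / len(parallel_batch)


# Hypothetical toy encoders: lookup tables standing in for neural networks.
TEACHER_TABLE = {"two dogs": [1.0, 0.0]}
STUDENT_TABLE = {"two dogs": [1.0, 0.0], "zwei Hunde": [0.0, 0.0]}


def toy_teacher(sentence: str) -> Vector:
    return TEACHER_TABLE[sentence]


def toy_student(sentence: str) -> Vector:
    return STUDENT_TABLE[sentence]


# The student already matches the teacher on the English sentence,
# but its German embedding is still far from the teacher's vector.
loss = distillation_loss(toy_teacher, toy_student, [("two dogs", "zwei Hunde")])
print(loss)
```

In this toy setup only the translated sentence contributes to the loss, which mirrors the idea in the README: minimizing it moves the student's embeddings for other languages into the teacher's vector space, so the unchanged CLIP image encoder remains compatible with the new text encoder.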