---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
---
# clip-ViT-B-32
This is the image & text model [CLIP](https://arxiv.org/abs/2103.00020), which maps text and images to a shared vector space. For applications of the model, have a look at our documentation: [SBERT.net - Image Search](https://www.sbert.net/examples/applications/image-search/README.html).
## Usage
After installing [sentence-transformers](https://sbert.net) (`pip install sentence-transformers`), using this model is straightforward:
```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image
# Load the CLIP model
model = SentenceTransformer('clip-ViT-B-32')

# Encode an image
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode text descriptions
text_emb = model.encode(['Two dogs in the snow', 'A cat on a table', 'A picture of London at night'])

# Compute cosine similarities between the image and each text description
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)
```
See our [SBERT.net - Image Search](https://www.sbert.net/examples/applications/image-search/README.html) documentation for more examples of how the model can be used for image search, zero-shot image classification, image clustering, and image deduplication.
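For instance, zero-shot image classification amounts to encoding one short caption per candidate label and picking the label whose embedding is most similar to the image embedding. A minimal sketch, assuming placeholder class names and an image file `two_dogs_in_snow.jpg`:

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer('clip-ViT-B-32')

# Candidate labels, phrased as short captions (placeholders for your own classes)
labels = ['a photo of a dog', 'a photo of a cat', 'a photo of a car']
label_emb = model.encode(labels)

# Embed the image and pick the label with the highest cosine similarity
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))
scores = util.cos_sim(img_emb, label_emb)
print(labels[int(scores.argmax())])
```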
## Performance
The following table lists zero-shot top-1 accuracy on the ImageNet validation set:
| Model | Top-1 Accuracy (%) |
| --- | :---: |
| clip-ViT-B-32 | 63.3 |
| clip-ViT-B-16 | 68.1 |
| clip-ViT-L-14 | 75.4 |
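The larger variants can be loaded the same way by swapping the model name, assuming they are published under these names on the Hugging Face Hub:

```python
from sentence_transformers import SentenceTransformer

# Larger and more accurate, but slower to encode
model = SentenceTransformer('clip-ViT-L-14')
```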