nreimers committed
Commit: b2c307a
Parent: dbd2f22

Update README.md

Files changed (1): README.md (+25 -32)
README.md CHANGED
@@ -6,48 +6,41 @@ tags:
  - sentence-similarity
  ---
 
- # {MODEL_NAME}
-
- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a None dimensional dense vector space and can be used for tasks like clustering or semantic search.
-
- <!--- Describe your model here -->
-
- ## Usage (Sentence-Transformers)
-
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-
- ```
- pip install -U sentence-transformers
- ```
-
- Then you can use the model like this:
-
- ```python
- from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]
-
- model = SentenceTransformer('{MODEL_NAME}')
- embeddings = model.encode(sentences)
- print(embeddings)
- ```
-
- ## Evaluation Results
-
- <!--- Describe how your model was evaluated -->
-
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
- ## Full Model Architecture
- ```
- SentenceTransformer(
-   (0): CLIPModel()
- )
- ```
-
- ## Citing & Authors
-
- <!--- Describe where people can find more information -->
+ # clip-ViT-B-32
+
+ This is the Image & Text model [CLIP](https://arxiv.org/abs/2103.00020), which maps text and images to a shared vector space. For applications of the model, have a look at our documentation: [SBERT.net - Image Search](https://www.sbert.net/examples/applications/image-search/README.html)
+
+ ## Usage
+
+ After installing [sentence-transformers](https://sbert.net) (`pip install sentence-transformers`), using this model is easy:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ from PIL import Image
+
+ # Load CLIP model
+ model = SentenceTransformer('clip-ViT-B-32')
+
+ # Encode an image
+ img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))
+
+ # Encode text descriptions
+ text_emb = model.encode(['Two dogs in the snow', 'A cat on a table', 'A picture of London at night'])
+
+ # Compute cosine similarities
+ cos_scores = util.cos_sim(img_emb, text_emb)
+ print(cos_scores)
+ ```
+
+ See our [SBERT.net - Image Search](https://www.sbert.net/examples/applications/image-search/README.html) documentation for more examples of how the model can be used for image search, zero-shot image classification, image clustering, and image deduplication.
+
+ ## Performance
+
+ The following table shows the zero-shot ImageNet validation set accuracy:
+
+ | Model | Top-1 accuracy (%) |
+ | --- | :---: |
+ | clip-ViT-B-32 | 63.3 |
+ | clip-ViT-B-16 | 68.1 |
+ | clip-ViT-L-14 | 75.4 |
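
The updated README mentions zero-shot image classification as one of the documented use cases, and the performance table above reports exactly that setting on ImageNet. As a quick, non-authoritative illustration of the idea, the sketch below scores a single image against a handful of candidate label prompts using the same `util.cos_sim` call as the README snippet and picks the highest-scoring label. The image path `two_dogs_in_snow.jpg` and the label prompts are illustrative placeholders, not files or labels shipped with the model.

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Load the same CLIP checkpoint as in the README
model = SentenceTransformer('clip-ViT-B-32')

# Candidate class labels, written as short prompts (illustrative only)
labels = ['a photo of a dog', 'a photo of a cat', 'a photo of a city at night']

# Encode the image and the label prompts into the shared vector space
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))
label_emb = model.encode(labels)

# Cosine similarity between the image and each label; the highest score wins
scores = util.cos_sim(img_emb, label_emb)[0]
best = int(scores.argmax())
print(f"Predicted label: {labels[best]} (score {scores[best].item():.3f})")
```

This reuses the encode-then-`cos_sim` pattern from the README snippet; scaling it to more images or more classes only changes the inputs passed to `model.encode`.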