Update README.md
# TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
**[ICCV 2023]** - [TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance](https://openaccess.thecvf.com/content/ICCV2023/html/Wu_TinyCLIP_CLIP_Distillation_via_Affinity_Mimicking_and_Weight_Inheritance_ICCV_2023_paper.html)

**TinyCLIP** is a novel **cross-modal distillation** method for large-scale language-image pre-trained models. It introduces two core techniques: **affinity mimicking** and **weight inheritance**. This work unleashes the capacity of small CLIP models by fully leveraging large-scale models and their pre-training data, striking the best trade-off between speed and accuracy.
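To make these two techniques concrete, the sketch below illustrates, under simplifying assumptions, what an affinity-mimicking loss and a basic form of weight inheritance can look like in PyTorch. It is not the repository's implementation: the KL-based formulation, the temperature value, and the uniform block-selection rule are illustrative choices only.

```python
import copy

import torch.nn as nn
import torch.nn.functional as F


def affinity_mimicking_loss(s_img, s_txt, t_img, t_txt, tau=0.07):
    """Student mimics the teacher's image-text affinity distributions.

    Inputs are L2-normalized embeddings of shape (batch, dim); `tau` is an
    illustrative temperature, not TinyCLIP's setting.
    """
    s_logits = s_img @ s_txt.t() / tau  # student image-to-text affinities
    t_logits = t_img @ t_txt.t() / tau  # teacher image-to-text affinities
    # Match teacher and student affinity distributions in both directions.
    loss_i2t = F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1), reduction="batchmean")
    loss_t2i = F.kl_div(F.log_softmax(s_logits.t(), dim=-1),
                        F.softmax(t_logits.t(), dim=-1), reduction="batchmean")
    return 0.5 * (loss_i2t + loss_t2i)


def inherit_blocks(teacher_blocks: nn.ModuleList, num_student_blocks: int) -> nn.ModuleList:
    """Initialize a shallower student by copying a subset of teacher blocks.

    TinyCLIP selects important weights to inherit; the uniform stride here is
    only a simplification of that idea.
    """
    stride = len(teacher_blocks) / num_student_blocks
    return nn.ModuleList(
        [copy.deepcopy(teacher_blocks[int(i * stride)]) for i in range(num_student_blocks)]
    )
```

In the actual method, inheritance is applied to both the image and text encoders before distillation, so the small model starts from informative teacher weights rather than from scratch.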
## Use with Transformers
```python
from PIL import Image
import requests

# ... (model loading, image preprocessing, and the forward pass are omitted in this excerpt)

logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```
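The excerpt above omits the model setup between the imports and the final two lines. For completeness, here is a self-contained sketch of a typical zero-shot run using the Transformers `CLIPModel` and `CLIPProcessor` classes; the checkpoint identifier is a placeholder (pick a TinyCLIP checkpoint from the Model Zoo below), and loading this way assumes the checkpoint is exported in the Transformers CLIP format.

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

# Placeholder: substitute the Hub id or local path of a TinyCLIP checkpoint.
checkpoint = "<tinyclip-checkpoint-id-or-path>"

model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

# Standard test image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # label probabilities per image
print(probs)
```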
## Highlights
<p align="center">
<img src="./figure/fig1.jpg" width="500">
</p>

* TinyCLIP ViT-45M/32 uses only **half the parameters** of ViT-B/32 while achieving **comparable zero-shot performance**.
* TinyCLIP ResNet-19M reduces the parameters by **50%** while achieving a **2x** inference speedup, and obtains **56.4%** accuracy on ImageNet.
## News
* *Oct.2023* Training code is released.
* *Sep.2023* Preliminary code released, including inference code and checkpoints.
## Model Zoo
| Model | Weight inheritance | Pretrain | IN-1K Acc@1 (%) | MACs (G) | Throughput (pairs/s) | Link |
| --- | --- | --- | --- | --- | --- | --- |
| TinyCLIP ViT-45M/32 Text-18M | auto | LAION+YFCC-400M | 62.7 | 1.9 | 3,685 | … |

Note: The configs of models with auto inheritance are generated automatically.
## Official PyTorch Implementation

https://github.com/microsoft/Cream/tree/main/TinyCLIP

### Install dependencies and prepare dataset

- [Preparation](./docs/PREPARATION.md)

### Evaluation

- [Evaluation](./docs/EVALUATION.md)

### Inference example

- [Inference](./inference.py)

### Pretraining

- [Pretraining](./docs/PRETRAINING.md)
## Citation
If this repo is helpful for you, please consider citing it. :mega: Thank you! :)

Our code is based on [CLIP](https://github.com/openai/CLIP), [OpenCLIP](https://github.com/mlfoundations/open_clip), [CoFi](https://github.com/princeton-nlp/CoFiPruning) and [PyTorch](https://github.com/pytorch/pytorch). Thanks to the contributors for their awesome contributions!
## License
- [License](https://github.com/microsoft/Cream/blob/main/TinyCLIP/LICENSE)
|