ai-forever
/

ruclip-vit-base-patch32-384

Inference Endpoints

Model card Files Files and versions Community

ai-forever commited on Jan 10, 2022

Commit

06f1f6e

•

1 Parent(s): 6191103

Create README.md

Files changed (1) hide show

README.md +66 -0

README.md ADDED Viewed

	@@ -0,0 +1,66 @@

+# ruclip-vit-base-patch32-384
+**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
+for obtaining images and text similarities and rearranging captions and pictures.
+RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
+multimodal learning.
+Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
+* Task: `text ranking`; `image ranking`; `zero-shot image classification`;
+* Type: `encoder`
+* Num Parameters: `150M`
+* Training Data Volume: `240 million text-image pairs`
+* Language: `Russian`
+* Context Length: `77`
+* Transformer Layers: `12`
+* Transformer Width: `512`
+* Transformer Heads: `8`
+* Image Size: `384`
+* Vision Layers: `12`
+* Vision Width: `768`
+* Vision Patch Size: `32`
+## Usage [Github](https://github.com/sberbank-ai/ru-clip)
+```
+pip install ruclip
+```
+```python
+clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
+```
+## Performance
+We have evaluated the performance on the following datasets:
+| Dataset       | Metric Name    | Metric Result               |
+|:--------------|:---------------|:----------------------------|
+| Food101       | acc            | 0.642                       |
+| CIFAR10       | acc            | 0.862                       |
+| CIFAR100      | acc            | 0.529                       |
+| Birdsnap      | acc            | 0.161                       |
+| SUN397        | acc            | 0.510                       |
+| Stanford Cars | acc            | 0.572                       |
+| DTD           | acc            | 0.390	                     |
+| MNIST         | acc            | 0.404	                     |
+| STL10         | acc            | 0.946	                     |
+| PCam          | acc            | 0.506                       |
+| CLEVR         | acc            | 0.188                       |
+| Rendered SST2 | acc            | 0.508                       |
+| ImageNet      | acc            | 0.451                       |
+| FGVC Aircraft | mean-per-class | 0.053                       |
+| Oxford Pets   | mean-per-class | 0.587                       |
+| Caltech101    | mean-per-class | 0.834	                     |
+| Flowers102    | mean-per-class | 0.449                       |
+| HatefulMemes  | roc-auc        | 0.537                       |
+# Authors
++ Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
++ Daniil Chesakov: [Github](https://github.com/Danyache)
++ Denis Dimitrov: [Github](https://github.com/denndimitrov)
++ Igor Pavlov: [Github](https://github.com/boomb0om)