---
language:
- ru
- en
library_name: transformers
pipeline_tag: feature-extraction
---
# ruclip-vit-base-patch32-384
**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
for computing similarity between images and texts and for ranking captions and pictures.
RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
multimodal learning.
The model was trained by the [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
- Task: `text ranking`; `image ranking`; `zero-shot image classification`
- Type: `encoder`
- Num Parameters: `150M`
- Training Data Volume: `240 million text-image pairs`
- Language: `Russian`
- Context Length: `77`
- Transformer Layers: `12`
- Transformer Width: `512`
- Transformer Heads: `8`
- Image Size: `384`
- Vision Layers: `12`
- Vision Width: `768`
- Vision Patch Size: `32`
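
To make the vision-side numbers concrete, here is a small arithmetic sketch of how the image size and patch size translate into a patch grid. The class token and the 3-channel RGB input are assumptions taken from the standard ViT design, not something this card states.

```python
# Values copied from the specification list above.
image_size = 384
patch_size = 32

# A 384x384 image cut into 32x32 patches gives a 12x12 grid.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Assumption: one class token is prepended, as in the standard ViT design.
vision_sequence_length = num_patches + 1

# Assumption: RGB input, so each patch holds 32 * 32 * 3 raw values
# before it is projected to the vision width.
values_per_patch = patch_size * patch_size * 3

print(patches_per_side, num_patches, vision_sequence_length, values_per_patch)
```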
## Usage [Github](https://github.com/sberbank-ai/ru-clip)
```
pip install ruclip
```
```python
import ruclip
clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
```
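
Once image and text embeddings are available, zero-shot classification in CLIP-style models comes down to picking the label whose text embedding is most similar to the image embedding. The sketch below illustrates only that ranking step in plain Python. The vectors are toy values, not model output, and `zero_shot_classify` is a hypothetical helper rather than part of the `ruclip` package.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(image_embedding, text_embeddings):
    """Hypothetical helper: return the best label and all similarity scores."""
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 3-dimensional embeddings; real embeddings would come from the model.
text_embeddings = {
    "кошка": [0.9, 0.1, 0.0],   # "cat"
    "собака": [0.1, 0.9, 0.0],  # "dog"
    "машина": [0.0, 0.1, 0.9],  # "car"
}
image_embedding = [0.8, 0.3, 0.1]

best_label, scores = zero_shot_classify(image_embedding, text_embeddings)
print(best_label)
```

The same scores can be sorted instead of reduced to a single maximum, which gives the text-ranking and image-ranking tasks listed above.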
## Performance
We have evaluated the performance on the following datasets:
| Dataset | Metric Name | Metric Result |
| :------------ | :------------- | :------------ |
| Food101 | acc | 0.642 |
| CIFAR10 | acc | 0.862 |
| CIFAR100 | acc | 0.529 |
| Birdsnap | acc | 0.161 |
| SUN397 | acc | 0.510 |
| Stanford Cars | acc | 0.572 |
| DTD | acc | 0.390 |
| MNIST | acc | 0.404 |
| STL10 | acc | 0.946 |
| PCam | acc | 0.506 |
| CLEVR | acc | 0.188 |
| Rendered SST2 | acc | 0.508 |
| ImageNet | acc | 0.451 |
| FGVC Aircraft | mean-per-class | 0.053 |
| Oxford Pets | mean-per-class | 0.587 |
| Caltech101 | mean-per-class | 0.834 |
| Flowers102 | mean-per-class | 0.449 |
| HatefulMemes | roc-auc | 0.537 |
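
The table mixes two accuracy-style metrics. As a rough illustration, assuming `acc` means plain top-1 accuracy and `mean-per-class` means per-class accuracy averaged over classes, the sketch below shows how the two can diverge on an imbalanced label set. The labels and predictions are made-up toy data, not evaluation output.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Accuracy computed per class, then averaged with equal class weight."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data: class "a" is frequent, class "b" is rare and half wrong.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]

overall = accuracy(y_true, y_pred)
per_class = mean_per_class_accuracy(y_true, y_pred)
print(overall, per_class)
```

Mean-per-class accuracy gives rare classes the same weight as frequent ones, which is why it is commonly reported for datasets with uneven class sizes.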
## Authors
- Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
- Daniil Chesakov: [Github](https://github.com/Danyache)
- Denis Dimitrov: [Github](https://github.com/denndimitrov)
- Igor Pavlov: [Github](https://github.com/boomb0om)