Cyber committed on
Commit 334d425
1 Parent(s): 1f7f08e

Update README.md

Files changed (1)
  1. README.md +52 -46
README.md CHANGED
@@ -1,24 +1,33 @@
  # ruclip-vit-base-patch32-384
 
- **RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
- for obtaining images and text similarities and rearranging captions and pictures.
- RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
- multimodal learning.
 
- Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
- * Task: `text ranking`; `image ranking`; `zero-shot image classification`;
- * Type: `encoder`
- * Num Parameters: `150M`
- * Training Data Volume: `240 million text-image pairs`
- * Language: `Russian`
- * Context Length: `77`
- * Transformer Layers: `12`
- * Transformer Width: `512`
- * Transformer Heads: `8`
- * Image Size: `384`
- * Vision Layers: `12`
- * Vision Width: `768`
- * Vision Patch Size: `32`
 
  ## Usage [Github](https://github.com/sberbank-ai/ru-clip)
 
@@ -30,37 +39,34 @@ pip install ruclip
  clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
  ```
 
-
  ## Performance
- We have evaluated the performance on the following datasets:
-
- | Dataset | Metric Name | Metric Result |
- |:--------------|:---------------|:----------------------------|
- | Food101 | acc | 0.642 |
- | CIFAR10 | acc | 0.862 |
- | CIFAR100 | acc | 0.529 |
- | Birdsnap | acc | 0.161 |
- | SUN397 | acc | 0.510 |
- | Stanford Cars | acc | 0.572 |
- | DTD | acc | 0.390 |
- | MNIST | acc | 0.404 |
- | STL10 | acc | 0.946 |
- | PCam | acc | 0.506 |
- | CLEVR | acc | 0.188 |
- | Rendered SST2 | acc | 0.508 |
- | ImageNet | acc | 0.451 |
- | FGVC Aircraft | mean-per-class | 0.053 |
- | Oxford Pets | mean-per-class | 0.587 |
- | Caltech101 | mean-per-class | 0.834 |
- | Flowers102 | mean-per-class | 0.449 |
- | HatefulMemes | roc-auc | 0.537 |
-
 
 
 
  # Authors
 
- + Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
- + Daniil Chesakov: [Github](https://github.com/Danyache)
- + Denis Dimitrov: [Github](https://github.com/denndimitrov)
- + Igor Pavlov: [Github](https://github.com/boomb0om)
 
+ ---
+ language:
+ - ru
+ - en
+ library_name: transformers
+ pipeline_tag: feature-extraction
+ ---
+
  # ruclip-vit-base-patch32-384
 
+ **RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
+ for obtaining images and text similarities and rearranging captions and pictures.
+ RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
+ multimodal learning.
+
+ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
 
+ - Task: `text ranking`; `image ranking`; `zero-shot image classification`;
+ - Type: `encoder`
+ - Num Parameters: `150M`
+ - Training Data Volume: `240 million text-image pairs`
+ - Language: `Russian`
+ - Context Length: `77`
+ - Transformer Layers: `12`
+ - Transformer Width: `512`
+ - Transformer Heads: `8`
+ - Image Size: `384`
+ - Vision Layers: `12`
+ - Vision Width: `768`
+ - Vision Patch Size: `32`
 
  ## Usage [Github](https://github.com/sberbank-ai/ru-clip)
 
 
  clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
  ```
 
  ## Performance
 
+ We have evaluated the performance on the following datasets:
 
+ | Dataset | Metric Name | Metric Result |
+ | :------------ | :------------- | :------------ |
+ | Food101 | acc | 0.642 |
+ | CIFAR10 | acc | 0.862 |
+ | CIFAR100 | acc | 0.529 |
+ | Birdsnap | acc | 0.161 |
+ | SUN397 | acc | 0.510 |
+ | Stanford Cars | acc | 0.572 |
+ | DTD | acc | 0.390 |
+ | MNIST | acc | 0.404 |
+ | STL10 | acc | 0.946 |
+ | PCam | acc | 0.506 |
+ | CLEVR | acc | 0.188 |
+ | Rendered SST2 | acc | 0.508 |
+ | ImageNet | acc | 0.451 |
+ | FGVC Aircraft | mean-per-class | 0.053 |
+ | Oxford Pets | mean-per-class | 0.587 |
+ | Caltech101 | mean-per-class | 0.834 |
+ | Flowers102 | mean-per-class | 0.449 |
+ | HatefulMemes | roc-auc | 0.537 |
 
  # Authors
 
+ - Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
+ - Daniil Chesakov: [Github](https://github.com/Danyache)
+ - Denis Dimitrov: [Github](https://github.com/denndimitrov)
+ - Igor Pavlov: [Github](https://github.com/boomb0om)
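
The README in this commit lists `zero-shot image classification` as a task and reports `acc` and `mean-per-class` results, but it does not define either metric or show how predictions are scored. The sketch below is one plausible reading, not the authors' evaluation code. It assumes that a prediction is the class whose text embedding has the highest cosine similarity to the image embedding, that `acc` is plain top-1 accuracy, and that `mean-per-class` is accuracy computed per true class and then averaged. All function names and the toy embeddings are hypothetical; none of this uses the `ruclip` API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_predict(image_embs, text_embs):
    """For each image embedding, return the index of the most similar class text embedding."""
    return [
        max(range(len(text_embs)), key=lambda k: cosine(img, text_embs[k]))
        for img in image_embs
    ]


def accuracy(y_true, y_pred):
    """Assumed meaning of `acc`: fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Assumed meaning of `mean-per-class`: accuracy per true class, then averaged."""
    hits, totals = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy embeddings, made up for illustration (not RuCLIP outputs).
image_embs = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 1.0], [5.0, 1.0]]
text_embs = [[2.0, 0.0], [0.0, 1.0]]  # one row per class
y_true = [0, 1, 0, 1, 0]

y_pred = zero_shot_predict(image_embs, text_embs)
print(y_pred, accuracy(y_true, y_pred), mean_per_class_accuracy(y_true, y_pred))
```

On this imbalanced toy set the two metrics differ: four of five samples are correct (0.8), while the per-class accuracies of 1.0 and 0.5 average to 0.75. This is why a table that mixes `acc` and `mean-per-class` rows should not be compared row to row without checking which metric applies.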