ai-forever/ruclip-vit-base-patch32-384


Improve README.md

by Cyber - opened Aug 5, 2023

base: refs/heads/main ← from: refs/pr/1




README.md +52 -46


@@ -1,24 +1,33 @@
 # ruclip-vit-base-patch32-384
-**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
-for obtaining images and text similarities and rearranging captions and pictures.
-RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
-multimodal learning.
-Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
-* Task: `text ranking`; `image ranking`; `zero-shot image classification`;
-* Type: `encoder`
-* Num Parameters: `150M`
-* Training Data Volume: `240 million text-image pairs`
-* Language: `Russian`
-* Context Length: `77`
-* Transformer Layers: `12`
-* Transformer Width: `512`
-* Transformer Heads: `8`
-* Image Size: `384`
-* Vision Layers: `12`
-* Vision Width: `768`
-* Vision Patch Size: `32`
 ## Usage [Github](https://github.com/sberbank-ai/ru-clip)
@@ -30,37 +39,34 @@ pip install ruclip
 clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
 ```
 ## Performance
-We have evaluated the performance on the following datasets:
-| Dataset       | Metric Name    | Metric Result               |
-|:--------------|:---------------|:----------------------------|
-| Food101       | acc            | 0.642                       |
-| CIFAR10       | acc            | 0.862                       |
-| CIFAR100      | acc            | 0.529                       |
-| Birdsnap      | acc            | 0.161                       |
-| SUN397        | acc            | 0.510                       |
-| Stanford Cars | acc            | 0.572                       |
-| DTD           | acc            | 0.390	                     |
-| MNIST         | acc            | 0.404	                     |
-| STL10         | acc            | 0.946	                     |
-| PCam          | acc            | 0.506                       |
-| CLEVR         | acc            | 0.188                       |
-| Rendered SST2 | acc            | 0.508                       |
-| ImageNet      | acc            | 0.451                       |
-| FGVC Aircraft | mean-per-class | 0.053                       |
-| Oxford Pets   | mean-per-class | 0.587                       |
-| Caltech101    | mean-per-class | 0.834	                     |
-| Flowers102    | mean-per-class | 0.449                       |
-| HatefulMemes  | roc-auc        | 0.537                       |
 # Authors
-+ Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
-+ Daniil Chesakov: [Github](https://github.com/Danyache)
-+ Denis Dimitrov: [Github](https://github.com/denndimitrov)
-+ Igor Pavlov: [Github](https://github.com/boomb0om)

+---
+language:
+- ru
+- en
+library_name: transformers
+pipeline_tag: feature-extraction
+---
 # ruclip-vit-base-patch32-384
+**RuCLIP** (**Ru**ssian **C**ontrastive **L**anguage–**I**mage **P**retraining) is a multimodal model
+for obtaining images and text similarities and rearranging captions and pictures.
+RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and
+multimodal learning.
+Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
+- Task: `text ranking`; `image ranking`; `zero-shot image classification`;
+- Type: `encoder`
+- Num Parameters: `150M`
+- Training Data Volume: `240 million text-image pairs`
+- Language: `Russian`
+- Context Length: `77`
+- Transformer Layers: `12`
+- Transformer Width: `512`
+- Transformer Heads: `8`
+- Image Size: `384`
+- Vision Layers: `12`
+- Vision Width: `768`
+- Vision Patch Size: `32`
 ## Usage [Github](https://github.com/sberbank-ai/ru-clip)
 clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
 ```
 ## Performance
+We have evaluated the performance on the following datasets:
+| Dataset       | Metric Name    | Metric Result |
+| :------------ | :------------- | :------------ |
+| Food101       | acc            | 0.642         |
+| CIFAR10       | acc            | 0.862         |
+| CIFAR100      | acc            | 0.529         |
+| Birdsnap      | acc            | 0.161         |
+| SUN397        | acc            | 0.510         |
+| Stanford Cars | acc            | 0.572         |
+| DTD           | acc            | 0.390         |
+| MNIST         | acc            | 0.404         |
+| STL10         | acc            | 0.946         |
+| PCam          | acc            | 0.506         |
+| CLEVR         | acc            | 0.188         |
+| Rendered SST2 | acc            | 0.508         |
+| ImageNet      | acc            | 0.451         |
+| FGVC Aircraft | mean-per-class | 0.053         |
+| Oxford Pets   | mean-per-class | 0.587         |
+| Caltech101    | mean-per-class | 0.834         |
+| Flowers102    | mean-per-class | 0.449         |
+| HatefulMemes  | roc-auc        | 0.537         |
 # Authors
+- Alex Shonenkov: [Github](https://github.com/shonenkov), [Kaggle GM](https://www.kaggle.com/shonenkov)
+- Daniil Chesakov: [Github](https://github.com/Danyache)
+- Denis Dimitrov: [Github](https://github.com/denndimitrov)
+- Igor Pavlov: [Github](https://github.com/boomb0om)
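
The README in this PR describes RuCLIP as a model for obtaining image and text similarities and for ranking captions against pictures. As a rough illustration of that ranking step only, here is a minimal sketch that scores one image embedding against several caption embeddings with cosine similarity and a softmax. The three-dimensional vectors are made-up toy values, not outputs of the model, and `rank_captions` and its `logit_scale` default are hypothetical names and values chosen for this example; in practice the embeddings would come from the model returned by `ruclip.load`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_captions(image_embedding, caption_embeddings, logit_scale=100.0):
    """Return caption indices sorted from best to worst match, plus probabilities.

    logit_scale is an assumed temperature for this sketch, not a value
    taken from the RuCLIP checkpoint.
    """
    sims = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    probs = softmax([logit_scale * s for s in sims])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order, probs


# Toy embeddings (assumption: real ones would be produced by the model).
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "кошка": [1.0, 0.0, 0.0],   # "cat"
    "собака": [0.0, 1.0, 0.0],  # "dog"
    "машина": [0.0, 0.0, 1.0],  # "car"
}

labels = list(caption_embeddings)
order, probs = rank_captions(
    image_embedding, [caption_embeddings[label] for label in labels]
)
ranked_labels = [labels[i] for i in order]
print(ranked_labels)
```

The same function ranks images against a single caption if the roles of the arguments are swapped, since cosine similarity is symmetric.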