Safetensors
matveymih committed on
Commit 02f7bf9
1 Parent(s): e109ff2

Update README.md

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -33,11 +33,17 @@ To further enhance the multimodal capabilities of the model, we use learnable cu
 
  1. Pre-training the adapter on Image Captioning tasks (LAION, CC-4M, etc.).
  2. Once the adapter has learned to map visual embeddings to the language model's textual space, we proceed to unfreeze Mistral for improved understanding of dialog formats and complex queries.
- 3. The dataset consists of data in English and Russian, the English part has the following structure:
-
- <p align="left">
- <img src="https://raw.githubusercontent.com/AIRI-Institute/OmniFusion/main/content/datasets.png" width="70%">
- </p>
+ 3. The dataset consists of data in English and Russian and has the following structure:
+
+ | Task          | Dataset Source                     | #Samples |
+ | ------------- | ---------------------------------- | -------- |
+ | Caption       | ShareGPT4V                         | 100K     |
+ | VQA           | COCO, SAM-9K                       | 20K, 9K  |
+ | WebQA         | WebData                            | 1.5K     |
+ | OCRQA         | TextVQA, OCRVQA                    | 120K     |
+ | Conversation  | LLaVA-v1.5-665K, OCRVQA            | 665K     |
+ | DocVQA        | Proprietary data (ru)              | 20K      |
+ | Text-only SFT | Proprietary data (ru), Alpaca (en) | 10K      |
 
  ### Results
 
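
The two-stage recipe in the updated README (adapter pre-training with a frozen LLM, then unfreezing Mistral for dialog and instruction data) boils down to a freeze/unfreeze schedule. Below is a minimal PyTorch-style sketch of that schedule, not OmniFusion's actual training code: the adapter architecture, dimensions, learning rates, and the exact Mistral checkpoint id are illustrative assumptions.

```python
# Minimal sketch of the two-stage schedule described in the README diff above.
# MLPAdapter, the dimensions, learning rates, and the checkpoint id are
# illustrative assumptions, not OmniFusion's actual implementation.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # assumed checkpoint

class MLPAdapter(nn.Module):
    """Projects visual embeddings into the language model's textual embedding space."""
    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, visual_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(visual_embeds)

adapter = MLPAdapter(vision_dim=1024, text_dim=llm.config.hidden_size)

# Stage 1: freeze the LLM and train only the adapter on image-captioning data
# (LAION, CC-4M, etc.), so it learns the visual-to-textual mapping.
for p in llm.parameters():
    p.requires_grad = False
stage1_optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

# Stage 2: unfreeze Mistral and fine-tune it together with the adapter on the
# dialog / instruction mixture listed in the dataset table.
for p in llm.parameters():
    p.requires_grad = True
stage2_optimizer = torch.optim.AdamW(
    list(adapter.parameters()) + list(llm.parameters()), lr=2e-5
)
```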