1. Pre-training the adapter on Image Captioning tasks (LAION, CC-4M, etc.).
2. Once the adapter has learned to map visual embeddings to the language model's textual space, we proceed to unfreeze Mistral for improved understanding of dialog formats and complex queries.
3. The dataset consists of data in English and Russian and has the following structure:

| Task          | Dataset Source                     | #Samples |
| ------------- | ---------------------------------- | -------- |
| Caption       | ShareGPT4V                         | 100K     |
| VQA           | COCO, SAM-9K                       | 20K, 9K  |
| WebQA         | WebData                            | 1.5K     |
| OCRQA         | TextVQA, OCRVQA                    | 120K     |
| Conversation  | LLaVA-v1.5-665K, OCRVQA            | 665K     |
| DocVQA        | Proprietary data (ru)              | 20K      |
| Text-only SFT | Proprietary data (ru), Alpaca (en) | 10K      |
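The two-stage schedule in steps 1–2 can be sketched in PyTorch. This is a minimal illustration, not the repository's actual code: the vision adapter and Mistral are replaced by tiny stand-in modules, and all dimensions, learning rates, and variable names are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical embedding sizes; the real setup uses a vision encoder
# and Mistral-7B, which are far too large for a sketch.
VISION_DIM, TEXT_DIM = 32, 64

class Adapter(nn.Module):
    """Learnable projector mapping visual embeddings into the LM's textual space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(VISION_DIM, TEXT_DIM)

    def forward(self, x):
        return self.proj(x)

adapter = Adapter()
language_model = nn.Linear(TEXT_DIM, TEXT_DIM)  # stand-in for Mistral

# Stage 1: pre-train the adapter on captioning while the LM stays frozen.
for p in language_model.parameters():
    p.requires_grad = False
stage1_opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
# ... captioning pre-training loop over LAION / CC-style data goes here ...

# Stage 2: unfreeze the LM and fine-tune both modules jointly on dialog data.
for p in language_model.parameters():
    p.requires_grad = True
stage2_opt = torch.optim.AdamW(
    list(adapter.parameters()) + list(language_model.parameters()), lr=2e-5
)

visual = torch.randn(4, VISION_DIM)           # fake batch of visual embeddings
text_space = language_model(adapter(visual))  # now lives in the LM's text space
print(text_space.shape)
```

Freezing via `requires_grad` keeps stage 1 cheap (only adapter gradients are computed) while the stage-2 optimizer simply picks up the newly unfrozen LM parameters.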
### Results