Update README.md

---
license: apache-2.0
language:
- en
tags:
- multimodal
library_name: transformers
---

# Introduction

The Infinity-VL-2B model is a vision-language model (VLM) trained with the [LLaVA-OneVision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is used as the vision tower.
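
Since the model card declares `library_name: transformers`, the checkpoint is expected to be usable through the `transformers` LLaVA-OneVision classes. The sketch below is a minimal, unverified example of that flow: the repository id `Infinity-VL-2B`, the local image path, and the exact class names are assumptions that depend on how the released weights are packaged.

```python
# Minimal sketch: load the model with Hugging Face transformers and run a
# single-image prompt. "Infinity-VL-2B" is a hypothetical repo id -- replace it
# with the actual Hub path once the weights are published.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "Infinity-VL-2B"  # hypothetical repo id

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-image chat prompt with the processor's chat template.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local test image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```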

The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.

We plan to open-source the Infinity-MM dataset, training scripts, and related resources.

# Evaluation

We evaluated the model using the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) toolkit. Whenever possible, we prioritized using the GPT-4 API for test sets that support API-based evaluation.

| Test sets | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct | Infinity-VL-2B |
|:----------------:|:--------------:|:---------------:|:----------------:|:-----------------------:|:-----------------:|
| MMMU\_DEV\_VAL | 39.56 | 34.89 | 43.56 | 41.67 | **45.89** |
| MMStar | 41.6 | 50.2 | 51.87 | 47.8 | **54.4** |
| MathVista\_MINI | 39 | 45 | 47.1 | 47.9 | **57.8** |
| HallusionBench | 36.83 | 38.06 | 36.03 | 41.52 | **42.64** |
| OCRBench | 613 | 784 | 782 | **810** | 776 |
| AI2D\_TEST | 64.8 | 74.38 | 74.22 | **74.64** | 74.38 |
| MMVet | 44.04 | 41.1 | 42.66 | **50.73** | 44.27 |
| DocVQA\_TEST | 71.02 | 86.87 | 87.63 | **89.87** | 76.56 |
| ChartQA\_TEST | 59.64 | 71.4 | 57.08 | 73.52 | **76.56** |
| MMT-Bench\_ALL | 54.46 | 53.31 | **57.24** | 54.78 | 56.19 |
| MathVision | 15.43 | 12.6 | 16.32 | 17.47 | **18.52** |
| OCRVQA\_TESTCORE | 54.43 | 40.23 | 67.64 | **68.68** | 63.83 |
| Average | 52.09 | 57.79 | 60.68 | 61.96 | **62.92** |