Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
# Introduction

- The Aquila-VL-2B
+ The Aquila-VL-2B model is a vision-language model (VLM) built on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is used as the vision tower.

The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
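For reference, below is a minimal inference sketch, assuming the checkpoint is released in the Hugging Face LLaVA-OneVision format implied by `library_name: transformers`. The repository id and the image path are placeholders, not taken from this README; adapt them to the actual release.

```python
# Hedged sketch: assumes an LLaVA-OneVision-format checkpoint loadable with
# transformers >= 4.45. The repo id and image path are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B"  # placeholder repo id (assumption)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-image chat prompt with the processor's chat template.
image = Image.open("example.jpg")  # placeholder local image
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```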