Update README.md
README.md
CHANGED
@@ -10,17 +10,15 @@ library_name: transformers
 
 # Introduction
 
-The Aquila-VL-2B model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chose as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
+The **Aquila-VL-2B** model is a vision-language model (VLM) trained with the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) is used as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) serves as the vision tower.
 
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
 
-We have open-sourced Infinity-MM dataset and related resources. We hope you enjoy using them!
+We have open-sourced the [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset and related resources. We hope you enjoy using them!
 
 ## News
-- `2024/10/25`: The
-
-<!-- We plan to open-source the Infinity-MM dataset, training scripts, and related resources in the near future. For more technical details, stay tuned for our upcoming technical report. -->
+- `2024/10/25`: The [Aquila-VL-2B](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen) model and [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset are now available, along with the accompanying [technical report](https://arxiv.org/abs/2410.18558).
 
 # Evaluation
 
@@ -62,8 +60,6 @@ For comparison models, evaluations were conducted in a local environment, so the
 
 * We plan to train models of various sizes.
 * Future training will incorporate multi-image and video data.
-<!-- * A comprehensive technical report will be released. -->
-<!-- * We will open-source the Infinity-MM dataset and training code. -->
 
 
 # Disclaimer
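For reference, the architecture described in the introduction (a Qwen2.5-1.5B-instruct LLM paired with a SigLIP vision tower under the LLaVA-OneVision recipe) can be exercised with a short inference sketch. This is a minimal example that assumes the released checkpoint loads through the `transformers` LLaVA-OneVision integration (`LlavaOnevisionForConditionalGeneration`, available in recent releases); the image URL and prompt are placeholders, and the usage instructions in the model card take precedence if they differ.

```python
# Minimal inference sketch, assuming the checkpoint is compatible with the
# transformers LLaVA-OneVision classes (transformers >= 4.45). If it is not,
# follow the loading instructions in the model card instead.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # repository named in the README
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image; replace with your own input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build a chat-style prompt containing one image and one text turn.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```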