ldwang committed
Commit 2ed1c57
1 Parent(s): 2e6bc2e

Update README.md

Files changed (1)
  1. README.md +3 -7
README.md CHANGED
@@ -10,17 +10,15 @@ library_name: transformers
 
  # Introduction
 
- The Aquila-VL-2B model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
+ The **Aquila-VL-2B** model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
 
  The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
 
- We have open-sourced the Infinity-MM dataset and related resources. We hope you enjoy using them!
+ We have open-sourced the [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset and related resources. We hope you enjoy using them!
 
  ## News
- - `2024/10/25`: The **Aquila-VL-2B** model and Infinity-MM dataset are now available. We have also released the technical report simultaneously.
-
- <!-- We plan to open-source the Infinity-MM dataset, training scripts, and related resources in the near future. For more technical details, stay tuned for our upcoming technical report. -->
+ - `2024/10/25`: The [Aquila-VL-2B](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen) model and [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset are now available. We have also released the [technical report](https://arxiv.org/abs/2410.18558) simultaneously.
 
  # Evaluation
 
@@ -62,8 +60,6 @@ For comparison models, evaluations were conducted in a local environment, so the
 
  * We plan to train models of various sizes.
  * Future training will incorporate multi-image and video data.
- <!-- * A comprehensive technical report will be released. -->
- <!-- * We will open-source the Infinity-MM dataset and training code. -->
 
 
  # Disclaimer
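
For reference, below is a minimal inference sketch for the model described in the updated README. It assumes the repository is compatible with the standard `transformers` LLaVA-OneVision interface (`LlavaOnevisionForConditionalGeneration` and `AutoProcessor`); that compatibility, the chosen dtype, and the example image path are illustrative assumptions, not details taken from this commit.

```python
# Minimal sketch, assuming the repo loads via the standard transformers
# LLaVA-OneVision classes; paths and generation settings are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"

# Load the processor (tokenizer + image processor) and the model weights.
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-style prompt containing one image slot and one question.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# "example.jpg" is a placeholder image path, not a file from the repository.
image = Image.open("example.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

# Generate an answer and decode it back to text.
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

If the repository instead expects its own loading code (for example, the original LLaVA training codebase), the processor and model classes above would need to be swapped accordingly.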