Update README.md
README.md
CHANGED
@@ -10,17 +10,15 @@ library_name: transformers
 
 # Introduction
 
-The Aquila-VL-2B model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chose as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
+The **Aquila-VL-2B** model is a vision-language model (VLM) trained with the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) is used as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) serves as the vision tower.
 
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
 
-We have open-sourced Infinity-MM dataset and related resources. We hope you enjoy using them!
+We have open-sourced the [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset and related resources. We hope you enjoy using them!
 
 ## News
-- `2024/10/25`: The
-
-<!-- We plan to open-source the Infinity-MM dataset, training scripts, and related resources in the near future. For more technical details, stay tuned for our upcoming technical report. -->
+- `2024/10/25`: The [Aquila-VL-2B](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen) model and [Infinity-MM](https://huggingface.co/datasets/BAAI/Infinity-MM) dataset are now available, along with the accompanying [technical report](https://arxiv.org/abs/2410.18558).
 
 # Evaluation
 
@@ -62,8 +60,6 @@ For comparison models, evaluations were conducted in a local environment, so the
 
 * We plan to train models of various sizes.
 * Future training will incorporate multi-image and video data.
-<!-- * A comprehensive technical report will be released. -->
-<!-- * We will open-source the Infinity-MM dataset and training code. -->
 
 
 # Disclaimer
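For reference, the architecture described in the introduction (a Qwen2.5-1.5B-instruct LLM paired with a SigLIP vision tower under the LLaVA-OneVision recipe) can be exercised with a short inference sketch. This is a minimal example that assumes the released checkpoint loads through the `transformers` LLaVA-OneVision integration (`LlavaOnevisionForConditionalGeneration`, available in recent releases); the image URL and prompt are placeholders, and the usage instructions in the model card take precedence if they differ.

```python
# Minimal inference sketch, assuming the checkpoint is compatible with the
# transformers LLaVA-OneVision classes (transformers >= 4.45). If it is not,
# follow the loading instructions in the model card instead.
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # repository named in the README
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image; replace with your own input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build a chat-style prompt containing one image and one text turn.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```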