Update README.md
README.md (CHANGED)
@@ -10,9 +10,10 @@ pipeline_tag: image-text-to-text

`xGen-MM` is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. These models have been trained at scale on high-quality image caption datasets and interleaved image-text data.

In the v1.5 (08/2024) release, we present a series of XGen-MM models, including:
- [🤗 xGen-MM-instruct-interleave (our main instruct model)](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-multi-r-v1.5): `xgen-mm-phi3-mini-instruct-interleave-r-v1.5`
  - This model has higher overall scores than [xGen-MM-instruct](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5) on both single-image and multi-image benchmarks.
- [🤗 xGen-MM-base](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-base-r-v1.5): `xgen-mm-phi3-mini-base-r-v1.5`
- [🤗 xGen-MM-instruct](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5): `xgen-mm-phi3-mini-instruct-singleimg-r-v1.5`
- [🤗 xGen-MM-instruct-dpo](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-dpo-r-v1.5): `xgen-mm-phi3-mini-instruct-dpo-r-v1.5`

In addition to the models, our team also released a series of datasets for multi-modal pre-training, including:
@@ -77,7 +78,7 @@ The instruct model is fine-tuned on a mixture of around 1 million samples from m

# How to use

Please check out our [inference notebook](demo.ipynb) for example code to use our model. We also provide an example script for [batch inference](batch_inference.ipynb).
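For a quick sense of the interface, the sketch below shows how a released checkpoint can be loaded through the Hugging Face `trust_remote_code` path. The model id chosen here and the minimal set of calls are illustrative assumptions; the notebook remains the reference for prompt formatting, image preprocessing, and generation.

```python
# Minimal loading sketch (illustrative; see demo.ipynb for the full
# preprocessing and generation example).
from transformers import AutoImageProcessor, AutoModelForVision2Seq, AutoTokenizer

# Any checkpoint from the list above can be substituted here (assumed id).
model_id = "Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5"

# The modeling code ships with the checkpoint, so trust_remote_code is required.
model = AutoModelForVision2Seq.from_pretrained(model_id, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
```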
# Reproducibility:
@@ -95,7 +96,7 @@ We strongly recommend users assess safety and fairness before applying to downst

Our code and weights are released under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt) license.

# Code acknowledgment
Our training code is based on [OpenFlamingo: An open-source framework for training large multimodal models](https://github.com/mlfoundations/open_flamingo), and part of our data preprocessing code is adapted from [LLaVA](https://github.com/haotian-liu/LLaVA).
The evaluation code for the instruct models is based on [VLMEvalKit: Open-source evaluation toolkit of large vision-language models (LVLMs)](https://github.com/open-compass/VLMEvalKit).