Salesforce/xgen-mm-phi3-mini-base-r-v1

xurantju committed on May 23

Commit 78f61d3 • 1 Parent(s): 48ec795

update readme

Files changed (1)

README.md +3 -3

@@ -16,15 +16,15 @@ These models have been trained at scale on high-quality image caption datasets a
 * The **instruct** fine-tuned model, `xgen-mm-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
 * `xgen-mm-phi3-mini-instruct-r-v1` supports flexible high-resolution image encoding with efficient visual token sampling.
-More technical details will come with a technical report soon.
+The model is for research purposes, more technical details will come with a technical report soon.
 # Datasets
 | Dataset Type| Dataset(s) Used                          |
 |--------|------------------------------------------|
-| Pretrain | caption data: (datacomp, cc12m, cc3m, SBU, vg) && interleaved data: obelics |
-| Instruction Tuning    | LLaVA-Instruct-150K, ShareGPT4V captions, a mixture of academic VQA data including OCR/Document/Chart-focused tasks, publicly available text-only instruction data |
+| Pretrain | caption data: high-quality image caption datasets and interleaved datasets |
+| Instruction Tuning    | visual instruction following and caption datasets, a mixture of academic VQA data including OCR/Document/Chart-focused tasks, publicly available text-only instruction data |
 # Results