TIGER-Lab/Mantis-8B-Idefics2

Image-Text-to-Text

DongfuJiang committed on May 23

Commit 25b996a • 1 Parent(s): 81bec39

Update README.md

Files changed (1)

README.md +1 -1

@@ -29,7 +29,7 @@ It's fine-tuned on [Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/M
 ## Summary
-- Mantis-Idefics2 is LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e. 36 hours on 16xA100-40G).
+- Mantis-Idefics2 is an LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e. 36 hours on 16xA100-40G).
 - Mantis is trained to have multi-image skills including co-reference, reasoning, comparing, temporal understanding.
 - Mantis reaches the state-of-the-art performance on five multi-image benchmarks (NLVR2, Q-Bench, BLINK, MVBench, Mantis-Eval), and also maintain a strong single-image performance on par with CogVLM and Emu2.