Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ pipeline_tag: visual-question-answering
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-
+Llama-3.2V-11B-cot is an early version of [LLaVA-o1](https://github.com/PKU-YuanGroup/LLaVA-o1), which is a visual language model capable of spontaneous, systematic reasoning.
 
 ## Model Details
 
@@ -19,6 +19,12 @@ This modelcard aims to be a base template for new models. It has been generated
 - **License:** apache-2.0
 - **Finetuned from model:** meta-llama/Llama-3.2-11B-Vision-Instruct
 
+## Benchmark Results
+
+| MMStar | MMBench | MMVet | MathVista | AI2D | Hallusion | Average |
+|--------|---------|-------|-----------|------|-----------|---------|
+| 57.6   | 75.0    | 60.3  | 54.8      | 85.7 | 47.8      | 63.5    |
+
 ## Reproduction
 
 <!-- This section describes the evaluation protocols and provides the results. -->
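The card lists meta-llama/Llama-3.2-11B-Vision-Instruct as the base model, so the finetuned checkpoint should load through the same transformers Mllama interface. Below is a minimal loading sketch, assuming the checkpoint is published in a transformers-compatible format; the `model_id` shown is the base model as a placeholder and would need to be swapped for the actual Llama-3.2V-11B-cot repository id.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Placeholder: substitute the Llama-3.2V-11B-cot checkpoint id for the base model id.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load the vision-language model and its processor.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single image + text prompt using the chat template.
image = Image.open("example.jpg")  # any local image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image step by step."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

# Generate and decode the model's response.
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```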