fromplutowithlove committed
Commit: 996b6a9
Parent(s): ee47981

Fix spelling error in README
README.md CHANGED
@@ -59,7 +59,7 @@ Below are the comparison results on existing multi-image benchmarks. On average,
 
 **BLINK**: a benchmark with 14 visual tasks that humans can solve very quickly but are still hard for current multimodal LLMs.
 
-| Benchmark | Phi-3.5-vision-
+| Benchmark | Phi-3.5-vision-instruct | LlaVA-Interleave-Qwen-7B | InternVL-2-4B | InternVL-2-8B | Gemini-1.5-Flash | GPT-4o-mini | Claude-3.5-Sonnet | Gemini-1.5-Pro | GPT-4o |
 |--|--|--|--|--|--|--|--|--|--|
 | Art Style | 87.2 | 62.4 | 55.6 | 52.1 | 64.1 | 70.1 | 59.8 | 70.9 | 73.3 |
 | Counting | 54.2 | 56.7 | 54.2 | 66.7 | 51.7 | 55.0 | 59.2 | 65.0 | 65.0 |
@@ -79,7 +79,7 @@ Below are the comparison results on existing multi-image benchmarks. On average,
 
 **Video-MME**: comprehensively assess the capabilities of MLLMs in processing video data, covering a wide range of visual domains, temporal durations, and data modalities.
 
-| Benchmark | Phi-3.5-vision-
+| Benchmark | Phi-3.5-vision-instruct | LlaVA-Interleave-Qwen-7B | InternVL-2-4B | InternVL-2-8B | Gemini-1.5-Flash | GPT-4o-mini | Claude-3.5-Sonnet | Gemini-1.5-Pro | GPT-4o |
 |--|--|--|--|--|--|--|--|--|--|
 | short (<2min) | 60.8 | 62.3 | 60.7 | 61.7 | 72.2 | 70.1 | 66.3 | 73.3 | 77.7 |
 | medium (4-15min) | 47.7 | 47.1 | 46.4 | 49.6 | 62.7 | 59.6 | 54.7 | 61.2 | 68.0 |