[email protected] committed
Commit 732e074 • 1 Parent(s): 9cf8c9e

Add text only results

README.md CHANGED
@@ -34,6 +34,8 @@ We provide the results from both the Huggingface codebase and the Megatron codeb
 
 Results (as of September 17th, 2024) in the multimodal benchmarks are as follows:
 
+### Vision-language Benchmarks
+
 | Benchmark | MMMU (val / test) | MathVista | OCRBench | AI2D | ChartQA | DocVQA | TextVQA | RealWorldQA | VQAv2 |
 |------------------------------|-------------------|-----------|----------|------|---------|--------|---------|-------------|-------|
 | NVLM-D 1.0 72B (Huggingface) | 58.7 / 54.9 | 65.2 | 852 | 94.2 | 86.0 | 92.6 | 82.6 | 69.5 | 85.4 |
@@ -47,6 +49,28 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 | Claude 3.5 Sonnet | 68.3 / - | 67.7 | 788 | 94.7 | 90.8 | 95.2 | - | - | - |
 | Gemini 1.5 Pro (Aug 2024) | 62.2 / - | 63.9 | 754 | 94.4 | 87.2 | 93.1 | 78.7 | 70.4 | 80.2 |
 
+### Text-only Benchmarks
+
+| Tasks | Backbone LLM | MMLU | GSM8K | MATH | HumanEval | Avg. Accuracy |
+|------------------------------|--------------|------|-------|------|-----------|------------------|
+| **Proprietary** | | | | | | |
+| GPT-4.0 | N/A | 88.7 | - | 76.6 | 90.2 | - |
+| Gemini Pro 1.5 (Aug 2024) | N/A | 85.9 | 90.8 | 67.7 | 84.1 | 82.1 |
+| Claude 3.5 Sonnet | N/A | 88.7 | 96.4 | 71.1 | 92.0 | 87.0 |
+| **Open LLM** | | | | | | |
+| (a) Nous-Hermes-2-Yi-34B | N/A | 75.5 | 78.6 | 21.8 | 43.3 | 54.8 |
+| (b) Qwen-72B-Instruct | N/A | 82.3 | 91.1 | 59.7 | 86.0 | 79.8 |
+| (c) Llama-3-70B-Instruct | N/A | 82.0 | 93.0 | 51.0 | 81.7 | 76.6 |
+| (d) Llama-3.1-70B-Instruct | N/A | 83.6 | 95.1 | 68.0 | 80.5 | 81.8 |
+| (e) Llama-3.1-405B-Instruct | N/A | 87.3 | 96.8 | 73.8 | 89.0 | 86.7 |
+| **Open Multimodal LLM** | | | | | | |
+| VILA-1.5 40B | (a) | 73.3 | 67.5 | 16.8 | 34.1 | 🥶 47.9 (-6.9) |
+| LLaVA-OneVision 72B | (b) | 80.6 | 89.9 | 49.2 | 74.4 | 🥶 73.5 (-6.3) |
+| InternVL-2-Llama3-76B | (c) | 78.5 | 87.1 | 42.5 | 71.3 | 🥶 69.9 (-6.7) |
+| *Llama 3-V 70B | (d) | 83.6 | 95.1 | 68.0 | 80.5 | 🙂 81.8 (0) |
+| *Llama 3-V 405B | (e) | 87.3 | 96.8 | 73.8 | 89.0 | 🙂 86.7 (0) |
+| NVLM-D 1.0 72B (Megatron) | (b) | 82.0 | 92.9 | 73.1 | 88.4 | 🥳 84.1 (+4.3) |
+| NVLM-D 1.0 72B (Huggingface) | (b) | 81.7 | 93.2 | 73.1 | 89.0 | 🥳 84.3 (+4.5) |
 
 
 ## How to use
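The Avg. Accuracy column in the added text-only table appears to be the unweighted mean of the four task scores, with the parenthesized delta taken against the backbone LLM's own average: for NVLM-D 1.0 72B (Huggingface), (81.7 + 93.2 + 73.1 + 89.0) / 4 = 84.25 ≈ 84.3, versus 79.8 for its Qwen-72B-Instruct backbone, giving the listed +4.5. A minimal Python sketch of that bookkeeping (scores are copied from the table; the averaging rule itself is an inference consistent with the numbers, not stated in the diff):

```python
# Sketch: reproduce the "Avg. Accuracy" column and the (+/-) deltas from the
# text-only table, assuming Avg. Accuracy is the unweighted mean of MMLU,
# GSM8K, MATH, and HumanEval, and the delta is measured against the backbone
# LLM's own average. Both assumptions match the numbers shown in the diff.

# Scores in (MMLU, GSM8K, MATH, HumanEval) order, copied from the table.
QWEN_72B_INSTRUCT = (82.3, 91.1, 59.7, 86.0)    # backbone (b), avg 79.8
NVLM_D_72B_HF = (81.7, 93.2, 73.1, 89.0)        # NVLM-D 1.0 72B (Huggingface)
NVLM_D_72B_MEGATRON = (82.0, 92.9, 73.1, 88.4)  # NVLM-D 1.0 72B (Megatron)

def avg(scores):
    """Unweighted mean over the four text-only tasks."""
    return sum(scores) / len(scores)

for name, scores in [("Huggingface", NVLM_D_72B_HF),
                     ("Megatron", NVLM_D_72B_MEGATRON)]:
    delta = avg(scores) - avg(QWEN_72B_INSTRUCT)
    # Printed to two decimals; the table rounds to one (e.g. 84.25 -> 84.3).
    print(f"NVLM-D 1.0 72B ({name}): avg {avg(scores):.2f} "
          f"({delta:+.2f} vs. backbone)")
```

The same rule reproduces the degradation figures for the other open multimodal LLMs, e.g. VILA-1.5 40B: (73.3 + 67.5 + 16.8 + 34.1) / 4 = 47.9, which is 6.9 points below backbone (a)'s 54.8.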