[email protected] committed
Commit 732e074 • 1 Parent(s): 9cf8c9e

Add text only results

README.md CHANGED
@@ -34,6 +34,8 @@ We provide the results from both the Huggingface codebase and the Megatron codeb
 
 Results (as of September 17th, 2024) in the multimodal benchmarks are as follows:
 
+### Vision-language Benchmarks
+
 | Benchmark | MMMU (val / test) | MathVista | OCRBench | AI2D | ChartQA | DocVQA | TextVQA | RealWorldQA | VQAv2 |
 |------------------------------|-------------------|-----------|----------|------|---------|--------|---------|-------------|-------|
 | NVLM-D 1.0 72B (Huggingface) | 58.7 / 54.9 | 65.2 | 852 | 94.2 | 86.0 | 92.6 | 82.6 | 69.5 | 85.4 |
@@ -47,6 +49,28 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 | Claude 3.5 Sonnet | 68.3 / - | 67.7 | 788 | 94.7 | 90.8 | 95.2 | - | - | - |
 | Gemini 1.5 Pro (Aug 2024) | 62.2 / - | 63.9 | 754 | 94.4 | 87.2 | 93.1 | 78.7 | 70.4 | 80.2 |
 
+### Text-only Benchmarks
+
+| Tasks | Backbone LLM | MMLU | GSM8K | MATH | HumanEval | Avg. Accuracy |
+|------------------------------|--------------|------|-------|------|-----------|------------------|
+| **Proprietary** | | | | | | |
+| GPT-4.0 | N/A | 88.7 | - | 76.6 | 90.2 | - |
+| Gemini Pro 1.5 (Aug 2024) | N/A | 85.9 | 90.8 | 67.7 | 84.1 | 82.1 |
+| Claude 3.5 Sonnet | N/A | 88.7 | 96.4 | 71.1 | 92.0 | 87.0 |
+| **Open LLM** | | | | | | |
+| (a) Nous-Hermes-2-Yi-34B | N/A | 75.5 | 78.6 | 21.8 | 43.3 | 54.8 |
+| (b) Qwen-72B-Instruct | N/A | 82.3 | 91.1 | 59.7 | 86.0 | 79.8 |
+| (c) Llama-3-70B-Instruct | N/A | 82.0 | 93.0 | 51.0 | 81.7 | 76.6 |
+| (d) Llama-3.1-70B-Instruct | N/A | 83.6 | 95.1 | 68.0 | 80.5 | 81.8 |
+| (e) Llama-3.1-405B-Instruct | N/A | 87.3 | 96.8 | 73.8 | 89.0 | 86.7 |
+| **Open Multimodal LLM** | | | | | | |
+| VILA-1.5 40B | (a) | 73.3 | 67.5 | 16.8 | 34.1 | 🥶 47.9 (-6.9) |
+| LLaVA-OneVision 72B | (b) | 80.6 | 89.9 | 49.2 | 74.4 | 🥶 73.5 (-6.3) |
+| InternVL-2-Llama3-76B | (c) | 78.5 | 87.1 | 42.5 | 71.3 | 🥶 69.9 (-6.7) |
+| *Llama 3-V 70B | (d) | 83.6 | 95.1 | 68.0 | 80.5 | 🙂 81.8 (0) |
+| *Llama 3-V 405B | (e) | 87.3 | 96.8 | 73.8 | 89.0 | 🙂 86.7 (0) |
+| NVLM-D 1.0 72B (Megatron) | (b) | 82.0 | 92.9 | 73.1 | 88.4 | 🥳 84.1 (+4.3) |
+| NVLM-D 1.0 72B (Huggingface) | (b) | 81.7 | 93.2 | 73.1 | 89.0 | 🥳 84.3 (+4.5) |
 
 
 ## How to use
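The Avg. Accuracy column in the added text-only table appears to be the unweighted mean of the four task scores, with the parenthesized delta taken against the backbone LLM's own average: for NVLM-D 1.0 72B (Huggingface), (81.7 + 93.2 + 73.1 + 89.0) / 4 = 84.25 ≈ 84.3, versus 79.8 for its Qwen-72B-Instruct backbone, giving the listed +4.5. A minimal Python sketch of that bookkeeping (scores are copied from the table; the averaging rule itself is an inference consistent with the numbers, not stated in the diff):

```python
# Sketch: reproduce the "Avg. Accuracy" column and the (+/-) deltas from the
# text-only table, assuming Avg. Accuracy is the unweighted mean of MMLU,
# GSM8K, MATH, and HumanEval, and the delta is measured against the backbone
# LLM's own average. Both assumptions match the numbers shown in the diff.

# Scores in (MMLU, GSM8K, MATH, HumanEval) order, copied from the table.
QWEN_72B_INSTRUCT = (82.3, 91.1, 59.7, 86.0)    # backbone (b), avg 79.8
NVLM_D_72B_HF = (81.7, 93.2, 73.1, 89.0)        # NVLM-D 1.0 72B (Huggingface)
NVLM_D_72B_MEGATRON = (82.0, 92.9, 73.1, 88.4)  # NVLM-D 1.0 72B (Megatron)

def avg(scores):
    """Unweighted mean over the four text-only tasks."""
    return sum(scores) / len(scores)

for name, scores in [("Huggingface", NVLM_D_72B_HF),
                     ("Megatron", NVLM_D_72B_MEGATRON)]:
    delta = avg(scores) - avg(QWEN_72B_INSTRUCT)
    # Printed to two decimals; the table rounds to one (e.g. 84.25 -> 84.3).
    print(f"NVLM-D 1.0 72B ({name}): avg {avg(scores):.2f} "
          f"({delta:+.2f} vs. backbone)")
```

The same rule reproduces the degradation figures for the other open multimodal LLMs, e.g. VILA-1.5 40B: (73.3 + 67.5 + 16.8 + 34.1) / 4 = 47.9, which is 6.9 points below backbone (a)'s 54.8.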