add results table
#1
by benlipkin - opened
README.md
CHANGED
@@ -37,6 +37,21 @@ NuminaMath 7B CoT is the model from Stage 1 and was fine-tuned on [AI-MO/NuminaM
- **License:** Apache 2.0
- **Finetuned from model:** [deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base)

+## Model performance
+
+| | | NuminaMath-7B-CoT | NuminaMath-7B-TIR | Qwen2-7B-Instruct | Llama3-8B-Instruct | DeepSeekMath-7B-Instruct | DeepSeekMath-7B-RL | DART-Math-7B-CoT |
+| --- | --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| **GSM8k** | 0-shot | 76.3% | 84.6% | 82.3% | 79.6% | 82.8% | **88.2%** | 86.6% |
+| Grade school math |
+| **MATH** | 0-shot | 55.8% | **68.1%** | 49.6% | 30.0% | 46.8% | 51.7% | 53.6% |
+| Math problem-solving |
+| **AMC 2023** | 0-shot | 11/40 | **20/40** | 10/40 | 2/40 | 7/40 | 9/40 | 11/40 |
+| Competition-level math | maj@64 | 18/40 | **31/40** | 13/40 | 9/40 | 13/40 | 14/40 | 16/40 |
+| **AIME 2024** | 0-shot | 0/30 | **5/30** | 1/30 | 0/30 | 1/30 | 1/30 | 1/30 |
+| Competition-level math | maj@64 | 1/30 | **10/30** | 4/30 | 2/30 | 1/30 | 1/30 | 1/30 |
+
+*Table: Comparison of various 7B and 8B parameter language models on different math benchmarks. All scores except those for NuminaMath-7B-TIR are reported without tool-integrated reasoning.*
+
### Model Sources

<!-- Provide the basic links for the model. -->
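For context on the evaluation settings in the added table: "0-shot" rows report a single completion per problem, while "maj@64" rows report majority voting over 64 sampled completions. The snippet below is only a minimal sketch of how a maj@k score is typically computed; the function names are hypothetical and this is not the harness used to produce the numbers in this PR.

```python
from collections import Counter

def majority_vote(answers):
    """Most common parsed answer among the k samples for one problem (ties go to the first seen)."""
    valid = [a for a in answers if a is not None]
    return Counter(valid).most_common(1)[0][0] if valid else None

def maj_at_k_accuracy(sampled_answers, references):
    """Fraction of problems whose majority-vote answer matches the reference.

    sampled_answers: list of per-problem lists of k parsed answers (k = 64 for maj@64).
    references: list of gold answers, one per problem.
    """
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(sampled_answers, references)
    )
    return correct / len(references)

# Toy usage: 3 problems, 4 samples each (the table uses k = 64).
preds = [["72", "72", "68", "72"], ["1/3", "1/4", "1/3", "1/3"], ["5", None, "7", "7"]]
golds = ["72", "1/3", "5"]
print(maj_at_k_accuracy(preds, golds))  # 2/3 ≈ 0.667
```

Only the answer-extraction step differs across benchmarks; the voting logic itself stays the same.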