benlipkin committed on
Commit 0d7a595
1 Parent(s): 8a0609f

add results table

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -109,6 +109,21 @@ This model is a fine-tuned version of [deepseek-ai/deepseek-math-7b-base](https:
  - **License:** Apache 2.0
  - **Finetuned from model:** [deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base)
 
+ ## Model performance
+
+ | | | NuminaMath-7B-CoT | NuminaMath-7B-TIR | Qwen2-7B-Instruct | Llama3-8B-Instruct | DeepSeekMath-7B-Instruct | DeepSeekMath-7B-RL | DART-Math-7B-CoT |
+ | --- | --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | **GSM8k** | 0-shot | 76.3% | 84.6% | 82.3% | 79.6% | 82.8% | **88.2%** | 86.6% |
+ | Grade school math |
+ | **MATH** | 0-shot | 55.8% | **68.1%** | 49.6% | 30.0% | 46.8% | 51.7% | 53.6% |
+ | Math problem-solving |
+ | **AMC 2023** | 0-shot | 11/40 | **20/40** | 10/40 | 2/40 | 7/40 | 9/40 | 11/40 |
+ | Competition-level math | maj@64 | 18/40 | **31/40** | 13/40 | 9/40 | 13/40 | 14/40 | 16/40 |
+ | **AIME 2024** | 0-shot | 0/30 | **5/30** | 1/30 | 0/30 | 1/30 | 1/30 | 1/30 |
+ | Competition-level math | maj@64 | 1/30 | **10/30** | 4/30 | 2/30 | 1/30 | 1/30 | 1/30 |
+
+ *Table: Comparison of various 7B and 8B parameter language models on different math benchmarks. All scores except those for NuminaMath-7B-TIR are reported without tool-integrated reasoning.*
+
  ### Model Sources
 
  <!-- Provide the basic links for the model. -->
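
The `maj@64` rows in the added table are not defined anywhere in this commit. They presumably report majority voting over 64 sampled answers per problem, and the following is a minimal sketch under that assumption. The helper names `majority_vote` and `maj_at_k_score` are hypothetical, and the sketch assumes answers have already been extracted and normalized to comparable strings; it is not the evaluation code behind the reported numbers.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled answers.

    Ties resolve to the answer that appeared first, since Counter
    preserves first-seen order among equal counts.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_score(samples_per_problem, references):
    """Count problems whose majority-voted answer matches the reference.

    samples_per_problem: list of lists, k sampled answers per problem
    (assumed to be pre-normalized strings).
    references: list of reference answers, one per problem.
    Returns (num_correct, num_problems), matching the table's x/40 style.
    """
    if len(samples_per_problem) != len(references):
        raise ValueError("one list of samples is required per reference")
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct, len(references)
```

With 64 samples per problem and 40 problems, a return value of `(18, 40)` would correspond to a table entry written as 18/40.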