Update README.md
yuexiang96 committed
Commit b188bd1 • 1 Parent(s): a177faf
README.md CHANGED
@@ -37,17 +37,26 @@ The models are fine-tuned with the MathInstruct dataset using the original Llama
 The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
 
 
-| Model
-
-|
-|
-|
-| MAmmoTH
-|
-|
-| MAmmoTH |
-
-| | | | | | | | | | | | | | | | | |
+| **Model**             | **Decoding** | **GSM**  | **MATH** | **AQuA** | **NumG** | **SVA**  | **Mat**  | **Sim**  | **SAT**  | **MMLU** | **AVG**  |
+|-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
+| **MAmmoTH-7B**        | CoT          | 50.5     | 10.4     | 43.7     | 44.0     | 47.3     | 9.2      | 18.9     | 32.7     | 39.9     | 33.0     |
+|                       | PoT          | 51.6     | 28.7     | 43.3     | 52.3     | 65.1     | 41.9     | 48.2     | 39.1     | 44.6     | 46.1     |
+|                       | **Hybrid**   | **53.6** | **31.5** | **44.5** | **61.2** | **67.7** | **46.3** | **41.2** | **42.7** | **42.6** | **47.9** |
+| **MAmmoTH-Coder-7B**  | CoT          | 22.4     | 7.9      | 36.2     | 36.0     | 37.0     | 8.2      | 7.2      | 32.7     | 34.6     | 24.7     |
+|                       | PoT          | 58.8     | 32.1     | 47.2     | 57.1     | 71.1     | 53.9     | 44.6     | 40.0     | 47.8     | 50.3     |
+|                       | **Hybrid**   | **59.4** | **33.4** | **47.2** | **66.4** | **71.4** | **55.4** | **45.9** | **40.5** | **48.3** | **52.0** |
+| **MAmmoTH-13B**       | CoT          | 56.3     | 12.9     | 45.3     | 45.6     | 53.8     | 11.7     | 22.4     | 43.6     | 42.3     | 37.1     |
+|                       | PoT          | 61.3     | 32.6     | 48.8     | 59.6     | 72.2     | 48.5     | 40.3     | 46.8     | 45.4     | 50.6     |
+|                       | **Hybrid**   | **62.0** | **34.2** | **51.6** | **68.7** | **72.4** | **49.2** | **43.2** | **46.8** | **47.6** | **52.9** |
+| **MAmmoTH-Coder-13B** | CoT          | 32.1     | 10.2     | 40.6     | 36.2     | 43.0     | 9.6      | 10.1     | 40.9     | 36.6     | 28.8     |
+|                       | PoT          | 64.3     | 35.2     | 46.8     | 54.2     | 73.2     | 60.0     | 44.2     | 48.2     | 48.2     | 52.7     |
+|                       | **Hybrid**   | **64.7** | **36.3** | **46.9** | **66.8** | **73.7** | **61.5** | **47.1** | **48.6** | **48.3** | **54.9** |
+| **MAmmoTH-Coder-33B** | CoT          | 34.3     | 11.6     | 39.0     | 36.2     | 44.6     | 10.8     | 10.9     | 46.4     | 42.9     | 30.7     |
+|                       | PoT          | 72.3     | 42.8     | 53.8     | 59.6     | 84.0     | 64.7     | 50.6     | 58.6     | 52.7     | 59.9     |
+|                       | **Hybrid**   | **72.7** | **43.6** | **54.7** | **71.6** | **84.3** | **65.4** | **51.8** | **60.9** | **53.8** | **62.1** |
+| **MAmmoTH-70B**       | CoT          | 72.4     | 21.1     | 57.9     | 58.9     | 71.6     | 20.0     | 31.9     | 57.3     | 52.1     | 49.2     |
+|                       | PoT          | 76.7     | 40.1     | 60.2     | 64.3     | 81.7     | 55.3     | 45.3     | 64.1     | 53.5     | 60.1     |
+|                       | **Hybrid**   | **76.9** | **41.8** | **65.0** | **74.4** | **82.4** | **55.6** | **51.4** | **66.4** | **56.7** | **63.4** |
 
 ## Usage
 You can use the models through Huggingface's Transformers library. Use the pipeline function to create a text-generation pipeline with the model of your choice, then feed in a math problem to get the solution.
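
The usage described in the README text above might look roughly like the following sketch. It is not taken from the README itself: the helper name `solve_math_problem`, the default model id `TIGER-Lab/MAmmoTH-7B`, and the `max_new_tokens` value are assumptions made for illustration, and the prompt format the models expect is not shown in this hunk, so the problem text is passed through unchanged.

```python
from typing import Callable, Optional


def solve_math_problem(
    problem: str,
    generator: Optional[Callable] = None,
    model_id: str = "TIGER-Lab/MAmmoTH-7B",  # assumed Hub id, not stated in this diff
    max_new_tokens: int = 256,               # illustrative value
) -> str:
    """Run a math problem through a text-generation pipeline and return the text.

    `generator` can be any callable with the same call shape as a Transformers
    text-generation pipeline; if omitted, one is created for `model_id`.
    """
    if generator is None:
        # Imported lazily so the helper can be exercised without the library
        # installed or the model weights downloaded.
        from transformers import pipeline

        generator = pipeline("text-generation", model=model_id)

    # A text-generation pipeline returns a list of dicts, one per generated
    # sequence, each holding the output under the "generated_text" key.
    outputs = generator(problem, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

Calling `solve_math_problem("What is 15% of 240?")` would download the assumed checkpoint on first use. Passing an existing pipeline as `generator` avoids reloading the model for every problem.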