TIGER-Lab/MAmmoTH-7B

@@ -37,17 +37,26 @@ The models are fine-tuned with the MathInstruct dataset using the original Llama
 The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
-| Model         	| Size 	| Base       	| GSM8K 	| MATH 	| AQuA 	| NumGLUE 	| IID Avg    	| SVAMP 	| Mathematics 	| SimulEq 	| SAT-Math 	| MMLU-Math 	| OOD Avg    	|
-|-------------------|-------|---------------|-----------|-------|-------|-----------|---------------|-----------|---------------|-----------|-----------|---------------|---------------|
-|               	|      	|            	|       	|      	|      	|         	|            	|       	|             	|         	|          	|           	|            	|
-| MAmmoTH       	| 7B   	| Llama-2    	| 51.7  	| 31.2 	| 42.9 	| 53.1    	| 44.7       	| 66.7  	| 44.8        	| 42      	| 36.4     	| 38.6      	| 45.7       	|
-| MAmmoTH-Coder 	| 7B   	| Code-Llama 	| 58.8  	| 35.2 	| 43   	| 57.1    	| 48.5       	| 71.1  	| 53.9        	| 44.6    	| 40       	| 40.5      	| 50.2       	|
-| MAmmoTH       	| 13B  	| Llama-2    	| 61.7  	| 36   	| 44.8 	| 59.6    	| 50.5       	| 72.4  	| 48.7        	| 40.5    	| 42.7     	| 45.3      	| 49.9       	|
-| MAmmoTH-Coder 	| 13B  	| Code-Llama 	| 64.3  	| 38.6 	| 46.1 	| 54.2    	| 50.8       	| 73.2  	| 60          	| 44.1    	| 40.9     	| 45.2      	| 52.6       	|
-| MAmmoTH-Coder 	| 34B  	| Code-Llama 	| 72.3  	| 46.8 	| 50.8 	| 59.6    	| 57.3       	| 84    	| 64.7        	| 50.6    	| 51.8     	| 50.2      	| 60.3       	|
-| MAmmoTH       	| 70B  	| Llama-2    	| 76.7  	| 44.2 	| 61.4 	| 64.3    	| 61.7       	| 81.7  	| 55.3        	| 45.3    	| 58.6     	| 52.3      	| 58.6       	|
+| **Model**             	| **Decoding** 	| **GSM8K** 	| **MATH** 	| **AQuA** 	| **NumGLUE** 	| **SVAMP** 	| **Mathematics** 	| **SimulEq** 	| **SAT-Math** 	| **MMLU-Math** 	| **AVG**  	|
+|-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
+| **MAmmoTH-7B**        	| CoT          	| 50.5     	| 10.4     	| 43.7     	| 44.0     	| 47.3     	| 9.2      	| 18.9     	| 32.7     	| 39.9     	| 33.0     	|
+|                       	| PoT          	| 51.6     	| 28.7     	| 43.3     	| 52.3     	| 65.1     	| 41.9     	| 48.2     	| 39.1     	| 44.6     	| 46.1     	|
+|                       	| **Hybrid**   	| **53.6** 	| **31.5** 	| **44.5** 	| **61.2** 	| **67.7** 	| **46.3** 	| **41.2** 	| **42.7** 	| **42.6** 	| **47.9** 	|
+| **MAmmoTH-Coder-7B**  	| CoT          	| 22.4     	| 7.9      	| 36.2     	| 36.0     	| 37.0     	| 8.2      	| 7.2      	| 32.7     	| 34.6     	| 24.7     	|
+|                       	| PoT          	| 58.8     	| 32.1     	| 47.2     	| 57.1     	| 71.1     	| 53.9     	| 44.6     	| 40.0     	| 47.8     	| 50.3     	|
+|                       	| **Hybrid**   	| **59.4** 	| **33.4** 	| **47.2** 	| **66.4** 	| **71.4** 	| **55.4** 	| **45.9** 	| **40.5** 	| **48.3** 	| **52.0** 	|
+| **MAmmoTH-13B**       	| CoT          	| 56.3     	| 12.9     	| 45.3     	| 45.6     	| 53.8     	| 11.7     	| 22.4     	| 43.6     	| 42.3     	| 37.1     	|
+|                       	| PoT          	| 61.3     	| 32.6     	| 48.8     	| 59.6     	| 72.2     	| 48.5     	| 40.3     	| 46.8     	| 45.4     	| 50.6     	|
+|                       	| **Hybrid**   	| **62.0** 	| **34.2** 	| **51.6** 	| **68.7** 	| **72.4** 	| **49.2** 	| **43.2** 	| **46.8** 	| **47.6** 	| **52.9** 	|
+| **MAmmoTH-Coder-13B** 	| CoT          	| 32.1     	| 10.2     	| 40.6     	| 36.2     	| 43.0     	| 9.6      	| 10.1     	| 40.9     	| 36.6     	| 28.8     	|
+|                       	| PoT          	| 64.3     	| 35.2     	| 46.8     	| 54.2     	| 73.2     	| 60.0     	| 44.2     	| 48.2     	| 48.2     	| 52.7     	|
+|                       	| **Hybrid**   	| **64.7** 	| **36.3** 	| **46.9** 	| **66.8** 	| **73.7** 	| **61.5** 	| **47.1** 	| **48.6** 	| **48.3** 	| **54.9** 	|
+| **MAmmoTH-Coder-34B** 	| CoT          	| 34.3     	| 11.6     	| 39.0     	| 36.2     	| 44.6     	| 10.8     	| 10.9     	| 46.4     	| 42.9     	| 30.7     	|
+|                       	| PoT          	| 72.3     	| 42.8     	| 53.8     	| 59.6     	| 84.0     	| 64.7     	| 50.6     	| 58.6     	| 52.7     	| 59.9     	|
+|                       	| **Hybrid**   	| **72.7** 	| **43.6** 	| **54.7** 	| **71.6** 	| **84.3** 	| **65.4** 	| **51.8** 	| **60.9** 	| **53.8** 	| **62.1** 	|
+| **MAmmoTH-70B**       	| CoT          	| 72.4     	| 21.1     	| 57.9     	| 58.9     	| 71.6     	| 20.0     	| 31.9     	| 57.3     	| 52.1     	| 49.2     	|
+|                       	| PoT          	| 76.7     	| 40.1     	| 60.2     	| 64.3     	| 81.7     	| 55.3     	| 45.3     	| 64.1     	| 53.5     	| 60.1     	|
+|                       	| **Hybrid**   	| **76.9** 	| **41.8** 	| **65.0** 	| **74.4** 	| **82.4** 	| **55.6** 	| **51.4** 	| **66.4** 	| **56.7** 	| **63.4** 	|
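
The AVG column appears to be the unweighted mean of the nine per-dataset scores, rounded to one decimal. The sketch below checks that assumption against the three MAmmoTH-7B rows. The dataset names are taken from the earlier version of the table, and the helper name `row_average` is hypothetical.

```python
from statistics import mean

# Dataset order, using the full names from the earlier table.
DATASETS = [
    "GSM8K", "MATH", "AQuA", "NumGLUE", "SVAMP",
    "Mathematics", "SimulEq", "SAT-Math", "MMLU-Math",
]

# Per-dataset scores for MAmmoTH-7B, copied from the table.
MAMMOTH_7B = {
    "CoT":    [50.5, 10.4, 43.7, 44.0, 47.3, 9.2, 18.9, 32.7, 39.9],
    "PoT":    [51.6, 28.7, 43.3, 52.3, 65.1, 41.9, 48.2, 39.1, 44.6],
    "Hybrid": [53.6, 31.5, 44.5, 61.2, 67.7, 46.3, 41.2, 42.7, 42.6],
}

# AVG values as reported in the table.
REPORTED_AVG = {"CoT": 33.0, "PoT": 46.1, "Hybrid": 47.9}


def row_average(scores):
    """Unweighted mean of one row, rounded to one decimal (assumed convention)."""
    return round(mean(scores), 1)


for decoding, scores in MAMMOTH_7B.items():
    assert len(scores) == len(DATASETS)
    print(decoding, row_average(scores), REPORTED_AVG[decoding])
```

For these three rows the recomputed means match the reported AVG values, which supports reading AVG as a plain average over all nine datasets.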
 ## Usage
 You can use the models through Hugging Face's Transformers library. Create a text-generation pipeline with the `pipeline` function and the model of your choice, then pass in a math problem to get the solution.
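
A minimal sketch of that workflow follows, assuming the `transformers` package is installed and there is enough memory to load a 7B model. The Alpaca-style prompt template and the helper names `build_prompt` and `solve` are assumptions for illustration, not confirmed by this section, so check the model card for the exact prompt the models expect.

```python
# Assumption: an Alpaca-style instruction template. Verify against the model card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Response:"
)


def build_prompt(question: str) -> str:
    """Wrap a math problem in the (assumed) instruction template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def solve(question: str, model_id: str = "TIGER-Lab/MAmmoTH-7B",
          max_new_tokens: int = 256) -> str:
    """Generate a solution with a Transformers text-generation pipeline."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        build_prompt(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use):
# print(solve("A train travels 60 km in 1.5 hours. What is its average speed?"))
```

Swapping `model_id` for another checkpoint in the table should work the same way, provided the checkpoint is published under that name.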