Update README.md

README.md (changed)

````diff
@@ -73,7 +73,9 @@ GPT-4All Benchmark Set
 |piqa       | 0|acc     |0.7922|± |0.0095|
 |           |  |acc_norm|0.8112|± |0.0091|
 |winogrande | 0|acc     |0.7293|± |0.0125|
-
+Average: 0.7036
+```
+
 AGI-Eval
 ```
 | Task |Version| Metric |Value | |Stderr|
@@ -94,6 +96,7 @@ AGI-Eval
 |                 |  |acc_norm|0.4029|± |0.0343|
 |agieval_sat_math | 0|acc     |0.3273|± |0.0317|
 |                 |  |acc_norm|0.2636|± |0.0298|
+Average: 0.3556
 ```
 BigBench Reasoning Test
 ```
@@ -118,6 +121,7 @@ BigBench Reasoning Test
 |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
 |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
 |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
+Average: 36.75
 ```
 
 This is a slight improvement on the GPT4ALL Suite and BigBench Suite, with a degradation in AGIEval compared to the original Hermes.
````
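The `Average:` lines added in this commit appear to be unweighted means of each task's top-line score in the suite (the GPT4All and AGI-Eval figures are on a 0–1 scale, while the BigBench figure looks like it is reported on a 0–100 scale). A minimal sketch of that aggregation, using only the handful of task scores visible in the hunks above (an illustrative subset, not the full suites):

```python
# Sketch: averaging per-task scores into a suite-level score, as in the
# "Average:" lines added by this commit. The task names and values below
# are the illustrative subset visible in the diff, not full suite results.
def suite_average(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean of per-task top-line scores."""
    return sum(scores.values()) / len(scores)

bigbench_subset = {
    "bigbench_tracking_shuffled_objects_five_objects": 0.2048,
    "bigbench_tracking_shuffled_objects_seven_objects": 0.1297,
    "bigbench_tracking_shuffled_objects_three_objects": 0.4500,
}

print(round(suite_average(bigbench_subset), 4))
```

The full-suite averages in the README (0.7036, 0.3556, 36.75) are computed the same way over every task in each suite, most of which are elided from the hunks shown here.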