teknium committed on
Commit
bcad6ff
1 Parent(s): 8412b96

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -73,7 +73,9 @@ GPT-4All Benchmark Set
 |piqa | 0|acc |0.7922|± |0.0095|
 | | |acc_norm|0.8112|± |0.0091|
 |winogrande | 0|acc |0.7293|± |0.0125|
-```
+Average: 0.7036
+```
+
 AGI-Eval
 ```
 | Task |Version| Metric |Value | |Stderr|
@@ -94,6 +96,7 @@ AGI-Eval
 | | |acc_norm|0.4029|± |0.0343|
 |agieval_sat_math | 0|acc |0.3273|± |0.0317|
 | | |acc_norm|0.2636|± |0.0298|
+Average: 0.3556
 ```
 BigBench Reasoning Test
 ```
@@ -118,6 +121,7 @@ BigBench Reasoning Test
 |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
 |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
 |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
+Average: 36.75
 ```
 
 This is a slight improvement on GPT4ALL Suite and BigBench Suite, with a degredation in AGIEval compared to the original hermes.
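
The commit adds an "Average" line to each results table but does not say how it was computed. The sketch below shows one plausible way to derive such a figure from pipe-delimited rows like the ones in the diff. It assumes that `acc_norm` is used where a task reports it and `acc` otherwise, which is a guess, not something the README confirms. The names `parse_rows`, `average_score`, and `SAMPLE` are hypothetical, and `SAMPLE` holds only the three rows visible in the first hunk, so its result is not expected to match the committed 0.7036.

```python
# Hypothetical helper: average the per-task scores in an lm-eval-style table.
# Assumption (not confirmed by the README): prefer acc_norm when a task
# reports it, otherwise fall back to the only metric listed for that task.

SAMPLE = """
|piqa | 0|acc |0.7922|± |0.0095|
| | |acc_norm|0.8112|± |0.0091|
|winogrande | 0|acc |0.7293|± |0.0125|
"""


def parse_rows(table):
    """Yield (task, metric, value) for every data row in the table."""
    task = None
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 6:
            continue  # not a six-column result row
        name, metric, raw_value = cells[0], cells[2], cells[3]
        try:
            value = float(raw_value)
        except ValueError:
            continue  # header or separator row
        if name:
            task = name  # rows with an empty task cell continue the previous task
        yield task, metric, value


def average_score(table, prefer="acc_norm"):
    """Mean of one score per task, taking the preferred metric when present."""
    chosen = {}
    for task, metric, value in parse_rows(table):
        if task not in chosen or metric == prefer:
            chosen[task] = value
    return sum(chosen.values()) / len(chosen)


if __name__ == "__main__":
    avg = average_score(SAMPLE)
    # Print on both scales: the committed averages mix fractions and percentages.
    print(f"fraction: {avg:.4f}  percent: {avg * 100:.2f}")
```

Printing both scales is deliberate: the GPT-4All and AGI-Eval averages in the diff are written as fractions (0.7036, 0.3556), while the BigBench average is written as 36.75, which only fits the table's 0 to 1 values if it is read as a percentage.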