Update README.md

README.md (changed)

````diff
@@ -73,7 +73,9 @@ GPT-4All Benchmark Set
 |piqa       | 0|acc     |0.7922|± |0.0095|
 |           |  |acc_norm|0.8112|± |0.0091|
 |winogrande | 0|acc     |0.7293|± |0.0125|
-
+Average: 0.7036
+```
+
 AGI-Eval
 ```
 | Task |Version| Metric |Value | |Stderr|
@@ -94,6 +96,7 @@ AGI-Eval
 |                 |  |acc_norm|0.4029|± |0.0343|
 |agieval_sat_math | 0|acc     |0.3273|± |0.0317|
 |                 |  |acc_norm|0.2636|± |0.0298|
+Average: 0.3556
 ```
 BigBench Reasoning Test
 ```
@@ -118,6 +121,7 @@ BigBench Reasoning Test
 |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
 |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
 |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
+Average: 36.75
 ```
 
 This is a slight improvement on the GPT4ALL Suite and BigBench Suite, with a degradation in AGIEval compared to the original Hermes.
````
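The `Average:` lines added in this commit appear to be unweighted means of each task's top-line score in the suite (the GPT4All and AGI-Eval figures are on a 0–1 scale, while the BigBench figure looks like it is reported on a 0–100 scale). A minimal sketch of that aggregation, using only the handful of task scores visible in the hunks above (an illustrative subset, not the full suites):

```python
# Sketch: averaging per-task scores into a suite-level score, as in the
# "Average:" lines added by this commit. The task names and values below
# are the illustrative subset visible in the diff, not full suite results.
def suite_average(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean of per-task top-line scores."""
    return sum(scores.values()) / len(scores)

bigbench_subset = {
    "bigbench_tracking_shuffled_objects_five_objects": 0.2048,
    "bigbench_tracking_shuffled_objects_seven_objects": 0.1297,
    "bigbench_tracking_shuffled_objects_three_objects": 0.4500,
}

print(round(suite_average(bigbench_subset), 4))
```

The full-suite averages in the README (0.7036, 0.3556, 36.75) are computed the same way over every task in each suite, most of which are elided from the hunks shown here.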