Update README.md
README.md CHANGED
@@ -69,6 +69,32 @@ GPT-4All Benchmark Set
Average: 0.679
```

+BigBench:
+```
+|                      Task                      |Version|       Metric        |Value |   |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5000|±  |0.0364|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5908|±  |0.0256|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3023|±  |0.0286|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
+|                                                |       |exact_str_match      |0.0000|±  |0.0000|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2520|±  |0.0194|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1871|±  |0.0148|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3833|±  |0.0281|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.2500|±  |0.0194|
+|bigbench_navigate                               |      0|multiple_choice_grade|0.5000|±  |0.0158|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4370|±  |0.0111|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2679|±  |0.0209|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2495|±  |0.0137|
+|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.5406|±  |0.0159|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2470|±  |0.0136|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1944|±  |0.0112|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1509|±  |0.0086|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3833|±  |0.0281|
+Average: 0.3367
+```
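The reported BigBench average checks out as the unweighted mean of the 18 `multiple_choice_grade` rows above; the `exact_str_match` sub-metric of `bigbench_geometric_shapes` is not counted. A minimal sanity check, with the values copied from the table:

```python
# Values are the multiple_choice_grade scores from the table above;
# the exact_str_match sub-metric (0.0000) is excluded from the average.
scores = [
    0.5000, 0.5908, 0.3023, 0.1003, 0.2520, 0.1871, 0.3833, 0.2500, 0.5000,
    0.4370, 0.2679, 0.2495, 0.5249, 0.5406, 0.2470, 0.1944, 0.1509, 0.3833,
]
print(f"Average: {sum(scores) / len(scores):.4f}")  # -> Average: 0.3367
```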
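Each results block is headed by the lm-evaluation-harness settings it was produced with (`hf-causal-experimental`, `num_fewshot: 0`, `batch_size: 8`). As a hedged sketch only, assuming the EleutherAI lm-evaluation-harness Python API of that generation and an illustrative task list, a comparable run might look like:

```python
# Sketch, not the author's exact invocation: simple_evaluate/make_table and the
# task names are assumed from the EleutherAI lm-evaluation-harness of this era.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=teknium/OpenHermes-7B,dtype=float16",
    tasks=["bigbench_causal_judgement", "truthfulqa_mc"],  # illustrative subset
    num_fewshot=0,
    batch_size=8,
)
print(evaluator.make_table(results))  # prints tables like the ones shown here
```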
TruthfulQA:
```
hf-causal-experimental (pretrained=teknium/OpenHermes-7B,dtype=float16), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8