pansophic/rocket-3B

pansophic committed on Nov 21, 2023

Commit 951ff2d • 1 Parent(s): 5a5385d

Update README.md

Files changed (1)

README.md +2 -3

@@ -64,7 +64,6 @@ In AlpacaEval, Rocket 🦝 achieves a near 80% win rate, coupled with an average
 ## Other benchmarks
-Despite its impressive performance on MT-Bench and AlpacaEval benchmarks, the model experiences some challenges when evaluated on other benchmark tests.
 | Metric                | Value                     |
 |-----------------------|---------------------------|
@@ -72,9 +71,9 @@ Despite its impressive performance on MT-Bench and AlpacaEval benchmarks, the mo
 | ARC (25-shot)         | 50.51          |
 | HellaSwag (10-shot)   | 73.91    |
 | MMLU (5-shot)         | 61.07         |
-| TruthfulQA (0-shot)   | 57.45   |
+| TruthfulQA (mc2) (0-shot)   | 54.38   |
 | Winogrande (5-shot)   | 63.22   |
-| GSM8K (5-shot)        | 12.74        |
+| GSM8K (5-shot)        | 37.91        |
 | DROP (3-shot)         | 9.66         |