Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 44.13 |
| ARC (25-shot)       | 53.67 |
| HellaSwag (10-shot) | 78.21 |
| MMLU (5-shot)       | 45.9  |
| TruthfulQA (0-shot) | 46.13 |
| Winogrande (5-shot) | 73.8  |
| GSM8K (5-shot)      | 4.7   |
| DROP (3-shot)       | 6.53  |
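
The Avg. row is consistent with an unweighted mean of the seven benchmark scores. The sketch below checks that arithmetic under this assumption. The `scores` dictionary and the `leaderboard_average` helper are hypothetical names used for illustration only and are not part of any leaderboard tooling.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 53.67,
    "HellaSwag (10-shot)": 78.21,
    "MMLU (5-shot)": 45.9,
    "TruthfulQA (0-shot)": 46.13,
    "Winogrande (5-shot)": 73.8,
    "GSM8K (5-shot)": 4.7,
    "DROP (3-shot)": 6.53,
}


def leaderboard_average(values):
    """Return the unweighted mean of the scores (an assumed definition of Avg.)."""
    return sum(values.values()) / len(values)


# Round to two decimals to match the precision reported in the table.
average = round(leaderboard_average(scores), 2)
print(average)  # 44.13
```

The rounded result matches the reported Avg. of 44.13, which suggests each benchmark carries equal weight regardless of its shot count.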