ritabratamaiti leaderboard-pr-bot committed on
Commit
f8869a8
1 Parent(s): 933fb9d

Adding Evaluation Results (#1)

- Adding Evaluation Results (5f384a8c0a62406c890c227a8e1f1c9ec506c061)
- Update (7d3751da83305965a4f0e0b8abd81e843639d2d4)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +15 -1
README.md CHANGED
@@ -178,4 +178,18 @@ Larger and more permissive versions of Calypso will be released in the future. I

  ---

- **Disclaimer:** This model card is provided for informational purposes only. Users are responsible for using the model in accordance with applicable laws and ethical considerations.
+ **Disclaimer:** This model card is provided for informational purposes only. Users are responsible for using the model in accordance with applicable laws and ethical considerations.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Xilabs__calypso-3b-alpha-v2)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 37.52 |
+ | ARC (25-shot) | 41.55 |
+ | HellaSwag (10-shot) | 71.48 |
+ | MMLU (5-shot) | 25.82 |
+ | TruthfulQA (0-shot) | 35.73 |
+ | Winogrande (5-shot) | 65.27 |
+ | GSM8K (5-shot) | 0.68 |
+ | DROP (3-shot) | 22.08 |
+
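
The `Avg.` row added in this diff can be sanity-checked against the seven benchmark rows. The sketch below assumes the average is a plain unweighted mean of those seven scores; the commit itself does not say how it was computed. The names `scores` and `mean_score` are illustrative and not part of any leaderboard tooling.

```python
# Scores copied from the table added in this diff (benchmark -> reported value).
scores = {
    "ARC (25-shot)": 41.55,
    "HellaSwag (10-shot)": 71.48,
    "MMLU (5-shot)": 25.82,
    "TruthfulQA (0-shot)": 35.73,
    "Winogrande (5-shot)": 65.27,
    "GSM8K (5-shot)": 0.68,
    "DROP (3-shot)": 22.08,
}


def mean_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard average weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average}")  # prints "Avg. = 37.52"
```

Under that assumption the computed value matches the reported `Avg.` of 37.52.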