MaziyarPanahi / calme-2.3-qwen2-72b

Text Generation

text-generation-inference


MaziyarPanahi committed on Sep 29

Commit b738e7d (1 parent: e85021f)

Update README.md

Files changed (1)

README.md +11 -16

README.md CHANGED

@@ -135,10 +135,19 @@ This model is suitable for a wide range of applications, including but not limit
 Coming soon
-# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Leaderboard 2: coming soon!
+# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.3-qwen2-72b)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |30.17|
+|IFEval (0-Shot)    |38.50|
+|BBH (3-Shot)       |51.23|
+|MATH Lvl 5 (4-Shot)|14.73|
+|GPQA (0-shot)      |16.22|
+|MuSR (0-shot)      |11.24|
+|MMLU-PRO (5-shot)  |49.10|
 |    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
@@ -202,17 +211,3 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-2.3-qwen2-72b"
 As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.3-qwen2-72b)
-|      Metric       |Value|
-|-------------------|----:|
-|Avg.               |30.17|
-|IFEval (0-Shot)    |38.50|
-|BBH (3-Shot)       |51.23|
-|MATH Lvl 5 (4-Shot)|14.73|
-|GPQA (0-shot)      |16.22|
-|MuSR (0-shot)      |11.24|
-|MMLU-PRO (5-shot)  |49.10|
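As a quick sanity check on the Open LLM Leaderboard table that this commit moves, the "Avg." row (30.17) is consistent with an unweighted arithmetic mean of the six benchmark scores. The sketch below recomputes it. It assumes equal weighting of the tasks; the model card itself does not state how the average is computed.

```python
# Scores copied from the Open LLM Leaderboard table in this commit's README.md diff.
scores = {
    "IFEval (0-Shot)": 38.50,
    "BBH (3-Shot)": 51.23,
    "MATH Lvl 5 (4-Shot)": 14.73,
    "GPQA (0-shot)": 16.22,
    "MuSR (0-shot)": 11.24,
    "MMLU-PRO (5-shot)": 49.10,
}

# Assumption: "Avg." is the unweighted arithmetic mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 30.17
```

The six scores sum to 181.02, and 181.02 / 6 = 30.17, which matches the reported average.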