Spaces: Running on CPU Upgrade
Gregor Betz committed
Commit • 058891a
1 Parent(s): f621b6a
description
- src/display/about.py +5 -2
src/display/about.py
CHANGED
@@ -51,10 +51,13 @@ Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (f
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according task performance.
 
+Unlike these leaderboards, the `/\/` Open CoT Leaderboard assess a model's ability to effectively reason about a `task`:
+
 |🤗 Open LLM Leaderboard |`/\/` Open CoT Leaderboard |
 |---|---|
-|Can `model` solve task
-|Measures
+|Can `model` solve `task`?|Can `model` do CoT to improve in `task`?|
+|Measures `task` performance.|Measures ability to reason (about `task`).|
+|Metric: absolute accuracy.|Metric: relative accuracy gain.|
 |Covers broad spectrum of `tasks`.|Focuses on critical thinking `tasks`.|
 
 