Gregor Betz committed
Commit 9218171 • 1 Parent(s): 749c594
description
src/display/about.py +8 -8
src/display/about.py
CHANGED
@@ -55,16 +55,16 @@ Unlike these leaderboards, the `/\/` Open CoT Leaderboard assess a model's abili
 
 
 ### 🤗 Open LLM Leaderboard
-a. Can `model` solve `task`?
-b. Metric: absolute accuracy.
-c. Measures `task` performance.
-d. Covers broad spectrum of `tasks`.
+* a. Can `model` solve `task`?
+* b. Metric: absolute accuracy.
+* c. Measures `task` performance.
+* d. Covers broad spectrum of `tasks`.
 
 ### `/\/` Open CoT Leaderboard
-a. Can `model` do CoT to improve in `task`?
-b. Metric: relative accuracy gain.
-c. Measures ability to reason (about `task`).
-d. Focuses on critical thinking `tasks`.
+* a. Can `model` do CoT to improve in `task`?
+* b. Metric: relative accuracy gain.
+* c. Measures ability to reason (about `task`).
+* d. Focuses on critical thinking `tasks`.
 
 
 ## Test dataset selection (`tasks`)