Spaces: Running on CPU Upgrade
Gregor Betz
committed
Commit • ad554f1
1 Parent(s): 992caee

src/display/about.py +5 -1
src/display/about.py
CHANGED
@@ -34,7 +34,7 @@ See the "About" tab for more details and motivation.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
-LLM_BENCHMARKS_TEXT = """
+LLM_BENCHMARKS_TEXT = f"""
 ## How it works (roughly)
 
 To assess the reasoning skill of a given `model`, we carry out the following steps for each `task` (test dataset) and different CoT `regimes`. (A CoT `regime` consists in a prompt chain and decoding parameters used to generate a reasoning trace.)
@@ -53,6 +53,10 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assess a model's ability to effectively reason about a `task`:
 
+| Leaderboard | Measures | Metric | Focus |
+|:---|:---|:---|:---|
+| 🤗 Open LLM Leaderboard | Task performance | Absolute accuracy | Task performance |
+
 ### 🤗 Open LLM Leaderboard
 * Can `model` solve `task`?
 * Measures `task` performance.
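The only code change in this commit is the `f` prefix on `LLM_BENCHMARKS_TEXT`. The shown portion of the literal contains no `{...}` placeholders, so the hunk is behaviorally identical, but the prefix lets the template interpolate values at import time. A minimal sketch of the mechanism, with a hypothetical variable name not taken from the leaderboard's code:

```python
# Hypothetical illustration: any {name} inside an f-string literal is
# evaluated when the module is imported.
LEADERBOARD_NAME = "Open CoT Leaderboard"  # hypothetical value

LLM_BENCHMARKS_TEXT = f"""
## How it works (roughly)

Results shown here are produced by the {LEADERBOARD_NAME}.
"""

print("Open CoT Leaderboard" in LLM_BENCHMARKS_TEXT)
```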