Gregor Betz committed
Commit c91d7f4
1 Parent(s): 5b98e6a

update readme and about

Files changed (2)
  1. README.md +9 -1
  2. src/display/about.py +4 -13
README.md CHANGED
@@ -8,7 +8,15 @@ sdk_version: 4.4.0
 app_file: app.py
 pinned: true
 license: apache-2.0
----
+duplicated_from: logikon/open_cot_leaderboard
+fullWidth: true
+space_ci:
+  private: true
+  secrets:
+  - HF_TOKEN
+tags:
+- leaderboard
+short_description: Track, rank and evaluate open LLMs' CoT quality---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

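For orientation, here is the Space front matter as it stands after this commit, assembled from the hunk above. Fields above the hunk's context (e.g. `sdk_version: 4.4.0`, visible in the hunk header) are unchanged and omitted; the indentation of the nested keys is inferred, since the diff viewer dropped it. The `space_ci` block is presumably configuration for the Space CI tool, with `HF_TOKEN` forwarded to it as a secret. Note that the closing `---` delimiter arrives fused onto the `short_description` value, exactly as the commit adds it (the +9/-1 stats and the 8,7 -> 8,15 hunk header both confirm it is a single line).

```yaml
# Sketch of the front matter after this commit, reconstructed from the diff
# above; nested indentation is inferred, not taken verbatim from the viewer.
app_file: app.py
pinned: true
license: apache-2.0
duplicated_from: logikon/open_cot_leaderboard  # source Space this one was copied from
fullWidth: true                                # render the app at full page width
space_ci:                                      # presumably read by the Space CI tool
  private: true
  secrets:
  - HF_TOKEN                                   # secret exposed to CI runs
tags:
- leaderboard
short_description: Track, rank and evaluate open LLMs' CoT quality---
```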
src/display/about.py CHANGED
@@ -53,18 +53,6 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
-
-### 🤗 Open LLM Leaderboard vs. `/\/` Open CoT Leaderboard
-* 🤗: Can `model` solve `task`?
-  `/\/`: Can `model` do CoT to improve in `task`?
-* 🤗: Metric: absolute accuracy.
-  `/\/`: Metric: relative accuracy gain.
-* 🤗: Measures `task` performance.
-  `/\/`: Measures ability to reason (about `task`).
-* 🤗: Covers broad spectrum of `tasks`.
-  `/\/`: Focuses on critical thinking `tasks`.
-
-
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
 * b. Metric: absolute accuracy.
@@ -84,7 +72,10 @@ The test dataset porblems in the CoT Leaderboard can be solved through clear thi
 
 
 ## Reproducibility
-To reproduce our results, check out the repository [cot-eval](https://github.com/logikon-ai/cot-eval).
+To learn more about the evaluation pipeline and reproduce our results, check out the repository [cot-eval](https://github.com/logikon-ai/cot-eval).
+
+## Acknowledgements
+We're grateful to community members for running evaluations and reporting results. To contribute, join the [`cot-leaderboard`](https://huggingface.co/cot-leaderboard) organization.
 
 """
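
The diff above trims the duplicated comparison list, but its key notion, "relative accuracy gain", is worth pinning down with a toy example. Below is a minimal sketch on the assumption that the gain is simply the model's accuracy with chain-of-thought prompting minus its accuracy when answering directly; the exact scoring used by [cot-eval](https://github.com/logikon-ai/cot-eval) may aggregate differently.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(labels)
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)


def relative_accuracy_gain(base_preds: list[str],
                           cot_preds: list[str],
                           labels: list[str]) -> float:
    """Accuracy delta attributable to chain-of-thought reasoning."""
    return accuracy(cot_preds, labels) - accuracy(base_preds, labels)


# Toy example: CoT lifts accuracy from 2/4 to 3/4 -> gain of +0.25.
labels = ["A", "B", "C", "D"]
base_preds = ["A", "B", "D", "A"]   # answered directly, 2/4 correct
cot_preds = ["A", "B", "C", "A"]    # answered after CoT, 3/4 correct
print(relative_accuracy_gain(base_preds, cot_preds, labels))  # 0.25
```

Read this way, a positive gain means CoT helps the model on the task, zero means CoT is inert, and a negative gain means the model talks itself out of answers it would otherwise get right.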