Add CHANGELOG
- app.py +6 -0
- content.py +24 -0
app.py
CHANGED
@@ -404,6 +404,12 @@ We chose these benchmarks as they test a variety of reasoning and general knowle
 ],
 submission_result,
 )
+
+with gr.Row():
+    changelog = gr.Markdown(CHANGELOG_TEXT)
+
+
+

 block.load(
 refresh,
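To illustrate how the added row fits into a Gradio app, here is a minimal sketch. It is not the actual app.py: `build_demo` is a hypothetical helper, and `CHANGELOG_TEXT` is a shortened stand-in for the constant defined in content.py.

```python
# Shortened stand-in for the constant defined in content.py (assumption).
CHANGELOG_TEXT = """
# Changelog

## [2023-05-24]
- Add a baseline (all 25.0).
"""


def build_demo():
    """Build a tiny Blocks app that renders the changelog in its own row.

    `build_demo` is a hypothetical helper for illustration, not part of app.py.
    """
    import gradio as gr  # imported lazily so this module loads without Gradio

    with gr.Blocks() as block:
        # Same pattern as the diff: a Row holding a single Markdown component.
        with gr.Row():
            gr.Markdown(CHANGELOG_TEXT)
    return block
```

With Gradio installed, `build_demo().launch()` should serve the page locally. The changelog sits in its own `gr.Row()`, so it renders as a separate band alongside the other components.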
content.py
ADDED
@@ -0,0 +1,24 @@
+CHANGELOG_TEXT = f"""
+# Changelog
+
+## [2023-05-24]
+- Add a baseline (all 25.0).
+
+## [2023-05-23]
+- Fixed a CSS issue that made the leaderboard hard to read in dark mode.
+
+## [2023-05-22]
+- Display a success/error message after submitting evaluation requests.
+- Reject duplicate submissions.
+- Do not display models that have incomplete results.
+- Display separate queues for jobs with RUNNING, PENDING, and FINISHED status.
+
+## [2023-05-15]
+- Fixed a typo: from "TruthQA" to "TruthfulQA"
+
+## [2023-05-10]
+- Fixed a bug that prevented auto-refresh.
+
+## [2023-05-10]
+- Released the leaderboard to the public.
+"""
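The changelog above is a hand-written string, and the `## [2023-05-10]` heading appears twice. As one possible alternative (not what this commit does), the Markdown could be rendered from structured entries so that each date gets exactly one heading. In the sketch below, `ENTRIES` and `render_changelog` are hypothetical names, and the messages are lightly paraphrased from the changelog.

```python
# Hypothetical structured form of a few changelog entries: (ISO date, message).
ENTRIES = [
    ("2023-05-10", "Released the leaderboard to the public."),
    ("2023-05-15", 'Fixed a typo: from "TruthQA" to "TruthfulQA"'),
    ("2023-05-10", "Fixed a bug that prevented auto-refresh."),
]


def render_changelog(entries):
    """Render (date, message) pairs as Markdown with one heading per date."""
    grouped = {}
    for date, message in entries:
        # Entries sharing a date are collected under the same key.
        grouped.setdefault(date, []).append(message)

    lines = ["# Changelog"]
    # ISO dates sort chronologically as plain strings; newest first.
    for date in sorted(grouped, reverse=True):
        lines.append("")
        lines.append(f"## [{date}]")
        lines.extend(f"- {message}" for message in grouped[date])
    return "\n".join(lines) + "\n"


CHANGELOG_TEXT = render_changelog(ENTRIES)
```

The result is still a plain string, so it could be passed to `gr.Markdown` in the same way as the hand-written constant.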