Space: AIR-Bench/leaderboard (duplicated from nan/leaderboard)
leaderboard/src/benchmarks.py @ f765492
Commit History
bf586e3  feat: add versioning for the long-doc  (nan, committed on Oct 17)
7845083  feat: implement the version selector for qa  (nan, committed on Oct 16)
270c122  refactor: rename the benchmarks enum  (nan, committed on Oct 15)
3fcf957  refactor: refactor the benchmarks  (nan, committed on Oct 15)
4791ac5  refactor: refactor the envs  (nan, committed on Oct 15)
ba13e25  refactor: refactor the envs  (nan, committed on Oct 15)
649e0fb  refactor: refactor the benchmarks  (nan, committed on Oct 15)
0785fe4  feat-add-v2405 (#26)  (verified; nan and hanhainebula, committed on Oct 6)
973bd2a  feat-switch-to-ndcg-for-qa-0607 (#19)  (verified; nan, committed on Jun 7)
bbfe4c1  feat-use-recall-as-default-metric-0605 (#18)  (verified; nan, committed on Jun 5)
443f557  fix a bug in METRIC_LIST  (hanhainebula, committed on May 22)
8b7258f  disable law-zh  (verified; hanhainebula, committed on May 20)
8102fce  Fix bug in dataset_dict: "gpt-3" -> "gpt3"  (verified; hanhainebula, committed on May 19)
4a44211  Fix bug in dataset_dict: "health" -> "healthcare"  (verified; hanhainebula, committed on May 19)
43fbed5  Add msmarco for qa task  (verified; hanhainebula, committed on May 14)
32ebf18  feat: improve the layout  (nan, committed on May 12)
1a2dba5  feat: adapt to the latest data format  (nan, committed on May 11)
a96f80a  chore: clean up  (nan, committed on May 10)
f30cbcc  feat: fix the table updating  (nan, committed on May 10)
e8879cc  feat: adapt UI in app.py  (nan, committed on May 9)
9c49811  feat: adapt the utils in app.py  (nan, committed on May 9)
9134169  feat: seperate the qa and longdoc tasks  (nan, committed on May 9)
8b7a945  feat: adapt the data loading part  (nan, committed on May 9)