AIR-Bench

leaderboard / src / benchmarks.py

Commit History

feat: add versioning for the long-doc
bf586e3

nan committed on

feat: implement the version selector for qa
7845083

nan committed on

refactor: rename the benchmarks enum
270c122

nan committed on

refactor: refactor the benchmarks
3fcf957

nan committed on

refactor: refactor the envs
4791ac5

nan committed on

refactor: refactor the envs
ba13e25

nan committed on

refactor: refactor the benchmarks
649e0fb

nan committed on

feat-switch-to-ndcg-for-qa-0607 (#19)
973bd2a
verified

nan committed on

feat-use-recall-as-default-metric-0605 (#18)
bbfe4c1
verified

nan committed on

fix a bug in METRIC_LIST
443f557

hanhainebula committed on

Fix bug in dataset_dict: "gpt-3" -> "gpt3"
8102fce
verified

hanhainebula committed on

Fix bug in dataset_dict: "health" -> "healthcare"
4a44211
verified

hanhainebula committed on

Add msmarco for qa task
43fbed5
verified

hanhainebula committed on

feat: improve the layout
32ebf18

nan committed on

feat: adapt to the latest data format
1a2dba5

nan committed on

chore: clean up
a96f80a

nan committed on

feat: fix the table updating
f30cbcc

nan committed on

feat: adapt UI in app.py
e8879cc

nan committed on

feat: adapt the utils in app.py
9c49811

nan committed on

feat: seperate the qa and longdoc tasks
9134169

nan committed on

feat: adapt the data loading part
8b7a945

nan committed on