Duplicated from nan/leaderboard

AIR-Bench/leaderboard

leaderboard/src/benchmarks.py

Commit History

feat: add versioning for the long-doc

bf586e3

nan committed on Oct 17

feat: implement the version selector for qa

7845083

nan committed on Oct 16

refactor: rename the benchmarks enum

270c122

nan committed on Oct 15

refactor: refactor the benchmarks

3fcf957

nan committed on Oct 15

refactor: refactor the envs

4791ac5

nan committed on Oct 15

refactor: refactor the envs

ba13e25

nan committed on Oct 15

refactor: refactor the benchmarks

649e0fb

nan committed on Oct 15

feat-add-v2405 (#26)

0785fe4
verified

hanhainebula committed on Oct 6

feat-switch-to-ndcg-for-qa-0607 (#19)

973bd2a
verified

nan committed on Jun 7

feat-use-recall-as-default-metric-0605 (#18)

bbfe4c1
verified

nan committed on Jun 5

fix a bug in METRIC_LIST

443f557

hanhainebula committed on May 22

disable law-zh

8b7258f
verified

hanhainebula committed on May 20

Fix bug in dataset_dict: "gpt-3" -> "gpt3"

8102fce
verified

hanhainebula committed on May 19

Fix bug in dataset_dict: "health" -> "healthcare"

4a44211
verified

hanhainebula committed on May 19

Add msmarco for qa task

43fbed5
verified

hanhainebula committed on May 14

feat: improve the layout

32ebf18

nan committed on May 12

feat: adapt to the latest data format

1a2dba5

nan committed on May 11

chore: clean up

a96f80a

nan committed on May 10

feat: fix the table updating

f30cbcc

nan committed on May 10

feat: adapt UI in app.py

e8879cc

nan committed on May 9

feat: adapt the utils in app.py

9c49811

nan committed on May 9

feat: seperate the qa and longdoc tasks

9134169

nan committed on May 9

feat: adapt the data loading part

8b7a945

nan committed on May 9