`this space` => `this dataset`
#2 opened by pierric (HF staff)
content.py CHANGED (+1 -1)
@@ -7,7 +7,7 @@ GAIA is a benchmark which aims at evaluating next-generation LLMs (LLMs with aug
 GAIA is made of more than 450 non-trivial question with an unambiguous answer, requiring different levels of tooling and autonomy to solve.
 It is therefore divided in 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicate a strong jump in model capabilities. Each level is divided into a fully public dev set for validation, and a test set with private answers and metadata.
 
-GAIA data can be found in [this
+GAIA data can be found in [this dataset](https://huggingface.co/datasets/gaia-benchmark/GAIA). Questions are contained in `metadata.jsonl`. Some questions come with an additional file, that can be found in the same folder and whose id is given in the field `file_name`.
 
 ## Submissions
 Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split.
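
For readers of the new wording, here is a minimal sketch of how the layout it describes might be consumed: one JSON object per line in `metadata.jsonl`, with an optional attachment in the same folder named by `file_name`. The helper name `load_questions`, the added `file_path` key, and the choice to treat an empty or missing `file_name` as "no attachment" are assumptions made for illustration; they are not part of this PR or of the dataset's documentation.

```python
import json
from pathlib import Path


def load_questions(split_dir):
    """Read metadata.jsonl from split_dir and resolve each attached file, if any.

    Hypothetical helper: only `metadata.jsonl` and the `file_name` field
    come from the description; everything else is an assumption.
    """
    split_dir = Path(split_dir)
    records = []
    with open(split_dir / "metadata.jsonl", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the file
            record = json.loads(line)
            # Assumption: an empty or missing `file_name` means no attachment.
            name = record.get("file_name") or ""
            # Attachments are described as living in the same folder.
            record["file_path"] = split_dir / name if name else None
            records.append(record)
    return records
```

Here `split_dir` would be a local copy of one folder of the dataset; the exact folder layout is not stated in the description, so the path is left to the caller.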