Possible future contamination problem

#7
by supercharge19 - opened

If there are only a handful of examples (questions) that humans can easily answer, and you have released them to the public, what is stopping ranking seekers from contaminating their models by manually writing answers to them and then training their models on that data?

GAIA org

Nothing.
However, manually answering the questions is (i) conceptually easy but extremely tedious, (ii) difficult to hide (we ask model owners to provide reasoning traces, scores might look suspicious, etc.), and (iii) not robust, since we plan to renew the test set in case of contamination.

I was thinking that you would have a larger number of questions: while you mention that you have only 300, and even "leak" those questions, you would strictly guard the other questions and answers and not even mention how many there are.

To complement @gregmialz's very good answer: we actually need people to know what the questions in the test set are, so they can run their models on them and give us their answers :)

clefourrier changed discussion status to closed
