Update contamination_report.csv

#26

What are you reporting:

  • Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

Evaluation dataset(s): openai_humaneval

Contaminated model(s): gpt-3.5-turbo-1106, gpt-3.5-turbo-0613

Contaminated split(s): test (41.47% and 23.79% contamination for the two models above, respectively)

Briefly describe your method to detect data contamination

  • Model-based approach

Model-based approaches

The cited paper reports high contamination levels for ChatGPT on the HumanEval dataset. This is evident from the high Average Peak and Leak Ratios, especially compared to the clean CodeForces2305 dataset, where ChatGPT's performance drops. The paper's TED method proves effective in identifying and mitigating this contamination. The values can be verified in Table 5 of the cited paper.
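
For intuition, here is a rough sketch of the output-distribution idea behind such model-based checks: if a model has memorized a benchmark item, repeated sampling tends to collapse onto one "peak" continuation. This is only an illustration, not the paper's actual TED/Peak Ratio implementation, and `sample_completions` is a hypothetical stand-in for whatever model API is used.

```python
# Illustrative sketch only: approximates the "peakedness" intuition behind
# output-distribution contamination checks (see Dong et al., 2024 for the
# actual Peak/Leak Ratio definitions and the TED method).
from difflib import SequenceMatcher


def edit_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def rough_peak_ratio(greedy: str, samples: list[str], threshold: float = 0.95) -> float:
    """Fraction of sampled completions that collapse onto the greedy output.

    A value close to 1.0 suggests the model keeps reproducing one memorized
    continuation for the prompt, which is the kind of signal the paper
    interprets as evidence of contamination.
    """
    if not samples:
        return 0.0
    near_peak = sum(1 for s in samples if edit_similarity(s, greedy) >= threshold)
    return near_peak / len(samples)


# Hypothetical usage: `sample_completions(prompt, n, temperature)` would call
# the model under test; it is not a real API of any specific library.
# greedy = sample_completions(prompt, n=1, temperature=0.0)[0]
# samples = sample_completions(prompt, n=20, temperature=1.0)
# print(rough_peak_ratio(greedy, samples))
```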

Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination?

URL: https://arxiv.org/pdf/2402.15938
Citation:
@misc{dong2024generalization,
  title={Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models},
  author={Yihong Dong and Xue Jiang and Huanyu Liu and Zhi Jin and Ge Li},
  year={2024},
  eprint={2402.15938},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.

  • Full name: Suryansh Sharma
  • Institution: Indian Institute of Technology Kharagpur
  • Email: [email protected]
Workshop on Data Contamination org

Hi @suryanshs16103 ,

The evidence you are trying to add is already in the database. Please check this PR.

Please, before creating a new PR, check whether the evidence you want to add is already in the database. I will close this PR.

Best,
Oscar

OSainz changed pull request status to closed
