Code contamination in HumanEval and MBPP

#12

What are you reporting:

  • Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
  • Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

Contaminated Evaluation Dataset(s):

  • openai_humaneval
  • mbpp

Contaminated Corpora:

  • EleutherAI/pile
  • bigcode/the-stack

Approach:

  • Data-based approach
  • Model-based approach

Description of your method, 3-4 sentences. Evidence of data contamination:

An example in the test data (i.e., from MBPP or HumanEval) is flagged as contaminated if its aggregated similarity score is 100, i.e., a perfect match exists at the surface or semantic level. The Levenshtein similarity score measures surface-level similarity between programs, while the Dolos toolkit, a source-code plagiarism detection tool built for educational use, measures semantic similarity between programs.

Note: False positives exist even at a 100% match, either because the example is too simple and generic, or because a program is flagged as similar to the gold program despite being functionally quite different.
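The surface-level half of the check above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `semantic_score` is a hypothetical stand-in for the Dolos output, and the 0-100 scaling and perfect-match threshold mirror the description above.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]


def surface_similarity(a: str, b: str) -> float:
    """Levenshtein similarity on a 0-100 scale (100 = identical programs)."""
    if not a and not b:
        return 100.0
    dist = levenshtein_distance(a, b)
    return 100.0 * (1.0 - dist / max(len(a), len(b)))


def is_contaminated(test_program: str, corpus_program: str,
                    semantic_score: float = 0.0) -> bool:
    """Flag contamination only on a perfect match (aggregated score of 100)
    at either level. `semantic_score` is a placeholder for a Dolos-style
    semantic similarity score, also on a 0-100 scale (assumption)."""
    score = max(surface_similarity(test_program, corpus_program),
                semantic_score)
    return score == 100.0
```

In practice the semantic score would come from running Dolos over the pair of programs; only the aggregation rule (take the maximum, flag at exactly 100) is shown here.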

Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination? Yes

url: https://arxiv.org/abs/2403.04811

@article{riddell2024quantifying,
  title={Quantifying contamination in evaluating code generation capabilities of language models},
  author={Riddell, Martin and Ni, Ansong and Cohan, Arman},
  journal={arXiv preprint arXiv:2403.04811},
  year={2024}
}

Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.

Full name: Ameya Prabhu
Institution: Tübingen AI Center, University of Tübingen
Email: [email protected]

AmeyaPrabhu changed pull request title from Update contamination_report.csv to Code contamination in HumanEval and MBPP
Workshop on Data Contamination org

Thank you for your contribution!

Best,
Oscar

OSainz changed pull request status to merged
