Update contamination_report.csv
What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
**Contaminated model(s):** GPT-4
**Contaminated Evaluation Dataset(s)**:
- openai_humaneval (25%)
- ucinlp/drop (21%)
**Approach**:
- [x] Data-based approach
- [ ] Model-based approach
**Description of your method, 3-4 sentences. Evidence of data contamination**:
The OpenAI GPT-4 technical report measures cross-contamination between its evaluation datasets and the pre-training data using substring matching. Both evaluation and training data are processed by removing all spaces and symbols, keeping only characters (including numbers). For each evaluation example, three substrings of 50 characters are selected at random (or the entire example is used if it is shorter than 50 characters). A match is identified if any of the three sampled evaluation substrings is a substring of the processed training example. This yields a list of contaminated examples.
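The procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not code from the OpenAI report: the function names (`normalize`, `sample_substrings`, `is_contaminated`, `contamination_rate`) are hypothetical, "characters" is assumed to mean alphanumeric characters, and the training data is assumed to fit in memory as a list of strings.

```python
import random


def normalize(text: str) -> str:
    """Remove spaces and symbols, keeping only letters and digits (assumed reading)."""
    return "".join(ch for ch in text if ch.isalnum())


def sample_substrings(example: str, k: int = 3, length: int = 50, rng=None) -> list:
    """Sample k random substrings of `length` chars from the normalized example.

    Examples shorter than `length` are used whole; empty examples yield no probes.
    """
    rng = rng or random.Random()
    s = normalize(example)
    if not s:
        return []
    if len(s) < length:
        return [s]
    starts = [rng.randrange(len(s) - length + 1) for _ in range(k)]
    return [s[i:i + length] for i in starts]


def is_contaminated(example: str, training_docs: list, k: int = 3,
                    length: int = 50, rng=None) -> bool:
    """Flag the example if any sampled probe occurs in any normalized training doc."""
    probes = sample_substrings(example, k, length, rng)
    # A real pipeline would normalize the training data once, not per call.
    return any(p in normalize(doc) for doc in training_docs for p in probes)


def contamination_rate(examples: list, training_docs: list, rng=None) -> float:
    """Fraction of evaluation examples flagged as contaminated."""
    flagged = sum(is_contaminated(ex, training_docs, rng=rng) for ex in examples)
    return flagged / len(examples) if examples else 0.0
```

Because only three substrings are sampled per example, a check like this can miss partial overlaps, and short substrings that happen to be common can trigger false positives, so the resulting percentages are best read as estimates.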
**Citation**
Is there a paper that reports the data contamination or describes the method used to detect data contamination? Yes
**url**: [https://arxiv.org/abs/2303.08774](https://arxiv.org/abs/2303.08774)
```
@article{achiam2023gpt,
title={Gpt-4 technical report},
author={Achiam, Josh and Adler, Steven and Agarwal, Sandhini and Ahmad, Lama and Akkaya, Ilge and Aleman, Florencia Leoni and Almeida, Diogo and Altenschmidt, Janko and Altman, Sam and Anadkat, Shyamal and others},
journal={arXiv preprint arXiv:2303.08774},
year={2023}
}
```
Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
Full name: Ameya Prabhu
Institution: Tübingen AI Center, University of Tübingen
Email: [email protected]
- contamination_report.csv +5 -0
@@ -462,4 +462,9 @@ bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.
 
 RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
 RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
+
+
+openai_humaneval;;GPT-4;model;0.0;0.0;25%;data-based;https://arxiv.org/abs/2303.08774;
+ucinlp/drop;;GPT-4;model;0.0;0.0;21%;data-based;https://arxiv.org/abs/2303.08774;
+
 