AmeyaPrabhu committed on
Commit
c36b27e
1 Parent(s): 473e687

Mistral 7B Arc Easy Contamination based on "Proving Test Set Contamination in Black Box Language Models"

# What are you reporting:

- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

Contaminated Evaluation Dataset(s):

- ibragim-bad/arc_easy

Contaminated Model:

- Mistral 7B

Approach:

- [ ] Data-based approach
- [x] Model-based approach

**Description of your method, 3-4 sentences. Evidence of data contamination:**

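The cited paper runs a statistical test on the model's log-probabilities: it compares the log-probability of the dataset under its original (canonical) ordering with the log-probability under random permutations of the examples. If the model never saw the dataset, every ordering should be equally likely, so a consistent preference for the canonical ordering is evidence of contamination. Specifically, the sharded version of the test checks whether the log-probability under the canonical ordering X is higher than the average log-probability under random permutations.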
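
The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scorer `toy_contaminated_log_prob`, the `q0`, `q1`, ... example ids, the shard count, and the number of permutations are all assumptions made for the example. In practice `log_prob` would wrap a real language model, such as Mistral 7B, scoring the concatenated examples. The sharded function returns only a t-style statistic over per-shard differences; turning it into a p-value is left out.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

# A scorer maps an ordered list of examples to one sequence-level log-probability.
LogProbFn = Callable[[Sequence[str]], float]


def permutation_p_value(examples: Sequence[str], log_prob: LogProbFn,
                        num_permutations: int = 100, seed: int = 0) -> float:
    """One-sided permutation p-value: how often does a random ordering score
    at least as high as the canonical ordering?"""
    rng = random.Random(seed)
    canonical = log_prob(examples)
    at_least_as_high = 0
    for _ in range(num_permutations):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_prob(shuffled) >= canonical:
            at_least_as_high += 1
    # The +1 terms count the canonical ordering itself, so p is never 0.
    return (1 + at_least_as_high) / (1 + num_permutations)


def sharded_t_statistic(examples: Sequence[str], log_prob: LogProbFn,
                        num_shards: int = 4, permutations_per_shard: int = 20,
                        seed: int = 0) -> float:
    """Per shard, take canonical log-prob minus the mean permuted log-prob,
    then return mean / standard error of those differences across shards."""
    rng = random.Random(seed)
    shard_size = len(examples) // num_shards
    diffs: List[float] = []
    for i in range(num_shards):
        shard = list(examples[i * shard_size:(i + 1) * shard_size])
        canonical = log_prob(shard)
        permuted = []
        for _ in range(permutations_per_shard):
            shuffled = shard[:]
            rng.shuffle(shuffled)
            permuted.append(log_prob(shuffled))
        diffs.append(canonical - statistics.mean(permuted))
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0.0:
        # Degenerate case: every shard produced the same difference.
        return 0.0 if mean_diff == 0.0 else math.copysign(math.inf, mean_diff)
    return mean_diff / std_err


def toy_contaminated_log_prob(examples: Sequence[str]) -> float:
    """Hypothetical stand-in for a contaminated model: each adjacent pair that
    breaks the original order (ids "q0", "q1", ...) costs one unit."""
    ids = [int(e[1:]) for e in examples]
    return -float(sum(1 for a, b in zip(ids, ids[1:]) if b != a + 1))


def toy_order_blind_log_prob(examples: Sequence[str]) -> float:
    """Hypothetical stand-in for a clean model: the score ignores ordering."""
    return -float(len(examples))


if __name__ == "__main__":
    data = [f"q{i}" for i in range(20)]
    print("contaminated p-value:", permutation_p_value(data, toy_contaminated_log_prob))
    print("order-blind p-value: ", permutation_p_value(data, toy_order_blind_log_prob))
    print("contaminated sharded statistic:", sharded_t_statistic(data, toy_contaminated_log_prob))
```

A small p-value from the first function, or a large positive statistic from the second, corresponds to the model preferring the canonical ordering; the order-blind scorer shows the null case, where every permutation ties with the canonical ordering.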

**Citation:**

Is there a paper that reports the data contamination or describes the method used to detect data contamination? Yes

**url**: https://arxiv.org/abs/2310.17623

```
@article{oren2023proving,
title={Proving test set contamination in black box language models},
author={Oren, Yonatan and Meister, Nicole and Chatterji, Niladri and Ladhak, Faisal and Hashimoto, Tatsunori B},
journal={arXiv preprint arXiv:2310.17623},
year={2023}
}
```

Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.

Full name: Ameya Prabhu
Institution: Tübingen AI Center, University of Tübingen
Email: [email protected]

Files changed (1)
  1. contamination_report.csv +2 -0
contamination_report.csv CHANGED
@@ -597,3 +597,5 @@ ibragim-bad/arc_challenge;;FLAN;model;;15.6;;data-based;https://arxiv.org/abs/21
 facebook/anli;dev_r3;FLAN;model;;40.2;;data-based;https://arxiv.org/abs/2109.01652;13
 facebook/anli;dev_r2;FLAN;model;;97.9;;data-based;https://arxiv.org/abs/2109.01652;13
 facebook/anli;dev_r1;FLAN;model;;98.6;;data-based;https://arxiv.org/abs/2109.01652;13
+
+ibragim-bad/arc_easy;;Mistral 7B;model;;;100.0;model-based;https://arxiv.org/abs/2310.17623;14