Spaces: CONDA-Workshop / Data-Contamination-Database
Data-Contamination-Database
14 contributors
History: 17 commits
Latest commit: Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (#6)
Authors: Iker, vishaal27
Commit: 935e79b (verified), 8 months ago
File                      Scan  Size      Updated       Last commit message
.gitattributes            Safe  1.52 kB   9 months ago  initial commit
.gitignore                Safe  12 Bytes  9 months ago  Style + gitignore
README.md                 Safe  352 Bytes 9 months ago  Initital commit
app.py                    Safe  6.23 kB   8 months ago  Increase tab font size
contamination_report.csv  Safe  34.5 kB   8 months ago  Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (#6)
dataset.py                Safe  9.64 kB   8 months ago  Add PR links to previous commits
markdown.py               Safe  9.83 kB   8 months ago  update urls
requirements.txt          Safe  73 Bytes  9 months ago  Initital commit
utils.py                  Safe  6.11 kB   8 months ago  Get token from environment