metadata
datasets:
- Locutusque/TM-DATA-V2
- LLM360/TxT360
- mlfoundations/dclm-baseline-1.0
- Skylion007/openwebtext
- JeanKaddour/minipile
language:
- en
license: apache-2.0
Still in training. Trained on approximately 17 billion tokens so far.