datasets:
- Locutusque/TM-DATA-V2
- LLM360/TxT360
- mlfoundations/dclm-baseline-1.0
- Skylion007/openwebtext
- JeanKaddour/minipile
language:
- en
license: apache-2.0
This model is still in training. It has been trained on approximately 17 billion tokens so far.