Update README.md
DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Baseline dataset.
### Model Sources

- **Repository:** https://github.com/mlfoundations/dclm
- **Dataset:** https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0
- **Paper:** [DataComp-LM: In search of the next generation of training sets for language models](https://arxiv.org/abs/2406.11794)

## Uses
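A minimal sketch of loading the model and generating text with Hugging Face `transformers` is shown below. The repository id, the commented-out OpenLM import, and the sampling settings are illustrative assumptions, not values taken from this card.

```python
# Sketch only: the model id and sampling settings below are assumptions.
# Depending on how the checkpoint is packaged, the OpenLM wrapper may need to be
# imported first so that transformers recognizes the architecture:
# from open_lm.hf import *
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-Baseline-7B"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Machine learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.8)
print(tokenizer.decode(outputs[0]))
```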
The model was trained using the following setup:

- **Architecture:** Decoder-only Transformer
- **Framework:** PyTorch with OpenLM
- **Optimizer:** AdamW
- **Learning Rate:** 2e-3 (peak)
- **Weight Decay:** 0.05
- **Batch Size:** 2048 sequences
- **Sequence Length:** 2048 tokens
- **Total Training Tokens:** 2.5T
- **Hardware:** Trained on H100 GPUs
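For a rough sense of scale, the batch size and sequence length above work out to about 4.2M tokens per optimizer step; the step count below is derived from the listed numbers and is not a reported value.

```python
# Derived from the hyperparameters listed above; the step count is an estimate.
batch_size = 2048        # sequences per global batch
seq_len = 2048           # tokens per sequence
total_tokens = 2.5e12    # total training tokens

tokens_per_step = batch_size * seq_len         # 4,194,304 tokens per optimizer step
approx_steps = total_tokens / tokens_per_step  # roughly 596k steps

print(f"{tokens_per_step:,} tokens/step, ~{approx_steps:,.0f} steps")
```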
For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper.

To ensure our trained model is broadly useful, including for math and coding tasks, we combine our 3.8T [DCLM-BASELINE](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) with the [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) and [ProofPile2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) data to arrive at a 4.1T token dataset.
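The split between the StarCoder and ProofPile2 portions is not given here, so only their combined contribution can be inferred from the figures above.

```python
# Approximate token budget of the pretraining mix described above, in trillions.
dclm_baseline = 3.8                        # DCLM-BASELINE
total_mix = 4.1                            # combined dataset
code_and_math = total_mix - dclm_baseline  # StarCoder + ProofPile2 together, ~0.3T

print(f"StarCoder + ProofPile2 contribute ~{code_and_math:.1f}T of the {total_mix}T mix")
```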
## Evaluation