Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Bas
 
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|-----------------|--------|-------------|-----------------|----------------|
-| 7B | 2.6T | 32 | 4096 | 32 |
+| 7B | 2.6T | 32 | 4096 | 32 | 8192 |
 
 
 ### Model Description

@@ -44,7 +44,7 @@ The model was trained using the following setup:
 - **Learning Rate:** 2e-3 (peak)
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
-- **Sequence Length:**
+- **Sequence Length:** 8192 tokens
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
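For context, here is a quick back-of-the-envelope sketch in Python, using only the numbers listed above, of how the batch size, sequence length, and total token budget relate. The resulting step count is an estimate derived from those figures, not a number stated in the model card.

```python
# Relating the training hyperparameters above:
# tokens per optimizer step = batch size (in sequences) * sequence length,
# and the implied number of steps = total training tokens / tokens per step.
batch_size_sequences = 2048     # "Batch Size: 2048 sequences"
sequence_length = 8192          # "Sequence Length: 8192 tokens"
total_training_tokens = 2.6e12  # "Total Training Tokens: 2.6T"

tokens_per_step = batch_size_sequences * sequence_length  # 16,777,216 (~16.8M)
implied_steps = total_training_tokens / tokens_per_step   # ~155,000

print(f"tokens per step: {tokens_per_step:,}")
print(f"implied optimizer steps: {implied_steps:,.0f}")
```

Note that the 8192-token sequence length filled in under the training setup matches the context length added to the architecture table, so the two hunks in this change are consistent with each other.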