Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Bas
 
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|-----------------|--------|-------------|-----------------|----------------|
-| 7B | 2.6T | 32 | 4096 | 32 |
+| 7B | 2.6T | 32 | 4096 | 32 | 8192 |
 
 
 ### Model Description

@@ -44,7 +44,7 @@ The model was trained using the following setup:
 - **Learning Rate:** 2e-3 (peak)
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
-- **Sequence Length:**
+- **Sequence Length:** 8192 tokens
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
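For context, here is a quick back-of-the-envelope sketch in Python, using only the numbers listed above, of how the batch size, sequence length, and total token budget relate. The resulting step count is an estimate derived from those figures, not a number stated in the model card.

```python
# Relating the training hyperparameters above:
# tokens per optimizer step = batch size (in sequences) * sequence length,
# and the implied number of steps = total training tokens / tokens per step.
batch_size_sequences = 2048     # "Batch Size: 2048 sequences"
sequence_length = 8192          # "Sequence Length: 8192 tokens"
total_training_tokens = 2.6e12  # "Total Training Tokens: 2.6T"

tokens_per_step = batch_size_sequences * sequence_length  # 16,777,216 (~16.8M)
implied_steps = total_training_tokens / tokens_per_step   # ~155,000

print(f"tokens per step: {tokens_per_step:,}")
print(f"implied optimizer steps: {implied_steps:,.0f}")
```

Note that the 8192-token sequence length filled in under the training setup matches the context length added to the architecture table, so the two hunks in this change are consistent with each other.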