Update README.md
#4, opened by forrest-gradient
README.md CHANGED
@@ -44,8 +44,8 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | RoPE theta | 15.3 M | 207.1 M | 1.06B | 2.80B | 45.2B |
 | Batch Size | 1 | 1 | 16 | 16 | 2 |
 | Gradient Accumulation Steps | 32 | 16 | 1 | 1 | 2 |
-| Steps | 30 | 24 | 50 | 50 |
-| Total Tokens | 62914560 | 100663296 | 419430400 | 838860800 |
+| Steps | 30 | 24 | 50 | 50 | 12 |
+| Total Tokens | 62914560 | 100663296 | 419430400 | 838860800 | 201326592 |
 | Learning Rate | 2.00E-05 | 2.00E-05 | 2.00E-05 | 2.00E-05 | 2.00E-05 |
 | # GPUs | 8 | 32 | 512 | 512 | 512 |
 | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
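
The values added in this change can be sanity-checked against the rest of the table. The sketch below assumes that Total Tokens = Steps × Batch Size × Gradient Accumulation Steps × tokens per sequence, with the GPU count not entering the product. That relation is an assumption inferred from the numbers, not something the README states in this hunk, and the function name `implied_tokens_per_sequence` is hypothetical.

```python
# Values copied from the updated table rows, one entry per column (left to right).
steps = [30, 24, 50, 50, 12]
batch_size = [1, 1, 16, 16, 2]
grad_accum = [32, 16, 1, 1, 2]
total_tokens = [62914560, 100663296, 419430400, 838860800, 201326592]


def implied_tokens_per_sequence(total, n_steps, batch, accum):
    """Solve the assumed relation total = n_steps * batch * accum * tokens_per_sequence."""
    sequences = n_steps * batch * accum
    if total % sequences != 0:
        raise ValueError("total tokens is not a multiple of the sequence count")
    return total // sequences


implied = [
    implied_tokens_per_sequence(t, s, b, a)
    for t, s, b, a in zip(total_tokens, steps, batch_size, grad_accum)
]

for column, tokens in enumerate(implied, start=1):
    print(f"column {column}: {tokens} tokens per sequence")
```

Under that assumption every column divides evenly, and each implied per-sequence length is a power of two, including the newly filled fifth column (12 steps, 201326592 tokens). This suggests the added values are consistent with the other four columns; the per-column sequence lengths themselves sit outside this hunk, so the check cannot confirm them directly.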