Update README.md
```diff
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ We continual pretrain on the expanded vocabulary [homebrewltd/llama3.2-3B-s-whis
 ## Training process
 **Training Metrics Image**: Below is a snapshot of the training loss curve visualized.
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/etosaWAQ8TASXOEUADGpi.png)
 
 **MMLU**:
 
@@ -63,9 +63,9 @@ We utilize [torchtune](https://github.com/pytorch/torchtune) library for the lat
 | **Epoch** | 1 |
 | **Global batch size** | 480 |
 | **Learning Rate** | 2e-4 |
-| **Learning Scheduler** |
+| **Learning Scheduler** | LambdaLR with warmup |
 | **Optimizer** | AdamW fused |
-| **Warmup Steps** |
+| **Warmup Steps** | 80 |
 | **Weight Decay** | 0.01 |
 | **Max Sequence Length** | 512 |
 
```
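For readers who want to see how the newly filled-in table entries fit together, below is a minimal PyTorch sketch of the optimizer/scheduler setup the diff describes. The linear warmup shape and the constant rate after warmup are assumptions (the README states only "LambdaLR with warmup" over 80 steps), and the stand-in model is hypothetical; the real run uses the 3B model via torchtune.

```python
import torch

# Hyperparameters taken from the table in the diff above.
PEAK_LR = 2e-4
WARMUP_STEPS = 80
WEIGHT_DECAY = 0.01

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the actual 3B model.
model = torch.nn.Linear(512, 512).to(device)

# "AdamW fused" from the table; fused kernels need CUDA tensors,
# so fall back to the unfused path on CPU.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=PEAK_LR,
    weight_decay=WEIGHT_DECAY,
    fused=(device == "cuda"),
)

# "LambdaLR with warmup": ramp the LR linearly from ~0 to PEAK_LR over the
# first 80 optimizer steps, then hold it constant (the post-warmup shape is
# an assumption; the README does not specify it).
def warmup_lambda(step: int) -> float:
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

# Typical loop order: optimizer.step() first, then scheduler.step().
for step in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 512, device=device)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In a torchtune run these settings would normally live in the recipe's YAML config rather than in hand-written code; the standalone sketch just makes explicit the warmup schedule the table implies.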