jan-hq committed on
Commit
6bb9783
1 Parent(s): 2d1a375

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED

@@ -33,7 +33,7 @@ We continual pretrain on the expanded vocabulary [homebrewltd/llama3.2-3B-s-whis
 ## Training process
 **Training Metrics Image**: Below is a snapshot of the training loss curve visualized.
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/gtpDSs750SkMPJO0-UtFq.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/etosaWAQ8TASXOEUADGpi.png)
 
 **MMLU**:
 
@@ -63,9 +63,9 @@ We utilize [torchtune](https://github.com/pytorch/torchtune) library for the lat
 | **Epoch** | 1 |
 | **Global batch size** | 480 |
 | **Learning Rate** | 2e-4 |
-| **Learning Scheduler** | Cosine with warmup |
+| **Learning Scheduler** | LambdaLR with warmup |
 | **Optimizer** | AdamW fused |
-| **Warmup Steps** | 50 |
+| **Warmup Steps** | 80 |
 | **Weight Decay** | 0.01 |
 | **Max Sequence Length** | 512 |
 
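The corrected hyperparameters (LambdaLR with warmup, 80 warmup steps, AdamW fused, lr 2e-4, weight decay 0.01) can be sketched in plain PyTorch. This is a minimal sketch, not the actual torchtune training config: the linear-warmup shape of the lambda is an assumption (the README only states "LambdaLR with warmup"), and `fused=False` is used here for CPU portability even though the run itself used fused AdamW.

```python
import torch

# Toy model standing in for the real network.
model = torch.nn.Linear(8, 8)

# AdamW with the README's lr and weight decay. The run used the fused
# kernel (CUDA-only); fused=False keeps this sketch runnable on CPU.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=2e-4, weight_decay=0.01, fused=False
)

warmup_steps = 80  # updated value from the diff (was 50)

def warmup_lambda(step: int) -> float:
    # Assumed schedule: linear warmup from ~0 to 1 over warmup_steps,
    # then a constant multiplier of 1.0.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)
```

Calling `scheduler.step()` once per optimizer step ramps the learning rate up to 2e-4 over the first 80 steps and holds it constant afterward.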