Committed by victormiller
Commit 32b8445
1 Parent(s): 74bef6c
Update README.md
README.md
CHANGED
@@ -5,8 +5,11 @@ license: apache-2.0
 license: apache-2.0
 ---
 # LLM360 Research Suite: K2 Loss Spike 2
-
+We encountered two major loss spikes while training K2.
+* The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after X checkpoints and lasted for ~34 checkpoints. We restarted training at checkpoint X and training returned to normal.
+* The second loss spike occurred after we restarted training at checkpoint X to fix the first loss spike, and lasted for ~8 checkpoints.
 
+We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
 <img src="k2_spike_1.png" alt="k2 spike 1"/>
 
 # Purpose