victormiller committed on
Commit
32b8445
1 Parent(s): 74bef6c

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -5,8 +5,11 @@ license: apache-2.0
  license: apache-2.0
  ---
  # LLM360 Research Suite: K2 Loss Spike 2
- During the first K2 training phase, we encountered two loss spikes. This repo contains 8 checkpoints that capture the training dynamics during the loss spikes.
+ We encountered two major loss spikes while training K2.
+ * The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after X checkpoints and lasted ~34 checkpoints. We restarted training at checkpoint X and training returned to normal.
+ * The second loss spike occurred after restarting training to fix the first loss spike at checkpoint X and lasted for ~8 checkpoints.
 
+ We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
  <img src="k2_spike_1.png" alt="k2 spike 1"/>
 
  # Purpose