---
license: apache-2.0
---

# LLM360 Research Suite: K2 Loss Spike 2

We encountered two major loss spikes while training K2.

* The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after X checkpoints and lasted for ~34 checkpoints. We restarted training at checkpoint X, and training returned to normal.

* The second loss spike occurred after we restarted training to fix the first loss spike at checkpoint X, and it lasted for ~8 checkpoints.

We are releasing these checkpoints so others can study this interesting phenomenon in large-model training.

<img src="k2_spike_1.png" alt="k2 spike 1"/>

# Purpose

Loss spikes remain a poorly understood phenomenon. By releasing these spikes and the associated training details, we hope others will use these artifacts to further the world's knowledge of this topic.

## All Checkpoints

| Checkpoints 186-192 | Checkpoints 194-200 |
| ----------- | ----------- |
| [Checkpoint 186](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_186) | [Checkpoint 194](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_194) |
| [Checkpoint 188](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_188) | [Checkpoint 196](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_196) |
| [Checkpoint 190](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_190) | [Checkpoint 198](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_198) |
| [Checkpoint 192](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_192) | [Checkpoint 200](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_200) |

Each checkpoint is stored on its own branch. To list all branches in a local clone of this repository, run `git branch -a`.
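
If you want to work with the branch list programmatically, the sketch below turns the text printed by `git branch -a` into sorted checkpoint numbers. It is a minimal illustration, not part of any LLM360 tooling: `checkpoint_steps` and `revision_for` are hypothetical helper names, and the code assumes every checkpoint branch is named `spike_ckpt_<N>`, as in the links in the checkpoint table.

```python
import re
from typing import List

# Assumption: checkpoint branches are named spike_ckpt_<N>, matching the
# links in the checkpoint table. The name may be preceded by a remote
# prefix such as "remotes/origin/".
BRANCH_RE = re.compile(r"(?:^|/)spike_ckpt_(\d+)$")


def checkpoint_steps(git_branch_output: str) -> List[int]:
    """Return the sorted, de-duplicated checkpoint numbers found in the
    output of `git branch -a`."""
    steps = set()
    for line in git_branch_output.splitlines():
        # `git branch -a` marks the current branch with "* " and shows
        # symbolic refs as "HEAD -> target"; keep only the branch name.
        name = line.strip().lstrip("* ").split(" -> ")[0]
        match = BRANCH_RE.search(name)
        if match:
            steps.add(int(match.group(1)))
    return sorted(steps)


def revision_for(step: int) -> str:
    """Build the (assumed) branch name for a given checkpoint number."""
    return f"spike_ckpt_{step}"


if __name__ == "__main__":
    sample = """* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
  remotes/origin/spike_ckpt_188
  remotes/origin/spike_ckpt_186
"""
    print(checkpoint_steps(sample))  # [186, 188]
    print(revision_for(186))  # spike_ckpt_186
```

A branch name built this way can be passed as the `revision` argument of `from_pretrained` in `transformers` or of `hf_hub_download` in `huggingface_hub` to fetch a specific checkpoint instead of the default branch. Local and remote copies of the same branch collapse to a single number because the helper collects results in a set.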

## Loss Spikes on the LLM360 Evaluation Suite

something here

## About the LLM360 Research Suite

The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.