---
license: apache-2.0
---
# LLM360 Research Suite: K2 Loss Spike 2
We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2).
* The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after checkpoint 160 and lasted for ~34 checkpoints. We restarted training from checkpoint 160, and training returned to normal.
* The second loss spike occurred at checkpoint 186, after we had restarted training to fix the first spike, and lasted for ~8 checkpoints.
* For every spike checkpoint, we also uploaded the corresponding normal checkpoint for easy comparison. Each checkpoint is stored in its own branch.
We are releasing these checkpoints so others can study this phenomenon in large model training.
<img src="loss_spike.png" alt="k2 loss spikes"/>
## Purpose
Loss spikes remain a poorly understood phenomenon. By releasing these spike checkpoints and the associated training details, we hope others will use these artifacts to advance the world's understanding of this topic.
## All Checkpoints
| Checkpoints 186 to 192 | Checkpoints 194 to 200 |
| ----------- | ----------- |
| [Checkpoint 186](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_186) | [Checkpoint 194](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_194) |
| [Checkpoint 188](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_188) | [Checkpoint 196](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_196) |
| [Checkpoint 190](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_190) | [Checkpoint 198](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_198) |
| [Checkpoint 192](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_192) | [Checkpoint 200](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_200) |
To list all branches, run `git branch -a` in a local clone of the repository.
## Loss Spikes on the LLM360 Evaluation Suite
View all evaluations on our [Weights & Biases dashboard](https://wandb.ai/llm360/K2?nw=inng96ujjmr).
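For readers who prefer Python over `git`, here is a minimal sketch that enumerates the spike branches from the table and fetches one with the `huggingface_hub` client. It assumes the `spike_ckpt_<N>` naming shown in the table (even N from 186 to 200) holds for every spike branch. The helper names (`spike_branches`, `branch_url`, `list_remote_branches`, `download_checkpoint`) are hypothetical. Normal-checkpoint branch names are not assumed: `list_remote_branches` simply returns whatever branches the Hub reports.

```python
"""Sketch: enumerate and fetch K2-Spike-2 checkpoint branches.

Assumption: spike branches follow the spike_ckpt_<N> naming from the
checkpoint table. Downloads need the optional `huggingface_hub` package.
"""

REPO_ID = "LLM360/K2-Spike-2"
BRANCH_PREFIX = "spike_ckpt_"  # naming taken from the checkpoint table


def spike_branches(first: int = 186, last: int = 200, step: int = 2) -> list[str]:
    """Return branch names for the spike checkpoints listed in the table."""
    return [f"{BRANCH_PREFIX}{n}" for n in range(first, last + 1, step)]


def branch_url(branch: str, repo_id: str = REPO_ID) -> str:
    """Build the Hugging Face web URL for one branch of the repo."""
    return f"https://huggingface.co/{repo_id}/tree/{branch}"


def list_remote_branches(repo_id: str = REPO_ID) -> list[str]:
    """Ask the Hub for every branch (spike and normal) in the repo."""
    # Imported lazily so the pure helpers above work without the package.
    from huggingface_hub import list_repo_refs

    return sorted(ref.name for ref in list_repo_refs(repo_id).branches)


def download_checkpoint(branch: str, repo_id: str = REPO_ID) -> str:
    """Download one checkpoint branch and return its local directory."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=branch)


if __name__ == "__main__":
    # Offline: print the expected spike branches and their web URLs.
    for name in spike_branches():
        print(name, branch_url(name))
```

`list_remote_branches()` plays the same role as `git branch -a` without cloning. Each checkpoint of a 65B-parameter model is large, so check available disk space before calling `download_checkpoint`.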
## About the LLM360 Research Suite
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
## Citation
**BibTeX:**
```bibtex
@misc{llm360k2,
title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
author={The LLM360 Team},
year={2024},
}
```