K2-Spike-2 / README.md
license: apache-2.0

LLM360 Research Suite: K2 Loss Spike 2

We encountered two major loss spikes while training K2.

  • The first loss spike occurred after X checkpoints and lasted for ~34 checkpoints. We restarted training at checkpoint X and training returned to normal.
  • The second loss spike occurred after we restarted training at checkpoint X to fix the first spike, and lasted for ~8 checkpoints.

We are releasing these checkpoints so others can study this interesting phenomenon in large model training.

[Figure: k2 spike 1]

Purpose

Loss spikes are still a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge on this topic.

All Checkpoints

[to find all branches: git branch -a]
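If you want the branch list in a script rather than in a terminal, the output of `git branch -a` can be parsed directly. The following is a minimal sketch, assuming you have already cloned this repository locally and that the remote is named `origin`. The helper names `parse_remote_branches` and `list_checkpoint_branches` are hypothetical and not part of any LLM360 tooling.

```python
import subprocess


def parse_remote_branches(git_output: str, remote: str = "origin") -> list[str]:
    """Extract remote branch names from the text printed by `git branch -a`."""
    prefix = f"remotes/{remote}/"
    names = []
    for line in git_output.splitlines():
        # Drop the leading marker ("*" or "+") and indentation.
        entry = line.lstrip("*+ ").strip()
        # Skip local branches and the symbolic "HEAD -> origin/..." entry.
        if not entry.startswith(prefix) or " -> " in entry:
            continue
        names.append(entry[len(prefix):])
    return names


def list_checkpoint_branches(repo_dir: str) -> list[str]:
    """Run `git branch -a` inside a local clone and return its remote branch names."""
    result = subprocess.run(
        ["git", "branch", "-a"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_remote_branches(result.stdout)
```

Calling `list_checkpoint_branches("path/to/clone")` returns the branch names as a list of strings, each of which can then be passed to `git checkout`. How the checkpoint branches in this repository are named is not stated here, so inspect the returned list before relying on any naming pattern.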

Loss Spikes on the LLM360 Evaluation Suite

something here

About the LLM360 Research Suite

The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.