SmolLM2 nanotron checkpoints
Description
Here you can find the nanotron checkpoints of the SmolLM2 1.7B, 360M and 135M models, along with their optimizer states. The goal is to facilitate continual pre-training of these models with nanotron.
For each model size, we release both the final checkpoint and the pre-decay checkpoint. The models were trained with the Warmup-Stable-Decay (WSD) learning-rate scheduler, so you can take the pre-decay checkpoint and continue training at the same stable learning rate before performing the decay. For more details on this scheduler, you can check this paper.
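For example, once you have downloaded a pre-decay checkpoint (see the download section below), you can inspect the stable learning rate recorded in its config.yaml before resuming. This is only a sketch: the learning_rate_scheduler key assumes nanotron's usual config layout, which may differ between versions.
# Print the learning-rate scheduler section of the 135M pre-decay checkpoint's
# config.yaml; when continuing training, keep the same (stable) learning rate
# and only perform the decay at the end of your run.
grep -A 8 "learning_rate_scheduler" 135M/pre-decay/config.yaml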
Below is the repo structure:
├── 135M
│   ├── final
│   └── pre-decay
├── 1700M
│   ├── final
│   └── pre-decay
└── 360M
    ├── final
    └── pre-decay
        ├── lr_scheduler
        ├── model
        ├── optimizer
        ├── random
        ├── checkpoint_metadata.json
        ├── config.yaml
        └── model_config.json
Download and training
To download only one folder, e.g. the final checkpoint of the 135M model, you can use huggingface-cli:
# pip install -U "huggingface_hub[cli]"
huggingface-cli download HuggingFaceTB/SmolLM2-nanotron-ckpt --include "135M/final/*" --local-dir ./
For details on launching SmolLM training runs with nanotron, refer to: https://github.com/huggingface/smollm/tree/main/pre-training
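As a rough orientation, nanotron training runs are typically launched with torchrun and a YAML config file. The command below is an illustrative sketch, not the official recipe: the run_train.py entry point, the CUDA_DEVICE_MAX_CONNECTIONS setting, the GPU count, and config_continual.yaml (a hypothetical config adapted from the checkpoint's config.yaml) are all assumptions; follow the linked instructions for the exact steps.
# Illustrative sketch of a nanotron launch for continual pre-training.
# config_continual.yaml is a hypothetical config adapted from e.g.
# 135M/pre-decay/config.yaml, keeping the stable learning rate and pointing
# the resume path at the downloaded checkpoint.
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file config_continual.yaml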