---
license: apache-2.0
language:
- en
---

# SmolLM2 nanotron checkpoints

## Description

Here you can find the [nanotron](https://github.com/huggingface/nanotron/) checkpoints of the [SmolLM2](https://github.com/huggingface/smollm/) 1.7B, 360M and 135M models, together with their optimizer states. The goal is to facilitate continual pre-training of these models with nanotron.

For each model size, we release both the final checkpoint and the pre-decay checkpoint. The models were trained with the Warmup-Stable-Decay (WSD) scheduler, so you can take the pre-decay checkpoint and continue training at the same stable learning rate before performing the decay. For more details on this scheduler, you can check this [paper](https://arxiv.org/abs/2405.18392).

Below is the repo structure:

```
├── 135M
│   ├── final
│   └── pre-decay
├── 1700M
│   ├── final
│   └── pre-decay
└── 360M
    ├── final
    └── pre-decay
        ├── lr_scheduler 📁
        ├── model 📁
        ├── optimizer 📁
        ├── random 📁
        ├── checkpoint_metadata.json 📄
        ├── config.yaml 📄
        └── model_config.json 📄
```

## Download and training

To download only one folder, e.g. the final checkpoint of the 135M model, you can use `huggingface-cli`:

```bash
# pip install -U "huggingface_hub[cli]"
huggingface-cli download HuggingFaceTB/SmolLM2-nanotron-ckpt --include "135M/final/*" --local-dir ./
```

For details on launching SmolLM trainings with nanotron, refer to: https://github.com/huggingface/smollm/tree/main/pre-training
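
If you prefer to download from Python rather than the CLI, below is a minimal sketch using `huggingface_hub.snapshot_download` with the same repository and folder filter as the command above; the local target directory is an arbitrary choice.

```python
# pip install -U huggingface_hub
from huggingface_hub import snapshot_download

# Fetch only the final checkpoint of the 135M model, mirroring the CLI command above.
snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-nanotron-ckpt",
    allow_patterns=["135M/final/*"],
    local_dir="./",  # arbitrary local directory
)
```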
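
As a quick illustration of the WSD schedule mentioned in the Description, here is a minimal sketch of the learning-rate shape. The step counts, peak learning rate and linear decay below are placeholder choices for illustration only; they are not the values used to train SmolLM2 and this is not nanotron's scheduler implementation.

```python
def wsd_lr(step: int, peak_lr: float = 5e-4, warmup_steps: int = 2_000,
           stable_steps: int = 100_000, decay_steps: int = 10_000) -> float:
    """Warmup-Stable-Decay learning rate (all values are illustrative placeholders)."""
    if step < warmup_steps:
        # Linear warmup up to the peak learning rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable plateau: the pre-decay checkpoints were saved during this phase,
        # so continued pre-training resumes at this constant learning rate.
        return peak_lr
    # Decay phase (linear here, as one common choice) down to zero.
    progress = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * max(0.0, 1.0 - progress)
```

Resuming from a pre-decay checkpoint amounts to extending the stable phase with more tokens and only running the decay once training is finished.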