HuggingFaceTB/SmolLM2-nanotron-ckpt


loubnabnl (HF staff) committed on Dec 1, 2024

Commit 1d4b538 · verified · 1 Parent(s): 8370447

Update README.md

Files changed (1)

README.md +4 -0


@@ -5,10 +5,12 @@ language:
 ---
 # SmolLM2 nanotron checkpoints
+## Description
 Here you can find the [nanotron](https://github.com/huggingface/nanotron/) checkpoints of [SmolLM2](https://github.com/huggingface/smollm/) 1.7B, 360M and 135M models, with their optimizer states. The goal is to facilitate continual-pre-training of these models with nanotron.
 For each model size, we release both the final checkpoint and the pre-decay checkpoint. The models were trained using the Warmup-Stable-Decay (WSD) scheduler, so one can take the pre-decay checkpoint and continue training using the same stable learning rate value before performing the decay. For more details on this scheduler, you can check this [paper](https://arxiv.org/abs/2405.18392).
+Below is the repo structure:
 ```
 ├── 135M
 │   ├── final
@@ -27,6 +29,8 @@ For each model size, we release both the final checkpoint and the pre-decay chec
         ├── config.yaml 📄
         └── model_config.json 📄
 ```
+## Download and training
 To download only one folder, e.g the final checkpoint of the 135M model, you can use `huggingface-cli`
 ```bash