loubnabnl (HF staff) committed 1d4b538 (verified · parent: 8370447)

Update README.md

Files changed (1):

1. README.md (+4 −0)
README.md CHANGED

@@ -5,10 +5,12 @@ language:
 ---
 # SmolLM2 nanotron checkpoints
 
+## Description
 Here you can find the [nanotron](https://github.com/huggingface/nanotron/) checkpoints of [SmolLM2](https://github.com/huggingface/smollm/) 1.7B, 360M and 135M models, with their optimizer states. The goal is to facilitate continual pre-training of these models with nanotron.
 
 For each model size, we release both the final checkpoint and the pre-decay checkpoint. The models were trained with the Warmup-Stable-Decay (WSD) scheduler, so one can take the pre-decay checkpoint and continue training at the same stable learning rate before performing the decay. For more details on this scheduler, see this [paper](https://arxiv.org/abs/2405.18392).
 
+Below is the repo structure:
 ```
 ├── 135M
 │ ├── final
@@ -27,6 +29,8 @@ For each model size, we release both the final checkpoint and the pre-decay chec
 ├── config.yaml 📄
 └── model_config.json 📄
 ```
+
+## Download and training
 To download only one folder, e.g. the final checkpoint of the 135M model, you can use `huggingface-cli`:
 
 ```bash
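
The hunk ends at the opening of the bash block, so the command itself is not shown on this page. As a sketch of what a single-folder download looks like with `huggingface-cli` (the repo id below is a placeholder, not taken from this page):

```bash
# Fetch only the 135M final checkpoint from the Hub.
# <repo_id> is a placeholder for this repository's id on the Hub.
huggingface-cli download <repo_id> \
  --include "135M/final/*" \
  --local-dir checkpoints/135M-final
```

`--include` restricts the download to files matching the glob, so only that subfolder is fetched, and `--local-dir` writes the files into the given directory instead of the Hub cache.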
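
As background on the scheduler referenced in the description, a WSD schedule is, schematically (notation mine, not taken from the repo or the linked paper):

$$
\eta(t) =
\begin{cases}
\eta_{\max}\, t / T_w, & 0 \le t < T_w \ \text{(warmup)} \\
\eta_{\max}, & T_w \le t < T_d \ \text{(stable)} \\
\eta_{\max}\, f\!\left(\frac{t - T_d}{T - T_d}\right), & T_d \le t \le T \ \text{(decay)}
\end{cases}
$$

where $f$ decreases from 1 toward 0 over the decay phase. The pre-decay checkpoints are taken at the end of the stable phase, just before $T_d$, which is why continued training can resume directly at the stable rate $\eta_{\max}$.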