Work in progress! This model has so far been trained on about 15% of the Swedish portion of FineWeb-2. It is intended for my research and has not been evaluated more broadly yet.

Training parameters (see the configuration sketch below):

  • Learning rate: 5e-4
  • LR scheduler: cosine
  • Warmup ratio: 0.05
  • Per-device batch size: 1
  • Hardware: 8× A100 (40 GB) GPUs
  • Gradient accumulation steps: 32
  • Effective batch size: 256 (1 per device × 8 GPUs × 32 accumulation steps)
  • Max. context length: 8192 tokens
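
These hyperparameters map onto a Hugging Face `TrainingArguments` call roughly as follows. This is a minimal sketch, not the actual training script; the output directory and the note on sequence packing are illustrative assumptions.

```python
# Minimal sketch of the configuration above, assuming the Hugging Face
# Trainer API. Only the hyperparameters listed in the card are confirmed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smollm-360m-cpt-fineweb-swedish",  # assumption: local output path
    learning_rate=5e-4,              # Learning rate: 5e-4
    lr_scheduler_type="cosine",      # LR scheduler: cosine
    warmup_ratio=0.05,               # Warmup ratio: 0.05
    per_device_train_batch_size=1,   # batch size 1 on each of the 8 GPUs
    gradient_accumulation_steps=32,  # 1 x 8 GPUs x 32 = effective batch of 256
)
# The max. context length of 8192 tokens is applied when tokenizing/packing
# the corpus, not through TrainingArguments.
```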