📐 FineMath - a HuggingFaceTB Collection

📐 FineMath

updated May 5, 2025

FineMath datasets and ablation models

HuggingFaceTB/finemath

Viewer • Updated Feb 6, 2025 • 48.3M • 15k • 358

Note: FineMath datasets

HuggingFaceTB/FineMath-Llama-3B

3B • Updated Nov 27, 2025 • 95 • 22

Note: Llama 3B trained on a mix of FineMath and FineWeb-Edu. It is better at math and performs similarly to Llama in reasoning, knowledge, and common sense.

HuggingFaceTB/finemath-classifier

Text Classification • 0.1B • Updated Dec 19, 2024 • 1.11k • 13

Note: FineMath text classifier that scores text for mathematical reasoning and educational content.

HuggingFaceTB/finemath-ablation-finemath-4plus

3B • Updated Dec 19, 2024 • 26 • 1
HuggingFaceTB/finemath-ablation-finemath-3plus

3B • Updated Dec 19, 2024 • 5
HuggingFaceTB/finemath-ablation-infiwebmath-4plus

3B • Updated Dec 19, 2024 • 17 • 2
HuggingFaceTB/finemath-ablation-infiwebmath-3plus

3B • Updated Dec 19, 2024 • 5

Note: Ablations on FineMath subsets (continual pre-training of base Llama 3.2 3B on 60B tokens)

HuggingFaceTB/finemath-ablation-finemath-infimath-3plus

3B • Updated Dec 19, 2024 • 15
HuggingFaceTB/finemath-ablation-finemath-infimath-4plus

3B • Updated Dec 19, 2024 • 16 • 2

Note: Ablations on FineMath 3plus and 4plus (continual pre-training of base Llama 3.2 3B on 60B tokens)

HuggingFaceTB/finemath-ablation-fwedu

3B • Updated Dec 19, 2024 • 13
HuggingFaceTB/finemath-ablation-infiwebmath

3B • Updated Dec 19, 2024 • 7
HuggingFaceTB/finemath-ablation-owm

3B • Updated Dec 19, 2024 • 9

Note: Ablations on public math datasets, with FineWeb-Edu (FW-Edu) as a baseline (continual pre-training of base Llama 3.2 3B on 60B tokens)

HuggingFaceTB/finemath-ablation-3plus-160B

3B • Updated Dec 19, 2024 • 14
HuggingFaceTB/finemath-ablation-4plus-160B

3B • Updated Dec 19, 2024 • 17

Note: Longer ablations (160B tokens) on a mix of 40% FineWeb-Edu and 60% FineMath and InfiWebMath 3plus / 4plus
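
As a rough illustration of the 40% / 60% mix in the note above, the sketch below splits a 160B-token budget between the two sources. It assumes the percentages apply to token counts; the function name `split_token_budget` and the source labels are hypothetical and do not come from the collection or its training code.

```python
from typing import Dict


def split_token_budget(total_tokens: int, shares_percent: Dict[str, int]) -> Dict[str, int]:
    """Split a token budget by integer percentage shares that must sum to 100."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding on large token counts.
    return {name: total_tokens * pct // 100 for name, pct in shares_percent.items()}


# Assumed reading of the note: 160B tokens total, 40% FineWeb-Edu, 60% math data.
budget = split_token_budget(
    160_000_000_000,
    {"fineweb-edu": 40, "finemath+infiwebmath": 60},
)

for name, tokens in budget.items():
    print(f"{name}: {tokens / 1e9:.0f}B tokens")
```

Under this reading, the run would see roughly 64B tokens of FineWeb-Edu and 96B tokens of math data.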