HuggingFaceTB's Collections
🌌 Cosmopedia
Updated May 5
Resources for the Cosmopedia dataset
Upvotes: 9
HuggingFaceTB/cosmopedia
Viewer • Updated Aug 12, 2024 • 31.1M • 5k • 634
HuggingFaceTB/cosmo-1b
Text Generation • 2B • Updated Jul 8, 2024 • 408 • 132
Web clusters 🕸
Running • 6
Browse and explore clustered web samples by educational value
HuggingFaceTB/cosmopedia-100k
Viewer • Updated Feb 19, 2024 • 100k • 447 • 45
HuggingFaceTB/cosmopedia-meta
Viewer • Updated Feb 20, 2024 • 31.1M • 26 • 2
HuggingFaceTB/smollm-corpus
Viewer • Updated Sep 6, 2024 • 237M • 14.3k • 368