HuggingFaceTB's Collections

🌌 Cosmopedia

updated May 5

Resources for the Cosmopedia dataset

  • HuggingFaceTB/cosmopedia

    Viewer • Updated Aug 12, 2024 • 31.1M • 5k • 634

  • HuggingFaceTB/cosmo-1b

    Text Generation • 2B • Updated Jul 8, 2024 • 408 • 132

  • Web clusters 🕸

    Running • 6 • Browse and explore clustered web samples by educational value


  • HuggingFaceTB/cosmopedia-100k

    Viewer • Updated Feb 19, 2024 • 100k • 447 • 45

  • HuggingFaceTB/cosmopedia-meta

    Viewer • Updated Feb 20, 2024 • 31.1M • 26 • 2

  • HuggingFaceTB/smollm-corpus

    Viewer • Updated Sep 6, 2024 • 237M • 14.3k • 368