BramVanroy's Collections

CommonCrawl-Creative Commons (C5)

updated 6 days ago

Raw CommonCrawl crawls, annotated with Creative Commons license information


  • BramVanroy/CommonCrawl-CreativeCommons

    Updated 7 days ago • 739M rows • 656 downloads • 31 likes

  • BramVanroy/CommonCrawl-CreativeCommons-fine

    Updated 7 days ago • 75.1M rows • 75 downloads • 1 like

    Note: Retains only samples that are also present in FineWeb or FineWeb-2.


  • BramVanroy/CommonCrawl-CreativeCommons-recommended

    Updated 7 days ago • 32.8M rows • 146 downloads • 1 like

    Note: Strong filters: retains only FineWeb data and removes non-commercial data and Wiki data.
