BramVanroy/CommonCrawl-CreativeCommons
739M • 656 • 31
Raw CommonCrawl crawls, annotated with Creative Commons license information
Note: Retains only samples that are also present in FineWeb or FineWeb-2.
Note: Strong filters applied: retains only FineWeb data and removes non-commercial data and Wiki data.