MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Paper • 2509.25531