SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 2 days ago • 46
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 5 days ago • 87
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 11 days ago • 48
Reasoning Datasets Collection Distilled synthetic Reasoning datasets • 7 items • Updated 21 days ago • 54
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 63
Article Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas By MaxNomic and 4 others • about 1 month ago • 30
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14 • 55
high-quality Chinese training datasets Collection a suite of high-quality Chinese datasets, used for pretraining, fine-tuning or preference alignment. And the models trained on these datasets. • 12 items • Updated Jan 17 • 11
Article Synthetic Data Generation with FastData and Hugging Face By asoria • Jan 7 • 14
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3 • 24
Article Finding Moroccan Arabic (Darija) in Fineweb 2 By omarkamali and 3 others • Dec 8, 2024 • 22
Article Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well By rubenohana • Dec 2, 2024 • 17
OLMo 2 Collection Artifacts for the second set of OLMo models. • 22 items • Updated 12 days ago • 83