Article Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies By prithivMLmods • 6 days ago • 16
Article Agentic RAG Stack (1/5) - Index and retrieve documents for vector search using Sentence Transformers and DuckDB By davidberenstein1957 • 27 days ago • 18
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated Apr 1, 2024 • 43
Synthetic Data Generator Collection A collection of tools and datasets related to no-code synthetic data generation. • 21 items • Updated 13 days ago • 7
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated 13 days ago • 70
Article Let’s make a generation of amazing image generation models By burtenshaw and 4 others • Nov 26, 2024 • 34
Article How to optimize your data labelling project with custom interfaces By burtenshaw and 9 others • Oct 16, 2024 • 18
Article 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30, 2024 • 37
Article ⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2 By burtenshaw • Jun 3, 2024 • 26
Article 🧑‍⚖️ "Replacing Judges with Juries" using distilabel By alvarobartt • May 3, 2024 • 17
Article ⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together By burtenshaw • Apr 29, 2024 • 29