Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 17 days ago • 42
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 12 items • Updated 4 days ago • 79
Reasoning Datasets Collection Distilled synthetic Reasoning datasets • 7 items • Updated 22 days ago • 55
DeepSeek R1 (All Versions) Collection DeepSeek R1 - the most powerful open-source reasoning model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 16 days ago • 195
gemini-2.0-flash-thinking-exp-1219 Datasets Collection Existing datasets with responses regenerated using gemini-2.0-flash-thinking-exp-1219. Currently only single-turn. • 15 items • Updated Jan 16 • 4
gemini-exp-1206 Datasets Collection Existing datasets with responses regenerated using gemini-exp-1206. Currently only single-turn. • 3 items • Updated Jan 16 • 1
story writing favourites Collection Models I personally liked for generating stories in the past. Not a recommendation, many of these are outdated. • 19 items • Updated 11 days ago • 42
LLMs - Best of 2025 Collection Most interesting LLMs to play around with in 2025! (will be updated throughout the year) • 19 items • Updated 9 days ago • 2
Reasoning Models Collection If this really helps, please upvote for the researchers' hard work • 14 items • Updated Jan 21 • 1
CoT Datasets Collection If this really helps, please upvote for the researchers' hard work • 15 items • Updated Jan 20 • 1
Model Merging Collection Model merging is a very popular technique in LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 231