NVIDIA Nemotron Collection Open, Production-ready Enterprise Models. Nvidia Open Model license. • 3 items • Updated 2 days ago • 39
👁️ LFM2-VL Collection LFM2-VL is our first series of vision-language models, designed for on-device deployment. • 6 items • Updated 1 day ago • 31
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 51
Article LLM agent experiment with a purpose-built RPG and tool calls. (Work in progress) By neph1 • 15 days ago • 7
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • 23 days ago • 156
Article AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model By PeterKruger • Apr 29 • 6
JSON Mode Reasoning Collection A collection of structured-output reasoning datasets • 3 items • Updated 29 days ago • 3
Tool Use Reasoning Collection A collection of tool-use reasoning datasets in Hermes format • 5 items • Updated 29 days ago • 8
Article Bourbaki (7b): SOTA 7B Algorithms for Putnam Bench (Part I: Reasoning MDPs) By hba123 and 2 others • Jul 13 • 11
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 631
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published Oct 2, 2024 • 36