MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published 7 days ago • 15
RAVine: Reality-Aligned Evaluation for Agentic Search Paper • 2507.16725 • Published 30 days ago • 28
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published 30 days ago • 61
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Paper • 2507.01663 • Published Jul 2 • 5
Kimina Prover Preview Collection • State-of-the-Art Models for Formal Mathematical Reasoning • 5 items • Updated Apr 28 • 33
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 73
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 30