DeepMedix-R1 Collection Chest X-ray foundation model with step reasoning. • 2 items • Updated Jul 14 • 4
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published 11 days ago • 35
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 19 days ago • 117
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 18 days ago • 80
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25 • 9
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters Paper • 2507.13618 • Published Jul 18 • 16
Decoding Algorithm for LLM Reasoning Collection Collections of Decoding Algorithm for LLM Reasoning • 2 items • Updated Jul 25 • 1
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios Paper • 2506.20279 • Published Jun 25 • 19
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 47
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 104
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 46
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11 • 55
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14 • 17
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21 • 45
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 55
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 52