DeepMedix-R1 Collection Chest X-ray foundation model with step reasoning. • 2 items • Updated Jul 14 • 4
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published 11 days ago • 35
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 19 days ago • 117
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 19 days ago • 80
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25 • 9
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters Paper • 2507.13618 • Published Jul 18 • 16
Decoding Algorithm for LLM Reasoning Collection Collections of Decoding Algorithm for LLM Reasoning • 2 items • Updated Jul 25 • 1
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46 • 3
From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios Paper • 2506.20279 • Published Jun 25 • 19
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 47
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33