Towards a Unified View of Large Language Model Post-Training • Paper • 2509.04419 • Published 1 day ago • 45
Benchmarking Optimizers for Large Language Model Pretraining • Paper • 2509.01440 • Published 5 days ago • 20
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR • Paper • 2509.02522 • Published 4 days ago • 22
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic • Paper • 2509.01363 • Published 5 days ago • 27
Jointly Reinforcing Diversity and Quality in Language Model Generations • Paper • 2509.02534 • Published 4 days ago • 22
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning • Paper • 2509.02479 • Published 4 days ago • 76
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published 4 days ago • 141
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning • Paper • 2508.21104 • Published 9 days ago • 27
StepWiser: Stepwise Generative Judges for Wiser Reasoning • Paper • 2508.19229 • Published 11 days ago • 19
Diffusion Language Models Know the Answer Before Decoding • Paper • 2508.19982 • Published 10 days ago • 22
Predicting the Order of Upcoming Tokens Improves Language Modeling • Paper • 2508.19228 • Published 11 days ago • 20
Beyond Transcription: Mechanistic Interpretability in ASR • Paper • 2508.15882 • Published 16 days ago • 83
Provable Benefits of In-Tool Learning for Large Language Models • Paper • 2508.20755 • Published 9 days ago • 9
TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning • Paper • 2508.20374 • Published 9 days ago • 21
AWorld: Orchestrating the Training Recipe for Agentic AI • Paper • 2508.20404 • Published 9 days ago • 37
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers • Paper • 2508.20453 • Published 9 days ago • 56
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning • Paper • 2508.18966 • Published 11 days ago • 55