The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published 3 days ago • 139
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning • Paper • 2509.02479 • Published 3 days ago • 76
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion • Paper • 2509.01215 • Published 5 days ago • 42
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper • 2509.00676 • Published 6 days ago • 73
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning • Paper • 2509.02544 • Published 3 days ago • 98
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use • Paper • 2509.01055 • Published 5 days ago • 59
Baichuan-M2: Scaling Medical Capability with Large Verifier System • Paper • 2509.02208 • Published 4 days ago • 32
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR • Paper • 2509.02522 • Published 3 days ago • 22
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic • Paper • 2509.01363 • Published 5 days ago • 27
Jointly Reinforcing Diversity and Quality in Language Model Generations • Paper • 2509.02534 • Published 3 days ago • 22
GenCompositor: Generative Video Compositing with Diffusion Transformer • Paper • 2509.02460 • Published 4 days ago • 20
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning • Paper • 2509.01644 • Published 4 days ago • 24
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation • Paper • 2509.02040 • Published 4 days ago • 13
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision • Paper • 2509.01360 • Published 5 days ago • 11
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games • Paper • 2509.01052 • Published 5 days ago • 18
Universal Deep Research: Bring Your Own Model and Strategy • Paper • 2509.00244 • Published 7 days ago • 10
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing • Paper • 2509.01984 • Published 4 days ago • 4
Fantastic Pretraining Optimizers and Where to Find Them • Paper • 2509.02046 • Published 4 days ago • 10
MedDINOv3: How to adapt vision foundation models for medical image segmentation? • Paper • 2509.02379 • Published 4 days ago • 2
Improving Large Vision and Language Models by Learning from a Panel of Peers • Paper • 2509.01610 • Published 4 days ago • 2
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views • Paper • 2509.01250 • Published 5 days ago • 1
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction • Paper • 2509.00581 • Published 6 days ago • 3
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection • Paper • 2509.00578 • Published 6 days ago • 1
Metis: Training Large Language Models with Advanced Low-Bit Quantization • Paper • 2509.00404 • Published 7 days ago • 3
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models • Paper • 2508.20586 • Published 9 days ago • 2