The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published 4 days ago • 143
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning • Paper • 2509.02544 • Published 4 days ago • 100
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published 12 days ago • 179
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success • Paper • 2508.04280 • Published Aug 6 • 35
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models • Paper • 2507.12566 • Published Jul 16 • 14
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • Paper • 2507.01006 • Published Jul 1 • 234
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking • Paper • 2505.08581 • Published May 13 • 9
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding • Paper • 2504.10465 • Published Apr 14 • 27
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published Apr 14 • 284
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper • 2504.01990 • Published Mar 31 • 302