-
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Paper • 2508.21104 • Published • 28 -
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Paper • 2508.19813 • Published • 20 -
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
Paper • 2508.19060 • Published • 8 -
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
Paper • 2508.17198 • Published • 6
Collections
Discover the best community collections!
Collections including paper arxiv:2508.20931
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 9 -
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 76 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 66 -
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Paper • 2508.03501 • Published • 55 -
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 51 -
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Paper • 2508.01415 • Published • 7
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 9 -
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Paper • 2508.20453 • Published • 56 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 275 • 95 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 35 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 98 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89
-
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Paper • 2508.21104 • Published • 28 -
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Paper • 2508.19813 • Published • 20 -
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
Paper • 2508.19060 • Published • 8 -
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
Paper • 2508.17198 • Published • 6
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 9 -
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 76 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 9 -
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Paper • 2508.20453 • Published • 56 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 66 -
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Paper • 2508.03501 • Published • 55 -
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 51 -
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Paper • 2508.01415 • Published • 7
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 275 • 95 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 35 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 98 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89