
Collections

Discover the best community collections!

Collections including paper arxiv:2508.20931

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published 9 days ago • 28
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

Paper • 2508.19813 • Published 10 days ago • 20
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

Paper • 2508.19060 • Published 11 days ago • 8
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Paper • 2508.17198 • Published 13 days ago • 6

about 2 hours ago

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published 9 days ago • 9
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published 9 days ago • 15

about 12 hours ago

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 66
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5 • 55
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published about 1 month ago • 51
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

Paper • 2508.01415 • Published Aug 2 • 7

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published 9 days ago • 15

about 24 hours ago

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published 9 days ago • 9
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published 9 days ago • 56
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published 9 days ago • 15

about 20 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 275 • 95
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 35
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 98
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 89

