Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2509.02479

about 2 hours ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
scikit-learn/sklearn-transformers

Text Classification • Updated Mar 16, 2023 • 25
keras-io/swin-transformers

Image Classification • Updated Jul 9, 2024 • 14 • 4
keras-io/structured-data-classification-grn-vsn

Tabular Classification • Updated Jul 9, 2024 • 43 • 8

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

Paper • 2509.01215 • Published 5 days ago • 42
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published 6 days ago • 74

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published 9 days ago • 103
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 14 days ago • 22
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published 9 days ago • 72
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published 8 days ago • 12

Bugai's Collection

about 15 hours ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published 9 days ago • 85
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published 13 days ago • 77
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published 11 days ago • 39
VibeVoice Technical Report

Paper • 2508.19205 • Published 11 days ago • 120

LLM - Agentic RL

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 5 days ago • 59
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published 4 days ago • 100
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143

Interessting papers

about 1 hour ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published 9 days ago • 28
FNet: Mixing Tokens with Fourier Transforms

Paper • 2105.03824 • Published May 9, 2021 • 1
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28

about 2 hours ago

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published 9 days ago • 9
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published 9 days ago • 15

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 24 days ago • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 23 days ago • 17
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 18 days ago • 46

about 2 hours ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
scikit-learn/sklearn-transformers

Text Classification • Updated Mar 16, 2023 • 25
keras-io/swin-transformers

Image Classification • Updated Jul 9, 2024 • 14 • 4
keras-io/structured-data-classification-grn-vsn

Tabular Classification • Updated Jul 9, 2024 • 43 • 8

LLM - Agentic RL

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 5 days ago • 59
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published 4 days ago • 100
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 4 days ago • 143
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

Paper • 2509.01215 • Published 5 days ago • 42
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published 6 days ago • 74

Interessting papers

about 1 hour ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published 9 days ago • 28
FNet: Mixing Tokens with Fourier Transforms

Paper • 2105.03824 • Published May 9, 2021 • 1
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published 9 days ago • 103
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 14 days ago • 22
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published 9 days ago • 72
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published 8 days ago • 12

about 2 hours ago

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published 9 days ago • 9
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 4 days ago • 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published 9 days ago • 15

Bugai's Collection

about 15 hours ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published 9 days ago • 85
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published 13 days ago • 77
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published 11 days ago • 39
VibeVoice Technical Report

Paper • 2508.19205 • Published 11 days ago • 120

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 24 days ago • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 23 days ago • 17
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 18 days ago • 46

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets OCR模型免费转Markdown Pricing 模型下载攻略