Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 14 days ago • 77
STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts Paper • 2602.14265 • Published 8 days ago • 20
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 6 days ago • 20
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 10 days ago • 50
Multi-agent cooperation through in-context co-player inference Paper • 2602.16301 • Published 5 days ago • 15
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 5 days ago • 9
DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning Paper • 2602.00983 • Published 22 days ago • 1
R^3L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification Paper • 2601.03715 • Published Jan 7 • 2
R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning Paper • 2601.19620 • Published 27 days ago • 1
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models Paper • 2602.10224 • Published 13 days ago • 19
CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning Paper • 2601.15141 • Published Jan 21 • 1
STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens Paper • 2602.15620 • Published 6 days ago • 3
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents Paper • 2602.14234 • Published 8 days ago • 21
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO Paper • 2602.06422 • Published 17 days ago • 44