RL and Agents - a aishiknagar Collection

aishiknagar 's Collections

Predictive and Classification tasks

LLMs foe evaluation and Judge models

Analysis papers

Positions and Surveys

RL and Agents

updated Jul 21

s3: You Don't Need That Much Data to Train a Search Agent via RL

Paper • 2505.14146 • Published May 20 • 18
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI

Paper • 2505.19443 • Published May 26 • 15
ARM: Adaptive Reasoning Model

Paper • 2505.20258 • Published May 26 • 45
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26 • 44
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

Paper • 2505.21457 • Published May 27 • 14
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Paper • 2505.23754 • Published May 29 • 16
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 136
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 180
Resa: Transparent Reasoning Models via SAEs

Paper • 2506.09967 • Published Jun 11 • 22
Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17 • 30
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 263
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Paper • 2506.14965 • Published Jun 17 • 49
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Paper • 2506.15211 • Published Jun 18 • 36
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14 • 88
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14 • 29
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7 • 60
LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models

Paper • 2507.09411 • Published Jul 12 • 3
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17 • 48