GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging • Paper • 2508.18993 • Published 11 days ago • 2
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents • Paper • 2508.02085 • Published Aug 4 • 1
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving • Paper • 2505.21577 • Published May 27 • 2
ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs • Paper • 2502.13162 • Published Feb 16