GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging • Paper 2508.18993 • Published 11 days ago
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents • Paper 2508.02085 • Published Aug 4
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving • Paper 2505.21577 • Published May 27
ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs • Paper 2502.13162 • Published Feb 16
Open LLM Leaderboard 🏆 • Track, rank and evaluate open LLMs and chatbots