Xing Han Lù
xhluca
Organizations
AgentRewardBench
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Paper • 2504.08942
McGill-NLP/agent-reward-bench
Running
Agent Reward Bench Demo
Explore agent trajectories and judgments in web benchmarks
Runtime error
Agent Reward Bench Leaderboard
Leaderboard for AgentRewardBench
WebLINX
BM25S
https://github.com/xhluca/bm25s