MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering Paper • 2410.07095 • Published Oct 9, 2024 • 8
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 143
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published 23 days ago • 23
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published Nov 11 • 41
Jackson0018/Preference_Set_Qwen2.5-3B-Instruct_INFV_ref_as_gt_True_IterRet_individual_recall_True_top_k_30 Viewer • Updated Sep 28 • 21.4k • 20 • 1
Jackson0018/Preference_Set_Qwen2.5-3B-Instruct_JEmb_ref_as_gt_True_IterRet_individual_recall_True_top_k_30 Viewer • Updated Sep 12 • 25.1k • 10 • 1
Jackson0018/Preference_Set_Llama-3.2-3B-Instruct_INFV_ref_as_gt_True_IterRet_individual_recall_True_top_k_30 Viewer • Updated Sep 18 • 21.7k • 21 • 1
Jackson0018/Preference_Set_Qwen2.5-3B-Instruct_BGE_ref_as_gt_True_IterRet_individual_recall_True_top_k_30 Viewer • Updated Sep 20 • 21.2k • 11 • 1
Jackson0018/Preference_Set_Llama-3.2-3B-Instruct_DPO_ref_as_gt_True_IterRet_individual_recall_True_top_k_30 Viewer • Updated Aug 4 • 17.1k • 12 • 1