Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration Paper • 2508.13755 • Published 19 days ago
Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions Paper • 2312.12299 • Published Dec 19, 2023
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators Paper • 2403.16950 • Published Mar 25, 2024
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models Paper • 2406.13975 • Published Jun 20, 2024