Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training • Paper • 2509.03403 • Published 3 days ago • 18
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text • Paper • 2506.07001 • Published Jun 8 • 4
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving • Paper • 2503.03205 • Published Mar 5 • 4
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts • Paper • 2407.03203 • Published Jul 3, 2024 • 12