Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models Paper • 2508.15202 • Published 18 days ago
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published 17 days ago
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning Paper • 2508.15868 • Published 18 days ago
StepWiser: Stepwise Generative Judges for Wiser Reasoning Paper • 2508.19229 • Published 12 days ago
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning Paper • 2507.22565 • Published Jul 30
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25