R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published 10 days ago • 103
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published May 30, 2024 • 18