Jointly Reinforcing Diversity and Quality in Language Model Generations • Paper • 2509.02534 • Published Sep 2, 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning • Paper • 2505.02363 • Published May 5, 2025