Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Paper • 2509.24203