TMLR-Group-HF/CoReward-RephrasedMATH
Co-Reward is a self-supervised reinforcement learning method for LLM reasoning, which leverages contrastive agreement between original and rephrased q