TMLR-Group-HF's Collections

Co-Reward

Co-Reward is a self-supervised reinforcement learning method for LLM reasoning that leverages contrastive agreement between original and rephrased questions.
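This page describes the reward only as contrastive agreement between original and rephrased questions. The sketch below is one possible illustration, not the authors' implementation. It assumes, without confirmation from this page, that each question's pseudo-label is the majority-vote final answer over its sampled rollouts, and that rollouts on one question are rewarded for matching the pseudo-label of its paired question. The function names `majority_vote` and `cross_agreement_rewards` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_vote(answers: List[str]) -> Optional[str]:
    """Return the most frequent final answer among rollouts.

    Ties are broken by first occurrence, because Counter.most_common
    orders equal counts by insertion order. Returns None for no rollouts.
    """
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def cross_agreement_rewards(
    original_answers: List[str], rephrased_answers: List[str]
) -> Tuple[List[float], List[float]]:
    """Hypothetical cross-referenced reward (an assumption, not confirmed by the source).

    Rollouts on the original question are scored against the majority
    answer of the rephrased question, and vice versa, so neither question
    can reward itself purely by self-agreement.
    """
    label_from_rephrased = majority_vote(rephrased_answers)
    label_from_original = majority_vote(original_answers)
    rewards_original = [1.0 if a == label_from_rephrased else 0.0 for a in original_answers]
    rewards_rephrased = [1.0 if a == label_from_original else 0.0 for a in rephrased_answers]
    return rewards_original, rewards_rephrased


# Usage: four rollouts on the original question, three on its rephrasing.
ro, rr = cross_agreement_rewards(["4", "4", "5", "4"], ["4", "5", "4"])
print(ro)  # [1.0, 1.0, 0.0, 1.0]
print(rr)  # [1.0, 0.0, 1.0]
```

Cross-referencing the two questions, rather than rewarding each rollout by agreement with its own question's majority, is meant to illustrate how agreement across analogous inputs can act as a label-free training signal.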