Co-Reward

TMLR-Group-HF's Collections

updated 24 days ago

Co-Reward is a self-supervised reinforcement learning method for LLM reasoning that leverages contrastive agreement between original and rephrased questions.
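The description names the idea but not the reward itself. Below is a minimal sketch that rests on our own assumption that "contrastive agreement" means cross-view majority voting: the majority answer sampled for the rephrased question serves as the pseudo-label for rollouts of the original question, and the reverse also holds. The function names `majority_answer` and `co_rewards` are hypothetical. Answers are assumed to be already extracted and normalized strings (None for unparseable outputs). This is an illustration, not the authors' training code.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_answer(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-None answer, or None if no answer parsed.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appeared first.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def co_rewards(
    original_answers: List[Optional[str]],
    rephrased_answers: List[Optional[str]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical cross-view binary rewards (assumption, not the paper's code).

    Each view's rollouts are scored against the majority answer of the
    *other* view, instead of against their own majority vote.
    """
    label_for_original = majority_answer(rephrased_answers)
    label_for_rephrased = majority_answer(original_answers)

    def score(answers: List[Optional[str]], label: Optional[str]) -> List[float]:
        # An unparseable answer (None) or a missing label never earns reward.
        return [1.0 if a is not None and a == label else 0.0 for a in answers]

    return (
        score(original_answers, label_for_original),
        score(rephrased_answers, label_for_rephrased),
    )


if __name__ == "__main__":
    r_orig, r_reph = co_rewards(["4", "4", "5", None], ["4", "5", "4", "4"])
    print(r_orig, r_reph)
```

Here the rephrased rollouts vote for `4`, so the original rollouts receive `[1.0, 1.0, 0.0, 0.0]`. The original rollouts also vote for `4`, so the rephrased rollouts receive `[1.0, 0.0, 1.0, 1.0]`. Scoring each view against the other view's vote, rather than its own, is what separates this sketch from plain single-view majority voting.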

TMLR-Group-HF/CoReward-RephrasedMATH • Dataset • Updated Aug 4 • 7.5k • 100
TMLR-Group-HF/CoReward-Qwen2.5-3B • 3B • Updated Aug 4 • 9
TMLR-Group-HF/CoReward-Qwen2.5-7B • 8B • Updated Aug 4 • 15 • 1
TMLR-Group-HF/CoReward-Qwen3-1.7B-Base • 2B • Updated Aug 4 • 15
TMLR-Group-HF/CoReward-Qwen3-4B-Base • 4B • Updated Aug 4 • 22 • 1
TMLR-Group-HF/CoReward-Qwen3-8B-Base • 8B • Updated Aug 4 • 22 • 1
TMLR-Group-HF/CoReward-Llama-3.2-3B-Instruct • 4B • Updated Aug 4 • 18
TMLR-Group-HF/Self-Certainty-Qwen2.5-3B • 3B • Updated 24 days ago • 14
TMLR-Group-HF/Self-Certainty-Qwen2.5-7B • 8B • Updated 30 days ago • 17 • 1
TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base • 8B • Updated Aug 5 • 19 • 1
TMLR-Group-HF/Self-Certainty-Llama-3.2-3B-Instruct • 4B • Updated 24 days ago • 19
TMLR-Group-HF/Self-Certainty-Qwen3-4B-Base • 4B • Updated Aug 5 • 23 • 1
TMLR-Group-HF/Self-Certainty-Qwen3-1.7B-Base • 2B • Updated 23 days ago • 22
TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct • 4B • Updated 24 days ago • 22
TMLR-Group-HF/GT-Qwen3-4B-Base • 4B • Updated 30 days ago • 21
TMLR-Group-HF/GT-Qwen2.5-3B • 3B • Updated 24 days ago • 13
TMLR-Group-HF/GT-Qwen2.5-7B • 8B • Updated 30 days ago • 15
TMLR-Group-HF/Majority-Voting-Qwen2.5-3B • 3B • Updated 24 days ago • 20 • 1
TMLR-Group-HF/GT-Llama-3.2-3B-Instruct • 4B • Updated 24 days ago • 20
TMLR-Group-HF/GT-Qwen3-1.7B-Base • 2B • Updated 24 days ago • 27
TMLR-Group-HF/GT-Qwen3-8B-Base • 8B • Updated Aug 5 • 17
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base • 8B • Updated Aug 5 • 19 • 1
TMLR-Group-HF/Majority-Voting-Qwen3-4B-Base • 4B • Updated Aug 5 • 21
TMLR-Group-HF/Majority-Voting-Qwen2.5-7B • 8B • Updated 30 days ago • 15 • 1
TMLR-Group-HF/Majority-Voting-Qwen3-1.7B-Base • 2B • Updated 23 days ago • 19
TMLR-Group-HF/Entropy-Qwen2.5-3B • 3B • Updated 24 days ago • 13
TMLR-Group-HF/Entropy-Qwen3-8B-Base • 8B • Updated Aug 5 • 17 • 1
TMLR-Group-HF/Entropy-Qwen3-4B-Base • 4B • Updated Aug 5 • 19 • 1
TMLR-Group-HF/Entropy-Qwen2.5-7B • 8B • Updated 23 days ago • 15
TMLR-Group-HF/Entropy-Qwen3-1.7B-Base • 2B • Updated 23 days ago • 22
TMLR-Group-HF/Entropy-Llama-3.2-3B-Instruct • 4B • Updated 23 days ago • 18
