YANG ZHOU
BAOLONGZHANSHEN
AI & ML interests
RLHF and DPO
Recent Activity
authored a paper 12 days ago
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
commented on a paper 12 days ago
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Organizations
None yet