- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290, 63 upvotes)
- Towards Efficient and Exact Optimization of Language Model Alignment (arXiv:2402.00856)
- A General Theoretical Paradigm to Understand Learning from Human Preferences (arXiv:2310.12036, 16 upvotes)
- Statistical Rejection Sampling Improves Preference Optimization (arXiv:2309.06657, 14 upvotes)
Yiming Zheng (ZYM666)
Recent Activity
- Upvoted a paper (5 days ago): VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- Liked a Space (12 days ago): TTS-AGI/Voice-Clone-Arena
- Liked a dataset (18 days ago): AIDC-AI/CSEMOTIONS