- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 14
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13

Collections including paper arxiv:2310.12036
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 14
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 62
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48

- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 14
- Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
  Paper • 2309.02632 • Published • 1

- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1
- Specific versus General Principles for Constitutional AI
  Paper • 2310.13798 • Published • 2
- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47
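
Several of the papers collected above (DPO, the general preference-learning paradigm of 2310.12036, ORPO) replace the reward-model-plus-RL pipeline with a direct loss over preference pairs. As a rough orientation, here is a minimal sketch of the DPO objective from paper 2305.18290, assuming per-sequence log-probabilities of the chosen and rejected responses have already been computed under the trainable policy and a frozen reference model; the function name and tensor layout are illustrative, not taken from any of the papers' code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss sketch (Rafailov et al., 2305.18290).

    Each argument is a tensor of shape (batch,) holding the summed
    log-probability of the chosen / rejected response under the
    trainable policy or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry preference likelihood -> negative log-sigmoid loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```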