Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 13
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? Paper • 2309.07462 • Published Sep 14, 2023 • 3
Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 9
Aligning Large Multimodal Models with Factually Augmented RLHF Paper • 2309.14525 • Published Sep 25, 2023 • 29
Safe RLHF: Safe Reinforcement Learning from Human Feedback Paper • 2310.12773 • Published Oct 19, 2023 • 28
Contrastive Prefence Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 24