RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models Paper • 2402.10038 • Published Feb 15 • 6 • 1
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2 • 19 • 6