Building Math Agents with Multi-Turn Iterative Preference Learning Paper • 2409.02392 • Published Sep 4 • 14
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5 • 7
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published May 29 • 13
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 59
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2 • 19
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 45
Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 13
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting Paper • 2306.17563 • Published Jun 30, 2023 • 9
SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023 • 5