RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Paper • 2312.00849 • Published Dec 1, 2023
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness Paper • 2405.17220 • Published May 27, 2024
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation Paper • 2304.05977 • Published Apr 12, 2023
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Paper • 2406.12845 • Published Jun 18, 2024
UltraFeedback: Boosting Language Models with High-quality Feedback Paper • 2310.01377 • Published Oct 2, 2023
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023