mDPO: Conditional Preference Optimization for Multimodal Large Language Models (arXiv:2406.11839, published Jun 17, 2024)
Pandora: Towards General World Model with Natural Language Actions and Video States (arXiv:2406.09455, published Jun 12, 2024)
In-Context Editing: Learning Knowledge from Self-Induced Distributions (arXiv:2406.11194, published Jun 17, 2024)
Deep Bayesian Active Learning for Preference Modeling in Large Language Models (arXiv:2406.10023, published Jun 14, 2024)
RVT-2: Learning Precise Manipulation from Few Demonstrations (arXiv:2406.08545, published Jun 12, 2024)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522, published Jun 11, 2024)
MotionClone: Training-Free Motion Cloning for Controllable Video Generation (arXiv:2406.05338, published Jun 8, 2024)
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (arXiv:2406.06469, published Jun 10, 2024)
RePLan: Robotic Replanning with Perception and Language Models (arXiv:2401.04157, published Jan 8, 2024)
Generative Expressive Robot Behaviors using Large Language Models (arXiv:2401.14673, published Jan 26, 2024)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms (arXiv:2406.02900, published Jun 5, 2024)
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs (arXiv:2406.02886, published Jun 5, 2024)
MotionLLM: Understanding Human Behaviors from Human Motions and Videos (arXiv:2405.20340, published May 30, 2024)
Offline Regularised Reinforcement Learning for Large Language Models Alignment (arXiv:2405.19107, published May 29, 2024)
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF (arXiv:2405.19320, published May 29, 2024)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (arXiv:2405.11143, published May 20, 2024)
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction (arXiv:2405.10315, published May 16, 2024)
Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675, published May 1, 2024)
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment (arXiv:2404.12318, published Apr 18, 2024)
UniFL: Improve Stable Diffusion via Unified Feedback Learning (arXiv:2404.05595, published Apr 8, 2024)
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715, published Apr 4, 2024)
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (arXiv:2404.03673, published Mar 25, 2024)