Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 135
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published Jun 2 • 30
LESS: Selecting Influential Data for Targeted Instruction Tuning Paper • 2402.04333 • Published Feb 6 • 3
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 64
Robotic Offline RL from Internet Videos via Value-Function Pre-Training Paper • 2309.13041 • Published Sep 22, 2023 • 8