Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 48
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26 • 36