@thughost on Hugging Face: "We've open-sourced the code and models for Self-Play Preference Optimization…"

thughost

posted an update Jun 26

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
🤗 paper: Self-Play Preference Optimization for Language Model Alignment (2405.00675)
⭐ code: https://github.com/uclaml/SPPO
🤗 models: UCLA-AGI/sppo-6635fdd844f2b2e4a94d0b9a

In this post

thughost Quanquan Gu