alignment Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1 • 24