@macadeliccc on Hugging Face: "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models…"

macadeliccc

posted an update Feb 16

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

UCLA-AGI has shown that large language models, even weaker ones, can improve themselves using only data produced by the original model. The question they answer in their paper is:

"Can we empower a weak LLM to improve itself without acquiring additional human annotated data?"

They answer this question by proposing and testing a novel fine-tuning method they call Self-Play fIne-tuNing (SPIN). The process starts by applying supervised fine-tuning (SFT) to zephyr-7b using all 200k samples of HuggingFaceH4/ultrachat_200k, an existing dataset, so no new human annotation is needed.

Once the model has completed SFT, the SPIN method generates 50k synthetic data pairs, each made up of a 'chosen' and a 'rejected' sample. The model is fine-tuned on those pairs, and this process repeats for another 3 iterations, for a total of 200k samples.
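The loop described above can be sketched in a few lines of Python. This is a rough illustration, not the code from the SPIN repository: `build_pairs`, `spin_rounds`, and the `generate` and `fine_tune` callables are hypothetical names, and the sketch assumes that the 'chosen' response is the existing answer from the SFT dataset while the 'rejected' response is the model's own generation for the same prompt. How prompts are sampled in each round is also an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]          # assumed shape: {"prompt": ..., "response": ...}
Generator = Callable[[str], str]  # hypothetical: maps a prompt to a model reply


def build_pairs(examples: List[Example], generate: Generator) -> List[Dict[str, str]]:
    """Pair each existing SFT answer ('chosen') with the model's own reply ('rejected')."""
    return [
        {
            "prompt": ex["prompt"],
            "chosen": ex["response"],             # assumption: answer already in the dataset
            "rejected": generate(ex["prompt"]),   # assumption: current model's generation
        }
        for ex in examples
    ]


def spin_rounds(
    sft_data: List[Example],
    generate: Generator,
    fine_tune: Callable[[Generator, List[Dict[str, str]]], Generator],
    rounds: int = 4,
    samples_per_round: int = 50_000,
    seed: int = 0,
) -> Tuple[Generator, int]:
    """Run several generate-then-fine-tune rounds and count the pairs produced."""
    rng = random.Random(seed)
    total_pairs = 0
    for _ in range(rounds):
        # Draw this round's prompts (random sampling is an assumption of this sketch).
        k = min(samples_per_round, len(sft_data))
        batch = rng.sample(sft_data, k)
        pairs = build_pairs(batch, generate)
        # The fine-tuned model becomes the generator for the next round.
        generate = fine_tune(generate, pairs)
        total_pairs += len(pairs)
    return generate, total_pairs
```

With the defaults, 4 rounds of 50,000 pairs add up to the 200k samples mentioned above. In a real run, `generate` would wrap model inference and `fine_tune` would wrap a preference-style training step; both are left as placeholders here.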

This experiment stands out because the authors claim their method can yield performance gains of upwards of 10% without using any additional human-annotated data. The strategy was designed to improve weaker language models, but with further experimentation it could be a formidable strategy for improving language models in general.

If you would like to explore this strategy for yourself, here are some resources:
Colab: https://colab.research.google.com/drive/1IjDeNVBsRru2-hM_9aauD6gVI-VvJnpk?usp=sharing
GitHub: https://github.com/uclaml/SPIN
The product of the experiment: UCLA-AGI/zephyr-7b-sft-full-SPIN-iter3
Paper: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (2401.01335)

macadeliccc

Feb 16

If you are interested in a notebook, do let me know. I have one, but the whole experiment will cost at least 250 dollars. I do not know how much it will cost if you use the new vLLM method, which seems to offer quite a few improvements.

dimentox

Feb 16

I'll take it, please.
