	---
	license: apache-2.0
	language:
	- en
	---

	# zephyr-7b-dpo-full-ExPO

	This is the extrapolated (ExPO) model built from [`alignment-handbook/zephyr-7b-dpo-full`](https://huggingface.co/alignment-handbook/zephyr-7b-dpo-full) and [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), following the "[Weak-to-Strong Extrapolation Expedites Alignment](https://arxiv.org/abs/2404.16792)" paper.

	Specifically, we obtain this model by extrapolating (alpha = 0.3) from the weights of the SFT and DPO/RLHF checkpoints, achieving superior alignment with human preference.
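
The extrapolation step can be sketched as below. This is a minimal illustration, not the authors' released script: it assumes the update rule is `theta_expo = theta_dpo + alpha * (theta_dpo - theta_sft)` applied parameter by parameter, and the function name `extrapolate_weights` and the toy scalar "weights" are hypothetical. The same arithmetic should work on tensors or arrays, since it only relies on `+`, `-`, and scalar `*`.

```python
from typing import Any, Dict


def extrapolate_weights(
    sft_state: Dict[str, Any],
    dpo_state: Dict[str, Any],
    alpha: float = 0.3,
) -> Dict[str, Any]:
    """Extrapolate from the SFT weights past the DPO weights.

    Assumed rule (per parameter): dpo + alpha * (dpo - sft).
    Values can be floats, NumPy arrays, or tensors.
    """
    if sft_state.keys() != dpo_state.keys():
        raise ValueError("SFT and DPO checkpoints must share parameter names")
    return {
        name: dpo_state[name] + alpha * (dpo_state[name] - sft_state[name])
        for name in dpo_state
    }


# Toy example with scalar stand-ins for weight tensors (hypothetical values).
sft = {"layer.weight": 1.0, "layer.bias": 0.0}
dpo = {"layer.weight": 2.0, "layer.bias": -1.0}
expo = extrapolate_weights(sft, dpo, alpha=0.3)
print(expo)  # layer.weight is about 2.3, layer.bias is about -1.3
```

With `alpha = 0` the result is the DPO checkpoint itself. A positive `alpha` moves further along the direction from SFT to DPO.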

	This model achieves an 18.0% win rate and a 20.2% length-controlled (LC) win rate on AlpacaEval 2.0.

	## Evaluation Results

	Evaluation results on the AlpacaEval 2.0 benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_alpaca)):

	| Model | Win Rate (Original) | LC Win Rate (Original) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
	| ------------------------------------ | ------------------- | ---------------------- | ----------------- | -------------------- |
	| `HuggingFaceH4/zephyr-7b-alpha` | 6.7% | 10.0% | 10.6% | 13.6% |
	| `HuggingFaceH4/zephyr-7b-beta` | 10.2% | 13.2% | 11.1% | 14.0% |
	| `berkeley-nest/Starling-LM-7B-alpha` | 15.0% | 18.3% | 18.2% | 19.5% |
	| `Nexusflow/Starling-LM-7B-beta` | 26.6% | 25.8% | 29.6% | 26.4% |
	| `snorkelai/Snorkel-Mistral-PairRM` | 24.7% | 24.0% | 28.8% | 26.4% |
	| `RLHFlow/LLaMA3-iterative-DPO-final` | 29.2% | 36.0% | 32.7% | 37.8% |
	| `internlm/internlm2-chat-1.8b` | 3.8% | 4.0% | 5.2% | 4.3% |
	| `internlm/internlm2-chat-7b` | 20.5% | 18.3% | 28.1% | 22.7% |
	| `internlm/internlm2-chat-20b` | 36.1% | 24.9% | 46.2% | 27.2% |
	| `allenai/tulu-2-dpo-7b` | 8.5% | 10.2% | 11.5% | 11.7% |
	| `allenai/tulu-2-dpo-13b` | 11.2% | 15.5% | 15.6% | 17.6% |
	| `allenai/tulu-2-dpo-70b` | 15.4% | 21.2% | 23.0% | 25.7% |

	Evaluation results on the MT-Bench benchmark (you can find the evaluation outputs on the [official GitHub repo](https://github.com/chujiezheng/LLM-Extrapolation/tree/main/results_mtbench)):

	| Model | Original | + ExPO |
	| ------------------------------------ | -------- | -------- |
	| `HuggingFaceH4/zephyr-7b-alpha` | 6.85 | 6.87 |
	| `HuggingFaceH4/zephyr-7b-beta` | 7.02 | 7.06 |
	| `berkeley-nest/Starling-LM-7B-alpha` | 7.82 | 7.91 |
	| `Nexusflow/Starling-LM-7B-beta` | 8.10 | 8.18 |
	| `snorkelai/Snorkel-Mistral-PairRM` | 7.63 | 7.69 |
	| `RLHFlow/LLaMA3-iterative-DPO-final` | 8.08 | 8.45 |
	| `internlm/internlm2-chat-1.8b` | 5.17 | 5.26 |
	| `internlm/internlm2-chat-7b` | 7.72 | 7.80 |
	| `internlm/internlm2-chat-20b` | 8.13 | 8.26 |
	| `allenai/tulu-2-dpo-7b` | 6.35 | 6.38 |
	| `allenai/tulu-2-dpo-13b` | 7.00 | 7.26 |
	| `allenai/tulu-2-dpo-70b` | 7.79 | 8.03 |
