Model Checkpoints in the ExPO Paper - a chujiezheng Collection

Weak-to-Strong Extrapolation Expedites Alignment

Model Checkpoints in the ExPO Paper

Updated May 19

chujiezheng/zephyr_0.05

Text Generation • Updated Apr 28 • 12

Note: zephyr-7b-sft-full trained with DPO on 5% of the UltraFeedback data

chujiezheng/zephyr_0.1

Text Generation • Updated Apr 28 • 23

Note: zephyr-7b-sft-full trained with DPO on 10% of the UltraFeedback data

chujiezheng/zephyr_0.1_a8.0

Text Generation • Updated Apr 28 • 9

Note: alpha = 8.0

chujiezheng/zephyr_0.2

Text Generation • Updated Apr 28 • 7

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data

chujiezheng/zephyr_0.2_a2.5

Text Generation • Updated Apr 28 • 8

Note: alpha = 2.5

chujiezheng/zephyr_0.4

Text Generation • Updated Apr 28 • 10

Note: zephyr-7b-sft-full trained with DPO on 40% of the UltraFeedback data

chujiezheng/zephyr_0.2_2lr

Text Generation • Updated Apr 25 • 10

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data and 2x the learning rate

chujiezheng/zephyr_0.2_3lr

Text Generation • Updated Apr 25 • 5

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data and 3x the learning rate

chujiezheng/zephyr_0.2_2ep

Text Generation • Updated Apr 25 • 9

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data and 2x the number of epochs

chujiezheng/zephyr_0.2_3ep

Text Generation • Updated Apr 25 • 12

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data and 3x the number of epochs
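
The notes on the `_a8.0` and `_a2.5` checkpoints give only an alpha value, and this page does not define it. Assuming alpha is the extrapolation coefficient from the ExPO paper, where the extrapolated weights are computed as aligned + alpha * (aligned - sft), the operation can be sketched as follows. The function name `extrapolate` and the toy weight values are hypothetical and chosen for illustration; real checkpoints would use full model state dicts rather than short lists.

```python
def extrapolate(sft, aligned, alpha):
    """Return aligned + alpha * (aligned - sft) for every parameter.

    Assumption: this mirrors the ExPO weight extrapolation; `sft` and
    `aligned` map parameter names to flat lists of floats.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    result = {}
    for name in aligned:
        if len(sft[name]) != len(aligned[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        result[name] = [
            a + alpha * (a - s) for a, s in zip(aligned[name], sft[name])
        ]
    return result


# Toy weights standing in for real checkpoints (hypothetical values).
sft_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
dpo_weights = {"layer.weight": [1.5, 2.0], "layer.bias": [-1.0]}

expo_weights = extrapolate(sft_weights, dpo_weights, alpha=2.5)
print(expo_weights)
```

With alpha = 0 the function returns the aligned weights unchanged; larger values move further along the direction from the SFT weights to the DPO weights.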