OpenELM-1_1B-DPO-full-1

This model is a fine-tuned version of data/OpenELM-1_1B-SFT-1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.8127
Rewards/chosen: -7.4062
Rewards/rejected: -9.625
Rewards/accuracies: 0.7266
Rewards/margins: 2.2188
Logps/rejected: -1248.0
Logps/chosen: -1056.0
Logits/rejected: -1.5781
Logits/chosen: -4.0
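
The reward metrics above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that against the reported values. It also shows how one example's loss would follow from its margin, assuming the standard sigmoid DPO objective; the card does not state which loss variant was used. The helper name `dpo_sigmoid_loss` is illustrative and not part of any library.

```python
import math

# Values reported on the evaluation set (copied from the list above).
rewards_chosen = -7.4062
rewards_rejected = -9.625
reported_margin = 2.2188

# The margin is the gap between the chosen and rejected implicit rewards.
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-3


def dpo_sigmoid_loss(example_margin: float) -> float:
    """Per-example loss -log(sigmoid(margin)).

    ASSUMPTION: the sigmoid DPO objective; the card does not confirm it.
    """
    # Numerically stable form of softplus(-margin).
    return max(-example_margin, 0.0) + math.log1p(math.exp(-abs(example_margin)))


# Loss of a hypothetical example whose margin equals the reported mean margin.
loss_at_mean_margin = dpo_sigmoid_loss(margin)
print(f"margin={margin:.4f}  loss at mean margin={loss_at_mean_margin:.4f}")
```

Under that assumption, the reported mean loss (0.8127) does not have to equal the loss at the mean margin. The loss is averaged per example, and the roughly 27% of pairs that the model ranks incorrectly (accuracy 0.7266) have negative margins that contribute disproportionately large losses.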

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
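
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and outlines a linear-warmup cosine schedule with the listed peak rate and warmup ratio. The total step count is an assumption estimated from the results table (step 100 falls at epoch 0.1047), and `lr_at` is an illustrative helper, not the trainer's own implementation.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2
peak_lr = 5e-05
warmup_ratio = 0.1
num_epochs = 3

# Effective batch sizes across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# ASSUMPTION: steps per epoch estimated from the results table
# (step 100 corresponds to epoch 0.1047); the true count may differ slightly.
steps_per_epoch = 100 / 0.1047
total_steps = round(num_epochs * steps_per_epoch)
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size, total_eval_batch_size, total_steps, warmup_steps)
```

With these estimates the rate reaches its peak of 5e-05 at the end of warmup, early in the first epoch, and decays toward zero by the final step.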

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6194	0.1047	100	0.6171	-0.875	-1.1797	0.6758	0.3008	-406.0	-406.0	-10.75	-11.0
0.5947	0.2093	200	0.6038	-1.4531	-1.8359	0.6680	0.3848	-472.0	-464.0	-11.3125	-11.75
0.6583	0.3140	300	0.6007	-2.2344	-2.7344	0.6758	0.4941	-560.0	-544.0	-13.1875	-13.5
0.6003	0.4186	400	0.5892	-1.8359	-2.3906	0.7012	0.5586	-528.0	-502.0	-9.75	-10.3125
0.5701	0.5233	500	0.5772	-1.9688	-2.5	0.6875	0.5391	-540.0	-516.0	-10.5	-11.0
0.55	0.6279	600	0.5671	-2.6875	-3.4219	0.7129	0.7266	-632.0	-588.0	-9.5625	-10.4375
0.554	0.7326	700	0.5667	-2.625	-3.375	0.7285	0.75	-628.0	-580.0	-9.25	-10.0625
0.5478	0.8373	800	0.5699	-2.7188	-3.3906	0.7070	0.6602	-628.0	-592.0	-8.9375	-9.875
0.5759	0.9419	900	0.5660	-2.75	-3.4375	0.7090	0.6914	-632.0	-592.0	-10.25	-11.1875
0.2284	1.0466	1000	0.5897	-3.375	-4.5625	0.7305	1.1797	-744.0	-656.0	-6.8125	-8.8125
0.1919	1.1512	1100	0.5994	-3.7656	-4.9375	0.7266	1.1797	-784.0	-696.0	-8.375	-10.125
0.1942	1.2559	1200	0.6058	-4.5	-5.6562	0.7188	1.1719	-856.0	-768.0	-3.5469	-5.5
0.2071	1.3605	1300	0.5985	-4.3125	-5.4688	0.7441	1.1484	-836.0	-752.0	-6.1875	-7.7812
0.1811	1.4652	1400	0.6045	-5.375	-6.5625	0.7363	1.2109	-948.0	-856.0	-6.6562	-8.0
0.1715	1.5699	1500	0.6054	-4.7188	-6.0312	0.7383	1.3047	-892.0	-792.0	-7.1875	-8.6875
0.186	1.6745	1600	0.6277	-4.4688	-5.7188	0.7285	1.2344	-860.0	-768.0	-8.3125	-9.6875
0.1763	1.7792	1700	0.6386	-5.2188	-6.625	0.7246	1.4062	-952.0	-840.0	-5.5312	-7.4375
0.1678	1.8838	1800	0.6220	-4.5625	-5.8125	0.7246	1.2266	-868.0	-776.0	-6.8125	-8.4375
0.1563	1.9885	1900	0.6274	-5.5	-6.8438	0.7266	1.3672	-976.0	-868.0	-6.3438	-7.875
0.0144	2.0931	2000	0.7311	-6.4375	-8.1875	0.7305	1.7656	-1112.0	-960.0	-3.3281	-5.5
0.029	2.1978	2100	0.8195	-7.5312	-9.6875	0.7285	2.1719	-1256.0	-1072.0	-2.375	-4.75
0.0228	2.3025	2200	0.8282	-7.6875	-9.875	0.7188	2.2031	-1280.0	-1088.0	-1.9297	-4.375
0.0159	2.4071	2300	0.8055	-7.2188	-9.375	0.7266	2.1562	-1224.0	-1040.0	-2.0625	-4.4688
0.0192	2.5118	2400	0.7881	-6.9688	-9.0625	0.7207	2.0938	-1200.0	-1016.0	-2.3906	-4.7812
0.0158	2.6164	2500	0.8027	-7.3438	-9.5	0.7266	2.1562	-1240.0	-1056.0	-1.5312	-3.9375
0.0193	2.7211	2600	0.8205	-7.625	-9.875	0.7383	2.25	-1280.0	-1080.0	-1.1797	-3.5938
0.0229	2.8257	2700	0.8136	-7.4375	-9.625	0.7266	2.2188	-1256.0	-1064.0	-1.5391	-3.9531
0.0213	2.9304	2800	0.8121	-7.4062	-9.625	0.7285	2.2188	-1248.0	-1056.0	-1.5781	-4.0
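
In the table above, validation loss reaches its minimum near the end of the first epoch and then rises, while training loss keeps falling toward zero. That pattern is commonly read as overfitting. The sketch below locates the lowest-validation-loss step from the logged values. The tuples are copied from the Step, Training Loss, and Validation Loss columns; the variable names are illustrative.

```python
# (step, training_loss, validation_loss), copied from the results table.
rows = [
    (100, 0.6194, 0.6171), (200, 0.5947, 0.6038), (300, 0.6583, 0.6007),
    (400, 0.6003, 0.5892), (500, 0.5701, 0.5772), (600, 0.55, 0.5671),
    (700, 0.554, 0.5667), (800, 0.5478, 0.5699), (900, 0.5759, 0.5660),
    (1000, 0.2284, 0.5897), (1100, 0.1919, 0.5994), (1200, 0.1942, 0.6058),
    (1300, 0.2071, 0.5985), (1400, 0.1811, 0.6045), (1500, 0.1715, 0.6054),
    (1600, 0.186, 0.6277), (1700, 0.1763, 0.6386), (1800, 0.1678, 0.6220),
    (1900, 0.1563, 0.6274), (2000, 0.0144, 0.7311), (2100, 0.029, 0.8195),
    (2200, 0.0228, 0.8282), (2300, 0.0159, 0.8055), (2400, 0.0192, 0.7881),
    (2500, 0.0158, 0.8027), (2600, 0.0193, 0.8205), (2700, 0.0229, 0.8136),
    (2800, 0.0213, 0.8121),
]

# Step with the lowest validation loss.
best_step, best_train_loss, best_val_loss = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train_loss, last_val_loss = rows[-1]
final_gap = last_val_loss - last_train_loss

print(f"lowest validation loss {best_val_loss} at step {best_step}")
print(f"validation minus training loss at step {last_step}: {final_gap:.4f}")
```

Lowest validation loss is only one possible selection criterion. In the same table, reward accuracy peaks later (0.7441 at step 1300) and reward margins continue to grow through the third epoch, so which checkpoint is preferable depends on the metric that matters for the intended use.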

Framework versions

Transformers 4.44.2
Pytorch 2.3.0
Datasets 2.21.0
Tokenizers 0.19.1

Model repository: CharlesLi/OpenELM-1_1B-DPO-full-1
