tinyllama_mole_dpo_ep3

This model is a fine-tuned version of ondevicellm/tinyllama_mole_sft_ultrachat_ep3 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (these match the last logged evaluation, at step 900):

Loss: 0.6285
Rewards/chosen: -0.3050
Rewards/rejected: -0.5353
Rewards/accuracies: 0.6806
Rewards/margins: 0.2302
Logps/rejected: -354.2071
Logps/chosen: -373.1399
Logits/rejected: -1.6731
Logits/chosen: -1.8041
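
The metric names above match those typically logged by a DPO (Direct Preference Optimization) trainer, although this card does not name the training library or the beta value. As a minimal sketch under that assumption, the following shows how such quantities are commonly derived from per-example sequence log-probabilities. The function name `dpo_metrics`, the `beta=0.1` default, and the sample log-probabilities are illustrative and are not taken from this training run.

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO quantities from summed sequence log-probabilities.

    beta=0.1 is an assumed value; the card does not report it.
    """
    # Implicit rewards: scaled log-ratio of the policy to the frozen reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "accuracy": 1.0 if margin > 0 else 0.0,
        "loss": loss,
    }

# Hypothetical example: before any update the policy equals the reference,
# so both rewards and the margin are 0 and the loss is ln(2).
start = dpo_metrics(-340.0, -300.0, -340.0, -300.0)
```

Under this formulation a zero margin gives a loss of ln 2, about 0.6931, so a reported loss of 0.6285 with a positive mean margin indicates the model has moved away from the reference in the preferred direction. Rewards/accuracies is then the fraction of evaluation pairs whose chosen reward exceeds the rejected reward.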

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 1
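
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The short check below reproduces that arithmetic and sketches a linear-warmup plus cosine-decay schedule. The helper `lr_at` and the `total_steps=955` figure are assumptions for illustration: the card does not state the total step count, and the scheduler actually used may differ in detail.

```python
import math

train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch per optimizer step: per-device size x devices x accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

def lr_at(step, peak_lr=5e-07, warmup_steps=100, total_steps=955):
    """Linear warmup to peak_lr, then cosine decay to zero.

    total_steps=955 is a hypothetical value, not reported in the card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size, total_eval_batch_size)  # prints "64 32"
```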

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6896	0.1	100	0.6899	0.0064	-0.0013	0.6448	0.0076	-300.8089	-342.0017	-1.7574	-1.8918
0.6762	0.21	200	0.6756	-0.0293	-0.0716	0.6627	0.0423	-307.8423	-345.5688	-1.7501	-1.8839
0.6499	0.31	300	0.6587	-0.0875	-0.1813	0.6687	0.0938	-318.8118	-351.3895	-1.7358	-1.8688
0.6374	0.42	400	0.6451	-0.1726	-0.3218	0.6746	0.1493	-332.8632	-359.8953	-1.7164	-1.8482
0.6348	0.52	500	0.6377	-0.2696	-0.4550	0.6647	0.1854	-346.1808	-369.6013	-1.6884	-1.8208
0.6308	0.63	600	0.6333	-0.2783	-0.4815	0.6726	0.2032	-348.8291	-370.4673	-1.6965	-1.8269
0.62	0.73	700	0.6312	-0.2323	-0.4505	0.6806	0.2182	-345.7306	-365.8656	-1.6841	-1.8149
0.6055	0.84	800	0.6287	-0.2877	-0.5169	0.6865	0.2292	-352.3697	-371.4099	-1.6793	-1.8099
0.6357	0.94	900	0.6285	-0.3050	-0.5353	0.6806	0.2302	-354.2071	-373.1399	-1.6731	-1.8041
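
As a quick consistency check on the table above, the reported Rewards/margins column should equal Rewards/chosen minus Rewards/rejected, up to rounding of the four-decimal means. The sketch below copies the relevant columns from the table and also picks out the steps with the lowest validation loss and the highest reward accuracy; the variable names are illustrative.

```python
# Columns copied from the table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, accuracy, margin)
rows = [
    (100, 0.6899,  0.0064, -0.0013, 0.6448, 0.0076),
    (200, 0.6756, -0.0293, -0.0716, 0.6627, 0.0423),
    (300, 0.6587, -0.0875, -0.1813, 0.6687, 0.0938),
    (400, 0.6451, -0.1726, -0.3218, 0.6746, 0.1493),
    (500, 0.6377, -0.2696, -0.4550, 0.6647, 0.1854),
    (600, 0.6333, -0.2783, -0.4815, 0.6726, 0.2032),
    (700, 0.6312, -0.2323, -0.4505, 0.6806, 0.2182),
    (800, 0.6287, -0.2877, -0.5169, 0.6865, 0.2292),
    (900, 0.6285, -0.3050, -0.5353, 0.6806, 0.2302),
]

# Largest gap between the reported margin and (chosen - rejected).
max_gap = max(abs((chosen - rejected) - margin)
              for _, _, chosen, rejected, _, margin in rows)

# Checkpoints that look best by each criterion.
best_loss_step = min(rows, key=lambda row: row[1])[0]
best_acc_step = max(rows, key=lambda row: row[4])[0]
```

The largest gap is about 0.0001, which is consistent with rounding. Validation loss is lowest at step 900, while reward accuracy peaks slightly earlier, at step 800.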

Framework versions

Transformers 4.37.0
PyTorch 2.1.2+cu118
Datasets 2.16.1
Tokenizers 0.15.0
