Wenboz/phi3-offline-dpo-lora-noise-0.0-5e-7-thre-1.5-42

alignment-handbook

Generated from Trainer

phi3-offline-dpo-lora-noise-0.0-5e-7-thre-1.5-42

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set:

Loss: 0.6130
Rewards/chosen: -0.4194
Rewards/rejected: -0.5933
Rewards/accuracies: 0.7540
Rewards/margins: 0.1739
Logps/rejected: -459.6432
Logps/chosen: -448.1436
Logits/rejected: 12.5287
Logits/chosen: 13.8414
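
The reward figures above are tied together by simple arithmetic. In DPO-style training the margin is the chosen reward minus the rejected reward, and the loss for a single preference pair is the negative log-sigmoid of that margin. The sketch below checks this against the reported numbers. It assumes the standard sigmoid DPO loss (the card does not state the loss type or the beta value), and the helper name `dpo_pair_loss` is hypothetical. Because the reported loss is an average of per-pair losses, it need not equal the loss evaluated at the average margin.

```python
import math

# Evaluation metrics copied from the card.
rewards_chosen = -0.4194
rewards_rejected = -0.5933
reported_margin = 0.1739
reported_loss = 0.6130


def dpo_pair_loss(margin: float) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(margin)).

    Assumption: the run used the standard sigmoid loss; the card does not say.
    """
    return math.log1p(math.exp(-margin))


# Margin is the gap between the chosen and rejected implicit rewards.
margin = rewards_chosen - rewards_rejected

print(f"margin from rewards : {margin:.4f} (card reports {reported_margin})")
print(f"loss at zero margin : {dpo_pair_loss(0.0):.4f} (ln 2, an untrained policy)")
print(f"loss at mean margin : {dpo_pair_loss(margin):.4f} (card reports {reported_loss})")
```

A zero margin corresponds to a loss of ln 2, so a loss below that value together with a positive margin indicates the policy has moved toward preferring the chosen responses.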

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
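
The batch-size totals and the learning-rate schedule follow from the values listed above, and the sketch below works them out. Two things in it are assumptions rather than facts from the card: the total number of optimizer steps is estimated from the results table (step 500 logged at epoch 0.8889), and the schedule is written as the usual linear warmup followed by a half-cosine decay to zero, which is the shape the `cosine` scheduler type is generally understood to produce. The function name `lr_at` is hypothetical.

```python
import math

# Hyperparameters copied from the card.
train_batch_size = 4           # per device
eval_batch_size = 4            # per device
num_devices = 4
gradient_accumulation_steps = 4
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: evaluation does not use gradient accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: total steps estimated from "step 500 at epoch 0.8889".
total_steps = round(500 / 0.8889)
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("estimated total steps:", total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps, 300, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.3e}")
```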

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6885	0.1778	100	0.6884	-0.0158	-0.0244	0.6190	0.0086	-402.7496	-407.7805	12.8305	14.2621
0.6712	0.3556	200	0.6680	-0.0971	-0.1464	0.7937	0.0493	-414.9504	-415.9148	12.6482	14.0845
0.6339	0.5333	300	0.6389	-0.2593	-0.3712	0.7540	0.1119	-437.4307	-432.1300	12.8556	14.1744
0.6203	0.7111	400	0.6203	-0.3738	-0.5313	0.7540	0.1575	-453.4457	-443.5887	12.6256	13.9444
0.6102	0.8889	500	0.6131	-0.4150	-0.5892	0.7540	0.1743	-459.2376	-447.7001	12.5314	13.8427

Framework versions

PEFT 0.7.1
Transformers 4.42.3
Pytorch 2.3.0+cu121
Datasets 2.14.6
Tokenizers 0.19.1
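
Since the run used PEFT and the repository name indicates LoRA, the weights are presumably published as an adapter on top of the base model. A minimal loading sketch under that assumption follows. The function name `load_model` is hypothetical, the imports sit inside the function so the snippet can be read and imported without the libraries installed, and calling it downloads both the base model and the adapter.

```python
BASE_MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"
ADAPTER_ID = "Wenboz/phi3-offline-dpo-lora-noise-0.0-5e-7-thre-1.5-42"


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the adapter weights on top of it.

    Assumption: the repository holds a PEFT (LoRA) adapter, not full weights.
    """
    # Imported lazily; requires the `transformers` and `peft` packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Usage would be `tokenizer, model = load_model()`, after which `model.generate` can be called as with any causal language model.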
