This is a test DPO (Direct Preference Optimization) finetune of Microsoft's phi-2.
Two DPO datasets are used. Training ran for 1 epoch using QLoRA with rank 64.