dpo_pythia1b_hh_rlhf.yml_local_29-04-24_13-31-33_xxxxx

This model is a fine-tuned version of sophiex/pythia-1b-sft_hh_rlhf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6581
  • Rewards/chosen: -0.1633
  • Rewards/rejected: -0.3103
  • Rewards/accuracies: 0.5971
  • Rewards/margins: 0.1470
  • Logps/rejected: -160.0996
  • Logps/chosen: -160.0996
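
The reward figures above can be cross-checked against each other. The sketch below assumes the usual convention that the margin is the mean chosen reward minus the mean rejected reward; the card itself does not define these metrics, so treat that definition as an assumption.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -0.1633
rewards_rejected = -0.3103
reported_margin = 0.1470

# ASSUMPTION: margin = chosen reward - rejected reward (not stated in the card).
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"matches reported:  {abs(margin - reported_margin) < 5e-4}")
```

Under that assumption the recomputed value agrees with the reported Rewards/margins to four decimal places.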

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 1
  • mixed_precision_training: Native AMP
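
As a rough illustration of how these settings combine, the sketch below recomputes the total train batch size and approximates a cosine schedule with linear warmup. The total step count is not given in the card; the value used here is extrapolated from the results table (503 steps per 0.2 epoch) and is an assumption, as are the helper names `effective_batch_size` and `lr_at`.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4      # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 150

# ASSUMPTION: total optimizer steps is not stated in the card; 2515 is
# extrapolated from the results table (503 steps per 0.2 epoch).
ASSUMED_TOTAL_STEPS = 2515


def effective_batch_size(per_device, devices, accum_steps):
    """Samples consumed per optimizer step across all devices."""
    return per_device * devices * accum_steps


def lr_at(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS,
          total=ASSUMED_TOTAL_STEPS):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))
for step in (0, 75, 150, 1000, 2515):
    print(step, lr_at(step))
```

The first printed value, 4 x 4 x 4 = 64, matches the reported total_train_batch_size. The schedule values depend on the assumed total step count and are only indicative.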

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen |
|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0 | 1 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | -158.4665 | -158.4665 |
| 0.6701 | 0.2 | 503 | 0.6745 | -0.0602 | -0.1382 | 0.5734 | 0.0779 | -159.0690 | -159.0690 |
| 0.6624 | 0.4 | 1006 | 0.6670 | -0.0864 | -0.1939 | 0.5862 | 0.1075 | -159.3303 | -159.3303 |
| 0.6587 | 0.6 | 1509 | 0.6612 | -0.1043 | -0.2301 | 0.5891 | 0.1259 | -159.5091 | -159.5091 |
| 0.6511 | 0.8 | 2012 | 0.6581 | -0.1633 | -0.3103 | 0.5971 | 0.1470 | -160.0996 | -160.0996 |
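
The first row of the results, with a loss of 0.6931 and all rewards at 0.0, is what one would expect from a sigmoid-style preference loss before the policy has moved away from its reference. The sketch below assumes the standard DPO sigmoid loss, -log(sigmoid(margin)), applied to the logged reward margin; the card does not state which loss variant was used, and `dpo_sigmoid_loss` is a hypothetical helper name.

```python
import math


def dpo_sigmoid_loss(reward_margin):
    """Per-pair loss -log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    # ASSUMPTION: standard sigmoid DPO loss; the card does not name the variant.
    return math.log1p(math.exp(-reward_margin))


# With zero margin the loss equals ln(2).
initial_loss = dpo_sigmoid_loss(0.0)
print(f"{initial_loss:.4f}")  # 0.6931
```

Because this per-pair loss is convex in the margin, evaluating it at the mean margin gives only a lower bound on the mean loss, so it is not expected to reproduce the later validation losses exactly.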

Framework versions

  • PEFT 0.10.0
  • Transformers 4.38.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.17.1
  • Tokenizers 0.15.2
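
Since PEFT is listed among the framework versions, the weights are presumably published as an adapter on top of the base model. The following is a minimal loading sketch under that assumption. The adapter repository id is assumed to be the `sophiex/` namespace plus the card title, and `load_dpo_model` is a hypothetical helper, not something defined by the card.

```python
BASE_MODEL_ID = "sophiex/pythia-1b-sft_hh_rlhf"  # base model named in the card
# ASSUMPTION: adapter repo id is the "sophiex/" namespace plus the card title.
ADAPTER_ID = "sophiex/dpo_pythia1b_hh_rlhf.yml_local_29-04-24_13-31-33_xxxxx"


def load_dpo_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (downloads weights)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_dpo_model()` requires network access and the `transformers` and `peft` packages. If the repository turns out to hold full model weights rather than an adapter, the `PeftModel` step would not apply.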