---
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: dpo-selective-buffer-safeipo
    results: []
---

# dpo-selective-buffer-safeipo

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 4322.0576
- Rewards/chosen: -0.9426
- Rewards/rejected: -1.0072
- Rewards/accuracies: 0.6033
- Rewards/margins: 0.0646
- Rewards/safe Rewards: -0.9377
- Rewards/unsafe Rewards: -0.9382
- Logps/rejected: -193.1814
- Logps/chosen: -224.6856
- Logits/rejected: -1.7714
- Logits/chosen: -1.9525
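As a sanity check on the DPO metrics above, the reward margin is the gap between the chosen and rejected rewards. A minimal sketch using the reported evaluation numbers (variable names are illustrative):

```python
# Evaluation-set rewards copied from the results above.
rewards_chosen = -0.9426
rewards_rejected = -1.0072

# In DPO, Rewards/margins is chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

print(round(margin, 4))  # 0.0646, matching the reported Rewards/margins
```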

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
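The total batch sizes listed above follow from the per-device batch size, the number of GPUs, and gradient accumulation. A quick sketch of that arithmetic:

```python
train_batch_size = 2              # per-device train batch size
num_devices = 4                   # multi-GPU setup
gradient_accumulation_steps = 4

# Effective train batch size per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32, as listed above

# Evaluation uses no gradient accumulation: 8 per device across 4 devices.
eval_batch_size = 8
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)   # 32
```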

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/safe Rewards | Rewards/unsafe Rewards | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------------:|:----------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 13096.7359 | 0.16 | 300 | 4529.6733 | -0.3957 | -0.4772 | 0.6584 | 0.0815 | -0.3930 | -0.3956 | -140.1830 | -170.0027 | -2.1815 | -2.3195 |
| 11584.7875 | 0.32 | 600 | 4406.7134 | -0.8083 | -0.8819 | 0.6338 | 0.0736 | -0.8028 | -0.8050 | -180.6571 | -211.2575 | -1.7938 | -1.9934 |
| 10862.3484 | 0.48 | 900 | 4377.5635 | -0.8828 | -0.9530 | 0.6196 | 0.0701 | -0.8775 | -0.8778 | -187.7609 | -218.7140 | -1.7468 | -1.9377 |
| 11671.4219 | 0.65 | 1200 | 4346.4053 | -0.9811 | -1.0509 | 0.6158 | 0.0699 | -0.9764 | -0.9768 | -197.5588 | -228.5369 | -1.6740 | -1.8665 |
| 10202.4125 | 0.81 | 1500 | 4320.9878 | -0.9655 | -1.0271 | 0.6023 | 0.0617 | -0.9611 | -0.9618 | -195.1794 | -226.9775 | -1.7645 | -1.9420 |
| 11785.8336 | 0.97 | 1800 | 4320.8208 | -0.9417 | -1.0065 | 0.6027 | 0.0648 | -0.9369 | -0.9373 | -193.1151 | -224.6014 | -1.7745 | -1.9550 |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0
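To reproduce the training environment, the pinned versions above can be installed directly; any packages beyond these four (e.g. TRL itself) are not listed in the card and would need to be added separately.

```shell
# Pin the framework versions listed in this card (environment setup only).
pip install "transformers==4.36.2" "torch==2.1.2" "datasets==2.14.6" "tokenizers==0.15.0"
```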