metadata
base_model: PKU-Alignment/alpaca-7b-reproduced
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - PKU-Alignment/PKU-SafeRLHF
model-index:
  - name: dpo-selective-alpaca
    results: []

dpo-selective-alpaca

This model is a fine-tuned version of PKU-Alignment/alpaca-7b-reproduced on the PKU-Alignment/PKU-SafeRLHF dataset. It achieves the following results on the evaluation set:

  • Loss: 4659.3857
  • Rewards/chosen: -0.2274
  • Rewards/rejected: -0.2645
  • Rewards/accuracies: 0.6342
  • Rewards/margins: 0.0372
  • Rewards/safe Rewards: -0.2254
  • Rewards/unsafe Rewards: -0.2253
  • Logps/rejected: -174.8009
  • Logps/chosen: -202.5513
  • Logits/rejected: -1.7296
  • Logits/chosen: -1.5835
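
The reward metrics above use the naming common to DPO training logs, where the margin is the mean of chosen minus rejected reward and the accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. The following is a minimal sketch under the assumption that those definitions apply to this run; the `safe`/`unsafe` reward metrics are specific to this training setup and are not covered. The per-pair values and the `summarize` helper are hypothetical and only illustrate the calculation.

```python
# Reported evaluation metrics, copied from the list above.
reported = {
    "rewards/chosen": -0.2274,
    "rewards/rejected": -0.2645,
    "rewards/margins": 0.0372,
}


def summarize(chosen, rejected):
    """Return (mean margin, accuracy) for paired per-example rewards."""
    margins = [c - r for c, r in zip(chosen, rejected)]
    accuracy = sum(m > 0 for m in margins) / len(margins)
    return sum(margins) / len(margins), accuracy


# The mean margin equals mean(chosen) - mean(rejected), so the reported
# values should agree up to rounding of the four-decimal figures.
implied_margin = reported["rewards/chosen"] - reported["rewards/rejected"]
margin_gap = abs(implied_margin - reported["rewards/margins"])

# Hypothetical toy rewards for three preference pairs (not from this run).
toy_margin, toy_accuracy = summarize([0.1, -0.2, 0.3], [0.0, -0.1, 0.1])
```

Under this reading, the reported margin of 0.0372 matches the difference between the chosen and rejected rewards to within rounding.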

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
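
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates the general shape of a cosine schedule with a 10% linear warmup. It is an approximation written in plain Python, not the trainer's own scheduler code; the `cosine_lr` helper and the step count of 1000 are hypothetical, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4              # per device
eval_batch_size = 8               # per device
num_devices = 4
gradient_accumulation_steps = 4
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes across devices (and accumulation, for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps):
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count for illustration only.
example_total_steps = 1000
peak = cosine_lr(100, example_total_steps)
final = cosine_lr(example_total_steps, example_total_steps)
```

With these assumptions the rate climbs linearly to 5e-07 over the first tenth of training and decays to zero by the final step.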

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/safe Rewards | Rewards/unsafe Rewards | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4842.2766 | 0.11 | 500 | 4952.8877 | 0.0166 | 0.0096 | 0.6573 | 0.0070 | 0.0166 | 0.0165 | -147.3908 | -178.1579 | -1.7834 | -1.6386 |
| 4764.3852 | 0.22 | 1000 | 4865.9209 | -0.0099 | -0.0282 | 0.6644 | 0.0184 | -0.0094 | -0.0098 | -151.1701 | -180.8021 | -1.7281 | -1.5780 |
| 4814.1586 | 0.32 | 1500 | 4783.4697 | -0.1011 | -0.1298 | 0.6566 | 0.0286 | -0.1003 | -0.1009 | -161.3237 | -189.9300 | -1.7085 | -1.5581 |
| 4693.2395 | 0.43 | 2000 | 4735.1978 | -0.1597 | -0.1926 | 0.6480 | 0.0329 | -0.1583 | -0.1588 | -167.6019 | -195.7835 | -1.7080 | -1.5598 |
| 4747.273 | 0.54 | 2500 | 4701.7651 | -0.1978 | -0.2321 | 0.6416 | 0.0344 | -0.1960 | -0.1962 | -171.5614 | -199.5948 | -1.7166 | -1.5693 |
| 4464.0027 | 0.65 | 3000 | 4681.6167 | -0.2061 | -0.2411 | 0.6356 | 0.0350 | -0.2041 | -0.2043 | -172.4578 | -200.4294 | -1.7240 | -1.5768 |
| 4613.8953 | 0.75 | 3500 | 4667.7300 | -0.2201 | -0.2561 | 0.6333 | 0.0360 | -0.2182 | -0.2182 | -173.9565 | -201.8304 | -1.7289 | -1.5822 |
| 4642.2859 | 0.86 | 4000 | 4661.8745 | -0.2258 | -0.2627 | 0.6336 | 0.0369 | -0.2238 | -0.2238 | -174.6188 | -202.3950 | -1.7298 | -1.5833 |
| 4747.2375 | 0.97 | 4500 | 4659.3687 | -0.2266 | -0.2638 | 0.6363 | 0.0372 | -0.2246 | -0.2245 | -174.7243 | -202.4745 | -1.7302 | -1.5838 |

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0