
Llama-2-7b-gen-dpo-10k

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5523
  • Rewards/real: 4.9937
  • Rewards/generated: 2.9393
  • Rewards/accuracies: 0.8462
  • Rewards/margins: 2.0544
  • Logps/generated: -253.2762
  • Logps/real: -213.7525
  • Logits/generated: -0.8742
  • Logits/real: -0.8781
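
These reward metrics follow the usual DPO-style bookkeeping: Rewards/real and Rewards/generated are the implicit rewards assigned to the preferred ("real") and rejected ("generated") responses, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the real response is rewarded more. A minimal sketch of those relationships using the final evaluation numbers above (the DPO reading is an inference from the metric names, not stated elsewhere on this card):

```python
# Sketch only: the DPO interpretation of these metrics is an assumption based on
# the metric names; the numbers are copied from the evaluation results above.
rewards_real = 4.9937        # implicit reward for the preferred ("real") responses
rewards_generated = 2.9393   # implicit reward for the rejected ("generated") responses

margin = rewards_real - rewards_generated
print(round(margin, 4))      # 2.0544 -> matches Rewards/margins

# Rewards/accuracies (0.8462) is the per-pair average of this indicator:
pair_correct = float(rewards_real > rewards_generated)
```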

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
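
For reference, a minimal sketch of how these hyperparameters could be expressed with transformers.TrainingArguments; the output_dir and bf16 flag are assumptions (placeholders), not taken from the original training script:

```python
from transformers import TrainingArguments

# Sketch under the hyperparameters listed above; output_dir and bf16 are assumptions.
training_args = TrainingArguments(
    output_dir="Llama-2-7b-gen-dpo-10k",  # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumed from the BF16 checkpoint; the Adam betas/epsilon above are the library defaults
)
# Effective batch sizes: 4 GPUs x 4 per device x 2 accumulation = 32 (train); 4 GPUs x 4 = 16 (eval).
```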

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.9274        | 0.1984 | 62   | 0.9072          | 0.3392       | 0.2823            | 0.5769             | 0.0569          | -279.8465       | -260.2975  | -0.8569          | -0.8220     |
| 0.7991        | 0.3968 | 124  | 0.7609          | 1.1652       | 0.5108            | 0.75               | 0.6543          | -277.5608       | -252.0375  | -0.7345          | -0.6920     |
| 0.7105        | 0.5952 | 186  | 0.6948          | 2.1035       | 1.1496            | 0.75               | 0.9539          | -271.1730       | -242.6541  | -0.7212          | -0.6757     |
| 0.6956        | 0.7936 | 248  | 0.6513          | 2.5451       | 1.4131            | 0.7692             | 1.1320          | -268.5384       | -238.2380  | -0.7591          | -0.7111     |
| 0.6502        | 0.992  | 310  | 0.6210          | 2.9937       | 1.6865            | 0.8269             | 1.3073          | -265.8045       | -233.7518  | -0.8166          | -0.7765     |
| 0.5016        | 1.1904 | 372  | 0.5914          | 3.5957       | 2.0722            | 0.8269             | 1.5235          | -261.9469       | -227.7318  | -0.8192          | -0.7953     |
| 0.5296        | 1.3888 | 434  | 0.5809          | 4.1921       | 2.5569            | 0.8462             | 1.6352          | -257.1006       | -221.7682  | -0.8477          | -0.8234     |
| 0.4344        | 1.5872 | 496  | 0.5769          | 4.4690       | 2.6897            | 0.8462             | 1.7792          | -255.7717       | -218.9994  | -0.8474          | -0.8298     |
| 0.513         | 1.7856 | 558  | 0.5656          | 4.6486       | 2.8940            | 0.8462             | 1.7546          | -253.7296       | -217.2037  | -0.8719          | -0.8539     |
| 0.4632        | 1.984  | 620  | 0.5639          | 4.7129       | 2.9278            | 0.8462             | 1.7851          | -253.3908       | -216.5599  | -0.8339          | -0.8251     |
| 0.391         | 2.1824 | 682  | 0.5555          | 4.8380       | 2.9011            | 0.8462             | 1.9369          | -253.6578       | -215.3090  | -0.8728          | -0.8686     |
| 0.3823        | 2.3808 | 744  | 0.5525          | 4.9421       | 2.9736            | 0.8462             | 1.9685          | -252.9326       | -214.2682  | -0.8613          | -0.8617     |
| 0.3705        | 2.5792 | 806  | 0.5512          | 4.9861       | 2.9682            | 0.8654             | 2.0178          | -252.9866       | -213.8285  | -0.8641          | -0.8686     |
| 0.3718        | 2.7776 | 868  | 0.5555          | 5.0071       | 2.9724            | 0.8462             | 2.0347          | -252.9452       | -213.6185  | -0.8636          | -0.8680     |
| 0.4001        | 2.976  | 930  | 0.5523          | 4.9937       | 2.9393            | 0.8462             | 2.0544          | -253.2762       | -213.7525  | -0.8742          | -0.8781     |
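
As a back-of-the-envelope check on the run length (an inference from the table and the hyperparameters above, not stated on the card): roughly 310 optimizer steps per epoch at an effective batch size of 32 corresponds to about 9,900 preference pairs, consistent with the "10k" in the model name.

```python
# Rough sanity check; the "10k" interpretation is an assumption, not from the card.
total_steps, epochs = 930, 3
steps_per_epoch = total_steps / epochs    # 310
effective_train_batch = 4 * 4 * 2         # per-device batch * GPUs * grad accumulation
approx_examples = steps_per_epoch * effective_train_batch
print(approx_examples)                    # 9920.0 -> ~10k training pairs
```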

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
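
A minimal usage sketch, assuming the checkpoint loads through the standard Llama-2 causal-LM path in transformers; the prompt and generation settings are illustrative, not from the original evaluation:

```python
# Sketch only: loading and prompt are illustrative; bfloat16 is assumed from the
# BF16 checkpoint, and device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Llama-2-7b-gen-dpo-10k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain direct preference optimization in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```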