pythia410m-dpo-tldr-lr1e-5
This model is a fine-tuned version of mnoukhov/pythia410m-sft-tldr on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5595
- Rewards/chosen: -0.9059
- Rewards/rejected: -1.3735
- Rewards/accuracies: 0.7113
- Rewards/margins: 0.4677
- Logps/rejected: -88.3830
- Logps/chosen: -88.3830
- Logps/ref Rejected: -63.5119
- Logps/ref Chosen: -70.2656
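The reward figures above can be cross-checked against the listed log-probabilities. The sketch below assumes the standard DPO implicit reward, r = beta * (policy log-prob - reference log-prob). The card does not state beta; the value 0.05 is an assumption, picked only because it reproduces the reported chosen reward. The function and variable names are illustrative.

```python
def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """DPO implicit reward: beta-scaled log-ratio of policy to reference."""
    return beta * (policy_logp - ref_logp)


BETA = 0.05  # assumption: not stated in the card, inferred from the chosen-side numbers

# Values copied from the evaluation results listed above.
rewards_chosen = -0.9059
rewards_rejected = -1.3735
logps_chosen = -88.3830
ref_logps_chosen = -70.2656

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Recompute the chosen reward from the log-probabilities.
chosen_from_logps = implicit_reward(logps_chosen, ref_logps_chosen, BETA)

print(f"margin from rewards:      {margin:.4f}")
print(f"chosen reward from logps: {chosen_from_logps:.4f}")
```

The recomputed margin agrees with the reported Rewards/margins up to rounding of the four-decimal inputs. Under the same assumption, the listed Logps/rejected value does not reproduce Rewards/rejected, and it is identical to Logps/chosen, so that value may have been logged incorrectly.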
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
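To illustrate what `lr_scheduler_type: cosine` implies for the learning rate, here is a minimal sketch of a half-cosine decay from the peak rate to zero. It assumes no warmup (none is listed above) and a hypothetical total of 1455 optimizer steps, extrapolated from the evaluation interval of 291 steps per 0.2 epoch; neither value is confirmed by the card.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    """Half-cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1455  # assumption: extrapolated from 291 steps per 0.2 epoch

# Learning rate at each evaluation checkpoint under these assumptions.
for step in (0, 291, 582, 873, 1164, 1455):
    print(f"step {step:4d}: lr = {cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under this schedule the learning rate is already well below its peak by the later evaluation checkpoints, which may be worth keeping in mind when reading the training results.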
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Rejected | Logps/ref Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6295 | 0.2 | 291 | 0.5864 | -0.5101 | -0.8319 | 0.7039 | 0.3218 | -80.4685 | -80.4685 | -63.5119 | -70.2656 |
| 0.5926 | 0.4 | 582 | 0.5600 | -0.9009 | -1.3738 | 0.7120 | 0.4728 | -88.2839 | -88.2839 | -63.5119 | -70.2656 |
| 0.5761 | 0.6 | 873 | 0.5585 | -0.9509 | -1.4326 | 0.7110 | 0.4817 | -89.2846 | -89.2846 | -63.5119 | -70.2656 |
| 0.5678 | 0.8 | 1164 | 0.5595 | -0.9059 | -1.3735 | 0.7113 | 0.4677 | -88.3830 | -88.3830 | -63.5119 | -70.2656 |
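The Step and Epoch columns, together with the batch settings listed under the training hyperparameters, give a rough sense of the run's scale. The arithmetic below is a sketch: `num_devices` is a placeholder (the card lists multi-GPU training but no device count) set to the value that reproduces the reported total batch size of 64, and the epoch values in the table are rounded, so the results are approximate.

```python
# Values taken from the hyperparameter list in this card.
per_device_batch = 16
grad_accum_steps = 4
num_devices = 1  # assumption: not listed; chosen so the product matches the reported 64

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# The first evaluation row reports step 291 at epoch 0.2.
steps_per_epoch = round(291 / 0.2)

# Approximate number of training examples seen in one epoch.
approx_examples_per_epoch = steps_per_epoch * effective_batch

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch (approx.): {steps_per_epoch}")
print(f"examples per epoch (approx.): {approx_examples_per_epoch}")
```

Under these assumptions, one epoch corresponds to about 1455 optimizer steps, so the last row in the table (step 1164) is the final evaluation before the end of training rather than the end of the epoch.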
Framework versions
- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
Model tree for mnoukhov/pythia410m-dpo-tldr-lr1e-5
- Base model: EleutherAI/pythia-410m-deduped
- Finetuned from: mnoukhov/pythia410m-sft-tldr