---
base_model: HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: smollm2-135M-8k-lc100k-dpo-ultaf-ep2
  results: []
---
smollm2-135M-8k-lc100k-dpo-ultaf-ep2
This model is a fine-tuned version of HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2, trained with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.6741
- Rewards/chosen: -0.0719
- Rewards/rejected: -0.3407
- Rewards/accuracies: 0.6151
- Rewards/margins: 0.2687
- Logps/rejected: -378.1583
- Logps/chosen: -443.6482
- Logits/rejected: 4.9520
- Logits/chosen: 4.6009
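The reward metrics above follow the usual DPO bookkeeping. The following is a minimal sketch that assumes TRL's default sigmoid DPO loss and a β of 0.1. The card does not report β, so 0.1 (TRL's default) is an assumption. Under this setup, the implicit reward of a completion is β times the gap between the policy's and the reference model's summed log-probability of that completion. `Rewards/margins` is the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of pairs with a positive margin. The loss is the mean over pairs of -log σ(margin). In TRL's logging, `Logps/chosen` and `Logps/rejected` are the policy's summed log-probabilities of each completion. The function and class names below are illustrative, not a library API.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Assumption: the DPO beta for this run is not stated in the card;
# 0.1 (TRL's default) is used purely for illustration.
BETA = 0.1


@dataclass
class DPOBatchMetrics:
    loss: float
    rewards_chosen: float
    rewards_rejected: float
    rewards_accuracy: float
    rewards_margin: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = BETA,
) -> DPOBatchMetrics:
    """Sigmoid DPO loss and reward statistics from summed sequence log-probs."""
    n = len(policy_chosen)
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen_r = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_r = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_r, rejected_r)]
    # Per-pair loss is -log(sigmoid(margin)); the reported loss is the batch mean.
    losses = [-log_sigmoid(m) for m in margins]
    return DPOBatchMetrics(
        loss=sum(losses) / n,
        rewards_chosen=sum(chosen_r) / n,
        rewards_rejected=sum(rejected_r) / n,
        rewards_accuracy=sum(m > 0 for m in margins) / n,
        rewards_margin=sum(margins) / n,
    )


if __name__ == "__main__":
    # Toy log-probs: the policy has moved toward the chosen completion
    # and away from the rejected one, relative to the reference model.
    print(dpo_metrics([-9.0], [-13.0], [-10.0], [-12.0]))
```

While the policy still equals the reference model, every margin is 0 and the loss is ln 2 ≈ 0.693. This gives a natural baseline for reading the validation losses in the results table. Because the loss is averaged per pair, the reported loss generally differs from -log σ applied to the mean margin.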
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
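The batch-size entries are related by simple arithmetic: 2 pairs per device × 8 devices × 8 accumulation steps gives 128 pairs per optimizer step, and 4 × 8 gives the evaluation batch of 32. The scheduler entries describe linear warmup over the first 10% of optimizer steps, followed by cosine decay. The sketch below re-implements that shape in plain Python, modeled on the formula used by `transformers.get_cosine_schedule_with_warmup`. Rounding the warmup length up with `ceil` is an assumption. The run length of about 955 steps is also an estimate, derived from the `Epoch` and `Step` columns of the results table rather than taken from a logged value.

```python
import math

# Hyperparameters copied from the list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-6
WARMUP_RATIO = 0.1


def effective_train_batch_size() -> int:
    """Preference pairs consumed per optimizer step."""
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def effective_eval_batch_size() -> int:
    """Evaluation uses no gradient accumulation."""
    return PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0.

    Assumption: warmup length is ceil(WARMUP_RATIO * total_steps).
    """
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Hypothetical run length: roughly 955 optimizer steps for 2 epochs,
    # estimated from the results table, not a logged value.
    total = 955
    print(effective_train_batch_size(), effective_eval_batch_size())  # 128 32
    for s in (0, 48, 96, 500, 955):
        print(s, cosine_lr_with_warmup(s, total))
```

With these numbers, the learning rate reaches its 1e-06 peak after about 96 steps, roughly a fifth of the first epoch, and then decays toward zero by the end of epoch 2.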
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7296 | 0.2094 | 100 | 0.7357 | 0.0117 | -0.0252 | 0.5516 | 0.0369 | -377.5274 | -443.4810 | 5.1272 | 4.7554 |
0.7062 | 0.4187 | 200 | 0.6988 | -0.0251 | -0.0968 | 0.5675 | 0.0717 | -377.6706 | -443.5545 | 5.0879 | 4.7255 |
0.6782 | 0.6281 | 300 | 0.6943 | -0.0323 | -0.2031 | 0.5675 | 0.1708 | -377.8831 | -443.5688 | 5.0161 | 4.6621 |
0.6863 | 0.8375 | 400 | 0.6757 | -0.0882 | -0.2789 | 0.5992 | 0.1907 | -378.0348 | -443.6808 | 4.9992 | 4.6459 |
0.6836 | 1.0468 | 500 | 0.6708 | -0.0957 | -0.3325 | 0.6349 | 0.2368 | -378.1419 | -443.6958 | 4.9696 | 4.6170 |
0.6349 | 1.2562 | 600 | 0.6720 | -0.0539 | -0.3214 | 0.5992 | 0.2675 | -378.1197 | -443.6121 | 4.9707 | 4.6203 |
0.6427 | 1.4656 | 700 | 0.6796 | -0.0877 | -0.3456 | 0.6032 | 0.2579 | -378.1681 | -443.6797 | 4.9430 | 4.5920 |
0.6128 | 1.6750 | 800 | 0.6704 | -0.0604 | -0.3680 | 0.6071 | 0.3075 | -378.2128 | -443.6252 | 4.9689 | 4.6106 |
0.6474 | 1.8843 | 900 | 0.6692 | -0.0590 | -0.3703 | 0.6270 | 0.3113 | -378.2174 | -443.6223 | 4.9211 | 4.5737 |
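The logged columns can be cross-checked against one another. In this sketch, the rows are copied from the table above, keeping only the columns used. The helpers do three things: check that `Rewards/margins` equals `Rewards/chosen` minus `Rewards/rejected` up to four-decimal rounding, estimate optimizer steps per epoch from the `Step` and `Epoch` columns, and pick the step with the lowest validation loss. The helper names are hypothetical.

```python
from typing import List, Tuple

# (step, epoch, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
Row = Tuple[int, float, float, float, float, float]

ROWS: List[Row] = [
    (100, 0.2094, 0.7357, 0.0117, -0.0252, 0.0369),
    (200, 0.4187, 0.6988, -0.0251, -0.0968, 0.0717),
    (300, 0.6281, 0.6943, -0.0323, -0.2031, 0.1708),
    (400, 0.8375, 0.6757, -0.0882, -0.2789, 0.1907),
    (500, 1.0468, 0.6708, -0.0957, -0.3325, 0.2368),
    (600, 1.2562, 0.6720, -0.0539, -0.3214, 0.2675),
    (700, 1.4656, 0.6796, -0.0877, -0.3456, 0.2579),
    (800, 1.6750, 0.6704, -0.0604, -0.3680, 0.3075),
    (900, 1.8843, 0.6692, -0.0590, -0.3703, 0.3113),
]


def margins_consistent(rows: List[Row], tol: float = 2e-4) -> bool:
    """True if margin == chosen - rejected for every row, within rounding."""
    return all(abs((c - r) - m) <= tol for _, _, _, c, r, m in rows)


def steps_per_epoch(rows: List[Row]) -> float:
    """Average optimizer steps per epoch implied by the Step/Epoch columns."""
    return sum(step / epoch for step, epoch, *_ in rows) / len(rows)


def best_validation_row(rows: List[Row]) -> Row:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    print(margins_consistent(ROWS))
    print(round(steps_per_epoch(ROWS), 1))
    print(best_validation_row(ROWS))
```

On these rows, the implied figure is roughly 477 to 478 optimizer steps per epoch. At 128 pairs per step, that is about 61k preference pairs per epoch. The figure is approximate because the logged epoch values are rounded. The lowest logged validation loss (0.6692) occurs at step 900.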
Framework versions
- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1