---
base_model: HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: smollm2-135M-8k-lc100k-dpo-ultaf-ep2
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/loubnabnl/huggingface/runs/3el89rp6)

# smollm2-135M-8k-lc100k-dpo-ultaf-ep2

This model is a version of [HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2](https://huggingface.co/HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2) fine-tuned with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6741
- Rewards/chosen: -0.0719
- Rewards/rejected: -0.3407
- Rewards/accuracies: 0.6151
- Rewards/margins: 0.2687
- Logps/rejected: -378.1583
- Logps/chosen: -443.6482
- Logits/rejected: 4.9520
- Logits/chosen: 4.6009

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7296        | 0.2094 | 100  | 0.7357          | 0.0117         | -0.0252          | 0.5516             | 0.0369          | -377.5274      | -443.4810    | 5.1272          | 4.7554        |
| 0.7062        | 0.4187 | 200  | 0.6988          | -0.0251        | -0.0968          | 0.5675             | 0.0717          | -377.6706      | -443.5545    | 5.0879          | 4.7255        |
| 0.6782        | 0.6281 | 300  | 0.6943          | -0.0323        | -0.2031          | 0.5675             | 0.1708          | -377.8831      | -443.5688    | 5.0161          | 4.6621        |
| 0.6863        | 0.8375 | 400  | 0.6757          | -0.0882        | -0.2789          | 0.5992             | 0.1907          | -378.0348      | -443.6808    | 4.9992          | 4.6459        |
| 0.6836        | 1.0468 | 500  | 0.6708          | -0.0957        | -0.3325          | 0.6349             | 0.2368          | -378.1419      | -443.6958    | 4.9696          | 4.6170        |
| 0.6349        | 1.2562 | 600  | 0.6720          | -0.0539        | -0.3214          | 0.5992             | 0.2675          | -378.1197      | -443.6121    | 4.9707          | 4.6203        |
| 0.6427        | 1.4656 | 700  | 0.6796          | -0.0877        | -0.3456          | 0.6032             | 0.2579          | -378.1681      | -443.6797    | 4.9430          | 4.5920        |
| 0.6128        | 1.6750 | 800  | 0.6704          | -0.0604        | -0.3680          | 0.6071             | 0.3075          | -378.2128      | -443.6252    | 4.9689          | 4.6106        |
| 0.6474        | 1.8843 | 900  | 0.6692          | -0.0590        | -0.3703          | 0.6270             | 0.3113          | -378.2174      | -443.6223    | 4.9211          | 4.5737        |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
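The `Rewards/*` columns follow the standard DPO definitions. Each response's implicit reward is β times the gap between its policy and reference log-probabilities. The margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs whose margin is positive. The plain-Python sketch below recomputes these metrics for a batch of pairs. It is not the trainer's own code. The card does not report β, so the value 0.1 is an assumption, and `dpo_metrics` and `softplus` are hypothetical helper names.

Because margins are averaged linearly, the mean margin equals the mean chosen reward minus the mean rejected reward. The table satisfies this; at step 100, 0.0117 − (−0.0252) = 0.0369.

```python
import math

# Assumption: the DPO temperature beta is not reported in this card;
# 0.1 is used here purely for illustration.
BETA = 0.1


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Hypothetical helper computing DPO-style metrics for a batch of pairs.

    Each argument is a list of summed log-probabilities log p(y|x),
    one entry per preference pair.
    """
    n = len(policy_chosen)
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) = softplus(-margin).
    losses = [softplus(-m) for m in margins]

    def mean(xs):
        return sum(xs) / n

    return {
        "loss": mean(losses),
        "rewards/chosen": mean(rewards_chosen),
        "rewards/rejected": mean(rewards_rejected),
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
        "rewards/margins": mean(margins),
    }


metrics = dpo_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-15.0, -14.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-14.0, -14.0],
)
print(metrics)
```

The loss is averaged per pair rather than computed from the mean margin. For that reason, the reported Loss cannot be reproduced from the `Rewards/margins` column alone.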
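The hyperparameter list gives the per-device batch sizes, the device count and the gradient-accumulation steps. The totals follow from these: 2 × 8 × 8 = 128 for training and 4 × 8 = 32 for evaluation. The sketch below reproduces that arithmetic and a linear-warmup cosine learning-rate curve. The curve mirrors the shape of Transformers' `get_cosine_schedule_with_warmup` with its default half cycle.

The card does not state the total number of optimizer steps. Here it is estimated from the results table, where step 100 corresponds to epoch 0.2094, so the step counts are approximations rather than logged values. The function name `cosine_lr` is hypothetical.

```python
import math

# Values copied from the hyperparameter list in this card.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-6
WARMUP_RATIO = 0.1
NUM_EPOCHS = 2

# Effective batch sizes: gradient accumulation multiplies only the train batch.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: optimizer steps per epoch estimated from the results table
# (step 100 is logged at epoch 0.2094, i.e. roughly 477.6 steps per epoch).
STEPS_PER_EPOCH = math.ceil(100 / 0.2094)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def cosine_lr(step, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS, peak_lr=PEAK_LR):
    """Hypothetical helper: linear warmup to peak_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch, total_eval_batch, TOTAL_STEPS, WARMUP_STEPS)
for s in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(s, cosine_lr(s))
```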