---
base_model: HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: smollm2-135M-8k-lc100k-dpo-ultaf-ep2
    results: []
---

smollm2-135M-8k-lc100k-dpo-ultaf-ep2

This model is a fine-tuned version of HuggingFaceTB/smollm2-135M-8k-lc100k-mix1-ep2 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6741
  • Rewards/chosen: -0.0719
  • Rewards/rejected: -0.3407
  • Rewards/accuracies: 0.6151
  • Rewards/margins: 0.2687
  • Logps/rejected: -378.1583
  • Logps/chosen: -443.6482
  • Logits/rejected: 4.9520
  • Logits/chosen: 4.6009
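In DPO, the reported rewards are β-scaled log-probability ratios between the policy and the reference model, so the reward margin is simply the chosen reward minus the rejected reward. A quick sanity check on the evaluation numbers above (plain Python, values copied from this card):

```python
# DPO reward margin identity: Rewards/margins = Rewards/chosen - Rewards/rejected.
# The values below are the evaluation results reported in this card.
rewards_chosen = -0.0719
rewards_rejected = -0.3407

margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # → margin = 0.2688
```

This agrees with the reported Rewards/margins of 0.2687 up to rounding of the two inputs.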

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
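The total batch sizes listed above follow from the per-device sizes, the number of devices, and gradient accumulation (which applies only to training, not evaluation). A quick check of the arithmetic:

```python
# Effective (total) batch sizes from the per-device settings above.
train_batch_size = 2             # per device
eval_batch_size = 4              # per device
num_devices = 8
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # → 128
print(total_eval_batch_size)   # → 32
```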

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7296        | 0.2094 | 100  | 0.7357          | 0.0117         | -0.0252          | 0.5516             | 0.0369          | -377.5274      | -443.4810    | 5.1272          | 4.7554        |
| 0.7062        | 0.4187 | 200  | 0.6988          | -0.0251        | -0.0968          | 0.5675             | 0.0717          | -377.6706      | -443.5545    | 5.0879          | 4.7255        |
| 0.6782        | 0.6281 | 300  | 0.6943          | -0.0323        | -0.2031          | 0.5675             | 0.1708          | -377.8831      | -443.5688    | 5.0161          | 4.6621        |
| 0.6863        | 0.8375 | 400  | 0.6757          | -0.0882        | -0.2789          | 0.5992             | 0.1907          | -378.0348      | -443.6808    | 4.9992          | 4.6459        |
| 0.6836        | 1.0468 | 500  | 0.6708          | -0.0957        | -0.3325          | 0.6349             | 0.2368          | -378.1419      | -443.6958    | 4.9696          | 4.6170        |
| 0.6349        | 1.2562 | 600  | 0.6720          | -0.0539        | -0.3214          | 0.5992             | 0.2675          | -378.1197      | -443.6121    | 4.9707          | 4.6203        |
| 0.6427        | 1.4656 | 700  | 0.6796          | -0.0877        | -0.3456          | 0.6032             | 0.2579          | -378.1681      | -443.6797    | 4.9430          | 4.5920        |
| 0.6128        | 1.6750 | 800  | 0.6704          | -0.0604        | -0.3680          | 0.6071             | 0.3075          | -378.2128      | -443.6252    | 4.9689          | 4.6106        |
| 0.6474        | 1.8843 | 900  | 0.6692          | -0.0590        | -0.3703          | 0.6270             | 0.3113          | -378.2174      | -443.6223    | 4.9211          | 4.5737        |
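The per-pair DPO loss is −log σ(margin), where the margin already includes the β scaling. Note that the validation loss reported above is the mean of per-example losses, not the loss of the mean margin, so evaluating the formula at the final reported margin (0.3113) gives a lower number than the reported 0.6692; the sketch below only illustrates the formula:

```python
import math

def dpo_loss(margin: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin)).

    `margin` is rewards/chosen - rewards/rejected, already beta-scaled.
    log1p(exp(-m)) is a numerically stable form of -log(sigmoid(m)).
    """
    return math.log1p(math.exp(-margin))

# Final evaluation margin from the table above.
loss = dpo_loss(0.3113)
print(f"{loss:.4f}")
```

At margin 0 the loss is ln 2 ≈ 0.6931, which is why DPO losses typically start near 0.69 and decrease as the margin grows.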

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1