Llama-2-7b-hf-eval_threapist-ORPO-filtered-0.2-version-1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf (the training dataset is not specified in the auto-generated card). It achieves the following results on the evaluation set:

  • Loss: 0.5471
  • Rewards/chosen: -0.1206
  • Rewards/rejected: -0.1280
  • Rewards/accuracies: 0.6500
  • Rewards/margins: 0.0074
  • Logps/rejected: -1.2803
  • Logps/chosen: -1.2064
  • Logits/rejected: -1.4351
  • Logits/chosen: -1.4353
  • Nll Loss: 0.4721
  • Log Odds Ratio: -0.7502
  • Log Odds Chosen: 0.0744
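As a quick sanity check, the metrics above hang together under ORPO's loss decomposition as implemented in TRL: total loss = NLL loss − λ · (log-odds-ratio term), and the reward metrics are the scaled mean log-probabilities, rewards = λ · logps. The weight λ (TRL's `beta`) is not stated in the card, so the sketch below assumes TRL's default of 0.1, which the numbers are consistent with:

```python
# Sanity-check the reported eval metrics against ORPO's loss decomposition.
# Assumption (not stated in the card): TRL's convention
#   loss = nll_loss - beta * log_odds_ratio,   rewards = beta * logps
# with beta = 0.1 (TRL's default).
beta = 0.1

nll_loss = 0.4721
log_odds_ratio = -0.7502
reported_loss = 0.5471

reconstructed_loss = nll_loss - beta * log_odds_ratio
print(round(reconstructed_loss, 4))  # -> 0.5471, matching the reported Loss

# Rewards are the scaled mean log-probabilities of each completion.
logps_chosen, logps_rejected = -1.2064, -1.2803
print(round(beta * logps_chosen, 4))    # -> -0.1206 (Rewards/chosen)
print(round(beta * logps_rejected, 4))  # -> -0.128  (Rewards/rejected)
```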

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
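The total_train_batch_size above is derived, not set directly: it is the per-device batch size multiplied by the gradient accumulation steps (and by the device count, which the numbers imply is 1, though the card does not say so):

```python
# How total_train_batch_size = 4 follows from the settings above.
train_batch_size = 2
gradient_accumulation_steps = 2
num_devices = 1  # assumption: not stated in the card, but implied by 2 * 2 = 4

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # -> 4
```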

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|----------|----------------|-----------------|
| 0.8049        | 0.6   | 141  | 0.8303          | -0.2121        | -0.2243          | 0.6500             | 0.0122          | -2.2432        | -2.1211      | -0.7703         | -0.7698       | 0.7515   | -0.7874        | 0.1083          |
| 0.5299        | 1.2   | 282  | 0.5796          | -0.1515        | -0.1597          | 0.6500             | 0.0081          | -1.5968        | -1.5153      | -1.2952         | -1.2957       | 0.5038   | -0.7583        | 0.0737          |
| 0.595         | 1.8   | 423  | 0.5535          | -0.1227        | -0.1296          | 0.6500             | 0.0070          | -1.2962        | -1.2265      | -1.4015         | -1.4015       | 0.4782   | -0.7532        | 0.0683          |
| 0.6024        | 2.4   | 564  | 0.5476          | -0.1216        | -0.1293          | 0.6500             | 0.0077          | -1.2929        | -1.2164      | -1.4239         | -1.4238       | 0.4727   | -0.7491        | 0.0768          |
| 0.59          | 3.0   | 705  | 0.5471          | -0.1206        | -0.1280          | 0.6500             | 0.0074          | -1.2803        | -1.2064      | -1.4351         | -1.4353       | 0.4721   | -0.7502        | 0.0744          |
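Within each evaluation row, Rewards/margins is simply Rewards/chosen minus Rewards/rejected (to the rounding shown). A minimal sketch that checks this across all five eval points:

```python
# Verify Rewards/margins = Rewards/chosen - Rewards/rejected in every eval row,
# up to the 4-decimal rounding used in the table.
rows = [
    # (epoch, rewards_chosen, rewards_rejected, rewards_margins)
    (0.6, -0.2121, -0.2243, 0.0122),
    (1.2, -0.1515, -0.1597, 0.0081),
    (1.8, -0.1227, -0.1296, 0.0070),
    (2.4, -0.1216, -0.1293, 0.0077),
    (3.0, -0.1206, -0.1280, 0.0074),
]
for epoch, chosen, rejected, margin in rows:
    assert abs((chosen - rejected) - margin) < 5e-4, f"mismatch at epoch {epoch}"
print("all margins consistent")
```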

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Model tree for LBK95/Llama-2-7b-hf-eval_threapist-ORPO-filtered-0.2-version-1

This model is a PEFT adapter of meta-llama/Llama-2-7b-hf.