
BEE-spoke-data/zephyr-220m-dpo-full-GGUF

Quantized GGUF model files for zephyr-220m-dpo-full from BEE-spoke-data
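
Below is a minimal usage sketch showing how one of these GGUF files might be downloaded and run locally with `huggingface_hub` and `llama-cpp-python`. The repo id, the quantization filename, and the zephyr-style prompt template are assumptions (not stated on this page); list the repository files first to find the exact names.

```python
# Hedged usage sketch: repo id, filename, and prompt template below are assumptions.
from huggingface_hub import hf_hub_download, list_repo_files
from llama_cpp import Llama

repo_id = "afrideva/zephyr-220m-dpo-full-GGUF"  # assumed repo id

# Discover the available GGUF files (one per quantization level).
print(list_repo_files(repo_id))

# Download one quantization and load it with llama.cpp.
model_path = hf_hub_download(
    repo_id=repo_id,
    filename="zephyr-220m-dpo-full.q4_k_m.gguf",  # assumed filename, check the file list
)
llm = Llama(model_path=model_path, n_ctx=2048)

# Zephyr-style chat template, assumed to match the upstream SFT model.
prompt = (
    "<|system|>\nYou are a helpful assistant.</s>\n"
    "<|user|>\nWhat is the GGUF file format?</s>\n"
    "<|assistant|>\n"
)
out = llm(prompt, max_tokens=128, stop=["</s>"])
print(out["choices"][0]["text"])
```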

Original Model Card:

zephyr-220m-dpo-full

This model is a DPO fine-tuned version of amazingvince/zephyr-220m-sft-full; the preference dataset used is not specified in the original card. It achieves the following results on the evaluation set:

  • Loss: 0.5608
  • Rewards/chosen: 0.4691
  • Rewards/rejected: -0.0455
  • Rewards/accuracies: 0.6930
  • Rewards/margins: 0.5145
  • Logps/rejected: -438.4595
  • Logps/chosen: -544.6858
  • Logits/rejected: -4.0092
  • Logits/chosen: -3.9839
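
For context, these metric names follow the standard DPO convention: the "rewards" are the implicit DPO rewards, i.e. the gap between policy and reference log-probabilities scaled by the DPO temperature β (the β used for this run is not stated in the card), and the margin and accuracy columns are derived from the chosen/rejected rewards:

```latex
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

Rewards/accuracies is the fraction of evaluation pairs whose margin is positive. Consistent with this, 0.4691 - (-0.0455) ≈ 0.5145 in the figures above, up to rounding.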

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
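
The original training script is not included in the card. As a hedged illustration only, the hyperparameters above might map onto a TRL DPOTrainer setup roughly as follows; the preference dataset, DPO β, sequence lengths, and the use of TRL itself are assumptions rather than facts from the card.

```python
# Hedged sketch only: the original training script is not published in this card.
# The dataset id, beta, and sequence-length values are placeholders/assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "amazingvince/zephyr-220m-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id)
ref_model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with "prompt"/"chosen"/"rejected" columns (placeholder dataset id).
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")  # assumption

# Per-device batch sizes of 8 (train) and 4 (eval) on 2 GPUs give the
# totals of 16 and 8 listed above.
args = TrainingArguments(
    output_dir="zephyr-220m-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                # assumption: beta is not stated in the card
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    max_length=1024,         # assumption
    max_prompt_length=512,   # assumption
)
trainer.train()
```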

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6906 | 0.03 | 100 | 0.6932 | 0.0008 | 0.0007 | 0.4860 | 0.0002 | -437.9984 | -549.3683 | -4.0893 | -4.0515 |
| 0.6844 | 0.05 | 200 | 0.6855 | 0.0323 | 0.0173 | 0.5640 | 0.0150 | -437.8319 | -549.0540 | -4.0871 | -4.0501 |
| 0.6685 | 0.08 | 300 | 0.6675 | 0.1075 | 0.0537 | 0.6160 | 0.0538 | -437.4682 | -548.3016 | -4.0788 | -4.0432 |
| 0.6579 | 0.1 | 400 | 0.6426 | 0.2153 | 0.0941 | 0.6430 | 0.1212 | -437.0637 | -547.2234 | -4.0645 | -4.0309 |
| 0.6331 | 0.13 | 500 | 0.6241 | 0.2980 | 0.1106 | 0.6430 | 0.1874 | -436.8989 | -546.3970 | -4.0525 | -4.0221 |
| 0.6229 | 0.15 | 600 | 0.6138 | 0.3428 | 0.1103 | 0.6580 | 0.2325 | -436.9023 | -545.9487 | -4.0402 | -4.0116 |
| 0.6008 | 0.18 | 700 | 0.6053 | 0.3822 | 0.0970 | 0.6560 | 0.2852 | -437.0354 | -545.5550 | -4.0301 | -4.0042 |
| 0.5751 | 0.21 | 800 | 0.5998 | 0.4077 | 0.0879 | 0.6540 | 0.3198 | -437.1260 | -545.2994 | -4.0359 | -4.0099 |
| 0.6485 | 0.23 | 900 | 0.5922 | 0.4208 | 0.0655 | 0.6600 | 0.3553 | -437.3501 | -545.1683 | -4.0167 | -3.9936 |
| 0.6164 | 0.26 | 1000 | 0.5880 | 0.4046 | 0.0287 | 0.6620 | 0.3759 | -437.7182 | -545.3309 | -4.0092 | -3.9869 |
| 0.6225 | 0.28 | 1100 | 0.5852 | 0.4058 | 0.0110 | 0.6680 | 0.3948 | -437.8951 | -545.3189 | -4.0240 | -3.9984 |
| 0.6289 | 0.31 | 1200 | 0.5824 | 0.4127 | 0.0078 | 0.6670 | 0.4048 | -437.9265 | -545.2498 | -4.0253 | -3.9994 |
| 0.5818 | 0.34 | 1300 | 0.5818 | 0.4222 | 0.0097 | 0.6680 | 0.4125 | -437.9080 | -545.1544 | -4.0212 | -3.9953 |
| 0.567 | 0.36 | 1400 | 0.5797 | 0.4098 | -0.0141 | 0.6730 | 0.4238 | -438.1456 | -545.2791 | -4.0333 | -4.0062 |
| 0.5659 | 0.39 | 1500 | 0.5790 | 0.4204 | -0.0154 | 0.6780 | 0.4358 | -438.1591 | -545.1725 | -4.0245 | -3.9963 |
| 0.5993 | 0.41 | 1600 | 0.5783 | 0.4161 | -0.0285 | 0.6720 | 0.4446 | -438.2904 | -545.2161 | -4.0185 | -3.9907 |
| 0.5999 | 0.44 | 1700 | 0.5767 | 0.4067 | -0.0468 | 0.6840 | 0.4535 | -438.4729 | -545.3095 | -4.0207 | -3.9935 |
| 0.6004 | 0.46 | 1800 | 0.5731 | 0.4233 | -0.0394 | 0.6830 | 0.4627 | -438.3991 | -545.1437 | -4.0219 | -3.9944 |
| 0.5349 | 0.49 | 1900 | 0.5720 | 0.4285 | -0.0429 | 0.6830 | 0.4714 | -438.4335 | -545.0914 | -4.0295 | -4.0012 |
| 0.5377 | 0.52 | 2000 | 0.5702 | 0.4255 | -0.0540 | 0.6850 | 0.4795 | -438.5449 | -545.1220 | -4.0290 | -4.0009 |
| 0.4988 | 0.54 | 2100 | 0.5713 | 0.4347 | -0.0548 | 0.6840 | 0.4895 | -438.5533 | -545.0299 | -4.0317 | -4.0039 |
| 0.6093 | 0.57 | 2200 | 0.5706 | 0.4464 | -0.0456 | 0.6810 | 0.4920 | -438.4607 | -544.9128 | -4.0288 | -4.0014 |
| 0.5356 | 0.59 | 2300 | 0.5689 | 0.4484 | -0.0486 | 0.6880 | 0.4971 | -438.4912 | -544.8922 | -4.0257 | -3.9986 |
| 0.5753 | 0.62 | 2400 | 0.5681 | 0.4596 | -0.0441 | 0.6850 | 0.5037 | -438.4457 | -544.7802 | -4.0100 | -3.9846 |
| 0.5709 | 0.65 | 2500 | 0.5673 | 0.4693 | -0.0387 | 0.6910 | 0.5081 | -438.3924 | -544.6835 | -4.0100 | -3.9849 |
| 0.5565 | 0.67 | 2600 | 0.5665 | 0.4692 | -0.0401 | 0.6820 | 0.5092 | -438.4054 | -544.6850 | -4.0096 | -3.9843 |
| 0.585 | 0.7 | 2700 | 0.5650 | 0.4780 | -0.0351 | 0.6940 | 0.5131 | -438.3558 | -544.5962 | -4.0074 | -3.9820 |
| 0.5883 | 0.72 | 2800 | 0.5670 | 0.4914 | -0.0151 | 0.6880 | 0.5066 | -438.1562 | -544.4624 | -3.9894 | -3.9669 |
| 0.624 | 0.75 | 2900 | 0.5663 | 0.4877 | -0.0191 | 0.6840 | 0.5068 | -438.1958 | -544.4997 | -3.9935 | -3.9705 |
| 0.5347 | 0.77 | 3000 | 0.5644 | 0.4757 | -0.0335 | 0.6850 | 0.5092 | -438.3401 | -544.6199 | -4.0019 | -3.9777 |
| 0.5837 | 0.8 | 3100 | 0.5637 | 0.4783 | -0.0302 | 0.6830 | 0.5085 | -438.3073 | -544.5936 | -3.9976 | -3.9742 |
| 0.5293 | 0.83 | 3200 | 0.5634 | 0.4715 | -0.0363 | 0.6890 | 0.5078 | -438.3679 | -544.6616 | -4.0023 | -3.9778 |
| 0.5128 | 0.85 | 3300 | 0.5620 | 0.4745 | -0.0387 | 0.6880 | 0.5131 | -438.3917 | -544.6319 | -4.0053 | -3.9804 |
| 0.6204 | 0.88 | 3400 | 0.5625 | 0.4679 | -0.0442 | 0.6860 | 0.5121 | -438.4469 | -544.6978 | -4.0067 | -3.9815 |
| 0.5469 | 0.9 | 3500 | 0.5618 | 0.4612 | -0.0491 | 0.6860 | 0.5102 | -438.4956 | -544.7651 | -4.0098 | -3.9843 |
| 0.5807 | 0.93 | 3600 | 0.5615 | 0.4675 | -0.0454 | 0.6890 | 0.5129 | -438.4584 | -544.7015 | -4.0068 | -3.9818 |
| 0.5265 | 0.96 | 3700 | 0.5620 | 0.4675 | -0.0435 | 0.6880 | 0.5110 | -438.4403 | -544.7019 | -4.0082 | -3.9833 |
| 0.5484 | 0.98 | 3800 | 0.5615 | 0.4685 | -0.0449 | 0.6930 | 0.5133 | -438.4536 | -544.6919 | -4.0103 | -3.9851 |

Framework versions

  • Transformers 4.37.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Weights & Biases training run: https://wandb.ai/amazingvince/huggingface/runs/z71h0hc3?workspace=user-amazingvince

GGUF files

  • Model size: 218M params
  • Architecture: llama
  • Quantizations provided: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit
