---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set (a note on reading the DPO reward metrics follows the list):
- Loss: 0.7099
- Rewards/chosen: -2.8601
- Rewards/rejected: -3.4154
- Rewards/accuracies: 0.6320
- Rewards/margins: 0.5553
- Logps/rejected: -404.2897
- Logps/chosen: -345.0273
- Logits/rejected: -1.9822
- Logits/chosen: -2.0068
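
For context on reading these metrics (standard DPO definitions as used by TRL, not restated in the original card): the implicit reward of a completion \\( y \\) for prompt \\( x \\) is \\( \beta \log \frac{\pi_\theta(y \mid x)}{\pi_\mathrm{ref}(y \mid x)} \\). `Rewards/chosen` and `Rewards/rejected` average this quantity over chosen and rejected summaries, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs where the chosen reward is higher. The reported loss is the negative log-sigmoid of the margin:

$$
\mathcal{L}_\mathrm{DPO} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_\mathrm{ref}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_\mathrm{ref}(y_l \mid x)}\right)\right]
$$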

## Model description

This is a direct-preference-optimization (DPO) variant of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full), a TinyLlama-1.1B model previously supervised fine-tuned for summarization. Preference training ran for 3 epochs at a learning rate of 5e-7 (both encoded in the model name) on human preference pairs from openai/summarize_from_feedback.

## Intended uses & limitations

The model is intended for TL;DR-style abstractive summarization matching its training data. It is a 1.1B-parameter model aligned on a single preference dataset, so summaries should be checked for faithfulness to the source text; no additional capabilities or safety tuning are documented.
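
A minimal inference sketch (the Hub repo id below is inferred from the model name and the base model's owner, and the TL;DR prompt format is an assumption based on the dataset; neither is confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id: owner taken from the base model, name from this card's model-index.
model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# summarize_from_feedback is TL;DR-style Reddit data, so a
# "POST: ... TL;DR:" prompt is a reasonable guess at the expected format.
post = "My neighbor's cat keeps sneaking into my garage every evening..."
prompt = f"POST: {post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```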

## Training and evaluation data

Both training and evaluation use the [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback) human-preference dataset; the metrics above are computed on its evaluation split.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a TRL reproduction sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
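
As a rough reproduction sketch, these settings map onto TRL's `DPOTrainer` as follows. The DPO `beta`, the TRL version, and the dataset preprocessing are not recorded in this card and are assumptions here; the API shown is the TRL 0.8-era signature that pairs with Transformers 4.41:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Values copied from the hyperparameter list above; Adam betas/epsilon
# match the Transformers defaults the card reports.
args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# The raw comparisons split must still be mapped into the
# prompt/chosen/rejected columns DPOTrainer expects (not shown).
train_ds = load_dataset("openai/summarize_from_feedback", "comparisons", split="train")

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,            # ASSUMPTION: beta is not recorded in this card
    train_dataset=train_ds,
    tokenizer=tokenizer,
)
trainer.train()
```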

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.689         | 0.0689 | 400   | 0.6921          | 0.0010         | -0.0011          | 0.5616             | 0.0021          | -62.8638       | -58.9160     | -2.9633         | -2.9669       |
| 0.6822        | 0.1378 | 800   | 0.6861          | -0.0503        | -0.0663          | 0.5746             | 0.0160          | -69.3792       | -64.0464     | -2.9255         | -2.9291       |
| 0.6737        | 0.2068 | 1200  | 0.6780          | -0.2790        | -0.3169          | 0.5762             | 0.0379          | -94.4367       | -86.9165     | -2.8527         | -2.8562       |
| 0.6648        | 0.2757 | 1600  | 0.6677          | -0.4500        | -0.5183          | 0.6029             | 0.0683          | -114.5829      | -104.0142    | -2.7578         | -2.7612       |
| 0.6678        | 0.3446 | 2000  | 0.6576          | -0.7094        | -0.8175          | 0.6217             | 0.1081          | -144.4979      | -129.9582    | -2.6611         | -2.6651       |
| 0.6253        | 0.4135 | 2400  | 0.6468          | -1.0987        | -1.2558          | 0.6236             | 0.1571          | -188.3249      | -168.8844    | -2.4966         | -2.5038       |
| 0.6616        | 0.4824 | 2800  | 0.6473          | -0.7839        | -0.9244          | 0.6303             | 0.1405          | -155.1877      | -137.4051    | -2.4668         | -2.4737       |
| 0.6282        | 0.5513 | 3200  | 0.6395          | -1.3763        | -1.5943          | 0.6331             | 0.2181          | -222.1840      | -196.6437    | -2.2441         | -2.2573       |
| 0.5886        | 0.6203 | 3600  | 0.6382          | -1.2763        | -1.4872          | 0.6355             | 0.2109          | -211.4734      | -186.6474    | -2.1487         | -2.1634       |
| 0.5903        | 0.6892 | 4000  | 0.6398          | -1.0104        | -1.2131          | 0.6366             | 0.2027          | -184.0546      | -160.0534    | -2.1888         | -2.2035       |
| 0.5886        | 0.7581 | 4400  | 0.6349          | -1.2844        | -1.5732          | 0.6341             | 0.2888          | -220.0676      | -187.4508    | -2.0898         | -2.1111       |
| 0.5907        | 0.8270 | 4800  | 0.6306          | -1.3443        | -1.6135          | 0.6478             | 0.2692          | -224.0959      | -193.4449    | -2.0942         | -2.1137       |
| 0.5456        | 0.8959 | 5200  | 0.6327          | -1.1753        | -1.4199          | 0.6408             | 0.2446          | -204.7423      | -176.5441    | -2.1214         | -2.1394       |
| 0.5465        | 0.9649 | 5600  | 0.6325          | -1.2769        | -1.5500          | 0.6371             | 0.2731          | -217.7467      | -186.7071    | -2.0669         | -2.0872       |
| 0.4632        | 1.0338 | 6000  | 0.6484          | -2.1822        | -2.6404          | 0.6496             | 0.4582          | -326.7876      | -277.2339    | -1.8836         | -1.9125       |
| 0.4736        | 1.1027 | 6400  | 0.6454          | -2.1568        | -2.5961          | 0.6547             | 0.4393          | -322.3579      | -274.6943    | -1.8531         | -1.8794       |
| 0.4665        | 1.1716 | 6800  | 0.6386          | -1.8958        | -2.2728          | 0.6443             | 0.3770          | -290.0295      | -248.5992    | -1.8821         | -1.9042       |
| 0.4789        | 1.2405 | 7200  | 0.6483          | -1.9198        | -2.2931          | 0.6403             | 0.3733          | -292.0611      | -250.9941    | -1.9443         | -1.9659       |
| 0.5477        | 1.3094 | 7600  | 0.6413          | -1.7843        | -2.1677          | 0.6499             | 0.3834          | -279.5165      | -237.4425    | -1.9622         | -1.9845       |
| 0.4423        | 1.3784 | 8000  | 0.6528          | -2.0003        | -2.3620          | 0.6415             | 0.3617          | -298.9479      | -259.0417    | -1.9266         | -1.9469       |
| 0.4668        | 1.4473 | 8400  | 0.6515          | -1.8405        | -2.1818          | 0.6403             | 0.3413          | -280.9325      | -243.0684    | -1.9825         | -2.0027       |
| 0.509         | 1.5162 | 8800  | 0.6471          | -1.9547        | -2.3166          | 0.6424             | 0.3619          | -294.4091      | -254.4828    | -2.0224         | -2.0422       |
| 0.4177        | 1.5851 | 9200  | 0.6542          | -1.9336        | -2.3034          | 0.6392             | 0.3699          | -293.0923      | -252.3707    | -1.9854         | -2.0064       |
| 0.4181        | 1.6540 | 9600  | 0.6626          | -2.3352        | -2.8057          | 0.6438             | 0.4706          | -343.3230      | -292.5314    | -1.9265         | -1.9501       |
| 0.4469        | 1.7229 | 10000 | 0.6436          | -1.8037        | -2.1726          | 0.6431             | 0.3689          | -280.0089      | -239.3807    | -2.0388         | -2.0591       |
| 0.4365        | 1.7919 | 10400 | 0.6446          | -1.7691        | -2.1263          | 0.6466             | 0.3572          | -275.3837      | -235.9303    | -2.0443         | -2.0637       |
| 0.4488        | 1.8608 | 10800 | 0.6558          | -2.1203        | -2.5393          | 0.6450             | 0.4190          | -316.6843      | -271.0489    | -2.0317         | -2.0535       |
| 0.4611        | 1.9297 | 11200 | 0.6646          | -2.4708        | -2.9416          | 0.6468             | 0.4708          | -356.9083      | -306.0948    | -1.9987         | -2.0224       |
| 0.4546        | 1.9986 | 11600 | 0.6541          | -2.2751        | -2.7321          | 0.6436             | 0.4570          | -335.9583      | -286.5284    | -1.9967         | -2.0195       |
| 0.3836        | 2.0675 | 12000 | 0.6827          | -2.7558        | -3.3214          | 0.6464             | 0.5655          | -394.8881      | -334.6001    | -1.9585         | -1.9844       |
| 0.337         | 2.1365 | 12400 | 0.7083          | -3.2136        | -3.8269          | 0.6424             | 0.6132          | -445.4347      | -380.3789    | -1.9217         | -1.9480       |
| 0.3756        | 2.2054 | 12800 | 0.6892          | -2.5637        | -3.0760          | 0.6378             | 0.5123          | -370.3519      | -315.3893    | -1.9938         | -2.0171       |
| 0.4071        | 2.2743 | 13200 | 0.6989          | -2.7240        | -3.2763          | 0.6345             | 0.5523          | -390.3795      | -331.4143    | -1.9810         | -2.0059       |
| 0.4236        | 2.3432 | 13600 | 0.7127          | -2.9174        | -3.4982          | 0.6329             | 0.5808          | -412.5668      | -350.7576    | -1.9542         | -1.9798       |
| 0.3527        | 2.4121 | 14000 | 0.7006          | -2.6980        | -3.2475          | 0.6252             | 0.5496          | -387.5038      | -328.8109    | -1.9852         | -2.0098       |
| 0.3258        | 2.4810 | 14400 | 0.7095          | -2.9212        | -3.5009          | 0.6292             | 0.5798          | -412.8438      | -351.1316    | -1.9581         | -1.9835       |
| 0.3646        | 2.5500 | 14800 | 0.7041          | -2.7281        | -3.2711          | 0.6350             | 0.5430          | -389.8630      | -331.8257    | -1.9884         | -2.0127       |
| 0.3596        | 2.6189 | 15200 | 0.7046          | -2.7894        | -3.3372          | 0.6359             | 0.5478          | -396.4674      | -337.9509    | -1.9862         | -2.0104       |
| 0.3549        | 2.6878 | 15600 | 0.7067          | -2.8436        | -3.3930          | 0.6310             | 0.5494          | -402.0518      | -343.3737    | -1.9841         | -2.0084       |
| 0.2868        | 2.7567 | 16000 | 0.7117          | -2.9064        | -3.4673          | 0.6289             | 0.5609          | -409.4747      | -349.6523    | -1.9770         | -2.0016       |
| 0.3243        | 2.8256 | 16400 | 0.7086          | -2.8350        | -3.3883          | 0.6320             | 0.5533          | -401.5786      | -342.5143    | -1.9841         | -2.0085       |
| 0.3963        | 2.8946 | 16800 | 0.7104          | -2.8648        | -3.4205          | 0.6301             | 0.5558          | -404.8014      | -345.4919    | -1.9835         | -2.0081       |
| 0.3399        | 2.9635 | 17200 | 0.7095          | -2.8594        | -3.4153          | 0.6336             | 0.5559          | -404.2798      | -344.9560    | -1.9830         | -2.0075       |


### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1