mnoukhov committed on
Commit 5ac1e2b
1 Parent(s): f1fb321

mnoukhov/pythia410m-dpo2-tldr

Files changed (2):
  1. README.md +18 -15
  2. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -16,15 +16,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.6550
- - Rewards/chosen: -0.1077
- - Rewards/rejected: -0.2150
- - Rewards/accuracies: 0.6269
- - Rewards/margins: 0.1073
- - Logps/rejected: -72.4202
- - Logps/chosen: -72.4202
- - Logps/ref Rejected: -63.5119
- - Logps/ref Chosen: -70.2656
+ - Loss: 0.6237
+ - Rewards/chosen: -2.4342
+ - Rewards/rejected: -2.7481
+ - Rewards/accuracies: 0.6489
+ - Rewards/margins: 0.3139
+ - Logps/rejected: -114.3425
+ - Logps/chosen: -114.3425
+ - Logps/ref Rejected: -59.5615
+ - Logps/ref Chosen: -65.6594
 
 ## Model description
 
@@ -43,25 +43,28 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 16
+ - learning_rate: 1e-05
+ - train_batch_size: 4
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
+ - num_devices: 4
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 64
+ - total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - num_epochs: 1.0
+ - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Rejected | Logps/ref Chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:----------------:|
- | No log | 0.2016 | 63 | 0.7000 | -0.0093 | -0.0090 | 0.5043 | -0.0003 | -70.4516 | -70.4516 | -63.5119 | -70.2656 |
- | 0.6682 | 0.4032 | 126 | 0.6714 | -0.0226 | -0.0861 | 0.5957 | 0.0635 | -70.7182 | -70.7182 | -63.5119 | -70.2656 |
- | 0.6682 | 0.6048 | 189 | 0.6623 | -0.0820 | -0.1707 | 0.6142 | 0.0887 | -71.9055 | -71.9055 | -63.5119 | -70.2656 |
- | 0.6423 | 0.8064 | 252 | 0.6550 | -0.1077 | -0.2150 | 0.6269 | 0.1073 | -72.4202 | -72.4202 | -63.5119 | -70.2656 |
+ | No log | 0.2016 | 63 | 0.6296 | -0.3668 | -0.5301 | 0.6780 | 0.1633 | -72.9961 | -72.9961 | -59.5615 | -65.6594 |
+ | 0.5884 | 0.4032 | 126 | 0.6180 | -1.5625 | -1.8194 | 0.6596 | 0.2569 | -96.9089 | -96.9089 | -59.5615 | -65.6594 |
+ | 0.5884 | 0.6048 | 189 | 0.6302 | -2.3096 | -2.5816 | 0.6445 | 0.2720 | -111.8520 | -111.8520 | -59.5615 | -65.6594 |
+ | 0.4615 | 0.8064 | 252 | 0.6237 | -2.4342 | -2.7481 | 0.6489 | 0.3139 | -114.3425 | -114.3425 | -59.5615 | -65.6594 |
 
 
 ### Framework versions
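
The Rewards/* and Logps/* fields in the card use the metric names logged by trl's DPOTrainer. Assuming that convention (the card does not say which trainer was used), the rewards are the policy-vs-reference log-probability gap scaled by the DPO temperature `beta`, whose value is not reported here. A minimal sketch of how the logged quantities relate, plus the effective-batch-size arithmetic implied by the hyperparameter block:

```python
# Sketch only: recomputing the reported DPO metrics from the logged log-probs.
# Assumptions: trl-style metric names; `beta` is the DPO temperature and its
# value is NOT given in this model card.

def dpo_rewards(logps_chosen, logps_rejected,
                logps_ref_chosen, logps_ref_rejected, beta):
    rewards_chosen = beta * (logps_chosen - logps_ref_chosen)        # Rewards/chosen
    rewards_rejected = beta * (logps_rejected - logps_ref_rejected)  # Rewards/rejected
    margin = rewards_chosen - rewards_rejected                       # Rewards/margins
    return rewards_chosen, rewards_rejected, margin

# Effective batch size implied by the new hyperparameters:
# per-device train_batch_size * num_devices * gradient_accumulation_steps
assert 4 * 4 * 4 == 64  # matches total_train_batch_size: 64
```

Under the same convention, Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward.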
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ad6230f86f228c40c1e6d7bc9fe8fb2d0bbdd0accb1065d02ed94e674d97f81f
+oid sha256:69b6220103f85750164fc3f8c7dc64d08043a12b521325782ae63aef087193e6
 size 25192592
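
The changed adapter_model.safetensors is a ~25 MB adapter file rather than full model weights, which suggests a PEFT (e.g. LoRA) adapter trained on top of the SFT checkpoint named in the card. A minimal loading sketch under that assumption; the repository ids come from this commit, everything else is illustrative:

```python
# Sketch only: load the adapter on top of its SFT base, assuming a PEFT adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mnoukhov/pythia410m-sft-tldr")
model = PeftModel.from_pretrained(base, "mnoukhov/pythia410m-dpo2-tldr")
tokenizer = AutoTokenizer.from_pretrained("mnoukhov/pythia410m-sft-tldr")

# Illustrative TL;DR-style prompt; the exact prompt format is not documented in the card.
prompt = "POST: ...\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```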