fblgit committed
Commit 91322ed
1 Parent(s): 6e00a8e

Update README.md

Files changed (1)
  1. README.md +211 -1
README.md CHANGED
@@ -1,3 +1,213 @@
  ---
- license: cc-by-nc-nd-4.0
+ base_model: fblgit/zephyr-lora-dpo-b1
+ tags:
+ - alignment-handbook
+ - generated_from_trainer
+ datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
+ model-index:
+ - name: juanako-7b-v1
+   results: []
+ license: artistic-2.0
  ---
+
+ # juanako-7b-v1
+
+ This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4594
+ - Rewards/chosen: -1.1095
+ - Rewards/rejected: -2.3132
+ - Rewards/accuracies: 0.7964
+ - Rewards/margins: 1.2037
+ - Logps/rejected: -220.0052
+ - Logps/chosen: -217.5506
+ - Logits/rejected: -2.5535
+ - Logits/chosen: -2.7973
+
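+ (These follow the standard DPO bookkeeping: each reward is β times the policy-minus-reference log-probability of a completion, and the margin is the chosen reward minus the rejected reward. A minimal sketch assuming the usual β = 0.1; the function name is illustrative, not from the training code:)
+
+ ```python
+ # How the reward metrics above are defined (a sketch, not the training code).
+ def dpo_rewards(policy_logp_chosen, ref_logp_chosen,
+                 policy_logp_rejected, ref_logp_rejected, beta=0.1):
+     rewards_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
+     rewards_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
+     margin = rewards_chosen - rewards_rejected    # "Rewards/margins"
+     accuracy = float(margin > 0)                  # "Rewards/accuracies", averaged per batch
+     return rewards_chosen, rewards_rejected, margin, accuracy
+ ```
+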
+ ## Model description
+
+ **It seems to outperform the original Zephyr on most tasks.**
+
+ I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
+ * 1 epoch of DPO with transformers-UNA; the result is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) after merging with the FastChat converter.
+ * finally, 1 more epoch of DPO with transformers-UNA on [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
+
+ Some other experiments were also performed to test transformers-UNA's capabilities on diverse scenarios and models.
+
+ **This is the full, merged version of the model, the result of converting the LoRAs.**
+
+ ## Intended uses & limitations
+
+ Research purposes.
+
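+ A minimal usage sketch (assumes a recent `transformers`; the Zephyr-style chat template is inferred from the model's lineage, not stated in this card):
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Load juanako-7b-v1 in float16 for generation.
+ pipe = pipeline(
+     "text-generation",
+     model="fblgit/juanako-7b-v1",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Build the prompt from the tokenizer's own chat template.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "Summarize DPO in one sentence."},
+ ]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
+ ```
+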
+ ## Training and evaluation data
+
+ alignment-handbook DPO with UNA, applied on top of the SFT LoRA.
+
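+ For reference, the preference pairs can be loaded directly (the `train_prefs` split name follows the dataset card; a sketch, not the exact training code):
+
+ ```python
+ from datasets import load_dataset
+
+ # UltraFeedback binarized preference pairs used for DPO.
+ ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
+ print(ds.column_names)  # expect prompt / chosen / rejected style fields
+ ```
+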
+ ### Evaluation (lm-evaluation-harness)
+ #### 0-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
+ ```
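+ The header above is the harness's own run banner. A roughly equivalent call through its Python API (task list abbreviated; assumes lm-evaluation-harness v0.4) would be:
+
+ ```python
+ import lm_eval
+
+ # Re-run the 0-shot suite reported below (task list abbreviated).
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=fblgit/juanako-7b-v1,dtype=float16",
+     tasks=["arc_challenge", "arc_easy", "hellaswag", "boolq", "piqa", "sciq", "winogrande"],
+     num_fewshot=0,
+     batch_size=8,
+ )
+ print(results["results"])
+ ```
+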
+ | Tasks |Version|Filter| Metric | Value | |Stderr|
+ |-------------------|-------|------|-----------|------:|---|-----:|
+ |arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
+ | | |none |acc_norm | 0.6041|± |0.0143|
+ |arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
+ | | |none |acc_norm | 0.8161|± |0.0079|
+ |hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
+ | | |none |acc_norm | 0.8411|± |0.0036|
+ |boolq |Yaml |none |acc | 0.8355|± |0.0065|
+ |lambada |N/A |none |perplexity | 3.3607|± |0.1398|
+ | | |none |acc | 0.7309|± |0.0137|
+ |piqa |Yaml |none |acc | 0.8194|± |0.0090|
+ | | |none |acc_norm | 0.8335|± |0.0087|
+ |sciq |Yaml |none |acc | 0.9480|± |0.0070|
+ | | |none |acc_norm | 0.8960|± |0.0097|
+ |truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
+ | - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
+ | - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
+ |winogrande |Yaml |none |acc | 0.7609|± |0.0120|
+
+ #### 1-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
+ ```
+ | Tasks |Version|Filter| Metric | Value | |Stderr|
+ |-------------------|-------|------|-----------|------:|---|-----:|
+ |arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
+ | | |none |acc_norm | 0.6357|± |0.0141|
+ |arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
+ | | |none |acc_norm | 0.8645|± |0.0070|
+ |hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
+ | | |none |acc_norm | 0.8372|± |0.0037|
+ |boolq |Yaml |none |acc | 0.8609|± |0.0061|
+ |lambada |N/A |none |perplexity | 3.5484|± |0.1034|
+ | | |none |acc | 0.7207|± |0.0107|
+ |piqa |Yaml |none |acc | 0.8259|± |0.0088|
+ | | |none |acc_norm | 0.8384|± |0.0086|
+ |sciq |Yaml |none |acc | 0.9730|± |0.0051|
+ | | |none |acc_norm | 0.9740|± |0.0050|
+ |truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
+ | | |none |acc | 0.4856|± |0.0521|
+ | - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
+ | - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
+ |winogrande |Yaml |none |acc | 0.7609|± |0.0120|
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 12
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 192
+ - total_eval_batch_size: 12
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.01
+ - num_epochs: 1
+
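+ As a sanity check, the effective batch size is 1 per device × 16 accumulation steps × 12 GPUs = 192, matching `total_train_batch_size`. A hedged sketch of how these settings map onto `transformers.TrainingArguments` (the actual run used the alignment-handbook recipe with transformers-UNA; this is illustrative only):
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Mirrors the hyperparameters listed above; Adam betas/epsilon are the defaults.
+ args = TrainingArguments(
+     output_dir="juanako-7b-v1",        # hypothetical output path
+     learning_rate=1e-4,
+     per_device_train_batch_size=1,
+     per_device_eval_batch_size=1,
+     gradient_accumulation_steps=16,    # 1 * 16 * 12 GPUs = 192 effective
+     lr_scheduler_type="linear",
+     warmup_ratio=0.01,
+     num_train_epochs=1,
+     seed=42,
+ )
+ ```
+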
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
+ | 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
+ | 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
+ | 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
+ | 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
+ | 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
+ | - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |
+
+ ### Framework versions
+
+ - Transformers 4.35.0-UNA
+ - Pytorch 2.1.0
+ - Datasets 2.14.6
+ - Tokenizers 0.14.1
+
+ ## MMLU Results
+
+ #### 1-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
+ ```
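+ The same `lm_eval.simple_evaluate` call sketched earlier should reproduce this run with `tasks=["mmlu"]`, `num_fewshot=1`, and `batch_size=1`; the harness aggregates the per-subject accuracies into the group scores listed at the end.
+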
+ | Tasks |Version|Filter|Metric|Value | |Stderr|
+ |---------------------------------------|-------|------|------|-----:|---|-----:|
+ |mmlu |N/A |none |acc |0.6085|± |0.1321|
+ | - humanities |N/A |none |acc |0.5405|± |0.1478|
+ | - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
+ | - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
+ | - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
+ | - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
+ | - international_law |Yaml |none |acc |0.7438|± |0.0398|
+ | - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
+ | - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
+ | - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
+ | - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
+ | - philosophy |Yaml |none |acc |0.7074|± |0.0258|
+ | - prehistory |Yaml |none |acc |0.7377|± |0.0245|
+ | - professional_law |Yaml |none |acc |0.4361|± |0.0127|
+ | - world_religions |Yaml |none |acc |0.8421|± |0.0280|
+ | - other |N/A |none |acc |0.6894|± |0.1091|
+ | - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
+ | - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
+ | - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
+ | - global_facts |Yaml |none |acc |0.3300|± |0.0473|
+ | - human_aging |Yaml |none |acc |0.6726|± |0.0315|
+ | - management |Yaml |none |acc |0.8058|± |0.0392|
+ | - marketing |Yaml |none |acc |0.8419|± |0.0239|
+ | - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
+ | - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
+ | - nutrition |Yaml |none |acc |0.7288|± |0.0255|
+ | - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
+ | - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
+ | - virology |Yaml |none |acc |0.5000|± |0.0389|
+ | - social_sciences |N/A |none |acc |0.7195|± |0.0676|
+ | - econometrics |Yaml |none |acc |0.5000|± |0.0470|
+ | - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
+ | - high_school_government_and_politics |Yaml |none |acc |0.8601|± |0.0250|
+ | - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
+ | - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
+ | - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
+ | - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
+ | - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
+ | - public_relations |Yaml |none |acc |0.6636|± |0.0453|
+ | - security_studies |Yaml |none |acc |0.7184|± |0.0288|
+ | - sociology |Yaml |none |acc |0.8358|± |0.0262|
+ | - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
+ | - stem |N/A |none |acc |0.5217|± |0.1149|
+ | - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
+ | - anatomy |Yaml |none |acc |0.6222|± |0.0419|
+ | - astronomy |Yaml |none |acc |0.6711|± |0.0382|
+ | - college_biology |Yaml |none |acc |0.7361|± |0.0369|
+ | - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
+ | - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
+ | - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
+ | - college_physics |Yaml |none |acc |0.4902|± |0.0497|
+ | - computer_security |Yaml |none |acc |0.7100|± |0.0456|
+ | - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
+ | - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
+ | - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
+ | - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
+ | - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
+ | - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
+ | - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
+ | - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
+ | - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
+ | - machine_learning |Yaml |none |acc |0.5089|± |0.0475|
+
+ | Groups |Version|Filter|Metric|Value | |Stderr|
+ |------------------|-------|------|------|-----:|---|-----:|
+ |mmlu |N/A |none |acc |0.6085|± |0.1321|
+ | - humanities |N/A |none |acc |0.5405|± |0.1478|
+ | - other |N/A |none |acc |0.6894|± |0.1091|
+ | - social_sciences|N/A |none |acc |0.7195|± |0.0676|
+ | - stem |N/A |none |acc |0.5217|± |0.1149|