---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-v1
  results: []
license: artistic-2.0
---

# juanako-7b-v1

This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
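
The reported reward margin is simply the difference between the chosen and rejected rewards, which can be checked directly from the numbers above:

```python
# DPO reward margin: margin = reward(chosen) - reward(rejected)
rewards_chosen = -1.1095
rewards_rejected = -2.3132

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.2037, matching Rewards/margins above
```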

## Model description

**It seems to outperform the original Zephyr on most tasks.**

I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
* 1 epoch of DPO with transformers-UNA; the result is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) after merging with the FastChat converter.
* finally, 1 epoch of DPO with transformers-UNA on top of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).

Some other experiments were also performed to test transformers-UNA's capabilities on diverse scenarios and models.

**This is a complete (merged) version of the model, the result of converting the LoRA adapters.**

## Intended uses & limitations

Research purposes.
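
For research use, the model can be prompted like its Zephyr base. Below is a minimal sketch of the prompt layout, assuming Juanako inherits Zephyr's `<|system|>`/`<|user|>`/`<|assistant|>` format (an assumption — check the tokenizer's chat template before relying on it):

```python
# Hypothetical helper: builds a Zephyr-style prompt string.
# Assumes the model inherits Zephyr's chat format; verify against
# the tokenizer's chat_template before use.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_prompt("You are a helpful assistant.", "What is DPO?")
print(prompt)
```

The resulting string would be passed to the tokenizer as the generation prompt.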

## Training and evaluation data

alignment-handbook DPO with UNA on top of the SFT LoRA.
### Evaluation lm-evaluation-harness
|
49 |
+
#### 0-Shot
|
50 |
+
```
|
51 |
+
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
|
52 |
+
```
|
53 |
+
| Tasks |Version|Filter| Metric | Value | |Stderr|
|
54 |
+
|-------------------|-------|------|-----------|------:|---|-----:|
|
55 |
+
|arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
|
56 |
+
| | |none |acc_norm | 0.6041|± |0.0143|
|
57 |
+
|arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
|
58 |
+
| | |none |acc_norm | 0.8161|± |0.0079|
|
59 |
+
|hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
|
60 |
+
| | |none |acc_norm | 0.8411|± |0.0036|
|
61 |
+
|boolq |Yaml |none |acc | 0.8355|± |0.0065|
|
62 |
+
|lambada |N/A |none |perplexity | 3.3607|± |0.1398|
|
63 |
+
| | |none |acc | 0.7309|± |0.0137|
|
64 |
+
|piqa |Yaml |none |acc | 0.8194|± |0.0090|
|
65 |
+
| | |none |acc_norm | 0.8335|± |0.0087|
|
66 |
+
|sciq |Yaml |none |acc | 0.9480|± |0.0070|
|
67 |
+
| | |none |acc_norm | 0.8960|± |0.0097|
|
68 |
+
|truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
|
69 |
+
| - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
|
70 |
+
| - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
|
71 |
+
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
|
72 |
+
|
73 |
+
#### 1-Shot
|
74 |
+
```
|
75 |
+
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
|
76 |
+
```
|
77 |
+
| Tasks |Version|Filter| Metric | Value | |Stderr|
|
78 |
+
|-------------------|-------|------|-----------|------:|---|-----:|
|
79 |
+
|arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
|
80 |
+
| | |none |acc_norm | 0.6357|± |0.0141|
|
81 |
+
|arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
|
82 |
+
| | |none |acc_norm | 0.8645|± |0.0070|
|
83 |
+
|hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
|
84 |
+
| | |none |acc_norm | 0.8372|± |0.0037|
|
85 |
+
|boolq |Yaml |none |acc | 0.8609|± |0.0061|
|
86 |
+
|lambada |N/A |none |perplexity | 3.5484|± |0.1034|
|
87 |
+
| | |none |acc | 0.7207|± |0.0107|
|
88 |
+
|piqa |Yaml |none |acc | 0.8259|± |0.0088|
|
89 |
+
| | |none |acc_norm | 0.8384|± |0.0086|
|
90 |
+
|sciq |Yaml |none |acc | 0.9730|± |0.0051|
|
91 |
+
| | |none |acc_norm | 0.9740|± |0.0050|
|
92 |
+
|truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
|
93 |
+
| | |none |acc | 0.4856|± |0.0521|
|
94 |
+
| - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
|
95 |
+
| - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
|
96 |
+
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
|
97 |
+
|

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
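
The derived totals above follow directly from the per-device settings; a quick check:

```python
# Effective batch sizes: per-device batch * accumulation steps * GPU count.
train_batch_size = 1
eval_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 12

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 192

total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval
print(total_eval_batch_size)  # 12
```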

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4966        | 0.15  | 50   | 0.4893          | -1.1759        | -2.2914          | 0.7485             | 1.1155          | -219.7872      | -218.2148    | -2.5450         | -2.7884       |
| 0.4522        | 0.31  | 100  | 0.4808          | -0.8099        | -1.8893          | 0.7784             | 1.0794          | -215.7659      | -214.5544    | -2.5644         | -2.8095       |
| 0.5048        | 0.46  | 150  | 0.4706          | -1.0526        | -2.1412          | 0.7725             | 1.0887          | -218.2852      | -216.9814    | -2.5638         | -2.8089       |
| 0.4853        | 0.62  | 200  | 0.4640          | -1.0787        | -2.2821          | 0.7725             | 1.2034          | -219.6941      | -217.2426    | -2.5460         | -2.7891       |
| 0.4639        | 0.77  | 250  | 0.4636          | -1.2348        | -2.4583          | 0.8084             | 1.2235          | -221.4559      | -218.8034    | -2.5533         | -2.7970       |
| 0.4634        | 0.93  | 300  | 0.4601          | -1.1370        | -2.3243          | 0.7964             | 1.1873          | -220.1163      | -217.8257    | -2.5540         | -2.7977       |
| -             | 1.00  | 300  | 0.4594          | -1.1095        | -2.3132          | 0.7964             | 1.2037          | -220.0052      | -217.5506    | -2.5535         | -2.7973       |

### Framework versions

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
+
## MMLU Results
|
137 |
+
|
138 |
+
#### 1-Shot
|
139 |
+
```
|
140 |
+
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
|
141 |
+
```
|
142 |
+
| Tasks |Version|Filter|Metric|Value | |Stderr|
|
143 |
+
|---------------------------------------|-------|------|------|-----:|---|-----:|
|
144 |
+
|mmlu |N/A |none |acc |0.6085|± |0.1321|
|
145 |
+
| - humanities |N/A |none |acc |0.5405|± |0.1478|
|
146 |
+
| - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
|
147 |
+
| - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
|
148 |
+
| - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
|
149 |
+
| - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
|
150 |
+
| - international_law |Yaml |none |acc |0.7438|± |0.0398|
|
151 |
+
| - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
|
152 |
+
| - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
|
153 |
+
| - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
|
154 |
+
| - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
|
155 |
+
| - philosophy |Yaml |none |acc |0.7074|± |0.0258|
|
156 |
+
| - prehistory |Yaml |none |acc |0.7377|± |0.0245|
|
157 |
+
| - professional_law |Yaml |none |acc |0.4361|± |0.0127|
|
158 |
+
| - world_religions |Yaml |none |acc |0.8421|± |0.0280|
|
159 |
+
| - other |N/A |none |acc |0.6894|± |0.1091|
|
160 |
+
| - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
|
161 |
+
| - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
|
162 |
+
| - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
|
163 |
+
| - global_facts |Yaml |none |acc |0.3300|± |0.0473|
|
164 |
+
| - human_aging |Yaml |none |acc |0.6726|± |0.0315|
|
165 |
+
| - management |Yaml |none |acc |0.8058|± |0.0392|
|
166 |
+
| - marketing |Yaml |none |acc |0.8419|± |0.0239|
|
167 |
+
| - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
|
168 |
+
| - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
|
169 |
+
| - nutrition |Yaml |none |acc |0.7288|± |0.0255|
|
170 |
+
| - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
|
171 |
+
| - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
|
172 |
+
| - virology |Yaml |none |acc |0.5000|± |0.0389|
|
173 |
+
| - social_sciences |N/A |none |acc |0.7195|± |0.0676|
|
174 |
+
| - econometrics |Yaml |none |acc |0.5000|± |0.0470|
|
175 |
+
| - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
|
176 |
+
| - high_school_government_and_politics|Yaml |none |acc |0.8601|± |0.0250|
|
177 |
+
| - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
|
178 |
+
| - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
|
179 |
+
| - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
|
180 |
+
| - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
|
181 |
+
| - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
|
182 |
+
| - public_relations |Yaml |none |acc |0.6636|± |0.0453|
|
183 |
+
| - security_studies |Yaml |none |acc |0.7184|± |0.0288|
|
184 |
+
| - sociology |Yaml |none |acc |0.8358|± |0.0262|
|
185 |
+
| - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
|
186 |
+
| - stem |N/A |none |acc |0.5217|± |0.1149|
|
187 |
+
| - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
|
188 |
+
| - anatomy |Yaml |none |acc |0.6222|± |0.0419|
|
189 |
+
| - astronomy |Yaml |none |acc |0.6711|± |0.0382|
|
190 |
+
| - college_biology |Yaml |none |acc |0.7361|± |0.0369|
|
191 |
+
| - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
|
192 |
+
| - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
|
193 |
+
| - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
|
194 |
+
| - college_physics |Yaml |none |acc |0.4902|± |0.0497|
|
195 |
+
| - computer_security |Yaml |none |acc |0.7100|± |0.0456|
|
196 |
+
| - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
|
197 |
+
| - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
|
198 |
+
| - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
|
199 |
+
| - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
|
200 |
+
| - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
|
201 |
+
| - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
|
202 |
+
| - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
|
203 |
+
| - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
|
204 |
+
| - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
|
205 |
+
| - machine_learning |Yaml |none |acc |0.5089|± |0.0475|
|
206 |
+
|
207 |
+
| Groups |Version|Filter|Metric|Value | |Stderr|
|
208 |
+
|------------------|-------|------|------|-----:|---|-----:|
|
209 |
+
|mmlu |N/A |none |acc |0.6085|± |0.1321|
|
210 |
+
| - humanities |N/A |none |acc |0.5405|± |0.1478|
|
211 |
+
| - other |N/A |none |acc |0.6894|± |0.1091|
|
212 |
+
| - social_sciences|N/A |none |acc |0.7195|± |0.0676|
|
213 |
+
| - stem |N/A |none |acc |0.5217|± |0.1149|
|