fblgit committed
Commit 91322ed
1 Parent(s): 6e00a8e

Update README.md

Files changed (1)
  1. README.md +211 -1
README.md CHANGED
@@ -1,3 +1,213 @@
  ---
- license: cc-by-nc-nd-4.0
+ base_model: fblgit/zephyr-lora-dpo-b1
+ tags:
+ - alignment-handbook
+ - generated_from_trainer
+ datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
+ model-index:
+ - name: juanako-7b-v1
+   results: []
+ license: artistic-2.0
  ---
+
+ # juanako-7b-v1
+
+ This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4594
+ - Rewards/chosen: -1.1095
+ - Rewards/rejected: -2.3132
+ - Rewards/accuracies: 0.7964
+ - Rewards/margins: 1.2037
+ - Logps/rejected: -220.0052
+ - Logps/chosen: -217.5506
+ - Logits/rejected: -2.5535
+ - Logits/chosen: -2.7973
+
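+ (These follow the standard DPO bookkeeping: each reward is β times the policy-minus-reference log-probability of a completion, and the margin is the chosen reward minus the rejected reward. A minimal sketch assuming the usual β = 0.1; the function name is illustrative, not from the training code:)
+
+ ```python
+ # How the reward metrics above are defined (a sketch, not the training code).
+ def dpo_rewards(policy_logp_chosen, ref_logp_chosen,
+                 policy_logp_rejected, ref_logp_rejected, beta=0.1):
+     rewards_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
+     rewards_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
+     margin = rewards_chosen - rewards_rejected    # "Rewards/margins"
+     accuracy = float(margin > 0)                  # "Rewards/accuracies", averaged per batch
+     return rewards_chosen, rewards_rejected, margin, accuracy
+ ```
+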
+ ## Model description
+
+ **It seems to outperform the original Zephyr on most tasks.**
+
+ I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
+ * 1 epoch of DPO with transformers-UNA; the result is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) after merging with the FastChat converter.
+ * finally, 1 more epoch of DPO with transformers-UNA on [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
+
+ Some other experiments were also performed to test transformers-UNA's capabilities on diverse scenarios and models.
+
+ **This is the full, merged version of the model, the result of converting the LoRAs.**
+
+ ## Intended uses & limitations
+
+ Research purposes.
+
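+ A minimal usage sketch (assumes a recent `transformers`; the Zephyr-style chat template is inferred from the model's lineage, not stated in this card):
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Load juanako-7b-v1 in float16 for generation.
+ pipe = pipeline(
+     "text-generation",
+     model="fblgit/juanako-7b-v1",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Build the prompt from the tokenizer's own chat template.
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "Summarize DPO in one sentence."},
+ ]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
+ ```
+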
+ ## Training and evaluation data
+
+ alignment-handbook DPO with UNA, applied on top of the SFT LoRA.
+
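+ For reference, the preference pairs can be loaded directly (the `train_prefs` split name follows the dataset card; a sketch, not the exact training code):
+
+ ```python
+ from datasets import load_dataset
+
+ # UltraFeedback binarized preference pairs used for DPO.
+ ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
+ print(ds.column_names)  # expect prompt / chosen / rejected style fields
+ ```
+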
+ ### Evaluation (lm-evaluation-harness)
+ #### 0-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
+ ```
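+ The header above is the harness's own run banner. A roughly equivalent call through its Python API (task list abbreviated; assumes lm-evaluation-harness v0.4) would be:
+
+ ```python
+ import lm_eval
+
+ # Re-run the 0-shot suite reported below (task list abbreviated).
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=fblgit/juanako-7b-v1,dtype=float16",
+     tasks=["arc_challenge", "arc_easy", "hellaswag", "boolq", "piqa", "sciq", "winogrande"],
+     num_fewshot=0,
+     batch_size=8,
+ )
+ print(results["results"])
+ ```
+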
+ | Tasks |Version|Filter| Metric | Value | |Stderr|
+ |-------------------|-------|------|-----------|------:|---|-----:|
+ |arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
+ | | |none |acc_norm | 0.6041|± |0.0143|
+ |arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
+ | | |none |acc_norm | 0.8161|± |0.0079|
+ |hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
+ | | |none |acc_norm | 0.8411|± |0.0036|
+ |boolq |Yaml |none |acc | 0.8355|± |0.0065|
+ |lambada |N/A |none |perplexity | 3.3607|± |0.1398|
+ | | |none |acc | 0.7309|± |0.0137|
+ |piqa |Yaml |none |acc | 0.8194|± |0.0090|
+ | | |none |acc_norm | 0.8335|± |0.0087|
+ |sciq |Yaml |none |acc | 0.9480|± |0.0070|
+ | | |none |acc_norm | 0.8960|± |0.0097|
+ |truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
+ | - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
+ | - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
+ |winogrande |Yaml |none |acc | 0.7609|± |0.0120|
+
+ #### 1-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
+ ```
+ | Tasks |Version|Filter| Metric | Value | |Stderr|
+ |-------------------|-------|------|-----------|------:|---|-----:|
+ |arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
+ | | |none |acc_norm | 0.6357|± |0.0141|
+ |arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
+ | | |none |acc_norm | 0.8645|± |0.0070|
+ |hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
+ | | |none |acc_norm | 0.8372|± |0.0037|
+ |boolq |Yaml |none |acc | 0.8609|± |0.0061|
+ |lambada |N/A |none |perplexity | 3.5484|± |0.1034|
+ | | |none |acc | 0.7207|± |0.0107|
+ |piqa |Yaml |none |acc | 0.8259|± |0.0088|
+ | | |none |acc_norm | 0.8384|± |0.0086|
+ |sciq |Yaml |none |acc | 0.9730|± |0.0051|
+ | | |none |acc_norm | 0.9740|± |0.0050|
+ |truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
+ | | |none |acc | 0.4856|± |0.0521|
+ | - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
+ | - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
+ |winogrande |Yaml |none |acc | 0.7609|± |0.0120|
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 12
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 192
+ - total_eval_batch_size: 12
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.01
+ - num_epochs: 1
+
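+ As a sanity check, the effective batch size is 1 per device × 16 accumulation steps × 12 GPUs = 192, matching `total_train_batch_size`. A hedged sketch of how these settings map onto `transformers.TrainingArguments` (the actual run used the alignment-handbook recipe with transformers-UNA; this is illustrative only):
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Mirrors the hyperparameters listed above; Adam betas/epsilon are the defaults.
+ args = TrainingArguments(
+     output_dir="juanako-7b-v1",        # hypothetical output path
+     learning_rate=1e-4,
+     per_device_train_batch_size=1,
+     per_device_eval_batch_size=1,
+     gradient_accumulation_steps=16,    # 1 * 16 * 12 GPUs = 192 effective
+     lr_scheduler_type="linear",
+     warmup_ratio=0.01,
+     num_train_epochs=1,
+     seed=42,
+ )
+ ```
+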
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
+ | 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
+ | 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
+ | 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
+ | 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
+ | 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
+ | - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |
+
+ ### Framework versions
+
+ - Transformers 4.35.0-UNA
+ - Pytorch 2.1.0
+ - Datasets 2.14.6
+ - Tokenizers 0.14.1
+
+ ## MMLU Results
+
+ #### 1-Shot
+ ```
+ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
+ ```
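+ The same `lm_eval.simple_evaluate` call sketched earlier should reproduce this run with `tasks=["mmlu"]`, `num_fewshot=1`, and `batch_size=1`; the harness aggregates the per-subject accuracies into the group scores listed at the end.
+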
+ | Tasks |Version|Filter|Metric|Value | |Stderr|
+ |---------------------------------------|-------|------|------|-----:|---|-----:|
+ |mmlu |N/A |none |acc |0.6085|± |0.1321|
+ | - humanities |N/A |none |acc |0.5405|± |0.1478|
+ | - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
+ | - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
+ | - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
+ | - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
+ | - international_law |Yaml |none |acc |0.7438|± |0.0398|
+ | - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
+ | - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
+ | - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
+ | - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
+ | - philosophy |Yaml |none |acc |0.7074|± |0.0258|
+ | - prehistory |Yaml |none |acc |0.7377|± |0.0245|
+ | - professional_law |Yaml |none |acc |0.4361|± |0.0127|
+ | - world_religions |Yaml |none |acc |0.8421|± |0.0280|
+ | - other |N/A |none |acc |0.6894|± |0.1091|
+ | - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
+ | - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
+ | - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
+ | - global_facts |Yaml |none |acc |0.3300|± |0.0473|
+ | - human_aging |Yaml |none |acc |0.6726|± |0.0315|
+ | - management |Yaml |none |acc |0.8058|± |0.0392|
+ | - marketing |Yaml |none |acc |0.8419|± |0.0239|
+ | - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
+ | - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
+ | - nutrition |Yaml |none |acc |0.7288|± |0.0255|
+ | - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
+ | - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
+ | - virology |Yaml |none |acc |0.5000|± |0.0389|
+ | - social_sciences |N/A |none |acc |0.7195|± |0.0676|
+ | - econometrics |Yaml |none |acc |0.5000|± |0.0470|
+ | - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
+ | - high_school_government_and_politics |Yaml |none |acc |0.8601|± |0.0250|
+ | - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
+ | - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
+ | - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
+ | - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
+ | - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
+ | - public_relations |Yaml |none |acc |0.6636|± |0.0453|
+ | - security_studies |Yaml |none |acc |0.7184|± |0.0288|
+ | - sociology |Yaml |none |acc |0.8358|± |0.0262|
+ | - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
+ | - stem |N/A |none |acc |0.5217|± |0.1149|
+ | - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
+ | - anatomy |Yaml |none |acc |0.6222|± |0.0419|
+ | - astronomy |Yaml |none |acc |0.6711|± |0.0382|
+ | - college_biology |Yaml |none |acc |0.7361|± |0.0369|
+ | - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
+ | - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
+ | - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
+ | - college_physics |Yaml |none |acc |0.4902|± |0.0497|
+ | - computer_security |Yaml |none |acc |0.7100|± |0.0456|
+ | - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
+ | - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
+ | - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
+ | - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
+ | - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
+ | - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
+ | - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
+ | - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
+ | - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
+ | - machine_learning |Yaml |none |acc |0.5089|± |0.0475|
+
+ | Groups |Version|Filter|Metric|Value | |Stderr|
+ |------------------|-------|------|------|-----:|---|-----:|
+ |mmlu |N/A |none |acc |0.6085|± |0.1321|
+ | - humanities |N/A |none |acc |0.5405|± |0.1478|
+ | - other |N/A |none |acc |0.6894|± |0.1091|
+ | - social_sciences|N/A |none |acc |0.7195|± |0.0676|
+ | - stem |N/A |none |acc |0.5217|± |0.1149|