---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-v1
  results: []
license: artistic-2.0
---

# juanako-7b-v1

This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973

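
The reward figures in the list above are related by simple arithmetic: the reported margin is the mean chosen reward minus the mean rejected reward. A minimal sketch of that check, using only the numbers from this card (the variable names are illustrative, not taken from the trainer):

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.1095
rewards_rejected = -2.3132
reported_margin = 1.2037

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Allow a small tolerance because the card rounds to four decimals.
assert abs(margin - reported_margin) < 5e-4
print(f"margin = {margin:.4f}")
```

A positive margin means the model assigns, on average, a higher implicit reward to the preferred completion than to the rejected one.
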
## Model description

**It seems to outperform the original Zephyr on most tasks.**

I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
* 1 epoch of DPO with transformers-UNA; the result, after merging with the FastChat converter, is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
* finally, 1 more epoch of DPO with transformers-UNA applied to [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).

Some other experiments were also performed to test transformers-UNA capabilities across diverse scenarios and models.

**This is a complete version of the model, the result of converting the LoRA adapters.**

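
Because the repository holds complete weights rather than an adapter, it should load like any other causal language model. The following is a minimal sketch, assuming the standard `transformers` auto classes and the `float16` dtype used in the evaluation logs of this card; `load_juanako` and `demo` are hypothetical helper names, and nothing here is the author's own loading script.

```python
MODEL_ID = "fblgit/juanako-7b-v1"  # repository name as it appears in the evaluation logs


def load_juanako(model_id: str = MODEL_ID):
    """Load the tokenizer and full-weight model; no separate adapter step is assumed."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    return tokenizer, model


def demo(prompt: str = "The capital of France is") -> str:
    """Generate a short continuation (downloads the weights on first use)."""
    tokenizer, model = load_juanako()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `demo()` triggers the download and a short generation; the prompt is a placeholder and does not reflect any chat template.
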
## Intended uses & limitations

Research purposes.

## Training and evaluation data

alignment-handbook DPO with UNA on top of the SFT LoRA.

### Evaluation lm-evaluation-harness
#### 0-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
```
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
| | |none |acc_norm | 0.6041|± |0.0143|
|arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
| | |none |acc_norm | 0.8161|± |0.0079|
|hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
| | |none |acc_norm | 0.8411|± |0.0036|
|boolq |Yaml |none |acc | 0.8355|± |0.0065|
|lambada |N/A |none |perplexity | 3.3607|± |0.1398|
| | |none |acc | 0.7309|± |0.0137|
|piqa |Yaml |none |acc | 0.8194|± |0.0090|
| | |none |acc_norm | 0.8335|± |0.0087|
|sciq |Yaml |none |acc | 0.9480|± |0.0070|
| | |none |acc_norm | 0.8960|± |0.0097|
|truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|

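
The header line above records the harness settings for the run. As a rough sketch of how such a run might be launched, the helper below assembles an `lm_eval` command line from those settings. It assumes a harness version that ships the `lm_eval` command-line entry point; `build_lm_eval_command` is a hypothetical helper, and the task list is simply read off the table.

```python
import shlex


def build_lm_eval_command(tasks, num_fewshot, batch_size,
                          model_id="fblgit/juanako-7b-v1", dtype="float16"):
    """Return an argv list mirroring the settings printed in the header line."""
    model_args = f"pretrained={model_id},load_in_4bit=False,dtype={dtype}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


# Task names as they appear in the results table.
tasks = ["arc_challenge", "arc_easy", "hellaswag", "boolq", "lambada",
         "piqa", "sciq", "truthfulqa", "winogrande"]
command = build_lm_eval_command(tasks, num_fewshot=0, batch_size=8)

# Print a copy-pasteable shell command instead of running it directly.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would start the evaluation; changing `num_fewshot` to 1 gives the few-shot configuration.
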
#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
```
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
| | |none |acc_norm | 0.6357|± |0.0141|
|arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
| | |none |acc_norm | 0.8645|± |0.0070|
|hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
| | |none |acc_norm | 0.8372|± |0.0037|
|boolq |Yaml |none |acc | 0.8609|± |0.0061|
|lambada |N/A |none |perplexity | 3.5484|± |0.1034|
| | |none |acc | 0.7207|± |0.0107|
|piqa |Yaml |none |acc | 0.8259|± |0.0088|
| | |none |acc_norm | 0.8384|± |0.0086|
|sciq |Yaml |none |acc | 0.9730|± |0.0051|
| | |none |acc_norm | 0.9740|± |0.0050|
|truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
| | |none |acc | 0.4856|± |0.0521|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|

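
The Stderr column for plain accuracy metrics can be sanity-checked with the usual binomial standard error, sqrt(p(1-p)/n). The sketch below does this for the winogrande row. The sample count of 1267 is an assumption about the size of the WinoGrande validation split and is not stated in this card, and the harness may use a slightly different estimator.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


# Accuracy taken from the winogrande row of the table.
winogrande_acc = 0.7609
# ASSUMPTION: number of evaluated examples; not given in this card.
winogrande_n = 1267

stderr = binomial_stderr(winogrande_acc, winogrande_n)
print(f"estimated stderr = {stderr:.4f}")
```

Under that assumption the estimate lands close to the 0.0120 reported in the table.
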
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1

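
The two "total" batch sizes in the list follow from the per-device sizes, the device count, and gradient accumulation. A short worked check using only the values listed above:

```python
# Values copied from the hyperparameter list.
train_batch_size = 1              # per-device training batch size
eval_batch_size = 1               # per-device evaluation batch size
num_devices = 12
gradient_accumulation_steps = 16

# Each optimizer step sees one batch per device per accumulation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

assert total_train_batch_size == 192
assert total_eval_batch_size == 12
print(total_train_batch_size, total_eval_batch_size)
```
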
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
| 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
| 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
| 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
| 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
| 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
| - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |

### Framework versions

- Transformers 4.35.0-UNA
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

## MMLU Results

#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
```
| Tasks |Version|Filter|Metric|Value | |Stderr|
|---------------------------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
| - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
| - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
| - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
| - international_law |Yaml |none |acc |0.7438|± |0.0398|
| - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
| - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
| - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
| - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
| - philosophy |Yaml |none |acc |0.7074|± |0.0258|
| - prehistory |Yaml |none |acc |0.7377|± |0.0245|
| - professional_law |Yaml |none |acc |0.4361|± |0.0127|
| - world_religions |Yaml |none |acc |0.8421|± |0.0280|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
| - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
| - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
| - global_facts |Yaml |none |acc |0.3300|± |0.0473|
| - human_aging |Yaml |none |acc |0.6726|± |0.0315|
| - management |Yaml |none |acc |0.8058|± |0.0392|
| - marketing |Yaml |none |acc |0.8419|± |0.0239|
| - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
| - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
| - nutrition |Yaml |none |acc |0.7288|± |0.0255|
| - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
| - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
| - virology |Yaml |none |acc |0.5000|± |0.0389|
| - social_sciences |N/A |none |acc |0.7195|± |0.0676|
| - econometrics |Yaml |none |acc |0.5000|± |0.0470|
| - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
| - high_school_government_and_politics|Yaml |none |acc |0.8601|± |0.0250|
| - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
| - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
| - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
| - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
| - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
| - public_relations |Yaml |none |acc |0.6636|± |0.0453|
| - security_studies |Yaml |none |acc |0.7184|± |0.0288|
| - sociology |Yaml |none |acc |0.8358|± |0.0262|
| - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
| - stem |N/A |none |acc |0.5217|± |0.1149|
| - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
| - anatomy |Yaml |none |acc |0.6222|± |0.0419|
| - astronomy |Yaml |none |acc |0.6711|± |0.0382|
| - college_biology |Yaml |none |acc |0.7361|± |0.0369|
| - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
| - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
| - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
| - college_physics |Yaml |none |acc |0.4902|± |0.0497|
| - computer_security |Yaml |none |acc |0.7100|± |0.0456|
| - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
| - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
| - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
| - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
| - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
| - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
| - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
| - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
| - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
| - machine_learning |Yaml |none |acc |0.5089|± |0.0475|

| Groups |Version|Filter|Metric|Value | |Stderr|
|------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - social_sciences|N/A |none |acc |0.7195|± |0.0676|
| - stem |N/A |none |acc |0.5217|± |0.1149|