---
base_model: fblgit/zephyr-lora-dpo-b1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-v1
results: []
license: artistic-2.0
---
# juanako-7b-v1
This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4594
- Rewards/chosen: -1.1095
- Rewards/rejected: -2.3132
- Rewards/accuracies: 0.7964
- Rewards/margins: 1.2037
- Logps/rejected: -220.0052
- Logps/chosen: -217.5506
- Logits/rejected: -2.5535
- Logits/chosen: -2.7973
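For context, the reward metrics above follow the standard DPO formulation: each reward is the β-scaled log-ratio between the policy and the reference model, and `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected` (here 1.2037 = -1.1095 - (-2.3132)). A sketch of the objective, with notation assumed rather than taken from this card:

$$
r_\theta(x,y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\theta(x,y_w) - r_\theta(x,y_l)\big)\Big]
$$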
## Model description
**It seems to outperform the original Zephyr on most tasks.**
I trained Juanako with the same datasets and trainer as [alignment-handbook/zephyr-7b-sft-lora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-lora):
* 1 epoch of DPO with transformers-UNA; the result, merged using the FastChat converter, is [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
* finally, 1 more epoch of DPO with transformers-UNA on top of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1).
Other experiments were also performed to test transformers-UNA's capabilities across diverse scenarios and models.
**This is the full, merged version of the model: the result of merging the LoRA adapters into the base weights.**
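A minimal sketch of such a merge using `peft` (the card says the FastChat converter was used, so this is an illustrative equivalent, not the author's exact procedure; the repo ids below are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder ids: substitute the actual base model and LoRA adapter repos.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
lora = PeftModel.from_pretrained(base, "lora-adapter-id")

# merge_and_unload folds the LoRA deltas into the base weights and
# returns a plain transformers model that can be saved standalone.
merged = lora.merge_and_unload()
merged.save_pretrained("merged-model")
```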
## Intended uses & limitations
Research purposes.
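A minimal inference sketch using the standard `transformers` API (not an official snippet from the author; the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/juanako-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain DPO fine-tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```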
## Training and evaluation data
The alignment-handbook DPO recipe with UNA, applied on top of the SFT LoRA.
### Evaluation (lm-evaluation-harness)
#### 0-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 0, batch_size: 8
```
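These numbers should be reproducible with something like the following sketch (assumes lm-evaluation-harness >= 0.4, where `lm_eval.simple_evaluate` exists; task names may differ across harness versions):

```python
import lm_eval

# Mirrors the config string above: float16 weights, 0-shot, batch size 8.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/juanako-7b-v1,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "boolq",
           "lambada", "piqa", "sciq", "truthfulqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```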
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.5691|± |0.0145|
| | |none |acc_norm | 0.6041|± |0.0143|
|arc_easy |Yaml |none |acc | 0.8363|± |0.0076|
| | |none |acc_norm | 0.8161|± |0.0079|
|hellaswag |Yaml |none |acc | 0.6554|± |0.0047|
| | |none |acc_norm | 0.8411|± |0.0036|
|boolq |Yaml |none |acc | 0.8355|± |0.0065|
|lambada |N/A |none |perplexity | 3.3607|± |0.1398|
| | |none |acc | 0.7309|± |0.0137|
|piqa |Yaml |none |acc | 0.8194|± |0.0090|
| | |none |acc_norm | 0.8335|± |0.0087|
|sciq |Yaml |none |acc | 0.9480|± |0.0070|
| | |none |acc_norm | 0.8960|± |0.0097|
|truthfulqa |N/A |none |bleu_max |26.0803|± |0.6528|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4198|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5847|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 8
```
| Tasks |Version|Filter| Metric | Value | |Stderr|
|-------------------|-------|------|-----------|------:|---|-----:|
|arc_challenge |Yaml |none |acc | 0.6084|± |0.0143|
| | |none |acc_norm | 0.6357|± |0.0141|
|arc_easy |Yaml |none |acc | 0.8645|± |0.0070|
| | |none |acc_norm | 0.8645|± |0.0070|
|hellaswag |Yaml |none |acc | 0.6475|± |0.0048|
| | |none |acc_norm | 0.8372|± |0.0037|
|boolq |Yaml |none |acc | 0.8609|± |0.0061|
|lambada |N/A |none |perplexity | 3.5484|± |0.1034|
| | |none |acc | 0.7207|± |0.0107|
|piqa |Yaml |none |acc | 0.8259|± |0.0088|
| | |none |acc_norm | 0.8384|± |0.0086|
|sciq |Yaml |none |acc | 0.9730|± |0.0051|
| | |none |acc_norm | 0.9740|± |0.0050|
|truthfulqa |N/A |none |bleu_max |18.9814|± |0.4805|
| | |none |acc | 0.4856|± |0.0521|
| - truthfulqa_mc1 |Yaml |none |acc | 0.4333|± |0.0173|
| - truthfulqa_mc2 |Yaml |none |acc | 0.5903|± |0.0153|
|winogrande |Yaml |none |acc | 0.7609|± |0.0120|
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mapped onto `TrainingArguments` in the sketch after the list):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 12
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1
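These settings map directly onto Hugging Face `TrainingArguments`, as sketched below. The actual run used the alignment-handbook DPO recipe on 12 GPUs; the output path is a placeholder, and the listed Adam betas/epsilon are the optimizer defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="juanako-7b-v1",      # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,   # x 12 GPUs x 16 accumulation = 192 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
)
```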
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4966 | 0.15 | 50 | 0.4893 | -1.1759 | -2.2914 | 0.7485 | 1.1155 | -219.7872 | -218.2148 | -2.5450 | -2.7884 |
| 0.4522 | 0.31 | 100 | 0.4808 | -0.8099 | -1.8893 | 0.7784 | 1.0794 | -215.7659 | -214.5544 | -2.5644 | -2.8095 |
| 0.5048 | 0.46 | 150 | 0.4706 | -1.0526 | -2.1412 | 0.7725 | 1.0887 | -218.2852 | -216.9814 | -2.5638 | -2.8089 |
| 0.4853 | 0.62 | 200 | 0.4640 | -1.0787 | -2.2821 | 0.7725 | 1.2034 | -219.6941 | -217.2426 | -2.5460 | -2.7891 |
| 0.4639 | 0.77 | 250 | 0.4636 | -1.2348 | -2.4583 | 0.8084 | 1.2235 | -221.4559 | -218.8034 | -2.5533 | -2.7970 |
| 0.4634 | 0.93 | 300 | 0.4601 | -1.1370 | -2.3243 | 0.7964 | 1.1873 | -220.1163 | -217.8257 | -2.5540 | -2.7977 |
| - | 1.00 | 300 | 0.4594 | -1.1095 | -2.3132 | 0.7964 | 1.2037 | -220.0052 | -217.5506 | -2.5535 | -2.7973 |
### Framework versions
- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
## MMLU Results
#### 1-Shot
```
hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: None, num_fewshot: 1, batch_size: 1
```
| Tasks |Version|Filter|Metric|Value | |Stderr|
|---------------------------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - formal_logic |Yaml |none |acc |0.4206|± |0.0442|
| - high_school_european_history |Yaml |none |acc |0.7576|± |0.0335|
| - high_school_us_history |Yaml |none |acc |0.8186|± |0.0270|
| - high_school_world_history |Yaml |none |acc |0.7890|± |0.0266|
| - international_law |Yaml |none |acc |0.7438|± |0.0398|
| - jurisprudence |Yaml |none |acc |0.8056|± |0.0383|
| - logical_fallacies |Yaml |none |acc |0.7791|± |0.0326|
| - moral_disputes |Yaml |none |acc |0.7023|± |0.0246|
| - moral_scenarios |Yaml |none |acc |0.2145|± |0.0137|
| - philosophy |Yaml |none |acc |0.7074|± |0.0258|
| - prehistory |Yaml |none |acc |0.7377|± |0.0245|
| - professional_law |Yaml |none |acc |0.4361|± |0.0127|
| - world_religions |Yaml |none |acc |0.8421|± |0.0280|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - business_ethics |Yaml |none |acc |0.5600|± |0.0499|
| - clinical_knowledge |Yaml |none |acc |0.6981|± |0.0283|
| - college_medicine |Yaml |none |acc |0.6185|± |0.0370|
| - global_facts |Yaml |none |acc |0.3300|± |0.0473|
| - human_aging |Yaml |none |acc |0.6726|± |0.0315|
| - management |Yaml |none |acc |0.8058|± |0.0392|
| - marketing |Yaml |none |acc |0.8419|± |0.0239|
| - medical_genetics |Yaml |none |acc |0.7200|± |0.0451|
| - miscellaneous |Yaml |none |acc |0.8033|± |0.0142|
| - nutrition |Yaml |none |acc |0.7288|± |0.0255|
| - professional_accounting |Yaml |none |acc |0.4929|± |0.0298|
| - professional_medicine |Yaml |none |acc |0.6801|± |0.0283|
| - virology |Yaml |none |acc |0.5000|± |0.0389|
| - social_sciences |N/A |none |acc |0.7195|± |0.0676|
| - econometrics |Yaml |none |acc |0.5000|± |0.0470|
| - high_school_geography |Yaml |none |acc |0.7879|± |0.0291|
| - high_school_government_and_politics|Yaml |none |acc |0.8601|± |0.0250|
| - high_school_macroeconomics |Yaml |none |acc |0.6231|± |0.0246|
| - high_school_microeconomics |Yaml |none |acc |0.6471|± |0.0310|
| - high_school_psychology |Yaml |none |acc |0.8000|± |0.0171|
| - human_sexuality |Yaml |none |acc |0.7557|± |0.0377|
| - professional_psychology |Yaml |none |acc |0.6552|± |0.0192|
| - public_relations |Yaml |none |acc |0.6636|± |0.0453|
| - security_studies |Yaml |none |acc |0.7184|± |0.0288|
| - sociology |Yaml |none |acc |0.8358|± |0.0262|
| - us_foreign_policy |Yaml |none |acc |0.8500|± |0.0359|
| - stem |N/A |none |acc |0.5217|± |0.1149|
| - abstract_algebra |Yaml |none |acc |0.3000|± |0.0461|
| - anatomy |Yaml |none |acc |0.6222|± |0.0419|
| - astronomy |Yaml |none |acc |0.6711|± |0.0382|
| - college_biology |Yaml |none |acc |0.7361|± |0.0369|
| - college_chemistry |Yaml |none |acc |0.4400|± |0.0499|
| - college_computer_science |Yaml |none |acc |0.5000|± |0.0503|
| - college_mathematics |Yaml |none |acc |0.3100|± |0.0465|
| - college_physics |Yaml |none |acc |0.4902|± |0.0497|
| - computer_security |Yaml |none |acc |0.7100|± |0.0456|
| - conceptual_physics |Yaml |none |acc |0.5362|± |0.0326|
| - electrical_engineering |Yaml |none |acc |0.5862|± |0.0410|
| - elementary_mathematics |Yaml |none |acc |0.4365|± |0.0255|
| - high_school_biology |Yaml |none |acc |0.7129|± |0.0257|
| - high_school_chemistry |Yaml |none |acc |0.5074|± |0.0352|
| - high_school_computer_science |Yaml |none |acc |0.6500|± |0.0479|
| - high_school_mathematics |Yaml |none |acc |0.3259|± |0.0286|
| - high_school_physics |Yaml |none |acc |0.3709|± |0.0394|
| - high_school_statistics |Yaml |none |acc |0.5139|± |0.0341|
| - machine_learning |Yaml |none |acc |0.5089|± |0.0475|
| Groups |Version|Filter|Metric|Value | |Stderr|
|------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6085|± |0.1321|
| - humanities |N/A |none |acc |0.5405|± |0.1478|
| - other |N/A |none |acc |0.6894|± |0.1091|
| - social_sciences|N/A |none |acc |0.7195|± |0.0676|
| - stem |N/A |none |acc |0.5217|± |0.1149|