# phi-1_5-sft-openhermes-v2
This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on the generator dataset.
It achieves a final validation loss of 1.1750 on the evaluation set (see the training results below).
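As a minimal usage sketch (the repo id below is an assumption; point it at wherever this checkpoint is actually published):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual location of this checkpoint.
model_id = "phi-1_5-sft-openhermes-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the ~1.4B-parameter model small on GPU
    device_map="auto",          # requires `accelerate`; use .to("cuda") otherwise
)

prompt = "Explain supervised fine-tuning (SFT) in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```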
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 8e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 2
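
The hyperparameters above map onto a `transformers.TrainingArguments` configuration along these lines. This is a sketch, not the original training script; `output_dir` and the evaluation cadence are assumptions (the cadence is inferred from the 275-step intervals in the results table):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-1_5-sft-openhermes-v2",  # placeholder path
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 8 * 2 = 16
    optim="adamw_torch",            # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=2,
    eval_strategy="steps",          # assumed: the table logs eval loss every 275 steps
    eval_steps=275,
    logging_steps=275,
)
```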
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7865        | 0.0831 | 275  | 1.4033          |
| 1.3614        | 0.1663 | 550  | 1.3218          |
| 1.2986        | 0.2494 | 825  | 1.2788          |
| 1.2667        | 0.3325 | 1100 | 1.2531          |
| 1.2405        | 0.4157 | 1375 | 1.2376          |
| 1.2239        | 0.4988 | 1650 | 1.2237          |
| 1.2078        | 0.5819 | 1925 | 1.2122          |
| 1.2114        | 0.6651 | 2200 | 1.2005          |
| 1.2028        | 0.7482 | 2475 | 1.1915          |
| 1.173         | 0.8313 | 2750 | 1.1833          |
| 1.1782        | 0.9144 | 3025 | 1.1776          |
| 1.1805        | 0.9976 | 3300 | 1.1720          |
| 1.0112        | 1.0807 | 3575 | 1.1817          |
| 0.9988        | 1.1638 | 3850 | 1.1791          |
| 0.9919        | 1.2470 | 4125 | 1.1786          |
| 0.9886        | 1.3301 | 4400 | 1.1768          |
| 0.9904        | 1.4132 | 4675 | 1.1763          |
| 1.001         | 1.4964 | 4950 | 1.1756          |
| 0.9979        | 1.5795 | 5225 | 1.1751          |
| 0.9858        | 1.6626 | 5500 | 1.1750          |
| 0.9975        | 1.7458 | 5775 | 1.1750          |
| 0.9924        | 1.8289 | 6050 | 1.1750          |
| 0.9978        | 1.9120 | 6325 | 1.1750          |
| 0.9892        | 1.9952 | 6600 | 1.1750          |
### Framework versions
- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1