killawhale2 committed on
Commit 60adef1 • 1 Parent(s): d3167df
add training strategy to model card (#3)
- add training strategy to model card (2ddf358d72c71dd963261ba27ce3c9f4018ecc6f)
README.md CHANGED
@@ -16,6 +16,12 @@ We developed the Depth Up-Scaling technique. Built on the Llama2 architecture, S
 Depth-Upscaled SOLAR-10.7B has remarkable performance. It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model. For detailed information, please refer to the experimental table ([link to be updated soon]).
 Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness and adaptability for your fine-tuning needs. Our simple instruction fine-tuning using the SOLAR-10.7B pre-trained model yields significant performance improvements. [[link to be updated soon]]
 
+# **Training Strategy**
+
+We utilize state-of-the-art instruction fine-tuning methods including supervised fine-tuning (SFT) and direct preference optimization (DPO) [1].
+Using open source datasets with Alpaca- and OpenOrca-style and generated synthetic datasets, we apply an iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
+
+[1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
 
 # **Usage Instructions**
 
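For readers unfamiliar with the DPO objective cited as [1] in the added Training Strategy lines, the following is a minimal sketch of the standard per-pair DPO loss from that paper. It is not the iterative, proprietary alignment strategy the commit mentions, whose details are not given here. The function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage note are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair (sketch, per [1]).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical default, not a value taken from the model card.
    """
    # Implicit rewards: scaled log-ratio between policy and reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # Loss is -log(sigmoid(margin)), i.e. softplus(-margin),
    # computed in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference on both responses, the margin is zero and the loss equals `math.log(2)`. For example, `dpo_loss(-10.0, -12.0, -10.0, -12.0)` gives that baseline, and raising the policy's log-probability of the chosen response relative to the rejected one lowers the loss.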