---
language:
- en
- ko
pipeline_tag: text-generation
license: cc-by-nc-sa-4.0
tags:
- merge
---
# **Twice-KoSOLAR-16.1B-test**
## Model Details
**Model Developers** Kyujin Han (kyujinpy)
**Model Purpose**
<img src='./solar.png'>
Recently, the SOLAR-10.7B model, built on the [Depth-Up-Scaling](https://arxiv.org/pdf/2312.15166.pdf) methodology (pictured above), has been posting strong results on the LLM leaderboard. In addition, the `seungduk/KoSOLAR-10.7B-v0.1` model created by `yanolja` has made a big splash on the Ko-LLM leaderboard, and it is expected to change the direction of that leaderboard going forward.
This raised a simple curiosity: **the Depth-Up-Scaling (DUS) methodology published by Upstage is a method that merges (passthrough) two mistral-7B models.**
Surprisingly, the `upstage/SOLAR-10.7B-v1.0` model built with DUS scored higher on the leaderboard than the original mistral-7B model. (See the table below.)
So, would applying the DUS methodology to other models, without restriction, produce the same result? I was very curious.
Through this experiment, I want to settle that curiosity.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | **66.04** | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
| [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | **66.04** | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
> Scores are from the English [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
**Method**
Merged using [Mergekit](https://github.com/cg123/mergekit).
- Base model: [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1), the Korean pretrained SOTA (as of 12/30).
**Merge config**
In the original [`SOLAR-10.7B`](https://arxiv.org/pdf/2312.15166.pdf) paper, the mistral-7B layers were split into a `layer-24` slice and a `layer-8` slice, and two copies of the `layer-24` slice were merged to produce `layer-48` in total.
Since that ratio is `uses:waste = 3:1`, I split the `seungduk/KoSOLAR-10.7B-v0.1` layers into a `layer-36` slice and a `layer-12` slice at the same ratio, and merged two copies of the `layer-36` slice to produce `layer-72` in total.
The detailed merge config is as follows.
```yaml
slices:
- sources:
- model: seungduk/KoSOLAR-10.7B-v0.1
layer_range: [0, 36]
- sources:
- model: seungduk/KoSOLAR-10.7B-v0.1
layer_range: [12, 48]
merge_method: passthrough
dtype: float16
```
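As a quick sanity check on the layer arithmetic above, here is a minimal Python sketch (the variable names are mine, not part of mergekit):

```python
# Illustrative check of the passthrough merge's layer arithmetic.
slice_a = range(0, 36)   # layer_range [0, 36] from the first copy
slice_b = range(12, 48)  # layer_range [12, 48] from the second copy

total_layers = len(slice_a) + len(slice_b)  # passthrough concatenates: 72 layers
overlap = set(slice_a) & set(slice_b)       # layers 12..35 appear twice
assert total_layers == 72
assert len(overlap) == 24

# Same 3:1 keep:drop ratio as the SOLAR paper's mistral-7B split (24:8).
assert 36 / 12 == 24 / 8
```

With mergekit installed, a config like the one above can presumably be applied via its `mergekit-yaml` entry point (e.g. `mergekit-yaml config.yaml ./merged-model`); see the mergekit README for the exact invocation.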
> Sharing everything is my belief.
# **Model Benchmark**
## Open Ko-LLM leaderboard & lm-evaluation-harness (zero-shot)
- Scores are from the [Open Ko-LLM Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
| --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/Twice-KoSOLAR-16.1B-test | 50.20 | 45.65 | 57.14 | 51.39 | 42.99 | 53.84 |
| [Megastudy/M-SOLAR-10.7B-v1.1-beta](https://huggingface.co/Megastudy/M-SOLAR-10.7B-v1.1-beta) | 55.25 | 51.71 | 60.86 | 54.24 | 47.12 | 62.34 |
| [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |
- Zero-shot results via [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness):
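(The run headers below match the output format of the harness's `main.py`; the exact invocation is not recorded here, but it was presumably something along the lines of `python main.py --model gpt2 --model_args pretrained=<repo> --tasks kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg --num_fewshot 0`; treat that command as an assumption.)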
```
gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.7201|±  |0.0120|
|                |       |macro_f1|0.7073|±  |0.0124|
|kobest_copa     |      0|acc     |0.6510|±  |0.0151|
|                |       |macro_f1|0.6506|±  |0.0151|
|kobest_hellaswag|      0|acc     |0.4520|±  |0.0223|
|                |       |acc_norm|0.5820|±  |0.0221|
|                |       |macro_f1|0.4475|±  |0.0222|
|kobest_sentineg |      0|acc     |0.7078|±  |0.0229|
|                |       |macro_f1|0.7071|±  |0.0229|

gpt2 (pretrained=Megastudy/M-SOLAR-10.7B-v1.1-beta), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.7137|±  |0.0121|
|                |       |macro_f1|0.6878|±  |0.0128|
|kobest_copa     |      0|acc     |0.7060|±  |0.0144|
|                |       |macro_f1|0.7054|±  |0.0145|
|kobest_hellaswag|      0|acc     |0.4620|±  |0.0223|
|                |       |acc_norm|0.5360|±  |0.0223|
|                |       |macro_f1|0.4595|±  |0.0223|
|kobest_sentineg |      0|acc     |0.7431|±  |0.0220|
|                |       |macro_f1|0.7295|±  |0.0230|

gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.5228|±  |0.0133|
|                |       |macro_f1|0.3788|±  |0.0097|
|kobest_copa     |      0|acc     |0.6860|±  |0.0147|
|                |       |macro_f1|0.6858|±  |0.0147|
|kobest_hellaswag|      0|acc     |0.4580|±  |0.0223|
|                |       |acc_norm|0.5380|±  |0.0223|
|                |       |macro_f1|0.4552|±  |0.0222|
|kobest_sentineg |      0|acc     |0.6474|±  |0.0240|
|                |       |macro_f1|0.6012|±  |0.0257|

gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|      Task      |Version| Metric |Value |   |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq    |      0|acc     |0.8725|±  |0.0089|
|                |       |macro_f1|0.8722|±  |0.0089|
|kobest_copa     |      0|acc     |0.6850|±  |0.0147|
|                |       |macro_f1|0.6844|±  |0.0147|
|kobest_hellaswag|      0|acc     |0.4340|±  |0.0222|
|                |       |acc_norm|0.5840|±  |0.0221|
|                |       |macro_f1|0.4296|±  |0.0221|
|kobest_sentineg |      0|acc     |0.7506|±  |0.0217|
|                |       |macro_f1|0.7505|±  |0.0217|
```
## Open EN-LLM leaderboard & lm-evaluation-harness (zero-shot)
- Scores are from the English [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
| [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
- Zero-shot results via [Eleuther/LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness):
```yaml
(will update)
```
# Implementation Code
```python
# Load Twice-KoSOLAR-16.1B-test
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "PracticeLLM/Twice-KoSOLAR-16.1B-test"
model = AutoModelForCausalLM.from_pretrained(
    repo,
    return_dict=True,
    torch_dtype=torch.float16,  # half precision to reduce memory for the 16.1B model
    device_map='auto'           # spread layers across available devices
)
tokenizer = AutoTokenizer.from_pretrained(repo)
```
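A quick generation example with the model and tokenizer loaded above (the prompt and decoding settings are illustrative, not a recommended configuration):

```python
prompt = "한국의 수도는"  # "The capital of Korea is" (illustrative prompt)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```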
---
# **References (Model Card)**
# yanolja/KoSOLAR-10.7B-v0.1
This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on HuggingFace.
The hypothesis was that while maintaining the original performance of the base model, we could add more tokens to the base model's vocabulary by training the embeddings for the new tokens only. The evaluation results seem to indicate that both English and Korean performances were preserved.
## Model Description
Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen except for the embed_tokens layer and the lm_head layer. Embeddings for the existing tokens in those layers were frozen during training. The embeddings for the new tokens have been tuned.
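A minimal sketch of that setup using standard `transformers` APIs; the added token is a hypothetical placeholder, and the whole snippet is an assumption about the procedure, not yanolja's actual training code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "upstage/SOLAR-10.7B-v1.0"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

num_old = len(tokenizer)                              # vocab size before extension
tokenizer.add_tokens(["<placeholder_korean_token>"])  # hypothetical new token(s)
model.resize_token_embeddings(len(tokenizer))

# Freeze every parameter except embed_tokens and lm_head.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(("embed_tokens.weight", "lm_head.weight"))

# Zero the gradients for rows of pre-existing tokens, so that only the
# embeddings of the newly added tokens are updated during training.
def mask_old_rows(grad):
    grad = grad.clone()
    grad[:num_old] = 0
    return grad

model.get_input_embeddings().weight.register_hook(mask_old_rows)
model.get_output_embeddings().weight.register_hook(mask_old_rows)
```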
---
# **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**
# **Introduction**
We introduce SOLAR-10.7B, an advanced large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. It's compact, yet remarkably powerful, and demonstrates unparalleled state-of-the-art performance in models with parameters under 30B.
We present a methodology for scaling LLMs called depth up-scaling (DUS), which encompasses architectural modifications and continued pretraining. In other words, we integrated Mistral 7B weights into the upscaled layers, and finally, continued pre-training for the entire model.
SOLAR-10.7B has remarkable performance. It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model. For detailed information, please refer to the experimental table.
Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness and adaptability for your fine-tuning needs. Our simple instruction fine-tuning using the SOLAR-10.7B pre-trained model yields significant performance improvements ([SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)).
For full details of this model please read our [paper](https://arxiv.org/abs/2312.15166). |