File size: 7,803 Bytes
476823f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: llama3
datasets:
- CohereForAI/aya_dataset
- kunishou/databricks-dolly-15k-ja
- kunishou/HelpSteer-35k-ja
- kunishou/HelpSteer2-20k-ja
- kunishou/hh-rlhf-49k-ja
- kunishou/oasst1-chat-44k-ja
- kunishou/oasst2-chat-68k-ja
- meta-math/MetaMathQA
- OpenAssistant/oasst1
- OpenAssistant/oasst2
- sahil2801/CodeAlpaca-20k
language:
- ja
- en
tags:
- llama
- llama-3
inference: false
base_model:
- rinna/llama-3-youko-8b
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
base_model_relation: merge
---
# `Llama 3 Youko 8B Instruct (rinna/llama-3-youko-8b-instruct)`
![rinna-icon](./rinna.png)
# Overview
The model is the instruction-tuned version of [rinna/llama-3-youko-8b](https://huggingface.co/rinna/llama-3-youko-8b), using supervised fine-tuning (SFT), Chat Vector, and direct preference optimization (DPO). It adpots the Llama-3 chat format.
| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 8B | Llama 3 Youko 8B [[HF]](https://huggingface.co/rinna/llama-3-youko-8b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-gptq) | Llama 3 Youko 8B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-8b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-8b-instruct-gptq) |
| 70B | Llama 3 Youko 70B [[HF]](https://huggingface.co/rinna/llama-3-youko-70b) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-gptq) | Llama 3 Youko 70B Instruct [[HF]](https://huggingface.co/rinna/llama-3-youko-70b-instruct) [[GPTQ]](https://huggingface.co/rinna/llama-3-youko-70b-instruct-gptq) |
* **Model architecture**
A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [Llama 3 Model Card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for architecture details.
* **Training: Built with Meta Llama 3**
**Supervised fine-tuning.** The supervised fine-tuning data is a subset of the following datasets.
- [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
- The JPN subset was used.
- [FLAN](https://github.com/google-research/FLAN/tree/main/flan/v2)
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
- [kunishou/hh-rlhf-49k-ja](https://huggingface.co/datasets/kunishou/hh-rlhf-49k-ja)
- [kunishou/oasst1-chat-44k-ja](https://huggingface.co/datasets/kunishou/oasst1-chat-44k-ja)
- [kunishou/oasst2-chat-68k-ja](https://huggingface.co/datasets/kunishou/oasst2-chat-68k-ja)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- The following sections were used: MATH_AnsAug, MATH_Rephrased, MATH_SV, and MATH_FOBAR.
- The remaining sections, containing augmented data from commonly used evaluation corpora, were skipped for preventing any possibility of data leak.
- [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- The EN and JA subsets were used.
- [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2)
- The EN and JA subsets were used.
- [sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
- rinna Dataset
**Model merging.** The fine-tuned model (llama-3-youko-8b-sft) has been enhanced through the following chat vector addition. The chat vector was obtained by subtracting the parameter vectors of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) from those of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
~~~~text
llama-3-youko-8b-sft + 0.5 * (meta-llama/Meta-Llama-3-8B-Instruct - meta-llama/Meta-Llama-3-8B)
~~~~
Here, the embedding layer was skipped while subtracting and adding the parameter vectors.
**Direct preference optimization** was then applied with a subset of the following datasets to build this instruct model.
- [kunishou/HelpSteer-35k-ja](https://huggingface.co/datasets/kunishou/HelpSteer-35k-ja)
- [kunishou/HelpSteer2-20k-ja](https://huggingface.co/datasets/kunishou/HelpSteer2-20k-ja)
- rinna Dataset
* **Contributors**
- [Xinqi Chen](https://huggingface.co/Keely0419)
- [Koh Mitsuda](https://huggingface.co/mitsu-koh)
- [Toshiaki Wakatsuki](https://huggingface.co/t-w)
- [Kei Sawada](https://huggingface.co/keisawada)
---
# Benchmarking
Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
---
# How to use the model
We found this instruction-tuned model tends to generate repeated text more often than its base counterpart, and thus we set repetition_penalty=1.1 for better generation performance. The same repetition penalty was applied to the instruction-tuned model in the aforementioned evaluation experiments.
~~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "rinna/llama-3-youko-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "あなたは誠実で優秀なアシスタントです。どうか、簡潔かつ正直に答えてください。"},
{"role": "user", "content": "西田幾多郎とはどんな人物ですか?"},
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
terminators = [
tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = model.generate(
input_ids,
max_new_tokens=512,
eos_token_id=terminators,
do_sample=True,
temperature=0.6,
top_p=0.9,
repetition_penalty=1.1,
)
response = outputs[0][input_ids.shape[-1]:]
response = tokenizer.decode(response, skip_special_tokens=True)
print(response)
~~~~
---
# Tokenization
The model uses the original [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) tokenizer.
---
# How to cite
```bibtex
@misc{rinna-llama-3-youko-8b-instruct,
title = {rinna/llama-3-youko-8b-instruct},
author = {Chen, Xinqi and Mitsuda, Koh and Wakatsuki, Toshiaki and Sawada, Kei},
url = {https://huggingface.co/rinna/llama-3-youko-8b-instruct}
}
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---
# References
```bibtex
@article{llama3modelcard,
title = {Llama 3 Model Card},
author = {AI@Meta},
year = {2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
@article{huang2023chat,
title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
year = {2023},
url = {https://arxiv.org/abs/2310.04799}
}
```
---
# License
[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/) |