Quantization made by Richard Erkhov.

[GitHub](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


youri-7b-chat - GGUF
- Model creator: https://huggingface.co/rinna/
- Original model: https://huggingface.co/rinna/youri-7b-chat/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [youri-7b-chat.Q2_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q2_K.gguf) | Q2_K | 2.36GB |
| [youri-7b-chat.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
| [youri-7b-chat.IQ3_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.IQ3_S.gguf) | IQ3_S | 2.75GB |
| [youri-7b-chat.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
| [youri-7b-chat.IQ3_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.IQ3_M.gguf) | IQ3_M | 2.9GB |
| [youri-7b-chat.Q3_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q3_K.gguf) | Q3_K | 3.07GB |
| [youri-7b-chat.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
| [youri-7b-chat.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
| [youri-7b-chat.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
| [youri-7b-chat.Q4_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q4_0.gguf) | Q4_0 | 3.56GB |
| [youri-7b-chat.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
| [youri-7b-chat.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
| [youri-7b-chat.Q4_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q4_K.gguf) | Q4_K | 3.8GB |
| [youri-7b-chat.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
| [youri-7b-chat.Q4_1.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q4_1.gguf) | Q4_1 | 3.95GB |
| [youri-7b-chat.Q5_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q5_0.gguf) | Q5_0 | 4.33GB |
| [youri-7b-chat.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
| [youri-7b-chat.Q5_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q5_K.gguf) | Q5_K | 4.45GB |
| [youri-7b-chat.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
| [youri-7b-chat.Q5_1.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q5_1.gguf) | Q5_1 | 4.72GB |
| [youri-7b-chat.Q6_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q6_K.gguf) | Q6_K | 5.15GB |
| [youri-7b-chat.Q8_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-chat-gguf/blob/main/youri-7b-chat.Q8_0.gguf) | Q8_0 | 6.67GB |



Original model description:

~~~yaml
---
language:
- ja
- en
license: llama2
datasets:
- databricks/databricks-dolly-15k
- kunishou/databricks-dolly-15k-ja
- izumi-lab/llm-japanese-dataset
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
inference: false
model-index:
- name: youri-7b-chat
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 51.19
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 76.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 41.17
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.52
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b-chat
      name: Open LLM Leaderboard
---
~~~

# `rinna/youri-7b-chat`

![rinna-icon](./rinna.png)

# Overview

The model is the instruction-tuned version of [`rinna/youri-7b`](https://huggingface.co/rinna/youri-7b). It uses a chat-style input format.

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [llama2 paper](https://arxiv.org/abs/2307.09288) for architecture details.

* **Fine-tuning**

    The fine-tuning data is a subset of the following datasets:
    * [Databricks Dolly data](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
    * [Japanese Databricks Dolly data](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
    * [Anthropic HH RLHF data](https://huggingface.co/datasets/Anthropic/hh-rlhf) and its Japanese translation
    * [FLAN Instruction Tuning data](https://github.com/google-research/FLAN) and its Japanese translation
    * [Izumi lab LLM Japanese dataset](https://github.com/masanorihirano/llm-japanese-dataset/tree/main)
      * The following sections are used:
        * alt
        * aozora-txt
        * CourseraParallel
        * ParaNatCom
        * Tab-delimited_Bilingual_Sentence_Pairs
        * tanaka-corpus
        * wikinews
        * wordnet
        * yasashi-japanese
      * The [remaining sections](https://github.com/masanorihirano/llm-japanese-dataset/tree/main/datasets-cc-by-sa) contain commonly used evaluation corpora, so they were skipped to prevent data leakage.

* **Contributors**

    - [Tianyu Zhao](https://huggingface.co/tianyuz)
    - [Kei Sawada](https://huggingface.co/keisawada)

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
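The Overview gives only the layer count and hidden size. The file sizes in the quantization table are easier to interpret next to the unquantized model's parameter count. The sketch below works that count out. It assumes that youri-7b-chat keeps the rest of the standard Llama-2-7B configuration: a SwiGLU MLP with intermediate size 11008, full multi-head attention, an untied output head, and the 32,000-token Llama-2 vocabulary. Those values come from Llama-2-7B, not from this card.

```python
# Rough parameter count for a Llama-2-7B-style decoder.
# Assumption: intermediate size 11008 and vocabulary 32000 (standard Llama-2-7B values).


def llama_param_count(n_layers: int, hidden: int, intermediate: int, vocab: int) -> int:
    """Count weights of a Llama-2-style decoder (no biases, untied input/output embeddings)."""
    attention = 4 * hidden * hidden      # q, k, v and output projections
    mlp = 3 * hidden * intermediate      # gate, up and down projections (SwiGLU)
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden      # input embedding table + separate LM head
    final_norm = hidden                  # RMSNorm after the last layer
    return n_layers * per_layer + embeddings + final_norm


params = llama_param_count(n_layers=32, hidden=4096, intermediate=11008, vocab=32000)
fp16_bytes = 2 * params  # 2 bytes per weight before any quantization
print(f"{params:,} parameters, about {fp16_bytes / 1e9:.1f} GB in float16")
```

Under these assumptions the model has about 6.74 billion weights. That is roughly 13.5 GB at 2 bytes per weight, which is the baseline the Q2_K to Q8_0 files reduce.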
---

# How to use the model

~~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/youri-7b-chat")
model = AutoModelForCausalLM.from_pretrained("rinna/youri-7b-chat")

if torch.cuda.is_available():
    model = model.to("cuda")

# "Please translate the following Japanese into English."
instruction = "次の日本語を英語に翻訳してください。"
# "Training a model so that it can solve tasks based on natural-language instructions is called instruction tuning."
user_input = "自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。"

# Speaker tags: 設定 = setting (the instruction), ユーザー = user, システム = system (the model)
context = [
    {
        "speaker": "設定",
        "text": instruction
    },
    {
        "speaker": "ユーザー",
        "text": user_input
    }
]
prompt = [
    f"{uttr['speaker']}: {uttr['text']}"
    for uttr in context
]
prompt = "\n".join(prompt)
# End with an open system turn for the model to complete
prompt = (
    prompt
    + "\n"
    + "システム: "
)

token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
"""
設定: 次の日本語を英語に翻訳してください。
ユーザー: 自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。
システム: Learning to solve tasks based on natural language instructions is called instruction tuning.
"""

# Keep only the model's reply: drop the echoed prompt and the trailing EOS token
# (str.removesuffix requires Python 3.9+)
output = output[len(prompt):].removesuffix(tokenizer.eos_token).strip()

# A Japanese definition of "large language model": an LLM is a computer language model built from an
# artificial neural network with many parameters (tens of millions to billions), trained with
# self-supervised or semi-supervised learning on large amounts of unlabeled text.
user_input = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"

# Append the model's previous reply and the new user turn, then rebuild the prompt
context.extend([
    {
        "speaker": "システム",
        "text": output
    },
    {
        "speaker": "ユーザー",
        "text": user_input
    }
])

prompt = [
    f"{uttr['speaker']}: {uttr['text']}"
    for uttr in context
]
prompt = "\n".join(prompt)
prompt = (
    prompt
    + "\n"
    + "システム: "
)

token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
"""
設定: 次の日本語を英語に翻訳してください。
ユーザー: 自然言語による指示に基づきタスクが解けるよう学習させることを Instruction tuning と呼びます。
システム: Learning to solve tasks based on natural language instructions is called instruction tuning.
ユーザー: 大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。
システム: Large language models (LLMs) are computer language models consisting of a deep artificial neural network with millions to billions of parameters that are trained by self-supervised learning or semi-supervised learning using vast unlabeled text corpora.
"""
~~~~

---

# Tokenization

The model uses the original Llama-2 tokenizer.
---

# How to cite

~~~
@misc{rinna-youri-7b-chat,
    title = {rinna/youri-7b-chat},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/youri-7b-chat}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    url = {https://arxiv.org/abs/2404.01657},
}
~~~

---

# License

[The llama2 license](https://ai.meta.com/llama/license/)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rinna__youri-7b-chat).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |48.51|
|AI2 Reasoning Challenge (25-Shot)|51.19|
|HellaSwag (10-Shot)              |76.09|
|MMLU (5-Shot)                    |46.06|
|TruthfulQA (0-shot)              |41.17|
|Winogrande (5-shot)              |75.06|
|GSM8k (5-shot)                   | 1.52|
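The table does not say how the Avg. row is computed. Assuming it is the unweighted mean of the six benchmark scores, which is how the Open LLM Leaderboard reported its average at the time, it can be reproduced from the rows above:

```python
# Recompute the "Avg." row, assuming it is the unweighted mean of the six listed scores.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 51.19,
    "HellaSwag (10-Shot)": 76.09,
    "MMLU (5-Shot)": 46.06,
    "TruthfulQA (0-shot)": 41.17,
    "Winogrande (5-shot)": 75.06,
    "GSM8k (5-shot)": 1.52,
}
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.3f}")
```

The mean of the rounded per-task scores comes to 48.515, which is consistent with the reported 48.51 once the rounding of the displayed values is taken into account. The very low GSM8k score pulls the average down noticeably.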