---
inference: false
license: llama2
language:
- ja
- en
---

# Model Card for Model ID

The original model, [elyza/ELYZA-japanese-Llama-2-7b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b-instruct), is based on Meta's "Llama 2" and has undergone additional pre-training in Japanese, followed by ELYZA's own post-training and speed-up tuning.

This model is an AWQ-quantized version of the original model, reduced in size from 13.48GB to 3.8GB.

## Model Details

Currently, this model **only works on the Colab A100** or RTX-series GPUs. On the T4 and V100, the output may be abnormal even when there is enough GPU memory. The cause of the abnormal output has not yet been determined.

Quantization reduces the amount of memory required and improves execution speed, but unfortunately output quality deteriorates.

In particular, the original model is tuned to strengthen its ability to follow Japanese instructions, not to score well on benchmarks.

Although the ability to follow instructions cannot be measured with existing automated benchmarks, we have confirmed that quantization significantly degrades it.

This model follows instructions better than the [GPTQ version](https://huggingface.co/dahara1/ELYZA-japanese-Llama-2-7b-fast-instruct-GPTQ).

## Sample Script

[AWQ version Colab sample (A100 only)](https://github.com/webbigdata-jp/python_sample/blob/main/ELYZA_japanese_Llama_2_7b_instruct_AWQ_sample.ipynb)

To run on a local PC, install the library:
```
pip install autoawq
```

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hub ID of this model and the settings it was quantized with.
# Kept for reference; neither is used when loading below.
quant_path = 'dahara1/ELYZA-japanese-Llama-2-7b-instruct-AWQ'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4 }

# Directory containing the quantized model files, and the weight file name.
quantized_model_dir = "ELYZA-japanese-Llama-2-7b-instruct-AWQ"
quant_file = "awq_model_w4_g128.pt"

model = AutoAWQForCausalLM.from_quantized(quantized_model_dir, quant_file)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)

# Llama 2 chat markers.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
# System prompt: "You are a sincere and excellent Japanese assistant."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"

# Task: write a short story in a gekiga-style voice in which Rilakkuma goes to
# Funabashi, becomes a rival (friend) of Funassyi, and ends up living in To-yoko.
elyza_tasks_100_over_4score_prompt = [
    """リラックマが船橋市に行ってふなっしーと強敵(トモ)になり、最終的にはトー横に住みつくというプロットの短編小説を劇画風文体で書いてみてください。""",
]

for task in elyza_tasks_100_over_4score_prompt:
    prompt = "{bos_token}{b_inst} {system}{prompt} {e_inst} ".format(
        bos_token=tokenizer.bos_token,
        b_inst=B_INST,
        system=f"{B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}",
        prompt=task,
        e_inst=E_INST,
    )

    # The prompt already starts with bos_token, so do not let the tokenizer add another.
    tokens = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to("cuda:0").input_ids
    output = model.generate(
        input_ids=tokens,
        max_new_tokens=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output[0]))
```

Output

```
[INST] <<SYS>>
あなたは誠実で優秀な日本人のアシスタントです。
<</SYS>>

リラックマが船橋市に行ってふなっしーと強敵(トモ)になり、最終的にはトー横に住みつくというプロットの短編小説を劇画風文体で書いてみてください。 [/INST] リラックマが船橋市にやってきた。
彼はふなっしーと強敵(トモ)になるために、船橋競艇場へと向かった。
ふなっしーはリラックマの登場に驚いたが、すぐに強気のレースを展開した。
リラックマはその走りに感化され、自身も熱くなっていく。
ふなっしーは最終周回で逆転を狙うが、リラックマはそれをかわして優勝を飾った。
ふなっしーは敗北を喫しながらも、リラックマの強さを認める。
ふなっしーはリラックマに船橋を後にするよ
```

## See also

[casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
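
The prompt assembled in the sample script follows the single-turn Llama 2 chat layout. The sketch below isolates that step in a small helper so the string can be inspected without loading the model. `build_prompt` is a hypothetical name introduced here for illustration, the default `"<s>"` assumes the usual Llama 2 BOS token (in practice pass `tokenizer.bos_token`), and the system markers are assumed to be the standard `<<SYS>>` and `<</SYS>>`.

```python
# Hypothetical helper, not part of the original sample script.
# Assumes the standard Llama 2 chat markers.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_prompt(user_message: str, system_prompt: str, bos_token: str = "<s>") -> str:
    """Build a single-turn prompt: BOS, [INST], system block, user text, [/INST]."""
    system = f"{B_SYS}{system_prompt}{E_SYS}"
    return f"{bos_token}{B_INST} {system}{user_message} {E_INST} "


example = build_prompt("Hello", "You are a helpful assistant.")
print(repr(example))
```

Because the returned string already begins with the BOS token, it would typically be tokenized with `add_special_tokens=False` so that a second BOS token is not prepended.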