---
inference: false
license: llama2
language:
- ja
- en
---

# Model Card for Model ID

The original model, [elyza/ELYZA-japanese-Llama-2-7b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b-instruct), is based on Meta's Llama 2 and has undergone additional pre-training in Japanese, together with ELYZA's own post-training and speed-up tuning.

This model is an AWQ-quantized version of the original model, shrinking it from 13.48 GB to 3.8 GB.

## Model Details

Currently, this model **only works on a Colab A100** or RTX-series GPU. Even when there is enough GPU memory, the output may become abnormal on T4 and V100 GPUs. The cause of the abnormal output has not yet been determined.

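Because of this limitation, it is worth checking which GPU you are running on before loading the model. A minimal sketch (not part of the original card; the device names follow the caveat above):

```
import torch

# Warn if running on a GPU that has been observed to produce abnormal output.
if torch.cuda.is_available():
    device_name = torch.cuda.get_device_name(0)
    if "T4" in device_name or "V100" in device_name:
        print(f"Warning: abnormal output has been reported on {device_name}.")
else:
    print("No CUDA device found; this model requires a GPU.")
```
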
Quantization reduces the amount of memory required and improves execution speed, but unfortunately output quality deteriorates.

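To see the memory side of this trade-off on your own machine, one option (a sketch, not part of the original card; the figures in the comment follow from the file sizes above) is to check allocated GPU memory after the model is loaded:

```
import torch

# Report GPU memory actually allocated after loading the quantized model
# (see the sample script below). Expect roughly 4 GB here, versus ~14 GB
# for the unquantized fp16 original.
allocated_gb = torch.cuda.memory_allocated(0) / 1024**3
print(f"GPU memory allocated: {allocated_gb:.2f} GB")
```
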
In particular, the original model is tuned to strengthen its ability to follow Japanese instructions, not to maximize benchmark scores.

Although the ability to follow instructions cannot be measured with existing automated benchmarks, we have confirmed that quantization significantly degrades it.

This model follows instructions better than the [GPTQ version](https://huggingface.co/dahara1/ELYZA-japanese-Llama-2-7b-fast-instruct-GPTQ).

## Sample Script

[AWQ version Colab sample (A100 only)](https://github.com/webbigdata-jp/python_sample/blob/main/ELYZA_japanese_Llama_2_7b_instruct_AWQ_sample.ipynb)

### Local PC

Install the library:
```
pip install autoawq
```
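
The script below loads the weights from a local directory. If you have not downloaded them yet, one way to fetch them from this repo (a sketch using `huggingface_hub`, not part of the original card) is:

```
from huggingface_hub import snapshot_download

# Download the quantized weights from the Hub into a local directory.
snapshot_download(
    repo_id="dahara1/ELYZA-japanese-Llama-2-7b-instruct-AWQ",
    local_dir="ELYZA-japanese-Llama-2-7b-instruct-AWQ",
)
```

Then run: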
```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hugging Face repo where these quantized weights are published.
quant_path = 'dahara1/ELYZA-japanese-Llama-2-7b-instruct-AWQ'
# Quantization settings used to build this model (for reference).
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4 }

quantized_model_dir = "ELYZA-japanese-Llama-2-7b-instruct-AWQ"
quant_file = "awq_model_w4_g128.pt"

model = AutoAWQForCausalLM.from_quantized(quantized_model_dir, quant_file)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)

# Llama 2 chat prompt format.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"
elyza_tasks_100_over_4score_prompt = [
    """リラックマが船橋市に行ってふなっしーと強敵(トモ)になり、最終的にはトー横に住みつくというプロットの短編小説を劇画風文体で書いてみてください。""",
]

for task in elyza_tasks_100_over_4score_prompt:
    # Assemble the prompt: BOS token, system prompt, then the user instruction.
    prompt = "{bos_token}{b_inst} {system}{prompt} {e_inst} ".format(
        bos_token=tokenizer.bos_token,
        b_inst=B_INST,
        system=f"{B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}",
        prompt=task,
        e_inst=E_INST,
    )

    tokens = tokenizer(prompt, return_tensors="pt").to("cuda:0").input_ids
    output = model.generate(
        input_ids=tokens,
        max_new_tokens=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output[0]))
```

Output:
```
<s><s> [INST] <<SYS>>
あなたは誠実で優秀な日本人のアシスタントです。
<</SYS>>

リラックマが船橋市に行ってふなっしーと強敵(トモ)になり、最終的にはトー横に住みつくというプロットの短編小説を劇画風文体で書いてみてください。 [/INST] リラックマが船橋市にやってきた。
彼はふなっしーと強敵(トモ)になるために、船橋競艇場へと向かった。

ふなっしーはリラックマの登場に驚いたが、すぐに強気のレースを展開した。

リラックマはその走りに感化され、自身も熱くなっていく。

ふなっしーは最終周回で逆転を狙うが、リラックマはそれをかわして優勝を飾った。

ふなっしーは敗北を喫しながらも、リラックマの強さを認める。

ふなっしーはリラックマに船橋を後にするよ
```

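The sample output above is a short story in Japanese (Rilakkuma travels to Funabashi, races Funassyi, and wins) that stops mid-sentence because generation is capped at `max_new_tokens=256`. The leading `<s><s>` appears because the decoded string keeps special tokens; to strip them, one option (our suggestion, not part of the original script) is:

```
# Decode without BOS/EOS and other special tokens.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
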
## See also

[casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ)