Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ language:
 
 Original model [elyza/ELYZA-japanese-Llama-2-7b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b-instruct) which is based on Meta's "Llama 2" and has undergone additional pre-training in Japanese instruction.
 
-This model is a AWQ quantized(miniaturized to 3.89GB) version of the original model(13.48GB).
+This model is an [AWQ](https://arxiv.org/abs/2306.00978)-quantized version (reduced from 13.48GB to 3.89GB) of the original model.
 
 ## Model Details
 
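The size figures in the hunk above follow from bit-width arithmetic. A rough back-of-the-envelope check (the parameter count and group size are assumptions here, and real AWQ checkpoints keep some tensors unquantized):

```python
# Back-of-the-envelope size check for 4-bit AWQ vs fp16.
# Assumptions: ~6.74B parameters for Llama-2-7b, quantization group
# size 128 with one fp16 scale per group, decimal GB.
params = 6.74e9

fp16_gb = params * 2 / 1e9            # 2 bytes per weight
int4_gb = params * 0.5 / 1e9          # 4 bits = 0.5 bytes per weight
scales_gb = (params / 128) * 2 / 1e9  # per-group fp16 scales

print(fp16_gb)              # 13.48, matching the original model size above
print(int4_gb + scales_gb)  # roughly 3.48; tensors kept in fp16 (embeddings
                            # etc.) account for much of the gap to 3.89GB
```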
@@ -26,10 +26,12 @@ But this model has better ability to follow instructions than the previous [GPTQ
 
 ## Sample Script
 
+### Colab
+
 [AWQ version Colab sample A100 only](https://github.com/webbigdata-jp/python_sample/blob/main/ELYZA_japanese_Llama_2_7b_instruct_AWQ_sample.ipynb)
 
 
-
+### Local PC
 
 install Library.
 ```
@@ -95,7 +97,7 @@ Output
 (sample output in Japanese)
 ```
 
-
+## Citations
 
 This model is based on the work of the following people:
 
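The sample output above comes from the instruct model, which expects the Llama-2 chat prompt layout documented in ELYZA's original model card. A minimal sketch of that assembly (special tokens and default system prompt as shown in that card; applying them to this quantized variant is an assumption):

```python
# Llama-2 chat-style prompt assembly, per ELYZA's original model card.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"

def build_prompt(user_text: str, bos: str = "<s>") -> str:
    """Wrap a user request in the instruction format the model was tuned on."""
    return f"{bos}{B_INST} {B_SYS}{DEFAULT_SYSTEM_PROMPT}{E_SYS}{user_text} {E_INST}"

prompt = build_prompt("日本の観光名所を3つ挙げてください。")
print(prompt.startswith("<s>[INST] <<SYS>>"))  # True
```

The string this produces is what the tokenizer should receive; generation itself then proceeds with the loaded AWQ model as in the linked Colab sample.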
@@ -129,10 +131,11 @@ This model is based on the work of the following people:
 ```
 
 
-
+## About this work
 - **This quantization work was done by:** [webbigdata](https://webbigdata.jp/)
 
 
-
+## See also
+[AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978)
 [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
 [casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
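The AWQ paper linked in the "See also" hunk protects salient weight channels by rescaling them before 4-bit rounding; the key observation is that the rescaling alone is mathematically a no-op, so only the rounding error changes. A toy NumPy illustration (round-to-nearest stands in for the real kernels, and the scale heuristic is a simplification of the paper's search):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                                # weights (out, in)
X = rng.normal(size=(32, 16)) * np.linspace(0.1, 5.0, 16)   # uneven activation channels

def quantize_rtn(w, bits=4):
    """Naive per-output-row symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale) * scale

# Activation-aware per-input-channel scale (simplified alpha=0.5 heuristic).
s = np.sqrt(np.abs(X).mean(axis=0))

# Before quantization the transform is exact: (X / s) @ (W * s).T == X @ W.T
exact = np.allclose((X / s) @ (W * s).T, X @ W.T)
print(exact)  # True

err_plain = np.abs(X @ W.T - X @ quantize_rtn(W).T).mean()
err_awq = np.abs(X @ W.T - (X / s) @ quantize_rtn(W * s).T).mean()
print(err_plain, err_awq)  # the scaled variant typically shows lower error
```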