casperhansen committed · Commit 7a52ff4 · Parent(s): 5c660fe

Update README

README.md CHANGED
---
{}
---

# MPT-7B-8K-Chat-AWQ

This model is originally released under Apache 2.0, and the AWQ weights are MIT licensed.

Runs at ~50 tokens/s on an RTX 4090 with 4-bit inference.

The original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat).

## How to run

You need to follow the build instructions at [https://github.com/mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq) to use this model.
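For reference, the build roughly amounts to the following. This is a sketch based on the llm-awq README at the time of writing, not part of this card's original instructions; defer to the repo if anything differs:

```sh
# Sketch: build llm-awq and its 4-bit CUDA kernels (verify against the repo's README)
git clone https://github.com/mit-han-lab/llm-awq
cd llm-awq
pip install -e .            # installs the awq Python package
cd awq/kernels
python setup.py install     # builds the W4A16 CUDA kernels used by tinychat
cd ../..
```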
```sh
# einops is required by the MPT modeling code
pip install einops
```
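Optionally, a quick sanity check that the package and the compiled kernels import. The `awq_inference_engine` module name is an assumption based on the llm-awq kernel build; skip this if your install differs:

```sh
# Both imports should succeed if the build worked (module names assumed)
python -c "import awq, awq_inference_engine; print('llm-awq OK')"
```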
```sh
hfuser="casperhansen"
model_name="mpt-7b-8k-chat-awq"
group_size=128
repo_path="$hfuser/$model_name"
model_path="/workspace/llm-awq/$model_name"
quantized_model_path="/workspace/llm-awq/$model_name/$model_name-w4-g$group_size.pt"

# Download the AWQ weights from the Hugging Face Hub
git clone https://huggingface.co/$repo_path

# Run the interactive tinychat demo with the 4-bit (W4A16) quantized weights
python3 tinychat/demo.py --model_type mpt \
    --model_path $model_path \
    --q_group_size $group_size \
    --load_quant $quantized_model_path \
    --precision W4A16
```
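The demo loads the pre-quantized checkpoint cloned above. If you would rather regenerate it from the original model, llm-awq's `awq.entry` module covers that; the commands below are a sketch based on the llm-awq README (flag names assumed, not part of this card), reusing the variables from the previous block:

```sh
# Sketch (assumed awq.entry flags): search AWQ scales, then write real-quantized weights
mkdir -p awq_cache
python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
    --w_bit 4 --q_group_size $group_size \
    --run_awq --dump_awq awq_cache/$model_name-w4-g$group_size.pt

python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
    --w_bit 4 --q_group_size $group_size \
    --load_awq awq_cache/$model_name-w4-g$group_size.pt \
    --q_backend real --dump_quant $quantized_model_path
```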