casperhansen committed · Commit 7a52ff4 · Parent(s): 5c660fe

Update README

README.md CHANGED
---
{}
---

# MPT-7B-8K-Chat-AWQ

This model is originally released under Apache 2.0, and the AWQ weights are MIT licensed.

Runs at ~50 tokens/s on an RTX 4090 with 4-bit inference.

The original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat).

## How to run

You need to follow the build instructions at [https://github.com/mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq) to use this model.
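For reference, the build roughly amounts to the following. This is a sketch based on the llm-awq README at the time of writing, not part of this card's original instructions; defer to the repo if anything differs:

```sh
# Sketch: build llm-awq and its 4-bit CUDA kernels (verify against the repo's README)
git clone https://github.com/mit-han-lab/llm-awq
cd llm-awq
pip install -e .            # installs the awq Python package
cd awq/kernels
python setup.py install     # builds the W4A16 CUDA kernels used by tinychat
cd ../..
```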
```sh
# einops is required by the MPT modeling code
pip install einops
```
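Optionally, a quick sanity check that the package and the compiled kernels import. The `awq_inference_engine` module name is an assumption based on the llm-awq kernel build; skip this if your install differs:

```sh
# Both imports should succeed if the build worked (module names assumed)
python -c "import awq, awq_inference_engine; print('llm-awq OK')"
```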
```sh
hfuser="casperhansen"
model_name="mpt-7b-8k-chat-awq"
group_size=128
repo_path="$hfuser/$model_name"
model_path="/workspace/llm-awq/$model_name"
quantized_model_path="/workspace/llm-awq/$model_name/$model_name-w4-g$group_size.pt"

# Download the AWQ weights from the Hugging Face Hub
git clone https://huggingface.co/$repo_path

# Run the interactive tinychat demo with the 4-bit (W4A16) quantized weights
python3 tinychat/demo.py --model_type mpt \
    --model_path $model_path \
    --q_group_size $group_size \
    --load_quant $quantized_model_path \
    --precision W4A16
```
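The demo loads the pre-quantized checkpoint cloned above. If you would rather regenerate it from the original model, llm-awq's `awq.entry` module covers that; the commands below are a sketch based on the llm-awq README (flag names assumed, not part of this card), reusing the variables from the previous block:

```sh
# Sketch (assumed awq.entry flags): search AWQ scales, then write real-quantized weights
mkdir -p awq_cache
python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
    --w_bit 4 --q_group_size $group_size \
    --run_awq --dump_awq awq_cache/$model_name-w4-g$group_size.pt

python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
    --w_bit 4 --q_group_size $group_size \
    --load_awq awq_cache/$model_name-w4-g$group_size.pt \
    --q_backend real --dump_quant $quantized_model_path
```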