casperhansen committed
Commit 7a52ff4
1 Parent(s): 5c660fe

Update README

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -1,3 +1,36 @@
  ---
- license: mit
+ {}
  ---
+
+ # MPT-7B-8K-Chat-AWQ
+
+ The original model is released under Apache 2.0; these AWQ weights are licensed under MIT.
+
+ It runs at ~50 tokens/s on an RTX 4090 with 4-bit inference.
+
+ The original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat).
+
+ ## How to run
+
+ To use this model, first follow the build instructions at [https://github.com/mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq).
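+
+ For reference, the build typically looks like this. This is a minimal sketch based on the llm-awq README at the time of writing; the `/workspace` location is an assumption chosen to match the paths in the script below, so check the repo for the current steps.
+
+ ```sh
+ # Assumed layout: llm-awq checked out under /workspace
+ git clone https://github.com/mit-han-lab/llm-awq /workspace/llm-awq
+ cd /workspace/llm-awq
+ pip install -e .
+
+ # Build the W4A16 CUDA kernels used by TinyChat
+ cd awq/kernels
+ python setup.py install
+ ```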
+
+ ```sh
+ # einops is required by the MPT model code
+ pip install einops
+ ```
+
+ ```sh
+ hfuser="casperhansen"
+ model_name="mpt-7b-8k-chat-awq"
+ group_size=128
+ repo_path="$hfuser/$model_name"
+ model_path="/workspace/llm-awq/$model_name"
+ quantized_model_path="/workspace/llm-awq/$model_name/$model_name-w4-g$group_size.pt"
+
+ # Clone the AWQ weights into the llm-awq checkout so the paths above resolve
+ # (requires git-lfs to fetch the .pt checkpoint)
+ cd /workspace/llm-awq
+ git clone "https://huggingface.co/$repo_path"
+
+ # Run the TinyChat demo with 4-bit weights and FP16 activations (W4A16)
+ python3 tinychat/demo.py --model_type mpt \
+     --model_path "$model_path" \
+     --q_group_size $group_size \
+     --load_quant "$quantized_model_path" \
+     --precision W4A16
+ ```
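+
+ For reference, a checkpoint like this can be produced with llm-awq's quantization entry point. The invocation below is a hedged sketch: the flags and cache paths follow the llm-awq README at the time of writing and may have changed, so treat them as assumptions rather than a verified recipe.
+
+ ```sh
+ # Sketch only: first search for AWQ scales, then export real INT4 weights.
+ # awq_cache/ and quant_cache/ are placeholder paths.
+ python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
+     --w_bit 4 --q_group_size 128 \
+     --run_awq --dump_awq awq_cache/mpt-7b-8k-chat-w4-g128.pt
+
+ python -m awq.entry --model_path mosaicml/mpt-7b-8k-chat \
+     --w_bit 4 --q_group_size 128 \
+     --load_awq awq_cache/mpt-7b-8k-chat-w4-g128.pt \
+     --q_backend real --dump_quant quant_cache/mpt-7b-8k-chat-w4-g128.pt
+ ```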