jeremy-costello committed on
Commit f8f67af
1 Parent(s): 70cf23f

Update README.md

  1. README.md +14 -0
README.md CHANGED
@@ -1,3 +1,17 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
+ 4-bit quantization of the vicuna-13b-v1.1 model.
5
+
6
+ The delta weights were applied to the original LLaMA weights using FastChat.
7
+ Quantization and inference were done with GPTQ-for-LLaMa (commit 58c8ab4).
8
+
9
+ Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128.
10
+ Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE, device=0 (if using GPU).
11
+ You may need to adjust min_length and max_length to get better inference outputs.
12
+
13
+ The separator has been changed to \</s\>. A simple prompt is "Human: $REQUEST\</s\>Assistant:".
14
+
15
+ Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
16
+ FastChat: https://github.com/lm-sys/FastChat
17
+ GPTQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa
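
The prompt format and inference arguments described in the README above can be sketched in Python. This is a minimal illustration and not part of the commit. `build_prompt` and `inference_args` are hypothetical helper names. The `--wbits`, `--groupsize`, `--load` and `--device` flags mirror the arguments listed in the README. The `--text` flag is an assumption about how the inference script takes its prompt, so check it against the pinned GPTQ-for-LLaMa commit before relying on it.

```python
from typing import List, Optional

SEPARATOR = "</s>"  # separator stated in the README


def build_prompt(request: str) -> str:
    """Build the single-turn prompt 'Human: $REQUEST</s>Assistant:'."""
    return f"Human: {request}{SEPARATOR}Assistant:"


def inference_args(
    model_directory: str,
    checkpoint_file: str,
    request: str,
    device: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list that mirrors the README's inference args.

    Hypothetical helper: the '--text' flag is an assumption and is not
    named in the README.
    """
    args = [
        model_directory,
        "--wbits", "4",
        "--groupsize", "128",
        "--load", checkpoint_file,
        "--text", build_prompt(request),
    ]
    # The README passes device=0 only when running on a GPU.
    if device is not None:
        args += ["--device", str(device)]
    return args


if __name__ == "__main__":
    print(build_prompt("What is quantization?"))
    # prints: Human: What is quantization?</s>Assistant:
```

The resulting list could be appended to whatever command launches the inference script. Note that there is no space between the request, the separator, and `Assistant:`, which follows the README's prompt string exactly.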