jeremy-costello committed on
Commit
6826888
1 Parent(s): 7728de5

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -7,12 +7,12 @@ inference: false
  The delta was added to the original LLaMa weights using FastChat.
  Quantization and inference with GPTQ-For-LLaMa (commit 58c8ab4).
 
- Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128.
- Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE, device=0 (if using GPU)
- You may have to change min_length and max_length for better inference outputs.
+ Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128. \
+ Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE \
+ Add arg device=0 if using GPU for inference. You may have to change min_length and max_length for better inference outputs.
 
  The separator has been changed to \</s\>. Simple prompt is "Human: $REQUEST\</s\>Assistant:".
 
- Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
- FastChat: https://github.com/lm-sys/FastChat
+ Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1 \
+ FastChat: https://github.com/lm-sys/FastChat \
  GTPQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa
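
The prompt format and inference arguments described in the README can be sketched in Python as follows. This is a minimal illustration, not the author's script. The helper names `build_prompt` and `build_inference_args` are hypothetical. The `--flag` spellings, including `--text` for passing the prompt, are assumptions about the GPTQ-for-LLaMa inference script's command line and may not match commit 58c8ab4, so check them against that commit before use.

```python
from typing import List, Optional

SEPARATOR = "</s>"  # separator stated in the README


def build_prompt(request: str) -> str:
    """Format a single-turn prompt as: Human: <request></s>Assistant:"""
    return f"Human: {request}{SEPARATOR}Assistant:"


def build_inference_args(
    model_directory: str,
    checkpoint_file: str,
    prompt: str,
    use_gpu: bool = False,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list from the README's inference settings.

    Flag spellings are assumptions; the README lists only the names and values.
    """
    args = [
        model_directory,
        "--wbits", "4",
        "--groupsize", "128",
        "--load", checkpoint_file,
        "--text", prompt,  # assumed flag for the input text
    ]
    if use_gpu:
        args += ["--device", "0"]  # README: add device=0 if using GPU
    if min_length is not None:
        args += ["--min_length", str(min_length)]
    if max_length is not None:
        args += ["--max_length", str(max_length)]
    return args


if __name__ == "__main__":
    prompt = build_prompt("What is quantization?")
    print(prompt)
    print(build_inference_args("path/to/model", "checkpoint.pt", prompt, use_gpu=True))
```

Passing the arguments as a list (for example to `subprocess.run`) rather than as a single shell string keeps the `</s>` separator and any quotes in the request from being interpreted by the shell.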