---
inference: false
---
4-bit quantization of the vicuna-13b-v1.1 model.
The delta was added to the original LLaMA weights using FastChat. \
Quantization and inference were done with GPTQ-for-LLaMa (commit 58c8ab4).
Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128. \
Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE. \
Add the arg device=0 if using a GPU for inference. You may have to change min_length and max_length for better inference outputs.
The separator has been changed to \</s\>. A simple prompt is "Human: $REQUEST\</s\>Assistant:".
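
As a minimal sketch of the simple prompt format described above, the snippet below builds a single-turn prompt string. The names `SEPARATOR` and `build_prompt` are hypothetical helpers for illustration and are not part of FastChat or GPTQ-for-LLaMa.

```python
# Separator used by vicuna-13b-v1.1, as stated in this model card.
SEPARATOR = "</s>"


def build_prompt(request: str) -> str:
    """Build a single-turn prompt: "Human: $REQUEST</s>Assistant:".

    Hypothetical helper for illustration only.
    """
    return f"Human: {request}{SEPARATOR}Assistant:"


if __name__ == "__main__":
    # The resulting string is what would be passed as the input text at inference time.
    print(build_prompt("What is 4-bit quantization?"))
```

The model's reply is expected to follow the trailing "Assistant:", so the prompt is left open at that point.
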
Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1 \
FastChat: https://github.com/lm-sys/FastChat \
GPTQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa