What library was used to quantize this model?
First of all, I'm a big fan of your work and the support you provide for the community. Thanks for all the work and effort you put into this.
I'm having trouble finetuning this model with AutoGPTQ via PEFT. So far, PEFT support for AutoGPTQ is limited and doesn't cover Llama-2 finetuning.
Instead, I'm looking to finetune the original Llama-2 model using bitsandbytes and QLoRA, and then GPTQ-quantize the result.
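For anyone unfamiliar with what bitsandbytes does during QLoRA finetuning, here's a minimal pure-Python sketch of blockwise absmax quantization, the core idea behind its 4-bit scheme. This is an illustration only: the real implementation uses the NF4 data type and CUDA kernels, not this code.

```python
# Sketch of blockwise absmax quantization (illustrative, not bitsandbytes itself).
# Each block of weights is scaled by its absolute maximum, then rounded to a
# small signed integer range; a per-block scale is kept for dequantization.

def quantize_block(block, levels=7):
    """Quantize a block of floats to signed ints in [-levels, levels]."""
    absmax = max(abs(x) for x in block) or 1.0
    scale = absmax / levels
    q = [round(x / scale) for x in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

def quantize(weights, block_size=4):
    """Split weights into blocks and quantize each independently."""
    out = []
    for i in range(0, len(weights), block_size):
        q, scale = quantize_block(weights[i:i + block_size])
        out.append((q, scale))
    return out

def dequantize(blocks):
    result = []
    for q, scale in blocks:
        result.extend(dequantize_block(q, scale))
    return result

weights = [0.12, -0.53, 0.91, -0.07, 2.4, -1.1, 0.3, 0.0]
blocks = quantize(weights)
restored = dequantize(blocks)
# Quantization error is bounded by half the per-block scale.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err)
```

During QLoRA finetuning the base weights stay frozen in this quantized form, and only the small LoRA adapter matrices are trained in higher precision.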
Thus, the question in the title arises :)
Thanks for the help.
There are quantization instructions on the GitHub repo here.
TheBloke also shared a more detailed script for doing multiple quants at once here.
I'm in the same boat: I'd prefer to do PEFT with AutoGPTQ directly (see this issue here), because doing bnb finetuning and then having to quantize afterwards is slow.
Edit: I also just found this script from HF. Not sure if it has any shortcomings, but it seems complete and easy to use.
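For the "QLoRA first, GPTQ afterwards" route, recent versions of transformers can run GPTQ quantization directly via `GPTQConfig` (it requires `optimum` and `auto-gptq` installed, plus a GPU for calibration). A hedged sketch of the configuration, assuming the base model has already had the LoRA adapters merged in; the model ID and output path below are placeholders:

```python
# Sketch: GPTQ-quantizing a merged finetuned model with transformers' GPTQConfig.
# Requires: pip install transformers optimum auto-gptq, and a CUDA GPU.
# "path/to/merged-model" is a placeholder for your merged QLoRA checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "path/to/merged-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ with a built-in calibration dataset; you can also pass
# your own list of calibration strings as `dataset`.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Loading with a GPTQConfig triggers quantization during load.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)

model.save_pretrained("path/to/merged-model-gptq")
tokenizer.save_pretrained("path/to/merged-model-gptq")
```

This is a configuration sketch rather than something runnable as-is (it downloads and calibrates a full model), so treat the exact defaults as assumptions and check the HF docs for your transformers version.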