What library was used to quantize this model?
First of all, I'm a big fan of your work and the support you provide for the community. Thanks for all the work and effort you put into this.
I'm having trouble finetuning this model with AutoGPTQ via PEFT. So far, PEFT support for AutoGPTQ is limited and doesn't cover Llama-2 finetuning.
Instead, I'm looking to finetune the original Llama-2 model using bitsandbytes and QLoRA, and then GPTQ-quantize the result.
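For anyone unfamiliar with what bitsandbytes does during QLoRA finetuning, here's a minimal pure-Python sketch of blockwise absmax quantization, the core idea behind its 4-bit scheme. This is an illustration only: the real implementation uses the NF4 data type and CUDA kernels, not this code.

```python
# Sketch of blockwise absmax quantization (illustrative, not bitsandbytes itself).
# Each block of weights is scaled by its absolute maximum, then rounded to a
# small signed integer range; a per-block scale is kept for dequantization.

def quantize_block(block, levels=7):
    """Quantize a block of floats to signed ints in [-levels, levels]."""
    absmax = max(abs(x) for x in block) or 1.0
    scale = absmax / levels
    q = [round(x / scale) for x in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

def quantize(weights, block_size=4):
    """Split weights into blocks and quantize each independently."""
    out = []
    for i in range(0, len(weights), block_size):
        q, scale = quantize_block(weights[i:i + block_size])
        out.append((q, scale))
    return out

def dequantize(blocks):
    result = []
    for q, scale in blocks:
        result.extend(dequantize_block(q, scale))
    return result

weights = [0.12, -0.53, 0.91, -0.07, 2.4, -1.1, 0.3, 0.0]
blocks = quantize(weights)
restored = dequantize(blocks)
# Quantization error is bounded by half the per-block scale.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err)
```

During QLoRA finetuning the base weights stay frozen in this quantized form, and only the small LoRA adapter matrices are trained in higher precision.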
Thus, the question in the title arises :)
Thanks for the help.
There are quantization instructions on the GitHub repo here.
TheBloke also shared a more detailed script for doing multiple quants at once here.
I'm in the same boat: I'd prefer to do PEFT with AutoGPTQ directly (see this issue here), because doing bnb finetuning and then having to quantize afterwards is slow.
Edit: I also just found this script from HF. Not sure if it has any shortcomings, but it seems complete and easy to use.
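For the "QLoRA first, GPTQ afterwards" route, recent versions of transformers can run GPTQ quantization directly via `GPTQConfig` (it requires `optimum` and `auto-gptq` installed, plus a GPU for calibration). A hedged sketch of the configuration, assuming the base model has already had the LoRA adapters merged in; the model ID and output path below are placeholders:

```python
# Sketch: GPTQ-quantizing a merged finetuned model with transformers' GPTQConfig.
# Requires: pip install transformers optimum auto-gptq, and a CUDA GPU.
# "path/to/merged-model" is a placeholder for your merged QLoRA checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "path/to/merged-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ with a built-in calibration dataset; you can also pass
# your own list of calibration strings as `dataset`.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Loading with a GPTQConfig triggers quantization during load.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)

model.save_pretrained("path/to/merged-model-gptq")
tokenizer.save_pretrained("path/to/merged-model-gptq")
```

This is a configuration sketch rather than something runnable as-is (it downloads and calibrates a full model), so treat the exact defaults as assumptions and check the HF docs for your transformers version.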