Is there a way to finetune the model?
Impressive work at impressive speed, thank you for this amazing work.
I just wanted to know: is there a way to fine-tune this quantized model on a custom dataset?
Fine-tuning on an already-quantised GPTQ model is possible, but it's a bit complex right now. There is a project called Alpaca Lora 4bit which supports this, but I've never tried it.
AutoGPTQ, which is the latest and best GPTQ repo, has a PR to add PEFT support, which will allow this. It's not released yet and I haven't tried it myself, but you could investigate it: https://github.com/PanQiWei/AutoGPTQ/pull/102
Or you could look into QLoRA, which is the method that was used to train this model in the first place. You would download guanaco-65B-HF and then fine-tune it in 4-bit using QLoRA: exactly how Guanaco was made originally, just using Guanaco itself as the base for further tuning. There's a rough sketch of that workflow below.
Note that you'll need a 48+ GB GPU to train a 65B model in 4-bit, such as an A6000, L40 or A100.
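Here's a minimal sketch of that QLoRA workflow, assuming recent versions of transformers, peft, bitsandbytes and datasets. The dataset file, LoRA hyperparameters and training arguments are placeholders for illustration, not the settings the Guanaco authors actually used.

```python
# Minimal QLoRA fine-tuning sketch (not the original Guanaco training script).
# Assumes recent transformers + peft + bitsandbytes; dataset and hyperparameters
# below are placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "TheBloke/guanaco-65B-HF"  # the unquantised HF-format weights

# Load the base model in 4-bit NF4, as QLoRA does
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters; only these small weights get updated
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder custom dataset: a JSON file with a "text" field per example
dataset = load_dataset("json", data_files="my_dataset.json", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=tokenized,
    args=TrainingArguments(
        output_dir="guanaco-65b-custom-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Saves only the LoRA adapter, not a full 65B checkpoint
model.save_pretrained("guanaco-65b-custom-lora")
```

The result is a small LoRA adapter that you load on top of guanaco-65B-HF at inference time with peft, rather than a new full set of 65B weights.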
Someone has released a LoRA for this model fine-tuned on medical-domain jargon: https://huggingface.co/nmitchko/medguanaco-lora-65b-GPTQ so it's definitely doable.
More info: https://old.reddit.com/r/LocalLLaMA/comments/13zlcva/medguanacolora65bgptq/