LoRA-only GGUF
Hi @mlabonne and thank you for your work.
In case you didn't know, I recently refactored llama.cpp to improve its support for LoRA adapters. A script was also added to convert a PEFT adapter to GGUF.
It would be nice if the abliterated LoRA could also have a GGUF version. The benefit would be a much smaller distributed model size: for example, a rank=32 adapter for llama-3 (in f16) weighs only 176MB (see here), and the q8_0 version is only about half of that.
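For reference, the conversion is a single command with the new script in llama.cpp. A rough sketch, assuming a local llama.cpp checkout and placeholder paths for the base model and the PEFT adapter directory:

```bash
# Sketch only: run from a llama.cpp checkout; both paths below are placeholders.
# The positional argument is the PEFT adapter directory, --base points to the
# original HF base model, and --outtype q8_0 quantizes the adapter weights.
python convert_lora_to_gguf.py ./abliterated-lora-adapter \
    --base ./Meta-Llama-3-70B-Instruct \
    --outtype q8_0
```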
I'd be happy to help if you need. Thank you.
Hey @ngxson, I've never quantized a LoRA adapter with GGUF/llama.cpp, but here is the repo with the LoRA adapter only: https://huggingface.co/mlabonne/Llama-3-70B-Instruct-abliterated-LORA
I'm a bit confused: the link above is llama-3, not llama-3.1, right?
In any case, the format looks good (although I don't have good bandwidth to download the base model right now; I'll try this later).
In the near future, I'll make something like gguf-my-repo but for converting LoRA adapters; hopefully that will simplify the conversion.
Yes, it is. I provide more details in the model card.
That'd be cool!
Hey @mlabonne, the tool I mentioned for converting a PEFT model to GGUF is here: https://huggingface.co/blog/ngxson/gguf-my-lora
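Once the adapter is converted, it can be loaded on top of the base GGUF model at runtime. A minimal sketch (both file names are placeholders for your own converted GGUFs):

```bash
# Apply the quantized LoRA adapter to the base model at inference time.
# File names below are placeholders.
./llama-cli -m Meta-Llama-3-70B-Instruct-Q4_K_M.gguf \
    --lora Llama-3-70B-Instruct-abliterated-q8_0.gguf \
    -p "Hello"
```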
Could you give it a try? Thank you!