LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Abstract
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together to a pre-trained model. In such cases it is common to observe a consistent gap in downstream-task performance between full fine-tuning and the quantization-plus-LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision models and significantly improves generalization on downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed-precision regimes. We will release our code.
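The core of LoftQ is an alternating procedure: quantize the part of the weight that the low-rank adapter cannot capture, then refit the adapter to the remaining quantization error via a truncated SVD. Below is a minimal sketch of that loop, not the released implementation: the uniform fake-quantizer, the rank, and the number of alternating steps are illustrative stand-ins (the paper uses NF4/NF2-style quantizers).

```python
# Minimal sketch of a LoftQ-style initialization loop: alternate between
# quantizing the residual weight and refitting a rank-r factorization so that
# Q + A @ B.T stays close to the full-precision weight W.
# The uniform quantizer is a placeholder for the NF4/NF2 quantizers in the paper;
# rank and number of steps are illustrative choices.
import torch


def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulated uniform quantize-dequantize of a weight matrix (placeholder)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


def loftq_init(w: torch.Tensor, rank: int = 16, bits: int = 4, steps: int = 5):
    """Return a quantized backbone Q and LoRA factors (A, B) initialized so that
    Q + A @ B.T approximates the full-precision weight w."""
    a = torch.zeros(w.size(0), rank)
    b = torch.zeros(w.size(1), rank)
    for _ in range(steps):
        # Quantize only what the current low-rank adapter cannot express.
        q = fake_quantize(w - a @ b.T, bits)
        # Refit the adapter to the remaining quantization error via SVD.
        u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]
        b = vh[:rank, :].T
    return q, a, b


w = torch.randn(768, 768)
q, a, b = loftq_init(w)
print(torch.norm(w - (q + a @ b.T)))  # reconstruction error of the LoftQ-style init
```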
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (2023)
- NOLA: Networks as Linear Combination of Low Rank Random Basis (2023)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (2023)
- LoRA ensembles for large language model fine-tuning (2023)
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (2023)
If I understand this paper correctly, we can do quantized fine-tuning with LoftQ, but the quantized weights we obtain will always be specific to the dataset that was used for the fine-tuning. We cannot train an adapter a1 on a frozen model M and later train an adapter a2 on (M+a1), both frozen in, let's say, 8-bit or 4-bit, right?
LoftQ is not task-specific. You can fine-tune on any dataset with the same quantized model M and initial adapter a0. Your requirement of training an adapter a2 on (M+a1) is definitely feasible.
Moreover, LoftQ supports multi-task learning, which is the original motivation of LoRA. With, again, the same quantized model M and initial adapter a0, you can obtain many adapters a1, a2, ..., an from different datasets and plug each of them into the same quantized model M for deployment (see the sketch below).
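A hedged sketch of this multi-adapter workflow with the PEFT library is shown here; the checkpoint name and adapter paths are placeholders I chose for illustration, not artifacts shipped with the paper.

```python
# Sketch: one shared LoftQ-quantized backbone M, several independently trained
# LoRA adapters swapped in at inference time. The repository name and adapter
# paths are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Shared backbone with the LoftQ initialization (hypothetical checkpoint name).
base = AutoModelForCausalLM.from_pretrained("LoftQ/Llama-2-7b-hf-4bit-64rank")

# Attach adapter a1, then register a2 trained on a different dataset;
# both reuse the same frozen quantized weights.
model = PeftModel.from_pretrained(base, "path/to/adapter_a1", adapter_name="a1")
model.load_adapter("path/to/adapter_a2", adapter_name="a2")

model.set_adapter("a1")  # route inference through adapter a1
model.set_adapter("a2")  # switch adapters without reloading the backbone
```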
Hi Team,
Can LoftQ be used for vision foundation models like OWL-ViT v2 and Grounding DINO?
Reference code for this would be helpful.
Thanks