arxiv:2305.18403

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Published on May 28, 2023

Abstract

Large pre-trained models (LPMs), such as LLaMA and GLM, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaptation (LoRA) has emerged as a cheap way to fine-tune these LPMs on downstream tasks, their deployment is still hindered by their vast scale and computational cost. Neural network pruning offers a way to compress LPMs. However, current pruning methods designed for LPMs are not compatible with LoRA: they either apply unstructured pruning, which prevents the LoRA weights from being merged, or rely on the gradients of the pre-trained weights to guide pruning, which imposes significant memory overhead. To this end, we propose LoRAPrune, a new framework that delivers an accurate, compact model for efficient inference in a highly memory-efficient manner. Specifically, we first design a LoRA-guided pruning criterion, which uses the weights and gradients of LoRA, rather than the gradients of the pre-trained weights, for importance estimation. We then propose a structured iterative pruning procedure to remove redundant channels and heads. Extensive experimental results demonstrate the superior performance of LoRAPrune over existing approaches on the LLaMA series models. For instance, at a 50% compression rate, LoRAPrune outperforms LLM-Pruner with a perplexity reduction of 8.0 on WikiText2 and 16.05 on PTB, while reducing memory usage by 52.6%. The code will be released after review.
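As a rough illustration of the LoRA-guided criterion described in the abstract, the sketch below computes a channel-wise importance score from the LoRA factors and their gradients, approximating the gradient of the merged weight without ever storing a gradient for the frozen pre-trained weight. This is a minimal sketch under assumed shapes and a first-order (Taylor-style) saliency; the function name and the exact aggregation are illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch of a LoRA-guided, channel-wise importance score (not the official code).
import torch

def lora_guided_channel_importance(W, A, B):
    """Estimate per-output-channel importance of the merged weight W + B @ A.

    W : frozen pre-trained weight, shape (out_features, in_features), no gradient stored
    A : LoRA down-projection, shape (r, in_features), with .grad populated
    B : LoRA up-projection,   shape (out_features, r), with .grad populated

    The gradient of the merged weight is approximated from the LoRA factors as
    (dL/dB) @ A + B @ (dL/dA), so only LoRA gradients are ever materialized.
    """
    with torch.no_grad():
        merged = W + B @ A                       # effective weight after merging LoRA
        approx_grad = B.grad @ A + B @ A.grad    # LoRA-only gradient approximation
        saliency = (merged * approx_grad).abs()  # first-order saliency per element
        return saliency.sum(dim=1)               # one score per output channel

# Usage sketch: after a backward pass on a calibration batch,
# keep the output channels with the highest scores and prune the rest.
out_f, in_f, r = 64, 128, 8
W = torch.randn(out_f, in_f)                     # frozen pre-trained weight
A = torch.randn(r, in_f, requires_grad=True)     # LoRA A
B = torch.zeros(out_f, r, requires_grad=True)    # LoRA B (zero-initialized, as in LoRA)
x = torch.randn(4, in_f)
loss = (x @ (W + B @ A).T).pow(2).mean()         # stand-in for the task loss
loss.backward()
scores = lora_guided_channel_importance(W, A, B)
keep = scores.argsort(descending=True)[: out_f // 2]   # e.g. keep top-50% channels
```

In practice such scores would be accumulated over several calibration batches and applied iteratively to whole channels and attention heads, as the structured pruning procedure in the paper describes.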
