Abstract
Low-rank adaptation (LoRA) is a popular method for reducing the number of trainable parameters when finetuning large language models, but the resulting adapters still pose acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which reduces the number of trainable parameters by 10x compared to LoRA while maintaining the same performance. It achieves this by using a single pair of frozen low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its application in instruction-following with just 1.4M parameters using the Llama2 7B model.
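For concreteness, below is a minimal PyTorch sketch of the adapted layer described in the abstract: a single pair of frozen random low-rank matrices shared across layers, with only two small per-layer scaling vectors trained. The module name, tensor names (`shared_A`, `shared_B`, `d`, `b`), and the initialization values are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Sketch of a VeRA-adapted linear layer (hypothetical implementation).

    A single pair of frozen random low-rank matrices (A, B) is shared across
    all adapted layers; only the per-layer scaling vectors d (length r) and
    b (length out_features) are trained.
    """

    def __init__(self, base_linear: nn.Linear, shared_A: torch.Tensor, shared_B: torch.Tensor):
        super().__init__()
        # Frozen pretrained weight W0.
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False

        r = shared_A.shape[0]
        # Frozen, shared random projections: A has shape (r, in_features),
        # B has shape (out_features, r). Registered as buffers, not parameters.
        self.register_buffer("A", shared_A)
        self.register_buffer("B", shared_B)

        # Trainable scaling vectors: the only new parameters per layer.
        # Assumed initialization: d to a small constant, b to zeros, so the
        # adapted layer starts out equal to the pretrained one.
        self.d = nn.Parameter(0.1 * torch.ones(r))
        self.b = nn.Parameter(torch.zeros(base_linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + diag(b) B diag(d) A x
        delta = (x @ self.A.T) * self.d      # (batch, r)
        delta = (delta @ self.B.T) * self.b  # (batch, out_features)
        return self.base(x) + delta
```

In this sketch, sharing `shared_A` and `shared_B` across every adapted layer (assuming matching dimensions) means the trainable state per layer is only the two vectors, which is the source of the parameter savings over LoRA's per-layer low-rank matrices.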
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- NOLA: Networks as Linear Combination of Low Rank Random Basis (2023)
- Decomposed Prompt Tuning via Low-Rank Reparameterization (2023)
- IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning (2023)
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (2023)
- Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning (2023)