---
library_name: peft
---
# README for Gemma-2-2B-IT Fine-Tuning with LoRA
This project fine-tunes the `Gemma-2-2B-IT` model using LoRA (Low-Rank Adaptation) for Question Answering tasks, leveraging the `Wikitext-2` dataset. The fine-tuning process is optimized for efficient training on limited GPU memory by freezing most model parameters and applying LoRA to specific layers.
## Project Overview
- **Model:** `Gemma-2-2B-IT`, a causal language model.
- **Dataset:** `Wikitext-2` for text generation and causal language modeling.
- **Training Strategy:** LoRA adaptation for low-resource fine-tuning.
- **Frameworks:** Hugging Face `transformers`, `peft`, and `datasets` (see the loading sketch after this list).
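As a point of reference, the pieces above can be loaded roughly as follows. This is a minimal sketch: the hub identifier `google/gemma-2-2b-it` and the `wikitext-2-raw-v1` dataset configuration name are assumptions, and the original training script may use different variants.

```python
# Minimal sketch: load the base model, tokenizer, and dataset.
# The hub id and dataset configuration name are assumptions, not taken from the script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"  # assumed hub id for Gemma-2-2B-IT

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wikitext-2 ships train/validation/test splits of plain text suitable for
# causal language modeling.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)
```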
## Key Features
- **LoRA Configuration** (see the configuration sketch after this list):
  - LoRA is applied to the following projection layers: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
  - LoRA hyperparameters:
    - Rank (`r`): 4
    - LoRA Alpha: 8
    - Dropout: 0.1
- **Training Configuration:**
  - Mixed precision (`fp16`) enabled for faster and more memory-efficient training.
  - Gradient accumulation with 32 steps to manage large model sizes on small GPUs.
  - Batch size of 1 due to GPU memory constraints.
  - Learning rate: `5e-5` with weight decay: `0.01`.
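The hyperparameters listed above map onto `peft` and `transformers` configuration objects roughly as sketched below, continuing from the loading sketch in the previous section. Only values stated in this README are filled in; the `output_dir` and `task_type` are assumptions.

```python
# Sketch of the LoRA and training configuration described above.
# output_dir and task_type are assumptions, not taken from the script.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=4,                      # LoRA rank
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",    # assumed, since the model is a causal LM
)

# Wraps the frozen base model and adds the small trainable LoRA adapters.
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="gemma2-2b-it-lora",   # assumed output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=5e-5,
    weight_decay=0.01,
    fp16=True,
)
```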
## System Requirements
- **GPU:** Required for efficient training. This script was tested with CUDA-enabled GPUs.
- **Python Packages:** Install dependencies with:

  ```bash
  pip install -r requirements.txt
  ```
## Notes
- This fine-tuned model leverages LoRA to adapt the large `Gemma-2-2B-IT` model with minimal trainable parameters, allowing fine-tuning even on hardware with limited memory.
- The fine-tuned model can be further utilized for tasks like Question Answering, and it is optimized for resource-efficient deployment (see the inference sketch below).
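For downstream use, the saved adapter can be loaded on top of the base model along these lines. The adapter path `gemma2-2b-it-lora` is an assumption carried over from the configuration sketch; substitute the directory where the adapter was actually saved.

```python
# Sketch: load the LoRA adapter on top of the frozen base model for inference.
# The adapter path is an assumption.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base, "gemma2-2b-it-lora")

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```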
## Memory Usage
- The training script includes CUDA memory summaries before and after the training process to monitor GPU memory consumption.
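The pattern is roughly the following, assuming `trainer` is the `transformers` `Trainer` built from the configuration sketched earlier; only the memory-summary calls around the training step are specific to this README.

```python
# Sketch of the memory reporting described above: print a CUDA memory summary
# before and after training. `trainer` is assumed to be a transformers Trainer.
import torch

def train_with_memory_report(trainer):
    """Run training and print GPU memory summaries before and after."""
    if torch.cuda.is_available():
        print(torch.cuda.memory_summary(abbreviated=True))  # before training
    trainer.train()
    if torch.cuda.is_available():
        print(torch.cuda.memory_summary(abbreviated=True))  # after training
```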