---
library_name: peft
---

### README for Gemma-2-2B-IT Fine-Tuning with LoRA

This project fine-tunes the `Gemma-2-2B-IT` model with **LoRA (Low-Rank Adaptation)** for Question Answering tasks, using the `Wikitext-2` dataset. Training fits on limited GPU memory because the base model's parameters are frozen and LoRA adapters are trained only on selected layers.

### Project Overview

- **Model**: `Gemma-2-2B-IT`, a causal language model.
- **Dataset**: `Wikitext-2` for text generation and causal language modeling.
- **Training Strategy**: LoRA adaptation for low-resource fine-tuning.
- **Frameworks**: Hugging Face `transformers`, `peft`, and `datasets`.

### Key Features

- **LoRA Configuration** (see the sketch at the end of this README):
  - LoRA is applied to the following projection layers: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
  - LoRA hyperparameters:
    - Rank (`r`): 4
    - LoRA Alpha: 8
    - Dropout: 0.1
- **Training Configuration**:
  - Mixed precision (`fp16`) enabled for faster and more memory-efficient training.
  - Gradient accumulation over `32` steps to reach a larger effective batch size despite the small per-device batch.
  - Batch size of 1 due to GPU memory constraints.
  - Learning rate: `5e-5` with weight decay: `0.01`.

### System Requirements

- **GPU**: Required for efficient training. This script was tested with CUDA-enabled GPUs.
- **Python Packages**: Install dependencies with:

```bash
pip install -r requirements.txt
```

### Notes

- This fine-tuned model uses LoRA to adapt the large `Gemma-2-2B-IT` model with a minimal number of trainable parameters, allowing fine-tuning even on hardware with limited memory.
- The fine-tuned model can be used for downstream tasks such as Question Answering and is suited to resource-efficient deployment.

### Memory Usage

- The training script prints CUDA memory summaries before and after training to monitor GPU memory consumption (see the snippet at the end of this README).
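### Example: LoRA Fine-Tuning Sketch

A minimal sketch of how the configuration listed under Key Features could be wired together with `transformers`, `peft`, and `datasets`. The Hugging Face model ID, output directory, sequence length, epoch count, and tokenization details are illustrative assumptions; the actual training script may differ.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

# Assumed Hub ID; accessing Gemma weights may require accepting the license on the Hub.
model_name = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA applied to the attention and MLP projection layers listed above,
# with r=4, alpha=8, dropout=0.1; only the adapters are trainable.
lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Wikitext-2 for causal language modeling; drop empty lines and tokenize.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)  # max_length assumed

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Training configuration matching the values listed under Key Features.
training_args = TrainingArguments(
    output_dir="gemma-2-2b-it-lora",   # assumed output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=5e-5,
    weight_decay=0.01,
    fp16=True,
    num_train_epochs=1,                # assumed; adjust as needed
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```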
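The CUDA memory reporting mentioned under Memory Usage can be done with PyTorch's built-in `torch.cuda.memory_summary()`. A small sketch, with the training call left as a placeholder:

```python
import torch

def report_gpu_memory(tag: str) -> None:
    """Print a CUDA allocator summary, labelled so before/after runs can be compared."""
    if torch.cuda.is_available():
        print(f"=== GPU memory {tag} ===")
        print(torch.cuda.memory_summary(abbreviated=True))

report_gpu_memory("before training")
# ... run trainer.train() here (see the fine-tuning sketch above) ...
report_gpu_memory("after training")
```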