OOM when finetuning with LoRA.

#1
by zokica - opened

Not sure if this applies to just this model or also to the official version of Gemma 2, but when doing PEFT finetuning I always get an OOM error at the point when the model is saved.

There is definitely enough memory: it's a 48GB card, training uses less than 20GB, and then there is a huge spike when saving the model. This does not happen even with the 9B model.

#############
OutOfMemoryError: CUDA out of memory. Tried to allocate 15.26 GiB. GPU 0 has a total capacty of 47.40 GiB of which 11.35 GiB is free. Process 2397559 has 35.67 GiB memory in use. Of the allocated memory 34.28 GiB is allocated by PyTorch, and 903.30 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
###################################
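
The traceback itself points at the allocator knob. Below is a minimal sketch of trying that hint, assuming it is set before the first CUDA allocation; the 128 MiB value is only an example, not a recommendation:

```python
# Sketch: configure the CUDA caching allocator before torch touches the GPU.
# max_split_size_mb is the setting the error message mentions for fragmentation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # import (and any CUDA allocation) must come after setting the env var
```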

Yes, the original model does the same.

It works with these changed params:

optim="paged_adamw_8bit",
evaluation_strategy="no",
do_eval=False,
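
For context, a minimal sketch of where those arguments sit, assuming a standard transformers TrainingArguments setup; everything other than the three values above is a placeholder:

```python
# Sketch only: the three arguments that avoided the OOM in this thread,
# dropped into an otherwise placeholder TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",              # placeholder
    per_device_train_batch_size=2,     # placeholder
    gradient_accumulation_steps=4,     # placeholder
    optim="paged_adamw_8bit",          # 8-bit paged optimizer states (bitsandbytes)
    evaluation_strategy="no",          # skip the eval loop during training
    do_eval=False,
)
```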

Unsloth AI org

It works with these changed params:

optim="paged_adamw_8bit",
evaluation_strategy="no",
do_eval=False,

Oh, so that was the problem, I'm guessing? Glad you got it solved.

Same problem here, though the old Gemma 2B wouldn't go OOM for LoRA. Questioning why normal AdamW should have these kinds of issues.
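
For a rough sense of scale (my own back-of-the-envelope, not from Unsloth): plain AdamW keeps two fp32 moment tensors per trainable parameter, while paged_adamw_8bit stores 8-bit states that bitsandbytes can also page out of GPU memory:

```python
# Back-of-the-envelope optimizer-state sizes; the LoRA parameter count is hypothetical.
def adamw_state_gib(trainable_params: int, bytes_per_state: int) -> float:
    # AdamW keeps two states per parameter (exp_avg and exp_avg_sq)
    return 2 * trainable_params * bytes_per_state / 1024**3

lora_params = 100_000_000  # hypothetical adapter size
print(f"fp32 AdamW states : {adamw_state_gib(lora_params, 4):.2f} GiB")
print(f"8-bit AdamW states: {adamw_state_gib(lora_params, 1):.2f} GiB")
# With LoRA the adapters are small, so state size alone rarely explains a
# multi-GiB spike; paging the states off-GPU still lowers the peak.
```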

Unsloth AI org

It's mostly because Flash Attention is not installed for Gemma models. Please update Unsloth and it will tell you to install Flash Attention.
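
For reference, a quick check that flash-attn is importable before starting training; the install command is the one from the flash-attn README, used here as an assumption about the environment:

```python
# Sketch: verify flash-attn is present (not Unsloth's own check).
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    print("flash-attn not found; it is typically installed with:")
    print("  pip install flash-attn --no-build-isolation")
else:
    import flash_attn
    print(f"flash-attn {flash_attn.__version__} is installed")
```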
