How to estimate the GPU memory needed to finetune whisper large model?
#124 by MonoLeon - opened
I recently used AdaLoRA to fine-tune whisper-large-v3. Based on my understanding, the parts that consume the most memory are the model parameters, gradients, and optimizer states.
For full fine-tuning of all layers, the estimated GPU memory needed is around 14.44 GB. However, when I use PEFT, the memory usage is far above this number. I have per_device_train_batch_size=1, gradient_accumulation_steps=2, num_workers=1. What important factors are missing from this estimate?
Half precision (FP16) weights: ~2.89 GB (1550M parameters × 2 bytes)
AdamW optimizer states: ~8.66 GB (3 copies of the parameters)
Gradients: ~2.89 GB
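For reference, here is a minimal Python sketch of the arithmetic behind the numbers above. The 1550M parameter count and the assumption that AdamW adds three extra copies at FP16 size are taken from the estimate in this post, not measured; actual usage will also include activations, CUDA context, and framework overhead, which this sketch does not account for.

```python
# Rough reproduction of the memory estimate above.
# Assumptions: ~1550M parameters (whisper-large-v3), FP16 weights and
# gradients, and AdamW states counted as 3 copies at the same FP16 size.
PARAMS = 1_550_000_000   # approximate parameter count
BYTES_PER_PARAM = 2      # FP16

weights_gb   = PARAMS * BYTES_PER_PARAM / 1024**3
gradients_gb = PARAMS * BYTES_PER_PARAM / 1024**3
adamw_gb     = 3 * PARAMS * BYTES_PER_PARAM / 1024**3  # 3 copies of the parameters

total_gb = weights_gb + gradients_gb + adamw_gb
print(f"weights:   {weights_gb:.2f} GB")   # ~2.89 GB
print(f"gradients: {gradients_gb:.2f} GB") # ~2.89 GB
print(f"AdamW:     {adamw_gb:.2f} GB")     # ~8.66 GB
print(f"total:     {total_gb:.2f} GB")     # ~14.44 GB
```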