Fine-tune Whisper with CPU
Hi all!
After reading the great tutorial by @sanchit-gandhi, https://huggingface.co/blog/fine-tune-whisper, I am training my own Whisper models.
But I have a problem. When I run the training, it outputs:
***** Running training *****
Num examples = 9079
Num Epochs = 8
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 4000
Number of trainable parameters = 1543304960
0%| | 0/4000 [00:00<?, ?it/s]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.68 GiB (GPU 0; 14.61 GiB total capacity; 10.72 GiB already allocated; 13.12 MiB free; 13.62 GiB reserved in total by PyTorch)
I have an NVIDIA T4.
Is there any way to train a fine-tuned model without a GPU, using only the CPU?
Thanks all for the help!
Hey @Santi69!
I would strongly recommend using your GPU! Training will be extremely slow on just the CPU.
You have two options here:
1. Reduce the `per_device_train_batch_size` and increase the `gradient_accumulation_steps`
2. Try using DeepSpeed!
For 1, try setting `per_device_train_batch_size=8` and `gradient_accumulation_steps=2`. If that still gives an OOM, try `per_device_train_batch_size=4` and `gradient_accumulation_steps=4`. If that still gives an OOM, try `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`. You get the idea! You can use gradient accumulation steps to compensate for a lower per-device batch size, since your effective batch size is `per_device_train_batch_size * gradient_accumulation_steps`, so in all of these cases we have an effective batch size of 16. The trade-off is that more gradient accumulation means slower training, so we should only use as much gradient accumulation as we need, and not more.
For 2, you can follow the guide here: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#deepspeed
You'll be able to train with a larger per-device batch size this way.
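Roughly, that means installing `deepspeed`, grabbing a ZeRO config file from the guide (here assumed to be saved as `ds_config.json`), and pointing the training arguments at it, for example:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: "ds_config.json" is assumed to be the DeepSpeed ZeRO config
# from the fine-tuning event guide (requires `pip install deepspeed`).
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",  # placeholder path
    per_device_train_batch_size=16,    # ZeRO offloading frees GPU memory,
    deepspeed="ds_config.json",        # so a larger per-device batch size should fit
    fp16=True,
)
```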
Thank you very much @sanchit-gandhi!
I am going to test both options.
Congratulations on your great work!