Out-of-memory issue when loading Mixtral-8x7B
Hi, I use the recommended way (`from_pretrained(***)`) to load Mixtral-8x7B, but it reports an out-of-memory error.
I am running this command on 8 x A100 GPUs. What is the problem?
Thank you.
Hi @kxgong,
I suggest loading the model in half precision (`torch_dtype=torch.float16`) or in 4-bit precision (`load_in_4bit=True`) in order to load your model in the most memory-efficient manner possible.
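For reference, here is a minimal sketch of both options (assuming `transformers` is installed together with `accelerate` and `bitsandbytes`; the model id below is assumed to be the base Mixtral checkpoint on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"

# Option 1: half precision, sharded across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # spreads layers over all visible GPUs
)

# Option 2: 4-bit quantization for the smallest memory footprint
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```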
Thank you. I am using Mixtral-8x7B for training, so I wonder whether using 4-bit will cause a performance drop.
@kxgong if you use QLoRA you shouldn't expect a performance drop with respect to full fine-tuning. You can read more about QLoRA here: https://huggingface.co/blog/4bit-transformers-bitsandbytes and get started with this blog post, for example: https://pytorch.org/blog/finetune-llms/
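In case it helps, here is a rough sketch of a QLoRA setup with `peft` (the target modules and LoRA hyperparameters below are illustrative assumptions, not tuned values):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-v0.1"

# Load the frozen base model in 4-bit (NF4), as in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach trainable low-rank adapters; only these are updated during training
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```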
Thanks for your help.