OutOfMemoryError: CUDA out of memory
I am running a model on my PC with a P40 GPU (24 GB VRAM) but I am getting torch.OutOfMemoryError: CUDA out of memory. As far as I know, loading an 8B model in bf16 should only need about 16 GB of VRAM (8B parameters × 2 bytes per weight).
My func:
def __init__(self):
    model_path = '/app/core/model/llama3.1-8b-instruct'
    self.device = torch.device(
        "cuda" if torch.cuda.is_available() else "cpu")
    if not os.path.exists(model_path):
        # Download the tokenizer and model from the Hub and save them locally
        tokenizer = transformers.AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')
        model = transformers.AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')
        model.save_pretrained(model_path)
        tokenizer.save_pretrained(model_path)
    # Load the model and tokenizer from the saved directory
    self.tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
    self.model = transformers.AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(self.device)
    # Initialize the pipeline
    self.pipeline = transformers.pipeline(
        "text-generation",
        model=self.model,
        tokenizer=self.tokenizer,
        device=0 if torch.cuda.is_available() else -1,
    )
My error:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 24.00 GiB of which 163.86 MiB is free. Process 280286 has 15.58 GiB memory in use. Process 280570 has 5.12 GiB memory in use. Of the allocated memory 4.93 GiB is allocated by PyTorch, and 41.85 MiB is reserved by PyTorch but unallocated.
It looks like your code could be loading the model twice — the traceback also shows two processes (280286 with 15.58 GiB and 280570 with 5.12 GiB) each holding memory on GPU 0 at the same time.
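For reference, here is a minimal sketch of how the loading could be restructured so that at most one copy of the weights is held at a time (the ChatModel wrapper class is assumed, since only __init__ was posted): the first run downloads and saves to disk on the CPU, drops that copy, and then a single bf16 copy is loaded onto the GPU. This will not help if two separate processes are each loading the model, which the two PIDs in the error also suggest.

import os
import torch
import transformers


class ChatModel:  # hypothetical wrapper; only __init__ was shown in the question
    def __init__(self):
        model_path = '/app/core/model/llama3.1-8b-instruct'
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        if not os.path.exists(model_path):
            # First run: download and save to disk, keeping everything on the CPU.
            tokenizer = transformers.AutoTokenizer.from_pretrained(
                'meta-llama/Meta-Llama-3.1-8B-Instruct')
            model = transformers.AutoModelForCausalLM.from_pretrained(
                'meta-llama/Meta-Llama-3.1-8B-Instruct', torch_dtype=torch.bfloat16)
            tokenizer.save_pretrained(model_path)
            model.save_pretrained(model_path)
            del model  # drop the CPU copy before the real load below

        # Single load of the weights in bf16 (~16 GB for an 8B model) onto the GPU.
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
        self.model = transformers.AutoModelForCausalLM.from_pretrained(
            model_path, torch_dtype=torch.bfloat16).to(self.device)

        # The model is already on the GPU, so there is no need to also pass a
        # device argument to the pipeline.
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
        )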
I'm having a similar issue on Ubuntu, with a GeForce GT 1030 card. It has only 2 GB of GDDR, and the model is only trying to allocate 112 MB, yet NVTOP and nvidia-smi show that nothing else is using the memory or the GPU. I know 2 GB is too low, but I ran this successfully on a Windows 10 PC with the exact same card. And, in case you are doubting that, I know it was using the GPU on Windows because not only did the model respond 4x faster than on the CPU, Task Manager and nvidia-smi both showed it using the memory and the GPU. But that's not happening on Ubuntu.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.28.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GT 1030         Off |   00000000:06:00.0 Off |                  N/A |
| 27%   33C    P0             N/A /  30W  |       0MiB /  2048MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU
I can run the exact same script on the CPU. It's extremely slow, but it runs.
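One way to narrow this down is to ask PyTorch directly what it sees on the card before loading anything. Here is a small diagnostic sketch (it assumes a reasonably recent CUDA-enabled PyTorch build, since torch.cuda.mem_get_info is not available in very old releases):

import torch

# Diagnostic: what does PyTorch think about the GPU before loading anything?
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    free, total = torch.cuda.mem_get_info(0)  # bytes reported by the CUDA driver
    print(f"Free: {free / 1024**2:.0f} MiB / Total: {total / 1024**2:.0f} MiB")
    print("Allocated by this process:",
          torch.cuda.memory_allocated(0) / 1024**2, "MiB")

If this reports roughly the full 2 GiB free, the card really is empty and the OOM comes from the model's own allocations exceeding what 2 GB can hold; if it reports much less, something not listed by nvidia-smi is holding the memory.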