Not enough memory when trying to load the model
Got this error when trying to load the model:

Traceback (most recent call last):
  File "E:\software\text-gen-webui\text-generation-webui\server.py", line 69, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "E:\software\text-gen-webui\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func(model_name)
  File "E:\software\text-gen-webui\text-generation-webui\modules\models.py", line 296, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "E:\software\text-gen-webui\text-generation-webui\modules\AutoGPTQ_loader.py", line 53, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "E:\software\text-gen-webui\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
    return quant_func(
  File "E:\software\text-gen-webui\installer_files\env\lib\site-packages\auto_gptq\modeling\_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "E:\software\text-gen-webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 998, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "E:\software\text-gen-webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 859, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "E:\software\text-gen-webui\installer_files\env\lib\site-packages\safetensors\torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 524288000 bytes.

Do I need to run this in the cloud? I've got 24GB of VRAM on an NVIDIA card on my local machine.
Give a 33B model a try instead; a 65B model is way too big for 24GB of VRAM.
Yeah, for a 65B model you need 2 x 24GB GPUs or 1 x 48GB GPU.
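The back-of-the-envelope math supports this. Here's a rough sketch (my own estimate, not from the repo) that counts only the 4-bit quantized weights and ignores the KV cache, activations, and loader overhead that push real usage higher:

# Hypothetical helper: rough GPTQ weight-memory estimate.
# Assumes 4-bit weights only; KV cache and activations add several GB on top.
def gptq_weight_gb(params_billions: float, bits: int = 4) -> float:
    bytes_total = params_billions * 1e9 * bits / 8
    return bytes_total / (1024 ** 3)

for size in (33, 65):
    print(f"{size}B @ 4-bit: ~{gptq_weight_gb(size):.1f} GB of weights")

# Output:
# 33B @ 4-bit: ~15.4 GB of weights  -> fits in 24GB with headroom for context
# 65B @ 4-bit: ~30.3 GB of weights  -> over 24GB before any overhead

So even before runtime overhead, a 4-bit 65B model's weights alone exceed a single 24GB card, which is why it has to be split across two GPUs or run on a 48GB one.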