low memory usage

#10
by Knut-J - opened

Is there any way to use less memory? My GPU only has 24 GB, and I get this error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 462.00 MiB. GPU

@Knut-J 24GB isn't gonna cut it for this beast of a model. NVLM-D 72B is huge. But don't give up yet! Try these tricks:

  • CPU offloading: Use device_map="auto" when loading the model. It'll be slow as molasses, but it might just work.
  • 8-bit quantization: Add load_in_8bit=True to your model loading. It'll sacrifice some quality, but hey, beggars can't be choosers. (There's a sketch of both options right after this list.)
  • Last resort: Downgrade to a smaller model. Sometimes you gotta know when to fold 'em.
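
Rough sketch of both options (untested on my end; you'll need accelerate installed for offloading and bitsandbytes for the 8-bit path):

import torch
from transformers import AutoModel

# Option 1: CPU offloading. Accelerate spreads layers across GPU, CPU RAM, and disk.
model = AutoModel.from_pretrained(
    "nvidia/NVLM-D-72B",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=False,
    trust_remote_code=True,
    device_map="auto",
).eval()

# Option 2: 8-bit quantization via bitsandbytes. Halves the weight memory vs. bf16.
model_8bit = AutoModel.from_pretrained(
    "nvidia/NVLM-D-72B",
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    use_flash_attn=False,
    trust_remote_code=True,
    device_map="auto",
).eval()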

Fair warning: These hacks might make your inference slower than a snail on tranquilizers. But if you're dead set on using this model, it's worth a shot. Good luck!

Anybody use an external hard drive to run this?

Hi,

I have tried to run the inference code given here on AWS p3dn.24xlarge and p4de.24xlarge instances, but I am facing a disk space error:

OSError: [Errno 28] No space left on device

[screenshot: "No space left on device" error]

Specs of the instance:

[screenshot: instance specs]

I have tried the tips given here: https://discuss.huggingface.co/t/no-space-left-on-device-when-downloading-a-large-model-for-the-sagemaker-training-job/43643

Any help is appreciated; please let me know if I am missing something.
Thanks in advance!

NVIDIA org

Hi Malini,

Thank you for your interest in our model.

Your first screenshot shows that your home directory does not have enough disk space. Running NVLM-D requires around 200 GB of disk space.
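
If you want to double-check which disk has room before cloning, here is a quick check (the path below is a placeholder for your large disk's mount point):

import shutil

# NVLM-D needs roughly 200 GB free on the target disk.
total, used, free = shutil.disk_usage("/path/to/large-disk")  # placeholder path
print(f"Free space: {free / 1e9:.0f} GB")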

Your second screenshot suggests that you likely have separate disks.

Please try running the following commands on the disk with 1000 GB:

  1. Install Git Large File Storage (LFS) by running:

git lfs install

  2. Clone the NVLM-D repository using:

git clone https://huggingface.co/nvidia/NVLM-D-72B

(say this clones the model into your local path "path/to/NVLM-D-72B")

  3. Load the model from your local path:
path = "path/to/NVLM-D-72B"
device_map = split_model()
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=False,
    trust_remote_code=True,
    device_map=device_map).eval()
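
For a quick text-only test after loading, something like this should work (it follows the usage example on the model card; the generation settings are only an example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
generation_config = dict(max_new_tokens=1024, do_sample=False)

# Text-only conversation: pass None where the image tensor would go.
question = 'Hello, who are you?'
response = model.chat(tokenizer, None, question, generation_config)
print(response)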

Let me know if you encounter any issues, and I’d be happy to assist further!

Best,
Boxin

Update!
Thank you @boxin-wbx! The issue was fixed when I installed Git LFS and cloned the repo.
I did need to build the device_map (via split_model()) to run inference on text.
When I run inference on images, I am getting a memory error. Sharing the screenshot below:

[screenshot: memory error during image inference]

NVIDIA org

We haven't tested on V100s before, but a node with 2 H100 or A100 GPUs (each with 80 GB of memory) should work.

Thanks @boxin-wbx! It worked on an ml.p4de.24xlarge instance. Appreciate your input.
