
GPU Memory / RAM requirements

#19
by Rbn3D - opened

How much GPU memory does this model require to run? And in CPU mode, how much RAM? I'm currently trying to run it on a GPU (a GTX 1080 with 8 GB), and I'm getting a "cannot allocate memory" error, so I suppose this requires at least 16 GB or so.

I would assume it takes around 15 GB of VRAM without any optimizations! However, you can run it quite successfully on a CPU with 5-bit quantization, using only ~5.3 GB of RAM!

In theory, you might be able to run it in bfloat16 mode, but I don't know how, sorry.
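For what it's worth, here's a minimal sketch of loading in bfloat16 with transformers (this is my own sketch, not from the model card; note bf16 weights are still 2 bytes each, so this needs roughly the same ~14 GB and won't fit an 8 GB card):

```python
# Minimal sketch: loading MPT-7B-Instruct in bfloat16.
# Assumes a GPU with bf16 support (Ampere or newer) and ~14+ GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # load weights directly in bf16
    trust_remote_code=True,      # MPT uses custom modeling code
).to("cuda")

inputs = tokenizer("What is quantization?", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```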

@Raspbfox I searched far and wide for a quantization example, but couldn't find one... =[

@danieldaugherty, just try searching for the GGML quantized models (usually q5_1) or GPTQ 👀

Ah yeah, I found that. But I didn't really understand how to use it...

GPTQ doesn't support MPT yet =[
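For the GGML route, here's a rough sketch of CPU inference using the third-party ctransformers library; the repo and file names below are illustrative assumptions, not links from this thread, so check the Hub for actual community quantizations:

```python
# Rough sketch: CPU inference on a 5-bit GGML quantization of MPT-7B
# via the third-party ctransformers library (pip install ctransformers).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/MPT-7B-Instruct-GGML",               # assumed community repo
    model_file="mpt-7b-instruct.ggmlv3.q5_1.bin",  # assumed q5_1 file name
    model_type="mpt",                              # tell ctransformers the architecture
)
print(llm("What is quantization?", max_new_tokens=64))
```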

When you run this MPT-7B model in FP16, it consumes about 14 GB of GPU memory (7B parameters × 2 bytes per fp16 weight ≈ 14 GB, before activations and the KV cache), so you would need at least 16 GB of GPU memory to run this model for inference.

Closing as stale.

Also noting that we added device_map support as of this PR: https://huggingface.co/mosaicml/mpt-7b-instruct/discussions/41
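With that in place, a sketch of loading with device_map, which lets accelerate place layers across available devices (GPU first, spilling over to CPU RAM) and can help cards with less than 16 GB; requires `pip install accelerate`:

```python
# Sketch: device_map="auto" splits the model across GPU and CPU memory.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.float16,
    device_map="auto",       # enabled for MPT by the PR linked above
    trust_remote_code=True,
)
```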

abhi-mosaic changed discussion status to closed
