How to quantize BLOOM to 4-bit
#268 opened by char-1ee
Hi, I noticed that there already exist bloom-int8 and bloom-fp16 models. Does anyone know where I can find a bloom-int4 model, or how I can quantize the model to 4-bit locally?
Hi @char-1ee
If you have enough CPU RAM to load the entire BLOOM model, you can easily quantize it on the fly in 4-bit using bitsandbytes and the latest transformers package.
pip install -U bitsandbytes transformers
Simply pass load_in_4bit=True when calling from_pretrained, and that should do the trick to quantize the model in 4-bit precision.
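For reference, here's a minimal sketch of what that looks like. I'm using bigscience/bloom-560m below purely as a small stand-in checkpoint; point model_id at the full bigscience/bloom if you have the RAM. Note you may also need accelerate installed (pip install accelerate) for device_map="auto" to work.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint for illustration; swap in "bigscience/bloom" for the full model.
model_id = "bigscience/bloom-560m"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_4bit=True tells transformers to quantize the weights to 4-bit
# via bitsandbytes as they are loaded; device_map="auto" places the
# quantized model on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",
)

# Quick sanity check that the quantized model generates text.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```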
Let me know how that goes for you!