
How to quantize BLOOM to 4-bit

#268
by char-1ee - opened

Hi, I noticed that there are already bloom-int8 and bloom-fp16 models. Does anyone know where I can find a bloom-int4 model, or how I can quantize the model to 4-bit locally?

BigScience Workshop org

Hi @char-1ee

If you have enough CPU RAM to load the entire BLOOM model, you can easily quantize it on the fly to 4-bit using bitsandbytes and the latest transformers package.

pip install -U bitsandbytes transformers

Simply pass load_in_4bit=True when calling from_pretrained, and that should do the trick to quantize the model to 4-bit precision.
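
For reference, a minimal sketch of that loading path, assuming the bigscience/bloom checkpoint on the Hub, that accelerate is also installed (needed for device_map="auto"), and enough CPU RAM to hold the full-precision shards while they are quantized:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_4bit=True quantizes the weights to 4-bit via bitsandbytes
# as the checkpoint shards are loaded, so the full-precision model
# never has to fit on the GPU at once.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # dispatch layers across available GPUs/CPU
    load_in_4bit=True,
)

# quick sanity check that the quantized model generates text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note that in newer transformers releases the same options can also be expressed through a BitsAndBytesConfig passed as quantization_config, but load_in_4bit=True is the shortcut described above.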

Let me know how that goes for you!
