How to use the quantized version?

by jawad1347 - opened

Kindly write code to load it in Colab with 4-bit quantization. Thanks

Salesforce org

Sure, you can load it with 4-bit quantization via bitsandbytes:

    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    # Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModel.from_pretrained(
        'Salesforce/SFR-Embedding-2_R',
        device_map='auto',
        trust_remote_code=True,
        quantization_config=quantization_config,
        # pass any extra from_pretrained kwargs here if needed
    )

Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?

Salesforce org

Hi @prudant,

Of course, the model can also be quantized with other methods such as GPTQ or AWQ.
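As an example, here is a rough sketch of the GPTQ path in transformers (it requires `optimum` and `auto-gptq` installed, and assumes this Mistral-based architecture is supported by those libraries; the `bits=4` and `dataset="c4"` choices are just illustrative). For embeddings you would still pool hidden states as shown above rather than use the LM head.

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = 'Salesforce/SFR-Embedding-2_R'
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # GPTQ needs calibration data; "c4" is a built-in dataset option
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
    # quantization runs once at load time
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map='auto',
        quantization_config=gptq_config,
    )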
