How to use quantized version? #2
by jawad1347
Kindly write code to use it in Colab, loading it with 4-bit quantization. Thanks
Sure, you can load it in 4-bit using BitsAndBytes:
```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
# NF4 4-bit quantization with double quantization and bfloat16 compute.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    'Salesforce/SFR-Embedding-2_R',
    device_map='auto',
    trust_remote_code=True,
    quantization_config=quantization_config,
)
```
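Once the model is loaded, you can embed text as usual. Below is a minimal usage sketch: the `last_token_pool` helper mirrors the pooling code shown on the SFR-Embedding-2_R model card, while the example texts and `max_length=4096` are placeholders of my own, not anything from this thread.

```python
import torch.nn.functional as F
from transformers import AutoTokenizer

# Last-token pooling, following the helper shown on the model card.
def last_token_pool(last_hidden_states, attention_mask):
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')

texts = ["how do I load a model in 4-bit?", "an example passage to embed"]  # placeholder inputs
batch = tokenizer(texts, max_length=4096, padding=True, truncation=True, return_tensors='pt').to(model.device)

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize for cosine similarity
```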
Can it be quantized to GPTQ or AWQ, or will those formats not be compatible with this architecture?
Hi @prudant ,
Of course, it can also be quantized with other methods such as GPTQ and AWQ.
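As an illustration only, here is a minimal sketch of 4-bit GPTQ quantization through transformers' `GPTQConfig`, which requires the `optimum` and `auto-gptq` packages. Loading the Mistral-based checkpoint with `AutoModelForCausalLM` for calibration, and the `"c4"` calibration dataset, are my assumptions rather than anything confirmed in this thread, so treat this as a starting point, not a tested recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = 'Salesforce/SFR-Embedding-2_R'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ; "c4" is a placeholder calibration dataset (assumption).
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization runs at load time; needs a GPU plus optimum and auto-gptq installed.
# Assumption: the Mistral-based SFR checkpoint loads as a causal LM for calibration.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    quantization_config=gptq_config,
)

# Save the quantized weights so they can be reloaded without re-quantizing.
model.save_pretrained('SFR-Embedding-2_R-GPTQ')
tokenizer.save_pretrained('SFR-Embedding-2_R-GPTQ')
```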