How did you make these weights?

Discussion #3, opened by adatkins

How did you make these weights? Did you use a particular script based on PyTorch and BitsAndBytes?

The model card simply says: "This repository contains meta-llama/Meta-Llama-3.1-8B-Instruct quantized using bitsandbytes from BF16 down to NF4 with a block size of 64."
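To make "NF4 with a block size of 64" concrete, here is a minimal pure-Python sketch of blockwise absmax quantization to the NF4 codebook. It is an illustration under assumptions, not bitsandbytes' actual kernel. The 16 code values are copied from the NF4 data type described in the QLoRA paper, and `quantize_nf4` / `dequantize_nf4` are hypothetical helper names. The real library runs this on the GPU and packs two 4-bit codes into each byte.

```python
import random

# NF4 code values (assumed: copied from the QLoRA paper / bitsandbytes NF4 table).
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634, 0.33791524171829224,
    0.44070982933044434, 0.5626170039176941, 0.7229568362236023, 1.0,
]
BLOCK_SIZE = 64  # block size stated in the model card


def quantize_nf4(weights, block_size=BLOCK_SIZE):
    """Hypothetical helper: return 4-bit code indices plus one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # guard against all-zero blocks
        scales.append(absmax)
        for w in block:
            x = w / absmax  # normalize into [-1, 1]
            # pick the nearest NF4 value; its index fits in 4 bits (0..15)
            idx = min(range(len(NF4_CODEBOOK)), key=lambda i: abs(NF4_CODEBOOK[i] - x))
            codes.append(idx)
    return codes, scales


def dequantize_nf4(codes, scales, block_size=BLOCK_SIZE):
    """Hypothetical helper: map each code back to a float using its block's scale."""
    return [NF4_CODEBOOK[c] * scales[i // block_size] for i, c in enumerate(codes)]


# Usage: 200 small Gaussian "weights", similar in spirit to an LLM weight row.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(200)]
codes, scales = quantize_nf4(weights)
restored = dequantize_nf4(codes, scales)
print(len(scales))  # 4 blocks: 64 + 64 + 64 + 8 weights
```

Each block keeps its own scale, so one outlier weight only degrades the precision of the 63 weights that share its block rather than the whole tensor.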

@adatkins You can read about it in Hugging Face's documentation for bitsandbytes. When quantizing to 4-bit precision with bitsandbytes, you would do something like this:

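import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # base model named in the model card

# 4-bit NF4 quantization settings
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,   # matmuls run in BF16
    bnb_4bit_use_double_quant=True,          # also quantize the per-block scales
    llm_int8_enable_fp32_cpu_offload=True,   # modules offloaded to CPU stay in FP32
)

# The BF16 weights are quantized to NF4 as they are loaded
model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config)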
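The `bnb_4bit_use_double_quant=True` flag matters for file size because every block of 64 weights also stores a scale. Here is a back-of-the-envelope sketch, assuming the figures from the QLoRA paper: one FP32 scale per block without double quantization, and with it, 8-bit scales plus one FP32 constant per group of 256 scales. The function name `scale_overhead_bits` is hypothetical.

```python
def scale_overhead_bits(block_size=64, double_quant=False, second_block_size=256):
    """Extra bits per parameter spent on quantization scales (QLoRA-style estimate)."""
    if not double_quant:
        # one 32-bit absmax per block of `block_size` weights
        return 32 / block_size
    # 8-bit quantized absmax per block, plus one 32-bit constant
    # shared by `second_block_size` block scales
    return 8 / block_size + 32 / (block_size * second_block_size)


print(scale_overhead_bits())                   # without double quantization
print(scale_overhead_bits(double_quant=True))  # with double quantization
```

Under these assumptions the scales cost 0.5 extra bits per parameter without double quantization and roughly 0.127 with it, on top of the 4 bits per weight.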
