How did you make these weights?
#3 by adatkins
How did you make these weights? Did you use a particular script based on PyTorch and BitsAndBytes?
The model card simply says: "This repository contains meta-llama/Meta-Llama-3.1-8B-Instruct quantized using bitsandbytes from BF16 down to NF4 with a block size of 64."
@adatkins you can read about it in HF's docs for bitsandbytes. When quantizing to 4-bit precision with bitsandbytes, you would do something like this:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit as they are loaded
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in BF16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    llm_int8_enable_fp32_cpu_offload=True,  # allow offloading layers that don't fit on GPU to CPU in FP32
)

model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config)
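For reference, bitsandbytes uses a block size of 64 for NF4 by default, which matches the "block size of 64" in the model card. To produce a repo of pre-quantized weights like this one, a plausible final step (an assumption on my part, since the repo doesn't publish its export script; the directory name below is just illustrative) is to serialize the quantized model and tokenizer with save_pretrained, which transformers supports for 4-bit bitsandbytes models:

from transformers import AutoTokenizer

save_dir = "Meta-Llama-3.1-8B-Instruct-bnb-nf4"  # hypothetical output directory

# save_pretrained on a 4-bit bnb model writes the quantized weights
# and records the quantization_config in config.json
model_nf4.save_pretrained(save_dir)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(save_dir)

The resulting folder can then be uploaded to the Hub, e.g. with huggingface_hub.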