How did you make these weights?
#3 by adatkins
How did you make these weights? Did you use a particular script based on PyTorch and BitsAndBytes?
The model card simply says: "This repository contains meta-llama/Meta-Llama-3.1-8B-Instruct quantized using bitsandbytes from BF16 down to NF4 with a block size of 64."
@adatkins you can read about it in HF's docs for bitsandbytes. When quantizing to 4-bit precision with bitsandbytes, you would do something like this:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit as they are loaded
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in BF16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    llm_int8_enable_fp32_cpu_offload=True,  # allow offloading layers that don't fit on GPU to CPU in FP32
)

model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config)
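For reference, bitsandbytes uses a block size of 64 for NF4 by default, which matches the "block size of 64" in the model card. To produce a repo of pre-quantized weights like this one, a plausible final step (an assumption on my part, since the repo doesn't publish its export script; the directory name below is just illustrative) is to serialize the quantized model and tokenizer with save_pretrained, which transformers supports for 4-bit bitsandbytes models:

from transformers import AutoTokenizer

save_dir = "Meta-Llama-3.1-8B-Instruct-bnb-nf4"  # hypothetical output directory

# save_pretrained on a 4-bit bnb model writes the quantized weights
# and records the quantization_config in config.json
model_nf4.save_pretrained(save_dir)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(save_dir)

The resulting folder can then be uploaded to the Hub, e.g. with huggingface_hub.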