---
license: mit
---
Llama 2 7B quantized to 3-bit precision with GPTQ.
```
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer
import torch

w = 3  # target bit width for the quantized weights
model_path = "meta-llama/Llama-2-7b-hf"  # Hugging Face Hub model ID

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
# Load the base model in half precision before quantizing
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# Calibrate on the C4 dataset; model_seqlen is the maximum sequence length
quantizer = GPTQQuantizer(bits=w, dataset="c4", model_seqlen=4096)
quantized_model = quantizer.quantize_model(model, tokenizer)
```
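For a rough sense of what 3-bit weights save, the sketch below compares weight storage at 16 bits and at 3 bits per parameter. The parameter count (about 6.74 billion) and the helper `weight_footprint_gib` are assumptions introduced for illustration. The estimate ignores the per-group scales and zero points that GPTQ stores, as well as any layers left unquantized, so the real checkpoint will be somewhat larger than the 3-bit figure.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: approximate parameter count of Llama 2 7B
NUM_PARAMS = 6.74e9

fp16_gib = weight_footprint_gib(NUM_PARAMS, 16)  # float16 baseline
gptq3_gib = weight_footprint_gib(NUM_PARAMS, 3)  # idealized 3-bit weights

print(f"float16 weights: {fp16_gib:.2f} GiB")
print(f"3-bit weights:   {gptq3_gib:.2f} GiB")
print(f"ideal reduction: {fp16_gib / gptq3_gib:.2f}x")
```

The ideal reduction is simply 16/3; treat the 3-bit figure as a lower bound rather than the size of the saved model.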