This is OpenScholar/Llama-3.1_OpenScholar-8B with AWQ quantization applied using the code below, adapted from the AutoAWQ example code.
```python
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Input and output path
path = "OpenScholar/Llama-3.1_OpenScholar-8B"
output = "Llama-3.1_OpenScholar-8B-AWQ"

# Quantization config
config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path=path,
    low_cpu_mem_usage=True,
    use_cache=False,
    safetensors=False,
    device_map="cuda",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path)

# Quantize
model.quantize(tokenizer, quant_config=config)

# Save quantized model
model.save_quantized(output)

# Save tokenizer
# Note: Transformers >= 4.45.0 doubles size of tokenizer.json
# See https://github.com/huggingface/transformers/issues/34744
tokenizer.save_pretrained(output)

print(f'Model is quantized and saved to "{output}"')
```
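
Once saved, the quantized weights can be loaded like any other AWQ checkpoint. Below is a minimal inference sketch, assuming the `Llama-3.1_OpenScholar-8B-AWQ` directory produced by the script above is available locally, `autoawq` is installed alongside Transformers, and the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the quantized model saved by the script above (local, illustrative)
quant_path = "Llama-3.1_OpenScholar-8B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(
    quant_path,
    torch_dtype=torch.float16,  # AWQ kernels compute in fp16
    device_map="cuda",
)

# Illustrative prompt only
prompt = "Summarize the main risk factors for type 2 diabetes."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Transformers should pick up the `quantization_config` that AutoAWQ writes into `config.json`, so no extra quantization arguments are needed at load time.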
Base model: meta-llama/Llama-3.1-8B