mlabonne/gpt2-GPTQ-4bit


	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- AutoGPTQ
	- 4bit
	- GPTQ
	---

	This model is a 4-bit quantized version of [GPT-2](https://huggingface.co/gpt2), created with [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
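
The exact quantization script is not part of this card. The following is a minimal sketch of how a comparable model could be produced with AutoGPTQ, assuming the `auto-gptq` package is installed. The group size of 128 is an assumption inferred from the `128g` suffix of the weight file name, and the function name, output directory, and calibration sentence are hypothetical.

```python
# Assumed settings: 4 bits is stated in this card; group_size=128 is only
# inferred from the "128g" suffix of the weight file name.
QUANT_SETTINGS = {"bits": 4, "group_size": 128}


def quantize_gpt2(output_dir="gpt2-GPTQ-4bit", calibration_texts=None):
    """Quantize GPT-2 with AutoGPTQ and save the result (illustrative only)."""
    # Local imports so this file can be imported without the GPU stack.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    # Hypothetical calibration data; one sentence only keeps the sketch short.
    texts = calibration_texts or [
        "AutoGPTQ is an easy-to-use model quantization library."
    ]
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    examples = [tokenizer(text) for text in texts]

    quantize_config = BaseQuantizeConfig(**QUANT_SETTINGS)
    model = AutoGPTQForCausalLM.from_pretrained("gpt2", quantize_config)
    model.quantize(examples)  # runs GPTQ using the calibration examples
    model.save_quantized(output_dir, use_safetensors=True)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

In practice, a larger and more varied calibration set than a single sentence would likely be needed for good quality.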

	You can load this model with the AutoGPTQ library, installed with the following command:

	```
	pip install auto-gptq
	```

	You can then download the model from the Hub and load it on a CUDA GPU with the following code:

	```python
	from transformers import AutoTokenizer
	from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

	model_name = "mlabonne/gpt2-GPTQ-4bit"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# Read the quantization settings stored alongside the weights
	quantize_config = BaseQuantizeConfig.from_pretrained(model_name)
	# Load the quantized weights onto the first CUDA device
	model = AutoGPTQForCausalLM.from_quantized(
	    model_name,
	    model_basename="gptq_model-4bit-128g",
	    device="cuda:0",
	    use_triton=True,  # needs the triton package installed
	    use_safetensors=True,
	    quantize_config=quantize_config,
	)
	```

	This model works with the traditional [Text Generation pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline).
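
As a minimal sketch of that usage, assuming a CUDA GPU and the `auto-gptq` package are available, the quantized model can be wrapped in a `TextGenerationPipeline`. The helper names `build_generator` and `first_text` are illustrative assumptions and do not come from the original card.

```python
MODEL_NAME = "mlabonne/gpt2-GPTQ-4bit"
MODEL_BASENAME = "gptq_model-4bit-128g"


def build_generator(device="cuda:0"):
    """Load the quantized model and wrap it in a text generation pipeline."""
    # Local imports so this file can be imported without the GPU stack.
    from transformers import AutoTokenizer, TextGenerationPipeline
    from auto_gptq import AutoGPTQForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_NAME,
        model_basename=MODEL_BASENAME,
        device=device,
        use_triton=True,
        use_safetensors=True,
    )
    return TextGenerationPipeline(model=model, tokenizer=tokenizer)


def first_text(outputs):
    """Return the generated text of the first candidate in a pipeline result."""
    return outputs[0]["generated_text"]
```

With this in place, a call such as `first_text(build_generator()("I have a dream", max_new_tokens=30))` should return the prompt followed by the model's continuation. The prompt and the `max_new_tokens` value are arbitrary examples.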