seonglae
/

opt-125m-4bit-gptq

@@ -2,6 +2,37 @@
 license: mit
 tags:
 - auto-gptq
 ---
-This model should use [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) so inference from Hugging face might not working

 license: mit
 tags:
 - auto-gptq
+- opt
+- gptq
+- 4bit
 ---
+This model should use [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) so you need to use `auto-gptq`
+```py
+from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, LlamaForCausalLM, LlamaTokenizer, StoppingCriteria, PreTrainedTokenizerBase
+from auto_gptq import AutoGPTQForCausalLM
+model_id = 'seonglae/opt-125m-4bit-gptq'
+tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
+model = AutoGPTQForCausalLM.from_quantized(
+        model_id,
+        model_basename=model_basename,
+        trust_remote_code=True,
+        device='cuda:0',
+        use_triton=False,
+        use_safetensors=True,
+)
+pipe = pipeline(
+      "text-generation",
+      model=model,
+      tokenizer=tokenizer,
+      temperature=0.5,
+      top_p=0.95,
+      max_new_tokens=100,
+      repetition_penalty=1.15,
+)
+prompt = "USER: Are you AI?\nASSISTANT:"
+pipe(prompt)
+```