Update README.md

README.md
## Quantization Description

This repo contains a GPTQ 4-bit quantized version of the Mistral-7B-Instruct-v0.3 Large Language Model.
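Loading GPTQ weights through transformers also needs the GPTQ runtime packages; with recent transformers releases that typically means optimum and auto-gptq (an assumption, check the transformers quantization docs for your version), plus a CUDA GPU. A quick sanity check:

```python
# Sanity check that the GPTQ runtime pieces are importable (assumed setup;
# the required packages can vary across transformers versions).
import optimum
import auto_gptq
import torch

assert torch.cuda.is_available(), "A CUDA-capable GPU is required for this example."
```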

### Using the GPTQ model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "thesven/Mistral-7B-v0.3-GPTQ"

# Load the tokenizer and the quantized model; device_map="auto" places the
# weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
# The model defines no pad token, so reuse the EOS token id for padding.
model.config.pad_token_id = model.config.eos_token_id

# Mistral instruct prompt format: the instruction sits between [INST] and [/INST].
prompt_template = '''<s>[INST] Write a story about AI [/INST]'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True,
                        top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
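Alternatively, the checkpoint can be driven through the transformers `pipeline` API, which handles tokenization and decoding for you. A minimal sketch, reusing the `model` and `tokenizer` loaded above:

```python
from transformers import pipeline

# Wrap the already-loaded model and tokenizer in a text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator("<s>[INST] Write a story about AI [/INST]",
                   temperature=0.1, do_sample=True, top_p=0.95, top_k=40,
                   max_new_tokens=512)
print(result[0]["generated_text"])
```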
## Model Description

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
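Since the model is instruct fine-tuned, the tokenizer shipped with the repo may also carry a chat template that produces the `[INST]` format automatically. A sketch, assuming the repo's tokenizer config defines one:

```python
# Build the prompt from a chat-style message list instead of hand-writing tags.
# Assumes the tokenizer ships a chat template; raises an error otherwise.
messages = [{"role": "user", "content": "Write a story about AI"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()
output = model.generate(input_ids, do_sample=True, temperature=0.1,
                        top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```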