---
license: cc
datasets:
- VMware/open-instruct-v1-oasst-dolly-hhrlhf
language:
- en
pipeline_tag: text-generation
---

# SearchUnify-ML/xgen-7b-8k-open-instruct-gptq

These are GPTQ 4-bit model files for [VMware's XGen 7B 8K Open Instruct](https://huggingface.co/VMware/xgen-7b-8k-open-instruct).

They are the result of quantising the model to 4-bit using GPTQ-for-LLaMa.

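For a rough sense of what 4-bit quantisation buys, the sketch below estimates the size of the quantised weights. It assumes a group size of 128 (suggested by the `128g` suffix of the model file name used later in this card), one 16-bit scale and one 4-bit zero point per group, and roughly 7 billion quantised parameters. These are illustrative assumptions, not measured values for this checkpoint, and `bits_per_weight` and `approx_size_gib` are hypothetical helper names.

```python
def bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Effective storage per weight, including per-group scale and zero point.

    All defaults are assumptions for illustration, not values read from the model.
    """
    return bits + (scale_bits + zero_bits) / group_size


def approx_size_gib(n_params, **kwargs):
    """Approximate size of the quantised weights in GiB."""
    total_bits = n_params * bits_per_weight(**kwargs)
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # assumed parameter count for a "7B" model
    print(f"4-bit, group size 128: ~{approx_size_gib(n_params):.2f} GiB")
    print(f"fp16 baseline:         ~{n_params * 2 / 2**30:.2f} GiB")
```

The estimate ignores layers that are usually left unquantised (such as embeddings), so the real files will be somewhat larger.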
The model is open for COMMERCIAL USE.

# How to use this GPTQ model from Python code

First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

<code>pip install auto-gptq</code>

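Before loading the model, it can help to confirm that the packages the script below relies on are importable. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and `torch` is listed on the assumption that the script runs on PyTorch tensors (it calls `.cuda()` and requests `return_tensors='pt'`).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed to be needed by the example script in this card.
    missing = missing_packages(["torch", "transformers", "auto_gptq"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates a package without importing it, so the check is quick and does not touch the GPU.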
<code>

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
model_basename = "gptq_model-4bit-128g"

use_triton = False

# If this call fails with a request for trust_remote_code, pass
# trust_remote_code=True here: XGen models ship a custom tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantised weights onto the first GPU.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is needed for temperature to take effect; without it decoding is greedy.
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
    do_sample=True,  # enables temperature and top_p sampling
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

</code>

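Both calls in the script above return the prompt together with the completion. A minimal sketch of two helpers for building the prompt and keeping only the model's answer follows. It assumes the `### Instruction:` / `### Response:` template from the script is the right one for this model (the script itself asks you to check this). `build_prompt` and `extract_response` are hypothetical names, not part of AutoGPTQ or transformers.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the template used in the example script."""
    return f"### Instruction: {instruction.strip()}\n{RESPONSE_MARKER}"


def extract_response(generated_text: str) -> str:
    """Return only the text after the first response marker.

    Falls back to the whole (stripped) string if the marker is missing.
    """
    _, marker, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else generated_text.strip()
```

With the objects from the script, usage would look like `extract_response(pipe(build_prompt("Tell me about AI"))[0]['generated_text'])`.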