---
license: cc
datasets:
- VMware/open-instruct-v1-oasst-dolly-hhrlhf
language:
- en
pipeline_tag: text-generation
inference: false
---
# SearchUnify/xgen-7b-8k-open-instruct-gptq
[SearchUnify's unified cognitive platform](https://www.searchunify.com/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face) offers industry-first LLM integrations across its suite of products: [Cognitive Search](https://www.searchunify.com/products/cognitive-search/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [SUVA](https://www.searchunify.com/products/suva/), [Knowbler](https://www.searchunify.com/products/knowbler/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [Escalation Predictor](https://applications.searchunify.com/escalation-predictor?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [Agent Helper](https://applications.searchunify.com/agent-helper?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face) and [Community Helper](https://applications.searchunify.com/community-helper?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face). Coupled with its federated retrieval-augmented generation (FRAG) architecture, the platform fetches relevant information or responses to deliver more accurate and contextually appropriate support and self-service experiences.

Using the GPTQ quantization method, SearchUnify optimized the XGen-7B model for a low memory footprint and rapid response generation.

These are GPTQ 4-bit model files for [VMware's XGen 7B 8K Open Instruct](https://huggingface.co/VMware/xgen-7b-8k-open-instruct). They are the result of quantizing the model to 4 bits using GPTQ-for-LLaMa.
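To give a feel for what 4-bit quantization buys, the sketch below estimates weight storage before and after quantization. It is back-of-the-envelope arithmetic, not a measurement of this checkpoint: the parameter count of 7 billion is a nominal figure, the group size of 128 is read off the file name `gptq_model-4bit-128g`, and the per-group overhead (one 16-bit scale plus one 4-bit zero point) is an assumption about the storage layout. Layers that stay unquantized, such as embeddings, are ignored.

```python
# Rough estimate of weight storage for a ~7B-parameter model.
# All constants are illustrative assumptions, not measured values.

N_PARAMS = 7_000_000_000  # nominal parameter count (assumption)


def weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


def gptq_bits_per_weight(bits=4, group_size=128, scale_bits=16):
    """Effective bits per weight when each group of `group_size` weights
    shares one scale (scale_bits) and one zero point (bits)."""
    return bits + (scale_bits + bits) / group_size


if __name__ == "__main__":
    fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9
    gptq_gb = weight_bytes(N_PARAMS, gptq_bits_per_weight()) / 1e9
    print(f"fp16 weights:  {fp16_gb:.2f} GB")
    print(f"4-bit weights: {gptq_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 14 GB to about 3.6 GB. Actual GPU memory use will be higher because of activations, the KV cache, and any layers kept in higher precision.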
# How to use this GPTQ model from Python code
First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
```
pip install auto-gptq
```
Second, install tiktoken, which the tokenizer requires:
```
pip install tiktoken
```
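As an optional sanity check before loading the model, a short script along these lines can confirm that the packages are importable. This is a sketch: the helper name `missing_modules` is hypothetical, and the list assumes the import names `auto_gptq`, `transformers`, and `tiktoken` used elsewhere in this card.

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages the example relies on.
REQUIRED_MODULES = ["auto_gptq", "transformers", "tiktoken"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```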
Then load the quantized model and generate a response:
```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
model_basename = "gptq_model-4bit-128g"

use_triton = False

# The tokenizer is loaded from the model repository and depends on tiktoken,
# so trust_remote_code must be enabled.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
                                          use_fast=False,
                                          trust_remote_code=True)

# Load the 4-bit quantized weights onto the first GPU.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=False,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton)

# Note: check the prompt template is correct for this model.
prompt = "Explain the rules of field hockey to a novice."
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# temperature only takes effect when sampling is enabled, hence do_sample=True.
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.3, max_new_tokens=512)
# Keep only the text that follows the response marker.
print(f"\n\n {tokenizer.decode(output[0]).split('### Response:')[1]}")
```
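When the snippet above is reused for several prompts, it can help to factor the prompt formatting and response extraction into small helpers. The following is a minimal sketch: the function names `build_prompt` and `extract_response` are hypothetical, and the template simply mirrors the one used in the snippet above, which should still be checked against the base model's documentation.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the template used in the example above."""
    return f"### Instruction: {instruction}\n{RESPONSE_MARKER}"


def extract_response(decoded: str) -> str:
    """Return the text after the response marker.

    Falls back to the full decoded text if the marker is absent,
    instead of raising an IndexError as split(...)[1] would.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


if __name__ == "__main__":
    prompt_text = build_prompt("Explain the rules of field hockey to a novice.")
    print(prompt_text)
    # Stand-in for tokenizer.decode(output[0]); no model is needed to try this.
    fake_decoded = prompt_text + " Field hockey is played by two teams."
    print(extract_response(fake_decoded))
```

With the model loaded, `build_prompt(prompt)` would replace the inline f-string and `extract_response(tokenizer.decode(output[0]))` would replace the `split` call.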