---
license: cc
datasets:
- VMware/open-instruct-v1-oasst-dolly-hhrlhf
language:
- en
pipeline_tag: text-generation
inference: false
---
# SearchUnify/xgen-7b-8k-open-instruct-gptq
With its industry-first robust LLM integrations across its suite of products ([Cognitive Search](https://www.searchunify.com/products/cognitive-search/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [SUVA](https://www.searchunify.com/products/suva/), [Knowbler](https://www.searchunify.com/products/knowbler/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [Escalation Predictor](https://applications.searchunify.com/escalation-predictor?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face), [Agent Helper](https://applications.searchunify.com/agent-helper?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face) and [Community Helper](https://applications.searchunify.com/community-helper?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face)), coupled with a federated retrieval-augmented generation (FRAG) architecture, [SearchUnify's unified cognitive platform](https://www.searchunify.com/?utm_source=link&utm_medium=ml-model&utm_campaign=hugging-face) fetches relevant information to deliver more accurate and contextually appropriate support and self-service experiences.
Leveraging the GPTQ quantization method, SearchUnify optimized the XGen-7B model for a low memory footprint and rapid response generation.
These are GPTQ 4-bit model files for [VMware's XGen 7B 8K Open Instruct](https://huggingface.co/VMware/xgen-7b-8k-open-instruct). They are the result of quantizing the model to 4-bit (group size 128) using GPTQ-for-LLaMa.
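For context on what the "4bit-128g" file name refers to: the weights are quantized to 4 bits with a group size of 128. A rough reproduction of that setting with AutoGPTQ would look like the sketch below; the calibration text, save path, and `desc_act` choice are illustrative assumptions, not the exact recipe used for these files.
```
# Sketch only: a 4-bit / group-size-128 GPTQ configuration with AutoGPTQ.
# The calibration examples and save directory are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "VMware/xgen-7b-8k-open-instruct"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # matches the "128g" in the quantized file name
    desc_act=False,  # assumption for this sketch
)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=False, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config, trust_remote_code=True)

# A handful of calibration examples; real runs use a larger, representative set.
examples = [tokenizer("GPTQ calibration text goes here.")]
model.quantize(examples)

model.save_quantized("xgen-7b-8k-open-instruct-gptq")
```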
# How to use this GPTQ model from Python code
First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
```
pip install auto-gptq
```
Second, install tiktoken, which the XGen tokenizer depends on:
```
pip install tiktoken
```
Then load the quantized model and run inference:
```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
                                          use_fast=False,
                                          trust_remote_code=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=False,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton)

# Note: check that the prompt template below is correct for this model.
prompt = "Explain the rules of field hockey to a novice."
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids,
                        do_sample=True,  # sampling must be enabled for temperature to take effect
                        temperature=0.3,
                        max_new_tokens=512)
print(f"\n\n {tokenizer.decode(output[0]).split('### Response:')[1]}")
```
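Optionally, if you prefer to see tokens as they are produced instead of waiting for the full completion, the same `model` and `tokenizer` work with transformers' `TextStreamer`. This is a small optional sketch; the generation settings are illustrative.
```
from transformers import TextStreamer

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(inputs=input_ids,
                   streamer=streamer,
                   do_sample=True,
                   temperature=0.3,
                   max_new_tokens=512)
```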