effi 7b GPTQ is a quantized version of effi 7b, a 7-billion-parameter model built by AI Planet. The model was quantized with AutoGPTQ.
Model Details
Model Description
The original model was fine-tuned on Chain of Thought datasets, which contain context from mixed sources with corresponding rationales. The fine-tuned Large Language Model (LLM) has shown enhanced capabilities in solving novel tasks by providing its reasoning. The final model was then quantized into GPTQ format.
- Developed by: AI Planet
- Model type: Causal decoder-only
- Language(s) (NLP): English
- Quantization type: GPTQ (4-bit)
- License: Apache 2.0
- Quantized from model: Effi-7b
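As a rough illustration of what 4-bit quantization means for storage, the sketch below estimates the weight footprint of a 7-billion-parameter model. It is a back-of-the-envelope calculation, not a measurement of this checkpoint: it assumes every parameter is quantized, that weights are grouped in blocks of 128 (the group size listed in the quantization configuration), and that each group stores one 16-bit scale and one 4-bit zero point. The function names are hypothetical.

```python
def estimate_gptq_bytes(n_params, bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Rough storage estimate, in bytes, for group-wise quantized weights.

    Assumption: all parameters are quantized and each group carries one
    scale and one zero point. Real checkpoints keep some layers (for
    example embeddings) in higher precision, so actual files are larger.
    """
    weight_bytes = n_params * bits / 8
    n_groups = n_params / group_size
    overhead_bytes = n_groups * (scale_bits + zero_bits) / 8
    return weight_bytes + overhead_bytes


def estimate_fp16_bytes(n_params):
    """Storage for the same weights kept in 16-bit floating point."""
    return n_params * 2


if __name__ == "__main__":
    n_params = 7_000_000_000  # nominal parameter count, not an exact figure
    gib = 1024 ** 3
    print(f"fp16 weights: {estimate_fp16_bytes(n_params) / gib:.2f} GiB")
    print(f"4-bit weights: {estimate_gptq_bytes(n_params) / gib:.2f} GiB")
```

Under these assumptions the per-weight cost is 4 + 20/128 = 4.15625 bits, so the quantized weights take a bit more than a quarter of the fp16 size.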
Quantization Configuration
- bits: 4,
- damp_percent: 0.1,
- dataset: "wikitext2",
- desc_act: false,
- group_size: 128,
- modules_in_block_to_quantize: null,
- quant_method: "gptq",
- sym: true,
- true_sequential: true
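To give a feel for what `bits: 4`, `group_size: 128` and `sym: true` describe, here is a minimal sketch of symmetric, group-wise round-to-nearest quantization in plain Python. It is a simplification and not the GPTQ algorithm itself: GPTQ additionally compensates for quantization error using second-order information from calibration data (here, "wikitext2"), and AutoGPTQ stores values with a different integer convention. All function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns the integer codes and the single scale shared by the group.
    Simplification: uses signed codes in [-8, 7] for 4 bits.
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (bits - 1))       # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Split a row of weights into groups and quantize each one separately."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    row = [0.01 * i - 1.0 for i in range(256)]  # toy weights, two groups of 128
    for codes, scale in quantize_row(row):
        restored = dequantize_group(codes, scale)
        print(f"scale={scale:.4f}, first codes={codes[:4]}, first restored={restored[:4]}")
```

Smaller groups give each scale fewer weights to cover, which tends to reduce rounding error at the cost of storing more scales.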
Example of usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
quant_path = "aiplanet/effi-7b-gptq"
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map='cuda')
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True, safetensors=True, fuse_layers=True)
tst = """
### INSTRUCTION:
Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney. Is Virgin Australia and Virgin Blue the same airlines?
"""
system_message = "Given your chain of thought reasoning, provide a rationale for the context in the source."
template = f"""
Context: {system_message}
Human: {tst}
"""
# Tokenize the input
input_ids = tokenizer(template, return_tensors="pt", truncation=True).input_ids.cuda()
# Run the model to infer an output
outputs = model.generate(input_ids=input_ids, max_new_tokens=512, top_p=0.9, temperature=0.1, top_k=1, repetition_penalty=1.1)
# Print the result, dropping the echoed prompt from the start of the decoded text
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(template):])
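The prompt layout used in the example (a `Context:` line followed by a `Human:` line) can be wrapped in small helpers so it is reused consistently across calls. The sketch below is one possible way to do that; `build_prompt` and `strip_prompt` are hypothetical names, not part of any library, and the fallback in `strip_prompt` is an added safeguard for cases where the decoded text does not begin with the exact prompt string.

```python
def build_prompt(system_message, instruction):
    """Build a prompt in the same Context/Human layout as the usage example."""
    return f"\nContext: {system_message}\nHuman: {instruction}\n"


def strip_prompt(decoded, prompt):
    """Remove the echoed prompt from decoded model output.

    If the decoded text does not start with the prompt (for example because
    decoding changed the whitespace), return the text unchanged rather than
    cutting at an arbitrary character offset.
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):]
    return decoded


if __name__ == "__main__":
    system_message = "Given your chain of thought reasoning, provide a rationale for the context in the source."
    instruction = "\n### INSTRUCTION:\nIs Virgin Australia and Virgin Blue the same airlines?\n"
    prompt = build_prompt(system_message, instruction)
    # Stand-in for real model output: the prompt followed by a made-up answer.
    fake_decoded = prompt + "Illustrative answer text."
    print(strip_prompt(fake_decoded, prompt))
```

With a real model, `prompt` would be passed to the tokenizer in place of `template`, and the decoded string in place of `fake_decoded`.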
Framework versions
- Transformers 4.37.2
- optimum 1.16.2
- auto-gptq 0.6.0
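A quick way to compare a local environment against the versions listed above is to query the installed package metadata. This is a minimal sketch using only the standard library; the `check_versions` helper is a hypothetical name, and a mismatch does not necessarily mean the model will fail to load, only that the environment differs from the one listed here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card.
EXPECTED = {
    "transformers": "4.37.2",
    "optimum": "1.16.2",
    "auto-gptq": "0.6.0",
}


def check_versions(expected=EXPECTED, get_version=version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```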
Citation
@misc {bhavyaaiplanet,
author = { {Bhavya Bhola} },
title = { Quantized version of effi-7b by AI Planet},
year = 2024,
url = { https://huggingface.co/aiplanet/effi-7b-gptq },
publisher = { Hugging Face }
}