aiplanet/effi-7b-gptq

effi 7b GPTQ is a quantized version of effi 7b, a 7-billion-parameter model built by AI Planet. The model was quantized with AutoGPTQ.

Model Details

Model Description

The original model was fine-tuned on Chain of Thought datasets, which contain context from mixed sources with corresponding rationales. The fine-tuned Large Language Model (LLM) showed enhanced capability in solving novel tasks by providing reasoning. The final model was then quantized into GPTQ format.

Developed by: AI Planet
Model type: Causal decoder-only
Language(s) (NLP): English
Quantization type: GPTQ (4-bit)
License: Apache 2.0
Quantized from model: Effi-7b
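
As a rough illustration of what 4-bit GPTQ means for storage, the sketch below estimates the size of the quantized weights. It assumes exactly 7 billion quantized parameters, one fp16 scale and one 4-bit zero point per group of 128 weights (the group size listed in the quantization configuration), and it ignores layers that usually stay unquantized, such as embeddings. The function name `estimate_weight_gb` is hypothetical, and the figures are back-of-the-envelope arithmetic, not measurements of this checkpoint.

```python
def estimate_weight_gb(n_params: float, bits: int = 4, group_size: int = 128,
                       scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Approximate storage for group-wise quantized weights, in GB (1e9 bytes)."""
    # Assumption: each group of `group_size` weights shares one scale
    # and one zero point, which adds a small per-weight overhead.
    overhead_bits_per_weight = (scale_bits + zero_bits) / group_size
    total_bits = n_params * (bits + overhead_bits_per_weight)
    return total_bits / 8 / 1e9


# Assumption: 7e9 parameters, all of them quantized.
fp16_gb = 7e9 * 16 / 8 / 1e9
gptq_gb = estimate_weight_gb(7e9)
print(f"fp16: ~{fp16_gb:.1f} GB, 4-bit GPTQ: ~{gptq_gb:.2f} GB")
```

Under these assumptions the weights shrink from about 14.0 GB in fp16 to about 3.64 GB, roughly a 3.8x reduction. Actual file sizes and GPU memory use will differ.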

Quantization Configuration

bits: 4,
damp_percent: 0.1,
dataset: "wikitext2",
desc_act: false,
group_size: 128,
modules_in_block_to_quantize: null,
quant_method: "gptq",
sym: true,
true_sequential: true
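
To make `bits`, `group_size` and `sym` more concrete, here is a minimal pure-Python sketch of group-wise symmetric quantization. It uses plain round-to-nearest, which is a simplification: GPTQ itself additionally uses calibration data (here `wikitext2`) to compensate for rounding error, and that step is not shown. The function names are hypothetical, and signed integer codes are used for readability, whereas real kernels typically pack unsigned values.

```python
import math
from typing import List, Tuple


def quantize_group_sym(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization of one group of weights."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # Symmetric: a single scale per group, zero maps exactly to code 0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row: List[float], bits: int = 4, group_size: int = 128):
    """Split a row into groups of `group_size`; each group gets its own scale."""
    return [
        quantize_group_sym(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [code * scale for codes, scale in groups for code in codes]


# Toy data standing in for one row of a weight matrix (not real model weights).
row = [math.sin(i) * 0.05 for i in range(256)]
groups = quantize_row(row, bits=4, group_size=128)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"{len(groups)} groups, max reconstruction error {max_err:.5f}")
```

A smaller `group_size` gives each scale fewer weights to cover, which usually lowers the error at the cost of storing more scales.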

Example of usage

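# Loading GPTQ weights through transformers requires optimum and auto-gptq
# to be installed (see "Framework versions").
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM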

quant_path = "aiplanet/effi-7b-gptq"

# Load the quantized model onto the GPU
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="cuda")
# The tokenizer takes no weight-loading options such as safetensors or fuse_layers
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)


tst = """

### INSTRUCTION:
Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.Is Virgin Australia and Virgin Blue the same airlines?

"""

system_message = "Given your chain of thought reasoning, provide a rationale for the context in the source."

template=f"""
 Context: {system_message}
 Human: {tst}
"""

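# Tokenize the input
input_ids = tokenizer(template, return_tensors="pt", truncation=True).input_ids.cuda()
# Run the model to infer an output.
# temperature, top_p and top_k only take effect when do_sample=True;
# with top_k=1 the sampling is effectively greedy.
outputs = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.1, top_k=1, repetition_penalty=1.1)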


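# Print the result: decode only the newly generated tokens, i.e. everything
# after the prompt. Slicing by token count is more robust than slicing the
# decoded string by len(template).
generated = outputs[0][input_ids.shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))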
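
The prompt layout in the example above can be wrapped in a small helper so that other questions reuse the same structure. This is only a sketch: `build_prompt` is a hypothetical name, and it simply mirrors the `### INSTRUCTION:` / `Context:` / `Human:` layout shown in the example. Whether the model requires exactly this layout is not stated here.

```python
DEFAULT_SYSTEM_MESSAGE = (
    "Given your chain of thought reasoning, provide a rationale "
    "for the context in the source."
)


def build_prompt(question: str, system_message: str = DEFAULT_SYSTEM_MESSAGE) -> str:
    """Reproduce the prompt layout from the usage example."""
    # Same shape as the `tst` string in the example: blank line,
    # instruction header, the text itself, then a blank line.
    instruction = f"\n\n### INSTRUCTION:\n{question}\n\n"
    # Same shape as the `template` string in the example.
    return f"\n Context: {system_message}\n Human: {instruction}\n"


prompt = build_prompt("Is Virgin Australia and Virgin Blue the same airlines?")
print(prompt)
```

The returned string can be passed to the tokenizer in place of `template`.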

Framework versions

Transformers 4.37.2
optimum 1.16.2
auto-gptq 0.6.0
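
To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` may help. This is a sketch under the assumption that the packages were installed under the distribution names `transformers`, `optimum` and `auto-gptq`; the helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card.
EXPECTED = {"transformers": "4.37.2", "optimum": "1.16.2", "auto-gptq": "0.6.0"}


def check_versions(expected: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


for name, (want, have) in check_versions(EXPECTED).items():
    print(f"{name}: card lists {want}, found {have or 'not installed'}")
```

The comparison is a strict string match, so a build with a local suffix (for example a CUDA-specific wheel) is reported as a mismatch even if it is otherwise compatible. Newer versions may also work; the list only records what was used here.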

Citation

@misc {bhavyaaiplanet,
    author       = { {Bhavya Bhola} },
    title        = { Quantized version of effi-7b by AI Planet},
    year         = 2024,
    url          = { https://huggingface.co/aiplanet/effi-7b-gptq },
    publisher    = { Hugging Face }
}