---
datasets:
- yahma/alpaca-cleaned
---

# Platypus2-70B-instruct-4bit-gptq

Platypus2-70B-instruct-4bit-gptq is a quantized version of [`garage-bAInd/Platypus2-70B-instruct`](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct) produced with GPTQ quantization. The quantized model is only about 35 GB, compared to roughly 127 GB for the original garage-bAInd/Platypus2-70B-instruct, and can run on a single A6000 GPU.

### Benchmark Metrics

Benchmark metrics will be reported soon.

### Model Details

* **Quantized by**: Mohamad.Alhajar@wiro.ai
* **Model type:** quantized version of Platypus2-70B-instruct using 4-bit GPTQ quantization
* **Language(s)**: English
* **License**: Non-Commercial Creative Commons license ([CC BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/))

### Prompt Template
```
### Instruction:

<prompt> (without the <>)

### Response:
```

### Training Dataset

`Platypus2-70B-instruct-4bit-gptq` was quantized with GPTQ using the Alpaca dataset [`yahma/alpaca-cleaned`](https://huggingface.co/datasets/yahma/alpaca-cleaned) as calibration data.

### Training Procedure

`garage-bAInd/Platypus2-70B-instruct` was quantized with GPTQ on 2x L40 48 GB GPUs (a quantization sketch is included at the end of this card).

## How to Get Started with the Model

First install auto_gptq:

```shell
pip install auto_gptq
```

Then use the code sample below to interact with the model.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "malhajar/Platypus2-70B-instruct-4bit-gptq"

# Load the 4-bit GPTQ weights
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    inject_fused_attention=False,
    use_safetensors=True,
    trust_remote_code=False,
    use_triton=False,
    quantize_config=None,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Who was the first person to walk on the moon?"

# Format the question with the prompt template shown above
prompt = f'''
### Instruction:

{question}

### Response:'''

# Generate a response
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids)
response = tokenizer.decode(output[0])

print(response)
```

### Citations
```bibtex
@article{platypus2023,
  title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs},
  author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},
  booktitle={arXiv preprint arxiv:2308.07317},
  year={2023}
}
```
```bibtex
@misc{touvron2023llama,
  title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
  author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and others},
  year={2023},
  eprint={2307.09288},
  archivePrefix={arXiv}
}
```
```bibtex
@misc{frantar2023gptq,
  title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers},
  author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
  year={2023},
  eprint={2210.17323},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
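
### Quantization Sketch

For reference, below is a minimal sketch of how a 4-bit GPTQ checkpoint like this one can be produced with `auto_gptq`, using `yahma/alpaca-cleaned` as calibration data. The exact settings used for this model (group size, act-order, number of calibration samples, calibration prompt format) are not documented in this card, so the values below (`group_size=128`, `desc_act=False`, 128 samples, the Platypus prompt template) are illustrative assumptions rather than the author's recorded configuration.

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from datasets import load_dataset
from transformers import AutoTokenizer

base_id = "garage-bAInd/Platypus2-70B-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Illustrative settings; the actual group size / act-order used for this model are not documented here.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # assumed group size
    desc_act=False,  # assumed: no act-order
)

# Build a small calibration set from yahma/alpaca-cleaned (sample count is an assumption)
calib = load_dataset("yahma/alpaca-cleaned", split="train").select(range(128))
examples = []
for row in calib:
    enc = tokenizer(
        f"### Instruction:\n\n{row['instruction']}\n\n### Response:\n{row['output']}",
        return_tensors="pt",
    )
    examples.append({"input_ids": enc.input_ids, "attention_mask": enc.attention_mask})

# Load the full-precision model and run GPTQ calibration
model = AutoGPTQForCausalLM.from_pretrained(base_id, quantize_config)
model.quantize(examples)

# Save the 4-bit checkpoint as safetensors
model.save_quantized("Platypus2-70B-instruct-4bit-gptq", use_safetensors=True)
```

The resulting directory can then be loaded with `AutoGPTQForCausalLM.from_quantized`, as shown in the "How to Get Started with the Model" section above.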