---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- bitsandbytes
- bnb
- 4bit
- falcon
- tiiuae
- 7b
- quantized
---

# Model Card for alokabhishek/falcon-7b-instruct-bnb-4bit

This repo contains a 4-bit quantized (using bitsandbytes) version of Technology Innovation Institute's tiiuae/falcon-7b-instruct model.

## Model Details

- Model creator: [Technology Innovation Institute](https://huggingface.co/tiiuae)
- Original model: [falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)

### About 4-bit quantization using bitsandbytes

- QLoRA: Efficient Finetuning of Quantized LLMs: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the packages

```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # Install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
```

#### Import the libraries

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
```

#### Use a pipeline as a high-level helper

```python
model_id_falcon = "alokabhishek/falcon-7b-instruct-bnb-4bit"

# Load the tokenizer and the pre-quantized model (quantization settings are read from the repo's config)
tokenizer_falcon = AutoTokenizer.from_pretrained(model_id_falcon, use_fast=True)
model_falcon = AutoModelForCausalLM.from_pretrained(
    model_id_falcon,
    device_map="auto"
)

# Wrap the model and tokenizer in a text-generation pipeline
pipe_falcon = pipeline(task="text-generation", model=model_falcon, tokenizer=tokenizer_falcon)

prompt_falcon = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_falcon = pipe_falcon(prompt_falcon, max_new_tokens=512)
print(output_falcon[0]["generated_text"])
```

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
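
## Appendix: quantizing the original model yourself (optional)

For reference, below is a minimal sketch of how a Falcon checkpoint can be quantized to 4-bit at load time with bitsandbytes, following the QLoRA paper and blog post linked above. The NF4 / double-quantization / bfloat16 settings shown here are typical defaults and are an assumption, not a confirmed record of how this particular repo was produced; the exact configuration used for this checkpoint is stored in its `config.json`.

```python
# Minimal sketch (assumed settings, not the exact recipe behind this repo):
# load the original tiiuae/falcon-7b-instruct model and quantize it to 4-bit on the fly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "tiiuae/falcon-7b-instruct"

# Typical QLoRA-style 4-bit settings: NF4 quantization, double quantization,
# and bfloat16 compute dtype for the de-quantized matmuls.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# The quantized model can then be pushed to the Hub, which is how repos
# like this one are typically created (repo name below is a placeholder).
# model.push_to_hub("your-username/falcon-7b-instruct-bnb-4bit")
# tokenizer.push_to_hub("your-username/falcon-7b-instruct-bnb-4bit")
```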