---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- bitsandbytes
- bnb
- 4bit
- falcon
- tiiuae
- 7b
- quantized
---

# Model Card for alokabhishek/falcon-7b-instruct-bnb-4bit

This repo contains a 4-bit quantized (using bitsandbytes) version of Technology Innovation Institute's tiiuae/falcon-7b-instruct model.

## Model Details

- Model creator: [Technology Innovation Institute](https://huggingface.co/tiiuae)
- Original model: [falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)

### About 4-bit quantization using bitsandbytes

- QLoRA: Efficient Finetuning of Quantized LLMs: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the packages

```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # Install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
```

#### Import the libraries

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
```

#### Use a pipeline as a high-level helper

```python
model_id_falcon = "alokabhishek/falcon-7b-instruct-bnb-4bit"

# Load the tokenizer and the pre-quantized model (quantization settings are read from the repo's config)
tokenizer_falcon = AutoTokenizer.from_pretrained(model_id_falcon, use_fast=True)
model_falcon = AutoModelForCausalLM.from_pretrained(
    model_id_falcon,
    device_map="auto"
)

# Wrap the model and tokenizer in a text-generation pipeline
pipe_falcon = pipeline(task="text-generation", model=model_falcon, tokenizer=tokenizer_falcon)

prompt_falcon = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_falcon = pipe_falcon(prompt_falcon, max_new_tokens=512)
print(output_falcon[0]["generated_text"])
```

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
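
## Appendix: quantizing the original model yourself (optional)

For reference, below is a minimal sketch of how a Falcon checkpoint can be quantized to 4-bit at load time with bitsandbytes, following the QLoRA paper and blog post linked above. The NF4 / double-quantization / bfloat16 settings shown here are typical defaults and are an assumption, not a confirmed record of how this particular repo was produced; the exact configuration used for this checkpoint is stored in its `config.json`.

```python
# Minimal sketch (assumed settings, not the exact recipe behind this repo):
# load the original tiiuae/falcon-7b-instruct model and quantize it to 4-bit on the fly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "tiiuae/falcon-7b-instruct"

# Typical QLoRA-style 4-bit settings: NF4 quantization, double quantization,
# and bfloat16 compute dtype for the de-quantized matmuls.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# The quantized model can then be pushed to the Hub, which is how repos
# like this one are typically created (repo name below is a placeholder).
# model.push_to_hub("your-username/falcon-7b-instruct-bnb-4bit")
# tokenizer.push_to_hub("your-username/falcon-7b-instruct-bnb-4bit")
```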