---
library_name: transformers
datasets:
- bergr7f/databricks-dolly-15k-subset-general_qa
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---

## Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model was fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient to deploy on resource-constrained hardware.

### Model Architecture

- Base Model: Llama-3.2-1B
- Parameters: approximately 1 billion
- Quantization: 4-bit, using the bitsandbytes library
- Fine-tuning Method: PEFT with LoRA

## Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset, a subset of the larger Databricks Dolly 15k dataset that focuses on general question-answering tasks.

### Training Procedure

Fine-tuning configuration:

- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.5
- Number of Epochs: 30
- Batch Size: 2 (per device)
- Learning Rate: 2e-5
- Evaluation Strategy: evaluated at each epoch
- Optimizer: AdamW
- Mixed Precision: FP16
- Hardware Used: single RTX 4070 (8 GB)
- Libraries: transformers, datasets, peft, bitsandbytes, trl, evaluate

An indicative code sketch of this configuration is included in the "Fine-tuning Setup (Sketch)" section below.

## Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

## Limitations and Biases

- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may not have information on events or developments that occurred after the dataset was created.
- Accuracy: The model may occasionally produce incorrect or nonsensical responses. Always verify critical information against reliable sources.
- Biases: The model may inherit biases present in the training data. Users should critically evaluate its outputs, especially in sensitive contexts.

## Acknowledgements

- Base Model: Meta AI's Llama-3.2-1B
- Dataset: Databricks Dolly 15k Subset for General QA
- Libraries Used:
  - Transformers
  - PEFT
  - TRL
  - BitsAndBytes
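
## Fine-tuning Setup (Sketch)

The original training script is not published with this card. The sketch below is a minimal, hedged reconstruction of the configuration listed under Training Procedure: the hyperparameter values come from the card, while the quantization type (NF4), the dataset column names and split handling, the prompt format, and the output path are assumptions. The card lists trl among the training libraries, but for version stability this sketch uses the plain transformers Trainer instead of trl's SFTTrainer.

```python
# Hedged reconstruction -- hyperparameters mirror the card; everything else is assumed.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model_id = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# 4-bit quantization with bitsandbytes (NF4 is assumed; the card only says "4-bit")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA settings from the Training Procedure section
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.5, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Column names ("instruction", "response") are assumed from the parent Dolly 15k dataset
dataset = load_dataset("bergr7f/databricks-dolly-15k-subset-general_qa")

def tokenize(example):
    text = f"[Question] {example['instruction']}\n[Answer] {example['response']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, remove_columns=dataset["train"].column_names)
# The original train/eval split is not documented; a 90/10 split is assumed here
splits = tokenized.train_test_split(test_size=0.1, seed=42)

training_args = TrainingArguments(
    output_dir="llama-3.2-1b-generalqa-lora",   # arbitrary local path
    num_train_epochs=30,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    eval_strategy="epoch",   # named evaluation_strategy in older transformers releases
    fp16=True,
    optim="adamw_torch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```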
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

# Read the adapter config to locate the base model, then load the base model
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter weights to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
```

## Inference

```python
from transformers import GenerationConfig

def create_chat_template(question, context):
    text = f"""
    [Instruction] You are a question-answering agent which answers the question based on the related reviews. If related reviews are not provided, you can generate the answer based on the question.\n
    [Question] {question}\n
    [Related Reviews] {context}\n
    [Answer]
    """
    return text

def generate_response(question, context=""):
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)

    config = GenerationConfig(
        max_new_tokens=256,
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True
    )

    # Generate with the adapter-augmented model
    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```
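
## Merging the Adapter (Optional)

If you want a standalone checkpoint without the adapter indirection at inference time, the LoRA weights can be folded into the base model with peft's merge_and_unload. This brief sketch reuses the model and tokenizer objects from the How to Use section; the output directory name is arbitrary.

```python
# Fold the LoRA weights into the base model and save a standalone copy
merged_model = peft_loaded_model.merge_and_unload()
merged_model.save_pretrained("llama-3.2-1b-generalqa-merged")   # arbitrary local path
tokenizer.save_pretrained("llama-3.2-1b-generalqa-merged")
```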