Meta-Llama 3.1 8B Text-to-SQL GPTQ Model

This repository provides a 4-bit GPTQ-quantized version of the 8-billion-parameter Meta-Llama 3.1 model, fine-tuned for text-to-SQL generation. GPTQ quantization reduces the memory footprint and makes inference more efficient. The sections below show how to install the dependencies, load the model, and generate SQL queries.

Model Details

Model Size: 8B
Quantization: GPTQ (4-bit)
Languages Supported: English, Italian
Task: Text-to-SQL generation
License: Apache 2.0

Installation Requirements

Before using the model, ensure that you have the following dependencies installed. We recommend using the same versions to avoid any compatibility issues.

# Install the required PyTorch version with CUDA support (ensure CUDA 12.1 is installed).
# The leading "!" runs the command from a notebook cell; drop it in a regular shell.
!pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

# Install AutoGPTQ for quantized model handling
!pip install auto-gptq --no-build-isolation

# Install Optimum for model optimization
!pip install optimum

After installing the dependencies, restart your runtime (or kernel) so that the newly installed packages are picked up.
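
Once the packages are installed, it can help to confirm that the versions in use match the ones recommended above. The following is a minimal sketch using only the standard library; the `EXPECTED` table and the `check_environment` helper are illustrative names, not part of this repository, and the check ignores local build tags such as `+cu121`.

```python
from importlib import metadata

# Packages named in the installation steps above. A value of None means
# this README does not recommend a specific version for that package.
EXPECTED = {
    "torch": "2.2.1",
    "torchvision": "0.17.1",
    "torchaudio": "2.2.1",
    "auto-gptq": None,
    "optimum": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_recommendation)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Strip a local build tag ("2.2.1+cu121" -> "2.2.1") before comparing.
        base = installed.split("+")[0] if installed else None
        ok = installed is not None and (wanted is None or base == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "check"
        print(f"{name:12s} {installed or 'not installed':20s} {status}")
```

A package reported as "check" is either missing or at a different version than the one listed in the installation commands.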

Loading the Model

To load the quantized Meta-Llama 3.1 model and use it for text-to-SQL tasks, use the following Python code:

from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
import torch

# Define the Alpaca-style prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Model directory and tokenizer
quantized_model_dir = "meta-llama-8b-quantized-4bit"  # Path where quantized model is saved
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

# Load the quantized model
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device_map="auto",  # Automatically map the model to the available device (GPU or CPU)
    torch_dtype=torch.float16,  # Ensure FP16 for efficiency
    use_safetensors=True  # If you saved the model using safetensors format, set this to True
)

# Set up the text generation pipeline without specifying the device.
# The result is stored under a new name so that it does not shadow the
# imported `pipeline` function (re-running this cell would otherwise fail).
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Function to generate SQL query from input text using the Alpaca prompt
def generate_sql(input_text):
    # Format the prompt
    prompt = alpaca_prompt.format(
        "Provide the SQL query",
        input_text
    )

    # Generate the response using the pipeline.
    # max_new_tokens limits only the generated tokens; max_length would also
    # count the prompt and could cut the answer short for long requests.
    generated_text = text_generator(
        prompt,
        max_new_tokens=200,
        eos_token_id=tokenizer.eos_token_id
    )[0]["generated_text"]

    # Clean the output by removing the prompt and any extra newlines
    cleaned_output = generated_text.replace(prompt, '').strip()

    return cleaned_output

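# Example usage. The Italian request means:
# "Select all columns from table table1 where the column anni equals 2020"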
italian_input = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
sql_query = generate_sql(italian_input)
print(sql_query)
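
The cleanup step in `generate_sql` removes the prompt from the generated text. The sketch below shows one possible way to make that step stricter without loading the model, on the assumption that the model may sometimes keep writing after the query (for example, by starting another `###` section). The helper names `build_prompt` and `extract_sql` are hypothetical and not part of this repository; the template is copied from the loading code above.

```python
# Same Alpaca-style template as in the loading code above.
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

RESPONSE_MARKER = "### Response:"


def build_prompt(input_text, instruction="Provide the SQL query"):
    """Fill the template with the instruction and the user's request."""
    return ALPACA_PROMPT.format(instruction, input_text)


def extract_sql(generated_text):
    """Pull a single SQL statement out of the text returned by the model."""
    # Keep what follows the first response marker; if the marker is absent
    # (prompt already stripped), fall back to the whole text.
    head, marker, tail = generated_text.partition(RESPONSE_MARKER)
    answer = tail if marker else head
    # Assumption: anything after a further "###" header is not part of the answer.
    answer = answer.split("###", 1)[0]
    # Keep everything up to and including the first semicolon. This is a
    # simplification: a semicolon inside a string literal would cut the query.
    statement, semicolon, _ = answer.partition(";")
    return (statement + semicolon).strip()


if __name__ == "__main__":
    prompt = build_prompt("List all rows of table1")
    # Simulated model output, used here only to exercise extract_sql.
    simulated = prompt + "SELECT * FROM table1;\n\n### Instruction:\nignored"
    print(extract_sql(simulated))
```

Keeping prompt construction and output parsing in separate functions also makes both easy to test without a GPU.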

Example Usage

The script above shows how to generate SQL queries from natural-language text. Pass a request in Italian or English to generate_sql, and the model returns the corresponding SQL query.

Example input:

italian_input = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
sql_query = generate_sql(italian_input)
print(sql_query)

Example output:

SELECT * FROM table1 WHERE anni = 2020;
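
A generated query is not guaranteed to be valid, so it can be worth checking it before running it against real data. The sketch below is one possible approach, assuming the target database is compatible with SQLite syntax: it compiles the query with `EXPLAIN` against an empty in-memory database. The `is_valid_sql` helper and the `SCHEMA` definition for `table1` are hypothetical and serve only as illustration.

```python
import sqlite3

# Hypothetical schema matching the example request above.
SCHEMA = "CREATE TABLE table1 (id INTEGER PRIMARY KEY, anni INTEGER);"


def is_valid_sql(query, schema_ddl=SCHEMA):
    """Return (True, None) if SQLite can compile the query, else (False, error)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Create the (empty) tables so that table and column names resolve.
        conn.executescript(schema_ddl)
        # EXPLAIN compiles the statement without executing it, so syntax
        # errors and unknown tables or columns are reported here.
        conn.execute("EXPLAIN " + query)
        return True, None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # sqlite3.Warning covers older Python versions that reject
        # multi-statement input with a Warning instead of an Error.
        return False, str(exc)
    finally:
        conn.close()


if __name__ == "__main__":
    print(is_valid_sql("SELECT * FROM table1 WHERE anni = 2020;"))
    print(is_valid_sql("SELECT * FROM missing_table;"))
```

This only checks that the query compiles under SQLite's dialect; it does not confirm that the query answers the original request.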

Model Tags

text-generation-inference
transformers
llama
trl
sft

License

This model is released under the Apache License 2.0.