Fine-tuning Mistral on Your Dataset
This tutorial will walk you through the process of fine-tuning the Mistral-7B-Instruct model on your own dataset using the Hugging Face Transformers and PEFT libraries
Step 0: Install required libraries
!pip install -q datasets accelerate evaluate trl accelerate bitsandbytes peft
Step 1: Load and format your dataset
We'll define a function to format the prompts in the dataset and load the dataset:
EOS_TOKEN = tokenizer.eos_token
def format_prompts(examples):
"""
Define the format for your dataset
This function should return a dictionary with a 'text' key containing the formatted prompts
"""
pass
from datasets import load_dataset
dataset = load_dataset("your_dataset_name", split="train")
dataset = dataset.map(format_prompts, batched=True)
Step 2: Set up the model and tokenizer
Next, we'll load the pre-trained Mistral-7B-Instruct model and tokenizer, and set up the model for quantization and gradient checkpointing.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
Step 3: Set up PEFT (Parameter-Efficient Fine-Tuning)
We'll use the PEFT technique to fine-tune the model efficiently. This involves setting up a LoraConfig and getting the PEFT model.
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=32,
lora_alpha=64,
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
"lm_head",
],
bias="none",
lora_dropout=0.05,
task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
Step 4: Set up the training arguments
We'll define the training arguments for the fine-tuning process.
from transformers import TrainingArguments
args = TrainingArguments(
output_dir="your_model_name",
num_train_epochs=40, # replace this, depending on your dataset
per_device_train_batch_size=16,
learning_rate=1e-5,
save_steps=100000,
optim="sgd",
optim_target_modules=["attn", "mlp"]
)
Replace "your_model_name"
with the desired name for your fine-tuned model.
Step 5: Initialize the trainer and fine-tune the model
Now, we'll initialize the SFTTrainer from the trl
library and train the model.
from trl import SFTTrainer
trainer = SFTTrainer(
model=model,
args=args,
train_dataset=dataset,
dataset_text_field='text',
max_seq_length=1024,
)
trainer.train()
Step 6: Merge the adapter and model back together
When the fine tuning is finished, you can merge the model back together.
adapter_model = trainer.model
merged_model = adapter_model.merge_and_unload()
trained_tokenizer = trainer.tokenizer
Step 7: Push the fine-tuned model to the Hugging Face Hub
After all of that, you can optionally push the fine-tuned model to the Hugging Face Hub for easier sharing and deployment.
repo_id = "your_repo_name"
merged_model.push_to_hub(repo_id)
trained_tokenizer.push_to_hub(repo_id)