LoRA methods
A popular way to efficiently train large models is to insert (typically in the attention blocks) smaller trainable matrices that are a low-rank decomposition of the delta weight matrix to be learnt during finetuning. The pretrained model’s original weight matrix is frozen and only the smaller matrices are updated during training. This reduces the number of trainable parameters, reducing memory usage and training time which can be very expensive for large models.
There are several different ways to express the weight matrix as a low-rank decomposition, but Low-Rank Adaptation (LoRA) is the most common method. The PEFT library supports several other LoRA variants, such as Low-Rank Hadamard Product (LoHa), Low-Rank Kronecker Product (LoKr), and Adaptive Low-Rank Adaptation (AdaLoRA). You can learn more about how these methods work conceptually in the Adapters guide. If you’re interested in applying these methods to other tasks and use cases like semantic segmentation, token classification, take a look at our notebook collection!
Additionally, PEFT supports the X-LoRA Mixture of LoRA Experts method.
This guide will show you how to quickly train an image classification model - with a low-rank decomposition method - to identify the class of food shown in an image.
Some familiarity with the general process of training an image classification model would be really helpful and allow you to focus on the low-rank decomposition methods. If you’re new, we recommend taking a look at the Image classification guide first from the Transformers documentation. When you’re ready, come back and see how easy it is to drop PEFT in to your training!
Before you begin, make sure you have all the necessary libraries installed.
pip install -q peft transformers datasets
Dataset
In this guide, you’ll use the Food-101 dataset which contains images of 101 food classes (take a look at the dataset viewer to get a better idea of what the dataset looks like).
Load the dataset with the load_dataset function.
from datasets import load_dataset
ds = load_dataset("food101")
Each food class is labeled with an integer, so to make it easier to understand what these integers represent, you’ll create a label2id
and id2label
dictionary to map the integer to its class label.
labels = ds["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
label2id[label] = i
id2label[i] = label
id2label[2]
"baklava"
Load an image processor to properly resize and normalize the pixel values of the training and evaluation images.
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
You can also use the image processor to prepare some transformation functions for data augmentation and pixel scaling.
from torchvision.transforms import (
CenterCrop,
Compose,
Normalize,
RandomHorizontalFlip,
RandomResizedCrop,
Resize,
ToTensor,
)
normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
[
RandomResizedCrop(image_processor.size["height"]),
RandomHorizontalFlip(),
ToTensor(),
normalize,
]
)
val_transforms = Compose(
[
Resize(image_processor.size["height"]),
CenterCrop(image_processor.size["height"]),
ToTensor(),
normalize,
]
)
def preprocess_train(example_batch):
example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
return example_batch
def preprocess_val(example_batch):
example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
return example_batch
Define the training and validation datasets, and use the set_transform function to apply the transformations on-the-fly.
train_ds = ds["train"]
val_ds = ds["validation"]
train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)
Finally, you’ll need a data collator to create a batch of training and evaluation data and convert the labels to torch.tensor
objects.
import torch
def collate_fn(examples):
pixel_values = torch.stack([example["pixel_values"] for example in examples])
labels = torch.tensor([example["label"] for example in examples])
return {"pixel_values": pixel_values, "labels": labels}
Model
Now let’s load a pretrained model to use as the base model. This guide uses the google/vit-base-patch16-224-in21k model, but you can use any image classification model you want. Pass the label2id
and id2label
dictionaries to the model so it knows how to map the integer labels to their class labels, and you can optionally pass the ignore_mismatched_sizes=True
parameter if you’re finetuning a checkpoint that has already been finetuned.
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
model = AutoModelForImageClassification.from_pretrained(
"google/vit-base-patch16-224-in21k",
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
)
PEFT configuration and model
Every PEFT method requires a configuration that holds all the parameters specifying how the PEFT method should be applied. Once the configuration is setup, pass it to the get_peft_model() function along with the base model to create a trainable PeftModel.
Call the print_trainable_parameters() method to compare the number of parameters of PeftModel versus the number of parameters in the base model!
LoRA decomposes the weight update matrix into two smaller matrices. The size of these low-rank matrices is determined by its rank or r
. A higher rank means the model has more parameters to train, but it also means the model has more learning capacity. You’ll also want to specify the target_modules
which determine where the smaller matrices are inserted. For this guide, you’ll target the query and value matrices of the attention blocks. Other important parameters to set are lora_alpha
(scaling factor), bias
(whether none
, all
or only the LoRA bias parameters should be trained), and modules_to_save
(the modules apart from the LoRA layers to be trained and saved). All of these parameters - and more - are found in the LoraConfig.
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=16,
lora_alpha=16,
target_modules=["query", "value"],
lora_dropout=0.1,
bias="none",
modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
"trainable params: 667,493 || all params: 86,543,818 || trainable%: 0.7712775047664294"
Training
For training, let’s use the Trainer class from Transformers. The Trainer
contains a PyTorch training loop, and when you’re ready, call train to start training. To customize the training run, configure the training hyperparameters in the TrainingArguments class. With LoRA-like methods, you can afford to use a higher batch size and learning rate.
AdaLoRA has an update_and_allocate() method that should be called at each training step to update the parameter budget and mask, otherwise the adaptation step is not performed. This requires writing a custom training loop or subclassing the Trainer to incorporate this method. As an example, take a look at this custom training loop.
from transformers import TrainingArguments, Trainer
account = "stevhliu"
peft_model_id = f"{account}/google/vit-base-patch16-224-in21k-lora"
batch_size = 128
args = TrainingArguments(
peft_model_id,
remove_unused_columns=False,
eval_strategy="epoch",
save_strategy="epoch",
learning_rate=5e-3,
per_device_train_batch_size=batch_size,
gradient_accumulation_steps=4,
per_device_eval_batch_size=batch_size,
fp16=True,
num_train_epochs=5,
logging_steps=10,
load_best_model_at_end=True,
label_names=["labels"],
)
Begin training with train.
trainer = Trainer( model, args, train_dataset=train_ds, eval_dataset=val_ds, tokenizer=image_processor, data_collator=collate_fn, ) trainer.train()
Share your model
Once training is complete, you can upload your model to the Hub with the push_to_hub method. You’ll need to login to your Hugging Face account first and enter your token when prompted.
from huggingface_hub import notebook_login
notebook_login()
Call push_to_hub to save your model to your repositoy.
model.push_to_hub(peft_model_id)
Inference
Let’s load the model from the Hub and test it out on a food image.
from peft import PeftConfig, PeftModel
from transformers import AutoImageProcessor
from PIL import Image
import requests
config = PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")
model = AutoModelForImageClassification.from_pretrained(
config.base_model_name_or_path,
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(model, "stevhliu/vit-base-patch16-224-in21k-lora")
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
image
Convert the image to RGB and return the underlying PyTorch tensors.
encoding = image_processor(image.convert("RGB"), return_tensors="pt")
Now run the model and return the predicted class!
with torch.no_grad():
outputs = model(**encoding)
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
"Predicted class: beignets"