Low Code Large Language Model Alignment
In this tutorial we will fine-tune a language model without writing complex code, relying mainly on UI tools. We will use the autotrain-advanced library to fine-tune a small language model on a custom dataset, and Argilla's UI to create and review that dataset. The tutorial follows these core steps:
- Create a dataset using Argilla's UI
- Export the dataset to the hub
- Train a language model using the AutoTrain UI
- Evaluate the model using lighteval
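If you also want to follow the SDK and CLI alternatives shown alongside the UI steps, the relevant packages can be installed from PyPI (pin versions to suit your environment):
pip install argilla datasets autotrain-advanced lighteval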
Create a Dataset
First we will create a dataset in Argilla based on an existing dataset. We will take a general approach to this, but in a real-world scenario you would want to create a dataset based on your specific use case. For example, you could filter the dataset to only include certain categories or topics.
Start from an open-source dataset
We will work with Maxime Labonne's dataset of 40k samples with chosen and rejected completions. The dataset is available on the Hub at mlabonne/orpo-dpo-mix-40k. Below is a preview of the dataset.
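If you prefer to preview the data programmatically rather than in the Hub's dataset viewer, here is a minimal sketch using the datasets library:

from datasets import load_dataset

# Load the preference dataset (chosen and rejected completions) from the Hub
preview = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
print(preview)       # row count and column names
print(preview[0])    # inspect a single example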
Importing the dataset into Argilla
To import the dataset into Argilla, we will use the 'Create Dataset' feature in the UI. For a more detailed guide on how to create a dataset in Argilla, check out this blog post.
The video below shows how to import the dataset into Argilla by pasting the dataset's repo id into the 'Create Dataset' form. In our case, the repo id is mlabonne/orpo-dpo-mix-40k.
Argilla will suggest a configuration based on the dataset. We can then add questions in the UI, such as a rating question for relevance. This will allow us to filter the dataset based on that question.
The dataset configurator lets you define the fields and questions for review. The fields can be text, conversation, or images. The questions can be labels, ratings, rankings, or text.
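The same kind of configuration can also be defined programmatically with Argilla's Python SDK. Below is a minimal sketch of a settings object with a text field and a relevance rating question; the field name, dataset name, and rating scale are illustrative placeholders rather than values taken from the UI walkthrough.

import argilla as rg

client = rg.Argilla(api_key="<argilla_api_key>", api_url="<argilla_api_url>")

# Illustrative configuration: one text field plus a rating question for relevance
settings = rg.Settings(
    fields=[rg.TextField(name="conversation")],
    questions=[rg.RatingQuestion(name="relevance", values=[1, 2, 3, 4, 5])],
)

# Create an empty dataset with these settings in your Argilla workspace
dataset = rg.Dataset(name="orpo-dpo-mix-40k-review", settings=settings, client=client)
dataset.create()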
Filter the dataset
After importing the dataset, we can filter it based on the relevance question we added. In our case, we will filter the dataset to only include completions that are relevant to the prompt. We will also filter the dataset to only include completions that are at least 10 tokens long.
Export the dataset to the Hub
After filtering the dataset, we can export it to the Hub. This will allow us to access the dataset in AutoTrain. Unfortunately, the export feature is not yet available in the UI, so we will use the Python SDK to export the dataset.
import argilla as rg
from datasets import Dataset

# Connect to your Argilla instance
client = rg.Argilla(api_key="<argilla_api_key>", api_url="<argilla_api_url>")
dataset = client.datasets("<dataset_name>")

# Process Argilla records, preferring human responses over model suggestions
dataset_rows = []
for record in dataset.records(with_suggestions=True, with_responses=True):
    row = record.fields
    if len(record.responses) == 0:
        # No human response yet, so fall back to the suggested answer
        row["correct_answer"] = record.suggestions["correct_answer"].value
    else:
        for response in record.responses:
            if response.question_name == "correct_answer":
                row["correct_answer"] = response.value
    dataset_rows.append(row)

# Create a Hugging Face dataset and push it to the Hub
hf_dataset = Dataset.from_list(dataset_rows)
hf_dataset.push_to_hub(repo_id="<your_username>/<dataset_repo_id>")
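Once pushed, it is worth sanity-checking that the exported dataset loads back cleanly from the Hub; a small sketch, reusing the placeholder repo id from the snippet above:

from datasets import load_dataset

# Reload the exported dataset and spot-check one row
exported = load_dataset("<your_username>/<dataset_repo_id>", split="train")
print(exported)
print(exported[0]["correct_answer"])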
Fine-tune
With a dataset on the Hub, we can now fine-tune a language model using the AutoTrain UI. Alternatively, you can use the autotrain-advanced library to fine-tune a language model from the command line; a sketch of such a CLI command is shown below, after the parameter table.
Select the algorithm
There are countless fine-tuning algorithms for LLMs to choose from, and many of them are supported by AutoTrain. We will work with the ORPO algorithm because it's simple to use and delivers significant improvements on base models.
ORPO (Odds Ratio Preference Optimization) is a streamlined fine-tuning technique that merges two stages, supervised fine-tuning (SFT) and preference alignment, into one. This integration reduces both the computational load and training time. Traditionally, fine-tuning large language models (LLMs) for specific tasks involves SFT to adapt the model's domain knowledge and then preference alignment (such as RLHF or Direct Preference Optimization) to prioritize preferred responses over undesirable ones. ORPO addresses an issue in the traditional method where SFT inadvertently increases the probability of both desirable and undesirable outputs, necessitating an additional alignment phase.
Developed by Hong and Lee in 2024, ORPO combines these processes by modifying the model's objective function to include a loss term that rewards preferred responses while penalizing rejected ones. This approach has been shown to outperform other alignment methods across different model sizes and tasks. ORPO's efficiency and improved alignment make it a promising alternative for fine-tuning LLMs like Llama 3.
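Concretely, the objective from the ORPO paper combines the usual SFT loss on the chosen completion with an odds-ratio term over the chosen and rejected completions, roughly:

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}},
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right),
\qquad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

where $y_w$ is the chosen completion, $y_l$ the rejected one, and $\lambda$ weights the preference term against the SFT loss.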
In the AutoTrain UI, you can select the ORPO algorithm from the dropdown menu on the left. As shown in the image below, you can also adjust the hyperparameters for the algorithm.
Select the base model
The Hugging Face Hub contains thousands of language models that we could use as a base model for fine-tuning. Many of them are evaluated on general benchmarks and can be used as a starting point for fine-tuning. To access benchmark scores for a model, you can use the Open LLM Leaderboard.
We will use the SmolLM2 model as a base because it has only 1.7 billion parameters, which means it will run on a wide range of hardware. It also performs well on general benchmarks, so we can expect reasonable performance on a number of use cases.
Train the model
We can train the model in AutoTrain using the UI. The AutoTrain homepage lets you select the hardware you want to use for training and then creates a Space on Hugging Face for the job. The hardware you select determines the number of GPUs you can use and the amount of VRAM you have access to. We recommend starting with an Nvidia L40.
After a few minutes the AutoTrain UI will be ready for you to start training your model. You can start training by selecting the dataset, the model you want to fine-tune, and the training parameters.
Start with the default parameters and adjust them as needed. Below is a list of the parameters you can adjust.
Parameter | Example | Description |
---|---|---|
model | HuggingFaceTB/SmolLM2-135M | The base model to finetune from |
project-name | my-autotrain-llm | The name of your finetuned model |
data-path | autotrain_data | The dataset repo id |
trainer | orpo | The training algorithm to use |
lr | 2e-5 | The learning rate |
batch-size | 4 | The batch size for training and evaluation |
epochs | 1 | The number of epochs to train |
block-size | 512 | The maximum input token length |
warmup-ratio | 0.1 | Ratio for learning rate warmup |
lora-r | 16 | LoRA rank |
lora-alpha | 32 | LoRA alpha |
lora-dropout | 0.05 | LoRA dropout rate |
weight-decay | 0.01 | Weight decay for regularization |
gradient-accumulation | 4 | Gradient accumulation steps |
mixed-precision | bf16 | Mixed precision format (e.g., bf16, fp16) |
max-prompt-length | 256 | Maximum length for prompts |
max-completion-length | 256 | Maximum length for completions |
logging-steps | 10 | Steps interval for logging |
save-total-limit | 2 | Limit for total number of saved checkpoints |
seed | 42 | Random seed for reproducibility |
- If you have limited hardware, consider reducing the batch-size and block-size.
- If you have more than 40GB of VRAM, you should increase your batch size.
- Once you've evaluated your model's checkpoints, you might wish to tweak epochs, weight-decay, and the LoRA parameters.
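If you prefer the CLI route mentioned earlier, the parameters above map onto flags of the autotrain command from the autotrain-advanced library. Below is a sketch using the flag names and example values from the table, pointing --data-path at the dataset exported earlier; double-check the flags against autotrain llm --help for your installed version.

autotrain llm \
    --train \
    --model HuggingFaceTB/SmolLM2-135M \
    --project-name my-autotrain-llm \
    --data-path <your_username>/<dataset_repo_id> \
    --trainer orpo \
    --lr 2e-5 \
    --batch-size 4 \
    --epochs 1 \
    --block-size 512 \
    --warmup-ratio 0.1 \
    --lora-r 16 \
    --lora-alpha 32 \
    --lora-dropout 0.05 \
    --weight-decay 0.01 \
    --gradient-accumulation 4 \
    --mixed-precision bf16 \
    --max-prompt-length 256 \
    --max-completion-length 256 \
    --logging-steps 10 \
    --save-total-limit 2 \
    --seed 42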
Evaluate the model
You can now evaluate your trained model. Here we will use some general benchmarks, which can help determine whether the fine-tuned model's performance has changed compared to the base model.
lighteval accelerate \
--model_args "pretrained=HuggingFaceTB/SmolLM2-135M" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
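To see whether fine-tuning actually moved the scores, run the same command against your fine-tuned model and compare the results. The repo id below is a placeholder for wherever your AutoTrain project was pushed (for example, the my-autotrain-llm project name used above):
lighteval accelerate \
--model_args "pretrained=<your_username>/my-autotrain-llm" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"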
For a real-world use case, you would want to evaluate your model on the task that you plan to use it for. In this guide on custom evaluation we show how to do that.