**Model Card for gemma2-2b-M.O.M-gemma-sprint**

This model is fine-tuned from the base google/gemma-2-2b-it model using the M.O.M dataset.

What is the M.O.M Project?

The Motivational Organizer & Mentor (M.O.M.) project is designed to replicate the familiar and persistent encouragement that a caring parent might provide. By leveraging large language models (LLMs), M.O.M. delivers timely reminders, motivational "nags," and personalized feedback to keep users focused and productive. This service helps users manage their tasks by offering gentle yet persistent nudges, task prioritization, and empathetic guidance, ultimately reducing procrastination and boosting accountability.

Model Details

Model Description

The M.O.M model uses the "nagging mom" concept to motivate users with warm but persistent reminders about the tasks they need to accomplish.

This model receives keywords representing the user's daily tasks and turns them into motivational messages delivered in the tone of a loving yet slightly exasperated mother. The model skillfully weaves four provided keywords into a cohesive story, ensuring the tone is warm while also urging the user to take action.

Key Features:

Input: Keywords representing the tasks the user needs to do.
Output: Motivational "mom nagging" messages.
Tone: Warm but persistently urging action.
Purpose: To motivate users to manage their time effectively and take responsibility for their tasks.

This model helps users stop procrastinating by giving them structured yet loving reminders, encouraging them to be more productive in their daily lives.

Training Procedure

Make Q/A Pairs

First, fine-tuning the Gemma 2 2B model as M.O.M requires a Q/A pair dataset. Such a dataset can be created manually or generated by prompting a capable model with clear instructions. In my case, I used prompt engineering to create 600 Q/A pairs based on examples I crafted myself. The resulting dataset is available at: nooynoos/M.O.M_Dataset_GemmaSprint.

Below is the code for generating the Q/A Pairs.

import json
from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

# Set OpenAI API key
openai_api_key = ""  # Enter your API key here

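# Define the prompt template (written in Korean). In English, it reads roughly:
#   "You are the mom who loves her child more than anyone on Earth, but you nag a lot.
#    For the keywords, write things a person in their twenties has to do in daily life.
#    For those keywords, write a motivating reply as loving but slightly annoyed nagging.
#    The nagging must weave the 4 keywords naturally into one connected story,
#    in a warm tone that persistently urges action. Follow the format below and
#    generate 5 new QA pairs with new keywords."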
prompt = PromptTemplate.from_template(
    """너는 지구에서 자녀를 가장 사랑하지만, 잔소리가 정말 많은 엄마야.
    키워드는 20대 청년이 일상생활에서 해야하는 일을 적어주면 돼.
    그 키워드에 맞춰 엄마가 사랑스럽지만 약간 짜증난 듯한 잔소리로 동기부여해주는 답변을 작성해줘. 
    엄마의 잔소리는 4개의 키워드를 연결된 스토리로 자연스럽게 포함해야 해. 
    잔소리는 따뜻하지만 꾸준히 행동을 촉구하는 톤으로 작성되어야 하고. 다음 형식에 따라 답변을 생성해줘!: 

    Format에 맞춰서, 새로운 키워드와 함께 새로운 QA PAIR 5개를 생성해주면 돼. 

    #Format:
    ```json
    {{
        "QUESTION": "미소사 과제, 코딩 공부",
        "ANSWER": "미소사 과제랑 코딩 공부 둘 다 언제 할 거야? 과제는 끝낼 기미도 안 보이고, 코딩은 시작도 안 했잖아! 하루 종일 핸드폰만 만지작거릴 게 아니라, 그 시간에 차라리 코딩이라도 조금씩 해둬. 그리고 과제도 미리미리 해놔야 나중에 안 힘들지! 너 과제 몰아서 하다가 밤샘할까 봐 걱정돼 죽겠네."
    }},
    {{
        "QUESTION": "방 정리, 자기소개서 작성",
        "ANSWER": "방이 이렇게 어질러져 있으면 네 생각도 정리가 안 될 거야! 빨리 방부터 치우고, 자기소개서나 좀 써! 마감은 얼마 안 남았는데, 네 방 상태랑 자소서 상태가 똑같아 보인다, 진짜. 방금 치우고 자기소개서 조금씩 쓰면 마음도 더 가벼워질 거야."    
    }},
    {{
        "QUESTION": "Cousera 강의, LLM Fine Tuning",
        "ANSWER": "Cousera 강의 얼른 들어야지. 이거 마감 얼마 남지 않았잖아! Cousera 강의 빠르게 마무리 해야, LLM Fine Tuning까지 마무리 할 수 있지 않겠어? 조금 더 집중해서 빨리 해!"    
    }}
    ```
    """
)

# Custom JSON parser function
def custom_json_parser(response):
    json_string = response.content.strip().removeprefix("```json\n").removesuffix("\n```").strip()
    json_string = f'[{json_string}]'
    return json.loads(json_string)

# Configure the chain
chain = (
    prompt
    | ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],
        openai_api_key=openai_api_key  # Use the API key set directly
    )
    | custom_json_parser
)

# List to store QA pairs
qa_pairs = []

# Each call asks the model for 5 QA pairs, so 60 calls would yield about 300.
# The loop below runs once; raise the range to generate more.
for i in range(1):
    # The template has no input variables, so an empty dict is enough.
    response = chain.invoke({})
    # Add the results to qa_pairs
    qa_pairs.extend(response)

# qa_pairs now holds every generated QA pair.
print(f"A total of {len(qa_pairs)} QA pairs have been generated.")
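The parser above expects the model to return comma-separated JSON objects inside a json code fence. The sketch below shows that behaviour on a hand-written stand-in response. `parse_qa_response`, `FENCE`, and the fake response are illustrative names that are not part of the original code, and the branch for a reply that is already a JSON array is an assumption about how the model might respond. It needs Python 3.9 or later for `removeprefix`.

```python
import json
from types import SimpleNamespace

FENCE = "`" * 3  # three backticks, built at runtime so this example stays easy to embed


def parse_qa_response(response):
    """Strip an optional json code fence and return a list of QA dicts."""
    text = response.content.strip()
    text = text.removeprefix(FENCE + "json").removesuffix(FENCE).strip()
    # Assumption: the reply is either a JSON array or bare comma-separated objects.
    if not text.startswith("["):
        text = f"[{text}]"
    return json.loads(text)


# Hypothetical stand-in for a chat model reply (only `.content` is used).
fake_response = SimpleNamespace(
    content=FENCE + "json\n"
    + '{"QUESTION": "laundry, exercise", "ANSWER": "first answer"},\n'
    + '{"QUESTION": "reading, cooking", "ANSWER": "second answer"}\n'
    + FENCE
)

pairs = parse_qa_response(fake_response)
print(len(pairs), pairs[0]["QUESTION"])
```

Checking the parser against a fixed string like this is a cheap way to catch format drift before spending API calls.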

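Then save the generated pairs as a JSONL file and load it back as a dataset.

import json
from datasets import load_dataset

# Path to the JSONL file
jsonl_file = "qa_pair.jsonl"

# Save the QA pairs to a JSONL file, one JSON object per line
with open(jsonl_file, "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# Load the JSONL file as a Dataset
dataset = load_dataset("json", data_files=jsonl_file)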
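The generation prompt produces records keyed by QUESTION and ANSWER, while the formatting function in the next section reads `instruction` and `output` columns. How the published dataset was converted is not shown here. The sketch below assumes a plain rename, and `to_training_record` is a hypothetical helper.

```python
import json


def to_training_record(pair):
    """Rename QUESTION/ANSWER keys to the instruction/output columns (assumed mapping)."""
    return {"instruction": pair["QUESTION"], "output": pair["ANSWER"]}


# Hypothetical sample in the shape returned by the generation step.
sample_pairs = [
    {"QUESTION": "laundry, exercise", "ANSWER": "first answer"},
    {"QUESTION": "reading, cooking", "ANSWER": "second answer"},
]

records = [to_training_record(p) for p in sample_pairs]

# JSONL is one JSON object per line; ensure_ascii=False keeps Korean text readable.
jsonl_text = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl_text)
```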

Loading/Preparing Training Data

The dataset uploaded to HuggingFace is loaded, and a formatting function combines each instruction and response into a single Alpaca-style training prompt.

from datasets import load_dataset

# EOS_TOKEN marks the end of a sequence and must be appended to every training example.
# `tokenizer` is the one returned by FastLanguageModel.from_pretrained in the Unsloth step below.
EOS_TOKEN = tokenizer.eos_token

# Alpaca-style prompt template.
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

# Function to format the given examples.
def formatting_prompts_func(examples):
    instructions = examples["instruction"]  # Get the instructions.
    outputs = examples["output"]  # Get the outputs.
    texts = []  # List to store the formatted texts.
    for instruction, output in zip(instructions, outputs):
        # The EOS_TOKEN must be added; otherwise, generation may continue indefinitely.
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,  # Return the formatted texts.
    }

# Load the dataset from the specified source.
dataset = load_dataset("nooynoos/M.O.M_Dataset_GemmaSprint", split="train")

# Apply the formatting_prompts_func to the dataset with batch processing enabled.
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
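To make the effect of the formatting function concrete, the sketch below applies the same template to one hand-written example. The "<eos>" string is a placeholder assumption; in the real pipeline the value comes from tokenizer.eos_token. `format_batch` and the sample batch are illustrative names.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_PLACEHOLDER = "<eos>"  # assumption: stands in for tokenizer.eos_token


def format_batch(examples):
    """Mirror of formatting_prompts_func for a batched dict of columns."""
    texts = [
        ALPACA_PROMPT.format(instruction, output) + EOS_PLACEHOLDER
        for instruction, output in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}


# Hypothetical one-row batch in the shape dataset.map(batched=True) passes in.
batch = {
    "instruction": ["exercise, coding"],
    "output": ["Go work out, then open your laptop!"],
}
formatted = format_batch(batch)["text"][0]
print(formatted)
```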

Training the Model

Unsloth

Fine-tuning is done with Unsloth, which supports 16-bit LoRA and 4-bit QLoRA and speeds up fine-tuning.

First, use the FastLanguageModel.from_pretrained function to load the pre-trained Gemma 2 2B model.

from unsloth import FastLanguageModel
import torch

max_seq_length = 1024 # Set the maximum sequence length
dtype = None
# Use 4-bit quantization to reduce memory usage
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # Use if working with gated models like meta-llama/Llama-2-7b-hf
)

Next, attach LoRA adapters so that only about 1–10% of all parameters are updated.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
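As a rough illustration of why LoRA trains so few parameters: for a linear layer with d_in inputs and d_out outputs, a rank-r adapter adds two small matrices holding r × (d_in + d_out) weights in total. The layer size below is illustrative rather than taken from Gemma 2's configuration, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in, d_out, r):
    """Weights in the two LoRA matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


d_in, d_out, r = 2048, 2048, 16  # illustrative layer size; r matches the config above
full = d_in * d_out
adapter = lora_param_count(d_in, d_out, r)
fraction = adapter / full
print(f"full={full}, adapter={adapter}, fraction={fraction:.2%}")
```

The overall share of trainable parameters also depends on which modules are adapted and on parts of the model, such as embeddings, that receive no adapter.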

Training the Model

Train the model. To reduce VRAM usage, lower the batch size (per_device_train_batch_size).

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
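The arithmetic behind these settings can be sketched as follows, assuming a single GPU and the 600-example dataset described earlier. The schedule function is a plain-Python approximation of linear warmup followed by linear decay, not a call into the trainer.

```python
per_device_batch = 2
grad_accum_steps = 4
max_steps = 100
warmup_steps = 5
base_lr = 2e-4
dataset_size = 600  # from the dataset description; assumed to be the training split size

# Examples consumed per optimizer step on one device.
effective_batch = per_device_batch * grad_accum_steps
examples_seen = effective_batch * max_steps
approx_epochs = examples_seen / dataset_size


def lr_at(step):
    """Linear warmup to base_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, max_steps - step) / (max_steps - warmup_steps)


print(effective_batch, examples_seen, round(approx_epochs, 2))
print(lr_at(0), lr_at(5), lr_at(100))
```

Under these assumptions, halving per_device_train_batch_size while doubling gradient_accumulation_steps keeps the effective batch size unchanged.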

Testing the Model

Let's check if it has become the 'nagging LLM' we wanted.

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnToken(StoppingCriteria):
    def __init__(self, stop_token_id):
        self.stop_token_id = stop_token_id  # Initialize the stop token ID.

    def __call__(self, input_ids, scores, **kwargs):
        return (
            self.stop_token_id in input_ids[0]
        )  # Stop if the stop token ID is present in the input IDs.

from transformers import TextStreamer

# Switch the model to Unsloth's inference mode, which is advertised as about 2x faster.
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "운동, 코딩, 과제",  # Instruction (keywords: exercise, coding, assignment)
            "",  # Output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

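# Stop generation as soon as the EOS token appears.
stopping_criteria = StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)])

text_streamer = TextStreamer(tokenizer)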
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=4096,  # Set the maximum number of tokens to generate.
    stopping_criteria=stopping_criteria  # Set the criteria to stop generation.
)
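The stopping check can be illustrated without a model. The sketch below uses plain Python lists and made-up token IDs. It compares a check over the whole sequence, as in StopOnToken above, with a check on only the most recent token, which avoids stopping immediately when the prompt itself happens to contain the stop ID. Both helper names are hypothetical.

```python
def stop_if_present(token_ids, stop_id):
    """Stop when the stop ID appears anywhere in the sequence."""
    return stop_id in token_ids


def stop_if_last(token_ids, stop_id):
    """Stop only when the most recently generated token is the stop ID."""
    return bool(token_ids) and token_ids[-1] == stop_id


STOP_ID = 1  # made-up ID for illustration
prompt_with_stop = [2, 1, 7, 9]  # stop ID already inside the prompt
finished = [2, 7, 9, 1]          # stop ID just generated

print(stop_if_present(prompt_with_stop, STOP_ID), stop_if_last(prompt_with_stop, STOP_ID))
print(stop_if_present(finished, STOP_ID), stop_if_last(finished, STOP_ID))
```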

The detailed results are as follows.

Save the merged model

base_model = "unsloth/gemma-2-2b"  # Passed below as the local directory where the merged model is saved.
huggingface_token = ""  # HuggingFace token.
huggingface_repo = "gemma2-2b-M.O.M-gemma-sprint"  # Repository to upload the model.
save_method = (
    "merged_16bit"  # Options: "merged_4bit", "merged_4bit_forced", "merged_16bit", "lora".
)
# The first argument of save_pretrained_merged is the save directory.
model.save_pretrained_merged(
    base_model,
    tokenizer,
    save_method=save_method,  # Set the save method to 16-bit merged.
)

Push the merged model to the Hugging Face Hub

# Upload to the Hub
model.push_to_hub_merged(
    huggingface_repo,
    tokenizer,
    save_method=save_method,
    token=huggingface_token,
)