Model Card for gemma2-2b-M.O.M-gemma-sprint

This model is fine-tuned from the unsloth/gemma-2-2b base model (the checkpoint loaded in the training code below) using the M.O.M dataset.

What is the M.O.M Project?

The Motivational Organizer & Mentor (M.O.M.) project is designed to replicate the familiar and persistent encouragement that a caring parent might provide. By leveraging large language models (LLMs), M.O.M. delivers timely reminders, motivational "nags," and personalized feedback to keep users focused and productive. This service helps users manage their tasks by offering gentle yet persistent nudges, task prioritization, and empathetic guidance, ultimately reducing procrastination and boosting accountability.

Model Details

Model Description

The M.O.M model uses the "nagging mom" concept to motivate users with warm but persistent reminders about the tasks they need to accomplish.

This model receives keywords representing the user's daily tasks and turns them into motivational messages delivered in the tone of a loving yet slightly exasperated mother. The model skillfully weaves four provided keywords into a cohesive story, ensuring the tone is warm while also urging the user to take action.

Key Features:

  • Input: Keywords representing the tasks the user needs to do.
  • Output: Motivational "mom nagging" messages.
  • Tone: Warm but persistently urging action.
  • Purpose: To motivate users to manage their time effectively and take responsibility for their tasks.

This model helps users stop procrastinating by giving them structured yet loving reminders, encouraging them to be more productive in their daily lives.

Training Procedure

Make Q/A Pairs

To fine-tune the Gemma 2 2B model as M.O.M, a Q/A pair dataset is required. Such a dataset can be written by hand or generated by prompting a capable model with clear instructions. Here, I wrote a few example pairs myself and used them as few-shot examples in a prompt to GPT-4o, which produced 600 Q/A pairs. The resulting dataset is available at: nooynoos/M.O.M_Dataset_GemmaSprint.

Below is the code for generating the Q/A Pairs.

import json
from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

# Set OpenAI API key
openai_api_key = ""  # Enter your API key here

# Define the prompt template
prompt = PromptTemplate.from_template(
    """๋„ˆ๋Š” ์ง€๊ตฌ์—์„œ ์ž๋…€๋ฅผ ๊ฐ€์žฅ ์‚ฌ๋ž‘ํ•˜์ง€๋งŒ, ์ž”์†Œ๋ฆฌ๊ฐ€ ์ •๋ง ๋งŽ์€ ์—„๋งˆ์•ผ.
    ํ‚ค์›Œ๋“œ๋Š” 20๋Œ€ ์ฒญ๋…„์ด ์ผ์ƒ์ƒํ™œ์—์„œ ํ•ด์•ผํ•˜๋Š” ์ผ์„ ์ ์–ด์ฃผ๋ฉด ๋ผ.
    ๊ทธ ํ‚ค์›Œ๋“œ์— ๋งž์ถฐ ์—„๋งˆ๊ฐ€ ์‚ฌ๋ž‘์Šค๋Ÿฝ์ง€๋งŒ ์•ฝ๊ฐ„ ์งœ์ฆ๋‚œ ๋“ฏํ•œ ์ž”์†Œ๋ฆฌ๋กœ ๋™๊ธฐ๋ถ€์—ฌํ•ด์ฃผ๋Š” ๋‹ต๋ณ€์„ ์ž‘์„ฑํ•ด์ค˜. 
    ์—„๋งˆ์˜ ์ž”์†Œ๋ฆฌ๋Š” 4๊ฐœ์˜ ํ‚ค์›Œ๋“œ๋ฅผ ์—ฐ๊ฒฐ๋œ ์Šคํ† ๋ฆฌ๋กœ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ํฌํ•จํ•ด์•ผ ํ•ด. 
    ์ž”์†Œ๋ฆฌ๋Š” ๋”ฐ๋œปํ•˜์ง€๋งŒ ๊พธ์ค€ํžˆ ํ–‰๋™์„ ์ด‰๊ตฌํ•˜๋Š” ํ†ค์œผ๋กœ ์ž‘์„ฑ๋˜์–ด์•ผ ํ•˜๊ณ . ๋‹ค์Œ ํ˜•์‹์— ๋”ฐ๋ผ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ด์ค˜!: 

    Format์— ๋งž์ถฐ์„œ, ์ƒˆ๋กœ์šด ํ‚ค์›Œ๋“œ์™€ ํ•จ๊ป˜ ์ƒˆ๋กœ์šด QA PAIR 5๊ฐœ๋ฅผ ์ƒ์„ฑํ•ด์ฃผ๋ฉด ๋ผ. 

    #Format:
    ```json
    {{
        "QUESTION": "๋ฏธ์†Œ์‚ฌ ๊ณผ์ œ, ์ฝ”๋”ฉ ๊ณต๋ถ€",
        "ANSWER": "๋ฏธ์†Œ์‚ฌ ๊ณผ์ œ๋ž‘ ์ฝ”๋”ฉ ๊ณต๋ถ€ ๋‘˜ ๋‹ค ์–ธ์ œ ํ•  ๊ฑฐ์•ผ? ๊ณผ์ œ๋Š” ๋๋‚ผ ๊ธฐ๋ฏธ๋„ ์•ˆ ๋ณด์ด๊ณ , ์ฝ”๋”ฉ์€ ์‹œ์ž‘๋„ ์•ˆ ํ–ˆ์ž–์•„! ํ•˜๋ฃจ ์ข…์ผ ํ•ธ๋“œํฐ๋งŒ ๋งŒ์ง€์ž‘๊ฑฐ๋ฆด ๊ฒŒ ์•„๋‹ˆ๋ผ, ๊ทธ ์‹œ๊ฐ„์— ์ฐจ๋ผ๋ฆฌ ์ฝ”๋”ฉ์ด๋ผ๋„ ์กฐ๊ธˆ์”ฉ ํ•ด๋‘ฌ. ๊ทธ๋ฆฌ๊ณ  ๊ณผ์ œ๋„ ๋ฏธ๋ฆฌ๋ฏธ๋ฆฌ ํ•ด๋†”์•ผ ๋‚˜์ค‘์— ์•ˆ ํž˜๋“ค์ง€! ๋„ˆ ๊ณผ์ œ ๋ชฐ์•„์„œ ํ•˜๋‹ค๊ฐ€ ๋ฐค์ƒ˜ํ• ๊นŒ ๋ด ๊ฑฑ์ •๋ผ ์ฃฝ๊ฒ ๋„ค."
    }},
    {{
        "QUESTION": "๋ฐฉ ์ •๋ฆฌ, ์ž๊ธฐ์†Œ๊ฐœ์„œ ์ž‘์„ฑ",
        "ANSWER": "๋ฐฉ์ด ์ด๋ ‡๊ฒŒ ์–ด์งˆ๋Ÿฌ์ ธ ์žˆ์œผ๋ฉด ๋„ค ์ƒ๊ฐ๋„ ์ •๋ฆฌ๊ฐ€ ์•ˆ ๋  ๊ฑฐ์•ผ! ๋นจ๋ฆฌ ๋ฐฉ๋ถ€ํ„ฐ ์น˜์šฐ๊ณ , ์ž๊ธฐ์†Œ๊ฐœ์„œ๋‚˜ ์ข€ ์จ! ๋งˆ๊ฐ์€ ์–ผ๋งˆ ์•ˆ ๋‚จ์•˜๋Š”๋ฐ, ๋„ค ๋ฐฉ ์ƒํƒœ๋ž‘ ์ž์†Œ์„œ ์ƒํƒœ๊ฐ€ ๋˜‘๊ฐ™์•„ ๋ณด์ธ๋‹ค, ์ง„์งœ. ๋ฐฉ๊ธˆ ์น˜์šฐ๊ณ  ์ž๊ธฐ์†Œ๊ฐœ์„œ ์กฐ๊ธˆ์”ฉ ์“ฐ๋ฉด ๋งˆ์Œ๋„ ๋” ๊ฐ€๋ฒผ์›Œ์งˆ ๊ฑฐ์•ผ."    
    }},
    {{
        "QUESTION": "Cousera ๊ฐ•์˜, LLM Fine Tuning",
        "ANSWER": "Cousera ๊ฐ•์˜ ์–ผ๋ฅธ ๋“ค์–ด์•ผ์ง€. ์ด๊ฑฐ ๋งˆ๊ฐ ์–ผ๋งˆ ๋‚จ์ง€ ์•Š์•˜์ž–์•„! Cousera ๊ฐ•์˜ ๋น ๋ฅด๊ฒŒ ๋งˆ๋ฌด๋ฆฌ ํ•ด์•ผ, LLM Fine Tuning๊นŒ์ง€ ๋งˆ๋ฌด๋ฆฌ ํ•  ์ˆ˜ ์žˆ์ง€ ์•Š๊ฒ ์–ด? ์กฐ๊ธˆ ๋” ์ง‘์ค‘ํ•ด์„œ ๋นจ๋ฆฌ ํ•ด!"    
    }}
    ```
    """
)

# Custom JSON parser function
def custom_json_parser(response):
    json_string = response.content.strip().removeprefix("```json\n").removesuffix("\n```").strip()
    json_string = f'[{json_string}]'
    return json.loads(json_string)

# Configure the chain
chain = (
    prompt
    | ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],
        openai_api_key=openai_api_key  # Use the API key set directly
    )
    | custom_json_parser
)

# List to store QA pairs
qa_pairs = []

# Each call asks the model for 5 QA pairs. The loop below runs once;
# raise NUM_CALLS to build a larger set (120 calls x 5 pairs = 600 pairs).
NUM_CALLS = 1
for _ in range(NUM_CALLS):
    # The template has no input variables, so an empty dict is passed.
    response = chain.invoke({})
    # Add the results to qa_pairs
    qa_pairs.extend(response)

print(f"A total of {len(qa_pairs)} QA pairs have been generated.")
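
The parsing step can be tried offline, without an API key. The sketch below is a slightly more defensive variant of custom_json_parser that works on a plain string; the function name parse_qa_response and the sample reply are illustrative assumptions, not part of the original pipeline.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example easy to embed


def parse_qa_response(text: str) -> list[dict]:
    """Turn a reply of comma-separated JSON objects into a list of dicts."""
    # Pull out the body of a fenced json block if one is present.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    body = match.group(1) if match else text.strip()
    # The prompt's format yields `{...}, {...}`, so wrap it in brackets
    # unless the reply is already a JSON array.
    if not body.startswith("["):
        body = f"[{body}]"
    pairs = json.loads(body)
    # Keep only pairs that carry both expected keys.
    return [p for p in pairs if p.keys() >= {"QUESTION", "ANSWER"}]


# Hypothetical model reply used only for this demonstration.
sample_reply = (
    f"{FENCE}json\n"
    '{"QUESTION": "exercise, coding", "ANSWER": "Go work out, then code!"},\n'
    '{"QUESTION": "room cleaning", "ANSWER": "Clean your room first!"}\n'
    f"{FENCE}"
)

pairs = parse_qa_response(sample_reply)
print(len(pairs), pairs[0]["QUESTION"])
```

Filtering on the two keys means a malformed object is skipped rather than carried into the dataset; whether that is preferable to failing loudly depends on how the data is reviewed afterwards.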

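Then save the Q/A pairs to a JSONL file and load it back as a dataset.

import json
from datasets import load_dataset

# Path to the JSONL file
jsonl_file = "qa_pair.jsonl"

# Save the QA pairs to a JSONL file, one JSON object per line.
# The keys are renamed to "instruction"/"output" to match the columns read during training.
with open(jsonl_file, "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        record = {"instruction": pair["QUESTION"], "output": pair["ANSWER"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Load the JSONL file as a Dataset
dataset = load_dataset("json", data_files=jsonl_file)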
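
Because the generation call uses temperature=0, repeated calls may return identical pairs, so it can be worth filtering the list before it is written out. The following is a minimal sketch using only the standard library; the helper names and the sample records are hypothetical.

```python
import json
import os
import tempfile


def dedupe_pairs(pairs: list[dict]) -> list[dict]:
    """Drop pairs with an empty field or a QUESTION that was already seen."""
    seen = set()
    unique = []
    for pair in pairs:
        question = pair.get("QUESTION", "").strip()
        answer = pair.get("ANSWER", "").strip()
        if not question or not answer or question in seen:
            continue
        seen.add(question)
        unique.append({"QUESTION": question, "ANSWER": answer})
    return unique


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps Korean text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical raw output: one duplicate question and one empty answer.
raw_pairs = [
    {"QUESTION": "exercise, coding", "ANSWER": "Go work out, then code!"},
    {"QUESTION": "exercise, coding", "ANSWER": "Duplicate question."},
    {"QUESTION": "laundry", "ANSWER": ""},
]
clean_pairs = dedupe_pairs(raw_pairs)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "qa_pair.jsonl")
    write_jsonl(clean_pairs, path)
    reloaded = read_jsonl(path)

print(len(raw_pairs), "->", len(reloaded))
```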

Loading/Preparing Training Data

The dataset uploaded to Hugging Face is loaded, and each row's instruction and output are formatted into a single Alpaca-style prompt with Instruction and Response sections.

from datasets import load_dataset

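# EOS_TOKEN marks the end of a sequence and must be appended to every training example.
# `tokenizer` is the one returned by FastLanguageModel.from_pretrained in the Unsloth
# section, so the model has to be loaded before this code runs.
EOS_TOKEN = tokenizer.eos_token

# Alpaca-style prompt template with slots for the instruction and the response.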
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

# Function to format the given examples.
def formatting_prompts_func(examples):
    instructions = examples["instruction"]  # Get the instructions.
    outputs = examples["output"]  # Get the outputs.
    texts = []  # List to store the formatted texts.
    for instruction, output in zip(instructions, outputs):
        # The EOS_TOKEN must be added; otherwise, generation may continue indefinitely.
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,  # Return the formatted texts.
    }

# Load the dataset from the specified source.
dataset = load_dataset("nooynoos/M.O.M_Dataset_GemmaSprint", split="train")

# Apply the formatting_prompts_func to the dataset with batch processing enabled.
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
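
To see what a formatted training example looks like without loading a tokenizer, the same logic can be run on a hand-made batch. In this sketch the EOS string "<eos>" and the sample row are assumptions for illustration; in the real pipeline EOS_TOKEN comes from the tokenizer.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = "<eos>"  # assumption: stand-in for tokenizer.eos_token


def format_batch(examples: dict) -> dict:
    """Mirror formatting_prompts_func on a plain dict of column lists."""
    texts = [
        ALPACA_PROMPT.format(instruction, output) + EOS_TOKEN
        for instruction, output in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}


# Hypothetical one-row batch in the column layout that dataset.map passes when batched=True.
batch = {
    "instruction": ["exercise, coding"],
    "output": ["Stop scrolling and go work out, then finish your coding!"],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

Each resulting string holds the fixed preamble, the keywords under "### Instruction:", the nagging message under "### Response:", and the EOS marker at the very end.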

Training the Model

Unsloth

Fine-tuning is done with Unsloth, which supports 16-bit LoRA and 4-bit QLoRA and speeds up fine-tuning while reducing memory use.

First, use the FastLanguageModel.from_pretrained function to load the pre-trained Gemma 2-2b model.

from unsloth import FastLanguageModel
import torch

max_seq_length = 1024 # Set the maximum sequence length
dtype = None
# Use 4-bit quantization to reduce memory usage
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # Use if working with gated models like meta-llama/Llama-2-7b-hf
)

Additionally, use the LoRA adapter to update only 1–10% of all parameters.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
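
The number of trainable parameters that this configuration adds can be estimated by hand: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r * (d_in + d_out) weights. The sketch below applies that formula to the seven target modules. The layer dimensions are assumptions based on my reading of the published Gemma 2 2B configuration, so check them against the model's config.json before relying on the result.

```python
# Assumed Gemma 2 2B dimensions (verify against config.json).
HIDDEN = 2304
INTERMEDIATE = 9216
Q_OUT = 8 * 256    # 8 attention heads x head_dim 256
KV_OUT = 4 * 256   # 4 key/value heads x head_dim 256
NUM_LAYERS = 26
RANK = 16          # r = 16, as in get_peft_model above
TOTAL_PARAMS = 2.61e9  # model size reported on the model page

# (in_features, out_features) of each module listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters ({total / TOTAL_PARAMS:.2%} of the model)")
```

Under these assumptions the adapter comes to roughly 20.8 million weights, a little under 1% of the model, which sits at the low end of the range quoted above. Doubling r doubles the count.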

Training the Model

Train the model. To reduce VRAM usage, lower per_device_train_batch_size; raising gradient_accumulation_steps by the same factor keeps the effective batch size unchanged.

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # packing = True can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
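
The training arguments above imply a few numbers that are easy to work out by hand. The sketch below computes the effective batch size, the approximate number of passes over the 600-pair dataset, and the learning rate at a given step under a linear warmup followed by linear decay. It assumes a single GPU and that the "linear" scheduler behaves like the usual warmup-then-decay schedule in transformers; the function name is hypothetical.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 100
WARMUP_STEPS = 5
BASE_LR = 2e-4
DATASET_SIZE = 600  # number of Q/A pairs mentioned earlier


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0, MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # examples per optimizer step (one GPU assumed)
examples_seen = effective_batch * MAX_STEPS      # examples consumed over the whole run
epochs = examples_seen / DATASET_SIZE            # approximate passes over the dataset

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} (about {epochs:.2f} epochs)")
for step in (0, 5, 50, 100):
    print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
```

With these settings the run covers the dataset a little more than once, so switching from max_steps to num_train_epochs = 1 would shorten training slightly rather than lengthen it.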

Testing the Model

Let's check if it has become the 'nagging LLM' we wanted.

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnToken(StoppingCriteria):
    def __init__(self, stop_token_id):
        self.stop_token_id = stop_token_id  # Initialize the stop token ID.

    def __call__(self, input_ids, scores, **kwargs):
        return (
            self.stop_token_id in input_ids[0]
        )  # Stop if the stop token ID is present in the input IDs.

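from transformers import TextStreamer

# Stop generation as soon as the EOS token is produced.
stopping_criteria = StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)])

# Switch the model to Unsloth's faster inference mode.
FastLanguageModel.for_inference(model)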
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "운동, 코딩, 과제",  # Instruction: "exercise, coding, homework"
            "",  # Output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=4096,  # Set the maximum number of tokens to generate.
    stopping_criteria=stopping_criteria  # Set the criteria to stop generation.
)
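
The streamed output echoes the whole prompt before the generated message. If only the nagging text is needed, for example to show it in an app, it can be cut out of the decoded string. This is a minimal sketch; the function name, the sample string, and the special-token strings "<bos>" and "<eos>" are assumptions for illustration.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, special_tokens=("<bos>", "<eos>")) -> str:
    """Return only the text generated after the response marker.

    Returns an empty string if the marker is not found.
    """
    # Everything before the marker is the echoed prompt.
    _, _, tail = decoded.partition(RESPONSE_MARKER)
    # Remove special-token strings that may remain in the decoded text.
    for token in special_tokens:
        tail = tail.replace(token, "")
    return tail.strip()


# Hypothetical decoded output, shortened for the example.
decoded = (
    "<bos>Below is an instruction that describes a task.\n\n"
    "### Instruction:\nexercise, coding\n\n"
    "### Response:\nGo work out first, then sit down and code!<eos>"
)
print(extract_response(decoded))
```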

The detailed results are as follows.

Save the merged model

base_model = "unsloth/gemma-2-2b"  # Passed to save_pretrained_merged below, where it is used as the local output directory.
huggingface_token = ""  # HuggingFace token.
huggingface_repo = "gemma2-2b-M.O.M-gemma-sprint"  # Repository to upload the model.
save_method = (
    "merged_16bit"  # Options: "merged_4bit", "merged_4bit_forced", "merged_16bit", "lora".
)
model.save_pretrained_merged(
    base_model,
    tokenizer,
    save_method=save_method,  # Set the save method to 16-bit merged.
)

Push the merged model to the Hugging Face Hub

# Upload the merged model to the Hub
model.push_to_hub_merged(
    huggingface_repo,
    tokenizer,
    save_method=save_method,
    token=huggingface_token,
)
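
Once the merged model is on the Hub, it can be loaded with plain transformers, without Unsloth. The sketch below is one way to do that under a few assumptions: the repository id is the one this card belongs to, the prompt follows the Alpaca template used for training, and torch, transformers and accelerate (for device_map="auto") are installed. The function names are hypothetical.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

MODEL_ID = "nooynoos/gemma2-2b-M.O.M-gemma-sprint"


def build_prompt(keywords: list[str]) -> str:
    """Join the task keywords and leave the response slot empty for generation."""
    return ALPACA_PROMPT.format(", ".join(keywords), "")


def generate_nag(keywords: list[str], max_new_tokens: int = 256) -> str:
    """Download the model and generate one message for the given keywords."""
    # Imported here so build_prompt works even without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(keywords), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    print(build_prompt(["exercise", "coding", "homework"]))
    # Uncomment to download the weights and run generation:
    # print(generate_nag(["exercise", "coding", "homework"]))
```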

Performance

Fine-tuned Model (gemma2-2b-M.O.M)

[Four images omitted]

More detailed results and the code can be found at the following GitHub link.

https://github.com/nooynoos/M.O.M-Gemma-Sprint
