---
license: llama3.2
datasets:
  - mlabonne/orpo-dpo-mix-40k
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B
library_name: transformers
pipeline_tag: text-generation
model-index:
  - name: week2-llama3-1B
    results:
      - task:
          type: text-generation
        dataset:
          name: mlabonne/orpo-dpo-mix-40k
          type: mlabonne/orpo-dpo-mix-40k
        metrics:
          - name: EQ-Bench (0-Shot)
            type: EQ-Bench (0-Shot)
            value: 1.5355
---

## Model Overview

This model is a fine-tuned variant of Llama-3.2-1B, trained with ORPO (Odds Ratio Preference Optimization) for improved performance. It was fine-tuned on the mlabonne/orpo-dpo-mix-40k preference dataset as part of the Finetuning Open Source LLMs Course, Week 2 Project.

## Intended Use

This model is intended for general-purpose text generation, including parsing and summarizing text, following contextual prompts, and other natural language processing applications.
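A minimal usage sketch with the `transformers` library is shown below. The repo id `JPBianchi/week2-llama3-1B` is an assumption based on this card's model name; adjust it to the actual Hub path if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the model is published under this Hub repo id.
MODEL_ID = "JPBianchi/week2-llama3-1B"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the fine-tuned checkpoint and complete `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate("Summarize the plot of Hamlet in two sentences."))
```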

## Evaluation Results

The model was evaluated on the following benchmarks:

| Task | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| eq_bench | 2.1 | none | 0 | eqbench | 1.5355 | ± 0.9184 |
| | | none | 0 | percent_parseable | 16.9591 | ± 2.8782 |
| hellaswag | 1 | none | 0 | acc | 0.4812 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.6467 | ± 0.0049 |
| ifeval | 4 | none | 0 | inst_level_loose_acc | 0.3984 | N/A |
| | | none | 0 | inst_level_strict_acc | 0.2974 | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.2755 | ± 0.0193 |
| | | none | 0 | prompt_level_strict_acc | 0.1848 | ± 0.0168 |
| tinyMMLU | 0 | none | 0 | acc_norm | 0.3995 | N/A |

## Key Features

- **Model Size:** 1 billion parameters
- **Fine-tuning Method:** ORPO
- **Dataset:** mlabonne/orpo-dpo-mix-40k