---
license: mit
datasets:
  - mlabonne/orpo-dpo-mix-40k
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
---

Model Details

Model Description

  • Developed by: Chintan Shah

  • Model type: Causal language model, fine-tuned with ORPO

  • Finetuned from model: meta-llama/Llama-3.2-1B-Instruct

Training Details

Training Data

mlabonne/orpo-dpo-mix-40k
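As an illustration (not part of the original card), the dataset can be loaded with the Hugging Face `datasets` library; the column names in the comment reflect how this preference dataset is typically structured:

```python
from datasets import load_dataset

# orpo-dpo-mix-40k is a preference dataset; it is typically consumed via
# its "prompt", "chosen", and "rejected" columns, the format that
# ORPO-style trainers expect.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
print(dataset)
```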

Training Procedure

ORPO (Odds Ratio Preference Optimization), a reference-model-free method that combines supervised fine-tuning and preference alignment in a single objective; the objective is sketched below.
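For reference, the ORPO objective (Hong et al., 2024) augments the standard supervised fine-tuning loss with an odds-ratio penalty favoring the chosen response $y_w$ over the rejected response $y_l$; this summary comes from the ORPO paper, not from this card:

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}},
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right),
\qquad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$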

Training Parameters

Training Arguments (mapped to a configuration sketch after this list):

  • Learning Rate: 1e-5
  • Batch Size: 1
  • max_steps: 1
  • Block Size: 512
  • Warmup Ratio: 0.1
  • Weight Decay: 0.01
  • Gradient Accumulation: 4
  • Mixed Precision: bf16
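A minimal sketch, assuming the arguments above map onto TRL's `ORPOConfig`; the `output_dir` value is a hypothetical placeholder, and "Block Size" is assumed to mean the maximum sequence length:

```python
from trl import ORPOConfig

orpo_args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",   # hypothetical output path, not stated in the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    max_steps=1,
    gradient_accumulation_steps=4,
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,                        # mixed precision as listed above
    max_length=512,                   # assuming "Block Size" = max sequence length
)
```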

Training Hyperparameters

  • Training regime: bf16 mixed precision (matching the mixed-precision setting listed above)

LoRA Configuration (see the wiring sketch after this list):

  • R: 16
  • Alpha: 32
  • Dropout: 0.05
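A minimal wiring sketch under stated assumptions: `target_modules` is an illustrative choice not confirmed by the card, and recent TRL releases take `processing_class` where older ones took `tokenizer`:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOTrainer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# LoRA adapter with the r / alpha / dropout values listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated in the card
)
model = get_peft_model(model, lora_config)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,              # the ORPOConfig sketched above
    train_dataset=dataset,       # the orpo-dpo-mix-40k split loaded earlier
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
)
trainer.train()
```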

Evaluation

| Tasks     | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|-----------|---------|--------|--------|----------|--------|----------|
| hellaswag | 1       | none   | 0      | acc      | 0.4408 | ± 0.0050 |
|           |         | none   | 0      | acc_norm | 0.5922 | ± 0.0049 |

Testing Data, Factors & Metrics

Testing Data

EleutherAI lm-evaluation-harness (used to produce the results above): https://github.com/EleutherAI/lm-evaluation-harness
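As a hedged reproduction sketch, the HellaSwag numbers above can be regenerated through the harness's Python API; the `pretrained=` path is a placeholder for this repository's checkpoint, not an official model id:

```python
import lm_eval

# Zero-shot HellaSwag evaluation, mirroring the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/this-checkpoint",  # placeholder checkpoint path
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```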