---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
---
# Model Details

## Model Description
- **Developed by:** Chintan Shah
- **Model type:** Causal decoder-only language model (Llama 3.2 architecture)
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
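
A minimal usage sketch with the `transformers` library. The repository id below is a hypothetical placeholder, since this card does not state where the fine-tuned weights are published.

```python
# Minimal usage sketch; "your-username/llama-3.2-1b-orpo" is a hypothetical
# placeholder for wherever the fine-tuned weights are hosted.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-username/llama-3.2-1b-orpo"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Llama 3.2 Instruct models are prompted through a chat template.
messages = [{"role": "user", "content": "Summarize ORPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```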
# Training Details

## Training Data

[mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), a roughly 40k-sample mix of preference datasets with chosen/rejected response pairs.
## Training Procedure

The model was fine-tuned with ORPO (Odds Ratio Preference Optimization), which folds preference alignment into the supervised fine-tuning objective in a single stage, with no separate reference model. A sketch of this setup appears after the parameter lists below.
### Training Parameters

Training Arguments:
- Learning Rate: 1e-5
- Batch Size: 1
- Max Steps: 1
- Block Size: 512
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Accumulation: 4
- Mixed Precision: bf16
### Training Hyperparameters

- Training regime: bf16 mixed precision
LoRA Configuration:
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
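
A minimal sketch of this setup, assuming a recent version of trl's `ORPOTrainer` with peft for the LoRA adapter. The output directory name is illustrative; the exact training script for this run was not published with this card.

```python
# A sketch of the ORPO + LoRA fine-tuning described above, using trl and peft.
# Assumptions: a recent trl release (ORPOConfig/ORPOTrainer with the
# processing_class argument) and an illustrative output_dir.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

# Training arguments as listed in this card; "Block Size: 512" is mapped
# to ORPOConfig's max_length.
args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",  # illustrative name
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1,
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    max_length=512,
)

# LoRA configuration as listed in this card.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```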
# Evaluation

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc ↑ | 0.4408 | ± 0.0050 |
| hellaswag | 1 | none | 0 | acc_norm ↑ | 0.5922 | ± 0.0049 |
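
The table format matches EleutherAI's lm-evaluation-harness. A sketch of re-running the HellaSwag evaluation with its Python API, assuming `lm_eval.simple_evaluate` and a placeholder model path:

```python
# A sketch of reproducing the HellaSwag numbers above with lm-evaluation-harness;
# the pretrained path is a hypothetical placeholder for this model's repo.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/llama-3.2-1b-orpo",  # hypothetical
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```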