---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
---

## Model Details

### Model Description

- **Developed by:** Chintan Shah
- **Model type:** Causal language model (Llama-3.2-1B-Instruct architecture)
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

## Training Details

### Training Data

mlabonne/orpo-dpo-mix-40k

### Training Procedure

ORPO (Odds Ratio Preference Optimization); a minimal training sketch using the settings below appears at the end of this card.

### Training Parameters

#### Training Arguments

- Learning Rate: 1e-5
- Batch Size: 1
- max_steps: 1
- Block Size: 512
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Accumulation: 4
- Mixed Precision: bf16

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision

### LoRA Configuration

- R: 16
- Alpha: 32
- Dropout: 0.05

## Evaluation

| Tasks     | Version | Filter | n-shot | Metric   |    |  Value |    | Stderr |
|-----------|--------:|--------|-------:|----------|----|-------:|----|-------:|
| hellaswag |       1 | none   |      0 | acc      | ↑  | 0.4408 | ±  | 0.0050 |
|           |         | none   |      0 | acc_norm | ↑  | 0.5922 | ±  | 0.0049 |

### Testing Data, Factors & Metrics

#### Testing Data

Zero-shot HellaSwag, evaluated with the EleutherAI lm-evaluation-harness: https://github.com/EleutherAI/lm-evaluation-harness
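
The table above follows the lm-evaluation-harness output format. A minimal sketch of how the zero-shot HellaSwag run could be reproduced through the harness's Python API; the model path is a placeholder and the batch size is an assumption, neither is taken from this card:

```python
# Minimal sketch: reproduce the zero-shot HellaSwag evaluation with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                  # Hugging Face backend
    model_args="pretrained=path/to/this-model",  # placeholder model path
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,                                # assumed; not from the card
)
# The per-task dict holds acc / acc_norm and their stderr values.
print(results["results"]["hellaswag"])
```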
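
For reference, here is a minimal sketch of how the ORPO procedure, training arguments, and LoRA configuration listed above map onto TRL and PEFT. It is not the original training script: the output directory and LoRA target modules are assumptions, ORPO's beta is left at the library default, and depending on the TRL version the conversational dataset may first need its chat template applied:

```python
# Minimal sketch of the ORPO + LoRA setup described above (pip install trl peft).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

# LoRA configuration from the card; target modules are an assumption.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments from the card; output_dir is a placeholder.
args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1,              # reproduces the very short run listed above
    warmup_ratio=0.1,
    weight_decay=0.01,
    max_length=512,           # "Block Size" above
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,   # tokenizer= in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```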