---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
---
## Model Details

### Model Description

- **Developed by:** Chintan Shah
- **Model type:** Causal language model (decoder-only transformer, Llama 3.2 architecture, 1B parameters)
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
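The sketch below shows one way to run the model with 🤗 Transformers. The repo id is a placeholder, not this model's actual location; substitute the id under which this checkpoint is published.

```python
# Minimal inference sketch. The repo id below is a placeholder; replace it
# with the repo this checkpoint is actually published under.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<your-username>/<this-model>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Llama 3.2 Instruct models expect a chat-formatted prompt, so build it
# with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```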
## Training Details

### Training Data

The model was fine-tuned on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), a roughly 40k-row mix of preference datasets in which each example pairs a prompt with a chosen and a rejected response.
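A quick way to inspect the data is sketched below; the `chosen`/`rejected` field names follow the usual preference-pair convention, so verify them against the dataset card.

```python
# Peek at the preference data. The "chosen"/"rejected" field names are the
# usual preference-pair convention; check the dataset card if yours differ.
from datasets import load_dataset

ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
print(ds)            # features and row count
print(ds[0].keys())  # expect "chosen" and "rejected" among the fields
```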
### Training Procedure

The model was aligned with ORPO (Odds Ratio Preference Optimization), which adds an odds-ratio preference penalty to the standard supervised fine-tuning loss, so preference alignment happens in a single training stage with no separate reward or reference model. A training sketch follows the parameter list below.
### Training Parameters

#### Training Arguments

- Learning Rate: 1e-5
- Batch Size: 1
- max_steps: 1
- Block Size: 512
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Accumulation: 4
- Mixed Precision: bf16

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
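A minimal sketch of this recipe using TRL's `ORPOTrainer`, with the arguments above filled in. The output directory is a placeholder, and any setting not listed in this card is left at its TRL default.

```python
# Minimal ORPO fine-tuning sketch with TRL, using the arguments listed above.
# The output directory is a placeholder; unlisted ORPOConfig values stay at
# their TRL defaults.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1,
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    max_length=512,  # "Block Size" above
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # tokenizer= on older TRL releases
)
trainer.train()
```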
### LoRA Configuration

- R: 16
- Alpha: 32
- Dropout: 0.05
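In `peft` terms the configuration above looks like the sketch below; the target modules are not listed in this card, so the attention projections shown are an assumption.

```python
# LoRA adapter settings matching the values above. target_modules is an
# assumption (not stated in this card); the attention projections are a
# common choice for Llama-style models.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
# Can be passed to ORPOTrainer via peft_config=lora_config so that only the
# adapter weights are trained.
```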
## Evaluation

Zero-shot HellaSwag results, measured with EleutherAI's lm-evaluation-harness:

| Tasks     | Version | Filter | n-shot | Metric   |   | Value  |   | Stderr |
|-----------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| hellaswag |       1 | none   |      0 | acc      | ↑ | 0.4408 | ± | 0.0050 |
|           |         | none   |      0 | acc_norm | ↑ | 0.5922 | ± | 0.0049 |
### Testing Data, Factors & Metrics

#### Testing Data

HellaSwag (0-shot), run through EleutherAI's lm-evaluation-harness: https://github.com/EleutherAI/lm-evaluation-harness
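The scores above should be reproducible along these lines with the harness's Python API (the repo id is a placeholder for this model's checkpoint):

```python
# Reproduce the HellaSwag scores with lm-evaluation-harness
# (pip install lm-eval). The repo id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<your-username>/<this-model>,dtype=bfloat16",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```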