---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---
# Model Card for radm/Llama-3-70B-Instruct-AH-AWQ
This model was fine-tuned to act as a judge on [Arena Hard](https://github.com/lm-sys/arena-hard-auto). The base model was trained with LoRA; the resulting adapter was then merged into the base model, and the merged weights were converted to AWQ format.

The standalone LoRA adapter for the base model is available at [radm/Llama-3-70B-Instruct-AH-lora](https://huggingface.co/radm/Llama-3-70B-Instruct-AH-lora).
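Since the published checkpoint is already quantized, it can be loaded directly with `transformers` (with `autoawq` installed). The snippet below is a minimal inference sketch; the judge prompt shown is an illustrative placeholder, not the exact Arena Hard template.

```python
# Minimal inference sketch. Assumes `autoawq` and a recent `transformers`
# are installed; the judge prompt is a placeholder, not the Arena Hard one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radm/Llama-3-70B-Instruct-AH-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run in fp16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an impartial judge. Compare the two answers."},
    {"role": "user", "content": "Question: ...\n\nAnswer A: ...\n\nAnswer B: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Alternatively, the adapter linked above can be attached to the full-precision base model with `PeftModel.from_pretrained(base_model, "radm/Llama-3-70B-Instruct-AH-lora")`.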
## Model Details
### Model Description
- **Developed by:** radm
- **Model type:** Llama-3-70B
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** NousResearch/Meta-Llama-3-70B-Instruct
## Uses
[More Information Needed]
## Training Details
### Training Data
Datasets:
- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1
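As an illustration only, the three preference datasets could be mixed with `datasets.concatenate_datasets`; the split names below are assumptions, and in practice the schemas would need to be aligned to the `prompt`/`chosen`/`rejected` columns that ORPO-style trainers expect.

```python
# Hypothetical sketch of mixing the three datasets listed above.
# Splits and column alignment are assumptions, not the author's script.
from datasets import concatenate_datasets, load_dataset

parts = [
    load_dataset("radm/arenahard_gpt4vsllama3", split="train"),
    load_dataset("radm/truthy-dpo-v0.1-ru", split="train"),
    load_dataset("jondurbin/truthy-dpo-v0.1", split="train"),
]
# concatenate_datasets requires identical features, so each part may first
# need columns renamed/dropped to a common prompt/chosen/rejected schema.
dataset = concatenate_datasets(parts)
```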
### Training Hyperparameters
- **Training regime:** bf16
- **Load in 4-bit:** True
- **Target modules:** all
- **LoRA rank:** 16
- **Max sequence length:** 8192
- **Gradient checkpointing:** unsloth
- **Trainer:** ORPOTrainer
- **Batch size:** 1
- **Gradient accumulation steps:** 4
- **Epochs:** 1
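Below is a minimal sketch of a training setup matching these hyperparameters, assuming unsloth plus TRL's `ORPOTrainer`. The original script is not published, so the argument values are reconstructed from this card, and the `target_modules` list is one common reading of "all".

```python
# Sketch reconstructed from the hyperparameters above (not the author's script).
from unsloth import FastLanguageModel
from trl import ORPOConfig, ORPOTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="NousResearch/Meta-Llama-3-70B-Instruct",
    max_seq_length=8192,
    load_in_4bit=True,           # 4-bit base weights (QLoRA-style)
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                        # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all" linear projections
    use_gradient_checkpointing="unsloth",
)

# One of the three datasets; the full mix is sketched under Training Data.
dataset = load_dataset("radm/arenahard_gpt4vsllama3", split="train")

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(
        output_dir="outputs",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=True,
        max_length=8192,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```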
## Results
[More Information Needed]
## Hardware
- **Hardware Type:** Nvidia A100 80 GB
- **Hours used:** 11
### Framework versions
- PEFT 0.10.0