---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---

# Model Card for radm/Llama-3-70B-Instruct-AH-AWQ

This model was fine-tuned to act as a judge on [Arena Hard](https://github.com/lm-sys/arena-hard-auto). The base model was trained with LoRA; the resulting adapter was then merged into the base model, and the merged model was converted to AWQ format.
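
The merge-and-convert pipeline itself is not published here; the sketch below shows one plausible way to reproduce it with `peft` and AutoAWQ. The output paths and the `quant_config` values are assumptions, not the settings actually used.

```python
# Hedged sketch: merge the LoRA adapter into the base model, then quantize to AWQ.
# Paths and quant_config values are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM  # pip install autoawq

BASE = "NousResearch/Meta-Llama-3-70B-Instruct"
ADAPTER = "radm/Llama-3-70B-Instruct-AH-lora"

# 1) Merge the adapter weights into the base model and save a full checkpoint.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained("merged-fp16")
AutoTokenizer.from_pretrained(BASE).save_pretrained("merged-fp16")

# 2) Quantize the merged checkpoint to 4-bit AWQ.
model = AutoAWQForCausalLM.from_pretrained("merged-fp16")
tokenizer = AutoTokenizer.from_pretrained("merged-fp16")
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
)
model.save_quantized("Llama-3-70B-Instruct-AH-AWQ")
tokenizer.save_pretrained("Llama-3-70B-Instruct-AH-AWQ")
```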

The standalone LoRA adapter for the base model is available at [radm/Llama-3-70B-Instruct-AH-lora](https://huggingface.co/radm/Llama-3-70B-Instruct-AH-lora).
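
If you prefer to run the unquantized model, the adapter can be attached to the base model with `peft`; a minimal sketch:

```python
# Minimal sketch: load the base model in bf16 and attach the LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "radm/Llama-3-70B-Instruct-AH-lora")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-70B-Instruct")
```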

## Model Details

### Model Description

- **Developed by:** radm
- **Model type:** Llama-3-70B
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** NousResearch/Meta-Llama-3-70B-Instruct

## Uses

Intended for use as a judge model for Arena Hard (arena-hard-auto) evaluations: given a benchmark prompt and a pair of candidate answers, it produces a judgment.
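
A usage sketch, assuming the AWQ checkpoint is served with vLLM; the judge prompt shown is a placeholder, and real Arena Hard runs should use the prompt templates from the arena-hard-auto repository:

```python
# Sketch: serving the AWQ model with vLLM for judge-style generation.
from vllm import LLM, SamplingParams

llm = LLM(model="radm/Llama-3-70B-Instruct-AH-AWQ", quantization="awq")
params = SamplingParams(temperature=0.0, max_tokens=512)

# Placeholder prompt; use the official Arena Hard judge template in practice.
prompt = "Compare the two assistant answers below and output a verdict.\n..."
print(llm.generate([prompt], params)[0].outputs[0].text)
```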

## Training Details

### Training Data

Datasets (a loading sketch follows the list):

- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1
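
A minimal sketch of pulling these datasets from the Hub with the `datasets` library:

```python
# Sketch: loading the three training datasets from the Hugging Face Hub.
from datasets import load_dataset

arena_hard = load_dataset("radm/arenahard_gpt4vsllama3")
truthy_ru = load_dataset("radm/truthy-dpo-v0.1-ru")
truthy_en = load_dataset("jondurbin/truthy-dpo-v0.1")
```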

### Training Hyperparameters

- **Training regime:** bf16
- **Load in 4-bit:** True
- **Target modules:** all
- **LoRA rank:** 16
- **Max seq length:** 8192
- **Gradient checkpointing:** unsloth
- **Trainer:** ORPOTrainer
- **Batch size:** 1
- **Gradient accumulation steps:** 4
- **Epochs:** 1
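
A sketch of a training setup matching the hyperparameters above, using Unsloth with TRL's `ORPOTrainer`. The actual training script is not published, so the argument names and the dataset wiring are assumptions:

```python
# Hedged sketch: Unsloth 4-bit LoRA + TRL ORPOTrainer with the settings listed above.
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "NousResearch/Meta-Llama-3-70B-Instruct",
    max_seq_length=8192,  # Max seq length: 8192
    load_in_4bit=True,    # Load in 4-bit: True
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: 16
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all" linear layers
    use_gradient_checkpointing="unsloth",
)

# Assumed ORPO-format dataset (prompt / chosen / rejected columns).
train_dataset = load_dataset("radm/arenahard_gpt4vsllama3", split="train")

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(
        per_device_train_batch_size=1,  # Batch size: 1
        gradient_accumulation_steps=4,  # Gradient accumulation steps: 4
        num_train_epochs=1,             # Epochs: 1
        bf16=True,                      # Training regime: bf16
        max_length=8192,
        output_dir="outputs",
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```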

## Results

[More Information Needed]

## Hardware

- **Hardware Type:** NVIDIA A100 80 GB
- **Hours used:** 11

## Framework versions

- PEFT 0.10.0