---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---

# Model Card for radm/Llama-3-70B-Instruct-AH-AWQ

This model was fine-tuned to act as a judge on [Arena Hard](https://github.com/lm-sys/arena-hard-auto). The base model was trained with LoRA; the resulting adapter was then merged into the base model, and the merged model was converted to AWQ format.
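
The merge-and-convert pipeline itself is not published here; the sketch below shows one plausible way to reproduce it with `peft` and AutoAWQ. The output paths and the `quant_config` values are assumptions, not the settings actually used.

```python
# Hedged sketch: merge the LoRA adapter into the base model, then quantize to AWQ.
# Paths and quant_config values are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM  # pip install autoawq

BASE = "NousResearch/Meta-Llama-3-70B-Instruct"
ADAPTER = "radm/Llama-3-70B-Instruct-AH-lora"

# 1) Merge the adapter weights into the base model and save a full checkpoint.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained("merged-fp16")
AutoTokenizer.from_pretrained(BASE).save_pretrained("merged-fp16")

# 2) Quantize the merged checkpoint to 4-bit AWQ.
model = AutoAWQForCausalLM.from_pretrained("merged-fp16")
tokenizer = AutoTokenizer.from_pretrained("merged-fp16")
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
)
model.save_quantized("Llama-3-70B-Instruct-AH-AWQ")
tokenizer.save_pretrained("Llama-3-70B-Instruct-AH-AWQ")
```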

The standalone LoRA adapter for the base model is available at [radm/Llama-3-70B-Instruct-AH-lora](https://huggingface.co/radm/Llama-3-70B-Instruct-AH-lora).
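
If you prefer to run the unquantized model, the adapter can be attached to the base model with `peft`; a minimal sketch:

```python
# Minimal sketch: load the base model in bf16 and attach the LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "radm/Llama-3-70B-Instruct-AH-lora")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-70B-Instruct")
```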

## Model Details

### Model Description

- **Developed by:** radm
- **Model type:** Llama-3-70B
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** NousResearch/Meta-Llama-3-70B-Instruct

## Uses

Intended for use as a judge model for Arena Hard (arena-hard-auto) evaluations: given a benchmark prompt and a pair of candidate answers, it produces a judgment.
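
A usage sketch, assuming the AWQ checkpoint is served with vLLM; the judge prompt shown is a placeholder, and real Arena Hard runs should use the prompt templates from the arena-hard-auto repository:

```python
# Sketch: serving the AWQ model with vLLM for judge-style generation.
from vllm import LLM, SamplingParams

llm = LLM(model="radm/Llama-3-70B-Instruct-AH-AWQ", quantization="awq")
params = SamplingParams(temperature=0.0, max_tokens=512)

# Placeholder prompt; use the official Arena Hard judge template in practice.
prompt = "Compare the two assistant answers below and output a verdict.\n..."
print(llm.generate([prompt], params)[0].outputs[0].text)
```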

## Training Details

### Training Data

Datasets (a loading sketch follows the list):

- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1
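
A minimal sketch of pulling these datasets from the Hub with the `datasets` library:

```python
# Sketch: loading the three training datasets from the Hugging Face Hub.
from datasets import load_dataset

arena_hard = load_dataset("radm/arenahard_gpt4vsllama3")
truthy_ru = load_dataset("radm/truthy-dpo-v0.1-ru")
truthy_en = load_dataset("jondurbin/truthy-dpo-v0.1")
```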

### Training Hyperparameters

- **Training regime:** bf16
- **Load in 4-bit:** True
- **Target modules:** all
- **LoRA rank:** 16
- **Max seq length:** 8192
- **Gradient checkpointing:** unsloth
- **Trainer:** ORPOTrainer
- **Batch size:** 1
- **Gradient accumulation steps:** 4
- **Epochs:** 1
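
A sketch of a training setup matching the hyperparameters above, using Unsloth with TRL's `ORPOTrainer`. The actual training script is not published, so the argument names and the dataset wiring are assumptions:

```python
# Hedged sketch: Unsloth 4-bit LoRA + TRL ORPOTrainer with the settings listed above.
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "NousResearch/Meta-Llama-3-70B-Instruct",
    max_seq_length=8192,  # Max seq length: 8192
    load_in_4bit=True,    # Load in 4-bit: True
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: 16
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all" linear layers
    use_gradient_checkpointing="unsloth",
)

# Assumed ORPO-format dataset (prompt / chosen / rejected columns).
train_dataset = load_dataset("radm/arenahard_gpt4vsllama3", split="train")

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(
        per_device_train_batch_size=1,  # Batch size: 1
        gradient_accumulation_steps=4,  # Gradient accumulation steps: 4
        num_train_epochs=1,             # Epochs: 1
        bf16=True,                      # Training regime: bf16
        max_length=8192,
        output_dir="outputs",
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```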

## Results

[More Information Needed]

## Hardware

- **Hardware Type:** NVIDIA A100 80 GB
- **Hours used:** 11

## Framework versions

- PEFT 0.10.0