---
datasets:
  - chargoddard/Open-Platypus-Chat
language:
  - en
tags:
  - llama
---

Experimental ReLoRA-trained model using the Open-Platypus dataset. Training ran for one epoch, with three LoRA restarts.

Not recommended for use yet. Mostly tossing this up for testing.

The base model was llama2-22b-blocktriangular.

Relevant training parameters:

```yaml
adapter: qlora
load_in_4bit: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.001
lora_target_linear: true
relora_steps: 150
relora_warmup_steps: 10
gradient_accumulation_steps: 2
micro_batch_size: 3
```
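For intuition about what `relora_steps: 150` does: every 150 steps the low-rank adapter is folded into the base weights and reinitialized, so the next segment of training learns fresh low-rank directions. Below is a minimal plain-PyTorch sketch of that restart idea, using this run's `lora_r: 32` and `lora_alpha: 16`; it is an illustration, not Axolotl's actual implementation, and `LoRALinear`/`relora_restart` are hypothetical names.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, in_features, out_features, r=32, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # base stays frozen
        self.scale = alpha / r
        # Standard LoRA init: A is small random, B starts at zero,
        # so the adapter initially contributes nothing.
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

    @torch.no_grad()
    def relora_restart(self):
        # Fold the current low-rank update into the frozen base weights,
        # then reinitialize the adapter. In a real ReLoRA loop the
        # optimizer state for the adapter is also reset at this point.
        self.base.weight += (self.lora_b @ self.lora_a) * self.scale
        self.lora_a.normal_(std=0.01)
        self.lora_b.zero_()
```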

Uses the same prompt format as Ypotryll-22b. Prefix messages with ` ***System:`, ` ***Query:`, or ` ***Response:`, paying attention to the leading whitespace.
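As an illustration, here is a small helper that builds a prompt in this format. The single leading space before each `***` tag is taken from the card's note on whitespace; joining messages with no extra separator is an assumption, and `build_prompt` is a hypothetical name.

```python
def build_prompt(system: str, query: str) -> str:
    # Each tag is preceded by exactly one space, per the card.
    # No additional separator between messages (assumption).
    return f" ***System: {system} ***Query: {query} ***Response:"

print(build_prompt("You are a helpful assistant.", "Name three uses of LoRA."))
```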

Built with Axolotl

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 52.21 |
| ARC (25-shot)       | 57.68 |
| HellaSwag (10-shot) | 82.44 |
| MMLU (5-shot)       | 55.33 |
| TruthfulQA (0-shot) | 43.61 |
| Winogrande (5-shot) | 77.35 |
| GSM8K (5-shot)      |  6.6  |
| DROP (3-shot)       | 42.46 |