---
language:
  - en
license: apache-2.0
tags:
  - mistral-7b
  - instruct
  - finetune
  - synthetic data
  - distillation
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: Mistral-Syndicate-7B
    results: []
---

Mistral-Syndicate-7B

Model Description

Mistral Syndicate is in no way a state-of-the-art model; rather, it is a fine-tuning experiment to explore the training dynamics specific to large language models. The fine-tuning dataset was generated by a "syndicate" of other open language models, some of similar parameter count and some larger. Each model generated a response to a given instruction, and the group then voted on which model's response was best.

The instructions used as inputs for synthesizing the output labels were a curated subset of VMWare/open-instruct, supplemented with additional instructions synthesized from scratch.
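The card does not say how votes were cast, whether a model could vote for its own response, or how ties were resolved. The following is a minimal sketch of a plurality vote under those gaps, assuming each syndicate member casts exactly one vote for a candidate model and ties go to whichever candidate is listed first; `pick_winner` and the dictionary layouts are hypothetical, not the author's pipeline.

```python
from collections import Counter


def pick_winner(candidates: dict, votes: dict) -> tuple:
    """Return (model_name, response) for the most-voted candidate.

    Hypothetical sketch: `candidates` maps model name -> response text,
    `votes` maps voter model name -> the model name it voted for.
    Ties are broken by the insertion order of `candidates` (an assumption).
    """
    tally = Counter(votes.values())
    # max() returns the first maximal element, so candidate order breaks ties;
    # Counter returns 0 for candidates that received no votes.
    best_model = max(candidates, key=lambda name: tally[name])
    return best_model, candidates[best_model]


candidates = {"model_a": "Paris.", "model_b": "Paris is the capital of France.", "model_c": "Lyon."}
votes = {"model_a": "model_b", "model_b": "model_b", "model_c": "model_a"}
print(pick_winner(candidates, votes))  # ('model_b', 'Paris is the capital of France.')
```

A real syndicate might instead use ranked or scored ballots; plurality voting is used here only because it is the simplest reading of "vote on which response was best".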

Prompt template

With context

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:

### Input:

### Response:
```

Without context

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

### Response:
```
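A minimal sketch of filling these templates in Python. The card leaves the slot after each `### ...:` header empty, so placing the instruction and input text on the line after their headers, and the blank-line spacing between sections, are assumptions modeled on the common Alpaca-style layout; `build_prompt` is a hypothetical helper, not part of any released code.

```python
from typing import Optional

PREAMBLE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request."
)
PREAMBLE_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Assemble a prompt; the text goes on the line after each header (assumed)."""
    if context:
        parts = [
            PREAMBLE_WITH_INPUT,
            f"### Instruction:\n{instruction}",
            f"### Input:\n{context}",
        ]
    else:
        parts = [PREAMBLE_NO_INPUT, f"### Instruction:\n{instruction}"]
    # The model is expected to continue generating after the Response header.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Summarize the text.", "Large language models are trained on text."))
```

An empty `context` string is treated the same as no context, so callers always get the "without context" template when there is nothing to put under `### Input:`.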

Evaluation Results

12.30.23

| Benchmark | Result |
|---|---:|
| ARC | 60.84 |
| HellaSwag | 82.91 |
| MMLU | 60.83 |
| TruthfulQA | 43.71 |
| Winogrande | 78.61 |
| GSM8K | 44.50 |

Open LLM Leaderboard (v1) Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---:|
| Avg. | 61.90 |
| AI2 Reasoning Challenge (25-shot) | 60.84 |
| HellaSwag (10-shot) | 82.91 |
| MMLU (5-shot) | 60.83 |
| TruthfulQA (0-shot) | 43.71 |
| Winogrande (5-shot) | 78.61 |
| GSM8K (5-shot) | 44.50 |

Open LLM Leaderboard (v2) Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---:|
| Avg. | 13.85 |
| IFEval (0-shot) | 24.96 |
| BBH (3-shot) | 20.51 |
| MATH Lvl 5 (4-shot) | 2.42 |
| GPQA (0-shot) | 3.47 |
| MuSR (0-shot) | 13.62 |
| MMLU-PRO (5-shot) | 18.13 |
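The Avg. rows in the two Open LLM Leaderboard tables are consistent with an unweighted arithmetic mean of the six listed scores. The short check below recomputes both averages from the table values; the dictionary and function names are illustrative only.

```python
from statistics import mean

# Scores copied from the two leaderboard tables above.
v1_scores = {"ARC": 60.84, "HellaSwag": 82.91, "MMLU": 60.83,
             "TruthfulQA": 43.71, "Winogrande": 78.61, "GSM8K": 44.50}
v2_scores = {"IFEval": 24.96, "BBH": 20.51, "MATH Lvl 5": 2.42,
             "GPQA": 3.47, "MuSR": 13.62, "MMLU-PRO": 18.13}


def leaderboard_average(scores: dict) -> float:
    """Unweighted mean of all benchmark scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


print(leaderboard_average(v1_scores))  # 61.9
print(leaderboard_average(v2_scores))  # 13.85
```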