Mistral-Syndicate-7B
Model Description:
Mistral Syndicate is in no way a state-of-the-art model; rather, it is a fine-tuning experiment to explore the training dynamics specific to large language models. The fine-tuning dataset was generated by a "syndicate" of other open language models, some of similar parameter size and some larger. Each model generated a response to a given instruction, and the group then voted on which model's response was best.
The instruction inputs used for the output label synthesis were a curated subset of VMWare/open-instruct with additional instructions synthesized from scratch.
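The voting step described above can be pictured with a small sketch. This card does not specify the actual voting procedure, so the plurality rule, the alphabetical tie-break, and all names here (`pick_winner`, `model_a`, and so on) are assumptions for illustration only.

```python
from collections import Counter


def pick_winner(responses, ballots):
    """Pick the response with the most votes (hypothetical plurality rule).

    responses: dict mapping model name -> that model's response text
    ballots:   dict mapping voter name -> name of the model it voted for
    """
    unknown = set(ballots.values()) - set(responses)
    if unknown:
        raise ValueError(f"votes cast for unknown models: {sorted(unknown)}")
    tally = Counter(ballots.values())
    if not tally:
        raise ValueError("no ballots were cast")
    # Highest vote count wins; ties are broken alphabetically (an assumption).
    winner = min(tally, key=lambda name: (-tally[name], name))
    return winner, responses[winner]


# Made-up example: three models answer one instruction, then each casts a vote.
responses = {
    "model_a": "Paris.",
    "model_b": "The capital of France is Paris.",
    "model_c": "France.",
}
ballots = {"model_a": "model_b", "model_b": "model_b", "model_c": "model_a"}
winner, best_response = pick_winner(responses, ballots)
print(winner, "->", best_response)
```

In this sketch the winning response would become the output label for the instruction in the fine-tuning set.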
Prompt template
With context
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
### Input:
### Response:
Without context
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
### Response:
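As a minimal sketch, the two templates above can be filled in programmatically. The header text is taken from this card; the exact whitespace between sections (blank lines, trailing newline) and the helper name `build_prompt` are assumptions, since the card's original formatting is not preserved here.

```python
# Template text follows the card; spacing between sections is assumed.
PROMPT_WITH_CONTEXT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)

PROMPT_NO_CONTEXT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction, context=None):
    """Return the prompt string, using the 'with context' form if context is given."""
    if context:
        return PROMPT_WITH_CONTEXT.format(instruction=instruction, context=context)
    return PROMPT_NO_CONTEXT.format(instruction=instruction)


print(build_prompt("Summarize the passage in one sentence.", "Some passage text."))
```

The returned string ends right after `### Response:`, so the model's generation continues from that point.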
Evaluation Results
12.30.23
Benchmark | Result |
---|---|
ARC | 60.84 |
HellaSwag | 82.91 |
MMLU | 60.83 |
TruthfulQA | 43.71 |
Winogrande | 78.61 |
GSM8K | 44.50 |
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 61.90 |
AI2 Reasoning Challenge (25-shot) | 60.84 |
HellaSwag (10-shot) | 82.91 |
MMLU (5-shot) | 60.83 |
TruthfulQA (0-shot) | 43.71 |
Winogrande (5-shot) | 78.61 |
GSM8K (5-shot) | 44.50 |
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 13.85 |
IFEval (0-shot) | 24.96 |
BBH (3-shot) | 20.51 |
MATH Lvl 5 (4-shot) | 2.42 |
GPQA (0-shot) | 3.47 |
MuSR (0-shot) | 13.62 |
MMLU-PRO (5-shot) | 18.13 |
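As a quick arithmetic check, the "Avg." row in each leaderboard table matches the unweighted mean of the six scores listed under it. The sketch below recomputes both from the values in the tables; the variable and function names are illustrative.

```python
# Scores copied from the two leaderboard tables above.
first_table = {
    "ARC": 60.84,
    "HellaSwag": 82.91,
    "MMLU": 60.83,
    "TruthfulQA": 43.71,
    "Winogrande": 78.61,
    "GSM8K": 44.50,
}
second_table = {
    "IFEval": 24.96,
    "BBH": 20.51,
    "MATH Lvl 5": 2.42,
    "GPQA": 3.47,
    "MuSR": 13.62,
    "MMLU-PRO": 18.13,
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(mean_score(first_table))   # 61.9
print(mean_score(second_table))  # 13.85
```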