Testing Might be broken
Another trial of merging models of different sizes. This is still under testing and unstable at the moment.
Recipe:

```yaml
merge_method: task_anysize
base_model: princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
models:
  - model: SanjiWatsuki/Silicon-Maid-7B
    parameters:
      weight: 0.1
dtype: float16
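```

A recipe like the one above is normally applied with mergekit's `mergekit-yaml` entry point. This is a sketch only: it assumes a mergekit build that supports the `task_anysize` method, and the file and output paths are illustrative.

```shell
# Save the recipe above as merge-config.yaml, then run the merge.
# ./merged-model is an illustrative output directory; --cuda is optional.
mergekit-yaml merge-config.yaml ./merged-model --cuda
```

The merged weights land in the output directory in Hugging Face format and can be loaded like any other checkpoint.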
Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 35.82 |
| AI2 Reasoning Challenge (25-Shot) | 36.18 |
| HellaSwag (10-Shot) | 51.12 |
| MMLU (5-Shot) | 25.56 |
| TruthfulQA (0-shot) | 44.85 |
| Winogrande (5-shot) | 57.22 |
| GSM8k (5-shot) | 0.00 |