cstr's picture
Update README.md
5f651ca verified
---
base_model:
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
tags:
- merge
- mergekit
- lazymergekit
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
license: llama3
language:
- en
- de
---
# llama3.1-8b-spaetzle-v90
llama3.1-8b-spaetzle-v90 is a progressive merge of merges.
# evaluation
German EQ-Bench v2_de: 69.93 (171/171). English (v2): 77.88 (171/171)
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__llama3.1-8b-spaetzle-v90)
| Metric |Value|
|-------------------|----:|
|Avg. |27.59|
|IFEval (0-Shot) |73.56|
|BBH (3-Shot) |32.76|
|MATH Lvl 5 (4-Shot)|13.37|
|GPQA (0-shot) | 4.36|
|MuSR (0-shot) |11.15|
|MMLU-PRO (5-shot) |30.34|
| Model |AGIEval|TruthfulQA|Bigbench|
|--------------------------------------------------------------------------------|------:|---------:|-------:|
|[llama3.1-8b-spaetzle-v90](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90)| 42.05| 57.2| 44.75|
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |24.02|± | 2.69|
| | |acc_norm|23.62|± | 2.67|
|agieval_logiqa_en | 0|acc |40.09|± | 1.92|
| | |acc_norm|39.78|± | 1.92|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
| | |acc_norm|21.74|± | 2.73|
|agieval_lsat_lr | 0|acc |50.39|± | 2.22|
| | |acc_norm|45.29|± | 2.21|
|agieval_lsat_rc | 0|acc |64.31|± | 2.93|
| | |acc_norm|58.36|± | 3.01|
|agieval_sat_en | 0|acc |81.07|± | 2.74|
| | |acc_norm|73.79|± | 3.07|
|agieval_sat_en_without_passage| 0|acc |45.15|± | 3.48|
| | |acc_norm|38.83|± | 3.40|
|agieval_sat_math | 0|acc |40.91|± | 3.32|
| | |acc_norm|35.00|± | 3.22|
Average: 42.05%
### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |39.66|± | 1.71|
| | |mc2 |57.20|± | 1.51|
Average: 57.2%
### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|58.42|± | 3.59|
|bigbench_date_understanding | 0|multiple_choice_grade|70.46|± | 2.38|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|31.40|± | 2.89|
|bigbench_geometric_shapes | 0|multiple_choice_grade|33.43|± | 2.49|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.00|± | 2.05|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|24.29|± | 1.62|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|56.00|± | 2.87|
|bigbench_movie_recommendation | 0|multiple_choice_grade|38.20|± | 2.18|
|bigbench_navigate | 0|multiple_choice_grade|50.20|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|69.50|± | 1.03|
|bigbench_ruin_names | 0|multiple_choice_grade|54.46|± | 2.36|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|32.77|± | 1.49|
|bigbench_snarks | 0|multiple_choice_grade|65.19|± | 3.55|
|bigbench_sports_understanding | 0|multiple_choice_grade|50.30|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|45.70|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.08|± | 1.17|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.03|± | 0.90|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|56.00|± | 2.87|
Average: 44.75%
# merge tree
The merge tree involves the following models:
- NousResearch/Hermes-3-Llama-3.1-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- Dampfinchen/Llama-3.1-8B-Ultra-Instruct
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
- akjindal53244/Llama-3.1-Storm-8B
- nbeerbower/llama3.1-gutenberg-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
- nbeerbower/llama-3-wissenschaft-8B-v2
- Azure99/blossom-v5-llama3-8b
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- princeton-nlp/Llama-3-Instruct-8B-SimPO
- Locutusque/llama-3-neural-chat-v1-8b
- Locutusque/Llama-3-Orca-1.0-8B
- DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental
- seedboxai/Llama-3-Kafka-8B-v0.2
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- nbeerbower/llama-3-wissenschaft-8B-v2
- mlabonne/Daredevil-8B-abliterated-dpomix
There have been a number of steps involved, among which, slep merging of only middle layers compensating for tokenizer / chat template differences. An illustration below.
## 🧩 Configuration
The final merge for this was:
```yaml
models:
- model: cstr/llama3.1-8b-spaetzle-v59
# no parameters necessary for base model
- model: cstr/llama3.1-8b-spaetzle-v85
parameters:
density: 0.65
weight: 0.3
- model: cstr/llama3.1-8b-spaetzle-v86
parameters:
density: 0.65
weight: 0.3
- model: cstr/llama3.1-8b-spaetzle-v74
parameters:
density: 0.65
weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
Among the previous steps:
```yaml
models:
- model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
t:
- value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
```
## 💻 Usage
Use with llama3 chat template as common. Here are GGUF quants for use with llama.cpp & wrappers as e.g. ollama: [cstr/llama3.1-8b-spaetzle-v90-GGUF](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90-GGUF)