|
--- |
|
base_model: |
|
- cstr/llama3.1-8b-spaetzle-v85 |
|
- cstr/llama3.1-8b-spaetzle-v86 |
|
- cstr/llama3.1-8b-spaetzle-v74 |
|
tags: |
|
- merge |
|
- mergekit |
|
- lazymergekit |
|
- cstr/llama3.1-8b-spaetzle-v85 |
|
- cstr/llama3.1-8b-spaetzle-v86 |
|
- cstr/llama3.1-8b-spaetzle-v74 |
|
license: llama3 |
|
language: |
|
- en |
|
- de |
|
--- |
|
|
|
# llama3.1-8b-spaetzle-v90 |
|
|
|
llama3.1-8b-spaetzle-v90 is a progressive merge of merges: it combines several earlier llama3.1-8b-spaetzle merges (v85, v86, and v74 on top of v59 as base), built with mergekit / LazyMergekit.
|
|
|
# Evaluation
|
|
|
EQ-Bench: 69.93 on the German v2_de version (171/171) and 77.88 on the English v2 version (171/171).
|
|
|
[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__llama3.1-8b-spaetzle-v90).
|
|
|
| Metric |Value| |
|
|-------------------|----:| |
|
|Avg. |27.59| |
|
|IFEval (0-Shot) |73.56| |
|
|BBH (3-Shot) |32.76| |
|
|MATH Lvl 5 (4-Shot)|13.37| |
|
|GPQA (0-shot) | 4.36| |
|
|MuSR (0-shot) |11.15| |
|
|MMLU-PRO (5-shot) |30.34| |
|
|
|
| Model |AGIEval|TruthfulQA|Bigbench| |
|
|--------------------------------------------------------------------------------|------:|---------:|-------:| |
|
|[llama3.1-8b-spaetzle-v90](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90)| 42.05| 57.2| 44.75| |
|
|
|
### AGIEval |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------|------:|--------|----:|---|-----:| |
|
|agieval_aqua_rat | 0|acc |24.02|± | 2.69| |
|
| | |acc_norm|23.62|± | 2.67| |
|
|agieval_logiqa_en | 0|acc |40.09|± | 1.92| |
|
| | |acc_norm|39.78|± | 1.92| |
|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75| |
|
| | |acc_norm|21.74|± | 2.73| |
|
|agieval_lsat_lr | 0|acc |50.39|± | 2.22| |
|
| | |acc_norm|45.29|± | 2.21| |
|
|agieval_lsat_rc | 0|acc |64.31|± | 2.93| |
|
| | |acc_norm|58.36|± | 3.01| |
|
|agieval_sat_en | 0|acc |81.07|± | 2.74| |
|
| | |acc_norm|73.79|± | 3.07| |
|
|agieval_sat_en_without_passage| 0|acc |45.15|± | 3.48| |
|
| | |acc_norm|38.83|± | 3.40| |
|
|agieval_sat_math | 0|acc |40.91|± | 3.32| |
|
| | |acc_norm|35.00|± | 3.22| |
|
|
|
Average: 42.05% |
|
|
|
### TruthfulQA |
|
| Task |Version|Metric|Value| |Stderr| |
|
|-------------|------:|------|----:|---|-----:| |
|
|truthfulqa_mc| 1|mc1 |39.66|± | 1.71| |
|
| | |mc2 |57.20|± | 1.51| |
|
|
|
Average: 57.2% |
|
|
|
### Bigbench |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------------------------|------:|---------------------|----:|---|-----:| |
|
|bigbench_causal_judgement | 0|multiple_choice_grade|58.42|± | 3.59| |
|
|bigbench_date_understanding | 0|multiple_choice_grade|70.46|± | 2.38| |
|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|31.40|± | 2.89| |
|
|bigbench_geometric_shapes | 0|multiple_choice_grade|33.43|± | 2.49| |
|
| | |exact_str_match | 0.00|± | 0.00| |
|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.00|± | 2.05| |
|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|24.29|± | 1.62| |
|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|56.00|± | 2.87| |
|
|bigbench_movie_recommendation | 0|multiple_choice_grade|38.20|± | 2.18| |
|
|bigbench_navigate | 0|multiple_choice_grade|50.20|± | 1.58| |
|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|69.50|± | 1.03| |
|
|bigbench_ruin_names | 0|multiple_choice_grade|54.46|± | 2.36| |
|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|32.77|± | 1.49| |
|
|bigbench_snarks | 0|multiple_choice_grade|65.19|± | 3.55| |
|
|bigbench_sports_understanding | 0|multiple_choice_grade|50.30|± | 1.59| |
|
|bigbench_temporal_sequences | 0|multiple_choice_grade|45.70|± | 1.58| |
|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.08|± | 1.17| |
|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.03|± | 0.90| |
|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|56.00|± | 2.87| |
|
|
|
Average: 44.75% |
|
|
|
# Merge tree
|
|
|
The merge tree involves the following models: |
|
|
|
- NousResearch/Hermes-3-Llama-3.1-8B |
|
- Undi95/Meta-Llama-3.1-8B-Claude |
|
- Dampfinchen/Llama-3.1-8B-Ultra-Instruct |
|
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct |
|
- akjindal53244/Llama-3.1-Storm-8B |
|
- nbeerbower/llama3.1-gutenberg-8B |
|
- Undi95/Meta-Llama-3.1-8B-Claude |
|
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 |
|
- nbeerbower/llama-3-wissenschaft-8B-v2 |
|
- Azure99/blossom-v5-llama3-8b |
|
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct |
|
- princeton-nlp/Llama-3-Instruct-8B-SimPO |
|
- Locutusque/llama-3-neural-chat-v1-8b |
|
- Locutusque/Llama-3-Orca-1.0-8B |
|
- DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental |
|
- seedboxai/Llama-3-Kafka-8B-v0.2 |
|
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct |
|
- nbeerbower/llama-3-wissenschaft-8B-v2 |
|
- mlabonne/Daredevil-8B-abliterated-dpomix |
|
|
|
The merge was built in a number of steps, among them SLERP merges of only the middle layers to compensate for tokenizer and chat-template differences between the source models. An illustration is given below.
|
|
|
## 🧩 Configuration |
|
|
|
The final merge step used the following configuration:
|
|
|
```yaml |
|
models:
  - model: cstr/llama3.1-8b-spaetzle-v59
    # no parameters necessary for base model
  - model: cstr/llama3.1-8b-spaetzle-v85
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v86
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v74
    parameters:
      density: 0.65
      weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
|
``` |
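To run such a config, something along the following lines should work with mergekit's Python API. This is only a sketch following the LazyMergekit pattern: it assumes `pip install mergekit`, the configuration above saved as a hypothetical `config.yaml`, and option names that may vary between mergekit versions.

```python
# Sketch: execute the DARE-TIES config above with mergekit's Python API.
# Assumes the YAML is saved as config.yaml and mergekit is installed.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./llama3.1-8b-spaetzle-v90",  # hypothetical output directory
    options=MergeOptions(
        cuda=False,           # set True to merge on GPU
        copy_tokenizer=True,  # write tokenizer files into the output directory
        lazy_unpickle=True,   # reduce peak memory while loading shards
        low_cpu_memory=False,
    ),
)
```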
|
|
|
Among the earlier steps was, for example, this SLERP of NousResearch/Hermes-3-Llama-3.1-8B into v74, where the `t` curve blends only the middle layers (the zeros at both ends leave the outermost layers of the base model untouched):
|
```yaml |
|
models:
  - model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
|
``` |
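As a rough illustration of what such a SLERP step does per tensor, here is a minimal sketch of spherical linear interpolation between two weight tensors. This is not mergekit's actual implementation; `numpy` arrays simply stand in for model weights.

```python
# Illustrative sketch of SLERP between two weight tensors: t = 0 returns the
# base tensor unchanged, larger t blends toward the other model, as in the
# per-layer curve above.
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two tensors, treated as high-dimensional vectors.
    cos_omega = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat) + eps
    )
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < eps:
        # Nearly colinear tensors: fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Example: blend a middle layer 70% toward the second model.
base_layer = np.random.randn(16, 16)
other_layer = np.random.randn(16, 16)
merged_layer = slerp(0.7, base_layer, other_layer)
```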
|
|
|
## 💻 Usage |
|
|
|
Use the Llama 3 chat template, as usual. GGUF quants for use with llama.cpp and wrappers such as ollama are available at [cstr/llama3.1-8b-spaetzle-v90-GGUF](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90-GGUF).
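For the full-precision model, a minimal Transformers inference sketch is shown below. It assumes a GPU with enough memory for the 8B model in bfloat16; the example prompt is only illustrative.

```python
# Minimal inference sketch with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cstr/llama3.1-8b-spaetzle-v90"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Was sind Spätzle?"},
]
# apply_chat_template formats the conversation with the Llama 3 chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```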
|
|
|
|