Part of the Spaetzle collection: German-English models, mostly merged, some SFT/DPO.
These are q4_k_m quants made with llama.cpp b3472 from cstr/llama3.1-8b-spaetzle-v90, which is a progressive merge of merges.
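For reference, a typical way to produce such quants with llama.cpp looks roughly like the sketch below. This is illustrative only: the local paths and GGUF file names are assumptions, not the exact commands used for this repo.

import subprocess

# Assumed local paths / file names, for illustration only.
HF_MODEL_DIR = "./llama3.1-8b-spaetzle-v90"                   # local checkout of the merged model
F16_GGUF = "llama3.1-8b-spaetzle-v90-f16.gguf"
Q4_GGUF = "llama3.1-8b-spaetzle-v90-q4_k_m.gguf"

# 1) Convert the HF checkpoint to a fp16 GGUF with llama.cpp's converter script.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) Quantize the fp16 GGUF to Q4_K_M with the llama-quantize tool.
subprocess.run(["./llama-quantize", F16_GGUF, Q4_GGUF, "Q4_K_M"], check=True)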
EQ-Bench v2_de: 69.93 (171/171).
The merge tree involves the models named in the configs below. A number of steps were involved, among them SLERP merging of only the middle layers to compensate for tokenizer / chat template differences; an illustration is given further down.
The final merge for this was:
models:
  - model: cstr/llama3.1-8b-spaetzle-v59
    # no parameters necessary for base model
  - model: cstr/llama3.1-8b-spaetzle-v85
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v86
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v74
    parameters:
      density: 0.65
      weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
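A merge like this can be reproduced by saving the config above to a file and passing it to mergekit's mergekit-yaml entry point. A minimal sketch; the config and output paths are assumed names:

import subprocess

CONFIG_PATH = "spaetzle-v90.yml"           # the dare_ties config above, saved to disk (assumed name)
OUTPUT_DIR = "./llama3.1-8b-spaetzle-v90"  # assumed output directory

subprocess.run(["mergekit-yaml", CONFIG_PATH, OUTPUT_DIR], check=True)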
Among the previous steps:
models:
  - model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
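The per-layer t schedule above is what realizes the "middle layers only" blending: t = 0 at the outer layers keeps the base model (v74) untouched, while the middle layers interpolate progressively toward Hermes-3. A rough sketch of the underlying SLERP operation, illustrative only and not mergekit's actual code:

import numpy as np

def slerp(t: float, w0: np.ndarray, w1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    v0 = w0 / (np.linalg.norm(w0) + eps)
    v1 = w1 / (np.linalg.norm(w1) + eps)
    dot = float(np.clip(np.dot(v0, v1), -1.0, 1.0))
    omega = np.arccos(dot)
    if omega < eps:  # nearly parallel weights: fall back to plain linear interpolation
        return (1.0 - t) * w0 + t * w1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * w0 + (np.sin(t * omega) / so) * w1

# One interpolation factor per layer group, as in the config above:
# t = 0 keeps the base model's weights exactly, larger t blends toward the other model.
t_schedule = [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]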
Use with the Llama 3 chat template, as usual. The q4_k_m quants here are made from cstr/llama3.1-8b-spaetzle-v90.
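For example, with llama-cpp-python and its built-in Llama 3 chat format; the GGUF file name below is an assumption, use whichever quant file you downloaded:

from llama_cpp import Llama

llm = Llama(
    model_path="llama3.1-8b-spaetzle-v90-q4_k_m.gguf",  # assumed local file name
    n_ctx=8192,
    chat_format="llama-3",  # apply the Llama 3 chat template
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful bilingual assistant."},
        {"role": "user", "content": "Explain briefly, in German, what a model merge is."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])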