---
language:
- de
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- finetune
- chatml
- augmentation
- german
---
![SauerkrautLM](images/hero.png "SauerkrautLM-7b-HerO")
## VAGO solutions SauerkrautLM-7b-HerO
Introducing **SauerkrautLM-7b-HerO** – the pinnacle of German language model technology!
Crafted through the **merging** of **[Teknium's OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)** and **[Open-Orca's Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)**, this model is **uniquely fine-tuned with the Sauerkraut dataset.**
SauerkrautLM-7b-HerO represents a breakthrough in language modeling, achieving an optimal balance between extensive German data and essential international sources.
This ensures the model not only excels at understanding the nuances of the German language but also retains its global capabilities.
Harnessing the innovative power of the **gradient SLERP method from MergeKit**, we've achieved a groundbreaking fusion of two of the best-performing 7B models based on the Mistral architecture.
This merge has allowed us to combine the best features of both models, creating an unparalleled synergy.
Coupled with the German Sauerkraut dataset, which consists of a mix of augmented and translated data, we have successfully taught the English-speaking merged model the intricacies of the German language.
This was achieved *without the loss of core competencies that typically occurs when models trained mainly in English are fine-tuned in another language.*
Our approach ensures that the model retains its original strengths while acquiring a profound understanding of German, **setting a new benchmark in bilingual language model proficiency.**
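For a quick start, the snippet below is a minimal usage sketch with the `transformers` library. The repository id, the assumption that the tokenizer ships a ChatML chat template, and the generation settings are illustrative rather than part of this card; adapt them to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-HerO"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The model is tagged "chatml", so the conversation is formatted accordingly.
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},  # "You are a helpful assistant."
    {"role": "user", "content": "Erkläre kurz, was Modell-Merging ist."},  # "Briefly explain what model merging is."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```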
## All HerO Models
SauerkrautLM-7b-HerO was merged on 1 A100 with [mergekit](https://github.com/cg123/mergekit).
The merged model contains [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) and [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca).
We applied the gradient SLERP method.
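For intuition about the merge itself: SLERP interpolates between two weight tensors along an arc rather than a straight line, and the "gradient" variant varies the interpolation factor smoothly across layers instead of holding it constant. The sketch below illustrates the idea only; it is not mergekit's actual implementation, which is driven by a YAML merge config.

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w_a, t=1 returns w_b; a "gradient" merge varies t
    per layer, e.g. ramping it from 0 to 1 across the network.
    """
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors on the unit sphere.
    cos_omega = torch.dot(a / (a.norm() + eps), b / (b.norm() + eps))
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    if omega.abs() < 1e-4:  # nearly colinear: fall back to plain lerp
        merged = (1.0 - t) * a + t * b
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return merged.reshape(w_a.shape).to(w_a.dtype)

# Example: merge one layer's weights, slightly favoring the second model.
layer_a, layer_b = torch.randn(1024, 1024), torch.randn(1024, 1024)
merged_layer = slerp(layer_a, layer_b, t=0.6)
```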
- **Model Type:** SauerkrautLM-7b-HerO is an auto-regressive language model based on the transformer architecture
```
score
model turn
SauerkrautLM-70b-v1 1 7.25000
SauerkrautLM-7b-HerO <--- 1 6.96875
SauerkrautLM-7b-v1-mistral 1 6.30625
leo-hessianai-13b-chat 1 6.18750
SauerkrautLM-13b-v1 1 6.16250

score
model turn
SauerkrautLM-70b-v1 2 6.83125
SauerkrautLM-7b-HerO <--- 2 6.30625
vicuna-13b-v1.5 2 5.63125
SauerkrautLM-13b-v1 2 5.34375
SauerkrautLM-7b-v1-mistral 2 5.26250

score
model
SauerkrautLM-70b-v1 7.040625
SauerkrautLM-7b-HerO <--- 6.637500
SauerkrautLM-7b-v1-mistral 5.784375
SauerkrautLM-13b-v1 5.753125
vicuna-13b-v1.5 5.715625

score
model turn
OpenHermes-2.5-Mistral-7B 1 8.21875
SauerkrautLM-7b-HerO <--- 1 8.03125
Mistral-7B-OpenOrca 1 7.65625
neural-chat-7b-v3-1 1 7.22500

score
model turn
OpenHermes-2.5-Mistral-7B 2 7.1000
SauerkrautLM-7b-HerO <--- 2 6.7875
neural-chat-7b-v3-1 2 6.4000
Mistral-7B-OpenOrca 2 6.1750

score
model
OpenHermes-2.5-Mistral-7B 7.659375
SauerkrautLM-7b-HerO <--- 7.409375
Mistral-7B-OpenOrca 6.915625
neural-chat-7b-v3-1 6.812500
```
We are also keenly seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us.
## Acknowledgement
Many thanks to [OpenOrca](https://huggingface.co/Open-Orca) and [teknium](https://huggingface.co/teknium) for providing such valuable models to the Open-Source community.