---
language:
- de
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- finetune
- chatml
- augmentation
- german
---
![SauerkrautLM](images/hero.png "SauerkrautLM-7b-HerO")
## VAGO solutions SauerkrautLM-7b-HerO
Introducing **SauerkrautLM-7b-HerO** – the pinnacle of German language model technology!
Crafted through the **merging** of **[Teknium's OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)** and **[Open-Orca's Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)**, this model is **uniquely fine-tuned with the Sauerkraut dataset.**
SauerkrautLM-7b-HerO represents a breakthrough in language modeling, achieving an optimal balance between extensive German data and essential international sources.
This ensures the model not only excels at understanding the nuances of the German language but also retains its global capabilities.
Harnessing the innovative power of the **gradient SLERP method from MergeKit**, we've achieved a groundbreaking fusion of two of the best-performing 7B models based on the Mistral architecture.
This merge has allowed us to combine the best features of both models, creating an unparalleled synergy.
Coupled with the German Sauerkraut dataset, which consists of a mix of augmented and translated data, we have successfully taught the English-speaking merged model the intricacies of the German language.
This was achieved *without the loss of core competencies that typically occurs when models trained mainly in English are fine-tuned in another language.*
Our approach ensures that the model retains its original strengths while acquiring a profound understanding of German, **setting a new benchmark in bilingual language model proficiency.**
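For a quick start, the snippet below is a minimal usage sketch with the `transformers` library. The repository id, the assumption that the tokenizer ships a ChatML chat template, and the generation settings are illustrative rather than part of this card; adapt them to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-HerO"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The model is tagged "chatml", so the conversation is formatted accordingly.
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},  # "You are a helpful assistant."
    {"role": "user", "content": "Erkläre kurz, was Modell-Merging ist."},  # "Briefly explain what model merging is."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```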
## All HerO Models
SauerkrautLM-7b-HerO was merged on 1 A100 with [mergekit](https://github.com/cg123/mergekit).
The merged model contains [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) and [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca).
We applied the gradient SLERP method.
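For intuition about the merge itself: SLERP interpolates between two weight tensors along an arc rather than a straight line, and the "gradient" variant varies the interpolation factor smoothly across layers instead of holding it constant. The sketch below illustrates the idea only; it is not mergekit's actual implementation, which is driven by a YAML merge config.

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w_a, t=1 returns w_b; a "gradient" merge varies t
    per layer, e.g. ramping it from 0 to 1 across the network.
    """
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors on the unit sphere.
    cos_omega = torch.dot(a / (a.norm() + eps), b / (b.norm() + eps))
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    if omega.abs() < 1e-4:  # nearly colinear: fall back to plain lerp
        merged = (1.0 - t) * a + t * b
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return merged.reshape(w_a.shape).to(w_a.dtype)

# Example: merge one layer's weights, slightly favoring the second model.
layer_a, layer_b = torch.randn(1024, 1024), torch.randn(1024, 1024)
merged_layer = slerp(layer_a, layer_b, t=0.6)
```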
- **Model Type:** SauerkrautLM-7b-HerO is an auto-regressive language model based on the transformer architecture
```
score
model turn
SauerkrautLM-70b-v1 1 7.25000
SauerkrautLM-7b-HerO <--- 1 6.96875
SauerkrautLM-7b-v1-mistral 1 6.30625
leo-hessianai-13b-chat 1 6.18750
SauerkrautLM-13b-v1 1 6.16250

score
model turn
SauerkrautLM-70b-v1 2 6.83125
SauerkrautLM-7b-HerO <--- 2 6.30625
vicuna-13b-v1.5 2 5.63125
SauerkrautLM-13b-v1 2 5.34375
SauerkrautLM-7b-v1-mistral 2 5.26250

score
model
SauerkrautLM-70b-v1 7.040625
SauerkrautLM-7b-HerO <--- 6.637500
SauerkrautLM-7b-v1-mistral 5.784375
SauerkrautLM-13b-v1 5.753125
vicuna-13b-v1.5 5.715625

score
model turn
OpenHermes-2.5-Mistral-7B 1 8.21875
SauerkrautLM-7b-HerO <--- 1 8.03125
Mistral-7B-OpenOrca 1 7.65625
neural-chat-7b-v3-1 1 7.22500

score
model turn
OpenHermes-2.5-Mistral-7B 2 7.1000
SauerkrautLM-7b-HerO <--- 2 6.7875
neural-chat-7b-v3-1 2 6.4000
Mistral-7B-OpenOrca 2 6.1750

score
model
OpenHermes-2.5-Mistral-7B 7.659375
SauerkrautLM-7b-HerO <--- 7.409375
Mistral-7B-OpenOrca 6.915625
neural-chat-7b-v3-1 6.812500
```
We are also keenly seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us.
## Acknowledgement
Many thanks to [OpenOrca](https://huggingface.co/Open-Orca) and [teknium](https://huggingface.co/teknium) for providing such valuable models to the Open-Source community.