Update README.md
README.md CHANGED
@@ -32,14 +32,14 @@ Our approach ensures that the model retains its original strengths while acquiri
    - [Training Dataset](#training-dataset)
    - [Merge Procedure](#merge-procedure)
 3. [Evaluation](#evaluation)
-   - [
-   - [MT-Bench (English)](#mt-bench-english)
+   - [GPT4ALL](#gpt4all)
    - [Language Model evaluation Harness](#language-model-evaluation-harness)
    - [BigBench](#BBH)
-   - [
+   - [MT-Bench (German)](#mt-bench-german)
+   - [MT-Bench (English)](#mt-bench-english)
    - [Additional German Benchmark results](#additional-german-benchmark-results)
-
-
+5. [Disclaimer](#disclaimer)
+6. [Contact](#contact)
 7. [Collaborations](#collaborations)
 8. [Acknowledgement](#acknowledgement)
@@ -174,7 +174,11 @@ SauerkrautLM-7b-HerO <--- 7.409375
 Mistral-7B-OpenOrca             6.915625
 neural-chat-7b-v3-1             6.812500
 ```
+### GPT4ALL:
+Compared to Aleph Alpha Luminous Models, LeoLM and EM_German
+![GPT4ALL diagram](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All.png "SauerkrautLM-7b-HerO GPT4ALL Diagram")
 
+![GPT4ALL table](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All-Tabelle.png "SauerkrautLM-7b-HerO GPT4ALL Table")
 
 ### Language Model evaluation Harness:
 Compared to Aleph Alpha Luminous Models
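The GPT4ALL, Harness and BBH comparisons referenced in this hunk are reported as scores from EleutherAI's lm-evaluation-harness ("performed with newest Language Model Evaluation Harness"). The snippet below is a minimal sketch of how such numbers can be reproduced with the harness's Python API; the task list, the Hugging Face repo id `VAGOsolutions/SauerkrautLM-7b-HerO`, and the metric keys are assumptions, since the commit does not state the exact harness version or task configuration used.

```python
# Sketch: score the model on a GPT4ALL-style task suite with lm-evaluation-harness.
# Assumptions: lm_eval >= 0.4 Python API, the HF repo id, and the task selection.
import lm_eval

GPT4ALL_TASKS = [
    "arc_challenge", "arc_easy", "boolq",
    "hellaswag", "openbookqa", "piqa", "winogrande",
]

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VAGOsolutions/SauerkrautLM-7b-HerO,dtype=bfloat16",
    tasks=GPT4ALL_TASKS,
    batch_size=8,
)

# Print one accuracy-style metric per task; the key layout differs between
# harness versions, so fall back gracefully.
for task, metrics in results["results"].items():
    acc = metrics.get("acc_norm,none", metrics.get("acc,none", metrics.get("acc")))
    print(f"{task:15s} {acc}")
```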
@@ -184,11 +188,95 @@ Compared to Aleph Alpha Luminous Models
 ### BBH:
 ![BBH](https://vago-solutions.de/wp-content/uploads/2023/11/bbh.png "SauerkrautLM-7b-HerO BBH")
 *performed with newest Language Model Evaluation Harness
-###
-
-
+### MT-Bench (German):
+![MT-Bench German Diagram](https://vago-solutions.de/wp-content/uploads/2023/11/MT-Bench-German.png "SauerkrautLM-7b-HerO MT-Bench German Diagram")
+```
+########## First turn ##########
+                                      score
+model                           turn
+SauerkrautLM-70b-v1             1     7.25000
+SauerkrautLM-7b-HerO <---       1     6.96875
+SauerkrautLM-7b-v1-mistral      1     6.30625
+leo-hessianai-13b-chat          1     6.18750
+SauerkrautLM-13b-v1             1     6.16250
+leo-mistral-hessianai-7b-chat   1     6.15625
+Llama-2-70b-chat-hf             1     6.03750
+vicuna-13b-v1.5                 1     5.80000
+SauerkrautLM-7b-v1              1     5.65000
+leo-hessianai-7b-chat           1     5.52500
+vicuna-7b-v1.5                  1     5.42500
+Mistral-7B-v0.1                 1     5.37500
+SauerkrautLM-3b-v1              1     3.17500
+Llama-2-7b                      1     1.28750
+open_llama_3b_v2                1     1.68750
 
-
+########## Second turn ##########
+                                      score
+model                           turn
+SauerkrautLM-70b-v1             2     6.83125
+SauerkrautLM-7b-HerO <---       2     6.30625
+vicuna-13b-v1.5                 2     5.63125
+SauerkrautLM-13b-v1             2     5.34375
+SauerkrautLM-7b-v1-mistral      2     5.26250
+leo-mistral-hessianai-7b-chat   2     4.99375
+SauerkrautLM-7b-v1              2     4.73750
+leo-hessianai-13b-chat          2     4.71250
+vicuna-7b-v1.5                  2     4.67500
+Llama-2-70b-chat-hf             2     4.66250
+Mistral-7B-v0.1                 2     4.53750
+leo-hessianai-7b-chat           2     2.65000
+SauerkrautLM-3b-v1              2     1.98750
+open_llama_3b_v2                2     1.22500
+Llama-2-7b                      2     1.07500
+
+########## Average ##########
+                                    score
+model
+SauerkrautLM-70b-v1             7.040625
+SauerkrautLM-7b-HerO <---       6.637500
+SauerkrautLM-7b-v1-mistral      5.784375
+SauerkrautLM-13b-v1             5.753125
+vicuna-13b-v1.5                 5.715625
+leo-mistral-hessianai-7b-chat   5.575000
+leo-hessianai-13b-chat          5.450000
+Llama-2-70b-chat-hf             5.350000
+SauerkrautLM-v1-7b              5.193750
+vicuna-7b-v1.5                  5.050000
+Mistral-7B-v0.1                 4.956250
+leo-hessianai-7b-chat           4.087500
+SauerkrautLM-3b-v1              2.581250
+open_llama_3b_v2                1.456250
+Llama-2-7b                      1.181250
+```
+
+
+### MT-Bench (English):
+![MT-Bench English Diagram](https://vago-solutions.de/wp-content/uploads/2023/11/MT-Bench-Englisch.png "SauerkrautLM-7b-HerO MT-Bench English Diagram")
+```
+########## First turn ##########
+                                      score
+model                           turn
+OpenHermes-2.5-Mistral-7B       1     8.21875
+SauerkrautLM-7b-HerO <---       1     8.03125
+Mistral-7B-OpenOrca             1     7.65625
+neural-chat-7b-v3-1             1     7.22500
+
+########## Second turn ##########
+                                      score
+model                           turn
+OpenHermes-2.5-Mistral-7B       2     7.1000
+SauerkrautLM-7b-HerO <---       2     6.7875
+neural-chat-7b-v3-1             2     6.4000
+Mistral-7B-OpenOrca             2     6.1750
+
+########## Average ##########
+                                    score
+model
+OpenHermes-2.5-Mistral-7B       7.659375
+SauerkrautLM-7b-HerO <---       7.409375
+Mistral-7B-OpenOrca             6.915625
+neural-chat-7b-v3-1             6.812500
+```
 ### Additional German Benchmark results:
 ![GermanBenchmarks](https://vago-solutions.de/wp-content/uploads/2023/11/German-benchmarks.png "SauerkrautLM-7b-HerO German Benchmarks")
 *performed with newest Language Model Evaluation Harness
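The per-model "Average" blocks in the MT-Bench tables added above are simply the mean of the first-turn and second-turn scores (the tables appear to follow the output format of FastChat's MT-Bench `show_result.py`). A small sanity check against the English numbers, using only values already shown in the diff:

```python
# Verify that the "Average" rows equal the mean of the two per-turn scores.
# Numbers are copied from the MT-Bench (English) tables in the hunk above.
turn_scores = {
    "OpenHermes-2.5-Mistral-7B": (8.21875, 7.1000),
    "SauerkrautLM-7b-HerO":      (8.03125, 6.7875),
    "Mistral-7B-OpenOrca":       (7.65625, 6.1750),
    "neural-chat-7b-v3-1":       (7.22500, 6.4000),
}

for model, (turn1, turn2) in turn_scores.items():
    print(f"{model:28s} {(turn1 + turn2) / 2:.6f}")
# Output matches the "Average" block:
#   OpenHermes-2.5-Mistral-7B    7.659375
#   SauerkrautLM-7b-HerO         7.409375
#   Mistral-7B-OpenOrca          6.915625
#   neural-chat-7b-v3-1          6.812500
```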