LHC88 committed on
Commit
328f85c
1 Parent(s): 8d50ed1

Create README.md

  1. README.md +446 -0
README.md ADDED
@@ -0,0 +1,446 @@
1
+ ---
2
+ base_model: VAGOsolutions/SauerkrautLM-Mixtral-8x7B
3
+ datasets:
4
+ - Open-Orca/SlimOrca
5
+ - argilla/distilabel-math-preference-dpo
6
+ inference: false
7
+ language:
8
+ - en
9
+ - de
10
+ - fr
11
+ - it
12
+ - es
13
+ library_name: transformers
14
+ license: apache-2.0
15
+ model_creator: VAGO solutions
16
+ model_name: SauerkrautLM Mixtral 8X7B
17
+ model_type: mixtral
18
+ pipeline_tag: text-generation
19
+ prompt_template: '<|im_start|>system
20
+
21
+ {system_message}<|im_end|>
22
+
23
+ <|im_start|>user
24
+
25
+ {prompt}<|im_end|>
26
+
27
+ <|im_start|>assistant
28
+
29
+ '
30
+ quantized_by: LHC88
31
+ tags:
32
+ - mistral
33
+ - finetune
34
+ - sft
35
+ - dpo
36
+ - chatml
37
+ - augmentation
38
+ - german
39
+ - mixtral
40
+ ---
41
+ <!-- markdownlint-disable MD041 -->
42
+
43
+ <!-- header start -->
44
+ <!-- 200823 -->
45
+ <div style="display: flex; justify-content: space-between; width: 100%;">
46
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
47
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.linkedin.com/in/lucas-h%C3%A4nke-de-cansino-8b8521234/">Chat & support: LHC's LinkedIn</a></p>
48
+ </div>
49
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
50
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://github.com/sponsors/l4b4r4b4b4">Want to contribute? LHC's Github Sponsors</a></p>
51
+ </div>
52
+ </div>
53
+
54
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
55
+ <!-- header end -->
56
+
57
+ # SauerkrautLM Mixtral 8X7B - AWQ
58
+ - Model creator: [VAGO solutions](https://huggingface.co/VAGOsolutions)
59
+ - Original model: [SauerkrautLM Mixtral 8X7B](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B)
60
+
61
+ <!-- description start -->
62
+ ## Description
63
+
64
+ This repo contains AWQ model files for [VAGO solutions' SauerkrautLM Mixtral 8X7B](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B).
65
+
66
+
67
+ **MIXTRAL AWQ**
68
+
69
+ This is a Mixtral AWQ model.
70
+
71
+ For AutoAWQ inference, please install AutoAWQ 0.1.8 or later.
72
+
73
+ Support via Transformers is coming soon, via this PR: https://github.com/huggingface/transformers/pull/27950 which should be merged to Transformers `main` very soon.
74
+
75
+ vLLM: version 0.2.6 is confirmed to support Mixtral AWQs.
76
+
77
+ TGI: I tested version 1.3.3 and it loaded the model fine, but I was not able to get any output back. Further testing/debug is required. (Let me know if you get it working!)
78
+
79
+ ### About AWQ
80
+
81
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.
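
To make "4-bit weight quantization" more concrete, here is a minimal pure-Python sketch of plain group-wise round-to-nearest quantization. It is not the AWQ algorithm itself: AWQ additionally rescales weight channels using activation statistics before quantizing, and real kernels store an integer zero-point rather than the float minimum used here. All function names are illustrative and are not part of AutoAWQ.

```python
def quantize_group(weights, bits=4):
    """Quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1                      # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0  # guard against constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reassemble the approximate weights from all quantized groups."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


# Toy data: 256 deterministic pseudo-weights in [-1, 1], i.e. two groups of 128.
weights = [((i * 37) % 101 - 50) / 50 for i in range(256)]
groups = quantize(weights)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.4f}")
```

Each group keeps its own scale, so the reconstruction error within a group is bounded by about half that group's scale. This is why smaller group sizes trade a little extra storage for accuracy.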
82
+
83
+ AWQ models are currently supported on Linux and Windows, with NVidia GPUs only. macOS users: please use GGUF models instead.
84
+
85
+ AWQ models are supported by (note that not all of these may support Mixtral models yet - see above):
86
+
87
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
88
+ - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later for support for all model types.
89
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
90
+ - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
91
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
92
+
93
+ <!-- description end -->
94
+
95
+ <!-- prompt-template start -->
96
+ ## Prompt template: ChatML
97
+
98
+ ```
99
+ <|im_start|>system
100
+ {system_message}<|im_end|>
101
+ <|im_start|>user
102
+ {prompt}<|im_end|>
103
+ <|im_start|>assistant
104
+
105
+ ```
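
The template above can be filled in with a small helper. This is a minimal sketch: the function name `build_chatml_prompt` and the default system message are assumptions for illustration and do not come from the model authors.

```python
def build_chatml_prompt(prompt, system_message="You are a helpful assistant."):
    """Return a single-turn ChatML prompt ending with the open assistant turn."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


example = build_chatml_prompt("Tell me about AI")
print(example)
```

The string deliberately ends after `<|im_start|>assistant` and a newline, so that the model continues with the assistant's reply.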
106
+
107
+ <!-- prompt-template end -->
108
+
109
+
110
+ <!-- README_AWQ.md-provided-files start -->
111
+ ## Provided files, and AWQ parameters
112
+
113
+ I currently release 128g GEMM models only. The addition of group_size 32 models, and GEMV kernel models, is being actively considered.
114
+
115
+ Models are released as sharded safetensors files.
116
+
117
+ | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
118
+ | ------ | ---- | -- | ----------- | ------- | ---- |
119
+ | [main](https://huggingface.co/LHC88/SauerkrautLM-Mixtral-8x7B-AWQ/tree/main) | 4 | 128 | [German Quad](https://huggingface.co/datasets/deepset/germanquad/viewer/) | 8192 | 24.65 GB |
120
+
121
+ <!-- README_AWQ.md-provided-files end -->
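
As a rough sanity check on the size in the table, the storage cost of 4-bit weights with group size 128 can be estimated as follows. The figures here are assumptions rather than values from this repo: roughly 46.7 billion total parameters for Mixtral 8x7B, one 16-bit scale and one 4-bit zero-point per group, every weight treated as quantized, and decimal gigabytes.

```python
def awq_size_gb(num_params, bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Estimate checkpoint size in GB for group-wise quantized weights."""
    weight_bits = num_params * bits
    # Each group of `group_size` weights carries one scale and one zero-point.
    overhead_bits = (num_params / group_size) * (scale_bits + zero_bits)
    return (weight_bits + overhead_bits) / 8 / 1e9


estimate = awq_size_gb(46.7e9)  # assumed parameter count, see the note above
print(f"{estimate:.1f} GB")     # prints 24.3 GB
```

The estimate lands close to the 24.65 GB listed. The remaining gap is plausibly explained by tensors that stay in 16-bit precision, which this sketch ignores.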
122
+
123
+ <!-- README_AWQ.md-text-generation-webui start -->
124
+ ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
125
+
126
+ Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
127
+
128
+ It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
129
+
130
+ 1. Click the **Model tab**.
131
+ 2. Under **Download custom model or LoRA**, enter `LHC88/SauerkrautLM-Mixtral-8x7B-AWQ`.
132
+ 3. Click **Download**.
133
+ 4. The model will start downloading. Once it's finished it will say "Done".
134
+ 5. In the top left, click the refresh icon next to **Model**.
135
+ 6. In the **Model** dropdown, choose the model you just downloaded: `SauerkrautLM-Mixtral-8x7B-AWQ`
136
+ 7. Select **Loader: AutoAWQ**.
137
+ 8. Click **Load**. Once the model has loaded, it is ready for use.
138
+ 9. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
139
+ 10. Once you're ready, click the **Text Generation** tab and enter a prompt to get started!
140
+ <!-- README_AWQ.md-text-generation-webui end -->
141
+
142
+ <!-- README_AWQ.md-use-from-vllm start -->
143
+ ## Multi-user inference server: vLLM
144
+
145
+ Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
146
+
147
+ - Please ensure you are using vLLM version 0.2 or later (version 0.2.6 is confirmed to support Mixtral AWQ models).
148
+ - When using vLLM as a server, pass the `--quantization awq` parameter.
149
+
150
+ For example:
151
+
152
+ ```shell
153
+ python3 -m vllm.entrypoints.api_server --model LHC88/SauerkrautLM-Mixtral-8x7B-AWQ --quantization awq --dtype auto
154
+ ```
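
Once the server is running, it can be queried over HTTP. The sketch below assumes the demo `api_server` listens on `localhost:8000`, accepts a JSON `POST` to `/generate` with a `prompt` field plus sampling parameters, and answers with a `text` list. Check these details against your vLLM version. The helper names are illustrative.

```python
import json
import urllib.request


def build_generate_request(prompt, host="localhost", port=8000, **sampling):
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {"prompt": prompt, **sampling}
    return urllib.request.Request(
        f"http://{host}:{port}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated texts (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["text"]


request = build_generate_request("Tell me about AI", max_tokens=128, temperature=0.8)
print(request.full_url, request.data.decode("utf-8"))
```

Calling `generate("Tell me about AI", max_tokens=128)` performs the actual request. Remember to wrap the prompt in the ChatML template first.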
155
+
156
+ - When using vLLM from Python code, again set `quantization="awq"`.
157
+
158
+ For example:
159
+
160
+ ```python
161
+ from vllm import LLM, SamplingParams
162
+
163
+ prompts = [
164
+ "Tell me about AI",
165
+ "Write a story about llamas",
166
+ "What is 291 - 150?",
167
+ "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
168
+ ]
169
+ prompt_template = '''<|im_start|>system
170
+ {system_message}<|im_end|>
171
+ <|im_start|>user
172
+ {prompt}<|im_end|>
173
+ <|im_start|>assistant
174
+ '''
175
+ # Placeholder system message (an assumption); replace it with your own.
176
+ prompts = [prompt_template.format(system_message="You are a helpful assistant.", prompt=prompt) for prompt in prompts]
177
+
178
+ sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
179
+
180
+ llm = LLM(model="LHC88/SauerkrautLM-Mixtral-8x7B-AWQ", quantization="awq", dtype="auto")
181
+
182
+ outputs = llm.generate(prompts, sampling_params)
183
+
184
+ # Print the outputs.
185
+ for output in outputs:
186
+ prompt = output.prompt
187
+ generated_text = output.outputs[0].text
188
+ print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
189
+ ```
190
+ <!-- README_AWQ.md-use-from-vllm end -->
191
+
192
+ <!-- README_AWQ.md-use-from-tgi start -->
193
+ ## Multi-user inference server: Hugging Face Text Generation Inference (TGI)
194
+
195
+ Use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
196
+
197
+ Example Docker parameters:
198
+
199
+ ```shell
200
+ --model-id LHC88/SauerkrautLM-Mixtral-8x7B-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
201
+ ```
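
The two length flags above interact. As I understand TGI's request validation, the prompt must fit within `--max-input-length`, and the prompt plus `max_new_tokens` must fit within `--max-total-tokens`. The following sketch (with a hypothetical helper name) computes how many new tokens a given prompt leaves room for under the example settings.

```python
def max_new_tokens_budget(prompt_tokens, max_input_length=3696, max_total_tokens=4096):
    """Return the largest max_new_tokens allowed for a prompt of the given length."""
    if prompt_tokens > max_input_length:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {max_input_length}"
        )
    return max_total_tokens - prompt_tokens


print(max_new_tokens_budget(3696))  # prints 400
print(max_new_tokens_budget(1000))  # prints 3096
```

With the example parameters, even a maximum-length prompt leaves room for 400 generated tokens.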
202
+
203
+ Example Python code for interfacing with TGI (requires [huggingface-hub](https://github.com/huggingface/huggingface_hub) 0.17.0 or later):
204
+
205
+ ```shell
206
+ pip3 install huggingface-hub
207
+ ```
208
+
209
+ ```python
210
+ from huggingface_hub import InferenceClient
211
+
212
+ endpoint_url = "https://your-endpoint-url-here"
213
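+ system_message = "You are a helpful assistant."  # placeholder (an assumption); replace with your own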
214
+ prompt = "Tell me about AI"
215
+ prompt_template=f'''<|im_start|>system
216
+ {system_message}<|im_end|>
217
+ <|im_start|>user
218
+ {prompt}<|im_end|>
219
+ <|im_start|>assistant
220
+ '''
221
+
222
+ client = InferenceClient(endpoint_url)
223
+ response = client.text_generation(prompt_template,
224
+ max_new_tokens=128,
225
+ do_sample=True,
226
+ temperature=0.7,
227
+ top_p=0.95,
228
+ top_k=40,
229
+ repetition_penalty=1.1)
230
+
231
+ print("Model output:", response)
232
+ ```
233
+ <!-- README_AWQ.md-use-from-tgi end -->
234
+
235
+ <!-- README_AWQ.md-use-from-python start -->
236
+ ## Inference from Python code using Transformers
237
+
238
+ ### Install the necessary packages
239
+
240
+ - Requires: [Transformers](https://huggingface.co/docs/transformers) 4.35.0 or later.
241
+ - Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.1.6 or later.
242
+
243
+ ```shell
244
+ pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"
245
+ ```
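
After installing, you may want to confirm that the installed versions meet the minimums listed above. This is a minimal sketch using only the standard library. The helper names are illustrative, and the version parser is deliberately simple (it compares the first three numeric components and ignores local suffixes such as `+cu118`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements listed above.
MINIMUMS = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def parse_version(text):
    """Turn '0.1.6+cu118' into (0, 1, 6); pads short versions with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_requirements(minimums=MINIMUMS):
    """Return a short status string for every required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


for package, status in check_requirements().items():
    print(f"{package}: {status}")
```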
246
+
247
+ Note that if you are using PyTorch 2.0.1, the above AutoAWQ command will automatically upgrade you to PyTorch 2.1.0.
248
+
249
+ If you are using CUDA 11.8 and wish to continue using PyTorch 2.0.1, instead run this command:
250
+
251
+ ```shell
252
+ pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
253
+ ```
254
+
255
+ If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:
256
+
257
+ ```shell
258
+ pip3 uninstall -y autoawq
259
+ git clone https://github.com/casper-hansen/AutoAWQ
260
+ cd AutoAWQ
261
+ pip3 install .
262
+ ```
263
+
264
+ ### Transformers example code (requires Transformers 4.35.0 and later)
265
+
266
+ ```python
267
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
268
+
269
+ model_name_or_path = "LHC88/SauerkrautLM-Mixtral-8x7B-AWQ"
270
+
271
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
272
+ model = AutoModelForCausalLM.from_pretrained(
273
+ model_name_or_path,
274
+ low_cpu_mem_usage=True,
275
+ device_map="cuda:0"
276
+ )
277
+
278
+ # Using the text streamer to stream output one token at a time
279
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
280
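+ system_message = "You are a helpful assistant."  # placeholder (an assumption); replace with your own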
281
+ prompt = "Tell me about AI"
282
+ prompt_template=f'''<|im_start|>system
283
+ {system_message}<|im_end|>
284
+ <|im_start|>user
285
+ {prompt}<|im_end|>
286
+ <|im_start|>assistant
287
+ '''
288
+
289
+ # Convert prompt to tokens
290
+ tokens = tokenizer(
291
+ prompt_template,
292
+ return_tensors='pt'
293
+ ).input_ids.cuda()
294
+
295
+ generation_params = {
296
+ "do_sample": True,
297
+ "temperature": 0.7,
298
+ "top_p": 0.95,
299
+ "top_k": 40,
300
+ "max_new_tokens": 512,
301
+ "repetition_penalty": 1.1
302
+ }
303
+
304
+ # Generate streamed output, visible one token at a time
305
+ generation_output = model.generate(
306
+ tokens,
307
+ streamer=streamer,
308
+ **generation_params
309
+ )
310
+
311
+ # Generation without a streamer, which will include the prompt in the output
312
+ generation_output = model.generate(
313
+ tokens,
314
+ **generation_params
315
+ )
316
+
317
+ # Get the tokens from the output, decode them, print them
318
+ token_output = generation_output[0]
319
+ text_output = tokenizer.decode(token_output)
320
+ print("model.generate output: ", text_output)
321
+
322
+ # Inference is also possible via Transformers' pipeline
323
+ from transformers import pipeline
324
+
325
+ pipe = pipeline(
326
+ "text-generation",
327
+ model=model,
328
+ tokenizer=tokenizer,
329
+ **generation_params
330
+ )
331
+
332
+ pipe_output = pipe(prompt_template)[0]['generated_text']
333
+ print("pipeline output: ", pipe_output)
334
+
335
+ ```
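
The non-streamed `model.generate` call above returns the prompt together with the completion. If only the assistant's reply is wanted, it can be cut out of the decoded text. The sketch below assumes the ChatML markers appear verbatim in the decoded string. Tokenizers may insert or drop whitespace around special tokens, so adjust the markers if needed. The function name is illustrative.

```python
def extract_reply(decoded, start="<|im_start|>assistant\n", end="<|im_end|>"):
    """Return the text of the last assistant turn in a decoded ChatML string."""
    # Keep everything after the last assistant marker (or the whole string if absent).
    reply = decoded.rsplit(start, 1)[-1]
    # Drop the end-of-turn marker and anything the model produced after it.
    return reply.split(end, 1)[0].strip()


decoded = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nTell me about AI<|im_end|>\n"
    "<|im_start|>assistant\nAI is a broad field.<|im_end|>"
)
print(extract_reply(decoded))
```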
336
+ <!-- README_AWQ.md-use-from-python end -->
337
+
338
+ <!-- README_AWQ.md-compatibility start -->
339
+ ## Compatibility
340
+
341
+ The files provided are tested to work with:
342
+
343
+ - [text-generation-webui](https://github.com/oobabooga/text-generation-webui) using `Loader: AutoAWQ`.
344
+ - [vLLM](https://github.com/vllm-project/vllm) version 0.2.0 and later.
345
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) version 1.1.0 and later.
346
+ - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later.
347
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) version 0.1.1 and later.
348
+
349
+ <!-- README_AWQ.md-compatibility end -->
350
+
351
+ <!-- footer start -->
352
+ <!-- 200823 -->
353
+
354
+ <!-- footer end -->
355
+
356
+ # Original model card: VAGO solutions' SauerkrautLM Mixtral 8X7B
357
+
358
+
359
+ ![SauerkrautLM](https://vago-solutions.de/wp-content/uploads/2023/12/Sauerkraut_MoE.png "SauerkrautLM-Mixtral-8x7B")
360
+ ## VAGO solutions SauerkrautLM-Mixtral-8x7B
361
+ Introducing **SauerkrautLM-Mixtral-8x7B** – our Sauerkraut version of the powerful Mixtral-8x7B!
362
+ Finetuned and aligned with **SFT** and **DPO**
363
+
364
+ # Table of Contents
365
+ 1. [Overview of all SauerkrautLM-Mixtral models](#all-sauerkrautlm-mixtral-models)
366
+ 2. [Model Details](#model-details)
367
+ - [Prompt template](#prompt-template)
368
+ - [Training Dataset](#training-dataset)
369
+ 3. [Evaluation](#evaluation)
370
+ 4. [Disclaimer](#disclaimer)
371
+ 5. [Contact](#contact)
372
+ 6. [Collaborations](#collaborations)
373
+ 7. [Acknowledgement](#acknowledgement)
374
+
375
+
376
+ ## All SauerkrautLM-Mixtral Models
377
+
378
+ | Model | HF | GPTQ | GGUF | AWQ |
379
+ |-------|-------|-------|-------|-------|
380
+ | SauerkrautLM-Mixtral-8x7B | [Link](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B) | coming soon | coming soon | coming soon |
381
+ | SauerkrautLM-Mixtral-8x7B-Instruct | [Link](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct) | coming soon | coming soon | coming soon |
382
+
383
+ ## Model Details
384
+ **SauerkrautLM-Mixtral-8x7B**
385
+ - **Model Type:** SauerkrautLM-Mixtral-8x7B is a Mixture of Experts (MoE) Model based on [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
386
+ - **Language(s):** English, German, French, Italian, Spanish
387
+ - **License:** APACHE 2.0
388
+ - **Contact:** [Website](https://vago-solutions.de/#Kontakt) [David Golchinfar](mailto:[email protected])
389
+
390
+ ### Training Dataset:
391
+
392
+ SauerkrautLM-Mixtral-8x7B was trained with a mix of German data augmentation and translated data.
393
+ It was fine-tuned via **SFT** with the dataset [OpenOrca/Slim-Orca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned through **DPO** with our **new German SauerkrautLM-DPO dataset**, which uses parts of the SFT SauerkrautLM dataset
394
+ as chosen answers and [Sauerkraut-7b-HerO](https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO) as rejected answers. We additionally added **translated parts of the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)** and **[argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo).**
395
+ We found that a simple translation of the training data alone can lead to unnatural German phrasings.
396
+ Data augmentation techniques were used to ensure grammatical and syntactic correctness and more natural German wording in our training data.
397
+
398
+ ### Data Contamination Test Results
399
+
400
+ Some models on the Hugging Face leaderboard were found to have benchmark data mixed into their training data (data contamination).
401
+ We checked our SauerkrautLM-DPO dataset with a special test [1] on a smaller model for this problem.
402
+ The HuggingFace team used the same methods [2, 3].
403
+
404
+ Our results, with `result < 0.1, %:` being well below 0.9, indicate that our dataset is free from contamination.
405
+
406
+ *The data contamination test results of HellaSwag and Winogrande will be added once [1] supports them.*
407
+
408
+ | Dataset | ARC | MMLU | TruthfulQA | GSM8K |
409
+ |------------------------------|-------|-------|-------|-------|
410
+ | **SauerkrautLM-DPO**| result < 0.1, %: 0.0 |result < 0.1, %: 0.09 | result < 0.1, %: 0.13 | result < 0.1, %: 0.16 |
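
The comparison described above can be written out explicitly. This sketch takes the four scores from the table and treats 0.9 as the contamination cutoff, following the sentence above. The exact semantics of the score are defined by the tool in [1], so treat the threshold handling here as an assumption.

```python
CONTAMINATION_THRESHOLD = 0.9  # cutoff as stated in the text above

# `result < 0.1, %` values copied from the table.
results = {"ARC": 0.0, "MMLU": 0.09, "TruthfulQA": 0.13, "GSM8K": 0.16}

# Benchmarks whose score reaches the cutoff would be flagged as contaminated.
flagged = {name: score for name, score in results.items() if score >= CONTAMINATION_THRESHOLD}
worst = max(results, key=results.get)

print("flagged:", flagged)
print(f"highest score: {worst} = {results[worst]}")
```

No benchmark is flagged, and the highest score (GSM8K at 0.16) is still far below the cutoff.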
411
+
412
+ [1] https://github.com/swj0419/detect-pretrain-code-contamination
413
+
414
+ [2] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/474#657f2245365456e362412a06
415
+
416
+ [3] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/265#657b6debf81f6b44b8966230
417
+
418
+ ### Prompt Template:
419
+ ```
420
+ <|im_start|>system
421
+ Du bist ein großes Sprachmodell, das höflich und kompetent antwortet. Schreibe deine Gedanken Schritt für Schritt auf, um Probleme sinnvoll zu lösen.<|im_end|>
422
+ <|im_start|>user
423
+ Wie geht es dir?<|im_end|>
424
+ <|im_start|>assistant
425
+
426
+ ```
427
+ ## Evaluation
428
+
429
+ ![Harness](https://vago-solutions.de/wp-content/uploads/2023/12/MoEbenchmark.png "SauerkrautLM-Mixtral-8x7B Harness")
430
+ *evaluated with lm-evaluation-harness v0.3.0 - mmlu coming soon
431
+
432
+ *All benchmarks were performed with a sliding window of 4096. New Benchmarks with Sliding Window null coming soon
433
+
434
+ ## Disclaimer
435
+ We must inform users that despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out.
436
+ Consequently, we cannot guarantee consistently appropriate behavior. Therefore, if you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided.
437
+ Additionally, it is essential to understand that the licensing of these models does not constitute legal advice. We are not held responsible for the actions of third parties who utilize our models. These models may be employed for commercial purposes, and the Apache 2.0 license remains applicable and is included with the model files.
438
+  
439
+ ## Contact
440
+ If you are interested in customized LLMs for business applications, please get in contact with us via our website or contact us at [Dr. Daryoush Vaziri](mailto:[email protected]). We are also grateful for your feedback and suggestions.
441
+  
442
+ ## Collaborations
443
+ We are also keenly seeking support and investment for our startup, VAGO solutions, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us.
444
+
445
+ ## Acknowledgement
446
+ Many thanks to [OpenOrca](https://huggingface.co/Open-Orca), [argilla](https://huggingface.co/datasets/argilla) and [Huggingface](https://huggingface.co) for providing such valuable datasets to the Open-Source community. And of course a big thanks to MistralAI for providing the open source community with their latest technology!