Update README.md
README.md (CHANGED)
@@ -32,18 +32,15 @@ tags:
 
 <!-- header start -->
 <!-- 200823 -->
-<div style="width: auto; margin-left: auto; margin-right: auto">
-<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
-</div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.linkedin.com/in/lucas-h%C3%A4nke-de-cansino-8b8521234/">Chat & support: LHC's LinkedIn</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://github.com/sponsors/l4b4r4b4b4">Want to contribute? LHC's Github Sponsors</a></p>
 </div>
 </div>
-
+
 <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
@@ -84,14 +81,7 @@ AWQ models are supported by (note that not all of these may support Mixtral mode
 - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
 
 <!-- description end -->
-<!-- repositories-available start -->
-## Repositories available
 
-* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ)
-* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GGUF)
-* [VAGO solutions's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct)
-<!-- repositories-available end -->
 
 <!-- prompt-template start -->
 ## Prompt template: Mistral
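The heading in the context above refers to Mistral's instruction format. As a minimal illustration, a hypothetical helper is sketched below; the single-turn `[INST] ... [/INST]` wrapping is assumed from the Mistral template this card refers to, which the hunk itself does not show.

```python
# Hypothetical helper illustrating the Mistral instruct format assumed above;
# the card's template is assumed to be "[INST] {prompt} [/INST]".
def format_prompt(prompt: str) -> str:
    return f"[INST] {prompt} [/INST]"

print(format_prompt("Tell me about AI"))
```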
@@ -113,7 +103,7 @@ Models are released as sharded safetensors files.
 
 | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
 | ------ | ---- | -- | ----------- | ------- | ---- |
-| [main](https://huggingface.co/
+| [main](https://huggingface.co/LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ/tree/main) | 4 | 32 | [German Quad](https://huggingface.co/datasets/deepset/germanquad/viewer/) | 8192 | 24.65 GB
 
 <!-- README_AWQ.md-provided-files end -->
 
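Since the files in the `main` branch listed above are sharded safetensors, one convenient way to fetch that branch programmatically is `huggingface_hub.snapshot_download`. A minimal sketch, with an illustrative local directory:

```python
from huggingface_hub import snapshot_download

# Pull the sharded AWQ safetensors from the main branch listed in the table
# above; local_dir is illustrative, choose any path you like.
snapshot_download(
    repo_id="LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ",
    revision="main",
    local_dir="SauerkrautLM-Mixtral-8x7B-Instruct-AWQ",
)
```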
@@ -125,7 +115,7 @@ Please make sure you're using the latest version of [text-generation-webui](http
 It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `
+2. Under **Download custom model or LoRA**, enter `LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
 5. In the top left, click the refresh icon next to **Model**.
@@ -147,7 +137,7 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
 For example:
 
 ```shell
-python3 -m vllm.entrypoints.api_server --model
+python3 -m vllm.entrypoints.api_server --model LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --quantization awq --dtype auto
 ```
 
 - When using vLLM from Python code, again set `quantization=awq`.
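Once the server started by the command above is running, it can be queried over HTTP. A rough sketch follows; the `/generate` route, port 8000 and JSON fields are assumptions about vLLM's demo `api_server`, so check the vLLM documentation linked above for the exact endpoint.

```python
import requests

# Query the vLLM api_server started with the command above.
# Route, port and JSON fields are assumptions; adjust to your deployment.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "[INST] Tell me about AI [/INST]",
        "max_tokens": 256,
        "temperature": 0.8,
    },
    timeout=120,
)
print(response.json())
```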
@@ -170,7 +160,7 @@ prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]
 
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
 
-llm = LLM(model="
+llm = LLM(model="LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ", quantization="awq", dtype="auto")
 
 outputs = llm.generate(prompts, sampling_params)
 
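The `outputs` returned above are vLLM `RequestOutput` objects; a small sketch of reading them back out, consistent with vLLM's documented API:

```python
# Print the prompt and first completion for each request returned by
# llm.generate() above.
for output in outputs:
    print("Prompt:", output.prompt)
    print("Generated:", output.outputs[0].text)
```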
@@ -190,7 +180,7 @@ Use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggi
 Example Docker parameters:
 
 ```shell
---model-id
+--model-id LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
 ```
 
 Example Python code for interfacing with TGI (requires [huggingface-hub](https://github.com/huggingface/huggingface_hub) 0.17.0 or later):
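The hunk ends just before that Python example. For completeness, a minimal sketch using `huggingface_hub.InferenceClient` against a TGI container started with the parameters above; the endpoint URL and sampling values are illustrative.

```python
from huggingface_hub import InferenceClient

# Point the client at the TGI container started with the --port 3000 flag above.
client = InferenceClient("http://localhost:3000")

prompt = "[INST] Tell me about AI [/INST]"

# text_generation() is the huggingface-hub >= 0.17 API for TGI endpoints;
# the sampling parameters here are illustrative defaults.
response = client.text_generation(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(response)
```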
@@ -255,7 +245,7 @@ pip3 install .
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
 
-model_name_or_path = "
+model_name_or_path = "LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"
 
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
 model = AutoModelForCausalLM.from_pretrained(
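The Python block is cut off mid-call at the end of this hunk. A rough sketch of how an AWQ checkpoint like this is typically loaded and run with recent transformers follows; the `from_pretrained` arguments and generation settings are illustrative, not taken from the commit.

```python
# Illustrative completion of the truncated from_pretrained(...) call above;
# arguments are typical for AWQ checkpoints, adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0",
)

# Mistral instruct prompt format used by this model.
prompt_template = "[INST] Tell me about AI [/INST]"
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    streamer=streamer,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    max_new_tokens=512,
)
```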