Update README.md
README.md (CHANGED)
@@ -32,18 +32,15 @@ tags:
 
 <!-- header start -->
 <!-- 200823 -->
-<div style="width: auto; margin-left: auto; margin-right: auto">
-<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
-</div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.linkedin.com/in/lucas-h%C3%A4nke-de-cansino-8b8521234/">Chat & support: LHC's LinkedIn</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://github.com/sponsors/l4b4r4b4b4">Want to contribute? LHC's Github Sponsors</a></p>
 </div>
 </div>
-
+
 <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
@@ -84,14 +81,7 @@ AWQ models are supported by (note that not all of these may support Mixtral mode
 - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
 
 <!-- description end -->
-<!-- repositories-available start -->
-## Repositories available
 
-* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ)
-* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GGUF)
-* [VAGO solutions's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct)
-<!-- repositories-available end -->
 
 <!-- prompt-template start -->
 ## Prompt template: Mistral
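The heading in the context above refers to Mistral's instruction format. As a minimal illustration, a hypothetical helper is sketched below; the single-turn `[INST] ... [/INST]` wrapping is assumed from the Mistral template this card refers to, which the hunk itself does not show.

```python
# Hypothetical helper illustrating the Mistral instruct format assumed above;
# the card's template is assumed to be "[INST] {prompt} [/INST]".
def format_prompt(prompt: str) -> str:
    return f"[INST] {prompt} [/INST]"

print(format_prompt("Tell me about AI"))
```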
@@ -113,7 +103,7 @@ Models are released as sharded safetensors files.
 
 | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
 | ------ | ---- | -- | ----------- | ------- | ---- |
-| [main](https://huggingface.co/
+| [main](https://huggingface.co/LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ/tree/main) | 4 | 32 | [German Quad](https://huggingface.co/datasets/deepset/germanquad/viewer/) | 8192 | 24.65 GB
 
 <!-- README_AWQ.md-provided-files end -->
 
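Since the files in the `main` branch listed above are sharded safetensors, one convenient way to fetch that branch programmatically is `huggingface_hub.snapshot_download`. A minimal sketch, with an illustrative local directory:

```python
from huggingface_hub import snapshot_download

# Pull the sharded AWQ safetensors from the main branch listed in the table
# above; local_dir is illustrative, choose any path you like.
snapshot_download(
    repo_id="LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ",
    revision="main",
    local_dir="SauerkrautLM-Mixtral-8x7B-Instruct-AWQ",
)
```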
@@ -125,7 +115,7 @@ Please make sure you're using the latest version of [text-generation-webui](http
 It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `
+2. Under **Download custom model or LoRA**, enter `LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
 5. In the top left, click the refresh icon next to **Model**.
@@ -147,7 +137,7 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
 For example:
 
 ```shell
-python3 -m vllm.entrypoints.api_server --model
+python3 -m vllm.entrypoints.api_server --model LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --quantization awq --dtype auto
 ```
 
 - When using vLLM from Python code, again set `quantization=awq`.
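Once the server started by the command above is running, it can be queried over HTTP. A rough sketch follows; the `/generate` route, port 8000 and JSON fields are assumptions about vLLM's demo `api_server`, so check the vLLM documentation linked above for the exact endpoint.

```python
import requests

# Query the vLLM api_server started with the command above.
# Route, port and JSON fields are assumptions; adjust to your deployment.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "[INST] Tell me about AI [/INST]",
        "max_tokens": 256,
        "temperature": 0.8,
    },
    timeout=120,
)
print(response.json())
```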
@@ -170,7 +160,7 @@ prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]
 
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
 
-llm = LLM(model="
+llm = LLM(model="LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ", quantization="awq", dtype="auto")
 
 outputs = llm.generate(prompts, sampling_params)
 
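The `outputs` returned above are vLLM `RequestOutput` objects; a small sketch of reading them back out, consistent with vLLM's documented API:

```python
# Print the prompt and first completion for each request returned by
# llm.generate() above.
for output in outputs:
    print("Prompt:", output.prompt)
    print("Generated:", output.outputs[0].text)
```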
@@ -190,7 +180,7 @@ Use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggi
 Example Docker parameters:
 
 ```shell
---model-id
+--model-id LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
 ```
 
 Example Python code for interfacing with TGI (requires [huggingface-hub](https://github.com/huggingface/huggingface_hub) 0.17.0 or later):
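The hunk ends just before that Python example. For completeness, a minimal sketch using `huggingface_hub.InferenceClient` against a TGI container started with the parameters above; the endpoint URL and sampling values are illustrative.

```python
from huggingface_hub import InferenceClient

# Point the client at the TGI container started with the --port 3000 flag above.
client = InferenceClient("http://localhost:3000")

prompt = "[INST] Tell me about AI [/INST]"

# text_generation() is the huggingface-hub >= 0.17 API for TGI endpoints;
# the sampling parameters here are illustrative defaults.
response = client.text_generation(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(response)
```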
@@ -255,7 +245,7 @@ pip3 install .
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
 
-model_name_or_path = "
+model_name_or_path = "LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"
 
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
 model = AutoModelForCausalLM.from_pretrained(
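The Python block is cut off mid-call at the end of this hunk. A rough sketch of how an AWQ checkpoint like this is typically loaded and run with recent transformers follows; the `from_pretrained` arguments and generation settings are illustrative, not taken from the commit.

```python
# Illustrative completion of the truncated from_pretrained(...) call above;
# arguments are typical for AWQ checkpoints, adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0",
)

# Mistral instruct prompt format used by this model.
prompt_template = "[INST] Tell me about AI [/INST]"
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    streamer=streamer,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    max_new_tokens=512,
)
```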