LHC88 committed
Commit fc3440c
Parent: bc5bbc1

Update README.md

Files changed (1)
  1. README.md +9 -19
README.md CHANGED
@@ -32,18 +32,15 @@ tags:
 
 <!-- header start -->
 <!-- 200823 -->
-<div style="width: auto; margin-left: auto; margin-right: auto">
-<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
-</div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.linkedin.com/in/lucas-h%C3%A4nke-de-cansino-8b8521234/">Chat & support: LHC's LinkedIn</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://github.com/sponsors/l4b4r4b4b4">Want to contribute? LHC's Github Sponsors</a></p>
 </div>
 </div>
-<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+
 <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
@@ -84,14 +81,7 @@ AWQ models are supported by (note that not all of these may support Mixtral mode
 - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
 
 <!-- description end -->
-<!-- repositories-available start -->
-## Repositories available
 
-* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ)
-* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-GGUF)
-* [VAGO solutions's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/VAGOsolutions/SauerkrautLM-Mixtral-8x7B-Instruct)
-<!-- repositories-available end -->
 
 <!-- prompt-template start -->
 ## Prompt template: Mistral
@@ -113,7 +103,7 @@ Models are released as sharded safetensors files.
 
 | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
 | ------ | ---- | -- | ----------- | ------- | ---- |
-| [main](https://huggingface.co/TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ/tree/main) | 4 | 32 | [German Quad](https://huggingface.co/datasets/deepset/germanquad/viewer/) | 8192 | 24.65 GB
+| [main](https://huggingface.co/LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ/tree/main) | 4 | 32 | [German Quad](https://huggingface.co/datasets/deepset/germanquad/viewer/) | 8192 | 24.65 GB
 
 <!-- README_AWQ.md-provided-files end -->
 
@@ -125,7 +115,7 @@ Please make sure you're using the latest version of [text-generation-webui](http
 It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ`.
+2. Under **Download custom model or LoRA**, enter `LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
 5. In the top left, click the refresh icon next to **Model**.
@@ -147,7 +137,7 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
 For example:
 
 ```shell
-python3 -m vllm.entrypoints.api_server --model TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --quantization awq --dtype auto
+python3 -m vllm.entrypoints.api_server --model LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --quantization awq --dtype auto
 ```
 
 - When using vLLM from Python code, again set `quantization=awq`.
@@ -170,7 +160,7 @@ prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]
 
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
 
-llm = LLM(model="TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ", quantization="awq", dtype="auto")
+llm = LLM(model="LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ", quantization="awq", dtype="auto")
 
 outputs = llm.generate(prompts, sampling_params)
 
@@ -190,7 +180,7 @@ Use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggi
 Example Docker parameters:
 
 ```shell
---model-id TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
+--model-id LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
```
 
 Example Python code for interfacing with TGI (requires [huggingface-hub](https://github.com/huggingface/huggingface_hub) 0.17.0 or later):
@@ -255,7 +245,7 @@ pip3 install .
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
 
-model_name_or_path = "TheBloke/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"
+model_name_or_path = "LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"
 
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
 model = AutoModelForCausalLM.from_pretrained(
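
The last hunk stops at the opening of the `from_pretrained` call because that is all the diff context shows. As a rough sketch only, not part of this commit, here is how the renamed repo might be loaded end to end with Transformers: the repo id comes from this commit, while the `device_map`/`low_cpu_mem_usage` arguments, sampling settings, and the German example prompt are assumptions rather than text from the README.

```python
# Rough sketch (assumptions noted inline): load the renamed AWQ repo with Transformers.
# Assumes LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ is available on the Hub and that
# recent transformers plus AutoAWQ are installed, as the README describes elsewhere.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "LHC88/SauerkrautLM-Mixtral-8x7B-Instruct-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,  # assumed flag: avoid holding full weights in host RAM while loading
    device_map="auto",       # assumed flag: place the ~24.65 GB of AWQ shards on available GPUs
)

# Mistral prompt template from the README: "[INST] {prompt} [/INST]"
prompt = "[INST] Erkläre kurz, was AWQ-Quantisierung ist. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The vLLM and TGI invocations in the hunks above need no other adjustment; the commit only swaps the `TheBloke/...` repo id for `LHC88/...`.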