TheBloke committed
Commit
b564638
1 Parent(s): 4ea20c1

Update README.md

Files changed (1): README.md (+21 -10)
README.md CHANGED
@@ -28,20 +28,30 @@ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/CAMEL-33B-Combined-Data-GPTQ)
* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/baichuan-inc/baichuan-7B)

- ## Experimental first GPTQ, requires AutoGPTQ PR
+ ## Experimental first GPTQ, requires the latest AutoGPTQ code

This is a first quantisation of a brand new model type.

- It will only work with AutoGPTQ, and only by merging [LaaZa's PR](https://github.com/PanQiWei/AutoGPTQ/pull/164).
+ It will only work with AutoGPTQ, and only with the latest version of AutoGPTQ, compiled from source.

- To merge this PR, please follow these steps to install AutoGPTQ from source:
+ Please follow these steps to install the latest AutoGPTQ from source:
+ **Linux**:
```
pip uninstall -y auto-gptq
- git clone -b Baichuan https://github.com/LaaZa/AutoGPTQ baichuan_AutoGPTQ
- cd baichuan_AutoGPTQ
+ git clone https://github.com/PanQiWei/AutoGPTQ
+ cd AutoGPTQ
GITHUB_ACTIONS=true pip install .
```

+ **Windows (command prompt)**:
+ ```
+ pip uninstall -y auto-gptq
+ git clone https://github.com/PanQiWei/AutoGPTQ
+ cd AutoGPTQ
+ set GITHUB_ACTIONS=true
+ pip install .
+ ```
+
## Trust Remote Code

As this is a new model type, not yet supported by Transformers, you must run inference with Trust Remote Code set.
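To verify that the from-source build is the one Python will actually import, a quick check can help (a minimal sketch, not part of the original README; it assumes only that the install steps above succeeded):

```
# Minimal sketch: confirm which auto-gptq installation Python resolves.
from importlib.metadata import version
import auto_gptq

print(version("auto-gptq"))  # version of the checkout installed above
print(auto_gptq.__file__)    # should point into your site-packages, not a stale clone
```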
@@ -59,7 +69,6 @@ The example given in the README is a 1-shot categorisation:
Hamlet->Shakespeare\nOne Hundred Years of Solitude->
```

-
## How to easily download and use this model in text-generation-webui

Please make sure you're using the latest version of text-generation-webui
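A note on combining this with the Trust Remote Code requirement above: in text-generation-webui it is a launch flag rather than a Python argument. For example (a sketch assuming a standard text-generation-webui checkout):

```
python server.py --trust-remote-code
```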
@@ -78,7 +87,7 @@ Please make sure you're using the latest version of text-generation-webui

## How to use this GPTQ model from Python code

- First make sure you have the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) PR installed as mentioned above.
+ First make sure you have the latest [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed from source as mentioned above.

Then try the following example code:

@@ -86,7 +95,9 @@ Then try the following example code:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

- model_name_or_path = "/workspace/process/baichuan-7B/gptq"
+ model_name_or_path = 'TheBloke/baichuan-7B-GPTQ'
+ # Or you can clone the model locally and reference it on disk, e.g. with:
+ # model_name_or_path = "/path/to/TheBloke_baichuan-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

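The hunk above shows only the changed lines of the Python example. For context, here is a sketch of how the pieces plausibly fit together; the `from_quantized` arguments and generation settings are illustrative assumptions, not copied from the complete README:

```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = 'TheBloke/baichuan-7B-GPTQ'

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

# trust_remote_code is required because this model type is not yet in Transformers
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)

# The README's 1-shot categorisation prompt
prompt = "Hamlet->Shakespeare\nOne Hundred Years of Solitude->"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")

pred = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```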
@@ -112,10 +123,10 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

**gptq_model-4bit-128g.safetensors**

- This will work only with [AutoGPTQ using LaaZa's PR](https://github.com/PanQiWei/AutoGPTQ/pull/164).
+ This will currently only work with the latest [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), compiled from source.

* `gptq_model-4bit-128g.safetensors`
- * Works only with AutoGPTQ, currently requiring using [LaaZa's PR](https://github.com/PanQiWei/AutoGPTQ/pull/164).
+ * Works only with the latest AutoGPTQ, compiled from source.
* Requires `trust_remote_code`.
* Works with text-generation-webui, but not yet with one-click-installers unless you manually re-compile AutoGPTQ.
* Parameters: Groupsize = 128. Act Order / desc_act = False.
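For reference, the parameters in this list map directly onto AutoGPTQ's quantisation config. A sketch of the equivalent config object (illustrative only; not the exact script used to produce this file):

```
from auto_gptq import BaseQuantizeConfig

# Mirrors the parameters listed above for gptq_model-4bit-128g.safetensors
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit GPTQ
    group_size=128,  # Groupsize = 128
    desc_act=False,  # Act Order / desc_act = False
)
```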
 