TheBloke committed
Commit: 959eb7f
Parent: b02e67a

Update README.md

Files changed (1):
  1. README.md (+11 -3)
README.md CHANGED
@@ -4,11 +4,11 @@ license: other
# Koala: A Dialogue Model for Academic Research
This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.

- This version has then been quantized to 4bit using https://github.com/qwopqwop200/GPTQ-for-LLaMa
+ This version has then been quantized to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

## Other Koala repos

- These other versions are also available:
+ I have also made these other Koala repos available:
* [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
* [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
* [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
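For reference, a 4-bit, groupsize-128 GPTQ file such as the `koala-13B-GPTQ-4bit-128g` model referenced in the next hunk is typically produced with GPTQ-for-LLaMa's `llama.py`. A minimal sketch, assuming the unquantized HF model above as input; the paths, output filename, and calibration set are illustrative, and exact flags vary between branches:
```
# Sketch only: quantize the unquantized HF model to 4-bit GPTQ with groupsize 128.
# Input path, output filename, and the "c4" calibration set are assumptions.
python llama.py /path/to/koala-13B-HF c4 \
    --wbits 4 \
    --groupsize 128 \
    --save koala-13B-4bit-128g.pt
# Some branches also accept --save_safetensors, which would produce the
# `safetensors` file mentioned under "Coming soon" instead of a .pt file.
```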
@@ -44,11 +44,17 @@ python server.py --model koala-13B-GPTQ-4bit-128g --wbits 4 --groupsize 128 --mo

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

+ If you cannot use the Triton branch for any reason, I believe it should also work to use the CUDA branch instead:
+ ```
+ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
+ ```
+ Then link that into `text-generation-webui/repositories` as described above.
+
## Coming soon

Tomorrow I will upload a `safetensors` file as well.

- ## How to merge Koala delta weights
+ ## How the Koala delta weights were merged

The Koala delta weights were originally merged using the following commands, producing [koala-13B-HF](https://huggingface.co/TheBloke/koala-13B-HF):
```
@@ -80,6 +86,8 @@ PYTHON_PATH="${PWD}:$PYTHONPATH" python \
--tokenizer_path=/content/llama-13b/tokenizer.model
```

+ ## Further info
+
Check out the following links to learn more about the Berkeley Koala model.
* [Blog post](https://bair.berkeley.edu/blog/2023/04/03/koala/)
* [Online demo](https://koala.lmsys.org/)
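The CUDA-branch alternative added in the second hunk shows only the `git clone`; in practice that checkout usually lives under `text-generation-webui/repositories` and needs its CUDA extension built. A minimal sketch of that setup, assuming the standard directory layout (see the GPTQ-for-LLaMa and text-generation-webui READMEs for the current steps):
```
# Sketch only: place the CUDA branch where text-generation-webui looks for it
# and build its CUDA extension. Paths are illustrative.
cd text-generation-webui
mkdir -p repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
```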