dahara1 committed on
Commit 3105ad4
1 Parent(s): 36dec03

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -7,23 +7,23 @@ language:
 
 original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft), which is a Japanese-centric multilingual GPT-NeoX model of 10 billion parameters.
 
-This model is a quantized (miniaturized) version of the original model.
+This model is a quantized (miniaturized) version of the original model (21.42 GB).
 
 There are currently two well-known quantization methods.
-(1) GPTQ (this model)
-The size is smaller and the execution speed is faster, but the inference performance may be a little worse.
+(1) GPTQ (this model, 6.3 GB)
+The size is smaller and the execution speed is faster, but the inference performance may be a little worse than the original model.
+At least one GPU is currently required due to a limitation of the Accelerate library.
+So this model cannot be run on the Hugging Face Spaces free tier.
 You need the AutoGPTQ library to use this model.
 
-(2) llama.cpp ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf)), created by mmnga.
-You can use a CPU-only machine, but it is a little slow, especially for long text.
+(2) gguf ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
+You can use the gguf model with llama.cpp on a CPU-only machine,
+but it may be a little slower than GPTQ, especially for long text.
 
 
 ### sample code
-At least one GPU is currently required due to a limitation of the Accelerate library.
-So this model cannot be run on the Hugging Face Spaces free tier.
-Try it on [Google Colab (under development)](https://github.com/webbigdata-jp/python_sample/blob/main/weblab_10b_instruction_sft_GPTQ_sample.ipynb)
-
-
+
+Try it on [Google Colab (under development)](https://github.com/webbigdata-jp/python_sample/blob/main/weblab_10b_instruction_sft_GPTQ_sample.ipynb)
 
 ```
 pip install auto-gptq
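# A minimal AutoGPTQ loading sketch, assuming this repo's id is
# "dahara1/weblab-10b-instruction-sft-GPTQ" and a CUDA GPU is available,
# as noted above; check the model card for the exact id and prompt format.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "dahara1/weblab-10b-instruction-sft-GPTQ"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Download and load the quantized weights onto the first GPU.
model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=True, device="cuda:0")

prompt = "日本の首都はどこですか?"  # "What is the capital of Japan?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))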
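# For the CPU-only gguf route mentioned above, a sketch using the
# llama-cpp-python bindings (an assumption; running the llama.cpp CLI
# directly also works). The local filename is hypothetical: first download
# a .gguf file from the mmnga repo linked above.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="weblab-10b-instruction-sft.gguf")  # hypothetical local path
result = llm("日本の首都はどこですか?", max_tokens=64)
print(result["choices"][0]["text"])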