---
language:
- zh
- en
tags:
- glm
- chatglm
- ggml
---
# ChatGLM3-6B-int8

## Introduction
ChatGLM3-6B is the latest generation of open-source models in the ChatGLM series: [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b).

The Q8_0 weights stored in this repository were generated with [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), using its GGML-based quantization.
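
To reproduce the quantized file, a minimal sketch using chatglm.cpp's `convert.py` script could look like the following (the flags follow the chatglm.cpp README; the script location and options may differ between versions):

```sh
# Convert the original Hugging Face checkpoint to GGML format with q8_0 quantization.
# -i: source model (Hugging Face repo id or local path), -t: quantization type, -o: output file.
python3 chatglm_cpp/convert.py -i THUDM/chatglm3-6b -t q8_0 -o chatglm3-ggml-q8_0.bin
```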
## Performance
| Model                  | GGML quantization | File size | Time per token\* |
|------------------------|-------------------|-----------|------------------|
| chatglm3-ggml-q8_0.bin | q8_0              | 6.64 GB   | 114 ms           |

\* Milliseconds per token on an Intel Xeon Platinum 8260 CPU, from the [chatglm.cpp benchmark](https://github.com/li-plus/chatglm.cpp#performance).
## Getting Started
1. Install dependencies
```sh
pip install chatglm-cpp transformers
```

2. Download the weights
```sh
wget https://huggingface.co/npc0/chatglm3-6b-int8/resolve/main/chatglm3-ggml-q8_0.bin
```

3. Run the model in Python
```py
import chatglm_cpp

# Load the quantized GGML weights from the current directory.
pipeline = chatglm_cpp.Pipeline("./chatglm3-ggml-q8_0.bin")
# Ask a single question; the list holds the chat history (user and assistant turns alternate).
pipeline.chat(["你好"])
# Output: 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。
# (English: "Hello 👋! I'm the AI assistant ChatGLM3-6B. Nice to meet you, feel free to ask me anything.")
```
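
Multi-turn chat works by growing the same history list. Below is a minimal sketch, assuming the list-of-strings interface shown above (newer chatglm-cpp releases may expect a different message format):

```py
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm3-ggml-q8_0.bin")

# User messages and model replies alternate in the history, oldest first.
history = ["你好"]
reply = pipeline.chat(history)

# Append the model's reply and the next user question, then ask again.
history += [reply, "你能做什么?"]
print(pipeline.chat(history))
```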