Update new version readme.
README.md
ADDED
@@ -0,0 +1,124 @@
---
license: creativeml-openrail-m
language: en
tags:
- LLM
- ChatGLM
---

## Breaking News!

**We know what you want, and here they are!**

- Newly released lyraChatGLM model, suitable for Ampere (A100/A10) as well as Volta (V100) GPUs.
- lyraChatGLM has been further optimized and now reaches **9000 tokens/s** on A100 and **3900 tokens/s** on V100, about **5.5x** faster than the original version (2023/6/1).
- Memory usage has been optimized too: batch_size can now be set up to **256** on A100!

**Note that the code has also been fully updated; you need to use the new API, see `Uses` below.**

## Model Card for lyraChatGLM

lyraChatGLM is currently the **fastest ChatGLM-6B** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.

The inference speed of lyraChatGLM achieves a **300x** acceleration over the early original version. We are still working hard to improve the performance further.

Among its main features are:

- weights: original ChatGLM-6B weights released by THUDM.
- device: NVIDIA GPUs with the Ampere or Volta architecture (A100, A10, V100, ...); see the sketch below for picking the matching `arch` value automatically.
- batch_size: compiled with dynamic batch size; the maximum depends on the device.

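If you would rather not hard-code the `arch` argument used in `Uses` below, the GPU generation can be read off the CUDA compute capability. The following is a minimal sketch assuming PyTorch is available in the environment; `detect_arch` is a hypothetical helper, not part of the lyraChatGLM API:

```python
import torch

def detect_arch(device_index: int = 0) -> str:
    """Map the CUDA compute capability to the `arch` string expected by lyraChatGLM."""
    major, minor = torch.cuda.get_device_capability(device_index)
    if major >= 8:                 # A100 / A10 report compute capability 8.x
        return "Ampere"
    if (major, minor) == (7, 0):   # V100 reports compute capability 7.0
        return "Volta"
    raise RuntimeError(f"Compute capability {major}.{minor} is not covered by this README")

arch = detect_arch()
```
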
## Speed

- original version (fixed batch infer): commit id 1d240ba

### Test on A100 40G

|version|max_batch_size|max_speed|
|:-:|:-:|:-:|
|original|1|30 tokens/s|
|original (fixed batch infer)|192|1638.52 tokens/s|
|lyraChatGLM (current)|256|9082.60+ tokens/s|

### Test on V100

|version|max_batch_size|max_speed|
|:-:|:-:|:-:|
|original|1|17.83 tokens/s|
|original (fixed batch infer)|128|992.20 tokens/s|
|lyraChatGLM (current)|192|3911.45+ tokens/s|

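The tables above report peak throughput; the measurement script itself is not included here. A rough sketch of how such a number could be reproduced with the API from `Uses` below is given next, under the assumption that every sequence in the batch generates the full `output_length` tokens (an upper-bound estimate, not the project's official benchmark):

```python
import time

# Rough throughput probe. Assumes `model` is a LyraChatGLM6B instance built as in
# the `Uses` section below; tokens are approximated as batch_size * output_length.
batch_size = 256
output_length = 150
prompts = ["列出3个不同的机器学习算法,并说明它们的适用范围."] * batch_size

start = time.time()
model.generate(prompts, output_length=output_length, top_k=30, top_p=0.85,
               temperature=0.35, repetition_penalty=1.2, do_sample=False)
elapsed = time.time() - start

print(f"~{batch_size * output_length / elapsed:.0f} tokens/s (upper-bound estimate)")
```
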
## Model Sources

- **Repository:** https://huggingface.co/THUDM/chatglm-6b

## Docker Environment

- **Docker image available** at https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general; pull it with:

```
docker pull bigmoyan/lyrallm:v0.1
```

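To start a GPU-enabled container from that image, something along these lines should work; the mount path is an assumption rather than a documented project setting, and the host needs the NVIDIA Container Toolkit for `--gpus`:

```
docker run --gpus all -it --rm -v $PWD/models:/workspace/models bigmoyan/lyrallm:v0.1
```
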
## Uses

```python
from lyraChatGLM import LyraChatGLM6B

model_path = "./models/1-gpu-fp16.h5"
tokenizer_path = "./models"
data_type = "fp16"
int8_mode = 0
max_output_length = 150
arch = "Ampere"  # Ampere or Volta

model = LyraChatGLM6B(model_path, tokenizer_path, data_type, int8_mode, arch)
# Prompt: "List three different machine learning algorithms and explain where each is applicable."
prompt = "列出3个不同的机器学习算法,并说明它们的适用范围."
test_batch_size = 256

# A single-prompt batch; to exercise batched inference, replicate the prompt,
# e.g. prompts = [prompt] * test_batch_size
prompts = [prompt, ]

# If you want different outputs within the same batch, set do_sample=True
output_texts = model.generate(prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=False)

print(output_texts)
```
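Building on the snippet above, here is a short sketch of batched generation with sampling enabled, so that prompts in the same batch yield different outputs; apart from `model.generate` (shown above), the batch size and loop are purely illustrative:

```python
# Reuses `model`, `prompt`, and `max_output_length` from the example above.
batch_prompts = [prompt] * 8   # any size up to the device's maximum batch size

sampled_texts = model.generate(
    batch_prompts,
    output_length=max_output_length,
    top_k=30,
    top_p=0.85,
    temperature=0.35,
    repetition_penalty=1.2,
    do_sample=True,   # sampling makes outputs differ within the same batch
)

for i, text in enumerate(sampled_texts):
    print(f"--- sample {i} ---")
    print(text)
```
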
## Demo output

### Input
列出3个不同的机器学习算法,并说明它们的适用范围.
(List three different machine learning algorithms and explain where each is applicable.)

### Output
以下是三个常见的机器学习算法及其适用范围:

1. 决策树(Decision Tree):决策树是一种基于分类和回归问题的朴素贝叶斯模型。它通过构建一系列逐步分裂的分支来预测结果。适用于那些具有简单特征、大量数据且数据集大小在可接受范围内的情况。

2. 随机森林(Random Forest):随机森林是一种集成学习算法,由多个决策树组成。它的优点是能够处理大规模数据和高维度的特征。适用于需要对多个变量进行建模的场景,例如医疗诊断、金融风险评估等。

3. 支持向量机(Support Vector Machine):支持向量机是一种监督学习方法,通常用于分类问题。它可以处理高维数据,并且具有较高的准确性。适用于需要对高维数据进行分类或回归的问题,例如图像识别、自然语言处理等。

## Citation

```bibtex
@Misc{lyraChatGLM2023,
  author       = {Kangjian Wu and Zhengtao Wang and Yibo Lu and Bin Wu},
  title        = {lyraChatGLM: Accelerating ChatGLM by 5.5x+},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
  year         = {2023}
}
```

## Report a bug

- Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraChatGLM/discussions
- Include a `[bug]` mark in the title.