Ubuntu committed
Commit 397fc1e
1 Parent(s): d613393

english readme

Files changed (3):
  1. README.md +10 -8
  2. README_en.md +54 -0
  3. generation_config.json +13 -0
README.md CHANGED
@@ -15,11 +15,15 @@ inference: false
 
 # GLM-4-9B
 
- GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
- In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human preference-aligned version GLM-4-9B-Chat both show strong performance.
- In addition to multi-turn dialogue, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context).
- This generation of models adds multilingual support for 26 languages including Japanese, Korean, and German. We have also released a model supporting 1M context length (about 2 million Chinese characters).
+ Read this in [English](README_en.md)
+
+ GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B** and its human preference-aligned version **GLM-4-9B-Chat** show performance surpassing Llama-3-8B. In addition to multi-turn dialogue, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multilingual support for 26 languages including Japanese, Korean, and German. We have also released the **GLM-4-9B-Chat-1M** model, which supports 1M context length (about 2 million Chinese characters), and the multimodal model GLM-4V-9B based on GLM-4-9B. **GLM-4V-9B** offers Chinese-English bilingual multi-turn dialogue at 1120 * 1120 resolution and, in multimodal evaluations covering comprehensive abilities in Chinese and English, perception and reasoning, text recognition, and chart understanding, surpasses GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
 
 We evaluated the GLM-4-9B base model on some typical tasks; the results are as follows:
 
@@ -31,14 +35,12 @@ GLM-4-9B is the open-source version of the latest generation of pre-trained models
 | GLM-4-9B | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1** |
 
- **This repository is the base version of GLM-4-9B, supporting `8K` context length.**
+ **This repository is the base version of GLM-4-9B, supporting `8K` context length.**
 
 ## License
 
 Use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
 
- The use of the GLM-4 model weights needs to comply with the [LICENSE](LICENSE).
-
 ## Citation
 
 If you find our work helpful, please consider citing the following papers.
README_en.md ADDED
@@ -0,0 +1,54 @@
+ # GLM-4-9B
+
+ ## Model Introduction
+
+ GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B** and its human preference-aligned version **GLM-4-9B-Chat** show performance surpassing Llama-3-8B. In addition to multi-turn conversations, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multilingual support for 26 languages including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model, which supports 1M context length (about 2 million Chinese characters), and the multimodal model GLM-4V-9B based on GLM-4-9B. **GLM-4V-9B** offers dialogue capabilities in both Chinese and English at a high resolution of 1120*1120. In multimodal evaluations covering comprehensive abilities in Chinese and English, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates performance surpassing GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
+
+ We evaluated the GLM-4-9B base model on some typical tasks; the results are as follows:
+
+ | Model               |   MMLU   |  C-Eval  |   GPQA   |  GSM8K   |   MATH   | HumanEval |
+ |:--------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
+ | Llama-3-8B          |   66.6   |   51.2   |    -     |   45.8   |    -     |     -     |
+ | Llama-3-8B-Instruct |   68.4   |   51.3   |   34.2   |   79.6   |   30.0   |   62.2    |
+ | ChatGLM3-6B-Base    |   61.4   |   69.0   |    -     |   72.3   |   25.7   |     -     |
+ | GLM-4-9B            | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |
+
+ ## LICENSE
+
+ The weights of the GLM-4 model are available under the terms of [LICENSE](LICENSE).
+
+ ## Citations
+
+ If you find our work useful, please consider citing the following papers.
+
+ ```
+ @article{zeng2022glm,
+   title={GLM-130B: An Open Bilingual Pre-trained Model},
+   author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
+   journal={arXiv preprint arXiv:2210.02414},
+   year={2022}
+ }
+ ```
+
+ ```
+ @inproceedings{du2022glm,
+   title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
+   author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
+   booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+   pages={320--335},
+   year={2022}
+ }
+ ```
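A minimal sketch of plain-completion inference with the base model described above, using transformers. The repo id `THUDM/glm-4-9b`, the `trust_remote_code=True` flag, and the dtype choice are assumptions not stated in this commit; adjust them to the repository you actually pull from.

```python
# Minimal sketch of next-token completion with the GLM-4-9B base model.
# Assumptions (not stated in this commit): the Hugging Face repo id
# "THUDM/glm-4-9b" and the need for trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4-9b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 9B parameters; bf16 halves memory vs. fp32
    device_map="auto",
    trust_remote_code=True,
)

# This is the base model, not the Chat variant: feed raw text, no chat template.
inputs = tokenizer("GLM-4-9B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the generation_config.json added below sets `do_sample`, `temperature`, and `top_p`, the `generate()` call above samples stochastically unless those fields are overridden.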
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "eos_token_id": [
+     151329,
+     151336,
+     151338
+   ],
+   "pad_token_id": 151329,
+   "do_sample": true,
+   "temperature": 0.8,
+   "max_length": 8192,
+   "top_p": 0.8,
+   "transformers_version": "4.38.2"
+ }
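This config declares three interchangeable end-of-sequence token ids, defaults to stochastic decoding (`do_sample` with `temperature` and `top_p` both 0.8), and caps `max_length` at 8192 tokens, matching the base model's 8K context. A small sketch of how transformers consumes the file, reusing the assumed repo id from the sketch above:

```python
# Sketch: transformers loads generation_config.json automatically during
# from_pretrained(); here we fetch it explicitly to inspect the defaults.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("THUDM/glm-4-9b")  # assumed repo id

print(gen_config.eos_token_id)                   # [151329, 151336, 151338]
print(gen_config.temperature, gen_config.top_p)  # 0.8 0.8
print(gen_config.max_length)                     # 8192, the base model's 8K window

# Any field can be overridden per call without editing the file, e.g.:
# model.generate(**inputs, generation_config=gen_config, max_new_tokens=256)
```

Generation stops as soon as any of the three listed eos token ids is produced, which lets one config serve checkpoints that emit different terminators.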