english readme

Files changed (3)

README.md +10 -8
README_en.md +54 -0
generation_config.json +13 -0

README.md CHANGED

@@ -15,11 +15,15 @@ inference: false
 # GLM-4-9B
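-GLM-4-9B is the open-source version of GLM-4, the latest generation of pre-trained models released by Zhipu AI.
-In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, GLM-4-9B and its human-preference-aligned version GLM-4-9B-Chat both show strong performance.
-In addition to multi-turn conversation, GLM-4-9B-Chat offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K
-context).
-This generation adds multilingual support, covering 26 languages including Japanese, Korean, and German. We have also released a model that supports a 1M context length (about 2 million Chinese characters).
 We evaluated the GLM-4-9B base model on several typical tasks, with the following results: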
@@ -31,14 +35,12 @@ GLM-4-9B 是智谱 AI 推出的最新一代预训练模型 GLM-4 系列中的开
 | GLM-4-9B            | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |
-**This repository is the base version of GLM-4-9B and supports an `8K` context length.**
 ## License
 Use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
-Rhe use of the GLM-4 model weights needs to comply with the [LICENSE](LICENSE).
 ## Citation
 If you find our work helpful, please consider citing the following papers.

 # GLM-4-9B
+Read this in [English](README_en.md)
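+GLM-4-9B is the open-source version of GLM-4, the latest generation of pre-trained models released by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge,
+**GLM-4-9B** and its human-preference-aligned version **GLM-4-9B-Chat** both show excellent performance, surpassing Llama-3-8B. In addition to multi-turn conversation, GLM-4-9B-Chat
+also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support, covering
+26 languages including Japanese, Korean, and German. We have also released the **GLM-4-9B-Chat-1M** model, which supports a 1M context length (about 2 million Chinese characters), and the multimodal model
+GLM-4V-9B, which is based on GLM-4-9B. **GLM-4V-9B** supports bilingual (Chinese and English) multi-turn dialogue at a high resolution of 1120 * 1120. In multimodal evaluations covering overall Chinese and English ability, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B
+shows excellent performance, surpassing GPT-4-turbo-2024-04-09, Gemini
+1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
 We evaluated the GLM-4-9B base model on several typical tasks, with the following results: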
 | GLM-4-9B            | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |
+**This repository is the base version of GLM-4-9B, supporting `8K` context length.**
 ## License
 Use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
 ## Citation
 If you find our work helpful, please consider citing the following papers.

README_en.md ADDED

	@@ -0,0 +1,54 @@

+# GLM-4-9B-Chat-1M
+## Model Introduction
+GLM-4-9B is the open-source version of GLM-4, the latest generation of pre-trained models released by Zhipu
+AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B**
+and its human-preference-aligned version **GLM-4-9B-Chat** both outperform Llama-3-8B. In
+addition to multi-turn conversation, GLM-4-9B-Chat offers advanced features such as web browsing, code execution,
+custom tool calls (Function Call), and long-text
+reasoning (supporting up to 128K context). This generation adds multilingual support, covering 26
+languages including Japanese, Korean, and German. We have also released the **GLM-4-9B-Chat-1M** model, which supports a 1M
+context length (about 2 million Chinese characters), and the multimodal model GLM-4V-9B, which is based on GLM-4-9B.
+**GLM-4V-9B** supports multi-turn dialogue in both Chinese and English at a high resolution of 1120*1120.
+In multimodal evaluations covering overall Chinese and English ability, perception and reasoning,
+text recognition, and chart understanding, GLM-4V-9B outperforms
+GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
+We evaluated the GLM-4-9B base model on several typical tasks, with the following results:
+| Model               |   MMLU   |  C-Eval  |   GPQA   |  GSM8K   |   MATH   | HumanEval |
+|:--------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
+| Llama-3-8B          |   66.6   |   51.2   |    -     |   45.8   |    -     |     -     |
+| Llama-3-8B-Instruct |   68.4   |   51.3   |   34.2   |   79.6   |   30.0   |   62.2    |
+| ChatGLM3-6B-Base    |   61.4   |   69.0   |    -     |   72.3   |   25.7   |     -     |
+| GLM-4-9B            | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |
+## LICENSE
+The weights of the GLM-4 model are available under the terms of [LICENSE](LICENSE).
+## Citations
+If you find our work useful, please consider citing the following papers.
+```
+@article{zeng2022glm,
+  title={Glm-130b: An open bilingual pre-trained model},
+  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
+  journal={arXiv preprint arXiv:2210.02414},
+  year={2022}
+}
+```
+```
+@inproceedings{du2022glm,
+  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
+  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
+  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+  pages={320--335},
+  year={2022}
+}
+```

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "eos_token_id": [
+    151329,
+    151336,
+    151338
+  ],
+  "pad_token_id": 151329,
+  "do_sample": true,
+  "temperature": 0.8,
+  "max_length": 8192,
+  "top_p": 0.8,
+  "transformers_version": "4.38.2"
+}
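
The sampling fields in the `generation_config.json` above (`do_sample`, `temperature`, `top_p`, `eos_token_id`) can be illustrated with a small standalone sketch. It is not the code path the model's inference library runs. It is a plain-Python approximation of what temperature scaling, nucleus (top-p) filtering, and a multi-ID stop check conventionally mean, with the config values copied from the file. The helper names `top_p_filter`, `sample_token`, and `is_stop_token` are hypothetical and exist only for this illustration.

```python
import json
import math
import random

# Values copied from the generation_config.json added in this commit
# ("transformers_version" omitted because it does not affect sampling).
CONFIG = json.loads("""
{
  "eos_token_id": [151329, 151336, 151338],
  "pad_token_id": 151329,
  "do_sample": true,
  "temperature": 0.8,
  "max_length": 8192,
  "top_p": 0.8
}
""")


def top_p_filter(logits, temperature, top_p):
    """Return {token_id: prob} over the smallest set of most likely tokens
    whose cumulative probability reaches top_p (hypothetical helper)."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the kept tokens form a proper distribution.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(logits, config, rng):
    """Draw one token id using the config's temperature and top_p."""
    dist = top_p_filter(logits, config["temperature"], config["top_p"])
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


def is_stop_token(token_id, config):
    """Generation would stop on any of the listed end-of-sequence ids."""
    return token_id in config["eos_token_id"]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
    rng = random.Random(0)
    print(top_p_filter(toy_logits, CONFIG["temperature"], CONFIG["top_p"]))
    print(sample_token(toy_logits, CONFIG, rng))
    print(is_stop_token(151336, CONFIG))
```

With the toy logits, only the two most likely tokens survive the `top_p` = 0.8 cutoff, so the two lowest-scoring tokens can never be sampled. The `max_length` of 8192 matches the `8K` context length stated for the base model in the README.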