# GLM-4-9B

## Model Introduction

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B** and its human-preference-aligned version **GLM-4-9B-Chat** show performance superior to Llama-3-8B. Beyond multi-turn conversation, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multilingual support for 26 languages, including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model, which supports a 1M context length (about 2 million Chinese characters), and the multimodal model **GLM-4V-9B**, which is based on GLM-4-9B.

**GLM-4V-9B** supports dialogue in both Chinese and English at a high resolution of 1120 × 1120. In various multimodal evaluations, covering comprehensive abilities in Chinese and English, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
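
The conversational features above refer to the chat-aligned checkpoints rather than this base repository. As a rough illustration of how such a chat model is typically driven through the Hugging Face `transformers` chat-template API, a minimal sketch is shown below; the model id `THUDM/glm-4-9b-chat` and the need for `trust_remote_code=True` are assumptions here, so defer to the corresponding repository's own usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed chat checkpoint id; replace with the actual repository path.
CHAT_MODEL_ID = "THUDM/glm-4-9b-chat"

# trust_remote_code=True is assumed to be required for the custom GLM code.
tokenizer = AutoTokenizer.from_pretrained(CHAT_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    CHAT_MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 9B weights at roughly 18 GB
    device_map="auto",
    trust_remote_code=True,
).eval()

# Single-turn conversation; multi-turn works by appending more messages.
messages = [{"role": "user", "content": "Give a one-sentence summary of the GLM-4 series."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```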

We evaluated the GLM-4-9B base model on some typical tasks, and the results are as follows:

| Model                | MMLU     | C-Eval   | GPQA     | GSM8K    | MATH     | HumanEval |
|:---------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
| Llama-3-8B           | 66.6     | 51.2     | -        | 45.8     | -        | -         |
| Llama-3-8B-Instruct  | 68.4     | 51.3     | 34.2     | 79.6     | 30.0     | 62.2      |
| ChatGLM3-6B-Base     | 61.4     | 69.0     | -        | 72.3     | 25.7     | -         |
| GLM-4-9B             | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |

**This repository is the base version of GLM-4-9B, supporting 8K context length.**
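
Since this repository hosts the non-chat base checkpoint, plain text completion is the natural way to try it. The sketch below is a minimal, illustrative example using Hugging Face `transformers`; the model id `THUDM/glm-4-9b` and the need for `trust_remote_code=True` are assumptions, so check this repository's own instructions for authoritative usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base checkpoint id; replace with the actual repository path.
MODEL_ID = "THUDM/glm-4-9b"

# trust_remote_code=True is assumed to be required for the custom GLM code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

# The base model is a plain completion model, so prompt it with raw text.
prompt = "GLM-4-9B is a language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```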

## LICENSE

The weights of the GLM-4 model are available under the terms of [LICENSE](LICENSE).

## Citations

If you find our work useful, please consider citing the following papers:

```
@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
```

```
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}
```