# GLM-4-9B

## Model Introduction

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B** and its human-preference-aligned version **GLM-4-9B-Chat** show performance superior to Llama-3-8B. Beyond multi-turn conversation, GLM-4-9B-Chat also offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation of models adds multilingual support for 26 languages, including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model, which supports a 1M context length (about 2 million Chinese characters), and the multimodal model **GLM-4V-9B**, which is based on GLM-4-9B.

**GLM-4V-9B** supports dialogue in both Chinese and English at a high resolution of 1120 × 1120. In various multimodal evaluations, covering comprehensive abilities in Chinese and English, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B outperforms GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
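
The conversational features above refer to the chat-aligned checkpoints rather than this base repository. As a rough illustration of how such a chat model is typically driven through the Hugging Face `transformers` chat-template API, a minimal sketch is shown below; the model id `THUDM/glm-4-9b-chat` and the need for `trust_remote_code=True` are assumptions here, so defer to the corresponding repository's own usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed chat checkpoint id; replace with the actual repository path.
CHAT_MODEL_ID = "THUDM/glm-4-9b-chat"

# trust_remote_code=True is assumed to be required for the custom GLM code.
tokenizer = AutoTokenizer.from_pretrained(CHAT_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    CHAT_MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 9B weights at roughly 18 GB
    device_map="auto",
    trust_remote_code=True,
).eval()

# Single-turn conversation; multi-turn works by appending more messages.
messages = [{"role": "user", "content": "Give a one-sentence summary of the GLM-4 series."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```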

We evaluated the GLM-4-9B base model on some typical tasks, and the results are as follows:

| Model                | MMLU     | C-Eval   | GPQA     | GSM8K    | MATH     | HumanEval |
|:---------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------:|
| Llama-3-8B           | 66.6     | 51.2     | -        | 45.8     | -        | -         |
| Llama-3-8B-Instruct  | 68.4     | 51.3     | 34.2     | 79.6     | 30.0     | 62.2      |
| ChatGLM3-6B-Base     | 61.4     | 69.0     | -        | 72.3     | 25.7     | -         |
| GLM-4-9B             | **74.7** | **77.1** | **34.3** | **84.0** | **30.4** | **70.1**  |

**This repository is the base version of GLM-4-9B, supporting 8K context length.**
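
Since this repository hosts the non-chat base checkpoint, plain text completion is the natural way to try it. The sketch below is a minimal, illustrative example using Hugging Face `transformers`; the model id `THUDM/glm-4-9b` and the need for `trust_remote_code=True` are assumptions, so check this repository's own instructions for authoritative usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base checkpoint id; replace with the actual repository path.
MODEL_ID = "THUDM/glm-4-9b"

# trust_remote_code=True is assumed to be required for the custom GLM code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

# The base model is a plain completion model, so prompt it with raw text.
prompt = "GLM-4-9B is a language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```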

## LICENSE

The weights of the GLM-4 model are available under the terms of [LICENSE](LICENSE).

## Citations

If you find our work useful, please consider citing the following papers:

```
@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
```

```
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}
```