arxiv:2311.15786

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Published on Nov 27, 2023

Abstract

In this work, Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. Based on LFA, we develop and release Yuan 2.0, a large language model with parameter counts ranging from 2.1 billion to 102.6 billion. A data filtering and generation method is presented to build high-quality pretraining and fine-tuning datasets. A distributed training method combining non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is proposed, which greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Yuan 2.0 models demonstrate impressive ability in code generation, math problem solving, and chat compared with existing models. The latest version of Yuan 2.0, including model weights and source code, is accessible on GitHub.
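
The abstract names LFA only at a high level, so below is a minimal PyTorch sketch of the general idea: mix each token with its immediate predecessors through a causal depthwise 1D convolution before computing standard attention, so the local-dependency prior is injected into the queries and keys. The class name `LocalizedFilteringAttention`, the `kernel_size` parameter, and the choice to filter only Q and K are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalizedFilteringAttention(nn.Module):
    """Illustrative sketch (not Yuan 2.0's exact LFA): apply a causal
    depthwise 1D convolution over the sequence dimension to queries and
    keys before standard scaled-dot-product attention, so nearby tokens
    are mixed locally first."""

    def __init__(self, d_model: int, n_heads: int, kernel_size: int = 3):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.kernel_size = kernel_size
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Depthwise (per-channel) convolution acting as the local filter.
        self.local_filter = nn.Conv1d(d_model, d_model, kernel_size,
                                      groups=d_model, padding=0)

    def _causal_local_filter(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Left-pad so each position only sees
        # itself and the previous kernel_size - 1 tokens (causal filtering).
        x = x.transpose(1, 2)                    # (batch, d_model, seq)
        x = F.pad(x, (self.kernel_size - 1, 0))
        x = self.local_filter(x)
        return x.transpose(1, 2)                 # (batch, seq, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Inject the local-dependency prior into queries and keys only.
        q = self._causal_local_filter(q)
        k = self._causal_local_filter(k)
        # Reshape to (batch, heads, seq, head_dim) and run causal attention.
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, s, -1)
        return self.out(y)

# Usage: a drop-in stand-in for ordinary multi-head attention.
attn = LocalizedFilteringAttention(d_model=512, n_heads=8)
tokens = torch.randn(2, 16, 512)   # (batch, seq, d_model)
out = attn(tokens)                 # same shape: (2, 16, 512)
```

Because the filtering happens before the dot product, the attention computation itself is unchanged, which is why such a module can slot into a standard transformer block.

The abstract also credits the intra-node bandwidth savings to combining non-uniform pipeline, data, and optimizer parallelism, but gives no details. The snippet below only illustrates the "non-uniform" part: pipeline stages that also host the embedding or output head receive fewer transformer layers. The function name and the balancing heuristic are hypothetical, not the authors' partitioning algorithm.

```python
def split_layers_non_uniform(num_layers: int, num_stages: int,
                             first_stage_discount: int = 1,
                             last_stage_discount: int = 1) -> list[int]:
    """Assign transformer layers to pipeline stages non-uniformly.

    Stages 0 and num_stages - 1 also carry the embedding / LM head, so
    they get fewer transformer layers; the rest are spread evenly.
    Illustrative heuristic only, not Yuan 2.0's actual scheme.
    """
    counts = [0] * num_stages
    counts[0] = max(num_layers // num_stages - first_stage_discount, 1)
    counts[-1] = max(num_layers // num_stages - last_stage_discount, 1)
    remaining = num_layers - counts[0] - counts[-1]
    middle = num_stages - 2
    for i in range(1, num_stages - 1):
        # Spread the remaining layers as evenly as possible over the
        # middle stages, giving the earliest ones any leftover layer.
        counts[i] = remaining // middle + (1 if i - 1 < remaining % middle else 0)
    return counts

# Example: 84 layers over 8 stages -> [9, 11, 11, 11, 11, 11, 11, 9]
print(split_layers_non_uniform(84, 8))
```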

Community

The repository is here, including an English README.


Models citing this paper: 8

Datasets citing this paper: 0

Spaces citing this paper: 11

Collections including this paper: 0
