arxiv:2203.17189

Scaling Up Models and Data with t5x and seqio

Published on Mar 31, 2022
Abstract

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
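The "task-based API" that the abstract attributes to seqio can be illustrated with a hypothetical, dependency-free sketch of the pattern (a named registry of data source plus preprocessing steps); this is not seqio's actual API, and all names here are invented:

```python
# A minimal registry mapping task names to (data source, preprocessing steps),
# mirroring the pattern of defining reproducible pipelines by name.
class TaskRegistry:
    _tasks = {}

    @classmethod
    def add(cls, name, source, preprocessors):
        cls._tasks[name] = (source, preprocessors)

    @classmethod
    def get_dataset(cls, name):
        source, preprocessors = cls._tasks[name]
        examples = source()  # deterministic source -> reproducible pipeline
        for step in preprocessors:
            examples = [step(ex) for ex in examples]
        return examples

# Register a toy task: lowercase and whitespace-tokenize two example strings.
TaskRegistry.add(
    "toy_task",
    source=lambda: ["Hello World", "Scaling Up Models"],
    preprocessors=[str.lower, str.split],
)

print(TaskRegistry.get_dataset("toy_task"))
# → [['hello', 'world'], ['scaling', 'up', 'models']]
```

Registering pipelines under stable names is what lets the same training data and evaluation setup be reproduced exactly across runs and machines.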

Community

Introduces the t5x and seqio libraries: t5x simplifies building and training LLMs at scale while remaining easy to use, and seqio helps create reproducible training-data and evaluation pipelines; T5-like (encoder-decoder) and GPT-like (decoder-only) example implementations are also released. t5x is built on JAX with a focus on Transformer-based models (via flaxformer) trained on large TPU clusters (via pjit), but it also supports GPUs and CPUs, and it offers parallelism strategies such as ZeRO-3, FSDP, and Megatron-style model parallelism. seqio provides a task-based API covering data sources, preprocessing, postprocessing, and evaluation pipelines. Both libraries use the Gin system for configuration and customisation and are optimized for TPU Pods on GCP. From Google.
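The pjit-style device partitioning mentioned above can be sketched with JAX's sharding API. This is a minimal illustrative sketch, not t5x code: the `forward` function and array shapes are invented, and on a TPU pod the mesh would span many chips rather than the single local device it degenerates to here.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D device mesh; on a TPU pod this would cover many chips,
# but it works unchanged on a single local CPU or GPU device.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

@jax.jit
def forward(x, w):
    # jit compiles with XLA and propagates the input sharding, so the
    # batch dimension of x stays partitioned across the "data" axis.
    return jnp.dot(x, w)

# Shard the batch dimension of x; replicate w (pure data parallelism).
x = jax.device_put(jnp.ones((8, 4)), NamedSharding(mesh, P("data", None)))
w = jnp.ones((4, 2))
y = forward(x, w)  # each entry is a dot product of ones: 4.0
```

Sharding weights as well as activations along mesh axes is what generalizes this pattern to the FSDP/ZeRO-3 and Megatron-style strategies the summary mentions.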

Links: GitHub: google-research/t5x, GitHub: google/seqio

