arxiv:2304.14402

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Published on Apr 27, 2023

Authors: Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji

Abstract

Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly generated instructions. In addition to being sizeable, our instructions are designed to cover a broad set of topics to ensure diversity. A thorough investigation of our instruction data demonstrates its diversity, and we generate responses for these instructions using gpt-3.5-turbo. We then use the instructions to tune a host of models, dubbed LaMini-LM, of varying sizes, from both the encoder-decoder and the decoder-only families. We evaluate our models both automatically (on 15 different NLP benchmarks) and manually. Results show that our proposed LaMini-LM models are on par with competitive baselines while being nearly 10 times smaller in size.

Models citing this paper 22

Datasets citing this paper 3

Spaces citing this paper 88
