arxiv:2310.19019

TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

Published on Oct 29, 2023
· Submitted by akhaliq on Oct 31, 2023
Authors:
Rui Lu et al.

Abstract

Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes each annotation more than just an answer and allows other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and used them to teach student models of various sizes from the OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM brings significant benefits. We will release the TeacherLM series of models and the augmented datasets as open-source.
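
To make the augmentation step concrete, below is a minimal sketch of how one might prompt a causal LM such as TeacherLM to produce the three annotation fields (fundamentals, chain of thought, common mistakes) for a sample. The checkpoint path and prompt template are placeholders, not the authors' released artifacts, and the exact prompt format used in the paper may differ.

```python
# Minimal sketch of TeacherLM-style data augmentation.
# Assumptions: the model path and prompt wording below are placeholders;
# the released TeacherLM-7.1B may use a different ID and prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/teacherlm-7.1b"  # placeholder, not an official model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def annotate(question: str, answer: str) -> dict:
    """Ask the teacher model for fundamentals, chain of thought, and common mistakes."""
    annotations = {}
    for field in ("fundamentals", "chain of thought", "common mistakes"):
        prompt = (
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            f"Explain the {field} for this sample:\n"
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
        # Keep only the newly generated tokens, not the echoed prompt.
        generated = output[0][inputs["input_ids"].shape[1]:]
        annotations[field] = tokenizer.decode(generated, skip_special_tokens=True)
    return annotations

# The augmented sample (question, answer, plus the three annotation fields)
# would then be added to the fine-tuning data for the student models.
```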

Community

But how do you know TeacherLM isn't introducing mistakes of its own? It doesn't seem like this approach would work if your baseline model is already quite good - the BLOOM and OPT baselines are pretty low bars to clear.

The benchmarks do seem quite weak. The model benches a good 20 points lower than models a tenth of its size.

