license: mit
language:
- en
tags:
- t5
model-index:
- name: metro_t0p_basepp
results:
- task:
type: natural-language-inference
dataset:
type: super_glue
name: RTE
config: rte
split: validation
metrics:
- type: accuracy
value: 71.44404332129963
- task:
type: natural-language-inference
dataset:
type: super_glue
name: CB
config: cb
split: validation
metrics:
- type: accuracy
value: 60.714285714285715
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R1
split: dev_r1
metrics:
- type: accuracy
value: 36.906666666666666
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R2
split: dev_r2
metrics:
- type: accuracy
value: 35.24
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R3
split: dev_r3
metrics:
- type: accuracy
value: 36.46666666666666
- task:
type: coreference-resolution
dataset:
type: super_glue
name: WSC
config: wsc.fixed
split: validation
metrics:
- type: accuracy
value: 62.21153846153847
- task:
type: coreference-resolution
dataset:
type: winogrande
name: Winogrande XL
config: winogrande_xl
split: validation
metrics:
- type: accuracy
value: 54.08050513022889
- task:
type: multiple-choice-qa
dataset:
type: super_glue
name: COPA
config: copa
split: validation
metrics:
- type: accuracy
value: 78.875
- task:
type: multiple-choice-qa
dataset:
type: story_cloze
name: StoryCloze 2016
config: '2016'
split: validation
metrics:
- type: accuracy
value: 90.29396044895778
- task:
type: multiple-choice-qa
dataset:
type: hellaswag
name: HellaSwag
split: validation
metrics:
- type: accuracy
value: 67.56871141206932
- task:
type: word-sense-disambiguation
dataset:
type: super_glue
name: WiC
config: wic
split: validation
metrics:
- type: accuracy
value: 51.5987460815047
Official repository: https://github.com/gonglinyuan/metro_t0

# METRO-T0

Paper: [Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers](https://arxiv.org/abs/2305.12567) (ACL 2023)
METRO-T0 is a T5-style text-to-text Transformer pretrained with model-generated pretraining signals and then prompt-finetuned on the family of public NLP tasks proposed in T0. METRO-T0 is highly parameter-efficient: for example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
## Use METRO-T0+-Base++

To use METRO-T0+-Base++ in PyTorch (Python 3.7+, PyTorch 1.12+, and transformers 4.17+ are prerequisites), refer to the code snippet below:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# trust_remote_code=True is required because this checkpoint ships custom modeling code
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)

# Tokenize a single zero-shot prompt and generate the answer greedily
input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: positive
```
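Because METRO-T0+-Base++ is prompt-finetuned, it can also be queried with other natural-language prompts. The sketch below is illustrative only (the prompts and the batching are our own additions, not part of the official evaluation setup); it reuses `model` and `tokenizer` from the snippet above to run several prompts in a single `generate` call:

```python
import torch

# Illustrative zero-shot prompts (not from the official evaluation setup)
prompts = [
    "Is this review positive or negative? Review: the handle snapped after two uses",
    'Suppose "A dog is sleeping on the porch." Can we infer that "An animal is resting"? Yes or no?',
]

# Pad the batch so both prompts share one greedy generate() call
batch = tokenizer(prompts, max_length=512, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        batch.input_ids,
        attention_mask=batch.attention_mask,
        max_new_tokens=256,
        do_sample=False,
    )
for prompt, answer in zip(prompts, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{prompt!r} -> {answer!r}")
```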
## Other METRO-T0 Models

| Model | # Parameters | Pretraining Data | Prompt-Finetuning Data |
|---|---|---|---|
| METRO-T0-Base | 226M | Wikibook (16G) | T0 Train |
| METRO-T0+-Base | 226M | Wikibook (16G) | T0+ Train |
| METRO-T0++-Base | 226M | Wikibook (16G) | T0++ Train |
| METRO-T0-Base++ | 256M | 160G corpus | T0 Train |
| METRO-T0+-Base++ | 256M | 160G corpus | T0+ Train |
| METRO-T0++-Base++ | 256M | 160G corpus | T0++ Train |
| METRO-T0-Large++ | 775M | 160G corpus | T0 Train |
| METRO-T0+-Large++ | 775M | 160G corpus | T0+ Train |
| METRO-T0++-Large++ | 775M | 160G corpus | T0++ Train |
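The other checkpoints in the table can be loaded the same way as METRO-T0+-Base++. The repository name in the sketch below is an assumption extrapolated from the naming pattern of this model (`metro_t0p_basepp` for METRO-T0+-Base++); check the official repository for the exact identifiers:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repository name for METRO-T0++-Large++, inferred from the pattern
# t0 / t0p / t0pp for T0 / T0+ / T0++ and base / basepp / largepp for the size;
# verify against the official repository before relying on it.
repo = "gonglinyuan/metro_t0pp_largepp"
model = AutoModelForSeq2SeqLM.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
```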
## Citation

If you find the code and models useful for your research, please cite the following paper:

```bibtex
@misc{gong2023modelgenerated,
    title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers},
    author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
    year={2023},
    eprint={2305.12567},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2305.12567}
}
```