license: mit
language:
- en
tags:
- t5
model-index:
- name: metro_t0p_basepp
results:
- task:
type: natural-language-inference
dataset:
type: super_glue
name: RTE
config: rte
split: validation
metrics:
- type: accuracy
value: 71.44404332129963
- task:
type: natural-language-inference
dataset:
type: super_glue
name: CB
config: cb
split: validation
metrics:
- type: accuracy
value: 60.714285714285715
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R1
split: dev_r1
metrics:
- type: accuracy
value: 36.906666666666666
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R2
split: dev_r2
metrics:
- type: accuracy
value: 35.24
- task:
type: natural-language-inference
dataset:
type: anli
name: ANLI R3
split: dev_r3
metrics:
- type: accuracy
value: 36.46666666666666
- task:
type: coreference-resolution
dataset:
type: super_glue
name: WSC
config: wsc.fixed
split: validation
metrics:
- type: accuracy
value: 62.21153846153847
- task:
type: coreference-resolution
dataset:
type: winogrande
name: Winogrande XL
config: winogrande_xl
split: validation
metrics:
- type: accuracy
value: 54.08050513022889
- task:
type: multiple-choice-qa
dataset:
type: super_glue
name: COPA
config: copa
split: validation
metrics:
- type: accuracy
value: 78.875
- task:
type: multiple-choice-qa
dataset:
type: story_cloze
name: StoryCloze 2016
config: '2016'
split: validation
metrics:
- type: accuracy
value: 90.29396044895778
- task:
type: multiple-choice-qa
dataset:
type: hellaswag
name: HellaSwag
split: validation
metrics:
- type: accuracy
value: 67.56871141206932
- task:
type: word-sense-disambiguation
dataset:
type: super_glue
name: WiC
config: wic
split: validation
metrics:
- type: accuracy
value: 51.5987460815047
Official repository: https://github.com/gonglinyuan/metro_t0

# METRO-T0

Paper: [Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers](https://arxiv.org/abs/2305.12567) (ACL 2023)
METRO-T0 is a T5-style text-to-text Transformer pretrained with model-generated pretraining signals and then prompt-finetuned on the family of public NLP tasks proposed in T0. METRO-T0 is highly parameter-efficient: for example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
## Use METRO-T0+-Base++

To use METRO-T0+-Base++ in PyTorch (Python 3.7+, PyTorch 1.12+, and transformers 4.17+ are prerequisites), refer to the code snippet below:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# trust_remote_code=True is required because this checkpoint ships custom modeling code
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)

# Tokenize a single zero-shot prompt and generate the answer greedily
input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: positive
```
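Because METRO-T0+-Base++ is prompt-finetuned, it can also be queried with other natural-language prompts. The sketch below is illustrative only (the prompts and the batching are our own additions, not part of the official evaluation setup); it reuses `model` and `tokenizer` from the snippet above to run several prompts in a single `generate` call:

```python
import torch

# Illustrative zero-shot prompts (not from the official evaluation setup)
prompts = [
    "Is this review positive or negative? Review: the handle snapped after two uses",
    'Suppose "A dog is sleeping on the porch." Can we infer that "An animal is resting"? Yes or no?',
]

# Pad the batch so both prompts share one greedy generate() call
batch = tokenizer(prompts, max_length=512, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        batch.input_ids,
        attention_mask=batch.attention_mask,
        max_new_tokens=256,
        do_sample=False,
    )
for prompt, answer in zip(prompts, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"{prompt!r} -> {answer!r}")
```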
## Other METRO-T0 Models

| Model | # Parameters | Pretraining Data | Prompt-Finetuning Data |
|---|---|---|---|
| METRO-T0-Base | 226M | Wikibook (16G) | T0 Train |
| METRO-T0+-Base | 226M | Wikibook (16G) | T0+ Train |
| METRO-T0++-Base | 226M | Wikibook (16G) | T0++ Train |
| METRO-T0-Base++ | 256M | 160G corpus | T0 Train |
| METRO-T0+-Base++ | 256M | 160G corpus | T0+ Train |
| METRO-T0++-Base++ | 256M | 160G corpus | T0++ Train |
| METRO-T0-Large++ | 775M | 160G corpus | T0 Train |
| METRO-T0+-Large++ | 775M | 160G corpus | T0+ Train |
| METRO-T0++-Large++ | 775M | 160G corpus | T0++ Train |
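The other checkpoints in the table can be loaded the same way as METRO-T0+-Base++. The repository name in the sketch below is an assumption extrapolated from the naming pattern of this model (`metro_t0p_basepp` for METRO-T0+-Base++); check the official repository for the exact identifiers:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repository name for METRO-T0++-Large++, inferred from the pattern
# t0 / t0p / t0pp for T0 / T0+ / T0++ and base / basepp / largepp for the size;
# verify against the official repository before relying on it.
repo = "gonglinyuan/metro_t0pp_largepp"
model = AutoModelForSeq2SeqLM.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
```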
## Citation

If you find the code and models useful for your research, please cite the following paper:

```bibtex
@misc{gong2023modelgenerated,
    title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers},
    author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
    year={2023},
    eprint={2305.12567},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2305.12567}
}
```