---
datasets:
  - bigscience/P3
language:
  - en
---

A 3B-parameter T5 model trained on the P3 dataset (T0 split) for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. The model is initialized from the T5 v1.1 lm-adapt checkpoint and fully fine-tuned (all parameters are updated).
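The training setup above can be collected into a small config object, which also makes the derived quantities explicit. This is a minimal stdlib-only sketch: the class and field names are hypothetical and are not taken from the authors' training code, and the base checkpoint is a descriptive label rather than a hub ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypertaskT0Config:
    """Training setup as described in this model card.

    Field names are hypothetical; they do not come from the authors' code.
    """

    base_checkpoint: str = "T5 v1.1 lm-adapt (3B)"  # descriptive label, not a hub ID
    dataset: str = "bigscience/P3 (T0 split)"
    train_steps: int = 20_000
    batch_size: int = 2048
    max_input_length: int = 1024  # tokens
    max_target_length: int = 256  # tokens
    optimizer: str = "adafactor"
    learning_rate: float = 1e-3
    lr_schedule: str = "constant"
    full_finetune: bool = True  # every parameter is updated

    @property
    def sequences_seen(self) -> int:
        # Total training sequences processed, counting repeats. If examples
        # were packed (not stated in the card), this counts packed rows.
        return self.train_steps * self.batch_size

    @property
    def max_tokens_per_batch(self) -> int:
        # Upper bound on input + target tokens per step; batches hold fewer
        # non-padding tokens when sequences are shorter than the maximum.
        return self.batch_size * (self.max_input_length + self.max_target_length)


if __name__ == "__main__":
    cfg = HypertaskT0Config()
    print(f"{cfg.sequences_seen:,} sequences")  # 40,960,000 sequences
    print(f"{cfg.max_tokens_per_batch:,} tokens per batch at most")  # 2,621,440 tokens per batch at most
```

If you were to reproduce this in PyTorch with Hugging Face `transformers` (an assumption; the card does not name the training framework), a constant learning rate with `transformers.Adafactor` is usually obtained by passing `lr=1e-3, relative_step=False, scale_parameter=False, warmup_init=False`.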

For more details, see the paper "HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation".

Performance on the T0 held-out tasks (accuracy averaged across prompts, scored with rank classification):

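| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| T0-3B | 33.4 | 27.2 | 84.0 | 45.4 | 75.9 | 64.6 | 50.7 | 65.1 | 51.0 | 55.2 |
| hypertask_T0_3B (this model) | 41.7 | 30.1 | 96.9 | 72.7 | 89.1 | 81.2 | 51.7 | 57.2 | 59.2 | 64.4 |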
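Rank classification, the protocol used for the T0 held-out evaluation, does not decode free text. Instead, it scores every answer choice under the model and predicts the most likely one. Accuracy is computed per prompt template and then averaged across templates. The following is a minimal stdlib-only sketch of that bookkeeping. It assumes a hypothetical `score(source, target)` callable that returns the model's log-probability of `target` given `source`. The real evaluation harness, any length normalization, and the data format are not specified in this card.

```python
from statistics import mean
from typing import Callable, Mapping, Sequence, Tuple

# Hypothetical scorer: returns log p(target | source) under the model,
# e.g. the sum of per-token log-probabilities of `target`.
ScoreFn = Callable[[str, str], float]

# One evaluation example: (prompted input, answer choices, index of gold choice).
Example = Tuple[str, Sequence[str], int]


def rank_classify(score: ScoreFn, source: str, choices: Sequence[str]) -> int:
    """Return the index of the answer choice the model finds most likely."""
    scores = [score(source, choice) for choice in choices]
    # max() returns the first maximal index, so ties go to the earlier choice.
    return max(range(len(choices)), key=lambda i: scores[i])


def prompt_accuracy(score: ScoreFn, examples: Sequence[Example]) -> float:
    """Rank-classification accuracy for a single prompt template."""
    correct = sum(
        rank_classify(score, src, choices) == gold for src, choices, gold in examples
    )
    return correct / len(examples)


def task_accuracy(
    score: ScoreFn, examples_by_prompt: Mapping[str, Sequence[Example]]
) -> float:
    """Unweighted mean of per-prompt accuracies for one task."""
    return mean(prompt_accuracy(score, exs) for exs in examples_by_prompt.values())
```

In practice `score` would wrap a forward pass of the seq2seq model with the answer choice as the decoder target. The rest of the logic does not depend on the model, so it can be checked with any toy scorer.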