README.md · hamishivi/hypertask_T0

metadata

datasets:
  - bigscience/P3
language:
  - en

An 11B T5 model trained on the P3 (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. The model is initialised from the T5 v1.1 lm-adapt checkpoint and fully finetuned.
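
As a rough illustration of the training budget these hyperparameters imply, the sketch below collects them in a small config object and derives the number of training sequences. The class name `TrainingConfig` and its methods are hypothetical and not part of any released training code. It also assumes the batch size counts sequences (not tokens) and that every step uses a full batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    train_steps: int = 20_000
    batch_size: int = 2048          # assumed to count sequences, not tokens
    max_input_length: int = 1024    # tokens
    max_output_length: int = 256    # tokens
    optimizer: str = "adafactor"
    learning_rate: float = 1e-3     # constant, no schedule

    def sequences_seen(self) -> int:
        # Total (input, output) pairs processed, assuming full batches.
        return self.train_steps * self.batch_size

    def max_input_tokens(self) -> int:
        # Upper bound only: real inputs are usually shorter than the maximum.
        return self.sequences_seen() * self.max_input_length


config = TrainingConfig()
print(config.sequences_seen())
print(config.max_input_tokens())
```

Under these assumptions the model sees about 41 million training sequences. The token count is only an upper bound, since most inputs do not fill the maximum length.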

For more details, see "HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation".

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

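| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| T0-11B | 41.0 | 33.6 | 92.4 | 70.1 | 91.5 | 81.0 | 56.1 | 61.1 | 59.9 | 65.2 |
| hypertask_T0_11B (this model) | 46.8 | 34.1 | 98.2 | 81.2 | 96.6 | 84.0 | 52.1 | 62.6 | 64.8 | 68.9 |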
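
The scores above rely on rank classification. A minimal sketch of the idea follows, assuming each answer option has already been scored by the model as a list of per-token log-probabilities. The function names `rank_classify` and `accuracy` are hypothetical. Summing log-probabilities without length normalisation is also an assumption here, not a confirmed detail of this model's evaluation.

```python
from typing import Sequence


def rank_classify(option_token_logprobs: Sequence[Sequence[float]]) -> int:
    """Return the index of the answer option with the highest total log-likelihood.

    Each inner sequence holds the model's log-probabilities for the tokens of
    one candidate answer, conditioned on the prompt.
    """
    scores = [sum(token_logprobs) for token_logprobs in option_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples where the top-ranked option matches the label."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy log-probabilities for two examples with two answer options each.
examples = [
    [[-0.2, -0.1], [-1.5, -0.4]],   # option 0 scores higher
    [[-2.0], [-0.3, -0.3]],         # option 1 scores higher
]
labels = [0, 1]

predictions = [rank_classify(options) for options in examples]
print(predictions, accuracy(predictions, labels))
```

In a real evaluation, the per-token log-probabilities would come from the model's decoder for each candidate answer, and the accuracy would then be averaged across the prompts available for each task.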