---
datasets:
  - bigscience/P3
language:
  - en
---

# hamishivi/hypertask_T0

A 3B-parameter T5 model trained on the P3 dataset (T0 split) for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024 tokens, a maximum output sequence length of 256 tokens, and the Adafactor optimizer with a constant learning rate of 0.001. Training starts from the T5 v1.1 LM-adapted checkpoint, and the model is fully fine-tuned (all parameters are updated).
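The training setup above can be summarised as a plain configuration object. This is a minimal sketch: the class, field, and method names are hypothetical (they are not taken from the HINT training code), and the helpers only do arithmetic on the stated values. For example, assuming the batch size counts sequences, 20,000 steps × 2048 sequences per step = 40,960,000 training sequences seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters stated in this model card (names are illustrative)."""
    base_checkpoint: str = "T5 v1.1 lm-adapt (3B)"
    dataset: str = "bigscience/P3 (T0 split)"
    train_steps: int = 20_000
    batch_size: int = 2048       # assumed to count sequences per step
    max_input_len: int = 1024    # tokens
    max_output_len: int = 256    # tokens
    optimizer: str = "Adafactor"
    learning_rate: float = 1e-3  # constant: no warmup or decay
    full_finetune: bool = True   # all parameters are updated

    def sequences_seen(self) -> int:
        # One batch of `batch_size` sequences is consumed per step.
        return self.train_steps * self.batch_size

    def max_input_tokens_per_batch(self) -> int:
        # Upper bound, reached only if every input fills max_input_len.
        return self.batch_size * self.max_input_len

    def learning_rate_at(self, step: int) -> float:
        # Constant schedule: the same learning rate at every step.
        return self.learning_rate


cfg = FinetuneConfig()
print(cfg.sequences_seen())              # 40960000
print(cfg.max_input_tokens_per_batch())  # 2097152
```

A frozen dataclass keeps the stated values in one place and prevents them from being mutated accidentally during an experiment.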

For more details, see the paper *HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation*.

Performance on the T0 held-out evaluation tasks (accuracy in %, averaged across prompt templates; predictions are made with rank classification):

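| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| T0-3B | 33.4 | 27.2 | 84.0 | 45.4 | 75.9 | 64.6 | 50.7 | 65.1 | 51.0 | 55.2 |
| hypertask_T0_3B (this model) | 41.7 | 30.1 | 96.9 | 72.7 | 89.1 | 81.2 | 51.7 | 57.2 | 59.2 | 64.4 |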
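Rank classification scores every answer choice with the model and picks the most likely one, so no free-form generation has to be parsed. The sketch below assumes the per-token log-probabilities of each answer choice (conditioned on the prompted input) have already been computed, for example with a forward pass of this model. It scores a choice by its summed log-probability, which is an assumption (some evaluations length-normalise instead), and the function names are hypothetical rather than taken from the HINT or T0 evaluation code. A task score, as reported in the table above, is then the accuracy averaged over prompt templates.

```python
from statistics import mean
from typing import Sequence

# choice_logprobs[i] = per-token log-probs of answer choice i for one example
Example = Sequence[Sequence[float]]


def rank_classify(choice_logprobs: Example) -> int:
    """Index of the answer choice with the highest total log-likelihood."""
    scores = [sum(tokens) for tokens in choice_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)


def prompt_accuracy(examples: Sequence[Example], labels: Sequence[int]) -> float:
    """Rank-classification accuracy for a single prompt template."""
    correct = sum(rank_classify(ex) == y for ex, y in zip(examples, labels))
    return correct / len(labels)


def task_score(per_prompt_results) -> float:
    """Accuracy (%) averaged across all prompt templates for one task."""
    return 100 * mean(prompt_accuracy(ex, y) for ex, y in per_prompt_results)


# Toy data: two prompt templates, two examples each, two choices per example.
prompt_a = ([[[-0.2, -0.1], [-1.5, -0.9]],  # choice 0 wins, label 0: correct
             [[-2.0], [-0.3]]],             # choice 1 wins, label 1: correct
            [0, 1])
prompt_b = ([[[-0.4], [-0.2]],              # choice 1 wins, label 0: wrong
             [[-0.1], [-3.0]]],             # choice 0 wins, label 0: correct
            [0, 0])
print(task_score([prompt_a, prompt_b]))  # 75.0
```

Averaging over prompts first, rather than pooling all examples, gives each prompt template equal weight regardless of how many examples it covers.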