hamishivi/hypertask_T0_3B


---
datasets:
- bigscience/P3
language:
- en
---

A 3B-parameter T5 model trained on the [P3](https://huggingface.co/datasets/bigscience/P3) (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001.
The model is initialised from the [T5 v1.1 lm-adapt checkpoint](https://huggingface.co/google/t5-xl-lm-adapt) and fully finetuned.
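
To give a sense of the training budget these hyperparameters imply, here is a small arithmetic sketch. It assumes the batch size counts examples (sequences) rather than tokens, which the card does not state explicitly, and the token figure is only an upper bound, since most examples will not fill both length limits. The function names are illustrative.

```python
# Hyperparameters as stated in the model card.
TRAIN_STEPS = 20_000
BATCH_SIZE = 2048        # assumed to count examples, not tokens
MAX_INPUT_LEN = 1024
MAX_OUTPUT_LEN = 256


def examples_seen(steps: int, batch_size: int) -> int:
    """Number of training examples processed (with repetition, if any)."""
    return steps * batch_size


def max_tokens_seen(steps: int, batch_size: int, max_in: int, max_out: int) -> int:
    """Upper bound on tokens processed, assuming every example is full length."""
    return examples_seen(steps, batch_size) * (max_in + max_out)


total_examples = examples_seen(TRAIN_STEPS, BATCH_SIZE)
token_upper_bound = max_tokens_seen(
    TRAIN_STEPS, BATCH_SIZE, MAX_INPUT_LEN, MAX_OUTPUT_LEN
)

print(f"examples seen:     {total_examples:,}")      # 40,960,000
print(f"token upper bound: {token_upper_bound:,}")   # 52,428,800,000
```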

For more details, see [HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation](https://arxiv.org/abs/2212.10315).

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| [T0-3B](https://huggingface.co/bigscience/T0_3B) | 33.4 | 27.2 | 84.0 | 45.4 | 75.9 | 64.6 | 50.7 | 65.1 | 51.0 | 55.2 |
| hypertask_T0_3B (this model) | 41.7 | 30.1 | 96.9 | 72.7 | 89.1 | 81.2 | 51.7 | 57.2 | 59.2 | 64.4 |
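
For readers unfamiliar with the metric, the following is a minimal sketch of rank classification averaged across prompts. It is not the evaluation code used for the numbers above. It assumes each answer choice is scored by a log-likelihood under the model and that the highest-scoring choice is the prediction; whether scores are length-normalised is not stated here, so no normalisation is applied. The scorer is a hypothetical stand-in backed by a lookup table so that the example runs without a model.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# (prompt, candidate answer) -> score, e.g. a log-likelihood from a model.
Scorer = Callable[[str, str], float]
Example = Tuple[str, Sequence[str], int]  # (prompt, choices, gold index)


def rank_classify(prompt: str, choices: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring choice (first one on ties)."""
    scores = [score(prompt, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def accuracy(examples: Sequence[Example], score: Scorer) -> float:
    """Fraction of examples whose top-ranked choice is the gold answer."""
    return mean(
        1.0 if rank_classify(prompt, choices, score) == gold else 0.0
        for prompt, choices, gold in examples
    )


def average_over_prompts(per_prompt: Sequence[Sequence[Example]], score: Scorer) -> float:
    """Average the per-template accuracies, one example set per prompt template."""
    return mean(accuracy(examples, score) for examples in per_prompt)


# Hypothetical scores standing in for model log-likelihoods.
TOY_SCORES = {
    ("t1: ex1", "yes"): -0.2, ("t1: ex1", "no"): -1.5,
    ("t1: ex2", "yes"): -0.4, ("t1: ex2", "no"): -0.9,
    ("t2: ex1", "true"): -0.1, ("t2: ex1", "false"): -2.0,
    ("t2: ex2", "true"): -1.2, ("t2: ex2", "false"): -0.3,
}


def toy_score(prompt: str, candidate: str) -> float:
    return TOY_SCORES[(prompt, candidate)]


# The same two examples rendered with two different prompt templates.
template_1: List[Example] = [("t1: ex1", ["yes", "no"], 0), ("t1: ex2", ["yes", "no"], 1)]
template_2: List[Example] = [("t2: ex1", ["true", "false"], 0), ("t2: ex2", ["true", "false"], 1)]

result = average_over_prompts([template_1, template_2], toy_score)
print(result)  # 0.75: template 1 scores 0.5, template 2 scores 1.0
```

With a real model, `toy_score` would be replaced by a function that returns the model's log-likelihood of the candidate given the prompt.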