|
--- |
|
datasets: |
|
- multi_nli |
|
- snli |
|
- scitail |
|
metrics: |
|
- accuracy |
|
- f1 |
|
pipeline_tag: zero-shot-classification |
|
language: |
|
- en |
|
model-index: |
|
- name: AntoineBlanot/flan-t5-xxl-classif-3way |
|
results: |
|
- task: |
|
type: nli |
|
name: Natural Language Inference |
|
dataset: |
|
type: multi_nli |
|
name: MultiNLI |
|
split: validation_matched |
|
metrics: |
|
- type: accuracy |
|
value: 0.9230769230769231 |
|
name: Validation (matched) accuracy |
|
- type: f1 |
|
value: 0.9225172687920663 |
|
name: Validation (matched) f1 |
|
|
|
- task: |
|
type: nli |
|
name: Natural Language Inference |
|
dataset: |
|
type: multi_nli |
|
name: MultiNLI |
|
split: validation_mismatched |
|
metrics: |
|
- type: accuracy |
|
value: 0.9222945484133441 |
|
name: Validation (mismatched) accuracy |
|
|
|
- type: f1 |
|
value: 0.9216699467726924 |
|
name: Validation (mismatched) f1 |
|
|
|
- task: |
|
type: nli |
|
name: Natural Language Inference |
|
dataset: |
|
type: snli |
|
name: SNLI |
|
split: validation |
|
metrics: |
|
- type: accuracy |
|
value: 0.9418817313554155 |
|
name: Validation accuracy |
|
|
|
- type: f1 |
|
value: 0.9416213776111287 |
|
name: Validation f1 |
|
|
|
- task: |
|
type: nli |
|
name: Natural Language Inference |
|
dataset: |
|
type: scitail |
|
name: SciTail |
|
split: validation |
|
metrics: |
|
- type: accuracy |
|
value: 0.9662576687116564 |
|
name: Validation accuracy |
|
|
|
- type: f1 |
|
value: 0.6471347983817357 |
|
name: Validation f1 |
|
|
|
--- |
|
# T5ForSequenceClassification |
|
**T5ForSequenceClassification** adapts the original [T5](https://github.com/google-research/text-to-text-transfer-transformer) architecture for sequence classification tasks. |
|
|
|
T5 was originally built for text-to-text tasks and excels at them.

It can handle any NLP task that has been converted to a text-to-text format, including sequence classification!

You can find [here](https://huggingface.co/google/flan-t5-base?text=Premise%3A++At+my+age+you+will+probably+have+learnt+one+lesson.+Hypothesis%3A++It%27s+not+certain+how+many+lessons+you%27ll+learn+by+your+thirties.+Does+the+premise+entail+the+hypothesis%3F) how the original T5 is used for sequence classification.
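For reference, here is a minimal sketch of that text-to-text route with the stock `transformers` library. The prompt mirrors the Hub widget example linked above; `google/flan-t5-base` is used only to keep the example small.

```python
# Text-to-text classification with the original (Flan-)T5: the model literally
# generates the label as text (e.g. "it is not possible to tell").
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = (
    "Premise: At my age you will probably have learnt one lesson. "
    "Hypothesis: It's not certain how many lessons you'll learn by your thirties. "
    "Does the premise entail the hypothesis?"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```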
|
|
|
Our motivation for building **T5ForSequenceClassification** is that the full original T5 architecture is not needed for most NLU tasks. Indeed, NLU tasks generally do not require generating text, so a large decoder is unnecessary.

By removing the decoder we can roughly *halve the original number of parameters* (and thus the computation cost) and *efficiently optimize* the network for the given task.
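A quick way to sanity-check that parameter split, here on the much smaller `google/flan-t5-base` purely for illustration:

```python
# Compare the parameter count of the full encoder-decoder T5 with the encoder
# alone (the encoder figure includes the shared token embeddings).
from transformers import T5Model

t5 = T5Model.from_pretrained("google/flan-t5-base")
total = sum(p.numel() for p in t5.parameters())
encoder_only = sum(p.numel() for p in t5.encoder.parameters())
print(f"full model: {total / 1e6:.0f}M parameters, encoder only: {encoder_only / 1e6:.0f}M")
```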
|
|
|
## Table of Contents |
|
|
|
0. [Usage](#usage) |
|
1. [Why use T5ForSequenceClassification?](#why-use-t5forsequenceclassification) |
|
2. [T5ForClassification vs T5](#t5forclassification-vs-t5) |
|
3. [Results](#results) |
|
|
|
## Usage |
|
**T5ForSequenceClassification** supports the task of zero-shot classification. |
|
It can be used directly for:
|
- topic classification |
|
- intent recognition |
|
- boolean question answering |
|
- sentiment analysis |
|
- and any other task whose goal is to classify a text...
|
|
|
Since the *T5ForClassification* class is currently not supported by the transformers library, you cannot directly use this model on the Hub.
|
To use **T5ForSequenceClassification**, you will have to install additional packages and model weights. |
|
You can find instructions [here](https://github.com/AntoineBlanot/zero-nlp). |
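The zero-shot recipe itself is the standard NLI one: each candidate label is turned into a hypothesis, the input text is used as the premise, and the label whose hypothesis gets the highest entailment score wins. A minimal sketch, where `nli_entailment_prob` is a hypothetical placeholder for whatever inference function the zero-nlp instructions leave you with:

```python
from typing import Callable, List

def zero_shot_classify(
    text: str,
    candidate_labels: List[str],
    nli_entailment_prob: Callable[[str, str], float],  # hypothetical: P(entailment | premise, hypothesis)
    hypothesis_template: str = "This example is about {}.",
) -> str:
    # Score each candidate label by how strongly the model thinks the text
    # (premise) entails the label hypothesis, then pick the best one.
    scores = {
        label: nli_entailment_prob(text, hypothesis_template.format(label))
        for label in candidate_labels
    }
    return max(scores, key=scores.get)

# Example call (once you have a concrete nli_entailment_prob):
# zero_shot_classify("I loved this movie!", ["positive", "negative"], nli_entailment_prob)
```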
|
|
|
|
|
## Why use T5ForSequenceClassification? |
|
Models based on the [BERT](https://huggingface.co/bert-large-uncased) architecture like [RoBERTa](https://huggingface.co/roberta-large) and [DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge) have shown very strong performance on sequence classification tasks and are still widely used today.

However, those models only scale up to ~1.5B parameters (DeBERTa xxlarge), resulting in limited knowledge compared to bigger models.

On the other hand, models based on the T5 architecture scale up to ~11B parameters (t5-xxl), and innovations with this architecture are very recent and keep coming ([mT5](https://huggingface.co/google/mt5-xxl), [Flan-T5](https://huggingface.co/google/flan-t5-xxl), [UL2](https://huggingface.co/google/ul2), [Flan-UL2](https://huggingface.co/google/flan-ul2), and probably more...).
|
|
|
## T5ForClassification vs T5 |
|
**T5ForClassification** Architecture (a rough code sketch follows the list):
|
- Encoder: same as original T5 |
|
- Decoder: only the first layer (for pooling purposes)
|
- Classification head: simple Linear layer on top of the decoder |
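For concreteness, here is a rough PyTorch sketch of that layout, assembled from the stock `transformers` T5 modules. It only illustrates the encoder + single-decoder-layer + linear-head idea; the released checkpoint's exact pooling and head details may differ, and `num_labels=3` simply matches the 3-way setup in the model name.

```python
import torch
import torch.nn as nn
from transformers import T5Model

class T5ForClassificationSketch(nn.Module):
    def __init__(self, name: str = "google/flan-t5-base", num_labels: int = 3):
        super().__init__()
        t5 = T5Model.from_pretrained(name)
        self.encoder = t5.encoder                    # full T5 encoder, unchanged
        self.decoder = t5.decoder
        self.decoder.block = self.decoder.block[:1]  # keep only the first decoder layer
        self.head = nn.Linear(t5.config.d_model, num_labels)
        self.start_id = t5.config.decoder_start_token_id

    def forward(self, input_ids, attention_mask=None):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # A single decoder step over the encoder states acts as a learned pooler.
        dec_in = torch.full(
            (input_ids.size(0), 1), self.start_id,
            dtype=torch.long, device=input_ids.device,
        )
        dec = self.decoder(
            input_ids=dec_in,
            encoder_hidden_states=enc.last_hidden_state,
            encoder_attention_mask=attention_mask,
        )
        return self.head(dec.last_hidden_state[:, 0])  # class logits
```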
|
|
|
Benefits and Drawbacks: |
|
- (**+**) Keeps T5 encoding strength |
|
- (**+**) Parameter count is roughly halved
|
- (**+**) Interpretable outputs (class logits) |
|
- (**+**) No generation mistakes and faster prediction (no generation latency) |
|
- (**-**) Loses the text-to-text ability
|
|
|
## Results |
|
Results on the validation data of **training tasks**: |
|
| Dataset | Accuracy | F1 |
|:-------:|:--------:|:--:|
| MNLI (m) | 0.923 | 0.923 |
| MNLI (mm) | 0.922 | 0.922 |
| SNLI | 0.942 | 0.942 |
| SciTail | 0.966 | 0.647 |
|
|
|
Results on the validation data of **unseen tasks** (zero-shot):

| Dataset | Accuracy | F1 |
|:-------:|:--------:|:--:|
| ? | ? | ? |
|
|
|
Special thanks to [philschmid](https://huggingface.co/philschmid) for making a Flan-T5-xxl [checkpoint](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) in fp16. |
|
|