flan-t5-large-spelling-peft
This model is an experimental PEFT adapter for google/flan-t5-large, trained on the wiki.en dataset from oliverguhr/spelling.
It achieves the following results on the evaluation set:
- Loss: 0.2537
- Rouge1: 95.8905
- Rouge2: 91.9178
- Rougel: 95.8459
- Rougelsum: 95.8393
- Gen Len: 33.61
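The ROUGE figures above are reported on a 0–100 scale. A minimal sketch of computing such scores with the `evaluate` library (the example strings are placeholders, not card data):

```python
# Sketch: computing ROUGE with the evaluate library (pip install evaluate rouge_score).
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["This restaurant is awesome"],  # model outputs
    references=["This restaurant is awesome"],   # gold corrections
)
# Recent evaluate versions return floats in [0, 1]; scale by 100 to match the card.
print({k: round(v * 100, 4) for k, v in scores.items()})
```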
Model description
This is an experimental model that should be capable of fixing typos and punctuation.
Example usage:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_id = "google/flan-t5-large"
peft_model_id = "jbochi/flan-t5-large-spelling-peft"

# Load the base model and attach the PEFT adapter.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
pipe("Fix spelling: This restuarant is awesome")
# [{'generated_text': 'This restaurant is awesome'}]
```
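Note that `model.load_adapter` relies on the PEFT integration in `transformers`, so the `peft` package must be installed alongside it.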
Intended uses & limitations
Intended for research purposes.
- It may produce artifacts.
- It doesn't seem capable of fixing multiple errors in a single sentence.
- It doesn't support languages other than English.
- It was fine-tuned with a `max_length` of 100 tokens, so longer inputs should be corrected in chunks (see the sketch after this list).
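Given the 100-token limit, longer inputs can be corrected sentence by sentence. A minimal sketch, reusing `pipe` from the usage example above; the regex sentence splitter is a simplifying assumption:

```python
# Hypothetical chunking workaround for the 100-token max_length.
import re

def fix_long_text(pipe, text: str) -> str:
    # Naive split on sentence-ending punctuation (an illustrative assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    fixed = [
        pipe(f"Fix spelling: {s}", max_length=100)[0]["generated_text"]
        for s in sentences
    ]
    return " ".join(fixed)
```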
Training and evaluation data
Data from oliverguhr/spelling, with a "Fix spelling: " prefix added to every example.
During training, the model was evaluated on only the first 100 test examples.
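A sketch of what the prefix preprocessing might look like with the `datasets` library; the config name and column name below are assumptions, since the card doesn't record the dataset schema:

```python
# Hypothetical preprocessing sketch: prepend the "Fix spelling: " prefix.
from datasets import load_dataset

dataset = load_dataset("oliverguhr/spelling", "wiki.en")  # config name assumed

def add_prefix(example):
    # "input" is an assumed column name; adjust to the dataset's actual schema.
    example["input"] = "Fix spelling: " + example["input"]
    return example

dataset = dataset.map(add_prefix)
```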
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
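A minimal sketch of how these settings might map to `Seq2SeqTrainingArguments` plus a PEFT adapter; the LoRA values are assumptions, since the card does not record the adapter configuration:

```python
# Sketch only: mirrors the hyperparameters listed above; LoRA values are assumed.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
lora = LoraConfig(
    task_type="SEQ_2_SEQ_LM", r=8, lora_alpha=32,  # assumed adapter settings
    target_modules=["q", "v"],                     # T5 attention projections
)
model = get_peft_model(base, lora)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-large-spelling-peft",
    learning_rate=1e-3,             # Adam, betas=(0.9, 0.999), epsilon=1e-08
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    predict_with_generate=True,
)
```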
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 0.3359 | 0.05 | 500 | 0.2738 | 95.8385 | 91.6723 | 95.7821 | 95.766 | 33.5 |
| 0.2853 | 0.11 | 1000 | 0.2702 | 95.7124 | 91.5043 | 95.656 | 95.651 | 33.53 |
| 0.2691 | 0.16 | 1500 | 0.2691 | 95.735 | 91.7108 | 95.7039 | 95.7067 | 33.41 |
| 0.2596 | 0.21 | 2000 | 0.2663 | 95.9819 | 92.0897 | 95.9519 | 95.9488 | 33.51 |
| 0.2536 | 0.27 | 2500 | 0.2621 | 95.7519 | 91.5445 | 95.6614 | 95.6622 | 33.49 |
| 0.2472 | 0.32 | 3000 | 0.2626 | 95.7052 | 91.7321 | 95.6476 | 95.6512 | 33.58 |
| 0.2448 | 0.37 | 3500 | 0.2669 | 95.8003 | 91.7949 | 95.7536 | 95.7576 | 33.57 |
| 0.2345 | 0.43 | 4000 | 0.2582 | 95.8784 | 92.008 | 95.8284 | 95.8343 | 33.65 |
| 0.2345 | 0.48 | 4500 | 0.2629 | 95.8131 | 91.9088 | 95.7624 | 95.766 | 33.63 |
| 0.2284 | 0.53 | 5000 | 0.2585 | 95.8552 | 91.9833 | 95.8105 | 95.8135 | 33.62 |
| 0.2266 | 0.59 | 5500 | 0.2591 | 95.9205 | 92.0577 | 95.8689 | 95.8718 | 33.61 |
| 0.2281 | 0.64 | 6000 | 0.2605 | 95.9172 | 91.9782 | 95.874 | 95.8638 | 33.59 |
| 0.2228 | 0.69 | 6500 | 0.2566 | 95.7612 | 91.7858 | 95.7129 | 95.7058 | 33.63 |
| 0.2202 | 0.75 | 7000 | 0.2561 | 95.9468 | 92.0914 | 95.9018 | 95.8941 | 33.64 |
| 0.218 | 0.8 | 7500 | 0.2579 | 95.9468 | 92.0914 | 95.9018 | 95.8941 | 33.64 |
| 0.2162 | 0.85 | 8000 | 0.2523 | 95.8231 | 91.9464 | 95.7727 | 95.7758 | 33.66 |
| 0.2135 | 0.91 | 8500 | 0.2549 | 95.8388 | 91.9804 | 95.7914 | 95.7917 | 33.63 |
| 0.2124 | 0.96 | 9000 | 0.2537 | 95.8905 | 91.9178 | 95.8459 | 95.8393 | 33.61 |
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0