Unbabel/gec-t5_small · Hugging Face

This model is an implementation of the paper A Simple Recipe for Multilingual Grammatical Error Correction from Google where they report the State of the art score in the task of Grammatical Error Correction (GEC). We implement the version with the T5-small with the reported F_0.5 score in the paper (60.70).

To effectively use the "Hosted inference API", write "gec: [YOUR SENTENCE HERE]".

In order to use the model, look at the following snippet:

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("Unbabel/gec-t5_small")
tokenizer = T5Tokenizer.from_pretrained('t5-small')

sentence = "I like to swimming"
tokenized_sentence = tokenizer('gec: ' + sentence, max_length=128, truncation=True, padding='max_length', return_tensors='pt')
corrected_sentence = tokenizer.decode(
    model.generate(
        input_ids = tokenized_sentence.input_ids,
        attention_mask = tokenized_sentence.attention_mask, 
        max_length=128,
        num_beams=5,
        early_stopping=True,
    )[0],
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True
)
print(corrected_sentence) # -> I like swimming.

Unbabel
/

gec-t5_small

Model tree for Unbabel/gec-t5_small

Spaces using Unbabel/gec-t5_small 2