m2mtokenizer doesn't know the word "wouldn't"

#2
by anzorq - opened

I accidentally discovered that the tokenizer tokenizes the word "wouldn't" as ['<unk>', "'", 't'].

It doesn't seem to affect the model's performance, but it makes me wonder what else is missing from the tokenizer's vocabulary.
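
For reference, here is a minimal sketch of how this can be reproduced with the transformers library. The checkpoint name facebook/m2m100_418M is an assumption, since the original post doesn't say which checkpoint was used:

```python
# Minimal reproduction sketch.
# Assumption: the facebook/m2m100_418M checkpoint (not stated in the post).
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# tokenize() returns the raw subword pieces without special tokens,
# making it easy to spot <unk> pieces.
print(tokenizer.tokenize("wouldn't"))
# Per the report above, this comes out as ['<unk>', "'", 't']
```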

