M2M tokenizer doesn't know the word "wouldn't"
#2
by anzorq - opened
I accidentally discovered that the tokenizer tokenizes the word "wouldn't" as ['<unk>', "'", 't'].
It doesn't seem to affect the model's performance, but it makes me wonder what else is missing from the tokenizer's vocabulary.
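
For reference, here is a minimal sketch of how to reproduce the observation. The checkpoint name (`facebook/m2m100_418M`) is an assumption; any M2M100 checkpoint with the same SentencePiece vocabulary should behave similarly.

```python
# Sketch: inspect how the M2M100 tokenizer handles "wouldn't".
# The checkpoint "facebook/m2m100_418M" is an assumption for illustration.
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="en")

text = "wouldn't"

# Raw SentencePiece pieces produced for the string
print(tokenizer.tokenize(text))

# Round-trip through ids to see which pieces fall back to <unk>
ids = tokenizer(text).input_ids
print(tokenizer.convert_ids_to_tokens(ids))
```

Comparing the raw pieces with the id round-trip shows whether the piece itself is missing from the vocabulary or is mapped to `<unk>` only at the id-conversion step.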