Trained tokenizer file

#5
by siba07 - opened

Hi, can the trained tokenizer model be published?

I'm not 100% sure what you mean. The tokenizer's vocab.txt is in the repo files, but I don't have the original BertWordPieceTokenizer object from training.
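
For what it's worth, vocab.txt on its own should generally be enough to tokenize, since WordPiece inference is a greedy longest-match lookup against the vocabulary. Here is a minimal sketch of that idea in plain Python. The helper names (`load_vocab`, `wordpiece`) and the toy vocabulary are made up for illustration, and it skips BERT's text cleaning, lowercasing, and whitespace/punctuation splitting, so treat it as an approximation rather than the code used for this model.

```python
def load_vocab(lines):
    """Map each token (one per line, as in a vocab.txt) to its line index."""
    return {tok.rstrip("\n"): i for i, tok in enumerate(lines)}


def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            cand = word[start:end]
            if start > 0:
                cand = prefix + cand  # continuation pieces carry the prefix
            if cand in vocab:
                match = cand
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary standing in for the real vocab.txt.
toy_vocab = load_vocab(["[PAD]", "[UNK]", "token", "##izer", "##s", "un"])
print(wordpiece("tokenizers", toy_vocab))
```

With the real file, you would pass the lines of vocab.txt to `load_vocab` in place of the toy list. In practice it is probably simpler to hand vocab.txt to an existing BERT tokenizer implementation than to roll your own.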

If I were redoing it, I would use a smaller vocabulary size. There have been a lot of changes since 2020, so I don't think we could rebuild it with the same random state, corpus, and code.
