Trained tokenizer file
#5, opened by siba07
Hi, can the trained tokenizer model be published?
I'm not 100% sure what you mean. The tokenizer's vocab.txt is included in the files, but I don't have the original BertWordPieceTokenizer object from training.
If you were to redo it, I would recommend a smaller vocabulary size. There have been a lot of changes since 2020, so I don't think we could rebuild it with the same random state, corpus, and code.
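Even without the training object, a WordPiece vocab.txt is enough to reproduce tokenization. In practice you would likely pass the file to a library loader, for example `BertTokenizerFast(vocab_file="vocab.txt")` from `transformers`. The sketch below is a dependency-free illustration of the idea, not the library's implementation. It assumes one token per line (line index = token id), the `##` continuation prefix, and `[UNK]` as the unknown token. It simplifies BERT's pre-tokenization to lowercasing plus whitespace splitting, with no punctuation splitting or accent stripping. The tiny vocabulary is hypothetical.

```python
import os
import tempfile


def load_vocab(path):
    """Read a BERT-style vocab.txt: one token per line, line index = token id."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


def wordpiece(word, vocab, unk_token="[UNK]", prefix="##", max_chars=100):
    """Greedy longest-match-first WordPiece split of one pre-tokenized word."""
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # continuation pieces carry the "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched, so the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Simplified pre-tokenization (lowercase + whitespace split), then WordPiece."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Demo: write a tiny hypothetical vocabulary to a temporary vocab.txt and reload it.
demo_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "token", "##izer", "##s", "the"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(demo_tokens) + "\n")
    vocab = load_vocab(path)

print(tokenize("The tokenizers", vocab))  # → ['the', 'token', '##izer', '##s']
```

Because WordPiece inference only needs the vocabulary and the matching rule, losing the training object mainly means you cannot retrain with the exact same settings. It does not prevent you from tokenizing text the way the published model expects.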