Trained tokenizer file

by siba07 - opened Apr 27

Discussion

siba07

Apr 27

Hi, can the trained tokenizer model be published?

monsoon-nlp

Owner May 1

I'm not 100% sure what you mean. The tokenizer's vocab.txt is in the repo files, but I don't have the original BertWordPieceTokenizer object from training.
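
For what it's worth, vocab.txt alone should be enough to tokenize text, since WordPiece inference is a greedy longest-match lookup against the vocabulary. Here is a minimal sketch in plain Python, assuming a BERT-style vocab with the `##` continuation prefix. The tiny demo vocabulary and the helper names `load_vocab` and `wordpiece` are made up for illustration and are not taken from this repo:

```python
# Sketch only: WordPiece inference needs the vocabulary, not the trainer object.
# demo_lines is a made-up stand-in for the lines of a real vocab.txt
# (one token per line, line index = token id).

def load_vocab(lines):
    """Map each token to its line index, as in a BERT-style vocab.txt."""
    return {tok.rstrip("\n"): i for i, tok in enumerate(lines)}


def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of a single, already normalized word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            cand = word[start:end]
            if start > 0:
                cand = prefix + cand  # non-initial pieces carry the ## prefix
            if cand in vocab:
                match = cand
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


demo_lines = ["[PAD]", "[UNK]", "token", "##izer", "##s", "vocab"]
vocab = load_vocab(demo_lines)
print(wordpiece("tokenizers", vocab))  # ['token', '##izer', '##s']
print(wordpiece("xyz", vocab))         # ['[UNK]']
```

In practice a library tokenizer, for example `BertTokenizer` in transformers (which takes a `vocab_file` argument), would handle normalization and pre-tokenization on top of this lookup.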

If you were redoing it, I would suggest using a smaller vocabulary size. There have been a lot of changes since 2020, so I don't think we could rebuild it with the same random state, corpus, and code.
