What are the specifications for the tokenizer? What is the vocabulary size, and how can the tokenizer be used with Hugging Face? Does it allow for special tokens?