Upload tokenizer.json #2
by jonatanklosko
The persisted tokenizer.json does not have the template processor for adding special tokens. transformers overrides the processor on load, but when loading tokenizer.json directly with the Rust tokenizers library it is nice to have the processor there already (this has worked so far for other models). This PR simply re-saves the tokenizer so that the file matches exactly what transformers loads.
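For reference, the missing processor can be observed by loading the current tokenizer.json directly with the tokenizers library. The sketch below is illustrative only (the file path and sample text are placeholders):

from tokenizers import Tokenizer

# Load the standalone tokenizer.json with the Rust-backed tokenizers library.
tok = Tokenizer.from_file("tokenizer.json")  # placeholder path

enc = tok.encode("def add(a, b):")
print(tok.post_processor)  # expected to lack the template for special tokens
print(enc.tokens)          # hence no BOS token is prepended here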
Generated with:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
assert tokenizer.is_fast  # the fast (Rust-backed) tokenizer serializes tokenizer.json
tokenizer.save_pretrained("...")
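As a sanity check (not part of the PR itself), the re-saved file can be compared against what transformers produces; assuming the template processor is now serialized, both should yield identical ids, BOS token included. The path and sample text below are placeholders:

from tokenizers import Tokenizer
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
rust_tokenizer = Tokenizer.from_file("tokenizer.json")  # placeholder: the re-saved file

text = "def add(a, b):"
assert rust_tokenizer.encode(text).ids == hf_tokenizer(text)["input_ids"]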