AssertionError: Non-consecutive added token 'θ' found. Should have index 21170 but has index 21128 in saved vocabulary.
#1 opened by HokyeeJau
There seems to be something wrong with the indexes of the added Chinese character tokens.
When I first loaded the tokenizer, the index appeared to be wrong (see the attached screenshot).
Following the hint in the error message, I changed the index in the added_tokens.json file, but another error of the same kind came up for a different token.
Is there anything I can do to avoid this kind of error?
Thank you.
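
In case it helps others hitting the same assertion: the error means the indices in added_tokens.json are not consecutive starting from the end of the base vocabulary, so fixing them one by one just surfaces the next mismatch. Below is a minimal sketch of re-indexing all added tokens in one pass. It assumes the slow-tokenizer file layout with a vocab.json next to added_tokens.json; the file names and paths are assumptions, so adjust them to your model directory, and back up the original file first.

```python
import json

# Hypothetical paths; adjust to your model directory.
VOCAB_PATH = "vocab.json"              # base vocabulary: token -> index
ADDED_TOKENS_PATH = "added_tokens.json"  # added tokens: token -> index

with open(VOCAB_PATH, encoding="utf-8") as f:
    vocab = json.load(f)

with open(ADDED_TOKENS_PATH, encoding="utf-8") as f:
    added = json.load(f)

# Re-assign consecutive indices starting right after the base vocabulary,
# preserving the original relative order of the added tokens.
base_size = len(vocab)
reindexed = {
    token: base_size + i
    for i, (token, _) in enumerate(sorted(added.items(), key=lambda kv: kv[1]))
}

with open(ADDED_TOKENS_PATH, "w", encoding="utf-8") as f:
    json.dump(reindexed, f, ensure_ascii=False, indent=2)
```

After rewriting the file, reloading the tokenizer should pass the consecutiveness check, since every added token now follows the base vocabulary without gaps.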