Question about tokenizer
#3 by freQuensy23
I've tried to use your model and can't understand some behaviour of its tokenizer: tokenizer('1')
returns [1, 29871, 29896].
1 is the BOS token and 29896 is the '1' token, but what does 29871 mean?
When I decode it back into a string, I get tokenizer.decode([29871]) == ''
(an empty string).
Can you explain the purpose of adding an empty string to the tokenizer's vocab?
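A possible explanation, assuming this is a LLaMA-style SentencePiece tokenizer with the "dummy prefix" option enabled (not confirmed for this model): SentencePiece marks spaces with the piece "▁" (U+2581) and prepends one to the input, so '1' becomes '▁1'. If '▁1' is not in the vocab, the text splits into '▁' plus '1', and '▁' decodes to a space that the decoder strips again, which would look like ''. The toy sketch below reproduces the reported ids with a hypothetical three-entry vocab and greedy longest-match segmentation. It is an illustration, not the real tokenizer, which uses BPE merges over a much larger vocab.

```python
# Toy model of a SentencePiece-style tokenizer. The vocab is HYPOTHETICAL:
# only the ids observed in the question are reused, nothing else is real.
SPACE = "\u2581"  # "▁", SentencePiece's word-boundary (space) marker

VOCAB = {"<s>": 1, SPACE: 29871, "1": 29896}
ID_TO_PIECE = {i: p for p, i in VOCAB.items()}
BOS_ID = VOCAB["<s>"]


def encode(text: str, add_bos: bool = True) -> list[int]:
    # Assumption: a "dummy prefix" space is prepended and spaces become "▁".
    normalized = SPACE + text.replace(" ", SPACE)
    ids = [BOS_ID] if add_bos else []
    # Greedy longest-match over the toy vocab (real SentencePiece uses BPE/unigram).
    i = 0
    while i < len(normalized):
        for j in range(len(normalized), i, -1):
            piece = normalized[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocab piece covers {normalized[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    # Drop BOS, join pieces and turn "▁" back into spaces.
    text = "".join(ID_TO_PIECE[i] for i in ids if i != BOS_ID).replace(SPACE, " ")
    # The dummy-prefix space is stripped on decode, so a lone "▁" decodes to "".
    return text[1:] if text.startswith(" ") else text


print(encode("1"))            # [1, 29871, 29896]
print(repr(decode([29871])))  # ''
print(decode([29871, 29896])) # 1
```

Under these assumptions 29871 is not an "empty string" token but a bare space marker whose only character is removed by the decoder.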