Model vocab size is bigger than tokenizer vocab size?
#97 · opened by fahadh4ilyas
The tokenizer vocab size is 50295, but the embedding and LM head size is 51200. Is this intentional?
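For anyone who wants to see the mismatch directly, here is a minimal sketch (assuming the `transformers` library and the `microsoft/phi-2` checkpoint; the sizes in the comments are the ones reported in this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

print(len(tokenizer))                                 # tokenizer length: 50295
print(model.get_input_embeddings().weight.shape[0])   # embedding rows:   51200
print(model.get_output_embeddings().weight.shape[0])  # LM head rows:     51200
```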
This is a good reference: https://huggingface.co/microsoft/phi-2/discussions/22#659d8ba950c1bbee5be6f179
We ended up setting 51200 as the vocabulary size just to accommodate any new tokens that we might need in the future. You can follow @Deepakvictor's answer and it should fix the issue.
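To illustrate the point about room for future tokens (just a sketch, not the fix from the linked answer; the token string below is made up): because the embedding matrix already has 51200 rows, new tokens can be appended to the tokenizer without resizing the model, as long as the total stays at or below 51200.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# "<my_new_token>" is a hypothetical example token.
tokenizer.add_tokens(["<my_new_token>"])

# Still within the 51200 rows the model already allocates, so no resize is needed.
assert len(tokenizer) <= model.config.vocab_size
# Only if len(tokenizer) ever exceeded 51200 would you need:
# model.resize_token_embeddings(len(tokenizer))
```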
As far as I know, no tokens from index 50295 onward should be generated, because those embeddings were never trained. Depending on the generation parameters, however, they could still appear (with low probability).
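If those untrained ids ever do show up when sampling, one workaround (a sketch, not something from this thread) is to forbid them at generation time, for example via `bad_words_ids`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# Forbid every id the tokenizer cannot produce (>= len(tokenizer), i.e. 50295+).
unused_ids = [[i] for i in range(len(tokenizer), model.config.vocab_size)]

inputs = tokenizer("def hello():", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, bad_words_ids=unused_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```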
gugarosa changed discussion status to closed