Tokenizer mismatch all the time
#47 opened by tian9
Is it related to this?
The original Llama 3 8B (base) special-token weights are zero, which might cause NaN gradients. This version re-initializes the weights of all the following special tokens to alleviate the problem.
https://huggingface.co/imone/Llama-3-8B-fixed-special-embedding
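For reference, here is a minimal sketch of how one could check whether the special-token embedding rows of the base model are zero and re-initialize them. The model id and the mean-embedding init are assumptions for illustration, not necessarily what the linked repo does.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # hypothetical base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

embeddings = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)

with torch.no_grad():
    # One common heuristic: replace all-zero special-token rows with the
    # mean embedding (the linked repo may use a different init).
    mean_embedding = embeddings.mean(dim=0)
    for token_id in tokenizer.all_special_ids:
        if torch.all(embeddings[token_id] == 0):
            embeddings[token_id].copy_(mean_embedding)
            print(f"re-initialized special token id {token_id}")
```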
I have no idea what your tokenization mismatch is, but make sure that the tokenizer you are using is of the `PreTrainedTokenizerFast` class, not `LlamaTokenizerFast`.
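A quick way to check which class you actually got (the model id here is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(type(tokenizer).__name__)
# Llama 3 ships a PreTrainedTokenizerFast; if this prints "LlamaTokenizerFast",
# something (e.g. a stale tokenizer_config.json) is loading the wrong class.
```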
It should be completely possible otherwise!