fix vocab size
Have you tested this? The model's weights have a 32128 embedding dim, so I feel like this would break, no?
No, I didn't test this, and according to the docs you could be right; see here.
Does it work with vLLM for you? See also the example config.json from OpenOrca for comparison. Probably related to resize_token_embeddings_to_32x (but why is it not 32032 then?).
And it seems to be an issue elsewhere too, e.g. here: https://github.com/huggingface/transformers/issues/4875
I have no idea what the right solution is, or whether this is more a bug in vLLM; probably it would work to resize the token embeddings again after training (model.resize_token_embeddings(embeddings_len)) so that the usable vocab size and the embeddings match?
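For reference, a minimal sketch of that resize-back idea, assuming a standard transformers checkpoint where the tokenizer holds the real vocab size (paths below are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; substitute the actual checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")

# Shrink the padded 32128-row embedding matrix back down to the
# tokenizer's real vocab size (32000); the extra rows are dropped and
# config.vocab_size should be updated to match by the call.
model.resize_token_embeddings(len(tokenizer))

model.save_pretrained("path/to/checkpoint-resized")
tokenizer.save_pretrained("path/to/checkpoint-resized")
```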
Feel free to close; I just wanted to make you aware of this issue :).
I think the real solution is to (1) raise an issue with vLLM and hope they fix it, or (2) add dummy tokens to the tokenizer. I resized the embeddings to a multiple of 128 since that is apparently the most efficient on H100+ GPUs. Your idea of resizing back down might also be a good and easy solution; I don't think the speed loss should be too great.
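For completeness, a rough sketch of option (2), padding the tokenizer with throwaway tokens until it matches the 32128-row embedding matrix (the token names and paths here are made up; this just assumes the usual transformers/tokenizers API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")

# Number of rows in the (padded) input embedding matrix, e.g. 32128.
embedding_rows = model.get_input_embeddings().weight.shape[0]

# Add dummy tokens so len(tokenizer) == embedding_rows; the names are
# arbitrary, they just need to be unique and never appear in real text.
missing = embedding_rows - len(tokenizer)
if missing > 0:
    tokenizer.add_tokens([f"<unused_{i}>" for i in range(missing)])

tokenizer.save_pretrained("path/to/checkpoint-padded")
```

If I recall correctly, newer transformers releases also expose a pad_to_multiple_of argument, i.e. model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=128), which gets the multiple-of-128 padding directly without touching the tokenizer.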
I am trying to convert the model to GGUF, and llama.cpp complains about a vocab size mismatch (model has 32128, but tokenizer.model has 32000).
(I removed everything from added_tokens.json.) I can of course "fix" the vocab_size in the config, but that eventually leads to an error when loading the model: 'token_embd.weight' has wrong shape; expected 4096, 32000, got 4096, 32128.
Any ideas?