FIM mode does not work properly due to a missing stop token

#3 opened by qwp4w3hyb

Details here: https://github.com/ggerganov/llama.cpp/issues/9606
Workaround:

python3 ./gguf-py/scripts/gguf_new_metadata.py --special-token-by-id eot 151643 ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.gguf ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf
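If you want to double-check that the patch took, here's a minimal sketch that reads the metadata back with the gguf Python package from llama.cpp's gguf-py directory (this assumes the package is installed, e.g. via pip install gguf or pip install -e ./gguf-py; the model path is the one from the command above):

```python
# Read the patched file back and confirm the EOT token id is set.
from gguf import GGUFReader

reader = GGUFReader("./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf")
field = reader.get_field("tokenizer.ggml.eot_token_id")
if field is None:
    print("tokenizer.ggml.eot_token_id is not set")
else:
    # field.data holds the index of the part containing the actual value.
    print("eot token id:", int(field.parts[field.data[-1]][0]))  # expect 151643
```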

Might be cool to apply the workaround to your GGUFs.

If you can tell me how to fix the upstream tokenizer_config.json, I'm happy to do that too, but I was unable to figure it out, as documented in the issue above.
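For what it's worth, a quick sanity check of the token id itself (a sketch assuming transformers is installed and the upstream repo id is Qwen/Qwen2.5-Coder-7B-Instruct):

```python
# Confirm which id <|endoftext|> maps to in the upstream tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
print(tok.convert_tokens_to_ids("<|endoftext|>"))  # expect 151643
```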

Greetings & thanks for all your hard work quantizing all the models :)

FYI, you can probably hold off on this, as llama.cpp has a workaround in the pipeline: https://github.com/ggerganov/llama.cpp/pull/9609

It might still make sense to point people to that version once it's released, for proper FIM support.
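Once you're on a fixed build and model, here's a rough sketch of how one might exercise FIM against llama.cpp's server /infill endpoint (assumes llama-server is already running locally on port 8080 with the model loaded, and that the requests package is installed; the prefix/suffix strings are just placeholder examples):

```python
# Send a fill-in-the-middle request to a locally running llama-server.
import requests

resp = requests.post(
    "http://localhost:8080/infill",
    json={
        "input_prefix": "def add(a, b):\n    result = ",
        "input_suffix": "\n    return result\n",
        "n_predict": 32,
    },
)
resp.raise_for_status()
print(resp.json()["content"])
```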

Good catch @qwp4w3hyb, that workaround should be very handy.
