FIM mode does not work properly, due to missing stop token
#3 opened by qwp4w3hyb
Details here: https://github.com/ggerganov/llama.cpp/issues/9606
Workaround:
python3 ./gguf-py/scripts/gguf_new_metadata.py --special-token-by-id eot 151643 ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.gguf ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf
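For what it's worth, a quick way to check that the patch took effect is to read the metadata back with the gguf Python package (the same gguf-py library the script above ships with). A minimal sketch, assuming the tokenizer.ggml.eot_token_id metadata key used by llama.cpp:

from gguf import GGUFReader

# open the patched file and look up the EOT special-token field
reader = GGUFReader("./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf")
field = reader.fields.get("tokenizer.ggml.eot_token_id")
if field is None:
    print("eot token id is not set")
else:
    # scalar fields store their value in the part indexed by data[0]
    print("eot token id:", field.parts[field.data[0]][0])  # expect 151643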
Might be cool to apply the workaround to your GGUFs.
If you can tell me how to fix the upstream tokenizer_config.json, I'm happy to do that as well, but I wasn't able to figure it out, as documented in the issue above.
Greetings & thanks for all your hard work quantizing all the models :)
FYI, you can probably hold off on this, as llama.cpp has a workaround in the pipeline: https://github.com/ggerganov/llama.cpp/pull/9609
It might still make sense to point people to that version once it's released, for proper FIM support.
Good catch @qwp4w3hyb, that workaround should be very handy.