Chat Template is broken?

#2
by Erland - opened

I tried deploying this model using llama-cpp-python, but it won't stop generating and runs through the maximum number of tokens. I'm using chatml for the template since I can't use a custom chat template with llama-cpp. I'm new to llama.cpp, so maybe I'm doing something wrong, since the model outputs correctly when using HuggingFace.
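
For reference, a minimal llama-cpp-python call with the built-in chatml chat format looks roughly like this (the GGUF path, context size, and generation settings below are just placeholders):

```python
from llama_cpp import Llama

# Load the GGUF with llama-cpp-python's built-in ChatML formatter
# (model path and context size are placeholders)
llm = Llama(
    model_path="./model-q8_0.gguf",
    n_ctx=4096,
    chat_format="chatml",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```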

Thank you in advance!

I have no problem with this in LM Studio using ChatML. Which quant are you using? (I've had many issues with llama-cpp-python chat templates before.)

At first I was using the Q8, but when the not-stopping issue happened I also tried the fp16, and it still gives the same error.

I'll try plain llama.cpp and will get back to you.

This Q5 in LM Studio:
[screenshot: Q5 output in LM Studio]

Clearly the quantized models do stop; it's usually the application you use and how it handles the stop token and the chat template.
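
For what it's worth, with llama-cpp-python you can usually work around this by passing the ChatML end-of-turn marker as an explicit stop string, so generation terminates even if the application's template handling doesn't pick up the model's EOS token. A rough sketch (the path is a placeholder, and this assumes the model really emits ChatML tokens):

```python
from llama_cpp import Llama

# Placeholder path; the important part is the explicit stop string
llm = Llama(model_path="./model-q5_k_m.gguf", n_ctx=4096, chat_format="chatml")

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
    # Stop on the ChatML end-of-turn marker even if the EOS token
    # configured in the GGUF isn't handled by the application
    stop=["<|im_end|>"],
)
print(response["choices"][0]["message"]["content"])
```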
