Chat Template is broken?
#2 by Erland - opened
I tried deploying this model with llama-cpp-python, but it won't stop generating and runs through the maximum token limit. I'm using chatml
for the template, since I can't pass a custom chat template to llama-cpp-python. I'm new to llama.cpp,
so maybe I'm doing something wrong, because the model outputs correctly with HuggingFace.
Thank you in advance!
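For context, this is roughly my setup, a minimal sketch assuming the model follows standard ChatML with `<|im_end|>` as the end-of-turn token (the model path and context size are placeholders). Passing the stop string explicitly is one thing I've been checking:

```python
from llama_cpp import Llama

# Load the GGUF and use llama-cpp-python's built-in ChatML formatter.
# (Path is a placeholder; adjust n_ctx to the model's context size.)
llm = Llama(
    model_path="./model-q8_0.gguf",
    chat_format="chatml",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=256,
    # Passing the ChatML end-of-turn marker explicitly as a stop string
    # can work around generation that never stops on its own.
    stop=["<|im_end|>"],
)
print(response["choices"][0]["message"]["content"])
```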
I have no problem with this model in LMStudio with ChatML. Which quant are you using? (I've had many issues with llama-cpp-python chat templates before.)
At first I was using the Q8, but the non-stopping error happened, so I tried the fp16 and it still gives the same error.
I'll try plain llama.cpp and get back to you.
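One quick check in the meantime, in case it helps: the EOS token baked into the GGUF may not be `<|im_end|>`, which would explain the runaway generation. A small diagnostic sketch (model path is a placeholder):

```python
from llama_cpp import Llama

# Load only the vocabulary so the check is cheap.
# (Model path is a placeholder.)
llm = Llama(model_path="./model-q8_0.gguf", vocab_only=True)

# Print the end-of-sequence token the GGUF declares; if it is not
# <|im_end|>, ChatML generations will never emit the expected stop token.
eos_id = llm.token_eos()
print("EOS id:", eos_id)
print("EOS text:", llm.detokenize([eos_id]).decode("utf-8", errors="replace"))
```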