Chat Template is broken?

#2
by Erland - opened

I tried deploying this model using llama-cpp-python, but it won't stop generating and runs through the maximum number of tokens. I'm using chatml for the template since I can't use a custom chat template with llama-cpp. I'm new to llama.cpp, so maybe I'm doing something wrong, since the model outputs correctly when using HuggingFace.
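
For reference, a minimal llama-cpp-python call with the built-in chatml chat format looks roughly like this (the GGUF path, context size, and generation settings below are just placeholders):

```python
from llama_cpp import Llama

# Load the GGUF with llama-cpp-python's built-in ChatML formatter
# (model path and context size are placeholders)
llm = Llama(
    model_path="./model-q8_0.gguf",
    n_ctx=4096,
    chat_format="chatml",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```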

Thank you in advance!

I have no problem with this in LM Studio using ChatML. Which quant are you using? (I've had many issues with llama-cpp-python chat templates before.)

At first I was using the Q8, but when the not-stopping issue happened I also tried the fp16, and it still gives the same error.

I'll try plain llama.cpp and will get back to you.

This Q5 in LM Studio:
[screenshot: Q5 output in LM Studio]

Clearly the quantized models do stop; it's usually the application you use and how it handles the stop token and the chat template.
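
For what it's worth, with llama-cpp-python you can usually work around this by passing the ChatML end-of-turn marker as an explicit stop string, so generation terminates even if the application's template handling doesn't pick up the model's EOS token. A rough sketch (the path is a placeholder, and this assumes the model really emits ChatML tokens):

```python
from llama_cpp import Llama

# Placeholder path; the important part is the explicit stop string
llm = Llama(model_path="./model-q5_k_m.gguf", n_ctx=4096, chat_format="chatml")

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
    # Stop on the ChatML end-of-turn marker even if the EOS token
    # configured in the GGUF isn't handled by the application
    stop=["<|im_end|>"],
)
print(response["choices"][0]["message"]["content"])
```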
