Wrong chat template

#1
by amgadhasan - opened

When I run the model in llama.cpp with llama-cli -m Lite-Mistral-150M-v2-Instruct-Q8_0.gguf -p "You are a helpful assistant." -cnv, it sets the chat template to ChatML:

main: chat template example: <|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant


system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | 
main: interactive mode on.
sampling: 
    repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 2048, n_batch = 2048, n_predict = -1, n_keep = 1


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 <|im_start|>system
You are a helpful assistant.<|im_end|>

> 

Is there a problem with the GGUF?
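
For reference, the ChatML layout that llama-cli printed in the "chat template example" above can be reproduced with a few lines of Python. This is only a sketch of the format as it appears in that log; the function name format_chatml and its add_generation_prompt parameter are made up for illustration and are not part of llama.cpp.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Same conversation as the example printed in the log above.
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
    {"role": "user", "content": "How are you?"},
]
print(format_chatml(messages))
```

If the model was trained with a different layout, prompts wrapped this way will not match what it saw during training, which is why the detected template matters.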

I created a PR to add support for this model's chat template:
https://github.com/ggerganov/llama.cpp/pull/8522

Odd, I would expect it to pick up the template from the model which I think is correctly stored?

I believe it's stored correctly; at least Hugging Face's GGUF viewer shows the correct prompt.
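
One way to double-check what is stored, independent of any viewer, is to read the metadata straight from the file. The sketch below is a minimal standard-library reader that assumes a little-endian GGUF file of version 2 or later and looks up the tokenizer.chat_template key; the helper names (read_chat_template and the underscore-prefixed functions) are hypothetical, and this is not how llama.cpp itself loads the file.

```python
import struct

# Scalar GGUF value types mapped to little-endian struct format codes.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format string."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, value_type):
    if value_type in _SCALARS:
        return _read(f, _SCALARS[value_type])
    if value_type == _STRING:
        return _read_string(f)
    if value_type == _ARRAY:
        # Arrays store the element type, then the count, then the elements.
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_chat_template(path):
    """Return the stored chat template string, or None if the key is absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("this sketch assumes GGUF version 2 or later")
        _read(f, "<Q")             # tensor count, not needed here
        kv_count = _read(f, "<Q")  # number of metadata key/value pairs
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "tokenizer.chat_template":
                return value
    return None
```

Calling read_chat_template("Lite-Mistral-150M-v2-Instruct-Q8_0.gguf") should return the Jinja template string if it is present. A correct string here does not by itself mean llama-cli will use that layout, since the loader still has to recognize the template.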

Hi, I've updated the chat template in tokenizer_config.json and the quants in my repo. It should now detect the template properly.

Thanks!

amgadhasan changed discussion status to closed