Problems with Q5_K_M variant using llama.cpp in terminal
Has anybody actually tried this already? I'm only getting garbage out of the Q5_K_M variant; sometimes it doesn't output anything at all.
I'm using llama.cpp in the terminal for testing:
version: 1664 (1d7a191)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
The prompt template has been changed completely. Look it up in the discussion of the original model.
Also, to solve the infinite-generation problem I had to set the EOS token to <step> with the following:
gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.eos_token_id 32015
gguf-set-metadata codellama-70b-instruct.Q5_K_M.gguf tokenizer.ggml.add_eos_token True
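If you want to double-check that the change took effect, you can dump the metadata again and look for the EOS fields. This is just a sketch, assuming you have the gguf-dump script that ships alongside gguf-set-metadata in llama.cpp's gguf-py:
gguf-dump codellama-70b-instruct.Q5_K_M.gguf | grep eos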
You need to add it to llama.cpp's stop keywords as well. Hopefully all of this will be resolved upstream soon.
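For the main binary that would be the --reverse-prompt / -r flag (a sketch; -r is mainly meant for interactive mode, so the EOS metadata fix above is the more reliable route for one-shot runs). For the llama.cpp server, the /completion request accepts a "stop" list instead. For example:
llama.cpp/build/bin/main -m codellama-70b-instruct.Q5_K_M.gguf -c 4096 -i -r "<step>"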
@wijjjj you have to use the CodeLlama Instruct one; this model is for autocompletion, not chat. It is not fine-tuned on any instructions and is the base model.
@arohau thanks for the hint!
@YaTharThShaRma999 and @arohau, do you have an example of how to get it running properly?
I tried both the Instruct one and this one. Both have produced garbage so far.
My guess was that maybe the quality of the quantization just isn't good yet at this point.
Thanks, @YaTharThShaRma999. Either I really didn't see the example 10 days ago, or the text was updated in the meantime. In any case, thanks for reminding me. :)
llama.cpp/build/bin/main -ngl 35 -m /data/llm/codellama/codellama-70b-Instruct-hf-Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Source: system\n\n You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user\n\n Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant"
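For readability, the prompt string in that command is meant to expand to the template below. Note that the \n escapes are only turned into real newlines if escape processing is enabled (on builds that support it, that's the -e flag to main):
Source: system

 You are a friendly and helpful Python coder. You will comply to all questions.<step> Source: user

 Write me Tic Tac Toe for CLI in Python. Human vs. Computer! <step> Source: assistant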
Not really Tic Tac Toe, that's Rock-Paper-Scissors... at least it's not garbage anymore. I still need to fix the stop token, but @arohau already described how to do that.
Thanks to both of you! Karma +1 :)