llama.cpp-b2234 generates poor output
It looks like the GGUF file is not compatible with the llama.cpp-b2234 release.
I tried "gemma-7b-it-Q4_K_M.gguf" with the prompt "write a python program to calculate pi with monte carlo method".
Its output is worse than that of "gemma-2b-it-Q4_K_M.gguf" from another repository.
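For reference, a minimal correct answer to that prompt is only a few lines; something roughly like this sketch (one of many possible variants):

```python
import random

def estimate_pi(samples: int = 1_000_000) -> float:
    """Estimate pi by sampling points in the unit square and counting
    how many land inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area ratio: quarter circle / unit square = pi/4
    return 4.0 * inside / samples

print(estimate_pi())
```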
@wyklq
The GGUF models were generated with b2230 and also tested against b2230. Some changes were introduced into llama.cpp after b2230, so we are not sure whether the models are compatible with b2234. In any case, we'll track the changes in llama.cpp and update the models in the near future.
In addition, in my personal experience, 2b-it-Q8_0 performs better. You can try it.
OK, it turns out to be an issue with the original model.
I found this discussion: https://huggingface.co/google/gemma-7b-it/discussions/38
The workaround described there works, i.e. setting "Presence penalty" to 1.
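For anyone scripting this rather than setting it in a UI, here is a minimal sketch of applying that workaround, assuming the llama-cpp-python bindings (`pip install llama-cpp-python`); the model path and prompt are just the ones from this thread:

```python
# Sketch only: assumes the llama-cpp-python bindings; adjust model_path
# to wherever your local copy of the GGUF file lives.
from llama_cpp import Llama

llm = Llama(model_path="./gemma-7b-it-Q4_K_M.gguf")
result = llm.create_completion(
    "write a python program to calculate pi with monte carlo method",
    max_tokens=512,
    presence_penalty=1.0,  # the workaround: penalize tokens already present in the output
)
print(result["choices"][0]["text"])
```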