The version beyond Q4 quantization is completely unavailable
The Q4_K_M version performs well, but none of the versions above Q4 answer as expected.
I am running it through ollama 0.1.41.
Can you explain what "unavailable" means?
And what answer do you get, and what did you expect? Without that information, this post isn't much to go on.
Just tried out the Q6_K and it works fine. Make sure you downloaded the files correctly and completely, and maybe consult an ollama support forum on how to set it up.
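One quick check, since truncated or corrupted downloads are a common cause of garbage output: compare the file's SHA-256 against the one published on the model page. A minimal sketch below; the file name and its contents are placeholder stand-ins so the snippet runs end to end — substitute your actual GGUF file and the checksum from the model card.

```shell
# Sketch: verify a downloaded GGUF against its published checksum.
# "model-Q6_K.gguf" and the demo contents are placeholders, not real values.
f=model-Q6_K.gguf
printf 'demo gguf bytes' > "$f"                 # stand-in for the real download
expected=$(sha256sum "$f" | awk '{print $1}')   # in practice: copy this from the model card
actual=$(sha256sum "$f" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "mismatch: re-download the file"
fi
rm -f "$f"
```

If the hashes differ, re-download before debugging anything else.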
They just produce random outputs; only the Q4 version answers the question correctly.
Are you using this model through ollama?
No, I use llama.cpp, which ollama also uses (but likely an older version).
That might be the cause of the issue. Even with the f16 version, the model still gives irrelevant answers.
It's probably a configuration/usage issue, though.