Can't load q5_1 model
I have tried to load the model with both the AVX2 and the cuBLAS builds of llama.cpp, but both failed. llama.cpp is from the latest release. Here is the output:
C:\AI\llama>main -i --color --interactive-first -r "### Human:" -r "### Input:" -r "(Input)" -r "### Instruction:" -r "### User:" -r "User:" -r "USER:" -r "=============" --temp 0 --ctx_size 2048 --n_predict -1 --ignore-eos --repeat_penalty 1.2 --instruct -m wizardcoder-guanaco-15b-v1.1.ggmlv1.q5_1.bin --threads 8
main: build = 843 (6e7cca4)
main: seed = 1689451684
llama.cpp: loading model from wizardcoder-guanaco-15b-v1.1.ggmlv1.q5_1.bin
error loading model: unexpectedly reached end of file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'wizardcoder-guanaco-15b-v1.1.ggmlv1.q5_1.bin'
main: error: unable to load model
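For context, "unexpectedly reached end of file" from llama.cpp usually means either a truncated download or a GGML file for an architecture llama.cpp cannot parse (WizardCoder is StarCoder-based, not LLaMA-based). A quick sanity check, sketched below, is to read the 4-byte magic at the start of the file; the specific magic values are assumptions based on common GGML variants, so treat them as illustrative.

```python
import struct

# Assumed magic numbers for common GGML container variants (little-endian uint32).
GGML_MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able, used by llama.cpp)",
}

def read_ggml_magic(path: str) -> str:
    """Return a human-readable name for the file's GGML magic, or 'unknown'."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return "truncated file"
    (magic,) = struct.unpack("<I", header)
    return GGML_MAGICS.get(magic, f"unknown (0x{magic:08x})")
```

Even with a recognized magic, the tensor layout that follows can belong to a non-LLaMA architecture, which llama.cpp of that vintage would still reject.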
From readme:
Compatibility: These files are not compatible with llama.cpp, text-generation-webui or llama-cpp-python.
Model works for me using ctransformers (https://github.com/marella/ctransformers)
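A minimal loading sketch with ctransformers, assuming `pip install ctransformers` and that the .bin file sits in the current directory; `model_type="starcoder"` is an assumption based on WizardCoder being StarCoder-derived, so check the model card:

```python
from ctransformers import AutoModelForCausalLM

# model_type is an assumption for StarCoder-family GGML files; verify
# against the model card before relying on it.
llm = AutoModelForCausalLM.from_pretrained(
    "wizardcoder-guanaco-15b-v1.1.ggmlv1.q5_1.bin",
    model_type="starcoder",
)
print(llm("### Instruction: write hello world in Python\n### Response:"))
```

This only illustrates the call shape; actually running it requires the multi-gigabyte weights file.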
Oh, sorry. I got used to all GGML models running well on those three. The problem is that I need an OpenAI-compatible API. Kobold has its own API; about the rest I have no idea. TheBloke, can we expect quantized models compatible with those three tools?
PS: I just found out that LM Studio should have an OpenAI-compatible API, so I will try it.
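If LM Studio's local server works out, any OpenAI-style client can talk to it. A hedged sketch, assuming the server listens on http://localhost:1234/v1 (the port is an assumption, check the app) and using only the standard library:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat completions request body."""
    return {
        "model": "local-model",  # local servers typically ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }

def chat(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send one chat turn to an OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The request/response shape follows the OpenAI Chat Completions convention, which is what LM Studio's server advertises compatibility with.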
Yeah, LM Studio is good. And ctransformers can also provide an OpenAI-compatible API, I believe.