Failed to load model
$ ~/workspace/llama.cpp/server -m ./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf -c 8192 --host 0.0.0.0 --port 8501 -ngl 81 -t 10 --mlock
It fails with the following messages:
llama_model_load: error loading model: check_tensor_dims: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf'
{"tid":"0x1f67f3ac0","timestamp":1715049990,"level":"ERR","function":"load_model","line":685,"msg":"unable to load model","model":"./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf"}
@davideuler Yep, I'm pretty sure it's an issue with llama.cpp not yet supporting the IBM granite models.
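If you want to confirm that yourself, you can inspect the file's metadata and tensor names. Here is a minimal sketch, assuming the `gguf` Python package (shipped with llama.cpp under `gguf-py`, also on PyPI) is installed; the path is the model from the command above:

```python
# Minimal sketch: dump the declared architecture and check for 'output.weight'.
from gguf import GGUFReader

reader = GGUFReader("./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf")

# Print the architecture string the file declares in its metadata.
arch_field = reader.fields["general.architecture"]
arch = bytes(arch_field.parts[arch_field.data[0]]).decode("utf-8")
print("architecture:", arch)

# Check whether the tensor the loader complains about is actually in the file.
names = {t.name for t in reader.tensors}
print("'output.weight' present:", "output.weight" in names)
```

If the tensor is genuinely absent, the file isn't corrupt; the loader just expects a tensor layout that this architecture doesn't provide yet.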
Thanks, I hope it will be supported soon.
There is an open feature request to add support: https://github.com/ggerganov/llama.cpp/issues/7116