Cannot load the model in Koboldcpp 1.28

#1
by FenixInDarkSolo - opened

I downloaded the q4_0 and q8_0 models to test, but they cannot be loaded in koboldcpp 1.28.
I have checked the SHA256 hashes and confirmed that both of them are correct.
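For reference, the hash check was done with a small script along these lines (a minimal sketch; the file name is taken from the log below):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large model files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the SHA256 published on the model page.
print(sha256_of("planner-7b.ggmlv3.q8_0.bin"))
```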
```
> koboldcpp_128.exe --threads 12 --smartcontext --unbantokens --contextsize 2048 --blasbatchsize 1024 --useclblast 0 0 --gpulayers 3
Welcome to KoboldCpp - Version 1.28
For command line arguments, please refer to --help
Otherwise, please manually select ggml file:
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll

Loading model: D:\program\koboldcpp\planner-7b.ggmlv3.q8_0.bin
[Threads: 12, BlasThreads: 12, SmartContext: True]


Identified as LLAMA model: (ver 5)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from D:\program\koboldcpp\planner-7b.ggmlv3.q8_0.bin
error loading model: unrecognized tensor type 14

llama_init_from_file: failed to load model
gpttype_load_model: error: failed to load model 'D:\program\koboldcpp\planner-7b.ggmlv3.q8_0.bin'
Load Model OK: False
Could not load model: D:\program\koboldcpp\planner-7b.ggmlv3.q8_0.bin
```

But I can successfully load it in llama.cpp.
FenixInDarkSolo changed discussion title from Cannot load the model in Koboldcpp to Cannot load the model in Koboldcpp 1.28

Shit. I hadn't realised that the new llama.cpp k-quant commit had changed q4_0, q4_1, q5_0, q5_1 and q8_0.

I happened to do this model 2 hours after the k-quant PR was merged (https://github.com/ggerganov/llama.cpp/pull/1684), so yeah, the files only work with the latest llama.cpp.

I am sure koboldcpp will add support pretty soon, but for now they won't work.

I'll see about re-doing them with the previous llama.cpp. For the next week or two I'm going to do q4_0, q4_1, q5_0, q5_1 and q8_0 using llama.cpp from before the k-quant merge, and do the new k-quant methods using the latest code, to ensure maximum compatibility.
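For anyone who wants to check which format a downloaded file carries before loading it, the header can be inspected directly. A minimal sketch, assuming the GGJT header layout llama.cpp used at the time (magic, file version, then seven uint32 hyperparameters ending in ftype, with the quantization version folded into ftype at a factor of 1000):

```python
import struct

def read_ggjt_header(path: str):
    """Read magic, file version and ftype from a llama.cpp GGJT model file."""
    with open(path, "rb") as f:
        magic, version = struct.unpack("<II", f.read(8))
        if magic != 0x67676A74:  # ASCII 'ggjt'
            raise ValueError(f"not a GGJT file (magic {magic:#x})")
        # Assumed hparams order: n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype
        *_, ftype_raw = struct.unpack("<7I", f.read(28))
    # Assumption: a quantization version is folded into ftype (factor 1000).
    qnt_version, ftype = divmod(ftype_raw, 1000)
    return version, qnt_version, ftype

print(read_ggjt_header("planner-7b.ggmlv3.q8_0.bin"))
```

Note this only reports the file-level ftype; the per-tensor types (like the type 14 in the error above) would need walking the tensor records as well.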

I'm running into a similar issue with llama-cpp-python. I haven't dug deep yet.

```
llama.cpp: loading model from ../python3.11/site-packages/llama_cpp/models/7B/planner-7b.ggmlv3.q5_0.bin
error loading model: unrecognized tensor type 14
llama_init_from_file: failed to load model
```
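Roughly how I'm loading it, via llama-cpp-python's Llama class (path shortened; model path is the one from the log):

```python
from llama_cpp import Llama

# Fails with "unrecognized tensor type 14" when the bundled llama.cpp
# predates the k-quant merge; works once the bindings catch up.
llm = Llama(model_path="models/7B/planner-7b.ggmlv3.q5_0.bin")
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```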

Yeah sorry. Right now the files can only be used with the latest llama.cpp.

I will re-generate them shortly.

I have updated all the old quant types: q4_0, q4_1, q5_0, q5_1, q8_0. They are now generated with an older version of llama.cpp.

Please re-download, re-test and let me know.
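If you're scripting the re-download, a sketch using huggingface_hub (the repo id below is a placeholder; substitute the actual model repo):

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id -- substitute the actual model repo.
path = hf_hub_download(
    repo_id="your-org/planner-7B-GGML",
    filename="planner-7b.ggmlv3.q4_0.bin",
)
print("Downloaded to", path)
```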

I have a custom automatic updater for Koboldcpp on Windows, if anyone is interested.
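It's not the updater itself, but the core idea fits in a few lines of Python (a sketch assuming the GitHub releases API and the LostRuins/koboldcpp repo):

```python
import json
import urllib.request

API = "https://api.github.com/repos/LostRuins/koboldcpp/releases/latest"

# Query the latest release and download the Windows executable asset.
with urllib.request.urlopen(API) as resp:
    release = json.load(resp)

print("Latest release:", release["tag_name"])
for asset in release["assets"]:
    if asset["name"].endswith(".exe"):
        urllib.request.urlretrieve(asset["browser_download_url"], asset["name"])
        print("Downloaded", asset["name"])
```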

I have downloaded the q4_0 again and tested it in koboldcpp 1.28, and it works. Thank you for the fix.

KoboldCpp was updated to 1.29 recently and offers partial support for k-quants: partial because it only supports OpenBLAS for now, not CLBlast yet. 😕
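Until CLBlast support arrives, dropping the CLBlast flags from the command in the first post should fall back to the OpenBLAS path (untested sketch):

```
> koboldcpp.exe --threads 12 --smartcontext --unbantokens --contextsize 2048 --blasbatchsize 1024
```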
