
Can't load model?

#2
by nacs - opened

I've tried the latest llama.cpp (as of August 11th) as well as the commit mentioned in the README (commit e76d630), but I get an error when trying to load this model.

I've tried the q2_K.bin as well as the q3_K_S.bin file, and both give the following error:

```
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3060) as main device
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/airoboros-l2-70b-gpt4-2.0.ggmlv3.q2_K.bin'
main: error: unable to load model
```

Am I doing something wrong?

Sorry, this was my fault. I missed the part of the README that says to use `-gqa 8` with this model. It works now, thanks.
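For context on why the flag is needed: Llama 2 70B uses grouped-query attention, so its K/V projections have 8 heads instead of 64, which is why `layers.0.attention.wk.weight` is 8192 x 1024 rather than the 8192 x 8192 the loader expects by default. GGMLv3 files don't record the group count, so it has to be passed on the command line. A minimal invocation looks like this (the prompt and token count are illustrative, not from the README):

```
# -gqa 8 tells llama.cpp the grouped-query attention factor for 70B Llama 2 models
./main -m ./models/airoboros-l2-70b-gpt4-2.0.ggmlv3.q2_K.bin \
       -gqa 8 \
       -p "Hello" -n 128
```

Later GGUF files embed this metadata in the file itself, so the flag only applies to GGML-era builds like the one discussed here.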

nacs changed discussion status to closed
