llama-cpp failed

#3
by gptwin - opened

llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found

same

Owner

I'll look into it.

Owner

@gptwin @mirek190 Can you try the new "v2" files now?

main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed = 1708607934
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0.gguf (version GGUF V3 (latest))
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 ).
...
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
...
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-7b-it.Q8_0.gguf'
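
For context on why this particular tensor goes missing: Gemma ties its output projection to the token-embedding matrix, so a correctly converted GGUF may not contain a separate `output.weight` at all, and a loader that understands the tied layout falls back to `token_embd.weight`. The sketch below is illustrative only (it is not llama.cpp's actual C++ code, and `resolve_output_weight` is a hypothetical helper), but it shows the fallback logic whose absence produces the error above.

```python
# Illustrative sketch of tied-embedding handling; not llama.cpp's real code.

def resolve_output_weight(tensors: dict):
    """Pick the tensor used for the output projection.

    `tensors` maps GGUF tensor names to their data.
    """
    if "output.weight" in tensors:
        # Models with an untied output head ship this tensor directly.
        return tensors["output.weight"]
    if "token_embd.weight" in tensors:
        # Tied-embedding fallback, as used for Gemma-style architectures:
        # the embedding matrix doubles as the output projection.
        return tensors["token_embd.weight"]
    # Without the fallback, loading fails exactly like the log above.
    raise KeyError("create_tensor: tensor 'output.weight' not found")

# A Gemma-style file has no separate output.weight:
gemma_like = {"token_embd.weight": "embedding-matrix"}
print(resolve_output_weight(gemma_like))  # → embedding-matrix
```

A converter run from the wrong llama.cpp branch can also emit the file without the tensor while the loader still expects it, which matches the "wrong branch" explanation given later in this thread.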

Owner

@telehan I updated the model files. Could you try to load the model "gemma-7b-it.Q8_0-v2.gguf" instead? It should work.

"v2" files seems work fine

main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed  = 1708610347
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0-v2.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma
llama_model_loader: - kv   1:                               general.name str              = gemma-7b-it
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 )
...
llama_print_timings:        load time =   13195.25 ms
llama_print_timings:      sample time =      56.71 ms /   118 runs   (    0.48 ms per token,  2080.65 tokens per second)
llama_print_timings: prompt eval time =     124.71 ms /    40 tokens (    3.12 ms per token,   320.73 tokens per second)
llama_print_timings:        eval time =    3367.45 ms /   117 runs   (   28.78 ms per token,    34.74 tokens per second)
llama_print_timings:       total time =    3626.87 ms /   157 tokens

Successfully loaded in LM Studio 0.2.15 (0.2.15).

Owner

It's such a relief to see this. Due to my mistake (I used the wrong llama.cpp branch 🤦🏻‍♂️), many people couldn't use the model yesterday. I hope I can make it up to you.

It's working now... thanks!

sayhan changed discussion status to closed
