llama-cpp failed

#3
by gptwin - opened

llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found

same

Owner

I'll look into it.

Owner

@gptwin @mirek190 Can you try the new "v2" files now?

main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed = 1708607934
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0.gguf (version GGUF V3 (latest))
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 ).
...
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
...
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-7b-it.Q8_0.gguf'
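
For context on why this particular tensor goes missing: Gemma ties its output projection to the token-embedding matrix, so a correctly converted GGUF may not contain a separate `output.weight` at all, and a loader that understands the tied layout falls back to `token_embd.weight`. The sketch below is illustrative only (it is not llama.cpp's actual C++ code, and `resolve_output_weight` is a hypothetical helper), but it shows the fallback logic whose absence produces the error above.

```python
# Illustrative sketch of tied-embedding handling; not llama.cpp's real code.

def resolve_output_weight(tensors: dict):
    """Pick the tensor used for the output projection.

    `tensors` maps GGUF tensor names to their data.
    """
    if "output.weight" in tensors:
        # Models with an untied output head ship this tensor directly.
        return tensors["output.weight"]
    if "token_embd.weight" in tensors:
        # Tied-embedding fallback, as used for Gemma-style architectures:
        # the embedding matrix doubles as the output projection.
        return tensors["token_embd.weight"]
    # Without the fallback, loading fails exactly like the log above.
    raise KeyError("create_tensor: tensor 'output.weight' not found")

# A Gemma-style file has no separate output.weight:
gemma_like = {"token_embd.weight": "embedding-matrix"}
print(resolve_output_weight(gemma_like))  # → embedding-matrix
```

A converter run from the wrong llama.cpp branch can also emit the file without the tensor while the loader still expects it, which matches the "wrong branch" explanation given later in this thread.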

Owner

@telehan I updated the model files. Could you try to load the model "gemma-7b-it.Q8_0-v2.gguf" instead? It should work.

"v2" files seems work fine

main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed  = 1708610347
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0-v2.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma
llama_model_loader: - kv   1:                               general.name str              = gemma-7b-it
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 )
...
llama_print_timings:        load time =   13195.25 ms
llama_print_timings:      sample time =      56.71 ms /   118 runs   (    0.48 ms per token,  2080.65 tokens per second)
llama_print_timings: prompt eval time =     124.71 ms /    40 tokens (    3.12 ms per token,   320.73 tokens per second)
llama_print_timings:        eval time =    3367.45 ms /   117 runs   (   28.78 ms per token,    34.74 tokens per second)
llama_print_timings:       total time =    3626.87 ms /   157 tokens

Successfully loaded in LM Studio 0.2.15 (0.2.15).

Owner

It's such a relief to see this. Due to my mistake (I used the wrong llama.cpp branch 🤦🏻‍♂️), many people couldn't use the model yesterday. I hope I can make it up to you.

It's working now... thanks!

sayhan changed discussion status to closed
