I'm trying to quantize a model and I'm running into the issue below. Any suggestions?
```
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight -> token_embd.weight | F16 | [32002, 4096]
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight | F16 | [4096]
Traceback (most recent call last):
  File "/home/developer/llama.cpp/convert.py", line 1228, in <module>
    main()
  File "/home/developer/llama.cpp/convert.py", line 1215, in main
    model = convert_model_names(model, params)
  File "/home/developer/llama.cpp/convert.py", line 1004, in convert_model_names
    raise Exception(f"Unexpected tensor name: {name}")
Exception: Unexpected tensor name: model.layers.0.mlp.experts.0.w1.weight
```
llama.cpp doesn't support this architecture yet. The `model.layers.*.mlp.experts.*` tensors aren't in convert.py's tensor-name map, which is why it raises the "Unexpected tensor name" exception. They'll probably wait until the official architecture is released before implementing it.
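If you want to confirm exactly which tensors the converter is choking on, you can list the checkpoint's tensor names directly. Here's a minimal sketch, assuming the model ships as safetensors shards and that the directory path below is a placeholder for your own:

```python
# Minimal sketch: print the tensor names in a safetensors checkpoint so you can
# see which ones (e.g. the MoE expert weights) convert.py has no mapping for.
# model_dir is a hypothetical path; point it at your downloaded model.
import glob
from safetensors import safe_open

model_dir = "/path/to/model"

for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            # Expert tensors like model.layers.N.mlp.experts.M.w1.weight are
            # the ones the converter doesn't recognize.
            if ".experts." in name:
                print(shard, name)
```

That at least tells you the failure is architectural (unmapped expert tensors) rather than a problem with your files.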