EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1
Hello! It's been a while, but I finally released a new model. I would love to see GGUFs for it. Thanks in advance! (A quick note: I've altered the tokenizer config of Yi to use PreTrainedTokenizerFast instead of LlamaTokenizer, because the latter was causing decoding artifacts; hopefully llama.cpp respects that during quantization.)
https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1
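For anyone who wants to double-check the tokenizer change on the HF side, a minimal round-trip like this should come back clean now (just a sketch using plain transformers; nothing assumed beyond the repo name above):

from transformers import AutoTokenizer

# tokenizer_config.json now sets tokenizer_class to PreTrainedTokenizerFast,
# so AutoTokenizer should return the fast tokenizer rather than LlamaTokenizer.
tok = AutoTokenizer.from_pretrained("EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1")
print(type(tok).__name__)

text = "I believe the meaning of life is to be fully alive."
ids = tok.encode(text, add_special_tokens=False)
print(tok.decode(ids))  # should round-trip without the decoding artifacts mentioned above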
I have no idea which tokenizer it ends up using, but I can at least confirm that it successfully converts to a GGUF. Here is the llama.cpp logfile, and here is the script used to do the conversion, in case you want to investigate this yourself: https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py (a quick sanity check of the resulting file's tokenizer metadata follows after the log).
venv/bin/python3 llama.cpp/convert_hf_to_gguf.py --outfile out.gguf ./EVA-Yi-1.5-9B-32K-V1/
INFO:hf-to-gguf:Loading model: EVA-Yi-1.5-9B-32K-V1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {4096, 64000}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00004.safetensors'
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.27.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00004.safetensors'
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.31.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.31.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.31.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.32.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.32.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.32.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.32.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.32.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.32.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.32.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.32.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.33.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.33.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.33.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.33.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.33.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.33.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.33.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.33.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.34.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.34.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.34.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.34.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.34.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.34.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.34.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.34.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.35.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.35.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.35.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.35.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.35.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.35.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.35.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.36.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.36.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.36.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.36.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.36.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.36.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.36.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.36.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.36.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.37.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.37.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.37.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.37.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.37.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.37.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.37.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.37.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.37.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.38.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.38.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.38.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.38.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.38.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.38.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.38.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.38.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.38.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.39.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.39.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.39.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.39.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.39.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.39.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.39.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.39.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.39.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.40.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.40.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.40.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.40.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.40.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.40.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.40.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.40.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.40.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.41.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.41.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.41.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.41.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.41.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00004.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {4096, 64000}
INFO:hf-to-gguf:blk.41.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.41.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.41.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.41.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.42.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.42.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.42.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.42.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.42.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.42.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.42.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.42.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.42.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.43.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.43.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.43.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.43.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.43.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.43.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.43.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.43.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.43.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.44.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.44.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.44.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.44.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.44.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.44.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.44.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.44.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.44.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.45.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.45.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.45.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.45.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.45.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.45.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.45.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.45.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.45.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.46.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.46.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.46.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.46.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.46.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.46.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.46.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.46.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.46.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.47.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.47.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
INFO:hf-to-gguf:blk.47.ffn_gate.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.47.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 11008}
INFO:hf-to-gguf:blk.47.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.47.attn_k.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:blk.47.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.47.attn_q.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.47.attn_v.weight, torch.bfloat16 --> F16, shape = {4096, 512}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 11008
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 4
INFO:hf-to-gguf:gguf: rope theta = 5000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:out.gguf: n_tensors = 435, total_size = 17.7G
Writing: 100%|_________________________________________________________| 17.7G/17.7G [00:30<00:00, 573Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to out.gguf
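To confirm the tokenizer settings actually made it into the file, the gguf-py package that ships with llama.cpp can read the metadata back; roughly something like this (a sketch only, and the exact field-access API may differ between gguf-py versions):

from gguf import GGUFReader

reader = GGUFReader("out.gguf")
# List the tokenizer-related metadata keys and their value types; the presence of
# the keys themselves (model, tokens, bos/eos ids, chat_template, ...) is what matters here.
for name, field in reader.fields.items():
    if name.startswith("tokenizer."):
        print(name, field.types)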
@nicoboss I had at least one report of a broken GGUF already, but that may just be a problem with the GGUF-my-repo space. Will investigate; thanks for the clue.
It doesn't look broken to me:
llama.cpp/llama-cli -m EVA-Yi-1.5-9B-32K-V1.SOURCE.gguf -n 128 -p "I believe the meaning of life is"
Log start
main: build = 3618 (3ba780e2)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: seed = 1727130908
llama_model_loader: loaded meta data with 40 key-value pairs and 435 tensors from EVA-Yi-1.5-9B-32K-V1.SOURCE.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Yi 1.5 9B 32K Tokfix
llama_model_loader: - kv 3: general.version str = V1
llama_model_loader: - kv 4: general.organization str = AuriAetherwiing
llama_model_loader: - kv 5: general.finetune str = 32k-tokfix
llama_model_loader: - kv 6: general.basename str = Yi-1.5
llama_model_loader: - kv 7: general.size_label str = 9B
llama_model_loader: - kv 8: general.license str = apache-2.0
llama_model_loader: - kv 9: general.base_model.count u32 = 1
llama_model_loader: - kv 10: general.base_model.0.name str = Yi 1.5 9B 32K Tokfix
llama_model_loader: - kv 11: general.base_model.0.organization str = AuriAetherwiing
llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/AuriAetherwiin...
llama_model_loader: - kv 13: general.datasets arr[str,2] = ["AuriAetherwiing/Allura", "kalomaze/...
llama_model_loader: - kv 14: llama.block_count u32 = 48
llama_model_loader: - kv 15: llama.context_length u32 = 32768
llama_model_loader: - kv 16: llama.embedding_length u32 = 4096
llama_model_loader: - kv 17: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 18: llama.attention.head_count u32 = 32
llama_model_loader: - kv 19: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 20: llama.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 21: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 22: llama.attention.key_length u32 = 128
llama_model_loader: - kv 23: llama.attention.value_length u32 = 128
llama_model_loader: - kv 24: general.file_type u32 = 1
llama_model_loader: - kv 25: llama.vocab_size u32 = 64000
llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,64000] = ["<unk>", "<|startoftext|>", "<|endof...
llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,64000] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,64000] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 36: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 37: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 38: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 39: general.quantization_version u32 = 2
llama_model_loader: - type f32: 97 tensors
llama_model_loader: - type f16: 338 tensors
llm_load_vocab: special tokens cache size = 239
llm_load_vocab: token to piece cache size = 0.3834 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 64000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 34B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 8.83 B
llm_load_print_meta: model size = 16.45 GiB (16.00 BPW)
llm_load_print_meta: general.name = Yi 1.5 9B 32K Tokfix
llm_load_print_meta: BOS token = 1 '<|startoftext|>'
llm_load_print_meta: EOS token = 2 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 315 '<0x0A>'
llm_load_print_meta: EOT token = 2 '<|endoftext|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
llm_load_tensors: ggml ctx size = 0.20 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/49 layers to GPU
llm_load_tensors: CPU buffer size = 16841.52 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 32768
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
ggml_cuda_host_malloc: failed to allocate 3072.00 MiB of pinned memory: no CUDA-capable device is detected
llama_kv_cache_init: CPU KV buffer size = 3072.00 MiB
llama_new_context_with_model: KV self size = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB
ggml_cuda_host_malloc: failed to allocate 0.24 MiB of pinned memory: no CUDA-capable device is detected
llama_new_context_with_model: CPU output buffer size = 0.24 MiB
ggml_cuda_host_malloc: failed to allocate 2144.01 MiB of pinned memory: no CUDA-capable device is detected
llama_new_context_with_model: CUDA_Host compute buffer size = 2144.01 MiB
llama_new_context_with_model: graph nodes = 1542
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 32 / 62 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 32768, n_batch = 2048, n_predict = 128, n_keep = 0
I believe the meaning of life is to be fully alive, fully engaged, to breathe deeply, to look wide, to love wildly.”
—
Jennifer was right. I was in the wrong. I wasn’t being the person I wanted to be. I wasn’t being the kind of man I wanted to be. I needed to change.
I needed to be the kind of man who would say yes to everything that Jennifer was asking of me.
The kind of man who would let her be who she wanted to be and make her dreams come true.
The kind of man who would make her happy.
llama_print_timings: load time = 1916.15 ms
llama_print_timings: sample time = 3.46 ms / 128 runs ( 0.03 ms per token, 36951.50 tokens per second)
llama_print_timings: prompt eval time = 119.37 ms / 7 tokens ( 17.05 ms per token, 58.64 tokens per second)
llama_print_timings: eval time = 11398.82 ms / 127 runs ( 89.75 ms per token, 11.14 tokens per second)
llama_print_timings: total time = 11531.61 ms / 134 tokens
Log end
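A more direct check on the original decoding-artifact concern would be a tokenize/detokenize round trip against the GGUF itself, e.g. via llama-cpp-python (untested sketch, assuming that package is installed and pointed at the same file):

from llama_cpp import Llama

# vocab_only loads just the metadata and vocabulary, so this is cheap even for a ~17 GB file.
llm = Llama(model_path="EVA-Yi-1.5-9B-32K-V1.SOURCE.gguf", vocab_only=True)

text = "I believe the meaning of life is"
toks = llm.tokenize(text.encode("utf-8"), add_bos=False)
print(llm.detokenize(toks).decode("utf-8"))  # should match the input text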
Sure, it's queued. I don't know this architecture specifically, but the llama.cpp conversion does not simply reuse the HF tokenizer, so a successful conversion is a good sign, though it might still have picked the wrong tokenizer. Even then it is likely fine: the HF framework can save the tokenizer in multiple formats, and whichever one llama.cpp picks is probably the right one if the result works at all.
Anyway, you can watch its progress at http://hf.tst.eu/status.html; it's currently the only model actively quantizing :)
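If anyone wants to verify that llama.cpp picked up the intended tokenizer rather than another serialization of it, comparing token IDs from both sides for the same string is a reasonable smoke test (rough sketch, assuming transformers and llama-cpp-python are available; IDs could still differ slightly due to prefix-space handling):

from transformers import AutoTokenizer
from llama_cpp import Llama

text = "I believe the meaning of life is"

hf_tok = AutoTokenizer.from_pretrained("EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1")
hf_ids = hf_tok.encode(text, add_special_tokens=False)

gguf_tok = Llama(model_path="EVA-Yi-1.5-9B-32K-V1.SOURCE.gguf", vocab_only=True)
gguf_ids = gguf_tok.tokenize(text.encode("utf-8"), add_bos=False)

# Identical ID sequences strongly suggest the GGUF carries the same vocabulary
# the HF tokenizer uses; a mismatch would be worth investigating.
print(hf_ids == gguf_ids)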