question
Can you try to quantize this model?
https://huggingface.co/niallturbitt/mpt-3b-8k-instruct
I don't have enough memory to do it.
Just uploaded quants, but llama.cpp is having issues loading them. A rough sketch of the conversion flow is below, followed by the full output; still debugging, but this is my first MPT quant... Maybe a custom_code issue? The param count doesn't seem to match the repo name either.
Broken quants: https://huggingface.co/afrideva/mpt-3b-8k-instruct-GGUF
Using latest commit of llama.cpp: https://github.com/ggerganov/llama.cpp/tree/57ad015dc3011b046ed5a23186c86ea55f987c54
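For context, the flow here is roughly llama.cpp's standard convert-then-quantize sequence. A minimal sketch only; the script and binary names are assumptions and differ between llama.cpp revisions (older trees ship per-architecture converters):

```python
# Sketch of the HF -> GGUF -> q8_0 flow, run from a llama.cpp checkout.
# Script/binary names are assumptions: depending on revision the converter
# may be convert-hf-to-gguf.py or a per-arch convert-mpt-hf-to-gguf.py,
# and the quantizer may be ./quantize or ./llama-quantize.
import subprocess

MODEL_DIR = "mpt-3b-8k-instruct"  # local clone of the HF repo
F16_GGUF = "mpt-3b-8k-instruct.f16.gguf"
Q8_GGUF = "mpt-3b-8k-instruct.q8_0.gguf"

# 1) HF checkpoint -> GGUF at f16
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2) GGUF f16 -> q8_0
subprocess.run(["./quantize", F16_GGUF, Q8_GGUF, "q8_0"], check=True)
```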
Log start
main: build = 1500 (57ad015)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1699510135
llama_model_loader: loaded meta data with 19 key-value pairs and 292 tensors from mpt-3b-8k-instruct/mpt-3b-8k-instruct.q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q8_0 [ 2048, 50368, 1, 1 ]
llama_model_loader: - tensor 1: output.weight q8_0 [ 2048, 50368, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.0.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 13: blk.0.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.1.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 22: blk.1.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 23: blk.1.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 24: blk.1.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 25: blk.1.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.2.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 31: blk.2.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 32: blk.2.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 33: blk.2.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 34: blk.2.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 35: blk.2.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.2.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 37: blk.2.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.3.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.3.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 40: blk.3.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 41: blk.3.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.3.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 43: blk.3.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 44: blk.3.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.3.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.3.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 47: blk.3.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 48: blk.3.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 49: blk.3.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 50: blk.4.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 51: blk.4.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 52: blk.4.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 53: blk.4.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.4.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 55: blk.4.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.4.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 57: blk.4.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 58: blk.4.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 59: blk.4.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 60: blk.4.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 61: blk.4.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 62: blk.5.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.5.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.5.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 65: blk.5.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 66: blk.5.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 67: blk.5.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 68: blk.5.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 69: blk.5.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 70: blk.5.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 71: blk.5.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.5.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 73: blk.5.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 74: blk.6.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 75: blk.6.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 76: blk.6.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 77: blk.6.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 78: blk.6.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 79: blk.6.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.6.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.6.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.6.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 83: blk.6.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 84: blk.6.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 85: blk.6.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 86: blk.7.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 87: blk.7.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 88: blk.7.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 89: blk.7.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.7.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 91: blk.7.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 92: blk.7.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 93: blk.7.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 94: blk.7.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 95: blk.7.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 96: blk.7.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 97: blk.7.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.8.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.8.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.8.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 101: blk.8.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 102: blk.8.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 103: blk.8.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 104: blk.8.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 105: blk.8.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 106: blk.8.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 107: blk.8.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.8.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 109: blk.8.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 110: blk.9.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 111: blk.9.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 112: blk.9.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 113: blk.9.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 114: blk.9.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 115: blk.9.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 116: blk.9.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.9.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.9.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 119: blk.9.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 120: blk.9.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 121: blk.9.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 122: blk.10.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 123: blk.10.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 124: blk.10.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 125: blk.10.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.10.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 127: blk.10.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 128: blk.10.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 129: blk.10.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 130: blk.10.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 131: blk.10.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 132: blk.10.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 133: blk.10.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.11.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.11.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.11.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 137: blk.11.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 138: blk.11.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 139: blk.11.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 140: blk.11.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 141: blk.11.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 142: blk.11.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 143: blk.11.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.11.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 145: blk.11.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.12.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 147: blk.12.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 148: blk.12.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 149: blk.12.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 150: blk.12.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 151: blk.12.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 152: blk.12.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.12.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.12.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 155: blk.12.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 156: blk.12.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 157: blk.12.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 158: blk.13.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.13.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 160: blk.13.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 161: blk.13.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.13.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 163: blk.13.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.13.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 165: blk.13.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 166: blk.13.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 167: blk.13.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 168: blk.13.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 169: blk.13.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 170: blk.14.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.14.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.14.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 173: blk.14.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 174: blk.14.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 175: blk.14.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 176: blk.14.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 177: blk.14.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 178: blk.14.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 179: blk.14.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.14.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 181: blk.14.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.15.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 183: blk.15.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 184: blk.15.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 185: blk.15.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.15.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 187: blk.15.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 188: blk.15.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.15.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.15.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 191: blk.15.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 192: blk.15.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 193: blk.15.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 194: blk.16.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 195: blk.16.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 196: blk.16.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 197: blk.16.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.16.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 199: blk.16.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.16.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 201: blk.16.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 202: blk.16.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 203: blk.16.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.16.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 205: blk.16.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 206: blk.17.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.17.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.17.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 209: blk.17.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 210: blk.17.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 211: blk.17.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 212: blk.17.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 213: blk.17.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 214: blk.17.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 215: blk.17.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.17.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 217: blk.17.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.18.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 219: blk.18.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 220: blk.18.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 221: blk.18.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.18.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 223: blk.18.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 224: blk.18.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.18.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.18.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 227: blk.18.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 228: blk.18.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 229: blk.18.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 230: blk.19.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.19.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 232: blk.19.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 233: blk.19.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.19.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 235: blk.19.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.19.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 237: blk.19.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 238: blk.19.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 239: blk.19.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 240: blk.19.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 241: blk.19.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 242: blk.20.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.20.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.20.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 245: blk.20.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 246: blk.20.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 247: blk.20.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 248: blk.20.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.20.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 250: blk.20.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 251: blk.20.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 252: blk.20.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 253: blk.20.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.21.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 255: blk.21.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 256: blk.21.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 257: blk.21.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 258: blk.21.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 259: blk.21.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 260: blk.21.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.21.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.21.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 263: blk.21.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 264: blk.21.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 265: blk.21.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 266: blk.22.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 267: blk.22.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 268: blk.22.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 269: blk.22.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.22.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 271: blk.22.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 272: blk.22.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 273: blk.22.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 274: blk.22.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 275: blk.22.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 276: blk.22.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 277: blk.22.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 278: blk.23.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.23.attn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.23.attn_qkv.weight q8_0 [ 2048, 6144, 1, 1 ]
llama_model_loader: - tensor 281: blk.23.attn_qkv.bias f32 [ 6144, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.23.attn_output.weight q8_0 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 283: blk.23.attn_output.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 284: blk.23.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 285: blk.23.ffn_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 286: blk.23.ffn_up.weight q8_0 [ 2048, 8192, 1, 1 ]
llama_model_loader: - tensor 287: blk.23.ffn_up.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.23.ffn_down.weight q8_0 [ 8192, 2048, 1, 1 ]
llama_model_loader: - tensor 289: blk.23.ffn_down.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 290: output_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 291: output_norm.bias f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: mpt.context_length u32
llama_model_loader: - kv 3: mpt.embedding_length u32
llama_model_loader: - kv 4: mpt.block_count u32
llama_model_loader: - kv 5: mpt.feed_forward_length u32
llama_model_loader: - kv 6: mpt.attention.head_count u32
llama_model_loader: - kv 7: mpt.attention.layer_norm_epsilon f32
llama_model_loader: - kv 8: mpt.attention.max_alibi_bias f32
llama_model_loader: - kv 9: tokenizer.ggml.model str
llama_model_loader: - kv 10: tokenizer.ggml.tokens arr
llama_model_loader: - kv 11: tokenizer.ggml.token_type arr
llama_model_loader: - kv 12: tokenizer.ggml.merges arr
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 15: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.padding_token_id u32
llama_model_loader: - kv 17: general.quantization_version u32
llama_model_loader: - kv 18: general.file_type u32
llama_model_loader: - type f32: 194 tensors
llama_model_loader: - type q8_0: 98 tensors
llm_load_vocab: mismatch in special tokens definition ( 95/50368 vs 116/50368 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = mpt
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 50368
llm_load_print_meta: n_merges = 50009
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 8.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model params = 1.41 B
llm_load_print_meta: model size = 1.40 GiB (8.51 BPW)
llm_load_print_meta: general.name = mpt-3b-8k-instruct
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: PAD token = 0 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Γ'
llm_load_tensors: ggml ctx size = 0.11 MB
error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 147
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'mpt-3b-8k-instruct/mpt-3b-8k-instruct.q8_0.gguf'
main: error: unable to load model
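A cheap first check when the tensor count looks wrong is to read the GGUF header directly and see how many tensors the file itself declares. A minimal sketch, assuming the standard little-endian GGUF header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count):

```python
# Print the tensor/KV counts declared in a GGUF file header, to compare
# against what llama.cpp reports ("expected 292, got 147" above).
import struct
import sys

def gguf_header(path: str):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, = struct.unpack("<I", f.read(4))    # GGUF version (3 here)
        n_tensors, = struct.unpack("<Q", f.read(8))  # declared tensor count
        n_kv, = struct.unpack("<Q", f.read(8))       # declared metadata KV count
    return version, n_tensors, n_kv

if __name__ == "__main__":
    version, n_tensors, n_kv = gguf_header(sys.argv[1])
    print(f"version={version} tensors={n_tensors} kv_pairs={n_kv}")
```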
Oh, thanks for trying. There must be a problem with the model.
Now I see: judging by the file size, the model is broken. A normal 3B float16 is around 6.85 GB.
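A quick back-of-the-envelope check of that figure, assuming roughly 3.6B parameters at 2 bytes each for float16:

```python
# Rough expected size of a ~3.6B-parameter checkpoint stored in float16.
params = 3.6e9               # approximate parameter count (an assumption)
size_bytes = params * 2      # 2 bytes per parameter at fp16
print(f"{size_bytes / 1e9:.2f} GB  ({size_bytes / 2**30:.2f} GiB)")
# -> 7.20 GB  (6.71 GiB), so a file far smaller than that is a red flag
```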
He uploaded it again; now it seems to be around 6.85 GB:
https://huggingface.co/niallturbitt/mpt-3b-8k-instruct/tree/main
By the way, sorry for annoying you.
https://huggingface.co/afrideva/mpt-3b-8k-instruct-GGUF/blob/main/mpt-3b-8k-instruct.q2_k.gguf
Just uploaded the q2_k; working on the rest. Happy to help anytime I'm able.
Many thanks for all your smaller model quants!
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = mpt
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 50432
llm_load_print_meta: n_merges = 50009
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 16
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 8.0e+00
llm_load_print_meta: n_ff = 16384
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = mostly Q2_K
llm_load_print_meta: model params = 3.63 B
llm_load_print_meta: model size = 1.43 GiB (3.39 BPW)
llm_load_print_meta: general.name = mpt-3b-8k-instruct
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: PAD token = 0 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Γ'
llm_load_tensors: ggml ctx size = 0.04 MB
llm_load_tensors: mem required = 1468.79 MB
..........................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 128.00 MB
llama_build_graph: non-view tensors processed: 324/324
llama_new_context_with_model: compute buffer total size = 121.13 MB
system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0
### Instruction:
Please give me a color representing Namibia's beauty.
### Response:
A light blue color could represent the country ofNamibia, which is in Southern Africa and has some of the most beautiful beaches on earth! [end of text]
llama_print_timings: load time = 6422.40 ms
llama_print_timings: sample time = 28.16 ms / 32 runs ( 0.88 ms per token, 1136.16 tokens per second)
llama_print_timings: prompt eval time = 7203.60 ms / 21 tokens ( 343.03 ms per token, 2.92 tokens per second)
llama_print_timings: eval time = 7767.82 ms / 31 runs ( 250.57 ms per token, 3.99 tokens per second)
llama_print_timings: total time = 15021.36 ms
Log end
q3_k through q8_0 quants are up; the Alpaca prompt format seems to work well.
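For anyone unfamiliar with the name, the Alpaca-style template used in the run above looks like this (a minimal sketch):

```python
# Alpaca-style prompt, matching the layout in the generation log above.
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n### Response:\n"

prompt = ALPACA_TEMPLATE.format(
    instruction="Please give me a color representing Namibia's beauty."
)
print(prompt)
```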