Loving the model and the thought you put into it <3. Have you considered requantizing?
#1 by khushman - opened
I mean requantizing after the llama.cpp update to the quantization format; it has made responses noticeably faster.
If not, would you mind sharing the fp16/fp32 weights here so I could quantize them myself?
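For reference, this is the usual llama.cpp flow I'd run if the full-precision weights were available. A minimal sketch only: the model path, output filenames, and quant type are placeholders, and the exact script/binary names depend on which llama.cpp version you have checked out.

```bash
# Assumes a recent llama.cpp checkout, already built, run from its root.
# 1) Convert the HF fp16 weights to GGUF (path is a placeholder):
python convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model-f16.gguf

# 2) Requantize with the current format (Q4_K_M is just an example type):
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```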
Agreed, at least on sharing the original weights.