GGUF imatrix quants for this model
https://huggingface.co/iyadycb/phillama-3.8b-v0.1-gguf-imatrix
Not sure why Q8_0 is bigger than F16 🤷
I'm surprised you got this working. Did you have to change a bunch of config values? It was very broken in my attempts to convert it and then make an imatrix.
Q8_0 being bigger than F16 is also a weird sign.
There was an error about vocab size when converting to an F32 GGUF, so I specified --pad-vocab. After that, nothing special: generated the imatrix data, then quantized as usual.
llama.cpp commit 46e12c4 for conversion, then b2737 for imatrix and quantize.
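Roughly what I ran, as a sketch (file names and the calibration text are placeholders, and the exact flags are from memory, so check them against the llama.cpp version you're on):

```bash
# Convert the HF model to an F32 GGUF; --pad-vocab works around the vocab-size mismatch
python convert.py ./phillama-3.8b-v0.1 --outtype f32 --pad-vocab \
  --outfile phillama-3.8b-v0.1-f32.gguf

# Generate importance-matrix data from a calibration text file
./imatrix -m phillama-3.8b-v0.1-f32.gguf -f calibration.txt -o imatrix.dat

# Quantize using the imatrix (Q4_K_M shown as an example)
./quantize --imatrix imatrix.dat phillama-3.8b-v0.1-f32.gguf \
  phillama-3.8b-v0.1-Q4_K_M.gguf Q4_K_M
```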
Did some more testing. I think you're right. It did fine with simple questions (e.g. "What is the capital of Japan?"), but when I asked more questions it produced incomplete responses. With larger context it's just an incoherent mess.