GGUF imatrix quants for this model

#1 opened by iyadycb

https://huggingface.co/iyadycb/phillama-3.8b-v0.1-gguf-imatrix

Not sure why Q8_0 is bigger than F16 🤷

I'm surprised you got this working. Did you have to change a bunch of config values? It was very broken in my attempts to convert it and then make an imatrix.

Q8_0 being bigger than F16 is also a weird sign.

There was an error about vocab size when converting to the F32 GGUF, so I specified --pad-vocab. After that, nothing special: I generated the imatrix data, then quantized as usual.

I used llama.cpp commit 46e12c4 for the conversion, then b2737 for imatrix and quantize.
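
For anyone wanting to reproduce this, the steps described above roughly correspond to the commands below. This is a sketch, not copied from the thread: the model directory, output filenames, and calibration text file are placeholders.

```bash
# Convert the HF model to F32 GGUF (llama.cpp commit 46e12c4).
# --pad-vocab pads the embedding/output tensors when the tokenizer
# vocab size doesn't match the model's declared vocab size.
python convert.py ./phillama-3.8b-v0.1 \
    --outtype f32 \
    --outfile phillama-3.8b-v0.1-f32.gguf \
    --pad-vocab

# Generate importance-matrix data from a calibration text (llama.cpp b2737).
./imatrix -m phillama-3.8b-v0.1-f32.gguf \
    -f calibration.txt \
    -o imatrix.dat

# Quantize using the imatrix, e.g. to Q4_K_M.
./quantize --imatrix imatrix.dat \
    phillama-3.8b-v0.1-f32.gguf \
    phillama-3.8b-v0.1-Q4_K_M.gguf Q4_K_M
```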

Did some more testing, and I think you're right. It did fine with simple questions (e.g. "What is the capital of Japan?"), but when I asked further questions it produced incomplete responses. With a larger context it's just an incoherent mess.
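
For reference, a quick sanity check like the one described above can be run with llama.cpp's main binary. The model filename, prompt file, and generation settings here are placeholders, not taken from this thread:

```bash
# Simple-question check (this kind of prompt works fine).
./main -m phillama-3.8b-v0.1-Q4_K_M.gguf \
    -p "What is the capital of Japan?" \
    -n 64

# Larger-context test: feed a longer prompt file with a bigger context
# window; this is where the output reportedly degrades into incoherence.
./main -m phillama-3.8b-v0.1-Q4_K_M.gguf \
    -f long_prompt.txt \
    -c 4096 -n 256
```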

raincandy-u changed discussion status to closed
