Performance of this Quant

#1
by ernestr - opened

Hey,

Thanks very much for quantizing this model! Downloading tonight. Are you able to provide any feedback on its performance compared to the GGUF? Did you see any issues with speed or coherence?

Owner

In my limited experience, EXL2 has always been much faster than GGUF, as long as you can load the entire model onto your GPUs. This model also seems slightly better than the original Mistral model. That is not surprising, because Tess comes from the legendary creator of the Synthia models, whom I respect a lot.

However, this Tess model is extremely sensitive to the prompt format. Make sure you are using the one provided in the model card. Otherwise, it will generate gibberish.
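
To make the prompt-format point concrete, here is a minimal sketch of how I would keep the format in one place so every request uses it. The template string below is a made-up placeholder, not the real Tess format, and `build_prompt` is just a hypothetical helper name; copy the exact template from the model card before using it.

```python
# Hypothetical placeholder template. This is NOT the actual Tess prompt format:
# replace it with the exact string given in the model card.
PROMPT_TEMPLATE = "[SYSTEM] {system}\n[USER] {user}\n[REPLY]"


def build_prompt(user: str, system: str = "", template: str = PROMPT_TEMPLATE) -> str:
    """Fill the template with the system and user messages.

    Keeping the format in a single constant means every request is built
    the same way, so a stray newline or missing role tag cannot creep in.
    """
    return template.format(system=system.strip(), user=user.strip())


if __name__ == "__main__":
    print(build_prompt("Summarize EXL2 vs GGUF in one line.", system="Be concise."))
```

Whatever backend you load the model with, pass it the string returned by `build_prompt` rather than the raw user message.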

Enjoy!

denru changed discussion status to closed
