Performance of this Quant
#1 by ernestr - opened
Hey,
Thanks very much for quantizing this model! Downloading tonight. Are you able to provide any feedback on its performance compared to the GGUF? Did you see any issues with performance or coherence?
If you can load the entire model onto GPUs, EXL2 is always much faster than GGUF, in my limited experience. This model seems to be slightly better than the original Mistral model. That is not surprising, because Tess comes from the legendary creator of the Synthia models, whom I respect a great deal.
However, this Tess model is extremely sensitive to the prompt format. Make sure you use the one provided in the model card; otherwise, it will generate gibberish.
Enjoy!
denru changed discussion status to closed