GGUF 8-bit model not recommended.
Hi! Thanks a lot for making these models accessible and fully open source!
I was looking at the different quantized models, and in the table the 8-bit variant is described as: "very large, extremely low quality loss - not recommended".
First of all, where can I find the results showing that the model has extremely low loss of quality? Could the use of "extremely" here be an exaggeration?
Second, why is it that this variant is not recommended?
Given you have the compute, wouldn't you prefer this over the smaller models?
I understand that the smaller models could be a better fit for some setups, but I don't get why this one is flatly not recommended without any explanation.
If the described model size and quality loss are relative to the other models in the table, then perhaps just call it "largest" and "least quality loss"?
We have not tested the quantized models yet. The use-case descriptions are from llama.cpp; they are their recommendations for the Llama and Mistral 7B models.
I have deleted the "recommendations"; they depend on how much VRAM you have available and seemed quite misleading. Thanks!
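
For what it's worth, whether Q8_0 makes sense mostly comes down to whether it fits in your VRAM. Here's a minimal sketch of that arithmetic. The bits-per-weight figures are rough approximations (they vary somewhat between llama.cpp versions), and the function and overhead constant are my own illustration, not anything from llama.cpp:

```python
# Back-of-the-envelope VRAM check for GGUF quants.
# Bits-per-weight values are approximate; overhead_gb is a rough
# placeholder for KV cache and buffers. Treat output as a ballpark.

QUANT_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,    # 32 int8 weights + one fp16 scale per block
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

def estimate_vram_gb(n_params_billion: float, quant: str,
                     overhead_gb: float = 1.5) -> float:
    """Approximate GPU memory (GB) needed to fully offload a model
    with n_params_billion parameters at the given quant level."""
    bits = QUANT_BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 1e9 + overhead_gb

if __name__ == "__main__":
    for q in QUANT_BITS_PER_WEIGHT:
        print(f"7B model as {q}: ~{estimate_vram_gb(7.0, q):.1f} GB")
```

By this estimate a 7B Q8_0 needs roughly 9 GB, so it fits comfortably on a 12 GB card, while on 8 GB you would drop to Q5/Q4, which is exactly the kind of trade-off a blanket "not recommended" glosses over.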