Full Perplexity Comparison Table for Release Documentation
Quantization Type | PPL | ln(PPL(Q)/PPL(fp16)) | File Size |
---|---|---|---|
bf16 | 14.0431 | 0.0 | 4.2G |
IQ2_XS | 28.9052 | 0.72189 | 1.5G |
IQ3_M | 15.1995 | 0.079131 | 1.7G |
IQ3_S | 15.8627 | 0.121839 | 1.7G |
IQ3_XS | 16.7197 | 0.174456 | 1.7G |
IQ3_XXS | 17.6216 | 0.226994 | 1.7G |
IQ4_NL | 14.5534 | 0.035693 | 1.9G |
IQ4_XS | 14.5638 | 0.036408 | 1.8G |
Q3_K_L | 15.0444 | 0.068875 | 1.8G |
Q3_K_M | 15.2582 | 0.082986 | 1.8G |
Q3_K_S | 15.839 | 0.120344 | 1.7G |
Q4_K_M | 14.399 | 0.025028 | 2.0G |
Q4_K_S | 14.4338 | 0.027442 | 1.9G |
Q5_K_M | 14.1299 | 0.006162 | 2.2G |
Q5_K_S | 14.1497 | 0.007562 | 2.1G |
Q6_K | 14.0675 | 0.001736 | 2.4G |
Q8_0 | 14.0495 | 0.000456 | 2.7G |
This table lists every quantization type tested, with its perplexity (PPL, lower is better), the log ratio ln(PPL(Q)/PPL(fp16)) against the unquantized baseline, and the file size. The baseline is the bf16 row, whose log ratio is 0.0.
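As a rough illustration of how the log-ratio column relates to the PPL column, the sketch below recomputes ln(PPL(Q)/PPL(baseline)) for a few rows. It assumes the bf16 row is the baseline. The function name `log_ppl_ratio` and the dictionary layout are hypothetical, chosen only for this example. Because the table's PPL values are rounded, the recomputed ratios may differ from the published ones in the last digits.

```python
import math

# PPL values copied from the table; bf16 is assumed to be the baseline row.
BASELINE_PPL = 14.0431

# A few rows from the table (hypothetical layout, for illustration only).
ppl_by_quant = {
    "IQ2_XS": 28.9052,
    "Q4_K_M": 14.399,
    "Q8_0": 14.0495,
}


def log_ppl_ratio(ppl_q, ppl_base=BASELINE_PPL):
    """Return ln(PPL(Q) / PPL(base)); 0.0 means no change from the baseline."""
    return math.log(ppl_q / ppl_base)


for name, value in ppl_by_quant.items():
    ratio = log_ppl_ratio(value)
    # exp(ratio) - 1 is the relative PPL increase over the baseline.
    increase_pct = (math.exp(ratio) - 1.0) * 100.0
    print(f"{name}: ln ratio = {ratio:.6f}, PPL increase = {increase_pct:.2f}%")
```

A log ratio near zero (as for Q8_0 and Q6_K) means the quantized model's perplexity is almost unchanged from the baseline, while a larger value (as for IQ2_XS) indicates a substantial degradation.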