salamandra-2b / quantization_results.md

Full Perplexity Comparison Table for Release Documentation

| Quantization Type | PPL | ln(PPL(Q)/PPL(fp16)) | File Size |
|---|---|---|---|
| bf16 | 14.0431 | 0.0 | 4.2G |
| IQ2_XS | 28.9052 | 0.72189 | 1.5G |
| IQ3_M | 15.1995 | 0.079131 | 1.7G |
| IQ3_S | 15.8627 | 0.121839 | 1.7G |
| IQ3_XS | 16.7197 | 0.174456 | 1.7G |
| IQ3_XXS | 17.6216 | 0.226994 | 1.7G |
| IQ4_NL | 14.5534 | 0.035693 | 1.9G |
| IQ4_XS | 14.5638 | 0.036408 | 1.8G |
| Q3_K_L | 15.0444 | 0.068875 | 1.8G |
| Q3_K_M | 15.2582 | 0.082986 | 1.8G |
| Q3_K_S | 15.839 | 0.120344 | 1.7G |
| Q4_K_M | 14.399 | 0.025028 | 2.0G |
| Q4_K_S | 14.4338 | 0.027442 | 1.9G |
| Q5_K_M | 14.1299 | 0.006162 | 2.2G |
| Q5_K_S | 14.1497 | 0.007562 | 2.1G |
| Q6_K | 14.0675 | 0.001736 | 2.4G |
| Q8_0 | 14.0495 | 0.000456 | 2.7G |

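This table lists every quantization type tested, with its perplexity (PPL), the log ratio ln(PPL(Q)/PPL(fp16)) relative to the unquantized baseline (the bf16 row here, which is why its ratio is 0.0), and its file size. Lower PPL and a log ratio closer to zero mean less degradation from quantization.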
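
As a minimal sketch of how the log-ratio column relates to the PPL column, the snippet below recomputes ln(PPL(Q)/PPL(baseline)) for a few rows, taking the bf16 PPL as the baseline. The names `BASELINE_PPL`, `ppl_by_quant`, and `log_ppl_ratio` are illustrative and not part of any tool used to produce the table. Because the PPL values are rounded to four decimals, the recomputed ratios may differ from the table in the last digit.

```python
import math

# bf16 perplexity from the table, used here as the baseline (assumption:
# the "fp16" in the column header refers to this unquantized row).
BASELINE_PPL = 14.0431

# A few PPL values copied from the table above.
ppl_by_quant = {
    "IQ2_XS": 28.9052,
    "IQ4_XS": 14.5638,
    "Q4_K_M": 14.399,
    "Q6_K": 14.0675,
    "Q8_0": 14.0495,
}


def log_ppl_ratio(ppl_q: float, ppl_base: float = BASELINE_PPL) -> float:
    """Return ln(PPL(Q) / PPL(baseline)); 0.0 means no measured degradation."""
    return math.log(ppl_q / ppl_base)


for name, ppl in ppl_by_quant.items():
    print(f"{name:8s} PPL={ppl:8.4f}  ln-ratio={log_ppl_ratio(ppl):.6f}")
```

A ratio near zero (as for Q8_0 and Q6_K) indicates perplexity almost identical to the baseline, while larger values (as for IQ2_XS) indicate a substantial increase.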