mradermacher committed
Commit: ab951ef
Parent(s): f0c542d
Update README.md

README.md CHANGED
@@ -39,7 +39,7 @@ The only fix seems to be to delete the repo, which unfortunately also deletes the
 
 The quant types I currently do regularly are:
 
-- static: (
+- static: (f16) Q8_0 Q4_K_S Q2_K Q6_K Q3_K_M Q3_K_S Q3_K_L Q4_K_M Q5_K_S Q5_K_M IQ4_XS (Q4_0_4)
 - imatrix: Q2_K Q4_K_S IQ3_XXS Q3_K_M Q4_K_M IQ2_M Q6_K IQ4_XS Q3_K_S Q3_K_L Q5_K_S Q5_K_M Q4_0 IQ3_XS IQ3_S IQ3_M IQ2_XXS IQ2_XS IQ2_S IQ1_M IQ1_S (Q4_0_4_4 Q4_0_4_8 Q4_0_8_8)
 
 And they are generally (but not always) generated in the order above, for which there are deep reasons.
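For readers unfamiliar with the distinction between the two lists above: static quants are produced directly from the full-precision GGUF, while imatrix quants first compute importance data from a calibration text and feed it into quantization. Below is a minimal sketch of that difference, driving llama.cpp's `llama-quantize` and `llama-imatrix` tools from Python; the file names, the calibration file, and the shortened quant lists are illustrative assumptions, not the actual pipeline behind these repos.

```python
# Minimal sketch: static vs. imatrix quantization via llama.cpp's CLI tools.
# All paths and the quant subsets are assumptions for illustration only.
import subprocess

SRC = "model-f16.gguf"                   # full-precision source GGUF (assumed name)
STATIC = ["Q8_0", "Q4_K_S", "Q2_K"]      # subset of the static list above
IMATRIX = ["IQ3_XXS", "IQ2_M", "IQ1_S"]  # subset of the imatrix list above

# Static quants need nothing but the source model.
for q in STATIC:
    subprocess.run(["llama-quantize", SRC, f"model-{q}.gguf", q], check=True)

# imatrix quants additionally need importance data computed from a calibration text.
subprocess.run(["llama-imatrix", "-m", SRC, "-f", "calibration.txt",
                "-o", "imatrix.dat"], check=True)
for q in IMATRIX:
    subprocess.run(["llama-quantize", "--imatrix", "imatrix.dat",
                    SRC, f"model-i1-{q}.gguf", q], check=True)
```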
@@ -48,18 +48,18 @@ For models less than 11B size, I experimentally generate f16 versions at the moment
 
 For models less than 15B in size, the "arm only" Q4_0_4_4 static and Q4_0_4_4/Q4_0_4_8/Q4_0_8_8 imatrix quants will be generated.
 
-Older models that pre-date the introduction of new quant types will generally have them retrofitted, hopefully
-this year. At least when multiple quant types are missing, as it is hard to justify a big model download
-for just one quant. If you want a quant from the above list and don't want to wait, feel free to request it and I will
-prioritize it to the best of my abilities.
-
 The (static) IQ3 quants are no longer generated, as they consistently seem to result in *much* lower quality
 quants than even static Q2_K, so it would be a disservice to offer them.
 
 I specifically do not do Q2_K_S, because I generally think it is not worth it, and IQ4_NL, because it requires
 a lot of computing and is generally completely superseded by IQ4_XS.
 
-
+Q8_0 imatrix quants do not exist - some quanters claim otherwise, but Q8_0 GGUFs do not contain any tensor
+type that uses the imatrix data, although technically it might be possible to do so.
+
+Older models that pre-date the introduction of new quant types will generally have them retrofitted on request.
+
+You can always try to change my mind about all this, but be prepared to bring convincing data.
 
 ## What does the "-i1" mean in "-i1-GGUF"?
 
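The Q8_0 point added above is easy to check for yourself by listing the tensor types a GGUF file actually contains. A minimal sketch, assuming the `gguf` Python package that ships with llama.cpp (`pip install gguf`) and an illustrative file name:

```python
# Count the tensor types present in a GGUF file. A Q8_0 file typically contains
# only Q8_0 tensors plus F32 for norms/biases; the file name is an assumption.
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("model-Q8_0.gguf")
print(Counter(t.tensor_type.name for t in reader.tensors))
```

If no tensor type in the output is one whose quantization consults imatrix data, an imatrix file cannot have influenced the result, which is the point the paragraph above is making.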