InferenceIllusionist committed
Commit 61b79de
Parent(s): 9e8a02d
Update README.md

README.md CHANGED
@@ -15,10 +15,9 @@ After testing the new importance matrix quants for 11b and 8x7b models and being
 
 <b>❗❗Need a different quantization/model? Please open a community post and I'll get back to you - thanks ❗❗ </b>
 
-Newer quants (
+<i>UPDATE 3/4/24: Newer quants ([IQ4_XS](https://github.com/ggerganov/llama.cpp/pull/5747), IQ2_S, etc) are confirmed working in Koboldcpp as of version <b>[1.60](https://github.com/LostRuins/koboldcpp/releases/tag/v1.60)</b> - if you run into any issues kindly let me know.</i>
 
-
-This should provide a significant speed boost even if you are offloading to CPU.
+IQ3_S has been generated after PR [#5829](https://github.com/ggerganov/llama.cpp/pull/5829) was merged. This should provide a significant speed boost even if you are offloading to CPU.
 
 (Credits to [TeeZee](https://huggingface.co/TeeZee/) for the original model and [ikawrakow](https://github.com/ikawrakow) for the stellar work on IQ quants)
 