Commit 9e8a02d by InferenceIllusionist (parent: 594cbdb): Update README.md
<b>❗❗ Need a different quantization/model? Please open a community post and I'll get back to you - thanks ❗❗</b>

Newer quants (IQ3_S, IQ4_NL, etc.) are confirmed working in Koboldcpp as of 1.59.1 - if you run into any issues, kindly let me know.

<s>For IQ3_S only: Please offload to GPU completely for best speed.</s> Update: No longer necessary. IQ3_S has been generated after PR [#5829](https://github.com/ggerganov/llama.cpp/pull/5829) was merged. This should provide a significant speed boost even when part of the model is running on CPU.

(Credits to [TeeZee](https://huggingface.co/TeeZee/) for the original model and [ikawrakow](https://github.com/ikawrakow) for the stellar work on IQ quants)