Commit 9e8a02d by InferenceIllusionist (parent: 594cbdb): Update README.md
<b>❗❗ Need a different quantization/model? Please open a community post and I'll get back to you - thanks ❗❗</b>

Newer quants (IQ3_S, IQ4_NL, etc.) are confirmed working in Koboldcpp as of 1.59.1 - if you run into any issues, kindly let me know.

<s>For IQ3_S only: Please offload to GPU completely for best speed.</s> Update: No longer necessary. IQ3_S has been generated after PR [#5829](https://github.com/ggerganov/llama.cpp/pull/5829) was merged. This should provide a significant speed boost even when part of the model is running on CPU.

(Credits to [TeeZee](https://huggingface.co/TeeZee/) for the original model and [ikawrakow](https://github.com/ikawrakow) for the stellar work on IQ quants)