InferenceIllusionist committed on
Commit
61b79de
1 Parent(s): 9e8a02d

Update README.md

Files changed (1): README.md (+2 -3)
README.md CHANGED

```diff
@@ -15,10 +15,9 @@ After testing the new important matrix quants for 11b and 8x7b models and being
 
 <b>❗❗Need a different quantization/model? Please open a community post and I'll get back to you - thanks ❗❗ </b>
 
-Newer quants (IQ3_S, IQ4_NL, etc) are confirmed working in Koboldcpp as of 1.59.1 - if you run into any issues kindly let me know.
+<i>UPDATE 3/4/24: Newer quants ([IQ4_XS](https://github.com/ggerganov/llama.cpp/pull/5747), IQ2_S, etc) are confirmed working in Koboldcpp as of version <b>[1.60](https://github.com/LostRuins/koboldcpp/releases/tag/v1.60)</b> - if you run into any issues kindly let me know.</i>
 
-<s>For IQ3_S only: Please offload to GPU completely for best speed.</s> Update: No longer necessary. IQ3_S has been generated after PR [#5829](https://github.com/ggerganov/llama.cpp/pull/5829) was merged.
-This should provide a significant speed boost even if you are offloading to CPU.
+IQ3_S has been generated after PR [#5829](https://github.com/ggerganov/llama.cpp/pull/5829) was merged. This should provide a significant speed boost even if you are offloading to CPU.
 
 (Credits to [TeeZee](https://huggingface.co/TeeZee/) for the original model and [ikawrakow](https://github.com/ikawrakow) for the stellar work on IQ quants)
 
```