About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model (gemma2 architecture, 9.24B parameters). All quants except Q6_K and Q8_0 were made with the imatrix quantization method.

Prompt template: Gemma (recommended temperature: 0.3-0.5)

<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n
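
For clarity, here is a small Python sketch of how the template above can be assembled; the helper name `build_gemma_prompt` and the example question are illustrative only and not part of this repository.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in the Gemma chat template shown above."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# The recommended sampling temperature for this model is 0.3-0.5.
prompt = build_gemma_prompt("Who is Princess Luna?")
temperature = 0.4
print(prompt)
```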

Provided files

| Name | Quant method | Bits | Size | Min RAM required | Use case |
| ---- | ------------ | ---- | ---- | ---------------- | -------- |
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
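
Any file from the table can also be fetched with the `huggingface_hub` Python package. This is a minimal sketch, assuming the package is installed (`pip install huggingface_hub`); it picks the Q4_K_M file only because that is the recommended default.

```python
from huggingface_hub import hf_hub_download

# Download one GGUF file from this repository; swap the filename for any
# other entry in the table above to get a different quant.
model_path = hf_hub_download(
    repo_id="secretmoon/WoonaV1.2-9b-GGUF-Imatrix",
    filename="WoonaV1.2-9b-imat-Q4_K_M.gguf",
)
print(model_path)  # local path to the cached GGUF file
```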

How to Use

  • llama.cpp - the open-source framework for running GGUF models, on which all the other interfaces are built (see the Python sketch after this list).
  • koboldcpp - an easy option for Windows inference; a lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
  • LM Studio - a proprietary, free llama.cpp-based application with a graphical interface.
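
Below is a minimal inference sketch using the `llama-cpp-python` bindings for llama.cpp. It assumes the package is installed (`pip install llama-cpp-python`) and that `model_path` points to one of the GGUF files above; the parameter values are reasonable starting points, not settings provided by the model author.

```python
from llama_cpp import Llama

# Load the quantized model. n_gpu_layers=-1 offloads every layer to the GPU
# if the bindings were built with GPU support; use 0 for CPU-only inference.
llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# Gemma prompt format with a temperature inside the recommended 0.3-0.5 range.
prompt = (
    "<start_of_turn>user\n"
    "Tell me a short story about Princess Luna.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.4,
    stop=["<end_of_turn>"],
)
print(output["choices"][0]["text"])
```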

Model tree for secretmoon/WoonaV1.2-9b-GGUF-Imatrix

Base model: google/gemma-2-9b
Quantized: this model