About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model (gemma2 architecture, 9.24B parameters). All quants except Q6_K and Q8_0 were made with the imatrix quantization method.

Prompt template: Gemma (recommended temperature: 0.3-0.5)

<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n
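
For clarity, here is a small Python sketch of how the template above can be assembled; the helper name `build_gemma_prompt` and the example question are illustrative only and not part of this repository.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in the Gemma chat template shown above."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# The recommended sampling temperature for this model is 0.3-0.5.
prompt = build_gemma_prompt("Who is Princess Luna?")
temperature = 0.4
print(prompt)
```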

Provided files

| Name | Quant method | Bits | Size | Min RAM required | Use case |
| ---- | ------------ | ---- | ---- | ---------------- | -------- |
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
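
Any file from the table can also be fetched with the `huggingface_hub` Python package. This is a minimal sketch, assuming the package is installed (`pip install huggingface_hub`); it picks the Q4_K_M file only because that is the recommended default.

```python
from huggingface_hub import hf_hub_download

# Download one GGUF file from this repository; swap the filename for any
# other entry in the table above to get a different quant.
model_path = hf_hub_download(
    repo_id="secretmoon/WoonaV1.2-9b-GGUF-Imatrix",
    filename="WoonaV1.2-9b-imat-Q4_K_M.gguf",
)
print(model_path)  # local path to the cached GGUF file
```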

How to Use

  • llama.cpp - the open-source framework for running GGUF models, on which all the other interfaces are built (see the Python sketch after this list).
  • koboldcpp - an easy option for Windows inference; a lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
  • LM Studio - a proprietary, free llama.cpp-based application with a graphical interface.
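
Below is a minimal inference sketch using the `llama-cpp-python` bindings for llama.cpp. It assumes the package is installed (`pip install llama-cpp-python`) and that `model_path` points to one of the GGUF files above; the parameter values are reasonable starting points, not settings provided by the model author.

```python
from llama_cpp import Llama

# Load the quantized model. n_gpu_layers=-1 offloads every layer to the GPU
# if the bindings were built with GPU support; use 0 for CPU-only inference.
llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# Gemma prompt format with a temperature inside the recommended 0.3-0.5 range.
prompt = (
    "<start_of_turn>user\n"
    "Tell me a short story about Princess Luna.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.4,
    stop=["<end_of_turn>"],
)
print(output["choices"][0]["text"])
```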

Model tree for secretmoon/WoonaV1.2-9b-GGUF-Imatrix

Base model: google/gemma-2-9b
Quantized: this model