fedric95/gemma-2-9b-GGUF

Llamacpp Quantizations of gemma-2-9b

Using llama.cpp release b3583 for quantization.

Filename	Quant type	File Size	Perplexity (wikitext-2-raw-v1.test)
gemma-2-9b.FP32.gguf	FP32	37.00GB	6.9209 +/- 0.04660
gemma-2-9b-Q8_0.gguf	Q8_0	9.83GB	6.9222 +/- 0.04660
gemma-2-9b-Q6_K.gguf	Q6_K	7.59GB	6.9353 +/- 0.04675
gemma-2-9b-Q5_K_M.gguf	Q5_K_M	6.65GB	6.9571 +/- 0.04687
gemma-2-9b-Q5_K_S.gguf	Q5_K_S	6.48GB	6.9623 +/- 0.04690
gemma-2-9b-Q4_K_M.gguf	Q4_K_M	5.76GB	7.0220 +/- 0.04737
gemma-2-9b-Q4_K_S.gguf	Q4_K_S	5.48GB	7.0622 +/- 0.04777
gemma-2-9b-Q3_K_L.gguf	Q3_K_L	5.13GB	7.2144 +/- 0.04910
gemma-2-9b-Q3_K_M.gguf	Q3_K_M	4.76GB	7.2849 +/- 0.04970
gemma-2-9b-Q3_K_S.gguf	Q3_K_S	4.34GB	7.6869 +/- 0.05373
gemma-2-9b-Q2_K.gguf	Q2_K	3.81GB	8.7979 +/- 0.06191
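
To make the size/quality trade-off in the table easier to compare, the sketch below expresses each quantization relative to the FP32 baseline. The figures are copied from the table; the names `RESULTS` and `relative_to_fp32` are illustrative and not part of llama.cpp or this repository.

```python
# Figures copied from the table above: quant type -> (file size in GB, perplexity).
RESULTS = {
    "FP32": (37.00, 6.9209),
    "Q8_0": (9.83, 6.9222),
    "Q6_K": (7.59, 6.9353),
    "Q5_K_M": (6.65, 6.9571),
    "Q5_K_S": (6.48, 6.9623),
    "Q4_K_M": (5.76, 7.0220),
    "Q4_K_S": (5.48, 7.0622),
    "Q3_K_L": (5.13, 7.2144),
    "Q3_K_M": (4.76, 7.2849),
    "Q3_K_S": (4.34, 7.6869),
    "Q2_K": (3.81, 8.7979),
}


def relative_to_fp32(quant):
    """Return (size as a fraction of FP32, perplexity increase in percent)."""
    base_size, base_ppl = RESULTS["FP32"]
    size, ppl = RESULTS[quant]
    return size / base_size, 100.0 * (ppl - base_ppl) / base_ppl


if __name__ == "__main__":
    for quant in RESULTS:
        size_frac, ppl_increase = relative_to_fp32(quant)
        print(f"{quant:8s} size x{size_frac:.3f}  perplexity +{ppl_increase:.2f}%")
```

Read this way, Q4_K_M is roughly 16% of the FP32 file size with a perplexity increase of about 1.5%, while Q2_K is roughly 10% of the size with an increase of about 27%.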

Results have been computed using:

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/gemma-2-9b-GGUF --include "gemma-2-9b-Q4_K_M.gguf" --local-dir ./
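
The same single-file download can also be done from Python. This is a minimal sketch, assuming the `huggingface_hub` package installed above; `gguf_filename` and `download_quant` are hypothetical helper names, and the file naming rule is inferred from the table.

```python
REPO_ID = "fedric95/gemma-2-9b-GGUF"


def gguf_filename(quant):
    """Build the file name listed in the table for a given quant type."""
    # In the table, the FP32 file uses a dot separator; the others use a hyphen.
    sep = "." if quant == "FP32" else "-"
    return f"gemma-2-9b{sep}{quant}.gguf"


def download_quant(quant, local_dir="."):
    """Download one quantized file and return its local path."""
    # Imported lazily so gguf_filename() works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )


# Example (performs a multi-gigabyte download when uncommented):
# path = download_quant("Q4_K_M")
```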

If a model file is larger than 50GB, it will have been split into multiple files (none of the files in the table above reaches that size). To download all parts of a split model to a local folder, run:

huggingface-cli download fedric95/gemma-2-9b-GGUF --include "gemma-2-9b-Q8_0.gguf/*" --local-dir gemma-2-9b-Q8_0

You can either specify a new --local-dir (gemma-2-9b-Q8_0) or download the files in place (./).
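
For the split-file case, a rough Python equivalent is sketched below. It assumes `huggingface_hub` is installed and uses `snapshot_download` with an `allow_patterns` glob that mirrors the `--include` argument; `split_pattern`, `download_split`, and the example part name are hypothetical and only illustrate what the pattern selects.

```python
from fnmatch import fnmatch

REPO_ID = "fedric95/gemma-2-9b-GGUF"


def split_pattern(quant):
    """Glob pattern selecting every file inside a split quant's folder."""
    return f"gemma-2-9b-{quant}.gguf/*"


def download_split(quant, local_dir):
    """Download all parts of a split quant into local_dir."""
    # Imported lazily so split_pattern() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[split_pattern(quant)],
        local_dir=local_dir,
    )


# Hypothetical part name, used only to show what the pattern matches.
example_part = "gemma-2-9b-Q8_0.gguf/part-1.gguf"
print(fnmatch(example_part, split_pattern("Q8_0")))
```

Calling `download_split("Q8_0", "gemma-2-9b-Q8_0")` corresponds to the command above; passing `"."` as `local_dir` downloads in place.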