
Llamacpp Quantizations of Qwen2-7B

Using llama.cpp release b3583 for quantization.

Original model: https://huggingface.co/Qwen/Qwen2-7B

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| -------- | ---------- | --------- | ----------------------------------- |
| Qwen2-7B.BF16.gguf | BF16 | 15.2GB | coming soon |
| Qwen2-7B-Q8_0.gguf | Q8_0 | 8.1GB | 7.3817 +/- 0.04777 |
| Qwen2-7B-Q6_K.gguf | Q6_K | 6.25GB | 7.3914 +/- 0.04776 |
| Qwen2-7B-Q5_K_M.gguf | Q5_K_M | 5.44GB | 7.4067 +/- 0.04794 |
| Qwen2-7B-Q5_K_S.gguf | Q5_K_S | 5.32GB | 7.4291 +/- 0.04822 |
| Qwen2-7B-Q4_K_M.gguf | Q4_K_M | 4.68GB | 7.4796 +/- 0.04856 |
| Qwen2-7B-Q4_K_S.gguf | Q4_K_S | 4.46GB | 7.5221 +/- 0.04879 |
| Qwen2-7B-Q3_K_L.gguf | Q3_K_L | 4.09GB | 7.6843 +/- 0.05000 |
| Qwen2-7B-Q3_K_M.gguf | Q3_K_M | 3.81GB | 7.7390 +/- 0.05015 |
| Qwen2-7B-Q3_K_S.gguf | Q3_K_S | 3.49GB | 9.3743 +/- 0.06023 |
| Qwen2-7B-Q2_K.gguf | Q2_K | 3.02GB | 10.5122 +/- 0.06850 |
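As a rough guide for picking a file from the table above, you generally want the largest quant whose file fits in your available RAM/VRAM with some headroom left for the context/KV cache. A minimal sketch (the sizes come from the table; the 2GB headroom rule of thumb is an assumption, not an official recommendation):

```python
# File sizes (GB) from the quant table above, ordered largest to smallest.
QUANTS = [
    ("BF16", 15.2), ("Q8_0", 8.1), ("Q6_K", 6.25), ("Q5_K_M", 5.44),
    ("Q5_K_S", 5.32), ("Q4_K_M", 4.68), ("Q4_K_S", 4.46), ("Q3_K_L", 4.09),
    ("Q3_K_M", 3.81), ("Q3_K_S", 3.49), ("Q2_K", 3.02),
]

def pick_quant(available_gb: float, headroom_gb: float = 2.0):
    """Return the largest quant that fits in available memory minus
    headroom, or None if even Q2_K does not fit."""
    budget = available_gb - headroom_gb
    for name, size in QUANTS:
        if size <= budget:
            return name
    return None

print(pick_quant(8.0))   # 8 GB GPU with 2 GB headroom -> Q5_K_M
```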

Benchmark Results

| Benchmark | Quant type | Score |
| --------- | ---------- | ----- |
| WinoGrande (0-shot) | Q8_0 | 71.8232 +/- 1.2643 |
| WinoGrande (0-shot) | Q4_K_M | 71.3496 +/- 1.2707 |
| WinoGrande (0-shot) | Q3_K_M | 70.1657 +/- 1.2859 |
| WinoGrande (0-shot) | Q3_K_S | 70.3236 +/- 1.2839 |
| WinoGrande (0-shot) | Q2_K | 68.2715 +/- 1.3081 |
| HellaSwag (0-shot) | Q8_0 | 78.00238996 |
| HellaSwag (0-shot) | Q4_K_M | 77.92272456 |
| HellaSwag (0-shot) | Q3_K_M | 76.97669787 |
| HellaSwag (0-shot) | Q3_K_S | 74.96514639 |
| HellaSwag (0-shot) | Q2_K | 72.71459869 |
| MMLU (0-shot) | Q8_0 | 39.1473 +/- 1.2409 |
| MMLU (0-shot) | Q4_K_M | 38.5013 +/- 1.2372 |
| MMLU (0-shot) | Q3_K_M | 38.0491 +/- 1.2344 |
| MMLU (0-shot) | Q3_K_S | 39.3411 +/- 1.2420 |
| MMLU (0-shot) | Q2_K | 35.4005 +/- 1.2158 |
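One way to read the perplexity numbers is as a relative increase over the Q8_0 baseline (BF16 is not yet reported): Q4_K_M costs only about 1.3% extra perplexity, while Q3_K_S jumps by roughly 27%. A small sketch computing this from the table values:

```python
# wikitext-2 perplexities from the quant table above.
PPL = {
    "Q8_0": 7.3817, "Q6_K": 7.3914, "Q5_K_M": 7.4067, "Q5_K_S": 7.4291,
    "Q4_K_M": 7.4796, "Q4_K_S": 7.5221, "Q3_K_L": 7.6843, "Q3_K_M": 7.7390,
    "Q3_K_S": 9.3743, "Q2_K": 10.5122,
}

def ppl_increase_pct(quant: str, baseline: str = "Q8_0") -> float:
    """Percent perplexity increase relative to the baseline quant."""
    return 100.0 * (PPL[quant] - PPL[baseline]) / PPL[baseline]

for q in PPL:
    print(f"{q:7s} +{ppl_increase_pct(q):5.2f}%")
```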

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```shell
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```shell
huggingface-cli download fedric95/Qwen2-7B-GGUF --include "Qwen2-7B-Q4_K_M.gguf" --local-dir ./
```

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

```shell
huggingface-cli download fedric95/Qwen2-7B-GGUF --include "Qwen2-7B-Q8_0.gguf/*" --local-dir Qwen2-7B-Q8_0
```

You can either specify a new local-dir (Qwen2-7B-Q8_0) or download all the parts in place (./).
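After downloading, you can sanity-check that a file really is GGUF: per the GGUF specification in the llama.cpp repository, every file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 version. A minimal sketch (the stub file at the end is only for demonstration; point the check at your downloaded file instead):

```python
import struct

def is_gguf(path: str) -> bool:
    """Check the GGUF magic bytes and version field of a file header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version >= 1

# Demonstration with a stub header; a real check would use the
# downloaded .gguf file instead.
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))
print(is_gguf("demo.gguf"))  # -> True
```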

Reproducibility

Follow the same instructions as in: https://github.com/ggerganov/llama.cpp/discussions/9020#discussioncomment-10335638

Model details

Model size: 7.62B params
Architecture: qwen2
Base model: Qwen/Qwen2-7B (this repository contains its quantized versions)