---
license: apache-2.0
---

This repository contains improved Mistral-7B quantized models in GGUF format for use with `llama.cpp`. The models are fully compatible with the official `llama.cpp` release and can be used out of the box.
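
The files can be run directly with the official `llama.cpp` tools; as one alternative, the sketch below loads a file through the `llama-cpp-python` bindings. The bindings, the local path, and the prompt are assumptions for illustration only and are not part of this repository.

```python
# Minimal sketch, assuming the llama-cpp-python bindings are installed
# (pip install llama-cpp-python) and that mistral-7b-q4km.gguf from the
# table below has been downloaded; the local path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-q4km.gguf",  # any GGUF file from this repository
    n_ctx=512,                            # context length used for the PPL numbers below
)

out = llm("The Mistral is a strong, cold wind that", max_tokens=32)
print(out["choices"][0]["text"])
```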

The table below compares these models with the current `llama.cpp` quantization approach, using Wikitext perplexities (PPL) at a context length of 512 tokens. The "Quantization Error" columns are defined as `(PPL(quantized model) - PPL(fp16)) / PPL(fp16)`; a short sketch of this computation is given after the table.

| Quantization | Model file | PPL (llama.cpp) | Quantization Error (llama.cpp) | PPL (new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
| Q3_K_S | mistral-7b-q3ks.gguf | 6.0692 | 6.62% | 6.0021 | 5.44% |
| Q3_K_M | mistral-7b-q3km.gguf | 5.8894 | 3.46% | 5.8489 | 2.75% |
| Q4_K_S | mistral-7b-q4ks.gguf | 5.7764 | 1.48% | 5.7349 | 0.75% |
| Q4_K_M | mistral-7b-q4km.gguf | 5.7539 | 1.08% | 5.7259 | 0.59% |
| Q5_K_S | mistral-7b-q5ks.gguf | 5.7258 | 0.59% | 5.7100 | 0.31% |
| Q4_0 | mistral-7b-q40.gguf | 5.8189 | 2.23% | 5.7924 | 1.76% |
| Q4_1 | mistral-7b-q41.gguf | 5.8244 | 2.32% | 5.7455 | 0.94% |
| Q5_0 | mistral-7b-q50.gguf | 5.7180 | 0.45% | 5.7070 | 0.26% |
| Q5_1 | mistral-7b-q51.gguf | 5.7128 | 0.36% | 5.7057 | 0.24% |
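
As a minimal sketch of the "Quantization Error" formula above, the snippet below recomputes the error for the Q4_K_M entry. The fp16 perplexity used here is a hypothetical placeholder, since the fp16 PPL is not listed in this table.

```python
def quantization_error(ppl_quant: float, ppl_fp16: float) -> float:
    """Relative PPL increase of a quantized model over fp16:
    (PPL(quantized model) - PPL(fp16)) / PPL(fp16)."""
    return (ppl_quant - ppl_fp16) / ppl_fp16

# PPL(new quants) for Q4_K_M, taken from the table above.
ppl_q4km = 5.7259
# Hypothetical fp16 Wikitext PPL at context 512 -- a placeholder, not a
# measured value from this repository.
ppl_fp16 = 5.69

print(f"{quantization_error(ppl_q4km, ppl_fp16):.2%}")  # ~0.63% with this placeholder
```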

In addition, a 2-bit model is provided (`mistral-7b-q2k-extra-small.gguf`). It has a perplexity of `6.7099` for a context length of 512, and `5.5744` for a context length of 4096.