readme: update info
README.md CHANGED
@@ -22,7 +22,7 @@ Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
 Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid release of llama.cpp builds, this will likely change over time.
 
-**
+**Please set the metadata KV overrides below.**
 
 # Usage:
 
@@ -85,7 +85,8 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 |----------|-------------|-----------|--------------------------------------------|-------------|----------|-------|
 | BF16 | Available | 439 GB | Lossless :) | Old | No | Q8_0 is sufficient for most cases |
 | Q8_0 | Available | 233.27 GB | High quality *recommended* | Updated | Yes | |
-
+| Q8_0 | Available | ~110 GB | High quality *recommended* | Updated | Yes | |
+| Q5_K_M | Available | 155 GB | Medium-high quality *recommended* | Updated | Yes | |
 | Q4_K_M | Available | 132 GB | Medium quality *recommended* | Old | No | |
 | Q3_K_M | Available | 104 GB | Medium-low quality | Updated | Yes | |
 | IQ3_XS | Available | 89.6 GB | Better than Q3_K_M | Old | Yes | |
@@ -101,7 +102,6 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 | Q5_K_S | |
 | Q4_K_S | |
 | Q3_K_S | |
-| Q6_K | |
 | IQ4_XS | |
 | IQ2_XS | |
 | IQ2_S | |
@@ -118,10 +118,6 @@ deepseek2.leading_dense_block_count=int:1
 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
-Quants with "Updated" metadata contain these parameters, so as long as you're running a supported build of llama.cpp no `--override-kv` parameters are required.
-
-A precompiled Windows AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
-
 # License:
 - DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo
 - MIT license for any repo code
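For quants tagged "Old" (which lack the updated metadata), the keys shown in the diff can still be supplied at load time via llama.cpp's `--override-kv KEY=TYPE:VALUE` flag. A minimal sketch, assuming a b3026-era build (binary named `main`); the GGUF filename is a placeholder:

```shell
# Pass the DeepSeek-V2 metadata overrides on the command line when the
# GGUF file itself was converted without them. Values are taken from the
# override block in the README; the model path below is hypothetical.
./main -m DeepSeek-V2-Chat.Q4_K_M.gguf \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```

Quants tagged "Updated" already carry these keys in their metadata, so no `--override-kv` flags are needed there.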