readme: update info
README.md CHANGED
@@ -22,7 +22,7 @@ Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
 Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid release of llama.cpp builds, this will likely change over time.
 
-**
+**Please set the metadata KV overrides below.**
 
 # Usage:
 
@@ -85,7 +85,8 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 |----------|-------------|-----------|--------------------------------------------|-------------|----------|-------|
 | BF16 | Available | 439 GB | Lossless :) | Old | No | Q8_0 is sufficient for most cases |
 | Q8_0 | Available | 233.27 GB | High quality *recommended* | Updated | Yes | |
-
+| Q8_0 | Available | ~110 GB | High quality *recommended* | Updated | Yes | |
+| Q5_K_M | Available | 155 GB | Medium-high quality *recommended* | Updated | Yes | |
 | Q4_K_M | Available | 132 GB | Medium quality *recommended* | Old | No | |
 | Q3_K_M | Available | 104 GB | Medium-low quality | Updated | Yes | |
 | IQ3_XS | Available | 89.6 GB | Better than Q3_K_M | Old | Yes | |
@@ -101,7 +102,6 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 | Q5_K_S | |
 | Q4_K_S | |
 | Q3_K_S | |
-| Q6_K | |
 | IQ4_XS | |
 | IQ2_XS | |
 | IQ2_S | |
@@ -118,10 +118,6 @@ deepseek2.leading_dense_block_count=int:1
 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
-Quants with "Updated" metadata contain these parameters, so as long as you're running a supported build of llama.cpp no `--override-kv` parameters are required.
-
-A precompiled Windows AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
-
 # License:
 - DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo
 - MIT license for any repo code
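For quants tagged "Old" (which lack the updated metadata), the keys shown in the diff can still be supplied at load time via llama.cpp's `--override-kv KEY=TYPE:VALUE` flag. A minimal sketch, assuming a b3026-era build (binary named `main`); the GGUF filename is a placeholder:

```shell
# Pass the DeepSeek-V2 metadata overrides on the command line when the
# GGUF file itself was converted without them. Values are taken from the
# override block in the README; the model path below is hypothetical.
./main -m DeepSeek-V2-Chat.Q4_K_M.gguf \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```

Quants tagged "Updated" already carry these keys in their metadata, so no `--override-kv` flags are needed there.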