Update README.md
README.md CHANGED

@@ -16,15 +16,19 @@ Based on original model: [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)
 Created by: [Qwen](https://huggingface.co/Qwen)
 
 ## Quants
-
-
-[
-[
-[
-[
+|Quant|VRAM/4k|VRAM/8k|VRAM/16k|VRAM/32k|
+|:---|:---|:---|:---|:---|
+|[4bpw h6 (main)](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/main) | 5.3GB | 5.6GB | 5.9GB | 6.8GB |
+|[4.25bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/4.25bpw-h6) | 5.5GB | 5.8GB | 6.2GB | 7.1GB |
+|[4.65bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/4.65bpw-h6) | 5.8GB | 6.1GB | 6.5GB | 7.3GB |
+|[5bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/5bpw-h6) | 6GB | 6.4GB | 6.7GB | 7.7GB |
+|[6bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/6bpw-h6) | 6.8GB | 7.2GB | 7.5GB | 8.4GB |
+|[8bpw h8](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/8bpw-h8) | 8.2GB | 8.6GB | 8.9GB | 9.8GB |
 
 ## Quantization notes
-Made with Exllamav2 0.1.5 and the default dataset.
+Made with Exllamav2 0.1.5 and the default calibration dataset.
+Doesn't seem to work with 4-bit or 8-bit cache in Exllamav2 0.1.5; this may change in future versions.
+I'm quite impressed with its ability to process non-English text at 32k context with usable results on my 12GB GPU, and at 8bpw precision at that.
 
 ## How to run
 
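For convenience, here's one way to fetch a specific quant from the table above, since each one lives on its own branch of the repo. A minimal sketch using the `huggingface_hub` Python API; the `local_dir` name is just an example:

```python
from huggingface_hub import snapshot_download

# Each quant is stored on its own branch, so select it via `revision`.
snapshot_download(
    repo_id="cgus/Qwen2-7B-Instruct-abliterated-exl2",
    revision="4.25bpw-h6",  # branch name from the Quants table
    local_dir="Qwen2-7B-Instruct-abliterated-exl2-4.25bpw-h6",  # example path
)
```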
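And a rough sketch of loading a downloaded quant with the Exllamav2 Python generator API, following the library's own generator example; the model path and prompt are placeholders, and it sticks to the default FP16 cache since, per the notes above, the quantized cache variants don't seem to work with this model:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "Qwen2-7B-Instruct-abliterated-exl2-4.25bpw-h6"  # placeholder path

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
# Default FP16 cache at the full 32k context; the 4-bit/8-bit cache variants
# reportedly fail with this model under Exllamav2 0.1.5.
cache = ExLlamaV2Cache(model, max_seq_len=32768, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Pass paged=False here if flash-attn isn't installed.
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Qwen2 expects the ChatML prompt format.
prompt = "<|im_start|>user\nHello, who are you?<|im_end|>\n<|im_start|>assistant\n"
output = generator.generate(
    prompt=prompt,
    max_new_tokens=200,
    add_bos=False,
    stop_conditions=[tokenizer.eos_token_id],
)
print(output)
```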