Update README.md
README.md CHANGED

@@ -16,15 +16,19 @@ Based on original model: [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)
 Created by: [Qwen](https://huggingface.co/Qwen)
 
 ## Quants
-
-
-[
-[
-[
-[
+|Quant|VRAM/4k|VRAM/8k|VRAM/16k|VRAM/32k|
+|:---|:---|:---|:---|:---|
+|[4bpw h6 (main)](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/main) | 5.3GB | 5.6GB | 5.9GB | 6.8GB |
+|[4.25bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/4.25bpw-h6) | 5.5GB | 5.8GB | 6.2GB | 7.1GB |
+|[4.65bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/4.65bpw-h6) | 5.8GB | 6.1GB | 6.5GB | 7.3GB |
+|[5bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/5bpw-h6) | 6GB | 6.4GB | 6.7GB | 7.7GB |
+|[6bpw h6](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/6bpw-h6) | 6.8GB | 7.2GB | 7.5GB | 8.4GB |
+|[8bpw h8](https://huggingface.co/cgus/Qwen2-7B-Instruct-abliterated-exl2/tree/8bpw-h8) | 8.2GB | 8.6GB | 8.9GB | 9.8GB |
 
 ## Quantization notes
-Made with Exllamav2 0.1.5 and the default dataset.
+Made with Exllamav2 0.1.5 and the default calibration dataset.
+Doesn't seem to work with 4-bit or 8-bit cache in Exllamav2 0.1.5; this may change in future versions.
+I'm quite impressed with its ability to process non-English text at 32k context with usable results on my 12GB GPU, and at 8bpw precision at that.
 
 ## How to run
 
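For convenience, here's one way to fetch a specific quant from the table above, since each one lives on its own branch of the repo. A minimal sketch using the `huggingface_hub` Python API; the `local_dir` name is just an example:

```python
from huggingface_hub import snapshot_download

# Each quant is stored on its own branch, so select it via `revision`.
snapshot_download(
    repo_id="cgus/Qwen2-7B-Instruct-abliterated-exl2",
    revision="4.25bpw-h6",  # branch name from the Quants table
    local_dir="Qwen2-7B-Instruct-abliterated-exl2-4.25bpw-h6",  # example path
)
```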
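And a rough sketch of loading a downloaded quant with the Exllamav2 Python generator API, following the library's own generator example; the model path and prompt are placeholders, and it sticks to the default FP16 cache since, per the notes above, the quantized cache variants don't seem to work with this model:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "Qwen2-7B-Instruct-abliterated-exl2-4.25bpw-h6"  # placeholder path

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
# Default FP16 cache at the full 32k context; the 4-bit/8-bit cache variants
# reportedly fail with this model under Exllamav2 0.1.5.
cache = ExLlamaV2Cache(model, max_seq_len=32768, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Pass paged=False here if flash-attn isn't installed.
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Qwen2 expects the ChatML prompt format.
prompt = "<|im_start|>user\nHello, who are you?<|im_end|>\n<|im_start|>assistant\n"
output = generator.generate(
    prompt=prompt,
    max_new_tokens=200,
    add_bos=False,
    stop_conditions=[tokenizer.eos_token_id],
)
print(output)
```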