InferenceIllusionist committed
Commit 1b17f50 • Parent(s): b03731a
Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
 
 <b>Special request.</b> Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead.
-* The [.imatrix](https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF/resolve/main/Llama-3-70B-Instruct-Storywriter.imatrix?download=true) file in this repo was created using
+* The [.imatrix](https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF/resolve/main/Llama-3-70B-Instruct-Storywriter.imatrix?download=true) file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF.
 * Calculated in 88 chunks with n_ctx=512 using groups_merged.txt
 
 For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
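
The calibration step described in the README (an importance matrix computed from the Q8_0 quant over groups_merged.txt in 512-token chunks) could be sketched with llama.cpp's imatrix tool roughly as follows; the local .gguf filename and file paths are assumptions, not taken from the repo:

```shell
# Hedged sketch of the imatrix calibration described above, using
# llama.cpp's imatrix tool. The Q8_0 quant serves as the source model;
# the local filenames/paths here are assumptions.
./llama-imatrix -m Llama-3-70B-Instruct-Storywriter-Q8_0.gguf \
  -f groups_merged.txt \
  -c 512 \
  -o Llama-3-70B-Instruct-Storywriter.imatrix
```

The resulting .imatrix file is then passed to the quantize tool so that low-bit IQ/K quants weight their rounding error by activation importance.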