Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF

Joseph717171 committed on Sep 17

Commit 24cf3e5 • 1 Parent(s): f57e8fd

Update README.md

Files changed (1)

README.md +1 -1

@@ -3,4 +3,4 @@ Custom GGUF quants of arcee-ai’s [Llama-3.1-SuperNova-Lite-8B](https://hugging
 Update: For some reason, the model was initially smaller than LLama-3.1-8B-Instruct after quantizing. This has since been rectified: if you want the most intelligent and most capable quantized GGUF version of Llama-3.1-SuperNova-Lite-8.0B, use the OF32.EF32.IQuants.
 The original OQ8_0.EF32.IQuants will remain in the repo for those who want to use them. Cheers! 😁
-Addendum: The 0Q8_0.EF32.IQuants are right for the model's size; I was just being naive: I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but its okay because I learned something, and now we have even more quantizations to play with now. Cheers! 😂😏

+Addendum: The OQ8_0.EF32.IQuants are right for the model's size; I was just being naive: I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but its okay because I learned something, and now we have even more quantizations to play with now. Cheers! 😂😏
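
The accuracy-versus-vRAM trade-off mentioned in the addendum can be roughed out with a little arithmetic. The sketch below is only an illustration under stated assumptions: it assumes the "O" prefix in the quant names refers to the type of the model's output tensor, and it assumes the standard Llama-3.1-8B shape for that tensor (128256 vocabulary entries by 4096 hidden dimensions). Neither is stated in this commit. The Q8_0 figure uses the GGUF block layout of 32 int8 weights plus one fp16 scale, or 34 bytes per 32 weights.

```python
# Rough size estimate for a single tensor stored as F32 versus Q8_0.
# ASSUMPTION: the tensor is the Llama-3.1-8B output tensor,
# shaped 128256 (vocab) x 4096 (hidden). This is not stated in the commit.
VOCAB_SIZE = 128_256
HIDDEN_SIZE = 4_096

F32_BYTES_PER_WEIGHT = 4.0
# Q8_0 stores 32 weights per block: 32 int8 values + one fp16 scale = 34 bytes.
Q8_0_BYTES_PER_WEIGHT = 34 / 32


def tensor_bytes(n_weights: int, bytes_per_weight: float) -> int:
    """Return the approximate storage size of a tensor in bytes."""
    return int(n_weights * bytes_per_weight)


n_weights = VOCAB_SIZE * HIDDEN_SIZE
f32_size = tensor_bytes(n_weights, F32_BYTES_PER_WEIGHT)
q8_0_size = tensor_bytes(n_weights, Q8_0_BYTES_PER_WEIGHT)
extra_bytes = f32_size - q8_0_size

GIB = 1024 ** 3
print(f"F32 tensor:  {f32_size / GIB:.2f} GiB")
print(f"Q8_0 tensor: {q8_0_size / GIB:.2f} GiB")
print(f"Extra memory for F32: {extra_bytes / GIB:.2f} GiB")
```

Under these assumptions, keeping that one tensor in F32 costs roughly 1.4 GiB more than Q8_0, which is in line with the addendum's point that the OF32 variants trade extra vRAM for accuracy.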