Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Custom GGUF quants of arcee-aiโ€™s Llama-3.1-SuperNova-Lite-8B, where the Output Tensors are quantized to Q8_0 while the Embeddings are kept at F32. Enjoy! ๐Ÿง ๐Ÿ”ฅ๐Ÿš€

UPDATE: This repo now contains updated O.E.IQuants, which were quantized, using a new F32-imatrix, using llama.cpp version: 4067 (54ef9cfc). This particular version of llama.cpp made it so all KQ mat_mul computations were done in F32 vs BF16, when using FA (Flash Attention). This change, plus the other very impactful prior change, which made all KQ mat_muls be computed with F32 (float32) precision for CUDA-Enabled devices, has compoundedly enhanced the O.E.IQuants and has made it excitingly necessary for this update to be pushed. Cheers!

Downloads last month
577
GGUF
Model size
8.03B params
Architecture
llama

4-bit

6-bit

8-bit

Inference API
Unable to determine this model's library. Check the docs .

Collection including Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF