My quantizations.
#1 opened by ZeroWw
These are my own quantizations (updated almost daily).
The difference from standard quantizations is that I quantize the output and embedding tensors to f16, and the other tensors to q5_k, q6_k, or q8_0.
This produces models that are barely degraded, or not degraded at all, while being smaller in size.
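For anyone who wants to reproduce this kind of mix: a minimal sketch using llama.cpp's llama-quantize tool, whose --output-tensor-type and --token-embedding-type flags keep those two tensors at f16 while the rest get the base type. This is not necessarily the exact command used here; the binary and file paths are placeholders and assume a local llama.cpp build.

```python
# Sketch: produce an f16-output/f16-embeddings GGUF with llama-quantize,
# driven from Python. All paths below are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--output-tensor-type", "f16",    # keep the output tensor at f16
        "--token-embedding-type", "f16",  # keep the token embeddings at f16
        "model-f16.gguf",                 # source f16 model (placeholder)
        "model-q6_k-f16.gguf",            # destination file (placeholder)
        "Q6_K",                           # base type for all other tensors
    ],
    check=True,  # raise if the tool exits with an error
)
```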
They run at about 3-6 tokens/sec on CPU alone using llama.cpp, and obviously faster on machines with capable GPUs.
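As a usage note, one quick way to try such a file on CPU from Python is the llama-cpp-python bindings (my addition, not something mentioned in this thread); the model path, thread count, and prompt below are placeholders.

```python
# Sketch: CPU-only inference over a GGUF file via llama-cpp-python.
from llama_cpp import Llama

# n_gpu_layers defaults to 0, so this runs entirely on CPU.
llm = Llama(model_path="model-q6_k-f16.gguf", n_threads=8)
out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```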
Thanks! I've added the link to the model card
AuriAetherwiing changed discussion status to closed
aaronday3 changed discussion status to open
aaronday3 changed discussion status to closed
The first link was wrong; I've corrected it.