appoose posted an update · Jul 31
Excited to announce the release of our high-quality Llama-3.1 8B 4-bit HQQ/calibrated quantized model! It achieves an impressive 99.3% of FP16 performance while delivering the fastest inference speed for transformers.

mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
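
If you want to experiment with HQQ in transformers, here's a minimal sketch using the `HqqConfig` API (assuming `transformers>=4.41`, the `hqq` package, and a CUDA GPU). Note this quantizes the base Llama-3.1 weights on the fly at 4-bit / group-size 64 to match the "4bitgs64" setting; loading the calibrated checkpoint above should follow its model card instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

# 4-bit, group-size-64 HQQ config, matching the "4bitgs64" in the model name.
quant_config = HqqConfig(nbits=4, group_size=64)

# Base model quantized on the fly; the calibrated repo above is a separate, pre-quantized artifact.
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)

# Quick generation check.
inputs = tokenizer("What is HQQ quantization?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```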

How's the speed compared to EXL2 quant at the same bits per weight?
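
One way to answer this empirically is to time decode throughput on both backends. A rough sketch for the transformers side follows (the helper name and settings are illustrative, and an equivalent timing loop would be needed for the EXL2 side; no measured numbers are implied here):

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> float:
    """Rough greedy-decode throughput measurement for a causal LM."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    generated = outputs.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed

# Example: print(tokens_per_second(model, tokenizer, "Explain HQQ in one paragraph."))
```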