appoose posted an update · Jul 31
Excited to announce the release of our high-quality Llama-3.1 8B 4-bit HQQ/calibrated quantized model! It achieves an impressive 99.3% of FP16 performance while delivering the fastest inference speed for transformers.

mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
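
If you want to experiment with HQQ in transformers, here's a minimal sketch using the `HqqConfig` API (assuming `transformers>=4.41`, the `hqq` package, and a CUDA GPU). Note this quantizes the base Llama-3.1 weights on the fly at 4-bit / group-size 64 to match the "4bitgs64" setting; loading the calibrated checkpoint above should follow its model card instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

# 4-bit, group-size-64 HQQ config, matching the "4bitgs64" in the model name.
quant_config = HqqConfig(nbits=4, group_size=64)

# Base model quantized on the fly; the calibrated repo above is a separate, pre-quantized artifact.
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)

# Quick generation check.
inputs = tokenizer("What is HQQ quantization?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```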

How's the speed compared to EXL2 quant at the same bits per weight?
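
One way to answer this empirically is to time decode throughput on both backends. A rough sketch for the transformers side follows (the helper name and settings are illustrative, and an equivalent timing loop would be needed for the EXL2 side; no measured numbers are implied here):

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> float:
    """Rough greedy-decode throughput measurement for a causal LM."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    generated = outputs.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed

# Example: print(tokens_per_second(model, tokenizer, "Explain HQQ in one paragraph."))
```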