Check out AutoRound, a SOTA LLM quantization algorithm for 2-4 bit precision that adds no inference overhead to any model.
paper: https://arxiv.org/abs/2309.05516
github: https://github.com/intel/auto-round
low-bit leaderboard: https://huggingface.co/spaces/Intel/low-bit-leaderboard