Longer inference time
#4
by
dittops
- opened
Inference time seems higher than for a normal fp16 model. I was expecting better throughput, since that is the advantage of 1-bit models.
The advantage of 1-bit models is that they are 32x smaller compared to a 32-bit model. Inference on 1-bit models includes the overhead of dequantization.
However, as per the paper, there is still a significant improvement in memory usage and throughput.
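As a rough illustration (not this repo's actual kernel), here is a minimal NumPy sketch of sign-based 1-bit weight packing and the dequantization step a naive inference path pays on every forward pass. All names and the per-tensor scale scheme are assumptions for the sketch, not the model's real implementation:

```python
import numpy as np

# Hypothetical sketch: why 1-bit inference can be slower than fp16/fp32
# despite the ~32x memory saving. Weights are stored packed (1 bit each),
# but the matmul kernel needs real values, so each forward pass pays an
# unpacking/dequantization cost unless a fused kernel avoids it.

rng = np.random.default_rng(0)
out_features, in_features = 256, 512

# Full-precision weights and their 1-bit (sign) quantization.
w_fp32 = rng.standard_normal((out_features, in_features)).astype(np.float32)
w_signs = (w_fp32 >= 0)               # boolean sign matrix
scale = np.abs(w_fp32).mean()         # per-tensor scale (an assumed scheme)

# Pack 8 sign bits per byte: this is the ~32x storage reduction vs fp32.
packed = np.packbits(w_signs, axis=1)  # dtype uint8

def dequantize(packed_bits, scale, in_features):
    """Unpack bits back to {-scale, +scale} floats -- the per-step overhead."""
    bits = np.unpackbits(packed_bits, axis=1)[:, :in_features]
    return np.where(bits == 1, scale, -scale).astype(np.float32)

x = rng.standard_normal((1, in_features)).astype(np.float32)

# Naive inference path: dequantize first, then do the matmul in float.
w_deq = dequantize(packed, scale, in_features)
y = x @ w_deq.T

print("packed bytes:", packed.nbytes, "vs fp32 bytes:", w_fp32.nbytes)
print("output shape:", y.shape)
```

The memory saving is real (the packed buffer is 32x smaller), but without a fused low-bit matmul kernel the dequantize-then-matmul path can easily be slower than a plain fp16 forward pass, which matches the behavior described above.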