A Python equiv
#1 by Pythonic456 · opened
How would I use and run the 4-bit quantized model on my local machine? Sorry, I am not very experienced with this side of Python/Torch etc. Any help is much appreciated!
You can use Candle to run inference on your quantized model.
That way you get fast model loading as well as fast inference. From Python, you just call it via a REST API.
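If you go that route, the Python side is just an HTTP client. A minimal sketch, assuming you have wrapped the Candle model in a local HTTP server — the URL, route, and JSON field names (`prompt`, `max_tokens`, `text`) here are placeholders, not a documented API, so adjust them to whatever your server actually expects:

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 128) -> dict:
    # Hypothetical payload shape; match it to your server's expected JSON.
    return {"prompt": prompt, "max_tokens": max_tokens}

def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    # POST the prompt to the local inference server and return the generated text.
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Then `generate("Hello")` blocks until the server responds, so the quantized model stays loaded in the Rust process between calls instead of being reloaded from Python each time.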
@Pythonic456