Specs for inference
#5
by
mzhadigerov
- opened
What is the size of VRAM required to run it for inference?
Roughly 14 GB of VRAM.
Use quantized GGUF/GGML/AWQ models if you want to run it on machines with less compute.
Yeah, then it will be roughly 6 GB of VRAM.
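For anyone wondering where figures like these come from, here is a rough back-of-the-envelope sketch. It assumes a ~7B-parameter model (the thread does not state the exact parameter count) and only counts the weights; the KV cache, activations, and runtime overhead add a few more GB on top, which is why the quoted numbers are higher than the raw weight size.

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumes ~7B parameters (not stated in the thread); ignores KV cache,
# activations, and framework overhead, which add a few GB in practice.

def weight_vram_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights in GB (1 GB = 1024**3 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3

params = 7e9  # assumed parameter count

fp16_gb = weight_vram_gb(params, 16)  # ~13 GB of weights -> ~14 GB with overhead
int4_gb = weight_vram_gb(params, 4)   # ~3.3 GB of weights -> ~6 GB with overhead
```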
Can you suggest the smallest SageMaker instance I can use for deployment? For some reason, loading the model via the provided sample notebook fails on an ml.g5.12xlarge instance, even though the VRAM should be sufficient based on your suggestion.
@smrazaabbas you have to use the 4-bit quantized version. It should work then.
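For reference, loading a 4-bit quantized version with the Transformers/bitsandbytes stack could look roughly like the sketch below. The model ID is a placeholder (the thread does not name the exact checkpoint), and this is one possible configuration, not the only way to get a 4-bit model running.

```python
# Hypothetical 4-bit loading sketch; "model-id" is a placeholder checkpoint name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute is a common default; adjust as needed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id",  # placeholder: replace with the actual checkpoint
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```

Alternatively, a prequantized GGUF or AWQ checkpoint (as mentioned above) avoids quantizing at load time.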