Specs for inference
#5
by
mzhadigerov
- opened
What is the size of VRAM required to run it for inference?
Roughly 14 GB of VRAM.
Use quantized GGUF/GGML/AWQ models if you want to run it on machines with less compute.
Yeah, then it will be roughly 6 GB of VRAM.
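For anyone wondering where figures like these come from, here is a rough back-of-the-envelope sketch. It assumes a ~7B-parameter model (the thread does not state the exact parameter count) and only counts the weights; the KV cache, activations, and runtime overhead add a few more GB on top, which is why the quoted numbers are higher than the raw weight size.

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumes ~7B parameters (not stated in the thread); ignores KV cache,
# activations, and framework overhead, which add a few GB in practice.

def weight_vram_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights in GB (1 GB = 1024**3 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3

params = 7e9  # assumed parameter count

fp16_gb = weight_vram_gb(params, 16)  # ~13 GB of weights -> ~14 GB with overhead
int4_gb = weight_vram_gb(params, 4)   # ~3.3 GB of weights -> ~6 GB with overhead
```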
Can you suggest the smallest SageMaker instance I can use for deployment? For some reason, loading the model via the provided sample notebook fails on an ml.g5.12xlarge instance, even though the VRAM should be sufficient based on your suggestion.
@smrazaabbas you have to use the 4-bit quantized version. It should work then.
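For reference, loading a 4-bit quantized version with the Transformers/bitsandbytes stack could look roughly like the sketch below. The model ID is a placeholder (the thread does not name the exact checkpoint), and this is one possible configuration, not the only way to get a 4-bit model running.

```python
# Hypothetical 4-bit loading sketch; "model-id" is a placeholder checkpoint name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute is a common default; adjust as needed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id",  # placeholder: replace with the actual checkpoint
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```

Alternatively, a prequantized GGUF or AWQ checkpoint (as mentioned above) avoids quantizing at load time.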