
Will this run on a 4090 and 64GB of DDR5?

#6 opened by AIGUYCONTENT

I know there is an 8B quant available. However, I need an intelligent AI that can help me reason through things throughout a multi-step conversation on a single topic.

The full model will not, but a quantized model will, and at Q4 (which is usually the preferred quant) it will certainly be better than the 8B model. You will need to trade off between speed and quality. You can find performance comparisons here.
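As a rough back-of-envelope check (my own figures, not from this thread): a 70B model at a Q4-class quant of roughly 4.5 bits per weight needs on the order of 40 GB, so it won't fit entirely in the 4090's 24 GB of VRAM, but it does fit once the remaining layers spill into 64 GB of system RAM.

```python
# Rough size estimate for a 70B model at a Q4-class quant (~4.5 bits/weight).
# These numbers are approximations, not measurements of this specific model.
params = 70e9
bits_per_weight = 4.5
weights_gb = params * bits_per_weight / 8 / 1e9   # ~39 GB of weights
kv_cache_gb = 2                                   # small-context KV cache, rough guess
total_gb = weights_gb + kv_cache_gb
print(f"~{total_gb:.0f} GB total -> ~24 GB on the 4090, the rest in system RAM")
```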

Hi, please try:

https://huggingface.co/mradermacher/Smaug-Llama-3-70B-Instruct-i1-GGUF

The IQ4_XS quant should work; if you have 64 GB of RAM you can use i1-Q5_K_M.
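For reference, a minimal sketch of loading one of those GGUF files with llama-cpp-python and offloading part of the model to the 4090. The local filename and the `n_gpu_layers` value are assumptions; adjust them to whichever quant you actually download and to how much VRAM is free.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Assumes the IQ4_XS GGUF from the repo above is already downloaded locally;
# the filename and n_gpu_layers below are guesses, tune them to your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="Smaug-Llama-3-70B-Instruct.i1-IQ4_XS.gguf",  # assumed local filename
    n_gpu_layers=45,   # offload as many layers as fit in the 4090's 24 GB of VRAM
    n_ctx=4096,        # context window; larger contexts use more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Walk me through the trade-offs step by step."}]
)
print(out["choices"][0]["message"]["content"])
```

Expect single-digit tokens per second with this much CPU offload, but it keeps the multi-step reasoning quality of the 70B model rather than dropping to the 8B.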
