Can't get it to work on Runpod

#4
by mfeldstein67 - opened

I've been unable to get models over roughly 70B parameters to run on Runpod using webchat-UI, whether GGUF or GPTQ, no matter what I try. I'm running on multiple GPUs, so it shouldn't be a hardware limitation. My guess is that I'm missing something basic, like how to fetch models that have been split into multiple files due to size limits, or how to configure a multi-GPU setup.

I just want to run inference; I'm not trying to do anything fancy here.

Suggestions?

Total noob here, but the only way I could find to join split files was to use LM Studio to download the models. There are other ways, but none of them worked for me, or they looked like they'd suck me into Linux debugging hell.
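For what it's worth, split model files can often be joined from the command line without LM Studio. It depends on how the repo split them: some ship raw byte-split parts that can simply be concatenated with `cat`, while shards produced by llama.cpp's `gguf-split` tool are not plain byte slices and need its `--merge` mode instead. A minimal sketch with hypothetical filenames (the tiny fake parts here just demonstrate the concatenation mechanics):

```shell
# Case 1: raw byte-split parts (hypothetical names like *.part1of2).
# These are plain slices of one file, so concatenation rebuilds it.
# We fake two tiny parts just to show the mechanics:
printf 'AAA' > model.gguf.part1of2
printf 'BBB' > model.gguf.part2of2
cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf

# Case 2: shards made with llama.cpp's gguf-split tool
# (names like model-00001-of-00003.gguf). These have their own
# headers and must be merged with the tool, not cat:
#   llama-gguf-split --merge model-00001-of-00003.gguf model.gguf
```

Recent llama.cpp-based loaders can also load `gguf-split` shards directly if you just point them at the first shard, in which case no merging is needed at all.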

I can currently run this model at Q5_K_M on my local machine, so I know the download process works.
