RTX 3070, only getting about 0.38 tokens/minute
I've played with the parameters a bit, but even when using:
call python server.py --auto-devices --chat --wbits 4 --groupsize 128 --gpu-memory 7 --pre_layer 19
it's still really slow. I know my card only has 8 GB of VRAM, and I've fixed the out-of-VRAM problem, but it still seems slow no matter what I do.
I don't know if this is relevant, but my general specs are:
Ryzen 9 3900X
16GB DDR4 RAM
RTX 3070 8GB
Whoa, I get around 8 tokens/s with a 3060 12GB.
You have set --pre_layer to 19, which basically puts part of your model in GPU VRAM and the rest in CPU RAM. Communication between VRAM and CPU RAM is much slower. Not sure if this is the only reason; also check that you installed the latest/fastest CUDA and PyTorch versions.
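A quick way to check which PyTorch/CUDA build you actually have (assuming a standard PyTorch install; this prints the PyTorch version, the CUDA version it was built against, and whether the GPU is visible at all):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"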
I also have a video card with only 8GB of VRAM; the model itself fits in the card's memory, but there isn't enough room left for inference. Once you put some layers of the model on the CPU, it becomes super slow. I'm very disappointed that they only put 8GB on a 3070.
This model runs pretty well on a 3080 and fits into its 10GB of VRAM. If you have another NVIDIA card, you might be able to use the VRAM on both cards.
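If I remember the webui flags correctly, --gpu-memory accepts one value per card, so a two-GPU split would look something like the line below (the 8 and 6 are hypothetical per-card caps in GiB for a primary and secondary card; adjust to your hardware):

call python server.py --auto-devices --chat --wbits 4 --groupsize 128 --gpu-memory 8 6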
Yes, I have the same problem. The GPU is only being used at 2%.
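For anyone debugging this, you can watch utilization live while the model is generating (nvidia-smi ships with the NVIDIA driver; the 1-second polling interval here is just an example):

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1

If GPU utilization stays in the low single digits during generation, the bottleneck is almost certainly the layers offloaded to the CPU rather than the card itself.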