run this model on a 2xRTX4090 machine with vLLM
#2
by choucavalier - opened
I'm trying to run this model on a 2xRTX4090 machine, using vLLM for serving.
It seems that my system is not able to run it (each GPU has 24 GB of VRAM).
Is this expected?
Thanks
I have two 3090s, which is similar to your case. I use the DPO version, but it should behave the same. I pass the following parameters to vLLM:
--gpu-memory-utilization 0.8
--quantization awq
--tensor-parallel-size 2
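For reference, a minimal launch sketch with those flags, assuming the OpenAI-compatible vLLM server entry point (the model ID below is just a placeholder; substitute your own AWQ checkpoint):

python -m vllm.entrypoints.openai.api_server \
  --model <your-awq-model-id> \
  --quantization awq \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.8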
Thanks man, this worked! I was using the same args but with --gpu-memory-utilization 0.98
(I thought I was giving more memory to the model.)
choucavalier changed discussion status to closed