Why does GPU memory stay occupied after vLLM finishes inference on openbmb/MiniCPM-V-2_6?

#15
by fern4444 - opened

When running inference on openbmb/MiniCPM-V-2_6 with vLLM, after processing one image the GPU is no longer utilized, but its memory is still occupied. Why?

OpenBMB org

Is your vLLM process still running? If the vLLM process has not exited, GPU memory will indeed remain occupied the whole time: vLLM allocates GPU memory in advance (for the model weights and the KV cache) and holds that reservation for as long as the engine is alive, even when no requests are being processed.
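
For reference, a minimal sketch of how this pre-allocation behaves with vLLM's offline `LLM` API; the prompt and the `gpu_memory_utilization` value are illustrative, and a real MiniCPM-V-2_6 run would also pass image inputs:

```python
import gc

import torch
from vllm import LLM, SamplingParams

# vLLM reserves this fraction of GPU memory at startup (weights + KV cache)
# and keeps it reserved while the engine object is alive, even when idle.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    gpu_memory_utilization=0.5,
)

outputs = llm.generate(["Describe the image."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)

# Dropping the engine object frees most of the reserved memory, though the
# CUDA context itself persists until the process exits.
del llm
gc.collect()
torch.cuda.empty_cache()
```

So seeing the memory still allocated after a single request finishes is expected behavior, not a leak; the memory is released when the engine is torn down or the process exits.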

linglingdan changed discussion status to closed
