Does this model support CUDA acceleration in llama.cpp?

by liquid123 - opened Aug 5

Aug 5

I built llama.cpp with make GGML_MUSA=1
and ran it with the following command:
./llama-cli -m /home/liquid/.cache/llama.cpp/omost-llama-3-8b-q8_0.gguf -ngl 35 --prompt "who are you?"
However, inference is very slow and GPU utilization is very low (the GPU is not being used).

zhaijunxiao

Owner Aug 5

Yes, it is supported.

For CUDA, I think the flag should be GGML_CUDA=1.

liquid123

Aug 5

I had set the wrong build flag. After recompiling with the correct one and running again, it is much faster.
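
For anyone who hits the same problem, here is a minimal sketch of the fix discussed in this thread. It assumes a llama.cpp checkout that still uses the Makefile build and a working CUDA toolkit. The helper names `backend_flag` and `rebuild` are made up for illustration and are not part of llama.cpp. Only the two flags mentioned in this thread are covered.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a backend name to the llama.cpp Makefile flag
# mentioned in this thread.
backend_flag() {
  case "$1" in
    cuda) echo "GGML_CUDA=1" ;;  # flag suggested above for CUDA
    musa) echo "GGML_MUSA=1" ;;  # flag that was used by mistake
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: rebuild from a clean tree so that no object files
# from the earlier build are reused. Run it from the llama.cpp checkout.
rebuild() {
  local flag
  flag="$(backend_flag "$1")" || return 1
  make clean && make "$flag"
}
```

Running `rebuild cuda` and then repeating the original `./llama-cli ... -ngl 35` command should offload layers to the GPU. Watching `nvidia-smi` during inference is one way to confirm that the GPU is actually in use.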
