How to accelerate inference speed

#22
by tobywang - opened

Are there any frameworks that can accelerate the inference speed of this model?

Maybe you can try vLLM:
https://github.com/vllm-project/vllm
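For anyone who wants to try it, here is a minimal sketch of offline batch generation with vLLM. The decoding values and the helper names `sampling_kwargs` and `generate_with_vllm` are my own placeholders, not anything recommended by the model authors, and `model_id` should be the repo id of this model. I have not benchmarked it against this model.

```python
def sampling_kwargs(max_tokens=256):
    """Decoding settings as a plain dict. Values are illustrative, not tuned."""
    return {"temperature": 0.7, "top_p": 0.9, "max_tokens": max_tokens}


def generate_with_vllm(model_id, prompts, max_tokens=256):
    """Generate one completion per prompt (hypothetical helper)."""
    # Imported lazily so sampling_kwargs() can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    # Some model repos also need trust_remote_code=True; check the model card.
    llm = LLM(model=model_id)
    params = SamplingParams(**sampling_kwargs(max_tokens))

    # vLLM batches the prompts internally, which is where most of the speedup comes from.
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Passing all prompts in one `generate` call, rather than looping over them, lets vLLM schedule them together.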

Hello, does vLLM work for you? I tried vLLM but found that the generation quality is degraded: the model simply outputs repetitive words.
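To put a number on the repetition when comparing vLLM output with the original inference code, a rough sketch like the one below might help. It computes the share of distinct word n-grams in a string; the function name `distinct_ngram_ratio` and the whitespace tokenization are my own simplifications, and a low value only suggests looping output, it does not explain the cause.

```python
def distinct_ngram_ratio(text, n=3):
    """Fraction of word n-grams that are unique (1.0 means no repeated n-gram)."""
    tokens = text.split()
    if len(tokens) < n:
        # Too short to contain any n-gram, so treat it as non-repetitive.
        return 1.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)
```

Running it on outputs from both backends for the same prompt and the same decoding settings would show whether the degradation is specific to the vLLM run.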
