How to accelerate inference speed
#22, opened by tobywang
Are there any frameworks that can accelerate the inference speed of this model?
Maybe you can try vLLM:
https://github.com/vllm-project/vllm
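For anyone who wants to try this suggestion, here is a minimal sketch of offline batch generation with vLLM's Python API. The thread does not name the model, so `your-org/your-model` is a placeholder, and the helper names `sampling_settings` and `generate_with_vllm` as well as the sampling values are illustrative assumptions, not recommended settings for this model.

```python
def sampling_settings(max_tokens=256):
    """Return illustrative sampling settings (assumed values, tune per model)."""
    return {"temperature": 0.7, "top_p": 0.95, "max_tokens": max_tokens}


def generate_with_vllm(model_name, prompts, max_tokens=256):
    """Generate one completion per prompt using vLLM's offline engine."""
    # Imported lazily so this file can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)  # loads the weights onto the available GPU(s)
    params = SamplingParams(**sampling_settings(max_tokens))
    outputs = llm.generate(prompts, params)  # prompts are batched internally
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder; substitute the model from this repo.
    for text in generate_with_vllm("your-org/your-model", ["Hello, my name is"]):
        print(text)
```

Passing all prompts in a single `generate` call lets vLLM batch them, which is typically where most of the speedup over one-by-one generation comes from.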
Hello, does vLLM work for you? I tried vLLM, but the generation quality was degraded: the model simply outputs repetitive words.
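To compare outputs from vLLM against a baseline such as plain transformers generation, it may help to put a number on the repetition. The sketch below is a simple diagnostic, not part of vLLM: `repeated_ngram_ratio` is a hypothetical helper that reports the fraction of word n-grams that occur more than once. If the ratio is much higher under vLLM for the same prompt, possible things to check (assumptions, not a confirmed diagnosis) are whether the prompt format matches what the model expects and whether the sampling settings match the baseline; vLLM's `SamplingParams` also accepts a `repetition_penalty` argument.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams in `text` that appear more than once."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words: nothing to measure
    counts = Counter(ngrams)
    # Count every occurrence of any n-gram that is seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


if __name__ == "__main__":
    print(repeated_ngram_ratio("the the the the the the"))
    print(repeated_ngram_ratio("a quick brown fox jumps over"))
```

Running the same prompts through both backends and comparing this ratio gives a quick signal of whether the degradation is specific to the vLLM setup.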