Is it possible to use a continuous batching inference server with this model?

#14 by natserrano

vLLM doesn't work with this model.
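
For reference, here's a minimal sketch of the kind of vLLM invocation in question, using vLLM's AWQ quantization flag (the model ID below is a placeholder for this repo's checkpoint, not the actual name):

```python
# Minimal sketch: load an AWQ checkpoint with vLLM's offline API.
# The model ID is a placeholder; substitute this repo's model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/some-model-AWQ",  # placeholder model ID
    quantization="awq",          # tell vLLM the weights are AWQ-quantized
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, world!"], params)
print(outputs[0].outputs[0].text)
```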

Any other recommendations for a server that can sustain 10 calls per second?
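
For context, this is roughly how the target rate could be checked: a quick asyncio load test against an OpenAI-compatible completions endpoint (the URL, model name, and payload are all placeholder assumptions):

```python
# Rough load test sketch: fire ~10 requests per second for 10 seconds
# against an OpenAI-compatible endpoint and report mean latency.
import asyncio
import time

import aiohttp

URL = "http://localhost:8000/v1/completions"  # placeholder endpoint
PAYLOAD = {"model": "placeholder-awq-model", "prompt": "Hello", "max_tokens": 32}

async def one_call(session: aiohttp.ClientSession) -> float:
    start = time.perf_counter()
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.json()
    return time.perf_counter() - start

async def main(rate: int = 10, seconds: int = 10) -> None:
    async with aiohttp.ClientSession() as session:
        tasks = []
        for _ in range(seconds):
            # Launch `rate` concurrent calls, then wait out the second.
            tasks += [asyncio.create_task(one_call(session)) for _ in range(rate)]
            await asyncio.sleep(1)
        latencies = await asyncio.gather(*tasks)
        print(f"{len(latencies)} calls, mean latency "
              f"{sum(latencies) / len(latencies):.2f}s")

asyncio.run(main())
```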

Any other AWQ models similar/comparable to this bad boy?

Thanks!
