ExLlama loader works with this model

#2
by djstraylight - opened

I'm running this model on an RTX A6000 and it works pretty well with the ExLlama loader. I'm getting 10 tokens per second, though I feel like ExLlama can be less coherent at times.

Yeah, I'd expect ExLlama to work with all the quants except the 3-bit ones. Which quant are you using?

I'm using the 'gptq-4bit-32g-actorder_True' branch. It fits perfectly on an A6000.

Good to hear.
