Exllama loader works with this model
#2, opened by djstraylight
I'm using this model on an RTX A6000 and it runs pretty well with the ExLlama loader. I'm getting 10 tokens per second, though I feel like ExLlama's output can be less coherent at times.
Yeah, I'd expect ExLlama to work with all the quants except the 3-bit ones. Which quant are you using?
I'm using the 'gptq-4bit-32g-actorder_True' branch. It fits perfectly on an A6000.
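For anyone wondering what that branch name encodes, here is a rough sketch in Python that splits it into bit width, group size, and act-order flag, then applies the rule of thumb from this thread (all quants except the 3-bit ones should work with ExLlama). The naming pattern is an assumption inferred from the single branch name mentioned here, and `QuantSpec`, `parse_branch`, and `exllama_expected_to_work` are hypothetical helper names, not part of any library.

```python
import re
from typing import NamedTuple


class QuantSpec(NamedTuple):
    """Quantisation settings read from a branch name (hypothetical helper type)."""
    bits: int
    group_size: int
    act_order: bool


# Assumed pattern: gptq-<bits>bit-<group size>g-actorder_<True|False>
_BRANCH_RE = re.compile(
    r"^gptq-(?P<bits>\d+)bit-(?P<group>\d+)g-actorder_(?P<act>True|False)$"
)


def parse_branch(branch: str) -> QuantSpec:
    """Split a branch name into its parts, or raise if it does not fit the assumed pattern."""
    m = _BRANCH_RE.match(branch)
    if m is None:
        raise ValueError(f"unrecognised branch name: {branch!r}")
    return QuantSpec(
        bits=int(m["bits"]),
        group_size=int(m["group"]),
        act_order=(m["act"] == "True"),
    )


def exllama_expected_to_work(spec: QuantSpec) -> bool:
    """Rule of thumb from this thread only: everything except 3-bit quants."""
    return spec.bits != 3


if __name__ == "__main__":
    spec = parse_branch("gptq-4bit-32g-actorder_True")
    print(spec, "ExLlama expected to work:", exllama_expected_to_work(spec))
```

This only reads the name; it does not check the actual model files or how much VRAM they need.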
Good to hear.