ExLlama loader works with this model

by djstraylight - opened Jul 15, 2023

Jul 15, 2023

I'm using this model on an RTX A6000 and it runs pretty well with the ExLlama loader. I'm getting 10 tokens a sec, though I feel like ExLlama can be less coherent at times.

TheBloke

Owner Jul 15, 2023

Yeah, I'd expect ExLlama to work with all the quants except the 3-bit ones. Which quant are you using?

djstraylight

Jul 15, 2023

I'm using the 'gptq-4bit-32g-actorder_True' branch. It fits perfectly on an A6000.
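
For anyone reading later, here is a rough sketch of how that branch name could be decoded and checked against the "all quants except the 3-bit ones" rule of thumb mentioned above. The `gptq-<bits>bit-<group>g-actorder_<bool>` pattern is an assumption inferred from the branch name in this thread, and `parse_branch` and `exllama_expected_to_work` are hypothetical helper names, not part of ExLlama or any library.

```python
import re
from typing import NamedTuple


class QuantBranch(NamedTuple):
    bits: int         # quantization bit width, e.g. 4
    group_size: int   # GPTQ group size, e.g. 32
    act_order: bool   # whether act-order was enabled


# Assumed naming pattern, inferred from 'gptq-4bit-32g-actorder_True'.
_BRANCH_RE = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(name: str) -> QuantBranch:
    """Split a branch name into its quantization parameters."""
    match = _BRANCH_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized branch name: {name!r}")
    bits, group_size, act_order = match.groups()
    return QuantBranch(int(bits), int(group_size), act_order == "True")


def exllama_expected_to_work(branch: QuantBranch) -> bool:
    """Rule of thumb from this thread: everything except 3-bit quants."""
    return branch.bits != 3


if __name__ == "__main__":
    branch = parse_branch("gptq-4bit-32g-actorder_True")
    print(branch, exllama_expected_to_work(branch))
```

This only encodes the expectation stated in the thread; it does not check whether a given model actually loads.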

TheBloke

Owner Jul 15, 2023

Good to hear
