Very slow quant

#1
by manwith32 - opened

I'm running Windows with an Nvidia 3090 Ti. Nous-Capybara-limarpv3-34B-5bpw-hb6-exl2 is generating at only 2.78 tokens/s.

Switching to a 4bpw version made it go much faster. (I forgot to write down the exact number, but it was night and day, maybe 20 tokens/s?)
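
For anyone wondering why one bit per weight could matter this much, here is a rough back-of-the-envelope sketch in Python. It is only a guess at the cause, not something measured: it assumes about 34e9 parameters (taken from the "34B" in the model name), treats the bpw figure as a uniform average over all weights, and ignores the 6-bit head, the KV cache, and whatever VRAM the Windows desktop is already using. The function name `weight_gib` is made up for illustration.

```python
def weight_gib(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB.

    params: number of model parameters (assumed, not read from the model)
    bpw:    average bits per weight of the quant
    """
    bytes_total = params * bpw / 8  # bits -> bytes
    return bytes_total / 2**30      # bytes -> GiB


PARAMS = 34e9   # assumption: "34B" means roughly 34 billion parameters
VRAM_GIB = 24   # a 3090 Ti has 24 GB of VRAM

for bpw in (5.0, 4.0):
    size = weight_gib(PARAMS, bpw)
    print(f"{bpw} bpw: ~{size:.1f} GiB of weights, "
          f"~{VRAM_GIB - size:.1f} GiB left for everything else")
```

Under these assumptions the 5bpw weights come to roughly 19.8 GiB and the 4bpw weights to roughly 15.8 GiB. If the 5bpw quant plus cache doesn't quite fit in 24 GB, that might explain the slowdown, but treat it as a hypothesis to check against your own VRAM usage.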

I thought I'd give a heads-up in case anyone else is struggling with inference speed.
