Very slow quant
#1 opened by manwith32
I'm running Windows with an NVIDIA RTX 3090 Ti. Nous-Capybara-limarpv3-34B-5bpw-hb6-exl2 is generating tokens at 2.78 tokens/s.
Switching to a 4bpw version made it much faster. (I forgot to write down the exact number, but it was night and day, maybe 20 tokens/second?)
I thought I'd give a heads-up in case anyone else is struggling with inference speed.