Very slow quant

by manwith32 - opened Mar 28


I'm running Windows with an Nvidia 3090 Ti. Nous-Capybara-limarpv3-34B-5bpw-hb6-exl2 is generating at 2.78 tokens/s.

Switching to a 4bpw version made it much faster. (I forgot to write down the exact number, but it was night and day, maybe 20 tokens/s?)

I thought I'd give a heads-up in case anyone else is struggling with inference speed.
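One possible explanation, which nothing above confirms, is VRAM pressure: a 3090 Ti has 24 GB, and the weights are only part of what has to fit, alongside the KV cache, activations, and whatever the Windows desktop is using. The sketch below estimates the size of the quantized weights alone at each bitrate. The parameter count of 34e9 is an assumption read off the "34B" in the model name, and `weight_gib` is a hypothetical helper, not part of any library.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB.

    Ignores the KV cache, activations, and any per-layer overhead,
    so the real footprint is larger than this number.
    """
    return params * bits_per_weight / 8 / 2**30


PARAMS = 34e9  # assumption: taken from "34B" in the model name

for bpw in (4.0, 5.0):
    print(f"{bpw} bpw: ~{weight_gib(PARAMS, bpw):.1f} GiB of weights")
```

If the 5bpw weights plus cache come close to the card's limit, some of the allocation may end up in much slower system memory, which would be consistent with the large speed gap, but that is a guess rather than a measurement.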
