For a context of at least 32K tokens, which version on a 2x16GB GPU config?

by Kalemnor - opened

The question is as in the title. Also, which version for at least 16K tokens?

@Kalemnor Hi there, take a look at my response here. There is no single answer, since it depends on a few things, but hopefully it gives you some good hints. You may even be able to run IQ4_XS if you tweak a few options and don't mind a heavy performance hit; otherwise you'll have to use smaller quants.
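To make "it depends" a bit more concrete, here is a rough sketch of the arithmetic, not an exact rule. It assumes an fp16 KV cache, treats the two 16GB cards as one 32 GiB pool (in practice the split is never perfect), and reserves a guessed 2 GiB for compute buffers. The layer count, KV head count, head dimension, and weight sizes below are hypothetical placeholders; substitute the values from the model's config and the actual size of the quant file you are considering.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size: one K and one V tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def fits(weights_gib, n_ctx, total_vram_gib=32.0, overhead_gib=2.0, **model):
    """Return (fits?, kv_gib) for a quant whose weights take weights_gib of VRAM."""
    kv_gib = kv_cache_bytes(n_ctx=n_ctx, **model) / 2**30
    return weights_gib + kv_gib + overhead_gib <= total_vram_gib, kv_gib


# Hypothetical model shape, for illustration only.
model = dict(n_layers=80, n_kv_heads=8, head_dim=128)

for n_ctx in (16384, 32768):
    for weights_gib in (18.0, 24.0):  # hypothetical quant file sizes
        ok, kv_gib = fits(weights_gib, n_ctx, **model)
        print(f"ctx={n_ctx} weights={weights_gib} GiB kv={kv_gib:.1f} GiB fits={ok}")
```

If the total does not fit, the usual levers are a smaller quant, a shorter context, a quantized KV cache, or offloading some layers to the CPU, with that last one being where the heavy performance hit comes from.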
