exllama 0.1.6

by flflow - opened

These quants are not compatible with the latest exllamav2 release (0.1.6); the model generates garbage.

The Gemma2 quants require changes that are only in the dev branch so far. You can use them now by checking out the dev branch, but there are some limitations because flash-attn doesn't support softcapping yet. A release version (0.1.7) will follow as soon as softcapping support in flash-attn is finished, which shouldn't take long.
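
For context, softcapping here means squashing the attention logits with a scaled tanh before the softmax, which is why a fused attention kernel has to support it explicitly. Below is a rough sketch in plain Python. The names `softcap` and `softmax` are hypothetical helpers for illustration, not exllamav2 or flash-attn APIs, and the cap of 50.0 is an assumption based on what I understand Gemma 2's config to use for attention logits.

```python
import math

def softcap(logits, cap=50.0):
    """Squash each logit smoothly into (-cap, cap) using cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw attention scores; the last one is far above the cap.
scores = [1.0, 30.0, 200.0]
capped = softcap(scores)   # small values barely change, large ones saturate near the cap
probs = softmax(capped)    # the softmax is applied to the capped scores
```

Small scores pass through almost unchanged, while very large ones are bounded by the cap. An attention implementation that skips this step sees different logits from the ones the model was trained with, which is presumably the source of the current limitations.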

Okay. Thank you!
