Please add a GGUF version!
#2
by
Anderson452
- opened
Please
Work is currently under way on the llama.cpp side to support this (for example, this and this).
For this model specifically, it is a bit tricky because it requires adding support for LongRoPE (for the 128k context length) and for the heterogeneous block-sparse attention, but hopefully it will be there soon :)
bapatra
changed discussion status to
closed