10k you say...
#1 by Doomed1986 - opened
Yet the config says 4096. Should I just change it to 10240, or use scaling?
Linear RoPE scaling (8x). You shouldn't need to change the config: that value is for the original model, which was Llama-2 based and had a 4K context. This is a 32K model, as the title indicates (hence the 8x scaling), as described on the original model page. The 10K comment is just a specific test. You should use 8x scaling even for contexts shorter than 32K. There's more information on the main model page.
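To illustrate why you set scaling instead of editing the config: linear RoPE scaling divides position indices by the factor before computing rotary angles, so a 32K context is compressed back into the 4K position range the base model was trained on. A minimal sketch (the 128-dim head size and 10000 base are typical Llama defaults, assumed here for illustration):

```python
import math

def rope_angles(position, dim=128, base=10000.0, scale=1.0):
    """Rotary embedding angles for one position, with linear scaling."""
    pos = position / scale  # linear scaling: compress positions by the factor
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

# With 8x scaling, position 32767 produces the same angles as
# unscaled position 4095.875, which lies inside the trained 4096 range.
unscaled = rope_angles(4095.875)
scaled = rope_angles(32767, scale=8.0)
assert all(math.isclose(a, b) for a, b in zip(unscaled, scaled))
```

This is why the 4096 in the config can stay as-is: the scaling remaps long positions at inference time rather than extending the trained range.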
Also, this is a really old model! You should use a different one, and the 2.4bpw quant is particularly bad for long context (anything below 4 bit sadly is).