ChatML template is not working
#2 opened by NK-Spacewalk
Yeah, so I get the exact same thing with flash attention off; you'll want to turn it on. Something about Qwen2 HATES having flash attention off in llama.cpp.
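For anyone hitting this outside of LM Studio, here's a minimal sketch of what turning flash attention on (plus forcing the ChatML template) can look like with llama-cpp-python; the model path below is a placeholder, not the actual filename from this repo:

```python
# Minimal sketch: load a GGUF with flash attention enabled and the ChatML
# chat format via llama-cpp-python. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2-model.gguf",  # placeholder filename, point this at your GGUF
    n_ctx=4096,
    n_gpu_layers=-1,       # offload all layers if you have the VRAM
    flash_attn=True,       # the setting discussed above; Qwen2 misbehaves without it
    chat_format="chatml",  # force the ChatML template
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```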
@bartowski
Thank you for the comment.
I'm not using flash attention; I'm using the latest LM Studio.
This GGUF file is not working with LM Studio or Jan,
but the Ollama model I created manually from the same GGUF file is working.
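For reference, here is a rough sketch of what creating the Ollama model manually from the same GGUF can look like, with the ChatML template pinned explicitly in the Modelfile (filenames and the model name are placeholders):

```python
# Rough sketch: build an Ollama model from the same GGUF with an explicit
# ChatML template. Paths and the model name are placeholders.
import subprocess
from pathlib import Path

MODELFILE = '''FROM ./qwen2-model.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
PARAMETER stop "<|im_end|>"
'''

Path("Modelfile").write_text(MODELFILE)

# Equivalent to running: ollama create qwen2-chatml -f Modelfile
subprocess.run(["ollama", "create", "qwen2-chatml", "-f", "Modelfile"], check=True)
```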
@NK-Spacewalk you can enable flash attention in LM Studio; it's at the bottom of the right sidebar, way down.