ChatML template is not working

#2 · opened by NK-Spacewalk

(screenshot attached)

bartowski (Arcee AI org)

Yeah, I get the exact same thing with flash attention off; you'll want to turn it on. Something about Qwen2 HATES having flash attention off in llama.cpp.
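
If it helps to reproduce this outside the GUI apps, here's a minimal llama-cpp-python sketch with flash attention on and the ChatML template forced explicitly. The model path, context size, and prompts are placeholder assumptions, not values from this thread:

```python
from llama_cpp import Llama

# Load the GGUF with flash attention enabled (the fix suggested above).
# model_path and n_ctx are placeholders.
llm = Llama(
    model_path="./model.gguf",
    flash_attn=True,        # llama.cpp's flash attention toggle (-fa on the CLI)
    chat_format="chatml",   # force the ChatML template instead of relying on GGUF metadata
    n_ctx=4096,
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(resp["choices"][0]["message"]["content"])
```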

@bartowski Thank you for the comment.
I'm not using flash attention; I'm using the latest LM Studio.
This GGUF file is not working in LM Studio or Jan.

But the Ollama model, which I created manually from the same GGUF file, is working.
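
For reference, a manual Ollama setup like the one described usually comes down to a Modelfile that points at the GGUF and pins the ChatML template explicitly; the file name and path here are assumptions, not taken from this thread:

```
# Modelfile (path is a placeholder)
FROM ./model.gguf

# Pin the ChatML prompt format explicitly (Ollama's Go-template syntax)
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Stop generation at the ChatML turn boundary
PARAMETER stop "<|im_end|>"
```

Building it with `ollama create my-model -f Modelfile` bakes this template into the model, which could explain why the Ollama build behaves while the same GGUF's embedded template misfires in other frontends.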

@NK-Spacewalk You can enable flash attention in LM Studio; it's at the bottom of the right sidebar, way down.
