ChatML template is not working
#2 opened by NK-Spacewalk
Yeah, so I get the exact same thing with flash attention off; you'll want to turn it on. Something about Qwen2 HATES having flash attention off in llama.cpp.
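For anyone hitting this outside of LM Studio, here's a minimal sketch of what turning flash attention on (plus forcing the ChatML template) can look like with llama-cpp-python; the model path below is a placeholder, not the actual filename from this repo:

```python
# Minimal sketch: load a GGUF with flash attention enabled and the ChatML
# chat format via llama-cpp-python. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2-model.gguf",  # placeholder filename, point this at your GGUF
    n_ctx=4096,
    n_gpu_layers=-1,       # offload all layers if you have the VRAM
    flash_attn=True,       # the setting discussed above; Qwen2 misbehaves without it
    chat_format="chatml",  # force the ChatML template
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```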
@bartowski
Thank you for the comment.
I'm not using flash attention; I'm using the latest LM Studio.
This GGUF file is not working with LM Studio or Jan,
but the Ollama model I created manually from the same GGUF file is working.
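For reference, here is a rough sketch of what creating the Ollama model manually from the same GGUF can look like, with the ChatML template pinned explicitly in the Modelfile (filenames and the model name are placeholders):

```python
# Rough sketch: build an Ollama model from the same GGUF with an explicit
# ChatML template. Paths and the model name are placeholders.
import subprocess
from pathlib import Path

MODELFILE = '''FROM ./qwen2-model.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
PARAMETER stop "<|im_end|>"
'''

Path("Modelfile").write_text(MODELFILE)

# Equivalent to running: ollama create qwen2-chatml -f Modelfile
subprocess.run(["ollama", "create", "qwen2-chatml", "-f", "Modelfile"], check=True)
```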
@NK-Spacewalk you can enable flash attention in LM Studio; it's at the bottom of the right sidebar, way down.