Max context
This is not unusual.
As soon as a model is fine-tuned (this model contains TWO fine tunes), the fine-tuning can impair the model's max context "comprehension" on the input side.
Based on experience, and on reading about a lot of other users' experiences, the ways to combat some of these issues are:
1 - RoPE scaling, but with "rope" active you must modify your prompt to be more specific.
2 - Flash Attention (a configuration sketch for both follows this list).
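As a minimal sketch of both settings, assuming a llama-cpp-python backend (the model path and scaling values below are illustrative placeholders, not settings shipped with this model):

```python
# Minimal sketch using llama-cpp-python (assumed backend; adapt for your loader).
from llama_cpp import Llama

llm = Llama(
    model_path="./your-model-q4_k_m.gguf",  # hypothetical path
    n_ctx=8192,               # requested context window
    rope_freq_scale=0.5,      # illustrative linear RoPE scale (~2x context stretch)
    # rope_freq_base=20000.0, # alternative NTK-style RoPE knob; tune per model
    flash_attn=True,          # enable Flash Attention if the build supports it
)
```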
On the output side:
Make sure your prompt has plenty of "meat on the bone" - short prompts can produce long generations, but the model can reach a point where it has no idea what to do as the output passes 2k/3k/4k tokens. Sometimes just a little more information fixes this (a brief example follows).
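For illustration, the difference can be as small as adding constraints and detail (both prompts here are hypothetical):

```python
# Hypothetical prompts: the second gives the model enough direction
# to stay coherent past the 2k-4k token mark.
short_prompt = "Write a horror story."
detailed_prompt = (
    "Write a 3000-word first-person horror story set in an abandoned "
    "lighthouse. Build dread slowly, end on an ambiguous note, and keep "
    "the narrator's voice terse and unreliable."
)
```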
Likewise, rep pen / temp make HUGE differences - too low a temp, and the model can run out of steam.
Too high a rep pen can stifle creativity (and too low a rep pen makes output way too wordy, with small words at that).
Top_K: raise this for more word choices, which may help with coherence in long generations.
Other parameters, like the newer XTC sampler, will drastically alter output, especially at length.
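A hedged sketch of these output-side settings, continuing from the llama-cpp-python example above (all values are illustrative starting points, not tuned recommendations for this model):

```python
# Continues the sketch above (reuses `llm` and `detailed_prompt`).
# All values are illustrative starting points, not tuned recommendations.
out = llm.create_completion(
    detailed_prompt,
    max_tokens=4096,
    temperature=0.9,      # too low and the model can "run out of steam"
    repeat_penalty=1.1,   # too high stifles creativity; too low gets wordy
    top_k=100,            # raised from the common default of 40 for more word choices
)
print(out["choices"][0]["text"])
# XTC and similar newer samplers are usually set in the backend or front end
# (e.g. llama.cpp exposes --xtc-probability / --xtc-threshold), if available.
```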