Max context
This is not unusual.
As soon as a model is fine-tuned (this model contains TWO fine tunes), the fine-tuning can impair the model's max context "comprehension" on the input side.
Based on experience, and on reading about a lot of other users' experiences, the ways to combat some of these issues are:
1 - RoPE scaling, but with "rope" active you must modify your prompt to be more specific.
2 - Flash Attention (a configuration sketch for both follows this list).
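As a minimal sketch of both settings, assuming a llama-cpp-python backend (the model path and scaling values below are illustrative placeholders, not settings shipped with this model):

```python
# Minimal sketch using llama-cpp-python (assumed backend; adapt for your loader).
from llama_cpp import Llama

llm = Llama(
    model_path="./your-model-q4_k_m.gguf",  # hypothetical path
    n_ctx=8192,               # requested context window
    rope_freq_scale=0.5,      # illustrative linear RoPE scale (~2x context stretch)
    # rope_freq_base=20000.0, # alternative NTK-style RoPE knob; tune per model
    flash_attn=True,          # enable Flash Attention if the build supports it
)
```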
On the output side:
Make sure your prompt has plenty of "meat on the bone" - short prompts can produce long generations, but the model can reach a point where it has no idea what to do as the output passes 2k/3k/4k tokens. Sometimes just a little more information fixes this (a brief example follows).
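For illustration, the difference can be as small as adding constraints and detail (both prompts here are hypothetical):

```python
# Hypothetical prompts: the second gives the model enough direction
# to stay coherent past the 2k-4k token mark.
short_prompt = "Write a horror story."
detailed_prompt = (
    "Write a 3000-word first-person horror story set in an abandoned "
    "lighthouse. Build dread slowly, end on an ambiguous note, and keep "
    "the narrator's voice terse and unreliable."
)
```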
Likewise, rep pen / temp make HUGE differences - too low a temp, and the model can run out of steam.
Too high a rep pen can stifle creativity (and too low a rep pen makes output way too wordy, with small words at that).
Top_K: raise this for more word choices, which may help with coherence in long generations.
Other parameters, like the newer XTC sampler, will drastically alter output, especially at length.
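A hedged sketch of these output-side settings, continuing from the llama-cpp-python example above (all values are illustrative starting points, not tuned recommendations for this model):

```python
# Continues the sketch above (reuses `llm` and `detailed_prompt`).
# All values are illustrative starting points, not tuned recommendations.
out = llm.create_completion(
    detailed_prompt,
    max_tokens=4096,
    temperature=0.9,      # too low and the model can "run out of steam"
    repeat_penalty=1.1,   # too high stifles creativity; too low gets wordy
    top_k=100,            # raised from the common default of 40 for more word choices
)
print(out["choices"][0]["text"])
# XTC and similar newer samplers are usually set in the backend or front end
# (e.g. llama.cpp exposes --xtc-probability / --xtc-threshold), if available.
```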