Love this model but I wish the context was higher
This model is great once you find the right temperature to work with. I get much more human prose out of it than from other models under 70B. The only downside is the context limit: 8k is way too small for a writing model these days. I'd love to see a merge of this with something that has a higher context, around 32k. But great job anyway.
Yes, the context size is a bit limiting. There may be options for extending it, like RoPE scaling and self-extend, but tbh I haven't tried either.
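The rough idea behind the frequency-base flavour of RoPE scaling, for anyone curious: RoPE rotates each pair of embedding dimensions by an angle proportional to the position,

$$\theta_i = b^{-2i/d}, \qquad \text{angle at position } m = m\,\theta_i$$

Raising the base $b$ above the training default (commonly 10000) slows the rotations, so positions beyond the native window stay within the angular range the model saw during training. That's the knob the rope-freq-base setting in llama.cpp turns.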
Glad you are enjoying the model though! I suggest using min_p in conjunction with temp if you aren't already.
Yeah, I've been playing with rope scaling and I think I've hit the mark. I'm using 32k context with a rope frequency base of 59300.5, and this is the result I'm getting at the moment:
Tokens = 31,024
Characters = 138,298
Which is almost spot on before it starts hallucinating.
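In case anyone wants to reproduce the setup, this is roughly how it looks in llama-cpp-python (just a sketch: the model path and text file are placeholders, and the same settings exist in the llama.cpp CLI as --ctx-size and --rope-freq-base):

```python
from llama_cpp import Llama

# Minimal sketch, assuming llama-cpp-python as the backend.
llm = Llama(
    model_path="your-model.gguf",  # placeholder: your GGUF quant
    n_ctx=32768,                   # extended window (native is 8k)
    rope_freq_base=59300.5,        # the tuned RoPE frequency base
)

# Count tokens in a long text to see how close it sits to the 32k window.
with open("story.txt", "rb") as f:  # placeholder file
    n_tokens = len(llm.tokenize(f.read()))
print(f"{n_tokens} tokens")
```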
I'm currently using 0.2 for the temp. Do you have a suggested range for the min_p?
Very cool!
I suggest using min_p 0.1 or thereabouts. It will let you push the temp up much higher (try 1.0) without losing coherence.
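If it helps, here's roughly what that combo looks like in llama-cpp-python (a sketch, not gospel: the path is a placeholder, and most frontends expose the same two knobs):

```python
from llama_cpp import Llama

llm = Llama(model_path="your-model.gguf", n_ctx=32768, rope_freq_base=59300.5)

out = llm(
    "Once upon a time,",
    max_tokens=256,
    temperature=1.0,  # can run much hotter than 0.2 once min_p prunes the tail
    min_p=0.1,        # drops tokens below 10% of the most likely token's probability
)
print(out["choices"][0]["text"])
```

The intuition: min_p filters relative to the top token, so the cutoff tightens automatically when the model is confident and relaxes when many continuations are plausible, which is why a high temp stays coherent.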