Love this model but I wish the context was higher
This model is great once you find the right temperature to work with. I get much more human prose out of it than from other models under 70B. The only downside is the context limit: 8k is way too small for a writing model these days. I'd love to see a merge of this with something that has a higher context, around 32k. But great job anyway.
Yes, the context size is a bit limiting. There may be options for extending it, like RoPE scaling and self-extend, but tbh I haven't tried either.
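The rough idea behind the frequency-base flavour of RoPE scaling, for anyone curious: RoPE rotates each pair of embedding dimensions by an angle proportional to the position,

$$\theta_i = b^{-2i/d}, \qquad \text{angle at position } m = m\,\theta_i$$

Raising the base $b$ above the training default (commonly 10000) slows the rotations, so positions beyond the native window stay within the angular range the model saw during training. That's the knob the rope-freq-base setting in llama.cpp turns.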
Glad you are enjoying the model though! I suggest using min_p in conjunction with temp if you aren't already.
Yeah, I've been playing with rope scaling and I think I've hit the mark. I'm using 32k context with a rope frequency base of 59300.5, and this is the result I'm getting at the moment:
Tokens = 31,024
Characters = 138,298
Which is almost spot on before it starts hallucinating.
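In case anyone wants to reproduce the setup, this is roughly how it looks in llama-cpp-python (just a sketch: the model path and text file are placeholders, and the same settings exist in the llama.cpp CLI as --ctx-size and --rope-freq-base):

```python
from llama_cpp import Llama

# Minimal sketch, assuming llama-cpp-python as the backend.
llm = Llama(
    model_path="your-model.gguf",  # placeholder: your GGUF quant
    n_ctx=32768,                   # extended window (native is 8k)
    rope_freq_base=59300.5,        # the tuned RoPE frequency base
)

# Count tokens in a long text to see how close it sits to the 32k window.
with open("story.txt", "rb") as f:  # placeholder file
    n_tokens = len(llm.tokenize(f.read()))
print(f"{n_tokens} tokens")
```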
I'm currently using 0.2 for the temp. Do you have a suggested range for the min_p?
Very cool!
I suggest using min_p 0.1 or thereabouts. It will let you push the temp up much higher (try 1.0) without losing coherence.
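If it helps, here's roughly what that combo looks like in llama-cpp-python (a sketch, not gospel: the path is a placeholder, and most frontends expose the same two knobs):

```python
from llama_cpp import Llama

llm = Llama(model_path="your-model.gguf", n_ctx=32768, rope_freq_base=59300.5)

out = llm(
    "Once upon a time,",
    max_tokens=256,
    temperature=1.0,  # can run much hotter than 0.2 once min_p prunes the tail
    min_p=0.1,        # drops tokens below 10% of the most likely token's probability
)
print(out["choices"][0]["text"])
```

The intuition: min_p filters relative to the top token, so the cutoff tightens automatically when the model is confident and relaxes when many continuations are plausible, which is why a high temp stays coherent.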