Stheno 3.3?

#2 opened by djuna

Is there any reason not to use this model? It has a 32K context window. It should be better, right?

The 32K context comes from artificial context extension rather than continued pre-training at 32K. Methods like this usually come with a significant performance trade-off for the additional context, and Stheno 3.3 was degraded far too much for me to even consider using it.
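
To illustrate: Llama-3 is pre-trained at an 8K context, so a "32K" window is usually claimed by stretching the RoPE position embeddings by 4x instead of doing further training at 32K. A minimal sketch of what that looks like in transformers (assuming linear scaling, which may not be exactly what Stheno 3.3 used):

```python
# Illustrative sketch only, NOT Stheno 3.3's actual recipe.
# Stretching the rotary embeddings gives a nominal 32K window with no
# continued pre-training, which is where the quality trade-off comes from.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",           # stand-in model id
    rope_scaling={"type": "linear", "factor": 4.0},  # 8K x 4 = 32K (assumed)
    max_position_embeddings=32768,
)
```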

I went back and re-evaluated the models I've been using, tried out new ones I've found, and took a second look at other merge methods. DARE TIES seems like a good option, and in particular, mixing in Llama-3-Spellbound-Instruct-8B-0.3 has been giving very, very good results in my testing. I've been pretty busy recently with personal stuff, but I'll probably release a new merge in about a week.
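
For anyone curious, a DARE TIES merge config in mergekit looks roughly like this (just a sketch: the base model, the second model, and all weights/densities below are placeholders, not my actual recipe):

```yaml
# Rough DARE TIES sketch, not the actual recipe; values are placeholders.
merge_method: dare_ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct   # assumed base
models:
  - model: Llama-3-Spellbound-Instruct-8B-0.3     # org prefix omitted
    parameters:
      weight: 0.5
      density: 0.6
  - model: some-org/another-roleplay-model        # placeholder
    parameters:
      weight: 0.5
      density: 0.6
dtype: bfloat16
```

The `density` parameter controls what fraction of each model's delta weights DARE keeps before rescaling the rest, which is what cuts down interference between the merged models.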

My current schedule is this: release Stroganoff-3.0, potentially my last roleplay-focused Llama-3 merge if I'm satisfied with it; work on a story-writing-focused Llama-3 merge; and then finally release a new Llama-3 4x8B MoE.

Have you used Stroganoff 1 or 2 yet? If so, what did you think of them?

Thanks for your work! I like the writing, and the characters don't talk for too long. But sometimes it remembers things and sometimes it doesn't; it's kinda random. I use 0.3 as the temperature and 1 as top-p.

Since the creator asked for thoughts, I figured I'd butt in and share mine. As recommended, I tested the model with tail-free sampling at 0.85 and the rest of the samplers at their defaults.

The writing is great, most noticeable in the model's first reply. Sadly, the repetition kills it over time, and it takes a lot of work regenerating/editing responses to remove it. The model tends to repeat a certain text structure while just swapping in each character's mannerisms. This is most noticeable in group chats: characters start saying the same thing, just worded differently according to their personalities. That it adheres so well to each character's personality is great, but the repetition really wrecks a lot of it.

Models like Stheno v3.2 (I haven't tested v3.3) or your Lunar-Stheno merge often give completely new responses when a message is regenerated, usually fully lacking repetition. Sadly, your model often regenerates the same kind of response and rarely takes a new course of action (except in the first message), which I dislike, since seeing different paths across regenerations is the most interesting part. Your v2 suffers greatly from this repetition issue, and v1 does too, to a degree. I really hope you can fix this in v3, ideally to the point that regenerating easily produces a different kind of response. Still, this series of merges feels like it's on the brink of a 'revolution' akin to MythoMax from the Llama-2 days.
Edit: For clarity, I tested this quantization of the model: https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-2.0-i1-GGUF -- so I'm unsure whether these issues are present in the unquantized version.
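
(If anyone wants to reproduce the setups mentioned in this thread, here's roughly what they look like in llama-cpp-python; the filename is a placeholder for whichever quant you grab from the repo above:)

```python
# Sketch of the sampling setups discussed in this thread.
from llama_cpp import Llama

llm = Llama(model_path="Llama-3-8B-Stroganoff-2.0.i1-Q4_K_M.gguf",  # placeholder
            n_ctx=8192)

# Tail-free sampling at 0.85 with everything else at defaults:
out = llm("Write the character's next reply.", tfs_z=0.85, max_tokens=256)

# The temperature 0.3 / top-p 1.0 setup mentioned earlier in the thread:
out = llm("Write the character's next reply.", temperature=0.3, top_p=1.0,
          max_tokens=256)
print(out["choices"][0]["text"])
```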

Thank you for the input! I didn't see much repetition in v1, but v2 definitely has a big issue with it. I tracked the source down to LLAMA-3_8B_Unaligned_Alpha, so I won't be including it in the future.

What I've generally seen is that including story-writing models cuts back on repetition. Llama-3-Spellbound-Instruct-8B-0.3 immediately gave completely uncensored stories in my testing, so I'll be using it as the base model and possibly including other story-writing models in the merge; hopefully the result will have almost no repetition while still giving diverse responses.
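
(Relative to the sketch earlier in the thread, that just means promoting Spellbound to `base_model` in the config; producing the merge is then a one-liner with mergekit's CLI, filenames hypothetical:)

```bash
# Hypothetical paths; assumes `pip install mergekit`.
mergekit-yaml stroganoff-3.yml ./Llama-3-8B-Stroganoff-3.0 --cuda
```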

Good luck! Really looking forward to seeing how far you can push this, especially since it's on the lower end specs-wise!

> ...possibly including other story-writing models in the merge...

Will you consider llama-3-Nephilim-v3-8B, or its SPPO+SimPO merge?

I just tried Nephilim-v3 and didn't really like it: the writing is super dry and predictable, it forces parts of the character card in where they don't make sense, and its main selling point, coherency, was worse than in the set of models I'm currently merging.

I see... I understand your point.
