Questions, again.
Hi mate, hope you don't mind me asking you a few questions again. How have you found this model compared to, say, Command-R or the new Mistral Small? What sampler settings are you using? I've been really impressed with the new Mistral if you haven't tried it.
I also have tried one of brucethemoose's fiction quants of one of his merges. 200K context. Worth a try if you haven't.
> I've been really impressed with the new Mistral if you haven't tried it.
It's fine up to 24K, but Mistral Small 22B breaks down past roughly 32K tokens in my tests; it was totally incoherent at 80K.
I just tested this quant moments ago on 110K of story tokens (more like 135K with Mistral's tokenizer, maybe 120K with Cohere's), and it's... great. It's very smart.
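If you want to sanity-check token counts across tokenizers yourself, something like this works. The model IDs are just examples, and both repos are gated on Hugging Face, so you need access and a login token first:

```python
# Rough cross-tokenizer token count -- model IDs are examples only;
# both repos are gated on Hugging Face (accept the license + log in).
from transformers import AutoTokenizer

text = open("story.txt", encoding="utf-8").read()

for model_id in ("mistralai/Mistral-Small-Instruct-2409",
                 "CohereForAI/c4ai-command-r-08-2024"):
    tok = AutoTokenizer.from_pretrained(model_id)
    print(f"{model_id}: {len(tok(text).input_ids)} tokens")
```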
First, let me note that this is a base model I quantized, so the formatting is very different from an instruct model's. It works best with long continuations of "raw" text, like a novel.
...With that being said, I believe this could replace the new Command-R (or my Star-Lite merge). It's really smart, though I need to test it more to be sure.
> I also have tried one of brucethemoose's fiction quants of one of his merges. 200K context. Worth a try if you haven't.
Some of the "story" quants like this are actually very broken, and brucethemoose acknowledges this in some of the model cards. I've also found that exllama is very sensitive to anything outside its default quantization context length/calibration data, and that the defaults tend to be best.
Anyway, this seems way smarter than any of those old Yi models. Honestly, the newer Yi 200K base model was better at mega context than them anyway.
> What sampler settings are you using?
Oh, and I'm not sure. It seems to like something around 0.6 DRY, 1.01 rep penalty, 0.05 temp to start (with temperature last), and 0.09 MinP (maybe even higher if Chinese characters start coming in), but I have literally just started using it.
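If it helps, here's roughly how those settings translate into an API request. This is a minimal sketch assuming text-generation-webui's OpenAI-compatible completions endpoint on its default port; the values are just my starting guesses:

```python
# Minimal sketch: sending the sampler settings above to a local
# text-generation-webui instance via its OpenAI-compatible API.
# Endpoint/port assume the default --api setup; values are starting guesses.
import requests

payload = {
    "prompt": "The story so far: ...",  # raw continuation, no instruct template
    "max_tokens": 512,
    "temperature": 0.05,         # very low to start
    "temperature_last": True,    # apply temperature after the other samplers
    "min_p": 0.09,               # raise it if Chinese characters creep in
    "repetition_penalty": 1.01,
    "dry_multiplier": 0.6,       # DRY anti-repetition
}

r = requests.post("http://127.0.0.1:5000/v1/completions",
                  json=payload, timeout=600)
print(r.json()["choices"][0]["text"])
```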
Obviously take everything I say with a grain of salt lol, I could be holding stuff wrong.
Thanks for replying and helping me out. I need something, ideally, that is good at picking out things in the world info and context at a decent length. I'm using low-BPW Miqu models at the moment; they are great, but at low sizes they can be a bit hit and miss. Some of his quants are broken, but he does have some that work, and they are decent enough. I am hoping I can tune a model on a few stories to get a similar effect, as that is all he has done, really. So far I have had zero luck with that, since text-generation-webui's Training_pro broke on Windows.
Training a 34B model on 24GB to pick up the style of a story is hard, especially on Windows. I've tried with InternLM (which is much easier/more doable) using unsloth, but my results are still mixed. You pretty much have to use Linux + cloud hardware.
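For what it's worth, the general shape of a QLoRA style-tuning run with unsloth looks like the sketch below. The model ID, data file, and hyperparameters are placeholders, not a recipe I've validated:

```python
# QLoRA style-tuning sketch with unsloth. Model ID, data file, and
# hyperparameters are placeholders -- adjust for whatever base you tune.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.3-bnb-4bit",  # placeholder base model
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # 4-bit QLoRA so it fits in 24 GB
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing=True,
)

# The "text" loader yields one example per line; chunking long prose
# into max_seq_length pieces is left out for brevity.
dataset = load_dataset("text", data_files={"train": "my_story.txt"})["train"]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-4,
        output_dir="style-lora",
    ),
)
trainer.train()
```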
But base models like this are very good at "picking up" writing styles and referencing content if you're just looking to continue a story.
I have considered training the new Mistral model, but it seems like it has limited context from what you have said. I don't write stories that way; unfortunately, writing isn't my strong point. So what I do is instruct it to write a scene based on what I want to happen. I keep each character's personality and speech patterns in the world info, with keys to trigger it (such as their names), and the context/memory is a summary of the story so far. I'm basically making a visual novel in Ren'Py. Rendering all the assets, writing the story, etc. is a massive undertaking, so I'm using LLMs to speed it up and hide my deficiencies, lol.
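Roughly, that keyed world-info setup boils down to something like this toy sketch. The entry format, characters, and injection logic are simplified illustrations, not any particular frontend's actual code:

```python
# Toy illustration of keyed world-info injection: an entry fires only when
# one of its trigger keys appears in the recent context. Field names and
# characters are made up; real frontends (SillyTavern, etc.) differ.
WORLD_INFO = [
    {"keys": ["Alice"], "content": "Alice: sarcastic, short clipped sentences."},
    {"keys": ["Bram", "the smith"], "content": "Bram: gruff blacksmith, heavy dialect."},
]

def build_prompt(summary: str, recent_scene: str, instruction: str) -> str:
    triggered = [
        entry["content"]
        for entry in WORLD_INFO
        if any(key.lower() in recent_scene.lower() for key in entry["keys"])
    ]
    # Summary first, then triggered lore, then the scene and the instruction.
    return "\n\n".join(
        ["Story so far: " + summary, *triggered, recent_scene, instruction]
    )

print(build_prompt(
    summary="Alice and Bram have just arrived in the capital.",
    recent_scene='Alice pushed open the forge door. "Bram?"',
    instruction="Write the next scene, where Bram reacts.",
))
```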