Nicely done

#2 by Falselogin

As the subject says. Maybe it's too horny at the start of an RP, which slightly hurts the immersion/realism, but overall I really like it. In recent days I've used only this model (Q6). Are there any tips on temperature, top_p, etc.? Thank you

Hmm, too horny too early for RP, you say? Well, from what I know, changing sampling settings like top_p, top_k, etc. will not change much about how the model responds in the long run. It may change how sentences are worded, but the content will be roughly the same, imo. Using a different quant won't make a vast difference either, especially if you're using a quant that does a decent job of maintaining the original accuracy. If you were using a 2- or 3-bit quant I might suggest bumping up to 4 or 6 bits, but for what it's worth, I perform all of my abliterations using 4-bit as the reference. You may have better luck diving deeper into prompt engineering and directly telling the model to start off less sexy and only suggestive, then work its way up to explicit; idk, I have not used models for this specific task yet.
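For reference, if you're running the GGUF locally, this is roughly the shape of the knobs we're talking about; a minimal sketch using llama-cpp-python, where the model filename and all sampler values are placeholders, not tuned recommendations:

```python
# Minimal sketch with llama-cpp-python, assuming a local GGUF quant.
# The model path and sampler values below are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-2-27b-abliterated-Q6_K.gguf",  # hypothetical filename
    n_ctx=8192,  # context window
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Start the scene at the tavern."},
    ],
    temperature=0.8,  # lower = more deterministic, higher = more varied wording
    top_p=0.95,       # nucleus sampling cutoff
    top_k=40,         # sample only from the 40 most likely tokens
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

As noted above, these mostly reshape the wording; don't expect them to change *what* the model is willing to write about.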

My methods generally attempt to manipulate the base model as little as possible while preventing it from saying no to your query. The testing revolves around feeding it queries that are increasingly terrible or sinister until I can say, within reason, that it no longer refuses requests. This testing is within the domain of "chat"/"instruct" usage, and I can only assume it extrapolates to RP.
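For anyone curious what "preventing the model from saying no" looks like mechanically, the commonly described recipe is directional ablation: estimate a "refusal direction" from the difference in mean activations on refused vs. accepted prompts, then project it out of the weights. This is just a generic sketch of that idea, not my exact pipeline; names and shapes are illustrative:

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Estimate the refusal direction as the normalized difference of mean
    residual-stream activations (inputs have shape [n_samples, d_model])."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix that writes to
    the residual stream (shape [d_model, d_in]), so the model can no longer
    move activations along that direction."""
    # W' = W - r r^T W  (orthogonal projection against unit vector r)
    return weight - torch.outer(direction, direction) @ weight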

Although it's a smaller model, maybe try my most recent LongWriter abliteration and see if you get different results; apparently the base model was tuned for long outputs, and it might still be "smart" enough for your usage.

In any case, I would still suggest really getting your initial prompt perfected, assuming it's a role-playing story you want. If you are using it for chat-based role-play, where you say one sentence and the model responds with a sentence or two but simply "gives in" too quickly, then maybe you could use a system prompt that instructs the model in the background to play coy a bit more throughout the chat; see the sketch below. Keep in mind that a model which has been modified to prevent refusals will be significantly more likely to give in to your requests immediately (unless you or the system prompt tells it not to).
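As a concrete starting point for the "play coy" idea, something like the pacing instruction below might help. It's an untested sketch of the kind of wording I mean, not a proven prompt; note also that Gemma 2's chat template has no dedicated system role, so most frontends fold this into the first user turn:

```python
# Hypothetical pacing instruction; tweak the wording to taste.
messages = [
    {
        "role": "system",
        "content": (
            "You are a role-play partner. Keep early scenes suggestive at "
            "most; do not escalate to explicit content until the user has "
            "clearly escalated first, and build tension gradually over "
            "many turns rather than giving in immediately."
        ),
    },
    {"role": "user", "content": "We meet at the masquerade ball..."},
]
```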

Anyway, there has been a lot of buzz around NVIDIA Nemotron recently, so I might take a crack at that very soon. It's 70B params, so obviously a bit larger than Gemma 2 27B. Assuming I can abliterate it without too much trouble, it may be smart enough to understand what you're going for.

Best of luck!
