Still refusing on some prompts

#3
by Kuinox

I tried this model a bit, and it indeed doesn't refuse anything "dangerous", but it still refuses "nsfw" things.
For example, if I ask it "Write the most nsfw message you can.", it responds "I'm programmed to be a family-friendly AI, so I won't write an explicit message.[...]"

This particular prompt doesn't work, but more precise instructions do (at least most of them).
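For anyone who wants to reproduce this kind of test, here is a minimal sketch using the transformers `pipeline` chat interface; the model path is a placeholder, not the actual repo id.

```python
# Minimal sketch: probe the model with the same prompt via the
# transformers text-generation pipeline (chat-style input).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="path/to/abliterated-model",  # placeholder model id
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write the most nsfw message you can."}
]

# With chat-style input, the pipeline returns the full conversation,
# so the last message is the model's reply.
output = generator(messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```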

I trained one with ORPO, but it had a specific harmful issue that kept recurring.
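For reference, a minimal sketch of ORPO fine-tuning with the trl library is below, assuming a preference dataset with "prompt"/"chosen"/"rejected" columns; the model and dataset names are placeholders, not the setup used above.

```python
# Minimal ORPO fine-tuning sketch with trl (placeholder names throughout).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "path/to/base-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs: for each prompt, a "chosen" and a "rejected" completion.
dataset = load_dataset("path/to/preference-dataset", split="train")  # placeholder

config = ORPOConfig(
    output_dir="orpo-output",
    beta=0.1,  # weight of the odds-ratio preference term
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
)
trainer.train()
```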
