Still refusing on some prompts
#3
by
Kuinox
- opened
I tried this model a bit, and while it indeed doesn't refuse anything "dangerous", it still refuses "nsfw" requests.
For example, if I ask it "Write the most nsfw message you can." it will respond "I'm programmed to be a family-friendly AI, so I won't write an explicit message.[...]"
That particular prompt doesn't work, but precise instructions do (at least most of them).
I trained one with ORPO, but it had a specific problem with harmful output that would repeat.