Thank you!
Thanks for uploading quants of my model! I've updated my model card to include links.
I even tried it out immediately once I had some imatrix quants! Your model card really sold it to me :) In the end... I still found it too horny, to be honest, and it often refused by giving me "end of story"-style wrap-up comments, but as far as Llama-3-8B-based models go, this one is really good. Please continue making models :)
For refusals, I found that most of the censorship behavior relates to the predefined roles in both ChatML and Llama-3-Instruct. You can find a bit more information in this PR I made on SillyTavern: https://github.com/SillyTavern/SillyTavern/pull/2148
Essentially, all you do is replace the predefined roles with the user and character names, and the model opens up a lot more. It also comes with the added benefit of a clean way of handling names without messing with the prompts. If you have the latest stable release of SillyTavern, you can just select the -Names variant. I haven't had the model try to disengage when using these variants, though I have been using the Q8_0, so I don't know how the imatrix quants behave with it.
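For anyone curious what the substitution looks like in practice, here is a minimal sketch of the idea in ChatML-style delimiters. The helper function and the participant name "Alice" are hypothetical, just to illustrate the difference between the predefined roles and the -Names variants:

```python
def chatml_turn(role: str, text: str) -> str:
    """Wrap one message in ChatML-style delimiters."""
    return f"<|im_start|>{role}\n{text}<|im_end|>\n"

# Predefined role, as in standard ChatML (more likely to trigger
# refusals or "end of story" wrap-ups in my experience):
default_prompt = chatml_turn("user", "Continue the story.")

# Same turn with the participant's name in place of the role
# ("Alice" is a made-up name here, not anything from the template):
named_prompt = chatml_turn("Alice", "Continue the story.")

print(default_prompt)
print(named_prompt)
```

The rest of the template stays the same; only the role token changes.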
I'll definitely keep experimenting with more merges in the future.
I'm not using it in chat mode; I'm using it to generate stories with a somewhat complex prompt, which might explain the difference in performance.