This model surprised me.
I usually start by checking how a model performs on general logic (simple puzzles). This model did poorly in those tests, worse even than the base llama 3 8b, so I didn't expect it to be any different in roleplay. Generally, I've noticed a strong correlation between a model's logic skills and its ability to stick to the plot or understand character cards.
But that wasn't the case here. Your model picks up information from the script or character cards and actually uses it... and it's consistent about it. The prose isn't on the level of the big models, but it's consistent and immersive. A very interesting model for just 12B... it practically begs for an MoE version and a little extra training to polish it up.
Thank you for your work, it's a great model for roleplaying and storytelling.
Thanks for testing and for the praise! While we currently have no plans for MoE training, we might reconsider in the future!