frankenstein model inception

#1
by gileneo - opened

Am I understanding this correctly? Is this two identical copies of sonya-7b merged into one 11B model using mergekit, and then replicated 8 times to create an MoE?
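For a rough sense of the sizes involved, here is a back-of-the-envelope parameter count. It is only a sketch under assumptions that are not stated in this thread: Mistral-7B-style layer shapes for the base model, a 48-layer stack for the "11B" merge (a common passthrough recipe, not confirmed for this model), and a Mixtral-style MoE where only the MLP blocks are replicated per expert while attention is shared. All function names are made up for illustration.

```python
# All shapes below are ASSUMED (Mistral-7B-style); they are not taken from this model's config.
HIDDEN = 4096
INTERMEDIATE = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB = 32000


def attention_params():
    """Weights in one grouped-query attention block (no biases)."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = head_dim * N_KV_HEADS
    # q and o projections are HIDDEN x HIDDEN; k and v are HIDDEN x kv_dim
    return 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim


def mlp_params():
    """Weights in one gated MLP block: gate, up and down projections."""
    return 3 * HIDDEN * INTERMEDIATE


def shared_params():
    """Input embeddings, output head and the final norm weight."""
    return 2 * VOCAB * HIDDEN + HIDDEN


def dense_params(n_layers):
    """Total parameters of a dense decoder with n_layers layers."""
    per_layer = attention_params() + mlp_params() + 2 * HIDDEN  # two norm weights per layer
    return n_layers * per_layer + shared_params()


def moe_params(n_layers, n_experts):
    """Total parameters when each layer holds n_experts MLP copies plus a router."""
    router = HIDDEN * n_experts
    per_layer = attention_params() + n_experts * mlp_params() + router + 2 * HIDDEN
    return n_layers * per_layer + shared_params()


if __name__ == "__main__":
    for label, count in [
        ("7B base, 32 layers", dense_params(32)),
        ("stacked merge, 48 layers (assumed)", dense_params(48)),
        ("8-expert MoE on the 48-layer stack (assumed)", moe_params(48, 8)),
    ]:
        print(f"{label}: {count / 1e9:.2f}B parameters")
```

Under these assumptions the 48-layer stack lands at roughly 10.7B parameters, which is where an "11B" label would come from. The MoE total is less than 8 times that, because attention and embeddings are shared across experts.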

That's frankenstein model inception 😬

That's exactly it, according to dillfrescott.

I didn't test it, but if you did, how do these types of frankenmerges perform compared with, for example, the 11B version? They don't have any additional data, so it should be like the base sonya but a little more intelligent.

I tried it, but it was bad for my use case... so maybe it's good, but not universally.
