Is this a merge?
#3 opened by mrfakename
Hi,
Is this a merge or a pretrained model?
At its core, GemMoE comprises 8 separately fine-tuned Gemma models, with 2 experts per token.
Thanks! I assume this means each expert was fine-tuned separately, then merged?
I combine them using a hidden gate with a heavily modified version of mergekit, a tool developed by the brilliant Charles Goddard.
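For anyone curious what "8 experts, 2 per token" means in practice, here is a minimal sketch of top-2 routing over 8 experts with a standard softmax gate. This is illustrative only and assumes a generic MoE layer; it is not the actual GemMoE or mergekit code, and the `Top2MoE` class and its expert MLPs are hypothetical stand-ins for the fine-tuned Gemma experts.

```python
# Minimal sketch of top-2 routing over 8 experts (illustrative, not GemMoE's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gate maps each token's hidden state to a score per expert.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Each "expert" stands in for one fine-tuned Gemma MLP (hypothetical).
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = F.softmax(self.gate(x), dim=-1)              # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)    # pick 2 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True) # renormalize the top-2 weights

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    # Weighted sum of the selected experts' outputs for these tokens.
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

The sketch only shows the routing math at inference time; how the gate weights are actually initialized when merging separately fine-tuned models is handled by the modified mergekit mentioned above.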
Ah, makes sense. Thanks for the clarification!!
mrfakename changed discussion status to closed