About Merging
#5 by odysseusq
Hi there, I'm new to MoE. Is this MoE model created by directly merging the FFN layers from Meta-Llama-3-8B-Instruct, Llama3-8B-OpenHermes-DPO, etc.? If so, how are the gating layers designed so that different experts handle different positive prompts?
(Sorry if this sounds silly; I'm quite new to this area.)
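To make my question concrete, here is a rough Python sketch of what I imagine the gate initialization could look like: one gate vector per expert, taken from the hidden states of that expert's positive prompts, with tokens routed by similarity to those vectors. Everything here (the averaging, the shapes, the top-k routing) is just my own guess, not the actual implementation.

```python
# Toy sketch (my own guess, not this repo's actual method) of a router
# initialized from "positive prompt" hidden states: each expert's gate
# vector is the mean hidden state of its positive prompts, and tokens
# are routed by similarity to those vectors.

import torch
import torch.nn.functional as F

hidden_dim = 4096   # Llama-3-8B hidden size
num_experts = 4
top_k = 2

# Stand-ins for per-expert positive-prompt hidden states
# (in practice these would come from running the base model on the prompts).
prompt_hiddens = [torch.randn(10, hidden_dim) for _ in range(num_experts)]

# One gate vector per expert: the mean hidden state of its positive prompts.
gate_weight = torch.stack([h.mean(dim=0) for h in prompt_hiddens])  # (E, D)

def route(hidden_states: torch.Tensor) -> torch.Tensor:
    """Return per-token routing weights of shape (tokens, num_experts)."""
    logits = hidden_states @ gate_weight.T              # similarity to each expert
    topk_vals, topk_idx = logits.topk(top_k, dim=-1)    # keep only the top-k experts
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, topk_idx, F.softmax(topk_vals, dim=-1))
    return weights

# Example: route 5 tokens.
tokens = torch.randn(5, hidden_dim)
print(route(tokens))
```

Is this roughly the idea, or is the gating trained/derived some other way?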