About Merging
#5 by odysseusq
Hi there, I'm new to MoE. Is this MoE model created by directly merging the FFN layers from Meta-Llama-3-8B-Instruct, Llama3-8B-OpenHermes-DPO, etc.? If so, how are the gating layers designed so that different experts handle different positive prompts?
(Sorry if this sounds silly; I'm quite new to this area.)
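To make my question concrete, here is a rough Python sketch of what I imagine the gate initialization could look like: one gate vector per expert, taken from the hidden states of that expert's positive prompts, with tokens routed by similarity to those vectors. Everything here (the averaging, the shapes, the top-k routing) is just my own guess, not the actual implementation.

```python
# Toy sketch (my own guess, not this repo's actual method) of a router
# initialized from "positive prompt" hidden states: each expert's gate
# vector is the mean hidden state of its positive prompts, and tokens
# are routed by similarity to those vectors.

import torch
import torch.nn.functional as F

hidden_dim = 4096   # Llama-3-8B hidden size
num_experts = 4
top_k = 2

# Stand-ins for per-expert positive-prompt hidden states
# (in practice these would come from running the base model on the prompts).
prompt_hiddens = [torch.randn(10, hidden_dim) for _ in range(num_experts)]

# One gate vector per expert: the mean hidden state of its positive prompts.
gate_weight = torch.stack([h.mean(dim=0) for h in prompt_hiddens])  # (E, D)

def route(hidden_states: torch.Tensor) -> torch.Tensor:
    """Return per-token routing weights of shape (tokens, num_experts)."""
    logits = hidden_states @ gate_weight.T              # similarity to each expert
    topk_vals, topk_idx = logits.topk(top_k, dim=-1)    # keep only the top-k experts
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, topk_idx, F.softmax(topk_vals, dim=-1))
    return weights

# Example: route 5 tokens.
tokens = torch.randn(5, hidden_dim)
print(route(tokens))
```

Is this roughly the idea, or is the gating trained/derived some other way?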