Bui Van Hop

hllj

AI & ML interests

Computer Vision, Deep Learning, NLP

Organizations

Vietnamese VLM, Hugging Face Discord Community

hllj's activity

updated a collection 3 months ago
upvoted an article 4 months ago
Reacted to kenshinn's post with ❤️ 4 months ago
Sparse MoE (SMoE) has an unavoidable drawback: the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the number of experts.

Identifying optimal hyper-parameters without a sufficient number of ablation studies is also challenging. As model sizes continue to grow, this limitation can waste significant computational resources and, in turn, hinder the efficiency of training MoE-based models in practice.

Now, our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate;

(2) an adaptive process that automatically adjusts the number of experts during training.

Extensive numerical results across vision, language, and vision-language tasks show that DynMoE achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while remaining efficient by activating fewer parameters.
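To make the gating idea concrete, here is a minimal, illustrative sketch of token-adaptive ("top-any") gating, where each token activates every expert whose gate score exceeds a learnable per-expert threshold, so the number of active experts varies per token. The class name, threshold parameterization, and fallback rule below are assumptions for illustration only, not the official DynMoE implementation; see the linked repository for the actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopAnyGate(nn.Module):
    """Illustrative token-adaptive gate: experts are activated per token
    whenever their gate score exceeds a learnable per-expert threshold."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Learnable per-expert activation thresholds (hypothetical parameterization).
        self.thresholds = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_dim)
        scores = torch.sigmoid(self.router(x))              # (num_tokens, num_experts)
        mask = scores > torch.sigmoid(self.thresholds)      # experts each token activates
        # Guarantee at least one expert per token by falling back to the top-scoring expert.
        fallback = F.one_hot(scores.argmax(dim=-1), scores.size(-1)).bool()
        mask = mask | (~mask.any(dim=-1, keepdim=True) & fallback)
        weights = scores * mask                              # zero out inactive experts
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return weights, mask                                 # routing weights, active-expert mask

# Example: different tokens may activate different numbers of experts.
gate = TopAnyGate(hidden_dim=16, num_experts=4)
tokens = torch.randn(8, 16)
weights, mask = gate(tokens)
print(mask.sum(dim=-1))  # experts activated per token (varies across tokens)
```

Because the thresholds are learnable, the effective top-k emerges during training rather than being fixed as a hyper-parameter, which is the behavior the post describes.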

Our code is available at https://github.com/LINs-lab/DynMoE; see also the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a.