
Model Card for $\gamma$-MoD

$\gamma$-MoD is a plug-and-play approach for improving the computational efficiency of Multimodal Large Language Models (MLLMs) by replacing redundant dense transformer layers with Mixture-of-Depth (MoD) layers, significantly reducing computational cost while maintaining performance.
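The core idea of an MoD layer can be sketched as follows: a lightweight router scores each token, only the top-k tokens actually pass through the layer's computation, and the remaining tokens are carried forward unchanged via the residual path. This is an illustrative NumPy sketch, not the released implementation; the router weights, capacity ratio, gating scheme, and `layer_fn` placeholder are all assumptions for the example.

```python
import numpy as np

def mod_layer(tokens, router_w, layer_fn, capacity=0.5):
    """Mixture-of-Depth layer sketch: only top-k tokens are processed.

    tokens:   (seq_len, dim) input token features
    router_w: (dim,) router projection (placeholder weights)
    layer_fn: the dense layer computation (e.g. attention + MLP)
    capacity: fraction of tokens routed through the layer
    """
    scores = tokens @ router_w                   # one routing score per token
    k = max(1, int(capacity * len(tokens)))      # layer capacity in tokens
    keep = np.argsort(scores)[-k:]               # indices of routed tokens
    out = tokens.copy()                          # skipped tokens pass through unchanged
    gate = 1 / (1 + np.exp(-scores[keep]))[:, None]  # soft gate from router score
    out[keep] = tokens[keep] + gate * layer_fn(tokens[keep])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
y = mod_layer(x, rng.standard_normal(4), lambda t: t * 0.1, capacity=0.5)
```

With `capacity=0.5`, only half the tokens incur the layer's compute; this per-token skipping is what distinguishes MoD from parameter-sparsity methods such as Mixture-of-Experts.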

Model Details

Model Description

$\gamma$-MoD introduces a paradigm that reduces the number of activated tokens per layer, offering greater efficiency than existing methods. The approach transforms dense MLLM layers into sparse MoD layers, making MLLMs more accessible and practical in resource-constrained environments. Key features include:

  1. ARank Metric: Measures layer redundancy to guide which dense layers are replaced with MoD layers.
  2. Shared Vision-Language Router: A single router that handles token routing across both modalities.
  3. Masked Routing Learning: Prevents critical tokens from being skipped during model adaptation.
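As its name suggests, the ARank metric appears to quantify redundancy via the rank of a layer's attention maps: a layer whose attention matrices are low-rank contributes little distinct computation and is a good candidate for MoD replacement. The sketch below is an illustrative approximation under that reading; the tolerance and head-averaging scheme are placeholder choices, not the paper's exact definition.

```python
import numpy as np

def arank(attn_maps, tol=1e-2):
    """ARank sketch: average rank of a layer's attention maps.

    attn_maps: (num_heads, seq_len, seq_len) attention matrices of one layer
    tol:       singular-value threshold (placeholder choice)

    A lower average rank suggests a more redundant layer, i.e. a better
    candidate for replacement with a sparse MoD layer.
    """
    ranks = [np.linalg.matrix_rank(a, tol=tol) for a in attn_maps]
    return float(np.mean(ranks))

# A near-uniform attention map is rank-1 (highly redundant layer) ...
uniform = np.full((2, 16, 16), 1 / 16)
# ... while a sharp diagonal attention map is full-rank (informative layer).
diag = np.stack([np.eye(16)] * 2)
assert arank(uniform) < arank(diag)
```

Ranking all layers by such a score gives a principled order in which to convert dense layers to MoD layers, rather than converting them uniformly.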
  • Developed by: Yaxin Luo
  • License: MIT License
  • Finetuned from model: Vicuna-v1.5-7B

Model Sources

  • Paper: https://arxiv.org/abs/2410.13859

Citation

BibTeX:

@misc{luo2024gammamodexploringmixtureofdepthadaptation,
      title={$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models}, 
      author={Yaxin Luo and Gen Luo and Jiayi Ji and Yiyi Zhou and Xiaoshuai Sun and Zhiqiang Shen and Rongrong Ji},
      year={2024},
      eprint={2410.13859},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13859}, 
}