---
license: mit
language:
- en
---
# Model Card for $\gamma$-MoD

$\gamma$-MoD is a novel approach for improving the computational efficiency of Multimodal Large Language Models (MLLMs) by incorporating Mixture-of-Depth (MoD) layers. This plug-and-play strategy replaces redundant dense layers with sparse MoD layers, significantly reducing computational cost while maintaining performance.
## Model Details
### Model Description

$\gamma$-MoD takes a different angle from existing efficiency methods: instead of reducing parameters, it reduces the number of *activated tokens*. Redundant dense MLLM layers are transformed into sparse MoD layers, in which a router selects the tokens worth computing and lets the rest skip the layer, making MLLMs more practical in resource-constrained environments, as sketched below.
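As a rough illustration of this idea, the sketch below shows one way a dense transformer layer could be wrapped in MoD-style routing. This is a minimal PyTorch mock-up based on the description above, not the released implementation; the names `SharedRouter`, `MoDLayer`, and `capacity` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedRouter(nn.Module):
    """Illustrative shared vision-language router: a single linear
    scorer applied to image and text tokens alike."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # One routing score per token, shared across modalities.
        return self.scorer(hidden_states).squeeze(-1)   # (batch, seq)

class MoDLayer(nn.Module):
    """Sketch of a Mixture-of-Depth wrapper: only the top-k scored
    tokens pass through the dense block; the rest take the residual
    shortcut and skip the layer entirely."""
    def __init__(self, block: nn.Module, router: SharedRouter,
                 capacity: float = 0.5):
        super().__init__()
        self.block = block        # the original dense transformer layer
        self.router = router      # shared across all MoD layers
        self.capacity = capacity  # fraction of tokens to activate

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = hidden_states.shape
        k = max(1, int(seq_len * self.capacity))
        scores = self.router(hidden_states)          # (batch, seq)
        topk = scores.topk(k, dim=-1).indices        # tokens to process
        out = hidden_states.clone()
        for b in range(batch):
            idx = topk[b]
            selected = hidden_states[b, idx].unsqueeze(0)
            # Scale the block's contribution by the routing weight so
            # gradients reach the router (one common differentiable
            # routing choice; the paper may differ in detail).
            gate = torch.sigmoid(scores[b, idx]).view(1, -1, 1)
            out[b, idx] = (selected + gate * self.block(selected)).squeeze(0)
        return out
```

Skipped tokens incur no compute in the wrapped block, which is where the efficiency gain comes from.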
Key features include:
1. ARank metric: measures the rank of a layer's attention maps to identify which redundant layers to replace with MoD layers (see the sketch after this list).
2. Shared vision-language router: a single router that scores and routes image and text tokens jointly.
3. Masked routing learning: prevents critical tokens from being skipped during model adaptation.
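
For feature 1, the snippet below sketches how an ARank-style score might be estimated: average the numerical rank of a layer's attention maps over calibration inputs and mark low-rank layers as redundant. The function name and thresholding logic are assumptions for illustration; see the repository for the actual metric.

```python
import torch

def arank(attention_maps: torch.Tensor) -> float:
    """Illustrative ARank estimate: mean matrix rank of one layer's
    attention maps. attention_maps: (num_heads, seq_len, seq_len)."""
    ranks = torch.linalg.matrix_rank(attention_maps)  # rank per head
    return ranks.float().mean().item()

# Layers whose average ARank over a calibration set falls below a
# chosen threshold are treated as redundant and converted to MoD layers.
```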

- **Developed by:** Yaxin Luo
- **License:** MIT License
- **Finetuned from model:** Vicuna-v1.5-7B

### Model Sources 

- **Repository:** https://github.com/Yaxin9Luo/Gamma-MOD
- **Paper:** https://arxiv.org/abs/2410.13859
- **Demo:** https://yaxin9luo.github.io/gamma-mod-webpage/

## Citation

**BibTeX:**
```
@misc{luo2024gammamodexploringmixtureofdepthadaptation,
      title={$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models}, 
      author={Yaxin Luo and Gen Luo and Jiayi Ji and Yiyi Zhou and Xiaoshuai Sun and Zhiqiang Shen and Rongrong Ji},
      year={2024},
      eprint={2410.13859},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13859}, 
}
```