Chat-UniVi committed
Commit ce46844 · Parent(s): d643284
Update README.md

README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoH: Multi-Head Attention as Mixture-of-Head Attention
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages:

@@ -70,4 +70,14 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
 --tasks winogrande \
 --batch_size 1 \
 --output_path Results/winogrande
 ```
+
+## ✏️ Citation
+If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
+```
+@article{jin2024moh,
+  title={MoH: Multi-Head Attention as Mixture-of-Head Attention},
+  author={Peng Jin and Bo Zhu and Li Yuan and Shuicheng Yan},
+  year={2024}
+}
+```
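The overview line in the diff above summarizes the core idea of MoH: treat each attention head as an expert and let a per-token router decide which heads contribute. As a rough illustration only (not the authors' implementation), the sketch below shows one way such head routing could look in PyTorch; the class name `MoHAttentionSketch`, the hyper-parameters, and the omission of shared heads and any load-balancing loss are simplifying assumptions.

```python
# Illustrative sketch of Mixture-of-Head (MoH) attention: a router scores the
# attention heads per token, only the top-k heads are kept, and the output is
# a weighted sum of head outputs instead of a plain concatenation.
# NOTE: this is NOT the released MoH code; shared heads and the load-balancing
# loss from the paper are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoHAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, top_k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.top_k = num_heads, top_k
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.router = nn.Linear(dim, num_heads)  # one score per head, per token
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        # Standard multi-head attention, computed for every head.
        q, k, v = (
            t.reshape(B, T, self.num_heads, self.head_dim).transpose(1, 2)
            for t in self.qkv(x).chunk(3, dim=-1)
        )
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        heads = attn.softmax(dim=-1) @ v            # (B, H, T, head_dim)

        # Head routing: keep the top-k heads per token, renormalize their
        # scores with softmax, and zero out the remaining "inactive experts".
        scores = self.router(x)                     # (B, T, H)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        gate = torch.zeros_like(scores).scatter(
            -1, topk_idx, F.softmax(topk_val, dim=-1)
        )                                           # (B, T, H)

        # Weighted sum of head outputs instead of plain concatenation.
        heads = heads.transpose(1, 2)               # (B, T, H, head_dim)
        out = (heads * gate.unsqueeze(-1)).reshape(B, T, D)
        return self.proj(out)
```

A toy call such as `MoHAttentionSketch(dim=512)(torch.randn(2, 16, 512))` returns a tensor of the same shape as the input; only the gating step distinguishes this sketch from standard multi-head attention.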
|