# MoH: Multi-Head Attention as Mixture-of-Head Attention

**Paper or resources for more information:**
[[Paper](https://huggingface.co/papers/2410.11842)] [[Code](https://github.com/SkyworkAI/MoH)]

## ⚡ Overview
We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages: