Chat-UniVi committed
Commit ce46844 · Parent(s): d643284
Update README.md

README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoH: Multi-Head Attention as Mixture-of-Head Attention
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages:

@@ -70,4 +70,14 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
 --tasks winogrande \
 --batch_size 1 \
 --output_path Results/winogrande
 ```
+
+## ✏️ Citation
+If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
+```
+@article{jin2024moh,
+  title={MoH: Multi-Head Attention as Mixture-of-Head Attention},
+  author={Peng Jin and Bo Zhu and Li Yuan and Shuicheng Yan},
+  year={2024}
+}
+```
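The overview line in the diff above summarizes the core idea of MoH: treat each attention head as an expert and let a per-token router decide which heads contribute. As a rough illustration only (not the authors' implementation), the sketch below shows one way such head routing could look in PyTorch; the class name `MoHAttentionSketch`, the hyper-parameters, and the omission of shared heads and any load-balancing loss are simplifying assumptions.

```python
# Illustrative sketch of Mixture-of-Head (MoH) attention: a router scores the
# attention heads per token, only the top-k heads are kept, and the output is
# a weighted sum of head outputs instead of a plain concatenation.
# NOTE: this is NOT the released MoH code; shared heads and the load-balancing
# loss from the paper are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoHAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, top_k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.top_k = num_heads, top_k
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.router = nn.Linear(dim, num_heads)  # one score per head, per token
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        # Standard multi-head attention, computed for every head.
        q, k, v = (
            t.reshape(B, T, self.num_heads, self.head_dim).transpose(1, 2)
            for t in self.qkv(x).chunk(3, dim=-1)
        )
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        heads = attn.softmax(dim=-1) @ v            # (B, H, T, head_dim)

        # Head routing: keep the top-k heads per token, renormalize their
        # scores with softmax, and zero out the remaining "inactive experts".
        scores = self.router(x)                     # (B, T, H)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        gate = torch.zeros_like(scores).scatter(
            -1, topk_idx, F.softmax(topk_val, dim=-1)
        )                                           # (B, T, H)

        # Weighted sum of head outputs instead of plain concatenation.
        heads = heads.transpose(1, 2)               # (B, T, H, head_dim)
        out = (heads * gate.unsqueeze(-1)).reshape(B, T, D)
        return self.proj(out)
```

A toy call such as `MoHAttentionSketch(dim=512)(torch.randn(2, 16, 512))` returns a tensor of the same shape as the input; only the gating step distinguishes this sketch from standard multi-head attention.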
|