Chat-UniVi committed ce46844 (1 parent: d643284)

Update README.md

Files changed (1): README.md (+11 -1)
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # MoH: Multi-Head Attention as Mixture-of-Head Attention
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/MoE-plus-plus)]
+[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages:
@@ -70,4 +70,14 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
 --tasks winogrande \
 --batch_size 1 \
 --output_path Results/winogrande
 ```
+
+## ✏️ Citation
+If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
+```
+@article{jin2024moh,
+  title={MoH: Multi-Head Attention as Mixture-of-Head Attention},
+  author={Peng Jin and Bo Zhu and Li Yuan and Shuicheng Yan},
+  year={2024}
+}
+```
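
For reference, the Overview in this README describes attention heads being routed like MoE experts. Below is a minimal PyTorch sketch of that idea: a per-token linear router scores the heads and only the top-k contribute, weighted by a softmax gate. The class name, hyperparameters, and routing details here are illustrative assumptions for the sketch, not the released SkyworkAI/MoH implementation (see the linked repo for the actual code).

```python
# Minimal sketch of the MoH idea: attention heads routed like MoE experts.
# Illustrative only -- names and routing details are assumptions, not the
# released SkyworkAI/MoH code.
import torch
import torch.nn as nn


class MoHAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, top_k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.top_k = top_k
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Router scores every head for every token, like a gate over experts.
        self.router = nn.Linear(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        heads = attn.softmax(dim=-1) @ v              # per-head outputs (B, H, N, d)

        # Keep only the top-k heads per token, weighted by a softmax gate;
        # non-selected heads contribute zero, as in sparse MoE routing.
        scores = self.router(x)                       # (B, N, H)
        topv, topi = scores.topk(self.top_k, dim=-1)  # (B, N, k)
        gate = torch.zeros_like(scores).scatter(-1, topi, topv.softmax(dim=-1))
        heads = heads.permute(0, 2, 1, 3) * gate.unsqueeze(-1)  # (B, N, H, d)
        return self.proj(heads.reshape(B, N, C))


# Usage: shapes only, no trained weights.
y = MoHAttention(dim=256)(torch.randn(2, 16, 256))
print(y.shape)  # torch.Size([2, 16, 256])
```

In this sketch the gate merely zeroes out non-selected heads after they are computed; an efficient implementation would skip the masked heads' attention computation entirely, which is where the routing saves work.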