stereoplegic's Collections
Replacing softmax with ReLU in Vision Transformers
Paper • 2309.08586 • Published • 17

Softmax Bias Correction for Quantized Generative Models
Paper • 2309.01729 • Published • 1

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Paper • 2304.13276 • Published • 1

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Paper • 2306.12929 • Published • 12

Revisiting Softmax Masking for Stability in Continual Learning
Paper • 2309.14808 • Published • 1

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Paper • 2310.14188 • Published • 1

Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention
Paper • 2310.11685 • Published • 1

Interpret Vision Transformers as ConvNets with Dynamic Convolutions
Paper • 2309.10713 • Published • 1

Softmax-free Linear Transformers
Paper • 2207.03341 • Published • 1

Agent Attention: On the Integration of Softmax and Linear Attention
Paper • 2312.08874 • Published • 2

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Paper • 2402.04347 • Published • 13
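The papers in this collection share one theme: replacing or approximating the softmax in attention. As a rough illustration of what is at stake (a generic sketch, not the method of any single paper above), the snippet below contrasts standard softmax attention, which scores every query against every key, with a kernelized "softmax-free" variant in which a positive feature map `phi` replaces the exponential so that per-key sums can be shared across queries. The `phi` used here is an arbitrary illustrative choice.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def softmax_attention(Q, K, V):
    # Standard attention: for each query, softmax over scores against
    # all keys, then a weighted sum of values. O(n^2) in sequence length.
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[d] for wi, v in zip(w, V))
                    for d in range(len(V[0]))])
    return out

def linear_attention(Q, K, V, phi=lambda x: max(x, 0.0) + 1e-6):
    # Kernelized attention: replace exp(q.k) with phi(q).phi(k), so the
    # key/value aggregates S and z are computed once and reused for every
    # query -> O(n) in sequence length. phi is an illustrative positive
    # feature map, not taken from any specific paper.
    fK = [[phi(x) for x in k] for k in K]
    d_k, d_v = len(K[0]), len(V[0])
    # S[i][j] = sum_n phi(k_n)[i] * v_n[j];  z[i] = sum_n phi(k_n)[i]
    S = [[sum(fk[i] * v[j] for fk, v in zip(fK, V)) for j in range(d_v)]
         for i in range(d_k)]
    z = [sum(fk[i] for fk in fK) for i in range(d_k)]
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        denom = sum(fq[i] * z[i] for i in range(d_k))
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out
```

Because the weights in both variants are positive and sum to one, each output row is a convex combination of the value rows; the difference is that the linear variant never materializes the n-by-n attention matrix.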