stereoplegic's Collections
- Scaling MLPs: A Tale of Inductive Bias (arXiv:2306.13575) • 14 upvotes
- Trap of Feature Diversity in the Learning of MLPs (arXiv:2112.00980) • 1 upvote
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics (arXiv:2301.05816) • 1 upvote
- RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality? (arXiv:2108.04384) • 1 upvote
- MetaFormer Is Actually What You Need for Vision (arXiv:2111.11418) • 1 upvote
- One Wide Feedforward is All You Need (arXiv:2309.01826) • 31 upvotes
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (arXiv:2310.10837) • 10 upvotes
- Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (arXiv:2103.03404) • 1 upvote
- A technical note on bilinear layers for interpretability (arXiv:2305.03452) • 1 upvote
- Cross-token Modeling with Conditional Computation (arXiv:2109.02008) • 1 upvote
- Efficient Language Modeling with Sparse all-MLP (arXiv:2203.06850) • 1 upvote
- MLP-Mixer as a Wide and Sparse MLP (arXiv:2306.01470) • 1 upvote
- Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (arXiv:2311.10642) • 23 upvotes
- Exponentially Faster Language Modelling (arXiv:2311.10770) • 118 upvotes
- Linear Self-Attention Approximation via Trainable Feedforward Kernel (arXiv:2211.04076) • 1 upvote
- On the Universality of Linear Recurrences Followed by Nonlinear Projections (arXiv:2307.11888) • 1 upvote
- HyperMixer: An MLP-based Low Cost Alternative to Transformers (arXiv:2203.03691) • 1 upvote
- NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning (arXiv:2307.08941) • 1 upvote
- Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models (arXiv:2112.00029) • 1 upvote
- Fast Feedforward Networks (arXiv:2308.14711) • 2 upvotes
- KAN: Kolmogorov-Arnold Networks (arXiv:2404.19756) • 108 upvotes
- JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413) • 36 upvotes
- Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522) • 36 upvotes
- Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node (arXiv:2405.16836)