Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024 • 31
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11, 2024 • 41
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 103
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 63
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29, 2024 • 52
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Paper • 2006.16236 • Published Jun 29, 2020 • 3
CoLT5: Faster Long-Range Transformers with Conditional Computation Paper • 2303.09752 • Published Mar 17, 2023 • 2
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024 • 13
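A common thread in this collection is replacing softmax attention with a linear recurrence over a matrix-valued state. As a rough orientation only, the sketch below shows the causal linear-attention-as-RNN formulation popularized by "Transformers are RNNs" (2006.16236); the feature map choice, dimensions, and helper names are illustrative assumptions, not code from any of the listed papers.

```python
# Minimal sketch of causal linear attention computed as an RNN (after 2006.16236).
# Shapes and the elu+1 feature map are illustrative assumptions.
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(q, k, v):
    """q, k, v: (seq_len, d). Returns (seq_len, d) in O(seq_len * d^2) time, O(d^2) state."""
    q, k = elu_plus_one(q), elu_plus_one(k)
    d = q.shape[-1]
    S = np.zeros((d, d))   # matrix-valued running state: sum over t of outer(phi(k_t), v_t)
    z = np.zeros(d)        # normalizer state: sum over t of phi(k_t)
    out = np.zeros_like(v)
    for t in range(q.shape[0]):
        S += np.outer(k[t], v[t])
        z += k[t]
        out[t] = (q[t] @ S) / (q[t] @ z + 1e-6)
    return out

# Tiny usage example
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # (8, 4)
```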