- The Impact of Positional Encoding on Length Generalization in Transformers. arXiv:2305.19466, May 31, 2023.
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings. arXiv:2305.13571, May 23, 2023.
- Position Prediction as an Effective Pretraining Strategy. arXiv:2207.07611, Jul 15, 2022.
- Transformer Language Models without Positional Encodings Still Learn Positional Information. arXiv:2203.16634, Mar 30, 2022.
- 2-D SSM: A General Spatial Layer for Visual Transformers. arXiv:2306.06635, Jun 11, 2023.