- Blockwise Parallel Transformer for Long Context Large Models (arXiv:2305.19370, published May 30, 2023)
- Blockwise Self-Attention for Long Document Understanding (arXiv:1911.02972, published Nov 7, 2019)
- Blockwise Compression of Transformer-based Models without Retraining (arXiv:2304.01483, published Apr 4, 2023)