MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models • Paper • 2409.17481 • Published 4 days ago • 37
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling • Paper • 2409.14683 • Published 7 days ago • 8
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • Paper • 2409.17422 • Published 4 days ago • 18