Balancing Pipeline Parallelism with Vocabulary Parallelism • arXiv:2411.05288 • Nov 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks • arXiv:2410.20650 • Oct 2024
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • arXiv:2410.23168 • Oct 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity • arXiv:2411.02335 • Nov 2024
Adaptive Caching for Faster Video Generation with Diffusion Transformers • arXiv:2411.02397 • Nov 2024
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models • arXiv:2411.03884 • Nov 2024
Animate-X: Universal Character Image Animation with Enhanced Motion Representation • arXiv:2410.10306 • Oct 14, 2024
What Matters in Transformers? Not All Attention is Needed • arXiv:2406.15786 • Jun 22, 2024
AutoTrain: No-code training for state-of-the-art models • arXiv:2410.15735 • Oct 21, 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models • arXiv:2411.04996 • Nov 2024
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices • arXiv:2410.00531 • Oct 1, 2024