Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark (arXiv:2109.14545, published Sep 29, 2021)
Learning Activation Functions for Sparse Neural Networks (arXiv:2305.10964, published May 18, 2023)
Exploiting Transformer Activation Sparsity with Dynamic Inference (arXiv:2310.04361, published Oct 6, 2023)
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (arXiv:2310.04564, published Oct 6, 2023)
Memory-Efficient Backpropagation through Large Linear Layers (arXiv:2201.13195, published Jan 31, 2022)
AP: Selective Activation for De-sparsifying Pruned Neural Networks (arXiv:2212.06145, published Dec 9, 2022)
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction (arXiv:2202.00441, published Feb 1, 2022)
Hard ASH: Sparsity and the right optimizer make a continual learner (arXiv:2404.17651, published Apr 26, 2024)