Collections including paper arxiv:2305.17190
- Wide Residual Networks
  Paper • 1605.07146 • Published • 2
- Characterizing signal propagation to close the performance gap in unnormalized ResNets
  Paper • 2101.08692 • Published • 2
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
  Paper • 2105.03536 • Published • 2
- When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
  Paper • 2106.01548 • Published • 2

- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
  Paper • 2305.15265 • Published • 1
- Mesa: A Memory-saving Training Framework for Transformers
  Paper • 2111.11124 • Published • 1
- Full Parameter Fine-tuning for Large Language Models with Limited Resources
  Paper • 2306.09782 • Published • 29
- Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
  Paper • 2106.02679 • Published • 1

- Sparse Backpropagation for MoE Training
  Paper • 2310.00811 • Published • 2
- The Forward-Forward Algorithm: Some Preliminary Investigations
  Paper • 2212.13345 • Published • 2
- Fine-Tuning Language Models with Just Forward Passes
  Paper • 2305.17333 • Published • 2
- Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation
  Paper • 2309.13192 • Published • 1