Shared params
A paper collection by stereoplegic
Matryoshka Diffusion Models (arXiv:2310.15111)
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks (arXiv:2309.00255)
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968)
Matryoshka Representation Learning (arXiv:2205.13147)
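Matryoshka Representation Learning is the collection's namesake idea: train a single embedding so that every prefix of the vector is itself a usable lower-dimensional embedding. Below is a minimal sketch of such a nested loss; the 512-d embedding, the prefix sizes, the linear classifier heads, and the uniform loss weighting are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative nested ("matryoshka") loss: one shared embedding is
# truncated to several prefix sizes, and each prefix gets its own
# linear classifier. Dimensions and weighting are assumptions.
NESTED_DIMS = [64, 128, 256, 512]  # prefixes of the full 512-d embedding

class MatryoshkaHead(nn.Module):
    def __init__(self, dims=NESTED_DIMS, num_classes=1000):
        super().__init__()
        # One classifier per prefix length; all read the same embedding.
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in dims)
        self.dims = dims

    def forward(self, z, labels):
        # Average of cross-entropy losses over all nested prefixes,
        # so short prefixes are trained to be good embeddings too.
        loss = z.new_zeros(())
        for d, head in zip(self.dims, self.heads):
            loss = loss + F.cross_entropy(head(z[:, :d]), labels)
        return loss / len(self.dims)

# Usage: z stands in for the output of any encoder producing 512-d vectors.
head = MatryoshkaHead()
z = torch.randn(8, 512)
labels = torch.randint(0, 1000, (8,))
loss = head(z, labels)
loss.backward()
```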
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model (arXiv:2206.14371)
MatFormer: Nested Transformer for Elastic Inference (arXiv:2310.07707)
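MatFormer applies the same nesting inside the Transformer feedforward block: submodels use a prefix of the FFN hidden units, so one trained checkpoint can be served at several widths. A rough sketch of prefix-sliced FFN weights follows; the dimensions, the activation, and the slicing scheme are assumptions for illustration, not the paper's released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedFFN(nn.Module):
    """FFN whose hidden units are ordered so any prefix is a valid sub-FFN.

    Sketch of the nested-FFN idea named in the MatFormer title; the sizes
    and prefix-slicing here are assumptions, not the official code.
    """
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_hidden)
        self.w_out = nn.Linear(d_hidden, d_model)

    def forward(self, x, frac=1.0):
        # Keep only the first `frac` of the hidden units; the same
        # parameters serve every submodel size.
        h = int(self.w_in.out_features * frac)
        hidden = F.gelu(F.linear(x, self.w_in.weight[:h], self.w_in.bias[:h]))
        return F.linear(hidden, self.w_out.weight[:, :h], self.w_out.bias)

ffn = NestedFFN()
x = torch.randn(4, 16, 512)
full = ffn(x, frac=1.0)    # largest submodel
small = ffn(x, frac=0.25)  # same weights, quarter-width FFN
```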
Network In Network (arXiv:1312.4400)
Composable Sparse Fine-Tuning for Cross-Lingual Transfer (arXiv:2110.07560)
Visual Programming: Compositional visual reasoning without training (arXiv:2211.11559)
One Wide Feedforward is All You Need (arXiv:2309.01826)
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts (arXiv:2306.04845)
Improving Differentiable Architecture Search via Self-Distillation (arXiv:2302.05629)
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models (arXiv:2309.01947)
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation (arXiv:2310.19820)
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition (arXiv:2303.13072)
Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training (arXiv:2302.10798)
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect (arXiv:2303.16212)
Looped Transformers are Better at Learning Learning Algorithms (arXiv:2311.12424)
Looped Transformers as Programmable Computers (arXiv:2301.13196)
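The two looped-Transformer entries above revolve around reusing a single block across depth, i.e. weight tying in the spirit of the Universal Transformer: depth comes from iterating one block rather than stacking distinct layers. A minimal sketch, assuming one nn.TransformerEncoderLayer and a fixed loop count (both illustrative choices):

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """One transformer block whose parameters are shared across depth.

    A minimal sketch of the looping idea behind the papers above; the
    single encoder layer and fixed loop count are assumptions.
    """
    def __init__(self, d_model=256, n_heads=4, n_loops=6):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        # Effective depth is n_loops, but the parameter count stays
        # that of a single block.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

model = LoopedTransformer()
y = model(torch.randn(2, 10, 256))  # (batch, seq, d_model)
```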
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling (arXiv:2310.06389)
Sliced Recursive Transformer (arXiv:2111.05297)
Transformer in Transformer (arXiv:2103.00112)
Go Wider Instead of Deeper (arXiv:2107.11817)
Sparse Universal Transformer (arXiv:2310.07096)
Matryoshka Multimodal Models (arXiv:2405.17430)
MoEUT: Mixture-of-Experts Universal Transformers (arXiv:2405.16039)
Beyond KV Caching: Shared Attention for Efficient LLMs (arXiv:2407.12866)
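The last entry's title points at sharing attention itself across layers, beyond merely caching K/V. A heavily simplified single-head sketch of that reuse idea: compute one attention map, then let later layers apply it to their own value projections. The shapes, the omitted causal masking, and the choice of which layers share are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

# One layer computes an attention map; later layers reuse it, skipping
# their own Q/K projections and softmax. Single-head, unmasked sketch.
B, T, D = 2, 10, 64
x = torch.randn(B, T, D)

wq, wk = torch.randn(D, D), torch.randn(D, D)
attn = F.softmax((x @ wq) @ (x @ wk).transpose(1, 2) / D**0.5, dim=-1)

# Hypothetical later layers: each keeps its own value projection but
# applies the shared attention map computed above.
wv_layer3, wv_layer4 = torch.randn(D, D), torch.randn(D, D)
out3 = attn @ (x @ wv_layer3)
out4 = attn @ (x @ wv_layer4)
```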