Abstract
Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion model is then trained to synthesize these latent parameter representations from random noise. It then generates new representations that are passed through the autoencoder's decoder, whose outputs are ready to use as new subsets of network parameters. Across various architectures and datasets, our diffusion process consistently generates models whose performance is comparable to or better than that of trained networks, with minimal additional cost. Notably, we empirically find that the generated models perform differently from the trained networks. Our results encourage more exploration of the versatile use of diffusion models.
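The data flow described in the abstract (encode a parameter subset, sample latents with a diffusion process, decode them into new parameters) can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the authors' implementation: the "checkpoints" are synthetic vectors, linear PCA stands in for the learned autoencoder, and a closed-form Gaussian noise predictor stands in for the trained denoising network. All names (`encode`, `decode`, `predict_noise`, `sample_latents`) and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: K checkpoints of a flattened parameter subset.
# In the paper's setting these would come from trained networks; here they
# are synthetic vectors scattered around a base point, for illustration only.
K, D, LATENT = 200, 32, 4
base = rng.normal(size=D)
directions = rng.normal(size=(LATENT, D))
params = base + rng.normal(size=(K, LATENT)) @ directions * 0.1

# Stand-in autoencoder: linear PCA (an assumption, not the paper's model).
mean = params.mean(axis=0)
_, _, vt = np.linalg.svd(params - mean, full_matrices=False)
components = vt[:LATENT]                      # shape (LATENT, D)

def encode(w):
    """Project parameter vectors onto the top principal directions."""
    return (w - mean) @ components.T

def decode(z):
    """Map latent vectors back to parameter space."""
    return z @ components + mean

latents = encode(params)
mu, var = latents.mean(axis=0), latents.var(axis=0)

# DDPM-style linear noise schedule (values chosen arbitrarily for the toy).
T = 200
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Stand-in for a trained denoiser: the closed-form E[eps | x_t] when the
    latents are modelled as an axis-aligned Gaussian N(mu, var)."""
    ab = alpha_bars[t]
    return np.sqrt(1 - ab) * (x_t - np.sqrt(ab) * mu) / (ab * var + 1 - ab)

def sample_latents(n):
    """Ancestral DDPM sampling, from pure noise down to t = 0."""
    x = rng.normal(size=(n, LATENT))
    for t in range(T - 1, -1, -1):
        eps_hat = predict_noise(x, t)
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

# Generate 16 new parameter vectors; each would be loaded back into the
# corresponding subset of a network before evaluation.
new_params = decode(sample_latents(16))
```

In a realistic setup, `predict_noise` would be a neural network trained with the usual noise-prediction loss on the encoded checkpoints, and `encode`/`decode` would be a trained autoencoder; only the overall encode, sample, decode structure is meant to carry over.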
Community
Wow... 🤯
Predicting LoRAs on the fly, routing in MoE, etc. could all use this.
I read some of the paper and wrote a very brief summary of it: https://twitter.com/JavArButt/status/1760273030540869868
I also pose a question that might be interesting for further research.
Maybe in the next version we will explore cross-architecture parameter generation. Thanks for your question!
interesting
Thinking about getting an optimal initial state of the parameters ahead of training; that would reduce pre-training cost.
hope this reply (https://x.com/liuzhuang1234/status/1760363128309600607?s=20) can address your question.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Bring Metric Functions into Diffusion Models (2024)
- DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations (2024)
- Cross-view Masked Diffusion Transformers for Person Image Synthesis (2024)
- Improving the Stability of Diffusion Models for Content Consistent Super-Resolution (2023)
- Improving Diffusion-Based Image Synthesis with Context Prediction (2024)
Harnessing Diffusion Models for Superior Neural Network Parameters
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/