metadata

license: apple-ascl
tags:
  - mdm

Matryoshka Diffusion Models

Matryoshka Diffusion Models (MDMs) were introduced in the paper of the same name by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, and Navdeep Jaitly.

This repository contains the Flickr 64 checkpoint.

Highlights

This checkpoint was trained on a dataset of 50M text-image pairs collected from Flickr.
This model was trained using a single UNet (not nested) and generates images at a resolution of 64 × 64.
Despite being trained on relatively small datasets, MDMs show strong zero-shot capabilities for generating high-resolution images and videos.

Checkpoints

Model	Dataset	Resolution	Nested UNets
mdm-flickr-64	Flickr 50M	64 × 64	❎
mdm-flickr-256	Flickr 50M	256 × 256	✅
mdm-flickr-1024	Flickr 50M	1024 × 1024	✅

How to Use

Please refer to the original repository for training and inference instructions.

Citation

@misc{gu2023matryoshkadiffusionmodels,
      title={Matryoshka Diffusion Models},
      author={Jiatao Gu and Shuangfei Zhai and Yizhe Zhang and Josh Susskind and Navdeep Jaitly},
      year={2023},
      eprint={2310.15111},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2310.15111},
}