Improving Visual Commonsense in Language Models via Multiple Image Generation Paper • 2406.13621 • Published Jun 19 • 13
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency Paper • 2310.03734 • Published Oct 5, 2023 • 14
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 42
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation Paper • 2309.16429 • Published Sep 28, 2023 • 11
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation Paper • 2305.13050 • Published May 22, 2023 • 3