Abstract
We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. Moreover, many assistant strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining desired high-quality data. Benefiting from efficient thoughts, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our codes and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.
Community
We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. Moreover, many assistant strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining desired high-quality data. Benefiting from efficient thoughts, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our codes and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Allegro: Open the Black Box of Commercial-Level Video Generation Model (2024)
- Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content (2024)
- Pyramidal Flow Matching for Efficient Video Generative Modeling (2024)
- ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation (2024)
- LVD-2M: A Long-take Video Dataset with Temporally Dense Captions (2024)
- PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance (2024)
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper