---
license: apache-2.0
language:
- en
library_name: diffusers
---
<p align="center">
<img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/banner_white.gif">
</p>
<p align="center">
<a href="https://rhymes.ai/allegro_gallery" target="_blank"> Gallery</a> · <a href="https://github.com/rhymes-ai/Allegro" target="_blank">GitHub</a> · <a href="https://rhymes.ai/blog-details/allegro-advanced-video-generation-model" target="_blank">Blog</a> · <a href="https://arxiv.org/abs/2410.15458" target="_blank">Paper</a> · <a href="https://discord.com/invite/u8HxU23myj" target="_blank">Discord</a> · <a href="https://docs.google.com/forms/d/e/1FAIpQLSfq4Ez48jqZ7ncI7i4GuL7UyCrltfdtrOCDnm_duXxlvh5YmQ/viewform" target="_blank">Join Waitlist</a> (Try it on Discord!)
</p>
# Gallery
<img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/gallery.gif" width="1000" height="800"/>

For more demos and their corresponding prompts, see the [Allegro Gallery](https://rhymes.ai/allegro_gallery).
# Key Features
- **Open Source**: Full [model weights](https://huggingface.co/rhymes-ai/Allegro) and [code](https://github.com/rhymes-ai/Allegro) are available to the community under the Apache 2.0 license.
- **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
- **High-Quality Output**: Generates detailed 6-second videos at 15 FPS and 720x1280 resolution, which can be interpolated to 30 FPS with [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI).
- **Small and Efficient**: Features a 175M-parameter VideoVAE and a 2.8B-parameter VideoDiT model. Supports multiple precisions (FP32, BF16, FP16) and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading. The context length is 79.2K tokens, corresponding to 88 frames.
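As a rough cross-check of the 79.2K figure, the sketch below estimates the DiT token count from the video shape. It assumes the VideoVAE compresses 4x in time and 8x8 in space and that the DiT groups latents into 2x2 patches; these factors are assumptions not stated in this card, but under them 88 frames at 720x1280 come out to exactly 79,200 tokens. The helper names are hypothetical.

```python
# Sketch: estimate DiT context length (tokens) from the video shape.
# ASSUMPTIONS (not stated in this card): 4x temporal and 8x8 spatial VAE
# compression, plus 2x2 spatial patchify in the DiT.

def context_length(frames: int, height: int, width: int,
                   t_down: int = 4, s_down: int = 8, patch: int = 2) -> int:
    """Number of DiT tokens for a clip of `frames` frames at height x width."""
    latent_t = frames // t_down              # latent frames after temporal compression
    latent_h = height // s_down // patch     # token rows after VAE + patchify
    latent_w = width // s_down // patch      # token columns after VAE + patchify
    return latent_t * latent_h * latent_w


def duration_seconds(frames: int, fps: float) -> float:
    """Clip duration in seconds at the given frame rate."""
    return frames / fps


if __name__ == "__main__":
    print(context_length(88, 720, 1280))       # 79200 tokens (~79.2K)
    print(round(duration_seconds(88, 15), 2))  # 5.87 s, i.e. roughly 6 seconds
```

The same arithmetic explains the "6 seconds @ 15 FPS" entry below: 88 frames at 15 FPS is just under 6 seconds.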
# Model info
<table>
<tr>
<th>Model</th>
<td>Allegro</td>
</tr>
<tr>
<th>Description</th>
<td>Text-to-Video Generation Model</td>
</tr>
<tr>
<th>Download</th>
<td><a href="https://huggingface.co/rhymes-ai/Allegro">Hugging Face</a></td>
</tr>
<tr>
<th rowspan="2">Parameter</th>
<td>VAE: 175M</td>
</tr>
<tr>
<td>DiT: 2.8B</td>
</tr>
<tr>
<th rowspan="2">Inference Precision</th>
<td>VAE: FP32/TF32/BF16/FP16 (best in FP32/TF32)</td>
</tr>
<tr>
<td>DiT/T5: BF16/FP32/TF32</td>
</tr>
<tr>
<th>Context Length</th>
<td>79.2K</td>
</tr>
<tr>
<th>Resolution</th>
<td>720 x 1280</td>
</tr>
<tr>
<th>Frames</th>
<td>88</td>
</tr>
<tr>
<th>Video Length</th>
<td>6 seconds @ 15 FPS</td>
</tr>
<tr>
<th>Single GPU Memory Usage</th>
<td>9.3 GB in BF16 (with cpu_offload)</td>
</tr>
</table>
# Quick start
1. Clone the [Allegro GitHub code](https://github.com/rhymes-ai/Allegro).
2. Install the requirements.
   - Ensure Python >= 3.10, PyTorch >= 2.4, and CUDA >= 12.4. For details, see [requirements.txt](https://github.com/rhymes-ai/Allegro/blob/main/requirements.txt).
   - We recommend using Anaconda to create a new environment (Python >= 3.10) for running the example below.
3. Download the [Allegro model weights](https://huggingface.co/rhymes-ai/Allegro). Until Diffusers integration is available, download them with `git lfs` or `huggingface_hub.snapshot_download`.
4. Run inference.
```bash
python single_inference.py \
--user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' \
--save_path ./output_videos/test_video.mp4 \
--vae your/path/to/vae \
--dit your/path/to/transformer \
--text_encoder your/path/to/text_encoder \
--tokenizer your/path/to/tokenizer \
--guidance_scale 7.5 \
--num_sampling_steps 100 \
--seed 42
```
Add `--enable_cpu_offload` to offload the model to the CPU and reduce GPU memory usage (about 9.3 GB, compared with 27.5 GB without offloading). Note that inference becomes significantly slower with offloading enabled.
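The download and inference steps can also be driven from Python. This is an illustrative sketch only: it assumes the Hugging Face repo keeps its components in `vae/`, `transformer/`, `text_encoder/`, and `tokenizer/` subfolders (matching the paths in the command above) and that it runs from the root of the cloned GitHub repo. `download_weights`, `build_command`, and `run_inference` are hypothetical helpers, not part of the Allegro code.

```python
# Sketch: fetch Allegro weights and launch single_inference.py.
# ASSUMPTIONS: the HF repo has vae/, transformer/, text_encoder/, tokenizer/
# subfolders, and this runs from the root of the cloned GitHub repo.
# download_weights / build_command / run_inference are hypothetical helpers.
import subprocess
import sys
from pathlib import Path


def download_weights(local_dir: str = "./allegro_weights") -> Path:
    # Lazy import so build_command works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return Path(snapshot_download(repo_id="rhymes-ai/Allegro", local_dir=local_dir))


def build_command(weights: Path, prompt: str, save_path: str,
                  guidance_scale: float = 7.5, steps: int = 100,
                  seed: int = 42, cpu_offload: bool = True) -> list[str]:
    """Assemble the same CLI invocation shown above as an argument list."""
    cmd = [
        sys.executable, "single_inference.py",
        "--user_prompt", prompt,
        "--save_path", save_path,
        "--vae", str(weights / "vae"),
        "--dit", str(weights / "transformer"),
        "--text_encoder", str(weights / "text_encoder"),
        "--tokenizer", str(weights / "tokenizer"),
        "--guidance_scale", str(guidance_scale),
        "--num_sampling_steps", str(steps),
        "--seed", str(seed),
    ]
    if cpu_offload:
        # About 9.3 GB of GPU memory instead of 27.5 GB, but much slower.
        cmd.append("--enable_cpu_offload")
    return cmd


def run_inference(prompt: str, save_path: str) -> None:
    """Download (or reuse) the weights, then run inference as a subprocess."""
    weights = download_weights()
    subprocess.run(build_command(weights, prompt, save_path), check=True)
```

For example, `run_inference("A seaside harbor with bright sunlight and sparkling seawater.", "./output_videos/test_video.mp4")`. Passing an argument list to `subprocess.run` avoids shell-quoting problems with long prompts.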
5. (Optional) Interpolate the video to 30 FPS.
It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) to interpolate the video from 15 FPS to 30 FPS.
For better visual quality, please use imageio to save the video.
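For the optional interpolation and saving steps, a 2x interpolator that inserts one frame between each adjacent pair turns 88 frames into 175 frames at 30 FPS. The sketch below computes that count and writes frames with imageio. It assumes the frames are RGB `uint8` arrays of shape (H, W, 3) and that the `imageio-ffmpeg` backend is installed for `.mp4` output; `interpolated_frame_count` and `save_video` are hypothetical helpers, and EMA-VFI's own scripts may pad or count frames differently.

```python
# Sketch: frame-count arithmetic for frame interpolation, plus saving with imageio.
# ASSUMPTIONS: (factor - 1) frames are synthesized between each adjacent pair;
# frames are uint8 RGB arrays of shape (H, W, 3); imageio-ffmpeg is installed.
from typing import Sequence


def interpolated_frame_count(frames: int, factor: int = 2) -> int:
    """Frames after inserting (factor - 1) new frames between each adjacent pair."""
    return (frames - 1) * factor + 1


def save_video(frames: Sequence, path: str, fps: int = 30) -> None:
    """Write frames to a video file at the given frame rate."""
    import imageio  # lazy import; .mp4 output relies on the imageio-ffmpeg plugin
    imageio.mimwrite(path, list(frames), fps=fps)
```

For a 2x interpolated clip, call `save_video(frames, "./output_videos/test_video_30fps.mp4", fps=30)`. The output filename here is only an example.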
# License
This repo is released under the Apache 2.0 License.