LarryTsai committed on
Commit
aabda43
2 Parent(s): 8b0e3d0 e522c75

Merge branch 'main' of hf.co:rhymes-ai/Allegro

  1. README.md +11 -9
README.md CHANGED
@@ -17,7 +17,10 @@ library_name: diffusers
 
 
  # Key Feature
- Allegro is capable of producing high-quality, 6-second videos at 30 frames per second and 720p resolution from simple text prompts.
+ - **High-Quality Output**: Generate detailed 6-second videos at 15 FPS with 720x1280 resolution, which can be interpolated to 30 FPS with EMA-VFI.
+ - **Small and Efficient**: Features a 175M parameter VAE and a 2.8B parameter DiT model. Supports multiple precisions (FP32, BF16, FP16) and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading.
+ - **Extensive Context Length**: Handles up to 79.2k tokens, providing rich and comprehensive text-to-video generation capabilities.
+ - **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
 
 
  # Model info
@@ -29,7 +32,7 @@ Allegro is capable of producing high-quality, 6-second videos at 30 frames per s
  </tr>
  <tr>
  <th>Description</th>
- <td>Text-to-Video Diffusion Transformer</td>
+ <td>Text-to-Video Generation Model</td>
  </tr>
  <tr>
  <th>Download</th>
@@ -76,17 +79,14 @@ Allegro is capable of producing high-quality, 6-second videos at 30 frames per s
  You can quickly get started with Allegro using the Hugging Face Diffusers library.
  For more tutorials, see Allegro GitHub (link-tbd).
 
- Install necessary requirements:
- ```python
- pip install diffusers transformers imageio
- ```
- Inference on single gpu:
+ 1. Install necessary requirements. Please refer to [requirements.txt](https://github.com/rhymes-ai) on Allegro GitHub.
+ 2. Perform inference on a single GPU.
  ```python
  from diffusers import DiffusionPipeline
  import torch
 
  allegro_pipeline = DiffusionPipeline.from_pretrained(
- "rhythms-ai/allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
+ "rhymes-ai/Allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
  ).to("cuda")
 
  allegro_pipeline.vae = allegro_pipeline.vae.to(torch.float32)
@@ -121,8 +121,10 @@ out_video = allegro_pipeline(
  ).video[0]
 
  imageio.mimwrite("test_video.mp4", out_video, fps=15, quality=8)
-
  ```
+ Tip:
+ - It is highly recommended to use a video frame interpolation model (such as EMA-VFI) to enhance the result to 30 FPS.
+ - For more tutorials, see [Allegro GitHub](https://github.com/rhymes-ai).
 
  # License
  This repo is released under the Apache 2.0 License.
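
Note on the added tip: the README change recommends interpolating the 15 FPS output to 30 FPS with a learned model such as EMA-VFI. The sketch below is not EMA-VFI and is not part of this commit. It only illustrates the bookkeeping of frame-rate doubling with a naive linear blend, which is expected to blur motion where a learned interpolator would not. The names `blend` and `double_frame_rate` are hypothetical, and frames are modeled as flat lists of pixel values to keep the example dependency-free.

```python
from typing import List, Sequence

# Hypothetical stand-in: one frame flattened to a list of pixel values.
Frame = List[float]


def blend(a: Sequence[float], b: Sequence[float], t: float = 0.5) -> Frame:
    """Linearly mix two equally sized frames; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def double_frame_rate(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Insert one blended frame between each consecutive pair of frames."""
    out: List[Frame] = []
    for current, following in zip(frames, frames[1:]):
        out.append(list(current))
        out.append(blend(current, following))  # naive midpoint, not EMA-VFI
    if frames:
        out.append(list(frames[-1]))  # last frame has no successor to blend with
    return out


# Three tiny two-pixel "frames" standing in for a 15 FPS clip.
clip_15fps = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0]]
clip_30fps = double_frame_rate(clip_15fps)
print(len(clip_30fps))  # N input frames become 2 * N - 1 output frames
```

With N input frames the result has 2N - 1 frames, so writing it out at `fps=30` keeps the clip duration roughly unchanged.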