RhymesAI committed on
Commit bcdde5f
1 Parent(s): a75220d

Update README.md

Files changed (1)
  1. README.md +21 -53
README.md CHANGED
@@ -8,20 +8,20 @@ library_name: diffusers
  <img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/banner_white.gif">
  </p>
  <p align="center">
- <a href="https://rhymes.ai/" target="_blank"> Gallery</a> · <a href="https://github.com/rhymes-ai/Aria" target="_blank">GitHub</a> · <a href="https://www.rhymes.ai/blog-details/" target="_blank">Blog</a> · <a href="https://arxiv.org/pdf/2410.05993" target="_blank">Paper</a> · <a href="https://discord" target="_blank">Discord</a>

  </p>

  # Gallery
- <img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/gallery.gif" width="1000" height="800"/>For more demos and corresponding prompts, see the [Allegro Gallery](TBD).


  # Key Feature

- - **Open Source**: [Full model weights](https://huggingface.co/rhymes-ai/Allegro) and [code](https://github.com/rhymes-ai/Allegro) available to the community, Apache 2.0!
  - **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
- - **High-Quality Output**: Generate detailed 6-second videos at 15 FPS with 720x1280 resolution, can be interpolated to 30 FPS with [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI).
- - **Small and Efficient**: Features a 175M parameter VideoVAE and a 2.8B parameter VideoDiT model. Supports multiple precisions (FP32, BF16, FP16) and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading. Context length is 79.2k, equivalent to 88 frames.

  # Model info

@@ -54,7 +54,7 @@ library_name: diffusers
  </tr>
  <tr>
  <th>Context Length</th>
- <td>79.2k</td>
  </tr>
  <tr>
  <th>Resolution</th>
@@ -76,55 +76,23 @@ library_name: diffusers


  # Quick start
- You can quickly get started with Allegro using the Hugging Face Diffusers library.
- For more tutorials, see Allegro GitHub (link-tbd).
-
- 1. Install necessary requirements. Please refer to [requirements.txt](https://github.com/rhymes-ai) on Allegro GitHub.
- 2. Perform inference on a single GPU.
  ```python
- from diffusers import DiffusionPipeline
- import torch
-
- allegro_pipeline = DiffusionPipeline.from_pretrained(
- "rhymes-ai/Allegro", trust_remote_code=True, torch_dtype=torch.bfloat16
- ).to("cuda")
-
- allegro_pipeline.vae = allegro_pipeline.vae.to(torch.float32)
-
- prompt = "a video of an astronaut riding a horse on mars"
-
- positive_prompt = """
- (masterpiece), (best quality), (ultra-detailed), (unwatermarked),
- {}
- emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
- sharp focus, high budget, cinemascope, moody, epic, gorgeous
- """
-
- negative_prompt = """
- nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
- low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
- """
-
- num_sampling_steps, guidance_scale, seed = 100, 7.5, 42
-
- user_prompt = positive_prompt.format(args.user_prompt.lower().strip())
- out_video = allegro_pipeline(
- user_prompt,
- negative_prompt=negative_prompt,
- num_frames=88,
- height=720,
- width=1280,
- num_inference_steps=num_sampling_steps,
- guidance_scale=guidance_scale,
- max_sequence_length=512,
- generator = torch.Generator(device="cuda:0").manual_seed(seed)
- ).video[0]
-
- imageio.mimwrite("test_video.mp4", out_video, fps=15, quality=8)
  ```
- Tip:
- - It is highly recommended to use a video frame interpolation model (such as EMA-VFI) to enhance the result to 30 FPS.
- - For more tutorials, see [Allegro GitHub](https://github.com/rhymes-ai).

  # License
  This repo is released under the Apache 2.0 License.
 
  <img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/banner_white.gif">
  </p>
  <p align="center">
+ <a href="https://rhymes.ai/allegro_gallery" target="_blank"> Gallery</a> · <a href="https://github.com/rhymes-ai/Allegro" target="_blank">GitHub</a> · <a href="https://rhymes.ai/blog-details/allegro-advanced-video-generation-model" target="_blank">Blog</a> · <a href="https://arxiv.org/pdf/2410.05993" target="_blank">Paper</a> · <a href="https://discord.com/invite/u8HxU23myj" target="_blank">Discord</a>

  </p>

  # Gallery
+ <img src="https://huggingface.co/rhymes-ai/Allegro/resolve/main/gallery.gif" width="1000" height="800"/>For more demos and corresponding prompts, see the [Allegro Gallery](https://rhymes.ai/allegro_gallery).


  # Key Feature

+ - **Open Source**: Full [model weights](https://huggingface.co/rhymes-ai/Allegro) and [code](https://github.com/rhymes-ai/Allegro) available to the community, Apache 2.0!
  - **Versatile Content Creation**: Capable of generating a wide range of content, from close-ups of humans and animals to diverse dynamic scenes.
+ - **High-Quality Output**: Generate detailed 6-second videos at 15 FPS with 720x1280 resolution, which can be interpolated to 30 FPS with [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI).
+ - **Small and Efficient**: Features a 175M parameter VideoVAE and a 2.8B parameter VideoDiT model. Supports multiple precisions (FP32, BF16, FP16) and uses 9.3 GB of GPU memory in BF16 mode with CPU offloading. Context length is 79.2K, equivalent to 88 frames.

  # Model info

 
  </tr>
  <tr>
  <th>Context Length</th>
+ <td>79.2K</td>
  </tr>
  <tr>
  <th>Resolution</th>
 


  # Quick start
+ 1. Download the [Allegro GitHub code](https://github.com/rhymes-ai/Allegro).
+ 2. Install the necessary requirements.
+ a. Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4. For details, see [requirements.txt](https://github.com/rhymes-ai/Allegro/blob/main/requirements.txt).
+ b. It is recommended to use Anaconda to create a new environment (Python >= 3.10) to run the following example.
+ 3. Download the [Allegro model weights](https://huggingface.co/rhymes-ai/Allegro).
+ 4. Run inference.
  ```python
+ python single_inference.py \
+ --user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' \
+ --vae your/path/to/vae \
+ --dit your/path/to/transformer \
+ --text_encoder your/path/to/text_encoder \
+ --tokenizer your/path/to/tokenizer \
+ --guidance_scale 7.5 \
+ --num_sampling_steps 100 \
+ --seed 42
  ```
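The command added above can also be assembled programmatically. What follows is a minimal Python sketch, not code from the Allegro repository: the helper name `build_inference_command` is hypothetical, and it assumes the downloaded weights directory holds `vae`, `transformer`, `text_encoder`, and `tokenizer` subfolders. The flags and default values are the ones shown in the command above.

```python
import shlex
from pathlib import Path


def build_inference_command(weights_dir, prompt, guidance_scale=7.5,
                            num_sampling_steps=100, seed=42):
    """Return the argv list for single_inference.py (hypothetical helper)."""
    root = Path(weights_dir)
    return [
        "python", "single_inference.py",
        "--user_prompt", prompt,
        # Assumed layout: one subfolder per component under weights_dir.
        "--vae", str(root / "vae"),
        "--dit", str(root / "transformer"),
        "--text_encoder", str(root / "text_encoder"),
        "--tokenizer", str(root / "tokenizer"),
        "--guidance_scale", str(guidance_scale),
        "--num_sampling_steps", str(num_sampling_steps),
        "--seed", str(seed),
    ]


cmd = build_inference_command("your/path/to", "A seaside harbor with bright sunlight")
# shlex.join quotes the prompt so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the run without any manual shell quoting of the prompt.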
 
 
 
  # License
  This repo is released under the Apache 2.0 License.
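
The figures in the Key Feature list (88 frames, 15 FPS, 720x1280, 79.2K context length) can be cross-checked with a little arithmetic. The sketch below assumes a 4x8x8 (time x height x width) VideoVAE compression and a 2x2 spatial patch size in the VideoDiT. Neither value appears in this diff, so treat both as assumptions that happen to reproduce the stated context length.

```python
# Values stated in the README diff.
num_frames, fps = 88, 15
height, width = 720, 1280

# Assumed values (not stated in this diff).
vae_t, vae_h, vae_w = 4, 8, 8   # VideoVAE compression per axis
patch_h, patch_w = 2, 2         # VideoDiT spatial patch size

duration_s = num_frames / fps
latent_frames = num_frames // vae_t
tokens_per_frame = (height // vae_h // patch_h) * (width // vae_w // patch_w)
context_length = latent_frames * tokens_per_frame

print(f"duration: {duration_s:.2f} s")        # duration: 5.87 s
print(f"context length: {context_length}")    # context length: 79200
```

Under these assumptions, 22 latent frames of 45 x 80 tokens each give 79,200 tokens, and 88 frames at 15 FPS last just under the advertised 6 seconds.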