---
license: mit
---
|
# VidToMe: Video Token Merging for Zero-Shot Video Editing |
|
|
|
Edit videos instantly with just a prompt! 🎥 |
|
|
|
This is a Diffusers implementation of VidToMe, a diffusion-based pipeline for zero-shot video editing that improves temporal consistency and reduces memory usage by merging self-attention tokens across video frames.

This approach enables coherent video generation and editing without fine-tuning the model.

By aligning and merging redundant tokens across frames, VidToMe produces smooth transitions and temporally coherent output, improving over frame-by-frame editing approaches.

It is based on [this paper](https://arxiv.org/abs/2312.10656).
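
VidToMe builds on the idea of Token Merging (ToMe): before self-attention, near-duplicate tokens across frames are matched by similarity and merged, so the model attends over fewer, temporally aligned tokens. The snippet below is only a minimal sketch of that idea, not the actual VidToMe implementation; the function name, the greedy matching, and the 50/50 averaging are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def merge_into_reference(ref: torch.Tensor, cur: torch.Tensor, r: int):
    """ref, cur: (num_tokens, dim) self-attention tokens from two frames.
    Merges the r most redundant tokens of `cur` into their best matches in `ref`."""
    # cosine similarity between every current-frame token and every reference token
    sim = F.normalize(cur, dim=-1) @ F.normalize(ref, dim=-1).T  # (N_cur, N_ref)
    best_sim, best_idx = sim.max(dim=-1)          # best reference match per token
    src = best_sim.argsort(descending=True)[:r]   # the r most redundant tokens
    merged_ref = ref.clone()
    # average each redundant token into its matched reference token
    # (simplification: if several tokens match the same reference token,
    #  only the last write survives; the real method pools all matches)
    merged_ref[best_idx[src]] = (ref[best_idx[src]] + cur[src]) / 2
    keep = torch.ones(cur.size(0), dtype=torch.bool)
    keep[src] = False
    return merged_ref, cur[keep]  # total token count drops from 2N to 2N - r

# example: merge a quarter of the current frame's tokens into the previous frame
prev_tokens = torch.randn(256, 64)
cur_tokens = torch.randn(256, 64)
merged_prev, remaining_cur = merge_into_reference(prev_tokens, cur_tokens, r=64)
```

In the full method, merged tokens are unmerged after self-attention so that each frame keeps its full token resolution.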
|
|
|
## Usage |
|
|
|
```python
from diffusers import DiffusionPipeline

# load the pretrained model (custom pipeline code is fetched from the Hub)
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# edit a video with prompts
pipeline(
    video_path="path/to/video.mp4",
    video_prompt="A serene beach scene",
    edit_prompt="Make the sunset more vibrant",
    control_type="depth",
    n_timesteps=50,
)
```
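
Note that `trust_remote_code=True` downloads and runs the custom pipeline code from the Hub repository, so review that code before executing it. `float_precision="fp16"` is meant to roughly halve GPU memory use; if your GPU does not support half precision, passing `"fp32"` should work instead (an assumption, check the pipeline's remote code for the accepted values).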
|
|
|
## Applications
|
- Zero-shot video editing for content creators |
|
- Video transformation using natural language prompts |
|
- Memory-optimized video generation for longer or complex sequences |
|
|
|
**Model Authors:** |
|
- Xirui Li |
|
- Chao Ma |
|
- Xiaokang Yang |
|
- Ming-Hsuan Yang |
|
|
|
For more details, see the [GitHub repo](https://github.com/lixirui142/VidToMe).