EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

*Equal Contribution.
Terminal Technology Department, Alipay, Ant Group.

Model Files

./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
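
As a quick sanity check before running anything, the layout above can be verified with a few lines of Python. This is only a minimal sketch and not part of the EchoMimic code: the helper name `missing_entries` is hypothetical, and the contents of the two sub-directories shown as `...` are not listed in the tree, so only the presence of those directories is checked.

```python
from pathlib import Path

# Paths copied from the directory tree above (relative to the model root).
EXPECTED_FILES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "audio_processor/whisper_tiny.pt",
]

# The tree elides the contents of these directories ("..."), so only
# their existence is checked here.
EXPECTED_DIRS = [
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
]


def missing_entries(root="./pretrained_models"):
    """Return the expected relative paths that are absent under `root`.

    Hypothetical helper for illustration; it is not an EchoMimic API.
    """
    root = Path(root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_entries()
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All expected model files are present.")
```

An empty list from `missing_entries()` means every entry named in the tree was found; it says nothing about whether the files themselves are complete or uncorrupted.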

Some models in this hub can also be downloaded directly from their original hubs.

Gallery

Audio Driven (Sing)

Audio Driven (English)

Audio Driven (Chinese)

Landmark Driven

Audio + Selected Landmark Driven

(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}