MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
Abstract
This research addresses the problem of interactive editing for human motion generation. Previous motion diffusion models lack explicit modeling of word-level text-motion correspondence and offer limited explainability, which restricts their fine-grained editing ability. To address this issue, we propose an attention-based motion diffusion model, MotionCLR, with CLeaR modeling of attention mechanisms. Technically, MotionCLR models in-modality and cross-modality interactions with self-attention and cross-attention, respectively. More specifically, the self-attention mechanism measures the sequential similarity between frames and influences the temporal order of motion features. By contrast, the cross-attention mechanism finds the fine-grained word-sequence correspondence and activates the corresponding timesteps in the motion sequence. Based on these properties, we develop a versatile set of simple yet effective motion editing methods by manipulating attention maps, such as motion (de-)emphasizing, in-place motion replacement, and example-based motion generation. To further verify the explainability of the attention mechanism, we additionally explore action counting and grounded motion generation via attention maps. Our experimental results show that our method achieves strong generation and editing ability with good explainability.
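To illustrate the editing principle described above, the sketch below shows how reweighting one word's column of a cross-attention map could emphasize or de-emphasize the corresponding motion before value aggregation. This is a minimal, hypothetical example; the function and tensor names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_attention_with_reweight(motion_feats, text_feats, W_q, W_k, W_v,
                                  word_idx=None, scale_factor=1.0):
    """Cross-attention between motion frames (queries) and text tokens (keys/values).

    Scaling the attention column of one word sketches the (de-)emphasizing edit:
    scale_factor > 1 strengthens that word's influence on the motion timesteps,
    scale_factor < 1 weakens it. Illustrative only, not the official MotionCLR code.
    """
    q = motion_feats @ W_q          # (T, d): one query per motion frame
    k = text_feats @ W_k            # (N, d): one key per word token
    v = text_feats @ W_v            # (N, d)

    # (T, N) map: row t gives how strongly frame t attends to each word
    attn = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)

    if word_idx is not None:
        attn = attn.clone()
        attn[:, word_idx] *= scale_factor              # (de-)emphasize one word
        attn = attn / attn.sum(dim=-1, keepdim=True)   # renormalize each row

    return attn @ v                  # edited motion features, shape (T, d)
```

In-place replacement and example-based generation can be viewed analogously as substituting or transplanting attention maps rather than rescaling them.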
Community
This is an interactive motion editing model.
Interactive demo: https://youtu.be/5cYXUA9JPnc