NeuroSync Audio-to-Face Blendshape Transformer Model
05/11/24 Update
Updated to a 128-frame sequence length model. Update your config, model.pth, and the local API .py files to ensure everything works as intended.
Model Overview
The NeuroSync audio-to-face blendshape transformer seq2seq model is designed to transform sequences of audio features into corresponding facial blendshape coefficients. This enables facial animation from audio input, making it useful for real-time character animation, including integration with Unreal Engine via LiveLink.
The model maps sequences of 128 frames of audio features to facial blendshapes used for character animation. By leveraging a transformer-based encoder-decoder architecture, it generates highly accurate blendshape coefficients that can be streamed to Unreal Engine 5 using LiveLink, ensuring real-time synchronization between audio and facial movements.
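As a rough illustration of the input/output contract only (the function name, tensor shapes, and `feature_dim`/`num_coeffs` below are assumptions, not the repository's actual API), a 128-frame window of audio features produces one row of blendshape coefficients per frame:

```python
# Minimal sketch of the expected shapes, assuming a PyTorch checkpoint already
# loaded as `model`; names and shapes are hypothetical -- see the NeuroSync
# Local API repository for the real inference code.
import torch

SEQ_LEN = 128  # frames per window (per the 05/11/24 update)

def audio_window_to_blendshapes(model, audio_features: torch.Tensor) -> torch.Tensor:
    """audio_features: (SEQ_LEN, feature_dim) -> blendshapes: (SEQ_LEN, num_coeffs)."""
    model.eval()
    with torch.no_grad():
        batch = audio_features.unsqueeze(0)   # (1, SEQ_LEN, feature_dim)
        blendshapes = model(batch)            # (1, SEQ_LEN, num_coeffs)
    return blendshapes.squeeze(0)             # (SEQ_LEN, num_coeffs)
```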
Features
- Audio-to-Face Transformation: Converts raw audio features into facial blendshape coefficients for driving facial animations.
- Transformer Seq2Seq Architecture: Uses transformer encoder-decoder layers to capture complex dependencies between audio features and facial expressions.
- Integration with Unreal Engine (LiveLink): Supports real-time streaming of generated facial blendshapes into Unreal Engine 5 through the NeuroSync Player using LiveLink.
- Non-Commercial License: This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
YouTube Channel
For updates on training progress, tool development, and tutorials on how to use the NeuroSync model, check out our YouTube channel.
Stay tuned for insights into the ongoing development and enhancements related to the model and its integration with tools like Unreal Engine and LiveLink.
Usage
You can set up the local API for this model using the NeuroSync Local API repository. This API allows you to process audio files and stream the generated blendshapes to Unreal Engine using NeuroSync Player (Unreal Engine LiveLink).
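A minimal client-side sketch of that workflow is shown below; the port, endpoint path, and response format are assumptions, so check the NeuroSync Local API repository for the actual route and payload.

```python
# Hypothetical client call to a locally hosted NeuroSync API; the URL, port,
# endpoint path, and response format are assumptions, not the documented contract.
import requests

def get_blendshapes_from_local_api(audio_path: str):
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    resp = requests.post(
        "http://127.0.0.1:5000/audio_to_blendshapes",  # assumed local endpoint
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    resp.raise_for_status()
    return resp.json()  # assumed: per-frame rows of blendshape coefficients

blendshapes = get_blendshapes_from_local_api("speech.wav")
```

The returned frames can then be handed to the NeuroSync Player, which handles the LiveLink streaming into Unreal Engine.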
Non-Local API Option (Alpha Access)
If you prefer not to host the model locally, you can apply for access to the NeuroSync Alpha API, which enables non-local usage. This allows you to connect directly with the NeuroSync Player (Unreal Engine LiveLink) and stream facial blendshapes without running the local model. To apply for access to the alpha API, visit neurosync.info.
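Conceptually, switching from the local API to the hosted alpha API only changes the endpoint and adds authentication; the URL and header below are placeholders rather than the real alpha API contract, and the actual details are provided once access is granted via neurosync.info.

```python
# Placeholder only: the hosted endpoint and auth header are illustrative,
# not the real alpha API contract.
import requests

resp = requests.post(
    "https://api.neurosync.example/audio_to_blendshapes",  # placeholder URL
    data=open("speech.wav", "rb").read(),
    headers={"Authorization": "Bearer YOUR_API_KEY"},       # placeholder credential
)
```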
Model Architecture
The model consists of the following components (a minimal code sketch follows this list):
- Encoder: A transformer encoder that processes audio features and applies positional encodings to capture temporal relationships.
- Decoder: A transformer decoder with cross-attention, which attends to the encoder outputs and generates the corresponding blendshape coefficients.
- Blendshape Output: An output layer that produces the blendshape coefficients used for facial animation (some coefficients, such as head movements and tongue movements, are excluded from being sent to LiveLink).
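Below is a structural sketch of that encoder-decoder layout in PyTorch; the layer counts, dimensions, and audio feature size are illustrative assumptions, not the released configuration.

```python
# Structural sketch only: hyperparameters are illustrative, not the values
# from the released config.
import torch
import torch.nn as nn

class AudioToBlendshapeSeq2Seq(nn.Module):
    def __init__(self, feature_dim=128, model_dim=256, num_coeffs=61,
                 num_layers=4, num_heads=4, seq_len=128):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, model_dim)
        # Learned positional encodings capture temporal order within the window.
        self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, model_dim))
        encoder_layer = nn.TransformerEncoderLayer(model_dim, num_heads, batch_first=True)
        decoder_layer = nn.TransformerDecoderLayer(model_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.output_proj = nn.Linear(model_dim, num_coeffs)

    def forward(self, audio_features):            # (batch, seq_len, feature_dim)
        x = self.input_proj(audio_features) + self.pos_embedding
        memory = self.encoder(x)
        decoded = self.decoder(x, memory)          # decoder cross-attends to encoder output
        return self.output_proj(decoded)           # (batch, seq_len, num_coeffs)
```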
Blendshape Coefficients
The model outputs 61 blendshape coefficients, including:
- Eye movements (e.g., EyeBlinkLeft, EyeSquintRight)
- Jaw movements (e.g., JawOpen, JawRight)
- Mouth movements (e.g., MouthSmileLeft, MouthPucker)
- Brow movements (e.g., BrowInnerUp, BrowDownLeft)
- Cheek and nose movements (e.g., CheekPuff, NoseSneerRight)
Currently, coefficients 52 to 68 should be ignored (or used to drive additive sliders), as they pertain to head movements and emotional states (e.g., Angry, Happy, Sad) and are not streamed into LiveLink; a minimal sketch of this filtering step follows below.
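A minimal sketch of that filtering step, assuming the model output arrives as a NumPy array with one row of coefficients per frame and the facial coefficients first:

```python
# Sketch of splitting the output into channels streamed over LiveLink and the
# additional head/emotion channels; the `output` shape (num_frames, num_coeffs)
# is an assumption about how the frames are arranged.
import numpy as np

NUM_FACIAL_COEFFS = 52  # facial blendshapes sent to LiveLink

def split_output(output: np.ndarray):
    facial = output[:, :NUM_FACIAL_COEFFS]   # streamed to Unreal Engine via LiveLink
    extras = output[:, NUM_FACIAL_COEFFS:]   # head/emotion channels (ignore or drive additive sliders)
    return facial, extras
```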
License
This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You may use, adapt, and share this model for non-commercial purposes, but you must give appropriate credit. For more details, see the Creative Commons BY-NC 4.0 License.
Support
For any questions or further support, please feel free to contribute to the repository or raise an issue.