SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation


SyncAnimation is the first NeRF-based, fully generative approach that uses audio-driven generation to produce facial expressions and an adjustable torso (left). Given only audio and a monocular input, or even noise, SyncAnimation renders highly detailed identity information along with realistic, dynamic facial and torso motion, while remaining consistent with the audio (right).

Framework


Comparison with SOTA

Comparison with GAN

Comparison with NeRF

Comparison with Stable Diffusion

Torso Scaling Expansion

Live News
