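MindFlow: Harmonizing Cognitive Semantics and Acoustic Dynamics for Facial Animation Generation in Dyadic Conversations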

Abstract

Generating lifelike facial animation for dyadic conversations requires reconciling high-level cognitive intent with precise low-level motor reflexes. Existing methods, however, fall short both in understanding the semantics of the dialogue context and in precisely controlling motion dynamics. In this paper, we propose MindFlow, a dual-pathway generative framework inspired by the Ventral-Dorsal pathway model from neuroscience. MindFlow decouples generation into two collaborative streams, harmonizing deep semantic reasoning with fine-grained control. In the Ventral module, we shift from the conventional Sentence-Action approach to a novel Chunk-State approach that models raw acoustic streams as a context-aware, evolving chain of emotional states, capturing subtle paralinguistic nuances and mid-utterance emotional shifts that sentence-level modeling misses. The Dorsal module combines a conditional autoregressive flow matching network, which generates high-fidelity facial motion driven by high-frequency acoustic cues and modulated by the emotion states, with a Selective Acoustic Injector that adaptively gates the audio input, keeping generation robust across talking and listening without interference between the two. Extensive experiments demonstrate that MindFlow outperforms state-of-the-art baselines in both semantic appropriateness and motion naturalness.
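
For readers unfamiliar with flow matching, the following is a minimal, generic sketch of the conditional flow matching objective with a straight-line (rectified-flow-style) path from noise to data, plus a fixed-step Euler sampler. It is not MindFlow's network. The learned velocity model is replaced by a hypothetical `oracle_velocity` that already knows the target frame, and `cond` stands in for whatever conditioning the Dorsal module actually uses (presumably acoustic features, the current emotion state, and previously generated motion in the autoregressive setting; this is an assumption). All function names are illustrative.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Flow-matching regression target along the straight path: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t, cond)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network: here cond is simply the target
    frame, and the returned velocity carries x straight to it by t=1."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]

if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4)]
    frame = [0.2, -0.1, 0.5, 0.0]  # toy "facial motion" vector, e.g. expression coefficients
    print(cfm_loss(oracle_velocity, noise, frame, 0.3, frame))  # close to 0
    print(euler_sample(oracle_velocity, noise, frame))  # close to frame
```

In training, `x0` would be Gaussian noise, `x1` a ground-truth motion chunk, and `t` sampled uniformly in [0, 1); a network would be fit to minimize `cfm_loss`. At inference only the sampler is needed, and in an autoregressive setup it would be called once per chunk with updated conditioning.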

Highlights

Ventral-Dorsal dual pathway: MindFlow separates cognitive semantic understanding from reflexive motion generation to address rigid and hollow facial expressions in dyadic conversation animation.
Chunk-State approach: The Ventral module uses multimodal LLMs to infer evolving emotion states directly from audio streams, preserving prosodic cues and enabling continuous, fine-grained expression control.
Selective Acoustic Injector: The Dorsal module adaptively gates the audio input during listening and talking, generating high-quality facial motion in both conversational roles.
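
The Selective Acoustic Injector is described here only as adaptive audio gating. The sketch below shows one common way to realize such gating, a sigmoid gate that scales a residual injection of each audio stream into the motion features. It assumes the model receives separate audio for the animated character and for the interlocutor; the functions `gate_logit` and `selective_inject` and the linear gate score are hypothetical illustrations, not the paper's design.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gate_logit(hidden, audio, weights, bias):
    """Hypothetical linear gate score computed from motion features and one audio stream."""
    feats = list(hidden) + list(audio)
    return sum(w * f for w, f in zip(weights, feats)) + bias

def selective_inject(hidden, streams):
    """Add each audio stream to the motion features, scaled by its own sigmoid gate.

    streams: list of (audio_features, weights, bias) tuples, one per audio source
    (e.g. the animated character's own speech and the interlocutor's speech).
    Returns the updated features and the gate value used for each stream.
    """
    out = list(hidden)
    gates = []
    for audio, weights, bias in streams:
        g = sigmoid(gate_logit(hidden, audio, weights, bias))
        gates.append(g)
        out = [o + g * a for o, a in zip(out, audio)]
    return out, gates

if __name__ == "__main__":
    hidden = [0.1, 0.2]
    own_speech = [1.0, 1.0]
    partner_speech = [2.0, -2.0]
    zeros = [0.0] * 4
    # Toy setting: gate on own speech nearly closed (bias -40), partner's nearly open
    # (bias +40), loosely mimicking a listening phase.
    out, gates = selective_inject(
        hidden, [(own_speech, zeros, -40.0), (partner_speech, zeros, 40.0)]
    )
    print(out, gates)
```

A scalar gate per stream is the simplest choice; a per-channel or per-frame gate would replace `g` with a vector of sigmoids, at the cost of more parameters.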

Demo Video

BibTeX

@inproceedings{chen2026mindflow,
  title={{MindFlow}: Harmonizing Cognitive Semantics and Acoustic Dynamics for Facial Animation Generation in Dyadic Conversations},
  author={Chen, Hejia and Zhang, Haoxian and He, Xu and Liu, Xiaoqiang and Wan, Pengfei and Zhang, Shoulong and Li, Shuai},
  booktitle={European Conference on Computer Vision},
  year={2026}
}

Acknowledgments

This work was supported by Zhongguancun Laboratory, the National Key R&D Program of China (2023YFF1203803), and the National Natural Science Foundation of China (62502469, 62525204).
