Abstract
Multimodal facial action unit (AU) recognition aims to build models that can process, correlate, and integrate information from multiple modalities (i.e., 2D images from a visual sensor, 3D geometry from 3D imaging, and thermal images from an infrared sensor). Although multimodal data provide rich information, two challenges must be addressed when learning from them: 1) the model must capture the complex cross-modal interactions in order to exploit the additional and mutual information effectively; 2) the model must remain robust to unexpected data corruption at test time, such as a modality being missing or noisy. In this paper, we propose a novel Adaptive Multimodal Fusion (AMF) method for AU detection, which learns to select the most relevant feature representations from different modalities through a re-sampling procedure conditioned on a feature scoring module. The feature scoring module is designed to evaluate the quality of the features learned from each modality. As a result, AMF can adaptively select more discriminative features, increasing its robustness to missing or corrupted modalities. In addition, to alleviate over-fitting and improve generalization to the test data, we design a cut-switch multimodal data augmentation method, in which a random block is cut and switched across multiple modalities. We conduct a thorough investigation on two public multimodal AU datasets, BP4D and BP4D+, and the results demonstrate the effectiveness of the proposed method. Ablation studies under various circumstances further show that our method remains robust to missing or noisy modalities during testing.
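To make the two ideas in the abstract concrete, below is a minimal PyTorch-style sketch of score-conditioned fusion. It assumes each modality has already been encoded into a feature vector of equal dimension, and it uses a differentiable soft-weighted average as a stand-in for the paper's re-sampling procedure; the class and all names are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn

class AdaptiveFusionSketch(nn.Module):
    # Hypothetical sketch: a shared scoring MLP assigns a quality score
    # to each modality's feature vector; a softmax over scores yields
    # fusion weights, approximating selection of more discriminative
    # features (the paper's actual re-sampling step may differ).
    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim)
        scores = self.scorer(feats).squeeze(-1)         # (batch, M)
        weights = torch.softmax(scores, dim=-1)         # per-modality weights
        fused = (weights.unsqueeze(-1) * feats).sum(1)  # (batch, feat_dim)
        return fused

Because a missing or noisy modality tends to receive a low score, its contribution to the fused feature is down-weighted, which is the intuition behind the claimed robustness. The cut-switch augmentation can likewise be sketched as follows, assuming all modalities are stacked as same-sized image tensors (an illustrative assumption; the paper's exact formulation may differ):

import torch

def cut_switch_sketch(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    # Hypothetical sketch: for each sample, cut a random square block
    # and swap it between two randomly chosen modalities.
    # x: (batch, num_modalities, channels, H, W), with num_modalities >= 2
    # and block <= min(H, W).
    b, m, c, h, w = x.shape
    out = x.clone()
    for i in range(b):
        src, dst = torch.randperm(m)[:2].tolist()
        y0 = torch.randint(0, h - block + 1, (1,)).item()
        x0 = torch.randint(0, w - block + 1, (1,)).item()
        patch = out[i, src, :, y0:y0 + block, x0:x0 + block].clone()
        out[i, src, :, y0:y0 + block, x0:x0 + block] = \
            out[i, dst, :, y0:y0 + block, x0:x0 + block]
        out[i, dst, :, y0:y0 + block, x0:x0 + block] = patch
    return out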
Recommended Citation
H. Yang et al., "Adaptive Multimodal Fusion For Facial Action Units Recognition," MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia, pp. 2982–2990, Association for Computing Machinery, Oct 2020.
The definitive version is available at https://doi.org/10.1145/3394171.3413538
Department(s)
Computer Science
Publication Status
Public Access
Keywords and Phrases
AU; facial action units; multi-modalities; multimodal fusion
International Standard Book Number (ISBN)
978-1-4503-7988-5
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2020 Association for Computing Machinery. All rights reserved.
Publication Date
12 Oct 2020
Comments
National Science Foundation, Grant CNS-1629898