Abstract

Multimodal facial action unit (AU) recognition aims to build models that can process, correlate, and integrate information from multiple modalities (i.e., 2D images from a visual sensor, 3D geometry from 3D imaging, and thermal images from an infrared sensor). Although multimodal data provide rich information, learning from them poses two challenges: 1) the model must capture the complex cross-modal interactions in order to use the additional and mutual information effectively; 2) the model must be robust to unexpected data corruption at test time, such as a missing or noisy modality. In this paper, we propose a novel Adaptive Multimodal Fusion (AMF) method for AU detection, which learns to select the most relevant feature representations from different modalities through a re-sampling procedure conditioned on a feature scoring module. The feature scoring module evaluates the quality of the features learned from multiple modalities. As a result, AMF adaptively selects more discriminative features, which increases its robustness to missing or corrupted modalities. In addition, to alleviate over-fitting and improve generalization to the test data, we design a cut-switch multimodal data augmentation method, in which a random block is cut and switched across multiple modalities. We evaluate the method on two public multimodal AU datasets, BP4D and BP4D+, and the results demonstrate its effectiveness. Ablation studies under various conditions also show that the method remains robust to missing or noisy modalities at test time.
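
The cut-switch augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that a random block is cut and switched across modalities, so everything else here is an assumption made for illustration: the function name `cut_switch`, the `block_frac` parameter, the requirement that all modalities are spatially aligned arrays of identical shape, and the choice to swap the block between exactly two randomly chosen modalities. The procedure in the paper may differ.

```python
import numpy as np


def cut_switch(modalities, block_frac=0.3, rng=None):
    """Swap one random rectangular block between two randomly chosen modalities.

    modalities: list of arrays with identical shape (H, W, C), assumed to be
                spatially aligned (an assumption of this sketch).
    block_frac: block height/width as a fraction of H and W (hypothetical knob).
    Returns (augmented list, (top, left, bh, bw), (i, j)); inputs are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(modalities) < 2:
        raise ValueError("need at least two modalities")
    shape = modalities[0].shape
    if any(m.shape != shape for m in modalities):
        raise ValueError("all modalities must share one shape")

    h, w = shape[:2]
    bh = max(1, int(h * block_frac))
    bw = max(1, int(w * block_frac))

    # Pick the block position uniformly so that it fits inside the image.
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))

    # Pick two distinct modalities whose blocks will be exchanged.
    i, j = (int(k) for k in rng.choice(len(modalities), size=2, replace=False))

    out = [m.copy() for m in modalities]
    rows, cols = slice(top, top + bh), slice(left, left + bw)
    out[i][rows, cols] = modalities[j][rows, cols]
    out[j][rows, cols] = modalities[i][rows, cols]
    return out, (top, left, bh, bw), (i, j)


if __name__ == "__main__":
    # Stand-ins for aligned 2D, 3D-derived, and thermal inputs (synthetic data).
    mods = [np.full((8, 8, 1), float(k)) for k in range(3)]
    augmented, block, swapped = cut_switch(mods, block_frac=0.5)
    print("block (top, left, height, width):", block)
    print("swapped modality indices:", swapped)
```

Because the block is exchanged rather than erased, every pixel location still carries one value per modality after augmentation; only the modality that the value is attributed to changes inside the block.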

Recommended Citation

H. Yang et al., "Adaptive Multimodal Fusion For Facial Action Units Recognition," MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia, pp. 2982-2990, Association for Computing Machinery, Oct 2020.

The definitive version is available at https://doi.org/10.1145/3394171.3413538

Department(s)

Computer Science

Publication Status

Public Access

Comments

National Science Foundation, Grant CNS-1629898

Keywords and Phrases

au; facial action units; multi-modalities; multimodal fusion

International Standard Book Number (ISBN)

978-145037988-5

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

12 Oct 2020

Included in

Computer Sciences Commons

Computer Science Faculty Research & Creative Works

Adaptive Multimodal Fusion For Facial Action Units Recognition
