This paper demonstrates the effectiveness of a diversification mechanism for building a more robust multi-attention system in generic facial action analysis. Previous multi-attention research (e.g., on visual attention and self-attention) in facial expression recognition (FER) and Action Unit (AU) detection has thoroughly explored "external attention diversification", where attention branches localize different facial areas. We instead delve into "internal attention diversification" and explore the impact of diverse attention patterns within the same Region of Interest (RoI). Our experiments reveal that variability in attention patterns significantly affects model performance, indicating that unconstrained multi-attention is plagued by redundancy and over-parameterization, which leads to sub-optimal results. To tackle this issue, we propose a compact module that guides the model to achieve self-diversified multi-attention. We apply our method to both CNN-based and Transformer-based models and benchmark it on popular databases: BP4D and DISFA for AU detection, and CK+, MMI, BU-3DFE, and BP4D+ for facial expression recognition. We also evaluate the mechanism on self-attention and channel-wise attention designs to improve their adaptive capabilities in multi-modal feature fusion tasks. The multi-modal evaluation is conducted on BP4D, BP4D+, and our newly developed large-scale comprehensive emotion database, BP4D++, which contains well-synchronized and aligned sensor modalities and addresses the scarcity of annotations and identities in human affective computing. We plan to release the new database to the research community to foster further advancements in this field.
X. Li, Z. Zhang, X. Zhang, T. Wang, Z. Li, H. Yang, U. Ciftci, Q. Ji, J. Cohn, and L. Yin, "Disagreement Matters: Exploring Internal Diversification for Redundant Attention in Generic Facial Action Analysis," IEEE Transactions on Affective Computing, Institute of Electrical and Electronics Engineers, Jan 2023.
The definitive version is available at https://doi.org/10.1109/TAFFC.2023.3286838
Keywords and Phrases
attention; Databases; disagreement; diversity; Face recognition; Facial action unit detection; facial expression recognition; Feature extraction; Gold; multi-channel; multi-head; multi-modal feature fusion; Redundancy; Task analysis; transformer; Transformers
Article - Journal
© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.
01 Jan 2023