Because deep model-based methods contain a large number of parameters, training such models usually requires many fully AU-annotated facial images. Two widely used datasets, BP4D [31] and DISFA [18], do contain many frames, but those frames were captured from only a small number of subjects (41 and 27, respectively). This is problematic: since each subject produces highly consistent facial muscle movements, adding more frames per subject only adds more nearby points in the feature space, so the classifier does not benefit from the extra frames. Data augmentation methods can alleviate the problem to a certain degree, but they cannot augment new subjects. We propose a novel Set Operation Aided Network (SO-Net) for action unit detection. Specifically, new features and the corresponding labels are generated by applying set operations in both the feature and label spaces. Each generated feature can be treated as the representation of a hypothetical image; as a result, we implicitly obtain training examples beyond those originally observed in the dataset. The deep model is thereby forced to learn subject-independent features and generalizes to unseen subjects. SO-Net is end-to-end trainable and can be flexibly plugged into any CNN model during training. We evaluate the proposed method on two public datasets, BP4D and DISFA. The experiments show state-of-the-art performance, demonstrating the effectiveness of the proposed method.
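The abstract does not specify which set operations SO-Net uses, so the following is only a hypothetical sketch of the general idea: combining two subjects' feature vectors and their multi-label AU annotations with paired operations (here, element-wise max/min on features and logical OR/AND on binary AU labels) to synthesize training examples that belong to no single subject. All function names and the choice of operations are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def union_example(f1, f2, y1, y2):
    """Hypothetical 'union' augmentation: element-wise max of two
    feature vectors, logical OR of their binary AU label vectors."""
    return np.maximum(f1, f2), np.logical_or(y1, y2).astype(int)

def intersection_example(f1, f2, y1, y2):
    """Hypothetical 'intersection' augmentation: element-wise min of
    features, logical AND of binary AU labels."""
    return np.minimum(f1, f2), np.logical_and(y1, y2).astype(int)

# Toy features from two different subjects and their AU labels
f_a, y_a = np.array([0.9, 0.1, 0.4]), np.array([1, 0, 0])
f_b, y_b = np.array([0.2, 0.8, 0.4]), np.array([0, 1, 0])

f_u, y_u = union_example(f_a, f_b, y_a, y_b)          # feature: [0.9, 0.8, 0.4], label: [1, 1, 0]
f_i, y_i = intersection_example(f_a, f_b, y_a, y_b)   # feature: [0.2, 0.1, 0.4], label: [0, 0, 0]
```

Because the synthesized pair mixes information from two subjects, a classifier trained on it cannot rely on subject-specific cues, which is one plausible reading of how such operations encourage subject-independent features.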


Computer Science


National Science Foundation, Grant CNS-1629898

Keywords and Phrases

Data augmentation; deep neural networks; facial action unit detection

Document Type

Article - Conference proceedings

© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Nov 2020