Automatic facial Action Unit (AU) detection is the recognition of facial appearance changes caused by the contraction or relaxation of one or more related facial muscles. Static image-based AU detection performs worse than sequence-based methods because temporal information is lost. To address this problem, we propose a novel method that implicitly learns temporal information from a single image for AU detection by adding a hidden optical-flow layer that connects two Convolutional Neural Network (CNN) models: an optical-flow net (OF-Net) and an AU detection net (AU-Net). The OF-Net is designed to estimate facial appearance changes (optical flow) from a single input image through unsupervised learning. The AU-Net accepts the estimated optical flow as input and predicts AU occurrence. Training the OF-Net and AU-Net jointly yields better performance than training them separately, because the AU-Net provides semantic constraints for optical-flow learning and helps generate more meaningful optical flow. In return, the estimated optical flow, which reflects facial appearance changes, benefits the AU-Net. The proposed method has been evaluated on two benchmarks, BP4D and DISFA, and the experiments show significant performance improvement over state-of-the-art methods.
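The data flow described in the abstract (single image, then OF-Net, then a hidden optical-flow layer, then AU-Net, with one joint objective) can be illustrated with a toy sketch. This is not the authors' implementation: the class names `TinyOFNet` and `TinyAUNet`, the single dense layers standing in for CNNs, the 8x8 image size, the choice of 12 AU labels, the loss weight, and above all the placeholder flow loss are assumptions made only to show how the pieces connect. The abstract does not specify the unsupervised optical-flow objective, so the placeholder should not be read as the paper's loss.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8      # toy image size (assumption, not from the paper)
NUM_AUS = 12     # number of AU labels (assumption, not from the paper)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyOFNet:
    """Hypothetical stand-in for OF-Net: one image -> 2-channel flow field."""

    def __init__(self):
        # A single dense layer replaces the CNN purely for illustration.
        self.w = rng.normal(scale=0.1, size=(H * W, 2 * H * W))

    def forward(self, image):
        flow = np.tanh(image.reshape(-1) @ self.w)
        return flow.reshape(2, H, W)  # (dx, dy) per pixel


class TinyAUNet:
    """Hypothetical stand-in for AU-Net: flow field -> per-AU probabilities."""

    def __init__(self):
        self.w = rng.normal(scale=0.1, size=(2 * H * W, NUM_AUS))

    def forward(self, flow):
        return sigmoid(flow.reshape(-1) @ self.w)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Multi-label AU occurrence loss, averaged over AUs."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs)
                          + (1.0 - labels) * np.log(1.0 - probs)))


def joint_loss(flow_loss, au_loss, weight=1.0):
    """Joint objective: both networks are driven by one combined loss."""
    return au_loss + weight * flow_loss


# One forward pass through the concatenated model.
image = rng.random((H, W))
labels = rng.integers(0, 2, size=NUM_AUS).astype(float)

of_net, au_net = TinyOFNet(), TinyAUNet()
flow = of_net.forward(image)      # hidden optical-flow layer
au_probs = au_net.forward(flow)   # AU occurrence predictions

au_loss = binary_cross_entropy(au_probs, labels)
# Placeholder only: NOT the paper's unsupervised optical-flow objective.
flow_loss = float(np.mean(flow ** 2))
total = joint_loss(flow_loss, au_loss)

print(flow.shape, au_probs.shape)
```

In a real joint-training setup, the gradient of the AU loss would propagate back through the flow layer into the OF-Net, which is the mechanism by which the AU-Net can constrain the learned optical flow; the sketch above shows only the forward pass and the combined loss.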
H. Yang and L. Yin, "Learning Temporal Information From A Single Image For AU Detection," Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019, article no. 8756556, Institute of Electrical and Electronics Engineers, May 2019.
The definitive version is available at https://doi.org/10.1109/FG.2019.8756556
Article - Conference proceedings
© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.
01 May 2019