Automatic facial Action Unit (AU) detection recognizes the facial appearance changes caused by the contraction or relaxation of one or more related facial muscles. Static image-based AU detection typically underperforms sequence-based methods because it loses temporal information. To address this problem, we propose a novel method that implicitly learns temporal information from a single image by adding a hidden optical-flow layer that concatenates two Convolutional Neural Network (CNN) models: an optical-flow net (OF-Net) and an AU detection net (AU-Net). The OF-Net estimates facial appearance changes (optical flow) from a single input image through unsupervised learning. The AU-Net accepts the estimated optical flow as input and predicts AU occurrence. Training the OF-Net and AU-Net jointly achieves better performance than training them separately: the AU-Net provides semantic constraints that guide the optical-flow learning toward more meaningful flow, and in return the estimated optical flow, which reflects facial appearance changes, benefits the AU-Net. We evaluate the proposed method on two benchmarks, BP4D and DISFA, and the experiments show significant performance improvement over state-of-the-art methods.
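The joint training described above can be sketched as a single objective combining an unsupervised term on the estimated flow with a supervised AU term. The sketch below is a minimal NumPy toy, not the paper's implementation: the linear "networks", the smoothness term standing in for the unsupervised flow loss, and the weighting factor `alpha` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def of_net(image, w_of):
    """Toy stand-in for OF-Net: maps a single image to a
    2-channel flow field (hypothetical linear layer + tanh)."""
    return np.tanh(image[..., None] * w_of)  # (H, W, 2)

def au_net(flow, w_au):
    """Toy stand-in for AU-Net: maps the estimated flow to
    per-AU occurrence probabilities (hypothetical linear head)."""
    logits = flow.reshape(-1) @ w_au
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid per AU

def joint_loss(image, au_labels, w_of, w_au, alpha=0.1):
    """Joint objective (form is an assumption): an unsupervised
    smoothness penalty on the flow plus supervised binary
    cross-entropy on AU occurrence."""
    flow = of_net(image, w_of)
    # Unsupervised term: placeholder for the paper's
    # reconstruction-style optical-flow loss.
    smooth = (np.mean(np.abs(np.diff(flow, axis=0)))
              + np.mean(np.abs(np.diff(flow, axis=1))))
    probs = au_net(flow, w_au)
    eps = 1e-7
    bce = -np.mean(au_labels * np.log(probs + eps)
                   + (1 - au_labels) * np.log(1 - probs + eps))
    return alpha * smooth + bce

# Tiny example: one 8x8 "face" image and 12 AU occurrence labels.
image = rng.standard_normal((8, 8))
w_of = rng.standard_normal(2) * 0.1
w_au = rng.standard_normal((8 * 8 * 2, 12)) * 0.1
labels = rng.integers(0, 2, 12).astype(float)
loss = joint_loss(image, labels, w_of, w_au)
print(float(loss))
```

Because both terms share the flow produced by `of_net`, gradients from the AU term would flow back into the optical-flow estimator, which is the mechanism by which AU supervision constrains the learned flow.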


Computer Science


National Science Foundation, Grant CNS-1205664


Document Type

Article - Conference proceedings


© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 May 2019