Multi-Modal Recognition of Worker Activity for Human-Centered Intelligent Manufacturing


This study aims to sense and understand worker activity in a human-centered intelligent manufacturing system. We propose a novel multi-modal approach to worker activity recognition that leverages information from different sensors and modalities. Specifically, a smart armband and a visual camera are used to capture Inertial Measurement Unit (IMU) signals and videos, respectively. For the IMU signals, we design two novel feature transform mechanisms, one in the frequency domain and one in the spatial domain, that assemble the captured IMU signals into images, so that convolutional neural networks can learn the most discriminative features. Alongside these two IMU-based modalities, we propose two further modalities for the video data, at the video-frame and video-clip levels. Each of the four modalities outputs a probability distribution over the activity classes, and these distributions are then fused to produce the final worker activity classification. We establish a worker activity dataset that currently contains 6 common activities in assembly tasks: grab a tool/part, hammer a nail, use a power-screwdriver, rest arms, turn a screwdriver, and use a wrench. Evaluated on this dataset, the developed multi-modal approach achieves recognition accuracies as high as 97% and 100% in the leave-one-out and half-half experiments, respectively.
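The abstract states that each of the four modalities yields a probability distribution and that these are fused, but it does not give the fusion rule or the exact IMU-to-image transforms. The sketch below is a minimal illustration under stated assumptions: late fusion is done by a (optionally weighted) average of the per-modality distributions, and the frequency-domain "image" is approximated as a stack of per-channel DFT magnitudes (one row per IMU channel, one column per frequency bin). Both choices, the class-index order, and all function names (`dft_magnitude`, `imu_window_to_frequency_image`, `fuse_probabilities`, `predict`) are hypothetical stand-ins, not the authors' method.

```python
import cmath
import math
from typing import List, Optional, Sequence

# Activity labels from the abstract; mapping them to class indices in this
# order is an assumption made for illustration.
ACTIVITIES = [
    "grab a tool/part",
    "hammer a nail",
    "use a power-screwdriver",
    "rest arms",
    "turn a screwdriver",
    "use a wrench",
]


def dft_magnitude(signal: Sequence[float]) -> List[float]:
    """Magnitude of the discrete Fourier transform for bins 0..n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def imu_window_to_frequency_image(
        channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical frequency-domain transform: one image row per IMU channel,
    one column per frequency bin. Not the paper's exact mechanism."""
    return [dft_magnitude(ch) for ch in channels]


def fuse_probabilities(dists: Sequence[Sequence[float]],
                       weights: Optional[Sequence[float]] = None) -> List[float]:
    """Late fusion (assumed rule): weighted average of per-modality
    probability distributions over the same set of classes."""
    if not dists:
        raise ValueError("need at least one distribution")
    if weights is None:
        weights = [1.0] * len(dists)
    total = sum(weights)
    n_classes = len(dists[0])
    fused = [0.0] * n_classes
    for w, dist in zip(weights, dists):
        if len(dist) != n_classes:
            raise ValueError("all distributions must cover the same classes")
        for i, p in enumerate(dist):
            fused[i] += w * p / total
    return fused


def predict(dists: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> str:
    """Return the activity label with the highest fused probability."""
    fused = fuse_probabilities(dists, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIVITIES[best]


if __name__ == "__main__":
    # Made-up outputs for the four modalities (IMU frequency, IMU spatial,
    # video frame, video clip); these are not results from the paper.
    dists = [
        [0.10, 0.60, 0.10, 0.05, 0.10, 0.05],
        [0.05, 0.50, 0.20, 0.05, 0.15, 0.05],
        [0.30, 0.30, 0.10, 0.10, 0.10, 0.10],
        [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],
    ]
    print(predict(dists))  # prints: hammer a nail
```

Averaging keeps the fused output a valid probability distribution whenever the inputs are, and the optional weights would let a more reliable modality count for more; whether the original work uses equal weights, learned weights, or another rule is not stated in the abstract.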


Department

Mechanical and Aerospace Engineering

Second Department

Computer Science

Research Center/Lab(s)

Intelligent Systems Center


National Science Foundation, Grant 1954548

Keywords and Phrases

Deep learning; Human-centered computing; Intelligent manufacturing; Multi-modal fusion; Worker activity recognition

Document Type

Article - Journal

© 2020 International Federation of Automatic Control (IFAC). All rights reserved.

Publication Date

01 Oct 2020