Real-Time Multi-Modal Human-Robot Collaboration using Gestures and Speech


As artificial intelligence and industrial automation develop, human-robot collaboration (HRC) with advanced interaction capabilities has become an increasingly important area of research. In this paper, we design and develop a real-time, multi-modal HRC system that uses speech and gestures. A set of 16 dynamic gestures is designed for communication from a human to an industrial robot. A dataset of these dynamic gestures is designed and constructed, and it will be shared with the community. A convolutional neural network is developed to recognize the dynamic gestures in real time using motion history images and deep learning methods. An improved open-source speech recognizer is used for real-time recognition of the human worker's speech. An integration strategy is proposed to combine the gesture and speech recognition results, and a software interface is designed for system visualization. A multi-threading architecture is constructed to run multiple tasks simultaneously, including gesture and speech data collection and recognition, data integration, robot control, and software interface operation. These methods and algorithms are integrated into the HRC system, and a platform is constructed to demonstrate its performance. The experimental results validate the feasibility and effectiveness of the proposed algorithms and the HRC system.
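The abstract names the motion history image (MHI) as the representation the gesture CNN works from. As a rough illustration only, the classic MHI update can be sketched in NumPy: each pixel is set to a maximum value where motion is detected in the current frame and decays toward zero elsewhere, so a whole dynamic gesture collapses into one grayscale image whose brightness encodes recency. Using simple frame differencing as the motion detector, and the helper names `update_mhi` / `motion_history_image` and parameters `tau`, `decay`, `diff_threshold` with their default values, are assumptions for this sketch, not the paper's published settings.

```python
import numpy as np


def update_mhi(mhi, prev_frame, frame, tau=255.0, decay=15.0, diff_threshold=30):
    """One MHI update step (classic formulation; parameters are illustrative assumptions).

    mhi: float32 array holding the current motion history (H x W)
    prev_frame, frame: uint8 grayscale frames of the same shape
    tau: value written wherever motion is detected
    decay: amount subtracted from older motion at each step
    diff_threshold: absolute intensity change that counts as motion
    """
    # Frame differencing; cast to a signed type so uint8 subtraction cannot wrap around
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff > diff_threshold
    # Older motion fades by `decay`, floored at zero
    faded = np.maximum(mhi - decay, 0.0)
    # Fresh motion is stamped with the maximum value tau
    return np.where(motion, tau, faded)


def motion_history_image(frames, tau=255.0, decay=15.0, diff_threshold=30):
    """Collapse a sequence of grayscale frames into a single uint8 MHI (hypothetical helper)."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, cur in zip(frames[:-1], frames[1:]):
        mhi = update_mhi(mhi, prev, cur, tau, decay, diff_threshold)
    return mhi.astype(np.uint8)
```

For example, a bright block sliding left to right across four frames yields an MHI that is brightest at the block's latest positions and dimmer along its earlier path, which is the kind of spatial-temporal cue a CNN can classify from a single image. In a live system the update would run once per captured frame rather than over a stored list.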


Department

Mechanical and Aerospace Engineering

Second Department

Computer Science


National Science Foundation, Grant CMMI-1646162

Keywords and Phrases

gesture recognition; human-robot collaboration; multi-modal; multiple threads; real-time; speech recognition

International Standard Serial Number (ISSN)

1528-8935; 1087-1357

Document Type

Article - Journal

© 2023 American Society of Mechanical Engineers, All rights reserved.

Publication Date

01 Oct 2022