American Sign Language Alphabet Recognition using Convolutional Neural Networks with Multiview Augmentation and Inference Fusion


American Sign Language (ASL) alphabet recognition by computer vision is a challenging task due to the complexity of ASL signs, high interclass similarities, large intraclass variations, and frequent occlusions. This paper describes a method for ASL alphabet recognition using Convolutional Neural Networks (CNN) with multiview augmentation and inference fusion, applied to depth images captured by a Microsoft Kinect. Our approach augments the original data by generating additional perspective views, which makes training more effective and reduces potential overfitting. During the inference step, our approach combines information from multiple views into the final prediction to address confusing cases caused by orientational variations and partial occlusions. On two public benchmark datasets, our method outperforms state-of-the-art methods.
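The abstract does not spell out the fusion rule used at inference time, but a common late-fusion scheme is to average the per-view softmax outputs and take the arg-max. The sketch below is illustrative only; the function name `fuse_view_predictions` and the sample probabilities are assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_view_predictions(view_probs):
    """Fuse per-view class probabilities by averaging (late fusion).

    view_probs: array of shape (n_views, n_classes), each row the
    softmax output of the CNN for one synthesized perspective view.
    Returns the fused probability vector and the predicted class index.
    """
    view_probs = np.asarray(view_probs, dtype=float)
    fused = view_probs.mean(axis=0)  # average scores across views
    return fused, int(np.argmax(fused))

# Example: three views of the same sign. Two views agree on class 1;
# one partially occluded view weakly favors class 0. Averaging lets
# the consistent views outvote the ambiguous one.
views = [
    [0.20, 0.70, 0.10],
    [0.30, 0.60, 0.10],
    [0.55, 0.40, 0.05],
]
fused, label = fuse_view_predictions(views)  # label == 1
```

Averaging (rather than, say, max-voting) keeps the fused output a valid probability distribution and dampens the influence of any single confusing view.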


Department

Mechanical and Aerospace Engineering

Second Department

Computer Science

Research Center/Lab(s)

Intelligent Systems Center


This research work was supported by the National Science Foundation, United States grant CMMI-1646162 on cyber–physical systems and also by the Intelligent Systems Center at Missouri University of Science and Technology, United States.

Keywords and Phrases

American Sign Language; Convolutional Neural Networks (CNN); Data augmentation; Fusion

Document Type

Article - Journal


© 2018 Elsevier Ltd. All rights reserved.

Publication Date

01 Nov 2018