Electrical and Computer Engineering Faculty Research & Creative Works

Inference-Aware Convolutional Neural Network Pruning

Tejalal Choudhary
Vipul Mishra
Anurag Goswami
Jagannathan Sarangapani, Missouri University of Science and Technology

Abstract

Deep neural networks (DNNs) have become an important tool for solving problems in numerous disciplines. However, DNNs are also known for their high resource requirements, weight redundancy, and large number of parameters. As a result, DNNs are difficult to deploy on devices that lack the resources needed to execute them, especially resource-constrained devices such as mobile phones, wearable devices, and other edge devices. In recent years, pruning has emerged as an essential technique for removing insignificant parameters and accelerating inference. However, finding the optimal number of parameters that can be pruned without significantly affecting model performance is a time-consuming, tedious task that requires a lot of manual tuning. This paper formulates pruning as an optimization problem whose goal is to improve DNN run-time inference performance by pruning low-impact parameters (filters) and their corresponding feature maps. To do this, we present a Bayesian optimization-based method that automatically determines the appropriate number of filters for each convolutional layer. We also propose an objective function that incorporates distinct model-performance and resource-specific constraints. The proposed method is applied to two kinds of convolutional network architectures (VGG16 and the deeper ResNet34) on the CIFAR10, CIFAR100, and ImageNet datasets. On the large-scale ImageNet dataset, the floating-point operations (FLOPs) of ResNet34 and VGG16 were reduced by 35.46 percent and 84.97 percent, respectively, with negligible loss of accuracy.
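The abstract describes searching over per-layer filter counts and scoring each candidate with an objective that trades accuracy against compute. The sketch below illustrates only the quantities such a search would work with; it is not the authors' implementation. Several pieces are assumptions: the abstract does not name a filter-importance criterion, so L1-norm ranking is swapped in as a common choice; cost is counted as multiply-accumulates (MACs) for a plain chain of stride-1, same-padded convolutions; and `objective` with its weight `alpha` is a hypothetical scalarization, not the paper's objective function. A Bayesian optimizer (not shown) would propose the `filters` list and receive the objective value back.

```python
from typing import List


def conv_flops(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of one k x k conv producing an h x w x c_out map.
    Assumes stride 1 and 'same' padding; bias terms are ignored."""
    return h * w * c_in * c_out * k * k


def network_flops(spatial: List[int], filters: List[int], c_in: int = 3, k: int = 3) -> int:
    """Total MACs for a hypothetical plain chain of conv layers.
    Pruning filters in layer i shrinks both layer i's output channels
    and layer i+1's input channels, so savings compound across layers."""
    total = 0
    prev = c_in
    for s, f in zip(spatial, filters):
        total += conv_flops(s, s, prev, f, k)
        prev = f  # the next layer's input channels equal this layer's kept filters
    return total


def l1_keep_indices(weights: List[List[float]], n_keep: int) -> List[int]:
    """Rank filters (each given as a flattened weight list) by L1 norm and
    return the indices of the n_keep largest, in their original order."""
    norms = [(sum(abs(x) for x in w), i) for i, w in enumerate(weights)]
    ranked = sorted(norms, key=lambda t: t[0], reverse=True)
    return sorted(i for _, i in ranked[:n_keep])


def objective(acc: float, flops: int, base_flops: int, alpha: float = 0.5) -> float:
    """Hypothetical objective: weighted sum of accuracy and FLOPs reduction.
    alpha is an assumed trade-off weight, not a value from the paper."""
    reduction = 1.0 - flops / base_flops
    return alpha * acc + (1.0 - alpha) * reduction


if __name__ == "__main__":
    spatial = [32, 32, 16]           # toy feature-map sizes, not VGG16/ResNet34
    base = network_flops(spatial, [64, 64, 128])
    pruned = network_flops(spatial, [32, 64, 64])
    print(base, pruned, objective(0.9, pruned, base))
```

In this toy three-layer chain, halving the filters of the first and third layers halves the total MAC count (58,392,576 to 29,196,288), because the first layer's pruning also shrinks the second layer's input channels. Counting the downstream effect is why per-layer filter counts, rather than a single global pruning ratio, are a natural search space.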

Recommended Citation

T. Choudhary et al., "Inference-Aware Convolutional Neural Network Pruning," Future Generation Computer Systems, vol. 135, pp. 44 - 56, Elsevier, Oct 2022.

The definitive version is available at https://doi.org/10.1016/j.future.2022.04.031

Department(s)

Electrical and Computer Engineering

Keywords and Phrases

Bayesian Optimization; Convolutional Neural Network; Efficient Inference; Filter Pruning; Model Compression and Acceleration; Resource-Constrained Devices

International Standard Serial Number (ISSN)

0167-739X

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

01 Oct 2022
