Abstract

Today's deep neural networks (DNNs) are becoming deeper and wider because of the increasing demand for analysis quality and the growing complexity of the applications they address. Wide and deep DNNs, however, require large amounts of resources (such as memory, storage, and I/O), which significantly restricts their use on resource-constrained platforms. Although some DNN simplification methods (such as weight quantization) have been proposed to address this issue, they suffer from either low compression ratios or high compression errors, which may introduce an expensive fine-tuning overhead (i.e., a costly retraining process to reach the target inference accuracy). In this paper, we propose DeepSZ: an accuracy-loss expected neural network compression framework, which involves four key steps: network pruning, error bound assessment, optimization for error bound configuration, and compressed model generation, featuring a high compression ratio and low encoding time. The contribution is threefold. (1) We develop an adaptive approach to select the feasible error bounds for each layer. (2) We build a model to estimate the overall loss of inference accuracy based on the inference accuracy degradation caused by individual decompressed layers. (3) We develop an efficient optimization algorithm to determine the best-fit configuration of error bounds in order to maximize the compression ratio under the user-set inference accuracy constraint. Experiments show that DeepSZ can compress AlexNet and VGG-16 on the ImageNet dataset by compression ratios of 46× and 116×, respectively, and compress LeNet-300-100 and LeNet-5 on the MNIST dataset by compression ratios of 57× and 56×, respectively, with at most 0.3% loss of inference accuracy. Compared with other state-of-the-art methods, DeepSZ can improve the compression ratio by up to 1.43×, the DNN encoding performance by up to 4.0× with four V100 GPUs, and the decoding performance by up to 6.2×.
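The error-bound configuration step described in the abstract can be illustrated with a small sketch. This is not the DeepSZ algorithm itself: it assumes, purely for illustration, that per-layer accuracy drops add up to the overall drop, that each layer has a short list of already-measured candidates, and that the accuracy budget can be discretized into fixed steps. The function name `choose_error_bounds`, the layer names, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One candidate per tested error bound (all values here are hypothetical):
# (error_bound, accuracy_drop_in_percent, compressed_size_in_bytes)
Candidate = Tuple[float, float, int]


def choose_error_bounds(
    layers: Dict[str, List[Candidate]],
    max_drop: float,
    step: float = 0.01,
) -> Tuple[int, Dict[str, float]]:
    """Pick one error bound per layer to minimize total compressed size.

    Assumption of this sketch: the overall accuracy drop is the sum of the
    per-layer drops. Drops are rounded up to multiples of `step`, so the
    result never exceeds `max_drop` under that additive model.
    """
    budget = int(round(max_drop / step))
    # best[used] = (total_size, {layer: error_bound}) for `used` budget units
    best: Dict[int, Tuple[int, Dict[str, float]]] = {0: (0, {})}
    for name, candidates in layers.items():
        nxt: Dict[int, Tuple[int, Dict[str, float]]] = {}
        for used, (size, choice) in best.items():
            for bound, drop, csize in candidates:
                # Round up (with a small tolerance for float noise).
                cost = max(0, math.ceil(drop / step - 1e-9))
                total_used = used + cost
                if total_used > budget:
                    continue  # this combination would break the constraint
                total_size = size + csize
                if total_used not in nxt or total_size < nxt[total_used][0]:
                    nxt[total_used] = (total_size, {**choice, name: bound})
        if not nxt:
            raise ValueError(f"no feasible error bound for layer {name!r}")
        best = nxt
    # Smallest total size among all feasible budget usages.
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical measurements for two fully connected layers.
example_layers = {
    "fc6": [(1e-3, 0.05, 400), (1e-2, 0.20, 250)],
    "fc7": [(1e-3, 0.02, 200), (1e-2, 0.15, 120)],
}
size, bounds = choose_error_bounds(example_layers, max_drop=0.3)
print(size, bounds)  # 450 {'fc6': 0.01, 'fc7': 0.001}
```

In this toy input, using the looser bound on both layers would cost an estimated 0.35% of accuracy, which exceeds the 0.3% budget, so the sketch loosens only the layer where doing so saves the most space.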

Recommended Citation

S. Jin et al., "DeepSZ: A Novel Framework to Compress Deep Neural Networks by using Error-Bounded Lossy Compression," Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (2019, Phoenix, AZ), pp. 159-170, Association for Computing Machinery (ACM), Jun 2019.

The definitive version is available at https://doi.org/10.1145/3307681.3326608

Meeting Name

28th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '19 (2019: Jun. 22-29, Phoenix, AZ)

Department(s)

Computer Science

Comments

This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC.

Keywords and Phrases

Deep Learning; Lossy Compression; Neural Networks; Performance

International Standard Book Number (ISBN)

978-1-4503-6670-0

Document Type

Article - Conference proceedings

Document Version

Final Version

File Type

text

Language(s)

English

Rights

Publication Date

17 Jun 2019

Included in

Computer Sciences Commons

Computer Science Faculty Research & Creative Works

DeepSZ: A Novel Framework to Compress Deep Neural Networks by using Error-Bounded Lossy Compression
