Abstract

Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. With ever-emerging heterogeneous high-performance computing (HPC) architectures, GPU-accelerated error-bounded compressors (such as CUSZ and cuZFP) have been developed. However, they suffer from either low performance or low compression ratios. To this end, we propose CUSZ+ to target both high compression ratios and high throughputs. We identify that data sparsity and data smoothness are key factors for high compression throughputs. Our key contributions in this work are fourfold: (1) We propose an efficient compression workflow to adaptively perform run-length encoding and/or variable-length encoding. (2) We derive Lorenzo reconstruction in decompression as multidimensional partial-sum computation and propose a fine-grained Lorenzo reconstruction algorithm for GPU architectures. (3) We carefully optimize each of the CUSZ kernels by leveraging state-of-the-art CUDA parallel primitives. (4) We evaluate CUSZ+ using seven real-world HPC application datasets on V100 and A100 GPUs. Experiments show that CUSZ+ improves the compression throughputs and ratios by up to 18.4x and 5.3x, respectively, over CUSZ on the tested datasets.
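
Contribution (2) can be illustrated with a small CPU-side sketch. It is not the CUSZ+ GPU kernel, only a plain-Python illustration under stated assumptions: the data is already quantized to integers, out-of-range neighbors count as zero, and the function names `lorenzo_residuals_2d` and `reconstruct_partial_sum_2d` are hypothetical. It shows that undoing a first-order 2D Lorenzo prediction amounts to an inclusive partial sum along each dimension in turn.

```python
from itertools import accumulate


def lorenzo_residuals_2d(data):
    """Residuals of a first-order 2D Lorenzo predictor on integer data.

    Assumption for this sketch: neighbors outside the grid are treated as 0.
    """
    rows, cols = len(data), len(data[0])

    def at(i, j):
        # Zero padding for out-of-range neighbors (illustrative choice).
        return data[i][j] if i >= 0 and j >= 0 else 0

    return [
        [at(i, j) - at(i - 1, j) - at(i, j - 1) + at(i - 1, j - 1)
         for j in range(cols)]
        for i in range(rows)
    ]


def reconstruct_partial_sum_2d(residuals):
    """Invert the predictor with one inclusive partial sum per dimension."""
    # Pass 1: inclusive partial sum along each row.
    row_scanned = [list(accumulate(row)) for row in residuals]
    # Pass 2: inclusive partial sum along each column (transpose, scan, transpose).
    col_scanned = [list(accumulate(col)) for col in zip(*row_scanned)]
    return [list(row) for row in zip(*col_scanned)]


grid = [[3, 4, 4],
        [5, 6, 7],
        [5, 7, 9]]
residuals = lorenzo_residuals_2d(grid)
restored = reconstruct_partial_sum_2d(residuals)
```

Because each pass is an independent scan over rows or columns, this formulation maps naturally onto parallel scan primitives, which is presumably what makes a fine-grained GPU implementation attractive.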

Department(s)

Computer Science

Comments

National Science Foundation, Grant CCF-1619253

International Standard Book Number (ISBN)

978-172819666-4

International Standard Serial Number (ISSN)

1552-5244

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jan 2021
