Title
waveSZ: A Hardware-Algorithm Co-Design of Efficient Lossy Compression for Scientific Data
Abstract
Error-bounded lossy compression is critical to the success of extreme-scale scientific research because of the ever-increasing volumes of data produced by today's high-performance computing (HPC) applications. Not only can error-controlled lossy compressors significantly reduce the I/O and storage burden, but they can also retain high data fidelity for post-analysis. Existing state-of-the-art lossy compressors, however, generally suffer from relatively low compression and decompression throughput (up to hundreds of megabytes per second on a single CPU core), which considerably restricts the adoption of lossy compression by many HPC applications, especially those with a fairly high data production rate. In this paper, we propose a highly efficient lossy compression approach based on field-programmable gate arrays (FPGAs) under the state-of-the-art lossy compression model SZ. Our contributions are fourfold. (1) We adopt a wavefront memory layout to alleviate the data dependency during prediction for higher-dimensional predictors, such as the Lorenzo predictor. (2) We propose a co-design framework named waveSZ, based on the wavefront memory layout and the characteristics of the SZ algorithm, and carefully implement it using high-level synthesis. (3) We propose a hardware-algorithm co-optimization method to improve performance. (4) We evaluate waveSZ on three real-world HPC simulation datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and FPGAs. Experiments show that waveSZ improves SZ's compression throughput by 6.9× to 8.7× over the production version running on a state-of-the-art CPU, and improves the compression ratio and throughput by 2.1× and 5.8× on average, respectively, compared with the state-of-the-art FPGA design.
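The data dependency mentioned in contribution (1) comes from the fact that SZ-style compressors predict each point from neighbors that have already been reconstructed (decompressed), so a point cannot be processed until its left, upper, and upper-left neighbors are finished. In 2D, those neighbors all lie on earlier anti-diagonals, so every point on one anti-diagonal is independent of the others on it. The following is a minimal Python sketch of that idea, not the waveSZ implementation (which targets FPGAs via high-level synthesis). It assumes a 2D field, zero padding at the boundaries, and plain linear quantization with unbounded integer codes (no quantization-bin limit, unpredictable-value handling, or entropy coding as in production SZ). The function names `lorenzo_2d`, `wavefront_order`, `compress`, and `decompress` are hypothetical, and the sketch shows only the diagonal traversal order, not the paper's actual memory layout.

```python
import math


def lorenzo_2d(recon, i, j):
    """2D Lorenzo prediction from already-reconstructed neighbors.
    Out-of-range neighbors are treated as 0 (zero padding is an assumption)."""
    def at(a, b):
        return recon[a][b] if a >= 0 and b >= 0 else 0.0
    return at(i - 1, j) + at(i, j - 1) - at(i - 1, j - 1)


def wavefront_order(rows, cols):
    """Yield anti-diagonals: all (i, j) with i + j == k, for k = 0, 1, ...
    The Lorenzo neighbors of (i, j) lie on diagonals k - 1 and k - 2."""
    for k in range(rows + cols - 1):
        yield [(i, k - i) for i in range(max(0, k - cols + 1), min(rows, k + 1))]


def compress(data, eb):
    """Quantize Lorenzo prediction errors with absolute error bound eb.
    Returns the integer codes and the reconstruction a decompressor would see."""
    rows, cols = len(data), len(data[0])
    codes = [[0] * cols for _ in range(rows)]
    recon = [[0.0] * cols for _ in range(rows)]
    for diagonal in wavefront_order(rows, cols):
        # No point in this diagonal reads another point of the same diagonal,
        # so this inner loop could be run in parallel or pipelined.
        for i, j in diagonal:
            pred = lorenzo_2d(recon, i, j)
            q = round((data[i][j] - pred) / (2 * eb))
            codes[i][j] = q
            recon[i][j] = pred + 2 * eb * q  # |data - recon| <= eb
    return codes, recon


def decompress(codes, eb):
    """Rebuild the field from codes, visiting points in the same diagonal order."""
    rows, cols = len(codes), len(codes[0])
    recon = [[0.0] * cols for _ in range(rows)]
    for diagonal in wavefront_order(rows, cols):
        for i, j in diagonal:
            recon[i][j] = lorenzo_2d(recon, i, j) + 2 * eb * codes[i][j]
    return recon


if __name__ == "__main__":
    field = [[math.sin(0.1 * i) * math.cos(0.07 * j) for j in range(12)]
             for i in range(16)]
    codes, recon = compress(field, eb=1e-3)
    err = max(abs(a - b) for ra, rb in zip(field, recon) for a, b in zip(ra, rb))
    print("max abs error:", err)
    print("decompressor matches:", decompress(codes, 1e-3) == recon)
```

Because the decompressor repeats exactly the same prediction arithmetic on the same reconstructed values, it reproduces the compressor's reconstruction, and every point stays within the error bound `eb`.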
Recommended Citation
J. Tian et al., "waveSZ: A Hardware-Algorithm Co-Design of Efficient Lossy Compression for Scientific Data," Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2020: Feb. 22-26, San Diego, CA), pp. 74-88, Association for Computing Machinery (ACM), Feb 2020.
The definitive version is available at https://doi.org/10.1145/3332466.3374525
Meeting Name
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '20 (2020: Feb. 22-26, San Diego, CA)
Department(s)
Computer Science
Keywords and Phrases
Compression Ratio; FPGA; Lossy Compression; Scientific Data; Software-Hardware Co-Design; Throughput
International Standard Book Number (ISBN)
978-1-4503-6818-6
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2020 Association for Computing Machinery (ACM), All rights reserved.
Publication Date
26 Feb 2020
Comments
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC.