Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization
Abstract
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load balancing and mesh refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need to read and process the generated data efficiently introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that, by understanding application I/O patterns and carefully designing data layouts, we can increase read performance by more than 80 percent.
Recommended Citation
L. Wan et al., "Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, pp. 878-890, Institute of Electrical and Electronics Engineers (IEEE), Apr 2022.
The definitive version is available at https://doi.org/10.1109/TPDS.2021.3100784
Department(s)
Computer Science
Research Center/Lab(s)
Intelligent Systems Center
Keywords and Phrases
Data Access Optimization; Data Layout; IO Performance; Parallel IO; WarpX
International Standard Serial Number (ISSN)
1558-2183; 1045-9219
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2021 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
Publication Date
01 Apr 2022
Comments
This work was supported in part by the Exascale Computing Project under Grant 17-SC-20-SC, a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, in part by the Center of Advanced Systems Understanding (CASUS), Germany's Federal Ministry of Education and Research (BMBF), and in part by the Saxon Ministry for Science, Culture and Tourism (SMWK), with tax funds on the basis of the budget approved by the Saxon State Parliament. This research used resources of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.