Algorithmic GPGPU Memory Optimization
The performance of General-Purpose computation on Graphics Processing Units (GPGPU) depends heavily on memory access behavior. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of the memory accesses present in serial loop nests to the underlying data-parallel architecture, based on a comprehensive static memory access pattern analysis. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. Our experimental results are reported using the industry-standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture. The full version of the paper can be found at .
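As a rough illustration of how a static access pattern can drive thread mapping, the sketch below scores each loop of a two-deep loop nest by how many array references would have a stride of 0 or 1 elements between adjacent work-items if that loop were mapped to the fastest-varying thread dimension. This is not the model from the paper, which the abstract does not spell out. The function names, the coefficient-tuple representation, and the "stride of at most one element" criterion are all assumptions made for this example.

```python
# Hypothetical sketch: each array reference is described by the coefficients of
# its linearized address with respect to each loop index. For example, A[i*N + j]
# in a loop nest over (i, j) has coefficients (N, 1).

def coalesced_count(accesses, loop):
    """Count references whose address stride between adjacent threads is 0 or 1
    when `loop` is mapped to the fastest-varying thread dimension (assumed criterion)."""
    return sum(1 for coeffs in accesses if abs(coeffs[loop]) <= 1)

def pick_fastest_loop(accesses):
    """Return the index of the loop that yields the most favorable references."""
    n_loops = len(accesses[0])
    return max(range(n_loops), key=lambda loop: coalesced_count(accesses, loop))

# Example loop nest (illustrative only):
#   for i in range(N):
#       for j in range(N):
#           C[i*N + j] = A[i*N + j] + B[j*N + i]
N = 1024
accesses = [
    (N, 1),  # C[i*N + j]
    (N, 1),  # A[i*N + j]
    (1, N),  # B[j*N + i]
]

best = pick_fastest_loop(accesses)
print("map loop index", best, "to the fastest-varying thread dimension")
```

In this example, mapping the inner loop `j` gives unit stride for two of the three references, while mapping `i` gives unit stride for only one, so the sketch selects `j`. A real search would also weigh memory space selection and work group size, which this sketch leaves out.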
B. Jang et al., "Algorithmic GPGPU Memory Optimization," Proceedings of the International SoC Design Conference (2013, Busan, South Korea), pp. 154-157, Institute of Electrical and Electronics Engineers (IEEE), Nov 2013.
The definitive version is available at https://doi.org/10.1109/ISOCC.2013.6863959
International SoC Design Conference: ISOCC (2013: Nov. 17-19, Busan, South Korea)
Electrical and Computer Engineering
Keywords and Phrases
Algorithms; Computer Graphics; Mapping; Mathematical Models; Parallel Architectures; Data Parallel; Data-Parallel Architectures; General-Purpose Computations; Graphics Processing Unit; Heterogeneous Programming; Industry Standards; Memory Access Patterns; Memory Optimization; Program Processors
Article - Conference proceedings
© 2013 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
01 Nov 2013