The Ohio State University
Code Optimization on GPUs.
Degree: PhD, Computer Science and Engineering, 2019, The Ohio State University
Graphics Processing Units (GPUs) have become popular in
the last decade due to their high memory bandwidth and powerful
computing capacity. Nevertheless, achieving high performance on
GPUs is not trivial. It generally requires significant programming
expertise and understanding of details of low-level execution
mechanisms in GPUs. This dissertation introduces approaches for
optimizing regular and irregular applications. To optimize regular
applications, it introduces a novel approach to GPU kernel
optimization by identifying and alleviating bottleneck resources.
This approach, however, is not effective in irregular applications
because of data-dependent branches and memory accesses. Hence,
tailored approaches are developed for two popular domains of
irregular applications: graph algorithms and sparse matrix
primitives. Performance modeling for GPUs is carried out by
abstract kernel emulation along with latency/gap modeling of
resources. Sensitivity analysis with respect to resource
latency/gap parameters is used to predict the bottleneck resource
for a given kernel's execution. The utility of the bottleneck
analysis is demonstrated in two contexts: i) Enhancing the
OpenTuner auto-tuner with the new bottleneck-driven optimization
strategy. Effectiveness is demonstrated by experimental results on
all kernels from the Rodinia suite and GPU tensor contraction
kernels from the NWChem computational chemistry suite. ii) Manual
code optimization. Two case studies illustrate the use of a
bottleneck analysis to iteratively improve the performance of code
from state-of-the-art DSL code generators. However, the above
approach is ineffective for irregular applications such as graph
algorithms and sparse linear systems. Graph algorithms are used in
various applications, and high-level GPU graph processing
frameworks are an attractive alternative for achieving both high
productivity and high performance. This dissertation develops an
approach to graph processing on GPUs that seeks to overcome some of
the performance limitations of existing frameworks. It uses
multiple data representations and execution strategies for dense
versus sparse vertex frontiers, depending on the fraction of active
graph vertices. Experimental results demonstrate performance
improvement over current state-of-the-art GPU graph processing
frameworks for many benchmark programs and data sets. Sparse matrix
primitives such as sparse matrix-vector multiplication (SpMV),
sparse matrix multi-vector multiplication (SpMM), and sampled
dense-dense matrix multiplication (SDDMM) are key kernels for
scientific computing as well as data science and machine learning.
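
For readers unfamiliar with the three primitives named above, the following
is a minimal sequential sketch of what each one computes. It is not the
dissertation's GPU implementation. It assumes the sparse matrix is stored in
CSR form (row_ptr, col_idx, vals); the function names and the row-major
list-of-lists layout for dense operands are illustrative choices, not taken
from the source.

```python
def spmv(row_ptr, col_idx, vals, x):
    """SpMV: y = A @ x, with A in CSR form and x a dense vector."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmm(row_ptr, col_idx, vals, X):
    """SpMM: Y = A @ X, with A in CSR form and X a dense matrix (list of rows)."""
    n_cols = len(X[0])
    Y = [[0.0] * n_cols for _ in range(len(row_ptr) - 1)]
    for i in range(len(Y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            x_row = X[col_idx[k]]
            # Each nonzero scales a whole row of X, unlike SpMV's single scalar.
            for j in range(n_cols):
                Y[i][j] += vals[k] * x_row[j]
    return Y


def sddmm(row_ptr, col_idx, vals, P, Q):
    """SDDMM: out = S * (P @ Q^T), elementwise, evaluated only at S's nonzeros.

    P has one row per row of S, Q has one row per column of S.
    The result reuses S's sparsity pattern, so only new values are returned.
    """
    out = [0.0] * len(vals)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            out[k] = vals[k] * sum(p * q for p, q in zip(P[i], Q[j]))
    return out


# Example: A = [[1, 0, 2],
#               [0, 3, 0]]
row_ptr, col_idx, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
y = spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
Y = spmm(row_ptr, col_idx, vals, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
s = sddmm(row_ptr, col_idx, vals,
          [[1.0, 2.0], [3.0, 4.0]],
          [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In this sketch, SpMM touches a full row of the dense operand for every
nonzero, whereas SpMV touches a single element. That difference in dense-data
traffic per nonzero is one way to see why the two kernels can call for
different data-movement strategies.
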
Many recent research studies have focused on GPU implementations
of the SpMV kernel, but the SpMM and SDDMM kernels
have received much less attention. This dissertation presents
in-depth analyses to contrast SpMV and SpMM, and develops new
sparse-matrix representations and computation approaches suited to
achieving high data-movement efficiency and effective GPU
parallelization of SpMM. It…
Advisors/Committee Members: Sadayappan, Ponnuswamy (Advisor).
Subjects/Keywords: Computer Science; GPU; performance; modeling; optimization; SpMV; SpMM; SDDMM; sparse matrix; graph processing; tiling; multicore; manycore; matrix multiplication; tensor; stencil; SIMD; data locality; CSR; parallel; load balance; shared memory; graph analytics
APA (6th Edition):
Hong, C. (2019). Code Optimization on GPUs. (Doctoral Dissertation). The Ohio State University. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533
Chicago Manual of Style (16th Edition):
Hong, Changwan. “Code Optimization on GPUs.” 2019. Doctoral Dissertation, The Ohio State University. Accessed November 17, 2019.
MLA Handbook (7th Edition):
Hong, Changwan. “Code Optimization on GPUs.” 2019. Web. 17 Nov 2019.
Hong C. Code Optimization on GPUs. [Internet] [Doctoral dissertation]. The Ohio State University; 2019. [cited 2019 Nov 17].
Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Council of Science Editors:
Hong C. Code Optimization on GPUs. [Doctoral Dissertation]. The Ohio State University; 2019. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533