Speaker
Nara Prasetya
(StreamHPC)
Description
The performance of a GPU kernel is influenced by many factors, some easier to change than others. In some cases, however, the resulting performance is beholden to the compiler. In this presentation we will go over a set of kernel optimization techniques that go beyond profiling and reducing memory bottlenecks, focusing instead on analyzing AMDGCN assembly, reducing register pressure to improve occupancy, and manually recovering performance lost to compiler changes.
Author
Nara Prasetya
(StreamHPC)