Description
The SYCL specification allows for multiple implementation strategies, in particular SSCP (single-source, single compiler pass) and SMCP (single-source, multiple compiler passes). The default compiler of the AdaptiveCpp SYCL implementation is an SSCP JIT compiler, which has previously been shown to deliver substantial speedups for certain applications while also reducing compilation times. However, systematic performance evaluations of this compiler have focused mostly on small or medium-sized applications. Additionally, to our knowledge, the impact of supporting both SSCP and SMCP compilers in large production code bases has not yet been thoroughly studied.
In this work, we explore the applicability of the AdaptiveCpp JIT compiler to a highly optimized production code base: GROMACS, a widely used molecular dynamics software package that currently relies on SYCL and the AdaptiveCpp SMCP compiler to target AMD GPUs. We evaluate the ported application on MI210, MI300A, and MI300X AMD GPUs across a variety of input problems covering common simulation scenarios. We show that in high-atom-count workload configurations, the SSCP JIT compiler outperforms the AdaptiveCpp SMCP compiler currently in use by up to 10–25%, and that it increases the peak simulation throughput of each tested GPU, measured in simulated atoms per second, by up to 10%. These findings confirm that the performance advantages of the SSCP JIT compiler also carry over to production applications such as GROMACS.