30–31 May 2024
Wigner Datacenter - HUN-REN Wigner Research Centre for Physics
Europe/Budapest timezone

Heterogeneous CPU-GPU differential equation solver

31 May 2024, 15:45
25m
Wigner Datacenter - HUN-REN Wigner Research Centre for Physics

HUN-REN Wigner RCP 1121 Budapest, Konkoly-Thege Miklós rd 29-33, Hungary

Speaker

Dániel Nagy (Budapest University of Technology and Economics)

Description

GPUs are increasingly common in scientific high-performance computing; however, their benefits are not uniform across all areas of scientific computing. In certain fields, such as sonochemistry, where delay differential equations can arise, large amounts of data must be accessed based on the current state of the simulation in an unaligned and uncoalesced manner, which usually limits the applicability of GPUs. The goal of this work is to leverage the advantages of both GPU and CPU architectures by overlapping parallel and serial computations, as well as memory copy and write operations, when solving an ensemble of ordinary differential equations (ODEs). This approach could make certain computing tasks faster and more efficient.

The general idea of a heterogeneous CPU-GPU differential equation solver is to deploy four asynchronous CUDA streams and partition the workload into four equal parts, with each stream managing one quarter of the workload. Each stream consists of four stages. In the first stage, data is transferred from the CPU to the GPU; in the second stage, a kernel is executed on the GPU to compute a single Runge-Kutta step. In the third stage, data is transferred from the GPU back to the CPU, and in the fourth stage, serial calculations are carried out on the CPU. These serial calculations could involve predicting, accepting, or rejecting the time step, or computing the delayed terms in a delay differential equation. Ideally, the four stages of different streams can be overlapped to optimize performance.
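The benefit of overlapping can be illustrated with a toy timing model. The following Python sketch is not the authors' implementation and uses no CUDA; it only assumes that each of the four stages (host-to-device copy, kernel, device-to-host copy, CPU work) is served by its own resource that handles one workload part at a time, and that every part passes through the stages in order. The function names and the stage durations are hypothetical and chosen for illustration only.

```python
def pipelined_makespan(stage_times, n_chunks=4):
    """Total time for n_chunks workload parts when stages may overlap.

    stage_times: durations of the stages for ONE part, in order
    (assumed here: H2D copy, kernel, D2H copy, CPU work).
    Each stage is modelled as a separate resource that can serve
    only one part at a time (a simplifying assumption).
    """
    # resource_free[s] is the time at which the resource of stage s is free
    resource_free = [0.0] * len(stage_times)
    for _ in range(n_chunks):
        t = 0.0  # time at which the previous stage of this part finished
        for s, duration in enumerate(stage_times):
            start = max(t, resource_free[s])  # wait for data and for the resource
            t = start + duration
            resource_free[s] = t
    return resource_free[-1]


def serial_makespan(stage_times, n_chunks=4):
    """Total time when nothing overlaps: every stage of every part runs in turn."""
    return n_chunks * sum(stage_times)


if __name__ == "__main__":
    balanced = [1, 1, 1, 1]       # hypothetical, equal stage durations
    kernel_bound = [1, 4, 1, 1]   # hypothetical, kernel dominates
    for name, times in (("balanced", balanced), ("kernel-bound", kernel_bound)):
        print(name, pipelined_makespan(times), serial_makespan(times))
```

In this model, four parts with equal stage durations finish in 7 time units instead of 16, while a dominant kernel stage (durations 1, 4, 1, 1) gives 19 instead of 28, because the other stages spend most of their time waiting on the slowest one. The model ignores the dependency between consecutive time steps and any launch or synchronization overhead.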

Parameter sensitivity studies were conducted on a first-order Bernoulli-type ODE and the Duffing equation (a second-order ODE) using both a homogeneous GPU solver and the heterogeneous CPU-GPU approach described above. GPU profiling was employed to show that an overlap of the four stages is possible. The profiling demonstrates that, while overlapping CPU, GPU, and memory copy operations using CUDA streams is feasible, achieving an ideal overlap is only possible under specific conditions. For the simple problems investigated, the pure GPU approach proves to be the most effective solution. However, in the future, adaptive delay differential equation solvers may benefit from employing such a heterogeneous approach.
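To make the ensemble setting concrete, the following Python sketch advances several Duffing oscillators, each with its own parameter set, using a classical fourth-order Runge-Kutta step. It is a minimal CPU-only illustration, not the solver used in the study: the Duffing equation is assumed in the common form x'' + delta*x' + alpha*x + beta*x^3 = gamma*cos(omega*t), the step size is fixed, and all function names and parameter values are hypothetical.

```python
import math


def duffing_rhs(t, state, p):
    """Right-hand side of the Duffing equation written as a first-order system."""
    x, v = state
    delta, alpha, beta, gamma, omega = p
    return (v, -delta * v - alpha * x - beta * x**3 + gamma * math.cos(omega * t))


def rk4_step(rhs, t, state, h, p):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(t, state, p)
    k2 = rhs(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)), p)
    k3 = rhs(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)), p)
    k4 = rhs(t + h, tuple(s + h * k for s, k in zip(state, k3)), p)
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_ensemble(params, state0, t_end, n_steps):
    """Integrate one system per parameter set from t = 0 to t_end."""
    h = t_end / n_steps
    states = [state0] * len(params)
    for i in range(n_steps):
        t = i * h
        # Every ensemble member is independent; on a GPU this loop over
        # members would correspond to the parallel kernel.
        states = [rk4_step(duffing_rhs, t, s, h, p) for s, p in zip(states, params)]
    return states


if __name__ == "__main__":
    # Hypothetical sweep over the forcing amplitude gamma
    sweep = [(0.2, 1.0, 1.0, g, 1.0) for g in (0.1, 0.3, 0.5)]
    for p, (x, v) in zip(sweep, integrate_ensemble(sweep, (1.0, 0.0), 10.0, 2000)):
        print(p[3], x, v)
```

Setting delta, beta, and gamma to zero with alpha equal to one reduces the system to a harmonic oscillator, which gives a simple sanity check: after one period of 2π the state should return close to its initial value.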

Primary author

Dániel Nagy (Budapest University of Technology and Economics)

Co-author

Dr Ferenc Hegedűs (Budapest University of Technology and Economics, Department of Hydrodynamic Systems)
