In HPC, a common approach is to decompose a large equation system into a batch
of small equation systems. Tridiagonal and pentadiagonal systems are typical
examples; they frequently arise in finite-difference methods for solving
multi-dimensional PDEs in various applications. In computational fluid
dynamics (CFD), for instance, they appear in flow solvers based on implicit
high-order finite-difference schemes, such as the Alternating Direction
Implicit (ADI) method. In this talk, I will present our initial results from
developing a scalable batched pentadiagonal solver library for large CPU and
GPU clusters, targeting ADI applications. We use hybrid Thomas-Jacobi and
Thomas-PCR algorithms: the Thomas algorithm is applied locally to build a
reduced system, which is then solved by a distributed solver. We compare the
cost of the local neighbor communication of the approximate solver (Jacobi)
with that of the exact solver (parallel cyclic reduction, PCR), whose
point-to-point communication spans increasing distances. Finally, we show the
scaling behavior of these algorithms on LUMI for up to 1024 GPUs in one
direction.
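To make the "batch of small systems" idea concrete, here is a minimal single-process sketch in Python. The talk does not name an implementation language, so Python is an assumption here. Each pentadiagonal system is solved by banded Gaussian elimination without pivoting, which is the Thomas algorithm generalized to two sub- and two super-diagonals. The batch is simply a loop over independent systems, such as the grid lines of one ADI sweep. The diagonal layout (`a` to `e`) and all function names are hypothetical. Because there is no pivoting, the sketch also assumes diagonally dominant systems.

```python
from typing import List, Sequence


def solve_pentadiagonal(a, b, c, d, e, rhs) -> List[float]:
    """Banded Gaussian elimination without pivoting (Thomas-style sweep, bandwidth 2).

    Hypothetical layout: row i reads
      a[i]*x[i-2] + b[i]*x[i-1] + c[i]*x[i] + d[i]*x[i+1] + e[i]*x[i+2] = rhs[i];
    entries outside the matrix (a[0], a[1], b[0], d[-1], e[-2], e[-1]) are ignored.
    """
    n = len(c)
    # band[i][j - i + 2] stores A[i][j] for |i - j| <= 2; copies keep the inputs untouched.
    band = [[a[i], b[i], c[i], d[i], e[i]] for i in range(n)]
    y = list(rhs)
    # Forward sweep: eliminate the two sub-diagonal entries below each pivot.
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = band[i][k - i + 2] / band[k][2]
            for j in range(k, min(k + 3, n)):
                band[i][j - i + 2] -= f * band[k][j - k + 2]
            y[i] -= f * y[k]
    # Backward sweep on the remaining upper band.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i]
        for j in range(i + 1, min(i + 3, n)):
            s -= band[i][j - i + 2] * x[j]
        x[i] = s / band[i][2]
    return x


def solve_batch(systems: Sequence[tuple]) -> List[List[float]]:
    """Solve independent systems one after another, e.g. one per grid line of an ADI sweep."""
    return [solve_pentadiagonal(*system) for system in systems]


def pentadiagonal_matvec(a, b, c, d, e, x) -> List[float]:
    """Multiply the pentadiagonal matrix by x (used to build test right-hand sides)."""
    n = len(x)
    out = []
    for i in range(n):
        s = c[i] * x[i]
        if i >= 2:
            s += a[i] * x[i - 2]
        if i >= 1:
            s += b[i] * x[i - 1]
        if i + 1 < n:
            s += d[i] * x[i + 1]
        if i + 2 < n:
            s += e[i] * x[i + 2]
        out.append(s)
    return out


if __name__ == "__main__":
    n = 8
    x_true = [float(i + 1) for i in range(n)]
    batch = []
    for shift in range(3):  # three diagonally dominant test systems
        a, b, e = [1.0] * n, [-2.0] * n, [1.0] * n
        c, d = [10.0 + shift] * n, [-2.0] * n
        rhs = pentadiagonal_matvec(a, b, c, d, e, x_true)
        batch.append((a, b, c, d, e, rhs))
    for x in solve_batch(batch):
        print(max(abs(u - v) for u, v in zip(x, x_true)))  # maximum error against x_true
```

On a GPU, the loop over the batch would typically become the parallel dimension. The per-system sweep stays sequential, which is why splitting long lines across ranks needs the hybrid approach.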
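For brevity, the hybrid idea can be sketched for the simpler tridiagonal case. This is an assumption: the library itself targets pentadiagonal systems, and its details are not shown in the abstract. Each of `num_parts` hypothetical ranks owns a contiguous block of at least three rows. On that block it runs a modified Thomas sweep that expresses every interior unknown through the block's first and last unknowns. The first and last rows of all blocks then form a reduced tridiagonal system of size `2 * num_parts`. Here that system is solved serially with the plain Thomas algorithm, as a stand-in for the distributed Jacobi or PCR step. Each block then recovers its interior unknowns locally. All names are illustrative, and no pivoting is done.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm: a = sub-, b = main, c = super-diagonal (a[0], c[-1] unused)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def local_reduce(a, b, c, d):
    """Modified Thomas sweep on one block of m >= 3 rows (hypothetical helper).

    Afterwards interior row k reads  A[k]*x_first + x[k] + C[k]*x_last = D[k];
    row 0 reads  A[0]*x_left + x_first + C[0]*x_last = D[0]  and row m-1 reads
    A[m-1]*x_first + x_last + C[m-1]*x_right = D[m-1], where x_left and x_right
    are the neighboring blocks' last and first unknowns.
    """
    m = len(b)
    A, C, D = [0.0] * m, [0.0] * m, [0.0] * m
    for k in range(2):  # rows 0 and 1 are only normalized
        A[k], C[k], D[k] = a[k] / b[k], c[k] / b[k], d[k] / b[k]
    for k in range(2, m):  # forward: replace x[k-1] by x_first
        r = 1.0 / (b[k] - a[k] * C[k - 1])
        A[k] = -r * a[k] * A[k - 1]
        C[k] = r * c[k]
        D[k] = r * (d[k] - a[k] * D[k - 1])
    for k in range(m - 3, 0, -1):  # backward: replace x[k+1] by x_last
        D[k] -= C[k] * D[k + 1]
        A[k] -= C[k] * A[k + 1]
        C[k] = -C[k] * C[k + 1]
    r = 1.0 / (1.0 - C[0] * A[1])  # row 0: eliminate x[1]
    D[0] = r * (D[0] - C[0] * D[1])
    A[0] = r * A[0]
    C[0] = -r * C[0] * C[1]
    return A, C, D


def hybrid_solve(a, b, c, d, num_parts):
    """Thomas locally per block, then a reduced system with 2 unknowns per block."""
    n = len(b)
    if n % num_parts or n // num_parts < 3:
        raise ValueError("sketch assumes equal blocks of at least 3 rows")
    m = n // num_parts
    blocks = [local_reduce(a[p * m:(p + 1) * m], b[p * m:(p + 1) * m],
                           c[p * m:(p + 1) * m], d[p * m:(p + 1) * m])
              for p in range(num_parts)]
    # Reduced tridiagonal system; unknowns ordered first_0, last_0, first_1, last_1, ...
    ra, rb, rc, rd = [], [], [], []
    for A, C, D in blocks:
        ra += [A[0], A[-1]]
        rb += [1.0, 1.0]
        rc += [C[0], C[-1]]
        rd += [D[0], D[-1]]
    xr = thomas(ra, rb, rc, rd)  # stand-in for the distributed Jacobi or PCR solve
    x = []
    for p, (A, C, D) in enumerate(blocks):
        first, last = xr[2 * p], xr[2 * p + 1]
        x.append(first)
        x += [D[k] - A[k] * first - C[k] * last for k in range(1, m - 1)]
        x.append(last)
    return x


if __name__ == "__main__":
    n = 12
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    d = [float(i % 5) for i in range(n)]
    reference = thomas(a, b, c, d)
    for parts in (1, 2, 3, 4):
        x = hybrid_solve(a, b, c, d, parts)
        print(parts, max(abs(u - v) for u, v in zip(x, reference)))
```

In this structure, only the reduced system couples the blocks. The choice of solver for it (Jacobi or PCR) is therefore what determines the inter-rank communication.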
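The contrast between the two communication patterns can be illustrated on a single process. Treat each row of a tridiagonal system as if it lived on its own (hypothetical) rank, and record which rows each step reads. Jacobi iteration only reads the immediate neighbors `i - 1` and `i + 1`. However, it needs many sweeps and is only guaranteed to converge under conditions such as diagonal dominance. Parallel cyclic reduction is exact after `ceil(log2 n)` steps, but at step `s` (counting from 0) row `i` reads rows `i - 2**s` and `i + 2**s`, so the partner distance doubles each step. The one-row-per-rank mapping and all names are assumptions for illustration. The talk applies these solvers to the reduced system, not to a generic system as done here.

```python
def pcr(a, b, c, d):
    """Parallel cyclic reduction; returns the solution and the partner distance per step."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[-1] = 0.0, 0.0  # no coupling outside the matrix
    distances = []
    stride = 1
    while stride < n:
        distances.append(stride)  # row i reads rows i - stride and i + stride
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            if lo >= 0:  # eliminate x[lo] using row lo
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:  # eliminate x[hi] using row hi
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # After the last step every row is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)], distances


def jacobi(a, b, c, d, tol=1e-12, max_iter=10_000):
    """Jacobi iteration; each sweep only needs the neighbors i - 1 and i + 1."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        new = []
        for i in range(n):
            s = d[i]
            if i > 0:
                s -= a[i] * x[i - 1]
            if i < n - 1:
                s -= c[i] * x[i + 1]
            new.append(s / b[i])
        change = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if change < tol:
            return x, it
    return x, max_iter


if __name__ == "__main__":
    n = 16
    a, b, c = [-1.0] * n, [4.0] * n, [-1.0] * n
    x_true = [float(i) for i in range(n)]
    d = [b[i] * x_true[i]
         + (a[i] * x_true[i - 1] if i > 0 else 0.0)
         + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
    x_pcr, distances = pcr(a, b, c, d)
    x_jac, sweeps = jacobi(a, b, c, d)
    print("PCR partner distances per step:", distances)  # [1, 2, 4, 8] for n = 16
    print("Jacobi sweeps (partner distance 1):", sweeps)
```

In a distributed setting, each PCR step would be a point-to-point exchange with increasingly distant ranks. Each Jacobi sweep would instead be a neighbor exchange, repeated until the chosen tolerance is met. This is the trade-off between an exact solver and an approximate one.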