Speaker
Description
In this talk, I will give a technical overview of the GPU partition of the Komondor supercomputer (No. 300 on the TOP500 list). First, some job statistics will be presented; then we will describe the system and node architecture in detail: the CPU and GPU architectures, the intra- and inter-node interconnects, and their key properties and performance implications. We will then introduce the software environment and the module architecture, including its properties and configurations for multi-GPU development and execution. We will overview the different MPI implementations available on the system and their behaviour in multi-GPU programs. Finally, the GPU job scheduling mechanism used by the SLURM scheduler will be discussed, with examples of GPU job submission scripts. The key theme throughout is computational performance and scalability, which runs through the hardware, software and development sections of the talk.
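As a rough illustration of the kind of GPU job submission script the talk will cover, a minimal SLURM sketch might look like the following. The partition name, GRES counts, module name and application binary are assumptions for illustration, not Komondor's actual configuration:

```shell
#!/bin/bash
#SBATCH --job-name=multi-gpu-test
#SBATCH --partition=gpu          # hypothetical partition name; consult the site documentation
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4      # one MPI rank per GPU
#SBATCH --gres=gpu:4             # request 4 GPUs on each node
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00

# Load a CUDA-aware MPI build (module name is illustrative)
module load openmpi/cuda

# srun launches one rank per GPU across both nodes
srun ./my_multi_gpu_app
```

Site-specific details such as the exact `--gres` syntax, GPU binding options and available MPI modules are exactly the configuration points the talk addresses.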