Speaker
Description
Many GPU-accelerated applications rely on the cuFFT library for fast and efficient Fourier transform implementations. For certain algorithms, however, cuFFT can become a performance-limiting factor because its API is strictly host-side. Library functions cannot be called from code running on the GPU, so custom operations that must be performed before and after the transform can lead to unnecessary kernel launches and host-device communication. One possible solution is NVIDIA's cuFFT Device Extensions library (cuFFTDx), which provides FFT implementations that can be integrated directly into GPU kernel code. This talk presents the main use cases, internal workings, capabilities, and potential drawbacks of cuFFTDx, along with the C++ metaprogramming techniques behind its API. The practical use of the library is demonstrated with a custom implementation of Welch's modified periodogram algorithm for spectral density estimation, highlighting the key differences between the standard cuFFT and the cuFFTDx approaches. Finally, the performance of the two implementations is compared based on runtime measurements.
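To make the "custom operations before and after the transform" concrete, the following is a minimal CPU reference sketch of Welch's method in Python with NumPy. It is not the talk's implementation and uses neither cuFFT nor cuFFTDx. The function name `welch_psd`, its parameters, the Hann window, and the density scaling are all assumptions chosen for illustration. It only shows where windowing (pre-processing) and magnitude-squared averaging (post-processing) sit around each FFT.

```python
import numpy as np


def welch_psd(x, fs=1.0, nperseg=256, noverlap=128):
    """Illustrative one-sided PSD estimate via Welch's method (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if not 0 <= noverlap < nperseg:
        raise ValueError("noverlap must satisfy 0 <= noverlap < nperseg")
    if x.size < nperseg:
        raise ValueError("signal is shorter than one segment")

    step = nperseg - noverlap
    window = np.hanning(nperseg)                 # assumed window choice
    scale = 1.0 / (fs * np.sum(window ** 2))     # density scaling
    n_segments = (x.size - nperseg) // step + 1

    psd = np.zeros(nperseg // 2 + 1)
    for k in range(n_segments):
        segment = x[k * step : k * step + nperseg]
        windowed = segment * window              # step 1: pre-processing
        spectrum = np.fft.rfft(windowed)         # step 2: the FFT itself
        psd += np.abs(spectrum) ** 2             # step 3: post-processing

    psd *= scale / n_segments                    # average the periodograms

    # Fold negative frequencies into the one-sided spectrum:
    # every bin is doubled except DC and, for even nperseg, Nyquist.
    if nperseg % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0

    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(8192) / fs
    signal = np.sin(2.0 * np.pi * 50.0 * t)      # 50 Hz test tone
    freqs, psd = welch_psd(signal, fs=fs)
    print("peak frequency (Hz):", freqs[np.argmax(psd)])
```

With a host-side FFT library, steps 1 and 3 would typically run as separate kernels around the library call. A device-side FFT is what would allow all three steps to be fused into a single kernel, which is the motivation stated above.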