Speaker
Description
Small square grids, represented as bit matrices that mark occupied sites together with a neighborhood definition (von Neumann, Moore, or hexagonal), arise in various Monte Carlo simulations and connection games. Such applications often need to test a very large number of these grids at high speed for a connection between opposite edges of the grid under the given neighborhood. Because CPUs have relatively few registers, the bit matrices are usually stored in memory, which reduces speed. In contrast, a CUDA GPU has a large number of processors, each with many more registers, so it offers much higher theoretical speed if the memory bottleneck can be avoided.

In our approach, the bit matrices (e.g., 32x32 bits) are stored and processed entirely in registers. This eliminates the use of shared, local, or global memory except for the initial loading of data and the storing of results. High processing speed is thus achieved, with all shader cores working while thread divergence is kept low.
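To make the edge-to-edge test concrete, the following is a minimal sketch of one possible bit-parallel connection check for a 32x32 grid under the von Neumann neighborhood. It is not the method presented in the talk. The names `HOST_DEVICE`, `GRID_N`, `fill_row`, and `spans_top_to_bottom` are hypothetical, and the choice of testing top-to-bottom connectivity with repeated downward and upward sweeps is an assumption made for illustration. The code is plain C++ so that it can be checked on a CPU; the `HOST_DEVICE` macro lets the same functions be compiled by nvcc for use inside a kernel.

```cpp
#include <cstdint>

// Assumption: compile as host code with a C++ compiler, or as
// host/device code with nvcc. HOST_DEVICE is a hypothetical helper macro.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

constexpr int GRID_N = 32;  // 32 rows, each row is one 32-bit word

// Spread the seed bits left and right through contiguous occupied
// bits of a single row until nothing changes.
HOST_DEVICE inline std::uint32_t fill_row(std::uint32_t seed, std::uint32_t occ) {
    std::uint32_t cur = seed & occ;
    std::uint32_t prev;
    do {
        prev = cur;
        cur |= ((cur << 1) | (cur >> 1)) & occ;
    } while (cur != prev);
    return cur;
}

// Returns true if occupied sites form a von Neumann-connected path
// from row 0 to row GRID_N - 1.
HOST_DEVICE inline bool spans_top_to_bottom(const std::uint32_t (&occ)[GRID_N]) {
    std::uint32_t reached[GRID_N];
    reached[0] = occ[0];  // every occupied site in the top row is a seed
    for (int r = 1; r < GRID_N; ++r) reached[r] = 0;

    bool changed = true;
    while (changed) {
        changed = false;
        // Downward sweep: pull reachability from the row above.
        for (int r = 1; r < GRID_N; ++r) {
            std::uint32_t next = fill_row(reached[r] | reached[r - 1], occ[r]);
            changed |= (next != reached[r]);
            reached[r] = next;
        }
        if (reached[GRID_N - 1] != 0) return true;
        // Upward sweep: pull reachability from the row below, which
        // handles paths that turn back toward the top edge.
        for (int r = GRID_N - 2; r >= 0; --r) {
            std::uint32_t next = fill_row(reached[r] | reached[r + 1], occ[r]);
            changed |= (next != reached[r]);
            reached[r] = next;
        }
    }
    return false;  // fixed point reached without touching the bottom row
}
```

In a CUDA kernel, each thread would typically load its 32 row words from global memory once, call `spans_top_to_bottom`, and write a single result. Whether the compiler actually keeps the `occ` and `reached` arrays in registers depends on the loops being fully unrolled so that every array index is known at compile time, and on register pressure; otherwise the arrays fall back to local memory.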