GPU Day 2026

Name: GPU Day 2026
Start: 2026-05-28T08:20:00+02:00
End: 2026-05-29T23:50:00+02:00
Location: HUN-REN Centre

28 May 2026, 08:20 → 29 May 2026, 23:50 Europe/Budapest

1054 Budapest Alkotmány utca 29.

Description

16th GPU Day - Massive parallel computing for science and industrial application

The 16th GPU Day will be organized by the Wigner Scientific Computation Laboratory, HUN-REN Wigner RCP, on May 28-29, 2026 at the HUN-REN Centre (1054 Budapest, Alkotmány utca 29).

The GPU Day is an annual international conference series that has been dedicated to massively parallel technologies in scientific and industrial applications for more than a decade. Its goal is to bring together researchers from academia, developers from industry, and interested students to exchange experiences and learn about future massively parallel technologies. The event provides a unique opportunity to exchange knowledge and expertise on GPU (Graphics Processing Unit), FPGA (Field-Programmable Gate Array), and quantum simulations. As in previous years, we are expecting 80+ participants in person.

For the earlier events see: 2025, 2024, 2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014

Participation is free for students, members of academic institutions, research centers, and universities.

The registration fee is 300 EUR or 120 000 HUF (including VAT) for non-academic participants - payment is via bank transfer.

Sponsorship of the event is welcome - please contact the organizers about opportunities and for further information.

The conference will be held in person, and the number of on-site places is limited. We encourage our former, current and future partners to contribute to the conference; all contributions are welcome.

Keynote speakers

Roland Jakab (Hungarian Research Network)
Örs Legeza (HUN-REN Wigner Research Centre for Physics)

Important deadlines

Talk submission deadline: 2026 May 22

Sponsors

More information is available on gpuday.com

Organizers:
Gergely Gábor Barnaföldi
Gábor Bíró
Balázs Kacskovics

Szabolcs Molnár

Support

biro.gabor@wigner.hun-ren.hu

barnafoldi.gergely@wigner.hun-ren.hu

kacskovics.balazs@wigner.hun-ren.hu

Registration

GPU Day 2026 Registration

Participants

81

Thursday 28 May
- Opening: Opening Talk and Welcome by the Director
  
  Convener: Gergely Barnafoldi (HUN-REN Wigner Research Centre for Physics)
  - 1
    
    Opening
    
    Speakers: Dr Gergely Barnafoldi (HUN-REN Wigner Research Centre for Physics), Roland Jakab (HUN-REN)
    
    GGBarnafoldi_GPUDay_WSCLAB_2026.pdf
    
    GPU-Day_JR-presentation_0527.pdf
- Session I
  - 2
    
    Future computing: a journey from academic aspects towards industrial perspectives
    
    In light of the emergent evolution in quantum technology, key industrial players allocate significant parts of their budget and resources to identifying real-world problems where quantum advantage, i.e. an exponential increase in computational capacity, is expected to appear. The tense competition between quantum computing and simulations on classical hardware over the past decades has been further accelerated by the severe need to develop cost-effective technologies to manipulate matter on the atomic and nano scale and to control processes that could lead to new products that improve everybody’s life. Currently, hardware accelerated by Graphics Processing Units (GPUs) provides unprecedented computational power, leading to paradigmatic changes in various business cases. In this contribution, we will give an overview of a class of quantum-inspired algorithms which increase the efficiency of classical computing algorithms without requiring any quantum hardware, and which at the same time are capable of fully utilizing state-of-the-art classical hardware available on high-performance computing infrastructures. We will showcase recent results achieved in collaboration with NVIDIA, IBM and the Google startup SandboxAQ, also addressing important questions like scalability, energy demand and sustainability. Key aspects of transferring academic research to the industrial world as part of productization will also be addressed. Finally, we argue that for such a Blue Ocean Strategy the required technology is around us, thus raising related R&D to a professional stage would revolutionize future technologies in image recognition, data processing, and AI among many others, and open entirely new development directions in the pharma industry, materials science and beyond.
    
    Speaker: Örs Legeza (Wigner FK)
    
    gpuday2026_v01.pdf
  - 3
    
    Evaluating the AdaptiveCpp Single-Pass (SSCP) SYCL compiler for GROMACS on Modern AMD Accelerators
    
    The SYCL specification allows for multiple implementation strategies, in particular SSCP (single-source, single compiler pass) and SMCP (single-source, multiple compiler passes). The default compiler of the AdaptiveCpp SYCL implementation is an SSCP JIT compiler, which has previously been shown to deliver substantial speedups for certain applications while also reducing compilation times. However, systematic performance evaluations of that compiler have focused mostly on small or medium-sized applications. Additionally, to our knowledge, the impact of supporting both SSCP and SMCP compilers in large production code bases has not yet been thoroughly studied.
    
    In this work, we explore the applicability of the AdaptiveCpp JIT compiler to a highly-optimized, production code base: GROMACS – a widely used molecular dynamics software package that currently relies on SYCL and the AdaptiveCpp SMCP compiler to target AMD GPUs. We evaluate the ported application across a variety of input problems covering common simulation scenarios on MI210, MI300A, and MI300X AMD GPUs. We show that the SSCP JIT compiler outperforms the currently used SMCP AdaptiveCpp compiler in high-atom-count workload configurations by up to 10-25% and increases the peak simulation throughput of each tested GPU by up to 10%, measured in terms of simulated atoms per second. These findings confirm that the performance advantages of the SSCP JIT compiler also translate to production applications like GROMACS.
    
    Speaker: Bálint Soproni (StreamHPC)
    
    gpudays26_balint_soproni.pdf
  - 4
    
    Manual AMDGCN Assembly Analysis & Optimization
    
    The performance of a GPU kernel is influenced by many factors, some of which are easier to change than others. In some cases, however, the resulting performance is beholden to the compiler. In this presentation we will go over a set of kernel optimization techniques that go beyond profiling and reducing memory bottlenecks and instead focus on analyzing AMDGCN assembly, reducing register pressure to improve occupancy, and manually recovering performance lost to compiler changes.
    
    Speaker: Nara Prasetya (StreamHPC)
    
    hip_asm_optimization_2026.pdf
- 10:40
  
  Coffee Break
- Session II
  - 5
    
    AI from a High-Performance Computing Perspective: CPUs, GPUs, and the Quantum Frontier
    
    Speaker: Attila Csaba Marosi (NTT DATA)
    
    NTT_DATA_Presentation_v3.pdf
    
    NTT_DATA_Presentation_v3.pptx
  - 6
    
    Nonequilibrium critical behavior in brain simulations
    
    The critical brain hypothesis has been confirmed experimentally many times since the pioneering electrode experiments. Power-law (PL) distributed neuronal avalanches have been observed in neuronal recordings, in blood-oxygen-level-dependent signals, in voltage imaging, in calcium imaging, in MEG and EEG recordings, and in neuronal long-range temporal correlations, among others. Whole-brain simulations based on the largest connectomes have been performed by various methods [1].
    Here I show the results of solving the Shinomoto-Kuramoto model on networks of the fruit fly and two large human connectomes, using an efficient CUDA GPU solver, which provides numerical evidence of near-critical scaling in modules of these brains [2]. In particular, our Hurst and beta exponents agree with those of recent fMRI data [3].
    This has an important implication for AI systems, where operating near criticality may help optimize information-processing performance.
    Furthermore, I show how the asymmetry in the inter-neuronal connections drives the system away from equilibrium by violating the fluctuation-dissipation relation [4].
    
    [1] Géza Ódor, Michael T. Gastner, Jeffrey Kelling and Gustavo Deco, Modelling on the very large-scale connectome, J. Phys. Complex. 2 (2021) 045002.
    [2] Géza Ódor, István Papp, Shengfeng Deng and Jeffrey Kelling, Synchronization transitions on connectome graphs with external force, Front. Phys. 11 (2023) 1150246.
    [3] Ochab, et al., Task-dependent fractal patterns of information processing in working memory, Sci Rep 12, 17866 (2022).
    [4] Géza Ódor, Istvan Papp and Gustavo Deco, Fluctuation-dissipation of the Kuramoto model on fruit-fly connectomes, arXiv:2503.20708
    
    Speaker: Geza Odor (HUN-REN Centre for Energy Research)
    
    gpunap26.pdf
  - 7
    
    Prototypes of HVDC connections in a Kuramoto-like power-grid system
    
    Power grids are large-scale engineered systems that are indispensable to modern society, yet they remain inherently vulnerable to disturbances. Ongoing transitions in the energy sector—particularly the increasing penetration of renewable sources and inverter-based technologies—introduce new challenges, including reduced system inertia and faster propagation of fluctuations. As many emerging technologies rely on Direct Current (DC) components, understanding their impact on grid stability has become increasingly important.
    
    Recent studies have shown that even without explicitly considering HVDC mechanisms, large interregional power flows can destabilize transmission grids, potentially leading to segmentation and system-wide failures [1]. This is further underscored by the 2025 Iberian blackout, where failures in HVDC interconnections between France and Spain were identified as contributing factors and as the elements that were overloaded first [2].
    
    In our work, we aim to investigate the effect of including HVDC lines in a power-grid system and extend the simple second-order Kuramoto model with a term that mimics inverter-based frequency control strategies. By using different linear and non-linear activation functions in this new term, we can study their effects on the synchronisation process. Our dynamical simulations show that phase-synchronization stability improves and cascade sizes decrease when we impose a threshold mechanism on the system. However, this comes at the expense of longer relaxation times, meaning that node-level frequency and phase fluctuations are damped out much more slowly.
    
    In this work, we investigate the role of HVDC lines in power grid stability using an extended second-order Kuramoto model. We introduce an additional term describing HVDC connections that captures inverter-based frequency control strategies, allowing for both linear and nonlinear activation functions that will ultimately set the power flow across the line. This framework enables a systematic study of how such simple control mechanisms and otherwise constant power flow$^*$ influence synchronization dynamics.
    
    To accurately capture large-scale network behavior, we perform extensive simulations on GPU-accelerated architectures (where applicable), allowing us to study the full AC network and directly compare it with DC-augmented configurations. Our results show that introducing threshold-based control mechanisms can significantly enhance phase synchronization stability and reduce cascade sizes. However, these benefits come at the cost of increased relaxation times, leading to slower damping of node-level frequency and phase fluctuations [3].
    
    References
    [1] María Martínez-Barbeito, Damià Gomila, Pere Colet, Julian Fritzsch, and Philippe Jacquod. “Transmission grid stability with large interregional power flows”. In: Phys. Rev. Res. 7 (1 2025), p. 013137. DOI: 10.1103/PhysRevResearch.7.013137. URL: https://link.aps.org/doi/10.1103/PhysRevResearch.7.013137.
    [2] ENTSOE. 28 April 2025 Blackout. 2026. URL: https://www.entsoe.eu/publications/blackout/28-april-2025-iberian-blackout/.
    [3] Kristóf Benedek and Géza Ódor. The effect of HVDC lines in power-grids via Kuramoto modelling. 2025. arXiv: 2512.24122 [physics.soc-ph]. URL: https://arxiv.org/abs/2512.24122.
    
    $^*$ Across HVDC lines, operators usually transmit a constant amount of power between dispatches, whereas through regular AC connections, we typically tie this amount to the phase angle differences.
    
    Speaker: Kristóf Benedek (Budapest University of Technology and Economics)
    
    GPUday26_short_benedek_kristof.pdf
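As a rough illustration of the baseline dynamics discussed in the abstract above, the following minimal sketch integrates the plain second-order Kuramoto (swing) equation on a small network. It does not include the HVDC control term introduced by the authors; the function names, the integration scheme, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_kuramoto2(adj, power, coupling, damping, dt=0.01, steps=5000, seed=0):
    """Integrate theta_i'' = P_i - damping * theta_i' + K * sum_j A_ij sin(theta_j - theta_i).

    Uses a semi-implicit Euler step (update frequencies first, then phases).
    Returns the final phases and frequencies.
    """
    rng = np.random.default_rng(seed)
    n = len(power)
    theta = rng.uniform(-0.1, 0.1, n)  # small random initial phases (assumption)
    omega = np.zeros(n)                # start at nominal frequency
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        accel = power - damping * omega + coupling * (adj * np.sin(diff)).sum(axis=1)
        omega = omega + dt * accel
        theta = theta + dt * omega
    return theta, omega


def order_parameter(theta):
    """Global phase order parameter R in [0, 1]; R close to 1 means phase-locked."""
    return float(np.abs(np.exp(1j * theta).mean()))


# Hypothetical example: a 6-node ring with alternating producers and consumers.
ring = np.zeros((6, 6))
for i in range(6):
    ring[i, (i + 1) % 6] = ring[(i + 1) % 6, i] = 1.0
injected = np.array([0.1, -0.1] * 3)  # balanced power injections
final_theta, final_omega = simulate_kuramoto2(ring, injected, coupling=2.0, damping=1.0)
print("order parameter:", order_parameter(final_theta))
```

A control term such as the one described in the abstract would enter as an extra contribution to `accel`; its functional form is not specified here, so it is left out rather than guessed.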
- 12:30
  
  Lunch break
- Session III
  - 8
    
    CT Imaging with Helium and Carbon Ions
    
    The use of hadrons (such as protons, helium ions, and carbon ions) in radiotherapy requires highly precise Relative Stopping Power (RSP) maps of patient anatomy to minimize range uncertainties. Using these hadrons for imaging before the treatment offers higher reconstruction quality and a dosimetric advantage over conventional X-ray CT. However, running the simulations that generate data for the image reconstruction, reconstructing the RSP maps, and investigating the intrinsic imaging capabilities and dose convergence of different particle species require immense computational resources.
    
    To overcome these computational bottlenecks, the entire data generation and processing pipeline of this study was executed utilizing the advanced high-performance GPU infrastructure of the Wigner Scientific Computing Laboratory (WSCLAB). We performed comprehensive GATE/Geant4 Monte Carlo simulations of a CTP404 phantom across $360^\circ$ projections for protons, helium, and carbon ions. The massively parallel architecture of the WSCLAB GPUs enabled the efficient generation of baseline high-statistics datasets and facilitated the iterative algorithm necessary to reconstruct the RSP maps.
    
    By leveraging this computing power, we successfully mapped the convergence thresholds required for stable RSP recovery. Our findings indicate that carbon-ion and helium-ion imaging can achieve RSP accuracy competitive with Dual-Energy CT, maintaining relative errors well below 1% for the vast majority of the materials. Furthermore, dosimetric scaling demonstrated that these high-fidelity maps can be obtained at competitive or significantly reduced absorbed doses compared to standard modalities.
    
    Speaker: Zsófia Jólesz
    
    CT Imaging with Helium and Carbon Ions - GPU Day 2026-3.pdf
  - 9
    
    FLORA
    
    FLORA: Flow-based Latent-informed Optimization for 3D proton-CT Reconstruction with Spatial Attention. FLORA is a deep learning framework for conditioned image reconstruction developed for Proton Computed Tomography (pCT). The pipeline uses a Variational Autoencoder-GAN approach to learn biologically correct 3D CT reconstruction, while latent flow matching enables us to condition on particle scattering.
    
    Speaker: Bence Dudás (Eötvös Loránd University)
    
    FLORA_GPU_DAY_05_29.pdf
    
    FLORA_GPU_DAY_05_29.pptx
  - 10
    
    REGINA: Regularized Encoder with Latent Cycle-GAN for In-vitro Neural Cell Perturbation Approximation
    
    Learning counterfactual representations for cellular perturbations is a fundamental challenge in representation learning, significantly hindered by the fundamentally unpaired nature of interventional data. Current state-of-the-art generative approaches (e.g., GEARS) circumvent this by relying heavily on domain-specific heuristics, such as masking the input space to a subset of highly variable features or injecting external knowledge graphs. These pre-processing steps inherently discard the global data manifold and mask subtle, rare distributional shifts. Here, we introduce REGINA, a fully data-driven framework that formulates perturbation modeling as an unpaired distribution matching problem. By coupling a Regularized Encoder with a Latent Cycle-GAN architecture, REGINA natively processes the complete, unmasked high-dimensional data vector. Our approach projects observations into a structured latent space where interventional shifts are simulated via conditional prompting, eliminating the need for paired ground truth or artificial dimensionality reduction. Empirical evaluations demonstrate that REGINA achieves competitive local target precision (Top-K Pearson correlation and directional error) against established baselines. More importantly, by retaining the full feature landscape, REGINA establishes superior performance in global distribution matching, yielding significant improvements in Wasserstein distance and Maximum Mean Discrepancy (MMD) while successfully preserving the signatures of rare subpopulations.
    
    Speaker: Regina Nora Fiam (Eötvös Loránd University)
    
    gpu day fiam regina -1.pdf
  - 11
    
    Deciphering the LILRB4 Immunosuppressive Landscape in Colorectal Tumors for De Novo Therapeutic Intervention
    
    In most types of cancer, immunosuppression limits an effective anti-cancer immune response. Leukocyte immunoglobulin-like receptor B4 (LILRB4) is an immune checkpoint inhibitor molecule that plays a role in various signaling processes contributing to tumor immune evasion. The aim of our research is to investigate this receptor and its family in colorectal cancer, with a particular focus on the interactions between immune cells expressing these receptors and the tumor microenvironment.
    
    Using single-cell sequencing data, we first confirmed that LILRB4 expression is highly localized within specific myeloid immune cell subsets. To ensure high-resolution identification, we employed a specialized transcriptomics language model utilizing a novel "gene sentence" approach for cell annotation. To understand how these identified populations are modulated within the tumor architecture, we analyzed the spatial overlap between LILRB4 and its putative ligands. By characterizing receptor-ligand engagement patterns and ranking interactions through specific metrics, we were able to pinpoint the most biologically relevant pairings within the colorectal microenvironment.
    Building on these identified interaction patterns, we sought to disrupt this inhibitory axis. We performed de novo protein design of a nanobody molecule optimized for high-affinity binding to the LILRB4 extracellular domain. By targeting the specific binding interface implicated in our spatial analysis, this nanobody is designed to competitively inhibit ligand engagement and reverse LILRB4-mediated immunosuppression.
    
    Identifying the expression patterns and spatial dynamics of these molecules provides a roadmap for understanding tumor-immune interactions and suggests that LILRB4-targeted nanobodies could serve as a potent tool in colorectal cancer immunotherapy.
    
    Speaker: Péter Hunyadi (Pázmány Péter University, Faculty of Information Technology and Bionics)
    
    GPU day 260528 LILRB4 final.pdf
    
    GPU day 260528 LILRB4 final.pptx
- 15:20
  
  Coffee break
- Session IV
  - 12
    
    Performance Analysis and Optimization of a Parallel GPU-accelerated Low-Temperature 2D Particle-in-Cell Plasma Simulation Code
    
    In this talk, we plan to report the results of our performance optimisation effort aimed at speeding up a GPU-accelerated 2D Particle-in-Cell plasma simulation code, following an international plasma simulation benchmarking effort of 19 leading plasma research groups. We studied the effects of memory management, data movements, the choice and implementation of the Poisson solver, the use of mixed precision computation and various kernel-level optimisation techniques. Finally, the optimised version was improved to support multi-GPU execution as well. The key steps of the optimisation, the performance critical factors and our decisions will be presented in detail in the talk. Compared to the original baseline version that executed the simulation in 10 days, the runtime was reduced to less than 10 hours on an NVIDIA A100 GPU. Taking into account the performance increase of each new generation of GPU cards (from P100 to A100) as well, the overall speedup achieved with the optimised version exceeds 30×. The multi-GPU implementation will further reduce the execution time; using 4 A100 GPUs, we expect the final runtime to be less than 3 hours (note: final steps of the implementation and testing are currently underway, exact timings will be presented at the conference), that will make our implementation the fastest Penning discharge simulation code available. This level of performance not only means that we can execute simulations within a few hours instead of several days or weeks on GPUs, but it also allows for tackling large and complex simulation problems that were previously thought to be impossible to execute.
    
    Speaker: Zoltan Juhasz (Pannon Egyetem)
    
    GPU_Day_2026_Juhasz.pdf
    
    GPU_Day_2026_Juhasz.pptx
  - 13
    
    Design and analysis of GPU-accelerated 2D Poisson solvers for plasma simulation
    
    Particle-in-Cell (PIC) simulation is an important tool in plasma science, where certain properties and behaviour can only be examined by simulations. Due to the large number of particles and simulation cycles, these simulations are extremely time-consuming and can be executed in acceptable time only with parallel implementations. A crucial step in the simulation is solving the Poisson equation to calculate the electric field that provides the basis of interactions among the particles. In this talk, we describe the design, implementation and performance optimisation of a multigrid iterative solver and a Fast Fourier Transform (FFT) based direct spectral solver for two-dimensional simulations. These algorithms aim to drastically improve the runtime performance of an already existing GPU-accelerated 2D PIC implementation. Compared with our original Discrete Fourier Transform based direct solver, the multigrid version achieved a nearly 10x speedup and the FFT version achieved a 280x speedup for the frequently used grid size of 255×255, with similar results for other grid sizes up to 1023×1023. We present the performance analysis of the baseline implementation and the design details of our new implementations, including the key decisions regarding algorithmic improvements and the optimisation techniques used for efficient use of the available GPU computational resources. In conclusion, the achieved runtime performance and numerical accuracy of the two implementations will be presented and compared against the baseline implementation.
    
    Speaker: Bálint Tóth (University of Pannonia)
    
    BalintToth_GPUDay_2026.pdf
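To make the idea of an FFT-based direct spectral Poisson solver from the abstract above more concrete, here is a minimal CPU sketch using NumPy. It assumes periodic boundary conditions and units in which the equation reads -∇²φ = ρ; the boundary conditions and GPU implementation used in the talk are not stated in the abstract, so this is an illustration of the technique and not the authors' solver. The function name is hypothetical.

```python
import numpy as np


def poisson_fft_periodic(rho, lx, ly):
    """Solve -laplacian(phi) = rho on a periodic lx-by-ly box with a spectral method.

    rho has shape (ny, nx). The zero-wavenumber (mean) component of phi is set
    to zero, since the potential is only defined up to a constant.
    """
    ny, nx = rho.shape
    # Angular wavenumbers along each axis.
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=ly / ny)
    k2 = kx[None, :] ** 2 + ky[:, None] ** 2

    rho_hat = np.fft.fft2(rho)
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k2 > 0
    # In Fourier space the Laplacian is multiplication by -k^2.
    phi_hat[nonzero] = rho_hat[nonzero] / k2[nonzero]
    return np.fft.ifft2(phi_hat).real


# Hypothetical usage: a single Fourier mode as the charge density.
grid_x, grid_y = np.meshgrid(np.arange(64) / 64.0, np.arange(64) / 64.0)
demo_rho = np.sin(2.0 * np.pi * grid_x)
demo_phi = poisson_fft_periodic(demo_rho, 1.0, 1.0)
print("max |phi|:", np.abs(demo_phi).max())
```

The solve costs two FFTs and one pointwise division, which is the reason a spectral solver can outperform an iterative one when the geometry and boundary conditions permit it.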
  - 14
    
    Nuclear structure driven anisotropic flow coefficients in pO and OO collisions at the LHC energies
    
    One of the major open problems in the collider physics community is understanding the onset of quark–gluon plasma (QGP) signatures. Collisions of Oxygen nuclei provide a golden opportunity to probe the emergence of collective phenomena in collider experiments. Additionally, $^{16}\rm O$ nuclei are theorized to possess a clustered nuclear structure, where $\alpha$-particles occupy the corners of a regular tetrahedron. The anisotropic flow coefficients, which are key probes of collectivity, are sensitive to the geometry of the colliding nuclei. Therefore, this study focuses on anisotropic flow coefficients, namely elliptic flow ($v_{2}$) and triangular flow ($v_{3}$), and their fluctuations in OO and pO collisions using a hybrid framework (IP-Glasma + MUSIC + iSS + UrQMD).
    
    The hybrid framework incorporates multiple models to simulate the realistic space–time evolution of the collision system. However, components such as IP-Glasma and MUSIC are computationally intensive, requiring large CPU resources to solve the Yang-Mills dynamics and viscous hydrodynamic equations with appropriate initial and boundary conditions, motivating the use of CPU and GPU-accelerated workflows.
    
    We compare the anisotropic flow coefficients in pO and OO collisions, assuming a clustered nuclear structure, with those obtained using a Woods-Saxon nuclear distribution. The results suggest that fluctuation-related observables, i.e., $v_{2}$ fluctuations and $v_{3}$, are sensitive to the presence of clustered nuclear structure in OO collisions. Moreover, a characteristic peak in $v_{2}$ is observed, which scales with the parameters of the clustered nuclear structure. These effects are weaker in pO collisions due to the limited phase space available to translate initial spatial anisotropies into final-state flow observables.
    
    Speaker: Aswathy Menon K R (Indian Institute of Technology Indore)
    
    GPUDayWIGNER-AswathyMenon.pdf
  - 15
    
    Impact of nuclear geometry in symmetry plane correlations in OO and Ne–Ne collisions at the LHC
    
    The correlations among different symmetry planes, otherwise known as the symmetry plane correlations (SPCs), in heavy-ion collisions are driven by the corresponding participant plane correlations and are sensitive to the transport properties of the system formed. The participant plane correlations can vary with the fluctuating nuclear geometry and can therefore be influenced by the nuclear geometry of the collision species. These features of SPCs make them one of the key probes to understand the impact of nuclear geometry in the final state of nuclear collisions.
    
    In this presentation, I study the symmetry plane correlations in OO and Ne–Ne collisions at $\sqrt{s_{\rm NN}}=5.36$ TeV using a multi-phase transport model. We use initial nuclear configurations from NLEFT and PGCM models. The study focuses on two specific SPCs, $\langle \cos[4(\psi_2 - \psi_4)]\rangle_{\rm GE}$ and $\langle \cos[6(\psi_3 - \psi_6)]\rangle_{\rm GE}$, as they are expected to be sensitive to the quadrupole and octupole deformations of $^{20}$Ne and $^{16}$O nuclei, respectively. We observe a higher $\langle \cos[4(\psi_2 - \psi_4)]\rangle_{\rm GE}$ in Ne–Ne collisions than in OO collisions, which hints at a higher quadrupole deformation of $^{20}$Ne than $^{16}$O. In contrast, a higher $\langle \cos[6(\psi_3 - \psi_6)]\rangle_{\rm GE}$ in OO than in Ne–Ne collisions can be an indication of higher octupole deformations in $^{16}$O nuclei due to the intrinsic tetrahedral geometry. We extend the study to tip-tip and body-body selected central collisions to ascertain these findings.
    
    Speaker: Suraj Prasad (HUN-REN Wigner Research Centre for Physics)
    
    Suraj_GPUDay2026.pdf
  - 16
    
    Towards Cost-Effective HEP Simulations Using GAN-Based Data Augmentation
    
    Modern high-energy physics analyses rely heavily on large-scale Monte Carlo (MC) simulations for machine-learning training, efficiency corrections, and systematic studies. For rare-signal workflows, obtaining sufficiently large reconstructed-level signal samples often requires computationally expensive MC campaigns with large CPU and storage demands.
    
    This work explores the use of Generative Adversarial Networks (GANs) for reconstructed-level data augmentation in the ALICE experiment at CERN. The proposed approach learns the multi-dimensional distribution of reconstructed observables directly from MC and generates statistically consistent synthetic signal samples for downstream analysis workflows.
    
    The framework is validated through feature-distribution comparisons, correlation studies, machine-learning-based classification, and signal extraction tests. The generated samples show good agreement with standard MC while significantly reducing the marginal cost of producing large reconstructed-level datasets. The method provides a complementary generative layer within the simulation-to-analysis workflow and demonstrates the potential of AI-driven augmentation for scalable MC-statistics production in future rare-signal analyses.
    
    Speaker: Anisa Khatun
    
    GPU-Day-2026_AK_V5.pdf
Friday 29 May
- Session V
  - 17
    
    Fermionic Born Machines: Classical training of quantum generative models based on Fermion Sampling
    
    Quantum generative learning is a promising application of quantum computers, but faces several trainability challenges, including the difficulty in experimental gradient estimations. For certain structured quantum generative models, however, expectation values of local observables can be efficiently computed on a classical computer, enabling fully classical training without quantum gradient evaluations. Although training is classically efficient, sampling from these circuits is still believed to be classically hard, so inference must be carried out on a quantum device, potentially yielding a computational advantage. In this work, we introduce Fermionic Born Machines as an example of such classically trainable quantum generative models. The model employs parameterized magic states and fermionic linear optical (FLO) transformations with learnable parameters. The training exploits a decomposition of the magic states into Gaussian operators, which permits efficient estimation of expectation values. Furthermore, the specific structure of the ansatz induces a loss landscape that exhibits favorable characteristics for optimization. The FLO circuits can be implemented, via fermion-to-qubit mappings, on qubit architectures to sample from the learned distribution during inference. Numerical experiments on systems up to 160 qubits demonstrate the effectiveness of our model and training framework.
    
    Speaker: Bence Bakó (Wigner RCP)
    
    bb_fbm_training.pdf
  - 18
    
    Generative modeling with Gaussian Boson Sampling: classically trainable Bosonic Born Machines
    
    Quantum generative modeling has emerged as a promising application of quantum computers, aiming to model complex probability distributions beyond the reach of classical methods. In practice, however, training such models often requires costly gradient estimation performed directly on the quantum hardware. Crucially, for certain structured quantum circuits, expectation values of local observables can be efficiently evaluated on a classical computer, enabling classical training without calls to the quantum hardware in the optimization loop. In these models, sampling from the resulting circuits can still be classically hard, so inference must be performed on a quantum device, yielding a potential computational advantage. In this work, we introduce a photonic quantum generative model built on parametrized Gaussian Boson Sampling circuits. The training is based on the efficient classical evaluation of expectation values enabled by the Gaussian structure of the state, allowing scalable optimization of the model parameters through the maximum mean discrepancy loss function. We demonstrate the effectiveness of the approach through numerical experiments on photonic systems with up to 805 modes and over a million trainable parameters, highlighting its scalability and suitability for near-term photonic quantum devices.
    
    Speaker: Zoltán Kolarovszki
    
    GBBM_presentation__GPU_day_2026_.pdf
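The abstract above trains its model with the maximum mean discrepancy (MMD) loss. As a small illustration of what that loss measures, the sketch below computes the standard biased sample estimate of the squared MMD between two sample sets with a Gaussian kernel. It works on plain samples, whereas the talk evaluates the required expectation values classically from the Gaussian state; the kernel choice, bandwidth, and function names here are assumptions.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Pairwise Gaussian kernel matrix k(x_i, y_j) for samples of shape (n, d) and (m, d)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]."""
    return float(
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


# Hypothetical usage: compare a target sample with a matching and a shifted model sample.
rng_demo = np.random.default_rng(0)
target = rng_demo.normal(size=(300, 2))
good_model = rng_demo.normal(size=(300, 2))
bad_model = rng_demo.normal(size=(300, 2)) + 3.0
print("MMD^2 (matching):", mmd2(target, good_model))
print("MMD^2 (shifted): ", mmd2(target, bad_model))
```

The loss is small when the two sample sets come from the same distribution and grows as they separate, which is what makes it usable as a training objective for generative models.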
  - 19
    
    Search-Driven Quantum Circuit Decomposition
    
    Quantum circuit decomposition under restricted hardware connectivity is fundamentally a search problem: the compiler must choose useful qubit partitions, map them to a target topology, and synthesize high-quality local decompositions without exploding routing cost. This talk presents two complementary advances for connectivity-aware quantum compilation. First, we introduce an all-partitions framework based on convex-set enumeration, followed by exact set cover solving to choose globally effective partitions. In practice, this turns partition selection into a solver-friendly optimization problem that scales far better than naive enumeration and substantially improves all-to-all decomposition, topology-aware compilation, and SeqPAM routing in BQSKit. Second, we present an OSR-guided graph-search method for small-block decomposition. Starting from the current best circuit, the search can insert CNOTs at any beneficial location, uses optimistic large-step updates when improvements appear, and combines Operator Schmidt Rank with a surplus-weighted tail term, $\kappa$, to break plateaus that rank alone cannot resolve. A minimum-CNOT certification step based on exhaustive search remains practical for $3$--$5$ qubit partitions and provides a strong local optimization primitive. Across benchmark circuits, these ideas reduce CNOT counts relative to existing BQSKit-based flows while also suggesting a natural path toward parallel and accelerator-friendly implementations.
    
    Speaker: Gregory Morse (Eötvös Loránd University and Wigner RCP)
    
    optimisticosr.pdf
    
    Sequential Quantum Gate Decomposer (SQUANDER)
  - 20
    
    Toward Autonomous Scientific Instrumentation: Real-Time AI Denoising and Control on FPGA–Groq Platforms
    
    This project develops a hardware-accelerated, low-latency inference framework for real-time denoising and signal reconstruction in high-throughput, noise-limited measurement systems. While motivated by X-ray Free Electron Laser (XFEL) imaging, the proposed approach is designed to be broadly applicable to a wide range of data-intensive scientific and industrial domains, including plasma diagnostics and control in fusion reactors, nonlinear optical systems, and high-field experimental platforms.
    
    At its core, the project implements a foundation-model-based denoising algorithm using a hardware–algorithm co-design strategy on Groq Language Processing Units (LPUs), enabling deterministic, high-throughput inference. To support real-time operation, the Groq hardware is tightly coupled with an FPGA-based front-end that ensures deterministic, continuous data streaming from sensor systems to the inference engine. This architecture minimizes data transfer overhead and enables fully pipelined processing under demanding acquisition rates.
    
    The main technical objectives include: (i) low-level optimization of neural network primitives—such as convolutions, normalization, and activation functions—tailored to the Groq execution model; (ii) development of a high-throughput streaming interface between FPGA-based data acquisition systems and Groq accelerators; (iii) integration of the denoising engine into closed-loop control frameworks, enabling real-time feedback and adaptive system steering; and (iv) systematic benchmarking of latency, energy efficiency, and reconstruction accuracy against conventional CPU and GPU implementations.
    
    The resulting prototype will demonstrate a generalizable architecture for embedding advanced machine learning inference directly into experimental and operational pipelines. In XFEL environments, this enables real-time image reconstruction and adaptive experiment control, while in fusion research it supports plasma state estimation and feedback-driven stabilization. More broadly, the approach establishes a scalable pathway toward autonomous, AI-enhanced instrumentation capable of self-optimization across diverse high-performance sensing and control applications.
    
    Speaker: Peter Rakyta (Department of Physics of Complex Systems, Eötvös Loránd University)
    
    rakyta_talk.pdf
- 10:20
  
  Coffee Break
- Session VI
  - 21
    
    Toolbox for HEP and sustainability of HEP event generators
    
    Tuning a Monte Carlo event generator requires many tools and a lot of energy. HIJING++ is a good candidate for demonstrating what can be achieved with a toolbox that contains all of the required packages and dependencies and works out of the box. By default, the tools can only be installed separately, and maintaining them one by one is tedious. The toolbox in question is a Docker image that also contains the benchmarking scripts for evaluating the sustainability of the host PC's CPU for tuning and running MC event generators. In this work, we show the evaluation of multiple CPUs and some of the latest tuning results of HIJING++.
    
    Speaker: Szabolcs Molnár (HUN-REN Wigner RCP)
    
    GPUday_SzabolcsMolnar.pdf
  - 22
    
    Mixed Hamming-packings for benchmarking QUBO solvers
    
    In a recent work (Naszvadi, Adam and Koniorczyk, Mathematics 2025, 13(16), 2633) we introduced an ILP model for the code-theoretic problem of finding the maximal cardinality of codes with a given minimum Hamming distance between codewords. Our method does not rely on any algebraic structure of the alphabets, it can decompose bigger problem instances into equivalent smaller ones, and it can be rewritten as a quadratic unconstrained binary optimization (QUBO) problem in a straightforward manner. Given the recent development of hardware and software QUBO heuristics and exact solvers, our aim was to find a set of problems suitable as a benchmark. Our problem is well studied in coding theory, the relevant bounds are known, and the instances are often hard even at small problem sizes. It also allows the behavior of ILP solvers to be compared with that of QUBO solvers. Here we present an analysis of our problem instances when solved with exact and heuristic QUBO solvers.
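
    One standard way to phrase a minimum-distance code problem as a QUBO, not necessarily the formulation of the cited paper, assigns a binary variable to every candidate codeword and penalizes selecting two words that are too close. The sketch below builds this penalty form for binary words of length 3 with minimum distance 2 and solves it by brute force. The penalty weight and all names are assumptions for illustration.

    ```python
    from itertools import combinations, product

    def hamming(a, b):
        """Number of positions in which two words differ."""
        return sum(x != y for x, y in zip(a, b))

    def build_qubo(words, min_dist, penalty=2.0):
        """Terms of: minimize sum_i l_i x_i + sum_{i<j} q_ij x_i x_j."""
        linear = [-1.0] * len(words)  # reward each selected codeword
        quadratic = {
            (i, j): penalty  # punish pairs closer than min_dist
            for i, j in combinations(range(len(words)), 2)
            if hamming(words[i], words[j]) < min_dist
        }
        return linear, quadratic

    def energy(x, linear, quadratic):
        value = sum(l * xi for l, xi in zip(linear, x))
        value += sum(q * x[i] * x[j] for (i, j), q in quadratic.items())
        return value

    def brute_force(linear, quadratic):
        """Exhaustive minimization; only feasible for tiny instances."""
        candidates = product((0, 1), repeat=len(linear))
        return min(candidates, key=lambda x: energy(x, linear, quadratic))

    words = list(product((0, 1), repeat=3))
    linear, quadratic = build_qubo(words, min_dist=2)
    best = brute_force(linear, quadratic)
    code = [w for w, xi in zip(words, best) if xi]
    ```

    With a penalty larger than 1, dropping one word of any conflicting pair always lowers the energy, so the minimizer is a valid code and its energy is minus the code size. For this toy instance the minimum is -4, matching the largest binary code of length 3 with distance 2.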
    
    Speaker: Péter Naszvadi (Wigner RCP)
    
    naszvadi_gpuday26.pdf
  - 23
    
    Quantum Circuit and Tensor Network Simulation of the Acoustic Wave Equation
    
    We present a cohesive framework for simulating seismic wave propagation using quantum computing paradigms and their classical tensor network equivalents. We detail a quantum-circuit formulation of the explicit finite-difference time-domain (FDTD) solution of the two-dimensional acoustic wave equation. By circumventing the quantum Fourier transform (QFT) overhead through direct spatial basis encoding and arithmetic shift circuits, we establish a robust algorithm.
    
    Recognizing current near-term hardware limitations for deep Linear Combination of Unitaries (LCU) sequences, we formally map this quantum architecture onto a tensor train representation, namely a Matrix Product State (MPS). The MPS solver enables deterministic simulation of large-scale wavefield dynamics on classical high-performance computing systems.
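
    For reference, the kind of classical explicit finite-difference update for the two-dimensional acoustic wave equation referred to above can be sketched in a few lines of Python. This is a plain second-order leapfrog stencil with periodic boundaries, chosen only for illustration. Grid size, Courant number, initial pulse, and function names are assumptions, and neither the quantum circuit nor the MPS solver of the talk is reproduced.

    ```python
    def fdtd_step(u_prev, u_curr, courant_sq):
        """One leapfrog step of u_tt = c^2 (u_xx + u_yy) on a periodic grid.

        courant_sq = (c * dt / dx)**2; stability in 2D requires courant_sq <= 0.5.
        """
        n, m = len(u_curr), len(u_curr[0])
        u_next = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Five-point Laplacian built from index shifts by +/-1 in each direction.
                laplacian = (
                    u_curr[(i + 1) % n][j] + u_curr[(i - 1) % n][j]
                    + u_curr[i][(j + 1) % m] + u_curr[i][(j - 1) % m]
                    - 4.0 * u_curr[i][j]
                )
                u_next[i][j] = 2.0 * u_curr[i][j] - u_prev[i][j] + courant_sq * laplacian
        return u_next

    def simulate(size=16, steps=10, courant_sq=0.25):
        """Propagate a unit point pulse that starts at rest in the grid centre."""
        u_prev = [[0.0] * size for _ in range(size)]
        u_prev[size // 2][size // 2] = 1.0
        u_curr = [row[:] for row in u_prev]  # equal first two levels: zero initial velocity
        for _ in range(steps):
            u_prev, u_curr = u_curr, fdtd_step(u_prev, u_curr, courant_sq)
        return u_curr

    field = simulate()
    ```

    The update only needs the current field, the previous field, and copies of the current field shifted by one cell.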
    
    Speaker: Prof. Gabor Vattay (Eötvös Loránd University)
    
    GPUday.pdf
  - 24
    
    Cognitive algebra: contexts, concepts and knowledge space
    
    We introduce a minimal structural framework for cognitive representations based on the notion of context as a partition of the world state space. The framework treats invariance recognition and representative selection as fundamental operations and realizes them through a coupled Concept Graph and Procedure Graph. Together, these define a minimal cognitive algebra for constructing and transforming representations independently of sophisticated inference or learning mechanisms.
    
    Speaker: Antal Jakovac (Wigner RCP, Department of Computational Sciences)
    
    GPUdays2026.pdf
  - 25
    
    Macroscopic quantum simulation of high-order harmonic generation in a gas jet, using upgraded 1d atomic model potentials
    
    We numerically investigate high-order harmonic generation (HHG) in a noble gas jet or cell with a supercomputer code [1] that computes the single-atom response based on the one-dimensional (1D) time-dependent Schrödinger equation (TDSE) and couples it to the macroscopic propagation of the electromagnetic radiation. This makes it possible to compare 1D TDSE-based HHG simulations with experimental results. The corresponding 1D atomic model potential is an important ingredient in this procedure. We defined upgraded 1D atomic model potentials earlier and showed that they considerably improve the agreement of the resulting single-atom response with the 3D TDSE results [2, 3].
    
    In this contribution, we show that the upgraded 1D atomic model potentials provide better agreement between the simulated and measured HHG spectra in usual experimental scenarios, e.g. at the ELI user facility.
    
    Funding acknowledgement
    Krisztina Sallai was supported by the UNKP-23-3 New National Excellence Program of the Ministry of Human Capacities of Hungary. The ELI ALPS project (GINOP-2.3.6-15-2015-00001) is supported by the European Union and co-financed by the European Regional Development Fund. We acknowledge the Digital Government Development and Project Management Ltd. for awarding us access to the Komondor HPC facility based in Hungary.
    
    References
    [1] J. Vábek, T. Němec, S. Skupin, and F. Catoire, arXiv:2507.04115v1
    [2] Sz. Majorosi, M. G. Benedict, and A. Czirják, Phys. Rev. A, 98, 023401 (2018).
    [3] K. Sallai, Sz. Hack, Sz. Majorosi, and A. Czirják, Phys. Rev. A, 110, 063117 (2024).
    
    Speaker: Attila Czirják (ELI-ALPS, and University of Szeged)
    
    GPU-Day_Czirjak.pdf
- 12:30
  
  Lunch break
- Session VII
  - 26
    
    Cosmological simulations in pathological geometries and how to make them
    
    Cosmological N-body simulations are fundamental tools for studying the non-linear evolution of large-scale structure, yet the vast majority adopt periodic cubic ($\mathbb{T}^3$) boundary conditions. This choice breaks rotational invariance, prevents angular momentum conservation, and introduces artificial correlations at scales comparable to the box size. The StePS simulation framework and its companion initial condition generator stepsic overcome these limitations by compactifying open ($\mathbb{R}^3$) and cylindrical ($\mathbb{S}^1 \times \mathbb{R}^2$) domains via stereographic projection, providing an end-to-end pipeline for cosmological simulations that preserve the relevant continuous symmetries while maintaining radially varying mass resolution through a natural zoom-in configuration. The pipeline supports Lagrangian perturbation theory up to second order across spherical, cylindrical, and anisotropic slab geometries — geometries that no other cosmological N-body pipeline supports. The first $\mathbb{S}^1 \times \mathbb{R}^2$ $\Lambda$CDM simulation demonstrates that the cylindrical topology faithfully reproduces both linear and non-linear structure formation. This opens a pathway toward self-consistent studies of filamentary environments and anisotropic cosmological models in geometries that respect their intrinsic symmetries.
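
    The compactification idea can be illustrated with the textbook inverse stereographic projection, which maps all of $\mathbb{R}^3$ onto the unit 3-sphere minus one pole. The sketch below uses the unit-sphere convention and hypothetical function names. The conventions actually used by StePS are not stated in the abstract and may differ.

    ```python
    import math

    def inverse_stereographic(x):
        """Map a point x in R^n to the unit n-sphere embedded in R^(n+1)."""
        r2 = sum(c * c for c in x)
        return [2.0 * c / (1.0 + r2) for c in x] + [(r2 - 1.0) / (r2 + 1.0)]

    def stereographic(p):
        """Inverse of the map above; undefined at the pole where p[-1] == 1."""
        return [c / (1.0 - p[-1]) for c in p[:-1]]

    point = [3.0, -4.0, 12.0]  # arbitrary test point in R^3
    on_sphere = inverse_stereographic(point)
    radius = math.sqrt(sum(c * c for c in on_sphere))
    recovered = stereographic(on_sphere)
    ```

    In this convention the origin maps to the pole with last coordinate -1, and increasingly distant points crowd toward the opposite pole, so the infinite domain is covered by a compact one.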
    
    Speaker: Balázs Pál (Wigner Research Centre for Physics)
    
    pal-balazs-gpu-day-2026.pdf
  - 27
    
    Modelling neutron stars in scalar tensor theories
    
    Scalar-tensor theories of gravity with a dynamical scalar field coupling non-minimally to matter via a conformal factor $A(\phi)$ pose computational challenges beyond standard general relativistic solvers. We present a fully numerical Python framework for constructing slowly rotating neutron star solutions in the massive scalar-tensor theory defined by the Einstein-frame coupling $\alpha(\phi) = \beta\phi$ with a dilaton potential, implemented within the Hartle slow-rotation expansion at first order. The interior and exterior field equations for the metric potentials, scalar field, fluid pressure, and rotational drag function are cast as a coupled seven-dimensional first-order ODE system and integrated using an LSODA adaptive solver, which switches automatically between stiff and non-stiff regimes to handle the steep gradients introduced by the Yukawa-type scalar mass term. Surface-matching between domains is achieved through a Nelder-Mead shooting strategy that enforces asymptotic boundary conditions at spatial infinity, navigating the coexisting scalarized and general relativistic solution branches at fixed central energy density. The APR equation of state is interpolated via PCHIP, and geometrized units are used throughout.
    
    For each converged solution, the framework extracts the gravitational mass from the asymptotic metric gradient, the physical Jordan-frame radius via conformal rescaling, and the moment of inertia from the asymptotic behaviour of the rotational drag ODE. This yields complete mass-radius and moment-of-inertia–mass relations across the parameter space of coupling constant and scalar field mass. Our results demonstrate that the scalar mass plays a decisive role in shaping neutron star structure, with both relations deviating significantly from general relativity, establishing this framework as a reliable tool for probing the strong-field phenomenology of massive scalar-tensor gravity.
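
    The shooting idea mentioned above can be sketched on a toy boundary-value problem. The example below is not the solver of the talk: it swaps LSODA for a fixed-step RK4 integrator and Nelder-Mead for bisection, and it solves $y'' = -y$ with $y(0) = 0$, $y(\pi/2) = 1$ by adjusting the unknown initial slope. The test problem and all names are assumptions for illustration.

    ```python
    import math

    def rk4_integrate(f, y0, t0, t1, steps=200):
        """Integrate y' = f(t, y) from t0 to t1 with the classical RK4 scheme."""
        h = (t1 - t0) / steps
        t, y = t0, list(y0)
        for _ in range(steps):
            k1 = f(t, y)
            k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
            k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
            k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
            y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
            t += h
        return y

    def oscillator(t, y):
        """First-order form of y'' = -y with state (y, y')."""
        return [y[1], -y[0]]

    def boundary_mismatch(slope):
        """Distance from the target boundary value y(pi/2) = 1 for a trial slope."""
        y_end = rk4_integrate(oscillator, [0.0, slope], 0.0, math.pi / 2)
        return y_end[0] - 1.0

    def bisect(func, lo, hi, tol=1e-10):
        """Root of func in [lo, hi], assuming a sign change over the interval."""
        f_lo = func(lo)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            f_mid = func(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    slope = bisect(boundary_mismatch, 0.0, 2.0)
    ```

    The exact solution is $y = \sin t$, so the recovered slope should be close to 1. In a multi-parameter setting such as the one described in the abstract, the scalar root search is replaced by a minimizer over several unknown initial values.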
    
    Speaker: Ms Ashika Achuthankutty (University of Szeged)
    
    STT_GPU_Ashika.pdf
  - 28
    
    Long-term decay-rate measurements in the Jánossy Underground Research Laboratory
    
    Speaker: Franciska Sprok (HUN-REN Wigner FK)
    
    GPUday2026_presentation.pdf
  - 29
    
    Microscopic structure of spacetime from neutron star oscillations
    
    In addition to the familiar one time dimension and three infinite spatial dimensions of spacetime, many theories also consider extra dimensions. The Kaluza-Klein model extends spacetime with one additional spatial dimension, which is curled up into a microscopic circle and thus affects particles on quantum mechanical scales. In neutron stars, where matter is subject to extremely strong gravitational effects, gravity could become comparable in strength to the other fundamental interactions and provide corrections to the equation of state. Thus, low-energy effects of quantum gravity could be present.
    
    We study the effect of a strong gravitational field on massive particles; in particular, we consider corrections to the uncertainty and dispersion relations. Understanding the microscopic effects induced by the structure of spacetime is crucial for building consistent models of macroscopic phenomena, which can be compared to astronomical data, e.g. through neutron star oscillations.
    
    A. Horváth, A. Wojnar, G.G. Barnaföldi: “The effects of strong gravity on the dispersion relation of massive particles in the Kaluza–Klein theory”, https://arxiv.org/abs/2510.16631
    
    Speaker: Anna Horváth
    
    AHorvath_GPU_2026.pdf
  - 30
    
    Machine Learning for Molecular Density Matrix Estimation: Achieving SAD-Like Performance with Minimal Model Complexity
    
    Self-consistent field (SCF) calculations remain the computational bottleneck in quantum chemistry workflows. The initial density matrix guess significantly impacts convergence speed, with the Superposition of Atomic Densities (SAD) being the de facto standard. We present a machine learning approach that achieves comparable performance to SAD using a remarkably compact model predicting only diagonal blocks of density matrices.
    
    Our transformer-based architecture employs 3.2 million parameters to predict atomic-block diagonal elements of density matrices for arbitrary molecular systems. The model combines rotary position-encoded attention with physics-informed multipole interactions and is trained on ~4.9 million quantum chemistry calculations spanning diverse molecular geometries (1-200 atoms) and electronic structure methods (HF, B3LYP, ωB97X-D). The architecture is molecule-independent and handles systems up to typical quantum chemistry scales (~300 atoms).
    
    We benchmark performance on a ~900 molecule subset of the GMTKN55 database, filtered to contain only elements H-Ar (periods 1-3). On average, SAD achieves 0.22 fewer SCF iterations than our model, demonstrating that the ML approach reaches near-parity with the established standard despite predicting only diagonal blocks. Notably, our model exhibits qualitatively different physical behavior: while SAD systematically underestimates electronic energy, our predictions overshoot the converged density, suggesting a more physically motivated initial electronic structure.
    
    These results demonstrate that a remarkably small neural network can match decades-optimized classical methods for SCF initialization, opening pathways for more sophisticated architectures to exceed SAD performance while maintaining computational efficiency for routine quantum chemistry applications.
    
    Speakers: Andras Horvath (Pázmány Péter Catholic University - Faculty of Information Technology and Bionics), Gábor János Tornai (StreamNovation Ltd.)
    
    presentation.pdf
- 15:40
  
  Coffee Break
