High-Performance Solvers for Partial Differential Equations using a Code Generation Approach

  • Tuesday, 11 November 2025, 11:15
  • Mathematikon, Room 1/414
  • René Heß
  • Address: INF205 Mathematikon, Room 1/414

In this thesis, we leverage code generation to create high-performance matrix-free solvers and preconditioners for numerical simulations of partial differential equations (PDEs) employing discontinuous Galerkin (DG) methods.

The traditional way of solving PDEs after discretization and linearization is to assemble a linear system of equations and solve it with an iterative or direct solver.
This approach has a significant drawback:
The number of floating-point operations performed per byte loaded from memory is too low to fully utilize modern CPUs.
As a result, the CPU often idles while waiting for data, and the time needed to solve the linear system is primarily limited by the available memory bandwidth.
Matrix-free solvers overcome this limitation by never assembling the system matrix and instead computing its action on a vector on the fly.
This increases the arithmetic intensity and enables effective utilization of modern processors.
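To make the distinction concrete, here is a minimal sketch. It assumes a 1D model problem with linear continuous finite elements rather than DG, so face terms are omitted, and all names are illustrative. Both approaches compute the same product A x. The assembled version stores the full global matrix. The matrix-free version keeps only the small element matrix and performs the local computation on the fly.

```python
import numpy as np

# Toy setting (assumption): 1D linear continuous elements on a uniform mesh
# of n_el elements; DG face terms are omitted for brevity.
n_el = 8
h = 1.0 / n_el
n_dof = n_el + 1
K_local = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h  # element stiffness matrix

def assemble_global(n_el):
    """Traditional approach: build and store the full global matrix."""
    A = np.zeros((n_el + 1, n_el + 1))
    for e in range(n_el):
        dofs = [e, e + 1]
        A[np.ix_(dofs, dofs)] += K_local
    return A

def apply_matrix_free(x):
    """Matrix-free approach: compute A @ x element by element, never storing A."""
    y = np.zeros_like(x)
    for e in range(n_el):
        dofs = [e, e + 1]
        y[dofs] += K_local @ x[dofs]  # gather, apply local kernel, scatter-add
    return y

x = np.sin(np.linspace(0.0, np.pi, n_dof))
y_assembled = assemble_global(n_el) @ x
y_free = apply_matrix_free(x)
```

In a high-order matrix-free code, even the element matrix is usually not stored. Instead, the local kernel evaluates the integrals at the quadrature points each time the operator is applied.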

One key challenge of matrix-free methods is designing efficient and robust solvers and preconditioners.
This work focuses on DG methods and uses a matrix-free Krylov solver combined with a preconditioner that pairs a high-order matrix-free smoother with a low-order algebraic multigrid subspace correction.
This is a strong preconditioner that is suitable for complex problems with varying coefficients and avoids assembling the system matrix of the high-order DG space.
A high-performance implementation of this solver and preconditioner requires several PDE-dependent local integration kernels.
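The solver structure can be sketched under heavy simplifying assumptions:

- The fine operator is a 1D Poisson stencil applied matrix-free, not a high-order DG operator.
- Damped Jacobi stands in for the high-order matrix-free smoother.
- A direct solve on a coarse linear-interpolation space stands in for the low-order algebraic multigrid correction.

All names are hypothetical. Only the small coarse operator is ever formed, and it is computed from applications of the fine operator.

```python
import numpy as np

# Model problem (assumption): stencil [-1, 2, -1] on n = 2*nc + 1 interior points.
nc = 31
n = 2 * nc + 1

def apply_A(x):
    """Matrix-free application of tridiag(-1, 2, -1)."""
    y = 2.0 * x
    y[1:] -= x[:-1]
    y[:-1] -= x[1:]
    return y

diag = 2.0 * np.ones(n)  # diagonal of A, used by the Jacobi smoother

# Hypothetical low-order subspace: linear interpolation from a coarse grid.
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j, j] = 0.5
    P[2 * j + 1, j] = 1.0
    P[2 * j + 2, j] = 0.5
# Coarse operator P^T A P, built from matrix-free applications of A.
AP = np.column_stack([apply_A(P[:, j]) for j in range(nc)])
A_coarse = P.T @ AP

def two_level_preconditioner(r, omega=2.0 / 3.0):
    """Jacobi pre-smoothing, coarse correction, Jacobi post-smoothing.
    The direct coarse solve is a stand-in for algebraic multigrid."""
    z = omega * r / diag
    z += P @ np.linalg.solve(A_coarse, P.T @ (r - apply_A(z)))
    z += omega * (r - apply_A(z)) / diag
    return z

def pcg(apply_A, b, apply_M, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradients using only operator applications."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_M(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

b = np.ones(n)
x, iterations = pcg(apply_A, b, two_level_preconditioner)
```

The same symmetric Jacobi smoother is applied before and after the coarse correction. For this choice of omega, the resulting preconditioner is therefore symmetric positive definite, as conjugate gradients requires.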

The main contribution of this work is a code generation approach that produces high-performance implementations of the required local integration kernels.
All integration kernels are generated from a single description of the PDE in a domain-specific language, making this approach applicable to various PDE problems.
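The generator described here works from a DSL description of the PDE. The toy below is not that generator and does not reproduce its interface; all names are hypothetical. It illustrates just one reason why generating kernels pays off: once the polynomial degree and quadrature rule are fixed, a generator can emit straight-line code in which all loop bounds and basis-function values are known constants. Here the emitted kernel evaluates a 1D Lagrange basis at Gauss points. A real generator would emit C or C++ instead of Python.

```python
import numpy as np

def lagrange_tabulation(nodes, points):
    """B[q, j] = value of the j-th Lagrange basis polynomial at points[q]."""
    B = np.ones((len(points), len(nodes)))
    for j, xj in enumerate(nodes):
        for m, xm in enumerate(nodes):
            if m != j:
                B[:, j] *= (points - xm) / (xj - xm)
    return B

def generate_kernel_source(name, B):
    """Emit straight-line source for y = B @ x: loops fully unrolled and the
    entries of B baked in as literal constants."""
    lines = [f"def {name}(x):"]
    for i in range(B.shape[0]):
        terms = " + ".join(f"{float(B[i, j])!r} * x[{j}]" for j in range(B.shape[1]))
        lines.append(f"    y{i} = {terms}")
    lines.append("    return [" + ", ".join(f"y{i}" for i in range(B.shape[0])) + "]")
    return "\n".join(lines)

def compile_kernel(source, name):
    """Turn the generated source into a callable."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]

degree = 2
nodes = np.linspace(-1.0, 1.0, degree + 1)               # equidistant Lagrange nodes
points, _ = np.polynomial.legendre.leggauss(degree + 1)  # Gauss points on [-1, 1]
B = lagrange_tabulation(nodes, points)
source = generate_kernel_source("evaluate_at_quadrature", B)
kernel = compile_kernel(source, "evaluate_at_quadrature")
```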
We focus on high-performance implementations on structured grids that exploit the tensor-product structure of the basis functions and quadrature points through a technique called sum factorization, and that apply SIMD vectorization within the local integration kernels.
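Sum factorization can be illustrated on a single 2D element. The sketch below assumes a random 1D tabulation matrix purely for illustration. Evaluating a tensor-product function at tensor-product quadrature points with the full 2D basis matrix gives the same result as two successive 1D contractions, but the factorized form is much cheaper.

```python
import numpy as np

# Hypothetical 1D tabulation: values of p basis functions at q quadrature points.
rng = np.random.default_rng(0)
p, q = 4, 5
B = rng.standard_normal((q, p))    # B[k, i] = phi_i(x_k) in 1D
u = rng.standard_normal((p, p))    # coefficients u[i, j] on a 2D tensor-product element

# Naive evaluation: the 2D basis matrix is the Kronecker product of B with itself,
# of size q^2 x p^2, so applying it costs O(p^2 q^2) operations.
naive = (np.kron(B, B) @ u.ravel()).reshape(q, q)

# Sum factorization: two successive 1D contractions, O(p^2 q + p q^2) operations.
tmp = B @ u           # first index:  tmp[k, j] = sum_i B[k, i] * u[i, j]
fact = tmp @ B.T      # second index: fact[k, l] = sum_j tmp[k, j] * B[l, j]
```

In d dimensions with q close to p, the cost drops from O(p^{2d}) to O(d p^{d+1}). This is what keeps high polynomial degrees affordable in matrix-free kernels. The inner contractions are also regular, dense loops, which makes them good candidates for SIMD vectorization.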
A detailed roofline performance analysis demonstrates that the approach can achieve a significant fraction of the CPU's peak performance and identifies further optimization opportunities.
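The roofline model behind such an analysis fits in a few lines. The machine numbers below are hypothetical and are not measurements from this work. The CSR intensity counts only the matrix value (8 bytes) and column index (4 bytes) per nonzero. It therefore overestimates the true intensity of an assembled sparse matrix-vector product, which also loads vector entries and row pointers.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped either by the compute peak or by
    memory bandwidth times arithmetic intensity (flop/byte)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

# Hypothetical machine parameters (assumption, not measured data).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS  # flop/byte needed to become compute bound

# Assembled CSR sparse matrix-vector product with double values and 32-bit
# column indices: 2 flops per nonzero against at least 12 bytes loaded.
spmv_intensity = 2.0 / 12.0
spmv_perf = attainable_gflops(spmv_intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
```

With these numbers, the assembled product reaches at most about 17 GFLOP/s, under 2% of the hypothetical peak. A kernel would need about 10 flop/byte to become compute bound, which is the regime that matrix-free kernels with sum factorization aim for.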