Automated Partitioning of CUDA Kernels for Multi-GPU Systems

  • Wednesday, 4 September 2024, 13:00
  • INF 368, R.531
  • Lorenz Braun

This work demonstrates the feasibility of automatically partitioning CUDA kernels for multi-GPU systems. The problem is approached by modeling the compute graphs of selected applications; with the help of a simulator, models are derived that predict performant partitionings. Accurately predicting GPU kernel runtime is supported by a new compiler-assisted method for profiling GPU kernels. The profiler's GPU-independent metrics are used to develop a methodology for predicting kernel runtime and power usage. Using this methodology, four GPU benchmark suites are employed to model the runtime and power usage of GPU kernels on five different GPUs.
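As a rough illustration only, the sketch below shows how GPU-independent kernel metrics could be fitted against measured runtimes with a simple least-squares model and then used to predict the runtime of an unseen kernel on a given GPU. The feature names, the numbers, and the linear model itself are assumptions made for this sketch; they are not the methodology presented in the talk.

```python
# Illustrative sketch only: a linear least-squares model mapping hypothetical
# GPU-independent kernel metrics to measured runtimes on one target GPU.
import numpy as np

# Each row: [fp32 instructions, global loads, global stores, synchronizations]
# counted per kernel launch (hypothetical GPU-independent metrics).
metrics = np.array([
    [1.2e9, 4.0e7, 2.0e7, 1.0e3],
    [3.5e8, 9.0e6, 4.5e6, 2.0e2],
    [2.1e9, 7.5e7, 3.8e7, 5.0e3],
    [8.0e8, 2.5e7, 1.2e7, 8.0e2],
])
# Measured kernel runtimes in milliseconds on the target GPU (made-up values).
runtime_ms = np.array([4.8, 1.3, 8.9, 3.1])

# Fit runtime ~ metrics @ w + b by ordinary least squares.
X = np.hstack([metrics, np.ones((metrics.shape[0], 1))])  # append bias column
w, *_ = np.linalg.lstsq(X, runtime_ms, rcond=None)

# Predict the runtime of an unseen kernel from its metrics alone.
new_kernel = np.array([1.0e9, 3.0e7, 1.5e7, 9.0e2, 1.0])  # last entry = bias
print(f"predicted runtime: {new_kernel @ w:.2f} ms")
```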