High-Performance Graph Processing
- Wednesday, 24 April 2024, 13:00
- INF 368, room 531
- Matthias Hauck
Address
INF 368, Room 531 (5th floor)
Organizer
Dean
Event Type
Doctoral Examination
With the increasing importance of graphs and the algorithms that operate on them in many new use cases, and the growing availability of data, the number of applications relying on a graph data model is rising. Graphs are used not only in sciences such as biology, but also in the public and commercial sectors to represent, for example, social, business, or traffic networks. Applications access these graphs through queries that apply graph algorithms to them, while running on systems shared by multiple users.
For performance, this means that overall system throughput is more important than the elapsed time of a single, isolated graph query. The issue is that parallelizing a graph algorithm efficiently is already complex due to the data-dependent nature of many graph algorithms and the broad range of graph sizes and properties. Likewise, there has been a trend toward more complex systems with an ever-increasing number of cores and more complex cache and memory structures and technologies. To deal with this complexity, graph processing engines have been proposed that provide primitives for efficient parallel graph processing. Unfortunately, these systems typically focus on executing a single query at a time, so they do not attempt to improve overall throughput under concurrent usage.
In this work, we investigate high-performance, throughput-oriented graph query processing on scale-up systems. We first examine the behavior of a state-of-the-art graph processing engine for different graph sizes and algorithms when multiple instances of it run concurrently on the same system. These experiments provide insight into the potential, but also the issues, of concurrent execution. Afterwards, we investigate parallel execution within a single query, in particular contention on atomic update operations, which are common in many graph algorithms. We propose a simple buffering scheme for atomic updates to reduce contention and analyze it in different scenarios that mimic common update patterns.
Finally, based on the previous investigations, we propose a throughput-oriented runtime scheduler for graph queries that automatically controls the degree of parallelization to avoid inefficiencies. The underlying concept consists of multiple parts:
- sampling is used to determine graph properties,
- cost estimates and parallelization boundaries are derived from graph, algorithm, and system properties,
- suitable work packages are generated based on the estimated cost, and
- the packages are executed by a runtime component that controls the parallelism.
We evaluate the proposed concept using different algorithms on synthetic and real-world data sets, with up to 16 concurrent sessions (queries). The results demonstrate robust performance across these configurations that is always close to, or even slightly ahead of, manually optimized implementations.