Stencil Computation Auto-Tuning via Dataflow Graph Transformations

by Erik Lagercrantz

Institution: Uppsala University
Year: 2015
Keywords: Engineering and Technology; Teknik och teknologier
Record ID: 1364976
Full text PDF: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-252358


Fine-tuning of optimizations such as loop tiling and array padding is typically required in order to achieve good performance in memory-intensive numerical applications on current computer architectures. Auto-tuning techniques can automate this traditionally laborious task and achieve good performance, but the techniques have so far been application-specific and difficult to adapt to new applications. This thesis investigates developing a framework and language supporting a more general purpose auto-tuner. A high level language capable of expressing programs that deal with multi-dimensional arrays was created to serve as a program representation for use with the auto-tuner. The language is based on the dataflow model where a program is represented as a directed graph describing the flow of data between operations, and uses a novel iteration model based on a hierarchy of flows leading to simple representations of certain loop tiling and parallelization transformations. The iteration construct proved useful by allowing composition of tiling and parallelization transformations, but may not have been a good choice since it made it difficult to validate transformation opportunities. The auto-tuner was evaluated by applying it to a simple stencil computation program. It was confirmed that memory access optimizations do matter for parallel stencil computations on mainstream multicore architectures, and that the impact of optimizations is architecture-dependent. Auto-tuning was shown to be a useful strategy for applying a range of parameterized optimizations. The presented auto-tuner lacks an algorithm for identifying valid optimization opportunities, and must be told what transformations to apply and where to apply them. Still, the use of a high-level dataflow language did simplify auto-tuning by removing the need for writing a program-specific code generator.