Informatik-Vollmer
The goal of this dissertation was to develop methods that improve the runtime of parallel programs through compiler optimizations. To this end, 1. the theoretical data-flow analysis background for parallel programs was derived; 2. based on this, practical and efficient algorithms for analysing parallel programs were developed; and 3. it was shown how well-known optimizing algorithms can be implemented in a compiler for parallel programming languages.
The three main results of the work were:
- the data-flow analysis theory for parallel programs (pDFA)
- data structures (pCFG, pSSA) and algorithms to perform the data-flow analysis
- implementation of two very effective optimization techniques for parallel programs: elimination of common subexpressions and lazy code motion.
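The bitvector framework behind these results can be sketched in a few lines: each analysed property contributes one bit, and every node's effect is the classic gen/kill transfer function. The following is a minimal illustration (names and bit layout are our own, not taken from the dissertation):

```python
# Illustrative sketch of a gen/kill bitvector transfer function.
# Bits are packed into a Python int; bit i stands for expression i.

def transfer(in_bits: int, gen: int, kill: int) -> int:
    """Classic unidirectional bitvector step: out = gen | (in & ~kill)."""
    return gen | (in_bits & ~kill)

# Example: bits 0b101 hold on entry; the node kills expression 0
# (bit 0) and generates expression 1 (bit 1).
out = transfer(0b101, gen=0b010, kill=0b001)
print(bin(out))  # 0b110
```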
We consider parallel programs with shared memory and interleaving semantics, for which we show how to construct optimal analysis algorithms for unidirectional bitvector problems that are as efficient as their purely sequential counterparts and can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a consequence of a new Kam/Ullman-style Coincidence Theorem. Thus, using our method, the standard algorithms for sequential programs computing liveness, availability, very busyness, reaching definitions, or definition-use chains, as well as the analyses for performing code motion, assignment motion, partial dead-code elimination or strength reduction, can easily be transferred to the parallel setting at almost no cost.
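As a reminder of what such a standard sequential algorithm looks like, here is a minimal liveness computation over a three-node straight-line CFG, iterated to a fixed point (the node names and the round-robin formulation are our own, chosen for brevity):

```python
# Minimal backward liveness analysis on a tiny CFG, as a bitvector
# (set) problem: live_in(n) = use(n) | (live_out(n) - def(n)).

cfg = {            # successor lists: n1 -> n2 -> n3
    "n1": ["n2"],
    "n2": ["n3"],
    "n3": [],
}
use = {"n1": {"a"}, "n2": {"a", "b"}, "n3": {"b"}}
defs = {"n1": {"b"}, "n2": {"c"}, "n3": set()}

live_in = {n: set() for n in cfg}
changed = True
while changed:                      # iterate until a fixed point
    changed = False
    for n in cfg:
        succs = cfg[n]
        live_out = set().union(*(live_in[s] for s in succs)) if succs else set()
        new = use[n] | (live_out - defs[n])
        if new != live_in[n]:
            live_in[n], changed = new, True

print(sorted(live_in["n1"]))  # ['a']  (b is defined in n1 before its use)
```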
Data flow analysis is the prerequisite for performing optimizations such as code motion of partially redundant expressions on imperative sequential programs. To apply these transformations to parallel imperative programs, the notion of data flow must be extended to concurrent programs. The additional parallel source-language features are: shared memory and nested parallel statements (PAR). The underlying interleaving semantics of the concurrently executed processes results in the so-called state space explosion which, at first sight, prevents the computation of the meet-over-all-paths solution needed for data flow analysis. For the class of bitvector data flow problems we can show that not all interleavings are needed to compute the meet-over-all-paths solution. Based on this, we can give simple data flow equations representing the data flow effects of the PAR statement. The definition of a parallel control flow graph leads to an efficient extension of Kildall's algorithm that computes the data flow of a concurrent program. The time complexity is the same as for analysing a "comparable" sequential program.
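The state-space explosion mentioned above is easy to quantify: two independent processes with n and m atomic steps already have C(n+m, n) distinct interleavings. A quick check (our own illustration, not part of the original abstract):

```python
# Counting interleavings of two independent instruction sequences:
# the number of ways to merge n steps with m steps is choose(n+m, n),
# which grows combinatorially -- the "state space explosion".
from math import comb

def interleavings(n: int, m: int) -> int:
    """Number of interleavings of an n-step and an m-step process."""
    return comb(n + m, n)

print(interleavings(3, 3))    # 20
print(interleavings(10, 10))  # 184756
```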
Code motion is well known as a powerful technique for the optimization of sequential programs. It improves runtime efficiency by avoiding unnecessary recomputations of values, and it is even possible to obtain computationally optimal results, i.e., results where no program path can be improved any further by means of semantics-preserving code motion. In this paper we present a code motion algorithm that, for the first time, achieves this optimality result for parallel programs. Fundamental to it is the framework of [Knoop, Steffen, Vollmer: Parallelism for Free: Efficient and Optimal Bitvector Analyses for Parallel Programs, TOPLAS, May 1996], which shows how to perform optimal bitvector analyses for parallel programs as easily and as efficiently as for sequential ones. Moreover, the analyses can easily be adapted from their sequential counterparts. This is demonstrated here by constructing a computationally optimal code motion algorithm for parallel programs by systematically extending its sequential counterpart, the busy code motion transformation of [Knoop, Rüthing, Steffen: Lazy Code Motion, PLDI 1992; Code Motion: Theory and Practice, TOPLAS 16(4), 1994].
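To make the effect of code motion concrete, here is a schematic before/after pair, with plain Python standing in for an intermediate representation. The transformation shown is ordinary hoisting of a loop-invariant, redundantly recomputed expression, meant only to illustrate the idea, not the paper's algorithm:

```python
# Schematic effect of code motion: a + b never changes inside the
# loop, yet the "before" version recomputes it on every iteration.

def before(a, b, xs):
    out = []
    for x in xs:
        out.append(x * (a + b))   # a + b recomputed each iteration
    return out

def after(a, b, xs):
    t = a + b                     # hoisted: computed once, reused
    out = []
    for x in xs:
        out.append(x * t)
    return out

print(before(2, 3, [1, 2]) == after(2, 3, [1, 2]))  # True
```

Semantics preservation means exactly this: both versions compute the same values on every path, but the optimized one performs fewer evaluations of `a + b`.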
In this paper we show how to construct optimal bitvector analysis algorithms for parallel programs with shared memory that are as efficient as their purely sequential counterparts, and which can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a consequence of a new Kam/Ullman-style Coincidence Theorem. Thus, using our method, the standard algorithms for sequential programs computing liveness, availability, very busyness, reaching definitions, or definition-use chains, or performing partially redundant expression and assignment elimination, partial dead code elimination or strength reduction, can be transferred straightforwardly to the parallel setting at almost no cost.
Data flow analysis is the prerequisite for performing optimizations such as common subexpression elimination or code motion of partially redundant expressions on imperative sequential programs. To apply these transformations to parallel imperative programs, the notion of data flow must be extended to concurrent programs. The additional source-language features are: common address space (shared memory), nested parallel statements (PAR), or-parallelism, critical regions and message passing. The underlying interleaving semantics of the concurrently executed processes results in the so-called state space explosion which, at first sight, prevents the computation of the meet-over-all-paths solution needed for data flow analysis. For the class of one-bit data flow problems (also known as bitvector problems) we can show that not all interleavings are needed to compute the meet-over-all-paths solution. Based on that, we can give simple data flow equations representing the data flow effects of the PAR statement. The definition of a parallel control flow graph leads to an efficient extension of Kildall's algorithm to compute the data flow of a concurrent program. The time complexity is the same as for analyzing a "comparable" sequential program.
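The key point, that a PAR statement's data flow effect can be computed from the branch effects alone, can be sketched for a must-problem such as availability. The sketch below deliberately collapses each branch into a single gen/kill pair, which is our own simplification for illustration, not the paper's node-level equations:

```python
# Simplified illustration: combining the gen/kill effects of two PAR
# branches for a must-problem (e.g. availability) without enumerating
# interleavings.  Each branch is collapsed to one gen/kill pair here.

def par_effect(in_bits, gen1, kill1, gen2, kill2):
    # A bit holds after PAR only if no interleaving can destroy it:
    # either one branch generates it and the other never kills it,
    # or it held on entry and neither branch kills it.
    survive = in_bits & ~kill1 & ~kill2
    from_b1 = gen1 & ~kill2
    from_b2 = gen2 & ~kill1
    return survive | from_b1 | from_b2

# Bit 0 is generated by branch 1 but killed by branch 2, so some
# interleaving destroys it; bit 1 held on entry and nobody kills it.
print(bin(par_effect(0b010, gen1=0b001, kill1=0, gen2=0, kill2=0b001)))  # 0b010
```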
In this paper we show how to construct optimal bitvector analysis algorithms for parallel programs with shared memory that are as efficient as their purely sequential counterparts, and which can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a consequence of a new Kam/Ullman-style Coincidence Theorem. Thus, the important merits of sequential bitvector analyses survive the introduction of parallel statements.
Traditional data flow analysis methods are designed for sequential programs and may therefore fail when applied to control-flow-parallel imperative programs that share memory and are based on the MIMD computer model. Current approaches mostly use a copy-in/copy-out semantics when entering or exiting a process to model shared memory. To avoid this restriction, this paper extends the notion of program execution paths. Selecting specific paths out of the set of all possible paths allows us to give simple data flow equations which are proved to be equal to the meet-over-all-paths solution. Since these data flow equations are extensions of the sequential ones, they fit in very well with traditional optimization methods. An example shows that the code generator of a compiler, as well as a reordering assembler, needs this kind of data flow analysis to avoid unnecessary memory barrier instructions and to produce correct instruction reorderings.
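A hedged sketch of the kind of situation meant here: a purely sequential optimizer would treat a second read of the same variable as redundant and reuse the first value, but with a concurrently running writer the two reads may legitimately differ. The code below is our own illustration, using a dict as stand-in "shared memory" and a function call as stand-in for an interleaved process step:

```python
# Illustration: why sequential redundancy elimination is unsound for
# shared memory.  shared["x"] is read twice; a sequential optimizer
# would reuse the first value, but another process may write between
# the two loads.

shared = {"x": 1}

def writer():
    shared["x"] = 2        # stands in for a step of a concurrent process

def demo():
    shared["x"] = 1
    a = shared["x"]        # first load of the shared variable
    writer()               # interleaved write by the "other process"
    b = shared["x"]        # "redundant" second load -- but x changed
    return a, b

print(demo())  # (1, 2): reusing a for b would be a miscompilation
```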