
Structured Parallel Programming with Trees (木を用いた構造化並列プログラミング)

A dissertation

by

Sato, Shigeyuki (佐藤 重幸)

Submitted to the

Department of Communication Engineering and Informatics

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the subject of

Engineering

The University of Electro-Communications
March 2015

Committee

Professor Hideya Iwasaki

Professor Rikio Onai
Professor Tetsu Narumi
Professor Masayuki Narumi
Associate Professor Yasuichi Nakayama

For refereeing this doctoral dissertation, entitled Structured Parallel Programming with Trees.

Copyright © 2015 Sato, Shigeyuki (佐藤 重幸). All rights reserved.

Abstract

High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms generally deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs are inherently difficult to handle. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a tool of programming. We therefore deal with abstractions of tree-based computations. Our study has started from Matsuzaki's work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation aspect. Specifically, we have dealt with two issues. First, we have implemented loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we have implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and enables programmers to use tree skeletons with no burden. The practicality of tree skeletons, however, has not improved sufficiently. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs. Program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen's high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan's formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality. A naive parallel neighborhood computation without locality enhancement causes a lot of cache misses. The divide-and-conquer paradigm is known to be useful also for locality enhancement. We therefore have applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.

Quick Overview (originally in Japanese): In parallel programming, high-level abstractions have not been sufficiently established, and computations that use complicated structures are lumped together as irregular algorithms. Computations on tree structures, however, admit a natural divide-and-conquer decomposition by definition, so systematic abstractions can be expected for them. This study therefore addresses abstractions for parallel computations with trees. Starting from Matsuzaki's work on tree skeletons, we improved their usability by enriching their implementation aspect. Their practicality, however, remained insufficient, so on the basis of the insights and techniques obtained there, we tackled two problem domains. In the domain of program analysis, we developed syntax-directed divide-and-conquer methods. For neighborhood computations in space, we proposed cache-efficiency optimization techniques.


Contents

Abstract i

Acknowledgments vii

1 Introduction 1
1.1 Background and Motivation ...... 1
1.2 Overview of This Dissertation ...... 2
1.3 Contributions and Organization of This Dissertation ...... 2

I Programming With Tree Skeletons 5

2 Tree Skeletons 7
2.1 What Is the Skeleton? ...... 7
2.2 Formalization with Data Structures ...... 8
2.3 Divide and Conquer on Segmented Trees ...... 10
2.3.1 Segmented Trees ...... 10
2.3.2 Properties of Segmented Trees ...... 11
2.4 Tree Contraction Operations ...... 11
2.5 Definitions of Tree Skeletons ...... 12

3 Interface Between Data Structures and Skeletons 17
3.1 Introduction ...... 17
3.2 Tree Skeleton Library ...... 18
3.3 Our Interface of Trees ...... 19
3.3.1 Iterators over a Tree ...... 19
3.3.2 Global Iterators and Local Iterators ...... 19
3.3.3 Tree Construction ...... 20
3.4 Our Implementation ...... 20
3.4.1 Template-based Implementation of Our Tree Interface ...... 21
3.4.2 Generic Implementation of Tree Skeletons ...... 23
3.4.3 Type Checking of Tree Skeletons ...... 23
3.5 Benefits of Our Design ...... 25
3.5.1 Specialization of Representation ...... 25
3.5.2 Multiple Views ...... 26
3.6 Preliminary Experiments ...... 27
3.6.1 Overhead of Our Interface ...... 27
3.6.2 Flexibility from Our Interface ...... 28
3.6.3 Avoidance of Data Restructuring ...... 28
3.7 Related Work ...... 29
3.8 Conclusion ...... 30


4 Tree Skeleton Hiding 31
4.1 Introduction ...... 31
4.2 Deriving Operators from Multilinear Computation ...... 32
4.2.1 Multilinear Computation on Trees ...... 32
4.2.2 Auxiliary Operators in Multilinear Computation ...... 33
4.3 Proposed Parallelizer ...... 34
4.3.1 Fundamental Design ...... 34
4.3.2 Compiler Directives for Parallelization ...... 35
4.3.3 Algorithm Descriptions ...... 37
4.3.4 Implementation Overview ...... 38
4.4 Underlying Binary Tree Skeleton ...... 42
4.4.1 Definition ...... 42
4.4.2 Specialization to Multilinear Computations ...... 44
4.4.3 Library Implementation ...... 45
4.5 Preliminary Experiments on Parallel Execution ...... 46
4.6 Benefits of Hiding Tree Skeletons ...... 46
4.6.1 Entanglement of Operators for Tree Skeletons ...... 47
4.6.2 Trade-off between Generality and Simplicity ...... 47
4.6.3 Abstraction Layer between Specifications and Skeletons ...... 47
4.6.4 Implicitly Skeletal Programming on Trees ...... 48
4.7 Related Work ...... 48
4.8 Conclusion ...... 49

5 Limitations of Tree Skeletons 51

II Syntax-Directed Programming 53

6 Syntax-Directed Computation and Program Analysis 55
6.1 Motivation for Program Analysis ...... 55
6.2 High-Level Program Analysis ...... 55

7 Syntax-Directed Divide-and-Conquer Data-Flow Analysis 57
7.1 Introduction ...... 57
7.2 Formalization of Data-Flow Analysis ...... 58
7.3 Syntax-Directed Parallel DFA Algorithm ...... 60
7.3.1 Syntax-Directed Construction of Summaries ...... 60
7.3.2 Calculating Join-Over-All-Paths Solutions ...... 64
7.3.3 Construction of All-Points Summaries ...... 64
7.3.4 Interprocedural Analysis ...... 65
7.4 Elimination of Labels ...... 65
7.5 Experiments ...... 66
7.5.1 Prototype Implementations ...... 66
7.5.2 Experimental Setup ...... 66
7.5.3 Experimental Results ...... 67
7.6 Related Work ...... 68
7.7 Conclusion ...... 68

8 Syntax-Directed Construction of Value Graphs 69
8.1 Introduction ...... 69
8.2 Value Numbering and Value Graphs ...... 70
8.2.1 Value Numbering ...... 70
8.2.2 ϕ-Function ...... 72
8.2.3 Definition and Formalization of Value Graphs ...... 74

8.3 Construction Algorithm to the While Language ...... 74
8.3.1 Syntax-Directed Formulation ...... 75
8.3.2 Implementation Issues ...... 77
8.4 Taming Goto/Label Statements ...... 78
8.4.1 Difficulty of Goto/Label Statements ...... 78
8.4.2 Syntax-Directed Approach to Single-Entry Multiple-Exit ASTs ...... 79
8.4.3 Tolerating Multiple-Entry ASTs by Reduction ...... 82
8.5 Related Work and Discussion ...... 84
8.6 Conclusion ...... 86

9 Lessons from Syntax-Directed Program Analysis 87

III Programming With Neighborhood 89

10 Neighborhood Computations 91

11 Time Contraction for Optimizing Stencil Computation 93
11.1 Introduction ...... 93
11.2 Loop Contraction: Reduction of Iterations ...... 94
11.2.1 Explanation by Example ...... 94
11.2.2 Target of Application ...... 95
11.2.3 Problem Statement ...... 96
11.3 Time Contraction: Optimization for Stencil Loops ...... 96
11.3.1 Our Key Observations and Solution ...... 96
11.3.2 Precomputing Stencil Coefficients ...... 97
11.3.3 Advantages of Time Contraction ...... 99
11.3.4 Complexity Analysis ...... 100
11.3.5 Extension and Applicability ...... 101
11.3.6 Drawback: Lowering Precision ...... 102
11.4 Tuning Based on Time Contraction ...... 103
11.4.1 Parametricity on k ...... 103
11.4.2 Affinity for Unroll-and-Jam ...... 103
11.4.3 Affinity for Divide-and-Conquer ...... 104
11.5 Experiments ...... 105
11.5.1 Experimental Library ...... 105
11.5.2 Experimental Settings ...... 105
11.5.3 Experimental Results ...... 105
11.6 Discussion ...... 107
11.6.1 Related Work ...... 107
11.6.2 Future Work ...... 109

12 Locality Enhancement based on Segmentation of Trees 113
12.1 Introduction ...... 113
12.2 Preliminaries ...... 114
12.2.1 Segmentation of Trees ...... 114
12.2.2 m-Bridge ...... 114
12.3 Iterative Tree Traversal ...... 115
12.4 Proposed Data Structures ...... 117
12.4.1 Simply Blocked Tree ...... 117
12.4.2 Recursively Segmented Tree ...... 118
12.4.3 Buffered Recursively Segmented Tree ...... 119
12.4.4 Parallel Implementation ...... 121
12.5 Applications ...... 122

12.5.1 Batch Processing with Range Queries ...... 122
12.5.2 N-body Problem ...... 123
12.5.3 Ray Tracing ...... 124
12.5.4 Nearest-Neighbor Classifiers ...... 125
12.6 Related Work ...... 126
12.6.1 Enhancing Locality in Tree Traversal ...... 126
12.6.2 Cache-Efficient/-Oblivious Trees ...... 126
12.6.3 Iterative Search ...... 127
12.6.4 Data-Parallel Skeletons ...... 127
12.7 Conclusion ...... 127

13 Towards Neighborhood Abstractions 129

14 Conclusion 131
14.1 Summary of Contributions ...... 131
14.2 Retrospection ...... 131

Bibliography 133

Acknowledgments

Many people helped me complete my doctoral research. I would like to express my gratitude to them here.

First of all, I would like to thank my supervisor, Professor Hideya Iwasaki, for his patience in providing me with an ideal milieu for research and in guiding me with his educational consideration even in matters of my life as well as research. His great tolerance was indispensable for my research activities.

I would also like to express my deepest gratitude to the co-authors of the publications in my doctoral research, Lecturer Akimasa Morihata of The University of Tokyo and Associate Professor Kiminori Matsuzaki of Kochi University of Technology. They were more than co-authors for me. They always kindly taught me both technical and non-technical perspectives of research and correct attitudes toward research. They even gave me their time to discuss my personal concerns. They have been exemplary researchers who have had an invaluable influence on me.

I am very grateful to Professor Zhenjiang Hu of National Institute of Informatics, Assistant Professor Kento Emoto of Kyushu Institute of Technology, and Professor Munehiro Takimoto of Tokyo University of Science. Professor Hu always encouraged me in my research, expecting my success, and gave me many opportunities to attend seminars, workshops, and conferences, even with financial aid. Assistant Professor Emoto's discussions, comments, and criticisms on my studies and presentations were always helpful to me. Professor Takimoto always encouraged me in my studies on data-flow analysis and gave me his time to discuss them with me.

I am very thankful to Associate Professor Tomoharu Ugawa, Associate Professor Keisuke Nakano, and Associate Professor Yoshihiro Oyama. Associate Professor Ugawa's comments on implementation issues in software systems, especially memory management, were insightful and beneficial to me. Associate Professor Nakano's comments on my presentations were correct, reasonable, and helpful to me. Associate Professor Oyama took an interest in my observations on parallel computing and gave me his time to discuss them with me.

I am grateful to Associate Professor Miyako Satoh of The University of Electro-Communications for discussions on English and for her kind instruction on my linguistic usage. I am thankful to Assistant Professor Kazutaka Matsuda of The University of Tokyo and my colleague Yasunao Takano for their talks and discussions with me, which were a beneficial, refreshing, or enjoyable time for me.

Last but not least, my heartfelt thanks go to the members of the Iwasaki laboratory, the members of the IPL seminar, and many others whom I cannot name here, for their mental stimulation.


Chapter 1

Introduction

1.1 Background and Motivation

In the history of computer science, parallel computing and parallel machines are classic topics. In the research on programming languages, parallel programs (i.e., programs exploiting parallel machines) and parallel programming (i.e., developing parallel programs) are never new topics. However, the current status of parallel programming is very immature. The most popular language used in practice for describing parallel programs is FORTRAN, which is one of the oldest programming languages, together with MPI, which is a low-level communication construct. Abstractions for parallel programming are very limited and low-level compared with sequential (i.e., non-parallel) programming. Therefore, parallel programming is considered a research problem for the next 50 years [HPP09]. Automatic parallelization and optimization are closely relevant to parallel programming because they promote high-level programming that is oblivious of low-level issues that compilers can resolve. Automatic parallelization and optimization of index access to arrays in nested for loops were well studied [AK01, BHRS08, GGL12, YFRS13]. For good or ill, they have helped FORTRAN survive so far. Meanwhile, although accumulative computation is generally difficult to parallelize, methods for automatic parallelization of accumulative linear recursions were developed [MMM+07, MM10, SI11a, FG94, RF00]. In this sense, the foundations of the abstractions for parallel programming on linear structures are now available. However, more complicated structures are difficult to deal with in parallel programming. Foundations for dealing with them systematically have not been solidly established. Actually, problems/algorithms that involve structures more complicated than random-access arrays, e.g., linked structures and sparse representations, are called irregular ones in the context of parallel programming1. A systematic approach to implementing irregular algorithms is a big issue in parallel programming and a hot topic in recent studies [KBI+09]. The computational structure in irregular problems forms a graph. Since (general) graphs do not have recursive structures, irregular algorithms are not recursively defined in general. This means that irregular algorithms are generally not contained in the divide-and-conquer paradigm. Divide and conquer is, however, essential to parallel computing and has been used as the key notion extensively in parallel programming [BFGS12, FLR98, BCH+94, Ski93]. Irregular problems/algorithms are therefore inherently difficult to deal with in parallel programming. Instead of general irregular algorithms, we focus on tree-based algorithms. Although tree-based algorithms are usually considered irregular ones, their computational structures are based on trees but not (general) graphs. Trees have recursive structures and tree-based algorithms utilize this trait. As a result, the large part of a tree-based algorithm is recursively defined and adopts the divide-and-conquer paradigm. Tree-based algorithms are therefore relatively tractable in parallel programming. Computations with trees are of high importance. Trees are fundamental data structures in functional programming because they by definition match recursive functions. It is no exaggeration to say that computations with trees cover the core part of functional programming. In parallel programming, trees

1http://www.cs.cmu.edu/~scandal/alg/whatis.html

work as abstractions for load balancing [MM11b, MMHT09, Mor13] and scheduling [LKK13] because they can represent divide-and-conquer structures directly. These facts promise that high-level abstractions and systematic methods for computations with trees are useful for a wide range of applications. In particular, we can even expect that these lead to a versatile method for divide-and-conquer load balancing.

1.2 Overview of This Dissertation

In this dissertation, we deal with parallel programming with trees in a divide-and-conquer manner, which leads to good load balancing. Our study has started from Matsuzaki's work [Mat07b] on tree skeletons [Ski96, GCS94], which are divide-and-conquer parallel patterns on trees. Although Matsuzaki's work investigated the theoretical aspect of tree skeletons, it left much room for improvement on the practical aspect, especially usability. We therefore have improved the usability of tree skeletons in two aspects. One is the modularity and flexibility of library implementation [Mat07a]. We have achieved loose coupling between tree data structures and tree skeletons in our library implementation. Our library is consequently more flexible than existing ones. The other is the complicated APIs of tree skeletons. We have developed a parallelizer that transforms sequential recursive functions in C into tree skeleton calls on the basis of the formalization by Matsuzaki et al. [MHT06]. Our parallelizer hides the complicated API of tree skeletons from programmers and enables programmers to use tree skeletons implicitly. Although we have improved the usability of tree skeletons in these two aspects successfully, programming with tree skeletons is still not useful in practice. We have noticed that the purely structural approach that tree skeletons adopt is per se counter-intuitive and that it is important to examine the interpretations of trees and the underlying data of trees. This observation suggests the fundamental importance of application domains. We therefore have determined to distance ourselves from Matsuzaki's work on tree skeletons and focused on two domains: program analysis and spatial computation. We have dealt with program analysis based on abstract syntax trees (ASTs) because its computational structure is similar to tree skeletons. Specifically, we have dealt with data-flow analysis and value-graph construction, which are the foundations of compiler optimizations. In AST-based approaches, goto/label statements are troublesome because a syntax-directed handling of them is difficult. From the perspective of irregular algorithms, goto/label statements cause computations on general graphs and become an obstacle to divide-and-conquer computations. In fact, compilers use control-flow graphs, which are general directed graphs, as input programs to most kinds of program analysis. As a result, they are difficult to parallelize in a divide-and-conquer manner. On the basis of the notion of Rosen's high-level data-flow analysis [Ros77, Ros80], which does not deal with goto/label statements, we have developed mostly divide-and-conquer methods for data-flow analysis and value-graph construction for ASTs containing few goto/label statements. We have dealt with spatial computation based on neighborhood (or simply neighborhood computation) because of its practical importance in various applications. Although this computation is usually trivial to perform in a divide-and-conquer manner, its naive divide-and-conquer implementation is insufficient. Because cache efficiency is of high importance for this computation, a cache-efficient divide-and-conquer implementation like cache-oblivious algorithms [FLPR99, BGS10] is desired. We therefore have investigated cache-efficient divide-and-conquer approaches to two kinds of neighborhood computations.
We first have dealt with stencil computation, which is a regular array-based computation but very popular in scientific computing, and have developed a linear algebraic optimization for enhancing its cache efficiency. We secondly have dealt with iterative traversal of space-partitioning trees, which is a typical algorithmic pattern found in N-body problems and machine learning, and have developed a skeleton-based technique for enhancing its cache efficiency. Through the practice of parallel programming in the two domains, we have confirmed our observation on parallel programming with trees.

1.3 Contributions and Organization of This Dissertation

This dissertation consists of three parts:

• Part I deals with programming with tree skeletons. This part is overall based on Matsuzaki’s work [Mat07b] on tree skeletons (Chapter 2). We deal with problems on the usability of tree skeletons. We present a tree interface to achieve loose coupling between the implementation of trees and that of tree skeletons (Chapter 3). We present a parallelizer for recursive functions in C to enable implicit use of tree skeletons (Chapter 4). We describe observations on programming with trees from limitations of tree skeletons (Chapter 5).

• Part II deals with syntax-directed programming in the domain of high-level program analysis (Chapter 6). On the basis of the notion of Rosen's high-level data-flow analysis [Ros77, Ros80], we present syntax-directed methods for data-flow analysis (Chapter 7) and for value-graph construction (Chapter 8). These methods perform mostly in a divide-and-conquer manner by taming goto/label statements.

• Part III deals with cache-efficient divide-and-conquer programming in the domain of neighborhood computations (Chapter 10). We present a linear algebraic optimization for enhancing the cache complexity of stencil computation (Chapter 11) and present a skeleton-based technique for enhancing the cache complexity of iterative traversal of space-partitioning trees (Chapter 12).

The individual results above have been published or presented. Chapter 3 was published in [SM14a], Chapter 4 was published in [SM13], Chapter 7 was published in [SM14b], Chapter 8 was presented in [Sat14b], Chapter 11 was presented in [SI11b], and Chapter 12 was presented in [Sat14a].

Part I

Programming With Tree Skeletons


Chapter 2

Tree Skeletons

In this chapter, we introduce tree skeletons [Ski96, GCS94], which are background knowledge for Part I. Refer to Matsuzaki’s Ph.D. thesis [Mat07b] for more details.

2.1 What Is the Skeleton?

Algorithmic skeletons [Col89] (or simply skeletons) are patterns of parallel computing. Skeletons are high-level abstractions for parallel programming. By composing skeletons, we can develop parallel programs in a high-level and structured manner [DGTY95]. There are roughly two kinds of skeletons [RG02, GVL10]: task-parallel ones and data-parallel ones. Task-parallel skeletons, or task skeletons for short, represent and manipulate tasks. Typical task skeletons are pipe and farm, which respectively correspond to serial composition of tasks and parallel composition. A structure that task skeletons construct is an analogy to a circuit. Task skeletons exploit parallelism of computations represented by this circuit; e.g., pipe exploits parallelism in pipelining. We use the term parallelism as a synonym for independence of (sub)computations. Although serial compositions by pipe do not have so-called parallelism in their circuits, pipelined computations on their circuits have parallelism. Data-parallel skeletons manipulate collections in parallel. More precisely, they are collective parallel operations on a certain data structure. The most popular data-parallel skeleton is map: it takes a unary function and a collection, and yields a new collection consisting of the results that we obtain by applying the given function to each element of the given collection. map can be defined over various data structures. For example, if map takes a list and yields a new list, this map is called a list skeleton (i.e., a collective parallel operation over lists). Data-parallel skeletons are thus classified by their target data structures. We deal with data-parallel skeletons in this dissertation. Readers may consider task skeletons to be more general than data-parallel skeletons. This is true in the sense that task skeletons are independent of target data structures. Although individual data-parallel skeletons have more specific purposes than task skeletons, data-parallel skeletons per se are no less versatile. The difference between the two is essentially in the formalization of computation. When we use data-parallel skeletons, we first formalize the target computation as a data structure. For example, if the specification of the target computation is a linear recursion (or, a sequence of functions), we may use lists to formalize it. We then use list skeletons to describe the specification as a parallel program. That is, data-parallel skeletons do not always handle data structures that are input/output in target computations. Formalization with data structures is essential for programming with data-parallel skeletons. This introduces rich computational structures into specifications. In this sense, programming with data-parallel skeletons is more structured than programming with task skeletons, and therefore we deal with it.
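As a concrete illustration only (not part of any skeleton library's API), the following is a minimal sequential sketch of the map data-parallel skeleton over std::vector; a parallel implementation may split the vector among processors because every application of f is independent.

#include <vector>

// Sequential specification of the map skeleton: apply f to every element.
// Each application of f is independent, which is the parallelism that a
// data-parallel implementation exploits by splitting the collection.
template <typename F, typename A>
auto map_skeleton(F f, const std::vector<A>& xs) -> std::vector<decltype(f(xs[0]))> {
    std::vector<decltype(f(xs[0]))> ys;
    ys.reserve(xs.size());
    for (const A& x : xs) ys.push_back(f(x));
    return ys;
}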


2.2 Formalization with Data Structures

To explain the importance of computational structures formalized by data structures, we briefly introduce lists and list skeletons [Ski93] based on functional programming. We write a list by enumerating elements with commas and enclosing them with brackets; e.g., [0, 1, 2]. We call a list in this form a list literal. We first describe a notation for data types of trees in this dissertation. We use a restricted form of context-free grammars for defining a data type as a nonterminal symbol. A lowercase term (e.g., α and x) denotes a metavariable. A capitalized term that is not defined in grammars, i.e., does not appear in the left-hand side of any production rule, denotes a terminal symbol, and a capitalized term that is defined in grammars denotes a nonterminal symbol that denotes a data type. Terminal symbols may have argument lists like functions to specify associated metavariables and symbols. The production rules of any nonterminal symbol can be distinguished by its leftmost terminal symbol as in LL grammars. Pattern matching on the data type is therefore deterministic. The data type of lists is usually defined as the following grammar:

List α = Cons(x, List α), where x is of type α,

List α = Nil.

Here, x denotes a metavariable over list elements. Nil denotes the empty list (i.e., []). Cons(x, y) denotes a list whose head is x and whose tail is y; e.g., Cons(a1, [a2, a3]) = [a1, a2, a3]. This is called the cons list. For example, a list (literal) [a1, a2, a3] is a synonym of the unique cons list:

Cons(a1, Cons(a2, Cons(a3, Nil))).

The data type of lists can also be defined as the following grammar:

JList α = Join(JList α, JList α),

JList α = Singleton(x), where x is of type α,

JList α = Nil.

Singleton(x) denotes a singleton list (i.e., [x]). Join(x, y) denotes the concatenation of two lists x and y; Join([a1], [a2, a3]) = [a1, a2, a3]. This is called the join list. In contrast to the cons list, [a1, a2, a3] can be represented by, for example, the following four join lists:

Join(Singleton(a1), Join(Singleton(a2), Singleton(a3))),

Join(Join(Singleton(a1), Singleton(a2)), Singleton(a3)),

Join(Join(Singleton(a1), Singleton(a2)), Join(Nil, Singleton(a3))), and

Join(Singleton(a1), Join(Singleton(a2), Join(Singleton(a3), Nil))).

We can understand this equivalence from the algebraic property of the interpretation of Join, i.e., the concatenation of two lists. For example, we can define a concatenation operator ++ over lists by using list literals as follows:

[a1, . . . , an] ++ [b1, . . . , bm] = [a1, . . . , an, b1, . . . , bm],
x ++ [] = x,
[] ++ x = x,

where m, n ≥ 1 and x denotes a list. It is important that ++ is associative; for any x, y, and z, x ++ (y ++ z) = (x ++ y) ++ z, and that [] is the identity of ++. By using ++, we can confirm the equivalence among the four join lists as follows:

[a1] ++ ([a2] ++ [a3]) = [a1, a2, a3],

([a1] ++ [a2]) ++ [a3] = [a1, a2, a3],

([a1] ++ [a2]) ++ ([] ++ [a3]) = [a1, a2, a3],

[a1] ++ ([a2] ++ ([a3] ++ [])) = [a1, a2, a3].

This means that the four join lists are contained in the same equivalence class modulo interpreting Join as ++. From the perspective of data structures, the associativity of ++ and its identity [] enable us to transform join lists within the same equivalence class. A high degree of freedom in structure among lists in an equivalence class is of primary importance for parallel programming. We can improve structural parallelism of join lists by balancing them. For example, we consider map, which can be defined with list literals as follows:

map(f, [a1, . . . , an]) = [f(a1), . . . , f(an)], where n ≥ 1,
map(f, []) = [].

We can define map over List α as

map(f, Cons(x, y)) = Cons(f(x), map(f, y)),
map(f, Nil) = Nil.

We can also define map over JList α as

map(f, Join(x, y)) = Join(map(f, x), map(f, y)),
map(f, Singleton(x)) = Singleton(f(x)),
map(f, Nil) = Nil.

Then, let us consider map(f, [a1, a2, a3]). The recursions for the cons list and the fourth join list are almost the same because the shapes of both list representations are equivalent. Their recursion depth is 4. The recursion for the third join list is, however, different from them. It halves a given list recursively and therefore its recursion depth is 3. In general, given a list of length n, map over List α costs O(n) parallel steps but map over JList α costs O(lg n) parallel steps. Balancing join lists leads to load balancing of list skeletons. This is why the join list is seen as a parallel implementation of the list. The definition of map over JList α does not specify load balancing itself but simply exploits structural parallelism of a given join list. The interpretation of JList α expresses room for load balancing on the basis of the associativity of the interpretation of Join (i.e., ++). The algebraic properties of the interpretation of JList α affect the specifications of list skeletons. For example, we consider reduce defined as follows:

reduce(⊕, ι⊕, [a1, . . . , an]) = a1 ⊕ · · · ⊕ an, where n ≥ 1,

reduce(⊕, ι⊕, []) = ι⊕.

reduce has the algebraic conditions that ⊕ is associative and ι⊕ is the identity of ⊕. We can understand them from the definition of reduce over JList α:

reduce(⊕, ι⊕, Join(x, y)) = reduce(⊕, ι⊕, x) ⊕ reduce(⊕, ι⊕, y),

reduce(⊕, ι⊕, Singleton(x)) = x,

reduce(⊕, ι⊕, Nil) = ι⊕.

reduce thus interprets Join as ⊕ and Nil as ι⊕. The join lists in an equivalence class must be equivalent for every list skeleton. For example, reduce(⊕, ι⊕, [a1, a2, a3]) has to yield the same result for all of the four join lists. To ensure the equality of their results, the algebraic conditions on ⊕ and ι⊕ are necessary. They are derived from the interpretation of JList α. From the perspective of list skeletons, such algebraic conditions are the representation of their parallelism. The definitions of data-parallel skeletons are oblivious of parallel computing and therefore data-parallel skeletons enable high-level parallel programming. Parallelism of data-parallel skeletons is described in target data structures (and more specifically, the interpretations of data types). Formalization with data structures is therefore crucial for high-level parallel programming.
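As an illustration only (element type fixed to int; the names are not from any library), the following C++ sketch mirrors JList α and its reduce. Any associative op with identity e yields the same result for every join list in the same equivalence class, and the two recursive calls of reduce are independent, which is exactly the structural parallelism discussed above.

#include <functional>
#include <memory>

// A join list over int, mirroring the grammar of JList α.
struct JList {
    enum Kind { NIL, SINGLETON, JOIN } kind;
    int value;                           // used when kind == SINGLETON
    std::shared_ptr<JList> left, right;  // used when kind == JOIN
};

std::shared_ptr<JList> nil() {
    return std::make_shared<JList>(JList{JList::NIL, 0, nullptr, nullptr});
}
std::shared_ptr<JList> singleton(int x) {
    return std::make_shared<JList>(JList{JList::SINGLETON, x, nullptr, nullptr});
}
std::shared_ptr<JList> join(std::shared_ptr<JList> l, std::shared_ptr<JList> r) {
    return std::make_shared<JList>(JList{JList::JOIN, 0, l, r});
}

// reduce interprets Join as op and Nil as the identity e.
int reduce(const std::function<int(int, int)>& op, int e, const std::shared_ptr<JList>& t) {
    switch (t->kind) {
        case JList::NIL:       return e;
        case JList::SINGLETON: return t->value;
        default:               return op(reduce(op, e, t->left), reduce(op, e, t->right));
    }
}

For instance, reduce(std::plus<int>(), 0, join(singleton(1), join(singleton(2), singleton(3)))) returns 6 regardless of how the join list is parenthesized.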

Figure 2.1: Illustration of a segmented tree. On the left is the view decomposed into its local trees, where double-circle nodes denote hole nodes and dotted edges denote missing edges; on the right is the view of its global tree.

2.3 Divide and Conquer on Segmented Trees

2.3.1 Segmented Trees

Tree skeletons [Ski96, GCS94, Mat07b] deal with full binary trees. We define full binary trees to formalize tree skeletons as the following grammar:

BinTree α = Branch(x, BinTree α, BinTree α),

BinTree α = Leaf(x),

where x denotes a metavariable over values of type α. We call the value of x a payload value. Branch and Leaf denote the kinds of nodes; we call them node symbols.

Although BinTree α is seemingly similar to JList α, their interpretations are different. Unlike Join, the interpretation of Branch is generally not associative because rotations are not applicable to every tree. We cannot balance trees of type BinTree α while preserving their meaning. BinTree α therefore does not guarantee load balancing of tree skeletons. For example, a tree of type BinTree α may form a linear structure like the cons list. The recursion depth for such a tree will be O(n), where n is the number of nodes. To achieve robust load balancing such that it guarantees asymptotic speedup regardless of input trees, we have to consider restructuring of arbitrarily shaped trees. Like join lists for list skeletons, we use segmented trees1 for tree skeletons. A segmented tree is a tree segmented into its subtrees and its one-hole contexts. Figure 2.1 illustrates a segmented tree. A one-hole context is a subtree that loses the descendant subtrees of one node. This node, i.e., a parent of missing subtrees, is called a hole node. For example, {1} and {2, 4, 5} on the left of Figure 2.1 are one-hole contexts, and 1 and 4 are hole nodes. A hole node is an internal node in the whole tree but pretends to be a leaf node in the one-hole context. Every segment (i.e., subtree or one-hole context) of a segmented tree is a tree and we call it a local tree. We also call the tree structure of all segments in a segmented tree a global tree, where a node of the global tree is a segment. Therefore, the nodes of each segment in Figure 2.1 reduce to a single node in the view of its global tree.

1The same term was not used in [Mat07b] but an equivalent notion was used.

We can define segmented trees as the following SegTree α:

SegTree α = Branch(Context α, SegTree α, SegTree α),

SegTree α = Leaf(BinTree α),

Context α = Branch1(x, Context α, BinTree α),

Context α = Branch2(x, BinTree α, Context α),

Context α = Hole(x).

Here, BinTree α denotes the subtrees of an original tree, Context α denotes the one-hole contexts, and Hole(x) denotes the hole nodes. As seen from the grammar above, the root-to-hole path in a one-hole context is the main difference between subtrees and one-hole contexts. Note that the above formalization of segmented trees is inherently similar to ternary trees [Mat07b, MM11b]. By following the grammar above, we describe the segmented tree shown in Figure 2.1 as

Branch(c1, Branch(c2, Leaf(s1), Leaf(s2)), Branch(c3, Leaf(s3), Leaf(s4))),

where c1 = Hole(1),

c2 = Branch1(2, Hole(4), Leaf(5)),

c3 = Branch2(3, Leaf(6), Branch2(7, Leaf(10), Hole(11))),

s1 = Leaf(8),

s2 = Branch(9, Leaf(12), Leaf(13)),

s3 = Leaf(14),

s4 = Leaf(15).
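Purely as an illustration (payload type fixed to int; the names are not taken from any library), the grammars of BinTree α, Context α, and SegTree α might be rendered as the following C++ structures; a real implementation, such as the array-based one discussed in Chapter 3, can represent the same information very differently.

#include <memory>

// BinTree α: a full binary tree.
struct BinTree {
    bool is_leaf;
    int value;
    std::shared_ptr<BinTree> left, right;   // null when is_leaf
};

// Context α: a one-hole context; ctx is the child on the root-to-hole path.
struct Context {
    enum Kind { BRANCH1, BRANCH2, HOLE } kind;
    int value;
    std::shared_ptr<Context> ctx;           // null when kind == HOLE
    std::shared_ptr<BinTree> tree;          // the complete-subtree child
};

// SegTree α: either a leaf segment (a complete subtree) or a branch segment
// (a one-hole context whose two missing subtrees are the child segments).
struct SegTree {
    bool is_leaf;
    std::shared_ptr<BinTree> local;         // used when is_leaf
    std::shared_ptr<Context> context;       // used when !is_leaf
    std::shared_ptr<SegTree> left, right;   // used when !is_leaf
};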

2.3.2 Properties of Segmented Trees

Segmented trees have two important properties. The first is that the operations of parallel tree contraction [Rei93] guarantee parallel reduction of every segment. The second is that the m-bridging technique [Rei93] enables us to partition an arbitrarily shaped tree into segments of almost balanced sizes. The details of tree contraction operations and m-bridging will be described later. The segmented trees derived from a tree are contained in an equivalence class modulo appropriate interpretations of tree contraction operations. Tree skeletons can exploit this degree of freedom in the structure of segmented trees. If we use segmented trees whose segment sizes are balanced, tree skeletons achieve excellent load balancing. Note that any tree can be seen as a segmented tree that consists of a single segment. The equivalence class of the segmented trees derived from a given tree therefore contains the given tree itself. Along the structure of a segmented tree, the divide-and-conquer computation over a segmented tree is two-tiered; it consists of computation over local trees and computation over a global tree. A segmented tree is therefore well-suited to distributed memory. By laying an entire segment upon the local memory of a processor, we can perform the entire computation over each local tree in parallel. In this case, the computation over a global tree requires a collective communication on distributed-memory machines. This communication is efficient because a global tree is much smaller than its original tree.

2.4 Tree Contraction Operations

In this section, we briefly describe tree contraction operations on the basis of Reference [Rei93]. There are two basic operations: Rake and Compress. The Rake operation contracts a leaf and its parent into an internal node. The Compress operation contracts an internal siblingless node and its parent into an internal node. Figure 2.2 illustrates Rake and Compress. The original tree contraction algorithm by Miller and Reif [MR85] uses Rake and Compress. Although Rake and Compress are intuitive, an intermediate result in a series of these applications to a full binary tree becomes a general (i.e., non-full) binary tree.

Figure 2.2: Rake operation and Compress operation.

Figure 2.3: Shunt operations.

However, if we perform a pair of applications of Rake and Compress at once as illustrated in Figure 2.2, the intermediate results from full binary trees remain full binary trees. This pair of applications of Rake and Compress is called the Shunt operation. Specifically, the Shunt operation contracts the siblings of a leaf node and an internal node, and their parent, into an internal node, as illustrated in Figure 2.3. We can consider two symmetrical instances of the Shunt operation for binary trees. In this dissertation, Shuntr means the Shunt operation that applies Rake to a left leaf and Shuntl means the Shunt operation that applies Rake to a right leaf. The cost-optimal tree contraction algorithm on the EREW PRAM by Abrahamson et al. [ADKP89] uses the Shunt operation.

2.5 Definitions of Tree Skeletons

Parallel tree contraction algorithms guarantee asymptotically linear speedup for arbitrarily shaped trees. Because of this excellent algorithmic property, existing formalizations [Ski96, GCS94, Mat07b, MMHT09] of tree skeletons are based upon parallel tree contraction algorithms. However, in this dissertation, we do not deal with these algorithms themselves because they are not very useful for implementation. If we use optimal algorithms, the parallel time complexity of the divide-and-conquer computation over a global tree improves from O(p) to O(lg p), but synchronization/communication steps increase in practice. We therefore adopt a sequential manner for computations on global trees. Note that the bulk synchronous communication steps are O(1) and the amount of communication data is O(p) in a collective communication for the computation over a global tree on distributed-memory machines. We assume that the sizes of the segments of a given segmented tree are sufficiently balanced. This is not a responsibility of tree skeletons. We have to give tree contraction operations appropriate interpretations for using tree skeletons. The appropriate interpretations here mean that the results of a tree skeleton are equal for any segmented tree in the equivalence class. This requirement is represented as algebraic conditions on the parameters of tree skeletons, as in the reduce list skeleton. We introduce three representative tree skeletons: reduce, uAcc, and dAcc. The sequential definition of reduce is the following bottom-up reduction:

reduce : (β × α × β → β) × (α → β) × BinTree α → β

reduce(kB, kL, Branch(x, l, r)) = kB(reduce(kB, kL, l), x, reduce(kB, kL, r)),

reduce(kB, kL, Leaf(x)) = kL(x).
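A minimal C++ sketch of this sequential definition (payload and result types fixed to int; the struct is illustrative, not a library type):

#include <memory>

// BinTree α with int payloads, mirroring the grammar above.
struct BinTree {
    bool is_leaf;
    int value;
    std::shared_ptr<BinTree> left, right;
};

// kB combines a branch value with the reductions of its two subtrees;
// kL interprets a leaf value.
template <typename KB, typename KL>
int reduce(KB kB, KL kL, const std::shared_ptr<BinTree>& t) {
    if (t->is_leaf) return kL(t->value);
    return kB(reduce(kB, kL, t->left), t->value, reduce(kB, kL, t->right));
}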

The parallel definition of reduce takes, instead of a simple tree of type BinTree α, a segmented tree of type SegTree α. Recall that for the divide and conquer over a segmented tree, the appropriate interpretations of tree contraction operations must be given. The parallel definition of reduce therefore requires additional parameter operators: τ, ϕ, ψl, and ψr. ψl and ψr respectively correspond to Shuntl and Shuntr. To conform the reduce of a segmented tree to that of a simple tree, the parallel definition of reduce necessitates some algebraic conditions among the parameter operators. The parallel definition of reduce is:

reduce : (β × α × β → β) × (α → β) × (α → γ) × (β × γ × β → β)

× (γ × γ × β → γ) × (β × γ × γ → γ) × SegTree α → β

reduce(kB, kL, τ, ϕ, ψl, ψr, t) = red(t),
where red(Branch(c, l, r)) = ϕ(red(l), redCxt(c), red(r)),

red(Leaf(t)) = reduce(kB, kL, t),

redCxt(Branch1(x, l, r)) = ψl(redCxt(l), τ(x), reduce(kB, kL, r)),

redCxt(Branch2(x, l, r)) = ψr(reduce(kB, kL, l), τ(x), redCxt(r)),
redCxt(Hole(x)) = τ(x),

{Algebraic conditions}

kB(l, x, r) = ϕ(l, τ(x), r),
ϕ(ϕ(l', y', r'), y, r) = ϕ(l', ψl(y', y, r), r'),
ϕ(l, y, ϕ(l', y', r')) = ϕ(l', ψr(l, y, y'), r').

Although the parallel definition above is seemingly involved, it simply interprets node symbols by using given operators, as in the sequential definition. The definition above formalizes reduce over SegTree α but does not describe its actual computation. The actual computation of reduce consists of two phases. The first is the bottom-up reduction of every local tree; the second is the bottom-up reduction of a global tree. In the first phase, we use kB and kL for reducing complete subtrees. As a result, for a one-hole context, the root-to-hole spine remains. We use τ, ψl, and ψr for reducing such spines. We use ϕ for reducing a global tree. uAcc (upward accumulation) is an accumulative version of reduce, i.e., the following accumulation:

uAcc : (β × α × β → β) × (α → β) × BinTree α → BinTree β

uAcc(kB, kL, Branch(x, l, r)) = Branch(kB(root(l'), x, root(r')), l', r'),
where l' = uAcc(kB, kL, l),
      r' = uAcc(kB, kL, r),
root(Branch(x, l, r)) = x,
root(Leaf(x)) = x,

uAcc(kB, kL, Leaf(x)) = Leaf(kL(x)).

uAcc constructs a tree that has the same shape as a given tree and whose payload values are the intermediate values in reduce. The payload value of the root of a resultant tree equals the result of reduce. The operators given to uAcc and the algebraic conditions on them are the same as those of reduce. The parallel definition of uAcc is:

uAcc : (β × α × β → β) × (α → β) × (α → γ) × (β × γ × β → β)

× (γ × γ × β → γ) × (β × γ × γ → γ) × SegTree α → SegTree β

uAcc(kB, kL, τ, ϕ, ψl, ψr, Branch(c, l, r)) = Branch(c', l', r'),
where c' = updPath((segRoot(l'), segRoot(r')), uAccCtx(c)),
      l' = uAcc(kB, kL, τ, ϕ, ψl, ψr, l),
      r' = uAcc(kB, kL, τ, ϕ, ψl, ψr, r),
uAccCtx(Branch1(x, l, r)) = Branch1(x'', l'', r''),
where x'' = ψl(ctxRoot(l''), τ(x), root(r'')),
      l'' = uAccCtx(l),
      r'' = uAcc(kB, kL, r),
uAccCtx(Branch2(x, l, r)) = Branch2(x'', l'', r''),
where x'' = ψr(root(l''), τ(x), ctxRoot(r'')),
      l'' = uAcc(kB, kL, l),
      r'' = uAccCtx(r),
uAccCtx(Hole(x)) = Hole(τ(x)),

updPath((xl, xr), Branch1(x, l, r)) = Branch1(ϕ(xl, x, xr), l, r),

updPath((xl, xr), Branch2(x, l, r)) = Branch2(ϕ(xl, x, xr), l, r),

updPath((xl, xr), Hole(x)) = Hole(ϕ(xl, x, xr)),
segRoot(Branch(c, l, r)) = ctxRoot(c),
segRoot(Leaf(t)) = root(t),

ctxRoot(Branch1(x, l, r)) = x,

ctxRoot(Branch2(x, l, r)) = x,
ctxRoot(Hole(x)) = x,

uAcc(kB, kL, τ, ϕ, ψl, ψr, Leaf(t)) = Leaf(uAcc(kB, kL, t)),

where {Algebraic conditions}

kB(l, x, r) = ϕ(l, τ(x), r),
ϕ(ϕ(l', y', r'), y, r) = ϕ(l', ψl(y', y, r), r'),
ϕ(l, y, ϕ(l', y', r')) = ϕ(l', ψr(l, y, y'), r').

Here, the one-hole contexts that uAccCtx yields and updPath consumes are intermediate results defined as the following grammar:

Context β,γ = Branch1(x, Context β,γ, BinTree β),

Context β,γ = Branch2(x, BinTree β, Context β,γ),

Context β,γ = Hole(x),

where x denotes a metavariable over values of type γ. The definition above merely formalizes uAcc over SegTree α. The actual computation of uAcc consists of three phases. The first is the bottom-up accumulation of every local tree; the second is the bottom-up accumulation of a global tree; the third is the bottom-up accumulation of all root-to-hole paths of every one-hole context by using the payload values of the missing children of the hole nodes in a resultant tree. The sequential definition of dAcc (downward accumulation) is the following top-down accumulation:

dAcc : (β × α → β) × (β × α → β) × β × BinTree α → BinTree β

dAcc(gl, gr, e, Branch(x, l, r)) = Branch(e, dAcc(gl, gr, gl(e, x), l), dAcc(gl, gr, gr(e, x), r)),

dAcc(gl, gr, e, Leaf(x)) = Leaf(e).
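A minimal C++ sketch of this sequential definition (payload and accumulator types fixed to int; the struct is illustrative, not a library type):

#include <memory>

// BinTree α with int payloads, mirroring the grammar above.
struct BinTree {
    bool is_leaf;
    int value;
    std::shared_ptr<BinTree> left, right;
};

// gl (resp. gr) pushes the accumulator e through a branch value toward the
// left (resp. right) child; every node of the result carries the accumulator
// value that reaches it.
template <typename GL, typename GR>
std::shared_ptr<BinTree> dAcc(GL gl, GR gr, int e, const std::shared_ptr<BinTree>& t) {
    auto n = std::make_shared<BinTree>(*t);
    n->value = e;
    if (!t->is_leaf) {
        n->left  = dAcc(gl, gr, gl(e, t->value), t->left);
        n->right = dAcc(gl, gr, gr(e, t->value), t->right);
    }
    return n;
}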

The parallel definition of dAcc also requires additional parameter operators: τl, τr, ρ, and ⊕. ⊕ corresponds to Compress and ρ corresponds to Rake. It also necessitates algebraic conditions among the parameter operators. The parallel definition of dAcc is:

dAcc : (β × α → β) × (β × α → β) × β × (α → γ) × (α → γ)

× (β × γ → β) × (γ × γ → γ) × SegTree α → SegTree β

dAcc(gl, gr, e, τl, τr, ρ, ⊕, Branch(c, l, r)) = Branch(c', l', r'),
where c' = dAccCtx(e, c),
      l' = dAcc(gl, gr, ρ(e, redPath(τl, c)), τl, τr, ρ, ⊕, l),
      r' = dAcc(gl, gr, ρ(e, redPath(τr, c)), τl, τr, ρ, ⊕, r),

redPath(τ, Branch1(x, l, r)) = τl(x) ⊕ redPath(τ, l),

redPath(τ, Branch2(x, l, r)) = τr(x) ⊕ redPath(τ, r),
redPath(τ, Hole(x)) = τ(x),

dAccCtx(e, Branch1(x, l, r)) = Branch1(e, dAccCtx(gl(e, x), l), dAcc(gl, gr, gr(e, x), r)),

dAccCtx(e, Branch2(x, l, r)) = Branch2(e, dAcc(gl, gr, gl(e, x), l), dAccCtx(gr(e, x), r)),
dAccCtx(e, Hole(x)) = Hole(e),

dAcc(gl, gr, e, τl, τr, ρ, ⊕, Leaf(t)) = Leaf(dAcc(gl, gr, e, t)),

where {Algebraic conditions}

gl(e, x) = ρ(e, τl(x)),

gr(e, x) = ρ(e, τr(x)),
ρ(ρ(e, y), y') = ρ(e, y ⊕ y').

The definition above merely formalizes dAcc over SegTree α. The actual computation of dAcc consists of three phases. The first is the reduction of the root-to-hole paths of every one-hole context by using τl, τr, and ⊕; the second is the top-down accumulation of a global tree by using ρ; the third is the top-down accumulation of every segment by using the given e or the payload values of the hole nodes in a resultant tree as initial values.

Chapter 3

Interface Between Data Structures and Skeletons

The content of this chapter is almost identical to our publication [SM14a].

3.1 Introduction

Since parallel machines are widespread, parallel computing can be ubiquitous. Parallel programming is, however, still an expensive task even for expert programmers. To make parallel programming cheap and accessible, we require high-level abstractions for parallel computing. Skeletons [?, RG02] are patterns of parallel computing. Data-parallel skeletons are skeletons classified by their target data structures, e.g., lists [Ski93], matrices [EHKT07], and trees [Ski96, GCS94]. These computational patterns denote high-level specifications over target data structures. Data-parallel skeletons therefore provide a high-level abstraction for parallel computing over target data structures. Data-parallel skeleton libraries provide a pair of the implementation of a data structure and that of a set of skeletons over it. This pair is tightly coupled because parallel access to such a data structure relies on its implementation. However, loose coupling between the implementations of data structures and those of the operations over them is important for modularity and flexibility. For example, the C++ STL achieves such loose coupling. Most sequence operations in the STL operate on sequence containers uniformly, regardless of their implementations, e.g., a variable-length array and a doubly-linked list, which have different cost models for access. The C++ STL enables us to select various implementations of containers that fit use cases. Loose coupling between skeletons and data structures in skeleton libraries will enable us to select implementations of a data structure. Selection of implementation is actually valuable for tree skeletons. Because tree data are inherently non-uniform, the implementations of tree structures often require taming their non-uniformity for tree skeletons. For example, a full binary tree, whose every internal node has exactly two children, is required for the brevity of the APIs of tree skeletons. To use tree skeletons, we have to transform non-full binary trees into full binary trees by filling missing leaf nodes with dummy nodes. This transformation is obviously undesirable overhead. We can elude this overhead if we can select an implementation of binary trees that pretends to be a full binary tree by returning a common dummy node when missing leaf nodes are demanded. A similar situation can be found in XML processing. DOM trees derived from XML documents are ranked unbounded-degree trees. In XML processing by using tree skeletons [Ski97, NEM+07], input DOM trees are preprocessed into full binary trees by inserting internal nodes and dummy leaf nodes. It is valuable to elude this preprocessing by using an implementation of DOM trees that pretends to be full binary trees. In addition, it is known that the bracket structures of XML documents are worth preserving for efficient parallel computing [KME07]. An appropriate implementation of tree structures greatly relies on their input/output formats. A general-purpose yet efficient implementation of trees is extremely difficult.
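For instance, the following standard C++ snippet (added here purely for illustration) shows this kind of loose coupling: std::accumulate is written once against iterators and runs unchanged over containers with different implementations and access costs.

#include <iostream>
#include <list>
#include <numeric>
#include <vector>

// The same generic algorithm works over two containers with different
// memory layouts; only the iterator types differ.
int main() {
    std::vector<int> v = {1, 2, 3, 4};
    std::list<int>   l = {1, 2, 3, 4};
    std::cout << std::accumulate(v.begin(), v.end(), 0) << std::endl;  // 10
    std::cout << std::accumulate(l.begin(), l.end(), 0) << std::endl;  // 10
    return 0;
}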


Since we cannot uniquely define an efficient implementation of tree structures for tree skeletons, flexible switching among instances of tree-structure implementation is desirable. However, as in usual implementations of data-parallel skeletons, the existing ones of tree skeletons are tightly coupled with tree-structure implementation and do not support such flexible switching. To resolve this problem, we have designed an interface between tree skeletons and tree-structure implementation on the basis of iterators. Our interface clarifies the requirements of tree-structure implementation and supports loose coupling. We have implemented our interface on the basis of C++ templates and implemented tree skeletons [Mat07a] in SkeTo1 by using our interface. We have used C++ templates extensively for our generic implementation. Enormous unreadable error messages at compile time are a known problem with such template-based implementations [ME10]. Our new implementation of tree skeletons tames error messages by using a type-checking technique. This chapter presents our design and implementation of tree skeletons and describes cases where our interface design works effectively. This chapter also reports the results of preliminary experiments. The following are our major contributions:

• We have designed an interface between tree skeletons and tree-structure implementation on the basis of iterators (Section 3.3). We describe the benefits of our design for tree skeleton libraries (Section 3.5).

• We have implemented our interface on the basis of C++ templates and implemented array-based tree skeletons [Mat07a] by using our interface (Section 3.4). Our new implementation tames enormous error messages at compile time by using a type-checking technique.

• We report the results of preliminary experiments on our new implementation of tree skeletons (Section 3.6). No significant overhead of the use of our interface was observed. The flexibility and efficiency of the implementation derived from our interface design were demonstrated.

3.2 Tree Skeleton Library

In this section, we introduce the SkeTo library. SkeTo is implemented in C++ with MPI2 for distributed-memory machines such as clusters. It provides data-parallel skeletons as generic function templates that take function objects and the distributed implementations of their target data structures. SkeTo is based on the single-program multiple-data (SPMD) model along with MPI. A program that uses SkeTo runs in multiple MPI processes. Each process performs the same computation by default except for the internals of skeletons and distributed data structures. For example, we can use list skeletons in SkeTo as shown in Figure 3.1. A notable point is that each variable provides the same view among all processes. All parallel computing is closed in the internals of skeletons and distributed data structures. This design is not specific to SkeTo but common in data-parallel skeleton libraries. An experimental version of SkeTo contained an array-based implementation [Mat07a] of tree skeletons. As mentioned in Section 2.3, segmented trees are well-suited to distributed memory. The distributed implementation of segmented trees is as follows. Each local tree lies entirely in the local memory of a process. Global trees are replicated among all processes because their sizes are reasonably small. Each node of a global tree contains a segment and the rank of an MPI process to which we assign the segment. A global tree lying in the local memory of a process includes only the segments assigned to that process. That is, the replicated global tree in one process differs from that in another process in which segments lie locally. In the rest of this chapter, we intentionally confuse MPI processes with processors in parallel algorithms, and MPI process ranks with processor numbers. Even though the global tree of a segmented tree is replicated among all processors, computations on global trees require collective communications because of intermediate results for segments that do not lie locally. The root processor gathers the scattered intermediate results, performs computations on global trees exclusively, and scatters the results over the other processors. Replicated global trees specify the sources and destinations in message passing.

1http://sketo.ipl-lab.org/
2http://www.mpi-forum.org/

int sketo::main(int argc, char **argv) {
  // All processes perform the following two lines equally.
  int *raw_array = new int[n];
  initialize(raw_array, n);

  // Construction of a distributed list of length n.
  sketo::dist_list<int> dl(n, raw_array);

  // Application of list skeleton map with a function object f.
  sketo::dist_list<int> ret = sketo::list_skeletons::map(f, dl);

  // Gather distributed data of ret to raw_array.
  // All processes can observe the elements of ret via raw_array.
  ret.gather(raw_array, n);

  // Application of list skeleton reduce with +.
  // All processes can observe its result via sum.
  int sum = sketo::list_skeletons::reduce(std::plus<int>(), 0, dl);

  return 0;
}

Figure 3.1: Example use of list skeletons in the SkeTo library.

3.3 Our Interface of Trees

In this section, we describe our interface of trees for tree skeletons. The primary part of our interface is an iterator for trees. The concept of iterators itself is very common; they are extensively used in C++. An iterator by itself is only an abstraction of a pointer; what matters is the requirements that tree skeletons impose on iterators. We describe our iterators in the following two subsections.

3.3.1 Iterators over a Tree

Tree structures are necessary for computation on trees. However, an iterator that points to a node does not have to return its children. We only have to be able to perform both bottom-up and top-down reductions of trees by using iterators. For example, we can perform both reductions of a tree in its preorder/postorder traversal by using a stack. Actually, preorder/postorder traversal is essential for depth-first search, and depth-first search is a traversal pattern appropriate to both reductions. We therefore base iterators upon preorder/postorder traversal. We do not have to extract from a node its children, but we have to know whether it is a leaf node or an internal node. Otherwise, we could not reduce a tree by using a stack. A requirement of preorder/postorder iterators is therefore to return node symbols as well as payload values. Postorder and reverse preorder are more appropriate for bottom-up reduction. Preorder and reverse postorder are more appropriate for top-down reduction. Which of preorder and postorder is more appropriate depends on input/output formats. XML, which is a popular serialization format of trees, adopts preorder. We therefore adopt preorder and reverse preorder for iterators over a tree.
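For example, a bottom-up reduction can be driven entirely by a preorder sequence that reports node symbols and payload values. A minimal sketch (int payloads, with a recursive traversal standing in for the explicit stack mentioned above; the names are illustrative, not the actual interface):

#include <utility>
#include <vector>

// A node visited in preorder: its kind (node symbol) and its payload value.
enum NodeKind { LEAF, INTERNAL };
using Node = std::pair<NodeKind, int>;

// Bottom-up reduction of a full binary tree given only a preorder iterator:
// the iterator never exposes children, yet the recursion reconstructs the
// reduction order (an explicit stack would work equally well).
template <typename It, typename KB, typename KL>
int reduce_preorder(It& it, KB kB, KL kL) {
    Node n = *it++;
    if (n.first == LEAF) return kL(n.second);
    int l = reduce_preorder(it, kB, kL);   // left subtree comes next in preorder
    int r = reduce_preorder(it, kB, kL);   // then the right subtree
    return kB(l, n.second, r);
}

// Usage sketch for the full binary tree Branch(1, Leaf(2), Leaf(3)):
// std::vector<Node> pre = {{INTERNAL, 1}, {LEAF, 2}, {LEAF, 3}};
// auto it = pre.begin();
// int sum = reduce_preorder(it, [](int l, int x, int r){ return l + x + r; },
//                               [](int x){ return x; });   // sum == 6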

3.3.2 Global Iterators and Local Iterators

A segmented tree is a two-tiered tree; a segment is a local tree, and all the segments constitute a global tree. It is therefore natural to define an iterator over a local tree, i.e., a local iterator, and an iterator over a global tree, i.e., a global iterator. Figure 3.2 illustrates the traversals of local/global preorder iterators.


Figure 3.2: Illustration of traversals of local/global preorder iterators. On left is traversal of local iterator of segment {3, 6, 7, 10, 11}; on right is that of global iterator.

Since tree skeletons require similar computational patterns for both kinds of trees, we can use preorder iterators and reverse preorder ones both as local iterators and as global iterators. However, the two kinds of iterators have different requirements.

On the side of local trees, iterators have to take account of hole nodes. Hole nodes are leaves in local trees but are internal nodes in the whole of a segmented tree. The operators to apply to hole nodes therefore can differ from those to apply to true leaf nodes in reducing local trees. A requirement of local iterators is to distinguish hole nodes in addition to leaf ones and internal ones.

On the side of global trees, their nodes have additional information: processor assignment. To enable skeletons to perform an arbitrary collective communication of the distributed data of global trees, the information of processor assignment must be shared among all processors. A requirement of global iterators is therefore to return the processor assignment of a target segment.

There is an additional recommendation for global trees because their sizes dominate the amount of intermediate data. For example, in reduce, each segment is reduced into a single intermediate value. We have to align and buffer such intermediate data in collective communication. The number of the segments assigned to a processor determines this buffer size. Although we can calculate it through iterations by a global iterator, as sketched below, it is redundant to calculate it again on each collective communication. We therefore consider that the size information of a global tree should be cheaply accessible without iterations by global iterators and without communication.
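As an illustration of the redundancy mentioned above, the following sketch computes, by one full pass of a global iterator, how many segments are assigned to each processor. The members proc() and available() follow Figure 3.4, but the helper itself is hypothetical.

#include <vector>

// Counts the segments assigned to each processor by iterating the global tree.
// This is exactly the size information that we argue should instead be
// accessible in constant time, without iteration or communication.
template <class GlobalIter>
std::vector<int> segments_per_proc(GlobalIter it, int num_procs) {
  std::vector<int> count(num_procs, 0);
  for (; it.available(); ++it)
    ++count[it.proc()];        // proc(): rank assigned to the current segment
  return count;
}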

3.3.3 Tree Construction

Another important point of our interface is tree construction. In existing implementations of skeletons, calls of the constructors of data structures are hard-coded. This hard coding causes tight coupling between the implementations of skeletons and those of data structures. Skeletons should not determine the implementation of constructors; constructors should be given to skeletons. In our design, the implementation of trees determines the constructors used in skeletons. The tree construction required in tree skeletons is to clone a tree of the same shape as a given tree. Therefore, our interface has a shape-cloning constructor of trees. Shape-cloning constructors should be able to construct trees of SegTreeα for any α, with their payload values left unspecified.

3.4 Our Implementation

We implemented our interface and tree skeletons in the SkeTo library. SkeTo is implemented in C++ with MPI for distributed-memory machines. SkeTo adopts the SPMD model and conceals parallel execution from users.

template <typename A>
struct global_tree_impl : factory_impl {
  typedef global_iterator_impl1 preorder_iterator;
  typedef global_iterator_impl2 reverse_preorder_iterator;
  typedef segmented_tree<A> tree_type;

  preorder_iterator begin();
  reverse_preorder_iterator rbegin();

  int num_segments();
  int num_leaf_segments();
  int num_internal_segments();
  int num_local_leaf_segments(int p);
  int num_local_internal_segments(int p);
  int max_num_local_leaf_segments();
  int max_num_local_internal_segments();
};

struct factory_impl {
  template <typename B>
  struct new_tree {
    typedef global_tree_impl<B> type;
  };

  template <typename B>
  static void new_shape_clone(const global_tree<B> &src, global_tree_impl<B> *&dst);
};

template <typename A>
struct local_tree_impl {
  typedef local_iterator_impl1 preorder_iterator;
  typedef local_iterator_impl2 reverse_preorder_iterator;

  preorder_iterator begin();
  reverse_preorder_iterator rbegin();
};

Figure 3.3: Simplified implementation of our tree interface.

Each skeleton in SkeTo returns the same value at every process except for the internals of distributed data structures. In this section, we describe our implementation of tree skeletons. Our implementation of both algorithms and data structures was based upon prior work [Mat07a]; refer to it for the details.

3.4.1 Template-based Implementation of Our Tree Interface

We implemented our tree interface on the basis of C++ templates. Figure 3.3 shows an implementation of our tree interface in C++ templates. The class templates global_tree_impl and local_tree_impl respectively implement the interface of global trees and that of local trees. Both provide the types of iterators as members preorder_iterator and reverse_preorder_iterator. A member function begin() returns a preorder iterator at the root of a receiver tree. A member function rbegin() returns a reverse preorder iterator at the last node of the preorder traversal of a receiver tree. Member functions suffixed with _segments return size information on a receiver global tree in constant time. Here, leaf/internal segments mean those that are leaf/internal nodes in their global tree; local segments at processor p mean ones assigned to p. Their numbers are not strictly required for implementing tree skeletons but are often useful for implementing typical communication patterns.

template <typename A>
struct global_iterator_impl {
  node_tag_type tag();
  int proc();
  local_tree &segment();
  global_iterator &operator++();
  bool available();
};

template <typename A>
struct local_iterator_impl {
  node_tag_type tag();
  A &leaf_value();
  A &branch_value();
  A &terminal_value();
  local_iterator &operator++();
  bool available();
};

Figure 3.4: Simplified implementation of our iterator interface.

Only the tree_type member of global_tree_impl is part of our interface for type checking rather than our tree interface. Its type segmented_tree<A> merely declares SegTreeA by the type name for type checking, and tree skeletons therefore do not care about its definition. We shall explain the details later.

We designed the functionality of tree construction as a separate class, which we call a factory. factory_impl implements a factory of global_tree_impl. A factory has two members: a function template new_shape_clone() and a class template new_tree. The former is an implementation of the construction of shape-cloned trees. The latter works as a type-level function that, given a type of payload values, returns the concrete type of the shape-cloned trees constructed in new_shape_clone(). For example, given int, new_tree<int>::type returns global_tree_impl<int>, a specific class that implements our tree interface. Both members are also part of the interface of global trees; this is why global_tree_impl inherits factory_impl.

Figure 3.4 shows an implementation of our iterator interface. Both local and global iterators overload the prefix unary ++ operator and have member functions tag() and available(). The prefix ++ makes an iterator take a step forward; available() tests whether a receiver iterator has reached the end of traversal; tag() returns the tag of the current node. A tag represents node symbols as well as whether a node is the hole. A member function segment() of global iterators returns the current segment, and proc() returns the process rank assigned to the segment. Member functions leaf_value(), branch_value(), and terminal_value() of local iterators return the payload value of the current node. Which of the three we can call is determined by using tag(). Although the payload values of a tree currently are of a single type, our iterator interface is designed to support a tree whose payload values have different types at leaf nodes and internal ones.

Note that tree skeletons require only the members that constitute our tree interface, including factories and iterators. They do not impose any other requirement on implementation, such as the inheritance of a specific class. For brevity, we omit const iterators, whose pointee objects are read-only as if pointed to by const pointers, from Figures 3.3 and 3.4. The const iterators over a tree work as input iterators in tree skeletons, while non-const iterators work as output iterators.

We implemented tree structures by using the array-based representation given in prior work [Mat07a]. It stores the nodes of a tree into an array in preorder. A main difference from the prior work is that we store each of the data members (i.e., tags and payload values) into a separate array.

That is, we adopted a structure-of-arrays representation in contrast to the array-of-structures one adopted in the prior work. As a result, we successfully saved the space for tags.
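Returning to the factory members of Figure 3.3, the following hedged sketch illustrates how a skeleton could construct its output tree through them; the exact signatures in the library may differ in detail.

// Builds, for payload type B, an output tree whose shape is cloned from `in`.
// new_tree<B>::type names the concrete output type, and new_shape_clone()
// allocates a tree of the same shape with payload values left unspecified.
template <class B, class InTree>
typename InTree::template new_tree<B>::type *
make_shape_clone(const InTree &in) {
  typename InTree::template new_tree<B>::type *out = 0;
  InTree::new_shape_clone(in, out);   // factory member inherited by the tree class
  return out;
}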

3.4.2 Generic Implementation of Tree Skeletons

The algorithms that we have implemented are the same as those of prior work [Mat07a]. A traversal of trees was implemented as a loop with a stack. We used an input iterator of an input tree for the reduction of trees. We used an input iterator of an input tree and an output iterator of an output tree for the accumulation of trees. In accumulation, an input iterator and an output one run in lockstep, and the output tree is updated through the output iterator; this output iterator is write-only. We used no iterator for intermediate data because we store it in raw arrays; instead we simply used indices.

When skeletons perform computation over a global tree, they first gather intermediate data distributed among all processes onto the root process. Then, the root process performs the computation over the global tree and finally distributes the resultant data to all other processes. In communication, we have to buffer intermediate data. Our implementation does not serialize payload values but simply packs them into buffers. We implemented the overlap of packing and communication by using double buffering.

We segregated intermediate values on the basis of their types. For example, the intermediate values after the first phase of reduce form a global tree whose internal nodes have payload values of type γ and whose leaf nodes have payload values of type β. We represented this intermediate global tree as two arrays: one of values of type β and one of values of type γ. We do not have to store node symbols because they are the same as those of the input global trees. In traversing a global tree, intermediate values are stored into these two arrays one by one. Since the order of node symbols is preserved, the two-array representation is consistent, as sketched below. Existing implementations had used the union type of β and γ. We did not adopt this approach because the use of the union type in C++ restricts β and γ to POD types; besides, the use of union is generally space-inefficient. Our two-array representation of intermediate data is therefore appropriate.

We took some care over the genericity of the implementation. We used template parameters (i.e., type parameters) as generally as possible. For example, we did not use the payload-value type of an input tree explicitly in tree skeletons; we pass payload values directly to parameter operators. This enables the implementation of tree skeletons to deal with an input tree whose payload-value types differ between internal nodes and leaf nodes.
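The two-array representation of intermediate values can be pictured as follows; the struct and its member names are only illustrative, not the library's actual types.

#include <vector>

// Intermediate global tree after the first phase of reduce: leaf segments
// produce values of type B and internal segments produce values of type G.
// Instead of one array of union{B, G}, we keep two homogeneous arrays.
template <class B, class G>
struct intermediate_global_tree {
  std::vector<B> leaf_results;     // results of leaf segments, in preorder
  std::vector<G> branch_results;   // results of internal segments, in preorder
  // Node symbols are not stored: they coincide with those of the input global
  // tree, so traversing the input tags tells us which array to read next.
};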

3.4.3 Type Checking of Tree Skeletons

We implemented tree skeletons with function templates as generally as possible. As a result, many template parameters were used in their implementation. Such an implementation is known to generate enormous, unreadable error messages at compile time [ME10]. The cause is that C++ adopts structural typing for template parameters, whose type constraints are generated implicitly from the definitions of templates. When a template parameter breaks a type constraint generated in the depths of template libraries, C++ compilers report the point where the constraint was generated as the breaking point, together with a long backtrace. This is not helpful for library users.

We have dealt with this problem by forcing all type constraints to be generated at the signature of each tree skeleton. For example, we prepared a class template reduce_signature that generates the type constraints of reduce for its implementation, as shown in Figure 3.5. reduce_signature works as a type-level partial function from the valid types of the parameters of reduce to the result type of reduce. Specifically, an instantiation of reduce_signature corresponds to a type-level function application, and the result_type member of the instantiation corresponds to the result of the application. If the types given to reduce_signature are invalid as the types of the parameters of reduce, result_type cannot be accessed; i.e., the type-level application is invalid. reduce_signature is therefore partial as a type-level function. We used reduce_signature to calculate the result type from the template parameters at the signature of reduce. Thus, the signature of reduce is undefined with respect to arguments of invalid types, and type error messages for calls of reduce with arguments of invalid types arise only at their call sites.

template <typename KB, typename KL, typename Tau, typename Phi,
          typename Psil, typename Psir, typename Tree>
struct reduce_signature;

template < /* parameters omitted */ >
struct reduce_signature< /* valid parameter types omitted */ > {
  typedef B result_type;
};

template <typename KB, typename KL, typename Tau, typename Phi,
          typename Psil, typename Psir, typename Tree>
typename reduce_signature<KB, KL, Tau, Phi, Psil, Psir, Tree>::result_type
reduce(const KB &kB, const KL &kL, const Tau &tau, const Phi &phi,
       const Psil &psil, const Psir &psir, const Tree &t) {
  return reduce_impl( /* omitted */ );
}

Figure 3.5: Simplified snippet of our implementation of reduce. Partial specialization of class template reduce_signature works as declaration of signature of reduce.

We assume operator/tree arguments to have function_type/tree_type members, which are types declared for type checking. This is an interface for type checking. reduce_signature checks the function_type/tree_type members of operator/tree arguments instead of their own types at the signature of reduce. The implementation of skeletons does not check whether the declared types of arguments are compatible with the actual types; this compatibility is a responsibility on the side of operators and trees.

We can implement the compatibility checking of function_type/tree_type in a superclass. For example, the partially specialized definition of the check class template in Figure 3.6 checks in its constructor whether its subclass of type Impl has the function signature R(A1), and declares the signature as function_type. We can check an operator class by making it inherit check, like dup_int in Figure 3.6. If the signature of operator() of dup_int were incompatible with pair<int,int>(int), which is declared in the inheritance of check, its compilation would fail and a type error message would arise in the constructor of check instantiated from the definition of dup_int. Note that this type checking is safe regardless of the implementation of operator() because its call site in the constructor of check is never reached at runtime.

We have thus separated the type checking of individual arguments of skeletons from that of the application of skeletons, by using the type-checking interface based on the function_type/tree_type members.

template <typename Impl, typename Sig>
struct check;

template <typename Impl, typename R, typename A1>
struct check<Impl, R(A1)> {
  typedef R function_type(A1);
  check() {
    if (false) {
      R r = static_cast<const Impl &>(*this)(A1());
    }
  }
};

struct dup_int : public check<dup_int, pair<int, int>(int)> {
  pair<int, int> operator()(int x) const;
} op;

Figure 3.6: Example for checking signatures of operators. Superclass template check verifies and declares signature of call operation of subclass dup_int.

Since the arguments of skeletons are required to have these members, this style of type checking is intrusive, i.e., opposite to the non-intrusive style conventional in C++. Although we could implement type checking in a non-intrusive style through type-level computation based on the C++11 standard at the signature of skeletons, we have adopted this member-based intrusive style to achieve a clear separation of concerns in type checking. Consequently, we have obtained both a clear declaration of the signatures of skeletons and helpful type-error messages on individual arguments. These benefits from the separation of concerns in type checking are particularly valuable for tree skeletons because their signatures are complicated.

Note that we do not claim novelty for our type-checking technique. Type checking of template parameters is a common issue in C++ template metaprogramming. In fact, the C++ concept [SS12], which is a functionality for general type checking of template parameters, is under discussion for the next C++ standard and is partially emulated by the Boost Concept Check Library3. Rather than a general and versatile mechanism, our type-checking technique is a minimal one specialized for checking function signatures. For example, we could refine the type-checking code of check in Figure 3.6 by using type traits, which are type-level predicates, and static_assert in the C++11 standard. Nevertheless, our implementation is simple yet sufficient to generate helpful error messages even under the C++03 standard.

3 http://www.boost.org/doc/libs/release/libs/concept_check/
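The C++11 refinement alluded to above could, for instance, look like the following sketch based on type traits and static_assert; it is our illustration, not part of the library.

#include <type_traits>
#include <utility>

template <class Impl, class Sig> struct check11;

// Verifies at compile time that Impl::operator()(A1) yields a value
// convertible to R, and declares the checked signature as function_type.
template <class Impl, class R, class A1>
struct check11<Impl, R(A1)> {
  typedef R function_type(A1);
  check11() {
    typedef decltype(std::declval<const Impl &>()(std::declval<A1>())) actual;
    static_assert(std::is_convertible<actual, R>::value,
                  "operator() of Impl is incompatible with the declared signature");
  }
};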

3.5 Benefits of Our Design

Our interface design of tree skeletons is greatly advantageous to their implementation and also has great potential to improve data-parallel skeleton libraries in general. In this section, we describe cases where our design works effectively. Our design is particularly valuable for tree skeletons owing to the heterogeneity and versatility of trees. The heterogeneity of trees strongly motivates skeleton implementors to specialize representations of trees. The versatility of trees strongly motivates skeleton users to switch views of trees. We explain these in the following subsections.

3.5.1 Specialization of Representation

Trees are heterogeneous. For example, the types of payload values may not be uniform. Consider (untyped) nested lists. A nested list is represented by a tree whose leaf nodes are the elements of the nested list and whose internal nodes have no payload value. Such trees are not unusual.

Generally, a tree that represents a recursive decomposition of a set has meaningful data only in its leaf nodes. Such trees can also be found in intermediate results; the result of path-wise reduction (i.e., a reduction version of dAcc) is directly represented by a tree that has no internal payload values. For efficiency of parallel computing, the representation of such trees should be specialized to save the space of their internal nodes.

Trees may also be heterogeneous in structure, not only in payload values. For example, binary trees may be non-full, i.e., have a single-child internal node with a missing leaf, while tree skeletons accept only full binary trees for the brevity of their APIs. One way to deal with non-full binary trees in tree skeletons is, as mentioned in Section 3.1, to fill the missing leaves with dummy nodes of a default value. For efficiency, such filled trees should have a specialized representation that saves the space of dummy nodes. Besides, we may have to deal with trees with high-degree nodes (i.e., hubs). We can also deal with such a tree by converting the whole tree into a full binary tree. For space efficiency, however, it is natural to give hubs special treatment in representation.

The input/output formats of trees used in practice are also heterogeneous. For example, both XML and JSON are widely used on the web. In relational databases, nested sets [Cel12] are popular. For efficiency, it is valuable to specialize the representation of trees in a specific format. Actually, the preservation of the bracketed structures of XML documents leads to an efficient implementation of tree skeletons [KME07].

A more advanced case is the implementation of template-based fusion methods [ME10, EM14]. These are techniques to eliminate intermediate data by decorating data structures with specific computations. Because decorated data structures pretend to be the results of the decorating computation, they avoid constructing the whole of the results. This technique can be seen as a specialization of representations equipped with computations.

In summary, skeleton implementors are greatly motivated to specialize the representations of trees from the perspective of efficiency. If the implementation of tree skeletons is tightly coupled with that of trees, we have to implement tree skeletons for each implementation of trees. This is excruciatingly unproductive. An implementation of tree skeletons loosely coupled with that of trees enables us to avoid such unproductiveness. Our design achieves such a loosely coupled implementation of skeletons and therefore promotes specialization of tree representation.

3.5.2 Multiple Views

Trees are versatile. We can find uses of trees for modeling something in various contexts. This means that a tree probably has another view of its data. For example, consider XML documents. They are usually modeled as DOM trees but are actually bracketed texts. We would sometimes like to deal with them as texts regardless of their brackets. For example, consider grep (i.e., regular expression matching and filtering) over texts that have dropped their brackets. We can implement grep for texts by using prefix sums [Ble93], i.e., one of the list skeletons. We would then like to use a list view of a DOM tree. Of course, in this case, it is sufficient to flatten a DOM tree into a list of its leaf texts. However, data-parallel skeleton libraries hardly deal with such flattening because construction of data structures is generally restricted in parallel computing. We could elude such flattening by encoding grep on trees, but this is not desirable for skeleton users. An implementation of trees that provides a list view of leaves is appropriate both for implementation and for practical use.

Our design is appropriate for providing multiple views. Our interface defines a view of trees. Whether its underlying implementation is a tree or not, an implementation of our interface behaves as a segmented tree for tree skeletons. The inverse is also possible. For example, we consider a way of extending our implementation of binary trees to provide a list view of its leaves. An important point here is that a one-hole context corresponds to two local lists of leaves. A simple way of dealing with it is to assign indices of local lists to each local tree: we assign one index to a subtree and two indices to a one-hole context. Then, we define a global iterator for the list view as an unordered iterator of indexed elements. It traverses a global tree and returns the local lists in each segment with their indices. We would hardly change a local iterator; it is sufficient to skip internal nodes. Since the two local lists are separated at the hole node in a one-hole context, a local iterator over a one-hole context begins at its root or at its hole. We consider in general that the design of global iterators is the key.

Table 3.1: Execution time (in seconds) of tree skeletons with/without our interface.

                     With our interface            Without our interface
No. of nodes, N      reduce    uAcc     dAcc       reduce    uAcc     dAcc
128 × 2^20           0.753     1.360    1.127      0.658     1.425    1.285
256 × 2^20           1.518     2.677    2.250      1.332     2.780    2.509
512 × 2^20           3.053     5.471    4.538      2.659     5.666    5.051

Providing multiple views is also valuable for efficient implementation. Existing tree skeletons necessitate preprocessing of input trees that are non-full binary trees or unbounded-degree ones, and this preprocessing is a large overhead. Instead of actual preprocessing, it is sufficient to provide a view of preprocessed trees. In particular, we should not encode unbounded-degree trees into full binary trees: parallel tree contraction, which is the basis of tree skeletons, can be applied to unbounded-degree trees more efficiently than to binary trees [MM11a]. Tree skeletons should therefore provide a way to operate on unbounded-degree trees as they are. Our interface is useful for implementing trees compatible both with binary tree skeletons and with unbounded-degree ones.

In summary, our design leads to multiple views of trees as well as of various data structures. Multiple views are particularly valuable for trees. They enable us to use a combination of skeletons over different data structures, although this is difficult to do in existing data-parallel skeleton libraries. Our design therefore has great potential to improve data-parallel skeleton libraries in general.

3.6 Preliminary Experiments

We conducted preliminary experiments on our new implementation of tree skeletons. In this section, we report these results.

3.6.1 Overhead of Our Interface

To evaluate the overhead of using our interface, we measured the execution time of tree skeletons with and without our interface. We used the existing implementation [Mat07a] as the tree skeletons without our interface. The parameter operators were addition over int and the identity function, i.e., sufficiently cheap to measure the cost of traversals. Our new implementation differs a little from the existing one, particularly in communication; this difference is independent of the use of our interface. To minimize the effect of communication, we ran two processes4 on a single shared-memory server: one equipped with four Opteron 6380 (16 cores, 2.50 GHz) processors and 128 GB of DDR3-1600 memory running Debian 7.5 (Linux 3.2.0-4-amd64). We compiled the source code by using g++ 4.7.2 with the O3 optimization, together with OpenMPI 1.4.5. This setting does not, in principle, favor our new implementation. The input trees were randomly generated ones whose payload values were of type int; every segment had the same size, artificially, to minimize the effect of load imbalance, and the number of segments was 65. We measured the average time over 10 trials for the same input.

Table 3.1 summarizes the results of this experiment. Slight slowdowns were observed in reduce and slight speedups were observed in uAcc and dAcc at every input size. We consider these differences to be insignificant. If the use of iterators incurred some overhead, a uniform slowdown in all skeletons would be observed. We attribute both the speedup and the slowdown to extrinsic factors. For example, to avoid using union, our new implementation adopts a two-array representation. As a result, the MPI_Isend/Irecv calls in reduce and uAcc increase, although the amount of communication data does not increase. This might have incurred the slowdown. In contrast, our new implementation of trees saves the space for tags. This might have brought the speedup. We consider these extrinsic factors to be insignificant. In conclusion, we observed no significant overhead from the use of our interface.

4 Neither MPI-based implementation was able to run with a single process.

// initializations
dist_tree      t  = ...;
dist_leaf_tree lt = ...;

// simple applications
dist_tree      t2  = dAcc(..., t);
dist_leaf_tree lt2 = dAcc(..., lt);

// preparations for conversion
factory_hijacker<dist_tree, dist_leaf_tree_factory> wt(t);
factory_hijacker<dist_leaf_tree, dist_tree_factory> wlt(lt);

// converting applications
dist_leaf_tree lt3 = dAcc(..., wt);
dist_tree      t3  = dAcc(..., wlt);

Figure 3.7: Simplified example of simple application and converting application of skeleton.

3.6.2 Flexibility from Our Interface

We demonstrate the extensibility and flexibility of implementation based on our interface through experimental implementations. Our basic implementation of trees mentioned in Section 3.4 was dist_tree. In addition, we developed two further implementations of trees. One was dist_leaf_tree, an implementation of trees whose internal nodes have no payload value, in which we saved the space of the internal nodes (see Section 3.5.1). The other was a wrapper of trees, factory_hijacker. It behaves as a tree T whose factory is overwritten by F. By using this wrapper together with the factories of dist_tree and dist_leaf_tree, we can switch the implementations of the output trees of tree skeletons from the caller side, as shown in Figure 3.7.

The example program shown in Figure 3.7, whose text is slightly simplified, has two notable points. The first is that dAcc had only one implementation regardless of the differences in the implementations of input trees and output ones. This demonstrates the genericity of our implementation of tree skeletons. Note that our implementation of skeletons is independent of the implementation of output trees, unlike conventional functions based on parametric polymorphism, and does not necessitate upcasting the types of output trees, unlike conventional functions based on class-based polymorphism. In this sense, our implementation is generic yet more flexible than conventional ones. This flexibility is due to the template-based implementation of our tree interface.

The second is that our implementation of skeletons enables converting applications. In simple applications, the implementations of both input and output trees were the same. In converting applications, the implementations of output trees were converted from those of input trees through factory_hijacker in dAcc. Note that converting applications are no different from simple applications from the perspective of implementation. factory_hijacker itself only attached factories without modifying trees. Each factory only called a specific constructor and initialized a tree. dAcc only wrote an output tree in the same way. Even though all of these were unaware of conversion, the implementations of trees were successfully converted. This exemplifies separation of concerns and also demonstrates the extensibility and flexibility of implementation based on our interface.

3.6.3 Avoidance of Data Restructuring

To demonstrate the feasibility and benefits of multiple views, we implemented a list interface into dist_leaf_tree, as described in Section 3.5.2. We experimentally implemented three kinds of reduce for the leaf list of dist_leaf_tree.

The first was a hand-coded implementation of leaf-list reduction. The computation over a local tree utilized the implementation of dist_leaf_tree in a tightly coupled manner, and the computation over a global tree was similar to that in reduce of tree skeletons but specialized for leaf-list reduction. It was a baseline with no overhead. We name it the hand-code version.

Table 3.2: Execution time (in seconds) of leaf-list reduction given 256 × 2^20 leaves.

No. of processes, P        2         3         6         9
hand-code              0.560     0.436     0.238     0.149
list-view              0.688     0.544     0.309     0.195
raw-list              14.989    18.706    19.035    19.162

The second was a generic iterator-based implementation of reduce of list skeletons. dist_leaf_tree constructed a list view of its leaves internally and exposed it to the generic implementation. This achieved loose coupling between the implementation of dist_leaf_tree and that of reduce of list skeletons. We name it the list-view version.

The third was a reference implementation of list skeletons tightly coupled with an implementation of distributed lists. This distributed list was constructed from a single array in the root process by scattering its data. We implemented, into dist_leaf_tree, a gather operation that aligns the payload values of all leaves into a single array on the root process. The reduction of all leaves of dist_leaf_tree is then the gather operation followed by the reference reduce. We name it the raw-list version.

We compared the execution time of these three implementations. The aim of this experiment is to measure the overhead of data restructuring between lists and trees. The parameter operators were therefore the same as the ones used for measuring the overhead of our interface. We used as input the tree of 512 × 2^20 nodes randomly generated before; it had about 256 × 2^20 leaves. Communication greatly affects the overhead of data restructuring. To measure realistic overhead on distributed memory, we used a three-node cluster, each of whose nodes was equipped with two Opteron 2376 (4 cores, 2.3 GHz) processors and 8 GB of DDR2-800 memory running Ubuntu 12.04.4 (Linux 3.2.0-64-generic); the nodes were connected to each other through 1000BASE-T. We compiled the source code by using g++ 4.6.3 with the O3 optimization, together with OpenMPI 1.4.3. Processes were evenly assigned to the nodes.

Table 3.2 summarizes the results of this experiment. raw-list was much slower than both hand-code and list-view, and showed no scalability against the number of processes P. The cause was communication. The gather operation used in raw-list incurs an amount of communication data proportional to P, while hand-code and list-view incur no extra communication to perform reduce. The construction of a list view in list-view incurred slight overhead but did not spoil scalability, owing to the absence of communication. Furthermore, since this list view shares the underlying data of dist_leaf_tree, we do not have to reconstruct the list view as long as the tree structure is unchanged. In practice, the overhead of view construction would therefore be even more negligible.

In summary, the use of a list view of trees brought a significant performance gain owing to the avoidance of data restructuring as well as a generic implementation of skeletons. The multiple views that our interface design can provide are therefore very beneficial to efficient implementations of skeletons and will promote a crossover use of skeletons over different data structures.

3.7 Related Work

In the context of functional programming, where trees are extensively used in the form of algebraic data types, abstractions for trees have been well studied. For example, active patterns [SNM07] generally enable us to perform pattern matching on abstract data types as if they were algebraic ones; i.e., they provide a tree interface of abstract data types in a general manner. Most such studies did not deal with parallel programming. A notable exception was an abstraction based on ternary trees [MM11b]. It formalized the algebraic structure of parallel tree contraction as the view of an algebraic data type of ternary trees. On these ternary trees, load balancing takes the form of tree balancing. Another notable work was a formalization of path-based computations on trees [MMHT09]. Its list-like view led to the operations of parallel tree contraction.

In the context of parallel programming, computation on nested lists, which are a view of trees as mentioned in Section 3.5, is popular. Flattening [Ble90] of nested lists is a fundamental technique for load balancing. To elude the large space overhead of thorough flattening, Bergstrom et al. [BFR+13] developed a hybrid flattening that provided a flat view and a nested one. This is an application of trees with multiple views as mentioned in Section 3.5.2.

Among skeleton libraries, Triolet [RJDH14] is relevant to our work. It utilized hybrid iterators that provide both a nested list view and a flattened list view for a better fusion strategy of a filter skeleton. This notion is similar to the hybrid flattening above. Triolet used hybrid iterators only for list skeletons. Our iterators are designed for tree skeletons, and our interface design aims even at a crossover use of skeletons over different data structures.

Iterators are commonly used in skeleton libraries. For example, Java 8 collections5 use Spliterator, and the parallel collections of Scala 2.9 [PBRO11] used IterableSplitter. These abstracted a recursive splitting of collections but did not concern themselves with the structures of parallel collections. That is, they were not designed for structural computation on parallel collections. The STAPL Parallel Container Framework (PCF) [TBF+11], which was inspired by the C++ STL, also used iterators called pView. STAPL pView classified accesses to parallel data structures. STAPL PCF shared an objective with our library design in the sense that it strove after both parallel computing and loose coupling, like the C++ STL. Although STAPL PCF can address structural computation to some extent, through tree skeletons we tackle more general and complicated structural computations. Our interface design aims at such structural computations on various parallel data structures.

Several parallel graph libraries for distributed-memory machines, such as the Parallel Boost Graph Library6 [GL05] and the STAPL parallel graph library [HFAR13], were analogous to the C++ STL. Since our interface design is also inspired by the C++ STL and a tree is of course a graph, these are relevant to our work. While our tree interface is based on depth-first search, these libraries provided breadth-first search visitors. While our interface is designed for library implementors, these visitors were designed for library users. We deal with coarse-grained parallelism in structural computation on trees in order to minimize communication and synchronization.

3.8 Conclusion

We have presented the design and implementation of tree skeletons based upon our iterator-based interface of trees. We implemented our interface appropriately on the basis of C++ templates. Our interface provides loose coupling between the implementation of tree skeletons and that of trees. This is helpful both for skeleton implementors and for users.

A direction of future work is to extend our library to provide multiple views of trees and other data structures. Our extended library would consist of three tiers. The first is the implementation of data structures that satisfy interfaces. The second is the implementation of interfaces that define the views of data structures and their cost models, like the C++ STL. The third is the implementation of skeletons based on interfaces. A flexible combination of instances in the respective tiers will be a desirable implementation of data-parallel skeletons.

5 http://docs.oracle.com/javase/8/docs/api/java/util/Collection.html
6 http://www.osl.iu.edu/research/pbgl/documentation/

Chapter 4

Tree Skeleton Hiding

This chapter is a revised and extended version of our publication [SM13].

4.1 Introduction

Tree skeletons [Ski96, GCS94] are indeed high-level abstractions for parallel programming. We can use tree skeletons with no concern for low-level details of parallelism and underlying parallel machines; in this sense, parallel programs based on tree skeletons are portable. Once we describe parallel programs with tree skeletons, they can run in various environments on the basis of appropriate implementations of tree skeletons. Moreover, tree skeletons guarantee their asymptotic performance regardless of the shape of trees; in this sense, parallel programs based on tree skeletons are robust. They can run effectively in parallel for inputs of various shapes. The implementations [SM14a, Mat07a, KME07, EI12, MM14] of tree skeletons certainly provide high-level abstractions potentially equipped with these virtues.

Is it then easy to use tree skeletons? Unfortunately, it is not. The API of tree skeletons is high-level yet complicated; this is the main reason why it is not easy to use tree skeletons. In particular, the additional operators that the parallel definitions of tree skeletons use for interpreting tree contraction operations impose complicated algebraic requirements on programmers. Predefining these operators separately is unhelpful for programmers because they have semantic dependencies on the operators that programmers would like to provide. For example, to use reduce, it is reasonable for programmers to define its parameter operators kB and kL because both are required in its sequential definition. Then, its additional operators τ, ϕ, ψl, and ψr have semantic dependencies on kB and kL and cannot be defined separately. Without any assistance, programmers have to derive appropriate definitions of the additional operators themselves. Tree skeletons thus impose a heavy burden on programmers.

Although these additional operators are a key to guaranteeing the excellent load balancing of tree skeletons, they are unnecessary from the perspective of sequential specifications. In this sense, they are auxiliary operators for load balancing, while the operators required in the sequential definitions are primary operators for specification. What programmers would like to specify are the primary operators. The burden of auxiliary operators spoils the usability of tree skeletons and, from the viewpoint of programmers, even outweighs their benefits.

To resolve this problem, we have implemented a parallelizer that transforms a recursive function described in a restricted C language into an application of a tree skeleton by generating operators on the basis of the formalization by Matsuzaki et al. [MHT06]. Our parallelizer enables programmers to use tree skeletons implicitly. As a result, programmers can enjoy the benefits of tree skeletons with no burden. In fact, hiding tree skeletons from programmers provides more than freedom from operator derivation. Our parallelizer gives skeleton implementors the discretion to design the APIs of tree skeletons regardless of the viewpoint of programmers. As a consequence, more expressive but potentially complicated APIs can be adopted in tree skeletons, and programmers can also enjoy this expressive power implicitly. Moreover, our parallelizer enables programmers to test a target function simply as a sequential C program because it deals with a restricted C language extended with compiler directives.


By abstracting and hiding tree skeletons themselves, our parallelizer brings many benefits. The following are our main contributions:

• We have developed a parallelizer for recursive functions described in a restricted C, on the basis of the formalization by Matsuzaki et al. [MHT06]. Our parallelizer hides tree skeletons and provides useful declarations for parallelization as compiler directives (Section 4.3). It has been integrated with a C compiler. We also report the results of a preliminary experiment (Section 4.5).

• We present an approach based on an implicit use of tree skeletons to parallel programming on trees through our parallelizer (Section 4.6). Our approach enables programmers to use more general but complicated tree skeletons (Section 4.4) with no burden and therefore will be generally advantageous to programming based on tree skeletons.

4.2 Deriving Operators from Multilinear Computation

In this section, we briefly describe the formalization of systematic derivation of auxiliary operators for tree skeletons. Refer to [MHT06, Mat07b] for a more formal description in the case of full binary trees.

4.2.1 Multilinear Computation on Trees

We first formalize target computations on trees. Let $\mathit{child}_i(t)$ be the $i$-th child subtree of a tree $t$. For the sake of clarity, we also use $l$ and $r$ as the suffix $i$ instead of 1 and 2 if $t$ is a binary tree. Letting $f$ be a recursive function on trees that yields a vector, $f$ is a multilinear function if $f$ can be defined as

$$f(t) = A_i\, f(\mathit{child}_i(t)) \quad \text{for all } i. \tag{4.1}$$

Note that $A_i$ is a coefficient matrix that may contain components of the result of $f(\mathit{child}_j(t))$, where $j \neq i$. The leaf case above is left unspecified because the domain of $i$ is empty there. Intuitively, our target is a bottom-up computation that consists of linear transformations of the result for each subtree. For example, the following $\mathit{poly}$ over $\mathrm{BinTree}_\alpha$ is multilinear:

$$\begin{aligned}
\mathit{poly}(\mathit{Leaf}(x)) &= (x, 1), \\
\mathit{poly}(\mathit{Branch}(x, l, r)) &= (x\, y_{l2}\, y_{r2},\ (y_{l1} + y_{l2})(y_{r1} + y_{r2})), \\
\text{where } (y_{l1}, y_{l2}) &= \mathit{poly}(l), \\
(y_{r1}, y_{r2}) &= \mathit{poly}(r).
\end{aligned}$$

This is because we can transform the case of $\mathit{Branch}(x, l, r)$ into
$$\mathit{poly}(\mathit{Branch}(x, l, r)) = \begin{pmatrix} 0 & x\, y_{r2} \\ y_{r1} + y_{r2} & y_{r1} + y_{r2} \end{pmatrix} \begin{pmatrix} y_{l1} \\ y_{l2} \end{pmatrix} = \begin{pmatrix} 0 & x\, y_{l2} \\ y_{l1} + y_{l2} & y_{l1} + y_{l2} \end{pmatrix} \begin{pmatrix} y_{r1} \\ y_{r2} \end{pmatrix}.$$

The above equations satisfy the form of equation (4.1). Equation (4.1) seemingly does not cover the following multi-affine function:

$$f(t) = A_i\, f(\mathit{child}_i(t)) + b_i \quad \text{for all } i.$$

The constant-term vector $b_i$ may similarly contain components of the result of $f(\mathit{child}_j(t))$, where $j \neq i$. By using an augmented matrix and vector, however, we can represent the equation above as
$$\begin{pmatrix} f(t) \\ 1 \end{pmatrix} = \begin{pmatrix} A_i & b_i \\ 0 & 1 \end{pmatrix} \begin{pmatrix} f(\mathit{child}_i(t)) \\ 1 \end{pmatrix} \quad \text{for all } i. \tag{4.2}$$

This representation enables us to regard any multi-affine function as a multilinear function. We therefore call our target computations multilinear computations on trees.
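As a simple illustration of this augmented form, consider the node-counting function defined by $\mathit{count}(\mathit{Leaf}(x)) = 1$ and $\mathit{count}(\mathit{Branch}(x, l, r)) = \mathit{count}(l) + \mathit{count}(r) + 1$, which is multi-affine. For the left child, equation (4.2) instantiates to
$$\begin{pmatrix} \mathit{count}(t) \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & \mathit{count}(\mathit{child}_r(t)) + 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mathit{count}(\mathit{child}_l(t)) \\ 1 \end{pmatrix},$$
and symmetrically for the right child.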

4.2.2 Auxiliary Operators in Multilinear Computation

We can consider a multilinear computation at internal nodes to consist of two separate parts: matrix construction and matrix multiplication. In fact, these correspond to the Rake operation and the Compress operation, respectively. Recall the reduce computation. With only the primary operators, we can reduce a one-hole context to a root-to-hole spine. In multilinear computations, we can then construct a matrix for each internal node (except for the hole) of the root-to-hole spine. As a result, the root-to-hole spine reduces to a root-to-hole path (i.e., a list of nodes). This corresponds to applying Rake operations to all leaves (except for the hole) of the root-to-hole spine. Then, we can reduce the root-to-hole path to a single node by using matrix-matrix multiplication. Obviously, matrix-matrix multiplication corresponds to the Compress operation. Since matrix construction and matrix multiplication respectively correspond to Rake and Compress, by composing both we can derive auxiliary operators corresponding to the Shunt operation.

However, to formulate the auxiliary operators of reduce, we have to consider the payload values of hole nodes. The payload value of a hole node is necessary to calculate the result for the subtree rooted at the hole. If we reduced the root-to-hole path to a hole node with matrix multiplication alone, we would lose this payload value. We therefore have to keep payload values after matrix construction. Specifically, we consider an extended payload value that consists of a payload value and a matrix, to which a node is lifted by an auxiliary operator τ. In the matrix multiplication of a parent and a child, the payload-value part of the extended payload value of the child is kept for a later bottom-up computation. The matrix part of the extended payload value of an internal node summarizes the linear computations of its ancestors; this summary is applied to the result for the internal node. The initial value of the matrix part of each extended payload value is therefore an identity matrix I.

In summary, we obtain the auxiliary operators of reduce over SegTreeα in the following general form:

$$\begin{aligned}
\tau(x) &= (x, I), \\
\psi_l((x', M'), (x, M), y_r) &= (x',\ M A_l M'), \\
\psi_r(y_l, (x, M), (x', M')) &= (x',\ M A_r M'), \\
\varphi(y_l, (x, M), y_r) &= M\, k_B(y_l, x, y_r),
\end{aligned}$$

where $x$, $y_l$, and $y_r$ are implicitly used in $A_r$ and $A_l$. Note that this formalization itself is applicable to multilinear functions on $k$-ary trees. We can obtain similar general forms of the auxiliary operators of reduce, uAcc, and dAcc over $k$-ary trees.
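For instance, for $\mathit{poly}$ from Section 4.2.1, substituting the coefficient matrix $A_l$ and the branch computation $k_B$ derived there gives
$$\begin{aligned}
\psi_l((x', M'), (x, M), (y_{r1}, y_{r2})) &= \Bigl(x',\ M \begin{pmatrix} 0 & x\, y_{r2} \\ y_{r1} + y_{r2} & y_{r1} + y_{r2} \end{pmatrix} M'\Bigr), \\
\varphi((y_{l1}, y_{l2}), (x, M), (y_{r1}, y_{r2})) &= M \begin{pmatrix} x\, y_{l2}\, y_{r2} \\ (y_{l1} + y_{l2})(y_{r1} + y_{r2}) \end{pmatrix},
\end{aligned}$$
and $\psi_r$ is symmetric. This instantiation is our own worked example based on the general form above.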

Generalization This derivation of auxiliary operators is based only on equation (4.1) and the associativity of matrix multiplication. Equations (4.1) and (4.2) require only a commutative semiring for the domain of the components of matrices and vectors. This derivation can therefore be generalized to any commutative semiring. An algebra $(S, +, \times, 0, 1)$ is a commutative semiring if the following axioms are satisfied.

• The addition $+$ and the multiplication $\times$ are both associative (i.e., $a * (b * c) = (a * b) * c$ for any $a$, $b$, and $c$) and commutative (i.e., $a * b = b * a$ for any $a$ and $b$), where $*$ stands for either operation.

• The multiplication $\times$ is distributive over the addition $+$; i.e., $a \times (b + c) = a \times b + a \times c$ and $(a + b) \times c = a \times c + b \times c$ for any $a$, $b$, and $c \in S$.

• $0$ is the identity of $+$ and $1$ is that of $\times$; i.e., $0 + a = a + 0 = a$ and $1 \times a = a \times 1 = a$ for any $a \in S$.

• $0$ is the absorbing element with respect to $\times$; i.e., $0 \times a = a \times 0 = 0$.

The ring of integers $(\mathbb{Z}, +, \cdot, 0, 1)$, the semiring of natural numbers $(\mathbb{N}, +, \cdot, 0, 1)$, the tropical semirings $(\mathbb{Z}, \max, +, -\infty, 0)$ and $(\mathbb{Z}, \min, +, \infty, 0)$, and the Boolean semiring $(\{0, 1\}, \lor, \land, 0, 1)$ are popular examples found in computer programs. For multilinear computations over all these algebras, we can define the auxiliary operators uniformly in the form above.

4.3 Proposed Parallelizer

In this section, we describe the design and implementation of our parallelizer.

4.3.1 Fundamental Design

Aims of Implementation The fundamental aims of our parallelizer are the following two:

• Hiding tree skeletons from programmers.
• Accepting algorithm descriptions in a conventional style of C.

The main benefit of the first is that programmers become oblivious of the complicated APIs of tree skeletons; tree skeletons implemented in C++ work as a background service. In contrast to skeletons, we do not hide the parallel implementations of data structures. It is appropriate that programmers explicitly select and use parallel implementations of data structures because efficient parallel I/O at the construction of data structures often requires some additional consideration of hardware. Our parallelizer therefore generates, from each target of parallelization, a function that takes a tree data structure for the underlying tree skeletons. We call it an entry function. An entry function encapsulates tree skeletons as well as the operators for them, and pretends to be a sequential function that yields the same result as its source function except for the implementation of the argument tree data structures. Our parallelizer therefore enables programmers to focus only on the construction of tree data structures.

The main benefits of the second are ease of programming and interoperability with C/C++ code. Not all non-expert programmers who want parallel programs are comfortable with functional programming based on grammars. Many textbooks on algorithms still adopt imperative languages for describing algorithms and C-like pointers for representing trees. A restricted C is adequate to describe algorithms on trees and easy for non-expert programmers. The interoperability of algorithm descriptions with C/C++ is also important. We adopt C++ as the language of generated programs because tree skeletons have been implemented in C++. This is not only a historical reason: C++ has the best balance among expressiveness for high-level abstractions, portability across parallel machines, and the performance of compiled native code. For the same reasons, we suppose that programs calling entry functions are also described in C++. From the perspective of C++ programming, it is important that existing and/or standard C/C++ functions can be used in algorithm descriptions. Although not all C++ functions will be callable for syntactic reasons, a restricted C enables us to use many C/C++ functions in algorithm descriptions.

Design for Implementation To achieve the aims mentioned above, we have specifically adopted the following design choices:

• Our parallelizer is a C-to-C++ compiler that generates entry functions encapsulated for #include.
• The API for parallelization is based on compiler directives (i.e., #pragma).
• Targets of parallelization are syntactically restricted recursive functions on a pointer structure.
• Partial definitions of recursive functions over general (i.e., non-full) binary trees are considered.
• Our parallelizer deals with the computation of reduce.

The first is the fundamental design of our parallelizer. For the encapsulation of tree skeletons as entry functions, we also have to generate intermediate data types and operators. It is adequate that all generated components are aggregated into a separate include file. The second enables programmers to test algorithm descriptions as sequential C programs by using off-the-shelf C compilers, because standard C compilers ignore extensions made as compiler directives. The third is the primary design decision on algorithm descriptions. As found in loop parallelization, certain syntactic restrictions on targets are reasonable for automatic parallelization.

#include <stddef.h>   /* for NULL */
#pragma commSemiring max "+" NINF 0
extern int NINF;
extern int max(int a, int b);

typedef struct BNint {
  int v;
  struct BNint *l, *r;
} BNint;

typedef struct pair_int {
  int _1, _2;
} pair_int;

pair_int mis(BNint *t) {
  pair_int ret, l = {NINF, 0}, r = {NINF, 0};
  if (t->l != NULL) l = mis(t->l);
  if (t->r != NULL) r = mis(t->r);
  ret._1 = t->v + l._2 + r._2;
  ret._2 = max(l._1, l._2) + max(r._1, r._2);
  return ret;
}

Figure 4.1: Example of algorithm descriptions. mis calculates the maximum independent sum of a given tree.

The fourth is relevant to handling pointers in a conventional style of C. Algorithm descriptions for trees represented with C-like pointers generally lead to recursive functions over general binary trees. In C, it is natural to regard recursive functions over full binary trees as partial definitions. We therefore use skeletons over general binary trees (see Section 4.4) and deal with partial definitions of recursive functions. The last is relevant to the simplification of the implementation; as mentioned in Section 4.2, operator derivation for uAcc and dAcc is straightforward.

Example Use Figures 4.1 and 4.2 show a concrete example of using our parallelizer. An input to our parallelizer is a program that C compilers can compile separately, as shown in Figure 4.1. A program generated by our parallelizer (e.g., mis-tree.cpp) is used through #include, as shown in Figure 4.2. A generated entry function has the same name as its source function (e.g., mis in Figure 4.1) but is contained in the parallel_dp namespace to avoid name collisions. The type BNint represents trees as pointer structures for sequential computation, and the type binary_tree represents trees for the underlying reduce. By giving it data of type binary_tree instead of BNint, we can use the entry function parallel_dp::mis like the source function mis. A declaration for helping parallelization is a compiler directive such as #pragma commSemiring ..., whose meaning is described later.

4.3.2 Compiler Directives for Parallelization

Our parallelizer provides three kinds of declarations for helping parallelization in the form of #pragma.

Declaration of Commutative Semirings The most important API in our parallelizer is the declaration of commutative semirings. As mentioned in Section 4.2, the formalization of operator derivation is generalized to any commutative semiring. Our parallelizer therefore generates the operators for reduce on the basis of the commutative semirings declared.

#include  /* omitted */
#include  /* omitted */
#include "mis-tree.cpp"

int main(int argc, char **argv) {
  config::init(argc, argv);

  binary_tree *t = binary_tree::read_from_file("data");
  pair_int result = parallel_dp::mis(*t);

  config::finalize();
  return 0;
}

Figure 4.2: Example of using generated code. mis-tree.cpp is an output file that our parallelizer generates from the program shown in Figure 4.1.

By using the syntax #pragma commSemiring plus times zero one, we can declare a commutative semiring (S, plus, times, zero, one). The domain S is not declared explicitly because we can find the domains (i.e., types) of the operators in the context of their use. Our parallelizer thus can easily cope with the overloading of built-in operators and the type conversion of identity symbols. To use built-in operators as operators of commutative semirings, we have to write them as string literals, as shown in Figure 4.1. The commutative semirings of #pragma commSemiring "+" "*" 0 1 and #pragma commSemiring "|" "&" 0 1 are implicitly predefined as built-in ones.

We do not check whether declared commutative semirings satisfy the axioms of algebra. This is because the standard representations of numbers and arithmetic in computers do not implement the axioms of algebra strictly. For example, arithmetic overflow and rounding errors make it difficult to implement the axioms of algebra strictly. We therefore assume approximate axioms of algebra. What approximate axioms mean is, for example, that we use a sufficiently small value in the calculation process as the absorbing element −∞.
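As another example of the same declaration syntax, a computation over the tropical min-plus semiring could be declared as follows; the symbols INF and min are user-supplied and hypothetical here.

#pragma commSemiring min "+" INF 0
extern int INF;                 /* a sufficiently large value standing in for +infinity */
extern int min(int a, int b);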

Declaration of Pure Functions

Coefficient matrices in multilinear computations can contain constants. To enrich the variety of constants, pure functions are useful because the applications of pure (i.e., side-effect-free) functions to payload values and invariant symbols can also be regarded as constants. In particular, predicates on payload values, such as all and any in Figure 4.4, are almost essential in multilinear computations over the Boolean semiring. Our parallelizer therefore provides an API to declare pure functions. The syntax #pragma pure f1 f2 ... declares that the symbols f1, f2, ... are pure functions.

We do not check the purity of declared functions, for two reasons. The first is that we assume function definitions to be unknown, i.e., external. The declaration of pure functions is provided for the reuse of existing functions; if our parallelizer required programmers to include existing functions in a compile unit, their reusability would be spoiled. The second is that it is difficult to define side effects reasonably. Although side effects generally mean any effects of functions other than return values, this definition is impractical in C/C++, where we can observe bare runtime environments. For example, under this definition, math functions such as sin have side effects because of errno. Moreover, it is controversial whether logging independent of the main calculation has to be considered a side effect. Our parallelizer therefore delegates the decision on pure functions to programmers.

pair_int singleton(BNint *t) {
  pair_int ret, l = {0,0}, r = {0,0};
  if (t->l != NULL && t->r == NULL) { l = singleton(t->l); r._2 = 1; }
  if (t->l == NULL && t->r != NULL) { l._1 = 1; r = singleton(t->r); }
  if (t->l != NULL && t->r != NULL) { l = singleton(t->l); r = singleton(t->r); }
  ret._1 = l._1 + r._1;
  ret._2 = l._2 + r._2;
  return ret;
}

Figure 4.3: singleton, which counts up the siblingless nodes in a given tree.

Suppressing Code Generation

Our parallelizer generates a C++ program file encapsulated for #include from a given C program. Basically, it contains all symbols in the given C program. This dirties the namespaces in which generated programs are included. For example, the type BNint would be used only for describing the function mis. After debugging mis as a sequential C program, we would not use BNint anymore. Then, BNint is redundant for the includers of the generated program. To clean up these redundant symbols, our parallelizer provides an API for suppressing code generation. Through the syntax #pragma exclude name1 name2 ..., we can exclude the symbols name1, name2, ... from generated programs. As a tricky application of this functionality, we can overwrite symbols with macros on the includer side. For example, we can redefine max and NINF as macros on the includer side of the program generated from the program described in Figure 4.5.
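As a concrete illustration of this trick (a hedged sketch: the generated file name and the value chosen for NINF below are assumptions, not taken from the implementation), an includer of the program generated from Figure 4.5 could redefine the excluded symbols as macros before including it:

#include <limits.h>
/* Includer-side redefinition of the symbols excluded via #pragma exclude. */
#define NINF (INT_MIN / 2)                /* an approximate -infinity           */
#define max(a, b) ((a) > (b) ? (a) : (b)) /* replaces the extern max function   */
#include "mis_fbt-tree.cpp"               /* hypothetical generated output file */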

4.3.3 Algorithm Descriptions

We describe the expressiveness of and restrictions on the algorithm descriptions that our parallelizer accepts.

Expressiveness of Algorithm Descriptions

Recursive functions like mis in Figure 4.1 are a typical description of algorithms on binary trees. In addition to such a typical description, our parallelizer accepts several kinds of algorithm descriptions. For example, singleton, described in Figure 4.3, calculates the number of singleton (i.e., sibling-less) nodes of a given binary tree. It adopts a style that tests each kind of node successively. Figure 4.4 shows another style of traversal on binary trees. left_lean traverses a given binary tree in a left-leaning manner: if an internal node has no left child, its right child is not traversed. This traversal assumes the case where every internal node has a left child. Our parallelizer also accepts traversals on full binary trees, as shown in Figure 4.5. mis_fbt returns an unspecified value in the case where an internal node has a single child. That is, mis_fbt is partially defined. Our parallelizer generates and uses operators that raise runtime exceptions for undefined cases. Generated entry functions thus handle full binary trees successfully and fail safely on general binary trees. We call the operators for undefined cases halt operators.

#pragma pure all any
extern int all(int);
extern int any(int);

pair_int left_lean(BNint *t) {
  pair_int ret, l = {1,0}, r = {1,0};
  if (t->l != NULL) {
    l = left_lean(t->l);
    if (t->r != NULL) { r = left_lean(t->r); }
  }
  ret._1 = all(t->v) & l._2 & r._2;
  ret._2 = any(t->v) | l._1 | r._1;
  return ret;
}

Figure 4.4: left_lean, which performs a left-leaning traversal. Pure functions all and any are used as predicates of payload values.

Restrictions on Algorithm Descriptions

Our parallelizer has three kinds of restrictions on algorithm descriptions. The first is derived from parallel computing. For example, impure function calls and writes to global variables are not allowed. The second is derived from the formalization of operator derivation. The struct types of return values have to consist of a constant number of fields of the same type. The condition part of if statements must not use the results of recursive calls. The descent and ascent in recursive calls must be moves between parents and children. The third is derived from the implementation of our parallelizer. It includes miscellaneous restrictions for simplifying the case extraction and symbolic execution described in Section 4.3.4. For example, the struct for nodes must consist only of the payload value v, the left-child link l, and the right-child link r. Target recursive functions must not take NULL. Testing the kinds of nodes is limited to NULL checking of child links. Control structures are limited to if statements. Although these restrictions are rather severe, we consider that they are overall reasonable, given the first and second kinds of restrictions.

4.3.4 Implementation Overview

We implemented our parallelizer on top of COINS. Program analysis and transformation were implemented at the HIR level, where the HIR almost preserves the AST of a given C program. Figure 4.6 shows the compilation pipeline of our parallelizer. In the rest of this subsection, we briefly describe each compilation step along the pipeline described in Figure 4.6, using the program described in Figure 4.1 as a running example.

Pragma Processing

After the C frontend constructs HIRs, we first find and interpret #pragma. If no #pragma commSemiring is given, our parallelizer uses only the built-in semirings. If a user-defined function is used in #pragma commSemiring, we generate a unique operator for the function and then replace its function calls with binary expressions internally in HIRs. As a result, we can handle the expressions over all commutative semirings uniformly on HIRs. Note that binary expressions with

1 http://coins-compiler.sourceforge.jp/

#pragma commSemiring max "+" NINF 0
#pragma exclude max NINF
extern int NINF;
extern int max(int a, int b);

pair_int mis_fbt(BNint *t) {
  pair_int ret;
  if (t->l == NULL && t->r == NULL) {
    ret._1 = NINF;
    ret._2 = 0;
  } else {
    pair_int l = mis_fbt(t->l), r = mis_fbt(t->r);
    ret._1 = t->v + l._2 + r._2;
    ret._2 = max(l._1, l._2) + max(r._1, r._2);
  }
  return ret;
}

Figure 4.5: mis_fbt, which calculates the maximum independent sum of a given full binary tree.

generated operators are transformed back to function calls just before code generation. For example, after our parallelizer processes #pragma commSemiring in Figure 4.1, the max function is interpreted as an operator. Then, our parallelizer assumes that the max function and the + operator constitute a commutative semiring. The symbols declared in #pragma pure and #pragma exclude are registered in this step. The former information is used in matrix extraction and the latter in code generation.

Case Extraction

Our parallelizer tries to parallelize all given function definitions. It first constructs, from each target definition, program slices for the four cases that correspond to the four primary operators for reduce over general binary trees (see Section 4.4). Specifically, it constructs the slice s00 for the case of no child (i.e., Leaf), the slice s10 for the case of only the left child (i.e., Left), the slice s01 for the case of only the right child (i.e., Right), and the slice s11 for the case of both children (i.e., Branch). In the rest of the compilation process, we focus on the mis function definition. For example, the following is the slice s10 constructed from mis.

pair_int mis(BNint *t) {
  pair_int ret, l = {NINF,0}, r = {NINF,0};
  l = mis(t->l);
  ret._1 = t->v + l._2 + r._2;
  ret._2 = max(l._1, l._2) + max(r._1, r._2);
  return ret;
}

In this step, we must take account of the short-circuit logical operators && and || in COINS. The C frontend translates both into if statements containing goto/label statements. Because the goto/label statements complicate control flow, we eliminate them by replicating the then/else parts of the if statements containing them. As a result, the implementation of case extraction becomes simple.

Figure 4.6: Compilation pipeline of our parallelizer.

Symbolic Execution

Program slices represent computations in the form of statements. Computations in the form of pure expressions are more appropriate for operators of skeletons. We therefore transform the program slices s00, s01, s10, and s11 respectively into expressions e00, e01, e10, and e11 through symbolic execution. Specifically, symbolic execution updates a symbolic environment (i.e., a mapping from variables to expressions) in a step-by-step abstract interpretation where assignments cause symbolic substitution and if statements construct ternary conditional expressions. The resultant expression is a return expression to which we apply symbolic substitution based on the final symbolic environment. After we obtain a single expression corresponding to a slice, we abstract the payload value of the current node, the result of the recursive call for the left child, and that for the right child by using the special variables v, l, and r. These special variables correspond to the formal parameters of operators for skeletons. Finally, the constructed expressions e00, e01, e10, and e11 become equivalent to the body expressions of the primary operators k00, k01, k10, and k11 for reduce over general binary trees (see Section 4.4). For example, the s10 of mis is transformed into the following e10:

_pair_int(v + l._2 + 0, max(l._1, l._2) + max(NINF, 0)),

where the function _pair_int is a helper function that works as the constructor of pair_int. Our parallelizer generates such helper functions for convenience. After e00, e01, e10, and e11 are obtained, we validate them. If we find local variables in expressions, these values are unspecified. We regard such expressions as undefined cases and do not use the expressions themselves in later steps. We, however, register undefined cases for generating halt operators.
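To make the step above concrete, the following minimal sketch (our own illustration in C++; the actual parallelizer works on HIR expressions, not strings) mimics how assignments rewrite their right-hand sides under the current symbolic environment and rebind their left-hand sides, and how the final environment is applied to the return expression. Flattened names such as l1 and ret1 stand for l._1 and ret._1.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Env = std::map<std::string, std::string>;

// Naive textual substitution of bound variables; a real implementation
// substitutes over HIR expressions rather than strings.
std::string subst(std::string e, const Env& env) {
  for (const auto& [v, rhs] : env)
    for (std::size_t p = 0; (p = e.find(v, p)) != std::string::npos; p += rhs.size() + 2)
      e.replace(p, v.size(), "(" + rhs + ")");
  return e;
}

int main() {
  // Assignments of the slice s10 of mis, with the recursive call already
  // abstracted to the special variable l and the missing right child
  // replaced by its initializer {NINF, 0}.
  std::vector<std::pair<std::string, std::string>> stmts = {
    {"r1", "NINF"}, {"r2", "0"},
    {"ret1", "v + l2 + r2"},
    {"ret2", "max(l1, l2) + max(r1, r2)"},
  };
  Env env;
  for (const auto& [lhs, rhs] : stmts) env[lhs] = subst(rhs, env);
  // Prints an expression equivalent to e10 (modulo parentheses).
  std::cout << "_pair_int(" << env["ret1"] << ", " << env["ret2"] << ")\n";
}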

Matrix Extraction

To obtain auxiliary operators, we have to formalize a target function as a multilinear computation. This corresponds to transforming the primary operators used for internal nodes, i.e., e01, e10, and e11, into a matrix-vector product like equation (4.1). That is, we have to extract coefficient matrices. By assuming the linearity of expressions, we can implement the extraction of coefficient matrices simply as partial differentiation over a commutative semiring. If we obtain a nonlinear partial derivative, we judge that matrix extraction fails. That is, partial differentiation also works as linearity checking. Specifically, we extract four coefficient matrices Al, Ar, Asl, and Ars that are respectively used in the auxiliary operators ψl, ψr, ψsl, and ψrs for reduce over general binary trees (see Section 4.4). Al consists of the partial derivatives of e11 with respect to l; Ar consists of the partial derivatives of e11 with respect to r; Ars consists of the partial derivatives of e01 with respect to r; Asl consists of the partial derivatives of e10 with respect to l. More precisely, the augmented coefficient matrices in equation (4.2) are extracted. The extraction of the bi part is immediate: it is sufficient to substitute l or r with a null vector. For example, from the e10 of mis, we can extract the following augmented matrix:

[ NINF   v     NINF ]
[  0     0     NINF ]
[ NINF  NINF    0   ]

The (1, 2)-component here is the partial derivative of the first member v + l._2 + 0 of e10 with respect to the second member l._2 of l. Our parallelizer tries matrix extraction successively with respect to each commutative semiring until all coefficient matrices are successfully obtained. In this step, our parallelizer also performs constant folding over the target commutative semiring. Although this constant folding would improve the runtime performance of generated operators, its main aim is to simplify the construction of abstract matrices in matrix materialization.
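As a sanity check (our own illustration, not part of the original text), multiplying the augmented matrix above, over the (max, "+") semiring, by the augmented vector of l reproduces e10; here the semiring zero is NINF and the semiring one is the integer 0:

[ NINF   v    NINF ]   [ l._1 ]   [ max(NINF + l._1, v + l._2, NINF + 0) ]   [ v + l._2        ]
[  0     0    NINF ] ⊗ [ l._2 ] = [ max(0 + l._1, 0 + l._2, NINF + 0)    ] = [ max(l._1, l._2) ]
[ NINF  NINF   0   ]   [  0   ]   [ max(NINF + l._1, NINF + l._2, 0 + 0) ]   [ 0               ]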

Matrix Materialization

Extracted matrices often consist mostly of the 0 and 1 of a semiring. In particular, the augmented coefficient matrices, which our parallelizer actually extracts, contain 0 and 1 by definition. Some components of such matrices remain 0 or 1 under matrix-matrix multiplication. Then, we can eliminate such invariant components from dynamic (i.e., runtime) data and instead embed constants into the code of auxiliary operators. In the step of matrix materialization, we find such invariant components and define the concrete data types of matrices. To determine the invariant components of 0 or 1, we have only to distinguish 0, 1, and dynamic values, i.e., the values of variables. Letting V be the abstract value of variables, we consider a three-valued commutative semiring ({0, 1, V}, +, ×, 0, 1), where V + a = V for any a ∈ {0, 1, V} and V × V = V. We construct abstract matrices over ({0, 1, V}, +, ×, 0, 1) from the extracted matrices. Then, the supremum of all possible matrix products derived from the abstract matrices suffices for determining which components are necessary as dynamic data. To calculate supremums, we define the join operator ⊔ over {0, 1, V} as V ⊔ c = V and c ⊔ c′ = V for any distinct c, c′ ∈ {0, 1}. In summary, we can calculate the supremum of abstract matrix products as the least fixed point of matrix multiplication over ({0, 1, V}, +, ×, 0, 1) and matrix updating over ({0, 1, V}, ⊔) with all abstract matrices. For example, the supremum of the abstract matrix products derived from the augmented coefficient matrices of mis is the following:

[ V  V  0 ]
[ V  V  0 ]
[ 0  0  1 ]

Then, the four V components suffice as the dynamic data of a matrix. Our parallelizer therefore defines the data type of matrices as the following struct.

struct _mis_mat { int _0_0, _0_1, _1_0, _1_1; };

The member names here correspond to the zero-based numbering of component indices.
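The fixed-point computation described above can be sketched compactly in C++ (our own illustration; the real analysis joins the products of all four extracted matrices, whereas this sketch abstracts only the augmented matrix of mis shown earlier, and it conservatively maps the sum 1 + 1 to V because the concrete sum need not be 0 or 1):

#include <array>
#include <cstdio>

enum AV { ZERO, ONE, VAR };  // abstract 0, abstract 1, dynamic value V

AV add(AV a, AV b) {
  if (a == VAR || b == VAR) return VAR;
  if (a == ONE && b == ONE) return VAR;  // conservative: 1 + 1 need not be 0 or 1
  return (a == ONE || b == ONE) ? ONE : ZERO;
}
AV mul(AV a, AV b) {
  if (a == ZERO || b == ZERO) return ZERO;
  return (a == VAR || b == VAR) ? VAR : ONE;
}
AV join(AV a, AV b) { return a == b ? a : VAR; }

using M = std::array<std::array<AV, 3>, 3>;

M mulM(const M& x, const M& y) {
  M r{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) {
      AV s = ZERO;
      for (int k = 0; k < 3; ++k) s = add(s, mul(x[i][k], y[k][j]));
      r[i][j] = s;
    }
  return r;
}
M joinM(const M& x, const M& y) {
  M r{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) r[i][j] = join(x[i][j], y[i][j]);
  return r;
}

int main() {
  // Abstraction of the augmented matrix of mis (NINF -> 0, the constant 0 -> 1, v -> V).
  M a   = {{{ZERO, VAR, ZERO}, {ONE, ONE, ZERO}, {ZERO, ZERO, ONE}}};
  M sup = {{{ONE, ZERO, ZERO}, {ZERO, ONE, ZERO}, {ZERO, ZERO, ONE}}};  // identity
  for (;;) {                       // least fixed point of join/multiply
    M next = joinM(sup, mulM(sup, a));
    if (next == sup) break;
    sup = next;
  }
  for (auto& row : sup) {          // prints: V V 0 / V V 0 / 0 0 1
    for (AV v : row) std::printf("%c ", "01V"[v]);
    std::puts("");
  }
}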

Operator Generation and Code Generation

Since e00, e01, e10, and e11 are equivalent to the bodies of the primary operators, it is immediate to generate the primary operators. After the augmented coefficient matrices and matrix data types are obtained, we can generate the auxiliary operators simply by using symbolic matrix multiplication, following the general forms described in Section 4.2. Note that the special variables v, l, and r are captured as the formal parameters of generated operators. The final code generation is straightforward. Because of the high compatibility between C89 and C++03, we have been able to reuse the C code generator of COINS for code generation. The code generation of helper functions and matrix data types is completely performed by the C code generator. We implemented an ad hoc C++ code generator for entry functions and operator definitions as a simple extension of the C code generator because they necessitate C++ functionality. From the augmented coefficient matrix of mis shown above, the following function (more precisely, a member function of a C++ function object), which corresponds to ψrs, is generated.

hir_t_void operator()(struct _mis_mat &m, const hir_t_int v) const {
  hir_s_private hir_v_auto hir_t_int _var43;
  hir_s_private hir_v_auto hir_t_int _var45;
  hir_s_private hir_v_auto hir_t_int _var47;
  hir_s_private hir_v_auto hir_t_int _var49;
  _var43 = (hir___ADD(v, m._1_0));
  _var45 = (hir___ADD(v, m._1_1));
  _var47 = max(m._0_0, m._1_0);
  _var49 = max(m._0_1, m._1_1);
  m._0_0 = _var43;
  m._0_1 = _var45;
  m._1_0 = _var47;
  m._1_1 = _var49;
}

The matrix argument m is overwritten with the return value because the passed data is unnecessary in reduce. Right before code generation, our parallelizer excludes symbols inconvenient for C++ programs. For example, wchar_t is a typedefed type in C but a keyword in C++. If wchar_t were generated as in C, the resulting programs would contain a syntax error in C++. We therefore have to exclude wchar_t from code generation. Similarly, we exclude symbols specified in #pragma exclude from code generation.

4.4 Underlying Binary Tree Skeleton

As mentioned in Section 4.3, recursive functions in a conventional style of C lead to recursive computations over general binary trees. We therefore use reduce over general binary trees as the underlying skeleton of generated entry functions. In this section, we describe the definition, the specialization, and the implementation of reduce over general binary trees.

4.4.1 Definition

We first define general (i.e., non-full) binary trees by the following grammar:

GBinTreeα = Branch(x, GBinTreeα, GBinTreeα),
GBinTreeα = Left(x, GBinTreeα),
GBinTreeα = Right(x, GBinTreeα),
GBinTreeα = Leaf(x).

Left denotes a node having only the left child and Right denotes a node having only the right child. We can define reduce over GBinTreeα as

reduce(k11, k10, k01, k00, Branch(x, l, r)) = k11(reduce(k11, k10, k01, k00, l), x, reduce(k11, k10, k01, k00, r)),
reduce(k11, k10, k01, k00, Left(x, l)) = k10(reduce(k11, k10, k01, k00, l), x),
reduce(k11, k10, k01, k00, Right(x, r)) = k01(x, reduce(k11, k10, k01, k00, r)),
reduce(k11, k10, k01, k00, Leaf(x)) = k00(x).

Parameters k11, k10, k01, and k00 are respectively the interpretations of Branch, Left, Right, and Leaf .
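A minimal sequential sketch of this reduce in C++ (our own illustration, not the library implementation; the node type BN and the use of std::function are simplifying assumptions):

#include <cstdio>
#include <functional>

template <class A>
struct BN { A v; BN* l; BN* r; };  // general (non-full) binary-tree node

template <class A, class B>
B reduce(const std::function<B(B, A, B)>& k11,  // Branch
         const std::function<B(B, A)>& k10,     // Left
         const std::function<B(A, B)>& k01,     // Right
         const std::function<B(A)>& k00,        // Leaf
         const BN<A>* t) {
  if (t->l && t->r) return k11(reduce(k11, k10, k01, k00, t->l), t->v,
                               reduce(k11, k10, k01, k00, t->r));
  if (t->l)         return k10(reduce(k11, k10, k01, k00, t->l), t->v);
  if (t->r)         return k01(t->v, reduce(k11, k10, k01, k00, t->r));
  return k00(t->v);
}

int main() {
  BN<int> a{1, nullptr, nullptr}, b{2, &a, nullptr}, root{3, &b, nullptr};
  int size = reduce<int, int>(
      [](int l, int, int r) { return l + 1 + r; },  // Branch
      [](int l, int) { return l + 1; },             // Left
      [](int, int r) { return 1 + r; },             // Right
      [](int) { return 1; },                        // Leaf
      &root);
  std::printf("%d\n", size);                        // prints 3
}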

We can define the segmented version of GBinTreeα as

GSegTreeα = Branch(GContextα, GSegTreeα, GSegTreeα),
GSegTreeα = Left(GContextα, GBinTreeα),
GSegTreeα = Right(GContextα, GBinTreeα),
GSegTreeα = Leaf(GBinTreeα),

GContextα = Branch1(x, GContextα, GBinTreeα),
GContextα = Branch2(x, GBinTreeα, GContextα),
GContextα = Left(x, GContextα),
GContextα = Right(x, GContextα),
GContextα = Hole(x).

We can also define reduce over GSegTreeα.

reduce : (b × a × b → b) × (b × a → b) × (a × b → b) × (a → b)
         × (a → c) × (b × c × b → b) × (b × c → b) × (c × b → b)
         × (c × c × b → c) × (b × c × c → c) × (c × c → c) × (c × c → c)
         × GSegTreeα → b

reduce(k11, k10, k01, k00, τ, ϕ11, ϕ10, ϕ01, ψl, ψr, ψsl, ψrs, t) = red(t), where

red(Branch(c, l, r)) = ϕ11(red(l), redCtx(c), red(r)),
red(Left(c, l)) = ϕ10(red(l), redCtx(c)),
red(Right(c, r)) = ϕ01(redCtx(c), red(r)),
red(Leaf(t)) = reduce(k11, k10, k01, k00, t),
redCtx(Branch1(x, l, r)) = ψl(redCtx(l), τ(x), reduce(k11, k10, k01, k00, r)),
redCtx(Branch2(x, l, r)) = ψr(reduce(k11, k10, k01, k00, l), τ(x), redCtx(r)),
redCtx(Left(x, l)) = ψsl(redCtx(l), τ(x)),
redCtx(Right(x, r)) = ψrs(τ(x), redCtx(r)),
redCtx(Hole(x)) = τ(x),

and the algebraic conditions are as follows:

k11(l, x, r) = ϕ11(l, τ(x), r),
ϕ11(ϕ11(l′, y′, r′), y, r) = ϕ11(l′, ψl(y′, y, r), r′),
ϕ11(ϕ10(l′, y′), y, r) = ϕ10(l′, ψl(y′, y, r)),
ϕ11(ϕ01(y′, r′), y, r) = ϕ01(ψl(y′, y, r), r′),
ϕ11(l, y, ϕ11(l′, y′, r′)) = ϕ11(l′, ψr(l, y, y′), r′),
ϕ11(l, y, ϕ10(l′, y′)) = ϕ10(l′, ψr(l, y, y′)),
ϕ11(l, y, ϕ01(y′, r′)) = ϕ01(ψr(l, y, y′), r′),

k10(l, x) = ϕ10(l, τ(x)),
ϕ10(ϕ11(l′, y′, r′), y) = ϕ11(l′, ψsl(y′, y), r′),
ϕ10(ϕ10(l′, y′), y) = ϕ10(l′, ψsl(y′, y)),
ϕ10(ϕ01(y′, r′), y) = ϕ01(ψsl(y′, y), r′),

k01(x, r) = ϕ01(τ(x), r),
ϕ01(y, ϕ11(l′, y′, r′)) = ϕ11(l′, ψrs(y, y′), r′),
ϕ01(y, ϕ10(l′, y′)) = ϕ10(l′, ψrs(y, y′)),
ϕ01(y, ϕ01(y′, r′)) = ϕ01(ψrs(y, y′), r′).

Auxiliary operators ϕ11, ϕ10, ϕ01, ψl, ψr, ψsl, and ψrs are respectively the interpretations of Branch, Left, and Right for GSegTreeα, and Branch1, Branch2, Left, and Right for GContextα. From the perspective of tree reduction, ψsl and ψrs correspond to the Compress operation. However, ψsl and ψrs contain matrix construction, which corresponds to the Rake operation. We can therefore consider that ψsl and ψrs respectively correspond to Shunt_l and Shunt_r, where the Rake operation is applied to a missing leaf in the form of matrix construction with no value of recursive calls. To the best of our knowledge, a skeleton identical to reduce over GBinTreeα has not been presented. However, we can derive its definition (including its algebraic conditions) straightforwardly from the notion of segmented trees and that of tree contraction operations.

4.4.2 Specialization to Multilinear Computations

We can specialize reduce over GBinTreeα to multilinear computations. Let Vβ be the type of vectors whose components are of type β and Mβ be the type of matrices whose components are of type β. First, from the definition (4.1) of multilinear functions, we can determine the types of the primary operators as follows.

k11 : Vβ × α × Vβ → Vβ
k10 : Vβ × α → Vβ
k01 : α × Vβ → Vβ
k00 : α → Vβ

Next, as mentioned in Section 4.2, we consider extended payload values of type α × Mβ. On the basis of the meaning of extended payload values, we can define the following auxiliary operators:

ϕ11 : Vβ × (α × Mβ) × Vβ → Vβ
ϕ11(yl, (x, M), yr) = M k11(yl, x, yr),

ϕ10 : Vβ × (α × Mβ) → Vβ
ϕ10(yl, (x, M)) = M k10(yl, x),

ϕ01 : (α × Mβ) × Vβ → Vβ
ϕ01((x, M), yr) = M k01(x, yr).

The operators above are simple instances of operators for reduce, not specialized instances, because we obtain them through type substitution. We can specialize the definitions of ψl, ψr, ψsl, and ψrs. An important point of the specialization is that we introduce extended payload values only for retaining the payload values of hole nodes. If we store the payload value of the hole of a one-hole context on its root, we do not have to introduce extended payload values. In this case, the four auxiliary operators can be simplified as follows:

ψl : Mβ × α × Vβ → Mβ
ψl(M, x, yr) = Al M,

ψr : Vβ × α × Mβ → Mβ
ψr(yl, x, M) = Ar M,

ψsl : Mβ × α → Mβ
ψsl(M, x) = Asl M,

ψrs : α × Mβ → Mβ
ψrs(x, M) = Ars M,

where x, yl, and yr are implicitly used in Al, Ar, Asl, and Ars. Then, the auxiliary operator τ to lift payload values becomes unnecessary because we reduce a one-hole context in a sequential bottom-up manner. Instead, we lift every hole node to the identity matrix I. We thus can reduce a one-hole context to a single matrix by using ψl, ψr, ψsl, ψrs, and the primary operators. We construct an extended payload value for each hole node by pairing the matrix to which a one-hole context reduces with the payload value of its hole. After that, we can reduce a global tree successfully by using ϕ11, ϕ10, and ϕ01.
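For concreteness, a possible reading of this reduction (our own illustration): for the one-hole context Branch1(x1, Left(x2, Hole(x3)), t) with yr = reduce(k11, k10, k01, k00, t), the hole is lifted to the identity matrix I and the ancestors multiply their coefficient matrices onto it:

M = ψl(ψsl(I, x2), x1, yr) = Al Asl I,

and the extended payload value attached to the hole is (x3, M).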

4.4.3 Library Implementation

We implemented reduce over GSegTreeα as a C++ library with MPI, on the basis of an array-based implementation [Mat07a] of tree skeletons. Our implementation is a straightforward extension of the original implementation [Mat07a] for full binary trees to one for general binary trees. Our library implementation utilizes C++ templates extensively. C++ templates enabled us to implement the specialization to multilinear computations easily. What we did for the specialization was only to specialize the reduction of one-hole contexts. After including it in the basic implementation, the instantiation with specialized operator types guides reduce to the specialized implementation. Our library implementation assumes programming based upon the SPMD (single-program multiple-data) model with MPI, as in existing data-parallel skeleton libraries [SM14a, EM14]. The functions config::init and config::finalize that our library provides (see Figure 4.2) are indeed wrappers of MPI_Init and MPI_Finalize. Our library is assumed to be used only from entry functions. Entry functions should select an appropriate implementation of skeletons according to the user's input. Because entry functions, in which generated operators are used implicitly from the programmer's point of view, take only a binary tree data structure, it is natural that the implementations of binary tree data structures specify those of skeletons. If we can implement entry functions as generic functions, the code generation of our parallelizer becomes simple and the generated programs become convenient in C++. If there is a clear interface between entry functions and tree skeletons, we can achieve loose coupling between our parallelizer and our library. To achieve these, we have designed our library to enable calling reduce via tree data structures. As a result, we can implement an entry function f as follows.

template <class BinTree>
struct vector_type f(const BinTree& t) {
  return BinTree::skeletons::reduce( /* operators omitted */ , t);
}

2 http://www.mpi-forum.org/

[Plot: execution time in seconds (y-axis, 0–4.5) versus the number of processes (x-axis, 0–25) for par and seq.]

Figure 4.7: Execution time of reduce for mis described in Figure 4.1, where seq means reduce over GBinTreeint and par means reduce over GSegTreeint. Time for tree construction is not contained.

On the basis of the genericity of templates, f is generic with respect to the type parameter BinTree. The given BinTree determines the implementation of reduce.

4.5 Preliminary Experiments on Parallel Execution

We tested the parallel execution based on our underlying skeleton library. We measured the execution time of the entry function generated from mis described in Figure 4.1. We used a three-node cluster, each of whose nodes was equipped with two AMD Opteron 2376 (2.3 GHz, 4 cores) processors and 8 GB of memory (ECC DDR2-667) and connected to the others with 1000BASE-T. Each node was running Linux 3.2.0 (64-bit). We used g++ 4.6.3 with the O3 optimization for native compilation and Open MPI 1.4.3. As an input, we used a randomly generated tree that had 2^28 nodes. To make the execution time easy to predict, we made the segment sizes of its segmented version almost equal. The number of segments was 48. Figure 4.7 shows the measurements of the execution time of reduce over GBinTreeint (seq for short) and that of reduce over GSegTreeint (par for short). The results conformed to our expectations. par with a single process was a little slower than seq. We attribute this slowdown to the auxiliary operators ψl, ψr, ψsl, and ψrs. These perform matrix-matrix multiplication, while the primary operators k11, k10, and k01 perform matrix-vector multiplication. This means that par has unavoidable overhead. This overhead is proportional to the height of an input tree and the size of the matrices. Although the overhead that we observed experimentally was small and acceptable, this quantity is not guaranteed in general.

4.6 Benefits of Hiding Tree Skeletons

The main aim of our parallelizer is to hide tree skeletons from programmers. In this section, we describe the benefits and importance of hiding tree skeletons.

Table 4.1: Relationship between degree of trees and entanglement of operators for reduce.

                              Full binary    Full k-ary    Binary    k-ary
No. of primary operators           2              2           4      2^k
No. of auxiliary operators         4             k+2          8      (k+2)·2^(k−1)
No. of algebraic conditions        3             k+1         15      (k·2^(k−1)+1)(2^k − 1)

4.6.1 Entanglement of Operators for Tree Skeletons

The main reason to hide tree skeletons is that operators for tree skeletons are too complicated for programmers. The algebraic conditions of reduce, uAcc, and dAcc over SegTreeα are actually difficult, as mentioned in Section 4.1. However, the difficulty is more than that. The number of operators and of the algebraic conditions on them, which we call the entanglement of operators, increases sharply with the degree of trees. This is a serious problem. Table 4.1 shows the entanglement of the operators for reduce straightforwardly generalized to k-ary trees. As seen from Table 4.1 and the definition of reduce over GSegTreeα, the entanglement of operators in the case of general k-ary trees is terrible and unrealistic for human use. As seen from the definitions of reduce, most operators correspond one-to-one to the kinds of nodes. To keep the definitions as general as possible, the higher-order functions have to take distinct operators for all kinds of nodes. Since any full k-ary tree has only two kinds of nodes, the case of full k-ary trees is relatively tractable. This situation is not limited to tree skeletons. However, the transformation of a tree into its segmented version multiplies the number of kinds of nodes. Moreover, the number of algebraic conditions corresponds roughly to the number of possible parent-child pairs of kinds. The number of kinds of nodes of a given tree therefore has a big impact on the entanglement of operators. If we describe algorithms on trees as recursive functions in a conventional style like Figure 4.1, we do not confront this problem. Although the code of a recursive function becomes long in proportion to the degree of the given trees, the same calculation for different kinds of nodes can share its code. For example, mis in Figure 4.1 shares the calculation for the four kinds of nodes, except for the definitions of l and r. The expressiveness of recursive functions thus tames the complication of APIs.

4.6.2 Trade-off between Generality and Simplicity

As seen from Table 4.1, it is reasonable for human use that existing tree skeletons were based on full binary trees [Ski96, GCS94, Mat07b]. If we would like to use trees more complicated than full binary trees, however, such skeletons are troublesome. In such cases, we have to encode a desired tree into a full binary tree and implement a desired computation by using skeletons over full binary trees. This is a problem in terms of both efficiency and usability. The implementers of tree skeletons cannot determine the trees that users desire because trees are versatile and flexible. The implementers would design tree skeletons as generally as possible for versatile use. As a result, the API of tree skeletons becomes too general and complicated. Even though the implementers could provide a simple and useful API for a specific tree, such ad hoc extensions do not lead to a well-designed skeleton library. In designing APIs, there is generally a trade-off between generality and simplicity. It is therefore difficult to obtain both the generality for versatile use and the simplicity for easy use.

4.6.3 Abstraction Layer between Specifications and Skeletons

The cause of this trade-off is that the API of skeletons is the boundary between the abstraction layer of specifications and that of (the implementation of) skeletons. A conflict of interest between the users in the layer of specifications and the implementers in the layer of skeletons causes this trade-off. Our parallelizer works as an abstraction layer that defuses this conflict. The abstraction layer formed by our parallelizer promotes loose coupling between specifications and skeletons. A high degree of freedom in specifications enables programmers to describe algorithms as recursive functions in a conventional style of C, which does not impose complicated APIs. A high degree of freedom

in skeletons enables the adoption of general but complicated skeletons and, moreover, their specialization to multilinear computations. In this sense, the abstraction layer formed by our parallelizer has successfully satisfied both the requests of the users and those of the implementers.

4.6.4 Implicitly Skeletal Programming on Trees

Skeletons are usually designed as abstractions for human use. Tree skeletons, however, are not appropriate for human use because of the generality of computations on trees. They are, in fact, appropriate as high-level background services for parallel computing that are used in programs generated from user-friendly specifications. Such an indirect use of tree skeletons is promising. In contrast to the usual explicit approach where programmers use skeletons directly, we call such indirect use of skeletons an implicit approach. In an implicit approach, we can also enjoy the benefits of the high-level abstraction derived from skeletons. The compilers of specification languages do not have to take account of the low-level details of parallel machines, and generated programs with skeletons have high portability with performance guarantees. In summary, the true aim of this work is to present this implicit approach in programming with tree skeletons (i.e., implicitly skeletal programming on trees), and our parallelizer is a successful implementation of the presented approach.

4.7 Related Work

We use the formalization by Matsuzaki et al. [MHT06] of the systematic derivation of auxiliary operators from primary operators. Although we slightly generalize their formalization to general k-ary trees, this generalization is immediate. They also developed a prototype implementation of an operator generator based on their formalization. Although it was able to generate operators for reduce automatically from a recursive function described in an imperative language, its aim was to demonstrate the feasibility of their formalization. It was not designed to work as an abstraction for parallel programming. Automatic derivation of the operators for list skeletons was studied in [MHT08, MMM+07, SI11a, XKH04]. Our work is most relevant to the work [XKH04] by Xu et al. Their system guaranteed the linearity of given recursive functions on lists by using type inference on the basis of given semirings, and then generated a list homomorphism [Bir87], which is a kind of list skeleton, from typed terms. Their type inference and type-directed program transformation are equivalent to our matrix extraction. The automatic parallelization based on quantifier elimination [MM10] can theoretically deal with recursions on trees. Their prototype implementation, however, dealt only with recursions on lists, and therefore the feasibility of their method for automatic parallelization of recursions on trees is still unknown. The procedure of quantifier elimination is much more expensive than partial differentiation. Their method is therefore heavyweight for compiler integration, while our parallelizer has been successfully integrated into a C compiler. The implicit use of skeletons is not rare and can be found in existing work. MapReduce [DG04] is the most popular (or de facto standard) data-parallel skeleton for data processing on large-scale clusters. The MapReduce system was implemented in C++ for utilizing user-defined C++ functions. Although the main benefit of MapReduce originally claimed was simplicity, its usability left much room for improvement. As a result, Sawzall [PDGQ05], which was a typed scripting language for querying, FlumeJava [CRP+10], which was a parallel collection library in Java, and Dremel [MGL+10], which was a query language like SQL, were developed for hiding the use of MapReduce. The generate-test-and-aggregate (GTA) framework [EFH12], which is a kind of skeleton over multisets (i.e., bags), was developed for the same purpose. Its library implementation [LEH14] used the MapReduce API of Spark [ZCD+12] and Hadoop on the inside. These studies used MapReduce as an infrastructure for data processing. NESL [BCH+94] is the most remarkable of the parallel languages that exploit parallelism on trees. NESL is a typed functional language with no side effects. In NESL, the parallelism of list comprehensions and recursive functions is implicitly exploited and programmers describe algorithms in a sequential manner.

3 http://hadoop.apache.org/
4 It is called nested data parallelism in the context of NESL.

In NESL, trees are operated on as nested lists with recursive calls. By using flattening [Ble90], the parallelism in trees is transformed into the parallelism in arrays and, finally, tree computations are implemented with vector operations. This flattening incurs the time and space overhead of the array representations themselves and of the operations on them. To tame this overhead, several studies dealt with the avoidance of flattening [KCL+12, BFR+13]. However, this overhead is the result of a weak abstraction of parallel computations on trees. The construction of and the computations on trees in NESL are respectively implemented as list constructions and list computations repeated in recursive functions. This means that NESL has to provide a list view of trees and enable arbitrary recursions. Therefore, the encapsulation of tree data structures is difficult and there is not much room for an efficient implementation. In contrast, tree skeletons can use an encapsulated implementation of trees and therefore have much room for an efficient implementation. Entry functions in our approach actually utilize this property.

4.8 Conclusion

We have presented our parallelizer for implicitly skeletal programming on trees. Our parallelizer builds on the existing formalization [MHT06] and enables programmers to enjoy the benefits of tree skeletons with no burden derived from their complication. The current implementation deals only with reduce over binary trees. Further development is left for future work. An extension to the computations of uAcc and dAcc is straightforward. However, automatic parallelization of recursive functions that consist of a mixture of reduce, uAcc, and dAcc will be nontrivial. On the basis of the diffusion theorem [HTI99], we can parallelize such computations in theory. Its automation with reasonable program analysis of recursive functions is, however, unexplored. The computations over trees of unbounded degree are also an issue. The tree contraction algorithm and operations [MM11a] for trees of unbounded degree will be helpful. The current implementation uses only a single implementation of the tree skeleton. An implementation specialized to a specific composition of tree skeletons is, however, more efficient. In such cases, dispatch to an appropriate implementation of a composition of tree skeletons is desired. It is promising for underlying skeleton libraries in C++ to implement this dispatch because C++ template metaprogramming enables such optimizations [ME10, EM14] at the library level. This approach will promote the separation of concerns between parallelizers and underlying skeleton libraries. Recent work [Mor11, Mor12] by Morihata applied tree contraction algorithms successfully to attributed tree transducers and macro tree transducers, which are restricted recursive functions that transform given trees. These results theoretically demonstrate the possibility of parallelizing recursive functions into tree skeletons for tree transformations. Their practical usability is, however, still unknown, and automatic parallelization based on these results is left for future work.

Chapter 5

Limitations of Tree Skeletons

In Chapters 3 and 4, we have improved the usability of tree skeletons both for skeleton implementers and for skeleton users. However, tree skeletons are still not practically useful. Applications of tree skeletons are actually very limited. For example, the maximum marking problems [MHT08] on trees can be solved by using tree skeletons. Although maximum marking problems themselves can cover a class of valuable optimization problems, the settings to which we can apply tree skeletons easily are rather artificial. This is because we adopt an unreasonable approach to tree-based computations. Trees in programming are more than trees. Trees in programming are abstract representations of some data. The interpretation of trees is closely relevant to this underlying data. In the case of join lists, the underlying data is lists for which the concatenation operation is constant-time. Binary search trees are a typical example of this close relation to underlying data: their tree structures represent sorted sets. We can balance binary search trees by definition. When querying a binary search tree, we indeed traverse it but essentially calculate the order relation between a key and the subset that each subtree represents. The interpretation of BinTreeα is nothing more than an algebra over full binary trees. The interpretation of SegTreeα incorporates tree contraction operations but is also nothing more than an algebra over segmented trees. To use tree skeletons, we have to define these interpretations as algebras independent of the underlying data of trees. Intuitively, this is to implement, for example, the operations on binary search trees only with recursion on their structures, without any comparison of their payload values. Such tree manipulations are obviously counter-intuitive and indubitably inefficient. Therefore, the use of tree skeletons is unreasonable in programming for tree-based computations. The consideration of underlying data can simplify the implementation of load balancing. For example, consider XML processing, which was considered a typical application of tree skeletons [Ski97, NEM+07]. To use tree skeletons, DOM trees of unbounded degree are transformed into full binary trees. As a result, an input tree may be a list-like tree and be suited to load balancing based on segmented trees. However, the height of actual XML documents is small; e.g., XML documents whose height is in the hundreds are unrealistic. List-like DOM trees are not worth considering. In fact, for the MapReduce-based implementation [EI12] of reduce over XML-like bracketed structures, load balancing to tame list-like trees was unnecessary as well as inefficient. Moreover, large-scale XML documents that constitute databases usually have lists of the same kind of elements. It would be sufficient to parallelize computations on such lists simply as in list skeletons. The interpretation of substructures of data is thus of high importance for load balancing and practical implementations. Tree skeletons deal solidly with programming on trees, where only the structure of trees matters. However, programming with trees, where trees are utilized for dealing with their underlying data, is truly desired. In the following chapters, we therefore deal with parallel programming with trees in a divide-and-conquer manner by considering the interpretation of trees and the underlying data primarily.


Part II

Syntax-Directed Programming


Chapter 6

Syntax-Directed Computation and Program Analysis

6.1 Motivation for Program Analysis

A syntax-directed computation calculates results from trees on the basis of the production rules of their grammars. Attribute grammars (AGs) [Knu68] are one of the most famous styles of syntax-directed computation. In fact, data-parallel skeletons are also a style of syntax-directed computation. For example, we first define the grammar (i.e., syntax) of join lists to formalize lists in the divide-and-conquer paradigm and then define reduce for each production rule on the basis of the interpretation of its production. That is, programming with data-parallel skeletons is a syntax-directed manner of programming, i.e., syntax-directed programming. As mentioned in Chapter 5, it is important to consider the interpretation and underlying data of trees. The most typical target of syntax-directed computations is a computer program, where a syntax defines the abstract syntax tree (AST) of an input program. In fact, compiler frontends perform syntax-directed computations on ASTs, and their implementation is the primary application of AGs [Wai90, WBGK10, WdMBK02, WKSB07, EH07]. We therefore focus on ASTs in the domain of programming languages. Recall that data-parallel skeletons (especially reduce) give an input tree an interpretation. In programming languages, to give a program an interpretation is generally called program analysis. For example, the work of compiler frontends, which are often defined by using AGs, is called semantic analysis. AST-based program analysis is very similar to data-parallel skeletons. We therefore consider that dealing with AST-based program analysis leads to structured parallel programming with trees. In this part, we deal with AST-based program analysis as case studies on parallel programming with trees. We do not consider any dynamic analysis, for two reasons. The first is that we usually perform it incrementally. The second is that it usually focuses on the executed part and ignores the non-executed part. These characteristics make the amount of work difficult to estimate. Such problems are inappropriate for parallel programming. Meanwhile, static analysis is usually exhaustive and deals with the whole program. Consequently, it tends to be expensive and has therefore been accelerated through parallelization in many studies [NG13, MLBP12, AKNR12, PRMH11, RL11, MLMP10]. Static analysis is appropriate for case studies on parallel programming. We therefore deal with static analysis.

6.2 High-Level Program Analysis

Program analyses after semantic analysis are usually performed on control-flow graphs (CFGs) because CFGs are independent of input languages. CFGs represent control flow directly and do not represent high-level control structures, which ASTs represent by definition. In this sense, AST-based analysis is higher-level than CFG-based analysis. The analysis of high-level representations close to source languages is generally more precise and more

inexpensive than that of low-level ones close to target languages. Nevertheless, CFG-based approaches to program analysis are more popular than AST-based ones in optimizing compilers. This is because CFG-based approaches deal with arbitrary control flow by definition. Even if source languages do not cause arbitrary control flow, compiler optimizations could destroy control structures and then lead to arbitrary control flow. In this sense, CFG-based approaches are easier to apply. This is why CFG-based approaches are more popular. ASTs are more intuitive and easier to understand than CFGs. As demonstrated in the literature on AG-based compiler construction [Wai90, WBGK10, WdMBK02, WKSB07, EH07], high-level approaches based on ASTs are useful for compiler construction. Moreover, in editors integrated with compiler frontends, such as Eclipse, AST-based approaches are quite natural. High-level program analysis based on ASTs is strongly desired. It is therefore worth migrating from the de facto standard CFG-based implementations to AST-based implementations. A main obstacle to migrating from CFGs to ASTs is control flow that does not form a control structure, which compilers could introduce in the process of compilation. We can introduce such control flow into ASTs by using goto/label statements. It is, however, difficult to interpret the control flow derived from goto/label statements in a syntax-directed manner because it ignores the structures of ASTs. If we can deal with goto/label statements in a syntax-directed manner, we can replace CFG-based analysis with AST-based analysis. Consequently, we can enjoy the benefits of high-level program analysis as well as high-level compiler construction. The primary technical issue in Chapters 7 and 8 is therefore how to tame goto/label statements. From the perspective of syntax-directed programming, how to tame goto/label statements corresponds to how to deal with irregular computations involved in computations with trees. To deal with such irregular parts in a structured manner is a technical issue in structured parallel programming with trees. We therefore consider that taming goto/label statements is very appropriate as a case study on parallel programming with trees.

1 https://eclipse.org/

Chapter 7

Syntax-Directed Divide-and-Conquer Data-Flow Analysis

This chapter is self-contained and an extended version of our publication [SM14b].

7.1 Introduction

Data-flow analysis (DFA) is a classic and fundamental formalization in programming languages and particularly forms the foundation of compiler optimization. Many optimizations consist of a pair of an analysis and a transformation, and DFA often formulates the analysis part of an optimization and occupies the computational kernel of its optimization pass. Nowadays, an input to DFA can be very large. For example, state-of-the-art optimizing compilers such as GCC and LLVM are equipped with link-time optimization (LTO), which reserves intermediate representations beside executables at compile time and then optimizes the whole program at link time by using all reserved intermediate representations of the linked executables. An input program of LTO is larger than that of a usual separate compilation. Furthermore, LTO promotes aggressive procedure inlining, which can incur an exponential blow-up of input programs. In DFA for LTO, it is therefore desired that large-scale input programs can be dealt with effectively. One promising approach to dealing with large-scale inputs is parallelization. Since parallel machines are widespread, well-parallelized DFA will benefit many users of LTO. A primary concern is the generation and assignment of parallel tasks. Concretely, load balancing with little overhead is important. Although load balancing is necessary to reduce parallel time, the load balancing itself could incur considerable overhead in processing large-scale inputs. For parallel DFA of large-scale input programs, divide and conquer directly on input data structures without preprocessing is very much desired because this will result in the immediate generation of parallel finer-grained tasks in recursion. A naive approach to the divide and conquer of DFA is procedure-level decomposition. In interprocedural as well as intraprocedural analysis, the analysis of each procedure is computationally almost independent of that of the others and therefore can be performed in parallel. This procedure-wise parallelization, however, can incur poor load balancing in LTO with aggressive inlining. Aggressive inlining expands the main procedures sharply by substituting and eliminating many other procedures; consequently, it reduces the number of procedures and causes a size imbalance among procedures. To obtain better load balancing, divide and conquer over a procedure is necessary. DFA usually deals with a procedure in the form of a control-flow graph (CFG). Although there were some earlier studies on parallel DFA that developed divide-and-conquer methods on CFGs, these methods required an auxiliary tree structure [LRF95] or duplication of CFGs [KGS94] and therefore incurred significant overhead. These drawbacks stem from the nature of CFGs. The loops and the sharing of paths in CFGs make the divide and conquer of DFA difficult because they impose unstructured dependence on parts of the DFA. To resolve this dependence, some preprocessing is generally required. Therefore, DFA on CFGs is essentially difficult to divide and conquer.


In contrast to CFGs, abstract syntax trees (ASTs) are easy to divide and conquer owing to their recursive structures. If we can perform DFA on ASTs, the divide and conquer of DFA will be straightforward in a recursion on ASTs (i.e., a syntax-directed manner) and enable us to perform each DFA of independent AST subtrees in parallel. Rosen [Ros77] developed high-level data-flow analysis, a well-formed method of DFA on ASTs, but his method cannot deal with goto/label. Since goto/label causes control flow unrestricted to the structures of ASTs, it introduces into ASTs unstructured dependence similar to that of CFGs. Taming goto/label is therefore essential for general DFA. To resolve this problem, we have developed a novel parallel syntax-directed method of general DFA that tames goto/label. The proposed method is built upon Tarjan's algebraic formalization [Tar81a] of DFA. First, our method summarizes the syntax-directed data flow in a bottom-up parallel sweep of a given AST, while detaching the goto-derived data flow and constructing a compact system of linear equations that represents it. Next, we obtain the summary of the goto-derived data flow by solving the system. Lastly, we merge the syntax-directed data flow with the goto-derived flow. Our method is particularly useful for programs containing few goto/label statements because the divide and conquer over a given AST is applied to most of the DFA. We can assume such an input thanks to the popularity of structured programming. Furthermore, our method guarantees asymptotically linear speedup. The following are our major contributions:

• We have developed a novel syntax-directed divide-and-conquer parallel method of DFA based on Tarjan's formalization [Tar81a] (Section 7.3). The essence of our method is to detach the goto-derived data flow and calculate it afterward. Our method guarantees asymptotically linear speedup.

• We have also developed a practical technique to enhance our DFA method on the basis of the notion of interval analysis [AC76, Tar81b] (Section 7.4). Our technique will reduce the constant factors of our method for usual input programs.

• We have demonstrated the feasibility of our method experimentally through prototype implementations on a C compiler (Section 7.5). Our parallel prototype achieved a significant speedup and our sequential prototype achieved reasonable performance compared to the standard implementation.

7.2 Formalization of Data-Flow Analysis

DFA is to aggregate data-flow values over a given program [Kil73]. The domain of data-flow values is a join-semilattice L whose join operator is ⊔. Each program point has a transfer function over L. The result of DFA is defined as a join-over-all-paths (JOP) solution, namely, a sum of the data-flow values of all executable paths from the entry to the exit (or a target point) in a given program. The proposed method is based upon Tarjan's formalization over a closed semiring [Tar81a]. This first formalizes an input program as the set of all executable paths represented by a regular path, which is a regular expression whose alphabet is the set of all program points Π. Then, DFA is defined as a homomorphism hR from a closed semiring (R, |, ·, ∅, ε) to another closed semiring (F, ⊕, ⊗, 0, 1). The former is for regular paths: R is a set of regular paths, addition is the alternation |, and multiplication is the concatenation ·. The latter is for transfer functions: F is the set of transfer functions, the addition is f1 ⊕ f2 = λx. f1(x) ⊔ f2(x), the multiplication is f1 ⊗ f2 = f2 ∘ f1, 0 is the zero element, f ⊕ 0 = 0 ⊕ f = f and f ⊗ 0 = 0 ⊗ f = 0, and 1 is the multiplicative identity, f ⊗ 1 = 1 ⊗ f = f. Note that, from the definition of a closed semiring, the Kleene star f* is defined as f* = ⊕_{i=0}^{∞} f^i, where f^0 = 1 and f^i = f ⊗ f^(i−1). Given (F, ⊕, ⊗, 0, 1) and a lift function τ : Π → F, we can characterize the homomorphism of DFA as

hR(ε) = 1,
hR(π) = τ(π), if π ∈ Π,
hR(r1 · r2) = hR(r1) ⊗ hR(r2),
hR(r1 | r2) = hR(r1) ⊕ hR(r2),
hR(r*) = hR(r)*.

1 Here, we consider forward DFA. For backward DFA, f1 ⊗ f2 = f1 ∘ f2.

In this paper, we assume that ∅ is not an input of any DFA. Therefore, 0 can be left undefined and regarded as a special value that behaves as the zero element.

Example of DFA

To give readers a clearer understanding of Tarjan's formalization, here we describe the DFA of reaching definitions. A definition is a pair consisting of an LHS variable and an RHS expression. In this DFA, the domain of data-flow values is a set of definitions, i.e., a binary relation from variables to expressions. The join operation is the set union. The transfer function of an assignment statement v ← e generates a definition v ↦ e and kills all other definitions of v that can reach the assignment. Meanwhile, a simple expression e without assignment has no effect on data flow. That is, τ is defined as

τ(v ← e) = λX. {v′ ↦ e′ ∈ X | v ≠ v′} ∪ {v ↦ e},
τ(e) = λX. X.

To define a closed semiring, a general form of transfer functions is necessary. Letting V be a set of variables and D be a set of definitions, we can define it as

f(V, D) = λX. {v ↦ e ∈ X | v ∉ V} ∪ D.

By using this f, we can define τ and a closed semiring (F, ⊕, ⊗, 0, 1) as

τ(v ← e) = f({v}, {v ↦ e}),
τ(e) = f(∅, ∅),
1 = f(∅, ∅),

f(V1, D1) ⊕ f(V2, D2) = f(V1 ∩ V2, D1 ∪ D2),

f(V1, D1) ⊗ f(V2, D2) = f(V1 ∪ V2, {v ↦ e ∈ D1 | v ∉ V2} ∪ D2),
f(V, D)* = 1 ⊕ f(V, D).

As seen in the hR above, Tarjan's approach calculates a summary, namely a transfer function for a program fragment, rather than data-flow values. By applying the summary from an entry to an exit to a given initial data-flow value, we obtain its JOP solution. This formalization can deal with monotone DFA. Refer to [Tar81a, MR90] for a detailed discussion. For optimizations, compilers often use JOP solutions from an entry to every point, i.e., all-points JOP solutions. Although the homomorphism above does not calculate the summaries for all-points JOP solutions, it is easy to calculate them. We can obtain a set of summaries from an entry to all points by accumulating summaries over a regular path, similarly to calculating a prefix sum. We call this an all-points summary. By applying each element of an all-points summary to an initial value, we obtain all-points JOP solutions. In Tarjan's formalization, the primary concern regarding algorithms is how to construct the regular path of an input program. Tarjan [Tar81b] developed a sophisticated algorithm for extracting a regular path from a CFG. However, if an input program is goto-free, namely, in the while language (Fig. 7.1), we can immediately obtain its regular path representation. This is trivial but notable. Thus, DFA for the while language is performed in a syntax-directed manner as follows:

h(pass) = 1,
h(v ← e) = τ(v ← e),

h(s1 s2) = h(s1) ⊗ h(s2),

h(if (e) {s1} else {s2}) = τ(e) ⊗ (h(s1) ⊕ h(s2)),
h(while (e) {s}) = τ(e) ⊗ (h(s) ⊗ τ(e))*.

Here, h calculates the summary of a given program fragment. Throughout this chapter, we identify a program fragment given to τ with its program point; thus, τ takes a program fragment and yields a transfer function.

² A procedure summary, which is the transfer function of the whole of a procedure, is used extensively for interprocedural analysis [SP81].

P ::= s (Program)

s ::= pass | v ← e | s1 s2 | if (e) {s1} else {s2} | while (e) {s} (Statement)

Figure 7.1: Syntax of the while language. v and e are respectively the metavariables over variables and expressions; pass denotes an empty statement.

Example  We explain the syntax-directed DFA based on Tarjan's formalization by using the following example program: x ← a  if (x < 0) {x ← 0} else {while (x < 10) {x ← x + a}}. By applying h, we calculate the summary of the above program as τ(x ← a) ⊗ τ(x < 0) ⊗ (τ(x ← 0) ⊕ (τ(x < 10) ⊗ (τ(x ← x + a) ⊗ τ(x < 10))*)). By using the closed semiring of reaching definitions, we can reduce it to f({x}, {x ↦ 0, x ↦ x + a}). Because the initial data-flow value of reaching definitions is ∅, the JOP solution at the exit is {x ↦ 0, x ↦ x + a}. We can also construct all-points summaries in a syntax-directed manner. An example of such construction is described in Section 7.3.3.
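To make the closed semiring above concrete, the following is a minimal Java sketch of the reaching-definitions transfer functions f(V, D); the class name RDTransfer and its method names are illustrative assumptions, not the COINS-based prototype described in Section 7.5.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A transfer function f(V, D): it kills every definition of a variable in V and
// then generates the definitions in D, represented as a map from a variable to
// the set of right-hand sides that may reach the current point.
final class RDTransfer {
    final Set<String> kill;                  // V
    final Map<String, Set<String>> gen;      // D

    RDTransfer(Set<String> kill, Map<String, Set<String>> gen) {
        this.kill = kill;
        this.gen = gen;
    }

    // 1 = f({}, {}), the multiplicative identity; tau(e) for a plain expression is also 1.
    static RDTransfer one() {
        return new RDTransfer(new HashSet<String>(), new HashMap<String, Set<String>>());
    }

    // tau(v <- e) = f({v}, {v -> e}).
    static RDTransfer assign(String v, String e) {
        RDTransfer t = one();
        t.kill.add(v);
        t.gen.put(v, new HashSet<String>());
        t.gen.get(v).add(e);
        return t;
    }

    private static void addAll(Map<String, Set<String>> into, Map<String, Set<String>> from) {
        for (Map.Entry<String, Set<String>> d : from.entrySet()) {
            if (!into.containsKey(d.getKey())) into.put(d.getKey(), new HashSet<String>());
            into.get(d.getKey()).addAll(d.getValue());
        }
    }

    // f(V1, D1) (+) f(V2, D2) = f(V1 intersect V2, D1 union D2): the join of two branches.
    RDTransfer join(RDTransfer other) {
        RDTransfer r = one();
        r.kill.addAll(kill);
        r.kill.retainAll(other.kill);
        addAll(r.gen, gen);
        addAll(r.gen, other.gen);
        return r;
    }

    // f(V1, D1) (x) f(V2, D2) = f(V1 union V2, {v -> e in D1 | v not in V2} union D2): "this, then other".
    RDTransfer then(RDTransfer other) {
        RDTransfer r = one();
        r.kill.addAll(kill);
        r.kill.addAll(other.kill);
        for (Map.Entry<String, Set<String>> d : gen.entrySet())
            if (!other.kill.contains(d.getKey()))
                r.gen.put(d.getKey(), new HashSet<String>(d.getValue()));
        addAll(r.gen, other.gen);
        return r;
    }

    // f(V, D)* = 1 (+) f(V, D), because f (x) f = f in this particular semiring.
    RDTransfer star() {
        return one().join(this);
    }
}

For instance, assign("x", "a").then(assign("x", "0")) yields f({x}, {x ↦ 0}), matching τ(x ← a) ⊗ τ(x ← 0).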

7.3 Syntax-Directed Parallel DFA Algorithm

For goto-free programs, the divide and conquer of DFA is immediate from a syntax-directed computation, and its parallelization is therefore straightforward. Syntax-directed jumps (i.e., jumps to ancestors on ASTs) such as break/continue can be dealt with by using Rosen's method [Ros77] in a syntax-directed manner. Non-syntax-directed (i.e., nonstructural) jumps caused by goto/label, however, require special attention. In the following, our target language is the while language with goto/label. Letting l be a metavariable over labels, we introduce a goto statement goto l and a label statement l:. The main idea of the proposed method is to discriminate between syntax-directed (i.e., structural) data flow and goto-derived (i.e., nonstructural) data flow. Our method consists of two phases: first, it constructs a summary of structural data flow while detaching nonstructural data flow from an input AST in a syntax-directed manner, and second, it calculates only nonstructural data flow from the obtained summary. After that, we obtain JOP solutions. In terms of parallelization, the first phase is straightforward from a syntax-directed computation. This is the main benefit of our method. We do not have to parallelize the second phase. The size of a summary obtained in the first phase is quadratic in the number of labels. Because of the popularity of structured programming, we can suppose that labels are few; that is, we assume nonstructural flow to be an exceptional irregularity in an input. The second phase is thus cheap and not worth parallelizing. In the rest of this section, we describe the algorithms of both phases and the extension to interprocedural analysis.

7.3.1 Syntax-Directed Construction of Summaries

It is nontrivial to represent a program that contains goto/label by a single regular path. For example, consider the following program:

while (x < 10)₁ {l: x ← x + 1₂} if (x > 20)₃ {goto l} else {pass₄},


Figure 7.2: Regular paths in a program containing labels l1 and l2. r00 is the regular path from the entry to the exit, r0i is the regular path from li: to the exit, ri0 is the regular path from the entry to goto li, and rij is the regular path from lj: to goto li.

where a subscript denotes the program point of the preceding part. Unlike in the goto-free case, we cannot construct a Kleene closure only from the while statement above because regular paths containing jumps to l: are unknown. We can, however, decompose the above program into four goto-free regular paths by interpreting l: as another entry and goto l as another exit: 1·(2·1)*·3·4 (from the entry to the exit), 1·(2·1)*·3 (from the entry to goto l), 2·1·(2·1)*·3·4 (from l: to the exit), and 2·1·(2·1)*·3 (from l: to goto l). These are immediately obtained from the AST. In the case of two labels l1 and l2, we can generally consider nine regular paths as illustrated in Fig. 7.2, where all goto-derived jumps to li are encapsulated in the box labeled by li. This decomposition enables us to postpone interpreting goto-derived flow. This is the key idea of our method. On the basis of this idea, we define a structured summary by a set of transfer functions. Let {l1, . . . , lk} be the set of labels and aij = hR(rij), where rij is the goto-free regular path from lj: (or the entry, if j = 0) to goto li (or the exit, if i = 0); then, a structured summary is the following system of linear equations:

out = a00 ⊕ (l1 ⊗ a01) ⊕ ··· ⊕ (lk ⊗ a0k),
l1 = a10 ⊕ (l1 ⊗ a11) ⊕ ··· ⊕ (lk ⊗ a1k),
⋮
lk = ak0 ⊕ (l1 ⊗ ak1) ⊕ ··· ⊕ (lk ⊗ akk),

where out denotes the data flow that goes out from the exit and li denotes nonstructural data flow via the label li; specifically, li denotes an outflow on the LHS and an inflow on the RHS. In the rest of this chapter, we omit any equation whose RHS is 0. We can represent the system above by a coefficient matrix:

(out, x)^T = A^T (1, x)^T,  where x = (l1, . . . , lk)^T and A = (aij)_{0≤i,j≤k}.

The matrix multiplication here is defined by using ⊕ and ⊗ respectively as the scalar addition and the scalar multiplication. Unless otherwise noted, matrix operations are generalized over a semiring. For a structured summary, we intentionally confuse the system of linear equations with its coefficient matrix A. We define each of the addition, multiplication, and Kleene star over structured summaries as a matrix operation. The addition is used to merge two independent summaries, such as those of the two branches of a conditional statement. It is easy to see that the conventional matrix addition suffices for this purpose; consider the edge-wise union in Fig. 7.2. In the rest of this chapter, we overload ⊕ for the matrix addition. The multiplication is used to connect the summaries of two consecutive statements. This necessitates a little consideration. For example, consider the concatenation of two copies of the regular paths in Fig. 7.2, as illustrated in the left side of Fig. 7.3. Although two boxes labeled by li exist there, both encapsulate the same kind of control flow. We can therefore contract regular paths by merging both boxes, as illustrated in the right side of Fig. 7.3.

r″00 = r00 · r′00,   r″i0 = ri0 | r00 · r′i0,   r″0j = r0j · r′00 | r′0j,   r″ij = rij | r0j · r′i0 | r′ij,   where i, j ∈ {1, 2}.

Figure 7.3: Concatenation of two copies of the regular paths in Fig. 7.2, where the left side is the connected view and the right side is the contracted view; the unprimed, primed, and double-primed paths belong to the first copy, the second copy, and the contracted result, respectively.

This contraction of regular paths leads to the following definition of the multiplication ⊙:

(cij)_{0≤i,j≤k} = (aij)_{0≤i,j≤k} ⊙ (bij)_{0≤i,j≤k}

s.t.  cij = a00 ⊗ b00                      (i = j = 0),
      cij = (a0j ⊗ b00) ⊕ b0j              (i = 0 ∧ j ≠ 0),
      cij = (a00 ⊗ bi0) ⊕ ai0              (i ≠ 0 ∧ j = 0),
      cij = (a0j ⊗ bi0) ⊕ aij ⊕ bij        (i ≠ 0 ∧ j ≠ 0).

Note that ⊙ is associative and its identity is {out = 1}. By using the addition ⊕ and the multiplication ⊙, we can define the Kleene star in the standard way. However, owing to the idempotence of ⊕, we can provide the following equivalent but simpler definition:

(aij)_{0≤i,j≤k}* = (a′ij)_{0≤i,j≤k}

s.t.  a′ij = a00*                           (i = j = 0),
      a′ij = a0j ⊗ a00*                     (i = 0 ∧ j ≠ 0),
      a′ij = a00* ⊗ ai0                     (i ≠ 0 ∧ j = 0),
      a′ij = (a0j ⊗ a00* ⊗ ai0) ⊕ aij       (i ≠ 0 ∧ j ≠ 0).
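As an illustration of these matrix operations, the following Java sketch implements ⊕, ⊙, and the simplified Kleene star over structured summaries. Transfer and Summary are hypothetical names; the sketch assumes that the zero and one elements of the underlying closed semiring are themselves represented as Transfer instances obeying the usual laws.

// The closed semiring of transfer functions, abstracted.
interface Transfer {
    Transfer plus(Transfer other);   // (+), join
    Transfer times(Transfer other);  // (x), sequential composition: this, then other
    Transfer star();                 // Kleene star
}

// A structured summary as a (k+1) x (k+1) matrix of transfer functions.
// Row/column 0 stands for the entry and the exit; rows/columns 1..k stand for labels.
final class Summary {
    final Transfer[][] a;   // a[i][j] summarizes the flow from lj: (or the entry) to goto li (or the exit)

    Summary(Transfer[][] a) { this.a = a; }

    // The addition on summaries is the entry-wise matrix addition.
    Summary plus(Summary bs) {
        int n = a.length;
        Transfer[][] c = new Transfer[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                c[i][j] = a[i][j].plus(bs.a[i][j]);
        return new Summary(c);
    }

    // The multiplication that connects two consecutive summaries (this, then bs).
    Summary then(Summary bs) {
        int n = a.length;
        Transfer[][] b = bs.a, c = new Transfer[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == 0 && j == 0)      c[i][j] = a[0][0].times(b[0][0]);
                else if (i == 0)           c[i][j] = a[0][j].times(b[0][0]).plus(b[0][j]);
                else if (j == 0)           c[i][j] = a[0][0].times(b[i][0]).plus(a[i][0]);
                else                       c[i][j] = a[0][j].times(b[i][0]).plus(a[i][j]).plus(b[i][j]);
            }
        return new Summary(c);
    }

    // The simplified Kleene star, valid because the addition is idempotent.
    Summary star() {
        int n = a.length;
        Transfer[][] c = new Transfer[n][n];
        Transfer loop = a[0][0].star();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == 0 && j == 0)      c[i][j] = loop;
                else if (i == 0)           c[i][j] = a[0][j].times(loop);
                else if (j == 0)           c[i][j] = loop.times(a[i][0]);
                else                       c[i][j] = a[0][j].times(loop).times(a[i][0]).plus(a[i][j]);
            }
        return new Summary(c);
    }
}

Note that only the first row of the left operand and the first column of the right operand enter the cross terms, mirroring the contraction of regular paths in Fig. 7.3.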

Now we are ready to define hC, which calculates a structured summary from a given AST.

hC(e) = {out = h(e)},   hC(s) = {out = h(s)},

hC(li:) = {out = 1 ⊕ li},

hC(goto li) = {li = 1},
hC(s1′ s2′) = hC(s1′) ⊙ hC(s2′),
hC(if (e) {s1′} else {s2′}) = hC(e) ⊙ (hC(s1′) ⊕ hC(s2′)),
hC(while (e) {s′}) = hC(e) ⊙ (hC(s′) ⊙ hC(e))*,

where s denotes a metavariable over statements containing no goto/label statement and s′ denotes one over statements containing some goto/label statement.

Example We here consider, as an example input, the following program:

if (i > n) {l1: x ← 1} else {while (i < n) {i ← 2} l2: x ← i}

if (x = n) {i ← x} else {goto l1}.

By applying hC, we calculate a structured summary of the above program as

A1 ⊙ (({out = 1 ⊕ l1} ⊙ A2) ⊕ (A3 ⊙ {out = 1 ⊕ l2} ⊙ A4)) ⊙ A5 ⊙ (A6 ⊕ {l1 = 1}),

where A1 = {out = τ(i > n)},

A2 = {out = τ(x ← 1)},

A3 = {out = τ(i < n) ⊗ (τ(i ← 2) ⊗ τ(i < n))*},

A4 = {out = τ(x ← i)},

A5 = {out = τ(x = n)},

A6 = {out = τ(i ← x)}.

By reducing matrix operations, we obtain

{out = a1′ ⊕ (l1 ⊗ a2′) ⊕ (l2 ⊗ a3′), l1 = a4′ ⊕ (l1 ⊗ a5′)}

where a1′ = a1 ⊗ (a2 ⊕ (a3 ⊗ a4)) ⊗ a5 ⊗ a6,
      a2′ = a2 ⊗ a5 ⊗ a6,
      a3′ = a4 ⊗ a5 ⊗ a6,
      a4′ = a1 ⊗ (a2 ⊕ (a3 ⊗ a4)) ⊗ a5,
      a5′ = a2 ⊗ a5,

a1 = τ(i > n),
a2 = τ(x ← 1),
a3 = τ(i < n) ⊗ (τ(i ← 2) ⊗ τ(i < n))*,
a4 = τ(x ← i),   a5 = τ(x = n),   a6 = τ(i ← x).

This result exemplifies the notion of a structured summary: a1′ denotes the data flow from the entry to the exit, a2′ denotes that from l1: to the exit, a3′ denotes that from l2: to the exit, a4′ denotes that from the entry to goto l1, and a5′ denotes that from l1: to goto l1. None of these takes any nonstructural data flow into account, but the whole system contains the nonstructural data flow of the program. For example, l1 = a4′ ⊕ (l1 ⊗ a5′) denotes that the nonstructural data flow via l1 is a4′ ⊗ a5′*. We can therefore calculate the nonstructural data flow from this structured summary. Finally, we obtain the value of out.

Parallel Complexity  We can parallelize hC immediately in a divide-and-conquer manner because hC can fork for each child at any internal node of an input AST. For the parallel time complexity of hC, the associativity of ⊙ is important. We can flatten the nesting of statement sequencing, i.e., convert a nesting ((s1 s2) s3) into a sequence (s1 s2 s3), because associativity guarantees that both yield equivalent results. Moreover, it enables us to perform parallel reduction over a sequence of statements. The number of parallel recursive steps of hC is therefore bounded by the maximum if/while nesting depth d in an input AST. Let k be the number of labels, N be the number of nodes in an input AST, P be the number of processors, and b be the maximum length of a sequence of statements. The parallel time complexity of hC is the following:

O(k²(N/P + d lg min(b, P))),

where we assume closed-semiring operations to be constant-time. The lg min(b, P) factor is derived from the parallel reduction of a sequence of statements and is practically negligible. The k² factor represents the cost of matrix operations. Note that for an AST containing no label statement, this factor will be k, and for one containing no goto/label statement, it will be a constant. If N/P > d lg min(b, P), hC guarantees asymptotically linear speedup.

7.3.2 Calculating Join-Over-All-Paths Solutions

To obtain a JOP solution, we have to solve the nonstructural data flow whose calculation has been postponed, i.e., to determine the value of x in a structured summary. As seen in Fig. 7.2, a structured summary can be regarded as a collapsed CFG. We can therefore apply existing methods on CFGs to solve it. The simplest one is Gaussian elimination [RP86]. Although it is cubic-time, it suffices for solving the nonstructural data flow. Assuming closed-semiring operations to be constant-time, it costs only O(k³) because of the size of a structured summary as a CFG. This cost is asymptotically negligible compared to the parallel cost of hC if N/P + d lg min(b, P) > k. Therefore, the part that solves nonstructural data flow is not worth sophisticating and/or parallelizing. Once the value of x in a structured summary is obtained, we can determine the value of out in O(k) time. By applying an initial value to out, we obtain the JOP solution of a given program. This last step is usually constant-time.

7.3.3 Construction of All-Points Summaries

We can compute all-points JOP solutions from an all-points summary in an embarrassingly parallel manner because each application of its elements to an initial value is independent. We can construct all-points summaries by using tree accumulation. The tree accumulation to construct an all-points summary consists of two phases. The first is the same as hC except that it leaves intermediate results at each node of a given AST. The second is a top-down sweep of the AST decorated with intermediate results. In this top-down sweep, we perform the parallel prefix-sum operation with ⊙ on every sequence of statements and update the summaries that decorate each node of the AST. The resultant AST decorated with structured summaries is an all-points summary. Note that the ⊙ used in the second phase only has to calculate the uppermost row vector and the leftmost column vector of a resultant matrix because only the equation of out in every element of an all-points summary is used for yielding all-points JOP solutions. The second phase is cheaper than the first one. Therefore, the time complexity of constructing an all-points summary is the same as that of hC.
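On one sequence of statements, the second phase reduces to an ordinary prefix sum with ⊙. The following Java sketch shows a sequential version of that sweep for a single sequence; the generic parameter S stands for structured summaries and the names are illustrative. A parallel scan can replace the loop because ⊙ is associative.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Given the summary reaching the sequence from above (incoming) and the
// per-statement summaries left by the first phase, compute the accumulated
// summary that decorates each statement of the sequence.
final class AllPointsSweep {
    static <S> List<S> prefixes(S incoming, List<S> summaries, BinaryOperator<S> dot) {
        List<S> out = new ArrayList<>(summaries.size());
        S acc = incoming;
        for (S s : summaries) {      // sequential scan; replaceable by a parallel prefix sum
            acc = dot.apply(acc, s);
            out.add(acc);
        }
        return out;
    }
}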

Example  The above algorithm for constructing all-points summaries is, in fact, applicable to both h and hC. The difference between them is only in the primitive operations: scalar ones (e.g., ⊗) used for h and matrix ones (e.g., ⊙) used for hC. For simplicity, we describe here the construction of an all-points summary regarding h. We consider the following goto-free program:

if (e1) {s1 s2 s3} else {while (e2) {s4 s5} s6}. We leave parts of the above program as metavariables in order to concentrate on a single recursive step. After the first phase of bottom-up tree accumulation, we obtain

if (f1) {f2 f3 f4} else {f6 f7},

where f1 = h(e1), f2 = h(s1), f3 = h(s2), f4 = h(s3),

f5 = h(e2), f6 = h(while (e2) {s4 s5}), f7 = h(s6). Tree accumulation also brings us the summaries of all next-level statements; e.g., we have already obtained

while (f1′) {f2′ f3′},   where f1′ = h(e2), f2′ = h(s4), f3′ = h(s5). In the second phase, we calculate a top-down prefix sum at each nesting level. The following is the result for the outermost if statement:

if (f1) {(f1 ⊗ f2) (f1 ⊗ f2 ⊗ f3) (f1 ⊗ f2 ⊗ f3 ⊗ f4)}

else {(f1 ⊗ f6) (f1 ⊗ f6 ⊗ f7)}.

We then recurse on the next-level statements: s1, s2, s3, while (e2) {s4 s5}, and s6. Since we have already obtained all their summaries in the first phase, we are ready to recurse on them. After recursions on statements at all levels, the resultant AST becomes an all-points summary.

7.3.4 Interprocedural Analysis

Tarjan's formalization deals essentially with intraprocedural DFA. However, it can be extended to calculate procedure summaries and is therefore useful even for interprocedural DFA. In fact, our method can deal effectively with context-insensitive interprocedural DFA. We now consider a program P to be a set of top-level procedures. Let p be a metavariable over procedure names. The syntax of a procedure with a body statement s is p(){s}. The procedure call p() and return are introduced into s. For simplicity, we assume that none of the procedures take arguments or return values. Argument passing and value returning may be implemented by using global variables. For convenience, pc refers to the current procedure at a point. Since the information of call sites is neglected in context-insensitive DFA, we can interpret call/return simply as goto/label. We extend hC as follows:

hC(P) = ⊕_{p(){s} ∈ P} hC(p(){s}),

hC(p(){s}) = {} ⊙ hC(p_call: s goto p_ret) ⊙ {},

hC(p()) = hC(goto p_call p_ret:),

hC(return) = hC(goto p_ret), where p = pc.

Note that the null system {} denotes no control flow. The same call-site label p_ret may be attached to many program points. In such cases, we interpret goto as a nondeterministic jump to one of the corresponding label statements, which requires no change in hC. The rest of the DFA process, including the construction of all-points summaries, is the same as in the intraprocedural case. In contrast, our method is less effective for context-sensitive interprocedural DFA because context sensitivity prevents us from factoring out the data flow of calls as a compact linear system. When using our method, the first choice for obtaining context sensitivity is procedure inlining. Intraprocedural DFA with inlining is generally more precise than context-sensitive DFA. Furthermore, we usually require code replication similar to inlining for generating context-sensitively optimized code, and in this sense, inlining is essential for utilizing context sensitivity in compiler optimization. Although the drawback of inlining is the expansion of procedure sizes, it is tractable in our method by using divide-and-conquer parallelization. Our method is synergistic with inlining, and aggressive inlining followed by context-insensitive DFA is therefore both appropriate and sufficient.

7.4 Elimination of Labels

Our method assumes that a source program does not contain many labels because structured programming is already widespread. This assumption is crucial to the time and space complexities of our method. Unfortunately, aggressive inlining at a high optimization level multiplies the occurrences of goto/label. In this section, we describe techniques to eliminate labels. The goto/label statements introduced by inlining have locality in an AST since their jumps are originally closed within a procedure. We can eliminate the equations of nonstructural data flow via closed labels in a structured summary during collapsing, i.e., in an on-the-fly manner. A label l is closed in a structured summary A iff the fragment that A summarizes contains l: and all occurrences of goto l. An equation of nonstructural data flow via a closed label in a structured summary is eliminable. On-the-fly elimination is to apply an elimination method to an eliminable part of a structured summary during collapsing. Eliminating a part of a system of data-flow equations is very common in elimination methods [RP86]; on-the-fly elimination is just an instance of this. We can check the closedness of labels in a simple way: reference counting. We first count the occurrences of each label syntactically and store the maximum count of each label. Then, during collapsing, we count the occurrences of each label while computing summaries. If the current count of a label reaches its stored maximum, we can apply on-the-fly elimination to it. This reference counting is lightweight and incurs only negligible overhead. As a side effect, this reference counting enables us to neglect labels to which no goto statement jumps.

Another elegant way to check the closedness of labels is to introduce block scope into labels. If a label goes out of scope, we can eliminate it. This approach is simple but very effective for the labels that aggressive inlining multiplies. It is also effective for the labels that compiler frontends often generate, e.g., in the translation of control constructs such as break/continue and of logical operators with short-circuit evaluation.

7.5 Experiments

We conducted experiments to demonstrate the feasibility and scalability of our algorithm. Note that our aim is not to evaluate our analyzer implementation.

7.5.1 Prototype Implementations

We implemented our method for the DFA of reaching definitions, which is the most standard and lightweight example of DFA. Because a lightweight computation on a large-scale input is sensitive to the overhead of load balancing, the DFA of reaching definitions is appropriate for demonstrating the scalability of our method. For simplicity, we did not implement on-the-fly elimination. Our implementations were built upon COINS³, a C compiler written in Java. We implemented hC as a simple visitor on an AST. We used a dense matrix for Gaussian elimination to solve nonstructural data flow. We made extensive use of java.util.HashMap for the implementation of the closed semiring of reaching definitions. We call our sequential prototype seq and the parallel one par. The parallelization was very simple; we simply used the Java 7 Fork/Join framework for the visitor of hC. We forked a visitor for each compound statement in a sequence of statements while summarizing segments of atomic statements. At the end of a sequence of statements, we waited for all forked visitors one by one and then calculated the summary of the sequence. As the reference implementation of DFA, we implemented wordwise analysis [KD94], which is an efficient iterative method for solving the most common class of DFA, a.k.a. the bit-vector framework. We call this implementation bvf. Since this method uses a wordwise worklist, we implemented a sparse wordwise bit-vector. We used a LIFO queue as the worklist. We constructed a CFG of basic blocks and then numbered definitions on the AST through the CFG. After that, we initialized the gen/kill sets of each node and performed the iterative method. Note that bvf by definition calculated all-points JOP solutions, while our prototypes seq and par calculated a procedure summary. The comparison of their absolute performance is therefore unfair. Because this difference in results stems from the difference in style between our method and the iterative method, a truly fair comparison is difficult. However, since the asymptotic time complexity of constructing an all-points summary is the same as that of hC (see Section 7.3.3), in terms of asymptotic performance, seq and par are comparable to bvf.
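The following is a rough Java sketch of this parallelization strategy, written against Java 8 functional interfaces for brevity (the prototype itself used Java 7 Fork/Join directly). Stmt, summarizeAtom, and taskFor are assumed placeholders for the real AST and visitor; the sketch only illustrates how runs of atomic statements are summarized in place while compound statements are forked and later joined in program order.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class SequenceTask<S> extends RecursiveTask<S> {
    interface Stmt {}                                            // placeholder AST node

    private final List<Stmt> sequence;
    private final S identity;                                    // {out = 1}
    private final BinaryOperator<S> dot;                         // multiplication on summaries
    private final Function<Stmt, S> summarizeAtom;               // summary of an atomic statement
    private final Function<Stmt, SequenceTask<S>> taskFor;       // subtask for a compound statement, null otherwise

    SequenceTask(List<Stmt> sequence, S identity, BinaryOperator<S> dot,
                 Function<Stmt, S> summarizeAtom, Function<Stmt, SequenceTask<S>> taskFor) {
        this.sequence = sequence;
        this.identity = identity;
        this.dot = dot;
        this.summarizeAtom = summarizeAtom;
        this.taskFor = taskFor;
    }

    @Override
    @SuppressWarnings("unchecked")
    protected S compute() {
        List<Object> pieces = new ArrayList<>();   // summaries and forked subtasks, in program order
        S run = identity;
        boolean inRun = false;
        for (Stmt s : sequence) {
            SequenceTask<S> sub = taskFor.apply(s);
            if (sub == null) {                     // atomic statement: extend the current run
                run = dot.apply(run, summarizeAtom.apply(s));
                inRun = true;
            } else {                               // compound statement: fork a subtask
                if (inRun) { pieces.add(run); run = identity; inRun = false; }
                sub.fork();
                pieces.add(sub);
            }
        }
        if (inRun) pieces.add(run);

        S acc = identity;                          // join subtasks one by one and fold in order
        for (Object p : pieces) {
            S piece = (p instanceof RecursiveTask) ? ((RecursiveTask<S>) p).join() : (S) p;
            acc = dot.apply(acc, piece);
        }
        return acc;
    }
}

The root task for the whole program would be run with, e.g., new ForkJoinPool().invoke(rootTask).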

7.5.2 Experimental Setup

We generated a large-scale input program normalized in the while language with goto/label statements by using biased random generation. We set the maximum depth to about 128 and the length of block statements to a random number between 1 and 8. About half of the if statements had empty else branches. Each goto statement was guarded by a simple if statement to avoid dead code. About half of the assignments defined new variables. The generated AST had about 1,000,000 statements, where the numbers of goto and label statements were 96 and 20, respectively. We used this unrealistically large-scale program as a benchmark to observe the asymptotic behavior of our method. We call it rand. To obtain a realistic large program, we used procedure inlining of recursive programs. As an example recursive program, we selected the Lua 5.2.3 parser⁴, which is known to be written in clean C. After normalization, we applied inlining iteratively to the entry function. We stopped the recursion of inlining at the seventh level. The resultant entry function consisted of about 12,000 statements and contained 51 pairs of goto/label statements. We call it inl.

³ http://coins-compiler.sourceforge.jp/
⁴ http://www.lua.org/ftp/lua-5.2.3.tar.gz

[Plot: relative speedup (y-axis) of par versus the number of threads (x-axis, 1–16).]

Figure 7.4: Relative speedup of par given rand.

Table 7.1: Breakdown of execution time of seq. “Elim” means Gaussian elimination on a structured summary.

Phase                hC    Elim   Total
Time (ms)   rand    810       1     811
            inl      13       1      14

We used a server equipped with four Opteron 6380 (16 cores, 2.50 GHz) processors and 128 GB of DDR3-1600 memory running OpenJDK (64-bit Server VM) version 1.7.0_55. We executed each analyzer 20 times for the same AST in memory. To minimize the effect of GC and VM issues, we discarded outliers and considered the median of the remainder as the result.

7.5.3 Experimental Results

The relative speedup of par given rand, shown in Fig. 7.4, exhibited significant scalability up to 15 threads. The relative speedup with 15 threads was 5.82x (while the speedup compared to bvf was 5.00x). Careful control of task granularity was not required. We also tested a granularity-controlled prototype but did not observe any performance gain. The divide and conquer of our method alone was sufficient to obtain a significant speedup. Our method was readily parallelizable and demonstrated that divide and conquer on input data structures is crucial. The speedup curve in Fig. 7.4 demonstrates the asymptotically linear speedup of our method and exemplifies Amdahl's law. Tables 7.1 and 7.2 respectively show the breakdowns of the execution times of seq and bvf given rand and inl. For inl, both seq and bvf were sufficiently fast. For rand, seq was significantly faster than bvf, but a direct comparison of the two is inappropriate, as mentioned earlier. What we can justify from these results is that our method is not algorithmically slower than the iterative method. It is notable that the Elim phase in our method incurred negligible overhead, as expected. Therefore, our method is both feasible and useful if the label statements in a given program are few, specifically fewer than about 50.

Table 7.2: Breakdown of execution time of bvf. “Cons” means CFG construction, “Ndef” means numbering definitions, “Init” means initializing gen/kill sets, and “Iter” means the iterative method.

Phase                Cons   Ndef   Init   Iter   Total
Time (ms)   rand      431    565    971    334    2301
            inl         6      3      6      3      18

7.6 Related Work

Rosen [Ros77, Ros80] proposed the concept and method of high-level DFA. His method performs DFA on a high-level CFG that captures syntactic nesting, by calculating bit-vector equations for each level of statements similarly to interval analysis [AC76] but in a much finer-grained manner. Although Rosen dealt with break/continue, he did not deal with goto/label. The equations constructed in his method correspond to structured summaries containing only the leftmost column vector. His method can potentially deal with goto statements (i.e., jump-out) but not with label statements (i.e., jump-in). Mintz et al. [MFS79] implemented Rosen's method integrated with a CFG-based method to deal with goto/label. Their method processes ASTs containing no jump-in similarly to Rosen's. For ASTs containing a jump-in from the outside, it abandons the idea of calculating equations and instead constructs a CFG. For ASTs containing no jump-in from the outside but with jumps between their components, it reduces the CFG derived from the AST to equations by applying a CFG-based method. The primary difference between our method and theirs is how jump-ins are handled. In our method, we detach the data flow of every jump-in completely from an input AST and summarize it into a structured summary. As a result, our method performs syntax-directed computation more thoroughly (e.g., even for context-insensitive interprocedural DFA) than theirs. This trait is quite advantageous in terms of divide-and-conquer parallelization. In previous studies on parallelizing DFA [LRF95, KGS94], load balancing was the primary concern. Lee et al. [LRF95] improved the parallelization of interval analysis [AC76], where CFGs are recursively decomposed into substructures called intervals. In interval analysis, exclusive intervals can be processed in parallel, but the size of each interval, i.e., the granularity of parallel tasks, is diverse. Lee et al. divided a CFG into controlled-size regions instead of intervals for load balancing and used an auxiliary tree structure to manage the parallelism among regions. Region decomposition itself is a sequential task. Kramer et al. [KGS94] utilized the parallel prefix-sum operation with ⊗ for each path of a CFG. Their method unwinds loops to convert a CFG into a directed acyclic graph. This degrades the generality of DFA. To make matters worse, their method expands the sharing of paths in a given CFG. This causes the asymptotic cost of DFA to blow up exponentially⁵. Our method is a simpler and cheaper way of divide-and-conquer parallelization, and furthermore guarantees asymptotically linear speedup. Many studies on accelerating static analysis [NG13, MLBP12, AKNR12, PRMH11, RL11, MLMP10] parallelized fixed-point iterations. Multithreading with worklists [NG13, AKNR12, RL11] worked well for expected inputs in practical usage, but it imposes concurrency issues such as mutual exclusion for worklists, termination detection, deadlock/livelock, and the fairness of underlying schedulers. Parallel implementations specialized for GPUs [MLBP12, PRMH11] achieved high performance experimentally, but these techniques are very hardware-specific. Speculative parallelization [MLMP10] was feasible, but it complicates runtime behaviors. None of these approaches guarantees asymptotic speedup.

7.7 Conclusion

We have presented a novel syntax-directed parallel method of DFA that tames goto/label, and also experimentally demonstrated its feasibility and scalability. There are two directions for future work. One is to implement our method more seriously by tying it to compiler optimizations and then to evaluate it practically. We expect that our method will simplify the construction of optimizing compilers. The other is to apply our method to other domains, e.g., XML processing. We expect that our approach to taming goto/label will be useful for computation over a mostly hierarchical structure.

⁵ Their worst-case analysis is wrong about the size of the graph that they call a combining DAG. It can be exponential in the number of nodes in a given CFG, e.g., for a sequence of if-then-else statements, whose regular path is (r1 | r2)·(r3 | r4)···.

Chapter 8

Syntax-Directed Construction of Value Graphs

This chapter is self-contained and is an extended version of our unpublished paper [Sat14b].

8.1 Introduction

Compiler optimizations are classified roughly into two kinds. One is high-level optimization, which consists of analysis and transformation on abstract syntax trees (ASTs). Although compilers may desugar source languages into simpler ones, the ASTs to which compilers apply high-level optimizations contain many language features of source languages, i.e., high-level information. High-level optimizations utilize these features. The other is low-level optimization, which consists of analysis and transformation on control-flow graphs (CFGs) of low-level languages such as three-address code. Low-level languages are much simpler than source languages and designed to be independent both of source languages and of machine architectures. Modern optimizing compilers such as GCC and LLVM¹ are equipped with various low-level optimizations. In particular, optimizations on static single assignment (SSA) form [RWZ88] are the de facto standard. Meanwhile, parallelizing compilers [AK01] and recent just-in-time (JIT) compilers [WWS+12, WWW+13, ZLBF14] are equipped with high-level optimizations. Compilers perform high-level optimizations followed by low-level optimizations and do not mix them. For example, the Truffle JIT framework performs type specialization by rewriting ASTs and then constructs SSA form for underlying Java JIT compilers. High-level information is useful even in low-level optimizations. For example, high-level control flow on SSA form enables more precise equality detection [AWZ88]. Low-level optimization techniques are also useful in high-level optimizations. Array SSA [KS98], which is an SSA form integrating arrays and the do loops in Fortran, is known to be useful for parallelization. Value graphs, which are representations of expressions derived from SSA form, are useful for relatively high-level algebraic transformations [TSTL09]. A mixture of high-level optimizations and low-level optimization techniques is promising. There are two approaches to this mixture. One is to extract high-level information from low-level representations. For example, we can use control-flow analysis [Sha80, Bak77] for extracting high-level control structures from CFGs. Grosser et al. [GGL12] addressed extracting parallelizable clean loops from low-level SSA form. This approach, however, does not always capture satisfactory high-level information because such information may already be lost in low-level representations. The other is to construct SSA form particularly for high-level optimizations. This is expensive and complicated. SSA form contains ϕ-functions, which are nonexecutable imaginary operators that represent control flow. At the end of optimizations, we have to eliminate ϕ-functions adequately in SSA destruction. Moreover, in SSA form, ϕ-functions become an obstacle to code motion. SSA form is not designed

¹ http://llvm.org/


for high-level optimizations, and the adaptation of SSA form to high-level optimizations is not always straightforward. To obtain a good mixture of high-level optimizations and low-level optimization techniques, we study the construction of value graphs [AWZ88], which are useful for high-level optimizations. As mentioned above, although value graphs are usually derived from SSA form, we do not have to construct SSA form to obtain value graphs. Because SSA form is not very useful for high-level optimizations, a direct construction of value graphs from ASTs in normal form is advantageous for high-level optimizations. In this chapter, we present a syntax-directed method for constructing value graphs from ASTs in high-level source languages. We have applied Rosen's syntax-directed approach [Ros77] and function-based approaches [Ros80, Tar81a] in the context of data-flow analysis (DFA) to the construction of value graphs. Our method constructs value graphs while placing ϕ-functions on the basis of high-level control structures, as in the method by Brandis and Mössenböck [BM94] for constructing SSA form from structured programs. The placed ϕ-functions then represent high-level control structures in value graphs. Our method deals with break/continue (or equivalent goto/label) statements in a simple yet efficient manner. Our method can construct value graphs from the while language with break/continue in a single-pass bottom-up manner on a given AST. If we assume arbitrary use of goto/label statements, we face malignant ones that make it difficult to construct value graphs in a syntax-directed manner. We deal with malignant goto/label statements by using reduced ASTs, where benignant parts of ASTs are reduced and malignant parts are exposed. On the basis of reduced ASTs, our method tames malignant goto/label statements with respect to precision or cost. Specifically, our method provides two choices: to proceed in a syntax-directed manner at the cost of some imprecision or to perform a little computation in a CFG-based manner. In either case, because our method processes most of a given AST efficiently, it is more efficient than SSA construction on CFGs. By combining our method for constructing value graphs with syntax-directed methods [Ros77, MFS79, SM14b] for DFA, we can perform value numbering on the whole program as a high-level optimization. The following are our main contributions:

• We present a syntax-directed method for constructing value graphs from goto-free programs in the while language (Section 8.3). This method constructs value graphs with a composition operator over value graphs in a single-pass bottom-up manner on a given AST while placing ϕ-functions in value graphs as in the method by Brandis and Mössenböck [BM94] for SSA construction.

• We present an extension for dealing with typical use of goto/label statements, such as break/continue, in a single-pass bottom-up manner on a given AST (Section 8.4.2). We have extended value graphs to single-entry multiple-exit fragments of ASTs and then applied Rosen's syntax-directed approach in the context of DFA to the construction of value graphs.

• We present two extensions for taming arbitrary, especially malignant, use of goto/label statements by using reduced ASTs, where benignant parts of ASTs are reduced (Section 8.4.3). One extension proceeds in a syntax-directed manner but incurs redundant ϕ-functions, which degrade the quality of value graphs. The other extension performs a little computation on the basis of the standard algorithm [CFR+91] for placing ϕ-functions on CFGs. The use of reduced ASTs suppresses both the negative effects on value graphs and the additional computational costs.

8.2 Value Numbering and Value Graphs

8.2.1 Value Numbering

We first assume a straight-line (i.e., branchless) language as shown in Figure 8.1 for simplicity and introduce the value numbering dealt with in this chapter. To eliminate redundant expressions, we first have to discover equivalent expressions. Textual representations are inappropriate for the comparison of expression values for two reasons. The first is that the same textual representation may denote different values at different program points. The second is that different textual representations may denote the same value at a program point.

P ::= s (Program)
s ::= v ← e | s1 s2 (Statement)
e ::= v | c | e1 f e2 (Expression)

Figure 8.1: Syntax of straight-line language, where v, c, and f denote metavariables over variables, constants, and binary operators, respectively.

x ← a
y ← x + 1
x ← b
z ← x + 1

[Value graph of the program above: y refers to a + 1 and z refers to b + 1.]

Figure 8.2: Example of value graphs, where the value graph on the right is derived from the branchless program on the left. In the value graph, a labeled solid edge denotes an argument reference of an operator where a label number denotes the position in its argument list, and a break edge denotes a variable reference at the end of the program.

A value graph is a representation of the value structures of expressions. Figure 8.2 shows an example branchless program and the value graph derived from it. The value of y in the example program is x + 1 at the second line and that of z is x + 1 at the fourth line. Both expressions are symbolically the same x + 1 but have different values. Meanwhile, each expression corresponds to a node in the value graph. The value graph shows that y and z at the end denote a + 1 and b + 1, respectively. Value graphs are independent of textual representations and represent expression values directly. We can detect equivalent expressions on the basis of congruence on graph values. For example, consider the subgraph rooted at the + node to which y refers and the one to which z refers, shown in Figure 8.3. Both subgraphs have identical structure, i.e., are congruent. In other words, every trace from both roots is identical. Therefore, y and z refer to different expressions in the program but refer to the same value. By using congruence on graph values, we can thus construct the equivalence classes of the expressions occurring in a given program. Value numbering [CS70] is to eliminate redundant expressions by using this equivalence on value graphs. Thanks to value graphs, it has an effect similar to that of iterating copy propagation, constant propagation, and common subexpression elimination (CSE) based on textual representations. In this chapter, we deal with the construction of value graphs.

x ← a
y ← a + 1
z ← x + 1

[Value graph of the program above: the + subgraphs referred to by y and z are congruent.]

Figure 8.3: Example of congruence of value graphs. The subgraph referred to by y and that referred to by z are congruent; thereby, the values of y and z are identical.

s ::= ... | s1 s2 | pass | if (e) {s1} else {s2} | while (e) {s} (Statement)

Figure 8.4: Syntax of the while language, which is an extension of the syntax defined in Figure 8.1; pass denotes an empty statement.

Although the detection of congruence is essential for value numbering, we do not deal with it. We assume the use of existing methods [DST80, AWZ88] for detecting congruence on value graphs. The detection and elimination of redundancy are also necessary for value numbering as an optimization. In fact, these are not essential issues in value numbering. Once the equivalence classes of the expressions occurring in a given program are obtained, we can use classic optimization methods based on data-flow analysis (DFA) for detecting and eliminating redundancy [BCS97]. We therefore do not consider the detection and elimination of redundancy. Constant folding is important for value numbering because it improves the precision of equivalence classes based on congruence. We can implement constant folding straightforwardly in the form of a reduction of value graphs. We assume that constant folding is applied to value graphs after their construction and before the detection of congruence. Note that a mapping T from each expression to a node in value graphs is necessary to represent equivalence classes of expressions. After detecting congruence on value graphs, we obtain equivalence classes over the nodes of value graphs. T maps an expression to a node contained in an equivalence class. The comparison of equivalence classes through T suffices for checking the equivalence of expressions. We consider T to be a data structure for implementation and do not include it in value graphs.

8.2.2 ϕ-Function

We here consider the while language, whose syntax is defined in Figure 8.4. Value numbering for branchless program fragments (generally called basic blocks) is directly applicable to program fragments that contain forks of control flow but contain no join (generally called extended basic blocks). This is because the value graph of a program fragment prior to a fork can be shared among the destinations of the fork. The primary issue in value numbering is the join of control flow. To deal with the joins of control flow, we have to represent control-dependent values, i.e., values that depend on execution paths. A ϕ-function [RWZ88] is a pseudo-operator to represent a control-dependent value.

Definition  The ϕ-function, which is also called a pseudo-assignment [SS69], semantically forms the assignment of a control-dependent value to a variable, e.g., x ← ϕ(v1, . . .). All ϕ-functions are conceptually placed at join points of control flow such as the exit of an if statement and the entry of a while statement. A ϕ-function x ← ϕ(v1, . . .) denotes that possibly different values from its predecessors become confluent via x. We therefore call the variable x of x ← ϕ(v1, . . .) the confluent variable. The arguments of a ϕ-function are the values of its confluent variable at all its predecessors, where each argument corresponds to one predecessor. For example, consider the following if statement:

if (e) {x ← a} else {pass}.

Then, a ϕ-function x ← ϕ(a, x0), where x0 denotes the value of x at the entry, is imagined at the exit above. The essential information that characterizes a ϕ-function is its program point (i.e., join point) and its confluent variable. We therefore equate a ϕ-function with this pair. At a ϕ-function, different values may become confluent. Unnecessary ϕ-functions are allowed by definition. In placing ϕ-functions, their minimality becomes an important issue. The standard minimality of ϕ-functions is based on dominance frontiers [CFR+91]. We call a ϕ-function redundant if it breaks the minimality based on dominance frontiers. Note that we distinguish redundant ϕ-functions from unnecessary ones.

ϕ-Function in Value Graphs  In constructing value graphs of programs that contain joins of control flow, we can interpret a ϕ-function x ← ϕ(v1, . . .) as an assignment whose right-hand side is an expression constructed by using ϕ as an ordinary operator such as +. In detecting congruence on value graphs, however, we have to interpret ϕ in a manner different from ordinary operators. Although we can ignore the confluent variables of ϕ-functions, we have to take account of the program points of ϕ-functions. For example, consider the following sequence of if statements:

if (i < n) {x ← a} else {x ← b}   if (i < m) {y ← a} else {y ← b}.

Then, x ← ϕ(a, b) is placed at the exit of the first if statement, and y ← ϕ(a, b) is placed at the exit of the second if statement. If we regarded ϕ as an ordinary operator, x and y would refer to expressions of the same value. This is obviously wrong. Since the values of n and m used in the condition parts of the if statements may be different, the values of x and y cannot be guaranteed to be identical. To avoid this wrong situation, in detecting congruence on value graphs, we distinguish ϕ operators by their program points. To clarify the program point π of a ϕ operator, we write ϕ_π. In the example above, x ← ϕ_π1(a, b) and y ← ϕ_π2(a, b) are placed at the end π1 of the first line and at the end π2 of the second, respectively. Since these operators are different, the two subgraphs are not congruent. By giving this consideration to ϕ, we can still construct equivalence classes of expressions on the basis of congruence on value graphs.

High-Level ϕ-Function  The cause of the wrong situation that we induce by regarding ϕ as an ordinary operator is that ϕ-functions do not take account of the branch conditions of joined control flow. If a ϕ-function takes the branch condition of joined control flow, its value graph captures the whole control-dependent value from the fork to the join in its structure. Congruence on such value graphs guarantees the equivalence of the assigned values as well as the branch conditions and thereby guarantees the equivalence of the ϕ expressions without their program points. Since branch conditions and forks of control flow are coupled in control constructs, ϕ-functions that take branch conditions represent high-level control structures of input languages. To distinguish such ϕ-functions from normal ϕ-functions, we call them high-level ϕ-functions². Following the while language defined in Figure 8.4, we introduce the high-level ϕ-function ϕif for if statements and ϕwhile for while statements, as in Alpern et al.'s work [AWZ88]. We assume that ϕif and ϕwhile contain their source if/while statements as their program points for convenience. ϕif(c, t, e) denotes the so-called ternary (conditional) operator (e.g., c ? t : e in the C language) and conceptually exists at the exit of the source if statement. Since these are pure expressions, it is obvious that their congruence leads to their equivalence. In the example above, the first is x ← ϕif(i < n, a, b) and the second is y ← ϕif(i < m, a, b). If the value graphs of n and m are congruent, both expressions are identical. Then, the second if statement would be eliminated in value numbering. This elimination is impossible when we use congruence only on normal ϕ-functions. ϕwhile has to take account of the circularity of value graphs but is essentially not different from ϕif. ϕwhile(c, s, b) conceptually exists at the entry of its source while statement, where c, s, and b denote the condition part, the initial value, and the recurring body value, respectively. Typically, the subgraphs starting from c and b have circular references to ϕwhile(c, s, b). Note that ϕwhile conceptually exists outside of the while loop but is not loop-invariant. The value of ϕwhile(c, s, b) on the inside of its loop is different from that on the outside. However, if we use only ϕwhile, we cannot distinguish both expressions by the structure of value graphs, and thereby both expressions would wrongly become equivalent on the basis of congruence. To avoid this wrong situation, we append a sentinel ϕend to ϕwhile at the exit of its while statement, as shown in Figure 8.5.

² Alpern et al. [AWZ88] introduced the same notion but did not use the same term. In later studies on SSA, it was called a gating function [BMO90]. We here use the term for emphasizing the high-level part as in the original study.

i ← 0
while (i < n) {
  i ← i + 1
}

[Circular value graph of the program above, built with ϕwhile and ϕend.]

Figure 8.5: Example of a circular value graph with ϕwhile and ϕend, where i on the right refers to its value at the exit of the while statement on the left.

Another important point in detecting congruence is that we have to distinguish the ϕwhile of an outer loop in a nested loop from the ϕwhile of an inner loop. It suffices to check the containment relation between two while statements in the comparison of ϕwhile. These considerations do not affect the construction of value graphs. Refer to the discussion on ϕwhile and ϕend in Section 8.5.

8.2.3 Definition and Formalization of Value Graphs

Definition  Here we formally define the data structure of value graphs in this chapter. A value graph is a triple (G, V, F). G is a directed graph (N, E), where N is a set of tagged nodes and E is a set of ranked edges. V is a mapping from an alphabet Σ to N. F is an injective binary relation from Σ to N. Their denotations in programs are as follows. Σ is a set of variable symbols. G is a set of expression values, which is the solid-line part of the value graphs shown in Figures 8.2, 8.3, and 8.5. V is a variable binding, which is the break-line part of the value graphs. F, which we introduce for convenience of algorithm description, captures the occurrences of free variables (i.e., variables of unknown values) in N. The rank in E denotes the position in an argument list. The set of tags in N consists of variables, constants, operators, and ϕ-functions. Each node has exactly one tag, but tags are mutable. Multiple nodes can have the same tag. n^t denotes a node n tagged with t, but we omit t when it is not needed. We call a node tagged by a variable a variable node and a node tagged by a ϕ-function a ϕ-node. For every value graph (G, V, F), it is an invariant that v ↦ n^v ∈ F for any variable node n^v in G. For algorithm descriptions, we introduce a special variable $ and a special map ω. $ is a variable that refers to the value of an expression and does not appear in a given program. ω denotes no control flow and plays the role of a variable binding, as in (G, ω, F). ω is used for dealing with goto statements in Section 8.4.
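As a concrete (and purely illustrative) picture of this triple, the following Java sketch represents nodes with a mutable tag and ranked successor edges, and V and F as maps from variable symbols to nodes; it is not the implementation used in this dissertation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ValueGraph {
    static final String RESULT = "$";   // the special variable for the value of an expression

    static final class Node {
        String tag;                                   // a variable, a constant, an operator, or a phi-function
        final List<Node> args = new ArrayList<>();    // ranked edges: list position = argument position
        Node(String tag) { this.tag = tag; }
    }

    // G itself is left implicit: it is the part of the pointer structure reachable from V and F.
    final Map<String, Node> bindings = new HashMap<>();   // V: a variable -> the node holding its current value
    final Map<String, Node> free     = new HashMap<>();   // F: a variable -> its free-variable node

    static Node freshNode(String tag) { return new Node(tag); }

    // Example: the value graph of the expression b + 1, bound to $.
    static ValueGraph plusExample() {
        ValueGraph g = new ValueGraph();
        Node b = freshNode("b"), one = freshNode("1"), plus = freshNode("+");
        plus.args.add(b);      // rank 1
        plus.args.add(one);    // rank 2
        g.bindings.put(RESULT, plus);
        g.free.put("b", b);    // maintains the invariant: b maps to a node tagged b in F
        return g;
    }
}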

Formalization by Circuits

We also formalize a value graph (G, V, F) as a circuit. We regard G as a black-box circuit where we can observe only input/output points. img(V), i.e., the image of V, corresponds to the output points of G, and img(F) corresponds to the input points of G. Wires are connected to the input/output points of G from the outside. The variable symbols in dom(V) and dom(F) correspond to the colors of wires, which distinguish them. Nodes tagged by constants or variables are terminals in G, and nodes tagged by operators or ϕ-functions are gates in G. This formalization (or view of value graphs) enables us to focus on the relevant part of G when constructing value graphs.

8.3 Construction Algorithm to the While Language

We describe our one-pass syntax-directed algorithm for constructing value graphs with high-level ϕ-functions from programs in the while language. C denotes a function constructing a value graph from an AST of the while language.

8.3.1 Syntax-Directed Formulation

We define C for each case in this subsection. Let freshNode(t) be a primitive function that generates a fresh node³ tagged with t. A fresh node is distinct from every other node. First, from the definition of value graphs, the definitions in the cases of expressions and assignments are immediately obtained.

C(v) = (({n}, ∅), {$ ↦ n}, {v ↦ n}), where n = freshNode(v),
C(c) = (({n}, ∅), {$ ↦ n}, ∅), where n = freshNode(c),

C(e1 f e2) = (G3, {$ ↦ n3}, F1 ∪ F2),

    where G3 = G1 ∪ G2 ∪ ({n3}, {n3 ↦ n1, n3 ↦ n2}),

    (G1, {$ ↦ n1}, F1) = C(e1),

    (G2, {$ ↦ n2}, F2) = C(e2),

    n3 = freshNode(f),

C(v ← e) = (G, {v ↦ n}, F), where (G, {$ ↦ n}, F) = C(e).

For simplicity, the algorithm description above takes no account of the order of the operands of f. It is immediate to take account of operand order. Similarly, in the remainder of this chapter, we use algorithm descriptions taking no account of operand order. The case of statement sequencing s1 s2 is defined as

C(s1 s2) = C(s1) ⊗ C(s2), where ⊗ is a serial composition operator over value graphs. The meaning of ⊗ is easy to understand as the construction of series circuits on the basis of the formalization by circuits. Specifically, it constructs the correct wire connections between the output points of C(s1) and the input points of C(s2). An outgoing wire of C(s1) in some color connects to all the incoming wires of C(s2) in the same color. Then, we remove the terminals in C(s2) connected to C(s1) and short-circuit the incoming wires of C(s2). We can formulate ⊗ over the data structures of value graphs as follows:

((N1, E1), V1, F1) ⊗ ((N2, E2), V2, F2) = (G3, V3, F3),

    where G3 = (N1 ∪ update_σ(N2), E1 ∪ E2′ ∪ E∆),

    V3 = {v ↦ n ∈ V1 | v ∉ dom(V2)} ∪ V2,

    F3 = F1 ∪ {v ↦ n ∈ F2 | v ∉ S} ∪ {v′ ↦ n2 | v ↦ n2 ∈ Falias, n1^{v′} = V1(v)},

    Falias = {v ↦ n2 ∈ F2 | v ∈ S, n1^{v′} = V1(v)},

    E∆ = {n ↦ V1(v) | n ↦ n′ ∈ E2, v ↦ n′ ∈ F2 − Falias, v ∈ S},

    E2′ = E2 − {n ↦ n′ ∈ E2 | v ↦ n′ ∈ F2 − Falias, v ∈ S},

    σ = {v ↦ v′ | v ↦ n2 ∈ Falias, n1^{v′} = V1(v)},

    S = dom(V1) ∩ dom(F2).

update_σ(N) is a primitive operation that updates the tags of the variable nodes in a set N according to a mapping σ over Σ. For example, update_{{v ↦ v′}}({n1^+, n2^v}) = {n1^+, n2^{v′}}. Note that the identity of n2 here is unchanged and the edges of n2 remain connected after the update. update_σ(N2) corresponds to copy propagation on the nodes of a resultant value graph. Falias is the subset of F2 sensitive to this copy propagation. The third term in the RHS of the definition of F3 makes the invariant v ↦ n^v ∈ F3 hold.

³ Nodes tagged by constants do not have to be fresh regarding the same constant.

When we regard a circuit as a function, the serial composition ⊗ corresponds to function composition. ⊗ is therefore associative and its identity is (∅, ∅, ∅). Since this identity corresponds to the identity function, its meaning is to do nothing. The case of empty statements is therefore defined as

C(pass) = (∅, ∅, ∅).

To define the cases of if statements and while statements, we have to place high-level ϕ-functions conceptually. This is straightforward thanks to the single-entry single-exit control flow of these high-level control structures [BM94]. Specifically, the single-entry single-exit control flow in if statements is observed as follows. To reach the then/else part of an if statement, it is necessary to pass through the entry of the if statement; this means single entry. To reach the exit of an if statement, it is necessary to pass through the exit of the then/else part; this means single exit. These lead to two observations. ϕ-functions at the exit of an if statement are unnecessary for variables defined neither in the then part nor in the else part. If a variable v is defined in only one of the then part and the else part, the value of v at the entry reaches the exit and may become confluent via v. Without redundant (high-level) ϕ-functions, we can therefore define the case of if statements as follows:

C(if (e) {s1} else {s2}) = (G, V, F),

    where G = Gc ∪ Gt ∪ Ge ∪ (img(V) ∪ img(F∆), E∆),

    V = {v ↦ freshNode(ϕif) | v ∈ Σϕ},

    F = Fc ∪ Ft ∪ Fe ∪ F∆,

    F∆ = {v ↦ freshNode(v) | v ∈ Σϕ − (dom(Vt) ∩ dom(Ve))},

    E∆ = {V(v) ↦ nc | v ∈ Σϕ} ∪ {V(v) ↦ n | v ↦ n ∈ F∆}

        ∪ {V(v) ↦ n | v ↦ n ∈ Vt} ∪ {V(v) ↦ n | v ↦ n ∈ Ve},

    Σϕ = dom(Vt) ∪ dom(Ve),

    (Gc, {$ ↦ nc}, Fc) = C(e),

    (Gt, Vt, Ft) = C(s1),

    (Ge, Ve, Fe) = C(s2).

Here, freshNode(v) denotes the value of v at the entry of a given if statement. The generation of ϕwhile is almost the same as that of ϕif if we regard a while statement as an if statement whose else part is an empty statement. The main difference is that the join point of ϕwhile is the entry of its while statement. We therefore have to consider the sequencing of the ϕwhile functions derived from a while statement followed by the while statement itself. We apply the serial composition ⊗ to this sequencing and thereby introduce circularity into a resultant value graph. Finally, we append a ϕend node to each ϕwhile node. The case of while statements is defined as follows:

C(while (e) {s}) = (G ∪ (img(Vend), Eend), Vend, F),

    where Eend = {Vend(v) ↦ V(v) | v ∈ dom(V)},

    Vend = {v ↦ freshNode(ϕend) | v ∈ dom(V)},

    (G, V, F) = ((img(V∆) ∪ img(F∆), E∆), V∆, F∆) ⊗ (Gc ∪ Gb, ∅, Fc ∪ Fb),

    V∆ = {v ↦ freshNode(ϕwhile) | v ∈ dom(Vb)},

    F∆ = {v ↦ freshNode(v) | v ∈ dom(Vb)},

    E∆ = {n′ ↦ nc, n′ ↦ F∆(v), n′ ↦ n | v ↦ n ∈ Vb, n′ = V∆(v)},

    (Gc, {$ ↦ nc}, Fc) = C(e),

    (Gb, Vb, Fb) = C(s).

Here, freshNode(v) denotes the value of v at the entry of a given while statement.

8.3.2 Implementation Issues

Use of Union-Find Trees

Although C achieves the construction of value graphs from ASTs in a syntax-directed manner, a naive implementation of C can cause a problem of efficiency. update_σ(N_2) in the definition of ⊗ corresponds to a naive copy propagation. The time complexity of this naive copy propagation depends on the order of processing assignments. For example, consider the following sequence of assignments:

\begin{align*}
x_1 &\leftarrow a\\
x_2 &\leftarrow x_1\\
&\ \ \vdots\\
x_m &\leftarrow x_{m-1}.
\end{align*}

A naive copy propagation is to rewrite a RHS variable with a LHS variable in the nearest prior assignments. It costs O(m) time if we perform it from x_2 ← x_1, but it costs O(m²) time if we perform it from x_m ← x_{m−1}. C, which performs in the simplest syntax-directed manner, does not guarantee that its processing order is the program order. If we complicated C, we could guarantee it, but that would spoil the simplicity of its syntax-directed manner. This inefficiency of C is, however, unacceptable. The cause of this inefficiency is to perform copy propagation eagerly. If we perform it lazily, we can avoid the worst case. We can consider variable rewriting as construction of equivalence classes, and therefore we can implement this lazy copy propagation by using union-find trees⁴. Specifically, we construct an equivalence class of the nodes tagged by the same variable by using a union-find tree. As a result, since the invariant v ↦ n^v ∈ F holds, F becomes an injective function. We require two simple modifications on C. First, when we construct F_1 ∪ F_2, for each v ∈ dom(F_1) ∩ dom(F_2), we unify the two nodes F_1(v) and F_2(v). Second, when we refer to variable nodes, we find their representatives. Updating the tags of only the representatives suffices for copy propagation. We therefore use the single field of each variable node both for a tag and for a link to its representative. If we use union-find trees equipped with path compression and union by rank, the copy propagation for m assignments costs O(m α(m, m)), where α(m, m) denotes the inverse Ackermann function. For a realistic size of m, we can assume α(m, m) to be a small constant. C thus becomes almost as efficient as ordinary program-order algorithms. Use of union-find trees also simplifies the implementation of C. In particular, it simplifies the construction of E_Δ in the definition of ⊗. A naive way of implementing it is that all variable nodes manage reverse edges. This requires mutation of the fields of operator nodes and more fields in a variable node. Meanwhile, if we change the representative of a variable node to an operator node in applying ⊗, we achieve the equivalent result of E_Δ in later find operations.
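As a concrete illustration of this representative-based retagging, the following is a minimal C++ sketch (not the dissertation's actual implementation; the names VarNode, find, and unify are hypothetical). Note that the dissertation packs the tag and the representative link into a single field; this sketch keeps them separate for clarity.

#include <string>
#include <utility>

// Variable node for lazy copy propagation.
struct VarNode {
    VarNode*    parent = nullptr;   // non-null iff this node is not a representative
    std::string tag;                // meaningful only at a representative
    int         rank   = 0;
    explicit VarNode(std::string t) : tag(std::move(t)) {}
};

// find with path compression: returns the representative of v's class.
VarNode* find(VarNode* v) {
    if (v->parent == nullptr) return v;
    v->parent = find(v->parent);    // path compression
    return v->parent;
}

// union by rank: merges the classes of a and b.  Retagging a whole class then
// costs a single assignment to the surviving representative's tag, which is
// exactly the lazy copy propagation described above.
void unify(VarNode* a, VarNode* b) {
    VarNode* ra = find(a);
    VarNode* rb = find(b);
    if (ra == rb) return;
    if (ra->rank < rb->rank) std::swap(ra, rb);
    rb->parent = ra;
    if (ra->rank == rb->rank) ++ra->rank;
}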

Reference from ASTs to Value Graphs

As mentioned in Section 8.2.1, for value numbering, we require a mapping T from expressions in ASTs to nodes in value graphs. Since C constructs a distinct node for each expression, T becomes an injective function. We can share T globally and implicitly in C. A simple and efficient way of constructing T is to register a fresh node corresponding to a given expression with the globally visible T in the cases of expressions in C. We here assume T to be a separate table because it is easy to construct and dispose of. We also can make each node in ASTs hold a reference to value graphs. An actual representation of T depends on implementation. That T holds references to nodes of value graphs simplifies the implementation of a value graph (G, V, F). If we construct G as a pointer structure, we can reach the nodes of G through T, and thereby we do not have to construct G explicitly as a set. An important observation is that optimizations concern the part of G reachable from T. In particular, ϕ-functions for dead variables are unnecessary. Actually, existing algorithms [BCHS98, CCF91] for placing ϕ-functions prune unnecessary ϕ-functions on the basis

⁴The original paper [GF64] on union-find trees also aimed at managing equivalent variables in compilers.

s ::= … | l: | goto l   (Statement)

Figure 8.6: Syntax of the while language with goto/label statements, which is an extension to the syntax defined in Figure 8.4, where l denotes a metavariable over labels. Label statements that are not targeted by any goto statement are regarded as empty statements.

of live variable analysis. We, however, can prune ϕ-functions easily on the basis of the reachability from T. When we trace links through T from all essential expressions (e.g., returned expressions), ϕ-functions unreachable from T are simply unnecessary. If implementation languages have garbage collection (GC), unnecessary ϕ-functions can be implicitly removed by using weak references.
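As a small illustration (an assumption on my part, not the dissertation's implementation), in a language without GC the same effect can be approximated with reference counting: if T holds weak references to value-graph nodes, a ϕ-node that is no longer reachable from any essential expression is reclaimed and thus pruned implicitly. The names below are hypothetical.

#include <memory>
#include <unordered_map>

struct Node;                        // a value-graph node (operator, variable, or phi)
struct Expr;                        // an AST expression

// T: AST expressions -> value-graph nodes, held weakly so that nodes kept
// alive only by T (e.g., phi-functions for dead variables) can be reclaimed.
std::unordered_map<const Expr*, std::weak_ptr<Node>> T;

std::shared_ptr<Node> lookup(const Expr* e) {
    auto it = T.find(e);
    if (it == T.end()) return nullptr;
    return it->second.lock();       // empty if the node has already been pruned
}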

8.4 Taming Goto/Label Statements

Goto statements are not considered favorable and are not adopted in many languages because structured programming is now widespread. An appropriate use of goto statements is, however, useful even in structured programming [Knu74]. Although the use of goto statements has decreased, they are still actually used in source code [SW12]. Moreover, it is not rare that compiler frontends introduce goto/label statements. Goto statements are therefore still not negligible for compilers. In this section, we describe the difficulty of goto/label statements and deal with the construction of value graphs from programs in the while language with goto/label defined in Figure 8.6.

8.4.1 Difficulty of Goto/Label Statements

Analogy and Difference between DFA and Value-Graph Construction

In Section 8.3, we defined the serial composition operator ⊗ by formalizing a value graph as a circuit. As mentioned earlier, a circuit is a kind of function. Constructing functions like value graphs from programs is called a function-based approach in the context of DFA and is known to be synergistic with syntax-directed approaches [Ros77, Ros80, SM14b]. We have already developed a syntax-directed method for taming goto/label statements in DFA [SM14b]. It is inadequate for the construction of value graphs despite the analogy between both formalizations. The matter is ϕ-function placement. In monotone DFA, we can define an addition ⊕ over data-flow functions, where function composition⁵ ∘ is left-distributive over ⊕, i.e., f_3 ∘ (f_2 ⊕ f_1) = (f_3 ∘ f_2) ⊕ (f_3 ∘ f_1). This means that the data-flow function of the following program

if (e) {x ← a} else {x ← b} y ← x + 1

is the same as that of the following program

if (e) {x ← a y ← x + 1} else {x ← b y ← x + 1}.

However, the value graphs of both programs are not the same with respect to ϕ-functions. As seen from this difference, it is essentially difficult to define an addition over value graphs as in the algebra of DFA. Our method [SM14b] relies heavily on the left-distributive law and is therefore inadequate for the construction of value graphs.

Troubles of ϕ-Function Placement for Goto/Label Statements

This trait of ϕ-functions makes it very difficult to construct value graphs without settling the ϕ-function placement for label statements. For example, consider the following program fragment:

goto l1 l2:.

⁵Strictly, the distribution of ∘ will improve the precision of DFA, and the left-distributive law should be an inequality.

The simplest syntax-directed manner, like C, would be to define a useful value graph of the fragment above without any additional information. This is, of course, almost impossible because we do not have any knowledge of the ϕ-function placement. Suppose, then, that the information of this context is given; e.g., what variables are defined before the fragment and used after the fragment. However, it is still very difficult to place reasonable ϕ-functions because the control flow outgoing from goto l1 and incoming to l2: is independent of this context. For example, consider that a given program contains only one goto l2. Any ϕ-function at l2: becomes redundant, but the variable binding incoming to l2: may necessitate ϕ-functions after the fragment. It is unclear what information is adequate to place ϕ-functions. The essence of syntax-directed manners is to define the calculation for a subtree of a given AST. The ϕ-function placement for goto/label statements is inherently difficult to perform in a syntax-directed manner. A reasonable syntax-directed method for constructing value graphs from the while language with goto/label is therefore nontrivial.

8.4.2 Syntax-Directed Approach to Single-Entry Multiple-Exit ASTs

Classifying control flow on syntactic structures is known to be useful for taming goto/label statements [MFS79]. We consider three classifications: single-entry single-exit (SESE) ASTs, single-entry multiple-exit (SEME) ASTs, and multiple-entry ASTs. An AST is SESE if any control transfer into/from the AST must pass through its entry/exit. The while language always produces SESE ASTs. An AST is SEME if any control transfer into the AST must pass through its root. An AST is multiple-entry if there is a control transfer into the AST without passing through its root. It is inherently difficult to deal with multiple-entry ASTs. We describe how to deal with multiple-entry ASTs later. We here deal with single-entry multiple-exit ASTs. Typical SEME ASTs are produced from the while language extended with break/continue statements such as the ones in C. In the while language with goto/label, SEME ASTs formally mean that any goto l is contained in the s of either l: s or s l:. The case of l: s and that of s l: correspond to continue statements and break statements, respectively. Many of the useful usages of goto statements in structured programming claimed by Knuth [Knu74] take this form. Rosen's method [Ros77] for DFA deals with SEME ASTs in a clear syntax-directed manner. In the rest of this subsection, we extend C to C′ for SEME ASTs on the basis of Rosen's syntax-directed approach.

From the perspective of syntax-directed computations, Rosen's method sets aside the data flow outgoing from goto statements in descendants and merges the set-aside data flow with the data flow ignoring goto statements at the current node. This method is applicable to placing ϕ-functions because ϕ-functions are unnecessary for a variable v if v is not defined in the reachable part of the current AST. The main difference is that we have to distinguish all goto statements because the value of a variable incoming from each goto statement becomes a distinct argument of a ϕ-function. The information that we have to set aside is a variable binding V in a value graph (G, V, F). We extend (G, V, F) by adding set-aside variable bindings. An extended value graph is a quadruple (G, V, 𝒱, F), where 𝒱 is a mapping from a goto statement to the variable binding at the goto statement and (G, V, F) is an original value graph, which ignores the effect of goto statements. While an original value graph is formalized as a circuit having a single bundle of outgoing wires, an extended value graph is formalized as a circuit having multiple bundles of outgoing wires, where each bundle contains at most one wire in each color. We implicitly identify (G, V, ∅, F) with (G, V, F). l^π denotes a goto statement goto l at the program point π. We define C′, which takes an AST in the while language with goto/label and yields an extended value graph, as follows. We can define the case of goto statements as

\[
C'(\text{goto } l) = (\emptyset,\ \omega,\ \{l^{\pi} \mapsto \emptyset\},\ \emptyset),
\]
where π denotes the program point of a given goto statement. As mentioned in Section 8.2.3, ω denotes no control flow. If C′(s) = (G, ω, 𝒱, F), there is no reachable path from the entry of s to the exit of s. For statement sequencing s1 s2, where neither s1 nor s2 is a label statement, we have three observations. The first is that, if there is no control flow outgoing from s1, s2 has no effect. The second is that, if there is no control flow outgoing from s2, there is no control flow outgoing from s1 s2 except for

goto-derived control flow. The third is that the variable binding at the exit of s1 also has an effect on the variable bindings at the goto statements in s2. From these observations, we can define the serial composition operator ⊗′ over extended value graphs as follows:

\begin{align*}
(G_1, \omega, \mathcal{V}_1, F_1) \otimes' (G_2, V_2, \mathcal{V}_2, F_2) &= (G_1, \omega, \mathcal{V}_1, F_1),\\
(G_1, V_1, \mathcal{V}_1, F_1) \otimes' (G_2, V_2, \mathcal{V}_2, F_2) &=
\begin{cases}
(G_3, \omega, \mathcal{V}_3, F_3) & (V_2 = \omega),\\
(G_3, V_3, \mathcal{V}_3, F_3) & (\text{otherwise}),
\end{cases}\\
\text{where}\quad (G_3, V_3, F_3) &=
\begin{cases}
(G_1, V_1, F_1) \otimes (G_2, \emptyset, F_2) & (V_2 = \omega),\\
(G_1, V_1, F_1) \otimes (G_2, V_2, F_2) & (\text{otherwise}),
\end{cases}\\
\mathcal{V}_3 &= \mathcal{V}_1 \cup \{l^{\pi} \mapsto \mathit{append}(V_1, V) \mid l^{\pi} \mapsto V \in \mathcal{V}_2\},\\
\mathit{append}(V_1, V) &= V \cup \{v \mapsto n \in V_1 \mid v \notin \mathrm{dom}(V)\}.
\end{align*}

⊗′ is associative and its identity is (∅, ∅, ∅, ∅), which is identified with the identity of ⊗. Since neither s1 nor s2 is a label statement, we do not have to place ϕ-functions for s1 s2. We therefore can define the case of s1 s2 simply with ⊗′ as C′(s1 s2) = C′(s1) ⊗′ C′(s2). The ϕ-function placements for s l: and l: s are analogous to those for if statements and while statements, respectively. We place normal ϕ-functions at label statements because goto statements have no branch condition. Let ϕ_l be a normal ϕ-function at the label statement l:. Specifically, we can consider s l: as a multiway nondeterministic branching statement where the control flow outgoing from s and the control flow outgoing from all goto l statements are joined at l:. We therefore can define the case of s l: as follows:

\begin{align*}
C'(s\ l{:}) &= (G', V', \mathcal{V} - \mathcal{V}_l, F \cup F_\Delta),\\
\text{where}\quad G' &= G \cup (\mathrm{img}(V_\Delta) \cup \mathrm{img}(F_\Delta),\, E_\Delta),\\
V' &= \{v \mapsto V_0(v) \mid v \in \mathrm{dom}(V_0) - \Sigma_\phi\} \cup V_\Delta,\\
V_\Delta &= \{v \mapsto \mathit{freshNode}(\phi_l) \mid v \in \Sigma_\phi\},\\
F_\Delta &= \Bigl\{v \mapsto \mathit{freshNode}(v) \Bigm| v \in \Sigma_\phi - \bigcap_{i=0}^{k} \mathrm{dom}(V_i)\Bigr\},\\
E_\Delta &= \{V_\Delta(v) \mapsto F_\Delta(v) \mid v \in \Sigma_\phi\} \cup \bigcup_{i=0}^{k} \{n' \mapsto n \mid v \mapsto n' \in V_\Delta,\ v \mapsto n \in V_i\},\\
\Sigma_\phi &= \bigcup_{i=0}^{k} \mathrm{dom}(V_i) - \Bigl\{v \in \bigcap_{i=0}^{k} \mathrm{dom}(V_i) \Bigm| V_0(v) = V_1(v) = \cdots = V_k(v)\Bigr\},\\
\bigcup_{i=1}^{k} \{l^{\pi_i} \mapsto V_i\} &= \mathcal{V}_l,\\
\mathcal{V}_l &= \{l'^{\pi} \mapsto V \in \mathcal{V} \mid l' = l\},\\
(G, V_0, \mathcal{V}, F) &= C'(s).
\end{align*}

The definition above adds a multiway extension to the case of if statements in C except for the second term of the RHS in the definition of Σϕ. The term has the effect of excluding the ϕ-functions for variables that are defined with the identical value. For example, consider the following if statement:

x ← a if (i < m) {goto l} else {pass} l:.

The different values of x are never confluent at l:. If we placed a ϕ-function for x at l:, it would be redundant. The additional term avoids this redundant placement.

We can consider l: s as a multiway nondeterministic do-while statement where l: is the loop header and each goto l constitutes a one-way loop. We therefore can define the case of l: s as follows:

\begin{align*}
C'(l{:}\ s) &= (G_\Delta, V_\Delta, \emptyset, F_\Delta) \otimes' (G, V', \mathcal{V} - \mathcal{V}_l, F),\\
\text{where}\quad G_\Delta &= (\mathrm{img}(V_\Delta) \cup \mathrm{img}(F_\Delta),\, E_\Delta),\\
V' &= \{v \mapsto n \in V_0 \mid v \notin \Sigma_\phi\},\\
V_\Delta &= \{v \mapsto \mathit{freshNode}(\phi_l) \mid v \in \Sigma_\phi\},\\
F_\Delta &= \{v \mapsto \mathit{freshNode}(v) \mid v \in \Sigma_\phi\},\\
E_\Delta &= \{V_\Delta(v) \mapsto F_\Delta(v) \mid v \in \Sigma_\phi\} \cup \bigcup_{i=1}^{k} \{n' \mapsto n \mid v \mapsto n' \in V_\Delta,\ v \mapsto n \in V_i\},\\
\Sigma_\phi &= \bigcup_{i=1}^{k} \mathrm{dom}(V_i),\\
\bigcup_{i=1}^{k} \{l^{\pi_i} \mapsto V_i\} &= \mathcal{V}_l,\\
\mathcal{V}_l &= \{l'^{\pi} \mapsto V \in \mathcal{V} \mid l' = l\},\\
(G, V_0, \mathcal{V}, F) &= C'(s).
\end{align*}

The case of l: s does not necessitate special attention to the values of variables defined in s because the values of defined variables become confluent with unknown values at the entry of l: s. We now consider the cases of if statements and while statements. The primary concern is ω because set-aside variable bindings have no effect on the confluence of values in if statements and while statements. Specifically, if there is no control flow from the then part or the else part of an if statement, we do not have to place ϕ-functions at the exit of the if statement. If there is no control flow from the loop body of a while statement, we do not have to place ϕ-functions at the exit of the while statement. Anyhow, the term 𝒱 of a result (G, V, 𝒱, F) is constructed by unifying every term 𝒱 of the results of subtrees. We therefore can define the cases of if statements and while statements as follows:

\begin{align*}
C'(\text{if } (e)\ \{s_1\}\ \text{else}\ \{s_2\}) &= (G, V, \mathcal{V}_t \cup \mathcal{V}_e, F),\\
\text{where}\quad G &= \begin{cases} G_c \cup G_t \cup G_e & (V_t = \omega \lor V_e = \omega),\\ \text{identical to } C & (\text{otherwise}),\end{cases}\\
V &= \begin{cases} \omega & (V_t = V_e = \omega),\\ V_e & (V_t = \omega \land V_e \neq \omega),\\ V_t & (V_t \neq \omega \land V_e = \omega),\\ \text{identical to } C & (\text{otherwise}),\end{cases}\\
F &= \begin{cases} F_c \cup F_t \cup F_e & (V_t = \omega \lor V_e = \omega),\\ \text{identical to } C & (\text{otherwise}),\end{cases}\\
(G_c, \{\$ \mapsto n_c\}, F_c) &= C(e),\\
(G_t, V_t, \mathcal{V}_t, F_t) &= C'(s_1),\\
(G_e, V_e, \mathcal{V}_e, F_e) &= C'(s_2).
\end{align*}

\begin{align*}
C'(\text{while } (e)\ \{s\}) &=
\begin{cases}
(G_c \cup G_b,\ \emptyset,\ \mathcal{V}_b,\ F_c \cup F_b) & ((G_b, \omega, \mathcal{V}_b, F_b) = C'(s)),\\
(G, V, \mathcal{V}_b, F) & ((G_b, V_b, \mathcal{V}_b, F_b) = C'(s)),
\end{cases}
\end{align*}

where (G_c, {$ ↦ n_c}, F_c) = C(e), and G, V, and F are identical to the ones in C.

The ASTs of expressions, assignments, and empty statements never contain any goto statement. These cases in C′ are identical to those in C.

P ::= s   (Program)
s ::= G′ | s1 s2 | if (G) {s1} else {s2} | while (G) {s} | l:   (Statement)

Figure 8.7: Syntax of reduced ASTs derived from the while language with goto/label, where G denotes a metavariable over value graphs and G′ denotes a metavariable over extended value graphs.

8.4.3 Tolerating Multiple-Entry ASTs by Reduction

Reduced ASTs

The source of the difficulty is malignant goto/label statements that cause multiple-entry ASTs. Although compilers can introduce goto/label statements, transformations seldom introduce malignant ones⁶. Especially, high-level transformations such as inlining hardly do it. Since structured programming is widespread, we can assume malignant goto/label statements in a given AST to be rare. We call this assumption sparse malignancy. To utilize sparse malignancy, we introduce a reduced AST, which is the result of applying C′ to as large a portion of a given AST in the while language with goto/label as possible. Figure 8.7 shows the syntax of reduced ASTs. We can construct reduced ASTs simply by applying C′ to ASTs and by checking whether label statements can reduce to value graphs. The number of (extended) value graphs in a reduced AST, which we call the size of the reduced AST, is proportional to the number of malignant label statements in its original AST. Letting k be the number of malignant label statements in a given AST and d be the maximum depth of them, the number of (extended) value graphs in its reduced AST is O(kd). Owing to the associativity of ⊗′, we can flatten statement sequencing, and thereby d is the depth of if/while nesting. From the structure of realistic programs, we can assume d to be a constant. From the assumption of sparse malignancy, we can assume k to be small. The sizes of reduced ASTs are therefore small. In the remainder of this subsection, we use (reasonably small) reduced ASTs for placing ϕ-functions.

Approximate Placement of ϕ-Functions

Although we explained the difficulty of ϕ-function placement in Section 8.4.1, strictly, it is difficult to avoid placing redundant ϕ-functions. As mentioned in Section 8.2.2, placing redundant ϕ-functions is safe. However, they will be an obstacle to detecting congruence on value graphs and thereby degrade the precision of equivalence based on congruence. We present a simple method for approximate placement of ϕ-functions that contains reasonably few redundant ones. The standard approximate approach is a brute-force placement on CFGs. It places ϕ-functions for all variables at the entry of every basic block. Our approximate approach to ϕ-function placement on reduced ASTs is more precise and efficient. First, the reduced AST of a given program is much smaller than the CFG because an SEME AST that reduces to an extended value graph is more general than a basic block and contains a larger portion of a given program. The reduced AST displays the join points in its control constructs. Even a brute-force approach on reduced ASTs is therefore more efficient and precise. Next, we consider pruning ϕ-functions on the basis of live variables because ϕ-functions for dead variables are unnecessary. Although we can calculate live variables by using our syntax-directed method [SM14b] for DFA, we can obtain approximate live variables in a simpler manner. We merely construct the union of the free variables dom(F) over all (extended) value graphs. We can use it as a safe approximation of live variables. This approximation is analogous to the construction of live variables in semi-pruned SSA [BCHS98]. This approximation is known to provide a reasonable precision with a cheap computation. While the approximation in semi-pruned SSA is based on CFGs, our approximation is based on reduced ASTs. Since reduced ASTs are always smaller than CFGs of basic blocks, our approximation is more

⁶For example, the elimination of jumps by code replication [MW92, MW95] can change natural loops into unstructured loops. These transformations are applied to programs in the low-level phase.

efficient. While the approximation in semi-pruned SSA calculates live variables precisely in each basic block, our approximation calculates them precisely in the fragment reduced to each extended value graph. Since this fragment covers a larger portion than a basic block, our approximation is more precise.

We obtain a set Σϕ of approximate live variables by using the method above. Then, we place ϕ-functions for all variables in Σϕ at every label statement. If we calculate live variables through DFA, we obtain a distinct set of live variables for each label statement, and thereby place ϕ-functions for the variables in each distinct set.
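The following is a minimal sketch of this approximation (hypothetical types; not the dissertation's code): Σϕ is simply the union of dom(F) over the (extended) value graphs of the reduced AST.

#include <set>
#include <string>
#include <vector>

struct ValueGraph {
    std::vector<std::string> freeVariables;   // dom(F) of this (extended) value graph
};

// Approximate live variables: the union of the free variables of every
// value graph occurring in the reduced AST.
std::set<std::string> approximateSigmaPhi(const std::vector<ValueGraph>& graphs) {
    std::set<std::string> sigmaPhi;
    for (const ValueGraph& g : graphs)
        sigmaPhi.insert(g.freeVariables.begin(), g.freeVariables.end());
    return sigmaPhi;   // a phi-function is then placed for each of these at every label
}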

Construction of Value Graphs

Here we define C_A, where C_A denotes a function that constructs a value graph from a reduced AST. We assume Σϕ to be obtained. Let Φ_l(v) be a ϕ-node placed for a variable v at a label statement l:. The case of G′ is defined as follows:

\begin{align*}
C_A(G, V, \mathcal{V}, F) &= (G', V', F'),\\
\text{where}\quad G' &= G \cup (\mathrm{dom}(E_\Delta) \cup \mathrm{img}(E_\Delta),\, E_\Delta),\\
V' &= \{v \mapsto n \in V \mid v \in \Sigma_\phi\},\\
F' &= F \cup \{v \mapsto \mathit{freshNode}(v) \mid v \in \Sigma_\phi,\ v \notin \mathrm{dom}(F)\},\\
E_\Delta &= \{\Phi_l(v) \mapsto F'(v) \mid v \in \Sigma_\phi,\ l^{\pi} \in \mathrm{dom}(\mathcal{V})\}\\
&\quad\ \cup \{\Phi_l(v) \mapsto n \mid l^{\pi} \mapsto V_i \in \mathcal{V},\ v \mapsto n \in V_i\}.
\end{align*}

Here, freshNode(v) denotes the value of v at the entry of the fragment reduced to a given extended value graph. The case of label statements is defined as follows:

\begin{align*}
C_A(l{:}) &= ((\mathrm{img}(V) \cup \mathrm{img}(F),\, E),\ V,\ F),\\
\text{where}\quad V &= \{v \mapsto \Phi_l(v) \mid v \in \Sigma_\phi\},\\
F &= \{v \mapsto \mathit{freshNode}(v) \mid v \in \Sigma_\phi\},\\
E &= \{\Phi_l(v) \mapsto F(v) \mid v \in \Sigma_\phi\}.
\end{align*}

Here, freshNode(v) denotes the value of v at the entry of a given label statement fragment, where a goto statement is regarded as a control transfer into a label statement without passing through its entry. The cases of if statements and while statements are identical to those of C except for placing normal ϕ-functions instead of high-level ones. This is because an execution path via label statements does not constitute high-level control structures. For example, consider the following if statement containing a label statement: if (i < n) {x ← a l:} else {x ← b}.

At the exit of the if statement above, we cannot distinguish an execution path via goto l from an execution path of the if statement. Because of such cases, we have to place normal ϕ-functions. The case of statement sequencing is identical to that of C.

ϕ-Function Elimination

To improve the precision of equivalence detection based on congruence, we can do no better than to eliminate redundant ϕ-functions. We can use existing methods [AH00, BBH+13] for eliminating redundant ϕ-functions from programs in SSA form. The method by Aycock and Horspool [AH00] iterates elimination of trivial ϕ-functions and global rewriting of variables until it converges. The method by Braun et al. [BBH+13] first finds strongly connected components in the data dependency among ϕ-functions and then eliminates them. We can eliminate redundant ϕ-functions on value graphs by constructing equivalence classes of ϕ-nodes. Then, ϕ-nodes constitute union-find trees as in variable nodes. The find operation corresponds to the elimination of ϕ-functions and rewriting of variables in both methods. The elimination of ϕ-functions on value graphs is more efficient than that on CFGs in SSA form because we can ignore high-level ϕ-functions in elimination. Note that both methods eliminate all redundant ϕ-functions if a given CFG is reducible.

Marrying with CFG-based Approach

The standard algorithm [CFR+91] for placing ϕ-functions is based on dominance frontiers. Dominance frontiers formalize necessary ϕ-functions and are the most standard criterion of the minimality of ϕ-functions. Here we calculate dominance frontiers from reduced ASTs, place ϕ-functions, and construct value graphs from reduced ASTs.

Construction of Reduced CFGs

A dominator tree is a tree representation of the dominance relation on CFGs. Dominator trees are essential for calculating dominance frontiers. Since a dominator tree is defined on a CFG, we require CFG-like structures as input. We therefore construct a reduced CFG, which represents the control flow of a reduced AST in the form of CFGs. To begin with, we have to consider what point in a reduced AST becomes a node. We assign a number to a point in a reduced AST and regard the number as a node of its resultant reduced CFG. We assign numbers to join points such as the exit of an if statement, the entry of a while statement, and a label statement. Since an extended value graph represents a fork of control flow, we assign a number to the entry and to each exit of an extended value graph. We consider that each exit of an extended value graph contains the corresponding variable binding and the entry has the occurrences of free variables. Although a value graph with the empty variable binding is unnecessary for calculating dominance frontiers, dominator trees that contain value graphs as nodes are useful for constructing value graphs. We therefore assign numbers to the condition parts of an if statement and a while statement. Next, we construct the edges of a reduced CFG. We can calculate edges, i.e., pairs of node numbers, simply in a two-pass accumulative syntax-directed manner. The first pass is a bottom-up accumulation that calculates the exit node number of each subtree of a reduced AST. The second pass is a top-down accumulation that propagates predecessor numbers. The pairs of an assigned number and an accumulated predecessor number are the edges of a reduced CFG. Letting n be the size of a reduced AST and n_J be the number of goto statements, the size (i.e., the number of nodes) of its reduced CFG is O(n + n_J). From the assumption of sparse malignancy, this size is small. The costs of both calculating a dominator tree and calculating dominance frontiers depend only on the size of a given CFG. Both calculations on a reduced CFG are much cheaper than in the standard approach based on CFGs. Both costs are negligible in the whole cost of constructing a value graph.

Construction of Value Graphs

If a reduced CFG, its dominator tree, and its dominance frontiers are obtained, we can place ϕ-functions in a reduced CFG by using existing CFG-based algorithms. Then, we fill the arguments of a ϕ-function for variable v with free occurrences of v (i.e., variable nodes tagged by v). Lastly, we connect separate (extended) value graphs by determining the values of free variables. This is a straightforward computation on the dominator tree. We start from a node of the reduced CFG that contains a free occurrence of v, and climb up the dominator tree until v ↦ n ∈ V is found in a reached node. We define the value of v as n by unifying both nodes in the value graphs. After determining the values of free variables, we obtain a resultant value graph in a form scattered over the reduced CFG. However, this scattered form does not matter because we have already constructed T. We have successfully obtained a value graph with no redundant ϕ-function.
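A minimal C++ sketch of this dominator-tree walk, under assumed node representations (the names CfgNode, idom, and binding are hypothetical, not the dissertation's API):

#include <string>
#include <unordered_map>

struct VgNode;                                          // a value-graph node

struct CfgNode {
    CfgNode* idom = nullptr;                            // parent in the dominator tree
    std::unordered_map<std::string, VgNode*> binding;   // variable binding V at this node
};

// Resolve a free occurrence of variable v found at reduced-CFG node u: climb
// the dominator tree until an ancestor's binding defines v.  The caller then
// unifies the free variable node with the node returned here.
VgNode* valueOfFreeVariable(const CfgNode* u, const std::string& v) {
    for (const CfgNode* a = u; a != nullptr; a = a->idom) {
        auto it = a->binding.find(v);
        if (it != a->binding.end()) return it->second;  // nearest dominating definition
    }
    return nullptr;                                     // v is live-in to the whole fragment
}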

8.5 Related Work and Discussion

The original value numbering presented by Cocke and Schwartz [CS70] was a local optimization for basic blocks. This method performs redundancy elimination on the fly while constructing a value graph. A value graph limited to a basic block is a directed acyclic graph (DAG). Reif et al. [RL77, RL86, Rei78, RT82] extended value numbering to a global optimization. They used a global value graph, which is the expression DAG derived from each basic block connected with use-def chains in a given CFG. They used a birthpoint, which is a notion equivalent to ϕ-functions but does not form a function or an assignment. They also called a global representation of expressions a cover. Reif and Lewis [RL77, RL86] developed a method for calculating the least fixed point of covers, which is the result of copy propagation and constant

propagation/folding. Reif and Tarjan [Rei78, RT82] developed a more efficient method for calculating a simple cover, which is not optimal in the sense of the prior work [RL77, RL86] but reasonably small. The birthpoint for calculating simple covers is identical to ϕ-functions based on dominance frontiers [CFR+91]. After the formalization of dominance frontiers had become the de facto standard, the calculation of simple covers has been considered as one of the algorithms for placing minimal ϕ-functions [BP03]. The method by Reif and Tarjan calculates a simple cover equivalent to SSA and applies copy propagation and constant propagation/folding to it. Lastly, by using a method by Downey et al. [DST80] for calculating congruence closures of directed graphs, it constructs the minimal simple cover, where common subexpressions are unified. Meanwhile, our method constructs value graphs through on-the-fly copy propagation. In Downey et al.'s algorithms, union-find trees were used to unify congruent subgraphs. Our choice of using union-find trees for value graphs is therefore reasonable. Alpern et al. [AWZ88] revisited Reif et al.'s work from the perspective of SSA. They adopted ϕ-functions for embedding control flow into value graphs. Their paper hardly described how to construct value graphs except for applying copy propagation. This is because SSA construction is equivalent to embedding value graphs into program texts. In fact, variable names are used for value numbers, i.e., the addresses of the nodes in value graphs, in value numbering on SSA form [BCS97]. The technical contribution by Alpern et al. was to show the benefits of embedding high-level control structures in the form of operators into value graphs. Specifically, they introduced three operators: ϕif for an if statement and a pair of ϕenter and ϕexit for the entry/exit of a repeat-until statement. In later studies on SSA, these were called gating functions [BMO90].

ϕwhile used in this chapter was not an operator introduced by Alpern et al. ϕenter holds the depth of loop nesting to make congruence detection safe. However, as mentioned in Section 8.2, to check the containment of loops is essential. In fact, if we compare loops by their depth, only loops of the same level can become congruent. The comparison of the depth of nesting is merely an approximation. This approximation is not required even from the perspective of efficiency because we can implement constant-time checking of loop containment by encoding the nesting of loops into integer intervals. Although ϕend is also a little different from ϕexit, their roles are the same. Use of ϕwhile and ϕend is not an essential difference between Alpern et al.'s work and our work. Tate et al. [TSTL11] presented an optimization method based on equivalent transformations of program expression graphs (PEGs). Their PEGs were the same as the value graphs presented by Alpern et al. except that ϕexit was replaced with a pair of eval/pass operators. They implemented optimizations based on algebraic transformations such as operator strength reduction by constructing augmented value graphs that contain the history of transformations. To prove the correctness of translation, they showed the pseudocode of a syntax-directed translation from the while language to a PEG. This is almost equivalent to our algorithm presented in Section 8.3. In fact, they implemented translation from a reducible CFG to a PEG [TSTL10]. It had been folklore that the SSA construction of structured programs that have only SESE control flow, such as the while language, is easy. The work by Brandis and Mössenböck [BM94] was the first to clarify this folklore. They presented a one-pass syntax-directed algorithm for the SSA construction of structured programs. The idea to place ϕ-functions described in Section 8.3 is identical to their work. Braun et al. [BBH+13] developed a method for SSA construction from ASTs to CFGs in SSA form. Their method is seemingly similar to our work, but is actually different from ours. First of all, their method is not in a syntax-directed manner but performs query propagation over intermediate incomplete CFGs. Their method is a CFG-based algorithm for placing ϕ-functions in a demand-driven manner. It implicitly prunes ϕ-functions on the basis of live variables because a query of placing ϕ-functions propagates backward from uses of variables. Their method is roughly a demand-driven version of the brute-force approach and therefore can place redundant ϕ-functions. They also developed a method for eliminating these redundant ϕ-functions. As mentioned in Section 8.4.3, their method for eliminating ϕ-functions is useful even for value graphs. Our algorithms described in Sections 8.3 and 8.4.2 follow the manner of Rosen's high-level data-flow analysis [Ros77, Ros80]. Use of reduced ASTs is also a straightforward extension to Rosen's approach. Actually, extended methods [MFS79, SM14b] for taming goto/label statements utilize a notion identical to reduced ASTs explicitly or implicitly. An extension by Mintz et al. [MFS79] is to apply a CFG-based algorithm on the fly to a smallest intermediate reduced AST. Their extension is directly applicable to our method described in Section 8.4.3. It will bring smaller reduced CFGs and thereby make it inexpensive to place ϕ-functions on reduced CFGs.
Our extension [SM14b] is inadequate for constructing value graphs because it exploits the algebraic properties of DFA as described in Section 8.4.1. Our extension utilizes reduced ASTs implicitly because it reduces the goto-free part on the fly by using Rosen’s method.

8.6 Conclusion

In this chapter, we have presented a syntax-directed method for constructing a value graph from a given AST. Our method performs in a simple yet precise single-pass manner on the while language with break/continue (or equivalent goto/label) statements. Our method utilizes reduced ASTs for tolerating malignant goto/label statements with respect to precision or cost. The technical novelty of our method is very limited because it is based only upon existing notions and techniques. Our work, however, gives a concrete guideline for implementation. Our approach will be a basis for the implementation of high-level compiler optimizations. We have developed a prototype implementation of C and C_A on top of COINS⁷. We also have confirmed that these are actually feasible at a realistic cost. We plan to implement value numbering by using our method and evaluate it practically. Incorporating our approach into high-level compiler optimizations is left for future work.

⁷http://coins-compiler.sourceforge.jp/

Chapter 9

Lessons from Syntax-Directed Program Analysis

We have developed syntax-directed methods for program analysis in Chapters 7 and 8. In this chapter, we examine this practice from the perspective of syntax-directed programming. Data-flow analysis (DFA) in Chapter 7 and value-graph construction (VGC) in Chapter 8 are similar. Both formalize program fragments as functions and calculate results by composing them. This function composition is an interpretation of statement sequencing. The production rule of statement sequencing is identical to that of Join in JList α and both interpretations are similar. The interpretation of Join is the concatenation of two lists and that of statement sequencing is the sequential conjunction of two computations. It is therefore natural to interpret statement sequencing as function composition in syntax-directed program analysis. The formalization of programs with statement sequencing brings associativity useful for load balancing. This justifies our observation that formalization with trees should come first. Our syntax-directed method for DFA can be parallelized on the basis of the independence of siblings and the associativity of function composition. We do not use segmented trees for parallelizing it because the balancing of statement sequencing suffices for the balancing of a given whole AST. In this case, an extremely deep nesting of if/while statements causes poor load balancing. Because such a nesting in computer programs is highly improbable, segmented trees are not worth using. This justifies our observation that it is important to consider the underlying data of trees. We thus have justified our observations mentioned in Chapter 5 in this practice. Furthermore, the difference between DFA and VGC mentioned in Chapter 8 has brought an unsurprising yet important observation: the true concerns in problems would not be the structures of given trees. Recall the difference between the methods for taming goto/label statements. Its root cause is the difference in the concerns with a given program. DFA inherently requires the regular paths of a given program and does not use the control flow per se. Meanwhile, VGC inherently requires the control flow of a given program and its dominance relation for ϕ-placement. We cannot determine the dominance relation of a program fragment in the presence of unknown control flow, while we can determine the regular paths around unknown control flow. In the presence of goto/label statements, we therefore can construct more compact intermediate results of DFA than those of VGC. This difference between DFA and VGC is irrelevant to the given trees and relevant to their problem specifications. Our syntax-directed methods merely utilize the structure of a given tree for discovering easy parts of the whole computation. Tree computations per se are not essential. Use of trees is merely an issue of algorithmic engineering. This observation is not limited to our methods. For example, the purpose of using search trees in various applications is acceleration, which is an issue of algorithmic engineering. From this practice, we conjecture that most tree-based algorithms use trees for algorithmic engineering and not for problem specification.


Part III

Programming With Neighborhood


Chapter 10

Neighborhood Computations

The observations described in Chapter 5 suggest that it is unreasonable to consider from patterns of tree computations first. We should consider application domains first. In Part II, we have selected program analysis for an application domain because its computational structure is similar to list/tree skeletons. In this part, we attach importance to practical applicability and select spatial computation based on neighborhood, i.e., neighborhood computation, for an application domain. A typical neighborhood computation is stencil computation [KBB+07, RMCKB97, BHMS91], which is extensively used in scientific computing. In the standard stencil computation, the space of a domain is divided into a uniform grid, and each cell of the grid is updated by using its neighbor cells. By mapping the grid into multidimensional arrays, the stencil computation is implemented as a regular array-based computation. This stencil computation is embarrassingly parallel because the updating of each cell is independent. A naive parallelization is, however, insufficient in practice because this stencil computation often repeats over time steps in scientific computing. If we perform the stencil computation of each time step sequentially, it incurs a lot of cache misses because of poor locality, and results in poor performance. The existing work [KBB+07, DMV+08, TCK+11] on stencil computation therefore combined parallelization and locality enhancement. Another typical neighborhood computation is the querying of space-partitioning trees, where a query prunes far subspaces and finds neighbors. It includes range queries to databases and nearest-neighbor queries to statistical datasets. Practical algorithms with reasonable approximation such as the Barnes-Hut algorithm for N-body problems and photon mapping for global illumination utilize this querying. Because a sufficient number of independent queries are given in practice, this computation is also embarrassingly parallel. However, a naive parallel querying also has poor locality and incurs a lot of cache misses in iterative traversal of space-partitioning trees. There was also work [JK11, JK12, JGK13] to combine parallelization and locality enhancement. Programming for neighborhood computations thus raises the issue of locality enhancement, which is different from load balancing. Nevertheless, programming in a divide-and-conquer manner is highly appropriate because the divide-and-conquer paradigm is known to be highly advantageous both to load balancing and locality enhancement [FLPR99, FS06, BGS10]. We therefore deal with cache-efficient divide-and-conquer approaches to neighborhood computations in this part. We first deal with stencil computation (Chapter 11). Although it is not an irregular algorithm based on trees, we investigate locality enhancement that promotes a simple divide-and-conquer computation. We secondly deal with iterative traversal of space-partitioning trees (Chapter 12). We investigate locality enhancement for both general space-partitioning trees and a skeleton that abstracts an iterative querying.


Chapter 11

Time Contraction for Optimizing Stencil Computation

This chapter is self-contained and is a revised and extended version of our unpublished paper [SI11b].

11.1 Introduction

Stencil computation appears extensively in practical applications. It appears particularly in solvers for partial differential equations (PDEs) discretized through a finite difference method (FDM). In such cases, stencil computation means to update each spatial grid point by using its temporal and spatial neighborhood. For example, the following is a typical one-dimensional three-point stencil: $x_i^{(t+1)} = c_1 x_{i-1}^{(t)} + c_2 x_i^{(t)} + c_3 x_{i+1}^{(t)}$, where $x_i$ denotes the i-th cell of discretized space, $x_i^{(t)}$ denotes the value of $x_i$ at the t-th step of discretized time, and every $c_j$ is a constant that is defined by physical conditions. Owing to the practical importance of stencil computation, its optimization is a long-term research topic. Locality optimization [Won00, MW99, KBB+07, OG09, FS05, FS07, FS06], vectorization [HSP+11], and auto-tuning [DMV+08] for it have been investigated. Even domain-specific compilers for it have been developed [BHMS91, RMCKB97, TCK+11]. Because memory bandwidth very often becomes a major bottleneck in stencil computation, locality optimization is the most important. Time skewing [Won00, MW99], which is a specialized combination of tiling [WL91] and skewing, is the standard approach to optimizing the locality of stencil loops. Intuitively, time skewing is a reordering of computations such that a small part of the spatial grid progresses by several time-steps. In this paper, we present a novel approach to optimizing stencil loops. First of all, notice that a stencil loop can often be represented by a recurrence equation with a linear transformation: $x^{(t+1)} = A x^{(t)}$, where $x^{(t)}$ denotes a vector of spatial grid points at the t-th time-step, and A is a sparse matrix whose non-zero elements occur regularly. For example, the following represents the above three-point stencil:
\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-1} \\ x_N \end{pmatrix}^{(t+1)}
=
\begin{pmatrix}
c_2 & c_3 & & & \\
c_1 & c_2 & c_3 & & \\
 & \ddots & \ddots & \ddots & \\
 & & c_1 & c_2 & c_3 \\
 & & & c_1 & c_2
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-1} \\ x_N \end{pmatrix}^{(t)},
\]
where the boundary condition is assumed to be zero-constant; i.e., $x_0 = x_{N+1} = 0$. Then, from this recurrence equation, we obtain $x^{(t+k)} = A^k x^{(t)}$. Our key observation is that by computing $A^k$, we can obtain a wider stencil that computes k time-steps at once. That is, k time-step iterations are contracted into one. However, if we compute $A^k$ and store this result naively, this contracted loop is too costly. To solve it, on the basis of the regularity of A, we have developed a cheap way to compute $A^k$ and a compact representation of this result. As a result, this contracted loop becomes efficient, and then it


causes fewer cache misses than the original loop since fewer time-step iterations are performed. This contraction of time-steps thus functions to eliminate redundant memory write and arithmetic, as well as to improve locality of stencil loops. This paper makes the following contributions:

• We describe loop contraction, a technique to optimize a loop that recurrently applies a linear transformation, by contracting multiple iterations into one (Section 11.2). It is easy and mathematically trivial; similar formulations are found in [IKF05, LAAK06, LTA03]. However, we have not found exactly the same one in the context of loop transformations. We therefore believe that, even though not novel, it is a distinct perspective in loop optimizations.

• We have developed time contraction, a novel approach to optimizing stencil loops. It is a specialization of loop contraction. We describe its concept, techniques to implement it, and its effects on complexity, as well as its pros and cons (Section 11.3). Time contraction is quite different from time skewing [Won00, MW99] and is effective for reducing the cache complexity of stencil loops. This is the most significant contribution of our work.

• We have experimentally demonstrated the effectiveness of time contraction by using several implementations of a one-dimensional three-point stencil derived from the heat equation (Section 11.5). In all cases, time contraction outperformed time skewing (i.e., the standard approach) in performance improvement: in the best settings, the time-contracted version achieved 72.4 % better performance than the time-skewed one.

• We describe tuning techniques for stencil loops on the basis of time contraction (Section 11.4). Time contraction facilitates tuning of stencil loops. We have been implementing an experimental FDM library that generates tuned stencil loops by means of these techniques. In a preliminary experiment, a generated loop achieved up to 60 % better performance than a manually vectorized and time-contracted one (Section 11.5).

11.2 Loop Contraction: Reduction of Iterations

11.2.1 Explanation by Example

Consider the following loop that computes a Fibonacci number:

x_1 ← 1
x_2 ← 1
for i = 1 to n {
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \leftarrow \begin{pmatrix} x_2 \\ x_1 + x_2 \end{pmatrix}
\]
}.

After this loop, x_1 and x_2 denote the (n + 1)-th and (n + 2)-th Fibonacci numbers, respectively. This loop can be transformed into

for i = 1 to n {
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \leftarrow \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\]
}.

By using loop unwinding, we obtain

for i = 1 to n/k {
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \leftarrow \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{k} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\]
}.

For simplicity, n is assumed to be a multiple of k. Since this matrix exponentiation is a loop-invariant expression, we can hoist it. As a result, this loop body becomes a matrix-vector multiplication. It means that iterations of the loop are contracted into one. We therefore call it loop contraction. The effect of loop contraction differs from that of simply applying conventional optimizations to an unrolled loop. The difference is in the cost of a loop body. Consider the above example again. Since conventional optimizations (e.g., copy propagation and common subexpression elimination) do not capture the matrix exponentiation, its k-times-unwound conventionally optimized body necessitates k + 1 additions. In contrast, its k-times-contracted body necessitates only 4 multiplications and 2 additions of the 2-by-2 matrix-vector multiplication. The cost of a contracted body is independent of k; it is the core advantage of loop contraction. If k and every element of a coefficient matrix are compile-time constants like computing a Fibonacci number above, the loop-invariant matrix exponentiation can be completely computed at compile time. In such an ideal case, loop contraction reduces the time complexity by a factor of 1/k.
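For concreteness, the following C++ sketch (mine, not the dissertation's) shows the k-times-contracted Fibonacci loop with the loop-invariant matrix power hoisted out of the loop; as in the text, n is assumed to be a multiple of k.

#include <array>
#include <cstdint>

using Vec = std::array<std::uint64_t, 2>;
using Mat = std::array<Vec, 2>;

Mat mul(const Mat& a, const Mat& b) {
    Mat c{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int l = 0; l < 2; ++l)
                c[i][j] += a[i][l] * b[l][j];
    return c;
}

Vec apply(const Mat& a, const Vec& x) {
    return Vec{a[0][0] * x[0] + a[0][1] * x[1],
               a[1][0] * x[0] + a[1][1] * x[1]};
}

Vec fibContracted(int n, int k) {
    Mat step{};                       // the coefficient matrix ((0 1) (1 1))
    step[0][1] = 1; step[1][0] = 1; step[1][1] = 1;
    Mat hoisted{};                    // identity, then raised to the k-th power
    hoisted[0][0] = 1; hoisted[1][1] = 1;
    for (int i = 0; i < k; ++i) hoisted = mul(hoisted, step);   // loop-invariant A^k
    Vec x{1, 1};
    for (int i = 0; i < n / k; ++i) x = apply(hoisted, x);      // contracted loop
    return x;                         // x[0] = (n+1)-th, x[1] = (n+2)-th Fibonacci number
}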

11.2.2 Target of Application

Stencil computation is another example to which loop contraction can be applied; it is the main target in this paper. Recall the matrix representation of the three-point stencil described in Section 11.1:

\[
x^{(t+1)} = A x^{(t)}, \quad\text{where}\quad A =
\begin{pmatrix}
c_2 & c_3 & & & \\
c_1 & c_2 & c_3 & & \\
 & \ddots & \ddots & \ddots & \\
 & & c_1 & c_2 & c_3 \\
 & & & c_1 & c_2
\end{pmatrix}.
\]

This is similar to the above loop that computes a Fibonacci number. Throughout this paper, we call a coefficient matrix that appears in stencil computation (like the above) a stencil matrix and call a vector that is iteratively updated by multiplying a stencil matrix a grid vector. Unless otherwise noted, A and x are a stencil matrix and a grid vector, respectively. We call a contiguous non-zero part of a row of a stencil matrix a stencil coefficient; e.g., for the above stencil, [c1, c2, c3], [c1, c2], and [c2, c3] are stencil coefficients. In a stencil matrix, at least one stencil coefficient appears regularly and recurrently (e.g., [c1, c2, c3] in the above). We call such stencil coefficients regular ones, and call the others irregular ones; irregular ones represent boundary conditions for solving a governing PDE, e.g., the heat equation. Note that whereas the only regular coefficient exists near the diagonal elements of a stencil matrix in the unidimensional case, several regular coefficients are scattered in a row of a stencil matrix in the multidimensional case. Loop contraction to a stencil loop widens its stencil; i.e., the number of non-zero elements of a stencil matrix gradually increases with loop contraction. For example, after one contraction, the three-point stencil above is transformed into the following five-point stencil:

\[
x^{(t+2)} = A' x^{(t)}, \quad\text{where}\quad A' =
\begin{pmatrix}
c'_{13} & c'_{14} & c'_{15} & & & & \\
c'_{22} & c'_{23} & c'_{24} & c'_{25} & & & \\
c'_1 & c'_2 & c'_3 & c'_4 & c'_5 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & c'_1 & c'_2 & c'_3 & c'_4 & c'_5 \\
 & & & c'_{31} & c'_{32} & c'_{33} & c'_{34} \\
 & & & & c'_{41} & c'_{42} & c'_{43}
\end{pmatrix}.
\]
This is physically natural. The (k − 1)-time contraction means the derivation of a recurrence equation that at once computes k time-steps. To capture the propagation of changes in k time-steps, we require a spatially wider neighborhood for each grid point. In other words, loop contraction to a stencil loop is, by contracting time-steps, to transform it into an equivalent wider-time-step one.

11.2.3 Problem Statement

Unfortunately, the naive application of loop contraction to a stencil loop is useless. Recall the computation of a Fibonacci number. Let k be the number of contracted iterations. The significance of loop contraction is that the cost of a contracted loop body is independent of k. This property is limited to the case of a dense coefficient matrix. Every stencil matrix, however, is sparse. As described above, non-zero elements in a stencil matrix increase with k. The cost of a contracted body hence also increases with k. Even though some redundancy is eliminated through contraction, in this case, loop contraction is insignificant. The above is a problem in the aspect of time complexity. Another grave problem is in the aspect of space complexity. In the above loop computing a Fibonacci number, the size of its updated vector is 2, a quite small number. The space of a 2-by-2 matrix does not become overhead at all. In a stencil loop, in contrast, the size of a grid vector, N, is not a small number. The space of an N-by-N matrix, i.e., O(N²) space, becomes infeasible overhead. Increasing k until a stencil matrix becomes a dense matrix in order to obtain a significant effect of loop contraction is clearly futile. Even if k is small, a contracted loop necessitates more space than the original one. Consider the one-dimensional three-point stencil above. We can roughly estimate the space of the non-zero elements of a contracted stencil matrix as O(kN). The cost of contraction to obtain it is enormous. In contrast, the original stencil matrix necessitates only the space of c1, c2, and c3. Loop contraction therefore worsens the space complexity of stencil loops.

11.3 Time Contraction: Optimization for Stencil Loops

11.3.1 Our Key Observations and Solution

We resolve the problem stated in Section 11.2.3 and present a novel approach to optimizing stencil loops on the basis of loop contraction. The following are key observations:

• Even after modest loop contraction, a contracted stencil matrix is still a stencil one; that is, in the middle of a contracted stencil matrix, a regular pattern occurs.

• We can compute part of a grid vector by using the corresponding part of a contracted matrix.

On the basis of these observations, we have developed a specialization of loop contraction. First, we contract only the regular coefficient(s) of a stencil loop and obtain a coefficient table for its widened stencil. Then, we divide a grid vector into two parts: contracted and non-contracted. The contracted part is not influenced by boundary conditions during a contracted time-step; the non-contracted part, in contrast, is influenced. We compute the contracted part with a widened stencil. In contrast, we compute the non-contracted part in a standard manner. We repeat it at each contracted time-step. Figure 11.1 illustrates computations for the contracted and non-contracted parts in the case of a one-dimensional stencil. We call this technique time contraction.

(a) Computation for the contracted part. The bold black line denotes the result of the computation. The gray-meshed part would be computed in the standard approach but is skipped owing to time contraction.

(b) Computation for the non-contracted part of time contraction, where k time-step iterations are contracted into one. The bold black lines denote the result of the computation. The overall two-color gray part is computed in a standard way. The dark gray part is the intersection with the gray-meshed (i.e., skipped) part in Figure (a). If the skipped part were computed, the dark gray part would be unnecessary.

Figure 11.1: Illustrations of time contraction for one-dimensional stencil, where k time-step iterations are contracted into one.

Ping-Pong Buffering

As the standard implementation of a stencil loop, two buffers for grid vectors are interleaved, as shown in Figure 11.2a. In time-contracted loops, however, interference between computations for the contracted and non-contracted parts occurs if we interleave the two buffers naively. We can avoid it easily by using an extra buffer for the neighborhood of boundaries. After computing the contracted part, we have only to interleave the extra buffer and the input one in computing the non-contracted part and finally store the result into the output buffer, as shown in Figure 11.2b. Then, since the non-contracted part is much smaller than the contracted part, we can use the finished part of the input buffer as the extra one. We do not have to allocate any additional space for ping-pong buffering. Note that if we prefer computing the non-contracted part in advance, we have only to interleave the extra buffer and the output one; then the unmodified part of the output one is available for the extra one.

11.3.2 Precomputing Stencil Coefficients

To compute a widened stencil in a time-contracted loop, we have to prepare a coefficient table containing contracted regular coefficients. We describe our method to precompute coefficients through an example where the three-point stencil matrix described in Section 11.1 is contracted twice, i.e., the case of A³. First, we estimate the size of the time-contracted (i.e., widened) stencil. Since the original stencil is a three-point one, the size of a two-times-contracted stencil is seven. Hence, we can define A³ as follows:
\[
A^3 =
\begin{pmatrix}
* & * & * & * & & & & & \\
* & * & * & * & * & & & & \\
* & * & * & * & * & * & & & \\
r_1 & r_2 & r_3 & r_4 & r_5 & r_6 & r_7 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & r_1 & r_2 & r_3 & r_4 & r_5 & r_6 & r_7 \\
 & & & * & * & * & * & * & * \\
 & & & & * & * & * & * & * \\
 & & & & & * & * & * & *
\end{pmatrix},
\]
where ∗ denotes a part of the irregular coefficients. From the regularity of the occurrences of [r1, . . . , r7], we

for (int t = 0; t < T; ++t) {
  y[0] = x[0] + x[1];
  for (int i = 1; i < N-1; ++i) {
    y[i] = x[i-1] + x[i] + x[i+1];
  }
  y[N-1] = x[N-2] + x[N-1];
  std::swap(x, y);
}

(a) Naive version

for (int t = 0; t < T; t += k) { /* contracted part with c, then non-contracted part with z */ }

(b) Time-contracted version, where k time-steps are contracted into one, c denotes coefficient table for its widened stencil, and z denotes extra buffer. For simplicity, k % 2 == 0 is assumed.

Figure 11.2: Implementations of x_i^{(t+1)} = x_{i-1}^{(t)} + x_i^{(t)} + x_{i+1}^{(t)} with ping-pong buffering, described in C++.

can see that the size of A can be shrunk to a 7-by-7 matrix. Let A_s be the shrunken A. We obtain
\[
A_s =
\begin{pmatrix}
c_2 & c_3 &     &     &     &     &     \\
c_1 & c_2 & c_3 &     &     &     &     \\
    & c_1 & c_2 & c_3 &     &     &     \\
    &     & c_1 & c_2 & c_3 &     &     \\
    &     &     & c_1 & c_2 & c_3 &     \\
    &     &     &     & c_1 & c_2 & c_3 \\
    &     &     &     &     & c_1 & c_2
\end{pmatrix},
\qquad
A_s^3 =
\begin{pmatrix}
* & * & * & * & * &   &   \\
* & * & * & * & * & * &   \\
* & * & * & * & * & * & * \\
r_1 & r_2 & r_3 & r_4 & r_5 & r_6 & r_7 \\
* & * & * & * & * & * & * \\
  & * & * & * & * & * & * \\
  &   & * & * & * & * & *
\end{pmatrix}.
\]

We therefore compute [r_1, ..., r_7] as follows:
\[
\begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \\ r_7 \end{pmatrix}^{\!T}
=
\begin{pmatrix} 0 \\ 0 \\ c_1 \\ c_2 \\ c_3 \\ 0 \\ 0 \end{pmatrix}^{\!T} A_s A_s
\iff
\begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \\ r_7 \end{pmatrix}
=
A_s^T A_s^T \begin{pmatrix} 0 \\ 0 \\ c_1 \\ c_2 \\ c_3 \\ 0 \\ 0 \end{pmatrix}.
\]

Since A_s^T is a stencil matrix, this is also a stencil computation. This algorithm is also applicable to multidimensional stencils. However, since only the occurrence pattern of the coefficients within a row of a stencil matrix differs, we have to reconsider how to shrink stencil matrices. Consider afresh why we can shrink a stencil matrix, from a multidimensional viewpoint. The reason is that, to compute the contracted coefficients, we require only a spatial domain into which one widened stencil fits. In other words, we shrink the spatial domain of a stencil loop to this extent, and thereby its stencil matrix is also shrunk. Concretely, in the multidimensional case, we consider the bounding box of the widened stencil as the shrunk spatial domain and then obtain the corresponding shrunk stencil matrix. Note that the padding in the bounding box is convenient for computing coefficients but, of course, unnecessary for computing a stencil. Therefore, a coefficient table that holds the contracted coefficients requires the same space as the widened stencil. Let S and B be the size of the original stencil and that of the bounding box of the widened stencil, respectively. The time and space complexities of precomputing the k-times-contracted regular coefficients are O(kBS) and O(B), respectively. Since S and B are small in practice, this precomputation is very cheap. Therefore, even if we implement it not as a compile-time computation but as runtime initialization, its overhead is negligible.
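For the one-dimensional case, the precomputation above boils down to repeatedly convolving the base regular coefficients with themselves. The following sketch, with assumed names and a centered coefficient vector as input, is our own illustration of that O(kBS)-time, O(B)-space procedure rather than the library's code.

#include <cstddef>
#include <vector>

// Contracted regular coefficients of a one-dimensional stencil: apply the base
// stencil (S centered coefficients) to the coefficient vector k-1 more times.
// The result has k(S-1)+1 entries, the bounding box of the widened stencil.
std::vector<double> contract_coeffs(const std::vector<double>& base, int k) {
  const int S = static_cast<int>(base.size());
  const int r = (S - 1) / 2;                 // radius of the base stencil
  std::vector<double> cur = base;            // coefficients after one step
  for (int step = 2; step <= k; ++step) {
    std::vector<double> next(cur.size() + 2 * r, 0.0);
    for (std::size_t i = 0; i < cur.size(); ++i)
      for (int j = 0; j < S; ++j)
        next[i + j] += cur[i] * base[j];     // discrete convolution
    cur.swap(next);
  }
  return cur;                                // coefficient table of the widened stencil
}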

11.3.3 Advantages of Time Contraction

Time contraction is much more space-efficient than naive loop contraction, but what does it improve? To answer this question, we describe two observations:

• Loop contraction reduces memory write linearly.

• Even if a stencil is widened, we can moderate the degradation of the locality of its computation well.

The first is obviously derived from the property of loop contraction. It is a kind of bandwidth optimization because the traffic over the memory bus decreases. However, because memory reads are dominant in a stencil loop, this effect per se might not improve the bandwidth efficiency much. The second is much more important. It is derived from a property of stencil loops: as long as sufficient data for computing a grid point is in cache, the data for computing its contiguous grid points is also in cache. Owing to this property, widening a stencil does not much increase the cache misses in computing the whole grid vector in a time-step (see also Section 11.3.4). Meanwhile, for the contracted part, time contraction reduces the number of time-steps linearly with little additional space. Computation for the non-contracted part seldom causes cache misses since it accesses a small segment of a grid vector iteratively. Time contraction thus reduces the cache misses of a stencil loop. Intuitively, time contraction transforms poor-locality computations over the time dimension into rich-locality ones over the space dimension(s). This means that time contraction is a kind of locality optimization. Because stencil loops are memory-intensive, this effect will be very significant. It is worth noting that in a contracted time-step, each element of the output buffer is written only once. Owing to this, we can avoid caching written data by utilizing streaming writes. Not always but usually, time contraction also reduces the constant coefficient of the time complexity, i.e., the arithmetic, of a stencil loop. For example, consider applying time contraction to the three-point stencil described in Section 11.1 where 32 iterations are contracted into one. Let N be the number of grid points and T be

that of time-steps. The original stencil requires 3 multiplications and 2 additions per grid point; the total flop count is 5NT. The time-contracted stencil is a 65-point one; it requires 65 multiplications and 64 additions per grid point. If we ignore the non-contracted part for simplicity, the 65-point stencil is applied to NT/32 grid points; the total flop count is (129/32)NT. In this case, time contraction thus reduces the arithmetic by about 20%. This effect would be insignificant for memory-intensive computation but significant for CPU-intensive computation. Since the improvement of the locality of a stencil loop relaxes its memory-intensiveness and makes it CPU-intensive, this effect becomes significant. From the perspective of tuning, time contraction has two advantages. One is that time-contracted loops are by nature parametrized by k, which is helpful for auto-tuning. The other is that time-contracted loops have good affinities for existing techniques. We describe the details in Section 11.4.
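As a quick check of the arithmetic reduction claimed above:
\[
\frac{(65 + 64)\,NT/32}{(3 + 2)\,NT} = \frac{129/32}{5} \approx 0.806,
\]
i.e., the contracted loop performs roughly 19% fewer floating-point operations than the original loop, consistent with the 20% figure stated here.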

11.3.4 Complexity Analysis

We analyze here the effect of time contraction on the work and cache complexities of a stencil loop in the ideal cache model [FLPR99], where the work complexity is the time complexity in the standard sense and the cache complexity is the number of cache misses. Note that Z and L denote the cache size and the cache line size in the model, respectively. N and T denote the number of grid points and time-steps, respectively. We assume Z ≪ N from the motivation of locality optimization1.

Unidimensional Case We consider here a one-dimensional contiguous stencil whose size is w.

Naive Loop  We consider a naive loop, which simply computes the stencil of every grid point and the whole grid vector in the same natural order. The work of a grid point is obviously O(w); the work of the whole grid vector is O(wN); the total work complexity is therefore O(wNT). For the cache complexity of naive loops, we assume (⌈w/L⌉ + 1)L ≤ Z. One cache miss asymptotically occurs in sequentially computing L grid points. O(N/L) cache misses occur in sequentially computing the whole grid vector in a time-step; the total cache complexity is therefore O(NT/L).

Time-Contracted Loop  We consider a time-contracted loop, where k time-steps are contracted into one and k ≪ T. Let w' denote the size of the stencil widened by time contraction. Since the stencil is one-dimensional, we obtain (w − 1)k + 1 ≤ w' ≤ wk. We consider the contracted part and the non-contracted part of a contracted time-step separately. The work of a widened stencil is cw', where c is its constant coefficient. Since the contracted part is simply a naive wider-stencil loop, from the case of a naive loop, the work of the contracted part is cw'(N − w'). For the non-contracted part, consider the two trapezoids at the ends in Figure 11.1b. The sum of their tops and that of their bottoms are at most w' and 2w', respectively. Since the total number of grid-point updates in the two trapezoids is at most (3/2)kw', the work of the non-contracted part is at most (3/2)cwkw'. The sum of these is cw'{N + (3/2)wk − w'}. Since (w − 1)k + 1 ≤ w', the work in a contracted time-step of the whole vector is O(w'N + w'^2). If w' ≪ N, O(w'N + w'^2) = O(w'N). In this case, the total work complexity is therefore O(w'NT/k) = O(wNT). For the cache complexity, we consider only the case of (⌈w'/L⌉ + 1)L ≤ Z since the cache complexity obviously increases otherwise. The cache misses in a contracted time-step of the contracted part are asymptotically O(N/L). Those of the non-contracted part are asymptotically O(w'/L) if 2w' ≤ Z. In this case, since w' ≪ N, the sum of these is O(N/L). The total cache complexity is therefore O(NT/(Lk)) if 2w' ≤ Z.

Summary  In the unidimensional case, time contraction does not change the work complexity asymptotically but does improve the cache complexity by a factor of k. Through the analysis above, we obtain

1 In the case of Z ≳ N, the cache misses are asymptotically negligible.

two observations: one is that the complexity of the non-contracted part is asymptotically negligible; the other is that time contraction roughly changes complexity by replacing w with kw and T with T/k.

Multidimensional Case  We now sketch the analysis of the d-dimensional case. Contiguous data over the leading dimension has contiguous memory addresses. For convenience, we assume that the spatial domain is a hypercube whose side is N^{1/d}, consider a d-cube stencil whose side is w, and consider only the case of w^{d−1}(⌈w/L⌉ + 1)L ≤ Z. The work complexity does not matter; as before, time contraction does not worsen it asymptotically. Meanwhile, the cache complexity does matter. On the basis of the two observations obtained in the unidimensional case, we analyze only the cache complexity of the contracted part and roughly estimate the effect of time contraction by replacing w with kw and T with T/k in the complexity.

Naive Loop  The computation of L contiguous grid points causes w^{d−1} cache misses. One computation from end to end along the leading dimension causes O(w^{d−1}N^{1/d}/L) cache misses. The computation in a time-step of the whole grid vector causes O(w^{d−1}N/L) cache misses. The total cache complexity is O(w^{d−1}NT/L). Time contraction roughly changes w into kw and T into T/k. The application of time contraction to a naive stencil loop thus changes the cache complexity into O(k^{d−2}w^{d−1}NT/L) if w^{d−1}(kw + L) ≤ Z. Hence, in the multidimensional case (i.e., d ≥ 2), time contraction applied to a naive stencil loop does not improve its cache complexity.

Hypercube Blocking  Consider d-dimensional blocking, where the block is a d-cube whose side is β. The computation of a block requires (β + w)^d space. If (β + w)^d ≤ Z, the computation of a block asymptotically causes O((β + w)^{d−1}β/L) cache misses, in space-filling-curve traversal among blocks. A grid vector consists of Nβ^{−d} blocks. The cache complexity of the computation in a time-step is O(Nβ^{−d}(β + w)^{d−1}β/L) = O((1 + w/β)^{d−1}N/L). The total cache complexity is O((1 + w/β)^{d−1}NT/L). For the blocked stencil loop, time contraction roughly changes the cache complexity into O((1 + kw/β)^{d−1}NT/(Lk)) if (β + kw)^d ≤ Z. Hence, as long as we select k and β such that kw ≪ β ≤ Z^{1/d} − β, time contraction applied to a blocked stencil loop reduces the cache complexity by a factor of 1/k.

Summary  In the multidimensional case, time contraction also improves the cache complexity by a factor of k under appropriate parameter settings. However, the constraints on k and Z become qualitatively more severe as d increases, and unlike the unidimensional case, we need to perform blocking over the space dimensions carefully on the basis of Z and N together. Although spatial blocking takes priority over time contraction, time contraction is orthogonally effective.

11.3.5 Extension and Applicability

We have dealt with stencil loops of the form x^{(t+1)} = Ax^{(t)}. However, this is not very practical. As an example of PDEs, consider the heat equation. If we can use only the form x^{(t+1)} = Ax^{(t)}, we can give only initial and boundary conditions; we cannot give external input, e.g., heat sources. To deal with external input, we need to use the form x^{(t+1)} = Ax^{(t)} + b, where b represents heat sources that emit constant quantities of heat at each time-step. For the form x^{(t+1)} = Ax^{(t)} + b, in fact, time contraction is also applicable. By expanding the induction, we obtain the following equation:

\[
x^{(t+k)} = A^k x^{(t)} + \bar{A} b, \qquad \text{where } \bar{A} = \sum_{i=0}^{k-1} A^i.
\]

Here, Ā is also a stencil matrix. Like A^k, we require only the regular part of Ā. We can compute Ā at the same cost as A^k by accumulating the intermediate results of A^k. After we obtain Ā, we can compute Āb at almost the same cost as that of only one time-step iteration of a time-contracted loop. Even though Āb is usually more expensive to compute than A^k and Ā, compared with the overall cost, this precomputation is still negligible overhead. Once we compute A^k and Āb, the contracted form above becomes the same as the original form; loop contraction thus succeeds.

The fact that time contraction is applicable to the form x^{(t+1)} = Ax^{(t)} + b implies that it is also applicable to solving Ax = b through the Jacobi method. In the relaxation of the Jacobi method, x is iteratively updated: x^{(t+1)} = D^{-1}b − D^{-1}Rx^{(t)}, where D is the diagonal component of A and R is the remainder. If A is a stencil matrix, D^{-1}R is also a stencil matrix. Here, t does not physically mean time, but we can regard the relaxation process as the progression of time. In fact, solving Ax = b corresponds directly to solving Poisson's equation through an FDM.

In addition, solving Ax = b yields another application. The standard form x^{(t+1)} = Ax^{(t)} is actually derived from an explicit FDM, which is one of the simplest FDMs. In contrast, the forms Ax^{(t+1)} = x^{(t)} and Ax^{(t+1)} = A'x^{(t)} are derived from an implicit FDM and the Crank-Nicolson method, respectively, which are more precise and numerically stable than an explicit FDM. These methods require solving Ax = b at every time-step. We thus can apply time contraction to it.

It is worth noting that time contraction is applicable to the finite-difference time-domain (FDTD) method, which is the standard FDM for Maxwell's equations in electromagnetism, and to the lattice Boltzmann method (LBM), which is the standard discretization scheme for computational fluid dynamics and whose implementation is a part of SPEC CPU2006. Both are stencil computations of the form x^{(t+1)} = Ax^{(t)}. For example, by pairing the electric field E and the magnetic field H at the same point, the one-dimensional FDTD is known to be convertible into a three-point stencil [OG09]. The standard two-dimensional and three-dimensional LBMs are implemented as a 9-point stencil and a 19-point one, respectively.

Although we can apply time contraction to the stencil computations derived from the various discretization schemes described above, we cannot apply it to every stencil computation. There are three essential limitations. The first is that it cannot be applied to the Gauss-Seidel method for solving Ax = b. This is because the product of a stencil matrix and a vector cannot be extracted. The second is that time contraction is almost useless for dynamic programming such as seam carving [AS07]. To contract iterations is to avoid computing intermediate results. If we have to store all intermediate results into a memo table, time contraction, like loop contraction, is useless. However, if we can compute a solution from skipped intermediate values efficiently, time contraction can be effective. The third is that time contraction is not effective for problems with boundaries in the middle of the spatial domain, e.g., the case of the heat equation where insulating objects exist in the middle. The efficiency of time-contracted loops and their precomputation owes much to the regular occurrences of stencil coefficients. The existence of boundaries in the middle disrupts this regularity. We can deal with boundaries in the middle by segmenting the non-contracted part appropriately, but the implementation becomes nontrivial.
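As an illustration of accumulating Ā alongside A^k in the one-dimensional case, the following sketch extends the coefficient precomputation shown earlier; the structure and names are ours, under the same assumption of centered regular coefficients, and do not reproduce the library's code.

#include <vector>

// Regular coefficients of A^k and of \bar{A} = I + A + ... + A^{k-1}, both on
// the bounding box of the widened stencil (k(S-1)+1 entries).
struct AffineCoeffs {
  std::vector<double> pow_k;   // middle-row coefficients of A^k
  std::vector<double> bar;     // middle-row coefficients of \bar{A}
};

AffineCoeffs contract_affine(const std::vector<double>& base, int k) {
  const int S = static_cast<int>(base.size());
  const int r = (S - 1) / 2;
  const int B = k * (S - 1) + 1;            // width of the widened stencil
  std::vector<double> cur(B, 0.0), bar(B, 0.0);
  cur[(B - 1) / 2] = 1.0;                   // A^0 = I (a centered unit impulse)
  for (int step = 0; step < k; ++step) {
    for (int i = 0; i < B; ++i) bar[i] += cur[i];        // accumulate A^step
    std::vector<double> next(B, 0.0);
    for (int i = 0; i < B; ++i) {
      if (cur[i] == 0.0) continue;
      for (int j = -r; j <= r; ++j)
        next[i + j] += cur[i] * base[j + r];             // one application of A
    }
    cur.swap(next);
  }
  return {cur, bar};
}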

11.3.6 Drawback: Lowering Precision

Since time contraction performs algebraic transformation on finite-precision floating-point numbers, it changes the floating-point errors. Unfortunately, in practical situations, time contraction has a negative effect on loss of significance. Consider, for example, the one-dimensional heat equation discretized through an explicit FDM. We obtain the following symmetric three-point stencil: x_i^{(t+1)} = r x_{i-1}^{(t)} + (1 − 2r) x_i^{(t)} + r x_{i+1}^{(t)}, where r ≤ 1/2 for numerical stability. Hence, each element of the regular stencil coefficient is always less than 1. Because the heat equation is a diffusion equation, it is natural that the sum of all elements of the regular stencil coefficient is 1. Note that this property of regular coefficients is common to other PDEs. As a result, the variance of the elements of A^k sharply increases with k. Figure 11.3 illustrates the values of the regular stencil coefficient of the three-point stencil above in the case of r = 1/3 and their changes with increasing k. Qualitatively, a loss of significance on the order of the vertical distance between two adjacent points on the same line in Figure 11.3 is caused. The quantitative loss of significance depends on the values of the grid points. Even if we compute a stencil inward from both ends, the order of magnitude of each intermediate result changes little. Therefore, this loss of significance is, unfortunately, almost unavoidable. Time contraction essentially has such a performance-precision trade-off.
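As a small worked instance of this spread (our own arithmetic, using the r = 1/3 setting of Figure 11.3), contracting just two steps already separates the coefficients by a factor of three:
\[
(c_{-2}, c_{-1}, c_0, c_1, c_2)
= \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) * \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)
= \left(\tfrac{1}{9}, \tfrac{2}{9}, \tfrac{3}{9}, \tfrac{2}{9}, \tfrac{1}{9}\right),
\]
and for larger k the outermost coefficients (e.g., c_{±k} = (1/3)^k) decay geometrically while the central ones do not, which is the widening spread visible in Figure 11.3.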

(Plot of the regular coefficients c_j on a logarithmic scale, from 1 down to 1e-08, against j from -20 to 20, for k = 1, 4, 8, 12, 16.)

Figure 11.3: Change of regular coefficients with contraction, where the stencils are $x_i^{(t+k)} = \sum_{j=-k}^{k} c_j x_{i+j}^{(t)}$.

11.4 Tuning Based on Time Contraction

11.4.1 Parametricity on k

The parameter k, the number of iterations contracted into one, is the primary parameter of time contraction. Time contraction improves locality over the time dimension by a factor of k. Coefficient tables, however, grow by a factor of k, so the benefit of time contraction presumes a modest k. Tuning k to an appropriate scale is thus necessary in time contraction. As can be seen from Figure 11.2b, time-contracted loops are by nature parametrized by k. In other words, even though k is a dynamic value, the form of every time-contracted loop does not change at all. Owing to this parametricity, we can tune k dynamically and adaptively. Although the code of time-contracted loops is parametric, the data of their coefficient tables depends on k. When we change k, we also have to update the coefficient table. As described in Section 11.3.2, the construction of coefficient tables can be implemented as a recurrent computation over k. Consequently, when we increment k monotonically, we can construct coefficient tables incrementally. Of course, when the upper bound of k is given, we can prepare all coefficient tables in advance. Time contraction is therefore well-suited to auto-tuning.

11.4.2 Affinity for Unroll-and-Jam

The main computational kernel of a time-contracted loop is, of course, the computation for its contracted part. As can be seen from Figure 11.2b, the structure of this kernel is simple and compact. This leads to an advantage of time contraction: the kernel has a good affinity for unroll-and-jam [AK01]. Unroll-and-jam is an optimization for nested loops; it unrolls an outer loop and then fuses the duplicated successive inner loops. For example, by applying it to the loop for the contracted part in Figure 11.2b, we obtain the loop shown in Figure 11.4. By using unroll-and-jam, we can easily vectorize the innermost loop for the contracted part. Then, the usage of the next grid vector (i.e., y) becomes efficient by employing scalar replacement. Unroll-and-jam promotes the usage of registers but complicates the body of the innermost loop. An excessively complicated body causes register spilling and can degrade performance. Meanwhile, time contraction increases the number of iterations of the innermost loop but does not complicate its body. Time contraction thus does not worsen the disadvantage of unroll-and-jam and promotes its advantage. While time contraction improves cache-level locality, unroll-and-jam improves

for (int i = k; i < N-k; i += 2) {
  y[i+1] = y[i] = 0;
  for (int j = -k; j <= k; ++j) {
    y[i] += c[j]*x[i+j];
    y[i+1] += c[j]*x[(i+1)+j];
  }
}

Figure 11.4: Example of unroll-and-jam; original loop is the loop for the contracted part in Figure 11.2b.

float **x, **y;         // N-by-N spatial grid
int h;                  // maximum depth of recursion
const int L = N - 2*k;  // side length of the contracted part
for (long bv = 0; bv < (1L << (2*h)); ++bv) {   // one block per bv (4^h blocks)
  // si, sj: origin of the block, decoded from bv with bitwise operations
  int si = ..., sj = ...;
  for (int i = si; i < si+(L>>h); ++i)
    for (int j = sj; j < sj+(L>>h); ++j)
      y[i][j] = compute_stencil(x, i, j);
}

Figure 11.5: h-level divide-and-conquer blocking for the contracted part of a 9-point square (two-dimensional) stencil.

register-level locality. Tuning of both leads to near-ideal performance.

11.4.3 Affinity for Divide-and-Conquer

It is known that the divide-and-conquer approach is useful for locality optimization of stencil computation. Cache-oblivious algorithms for stencil computation [FS05, FS07, FS06] suppress cache misses on the basis of divide and conquer over the iteration space. Such a divide-and-conquer technique, however, complicates the structure of stencil computation. Time contraction simplifies this complication. In a cache-oblivious algorithm for a one-dimensional stencil, for example, a computational trapezoid (corresponding to the meshed trapezoid in Figure 11.1a) is divided into two parts and recursively computed in an appropriate order. The two parts produced by the same recursive call, however, differ in shape and size; moreover, there is a dependence between them. These are obvious disadvantages in parallelizing stencil computation into a load-balanced form. The cause of these is the slope of the trapezoid. Recall that in Figure 11.1a, only the top of the trapezoid is computed; the computation for the contracted part does not have any slope. Consequently, by applying divide and conquer over only the space dimension to the contracted part, we obtain a regular recursion that is easier to parallelize and to load-balance. The divide-and-conquer approach is effective especially for multidimensional stencils. As mentioned in Section 11.3.4, in optimizing the locality of multidimensional stencils, blocking over the space dimensions is necessary in practice. The divide-and-conquer approach is well-suited to this. The depth of recursion is then a good parameter for task granularity. The maximum depth of recursion, h, is easier to tune than the block size itself in simple blocking. Furthermore, we can generate simple loops parametric on h with bitwise operations, as shown in Figure 11.5.

11.5 Experiments

11.5.1 Experimental Library

We have been implementing an experimental FDM library that generates tuned stencil loops on the basis of the approaches described in Section 11.4. Our library takes the specification of a stencil and tuning parameters, constructs a coefficient table, and generates a tuned stencil loop in C++. Generated loops are vectorized and tuned in their usage of registers. In addition, by employing OpenMP directives, they support multithreading. Our library is under development. Since the current version is very restrictive, its implementation itself is too weak to be a contribution of this paper. It is a testbed of tuning techniques.

11.5.2 Experimental Settings

Problem  We conducted experiments on the one-dimensional three-point stencil derived from the discretization of the heat equation by means of the standard explicit FDM described in Section 11.3.6. We performed a simulation of thermal diffusion in an object whose initial temperatures in one half and the other half were 800 and 300, respectively, and both of whose ends were insulated. From the properties of the simulated object, we defined r = 0.1526. This stencil is relatively simple but is appropriate for the comparison between time contraction and time skewing because, in unidimensional stencils, locality on the space dimension does not matter. Time contraction alone improves only locality on the time dimension. To improve locality on the space dimensions in multidimensional stencils, we have to employ other techniques, such as the divide-and-conquer blocking described in Section 11.4. In addition, as described in Section 11.4, time contraction does not disturb blocking. To evaluate the effects of time contraction itself, unidimensional stencils are therefore sufficient.

Programs  We implemented four versions by hand: ref is a reference implementation with no manual optimization, vec is a vectorized one, tiled is a vectorized and time-skewed one, and contrd is a vectorized and time-contracted one. We implemented every vectorized version by using intrinsic functions. We employed the standard vectorization scheme with the palignr instruction of SSSE3, described in [HSP+11]. To every vectorized version, we added OpenMP directives so that it could optionally perform multithreading. tiled was implemented with split tiling [KBB+07], and its parallelization scheme was the same as described in [KBB+07]: computing all tiles in parallel and then computing all the residual parts in parallel. In addition to these manual versions, we used gen, a version generated by our library. Every floating-point number used in these programs was of type float.

Environments  We used two different environments. For single-threaded programs, we used a machine equipped with a Core 2 Duo E8500 (2 cores; 3.16 GHz) and 4 GB (DDR2-800) running Linux 2.6.31 (32-bit). Its ideal performance per core is 12.64 GFLOPS2. For multithreaded programs, we used a machine equipped with Xeon X5550 (4 cores × 2; 3.06 GHz) and 12 GB (DDR3-1333) running Linux 2.6.38 (64-bit). Its ideal performance is 98 GFLOPS3. For both processors, Hyper-Threading was disabled. Each single-threaded program was compiled by g++ 4.4.1; each multithreaded one was compiled by g++ 4.5.2. The default optimization level of g++ was O3.

11.5.3 Experimental Results

In this section, N and T are the same as described in Section 11.3.4; Bt denotes the number of grid points updated per tile; k denotes, in time skewing, the block size of time-steps, and in time contraction, the number of time-steps contracted into one. In addition, u denotes the unroll-and-jam count for gen. Note that gen with u = 1 corresponds approximately to contrd. The FLOPS of contrd and gen

2 http://www.intel.com/support/processors/sb/CS-020870.htm
3 http://download.intel.com/support/processors/xeon/sb/xeon_5500.pdf

(Bar chart of GFLOPS per version: ref 0.543, vec 1.74, tiled 5.85, contrd 10.1, gen 13.5.)

Figure 11.6: General single-threading performance; N = 16 Mi, Bt = 4 Ki, k = 32, and u = 7.

is based on the flop count of ref. Since time contraction reduces arithmetic (see also Section 11.3.3), their performance can outstrip the ideal performance of the environments.

General Performance  Figure 11.6 illustrates the performance of each version in our best sequential settings. contrd outperformed tiled: contrd achieved 479% better performance than vec and 72.4% better performance than tiled. Since contrd and tiled are both sufficiently locality-optimized, we consider that the difference between contrd and tiled originated in the reduction of memory writes and arithmetic by contraction. Meanwhile, the difference between contrd and gen was caused by the improved usage of registers: gen achieved 34.2% better performance than contrd.

Effects of k  Both time contraction and time skewing improve locality by a factor of k. Figure 11.7 illustrates the effects of scaling k. contrd always outperformed tiled for the same k. Moreover, whereas for tiled we have to tune both k and Bt, for contrd we have only to tune k. From these results, we consider that time-contracted loops are easier to tune than time-skewed ones. Note that in the case of k > 32, we observed no improvement in performance for any version. We consider that for every version, the locality improvement reached a limit at 12 ≤ k ≤ 16 and the performance improvement at 16 ≤ k ≤ 32 was caused by constant factors.

Effects of Unroll-and-Jam  Our library performs code generation by means of miscellaneous techniques. The most effective technique for improving the usage of registers was unroll-and-jam. Figure 11.8 illustrates its effects. In all of these cases, unroll-and-jam provided significant speedup. Overall, unroll-and-jam was more effective for large k. This demonstrates a good affinity between time contraction and unroll-and-jam.

Multithreading  As locality improves, the effect of parallelization generally becomes more significant. Figure 11.9 illustrates the effects of multithreading the stencil loops. Note that the environment of this experiment is different from that of the others. Whereas vec gained no improvement from multithreading, tiled and contrd achieved near-linear scalability. As can be seen from this result, the memory-intensiveness of contrd as well as of tiled was sufficiently relaxed. Furthermore, the scalability of contrd was always

(Line plot of GFLOPS against k, from 0 to 35, for contrd, tiled (Bt = 1 Ki), and tiled (Bt = 4 Ki).)

Figure 11.7: Effects from scaling k; N = 16 Mi.

closer to linear than that of tiled. Unroll-and-jam was more effective in multithreading than in single-threading: gen achieved up to 61.1% better performance than contrd.

Errors  As described in Section 11.3.6, time contraction causes more loss of significance as k increases. Figure 11.10 illustrates the geometric mean of the relative errors of contrd in the case where the analytic solution is exp(−π²t) sin πx. The result that contrd was more precise than ref in all cases contradicted our prediction. We consider that this contradiction originated in precomputing the stencil coefficients in double. At least, compared with the default errors, the loss of significance caused by time contraction was negligible.

Summary  In all the experimental results, contrd outperformed tiled in the corresponding settings. We do not, however, claim this performance superiority of time contraction for every stencil loop. As mentioned above, the performance improvement in unidimensional stencils is the genuine effect of time contraction. Since blocking over the space dimensions is important for multidimensional stencils (as described in Section 11.3.4), time contraction applied to multidimensional stencils would be less effective than for unidimensional ones. As a result, the superiority of time contraction over time skewing would be small. However, this means that locality on the time dimension becomes relatively less important; it is not a disadvantage of time contraction per se. What we claim is that time contraction is fully comparable to time skewing and outstrips time skewing in its genuine effect on performance. The experimental results support this claim.

11.6 Discussion

We have presented time contraction, a novel approach to optimizing stencil loops, and demonstrated its effectiveness. In this section, as concluding remarks, we discuss the connections and differences between our work and existing work, and finally discuss future work.

11.6.1 Related Work

Optimization for Stencil Computation  Since stencil computation is practically important, there are many studies focusing on its optimization. In earlier studies, domain-specific compilers for stencil

(Line plot of GFLOPS against u, from 1 to 8, for gen with k = 4, 8, 16, and 32.)

Figure 11.8: Effects from scaling k and u; N = 16 Mi.

computation were developed, e.g., one for the Connection Machine CM-2 [BHMS91] and HPF's one [RMCKB97]. In a more recent study, Henretty et al. [HSP+11] presented a vectorization technique specialized for stencil computation. Tiling [WL91] is the standard technique to improve the locality of general nested loops. McCalpin and Wonnacott [MW99, Won00] presented time skewing, a combination of tiling and skewing specialized for stencil loops. Today, time skewing (or similar skewed tiling techniques) has become the standard approach to optimizing stencil loops [KBB+07, DMV+08, OG09, BHT+10]. Krishnamoorthy et al. [KBB+07] presented overlapped and split tilings for well load-balanced parallelization. Orozco and Gao [OG09] improved split tiling into diamond-shaped tiling for FDTD on the Cyclops-64 many-core chip architecture. Datta et al. [DMV+08], by using tiling as well as hardware-aware optimizations such as prefetching, developed an auto-tuning environment for stencil computation. Baskaran et al. [BHT+10] developed parallel parametric tiling for auto-tuning; note that their work is well-suited for but not limited to stencil loops. Frigo et al. [FLPR99] presented cache-oblivious algorithms, whose algorithmic parameters are independent of hardware parameters in contrast to hardware-aware techniques such as Datta et al.'s. They developed cache-oblivious algorithms for stencil computation [FS05, FS07, FS06], which are based on a divide-and-conquer approach over the iteration space. More recently, the Pochoir stencil compiler [TCK+11] has been developed on the basis of these algorithms. As far as we know, all of the existing work on the optimization of stencil computation is based only on the reordering of computations for grid points. The concept and techniques of time contraction are quite different; time contraction is based on the derivation of a wider stencil. Nevertheless, in hindsight, time contraction can be seen as simplified overlapped tiling. When we narrow an overlapped trapezoidal tile, each tile becomes a triangle (as illustrated in Figure 11.11); then, by contracting each triangle in the time direction, we obtain the computation of the contracted part. It is therefore natural that time contraction reduces memory writes and arithmetic as well as improving locality.

Work Related to Loop Contraction  We can find formulations similar to loop contraction in the context of recursion removal, such as the work by Ichikawa et al. [IKF05] and Luca et al. [LAAK06]. While both differ in target and approach, both extract from a recursion the form A^n v, where the sizes of A and v are constants and A denotes a matrix of closed-form expressions. Then, A^n is computed in O(log n) time. Loop contraction deals with almost the same form, but there are two differences: 1) a given program is iterative, and 2) A^n is partially computed in advance, especially at compile time. In essence, whereas their work is oriented to algorithmic change, loop contraction is oriented to partial evaluation. The difference is only in the focus. In the context of loop optimization, Lamb et al.'s work [LTA03] resembles loop contraction. The target of their optimization is a stream program that applies a linear transformation to input streams. Their optimization treats the pipelining and joining of streams as operations on transformation matrices. In other words, it performs loop fusion on transformation matrices. Unlike loop contraction, it does not reduce the number of iterations. Of course, loop contraction is only a combination of unroll-and-jam and loop-invariant code motion (as well as constant folding), specialized for linear transformations. We therefore claim neither its novelty nor its difficulty. The novelty of our work is time contraction, a specialized application of loop contraction to stencil loops. Since time contraction owes much to the properties of a stencil matrix, it is difficult to implement with only a simple combination of existing techniques. Nevertheless, the step from loop contraction to time contraction is not difficult; the idea and technique themselves may even seem trivial. We therefore consider that the novelty and significance of time contraction lie in its perspective rather than its technique.

11.6.2 Future Work

Implementation  We have implemented time contraction only in an experimental FDM library. We believe that this approach is cost-effective. Meanwhile, we consider that there is no technical difficulty in implementing time contraction for simple stencils in a compiler because the analysis required for tiling is also sufficient for time contraction. However, implementing time contraction for complicated stencil loops, especially dealing efficiently with boundaries in the middle, is nontrivial. We conjecture that symbolic execution, rather than a precise dependence analysis, is well-suited to this. We leave it for future work.

Evaluation  The drawbacks of time contraction can be summarized in two points: the limitations in applicability (Section 11.3.5) and the increase in loss of significance (Section 11.3.6). Evaluation on these points, as well as of speedup through practical benchmarks, is important. In particular, whether the limitation regarding boundaries matters in practical situations is quite important; the practicality of time contraction strongly depends on this.

(a) Relative speedup against the number of threads (1 to 8) for contrd, tiled, vec, and gen.

(b) Absolute performance in GFLOPS against the number of threads (1 to 8) for contrd, tiled, vec, and gen.

Figure 11.9: Multithreading performance; N = 16 Mi, Bt = 4 Ki, k = 32, and u = 9.

(Plot of the geometric mean of relative errors against T, from 0 to 1600, for ref and contrd with k = 4, 8, 16, and 32.)

Figure 11.10: Geometric mean of relative errors of contrd and ref in scaling k and T; N = 1 Ki.

(Space-time diagram over grid points x_0, x_1, ..., x_N and time-steps t_0 to t_k.)

Figure 11.11: Finest-grained overlapped tiling. If each triangle tile is contracted in the time direction, this becomes Figure 11.1a.

Chapter 12

Locality Enhancement based on Segmentation of Trees

This chapter is self-contained and is a revised and extended version of our unpublished paper [Sat14a].

12.1 Introduction

A common way of searching fast for points in space is to use a space-partitioning tree such as a k-d tree or a vantage-point tree. In addition, common algorithms in important applications such as N-body simulation, ray tracing, and machine learning iterate a traversal of a space-partitioning tree for a given set of querying points. Although there are several variations in the traversal pattern, each iteration of tree traversal is independent among query points. Such iterative tree traversal is therefore easy to parallelize.

Although such parallelization enables us to utilize multiple processors, it cannot avoid a lot of cache misses because of the noncontiguous memory accesses in tree traversal. Parallel processing is necessary for higher performance but not sufficient. Even when multiple processors are available, they share one memory bus. A lot of cache misses occupy memory bandwidth and make all processors stall. To obtain higher performance from iterative tree traversal, we need to improve its locality and reduce cache misses as well as parallelize it.

One way of reducing cache misses in tree traversal is to use existing cache-efficient or cache-oblivious tree data structures [AADHM03, BDFC05, BFCF+07, BKTW11]. By using them, we can reduce cache misses in each traversal. However, because iterative tree traversal is a common computational pattern, applying both locality enhancement and parallelization simultaneously to the whole computation is reasonable and promising for an efficient implementation. An abstraction of iterative tree traversal incorporating both locality enhancement and parallelization is therefore desirable.

As an abstraction of tree computation incorporating parallelization, tree skeletons [Ski96, GCS94] were studied. In the implementation of tree skeletons, the m-bridging technique [Rei93], which decomposes a tree into tree segments, was used for distributed-memory computation [Mat07a]. Although m-bridging was originally a technique to reduce parallel time, in the sense that it reduces communication on distributed-memory machines, it is also a technique to improve spatial locality. We have focused particular attention on m-bridging as a package of locality enhancement and parallelization, and have applied it to iterative tree traversal.

In this chapter, we present an application of m-bridging to locality-aware parallelization of iterative tree traversal. The main idea of our approach is the recursive application of m-bridging and the recursive storing of querying points. Owing to the genericity of m-bridging, our approach does not necessitate balancing of space-partitioning trees or require any application-specific knowledge, yet provides cache efficiency and load balancing. This virtue of our approach is advantageous for a generic implementation of iterative tree traversal. The following are our contributions:


(A full binary tree whose nodes are numbered 1 to 13; double-circled nodes are the selected hole nodes.)
Figure 12.1: A selection of hole nodes, where a double-circle node denotes a hole node.

(The coarser tree whose nodes are the segments {1}, {2,4,5}, {3}, {8}, {9,12,13}, {6}, and {7,10,11}.)
Figure 12.2: The coarser tree derived from the selection of hole nodes in Figure 12.1.

• We formulate iterative tree traversal as a parallel pattern iter (Section 12.3).

• We design data structures for iterative tree traversal on the basis of m-bridging [Rei93] and analyze their cache complexity (Section 12.4).

• We describe applications of iter and our data structures to several problems (Section 12.5).

12.2 Preliminaries

We introduce several notions on trees. The tree in this section means a full binary tree.

12.2.1 Segmentation of Trees

In the implementation of tree skeletons [Mat07a, SM14a], as explained in Chapter 2, segmented trees are used as input trees. For any selection of hole nodes, we can define a segmentation, i.e., a transformation from a simple tree to a segmented tree. This segmentation can therefore be seen as an operation that coalesces nodes of a given tree. For example, from a selection of hole nodes as illustrated in Figure 12.1, we obtain a coarser tree, whose nodes are segments, as illustrated in Figure 12.2. In this chapter, we consider the construction of segmented trees as a node-coalescing operation.

12.2.2 m-Bridge

Given a tree, how to select hole nodes is an issue in segmentation. As a graph-theoretic result, the m-critical criterion is known to be convenient.

Definition 1 (m-critical node [Rei93]). Let m be an integer such that 1 < m ≤ N, where N is the number of nodes in a tree. A node v in the tree is called m-critical if v is an internal node and ⌈weight(v)/m⌉ > ⌈weight(v_c)/m⌉ for each child v_c of v, where weight(u) denotes the number of the nodes of the subtree rooted at a node u.

Each segment delimited by m-critical nodes is called an m-bridge. We call coalescing nodes on the basis of m-critical nodes m-bridging. For example, the hole nodes illustrated in Figure 12.1 are 4-critical nodes, and the coarser tree illustrated in Figure 12.2 is the result of 4-bridging. For m-bridges of trees, the following two important properties are known.

Lemma 1 ([Rei93]). The number1 of the nodes of an m-bridge is at most m.

Lemma 2 ([Rei93]). Let N be the number of the nodes of a tree. The number of m-critical nodes of the tree is at most 2N/m − 1.

Since the number of the segments of a segmented tree that contains n hole nodes is 2n + 1, we can rephrase Lemma 2 as follows.

Lemma 3 (Rephrasing of Lemma 2). Let N be the number of the nodes of a tree. The number of m-bridges of the tree is at most 4N/m − 1.

These properties mean that m-bridging is a decomposition of a tree into sufficiently balanced segments. This contributes significantly to the asymptotic linear speedup of operations on segmented trees. Another important point of m-bridging is its cheapness. By using the upward accumulation skeleton (i.e., uAcc [Ski96, GCS94]), we can mark m-critical nodes. Although constructing and restructuring data structures will take some cost in practice, m-bridging itself is a cheap computation and will not be a sequential bottleneck.
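As an illustration of this marking step, the following is a minimal sketch for a pointer-based full binary tree; the node layout and the plain recursive formulation (instead of a call to the uAcc skeleton) are our own simplification.

// Hypothetical node layout; the library's actual representation differs.
struct Node {
  Node *left = nullptr, *right = nullptr;  // both null for a leaf
  long weight = 0;                          // number of nodes in the subtree
  bool critical = false;                    // marked m-critical?
};

// Bottom-up weight computation and m-critical marking (Definition 1): an
// internal node v is m-critical iff ceil(weight(v)/m) exceeds
// ceil(weight(c)/m) for each child c of v.
long mark_critical(Node* v, long m) {
  if (v == nullptr) return 0;
  long wl = mark_critical(v->left, m);
  long wr = mark_critical(v->right, m);
  v->weight = wl + wr + 1;
  if (v->left != nullptr && v->right != nullptr) {
    long cv = (v->weight + m - 1) / m;      // ceil(weight(v)/m)
    long cl = (wl + m - 1) / m;
    long cr = (wr + m - 1) / m;
    v->critical = (cv > cl) && (cv > cr);
  }
  return v->weight;
}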

12.3 Iterative Tree Traversal

In this section, we formulate iterative tree traversal as a parallel pattern (i.e., skeleton [RG02]). We first define a space-partitioning tree as

SPTree_{α,β} = Branch(x, SPTree_{α,β}, SPTree_{α,β})
SPTree_{α,β} = Leaf(y),

where x of type α denotes a subspace, which may contain its splitting criteria, and y of type β denotes a point (or a minimum unit of space).

The iterative tree traversal that we suppose takes a space-partitioning tree T of type SPTree_{α,β} and a set Q of querying points (or units of querying), and yields a set that has a one-to-one correspondence to Q. Although the order in Q is unnecessary, a list is convenient for discriminating the elements of Q and for formulating operations between Q and the resultant set. We therefore use a list of type List for both Q and the resultant set, where List a denotes a list whose elements are of type a. Letting Q be of type List γ, we generally suppose that γ may differ from the β of SPTree_{α,β}. For example, in range queries, γ will denote a range and β may denote a point.

1 It is at most m + 1 in [Rei93] because every segment there except for root segments is rooted by a replicated hole node of its parent segment.

We define iterative tree traversal as the following higher-order function iter:

iter : (γ × α → bool) × (γ × α → δ) × (γ × β → δ) × (δ × δ → δ) × SPTree_{α,β} × List γ → List δ

iter(c, k_b, k_l, ⊕, T, Q) = map(f, Q), where
  f(x) = h(x, T),
  h(q, Branch(x, t_L, t_R)) = if c(q, x) then k_b(q, x) else h(q, t_L) ⊕ h(q, t_R),
  h(q, Leaf(y)) = k_l(q, y),
  {Algebraic condition} ⊕ is associative and commutative.

Here map is the standard list function; it takes a unary function and a list, and yields a list by applying the function to each element of the list. Note that map has embarrassing parallelism. The function h traversing T consists of top-down pruning with a criterion c and bottom-up reduction with the operators k_b, k_l, and ⊕. The associativity and commutativity of ⊕ bring a high degree of parallelism. Note that both stem semantically from a degree of freedom in space partitioning rather than from artificial conditions. The iter above does not capture all kinds of tree traversals but does hit a sweet spot as a parallel pattern. Its applicability is discussed in Section 12.5.

Although iter has a high degree of parallelism with respect to map, each tree traversal itself is sequential. To obtain a high degree of parallelism from a tree traversal, we have to define the tree traversal of iter for segmented trees. Actually, we can define it naturally without any additional operator.
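The following is a minimal C++ rendering of iter on a pointer-based space-partitioning tree; the concrete types, the template-based parameter passing, and the OpenMP pragma for the map are illustrative assumptions rather than the formulation used in this chapter.

#include <vector>

template <class Space, class Point>
struct SPTree {
  bool is_leaf;
  Space x;                                  // subspace (internal node)
  Point y;                                  // point (leaf)
  SPTree *left = nullptr, *right = nullptr;
};

template <class Result, class Query, class Space, class Point,
          class C, class KB, class KL, class Plus>
Result h(const Query& q, const SPTree<Space, Point>& t,
         C c, KB kb, KL kl, Plus plus) {
  if (t.is_leaf) return kl(q, t.y);         // h(q, Leaf(y)) = kl(q, y)
  if (c(q, t.x)) return kb(q, t.x);         // pruned branch: kb(q, x)
  return plus(h<Result>(q, *t.left,  c, kb, kl, plus),   // h(q, tL) (+) h(q, tR)
              h<Result>(q, *t.right, c, kb, kl, plus));
}

template <class Result, class Query, class Space, class Point,
          class C, class KB, class KL, class Plus>
std::vector<Result> iter(C c, KB kb, KL kl, Plus plus,
                         const SPTree<Space, Point>& t,
                         const std::vector<Query>& qs) {
  std::vector<Result> out(qs.size());
  #pragma omp parallel for                   // the embarrassingly parallel map
  for (long i = 0; i < (long)qs.size(); ++i)
    out[i] = h<Result>(qs[i], t, c, kb, kl, plus);
  return out;
}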

We first formulate segmented trees of SPTree_{α,β} as the following SegTree_{α,β}:

SegTree_{α,β} = Branch(x, x_•, Ctx_{α,β}, SegTree_{α,β}, SegTree_{α,β})
SegTree_{α,β} = Sub_{α,β}
Ctx_{α,β}     = Branch(x, Ctx_{α,β}, Ctx_{α,β})
Ctx_{α,β}     = Hole
Ctx_{α,β}     = Sub_{α,β}
Sub_{α,β}     = Sub(SPTree_{α,β}),

where Ctx_{α,β} denotes a one-hole context, Sub_{α,β} denotes a subtree, and x_• denotes the subspace of a hole node. Since hole nodes are internal nodes, x_• is of type α. Hole is nullary because x_• is placed alongside the root of a one-hole context. The grammar above per se does not guarantee that Ctx_{α,β} contains exactly one Hole. From the definition of segmented trees, the following is an invariant:

x_1 = x_2 for Branch(x_1, x_•, Branch(x_2, s_1, s_2), t_1, t_2).

We then define a traversal function h_s for SegTree_{α,β} that corresponds to h in the where-clause of iter as follows:

{c, k_b, k_l, and ⊕ are given as in h of iter}

h_s : γ × SegTree_{α,β} → δ

h_s(q, Branch(x, x_•, s, t_L, t_R)) = if c(q, x) then k_b(q, x)
                                      else if c(q, x_•) then h_c(q, k_b(q, x_•), s)
                                      else h_c(q, ι_⊕, s) ⊕ h_s(q, t_L) ⊕ h_s(q, t_R),
h_s(q, Sub(t)) = h(q, t),

h_c(q, d_•, Branch(x, t_L, t_R)) = if c(q, x) then k_b(q, x)
                                   else h_c(q, d_•, t_L) ⊕ h_c(q, d_•, t_R),
h_c(q, d_•, Hole) = d_•,
h_c(q, d_•, Sub(t)) = h(q, t),

where ι_⊕ denotes the identity of ⊕.

12.4 Proposed Data Structures

In this section, we describe our cache-efficient data structures for iterative tree traversal. We adopt the cache-oblivious model [FLPR99] for analyzing cache complexity. A cache consists of Z words, and a cache line (i.e., a unit of cache replacement) consists of L words, where L² < Z (i.e., a tall cache) and ideal replacement are assumed. We assume the input size N to be much larger than Z, i.e., Z ≪ N. We also assume that each node, together with the values it contains, and each querying point occupy a constant number of contiguous words.

12.4.1 Simply Blocked Tree

We first describe a simple approach to reducing the cache complexity of tree traversal as a counterpart of the proposed data structures.

Overview  To reduce cache misses, we generally have to use arrays so as to obtain contiguous memory access. We here consider blocking of a given full binary tree of N nodes. Let Bt be an integer such that 1 < Bt ≤ N. A block in a given tree T is a tree that is rooted at the root of T or at a child of a hole node, and whose leaves are leaves of T or hole nodes. Similarly to m-bridging, we can mark nodes as holes by calculating their weights in a bottom-up manner. An internal node v is a hole if ⌈weight(v)/Bt⌉ > ⌈weight(v_c)/Bt⌉ holds for v and each child v_c of v, where weight(v) denotes the number of the nodes in the block rooted at v. We call a tree whose nodes in the same block are arranged successively in an array a simply blocked tree and call this blocking of trees Bt-blocking.
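One possible array layout for such a simply blocked tree, purely as an illustration of the block-contiguous storage described above (the names and fields are assumptions, not the chapter's implementation):

#include <vector>

// Nodes of the same block occupy a contiguous range of 'nodes'; children are
// addressed by index so that a whole block can be scanned with few cache misses.
struct BlockedNode {
  int payload;                      // subspace or point data (simplified)
  int left, right;                  // indices into 'nodes'; -1 if absent
};

struct SimplyBlockedTree {
  std::vector<BlockedNode> nodes;   // laid out block by block (<= Bt nodes each)
  std::vector<int> block_begin;     // start index of each block in 'nodes'
};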

Analysis  We first consider the usual behavior of a single query of iter. The worst-case behavior is immediate: the whole tree is traversed if the pruning function c always returns false. This behavior, however, does not take into account how space-partitioning trees are used. Given a querying point, we usually use space-partitioning trees for finding its neighbors and for pruning faraway ones. In this case, the trace of a query tends to form a kind of spine: a root-to-leaf path like a stem with short branching arms. We consider such spine-like traces the usual behavior. If we assume that the branching arms of a spine-like trace are of constant length, the trace can be approximated by the root-to-leaf path of its stem. It is therefore reasonable to analyze the traversal of a root-to-leaf path as an approximation. The Bt-blocking of full binary trees has an effect similar to m-bridging. Each block has at most Bt nodes. Let N be the number of the nodes of a tree to which blocking is applied. The number of hole nodes is at most ⌈N/Bt⌉ − 1 since the total number of the nodes in the two blocks whose parent is a hole node is at least Bt. The number of blocks is at most 2⌈N/Bt⌉ − 1 since each hole node generates two blocks, in addition to the root block. On the basis of these properties of simply blocked trees, we analyze the cache complexity of the traversal of a root-to-leaf path. Letting N_B be the number of blocks, a root-to-leaf path overlaps at most N_B/2 + 1 blocks. The cache misses caused by the traversal of each block are bounded by O(⌈Bt/L⌉). The cache complexity of the traversal of a root-to-leaf path is therefore O(⌈Bt/L⌉⌈N/Bt⌉), which reduces to O(⌈N/L⌉) if Bt ≥ L.

Theorem 1. Let T be a simply blocked tree derived from Bt-blocking of a given full binary tree of N nodes. If Bt ≥ L, the cache complexity of the traversal of a root-to-leaf path on T is O(⌈N/L⌉).

If we use a naive linked-structure tree, the cache complexity of the traversal of a root-to-leaf path is O(N) because the given tree might have O(N) height, i.e., be a list-like tree. In the case of a balanced binary tree, naive linked-structure trees cost O(lg N) cache misses. Meanwhile, simply blocked trees cost

O(⌈Bt/L⌉ log_{Bt} N) cache misses since a root-to-leaf path overlaps at most log_{Bt} N blocks. If Bt = O(L), this reduces to O(log_L N) cache misses.

Pros and Cons

As a general technique to improve cache complexity, Bt-blocking has two major advantages. First, Bt-blocking does not sacrifice applications of space-partitioning trees. For example, it does not necessitate any change in iter and enables us to use the simple definition of iter for SPTree_{α,β}, unlike segmented trees. This is not limited to SPTree_{α,β} and iter. Second, Bt-blocking is as cheap as m-bridging; we do not need a careful treatment of its overhead. Bt-blocking has a drawback related to the height of the resultant trees. Although space-partitioning trees do not become extremely imbalanced (i.e., list-like) in practice, the effect of Bt-blocking remains dependent on the shape of the given tree and is therefore limited. In addition, the height of the tree of blocks corresponds directly to the depth term of the parallel complexity; a larger height leads to a worse depth.

12.4.2 Recursively Segmented Tree

The main difference between Bt-blocking and m-bridging is that a block may have multiple hole nodes, while a bridge has at most one hole node. On the basis of this property, we can improve the cost of tree traversal in some cases. We next describe the usage of m-bridging for improving the cache complexity of space-partitioning trees.

Overview  m-bridging introduces a hierarchy over a tree. A coarser tree is smaller than its original tree, but each node of a coarser tree is larger than each node of the original. If both a coarser tree and each of its segments fit in cache, cache misses in tree traversal will be much reduced. Our main idea is to apply m-bridging recursively to a given space-partitioning tree. Note that weight(x) used in the m-critical criterion for a coarser tree does not count the number of the nodes in a segment; that is, weight(x) regards a coarser tree as a simple tree. We consider levels of segments. We call the coarsest segment, which includes the whole tree, the root-level (or 0th-level) segment and call the segments of the original tree the last-level segments. We name such a hierarchical tree a recursively segmented tree derived from m-bridging. We consider that the nodes of each tree on any level are stored contiguously in an array. This design improves the spatial locality of tree traversal. An important concern is whether iter is definable on recursively segmented trees. This is immediate. In the case of recursively segmented trees, the definition of Sub_{α,β} is extended by the following rule:

Sub_{α,β} = Seg(SegTree_{α,β}).

Then, the definitions of h_s and h_c are extended by the following cases:

h_s(q, Seg(t)) = h_s(q, t),

h_c(q, d_•, Seg(t)) = h_s(q, t).

Another important concern is how to accelerate traversals. In traversing a root-to-leaf path, the definition of iter for SPTree_{α,β} traverses all its nodes. If all of these are necessary for calculating the result of a query, it is difficult to obtain a better cost than that of simply blocked trees. However, we can sometimes ignore internal nodes in practice. For example, in the cases of range queries (Section 12.5.1) and a k-nearest-neighbor query (Section 12.5.4), we are concerned only with leaf values. In such cases, we can skip internal nodes and shortcut the traversal to a leaf. To enable such shortcut acceleration, we introduce two additional parameters to iter. One is a skipping criterion c_c that takes a querying point and a one-hole subspace, and yields whether we can skip the target one-hole subspace. The other is a function k_c that calculates the contribution from a one-hole subspace to a querying point. For example, if we can ignore internal nodes completely, k_c is a constant function yielding ι_⊕. Then, we extend the definition of h_s as follows:

{c_c and k_c as well as c, k_b, and ⊕ are given}

h_s(q, Branch(x, x_•, s, t_L, t_R)) = if c(q, x) then k_b(q, x)
                                      else if c(q, x_•) then h_c(q, k_b(q, x_•), s)
                                      else if c_c(q, x, x_•) then k_c(q, x, x_•) ⊕ h_s(q, t_L) ⊕ h_s(q, t_R)
                                      else h_c(q, ι_⊕, s) ⊕ h_s(q, t_L) ⊕ h_s(q, t_R).

If c_c returns true, h_s prunes the traversal of the next-level segment and instead applies k_c to the one-hole subspace, i.e., skips an internal segment.

Analysis To determine m-bridging appropriate to recursively segmented trees, we analyze the cache complexity of a simple query that finds a leaf, which we call a root-to-leaf traversal. In this case, we can assume that cc always returns true in iter. For simplicity, we assume m > 4. From Lemma 3, we can assume that the number of levels of a recursively segmented tree derived from m-bridging is at most log_{m/4}(N/m), where N is the number of the nodes of its original tree. Since, from Lemma 1, any segment on any level fits into O(m) words, the cache misses caused by the traversal of a segment are bounded by O(⌈m/L⌉). To find a leaf from the root, we traverse O(log_{m/4}(N/m)) segments. The total cache misses in a root-to-leaf traversal are therefore bounded by O(⌈m/L⌉ log_{m/4}(N/m)), which reduces to O(log_L N) if m = O(L), e.g., m = 4L.

Theorem 2. Let T be a recursively segmented tree derived from m-bridging of a full binary tree that has N nodes. If m = O(L), the cache complexity of a root-to-leaf traversal on T is O(log_L N).
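For concreteness, the bound in Theorem 2 follows from the substitution below, taking m = 4L as a representative choice satisfying m = O(L):

⌈m/L⌉ · log_{m/4}(N/m)  =  ⌈4L/L⌉ · log_L(N/(4L))  =  4 · log_L(N/(4L))  =  O(log_L N).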

Pros and Cons Skipping internal segments imposes a restriction on space-partitioning trees in practice. For example, vantage-point (VP) trees require a top-down node-by-node traversal for effective pruning because the pruning criterion for a subspace implicitly uses its ancestors' subspaces. Pruning without ancestors' subspaces is safe but would be ineffective. Therefore, recursively segmented trees derived from VP trees are ineffective. To use recursively segmented trees in practice, we have to consider this additional restriction. The definitions of cc and kc are not always straightforward; we explain several examples in Section 12.5. If we are concerned only with leaf values, the definitions of cc and kc are straightforward. All in all, defining cc and kc reasonably necessitates domain knowledge to some extent. Although we have to deal with an additional restriction and requirement to use recursively segmented trees, they can guarantee that a root-to-leaf traversal causes O(log_L N) cache misses regardless of the shape of a given tree. That is, recursively segmented trees are more restrictive but more efficient than simply blocked trees. In this sense, the two have complementary characteristics.

12.4.3 Buffered Recursively Segmented Tree

Our aim is to capture the iterative nature of iterative tree traversal. We therefore refine recursively segmented trees for iterative querying.

Overview A problematic case in iterative querying is that the trace of one query is quite different from that of the next query. Such successive queries have poor temporal locality. For good temporal locality, successive queries have to follow similar traces. When a series of different queries is given, we should therefore reorder these queries to improve temporal locality.

Our method of reordering queries for recursively segmented trees is to buffer querying points at each segment. In an appropriately small segment, all queries tend to be similar because there are few possible traces. All queries that reach a next-level segment also tend to be similar because quite different queries do not reach it. To improve temporal locality, it is therefore sufficient to buffer querying points at each segment on each level and postpone their traversals. Specifically, we equip each segment with a buffer of size B to postpone traversals on it (although a buffer is useless for the root-level segment, we introduce it for brevity of explanation). We assume that the buffers of sibling segments on the same level form a contiguous array. When a buffer becomes full, the buffered querying points traverse the associated segment and are stored into the buffers of the next-level segments. We name such a buffered version a buffered recursively segmented tree. Note that the buffered version of simply blocked trees, a buffered simply blocked tree, is also feasible.
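The following C++ fragment is a minimal sketch of this buffering scheme, assuming a simplified Segment type; route and the query processing are illustrative stand-ins rather than the dissertation's implementation.

#include <cstddef>
#include <vector>

// Illustrative querying point.
struct Query { double p[3]; };

struct Segment {
    std::size_t           capacity;     // B, the buffer size
    std::vector<Query>    buffer;       // postponed querying points
    std::vector<Segment*> children;     // next-level segments (empty at the last level)
    std::size_t           processed = 0;

    explicit Segment(std::size_t B) : capacity(B) { buffer.reserve(B); }

    // Stand-in for the spatial test that decides which next-level segment a
    // query descends into.
    std::size_t route(const Query& q) const {
        return (children.size() <= 1 || q.p[0] < 0.0) ? 0 : 1;
    }

    void push(const Query& q) {
        buffer.push_back(q);
        if (buffer.size() == capacity) flush();
    }

    // Traverse this segment with all postponed queries at once; each query is
    // then handed over to the buffer of the next-level segment it reaches, or
    // finished here if this is a last-level segment.
    void flush() {
        for (const Query& q : buffer) {
            if (children.empty()) ++processed;   // stand-in for finishing the query
            else children[route(q)]->push(q);
        }
        buffer.clear();
    }
};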

Analysis We analyze the cache complexity of iterative root-to-leaf traversal of a buffered recursively segmented tree T. Let Q be a set of querying points. We assume that each buffer size B is the size of L querying points and that m = O(L). We first have to consider a worst-case series of queries regarding cache complexity. One path being completely different from the next means that the segments of the two traces share only the root-level segment. A series of such trace paths fills the buffers on each level uniformly. That is, the first L querying points fill the buffer of the root-level segment; the first L + L^2 querying points fill all the buffers of the root-level and first-level segments; the first L + L^2 + L^3 queries fill all the buffers of the root-level, first-level, and second-level segments; and so on. This is a worst case. Since L^2 < Z and sibling buffers are contiguous, flushing the full buffer of a segment into the next-level segments without flushing their buffers causes O(L) cache misses. The flushes of the ith-level buffers occur ⌈|Q| L^{−i−1}⌉ times and a flush of the ith-level buffers causes ⌈L^i⌉ cache misses. The total cache misses of the ith-level buffers are therefore bounded by O(⌈|Q|/L⌉). Since the number of levels is bounded by O(log_L N), the total cache misses of the iterative root-to-leaf traversal of T regarding Q are bounded by O(⌈|Q|/L⌉ log_L N).

Theorem 3. Let T be a buffered recursively segmented tree derived from m-bridging and equipped with buffers of size B, and let N be the number of the nodes of its original tree. The cache complexity of iterative root-to-leaf traversal of T regarding a set Q of querying points is O(⌈|Q|/L⌉ log_L N) if m = O(L) and B = O(L).

We can analyze the cache complexity of the iterative traversal of a root-to-leaf path on a buffered simply blocked tree T′ in a similar way. We assume B_t = O(L). Letting N_B be the number of blocks, a root-to-leaf path overlaps at most N_B/2 + 1 blocks. The worst case is the iterative traversal of the longest block path. By buffering querying points, all the buffers in the path flush once per L queries in the worst case and the cache misses per flush of a buffer are bounded by O(1). The total cache misses are O(⌈|Q|/L⌉ ⌈N/L⌉).

Theorem 4. Let T be a buffered simply blocked tree derived from B_t-blocking with buffers of size B and let N be the number of the nodes of its original tree. The cache complexity of the iterative traversal of a root-to-leaf path of T regarding a set Q of querying points is O(⌈|Q|/L⌉ ⌈N/L⌉) if B_t = O(L) and B = O(L).

If we use a naive linked-structure tree, the cache complexity of the iterative traversal of a root-to-leaf path is O(|Q| N). In the case of a balanced binary tree, naive linked-structure trees cost O(|Q| lg N) cache misses. Meanwhile, letting B_t = O(L), buffered simply blocked trees cost O(⌈|Q|/L⌉ log_L N) cache misses through an analysis similar to the one for buffered recursively segmented trees.

Handling Accumulation Unfortunately, the asymptotic cost above does not model the cost of iter sufficiently. iter performs the reduction with ⊕ of the values that are generated at the leaves of a trace. This raises an issue of cache misses. Even though a root-to-leaf traversal can be an approximation of the trace of a query as mentioned earlier, a trace has multiple leaves. To reduce them, an accumulation for each querying point is necessary.

A concern is the buffer for this accumulation. A common yet space-efficient way is to use the resultant array of iter as an accumulation buffer. Since replications of querying points are stored in buffers and their traversals are postponed, accumulations on each element of the resultant array will, however, be noncontiguous. In the worst case, a cache miss results at each accumulation of a value. In addition to the resultant array, we can introduce accumulation buffers into each segment. Similarly to the top-down pruning phase, the bottom-up reduction phase can be buffered and postponed. This approach reduces noncontiguous accumulations to the resultant array. Still, it is difficult to improve the worst-case cache complexity. Moreover, this approach increases constant factors in space complexity and complicates implementation. A reasonable heuristic is to split Q into contiguous blocks, each of which is denoted by Q_B, and force T to finish all postponed queries per Q_B. If |Q_B| = O(Z), the segment of the resultant array corresponding to Q_B fits in the cache. Without tree traversals, the cache misses caused by the accumulations regarding Q_B would be bounded by O(⌈|Q_B|/L⌉). Because the tree traversals for Q_B may flush the whole cache, this cannot be the worst-case bound of the reduction part of iter. However, by changing the block size under L < |Q_B| < Z, we can control the probability of cache misses. This heuristic with tuning of |Q_B| is a practically reasonable choice for reducing the cache misses in iter.
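A minimal C++ sketch of this blocking heuristic follows; run_query and finish_all_postponed stand in for the buffered traversal and for draining all per-segment buffers, and all names are illustrative assumptions.

#include <algorithm>
#include <cstddef>
#include <vector>

struct Query { double p[3]; };
using Value = double;

// Stand-in for iter's per-query work; it accumulates into result[index].
void run_query(const Query& q, std::size_t index, std::vector<Value>& result) {
    result[index] += q.p[0];
}

// Stand-in for draining every per-segment buffer of postponed queries.
void finish_all_postponed() {}

// Process Q in contiguous blocks Q_B so that the segment of the result array
// touched by each block stays cache-resident; block_size is tuned so that
// L < block_size < Z.
void iterate_blocked(const std::vector<Query>& Q,
                     std::vector<Value>& result,
                     std::size_t block_size) {
    for (std::size_t begin = 0; begin < Q.size(); begin += block_size) {
        const std::size_t end = std::min(begin + block_size, Q.size());
        for (std::size_t i = begin; i < end; ++i)
            run_query(Q[i], i, result);
        finish_all_postponed();   // all of Q_B completes before the next block starts
    }
}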

12.4.4 Parallel Implementation

Although we have focused only on locality and cache complexity, our aim is to package locality enhancement and parallelization. We now describe how to parallelize iter on buffered recursively segmented trees. We suppose uniform, distributed caches. Let P be the number of processors.

Simple Parallelization iter is embarrassingly parallel regarding a given set Q of querying points. It is reasonable to divide Q evenly among P processors. Although the cost of each query may differ, randomization can resolve such imbalance in practice. Since blocking querying points is valuable for locality, randomization should be performed per Q_B. Owing to the associativity and commutativity of ⊕, the reduction part can be parallelized. Because a set of values for reduction is usually much smaller than |Q|, however, it is not worth parallelizing the reduction for a single querying point at an additional synchronization cost. Synchronization should be minimal. The parallelization of iter regarding Q for SPTree_{α,β} requires only a barrier synchronization at the end. For buffered recursively segmented trees, however, races on the buffer of each segment arise. Mutual exclusion for each access to buffers is expensive. To avoid races on buffers, it is sufficient to replicate the buffers of all segments for each processor. This privatization of buffers also prevents false sharing on the hierarchical caches of common multicore processors and is therefore practically valuable.
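As an illustration, the simple parallelization with privatized buffers might look as follows in C++ with OpenMP; Buffers and traverse are illustrative stand-ins, not the dissertation's API.

#include <cstddef>
#include <vector>
#include <omp.h>

struct Query   { double p[3]; };
struct Buffers { /* one buffer per segment on every level */ };
using Value = double;

// Stand-in for iter's work on a single querying point, including the
// ⊕-reduction over the leaves of its trace.
Value traverse(const Query& q, Buffers&) { return q.p[0]; }

std::vector<Value> iter_parallel(const std::vector<Query>& Q, const Buffers& prototype) {
    std::vector<Value> result(Q.size());
    #pragma omp parallel
    {
        // Privatized replica of all segment buffers: no races, no false sharing.
        Buffers local = prototype;
        #pragma omp for schedule(static)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(Q.size()); ++i)
            result[i] = traverse(Q[i], local);
    }   // the implicit barrier here is the only synchronization
    return result;
}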

Distributed Parallelization Although the simple parallelization above utilizes only parallelism regarding Q, we can utilize parallelism on recursively segmented trees by distributing segments to the processors. Since this distribution reduces the amount of data shared among all processors, this distributed parallelization is beneficial to implementation on distributed-cache/-memory machines. For load balancing of tree skeletons on m-bridged trees, the case of m = 2N/P guarantees asymptotic linear speedup with maximum spatial locality. A smaller m, however, brings better load balancing at the cost of spatial locality. An appropriate range of m was explored in [Mat07b]. These results are based on a single m-bridging and therefore schedule only last-level segments. Even though we can schedule only last-level segments for recursively segmented trees, such too fine-grained scheduling causes poor spatial locality. We therefore have to consider coarse-grained scheduling for recursively segmented trees. An important point is that forming recursively segmented trees and load balancing demand different criteria on m-bridging. Specifically, for load balancing, letting s be a segment, weight(s) in the m-critical

criterion must be the number of the nodes of its original tree. We therefore have to apply an additional m-bridging to recursively segmented trees for load balancing. We use m for the parameter of forming recursively segmented trees and m′ for that of load balancing, assuming m < m′. The calculation of m′-critical nodes on recursively segmented trees is straightforward. A problem is that an m′-bridge does not necessarily match any segment of a recursively segmented tree. Even if an m′-bridge does not match a segment, it can contain some of its finer-level segments. With a space-containment criterion c′ : α × α → bool for SPTree_{α,β}, we can assign processors to segments that cover all nodes of an original tree. Although processors are not assigned to coarse-level segments (especially root-level ones), this is not a problem; rather, it is convenient. Many querying points will pass through such coarse-level segments in traversing different m′-bridges. It is therefore reasonable to share or replicate them among all processors. In addition, m′-bridges of much smaller weight, such as single-node ones (see Figure 12.2), are not worth privatizing in practice. We may therefore neglect assigning processors to segments contained in such cheap m′-bridges. When an m′-bridge contains a segment, we do not have to assign a processor explicitly to the finer-level segments contained in it. That is, we can prune the processor assignment in a top-down traversal. The computational pattern of this processor assignment is therefore very close to iter. Intuitively, the containment criterion c′ corresponds to the pruning criterion c in iter, and the destructive updating of nodes with processor numbers corresponds to kb and kl in iter. Note that the assignment of processor numbers does not cause races by definition. By using the method above, we can distribute buffered recursively segmented trees. It is, however, insufficient for tree traversal: we have to consider interaction (or communication) among processors during tree traversal. We can implement it as simple asynchronous parallel computation. We associate a mailbox, which is an asynchronous buffer, with each segment to which a processor number is assigned. When one processor transmits a querying point to a segment whose mailbox belongs to another processor, it inserts the querying point into that mailbox. Each processor constantly monitors its own mailboxes and transmits the querying points in a mailbox to the buffer of the segment with which the mailbox is associated. By using such asynchronous interaction, we can implement tree traversal on distributed buffered recursively segmented trees (a sketch of such a mailbox is given after the list below). In summary, a buffered recursively segmented tree T is distributed as follows:

1. Calculate each m′-critical node on T and obtain its subspace, which may have a hole.

2. Distribute the m′-critical subspaces to all processors, where subspaces of much smaller weight may be neglected.

3. Assign processor numbers to segments by using a space-containment criterion c′ in a manner similar to iter.

4. Associate a mailbox for processor i with each segment numbered i.
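In a shared-memory setting, such a mailbox might be realized as a small mutex-protected buffer, as the following hedged C++ sketch shows; the names and the polling discipline are assumptions rather than the actual implementation.

#include <mutex>
#include <vector>

struct Query { double p[3]; };

// A mailbox: an asynchronous buffer through which any processor hands
// querying points over to the processor that owns the associated segment.
class Mailbox {
public:
    // Called by a processor that reaches a segment owned by someone else.
    void post(const Query& q) {
        std::lock_guard<std::mutex> lock(mutex_);
        incoming_.push_back(q);
    }

    // Called by the owning processor when it polls its mailboxes; the posted
    // querying points are moved into out (e.g., the segment's buffer).
    void drain(std::vector<Query>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        out.insert(out.end(), incoming_.begin(), incoming_.end());
        incoming_.clear();
    }

private:
    std::mutex         mutex_;
    std::vector<Query> incoming_;
};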

12.5 Applications

In this section, we describe applications of iter and explore the potential of our data structures on these applications.

12.5.1 Batch Processing with Range Queries

In database applications, batch processing is important. It usually performs a routine analysis on data that keeps growing. Since the focus of such cases is on the difference from the result of the previous analyses, an analysis query often contains a range of time, such as the last 24 hours. In addition to time, we may consider other continuous data, e.g., prices and spatial positions. A typical analysis query summarizes the data of the concerned ranges regarding a concerned key. If a set of concerned keys is given, its batch processing will iterate such summarizing queries. iter covers such batch processing with range queries.

For example, we consider a database of trade history. A trade is a triple of type Id × Time × Price, where Id denotes traders' identity numbers, Time denotes trade timestamps, and Price denotes trade prices. The space partitioned by a given tree T is two-dimensional, of type Id × Time, and its subspaces are rectangles of type Rect_{Id×Time}. A leaf subspace of T holds a trade list of type List_{Id×Time×Price}. A querying point is a line segment on Time, of type Id × Period, where Period denotes the type of periods, and a query sums up the trade prices on this line segment. That is, this batch processing calculates the sales of traders during some period. The types and parameters of iter for this batch processing are defined as follows.

α = Rect_{Id×Time},
β = List_{Id×Time×Price},
γ = Id × Period,
δ = Price,
c(q, x) = isEmpty(q ∩ x),
kb(q, x) = 0,
kl((i, p), y) = Σ[w | (j, t, w) ∈ y, i = j, t ∈ p],
x ⊕ y = x + y,
cc(q, x, x_•) = q ⊆ x_•,
kc(q, x, x_•) = 0,

where [... | ...] is a notation for list comprehension and Σ[... | ...] denotes list reduction with +.
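In C++, the leaf-level parameters above might be spelled out as follows; Trade, Period, and the surrounding types are illustrative assumptions for the example.

#include <vector>

struct Trade  { int id; double time, price; };   // an element of Id × Time × Price
struct Period { double from, to; };
struct Query  { int id; Period period; };        // γ = Id × Period

// kl: sum up the prices of the trades in a leaf that match the queried
// trader and fall within the queried period (ι_⊕ = 0, ⊕ = +).
double kl(const Query& q, const std::vector<Trade>& leaf) {
    double sum = 0.0;
    for (const Trade& t : leaf)
        if (t.id == q.id && q.period.from <= t.time && t.time <= q.period.to)
            sum += t.price;
    return sum;
}

// kb and kc contribute nothing: a pruned or skipped subspace contains no
// trade that the query is interested in.
double kb(const Query&) { return 0.0; }
double kc(const Query&) { return 0.0; }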

If we perform distributed parallelization, we obtain a distributed database immediately. Since the insertion of trades into this distributed T does not necessitate mutual exclusion except for mailboxes, it is suited to data that is updated every day. Although balanced k-d trees are known to be appropriate for range queries, the advantage of our recursively segmented trees is that their (re)construction does not have to take the definition of the space into account.

12.5.2 N-body Problem

The N-body problem is to calculate the contributions among N particles. Although the direct method costs O(N^2) time, the Barnes-Hut algorithm, which utilizes an approximation, costs O(N log N) time. It typically uses an octree to hold N particles in the three-dimensional space. The main idea of the Barnes-Hut algorithm is to approximate faraway particles by using a subspace containing them. This is a pattern of iter where a space-partitioning tree contains all querying points.

For example, in gravitation simulations, a query calculates a force of type Force, which is a three-dimensional vector. Particles (i.e., mass points) are of type Particle, which has to hold a pair of a mass and a position for gravitation. The subspaces of the three-dimensional space are of type Rect_3. Each node of a given octree has a pair of its subspace and a particle that approximates the particles contained in its subspace. In addition to this pair, each leaf subspace has a particle list of type List_Particle. The types and parameters of iter for the Barnes-Hut algorithm are defined as follows.

α = Rect_3 × Particle,
β = Rect_3 × Particle × List_Particle,
γ = Particle,
δ = Force,
c(q, x) = isFaraway(q, x),
kb(q, (r, p)) = force(p, q),
kl(q, (r, p, L)) = if c(q, (r, p)) then force(p, q) else Σ[force(p, q) | p ∈ L, p ≠ q],
x ⊕ y = x + y,
cc(q, x, x_•) = isFaraway(q, x − x_•),
kc(q, x, x_•) = force(p, q), where (r, p) = x − x_•,

where isFaraway(q, x) denotes a faraway criterion of particle q from subspace x, force(p, q) denotes a force from particle p to particle q, and + denotes vector addition. Although SPTree_{α,β} is a binary tree, it can represent any octree naturally by dividing each dimension successively.

In the definitions of cc and kc above, we construct a one-hole subspace through the subtraction over Rect_3 × Particle. Although the subtraction over Rect_3 is immediate, the subtraction over Particle requires some consideration. Recall that the Particle part denotes an approximation of the points contained in a subspace. In the Barnes-Hut algorithm, the total mass and the center of mass are usually used for this approximation. In this case, we can define the addition over Particle as the calculation of the total mass and the center of mass, and then define the subtraction as its cancellation operation. In addition, we have to define isFaraway(q, x) not only for x of type Rect_3 but also for one-hole subspaces. Although a safe definition of isFaraway is easy, a reasonable definition is important for effective pruning.
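One possible realization of this addition and its cancellation is sketched below in C++, assuming the usual total-mass/center-of-mass approximation; the subtraction is meaningful only when the subtrahend was previously accumulated into the minuend.

struct Vec3     { double x, y, z; };
struct Particle { double mass; Vec3 com; };   // total mass and center of mass

static Vec3 scale(const Vec3& v, double s) { return {v.x * s, v.y * s, v.z * s}; }
static Vec3 plus (const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 minus(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Addition: combine two approximations into the approximation of their union.
Particle add(const Particle& a, const Particle& b) {
    const double m = a.mass + b.mass;
    return {m, scale(plus(scale(a.com, a.mass), scale(b.com, b.mass)), 1.0 / m)};
}

// Cancellation: remove part from whole, recovering the approximation of the
// remaining one-hole subspace.
Particle subtract(const Particle& whole, const Particle& part) {
    const double m = whole.mass - part.mass;
    return {m, scale(minus(scale(whole.com, whole.mass), scale(part.com, part.mass)), 1.0 / m)};
}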

12.5.3 Ray Tracing

Ray tracing is a popular problem in computer graphics. It traces lines of sight (i.e., view rays) from a viewpoint to objects settled in the three-dimensional space. An intersection point of a view ray with an object leads to a pixel of the resultant image. A given set of objects settled in the three-dimensional space is called a scene. In ray tracing algorithms, a bounding volume hierarchy (BVH) is often used for detecting intersections in a scene. A BVH decomposes a bounding box into finer ones and holds an object at each leaf. It enables us to prune intersection tests. Photon mapping is the most popular algorithm for ray tracing. It scatters many photons over a scene in advance and usually stores them in a k-d tree. Then, it collects photons around an intersection point from the k-d tree and estimates the illumination at the point.

We can apply iter to both cases. In both cases, subspaces are of type Rect_3.

Each leaf of a BVH holds an object of type Object, which is a supertype of Rect_3. A querying point is a view ray of type Ray. A query calculates an intersection position of type Position. In the case of

BVHs, the types and parameters of iter are defined as follows.

α = Rect_3,
β = Object,
γ = Ray,
δ = Position,
c(q, x) = (intersect(q, x) ≡ p_∞),
kb(q, x) = p_∞,
kl(q, y) = intersect(q, y),
x ⊕ y = nearer(x, y),
cc(q, x, x_•) = (intersect(q, x − x_•) ≡ p_∞),
kc(q, x, x_•) = p_∞,

where intersect of type Ray × Object → Position calculates an intersection position, nearer of type Position × Position → Position returns the position nearer to the viewpoint, and p_∞ denotes the position of the point at infinity. Each query and intersect return p_∞ to denote no intersection.

In photon mapping, photons are stored in the leaves of a k-d tree and a query calculates the k-nearest neighbors of a given point under some bounded range [Jen00]. To calculate a set of k-nearest neighbors, a size-bounded mergeable heap is useful. We consider an abstract data type Heap^k_α that denotes a mergeable heap of k elements of type α. Its constructor initHeap takes an initial set of elements and a weight function w used to calculate the weight of an element for sorting. The merge operation is defined over heaps that have the same weight function. Since the range of neighbors is bounded, a querying point is a ball of type Ball that consists of a center position and a radius. Letting Photon be the type of photons, we can define the types and parameters of iter as follows:

α = Rect_3,
β = Photon,
γ = Ball,
δ = Heap^k_Photon,
c(q, x) = isEmpty(q ∩ x),
kb(q, x) = ∅,
kl(q, p) = initHeap({p}, distFrom_q),
H ⊕ H′ = merge(H, H′),
cc(q, x, x_•) = q ⊆ x_•,
kc(q, x, x_•) = ∅,

where distFrom_b denotes a function that calculates the distance from the center of ball b to a given photon.
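A hedged sketch of Heap^k in C++ follows, using a standard priority queue that evicts the farthest photon on overflow; Photon, Ball, and the weight function are illustrative assumptions.

#include <cmath>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Photon { double x, y, z; };
struct Ball   { double cx, cy, cz, radius; };

// distFrom_b as in the text: the distance from the center of ball b to a photon.
double distFrom(const Ball& b, const Photon& p) {
    return std::sqrt((p.x - b.cx) * (p.x - b.cx) +
                     (p.y - b.cy) * (p.y - b.cy) +
                     (p.z - b.cz) * (p.z - b.cz));
}

// A size-bounded mergeable heap keeping the k photons of smallest weight.
class BoundedHeap {
public:
    BoundedHeap(std::size_t k, std::function<double(const Photon&)> weight)
        : k_(k), weight_(std::move(weight)) {}

    void insert(const Photon& p) {
        heap_.push({weight_(p), p});
        if (heap_.size() > k_) heap_.pop();   // evict the current farthest photon
    }

    // merge: both heaps are assumed to use the same weight function.
    void merge(BoundedHeap& other) {
        while (!other.heap_.empty()) { insert(other.heap_.top().second); other.heap_.pop(); }
    }

private:
    using Entry = std::pair<double, Photon>;
    struct ByWeight {
        bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
    };
    std::size_t k_;
    std::function<double(const Photon&)> weight_;
    std::priority_queue<Entry, std::vector<Entry>, ByWeight> heap_;   // max-heap by weight
};

Under these assumptions, initHeap({p}, distFrom_q) corresponds to constructing a BoundedHeap whose weight function captures the querying ball q and then inserting p.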

12.5.4 Nearest-Neighbor Classifiers

In nearest-neighbor querying by iter, a bounded range of neighbors is important because it is used in top-down pruning. Without this bounded range, a querying point would traverse the whole tree and incur a terrible slowdown. While bounded ranges are natural in photon mapping, k-nearest-neighbor classifiers, which are extensively used in pattern recognition, do not assume bounded ranges in general. In the unbounded-range case, we generally perform bottom-up pruning by using an intermediate result of the k-nearest neighbors. iter cannot deal with this bottom-up pruning. One approach to dealing with this limitation is multi-pass top-down pruning. Specifically, in the first pass, we find a leaf on the basis of a querying point with an empty range and then approximate a range of neighbors from the point(s) in the found leaf. In the second pass, we perform a range-bounded k-nearest-neighbor query with the approximated range. If we do not obtain k neighbors, we retry a range-bounded k-nearest-neighbor query after relaxing the current range bound a little. From the third pass onward, one-hole range queries suffice for accumulating nearest neighbors. We can implement bottom-up pruning itself straightforwardly on (buffered) recursively segmented trees. In this case, it will be reasonable to perform bottom-up pruning at the root of each traversed segment on each level. This, however, complicates implementation compared to iter. Because last-level segments would provide reasonably approximated ranges in iterative top-down pruning, it would be sufficient for k-nearest-neighbor queries to extend iter in a segment-aware manner.
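The control flow of the multi-pass approach described above is sketched below in C++; the per-pass tree queries are supplied as callables because only the pass structure matters here, the doubling of the radius is just one illustrative way of relaxing the bound "a little", and the tree is assumed to hold at least k points.

#include <cstddef>
#include <functional>
#include <vector>

struct Point { double x, y, z; };

std::vector<Point> multi_pass_knn(
    const Point& q, std::size_t k,
    // Pass 1: descend with an empty range and return a radius approximated
    // from the point(s) in the leaf that was found.
    const std::function<double(const Point&)>& approximate_radius,
    // Passes 2, 3, ...: a range-bounded k-nearest-neighbor query.
    const std::function<std::vector<Point>(const Point&, double)>& range_knn) {
    double radius = approximate_radius(q);
    std::vector<Point> neighbors = range_knn(q, radius);
    while (neighbors.size() < k) {            // not enough neighbors: relax the bound a little
        radius *= 2.0;                        // from the third pass onward, a one-hole range
        neighbors = range_knn(q, radius);     // query over the widened annulus would suffice
    }
    return neighbors;
}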

12.6 Related Work

We discuss related work on various aspects.

12.6.1 Enhancing Locality in Tree Traversal

Jo et al.'s work [JK11, JK12, JGK13] is the most relevant and similar to ours. Both their work and ours have the same purpose of providing non-application-specific techniques to enhance locality and to parallelize, and both target almost the same computational patterns. However, while their work aims at automatic optimization, our work aims at the abstraction of patterns. Technically, traversal splicing [JK12] is very similar to the notion of buffered recursively segmented trees. Their traversal splicing divides a traversal into ones on the top half and on the bottom half, records querying points at the horizontal boundary, and then interleaves top-half traversal and bottom-half traversal. Our buffered recursively segmented trees also record querying points at the boundaries of segments. The shapes of the boundaries and the ways of recording querying points are, however, different. The boundary in their traversal splicing is not recursive, which means that they implicitly assumed balanced trees. Their traversal splicing records a querying point with a trace from the root and does not record a querying point into multiple nodes on the boundary. That is, their traversal splicing is based on the snapshot/restore of tree traversal. Our approach is based on the divide and conquer of tree traversal, which is advantageous for distributed parallelization of tree traversal. Last but not least, they did not analyze cache complexity at all. Although we have not analyzed iter itself, we have examined situations where our techniques would be effective for cache complexity, through a priori analysis of model cases.

12.6.2 Cache-Efficient/-Oblivious Trees

Cache-oblivious algorithms [FLPR99] are tuning-free yet cache-efficient algorithms. Many cache-oblivious data structures, whose operations are cache-oblivious, have been developed. Cache-oblivious trees designed for spatial data include, for example, B-trees [MAB00, BDFC05, BFCF+07], k-d trees [AADHM03], and mesh layouts [BKTW11]. The main advantage of cache-oblivious data structures is portability: they work efficiently regardless of cache sizes. Moreover, these B-trees and k-d trees also guarantee the cache complexity of updating, which our data structures do not guarantee. All in all, these cache-oblivious trees are more useful than ours for a simple single query. Our data structures are, however, not always poorer than these cache-oblivious ones. For example, a root-to-leaf traversal of the B-trees [MAB00, BDFC05, BFCF+07] and k-d trees [AADHM03] costs the same O(log_L N) cache misses as our recursively segmented trees. In addition, it is difficult to compare the cache complexity of iterative tree traversal, i.e., iter, for which our buffered recursively segmented trees are designed. In this sense, our data structures are incomparable to the cache-oblivious trees above. Cache-efficient (and cache-aware) tree data structures have also been developed, e.g., sequence heaps [San00] and BVHs [YM06]. The design of buffered recursively segmented trees is similar to that of sequence heaps in its hierarchical buffering with parameterized fan-outs and buffer sizes. Cache-efficient/-oblivious trees basically suppose balanced trees or, in particular, complete binary trees. A way of balancing trees relies somewhat on the definition of a specific space-partitioning tree. Our approach does not suppose balanced trees. Imbalanced trees are often simpler and/or more efficient in practice.

For example, many N-body simulations use octrees, which can be imbalanced, and imbalanced k-d trees were efficient for photon mapping [WGS04]. The main advantage of our approach based on m-bridging is its genericity regarding the shapes of trees.

12.6.3 Iterative Search

In state-space search (or combinatorial search), we can find iterative patterns such as iterative deepening depth-first search, IDA* search, and Monte-Carlo tree search. While the traversal that we deal with traverses tree data structures, these searches traverse a state space, and the traces of search form trees only in hindsight. A state is itself a value used in search, and the next states can be calculated from the current state. These searches are thus state transitions rather than tree traversals, and their locality issues are quite different from those of our iterative tree traversal. Since state-space search is state transition, locality in the transition table is important. Actually, a distributed parallelization regarding tables [RPBS99] was quite effective for IDA* search [RPBS99] and Monte-Carlo tree search [YKK+11]. As seen from these results, both locality enhancement and parallelization rely strongly on data structures.

12.6.4 Data-Parallel Skeletons

In parallel programming, parallel computing patterns are called skeletons [Col89, RG02]. Those which exploit the parallelism of data structures are called data-parallel skeletons. Since iter exploits the parallelism of given data structures, we can regard it as a data-parallel skeleton. What kind of skeleton is iter, then? The map over querying points is a list skeleton [Ski93]. The pruning and reduction of space-partitioning trees can be implemented with tree skeletons [Ski96, GCS94]. Moreover, space-partitioning trees are based on the freedom in dividing a given multidimensional space; in the two-dimensional case, this trait is closely related to the algebra used in matrix skeletons [EHKT07]. iter is actually a chimera of different data-parallel skeletons. This leads to an interesting observation on skeletons: even though the applications of matrix skeletons and tree skeletons are very limited, a chimera of them can have important applications.

12.7 Conclusion

In this chapter, we have presented an application of m-bridging to the design of data structures for iterative tree traversal. Our parallel pattern iter, which formulates typical iterative tree traversal, can handle important applications on our data structures.

We plan to implement iter and our data structures and to evaluate them experimentally by using important applications. A further step that we consider for future work is to deal with fast algorithms for the N-body problem. Many problems in statistical learning are known to be formalized as a kind of N-body problem [GM01]. In Riegel's recent work [Rie13], a wider range of problems were formalized as the generalized N-body problem and a fast algorithm for solving it was presented. If we incorporated these with parallel patterns on our data structures, our approach would be more valuable.

Chapter 13

Towards Neighborhood Abstractions

In this part, we have investigated parallel programming for neighborhood computations in cache-efficient and divide-and-conquer manners. Unfortunately, we have not built a technical connection between the work in Chapter 11 and that in Chapter 12, which deal with seemingly different computations. Both stencil computation and the querying of space-partitioning trees are, however, conceptually very close.

Stencil computation is usually considered a regular array-based computation. In this sense, it seems to be out of the scope of tree-based computations. This regularity of stencil computation is, however, derived from a uniform grid. If we adopt an adaptive grid, stencil computation becomes irregular. In fact, adaptive mesh refinement for partial differential equation solvers is considered as adaptive stencils [DS06], and its load balancing uses space-partitioning trees [Mit07]. Moreover, grids can be hierarchical in multigrid methods [BHM00]. In this case, stencil computation becomes a tree-based computation. In querying space-partitioning trees, we find neighbors by visiting spatial nodes. We can also consider these visited nodes as neighbors of various granularities. In this sense, a query is a tree-shaped stencil. Besides, in the domain of N-body problems, fast multipole methods (FMMs), which are faster algorithms based on space-partitioning trees, were developed. The original FMM [GR87] performs a typical stencil computation at each spatial level.

These connections between stencil computation and space-partitioning trees are derived from physical laws. Physical contributions such as gravity are, in general, inversely proportional to distance. It is natural to use neighbors for calculating their approximations. Since we always calculate approximations of model formulae in scientific computing on the basis of numerical analysis, the use of neighborhoods is fundamental. Then, to improve the precision of approximations in general, we have to calculate contributions from a far distance. Hierarchical subspaces, which space-partitioning trees represent, enable us to interpret faraway areas adaptively as neighbors. The use of space-partitioning trees is therefore reasonable for improving the precision of approximations. Because of this natural connection, it is desirable to unify abstractions of stencil computation and the querying of space-partitioning trees.

We have pursued a unified abstraction for various neighborhood computations. At the beginning, we conjectured that trees were useful for such a unified abstraction. Through the work in this part, we, however, found trees to be inadequate for abstracting neighborhood computations because a tree structure is merely an axis of a neighborhood subspace. We have to focus on the formalization of neighborhood rather than on overall iterative and/or recursive computations. In fact, from the perspective of abstractions of neighborhood, Chapter 11 formalizes neighborhood as constant band matrices and Chapter 12 formalizes it as a root-to-leaf path in space-partitioning trees. Since these are quite different, it is reasonably difficult to build a technical connection between them. Even if we focus on the formalization of neighborhood, a unified abstraction for parallel neighborhood computations is still difficult to design. This is because the structures of neighborhood affect the global recursions of individual neighborhood computations. For example, the Callahan-Kosaraju algorithms [CK95, Cal93] construct neighborhoods as well-separated pairs on space-partitioning trees.
We can represent these pairs as side links among nodes. However, the construction of these pairs is not a recursion on trees but a recursion on pairs of subspaces. In fact, it contains a peculiar shuffling of siblings from the viewpoint of tree computations. Because the global recursion of a whole computation is of key importance for load balancing,

the variety of global recursions depending on the kinds of neighborhood makes it difficult to design abstractions. We therefore leave it for future work.

Chapter 14

Conclusion

14.1 Summary of Contributions

In this dissertation, we have dealt with parallel programming with trees in a divide-and-conquer manner. Our main contributions in this work are summarized as follows:

• We have designed an iterator-based interface between tree skeletons and tree-structure implementations, and have developed a tree skeleton library in which the two are loosely coupled via our interface. We have demonstrated the benefits of its flexibility experimentally. (Chapter 3)

• We have developed a parallelizer that transforms sequential recursive functions in C into tree skeleton calls with operators based on the formalization by Matsuzaki et al. [MHT06]. It hides the complicated API of tree skeletons from programmers and brings the benefits of tree skeletons with no burden on programmers. (Chapter 4)

• We have developed a novel syntax-directed divide-and-conquer method of data-flow analysis based on Tarjan’s formalization [Tar81a] and Rosen’s high-level approach [Ros77, Ros80]. It can deal with arbitrary goto/label statements. We have demonstrated the feasibility and scalability of our method experimentally through prototype implementations on a C compiler. (Chapter 7)

• We have developed a syntax-directed divide-and-conquer method for constructing value graphs on the basis of a functional formalization of value graphs and Rosen’s high-level approach [Ros77, Ros80]. It tames goto/label statements troublesome in ϕ-function placement. (Chapter 8)

• We have developed a linear algebraic approach to optimizing stencil computation. We have presented its concept, techniques to implement it, and its effects on asymptotic complexities. We have demonstrated its performance gain experimentally through a prototype library. (Chapter 11)

• We have developed techniques for enhancing the asymptotic cache complexity of iterative traversal of space-partitioning trees, on the basis of its skeletal formalization and m-bridging [Rei93]. We have explored the applicability of our approach in practically important problems. (Chapter 12)

14.2 Retrospection

As concluding remarks of this dissertation, we describe a retrospection on our work and summarize our observations from practice. First of all, we should clarify the meaning of the structured approaches that we have pursued. As seen from the fact that the notion of structured programming [BJ66, Dij68, Knu74, DDH72] was controversial, it is difficult to define the notion of "structured" formally and reasonably. We think that a little ambiguity of "structured" is unavoidable. Yet, we intentionally use "structured" because we consider it to be the principle of programming and abstraction. "A structured manner" in this dissertation roughly means

to use or obey some pattern. This pattern may be highly abstract or a little ambiguous. We think that attitudes towards patterns, rather than patterns themselves, are of huge significance. For example, it is important to discover patterns, to apply patterns, to compose patterns, and especially, to restrict oneself to some pattern. In other words, a disciplined attitude through patterns is of true significance. To pursue a structured approach is therefore to investigate how to restrict oneself to a pattern or style that is reasonable for one's concerns and/or domains.

Our primary concern is load balancing for parallel computing. Computations that are easy to load balance are of crucial importance for parallel programs portable among different parallel machines. We have adopted the divide-and-conquer paradigm as the primary pattern for this concern because divide-and-conquer algorithms are ready both for parallelization and for load balancing. We have restricted ourselves to the divide-and-conquer paradigm. How "structured" our approaches are is therefore measured by how much of a computation is organized in a divide-and-conquer manner. We have been focusing on the use of trees because of their versatility in programming and their affinity with the divide-and-conquer paradigm.

Our work started from tree skeletons (Chapter 2) because tree skeletons were indeed patterns that enabled divide-and-conquer load balancing and because the work on list skeletons achieved solid results both in theory and in practice. Our work, in fact, resulted in several improvements in the usability of tree skeletons (Chapters 3 and 4). Yet, tree skeletons did not become useful for actual problems. By rethinking list skeletons, we noticed that tree skeletons are unnatural and inherently not intuitive for programming with trees (Chapter 5). A tree is usually a representation equipped with an interpretation. The interpretation of a tree often determines an essential structure of the tree and the operators used with the tree. Because tree skeletons operate on trees without any interpretation, it is natural that they are difficult to use. We therefore moved away from tree skeletons.

Instead of tree skeletons, we next adopted syntax-directed programming. This is to describe an interpretation of trees in their syntax, to define calculations on their syntax, and to perform load balancing by utilizing the interpretation of trees. This is a typical style in functional programming, has been used to formalize list skeletons, and obeys the divide-and-conquer paradigm by definition. We selected AST-based program analysis as a case study of syntax-directed programming. Our work on AST-based program analysis resulted successfully in syntax-directed divide-and-conquer methods (Chapters 7 and 8). We also developed a way to deal with goto-derived data flow, which is a typical irregularity in syntax-directed computations. We now have two AST-based methods on the same language. This result justified our separation from tree skeletons. The interpretation of trees is more important than tree structure.

Because neighborhood computations are very popular in various domains, we have dealt with cache-efficient divide-and-conquer programming for neighborhood computations (Chapters 11 and 12). Unfortunately, we have not developed a unified tree-based abstraction for various neighborhood computations because the abstraction of neighborhood subspaces is inherently beyond trees (Chapter 13).
However, by limiting the interpretation of trees to space partitioning and formalizing neighborhood as a root-to-leaf path in trees, we have found another usage of segmented trees (Chapter 12). In this case, tree structures have been, in fact, non-essential to either specification or abstraction. This suggests that what we should truly consider is not the structure of trees but their interpretation.

In summary, we have learnt the following lessons from practice:

• The interpretation of trees is an essential issue in programming with trees.

• Syntax-directed programming is both reasonable and useful for parallel programming.

• Tree structures can be non-essential to either specification or abstraction even in application domains in which trees are extensively used.

Although these lessons are not outstandingly novel, they are reasonable; they are realities.

Bibliography

[AADHM03] Pankaj K. Agarwal, Lars Arge, Andrew Danner, and Bryan Holland-Minkley. Cache-Oblivious Data Structures for Orthogonal Range Searching. In Proc. SCG ’03, pages 237–245. ACM, 2003.

[AC76] Frances E. Allen and John Cocke. A Program Data Flow Analysis Procedure. Commun. ACM, 19(3):137–147, 1976.

[ADKP89] Karl R. Abrahamson, Norm Dadoun, David G. Kirkpatrick, and Teresa M. Przytycka. A Simple Parallel Tree Contraction Algorithm. J. Algorithms, 10(2):287–302, 1989.

[AH00] John Aycock and R. Nigel Horspool. Simple Generation of Static Single-Assignment Form. In Compiler Construction (Proc. CC ’00), volume 3923 of Lecture Notes in Computer Science, pages 110–124. Springer, 2000.

[AK01] Randy Allen and Ken Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers, 2001.

[AKNR12] Aws Albarghouthi, Rahul Kumar, Aditya V. Nori, and Sriram K. Rajamani. Parallelizing Top-Down Interprocedural Analyses. In Proc. PLDI ’12, pages 217–228. ACM, 2012.

[AS07] Shai Avidan and Ariel Shamir. Seam Carving for Content-Aware Image Resizing. In Proc. SIGGRAPH ’07. ACM, 2007.

[AWZ88] Bowen Alpern, Mark N. Wegman, and F. Kenneth Zadeck. Detecting Equality of Variables in Programs. In Proc. POPL ’88, pages 1–11. ACM, 1988.

[Bak77] Brenda S. Baker. An Algorithm for Structuring Flowgraphs. J. ACM, 24(1):98–120, 1977.

[BBH+13] Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon, and Andreas Zwinkau. Simple and Efficient Construction of Static Single Assignment Form. In Compiler Construction (Proc. CC ’13), volume 7791 of Lecture Notes in Computer Science, pages 102–122. Springer, 2013.

[BCH+94] Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha. Implementation of a Portable Nested Data-Parallel Language. J. Parallel Distrib. Comput., 21(1):4–14, 1994.

[BCHS98] Preston Briggs, Keith D. Cooper, Timothy J. Harvey, and L. Taylor Simpson. Practical Improvements to the Construction and Destruction of Static Single Assignment Form. Softw. Pract. Exp., 28(8):859–881, 1998.

[BCS97] Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. Value Numbering. Softw. Pract. Exp., 27(6):701–724, 1997.

[BDFC05] Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees. SIAM J. Comput., 35(2):341–358, 2005.


[BFCF+07] Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. Cache-Oblivious Streaming B-trees. In Proc. SPAA ’07, pages 81–92. ACM, 2007.

[BFGS12] Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. Internally Deterministic Parallel Algorithms Can Be Fast. In Proc. PPoPP ’12, pages 181–192. ACM, 2012.

[BFR+13] Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. Data-Only Flattening for Nested Data Parallelism. In Proc. PPoPP ’13, pages 81–92. ACM, 2013.

[BGS10] Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri. Low Depth Cache-Oblivious Algorithms. In Proc. SPAA ’10, pages 189–199. ACM, 2010.

[BHM00] William L. Briggs, Van Emden Henson, and Steve F. McCormick. A Multigrid Tutorial. SIAM, 2nd edition, 2000.

[BHMS91] Mark Bromley, Steven Heller, Tim McNerney, and Guy L. Steele Jr. Fortran at ten gigaflops: The connection machine convolution compiler. In Proc. PLDI ’91, pages 145–156. ACM, 1991.

[BHRS08] Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. A Practical Automatic Polyhedral Program Optimization System. In Proc. PLDI ’08, pages 101–113. ACM, 2008.

[BHT+10] Muthu Manikandan Baskaran, Albert Hartono, Sanket Tavarageri, Tom Henretty, J. Ramanujam, and P. Sadayappan. Parameterized Tiling Revisited. In Proc. CGO ’10, pages 200–209. ACM, 2010.

[Bir87] Richard S. Bird. An Introduction to the Theory of Lists. In M. Broy, editor, Logic of Programming and Calculi of Discrete Design, volume 36 of NATO ASI Series F, pages 3–42. Springer-Verlag, 1987.

[BJ66] Corrado Böhm and Giuseppe Jacopini. Flow Diagrams, Turing Machines and Languages with Only Two Formation Rules. Commun. ACM, 9(5):366–371, 1966.

[BKTW11] Michael A. Bender, Bradley C. Kuszmaul, Shang-Hua Teng, and Kebin Wang. Optimal Cache-Oblivious Mesh Layouts. Theory Comput. Syst., 48(2):269–296, 2011.

[Ble90] Guy E. Blelloch. Vector Models for Data-Parallel Computing. The MIT Press, 1990.

[Ble93] Guy E. Blelloch. Prefix Sums and Their Applications. In Synthesis of Parallel Algorithms, chapter 1. Morgan Kaufmann Publishers, 1993.

[BM94] Marc M. Brandis and Hanspeter Mössenböck. Single-Pass Generation of Static Single-Assignment Form for Structured Languages. ACM Trans. Program. Lang. Syst., 16(6):1684–1698, 1994.

[BMO90] Robert A. Ballance, Arthur B. Maccabe, and Karl J. Ottenstein. The Program Dependence Web: A Representation Supporting Control-, Data-, and Demand-Driven Interpretation of Imperative Languages. In Proc. PLDI ’90, pages 257–271. ACM, 1990.

[BP03] Gianfranco Bilardi and Keshav Pingali. Algorithms for Computing the Static Single Assignment Form. J. ACM, 50(3):375–425, 2003.

[Cal93] Paul B. Callahan. Optimal Parallel All-Nearest-Neighbors Using the Well-Separated Pair Decomposition. In Proc. FOCS ’93, pages 332–340. IEEE, 1993.

[CCF91] Jong-Deok Choi, Ron Cytron, and Jeanne Ferrante. Automatic Construction of Sparse Data Flow Evaluation Graphs. In Proc. POPL ’91, pages 55–56. ACM, 1991. [Cel12] Joe Celko. Joe Celko’s Trees and Hierarchies in SQL for Smarties. Morgan Kaufmann Publishers, 2 edition, 2012. [CFR`91] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. EFficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst., 13(4):451–490, 1991. [CK95] Paul B. Callahan and S. Rao Kosaraju. A Decomposition of Multidimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields. J. ACM, 42(1):67– 90, 1995. [Col89] Murray Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. Research Monographs in Parallel and Distributed Computing. MIT Press, 1989. [CRP`10] Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In Proc. PLDI ’10, pages 363–375. ACM, 2010. [CS70] John Cocke and Jacob T. Schwartz. Programming Languages and Their Compilers: Pre- liminary Notes. Technical report, Courant Institute of Mathematical Sciences, New York Univ., 1970. Second Version. [DDH72] Ole-Johan Dahl, Edsger W. Dijkstra, and C. A. R. Hoare, editors. Structured Programming. Academic Press, 1972. [DG04] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proc. OSDI ’04, pages 137–150. USENIX, 2004. [DGTY95] John Darlington, Yi-ke Guo, Hing Wing To, and Jin Yang. Parallel Skeletons for Structured Composition. In Proc. PPoPP ’95, pages 19–28. ACM, 1995. [Dij68] Edsger W. Dijkstra. Letters to the Editor: Go to Statement Considered Harmful. Commun. ACM, 11(3):147–148, 1968. [DMV`08] Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, and Katherine A. Yelick. Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures. In Proc. SC ’08, pages 4:1–4:12. IEEE, 2008. [DS06] H. Ding and C. Shu. A stencil adaptive algorithm for finite difference solution of incom- pressible viscous flows. J. Comput. Phys., 214:397–420, 2006. [DST80] Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations on the Common Subex- pression Problem. J. ACM, 27(4):758–771, 1980. [EFH12] Kento Emoto, Sebastian Fischer, and Zhenjiang Hu. Filter-embedding semiring fusion for programming with MapReduce. Formal Asp. Comput., 24(4-6):623–645, 2012. [EH07] Torbjörn Ekman and Görel Hedin. The JastAdd system: modular extensible compiler construction. Sci. Comput. Program., 69(1–3):14–26, 2007. [EHKT07] Kento Emoto, Zhenjiang Hu, Kazuhiko Kakehi, and Masato Takeichi. A Compositional Framework for Developing Parallel Programs on Two-Dimensional Arrays. Int. J. Parallel Program., 35(6):615–658, 2007. [EI12] Kento Emoto and Hiroto Imachi. Parallel Tree Reduction on MapReduce. In Proc. ICCS ’12, volume 9 of Procedia Computer Science, pages 1827–1836. Elsevier, 2012. 136 BIBLIOGRAPHY

[EM14] Kento Emoto and Kiminori Matsuzaki. An Automatic Fusion Mechanism for Variable- Length List Skeletons in SkeTo. Int. J. Parallel Program., 42(4):546–563, 2014. In Proc. HLPP ’13. [FG94] Allan L. Fisher and Anwar M. Ghuloum. Parallelizing Complex Scans and Reductions. In Proc. PLDI ’94, pages 135–146. ACM, 1994. [FLPR99] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache- Oblivious Algorithms. In Proc. FOCS ’99, pages 285–297. IEEE, 1999. [FLR98] Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In Proc. PLDI ’98, pages 212–223. ACM, 1998. [FS05] Matteo Frigo and Volker Strumpen. Cache oblivious stencil computations. In Proc. ICS ’05, pages 361–366. ACM, 2005. [FS06] Matteo Frigo and Volker Strumpen. The Cache Complexity of Multithreaded Cache Obliv- ious Algorithms. In Proc. SPAA ’06, pages 271–280. ACM, 2006. [FS07] Matteo Frigo and Volker Strumpen. The Memory Behavior of Cache Oblivious Stencil Computations. J. Supercomput., 39(2):93–112, 2007. [GCS94] Jeremy Gibbons, Wentong Cai, and David B. Skillicorn. Efficient Parallel Algorithms for Tree Accumulations. Sci. Comput. Program., 23(1):1–18, 1994. [GF64] Bernard A. Galler and Michael J. Fisher. An Improved Equivalence Algorithm. Commun. ACM, 7(5):301–303, 1964. [GGL12] Tobias Grosser, Armin Größlinger, and Christian Lengauer. Polly—performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett., 22(4), 2012. [GL05] Douglas Gregor and Andrew Lumsdaine. Lifting Sequential Graph Algorithms for Dis- tributedMemory Parallel Computation. In Proc. OOPSLA ’05, pages 423–437. ACM, 2005. [GM01] Alexander G. Gray and Andrew W. Moore. ‘N-Body’ Problems in Statistical Learning. In Advances in Neural Information Processing Systems 13 (NIPS 2000), pages 521–527, 2001. [GR87] Leslie Greengard and Vladimir Rokhlin. A Fast Algorithm for Particle Simulations. J. Comput. Phys., 73(2):325–348, 1987. [GVL10] Horacio González-Vélez and Mario Leyton. A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Softw. Pract. Exp., 40(12):1135–1160, 2010. [HFAR13] Harshvardhan, Adam Fidel, Nancy M. Amato, and Lawrence Rauchwerger. The STAPL Parallel Graph Library. In Languages and Compilers for Parallel Computing (LCPC ’12, Revised Selected Papers), volume 7760 of Lecture Notes in Computer Science, pages 46–60. Springer, 2013. [HPP09] Mary W. Hall, David A. Padua, and Keshav Pingali. Compiler Research: The Next 50 Years. Commun. ACM, 52(2):60–67, 2009. [HSP`11] Tom Henretty, Kevin Stock, Louis-Noël Pouchet, Franz Franchetti, J. Ramanujam, and P. Sadayappan. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures. In Compiler Construction (Proc. CC ’11), volume 6601 of Lecture Notes in Computer Science, pages 225–245. Springer, 2011. [HTI99] Zhenjiang Hu, Masato Takeichi, and Hideya Iwasaki. Diffusion: Calculating Efficient Par- allel Programs. In Proc. PEPM ’99, pages 85–94. ACM, 1999. BIBLIOGRAPHY 137

[IKF05] Yusuke Ichikawa, Zenjiro Konishi, and Yoshihiko Futamura. Recursion Removal from Re- cursive Programs with Only One Descent Function. IEICE Trans. Info. Sys., E88-D(2):187– 196, 2005. [Jen00] Henrik Wann Jensen. A Practical Guide to Global Illumination Using Photon Map- ping. SIGGRAPH 2000 Course 8, July 2000. http://graphics.stanford.edu/courses/ cs348b-01/course8.pdf. [JGK13] Youngjoon Jo, Michael Goldfarb, and Milind Kulkarni. Automatic Vectorization of Tree Traversals. In Proc. PACT ’13, pages 363–374. IEEE, 2013. [JK11] Youngjoon Jo and Milind Kulkarni. Enhancing Locality for Recursive Traversals of Recur- sive Structures. In Proc. OOPSLA ’11, pages 463–482. ACM, 2011. [JK12] Youngjoon Jo and Milind Kulkarni. Automatically Enhancing Locality for Tree Traversals with Traversal Splicing. In Proc. OOPSLA ’12, pages 355–374. ACM, 2012. [KBB`07] Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Effective Automatic Parallelization of Stencil Computa- tions. In Proc. PLDI ’07, pages 235–244, 2007. [KBI`09] Milind Kulkarni, Martin Burtscher, Rajeshkar Inkulu, Keshav Pingali, and Calin Casçaval. How Much Parallelism is There in Irregular Applications? In Proc. PPoPP ’09, pages 3–14. ACM, 2009. [KCL`12] Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Ben Lippmeier, and Simon Peyton Jones. Vectorisation avoidance. In Proc. Haskell ’12, pages 37–48. ACM, 2012. [KD94] Uday P. Khedker and Dhananjay M. Dhamdhere. A Generalized Theory of Bit Vector Data Flow Analysis. ACM Trans. Program. Lang. Syst., 16(5):1472–1511, 1994. [KGS94] Robert Kramer, Rajiv Gupta, and Mary Lou Soffa. The Combining DAG: A Technique for Parallel Data Flow Analysis. IEEE Trans. Parallel Distr. Syst., 5(8):805–813, 1994. [Kil73] Gary A. Kildall. A Unified Approach to Global Program Optimization. In Proc. POPL ’73, pages 194–206. ACM, 1973. [KME07] Kazuhiko Kakehi, Kiminori Matsuzaki, and Kento Emoto. Efficient Parallel Tree Re- ductions on Distributed Memory Environments. In Computational Science – ICCS 2007, volume 4488 of Lecture Notes in Computer Science, pages 601–608. Springer, 2007. [Knu68] Donald E. Knuth. Semantics of Context-Free Languages. Math. Syst. Theory, 2(2):127–145, 1968. [Knu74] Donald E. Knuth. Structured Programming with go to Statements. ACM Comput. Surv., 6(4):261–301, 1974. [KS98] Kathleen Knobe and Vivek Sarkar. Array SSA Form and Its Use in Parallelization. In Proc. POPL ’98, pages 107–120. ACM, 1998. [LAAK06] Beatrice Luca, Stefan Andrei, Hugh Anderson, and Siau-Cheng Khoo. Program Transfor- mation by Solving Recurrences. In Proc. PEPM ’06, pages 121–129. ACM, 2006. [LEH14] Yu Liu, Kento Emoto, and Zhenjiang Hu. A Generate-Test-Aggregate Parallel Program- ming Library. Parallel Comput., 40(2):116–135, 2014. [LKK13] Jonathan Lifflander, Sriram Krishnamoorthy, and Laxmikant V. Kale. Steal Tree: Low- Overhead Tracing of Work Stealing Schedulers. In Proc. PLDI ’13, pages 507–518. ACM, 2013. 138 BIBLIOGRAPHY

[LRF95] Yong-Fong Lee, Barbara G. Ryder, and Marc E. Fiuczynski. Region Analysis: A Parallel Elimination Method for Data Flow Analysis. IEEE Software Eng., 21(11):913–926, 1995.

[LTA03] Andrew A. Lamb, William Thies, and Saman Amarasinghe. Linear Analysis and Opti- mization of Stream Programs. In Proc. PLDI ’03, pages 12–25. ACM, 2003.

[MAB00] Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-Oblivious B-Trees. In Proc. FOCS ’00, pages 399–409. IEEE, 2000.

[Mat07a] Kiminori Matsuzaki. Implementation of Tree Accumulations on Distributed-Memory Parallel Computers. In Computational Science – ICCS 2007, volume 4488 of Lecture Notes in Computer Science, pages 609–616. Springer, 2007.

[Mat07b] Kiminori Matsuzaki. Parallel Programming with Tree Skeletons. PhD thesis, The University of Tokyo, 2007.

[ME10] Kiminori Matsuzaki and Kento Emoto. Implementing Fusion-Equipped Parallel Skeletons by Expression Templates. In Implementation and Application of Functional Languages (IFL ’09, Revised Selected Papers), volume 6041 of Lecture Notes in Computer Science, pages 72–89. Springer, 2010.

[MFS79] Ronald J. Mintz, Gerald A. Fisher, and Micha Sharir. The design of a global optimizer. In Proc. SIGPLAN ’79 Symp. Compiler Construction, pages 226–234. ACM, 1979.

[MGL+10] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. Proc. VLDB Endow., 3(1–2):330–339, 2010.

[MHT06] Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. Towards Automatic Parallelization of Tree Reductions in Dynamic Programming. In Proc. SPAA ’06, pages 39–48. ACM, 2006.

[MHT08] Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. Derivation of Parallel Programs for Maximum Marking Problems on Lists. IPSJ Trans. PRO, 49(SIG 3 (PRO 36)):16–27, 2008. In Japanese.

[Mit07] William F. Mitchell. A refinement-tree based partitioning method for dynamic load balancing with adaptively refined grids. J. Parallel Distr. Comput., 67(4):417–429, 2007.

[MLBP12] Mario Méndez-Lojo, Martin Burtscher, and Keshav Pingali. A GPU Implementation of Inclusion-based Points-to Analysis. In Proc. PPoPP ’12, pages 107–116. ACM, 2012.

[MLMP10] Mario Méndez-Lojo, Augustine Mathew, and Keshav Pingali. Parallel Inclusion-based Points-to Analysis. In Proc. OOPSLA ’10, pages 428–443. ACM, 2010.

[MM10] Akimasa Morihata and Kiminori Matsuzaki. Automatic Parallelization of Recursive Functions using Quantifier Elimination. In Functional and Logic Programming (Proc. FLOPS ’10), volume 6009 of Lecture Notes in Computer Science, pages 321–336. Springer, 2010.

[MM11a] Akimasa Morihata and Kiminori Matsuzaki. A Practical Tree Contraction Algorithm for Parallel Skeletons on Trees of Unbounded Degree. In Proc. ICCS ’11, volume 4 of Procedia Computer Science, pages 7–16. Elsevier, 2011.

[MM11b] Akimasa Morihata and Kiminori Matsuzaki. Balanced Trees Inhabiting Functional Parallel Programming. In Proc. ICFP ’11, pages 117–128. ACM, 2011.

[MM14] Kiminori Matsuzaki and Reina Miyazaki. Parallel Tree Accumulations on MapReduce. In Proc. HLPP ’14, pages 31–50, 2014.

[MMHT09] Akimasa Morihata, Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. The Third Homomorphism Theorem on Trees. In Proc. POPL ’09, pages 177–185. ACM, 2009.

[MMM+07] Kazutaka Morita, Akimasa Morihata, Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. Automatic Inversion Generates Divide-and-Conquer Parallel Programs. In Proc. PLDI ’07, pages 146–155. ACM, 2007.

[Mor11] Akimasa Morihata. Macro Tree Transformations of Linear Size Increase Achieve Cost-Optimal Parallelism. In Programming Languages and Systems (Proc. APLAS ’11), volume 7078 of Lecture Notes in Computer Science, pages 204–219. Springer, 2011.

[Mor12] Akimasa Morihata. Parallel Evaluation of Macro Tree Transducers with Parallel Tree Contraction. In Proc. 29th Conference of JSSST, 2012. In Japanese.

[Mor13] Akimasa Morihata. A Short Cut to Parallelization Theorems. In Proc. ICFP ’13, pages 245–256. ACM, 2013.

[MR85] Gary L. Miller and John H. Reif. Parallel Tree Contraction and Its Application. In Proc. FOCS ’85, pages 478–489. IEEE, 1985.

[MR90] Thomas J. Marlowe and Barbara G. Ryder. Properties of data flow frameworks. Acta Inform., 28(2):121–163, 1990.

[MW92] Frank Mueller and David B. Whalley. Avoiding Unconditional Jumps by Code Replication. In Proc. PLDI ’92, pages 322–330. ACM, 1992.

[MW95] Frank Mueller and David B. Whalley. Avoiding Conditional Branches by Code Replication. In Proc. PLDI ’95, pages 56–66. ACM, 1995.

[MW99] John McCalpin and David Wonnacott. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality. Technical Report DCS-TR-379, Dept. of Computer Science, Rutgers University, 1999.

[NEM+07] Yoshiki Nomura, Kento Emoto, Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. Parallelization of XPath Queries with Tree Skeletons. Computer Software, 24(3):51–62, 2007. In Japanese.

[NG13] Vaivaswatha Nagaraj and R. Govindarajan. Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting. In Proc. PACT ’13, pages 19–28. IEEE, 2013.

[OG09] Daniel A. Orozco and Guang R. Gao. Mapping the FDTD Application to Many-Core Chip Architectures. In Proc. ICPP ’09, pages 309–316. IEEE, 2009.

[PBRO11] Aleksandar Prokopec, Phil Bagwell, Tiark Rompf, and Martin Odersky. A Generic Parallel Collection Framework. In Euro-Par 2011 Parallel Processing, volume 6853 of Lecture Notes in Computer Science, pages 136–147. Springer, 2011.

[PDGQ05] Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the Data: Parallel Analysis with Sawzall. Sci. Program., 13(4):277–298, 2005.

[PRMH11] Tarun Prabhu, Shreyas Ramalingam, Matthew Might, and Mary Hall. EigenCFA: Accelerating Flow Analysis with GPUs. In Proc. POPL ’11, pages 511–522. ACM, 2011.

[Rei78] John H. Reif. Symbolic Program Analysis in Almost Linear Time. In Proc. POPL ’78, pages 76–83. ACM, 1978.

[Rei93] John H. Reif. List Ranking and Parallel Tree Contraction. In Synthesis of Parallel Algorithms, chapter 3. Morgan Kaufmann Publishers, 1993.

[RF00] Xavier Redon and Paul Feautrier. Detection of Scans in the Polytope Model. Parallel Algorithms Appl., 15(3–4):229–263, 2000.

[RG02] Fethi A. Rabhi and Sergei Gorlatch, editors. Patterns and Skeletons for Parallel and Distributed Computing. Springer, 2002.

[Rie13] Ryan Nelson Riegel. Generalized N-Body Problems: A Framework for Scalable Computation. PhD thesis, School of Computational Science and Engineering, Georgia Institute of Technology, December 2013.

[RJDH14] Christopher Rodrigues, Thomas Jablin, Abdul Dakkak, and Wen-Mei Hwu. Triolet: A Programming System That Unifies Algorithmic Skeleton Interfaces for High-performance Cluster Computing. In Proc. PPoPP ’14, pages 247–258. ACM, 2014.

[RL77] John H. Reif and Harry R. Lewis. Symbolic Evaluation and the Global Value Graph. In Proc. POPL ’77, pages 104–118. ACM, 1977.

[RL86] John H. Reif and Harry R. Lewis. Efficient Symbolic Analysis of Programs. J. Comput. Syst. Sci., 32(3):280–314, 1986.

[RL11] Jonathan Rodriguez and Ondřej Lhoták. Actor-Based Parallel Dataflow Analysis. In Compiler Construction (Proc. CC ’11), volume 6601 of Lecture Notes in Computer Science, pages 179–197. Springer, 2011.

[RMCKB97] Gerald Roth, John Mellor-Crummey, Ken Kennedy, and R. Gregg Brickner. Compiling Stencils in High Performance Fortran. In Proc. SC ’97, pages 1–20. ACM, 1997.

[Ros77] Barry K. Rosen. High-Level Data Flow Analysis. Commun. ACM, 20(10):712–724, 1977.

[Ros80] Barry K. Rosen. Monoids for Rapid Data Flow Analysis. SIAM J. Comput., 9(1):159–196, 1980.

[RP86] Barbara G. Ryder and Marvin C. Paull. Elimination Algorithms for Data Flow Analysis. ACM Comput. Surv., 18(3):277–316, 1986.

[RPBS99] John W. Romein, Aske Plaat, Henri E. Bal, and Jonathan Schaeffer. Transposition Table Driven Work Scheduling in Distributed Search. In Proc. IAAI ’99, pages 725–731. AAAI, 1999.

[RT82] John H. Reif and Robert Endre Tarjan. Symbolic Program Analysis in Almost-Linear Time. SIAM J. Comput., 11(1):81–93, 1982.

[RWZ88] Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Global Value Numbers and Redundant Computations. In Proc. POPL ’88, pages 12–27. ACM, 1988.

[San00] Peter Sanders. Fast Priority Queues for Cached Memory. ACM J. Exp. Algorithmics, 5:7:1–7:25, 2000.

[Sat14a] Shigeyuki Sato. Locality-Aware Parallel Iterative Tree Traversal. In Proc. 31st Conference of JSSST, 2014. Non-refereed, unpublished.

[Sat14b] Shigeyuki Sato. Syntax-Directed Contraction of Value Graphs. In Proc. 16th JSSST Workshop on Programming and Programming Languages (PPL2014), 2014. In Japanese, unpublished.

[Sha80] Micha Sharir. Structural Analysis: A New Approach to Flow Analysis in Optimizing Compilers. Computer Languages, 5(3–4):141–153, 1980.

[SI11a] Shigeyuki Sato and Hideya Iwasaki. Automatic Parallelization via Matrix Multiplication. In Proc. PLDI ’11, pages 470–479. ACM, 2011.

[SI11b] Shigeyuki Sato and Hideya Iwasaki. Time Contraction: Simplifying Locality Optimization for Stencil Computation. In Proc. 28th Conference of JSSST, 2011. Non-refereed, unpublished.

[Ski93] David B. Skillicorn. The Bird-Meertens Formalism as a Parallel Model. In Software for Parallel Computation, volume 106 of NATO ASI Series, pages 120–133. Springer, 1993.

[Ski96] David B. Skillicorn. Parallel Implementation of Tree Skeletons. J. Parallel Distrib. Comput., 39(2):115–125, 1996.

[Ski97] David B. Skillicorn. Structured Parallel Computation in Structured Documents. J. Univers. Comput. Sci., 3(1):42–68, 1997.

[SM13] Shigeyuki Sato and Kiminori Matsuzaki. An Operator Generator for Skeletal Programming on Trees. IPSJ Trans. PRO, 6(4):38–49, 2013. In Japanese.

[SM14a] Shigeyuki Sato and Kiminori Matsuzaki. A Generic Implementation of Tree Skeletons. In Proc. HLPP ’14, pages 51–72, 2014.

[SM14b] Shigeyuki Sato and Akimasa Morihata. Syntax-Directed Divide-and-Conquer Data-Flow Analysis. In Programming Languages and Systems (Proc. APLAS ’14), volume 8858 of Lecture Notes in Computer Science, pages 392–407. Springer, 2014.

[SNM07] Don Syme, Gregory Neverov, and James Margetson. Extensible Pattern Matching Via a Lightweight Language Extension. In Proc. ICFP ’07, pages 29–40. ACM, 2007.

[SP81] Micha Sharir and Amir Pnueli. Two Approaches to Inter-Procedural Data-Flow Analysis. In Program Flow Analysis: Theory and Applications, chapter 7. Prentice-Hall, 1981.

[SS69] Robert M. Shapiro and Harry Saint. The Representation of Algorithms. Technical report, Applied Data Research, Inc., 1969.

[SS12] Bjarne Stroustrup and Andrew Sutton. A Concept Design for the STL. Technical Report N3351=12-0041, JTC1/SC22/WG21 – The C++ Standards Committee, 2012.

[SW12] James Stanier and Des Watson. A study of irreducibility in C programs. Softw. Pract. Exp., 42(1):117–130, 2012.

[Tar81a] Robert Endre Tarjan. A Unified Approach to Path Problems. J. ACM, 28(3):577–593, 1981.

[Tar81b] Robert Endre Tarjan. Fast Algorithms for Solving Path Problems. J. ACM, 28(3):594–614, 1981.

[TBF+11] Gabriel Tanase, Antal Buss, Adam Fidel, Harshvardhan, Ioannis Papadopoulos, Olga Pearce, Timmie G. Smith, Nathan Thomas, Xiabing Xu, Nedal Mourad, Jeremy Vu, Mauro Bianco, Nancy M. Amato, and Lawrence Rauchwerger. The STAPL Parallel Container Framework. In Proc. PPoPP ’11, pages 235–246. ACM, 2011.

[TCK+11] Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. The Pochoir Stencil Compiler. In Proc. SPAA ’11, pages 117–128. ACM, 2011.

[TSTL09] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality Saturation: A New Approach to Optimization. In Proc. POPL ’09, pages 264–276. ACM, 2009.

[TSTL10] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Translating between PEGs and CFGs. Technical report, University of California, San Diego, October 2010.

[TSTL11] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality Saturation: A New Approach to Optimization. Log. Meth. Comput. Sci., 7(1), 2011.

[Wai90] William M. Waite. Use of Attribute Grammars in Compiler Construction. In Proc. WAGA, pages 255–265, 1990.

[WBGK10] Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: An extensible attribute grammar system. Sci. Comput. Program., 75(1–2):39–54, 2010.

[WdMBK02] Eric Van Wyk, Oege de Moor, Kevin Backhouse, and Paul Kwiatkowski. Forwarding in Attribute Grammars for Modular Language Design. In Compiler Construction (Proc. CC ’02), volume 2304 of Lecture Notes in Computer Science, pages 128–142. Springer, 2002.

[WGS04] Ingo Wald, Johannes Günther, and Philipp Slusallek. Balancing Considered Harmful – Faster Photon Mapping using the Voxel Volume Heuristic –. In Proc. Eurographics ’04, pages 595–604. Blackwell Publishing, 2004.

[WKSB07] Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute Grammar-based Language Extensions for Java. In ECOOP 2007 – Object-Oriented Programming, volume 4609 of Lecture Notes in Computer Science, pages 575–599. Springer, 2007.

[WL91] Michael E. Wolf and Monica S. Lam. A Data Locality Optimizing Algorithm. In Proc. PLDI ’91, pages 30–44. ACM, 1991.

[Won00] David Wonnacott. Using Time Skewing to Eliminate Idle Time due to Memory Bandwidth and Network Limitations. In Proc. IPDPS ’00, pages 171–180. IEEE, 2000.

[WWS+12] Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. Self-Optimizing AST Interpreters. In Proc. DLS ’12, pages 73–82. ACM, 2012.

[WWW+13] Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. One VM to Rule Them All. In Proc. Onward! ’13, pages 187–204. ACM, 2013.

[XKH04] Dana N. Xu, Siau-Cheng Khoo, and Zhenjiang Hu. PType System: A Featherweight Parallelizability Detector. In Programming Languages and Systems (Proc. APLAS ’04), volume 3302 of Lecture Notes in Computer Science, pages 197–212. Springer, 2004.

[YFRS13] Tomofumi Yuki, Paul Feautrier, Sanjay Rajopadhye, and Vijay Saraswat. Array Dataflow Analysis for Polyhedral X10 Programs. In Proc. PPoPP ’13, pages 23–34. ACM, 2013.

[YKK+11] Kazuki Yoshizoe, Akihiro Kishimoto, Tomoyuki Kaneko, Haruhiro Yoshimoto, and Yutaka Ishikawa. Scalable Distributed Monte-Carlo Tree Search. In Proc. SoCS ’11, pages 180–187. AAAI, 2011.

[YM06] Sung-Eui Yoon and Dinesh Manocha. Cache-Efficient Layouts of Bounding Volume Hierarchies. In Proc. Eurographics ’06, pages 507–516. Blackwell Publishing, 2006.

[ZCD+12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proc. NSDI ’12, pages 15–28. USENIX, 2012.

[ZLBF14] Wei Zhang, Per Larsen, Stefan Brunthaler, and Michael Franz. Accelerating Iterators in Optimizing AST Interpreters. In Proc. OOPSLA ’14, pages 727–743. ACM, 2014.

Publication List

1. Shigeyuki Sato and Akimasa Morihata. Syntax-Directed Divide-and-Conquer Data-Flow Analysis. In Programming Languages and Systems — 12th Asian Symposium, APLAS 2014, Singapore, November 17–19, 2014, Proceedings, Lecture Notes in Computer Science, Vol. 8858, pp. 392–407, Springer, 2014. Corresponding to Chapter 7.

2. Shigeyuki Sato and Kiminori Matsuzaki. A Generic Implementation of Tree Skeletons. In Proc. 7th International Symposium on High-Level Parallel Programming and Applications, pp. 51–72, Amsterdam, July 3–4, 2014. Accepted to International Journal of Parallel Programming. Corresponding to Chapter 3.

3. Shigeyuki Sato and Kiminori Matsuzaki. An Operator Generator for Skeletal Programming on Trees. IPSJ Trans. PRO, 6(4):38–49, 2013. In Japanese. Corresponding to Chapter 4.
