Efficient Engineering and Execution of Pipe-and-Filter Architectures
Dissertation
M.Sc. Christian Wulf

Dissertation for the attainment of the academic degree Doktor der Ingenieurwissenschaften (Dr.-Ing.) of the Faculty of Engineering of Christian-Albrechts-Universität zu Kiel, submitted in 2019

1st reviewer: Prof. Dr. Wilhelm Hasselbring, Christian-Albrechts-Universität zu Kiel
2nd reviewer: Prof. Dr. Steffen Becker, Universität Stuttgart
Date of the oral examination: 19 July 2019

Summary

Pipe-and-Filter (P&F) is a well-known and frequently used architectural style. However, to the best of our knowledge, there is no P&F framework that can both model and execute arbitrary P&F architectures. For example, the frameworks FastFlow, StreamIt, and Spark do not support multiple input and output streams per filter, so they cannot model branches. Other frameworks restrict themselves to very specific use cases or disregard type safety between two connected filters. Moreover, the efficient parallel execution of P&F architectures remains a challenge. Although some existing frameworks can execute filters in parallel, there is still much potential for optimization. Unfortunately, most frameworks offer hardly any way to adapt their single execution strategy without considerable effort.

In this thesis, we present our generic and parallel P&F framework TeeTime. It can both model and execute arbitrary P&F architectures. At the same time, it is open to modification in order to experiment with the P&F style. In addition, it allows filters to be executed efficiently and in parallel on today's multi-core systems.

Extensive lab experiments show that TeeTime incurs only a very low, and in certain cases no, additional runtime overhead compared to implementations without P&F abstractions.
Furthermore, we demonstrate TeeTime's broad applicability and extensibility by means of numerous case studies from research and industry. To this end, we provide a replication package that contains the data and source code of all our experiments. In this way, we enable the verifiability and repeatability of our experiments and allow the results to be retraced. In addition, we provide reference implementations of TeeTime in Java and in C++ as open-source software at https://teetime-framework.github.io.

Abstract

Pipe-and-Filter (P&F) is a well-known and often used architectural style. However, to the best of our knowledge, there is no P&F framework which can model and execute generic P&F architectures. For example, the frameworks FastFlow, StreamIt, and Spark do not support multiple input and output streams per filter and thus cannot model branches. Other frameworks focus on very specific use cases and neglect type safety when interconnecting filters. Furthermore, the efficient parallel execution of P&F architectures is still an open challenge. Although some available frameworks can execute filters in parallel, there is much potential for optimization. Unfortunately, most frameworks have a fixed execution strategy which cannot be altered without major changes.

In this thesis, we present our generic and parallel P&F framework TeeTime. It is able to model and to execute arbitrary P&F architectures. At the same time, it is open for modifications in order to experiment with the P&F style. Moreover, it allows filters to be executed in parallel by utilizing the capabilities of contemporary multi-core processor systems.

Extensive lab experiments show that TeeTime imposes a very low and in certain cases even no runtime overhead compared to implementations without framework abstractions. Moreover, several case studies from research and industry show TeeTime's broad applicability and extensibility.

We provide a replication package containing all our experimental data and code to facilitate the verifiability, repeatability, and further extensibility of our results. Furthermore, we provide reference implementations for TeeTime in Java and in C++ as open source software on https://teetime-framework.github.io.

Preface by Prof. Dr. Wilhelm Hasselbring

Software engineering of parallel and distributed software systems poses specific challenges for software engineers. The engineered parallel and distributed software systems have to be correct as well as efficient and scalable.

Architectural styles and design patterns provide proven solutions to recurring tasks in software development, including the development of such parallel and distributed software systems. For three decades, this design knowledge has been systematically documented in appropriate catalogs for reuse.

The so-called "pipe-and-filter" style is one of the first architectural styles, documented in the early nineties. Even if this style can be considered a "classic" style, it remains a great challenge to implement pipe-and-filter systems efficiently and scalably on parallel and distributed hardware platforms.

In this thesis, Christian Wulf designs, implements, and evaluates the new, innovative TeeTime approach to realizing efficient pipe-and-filter systems. Besides the conceptual work, this thesis contains a significant experimental part with an implementation and a multifaceted evaluation. This engineering dissertation has been extensively evaluated with various experiments, including data from an industrial system.

Notably, this thesis has already had significant impact. The new Kieker monitoring analysis component has been migrated to TeeTime, and TeeTime is used in various follow-up projects.

If you are interested in designing and implementing pipe-and-filter software systems, this thesis is recommended reading.
Wilhelm Hasselbring
Kiel, August 2019

Contents

1 Introduction
  1.1 Motivation and Problem Statement
  1.2 Approach Overview and Contributions
  1.3 Preliminary Work
  1.4 Document Structure

I Foundations

2 The Evolution of the Pipe-and-Filter Style
  2.1 The Early Days of Pipes and Filters
  2.2 Dataflow Variations
  2.3 Pipes as First-Class Entities
  2.4 The Myth of Pipe Overhead
  2.5 Sharing State among Filters
  2.6 Execution of Pipes and Filters
  2.7 Parallel Execution of Pipes and Filters
  2.8 Scheduling of Filters
  2.9 Error Handling
3 Shared Queues
  3.1 Performance Affecting Aspects
  3.2 Lock-based Queues
  3.3 Lock-free Queues
  3.4 Hybrid Queues
  3.5 Optimization Techniques
4 Design Patterns for Parallel Systems
  4.1 The Pipeline Pattern
  4.2 The Task Farm Parallelization Pattern
5 The Actor Model
  5.1 Components
  5.2 Properties
  5.3 Example Implementations
6 Domain-Specific Languages

II The Pipe-and-Filter Framework TeeTime

7 Research Design
  7.1 Research Scope
  7.2 Research Plan
  7.3 Research Methods
8 Framework Architecture
  8.1 Architectural Drivers
  8.2 Framework Entities
9 Executing P&F Systems
  9.1 Execution Models
  9.2 Parallelization via the Task Farm Stage
  9.3 Distributed Execution of Stages
  9.4 Feedback Loops in P&F Configurations
10 Developing P&F Systems
  10.1 Definition of Stages
  10.2 Definition of P&F Configurations
  10.3 Debugging and Profiling of P&F Configurations

III Evaluation of TeeTime

11 Language Independence Evaluation
  11.1 Methodology
  11.2 Statistical Overview
  11.3 State of Development
  11.4 Summary
12 Feasibility Evaluation
  12.1 Case Study: Trace Reconstruction
  12.2 Case Study: Quicksort
  12.3 Industrial Case Study: Extraction of User Behavior Profiles
  12.4 Case Study: Live Visualization of Performance Anomalies
  12.5 Case Study: Distributed Execution
13 Performance Evaluation
  13.1 Framework Overhead Evaluation
  13.2 Scheduling Approach Evaluation
  13.3 Parallelization Approach Evaluation
14 Reusability and Extensibility Evaluation
  14.1 Methodology
  14.2 Reusable and Extensible Entities
  14.3 Public and Private Entities
  14.4 Case Studies
15 Related Work
  15.1 Related Parallel P&F Frameworks
  15.2 Related Distributed P&F Frameworks
  15.3 Parallel Execution Strategies for Stages
  15.4 Related Architectural Styles

IV Conclusions and Future Work

16 Conclusions
  16.1 Modeling Concepts
  16.2 Execution Concepts