Data Provisioning in Simulation Workflows

Von der Fakultät Informatik, Elektrotechnik und Informationstechnik sowie dem Stuttgart Research Centre for Simulation Technology der Universität Stuttgart zur Erlangung der Würde eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigte Abhandlung

Vorgelegt von

Peter Reimann

aus Stuttgart

Hauptberichter: Prof. Dr.-Ing. habil. Bernhard Mitschang
Mitberichter: Prof. Dr. rer. nat. Bertram Ludäscher
Tag der mündlichen Prüfung: 15. Dezember 2016

Institut für Parallele und Verteilte Systeme (IPVS) der Universität Stuttgart 2017


Acknowledgements

This thesis is dedicated to several people who have supported my work on it in various ways. First of all, I would especially like to thank my doctoral advisor, Prof. Dr.-Ing. habil. Bernhard Mitschang, who gave me the opportunity to work on this challenging topic in his research group. His constructive support, valuable advice, and many interesting discussions have significantly helped me to conduct my scientific research and to grow both as a researcher and as a person. Furthermore, my sincere thanks go to the co-reviewer, Prof. Dr. rer. nat. Bertram Ludäscher, for spending time reading this thesis and for giving helpful suggestions and comments.

I have been working on this thesis while being employed as a scientific staff member at the Institute of Parallel and Distributed Systems and at the Cluster of Excellence in Simulation Technology (SimTech) of the University of Stuttgart. So, I would also like to thank the German Research Foundation (DFG) for financial support of the accompanying research project within the SimTech cluster.

It was a pleasure and a great time working together and having stimulating discussions with many colleagues at the University of Stuttgart. Special thanks go to Holger Schwarz for his extensive and reliable collaboration, as well as for his continuous and patient effort to improve the quality of my research. Furthermore, I would like to thank several persons who have published papers with me. Besides Bernhard Mitschang and Holger Schwarz, this mainly includes Pascal Hirmer, Dimka Karastoyanova, Jan Königsberger, Frank Leymann, Jorge Minguez, Michael Reiter, Stefan Silcher, Tim Waizenegger, Matthias Wieland, and Sema Zor. Likewise, many other current and former colleagues of the department Applications of Parallel and Distributed Systems helped me with miscellaneous advice and discussions, in particular Nazario Cipriani, Christoph Gröger, Philipp Janowski, Carlos Lübbe, Sylvia Radeschütz, Oliver Schiller, Christoph Stach, and Marko Vrhovnik. On this note, I would also like to thank the many collaboration partners from the SimTech cluster. This especially holds for Jörg Fehr, Fabian Franzelin, Katharina Görlach, Michael Hahn, Dimka Karastoyanova, Robert Krause, Johannes Kästner, Frank Leymann, Dirk Pflüger, Michael Reiter, Syn Schmitt, David Schumm, Mirko Sonntag, Karolina Vukojevic-Haupt, and Andreas Weiß.

Finally, particular thanks go to Pascal Hirmer, Oliver Kopp, and Holger Schwarz for spending their precious time proof-reading this document and for their valuable comments.

I would also like to acknowledge the work of several students of the University of Stuttgart. In particular, my thanks go to the following students for developing prototypes of the concepts and methods proposed by this thesis: Christian Ageu, Stavros Aristidou, Andreas Bohrn, Daniel Brüderle, Marzieh Dehghanipour, Alexander Gessler, Michael Hahn, Eva Hoos, Wolfgang Hüttig, Savas Kalyoncu, Christoph Müller, Henrik Pietranek, René Rehn, Simon Remppis, Victor Riempp, Michael Schneidt, Xi Tu, Patrick von Steht, Florian Wagner, and Firas Zoabi.

In addition, I thank the administrations of the Institute of Parallel and Distributed Systems and of the Cluster of Excellence in Simulation Technology for their support with organizational and technical issues. Special thanks go to Ralf Aumüller, Manfred Rasch, Annemarie Rösler, Eva Strähle, Barbara Teutsch, and Christine Well, who always helped me in case of questions or problems.

Further thanks go to Bertram Ludäscher again for hosting me during my visit to the University of Illinois at Urbana-Champaign and for working with me on aspects related to scientific workflows and provenance. I had a great time and learned much during this visit. The following people in particular cordially welcomed me and assisted me with many professional and organizational issues: Yang Cao, Janet Eke, Timothy McPhillips, Qian Zhang, as well as the whole GSLIS Help Desk team. Special thanks go to my new friends Ruth and David Krehbiel for accommodating me and thereby providing a pleasant personal environment.

Finally, I want to express my sincere gratitude to the closest persons to whom this thesis is especially dedicated: my parents Christa and Gunter Reimann. They always let me go my own way and continuously supported and encouraged me during my work on this thesis. Thank you from the bottom of my heart! I would also like to thank the rest of my family and all my friends who were by my side throughout all these years. Each of them helped me a lot, even if they are not aware of it.

Peter Reimann
Stuttgart, December 20, 2016

Contents

List of Abbreviations

Abstract

German Summary

1 Introduction
  1.1 Running Example and Major Motivation
  1.2 Challenges and State of Current Work
    1.2.1 Diversity of Available Data Provisioning Techniques
    1.2.2 Multiplicity and Complexity of Low-Level Data Management Operations
    1.2.3 Efficient Data Processing and Optimization
    1.2.4 Data Quality
    1.2.5 Monitoring and Provenance Support
  1.3 Contributions of this Thesis
    1.3.1 Diversity of Available Data Provisioning Techniques
    1.3.2 Multiplicity and Complexity of Low-Level Data Management Operations
    1.3.3 Efficient Data Processing and Optimization
  1.4 Outline of this Thesis

2 Background
  2.1 Computer-based Simulations
    2.1.1 Simulation Models and their Implementation
    2.1.2 Typical Phases of Simulation Workflows
  2.2 Data Resources and Command Languages
    2.2.1 File / Operating Systems
    2.2.2 Relational Database Systems
    2.2.3 XML Database Systems
    2.2.4 Sensor Networks and Gateways to them
  2.3 Workflows and Workflow Languages
    2.3.1 Classification of Workflows
    2.3.2 Workflow Languages and Workflow Systems
    2.3.3 Web Services Business Process Execution Language

3 Data Management in Simulation Workflows
  3.1 Considered Example Simulations
  3.2 Characteristics of Simulation Data
    3.2.1 Provisioning of Input Data for Single Simulations
    3.2.2 Data Exchange between Different Simulations
  3.3 Basic Data Management Patterns
    3.3.1 Data Transfer and Transformation Patterns
    3.3.2 Data Iteration Patterns
  3.4 Summary and Future Work

4 Data Provisioning Techniques for Simulation Workflows
  4.1 Classification of Data Provisioning Techniques
    4.1.1 Concepts of Data Provisioning Techniques
    4.1.2 Assessment of the Proposed Classification Scheme
  4.2 Comparison of Data Provisioning Techniques
    4.2.1 Discussion of Relevant Comparison Criteria
    4.2.2 Comparison Regarding Support of Data Management Patterns
    4.2.3 Comparison Regarding Non-Functional Issues
  4.3 Guidelines for Choosing Appropriate Data Provisioning Techniques
    4.3.1 Discussion of Guidelines
    4.3.2 Evaluation of the Proposed Guidelines
  4.4 Implications for Simulation Workflow Systems
    4.4.1 Missing Features of Current Workflow Systems
    4.4.2 Extended Simulation Workflow System
    4.4.3 Prototypical Implementation of the Extended Simulation Workflow System
  4.5 Summary and Future Work

5 A Framework for Accessing External Data in Workflows
  5.1 Related Work
    5.1.1 Data Provisioning Support of Workflow Systems
    5.1.2 Simulation Data Management Systems
    5.1.3 Solutions to Data Integration or Data Exchange
    5.1.4 Main Conclusions
  5.2 Generic Workflow Language Extensions for Data Management
    5.2.1 Basic Modeling Aspects
    5.2.2 Types of Data Management Activities
  5.3 Generic and Uniform Data Resource Access
    5.3.1 Generic Data Access Operations
    5.3.2 Implementation of the SIMPL Core Operations
    5.3.3 Metadata for Mappings to Access Mechanisms
  5.4 Prototype and its Application to the Bone Simulation
    5.4.1 Provisioning of Input Data for Single Simulations
    5.4.2 Data Exchange between Different Simulations
  5.5 Evaluation
    5.5.1 Generic Data Management in Workflows
    5.5.2 Diversity of Available Data Provisioning Techniques
    5.5.3 Multiplicity and Complexity of Low-Level Data Management Operations
    5.5.4 Efficient Data Processing and Optimization
    5.5.5 Data Quality and Provenance Support
  5.6 Summary and Future Work

6 A Pattern-based Approach to Conquer the Data Complexity in Simulation Workflow Design
  6.1 Pattern-based Simulation Workflow Design
    6.1.1 Overall Workflow Design Approach
    6.1.2 Application to the Bio-Mechanical Simulation
  6.2 Related Work
    6.2.1 Common Workflow Patterns
    6.2.2 Data-centric Workflow Design
    6.2.3 Simulation Data Management Systems
    6.2.4 Solutions to Data Integration or Data Exchange
    6.2.5 Main Conclusions
  6.3 Pattern Hierarchy and Separation of Concerns
    6.3.1 Simulation-specific Process Patterns
    6.3.2 Simulation-oriented Data Management Patterns
    6.3.3 Basic Data Management Patterns
    6.3.4 Executable Workflow Fragments
  6.4 Rule-based Pattern Transformation
    6.4.1 Rule-based Processing Model
    6.4.2 Point in Time for Pattern Transformation
  6.5 Prototype and its Application to the Bone Simulation
    6.5.1 Simulation-specific Process Pattern
    6.5.2 Rule-based Pattern Transformation
  6.6 Evaluation
    6.6.1 Domain-specific Registry for Data Services and Data Resources
    6.6.2 Multiplicity and Complexity of Low-Level Data Management Operations
    6.6.3 Diversity of Available Data Provisioning Techniques
    6.6.4 Efficient Data Processing and Optimization
    6.6.5 Data Quality
    6.6.6 Monitoring and Provenance Support
  6.7 Summary and Future Work

7 An Approach to Optimize Workflow-Local Data Processing
  7.1 Local Data Processing in Workflows
    7.1.1 Major Kinds of Local Data Processing Tasks
    7.1.2 Data-Intensive Protein Modeling Workflow
    7.1.3 Current Architecture for Local Data Processing in Control-Flow-Based Systems
  7.2 Novel Approach to Improve Local Data Processing in Workflows
    7.2.1 Extended System Architecture
    7.2.2 Techniques to Improve Local Data Processing
    7.2.3 Optimization Potential for Local Data Processing
  7.3 Prototype and Evaluation
    7.3.1 Prototype, Experimental Setup, and Test Scenarios
    7.3.2 Discussion of Experimental Results
  7.4 Summary and Future Work

8 Conclusion and Future Work
  8.1 Summary of the Contributions
  8.2 Future Work

Author Publications

Bibliography

List of Figures

List of Tables


List of Abbreviations

Apache ODE  Apache Orchestration Director Engine
API  Application Programming Interface
ASCII  American Standard Code for Information Interchange
BLOB  Binary Large Object
BPEL  Business Process Execution Language
BPM  Business Process Modeling
CAD  Computer-aided Design
CAE  Computer-aided Engineering
CLI  Command Line Interface
CPU  Central Processing Unit
CSV  Comma-separated Values
DAO  Data Access Object
DSL  Domain-specific Language
DUNE  Distributed and Unified Numerics Environment
ETL  Extraction, Transformation, Load (of data)
FEA  Finite Element Analysis
FEM  Finite Element Method
FLWOR  For, Let, Where, Order by, Return (XQuery statement)
FTP  File Transfer Protocol
grep  Globally search a Regular Expression and Print
GUI  Graphical User Interface
HPC  High Performance Computing
IT  Information Technology
JDBC  Java Database Connectivity
JSON  JavaScript Object Notation
NIST  National Institute of Standards and Technology
OASIS  Organization for the Advancement of Structured Information Standards
OGSA-DAI  Open Grid Services Architecture - Data Access and Integration framework
ODE  Ordinary Differential Equation
PDE  Partial Differential Equation
PDM  Product Data Management
PSM  Persistent Stored Modules
RAM  Random Access Memory
REST  Representational State Transfer
SC SimTech  Stuttgart Center for Simulation Sciences
SCP  Secure Copy
SDM system  Simulation Data Management system
SIMPL  SimTech, Information Management, Processes, and Languages
SOA  Service-oriented Architecture
SOAP  Simple Object Access Protocol
SQL  Structured Query Language
SSH  Secure Shell
sWfMS  Scientific workflow management system
TOSCA  Topology and Orchestration Specification for Cloud Applications
UDDI  Universal Description, Discovery, and Integration
UDF  User-Defined Function
URI  Uniform Resource Identifier
URL  Uniform Resource Locator
VarChar  Variable Character String
W3C  World Wide Web Consortium
WfMS  Workflow management system
WS-Addressing  Web Services Addressing
WS-BPEL  Web Services Business Process Execution Language
WSDL  Web Services Description Language
XML  Extensible Markup Language
XPath  XML Path Language
XQuery  XML Query Language
XSD  XML Schema Definition
XSL  Extensible Stylesheet Language
XSLT  XSL Transformations

Abstract

Computer-based simulations are becoming more and more important in both industry and science. In particular, they may imitate real-world experiments or other systems that would otherwise be too expensive or not feasible at all [Har96]. For instance, product development processes may benefit from simulations as virtual and economical alternatives to physically realized product tests [Hau81, CMPW07]. The usual way to imitate a real-world experiment or system is to mathematically model its temporal and/or spatial behavior, e. g., via a system of differential equations [BDH11]. Computer-based simulations often require the discretization of such mathematical simulation models using numerical methods [Hum90], such as the Finite Element Method (FEM) [ZTZ13]. Certain simulation tools, e. g., Matlab (http://www.mathworks.com/products/matlab/index.html), may then be used to implement corresponding numerical calculations.

Nowadays, workflow technology is increasingly adopted to control the execution of computer-based simulations and of their numerical calculations [GSK+11]. Corresponding simulation workflows mainly orchestrate the interaction with involved simulation tools. The input data needed by these tools often come from diverse data sources that manage their data in a multiplicity of proprietary formats [RRS+11, RS14]. Due to this data heterogeneity, simulation workflows additionally have to compose many complex data provisioning tasks. These tasks filter and transform the input data in such a way that the used simulation tools are able to ingest them [RLSR+06]. This is one reason why the effort to be spent on designing simulation workflows is considerably high [RLSR+06, CBL11].

One prevalent trend in simulation research even significantly increases the effort to be spent on designing data provisioning tasks in workflows. Scientists from several scientific domains aim at coupling their diverse mathematical simulation models [GZC14]. This makes it possible to cover various levels of detail in simulation calculations, thereby increasing the potential to produce precise results [Gat14]. For instance, a simulation of structure changes in bones is mainly governed by a bio-mechanical simulation model representing a bone on a macroscopic tissue scale [Kra14]. To make this simulation more realistic, the bio-mechanical model is coupled with a systems-biological model, which also considers how microscopic cell interactions influence the bone structure. Different scientific domains frequently use different simulation tools to implement their simulation models [KAK+13, RSM14b]. Individual tools rely on proprietary ways to manage their data, which further increases the heterogeneity of data sources and data formats. This makes it difficult to exchange data between the coupled simulation models and simulation tools. To make things worse, the various levels of detail involved in coupled simulation calculations require aggregating, interpolating, or extrapolating data accordingly [Gat14]. All this multiplies the complexity of implementing data transformations in simulation workflows.

Nowadays, scientists conducting simulations typically need to design their simulation workflows on their own. Hence, they have to implement many low-level data transformations that realize the data provisioning for and the data exchange between simulation tools. For instance, a workflow realizing the data exchange between the above-mentioned bio-mechanical and systems-biological simulations composes more than 15 workflow tasks [RSM14b]. Around half of these tasks oblige scientists to specify sophisticated SQL, XPath, or XQuery statements. However, scientists usually do not have adequate skills in data management or data engineering. So, they waste time on workflow design, which hinders them from concentrating on their core issue, i. e., the simulation itself.

This thesis introduces several novel concepts and methods that significantly alleviate the design of the complex data provisioning in simulation workflows. Thereby, these concepts and methods make simulation workflow design especially tailor-made for scientists, which ultimately boosts their productivity. Most parts of this thesis correspond to revised and combined versions of previous author publications [RRS+11, RSM11, RS13b, RS14, RSM14a, RSM14b, RWWS14].

The first introduced method addresses the issue that most existing workflow systems or related simulation tools offer diverse and proprietary data provisioning techniques. Some systems rely on application-specific services [TDG07, GSK+11], while others provide scientists with customized workflow activities [LAB+06, VSRM08]. To support as many applications as possible, most of the systems even offer several such services or workflow activities [CBL11]. So, scientists are frequently overwhelmed with a multiplicity and diversity of available data provisioning techniques. This thesis discusses how to conquer this multiplicity and diversity by classifying available techniques into a small set of representative concepts. The resulting concepts are then compared with each other considering relevant functional and non-functional requirements for data provisioning in simulation workflows. One outcome of the classification and comparison is a set of guidelines that assist scientists in choosing proper data provisioning techniques for their workflows. In addition, this thesis discusses the features a workflow system has to support so that scientists may apply the guidelines in practice. This in turn allows for deriving a set of essential features that existing workflow systems do not support. Hence, another contribution is an extended simulation workflow system that offers all these mandatory features in a holistic way. Most extensions covered by this simulation workflow system implement the remaining concepts and methods introduced by this thesis.

One missing feature of existing workflow systems is that they lack a generic solution to data provisioning in simulation workflows.
More precisely, they often do not support all kinds of data resources or data management operations required by computer-based simulations. The most generic solution among related work is offered by ETL technology enabling processes for extracting, transforming, and loading data [KRRT98]. This thesis therefore transfers the general ideas of ETL technology to conventional workflow technology [MMLW05]. The resulting ETL workflow approach offers a set of extensions to workflow languages that constitute the necessary generic solution in terms of supported data resources and operations [RRS+11]. However, ETL technology or underlying ETL tools usually overwhelm scientists again by offering a multiplicity of diverse ETL operators to be combined in data provisioning workflows. Hence, the ETL workflow approach proposed by this thesis is designed in a way that does not entail this decisive drawback of ETL technology. In fact, the corresponding extensions to workflow languages only cover four reasonable types of generic data management activities. Nevertheless, these data management activities allow for specifying a broad range of data management operations for any data resource. In particular, they support any operation that may be specified via the command languages offered by involved data resources, e. g., SQL statements or shell commands. This thesis additionally shows that the proposed activities are sufficient to design the data provisioning for virtually all simulation examples of various scientific domains.

The proposed generic data management activities still do not remove the burden from scientists to specify many complex data management operations. In fact, scientists need to describe such operations using the low-level command languages offered by involved data resources. Hence, this thesis introduces a novel pattern-based approach that even further enhances the abstraction support for simulation workflow design [RS13b, RS14, RSM14a, RSM14b]. Instead of specifying many workflow tasks, scientists only need to select a small number of abstract patterns to describe the high-level simulation process they have in mind. These patterns represent use cases that are especially meaningful to scientists, e. g., coupling simulation models. Furthermore, scientists are familiar with the parameters to be specified for the patterns, because these parameters correspond to terms or concepts that are related to their domain-specific simulation methodology. Finally, a rule-based transformation approach offers flexible means to transform such high-level processes and patterns into executable simulation workflows. Another major feature of this new approach is a pattern hierarchy arranging different kinds of patterns according to clearly distinguished abstraction levels. This facilitates a holistic separation of concerns in workflow design. It provides a systematic framework to incorporate different kinds of persons and their various skills, e. g., not only scientists, but also data engineers. Altogether, this conquers the data complexity associated with simulation workflows, which allows scientists to concentrate on their core issue again, namely on the simulation itself.

The last contribution is a complementary optimization method to increase the performance and robustness of simulation workflows or other data-intensive workflows. Related approaches mainly aim at accelerating operations that process data externally to a workflow [VSS+07, DG08, ZBKL10, LTP11, OdOV+11, SWCD12, LQ14].
Furthermore, these approaches often assume and exploit dataflow-based descriptions of workflows to make proper optimization decisions. The novel method proposed by this thesis therefore addresses the neglected issue of how to optimize local data processing tasks within workflows that are governed by control-flow-oriented languages [RSM11]. It introduces various techniques that partition relevant local data processing tasks between the components of a workflow system in a smart way. Thereby, such tasks are either assigned to the workflow execution engine or to a tightly integrated local database system. This thesis furthermore discusses the results of evaluating the effectiveness of these techniques via a set of experiments.

In summary, the concepts and methods proposed by this thesis fill several gaps in current research regarding data provisioning in simulation workflows. In particular, they provide a holistic abstraction alleviating the complex design of such workflows, thereby making simulation workflow design especially tailor-made for scientists. This has also been confirmed by thorough evaluations, which have been conducted while working on the proposed concepts and methods. These evaluations have been backed up by elaborate prototypical implementations and by their application to several real-world simulations, e. g., to the coupled simulation of structure changes in bones mentioned above [Kra14].

Deutsche Zusammenfassung

Computerbasierte Simulationen gewinnen sowohl im industriellen als auch im wissenschaftlichen Bereich mehr und mehr an Bedeutung. Dabei können sie insbesondere genutzt werden, um reale Experimente wie z. B. einen Crashtest nachzubilden, welche ansonsten zu teuer oder gar nicht durchführbar wären [Hau81, Har96]. Die Nachbildung eines realen Experiments erfolgt typischerweise über dessen zeit- und raumabhängige mathematische Modellierung, beispielsweise über ein System von Differenzialgleichungen [BDH11]. Solche mathematischen Simulationsmodelle müssen häufig mithilfe numerischer Methoden wie der Finite-Elemente-Methode (FEM) diskretisiert werden, damit sie von einem Computer berechnet werden können [Hum90, ZTZ13]. Simulationstools, z. B. Matlab (http://de.mathworks.com/products/matlab/), können dann genutzt werden, um entsprechende numerische Berechnungen zu implementieren.

Mittlerweile werden die für computerbasierte Simulationen notwendigen Interaktionen mit Simulationstools über Simulationsworkflows koordiniert [GSK+11]. Die Eingabedaten für Simulationstools kommen dabei häufig von verschiedenartigen Datenquellen, welche ihre Daten in einer Vielzahl proprietärer Formate verwalten [RRS+11, RS14]. Auf Grund dieser Heterogenität der Datenlandschaft enthalten Simulationsworkflows noch zusätzlich viele komplexe Schritte für die Datenbereitstellung, welche die Eingabedaten derart filtern und transformieren, dass sie von den genutzten Simulationstools eingelesen werden können [RLSR+06]. Dies ist eine Ursache für den hohen Aufwand, der in die Entwicklung von Simulationsworkflows gesteckt werden muss [RLSR+06, CBL11].

Ein weitverbreiteter Trend im Bereich computerbasierter Simulationen erhöht diesen Aufwand für die Entwicklung von Simulationsworkflows sogar noch. Wissenschaftler von verschiedenen Anwendungsgebieten versuchen, ihre unterschiedlichen mathematischen Simulationsmodelle miteinander zu koppeln [GZC14]. Dadurch können sie mehrere Detailgrade in Simulationsberechnungen abdecken, was wiederum das Potenzial erhöht, präzise und realitätsnahe Ergebnisse zu erhalten [Gat14]. Eine Simulation von Strukturänderungen in Knochen koppelt beispielsweise eine biomechanische Gewebesimulation mit einer präziseren systembiologischen Simulation, welche auch die mikroskopische Interaktion zwischen Knochenzellen berücksichtigt [Kra14]. Unterschiedliche Anwendungsgebiete verwenden typischerweise ebenso unterschiedliche Simulationstools, um ihre Modelle zu berechnen [KAK+13, RSM14b]. Verschiedene Tools verwalten ihre Daten dabei auf jeweils proprietäre Weise, was wiederum die Heterogenität an Datenquellen und Datenformaten erhöht. Dies erschwert genauso den Datenaustausch zwischen gekoppelten Simulationsmodellen sowie die Implementierung dafür notwendiger Datentransformationen. Insbesondere erfordern die unterschiedlichen Detailgrade in verschiedenen Simulationsberechnungen komplexe Aggregationen, Interpolationen oder Extrapolationen der auszutauschenden Daten [Gat14].

Die an den Simulationsergebnissen interessierten Wissenschaftler müssen ihre Simulationsworkflows i. d. R. selbst entwickeln. Folglich müssen sie dabei auch viele komplexe Datentransformationen implementieren, welche die Datenbereitstellung für und den Datenaustausch zwischen Simulationsmodellen bzw. Simulationstools realisieren. Ein Workflow für den Datenaustausch zwischen den oben erwähnten biomechanischen und systembiologischen Simulationen besteht z. B. aus mehr als 15 Workflowschritten, von denen ungefähr die Hälfte die Spezifikation komplexer SQL-, XPath- oder XQuery-Anweisungen erfordern [RSM14b]. Hierfür verschwenden Wissenschaftler viel zu viel Zeit, die sie eigentlich für ihre Kernfragestellung aufbringen möchten, nämlich die Simulation selbst.

Die vorliegende Dissertation führt mehrere neuartige Konzepte und Methoden ein, welche die Entwicklung der komplexen Datenbereitstellung in Simulationsworkflows maßgeblich erleichtern und damit die Produktivität von Wissenschaftlern erhöhen. Einzelne Teile dieser Dissertation entsprechen dabei einer überarbeiteten und zusammengesetzten Version früherer Veröffentlichungen des Autors [RRS+11, RSM11, RS13b, RS14, RSM14a, RSM14b, RWWS14].

Die erste neu eingeführte Methode befasst sich mit dem Problem, dass die meisten existierenden Workflowsysteme sehr viele proprietäre Datenbereitstellungstechniken anbieten, z. B. unterschiedliche anwendungsspezifische Services oder spezielle Workflowaktivitäten [LAB+06, TDG07, VSRM08]. Wissenschaftler werden daher häufig von dieser Vielzahl an diversen Datenbereitstellungstechniken überwältigt [CBL11]. Die vorliegende Dissertation diskutiert, wie diese Vielzahl und Diversität der Techniken durch deren Klassifikation in eine kleine Menge repräsentativer Konzepte beherrschbar gemacht wird. Die resultierenden Konzepte werden außerdem unter Berücksichtigung relevanter funktionaler und nichtfunktionaler Anforderungen miteinander verglichen. Ein Ergebnis dieses Vergleichs ist ein Leitfaden, welcher Wissenschaftler bei der Wahl passender Datenbereitstellungstechniken unterstützt. Weiterhin diskutiert die vorliegende Arbeit, welche Features Workflowsysteme aufbieten müssen, damit dieser Leitfaden in der Praxis angewendet werden kann. Dies schließt ebenso die Diskussion derjenigen Features ein, die von existierenden Systemen nicht angeboten werden. Ein weiterer Beitrag dieser Dissertation entspricht daher einem erweiterten Workflowsystem, das alle relevanten Features auf ganzheitliche Weise unterstützt. Die meisten Erweiterungen dieses Workflowsystems implementieren die weiteren neu eingeführten Konzepte und Methoden.

Existierende Workflowsysteme bieten insbesondere keine generische Lösung für die Datenbereitstellung in Simulationsworkflows. Genauer gesagt unterstützen sie häufig nicht alle von computerbasierten Simulationen benötigten Arten von Datenressourcen oder Datenmanagementoperationen. Unter verwandten Arbeiten finden sich nur ETL-Technologien (Extraktion, Transformation und Laden von Daten), welche adäquat generisch einsetzbar sind [KRRT98]. Die vorliegende Dissertation überträgt daher die grundlegenden Ideen von ETL-Technologien auf konventionelle Workflowsprachen [MMLW05]. Die resultierenden Erweiterungen dieser Workflowsprachen bieten die benötigte generische Lösung bzgl. unterstützter Datenressourcen und Operationen [RRS+11]. Allerdings überwältigen ETL-Technologien bzw. gängige ETL-Tools Wissenschaftler wieder mit einer Vielzahl diverser ETL-Operatoren, welche in Datenbereitstellungsworkflows verwendet werden können. Das von der vorliegenden Dissertation eingeführte Konzept berücksichtigt diesen Aspekt, indem die Erweiterungen von Workflowsprachen lediglich vier sinnvolle Arten von generischen Datenmanagementaktivitäten umfassen. Nichtsdestoweniger ermöglichen diese Datenmanagementaktivitäten die Spezifikation vielfältiger Datenmanagementoperationen für beliebige Datenressourcen. Insbesondere unterstützen sie jede Operation, welche über die von Datenressourcen angebotenen Anfrage- oder Befehlssprachen spezifiziert werden kann, z. B. SQL bzw. Shell-Sprachen. Die vorliegende Dissertation zeigt zudem, dass solche Aktivitäten ausreichen, um die Datenbereitstellung für beliebige Simulationsbeispiele aus verschiedenen Anwendungsgebieten zu spezifizieren.

Mit den vorgeschlagenen Datenmanagementaktivitäten müssen Wissenschaftler nach wie vor viele komplexe Datenmanagementoperationen über die ihnen eher unbekannten Anfrage- bzw. Befehlssprachen von Datenressourcen spezifizieren. Aus diesem Grund führt die vorliegende Dissertation einen neuartigen Pattern-basierten Ansatz ein, welcher die Abstraktionsunterstützung für die Entwicklung von Simulationsworkflows noch deutlich verbessert [RS13b, RS14, RSM14a, RSM14b]. Anstatt unzählige Workflowschritte umzusetzen, müssen Wissenschaftler nur eine kleine Anzahl abstrakter Patterns auswählen, um die wichtigsten Schritte ihres Simulationsprozesses zu beschreiben. Diese Patterns verkörpern Anwendungsfälle, welche speziell auf Wissenschaftler zugeschnitten sind, z. B. die Kopplung von Simulationsmodellen. Weiterhin können Wissenschaftler die Patterns über Parameter spezifizieren, die ihnen bereits aus ihrer anwendungsspezifischen Simulationsmethodologie vertraut sind. Ein regelbasierter Transformationsansatz bietet schließlich flexible Mechanismen, um solche abstrakten Simulationsprozesse und Patterns auf ausführbare Workflows abzubilden. Ein weiterer wesentlicher Beitrag ist eine Pattern-Hierarchie, welche verschiedene Patterns gemäß klar voneinander abgrenzbaren Abstraktionsebenen anordnet. Dies bietet ein Rahmenwerk, um nicht nur Wissenschaftler, sondern auch andere Personen mit speziellen Fähigkeiten aus Bereichen wie der Datenverwaltung in die Entwicklung von Simulationsworkflows mit einzubeziehen.

Der letzte Beitrag ist eine komplementäre Optimierungsmethode, um die Effizienz und Robustheit von Simulationsworkflows oder anderen datenintensiven Workflows zu erhöhen. Verwandte Ansätze zielen hauptsächlich darauf ab, Operationen zu optimieren, welche Daten extern zu einem Workflow verarbeiten [VSS+07, DG08, ZBKL10, LTP11, OdOV+11, SWCD12, LQ14]. Weiterhin erwarten diese Ansätze häufig eine datenflussbasierte Beschreibung von Workflows, um korrekte Optimierungsentscheidungen treffen zu können. Die von der vorliegenden Dissertation vorgeschlagene Methode befasst sich daher mit der vernachlässigten Fragestellung der Optimierung der lokalen Datenverarbeitung in Workflows, welche über einen Kontrollfluss modelliert werden [RSM11]. Diese Methode führt einige neuartige Techniken ein, die relevante lokale Datenverarbeitungsschritte in zielgerichteter Weise zwischen den Komponenten eines Workflowsystems aufteilen. Solche Datenverarbeitungsschritte werden dabei entweder einer Workflow-Engine oder einem in das Workflowsystem integrierten Datenbanksystem zugewiesen. Weiterhin diskutiert die vorliegende Dissertation Ergebnisse einer auf Experimenten basierenden Evaluation dieser Techniken.

Zusammenfassend schließen die von der vorliegenden Dissertation eingeführten Konzepte und Methoden mehrere Forschungslücken bezogen auf die Datenbereitstellung in Simulationsworkflows. Insbesondere stellen sie Wissenschaftlern eine ganzheitliche Abstraktion zur Verfügung, welche die Entwicklung solcher Workflows maßgeblich vereinfacht. Dies wurde auch durch eine tiefgreifende Evaluation der vorgeschlagenen Konzepte und Methoden bestätigt. Diese Evaluation wurde mithilfe gut ausgearbeiteter prototypischer Implementierungen sowie deren Anwendung auf mehrere reale Simulationen, wie z. B. auf die oben erwähnte Simulation von Strukturänderungen in Knochen, durchgeführt.

Chapter 1 Introduction

Nowadays, computer-based simulations are increasingly adopted in both industry and science. In particular, simulations may imitate the spatial and/or temporal behavior of real-world experiments, processes, or other systems [Har96]. The main motivation is to provide additional and better means to investigate the imitated systems or related issues. For example, computer-based simulations may serve as virtual and economical alternatives to physically realized and expensive experiments, such as crash tests in product development [Hau81]. Due to this increased interest in simulations, many interdisciplinary research initiatives have been established to foster existing or to develop new simulation approaches and technologies. An example is the Stuttgart Center for Simulation Sciences (SC SimTech, http://www.simtech.uni-stuttgart.de/index.en.html).

Typically, mathematical simulation models represent the objects or processes to be simulated as well as their behavior, e. g., based on a system of differential equations [BDH11]. Such mathematical models often describe this behavior in a continuous way. So, they cannot be directly realized and solved by computers, i. e., in a computer-based simulation. Instead, they need to be discretized using numerical methods, e. g., the Finite Element Method (FEM) [ZTZ13]. These numerical discretizations may then be implemented by simulation tools or frameworks, such as DUNE (Distributed and Unified Numerics Environment, http://www.dune-project.org/).

Another related research area deals with scientific workflows [TDG07]. These workflows compose a set of tools or services to implement scientific applications, such as experiments, data analyses, or computer-based simulations. The input data needed by these tools or services often come from diverse data sources that manage their data in a multiplicity of proprietary data formats [RLSR+06, RRS+11, RS14]. For that purpose, scientific workflows have to carry out many complex data provisioning tasks, which filter and transform heterogeneous input data in such a way that underlying tools or services can properly ingest them. This is one reason why workflow developers have to spend a considerably high effort on designing scientific workflows [RLSR+06, CBL11].

The focus of this thesis is on simulation workflows, which are a sub-area of scientific workflows in that they control the execution of computer-based simulations [GSK+11]. So, they typically compose a set of long-running numerical calculations that realize mathematical simulation models. Similar to general scientific workflows, simulation workflows also carry out many complex data provisioning tasks that prepare input data as needed by the respective simulation tools or frameworks [RS14]. However, one prevalent trend in simulation research even increases the resulting effort to be spent on workflow design. To make simulations more realistic, scientists couple mathematical simulation models from several scientific domains in simulation workflows. For example, a simulation of structure changes in bones combines a bio-mechanical tissue simulation with a more fine-grained systems-biological approach [KSR+11, Kra14]. Coupling different simulation models allows scientists to cover various levels of granularity in simulation calculations [GZC14]. This in turn increases the potential to produce more precise results and to better understand the system to be simulated.

In coupled simulations, the result data of one simulation model are often used as input for other simulation models, and these models are usually implemented by different simulation tools [KAK+13, RSM14b]. Different simulation models and simulation tools typically rely on different solutions for data handling, which makes the implementation of data transformations even more complex. Furthermore, the various levels of granularity involved in coupled simulation calculations make it necessary to aggregate, interpolate, or extrapolate data accordingly [Gat14]. This even multiplies the already high complexity of designing data provisioning tasks in simulation workflows.

Today, scientists or engineers often want or even need to design their simulation workflows on their own. Hence, and as described above, they have to specify many complex, low-level details of data provisioning. So, scientists or engineers waste time on workflow design, which hinders them from concentrating on their core issues, namely the development of mathematical simulation models, the execution of simulation calculations, and the interpretation of their results. This complexity of simulation workflow design constitutes the major motivation of this thesis.

The main contribution of this thesis is a set of novel concepts and methods that make simulation workflow design especially tailor-made for scientists or engineers conducting simulations. In particular, these concepts and methods significantly alleviate the design of the complex data provisioning in simulation workflows.
This way, scientists or engineers may focus on their actual core issues again, which in turn considerably boosts their productivity. In the following section, the major motivation considered by this thesis is exemplified by means of the running example of simulating structure changes in bones [KSR+11, KAK+13, Kra14]. Afterwards, Section 1.2 discusses more detailed challenges that have to be considered by concepts and methods for data provisioning in simulation workflows. Furthermore, it summarizes the state of current work and how this work meets the discussed challenges. Concrete contributions of this thesis and of the proposed concepts and methods are illustrated in Section 1.3. Section 1.4 presents the outline of this thesis.

1.1 Running Example and Major Motivation

The general motivation to simulate structure changes in bones comes from medical science. For example, such a simulation may be used to support healing processes after bone fractures. One possible approach is to combine ideas from biology and mechanical engineering into a bio-mechanical simulation model [KME10, EKM11, Kra14]. This model describes the mechanical behavior of a bone on a macroscopic tissue scale. The Pandas software (http://www.mechbau.uni-stuttgart.de/pandas/index.php), which is based on the FEM, offers its numerical implementation. Figure 1.1 shows the main steps of a workflow realizing this bio-mechanical simulation [RS14]. In the first four steps (blue in Figure 1.1), the workflow prepares all input data the Pandas software requires. This includes (1) the geometrical shape of the bone, (2) some material parameters, (3) boundary conditions, and (4) certain FEM-specific parameters. These data come from unstructured text files, XML databases, CSV-based files (comma-separated values), and SQL databases, respectively. The workflow filters appropriate data from these diverse data sources and transforms them into proprietary text files and CSV-based files the Pandas software can handle.


[Figure 1.1: Main steps of a bio-mechanical simulation workflow to investigate structure changes in bones; cf. [RS14]. The depicted workflow steps are Adjust Bone Shape, Extract Material Parameters, Select Boundary Conditions, Define FEM Parameters, Start Pandas Simulation, Transform Results, and Visualize Results, operating on text, CSV, XML, and SQL data resources.]
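To give a concrete impression of what such a data provisioning step involves, the following sketch mimics one of these input provisioning steps, namely selecting boundary conditions from a CSV-based file and rewriting them into a line-oriented text format a simulation tool like Pandas could read. It is only an illustration: the file names, column labels, and target format are assumptions made for this example and do not stem from the actual Pandas setup.

    import csv

    # Hypothetical data provisioning step: filter boundary conditions for one
    # load case from a CSV-based source and rewrite them into a simple,
    # line-oriented text format. All names and formats are illustrative only.
    def provision_boundary_conditions(src_csv, dst_txt, load_case="standing"):
        with open(src_csv, newline="") as src, open(dst_txt, "w") as dst:
            reader = csv.DictReader(src)
            for row in reader:
                if row["load_case"] != load_case:
                    continue  # keep only rows of the selected load case
                # one whitespace-separated record per boundary node
                dst.write(f'{row["node_id"]} {row["direction"]} {row["value"]}\n')

    provision_boundary_conditions("boundary_conditions.csv", "pandas_input.txt")

Even this small filter-and-reformat step presupposes knowledge of the source schema and of the target file format, which is exactly the kind of low-level detail that scientists usually do not want to deal with.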

During the red workflow step in Figure 1.1, Pandas calculates several output variables of the bio-mechanical simulation model. This includes, amongst others, the internal stress distribution within the bone tissue and the medically important bone density [Kra14]. For each numerical time step and at each evaluation point of the numerical space discretization, Pandas calculates up to 20 of such variables and stores the resulting values in a SQL database. Finally, scientists want to interpret this simulation outcome by means of a visualization (purple step in Figure 1.1). For that purpose, the workflow filters relevant data from the SQL database (green step). Furthermore, it transforms these data into file formats the used visualization tool is able to read.

The bio-mechanical simulation workflow shown in Figure 1.1 carries out five complex data provisioning tasks accessing several heterogeneous data sources (blue and green workflow steps). To design this workflow, scientists have to specify many low-level details of data management operations. This mainly concerns operations to filter appropriate data from the data sources and to convert the data into different formats. Here, scientists need to define complex SQL, XPath, or XQuery statements, or they need to employ scripting or programming languages to implement the operations. Scientists have much knowledge in their simulation domain, but rather limited skills regarding the implementation of complex and low-level data management operations [RLSR+06, CBL11]. In particular, they are often not familiar with languages such as SQL, XPath, or XQuery. Hence, they waste time when trying to adopt and learn these languages, but they actually want to spend this time on their core issue, namely the simulation itself.

The resulting complexity of workflow design is even multiplied in case the output data of one simulation are used as input of another one. The bio-mechanical simulation model does not consider any cellular reactions within the bone tissue. This is however necessary to calculate the bone structure as precisely as possible. A solution to this problem is to couple the bio-mechanical model with a systems-biological model [KSR+11, KSW+12, Kra14]. This systems-biological model gets the internal stress distribution calculated by Pandas as part of its input [Kra14]. It then determines the precise change of the bone density as a result of the stress-regulated interaction between cells.

The complexity of coupling simulations and providing the appropriate data for each of the simulation models is increased by the fact that often separate simulation tools are employed for each of the models [KAK+13]. The systems-biological simulation may be implemented by GNU Octave (http://www.gnu.org/software/octave/). This tool requires only a few variables calculated by Pandas. In addition, it requires its data in CSV-based files and at a different level of granularity regarding the time scale. A workflow realizing the data exchange between Pandas and Octave composes more than 15 workflow tasks [RSM14b]. Around half of them are low-level tasks that (1) filter appropriate data from the database of Pandas, (2) aggregate them over all numerical time steps, and (3) export the data into CSV-based files. Here, scientists need to define even more sophisticated SQL, XPath, or XQuery statements than for the workflow depicted in Figure 1.1.
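For illustration, the following sketch approximates the filter, aggregate, and export logic hidden behind these low-level tasks. The database file, table, and column names as well as the use of SQLite are assumptions made for this example; the actual schema and database system used by Pandas may differ.

    import csv
    import sqlite3

    # Hypothetical sketch of the Pandas-to-Octave data exchange: filter one
    # output variable from the result database, aggregate it over all numerical
    # time steps per evaluation point, and export the result as a CSV file.
    conn = sqlite3.connect("pandas_results.db")
    rows = conn.execute(
        """
        SELECT point_id, AVG(value) AS mean_stress
        FROM simulation_results
        WHERE variable = 'internal_stress'
        GROUP BY point_id
        ORDER BY point_id
        """
    ).fetchall()
    conn.close()

    with open("octave_input.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["point_id", "mean_stress"])
        writer.writerows(rows)

In the actual coupled workflow, many such statements and file operations have to be specified and kept consistent with each other, which is where the design effort described above arises.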
Such a complex data environment is common for simulations that are coupled across different scientific domains, since each domain has its own requirements and solutions for data handling. Scientists not only waste time on workflow design, but the increased number and complexity of workflow tasks often hinders them from designing such workflows at all. As a consequence, they often cannot conduct simulations that couple different simulation models. However, this is usually needed to produce precise simulation results and to make simulations more realistic. Hence, adequate concepts and methods that alleviate the design of complex data provisioning tasks are essential for a wide adoption of simulation technology. This again confirms the major motivation considered by this thesis.


1.2 Challenges and State of Current Work

A systematic research method requires the identification and discussion of detailed challenges that arise from the major motivation of this thesis, i. e., from the high complexity of designing data provisioning tasks in simulation workflows. The identification of detailed challenges has been backed up by a comprehensive literature study, including the investigation of several academic use cases for simulations (e. g., see [Har96, GZC14]). Another basis has been a profound analysis of different workflow systems and of a set of real-world simulations. Beyond the running example illustrated above, this set of real-world simulations covers the examples described by Fehr et al. [FE11], Franzelin et al. [FDP15], Rommel et al. [RK11], and Wagner [Wag10]. The following sub-sections discuss the resulting challenges, as well as the corresponding state of current work.

1.2.1 Diversity of Available Data Provisioning Techniques

Scientific workflow systems usually offer several means to realize data access and data provisioning in workflows. For example, different services may encapsulate access to data sources and provide operations on this data, e. g., for data extraction [TDG07, GSK+11]. Some workflow systems additionally provide workflow extension activities to seamlessly access external data sources without an intermediate service layer [LAB+06, BJA+08, VSRM08]. However, the solutions offered by these systems are often proprietary. While services are typically tailor-made for specific applications, most of the workflow extension activities are customized for certain data sources or data management operations. To support as many applications as possible, most of the workflow systems even offer several of these proprietary services or customized workflow activities [CBL11].

So, scientists designing simulation workflows are faced with a large set of diverse data provisioning techniques. For a given data provisioning task in a simulation workflow, they have to choose the right technique from this set and have to decide how to properly integrate it into the workflow. The diversity and complexity of available data provisioning techniques may however overwhelm scientists [CBL11]. It may induce them not to leverage existing techniques at all, but still to implement the necessary data provisioning on their own. As a consequence, scientists need some kind of guidelines that help them to choose appropriate data provisioning techniques for given workflow tasks.

1.2.2 Multiplicity and Complexity of Low-Level Data Management Operations

Most of the data provisioning techniques offered by workflow systems still require scientists to specify a multiplicity of complex low-level data management operations in workflow tasks. For instance, this concerns sophisticated SQL, XPath, or XQuery statements realizing complex data filters or aggregations, as discussed in Section 1.1. This calls for an adequate data provisioning abstraction that removes the burden from scientists to specify many complex data management operations in their workflows. The above-mentioned analysis to identify detailed challenges for this thesis has revealed that such a data provisioning abstraction for simulation workflow design should provide scientists with the following benefits:

1. As illustrated in Section 1.1, one reason why scientists need to spend an increased effort on simulation workflow design is a high number of involved workflow tasks. So, an abstraction support should significantly reduce the number of tasks that are visible to scientists in their workflows [ZBML09].

2. An even more challenging issue is the high complexity of individual workflow tasks – especially when scientists need to employ programming languages or data modeling techniques they are not familiar with. Workflow design tools should offer a small set of abstract workflow building blocks that are meaningful to scientists [MBZL09]. For instance, these workflow building blocks should correspond to use cases scientists are interested in. In simulations, such a use case may be coupling different simulation models [RSM14a].

3. Furthermore, workflow design tools should allow for a domain-specific parameterization of workflow building blocks [LAG03]. More precisely, they should enable scientists to work with terms or concepts they already know from their simulation methodology or simulation models [RSM14b]; see the sketch after this list for an illustration.

4. The last issue arises from the desire to couple simulations of different scientific domains. To enable a seamless simulation coupling across arbitrary domains, an abstraction support has to be sufficiently generic [RSM14a]. This means it has to consider all individual requirements of the various domains, especially in terms of the used data sources or data management operations. Furthermore, the workflow building blocks offered by design tools should be widely re-usable in different domains [CBL11].
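As a purely hypothetical illustration of the second and third benefits, such an abstract, domain-specifically parameterized building block for the running example could look roughly as follows. The pattern name and parameter vocabulary are invented for this sketch; they are not the concrete pattern language introduced later in this thesis.

    # Hypothetical, scientist-facing description of a building block that
    # couples two simulation models. The parameters use domain terms only; a
    # workflow system would later expand this into the underlying low-level
    # data management tasks (SQL filters, aggregations, CSV exports, etc.).
    coupling_pattern = {
        "pattern": "CoupleSimulationModels",
        "source_model": "bio-mechanical bone tissue model (Pandas)",
        "target_model": "systems-biological cell interaction model (Octave)",
        "exchanged_quantities": ["internal stress distribution"],
        "time_scale_mapping": "aggregate over all macroscopic time steps",
        "target_format": "CSV-based files",
    }
    print(coupling_pattern["pattern"], "->", coupling_pattern["target_model"])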

Existing scientific workflow systems mainly deal with research issues such as workflow scheduling in grid environments, provenance, or dataflow optimization [DSS+05, FSC+06, LAB+06, OGA+06, BJA+08]. Related work from the scientific workflow domain makes use of ontologies to allow for a domain-specific parameterization of workflow tasks, i. e., the parameters of these tasks correspond to abstract terms known from the ontologies [LAG03, BL05, MCD+05, WAH+07]. Hence, these approaches already provide the third benefit listed above. In addition, logical rules may define mappings of ontology terms to executable workflow tasks, e. g., realizing data format conversions [LAG03, BL05, RLSR+06, ZBML09]. In other words, the ontology terms and logical rules also cover the first benefit listed above in that they hide a few low-level tasks and thus reduce the number of tasks that are visible to scientists. However, neither the scientific workflow systems nor related approaches completely offer the second and fourth benefits listed above. None of them provides scientists with abstract and meaningful workflow building blocks that are tailored to computer-based simulations. Instead, they mainly focus on service calls or customized workflow activities realizing data analysis pipelines. In addition, the sole use of ontologies entails that these approaches may be restricted to certain application domains, e. g., life sciences or geophysics, instead of providing a more generic solution.

Artifact-centric approaches to business process modeling leverage data as first-class citizens in processes [NC03, Hul08]. The data is represented by business artifacts that correspond to business-relevant and domain-specific objects, e. g., a customer order. These approaches bear resemblance to the ontology-based approaches from the scientific workflow domain. They thus have very similar gaps regarding the above-listed benefits of a data provisioning abstraction.

Common tools to define ETL processes (Extraction, Transformation, Load), e. g., as provided by Pentaho (http://www.pentaho.com/), offer a generic solution supporting various kinds of data sources and data management operations. However, they neither reduce the number of tasks in ETL processes or workflows, nor do they enable scientists to work with abstract, domain-specific, and meaningful workflow building blocks.

In conclusion, a consolidated data provisioning abstraction for simulation workflows that offers all of the discussed benefits has largely been neglected in previous work. It is however necessary to conquer the data complexity associated with simulation workflows and to make simulation workflow design tailor-made for scientists. This therefore constitutes the second challenge of this thesis.


1.2.3 Efficient Data Processing and Optimization

In the bio-mechanical simulation workflow depicted in Figure 1.1, the size of the result data generated by Pandas ranges between 10 MB and multiple GB. The actual data size varies according to the desired accuracy of the FEM-based calculation, i. e., according to the number of numerical time steps and the number of spatial evaluation points. In the more complex scenario coupling bio-mechanical and systems-biological simulations, the bio-mechanical calculation is executed several times, each time with different boundary conditions [Kra14, RSM14a]. Different boundary conditions may require completely diverse accuracies of calculation. Hence, the size of result data may significantly vary across individual bio-mechanical simulation runs [RSM14a]. Furthermore, this multiplies the total data size generated by all bio-mechanical simulation runs to several terabytes.

Such huge and dynamically changing data volumes are common for computer-based simulations, emphasizing the need for an efficient data processing in simulation workflows. Related work in the area of data-intensive applications discusses various optimization approaches. This includes approaches based on workflow re-structuring [VSS+07, OdOV+11, SWCD12], on MapReduce [DG08, ZBKL10, LTP11], or on cloud computing technologies [LQ14]. One challenge is to investigate whether and how such approaches may be transferred to data processing in simulation workflows. Furthermore, new optimization approaches should be developed where existing approaches are not applicable or sufficient.
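A simple back-of-the-envelope estimate shows why the result size reacts so strongly to the chosen accuracy. The concrete numbers below are assumptions for illustration only and are not measurements from this thesis:

    # Illustrative estimate of the result data volume of one simulation run:
    # size ~ time steps x evaluation points x output variables x bytes per value
    time_steps = 1_000            # numerical time steps (accuracy-dependent)
    evaluation_points = 100_000   # spatial evaluation points (accuracy-dependent)
    variables = 20                # up to 20 output variables per point and step
    bytes_per_value = 8           # double-precision floating-point values

    size_gb = time_steps * evaluation_points * variables * bytes_per_value / 10**9
    print(f"approx. {size_gb:.0f} GB per run")  # about 16 GB for these settings

Reducing the temporal and spatial resolution by a factor of ten each already shrinks this estimate to roughly 160 MB, which is consistent with the wide range of result sizes mentioned above.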

1.2.4 Data Quality

Mathematical simulation models approximate the system they imitate in that they usually describe only a portion of this system [Har96]. Numerical discretizations and their implementation in simulation tools often bring in additional uncertainties. The accuracy of simulation models and their discretizations often correlates with the quality of simulation result data [RTD+12]. Scientists are frequently confronted with an additional optimization problem: a trade-off between low execution time of calculations (low accuracy) and high quality of their results (high accuracy). Nevertheless, not only the mathematical or numerical accuracy is decisive, but also the quality of the input data of a calculation [RBD+11]. This makes the optimization problem even more complicated in case the result data of one simulation is used as input of another one, i. e., in coupled simulations.

It is crucial to enable scientists to monitor and control data quality at each individual step of their simulation workflows. As an example solution, Reiter et al. propose a framework to specify data quality requirements directly in workflows, e. g., at control flow edges [RBD+11, RBKK14]. In addition, they discuss various techniques to control and improve data quality, such as quality-driven workflow navigation or service selection [RBKK12, RBKK14].

1.2.5 Monitoring and Provenance Support

Scientists often make ad-hoc changes to workflows at runtime [SK10]. From a technical point of view, such ad-hoc changes are possible thanks to approaches enabling dynamic replacements or modifications of workflow parts [SK11, SK13, GSAH+15]. However, scientists often still pose the questions when to modify which parts of their workflows in which way. This should be facilitated, e. g., by user-friendly methods to monitor workflow execution directly at runtime [SK10]. A related challenge is to ensure the reproducibility of a simulation as well as the traceability of its outcome [HTT09]. This is typically based on technologies and systems to capture, manage, monitor, and analyze provenance information [DF08, FKSS08, CVDK+12]. Provenance information describes the detailed execution history of all phases and steps of scientific workflows. In particular, this covers the origin of input data and how these data are processed by workflows.

1.3 Contributions of this Thesis

As illustrated in the following sub-sections, the contributions offered by this thesis address the first three challenges discussed in Sections 1.2.1 to 1.2.3, i. e., the diversity of available data provisioning techniques, the multiplicity and complexity of low-level data management operations in workflows, and the need for an efficient data processing and for corresponding optimization approaches. So, this thesis does not explicitly discuss solutions to the issues related to data quality or monitoring and provenance support. This constraint is necessary to focus on the most severe issues regarding an abstraction support for the complex data provisioning in simulation workflows. Nevertheless, all five challenges, i. e., also the remaining two challenges, are used throughout the whole thesis to derive a comprehensive set of requirements for evaluating individual contributions.

1.3.1 Diversity of Available Data Provisioning Techniques

The first contribution addresses the challenge discussed in Section 1.2.1, i. e., scientists rarely leverage the various data provisioning techniques offered by workflow systems due to their high diversity and complexity. This thesis conquers this diversity and complexity by a systematic classification and comparison of available data provisioning techniques. Firstly, it identifies and defines three generic data provisioning concepts that are representative for the techniques offered by a multitude of workflow systems. In addition, the resulting data provisioning concepts are compared considering relevant functional and non-functional requirements for data provisioning in simulation workflows. The comparison results are finally used to derive a set of guidelines that assist scientists in choosing data provisioning techniques for their workflows. These guidelines also help scientists to focus on the most appropriate data provisioning concept, instead of being overstrained by a multitude of low-level techniques. Moreover, the comparison results and the derived guidelines are used in this thesis to discuss essential features simulation workflow systems have to offer, as well as missing features of currently available workflow systems. This thesis additionally introduces an extended simulation workflow system that offers all these mandatory features in a holistic way – including those that are not offered by currently available systems. This extended workflow system corresponds to the SIMPL framework6 [RRS+11, RS13b, RS14, RSM14a, RSM14b]. The next sub-sections describe more detailed contributions offered by SIMPL.

1.3.2 Multiplicity and Complexity of Low-Level Data Management Operations

Table 1.1 summarizes how major related work discussed in Section 1.2.2 covers the four identified benefits to be offered by a data provisioning abstraction for simulation workflow design. Furthermore, it compares related work with the SIMPL framework that is proposed by this thesis. The table again shows that no related approach offers all four benefits in a consolidated fashion. Furthermore, meaningful workflow building blocks that are tailored to computer-based simulations are not provided at all. Nevertheless, some of the approaches at least offer a portion of the benefits.

6SIMPL stands for SimTech, Information Management, Processes, and Languages [RRS+11], but it is also an apronym for simple data management in workflows.

Table 1.1: Comparison of the SIMPL Framework with major related work.

Benefit | Ontology-based scientific workflows | Artifact-centric business processes | ETL tools | SIMPL ETL workflows | SIMPL pattern-based workflows
Reduced number of workflow tasks | moderate reduction | moderate reduction | no | no | significant reduction
Meaningful workflow building blocks | no | no | no | no | yes
Domain-specific parameterization | yes | yes | no | no | yes
Generic support of any domain | often specific | often specific | largely generic | largely generic | largely generic

So, it may be advantageous to combine and augment the general ideas of different approaches in order to provide a holistic data provisioning abstraction for simulation workflow design. The SIMPL framework exactly fills this gap in current research. As shown in Table 1.1 and in the following sub-sections, SIMPL is subdivided into two complementary approaches: (1) an ETL workflow approach [RRS+11] and (2) a pattern-based approach to simulation workflow design [RS13b, RS14, RSM14a, RSM14b].

1.3.2.1 Generic ETL Workflow Approach

As discussed in Section 1.2.2, common ETL tools support various kinds of heterogeneous data resources and data management operations. Taking up the idea of Maier et al. [MMLW05], SIMPL combines this generic ETL technology with conventional workflow technology into an ETL workflow approach [RRS+11]. Such an approach enables the definition of operations for both simulation calculations and data provisioning at the same level of abstraction, i. e., at the workflow level. This particularly leads to a seamless design environment that removes the burden from scientists to get accustomed to many different tools or technologies.

Firstly, a data access service – as an integral part of the SIMPL framework – offers generic operations for a unified access to arbitrary external data resources. Each of these operations covers one of the most common use cases for data access in simulation workflows, i. e., (1) data manipulation or data definition, (2) data retrieval into the workflow, (3) writing data back to external resources, and (4) data transfers between several external resources. Technical details, e. g., how to connect to a specific data resource, are covered by an extensible set of plug-ins, which implement the generic access operations for concrete data resources.

At the workflow design level, SIMPL extends the workflow language as well as workflow design and execution tools by a small set of data management activities. These activities correspond to the four generic operations of the data access service and offer the respective functionality directly within workflows. Workflow developers may assign an external data resource to such an activity and specify a command in the command language of this resource, e. g., a SQL statement or a shell command to access files. During activity execution, this command is issued over the data access service against the specified data resource. This thesis discusses the following contributions of the SIMPL framework and especially of the resulting ETL workflow approach to a generic solution for data provisioning in simulation workflows:

• The generic data access service and its plug-in mechanism facilitate a seamless extension by additional and arbitrary kinds of data resources.
• The four access operations offered by this service, as well as the corresponding use cases for data access, are common for most data provisioning scenarios in simulation workflows. They are thus sufficient to support a majority of simulation examples in any scientific domain.
• The data management activities allow for specifying a wide range of operations for data provisioning. In fact, they support any operation that may be specified via the command languages offered by involved external data resources, e. g., SQL statements or shell commands (see the sketch below).
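To give a flavor of such a command, the following sketch shows a SQL statement that a data management activity of this kind might issue against a relational data resource. The statement is purely illustrative and not taken from the SIMPL framework; the table and column names (fem_results, staging_visualization, run_id, time_step, node_id, stress) are hypothetical.

-- Illustrative only: copy a filtered portion of simulation results into a
-- staging table that a subsequent workflow step reads as its input.
INSERT INTO staging_visualization (run_id, time_step, node_id, stress)
SELECT run_id, time_step, node_id, stress
FROM fem_results
WHERE run_id = 42
  AND time_step BETWEEN 100 AND 200;

Such a statement would simply be handed over to the data access service, which forwards it to the plug-in responsible for the assigned data resource.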

1.3.2.2 Pattern-based Approach to Simulation Workflow Design

The ETL workflow approach illustrated above helps scientists to define workflows without being forced to provide any technical details of data access mechanisms, as the data access service already covers these details. However, scientists still have to specify many sophisticated data management operations, e. g., in terms of complex SQL or XQuery statements. As shown in Table 1.1, ETL workflows neither reduce the number of visible workflow tasks nor provide scientists with abstract and meaningful workflow building blocks that allow for a domain-specific parameterization.

Figure 1.2: Pattern-based design of the simulation workflow shown in Figure 1.1. The figure depicts a Simulation Calculation Pattern (parameterized with the simulation model "Bio-mechanical Bone Tissue", the bone "Right Femur of Person 'XY'", and the motion sequence "Walking"), followed by a Simulation Result Interpretation Pattern (parameterized with the simulation model "Bio-mechanical Bone Tissue", the mathematical variable "Stress Distribution", and the interpretation method "Visualization").

SIMPL fills this gap by a pattern-based approach to simulation workflow design offering a full-fledged and principled abstraction support for the complex data provisioning in simulation workflows [RS13b, RS14, RSM14a, RSM14b]. Scientists using this approach select a small set of abstract patterns and combine them in their simulation workflows to describe only the main steps of these workflows. The patterns thereby completely remove the burden from scientists to specify any low-level details of data provisioning. To illustrate this idea, Figure 1.2 shows how such patterns may alleviate the design of the bio-mechanical simulation workflow depicted in Figure 1.1. This pattern-based workflow design goes beyond related work and provides scientists with all of the four benefits listed in Section 1.2.2 as follows (see also Table 1.1):

1. Each pattern combines several low-level workflow tasks, which reduces the number of tasks that are visible to scientists. To design the bio-mechanical simulation workflow, scientists only need to specify two patterns as shown in Figure 1.2, instead of the seven original workflow tasks (see Figure 1.1). This constitutes a significantly higher reduction of the number of tasks than is provided by major related work, e. g., in the areas of ontology-based scientific workflows [LAG03] and artifact-centric business processes [Hul08].

2. As a unique selling point that is not provided by related work at all, the two patterns shown in Figure 1.2 are particularly meaningful to scientists conducting simulations. They represent two of the main use cases these scientists are interested in: (1) the execution of simulation calculations based on a mathematical model and (2) the interpretation of their results.

3. The adaptation to a concrete simulation scenario is achieved by a small set of pattern parameters. Here, the core ideas of ontology-based scientific

workflows [LAG03] and artifact-centric business processes [Hul08] are transferred to simulation workflows. So, all pattern parameters correspond to concepts or artifacts scientists already know from their simulation models or domain-specific methodology. The first pattern is parameterized by a simulation model to be calculated for a concrete bone and for a motion sequence determining how this bone moves over time. The parameters of the second pattern indicate which mathematical variables of which simulation model shall be interpreted via which method.

4. The two patterns shown in Figure 1.2 also represent use cases that are common for the simulation domain. This thesis discusses how these patterns and all other developed patterns may be re-used to alleviate the design of various simulation workflows of different domains [Wag10, FE11, RK11, Kra14, FDP15]. Finally, all patterns may be implemented by the ETL workflow approach discussed in Section 1.3.2.1, which offers a generic solution in terms of supported data resources and data management operations.

To make such abstractly modeled patterns executable, SIMPL comprises an extensible set of rewrite rules that specify the transformation of patterns into executable workflow fragments [SKK+11, RSM14a]. An additional pattern hierarchy organizes various patterns at different abstraction levels. This allows for a multi-step and thus more generic rule-based transformation of patterns. For instance, the simulation-specific patterns shown in Figure 1.2 may firstly be mapped onto more fine-grained data-oriented patterns. These data-oriented patterns define a data provisioning process mainly in terms of low-level ETL operations [RSM14a]. In a similar way, rewrite rules map such data-oriented patterns onto executable workflow fragments that finally implement the data provisioning.

As another major contribution, the pattern hierarchy and its clear distinction of patterns according to their degree of abstraction facilitate a separation of concerns between different persons that may now be involved in workflow design. According to his or her own skills, each person may choose the abstraction level s/he is familiar with and may provide other workflow developers with templates of patterns and workflow fragments at the chosen level. For instance, the patterns shown in Figure 1.2 are good candidates to be selected and parameterized by scientists. Data engineers may in turn use their expertise to provide these scientists with templates of the fine-grained data-oriented patterns. Altogether, this conquers the data complexity associated with simulation workflows, and it significantly reduces the time scientists have to spend on workflow design.

1.3.3 Efficient Data Processing and Optimization

The main focus of this thesis is on a principled abstraction support conquering the complexity of data provisioning in simulation workflows. Therefore, its intention is not to offer full-fledged optimization approaches that improve the efficiency of data processing in simulation workflows. Nevertheless, it discusses whether and how the individual contributions of SIMPL described above may facilitate the application of existing optimization approaches to simulation workflows (see Sections 5.5.4 and 6.6.4). This mainly includes optimization approaches mentioned in Section 1.2.3, i. e., approaches based on workflow re-structuring [VSS+07, OdOV+11, SWCD12], on MapReduce [DG08, ZBKL10, LTP11], or on cloud computing technologies [LQ14].

The last contribution is a new method to improve both the performance and robustness of the local data processing within simulation workflows or other data-intensive workflows [RSM11]. It is targeted at workflows that are described by control-flow-oriented workflow languages [GSK+11], because corresponding optimization approaches have largely been neglected in previous work [Mül10, Wag11]. So, it fills a gap in current research, where existing approaches are not applicable or sufficient. The new method introduces various techniques that partition the local data processing tasks to be performed during workflow execution in a smart way. This mainly encompasses the execution of variable assignments, expression evaluations for control flow decisions, and service calls. Thereby, such tasks are either assigned to the workflow execution engine or to a tightly integrated local database engine. The effectiveness of these techniques is evaluated by applying them to various test scenarios, including a real-world scientific application [Wag10].
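To sketch the general idea of such a partitioning – this is an illustration only, not the concrete mechanism developed later in this thesis – an expression evaluation for a control flow decision could be delegated to the local database engine as a single SQL query instead of first loading all affected data into workflow variables. The table and column names used here are hypothetical:

-- Illustrative only: the workflow engine asks the local database engine
-- whether another simulation iteration is required, instead of evaluating
-- the convergence condition on large workflow variables itself.
SELECT CASE
         WHEN MAX(ABS(new_value - old_value)) > 0.001 THEN 1
         ELSE 0
       END AS continue_loop
FROM local_iteration_results;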

1.4 Outline of this Thesis

Most parts of this thesis correspond to revised and composite versions of previous author publications [RRS+11, RSM11, RS13b, RS14, RSM14a, RSM14b, RWWS14]. These publications are also respectively cited at relevant occasions in the above Section 1.3. This Section 1.4 summarizes the outline of the rest of this thesis.

The next Chapter 2 provides background information about related terms, concepts, and technologies that are relevant for this thesis. This mainly encompasses information about three separate topics: (1) computer-based simulations and mathematical simulation modeling, (2) data resources that are used most commonly in simulations, as well as (3) workflow technology, workflow languages, and workflow systems.

Chapter 3 then combines these three topics in order to circumscribe the major application area considered in this thesis, i. e., data management and data provisioning in simulation workflows. It introduces additional real-world simulation examples that constitute the main use cases of this thesis. Furthermore, it provides more in-depth discussions of common data characteristics and data management patterns of computer-based simulations.

The subsequent Chapters 4 to 7 detail the contributions of this thesis, which are summarized in Section 1.3. Thereby, these chapters not only illustrate the respective approaches and their design considerations. In addition, each chapter covers comprehensive discussions of related work, and especially profound evaluations of the proposed concepts and methods. These evaluations have been backed up by elaborate prototypical implementations and by their application to several real-world simulations, as mentioned above.

Chapter 4 provides more insights regarding the contribution introduced in Section 1.3.1. So, it deals with the challenge that most existing workflow systems overwhelm scientists with a multiplicity of diverse data provisioning techniques. It discusses how to classify and compare existing techniques in order to conquer this multiplicity and diversity. One outcome of this discussion is a set of guidelines that assist scientists in choosing data provisioning techniques for their workflows. Furthermore, Chapter 4 derives essential features that are missing in currently available workflow systems. It therefore introduces an extended workflow system that offers all missing features in a comprehensive way.

One missing feature of currently available workflow systems is that they do not support all kinds of data resources or data management operations required by computer-based simulations. Chapter 5 therefore introduces and assesses a set of extensions to workflow languages that address this lack of generality [RRS+11]. As discussed in Section 1.3.2.1, these extensions – so-called data management activities – combine conventional workflow technology with ETL technology into an ETL workflow approach [MMLW05]. This ETL workflow approach allows for specifying a broad range of data management operations for any data resource directly within workflows.

The data management activities proposed in Chapter 5 still do not remove the burden from scientists to specify many complex data management operations. Hence, and as illustrated in Section 1.3.2.2, Chapter 6 proposes a novel pattern-based approach that offers a consolidated abstraction support for this complex data management [RS13b, RS14, RSM14a, RSM14b]. This approach especially provides scientists with all essential benefits listed in Section 1.2.2. Another major feature of this new approach is the above-mentioned pattern hierarchy arranging different kinds of patterns according to their degree of abstraction. This facilitates a holistic separation of concerns to incorporate different kinds of persons and their various skills into workflow design.

Chapter 7 then discusses the complementary optimization method introduced in Section 1.3.3. This novel method aims at improving the performance and robustness of local data processing tasks within simulation workflows or other data-intensive workflows. Thereby, it is especially targeted at workflows that are described by control-flow-oriented languages [GSK+11], which has largely been neglected in previous work [Mül10, Wag11].

Finally, Chapter 8 concludes this thesis with a summary of its major contributions. Furthermore, it lists promising opportunities for future research.

Chapter 2 Background

This chapter provides information about terms, concepts, and technologies that are important to understand the content of this thesis. Section 2.1 gives relevant background information about computer-based simulations, mathematical simulation models, and typical phases of simulation workflows. Section 2.2 circumscribes important aspects of data resources that are used for computer-based simulations, as well as the capabilities these resources offer in terms of their command or query languages. Finally, Section 2.3 presents an overview of workflow technology, workflow languages, and workflow systems. A few parts of this chapter are revised versions of excerpts of previous author publications that are cited at affected locations [RRS+11, RSM11, RS13b].

2.1 Computer-based Simulations

Hartmann defines the terms simulation and computer(-based) simulation as follows [Har96]:

“Simulations are closely related to dynamic models. More concretely, a simulation results when the equations of the underlying dynamic model are solved. This model is designed to imitate the time-evolution of a real system. To put it another way, a simulation imitates one process by another process. In this definition, the term ‘process’ refers solely to some object or system whose state changes in time. If the simulation is run on a computer, it is called a computer simulation.”

This thesis mainly focuses on such dynamic simulations that imitate the behavior or evolution of usually real-world experiments, processes, or other systems over a certain period of time. In contrast, static simulations investigate the imitated systems at rest, instead of considering their time-dependent evolution [Har96]. However, dynamic simulations are much more realistic, because the underlying systems – especially those investigated in natural sciences or engineering – are inherently dynamic [Har96]. Computer simulations (also referred to as computer-based simulations in this thesis) are frequently used to replace physically realized experiments or other investigations. The main motivation is that these physical experiments are often too costly or too time-consuming [Har96]. This for instance concerns car development processes, where computer-based simulations may serve as virtual and economic alternatives to physically realized crash tests [Hau81].1 Other phenomena cannot be investigated in sufficient detail at all without relying on simulations, e. g., due to technical, theoretical, pragmatic, or ethical issues [Har96]. For instance, the time frame of the formation and evolution of galaxies is much too big so that physical experiments investigating this kind of phenomenon are not feasible at all.

According to the definition of Hartmann cited above, the imitated system and its evolution over time are typically described by a mathematical simulation model. Most commonly, such models are composed of a system of ordinary or partial differential equations (ODEs or PDEs, respectively) [BDH11]. ODEs or PDEs very often do not have a known analytic solution [Hum90], which is mainly due to the fact that they describe the time-dependent evolution of the imitated system in a continuous way. So, they usually cannot be directly solved by computers. Instead, they need to be discretized and approximated using numerical methods, such as the Finite Element Method (FEM) [GRS07, ZTZ13]. The following Section 2.1.1 exemplifies these two aspects of mathematical simulation models and their numerical implementation via the running example of a bone simulation illustrated in Section 1.1. Nevertheless, the focus is not on mathematical or numerical details, but on aspects that are relevant for the data provisioning in simulations. Afterwards, Section 2.1.2 depicts the typical and most important phases of simulation workflows.

1Note that a physical crash test is also a simulation, because it imitates the behavior of a car and its passengers during a real-world car accident on streets. However, it is not a computer-based simulation, because it is based on another physical experiment, and not on calculations in a computer.

Figure 2.1: Coupled bio-mechanical and systems-biological simulation models to investigate structure changes in bones, as well as an extract of mathematical variables that are important for data provisioning and data exchange. The figure juxtaposes the bio-mechanical model of the macroscopic bone tissue – with its parameter variables (bone description: geometrical bone shape, material parameters; boundary conditions: muscle forces, joint contact forces) and unknown variables (internal tensile stress, growth energy, solid volume fraction with growth-dependent and deformation-dependent parts, among others) – and the systems-biological model of the microscopic bone cell interaction – with its parameter variables (cell description: material parameters; boundary conditions: internal tensile stress, growth energy) and unknown variables (bone cell concentrations, messenger molecule concentrations, receptor concentrations, growth energy production, solidity, among others). Numbered arrows 1 to 5 indicate the data exchange steps between both models.

2.1.1 Simulation Models and their Implementation

Figure 2.1 shows how the bio-mechanical and the systems-biological simulation models are coupled in order to realize the running example of a bone simulation illustrated in Section 1.1 [Kra14]. Furthermore, the figure respectively classifies important mathematical variables of these simulation models. These variables either serve as input or are produced as output of the underlying systems of differential equations. They are thus important for the data provisioning for a single simulation model, as well as for the data exchange between both models. Parameter variables serve as input of a model, whereas unknown variables represent its output and are typically calculated for several time steps. In some examples, a set of initial conditions, i. e., values of specific unknown variables for the first time step of the simulation, has to be provided as further input.

The bio-mechanical simulation model is based on the Theory of Porous Media and governed by PDEs [Ehl09]. Its main focus is on the macroscopic mechanical behavior of a bone, especially on the exchange of mass and momentum between porous solids of the bone tissue and therein embedded fluids [Kra14]. The bio-mechanical parameter variables include a static description of the bone to be simulated, i. e., the geometrical bone shape and material parameters. In addition, the boundary conditions determine the time-dependent external load on the bone, e. g., caused by muscle forces and by joint contact forces of adjacent bones. The simulation converts such an external load situation into the internal tensile stress within the bone tissue. Another resulting unknown variable is the growth energy that summarizes the chemical energy available for cell metabolism, e. g., offered by glucose. The solid volume fraction represents the bone density and is subdivided into multiplicative growth-dependent and deformation-dependent parts. Note that it is not necessary to provide the bio-mechanical simulation with explicit initial conditions. The reason is that this simulation may start with arbitrary initial conditions and then calculates its unknown variables in an iterative way until they converge to a steady state [Kra14].

The bio-mechanical model does not consider any cellular reactions within the bone tissue. However, both the growth energy and the growth-dependent part of the solid volume fraction depend on cellular reactions as well, which may result in an inaccurate calculation of these unknown variables [Kra14]. This is where the systems-biological model comes into play, which uses ODEs to describe the microscopic formation or resorption of the bone tissue as result of the stress-regulated interaction between cells. Its first parameter variable is a cell description, i. e., some cell-specific material parameters. The bio-mechanical simulation provides it with its boundary conditions, namely the internal tensile stress (step 1 in Figure 2.1) and the growth energy (step 2). In addition, an interpolated value of the growth-dependent solid volume fraction corresponds to the initial condition of the systems-biological solidity (step 3). Besides the solidity, the major systems-biological unknown variables encompass concentrations of bone cells, messenger molecules, and receptors, as well as the cellular growth energy production.
To close the loop, the systems-biological and more precise growth energy production and solidity are finally used to update the growth energy and the growth-dependent solid volume fraction of the bio-mechanical model (steps 4 and 5). More details about this process are given in Section 3.2.2.

One of the most common solutions to numerically discretize a system of PDEs is the FEM [GRS07, ZTZ13]. This method is also applied to the bio-mechanical simulation model [Kra14]. Thereby, the whole and complex problem domain, i. e., the bone in this example, is spatially subdivided into several simpler parts: the finite elements. Figure 2.2a shows an FEM grid consisting of individual finite elements as spatial discretization of a human femur. This FEM grid and its finite elements are described by the parameter variable representing the geometrical bone shape (see Figure 2.1).

Figure 2.2: FEM-based spatial discretization of a human femur: (a) FEM grid of a femur; (b) part of the grid and its spatial evaluation points, i. e., nodal points and collocated integration points within a finite element.

For each finite element, specific base functions as well as the values of material parameters, initial conditions, and boundary conditions are inserted into the PDEs to convert them into a system of discrete linear or non-linear equations. This system of equations allows for calculating discrete and locally approximate solutions of the unknown variables at individual spatial evaluation points of the FEM grid. As indicated in Figure 2.2b, one kind of spatial evaluation points are nodal points, i. e., the points where different finite elements coincide. According to the collocation method [dBS73, AP97], each nodal point is associated with a collocated integration point within a finite element. Each bio-mechanical unknown variable is thereby calculated at nodal points, at integration points, or even at both [Kra14]. Finally, the systems of equations of all finite elements are composed to a global system of equations that allows for calculating the unknown variables for the whole FEM grid.

The time discretization of the bio-mechanical simulation subdivides the whole continuous time interval into several equidistant numerical time steps [Alb96, Kra14]. The above-mentioned global system of equations is then consecutively solved for each of these numerical time steps. In summary, the bio-mechanical simulation calculates one discrete value (1) for each degree of freedom in the mathematical simulation model (the unknown variables), (2) for each relevant spatial evaluation point of the FEM grid, and (3) for each numerical time step.

The ODEs describing the systems-biological simulation model get their initial and boundary conditions – including some unknown variables of the bio-mechanical model – at the integration points within the finite elements. They are then likewise converted into a system of discrete equations, but using a less complex numerical discretization [Kra14]. While the bio-mechanical calculations at different spatial evaluation points heavily depend on each other, the systems-biological calculations at individual points are completely independent. So, the systems-biological unknown variables may be calculated locally at each integration point [Kra14]. It is thereby sufficient to directly solve the smaller local systems of discrete equations at these points instead of composing them to a more complex global system. Note that this spatial independence of calculations is also the reason why the systems-biological model does not require a geometry description as part of its parameter variables (see Figure 2.1). The individual local systems of equations are again consecutively solved for a number of numerical time steps. The states of cells within the bone tissue change rather infrequently compared to the mechanical properties described by the bio-mechanical model. So, the systems-biological simulation increases its time scale, i. e., numerical time step sizes change from about ten milliseconds in the bio-mechanical calculation to several seconds or even minutes at the systems-biological scale.

In general, simulations that are coupled among different scientific domains not only require different scales in temporal, but also in spatial discretizations [GZC14].
For instance, the bio-mechanical simulation model may be coupled with a more coarse-grained approach to even better understand the holistic behavior of greater parts of the human skeleton. This coarse-grained approach is a multi-body simulation describing the mechanical behavior of all relevant bones and skeletal muscles, as well as of the interfaces between these bones and muscles [RKH+10]. Its simulation model is coupled with the bio-mechanical model in a similar way as the bio-mechanical and systems-biological models depicted in Figure 2.1. More precisely, the unknown variables calculated by the multi-body simulation cover the muscle forces and joint contact forces that are used as boundary conditions of the bio-mechanical model. In analogy, the systems-biological simulation may be coupled with fine-grained and thus more accurate molecular-dynamical or even quantum-mechanical simulations [RK11]. All these differences in spatial scales of simulation calculations may result in likewise varying temporal scales, i. e., numerical time step sizes may vary between hours and nanoseconds. Simulations that correspondingly differ in spatial and/or temporal scales of their calculations are called multi-scale simulations [GZC14]. Due to the differences in spatial and temporal discretizations, the data exchanged between several simulation models of multi-scale simulations must be interpolated or extrapolated accordingly [Gat14]. More details about the resulting complexity of data exchange are given in Section 3.2.2.

Figure 2.3: Lifecycle of computer-based simulations and simulation workflows. The lifecycle phases centered around the scientist are: simulation modeling, workflow design & modeling, workflow execution & monitoring, ad-hoc workflow adaptation & re-execution, and post-execution analysis.

2.1.2 Typical Phases of Simulation Workflows

As indicated in Figure 2.3, scientists are involved in all phases of the typical lifecycle of computer-based simulations and simulation workflows [LWMB09, SK10]. In the first phase, simulation modeling, they define mathematical simulation models describing the usually time-dependent behavior of the systems to be investigated, e. g., in terms of ODEs or PDEs as illustrated in Section 2.1.1. The next phase, workflow design and modeling, comprises tasks to construct workflow models that realize the simulation models and that produce the simulation outcome. During workflow execution and monitoring, scientists observe and control the workflow execution. They should furthermore be able to make ad-hoc changes to workflow models at their runtime and to re-execute affected parts of them [SK10]. Finally, they analyze and interpret the simulation outcome in the phase post-execution analysis, e. g., based on a visualization of result data. This may lead to the decision to adapt the used simulation models or to couple these simulation models with other ones and finally to repeat the whole lifecycle.

Accessing and providing huge amounts of heterogeneous input data as well as generating huge intermediate and final data sets are some of the major challenges of scientific workflows and likewise of simulation workflows [RLSR+06, GDE+07, DC08, CBL11]. This is also highlighted by the common structure of a simulation workflow as shown in Figure 2.4.

Figure 2.4: Common structure and phases of a simulation workflow: a provisioning phase (with the sub-phases platform provisioning, simulation software provisioning, and data provisioning), a calculation phase, and a post-processing phase (with the sub-phases result provisioning, visualization software provisioning, and result visualization); the post-execution analysis may entail simulation model adaptation or simulation coupling.

The provisioning phase covers pre-processing activities that are necessary before the actual simulation calculations are executed. Possible tasks in the sub-phase platform provisioning are to deploy necessary middleware components, e. g., application servers [VHKL13]. Other tasks include the creation of data containers, e. g., some directory structures in file systems that may be used to store input and output data of simulation calculations. In the sub-phase simulation software provisioning, the workflow deploys and configures all software packages needed in the calculation phase [GSK+11, VHKL13]. The activities of the third sub-phase data provisioning provide the simulation software with necessary input data. This is the phase where the workflow has to carry out many complex tasks that filter input data from various heterogeneous data sources and transform these data according to the formats and granularity the simulation software requires [RLSR+06, RRS+11, RS14]. In the calculation phase, the workflow uses the previously provisioned platform, software, and input data to perform the actual simulation, i. e., to compute and store the simulation outcome.

The activities in the post-processing phase prepare the post-execution analysis (see Figure 2.3). This is commonly based on a visualization of the simulation outcome in the last sub-phase result visualization. The sub-phases result provisioning and visualization software provisioning prepare the data and software needed for this visualization in a similar way as the corresponding sub-phases in the provisioning phase. In case scientists decide to adapt the simulation models, the post-execution analysis may entail the re-execution of the same workflow or of an altered version of it. As another option, the simulation outcome may also be used as input of another simulation workflow to couple the current simulation model with a different one. Especially in multi-scale simulations, this simulation coupling entails the execution of additional and even more complex workflows that perform the data exchange between the underlying simulation models and simulation workflows [RSM14b].

Figure 2.5: Relevant terms and concepts regarding data resources; cf. [RS13b]. A data resource manages one or more data containers and offers a command language; data sources and data sinks are specializations of data resources. Exemplary instances are database systems, file / operating systems, and gateways to sensor networks (data resources), database tables and files (data containers), as well as SQL and shell languages (command languages).

2.2 Data Resources and Command Languages

Figure 2.5 defines relevant terms and concepts regarding data resources that are frequently used to manage the various input and output data of computer-based simulations [RS13b]. A data resource may be a data source that is able to provide clients with data and/or a data sink in which clients may store data. For instance, gateways to sensor networks generally do not allow to store data and thus may often only act as data sources [DP10]. Database systems and file or operating systems usually may act both as data sources and as data sinks. A data resource manages several data containers. Each data container is an identifiable collection of data, e. g., a table in a database system or a file in a file or operating system. Furthermore, a data resource offers specific command or query languages, e. g., SQL or shell languages. It receives and executes data management commands that are described in the offered languages and that may be used to create, read, update, and delete data containers and their data.

The following sections illustrate concrete examples of data containers and command languages for different kinds of data resources that are most relevant for computer-based simulations. This includes a discussion of how flexible the respective data resources are regarding data organization and data structures. Other aspects are the capabilities or the power of command or query languages in terms of the data management operations they support. Thereby, the main focus is on giving an overview of the native capabilities of single commands described in the offered languages, but possibilities to extend these capabilities are discussed as well. Table 2.1 summarizes the results of these discussions.

Table 2.1: Data resources used in computer-based simulations.

Kind of data resource | Typical data containers | Flexibility w.r.t. data structures | Typical command language | Power of command language
File / operating systems | file or directory | very high | shell | moderate
Relational database systems | database table or schema | moderate | SQL | high
XML database systems | XML document or collection | high | XQuery / XPath | high
Sensor networks and gateways | proprietary | usually low | proprietary | usually moderate

2.2.1 File / Operating Systems

The data containers managed by file or operating systems are individual files or directories, which may contain several further files or sub-directories. Using files and directories is the most common way to manage data in computer-based simulations. The main reason is a very high flexibility offered by file or operating systems with respect to data organization and data structures. For instance, files and directories may be flexibly organized and grouped in hierarchical directory structures. In fact, scientists often adopt application-specific conventions for file names, directory names, and directory structures in order to ensure a well-documented and reproducible data organization [MBBL15]. Another benefit is that files support virtually any data format and data structure. Unstructured ASCII text or binary formats allow for arbitrary file contents or for user-defined and customized data structures within files. Other options are structured or semi-structured formats that allow for the definition of specific data schemata as long as they comply with a certain base structure. Examples are CSV-based files or XML documents that allow for tabular or hierarchical representations of file contents, respectively. It is in general even possible to combine different structured or unstructured formats within one single file. The flexibility with respect to data formats even makes files the default option for exporting data from or importing them into other data resources. As an example, relational database systems usually allow for exporting or importing database tables to or from CSV-based files, or even whole databases to or from text or binary files.
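As a concrete illustration of such a file-based export and import, the following PostgreSQL-style commands dump a database table into a CSV file and load it back; the table name and the file path are purely illustrative and not taken from the running example:

-- Export a database table to a CSV file on the database server (PostgreSQL syntax).
COPY fem_parameters TO '/tmp/fem_parameters.csv' WITH (FORMAT csv, HEADER true);
-- Re-import the CSV file into a table with an identical structure.
COPY fem_parameters FROM '/tmp/fem_parameters.csv' WITH (FORMAT csv, HEADER true);

Other relational database systems offer comparable, but again vendor-specific, export and import facilities.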

The high flexibility of file or operating systems regarding data structures however leads to only a moderate power of the offered command language. Corresponding shell languages are rather intended for automating tasks of system administrators than for executing complex data management operations. The capabilities operating systems support via native shell commands include basic operations for data definition, e. g., to create, delete, move, or copy files or directories. Native shell commands may often also be used to initially set the content of files while they are created, or to append additional contents to existing files. Furthermore, most operating systems offer system utilities to search for files or to filter or manipulate parts of the content of a single file. For instance, the grep utility (Globally search a Regular Expression and Print)2 allows for specifying regular expressions that define a corresponding search or manipulation pattern. However, such regular expressions need to be customized to the respectively used data formats, e. g., ASCII text files require completely different regular expressions than XML documents. This is also one reason why the toolbox offered by operating systems is increasingly extended by system utilities that natively support specific operations for certain kinds of data formats. For instance, the Windows PowerShell supports some basic operations to read and manipulate the content of XML documents, including XPath-based filters [Sch14]. In order to support more complex operations to be executed on files, the utilities of operating systems have to be extended by external libraries or tools. However, even common external libraries or tools lack support for highly complex operations that work on more than one file at once. This for instance concerns set-oriented operations, such as intersections, unions, or complements. Such operations usually have to be implemented by writing additional shell scripts, which is often associated with a considerable implementation overhead.

2.2.2 Relational Database Systems

In single simulation applications, relational database systems are used as alternatives or complements to a solely file-based data management. For instance, Heber et al. propose and evaluate a relational database back-end to support Finite Element Analyses (FEA) [HG05, HG06, HPD+05]. An FEA encompasses all phases of a FEM-based simulation process: developing a simulation model, generating an FEM grid, building and solving a system of discrete equations, pre-processing result data, and interpreting the final outcome [CMPW07].

2GNU Grep: http://www.gnu.org/software/grep/

SELECT base_function_equation
FROM FEM_parameters
WHERE model = 'boneTissue' AND dim = 3 AND nodes_per_element = 20

Listing 2.1: Exemplary SQL query to retrieve a base function equation from the database tables storing FEM parameters depicted in Figure 1.1.

In engineering, several of such FEA-based processes are conducted consecutively or in parallel in order to predict the behavior of different variants of a product under specific circumstances. This results in highly data-intensive simulation applications. Here, relational database systems may be attractive for their efficient and robust capabilities to process big and complex amounts of data [HG05].

The data managed by relational database systems is organized according to the relational model, i. e., the data consists of a set of relations [Cod70]. A relation is an instance of a relation schema R(A_1, ..., A_n) with its attributes A_1 to A_n. So, a relation r(R) is a subset of the cross product of the attribute domains of the relation schema R, i. e., r(R) ⊆ dom(R) = dom(A_1) × ... × dom(A_n). In other words, a relation is a set of tuples, where each tuple is a list of n values

(v_1, ..., v_n) with v_i ∈ dom(A_i). In a similar way as CSV-based files, this relational model imposes a tabular base structure for the major data containers managed by relational database systems, i. e., for database tables. Each database table may be structured according to an arbitrary list of n columns (the attributes A_i) with virtually arbitrary column types (the attribute domains dom(A_i)). The rows of a database table then correspond to the tuples of the relation. This restriction to the relational model and thus to a tabular base structure entails only a moderate flexibility with respect to data structures when compared to the high flexibility offered by file or operating systems. Other kinds of data containers are database schemata that summarize one or more database tables and specify their tabular structure, i. e., the underlying relation schema.

On the other hand, the restriction to the relational model leads to a high expressive power of the command or query languages offered by relational database systems. The de-facto standard for such command or query languages is the Structured Query Language (SQL) [ISO11b, ISO11d]. Listing 2.1 shows an exemplary SQL query that is used in the workflow depicted in Figure 1.1 to access the database table storing FEM parameters (FROM in Listing 2.1). The query delivers a text-based encoding of a base function equation (SELECT) that may later be used to convert the PDEs of the simulation model into a system of discrete equations (see Section 2.1.1). Thereby, it filters all base functions summarized in the database table according to (1) the simulation model, (2) the number of spatial dimensions, and (3) the number of nodal points per finite element in the FEM grid (WHERE in Listing 2.1).

SQL is based on the relational algebra, which has the same expressive power as first-order predicate calculus [Cod70]. It supports all set-oriented operations, e. g., union, intersection, set difference, or the Cartesian product of several database tables. The major contribution to the power of SQL is offered by table operations, such as projection, selection, division, and different variants of relational joins [Cod70]. Further features of SQL encompass nested sub-queries, recursive queries, as well as many built-in functions to aggregate, group, and sort the output of a query. Besides this powerful query capability, SQL offers many possibilities to define tabular data structures (data definition), as well as to insert, update, and delete data (data manipulation). Moreover, it includes mechanisms to specify assertions or triggers that ensure the semantic integrity of data. Thereby, the relational model itself offers built-in integrity constraints, e. g., to ensure the uniqueness of tuples via primary or candidate keys and the referential integrity between several relations using foreign keys [Cod70]. In addition, SQL allows for defining custom integrity constraints, e. g., value-based constraints between several table columns.

Over and above, SQL encompasses many additional parts that extend the pure standard and even increase its power. For instance, Persistent Stored Modules (SQL/PSM) add support for procedural extensions [ISO11e]. This includes the declaration and implementation of so-called stored procedures and user-defined functions, as well as control flow constructs, variable assignments, and mechanisms to handle exceptions.
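To briefly illustrate these procedural constructs, the following SQL/PSM-style sketch declares a variable, uses a control flow construct, and raises an exception. The table FEM_parameters and its columns are taken from Listing 2.1, whereas the procedure itself and the error message are purely illustrative:

-- Illustrative only: check whether a base function equation is available
-- for a given simulation model and number of spatial dimensions.
CREATE PROCEDURE check_base_function (IN p_model VARCHAR(64), IN p_dim INTEGER)
BEGIN
  DECLARE v_count INTEGER;
  SELECT COUNT(*) INTO v_count
  FROM FEM_parameters
  WHERE model = p_model AND dim = p_dim;
  IF v_count = 0 THEN
    -- Signal an exception that a calling application or workflow may handle.
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'no matching FEM parameters found';
  END IF;
END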
This procedural SQL extension offers features comparable to the scripting technologies provided by shell languages of operating systems. Another example is an XML extension (SQL/XML) that allows for storing, querying, and manipulating native XML data within relational database tables and in conjunction with pure relational data [ISO11c]. Note that this extension not only increases the expressive power of SQL, but also the flexibility with respect to data structures offered by database systems (see Section 2.2.3). It is even possible to manage spatial data in some relational database systems, i. e., to consider spatial issues such as geometry and location [ISO11a].

<materialParameters>
  <person id="007">
    <bone name="leftFemur">
      <hydraulicConductivity>3.0</hydraulicConductivity>
      <solidDensity>2.1e-6</solidDensity>
      <stiffness>3.0e-2</stiffness>
      ...
    </bone>
    <bone name="rightFemur"> ... </bone>
    ...
  </person>
  <person id="008"> ... </person>
  ...
</materialParameters>

Listing 2.2: Exemplary excerpt of an XML document storing material parameters accessed by the workflow shown in Figure 1.1. Parameter values are written in scientific E notation and assume standard physical units.

2.2.3 XML Database Systems

XML database systems, which are able to natively process XML data, are used in some, but rather few simulation applications. One use case is the material parameters accessed by the workflow shown in Figure 1.1. Material parameters usually differ between individual bones of individual persons to be simulated. So, a large XML document or even multiple documents may be needed to describe multiple sets of such material parameters. Depending on the number of these sets, a database system offering an efficient and robust data management may be a better choice than relying only on XML documents stored in a file system.

The major data containers managed by XML database systems are (1) individual XML documents, as well as (2) document collections, which comprise several XML documents. Listing 2.2 exemplifies an XML document storing the material parameters accessed by the simulation workflow shown in Figure 1.1. This document represents the material parameters of several bones and of several persons. XML imposes a hierarchical base structure that describes data via nested tag-based elements of the form <elementName>elementData</elementName> [W3C15]. The hierarchical nesting of such tag-based elements also corresponds to a tree-based data structure. In addition, each element may contain a list of attributes written as attributeName="attributeValue".

FOR $bone IN fn:doc("materialParameters.xml")//person[@id="007"]/bone
LET $bone_name := fn:data($bone/@name)
WHERE $bone_name = "leftFemur"
ORDER BY $bone/*/name() ASCENDING
RETURN <materialParameters>{$bone/*}</materialParameters>

Listing 2.3: Exemplary XQuery FLWOR statement to retrieve an ordered sequence of all material parameters of person ’007’ and bone ’leftFemur’ from the XML document ’materialParameters.xml’ depicted in Listing 2.2.

In the document shown in Listing 2.2, for instance, such attributes are used as keys to uniquely identify a particular person or bone, respectively. The restriction to a hierarchical base structure again entails less flexibility with respect to data structures than offered by file or operating systems. Nevertheless, this flexibility is still higher than in case of relational database systems. While a database table may be represented via a simple sequence of equally structured XML elements, XML additionally allows for embedding virtually arbitrary elements in a likewise arbitrary hierarchy.

XML database systems usually support several standardized command or query languages, while each supported language is tailored to a specific purpose. The purpose of the XML Path Language (XPath) is to define path expressions that enable the navigation through the hierarchical structure of an XML document [W3C14a]. As a result, such an XPath expression delivers specific elements or fragments of the whole document that qualify for the path expression. The XML Query Language (XQuery) and corresponding FLWOR statements (For, Let, Where, Order by, Return) provide more sophisticated query capabilities [W3C14b]. Listing 2.3 shows an XQuery FLWOR statement that delivers a sequence of some material parameters stored in the XML document depicted in Listing 2.2. The FOR clause iterates over the set of bones of the person with id "007" using an XPath expression to select proper XML elements. The LET clause then binds the name of the current bone to the variable bone_name, while the WHERE clause specifies a filter according to bones that are called "leftFemur". ORDER BY is used in this example to sort the output according to the name of the material parameters in ascending order, e. g., hydraulicConductivity followed by solidDensity and stiffness. Finally, the RETURN clause specifies how to construct the output of the query.

In addition to this query capability, the XML Schema Definition (XSD) language may be used for data definition, i. e., for defining an XML-based data structure certain XML documents need to comply with [W3C04b, W3C04c, W3C04d]. The XQuery Update Facility is used to insert, update, and delete data [W3C11]. A complementary solution for data manipulation that may be used both in isolation and together with XQuery Update are XSLT scripts (Extensible Stylesheet Language Transformations) [W3C07c]. Furthermore and similarly to the SQL extension PSM, XQuery facilitates the declaration and implementation of additional functions that may be used to extend the features of the pure language. In summary, all these languages used for XML database systems provide a similarly full range of features as SQL and its extension parts offer for relational database systems. Likewise, much research has been conducted to enhance the theoretical and algebraic basis of XQuery and related languages. For instance, Ré et al. propose an algebra and an algebraic compiler for XQuery statements [RSF06]. They combine and extend ideas of (1) a tuple-based algebra that bears resemblance to the relational algebra [MHM04] and of (2) a tree-based algebra that is tailored to XML [CJLP03]. This results in a very expressive algebra that may be used to implement XQuery statements. Grust et al. discuss how XQuery statements may be expressed only via the standard relational algebra and executed using a purely relational XQuery processor [GT04]. In other words, they show that the relational algebra is sufficient to express and implement the majority of the constructs used in XQuery statements. In conclusion, the relational algebra and the major XQuery-specific algebras – and thus also the languages SQL and XQuery – have a similarly high expressive power.
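To illustrate the data manipulation capabilities mentioned above, the following hedged sketch uses the XQuery Update Facility to add a further material parameter to the document of Listing 2.2; the element name solidDensity is taken from the examples above, whereas the concrete value is purely hypothetical:

  insert node <solidDensity>1.24</solidDensity>
    into fn:doc("materialParameters.xml")//person[@id="007"]/bone[@name="leftFemur"]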

2.2.4 Sensor Networks and Gateways to Them

The major computer-based simulation applications that use sensor networks as data resources are motivated by problems investigated in the area of environmental science [GGCFEl07, Ace12, BBBD12, Hun12]. Probably the most prominent examples are earthquake simulations [KTJT03, HI08, COJ+10], as well as weather predictions or climate system modeling [Lyn08, Tre10]. Here, sensor networks usually provide real-time input data for the respective mathematical simulation models. These models then allow for predicting whether hazardous earthquakes or weather conditions, e. g., hurricanes, may occur in the near future and how such phenomena might geographically spread or move.

A sensor network consists of several geographically distributed sensor nodes, i. e., each sensor node is installed at a distinct location [DP10, OR11]. Adjacent sensor nodes usually have wireless connections between each other enabling communication and data exchange within the sensor network. Each sensor node embeds one or more sensors, each of which captures one relevant measure, e. g., seismic signals, temperature, or wind force. Further components that are relevant for data management are memory for temporary data storage, receivers and transmitters for communication between sensor nodes, as well as processing units, e. g., CPUs, for data aggregation or pre-analysis [OR11]. In a similar way as done for numerical simulation calculations, the measures of sensors are temporally discretized, i. e., each sensor samples its measure for consecutive, discrete time steps. Note that the geographic distribution of sensor nodes at distinct locations likewise corresponds to a spatial discretization of the measures. Individual sensor nodes and/or the whole sensor network are managed by a corresponding operating system, e. g., TinyOS3 [LMP+05]. Such operating systems may typically be accessed via application programming interfaces (APIs) or command line interfaces (CLIs). These APIs or CLIs offer basic operations to configure a sensor node, to read individual measured values from it, to transmit these values between several connected sensor nodes, or to wait for certain events, e. g., until a measured value exceeds a threshold. More sophisticated features need to be additionally implemented and embedded into sensor nodes, e. g., TinyOS allows for writing embedded C code. Gateways to whole sensor networks provide an alternative that reduces the implementation overhead for such sophisticated features. Typically, this is achieved by more expressive command or query languages and corresponding query processing engines. For instance, TinyDB4 offers a SQL-like query language to extract data from a whole sensor network, where each sensor node is individually managed by TinyOS [MFHH05]. The kinds of data containers that are managed by different gateways to sensor networks are usually very proprietary. The default data container managed by TinyDB, for instance, is a table called sensors that summarizes all sampled measures of all sensor nodes in the relevant network [MFHH05]. Note that this is only a virtual database table, i. e., its tuples are only materialized as soon as a query is issued against TinyDB, and only those tuples are materialized that qualify for the query. Furthermore, the tuples are usually deleted again

3TinyOS: http://www.tinyos.net/
4TinyDB: http://telegraph.cs.berkeley.edu/tinydb/

1 SELECT AVG(fine_dust), urban_district
2 FROM sensors
3 WHERE city = 'Stuttgart'
4 GROUP BY urban_district
5 HAVING AVG(fine_dust) > 50
6 SAMPLE PERIOD 1h FOR 7d

Listing 2.4: Exemplary query to a TinyDB gateway delivering average values of fine dust that exceed the permitted maximum of 50 µg/m3 in certain urban districts of the city of Stuttgart in Germany.

after a short period of time, e. g., as soon as the query results are delivered to the client application. Each row of the table sensors represents the measured values of a particular sensor node and of a particular time step for which the sensor node samples its measures. Some of the table columns represent the (composite) key of a row, e. g., a sensor node id and a time stamp. Other columns, amongst others, correspond to the individual measures sampled by the sensor nodes, e. g., temperature or humidity. Hence, TinyDB imposes a specific tabular structure with specific column types for its single default data container. This leads to only a low flexibility regarding data structures, especially compared to relational database systems, which allow for arbitrary tabular data structures. Alternative data containers of TinyDB are so-called materialized storage points. They represent a buffer in which results of a TinyDB query may be stored for a longer period of time and then be re-used by other queries. The command or query languages offered by gateways to sensor networks are typically proprietary as well. In general, a corresponding query specifies (1) which attributes or measures shall be retrieved, (2) from which spatial region and in which spatial resolution, as well as (3) for which period of time and for which sampling intervals, e. g., time steps [BKR11]. Listing 2.4 shows an exemplary SQL-like query issued against the default table sensors of a TinyDB gateway (FROM). This query delivers the measure fine dust (SELECT) in the spatial region of the city Stuttgart in Germany (WHERE). Thereby, the spatial resolution of the query corresponds to individual urban districts in Stuttgart, i. e., the query calculates average fine dust values for each urban district (SELECT and GROUP BY). In other words, each urban district may have several geographically distributed sensor nodes measuring fine dust, and the query averages the values of all sensor nodes within each individual urban district. It also includes a filter to deliver only the values of those urban districts that exceed the permitted maximum of 50 µg/m3 (HAVING). Finally, the query specifies that all values shall be measured and delivered with a temporal sampling interval of one hour and for a total period of seven days (SAMPLE PERIOD). The set of further built-in functions that come with TinyDB includes temporal aggregations of measures or event-based triggering of query execution [MFHH05]. It is also possible to integrate user-defined functions within TinyDB queries, e. g., to signal events or to initiate other physical actions as part of a query result. Together with the restriction to querying data solely from sensor networks, the expressive power of command or query languages offered by corresponding gateways is usually only moderate. So, these languages are less expressive than SQL or XQuery for database systems (see Sections 2.2.2 and 2.2.3). TinyDB, for instance, does not allow for sorting the results of a query that accesses the table sensors. The reason is that this table is an unbounded data stream of values and thus does not allow for such blocking operations [MFHH05]. Furthermore, relational joins are only possible in case at least one of the tables to be joined is a materialized storage point, i. e., self-joins of the table sensors are not feasible. Likewise, TinyDB forbids the usage of sub-queries that are directly nested within another query accessing this table.
The data manipulation and data definition features are limited as well, especially because sensor networks may usually act as data source only, but not as data sink (see Figure 2.5). It is only possible to temporarily store a query result into a materialized storage point. However, such buffered data cannot be altered or re-structured later on.
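As a rough illustration of such a materialized storage point, the following sketch loosely follows the storage point syntax described by Madden et al. [MFHH05]; the buffer name, its size, and the selected measures are hypothetical:

  CREATE STORAGE POINT recent_fine_dust SIZE 8
    AS (SELECT nodeid, urban_district, fine_dust
        FROM sensors
        SAMPLE PERIOD 1h)

Other queries may then join against this buffer, which is exactly the case in which TinyDB permits relational joins at all.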

2.3 Workflows and Workflow Languages

Workflows have a long track record in supporting the automation and acceleration of business processes [LR00, Wes12]. A process model specifies the individual tasks of a real-world business process and the order in which these tasks need to be executed to achieve the corresponding process goal. Concrete executions of such a process model are called process instances [LR00]. The parts of a process model that are automatically coordinated by a computer system are captured using a workflow model. A workflow (modeling) language encompasses means and constructs how to specify or to design particular workflow models. Such workflow languages usually allow for designing workflow models as compositions of relevant tasks that are arranged according to causal and/or data dependencies. The workflow models may then be executed by a language-specific workflow management system (WfMS), where individual workflow executions are analogously called workflow instances [LR00]. Recently, the general concept of workflows has also found application in the area of computational science, and the term scientific workflow has been coined [TDG07]. Different application areas imply different requirements for workflows, workflow languages, and workflow systems. Section 2.3.1 therefore encompasses a classification of different kinds of workflows, i. e., business workflows, scientific and simulation workflows, as well as other related kinds. Afterwards, a demarcation of available types of workflow languages and workflow systems is given in Section 2.3.2. Finally, Section 2.3.3 describes features of a de-facto standard workflow language, namely the OASIS standard Web Services Business Process Execution Language (WS-BPEL or BPEL for short) [OAS07].

2.3.1 Classification of Workflows

Figure 2.6 shows a classification of different kinds of workflows that are relevant in the context of either business or scientific applications [RSM11]. The main classes are orchestration workflows and data-intensive workflows, and the figure associates these main classes with respective sub-classes. Furthermore, it shows which kinds of workflows are usually included in the set of scientific workflows. These are workflows that are essentially motivated by scientific applications, i. e., simulation workflows, data analysis workflows, and data curation workflows. The main focus of orchestration workflows is to compose or integrate different and heterogeneous applications, as well as to define their execution order and the way they interact with each other [LR00, Ley05]. These orchestration workflows originate from the area of business applications, where business workflows or production workflows realize and automate business processes [LR00, Wes12]. Another sub-class are service provisioning workflows, e. g., to provision or deploy certain applications in cloud environments [OAS13, VHKL13, BBK+14]. Examples of such cloud-based applications are virtualized infrastructure, platform, or software services [MG11]. The most common solutions to integrate various applications in orchestration workflows are the Service-oriented Architecture (SOA) and especially the Web Services technology [Ley03, WCL+05]. Hence, such workflows are oftentimes also called service orchestrations [Ley05].

Figure 2.6: Coarse-grained classification of workflows that are relevant for business or scientific applications; cf. [RSM11].

In contrast to orchestration workflows, data-intensive workflows treat data and their processing as first-class citizens, instead of considering applications and services as the major artifacts. Such data-intensive workflows typically carry out a multiplicity of complex data processing steps that access huge amounts of possibly distributed and heterogeneous data. Their main sub-class are data analysis workflows or pipelines [TDG07, SR09]. Their goal is to provide new insights from existing data in order to facilitate and accelerate data-intensive scientific discovery [HTT09]. Typical operations are object identification, feature discovery, and pattern matching or recognition. As an example, consider a workflow using similarity search and classification to detect patterns among chemical compounds [KWK+09]. Other major examples aim at searching for certain structures or properties in proteins or genomes [BTS07, DCBS+10]. Nevertheless, workflows of this kind may also employ techniques for data visualization, instead of only relying on purely analytical approaches [FSC+06]. The second sub-class of data-intensive workflows encompasses data curation workflows [DCM+12, Son15]. Their goal is the long-term preservation of massive amounts of data, especially ensuring that data is always fit for contemporary purpose and that it may be discovered and re-used over time [LMLG04]. This mainly includes activities to assess and improve data quality [Sad13], e. g., based on corresponding data cleaning techniques [RD00]. The main goal of data integration workflows is to provide data integration or provisioning processes for superordinate applications or workflows. This encompasses, amongst others, sophisticated operations for loading or retrieving a bulk of data, filtering a data set, or merging two data sets [MMLW05]. Furthermore, such workflows may also carry out data cleaning techniques, i. e., to improve data quality in a similar way as

Figure 2.7: The common phases of a simulation workflow shown in Figure 2.4 and whether individual sub-phases may rather be implemented via orchestration workflows (grey background) or data-intensive workflows (white background). Depending on the concrete simulation example, the calculation phase may either be treated as orchestration workflow or data-intensive workflow.

done by data curation workflows [RD00]. Prominent examples of data integration workflows are ETL processes, e. g., to upload data into a data warehouse [VZ14] or to collect and pre-process scientific or simulation data [RLSR+06, RRS+11, RS14]. The set of examples additionally includes data mashups that facilitate flexible and ad-hoc data integration scenarios [DM14, HRWM15]. Simulation workflows, e. g., the running example depicted in Figure 1.1, are the last relevant sub-class of workflows and the main focus of this thesis. As indicated in Figure 2.6, simulation workflows cannot be clearly associated with either the class of orchestration workflows or the class of data-intensive workflows. Figure 2.7 depicts which of the common sub-phases of a simulation workflow shown in Figure 2.4 are usually treated as orchestration workflow or as data-intensive workflow. The main purpose of simulation workflows is to execute a set of long-running numerical calculations in their calculation phase. So, they need to coordinate the execution order of and the interaction between the different simulation software packages that implement these numerical calculations. This focus on execution order and interaction of applications makes the calculation phase of a simulation workflow a good candidate to be treated as orchestration workflow [GSK+11]. This also holds for the sub-phases platform provisioning, simulation software provisioning, and visualization software provisioning that may be implemented via variants of service provisioning and thus orchestration workflows [VHKL13]. Nevertheless, some phases of a simulation workflow may also be implemented via data-intensive workflows. In particular, the sub-phases data provisioning and result provisioning are common examples of data integration workflows, as they provision and prepare the input data of other phases. Even more data-intensive data integration workflows have to be executed in multi-scale simulation couplings. Here, complex operations for data filters, data format conversions, and especially interpolations and extrapolations are required to overcome the differences of different simulation tools and of varying scales in numerical calculations. Finally, the sub-phase result visualization may also be seen as a special variant of a data analysis workflow. Although it is often more appropriate to treat the calculation phase of a simulation workflow as orchestration workflow, a few examples of simulation calculations are rather tailor-made for the data-intensive counterpart. For instance, this concerns the examples originating from environmental science illustrated in Section 2.2.4, since these examples usually involve a continuous and real-time processing of massive amounts of sensor data.

2.3.2 Workflow Languages and Workflow Systems

The different classes of workflows depicted in Figure 2.6 imply different requirements for workflow languages and workflow systems. The major types of workflow languages may be distinguished into dataflow-oriented and control-flow-oriented languages [LWMB09, RSM11]. Table 2.2 summarizes the main differences between these two concepts of workflow languages. In particular, it indicates (1) the key artifacts considered by relevant languages, (2) the major benefits typically provided by corresponding workflow systems, and (3) the workflow classes shown in Figure 2.6 to which the languages are usually tailored. A dataflow forms a directed graph, where the nodes are the individual tasks of a workflow and the edges define data dependencies between these tasks [JHM04]. Each task is associated with a set of named ports for receiving or sending data, i. e., one or more input ports and one or more output ports. The data dependencies of a dataflow then connect output ports and input ports of different tasks. Thereby, they correspond to unidirectional channels over which data streams or individual data items are sent between the tasks [Mor11]. All tasks of a dataflow are active at the same time and wait for a certain number or combination of data items that arrive at their input ports. Then, each task processes arrived data items according to its functional definition. For example, it transforms data items to another format or executes more sophisticated analytical operations as illustrated for data analysis workflows in Section 2.3.1. Afterwards, a task forwards the resulting data items over its output ports and over the outgoing data channels either to the input ports of subsequent tasks in the dataflow or to the final output of the workflow. The previous task may then concurrently process further data items that have meanwhile arrived at its own input ports.

Table 2.2: Demarcation and main characteristics of dataflow-oriented and control-flow-oriented workflow languages and corresponding workflow systems.

Language concept: Dataflow
  Key artifacts: Data dependencies and data processing
  Major benefits of corresponding systems: Efficient execution of single workflow instances; capturing / reconstructing data provenance
  Supported workflows: Data-intensive workflows

Language concept: Control flow
  Key artifacts: Causal dependencies and decisions
  Major benefits of corresponding systems: Robust execution and high throughput of multiple workflow instances; modular workflow design and many expressive modeling constructs; standard-based solutions
  Supported workflows: Orchestration workflows, but increasingly data-intensive workflows

This data-centric focus makes dataflow-oriented workflow languages the intuitive approach to model and implement most kinds of data-intensive workflows shown in Figure 2.6. The dataflow concept enables various kinds of optimization techniques to ensure the efficient execution of single instances of these data-intensive workflows. This encompasses data parallelism within individual workflow tasks or pipeline parallelism between several tasks [PA06, LAB+09]. In addition, some solutions allow for seamlessly employing scalable data processing infrastructures, such as high performance computing (HPC) environments or MapReduce [COdO+10, ZBKL10]. Another benefit is that a dataflow allows for linking the conceptual data specifications in the workflow model with the actually processed or generated data after a workflow has been executed. This significantly helps to capture, reconstruct, and understand the provenance of data, facilitating the reproducibility of workflow execution [MBBL15]. Many products exist that offer proprietary solutions in terms of modeling languages, optimization techniques, execution engines, and provenance support. For instance, this includes data stream processing platforms, e. g., IBM Streams5 or NexusDS [CEB+09]. Moreover, most scientific workflow systems exploit a dataflow-oriented approach. Examples are Kepler [LAB+06], Pegasus [DSS+05], Taverna [OGA+06], Triana [CGH+06], and VisTrails [FSC+06].

5IBM Streams: http://www-03.ibm.com/software/products/en/ibm-streams

A control flow likewise constitutes a directed graph with workflow tasks being the nodes of the graph. However, the edges are not data dependencies, but causal dependencies [LR00, Wes12]. These causal dependencies define a strict execution order among all workflow tasks. Thereby, a workflow instance starts with one of the tasks that constitute the initial nodes of the graph, i. e., that do not have any incoming edges. Afterwards, the execution of any other task may only start after all preceding tasks within the control flow graph have successfully and completely finished their execution. Workflow execution ends when one of the final tasks, which do not have any outgoing edges, has been executed successfully. Most workflow languages also allow for expressing imperative or procedural elements. For instance, this includes gateway tasks supporting the conditional execution of a subset of several outgoing control flow branches. Other examples are loop structures that embed another control flow of tasks as sub-workflow and that define how to repeat the execution of this sub-workflow [LR00, OAS07, Wes12]. The major use cases of control-flow-oriented workflow languages are orchestration workflows. Hence, this includes business workflows, service provisioning workflows, and many parts of simulation workflows as shown in Figure 2.7. Especially business workflows often require the simultaneous execution of a large number of workflow instances [LR00]. Most of the workflow systems relying on control-flow-oriented languages support this high workload via transactional features and solutions to workflow recovery [Ley95, EL96]. So, these systems ensure the robust execution and a high throughput of multiple workflow instances, instead of focusing on single instances only. Other benefits are the possibilities for a modular workflow design and the support of expressive modeling constructs. Examples are sophisticated control flow branches, many features to interact with other workflows or users, as well as fault, compensation, and event handling capabilities at the workflow level [LR00, OAS07, Wes12]. Furthermore, the existence of the OASIS standard WS-BPEL [OAS07] leads to standard-based solutions for workflow systems. This in turn entails extensive tool support and a good portability of workflow models between different workflow systems [GSK+11]. Nevertheless, control-flow-oriented workflow languages are increasingly adopted for data-intensive workflows as well [MMLW05, BHW+07, Slo07, GHCM09, SSH+10, GSK+11]. This is in part due to the benefits offered by these languages and corresponding workflow systems as described above. Transaction and recovery concepts are important to ensure the robust and reliable execution of long-running data-intensive workflows [AMA06]. The offered expressive modeling constructs may simplify modeling complex workflows, e. g., when these workflows require some degree of control flow or certain fault handling capabilities [SSH+10]. Furthermore, the standard-based solutions, e. g., as provided by WS-BPEL, lead to common, unified modeling languages for both data-intensive workflows and related orchestration workflows. This is not only interesting for single simulation workflows, which anyway consist of data-intensive and orchestration (sub-)workflows, as shown in Figure 2.7. It moreover allows for a generic approach to model and execute different workflows of various scientific domains, i. e., in multi-scale simulation couplings.
It even facilitates the combination of scientific and business processes, e. g., consider a crash test simulation embedded in a vehicle development and testing process [JMG11]. Resulting detailed benefits of this generic approach include a seamless modeling environment, which avoids the overhead of getting accustomed to many different tools or technologies [CWGN11]. In addition, it enables a holistic workflow optimization across different kinds of data-intensive and orchestration workflows [VSS+07]. Another complementary motivation to use control-flow-oriented workflow languages for data-intensive workflows is given by approaches to recover dataflow information from a particular control flow of workflow tasks [KKL08a, KKL08b]. This additionally offers some of the major benefits that are otherwise provided by dataflow-oriented workflow languages and corresponding workflow systems. In other words, it facilitates the optimized execution of single workflow instances [BHW+07, VSS+07]. Furthermore, it helps to reconstruct data provenance information for reproducibility purposes [MSK+15, MBBL15]. In summary, control-flow-oriented workflow languages provide data-intensive workflows with many additional benefits, while they may still offer some of the major features that are provided by dataflow-oriented languages. The question whether to use either control-flow-oriented or dataflow-oriented languages of course cannot be universally answered for all examples of data-intensive workflows. Nevertheless, the above discussion emphasizes that control-flow-oriented languages may be good candidates to model and execute simulation workflows. This is especially underpinned by the facts that (1) simulation workflows are anyway combinations of both data-intensive and orchestration (sub-)workflows and that (2) this leads to a generic, standard-based approach facilitating multi-scale simulation couplings. Due to these reasons, this thesis treats control flow as the main concept for workflow languages realizing simulation workflows. Furthermore, it discusses some extensions to control-flow-oriented workflow languages that make these languages even more suitable for data-intensive workflows (see especially Section 1.2.2). Likewise, the de-facto standard WS-BPEL is used as the basis to develop all prototypical implementations of workflows, as well as of extensions to workflow languages and workflow systems. However, note that this thesis still discusses whether and how all proposed concepts and extensions may also be applied to dataflow-oriented workflow languages.

2.3.3 Web Services Business Process Execution Language

WS-BPEL (or BPEL for short) is the de-facto standard language to model and execute workflows based on the control-flow-oriented orchestration of service interactions [OAS07]. It fosters the concept of a Service-oriented Architecture (SOA) and especially the Web Services technology [Ley03, WCL+05]. Hence, the services orchestrated in a BPEL workflow are provided as Web Services. The first main part of the Web Services technology is the Simple Object Access Protocol (SOAP) [W3C07a]. It is a lightweight protocol for exchanging structured information – encoded in XML-based messages – in a distributed environment. In the context of BPEL, SOAP is used to exchange messages between a BPEL workflow and the Web Services the workflow orchestrates. The interfaces of Web Services are described using the likewise XML-based Web Services Description Language (WSDL) [W3C01, W3C07b]. Note that the interface of a BPEL workflow itself is described via WSDL, too. So, BPEL workflows may seamlessly call other BPEL workflows, enabling a hierarchical workflow design [OAS07]. BPEL employs WSDL version 1.1 [W3C01, OAS07]. Listing 2.5 shows an excerpt of exemplary WSDL definitions that describe the abstract interface of a simple Web Service delivering stock quotes [W3C01]. Firstly, this WSDL document contains definitions of data types, e. g., using XSD. These data types are used to encode certain parts of the subsequently defined messages the service may receive or send. The WSDL document shown in Listing 2.5 defines two messages getLastTradePriceInput and getLastTradePriceOutput. Each of these messages has a part named body of the previously defined types tradePriceRequest and tradePrice, respectively. Thereupon, port type definitions declare a set of named operations that correspond to the actions supported by the service. The declaration of an operation specifies its input and output messages, as well as possible fault messages. The example WSDL document shown in Listing 2.5 defines one port type with one operation. The first message getLastTradePriceInput

1  <definitions name="stockQuote"
2      targetNamespace="http://example.com/stockquote.wsdl"
3      xmlns:tns="http://example.com/stockquote.wsdl"
4      xmlns:xsd1="http://example.com/stockquote.xsd"
5      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
6      xmlns="http://schemas.xmlsoap.org/wsdl/">
7    <types>
8      <xsd:schema targetNamespace="http://example.com/stockquote.xsd">
9        <xsd:element name="tradePriceRequest" type="xsd:string"/>
10       <xsd:element name="tradePrice" type="xsd:float"/>
11     </xsd:schema>
12   </types>
13   <message name="getLastTradePriceInput">
14     <part name="body" element="xsd1:tradePriceRequest"/>
15   </message>
16   <message name="getLastTradePriceOutput">
17     <part name="body" element="xsd1:tradePrice"/>
18   </message>
19   <portType name="stockQuotePortType">
20     <operation name="getLastTradePrice">
21       <input message="tns:getLastTradePriceInput"/>
22       <output message="tns:getLastTradePriceOutput"/>
23     </operation>
24   </portType>
25   ...
26 </definitions>

Listing 2.5: Excerpt of an exemplary WSDL document defining the abstract interface of a service delivering stock quotes; cf. [W3C01].

described above constitutes the input of this operation, whereas it delivers the message getLastTradePriceOutput as output. Additional WSDL constructs not shown in Listing 2.5 allow for defining specific service bindings and a set of service ports. A service binding is associated with a WSDL port type and defines the concrete transport protocol and physical message formats to be used for communicating with the Web Service. A service port can be seen as a concrete instance of a port type, i. e., it defines an endpoint of a particular service deployment that supports the relevant port type. Such an endpoint definition corresponds to a combination of an appropriate service binding and an endpoint address, e. g., a Uniform Resource Locator (URL).
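The following hedged sketch illustrates how such binding and service definitions might look for the port type of Listing 2.5; the SOAP binding style, the service name, and the endpoint address are hypothetical:

  <binding name="stockQuoteSoapBinding" type="tns:stockQuotePortType">
    <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="getLastTradePrice">
      <soap:operation soapAction="http://example.com/getLastTradePrice"/>
      <input><soap:body use="literal"/></input>
      <output><soap:body use="literal"/></output>
    </operation>
  </binding>
  <service name="stockQuoteService">
    <port name="stockQuotePort" binding="tns:stockQuoteSoapBinding">
      <soap:address location="http://example.com/stockQuote"/>
    </port>
  </service>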

Listing 2.6 exemplifies key elements of the workflow language BPEL in that it specifies a BPEL workflow implementing the Web Service interface defined in Listing 2.5. Therefore, the BPEL workflow specification firstly imports the corresponding WSDL document (line 5 in Listing 2.6). The remainder of the workflow is subdivided into global declarations and the actual execution logic. The global declarations of a BPEL workflow usually start with the definition of a set of partner links. Each partner link is associated with a partner link type as shown in Listing 2.6. Partner link types are BPEL-specific WSDL extensions [OAS07]. They define feasible conversational relationships between the port types of two Web Services. Furthermore, they associate these port types with abstract roles each of the services may play in a conversation. A partner link definition in a BPEL workflow then concretizes whether a particular role is played by the BPEL workflow itself (myRole as shown in Listing 2.6) or by another partner Web Service (partnerRole). This likewise indicates whether the port type associated with this role is provided by the BPEL workflow or by a partner Web Service. Moreover, such partner links or their roles act as containers for concrete endpoint references to corresponding workflow or service deployments, e. g., using Web Services Addressing (WS-Addressing) [W3C04a]. In addition, BPEL makes use of variables to represent the state of a particular workflow instance [OAS07]. Listing 2.6 exemplifies the declaration of two variables request and response. The attribute messageType within these declarations indicates that the variables act as containers for storing the messages received or sent by the BPEL workflow. Furthermore, such an attribute points to the concrete WSDL message definition specifying the kind and structure of a corresponding message. In this example, request constitutes the input message and response the output message of the BPEL workflow shown in Listing 2.6 and of the underlying Web Service defined in Listing 2.5. Nevertheless, the state of a BPEL workflow not only consists of its messages, but also of intermediate data that may be used in the control flow logic or to compose certain parts of messages. For that purpose, BPEL allows for declaring further kinds of variables using XML Schema types or XML Schema elements [W3C04d, OAS07]. BPEL features additional kinds of global declarations that are not included in the BPEL workflow shown in Listing 2.6 [OAS07]. For instance, fault handlers enable the definition of activities that must be triggered when certain faults happen during workflow execution. This includes standard or user-defined faults that may be caused by the BPEL workflow itself or by the Web Services this

1  <process name="stockQuoteProcess"
2      targetNamespace="http://example.com/stockquote/bpel"
3      xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
4      xmlns:sqw="http://example.com/stockquote.wsdl">
5    <import location="stockquote.wsdl"
6            namespace="http://example.com/stockquote.wsdl"
7            importType="http://schemas.xmlsoap.org/wsdl/"/>
8    <partnerLinks>
9      <partnerLink name="client"
10                  partnerLinkType="sqw:stockQuotePartnerLinkType"
11                  myRole="stockQuoteProvider"/>
12   </partnerLinks>
13   <variables>
14     <variable name="request" messageType="sqw:getLastTradePriceInput"/>
15     <variable name="response" messageType="sqw:getLastTradePriceOutput"/>
16   </variables>
17   <sequence>
18     <receive partnerLink="client" portType="sqw:stockQuotePortType"
19              operation="getLastTradePrice" variable="request"
20              createInstance="yes"/>
21     <assign>
22       <copy>
23         <from>
24           <literal><tradePrice xmlns="http://example.com/stockquote.xsd">62.5</tradePrice></literal>
25         </from>
26         <to variable="response" part="body"/>
27       </copy>
28     </assign>
29     <reply partnerLink="client" portType="sqw:stockQuotePortType"
30            operation="getLastTradePrice" variable="response"/>
31   </sequence>
32 </process>

Listing 2.6: Example of a BPEL workflow implementing the Web Service interface defined in Listing 2.5.

workflow calls. In a similar way, event handlers run concurrently to the ordinary workflow execution and wait for certain events. Such events may correspond to (1) messages that are exceptionally sent to the relevant workflow instances or to (2) user-defined alarms that, e. g., are triggered after a specific time period has elapsed. Each event handler then again defines a set of activities that must be executed upon the occurrence of the relevant event. To define the actual execution logic of a BPEL workflow, i. e., a control flow of several workflow tasks, BPEL offers various types of activities [OAS07]. The set of activity types is subdivided into basic activities and structured activities. Basic activities represent elementary actions, such as sending or receiving a message. In contrast, structured activities embed other basic or again structured activities and define the control flow among these embedded activities. Thereby, the execution logic of a BPEL workflow is specified by one main activity [OAS07]. Since a typical workflow does not consist of only one elementary workflow task, this main workflow activity is commonly a structured activity. The example workflow shown in Listing 2.6 employs one structured sequence activity that embeds three basic activities receive, assign, and reply. The sequence activity indicates that these embedded activities are to be executed sequentially in lexical order. The receive activity waits for a certain message to arrive. Its major attributes point to a previously defined partner link indicating the communication partners of this message exchange, as well as to the WSDL port type and operation to be implemented by the receive activity. The attribute variable refers to the message variable storing the arrived message. Furthermore, createInstance may be used to specify whether the arrival of a message in a receive activity triggers a new BPEL workflow instance or not. The subsequent assign activity copies an XML literal value to the message part body of the message variable response. The final reply activity sends the content of this message variable back to the client of the BPEL workflow by using the same partner link, port type, and operation as the previous receive activity. The assign activity of BPEL facilitates much more complex variable modifications than illustrated by the simple example shown in Listing 2.6 [OAS07]. Firstly, it may encompass multiple copy statements that are all executed in one single transaction. Each copy statement may embed sophisticated from and to specifications that respectively determine which source data shall be copied to which target. This includes built-in functions, e. g., for copying individual WSDL message parts or the endpoint references stored in partner links. The expressive power may even be increased by using expression or query languages within from and to specifications. By default, BPEL supports XPath version 1.0 [W3C99] and XSLT [W3C07c]. Modern workflow systems even extend the BPEL standard to support newer and more powerful expression or query languages [KGK+11]. For instance, the Apache Orchestration Director Engine (Apache ODE) supports XPath version 2.0 [W3C10a] and XQuery version 1.0 [W3C10b]6.
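As an illustration of such expressions, the following hedged sketch shows a copy statement that uses an XPath expression within its from specification; the variable logMessage is hypothetical and not part of Listing 2.6:

  <assign>
    <copy>
      <from>concat('last trade price: ', string($response.body))</from>
      <to variable="logMessage"/>
    </copy>
  </assign>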
Another major basic activity is the invoke activity that allows for calling a one-way or request-response WSDL operation of another partner Web Service or BPEL workflow [OAS07]. User-defined faults may be thrown from within a BPEL workflow by using the throw activity. The wait activity facilitates waiting for a specific time period or until a certain point in time has been reached. The counterpart of the structured sequence activity is the flow activity that enables the concurrent execution of its embedded activities. Additional link declarations may be used to specify explicit control flow dependencies between these embedded activities. Conditional control flow branches based on data expressions are provided by either these link declarations or by the if activity. The pick activity allows for an event-based selection of specific branches. Various kinds of loop structures to define the repeated execution of nested activities are offered by the while, repeatUntil, and forEach activities. The scope activity represents a nested scope in the BPEL workflow with its own declarations for variables, partner links, fault handlers, event handlers, and other modeling constructs. More information about all these or other activities, as well as about further BPEL constructs, can be found in the BPEL specification [OAS07]. All these various kinds of activities and modeling constructs make BPEL a highly expressive control-flow-oriented workflow language. BPEL even allows for extending the standard language and thus for further increasing its expressive power. In particular, the possibilities for variable modifications offered by the assign activity may be extended by so-called extension assign operations. Other explicit extension constructs are extension activities that facilitate completely new and customized activity types. Prominent examples enable the coordinated execution of sub-processes that may be re-used across multiple BPEL workflows [KKL+05], as well as the integration of human interactions within individual workflows [OAS10]. Especially this extensibility makes BPEL a good choice as the base workflow language that may be extended by new data-aware activity types as proposed by this thesis (see, for instance, Section 1.2.2).
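To sketch the extension mechanism itself, a custom activity type is declared via an extension namespace and then embedded into the control flow as an extensionActivity. The namespace and the activity retrieveData below are purely hypothetical placeholders and do not denote the concrete extensions proposed later in this thesis:

  <extensions>
    <extension namespace="http://example.com/bpel/extensions" mustUnderstand="yes"/>
  </extensions>
  ...
  <extensionActivity>
    <ext:retrieveData xmlns:ext="http://example.com/bpel/extensions"
                      name="LoadMaterialParameters"/>
  </extensionActivity>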

6Apache ODE BPEL extensions: http://ode.apache.org/extensions/

Chapter 3 Data Management in Simulation Workflows

The previous chapter provides background information with respect to three separate topics that are relevant for this thesis: (1) computer-based simulations and mathematical simulation modeling, (2) the most commonly used data resources and the command languages these resources offer, as well as (3) workflow technology, workflow languages, and workflow systems. This chapter now combines these three topics in that it describes the most relevant aspects of data management in simulation workflows. Section 3.2 details common data characteristics of computer-based simulations in terms of typical kinds of data, their data formats, and data sizes. Afterwards, Section 3.3 illustrates basic data management patterns describing common operational aspects that frequently occur in simulation workflows. These common data characteristics and data management patterns have been derived by investigating the same use cases for simulations as described for the identification of research challenges in Section 1.2. So, this includes several academic use cases described in literature [Har96, GZC14] and especially a set of real-world simulations or related applications [KSR+11, KAK+13, Kra14, FDP15, FE11, RK11, Wag10]. Section 3.1 therefore gives a brief overview of the real-world simulations or related applications that are considered as main use cases in this thesis. Finally, Section 3.4 summarizes the major aspects of all these discussions and illustrations. Parts of this chapter correspond to revised and composite versions of excerpts of previous author publications that are cited at affected locations [RSM11, RS13b, RSM14a, RSM14b, RWWS14].

3.1 Considered Example Simulations

The following list summarizes the real-world example simulations or related applications that are considered as main use cases in this thesis:

• The first main use case is the simulation of structure changes in bones, i. e., the running example illustrated in Sections 1.1 and 2.1.1 [KSR+11, KAK+13, Kra14]. As described in these sections, this multi-scale simulation couples simulation models and methods from the scientific domains bio-mechanics and systems-biology.

• Simulations of catalysis reactions may be used to assess whether certain chemical compounds can serve as catalysts for certain chemical reactions. Rommel et al. investigate how the enzyme glutamate mutase catalyzes the conversion of glutamate into methyl aspartate [RK11]. This chemical reaction is mainly relevant for the metabolism of carbon-based life forms. The authors propose a multi-scale approach coupling different simulation models and methods of varying levels of granularity. This includes molecular-dynamic, molecular-mechanical, and quantum-mechanical calculations.

• Franzelin et al. study the damage and crack propagation in a plate that has been hit by a high-speed projectile [FDP15]. They use a macroscopic simulation method based on peridynamics [SA05]. Furthermore, they describe the overall setting as a data-intensive uncertainty quantification problem in order to investigate the sensitivity of the peridynamic method [IH88, LMK10].

• Elastic multi-body simulations, e. g., product tests in engineering or the holistic investigation of the human skeleton [Lar01, RKH+10], require solving highly complex numerical simulation models. Model reductions, e. g., as proposed by Fehr et al. [FE11], may be used to reduce the number of degrees of freedom in such complex numerical models. The major goal is to speed up subsequent simulation calculations, but without losing too much precision in the computation. So, model reductions are actually not simulations, but they prepare subsequent simulation calculations. Nevertheless, the workflows realizing model reduction processes have a very similar common structure as shown for simulation workflows in Figure 2.4 [Rem11]. So, they also imply similar challenges with respect to data provisioning.

• All the use cases listed above correspond to variants of simulation workflows according to the classification of workflows depicted in Figure 2.6. The

last use case considered in this thesis is however a data analysis workflow. The main intention behind including such a more data-intensive scientific workflow is to be able to better evaluate the generality of all approaches discussed in this thesis and to potentially broaden their scope of application. The use case is a protein modeling workflow identifying or investigating proteins that can solve specific chemical or biological problems [Wag10]. The methods used in this workflow hence originate from the scientific domains life sciences and bio-chemistry [BTS07]. The workflow firstly extracts a list of protein sequences from a protein database, such as GenBank [BKML+10]. It then iterates over this list and uses pattern matching to find important regions within individual protein sequences. For instance, it searches for amino acids that are needed for certain chemical reactions.

Altogether, the examples listed above constitute a wide set of applications covering different simulation models and methods from a multiplicity of important scientific domains. Hence, they represent a good basis for both developing and evaluating the approaches discussed in this thesis. The use case listed first, i. e., the running example of a bone simulation, is the largest and most complex scenario of all considered examples. So, it is used as the primary use case to discuss the most important aspects throughout the whole thesis. Furthermore, the protein modeling workflow is used at special locations where its data-intensive nature is more appropriate for the relevant illustrations, e. g., in Chapter 7. Nevertheless, note that all other examples listed above are still used occasionally, e. g., to discuss evaluation results regarding the generality of proposed approaches.

3.2 Characteristics of Simulation Data

As discussed in Sections 1.1 and 2.1.1, the most challenging scenarios of providing data for computer-based simulations can be classified into (1) providing and preparing heterogeneous input data for single simulation models and (2) exchanging data between different coupled simulation models. Sections 3.2.1 and 3.2.2 respectively deal with these two kinds of scenarios and describe the most relevant characteristics of simulation data and of their processing. Thereby and as discussed above, the bone simulation described by Krause et al. [KSR+11, KAK+13, Kra14] is used to exemplify the general discussion about common data characteristics.

Table 3.1: Common characteristics of simulation data.

Kind of data: Geometry
  Most common data formats: Proprietary text or binary files
  Typical data size: 100 KBs to 10 GBs

Kind of data: Material parameters
  Most common data formats: Proprietary text files; XML documents or XML database systems
  Typical data size: 1 KB to 1 MB

Kind of data: Boundary conditions
  Most common data formats: CSV-based files; seldom SQL database systems
  Typical data size: 10 KBs to 1 GB

Kind of data: Initial conditions
  Most common data formats: CSV-based files; sometimes SQL database systems
  Typical data size: 100 KBs to 10 GBs

Kind of data: Method parameters
  Most common data formats: Proprietary text files; seldom SQL database systems
  Typical data size: 1 KB to 1 MB

Kind of data: Simulation output
  Most common data formats: CSV-based files; SQL database systems; proprietary text or binary files
  Typical data size: 1 MB to 100 TBs

3.2.1 Provisioning of Input Data for Single Simulations

The main part of the running example of a bone simulation is the bio-mechanical simulation model, as illustrated in Sections 1.1 and 2.1.1. This simulation model is numerically realized by the Pandas software1, which is based on the Finite Element Method (FEM) [ZTZ13]. The FEM and related numerical methods are the most popular solutions to discretize and solve the majority of real-world dynamic simulation models [GRS07]. The primary kinds of input data for such numerical methods describe (1) a geometry, (2) material parameters, (3) boundary conditions, (4) initial conditions, and (5) method-specific parameters (see also Section 2.1.1, as well as Figures 1.1 and 2.1). The major output data of a simulation correspond to the unknown variables of the relevant simulation model. Table 3.1 summarizes the most relevant data formats that are commonly used to represent these kinds of input and output data. Furthermore, it indicates the magnitudes of typical data sizes as they are read or written by one particular simulation run. Figure 3.1 exemplifies the data formats that are specifically required by the Pandas software realizing the bio-mechanical bone simulation. Furthermore, it indicates how the input data formats shown in Figure 1.1 are transformed by the underlying workflow so that Pandas can properly read them.

1Pandas: http://www.mechbau.uni-stuttgart.de/pandas/index.php

The geometry describes the shape or topology of the object to be simulated, e. g., the geometrical shape of a bone (see Figure 2.2a). In the FEM or related numerical methods, this oftentimes corresponds to a description of the finite element grid. As shown in Figure 3.1, the Pandas software realizing the bone simulation requires three ASCII text files as input representing the geometry. One file contains spatial coordinates of individual nodal points of the finite element grid. The second file describes the topology of the grid, i. e., how several nodal points are arranged into individual finite elements. The last file indicates which of the nodal points and finite elements are located at the outer surface of the whole grid. Different simulation tools, however, require likewise different data formats. In fact, there exists a vast range of proprietary text or even binary file formats, especially for representing 3D geometries [MB08]. The size of such geometry files depends on the size and complexity of the object to be simulated, as well as on the resolution and accuracy to represent this object. Typically, this is related to the number of elements and nodal points in the finite element grid. For instance, the three files describing the geometry in the bone simulation have a total size of about 2 MBs and describe 3676 finite elements as well as 17130 nodal points. In bigger and more complex simulation applications, especially in engineering, this data size may reach several 100 MBs or even multiple GBs [HG05]. Material parameters describe further properties of the object to be simulated, e. g., the stiffness or hydraulic conductivity of a bone tissue [Kra14]. Usually, each parameter is represented by a name, by its value, and by an optional unit. The most common data formats are again proprietary ASCII text files. For instance, Pandas expects such a text file, where each relevant material parameter is stored in a separate row of the form parameter name = parameter value. Alternatives are XML documents or XML database systems [BS00, RS14] (see Listing 2.2 and Figure 3.1). Compared to geometry data, this kind of input data features a smaller data volume, as it only covers a limited set of parameters and their values. The magnitude of the typical data size is between 1 KB and 1 MB. Boundary conditions represent parts of the solution a simulation model needs to have at the boundary or outer surface of the problem domain. As discussed in Section 2.1.1, the bone simulation considers the time-dependent external load on the bone, i. e., caused by muscle forces and by joint contact forces of adjacent bones [Kra14]. Pandas expects one CSV-based file (comma-separated values) for each relevant muscle and adjacent bone. Each of these files stores one corresponding force vector for each numerical time step of the simulation.

Figure 3.1: Data formats required by the Pandas software and corresponding input transformations carried out by the workflow depicted in Figure 1.1.

Thereby, one column of a CSV-based file identifies the numerical time step, and three further columns represent the values of the relevant force vector. Simulation tools commonly impose such a tabular base structure on data formats for boundary conditions, because they bear much resemblance to (low-)dimensional data. This also makes SQL database systems possible candidates, but such systems are rather seldom used to store boundary conditions. One reason is that the data size is rather limited. For instance, the boundary conditions of the bone simulation usually have a size between 10 KBs and 50 MBs. This data size mainly depends on the number of time steps and on the number of different kinds of boundary values, such as different muscle or contact forces. Other applications may exhibit larger sets of data, but typically not much more than 1 GB. Initial conditions are a solution of most of the unknown variables of a simulation model, but only for the first time step of the simulation. They usually encompass values for each relevant unknown variable and for each spatial evaluation point of the finite element grid, e. g., for each of its nodal points and/or integration points (see Figure 2.2b). This again bears much resemblance to dimensional data and thus argues for a tabular base structure, i. e., in CSV-based files or sometimes in SQL database systems. Depending on the numbers of relevant spatial evaluation points and unknown variables, the typical data size ranges between 100 KBs and 10 GBs. Note that the bio-mechanical simulation does not require explicit initial conditions as input data, as discussed in Section 2.1.1. The last kind of input data does not describe aspects of a simulation model, but specific parameters of the numerical method used to discretize a model. Here, the Pandas software requires a proprietary text file as input. This file for instance specifies the basis functions used to spatially discretize the calculation of the bio-mechanical unknown variables according to the FEM (see Section 2.1.1). Other FEM-specific parameters define the time discretization, e. g., the sizes of numerical time steps. Proprietary text files are the most common data formats, as the types of such method-specific parameters are likewise proprietary. In some scenarios, a SQL database system or other kinds of database systems might be used as well. For instance and as shown in Figure 3.1, the FEM-specific parameters Pandas requires may originate from a SQL database system that manages multiple method-specific parameters for several simulation applications. The simulation output, i. e., the values of the unknown variables, determines how the initial conditions evolve over time, given the geometry, material parameters, boundary conditions, and method-specific parameters as input. As illustrated in Section 1.1, Pandas calculates up to 20 of such unknown variables for each numerical time step and for each spatial evaluation point of the finite element grid. It then stores the resulting values in a SQL database [KAK+13]. In general, the set of data formats used to store simulation outputs features the highest degree of heterogeneity. Relational SQL database systems may be attractive for their efficient and robust capabilities to manage big and complex amounts of data [HG05]. Nevertheless, many simulation applications still rely on CSV-based files, which are a bit more flexible with respect to data organization.
In some use cases, even proprietary text or binary files are the solution of choice – usually when application-specific needs call for tailored low-level data structures. The typical data size of the bio-mechanical simulation outcome ranges between 100 MBs and several GBs [RSM14a]. These lower and upper bounds may significantly vary in other applications, e. g., large-scale simulations may produce up to 10 or even 100 TBs of result data [HG05].
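To make the relational option more concrete, the following hedged sketch shows one conceivable, purely hypothetical table layout for storing such simulation output, with one row per time step, spatial evaluation point, and unknown variable; the table and column names are not taken from Pandas or any other tool mentioned above:

  CREATE TABLE simulation_output (
    time_step INTEGER NOT NULL,      -- numerical time step
    point_id  INTEGER NOT NULL,      -- spatial evaluation point of the finite element grid
    variable  VARCHAR(30) NOT NULL,  -- name of the unknown variable
    value     DOUBLE PRECISION,
    PRIMARY KEY (time_step, point_id, variable)
  );

Alternative layouts, e. g., one column per unknown variable, are equally conceivable and may be more efficient depending on the access patterns of subsequent post-processing steps.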

As Table 3.1 shows, the most common way to manage data in simulations is based on files, especially on structured CSV- or XML-based files and unstructured text or binary files. The main reason is that files offer a high flexibility with respect to data organization and data structures, as discussed in Section 2.2.1. In particular, files may be flexibly organized and grouped in hierarchical and well-documented directory structures [MBBL15]. Furthermore, text or binary files allow for virtually any low-level data structure, which may often be tailored to specific applications. Most of the simulation examples listed in Section 3.1 adopt a file-based data management as well. For instance, both the simulation of catalysis reactions described by Rommel et al. [RK11] and the model reduction example proposed by Fehr et al. [FE11] solely rely on proprietary text files. The simulation example studied by Franzelin et al. mainly uses proprietary low-level data structures that are compressed in gzip files (gzip: http://www.gzip.org/) [FDP15]. The protein modeling workflow discussed by Wagner first accesses a protein database and then manages its data as XML documents [Wag10]. As discussed in Section 2.2.4, some rare examples, e. g., from the area of environmental sciences, even use sensor networks as parts of their input data resources. However, the flexibility regarding data organization offered by files leads to a high degree of heterogeneity in terms of different kinds of proprietary data formats. Furthermore, the input data of a simulation oftentimes originate from other data sources or software tools that manage their data in different formats than needed by the used simulation tools [RLSR+06, RS14]. Hence, simulation workflows have to carry out many complex data provisioning tasks to overcome the intrinsic heterogeneity of data formats. This is also highlighted by Figure 3.1, which indicates the data transformations required by the bio-mechanical bone simulation. For instance, the first data provisioning step preparing the geometry files may have to carry out more or less sophisticated coordinate transformations [Gat14]. This may be necessary in order to adapt the nodal points and the topology of the finite element grid to the coordinate system expected by the Pandas software. In engineering applications, CAD tools may even deliver a CAD model of a geometry that first needs to be discretized and transformed into a description of a finite element grid [HG05]. The second data provisioning step shown in Figure 3.1 needs to filter proper material parameters from an XML database system, e. g., using the XQuery statement shown in Listing 2.3. Then, it needs to store the filtered parameters in the Pandas-specific text file format. The next

data provisioning step adjusting the boundary conditions has to select proper CSV-based files containing the force vectors of muscles and adjacent bones that are connected to the bone to be simulated. Finally, the last data provisioning step has to select relevant FEM-specific parameters from the underlying SQL database system and to export them into text files. This may be based on a SQL statement that is similar to the one shown in Listing 2.1.
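To give an impression of such a statement, the following is a minimal sketch of a corresponding selection; the table and column names (fem_parameters, simulation_id, parameter_name, parameter_value) and the filter values are hypothetical and only illustrate the idea, whereas the actual statement corresponds to Listing 2.1:

  SELECT parameter_name, parameter_value
  FROM fem_parameters
  WHERE simulation_id = 42                -- hypothetical identifier of the relevant simulation run
    AND parameter_name IN ('time_step_size', 'basis_function_order');

The result set of such a query would then be exported into the proprietary text file format expected by Pandas.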

3.2.2 Data Exchange between Different Simulations

The complexity of data provisioning in simulation workflows multiplies when data needs to be exchanged between different simulation models and simulation tools. As discussed in Section 2.1.1, the CSV-based files storing boundary conditions of the bio-mechanical bone simulation may originate from a multi-body simulation of a whole human skeleton [RKH+10]. However, this multi-body simulation and the bio-mechanical simulation based on Pandas differ significantly in the low-level tabular structures of their CSV-based files. In particular, the multi-body simulation outputs one file containing the force vectors of all relevant muscles, as well as one file for all adjacent bones. Pandas however requires exactly one input file for each muscle and adjacent bone. So, a workflow realizing this data exchange has to filter proper rows from the input CSV-based files and has to split their columns across multiple output CSV-based files. Furthermore, coordinate transformations are again required in a similar way as described for geometry files in Section 3.2.1 [Gat14, Kra14]. The increased complexity of data provisioning and data exchange between different simulation models is especially challenging for multi-scale simulations. They not only entail an increased heterogeneity of data formats, but also bigger and more varying data sizes as well as more sophisticated data transformations. In particular, different levels of granularity in both simulation calculations and data representations make it necessary to additionally aggregate, interpolate, or extrapolate data [Gat14]. Figure 3.2 shows the process that couples the multi-scale bio-mechanical and systems-biological simulation models depicted in Figure 2.1 [RSM14a, RWWS14]. Thereby, the figure highlights ETL processes that are necessary to exchange the data between these two models [RSM14b]. The boundary conditions of the bio-mechanical simulation, i. e., the external load on the bone, mainly depend on the way the relevant person and his or her bones move. This in turn depends on the daily routines of the person.

Figure 3.2: Coupling process of the multi-scale simulation of structure changes in bones combining bio-mechanical and systems-biological calculations [RSM14a]. Typical motion sequences (e. g., walking, working, sleeping) are each processed by a bio-mechanical calculation in Pandas, yielding characteristic solutions for the internal tensile stress and further variables with result sizes of 120 MB, 2 GB, and 600 MB in the depicted examples. ETL processes reduce each of these solutions to about 2 MB and compose an idealized solution according to the daily routine, which serves as input to the systems-biological calculation in Octave; its roughly 2 MB output (e. g., bone solidity) updates the bio-mechanical bone configuration.
In the simulation, each relevant daily routine is approximated by a composition of representative motion sequences, e. g., for sleeping, walking, or working. For each motion sequence, Pandas converts the associated external load into a characteristic solution of the bio-mechanical unknown variables, e. g., the internal tensile stress. Since especially the number of numerical time steps varies between the calculations regarding individual motion sequences, the respective sizes of result data vary as well (see Figure 3.2). More details about this calculation and about the resulting simulation output are given in Sections 1.1, 2.1.1, and 3.2.1. The systems-biological simulation is not implemented by Pandas, but by GNU Octave (http://www.gnu.org/software/octave/). It gets an idealized solution as input that is composed of the characteristic solutions of Pandas according to the approximated daily routine of the relevant person.

While Pandas stores its simulation output in a SQL database, Octave expects CSV-based files for both its input and output data. The systems-biological calculation in Octave only needs a subset of the 20 mathematical unknown variables Pandas stores in its database. In addition, Octave needs the values of these variables to be aggregated among all numerical time steps, e. g., by calculating average values. Since the systems-biological calculations at individual spatial evaluation points are completely independent of each other (see Section 2.1.1), they may be massively parallelized. This makes it necessary to partition the output data of Pandas among multiple instances of Octave. The ETL processes shown in Figure 3.2 perform the corresponding format conversion, filter, aggregation, and partitioning of the data. Thereby, especially the filter and aggregation operations reduce the data size from several GBs in the database of Pandas to a few MBs in the CSV-based files of Octave. Subsequently, each Octave instance uses the resulting CSV-based files as input and calculates its output, e. g., the precise bone solidity, until the end of one daily routine. It then stores this result in another CSV-based file having a data size of again a few MBs. The output files of all concurrently executed Octave instances are then imported into the database of Pandas. This makes the bone configuration of the bio-mechanical simulation model more precise and prepares Pandas for further calculations. The whole process is repeated for the next daily routines, i. e., until a sufficient number of days has been considered [RSM14b, RSM14a]. Figure 3.3 shows an abstract view on a workflow realizing the coupling process shown in Figure 3.2 [RS13b]. This workflow first accesses a repository to load a list of relevant daily routines including their motion sequences. Afterwards, it iterates over this list and sequentially executes the bio-mechanical and systems-biological simulations for each daily routine. As depicted in Figure 3.2, the bio-mechanical calculation may be executed independently for each motion sequence. So, it is parallelized among several computers. The next workflow step shown in Figure 3.3 loads a list of available Pandas computers from another repository. Thereupon, each motion sequence of the current daily routine is assigned to one of these computers, which then has to process this motion sequence. The workflow then starts a Pandas simulation in parallel for each motion sequence. The preparation of the data exchange and of the systems-biological simulation begins by loading a list of available Octave computers. Subsequently, the next workflow step determines how to partition the output data of Pandas among these computers. To this end, it accesses the SQL database to determine the number of data items Pandas has previously stored in this database, as well as the number of data items each Octave computer has to process.
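A minimal sketch of how this partitioning information might be determined is the following query; the table name pandas_results and the assumption of four available Octave computers are purely illustrative:

  -- Number of data items each of the (hypothetically four) Octave computers has to process
  SELECT CEILING(COUNT(*) / 4.0) AS items_per_computer
  FROM pandas_results;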

Figure 3.3: Abstract view on a workflow realizing the coupling process shown in Figure 3.2; cf. [RS13b]. The workflow loads the daily routines and then, for each daily routine, loads the available Pandas computers, assigns the motion sequences to these computers, and starts a Pandas simulation for each motion sequence. Afterwards, it determines the data partitioning via the SQL database and loads the available Octave computers. For each Octave computer, it exports the Octave input files from the database, starts the Octave simulation, and imports the Octave output files back into the database. The figure distinguishes control flow and dataflow edges.

The following parallel loop iterates over the list of Octave computers. The first workflow step in this loop implements the actual data exchange between Pandas and Octave, i. e., most parts of the ETL processes shown in Figure 3.2. So, it filters appropriate data from the SQL database of Pandas, performs the necessary data aggregations and partitioning, and exports the results into CSV-based files. Afterwards, the workflow starts a new instance of the systems-biological Octave simulation. After this Octave instance has finished its calculation, the last activity in the loop imports its output files back into the SQL database of Pandas. This updates the bio-mechanical bone configuration. The workflow then repeats its outermost loop until all relevant daily routines have been processed. A concrete workflow implementing the abstract workflow depicted in Figure 3.3 consists of more than 15 workflow tasks [RSM14b]. To design such a workflow, scientists need to specify a multiplicity of complex low-level details of data management and data exchange. In particular, around half of the workflow tasks require the specification of several shell commands or even of sophisticated SQL, XPath, and XQuery statements. The most complex statement is the SQL statement embedded in the workflow task Export Octave Input Files shown in Figure 3.3. It is a SELECT query including three sub-queries to filter the proper mathematical variables stored in the database of Pandas (see also Listing 5.1 on page 155) [Ges14]. Each of these sub-queries specifies several selection predicates, as well as an average function and a GROUP BY clause to carry out the necessary data aggregations. The individual results of all sub-queries are combined via corresponding join predicates. Finally, the data partitioning or data split among available Octave instances is realized via an ORDER BY clause and associated LIMIT and OFFSET clauses; a simplified sketch of such a statement is given below. Scientists are often not familiar with languages such as SQL, XPath, or XQuery. In particular, they usually do not have the necessary skills to specify the mentioned low-level operations that filter, aggregate, group, join, and partition data. All this significantly increases the complexity of data transformations to be designed and executed in simulation workflows for exchanging data between different simulation models [Gat14, RSM14b]. It constitutes a common challenge for multi-scale simulations that are coupled across different scientific domains [RSM14a]. For instance, this also concerns the multi-scale simulation of catalysis reactions described by Rommel et al. [RK11]. Here, coupling the molecular-dynamic, molecular-mechanical, and quantum-mechanical calculations likewise requires the specification of low-level operations to perform the necessary data exchange between these different calculations [Mül10, Pie12]. As a consequence, the increased complexity of data exchange between several simulation models often hinders scientists from designing corresponding workflows at all. This was actually the case for the workflow shown in Figure 3.3, which could only be designed in collaboration between the scientists conducting this simulation and several experts in workflow and data engineering [Mül10, Dor11, Pie12, RS13b, RSM14b].
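The following is a strongly simplified sketch of such an export statement with only two instead of three sub-queries; all table and column names (pandas_results, node_id, variable, value) as well as the partition bounds are hypothetical, whereas the actual statement is shown in Listing 5.1:

  SELECT s.node_id, s.avg_stress, t.avg_strain
  FROM (SELECT node_id, AVG(value) AS avg_stress      -- average over all numerical time steps
        FROM pandas_results
        WHERE variable = 'internal_tensile_stress'
        GROUP BY node_id) AS s
  JOIN (SELECT node_id, AVG(value) AS avg_strain
        FROM pandas_results
        WHERE variable = 'strain'
        GROUP BY node_id) AS t
    ON s.node_id = t.node_id                           -- join predicate combining the sub-query results
  ORDER BY s.node_id
  LIMIT 5000 OFFSET 10000;                             -- partition assigned to one particular Octave instance

The result set of this query would subsequently be exported into the CSV-based input files of the respective Octave instance.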
Note that such an interdisciplinary collaboration between persons with different skills is becoming more and more indispensable for coupling several simulation models [RSM14a]. It is especially important for conducting multi-scale simulations that work with different levels of granularity in both numerical calculations and data representations. Such collaboration is thus crucial for facilitating novel and sophisticated simulation applications that produce precise simulation results and make simulations more realistic [GZC14].

Figure 3.4: General Data Transfer and Transformation Pattern; cf. [RS13b]. Data is extracted from data containers (DC) of one or more data sources (1 to n), passed through a data transformation comprising operations such as filter, split, aggregation, extrapolation, join, format conversion, and application-specific operations, and finally loaded into data containers of one or more data sinks (1 to m).

3.3 Basic Data Management Patterns

The real-world simulation examples or related applications listed in Section 3.1 cover a wide range of use cases spanning various scientific domains. Hence, these examples and the resulting characteristics of simulation data and of their processing discussed in Section 3.2 are well-suited to derive basic data management patterns describing common operational aspects in simulation workflows. The most frequent data management patterns observed in these simulation examples may be classified into two different groups: Data Transfer and Transformation Patterns and Data Iteration Patterns [RS13b]. The following two subsections present the definitions of these two groups of patterns and give corresponding examples.

3.3.1 Data Transfer and Transformation Patterns

As depicted in Figure 3.4, Data Transfer and Transformation Patterns describe a process to transfer data between several data resources, i. e., from one or more data sources to one or more data sinks [RS13b]. For both the data sources and the data sinks, this process may access an arbitrary number of data containers, e. g., tables in a database system or files in a file system. In addition, the patterns typically cover ETL operations, i. e., to extract data sets from the source data containers, to transform these extracted data sets, and to load the transformation results into the target data containers. These ETL operations cope with the intrinsic heterogeneity of the involved data resources and data formats.

Such Data Transfer and Transformation Patterns occur frequently in the workflows of any simulation example, e. g., they occur in every example listed in Section 3.1. This holds both for workflows providing input data for single simulation models as illustrated in Section 3.2.1 and for workflows exchanging data between different simulation models as discussed in Section 3.2.2. For instance, the data provisioning tasks of the bio-mechanical simulation workflow (blue workflow steps shown in Figures 1.1 and 3.1) each constitute such a pattern. Other, even more complex examples are the ETL processes shown in Figure 3.2 that exchange data between the bio-mechanical and systems-biological simulations. In workflows providing data for single simulation models (see Section 3.2.1), frequent ETL operations are filtering a data (sub-)set and complex format conversions [Mül10, Pie12]. The format conversions thereby have to cope with various kinds of data formats, e. g., different kinds of database systems, CSV-based files, or even proprietary text or binary files [RS14]. Some simulation applications also require more or less sophisticated application-specific operations. The most common examples are coordinate transformations in geometry descriptions or boundary conditions [Gat14], which may also be required for providing the geometry data of the bio-mechanical simulation model. Workflows exchanging data between different simulation models (see Section 3.2.2) extend the set of necessary ETL operations by even more complex ones. For instance, the ETL processes of the coupled bone simulation shown in Figure 3.2 not only carry out data filters and format conversions. In addition, they also perform aggregations among several data values, i. e., they calculate average values of the bio-mechanical unknown variables among all numerical time steps [Mül10, Ges14]. Furthermore, they split one data set into several ones in order to partition the output data of Pandas among several Octave instances [RSM14b, RSM14a]. Another example is a join operation that combines the results of the sub-queries of the SQL statement embedded in the workflow task Export Octave Input Files shown in Figure 3.3. Finally, multi-scale simulations commonly make it necessary to extrapolate data [Gat14]. The general Data Transfer and Transformation Pattern shown in Figure 3.4 describes a data provisioning process from n to m data containers with n, m ≥ 1. Individual variants of this pattern may also restrict the numbers of input and output data containers. The first variant, the Container-to-Container Pattern, accesses exactly one data container in one data source and one data container in one data sink. Examples of this pattern variant are the workflow tasks Extract Material Parameters and Select FEM Parameters shown in Figure 3.1.

Figure 3.5: Parallel Data Iteration Pattern; cf. [RS13b]. A split stage partitions the data set S into subsets T1, …, Tn ⊆ S; in the operation stage, the operation is applied to each subset in parallel and yields changed subsets T1′, …, Tn′; a merge stage integrates these results into the overall output set S′.

A Data Split Pattern extracts a data set from one data container and splits this data set into an arbitrary number of target containers. Conversely, a Data Merge Pattern describes a process to merge data of multiple data containers into exactly one container. Examples of the latter two pattern variants are part of the coupling workflow depicted in Figure 3.3. This concerns the workflow steps Export Octave Input Files and Import Octave Output Files that partition and subsequently merge the data in the database of Pandas. Most of the ETL operations depicted in Figure 3.4 are applicable to each of these three pattern variants. Restrictions may occur, e. g., for the split operation, as this operation is only feasible for patterns considering multiple target data containers.
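As a rough illustration of these two pattern variants, the following sketch assumes a PostgreSQL-style COPY command as well as hypothetical table names (pandas_results, octave_output) and file paths; the actual export and import steps of the coupling workflow are more involved:

  -- Data Split: export one partition of the Pandas database into a separate Octave input file
  COPY (SELECT * FROM pandas_results ORDER BY node_id LIMIT 5000 OFFSET 0)
    TO '/exchange/octave_input_1.csv' WITH (FORMAT csv, HEADER);

  -- Data Merge: import one Octave output file back into a single container of the database
  COPY octave_output FROM '/exchange/octave_output_1.csv' WITH (FORMAT csv, HEADER);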

3.3.2 Data Iteration Patterns

The principle of the Data Iteration Patterns is the iteration over a data set S and the execution of an operation, where this data set or subsets of it serve as input [RS13b]. This iteration and thus the pattern itself may occur in two variants: (1) a parallel one (Figure 3.5) and (2) a sequential one (Figure 3.6). The Parallel Data Iteration Pattern comprises a split stage, an operation stage, and a merge stage. Its focus is on the parallelization of an operation among multiple resources. The split stage encompasses tasks to partition the set S into n subsets Ti ⊆ S and to distribute these subsets among the resources. This split stage may also be represented by a Data Split Pattern introduced in

Section 3.3.1. In the operation stage, the subsets Ti serve as input to the parallel operations, each of which delivers a corresponding changed subset Ti′ as its result. The merge stage then comprises tasks, e. g., represented by a Data Merge Pattern, to integrate all changed subsets T1′ to Tn′ into the overall output set S′.
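A minimal sketch of such a split stage expressed in SQL is the following query, which assigns each row of a hypothetical table pandas_results to one of n = 4 subsets via a window function; the table name, column name, and the value of n are assumptions for illustration only:

  SELECT node_id,
         NTILE(4) OVER (ORDER BY node_id) AS subset_no   -- subset_no identifies the subset Ti a row belongs to
  FROM pandas_results;

Each subset could then be exported to and processed by a separate compute resource.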

Figure 3.6: Sequential Data Iteration Pattern. A filter stage selects a subset Ti ⊆ S, the operation stage changes it to Ti′, and a merge stage integrates Ti′ into the overall result S′; the cycle is repeated with i ← i+1 as long as i ≤ n.

Especially multi-scale simulations usually entail extremely time-consuming numerical calculations for at least some of the coupled simulation models [Gat14, GZC14, UGM14]. Hence, they often require their iterative and/or parallel computation and thus a partitioning of input data among several computers. The Parallel Data Iteration Pattern represents this common scenario of parallel calculations and of a data partitioning. This pattern for instance finds application in the coupling workflow depicted in Figure 3.3 [RSM14a]. Thereby, the data in the database of Pandas corresponds to the data sets S and S′. The Octave simulation constitutes the operation, while the CSV-based input and output files represent the subsets Ti and Ti′. Pietranek shows that the pattern may even be used to describe parts of the multi-scale simulation of catalysis reactions listed in Section 3.1 [Pie12]. Furthermore, the workflow realizing the crack propagation simulation studied by Franzelin et al. bears much resemblance to the coupling workflow depicted in Figure 3.3 regarding the data partitioning [FDP15]. So, it is also a good candidate to be expressed via the Parallel Data Iteration Pattern. The Sequential Data Iteration Pattern considers neither a parallelization of an operation nor a partitioning of the data set S. Instead, all iteration cycles are executed one after another, and the split stage changes to a filter stage. This filter stage selects a subset Ti ⊆ S, which is again changed to Ti′ in the operation stage. Subsequently, Ti′ is again merged into the overall result S′. This process is repeated until the total number of iterations reaches a certain count n. This kind of pattern especially occurs in simulations where individual iterative calculations depend on each other and are thus not suited for massive parallelization [UGM14]. For instance, this concerns the outermost loop of the coupling workflow shown in Figure 3.3. This loop sequentially iterates over the previously loaded list of daily routines and executes the embedded bio-mechanical and systems-biological calculations for one day after another [RSM14a]. The sequential variant of this pattern may also occur in the simulation of catalysis reactions listed in Section 3.1. Pietranek discusses how it may serve as a complement or even as an alternative to the Parallel Data Iteration Pattern described above [Pie12]. A previous author publication discusses how the Sequential Data Iteration Pattern may be applied to the model reduction example proposed by Fehr et al. [FE11, RSM14a]. Finally, such a pattern may express nearly the whole protein modeling workflow illustrated by Wagner [Wag10]. This workflow sequentially iterates over a list of protein sequences and applies a pattern matching operation to each individual sequence [RSM11].

3.4 Summary and Future Work

Computer-based simulations feature a highly heterogeneous data environment, especially with respect to various kinds of proprietary data formats as well as big and varying data sizes. Hence, simulation workflows have to carry out many complex data provisioning tasks to overcome this intrinsic data heterogeneity. This already holds for workflows providing data for single simulation models, as illustrated in Section 3.2.1. The complexity of providing data increases further when data needs to be exchanged between several simulation models that are coupled across different scientific domains (see Section 3.2.2). This is particularly true for multi-scale simulations that involve various levels of granularity in both numerical calculations and data representations. Due to this high complexity of data provisioning and especially of data exchange, scientists are often not able to design corresponding simulation workflows at all. This calls for an abstraction support that is especially tailored to the needs of scientists. In spite of this heterogeneity and complexity, it is nevertheless possible to derive basic data management patterns that describe common operational aspects in simulation workflows. The resulting patterns illustrated in Section 3.3 are used in the following chapters to discuss detailed functional requirements for individual proposed approaches. Furthermore, they are used as the basis for the pattern-based abstraction support for simulation workflow design introduced in Chapter 6. Thereby, this set of patterns is even extended by more abstract ones that make simulation workflow design especially tailor-made for scientists.

Chapter 4 Data Provisioning Techniques for Simulation Workflows

Most current workflow systems or other simulation tools provide some means to realize data access and data provisioning in workflows. However, the complex and heterogeneous data environment of computer-based simulations highlighted in Chapter 3 not only leads to a high complexity of data provisioning tasks in simulation workflows. It also tends to increase the diversity of data provisioning techniques that are offered by workflow systems or simulation tools. As discussed in Section 1.2.1, these systems or tools use specialized and thus heterogeneous solutions, e. g., application-specific services or custom workflow extension activities. Individual systems even offer several of these specialized solutions in order to support as many applications and scientific domains as possible [CBL11]. Scientists designing simulation workflows have to choose proper data provisioning techniques from this large set of available techniques in order to realize the various data provisioning tasks in their workflows. The diversity and complexity of the available techniques, however, complicates the scientists’ decision on appropriate techniques. This poses a new problem for scientists that may induce them not to leverage existing techniques at all [CBL11]. Instead, they still implement the necessary data provisioning on their own, leading again to a significantly high effort to be spent on workflow design. This thesis addresses the mentioned issues associated with the diversity and complexity of data provisioning techniques by a systematic comparison of available techniques. The major goal of this comparison is to derive guidelines that help scientists to choose appropriate data provisioning techniques for given workflow tasks (see Section 1.3.1). This includes the following detailed contributions:

• The high diversity of available low-level data provisioning techniques likewise complicates their systematic comparison. Therefore, Section 4.1 discusses how to conquer this diversity by classifying existing data provisioning techniques into three generic concepts that are representative for the techniques offered by a multitude of workflow systems. Furthermore, the chosen classification scheme and the resulting concepts are assessed regarding their suitability for a systematic comparison of underlying techniques.
• Section 4.2 deals with the actual comparison of available data provisioning techniques. It first discusses comparison criteria that are especially relevant to evaluate the main differences of the previously classified data provisioning concepts or their underlying techniques. Functional criteria are related to the basic data management patterns formalized in Section 3.3. The analysis regarding non-functional criteria is based on the challenges discussed in Section 1.2. The resulting criteria are then used to systematically compare the data provisioning concepts.
• The comparison results are used in Section 4.3 to derive guidelines that assist scientists in choosing appropriate data provisioning techniques for their workflows. Furthermore, this section shows how scientists may effectively apply these guidelines. It discusses the results of evaluating the guidelines, which has been based on applying them to the real-world simulation examples listed in Section 3.1. Thereby, the main focus is on how the guidelines facilitate the design of simulation workflows. In particular, they enable scientists to focus on the most appropriate data provisioning concept, instead of being overstrained by a multitude of low-level techniques.
• As another contribution, Section 4.4 uses both the comparison results and the guidelines to derive essential features simulation workflow systems have to offer. In particular, these features are necessary to effectively apply the guidelines in practice. By matching these mandatory features with those offered by available workflow systems, missing features of these systems are identified. This thesis additionally introduces an extended simulation workflow system that offers all these missing features in a holistic way.

Finally, Section 4.5 summarizes the major aspects and lists possible future work.

Figure 4.1: Decision tree to classify data provisioning techniques into representative and comparable concepts of such techniques. The first decision asks whether the execution of data management happens within or external to the workflow system; the second decision asks whether the definition of data management is embedded into or hidden from the workflow model. The resulting concepts are data services, data management activities, and local data processing. Decisions are shown in circles with grey background, whereas the resulting concepts are depicted as rectangles with white background.

4.1 Classification of Data Provisioning Techniques

As discussed above, a systematic comparison of all available low-level data provisioning techniques is not reasonably practicable due to their high diversity. Therefore, this section discusses how to classify existing data provisioning techniques into a smaller set of representative concepts of such techniques. This then facilitates the systematic comparison of the concepts, and it allows for drawing conclusions about the corresponding low-level techniques. Section 4.1.1 presents the classification scheme proposed in this thesis, as well as the resulting concepts of data provisioning techniques. Furthermore, it illustrates how these concepts are supported by existing workflow systems. Afterwards, the classification scheme and the resulting concepts are assessed in Section 4.1.2, especially regarding why they are suited for systematically comparing the underlying low-level techniques.

4.1.1 Concepts of Data Provisioning Techniques

In the course of working on this thesis, the classification scheme represented as a decision tree in Figure 4.1 has been derived. It is based on an analysis of several workflow languages and of the data provisioning techniques supported by a multitude of workflow systems. This classification scheme characterizes data provisioning techniques along two decisions, which are related to the two general aspects of workflow management: (1) execution and (2) definition or modeling of (the data management in) workflows [LR00, Wes12]. Thereby, it identifies three relevant concepts of data provisioning techniques. These concepts are depicted in Figure 4.2 and illustrated in the following subsections.

4.1.1.1 Data Services

The first concept, called data services, originates from settings of business applications, where the Service-oriented Architecture (SOA) and the service orchestration facilities of workflows realize business processes [WCL+05, Wes12]. In this setting, services may encapsulate access to one or more data resources and provide operations on this data, e. g., for data extraction or data transformation. In other words, such data services hide data resources and data processing from the workflow that invokes the services. Data management operations are not executed within the workflow system, but by external services or data resources (see Figure 4.1). Likewise, the concrete semantics of a data management operation, i. e., its algorithmic and/or detailed declarative definition, is not directly modeled in the workflow. Instead, it is usually part of an external service implementation and thus hidden from the workflow model. As discussed in Section 2.3.3, the Web Services technology is a common solution to service calls from workflows, especially in the workflow language BPEL [WCL+05, OAS07]. The REST protocol (Representational State Transfer) is a complementary approach to stateful (data) services [PZL08]. Other examples that are quite common in the workflow or service domains are specific file transfer services, e. g., based on FTP solutions or on the Java Secure Channel API (http://www.jcraft.com/jsch/). Scientific applications – including simulation workflows – have recently adopted the SOA paradigm and the Web Services technology as well [TDG07, GSK+11]. An example solution for service-based data management in such applications is the Open Grid Services Architecture - Data Access and Integration framework (OGSA-DAI, http://www.ogsadai.org.uk/index.php). Most of today’s scientific workflow systems and their proprietary workflow languages offer some kind of actors or activities to invoke services and may thus use services for data management. For instance, this includes the workflow system proposed by Görlach et al. [GSK+11], as well as Kepler, Taverna, Triana, and Microsoft Trident [LAB+06, OGA+06, CGH+06, BJA+08].


Figure 4.2: Concepts of data provisioning techniques in simulation workflows. a) Data services: the workflow calls a data service that accesses the data layer on its behalf; b) data management activities: a workflow activity issues a data management operation or command directly against a data resource; c) local data processing: workflow steps process data within a local data processing unit of the workflow system. The data layer comprises, e. g., file servers, database servers, and unstructured documents.

4.1.1.2 Data Management Activities

The second concept is an approach using special data management activities in workflows. Typically, such an activity is associated with an external data resource and embeds a command in the command language of this resource, e. g., a SQL statement or a shell command [VSRM08, RRS+11]. During its execution, the activity seamlessly issues this command against the associated resource, which then carries out the corresponding data management operation. So, data management operations are executed by external data resources, i. e., external to the workflow system as in the case of data services. Nevertheless, the above-mentioned command corresponds to a declarative definition of an operation, and this declarative definition is thus directly embedded into an activity of the workflow model. In some cases, data management activities may even embed an algorithmic definition of an operation, e. g., Node-RED (http://nodered.org/) allows for dynamically deploying and executing custom code snippets on external resources. The usage of data management activities mainly goes back to business workflow systems, e. g., the workflow products of IBM, Microsoft, and Oracle allow for integrating SQL statements into BPEL workflows [VSRM08]. Nevertheless, the same also holds for the scientific workflow systems Kepler and Microsoft Trident [LAB+06, BJA+08]. Kepler even offers proprietary actors to access file systems and sensor networks [BAJ+10]. The data management activities proposed in Chapter 5 of this thesis offer a generic approach to embed any data management command for any kind of data resource directly within workflows [RRS+11].

4.1.1.3 Local Data Processing

In contrast to the other concepts, the last concept reflects a local execution of data management operations within workflow systems. This concept is therefore called local data processing. Here, data and workflow processing are tightly integrated. Data is stored in the process context of the relevant workflow, e. g., in workflow variables in the case of workflow languages such as BPEL. Workflow tasks may embed or otherwise specify data management operations that are executed locally in the workflow environment, e. g., based on variable assignments. The workflow environment may integrate a database system or other kinds of local data processing units that manage the data as well as its processing by the workflow [RSM11]. The analysis of workflow languages and workflow systems performed while working on this thesis revealed that virtually every available workflow system offers specific opportunities for this local data processing.

4.1.2 Assessment of the Proposed Classification Scheme

In summary, each of today’s workflow systems supports different solutions for a sub-set of the data provisioning concepts classified above. However, all these systems and related literature leave the decision on appropriate data provisioning techniques to the workflow developers, i. e., to scientists [CBL11]. The diversity and complexity of the techniques offered by different systems may then overburden scientists and induce them not to use the techniques at all. The systematic classification of data provisioning techniques into representative concepts, as

discussed in this section, is a first step towards conquering this diversity and complexity. Scientists may use the classification scheme depicted in Figure 4.1 to correspondingly classify the techniques that are supported by relevant workflow systems. This then already helps them to focus on the choice of appropriate concepts, instead of being overstrained by a multitude of low-level techniques. In order to further assist scientists in this choice and to derive a set of corresponding guidelines, the resulting concepts nevertheless need to be compared to each other. The classification scheme and the concepts proposed in Section 4.1.1 are well-suited for achieving this goal and for drawing conclusions about the underlying low-level techniques, because they show the following properties:

• Firstly, the number of concepts should be as small as possible. This is mandatory to effectively conquer the diversity of existing data provisioning techniques and thus to facilitate their systematic comparison. The classification scheme depicted in Figure 4.1 shows this property, as it covers only two decisions with two options each. So, the theoretically maximum number of resulting concepts is four. In practice, the classification scheme even results in only three concepts as shown in Figure 4.1.
• A reasonable comparison requires the concepts to be explicitly distinguishable from each other. The classification scheme depicted in Figure 4.1 shows this property, since the two options of each decision are mutually exclusive. In practice, any data management operation is either executed within the workflow system or externally to it. Likewise, the algorithmic or declarative definition of an operation is either embedded into the workflow model or it is not part of this model and thus hidden from it. Moreover, if one of the decisions were left out, some concepts would coincide although being completely different. If the decision where to execute a data management operation were left out, it would not be possible to distinguish between the concept of local data processing and the other two concepts anymore. Neglecting the decision where to define an operation would similarly lead to the coincidence of data services and data management activities.
• The resulting concepts should be representative, i. e., they should summarize all relevant low-level data provisioning techniques. The proposed classification scheme shows this property as a direct implication of the argumentation regarding the second property: For each of the decisions, either the one or the other option holds (exclusive or). So, the two options of each decision are not only mutually exclusive, but also collectively

exhaustive. This means that at least one of the options holds for any kind of existing data provisioning technique. Furthermore, the two decisions of the classification scheme cover the two general and thus all major aspects of workflow management, i. e., execution and modeling [LR00, Wes12]. These two aspects and thus the chosen classification scheme and the resulting concepts remain valid even if underlying technologies, e. g., data resources or workflow systems, change over time. The development of the classification scheme shown in Figure 4.1 has been underpinned by inspecting alternative ways of classifying data provisioning techniques. However, no other scheme has been found that shows all three essential properties listed above. For instance, questions about data quality, e. g., the accuracy or timeliness of data, would require introducing specific thresholds for these time-variant data characteristics to make the resulting concepts explicitly distinguishable. Proper thresholds for data quality are however application-specific, leading to data provisioning concepts that are not representative, i. e., not valid for every application or domain [Sad13]. Note, nevertheless, that future work may use the classification scheme shown in Figure 4.1 as a basis and elaborate on more fine-grained decisions and corresponding options. This would also lead to more fine-grained concepts of data provisioning techniques. It would enable the comparison of these concepts, which may later result in likewise more detailed guidelines helping scientists to choose proper techniques. However, as discussed above, a systematic comparison is only possible if the diversity of available data provisioning techniques has previously been conquered by classifying all low-level techniques into a smaller set of representative concepts. So, the classification scheme shown in Figure 4.1 can be seen as a good starting point and as a framework for future, more detailed classifications and comparisons.

4.2 Comparison of Data Provisioning Techniques

This section summarizes the main results of comparing the data provisioning concepts illustrated above. Section 4.2.1 discusses comparison criteria that are especially relevant to achieve the major goal of this comparison, i. e., deriving guidelines for choosing appropriate data provisioning techniques for a workflow. Sections 4.2.2 and 4.2.3 then present the corresponding comparison results.

4.2.1 Discussion of Relevant Comparison Criteria

A systematic comparison has to be based on a set of comparison criteria that reflect the most relevant requirements for the considered scope of application. The most relevant requirements for data provisioning in simulation workflows are related to (1) the challenges discussed in Section 1.2 and to (2) the common data characteristics and operational aspects illustrated in Chapter 3. A corresponding requirements analysis performed while working on this thesis has led to a multiplicity of comparison criteria. The data provisioning concepts illustrated in Section 4.1.1 have then been intensively evaluated with respect to these criteria. For the sake of clarity, the illustrations in the following sections however focus on only a sub-set of the criteria. In particular, those criteria are neglected that did not lead to significant insights with respect to the major goal of the comparison, i. e., using it to derive guidelines for choosing appropriate data provisioning techniques (see Section 4.3). For instance, some of these neglected criteria are related to the scientists’ requirement of being able to make ad-hoc changes to workflow models at runtime [SK10]. Regardless of the used data provisioning concept, such dynamic changes to workflow models are generally possible thanks to approaches to concurrent workflow evolution and ad-hoc workflow instance migration [SK11, SK13]. The following list summarizes the major classes of comparison criteria that finally led to new and significant insights with respect to comparing the data provisioning concepts:

• Generic support of common data management patterns: This class of comparison criteria deals with functional issues, i. e., with the question to what extent the data provisioning concepts support common data management patterns that frequently occur in simulation workflows. Thereby, a special focus is on patterns to overcome the intrinsic heterogeneity of data resources and data formats, e. g., the patterns discussed in Section 3.3. This is especially important for multi-scale simulations that are coupled across different scientific domains, because such coupled simulations show the highest degree of data heterogeneity. To enable a seamless simulation coupling, the data provisioning concepts have to support common patterns in a sufficiently generic way, i. e., they have to cope with all different data resources and data formats used in the respective domains. Section 4.2.2 summarizes detailed comparison criteria and associated comparison results with respect to these functional issues.

• Non-functional issues: Section 4.2.3 correspondingly covers major comparison results with respect to relevant non-functional issues. Associated comparison criteria that led to new and significant insights may be subdivided into the following two aspects:

– Information on data management: As depicted in Table 3.1, the total size of the data involved in particular simulation runs ranges from a few 100 KBs to multiple TBs. This may even include a dynamically changing data volume, in particular for simulations coupled across different scientific domains (see Section 3.2.2). The amounts of these data influence the execution times of simulation workflows. These issues call for suitable optimization techniques that increase the efficiency of data processing in such workflows [OdOV+11, Vhr11]. To make proper optimization decisions, an optimizer component needs enough information on the data management of a workflow model. For instance, optimization techniques based on workflow re-structuring require detailed and combined information about control flow and dataflow dependencies, as well as about data sizes or cardinalities [BHW+07, VSS+07]. Another relevant aspect is to ensure the reproducibility of a simulation and of its outcome [HTT09]. This has led to many technologies to collect and subsequently query information about data provenance or data quality [ABJF06, DF08, FKSS08, CVDK+12, Sad13, RBKK14]. Here, it is crucial to properly reconstruct the correlation between (1) the collected and queried provenance or quality information and (2) the affected parts of a workflow model [KSB+10, RSM14a, MBBL15]. This again calls for enough and sufficiently accurate information on the data management of the relevant workflow model.
– Abstraction support: Developers of simulation workflows, i. e., scientists, have much knowledge in their simulation domain, but rather limited skills regarding workflow or data management. The heterogeneous data environment and the resulting high complexity of data provisioning in simulation workflows, as illustrated in Section 3.2, may thus overwhelm scientists during workflow design [RLSR+06, RS14, RSM14b]. Hence and as discussed in Section 1.2.2, a further essential non-functional issue is whether data provisioning concepts or their underlying techniques offer a suitable abstraction support for this complex data provisioning in simulation workflows.

4.2.2 Comparison Regarding Support of Data Management Patterns

Russell et al. and van der Aalst et al. discuss common workflow patterns for control flow, data, resource, and exception handling [vdAtHKB03, RtHvdAM06, RtHEvdA05a, RtHEvdA05b, RvdAtH06]. These patterns represent basic workflow design primitives that workflow languages and workflow systems in the business process domain should support. For instance, some of the data patterns discussed by Russell et al. refer to the question whether workflow tasks or workflow instances may exchange data by value or by reference [RtHEvdA05a]. Shiroor et al. and Migliorini et al. discuss whether and how scientific workflow systems support these common workflow patterns as well [SSH+10, MGLRtH11]. Moreover, Yildiz et al. and Migliorini et al. introduce a small set of further patterns that scientific workflow systems should additionally support [YGN09, MGLRtH11]. These further patterns mainly consider low-level dataflow dependencies between workflow tasks, including their interplay with basic control flow structures. Examples concern the number of data tokens consumed from input data channels or forwarded to output data channels of workflow tasks. Pautasso et al. classify patterns covering different kinds of parallelism that should be expressible in grid workflows, e. g., different kinds of data and pipeline parallelism [PA06]. All these patterns are well-suited to evaluate and compare the implementation details of or the basic features offered by different workflow languages and workflow systems. However, none of them explicitly covers operational aspects that are essential for data provisioning in simulation workflows, i. e., to overcome the intrinsic heterogeneity of data resources and data formats. Hence, they are not tailored to directly compare corresponding data provisioning techniques or concepts. The data management patterns discussed in Section 3.3 explicitly cover the most essential operational aspects of data provisioning in simulation workflows, e. g., in terms of ETL operations. Furthermore, in contrast to all other workflow patterns, these data management patterns are situated at a similar abstraction level as the data provisioning concepts classified in Section 4.1.1. For these reasons, the patterns discussed in Section 3.3 are used in this thesis to compare the data provisioning concepts. The following subsections deal with the two main groups of these patterns, i. e., Data Transfer and Transformation Patterns and Data Iteration Patterns. The discussion is split into four comparison criteria. These criteria and their evaluation results are summarized in Table 4.1.

Table 4.1: Support of common data management patterns. The filling degrees of the pie charts depict how much a data provisioning concept (columns: data services, data management activities, local data processing) supports the respective criteria:
• Data Resources Support: control over the selection of data resources
• Data Operation Support: range of supported data management operations
• Data Transfer Support: data transfer independent from other concepts
• Data Iteration Support: data iteration independent from other concepts

4.2.2.1 Support of Data Transfer and Transformation Patterns

Data Transfer and Transformation Patterns can logically be subdivided into data access parts, data transformation parts, and data transfer parts. This reflects the fact that data provisioning techniques (1) have to be able to access multiple kinds of data resources, (2) have to support a wide range of data management or data transformation operations, and (3) have to facilitate corresponding data transfers between resources. These issues are especially important for coupling simulations of different scientific domains in a generic way. They correspond to the first three criteria presented in Table 4.1. The focus of criterion Data Resources Support is to what extent workflow developers may control the selection of data resources their workflows shall access. Data services typically provide access to only a fixed set or fixed types of data resources. This may result in no control or only limited control over the selection of resources. In contrast, data management activities offer full control over this selection, e. g., via reference variables that may be associated with any data resource of any type [VSRM08, RRS+11]. In case of local data processing, often only access to local storage in workflow systems is possible. Some workflow systems may also integrate external data resources. However, the set or types of such resources are typically even more restricted than in the case of data services. For instance, the workflow engine Apache ODE is restricted to tuple-oriented retrievals and manipulations of data from relational database systems (Apache ODE external variables: http://ode.apache.org/external-variables.html). The supported options for specifying data management operations, in particular ETL operations, are the focus of criterion Data Operation Support. Data provisioning via data services enables workflow developers to use only those data management operations that are offered by available services. Typically, this is again a limited set of proprietary functions. Data management activities better support this criterion, as they enable all data management commands the respective data resources offer. In the case of database systems, for instance, the offered command languages are usually highly expressive and support various ETL operations (see Section 2.2). Nevertheless, command languages of file or operating systems, e. g., shell languages, are typically not intended for complex or proprietary operations. So, data management activities might exhibit only a few disadvantages for file systems and/or when proprietary operations have to be supported. Workflow developers relying on local data processing may only use the data management operations supported by the local data processing unit of the workflow system. However, such a local data processing unit is often inappropriate for ETL operations due to limited, proprietary functions and/or restricted workflow-internal data structures. Criterion Data Transfer Support refers to the question to what extent a data provisioning concept supports the design and execution of tasks for data transfers between data resources. The main focus is on whether a concept offers sufficient means to support data transfers on its own or whether it depends on other concepts. Figure 4.3 illustrates the ways in which the individual data provisioning concepts or their techniques may implement data transfers. A service may (1) temporarily load the data into the service context, e. g., into variables of the used programming language, or it may (2) initiate a direct data transfer between the involved data resources.
Data management activities support only the second way. For both data provisioning concepts, such a direct data transfer is only possible if it can be defined as a data management command of the relevant data source or data sink. For instance, file systems usually offer commands for locally copying files within one computer, and in many cases even for remote file transfers between several computers via SSH tunnels. In contrast, database systems often do not support direct data transfers to other database systems, but only export or import functions to or from locally accessible files.
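As an illustration of such an export function, the following sketch assumes a PostgreSQL-style COPY command and hypothetical table, column, and file names; transferring the data to another database system would then require a separate step that moves and imports the exported file:

  -- Export data from the database into a locally accessible CSV file
  COPY (SELECT time_step, force_x, force_y, force_z FROM boundary_conditions)
    TO '/exchange/forces.csv' WITH (FORMAT csv, HEADER);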

Figure 4.3: Data transfer support of the individual data provisioning concepts depicted in Figure 4.2. a) A data service either (1) loads data from the source into its service context and stores it to the sink, or (2) calls a direct transfer between source and sink; b) a data management activity issues a data management command that makes the source or sink transfer the data directly; c) with local data processing, the data is loaded from the source into the workflow context and stored to the sink from there.
Source Sink Source Sink Source Sink (2) transfer transfer Data Layer Data Data Local Data Control Message / Service Management Processing Flow / Data Call Activitiy Step Dataflow Transfer Figure 4.3: Data transfer support of the individual data provisioning concepts depicted in Figure 4.2. database systems often do not support direct data transfers to other database systems, but only export or import functions to or from locally accessible files. Whenever a direct data transfer is possible at all, data management activities may have an advantage compared to data services, as they allow workflow developers to use all commands the respective data resources offer. Data services may be more restrictive due to a possibly limited set of available service functions. When data resources do not support a direct transfer, data management activities cannot be used, while data services may exist that additionally enable a data transfer via the service context. In summary, data services and data management activities both show minor advantages and disadvantages and thus support criterion Data Transfer Support to nearly the same degree. 4.2 Comparison of Data Provisioning Techniques 103

Only when neither data services nor data management activities provide data transfers at all is local data processing in workflow systems indispensable. As depicted in Figure 4.3, the workflow context may then serve as temporary data storage. However, local data processing always depends on the other concepts, as data services or data management activities are needed to load data from the data sources into the workflow context and to store it to the data sinks. Hence, this concept shows the worst support of criterion Data Transfer Support.
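The difference between a direct transfer and a transfer via the service or workflow context can be made concrete with a small sketch. The following Java snippet is purely illustrative; the host names, file paths, and the use of scp are assumptions and not part of any prototype described in this thesis. Variant (2) delegates the copy to a command of the underlying operating system, whereas variant (1) temporarily loads the data into local variables, i. e., into the context.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class TransferSketch {

        // Variant (2): direct transfer. The copy is delegated to a command of the
        // file/operating system (here: scp over SSH); the data never enters the
        // service or workflow context.
        static void directTransfer(String sourcePath, String targetHostAndPath)
                throws IOException, InterruptedException {
            Process p = new ProcessBuilder("scp", sourcePath, targetHostAndPath)
                    .inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("scp failed for " + sourcePath);
            }
        }

        // Variant (1): transfer via the context. The data is first loaded into
        // local variables and then stored at the sink; this is only needed when
        // source and sink do not offer a suitable direct transfer command.
        static void contextBasedTransfer(Path source, Path sink) throws IOException {
            byte[] data = Files.readAllBytes(source); // load into the context
            Files.write(sink, data);                  // store at the sink
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical paths and host; adjust to the concrete environment.
            directTransfer("/data/pandas/input.csv", "octave-host:/data/octave/input.csv");
            contextBasedTransfer(Paths.get("/data/a.csv"), Paths.get("/data/b.csv"));
        }
    }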

4.2.2.2 Support of Data Iteration Patterns

Criterion Data Iteration Support (fourth criterion shown in Table 4.1) aims at discussing whether a data provisioning concept offers sufficient means to support the Data Iteration Patterns on its own or whether it depends on other concepts. The comparison mainly depends on whether the relevant workflow inevitably has to load some external data into the workflow context in order to retain control over the data iteration. In particular, a workflow might need to load some data to properly determine the partitioning of the whole data set S into subsets Ti ⊆ S (see Figures 3.5 and 3.6). In the following, such loaded data are called loop data. An example is a list of available Octave computers among which the coupling workflow shown in Figure 3.3 partitions the output data of Pandas.

In case the workflow inevitably needs to load loop data into the workflow context, the data provisioning concepts heavily depend on each other. Typically, data services or data management activities are used to initially load the loop data into the workflow context (see Figure 4.3). Local data processing in the workflow system is then used to further process this loop data in order to control the data iteration. So, local data processing is indispensable in such scenarios. In other words, both data services and data management activities are not sufficient to provide the full functionality of a data iteration and thus do not completely support this criterion (see Table 4.1).

If the workflow does not need to load loop data into the workflow context, data services or data management activities might be used to push down the whole data iteration to external resources or services [VSS+07]. Data services may be implemented via typical programming languages, which usually provide means for loops or native commands for data partitioning and thus also for data iterations. However, there must be an appropriate data service available for a particular data iteration scenario. Data management activities can only be used if the data resources that are responsible for controlling the data iteration offer commands to completely execute the iteration in these resources. Examples are SQL cursors or other set-oriented operations of relational database systems. However, the default command languages offered by data resources usually do not provide the same range of opportunities as programming languages implementing services. So, criterion Data Iteration Support is the first one where data management activities show a few disadvantages compared to data services (see Table 4.1). As discussed above, local data processing always depends on the other concepts in case data services or data management activities are used to load loop data into the workflow context. If the workflow does not inevitably need to load such loop data, local data processing might not be necessary at all. This is particularly true if the other two concepts offer enough means to completely execute a data iteration in external resources or services. In conclusion, local data processing again shows the worst support of criterion Data Iteration Support.
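The two cases can be illustrated by the following sketch, which contrasts pushing an iteration down to the database with controlling it via loop data in the workflow context. This is a hypothetical JDBC example; the connection URL, table, and column names are assumptions chosen for illustration only, and the loop data would in practice be loaded via a data service or data management activity rather than from a local table.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class DataIterationSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://pandas-host/pandas", "user", "secret")) {

                // Push-down variant: one set-oriented SQL statement lets the
                // database iterate over all tuples itself; no loop data has to
                // be loaded into the workflow context.
                try (Statement st = con.createStatement()) {
                    st.executeUpdate(
                        "INSERT INTO aggregated_results (point_id, avg_value) " +
                        "SELECT point_id, AVG(value) FROM simulation_output GROUP BY point_id");
                }

                // Context variant: loop data (here a list of target computers,
                // read from a hypothetical table for simplicity) is loaded into
                // the workflow context, which then controls the partitioning of
                // the data set into subsets T_i.
                try (Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery("SELECT host FROM octave_computers")) {
                    int partition = 0;
                    while (rs.next()) {
                        String host = rs.getString("host");
                        System.out.println("Partition " + partition++ + " -> " + host);
                        // ... issue one export per host, e.g., with LIMIT/OFFSET ...
                    }
                }
            }
        }
    }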

4.2.3 Comparison Regarding Non-Functional Issues

The following subsections deal with the two main comparison criteria regarding non-functional issues illustrated in Section 4.2.1, i. e., (1) the information on the data management of a workflow model and (2) the abstraction support offered by data provisioning techniques. Table 4.2 summarizes the results of this discussion. Note that, for this part of the comparison, it is assumed that the data provisioning concepts or their underlying techniques offer the needed functionality, as such functional aspects are already covered by Section 4.2.2.

4.2.3.1 Information on Data Management

Criterion Information on Data Management deals with the question of how much information on the data management of a workflow is known or can be derived by certain components associated with a workflow system. On the one hand, this is crucial for an optimizer component in order to make proper optimization decisions increasing the efficiency of data processing in workflows [BHW+07, VSS+07]. On the other hand, it is important for provenance components in order to properly correlate collected provenance information with affected parts of a workflow model [KSB+10, RSM14a, MBBL15].

Table 4.2: Support of non-functional issues. The filling degrees of the pie charts depict how much a data provisioning concept supports the respective criteria. (Columns: data services, data management activities, local data processing; rows: Offered Information on Data Management, Offered Abstraction Support.)

The abstraction layer introduced by data services hides most of the information on the data management of workflows. So, it is very difficult for a workflow system or associated components to get much of this information. Usually, this is restricted to single input parameters of a service call, which do not give detailed insights about data management operations. The data provisioning concepts of local data processing and data management activities better support this criterion, because they treat descriptions of data management operations as integral parts of workflow models. Hence, the workflow system may analyze these workflow models to get much of the information it needs for a holistic provenance support and for a consolidated optimization of the data processing [VSS+07, RRS+11]. While data management activities may support various optimization techniques [BHW+07, VSS+07, RRS+11, Kal15], local data processing is, however, restricted to those that are tailor-made for the workflow-internal data processing [RSM11]. For instance, a distinguishing drawback of local data processing is that only access to the local data processing unit in the workflow system is possible. So, this data provisioning concept oftentimes prohibits the distributed execution of data management operations, which would be necessary for task and data parallelism [PA06]. In conclusion, the concept of local data processing supports criterion Information on Data Management to a lesser degree than the concept of data management activities (see Table 4.2).

4.2.3.2 Abstraction Support

The purpose behind criterion Abstraction Support is to evaluate the suitability of a data provisioning concept to provide scientists with an adequate abstraction support reducing the complexity of simulation workflow design. The main issue is whether scientists have to acquire some technical knowledge that is necessary for successfully applying a particular data provisioning technique. For acceptability purposes, the effort scientists spend on this knowledge acquisition should be minimal, which corresponds to a good abstraction support as depicted in Table 4.2. Otherwise, scientists would waste time they actually want to spend on their core issues, i. e., simulation modeling and post-execution analysis (see Figure 2.3).

For data provisioning based on local data processing, scientists have to get familiar with languages to describe workflow-local data processing steps. Examples are copy statements and corresponding XPath expressions of a BPEL assign activity [OAS07] (see Section 2.3.3). Scientists typically do not perceive such languages as integral parts of their domain-specific methodology. So, scientists are often not familiar with these languages and may have to spend a considerable effort to learn them. In a similar way, data management activities require scientists to get familiar with the command languages the involved data resources offer, e. g., shell languages or even SQL or XQuery. The diversity of the respective languages is usually higher than for workflow-local data processing. So, the effort scientists have to spend on applying them may be higher as well. In fact, this effort even increases with the complexity of involved data management operations. Altogether, this leads to a worse support of criterion Abstraction Support for data management activities.

The abstraction layer introduced by data services hides most of the complexity of the involved data management technology and operations. Scientists only need to specify the input parameters of the relevant service call, which can typically be done in an easy way. Furthermore, the wide acceptance of service technology in the scientific domain entails that many scientists are quite familiar with service calls [TDG07]. So, the effort they have to spend is minimal for this data provisioning concept, i. e., data services show the best support of criterion Abstraction Support.

4.3 Guidelines for Choosing Appropriate Data Provisioning Techniques

The first focus of this section is to use the results of comparing data provisioning concepts in order to derive guidelines for simulation workflow design. These guidelines assist scientists in choosing appropriate data provisioning techniques for certain data provisioning tasks of their simulation workflows.

Figure 4.4: Sequence of decisions to apply the guidelines for choosing appropriate data provisioning techniques. (The decision flow first asks whether local data processing is inevitable, which leads to guideline 1. It then asks whether available data services support all requirements, which leads to guideline 2. If not, it asks whether the relevant data resources offer the necessary functionality, which leads to guideline 3; otherwise, guideline 4 applies and new data services have to be implemented.)

Figure 4.4 shows the sequence in which scientists may apply the guidelines. In addition, the figure depicts the decisions scientists have to make to check whether a guideline may be applied or not. Moreover, the guidelines have been evaluated by successfully applying them to the real-world simulation examples listed in Section 3.1. Section 4.3.2 illustrates the major results of this evaluation.

4.3.1 Discussion of Guidelines

An overview of the comparison results summarized in Tables 4.1 and 4.2 suggests that local data processing offers the worst solution for data provisioning in simulation workflows and thus should be avoided as often as possible. In addition, data services and data management activities compete against each other with alternating success. The following subsections discuss these two aspects in more detail.

4.3.1.1 Local Data Processing as Solution with Many Drawbacks

The discussion in Section 4.2.2 reveals that local data processing in workflow systems shows the worst support of data management patterns among all three data provisioning concepts. This holds for each individual comparison criterion depicted in Table 4.1. For instance, local data processing significantly limits the range of supported data management operations, which oftentimes makes this concept inappropriate for data provisioning in simulation workflows. This impression is even reinforced by the comparison regarding non-functional issues summarized in Table 4.2. As discussed in Section 4.2.3.1, local data processing offers much information about the data management in workflows. However, this fact may become insignificant, since this concept restricts the set of possible optimization techniques that might make use of this information. Finally, it only offers a low abstraction support, especially when compared to data services.

Guideline 1 Local data processing in workflow systems should only be used in simulation workflows when this is inevitable or when proprietary needs call for it.

The Data Iteration Patterns shown in Figures 3.5 and 3.6 constitute the major scenario where local data processing may be inevitable. For instance, the coupling workflow depicted in Figure 3.3 needs to load a list of available Octave computers. This workflow then processes the list locally in the workflow context to properly partition the output data of Pandas among the computers. Note that even if scientists choose the concept of local data processing, they still need to check which of the other guidelines has to be applied (see Figure 4.4). This is because local data processing seldom offers the full functionality of any data provisioning task. For instance, data services or data management activities are needed to load the above-mentioned list of Octave computers into the workflow context.

4.3.1.2 When to Use Data Services or Data Management Activities

Now, the question is under which circumstances scientists should prefer data services to data management activities or vice versa. Table 4.1 shows that, in contrast to data services, data management activities offer full control over the selection of data resources. Furthermore, they support the full range of data management operations offered by the involved external data resources via their command languages. Data services, however, only provide a limited set of proprietary functions, which may even restrict the available functionality offered by data resources. So, data management activities show a better support of data management patterns, but only if the involved data resources offer the necessary functionality at all. When this is not the case, this functionality of data resources needs to be enhanced via additional data services. This may be necessary, e. g., when operating systems are involved, whose command languages are typically not intended for highly complex data provisioning tasks in simulation workflows.

The discussion regarding non-functional issues summarized in Table 4.2 likewise does not clearly argue for data services or data management activities. The main benefit of data management activities is that they offer much information about data management operations in workflow models. This is a prerequisite for a holistic provenance support [KSB+10, RSM14a, MBBL15], and it facilitates making proper optimization decisions to increase the efficiency of data processing in workflows [BHW+07, VSS+07, RRS+11]. In contrast, data services offer a much better abstraction support that reduces the effort scientists have to spend to successfully apply corresponding data provisioning techniques. None of these arguments nominates a distinct winner between data services and data management activities. In conclusion, the proper choice between these two data provisioning concepts depends on the concrete scenario. Nevertheless, the comparison results may be used to derive the following guidelines for choosing either data services or data management activities.

Guideline 2 Whenever a data service exists that supports all desired functional and the most relevant non-functional requirements, this service should be used. This constitutes the most convenient solution for scientists, since data services offer the best abstraction support of all data provisioning concepts.

Guideline 3 When a suitable data service does not exist, but the relevant data resources offer the necessary functionality, data management activities are a good solution. This is mainly due to the functional flexibility offered by data management activities and because the external resources typically offer reasonably efficient means to execute various data management operations. However, note that data management activities may require scientists to get familiar with previously unknown command languages of some of the data resources.

Guideline 4 The last case is when neither existing data services nor the command languages of data resources, and thus data management activities, offer the necessary functionality. So, scientists need to implement new proprietary data services that enhance existing functionality of data resources. Then, scientists not only need to get familiar with command languages of data resources, but also with scripting or programming languages to implement data services. Hence, this option should only be used if it is absolutely inevitable.

4.3.2 Evaluation of the Proposed Guidelines

In the course of working on this thesis, the proposed generic guidelines have been evaluated via a comprehensive case study. Thereby, they have been successfully applied to the real-world simulation examples listed in Section 3.1. This has been backed up by collaborations with research partners conducting the simulation examples. Before these research partners knew about the guidelines, they were confused about the large and complex set of available data provisioning techniques, as illustrated in Section 1.2.1. This diversity and complexity led them to not leverage the techniques at all. Instead, they still implemented or even manually performed the data provisioning on their own. Finally, the systematic approach offered by the proposed guidelines significantly helped them to choose appropriate data provisioning techniques and to use them in their workflows.

During this evaluation process, it turned out that the running example of a bone simulation depicted in Section 1.1 is well-suited to illustrate all major results of this evaluation. This is mainly because this bone simulation is the largest and most complex scenario of all considered simulation examples, as discussed in Section 3.1. The most important implications and conclusions that have been derived for other examples are hence likewise covered by the bone simulation. In the following two subsections, this discussion is split according to the two kinds of the most challenging scenarios also considered in Section 3.2: (1) providing and preparing heterogeneous input data for single simulation models and (2) exchanging data between different coupled simulation models. Finally, Section 4.3.2.3 summarizes the main conclusions of this evaluation.

4.3.2.1 Provisioning of Input Data for Single Simulations

Figure 4.5 illustrates how the proposed guidelines map to a prototypical implementation of the bio-mechanical simulation workflow depicted in Figure 1.1. This prototypical implementation of the workflow is based on the workflow language BPEL and on the workflow engine Apache ODE5 [Dor11, Pie12, Boh14]. No step in this simulation workflow inevitably requires workflow-local data processing. Hence and in line with guideline 1, the workflow completely neglects this data provisioning concept. Figure 4.5 therefore only depicts the usage of either data services or data management activities for relevant data provisioning tasks.

5 Apache ODE: http://ode.apache.org/

Figure 4.5: Usage of data provisioning concepts in the bio-mechanical simulation workflow shown in Figure 1.1. Data provisioning tasks and the respectively used concepts are highlighted via bold labels. "GL i" means "Guideline no. i". (The workflow transfers each text- or CSV-based input file to the Pandas computer via an SSH-based data service (GL 2) and starts the Pandas simulation. For each numerical time step, it then exports the output file via a data management activity with SQL (GL 3), transfers it via an SSH-based data service (GL 2), converts it via a proprietary data service (GL 4), and finally visualizes the results.)

In its prototypical implementation, the data provisioning phase of the workflow (blue steps shown in Figure 4.5) is a simplified version of the one depicted in Figure 1.1. More precisely, it does not access the XML or SQL database systems storing material parameters or FEM parameters. Instead, it copies the necessary text- or CSV-based input files of Pandas from a set of source computers to the computer where Pandas is installed. As illustrated in Section 4.1.1.1, corresponding file transfer services, e. g., using SSH tunnels, are common in the workflow or service domains. So, it is likely that a data service exists realizing an appropriate file transfer. According to guideline 2, this data service is the choice to realize this task. Thereby, a BPEL forEach activity iterates over a list of relevant input files and initiates one instance of the data service for each file.

After Pandas has stored the simulation outcome in its SQL database, the workflow prepares this result data for the subsequent visualization (green and purple steps shown in Figure 4.5). The tool used for this final result visualization expects one input file for each numerical time step calculated by Pandas. For each of these time steps, the workflow firstly exports the relevant data from the Pandas database and stores it in a CSV-based file. The underlying database system, PostgreSQL6 version 9.2, supports these data exports via SQL statements, and its optimization mechanisms enable the efficient execution of these statements. These aspects, as well as the fact that often no data service offers such a specific functionality, recommend following guideline 3. So, the functional flexibility of a data management activity embedding an appropriate SQL statement is used as realization of the data export. Subsequently and according to guideline 2, a data service again uses an SSH tunnel to copy the previously exported file from the Pandas computer to the computer where the visualization tool resides. The next activity converts this CSV-based file into the format suitable for the visualization tool. In this case, neither existing data services nor the file or operating system of the visualization computer offer sufficient means for this proprietary data format conversion. So, the only option is guideline 4, i. e., the offered functionality needs to be enhanced via a new proprietary data service.
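To make the data export concrete, the following sketch shows the kind of SQL command such a data management activity could embed, issued here via JDBC purely for illustration. Table, column, and path names are hypothetical, and the server-side COPY statement assumes that the database user is allowed to write files on the database server.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class ExportTimeStepSketch {

        // Exports the result data of one numerical time step into a CSV file.
        // The COPY statement is the command a data management activity would embed.
        static void exportTimeStep(Connection con, int timeStep) throws SQLException {
            String sql =
                "COPY (SELECT point_id, displacement, pressure " +
                "      FROM pandas_output WHERE time_step = " + timeStep + ") " +
                "TO '/tmp/pandas_step_" + timeStep + ".csv' WITH (FORMAT csv, HEADER)";
            try (Statement st = con.createStatement()) {
                st.execute(sql);
            }
        }

        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://pandas-host/pandas", "user", "secret")) {
                for (int step = 1; step <= 100; step++) { // one file per numerical time step
                    exportTimeStep(con, step);
                }
            }
        }
    }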

4.3.2.2 Data Exchange between Different Simulations

Figure 4.6 depicts how to apply the proposed guidelines to the more complex scenario of coupling the bio-mechanical simulation model with a systems-biological model discussed in Section 3.2.2. The figure mainly indicates the choices among the data provisioning concepts for a BPEL-based prototypical implementation of the coupling workflow shown in Figure 3.3 [Dor11, Pie12, Ges14, Rie14, vS15b]. The repository accessed by the first workflow step offers a suitable service interface to load the list of daily routines including their motion sequences. So, guideline 2 is applicable for this workflow step, i. e., it uses the corresponding service. This also holds for the next step that loads a list of available Pandas computers from another repository. Afterwards, the workflow assigns each motion sequence of the current daily routine to a Pandas computer calculating the simulation outcome for this motion sequence. Therefore, the corresponding workflow step has to process the previously loaded list of daily routines and the list of Pandas computers within the workflow context.

6 PostgreSQL: http://www.postgresql.org/

Figure 4.6: Usage of data provisioning concepts in the coupling workflow shown in Figure 3.3. Data provisioning tasks and the respectively used concepts are highlighted via bold labels. "GL i" means "Guideline no. i"; cf. [RSM14b]. (For each daily routine, the workflow loads the daily routines and the Pandas computers via repository data services (GL 2), assigns the motion sequences to Pandas computers via local data processing (GL 1), and starts the Pandas simulations. It then loads the number of spatial evaluation points via a data management activity with SQL (GL 3), loads the Octave computers via a repository data service (GL 2), and determines the number of points per computer via local data processing (GL 1). For each Octave computer, it calculates the current limit and offset via local data processing (GL 1), exports the Octave input files and later imports the Octave output files via data management activities with SQL (GL 3), and starts the Octave simulation in between.)

Hence, this is the first part of the considered workflows where local data processing is inevitable and guideline 1 has to be applied. Subsequently, each Pandas simulation is started by calling a similar workflow as the one depicted in Figure 4.5. Loading a list of available Octave computers leads to the same arguments as for the other two repository services described above, i. e., guideline 2 may be applied here. The following two workflow steps prepare the later partitioning of the output data of Pandas among these Octave computers. As discussed in Section 2.1.1, Pandas calculates its outcome for a set of spatial evaluation points of the finite element grid. The second of the two steps determining the data partitioning calculates the number of spatial evaluation points each available Octave computer has to process. This step therefore needs access to the previously loaded list of Octave computers. So, the workflow again inevitably has to process some data locally in the workflow context, i. e., guideline 1 has to be applied. As depicted in Figure 4.4, it is still necessary to check whether other data provisioning concepts are needed to previously load some external data into the workflow context. In fact, the prior workflow step needs to load the total number of spatial evaluation points for which Pandas has stored the simulation outcome in its database. Again, the underlying PostgreSQL database system offers the efficient execution of an appropriate SQL statement, but there is no data service available providing this functionality. According to guideline 3, this workflow step is thus realized by a data management activity.

Subsequently, the actual parallel data iteration for each Octave computer starts. The first workflow step calculates the limit and offset numbers of the spatial evaluation points to be processed by the current Octave computer. Here, again guideline 1 needs to be applied, as it is inevitable for this workflow step to locally process the previously loaded data values. The next step realizes the data export, i. e., the format conversion, filter, aggregation, and partitioning of the data from the database of Pandas to the relevant CSV-based files. For this step, guideline 3 is applicable and a data management activity is chosen due to the same reasons as described above when accessing the PostgreSQL database system. After Octave has finished its calculation in the next workflow step, the last step of the coupling workflow imports the resulting CSV-based output files back into the database of Pandas. This step may again use a data management activity that issues an appropriate SQL statement against the PostgreSQL database system.
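The partitioning logic that guideline 1 keeps in the workflow context is small enough to sketch in a few lines. The following snippet is illustrative only; the host names, the total number of evaluation points, and the table and path names are assumptions, and the generated COPY statement merely exemplifies the command a data management activity could embed for each Octave computer.

    import java.util.Arrays;
    import java.util.List;

    public class PartitioningSketch {
        public static void main(String[] args) {
            // Loop data previously loaded into the workflow context
            // (hypothetical values; in the workflow they come from the Pandas
            // database and a repository data service).
            int totalPoints = 10_000;                        // from: SELECT COUNT(*) ...
            List<String> octaveHosts = Arrays.asList("octave1", "octave2", "octave3");

            // Determine #SPs per computer (guideline 1: local data processing).
            int perHost = (int) Math.ceil((double) totalPoints / octaveHosts.size());

            for (int i = 0; i < octaveHosts.size(); i++) {
                int offset = i * perHost;
                int limit  = Math.min(perHost, totalPoints - offset);

                // The export itself is pushed down to PostgreSQL via a data
                // management activity (guideline 3); this statement exemplifies
                // the command such an activity could embed.
                String exportSql =
                    "COPY (SELECT * FROM pandas_output ORDER BY point_id " +
                    "      LIMIT " + limit + " OFFSET " + offset + ") " +
                    "TO '/tmp/octave_input_" + i + ".csv' WITH (FORMAT csv)";
                System.out.println(octaveHosts.get(i) + ": " + exportSql);
            }
        }
    }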

4.3.2.3 Main Conclusions

The guidelines derived in Section 4.3.1 helped to choose a realization for each individual data provisioning task in the workflows depicted in Figures 4.5 and 4.6. This completeness of the guidelines has been confirmed when also testing them on the other example simulations considered for the evaluation [Mül10, Wag10, FE11, RK11, Rem11, FDP15]. In addition, it turned out that each individual guideline is relevant for simulation workflows. Furthermore, the guidelines and the order in which Figure 4.4 proposes to apply them always led to the right decisions on realizations of data provisioning tasks. For instance, the application of guideline 1 was essential for the coupling workflow shown in Figure 4.6. In particular, this workflow inevitably needs to load a portion of external data to control the data partitioning, which is a common requirement for the Parallel Data Iteration Pattern depicted in Figure 3.5 [RSM14b]. Figure 4.4 arranges the other three guidelines in an order that aims at reducing the effort scientists have to spend on simulation workflow design. Guideline 2 constitutes the most convenient solution for scientists, since data services abstract from many low-level details of data provisioning. If this guideline were neglected or postponed in Figure 4.4, scientists would need to specify many complex data management operations in data management activities according to guideline 3. Neglecting or postponing guideline 3 may even further increase the scientists' effort, because the next guideline 4 recommends implementing completely new data services. As discussed in Section 4.3.1.2, this would result in a remarkable implementation overhead. Guideline 4 is, however, necessary, since scenarios may exist where neither available data services nor data resources offer the necessary functionality.

4.4 Implications for Simulation Workflow Systems

The guidelines discussed in Section 4.3 provide scientists with a generic method to choose appropriate data provisioning concepts or their underlying techniques. However, existing workflow systems also need to offer particular features that are necessary for effectively applying the guidelines in practice. As a simple example, guideline 3 requires workflow systems to offer adequate kinds of data management activities supporting the used data resources. Section 4.4.1 illustrates the impacts of the comparison results summarized in Section 4.2 and of the guidelines derived in Section 4.3 on these necessary features of workflow systems. In particular, a set of features is discussed that are missing in the majority of currently available systems. Section 4.4.2 then illustrates extensions to a simulation workflow system offering all the missing features in a holistic way. Finally, the prototypical implementation of this extended workflow system, which has been developed while working on this thesis, is depicted in Section 4.4.3. Detailed results of the evaluation of this prototype are covered by Chapters 5 and 6.

4.4.1 Missing Features of Current Workflow Systems

At first glance, the discussion of missing features of currently available workflow systems primarily benefits the developers of these systems. It helps them to identify the features they need to incorporate into their systems, depending on which of the guidelines discussed in Section 4.3.1 they want to support. Nevertheless, it is likewise beneficial for scientists using the workflow systems to design and execute workflows. These scientists may follow the sequence of decisions depicted in Figure 4.4 and determine the guidelines they have to apply. The discussion in this section then helps scientists to identify the features a particular workflow system has to offer so that they can effectively apply the guidelines. This way, they are able to systematically select a workflow system that offers all the features they need.

Guideline 1 recommends avoiding local data processing in workflow systems whenever possible. As mentioned in Section 4.2.2, the by far most common case where local data processing may be inevitable is represented by the Data Iteration Patterns depicted in Figures 3.5 and 3.6. These patterns, however, only require carrying out rather simple operations on very small data sets within the workflow context in order to properly control the data partitioning. For instance, the coupling workflow shown in Figure 4.6 only carries out simple loops or count operations on small lists of daily routines or available computers. As discussed in Section 4.1.1.3, any available workflow system offers means that can smoothly cope with these simple operations and small amounts of data. Hence, guideline 1 does not imply any missing feature of these workflow systems.

Guideline 4 should likewise only be used if this is absolutely inevitable. Hence, workflow systems should offer enough features so that scientists have to apply guideline 4 as seldom as possible. In other words, the features offered by workflow systems should make it possible to apply guidelines 2 and 3 in the majority of all cases. So, there is no need to derive missing features for guideline 4, but only for the other guidelines. Table 4.3 summarizes the resulting missing features.

4.4.1.1 Missing Features for Guideline 2

Guideline 2 suggests using data services in case one or more services exist that support all desired functional and the most relevant non-functional requirements. Most of today's workflow systems offer general means to invoke services and thus also support this data provisioning concept (see Section 4.1.1.1).

Table 4.3: Missing features of currently available workflow systems that are required to effectively apply the guidelines discussed in Section 4.3.1.
Guideline 2 (use data service(s)): 1. Registry that describes data services and data resources in a domain-specific manner. 2. Mechanisms to enrich the information on the data management of a workflow.
Guideline 3 (use data management activities): 3. Generic data management activities supporting various data resources, formats, and operations. 4. Abstraction support for workflow design that is tailor-made for scientists.

Nevertheless, this guideline and the comparison results regarding data services summarized in Section 4.2 entail two missing features of available workflow systems:

1. Guideline 2 is only applicable if the used workflow system (1) enables scientists to describe the requirements they have and (2) supports mechanisms to search for services matching these requirements. In the service domain, many generic repository or registry solutions exist that support these scenarios [Ley03, Ley05]. This is oftentimes based on policies to specify requirements that are matched against metadata describing the capabilities and non-functional properties of available services. However, these generic solutions always have to be adapted to the domain of the service consumers in order to facilitate an effective search for services [SKRM13]. This is especially challenging for simulations that are coupled across different scientific domains, as each domain has its own requirements. Previous work, however, has widely neglected the specifics of this large area of coupled simulations – and often also of the individual domains, such as bio-mechanics or systems-biology. Hence, the first missing feature of available workflow systems is a repository or registry solution that allows for a domain-specific description of data services that is especially tailored to (coupled) simulations.
2. On the one hand, data services offer the best abstraction support for specifying data management operations (see Table 4.2). On the other hand and as discussed in Section 4.2.3.1, this abstraction support tends to hide much of the information on the data management of a workflow. Such information is, however, essential for a successful application of optimization techniques

and for a holistic provenance support. So, a workflow system has to offer mechanisms to enrich the information on the data management of a workflow again in case it has previously been hidden by data services. However, only very few workflow systems exist tackling this problem at all, and these few systems usually do this to a rather marginal degree [CVDK+12].

4.4.1.2 Missing Features for Guideline 3

Scientists may only apply guideline 3, i. e., use data management activities for data provisioning, if the underlying data resources offer the necessary functionality (see Figure 4.4). So, scientists need to be able to search for data resources matching their requirements. This may again be facilitated by a domain-specific description of data resources in repositories or registries. For that purpose, the first missing feature discussed above not only considers data services, but also data resources (see Table 4.3). Moreover, guideline 3 and the comparison results summarized in Section 4.2 entail the following missing features of workflow systems:
3. As discussed in Section 4.1.1.2, only a couple of today's workflow systems support the data provisioning concept of data management activities. This encompasses the workflow products of IBM, Microsoft, and Oracle [VSRM08], as well as the scientific workflow systems Kepler and Trident [LAB+06, BJA+08]. A distinguishing drawback of all these systems is that they are often restricted to certain kinds of data resources, e. g., SQL database systems [VSRM08], or even to specific proprietary data management operations [BAJ+10]. Especially simulations that are coupled across different scientific domains, however, require a generic solution with data management activities supporting various kinds of data resources, data formats, and data management operations [RSM14a, RSM14b].
4. As depicted in Table 4.2, data management activities offer the worst abstraction support of all data provisioning concepts. Scientists often do not use data management activities, as they would have to spend much effort to successfully specify data management operations in such activities [RLSR+06, RS14, RSM14b]. Hence, workflow systems should offer an abstraction support for specifying low-level data management operations that is particularly tailor-made for scientists conducting simulations. However, none of today's workflow systems offers such a domain-specific and user-centric abstraction support [RSM14a, RSM14b] (see Section 1.2.2).

4.4.2 Extended Simulation Workflow System

Figure 4.7 shows the main components of the architecture of an extended simulation workflow system supporting all missing features discussed above. It corresponds to the SIMPL framework mentioned in Section 1.3.2 as one of the contributions of this thesis [RRS+11, RS13b, RS14, RSM14a, RSM14b]. Note that, for better readability, Figure 4.7 leaves out components of a general architecture of workflow systems that are less relevant for supporting the missing features discussed above. This especially includes a monitoring tool, a library managing workflow fragments, and components for binding data resources, services, or workflow fragments at runtime [KWvL+07, SK10, GSK+11, SK11, SKK+11, SK13]. The main components relevant here are the workflow design tool, the workflow execution environment, the rule-based pattern transformer, the simulation artifact registry, the provenance framework, and the service bus. The following subsections respectively detail how this extended workflow system supports the individual missing features summarized in Table 4.3.

4.4.2.1 Domain-specific Registry for Data Services and Data Resources

The simulation artifact registry mainly supports the first missing feature. Its major purpose is to help scientists to find data services or data resources that match their particular requirements. Therefore, it not only contains metadata describing these data services and data resources, but also other domain-specific key artifacts scientists conducting simulations are especially interested in. This mainly encompasses mathematical simulation models, simulation methods, e. g., numerical methods, and simulation software [RS13b, RSM14a]. As a further contribution, the registry manages dependencies between these different artifacts. These dependencies facilitate the domain-specific search for data services or data resources and thus make this search tailor-made for scientists. More precisely, scientists may search for data services or data resources by actually querying the simulation models, methods, or software. For instance, they may search for data resources storing the data a particular simulation software requires to realize a specific simulation model by means of a certain simulation method [RS14, vS15a]. Other examples are coupled simulations, where scientists may ask for data services implementing the numerical, multi-scale transformations between mathematical variables of the coupled simulation models [Gat14, RSM14b, vS15b].
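The kind of domain-specific query the registry is meant to answer can be outlined with a small, purely hypothetical API. Neither the interface nor the method names below exist in the SIMPL prototype; they only illustrate how scientists could search for data resources and data services via the simulation models, methods, and software they already know.

    import java.util.List;

    // Hypothetical types and method names, for illustration only.
    interface SimulationArtifactRegistry {
        List<String> findDataResources(String simulationModel,
                                       String simulationMethod,
                                       String simulationSoftware);
        List<String> findCouplingServices(String sourceModel, String targetModel);
    }

    public class RegistryQuerySketch {
        static void example(SimulationArtifactRegistry registry) {
            // "Which resources store the data Pandas needs to realize the
            //  bio-mechanical bone model by means of the FEM?"
            List<String> resources = registry.findDataResources(
                    "bio-mechanical bone model", "FEM", "Pandas");

            // "Which services implement the multi-scale transformations between
            //  the coupled bio-mechanical and systems-biological models?"
            List<String> services = registry.findCouplingServices(
                    "bio-mechanical bone model", "systems-biological model");

            resources.forEach(System.out::println);
            services.forEach(System.out::println);
        }
    }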

Figure 4.7: Main components of a simulation workflow system and its extension by SIMPL to support the missing features of available workflow systems summarized in Table 4.3; cf. [RRS+11, RS13b, RS14]. (The figure shows the workflow design tool with modeling support for DM patterns and DM activities, the workflow execution environment with DM activity execution and the workflow execution engine, the rule-based pattern transformer, the simulation artifact registry covering simulation methods, software, models, and implementations, the provenance framework with provenance analysis and integration, and the service bus with the SIMPL core and its generic data access operations IssueCommand, RetrieveData, WriteDataBack, and TransferData, connectors and data converters, as well as data services and data resources.)

4.4.2.2 Provenance Framework to Enrich Information on Data Management of a Workflow

The second missing feature is provided by the provenance framework depicted in Figure 4.7. This component offers mechanisms to enrich the metadata managed by the simulation artifact registry with additional and more detailed information about the data management of a workflow. This is especially relevant in case data services or another kind of abstraction support hide much of this information [RSM14a]. Thereby, different kinds of provenance information may be captured and managed by various provenance sources. Examples of such provenance sources are the individual components of the workflow system depicted in Figure 4.7, as well as external data resources or service execution environments such as cloud infrastructures [CVDK+12]. Even simulation software usually logs information about calculations and about parameterizations of numerical methods like the FEM. Since different provenance sources may manage their information in various kinds of structured or unstructured data formats, the provenance framework needs to integrate and aggregate this information [CVDK+12]. After having captured and integrated provenance information, it may serve as a basis for manifold analyses. Such a provenance analysis among different provenance sources may significantly enhance the reproducibility of a simulation and of its outcome [HTT09]. Furthermore, it may help to derive recommendations for sophisticated optimizations, e. g., increasing the efficiency of workflows [RM09].

As an example, assume that the quality of the result data of Pandas rapidly decreases during the calculation in the workflow depicted in Figure 4.5 [RBD+11]. A possible reason for such a data quality violation is an inaccurate parameterization of the FEM, e. g., a too coarse granularity of the finite element grid or too short numerical time step sizes [RTD+12]. A prerequisite to identify this reason is that the provenance framework not only integrates information about data, but also about such FEM-specific parameters, which are logged by the Pandas software. As optimization, the provenance analysis may recommend changing the FEM-specific parameters and re-executing affected parts of the calculation [RBKK12, SK12].

4.4.2.3 Generic Data Management Activities

The ETL workflow approach introduced in Section 1.3.2.1 facilitates generic data management activities supporting various data resources, formats, and operations. It is jointly provided by several components or plug-ins of the workflow system depicted in Figure 4.7. The SIMPL core offers a data access abstraction via generic data access operations. These operations allow for a unified access to arbitrary data resources and thereby support the most common use cases for data access in simulation workflows [RRS+11]. This covers the operations IssueCommand for data manipulation or data definition, RetrieveData to load external data into the workflow context, WriteDataBack to write data back to external resources, and TransferData to transfer data between several resources. Technical details of data access mechanisms, e. g., how to connect to a specific data resource, are covered by the connector plug-ins, which implement the generic access operations for concrete data resources. For the RetrieveData and WriteDataBack operations, data converters transform data between the format of a connector and the format a workflow requires. Additionally, the simulation artifact registry manages metadata to explicitly describe data resources. These metadata particularly define the mappings between the SIMPL core operations and the technical details of how to access individual data resources. For instance, they associate each resource with the proper connector and data converters.
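A possible Java rendering of these four operations is sketched below. The prototype actually exposes them as Web Service operations and BPEL extension activities, so the interface, its method signatures, and the use of plain strings for data and resource names are assumptions made for illustration only.

    // Hypothetical Java rendering of the four generic data access operations of
    // the SIMPL core; names of the operations follow the text, signatures do not
    // correspond to the actual Web Service interface.
    public interface SimplCoreSketch {

        /** Issue a data manipulation or data definition command (e.g., an SQL
         *  statement or a shell command) against the resource identified by a
         *  logical name. */
        boolean issueCommand(String dataResource, String command);

        /** Load external data into the workflow context, e.g., as a CSV string or
         *  an XML fragment produced by the responsible data converter. */
        String retrieveData(String dataResource, String retrievalCommand);

        /** Write data from the workflow context back to an external resource. */
        boolean writeDataBack(String dataResource, String targetContainer, String data);

        /** Transfer data directly between two resources without routing it
         *  through the workflow context. */
        boolean transferData(String sourceResource, String sourceContainer,
                             String targetResource, String targetContainer);
    }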

In addition, SIMPL extends both the workflow design tool and the workflow execution engine. The data management (DM) activities offered by these extensions correspond to the SIMPL core operations IssueCommand, RetrieveData, WriteDataBack, and TransferData. They allow for using the respective functionality directly within workflows [RRS+11]. Workflow developers, e. g., scientists, may assign an arbitrary data resource to such an activity and specify a command in the command language of this resource, e. g., an SQL statement or a shell command. During activity execution, this command is issued over the SIMPL core against the relevant data resource. This helps scientists to design their workflows without being forced to provide any technical details of data access mechanisms, as these details are already covered by the SIMPL core.

4.4.2.4 Pattern-based Abstraction Support for Scientists

Scientists, however, still have to specify many sophisticated data management operations in data management activities, e. g., in terms of complex SQL or XQuery statements [RSM14b]. To offer an abstraction support that is tailor-made for scientists, SIMPL further extends the workflow design tool by a component that supports pattern-based workflow design [RS13b, RS14, RSM14a, RSM14b]. This component comprises a customizable list of data management (DM) patterns as templates and building blocks for typical data provisioning tasks in simulation workflows, e. g., the patterns described in Section 3.3. Scientists only need to select a few patterns, insert them into their workflow models, and specify a small set of abstract parameter values for each pattern. Figure 1.2 on page 34 exemplifies this core idea in that it shows how such patterns may simplify the design of the bio-mechanical simulation workflow depicted in Figures 1.1 and 4.5. As discussed in Section 1.3.2.2, this pattern-based approach significantly reduces the number of workflow tasks that are visible to scientists. Furthermore, the patterns shown in Figure 1.2 are particularly meaningful to scientists conducting simulations, as they represent use cases these scientists are interested in. Thereby, all pattern parameters correspond to artifacts or concepts scientists already know from their simulation models or domain-specific methodology. Hence, the pattern-based approach makes simulation workflow design especially tailor-made for scientists. In fact, it completely removes their burden of specifying any low-level details of data provisioning or data exchange in their workflows [RSM14a, RSM14b].

The rule-based pattern transformer shown in Figure 4.7 manages an extensible set of rewrite rules that specify the transformation of patterns into executable workflow fragments [SKK+11, RSM14a]. The workflow fragments may then contain data service calls, data management activities, or even workflow-local data processing steps realizing the original patterns. The rewrite rules also need to map the abstract, simulation-specific pattern parameters onto more concrete implementation details. Therefore, they use metadata of the simulation artifact registry describing dependencies between all artifacts depicted in Figure 4.7. For instance, dependencies between simulation models and data resources may refer to concrete data containers that store the input and output data of the bio-mechanical simulation model used as pattern parameter in Figure 1.2.
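How a rewrite rule could map abstract pattern parameters onto a concrete data management activity can be outlined as follows. All type names, parameter keys, and the SQL command in this sketch are hypothetical (and assume a recent Java version with records); the actual transformer works on BPEL workflow fragments and the metadata schema of the simulation artifact registry.

    import java.util.Map;

    public class PatternRewriteSketch {

        // Simplified stand-in for an executable workflow fragment consisting of
        // a single data management activity.
        record DmActivity(String dataResource, String command) {}

        static DmActivity rewriteSimulationResultExport(Map<String, String> patternParams,
                                                        Map<String, String> registryMetadata) {
            // Abstract, simulation-specific parameter, e.g., "bio-mechanical bone model".
            String model = patternParams.get("simulationModel");

            // Registry metadata resolve the model to concrete implementation details.
            String resource  = registryMetadata.get(model + ".resultResource");  // e.g., "PandasDB"
            String container = registryMetadata.get(model + ".resultContainer"); // e.g., "pandas_output"

            // Resulting fragment: a DM activity embedding an illustrative SQL command.
            return new DmActivity(resource,
                    "COPY (SELECT * FROM " + container + ") TO '/tmp/result.csv' WITH (FORMAT csv)");
        }
    }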

4.4.3 Prototypical Implementation of the Extended Simulation Workflow System

The extended simulation workflow system depicted in Figure 4.7 has been prototypically implemented in the course of various student projects that have accompanied the work on this thesis [HSR+10, Pie11, Ari12, Pie12, Boh14, Rie14, Boh15, Kal15, vS15a, vS15b]. As discussed in Section 2.3, this prototype is based on the workflow language BPEL [OAS07]. It uses the workflow design tool Eclipse BPEL Designer7 version 0.8.0 and the workflow execution environment Apache Orchestration Director Engine (Apache ODE)8 version 1.3.5. Moreover, Apache Tomcat9 version 7.0 and Axis210 version 1.5.4 provide the core functionality of a service bus. All other components shown in Figure 4.7 have been implemented as plug-ins of the respective tools or as separate Java-based Web Services. In its current state, the prototype covers the extensions introduced in Sections 4.4.2.1, 4.4.2.3, and 4.4.2.4. Detailed results of evaluating these extensions are discussed in subsequent chapters. Thereby, Chapter 5 deals with the extensions proposed in Section 4.4.2.3, while Chapter 6 covers those depicted in Sections 4.4.2.1 and 4.4.2.4. The implementation and evaluation of the provenance framework introduced in Section 4.4.2.2 is subject to future work.

7 Eclipse BPEL Designer: http://www.eclipse.org/bpel/
8 Apache ODE: http://ode.apache.org/
9 Apache Tomcat: http://tomcat.apache.org/
10 Apache Axis2: http://axis.apache.org/axis2/java/core/

4.5 Summary and Future Work

While designing simulation workflows, scientists face challenges related to accessing and providing huge amounts of heterogeneous data. Most existing workflow systems or other simulation tools provide some means to realize corresponding data provisioning tasks. However, scientists rarely leverage existing data provisioning techniques due to their high diversity and complexity. As a further step to reduce this diversity and complexity, this chapter derives and assesses three generic and representative data provisioning concepts from the large set of available data provisioning techniques. This helps scientists to focus on the selection of the most appropriate data provisioning concept, instead of being overwhelmed by the multitude of low-level options. In addition, the resulting concepts are evaluated with respect to data management patterns that typically occur in simulation workflows and also by considering relevant non-functional issues. It turns out that using local data processing in simulation workflows is an option with remarkable drawbacks and thus should be avoided whenever possible. In contrast, data services and data management activities are more often the concepts of choice. In the end, the right decision between these two concepts depends on the concrete scenario. Therefore, the results of comparing the data provisioning concepts are used in this chapter to propose and evaluate a set of guidelines for simulation workflow design. These guidelines especially assist scientists in choosing appropriate data provisioning techniques for their workflows.

Another outcome of the discussion and of the guidelines is a set of essential features that current workflow systems do not support. For instance, these workflow systems lack an abstraction support for defining data provisioning tasks that is especially tailor-made for scientists. Hence, the last contribution introduced in this chapter is an extended simulation workflow system that supports all missing features in a holistic way. This extended workflow system and the results of its evaluation are detailed in the following chapters.

Future work should mainly encompass the prototypical implementation of the provenance framework of the extended simulation workflow system depicted in Figure 4.7. This prototype may then be evaluated, especially regarding how it enhances the reproducibility of simulations and of their outcome. Another focus of this future evaluation is to what extent such a provenance framework supports a holistic optimization of the data processing in workflows.

Chapter 5 A Framework for Accessing External Data in Workflows

The general data provisioning concept of data management activities illustrated in Section 4.1.1.2 offers a high functional flexibility regarding the support of data resources and data management operations (see also Table 4.1). However, this functional flexibility is often decreased in practice, i. e., when adopting this concept with real workflow systems. This is because these real workflow systems are usually restricted to certain kinds of data resources, e. g., SQL database systems [VSRM08], or even to proprietary data management operations [BAJ+10]. As discussed in Section 4.4.1.2, computer-based simulations however require a generic solution with data management activities supporting various kinds of data resources, data formats, and data management operations [RSM14a, RSM14b]. This is especially important for multi-scale simulations that are coupled across different scientific domains, since such simulations render the data environment even more heterogeneous. Generic data management activities and an associated framework that allows workflows to uniformly access external data correspond to the second major contribution of this thesis, as discussed in Section 1.3.2.1. Figure 5.1 indicates those components of the overall SIMPL framework shown in Figure 4.7 that offer this contribution. These components are briefly described in Section 4.4.2.3, and they are detailed and assessed in this chapter:

• Firstly, Section 5.1 discusses major related work. Thereby, the main focus of this discussion is why related approaches, systems, and technologies lack a generic and consolidated solution to design data access and data provisioning tasks in simulation workflows.

Figure 5.1: Architecture of the SIMPL framework shown in Figure 4.7. Gray-colored components are discussed in this chapter; cf. [RRS+11, RS13b, RS14]. (The figure repeats the components of Figure 4.7; the components detailed in this chapter are the data management activities of the workflow design tool and the workflow execution environment, the SIMPL core with its generic data access operations, the connectors and data converters, and the resource metadata of the simulation artifact registry.)

• In Section 5.2, necessary extensions to workflow languages are discussed that address this lack of generality for data provisioning. Plug-ins integrated into the workflow design tool and workflow execution engine shown in Figure 5.1 offer such extensions as data management (DM) activities. They allow for specifying directly within workflows any data management operation that is provided by the involved external data resources in terms of their command languages, e. g., SQL, XQuery, or shell languages (see Section 2.2).

• Section 5.3 discusses how to support uniform access to arbitrary external data resources. This is mainly provided by the SIMPL core shown in Figure 5.1 and by its generic data access operations covering the most common use cases for data access in simulation workflows. This component facilitates the design of data management activities in that it abstracts from any technical details of data access mechanisms. Furthermore, the simulation artifact registry manages metadata describing involved data resources. Regarding the uniform data access, these metadata specify the mappings between the generic operations offered by the SIMPL core and the concrete access mechanisms required by individual resources.

• The purpose of Section 5.4 is to illustrate how the contributions introduced in this chapter may be used to realize the data provisioning in concrete examples of simulation workflows. Therefore, it depicts the prototypical implementation of the gray-colored components shown in Figure 5.1. Furthermore, it discusses how to apply them to the running example of a bone simulation and to the corresponding workflows shown in Figures 4.5 and 4.6.

• Subsequently, Section 5.5 discusses the results of evaluating this prototype and its application to the bone simulation. Furthermore, it summarizes the most important implications and conclusions that have been derived when applying the prototype to other examples listed in Section 3.1. The major focus of this evaluation is whether all proposed extensions to workflow systems offer a generic solution to data access and data provisioning in simulation workflows. Nevertheless, it also considers the benefits and drawbacks of these extensions regarding the challenges discussed in Section 1.2.

Finally, Section 5.6 summarizes the major aspects and lists possible future work. This chapter is a revised version of a previous author publication [RRS+11].

5.1 Related Work

The main focus of this thesis is on data provisioning in simulation workflows. Hence, the following subsections discuss major related work with respect to (1) workflow systems and their native support for data provisioning, (2) simulation data management systems, and (3) solutions to data integration or data exchange. Afterwards, Section 5.1.4 summarizes the main conclusions of this discussion.

5.1.1 Data Provisioning Support of Workflow Systems

Only a small set of today’s workflow systems supports the data provisioning concept of data management activities at all. To make things worse, these workflow systems are frequently restricted to certain kinds of data resources or even to specific data management operations. For instance, the solutions to data management activities offered by the workflow products of IBM, Microsoft, and Oracle are restricted to accessing SQL database systems [VSRM08]. This by far does not cover all the heterogeneous kinds of data resources required by computer-based simulations, e. g., those depicted in Tables 2.1 and 3.1.

The so-called external variables of the workflow engine Apache ODE may likewise be used to access data from SQL database systems1. However, they even limit the set of feasible operations to tuple-oriented retrievals and manipulations of data. This makes it necessary to implement set-oriented data management operations by embedding tuple-oriented operations into some kind of loop-based workflow construct. Vrhovnik et al. prove that such a loop-based execution of several tuple-oriented operations shows weak performance compared to the native set-oriented capabilities offered by SQL database systems [VSS+07].

Kepler is one of the very few available workflow systems offering a solution to data management activities that may seamlessly access not only SQL database systems, but also file or operating systems and sensor networks [LAB+06, BAJ+10]. This at least constitutes a tolerably generic solution with respect to different kinds of data resources. However, the toolbox for workflow design supported by Kepler contains at least one proprietary type of workflow activity – so-called actors [BL05] – for each individual kind of relevant data resource. Moreover, some of the actors are even tailored to specific proprietary data formats or data management operations. On the one hand, this may in turn restrict the generality of Kepler with respect to specific formats and operations that are not supported so far. On the other hand, the design decision to rely on proprietary actors forces Kepler to offer a vast number of diverse actors that scientists may use in their workflows. As discussed in Section 1.2.1, the resulting diversity and complexity of the set of available actors usually overburdens scientists. This problem has even been admitted by some of the developers of Kepler [BAJ+10].

5.1.2 Simulation Data Management Systems

Industrial applications, e. g., FEA-based applications [CMPW07], often require the consecutive or even concurrent execution of several kinds of simulations and of multiple simulation runs. This is for instance necessary to predict the behavior of different variants of a product under specific circumstances, as mentioned in Section 2.2.2. Each individual run of each relevant kind of simulation uses huge data sets as input and usually generates even bigger data sets as output [HG05]. This issue has recently convinced companies in the area of manufacturing that dedicated systems are mandatory which are able to manage the vast amount of simulation data of all simulation runs. These dedicated systems are commonly called Simulation Data Management (SDM) systems [BBF+09]. An example is the ANSYS Engineering Knowledge Manager (EKM)2.

1Apache ODE external variables: http://ode.apache.org/external-variables.html

Figure 5.2: Simulation Data Management (SDM) systems in the context of Product Data Management (PDM) systems and of tools for Computer-aided Design (CAD) and Computer-aided Engineering (CAE); cf. [BBF+09, vS16].

Figure 5.2 depicts how to embed SDM systems into the context of traditional Product Data Management (PDM) systems and authoring tools for Computer-aided Design (CAD) and Computer-aided Engineering (CAE) [BBF+09, vS16]. The purpose of PDM systems is to manage data and models of different variants of several products [Sta15]. For instance, they may manage multiple sophisticated CAD product models that have been designed using common CAD tools. CAE tools in turn do not focus on designing product models, but on investigating the characteristics of different variants of product designs [RS13a]. Such CAE tools increasingly rely on computer-based simulations to support these investigations. Hence and as shown in Figure 5.2, they likewise use SDM systems to manage the vast amount of data being used and generated during simulation processes.

The major purpose of SDM systems is to offer a systematic storage, management, archiving, and versioning of simulation data [BBF+09, vS16]. Another aspect is to facilitate the search for and re-use of such data in different simulation runs. Furthermore, SDM systems help to ensure the reproducibility of high-level simulation processes that are governed by CAE tools. This is achieved by associating individual steps of simulation processes with their respective input and output data. All this is usually based on a framework for comprehensive metadata management to further describe the simulation data. This includes default metadata, e. g., metadata that associate data with the steps of simulation processes. Furthermore, it encompasses possibilities to define domain-specific metadata.

2ANSYS EKM: http://www.ansys.com/Products/Platform/ANSYS-EKM

SDM systems solely rely on a document-centric management of data [BBF+09, vS16]. Hence, they only support the different file formats frequently used by computer-based simulations as summarized in Table 3.1. This is a problem that may inhibit the generic usage of SDM systems in case simulations rely on other data resources as well. For instance, they cannot be used to manage the data stored in the SQL database system that is employed by the running example of a bone simulation (see Figures 4.5 and 4.6).

Most available SDM systems and CAE tools claim that they offer a comprehensive set of operations to realize the data provisioning for simulations. However, a detailed analysis of such systems and tools revealed that the usual set of offered operations is rather limited. This set often only consists of basic data transformations that frequently occur in a large set of prevalent industrial simulation applications. For instance, it mainly includes common conversions of default data formats for a CAD model into other default formats for a description of a finite element grid. Altogether, this does not constitute a sufficiently generic solution, which often prohibits the adoption of these systems and tools in multi-scale simulations that are coupled across different scientific domains. Such coupled simulations usually require more complex and domain-specific data transformations than those offered by available SDM systems or CAE tools.

5.1.3 Solutions to Data Integration or Data Exchange

Federated information systems provide applications with a uniform global data schema that integrates diverse local data structures of several data resources [SL90, BKLW99]. Moreover, federated systems offer a uniform query language that applications may use to access the integrated data. The simulation applications finally accessing relevant data are usually diverse numerical simulation tools. Individual simulation tools frequently expect their input data in different formats or even at different levels of granularity. This heterogeneity of both the application and the data environment is increased even further in case of multi-scale simulations that are coupled across different scientific domains. Hence, the approach of federated systems to provide a single uniform global data schema for all kinds of simulation applications is not feasible. In addition, different simulation tools rely on different mechanisms to read their input data and to write their output. So, it is likewise impractical to offer only a single uniform query language.

Table 5.1: High-level comparison of major related work enabling a peer-to-peer-like data exchange between several data resources and/or applications. The compared criteria are the range of supported data resources and data formats, the power to specify data transformation operations, and the decisive drawback why an approach is usually not adopted for computer-based simulations. The respective decisive drawbacks are:
• Schema mappings: not applicable to the text or binary files of computer-based simulations.
• Ontology-based approaches: only applicable to scientific domains where ontologies already exist.
• ETL technology: the multiplicity and diversity of ETL operators overwhelming scientists.

A more flexible peer-to-peer-like data exchange between individual data resources and/or applications better matches the scenario of data provisioning for simulations than the approach to data integration offered by federated systems [ABLM14]. Table 5.1 summarizes the main results of comparing major related work in the area of data exchange, i. e., schema mappings, ontology-based approaches, and ETL technology. The table firstly indicates the supported range of the various data resources or data formats used in computer-based simulations (see Tables 2.1 and 3.1). The next aspect is the expressive power each individual approach offers to specify data management or data transformation operations. Furthermore, Table 5.1 illustrates, for each approach, the most decisive drawback why it is usually not adopted for computer-based simulations.

Schema mappings are specifications of relationships between a source data schema and a target data schema [Kol05, ABLM14]. Usually, they correspond to expressions of first-order predicate logic and some algebraic extensions that describe sophisticated structural dependencies between both schemata [JPT14]. This logical and algebraic foundation leads to a high expressive power with respect to the specification of different data transformation operations. Schema mappings may also be converted into rules or queries that finally implement these data transformation operations, e. g., based on the language SchemaSQL [LSS01]. Approaches to schema matching even enable the opposite direction, i. e., deriving a set of schema mappings given specific source and target schemata [RB01].

Figure 5.3: Ontology-based approaches to data integration or data exchange. The left side (a) shows approaches relying on a single global ontology with mappings between the ontology and the data, while approaches based on multiple local ontologies with inter-ontology mappings are depicted on the right side (b); cf. [WVV+01].

However, schema mappings require the underlying data to have a well-defined structure that is somehow described explicitly – the schema. Regarding the data formats used most commonly in computer-based simulations (see Table 3.1), SQL or XML databases show this property. To some extent, this also holds for XML documents and CSV-based files. Nevertheless, schema mappings are not applicable to proprietary, unstructured text or binary files. Such text or binary files are however the most common kinds of data formats used in simulations, as discussed in Section 3.2. Hence, schema mappings do not offer a generic solution that may be seamlessly used in any simulation example.

Other solutions make use of ontologies that describe the intended meaning of and semantic relationships between data of different data resources [She99, WVV+01]. Some approaches rely on a single ontology that provides a global and uniform view on all relevant data resources, as depicted on the left side of Figure 5.3. As described for federated systems above, such a single uniform view on data is not sufficient to support all kinds of simulation applications with their heterogeneous requirements regarding data formats. In contrast and as shown on the right side of Figure 5.3, a more suitable peer-to-peer-like approach to data exchange employs multiple ontologies. Thereby, each data resource is individually described via a local, resource-specific ontology. Mappings between several data resources are then specified via mappings between the concepts of involved local ontologies – so-called inter-ontology mappings [WVV+01]. This approach is also used in the scientific workflow domain to describe semantic relationships between input and output data of several workflow tasks [LAG03, MCD+05].

Ontology-based approaches support a much bigger range of data resources and data formats than schema mappings, as depicted in Table 5.1. In fact, they may even be used to extract data from unstructured text files and from many other kinds of digital media [She99, WD10, KFPI11, ARR13]. The specifications of inter-ontology mappings are usually based on description logic [CGP00, WVV+01]. Schema mappings typically employ first-order predicate logic and are thus more expressive than inter-ontology mappings for describing structural data dependencies. Nevertheless, ontology-based approaches add native support to resolve the semantic heterogeneity between several data resources. In summary, both approaches have their strengths and weaknesses in respectively different scenarios. All in all, they thus have a very similar expressive power with respect to the specification of data transformation operations. A decisive drawback of ontology-based approaches is however that they often entail a high initial effort to develop ontologies describing the scientific domains of interest [WVV+01]. Scientists conducting simulations would usually not accept such a high effort. Ontology-based approaches are hence restricted to those domains where corresponding ontologies already exist. In the large area of computer-based simulations and its multiple scientific domains, ontologies are however used rather seldom and only in specific kinds of applications [GCMS12]. So, they do not constitute a completely generic solution.

ETL technology encompasses various tools and frameworks that enable the design and execution of processes or pipelines for data preparation [KRRT98, LN07]. Examples are the ETL tools offered by IBM3, Javlin4, Pentaho5, and Talend6, as well as the framework Apache nifi7. As depicted in Table 5.1, these tools and frameworks offer the most generic solution to data provisioning and data exchange. Most of them support virtually all data resources and data formats being relevant for computer-based simulations as summarized in Tables 2.1 and 3.1. Furthermore, they typically support a wide range of data management or data transformation operations that are essential for data provisioning in simulation workflows. Some ETL tools even increase their expressive power by additionally incorporating approaches to schema mappings or ontology-based data integration or data exchange [DHW+08, SSS09].

3InfoSphere Data Stage: http://www-03.ibm.com/software/products/en/ibminfodata
4Javlin CloverETL: http://www.cloveretl.com/
5Pentaho: http://www.pentaho.com/
6Talend: https://www.talend.com/
7Apache nifi: https://nifi.apache.org/

However, the flexible support of manifold data resources, data formats, and operations entails a decisive drawback that often prevents the adoption of ETL technology for computer-based simulations. ETL tools and frameworks commonly share a design decision and the corresponding negative implication with the scientific workflow system Kepler described in Section 5.1.1. They likewise offer a vast amount of diverse ETL operators that may be arranged in ETL processes or pipelines. Most of these ETL operators are special solutions to certain kinds of data formats and/or data transformation operations. The resulting multiplicity and diversity of available ETL operators again overwhelms scientists (see also Section 1.2.1). Furthermore, these scientists often have difficulties with understanding the ETL-specific meanings of individual ETL operators and with specifying corresponding low-level data transformations (see Section 1.2.2). These issues usually induce scientists not to leverage ETL technology at all.

5.1.4 Main Conclusions

Among the available workflow systems discussed in Section 5.1.1, Kepler is the only one offering at least a tolerably generic solution regarding different kinds of data resources. However, Kepler usually overwhelms scientists with a multiplicity of diverse and proprietary actors that may be used to design the data provisioning in workflows. As illustrated in Section 5.1.2, SDM systems and CAE tools do not provide a generic solution at all. In fact, they are restricted to document-centric and file-based data, as well as to only a small set of data transformation operations frequently occurring in prevalent industrial applications. Each individual solution to data integration or data exchange considered in Section 5.1.3 has a decisive drawback usually preventing its adoption for computer-based simulations (see Table 5.1). For instance, ETL technology offers a multiplicity of diverse ETL operators, which may again overwhelm scientists when designing the data provisioning in their workflows. Nevertheless, ETL technology promises to be the most generic solution to data provisioning in simulation workflows among related work. For this reason and as proposed by Maier et al. [MMLW05], the SIMPL framework combines the general ideas and aspects of ETL technology with conventional workflow technology [RRS+11]. The following sections discuss how the resulting ETL workflow approach offers a generic solution to data provisioning in simulation workflows that does not entail the decisive drawback of ETL technology summarized in Table 5.1.

Figure 5.4: Design principle of the proposed data management activities exemplified by a RetrieveData activity and by a SQL SELECT statement. The activity is associated with a data resource reference variable (#dataResource#), embeds the data management command "SELECT * FROM #tableReference# WHERE Column1 = #parameter#" containing a data container reference variable and a command parameter variable as placeholders, and stores its result in the data set variable #target#.

5.2 Generic Workflow Language Extensions for Data Management

This thesis proposes a set of extensions to common workflow languages offering a systematic and generic solution to the data provisioning concept of data management activities summarized in Section 4.1.1.2. Section 5.2.1 discusses the basic aspects of modeling such generic data management activities in workflows. Section 5.2.2 then details the definitions of individual types of these activities.

5.2.1 Basic Modeling Aspects

The main reason why ETL technology overwhelms scientists with a multiplicity of diverse ETL operators is that each operator is usually tailored to certain kinds of data resources or even data management operations. The design principle of the generic data management activities proposed by this thesis is completely different, i. e., these activities work independently of the specifics of data resources or operations. In other words, each of these activities allows for specifying various data management operations for any kind of data resource [RRS+11]. Figure 5.4 exemplifies this design principle of generic data management activities via a RetrieveData activity that loads external data into the workflow context.

Each data management activity is associated with a data resource reference variable. Such a variable contains a logical resource descriptor that determines the external data resource finally executing the relevant data management operation. A logical resource descriptor is either a logical name or a document describing some functional and/or non-functional requirements for a data resource. The metadata managed by the simulation artifact registry of SIMPL uniquely associates a logical name with a particular data resource (see Section 5.3.3). In contrast, a requirements description indicates the use case of dynamically selecting and binding a data resource at runtime of workflows [GSK+11, RRS+11].

Most of the proposed activities embed a data management command that specifies the relevant data management operation. During activity execution, the workflow engine issues this command over the SIMPL core against the involved data resource, which then executes the command. Hence, the command must be a valid expression according to the command language offered by this data resource. For instance, the SQL SELECT statement shown in Figure 5.4 requires the data resource reference variable to point to a corresponding SQL database system.

Such an embedded data management command may contain various placeholders that are replaced by certain values during runtime of workflows. To demarcate a placeholder from the rest of the command, it is surrounded by hash marks (#). The first main type of such placeholders is a data container reference variable. This kind of variable points to a particular data container, e. g., to the database table that is used in the SQL statement shown in Figure 5.4. The content of a data container reference variable may be a resource-specific identifier of the data container, i. e., an identifier the involved data resource is able to interpret directly [HSR+10, Pie11]. An example is the combination of a schema name and a table name for a SQL database table. The other option to point to a data container is again a logical name that is finally mapped to a resource-specific identifier by the simulation artifact registry (see Section 5.3.3). The second main type of placeholders is a command parameter variable. This is an ordinary workflow variable, such as a string or an integer variable, that is used as a simple parameter in a data management command. For instance, the SQL statement shown in Figure 5.4 uses a parameter variable in the selection predicate to compare the variable value with the values of a specific table column. Especially for the RetrieveData activity, data set variables act as targets to store the retrieved data within the workflow context.
The type definitions and workflow-internal data structures of these data set variables must account for the specifics of relevant kinds of external data resources and data formats. For instance, variables of a BPEL workflow may rely on a generic XML RowSet structure to store any tabular data, e. g., coming from a SQL database or from CSV-based files [VSRM08, Pie11]. Proprietary text files may however require application-specific XML schema definitions to store their contents in workflows.
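To make the placeholder mechanism more tangible, the following sketch indicates how a workflow engine might resolve the placeholders of the data management command from Figure 5.4 before issuing it against the referenced resource. It is only a minimal illustration of the substitution step; the class name, the variable map, and the example table name are chosen freely for this sketch and are not part of the SIMPL prototype.

    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Minimal sketch: replace #placeholder# tokens in a data management command
    // with the current values of workflow variables (all names are hypothetical).
    public class CommandResolver {

        private static final Pattern PLACEHOLDER = Pattern.compile("#([A-Za-z0-9_]+)#");

        public static String resolve(String commandTemplate, Map<String, String> workflowVariables) {
            Matcher m = PLACEHOLDER.matcher(commandTemplate);
            StringBuilder resolved = new StringBuilder();
            while (m.find()) {
                String value = workflowVariables.get(m.group(1));
                if (value == null) {
                    throw new IllegalStateException("Unbound placeholder: " + m.group(1));
                }
                m.appendReplacement(resolved, Matcher.quoteReplacement(value));
            }
            m.appendTail(resolved);
            return resolved.toString();
        }

        public static void main(String[] args) {
            String template = "SELECT * FROM #tableReference# WHERE Column1 = #parameter#";
            String command = resolve(template,
                    Map.of("tableReference", "bone_sim.pandas_output", "parameter", "42"));
            System.out.println(command);
            // prints: SELECT * FROM bone_sim.pandas_output WHERE Column1 = 42
        }
    }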

Table 5.2: Characteristics of the proposed data management activities.

IssueCommand
  Mandatory parameters: data resource reference variable; data management command
  Returned result: notification of success / failure

RetrieveData
  Mandatory parameters: data resource reference variable; data management command / data container reference variable; data set variable
  Returned result: result data / notification of failure

WriteDataBack
  Mandatory parameters: data set variable; data resource reference variable; data container reference variable
  Returned result: notification of success / failure

TransferData
  Mandatory parameters: data resource reference variable (source); data management command / data container reference variable (source); data resource reference variable (target); data container reference variable (target)
  Returned result: notification of success / failure

5.2.2 Types of Data Management Activities

The above-discussed design principle of data management activities working independently of the specifics of data resources and operations entails another major benefit. It allows for providing scientists with a small set of systematic and reasonable types of data management activities. In fact, this thesis proposes four types of such activities, where each type covers a prevalent use case for data access in simulation workflows. Hence, scientists are not overwhelmed with a vast amount of diverse workflow building blocks, which is an important issue according to the challenge discussed in Section 1.2.1. This constitutes a significant advantage over ETL technology (and also over the scientific workflow system Kepler), which does not meet this challenge adequately. Table 5.2 summarizes the main characteristics of the data management activities proposed by this thesis. This includes (1) the mandatory parameters of each activity and (2) the result an activity returns to the workflow engine in case of a successful or faulty execution of the relevant data management operation.

The major purpose of the IssueCommand activity is data manipulation or data definition. Its first parameter is a data resource reference variable that points to the external data resource whose data shall be manipulated or where data structures need to be defined. The concrete operation for data manipulation or data definition is specified via a data management command. After the external data resource has successfully executed this command, the IssueCommand activity returns a notification of success to the workflow engine. The workflow engine may then continue workflow execution according to the specified control flow or dataflow. In case of a faulty command execution, a notification of failure signals the workflow engine to trigger fault handling mechanisms. Note that similar procedures are also applied to the other types of data management activities.

The RetrieveData activity covers the use case of loading some data from an external data resource into the workflow context. The external data resource is again determined by a data resource reference variable. The data to be retrieved may be specified via two different options. The first option is a data management command that is executed by the external data resource. One restriction is that this command must produce data, e. g., only SELECT statements are valid in case of a SQL database system. The second option is to use a data container reference variable. This means that the whole content of the corresponding external data container, e. g., of a database table or of a file, is loaded into the workflow context. Finally, the RetrieveData activity stores the produced result data into the data set variable that is specified as the last parameter of the activity.

The WriteDataBack activity works the opposite way round, i. e., it writes data from the workflow context to an external data resource. More precisely, it writes the content of a data set variable into a particular data container, e. g., into a database table or a file, of the relevant external data resource. This scenario is reflected by the three mandatory parameters of this activity listed in Table 5.2.

The use cases reflected by the TransferData activity are data transfers between several external data resources, which are essential parts of the Data Transfer and Transformation Pattern depicted in Figure 3.4.
So, it is the first data management activity requiring two data resource reference variables as parameters, i. e., one for the source and one for the target of the data transfer. The data to be transferred may be specified via the two options that are also valid for the RetrieveData activity. The first option is a command that is usually issued against the data source to first produce the relevant data before it is transferred afterwards. The second option is a data container reference variable, which indicates that a whole data container, e. g., a whole file, shall be transferred from the source to the target. Finally, another data container reference variable specifies the concrete container in which to store the transferred data within the target resource.
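As a compact illustration of the parameter structure summarized in Table 5.2, the four activity types might be modeled as the following schematic Java types. This is only a sketch for orientation; the actual SIMPL plug-ins extend BPEL activities rather than defining plain Java types, and all type and field names are chosen freely for this example.

    // Schematic sketch of the mandatory parameters per activity type (cf. Table 5.2);
    // the field names mirror the parameters, not the real plug-in implementation.
    record IssueCommandActivity(String dataResourceRef, String command) {}

    record RetrieveDataActivity(String dataResourceRef,
                                String commandOrContainerRef,   // command or data container reference
                                String dataSetVariable) {}      // target variable in the workflow context

    record WriteDataBackActivity(String dataSetVariable,
                                 String dataResourceRef,
                                 String dataContainerRef) {}

    record TransferDataActivity(String sourceResourceRef,
                                String sourceCommandOrContainerRef,
                                String targetResourceRef,
                                String targetContainerRef) {}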

Figure 5.5: Interaction between relevant components of the architecture shown in Figure 5.1 during data resource access: (1) a data management activity calls the relevant generic data access operation of the SIMPL core, (2) the SIMPL core queries the simulation artifact registry with the logical resource descriptor, (3) the registry returns the necessary access information, (4) the SIMPL core accesses the external data resource via the proper connector and data converter, (5) the resource returns a notification or the result data, and (6) the SIMPL core forwards this notification or result data to the activity; cf. [RRS+11].

5.3 Generic and Uniform Data Resource Access

The generic data management activities introduced in Section 5.2 make use of a uniform access to arbitrary data resources that is provided by the SIMPL core shown in Figure 5.1. Figure 5.5 depicts how the data management (DM) activity plug-in of the workflow execution engine, the simulation artifact registry, and the SIMPL core interact during data resource access. A data management activity firstly calls its correspondent access operation of the SIMPL core, i. e., the IssueCommand activity calls the IssueCommand operation and so on. Thereby, the activity forwards certain values of its parameters summarized in Table 5.2, e. g., a logical resource descriptor contained in a data resource reference variable. In the second step shown in Figure 5.5, the SIMPL core queries the simulation artifact registry mainly with this logical resource descriptor. The registry uses its metadata describing data resources to map the resource descriptor to all information the SIMPL core needs to access the relevant resource. This information, amongst others, consists of a URI of the resource and of identifiers of a proper connector and data converter. The registry sends all information back to the SIMPL core (step 3), which then uses it to access the data resource and to carry out the relevant data management operation (step 4). Depending on the used access operation, this data resource returns a notification of success / failure or the result data in step 5 (see also Table 5.2). The SIMPL core finally forwards this notification or result data to the data management activity in step 6.

Section 5.3.1 details the generic data access operations of the SIMPL core and how they support the individual data management activities. Afterwards, Section 5.3.2 discusses how to implement these generic access operations for different data resources via connectors and data converters. The focus of Section 5.3.3 is on the metadata of the simulation artifact registry mapping logical resource descriptors to concrete information that describes how to access a resource.
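The six steps of Figure 5.5 can be condensed into the following sketch of the RetrieveData path through the SIMPL core. All interfaces and method names are hypothetical and only illustrate the order of the interactions, not the actual Web Service interfaces of the prototype.

    // Sketch of the RetrieveData path through the SIMPL core (cf. Figure 5.5).
    // All interfaces and method names are hypothetical.
    interface ArtifactRegistry {
        AccessInfo lookup(String logicalResourceDescriptor);        // steps 2 and 3
    }
    interface Connector {
        Object retrieve(AccessInfo info, String command);           // steps 4 and 5
    }
    interface DataConverter {
        Object toWorkflowFormat(Object resourceSpecificData, String workflowType);
    }
    record AccessInfo(String endpoint, Connector connector, DataConverter converter) {}

    class SimplCore {
        private final ArtifactRegistry registry;
        SimplCore(ArtifactRegistry registry) { this.registry = registry; }

        // Step 1: a RetrieveData activity calls this operation.
        Object retrieveData(String resourceDescriptor, String command, String workflowType) {
            AccessInfo info = registry.lookup(resourceDescriptor);          // steps 2-3
            Object raw = info.connector().retrieve(info, command);          // steps 4-5
            return info.converter().toWorkflowFormat(raw, workflowType);    // step 6: result back to the activity
        }
    }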

5.3.1 Generic Data Access Operations

The data access operations offered by the SIMPL core share the basic design principle of the data management activities introduced in Section 5.2. So, the specifications of these SIMPL core operations are especially independent of any characteristics of underlying data resources. This facilitates generic and lightweight SIMPL core operations that allow for accessing arbitrary external data resources. Table 5.3 illustrates the parameters the respective SIMPL core operations expect as input from their correspondent data management activities and their parameters shown in Table 5.2.8

To identify the data resource(s) to be accessed, each SIMPL core operation expects either one or two logical resource descriptors as input. Such a logical resource descriptor corresponds to the content of the relevant data resource reference variable of an associated data management activity (see Section 5.2.1). The data container references mentioned in Table 5.3 are likewise the contents of the corresponding data container reference variables depicted in Table 5.2. So, they are either a logical name or a resource-specific identifier of the container. Data management commands are, in contrast, sent from a data management activity to its SIMPL core operation as they are. The RetrieveData operation moreover expects a description of the workflow-internal data type of the data set variable into which the RetrieveData activity finally stores the data. This is necessary to identify a proper data converter that is able to convert the format of the relevant external data resource or data container into the workflow-internal data type (see Section 5.3.3).

8Note that the results the SIMPL core operations return to data management activities are not shown in Table 5.3, because they are exactly the same as those depicted in Table 5.2.

Table 5.3: Parameters the SIMPL core operations expect from correspondent data management activities; see Table 5.2 for the parameters of the activities.

IssueCommand
  Expected input parameters: logical resource descriptor; data management command

RetrieveData
  Expected input parameters: logical resource descriptor; data management command / data container reference; workflow-internal data type of data set variable

WriteDataBack
  Expected input parameters: content and data type of data set variable; logical resource descriptor; data container reference

TransferData
  Expected input parameters: logical resource descriptor (source); data management command / data container reference (source); logical resource descriptor (target); data container reference (target)

The WriteDataBack operation not only gets the workflow-internal data type of the relevant data set variable as input, but also the whole content of this variable. It then stores this variable content into the specified external data container.

5.3.2 Implementation of the SIMPL Core Operations

Different kinds of data resources often rely on different access mechanisms. Hence, the generic data access operations of the SIMPL core have to be implemented in likewise different ways for concrete data resources or types of resources. As shown in Figure 5.5, connectors provide this implementation and account for the specifics of relevant data resources. For instance, one connector may support all kinds of file or operating systems that offer a shell interface based on SSH [Pie12]. Pietranek shows how to realize one connector that supports any database system relying on the Java Database Connectivity (JDBC)9 API as access mechanism, i. e., most available SQL database systems [Pie11]. Different XML database systems or gateways to sensor networks usually rely on proprietary access mechanisms or APIs and thus require likewise specific connectors [Pie11].

9JDBC API: http://www.oracle.com/technetwork/java/javase/jdbc/index.html
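To indicate how such a connector might look for JDBC-based database systems, the following sketch issues an arbitrary data management command via plain JDBC. It is a simplified illustration under the assumption that the registry has already supplied a JDBC URL and credentials; the connector described by Pietranek [Pie11] is more elaborate, and the class and method names used here are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Simplified sketch of a JDBC-based connector (hypothetical class and method names).
    public class JdbcConnector {

        // Corresponds to the IssueCommand operation: execute a command without returning data.
        public void issueCommand(String jdbcUrl, String user, String password, String command)
                throws SQLException {
            try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
                 Statement stmt = con.createStatement()) {
                stmt.execute(command);
            }
        }

        // Corresponds to the RetrieveData operation: execute a data-producing command,
        // e.g., a SELECT statement; a data converter would then turn the ResultSet into
        // a workflow-internal format such as an XML RowSet.
        public ResultSet retrieveData(Connection con, String selectCommand) throws SQLException {
            Statement stmt = con.createStatement();
            return stmt.executeQuery(selectCommand);
        }
    }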

Some data resources do not support each individual SIMPL core operation. For instance, gateways to sensor networks usually do not allow applications to write data back, as they are only able to deliver data (see Section 2.2.4). In such a case, the corresponding connectors do not provide these operations either. Furthermore, scientists then also must not use the associated data management activities in their workflows. For that purpose, the workflow design tool needs access to some information provided by the simulation artifact registry describing which operations are valid for which (types of) data resources.

The SIMPL core additionally provides data converters that are especially important for the RetrieveData and WriteDataBack operations. They convert data between a format that is specific for external data resources or their connectors and a format that is tailored to store data within the workflow. For instance, one data converter transforms data between the JDBC result set format used for SQL database systems and the XML RowSet format mentioned in Section 5.2.1 [HSR+10]. Other converters support certain kinds of file formats or the formats used in XML database systems or gateways to sensor networks [Pie11].

The TransferData operation features the highest operational complexity and thus entails the biggest effort to implement it in connectors and data converters. To reduce this effort, the implementation of this operation in the SIMPL core also uses the functionality of the other SIMPL core operations for parts of a data transfer. The IssueCommand, RetrieveData, or WriteDataBack operations facilitate (parts of) data transfers in various ways, as depicted in Figure 5.6. Firstly, an IssueCommand operation may initiate a direct data transfer between the resources. Figure 5.6a shows a variant where the TransferData operation calls an IssueCommand operation accessing the data source. This data source then produces the relevant data and ships it to the data sink. In the variant shown in Figure 5.6b, the IssueCommand operation accesses the data sink. So, this data sink is the initiator of the data transfer and fetches the data from the data source. In both cases, the TransferData operation may need to adjust the command it gets as input before forwarding it to the IssueCommand operation. This may be necessary so that the final command that is issued against either the data source or the data sink describes a valid data transfer. For instance, assume that the TransferData operation gets a SQL SELECT statement as input and that the result of this statement shall be copied from the SQL database to a local file. Here, the TransferData operation needs to embed this SELECT statement into a proper EXPORT statement before forwarding it to the IssueCommand operation.

Figure 5.6: Major variants of implementing the TransferData operation based on other SIMPL core operations: (a) IssueCommand operation accessing the data source, (b) IssueCommand operation accessing the data sink, (c) combination of a RetrieveData and a WriteDataBack operation. Solid lines represent a function call or function shipping, e. g., by issuing a command. Dashed lines represent a dataflow or data shipping between resources or operations.

The third way to realize the TransferData operation is a combination of a RetrieveData and a WriteDataBack operation, as shown in Figure 5.6c. Here, the TransferData operation firstly sends the command producing the relevant data to a RetrieveData operation. This operation then issues the command against the data source in order to temporarily load the data within its operation context. Note that, for efficiency reasons, the data is thereby not stored in a data set variable of a workflow. Instead, it is stored in the object space of the SIMPL core or in another data cache of the workflow system. Afterwards, the TransferData operation instructs a WriteDataBack operation to access the cached data and to store it in the proper data container of the data sink.
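A minimal sketch of variant (c) in Figure 5.6 could look as follows, assuming hypothetical operation signatures and a simple in-memory cache; the real implementation stores the intermediate data in the object space of the SIMPL core or in another data cache of the workflow system.

    // Sketch of the TransferData variant from Figure 5.6c (hypothetical signatures).
    class TransferDataVariantC {

        private Object cache;  // stands in for the object space / data cache of the workflow system

        void transfer(String sourceDescriptor, String command,
                      String targetDescriptor, String targetContainer) {
            // First, load the data produced by the command from the source resource
            // into the cache, bypassing any data set variable of the workflow.
            cache = retrieveData(sourceDescriptor, command);
            // Then, write the cached data into the target container of the data sink.
            writeDataBack(cache, targetDescriptor, targetContainer);
        }

        // Placeholders for the other SIMPL core operations (see Section 5.3.1).
        Object retrieveData(String resourceDescriptor, String command) { /* ... */ return null; }
        void writeDataBack(Object data, String resourceDescriptor, String container) { /* ... */ }
    }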

Not every data resource supports all variants to realize the TransferData operation shown in Figure 5.6. For instance, the variants using an IssueCommand operation require the data source and/or data sink to support the relevant kind of direct transfer (see Section 4.2.2.1). Furthermore, the proper choice depends on non-functional aspects as well, e. g., on efficiency issues. Therefore, the SIMPL core queries the simulation artifact registry and its metadata about data resources for the best available variant for a given pair of data resources.

5.3.3 Metadata for Mappings to Access Mechanisms

The simulation artifact registry further facilitates designing data management activities in that it manages metadata describing all available data resources that may be assigned to the activities. Furthermore and as shown in Figure 5.5, it provides the SIMPL core with all information that is necessary to access the data resources. This also includes information about associated connectors and data converters, as well as about specific data containers managed by the resources. Figure 5.7 offers a high-level view on the most important metadata.

A logical resource name is unique for each data resource and acts as its major identifier within the SIMPL framework. As discussed in Section 5.2.1, it may be used as logical resource descriptor in data resource reference variables of data management activities. As an alternative, such a logical resource descriptor in workflows may be a requirements description indicating the use case of selecting a data resource at runtime of workflows. This is where the properties description comes into play, which describes functional and non-functional characteristics of a data resource. Such properties descriptions are matched against the requirements description during runtime in order to identify a proper resource. In a similar way, this facilitates the choice among the different variants of realizing the TransferData operation depicted in Figure 5.6. Additional important information about a data resource is an endpoint to access the resource and a set of security credentials, e. g., user names and passwords or authentication keys.

The simulation artifact registry may be used to associate a data resource with its most important data containers, e. g., containers that are frequently accessed by several workflows. Thereby, a registered data container is again associated with a logical name that may be used as reference in a workflow. The simulation artifact registry maps this logical name to a local, resource-specific container identifier the involved data resource is able to interpret. For instance, operating systems assume a combination of a file name and a directory name as resource-specific container identifier. In addition, the metadata about a data container includes a description of its data format, e. g., of a particular file format.

Figure 5.7: High-level view on the metadata describing all information that is necessary to access data resources, i. e., data resources with their logical resource names, properties descriptions, endpoints, and security credentials, associated data containers with logical container names, local container identifiers, and container data formats, as well as connectors and data converters with their respective data formats; cf. [RRS+11].

As described in Section 5.3.2, one connector may implement the SIMPL core operations for one or even for multiple data resources. Furthermore, there may be several data converters transforming data between different external or connector-specific data formats and different workflow-internal formats. The simulation artifact registry associates a connector with a data format for a converter, i. e., a format in which the connector delivers output data to a converter and expects input data from it. A data converter is analogously associated with a data format for a connector. Thereby, only those connectors and data converters are assigned to each other that rely on the same data format. In addition, a data converter is associated with the workflow-internal data format it supports. As illustrated in Section 5.3.1, the RetrieveData and WriteDataBack operations also query the simulation artifact registry with a specification of the required workflow-internal data format. The registry then uses all individual descriptions of data formats being associated with connectors, data converters, and also data containers (see Figure 5.7) to select a data converter that offers the proper format conversion.
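The metadata model of Figure 5.7 may be summarized by the following schematic type definitions. They only capture the associations described above; attribute and type names are chosen for illustration and do not reflect the actual schema of the prototype's registry database.

    import java.util.List;

    // Schematic sketch of the registry metadata from Figure 5.7 (hypothetical names).
    record DataResource(String logicalResourceName,      // unique identifier within SIMPL
                        String propertiesDescription,    // functional / non-functional characteristics
                        String endpoint,
                        String securityCredentials,
                        List<DataContainer> containers,
                        List<String> connectorIds) {}

    record DataContainer(String logicalContainerName,
                         String localContainerIdentifier,  // e.g., schema + table name, or directory + file name
                         String containerDataFormat) {}

    record ConnectorMetadata(String id,
                             String dataFormatForConverter) {}   // format exchanged with compatible converters

    record ConverterMetadata(String id,
                             String dataFormatForConnector,
                             String workflowInternalFormat) {}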

5.4 Prototype and its Application to the Bone Simulation

The prototype of the system components highlighted via gray color in Figure 5.1 has been developed in the course of various student projects that have accompanied the work on this thesis [HSR+10, Pie11, Pie12]. As discussed in Section 4.4.3, it is based on the workflow language BPEL [OAS07] and on a BPEL-based system environment. The plug-in of the workflow design tool shown in Figure 5.1 extends the Eclipse BPEL Designer version 0.8.0 by all activity types summarized in Table 5.2. The plug-in of the workflow execution environment correspondingly extends Apache ODE version 1.3.5. This is based on a framework enabling the seamless extension of BPEL engines by additional activity types [KKL07]. The SIMPL core and its generic data access operations are implemented as a Java-based Web Service, which is deployed on Apache Tomcat version 7.0 using Axis2 version 1.5.4. The same holds for the simulation artifact registry and its interface to access relevant metadata. Thereby, the metadata describing data resources are managed using a PostgreSQL version 9.2 database system.

This prototype has been used to realize the data provisioning in concrete examples of simulation workflows, e. g., in most examples listed in Section 3.1 [HSR+10, Pie11, Pie12, Rie14, Deh15, Kal15, vS15b]. In analogy to the reasons discussed in Section 4.3.2, the running example of a bone simulation is again well-suited to illustrate the major aspects of applying the prototype to real simulations. In the following subsections, this illustration is split according to the most challenging scenarios also considered in Sections 3.2 and 4.3.2.

5.4.1 Provisioning of Input Data for Single Simulations

Figure 5.8 depicts how the data management activities of SIMPL may be used to realize the data provisioning steps in the bio-mechanical simulation workflow shown in Figure 4.5. The original data service implementing the file transfer in workflow step Transfer Input File is even replaced by a TransferData activity to show the potential of the proposed data management activities. This is a negligible violation of guideline 2 proposed in Section 4.3.1.2, which actually recommends preferring data services because they usually offer better abstraction support to scientists. In fact, the data service used in this example and the TransferData activity require scientists to specify the same input parameters.

Figure 5.8: Usage of data management activities in the bio-mechanical simulation workflow shown in Figure 4.5. Relevant workflow steps and the respectively used kinds of data management activities are highlighted via bold labels: Transfer Input File (TransferData, file to file), Export Output File (TransferData, SQL to file), Transfer Output File (TransferData, file to file), and Convert Output File (proprietary data service).

These parameters are references to (1) the source computer, (2) the source file, (3) the target computer, and (4) the target directory to which the file is copied. So, both kinds of data provisioning techniques entail the same effort scientists have to spend on workflow design. In this case, the TransferData activity is preferred, as it is more flexible regarding the support of different kinds of data transfers (see Figure 5.6). Its implementation in the TransferData operation of the SIMPL core embeds all above-listed parameters of the activity into an appropriate Secure Copy (SCP) command. The TransferData operation then uses the variant shown in Figure 5.6a, i. e., it sends the SCP command to the IssueCommand operation. The IssueCommand operation issues the SCP command against the source computer in order to finally ship the data to the target computer.

The workflow step Export Output File is realized by another TransferData activity. Its data source is specified via a reference to the SQL database of Pandas and via a SELECT statement as embedded data management command. Furthermore, the TransferData activity specifies that the result data produced by this SELECT statement shall be stored in a CSV-based file in the local operating system. The TransferData operation of the SIMPL core finally embeds the SELECT statement into a proper SQL EXPORT statement. It then again uses the variant shown in Figure 5.6a and issues the EXPORT statement over the IssueCommand operation against the SQL database of Pandas.

The next workflow step Transfer Output File copies the exported CSV-based file to another computer. This may again be realized by a TransferData activity specifying similar parameters and implemented analogously to the one in workflow step Transfer Input File. Finally, the workflow step Convert Output File is still implemented by a proprietary data service, and not by a data management activity. This is because none of the involved data resources and thus also no data management activity offers the required functionality (see Section 4.3.2.1).
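The following sketch indicates how the TransferData operation might turn the activity parameters described above into concrete commands for the source-initiated variant of Figure 5.6a. The exact command syntax generated by the prototype is not documented here, so both generated strings are merely illustrative; in particular, the syntax of export statements varies between database systems, and all file, host, and table names are hypothetical.

    // Illustrative sketch only: building the commands for source-initiated transfers
    // (Figure 5.6a). The concrete syntax used by the SIMPL prototype may differ.
    public class TransferCommandExamples {

        // Transfer Input File: copy a file from the source computer to the target computer.
        static String buildScpCommand(String sourceFile, String targetHost, String targetDir) {
            return "scp " + sourceFile + " " + targetHost + ":" + targetDir;
        }

        // Export Output File: wrap the SELECT statement of the activity into an export
        // command that writes the result to a CSV-based file (syntax is database-specific).
        static String buildExportCommand(String selectStatement, String targetCsvFile) {
            return "EXPORT TO " + targetCsvFile + " OF DEL " + selectStatement;
        }

        public static void main(String[] args) {
            System.out.println(buildScpCommand("/pandas/input/motion.txt", "octave-host", "/octave/input/"));
            System.out.println(buildExportCommand(
                    "SELECT * FROM pandas_output WHERE time_step = 10", "/tmp/pandas_output.csv"));
        }
    }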

5.4.2 Data Exchange between Different Simulations

Figure 5.9 correspondingly depicts how the data management activities of SIMPL may be used in the coupling workflow shown in Figure 4.6. Here, none of the original data services or local data processing steps are replaced by data management activities, as this would be a severe violation of guidelines 1 and 2 proposed in Section 4.3.1. The first workflow step realized by a data management activity is thus the one called “Load # of Spatial Points” in Figure 5.9. It loads the total number of spatial evaluation points for which Pandas has stored its outcome in its database. Hence, a RetrieveData activity embedding a SQL SELECT statement is the choice to realize this data load into the workflow context.

The workflow step Export Octave Input Files accesses the database of Pandas to export relevant parts of the bio-mechanical simulation outcome into a CSV-based file. This data export is again realized by a TransferData activity that is parameterized and implemented in a similar way as the TransferData activity in workflow step Export Output File shown in Figure 5.8. So, the activity specifies a SQL SELECT statement, and the TransferData operation finally embeds this SELECT statement into a proper EXPORT statement, which is then issued against the data source, i. e., the database of Pandas. The only difference is that the SELECT statement used in the coupling workflow is much more complex than the one used in the bio-mechanical simulation workflow (see also Listing 5.1).

Figure 5.9: Usage of data management activities in the coupling workflow shown in Figure 4.6. Relevant workflow steps and the respectively used kinds of data management activities are highlighted via bold labels: Load # of Spatial Points (RetrieveData, SQL), Export Octave Input Files (TransferData, SQL to file), and Import Octave Output Files (TransferData, file to SQL); cf. [RSM14b].

The workflow step Import Octave Output Files works exactly the opposite way round, i. e., it imports a CSV-based file into the SQL database of Pandas. Nevertheless, it may again be realized by a TransferData activity. Its parameters define the data transfer via references to (1) the source operating system, (2) the input CSV-based file, (3) the target SQL database system of Pandas, and (4) the target database table. This time, the TransferData operation of the SIMPL core adopts the variant shown in Figure 5.6b to realize this file import. So, it uses the parameters of the activity to generate a proper SQL IMPORT statement. It then issues this IMPORT statement over the IssueCommand operation against the database of Pandas, which is specified as the data sink of this data transfer.
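Analogously to the export case, the sink-initiated import of variant (b) in Figure 5.6 boils down to a command that the IssueCommand operation issues against the data sink, i. e., the database of Pandas. The sketch below assumes, purely for illustration, a PostgreSQL-style COPY command; the exact syntax of the IMPORT statement generated by the prototype depends on the database system used, and the table and file names are hypothetical.

    // Illustrative sketch only: a sink-initiated file import (Figure 5.6b),
    // here expressed as a PostgreSQL-style COPY command; names are hypothetical.
    public class ImportCommandExample {

        static String buildImportCommand(String targetTable, String csvFile) {
            return "COPY " + targetTable + " FROM '" + csvFile + "' WITH (FORMAT csv)";
        }

        public static void main(String[] args) {
            System.out.println(buildImportCommand("octave_output", "/octave/output/result.csv"));
            // The TransferData operation would forward such a command to the IssueCommand
            // operation, which issues it against the database of Pandas (the data sink).
        }
    }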

5.5 Evaluation

The prototypical implementation illustrated in Section 5.4 and its application to the example simulations listed in Section 3.1 have been the basis for a profound evaluation of the data management activities proposed by this thesis. Table 5.4 summarizes the most important results of this evaluation regarding the primary goals of these data management activities. Furthermore, it correspondingly compares the data management activities of SIMPL with their major competitors of related work discussed in Section 5.1.

As discussed in Section 4.4.1.2, the first primary goal of the data management activities of SIMPL is to offer a generic solution that is sufficient for a majority of simulation examples of any scientific domain. More precisely, the activities have to support various kinds of data resources, data formats, and data management operations. This aspect is covered by the first two rows of Table 5.4 and discussed in Section 5.5.1. This primary goal likewise determines the major competitors of SIMPL among related work. Regarding the existing workflow systems discussed in Section 5.1.1, Kepler is the only one offering at least a tolerably generic solution. Furthermore, ETL technology offers the most generic approach among the solutions to data integration and data exchange analyzed in Section 5.1.3. Note that SDM systems and CAE tools illustrated in Section 5.1.2 do not provide a generic solution at all and are thus not considered in Table 5.4.

The other primary goal of the data management activities of SIMPL is covered by the third row of Table 5.4 and discussed in Section 5.5.2. It is a direct implication of the selection of the major competitors Kepler and ETL technology. Both of them have the same decisive drawback often preventing their adoption for computer-based simulations. They usually overwhelm scientists with a multiplicity of diverse workflow actors or ETL operators that may be used to design the data provisioning in workflows. Hence, another primary goal of SIMPL is to provide scientists with only a small and reasonable set of different kinds of workflow building blocks, i. e., data management activities. This is also one of the main conclusions of discussing related work, as summarized in Section 5.1.4. Note furthermore that this goal is closely related to the challenge regarding the diversity of available data provisioning techniques discussed in Section 1.2.1. The remaining Sections 5.5.3 to 5.5.5 accordingly discuss the major benefits or drawbacks of the data management activities offered by SIMPL regarding the challenges illustrated in Sections 1.2.2 to 1.2.5.

Table 5.4: Most important results of comparing the data management (DM) activities offered by SIMPL with the major competitors of related work.
Compared approaches (columns): Kepler, ETL technology, SIMPL DM activities.
Criteria / goals (rows): range of supported data resources / data formats; power to specify data transformation operations; small set of workflow building blocks.

5.5.1 Generic Data Management in Workflows

As shown in Table 5.4, the data management activities of SIMPL support a broad range of data resources and data formats. Furthermore, they offer a high expressive power to specify data management or data transformation operations. These two aspects are discussed in the following two subsections, respectively.

5.5.1.1 Range of Supported Data Resources and Data Formats

The current prototype of the SIMPL core offers several connectors that allow workflows to access various kinds of external data resources [HSR+10, Pie11, Pie12]. This includes any Unix-based or Windows-based file or operating system, regardless of whether a workflow accesses it locally or remotely. Furthermore, workflows may use SIMPL to seamlessly access any SQL database system offering a JDBC API. The current prototype also supports several proprietary XML database systems, as well as TinyDB, mentioned as a gateway to sensor networks in Section 2.2.4 [Pie11]. In summary, the prototype covers virtually all kinds of data resources that are commonly used in computer-based simulations, as summarized in Tables 2.1 and 3.1. In particular, it supports each individual data resource that is used in the example simulations listed in Section 3.1 [HSR+10, Pie11, Pie12, Rie14, Deh15, Kal15, vS15b]. Furthermore, it offers a broad set of data converters supporting various data formats used in relevant RetrieveData and WriteDataBack operations of these example simulations (see also Section 5.3.2). Over and above that, the plug-in mechanism of the SIMPL core facilitates a seamless extension by new connectors or data converters supporting additional kinds of data resources or data formats. Pietranek discusses that it is straightforward to implement new connectors or data converters [Pie11, Pie12]. In particular, the IssueCommand, RetrieveData, and WriteDataBack operations are very lightweight and thus entail only a low implementation overhead. As discussed in Section 5.3.2 and depicted in Figure 5.6, the more complex TransferData operation makes use of the three other SIMPL core operations to realize most parts of a data transfer. So, the remaining application logic to be implemented directly in the TransferData operation is usually straightforward as well. Altogether and as depicted in Table 5.4, SIMPL supports a range of data resources and data formats comparable to that of ETL technology. This is especially true for the resources and formats that are commonly used in computer-based simulations (see Tables 2.1 and 3.1). The possibility to extend SIMPL in a straightforward manner to support additional kinds of data resources and data formats is another significant advantage. It is the main reason why SIMPL is rated better in Table 5.4 than the workflow system Kepler. In fact, Kepler usually requires implementing whole new workflow actors in order to add support for additional data resources or data formats. These new actors often not only have to deal with the respective technical details of data access mechanisms, but also with low-level details of more complex data management operations. This may result in a remarkable implementation overhead when additional resources or formats have to be supported.
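To illustrate the claimed low implementation overhead, the following sketch outlines what a new, JDBC-based connector for the lightweight IssueCommand operation could look like. The interface and its signature are purely illustrative assumptions; they are not the actual plug-in API of the SIMPL core.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical sketch of a connector plug-in: the interface name and method
// signature are illustrative and do not reproduce the actual SIMPL core API.
interface Connector {
    void issueCommand(String resourceUri, String user, String password, String command) throws Exception;
}

// A JDBC-based connector essentially only has to open a connection and forward
// the command; this is the kind of lightweight implementation the text refers to.
class JdbcConnector implements Connector {
    @Override
    public void issueCommand(String resourceUri, String user, String password, String command) throws Exception {
        try (Connection con = DriverManager.getConnection(resourceUri, user, password);
             Statement stmt = con.createStatement()) {
            stmt.execute(command);  // e. g., a DDL, DML, or import statement
        }
    }
}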

5.5.1.2 Power to Specify Data Transformation Operations

The SIMPL core operations and the data management activities support the most relevant use cases for data access, i. e., data manipulation, data definition, data retrieval, writing data back to external resources, and data transfers. These use cases for data access are common and sufficient for virtually all data provisioning scenarios in the majority of existing simulation applications. As shown in Figures 5.8 and 5.9, they may realize nearly all data provisioning steps in the workflows of the bone simulation – except for those steps where the guidelines proposed in Section 4.3.1 strictly recommend using a data provisioning concept other than data management activities. This has also been confirmed when applying SIMPL and the data management activities to other simulation examples listed in Section 3.1 [HSR+10, Pie11, Pie12, Rie14, Deh15, Kal15, vS15b]. As an additional benefit, the data management activities allow for specifying a wide range of sophisticated data management or data transformation operations.

In fact, they support any operation that may be specified as a valid expression of the command languages offered by the involved external data resources. For instance, the command languages SQL, XQuery, or XPath offered by relational or XML database systems are highly expressive and support various operations (see Table 2.1). Nevertheless, shell languages of file or operating systems typically offer only moderate power to specify complex data transformation operations. In contrast to the data management activities of SIMPL, the corresponding actors offered by Kepler are often tailored to specific operations. This may even restrict the functionality offered by involved data resources via their command languages. Hence and as depicted in Table 5.4, Kepler offers less expressive power to specify data transformation operations than SIMPL. In contrast, ETL technology may in certain cases support more features than the data management activities of SIMPL. This is especially true when file or operating systems are involved, whose command languages are often not intended for very complex data transformation operations. Here, ETL technology and corresponding tools may additionally incorporate proprietary solutions that extend the set of operations supported by the involved data resources. For instance, GeoKettle10 represents such an extension, adding support for processing spatial data to the ETL tool of Pentaho. Pietranek proposes to correspondingly augment SIMPL with additional operations by integrating complementary data transformation services with the service bus shown in Figure 5.1 [Pie12].

5.5.2 Diversity of Available Data Provisioning Techniques

As shown in Figure 2.7, many of the common phases of a simulation workflow are usually classified as orchestration workflows. This especially concerns the platform and software provisioning phases, and in many cases even the calculation phase [GSK+11, VHKL13]. In contrast to Kepler, ETL technology, and most other related work, SIMPL relies on this conventional orchestration workflow technology as well [MMLW05], e. g., its prototype is based on the workflow language BPEL. Hence, SIMPL and its data management activities allow for designing both the data provisioning phases of a simulation workflow and most of its other phases at the same level of abstraction, i. e., at the level of orchestration workflows. This leads to a seamless design environment that relieves scientists of the burden of getting accustomed to many diverse design tools or technologies.

10GeoKettle: http://www.spatialytics.org/projects/geokettle/

As another unique selling point, SIMPL provides scientists with a small and reasonable set of different kinds of workflow building blocks. In fact, scientists are faced with only four different types of data management activities, as summarized in Table 5.2. Furthermore, scientists are often quite familiar with the abstract meanings of the above-mentioned use cases for data access that are covered by the data management activities. Altogether and as depicted in Table 5.4, this constitutes a decisive advantage of SIMPL over Kepler and ETL technology, which usually overwhelm scientists with a multiplicity of diverse workflow building blocks. It is one of the main reasons why SIMPL has so far been adopted in more simulation examples than related work, e. g., in most examples listed in Section 3.1 [HSR+10, Pie11, Pie12, Rie14, Deh15, Kal15, vS15b].

5.5.3 Multiplicity and Complexity of Low-Level Data Management Operations

The data management activities of SIMPL allow scientists to define data management operations without being forced to provide any technical details of data access mechanisms. Instead, these technical details are covered by the SIMPL core and its connectors and data converters. However, scientists still have to specify many complex data management commands in IssueCommand, RetrieveData, and TransferData activities. Furthermore, they are often not familiar with the command languages offered by the involved data resources, especially with SQL, XPath, or XQuery. So, they have to spend considerable effort to design corresponding data provisioning tasks [RLSR+06, RS14, RSM14b]. In some cases, the complexity of data management commands even entails that scientists are not able to specify them at all. This is for instance the case for the SQL SELECT statement embedded in the TransferData activity Export Octave Input Files of the coupling workflow shown in Figure 5.9. This SELECT statement has actually been specified by experts of this query language [Ges14, RSM14b]. Listing 5.1 illustrates the complexity of the corresponding SELECT statement. It contains three sub-queries, i. e., one for each relevant mathematical variable to be exchanged between the bio-mechanical and systems-biological calculations [Kra14]. Each of these sub-queries specifies two selection predicates: one for the relevant variable and one for the current motion sequence of the outer loop of the TransferData activity (see Figure 5.9). Furthermore, each sub-query contains an average function and an associated GROUP BY clause to carry out the data aggregations among all numerical time steps.

1 SELECT A.elementID, A.evalpointID, NSTS_avg, SIG_V_avg, CNUF_avg
2 FROM
3 ( SELECT elementID, evalpointID, AVG(value) AS NSTS_avg
4   FROM pandas_output
5   WHERE variable = 'NSTS' AND motionsequence = @CURMOTIONSEQ
6   GROUP BY elementID, evalpointID
7 ) AS A,
8
9 ( SELECT elementID, evalpointID, AVG(value) AS SIG_V_avg
10  FROM pandas_output
11  WHERE variable = 'SIG_V' AND motionsequence = @CURMOTIONSEQ
12  GROUP BY elementID, evalpointID
13 ) AS B,
14
15 ( SELECT elementID, evalpointID, AVG(value) AS CNUF_avg
16  FROM pandas_output
17  WHERE variable = 'CNUF' AND motionsequence = @CURMOTIONSEQ
18  GROUP BY elementID, evalpointID
19 ) AS C
20
21 WHERE A.elementID = B.elementID AND A.evalpointID = B.evalpointID
22   AND B.elementID = C.elementID AND B.evalpointID = C.evalpointID
23 ORDER BY elementID, evalpointID
24 LIMIT @CURLIMIT OFFSET @CUROFFSET

Listing 5.1: SQL statement embedded in the TransferData activity Export Octave Input Files of the coupling workflow shown in Figure 5.9; cf. [Ges14].

The join predicates in lines 21 and 22 of Listing 5.1 combine the individual results of the sub-queries into one result set. The ORDER BY, LIMIT, and OFFSET clauses in lines 23 and 24 are used to properly partition the data among the available Octave computers. Altogether, this calls for an abstraction support for specifying low-level data management operations in simulation workflows. As discussed in Section 4.4.1.2, none of today's workflow systems offers such an abstraction support that is particularly tailor-made for scientists conducting simulations. This thesis addresses this problem and proposes an adequate approach in Chapter 6 that completely relieves scientists of specifying any low-level details of data management or data provisioning in their workflows [RS14, RSM14a, RSM14b].
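As a side note, the following sketch shows one plausible way in which the outer loop of the coupling workflow could derive the @CURLIMIT and @CUROFFSET values so that each Octave computer receives a disjoint partition of the query result. The variable names and numbers are made up for illustration and are not taken from the actual workflow model.

// Hypothetical sketch of how per-instance LIMIT/OFFSET values could be derived;
// all names and numbers are illustrative only.
public class PartitioningSketch {
    public static void main(String[] args) {
        int totalRows = 100_000;      // e. g., number of (elementID, evalpointID) pairs
        int octaveInstances = 4;      // available Octave computers

        int limit = (int) Math.ceil((double) totalRows / octaveInstances);
        for (int i = 0; i < octaveInstances; i++) {
            int offset = i * limit;   // each instance receives a disjoint partition
            System.out.printf("Instance %d: LIMIT %d OFFSET %d%n", i, limit, offset);
        }
    }
}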

5.5.4 Efficient Data Processing and Optimization

The data management activities of SIMPL treat descriptions of data management operations as integral parts of workflow models. Hence and as discussed in Section 4.2.3.1, a workflow system may analyze these workflow models to obtain much information about the data management of a workflow. It may then use this information to properly apply optimization techniques that increase the efficiency of data processing in simulation workflows [BHW+07, VSS+07, OdOV+11]. For instance, Vrhovnik et al. propose an optimization approach that uses this enhanced information to properly re-structure workflow models with embedded SQL statements [VSS+07]. This approach makes workflows more efficient, e. g., by re-arranging the order of such SQL statements within the workflow model or by merging several statements together. Kalyoncu discusses how to apply this approach to simulation workflows that make use of the data management activities of SIMPL [Kal15]. Furthermore, he investigates additional optimization scenarios, where workflows not only access SQL database systems, but also XML database systems or even files. Altogether, this offers a huge potential to induce significant performance improvements for simulation workflows [VSS+07, Kal15].

MapReduce is a scalable approach that increases the efficiency of data-intensive applications by enabling a massive parallelization of corresponding data processing tasks [DG08]. Gessler shows that it is especially suited to accelerate the filtering, aggregation, and partitioning of data carried out in the coupling workflow shown in Figure 5.9 [Ges14]. SIMPL and its data management activities may seamlessly access and exploit certain MapReduce-based systems. This holds for those systems that offer some kind of command language, as well as an API or CLI over which commands may be issued against the systems. For instance, Apache Hive11 and Spark SQL12 offer this opportunity. They even provide a JDBC API, which makes it possible to use the JDBC-based connector that is already implemented in the current prototype of SIMPL (see Section 5.5.1.1 and the sketch at the end of this subsection).

Cloud computing technologies are key enablers for elastic and efficient data-intensive applications [HKR13, LQ14]. The OASIS standard Topology and Orchestration Specification for Cloud Applications (TOSCA) is a representative solution to automate platform and software provisioning tasks for cloud-based applications [OAS13]. TOSCA has already been applied successfully to simulations [VHKL13, SAKVH15]. It allows for describing a cloud-based application as a so-called service template that consists of two major ingredients. Firstly, a service topology defines how to compose an application of various software components and their relations to each other. Secondly, plans automate tasks to deploy and manage the application and its components within a cloud environment. TOSCA recommends using conventional orchestration workflow technology in order to realize these plans [OAS13]. For instance, OpenTOSCA is a TOSCA engine relying on BPEL for plan execution [BBH+13]. SIMPL relies on conventional orchestration workflow technology as well. Hence, SIMPL-based workflows, e. g., parts of the workflows shown in Figures 5.8 and 5.9, may be seamlessly integrated as plans into TOSCA service templates. This then facilitates and automates the data provisioning and data exchange for cloud-based simulation applications. More details of how to integrate SIMPL with TOSCA are illustrated in a previous author publication [RWWS14]. Dehghanipour discusses more in-depth design considerations, e. g., regarding the integration of the workflow systems of SIMPL and OpenTOSCA [Deh15]. Furthermore, she introduces and evaluates a corresponding prototype for the bone simulation also considered in Section 5.4. For instance, her evaluation confirms that SIMPL is generic enough to seamlessly support the data provisioning for various cloud-based simulation applications.

11Apache Hive™: https://hive.apache.org/ 12Apache Spark SQL: http://spark.apache.org/sql/
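Since Hive and Spark SQL expose standard JDBC interfaces, the existing JDBC-based connector could in principle be reused without modification. The following minimal sketch indicates what issuing a HiveQL aggregation over this interface might look like; the host, database, and query are made up for illustration, while the driver class and URL scheme follow Hive's standard JDBC interface (the hive-jdbc driver must be on the classpath).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: issuing a HiveQL aggregation via Hive's JDBC interface.
// Host, database, and table names are made up for illustration.
public class HiveAccessSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT elementID, evalpointID, AVG(value) FROM pandas_output GROUP BY elementID, evalpointID")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ", " + rs.getString(2) + ", " + rs.getDouble(3));
            }
        }
    }
}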

5.5.5 Data Quality and Provenance Support

As the data management activities of SIMPL offer detailed information about the data management of a workflow, they likewise facilitate both the optimization of data quality and a holistic provenance support. For instance and as discussed in Section 4.2.1, such detailed information is crucial to properly reconstruct the correlation between (1) collected quality or provenance information and (2) affected parts of a workflow model [KSB+10, RBKK14, RSM14a, MBBL15]. Furthermore, it enhances the prospective data provenance of a workflow [MBBL15]. More precisely, it may indicate in detail which data a particular workflow run will access and how it will process this data before the workflow run is actually started. This significantly improves the reproducibility of a simulation and of its outcome [HTT09]. In addition, it may be used to predict more accurately how data quality will evolve during individual future steps of a workflow. On this note, it may be the basis for purposeful and tailored actions that preserve or even enhance data quality in future workflow steps.

5.6 Summary and Future Work

Most existing workflow systems lack a generic and consolidated solution to data provisioning in simulation workflows. Only Kepler is at least tolerably generic regarding different kinds of data resources [LAB+06, BAJ+10]. Nevertheless, the proprietary workflow actors offered by Kepler are frequently tailored and thus limited to specific data formats and/or data management operations. Among related work, solely ETL technology is sufficiently generic to support all kinds of data resources, formats, and operations required for most examples of computer-based simulations [KRRT98, LN07]. However, corresponding ETL tools offer a multiplicity of diverse ETL operators that may be arranged in data provisioning pipelines. This multiplicity and diversity of ETL operators often overwhelms scientists and induces them not to leverage ETL technology at all. This chapter discusses a set of extensions to conventional workflow languages that incorporate the general ideas of ETL technology and combine them into an ETL workflow approach [MMLW05, RRS+11]. The resulting data management activities offer a generic solution to data provisioning in simulation workflows that nevertheless does not entail the decisive drawback of ETL tools. In fact, scientists are faced with only four reasonable types of data management activities. Hence, they are not overwhelmed by a vast amount of diverse workflow building blocks. Furthermore, the proposed activities employ the SIMPL core, which provides uniform access to arbitrary data resources. This eases the design of data management activities, as it abstracts from technical details of data access mechanisms. A prototypical implementation and its application to several simulation examples have been the basis for a profound evaluation of all contributions. This evaluation has especially confirmed that the proposed data management activities allow for specifying a broad range of data management operations for any kind of data resource or data format. They are thus sufficient to design the data provisioning for virtually all simulation examples of various scientific domains. Future work may deal with further investigating the potential of the proposed data management activities to facilitate optimizations of workflows. In particular, different kinds of optimization approaches, e. g., those discussed in Section 5.5.4, should be evaluated in detail regarding their suitability to make data processing in simulation workflows more efficient. Note that this goes hand in hand with possible future work discussed in Section 4.5, i. e., a framework that analyzes provenance information to derive recommendations for optimizing workflows.

Chapter 6 A Pattern-based Approach to Conquer the Data Complexity in Simulation Workflow Design

As discussed in Section 5.5, the ETL workflow approach provided by the data management activities of SIMPL entails various advantages for designing simulation workflows. However, it does not offer all four benefits an adequate abstraction support should offer according to Section 1.2.2 (see also Table 1.1). First of all, it does not reduce the number of tasks scientists have to specify in their workflows. This is also highlighted by the workflows shown in Figures 5.8 and 5.9, which consist of the same high number of tasks as their original versions depicted in Figures 4.5 and 4.6. Furthermore, the ETL workflow approach does not provide scientists with abstract and meaningful workflow building blocks. The data management activities correspond to common use cases for data access, but scientists are typically interested in other use cases, e. g., focusing on coupling simulation models. Finally, the data management activities do not allow for a domain-specific and thus easy parameterization. Scientists actually prefer to work with terms or concepts they already know from their simulation methodology or simulation models. However, data management activities force them to specify many low-level data management operations, e. g., in terms of sophisticated SQL or XQuery statements. Sometimes, the high complexity of data management operations even prevents scientists from specifying them at all. This is for instance the case for the SQL SELECT statement shown in Listing 5.1, which is part of the coupling workflow depicted in Figure 5.9.

An obvious solution to this problem might be to always opt for guideline 2 proposed in Section 4.3.1.2, i. e., to use data services for any data provisioning task of a simulation workflow. This is because data services usually offer a better abstraction than data management activities. However and as discussed in Section 4.3.2, many simulation workflows exist where available data services do not offer the functionality required by several data provisioning tasks. Guideline 3 proposed in Section 4.3.1.2 then recommends using data management activities again. Note that this is also the case for several data provisioning tasks of the example workflows shown in Figures 4.5 and 4.6. To make things worse, one task in the workflow shown in Figure 4.5 even requires opting for guideline 4, i. e., scientists have to implement a completely new data service. As discussed in Section 4.3.2.3, this results in a remarkable implementation overhead, which is even higher than in the case where scientists rely on data management activities. In any case, scientists designing simulation workflows are faced with a considerably high complexity of data provisioning [RLSR+06, RS14, RSM14b]. The resulting huge effort to be spent on workflow design often prevents scientists from concentrating on their core issues, i. e., the actual simulation application and the interpretation of their results. Hence, an adequate abstraction support for designing data provisioning tasks is essential for a wide adoption of simulation workflow technology. As discussed in Section 1.2.2 and summarized in Table 1.1, neither existing workflow systems nor related work from several research areas offer such an adequate and consolidated abstraction support. In particular, none of them provides scientists with all four essential benefits listed in Section 1.2.2. This thesis fills this gap by proposing a novel pattern-based approach to simulation workflow design that effectively conquers the complexity of data provisioning in simulation workflows. This approach corresponds to the third major contribution introduced in Section 1.3.2.2. Figure 6.1 indicates those components of the SIMPL framework shown in Figure 4.7 that offer this contribution. These components are briefly described in Sections 4.4.2.1 and 4.4.2.4. The data management (DM) pattern plug-in of the workflow design tool shown in Figure 6.1 provides a customizable list of patterns representing high-level building blocks for typical data provisioning tasks in simulation workflows. Scientists just need to select a few patterns and combine them in their workflow models. Instead of specifying many low-level details of data provisioning, they afterwards only need to define a small set of domain-specific parameter values for each selected pattern. Examples of such domain-specific parameter values are references to simulation models or to their mathematical variables, as shown in Figure 1.2.


Figure 6.1: Architecture of the SIMPL framework shown in Figure 4.7. Gray-colored components are discussed in this chapter; cf. [RRS+11, RS13b, RS14].

The rule-based pattern transformer finally manages an extensible set of rewrite rules that specify how to map abstractly parameterized patterns onto executable workflows. These rewrite rules also make use of the metadata managed by the simulation artifact registry. In particular, they use metadata describing dependencies between the individual artifacts shown in Figure 6.1 in order to map the domain-specific parameter values of high-level patterns onto more concrete implementation details. This pattern-based approach and its numerous contributions to a full-fledged and principled abstraction support for simulation workflow design are detailed and assessed in this chapter as follows:
• Section 6.1 illustrates the core idea of the pattern-based approach to simulation workflow design. It therefore depicts the procedure of the overall workflow design approach and its application to the running example of the pattern-based bio-mechanical simulation workflow shown in Figure 1.2.
• Afterwards, Section 6.2 discusses major related work. The main focus of this discussion is why related approaches do not offer a full-fledged abstraction support for simulation workflow design that is tailor-made for scientists.

• The first purpose of Section 6.3 is to present the comprehensive set of patterns that has been devised while working on this thesis. These patterns may be used to alleviate the design of any data provisioning task in several kinds of simulation workflows, e. g., in the examples listed in Section 3.1. Furthermore, Section 6.3 discusses how to organize these patterns in a pattern hierarchy with clearly distinguished abstraction levels. As a decisive contribution, this pattern hierarchy facilitates a separation of concerns between different persons that may now be involved in workflow design, e. g., scientists and data engineers. According to his or her own skills, each person may choose the abstraction level of the pattern hierarchy s/he is most familiar with. S/he may then provide other persons with templates of parameterized patterns and/or workflow fragments at the chosen level.
• Section 6.4 discusses major design considerations for the rule-based transformation of patterns into executable workflows. It illustrates the general rule-based processing model and argues under which circumstances patterns should be transformed during either design time or runtime of workflows.
• Subsequently, Section 6.5 depicts the prototypical implementation of the system components colored gray in Figure 6.1. Furthermore, it exemplifies how the pattern-based approach may be applied to real-world simulations. While Section 6.1 already covers the application to the bio-mechanical simulation workflow, Section 6.5 discusses the most important aspects regarding the more complex coupling workflow shown in Figures 4.6 and 5.9.
• The focus of Section 6.6 is finally to discuss the results of evaluating this prototype and its application to example simulations. This again mainly concerns the benefits and drawbacks of the pattern-based approach to simulation workflow design regarding the challenges discussed in Section 1.2.
Finally, Section 6.7 summarizes the major aspects and lists possible future work. This chapter is a revised and composite version of several previous author publications [RS13b, RS14, RSM14a, RSM14b].

6.1 Pattern-based Simulation Workflow Design

As discussed above, this section illustrates the core idea of the pattern-based approach by (1) depicting the procedure of the overall workflow design approach and by (2) exemplifying its application to the bio-mechanical simulation workflow.

Figure 6.2: Overall pattern-based workflow design approach; cf. [RSM14b].

6.1.1 Overall Workflow Design Approach

The main steps of the overall procedure for the pattern-based approach to workflow design are depicted in Figure 6.2 [RSM14b]. Before scientists come into play, various other persons define and provide certain patterns that may be used as building blocks to design simulation workflows.1 As discussed above, each pattern may be associated with a different abstraction level according to a pattern hierarchy (see also Section 6.3). Hence, a specific pattern is usually provided by a person whose skills match the abstraction level of that pattern. As shown in Figure 6.2, typical examples of these persons are simulation process experts, as well as data, workflow, or service engineers [RSM14a]. Application domain experts, i. e., scientists conducting simulations, then select appropriate patterns and combine them in their workflow models. Usually, they select the most abstract patterns that provide all benefits related to an adequate abstraction support according to Section 1.2.2, e. g., the patterns depicted in Figure 1.2. These high-level patterns abstract from multiple low-level workflow tasks, which significantly reduces the number of tasks that are visible to scientists. Furthermore, the patterns are closely related to the simulation models and to the use cases scientists are interested in. This likewise facilitates their domain-specific and easy parameterization, which is the next main step shown in Figure 6.2.

1Note that this thesis already proposes a comprehensive set of patterns that are sufficient for various simulation examples. Nevertheless, the pattern plug-in of the workflow design tool shown in Figure 6.1 is designed to be extensible by additional kinds of patterns [Pie12].

After scientists have selected, combined, and parameterized high-level patterns, these patterns need to be transformed into executable workflows. As discussed above, this is achieved by a set of rewrite rules. These rules recursively transform the high-level patterns over the afore-mentioned pattern hierarchy into more concrete workflow patterns, templates of workflow fragments, or data services. They thereby query the simulation artifact registry for certain metadata specifying how to map high-level, domain-specific pattern parameters onto more low-level ones. Depending on their degree of implementation details, the rewrite rules, the metadata, and the associated more concrete patterns, workflow fragments, or services are again provided by different persons having adequate skills (see Figure 6.2). More details about this separation of concerns are given in Section 6.3.
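For illustration, the following sketch outlines how a single rewrite rule and its interplay with the simulation artifact registry might be represented. All interfaces and method names are hypothetical assumptions made for this sketch; they do not reproduce the actual implementation of the rule-based pattern transformer discussed later in this chapter.

import java.util.List;
import java.util.Map;

// Hypothetical sketch of the rule-based processing model described above.
// The interfaces are illustrative only; they do not reproduce the actual
// implementation of the rule-based pattern transformer.
interface Pattern {
    String type();                       // e. g., "SimulationCalculationPattern"
    Map<String, String> parameters();    // domain-specific parameter values
}

interface ArtifactRegistry {
    /** Look up metadata, e. g., the variable instances representing a bone and motion sequence. */
    Map<String, String> lookup(String artifactKind, Map<String, String> domainParameters);
}

interface RewriteRule {
    /** Condition part: does this rule apply to the given pattern? */
    boolean matches(Pattern pattern);

    /** Action part: map the pattern onto more concrete patterns or workflow
     *  fragments, typically consulting the simulation artifact registry. */
    List<Pattern> apply(Pattern pattern, ArtifactRegistry registry);
}

In such a model, the transformer would repeatedly apply matching rules to the patterns of a workflow model until only executable workflow fragments or data service calls remain, which corresponds to the recursive transformation over the pattern hierarchy described above.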

6.1.2 Application to the Bio-Mechanical Simulation

Figure 1.2 exemplifies a high-level pattern-based workflow model that results after scientists have applied the steps of pattern selection and pattern parameterization to the bio-mechanical simulation workflow depicted in Figure 1.1. As discussed in Section 1.3.2.2, the patterns used in this example significantly alleviate workflow design, thereby providing scientists with all four benefits listed in Section 1.2.2. Figure 6.3 depicts how the first pattern shown in Figure 1.2, i. e., the Simulation Calculation Pattern, is recursively transformed into an executable workflow fragment. Thereby, the figure also illustrates more concrete patterns of intermediate abstraction levels of the afore-mentioned pattern hierarchy. More details on applying patterns to the bio-mechanical simulation workflow and on transforming them into executable workflows are given in several student theses that have accompanied the work on this PhD thesis [Ari12, Pie12, Boh14, vS15a]. In a first transformation step, a rewrite rule actually maps the Simulation Calculation Pattern onto two different workflow steps. The first workflow step is another pattern that abstracts from the four data provisioning tasks colored blue in Figure 1.1. The second workflow step is the red task shown in Figure 1.1, i. e., a service call starting the calculation in Pandas. For the sake of clarity and since this thesis focuses on data provisioning in simulation workflows, Figure 6.3 only depicts the pattern abstracting from the four data provisioning tasks. This Simulation-oriented Data Provisioning Pattern still describes the data provisioning mainly via terms or concepts scientists know from the relevant bio-mechanical simulation model. Its first parameter refers to this model, i. e., the rewrite rule of this transformation step directly adopts the model from the superordinate Simulation Calculation Pattern.


Figure 6.3: Rule-based transformation of the Simulation Calculation Pattern depicted in Figure 1.2 over several abstraction levels; cf. [RS14].

Furthermore, this rewrite rule maps the bone and motion sequence onto concrete instances of the mathematical or numerical input variables of the bio-mechanical model. These instances of the geometrical bone shape, material parameters, boundary conditions, and FEM parameters thereby have to properly represent the relevant bone and motion sequence (see Section 3.2). To ensure this, the rewrite rule queries the simulation artifact registry and its metadata describing simulation models for the right instances of the bio-mechanical variables [Boh14]. Finally, the Simulation-oriented Data Provisioning Pattern defines the target of the data provisioning as a particular instance of the Pandas software that realizes the bio-mechanical simulation. The rewrite rule of this transformation step may again ask the simulation artifact registry to search for a proper software instance. The registry therefore manages metadata describing simulation software and their instances, as well as which simulation models a particular software may realize [vS15a].

In the second transformation step shown in Figure 6.3, the Simulation-oriented Data Provisioning Pattern is mapped to a Data Transfer and Transformation Pattern (see Figure 3.4). According to its definition illustrated in Section 3.3.1, this pattern describes the data provisioning process via more generic parameters having a stronger relation to underlying data resources and data transformation operations. The rewrite rule of this second transformation step maps the mathematical variables of the bio-mechanical simulation model onto references to heterogeneous data containers that store the data representing these variables. In addition, the rule adds implementation details via specifications of filter operations extracting appropriate data from the data containers. The target of the data provisioning is specified as a directory where the Pandas software expects its input files. Furthermore, the Data Transfer and Transformation Pattern defines the text-based and CSV-based data formats Pandas requires for these files. The simulation artifact registry facilitates all mentioned mappings from domain-specific parameter values of the superordinate pattern onto low-level and data-specific parameter values of the Data Transfer and Transformation Pattern. This is supported by metadata describing simulation models, data resources, simulation software, and dependencies between these different artifacts [Boh14, vS15a]. Afterwards, another rewrite rule maps the Data Transfer and Transformation Pattern to an executable workflow fragment finally realizing the data provisioning. This workflow fragment has to implement all necessary filter operations or data format conversions and thus contains many complex implementation details. These implementation details are specified via different kinds of query, scripting, or programming languages, as indicated in Figure 6.3. Examples of appropriate executable workflow fragments are variants of the blue workflow steps shown in Figures 1.1, 4.5, and 5.8 [Ari12, Pie12, Boh14].
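To make this parameter mapping more tangible, the following sketch contrasts the domain-specific parameters of the Simulation-oriented Data Provisioning Pattern with the data-specific parameters a rewrite rule might derive for the Data Transfer and Transformation Pattern. All identifiers, paths, and connection strings are invented for illustration; in the prototype, such values are obtained from the metadata of the simulation artifact registry rather than hard-coded.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the parameter mapping performed in the second
// transformation step; all identifiers and paths are made up.
public class ParameterMappingSketch {
    public static void main(String[] args) {
        // Domain-specific parameters of the Simulation-oriented Data Provisioning Pattern
        Map<String, String> domainParams = new LinkedHashMap<>();
        domainParams.put("simulationModel", "Bio-mechanical bone tissue simulation");
        domainParams.put("mathematicalVariable", "Geometrical bone shape");
        domainParams.put("target", "Software instance 'Pandas'");

        // Data-specific parameters of the Data Transfer and Transformation Pattern,
        // as a rewrite rule might derive them via the simulation artifact registry
        Map<String, String> dataParams = new LinkedHashMap<>();
        dataParams.put("sourceContainer", "jdbc:postgresql://simdb/bones#table=bone_shapes");
        dataParams.put("filterOperation", "SELECT * FROM bone_shapes WHERE bone_id = 'right_femur_xy'");
        dataParams.put("targetDirectory", "/opt/pandas/instance42/input");
        dataParams.put("targetFormat", "Pandas text-based input format");

        dataParams.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}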

6.2 Related Work

As discussed in Section 1.2.2, no available workflow system reduces the complexity of data provisioning in workflows to an extent that is especially suitable for scientists conducting simulations. Hence, workflow systems covered as related work in Section 5.1.1 are not considered in detail here. Nevertheless, the next two subsections assess existing work in the related research areas of workflow patterns and data-centric workflow design. Afterwards, Sections 6.2.3 and 6.2.4 discuss related work that is also considered in Sections 5.1.2 and 5.1.3, i. e., concerning simulation data management systems and solutions to data integration or data exchange. Finally, Section 6.2.5 summarizes the main conclusions of all discussions. Table 6.1 depicts the most important results of assessing major related work and of comparing it with the pattern-based approach to simulation workflow design proposed by this thesis. Thereby, the first four table rows indicate to what extent individual approaches provide scientists designing simulation workflows with the four benefits listed in Section 1.2.2. The last row depicts whether related work offers a holistic separation of concerns that further facilitates simulation workflow design. Here, the main question is whether individual approaches provide a systematic framework that allows for seamlessly incorporating multiple kinds of persons and their specific skills into workflow design. For instance, these persons may be simulation process experts as well as data, workflow, and service engineers (see also Figure 6.2 and Section 6.3).

6.2.1 Common Workflow Patterns

Section 4.2.2 already discusses a set of common workflow patterns. This encompasses patterns for control flow, data, resource, and exception handling in business processes, as proposed by Russell et al. and van der Aalst et al. [vdAtHKB03, RtHvdAM06, RtHEvdA05a, RtHEvdA05b, RvdAtH06]. In addition, Pautasso et al., Yildiz et al., and Migliorini et al. introduce further patterns that are especially important for scientific workflows [PA06, YGN09, MGLRtH11]. The main purpose of all these patterns is to provide a comprehensive benchmark to evaluate and compare the basic features offered by different workflow languages and workflow systems. On this note, they represent an extensive set of basic workflow design primitives that, as a whole, are sufficiently generic to express the tasks of any workflow in multiple domains (see Table 6.1). However, each of these common patterns corresponds to a basic and low-level workflow building block. For instance, some of the data patterns discussed by Russell et al. deal with the question whether workflow tasks or workflow instances may exchange data by value or by reference [RtHEvdA05a]. Such fine-grained patterns are rather suited to characterize low-level details of the executable workflow fragments that finally realize the more abstract patterns proposed by this thesis. Hence and as depicted in Table 6.1, they do not offer the first three essential benefits listed in Section 1.2.2. They are neither suitable to reduce the number of visible workflow tasks, nor are they especially meaningful to scientists, nor do they allow for a domain-specific parameterization.

Table 6.1: Assessment of major related work and comparison with the pattern-based approach to simulation workflow design proposed by this thesis.
Compared approaches (columns): common workflow patterns; ontology-based scientific workflows; artifact-centric business processes; SDM systems and CAE tools; schema mappings; ETL technology; SIMPL – pattern-based workflows.
Assessment criteria (rows): reduced number of workflow tasks; meaningful workflow building blocks; domain-specific parameterization; generic support of any domain; holistic separation of concerns.

Since scientists do not want to struggle with low-level details of workflows, they would typically not accept such fine-grained patterns as initial building blocks for simulation workflow design. Due to their strong relation to low-level details of executable workflows, the common workflow patterns only cover the lowest abstraction level of patterns shown in Figure 6.3. So, they are not suited to assemble a pattern hierarchy with multiple, clearly distinguished abstraction levels of patterns. This likewise prevents a holistic separation of concerns, where any person may choose an abstraction level that best matches his or her own skills in workflow design.

6.2.2 Data-centric Workflow Design

Researchers from the scientific workflow domain investigate how ontologies may be used to abstractly specify workflow tasks, as well as their input and output data [LAG03, BL05, MCD+05, WAH+07, dOCT+09, dOOD+12]. This provides scientists with the third benefit listed in Section 1.2.2, i. e., the ontologies enable a domain-specific parameterization of workflows (see also the third row in Table 6.1). Note that it also allows for incorporating ontology-based approaches to data exchange discussed in Section 5.1.3. In particular, relations between different ontological concepts may be used to describe semantic dependencies between the output data of one task and the input data of another one [BL05]. Furthermore, logical rules may define how to map such semantic dependencies onto additional workflow tasks or services [LAG03]. These additional workflow tasks or services may then realize the necessary low-level data transformations [RLSR+06]. Hence, scientists may use ontologies to abstractly describe the data their workflows shall process, but they do not need to specify all low-level workflow tasks required for data provisioning. This at least constitutes a moderate reduction of the number of tasks that are visible to scientists, as indicated in Table 6.1. A severe issue, however, is that these ontology-based approaches do not offer workflow building blocks that are especially meaningful to scientists conducting computer-based simulations. In fact, scientists still need to design particular workflow tasks that consume and generate the abstractly specified data. According to the major application area of data-intensive scientific workflows, these workflow tasks often focus on basic data analysis functions [TDG07, SR09]. Workflow building blocks reflecting basic data analysis functions may be tailored to the skills and needs of data scientists. However, scientists conducting computer-based simulations are typically interested in other use cases. For instance, the pattern shown at the top level of Figure 6.3 represents such a use case that is related to the calculation of a mathematical simulation model. As indicated in Table 5.1 and discussed in Section 5.1.3, ontologies may be the basis to integrate a broad range of data resources and data formats. Furthermore, the foundation on description logic and the native capabilities to support semantic data heterogeneity lead to a high expressive power to specify data transformation operations. Altogether, this entails the potential to provide a generic solution supporting data resources, formats, and operations required by multiple scientific domains. However, this potential is frequently not exploited in practice due to one decisive drawback of ontology-based approaches: they are restricted to those domains where ontologies already exist. This is mainly because scientists typically do not accept the high initial effort that is required for developing new ontologies [WVV+01]. In the large area of computer-based simulations, however, ontologies exist in only very few applications [GCMS12]. In conclusion, the sole use of ontologies does not constitute a completely generic solution, which prevents its seamless adoption in several scientific domains (see Table 6.1).

A limited separation of concerns is generally possible when applying ontology-based approaches to workflow design. Application domain experts, i. e., scientists, may parameterize workflows in a domain-specific way using the ontologies. Workflow engineers – and sometimes even data engineers – may then provide the low-level workflow tasks or services implementing necessary data transformations. Nevertheless, this only corresponds to a two-stage transformation from (1) the domain-specific descriptions of workflows to (2) the executable workflows. Related work lacks a clear distinction between multiple abstraction levels, e. g., as provided by the patterns shown in Figure 6.3. Such a clear distinction would, however, facilitate an even more holistic separation of concerns and a more systematic collaboration among different persons having likewise different skills in workflow design. Mork et al. discuss that current scientific workflow systems and related approaches do not adequately support such an interdisciplinary collaboration [MMZ15].

The business process domain proposes analogous approaches to artifact-centric business process modeling [NC03, Hul08, CH09, KR11]. These approaches treat data and their evolution over time as first-class citizens to describe and govern a business process. Thereby, so-called business artifacts represent the data of a process in an abstract way. These artifacts correspond to business-relevant entities, e. g., a customer order or an invoice, and manage important information about these entities. Furthermore, the artifacts control the lifecycle of business entities, e. g., how services may be invoked on them and how the underlying data may be changed over time. Figure 6.4 shows an example business artifact representing a deal between a company and one of its customers – including high-level information and lifecycle models for this deal artifact and for the underlying data [CH09]. Related technologies even allow for mapping such artifacts onto low-level data structures or even onto concrete workflow schemas [FHS09, SSWY14]. Regarding the assessment criteria considered in Table 6.1, these artifact-centric approaches bear much resemblance to the ontology-based approaches to scientific workflow design. The artifacts abstract from key functions to access and manipulate data that might otherwise be realized within low-level workflow tasks. Hence, they moderately reduce the number of workflow tasks the application domain experts – in this case business experts – have to design. Furthermore, the artifacts and their information about business-relevant entities provide the means for a domain-specific parameterization of workflows or processes.

Figure 6.4: Exemplary business artifact representing a deal between a company and a customer [CH09]. The information model comprises important data that further describes a deal, whereas the lifecycle model defines a state-transition diagram specifying the way a deal and its data may be processed. (Lifecycle states shown in the figure include Created, Draft, Offered, Signed, Active, Completed, Failed, Expired, Lost, and Early End.)

Current artifact-centric approaches and concrete technologies mainly focus on the requirements of certain business domains. Hence, available artifacts and their domain-specific descriptions are tailored to the skills and needs of business experts. However, scientists conducting simulations are interested in completely different artifacts, e. g., in mathematical simulation models or simulation methods. Furthermore, they are interested in likewise different use cases to work with these artifacts, e. g., coupling various simulation models. In conclusion, artifact-centric approaches so far do not offer meaningful workflow building blocks that especially reflect the simulation artifacts and the use cases scientists are interested in. The general flexibility of artifacts to specify important information and the lifecycle of underlying data may offer a high potential to provide a generic solution for any domain. Nevertheless, current artifact-centric approaches and technologies employed in practice are limited to particular business domains and thus do not offer a completely generic solution (see the fourth row of Table 6.1). In fact, they are only applicable to those domains and applications where the details of corresponding artifacts are already worked out. This is however not the case for most domains related to computer-based simulations. Analogously to ontology-based approaches to scientific workflow design, artifact-centric approaches to business process modeling enable a two-stage separation of concerns. This two-stage separation of concerns is in line with the general process design methodology in the area of business processes [Wes12]. Business experts may use the artifacts of interest to describe a high-level and/or declarative model of the process they have in mind. Workflow engineers or other IT experts may then provide low-level workflow tasks or services that implement some of the functions to be carried out on the artifacts, e. g., basic functions for accessing or manipulating underlying data. Again, a multi-stage mapping of high-level process descriptions onto executable workflows or services over multiple, clearly distinguished abstraction levels would offer a more holistic separation of concerns. In particular, it would provide a systematic framework facilitating the collaboration among more kinds of persons than only business experts and workflow engineers. For instance, simulation workflows and their special challenges regarding data provisioning and data exchange would also benefit from seamlessly incorporating the skills of data engineers.

6.2.3 Simulation Data Management Systems

Besides their mechanisms for a systematic data management, most SDM systems or CAE tools offer means to design simulation workflows or processes [BBF+09]. Usually, this includes the possibility that scientists or CAE engineers may firstly specify a high-level description of the simulation process they have in mind. This high-level process description then encompasses only a moderate number of process steps. As discussed in Section 5.1.2, SDM systems furthermore offer a comprehensive metadata management, which also allows for a domain-specific description of relevant simulation data. These domain-specific metadata may be adopted in simulation processes as well. More precisely, they may be used to parameterize individual high-level process steps in order to define the data they have to access. Nevertheless, different SDM systems offer likewise different and proprietary solutions for these domain-specific metadata. This typically ranges from purely textual descriptions of data over keyword-based indexing to large taxonomies. These solutions offer less expressive power than the ontologies or the flexible artifact models employed by approaches to data-centric workflow design. Hence, SDM systems and CAE tools are rated a bit worse in Table 6.1 concerning a domain-specific parameterization of workflows or processes. The majority of SDM systems or CAE tools however provide scientists or CAE engineers with only a limited set of workflow building blocks that are meaningful to them. As discussed in Section 5.1.2, this set often consists of a small number of data transformation operations that frequently occur in prevalent industrial simulation applications. For instance, this encompasses conversions of default data formats for a CAD model into other default formats for a description of a finite element grid. All other high-level steps of a simulation process that are not directly supported by this limited set of workflow building blocks have to be implemented prior to process execution. Here, low-level techniques are most often the solution of choice, e. g., based on scripting or programming languages. This limitation to only a few common data transformation operations is also one reason why SDM systems and CAE tools do not offer a completely generic solution, as discussed in Section 5.1.2. It often prohibits their adoption in multi-scale simulations that are coupled across different scientific domains, which usually require more complex and domain-specific data transformations. Another issue is that most SDM systems and CAE tools are restricted to document-centric and file-based data. Hence, they cannot be used when simulations rely on other data resources as well. For instance, this concerns the SQL database system used by the running example of a bone simulation (see Figures 5.8 and 5.9). In principle, a two-stage separation of concerns may again be possible in accordance with the general design methodology for business processes [Wes12]. Scientists or CAE engineers may specify the above-mentioned abstract descriptions of high-level simulation processes. IT specialists may then employ scripting or programming languages to implement those high-level process steps that are not directly supported by the chosen SDM system or CAE tool. However, common SDM systems or CAE tools usually lack a systematic framework that helps companies to accomplish such a two-stage separation of concerns. In fact, each company has to establish its own organizational structure that enables the collaboration among CAE engineers and IT specialists. As a consequence, SDM systems and CAE tools support the required holistic separation of concerns to an even lesser degree than approaches to data-centric workflow design (see Table 6.1).

6.2.4 Solutions to Data Integration or Data Exchange

As discussed in Section 5.1.3, solutions to data integration, e. g., federated information systems [SL90, BKLW99], do not adequately match the scenario of data provisioning in simulation workflows. Instead, more flexible solutions to a peer-to-peer-like data exchange between individual data resources and/or applications, e. g., those summarized in Table 5.1, constitute a better choice [ABLM14]. Ontology-based approaches to data exchange are already covered by the ontology-based approaches to scientific workflows discussed in Section 6.2.2. Thus, the remaining related work that is relevant here comprises schema mappings and ETL technology. Schema mappings actually offer a high expressive power to specify data transformation operations [Kol05, ABLM14]. This is due to their foundation on first-order predicate logic including some algebraic extensions to describe sophisticated structural dependencies between several data schemata [JPT14]. However and as shown in Table 6.1, schema mappings do not offer a completely generic solution that may be seamlessly used in any simulation domain. In fact, they are not applicable to unstructured text or binary files, which are however most common in computer-based simulations (see Table 3.1). Furthermore, existing solutions to schema mappings do not consider any relation to simulation workflow design. Hence, they likewise do not provide scientists with the first three benefits listed in Section 1.2.2 that are important to conquer the complexity of workflow design. Moreover, the specification of logical and/or algebraic schema mappings is mainly or even only tailored to the skills and needs of data engineers. So, corresponding approaches completely lack a separation of concerns that might also incorporate scientists, workflow engineers, or other kinds of persons. ETL technology promises to be the most generic solution to data provisioning or data exchange among related work [KRRT98, LN07]. Associated ETL tools cope with virtually all data resources, formats, and operations that are essential for computer-based simulations. However, they do not support the other four assessment criteria depicted in Table 6.1. Scientists relying on ETL technology would need to specify a multiplicity of complex ETL operators in data provisioning pipelines. Furthermore, they would have difficulties with understanding the ETL-specific meanings of individual ETL operators, as discussed in Section 5.1.3. The numerous low-level operator types offered by common ETL tools likewise do not allow for their domain- and simulation-specific parameterization. In fact, the complexity of ETL operators and corresponding data provisioning pipelines is the main reason why common ETL tools are rather tailored to the skills of only data engineers. So, these tools usually do not consider other kinds of persons as their users and thus likewise do not focus on any separation of concerns.

6.2.5 Main Conclusions

As summarized in Table 6.1, no related approach comprehensively offers all four benefits listed in Section 1.2.2 and a holistic separation of concerns. The pattern-based approach to simulation workflow design proposed by this thesis exactly fills this gap and supports all these desiderata in a consolidated and principled fashion. It therefore combines, augments, and considerably goes beyond the general ideas of various related approaches as follows:
• Scientists merely have to combine very few high-level patterns in a workflow, instead of specifying any low-level workflow task. This not only moderately, but even significantly reduces the number of tasks they have to specify.
• As a unique selling point that is not provided by any related approach, this thesis introduces a set of high-level patterns that are particularly meaningful to scientists conducting computer-based simulations. More precisely, these high-level patterns represent the main use cases scientists are interested in, e. g., related to numerically calculating a mathematical simulation model.
• This thesis transfers the core ideas of ontology-based scientific workflows, artifact-centric business processes, and domain-specific metadata of SDM systems to simulation workflow design and augments them by the pattern-based approach. This facilitates a domain-specific parameterization of patterns, where pattern parameters mainly correspond to ontological concepts or artifacts scientists already know from their domain-specific methodology.
• In line with common workflow patterns, the patterns proposed by this thesis are designed to be sufficiently generic to be applicable in any simulation domain. Furthermore, the executable workflows finally realizing patterns may rely on the ETL workflow approach introduced in Chapter 5, which supports a similar range of data resources and operations as ETL technology.
• Another major contribution is provided by the pattern hierarchy and its clearly distinguished abstraction levels of patterns. It enables a holistic separation of concerns and a systematic framework to seamlessly incorporate various skills of different kinds of persons into workflow design.

Figure 6.5: Hierarchy of data management patterns with clearly distinguished abstraction levels enabling a holistic separation of concerns; cf. [RS13b, RSM14a]. The hierarchy ranges from simulation-specific process patterns (application domain expert / scientist) over simulation-oriented data management patterns (simulation process expert) and basic data management patterns (data engineer) down to executable workflow fragments / data services (workflow / service engineer); information about data management is aggregated with ascending levels and enriched with implementation details via pattern transformation towards descending levels.

6.3 Pattern Hierarchy and Separation of Concerns

Figure 6.5 shows the afore-mentioned pattern hierarchy, which arranges different kinds of patterns according to clearly distinguished abstraction levels [RSM14a]. This pattern hierarchy ranges from (1) simulation-specific process patterns over (2) simulation-oriented data management patterns and (3) the basic data management patterns illustrated in Section 3.3 to (4) executable workflow fragments or data services. Individual patterns at the respective abstraction levels have again been identified by investigating both several academic use cases for simulations (e. g., see [Har96, GZC14]) and the real-world examples listed in Section 3.1. As discussed above, the major contribution of the clear distinction between individual abstraction levels in the pattern hierarchy is that it facilitates a holistic separation of concerns. Thereby, it allows for systematically incorporating different kinds of persons and their various skills into workflow design. Figure 6.5 proposes a separation of concerns between (a) application domain experts, i. e., scientists, (b) simulation process experts, (c) data engineers, and (d) workflow or service engineers. According to his or her skills, each person may choose the abstraction levels s/he is most familiar with. The person may then provide other persons with templates of parameterized patterns and/or workflow fragments at chosen levels. Thereby, the individual patterns serve as a medium to communicate the requirements between different abstraction levels.

Figure 6.6: Main classes, degree of abstraction, and associated simulation artifacts of parameter values for patterns; cf. [RSM14a]. The classes range from simulation-specific over software-specific, service-specific, and data-specific to ETL-specific values; they are associated with the corresponding simulation artifacts, i. e., simulation models and simulation methods, simulation software, data services, data, and data resources; the degree of abstraction decreases and the level of details increases towards the ETL-specific end.

For example, workflow or service engineers may offer executable workflow fragments or services at the lowest level of the hierarchy. In doing so, they only need to know how to implement basic data management patterns provided by data engineers one level above, but they do not need to deal with simulation-specific patterns at the two top levels. With an ascending level of the pattern hierarchy, more information about data management operations and data management technology is aggregated. Hence, workflow developers need to know less about such operations and technology and may specify more abstract parameter values of patterns. Figure 6.6 classifies such parameter values and relates the resulting main classes according to their degree of abstraction, ranging from simulation-specific to data- or ETL-specific values. As discussed in Section 6.2, this thesis transfers the core idea of artifact-centric business process modeling [NC03, Hul08, CH09, KR11] to simulation workflow design and to corresponding simulation artifacts. Furthermore, it augments the artifact-centric idea by the novel pattern-based approach to workflow design. So, workflow developers may use different kinds of simulation artifacts and their properties to specify parameter values of patterns at different abstraction levels. Figure 6.6 associates each class of parameter values with the corresponding simulation artifacts, i. e., mathematical simulation models, simulation methods, simulation software, data services, and data resources. The simulation artifact registry shown in Figure 6.1 manages comprehensive metadata describing these individual kinds of artifacts and their properties. This assists workflow developers in that they may choose values of some of the pattern parameters from the metadata, thereby facilitating a domain-specific parameterization of patterns.

Note that the four benefits listed in Section 1.2.2 are provided by patterns of all abstraction levels of the pattern hierarchy. Nevertheless, each abstraction level reflects the different skills and needs of the persons associated with the level. For instance, basic data management patterns constitute workflow building blocks that are meaningful to data engineers. On this note, they allow for their tailored parameterization, i. e., mainly relying on data- or ETL-specific parameter values, with which data engineers are familiar. The following subsections detail patterns and their parameters at the individual levels of the pattern hierarchy.

6.3.1 Simulation-specific Process Patterns

The top level of the pattern hierarchy shown in Figure 6.5 comprises simulation-specific process patterns that are good candidates to be selected, combined, and parameterized by scientists. Figure 1.2 depicts examples of such patterns abstractly describing the bio-mechanical simulation workflow, i. e., the Simulation Calculation Pattern and the Simulation Result Interpretation Pattern.2 Simulation-specific process patterns significantly reduce the number of tasks scientists have to specify in a workflow. For instance, the two patterns depicted in Figure 1.2 abstract from seven original tasks of the bio-mechanical workflow shown in Figure 1.1. Furthermore, the simulation-specific patterns focus on use cases scientists are interested in, e. g., the execution of simulation calculations based on a mathematical model and the interpretation of simulation results. The patterns thus represent workflow building blocks that are particularly meaningful to scientists. According to the classification of parameter values shown in Figure 6.6, scientists may use simulation-specific values for virtually all parameters of these patterns. These values correspond to artifacts or their properties scientists already know from their domain-specific simulation methodology and are thus familiar with. Major examples of such artifacts are mathematical simulation models and simulation methods, e. g., numerical methods such as the FEM. The simulation artifact registry shown in Figure 6.1 manages metadata describing relevant simulation models and simulation methods, thereby helping scientists to parameterize simulation-specific process patterns. For instance, metadata describing simulation models may indicate which mathematical variables of a particular model are valid parameter values of the patterns shown in Figure 1.2.

2 Simulation-specific process patterns that may alleviate the design of the more complex coupling workflow are depicted in Figure 6.8 on page 188 and discussed in Section 6.5.1.

The metadata-based description of domain-specific simulation models and simulation methods may in turn rely on domain-specific languages (DSLs). Ontologies are a good basis for such DSLs, at least if they already exist in the relevant application domain. For instance, Cook et al. and Dao et al. propose ontologies representing biological or bio-mechanical issues [CMJNG08, DMHBT07]. These ontologies – together with a generic ontology for the domain of systems biology3 – have been successfully refined into specialized ontologies describing the simulation models and methods of the running example of a bone simulation [Boh14, vS15a, vS15b]. This way, this thesis also transfers the core idea of ontology-based scientific workflows [LAG03, MCD+05, WAH+07, dOCT+09, dOOD+12] to simulation workflow design and augments this idea by the pattern-based approach. More precisely, scientists may use corresponding ontological terms or concepts to specify parameter values of simulation-specific process patterns. Another prominent example where ontologies already exist is the scientific domain of life sciences [SWLG04]. Some other scientific domains at least employ shared vocabularies or taxonomies, which may likewise be the basis to develop ontologies describing simulation models and related methods. Note that shared vocabularies or taxonomies are often the solution of choice for domain-specific metadata managed by SDM systems (see Section 6.2.3). Hence, SDM systems may in some cases also be a foundation to develop parts of the metadata managed by the simulation artifact registry shown in Figure 6.1.
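As an illustration of how such ontology-based metadata could be queried to suggest valid parameter values to scientists, the following Java sketch issues a SPARQL query via Apache Jena. The ontology file, namespace, and property names are hypothetical and are not taken from the ontologies cited above; the sketch merely shows the general idea under these assumptions.

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class ModelVariableLookup {
    public static void main(String[] args) {
        // Load a (hypothetical) ontology describing simulation models and their variables.
        Model ontology = ModelFactory.createDefaultModel();
        ontology.read("bone-simulation-models.ttl");

        // Ask for all mathematical variables of one particular simulation model.
        String sparql = """
            PREFIX sim: <http://example.org/simulation#>
            SELECT ?variable WHERE {
              sim:BioMechanicalBoneTissueModel sim:hasMathematicalVariable ?variable .
            }""";

        try (QueryExecution exec = QueryExecutionFactory.create(QueryFactory.create(sparql), ontology)) {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                // Each result is a candidate parameter value for a simulation-specific pattern.
                System.out.println(results.next().get("variable"));
            }
        }
    }
}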

6.3.2 Simulation-oriented Data Management Patterns

Table 6.2 shows examples of simulation-oriented data management patterns that retain the abstraction level of simulation-specific values for most of their parameters. So, they may still be parameterized by scientists. Nevertheless, these patterns constitute workflow building blocks that focus on use cases related to data management, especially to data provisioning and data exchange. As depicted in Figure 6.5, this is where simulation process experts come into play, who may assist scientists in properly combining these patterns in their workflows. The Simulation-oriented Data Provisioning Pattern is also exemplified on the second abstraction level of patterns shown in Figure 6.3. It abstracts from data provisioning processes for simulation calculations or result interpretations, i. e., as part of the simulation-specific process patterns Simulation Calculation or Simulation Result Interpretation.

3 Systems Biology Ontology: http://www.ebi.ac.uk/sbo/main/

Table 6.2: Simulation-oriented Data Management Patterns; cf. [RSM14a]. The notation <..> indicates parameter cardinality.
• Simulation-oriented Data Provisioning – Simulation model <1>; Mathematical variables <1..n>; Target <1>: reference to software instance or service
• Simulation-oriented Data Exchange – Simulation models <2>; Relationships between mathematical variables: from first to second model <1..n>, from second to first model <0..n>
• Parameter Sweep – Parameter List <1>; Operation <1>: simulation model, service, or workflow fragment to be executed for each parameter in the list; Completion Condition <0..1>; Parallel <0..1>: “yes” / “no”, default is “no”

The data to be provisioned is represented by a simulation model and by a set of its mathematical variables. In case of providing data for simulation calculations, these mathematical variables correspond to input variables of the simulation model, e. g., its parameter variables as shown in Figure 2.1. When data is provided for result interpretations, the common way is to use unknown variables of a simulation model. Only the target of the data provisioning needs to be specified via a less abstract software- or service-specific parameter value. This target specification is usually a reference to a software instance or to a service that needs the data as input. Note that scientists are often still quite familiar with such software or service references. The reason is that they employ software or services in their everyday life to conduct simulation calculations. Furthermore, the simulation artifact registry again manages metadata describing simulation software and services and thereby provides scientists with suggestions for proper software or service references. Metadata describing software and services as simulation artifacts may be based on common repository or registry solutions [Ley03, Ley05]. For instance, they may be backed up by mature database technology [Pie12]. The Simulation-oriented Data Exchange Pattern reflects the scenario where data has to be exchanged between two mathematical simulation models that are coupled together. Scientists may specify data dependencies between these two models completely via simulation-specific parameter values [RSM14b]. More precisely, these data dependencies correspond to the relationships between the mathematical variables of the two models, e. g., as depicted for the bio-mechanical and systems-biological models in Figure 2.1 [vS15b]. The relationships between mathematical variables have to be specified at least in a unidirectional way, i. e., from the first to the second model. This is sufficient when the calculation of the second model is not started until the calculation of the first model has been finished. For instance, this is the case in the coupling process shown in Figure 3.2 executing bio-mechanical and systems-biological calculations one after another. The other possibility is to define the relationships in a bidirectional way, i. e., additionally from the second to the first model. This may be necessary when applying more complex coupling algorithms [UGM14]. For example, it is relevant when both simulation models are calculated concurrently and need to exchange their data mutually at certain integration steps. The Parameter Sweep Pattern supports processes that iterate over a list of simulation-specific parameters. Furthermore, it carries out an operation for each parameter in this list. This pattern for instance occurs in the coupling workflow shown in Figure 5.9. Here, the list of daily routines represents a simulation-specific parameter list. The outer loop iterating over this list of daily routines, as well as the workflow tasks embedded into this loop correspond to the operation of the pattern. So, this operation is specified as a sophisticated workflow fragment containing the loop and its embedded tasks. Another option is to provide a reference to a service representing the operation. Nevertheless, scientists may also use a convenient simulation-specific value, i. e., a mathematical simulation model.
This means that the simulation model is to be calculated for each parameter in the specified list.4 An optional completion condition of the pattern defines whether and when the iteration shall be finished before the whole parameter list is processed [OAS07]. Finally, another pattern parameter indicates whether several instances of the specified operation shall be executed in parallel or not.
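As a rough illustration of how these simulation-oriented patterns and the cardinalities from Table 6.2 might be captured as data structures, consider the following Java sketch. The record names and the string-typed parameters are hypothetical simplifications and do not reflect the prototype's actual pattern model; <1..n> parameters become lists and <0..1> parameters become Optional or default-valued fields.

import java.util.List;
import java.util.Optional;

// Simulation-oriented Data Provisioning Pattern
record DataProvisioningPattern(
        String simulationModel,              // <1>
        List<String> mathematicalVariables,  // <1..n>
        String target) { }                   // <1>: reference to software instance or service

// Simulation-oriented Data Exchange Pattern
record DataExchangePattern(
        List<String> simulationModels,               // <2>
        List<String> firstToSecondRelationships,     // <1..n>
        List<String> secondToFirstRelationships) { } // <0..n>

// Parameter Sweep Pattern
record ParameterSweepPattern(
        List<String> parameterList,            // <1>
        String operation,                      // <1>: model, service, or workflow fragment
        Optional<String> completionCondition,  // <0..1>
        boolean parallel) { }                  // <0..1>, default is "no"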

6.3.3 Basic Data Management Patterns

On the way to executable workflows, these simulation-oriented patterns are intermediately mapped onto the basic data management patterns illustrated in Section 3.3. Table 6.3 summarizes the parameters to be specified for these basic patterns.

4 More details on how this kind of operation specification works are given in Section 6.5.2.

Table 6.3: Basic Data Management Patterns; cf. [RSM14a]. The notation <..> indicates parameter cardinality.
• Data Transfer and Transformation – Sources <1..n>: references to data containers; Targets <1..n>: references to data containers; Dependencies from sources to targets <0..n>: schema mappings, inter-ontology mappings, or ETL operations
• Data Iteration – Data set <1>: one or a set of data containers; Operation <1>: service or workflow fragment to be executed for each relevant subset of the data set; Resources <0..n>: e. g., references to data resources; Completion Condition <0..1>; Parallel <0..1>: “yes” / “no”, default is “no”; Data split <0..1>: data-specific partitioning mode or parameters of a Data Transfer and Transformation Pattern; Data merge <0..1>: similar to data split

Data Transfer and Transformation Patterns (see also Figure 3.4) may be used to implement Simulation-oriented Data Provisioning Patterns and Simulation-oriented Data Exchange Patterns. The sources and targets of the underlying data provisioning processes are specified as references to data containers, e. g., to database tables or to files. Corresponding references to data are classified as data-specific parameter values according to Figure 6.6. They are situated at a lower abstraction level, since scientists are usually more familiar with software and services than with references to heterogeneous data. Structural and/or semantic dependencies from the sources to the targets may be specified using the solutions to data exchange discussed in Sections 5.1.3 and 6.2.4. So, this covers schema mappings [Kol05, ABLM14], inter-ontology mappings [WVV+01], or even specifications of sophisticated ETL operations [KRRT98, LN07]. Figure 6.6 classifies such values of pattern parameters relying on solutions to data exchange as ETL-specific values. They cover many low-level details describing complex data transformation operations. Hence, they exhibit the lowest degree of abstraction of all classes of parameter values for patterns.

The major use case of Data Iteration Patterns depicted in Figures 3.5 and 3.6 is to provide more low-level details of simulation-oriented Parameter Sweep Patterns. References to one or more data containers specify the data set over which a Data Iteration Pattern shall iterate. Furthermore, the pattern indicates an operation for which this data set or relevant subsets of it serve as input, e. g., a particular service. The operation may be executed on a set of possibly distributed resources. Again, this execution may optionally end as soon as a completion condition holds, and the operation may be executed either in parallel or not. A data split parameter may define how to partition the data set among the resources. A possible parameter value is a data-specific partitioning mode, e. g., according to the equal distribution of tuples in a database table [Pie12]. As an alternative, the data split may also be defined via ETL-specific parameters of a Data Transfer and Transformation Pattern. In this case, it may additionally cover operations for preparing the data set prior to its distribution, e. g., for filtering the data. In a similar way, a data merge parameter may define how to integrate the results of the operation back into the original data set. In summary, the majority of the parameters of basic data management patterns have to be defined using data-specific or even more low-level ETL-specific values. So, these parameters have a strong relation to underlying data resources and data transformation operations.
According to the separation of concerns depicted in Figure 6.5, data engineers may use their knowledge to provide other persons with templates of basic data management patterns and their low-level parameterization. Thereby and as shown in Figure 6.6, relevant simulation artifacts that assist data engineers in the data-specific and ETL-specific pattern parameterization are data resources. The metadata describing these data resources in the simulation artifact registry may be based on those illustrated in Section 5.3.3 [RRS+11].

6.3.4 Executable Workflow Fragments

Executable workflow fragments finally realize the patterns discussed above. Examples of such workflow fragments are the individual parts of the bio-mechanical simulation workflow shown in Figure 5.8 that are respectively colored blue, red, green, or purple. Executable workflow fragments usually contain many complex implementation details. In particular, they may employ various low-level data provisioning techniques, e. g., represented as data services or as the data management activities summarized in Table 5.2. As shown in Figure 6.5, workflow or service engineers may incorporate their skills to provide templates of workflow fragments and to properly combine different data provisioning techniques in them. Nevertheless, they may also need help from data engineers to design very complex data provisioning tasks. This is for instance necessary for specifying the sophisticated SQL SELECT statement shown in Listing 5.1.

6.4 Rule-based Pattern Transformation

The proposed pattern-based approach to simulation workflow design needs to be complemented by a strategy to transform abstract patterns into executable workflows. This strategy is provided by the rule-based pattern transformer depicted in Figure 6.1 and by its extensible set of rewrite rules. The following subsections discuss major design considerations of this rule-based transformation of patterns. This includes (1) its general processing model and (2) a discussion under which circumstances patterns should be transformed during either design time or runtime of workflows.

6.4.1 Rule-based Processing Model

The general processing model of the rule-based pattern transformation, as shown in Figure 6.7, extends some basic ideas proposed by Vrhovnik et al. [VSS+07]. The rule-based pattern transformer traverses the graph of parameterized patterns given by the relevant workflow model. For each pattern in this graph, the pattern transformation engine tries to apply rewrite rules. Thereby, the goal is to recursively map patterns over the pattern hierarchy depicted in Figure 6.5 onto more concrete patterns and finally onto executable workflow fragments. The application of rewrite rules to patterns is governed by several rule sequences. For instance, each level of the pattern hierarchy shown in Figure 6.5 may be associated with a particular rule sequence. It defines the set of rules that may generally be applied to patterns of the relevant abstraction level. Furthermore, it determines the order in which these rules are tested for applicability. In other words, the first rule in a sequence that is found to be applicable is actually applied to the relevant pattern, while all remaining rules are neglected. Note that the order of rule application determined by a rule sequence has to reflect the guidelines proposed in Section 4.3.1 and depicted in Figure 4.4. For instance and according to guideline 1, rewrite rules that are firstly tested for applicability should lead to workflow fragments that avoid using local data processing steps. Each rewrite rule includes a condition part that specifies the circumstances under which the remaining parts of the rule may be applied to the given pattern. The conditions may for instance depend on certain parameter values of the pattern [Pie12]. Depending on the concrete goal of their application, rewrite rules come in two different types.


Figure 6.7: Processing model of the rule-based transformation of patterns into executable workflows; cf. [RSM14a].

A rewrite rule of type 1 leads to a workflow fragment providing more low-level details for the pattern. Its fragment part identifies a template of such a workflow fragment, e. g., via a query to a workflow fragment library [SKK+11]. The action part then defines transformation steps that add implementation details to the workflow fragment. Thereby, this action part enriches the previously aggregated information about data management operations with descending levels of the pattern hierarchy shown in Figure 6.5. It therefore may also need to map parameter values of patterns from high to low abstraction levels according to the classification given in Figure 6.6. As discussed in Section 6.1, the simulation artifact registry shown in Figure 6.1 supports this mapping of parameter values by providing additional metadata describing dependencies between individual kinds of simulation artifacts. Finally, the rule application replaces the pattern in the workflow model with the resulting workflow fragment. This workflow fragment may either be completely executable or it may embed other patterns. The first case finishes the transformation of the relevant pattern. In the second case, the pattern transformation engine recursively applies further rewrite rules to the workflow fragment and to its embedded patterns until all final workflow fragments are executable.

Rewrite rules of type 2 do not directly lead to workflow fragments realizing a pattern. Instead, they establish a hierarchical rule approach in that they identify further rule sequences to be evaluated on the pattern. This may be suitable, e. g., for multi-stage optimization decisions. Here, rules of type 2 may firstly evaluate functional requirements of a pattern. The rules in the resulting rule sequence may then be of type 1 and may refer to candidate workflow fragments that fulfill these functional requirements. Furthermore, these rules of type 1 evaluate non-functional requirements to select an optimal workflow fragment.
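The following Java sketch outlines this processing model in simplified form. All interfaces and method names are hypothetical and do not reflect the prototype's actual API; the sketch merely illustrates the interplay of rule sequences, type 1 rules (condition, fragment, action parts), type 2 rules, and the recursive transformation of embedded patterns.

import java.util.List;

interface Pattern { }
interface WorkflowFragment {
    List<Pattern> embeddedPatterns();
    WorkflowFragment replace(Pattern pattern, WorkflowFragment realization);
}

interface RewriteRule {
    boolean isApplicable(Pattern pattern);                   // condition part
}
// Type 1: condition + fragment + action parts yield a (possibly pattern-bearing) fragment.
interface FragmentRule extends RewriteRule {
    WorkflowFragment fetchTemplate(Pattern pattern);         // fragment part, e. g., library query
    WorkflowFragment addDetails(WorkflowFragment template, Pattern pattern); // action part
}
// Type 2: condition part plus a reference to a further rule sequence to be evaluated.
interface DelegatingRule extends RewriteRule {
    List<RewriteRule> nextSequence(Pattern pattern);
}

class PatternTransformer {
    /** Applies the first applicable rule of the sequence and recurses until the result is executable. */
    WorkflowFragment transform(Pattern pattern, List<RewriteRule> sequence) {
        for (RewriteRule rule : sequence) {
            if (!rule.isApplicable(pattern)) continue;       // rules are tested in sequence order
            if (rule instanceof DelegatingRule delegating) { // type 2: hierarchical rule approach
                return transform(pattern, delegating.nextSequence(pattern));
            }
            FragmentRule fragmentRule = (FragmentRule) rule; // type 1: produce a workflow fragment
            WorkflowFragment fragment =
                    fragmentRule.addDetails(fragmentRule.fetchTemplate(pattern), pattern);
            // Recursively transform embedded patterns until all fragments are executable.
            for (Pattern embedded : fragment.embeddedPatterns()) {
                fragment = fragment.replace(embedded, transform(embedded, sequenceFor(embedded)));
            }
            return fragment;
        }
        throw new IllegalStateException("No applicable rewrite rule for pattern " + pattern);
    }

    List<RewriteRule> sequenceFor(Pattern pattern) {
        // Look up the rule sequence associated with the pattern's abstraction level (stubbed here).
        return List.of();
    }
}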

6.4.2 Point in Time for Pattern Transformation

In several scenarios, a pattern transformation exclusively during design time of workflows is not feasible, as discussed by Bohrn [Boh15]. For instance, rewrite rules might embed optimization decisions that require accurate information on the size of data to be processed by the final executable workflow [BHW+07, Kal15]. However, this information on data sizes is often not available during design time. Nevertheless note that such optimization decisions may usually be postponed until basic data management patterns have to be transformed into executable workflows [Kal15]. This argues for a hybrid approach, where simulation-specific process patterns and simulation-oriented data management patterns are firstly transformed during design time of workflows. Only the subsequent basic data management patterns may then call for their transformation during runtime – at least in case this is necessary due to some optimization decisions. Another, even more important motivation to transform patterns during runtime of workflows is given by the fact that scientists often make ad-hoc changes to workflows at runtime [SK10] (see Section 1.2.5). So, they may also make ad-hoc changes to high-level simulation-specific process patterns. This even makes it necessary to re-iterate the pattern transformation over all abstraction levels of the pattern hierarchy shown in Figure 6.5 during runtime [Boh15]. From a technical point of view, a pattern transformation during runtime is possible, because the approach presented by Gómez Sáez et al. enables a dynamic modification and replacement of workflow parts [GSAH+15]. However, the overhead caused by the pattern transformation then also affects the duration of workflow executions. Section 6.6.4 therefore discusses experimental results showing that the usual overhead caused by the pattern transformation is negligible compared to the typical duration of simulation workflows. 6.5 Prototype and its Application to the Bone Simulation 187

6.5 Prototype and its Application to the Bone Simulation

The prototype of the proposed pattern-based approach to simulation workflow design has again been developed in the course of various student projects that have accompanied the work on this thesis [Ari12, Pie12, Boh14, Rie14, Boh15, Kal15, vS15a, vS15b]. It is based on the prototype presented in Sections 4.4.3 and 5.4. The data management (DM) patterns plug-in shown in Figure 6.1 extends the Eclipse BPEL Designer version 0.8.0 with multiple kinds of patterns of all abstraction levels shown in Figure 6.5. This especially includes the patterns shown in Figures 1.2 and 6.8, as well as those summarized in Tables 6.2 and 6.3 [Pie12, Boh14, Rie14, vS15b]. All other system components highlighted via gray color in Figure 6.1 are implemented as separate Java-based Web Services, which are deployed on Apache Tomcat version 7.0 using Axis2 version 1.5.4. As discussed in Section 5.4, the simulation artifact registry uses a PostgreSQL version 9.2 database system to manage metadata describing data resources. This also holds for metadata describing data services [Pie12]. Because properties of simulation models, simulation methods, and partly also of simulation software are often domain-specific, the prototype relies on ontologies to realize corresponding metadata [Boh14, vS15a, vS15b]. These ontologies are managed using Apache Jena5 version 2.11.2. The dependencies between individual kinds of simulation artifacts are covered by foreign key definitions or by ontological annotations. The Java-based Web Service implementing the rule-based pattern transformer extends the rule engine Drools6. These extensions support all rewrite rules and rule sequences that are necessary to transform the patterns implemented within the prototype [Pie12, Boh14, Rie14, Boh15, Kal15, vS15b]. Bohrn especially designed the pattern transformer so that it enables the transformation of patterns not only during design time of workflows, but also during their runtime [Boh15]. Thereby, the replacement of patterns with more low-level workflow fragments is based on the prototype presented by Gómez Sáez et al. [GSAH+15]. This prototype and all its patterns and rewrite rules have been applied to each individual workflow of the running example of a bone simulation [Pie12, Boh14, Rie14, vS15b]. Furthermore, this has been the basis to assess at a conceptual level to what extent the patterns proposed by this thesis may alleviate workflow design for the other simulation examples listed in Section 3.1.

5 Apache Jena: https://jena.apache.org/
6 Drools: http://www.drools.org/

For instance, Pietranek discusses corresponding assessment results regarding the simulations of catalysis reactions introduced by Rommel et al. [RK11, Pie12]. Nevertheless and in analogy to the reasons discussed in Section 4.3.2, the bone simulation is again well-suited to illustrate the major aspects of applying the pattern-based approach to real simulations. Section 6.1 already covers the application to the bio-mechanical workflow depicted in Figures 1.1 and 1.2. Hence, the following subsections discuss important aspects regarding the more complex coupling workflow shown in Figures 4.6 and 5.9. Firstly, Section 6.5.1 illustrates a high-level simulation-specific process pattern making the design of this workflow especially tailor-made for scientists. The focus of Section 6.5.2 is then on exemplifying the rule-based transformation of this high-level pattern into an executable workflow.

Figure 6.8: Simulation-specific process pattern to alleviate the design of the coupling workflow shown in Figures 4.6 and 5.9; cf. [RSM14a]. The Simulation Model Coupling Pattern is parameterized as follows – Simulation Models: Bio-mechanical Bone Tissue, Systems-biological Cell Interaction; Bone: Right Femur of Person “XY”; Coupling Strategy: Motion Sequences Strategy; Daily Routine(s): 5 x Typical Working Day; Cycle Length: One Day.

6.5.1 Simulation-specific Process Pattern

The Simulation Model Coupling Pattern shown in Figure 6.8 describes the whole coupling workflow depicted in Figures 4.6 and 5.9 in a way that provides scientists with all the benefits listed in Section 1.2.2:
• It significantly reduces the number of workflow tasks scientists have to specify from a multiplicity of tasks shown in Figure 5.9 to only one pattern.
• This pattern represents one of the main use cases scientists are interested in, i. e., coupling two mathematical simulation models. So, it constitutes a workflow building block that is particularly meaningful to scientists.

• It likewise allows for its domain-specific and thus easy parameterization. In fact, each individual pattern parameter may be specified using simulation-specific values according to the classification given in Figure 6.6. So, these parameter values correspond to terms or concepts scientists already know from their domain-specific simulation models or simulation methods.
• Finally, the use case of coupling simulation models represented by the pattern is common for the simulation domain [GZC14, Gat14, UGM14]. The pattern in itself is therefore sufficiently generic to be re-usable in various simulation examples of different scientific domains.
The first parameter of the pattern shown in Figure 6.8 determines the simulation models to be coupled together. These are the bio-mechanical bone tissue model and the systems-biological cell interaction model also depicted in Figure 3.2. In addition, a parameter that is specific to these two models defines the concrete bone to be simulated, e. g., the right femur of a certain person. The next pattern parameter points to the coupling strategy that indicates the concrete process of how both simulation models are coupled together. The coupling strategy used in Figure 6.8 corresponds to the process shown in Figure 3.2. This coupling process firstly carries out several bio-mechanical calculations in parallel, i. e., one calculation for each typical motion sequence of the relevant person. Afterwards, ETL processes prepare the bio-mechanical simulation results for the subsequent systems-biological calculations. Thereby, the bio-mechanical results for individual motion sequences are composed into an idealized solution approximating the whole daily routines of the relevant person. The next parameter of the pattern shown in Figure 6.8 defines these daily routines and their composition of individual motion sequences. Moreover, the cycle length determines the frequency with which the coupling process re-iterates between both simulation models. The example considers a cycle length of one day and a total duration of five days, where each daily routine corresponds to the typical working day of the relevant person.
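As a simple illustration, the parameterization of Figure 6.8 could be captured by a data structure like the following Java sketch; the record and field names are hypothetical and do not correspond to the prototype's actual pattern model.

import java.util.Collections;
import java.util.List;

// Hypothetical representation of the Simulation Model Coupling Pattern and its parameters.
record SimulationModelCouplingPattern(
        List<String> simulationModels,  // simulation-specific values
        String bone,                    // parameter specific to the two coupled models
        String couplingStrategy,
        List<String> dailyRoutines,
        String cycleLength) { }

class CouplingPatternExample {
    public static void main(String[] args) {
        var pattern = new SimulationModelCouplingPattern(
                List.of("Bio-mechanical Bone Tissue", "Systems-biological Cell Interaction"),
                "Right Femur of Person \"XY\"",
                "Motion Sequences Strategy",
                Collections.nCopies(5, "Typical Working Day"),  // 5 x Typical Working Day
                "One Day");
        System.out.println(pattern);
    }
}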

6.5.2 Rule-based Pattern Transformation

Figure 6.9 illustrates the main transformation steps to map the Simulation Model Coupling Pattern shown in Figure 6.8 onto an executable workflow. The following subsections respectively exemplify the patterns, workflow fragments, and rewrite rules used at each individual abstraction level of the pattern hierarchy depicted in Figure 6.5. More details on this example are discussed by von Steht [vS15b].


Figure 6.9: Main transformation steps mapping the simulation-specific process pattern shown in Figure 6.8 onto an executable workflow; cf. [RSM14a, vS15b].

6.5.2.1 Simulation-oriented Data Management Patterns

The workflow fragment resulting from the Simulation Model Coupling Pattern after transformation step 1 consists of five simulation-oriented data management patterns. This workflow fragment and its patterns realize the coupling strategy indicated in Figure 6.8, i. e., the process depicted in Figure 3.2. The five patterns may again be parameterized completely via simulation-specific values. The simulation-specific parameter list of Parameter Sweep Pattern 1 corresponds to the list of daily routines that is also specified for the Simulation Model

Coupling Pattern shown in Figure 6.8, i. e., the five typical working days. Fur- thermore, the operation of the parameter sweep is represented by the embedded control flow of the four remaining patterns. In other words, Parameter Sweep Pattern 1 sequentially iterates over the list of daily routines and executes its four embedded patterns for each day. The control flow of these four embedded patterns thereby realizes exactly one coupling cycle depicted in Figure 3.2. The first embedded pattern, i. e., Parameter Sweep Pattern 2, iterates in parallel over the list of motion sequences of the respectively current day. The pattern moreover defines the bio-mechanical simulation model as its operation. This indicates that the bio-mechanical model is to be calculated for each individual motion sequence. The subsequent Simulation-oriented Data Exchange Pattern 1 specifies the data exchange from the bio-mechanical to the systems-biological model in a simulation-specific way. It therefore defines the relationships between their mathematical variables in terms of dependencies between the respective differential equations, i. e., as described by Krause [Kra14]. This is also indicated by the variable exchanges depicted by steps 1, 2, and 3 in Figure 2.1. Parameter Sweep Pattern 3 analogously defines the systems-biological simula- tion model as its operation. As discussed in Section 2.1.1, this simulation model may be calculated locally and thus concurrently at each integration point of the underlying finite element grid (see also Figure 2.2b). So, the pattern specifies the list of relevant integration points as its simulation-specific parameter list. Furthermore, it iterates over this list in parallel to account for the concurrent execution of individual systems-biological calculations. Afterwards, Simulation- oriented Data Exchange Pattern 2 abstracts from the process to exchange data from the systems-biological model back to the bio-mechanical model. Hence, the corresponding relationships between the respective mathematical variables are those indicated by steps 4 and 5 depicted in Figure 2.1[Kra14].

6.5.2.2 Basic Data Management Patterns

After transformation steps 2 to 6, the workflow consists of one service call and five basic data management patterns as shown in Figure 6.9. At this abstraction level, the patterns are specified mainly via service-, data-, and ETL-specific parameter values according to the classification given in Figure 6.6. The first rewrite rule maps the original Parameter Sweep Pattern 1 onto the afore-mentioned service call and onto the subsequent Data Iteration Pattern 1

(transformation step 2 in Figure 6.9). The service accesses a repository that delivers the list of daily routines including their motion sequences (see also Section 4.3.2.2). The workflow then stores this list into a local workflow variable. The subsequent Data Iteration Pattern 1 defines this local variable as its data set, i. e., it sequentially iterates over the stored list of daily routines. For each daily routine, it carries out its operation, which is again the embedded control flow consisting of the four remaining basic data management patterns. As discussed in Section 6.3.3 and shown in Figure 6.9, the remaining Parameter Sweep Patterns are implemented using Data Iteration Patterns (transformation steps 3 and 5). Furthermore, Data Transfer and Transformation Patterns provide more low-level details for the original Simulation-oriented Data Exchange Patterns (steps 4 and 6). The data set of Data Iteration Pattern 2 is the list of motion sequences of the current day, which is extracted from the overall list of daily routines. The operation corresponds to a service that carries out the bio-mechanical simulation using the Pandas software. This service may be implemented via a variant of the simulation workflow shown in Figure 5.8. Thereby, the pattern carries out one parallel instance of this operation for each motion sequence. As discussed in Section 4.3.2.2, the parallel operations are moreover distributed among several computers. Therefore, the resources parameter of the pattern points to a repository service delivering a list of available computers. Afterwards, Data Transfer and Transformation Pattern 1 adds implementation details to the data exchange from the bio-mechanical to the systems-biological model. The source data container of this pattern corresponds to the SQL database table of Pandas, while the set of CSV-based input files of Octave represents its target. Furthermore and since the SQL database table and CSV-based files have a well-defined structure, schema mappings are good candidates to describe low-level structural and algebraic dependencies between them [vS15b]. The set of CSV-based files being the target of Data Transfer and Transformation Pattern 1 corresponds to the data set of Data Iteration Pattern 3. The operation is a service that implements the systems-biological calculation via Octave and therefore gets the CSV-based files as input. Furthermore, another repository service delivers a list of resources, i. e., a list of available computers among which the systems-biological calculations shall be distributed. The data split parameter of the pattern moreover specifies how to partition its data set among these computers. This is based on a data-specific partitioning mode indicating the equal distribution of rows stored in the CSV-based input files.

Finally, Data Transfer and Transformation Pattern 2 uses data-specific and ETL-specific parameter values to describe the data exchange from the systems-biological model back to the bio-mechanical model. So, its source is the set of CSV-based output files of Octave. These files are imported into the target of the pattern, i. e., into the SQL database of Pandas. In addition, schema mappings again define the dependencies between the source and the target.
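To illustrate the data-specific partitioning mode mentioned above, i. e., an equal distribution of the rows of the CSV-based input files among the available computers, a minimal sketch could look as follows. It is merely illustrative and is not the prototype's implementation of the data split parameter.

import java.util.ArrayList;
import java.util.List;

class EqualRowSplit {
    /** Distributes CSV rows round-robin so that each computer receives an (almost) equal share. */
    static List<List<String>> split(List<String> csvRows, int numberOfComputers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numberOfComputers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int row = 0; row < csvRows.size(); row++) {
            partitions.get(row % numberOfComputers).add(csvRows.get(row));
        }
        return partitions;
    }
}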

6.5.2.3 Executable Workflow Fragments

Finally, rewrite rules of the last transformation steps 7 to 11 depicted in Figure 6.9 transform each basic data management pattern into an executable workflow fragment. These workflow fragments correspond to variants of individual parts of the coupling workflow depicted in Figure 5.9. For instance, Data Iteration Pattern 1 may be implemented via the outer loop iterating over the list of daily routines that has been previously loaded into a workflow variable. The workflow fragments realizing the other four basic patterns make use of workflow tasks that are similar to those embedded in the outer loop. More details on the resulting executable workflow are given by von Steht [vS15b].

6.6 Evaluation

The prototypical implementation illustrated above has again been the basis to evaluate the proposed pattern-based approach to simulation workflow design. As discussed in Section 4.4.3, the main goal of this pattern-based approach is to offer both the first and especially the fourth missing feature of currently available workflow systems summarized in Table 4.3. Section 6.6.1 therefore discusses how it offers the first missing feature, i. e., a domain-specific registry facilitating the search for data services and data resources. Section 6.6.2 correspondingly covers evaluation results regarding the fourth missing feature of an abstraction support for workflow design that is tailor-made for scientists. Note that the need for such an abstraction support also corresponds to the challenge discussed in Section 1.2.2, which is thus likewise considered in Section 6.6.2. The subsequent Sections 6.6.3 to 6.6.6 respectively discuss benefits or drawbacks of the proposed pattern-based approach regarding the remaining challenges illustrated in Sections 1.2.1 and 1.2.3 to 1.2.5. 194 Chapter 6 A Pattern-based Approach to Simulation Workflow Design

6.6.1 Domain-specific Registry for Data Services and Data Resources

As illustrated in Section 4.4.2.1, the simulation artifact registry shown in Fig- ure 6.1 is the major system component covering the first missing feature summa- rized in Table 4.3. It offers an interface that enables scientists to search for data services or data resources that match their particular requirements. Thereby, it provides scientists with the desired domain-specific and thus tailor-made means to submit corresponding queries. This is facilitated by the metadata describing simulation models, simulation methods, simulation software, as well as their de- pendencies to data services and data resources [vS15a]. In other words, scientists may search for data services or data resources by actually querying the simulation models, methods, or software they are more familiar with. For instance, they may query the concrete data resources or data containers storing the data that represent the mathematical input variables of their simulation models [vS15a]. Another means to search for proper data services or data resources in a domain- specific way is provided by the patterns themselves. In fact, the pattern-based representations of workflows, e. g., as depicted in Figures 1.2 and 6.8, conform to the vision of high-level workflow sketches proposed by Cohen-Boulakia and Leser [CBL11]. Scientists may use the pattern-based approach introduced in this chapter to design such workflow sketches as high-level descriptions of the simulation process they have in mind. The rule-based pattern transformer then uses the metadata managed by the simulation artifact registry to map individual patterns to concrete workflow fragments. These workflow fragments finally access the data services and/or data resources that support the requirements that have been previously claimed by scientists via the domain-specific patterns.

6.6.2 Multiplicity and Complexity of Low-Level Data Management Operations

The following subsections discuss to what extent the proposed pattern-based approach offers the four benefits listed in Section 1.2.2. The discussion regarding the first benefit, i. e., the claim for a reduced number of tasks to be specified in workflows, is covered by Section 6.6.2.1. Afterwards, Section 6.6.2.2 jointly discusses evaluation results regarding the second and third benefit. So, this concerns the question whether the proposed patterns constitute workflow building

blocks that are meaningful to scientists or to other persons involved in workflow design. Another aspect is whether the patterns allow for a domain-specific and thus easy parameterization. The focus of Section 6.6.2.3 is finally on the fourth benefit. Hence, it discusses whether the proposed patterns are sufficiently generic to be re-used in different simulation examples of various scientific domains.

Figure 6.10: Number of workflow tasks to be specified for the coupling workflow at different abstraction levels shown in Figure 6.9; the partitioning among different types of workflow tasks is shown in pie charts; “DM” means “data management”; cf. [RSM14a].

6.6.2.1 Reduced Number of Workflow Tasks

Figure 6.10 summarizes the number of workflow tasks to be specified for the coupling workflow at different abstraction levels of the pattern hierarchy, as shown in Figure 6.9. So, this encompasses the levels of the executable workflow depicted in Figures 4.6 and 5.9, of basic data management patterns, of simulation-oriented patterns, and of the simulation-specific process pattern shown in Figure 6.8. Furthermore, pie charts in Figure 6.10 illustrate the partitioning of these workflow tasks among abstract patterns and among different types of executable tasks. With an increasing design complexity, the executable tasks range from (1) simple loop constructs over (2) service calls and (3) BPEL assign activities reflecting local data processing steps to (4) complex data management (DM) activities. If scientists were designing the executable coupling workflow depicted in Fig- ures 4.6 and 5.9, they would need to define a total number of 15 workflow tasks. Furthermore, this would force them to specify many complex low-level details. 196 Chapter 6 A Pattern-based Approach to Simulation Workflow Design

In particular, they would need to realize 6 sophisticated tasks for BPEL assigns and data management activities with complex XPath, XQuery, or SQL statements. Since scientists are typically not familiar with defining such low-level statements, they would not accept this huge design effort [RSM14b]. As shown in Figure 6.10, the three pattern abstraction levels remove the burden from scientists to specify many complex executable workflow tasks. Only the level of basic data management patterns includes one service call with moderate design complexity. Beyond that, scientists may rely on a small number of abstract patterns to describe the whole coupling workflow. Especially the level of simulation-specific process patterns reduces the total number of workflow tasks from 15 original tasks to a single pattern. Altogether and as indicated in Table 6.1, this constitutes a significantly higher reduction of the number of workflow tasks scientists have to specify than provided by related work.

6.6.2.2 Meaningful Workflow Building Blocks and their Domain-Specific Parameterization

A detailed analysis of the underlying pattern-based coupling workflows [Pie12, Rie14, vS15b] has been the basis to identify the numbers and complexities of pattern parameters to be specified at each individual abstraction level. Figure 6.11 summarizes the respective numbers of pattern parameters. Furthermore, it again uses pie charts to illustrate the partitioning of parameters according to their complexity, i. e., according to the parameter classes given in Figure 6.6. The class “others” summarizes a few parameters that are not covered by Figure 6.6. In the example, this encompasses simple parameters of Parameter Sweep Patterns or Data Iteration Patterns indicating a parallel execution of an operation. The basic data management patterns summarized in Table 6.3 consider low- level use cases related to data transfers, data transformations, and data iterations. As discussed in Section 6.3.3 and depicted in Figure 6.5, they hence constitute workflow building blocks that are particularly meaningful to data engineers. Typically, these data engineers also have the necessary skills to cope with the high number and complexity of pattern parameters to be specified at this abstraction level. In fact, the basic patterns used to describe the coupling workflow mainly require low-level data- and ETL-specific parameter values, namely 13 of all 19. Note that scientists would usually be overwhelmed with this high number and especially with the high complexity of pattern parameters. 6.6 Evaluation 197

Figure 6.11: Number of pattern parameters to be specified for the coupling workflow at different abstraction levels shown in Figure 6.9; the partitioning among different classes of parameter values according to Figure 6.6 is shown in pie charts; “DM” means “data management”; cf. [RSM14a].

The simulation-oriented data management patterns summarized in Table 6.2 at least moderately reduce the total number of necessary parameters to 16. The most important advantage of these patterns is that they completely neglect low-level data- and ETL-specific parameter values. Instead, they mainly consider simulation-specific values that abstract from any implementation details and that are particularly suitable to be specified by scientists. The simulation-oriented patterns however focus on use cases related to data provisioning, data exchange, or parameter sweeps. On the one hand, these use cases are more common than the low-level use cases considered by basic data management patterns. On the other hand, simulation-oriented patterns still focus on data management issues. Hence and as shown in Figure 6.5, scientists might need help from other persons, e. g., simulation process experts, to combine these patterns in their workflows. Simulation-specific process patterns are even more abstract and completely domain-specific. These high-level patterns are particularly meaningful to sci- entists conducting simulations, as they represent the main use cases they are interested in. For instance, the use cases and patterns supported by the proto- type illustrated in Section 6.5 cover (1) the execution of simulation calculations based on a mathematical model, (2) the interpretation of simulation results, and (3) coupling two simulation models [Boh15, vS15b]. In the considered example, the simulation-specific process patterns even reduce the number of parameters scientists have to specify by a factor of around 3. Thereby, the remaining 6 parameters only exhibit simulation-specific values scientists are familiar with. 198 Chapter 6 A Pattern-based Approach to Simulation Workflow Design

Especially the fact that the proposed patterns represent meaningful workflow building blocks is a unique selling point compared to related work (see Table 6.1). The simulation-specific process patterns and the accompanying reduced number and complexity of both workflow tasks and pattern parameters entail a consider- able simplification for simulation workflow design. Furthermore, the distribution of these numbers and complexities among different abstraction levels of patterns complies with the skills of the respective persons associated with the levels in Figure 6.5. This finally highlights the suitability of the separation of concerns introduced by the proposed pattern hierarchy. In contrast to related work as- sessed in Table 6.1, this holistic separation of concerns provides a systematic framework to incorporate various skills of multiple kinds of persons. All this is also confirmed by Bohrn, who discusses similar evaluation results regarding the smaller example of the bio-mechanical simulation workflow [Boh14].

6.6.2.3 Generic Workflow Patterns

Another benefit of the approach is the generality of proposed patterns, which enables their seamless adoption in different simulation examples of various scientific domains. This generality of patterns has been investigated by conceptually applying them to the remaining examples listed in Section 3.1.

The simulation-specific process patterns depicted in Figures 1.2 and 6.8 may be used to describe the majority of process steps in all considered examples. This is because the use cases they represent are common for the simulation domain. The patterns for simulation calculations and result interpretations depicted in Figure 1.2 represent the most important high-level process steps in any simulation example. Moreover, novel applications increasingly rely on coupling different simulation models – as reflected by the pattern shown in Figure 6.8 – in order to produce more precise results [Gat14, GZC14, UGM14].

The model reduction example introduced by Fehr et al. is one of the rare applications that benefits from a new simulation-specific process pattern [FE11]. Figure 6.12 shows this pattern, its parameterization, and its transformation through the pattern hierarchy. The purpose of model reductions is to reduce the number of degrees of freedom in a numerical simulation model in order to speed up subsequent simulation calculations. So, the parameters of the pattern specify (1) the simulation model to be reduced, (2) the concrete reduction method to be employed, (3) the desired number of degrees of freedom, and (4) the required quality of the reduced model.

Simulation Model Reduction Pattern
• Simulation Model: Bio-mechanical Bone Tissue Model
• Reduction Method: Krylov Reduction
• Desired No. of Degrees of Freedom (DoFs): 500
• Required Quality: 95 % of Original Model

Parameter Sweep Pattern
• Parameter List: Counter for No. of DoFs starting with 500
• Operation: Krylov Reduction Service
• Completion Condition: 95 % of Original Model

Data Iteration Pattern
• Data Set: Workflow Variable serving as Counter
• Operation: Krylov Reduction Service
• Completion Condition: Data Quality Service

Figure 6.12: Patterns and their transformation to alleviate workflow design in the model reduction example described by Fehr et al. [FE11].

Both simulation-oriented data management patterns and basic data management patterns represent even more generic use cases related to data management. So, they may likewise be used in other examples listed in Section 3.1, e. g., to describe all process steps for data provisioning and data exchange. Thereby, the simulation-specific process patterns of these examples are mapped to the patterns at lower hierarchy levels in a similar way as depicted in Figures 6.3 and 6.9. This even holds for the new pattern of the model reduction example shown in Figure 6.12. Here, a Parameter Sweep Pattern and subsequently a Data Iteration Pattern may provide more low-level details of the Simulation Model Reduction Pattern. The parameter list of the Parameter Sweep Pattern corresponds to a counter for the number of degrees of freedom. This counter starts with the desired number specified for the Model Reduction Pattern and increments this number in each iteration. The operation is a service realizing the reduction method, e. g., based on Krylov subspaces [Bai02]. The completion condition represents the required quality of the reduced model, i. e., the reduction finishes when this quality is reached. The Data Iteration Pattern specifies the superordinate parameter list as a workflow variable serving as counter. While the operation parameter is not changed, the pattern provides more low-level details on how to check the completion condition. This is based on a data quality service that validates the quality of the reduced model in each iteration [RBD+11, RBKK14]. The executable workflow finally realizing these patterns is described by Remppis [Rem11].
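To make this mapping concrete, the following is a minimal Java sketch of the iteration logic that the executable workflow eventually realizes for the patterns in Figure 6.12. The service interfaces, the increment per iteration, and all names are illustrative assumptions, not the implementation described by Remppis [Rem11].

```java
/**
 * Minimal sketch of the loop realized for the Parameter Sweep and Data Iteration
 * Patterns of Figure 6.12. The services and the increment step are hypothetical.
 */
public class ModelReductionLoop {

  interface KrylovReductionService {            // operation parameter
    ReducedModel reduce(String simulationModel, int degreesOfFreedom);
  }

  interface DataQualityService {                // realizes the completion condition
    double quality(ReducedModel reducedModel);  // 0.0 .. 1.0 relative to the original model
  }

  static final class ReducedModel { /* opaque handle to a reduced model */ }

  static ReducedModel reduceUntilQualityReached(KrylovReductionService reduction,
                                                DataQualityService qualityCheck) {
    final double requiredQuality = 0.95;        // "95 % of Original Model"
    int dofCounter = 500;                       // parameter list: counter starting with 500
    ReducedModel model;
    do {
      model = reduction.reduce("Bio-mechanical Bone Tissue Model", dofCounter);
      dofCounter += 50;                         // hypothetical increment per iteration
    } while (qualityCheck.quality(model) < requiredQuality);
    return model;
  }

  public static void main(String[] args) {
    // Dummy services for illustration: quality grows with each iteration.
    KrylovReductionService reduction = (model, dof) -> new ReducedModel();
    DataQualityService quality = new DataQualityService() {
      private int calls = 0;
      public double quality(ReducedModel m) { return Math.min(1.0, 0.90 + 0.02 * ++calls); }
    };
    reduceUntilQualityReached(reduction, quality);
  }
}
```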

Simulation-oriented data management patterns and their parameters summarized in Table 6.2 are still tailored to computer-based simulations. Nevertheless, the basic data management patterns are completely generic. In fact, they may be adopted in any data provisioning or data exchange process, independent of the application area. This is because they are specified by service-, data-, or ETL-specific parameter values only (see Figure 6.6), which do not have a specific connection to any application area.

This increased generality of the basic patterns especially makes it possible to adopt them in other kinds of data-intensive workflows classified in Figure 2.6 as well. Corresponding data analysis, data curation, or data integration workflows frequently comprise complex tasks for data transfers and data transformations [RLSR+06]. So, they may likewise benefit from the abstraction support provided by Data Transfer and Transformation Patterns. Moreover, data-intensive workflows often reflect data iterations. For instance and as discussed in Section 3.3.2, a Data Iteration Pattern may be used to express nearly the whole data analysis workflow illustrated by Wagner [Wag10]. Another example described by Ogasawara et al. analogously iterates over a list of files and carries out an analytical operation for each file [OdOV+11].

6.6.3 Diversity of Available Data Provisioning Techniques

The proposed pattern-based approach provides scientists with a small and reasonable set of different kinds of workflow building blocks and thus adequately meets the challenge discussed in Section 1.2.1. As depicted in Figure 6.5, these scientists usually only have to be aware of the simulation-specific process patterns and of the simulation-oriented data management patterns. The current set of patterns encompasses merely three simulation-specific patterns shown in Figures 1.2 and 6.8, as well as three simulation-oriented patterns summarized in Table 6.2. Furthermore, scientists are often quite familiar with the meanings of these patterns, as discussed above. Even the set of basic data management patterns, which are typically employed by data engineers, comprises only two different workflow building blocks as shown in Table 6.3.

Note that the pattern plug-in of the workflow design tool shown in Figure 6.1 is nevertheless designed to be seamlessly extensible by additional kinds of patterns [Pie12]. This may for instance be necessary to add support for the Simulation Model Reduction Pattern shown in Figure 6.12. However, the set of available patterns should always be concise in order not to overburden scientists.

Table 6.4: Overhead of the rule-based pattern transformation for its worst-case duration of 0.49 seconds; cf. [RSM14a]

Number of tuples        1 million       10 million       100 million
Workflow duration       140 seconds     1,410 seconds    14,095 seconds
Worst-case overhead     0.35 %          0.035 %          0.0035 %

6.6.4 Efficient Data Processing and Optimization

The pattern-based approach to simulation workflow design may influence the efficiency of data processing in two ways. Firstly and as discussed in Section 6.4.2, the transformation of patterns into executable workflows may cause an overhead when applied at runtime of workflows. Secondly, the pattern-based approach may also exploit optimization techniques to make data processing in simulation workflows more efficient. The following subsections discuss these two aspects.

6.6.4.1 Overhead Caused by Pattern Transformation

The overhead of the rule-based transformation of patterns has been evaluated by conducting a set of experiments based on the prototype illustrated in Section 6.5 [RSM14a]. Thereby, the prototype ran on a 64 bit Windows Server 2008 system with 32 GB RAM and an Intel Xeon CPU with 8 cores. The experiments have been used to measure the total duration of the eleven transformation steps depicted in Figure 6.9. This total duration has been calculated as the average among the results of 100 consecutive experiment runs. To reflect a worst-case scenario, the prototypical implementation checks 100 rules for each of the eleven transformation steps before the 100th rule is actually applied. The resulting worst-case duration of the pattern transformation is 0.49 seconds.

Table 6.4 compares this worst-case duration of the pattern transformation with the typical duration of a simulation workflow. This typical duration of a simulation workflow has been determined by measuring the duration of executing the coupling workflow shown in Figure 5.9 for one daily routine. The workflow and the simulation services it calls ran on seven computers, i. e., one for the workflow system, three for the Pandas software, and three for Octave. Each computer has been equipped with the system resources described above. The workflow has been executed 10 times and for a varying number of tuples as Pandas typically stores them in its SQL database, i. e., from one million to 100 million tuples. As shown in Table 6.4, the average workflow duration correspondingly ranges from 140 to 14,095 seconds. So, the maximum overhead caused by the pattern transformation for its worst-case duration of 0.49 seconds is only 0.35 %. This constitutes a negligible overhead compared to the workflow duration.
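Spelled out, the worst-case overhead in Table 6.4 is simply the ratio of the worst-case transformation time to the measured workflow duration, e. g., for the smallest and largest data sets:

\[ \frac{0.49\,\text{s}}{140\,\text{s}} \approx 0.35\,\% \qquad\qquad \frac{0.49\,\text{s}}{14{,}095\,\text{s}} \approx 0.0035\,\% \]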

6.6.4.2 Integration of Optimization Techniques

The rule-based approach for pattern transformation furthermore enables a seamless integration of corresponding rule-based optimization decisions [VSS+07, OdOV+11]. As discussed in Section 6.4.1, rewrite rules may this way find efficient workflow fragments realizing given patterns. For instance, Kalyoncu discusses how to integrate some of the optimization rules proposed by Vrhovnik et al. into rewrite rules used for the running example of a bone simulation [VSS+07, Kal15]. His measurements reveal that this may likewise induce significant performance improvements for simulation workflows.

On this note, rewrite rules may choose efficient realizations of patterns that rely on scalable data processing facilities, e. g., based on MapReduce or cloud computing technologies [DG08, LQ14]. This may be beneficial or even necessary in case the size of data to be processed by a workflow exceeds a certain threshold. It is for instance crucial if the number of tuples generated by Pandas goes beyond 100 million. Here, Gessler shows that a MapReduce-based implementation of parts of the coupling workflow depicted in Figure 5.9 considerably accelerates workflow execution [Ges14]. Thereby, MapReduce is especially suitable for the ETL processes depicted in Figure 3.2 that filter and aggregate data [LTP11]. These ETL processes correspond to the Data Transfer and Transformation Patterns shown in Figure 6.9, which may thus be mapped onto such efficient realizations. Furthermore, the Data Iteration Pattern shown in Figure 3.5 resembles some general ideas of MapReduce and may thus also be implemented via this approach.
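As an illustration of such a mapping, the following is a minimal Hadoop-style sketch of a filter-and-aggregate step as it might realize a Data Transfer and Transformation Pattern. The CSV input layout, the filter predicate, and all class names are assumptions made for the example; they are not taken from the implementation by Gessler [Ges14].

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Hypothetical MapReduce realization of a filter-and-aggregate ETL step over
 *  exported simulation tuples, assuming CSV lines of the form "nodeId,step,value". */
public class FilterAggregateJob {

  /** Map: keep only tuples of the (assumed) last time step and emit (node, value). */
  public static class FilterMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split(",");
      if (f.length == 3 && Integer.parseInt(f[1]) == 100) {   // assumed filter predicate
        ctx.write(new Text(f[0]), new DoubleWritable(Double.parseDouble(f[2])));
      }
    }
  }

  /** Reduce: aggregate the filtered values per node, here by averaging. */
  public static class AvgReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text node, Iterable<DoubleWritable> values, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0; long n = 0;
      for (DoubleWritable v : values) { sum += v.get(); n++; }
      ctx.write(node, new DoubleWritable(sum / n));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "filter-aggregate");
    job.setJarByClass(FilterAggregateJob.class);
    job.setMapperClass(FilterMapper.class);
    job.setReducerClass(AvgReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```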

6.6.5 Data Quality

In a similar way, rewrite rules transforming patterns may also consider requirements related to data quality. Thereby, the parameterizations of certain patterns may be amended by descriptions of such quality demands [RBD+11, RBKK14]. An example is the required quality of a reduced simulation model specified for the Simulation Model Reduction Pattern shown in Figure 6.12. As indicated by the Data Iteration Pattern in this figure, this quality demand is finally assured by a data quality service [RBKK12, RBKK14]. The rewrite rules used for the transformation steps shown in Figure 6.12 may choose such data quality services as parts of the final realizations of patterns.

Nevertheless, the optimization of data quality is not only limited to choosing proper data quality services. In some scenarios, a more obvious issue is to select proper data resources or data containers storing data that meet relevant quality demands. Furthermore, the operations a workflow later carries out on the selected data may influence the final data quality as well. This is especially true for operations like data filters or aggregations. Such issues may likewise be reflected by rewrite rules. Typically, this is done in rules that map simulation-oriented data management patterns onto basic data management patterns. The reason is that simulation-oriented patterns abstractly specify relevant data and operations. In contrast, basic patterns define more concrete details about data and operations, and thus also bring in characteristics related to data quality.

6.6.6 Monitoring and Provenance Support

While the proposed pattern-based approach reduces the number and complexity of workflow tasks that are visible to scientists, this may cause a problem for monitoring and provenance. The workflow execution environment is only aware of the more complex executable workflows resulting from the rule-based pattern transformation. This leads to a missing correlation between (1) patterns visible to scientists in a workflow design tool and (2) audit or provenance information captured by an execution environment [CVDK+12]. So, scientists might be confused when they monitor workflow executions or try to reproduce simulation runs. Workflow systems have to be extended by mechanisms that aggregate audit and provenance information to recover their correlation to patterns.

A possible solution to this problem may be a view concept on workflows [SGK+11]. After each application of a rewrite rule during the pattern transformation, the input pattern is not replaced by the resulting workflow fragment. Instead, the pattern is defined as a view on the fragment. Such views directly enable the necessary aggregation of audit or provenance information for individual patterns [SGK+11]. Furthermore, a corresponding view expansion may allow scientists to look into specific low-level details of data provisioning tasks if this is required for monitoring or reproducibility purposes.

6.7 Summary and Future Work

Simulation workflows frequently involve multiple, highly complex data provisioning and data exchange tasks, particularly when they are coupled across different scientific domains. So that scientists can concentrate on their core issue, namely the simulation itself, an abstraction support for this complex data management is indispensable. As summarized in Table 6.1, related work from several research areas does not comprehensively offer all four benefits listed in Section 1.2.2. These benefits are however crucial for an adequate abstraction support that is especially tailor-made for scientists conducting simulations.

The novel pattern-based approach to simulation workflow design introduced in this chapter exactly fills this gap and supports all four benefits in a consolidated fashion. Scientists only need to select a small number of abstract, simulation-specific process patterns to describe the high-level simulation process they have in mind. These simulation-specific patterns are especially meaningful to scientists and allow for their domain-specific and thus easy parameterization. A transformation approach based on a set of rewrite rules makes it possible to map such high-level process models and patterns onto executable simulation workflows. As another major contribution, a pattern hierarchy with different, clearly distinguished abstraction levels facilitates a holistic separation of concerns for workflow design. It offers a systematic framework to seamlessly incorporate specific skills of various kinds of persons, e. g., not only scientists, but also data engineers.

A prototypical implementation and its application to several simulation examples, e. g., to bio-mechanical and systems-biological problems, has served as the basis to evaluate this approach. In particular, it turned out that the patterns significantly alleviate simulation workflow design and thereby make it especially suitable for scientists. The generality of patterns furthermore enables their seamless adoption in various simulation domains. Altogether, this represents the first principled approach that adequately conquers the complexity of data provisioning and data exchange inherently associated with simulation workflows.

Future work may even increase the generality of patterns and of the whole approach by applying them to other application areas than simulations, e. g., to other workflow classes shown in Figure 2.6. Another major aspect is to further investigate the potential of integrating optimization decisions into rewrite rules to increase both the efficiency of workflows and data quality. Note that this again goes hand in hand with possible future work discussed in Sections 4.5 and 5.6.

Chapter 7 An Approach to Optimize Workflow-Local Data Processing

Various optimization approaches exist to improve the performance of data-intensive workflows. Some of these approaches focus on selecting efficient services or resources realizing certain workflow steps [Ley05, WCL+05]. These services or resources may furthermore be based on scalable and parallel data processing facilities, such as MapReduce-based systems or cloud computing technologies [DG08, ZBKL10, LTP11, LQ14]. Other optimization approaches represent rule-based techniques for re-structuring workflow models [BHW+07, VSS+07, OdOV+11]. On this note, different kinds of parallelism may be reflected in workflow models, e. g., task, data, and pipeline parallelism [PA06]. As discussed in Sections 5.5.4 and 6.6.4, most of these optimization approaches are also applicable to simulation workflows – and especially to the concepts and methods introduced by this thesis.

The majority of existing optimization approaches is targeted at huge amounts of data that are processed externally to a workflow or to a workflow system. So, they are related to the concepts of data services and data management activities depicted in Figure 4.2, but not to local data processing in workflows. Furthermore, many optimization approaches make use of dataflow-oriented descriptions of workflows, since a dataflow usually facilitates various kinds of optimizations (see also Section 2.3.2). However, optimization approaches for local data processing in workflows that are described by control-flow-oriented workflow languages have largely been neglected in previous work. This is underpinned by a related work study conducted by Müller and Wagner [Mül10, Wag11].

According to guideline 1 discussed in Section 4.3.2.1, it is sometimes inevitable or even appropriate to process data locally in the context of simulation workflows. The major scenarios calling for a local data processing are represented by the Data Iteration Patterns shown in Figures 3.5 and 3.6. For instance, the coupling workflow depicted in Figures 4.6 and 5.9 needs to load several external data sets, e. g., lists of daily routines or of available computers. It then processes these data sets locally in the workflow context to carry out sophisticated data iterations.

Moreover, Figure 2.7 shows that simulation workflows are combinations of both data-intensive and orchestration (sub-)workflows. Orchestration workflows usually rely on control flow as the main concept for workflow languages [LR00] (see Section 2.3.2). In this spirit, this thesis considers conventional control-flow-oriented workflow languages as the major kinds of languages to describe simulation workflows [GSK+11]. The main reason is that this leads to a generic, standard-based approach for all phases of a simulation workflow depicted in Figure 2.7. This generic approach likewise facilitates multi-scale simulations that are coupled across different scientific domains [RSM14a]. Furthermore, it leads to a seamless workflow design environment that removes the burden from scientists to get accustomed to many diverse design tools or technologies.

In conclusion, simulation workflows require a novel optimization approach targeted at local data processing in workflows that are described by control-flow-oriented languages. This thesis fills this gap in current research by introducing a new tailored approach, as discussed in Section 1.3.3. This new approach improves both the performance and robustness of local data processing tasks in simulation workflows [RSM11]. The following sections discuss both the need for such an approach and its detailed contributions as follows:

• Section 7.1 discusses the state of the art regarding local data processing via control-flow-oriented workflow languages. It firstly illustrates the major kinds of local data processing tasks that are typically supported by such workflow languages, e. g., variable assignments or expression evaluations for control flow decisions. It then details the data-intensive protein modeling workflow also listed in Section 3.1 [Wag10, Wag11], which is used as the main use case throughout the rest of this chapter.1 Furthermore, it discusses both the means and the limitations of currently available control-flow-based workflow systems to support relevant local data processing tasks.

1Note that the protein modeling workflow is preferred here to the bone simulation, because its data-intensive nature is more appropriate for the relevant illustrations (see Section 3.1).

• Afterwards, Section 7.2 introduces the new approach proposed by this thesis. It therefore shows how to extend the typical architecture of control-flow-based workflow systems to transparently improve local data processing in workflows. The proposed extensions of workflow systems introduce various techniques that partition local data processing tasks to be performed during workflow execution in a smart way. Thereby, data processing tasks are either assigned to the workflow execution engine or to a tightly integrated local database engine. This allows for additionally exploiting the efficient and robust data processing capabilities of mature database systems.

• In the course of working on this thesis, the effectiveness of the introduced techniques has been evaluated by a set of experiments. These experiments are based on various test scenarios, including both smaller test workflows and the above-mentioned protein modeling workflow. Section 7.3 illustrates the prototypical implementation used for these experiments, and it discusses the corresponding evaluation results. This discussion especially comes up with possible indicators when to use either the workflow execution engine or the local database engine for particular data processing tasks. Furthermore, it shows the great potential offered by the proposed approach for improved performance and robustness of local data processing in workflows.

Finally, Section 7.4 summarizes the major aspects and lists possible future work. This chapter is a revised version of a previous author publication [RSM11].

7.1 Local Data Processing in Workflows

The local data processing in control-flow-oriented workflow languages is usually based on accessing and modifying workflow variables and their contents. For instance, the workflow language BPEL makes use of XML-based variable contents to represent the internal state of a particular workflow instance [OAS07] (see Section 2.3.3). The following Section 7.1.1 summarizes major kinds of local data processing tasks that are commonly supported by control-flow-oriented workflow languages. Section 7.1.2 then exemplifies these kinds of tasks via the data-intensive protein modeling workflow [Wag10, Wag11]. Afterwards, Section 7.1.3 discusses architectural aspects of control-flow-based workflow systems, with a special focus on how these systems usually support local data processing tasks in workflows.

7.1.1 Major Kinds of Local Data Processing Tasks

Conventional control-flow-oriented workflow languages typically support the following major kinds of local data processing tasks [LR00, OAS07, Wes12]:

• The first major kind is represented by variable assignments. These are certain workflow tasks that modify the contents of workflow variables, e. g., to assign initial values to a variable or to copy values between several variables. For instance, BPEL supports such scenarios via the assign activity. As illustrated in Section 2.3.3, this activity offers many possibilities to carry out complex variable modifications. This mainly includes several specialized built-in functions, as well as generic and powerful possibilities using expression or query languages such as XPath.

• Furthermore, workflows often access their variables to make state-based control flow decisions. These control flow decisions determine which of a set of succeeding control flow branches are to be executed depending on the current state of the workflow instance. This is usually based on Boolean expressions that are evaluated on the contents of relevant variables. As an example, a BPEL workflow may embed XPath expressions into transition conditions of the flow activity. Other examples are the if activity or loop-based workflow constructs.

• Finally, service calls or other messaging tasks are the common way to receive or to load data from external partners or resources, as well as to send or to store data back. For instance, the coupling workflow depicted in Figures 4.6 and 5.9 comprises several service calls to load some data from external repositories. BPEL mainly supports this scenario via invoke, receive, or reply activities, and via certain event handlers.2

7.1.2 Data-Intensive Protein Modeling Workflow

Figure 7.1 shows the activities and their control flow of a workflow for protein modeling [Wag10, RSM11, Wag11]. This workflow is a data analysis workflow according to the classification given in Figure 2.6. It identifies or investigates certain proteins that are able to solve specific chemical or biological problems [BTS07]. Therefore, it uses pattern matching to find important regions within individual protein sequences or families. For instance, it may try to find amino acids that are relevant for certain chemical reactions.

[Control flow of the workflow: “Receive Input Parameters” → “Get List of Protein Sequences” → a loop “For each Protein Sequence” with the decision “Sequence Matches Pattern?” leading to “Add Sequence Header” (yes) or “Count Negative Sequence” (no) → “Reply to Client”.]

Figure 7.1: Protein modeling workflow; cf. [RSM11].

The protein modeling workflow processes most of its data directly within the workflow context, i. e., locally in workflow variables. Thereby, it comprises all major kinds of local data processing tasks listed in Section 7.1.1. Furthermore, it embeds all these tasks into a complex data iteration, which especially entails challenges regarding an efficient data processing in the workflow [Vhr11, Kal15]. The size of the locally processed data may range from about a hundred KBs to a few hundred MBs [Wag10, Wag11].

The workflow firstly gets some input parameters from the client, e. g., an identifier for the data source storing the protein sequences to be investigated. An example data source is a common protein database offering access to its data via a service interface, such as GenBank [BKML+10]. The workflow accordingly uses a service call to retrieve a list of relevant protein sequences from this database and to store the list into a local workflow variable. This workflow variable is based on an XML schema, where individual protein sequences are listed as character strings [Wag10, Wag11].

The subsequent loop iterates over the list of protein sequences stored in the variable. Thereby, an embedded control flow decision searches for a certain pattern within each protein sequence, e. g., via a regular expression. If a protein sequence matches the pattern, the workflow adds the header of the sequence to a list of headers stored in another workflow variable. Otherwise, it increments a counter for negative sequences. These two local data processing tasks are realized using variable assignments. After having processed all protein sequences in this way, the workflow sends both the list of headers of positive sequences and the number of negative sequences back to the client.

2Note that also the data management activities of SIMPL introduced in Chapter 5 may be used to load external data and to store it back. Nevertheless, the way these activities access local workflow variables is very similar to service calls or messaging tasks [HSR+10, Mül10, Wag11]. So, data management activities are not covered separately in this chapter.
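To make the loop's local data processing concrete, the following is a minimal Java sketch of its decision logic. The field names, the amino-acid motif, and the in-memory representation of the variable contents are purely illustrative assumptions and do not reproduce the actual workflow implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical in-memory version of the loop in Figure 7.1: for each protein
 *  sequence, check a pattern; collect matching headers, count the rest. */
public class ProteinPatternMatching {

  record ProteinSequence(String header, String aminoAcids) { }  // illustrative schema

  public static void main(String[] args) {
    // Would normally be retrieved via a service call, e. g., from GenBank.
    List<ProteinSequence> sequences = List.of(
        new ProteinSequence("sp|P69905|HBA_HUMAN", "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"),
        new ProteinSequence("sp|P68871|HBB_HUMAN", "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLV"));

    Pattern motif = Pattern.compile("W.KV");            // illustrative motif, not from the thesis

    List<String> positiveHeaders = new ArrayList<>();   // workflow variable: list of headers
    int negativeCount = 0;                               // workflow variable: negative counter

    for (ProteinSequence seq : sequences) {              // data iteration
      if (motif.matcher(seq.aminoAcids()).find()) {      // control flow decision
        positiveHeaders.add(seq.header());               // variable assignment (yes branch)
      } else {
        negativeCount++;                                  // variable assignment (no branch)
      }
    }
    System.out.println("Positive: " + positiveHeaders + ", negative: " + negativeCount);
  }
}
```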

7.1.3 Current Architecture for Local Data Processing in Control-Flow-Based Systems

Figure 7.2 sketches the main components of the typical architecture of control-flow-based workflow systems. For better readability, it leaves out components that are not directly relevant for the local data processing in workflows. This for instance concerns a component to deploy workflow models [LR00, GSK+11].

A workflow execution environment usually not only contains a workflow execution engine, but also a local database system. Remark that this is not the type of external database system a workflow may access via data services or data management activities (see Figure 4.2). In fact, the database system shown in Figure 7.2 supports local data processing tasks embedded into workflows. For that purpose, it mainly serves as persistent data store for the contents of workflow variables – and also for metadata such as auditing information [Mül10].

However, the local database system does not process the data stored in workflow variables, e. g., during the local data processing tasks listed in Section 7.1.1. Instead, the workflow execution engine itself performs this local data processing. It therefore contains a pool of variables that manages all workflow variables and their contents. This pool may for instance be realized as a heap of Java objects representing XML data. Furthermore, an expression evaluation engine carries out expressions for variable assignments or control flow decisions, e. g., based on XPath. A data processing logic component implements the execution logic of different kinds of local data processing tasks, e. g., of the kinds listed in Section 7.1.1. Thereby, it also controls how the pool of variables and the expression evaluation engine are employed during individual tasks.

A persistence manager, amongst others, controls the persistence of the data stored in workflow variables. So, it stores and loads the variable contents into or from the local database either automatically or when the execution engine triggers it. A possible solution to such a persistence manager is a Data Access Object (DAO) layer, e. g., as offered by Hibernate3. To manage the contents of workflow variables, the local database contains a pool of variables as well. The persistence manager provides means to map the variables between the pool of the workflow execution engine and the pool of the database. This way, the workflow execution engine is independent of the concrete local database system, i. e., it is possible to change the database system quite easily [Mül10].

3Hibernate: http://hibernate.org/
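To illustrate the engine-internal processing described above, the following is a minimal Java sketch of how an execution engine may evaluate an XPath expression on a workflow variable that is held in memory as a W3C DOM node. The XML layout of the variable and the concrete expression are illustrative assumptions, not taken from Apache ODE.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

/** Hypothetical engine-side evaluation of a control flow condition on a variable
 *  kept in the pool of variables as a W3C DOM node. */
public class EngineSideXPathEvaluation {
  public static void main(String[] args) throws Exception {
    // Content of a workflow variable, e. g., a (tiny) list of protein sequences.
    String variableContent =
        "<sequences><sequence><header>sp|P1</header><aa>MKTAYIAK</aa></sequence></sequences>";
    Node variable = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(variableContent.getBytes(StandardCharsets.UTF_8)))
        .getDocumentElement();

    XPath xpath = XPathFactory.newInstance().newXPath();
    // Expression of a control flow decision: does any sequence match the pattern?
    Boolean matches = (Boolean) xpath.evaluate(
        "count(//sequence[contains(aa, 'TAYI')]) > 0", variable, XPathConstants.BOOLEAN);
    System.out.println("Take the 'yes' branch: " + matches);
  }
}
```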

The local database management system also contains a query / expression execution engine. However, this engine and its mature, efficient, and robust data processing capabilities are not exploited during workflow execution. The following section therefore introduces a novel approach that makes better use of this component in order to improve data-intensive local data processing tasks.

[Figure 7.2 shows the workflow execution environment consisting of a workflow execution engine (with a pool of variables, an expression evaluation engine, and the data processing logic for variable assignments, control flow decisions, and service calls), a persistence manager (e. g., a DAO layer), and a local database system; the latter comprises a database management system with a query / expression execution engine and a database holding a pool of variables. Arrows indicate dataflow or variable mapping, query submission, control messages and metadata exchange, and local data processing.]

Figure 7.2: Common architecture for local data processing in workflow systems. The database system is often only used as persistent data store; cf. [RSM11].

7.2 Novel Approach to Improve Local Data Processing in Workflows

The following Section 7.2.1 illustrates how the optimization approach proposed by this thesis extends the architecture of a workflow system shown in Figure 7.2. Afterwards, Section 7.2.2 introduces various techniques that make use of these architectural extensions in order to improve the different kinds of local data processing tasks listed in Section 7.1.1. The purpose of Section 7.2.3 is to discuss the resulting optimization potential to increase both the performance and robustness of local data processing in workflows.

7.2.1 Extended System Architecture

Figure 7.3 highlights the proposed extensions of the architecture of a workflow system. These extensions rely on a reasoned design principle in that they only introduce two additional system components: the data processing optimizer and the query / expression interface. This entails a small implementation overhead to extend concrete workflow systems accordingly [Wag11]. The new components make use of other, already existing components in a smart way in order to ensure an efficient and robust local data processing in workflows. Thereby, local data processing tasks to be performed during workflow execution may be partitioned between the workflow execution engine and the local database system. This way, it is possible to exploit their respective capabilities as effectively as possible. Note that this kind of optimization is transparent to workflow models, i. e., it does not change these models, but only influences their execution.

The query / expression interface makes it possible to push down local data processing tasks from the workflow execution engine to the local database system. So, it offers the means to exploit the efficient and robust data processing capabilities of the integrated database management system. Thereby, the execution engine only issues queries or expressions against the database system. This system likewise answers with only small data items, e. g., Boolean values needed for control flow decisions. Note that corresponding queries to carry out local data processing tasks may vary between different kinds of database systems, e. g., due to varying SQL dialects. Hence, the query / expression interface has to maintain query templates for certain kinds of database systems [Wag11]. This way, the workflow execution engine is again independent of the concrete local database system.
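As a toy illustration of such vendor-specific query templates, the following sketch keeps one template per database product and renders it for a given expression. The table layout, the template strings, and the selection of products are assumptions made for the example; they are not the templates maintained by the actual prototype [Wag11].

```java
import java.util.Map;

/** Minimal sketch of the query / expression interface holding hypothetical,
 *  vendor-specific query templates for evaluating a condition on a variable. */
public class QueryExpressionInterface {

  // Assumed templates over a hypothetical VARIABLES(VAR_ID, CONTENT) table.
  private static final Map<String, String> EXISTS_TEMPLATES = Map.of(
      "DB2",
      "SELECT COUNT(*) FROM VARIABLES WHERE VAR_ID = ? AND XMLEXISTS('%s' PASSING CONTENT AS \"c\")",
      "PostgreSQL",
      "SELECT COUNT(*) FROM variables WHERE var_id = ? AND xpath_exists('%s', content)");

  /** Render the vendor-specific query for a control flow condition. */
  public String existsQuery(String dbmsProduct, String xpathExpression) {
    String template = EXISTS_TEMPLATES.get(dbmsProduct);
    if (template == null) {
      throw new IllegalArgumentException("No query template for " + dbmsProduct);
    }
    return String.format(template, xpathExpression);
  }
}
```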

[Figure 7.3 extends the architecture of Figure 7.2: the workflow execution engine additionally contains the data processing optimizer, and the query / expression interface complements the persistence manager as a second path to the local database system, so that queries and expressions can be submitted directly to its query / expression execution engine.]

Figure 7.3: Extended architecture for an improved local data processing in workflow systems. Especially the new components data processing optimizer and query / expression interface (dark-gray) allow for exploiting the database system more intensively during local data processing tasks; cf. [RSM11].

The data processing optimizer is directly embedded into the workflow execution engine in order to effectively control the data processing logic. In particular, the optimizer decides whether local data processing tasks are to be executed by the workflow execution engine or by the local database system. Analogously, it determines which of the two components acts as primary storage of the corresponding data of workflow variables. Thereby, the data processing optimizer not only tries to exploit the mature data processing capabilities of the database system whenever this is appropriate. It additionally aims at minimizing the amount of data to be transferred between the execution engine and the database. Its optimization decisions hence mainly depend on (1) the capabilities of the execution engine and of the database system, (2) the concrete data sizes, and (3) the complexities of involved queries or expressions [Wag11]. These characteristics are described by metadata or statistics that are provided by other system components.
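A possible shape of such an optimization decision is sketched below. The thresholds and the complexity measure are illustrative assumptions; in practice, break-even points are system- and tool-dependent (see Section 7.3).

```java
/** Hypothetical decision heuristic of the data processing optimizer: choose the
 *  executor of a local data processing task from data size, expression complexity,
 *  and the current location of the variable contents. */
public class DataProcessingOptimizer {

  enum Executor { WORKFLOW_ENGINE, LOCAL_DATABASE }

  // Assumed break-even thresholds; real values would come from metadata and statistics.
  private static final long SIZE_THRESHOLD_BYTES = 4L * 1024 * 1024;  // ~4 MB
  private static final int  COMPLEXITY_THRESHOLD = 3;                 // e. g., number of XPath steps

  /** Decide where to run a variable assignment or expression evaluation. */
  Executor decide(long inputSizeBytes, int expressionComplexity, boolean dataAlreadyInDatabase) {
    boolean largeData = inputSizeBytes > SIZE_THRESHOLD_BYTES;
    boolean complexExpression = expressionComplexity >= COMPLEXITY_THRESHOLD;
    // Pushing a task down only pays off if the input data does not have to be
    // shipped to the database first (cf. Section 7.2.2).
    if (dataAlreadyInDatabase && (largeData || complexExpression)) {
      return Executor.LOCAL_DATABASE;
    }
    // Small data and simple expressions: the in-memory engine is typically faster.
    return Executor.WORKFLOW_ENGINE;
  }
}
```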

7.2.2 Techniques to Improve Local Data Processing

The proposed architectural extensions support various techniques to effectively exploit the local database system during the execution of local data processing tasks. Figure 7.4 depicts the major kinds of these techniques, which also reflect the different kinds of local data processing tasks listed in Section 7.1.1 [Wag11].

The Assignment Pushdown depicted in Figure 7.4a shifts the responsibility for executing variable assignments from the workflow execution engine to the local database system. The execution engine firstly issues the query or expression specified for the assignment against the database (step 1 in Figure 7.4a). Afterwards, the database system evaluates this assignment expression (step 2) and assigns the expression results to the target variables stored in its own pool of variables (step 3). Finally, it returns a notification to the execution engine (step 4), which indicates whether the assignment has been executed successfully.

The Expression Evaluation Pushdown is targeted at the second kind of local data processing tasks listed in Section 7.1.1, i. e., at control flow decisions. As shown in Figure 7.4b, it again lets the local database system evaluate expressions that are specified for relevant control flow decisions (steps 1 and 2). In contrast to the Assignment Pushdown, the database system synchronously returns the expression results to the workflow execution engine (step 3). These expression results are however only small data items, e. g., Boolean values. The execution engine inevitably needs such results to properly make its control flow decisions.

[Figure 7.4 sketches the three techniques: (a) Assignment Pushdown – the engine submits the assignment expression, the database evaluates it, assigns the result to the variable in its own pool, and returns a notification; (b) Expression Evaluation Pushdown – the engine submits the expression and the database returns only the (small) result; (c) Service Call Pushdown – the database calls the service itself, stores the result data directly, and notifies the engine.]

Figure 7.4: Techniques to improve local data processing tasks. The numbers indicate the orders of individual processing steps; cf. [RSM11, Wag11].

Using these two techniques, the local database system only returns notifications or small data items to the workflow execution engine. This may lead to performance improvements, because smaller amounts of data need to be transferred between the two system components. However, the input data of the expressions used for variable assignments or control flow decisions should then be available in the local database system before it evaluates these expressions. Otherwise, the persistence manager would need to transfer these data from the execution engine to the database system. This may in turn lead to a performance degradation, especially since such input data may be large in case of data-intensive workflows.
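The following is a minimal Java/JDBC sketch of an Expression Evaluation Pushdown against a local DB2 database that stores variable contents with the native XML data type. The connection parameters, the VARIABLES table with its VAR_ID and CONTENT columns, and the concrete predicate are assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Hypothetical Expression Evaluation Pushdown: the engine asks the local database
 *  whether a control flow condition holds on a variable; only a small result travels back. */
public class ExpressionEvaluationPushdown {
  public static void main(String[] args) throws Exception {
    try (Connection con = DriverManager.getConnection(
            "jdbc:db2://localhost:50000/WFDB", "wfuser", "secret")) {   // assumed connection
      // SQL/XML: evaluate the XPath/XQuery predicate inside the database (DB2 pureXML).
      String sql = "SELECT COUNT(*) FROM VARIABLES WHERE VAR_ID = ? AND XMLEXISTS("
          + "'$c/sequences/sequence[contains(aa, \"TAYI\")]' PASSING CONTENT AS \"c\")";
      try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setString(1, "proteinSequences");                            // assumed variable id
        try (ResultSet rs = ps.executeQuery()) {
          boolean takeYesBranch = rs.next() && rs.getInt(1) > 0;
          System.out.println("Take the 'yes' branch: " + takeYesBranch);
        }
      }
    }
  }
}
```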

In control-flow-oriented languages, service calls or other messaging tasks are the major kinds of workflow tasks receiving data that eventually has to be stored in the local database system as described above [LR00, Wes12]. So, such workflow tasks correspond to the third kind of data processing tasks listed in Section 7.1.1. This is where the Service Call Pushdown comes into play, as shown in Figure 7.4c. Here, the workflow execution engine does not call the relevant service. Instead, it asks the local database system to call the service on its own (steps 1 and 2), e. g., via a user-defined function (UDF) [VSS+07]. This way, the result data of the service is directly stored in the database (step 3), i. e., without the indirection over the execution engine and the persistence manager. Finally, the database system again notifies the execution engine about the successful execution of the service call (step 4). Note that the same principle can be applied for asynchronous communication, i. e., when a workflow sends data to services or when it gets data from them, without expecting any result on either side [Wag11].
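To give an impression of the Service Call Pushdown, the following sketch shows a Java routine that could be registered as an external user-defined function in the local database, so that the database itself performs the call and stores the result directly. The HTTP endpoint, the payload, and the registration of the routine as a UDF are assumptions for the example and omit the vendor-specific DDL.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical routine behind a database-side UDF that calls a service and
 *  returns its result so that it can be stored directly in the database. */
public class ServiceCallUdf {
  public static String callService(String endpointUrl, String requestPayload) throws Exception {
    HttpURLConnection con = (HttpURLConnection) new URL(endpointUrl).openConnection();
    con.setRequestMethod("POST");
    con.setDoOutput(true);
    con.getOutputStream().write(requestPayload.getBytes(StandardCharsets.UTF_8));
    try (InputStream in = con.getInputStream()) {
      // The result data never passes through the workflow execution engine.
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }
}
```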

7.2.3 Optimization Potential for Local Data Processing

In previous control-flow-based workflow systems, only the workflow execution engine could carry out the local data processing tasks listed in Section 7.1.1. The architectural extensions shown in Figure 7.3 add a new option to shift this kind of data processing to the local database system. This offers a great potential for improved performance and robustness of executing local data processing tasks in data-intensive workflows. One reason is that the mature database technology is typically able to deal with large amounts of data in an efficient way. Furthermore, it provides reliable means to cope with complex and repeating data management operations that are involved in data-intensive workflows. This for instance concerns sophisticated data iterations, e. g., as occurring in the protein modeling workflow shown in Figure 7.1. Moreover and as discussed above, the techniques depicted in Figure 7.4 may help to reduce the amount of data that has to be transferred between the execution engine and the local database system.

The approach proposed in this chapter also entails a benefit when multiple workflow instances or data management operations run concurrently. In this case, the execution of local data processing tasks within the workflow execution engine typically leads to an increased main memory consumption of this engine. It may thus get too busy or even overloaded. The techniques depicted in Figure 7.4 make it possible to push parts of workflow processing to the local database system.

This may lead to a more balanced utilization of all components of the workflow system. In particular, it leads to a more efficient overall main memory utilization, which in turn may entail a higher throughput of concurrent workflow instances. Nevertheless, the workflow execution engine is more flexible regarding internal data structures to store the contents of workflow variables in main memory. It may thus rely on customized internal data structures that are especially tailored to the local data processing tasks listed in Section 7.1.1. Hence, the efficient and robust capabilities offered by the local database system often only take effect with increased data sizes and complexities of data management operations [Wag11]. For instance, most kinds of orchestration workflows (see Figure 2.6) are usually less data-intensive and work with smaller data sets. Such workflows often do not benefit from the capabilities of the local database system. In such a case, the data processing optimizer shown in Figure 7.3 may decide to leave the responsibility for local data processing to the workflow execution engine.

7.3 Prototype and Evaluation

The arguments discussed in Section 7.2.3 regarding improved performance and robustness for data-intensive workflows have been evaluated via a set of experiments. These experiments especially aimed at assessing the techniques to improve local data processing tasks depicted in Figure 7.4. The following Section 7.3.1 presents the prototypical implementation, the experimental setup, and the test scenarios used for the experiments. Afterwards, Section 7.3.2 discusses the corresponding experimental results. One goal of this discussion is to exemplify the optimization potential of the approach proposed in this chapter. Furthermore, it comes up with possible indicators when to use either the workflow execution engine or the local database system for local data processing tasks.

Note that the optimization approach proposed in this chapter is completely transparent to workflow models. More precisely, it does not require changing workflow models, but only influences their execution. Hence, there is no relation between this approach and the challenges introduced in Sections 1.2.1 and 1.2.2, which focus on workflow design issues. Dependencies to data quality or monitoring and provenance issues illustrated in Sections 1.2.4 and 1.2.5 are negligible as well. So, the evaluation covered in this section only focuses on the challenge introduced in Section 1.2.3, i. e., calling for an efficient data processing in workflows.

7.3.1 Prototype, Experimental Setup, and Test Scenarios

The evaluation has been backed up by two separate BPEL-based prototypes of a workflow execution environment. These prototypes have been developed in the course of several student projects that have accompanied the work on this thesis [Mül10, Wag11, Age12]. The first one depicted in Figure 7.5a reflects the current architecture of workflow systems as shown in Figure 7.2. The second prototype depicted in Figure 7.5b implements the most relevant parts of the architectural extensions illustrated in Figure 7.3. So, these two prototypes may be compared with each other in order to evaluate the proposed architectural extensions. Both rely on Apache ODE Version 1.3.4 as workflow execution engine. Furthermore, they make use of Hibernate Version 3.2.5 as persistence manager, and the local database system is an IBM DB24 Version 9.7.

The first prototype shown in Figure 7.5a lets the workflow execution engine Apache ODE carry out local data processing tasks completely on its own. Apache ODE represents contents of workflow variables as XML documents. These documents are managed as Java objects of the type W3C Node5. When storing the data into the IBM DB2, Hibernate maps small XML documents to variable character strings (VarChar) and larger ones to binary large objects (BLOBs). These data structures are tailored to the original usage pattern of the local database system, i. e., it only serves as persistent data store for variable contents. In the following, this prototype is called original ODE.

The extended prototype depicted in Figure 7.5b abstains from the data processing optimizer shown in Figure 7.3. Instead, an extension of Hibernate implements and enforces the individual pushdown techniques summarized in Figure 7.4, which allows for evaluating the effectiveness of these techniques. So, the local IBM DB2 database system carries out the individual kinds of local data processing tasks listed in Section 7.1.1. In order that the database system is able to evaluate queries or expressions on the contents of workflow variables, the prototype changes the database-internal data structure to a native XML data type. This also makes it possible to employ the powerful XML-enabled query processing capabilities of the IBM DB2 pureXML technology [NKC09]. Altogether, this leads to a tight integration of the local database system and the workflow execution engine. This prototype is thus called ODE-TI (Tight Integration).

4IBM DB2: https://www.ibm.com/analytics/us/en/technology/db2/
5W3C Node: http://www.w3.org/2003/01/dom2-javadoc/org/w3c/dom/Node.html

[Figure 7.5 depicts the two prototypes. (a) Original Apache ODE reflecting the system architecture shown in Figure 7.2: the Apache Orchestration Director Engine (ODE) holds workflow variables as W3C Node objects, and Hibernate maps them to the IBM DB2 database as VarChar columns (up to 256 characters) or BLOBs (larger documents). (b) ODE-TI (Tight Integration) reflecting the system architecture shown in Figure 7.3: the Hibernate layer additionally implements the pushdown techniques for variable assignments, expression evaluations, and service calls, and the IBM DB2 database stores variable contents using the native XML data type (pureXML).]

Figure 7.5: Prototypes used for the experiments; cf. [RSM11].

During the experiments, all system components ran on a 32 bit Windows Server 2003 system with two Intel Xeon 3.2 GHz CPUs and 8 GB RAM. Thereby, the main focus was to investigate whether there are break-even points beyond which ODE-TI is more efficient and robust than original ODE. Criteria for break-even points are the data size, the complexity of involved expressions, and the complexity of workflows [Wag11]. Such break-even points then provide indicators for when the data processing optimizer shown in Figure 7.3 should employ either the execution engine or the database system for local data processing. Note that concrete break-even points in real scenarios are however system- and tool-dependent.

The experiments have been subdivided into two test scenarios. The first scenario comprises small test workflows, where only single workflow activities have been tested. This allows for evaluating each technique depicted in Figure 7.4 in isolation. A BPEL assign activity is used to evaluate the Assignment Pushdown, an if activity for the Expression Evaluation Pushdown, and an invoke activity for the Service Call Pushdown. The second test scenario makes use of the data-intensive protein modeling workflow depicted in Figure 7.1. This workflow contains all above-mentioned activities and thus all kinds of local data processing tasks listed in Section 7.1.1 [Wag11]. Furthermore, it embeds most of the activities in a loop realizing a sophisticated data iteration. This allows for evaluating all techniques shown in Figure 7.4 in combination.

The underlying data set of both test scenarios is the XML-based list of protein sequences of the protein modeling workflow [Wag11]. The tests have been carried out for five different data sizes: 100 KB, 500 KB, 4 MB, 9 MB, and 50 MB. This corresponds to 40, 199, 697, 1394, and 7695 protein sequences, respectively, as well as to the same number of iterations of the loop in the protein modeling workflow. Larger XML documents could not be tested, since original ODE generally caused main memory overloads for such data sets. Remark that this is not a severe limitation of the experimental setting. In fact, processing larger amounts of data should be outsourced to external resources or services anyway [RSM11].

7.3.2 Discussion of Experimental Results

The following two subsections respectively discuss evaluation results regarding the two above-mentioned test scenarios reflecting (1) smaller test workflows and (2) the data-intensive protein modeling workflow.

7.3.2.1 Smaller Test Workflows

Each test workflow for each technique depicted in Figure 7.4 has been executed 100 times to determine credible average durations. In Figure 7.6, the respective average durations of original ODE are taken as 100 %, i. e., the figure presents the average durations of ODE-TI as a percentage of those of original ODE. The assign activity used for the Assignment Pushdown firstly carries out a read operation to evaluate an XPath expression on the list of protein sequences. It then performs a write operation storing the expression result in another variable.


[Figure 7.6 shows bar charts of the average durations of ODE-TI as a percentage of those of original ODE for the data sizes 100 KB, 500 KB, 4 MB, 9 MB, and 50 MB: (a) Assignment Pushdown with a simple and a complex expression; (b) Expression Evaluation Pushdown with a simple and a complex expression, and Service Call Pushdown.]

Figure 7.6: Effectiveness of the techniques shown in Figure 7.4; cf. [RSM11].

This pushdown technique has been evaluated with two different complexity classes of the involved XPath expression (see Figure 7.6a). Firstly, a simple expression selects only one protein sequence from the input list. Thereby, the Assignment Pushdown used in ODE-TI shows an average performance degradation that ranges between 358 % and 77 % for the individual data sizes. The more complex expression selects two protein sequences and concatenates them. Here, ODE-TI shows a performance degradation of 161 % and 38 % for 100 KB and 500 KB. Nevertheless, a break-even point where ODE-TI performs up to 18 % better than original ODE is reached when increasing the data size to more than 4 MB.

The Expression Evaluation Pushdown has been tested with the same two complexity classes of involved XPath expressions, as indicated in Figure 7.6b. The underlying if activity only performs a read operation evaluating the expression, but no subsequent write operation. For the simple expression, ODE-TI shows performance degradations ranging from 127 % to 17 % when reading data up to 9 MB. However, the break-even point with slightly improved performance is reached for 50 MB. The experiments for the complex expression reveal the potential of ODE-TI and of the approach proposed in this chapter. Here, the break-even point is already reached for the small data size of 100 KB. The bigger data sizes even lead to performance improvements between 74 % and 85 %.
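To relate the percentages plotted in Figure 7.6 to the degradations and improvements quoted in the text: a relative duration of x % of original ODE corresponds to a slowdown of (x - 100) % or, for x below 100 %, to an improvement of (100 - x) %, e. g.:

\[ 458\,\% \;\Rightarrow\; 358\,\% \text{ degradation}, \qquad 82\,\% \;\Rightarrow\; 18\,\% \text{ improvement.} \]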

The results regarding the Assignment Pushdown and Expression Evaluation Pushdown already confirm most of the arguments discussed in Section 7.2.3. In particular, the prototype ODE-TI and thus the techniques depicted in Figure 7.4 only take effect with increased data sizes and especially with complex XPath expressions. This is mainly because the local IBM DB2 database system and its efficient query processing capabilities are especially tailored to large data sets and to complex data management operations. Another observation is that the results of the Expression Evaluation Pushdown are much better than those of the Assignment Pushdown. The major difference between the underlying if and assign activities is the additional write operation of the assign. So, the workflow engine in original ODE appears to deal more efficiently with write operations than the local database system in ODE-TI. The reason is not only that the workflow engine may rely on tailored internal data structures, as discussed in Section 7.2.3. In addition, the database system causes an extra overhead during write operations to store log information on disk and to adapt index structures.

As shown in Figure 7.6b, the results regarding the Service Call Pushdown get likewise better with increasing data sizes. In original ODE, the underlying invoke activity anyway persists the result data of the service in the local database system for recovery purposes. This entails extra costs for transferring the data from the workflow execution engine over the persistence manager to the local database system. The Service Call Pushdown used in ODE-TI circumvents these extra transfer costs, as discussed in Section 7.2.2. However, ODE-TI causes a small write overhead, as it employs an XML data type instead of the more efficient BLOB type used in original ODE [Wag11] (see Figure 7.5). For smaller data sizes, this additional write overhead of ODE-TI is larger than the extra transfer costs caused by original ODE. Nevertheless, this turns into the opposite for larger data sets, which finally leads to performance improvements of up to 14 %.

7.3.2.2 Data-Intensive Protein Modeling Workflow

The second test scenario considers both the sequential and the parallel execution of several instances of the protein modeling workflow. In the case of a sequential execution, the workflow has again been carried out 100 times. Figure 7.7a presents the average durations of one workflow instance of both original ODE and ODE-TI for the data sizes of 100 KB to 9 MB. These results show the great potential of the proposed optimization approach.

[Figure 7.7: Performance of the protein modeling workflow; cf. [RSM11]. (a) Sequential workflow execution: average duration of one workflow instance in seconds for original ODE and ODE-TI, plotted over 40, 199, 697, and 1394 loop iterations (data sizes of 100 KB, 500 KB, 4 MB, and 9 MB); for the two largest data sizes, original ODE aborts with a main memory overload after about 200 and 100 iterations, respectively. (b) Parallel workflow execution: total duration of all workflow instances in minutes for 10, 10, 5, and 3 parallel instances over the same data sizes; for the three largest data sizes, original ODE aborts with a main memory overload after 10 to 17 minutes.]

For a data size of 100 KB, ODE-TI already reduces the workflow duration by nearly a factor of 3 – and even by a factor of more than 8 when using 500 KB. For 4 and 9 MB, original ODE was not able to execute the entire workflow even once. Due to main memory overloads in the workflow engine, it crashed after about 200 or 100 iterations of the loop, respectively. Note that original ODE was likewise not able to carry out the workflow for the larger data set of 50 MB, so the corresponding results are omitted from Figure 7.7. ODE-TI, however, successfully executed all instances of the workflow for all data sizes. Altogether, these results again confirm most of the arguments discussed in Section 7.2.3. In particular, the database system is obviously able to deal with large data sets as well as with complex and repeating operations in a more efficient and robust manner than the workflow engine. Figure 7.7b presents the results for parallel workflow execution. Here, the workflow has been executed ten times for 100 KB and 500 KB, five times for 4 MB, and three times for 9 MB. For each data size, all parallel workflow instances have been started at the same time. The figure shows the respective total durations until the last instance has finished. ODE-TI outperforms original ODE in all of these cases. It reduces the total duration by a factor of more than 2 already for 100 KB. For the larger data sizes, original ODE was not able to execute even one of the workflow instances; after 10 to 17 minutes, it again aborted with a main memory overload. ODE-TI in turn could execute all parallel instances for all data sizes within acceptable time periods. This again confirms that the techniques depicted in Figure 7.4 lead to significantly improved performance and robustness for data-intensive workflows.

7.4 Summary and Future Work

The large set of currently available optimization approaches mainly focuses on accelerating operations that process data externally to a workflow. However, approaches to optimize local data processing tasks in workflows are rarely proposed by related work. This is especially true for workflows that are described via control-flow-oriented workflow languages. This chapter therefore proposes a novel optimization approach that is targeted at this neglected setting. This approach introduces various techniques to partition relevant local data processing tasks between the components of a control-flow-based workflow system in a smart way. The techniques encompass the execution of variable assignments, expression evaluations for control flow decisions, and service calls. Previous workflow systems could only carry out such tasks within the workflow execution engine. The introduced techniques additionally allow for pushing them down to an integrated local database system. A prototypical implementation has been the basis to evaluate the effectiveness of these techniques. The test results especially demonstrate that the proposed approach significantly improves performance and robustness of the local data processing in data-intensive workflows.

Future work should mainly aim at further improving the efficiency and scalability of the ODE-TI prototype. For instance, additional index structures in the local IBM DB2 database system that are tailored to the XML data format may further increase the performance of read operations [NKC09]. The efficiency of local data processing in workflows may also be enhanced by employing main-memory-based database systems. All this may make the ODE-TI prototype applicable even to smaller data sizes and to less complex operations or workflows.
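As a hint at what such index support could look like, the following statement sketches a DB2 pureXML index over the same hypothetical variable table as in the earlier pushdown sketch; the table name, column, and path expression are assumptions made for illustration and are not part of the current ODE-TI prototype.

-- Hypothetical XML pattern index on the engine's variable table; it could
-- speed up repeated XPath reads on protein identifiers without changing
-- the pushdown statements themselves.
CREATE INDEX IDX_PROTEIN_ID
ON XODE_VARIABLE (DATA)
GENERATE KEY USING XMLPATTERN '/proteins/sequence/@id'
AS SQL VARCHAR(64);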

Chapter 8 Conclusion and Future Work

Scientific workflows traditionally focus on the implementation of sophisticated pipelines for data analyses [TDG07, KWK+09]. Nowadays, workflow technology is increasingly adopted for computer-based simulations as well [GSK+11]. Corresponding simulation workflows mainly orchestrate the interaction with usually proprietary simulation tools. These simulation tools perform long-running numerical calculations that realize mathematical simulation models, e. g., based on differential equations [Hum90, Har96]. The input data for the simulation tools often come from diverse data sources that manage their data in a multiplicity of proprietary formats. Simulation workflows therefore have to carry out many complex data provisioning tasks that filter and transform these heterogeneous input data so that the simulation tools can properly ingest them [RLSR+06, RS14]. This results in a high effort to be spent on designing such complex data provisioning tasks [RLSR+06, CBL11].

This design effort is often further increased by one prevalent trend in simulation research. To make simulations more realistic, scientists couple mathematical simulation models from several scientific domains in simulation workflows [KSR+11, GZC14]. This allows them to cover various levels of granularity in simulation calculations, which increases the potential to produce precise results [Gat14]. In such coupled simulations, the result data of one simulation model are used as input for other simulation models. Different simulation models are often implemented by different simulation tools, where each tool may rely on proprietary ways to manage its data [RSM14b]. This further increases the heterogeneity of data resources and data formats. Furthermore, the various levels of granularity involved in coupled simulation calculations make it necessary to interpolate or extrapolate data accordingly [Gat14]. All this makes the implementation of data transformations in simulation workflows even more sophisticated. These issues are highlighted by the background information introduced in Chapter 2 – and especially by the discussion of the application area of data management in simulation workflows given in Chapter 3.

Scientists typically define and execute their simulation workflows on their own. In doing so, they have to spend a considerable effort to specify or implement the mentioned data transformations that realize the data provisioning for and the data exchange between simulation tools [RLSR+06, CBL11]. This brings in additional complexity for scientists, who typically do not have adequate skills in workflow design or data management. It hinders them from concentrating on their core issues, namely the development of mathematical simulation models, the execution of simulation calculations, and the interpretation of their results.

The main contribution of this thesis is a set of novel concepts and methods that remove the burden from scientists to implement such complex data transformations. Individual concepts and methods are introduced in Chapters 4 to 7. These chapters not only illustrate the respective approaches and their design considerations. In addition, each chapter covers a comprehensive discussion of related work and a profound evaluation of the proposed concepts and methods. These evaluations have been backed up by elaborate prototypical implementations and by their application to several real-world simulations, e. g., to the bone simulation introduced in Section 1.1. In the following, Section 8.1 summarizes the concrete contributions offered by the proposed concepts and methods. Afterwards, Section 8.2 discusses possible future work.

8.1 Summary of the Contributions

Chapter 4 deals with the issue that most existing workflow systems or other simulation tools offer a multiplicity of diverse data provisioning techniques. Examples are application-specific services or customized workflow extension activities [LAB+06, VSRM08, GSK+11]. Scientists are often overwhelmed with the diversity and complexity of available data provisioning techniques, which frequently leads them to not leverage these techniques at all [CBL11]. This thesis conquers this diversity and complexity by a systematic classification of available techniques into representative data provisioning concepts. The resulting concepts are (1) data services encapsulating access to data resources, (2) data management activities that allow for seamless access to such resources, and (3) local data processing within workflows. Chapter 4 then discusses the results of evaluating these concepts with respect to data management patterns that typically occur in simulation workflows and also by considering relevant non-functional issues. The evaluation results are furthermore used to derive and assess a set of guidelines that assist scientists in choosing data provisioning techniques for their workflows. Another outcome of these discussions is a list of essential features existing workflow systems do not support (see Table 4.3). Hence, the last contribution introduced in Chapter 4 is an extended simulation workflow system that offers all missing features in a holistic way. Most extensions covered by this simulation workflow system correspond to the contributions introduced in the succeeding chapters.

One missing feature of existing workflow systems is that they lack a generic solution to data provisioning in simulation workflows. More precisely, they are often restricted to certain data resources, data formats, and/or data management operations. Among related work, only ETL technology is sufficiently generic to support all kinds of resources, formats, and operations that are commonly required by computer-based simulations, as summarized in Tables 2.1 and 3.1 [KRRT98]. However, corresponding ETL tools have a decisive drawback that prevents their adoption for simulations. They offer a diversity of complex ETL operators, which may overwhelm scientists when designing data provisioning pipelines. Chapter 5 introduces and assesses a set of extensions to conventional workflow languages that transfer the general ideas of ETL technology into an ETL workflow approach [MMLW05, RRS+11]. These extensions offer the necessary generic solution, but they do not have the decisive drawback of ETL technology. In fact, they correspond to only four reasonable types of generic data management activities. So, scientists are not overwhelmed with a vast number of diverse workflow building blocks. Nevertheless, the proposed data management activities allow for specifying a broad range of data management operations for any kind of data resource or data format. Thereby, they support any operation that may be specified via the command languages offered by the involved data resources, e. g., SQL statements or shell commands. A prototypical implementation and its evaluation have confirmed that the activities are sufficient to design the data provisioning for virtually all simulation examples of various scientific domains. Even with the proposed data management activities, however, scientists still have to specify many complex data management operations via low-level command languages.
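For illustration, the following statement shows the kind of low-level operation such a generic data management activity could embed verbatim; all table and column names are invented for this sketch and do not stem from a concrete simulation example in this thesis.

-- Sketch of a data provisioning operation embedded in a data management
-- activity: filter raw measurement data and load it into the input table
-- expected by a simulation tool (the schema is purely illustrative).
INSERT INTO sim_input_load (node_id, x, y, z, load_value)
SELECT n.node_id, n.x, n.y, n.z, m.value
FROM   fem_nodes       n
JOIN   measurement_raw m ON m.sensor_id = n.sensor_id
WHERE  m.run_id = 42
  AND  m.value IS NOT NULL;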
In some scenarios, the high complexity of these operations even prevents scientists from specifying them at all. This is for instance the case for the sophisticated data exchange between the bio-mechanical and systems-biological calculations of the bone simulation, as discussed in Section 3.2.2. It is a common issue for multi-scale simulations that are coupled across different scientific domains, since such simulations require many low-level operations, e. g., to filter, aggregate, interpolate, or extrapolate data [Gat14, RSM14b]. Chapter 6 proposes a novel pattern-based approach that significantly enhances the abstraction support for simulation workflow design [RS13b, RS14, RSM14a, RSM14b]. It thereby completely removes the burden from scientists to specify any low-level operations in their workflows. In contrast to related work from several research areas, this pattern-based approach provides scientists with all essential benefits listed in Section 1.2.2. Scientists only need to select a small number of abstract patterns to describe the high-level simulation process they have in mind. These abstract patterns are especially meaningful to scientists, as they represent use cases they are interested in, e. g., coupling simulation models. On this note, the patterns allow for a domain-specific and thus easy parameterization. In fact, all pattern parameters correspond to terms or concepts scientists already know from their simulation models or simulation methods. A rule-based transformation approach makes it possible to map such high-level process models and patterns onto executable simulation workflows. As another major feature, a pattern hierarchy with clearly distinguished abstraction levels of patterns facilitates a holistic separation of concerns. It is a key enabler to incorporate different kinds of persons and their specific skills into workflow design, e. g., not only scientists, but also data engineers. Altogether, this conquers the data complexity associated with simulation workflows, which allows scientists to concentrate on their core issue again, namely on the simulation itself.
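To give an impression of what the rule-based transformation shields scientists from, the following statement sketches one of the low-level operations a simulation-model coupling pattern could be mapped onto, namely an aggregation step of the data exchange between a bio-mechanical and a systems-biological calculation. The schema and the concrete aggregation are assumptions for illustration only and do not reproduce the actual workflow fragments generated for the bone simulation.

-- Illustrative aggregation step derived from a high-level coupling pattern:
-- condense bio-mechanical results per finite element before they are handed
-- to the systems-biological calculation (tables and columns are assumed).
INSERT INTO sb_model_input (element_id, avg_stress, max_stress)
SELECT bm.element_id, AVG(bm.stress), MAX(bm.stress)
FROM   biomech_results bm
WHERE  bm.time_step = 10
GROUP BY bm.element_id;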

Finally, Chapter 7 deals with optimization approaches to increase the efficiency of data-intensive workflows [RSM11]. Most related work focuses on methods to accelerate operations that process data externally to a workflow. Furthermore, existing approaches are often tailored to workflows that are described via dataflow-oriented languages. However, the question of how to optimize local data processing tasks within workflows that are governed by control-flow-oriented languages has largely been neglected in previous work [Mül10, Wag11]. This thesis introduces a complementary optimization approach that is targeted at this neglected setting. This approach introduces various techniques that partition relevant local data processing tasks between the components of a workflow system in a smart way. Thereby, such tasks are either assigned to the workflow execution engine or to a tightly integrated local database system. A set of experiments revealed the full potential of this novel approach. In particular, the experimental results demonstrate that the introduced techniques significantly improve performance and robustness of the local data processing in data-intensive workflows.

Altogether, the concepts and methods proposed by this thesis fill several gaps in current research regarding data provisioning in simulation workflows. In particular, they provide a holistic abstraction for the complex design of such workflows and thereby make simulation workflow design tailor-made for scientists. This has also been confirmed by the results of the profound evaluations of the individual proposed concepts and methods.

8.2 Future Work

The presented thesis and its contributions summarized in Section 8.1 constitute a good basis for future research opportunities. Firstly, the generic data management activities proposed in Chapter 5 facilitate various kinds of optimizations to increase the efficiency of simulation workflows. The main reason is that these activities treat descriptions of data management operations as integral parts of workflow models. This in turn enables optimizations over the whole spectrum from the workflow to the data level [MMLW05, VSRM08]. For instance, Vrhovnik et al. propose a rule-based approach to re-structure workflow models with embedded SQL statements [VSS+07]. Kalyoncu discusses how to apply this approach to simulation workflows that make use of the generic data management activities proposed in Chapter 5 [Kal15]. On this note, the performance of simulation workflows may also be increased by employing scalable and parallel data processing facilities. In particular, Gessler and Dehghanipour demonstrate that MapReduce-based systems and cloud computing technologies are good candidates to accelerate computer-based simulations [Ges14, Deh15]. Future work may further investigate this optimization potential of the data management activities proposed in Chapter 5. This should especially be underpinned by more in-depth evaluations of the mentioned or of other optimization approaches.

The pattern-based approach to simulation workflow design introduced in Chapter 6 likewise facilitates various ways to optimize workflows. In particular, future work may investigate how to integrate rule-based optimization decisions into the rewrite rules that transform patterns into executable workflows [Kal15]. Thereby, the major goal is to find efficient workflow fragments realizing given patterns. These workflow fragments may again be based on scalable data processing facilities relying on MapReduce or on cloud computing technologies [DG08, LQ14].

The approach to optimize local data processing tasks in workflows introduced in Chapter 7 offers opportunities for additional research as well. One goal of corresponding future work may be to increase the efficiency and scalability of this approach and of the underlying ODE-TI prototype. This should also be underpinned by further experimental tests. An interesting question is whether XML-native index structures in the local database system may accelerate read operations [NKC09]. Another possibility to further increase the efficiency of local data processing tasks is to employ main-memory-based database systems.

The extended simulation workflow system proposed in Section 4.4.2 and depicted in Figure 4.7 includes a provenance framework that integrates and analyzes different provenance information sources. This provenance framework is however not yet implemented in the current prototype of the extended workflow system. Hence, this implementation and the evaluation of the resulting prototype is the next obvious opportunity for future research. One focus of this future evaluation is how the provenance framework may enhance the reproducibility of simulations and of their outcomes [HTT09]. Another question is to what extent it may support a holistic optimization of the data processing in workflows. For instance, it is important to discuss how the above-mentioned optimization approaches may use the additional information derived by the provenance framework in order to make better optimization decisions [RM09].
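As an example of this last point, the following query sketches how a provenance framework could expose runtime statistics that an optimizer might consult, e. g., to decide for which data sizes a pushdown into the database pays off. The provenance schema shown here is a pure assumption and not part of any prototype described in this thesis.

-- Hypothetical provenance schema: which activities process how much data,
-- and how long do they take on average? Such figures could feed rule-based
-- optimization decisions.
SELECT a.activity_name,
       AVG(r.input_size_kb) AS avg_input_kb,
       AVG(r.duration_ms)   AS avg_duration_ms
FROM   prov_activity_run r
JOIN   prov_activity     a ON a.activity_id = r.activity_id
GROUP BY a.activity_name
ORDER BY avg_duration_ms DESC;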
Finally, and no less relevant, future work may further enhance the already high generality of the contributions introduced in this thesis. In particular, the proposed concepts and methods should be applied to application areas other than simulations, e. g., to the other workflow classes shown in Figure 2.6. According to Hirmer et al. [HRWM15], the pattern-based approach introduced in Chapter 6 is especially suited to foster and ease workflow design in other application areas. Note that enhancing the generality of this pattern-based approach may also require the development of new patterns, e. g., patterns representing complex analytical operations in scientific workflows [TDG07, KWK+09]. This may go hand in hand with extending the pattern hierarchy shown in Figure 6.5 by additional abstraction levels. All this offers great potential to further enhance and exploit the accompanying separation of concerns in workflow design.

Author Publications

• Peter Reimann, Michael Reiter, Holger Schwarz, Dimka Karastoyanova, and Frank Leymann. SIMPL – A Framework for Accessing External Data in Simulation Workflows. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme für Business, Technologie und Web (BTW 2011), pages 534–553, Kaiserslautern, Germany, 2011.

• Peter Reimann, Holger Schwarz, and Bernhard Mitschang. Design, Implementation, and Evaluation of a Tight Integration of Database and Workflow Engines. Journal of Information and Data Management, 2(3):353–368, 2011.

• Peter Reimann and Holger Schwarz. Datenmanagementpatterns in Simulationsworkflows. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme für Business, Technologie und Web (BTW 2013), pages 279–293, Magdeburg, Germany, 2013.

• Peter Reimann, Holger Schwarz, and Bernhard Mitschang. Data Patterns to Alleviate the Design of Scientific Workflows Exemplified by a Bone Simulation. In Proc. of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2014.

• Peter Reimann and Holger Schwarz. Simulation Workflow Design Tailor-Made for Scientists. In Proc. of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2014.

• Peter Reimann, Tim Waizenegger, Matthias Wieland, and Holger Schwarz. Datenmanagement in der Cloud für den Bereich Simulationen und Wissenschaftliches Rechnen. In Proc. of the 2nd Workshop Data Management in the Cloud (DMC 2014) in conjunction with the 44th Jahrestagung der Gesellschaft für Informatik (GI), Stuttgart, Germany, 2014.

• Peter Reimann, Holger Schwarz, and Bernhard Mitschang. A Pattern Approach to Conquer the Data Complexity in Simulation Workflow Design. In R. Meersman et al., editor, Proc. of the OnTheMove Federated Conferences and Workshops (OTM 2014), 22nd International Conference on Cooperative Information Systems (CoopIS 2014), volume 8841 of LNCS, pages 21–38, Amantea, Italy, 2014. Springer-Verlag, Berlin Heidelberg.

• Jorge Minguez, Peter Reimann, and Sema Zor. Event-driven Business Process Management in Engineer-to-Order Supply Chains. In Proc. of the 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2011), pages 624–631, Lausanne, Switzerland, 2011.

• Stefan Silcher, Jan Königsberger, Peter Reimann, and Bernhard Mitschang. Cooperative Service Registries for the Service-based Product Lifecycle Management Architecture. In Proc. of the 17th IEEE International Conference on Computer Supported Cooperative Work in Design (CSCWD 2013), pages 439–446, Whistler, BC, Canada, 2013.

• Robert Krause, Frank Allgöwer, Dimka Karastoyanova, Frank Leymann, Bernd Markert, Bernhard Mitschang, Peter Reimann, Michael Reiter, Daniella Schittler, Syn Schmitt, Steffen Waldherr, and Wolfgang Ehlers. Scientific Workflows for Bone Remodelling Simulations. Applied Mathematics and Mechanics, 13(1), 2013.

• Pascal Hirmer, Peter Reimann, Matthias Wieland, and Bernhard Mitschang. Extended Techniques for Flexible Modeling and Execution of Data Mashups. In Markus Helfert, Andreas Holzinger, Orlando Belo, and Chiara Francalanci, editors, Proc. of the 4th International Conference on Data Management Technologies and Applications (DATA), pages 111–122, Colmar, France, 2015. SciTePress.

Bibliography

[ABJF06] Ilkay Altintas, Oscar Barney, and Efrat Jaeger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. In Proc. of the 2006 International Conference on Provenance and Annotation of Data, IPAW’06, pages 118–132, Chicago, IL, USA, 2006. Springer-Verlag, Berlin Heidelberg, Germany. [ABLM14] Marcelo Arenas, Pablo Barcel, Leonid Libkin, and Filip Murlak. Foundations of Data Exchange. Cambridge University Press, New York, NY, USA, 2014. [Ace12] Miguel F. Acevedo. Simulation of Ecological and Environmental Models. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA, 2012. [Age12] Christian Ageu. Erweiterung und Evaluation eines Prototyps für eine enge Integration zwischen Datenbank- und Workflow- Engines. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2012. [Alb96] Peter Albrecht. The Runge-Kutta Theory in a Nutshell. SIAM Journal on Numerical Analysis, 33(5):1712–1735, 1996. [AMA06] Asif Akram, David Meredith, and Rob Allan. Evaluation of BPEL to Scientific Workflows. In Proc. of the 6th International Symposium on Cluster Computing and the Grid, CCGRID ’06, pages 269–274, Singapore, Malaysia, 2006. IEEE. [AP97] Uri M. Ascher and Linda R. Petzold. Computer Method for Ordi- nary Differential Equations and Differential Algebraic Equations. SIAM, 1997. 234 Bibliography

[Ari12] Stavros Aristidou. Abstraktionsunterstützung für die Definition des Datenmanagements in Simulationsworkflows. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2012. [ARR13] Raghu Anantharangachar, Srinivasan Ramani, and S. Ra- jagopalan. Ontology Guided Information Extraction from Un- structured Text. International Journal of Web & Semantic Tech- nology (IJWesT), 4(1):19–36, 2013. [Bai02] Zhaojun Bai. Krylov Subspace Techniques for Reduced-Order Modeling of Large-Scale Dynamical Systems. Applied Numerical Mathematics, 43:9–44, 2002. [BAJ+10] Derik Barseghian, Ilkay Altintas, Matthew B. Jones, Daniel Crawl, Nathan Potter, James Gallagher, Peter Cornillon, Mark Schildhauer, Elizabeth T. Borer, Eric W. Seabloom, and Parviez R. Hosseini. Workflows and Extensions to the Kepler Scientific Workflow System to Support Environmental Sensor Data Access and Analysis. Ecological Informatics, 5(1):42–50, 2010. [BBBD12] Wouter Buytaert, Selene Baez, Macarena Bustamante, and Art Dewulf. Web-Based Environmental Simulation: Bridging the Gap between Scientific Modeling and Decision-Making. Environmental Science and Technology, 46(4):1971–1976, 2012. [BBF+09] Stefan Bauer, Jochen Boy, Arnulf Fröhlich, Matthias Grau, Karl Gruber, Uwe Krempels, Thomas Merkt, Katja von Merten, Wolf- gang Schlüter, and Stefan Sebrantke. Automotive CAE Inte- gration – Requirements and Evaluation of Interfaces. Technical report, AUDI AG, BMW AG, Daimler AG, Dr. Ing. h. c. F. Porsche AG, Volkswagen AG, PROSTEP AG, 2009. [BBH+13] Tobias Binz, Uwe Breitenbücher, Florian Haupt, Oliver Kopp, Frank Leymann, Alexander Nowak, and Sebastian Wagner. Open- TOSCA – A Runtime for TOSCA-based Cloud Applications. In Proc. of 11th International Conference on Service-Oriented Com- puting (ICSOC’13), volume 8274 of LNCS, pages 692–695, Berlin, Germany, 2013. Springer-Verlag, Berlin Heidelberg. Bibliography 235

[BBK+14] Uwe Breitenbücher, Tobias Binz, Kálmán Képes, Oliver Kopp, Frank Leymann, and Johannes Wettinger. Combining Declar- ative and Imperative Cloud Application Provisioning based on TOSCA. In Proc. of the IEEE International Conference on Cloud Engineering (IC2E), pages 87–96. IEEE Computer Society, 2014. [BDH11] Paul Blanchard, Robert L. Devaney, and Glen R. Hall. Differ- ential Equations. Brooks/Cole Thomson Learning, Boston, MA, USA, 2011. [BHW+07] Matthias Böhm, Dirk Habich, Uwe Wloka, Jüurgen Bittner, and Wolfgang Lehner. Towards Self-Optimization of Message Transformation Processes. In Communications of the 11th East- European Conference on Advances in Databases and Information Systems (ADBIS), Varna, Bulgaria, 2007. [BJA+08] Roger Barga, J. Jackson, N. Araujo, D. Guo, N. Gautam, and Y. Simmhan. The Trident Scientific Workflow Workbench. In Proc. of the 4th International Conference on e-Science, pages 317–318, Indianapolis, IN, USA, 2008. [BKLW99] Susanne Busse, Ralf-Detlef Kutsche, Ulf Leser, and Herbert Weber. Federated Information Systems: Concepts, Terminology and Architectures. Technical report, Department of Computer Science at the Technische Universität Berlin, 1999. [BKML+10] Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and Eric W. Sayers. Genbank. Nucleic Acids Research, 38(suppl 1):D46–D51, 2010. [BKR11] Andreas Benzing, Boris Koldehofe, and Kurt Rothermel. Efficient Support for Multi-Resolution Queries in Global Sensor Networks. In Proc.of the 5th International Conference on Communication System Software and Middleware, Verona, Italy, 2011. ACM. [BL05] Shawn Bowers and Bertram Ludäscher. Actor-Oriented Design of Scientific Workflows. In 24th International Conference on Conceptual Modeling, pages 369–384, Klagenfurt, Austria, 2005. Springer-Verlag Berlin Heidelberg. [Boh14] Andreas Bohrn. Pattern-basierte Definition der Datenbereit- stellung für Simulationen zu Strukturänderungen in Knochen. Student thesis, University of Stuttgart, Institute of Parallel and 236 Bibliography

Distributed Systems, 2014. [Boh15] Andreas Bohrn. Abbildung von Datenmanagementpatterns auf ausführbare Workflowfragmente. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2015. [BS00] Ed F. Begley and Charles P. Sturrock. MatML: An XML for Standardizing Web-based Materials Property Data. Journal of the Minerals, Metals and Materials Society, 52(7):56–56, 2000. [BTS07] Jeremy M. Berg, John L. Tymoczko, and Lubert Stryer. Bio- chemistry. W. H. Freeman and Co., New York City, NY, USA, 2007. [CBL11] Sarah Cohen-Boulakia and Ulf Leser. Search, Adapt, and Reuse: The Future of Scientific Workflows. SIGMOD Record, 40(2):6–16, 2011. [CEB+09] Nazario Cipriani, Mike Eissele, Andreas Brodt, Matthias Groß- mann, and Bernhard Mitschang. NexusDS: A Flexible and Ex- tensible Middleware for Distributed Stream Processing. In Proc. of the 2009 International Symposium on Database Engineering and Applications, IDEAS ’09, pages 152–161. ACM, 2009. [CGH+06] David Churches, Gabor Gombas, Andrew Harrison, Jason Maassen, Craig Robinson, Matthew Shields, Ian Taylor, and Ian Wang. Programming Scientific and Distributed Workflow with Triana Services. Concurrency and Computation: Practice and Experience, 18(10):1021–1037, 2006. [CGP00] Oscar Corcho and Asunción Gómez-Pérez. Evaluating Knowledge Representation and Reasoning Capabilities of Ontology Speci- fication Languages. In Proc. of the ECAI 2000 Workshop on Applications of Ontologies and Problem-Solving Methods, Berlin, Germany, 2000. [CH09] David Cohn and Richard Hull. Business Artifacts: A Data-centric Approach to Modeling Business Operations and Processes. IEEE Data Engineering Bulletin, 32(3):3–9, 2009. [CJLP03] Zhimin Chen, H. V. Jagadish, Laks V. S. Lakshmanan, and Stelios Paparizos. From Tree Patterns to Generalized Tree Pat- terns: On Efficient Evaluation of XQuery. In Proc. of the 29th Bibliography 237

International Conference on Very Large Data Bases, VLDB ’03, pages 237–248, Berlin, Germany, 2003. VLDB Endowment. [CMJNG08] Daniel L. Cook, Jose L. V. Mejino Jr., Maxwell L. Neal, and John H. Gennari. Bridging Biological Ontologies and Biosimula- tion: The Ontology of Physics for Biology. In Proc. of the AMIA Annual Symposium, pages 136–140, Washington, DC, USA, 2008. [CMPW07] Robert D. Cook, David S. Malkus, Michael E. Plesha, and Robert J. Witt. Concepts and Applications of Finite Element Analysis. John Wiley & Sons, New York, NY, USA, 2007. [Cod70] Edgar F. Codd. A Relational Model of Data for Large Shared Data Banks. Commununications of the ACM, 13(6):377–387, 1970. [COdO+10] Fábio Coutinho, Eduardo Ogasawara, Daniel de Oliveira, Vanessa Braganholo, Alexandre Lima, Alberto Dávila, and Marta Mat- toso. Data Parallelism in Bioinformatics Workflows using Hydra. In Proc. of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC 2010), Chicago, IL, USA, 2010. [COJ+10] Yifeng Cui, Kim B. Olsen, Thomas H. Jordan, Kwangyoon Lee, Jun Zhou, Patrick Small, Daniel Roten, Geoffrey Ely, Dha- baleswar K. Panda, Amit Chourasia, John Levesque, Steven M. Day, and Philip Maechling. Scalable Earthquake Simulation on Petascale Supercomputers. In Proc. of the 2010 ACM/IEEE International Conference for High Performance Computing, Net- working, Storage and Analysis, SC ’10, pages 1–20, New Orleans, LA, USA, 2010. IEEE Computer Society. [CVDK+12] Víctor Cuevas-Vicenttín, Saumen Dey, Sven Köhler, Sean Riddle, and Bertram Ludäscher. Scientific Workflows and Provenance: Introduction and Research Opportunities. Datenbank-Spektrum, 12(3):193–203, 2012. [CWGN11] Nazario Cipriani, Matthias Wieland, Matthias Grossmann, and Daniela Nicklas. Tool Support for the Design and Management of Context Models. Information Systems, 36(1):99–114, 2011. [dBS73] Carl de Boor and Blair Swartz. Collocation at Gaussian Points. SIAM Journal on Numerical Analysis, 10(4):582–606, 1973. 238 Bibliography

[DC08] Ewa Deelman and Ann Chervenak. Data Management Chal- lenges of Data-Intensive Scientific Workflows. In Proc. of the 8th International Symposium on Cluster Computing and the Grid, CCGRID ’08, Lyon, France, 2008. IEEE. [DCBS+10] Sergio M. S. Da Cruz, Vanessa Batista, Edno Silva, Frederico Tosta, Clarissa Vilela, Rafael Cuadrat, Diogo Tschoeke, Alberto M. R. Davila, Maria L. M. Campos, and Marta Mattoso. De- tecting Distant Homologies on Protozoans Metabolic Pathways using Scientific Workflows. International Journal of Data Mining and Bioinformatics (IJDMB), 4(3):256–280, 2010. [DCM+12] L. Dou, G. Cao, Paul J. Morris, R. A. Morris, Bertram Ludäscher, James A. Macklin, and James Hanken. Kurator: A Kepler Pack- age for Data Curation Workflows. In Proc. of the International Conference on Computational Science, ICCS 2012, pages 1614– 1619, Omaha, NE, USA, 2012. [Deh15] Marzieh Dehghanipour. Design and Implementation of TOSCA Service Templates for Provisioning and Executing Bone Sim- ulations in Cloud Environments. Master thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2015. [DF08] Susan B. Davidson and Juliana Freire. Provenance and Scientific Workflows: Challenges and Opportunities. In Proc. of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD ’08), pages 1345–1350, Vancouver, Canada, 2008. ACM. [DG08] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1):107–113, 2008. [DHW+08] Stefan Dessloch, Mauricio A. Hernández, Ryan Wisnesky, Ahmed Radwan, and Jindan Zhou. Orchid: Integrating Schema Mapping and ETL. In Proc. of the 24th International Conference on Data Engineering, ICDE ’08, pages 1307–1316, Cancún, México, 2008. IEEE. [DM14] Florian Daniel and Maristella Matera. Mashups: Concepts, Mod- els and Architectures. Springer-Verlag, Berlin Heidelberg, Ger- many, 2014. Bibliography 239

[DMHBT07] Tien-Tuan Dao, Frédéric Marin, and Marie Christine Ho Ba Tho. Ontology of the Musculo-Skeletal System of the Lower Limbs. In Proc. of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 386–389, Lyon, France, 2007. IEEE. [dOCT+09] Daniel de Oliveira, Luiz Cunha, Luiz Tomaz, Vinicius Pereira, and Marta Mattoso. Using Ontologies to Support Deep Water Oil Exploration Scientific Workflows. In Proc. of the 2009 Congress on Services – I, pages 364–367, Los Angeles, CA, USA, 2009. IEEE. [dOOD+12] Daniel de Oliveira, Eduardo Ogasawara, Jonas Dias, Fernanda Baiao, and Marta Mattoso. Ontology-based Semi-automatic Workflow Composition. Journal of Information and Data Man- agement, 3(1):61–72, 2012. [Dor11] Raymond Dormien. Service-Bus-Erweiterung um Pandas-basierte Simulationen in Workflows zu nutzen. Diploma thesis, University of Stuttgart, Institute of Architecture of Application Systems, 2011. [DP10] Waltenegus Dargie and Christian Poellabauer. Fundamentals of Wireless Sensor Networks: Theory and Practice. John Wiley and Sons, Chichester, UK, 2010. [DSS+05] Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John Good, Anastasia Laity, Joseph C. Jacob, and Daniel S. Katz. Pegasus: A Framework for Map- ping Complex Scientific Workflows Onto Distributed Systems. Scientific Programming, 13(3):219–237, 2005. [Ehl09] Wolfgang Ehlers. Challenges of Porous Media Models in Geo- and Biomechanical Engineering including Electro-Chemically Active Polymers and Gels. International Journal of Advances in Engineering Sciences and Applied Mathematics, 1(1):1–24, 2009. [EKM11] Wolfgang Ehlers, Robert Krause, and Bernd Markert. Mod- elling and Remodelling of Biological Tissue in the Framework of Continuum Biomechanics. Applied Mathematics and Mechanics, 11(1):35–38, 2011. 240 Bibliography

[EL96] Johann Eder and Walter Liebhart. Workflow Recovery. In Proc. of the 1st International Conference on Cooperative Information Systems (CoopIS 1996), pages 124–134, Brussels, Belgium, 1996. IEEE. [FDP15] Fabian Franzelin, Patrick Diehl, and Dirk Pflüger. Non-intrusive Uncertainty Quantification with Sparse Grids for Multivariate Peridynamic Simulations. In Michael Griebel and Marc Alexander Schweitzer, editors, Meshfree Methods for Partial Differential Equations VII, volume 100 of Lecture Notes in Computational Science and Engineering, pages 115–143. Springer International Publishing, 2015. [FE11] Jörg Fehr and Peter Eberhard. Simulation Process of Flexible Multibody Systems with Non-modal Model Order Reduction Techniques. Multibody System Dynamics, 25(3):313–334, 2011. [FHS09] Christian Fritz, Richard Hull, and Jianwen Su. Automatic Con- struction of Simple Artifact-based Business Processes. In Proc. of the 12th International Conference on Database Theory, ICDT ’09, pages 225–238, St. Petersburg, Russia, 2009. ACM. [FKSS08] Juliana Freire, David Koop, Emanuele Santos, and Cláudio T. Silva. Provenance for Computational Tasks: A Survey. Comput- ing in Science and Engineering, 10(3):11–21, 2008. [FSC+06] Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, and Huy T. Vo. Managing Rapidly- Evolving Scientific Workflows. In Proc. of the 2006 International Conference on Provenance and Annotation of Data, IPAW’06, pages 10–18, Chicago, IL, USA, 2006. Springer-Verlag. [Gat14] Bernhard Gatzhammer. Efficient and Flexible Partitioned Sim- ulation of Fluid-Structure Interactions. PhD thesis, Technische Universität München, Chair of Scientific Computing at the De- partment of Informatics, 2014. [GCMS12] Katarina Grolinger, Miriam A. M. Capretz, José R. Marti, and Krishan Srivastava. Ontology–based Representation of Simula- tion Models. In Proc. of the 24th International Conference on Software Engineering and Knowledge Engineering, SEKE 2012, pages 432–437, Redwood City, CA, USA, 2012. Bibliography 241

[GDE+07] Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the Challenges of Scientific Workflows. IEEE Computer, 40(12):24–32, 2007. [Ges14] Alexander Gessler. MapReduce to Couple a Bio-mechanical and a Systems-biological Simulation. Bachelor thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2014. [GGCFEl07] Fernando García-García, Gerardo Cisneros, Agustín Fernández- Eguiarte, and Román Álvarez. Numerical Simulations in the Environmental and Earth Sciences. Cambridge University Press, Cambridge, UK, 2007. [GHCM09] Thilina Gunarathne, Chathura Herath, Eran Chinthaka, and Suresh Marru. Experience with Adapting a WS-BPEL Runtime for eScience Workflows. In Proc. of the 5th Grid Computing Environments Workshop, Portland, OR, USA, 2009. [GRS07] Christian Grossmann, Hans-Görg Roos, and Martin Stynes. Nu- merical Treatment of Partial Differential Equations. Universitext. Springer-Verlag, Berlin, Heidelberg, New York, 2007. [GSAH+15] Santiago Gómez Sáez, Vasilios Andrikopoulos, Michael Hahn, Dimka Karastoyanova, and Andreas Weiß. Enabling Reusable and Adaptive Modeling, Provisioning & Execution of BPEL Processes. In Proc. of the 8th International Conference on Service- Oriented Computing and Applications, SOCA’15, Rome, Italy, 2015. IEEE. [GSK+11] Katharina Görlach, Mirko Sonntag, Dimka Karastoyanova, Frank Leymann, and Michael Reiter. Conventional Workflow Technol- ogy for Scientific Simulation. In Xiaoyu Yang and Lizhe Wang, editors, Guide to e-Science. Springer-Verlag, London, UK, 2011. [GT04] Torsten Grust and Jens Teubner. Relational Algebra: Mother Tongue – XQuery: Fluent. In Vojkan Mihajlovic and Djoerd Hiemstra, editors, Proc. of the 1st Twente Data Management Workshop, CTIT Workshop Proceedings Series, pages 9–16, En- schede, The Netherlands, 2004. Centre for Telematics and Infor- mation Technology (CTIT), University of Twente. 242 Bibliography

[GZC14] Derek Groen, Stefan J. Zasada, and Peter V. Coveney. Survey of Multiscale and Multiphysics Applications and Communities. Computing in Science and Engineering, 16(2):34–43, 2014. [Har96] Stephan Hartmann. The World as a Process: Simulations in the Natural and Social Sciences. In Rainer Hegselmann, Ulrich Mueller, and Klaus Troitzsch, editors, Simulation and Modelling in the Social Sciences from the Philosophy of Science Point of View, Theory and Decision Library, pages 77–100. Kluwer: Dordrecht, 1996. [Hau81] Eberhard Haug. Engineering Safety Analysis via Destructive Numerical Experiments. Engineering Transactions, 29(1):39–49, 1981. [HG05] Gerd Heber and Jim Gray. Supporting Finite Element Analysis with a Relational Database Backend Part I: There is Life beyond Files. Technical Report MSR-TR-2005-49, Microsoft Research, April 2005. [HG06] Gerd Heber and Jim Gray. Supporting Finite Element Analysis with a Relational Database Backend Part II: Database Design and Access. Technical Report MSR-TR-2006-21, Microsoft Research, February 2006. [HI08] Muneo Hori and Tsuyoshi Ichimura. Current State of Integrated Earthquake Simulation for Earthquake Hazard and Disaster. Journal of Seismology, 12(2):307–321, 2008. [HKR13] Nikolas Roman Herbst, Samuel Kounev, and Ralf Reussner. Elasticity in Cloud Computing: What it is, and What it is Not. In Proc. of the 10th International Conference on Autonomic Computing, (ICAC 2013), pages 24–28, San Jose, CA, USA, 2013. [HPD+05] Gerd Heber, Chris Pelkie, Andrew Dolgert, Jim Gray, and David Thompson. Supporting Finite Element Analysis with a Relational Database Backend; Part III: OpenDX — Where the Numbers Come Alive. Technical Report MSR-TR-2005-151, Microsoft Research, November 2005. [HRWM15] Pascal Hirmer, Peter Reimann, Matthias Wieland, and Bernhard Mitschang. Extended Techniques for Flexible Modeling and Exe- cution of Data Mashups. In Markus Helfert, Andreas Holzinger, Bibliography 243

Orlando Belo, and Chiara Francalanci, editors, Proc. of the 4th International Conference on Data Management Technologies and Applications (DATA), pages 111–122, Colmar, France, 2015. SciTePress. [HSR+10] Wolfgang Hüttig, Michael Schneidt, René Rehn, Michael Hahn, Firas Zoabi, Daniel Brüderle, and Xi Tu. Studienprojekt SIMPL – Spezifikation. Internal report of the study project SIMPL, University of Stuttgart, Institute of Architecture of Application Systems and Institute of Parallel and Distributed Systems, 2010. [HTT09] Tony Hey, Stewart Tansley, and Kristin Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Re- search, Redmond, WA, USA, 2009. [Hul08] Richard Hull. Artifact-Centric Business Process Models: Brief Survey of Research Results and Challenges. In Proc. of the On- TheMove Federated Conferences and Workshops (OTM 2008), 7th International Conference on Ontologies, DataBases, and Appli- cations of Semantics (ODBASE 2008), LNCS, pages 1152–1163, Monterrey, Mexico, 2008. Springer-Verlag, Berlin Heidelberg. [Hum90] Paul Humphreys. Computer Simulations. In Proc. of the Biennial Meeting of the Philosophy of Science Association (PSA), pages 497–506, 1990. [Hun12] William Hunter. Recent Advances and Issues in Environmental Science. Apple Academic Press Inc., Toronto, New York, 2012. [IH88] Ronald L. Iman and Jon C. Helton. An Investigation of Uncer- tainty and Sensitivity Analysis Techniques for Computer Models. Risk Analysis, 8(1):71–90, 1988. [ISO11a] ISO: International Organization for Standardization. SQL Mul- timedia and Application Packages – Part 3: Spatial, ISO/IEC 13249-3:2011, 2011. [ISO11b] ISO: International Organization for Standardization. Structured Query Language (SQL) – Part 1: Framework (SQL/Framework), ISO/IEC 9075-1:2011, 2011. [ISO11c] ISO: International Organization for Standardization. Structured Query Language (SQL) – Part 14: XML-Related Specifications 244 Bibliography

(SQL/XML), ISO/IEC 9075-14:2011, 2011. [ISO11d] ISO: International Organization for Standardization. Structured Query Language (SQL) – Part 2: Foundation (SQL/Foundation), ISO/IEC 9075-2:2011, 2011. [ISO11e] ISO: International Organization for Standardization. Structured Query Language (SQL) – Part 4: Persistent Stored Modules (SQL/PSM)), ISO/IEC 9075-4:2011, 2011. [JHM04] Wesley M. Johnston, J. R. Paul Hanna, and Richard J. Millar. Advances in Dataflow Programming Languages. ACM Computing Surveys, 36(1):1–34, 2004. [JMG11] Philipp Janowski, Bernhard Mitschang, and Andreas Gollmann. Issues and Characteristics of Testing as Part of the Design Process in Mechanical Engineering. In Proc. of the 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Lausanne, Switzerland, 2011. [JPT14] Michael Johnson, Jorge Pérez, and James F. Terwilliger. What Can Programming Languages Say About Data Exchange? In Proc. of the 17th International Conference on Extending Database Technology, EDBT 2014, pages 223–228, , Greece, 2014. [KAK+13] Robert Krause, Frank Allgöwer, Dimka Karastoyanova, Frank Leymann, Bernd Markert, Bernhard Mitschang, Peter Reimann, Michael Reiter, Daniella Schittler, Syn Schmitt, Steffen Waldherr, and Wolfgang Ehlers. Scientific Workflows for Bone Remodelling Simulations. Applied Mathematics and Mechanics, 13(1), 2013. [Kal15] Savas Kalyoncu. Optimierung der Datenverarbeitung in Simula- tionsworkflows. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2015. [KFPI11] Vangelis Karkaletsis, Pavlina Fragkou, Georgios Petasis, and Elias Iosif. Ontology Based Information Extraction from Text. In Georgios Paliouras, Constantine D. Spyropoulos, and George Tsatsaronis, editors, Knowledge-Driven Multimedia Information Extraction and Ontology Evolution: Bridging the Semantic Gap, pages 89–109. Springer-Verlag, Berlin, Heidelberg, Germany, 2011. Bibliography 245

[KGK+11] Oliver Kopp, Katharina Görlach, Dimka Karastoyanova, Frank Leymann, Michael Reiter, David Schumm, Mirko Sonntag, Steve Strauch, Tobias Unger, Matthias Wieland, and Rania Khalaf. A Classification of BPEL Extensions. Journal of Systems Integra- tion, 2(4):2–28, 2011. [KKL+05] Matthias Kloppmann, Dieter König, Frank Leymann, Gerhard Pfau, Alan Rickayzen, Claus von Riegen, Patrick Schmidt, and Ivana Trickovic. WS-BPEL Extension for Sub-processes – BPEL- SPE, 2005. [KKL07] Rania Khalaf, Dimka Karastoyanova, and Frank Leymann. Plug- gable Framework for Enabling the Execution of Extended BPEL Behavior. In Proc. of the 3rd International Workshop on En- gineering Service-Oriented Applications: Analysis, Design and Composition, WESOA’2007, Vienna, Austria, 2007. [KKL08a] Rania Khalaf, Oliver Kopp, and Frank Leymann. Maintaining Data Dependencies Across BPEL Process Fragments. Inter- national Journal of Cooperative Information Systems (IJCIS), 17(3):259–282, 2008. [KKL08b] Oliver Kopp, Rania Khalaf, and Frank Leymann. Deriving Explicit Data Links in WS-BPEL Processes. In Proc. of the International Conference on Services Computing, SCC 2008, pages 367–376, Honolulu, Hawaii, USA, 2008. IEEE Computer Society Press. [KME10] Robert Krause, Bernd Markert, and Wolfgang Ehlers. A Porous Media Model for the Description of Adaptive Bone Remodelling. Applied Mathematics and Mechanics, 10(1):79–80, 2010. [Kol05] Phokion G. Kolaitis. Schema Mappings, Data Exchange, and Metadata Management. In Proc. of the 24th SIGMOD Symposium on Principles of Database Systems, PODS ’05, pages 61–75, Baltimore, Maryland, USA, 2005. ACM. [KR11] Vera Künzle and Manfred Reichert. PHILharmonicFlows: To- wards a Framework for Object-aware Process Management. Jour- nal of Software Maintenance and Evolution: Research and Prac- tice, 23(4):205–244, 2011. 246 Bibliography

[Kra14] Robert Krause. Growth, Modelling and Remodelling of Biological Tissue. PhD thesis, University of Stuttgart, Institute of Applied Mechanics (Continuum Mechanics), 2014. [KRRT98] Ralph Kimball, Laura Reeves, Margy Ross, and Warren Thorn- thwaite. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses. John Wiley & Sons, Inc., New York City, NY, USA, 1st edition, 1998. [KSB+10] David Koop, Emanuele Santos, Bela Bauer, Matthias Troyer, Juliana Freire, and Cláudio T. Silva. Bridging Workflow and Data Provenance Using Strong Links. In Proc. of the 22nd International Conference on Scientific and Statistical Database Management, SSDBM’10, pages 397–415, Heidelberg, Germany, 2010. Springer-Verlag. [KSR+11] Robert Krause, Daniella Schittler, Michael Reiter, Steffen Wald- herr, Frank Allgöwer, Dimka Karastoyanova, Frank Leymann, Bernd Markert, and Wolfgang Ehlers. Bone Remodelling: A Com- bined Biomechanical and Systems-Biological Challenge. Applied Mathematics and Mechanics, 11(1):99–100, 2011. [KSW+12] Robert Krause, Daniella Schittler, Steffen Waldherr, Frank All- göwer, Bernd Markert, and Wolfgang Ehlers. Remodelling Pro- cesses in Bones: A Biphasic Porous Media Model. Applied Mathematics and Mechanics, 12(1):131–132, 2012. [KTJT03] Dimitri Komatitsch, Seiji Tsuboi, Chen Ji, and Jeroen Tromp. A 14.6 Billion Degrees of Freedom, 5 Teraflops, 2.5 Terabyte Earthquake Simulation on the Earth Simulator. In Proc. of the 2003 ACM/IEEE Conference on Supercomputing, SC ’03, Phoenix, AZ, USA, 2003. ACM. [KWK+09] Chandrika Kamath, Nikil Wale, George Karypis, Gaurav Pandey, Vipin Kumar, Krishna Rajan, Nagiza F. Samatova, Paul Breimyer, Guruprasad Kora, Chongle Pan, and Srikanth Yo- ginath. Scientific Data Analysis. In Arie Shoshani and Doron Rotem, editors, Scientific Data Management: Challenges, Tech- nology, and Deployment, Computational Science Series, chapter 8, pages 281–323. Chapman & Hall, Boca Raton, FL, USA, 2009. Bibliography 247

[KWvL+07] Dimka Karastoyanova, Branimir Wetzstein, Tammo van Lessen, Daniel Wutke, Jörg Nitzsche, and Frank Leymann. Semantic Ser- vice Bus: Architecture and Implementation of a Next Generation Middleware. In Proc. of the 2nd International ICDE Workshop on Service Engineering, SEIW 2007, pages 347–354, Istanbul, Turkey, 2007. [LAB+06] Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao. Scientific Workflow Management and the Kepler System. Concurrency and Computation: Practice and Experience, 18(10):1039–1065, 2006. [LAB+09] Bertram Ludäscher, Ilkay Altintas, Shawn Bowers, Julian Cum- mings, Terence Critchlow, Ewa Deelman, David De Roure, Ju- liana Freire, Carole Goble, Matthew Jones, Scott Klasky, Timo- thy McPhillips, Norbert Podhorszki, Claudio Silva, Ian Taylor, and Mladen Vouk. Scientific Process Automation and Workflow Management. In Arie Shoshani and Doron Rotem, editors, Sci- entific Data Management: Challenges, Technology, and Deploy- ment, Computational Science Series, chapter 13, pages 467–508. Chapman & Hall, Boca Raton, FL, USA, 2009. [LAG03] Bertram Ludäscher, Ilkay Altintas, and Amarnath Gupta. Com- piling Abstract Scientific Workflows into Web Service Workflows. In Proc. of the 15th International Conference on Scientific and Statistical Database Management (SSDBM 2003), pages 251–254, Cambridge, MA, USA, 2003. IEEE. [Lar01] Tobias Larsson. Multibody Dynamic Simulation in Product Devel- opment. PhD thesis, Lulea University of Technology, Division of Computer Aided Design, Department of Mechanical Engineering, 2001. [Ley95] Frank Leymann. Supporting Business Transactions Via Par- tial Backward Recovery In Workflow Management Systems. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme in Büro, Technik und Wissenschaft (BTW 1995), pages 51–70, Dres- den, Germany, 1995. Springer-Verlag. 248 Bibliography

[Ley03] Frank Leymann. Web Services: Distributed Applications without Limits – an Outline. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme für Business, Technologie und Web (BTW 2003), pages 2–23, Leipzig, Germany, 2003. [Ley05] Frank Leymann. The (Service) Bus: Services Penetrate Everyday Life. In Proc. of the 3rd International Conference on Service Oriented Computing (ICSOC), volume 3826 of LNCS, pages 12–20, Amsterdam, The Netherlands, 2005. Springer-Verlag. [LMK10] Olivier P. Le Maître and Omar M. Knio. Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics. Scientific Computation. Springer-Verlag, Dor- drecht, Heidelberg, London, New York City, 2010. [LMLG04] Philip Lord, Alison Macdonald, Liz Lyon, and David Giaretta. From Data Deluge to Data Curation. Technical report, The Digital Archiving Consultancy Limited and the Digital Curation Centre, 2004. [LMP+05] Philip Levis, Sam Madden, Joseph Polastre, Robert Szewczyk, Alec Woo, David Gay, Jason Hill, Matt Welsh, Eric Brewer, and David Culler. TinyOS: An Operating System for Sensor Networks. In Werner Weber, Jan M. Rabaey, and Emile Aarts, editors, Ambient Intelligence, pages 115–148. Springer-Verlag, Berlin Heidelberg New York, 2005. [LN07] Ulf Leser and Felix Naumann. Informationsintegration – Architek- turen und Methoden zur Integration verteilter und heterogener Datenquellen. dpunkt.verlag, 2007. [LQ14] Xiaolin Li and Judy Qiu. Cloud Computing for Data-Intensive Applications. Springer-Verlag, New York City, NY, USA, 2014. [LR00] Frank Leymann and Dieter Roller. Production Workflow: Con- cepts and Techniques. Prentice Hall, Englewood Cliffs, NJ, USA, 2000. [LSS01] Laks V. S. Lakshmanan, Fereidoon Sadri, and Subbu N. Subra- manian. SchemaSQL: An Extension to SQL for Multidatabase In- teroperability. ACM Transactions on Database Systems (TODS), 26(4):476–519, 2001. Bibliography 249

[LTP11] Xiufeng Liu, Christian Thomsen, and Torben Bach Pedersen. ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce. In Proc. of the 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK’11), LNCS, pages 96–111, Toulouse, France, 2011. Springer-Verlag, Berlin, Heidelberg. [LWMB09] Bertram Ludäscher, Mathias Weske, Timothy McPhillips, and Shawn Bowers. Scientific Workflows: Business as Usual? In Proc. of the 7th International Conference on Business Process Management (BPM 2009), Ulm, Germany, 2009. [Lyn08] Peter Lynch. The Origins of Computer Weather Prediction and Climate Modeling. Journal of Computational Physics, 227(7):3431–3444, 2008. [MB08] Kenton McHenry and Peter Bajcsy. An Overview of 3D Data Content, File Formats and Viewers. Technical Report isda08- 002, National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, 2008. [MBBL15] Timothy McPhillips, Shawn Bowers, Khalid Belhajjame, and Bertram Ludäscher. Retrospective Provenance Without a Run- time Provenance Recorder. In Proc. of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15), Edinburgh, Scotland, UK, 2015. USENIX Association. [MBZL09] Timothy McPhillips, Shawn Bowers, Daniel Zinn, and Bertram Ludäscher. Scientific Workflow Design for Mere Mortals. Future Generation Computer Systems, 25(5):541–551, 2009. [MCD+05] Philip Maechling, Hans Chalupsky, Maureen Dougherty, Ewa Deelman, Yolanda Gil, Sridhar Gullapalli, Vipin Gupta, Carl Kesselman, Jihic Kim, Gaurang Mehta, Brian Mendenhall, Thomas Russ, Gurmeet Singh, Marc Spraragen, Garrick Staples, and Karan Vahi. Simplifying Construction of Complex Workflows for Non-expert Users of the Southern California Earthquake Cen- ter Community Modeling Environment. ACM SIGMOD Record, 34(3):24–30, 2005. [MFHH05] Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. TinyDB: An Acquisitional Query Processing 250 Bibliography

System for Sensor Networks. ACM Transactions on Database Systems (TODS), 30(1):122–173, 2005. [MG11] Peter Mell and Timothy Grance. The NIST Definition of Cloud Computing. National Institute of Standards and Technology (NIST), 2011. [MGLRtH11] Sara Migliorini, Mauro Gambini, Marcello La Rosa, and Arthur H. M. ter Hofstede. Pattern-Based Evaluation of Scientific Work- flow Management Systems. Queensland University of Technology (QUT) Technical Report 39935, 2011. [MHM04] Norman May, Sven Helmer, and Guido Moerkotte. Nested Queries and Quantifiers in an Ordered Context. In Proc. of the 20th International Conference on Data Engineering, ICDE ’04, pages 239–250, Washington, DC, USA, 2004. IEEE Computer Society. [MMLW05] Albert Maier, Bernhard Mitschang, Frank Leymann, and Dan Wolfson. On Combining Business Process Integration and ETL Technologies. In Gesellschaft für Informatik (GI), editor, Daten- banksysteme für Business, Technologie und Web (BTW 2005), pages 533–546, Karlsruhe, Germany, 2005. [MMZ15] Ryan Mork, Paul Martin, and Zhiming Zhao. Contemporary Challenges for Data-intensive Scientific Workflow Management Systems. In Proc. of the 10th Workshop on Workflows in Support of Large-Scale Science, WORKS ’15, pages 4:1–4:11, Austin, TX, USA, 2015. ACM. [Mor11] J. Paul Morrison. Flow-Based Programming: A New Approach to Application Development. CreateSpace, 2011. [MSK+15] Timothy McPhillips, Tianhong Song, Tyler Kolisnik, Steve Aulen- bach, Khalid Belhajjame, Kyle Bocinsky, Yang Cao, Fernando Chirigati, Saumen C. Dey, Juliana Freire, Deborah N. Huntzinger, Christopher Jones, David Koop, Paolo Missier, Mark Schildhauer, Christopher R. Schwalm, Yaxing Wei, James Cheney, Mark Bieda, and Bertram Ludäscher. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Informa- tion from Scripts. International Journal of Digital Curation, 10:298–313, 2015. Bibliography 251

[Mül10] Christoph Marian Müller. Development of an Integrated Database Architecture for a Runtime Environment for Simulation Workflows. Diploma thesis, University of Stuttgart, Institute of Architecture of Application Systems, 2010.
[NC03] Anil Nigam and Nathan S. Caswell. Business Artifacts: An Approach to Operational Specification. IBM Systems Journal, 42(3):428–445, 2003.
[NKC09] Matthias Nicola and Pav Kumar-Chatterjee. DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server. Pearson Education Inc., IBM Press, Boston, MA, USA, 2009.
[OAS07] OASIS: Organization for the Advancement of Structured Information Standards. Web Services Business Process Execution Language Version 2.0, 2007.
[OAS10] OASIS: Organization for the Advancement of Structured Information Standards. WS-BPEL Extension for People (BPEL4People) Specification Version 1.1, 2010.
[OAS13] OASIS: Organization for the Advancement of Structured Information Standards. Topology and Orchestration Specification for Cloud Applications Version 1.0, 2013.
[OdOV+11] Eduardo S. Ogasawara, Daniel de Oliveira, Patrick Valduriez, Jonas Dias, Fabio Porto, and Marta Mattoso. An Algebraic Approach for Data-Centric Scientific Workflows. In Proc. of the 37th International Conference on Very Large Data Bases (VLDB 2011), pages 1328–1339, Seattle, WA, USA, 2011.
[OGA+06] Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: Lessons in Creating a Workflow Environment for the Life Sciences. Concurrency and Computation: Practice and Experience, 18(10):1067–1100, 2006.
[OR11] Luís M. L. Oliveira and Joel J. P. C. Rodrigues. Wireless Sensor Networks: a Survey on Environmental Monitoring. Journal of Communications, 6(2):143–151, 2011.

[PA06] Cesare Pautasso and Gustavo Alonso. Parallel Computing Patterns for Grid Workflows. In Proc. of the 1st Workshop on Workflows in Support of Large-Scale Science, WORKS ’06, Paris, France, 2006. IEEE.
[Pie11] Henrik Andreas Pietranek. Erweiterung von SIMPL und BPEL-DM zur Unterstützung weiterer Typen von Datenquellen. Student thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2011.
[Pie12] Henrik Andreas Pietranek. Datenmanagementpatterns in multiskalaren Simulationsworkflows. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2012.
[PZL08] Cesare Pautasso, Olaf Zimmermann, and Frank Leymann. Restful Web Services vs. “Big” Web Services: Making the Right Architectural Decision. In Proc. of the 17th International Conference on World Wide Web, WWW ’08, pages 805–814, Beijing, China, 2008. ACM.
[RB01] Erhard Rahm and Philip A. Bernstein. A Survey of Approaches to Automatic Schema Matching. International Journal on Very Large Data Bases (VLDB Journal), 10(4):334–350, 2001.
[RBD+11] Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, and Hong-Linh Truong. A Novel Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. In Proc. of the 7th IEEE International Conference on e-Science, pages 105–112, Stockholm, Sweden, 2011.
[RBKK12] Michael Reiter, Uwe Breitenbücher, Oliver Kopp, and Dimka Karastoyanova. Quality-of-Data-Driven Simulation Workflows. In Proc. of the 8th IEEE International Conference on e-Science, Chicago, IL, USA, 2012.
[RBKK14] Michael Reiter, Uwe Breitenbücher, Oliver Kopp, and Dimka Karastoyanova. Quality of Data Driven Simulation Workflows. Journal of Systems Integration, 5(1):3–29, 2014.
[RD00] Erhard Rahm and Hong Hai Do. Data Cleaning: Problems and Current Approaches. IEEE Data Engineering Bulletin, 23, 2000.

[Rem11] Simon Remppis. Ausführung einer Modellreduktion für Simulationen auf Basis der Workflow-Technologie. Student thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2011.
[Rie14] Victor Riempp. Pattern-basierte Kopplung eines biomechanischen und eines systembiologischen Simulationsmodells. Student thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2014.
[RK11] Judith B. Rommel and Johannes Kästner. The Fragmentation-Recombination Mechanism of the Enzyme Glutamate Mutase Studied by QM/MM Simulations. Journal of the American Chemical Society, 133(26):10195–10203, 2011.
[RKH+10] Oliver Röhrle, Nils Karajan, Thomas Heidlauf, Michael Sprenger, Tille Rupp, Syn Schmitt, and Wolfgang Ehlers. Towards a Better Understanding of Skeletal Muscle Models on Lumbar Spine Mechanics. In Proc. of the World Congress on Biomechanics, Singapore, 2010.
[RLSR+06] Uwe Radetzki, Ulf Leser, Svenja Schulze-Rauschenbach, Jörg Zimmermann, Jens Lüssem, Thomas Bode, and Armin B. Cremers. Adapters, Shims, and Glue – Service Interoperability for in Silico Experiments. Bioinformatics, 22(9):1137–1143, 2006.
[RM09] Sylvia Radeschütz and Bernhard Mitschang. Extended Analysis Techniques for a Comprehensive Business Process Optimization. In Proc. of the International Conference on Knowledge Management and Information Sharing (KMIS 2009), Madeira, Portugal, 2009.
[RRS+11] Peter Reimann, Michael Reiter, Holger Schwarz, Dimka Karastoyanova, and Frank Leymann. SIMPL – A Framework for Accessing External Data in Simulation Workflows. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme für Business, Technologie und Web (BTW 2011), pages 534–553, Kaiserslautern, Germany, 2011.
[RS13a] Benny Raphael and Ian F. C. Smith. Engineering Informatics: Fundamentals of Computer-Aided Engineering. John Wiley & Sons, Chichester, UK, 2013.

[RS13b] Peter Reimann and Holger Schwarz. Datenmanagementpatterns in Simulationsworkflows. In Gesellschaft für Informatik (GI), editor, Datenbanksysteme für Business, Technologie und Web (BTW 2013), pages 279–293, Magdeburg, Germany, 2013.
[RS14] Peter Reimann and Holger Schwarz. Simulation Workflow Design Tailor-Made for Scientists. In Proc. of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2014.
[RSF06] Christopher Ré, Jérôme Siméon, and Mary Fernández. A Complete and Efficient Algebraic Compiler for XQuery. In Proc. of the 22nd International Conference on Data Engineering, ICDE ’06, Washington, DC, USA, 2006. IEEE Computer Society.
[RSM11] Peter Reimann, Holger Schwarz, and Bernhard Mitschang. Design, Implementation, and Evaluation of a Tight Integration of Database and Workflow Engines. Journal of Information and Data Management, 2(3):353–368, 2011.
[RSM14a] Peter Reimann, Holger Schwarz, and Bernhard Mitschang. A Pattern Approach to Conquer the Data Complexity in Simulation Workflow Design. In R. Meersman et al., editor, Proc. of the OnTheMove Federated Conferences and Workshops (OTM 2014), 22nd International Conference on Cooperative Information Systems (CoopIS 2014), volume 8841 of LNCS, pages 21–38, Amantea, Italy, 2014. Springer-Verlag, Berlin Heidelberg.
[RSM14b] Peter Reimann, Holger Schwarz, and Bernhard Mitschang. Data Patterns to Alleviate the Design of Scientific Workflows Exemplified by a Bone Simulation. In Proc. of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2014.
[RTD+12] Michael Reiter, Hong-Linh Truong, Schahram Dustdar, Dimka Karastoyanova, Robert Krause, Frank Leymann, and Dieter Pahr. On Analyzing Quality of Data Influences on Performance of Finite Elements driven Computational Simulations. In Proc. of the International European Conference on Parallel and Distributed Computing (Euro-Par 2012), LNCS, pages 793–804, Rhodes Island, Greece, 2012. Springer-Verlag, Berlin, Heidelberg.

[RtHEvdA05a] Nick Russell, Arthur H. M. ter Hofstede, David Edmond, and Wil M. P. van der Aalst. Workflow Data Patterns: Identification, Representation and Tool Support. In Proc. of the 24th International Conference on Conceptual Modeling, ER 2005, pages 353–368, Klagenfurt, Austria, 2005. Springer-Verlag, Berlin, Germany.
[RtHEvdA05b] Nick Russell, Arthur H. M. ter Hofstede, David Edmond, and Wil M. P. van der Aalst. Workflow Resource Patterns: Identification, Representation and Tool Support. In Proc. of the 17th Conference on Advanced Information Systems Engineering (CAiSE’05), pages 216–232, Porto, Portugal, 2005. Springer-Verlag, Berlin, Germany.
[RtHvdAM06] Nick Russell, Arthur H. M. ter Hofstede, Wil M. P. van der Aalst, and Nataliya Mulyar. Workflow Control-Flow Patterns: A Revised View. BPM Center Report BPM-06-22, 2006.
[RvdAtH06] Nick Russell, Wil M. P. van der Aalst, and Arthur H. M. ter Hofstede. Exception Handling Patterns in Process-Aware Information Systems. In Proc. of the Conference on Advanced Information Systems Engineering (CAiSE’06), Luxembourg, 2006. Springer-Verlag, Berlin, Germany.
[RWWS14] Peter Reimann, Tim Waizenegger, Matthias Wieland, and Holger Schwarz. Datenmanagement in der Cloud für den Bereich Simulationen und Wissenschaftliches Rechnen. In Proc. of the 2nd Workshop Data Management in the Cloud (DMC 2014), in conjunction with the 44th Jahrestagung der Gesellschaft für Informatik (GI), Stuttgart, Germany, 2014.
[SA05] Stewart Silling and E. Askari. A Meshfree Method based on the Peridynamic Model of Solid Mechanics. Computers & Structures, 83(17-18):1526–1535, 2005.
[Sad13] Shazia Sadiq. Handbook of Data Quality: Research and Practice. Springer-Verlag, Berlin Heidelberg, Germany, 2013.
[SAKVH15] Steve Strauch, Vasilios Andrikopoulos, Dimka Karastoyanova, and Karolina Vukojevic-Haupt. Migrating eScience Applications to the Cloud: Methodology and Evaluation. In Cloud Computing with E-science Applications. CRC Press, Taylor & Francis, 2015.
[Sch14] Holger Schwichtenberg. Windows PowerShell 4.0: Das Praxisbuch. Carl Hanser Verlag, Munich, Germany, 2014.

[SGK+11] Mirko Sonntag, Katharina Görlach, Dimka Karastoyanova, Frank Leymann, Polina Malets, and David Schumm. Views on Scientific Workflows. In Proc. of the 10th International Conference on Perspectives in Business Informatics Research, pages 321–335, Riga, Latvia, 2011. Springer-Verlag, Berlin Heidelberg, Germany.
[She99] Amit P. Sheth. Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics. In Michael Goodchild, Max Egenhofer, Robin Fegeas, and Cliff Kottman, editors, Interoperating Geographic Information Systems, pages 5–29. Springer US, Boston, MA, USA, 1999.
[SK10] Mirko Sonntag and Dimka Karastoyanova. Next Generation Interactive Scientific Experimenting Based on the Workflow Technology. In Proc. of the 21st IASTED International Conference on Modelling and Simulation (MS 2010), Prague, Czech Republic, 2010.
[SK11] Mirko Sonntag and Dimka Karastoyanova. Concurrent Workflow Evolution. Electronic Communications of the EASST, 37, 2011.
[SK12] Mirko Sonntag and Dimka Karastoyanova. Ad hoc Iteration and Re-execution of Activities in Workflows. International Journal On Advances in Software, 5(1 & 2):91–109, 2012.
[SK13] Mirko Sonntag and Dimka Karastoyanova. Model-as-you-go: An Approach for an Advanced Infrastructure for Scientific Workflows. Journal of Grid Computing, 11(3):553–583, 2013.
[SKK+11] David Schumm, Dimka Karastoyanova, Oliver Kopp, Frank Leymann, Mirko Sonntag, and Steve Strauch. Process Fragment Libraries for Easier and Faster Development of Process-based Applications. Journal of Systems Integration, 2(1):39–55, 2011.
[SKRM13] Stefan Silcher, Jan Königsberger, Peter Reimann, and Bernhard Mitschang. Cooperative Service Registries for the Service-based Product Lifecycle Management Architecture. In Proc. of the 17th IEEE International Conference on Computer Supported Cooperative Work in Design (CSCWD 2013), pages 439–446, Whistler, BC, Canada, 2013.

[SL90] Amit P. Sheth and James A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3):183–236, 1990.
[Slo07] Alexander Slominski. Adapting BPEL to Scientific Workflows. In Ian Taylor, Ewa Deelman, and Dennis Gannon, editors, Workflows for e-Science – Scientific Workflows for Grids, chapter 14. Springer, London, UK, 2007.
[Son15] Tianhong Song. Provenance-Driven Data Curation Workflow Analysis. In Proc. of the 2015 SIGMOD PhD Symposium, pages 45–50, Melbourne, Victoria, Australia, 2015. ACM.
[SR09] Arie Shoshani and Doron Rotem. Scientific Data Management: Challenges, Technology, and Deployment. Computational Science Series. Chapman & Hall, Boca Raton, FL, USA, 2009.
[SSH+10] Amruta Shiroor, John Springer, Thomas Hacker, Brandeis Marshall, and Jeffrey Brewer. Scientific Workflow Management Systems and Workflow Patterns. International Journal of Business Process Integration and Management, 5(1):63–78, 2010.
[SSS09] Dimitrios Skoutas, Alkis Simitsis, and Timos Sellis. Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations. In Stefano Spaccapietra, Esteban Zimányi, and Il-Yeol Song, editors, Journal on Data Semantics XIII, pages 120–146. Springer-Verlag, Berlin, Heidelberg, Germany, 2009.
[SSWY14] Yutian Sun, Jianwen Su, Budan Wu, and Jian Yang. Modeling Data for Business Processes. In Proc. of the 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 2014. IEEE.
[Sta15] John Stark. Product Lifecycle Management – Volume 1: 21st Century Paradigm for Product Realisation. Springer International Publishing, 2015.
[SWCD12] Alkis Simitsis, Kevin Wilkinson, Malu Castellanos, and Umeshwar Dayal. Optimizing Analytic Data Flows for Multiple Execution Engines. In Proc. of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12, pages 829–840, Scottsdale, AZ, USA, 2012. ACM.

[SWLG04] Robert Stevens, Chris Wroe, Phillip Lord, and Carole Goble. Ontologies in Bioinformatics. In Handbook on Ontologies, pages 635–658. Springer-Verlag, Berlin Heidelberg, Germany, 2004.
[TDG07] Ian Taylor, Ewa Deelman, and Dennis Gannon. Workflows for e-Science – Scientific Workflows for Grids. Springer, London, UK, 2007.
[Tre10] Kevin E. Trenberth. Climate System Modeling. Cambridge University Press, Cambridge, UK, 2010.
[UGM14] Benjamin Uekermann, Bernhard Gatzhammer, and Miriam Mehl. Coupling Algorithms for Partitioned Multi-Physics Simulations. In Proc. of the 1st Workshop on Simulation Technology: Systems for Data Intensive Simulations, in conjunction with the 44th Jahrestagung der Gesellschaft für Informatik (GI), Stuttgart, Germany, 2014.
[vdAtHKB03] Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, B. Kiepuszewski, and Alistair P. Barros. Workflow Patterns. Distributed and Parallel Databases, 14(3):5–51, 2003.
[VHKL13] Karolina Vukojevic-Haupt, Dimka Karastoyanova, and Frank Leymann. On-demand Provisioning of Infrastructure, Middleware and Services for Simulation Workflows. In Proc. of the 6th IEEE International Conference on Service Oriented Computing and Applications (SOCA), pages 91–98, Kauai, HI, USA, 2013.
[Vhr11] Marko Vrhovnik. Optimierung datenintensiver Workflows: Konzepte und Realisierung eines heuristischen, regelbasierten Optimierers. PhD thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2011.
[vS15a] Patrick von Steht. Ontologiebasierte Beschreibung der Eingabedaten einer Knochensimulation. Research thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2015.
[vS15b] Patrick von Steht. Pattern-basierter Datenaustausch zwischen Simulationsmodellen. Research thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2015.
[vS16] Patrick von Steht. Datenintegration und Datenanalysen zur Unterstützung von CAE-Prozessen. Master thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2016.

[VSRM08] Marko Vrhovnik, Holger Schwarz, Sylvia Radeschütz, and Bernhard Mitschang. An Overview of SQL Support in Workflow Products. In Proc. of the 24th International Conference on Data Engineering (ICDE 2008), pages 1287–1296, Cancún, México, 2008.
[VSS+07] Marko Vrhovnik, Holger Schwarz, Oliver Suhre, Bernhard Mitschang, Volker Markl, Albert Maier, and Tobias Kraft. An Approach to Optimize Data Processing in Business Processes. In Proc. of the 33rd International Conference on Very Large Data Bases (VLDB 2007), pages 615–626, Vienna, Austria, 2007.
[VZ14] Alejandro Vaisman and Esteban Zimányi. Data Warehouse Systems: Design and Implementation. Springer-Verlag, Berlin Heidelberg, Germany, 2014.
[W3C99] W3C: World Wide Web Consortium. XML Path Language (XPath) Version 1.0, 1999.
[W3C01] W3C: World Wide Web Consortium. Web Services Description Language (WSDL) 1.1, 2001.
[W3C04a] W3C: World Wide Web Consortium. Web Services Addressing (WS-Addressing), 2004.
[W3C04b] W3C: World Wide Web Consortium. XML Schema Part 0: Primer Second Edition, 2004.
[W3C04c] W3C: World Wide Web Consortium. XML Schema Part 1: Structures Second Edition, 2004.
[W3C04d] W3C: World Wide Web Consortium. XML Schema Part 2: Datatypes Second Edition, 2004.
[W3C07a] W3C: World Wide Web Consortium. Simple Object Access Protocol (SOAP) Version 1.2, 2007.
[W3C07b] W3C: World Wide Web Consortium. Web Services Description Language (WSDL) Version 2.0, 2007.
[W3C07c] W3C: World Wide Web Consortium. XSL Transformations (XSLT) Version 2.0, 2007.

[W3C10a] W3C: World Wide Web Consortium. XML Path Language (XPath) 2.0 (Second Edition), 2010.
[W3C10b] W3C: World Wide Web Consortium. XQuery 1.0: An XML Query Language (Second Edition), 2010.
[W3C11] W3C: World Wide Web Consortium. XQuery Update Facility 1.0, 2011.
[W3C14a] W3C: World Wide Web Consortium. XML Path Language (XPath) 3.0, 2014.
[W3C14b] W3C: World Wide Web Consortium. XQuery 3.0: An XML Query Language, 2014.
[W3C15] W3C: World Wide Web Consortium. Extensible Markup Language (XML), 2015.
[Wag10] Florian Bernd Dominic Wagner. Webservice und Workflow-Technologie für Proteinmodellierung. Student thesis, University of Stuttgart, Institute of Architecture of Application Systems, 2010.
[Wag11] Florian Bernd Dominic Wagner. Nutzung einer integrierten Datenbank zur effizienten Ausführung von Workflows. Diploma thesis, University of Stuttgart, Institute of Parallel and Distributed Systems, 2011.
[WAH+07] Katy Wolstencroft, Pinar Alper, Duncan Hull, Chris J. Wroe, Phillip W. Lord, Robert D. Stevens, and Carole A. Goble. The myGrid Ontology: Bioinformatics Service Discovery. International Journal of Bioinformatics Research and Applications, 3(3):303–325, 2007.
[WCL+05] Sanjiva Weerawarana, Francisco Curbera, Frank Leymann, Tony Storey, and Donald F. Ferguson. Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS-BPEL, WS-Reliable Messaging and More. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005.
[WD10] Daya C. Wimalasuriya and Dejing Dou. Ontology-based Information Extraction: An Introduction and a Survey of Current Approaches. Journal of Information Science, 36(3):306–323, 2010.

[Wes12] Mathias Weske. Business Process Management: Concepts, Languages, Architectures. Springer-Verlag, Berlin Heidelberg, Germany, 2012.
[WVV+01] Holger Wache, Thomas Vögele, Ubbo Visser, Heiner Stuckenschmidt, Gerhard Schuster, Holger Neumann, and Sebastian Hübner. Ontology-Based Integration of Information – A Survey of Existing Approaches. In Proc. of the IJCAI-01 Workshop: Ontologies and Information Sharing, pages 108–117, Seattle, WA, USA, 2001.
[YGN09] Ustun Yildiz, Adnene Guabtni, and Anne H. H. Ngu. Towards Scientific Workflow Patterns. In Proc. of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS ’09, Portland, OR, USA, 2009. ACM.
[ZBKL10] Daniel Zinn, Shawn Bowers, Sven Köhler, and Bertram Ludäscher. Parallelizing XML Data-Streaming Workflows via MapReduce. Journal of Computer and System Sciences, 76(6):447–463, 2010.
[ZBML09] Daniel Zinn, Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher. Scientific Workflow Design with Data Assembly Lines. In Proc. of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS ’09, pages 14:1–14:10, Portland, OR, USA, 2009. ACM.
[ZTZ13] Olgierd Cecil Zienkiewicz, Robert L. Taylor, and J. Z. Zhu. The Finite Element Method: its Basis and Fundamentals. Butterworth-Heinemann, Elsevier, Oxford, UK, 7th edition, 2013.

All links in this bibliography and throughout the rest of this document have last been visited and found working on December 20, 2016.


List of Figures

1.1 Main steps of a bio-mechanical simulation workflow to investigate structure changes in bones; cf. [RS14]...... 24
1.2 Pattern-based design of the simulation workflow shown in Figure 1.1...... 34

2.1 Coupled bio-mechanical and systems-biological simulation models to investigate structure changes in bones, as well as an extract of mathematical variables that are important for data provisioning and data exchange...... 41
2.2 FEM-based spatial discretization of a human femur...... 43
2.3 Lifecycle of computer-based simulations and simulation workflows...... 45
2.4 Common structure and phases of a simulation workflow...... 46
2.5 Relevant terms and concepts regarding data resources; cf. [RS13b]...... 47
2.6 Coarse-grained classification of workflows that are relevant for business or scientific applications; cf. [RSM11]...... 59
2.7 The common phases of a simulation workflow shown in Figure 2.4 and whether individual sub-phases may rather be implemented via orchestration workflows (grey background) or data-intensive workflows (white background). Depending on the concrete simulation example, the calculation phase may either be treated as orchestration workflow or data-intensive workflow...... 60

3.1 Data formats required by the Pandas software and corresponding input transformations carried out by the workflow depicted in Figure 1.1...... 76
3.2 Coupling process of the multi-scale simulation of structure changes in bones combining bio-mechanical and systems-biological calculations [RSM14a]...... 80

3.3 Abstract view on a workflow realizing the coupling process shown in Figure 3.2; cf. [RS13b]...... 82
3.4 General Data Transfer and Transformation Pattern; cf. [RS13b]...... 84
3.5 Parallel Data Iteration Pattern; cf. [RS13b]...... 86
3.6 Sequential Data Iteration Pattern...... 87

4.1 Decision tree to classify data provisioning techniques into representative and comparable concepts of such techniques. Decisions are shown in circles with grey background, whereas the resulting concepts are depicted as rectangles with white background...... 91
4.2 Concepts of data provisioning techniques in simulation workflows...... 93
4.3 Data transfer support of the individual data provisioning concepts depicted in Figure 4.2...... 102
4.4 Sequence of decisions to apply the guidelines for choosing appropriate data provisioning techniques...... 107
4.5 Usage of data provisioning concepts in the bio-mechanical simulation workflow shown in Figure 1.1. Data provisioning tasks and the respectively used concepts are highlighted via bold labels. “GL i” means “Guideline no. i”...... 111
4.6 Usage of data provisioning concepts in the coupling workflow shown in Figure 3.3. Data provisioning tasks and the respectively used concepts are highlighted via bold labels. “GL i” means “Guideline no. i”; cf. [RSM14b]...... 113
4.7 Main components of a simulation workflow system and its extension by SIMPL to support the missing features of available workflow systems summarized in Table 4.3; cf. [RRS+11, RS13b, RS14]...... 120

5.1 Architecture of the SIMPL framework shown in Figure 4.7. Gray-colored components are discussed in this chapter; cf. [RRS+11, RS13b, RS14]...... 126
5.2 Simulation Data Management (SDM) Systems in the context of Product Data Management (PDM) systems and of tools for Computer-aided Design (CAD) and Computer-aided Engineering (CAE); cf. [BBF+09, vS16]...... 129

5.3 Ontology-based approaches to data integration or data exchange. The left side shows approaches relying on a single ontology, while approaches based on multiple ontologies are depicted on the right side; cf. [WVV+01]...... 132
5.4 Design principle of the proposed data management activities exemplified by a RetrieveData activity and by a SQL SELECT statement...... 135
5.5 Interaction between relevant components of the architecture shown in Figure 5.1 during data resource access; cf. [RRS+11]...... 139
5.6 Major variants of implementing the TransferData operation based on other SIMPL core operations. Solid lines represent a function call or function shipping, e. g., by issuing a command. Dashed lines represent a dataflow or data shipping between resources or operations...... 143
5.7 High-level view on metadata describing all information that is necessary to access data resources; cf. [RRS+11]...... 145
5.8 Usage of data management activities in the bio-mechanical simulation workflow shown in Figure 4.5. Relevant workflow steps and the respectively used kinds of data management activities are highlighted via bold labels...... 147
5.9 Usage of data management activities in the coupling workflow shown in Figure 4.6. Relevant workflow steps and the respectively used kinds of data management activities are highlighted via bold labels; cf. [RSM14b]...... 149

6.1 Architecture of the SIMPL framework shown in Figure 4.7. Gray-colored components are discussed in this chapter; cf. [RRS+11, RS13b, RS14]...... 161
6.2 Overall pattern-based workflow design approach; cf. [RSM14b]...... 163
6.3 Rule-based transformation of the Simulation Calculation Pattern depicted in Figure 1.2 over several abstraction levels; cf. [RS14]...... 165
6.4 Exemplary business artifact representing a deal between a company and a customer [CH09]. The information model comprises important data that further describes a deal, whereas the lifecycle model defines a state-transition diagram specifying the way a deal and its data may be processed...... 171

6.5 Hierarchy of data management patterns with clearly distinguished abstraction levels enabling a holistic separation of concerns; cf. [RS13b, RSM14a]...... 176
6.6 Main classes, degree of abstraction, and associated simulation artifacts of parameter values for patterns; cf. [RSM14a]...... 177
6.7 Processing model of the rule-based transformation of patterns into executable workflows; cf. [RSM14a]...... 185
6.8 Simulation-specific process pattern to alleviate the design of the coupling workflow shown in Figures 4.6 and 5.9; cf. [RSM14a]...... 188
6.9 Main transformation steps mapping the simulation-specific process pattern shown in Figure 6.8 onto an executable workflow; cf. [RSM14a, vS15b]...... 190
6.10 Number of workflow tasks to be specified for the coupling workflow at different abstraction levels shown in Figure 6.9; The partitioning among different types of workflow tasks is shown in pie charts; “DM” means “data management”; cf. [RSM14a]...... 195
6.11 Number of pattern parameters to be specified for the coupling workflow at different abstraction levels shown in Figure 6.9; The partitioning among different classes of parameter values according to Figure 6.6 is shown in pie charts; “DM” means “data management”; cf. [RSM14a]...... 197
6.12 Patterns and their transformation to alleviate workflow design in the model reduction example described by Fehr et al. [FE11]...... 199

7.1 Protein modeling workflow; cf. [RSM11]...... 209
7.2 Common architecture for local data processing in workflow systems. The database system is often only used as persistent data store; cf. [RSM11]...... 211
7.3 Extended architecture for an improved local data processing in workflow systems. Especially the new components data processing optimizer and query / expression interface (dark-gray) allow for exploiting the database system more intensively during local data processing tasks; cf. [RSM11]...... 213
7.4 Techniques to improve local data processing tasks. The numbers indicate the orders of individual processing steps; cf. [RSM11, Wag11]...... 215
7.5 Prototypes used for the experiments; cf. [RSM11]...... 219

7.6 Effectiveness of the techniques shown in Figure 7.4; cf. [RSM11]...... 221
7.7 Performance of the protein modeling workflow; cf. [RSM11]...... 223


List of Tables

1.1 Comparison of the SIMPL Framework with major related work. 32

2.1 Data resources used in computer-based simulations...... 48
2.2 Demarcation and main characteristics of dataflow-oriented and control-flow-oriented workflow languages and corresponding workflow systems...... 62

3.1 Common characteristics of simulation data...... 74

4.1 Support of common data management patterns. The filling degrees of the pie charts depict how much a data provisioning concept supports the respective criteria...... 100
4.2 Support of non-functional issues. The filling degrees of the pie charts depict how much a data provisioning concept supports the respective criteria...... 105
4.3 Missing features of currently available workflow systems that are required to effectively apply the guidelines discussed in Section 4.3.1...... 117

5.1 High-level comparison of major related work enabling a peer-to-peer-like data exchange between several data resources and/or applications...... 131
5.2 Characteristics of the proposed data management activities...... 137
5.3 Parameters the SIMPL core operations expect from correspondent data management activities; see Table 5.2 for the parameters of the activities...... 141
5.4 Most important results of comparing the data management (DM) activities offered by SIMPL with the major competitors of related work...... 151

6.1 Assessment of major related work and comparison with the pattern-based approach to simulation workflow design proposed by this thesis...... 168
6.2 Simulation-oriented Data Management Patterns; cf. [RSM14a]...... 180
6.3 Basic Data Management Patterns; cf. [RSM14a]...... 182
6.4 Overhead of the rule-based pattern transformation for its worst-case duration of 0.49 seconds; cf. [RSM14a]...... 201