Towards Scalable Model Indexing

Konstantinos Barmpis

Engineering Doctorate

University of York Computer Science

March 2016

Abstract

Model-Driven Engineering (MDE) is a software engineering discipline promoting models as first-class artefacts of the software lifecycle. It offers increased productivity, consistency, maintainability and reuse by using these models to generate other necessary products, such as program code or documentation. As such, persisting, accessing, manipulating, transforming and querying such models needs to be efficient, in order to maintain the various benefits MDE can offer. Scalability is often identified as a bottleneck for potential adopters of MDE, as large-scale models need to be handled seamlessly, without causing disproportionate losses in performance or limiting the ability of multiple stakeholders to work simultaneously on the same collection of large models. This work identifies the primary scalability concerns of MDE and tackles those related to the querying of large collections of models in collaborative modeling environments; it presents a novel approach whereby information contained in such models can be efficiently retrieved, orthogonally to the formats in which models are persisted. This approach, coined model indexing, leverages the use of file-based systems for storing models, while allowing developers to efficiently query models without needing to retrieve them from remote locations or load them into memory beforehand. Empirical evidence gathered during the course of the research project is then detailed, which provides confidence that such novel tools and technologies can mitigate these specific scalability concerns; the results obtained are promising, offering large improvements in the execution time of certain classes of queries, which can be further optimized by use of caching and database indexing techniques. The architecture of the approach is also empirically validated, by virtue of integration with various state-of-the-art modeling and model management tools, and so is the correctness of the various algorithms used in this approach.


For my grandfather Costas


Contents

Abstract
Contents
List of Figures
List of Tables
List of Algorithms
Listings
Author Declaration

1. Introduction
   1.1. Overview of Model-Driven Engineering
   1.2. Overview of Versioning Systems
   1.3. Motivation and Research Hypothesis
   1.4. Research Results
   1.5. Thesis Structure

2. Background
   2.1. Model-Driven Engineering
      2.1.1. Modeling and Automated Model Management
         2.1.1.1. Modeling Languages
         2.1.1.2. Metamodeling Architectures
         2.1.1.3. Model Querying and Modification
         2.1.1.4. Model Transformation
      2.1.2. MDE with Large-Scale Models
         2.1.2.1. Scalability
      2.1.3. Running Example
   2.2. Model Persistence and Versioning
      2.2.1. Model Persistence Formats
         2.2.1.1. XML Metadata Interchange (XMI)
         2.2.1.2. Relational Database Persistence – Teneo/Hibernate
         2.2.1.3. Document-based Persistence – Morsa
         2.2.1.4. Document-based Persistence – MongoEMF
         2.2.1.5. Graph-based Persistence – NeoEMF (Graph)
         2.2.1.6. Key-value-based Persistence – NeoEMF (Map)
      2.2.2. Model Versioning
         2.2.2.1. File-based Versioning
         2.2.2.2. Model-based Versioning
      2.2.3. Model Querying
         2.2.3.1. Querying Technologies
         2.2.3.2. Repository Querying
      2.2.4. Summary

3. Analysis and Hypothesis
   3.1. Analysis
      3.1.1. Model Indexing
   3.2. Research Hypothesis
   3.3. Research Objectives
   3.4. Scope

4. Hawk: Scalable Model Indexing Framework
   4.1. System Capabilities
   4.2. System Architecture
      4.2.1. System Components
   4.3. System Design
      4.3.1. Hawk Model Index Structure
         4.3.1.1. Back-end Persistence
         4.3.1.2. Background: Neo4J
      4.3.2. Hawk Mapping Layers
         4.3.2.1. Model Layer
         4.3.2.2. Graph Layer
      4.3.3. Version Control Managers
         4.3.3.1. SVN Manager
         4.3.3.2. LocalFolder Manager
         4.3.3.3. Manager
         4.3.3.4. Workspace Manager
      4.3.4. Metamodel/Model Resource Factories
         4.3.4.1. EMF Resource Factories
         4.3.4.2. Other Resource Factories
      4.3.5. Metamodel/Model Updaters
         4.3.5.1. Metamodel Updater
         4.3.5.2. Model Updater
      4.3.6. Querying Hawk
         4.3.6.1. Native Querying
         4.3.6.2. Back-end Independent Navigation and Querying
      4.3.7. System Lifecycle
         4.3.7.1. Synchronization Procedure
      4.3.8. Advanced Features and Optimizations
         4.3.8.1. Derived Attributes
         4.3.8.2. Derived Attributes: Incremental Updating
         4.3.8.3. Database Indexing
         4.3.8.4. Querying an optimized Hawk Model Index
      4.3.9. Summary
   4.4. Implementation
      4.4.1. Eclipse Plugins

5. Evaluation
   5.1. Evaluation Strategy
      5.1.1. Correctness
         5.1.1.1. Index Content Correctness
         5.1.1.2. Hawk Validation Listener
         5.1.1.3. Query Correctness
      5.1.2. Performance
         5.1.2.1. Query Performance
         5.1.2.2. Update Performance
      5.1.3. Tool Integration
      5.1.4. Architecture Evaluation
   5.2. Evaluation Benchmarks
      5.2.1. Grabats 2009 Case-Study
         5.2.1.1. Grabats Query
      5.2.2. The BPMN MIWG Test Suite Repository
   5.3. Evaluation Results
      5.3.1. Benchmarking of Model Insertion and Querying Using Native Java and EOL
         5.3.1.1. Model Insertion
         5.3.1.2. Query Execution Time and Memory Footprint
         5.3.1.3. Disc Space
      5.3.2. Benchmarking of Incremental Updating in Hawk
         5.3.2.1. Model Update Execution Time
         5.3.2.2. Derived Attribute Update Execution Time
         5.3.2.3. Threats to Validity
      5.3.3. Benchmarking of Derived and Indexed Attributes in Hawk
         5.3.3.1. Derived Attribute Definition
         5.3.3.2. Query Definition and Execution Time
      5.3.4. Benchmarking of Continuous Model Updates in Hawk
         5.3.4.1. Update Performance Results
         5.3.4.2. Update Validation Results
   5.4. Hawk Tool Integration
      5.4.1. Epsilon Integration
      5.4.2. Exposing Hawk as an EMF Resource
      5.4.3. Remote Query API (using )
      5.4.4. EMF IncQuery Integration
         5.4.4.1. EMF IncQuery
         5.4.4.2. The Train Benchmark
   5.5. Additional Drivers for Hawk
      5.5.1. Alternate Version Control Managers
      5.5.2. Alternate Model Factories
      5.5.3. Alternate Persistence Technologies
   5.6. Research Hypothesis Evaluation

6. Conclusions
   6.1. Summary
   6.2. Contributions of the Thesis Research
      6.2.1. Novel Tools and Techniques
         6.2.1.1. Heterogeneous Model Indexing Platform
         6.2.1.2. Incremental Updating of Model Indexes
         6.2.1.3. Incremental Updating of Derived Attributes
      6.2.2. Notable side-products
         6.2.2.1. Evaluation of Model Persistence Technologies
         6.2.2.2. Use of Derived and Indexed Attributes in Model Indexes
         6.2.2.3. Scalable Model Querying
   6.3. Applications
   6.4. Future Work

Appendices
A. Details on Hawk Interfaces
B. Model Mutation Operations
C. User Guide

Bibliography

List of Figures

2.1. The four layers of model abstraction, based on [10]
2.2. The GOPRR metamodel, from [13]
2.3. The Ecore Metamodeling Language
2.4. High-level overview of various types of model transformation
2.5. Scalability issues in MDE, from [31]
2.6. Running example – (simplified) BPMN metamodel
2.7. Running example – BPMN model diagram
2.8. Running example – BPMN model
2.9. XMI persistence of the BPMN Loan model (partial)
2.10. The Teneo Runtime Layer, from [32]
2.11. Relational database persistence of the BPMN Loan model (partial)
2.12. Conceptual organization of data stored in a document store
2.13. Persistence back-end structure excerpt for Morsa, from [35]
2.14. NoSQL database persistence of the BPMN Loan model in Morsa (partial)
2.15. Conceptual organization of data stored in a graph store
2.16. Persistence back-end structure excerpt for NeoEMF Graph, from [42]
2.17. NoSQL database persistence of the BPMN Loan model in NeoEMF Graph (partial)
2.18. Conceptual organization of data stored in a key-value store
2.19. Persistence back-end structure excerpt for NeoEMF Graph, from [44]
2.20. NoSQL database persistence of the BPMN Loan model in NeoEMF Map (partial)
2.21. CDO Client high-level architecture, from [61]
2.22. The CDO-Teneo/Hibernate Runtime layer
2.23. Permitted changes for EMFStore
2.24. Directory structure of a Modelio project [for a Zoo model]
2.25. The abstract syntax of the Morsa Query Language, from [63]
3.1. Performing global queries on model fragments stored in a remote VCS repository, adapted from [31]
3.2. Performing global queries on model fragments stored in a local VCS repository
3.3. Performing global queries on model fragments stored in a remote VCS repository, using a model indexing system, adapted from [31]
4.1. Components of the model indexing system (Hawk)
4.2. Interfaces of Hawk
4.3. The BPMN Loan model index – stored as a property graph (partial)
4.4. The Hawk Model Layer
4.5. The Hawk Graph Layer
4.6. Running example – altered version of the BPMN Loan model
4.7. The Epsilon Model Connectivity Layer
4.8. Important Hawk EMC layer classes and their EOL parents
4.9. Fragmented BPMN loan model
4.10. Pre-computing whether parallel gateways are validated
4.11. Example of database indexing in Hawk
5.1. Small subset of the Java JDTAST metamodel
5.2. Ratios of relative disk space used for the different persistence mechanisms
5.3. Performance comparison for executing the Grabats Query from XMI through Hawk’s chosen persistence mechanism using native and EMC querying
5.4. BPMN benchmark model change results
5.5. BPMN benchmark execution time (Hawk synchronizing with each )
5.6. Memory graph of full BPMN benchmark execution (Hawk[Neo4J])
5.7. Memory graph of full BPMN benchmark execution (Hawk[OrientDB])
5.8. Architecture of Hawk’s remote EMF API
5.9. The train benchmark metamodel
5.10. ConnectedSegments inject and repair transformations
C.1. Initial Hawk View
C.2. Creating a new Hawk
C.3. Metamodel configuration
C.4. JDTAST Metamodels added to Hawk
C.5. Indexed folder configuration
C.6. Folder added to Hawk
C.7. Setting the source EOL file
C.8. Adding a Hawk Index
C.9. Configuring the Hawk Index
C.10. Running the EOL query
C.11. Configuring Derived Attributes
C.12. Adding a new Derived Attribute
C.13. Viewing current Derived Attributes
C.14. Adding a new Indexed Attribute
C.15. Viewing current Indexed Attributes

List of Tables

2.1. Overview of state-of-the-art model persistence and versioning mechanisms
4.1. Traceability of System Capabilities
4.2. Interesting methods in the IModel interface
5.1. Configuration options for benchmarks
5.2. Model insertion (persistent to database) size results
5.3. Model insertion (persistent to database) execution time results
5.4. Grabats Query results (in seconds and MB)
5.5. Update execution time results
5.6. Grabats Query execution time results
6.1. Categorization of state-of-the-art model persistence and versioning tools

List of Algorithms

1. Hawk update overview
2. Insertion algorithm
3. Incremental update algorithm
3. Incremental update algorithm (cont.)
4. Synchronization procedure
5. Derived attribute incremental update algorithm
6. Validation algorithm

Listings

4.1. EOL program used to calculate whether a parallel gateway is validated
5.1. Code excerpt for the Grabats query implemented in Java for Neo4J
5.2. Code excerpt for the Grabats query implemented in Cypher for Neo4J
5.3. The Grabats 2009 query expressed in EOL
B.1. EOL model mutation operations

Acknowledgments

I would like to thank my supervisor Dr. Dimitrios Kolovos for his continuous guidance and support throughout my degree. I am grateful to Prof. Richard Paige and Dr. Radu Calinescu for their encouragement, as well as to Dr. Louis Rose for his feedback on my work.

I thank my colleagues and friends in the Enterprise Systems group and the department, especially Ran Wei, Dr. James Williams, Adolfo Sanchez-Barbudo Herrera, Athanasios Zolotas, Septavera Sharvia, Frank Burton and Dr. Antonio Garcia Dominguez, for engaging in interesting discussions, for providing valuable insights and for their endless support. I thank the various MONDO project partners, particularly Gábor Szárnyas, for being exceptional collaborators and for helping integrate various technologies with my work.

Finally, to my parents and grandparents, thank you for always being there and for tirelessly encouraging me to achieve my goals.


Author Declaration

Except where stated, all of the work contained in this thesis represents the original contribution of the author. This work has not been submitted for any other award at this or any other institution.

This work was partially funded by the MONDO EU project and contributed to its various written deliverables of work package 5, specifically to all of the deliverables D5.1 through D5.6. The core ideas of this work, including the architecture and design of the system presented here, have not been a product of any collaborations within the MONDO project. Sections 4.3.4.2 and 5.4 detail collaborative work performed for integrating this work with other MDE tools and technologies, in the form of additional drivers that can be used alongside the core system presented in this work.

Parts of the work described in this thesis have been previously published by the author:

• Hawk: towards a scalable model indexing architecture. Konstantinos Barmpis and Dimitrios S. Kolovos. In Proceedings of the Workshop on Scalability in Model Driven Engineering, BigMDE ’13, pages 6:1–6:9, New York, NY, USA, June 2013. ACM.

• Evaluation of contemporary graph databases for efficient persistence of large-scale models. Konstantinos Barmpis and Dimitrios S. Kolovos. Journal of Object Technology, 13(3):3:1–26, July 2014.

• Towards scalable querying of large-scale models. Konstantinos Barmpis and Dimitrios S. Kolovos. In Proceedings of the 10th European Conference on Modelling Foundations and Applications, ECMFA ’14, July 2014.

• Towards incremental updates in large-scale model indexes. Konstantinos Barmpis, Seyyed Shah, and Dimitrios S. Kolovos. In Proceedings of the 11th European Conference on Modelling Foundations and Applications, ECMFA ’15, July 2015.


1. Introduction

In today’s fast-paced, competitive world, software engineers are pushed to create larger and more complex distributed systems every day, often with a very short time to market. This introduces the requirement for rapid iterations over quickly developed prototypes, which creates the need for more efficient tools and techniques for developing, testing and deploying such systems in a systematic manner.

To achieve this goal, the discipline of Model-Driven Engineering (MDE) offers increased productivity by automating or semi-automating effort-intensive and error-prone steps of the software engineering process, through the use of models as first-class citizens.

1.1. Overview of Model-Driven Engineering

MDE allows for the rapid creation and management of software systems by using models that can be transformed to generate necessary lower-level artefacts such as code and documentation. As such systems and their models grow in size, concerns regarding scalability and collaborative development emerge. Many widely-used modeling tools and technologies tend to rapidly reach their limits, exhausting their resources when trying to manage very large models: either taking extensive periods of time to perform even the most basic tasks, or not being able to perform them at all. Furthermore, as multiple developers need to collaborate on the creation and use of such models, reliable forms of distribution and versioning of these models need to be available and integrated into the development process. This thesis focuses on identifying these limitations and investigating solutions for tackling such very large models, with Section 2.1 providing an initial review of the field.


1.2. Overview of Versioning Systems

In software engineering, it is common practice to maintain a record of previous revisions of the software; this facilitates collaboration on any artefacts, as well as providing a reliable form of recovery and change management. When versioning models, there are two primary approaches that can be taken.

The first is to use one of the most popular forms of version control, which is based around versioning text files (commonly used for files comprising program code). Such systems offer the ability to efficiently store large collections of rapidly evolving files by only storing changes (deltas) between revisions, which is effective for assisting in the collaborative development of large numbers of text files, managed by multiple developers. This approach works well with text-based model persistence formats like XML, as they lend themselves to this paradigm; on the other hand, many other model persistence formats, such as any database-based persistence, which use binary files, do not. When binary files are used, the version control system will likely need to store the entire contents of each file even for minor changes.

The second approach is to use dedicated model-specific version control systems. This approach has the advantage that it is built with a semantic understanding of MDE (it knows about the concepts of metamodels and models, as opposed to files and lines) and can hence store model-based deltas that will usually be more fine-grained than text-based file deltas, which allows for the appropriate model-level fragmentation of the commits. The downside is that such systems require a tight integration between themselves and the modeling technology used (in order to be provided with the necessary semantics), in contrast to the file-based paradigm, where the concerns are orthogonal. Section 2.2.2 reviews such systems in the context of MDE.

1.3. Motivation and Research Hypothesis

Tackling scalability in a collaborative MDE context introduces the need to research ways of improving this area. From a theoretical standpoint, there is the need to offer a way to handle very large models (with respect to the various model management operations commonly required in MDE) without running into resource starvation issues, as well as to facilitate the collaborative development of such models by multiple developers.

From a practical standpoint, various industrial collaborators of the MONDO project1 used to host this work (Ikerlan2, Softeam3, SoftMaint (subsidiary of Sodifrance4), Uninova5) have expressed the need for their models to be handled in a more scalable way in their respective domains.

Today, if models are versioned using classical file-based version control systems, they will either have to be stored as a single monolithic file (which may often cause issues with loading such a large file into memory) or as a collection of interconnected smaller model fragment files. Each developer will commonly only store and manipulate their own subset of the model fragments; in order to query information about the entire model, they would need to fetch all other model fragments (if they are not stored locally) and then load all of the fragments into memory to ensure that a complete result is returned. This leads back to the original problem of having to load a large model in its entirety, or risks losing the big picture when only some of the fragments are loaded (as global information regarding other model fragments cannot be obtained). On the other hand, if model-specific version control is used, the development process will likely have to be altered to align with the procedures and techniques used by the specific versioning tool (as it will have to be tightly integrated with the modeling technology in order to function).

The focus of this work is on providing a solution which does not attempt to radically alter the techniques used in managing large models in MDE, but instead offers a middle ground between currently used state-of-the-art technologies and novel ones which may take little account of current practice and norms in the field. The proposed approach introduces the concept of model indexing, whereby a separate system (a model index) is introduced that holds a read-only representation of the models stored in file-based version control systems. The model index can be used as an efficient way to query large (collections of) models without incurring the cost of having to load them locally or losing the big picture if only some of them are loaded. As such, the research hypothesis of this thesis is as follows:

1 This project is partially funded by the MONDO FP7 STREP European Union research project (#611125), which brings together various universities and businesses for tackling scalability in MDE: http://www.mondo-project.org/
2 http://www.ikerlan.es/en/
3 http://rd.softeam.com/
4 http://www.sodifrance.fr/
5 http://www.uninova.pt/


The overhead of computing model-element-level queries over large (collections of) models stored in a file-based VCS can be significantly reduced using a non-invasive model-indexing system orthogonal to the specific VCS or model representation format.

1.4. Research Results

This research proposes a novel approach for tackling scalability in MDE: a model indexing framework is presented alongside a prototype implementation that offers an extensible way to manage large collections of models developed by multiple stakeholders. The work is validated through extensive empirical evaluation, with the results supporting the research hypothesis: models of the order of millions of model elements are managed, providing up to a 95.1% decrease in execution time for certain classes of queries; incrementality is offered for keeping the index up to date (shown to provide an average 70.7% decrease in execution time when compared to a non-incremental approach); and the results suggest that such a model indexing approach can likely be generalized to other modeling technologies, as well as to other types of persistence back-ends and version control systems. Finally, the work provides a comprehensive review and categorization of current state-of-the-art tools and technologies used in MDE, presenting their key benefits and limitations, as the proposed solution does not attempt to replace such technologies but instead offers an alternative approach suitable for some scenarios (identified in Section 3.4).

1.5. Thesis Structure

Chapter 2 provides an overview of MDE, with Section 2.1 presenting the discipline and introducing the reader to the observed scalability concerns; Section 2.2 reviews various widely used state-of-the-art tools and technologies available today and describes their functionalities and unique characteristics.

Chapter 3 presents the analysis of the problem at hand and states the research hypothesis this research project aims to prove; it defines the research objectives and the scope the work is limited to, identifying areas to be investigated.

Chapter 4 details the architecture, design and implementation of a prototype tool developed as a proof-of-concept for managing scalability in MDE.

Section 4.1 presents the capabilities such a system is envisioned to have; Section 4.2 presents the system architecture, including its various components; Section 4.3 details the design, including the various APIs the tool offers, and delves into the specifics of each component individually, as well as into the various key procedures and algorithms used by the system; lastly, Section 4.4 briefly presents some implementation details and a user guide for the tool.

Chapter 5 analyzes empirical data gathered as part of evaluating the aforementioned prototype and discusses its significance with respect to various functional and non-functional properties of the system. Section 5.1 presents the strategy used for evaluating the system; Section 5.2 presents the benchmarks used; Section 5.3 details the results obtained and discusses their significance; Section 5.4 describes the various integration efforts made to allow the system to work with other widely used tools in the field; and finally Section 5.5 mentions the various alternative drivers developed for the system that can be used instead of (or alongside) the primary ones developed as part of this work, as a way to evaluate the architecture of the system.

Finally, Chapter 6 summarizes the knowledge gained in this work and concludes with directions this work can take in the future, as well as identifying the contributions it has offered to the field of model-driven software engineering.


2. Background

This chapter provides an introduction to MDE (Section 2.1) and then focuses on existing work related to model persistence and versioning. Section 2.2.1 provides a discussion on data persistence back-ends/formats and how they are leveraged to store models; Section 2.2.2 focuses on model versioning, with Section 2.2.2.1 providing a discussion on file-based version control systems, followed by a discussion of model-based version control (Section 2.2.2.2). Finally, Section 2.2.3 introduces the reader to model querying with respect to various state-of-the-art technologies used today.

A background in MDE is needed in order to understand the origin of the scalability problems, as well as to gain insight for devising a solution that addresses some of the challenges in this field of study. Information about the various forms of version control used in MDE is useful for understanding their limitations when used in collaborative work on model development, as well as for insight on building a tool suited for scalable multi-user model management. Finally, detailed knowledge of the various forms of model persistence, such as file or database serializations, is paramount for an effective implementation of a scalable solution, as well as for discovering, designing, implementing, testing and comparing the most suitable back-end store itself.

2.1. Model-Driven Engineering

MDE is an approach to software development where models are first-class artefacts of the software engineering process. A model, in this context, as described by [1], is “a description of phenomena of interest”, represented in textual, graphical or other form (such as tabular or tree-based). In MDE, models are used to describe a system and (partly) automate its implementation through automated transformation to lower-level artefacts. In order for models to be amenable to automated processing, they must be defined in terms of rigorously specified modeling languages (metamodels).

This section will present several key aspects of MDE, give examples of different widely used model management approaches taken by the community, and discuss their importance. Furthermore, it will present the challenges the current state-of-the-art MDE tools face when dealing with large (collections of) models in a collaborative environment.

2.1.1. Modeling and Automated Model Management

MDE has several standard processes and activities, the most prominent of which are presented below.

2.1.1.1. Modeling Languages

Models used in MDE typically adhere to a rigid set of rules that are encapsulated in a metamodel. This “blueprint” of the model contains all of the allowed syntax, restrictions and features that models conforming to it can have. Modeling languages (metamodels) are often separated into two categories (domain-specific and general-purpose), but can also be seen as a continuum from minimal (domain-focused) to maximal expressiveness.

General-purpose languages include the Unified Modeling Language (UML) [2], the Business Process Model and Notation (BPMN) [3], the Systems Modeling Language (SysML) [4] and the ArchiMate enterprise architecture modeling language [5], and allow for a wide variety of models to be created, capturing a broad spectrum of concerns. A general-purpose language tries to support a wide variety of concepts and favors portability and maintainability [6], as it is intended to be used and understood by a wide audience.

Domain-specific languages include WebML [7] and the Systems Biology Markup Language (SBML) [8], and are tailored to represent a specific area (domain) of interest. While limited in their scope, they provide the ability for solutions to be expressed at the same level of abstraction as the problem domain, as well as more concisely, hence allowing domain experts to work within their area of expertise. They are often optimized for productivity, at the expense of portability [6].

2.1.1.2. Metamodeling Architectures

Various architectures are available for defining metamodels and constructing models that conform to them, with different numbers of layers of abstraction used to represent their artefacts.

Their main commonality is that they all define at least two layers: one for the model used to describe a system/process, and one for the metamodel used to define such models.

The MetaObject Facility (MOF1) The Object Management Group (OMG) [9] introduced the Model-Driven Architecture (MDA) in 2001, together with an architecture for defining metamodels (Figure 2.1) and constraints on models conforming to these metamodels [10].

Figure 2.1.: The four layers of model abstraction, based on [10]

In the pyramid seen in Figure 2.1, M0 represents the real world, such as the physical elements which are to be modeled (a book or a customer) or the software system, if software is being modeled. M1 then represents the model of the real world (found in M0), such as a book being represented by a class with attributes such as ISBN and title. M2 represents the modeling language (metamodel) used to define elements in M1 (such as UML [2]). Finally, M3 represents the meta-modeling language (meta-metamodel) used to define elements in M2; for the OMG’s MDA, the only M3 element is MOF [11].

1 http://www.omg.org/mof/


It is worth noting that this type of layering (MOF – UML – UML model – real world) can be paralleled to how computer software and programming languages are layered (Backus-Naur Form (BNF) [12] – Java language – Java program – real world). In the same way that BNF defines the abstract syntax of Java, MOF defines the abstract syntax of UML. The MOF architecture is not discussed in more detail, as it is very similar in structure and capabilities to the Eclipse Modeling Framework, discussed in detail below.

MetaCase – The MetaEdit+ Tool MetaCase2 offers a graphical metamodeling architecture whereby custom domain-specific metamodels can be created using an editor (the MetaEdit+ Workbench Tool3) that uses visual notation (shapes, colors etc.) to define the various concepts and their features/rules. This metamodel (which would come under the M3 layer in MDA) is seen in Figure 2.2, from [13].

Figure 2.2.: The GOPRR metamodel, from [13]

In this architecture, a metamodel is defined as a Graph, containing Objects, Relationships and Roles. An Object describes a class of model element, which can be connected by Relationships to others; the Role specifies how such Objects participate in a Relationship. Bindings connect a Relationship with two or more Roles and one or more Objects for each Role in a Graph. All these concepts (barring Bindings) can have Properties, which can be of various data types such as Strings, Booleans, numbers etc. [13].

2 http://www.metacase.com/
3 http://www.metacase.com/mwb/


Using these concepts, and with the help of the MetaEdit+ Workbench editor, users are able to define their domain-specific concepts from scratch or use one of the pre-defined metamodels offered; this Workbench editor allows for the creation of symbols for the various Objects, Relationships and Roles. The metamodel can be used to define instance models that use these concepts and follow any defined rules, and these models can then be used to generate further artefacts such as documentation or code.

The Eclipse Modeling Framework (EMF) EMF [14] is a framework that facilitates the definition and instantiation of metamodels. In EMF, metamodels are defined using the Ecore metamodeling language, a high-level overview of which is illustrated in Figure 2.34. As EMF is the most widely-used metamodeling framework today, we delve into more detail on its structure and capabilities.

Figure 2.3.: The Ecore Metamodeling Language

Metamodels are expressed in the form of EPackages; these define the unique global identifier of the metamodel (its nsURI – used as a unique key for retrieving metamodels from a registry (EPackageRegistry) that stores a map of identifiers to metamodels) and contain the types found in the metamodel, in the form of EClassifiers.

4 http://download.eclipse.org/modeling/emf/emf/javadoc/2.8.0/org/eclipse/emf/ ecore/package-summary.html

Types can be either EClasses or EDataTypes (both of which are ENamedElements); EClasses contain their features, in the form of EAttributes (which have an EDataType as their value type) and EReferences (which have an EClass as their value type). Model elements instantiated from these EClasses contain values for the EAttributes and EReferences defined by their type (and its supertypes).

EMF uses the concept of a Resource for maintaining an in-memory representation of metamodels and models. Such Resources can be serialized to disk (in various forms, such as textual or database-backed, discussed in the sequel) and read into memory from their respective serializations. Contents of a Resource can be iterated through and modified, with EMF automating the majority of both inter-resource and intra-resource consistency operations. For example, EMF will automatically keep any loaded resources synchronized upon model element deletion (with respect to any other elements referencing them) or modification.

By default, EMF serializes models in a standard XML5-based representation called the XML Metadata Interchange (XMI6), an OMG-standardized format that was designed to enhance tool interoperability. This form of serialization allows for several optimizations, such as the ability to save a model in multiple physical files and to use lazy reference resolution to link them together. Due to the nature of the format, models serialized in XMI are quite verbose and need to be de-serialized in a sequential manner; for example, the default EMF XMI parser (SAX7-based) needs to read the entire model file before it can create a usable in-memory resource.
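To make the Ecore concepts above concrete, the following minimal sketch uses the standard EMF APIs to define a tiny metamodel programmatically and to load an XMI-serialized model into an in-memory Resource. It is only an illustration: the package name, nsURI and file name are invented for the example and are not taken from this thesis.

    import org.eclipse.emf.common.util.URI;
    import org.eclipse.emf.ecore.*;
    import org.eclipse.emf.ecore.resource.Resource;
    import org.eclipse.emf.ecore.resource.ResourceSet;
    import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
    import org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl;

    public class EcoreSketch {
        public static void main(String[] args) {
            // Define a minimal metamodel: an EPackage containing a "Task"
            // EClass with an integer "executionTime" EAttribute.
            EcoreFactory f = EcoreFactory.eINSTANCE;
            EPackage bpmn = f.createEPackage();
            bpmn.setName("bpmn");
            bpmn.setNsURI("http://example.org/bpmn"); // illustrative nsURI
            EClass task = f.createEClass();
            task.setName("Task");
            EAttribute time = f.createEAttribute();
            time.setName("executionTime");
            time.setEType(EcorePackage.Literals.EINT);
            task.getEStructuralFeatures().add(time);
            bpmn.getEClassifiers().add(task);
            // Register the EPackage so its instances can be resolved by nsURI.
            EPackage.Registry.INSTANCE.put(bpmn.getNsURI(), bpmn);

            // Load an XMI-serialized model into an in-memory Resource; the
            // whole file is parsed before the resource becomes usable.
            ResourceSet rs = new ResourceSetImpl();
            rs.getResourceFactoryRegistry().getExtensionToFactoryMap()
              .put("model", new XMIResourceFactoryImpl());
            Resource r = rs.getResource(URI.createFileURI("bpmn_loan.model"), true);
            System.out.println(r.getContents().size() + " root element(s) loaded");
        }
    }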

2.1.1.3. Model Querying and Modification

Retrieving information from stored models, as well as being able to evolve them by altering their contents, is of paramount importance in MDE. Queries are of great use for answering complex questions about large models, which may be required by the stakeholders; furthermore, they are required in order to perform model transformations, described below. Since model queries are used by most other model management operations, tackling issues such as scalability and collaborative development for querying can be beneficial for all other operations that require querying capabilities. As such, this thesis focuses on solving querying concerns, in anticipation that this work will facilitate more efficient model transformations etc.

5 http://www.omg.org/spec/XML/
6 http://www.omg.org/spec/XMI/
7 http://sax.sourceforge.net/

Model querying is commonly performed using third-generation languages such as Java, or model query languages like OCL [15] or EOL [16]. Modification of stored models should provide a consistent way to add or remove model elements, or to modify their properties, without compromising their integrity and (if the tool supports it) validity.
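As an illustration of programmatic querying, the sketch below uses EMF’s reflective API to answer a simple question over the running example. The Task type and its name and executionTime attributes are assumed from the simplified BPMN metamodel, and the threshold is arbitrary; a dedicated query language such as OCL or EOL would express the same query far more concisely.

    import java.util.Iterator;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.resource.Resource;

    public class QuerySketch {
        // Print the names of all Tasks whose execution time exceeds a threshold.
        public static void printSlowTasks(Resource model, int threshold) {
            for (Iterator<EObject> it = model.getAllContents(); it.hasNext();) {
                EObject e = it.next();
                if ("Task".equals(e.eClass().getName())) {
                    int time = (Integer) e.eGet(e.eClass().getEStructuralFeature("executionTime"));
                    if (time > threshold) {
                        System.out.println(e.eGet(e.eClass().getEStructuralFeature("name")));
                    }
                }
            }
        }
    }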

2.1.1.4. Model Transformation

Models need to be transformed to other representations in order to generate useful artefacts such as runnable code, documentation or models conforming to a different metamodel. There are two distinguishable types of transformations that can be performed on models – model to model and model to text – a high-level overview of which is seen in Figure 2.4 and described below.

Figure 2.4.: High-level overview of various types of model transformation

Model to Model The first form of model transformation is Model to Model (M2M).

In this paradigm, input model M1, adhering to metamodel MM1, is transformed using a transformation (or a set of transformations) T1, adhering to transformation language TL, to output model M2, adhering to metamodel MM2. Such transformations allow for the (partial) mapping of a model onto one of a potentially different metamodel (such as mapping a model representing UML Classes to one representing a Relational Database, for example). There are three types of M2M transformation languages, as presented by [17]:

• Declarative Languages. Declarative approaches focus on what needs to be transformed, by defining a relation between the source and target models. Such approaches usually automate features like source model traversal, traceability and bidirectionality. On the other hand, such approaches often do not allow for fine-grained control of the transformation execution (such as rule scheduling), which may be limiting in some cases. An example of a language using this approach is QVT-Relations [18] by the Object Management Group (OMG).

• Imperative Languages. Imperative approaches focus on how the transformation needs to be performed, by specifying the steps required to get to the target models from the source models. Such approaches are beneficial when declarative approaches do not provide a sufficient level of control; for example, when the application order of a set of transformations needs to be controlled explicitly (as such languages understand notions like sequence or selection). Nevertheless, issues such as traceability and bidirectionality have to be manually resolved. An example of a language using this approach is QVT-Operational [19] by the OMG; a minimal sketch of this style, in plain Java, follows this list.

• Hybrid Languages. As both of the above paradigms have benefits and drawbacks, “hybrid languages provide a declarative rule-based execution scheme as well as imperative features for handling complex transformation scenarios” [20]. Nevertheless, such languages usually have limited efficiency and lack optimization, which has to be performed manually [21]. Examples of languages using this approach are the ATLAS Transformation Language [21, 22] and the Epsilon Transformation Language [20].
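The sketch below illustrates the imperative style in plain Java over reflective EMF: every Task of a source model is mapped to an instance of an Activity class in a target metamodel. The target metamodel, its Activity class and its label attribute are invented for this example; the point is that rule guards and rule ordering are fully under the programmer’s control, at the cost of handling concerns like traceability manually.

    import java.util.Iterator;
    import org.eclipse.emf.ecore.EClass;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.EPackage;
    import org.eclipse.emf.ecore.resource.Resource;
    import org.eclipse.emf.ecore.util.EcoreUtil;

    public class M2MSketch {
        // One imperative "rule": for every source Task, create a target Activity.
        public static void tasksToActivities(Resource source, Resource target, EPackage targetMM) {
            EClass activityClass = (EClass) targetMM.getEClassifier("Activity"); // hypothetical type
            for (Iterator<EObject> it = source.getAllContents(); it.hasNext();) {
                EObject e = it.next();
                if ("Task".equals(e.eClass().getName())) { // rule guard
                    EObject activity = EcoreUtil.create(activityClass);
                    activity.eSet(activityClass.getEStructuralFeature("label"),
                                  e.eGet(e.eClass().getEStructuralFeature("name")));
                    target.getContents().add(activity);
                }
            }
        }
    }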

Model to Text The second form is Model to Text (M2T); here, input model M1, adhering to metamodel MM1, is transformed using transformation T2, adhering to transformation language TL, to produce one or more text files as output.


Such output can have two main forms: the first is plain-text output, such as a report document file (D1 in the example of Figure 2.4), and the second is grammatically structured output, such as a Java source code file (J1 in the example of Figure 2.4, structured according to the Java programming language grammar). Such transformations allow for the automated (or partially automated) generation of runnable code from a model, or for human-readable documentation to be created. Examples of languages used to perform such transformations are the Epsilon Generation Language (EGL) [23], Xpand [24], Acceleo [25] and Jet [26].
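A minimal M2T sketch in the same reflective style (again assuming the Task type and name attribute of the running example) emits a plain-text report in the spirit of the D1 output of Figure 2.4; dedicated M2T languages such as EGL would express this as a template with static text and dynamic sections instead.

    import java.util.Iterator;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.resource.Resource;

    public class M2TSketch {
        // Generate a plain-text report listing every Task in the model.
        public static String generateReport(Resource model) {
            StringBuilder out = new StringBuilder("Tasks in process:\n");
            for (Iterator<EObject> it = model.getAllContents(); it.hasNext();) {
                EObject e = it.next();
                if ("Task".equals(e.eClass().getName())) {
                    out.append(" - ")
                       .append(e.eGet(e.eClass().getEStructuralFeature("name")))
                       .append('\n');
                }
            }
            return out.toString();
        }
    }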

As the aim of this work is not to contribute to the area of modeling and automated model management, this section presented only a brief overview of some of the topics in this area; others such as model validation, simulation, comparison, merging and refactoring are out of the scope of this work.

2.1.2. MDE with Large-Scale Models

The popularity and adoption of MDE in industry has dramatically increased in the past decade, as it provides several benefits compared to traditional software engineering practices, such as improved productivity and reuse [27], which allow for systems to be built faster and cheaper. Nevertheless, certain limitations of the current state-of-the-art tools used in MDE, such as scalability concerns, are seen to be preventing its wider use in industry [28, 29] and need to be overcome. Scalability issues arise when large models (or collections of models – of the order of millions of model elements) are used in MDE processes. As summarized by [30], “Loading and storing big models is a resource and time-consuming activity”.

2.1.2.1. Scalability

Scalability issues in MDE can be split into the following categories [31], seen in Figure 2.5:

Model persistence Storage of large models, and the ability to access and update such models with a low memory footprint and fast execution time, can be seen as a bottleneck in MDE-based tools when large models, in the order of millions of elements, are used. Especially in industry, several non-functional properties (performance, availability, security, etc.) often need to be met in order for a tool to be adopted. A detailed description of this issue can be found in Section 2.2.1.

Figure 2.5.: Scalability issues in MDE, from [31]

Model querying and transformation The ability to perform intensive and complex queries and transformations on large models with fast execution time is the second issue MDE faces. Efficient execution of global queries can be of importance when dealing with large collections of (possibly interconnected8) models. A global query on a collection of model files is one which requires multiple (commonly all of the) model files to be loaded in order to be computed. For example, a query asking whether a particular model is referenced by other models needs all other model files to be loaded, as well as that of the model itself, in order to be calculated. A detailed description of this issue can be found alongside the various technologies reviewed in Section 2.2.
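This cost can be made concrete with a short EMF sketch: before a global query can return a complete answer, every fragment has to be parsed and all inter-fragment proxies resolved, which pulls the entire model into memory (the sketch assumes a suitable resource factory, e.g. for XMI, has already been registered).

    import java.util.List;
    import org.eclipse.emf.common.util.URI;
    import org.eclipse.emf.ecore.resource.ResourceSet;
    import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
    import org.eclipse.emf.ecore.util.EcoreUtil;

    public class GlobalQuerySketch {
        // Load every model fragment and resolve cross-file references;
        // only then can a global query be computed with a complete result.
        public static ResourceSet loadAllFragments(List<String> fragmentPaths) {
            ResourceSet rs = new ResourceSetImpl();
            for (String path : fragmentPaths) {
                rs.getResource(URI.createFileURI(path), true); // fully parses each file
            }
            EcoreUtil.resolveAll(rs); // forces resolution of inter-fragment proxies
            return rs;
        }
    }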

8 some modeling technologies such as EMF support the concept of persisting interconnected model fragments, whereby a model file can reference elements in other model files instead of just elements found in itself


Collaborative work Multiple developers collaborating (either online or offline), each with their own part of the model, by querying or editing it in a consistent and synchronized manner, is another aspect of scalability on large projects. A detailed description of this issue (including the use of model comparison) can be found in Section 2.2.2.

Creation/exploration/visualization of large models Managing large models starts from model creation; scalable modeling languages need to be developed that allow building and exploring large models incrementally. Furthermore, the ability to visualize such large models efficiently becomes important when various stakeholders, who may not be capable of using other techniques such as querying, need to work with the models.

This research primarily focuses on the first three issues mentioned here – model persistence, model querying and collaborative work – as well as touching upon efficient model comparison, in order to tackle its research objectives presented in Section 3.3.

2.1.3. Running Example

In order to provide context for presenting this work, we will use the metamodel seen in Figure 2.6 as a running example. This metamodel is a simplified version of BPMN9. BPMN is a modeling language used to define business processes in a standardized manner, and is widely adopted. This metamodel was chosen for the following reasons:

• To avoid name clashes between concepts in different meta-levels (in contrast to UML, for example).

• BPMN provides a high-level set of concepts (Event, Task, Flow, etc.), understandable by a wide audience, while lending itself nicely to the field of MDE, as it can be used to realistically model a software system.

• A simplified version allows for readability while still using all of the interesting concepts found in Ecore, such as inheritance, containment, opposite references etc., so any discussion on this metamodel is also applicable to any other EMF-based metamodel.

9 this is a simplified version of the OMG-standardized business process model and notation (BPMN) specification: http://www.bpmn.org/

Figure 2.6.: Running example – (simplified) BPMN metamodel

This metamodel (from now on referred to as the “BPMN metamodel”) defines that models conforming to it can contain the following elements:

• Tasks. Each Task represents a timed activity the system has to perform, and has an attribute denoting the execution time needed to complete it. This simplified metamodel assumes all other transitions, decisions and states are resolved instantaneously (a time of 0 is associated with visiting any of them).

• StartEvents. A StartEvent denotes a possible starting point for the business process. There can be more than one such element in any process, for example one for an administrator starting the process and one for an unregistered user starting it.

• EndEvents. An EndEvent denotes a possible termination point for the process. There can be more than one such element in any process, for example one for normal and one for abnormal termination.


• SequenceFlows. A SequenceFlow is a link between two BaseElements (i.e. any concrete subclass of BaseElement). It has references to its source element and its target element (as well as their opposite references); it has a flag denoting whether it is a data-flow element (data is expected to travel between the elements it is connecting); finally, it may have a condition attribute denoting which conditions need to be satisfied for this flow to be followed (in this simplified example, such conditions are written in plain English).

• Gateways. A Gateway breaks the process flow into one or more alternative SequenceFlows, depending on whether the condition attribute of the relevant flow is satisfied, as well as whether the gateway is inclusive or exclusive (based on the value of the inclusive attribute flag).

• ParallelGateways. A ParallelGateway breaks the process flow into all of its possible outcomes, regardless of conditions. If its diverging attribute is False then it is expected to collect multiple parallel flows into a single one instead of breaking up a single flow into multiple.

A small model conforming to this metamodel is seen in Figure 2.7, containing a BPMN process for handling a loan application (from now on referred to as the “BPMN Loan”). This process starts by recording the relevant application information and checking it. If the check fails, the application is immediately rejected and the process ends; otherwise it continues to a loan study. If the loan study approves the loan, then the applicant is given the agreed-upon loan, otherwise they are informed about the rejection details; in both cases the process then ends. Figure 2.8 shows the same model in its full structural representation (even for such a small example, we can see that it starts getting cluttered and barely readable). In the interest of clarity/readability this example has been kept small (using a very small modified subset of BPMN), but a more complex example using large models can be found in Section 5.2.1.

Figure 2.7.: Running example – BPMN model diagram

Figure 2.8.: Running example – BPMN model


2.2. Model Persistence and Versioning

This section reviews some of the most prominent tools and techniques used today for persisting and versioning large models. Whilst each tool offers different functionality, the commonality found in all is that they either use XMI as a means of storing models, or they replace it with their own proprietary technology (such as custom database or textual persistence). This paradigm of either using XMI or offering a wholesale replacement brings up the question of whether an approach can lie somewhere in the middle. Chapter 3 presents such an approach, aimed at leveraging the current use of XMI whilst still offering a scalable way to query such models.

2.2.1. Model Persistence Formats

Several widely-used modeling frameworks such as EMF serialize their models in XML form, often using the XMI standard. This format will be used as a baseline for comparison with various other alternatives used today.

2.2.1.1. XML Metadata Interchange (XMI)

XMI is an extension of XML-based documents, whereby each model element is mapped to an XML element and has a unique identifier. Model element containment is represented as XML element containment, and model references refer to the unique model element identifiers in the document. Many of today’s MDE tools either directly support loading models from XMI or at least offer an import/export functionality between XMI and their own persistence format. This allows for the development of artefacts with a certain degree of confidence that vendor lock-in is avoided and that porting the model between various tools is feasible.

While this format does partially aid in collaborative work (as models can be fragmented into multiple interconnected files), several major issues remain, such as the need to load multiple (or all of) the XMI files into memory should a global query on the model be needed (one which requires multiple – commonly all – of the model fragment files to be loaded into memory in order to be calculated), or that any change to a model element may affect elements in other XMI files (for example, the deletion of an element which is a container of one in another XMI file). Furthermore, as XMI is an XML-based format, models stored in single XMI files cannot be partially loaded; loading an XMI-based model requires reading the entire document using a SAX parser and converting it into an in-memory object graph that conforms to the respective metamodel. As such, XMI scales poorly for large models, both in terms of the time needed for upfront parsing and in terms of the resources needed to maintain the entire object graph in memory.


Finally, as XML is a text-based file format, it lends itself nicely to classical file-based VCS; many such systems will only have to store the changed part of the file each time it is updated and can hence perform efficient (albeit non-semantic) comparison between the versions, or will offer the means to quickly compute it. Many other persistence formats are binary in nature and cannot be directly compared by the VCS (Section 2.2.2.1).

In the BPMN Loan example, the BPMN metamodel would commonly be stored in an XMI file (such as one named bpmn.ecore – since in EMF metamodels are models too, this file has the same structure as a model instance file) and the model would be stored in another XMI file (such as one named bpmn_loan.model, as shown in Figure 2.9).

Figure 2.9.: XMI persistence of the BPMN Loan model (partial)

One approach to improving the persistence of large models is to offer high-performance alternatives to XMI for storing them. Such technologies use a variety of back-end persistence mechanisms to achieve this:


2.2.1.2. Relational Database Persistence – Teneo/Hibernate

Teneo-Hibernate [32] stores EMF models in a relational database. Databases such as MySQL and HSQLDB are supported. In this approach, an Ecore metamodel is used to derive a relational schema as well as an object-oriented API that hides the underlying database and enables developers to interact with models that conform to the Ecore metamodel at a high level of abstraction, using Hibernate queries (Figure 2.10).

Figure 2.10.: The Teneo Runtime Layer, from [32]

This process can be automated or customized through the use of Java Persistence API (JPA) annotations10, which allow for custom schemas to be produced (specifying, for example, custom table naming or the inheritance mapping to use).

Model-Relational Mapping [32] The default mapping persists model elements by creating one table for each EClass hierarchy (vertical mapping strategy). Hence, instances of that EClass, as well as instances of all of its subclasses, are stored in that table; this can be changed to having each class in its own table (using JPA) if deemed appropriate (horizontal mapping strategy). Multiple inheritance is handled seamlessly, by automatically choosing one superclass as the one used for insertion (this can be customized through JPA if necessary). EAttributes are stored as columns in the table and EReferences as foreign keys. Ordered 1:n relations are handled by Hibernate lists (which require an additional column with the list order to be kept as well).

10 http://www.oracle.com/technetwork/middleware/ias/toplink-jpa-- 096251.html

Teneo-Hibernate also supports unordered 1:n relations, which are specified via JPA and eliminate the need for the extra column, but will also not guarantee consistency in the return order of reference values, as expected. Ordered n:m relations are handled by producing two table joins, one for each side of the relation (as there is no way to guarantee the ordering of such relations with a single table join). Unordered n:m relations are persisted using a single table join, by using the appropriate JPA annotations. “Contains” relations are supported as columns in the “contained” class table and store the relevant information about its container; these relationships are stored in parallel to 1:n relations, due to lack of knowledge of containment state upon generation of the database schema (as one class can be contained by multiple classes at the metamodel level, but at the instance level an object can only have a single container).

Finally, Teneo supports an alternative Entity Attribute Value Mapping (EVM) that uses only two tables (objects/values), which alleviates the need for changing the database schema every time the metamodel changes. Furthermore, it supports dynamic EMF, but is unsuitable for large models, as querying (if EVM is used) and maintenance11 quickly become very inefficient.
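As an illustration of the default (vertical) mapping strategy described above, the following hand-written JPA sketch shows the rough shape of the mapping that Teneo derives automatically from an Ecore metamodel; the class and attribute names follow the running example, but the concrete annotations and column names are assumptions for the example, not Teneo’s generated output.

    import javax.persistence.DiscriminatorColumn;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Inheritance;
    import javax.persistence.InheritanceType;

    // Vertical mapping: the whole BaseElement hierarchy shares one table,
    // with a discriminator column telling Tasks, Gateways, etc. apart.
    @Entity
    @Inheritance(strategy = InheritanceType.SINGLE_TABLE)
    @DiscriminatorColumn(name = "dtype")
    abstract class BaseElement {
        @Id @GeneratedValue
        long id;
        String name;
    }

    @Entity
    class Task extends BaseElement {
        // This column is NULL for every non-Task row, producing the sparsely
        // populated tables discussed in the next paragraph.
        int executionTime;
    }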

Model-relational mapping approaches eliminate the initial overhead of loading the entire model in memory by providing support for partial and on-demand loading of subsets of model elements, as demonstrated by experiments performed and described in the sequel. However, due to the nature of relational databases (the object-relational impedance mismatch), such approaches, while better than XMI, are still largely ineffective. As models often have more references (to other model elements) than elements, complex queries can end up requiring multiple expensive table joins to be executed (every time such a reference to another element needs to be navigated) and hence do not scale well for large models, as demonstrated in Section 5.3.1.2. Even though Teneo-Hibernate (by default) tries to minimize the number of tables generated, by having all the subclasses of an EClass stored in the same table as the EClass itself (resulting in a fraction of the tables otherwise required if each EClass had its own table of EObjects), the fact that the database itself is made up of sparsely populated data (as EAttributes of all subclasses need to be stored in the schema, even if they are only effectively used by a small percentage of the actual objects stored) results in increased insertion and query time, as demonstrated by various works such as [33, 34].

11 http://wiki.eclipse.org/Teneo/Hibernate/Dynamic_EMF_Tutorial


As with most database persistence formats, relational databases persist data in the form of binary files. As discussed in Section 2.2.2.1, versioning binary files in file-based version control systems is discouraged and will create large overheads for storage and comparison of versions.

In the BPMN Loan example, the BPMN metamodel would be mapped to the back-end (database) store as a schema, and then the model would be inserted into the same back-end as fields in tables; a subset of the resulting database would look as in Figure 2.11.

Figure 2.11.: Relational database persistence of the BPMN Loan model (partial)


2.2.1.3. Document-based Persistence – Morsa

Morsa [35] is a prototype that attempts to address the issue of scalable model persistence using a document store NoSQL database (MongoDB) to store EMF models as collections of documents.

NoSQL Document-based stores As the reader cannot be assumed to be familiar with document stores (in contrast to relational databases for example), a brief background discussion is provided here. Document databases (seen in Figure 2.12) consist of a set of documents (possibly nested), each of which contains fields of data serialized in a standard format like XML or JSON12. They allow for data to be structured in a schema-less way as heterogeneous collections of such documents. Popular examples are MongoDB [36], CouchDB [37] and OrientDB [38].

Figure 2.12.: Conceptual organization of data stored in a document store

12 http://json.org/


Model-MongoDB Mapping Morsa stores one model element per document, whose attributes are stored as key-value pairs alongside persistence metadata of the model element (such as a reference to its metaclass). Metamodel elements are stored in a similar fashion to model elements and are also represented as entries in an index document that maps each model or metamodel URI (the unique identifier of a model or metamodel element) in the store to an array of references to the documents that represent its root objects. An example of this architecture is displayed in Figure 2.13.

Figure 2.13.: Persistence back-end structure excerpt for Morsa, from [35]
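As an illustration, a document of this shape could be built with the MongoDB Java driver as follows; the database name, collection name and metadata field name (_type in particular) are hypothetical, as Morsa's exact persistence metadata keys are internal to the tool:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

MongoClient client = new MongoClient("localhost");
DBCollection elements = client.getDB("morsa").getCollection("elements");
// one document per model element: attributes as key-value pairs,
// plus persistence metadata such as a reference to its metaclass
BasicDBObject task = new BasicDBObject()
    .append("_type", "http://bpmn_simplified#Task") // hypothetical metadata key
    .append("name", "Record Loan App Info")
    .append("executionTime", 300);
elements.insert(task);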

Morsa uses a load-on-demand mechanism that relies on an object cache holding loaded model objects. This cache is managed by a configurable cache replacement policy that chooses which objects must be unloaded from the client memory (should the cache be deemed full by the current configuration).

Similar to storing models in a relational database, storing them in a document store will create the same problems when using a file-based version control system to version them, as the resulting binary database files require inefficient comparison and storage in the VCS.


In the BPMN Loan example, the BPMN metamodel would be inserted into the MongoDB store, and then the model would also be inserted. The resulting database would look as in Figure 2.14.

Figure 2.14.: NoSQL database persistence of the BPMN Loan model in Morsa (partial)


2.2.1.4. Document-based Persistence – MongoEMF

MongoEMF13 also allows EMF models to be stored in MongoDB. This allows the use of EMF’s API while offering possible scalability benefits by storing the models in a NoSQL store. It extends the EMF Resource interface by using custom URIs to identify resources. Such URIs are of the form: “mongodb://host[:port]/database/collection/{id}”. Individual element ids can either be generated by the tool (the default) or manually specified when a new object is created. If generated ids are used, bulk insertion of objects can be performed (by using a single save call on the resource). Creating, removing or updating EObjects is done by saving the relevant resource with the objects in question. Retrieving objects can either be done by identifier or by querying the store. Queries can either be native MongoDB queries or queries in the proprietary format provided by MongoEMF (named the simple query format).

Model-MongoDB Mapping MongoEMF offers two options for using MongoDB collections of objects: a collection of objects can either be stored within a parent object or as top-level documents in a MongoDB collection. When an object contained by a parent is operated upon (modified), the entire collection of objects contained by the parent is consequently operated upon as well. On the other hand, if objects are stored separately in a MongoDB collection, the client will have to handle them individually. Furthermore, EMF references are treated depending on their type. Non-containment references are always persisted as proxies to the object in question. For references to objects in other resources, the referenced object will have to be created before the object referencing it can be created (as the id of the referenced object has to exist for the proxy to be valid). Containment references create nested objects in MongoDB, unless the target object is in its own resource (different from that of the container object) and containment proxies are set to true in the generator model. The default handling of EDataTypes is to convert the ones which cannot be directly mapped to MongoDB’s types (such as Boolean, Integer, String etc.) into Strings (based on the relevant conversion functions in the model). If one wishes to handle such data types in a different way, for example saving one as an Integer (with some custom meaning), an instance of IValueConverter can define this mapping for MongoEMF.
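A minimal sketch of persisting the Loan model through MongoEMF might then look as follows, assuming a resourceSetFactory configured for MongoEMF (as in the query example in Section 2.2.3.1); the database and collection names are illustrative:

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;

ResourceSet resourceSet = resourceSetFactory.createResourceSet();
// resource identified by a MongoDB-backed URI ("app" = database,
// "processes" = collection); with generated ids, the trailing id is omitted
Resource resource = resourceSet.createResource(
    URI.createURI("mongodb://localhost/app/processes/"));
// add the root of the Loan model; the single save() call bulk-inserts
// the objects into the store (save may throw IOException)
resource.getContents().add(loanProcess);
resource.save(null);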

13 https://github.com/BryanHunt/mongo-emf/wiki


Finally, various options are available for loading a MongoEMF resource, the most important ones being summarized below (a usage sketch follows the list):

• OPTION_PROXY_ATTRIBUTES – This Boolean option allows for cross-document proxy references to have their attributes populated automatically (without resolving the actual proxy), when set to true.

• OPTION_QUERY_CURSOR – This Boolean option allows for a MongoCursor to be returned by a query, instead of a default Result with proxies to its contents, when set to true. This permits subsequent iteration, with on-demand creation of only the necessary objects.

• OPTION_SERIALIZE_DEFAULT_ATTRIBUTE_VALUES – This Boolean option allows for the persistence of default values of attributes, when set to true. This aids in querying by attribute value, as the default EMF behavior is to not store default values.

• OPTION_USE_ID_ATTRIBUTE_AS_PRIMARY_KEY – This Boolean option allows for the automatic use of the ID attribute as the MongoDB id of the object, when set to true. This only actually sets the ids of new objects which have no id in their URI and have an ID attribute.
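A sketch of passing such options when loading a resource follows; the option keys are written here as plain strings, and their exact constant names in MongoEMF are an assumption:

import java.util.HashMap;
import java.util.Map;

Map<String, Object> options = new HashMap<String, Object>();
// hypothetical keys; MongoEMF exposes the actual constants in its API
options.put("OPTION_QUERY_CURSOR", Boolean.TRUE); // iterate results lazily
options.put("OPTION_SERIALIZE_DEFAULT_ATTRIBUTE_VALUES", Boolean.TRUE);
resource.load(options); // (load may throw IOException)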

In the BPMN Loan example, the Loan model would be inserted into the MongoDB store when the “save” method is called on its resource, and any relevant metamodels the resource used would be inserted if they did not exist already in the store (as these metamodels are already loaded by EMF in order for the resource to be loaded). While the exact structure of the resulting store has not been published, it will be similar to the one presented in Figure 2.14.

2.2.1.5. Graph-based Persistence – NeoEMF (Graph)

NeoEMF14 (formerly known as Neo4EMF and Kyanos) comprises two tools developed by Inria15; the variant presented here, NeoEMF/Graph, uses graph databases as its back end.

14 https://raweb.inria.fr/rapportsactivite/RA2014/atlanmod/uid49.html 15 http://www.inria.fr/


NoSQL Graph-based stores Graph Databases (seen in Figure 2.15) consist of a set of graph nodes linked together by edges (hence providing index-free adjacency of nodes). Each node contains fields of data, and querying the store commonly uses efficient graph-traversal algorithms to achieve performance. As such, these databases are optimized for traversal of highly interconnected data. Examples of such stores are Neo4J [39], InfiniteGraph [40] and the graph layer of OrientDB [38].

Figure 2.15.: Conceptual organization of data stored in a graph store

NeoEMF/Graph offers “a backend-agnostic persistence solution for big, complex and highly interconnected EMF models”14. It uses the Blueprints API16 to connect to a variety of NoSQL graph databases and hence leverages lazy loading and database caching. Whilst Blueprints offers the convenience of abstracting from the actual graph-based back-end, it has been shown to cause a loss in performance in some cases [41], when compared to using the raw database API. NeoEMF also allows for custom caches to be added according to specific strategies defined in a decorator pattern, and also offers a “dirty save” mechanism to handle the safe splitting of large transactions into smaller ones (while retaining the ability to revert to a consistent state).

16 https://github.com/tinkerpop/blueprints/


Model-Neo4J Mapping The following strategy is used to persist an in-memory EMF model resource in Neo4J [42]:

• Model elements are stored as Neo4J nodes with a special node denoting “root” elements which reference all other elements in the model.

• Element attributes are stored as node properties in the corresponding model ele- ment node.

• Metamodel elements are stored as nodes. Such nodes only contain two properties, one for their name and one for the unique identifier (nsURI) of their metamodel.

• Conformance relationships are stored as outgoing Neo4J relationships with type INSTANCE_OF, pointing to the node representing the relevant metamodel type of the model element.

• References between model elements are stored as relationships between the ele- ments, using a specific naming convention to avoid possible conflicts with other relationships (such as the conformance relationships described above).

An example of this architecture can be seen in Figure 2.16.
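This mapping can be sketched directly against the Blueprints API that NeoEMF/Graph builds on; the node properties follow the strategy above, while the database location and the element values (taken from the running example) are illustrative:

import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

Graph graph = new Neo4jGraph("/tmp/bpmn_loan.db"); // illustrative path
// metamodel element node (name plus the nsURI of its metamodel)
Vertex taskClass = graph.addVertex(null);
taskClass.setProperty("name", "Task");
taskClass.setProperty("nsURI", "http://bpmn_simplified");
// model element node, with its attributes as node properties
Vertex task = graph.addVertex(null);
task.setProperty("name", "Record Loan App Info");
task.setProperty("executionTime", 300);
// conformance relationship to the metamodel type
graph.addEdge(null, task, taskClass, "INSTANCE_OF");
graph.shutdown();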

In the BPMN Loan example, the Loan model would be inserted into the Neo4J store when the “save” method is called on its resource, and any relevant metamodels the resource used would be inserted if they did not exist already in the store (as these metamodels are already loaded by EMF in order for the resource to be loaded). The resulting database would look as in Figure 2.17.

2.2.1.6. Key-value-based Persistence – NeoEMF (Map)

NeoEMF/Map17 is an alternative back-end layer for NeoEMF which uses a MapDB18 NoSQL key-value store. The aim is to work with limited memory and allow for the common model management operations to be performed on large models stored in a NeoEMF/Map store. Similarly to NeoEMF/Graph it allows for custom caching to be added on demand and is extensible so other similar back-ends can be used instead.

17 http://www.emn.fr/z-info/atlanmod/index.php/NeoEMF/Map 18 https://github.com/jankotek/MapDB


Figure 2.16.: Persistence back-end structure excerpt for NeoEMF Graph, from [42]

NoSQL Key-value Stores Key-value Stores (seen in Figure 2.18) consist of keys and their corresponding values, which allows for data to be stored in a schema-less way. Such stores claim that this allows for search of millions of values in a fraction of the time needed by conventional storage. Inspired by databases such as Amazon’s Dynamo [43], they are tailored for handling terabytes of distributed key-value data.

Model-MapDB Mapping The following strategy [44] is used to persist an in-memory EMF model resource in MapDB. Firstly, a unique identifier is assigned to every model element, which allows for the subsequent decomposition of the entire model into a collection of key-value pairs. NeoEMF/Map uses three maps to store this information:

• Property map: This is a map containing the slots (values) for each feature of the model elements. Each row in this map has the following syntax:

< (id, name), v >

Where the key is the pair comprising the unique identifier of the model element and the name of the feature, and the value is made up either of the literal containing the value of the feature (the identifier of another element for a reference, or the primitive value for an attribute) for single-valued features, or of an array of such literals for multi-valued features.

Figure 2.17.: NoSQL database persistence of the BPMN Loan model in NeoEMF Graph (partial)

Figure 2.18.: Conceptual organization of data stored in a key-value store

• Type map: This is a map containing information about metamodel types. Each row in this map has the following syntax:

< id, (nsURI, name) >

Where the key is the unique identifier of the model element in the store and the value is a named tuple containing meta-information about its type, such as its name and the nsURI of the package it is contained in.

• Containment map: This is a map containing information about containments. Each row in this map has the following syntax:

< id, (container_id, name) >

Where the key is the unique identifier of the model element in the store and the value is a named tuple containing the unique identifier of the container object and the name of the feature (reference) that relates the container object with the child object.

An example of this architecture can be seen in Figure 2.19.
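The decomposition can be illustrated with plain in-memory Java maps, populated with identifiers from Figure 2.20; NeoEMF/Map itself obtains persistent maps of this general shape from MapDB, and the composite keys are shown here as simple strings:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// property map: (element id, feature name) -> value(s)
Map<String, Object> properties = new HashMap<String, Object>();
properties.put("t1/name", "Record Loan App Info");
properties.put("t1/executionTime", 300);
properties.put("bp1/tasks", Arrays.asList("t1", "t2")); // multi-valued feature

// type map: element id -> (nsURI, metaclass name)
Map<String, String[]> types = new HashMap<String, String[]>();
types.put("t1", new String[] { "http://bpmn_simplified", "Task" });

// containment map: element id -> (container id, containing feature)
Map<String, String[]> containers = new HashMap<String, String[]>();
containers.put("t1", new String[] { "bp1", "tasks" });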

In the BPMN Loan example, the Loan model would be inserted into the MapDB store when the “save” method is called on its resource, and any relevant metamodels the resource used would be inserted if they did not exist already in the store (as these metamodels are already loaded by EMF in order for the resource to be loaded). The resulting database would look as in Figure 2.20.

2.2.2. Model Versioning

Models, like all other artefacts involved in the software development process, need to be versioned in a systematic and disciplined manner. This section presents two popular approaches for versioning in MDE: using file-based version control tools, and using model-specific version control tools.


Figure 2.19.: Persistence back-end structure excerpt for NeoEMF Map, from [44]

Figure 2.20.: NoSQL database persistence of the BPMN Loan model in NeoEMF Map (partial)

Property Map:
Key                        | Value
〈‘ROOT’, ‘eContents’〉      | { ‘bp1’ }
〈‘bp1’, ‘name’〉            | ‘Process_One’
〈‘bp1’, ‘startEvents’〉     | { ‘sn1’ }
〈‘bp1’, ‘sequenceFlows’〉   | { ‘sf1’, ‘sf2’, ‘sf3’ }
〈‘bp1’, ‘tasks’〉           | { ‘t1’, ‘t2’ }
〈‘bp1’, ‘gateways’〉        | { ‘g1’ }
〈‘sn1’, ‘name’〉            | ‘Start Loan Request’
〈‘sn1’, ‘outFlows’〉        | { ‘sf1’ }
〈‘sf1’, ‘isDataFlow’〉      | False
〈‘sf1’, ‘source’〉          | { ‘sn1’ }
〈‘sf1’, ‘target’〉          | { ‘t1’ }
〈‘t1’, ‘name’〉             | ‘Record Loan App Info’
〈‘t1’, ‘executionTime’〉    | 300
〈‘t1’, ‘inFlows’〉          | { ‘sf1’ }
〈‘t1’, ‘outFlows’〉         | { ‘sf2’ }
〈‘sf2’, ‘isDataFlow’〉      | False
〈‘sf2’, ‘source’〉          | { ‘t1’ }
〈‘sf2’, ‘target’〉          | { ‘t2’ }
…                          | …

Type Map:
Key    | Value
‘ROOT’ | 〈nsURI=‘http://bpmn_simplified’, class=‘RootEObject’〉
‘bp1’  | 〈nsURI=‘http://bpmn_simplified’, class=‘BPMNProcess’〉
‘sn1’  | 〈nsURI=‘http://bpmn_simplified’, class=‘StartEvent’〉
‘sf1’  | 〈nsURI=‘http://bpmn_simplified’, class=‘SequenceFlow’〉
‘t1’   | 〈nsURI=‘http://bpmn_simplified’, class=‘Task’〉
‘sf2’  | 〈nsURI=‘http://bpmn_simplified’, class=‘SequenceFlow’〉
‘t2’   | 〈nsURI=‘http://bpmn_simplified’, class=‘Task’〉
‘sf3’  | 〈nsURI=‘http://bpmn_simplified’, class=‘SequenceFlow’〉
‘g1’   | 〈nsURI=‘http://bpmn_simplified’, class=‘Gateway’〉

Containment Map:
Key   | Value
‘sn1’ | 〈container=‘bp1’, featureName=‘startEvents’〉
‘sf1’ | 〈container=‘bp1’, featureName=‘sequenceFlows’〉
‘t1’  | 〈container=‘bp1’, featureName=‘tasks’〉
‘sf2’ | 〈container=‘bp1’, featureName=‘sequenceFlows’〉
‘t2’  | 〈container=‘bp1’, featureName=‘tasks’〉
‘sf3’ | 〈container=‘bp1’, featureName=‘sequenceFlows’〉
‘g1’  | 〈container=‘bp1’, featureName=‘gateways’〉

2.2.2.1. File-based Versioning

Version Control (also referred to as Revision Control) is the management of changes made to a set of files. These files can be textual or binary in nature, and are organized as a set of ordered (usually numbered) revisions. Such systems perform a line-by-line comparison of files to discover differences (deltas) between versions; as such, text-based files lend themselves well to this process, as small changes will be understood by the VCS and propagated accordingly (with only the small delta being stored). Binary files, on the other hand, will most likely change in structure throughout, even for small changes, and will hence have to be fully compared and stored each time, a time- and resource-consuming operation that should be avoided if possible (it is widely understood that storing binary files in file-based VCSs is discouraged). Applications which offer this service can be stand-alone in nature (such as Subversion [45]) or embedded in programs such as word-processors (such as Microsoft Word) and wiki software packages (such as TWiki [46]). This section presents an overview of state-of-the-art, commonly used version control systems. It discusses the different types, their benefits and drawbacks and, finally, their limitations when used for collaborative model-driven development. There are two types of file-based version control systems:

Centralized Systems Centralized version control systems use a single server containing all versions of the files, to which clients connect and retrieve or update revisions. They

vary according to the strategy they use to handle how new revisions are placed:

Pessimistic Locking Systems In this paradigm, also referred to as file locking systems, one user “checks out” a file and receives write access to it (other users still have read access). This file is now locked until the user “checks in” a revised version of the file or cancels their checkout. Benefits of this approach include usability (being simple to understand and operate) as well as alleviation of conflicts (as there are no merges). Drawbacks include lack of practicality, as a user locking a file for too long may inconvenience others and could render the revision system ineffective, since users may resort to keeping local copies with their changes. Examples of such systems are the PVCS Version Manager [47] and ClearCase [48].

Optimistic Locking Systems In this paradigm, multiple users may edit the same file simultaneously. The first developer to “check in” any changes will update the version number and any further attempts will result in a conflict arising. These can be resolved in various ways, such as the system providing facilities to merge such files (usually limited to textual files) or, in the worst case, the user having to update their revision to the latest one (head revision) before being able to commit their changes. Benefits include concurrency (multiple users can write to the same file) and, for textual files, automated or semi-automated merging capabilities to resolve conflicts. Drawbacks include the need for constant updating to the latest revision to ensure that merge conflicts are limited. Examples of such systems are Subversion (SVN) [45], which also offers the optional locking of files (effectively being able to become a pessimistic locking system, on demand) and the Concurrent Versions System (CVS) [49].

Distributed Systems In distributed version control systems, users have local copies of the entire repository (files and history). This most notably allows for off-line version control (updating, checkpointing etc.) of files. Such systems vary in the way they handle updates:

Open Systems In open systems, updating is primarily done by merging revisions, which results in extensive branching of the available versions. Benefits include extensive support for branches, which allows different paths to always be available for use, reducing (or eliminating) the need for resolving conflicts. Drawbacks include complexity in the use model, as well as the possibility of branching so much that the repository becomes unusable. Examples of such systems are Git [50], Mercurial [51] and Bazaar [52].

Replicated Systems Replicated systems use similar principles to replicated databases. Version commits act in the same way as distributed database commits would, and create a single new revision. Benefits include having a similar use model to locking centralized systems, while gaining the benefits of distributed histories and off-line support. Drawbacks include lack of support for branching and the introduction of merge conflicts. An example of such a system is Code Co-op [53].

File-based Versioning of Models Models persisted as files (whether textual or binary in nature) can be versioned in file-based version control systems. Such systems operate at the file level; for example, optimistic locking systems parse files line by line to detect changes in order to perform the merge part of their copy-modify-merge strategy [54–56]. As a result, graph structures (like models) cannot be handled at an appropriate level of abstraction and are treated as plain text files instead. This limitation makes model-level change comparison between different versions of a model difficult, and model merging even more so, especially since small changes to the underlying model can cause a large change in the structure of the model file [57].

Model Fragmentation To avoid having to version control large monolithic model files (and transfer them across the network), models can be physically separated into several smaller interconnected fragments. With the advent of modeling frameworks such as EMF, which provide first-class support for inter-file references, and robust model comparison tools such as EMF Compare [58], which are able to compare model fragments, this approach inherits all the advantages of working with robust and widely-adopted version control systems. However, compared to the model-centric repository approaches discussed below, it demonstrates a significant shortcoming: the visibility of each developer is limited to the model fragments they have checked out in their local workspace. As such, using this approach makes it impossible to compute queries of a global nature without fetching and inspecting all the model fragments from the remote repository every time. Obviously, as the number of model fragments in the repository grows, this approach becomes increasingly inefficient, as the I/O overhead of loading these separate entities increases (with respect to the relative overhead of the local workspace).

66 2.2. Model Persistence and Versioning

In order to benefit from this technique, when a single monolithic model file is provided, a way to meaningfully split it into fragments would be required. Tools such as EMF-Fragments [59] can be used to automate this task.

In the BPMN Loan example, the files containing the Loan model and metamodel (whether they are in the form of standalone XMI files or in a database structure) would be versioned in the version control system (VCS). Every time the model needs to be accessed, the HEAD revision in the VCS will be retrieved and loaded by the relevant modeling tool used.

2.2.2.2. Model-based Versioning

An alternative to using file-based version control is to leverage model comparison technologies and live change events in order to enable model-element level comparison and updating on versioned models. Whether the models themselves are versioned in a classical file-based VCS or on a custom server using a proprietary storage technology, this approach allows for collaboration and resolution of conflicts at the model level, instead of delegating it entirely to the client.

ModelCVS ModelCVS19 [60] attempts to introduce interoperability between heterogeneous modeling tools. The presented prototype uses EMF and SVN as its proof of concept. Models (and metamodels) are stored in a repository that allows different tools (working with different modeling languages) to check out a version of a model, edit it and commit it back. It aims at handling conflicts at the model-element level, as opposed to the file level that version control software usually works at (referred to as semantic versioning). Semantic versioning is achieved by augmenting every commit of a new model version into the VCS repository with a computed change summary file. Every time a new commit is made, if the last revision in the repository is the direct ancestor of the incoming working copy, then the commit can be directly accepted, as there have been no changes between the commit and the previous version of the model. In every other case, a semantic comparison is performed by using the accumulated computed change summary file (created by combining all of the change summary files of all commits which were performed between the current commit and the latest commit checked out by the

19 http://www.modelcvs.org/versioning/index.html

working copy) and the computed change summary file created by comparing the current commit model to its ancestor in the working copy. When all conflicts are resolved (either automatically or manually by the committer), the tool allows the version to be committed. As the tool does not seem to have scalability as a primary concern, handling large models is done through the default persistence mechanism of each language (such as XMI for EMF models), and it will subsequently suffer from the same issues as the ones described in Section 2.2.1. Nevertheless, the architecture of this project may be suitable for gaining insight into integrating a version control system for use in this research project, as it does use a reliable version control system and integrates it with a repository of models.

In the BPMN Loan example, the files containing the Loan model and metamodel would be versioned in the SVN repository. Every time the model needs to be accessed, the HEAD revision will be retrieved and loaded by the relevant modeling tool used.

The Connected Data Objects Repository (CDO) Another approach is to store the models on a dedicated remote model repository [61]. CDO allows users to access models stored on various possible back-end stores that it can use to persist its repository. Its API is an extension of EMF’s (explained in more detail below) and allows for seamless use of a remote store for accessing and manipulating models. CDO supports multiple different back-ends, such as relational databases (for example, it can use Teneo/Hibernate, described in Section 2.2.1.2, to store models in MySQL) and non-relational stores such as the MongoDB NoSQL store.

Object-Relational Mapping CDO20 handles EObjects as CDOObjects that extend the EObject class by adding CDO-specific metadata. To store an EMF model on CDO, there are three main paths to pursue21: The first is to migrate a Resource (for example an XMIResource) to a CDOResource (by copying all its contents to a new CDOResource). The second is to use a GenModel (EMF generator model) to create CDOObjectImpl objects by migrating the .genmodel file using the CDO Model Migrator. The third is to use DynamicCDOObjectImpl that result from new dynamic model elements added to a

20 http://wiki.eclipse.org/CDO/Client 21 http://wiki.eclipse.org/Preparing_EMF_Models_for_CDO

68 2.2. Model Persistence and Versioning

CDO session’s package registry. The model files used by CDO can be annotated (such as with JPA EAnnotations) in order to provide a tool for customization in the storage of the model; such annotations are not directly needed by CDO and are only useful if the back-end store supports them (such as Teneo/Hibernate, for example).

From the client-side, the regular EMF API can be directly used after a connection (session) has been established, but for using advanced CDO-specific functionality (such as CDOView, that allows queries directly to the CDO store, or CDOTransaction, that allows for savepoints and rollbacks), additional dependencies to CDO have to be included. Furthermore, a built-in CDO User Interface (UI) is provided for accessing, manipulating and querying models stored in the repository. This client architecture is seen in Figure 2.21.

Figure 2.21.: CDO Client high-level architecture, from [61]
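As a brief sketch of this client-side usage (the session set-up is elided, and the repository resource path is illustrative):

import org.eclipse.emf.cdo.eresource.CDOResource;
import org.eclipse.emf.cdo.session.CDOSession;
import org.eclipse.emf.cdo.transaction.CDOTransaction;

CDOSession session = openSession(); // obtained via CDO's connection API (elided)
CDOTransaction transaction = session.openTransaction();
// repository resources behave like EMF resources, but live on the server
CDOResource resource = transaction.getOrCreateResource("/loans/bpmn_loan");
resource.getContents().add(loanProcess); // loanProcess: the Loan model root
transaction.commit(); // atomically pushes the changes (may throw CommitException)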

On the server-side, this repository allows any form of storage to be easily plugged in and is as scalable as the chosen back end but still has several limitations, such as ones regarding its version control.

Collaborative Development As CDO is a remote (on-line) store of models, it supports multi-user access to, and updating of, models. Users work with CDOSessions, which can be either automatically or manually refreshed and allow for consistency with the latest revision of the model. Elements can either be explicitly locked for updating, or the process can be automated and delegated to the CDO framework. On conflict detection by the client (with passive updates or after a refresh, for example), the commit fails, but the CDOConflictResolver interface allows the user to manually

resolve specific conflicts. On the server side, the entire transaction is rejected if a conflict is found (this is mainly to handle network race conditions).

Version control CDO uses Audit Views to provide access to historical data. This feature allows an application to have a read-only view of the state of the model repository at a specific point in the past. As such, CDO has no way to store parallel versions (branches), should they be needed, and no effective way to handle major conflicts (for example, when an off-line copy of the model is synchronized with a heavily changed model in the repository). Even though the versioning is model-specific and can detect conflicts on a model level, it is limited by the fact that past versions cannot be checked out as the latest version.

Example Back-end store for CDO An example back-end store is the Teneo/Hibernate back-end. As illustrated in Figure 2.2222, this allows CDO to seamlessly leverage this technology to perform model storage, updates and queries.

Figure 2.22.: The CDO-Teneo/Hibernate Runtime layer

As can be expected, this back-end will also have the same benefits and suffer from the same issues as the standalone Teneo/Hibernate store described above.

In the BPMN Loan example, the CDO back-end would contain the Loan model and metamodel. Refer to the Teneo/Hibernate example found in Section 2.2.1.2 for more details, as CDO wraps around this technology.

22 http://wiki.eclipse.org/CDO/Hibernate_Store/Architecture


EMFStore EMFStore23 [62] is a tool which uses a MongoEMF-based back-end (described in Section 2.2.1.4) to create a model repository for versioned models. It offers model-element level versioning, so multiple developers can manipulate and resolve conflicts on EMF models at a model-element level.

Version Control The versioning system provided by EMFStore allows for model-element level history of versions and differencing of versions, branching and tagging. Hence any version on the server can be retrieved on demand, as well as visualized in a history browser. EMFStore uses a change-based versioning strategy, whereby historic information is kept as a collection of changes made from the initial version to the current version; the types of changes can be seen in Figure 2.2324. If a commit has merge conflicts, a merge editor allows them to be manually resolved one at a time.

Figure 2.23.: Permitted changes for EMFStore

Finally, an extensible user interface is offered as a collection of Eclipse views25:

• Model explorer, containing the projects with their model contents

• Selected element, displaying the current model element and its properties (allowing editable properties to be changed)

• Model repositories, displaying the connected repositories and allowing relevant operations to be performed: commits will display a detailed model-element level set of changes to be committed; updates will display a similar set of model-level

23 http://eclipsesource.com/blogs/2014/04/17/emfstore-1-2-0-released/ 24 http://eclipsesource.com/blogs/tutorials/emfstore-versioning-history-and-branching/ 25 http://eclipsesource.com/blogs/tutorials/getting-started-with-emfstore/


incoming changes; conflict resolution (merging) offers manual resolution for each model-level conflict.

• History browser, displaying the time-line of changes in the selected repository including details of each model-level change and the branching structure

In the BPMN Loan example, the MongoEMF MongoDB back-end would contain the Loan model and metamodel. The reader can refer to the MongoEMF example found in Section 2.2.1.4 for more details.

Commercial Tools Various commercial tools offer model repositories for a variety of modeling technologies:

Modelio – UML modeling environment Modelio26 is an open-source modeling platform for the development of UML models. Various modules have been developed for this tool, some free and some commercial. One of the commercial modules is the Modelio Teamwork Manager Module27 (part of the commercially licensed Modeliosoft Modelio distribution as of version 3). This module enables the collaborative development of Modelio models by integrating an SVN-based model-level versioning and comparison tool into Modelio. It offers the usual SVN options for committing and updating, while also providing model-element level diffing between two versions (and merging them if required). Furthermore, it uses the SVN lock/unlock functionality for model-aware locking of atomic units, which are either:

• An instance of a UML diagram, such as an Activity diagram or a Class diagram.

• An instance of a UML type, such as a Class, an Interface or a Stereotype.

This is possible as every instance of a diagram or type is stored in its own separate .exml file in the Modelio project structure of the SVN. As such, a developer can choose to lock any number of model elements or diagrams to ensure only they can make changes to them until the lock is released. The structure of a Modelio project in an SVN-based repository is as follows:

26 https://www.modelio.org/ 27 https://www.modeliosoft.com/en/modules/teamwork-manager.html


• Top-level folder containing the Modelio project, named after the project (all other elements mentioned below are inside this folder)

• An “admin” folder containing various metadata items such as the Modelio metamodel version used in this project

• A “model” folder containing:

A set of folders, one for each UML type and UML diagram defined in the current version of UML used, each containing:

A set of files, one for each instance of a relevant class/diagram in the project. For example in the folder named “Actor” there will be a set of files, one for each Actor instance found in the project; folders with no instances will exist but be empty.

This structure is seen in Figure 2.24 and is the same whether the Modelio project is stored locally or versioned in a repository. Furthermore, the module offers a detailed information window depicting the current state of the repository with respect to:

• Locked elements (by the current user)

• Locked elements (by other users)

• Modified elements (by the current user – with respect to the current repository version)

• Unversioned elements

• Added elements (elements which have been added to the version but have not yet been committed in the repository)

• Out of date elements (elements which have been modified by another user – not yet updated locally)

This architecture offers the advantages of model-element level locking as well as diagram-level locking operations, but will suffer in performance and resource use when large models containing millions of fragments (i.e. millions of model elements) are used, as they will all have to be loaded into memory.


Figure 2.24.: Directory structure of a Modelio project [for a Zoo model]

In the BPMN Loan example, the BPMN metamodel would be internally stored in the tool and the model would be stored as a collection of .exml files in their respective folders, one for each model element and diagram included in the model, as described above.

MagicDraw – UML modeling tool MagicDraw28 is a commercial UML tool offering a collaboration module. This module allows for model-level versioning and collaboration through the Teamwork Server version control system. There are three version control system alternatives offered by the Teamwork Server: Native, SVN and ClearCase. A repository using the Native version control system uses a proprietary format to store versioned models as well as user credentials and access rights. On the other hand,

28 http://www.nomagic.com/products/magicdraw.html#Collaboration

a repository using an SVN or ClearCase VCS only stores user login names, and actual authentication is performed directly with the respective VCS, upon a user logging into the system. To the user, the choice of the actual version control system used is seamless, as they are offered the same functionality in all cases:

• Server administration – offering overview of users, projects and logs as well as options to change the type of repository or its location

• Client visualizations – various views are offered or affected by working on a project hosted on a Teamwork Server version control system:

Lock View – displays the currently locked elements such as classes, relations and diagrams (these are the three types of data items MagicDraw distinguishes)

Diagrams View and Containment View – both these views display the current privileges of the user with respect to their contents. For example, if the user has locked a diagram for editing, its link in the view is black and their name is displayed next to it; any element not locked by the user is displayed in gray (denoting they cannot edit it)

Editing windows – any window offering edit operations on elements will have the various add/remove/edit operations only enabled if the current element is locked by the user for editing

The structure of a Native repository is as follows:

• Top-level folder containing the entire repository, named after the repository itself (all other elements mentioned below are inside this folder)

• A projects. file containing information on the projects present in this repository

• A users.xml file containing information on the users registered to this repository – all information except the password is in plaintext

• A .properties file containing global repository properties, such as its unique iden- tifier

• A .log file containing the server logs

75 2. Background

• Folders for each project in this repository, containing:

A folder containing project-related user information with:

A folder for each registered user, containing a collection of .zip archives with the user’s permissions, one archive for each version, containing various files, including a personal project options file containing a (non-human readable) serialized form of the user’s preferences

A versions.xml file containing information on the versions present in the project

A checkout.xml file containing information on elements currently checked out (locked) by users

A collection of .zip archives with the contents of each version of the project (complete copies of the entire project are kept in each version), each containing various (xml-based) files, including:

A file with the contents of the entire model the project uses

Various files with metadata about element versions, dependencies, project information etc.

The structure of an SVN-based repository is very similar to a Native one, with the main difference being that instead of storing a .zip archive for each version of the project, only the latest version is stored (as it is versioned by the SVN repository itself, and hence all previous versions can be accessed).

In the BPMN Loan example, the BPMN metamodel would be internally stored in the tool and the model would be stored in MagicDraw’s proprietary .mdxml format files.

MetaEdit+ – Domain-Specific Modeling environment MetaEdit+ is made up of two separate tools29: one for creating domain-specific languages (DSLs) and one for using a DSL to define models conforming to it. Using the MetaEdit+ Workbench, a developer can create their custom DSL and generate a MetaEdit+ Modeler to be used to instantiate it. The MetaEdit+ Modeler can be customized, depending on the needs of the DSL creator, to better suit the specific use-case. MetaEdit+ also offers a collection of over 50 default languages which can be used without any metamodeling.

29 http://www.metacase.com/products.html


MetaEdit+ offers an object-database-based model server for storing models30. The back-end used is a proprietary version of the ArtBASE database system originally developed by ArtinApples Ltd. The structure of such a server is as follows31:

• Top-level folder containing the entire repository, commonly named after the tool, such as “MetaEdit+ 5.1” (all other elements mentioned below are inside this folder)

• An artbase.roo file containing the names and paths of all repositories present on this server

• A set of folders, one for each repository, containing:

A manager.ab file containing the names and paths of the areas in the repository, the usernames and (encoded) passwords of users, and any other information needed, such as disk mappings. An area is the persistence-layer name for the project concept in the tool.

An areas folder, containing:

A set of folders, one for each area, each containing a set of files which make up the disk storage of the database used to store the model and any relevant project information

A users folder, containing a folder for each user in the repository, each with a set of files with relevant user information

A backup folder containing the last successfully commited version of the entire repository with:

An areas folder with all the backup areas (with the same structure as the original areas folder)

A manager.ab file containing the backup of the access control of the reposi- tory

Change propagation is performed using an ACID transaction system. Changes are only propagated to the repository (and to other users) when a transaction is committed.

30 http://www.metacase.com/papers/Mature_Model_Management.html 31 http://www.metacase.com/support/51/manuals/sysadmin/sa.html


Hence any user working in a transaction will not see any changes made to the repository until they open a new transaction by ending their current one. Collaboration is supported when a multi-user repository is used. On the metamodel level, the entire metamodel is locked when a user is working on it and cannot be altered until the lock is released32. On the model level, each user will automatically be given locks on items they are working on, based on the data model of MetaEdit+:

• Conceptual Graphs – these represent the actual model data in the project

• Representational Graphs/elements – these represent the visual representation of various subsets of the actual data in the project (editors contain these views)

• Conceptual objects/relationships/roles/properties – these represent the various model objects and their interactions or attributes

More specifically, when a user opens an editor on a representational graph, they will attempt to obtain locks both on the representational graph itself and on the conceptual graph underlying it. If the user gets both locks, they will be able to change the layout as well as the contents of this graph. If they only get the lock for the representational graph, they will only be able to change the layout of the graph or add/remove representational elements of currently existing conceptual elements. Similarly, if they only get the conceptual lock, they will not be able to change the layout in the editor but will be able to alter the data in the elements. These locks are kept until released (e.g. when the editor is closed), possibly through multiple transaction commits, should the user want to keep them. On the other hand, single-element locks, such as the ones needed for dialogs used in editing object properties, are only held temporarily, for as long as the dialog is open, to facilitate multi-user editing. Similarly to other locks, failure to obtain one will simply offer a read-only editor, graying out any “OK” option for applying changes. The proposed approach for versioning models stored in this repository is to use classical file-based version control systems and store a zipped version of the entire repository every time a new version should be saved33; MetaEdit+ offers scripts to aid in incorporating this strategy into the tool’s normal workflow, but no integrated versioning system or way

32 http://www.metacase.com/faq/showfaq.asp?cate=MWB 33 http://www.metacase.com/support/45/documentation/versioning.html

to view any changes made in the repository, other than displaying the latest committed version.

In the BPMN Loan example, the MetaEdit+ back-end database (ArtBASE) would contain the Loan model and metamodel represented in its own proprietary format.

Table 2.1 overviews the capabilities of all the model persistence and versioning tools reviewed in this section.

Table 2.1.: Overview of state-of-the-art model persistence and versioning mechanisms

Tool            | Persistence Format | Persistence Type | VCS | Fragmentation
EMF             | XMI                | text             |     |
Teneo/Hibernate | Relational DB      | binary           |     |
Morsa           | Document DB        | binary           |     |
MongoEMF        | Document DB        | binary           |     |
NeoEMF[Graph]   | Graph DB           | binary           |     |
NeoEMF[Map]     | Key/Value DB       | binary           |     |
ModelCVS        | XMI                | text             |     |
CDO             | Various DBs        | binary           |     |
EMFStore        | Document DB        | binary           |     |
Modelio         | XML                | text             |     |
MagicDraw       | XML                | text             |     |
MetaEdit+       | Object DB          | binary           |     |

2.2.3. Model Querying

As model querying provides the primary means for retrieving information from models, all modeling technologies presented here need to support it in some form. Any modeling tool which needs to build upon one of these technologies will have to support querying the models in order to perform any operations on them, such as transformations into lower level artefacts.


2.2.3.1. Querying Technologies

Model querying is often reliant on the technology used to manage the model and its in-memory representation, or on tools built upon this technology offering additional functionality.

EMF-based models Persistence and versioning technologies relying on EMF for their in-memory model representation (EMF-XMI, Teneo/Hibernate, Morsa, MongoEMF, NeoEMF, ModelCVS, CDO and EMFStore) can all use the EMF Resource API to perform model traversal and retain the results they need in order to answer queries about models. This API allows for (a sketch exercising these operations follows the list):

• retrieval of all the direct contents of a resource (i.e. only the root model elements)

• retrieval of all contents of the resource (as an iterator over all model elements in the resource)

• retrieval of the value of a specific feature of a model element (i.e. an attribute or reference value)

• retrieval of the meta-type of each model element
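The following sketch exercises the four retrieval operations above on a loaded resource; the "name" feature is taken from the running BPMN example:

import java.util.Iterator;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EObject;

// direct contents: only the root model elements of the resource
EObject root = resource.getContents().get(0);
// all contents: an iterator over every model element in the resource
Iterator<EObject> all = resource.getAllContents();
// meta-type of a model element
EClass type = root.eClass();
// value of a specific feature (attribute or reference) of the element
Object name = root.eGet(type.getEStructuralFeature("name"));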

Furthermore, it offers automatic resolution of inter-resource references and dependencies. While this API offers a complete capability for navigating a model/metamodel and retrieving all of the information stored within the model, it can be seen as inefficient in various cases:

• In order to retrieve a single model element, one has to either retrieve it by identifier, iterate over the containment hierarchy of the resource, or iterate over the flattened collection of all model elements within the resource. In the worst case (if the model element in question is the last one), one would have to iterate through the entire contents of the resource just to retrieve this one element.

• In order to find all elements of a specific type, one would have to iterate through the resource and collect all model elements of interest by retrieving the type of each one and comparing it to the required type.

• In order to perform multiple such operations, it is at the discretion of the client code to use caching and avoid multiple traversals. Furthermore, there is no support for indexing facilities which could support complex queries (such as the one presented below).

For example in EMF, a query requesting all Tasks in the BPMN Loan model presented in Section 2.1.3, which have executionTime greater than 10 minutes (600 seconds) [from now on referred to as the “slow task query”] would be written in Java as follows:

// create the file resource from the bpmn loan model
Resource resource = resourceSet.getResource(
    URI.createFileURI("bpmn_loan.model"), true);
// get the root of the resource, i.e. the BPMNProcess instance
EObject process = resource.getContents().get(0);
// get the BPMNProcess type
EPackage bpmn = EPackage.Registry.INSTANCE.getEPackage("http://bpmn_simplified");
EClass bpmnProcessEClass = (EClass) bpmn.getEClassifier("BPMNProcess");
// get the Tasks in this BPMNProcess (copied, so that elements can be removed)
List<EObject> tasks = new ArrayList<EObject>((List<EObject>) process.eGet(
    bpmnProcessEClass.getEStructuralFeature("tasks")));
// get the Task type
EClass taskEClass = (EClass) bpmn.getEClassifier("Task");
// only keep the slow tasks (executionTime > 600)
Iterator<EObject> iterator = tasks.iterator();
while (iterator.hasNext()) {
    EObject task = iterator.next();
    if ((Integer) task.eGet(taskEClass.getEStructuralFeature("executionTime")) <= 600)
        iterator.remove();
}
// tasks now contains the appropriate model elements for the query

Various technologies extend this API in order to provide more performant querying solutions. For example, Morsa has introduced the Morsa Query Language [63]; the abstract syntax of this language is seen in Figure 2.25, from [63]. The goal is to provide a language consistent with SQL (using SELECT – FROM – WHERE clauses) while providing constructs for EMF-based concepts (EClass, EObject etc.). For example, the slow task query in the Morsa Query Language in Java would look like:

// get the root of the BPMN Loan EMF resource exposed by Morsa,
// i.e. the BPMNProcess instance
EObject bpmnProcess = (EObject) resource.getContents().get(0);
// create the query ( GT("attr", y) means: the value of attribute
// "attr" is greater than y )
MorsaQuery query = SELECT("http://bpmn_simplified/Task")
    .FROM(bpmnProcess)
    .WHERE(GT("executionTime", 600)).done();
// retrieve the results from the resource connected to the Morsa back-end
Collection result = morsaResource.query(query);

Figure 2.25.: The abstract syntax of the Morsa Query Language, from [63]


MongoEMF34 supports the MongoDB native query language35, which can be used to query its underlying store. This simple language allows retrieval of documents (in this case model elements) from a MongoDB collection (i.e. of a specific model type in MongoEMF) by filtering on feature values; it also allows for limiting the number of results and sorting them, as well as choosing which features are included in or excluded from the resulting elements. For example, the slow task query for MongoEMF would be expressed in Java as follows (this assumes that all instances of Task are stored in a MongoDB collection named “tasks”):

// create the query to the store ( '$gt' represents '>' )
String query = "{ filter: { executionTime: { $gt: 600 } } }";
// create the resource set
ResourceSet resourceSet = resourceSetFactory.createResourceSet();
// create the resource keeping the results, passing it the query
Resource resource = resourceSet.getResource(
    URI.createURI("mongodb://localhost/app/tasks/?" + query), true);
// retrieve the results (located at the root of the query resource)
EReferenceCollection results = (EReferenceCollection) resource.getContents().get(0);

UML-based models Technologies using UML (Modelio and MagicDraw) can leverage the Object Constraint Language (OCL36) as an abstraction over the actual technology used to persist and read their models. The original purpose of OCL was to offer a side-effect-free way of specifying constraints on UML models, in order to help describe them and validate their correctness, but its expression language can be used for creating any read-only query on a model. As such, OCL is used as a more natural way of expressing queries on UML models [15]. MagicDraw directly supports the use of OCL constraints and, even though Modelio does not directly support OCL, the PyAlaOCL Modelio Integration tool37 uses Modelio’s Jython38 scripting capabilities to support OCL expressions.

34 https://github.com/BryanHunt/mongo-emf/wiki/User-Guide 35 https://docs.mongodb.org/manual/core/read-operations-introduction/#query-interface 36 http://www.omg.org/spec/OCL/ 37 pyalaocl.readthedocs.org/en/latest/modelioIntegration.html 38 http://www.jython.org/


For example, the slow task query in OCL would be written as follows:

BPMN::Task.allInstances() -> select( st | st.executionTime > 600 )

MetaEdit+ models MetaEdit+ does not offer a dedicated querying facility, as its developers did not deem it useful in their 20-year review39. It relies on its various visual editors to allow users to navigate through their models and find the information they need; furthermore, users can leverage the generation capabilities of the tool to produce reports or code with the contents of their intended query. Finally, MetaEdit+ models can be exported to XML, and prototype work is available for loading them into other tools such as EMF [64], where they can be queried using those tools’ capabilities.

2.2.3.2. Repository Querying

Querying of model repositories is more complex than that of models loaded into memory. Concerns such as network overhead and server availability/resource use need to be taken into account when developing a query interface for a model repository. EMF-based repositories expose their contents as an EMF Resource, often using a lazy-loading approach and caching strategies for retrieving contents on demand, in order to avoid network overload. Similarly to various persistence technologies, many EMF-based repositories offer complementary querying capabilities beyond the basic EMF resource paradigm. For example, CDO offers a built-in execution engine for evaluating OCL expressions (using the Eclipse MDT OCL engine40); hence users are able to write OCL expressions to run against models stored in CDO, with the query being evaluated on the server side and the results returned to the client. For example, running the slow task query in OCL for CDO in Java would look as follows:

// get the session and view hosting the query
CDOSession session = getSession();
CDOView view = session.openView();
// get the BPMNProcess instance
BPMNProcess process = (BPMNProcess) resource.getContents().get(0);
// create the OCL query on this BPMNProcess
CDOQuery query = view.createQuery("ocl",
    "self.tasks -> select( st | st.executionTime > 600 )", process);
// retrieve the results
List<Task> result = query.getResult();

39 http://www.metacase.com/papers/MetaEdit+_at_the_Age_of_20.pdf 40 http://www.eclipse.org/modeling/mdt/?project=ocl

MagicDraw offers the ability to query both at the metamodel and at the instance level, using Java, scripting languages such as Jython, and the Velocity Template Language41. Modelio offers a QueryDefinition API42 for extending the tool with a querying facility (compatible with EMF). As mentioned above, MetaEdit+ does not directly offer textual query functionality but uses its extensive diagram editors to facilitate this process43.

2.2.4. Summary

This section presented the current state-of-the-art tools and technologies used for model persistence and versioning, discussing their different features and capabilities; it also introduced model querying and how it is handled by the various persistence technologies and repositories presented in this section. The next chapter synthesizes the findings of this review and identifies the challenges related to collaborative development of large (collections of) models that the rest of the work will target.

41 http://www.nomagic.com/files/manuals/MagicDraw%20ReportWizard%20Template%20Creation%20Tutorial.pdf
42 https://www.modelio.org/documentation/javadoc-3.4/org/modelio/metamodel/uml/infrastructure/matrix/QueryDefinition.html
43 http://metaphor.it.jyu.fi/loppurap/mearch.html


3. Analysis and Hypothesis

Summarizing the discussion in Chapter 2, this chapter identifies the limitations of today’s state-of-the-art modeling tools, introduces an approach which can be used to tackle some of the scalability issues described in Section 2.1.2.1 and presents the research hypothesis of this work. It then lists the proposed research objectives for assessing the validity of this hypothesis and discusses the scope of the research.

3.1. Analysis

In a collaborative environment, models need to be version-controlled and shared among many developers. As discussed in Section 2.2.2, there are two primary approaches to versioning models. The first is using file-based version control systems such as Git or SVN, which has certain advantages: such version control systems are robust, widely used and orthogonal to modeling tools, the vast majority of which persist models as files. On the downside, since such version control systems are unaware of the contents of model files, performing queries on models stored in them requires developers to check these models out locally first. As seen in Figure 3.1, this can be particularly inefficient for queries extending to a large number of models (global queries). For example, in order to find out how many models in the repository reference the developer’s model A, all other models have to be checked out from the repository (in this case models B and C) and subsequently loaded into memory (alongside model A), in order to be amenable to querying. Even if all the fragments are maintained locally (for example if a version control system such as Git is used, as shown in Figure 3.2), the issue of having to load them all into memory in order to answer the query remains; for large enough models (or collections of fragments) this commonly requires more resources than are available. Also, file-based version control systems do not provide support for model-element-level operations such as locking or change notifications. To address these limitations, model-specific version control systems such as CDO, EMFStore and


MagicDraw’s TeamServer have been proposed. While such systems address some of the limitations above, they require tight coupling with modeling tools, they impose an administration overhead, and they lack the maturity, robustness and wide adoption of file-based version control systems.


Figure 3.1.: Performing global queries on model fragments stored in a remote VCS repository, adapted from [31]

3.1.1. Model Indexing

In what can be seen as a happy medium between the two approaches to model version control, we introduce the concept of model indexing. This approach enables efficient global model-element-level queries on collections of models stored in file-based version control systems. To achieve this, a separate system is introduced (a model index), which monitors file-based repositories and maintains a fine-grained read-only representation (graph) of models of interest, which is amenable to model-element-level querying. As seen in Figure 3.3, the developer queries the model indexer directly and does not have to check out or load any additional files. Such a model index is not aimed at replacing current model persistence formats (such as XMI, database or model repository) but at offering an up-to-date proxy to the model data.



Figure 3.2.: Performing global queries on model fragments stored in a local VCS repository

Drawing a parallel with database indexing, the goal of a model index is to offer the ability to retrieve a subset of the information in a model more efficiently than by iteration. As such, in principle, the contents of a model index are determined by the use-case in question and can range from including as much as all of the model features (offering an almost mirrored copy of the model that can be queried much more efficiently – a use-case we use in the evaluation of this work), to as little as no actual information contained in the model, instead containing only other interesting “metadata” items such as timestamps of commits, the number of model-element-level changes found in each commit or the number of model files changed in each commit.

3.2. Research Hypothesis

Such a model indexing system can be used to investigate the following research hypoth- esis:

The overhead of computing model-element-level queries over large (collections of) models stored in a file-based VCS can be significantly reduced using a non-invasive model-indexing system orthogonal to the specific VCS or model representation format.



Figure 3.3.: Performing global queries on model fragments stored in a remote VCS repository, using a model indexing system, adapted from [31]

Section 3.3 presents the research objectives used to assess the validity of this research hypothesis, Chapter 4 delves into the details of designing such a model indexing system, and Chapter 5 evaluates its effectiveness and efficiency.

3.3. Research Objectives

In order to assess the validity of the research hypothesis, the following research objectives were defined:

(i) Development of a model indexing system prototype; work presented in [65].

(ii) As a model indexing system has to be performant and scalable to be of use, the back-end used to store the data needs to be appropriately chosen. As such, an evaluation of various appropriate data persistence technologies will be performed, in order to identify the one most fitting for a model indexing system. This work has been presented in [34] as well as Section 5.3.1.

(iii) For a model indexing system to be useful to a variety of tools and domains, it has to be able to support multiple model persistence formats as well as offering an API for the addition of new formats on demand. As such, an extensibility


mechanism to support multiple model persistence formats will be offered. The extensible architecture supporting this, as well as example drivers for alternative technologies, are presented in Section 4.3.4.

(iv) To ensure breadth of coverage, an extension of the model indexing system to sup- port multiple file-based VCS will be offered. Details can be found in Section 4.3.3.

(v) In order to gain confidence in the usefulness of the system when non-trivial mod- els are being used, it will be extended to support indexing and querying large (collections of) models.

(vi) An evaluation of index creation and updating in terms of correctness and perfor- mance will be performed using models conforming to research objective (v); work presented in [66].

(vii) An evaluation of typical forms of model queries in terms of performance, when compared to querying performed by current state-of-the-art modeling tools, will be performed using models conforming to research objective (v). Work presented in [67].

Chapter 4 addresses objectives (i), (iii), (iv) & (v) and Chapter 5 objectives (ii), (vi) & (vii).

3.4. Scope

In the interest of feasibility, the research (and consequent tackling of the research objec- tives) is scoped as follows:

• To address research objective (ii), an appropriate back-end will be selected after an empirical study comparing various appropriate alternatives. As not all possible alternatives can be investigated, the study will be limited to some of the technologies prominently used in today’s state-of-the-art modeling tools (identified in Section 2.2.1), such as XMI, relational databases and NoSQL graph stores like Neo4J and OrientDB.

• To address research objective (iii), the framework will support the indexing of three


independently-implemented model persistence formats, comprising EMF-based models as well as BIM1 IFC2 and Modelio UML models.

• To address research objective (iv), the models will be stored in file-based version control systems like SVN and Git and will be updated through the built-in facilities of these systems.

• To address research objective (v), large models, of the order of millions of model elements, will be indexed by the model indexing system; large collections (hundreds) of smaller models will also be indexed. This aims at covering both the case where a few large monolithic models are present and the case where many (possibly fragmented) smaller models are used instead.

• To address the querying aspect of research objective (v), the framework will support EOL as a query language (as a representative of a wide array of OCL-like languages and variants of OCL embedded in languages such as ATL and Acceleo). Alternative querying methods, such as the use of native Java or back-end-specific languages, will be investigated in order to contrast their relative expressiveness, usability, maintainability and performance.

• To address research objectives (vi) and (vii), empirical studies will be performed regarding model insertion, model updating and model querying, in terms of correctness and performance. The scope of these studies will be limited to the above-mentioned formats, systems and sizes, as well as an appropriate class of queries, representative of ones commonly performed in MDE.

1 Building Information Modeling: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=52155
2 Industry Foundation Classes: http://www.iso.org/iso/catalogue_detail.htm?csnumber=51622

4. Hawk: Scalable Model Indexing Framework

This chapter presents Hawk, a model indexing framework and proof-of-concept implementation used to assess the validity of the research hypothesis. Hawk monitors locations where file-based models are stored and parses these models in order to maintain a unified read-only representation of their contents in its model index. This index is consistent with the state of the model files and is amenable to efficient querying of its contents. In this chapter, the functionality and architecture of the framework are outlined, followed by the design and implementation details of the prototype, including details of the various key processes, procedures and algorithms used.

4.1. System Capabilities

In order to address the research objectives identified in Section 3.3, the proposed frame- work aims at delivering the following capabilities:

1. Use a scalable back-end so that it can accommodate large models, in the order of millions of model elements, or large collections of (possibly fragmented) models;

2. Provide an efficient internal representation of the models it indexes, amenable to efficient querying;

3. Support modeling technologies which use cross-file references to store fragmented models;

4. Support incremental approaches for updating its contents in order to avoid unnecessary overheads when making small updates on large collections of models;


5. Work with diverse file-based version control systems (e.g. SVN, Git) and modeling technologies (e.g. EMF, IFC);

6. Offer a facility through which modeling & model management tools can query it;

7. Support use of database indexing and attribute caching to improve the performance of certain classes of queries;

8. Provide extensibility mechanisms for accommodating new types of back-ends, VCSs and model representation formats;

9. Be orthogonal: the VCS repositories should not need to be modified or configured in any way. This tackles the need for orthogonality in the research hypothesis.

Table 4.1 links the system capabilities to their relevant research objectives, found in Section 3.3.

Table 4.1.: Traceability of System Capabilities

    System Capability    Research Objective(s)
    1                    (ii)
    2                    (v)
    3                    (v)
    4                    (v)
    5                    (iv)
    6                    (v)
    7                    (v)
    8                    (iii), (iv)

4.2. System Architecture

To satisfy the requirements outlined above, Hawk can be configured to monitor a set of VCS servers for model-related events (e.g. creation of new models, modification of existing models) and maintain a copy of the latest version of all interesting (based on its configuration) models in these servers in a scalable model index (defined in Section 3.1.1), the details of which are discussed in the sequel. Such an architecture avoids the problem of having to fully load the models every time a query needs to be answered, by

front-loading this effort (into its persisted model index) and only having to propagate any changes to the models (into the model index) as they happen. Modeling and model management tools can then perform global queries through Hawk, exploiting its fast, scalable and synchronized model index. With regard to the VCS servers, Hawk acts as a standard read-only client, to satisfy the requirement of non-invasiveness.

4.2.1. System Components

Figure 4.1 illustrates the components that comprise Hawk, and which are discussed below. Figure 4.2 shows the related API provided by Hawk for these components. These interfaces are discussed in detail throughout this section (Section 4.3). The detailed API of each interface element can be found in Appendix A.


Figure 4.1.: Components of the model indexing system (Hawk)

• Model parser components: these components provide parsers for specific model persistence formats, such as EMF models persisted in XMI or Modelio models in XML. These parsers take as input the contents of a file stored in a version control system and produce as output a uniform in-memory representation (“resource”)


of the model.

• Model indexing components: these components, specific to each back-end used (such as a Neo4J NoSQL database), receive a resource created by the appropriate model parser and insert/update it in the database. The structure of the store assumes that the back-end provides a mechanism for rapidly accessing specific elements using a key (for example by using embedded database indexes, such as the Apache Lucene1 index)2.

• File Storage/VCS components: specific to each system, these components compute the set of changed files (added, removed or updated) with respect to the current local model index revision (and the current latest version of the files in the system/repository).

• Core component: responsible for initializing, managing and gracefully terminating Hawk. It is responsible for synchronizing the model index with its relevant version control system(s), either at time intervals defined by some heuristic or when instructed to do so through its interface. This comprises querying the VCS for changed files (with respect to the current local model indexer revision), providing these files to the appropriate model parser (in order to create in-memory resources) and finally passing the returned resources to the appropriate model index(es) so that they can be synchronized with the current contents of the model indexer; a simplified sketch of this loop is shown after this list.

• GUI component: provides a front-end to Hawk, allowing model indexes to be added/removed and managed. It can also be used to perform queries on the model indexes using any of the currently active query engines. Hawk can also be used without its GUI, should the user want to manage Hawk programmatically or avoid depending on external frameworks (such as Eclipse’s3 implementation of the OSGi4 framework).

• Query API: this component provides a bridge between Hawk and modeling and model management tools that need to query its model indexes.
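To make the synchronization performed by the core component concrete, the following Java sketch outlines one pass of the loop; all interface and method names in it (IVcsManager, VcsFile, IModelResourceFactory, IModelUpdater and their methods) are illustrative stand-ins rather than Hawk’s actual API, which is detailed in Appendix A.

    // One simplified synchronization pass; names are illustrative stand-ins
    // for the actual Hawk interfaces described in Appendix A.
    public void synchronise(IVcsManager vcs,
            Map<String, IModelResourceFactory> factoriesByExtension,
            IModelUpdater updater) throws Exception {
        // 1. ask the VCS manager which files changed since the indexed revision
        for (VcsFile changed : vcs.getChangedFiles(lastIndexedRevision)) {
            IModelResourceFactory factory =
                factoriesByExtension.get(extensionOf(changed.getPath()));
            if (factory == null) {
                continue; // not a model file any registered parser understands
            }
            // 2. parse one file at a time, bounding memory use
            IHawkModelResource resource = factory.parse(vcs.retrieve(changed));
            // 3. propagate the parsed resource into the model index
            updater.update(changed, resource);
        }
        lastIndexedRevision = vcs.getCurrentRevision();
    }

Here lastIndexedRevision and extensionOf are assumed helpers of the enclosing class; the point of the sketch is the file-at-a-time flow from VCS manager to parser to updater.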

1 http://lucene.apache.org/core/
2 this assumption is made as most popular relational/NoSQL stores (such as MySQL, MongoDB and Neo4J) include database indexes
3 https://eclipse.org/
4 http://www.osgi.org/Main/HomePage

Figure 4.2.: Interfaces of Hawk


4.3. System Design

This section details the design of Hawk, presenting how the various API elements it offers are used, as well as its core algorithms and procedures for indexing, updating and querying. It then describes the various optimizations performed to enhance the performance of Hawk, for example for model updates or model queries.

4.3.1. Hawk Model Index Structure

Hawk persists information extracted from models as a global property graph. Figure 4.3 shows what storing the BPMN Loan model in this manner looks like (as the whole model would not fit on a single page, only a representative subset of the model is shown).

Figure 4.3.: The BPMN Loan model index – stored as a property graph (partial)

In general, a Hawk model index typically contains the following entities (a small sketch of how they map onto a property graph follows the list):

• File nodes. These represent files in a repository and contain information on the file such as its repository, file name, current revision and type. They are


linked with relationships to the Elements they contain. For example, in Figure 4.3 node with id:f1 is a file node.

• Metamodel nodes. These represent metamodels and contain their names and their unique namespace URIs (in EMF, these would be EPackages5). They are linked with relationships to the (metamodel) Types they contain. For example, in Figure 4.3 node with id:BSI is a metamodel node.

• Type nodes. These represent metamodel types (EClasses in EMF terminology) and contain their name. They are linked with relationships to their (model) Ele- ment instances. For example, in Figure 4.3 node with id:T is a type node.

• Element nodes. These represent model elements (EObjects in EMF terminology) and can contain their attributes (as properties) and their references (to other model elements) as relationships to them. For example, in Figure 4.3 node with id:t1 is an element node.

• Database Indexes. Metamodel nodes and File nodes are indexed6 in the store, so that their nodes can be efficiently accessed for querying (commonly used as starting points for complex graph traversal queries). For example, on the left hand side in Figure 4.3, the two indexes can be seen.
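As an illustration of how these entities map onto a property graph, the sketch below builds a tiny fragment of such a structure directly against the embedded Neo4J 2.x API. It is a simplified illustration, not Hawk’s actual insertion code (which goes through the graph layer of Section 4.3.2.2); all property names and values are examples only.

    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.graphdb.index.Index;

    // Simplified illustration of the node kinds above (embedded Neo4J 2.x).
    public class IndexStructureSketch {
        public static void main(String[] args) {
            GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("hawk-index");
            try (Transaction tx = db.beginTx()) {
                Node file = db.createNode();            // file node
                file.setProperty("path", "/loan.bpmn");
                file.setProperty("revision", "42");

                Node type = db.createNode();            // type node (e.g. Task)
                type.setProperty("name", "Task");

                Node element = db.createNode();         // element node (a Task)
                element.setProperty("executionTime", 700);

                // relationships linking the element to its file and its type
                element.createRelationshipTo(file,
                    DynamicRelationshipType.withName("file"));
                element.createRelationshipTo(type,
                    DynamicRelationshipType.withName("ofType"));

                // database index so file nodes are found without a blind search
                Index<Node> files = db.index().forNodes("files");
                files.add(file, "path", "/loan.bpmn");
                tx.success();
            }
            db.shutdown();
        }
    }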

4.3.1.1. Back-end Persistence

As mentioned above, Hawk focuses on the use of graph-based NoSQL databases for its back-end, as they were identified to be the most performant [41] when dealing with the large collections of models identified in Section 2.1.2. Initial evaluation of these stores was performed between two widely used graph databases – Neo4J and OrientDB. Even though both stores expose a graph API with nodes and relationships, the actual core persistence of OrientDB is as a document store, and this seemed to degrade its performance in all of the benchmarks performed; a detailed analysis of this evaluation is presented in Section 5.3.1. As such, it was decided to focus on the more performant Neo4J store for creating the proof-of-concept back-end for Hawk. The OrientDB back-end has also been kept up to date in order to provide an alternative back-end for Hawk, but will not be focused upon in this work.

5 we choose to draw parallels with concepts from EMF as they are well-understood and unambiguous
6 http://components.neo4j.org/neo4j-lucene-index/snapshot/


4.3.1.2. Background: Neo4J

Neo4J is a Java-based graph database; it was one of the first such technologies and has matured to enterprise standards over the past years7. It claims to be one of the fastest and most scalable native graph databases available and has positive feedback from multiple companies which use it. It offers ACID transaction support and a proprietary query language – Cypher8 – which enables querying a graph-like structure in a simple and efficient way; an example of a query written in Cypher can be found in Section 5.2.1.1. It stores its data in multiple files on disk, each with specific content, some of which are listed below:

1. a file with the nodes of the graph

2. a file with the relations between the nodes of the graph

3. an index file of the node properties

4. a file with the node property values (that are Strings)

5. a file with the node property values (that are Arrays)

6. a set of files storing (Lucene9) index information

7. a set of files containing the latest logical logs (the latest transactions committed to the store)

As such, should a non-cached (in-memory) element be required, it is retrieved from its relevant file. For example, if a node with ID X is needed, the node file is searched to retrieve it; if subsequently the value of the property with name Y is needed for node X, the relevant property file is searched. Empirical tests have shown that these files are searched from beginning to end (or until the process using them has retrieved all it needs and stops), and are ordered by ascending ID value (IDs are Java Long integers). The tests accessed nodes of varying internal IDs (known beforehand) and showed that the time it took to retrieve the relevant node

7 http://neo4j.com/
8 http://neo4j.com/docs/stable/cypher-query-lang.html
9 Neo4J supports the use of a custom-configured embedded Lucene index implementation for indexing commonly accessed node properties (either automatically or manually). In Section 4.3.8.3 we discuss how the use of such custom manual indexes can greatly increase the performance of various types of queries on Hawk

was proportional to the relative position of the node in the file (the larger the ID, the longer it took). Furthermore, this process took a constant time when running in Cypher, regardless of the IDs of the nodes being retrieved, as the whole file was read in each run (there is no way to forcefully end the run after the needed results are found, such as using a break statement in Java). It is worth noting that this interaction is only relevant for nodes not cached in memory by the database, something which will happen automatically the first time a node is accessed and until the database decides to evict it (for example if it reaches a memory threshold) or the machine is restarted (simply shutting down the JVM is not sufficient to clear the memory-mapped file from the operating system’s cache – at least on a Windows machine).
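The difference between the two access styles discussed above can be sketched as follows, assuming an embedded Neo4J 2.2+ instance db; the Cypher query is illustrative only and does not reflect how Hawk actually encodes types in its graph.

    // Targeted retrieval via the native API: only the relevant portion of the
    // node file is read (for non-cached nodes), and the caller can stop early.
    try (Transaction tx = db.beginTx()) {
        Node n = db.getNodeById(42L);
        Object time = n.getProperty("executionTime", null);

        // Declarative retrieval via Cypher: the engine scans all candidate
        // nodes and cannot be "broken" out of early, as a Java loop can.
        Result rows = db.execute(
            "MATCH (t) WHERE t.executionTime > 600 RETURN t");
        while (rows.hasNext()) {
            rows.next(); // consume each matching row
        }
        tx.success();
    }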

4.3.2. Hawk Mapping Layers

In order for a collection of heterogeneous model files (stored on VCSs) to be stored as a global property graph (such as the one presented above), two mapping layers had to be introduced by Hawk.

4.3.2.1. Model Layer

This layer provides a set of abstractions for representing heterogeneous models and metamodels in memory. Inspired by EMF’s respective abstractions (and following the structure seen in Figure 2.3), metamodel resources contain types/meta-classes (that are grouped in packages), which have typed attributes and references, as well as annotations. Model resources contain objects representing model elements, which have values for the attributes and references of their type. The requirement of this layer is to provide a minimal interface for such elements, so that wrapping current modeling technologies around them (for indexing in Hawk) is as simple as possible. The important interfaces in this layer are described below (and seen in Figure 4.4):

• IHawkMetaModelResource This interface provides the in-memory representation of a metamodel. It can contain: IHawkPackage This interface represents a uniquely identifiable collection of metamodel types. It has a namespace URI by which it is identified, as well as a name; it offers methods for retrieval of any specific IHawkClassifier (IHawkClass or IHawkDataType) it contains (by name), or all of them.


Figure 4.4.: The Hawk Model Layer

IHawkClass This interface represents a metamodel class (a type). It has a name, unique within its IHawkPackage; it offers methods for retrieval of any specific structural feature (attribute or reference) it contains, all attributes it contains, all references it contains and all of its supertypes (other IHawkClasses it is a subclass of); it can be marked as abstract or interface.

IHawkDataType This interface represents a metamodel data type. This classifier only has a name, unique within its IHawkPackage; all other implementation details are left to the implementer.


IHawkAttribute This interface represents an attribute of an IHawkClass. It has a name, unique within its IHawkClass; it can be marked as derived, unique, ordered or with multiplicity greater than 1 (i.e. isMany()). IHawkReference This interface represents a reference of an IHawkClass. It has a name, unique within its IHawkClass; it can be marked as containment, unique, ordered or with multiplicity greater than 1 (i.e. isMany()).

• IHawkModelResource This interface provides the in-memory representation of a model. It contains: IHawkObject This interface represents a model element. It has a URI, unique within the IHawkModelResource; it has a type (an IHawkClassifier); it has methods for retrieval of a specific attribute or reference value as well as to indicate whether a structural feature (attribute/reference) is currently set for this element; it has a signature (proxy to its current state); the value of this proxy should be unique every time any model-level change happens to this object (such as an attribute value being edited or a reference added/removed – more information on this can be found in Section 4.3.5.2).

4.3.2.2. Graph Layer

Extensive benchmarking showed that graph databases such as Neo4J and OrientDB perform significantly better than other technologies (e.g. relational databases) [34, 41] for the types of queries that are of interest to a system like Hawk; model queries are commonly comparable to graph traversal operations, in contrast to being largely comprised of numeric comparisons or other operations which would be amenable to other forms of storage. To avoid coupling with a specific graph database, this layer (the IGraphDatabase API) aims at providing a uniform interface for querying and manipulating graph databases in an implementation-independent manner. For example, key methods allow retrieving the persisted object (graph node) with a specific id in the store, creating a new node in the store, linking two nodes with a relationship between them and creating and accessing database indexes (used to enhance the performance of certain classes of queries). It is worth noting that implementations of this layer can conceptually be used to connect to any back-end technology, but will suffer in performance if the data model is not similar to the graph model used here. This layer consists of the following interfaces (seen in Figure 4.5), with a brief usage sketch following their descriptions:


Figure 4.5.: The Hawk Graph Layer

• IGraphDatabase This interface represents the back-end persistence store used for Hawk. It offers methods for creating and getting nodes, relationships and database indexes.

• IGraphNode This interface represents a single graph node in an IGraphDatabase. It has a unique identifier and offers accessor methods for its properties and relationships (edges). Implementations should keep these objects as lightweight as possible, lazily loading any features, as large quantities of them are meant to be passed around during program execution.


• IGraphEdge This interface represents a single relationship (edge) between two IGraphNodes. It has a unique identifier and offers methods for getting the nodes it is connected to (referred to as the start node and the end node) as well as accessor methods for its properties.

• IGraphTransaction This interface represents a transaction operation on the database. If supported, it should ensure that the usual ACID properties are enforced and that no read/write operations can be performed on the store outside transactions. Implementations that do not natively support transactions should create pseudo-transactions (not using the native store’s API but a Java-based algorithm), as failure to encapsulate change in atomic events may cause irreversible inconsistency in the store. It is worth noting that various NoSQL stores (such as Neo4J) provide a batch mode in their API, whereby the database is rendered unavailable while some computationally demanding insert operation is taking place (in order to greatly speed up the execution time of this operation). In this mode, even though transactions are not supported, consistency is maintained as the database is only available before or after this mode is used; to a user it can thus be abstracted as a single “transaction” taking place (whereby no queries can be made while this “transaction” is being processed).

• IGraphNodeIndex This interface represents a database index consisting of a collec- tion of nodes. The primary use of such a database index would be to be able to retrieve a certain class of nodes (such as types or files) without having to navigate the entire store using a blind search. It offers accessor methods for indexed nodes as well as query methods for retrieval.

• IGraphChangeListener This interface allows the forwarding of any changes made to the contents of Hawk to anyone listening; for example, all the changes performed during a single synchronization process can be propagated to a client interested in them. This (simple yet effective) notification framework’s goals are twofold: firstly, it allows for implementation-specific handling of changes, allowing each listener the autonomy it may require; secondly, it does not add any inherent overhead to Hawk if no one is listening, as the update events will just be lost without having to be managed/stored in any way.
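A typical interaction with this layer might look as follows; the method names and signatures used here are simplified approximations of the real interfaces (detailed in Appendix A), shown only to illustrate how the abstractions fit together.

    // Illustrative use of the graph layer; method names are approximations.
    try (IGraphTransaction tx = db.beginTransaction()) {
        IGraphNode task = db.createNode();                 // new element node
        task.setProperty("executionTime", 700);

        // look up the file node through a database index, avoiding a blind search
        IGraphNodeIndex files = db.getOrCreateNodeIndex("files");
        IGraphNode file = files.get("path", "/loan.bpmn");

        db.createRelationship(task, file, "file");         // element -> file edge
        tx.success();                                      // commit atomically
    }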


4.3.3. Version Control Managers

In order for Hawk to obtain the models it needs to index, as well as updated model files every time a model is changed, it requires managers for the various version control systems it can monitor. As Hawk aims to be orthogonal to the versioning technologies currently used, it is not given any special privileges when communicating with the version control system (VCS) in which the original model files are stored. It is not able to configure or change the VCS in any way, or to have access privileges different from those of a user connecting to the VCS directly. As such, we assume that only the basic monitoring and fetching operations provided by file-based version control systems are available. As Hawk locally stores the current version of the files it is monitoring, it can retrieve only the changed files from the VCS and analyze them to propagate changes. Hawk will poll its registered VCS repositories for changes according to its current update strategy heuristic (the default settings, as well as the synchronization strategy Hawk uses, are found in Section 4.3.7) and can also be prompted to synchronize on demand through its interface.

4.3.3.1. SVN Manager

This component allows Apache Subversion (SVN) repositories to be monitored by Hawk. Every time Hawk polls an SVN manager, the relevant SVN operations are called to retrieve all files which have changed between the version currently stored in Hawk (possibly none, if the model is not yet in Hawk) and the current HEAD version on the SVN server. When the identifiers of these changed files are obtained, Hawk discards any files it does not need (non-model files or files of models Hawk cannot parse – this is achieved by comparing each file path to the accepted file extensions of each registered model resource factory) and passes the rest of them to their relevant model resource factories, to be converted to in-memory resources and updated in Hawk one at a time (to avoid using an unnecessarily large amount of memory).

4.3.3.2. LocalFolder Manager

A folder stored on the local disk of the Hawk server can be used as a bare-bones VCS, whereby a version of a file is defined by the numeric “last modified date” that all major operating systems expose, with the obvious limitation that only the latest version of a file is available for retrieval. As version control systems commonly have a top-level

repository version as well as individual file versions, we work under the assumption that the repository version (of a LocalFolder) changes if any of its contents change (any file has been added, deleted or edited). This manager is of use if Hawk runs on a local computer with models not stored in a version control system, and it also provides an easy way to test the other Hawk components (as manipulating files in a local folder requires very little effort).
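The change detection performed by such a manager can be sketched with the standard java.nio.file API; this is a minimal illustration of the last-modified-date idea, not Hawk’s actual driver (which, among other things, also detects deletions and tracks a repository-level version).

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.*;

    // Minimal last-modified-date change detection over a local folder.
    public class LocalFolderSketch {
        private final Map<Path, Long> lastSeen = new HashMap<>();

        public List<Path> changedFiles(Path root) throws IOException {
            final List<Path> changed = new ArrayList<>();
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path f, BasicFileAttributes attrs) {
                    long modified = attrs.lastModifiedTime().toMillis();
                    Long previous = lastSeen.put(f, modified);
                    if (previous == null || previous != modified) {
                        changed.add(f); // file is new or edited since the last poll
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
            return changed; // deletions would be found by a separate sweep of lastSeen
        }
    }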

4.3.3.3. Git Manager

As Git is a distributed version control system, it works under the assumption that each user is working with a clone of the repository available locally. As such, using a minor extension of the abovementioned LocalFolder driver, Hawk can monitor a local Git repository. This driver is optimized to ignore various paths in the repository (i.e. the .git folder) to avoid any unnecessary overhead when crawling the file structure to discover new/changed/deleted files.

4.3.3.4. Workspace Manager

Hawk supports indexing models located in a local Eclipse workspace. This driver is an extension of the LocalFolder driver (presented in Section 4.3.3.2) that can also use Eclipse’s notifications10 to inform Hawk whenever a model file has changed, instead of relying only on Hawk’s periodic updates (presented in Section 4.3.7). As such, not only is Hawk instantly notified whenever a model change has occurred (instead of waiting for its periodic update), but it also has the option of turning off its periodic updates entirely and relying only on these notifications, improving its efficiency.
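A sketch of hooking into Eclipse’s resource change notifications follows; the actual Hawk driver does more filtering, but the mechanism is this standard workspace API (assuming the code runs inside an Eclipse/OSGi environment, with the println standing in for notifying Hawk’s core).

    import org.eclipse.core.resources.*;
    import org.eclipse.core.runtime.CoreException;

    // Register for POST_CHANGE workspace events and visit the resource delta.
    IResourceChangeListener listener = event -> {
        try {
            event.getDelta().accept(delta -> {
                if (delta.getResource().getType() == IResource.FILE) {
                    System.out.println("Changed: " + delta.getFullPath());
                }
                return true; // continue visiting children of this delta
            });
        } catch (CoreException e) {
            e.printStackTrace();
        }
    };
    ResourcesPlugin.getWorkspace().addResourceChangeListener(
        listener, IResourceChangeEvent.POST_CHANGE);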

4.3.4. Metamodel/Model Resource Factories

Knowing which version control systems to monitor, Hawk now needs to know which metamodels/models it is interested in, so that it can index them. Resource factories are used to create in-memory representations (resources) of metamodels and models persisted on disk. The file(s) in question either have to be parsed into Hawk resources directly, or parsed into their native in-memory representations and then converted into Hawk resources. These files need to be loadable into memory in order to extract

10 http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Fguide%2FresAdv_events.htm

the necessary model information to be indexed, and it is assumed that each file can be loaded separately (either as an entire model or as a fragment of a larger model). Below we detail the various factories currently implemented.

4.3.4.1. EMF Resource Factories

These components allow for the parsing of EMF XMI-based metamodel and model files into in-memory Hawk resources.

Metamodel Resource Factory A metamodel in Hawk is defined as a uniquely identifiable collection of metaclasses. In the model layer of Hawk (Section 4.3.2.1) a metamodel is an IHawkPackage, which has a unique global namespace identifier and contains a collection of IHawkClasses. As such, any metamodel file added to Hawk needs to first be converted into this in-memory representation before it can be registered with Hawk; all IHawkPackages (and their contents) will be contained in an IHawkMetaModelResource, which will then be used by Hawk to register the metamodel. For EMF this conversion is relatively straightforward, as EMF already offers an in-memory metamodel resource representation containing EPackages with EClasses. As such, the implementation of this component comprises producing wrappers around the EMF resource and its contents and exposing the relevant information needed, all of which is already contained in the resource; a sketch of one such wrapper follows the list below. More specifically:

• EMFPackage (lightweight wrapper around EPackage) implements IHawkPackage – representing in-memory uniquely identifiable metamodels

• EMFClass (lightweight wrapper around EClass) implements IHawkClass – repre- senting in-memory classes

• EMFReference (lightweight wrapper around EReference) implements IHawkReference – representing in-memory references

• EMFAttribute (lightweight wrapper around EAttribute) implements IHawkAttribute – representing in-memory attributes

• EMFDataType (lightweight wrapper around EDataType) implements IHawkDataType – representing in-memory data types
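As an illustration of how thin these wrappers are, a sketch of EMFAttribute is shown below; the method names assumed on IHawkAttribute mirror the feature descriptions of Section 4.3.2.1 and may differ from the actual interface.

    import org.eclipse.emf.ecore.EAttribute;

    // Lightweight wrapper delegating every IHawkAttribute query to the
    // underlying EAttribute; assumed method names mirror Section 4.3.2.1.
    public class EMFAttribute implements IHawkAttribute {
        private final EAttribute attribute;

        public EMFAttribute(EAttribute attribute) {
            this.attribute = attribute;
        }

        @Override public String getName()    { return attribute.getName(); }
        @Override public boolean isDerived() { return attribute.isDerived(); }
        @Override public boolean isUnique()  { return attribute.isUnique(); }
        @Override public boolean isOrdered() { return attribute.isOrdered(); }
        @Override public boolean isMany()    { return attribute.isMany(); }
    }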


Model Resource Factory A model index in Hawk is defined as comprising information extracted from a collection of model elements, possibly originating in multiple files, that are defined by (instances of) a metamodel. A model is represented in memory through IHawkModelResources, created by parsing the file-based persistence and containing a collection of IHawkObjects. Each such object has a type (the IHawkClass it is an instance of) as well as values (possibly unset) for each feature (attribute/reference) of its type (and supertypes). For EMF this conversion is relatively straightforward, as EMF already offers an in-memory model resource representation containing EObjects with values for their relevant features. As such, the implementation of this component comprises producing wrappers around the EMF resource and its contents (the EObjects). It is worth noting that, since Hawk does not need any external information other than the XMI model file (and its respective metamodel – identified by its unique namespace – to be registered beforehand), it does not matter which modeling environment the models come from, as the resulting model files should be indistinguishable to Hawk.

BPMN Resource Factories A plugin allowing BPMN models created with the Eclipse BPMN Modeler Tool11 to be indexed in Hawk was created, using the abovementioned EMF resource factories and adding the relevant BPMN metamodels using the getStaticMetamodels() method provided by Hawk’s IMetaModelResourceFactory API (Figure 4.2).

4.3.4.2. Other Resource Factories

Two other resource factories were created for Hawk, in order to allow it to index technologies other than basic EMF. This work uses the aforementioned EMF factories as a blueprint and was done in collaboration with Dr. Seyyed Shah (research associate of the MONDO project) and Dr. Antonio Garcia Dominguez (also a research associate of MONDO). The drivers have since been kept up to date in order to maintain their architectural alignment with Hawk (as the tool evolved), as part of this work.

Modelio Resource Factories These factories are used to parse UML models created and maintained using the Modelio tool presented in Section 2.2.2.2. These factories

11 https://www.eclipse.org/bpmn2-modeler/

assume that only the freely available open-source modules of Modelio are used, and do not depend on any of the commercial modules.

Metamodel Resource Factory This factory is a minor extension of Hawk’s EMF metamodel resource factory, as Modelio uses EMF as its in-memory metamodel and model representation format. As Modelio only works on specific languages like UML and BPMN, it uses only specific metamodels (such as that of a specific version of UML – currently the UML 2.2 specification). Such metamodels are provided to Hawk through the getStaticMetamodels() method provided by Hawk’s IMetaModelResourceFactory API (Figure 4.2).

Model Resource Factory This factory is used to create IHawkModelResources from projects created by the Modelio tool. Such projects are similar in structure to the ones presented in the Modelio SVN-based repository, with folders for each UML type (and files inside them for each instance of this type in the model), as well as a variety of other metadata items Modelio needs for its various components. In order to parse a persisted Modelio UML model project into an in-memory resource, the Modelio tool parser is used. This parser requires that all files in the model project are available and consequently uses them to load the entire model into memory as an EMF resource. Hawk then uses this resource to create an IHawkModelResource, similarly to how the Hawk EMF model resource factory handles other EMF model resources. As Hawk usually assumes models are collections of independent files that can be loaded separately, while the Modelio parser requires all files to be loaded into memory before providing a model resource, each Modelio model project is assumed to be converted into a single (.zip) archive before being versioned in a VCS, to then be indexed by Hawk12. This approach to versioning non-file-based model persistence formats (such as database persistence, Modelio projects, or any other form of persistence which requires a specific file structure) is recommended by various tools such as Modelio and MagicDraw13, as it ensures that any version of the model can be read with no errors and is self-contained. On the other hand, this results in the whole model structure needing to be retrieved

12 in principle, a driver which loads each Modelio model file separately can be developed in order to overcome this limitation, as the various files are structured similarly to fragmented XMI EMF models discussed in Section 2.2.1.1
13 http://www.metacase.com/support/45/documentation/versioning.html

every time any change happens inside it; this is not detrimental to the performance of Modelio models, as the parser Modelio uses reads all its model fragments each time it loads a model into memory (similar to reading a single XMI file in EMF), so a lazy-loading approach is not possible out-of-the-box. Finally, due to various limitations in how Modelio treats its in-memory models, in order for the plugin to work we require that one of the Modelio plugins exports one of its packages (which it does not normally export). Specifically, we require that the plugin org.modelio.xmi exports the org.modelio.xmi.generation package (by adding “Export-Package: org.modelio.xmi.generation” to its manifest). It is worth noting that a Modelio plugin using the commercial version of Modelio would be able to leverage the native Teamwork Manager module Modelio offers (presented in Section 2.2.2.2) and its resulting repository. In such a case, creating a custom lazy-loading parser could be more performant, as it would be able to parse the model one fragment at a time and hence only load changed fragments upon model evolution.

IFC Resource Factories This factory is used to create IHawkModelResources from models using the IFC BIM metamodel. An EMF-based parser for IFC models is available from the OpenBIM14 project. OpenBIM is an open-source toolkit for developing IFC-based BIM models; such models represent “physical and functional characteristics of a facility”15.

Metamodel Resource Factory The metamodels required to load such IFC models are currently Ifc2x3tc1 and Ifc416, which represent the latest two versions of the EXPRESS schema used to define IFC models. These metamodels are provided to Hawk through the getStaticMetamodels() method provided by Hawk’s IMetaModelResourceFactory API (Figure 4.2).

Model Resource Factory IFC models are stored in either an XML (ISO10303-28 XML representation of EXPRESS schemas and data) or a STEP (ISO10303-21 physical file structure) format. The OpenBIM parser can use either of these formats to create an in-memory EMF model resource out of an IFC model file. Hawk then uses this resource

14 http://www.buildingsmart-tech.org/
15 http://www.buildingsmart.org/standards/technical-vision/
16 http://www.buildingsmart-tech.org/specifications/ifc-releases/summary

to create an IHawkModelResource, similarly to how the Hawk EMF model resource factory handles other EMF model resources.

4.3.5. Metamodel/Model Updaters

Now that Hawk knows where to find file changes and can convert relevant model files into in-memory resources, it needs to be able to efficiently update its model index every time a file is changed (added/removed/edited). In order to achieve this, it uses updaters to propagate changes and achieve synchronization.

4.3.5.1. Metamodel Updater

Every time a new metamodel file (or collection of metamodel files) is added to Hawk, they are first converted into resources through their appropriate factory, and these resources are then processed by Hawk to be registered in the model index. Every IHawkPackage in the resource(s) is a unique metamodel in Hawk, identified by its unique namespace URI and containing its IHawkClasses. Hawk’s graph layer (Section 4.3.2.2) is used to insert this into Hawk’s back-end, which then exposes its knowledge of the new metamodel. Before this operation is finalized, a consistency check is performed to ensure that the metamodel insertion is self-contained. This check succeeds only if no IHawkClass in any IHawkPackage in the current metamodel insertion has any proxy dependencies. A proxy dependency in this context is any reference the IHawkClass may have to another file (such as a superclass/attribute/reference/datatype being only a proxy to another file) which has not already been inserted into Hawk. This ensures that any metamodel insertion is complete, avoiding partial insertion of metamodels, which would potentially cause failures when attempting to insert models conforming to these (incomplete) metamodels. For example, if a collection of interconnected metamodels is spread over multiple files, all of the files will have to be inserted into Hawk simultaneously, and not one at a time, in order to allow Hawk to ensure a complete metamodel exists in its model index. This strategy is in contrast to model insertion and updating, whereby references to proxy elements are not an issue (as nothing will directly depend on the models) and can be maintained on the fly as files are added/removed/updated for the models. An important factor to consider is that metamodels are very commonly of negligible size


(when compared to the size of the models this tool aims at handling), hence the possible performance loss of this extra constraint is seen as very minor when compared to the benefits it provides.
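The self-containedness check can be sketched as follows; the isProxy() and getter methods used here are illustrative assumptions about the model-layer interfaces rather than their exact API.

    // Returns true only if no class in the inserted packages depends on an
    // element that is still a proxy (i.e. resolvable only through a file not
    // yet in Hawk); method names are illustrative assumptions.
    boolean isSelfContained(Collection<IHawkPackage> packages) {
        for (IHawkPackage pkg : packages) {
            for (IHawkClass cls : pkg.getClasses()) {
                for (IHawkClass superType : cls.getSuperTypes()) {
                    if (superType.isProxy()) return false;
                }
                for (IHawkAttribute attr : cls.getAllAttributes()) {
                    if (attr.getType().isProxy()) return false;
                }
                for (IHawkReference ref : cls.getAllReferences()) {
                    if (ref.getType().isProxy()) return false;
                }
            }
        }
        return true; // safe to register; otherwise reject the whole insertion
    }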

4.3.5.2. Model Updater

When using the default model updater17, Hawk performs Algorithm 1 every time it finds a changed (added, removed, updated) model file in any of its monitored repositories. This algorithm inserts new models into the index and uses a threshold value (defaulting to accepting up to 50% model element changes – additions/removals/updates) to determine whether it should use a naive or an incremental strategy for updating currently indexed models to their new versions. This threshold aims at balancing the memory overhead of an incremental update (on a very large number of changes) against the execution time overhead of a batch update (on a very small number of changes); hence a developer aiming to adjust it should raise the value if the memory use of Hawk is less important than the execution time of an update, and vice versa. As demonstrated in Section 5.3.2, using this incremental updating when a model file is already indexed often provides a large performance gain when compared to naively deleting and re-indexing it every time it is modified.

Insertion For the insertion of a model file into Hawk, the process outlined in Algorithm 2 is followed. In this process, the elements of the model file are first loaded into memory as a model resource. Then, for each such element, a node is created in the model index graph with its attributes as properties, and linked (using relationships) to its file and type/supertypes. Finally, for each element, its references are used to link the node with other nodes in the graph. As this process often requires intense resource consumption, the batch mode of Hawk’s back-end is used (if the specific back-end supports it). This mode makes the database unavailable until the process is completed, but using Hawk’s Neo4J driver it offers around an order of magnitude better performance in terms of execution time when compared to on-line (transactional) updating; using the OrientDB driver, it is around two times as fast.

17 as Hawk’s architecture allows for pluggable updaters to be used, this discussion is based on using the one constructed in the context of this work
18 as an incremental update of a large number of elements can potentially be resource consuming, a threshold can be provided to direct any update with too many changes to use naive insertion instead, in order to lower the required memory and potentially improve its execution time


Algorithm 1: Hawk update overview

     1 if model file already indexed then
     2     if change of type added/updated then
     3         if model change size less than threshold18 then
     4             incremental update
     5         else
     6             naive update
     7         end
     8     else if change of type removed then
     9         delete indexed elements of file, keeping cross-file reference proxies
    10     end
    11 else
    12     if change of type added/updated then
    13         insertion (indexing of model elements)
    14     end
    15 end

Naive Updating A naive update strategy for Hawk simply removes the currently indexed version of the model and then re-inserts the model from scratch. While this approach can be ineffective, especially when dealing with very small numbers of changes (and if the back-end is not optimized for large numbers of element deletions), it avoids the overhead needed to perform an incremental update and is thus suitable when a large number of changes is found in the model.

Signature Calculation In order to update a model, an efficient way to determine whether a model element has changed is needed. A signature of a model element is a lightweight proxy for its current state. In order to calculate a meaningful signature for model elements indexed in Hawk (so as to enable support for incremental updates of the model index as the models in it evolve), every mutable feature of the element needs to be accounted for. As such, the following features, listed after Algorithm 2 below, are used to calculate the signature of each element:


Algorithm 2: Insertion algorithm

     1 use relevant factory to parse the file into a model resource
     2 foreach element in the model resource do
     3     create model element node in graph and set its attributes
     4     create signature attribute in node
     5     create a relationship from this node to its file node
     6     create a relationship from this node to its type node (and
           relationships to all its supertype nodes)
     7 end
     8 foreach element in the model resource do
     9     foreach reference in the references of the element do
    10         if reference of element is set then
    11             foreach referenced element do
    12                 if referenced element is not a proxy then
    13                     create relationship from this node to the node of
                           the referenced element
    14                 else
    15                     add new proxy reference
    16                 end
    17             end
    18         end
    19     end
    20 end


• all of the names and values of its attributes

• all of the names and the IDs (of the target elements) of its references

This works under the assumption that model elements cannot be re-typed during model evolution, which is the case for popular modeling technologies such as EMF, as well as that model elements have immutable and unique IDs.


A signature can be represented either as a String containing the concatenation of the values listed above, or as a message digest (also known as a hash)19 of this String. The String representation ensures that a unique signature exists for any model state, but suffers in terms of comparison performance, as potentially very long Strings have to be compared. On the other hand, a digest representation allows for rapid comparisons but has a (albeit small) chance of clashes, which would show different model states as having the same signature. We decided to use a SHA-120 digest representation as this identifier, to allow rapid comparisons. This signature is used to efficiently find changes in model elements, as detailed below.
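The computation described above can be sketched as follows; the accessor methods on the element and its features are illustrative stand-ins for the IHawkObject API, while the digesting itself uses the standard java.security.MessageDigest class.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Concatenate attribute names/values and reference names/target ids,
    // then digest with SHA-1 (20 bytes); accessors are illustrative stand-ins.
    static byte[] signature(IHawkObject element) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (IHawkAttributeValue a : element.getAttributeValues()) {
            sb.append(a.getName()).append('=').append(a.getValue()).append(';');
        }
        for (IHawkReferenceValue r : element.getReferenceValues()) {
            for (String targetId : r.getTargetIds()) {
                sb.append(r.getName()).append("->").append(targetId).append(';');
            }
        }
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        return sha1.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
    }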

Incremental Updating For incremental updating of a model file in Hawk, the process outlined in Algorithm 3 is followed. In this process, the signatures of the elements are used to efficiently determine which elements have changed. Then, for each new element a node is created; for each changed element its properties and relevant references are updated (keeping dangling cross-file references as proxies in Hawk, for consistency); and for each removed element its node is deleted. The complexity of this algorithm is O(m + n + c × avgR), where m is the number of model elements in the updated model file, n is the number of model nodes in the model index linked to the (previous version of the) updated file, c is the number of changed elements and avgR is the average number of target elements referenced by the changed elements. This process only alters the part of the model index which has actually changed and, as such, does not need to use more resources than required by the magnitude of the change, potentially saving on memory and execution time.

Incremental Update Example In order to clearly demonstrate how incremental up- dating works, a simple example is presented here using the BPMN loan running example. In this scenario, assume that the following changes have been made to the original BPMN loan model in Figure 2.7 (the new model is seen in Figure 4.6):

1. The attribute name of Task with id t2 is changed from “Check Applicant Info” to “Verify Applicant Information Authenticity”

2. A new Task with name “Notify Authorities of Fraud” is introduced

19 one-way hash function that takes arbitrary-sized data and outputs a fixed-length hash value
20 a cryptographic hash function designed by the United States National Security Agency; it produces a 20-byte value


Algorithm 3: Incremental update algorithm

     1 let nodes be the set containing the ids and pointers to all the nodes
       (in the model index – linked with the updated file)
     2 let signatures be the set containing the ids and signatures of the nodes
     3 let delta be the set containing changed elements
     4 let added be the set containing new elements (to be added to the model index)
     5 let unchanged be the set containing elements which are the same
     6 foreach node from all nodes in the model index linked with the updated file do
     7     add node to nodes
     8     add signature of node to signatures
     9 end
    10 foreach element in elements of model resource do
    11     if element id exists in signatures then
    12         if element signature not equal to current signature then
    13             add element to delta
    14         else
    15             add element to unchanged
    16         end
    17     else
    18         add element to added
    19     end
    20 end
       /* add new nodes to model index */
    21 foreach element in added do
    22     add this new element in model file to model index
    23 end
       /* delete obsolete nodes and change node attributes */
    24 foreach node in nodes do
    25     if node id exists in delta then
    26         retrieve element in delta represented by the node
    27         foreach model attribute in element do
    28             if attribute value is different to node property value then
    29                 update property value
    30             end
    31         end
    32     else if node id does not exist in unchanged then
    33         de-reference node (keeping dangling incoming cross-file reference proxies)
    34         delete node
    35     end
    36 end


Algorithm 3 (cont.): Incremental update algorithm
/* change altered references */
37 foreach element in delta do
38   foreach reference in references of element do
39     if reference is set then
40       foreach referenced element in referenced elements of reference do
41         if referenced element is not proxy then
42           add id of referenced element to targetIds
43         else
44           add new proxy reference to model index
45         end
46       end
47       foreach relationship in relationships of node linked with the element do
48         if relationship target has id which exists in targetIds then
49           remove target from targetIds
50         else
51           delete relationship as new model does not have it
52         end
53       end
54       foreach id in targetIds do
55         add new relationship to model index
56       end
57     else
58       foreach relationship in relationships of node, with the same name as the reference name do
59         delete this relationship
60       end
61     end
62   end
63 end

3. The reject branch of the Gateway with name “Result of Verification” now points to the new Task with name “Notify Authorities of Fraud”. This is achieved by deleting the old SequenceFlow it had to the EndEvent and introducing a new one to this new Task.

4. A new SequenceFlow is introduced from the new Task with name "Notify Authorities of Fraud" to the EndEvent of the model.

Figure 4.6.: Running example – altered version of the BPMN Loan model

In order to synchronize Hawk with the new version of the model the following would occur (all line references are to the incremental update Algorithm 3):

• Line 7 – nodes would include the nodes representing the old version of the model indexed in Hawk

• Line 13 – delta would include the Task whose name was changed from “Check Applicant Info” to “Verify Applicant Information Authenticity”, as its signature has changed. It would also include the Gateway named “Result of Verification” as well as the EndEvent named “End Loan Request” as in both cases their references have changed and hence so has their signature.

• Line 15 – unchanged would include the remaining model elements that are found in both versions of the model, as their features are the same and hence so is their signature.

• Line 18 – added would include the new Task named "Notify Authorities of Fraud" as well as the two new SequenceFlows introduced, one between the Gateway named "Result of Verification" and the new Task, and one between the new Task and the EndEvent.

• Line 22 – the three aforementioned added nodes would be created

• Line 29 – the property “name” of the Task in delta will be updated to its new value

• Line 34 – the SequenceFlow which used to join the Gateway named “Result of Verification” and the EndEvent named “End Loan Request” is deleted as it is not found in the new model

• Line 51 – the reference outFlows of the Gateway named "Result of Verification" is updated to not include the deleted SequenceFlow. Furthermore, the reference inFlows of the EndEvent will be updated to not include the deleted SequenceFlow.

• Line 55 – the reference outFlows of the Gateway named "Result of Verification" is updated to include the new SequenceFlow between the Gateway and the new Task. Furthermore, the reference inFlows of the EndEvent will be updated to include the new SequenceFlow between the new Task and the EndEvent.

Proxy Reference Resolution A reference to a model element located in another file (or location) is referred to as a proxy reference, whereby only the location of the target object(s) is recorded as opposed to the objects themselves. As Hawk indexes models that can be potentially fragmented (i.e. that use proxy cross-file references), it uses the following procedure to manage any cross-file references (a sketch of such a proxy record is given after the list below):

• Proxy reference addition. Whenever a model element (either during insertion or update) has a proxy reference, this information is stored in a proxy reference database index. This reference contains its name and the location (file path and identifier) of the model element(s) it is referencing.

• Proxy reference resolution. During Hawk’s synchronization procedure, after each changed model file is handled, any resolvable proxy references are resolved. A proxy is resolvable if its target element is found in Hawk.

• Proxy reference update. During each update of a model element, any previous proxies it had are purged, so that only proxies of the latest version of this element are recorded.


• Reference proxification. Any time an element of Hawk is deleted, any incoming references to that element from elements originating in a file different from the file the element itself originated from are proxified21. Instead of deleting these references and (incorrectly) losing all information about them, Hawk keeps them as proxies, using the method described above. As such, consistency is maintained should the deleted element be re-inserted into Hawk at a later time.
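The following is a minimal sketch of what an entry in the proxy reference database index could look like; the class and field names are illustrative assumptions based on the description above, not Hawk's actual classes:

// Illustrative record of an unresolved cross-file reference, as kept in the
// proxy reference database index (names are assumptions, not Hawk's API).
public final class ProxyReference {
    final String sourceNodeId;   // node holding the dangling reference
    final String referenceName;  // e.g. "outFlows"
    final String targetFile;     // file path of the referenced element
    final String targetFragment; // identifier of the referenced element

    public ProxyReference(String sourceNodeId, String referenceName,
            String targetFile, String targetFragment) {
        this.sourceNodeId = sourceNodeId;
        this.referenceName = referenceName;
        this.targetFile = targetFile;
        this.targetFragment = targetFragment;
    }

    // A proxy is resolvable once its target element exists in the index;
    // resolution then replaces the proxy with a real relationship.
    public boolean isResolvable(java.util.Set<String> indexedElementLocations) {
        return indexedElementLocations.contains(targetFile + "#" + targetFragment);
    }
}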

4.3.6. Querying Hawk

As Hawk now contains an up-to-date global index of all relevant models in its monitored VCSs, for it to be of practical value, Hawk needs to be able to provide correct and efficient responses to queries made on its model indexes. There are two principal ways of querying a model index:

4.3.6.1. Native Querying

The most straightforward, and arguably the most performant, way of querying a model index is to use the native API of its persistence back-end, or a data-level query language (such as SQL if a relational database is used, or Java/Cypher if a Neo4J NoSQL database is used). Nevertheless, this approach also demonstrates certain shortcomings which should be considered:

• Query Conciseness Native queries can be particularly verbose and, consequently, difficult to write, understand and maintain. Examples of such queries can be found in Section 5.2.1.1; an illustrative sketch is also given after this list.

• Query Abstraction Level Native queries are bound to the specific technology used; they have to be engineered for that technology and cannot be used for a different back-end without substantial alteration in most cases.
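To illustrate the conciseness concern, the sketch below shows how a query such as "all Tasks with an execution time of over an hour" might be written directly against an embedded Neo4J back-end. The relationship and property names follow the model index structure described earlier, but the code is an illustrative assumption rather than Hawk's own; the actual native queries used are presented in Section 5.2.1.1.

import java.util.ArrayList;
import java.util.List;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

public class NativeQueryExample {
    // Finds all Task instances with executionTime > 3600, starting from the
    // node representing the Task type and following incoming ofType edges.
    public static List<Node> slowTasks(GraphDatabaseService db, Node taskTypeNode) {
        List<Node> result = new ArrayList<>();
        try (Transaction tx = db.beginTx()) {
            for (Relationship r : taskTypeNode.getRelationships(Direction.INCOMING)) {
                if (!"ofType".equals(r.getType().name()))
                    continue; // skip any other incoming relationships
                Node task = r.getStartNode();
                Object time = task.getProperty("executionTime", null);
                if (time instanceof Number && ((Number) time).intValue() > 3600)
                    result.add(task);
            }
            tx.success();
        }
        return result;
    }
}

The equivalent back-end independent EOL query is a single expression: Task.all.select(t | t.executionTime > 3600).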

4.3.6.2. Back-end Independent Navigation and Querying

An alternative way to access and query models is through higher-level query languages that are independent of the persistence mechanism. Examples of such languages include

21 a new proxy reference is added for each such element and stored in the proxy reference database index (as mentioned above in proxy reference addition).

the Object Constraint Language (OCL), the Epsilon Object Language (EOL – from the Epsilon [68] platform) and the Atlas Transformation Language (ATL), which abstract over concrete model representation and persistence technologies using intermediate layers such as the OCL pivot metamodel [69] and the Epsilon Model Connectivity [70] layer. In terms of execution, queries expressed in such high-level languages can be executed on an in-memory representation of the model, or translated into queries expressed in persistence-level query languages such as SQL and XQuery22, at compile-time or at run-time. Full translation is only feasible in cases where the high-level and the lower-level query languages are isomorphic in terms of capabilities. This is not always the case: for example, EOL supports dynamic dispatch, which is not supported in SQL. Even when full compile-time translation is not feasible, partial translation at run-time has been shown to deliver significant performance improvements, as seen in [71].

EOL Query Engine – Epsilon and the Epsilon Model Connectivity Layer (EMC) The Epsilon platform [68] is an extensible family of languages for common model management tasks and includes tailored languages for tasks such as model-to-text transformation (EGL), model-to-model transformation (ETL), model refactoring (EWL), comparison (ECL), validation (EVL), migration (Flock), merging (EML) and pattern matching (EPL). All task-specific languages in Epsilon build on top of a core expression language – the Epsilon Object Language (EOL) – to eliminate duplication and enhance consistency. As Epsilon provides a back-end independent language (EOL), we decided to use it in this work. As seen in Figure 4.7, EOL – and as such all languages that build on top of it – is not bound to a particular metamodeling architecture or model persistence technology. Instead, an intermediate layer – the Epsilon Model Connectivity layer – was introduced to allow for seamless integration of any modeling back-end. Epsilon uses a driver-based approach for its EMC layer, where integration with a particular modeling technology is achieved by implementing a driver that conforms to a Java interface (IModel) provided by EMC. For a more detailed discussion on EMC and the IModel interface, the reader can refer to Chapter 3 of [70]. Hawk provides an EMC driver (an implementation of the IModel interface) to allow Epsilon to use it as a model provider (and consequently query it using EOL expressions). A simplified class diagram of the relevant classes in Epsilon and Hawk is seen in Figure 4.8.

22 http://journal.ub.tu-berlin.de/eceasst/article/viewFile/108/103



Figure 4.7.: The Epsilon Model Connectivity Layer

Below we detail the additions made to this driver in order to incorporate the use of custom database indexes for improving query performance.

IModel interface method implementations In order to use Epsilon's EOL to query model indexes stored in Hawk, an implementation of the IModel interface is required. In Table 4.2 we present a description of various methods of interest in the IModel interface and a summary of their implementation details in Hawk. Note that any model element loaded into memory is of Java class GraphNodeWrapper. This is a lightweight object which contains only the location of the relevant model element in the store (its 'id' value, for example, in a Neo4J NoSQL Graph database) as well as a reference to the Epsilon model it is part of; this object can be used to load the element's attributes and relationships on demand.
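The sketch below illustrates the shape of such a wrapper; beyond the class name and the getPropertyGetter() method listed in Table 4.2, the fields and method bodies are assumptions made for illustration:

import org.eclipse.epsilon.eol.models.IModel;

// Minimal sketch of the lightweight element handle described above; only the
// store location and owning model are held in memory.
public class GraphNodeWrapper {
    private final Object id;      // e.g. the node id in a Neo4J back-end
    private final IModel model;   // the Epsilon model (Hawk's EMC driver) it belongs to

    public GraphNodeWrapper(Object id, IModel model) {
        this.id = id;
        this.model = model;
    }

    public Object getId() {
        return id;
    }

    // Attributes are not cached; they are fetched from the back-end on demand
    // through the property getter of the owning model (see Table 4.2).
    public Object getProperty(String name) throws Exception {
        return model.getPropertyGetter().invoke(this, name);
    }
}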

Reverse reference navigation In the spirit of EMF’s eContainer() method which allows an EObject to get access to its container object, Hawk provides a mechanism for reverse- navigating a containment reference in order to access the container. This feature is embedded into the parser by means of prefixing the relevant reference with “revRefNav ”.


Table 4.2.: Interesting methods in the IModel interface

allContents() – returns Collection. Returns a collection containing all of the nodes contained in the model index in the form of GraphNodeWrappers.

hasType(String type) – returns boolean. Returns whether the type type exists in the model index by trying to find it through the Metamodel (database) index of the store.

getAllOfType(String type) – returns Collection. Returns a collection containing all of the objects of type type in the model index by first invoking hasType(type) and, if successful, finding the type using the Metamodel (database) index and then creating a collection of GraphNodeWrappers containing every element which has an ofType relationship to type.

getTypeOf(Object instance) – returns Object. Returns the type node of the element instance in the model index by directly accessing the node instance (as this method is always passed a GraphNodeWrapper as the instance) and navigating its ofType relationship to get the type node. The returned object is a GraphNodeWrapper.

isOfType(Object instance, String type) – returns boolean. Returns whether the node instance in this model is of type type by first invoking hasType(type) and, if successful, invoking getTypeOf(instance) and performing a String comparison on the resulting names. The type can be either fully qualified, including the unique identifier of the metamodel it is contained in, or only the type name.

getPropertyGetter() – returns IPropertyGetter. Returns the GraphPropertyGetter associated with the Hawk repository. The GraphPropertyGetter uses Hawk's IGraphDatabase API to retrieve properties from objects in Hawk.


Figure 4.8.: Important Hawk EMC layer classes and their EOL parents

For example, say one has an object 'A' with a containment reference called 'contain' to an object 'B'. Then, by typing "B.revRefNav contain" in EOL, we get object 'A' as a result. To increase the usability of this feature, the keyword "eContainer" is also reserved, so that references of this name return the container object as well (if one exists).

Scoped Queries Hawk supports the use of scoped queries with respect to the file of origin of the model elements involved. As discussed in Appendix C and demonstrated in Figure C.9, should the modeler want to limit the query to elements found in a specific file or set of files, this can be done by providing a relevant file inclusion pattern to Hawk. This allows Hawk to limit its search to elements contained in only a specific subset of its contents. The syntax used is that of a regular expression inclusion pattern for the files the developer wishes to include (or empty for performing a global query). This allows queries to be performed as if the rest of the contents of

the model index was not present, for example to obtain results from:

• A specific or a set of specific files, say “model1.xmi, model2.xmi”.

• One type of file, such as “*.xmi” or “class diagram *.uml”.

• A specific directory in the monitored version control system, say "/london_bridge/schematics/*.xmi".

This provides flexibility in the way we can query Hawk, exposing the file path itself as a way to manipulate the retrieval of information from Hawk. Similarly, repository locations can also be scoped in the same way as presented above, allowing for results originating in one or more repositories to be returned.

It is worth noting that for EOL, Hawk offers two alternative approaches to scoping queries. The first limits the initial search terms of the query to contain only elements originating in repositories/files within the defined scope; this does not necessarily limit the traversal paths the query takes to get to the results. More specifically, for the EOLQueryEngine implementation, the filtering to scoped elements is performed for each global API call in IModel (such as allContents(), getAllOfType(...) or getAllOfKind(...)); as EOL allows for various logical operations to be performed (such as collect – which may result in navigating to elements originating in other files) as well as permitting imperative code to be run, this approach does not guarantee that the query will only traverse scoped elements during its execution. This approach allows a modeler to query Hawk scoping only the initial terms of the query, with the overhead and its subsequent impact on query performance being very small, as the scoping is only done initially (and not at every single element or feature access).

The second approach limits the entire traversal path of the query to the scope provided. This introduces the overhead of having to check that every feature call to a target model element is restricted to the scope, but guarantees that the query is truly scoped within the parameters provided from start to finish.

For example, assume that instead of being in a single file, the BPMN loan example is split into two files, as shown in Figure 4.9 (the dotted lines represent a proxy reference to a model element in another file – in this case to the three SequenceFlow model elements in file "loan_end.xmi"). Here, running an EOL query for finding out how many outgoing SequenceFlows there are in the BaseElements of the loan model:


Figure 4.9.: Fragmented BPMN loan model

return BaseElement.all.outFlows.flatten.size();

Scoped under file "loan_start.xmi", this query will produce different results depending on whether scoped querying is set to initial term or to full traversal scoping23 (returning 8 for initial term and 5 for full traversal, as detailed below):

• Whether full traversal scoping is enabled or not, the initial terms of the query (BaseElement.all) will match the 6 BaseElements in file "loan_start.xmi" (labeled BE1–BE6 in Figure 4.9).

• If full traversal scoping is not enabled, the next part of the query (.outFlows) will match all target elements of the .outFlows reference of the abovementioned BaseElements (i.e. the SequenceFlows labeled SF1–SF8 in Figure 4.9). As such the query will return 8 (after flattening and counting the results).

• If full traversal scoping is enabled, the next part of the query (.outFlows) will not match the SequenceFlows labeled SF4, SF6 and SF8, as they are proxies to model elements in file "loan_end.xmi" and hence are outside the scope provided. As such the query will return 5 (after flattening and counting the results).

23 initial term scoping will only limit model elements found in the scoped files for the first collection call (.all / .allOfType / .allOfKind) of the query; full traversal scoping will only include model elements in the scoped files regardless of when they are accessed in the query (omitting any other model element).
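Programmatically, the scoped variant of this query could be submitted roughly as follows; the context key names and the query method signature here are assumptions for illustration (Appendix C shows the equivalent GUI options):

import java.util.HashMap;
import java.util.Map;

public class ScopedQueryExample {
    // indexer and queryEngine stand for an initialized Hawk model indexer and
    // its EOL query engine (interface names are assumed for illustration).
    public static Object countOutFlows(IModelIndexer indexer, IQueryEngine queryEngine)
            throws Exception {
        Map<String, Object> context = new HashMap<>();
        context.put("FILE", "*loan_start.xmi");           // file inclusion pattern
        context.put("ENABLE_TRAVERSAL_SCOPING", "true");  // full traversal scoping on
        // Returns 5 with full traversal scoping; 8 with initial term scoping only.
        return queryEngine.query(indexer,
                "return BaseElement.all.outFlows.flatten.size();", context);
    }
}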

4.3.7. System Lifecycle

For Hawk to be able to operate over a continuous period of time, its lifecycle needs to be managed. Hawk provides the following functionality during its normal execution:

• Initialization and Addition/Removal of Model Indexes.
– Upon initialization of Hawk, any previously created (saved) model indexes are loaded and synchronized with any changes made to their relevant repositories.
– A model index can be added at any time, and linked to a set of version control systems. This will create the model index and populate it with any relevant model files found in the VCS(s) linked to it.
– A model index can be removed; there are two types of removal: temporary and permanent. Temporary removal only disables updates to the model index from the VCS (equivalent to disconnecting a folder from a VCS) and can be used if the model index may be of use again in the future. A permanent deletion removes the model index itself from the hard disk as well as any metadata kept about it.

• Scheduling / Maintenance operations.
– Hawk will periodically synchronize any model indexes with their VCS(s). Timers are used to initiate synchronization requests for each model index and the algorithm used to determine the frequency of these updates is configurable. The default uses the following algorithm (time is in seconds):

if (changes == false) {
    if (timeToUpdate < 512)
        timeToUpdate = timeToUpdate × 2
} else {
    timeToUpdate = 1
}

Where changes is a boolean returned by the synchronize method informing the server if there are any changes to the versioned files with respect to the versions in the model index. As shown, the time between checks increases as long as there are no changes and resets to 1 when there is a change.

– If changes are found in files of interest (i.e. models Hawk can parse), the contents of these files are retrieved and parsed, and the model index is updated accordingly. Further details of the different events that can occur during an update are provided below.

– If an update results in a model being inconsistent (for example by having references to unresolvable proxies introduced to it due to the deletion of a model fragment file in the VCS), the model index is updated, keeping any unresolved proxies flagged and attempting to resolve them in future updates.

• Other runtime operations.

– Metamodels can be added to a Hawk model index, which will allow more types of models to be indexed by Hawk. Metamodels are provided as one or more files.

– Derived attributes can be added to a Hawk model index and any existing attribute can be marked as (database) indexed for enhancing query performance on operations needing to traverse them.

– Any running Hawk model index can be queried. Depending on the query engine used, the results may be returned to the engine as objects the engine itself understands, or simply as console output.

• Shutdown operations.

– Under normal shutdown, Hawk finishes any currently running tasks (such as pending updates), saves the meta information of the model indexes currently running (such as the paths to their relevant VCS), and then terminates.

– The responsibility for coping with abnormal shutdowns (for example power failure in the middle of an update in one of the model indexes in the indexer) is delegated to the back-end used to persist the model index (such as the NoSQL database). The indexer itself will not lose any critical information in case of an abnormal shutdown as it does not keep any volatile critical data in memory; any actions being performed during the failure will be re-initialized upon the next startup of the system (as the model index will not be updated so will provide the same metadata as previously to the indexer).

4.3.7.1. Synchronization Procedure

If a set of files has changed in a VCS linked to a model index in Hawk (updated/added/removed/renamed/moved), Hawk will perform Algorithm 4 in order to ensure synchronization, where:

• existsInteresting(): Returns true if at least one of the changed files is interesting to Hawk. Being interested in a file denotes that this file should be parsed and indexed.

• modelFiles(): This method returns a set of file-revision tuples of model files from the changed file set.

• added(X): Returns true iff file X was added to the VCS.

• removed(X): Returns true iff file X was removed from the VCS.

• updated(X): Returns true iff file X was updated in the VCS.

• renamed(X): Returns true iff file X was renamed in the VCS.

• moved(X): Returns true iff file X was moved in the VCS from one folder to another.

• resolveModelProxies(): This method resolves any unresolved cross-model references that have been kept as proxies when parsing the files (as only one file is open at a time for performance reasons).

• scheduleNextUpdate(): Schedules the next update for this repository, based on the heuristic discussed above.

Relevant addition/update and deletion procedures mentioned in the algorithm, as well as metamodel-level operations, are discussed below:

Metamodel addition The procedure for parsing a metamodel file into a resource is dependent on the file type and is described by its relevant driver. An example using XMI as input is seen in Appendix C. As mentioned above, regardless of input, an IHawkMetaModelResource is created from the file and is ready to be parsed for database insertion.


Algorithm 4: Synchronization procedure

1 if existsInteresting() then

2 for modelFile m : modelFiles() do

3 if added(m) then

4 Parse the model file into a resource. Insert it into the index (see Section 4.3.5.2).

5 else if removed(m) then

6 Delete the model from the index.

7 else if updated(m) then

8 Update the model in the index (refer to Section 4.3.5.2).

9 else if ( renamed(m) or moved(m) ) then

10 Update the model in the index (see Section 4.3.5.2).

11 end

12 resolveModelProxies()

13 end

14 end

15 scheduleNextUpdate()

Similarly, creation of the database-specific persistence varies with the back-end used (described by the relevant IGraphDatabase driver). Examples using Neo4J and OrientDB as stores can be found in [34].

Metamodel removal Metamodels can be removed at any time (identified using their unique namespace URI); doing so will also remove from Hawk any metamodels depending on them, as well as any models conforming to these metamodels (or to any metamodels depending on the removed ones). When this operation is used, any repositories containing models affected by this removal are flagged in Hawk so that they are not ignored should a future insertion of the same metamodel occur (whilst the repository itself has not changed).


Metamodel evolution Hawk works under the assumption that metamodels are not meant to change often and hence does not support advanced metamodel evolution procedures. If a new version of a metamodel needs to be used, it can either be inserted with its own unique namespace, in parallel to the old version, so that models conforming to the old version can be maintained alongside ones conforming to the new version; or the old metamodel can be removed (and consequently any models conforming to it) and the new metamodel can then be inserted as normal.

Model file addition The procedure for parsing a model file into a resource is dependent on the file type and is described by its relevant driver (Section 4.3.4). Should the metamodel(s) of this model not exist in the store, the model insertion is aborted, pending a future insertion of its relevant metamodel(s); when the relevant metamodel is inserted the model will also be inserted in the next synchronization cycle of Hawk.

Model file update If a model file is updated to a new version, an IHawkModelResource is created from this file (through the relevant driver) and any changes (deltas) between it and the current version in the database are found and propagated to the store; the new version is then recorded. Details of this procedure can be found in Section 4.3.5.2.

Model file rename/move If a model file is renamed or moved in the version control system, it is considered as removed and re-added, as all complex version control operations are broken down to a sequence of simple operations (add/update/remove). The reason for choosing this "simplified" approach is the inherent complexity of move operations on possibly inter-connected model fragments, as cross-file references will be affected (broken if all relevant fragments are not also updated after the move). For example, if file "a.xmi" is a model fragment referencing an element in file "b.xmi" and file "b.xmi" is renamed to "c.xmi", then (as file "a.xmi" has not been altered) the reference is broken as it points to a non-existing file ("b.xmi") and hence needs to be removed from Hawk.

The above procedure ensures that any changes to files are propagated to the model index, that each file is only read once, and that never more than one file is in memory at a time. This solution is seen as a performant and low resource consuming option for synchronization.


4.3.8. Advanced Features and Optimizations

After the creation of the core implementation for storing models in Hawk, presented in this chapter, further functionality was added to improve both the capabilities and performance of Hawk.

4.3.8.1. Derived Attributes

Regardless of the use of native or back-end independent querying, in order to respond to a model validation query requesting whether all diverging parallel gateways have more than one outgoing sequence flow and exactly one incoming sequence flow and that all converging parallel gateways have more than one incoming sequence flow and exactly one outgoing sequence flow, the following steps would have to occur:

1. The starting point of the query would have to be found. In this case, the collection of all instances of ParallelGateway in the model would have to be retrieved.

2. For each parallel gateway node, its diverging attribute would have to be retrieved as well as its number of incoming and outgoing sequence flows.

3. Depending on the value of the diverging attribute, the numbers of sequence flows the parallel gateway has will have to be appropriately analyzed in order to determine whether the validation passes for this specific parallel gateway. If all the gateways meet the requirements, true can be returned.

Step 1 is easy to perform in Hawk as a (database) index of Metamodels is kept, which can be used to rapidly provide a starting point for a query which requires elements of a specific type (such as ParallelGateway instances, for example). If a query uses the whole model index as a starting point then there is no optimization to be performed, as the entire model index would have to be traversed in order to find the Node representing the ParallelGateway type. Step 2 is where we can begin optimizing, to improve the execution time of queries which have to perform some non-trivial calculation or navigation on the model. An effective way to increase query efficiency is to pre-compute and cache – at model indexing time – information that can be used to speed up particular queries of interest to the modeler.


Such attributes are computed using expressions formed in the expression language of a known Query Engine. A query engine, as discussed in Section 4.3.6, allows for expression languages (such as OCL or Epsilon’s EOL [16]) to be used as a query mechanism for a Hawk model index. Such derived attributes are hence pre-computed and cached at model indexing time and need to be maintained as the model index evolves.
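As an illustration, registering such a pre-computed attribute could look roughly as follows; the addDerivedAttribute method name and its parameter list are assumptions based on the description above, and the expression is that of Listing 4.1 below:

// Sketch: attaching a derived "validate" attribute to ParallelGateway,
// computed by the EOL query engine (method name and parameters are
// illustrative assumptions, not Hawk's exact API).
public static void addGatewayValidation(IModelIndexer indexer) throws Exception {
    indexer.addDerivedAttribute(
            "http://bpmn_simplified",  // metamodel unique identifier
            "ParallelGateway",         // type to attach the attribute to
            "validate",                // derived attribute name
            "Boolean",                 // attribute type
            false, false, false,       // isMany, isOrdered, isUnique
            "EOLQueryEngine",          // derivation language (query engine to use)
            "var diverging = self.diverging;"
            + " var incomingSFs = self.inFlows.size();"
            + " var outgoingSFs = self.outFlows.size();"
            + " return (diverging and incomingSFs == 1 and outgoingSFs > 1)"
            + " or ((not diverging) and incomingSFs > 1 and outgoingSFs == 1);");
}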

Figure 4.10.: Pre-computing whether parallel gateways are validated

A simple example is shown in Figure 4.10; here, the result of the validation of each parallel gateway is pre-computed (using the EOL program presented in Listing 4.1) and this information is stored in a new DerivedAttribute node24 with the attribute name as the relationship linking it to its parent Element node. This derived attribute is handled seamlessly with regards to querying, hence an EOL query used to validate parallel gateways would change from the one presented above to: not ParallelGateway.all.exists(pg | pg.validate == false) (in both cases returning the Boolean value of the validation).

24 a new node is used for overcoming a limitation found during incremental updating of derived attributes; further information on this can be found in Section 4.3.8.2

Listing 4.1: EOL program used to calculate whether a parallel gateway is validated

var diverging = self.diverging;
var incomingSFs = self.inFlows.size();
var outgoingSFs = self.outFlows.size();
return ( diverging and incomingSFs == 1 and outgoingSFs > 1 )
    or (( not diverging ) and incomingSFs > 1 and outgoingSFs == 1 );

Expressions of arbitrary complexity are expected to be used in practice so that pre-caching the results of such expressions is actually worthwhile25; other examples are presented in the Evaluation Section 5.3.3.

4.3.8.2. Derived Attributes: Incremental Updating

A naive approach for maintaining such attributes would involve having to fully re-compute each one every time any change happens to the model index. This is due to the fact that any such attribute can potentially depend upon any model element in the model index, thus any change can potentially affect any derived attribute. Such an approach would be extremely inefficient and resource consuming. As such, an incremental approach for updating derived attributes in Hawk has been used. In this approach, which is an adaptation of the incremental OCL evaluation approach discussed in [72], only attributes affected by a change made to the model index are re-computed when an update happens. In order to know which elements affect which derived attributes, the scope of a derived attribute needs to be calculated. The scope of a derived attribute comprises the current model elements (and/or features) in the model index this attribute needs to access in order to be calculated. When a derived attribute is added/updated in the model index, the query engine used to calculate this attribute also publishes an AccessListener to Hawk, providing the collection of Accesses this attribute performed. By recording these accesses (element and/or feature accesses), Hawk updates only the derived attributes which access an element altered during an

25 in order to reduce the inherent complexity of the approach, derived attributes depending on other derived attributes are not allowed

incremental update. As the incremental update changes the minimal number of elements during model evolution, the updating of derived attributes can be seen to be as efficient as possible with respect to the magnitude of the change. In more detail, every time an update process happens in Hawk, it records the changes it has made to the model index. A change can be one of the following:

• A model element has been created / deleted

• A feature (attribute or reference) of a model element has been altered (set/unset if single valued or updated if multi-valued)

Note: complex changes (like move) are broken down to these simple changes.

Furthermore, every time a derived attribute is added or updated, it records the accesses it requires in order to be computed. An access can be one of:

• Access to an attribute or reference value of a model element

• Access to the collection of model elements of a specific type or kind

By having recorded the above-mentioned changes and accesses, Hawk can calculate which derived attributes need to be re-computed during a model update, using Algorithm 5. As the derived attribute is a node itself, it can be directly referenced and updated if necessary; if the derived attribute was located inside its parent Element node, that node would have to be referenced instead and hence all derived attributes in it would have to be updated, as there would not be a way to distinguish which ones need updating and which ones do not. In the loan example, for the derived attribute isSlow of node t1 (name: Record Loan App Info), the access would read as follows: the derived attribute isSlow (of node t1) needs to access node t1 for its feature executionTime; it also needs to access nodes t2, t3, t4 and t5 for their feature executionTime. Hence any time the feature executionTime changes for any of the above-mentioned nodes, the derived attribute isSlow will have to be recomputed (and only then). As demonstrated by [73], this approach works for expressions of arbitrary complexity as long as they are deterministic (they do not introduce any randomness using random number generators, hash-maps or other genuinely unordered collections). As EOL defaults to using Sequences for collections and does not inherently use random number generators, as long as the expressions provided do not specifically introduce non-determinism, this approach is sound [73].


Algorithm 5: Derived attribute incremental update algorithm

1 let nodesToBeUpdated be the set containing the derived attribute nodes which will have to be updated – initially empty

2 foreach change in the collection of changes do

3 if the change is a model element addition/deletion then

4 add any derived attribute which accesses this element (or any of its structural features) to nodesToBeUpdated

5 else if the change is an attribute/reference value alteration then

6 add any derived attribute which accesses this structural feature to nodesToBeUpdated

7 end

8 end

9 foreach node in nodesToBeUpdated do

10 re-compute the value of the (derived attribute) node

11 update the accesses to the new elements/features this node now requires

12 end
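A minimal sketch of the bookkeeping that drives Algorithm 5 is given below; the class and field names are illustrative assumptions rather than Hawk's exact classes:

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class DerivedAttributeAccesses {
    // One recorded access: a derived attribute read a feature of an element.
    public static final class Access {
        final String derivedAttributeNodeId;
        final String accessedElementId;
        final String featureName; // e.g. "executionTime"

        Access(String derivedAttributeNodeId, String accessedElementId, String featureName) {
            this.derivedAttributeNodeId = derivedAttributeNodeId;
            this.accessedElementId = accessedElementId;
            this.featureName = featureName;
        }
    }

    // Given the recorded accesses and the element#feature keys altered by an
    // update, collect only the derived attributes that must be re-computed.
    public static Set<String> attributesToUpdate(Collection<Access> accesses,
            Set<String> changedElementFeatures) {
        Set<String> toUpdate = new HashSet<>();
        for (Access a : accesses) {
            if (changedElementFeatures.contains(a.accessedElementId + "#" + a.featureName)) {
                toUpdate.add(a.derivedAttributeNodeId);
            }
        }
        return toUpdate;
    }
}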

4.3.8.3. Database Indexing

Another effective way to increase query efficiency is to create custom database indexes of interesting attributes – at model indexing time – so that they can be used to speed up particular queries of interest. Using the BPMN loan example, we can index the "executionTime" attribute of tasks, as shown in Figure 4.11.

Figure 4.11.: Example of database indexing in Hawk

By effectively caching this information in a database index, a query requesting all tasks taking longer than an hour can be optimized so that it does not have to iterate through all the tasks in the model index; instead, it can directly compare the integer 3600 (as execution time is in seconds) to the keys of the index named "http://bpmn_simplified#Task#executionTime", which contains the values of the attribute "executionTime" of the instances of type "Task" in the metamodel with unique identifier "http://bpmn_simplified". Any model attribute can be indexed in Hawk, a process performed during model insertion into the Hawk model index. When Hawk updates any changes made to any models it is indexing, such attributes are updated in the database indexes automatically. Section 5.3.3 evaluates this feature.
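The lookup this enables could be sketched as follows; the index interface name and the range-query signature are assumptions for illustration, built around the IGraphDatabase API mentioned earlier:

// Sketch: answering "all tasks taking longer than an hour" through the
// attribute index instead of a full traversal (interface and method names
// are illustrative assumptions).
public static Iterable<IGraphNode> tasksOverAnHour(IGraphDatabase graph) {
    IGraphNodeIndex execTimeIndex =
            graph.getOrCreateNodeIndex("http://bpmn_simplified#Task#executionTime");
    // Range query over the indexed integer values: executionTime in (3600, MAX]
    return execTimeIndex.query("executionTime", 3600, Integer.MAX_VALUE,
            false /* exclude 3600 itself */, true);
}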

4.3.8.4. Querying an optimized Hawk Model Index

In order to be able to use indexed attributes to speed up queries, any query engine (such as the Epsilon Query Engine we have implemented) needs to be aware of possible attribute indexes and of how to handle them. The first step is to make the engine aware that, in certain situations, it can avoid requesting a collection of results by going to the model index and iterating through its contents, and can instead circumvent this procedure and use the custom (database) indexes found in the model index to retrieve the exact results directly. In Epsilon, this is done by creating a collection implementing the IAbstractOperationContributor interface, so that it can use these database indexes to optimize the performance of filtering operations performed on it. This is implemented by introducing an OptimisableCollection, which extends the Java HashSet class (which in turn extends Collection). This custom collection keeps various

meta-items of the context in which it was created, so that should a relevant operation be called on it, it can possibly use an optimized approach. More specifically, it keeps a pointer to the IModel which created it, one to a Node in the database which denotes the type of the elements in this collection, and a custom SelectOperation which it uses. A select operation in Epsilon takes a collection of model elements and returns a subset of this collection that satisfies a boolean condition (various other logical operations such as exists, reject etc. use this operation to be computed). It is notable that such OptimisableCollections are only created when a .getAllOfType or .getAllOfKind operation is called in an EOL program and hence their contents are all of a common (super-)type. Furthermore, since this collection is created by a .getAll* operation, we know that it contains a complete set of instances of the type and as such traversing it may be optimizable, should any of the attributes of the type be database indexed (a detailed discussion of when a query on a collection of elements is optimizable follows below). Now, should a select operation be called by Epsilon upon an OptimisableCollection, Hawk's custom SelectOperation triggers (instead of the default Epsilon one) and will attempt to optimize the selection of elements, if possible:

1. If the select condition contains an expression of the form "x.attr ? value" (where ? denotes a comparison operator26 – and is thus potentially optimizable by use of custom attribute indexes), instead of iterating through the collection and comparing the 'value' of attribute 'attr' with the value of each element, it first checks whether an index of attribute 'attr' exists for elements of that type. If such an index exists, it is used to directly get the sub-collection required without having to perform a costly traversal of all the persisted elements (hence the expression is optimizable).

2. If the select condition contains an expression containing one of the operators: and, or, xor, implies & not, it will be decomposed every time such an operator is found, using the following strategy:

a) If the select condition contains an expression of the form “a and b” where ‘a’ & ‘b’ are arbitrary sub-expressions, it will first check whether ‘a’ or ‘b’ are optimizable (i.e. they satisfy step 1). If both are optimizable, it will

26 i.e.: .select( x | x.attr = value ), .select( x | x.attr > value ), .select( x | x.attr < value ), .select( x | x.attr >= value ), .select( x | x.attr <= value )


return a collection containing the set intersection of the results satisfying 'a' and those satisfying 'b'. If only one of the two is optimizable, it will create a collection of the relevant results (of the optimizable sub-expression) and attempt to decompose the other sub-expression (using step 2), defaulting to passing the computed collection (returned by the abovementioned optimized sub-expression execution) to the default EOL driver for executing the non-optimizable sub-expression on this collection (effectively performing efficient partial filtering, as it will pass a potentially much smaller collection to the Epsilon driver). If neither of the two expressions is optimizable it will pass each sub-expression ('a' & 'b') to be further decomposed (using step 2) and then will take their computed results and perform a set intersection on them. For example:

• if the expression is: Task.all.select( t | t.executionTime > 100 and t.executionTime < 200 ) and the attribute index of 'executionTime' is present, both expressions will be optimized and their results will be set-intersected.

• if the expression is: Task.all.select( t | t.executionTime > 0 and t.outFlows.size > 1 ) and the index of 'executionTime' is present, the left hand side ( t.executionTime > 0 ) will be optimized and the right hand side will be passed to the default Epsilon execution engine for evaluation (as it cannot be further decomposed), with the result calculated from the left hand side as the element collection provided to the engine (instead of the original, larger collection of all Tasks). The result provided by the Epsilon execution engine will be returned.

• if the expression is: Task.all.select( t | t.executionTime > 0 and t.executionTime < 100 and t.outFlows.size > 1 ) with the index of 'executionTime' being present, the sub-expression ( t.executionTime > 0 ) will be optimized and the remaining sub-expression will be decomposed (providing it the result of the optimized execution as its element collection). Then the ( t.executionTime < 100 ) sub-expression will be optimized and the remaining sub-expression will be passed to the default Epsilon execution engine for evaluation (on the new, even smaller element collection, calculated using the previous optimized executions of both sub-expressions above), with the results of the first two (optimized) executions being set-intersected and then the resulting collection set-intersected with the third (non-optimized) execution results.

b) If the select condition contains an expression of the form "a or b" where 'a' & 'b' are arbitrary sub-expressions, it will first check whether 'a' & 'b' are both optimizable. If both are optimizable, it will return a collection containing the set union of the results satisfying 'a' and those satisfying 'b'. If one or neither of the two expressions is optimizable it will attempt to decompose the sub-expressions 'a' & 'b' (using step 2) and then will take their results and perform a set union on them. For example:

• if the expression is: Task.all.select( t | t.executionTime < 100 or t.executionTime > 200 ) and the attribute index of 'executionTime' is present, both expressions will be optimized and their results will be set-unioned.

• if the expression is: Task.all.select( t | t.executionTime < 100 or t.executionTime > 200 and t.outFlows.size > 1 ) with the index of 'executionTime' being present, the sub-expression ( t.executionTime < 100 ) will be optimized and the remaining sub-expression will be decomposed (providing it the result of the optimized execution as its element collection). Then the ( t.executionTime > 200 ) sub-expression will be optimized and the remaining sub-expression will be passed to the default Epsilon execution engine for evaluation (on the new, even smaller element collection, calculated using the previous optimized executions of both sub-expressions above), with the results of the first two (optimized) executions being set-unioned and then the resulting collection set-intersected with the third (non-optimized) execution results.

c) If the select condition contains an expression of the form "a xor b" where 'a' & 'b' are arbitrary sub-expressions, it will first check whether 'a' & 'b' are both optimizable. If both are optimizable, it will return a collection containing members of the collection satisfying 'a' or members satisfying 'b', but not ones satisfying both. If one or neither of the two expressions is optimizable it will attempt to decompose the sub-expressions 'a' & 'b' (using step 2) and then will take their results and perform the abovementioned exclusive or comparison on them.

d) If the select condition contains an expression of the form "a implies b" where 'a' & 'b' are arbitrary sub-expressions, it will first check whether 'a' & 'b' are both optimizable. If both are optimizable, it will return the original collection with members of it which satisfy 'a' removed (from the original source collection) and members satisfying 'b' re-added (if not already present – effectively performing "not(a) or b"). If 'a' is optimizable it will firstly get the collection satisfying 'a'. It will then use this sub-collection as the source collection the filter of 'b' runs on. It will then create a collection containing elements of the sub-collection of 'a' minus those of the sub-collection of 'b' and return the original collection minus this one (effectively performing "not(a and not(b))"); similarly if 'b' is optimizable. If neither of the two expressions is optimizable it will attempt to decompose the sub-expressions 'a' & 'b' (using step 2) and then will take their results and perform the abovementioned implication comparison on them.

e) If the select condition contains an expression of the form "not(a)" where 'a' is an arbitrary sub-expression, it will first check whether 'a' is optimizable. If it is, it will return the result of the original source collection minus the elements satisfying 'a'. If it is not, it will attempt to decompose the sub-expression 'a' (using step 2) and then will take the result and perform the abovementioned negation operation on it.

3. Should the select condition contain any other expression (or sub-expression that cannot be further decomposed using step 2) it delegates to the default Epsilon SelectOperation execute() method, to perform the job (providing it the relevant source collection, possibly shrunk by performing partial optimized executions as detailed above).
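The control flow of steps 1–3 can be summarized by the following self-contained skeleton; the AST types and the index/engine calls are simplified stand-ins for Epsilon's actual classes, and the xor, implies and not cases (which follow the same pattern) are omitted for brevity:

import java.util.HashSet;
import java.util.Set;

public class SelectDecomposer {
    abstract static class Expr {}
    static final class And extends Expr {
        final Expr l, r; And(Expr l, Expr r) { this.l = l; this.r = r; }
    }
    static final class Or extends Expr {
        final Expr l, r; Or(Expr l, Expr r) { this.l = l; this.r = r; }
    }
    static final class Comparison extends Expr {
        final boolean indexed; Comparison(boolean indexed) { this.indexed = indexed; }
    }

    Set<Object> decompose(Expr e, Set<Object> source) {
        if (e instanceof Comparison && ((Comparison) e).indexed) {
            return indexLookup((Comparison) e);               // step 1: direct index lookup
        } else if (e instanceof And) {
            Set<Object> left = decompose(((And) e).l, source);
            Set<Object> right = decompose(((And) e).r, left); // filter 'b' on the shrunk source
            left.retainAll(right);                            // step 2a: set intersection
            return left;
        } else if (e instanceof Or) {
            Set<Object> left = decompose(((Or) e).l, source);
            Set<Object> right = decompose(((Or) e).r, source);
            left.addAll(right);                               // step 2b: set union
            return left;
        }
        return epsilonSelect(e, source);                      // step 3: default Epsilon engine
    }

    Set<Object> indexLookup(Comparison c) { return new HashSet<>(); }                 // stub
    Set<Object> epsilonSelect(Expr e, Set<Object> src) { return new HashSet<>(src); } // stub
}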

4.3.9. Summary

This section detailed the design of the model indexing system (Hawk). It presented the model index structure and explained the importance of each of its key elements (such as File nodes, Element nodes, etc.). It then discussed the two mapping layers the system offers and their importance in offering an extensible system capable of incorporating new back-ends and model persistence technologies on demand, in the form of pluggable drivers. Information about the various components of the system was then provided: for the version control and modeling components the focus was on their extensibility,

whilst for the querying and updating components the focus was on performance. Finally, this section demonstrated how the use of derived attributes and database indexing can offer potential performance improvements for querying whilst attempting to minimize their overhead on the system. The following Section 4.4 offers a concise summary of the implementation details of the Hawk model indexing prototype developed as part of this work, whilst Chapter 5 evaluates the various features and optimizations presented here.

4.4. Implementation

This section details the implementation of the current Hawk prototype; a user guide can be found in Appendix C.

4.4.1. Eclipse Plugins

Even though Hawk can run as a standalone Java application if need be, it offers Eclipse IDE integration for providing an administrator user interface and automating relevant lifecycle operations. The structure of Hawk’s primary plugins is as follows:

• org.hawk.core plugin This plugin (from now on referred to as the "core" plugin) contains all the interfaces of Hawk as well as the necessary implementation classes for running the main Hawk controller IModelIndexer. It only depends on the org.hawk.core.dependencies plugin. It defines all the extension points to be used by other plugins in order to be automatically discovered upon initializing Hawk:

org.hawk.core.ModelExtensionPoint extension point Allows extensions defining new types of model resource factories to be added to Hawk.

org.hawk.core.MetaModelExtensionPoint extension point Allows extensions defining new types of metamodel resource factories to be added to Hawk.

org.hawk.core.QueryExtensionPoint extension point Allows extensions defining new types of query engines to be added to Hawk.

org.hawk.core.ModelUpdaterExtensionPoint extension point Allows extensions defining new types of model updaters to be added to Hawk.

org.hawk.core.MetaModelUpdaterExtensionPoint extension point Allows extensions defining new types of metamodel updaters to be added to Hawk.

143 4. Hawk: Scalable Model Indexing Framework

org.hawk.core.BackEndExtensionPoint extension point Allows extensions defining new types of back-end stores (IGraphDatabases) to be used by Hawk.

org.hawk.core.VCSExtensionPoint extension point Allows extensions defining new version control managers to be added to Hawk.

• org.hawk.core.dependencies plugin This plugin contains all the (publicly available – license compliant) dependencies of the core plugin. Currently the only depen- dency is xstream-1.4.8.jar, used to serialize and deserialize Hawk metadata used for persisting its model indexes between runs.

• org.hawk.emf plugin This plugin contains the necessary implementations for read- ing EMF XMI-based metamodels and models into memory. It depends on the core plugin as well as the org.eclipse.emf.ecore and org.eclipse.emf.ecore.xmi plugins. These plugins can be obtained either automatically in any Eclipse modeling tools distribution or manually from the EMF Eclipse update site. It implements exten- sions for Hawk’s ModelExtensionPoint and MetaModelExtensionPoint extension points.

• org.hawk.bpmn plugin This plugin contains the necessary implementations for read- ing BPMN EMF XMI-based metamodels and models into memory. It depends on the core plugin as well as the org.eclipse.emf.ecore, org.eclipse.emf.ecore.xmi and org.eclipse.bpmn2 plugins. These plugins (except the BPMN one) can be obtained either automatically in any Eclipse modeling tools distribution or manually from the EMF Eclipse update site. The BPMN plugin can be obtained from the relevant Eclipse update site27. It implements extensions for Hawk’s ModelExtensionPoint and MetaModelExtensionPoint extension points.

• org.hawk.graph plugin This plugin contains the necessary implementations for updating metamodels or models in Hawk. It depends on the core plugin. It implements extensions for Hawk's MetaModelUpdaterExtensionPoint and ModelUpdaterExtensionPoint extension points.

• org.hawk.svn plugin This plugin contains the necessary implementations for connecting to an SVN version control system in Hawk. It depends on the core plugin as well as the org.tmatesoft.svnkit and org.tmatesoft.sqljet plugins. These plugins can be found in the Eclipse update site of the official svnkit website. It implements an extension for Hawk's VCSExtensionPoint extension point.

27 http://download.eclipse.org/bpmn2-modeler/updates



• org.hawk.localfolder plugin This plugin contains the necessary implementations for using a local folder as a pseudo-version control system in Hawk. It depends on the core plugin. It implements an extension for Hawk’s VCSExtensionPoint extension point.

• org.hawk.git plugin This plugin contains the necessary implementations for using a local git repository as a version control system in Hawk. It depends on the core plugin. It implements an extension for Hawk’s VCSExtensionPoint extension point.

• org.hawk.neo4j-v2 plugin This plugin contains the necessary implementations for connecting with a Neo4J NoSQL database. It depends on the core plugin as well as the org.hawk.neo4j-v2.dependencies plugin. It implements an extension for Hawk’s BackEndExtensionPoint extension point.

• org.hawk.neo4j-v2.dependencies plugin This plugin contains the necessary Neo4J files for running a Neo4J database. As Neo4J has a license that is incompatible with the Eclipse Public License of Hawk, the relevant .jars are not included in the plugin, only references to which ones are needed. A public open-source release of these Neo4J files can be found on the official Neo4J website under community downloads.

• org.hawk.epsilon plugin This plugin contains the necessary implementations for using the Epsilon model suite to perform queries on Hawk. It depends on the core plugin as well as the org.eclipse.epsilon.eol.engine plugin. This plugin can be obtained from the public distribution of the Epsilon toolset in Eclipse. It implements an extension for Hawk’s QueryExtensionPoint extension point.

• org.hawk.ui.emc.dt2 plugin This plugin contains the necessary implementations for adding Hawk to the repositories available in the Epsilon platform, in order to integrate with the current workflow of Epsilon. It depends on the core plugin, the epsilon plugin (of Hawk), the UI plugin (of Hawk) as well as on org.eclipse.epsilon.common.dt (which can be found in the same place as the other Epsilon plugins needed for the Hawk Epsilon plugin).


• org.hawk.osgiserver plugin This plugin contains the necessary implementations for running Hawk server in a headless OSGI environment. It depends on the core plugin, as well as various standard Eclipse plugins that expose OSGI functionality.

• org.hawk.ui2 plugin This plugin (UI plugin) contains the necessary implementations for offering a user interface for Hawk integrated into Eclipse (by extending the osgiserver plugin). It depends on the core plugin, the osgiserver plugin, as well as various standard Eclipse UI plugins.

Furthermore, various other plugins were developed in the scope of the MONDO project (in collaboration with Dr. Seyyed Shah and Dr. Antonio Garcia Dominguez):

• org.hawk.ifc plugin This plugin contains the necessary implementations for reading Building Information Modeling (BIM) Industry Foundation Classes (IFC)-based metamodels and models into memory. It depends on the core plugin as well as the org.eclipse.emf.ecore and org.eclipse.emf.ecore.xmi plugins. It implements exten- sions for Hawk’s ModelExtensionPoint and MetaModelExtensionPoint extension points.

• org.hawk.modelio plugin This plugin contains the necessary implementations for reading Modelio-based metamodels and models into memory. It depends on the core plugin, the org.eclipse.emf.ecore and org.eclipse.emf.ecore.xmi plugins, as well as various Modelio plugins. The Modelio plugins can be found in the public open-source distribution of the Modelio tool found online. It implements extensions for Hawk's ModelExtensionPoint and MetaModelExtensionPoint extension points.

• org.hawk.emfresource plugin This plugin contains the necessary implementations for exposing a Hawk model index as a standard EMF resource. It depends on the core and graph plugins as well as the org.eclipse.emf.ecore plugin.

• org.hawk.orientdb plugin This plugin contains the necessary implementations for connecting with an OrientDB NoSQL database. It depends on the core plugin. It implements an extension for Hawk’s BackEndExtensionPoint extension point.

• org.hawk.workspace plugin This plugin contains the necessary implementations for connecting to an active Eclipse Workspace as a version control system in Hawk. It depends on the core plugin as well as various standard Eclipse plugins. It implements an extension for Hawk’s VCSExtensionPoint extension point.


• org.hawk.ui.emfresource plugin This plugin contains the necessary UI implementations for exposing a Hawk EMF resource and for using an optimized version of the Epsilon Exeed tree editor with Hawk. It depends on the core, graph, osgiserver, UI and emfresource plugins as well as the org.eclipse.emf.ecore and the org.eclipse.epsilon.dt.exeed plugins.

All relevant API operations are exposed through a GUI in the form of an Eclipse view, a detailed presentation of which can be found in Appendix C.


5. Evaluation

This chapter evaluates the research hypothesis by presenting the evaluation strategy, evaluation benchmarks and evaluation results obtained during this work. The evaluation strategy (Section 5.1) presents which aspects of Hawk can be evaluated, detailing the use-cases covered in each case; it discusses how each scenario offers unique and critical insight into the relevant functional and non-functional aspects of Hawk that are to be evaluated. The evaluation benchmarks (Section 5.2) outlines the various benchmarks used in the evaluation, presenting each one and summarizing its relevance with respect to validating the research hypothesis. The evaluation results (Section 5.3) use the various benchmarks presented to test the various functional and non-functional requirements of Hawk, with the tool integration (Section 5.4) performing a qualitative evaluation of Hawk’s architecture. Finally, Section 5.6 summarizes the findings and how they validate the research hypothesis.

5.1. Evaluation Strategy

Evaluating the various aspects of a component-based system like Hawk firstly requires splitting them into their distinct categories. This section presents them and discusses the relevance of each one. Two primary categories, that of correctness and that of performance, are identified, each broken down into relevant scenarios clarifying the specific aspect of Hawk being looked at.

5.1.1. Correctness

Evaluating the correctness of the algorithms used in Hawk can be broken down into two aspects: the correctness of the contents of the model index (reflecting the correctness of the various algorithms used to insert and update models into Hawk) and the correctness of the results returned by querying this index.


5.1.1.1. Index Content Correctness

As Hawk is a model index that stores copies of models, an important aspect to evaluate is the correctness of this index with respect to the model files it is monitoring. More specifically there are three scenarios to consider:

• A new model file is found and inserted into Hawk. In this scenario we need to ensure that Hawk now contains the contents of the new file, that the previous contents of Hawk are unaltered, that any proxy references pointing to the new file are resolved and that any derived attributes that need updating are updated correctly.

• An existing model file is updated and the changes need to be propagated into Hawk. In this scenario we need to ensure that the updated version of the model is propagated to Hawk so that the index now matches the new version, that the remaining (unaffected) contents of the index are unaltered, that any proxy references pointing to the changed file are resolved and that any derived attributes that need updating are updated correctly.

• A model file is deleted and needs to be removed from Hawk. In this scenario we need to ensure that any elements of the deleted file are removed from Hawk, that the remaining contents of Hawk are unaltered, that any references to the deleted file from other files are kept as proxies and that any derived attributes that need updating are updated correctly.

By handling these three cases we are able to ensure consistency of the model index under normal execution, as any compound change, such as files being moved, can be broken down into a sequence of the aforementioned changes. By a consistent state of the model index we mean that the contents of the index for that version of the model reflect the contents of the original model, so that regardless of whether a user retrieves information from Hawk or from the original model, the same results are returned. To ensure consistency under failure we need to ensure that changes are transactional and that any failure is rolled back to a previous stable state, so that Hawk can retry the change the next time it is able to. In order to hook into these changes we use the change notification framework provided by Hawk (Section 4.3.2.2) and run the Hawk validation listener (detailed below) after Hawk attempts to synchronize with its monitored repositories.
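To make this hook concrete, the sketch below shows the general shape of such a listener in Java. The interface and method names are illustrative assumptions rather than Hawk’s actual notification API, which is described in Section 4.3.2.2.

interface SynchronisationListener {
    void synchroniseStart();                  // fired before Hawk begins an update
    void synchroniseEnd(long durationMillis); // fired after the update completes
}

class ValidationListener implements SynchronisationListener {
    // false: record metrics only; true: also run the full two-way comparison
    private final boolean validationMode;

    ValidationListener(boolean validationMode) {
        this.validationMode = validationMode;
    }

    @Override
    public void synchroniseStart() {
        // reset the per-synchronization metrics (files changed/deleted,
        // elements added/updated/removed, timing)
    }

    @Override
    public void synchroniseEnd(long durationMillis) {
        // always: report the metrics listed in Section 5.1.1.2
        if (validationMode) {
            // additionally: run the two-way comparison of Algorithm 6 against
            // every resource kept open during this synchronization
        }
    }
}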


Readers can refer to Sections 5.3.4.2 and 5.4.4.2 for empirical data collected on tool correctness.

5.1.1.2. Hawk Validation Listener

Using Hawk’s update notification framework (presented in Section 4.3.2.2), a validation tool has been created that allows for checking Hawk’s consistency; this tool also outputs various interesting metrics regarding each synchronization Hawk performs (one such analysis is performed for each repository Hawk is currently watching over). The tool has two modes: one which only records metrics for each update process, and one which also performs a full two-way analysis of the Hawk index and the changed resources in order to validate Hawk’s consistency. The first mode has little to no impact on performance, but the second mode has a large overhead, as it not only performs a costly comparison (presented below) but also requires that Hawk keep open all resources used during a synchronization, so that they can be handed to the validation step (normally Hawk disposes of each resource after use and only keeps one open at a time, in order to use less memory). The metrics provided are the following:

• total number of model files present in the current commit.

• number of changed model files Hawk needs to update.

• number of deleted model files Hawk needs to remove.

• number of changed model files successfully loaded into resources using their relevant model factory.

• number of model elements changed during this update (added or updated into Hawk).

• number of model elements deleted during this update (either due to the model file being deleted or the element being deleted in the latest version of the model file).

• time taken for the synchronize to complete (not counting the time taken for this tool to perform its analysis/validation).

Furthermore, should validation mode be enabled, this tool also performs a comprehensive two-way comparison between the current version of Hawk and any model resources used in this synchronization. More specifically, this process performs Algorithm 6, which checks that for each model resource Hawk contains exactly the elements found in the resource (and no other elements), and that for each such element its attributes and references match those of the resource itself. If any inconsistency is found (either in the model itself or in Hawk’s indexing of the model), it is output by the tool for informative and debugging purposes.

5.1.1.3. Query Correctness

In order to ensure correct results to queries performed on Hawk (Section 2.2.3), it is not sufficient to have confidence in the contents of the model index alone; there is also the need to ensure that the query engines used to query Hawk are correct. The index is said to offer correct querying results when, for any query, regardless of whether it is executed on the model index or on the original model itself, the same results are returned. In practice, as testing all possible queries is not possible, empirical data can be gathered using a collection of queries whose results are known beforehand, and comparing the results obtained by querying Hawk to these results. Refer to Sections 5.3.1.2 and 5.4.4.2 for more details.

5.1.2. Performance

As one of Hawk’s non-functional requirements is performance when dealing with large-scale models, evaluating this aspect is essential. This evaluation can be broken down into two main areas: query performance and update performance.

5.1.2.1. Query Performance

Hawk aims at enabling performant global queries on models indexed by it. As such, regardless of how large or how many models are stored in Hawk, it should be able to seamlessly handle queries provided to it in an efficient manner. As described in Section 4.3.1, as the contents of Hawk’s model indexes are stored as homogeneous property graphs, performant global querying can be achieved by retrieving the necessary information from the required elements in the graph. Empirical data gathered on querying large models can be found in Section 5.3.1.2 and improvements observed when using derived and indexed attributes can be found in Section 5.3.3.2.


Algorithm 6: Validation algorithm

 1  let totalGraphSize be the number of model elements in Hawk that are contained in one of the model files changed in this commit
 2  let totalResourceSize be the number of model elements in all of the loaded resources of model files changed in this commit
 3  foreach changed model file in this commit do
 4      retrieve the model resource created by Hawk for this model file
 5      add the size of the resource to totalResourceSize
 6      foreach graph node in Hawk linked to the current model file do
 7          add one to totalGraphSize
 8          if the node cannot be mapped to a model element in the resource then
 9              inform that validation has failed, alongside information about this node which is not in the model resource
10          else
11              compare the attribute values of the node with the values of the model element in the resource, informing of any inconsistencies and failing the validation if there are any
12              compare the reference values of the node with the values of the element, informing of any inconsistencies and failing the validation if there are any
13          end
14      end
15      if there are any left-over model elements not mapped to a node then
16          inform that validation has failed, alongside information about the model elements not found in Hawk
17      end
18  end
19  if totalGraphSize is not equal to totalResourceSize then
20      inform that validation has failed, as Hawk and the loaded resources do not contain an identical number of model elements
21  end


5.1.2.2. Update Performance

Hawk needs to have the ability to handle evolving models stored in version control systems (Section 4.1). As such it needs the ability to efficiently update its contents every time any monitored model is changed. Furthermore it needs to efficiently re-compute its derived attributes so that their values reflect the new state of the index.

Updating few large models. The first benchmark performed (Section 5.3.1) covers updating a small set of large models (in the order of millions of model elements) and attempts to push Hawk to its limits with regard to the number of changes in each update.

Updating many small models. The next benchmark (Section 5.3.4) covers updating a large set of smaller models and aims to demonstrate Hawk’s efficiency in handling large amounts of small changes to an index comprised of hundreds of models.

5.1.3. Tool Integration

One of the envisioned benefits of Hawk is that it allows for easy integration with current tools, as it provides a technology orthogonal to current model storage, accessed through its simple generic API. As part of the integration efforts in the MONDO project (in collaboration with Dr. Antonio Garcia Dominguez and Gábor Szárnyas), several tools have been integrated with Hawk. Details can be found in Section 5.4.

5.1.4. Architecture Evaluation

As Hawk aims to provide an extensible heterogeneous model indexing framework, evaluating the extent to which its architecture can be used with various modeling and data persistence technologies is needed. Details on this can be found in Section 5.5.

5.2. Evaluation Benchmarks

This section presents the sources for the benchmarks used to evaluate Hawk. In each case the reason for selection is discussed alongside details about the various artefacts used. The first case-study focuses on monolithic models of increasing sizes (reaching the order of millions of model elements) and offers a complex query that can be used to evaluate the performance of querying the model index. The second focuses on a large collection (hundreds) of smaller models which have been evolving, providing hundreds of different versions of this collection; this benchmark lends itself to evaluating Hawk’s update procedures in terms of correctness and performance.

5.2.1. Grabats 2009 Case-Study

To obtain meaningful evaluation results when using large models, large-scale models extracted by reverse engineering existing Java code are used. In particular, we use the updated version of the JDTAST metamodel from the SharenGo Java Legacy Reverse-Engineering MoDisco [74] use case, presented in the Grabats 2009 contest [75] described below, as well as the five models also provided in the contest.

Figure 5.1.: Small subset of the Java JDTAST metamodel

A subset of the Java JDTAST metamodel is presented in Figure 5.1. In this figure, there are TypeDeclarations that are used to represent Java classes and interfaces, MethodDeclarations that are used to define Java methods (in classes or interfaces, for example) and Modifiers that are used to represent Java modifiers (like static or synchronized) for Java classes or Java methods.


The Grabats 2009 contest comprised several tasks, including the case study used in this work, for benchmarking different model querying and pattern detection technologies. More specifically, task 1 of this case study is performed, using all of the case study’s models, set0 – set4 (which represent progressively larger models, from one with 70,447 model elements (set0) to one with 4,961,779 model elements (set4)), all of which conform to the JDTAST metamodel. These models are injected into the persistence technologies used in the benchmark (insertion benchmark) and then queried using the Grabats 2009 task 1 query (query benchmark) [76]. This query requests all instances of TypeDeclaration elements which declare at least one MethodDeclaration that has static and public modifiers and whose declared type is also its return type (i.e. singleton candidates).

5.2.1.1. Grabats Query

A model index like Hawk can be queried in a variety of ways, ranging from native Java, through technology-specific query languages such as Cypher (for a Neo4J back-end), to general-purpose languages such as Epsilon’s EOL. Below we see the Grabats query written in these three forms:

Listing 5.1: Code excerpt for the Grabats query implemented in Java for Neo4J

{ ...
  for (Relationship outEdge : typeDeclaration.getRelationships(
      Direction.OUTGOING, DynamicRelationshipType.withName("bodyDeclarations"))) {
    Node methodDeclaration = outEdge.getEndNode();
    if (new MetamodelUtils().isOfType(methodDeclaration,
        new MetamodelUtils().eClassNSURI(methodDeclarationClass))) {
      boolean isPublic;
      boolean isStatic;
      String currMethodName;
      for (Relationship methodDeclarationOutEdge : methodDeclaration
          .getRelationships(Direction.OUTGOING, DynamicRelationshipType.withName("name"))) {
        Node name = methodDeclarationOutEdge.getEndNode();
        currMethodName = name.getProperty("fullyQualifiedName").toString();
      }
      for (Relationship methodDeclarationOutEdge : methodDeclaration
          .getRelationships(Direction.OUTGOING, DynamicRelationshipType.withName("modifiers"))) {
  ...
}

Listing 5.2: Code excerpt for the Grabats query implemented in Cypher for Neo4J

ExecutionEngine engine = new ExecutionEngine(graph);
ExecutionResult result = engine.execute("
  START dom=node:METAMODELINDEX('id:org.amma.dsl.jdt.dom')
  MATCH dom<-[]-(td{id:'TypeDeclaration'})<-[:typeOf]-(node)
  MATCH node-[:bodyDeclarations]->(methodnode)-[:modifiers]->(modifiernode{public:true})
  MATCH methodnode-[:modifiers]->({static:true})
  MATCH node-[:name]->(nodename)
  MATCH methodnode-[:returnType]->()-[:name]->(returntypename)
  WHERE nodename.fullyQualifiedName=returntypename.fullyQualifiedName
  MATCH methodnode-[:name]->(methodnodename)
  RETURN DISTINCT nodename.fullyQualifiedName, methodnodename.fullyQualifiedName
");

Listing 5.3: The Grabats 2009 query expressed in EOL

TypeDeclaration.all.select( td | td.bodyDeclarations.exists(
  md:MethodDeclaration |
    md.modifiers.exists( mod:Modifier | mod.public == true ) and
    md.modifiers.exists( mod:Modifier | mod.static == true ) and
    md.returnType.isTypeOf(SimpleType) and
    md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
) )

In Section 5.3.1, we use the Hawk Epsilon driver to evaluate the performance impact of using EOL as a higher-level query language, as well as Hawk’s effectiveness in handling this class of queries on large models.

5.2.2. The BPMN MIWG Test Suite Repository

The BPMN Model Interchange Working Group (MIWG) at the OMG has created a repository holding data about BPMN-based tools1. They have provided a set of eleven reference BPMN models, both in XML and graphical form, and invite any tool using BPMN models to use them in order to test its compatibility. They record the results each tool obtains for up to four test cases (as many as the tool in question supports):

• Import. The tool uses the XML representation of the reference models in order to import them. The graphical representation of the model that is generated by the tool is then compared to the reference image to determine whether the tool provides a similar enough figure.

• Export. The tool is used to draw graphical representations of the reference models. The tool then saves these representations and exports them as XMLs. The resulting XMLs are then compared to the reference model XMLs.

• Roundtrip. The tool is used to import the reference XML models. The tool then saves the graphical representations generated and proceeds to export the XML representations of the models it initially imported. These XMLs are then compared to the original reference model XMLs.

• Cross-test. The above-mentioned roundtrip technique is used with the XML representations of models exported by other tools contributing to this test suite, instead of the original reference models. The resulting XMLs are then compared to the reference model XMLs.

1 https://github.com/bpmn-miwg/bpmn-miwg-test-suite


Currently (October 2015), 28 tools contribute data to this repository, resulting in hundreds of variants of the original BPMN models being stored in it, alongside hundreds of revisions representing the evolution of this test suite over time. The combination of a large set of versions with a large collection of models lends itself nicely to evaluating Hawk’s ability to rapidly synchronize with many changed models. Section 5.3.4 discusses the results obtained when indexing the various versions of the models with Hawk.

5.3. Evaluation Results

This section presents an analysis of the empirical data obtained by running the evaluation benchmarks described above. Hawk’s model insertion, model updating and query execution are investigated, focusing on performance when compared to relevant state-of-the-art tools. As this work spans a substantial time-frame, the execution environment used may vary from case to case, and is therefore explicitly stated in each case.

5.3.1. Benchmarking of Model Insertion and Querying Using Native Java and EOL

In this section, XMI, Teneo/Hibernate using a MySQL server, CDO (using its default H2 SQL database as well as a MySQL server) and two prototype Hawk drivers (using Neo4J and OrientDB) implemented in this work are compared to assess their performance and their efficiency in terms of memory use. It is worth noting that an attempt to run Morsa (Section 2.2.1.3) in the same environment did not succeed due to lack of documentation. This work on benchmarking model insertion and querying has been published in [34].

Execution Environment Performance figures that have been measured on a PC with Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz, with 8GB of physical memory, and running the Windows 7 (64 bits) operating system are presented. The Java Virtual Machine (JVM) version 1.6.0 25-b06 has been restarted for each measure as well as for each of the 20 repetitions of each measure. Table 5.1 shows the configurations that have been used for the JVM and for the relevant databases; these were obtained empirically, aiming to optimize execution time.

159 5. Evaluation

Table 5.1.: Configuration options for benchmarks

Persistence Mechanism    JVM Config    Database Config
XMI                      -Xmx6G        n/a
Teneo/Hibernate          -Xmx6G        default
CDO[H2]                  -Xmx5G        default
CDO[MySQL]               -Xmx5G        default
Hawk[Neo4J]              -Xmx6G        2.2G MMIO
Hawk[OrientDB]           -Xmx5G        1.5G MMIO

MMIO stands for memory-mapped file IO, provided by most operating systems such as Windows2 or Linux3; this feature, leveraged by many NoSQL databases, allows them to rapidly access file-based storage mirrored in RAM in order to gain performance.
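For illustration, the snippet below shows this OS facility through Java’s standard java.nio API; it is a minimal sketch of MMIO itself, not of how Neo4J or OrientDB use it internally, and the file name is a placeholder.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmioExample {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("store.bin", "rw");
             FileChannel channel = file.getChannel()) {
            // map the first 4 KB of the file into memory; reads and writes on
            // the buffer go through the page cache rather than explicit IO calls
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buffer.putInt(0, 42);
            System.out.println(buffer.getInt(0));
        }
    }
}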

5.3.1.1. Model Insertion

Tables 5.2 and 5.3 show the results for the insertion of an XMI model into the databases. We assume the availability of XMI model files, so results for writing models to an XMI file are omitted. Regarding insertion time, Teneo/Hibernate did not successfully insert set2 – set4 and CDO did not successfully insert set3 – set4 (neither with H2 nor with MySQL): even with maximum memory allocated to both client and server, they threw an exception in both cases, so these values are omitted. For small model sizes, in the order of tens of megabytes (set0, set1), CDO performs best, but for larger ones, in the order of hundreds of megabytes (set2 – set4), Hawk[Neo4J] and Hawk[OrientDB] are not only able to store them successfully, but for set2 do so faster than CDO.

5.3.1.2. Query Execution Time and Memory Footprint

Table 5.4 shows the results for performing the Grabats query (Section 5.2.1.1) on the databases. As previously mentioned, the Grabats query finds all occurrences of TypeDeclaration elements that declare at least one public static method with the declared type as its return type.

2 https://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx
3 http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html


Table 5.2.: Model insertion (persistent to database) size results

Model    XMI    Teneo/Hibernate    CDO[H2]    CDO[MySQL]    Hawk[Neo4J]    Hawk[OrientDB]    (sizes in MB)

Set0 8.75 38.6 26.0 34.8 29.4 53.6

Set1 26.59 83.1 67.0 75.7 85.9 134.0

Set2 270.12 - 539 551 794 1197

Set3 597.67 - - - 1750 2591

Set4 645.53 - - - 1890 2789

Table 5.3.: Model insertion (persistent to database) execution time results

Model    XMI    Teneo/Hibernate    CDO[H2]    CDO[MySQL]    Hawk[Neo4J]    Hawk[OrientDB]    (time taken in seconds)

Set0 n/a 58.7 11.8 26.2 12.4 19.6

Set1 n/a 218.2 19.2 66.7 32.5 57.1

Set2 n/a - 778.5 647.5 499.1 590.8

Set3 n/a - - - 2210 2245

Set4 n/a - - - 2432 2397

Table 5.4.: Grabats Query results — execution time (in seconds) and average/maximum memory use (in MB) for performing the query on set0 – set4, across XMI, Teneo/Hibernate, CDO[H2], CDO[MySQL], Hawk[Neo4J], Hawk[OrientDB], Hawk[Neo4J] (EMC) and Hawk[OrientDB] (EMC).


As Teneo/Hibernate did not insert set2 – set4 and CDO did not insert set3 – set4 (neither with H2 nor MySQL), query values are omitted for these test cases. As can be observed, Hawk[Neo4J] demonstrates the best performance in terms of execution time, and Hawk[OrientDB] is faster than XMI while also using a comparable memory footprint. CDO has the lowest memory consumption for the queries it can run (we do not consider the memory use of set0 and set1, as it is extremely low and the variance caused by the computer itself is significant), but is also slower to execute than Hawk[Neo4J] and comparable to Hawk[OrientDB]. Using this empirical data we can deduce that even though Hawk[OrientDB] is competitive and can be an improvement over XMI even for the largest model sizes in this benchmark, the fact that it is built atop a document store causes its performance to be lower than that of Hawk[Neo4J], which uses a pure graph-based database.

The last two columns of Table 5.4 show the results for performing the Grabats query using the Epsilon EMC layer to query the databases. The EMC layer allows Hawk to connect with the Epsilon platform in order to be queried using EOL as an expression language (Section 4.3.6.2). As can be expected, this layer adds an overhead both in memory and execution time, but the results are still greatly superior to XMI persistence.

Figure 5.3 compares the total time taken for Ecore’s XMI loader and our prototypes to answer the query, starting from a model provided in an XMI file. The querying time at zero repetitions is the time it takes to insert the model into the store, as we assume the availability only of the XMI files. The total time is calculated assuming that the persistence mechanism is disconnected from the query API after each run, but that the persistence (of Hawk model indexes) is not deleted; the figure can thus be used to visualize after how many such runs a Hawk-based solution would be beneficial to deploy. It is worth noting that the query execution time for XMI, not counting the loading of the resource, is comparable to Hawk[Neo4J] query execution times (seen in Table 5.4); so if a model only needs to be analyzed very few distinct times, with multiple queries executed each time, XMI is still the fastest approach, assuming that the client can handle the immense memory consumption it requires.

Regarding native querying, for set2, Hawk[Neo4J] is preferable to XMI after around 35 repeats and Hawk[OrientDB] after around 90. For set3, Hawk[Neo4J] is preferable after around 28 repeats and Hawk[OrientDB] after 37. For set4, Hawk[Neo4J] is preferable after around 18 repeats and Hawk[OrientDB] after 21. We can observe that for any model size both Hawk solutions are beneficial after some threshold, that the larger the model the earlier a Hawk-based solution pays off, and that Hawk[Neo4J] is always more performant than Hawk[OrientDB].

Regarding EMC querying, we observe that for set2, Hawk[OrientDB] has a similar gradient to XMI, with the lines only intersecting at around 2500 repeats, while Hawk[Neo4J] still outperforms XMI at around 49 repeats. For set3 and set4, we see similar results to native querying, with Hawk[Neo4J] surpassing XMI at 30 and 19 repeats respectively and Hawk[OrientDB] at 53 and 23 respectively. These results suggest that the overhead of using Epsilon with Hawk[OrientDB] is large enough to make it only negligibly more efficient than XMI for relatively small model sizes (set0 – set2); when working with large enough model sizes (set3 – set4), however, Hawk[OrientDB]’s EMC performance starts to reflect that of its native querying with respect to XMI. Regarding Hawk[Neo4J], Epsilon’s overhead only minorly affects its overall performance, which quickly surpasses that of XMI for any model size.
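The break-even points above follow from a simple cost model (a sketch that ignores effects such as cache warm-up): if $t_{ins}$ is the one-off time to insert the model into Hawk, $t_{XMI}$ the per-run time to load and query the XMI file, and $t_{Hawk}$ the per-run time to query the persisted index, then after $n$ runs a Hawk-based solution is preferable once

t_{ins} + n \cdot t_{Hawk} < n \cdot t_{XMI} \quad \Longleftrightarrow \quad n > \frac{t_{ins}}{t_{XMI} - t_{Hawk}}

which is consistent with the observation that the larger the model (and hence $t_{XMI}$), the earlier the threshold is reached.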

5.3.1.3. Disk Space

As expected, Hawk[Neo4J] and Hawk[OrientDB] require more disk space than XMI. Figure 5.2 shows the ratios of relative disk space needed to store the different models (set0 – set4) for the different technologies, using the results in Table 5.2.

(Chart: ratio (y-axis) against model set(X) (x-axis), with series XMI – Hawk[OrientDB], XMI – Hawk[Neo4J] and Hawk[Neo4J] – Hawk[OrientDB].)

Figure 5.2.: Ratios of relative disk space used for the different persistence mechanisms

All three ratios, for large enough model sizes, tend to a constant. This constant is estimated to be 4.3 for XMI – Hawk[OrientDB], 2.9 for XMI – Hawk[Neo4J] and 1.45 for Hawk[Neo4J] – Hawk[OrientDB]. For smaller model sizes, variables such as database-specific overhead seem to influence the ratios substantially (hence the larger ratios with respect to XMI for smaller models). Hence, for large enough models, we can expect a Hawk[OrientDB] store to be around 4.3 times as large as its XMI file and a Hawk[Neo4J] store around 2.9 times as large. Furthermore, the results seem to show that Hawk[Neo4J] stores the data more efficiently than Hawk[OrientDB], which can be expected as it handles references in a more lightweight fashion, as explained in Section 4.3.1.1. The ratio between Hawk[Neo4J] and Hawk[OrientDB] seems to indicate that the relative overheads of the two databases discussed above are similar, but that Hawk[OrientDB] is less efficient in this regard (with a 19.2% delta between the Hawk[Neo4J] – Hawk[OrientDB] ratio at set0 and the one at set4).


(Six line charts plotting number of times loaded against time (×100 sec), comparing XMI, Neo4J, OrientDB, Neo4J (EMC) and OrientDB (EMC): (a) performance for set2; (b) performance for set2 (EMC); (c) performance for set3; (d) performance for set3 (EMC); (e) performance for set4; (f) performance for set4 (EMC).)

Figure 5.3.: Performance comparison for executing the Grabats Query from XMI through Hawk’s chosen persistence mechanism using native and EMC querying


5.3.2. Benchmarking of Incremental Updating in Hawk

In this section, models from the Grabats benchmark are used to conduct performance tests for updating a Hawk model index. These models are mutated in order to simulate changes that are picked up by Hawk. This work has been published in [66].

Execution Environment Performance figures that have been measured on a PC with Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz, with 32GB of physical memory, a Solid State Drive (SSD) hard disk, and running the Windows 7 (64 bits) operating system are presented. The Java Virtual Machine (JVM) version 1.8.0 20-b26 has been restarted for each measure as well as for each repetition of each measure. In each case, 20GB of RAM has been allocated to the JVM (which includes any virtual memory used by the embedded Neo4J database server running the tests).

Model Manipulation In order to perform model manipulation operations, we used Epsilon’s EOL. We decided to perform five model mutations (changes), which are representative of modifications performed in Java code. These mutations are performed by five EOL operations (shown in Appendix B, and available online4). The first operation covers model element deletion, by removing a random TypeDeclaration model element and all its children; the second operation covers model element creation, by inserting a new TypeDeclaration element as well as a new SimpleName element connected to it; the third operation covers complex model manipulation, by adding a new MethodDeclaration element to a random TypeDeclaration, setting a new Modifier on this MethodDeclaration, and then setting the returnType of the TypeDeclaration to a newly created SimpleType linked with a random SimpleName; the fourth operation covers attribute change, by altering the value of a random attribute of a random Modifier of a random MethodDeclaration; the final one also covers attribute change, by altering the attribute fullyQualifiedName of the SimpleName of a random TypeDeclaration. By using these operations in an EOL script we can change the model it is run on using operations often used in the manipulation of Java code, such as deleting a Java class (indicative of a real-world scenario where a developer is updating their Java model), while also introducing randomness (for example, by randomizing which Java class is deleted each time) in order to try to limit any bias in pre-selecting which elements are manipulated in each case.

4 https://github.com/kb634/mondo-hawk/blob/master/model_manipulations.eol
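As an illustration of what the first of these EOL operations does, the Java/EMF sketch below deletes a random TypeDeclaration directly from a loaded resource. Looking types up by class name and the helper class itself are simplifications for this sketch, not the actual benchmark code.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.util.EcoreUtil;

public class DeleteRandomTypeDeclaration {
    static void mutate(Resource resource, Random random) {
        // collect all instances of TypeDeclaration in the resource
        List<EObject> typeDeclarations = new ArrayList<>();
        resource.getAllContents().forEachRemaining(e -> {
            if ("TypeDeclaration".equals(e.eClass().getName()))
                typeDeclarations.add(e);
        });
        if (!typeDeclarations.isEmpty()) {
            EObject victim = typeDeclarations.get(random.nextInt(typeDeclarations.size()));
            // remove the element and all of its children, also unsetting any
            // incoming cross-references to the deleted subtree
            EcoreUtil.delete(victim, true);
        }
    }
}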


5.3.2.1. Model Update Execution Time

Table 5.5 shows the average time taken to complete an update for the models produced by performing the model mutations presented above on the original Grabats models. M(INS) represents the initial insert of the original Grabats models into an empty Hawk model index (using the naive insert process) and M(0%) – M(50%) represent the update time (from the original model) to one containing 0% to 50% content mutations. These mutations contain an equal degree of each mutation operation found in Section 5.3.2, so that the total change to the model ends up being N% of the original model contents; as such, each of the five mutation operations performs changes equal to N/5% of the original model elements (for example, a 10% mutation of set0, with its 70,447 elements, means each operation changes around 1,409 of them). Since some changes are addition/removal operations on model elements, the size of the resulting model is not the same as that of the original.

Table 5.5.: Update execution time results

Execution Time (in seconds)
Mutation    Set0 Naive    Set0 Inc.    Set1 Naive    Set1 Inc.    Set2 Naive    Set2 Inc.    Set3 Naive    Set3 Inc.    Set4 Naive    Set4 Inc.

M(INS) 9.96 n/a 18.69 n/a 118.19 n/a 291.06 n/a 346.46 n/a

M(0%) 16.61 2.70 45.72 6.07 - 63.96 - 162.52 - 224.85

M(10%) 16.82 3.94 47.71 10.45 - 94.59 - 247.94 - 292.86

M(20%) 17.76 4.71 48.22 11.53 - 115.86 - 364.94 - 417.50

M(30%) 18.93 5.66 50.60 15.04 - 145.56 - 440.78 - 622.51

M(40%) 21.84 7.04 54.73 18.79 - 165.48 - 781.35 - -

M(50%) 22.09 7.97 60.21 20.92 - 193.41 - - - -

For each case both the incremental and naive updates were tested and compared with one another. The naive update follows the process described previously for naive insertion, after the currently indexed elements have been deleted from the model index. As the naive update process failed to terminate for the larger sets (Set2 – Set4), figures for these models are not presented for the naive update process. The reason for this failure is that the Neo4J back-end runs out of memory when trying to delete the entire contents of the model index. This is an unforeseen limitation of the Neo4J database: we require it to perform a single transaction deleting the entire contents (as it does not support nested transactions but only flattened nested transactions, which only commit when the top-level transaction is closed) in order to maintain consistency between model versions.

We also note that the incremental update fails to complete for 50% of Set3, 40% of Set4 and 50% of Set4. This is due to the fact that the magnitude of the change is so large that not enough memory is available for Neo4J to fit the change in a single transaction. The aim here is to test the limits of Hawk: such a system typically aims at collecting a large amount of fragmented models and not large monolithic ones, and in the former case memory would not be an issue, as it can be flushed after each file is updated. Furthermore, a 40% or 50% change on a model with millions of elements is not an expected use-case and again is presented to test the limits of the system.

These results suggest that the incremental update process is substantially faster than the naive approach, while also not compromising the availability of the model index5. This can be largely attributed to there being no support for “mass deletes” in the model index, which ends up taking the majority of the time needed for a naive update. The actual time taken for the incremental updates is promising, as it scales linearly with the magnitude of the change in the model, giving a decrease in execution time of up to 78.10% for a 10% model change and up to 65.25% for a 50% model change, averaging a 70.7% decrease in execution time over all of the comparable results6.

5.3.2.2. Derived Attribute Update Execution Time

Results for the execution time of altering derived attributes are not presented, as they would have to be compared to a baseline. Such a baseline would have been a naive approach whereby all derived attributes in the model index would have to be re-computed any time any model element or feature is added/updated/removed. As Hawk is a model index working with large collections of models, likely with multiple derived attributes, the small overhead caused by the incremental process (storing and updating property accesses) was deemed negligible when compared to that of having to fully re-compute every single derived attribute any time anything in the index changes. As such, a naive approach was never implemented, so a meaningful comparison cannot be made.

5 as it does not block any incoming queries which may need to be performed
6 the 10 results from set0 and set1 for which both the naive and incremental approaches completed, disregarding the 0% change values as they are presented as a baseline


5.3.2.3. Threats to Validity

There are five observed threats to the validity of this approach:

• The model mutations performed may have influenced the results. We tried to limit this by performing multiple mutations in each case, all of which contain a random factor in them. An example involving real-world changes on a large collection of models can be found in Section 5.3.4.

• The percentage change of each model may not be indicative of real model change. We tried to limit this by exploring a large variety of changes ranging from zero to fifty percent of the original model. An example involving real-world changes on a large collection of models can be found in Section 5.3.4.

• The model sizes used for empirical evaluation may not be indicative. Hawk aims at handling large collections of (possibly fragmented) models (as defined in Sec- tion 3.4), thus we anticipate that the size of each fragment will not be orders of magnitude greater than the test models. An example using a large collection of evolving small models can be found in Section 5.3.4.

• This version of Hawk used an integer representation for the signatures, which has a chance of collisions; this chance tends to 1 in 4.29 billion for non-trivial Strings. In all of the empirical tests performed no clashes have been observed, which gives us some confidence that the approach can be used for performance reasons. It is worth noting that in later versions of Hawk a SHA-1 signature is used to further decrease the chance of collisions (as of October 2015, no actual collisions are publicly known for any process using SHA-1).

• The last one regards the correctness of the incremental algorithm. While this is not formally proven, empirical tests comparing the model index state after an incremental update with that after the naive update previously used in Hawk (for the same changes) showed identical results for all of the mutated models where both the incremental and naive updates completed. Sections 5.3.4.2 and 5.4.4.2 present further empirical tests performed to evaluate the correctness of this algorithm.


5.3.3. Benchmarking of Derived and Indexed Attributes in Hawk

As Hawk supports the creation of derived attributes as well as the indexing of attribute values, this section (published in [67]) evaluates the performance benefit of using such features for query execution.

Execution Environment Performance figures that have been measured on a PC with Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz, with 32GB of physical memory, a Solid State Drive (SSD) hard disk, and running the Windows 7 (64 bits) operating system are presented. Java Virtual Machine (JVM) version 1.8.0 20-b26 has been used and both the database connection to Neo4J and the Epsilon driver have been restarted for each measure as well as for each repetition of each measure. In each case, 20GB of RAM has been allocated to the JVM (which includes any virtual memory used by the embedded Neo4J database server running the tests).

5.3.3.1. Derived Attribute Definition

In order to effectively use (database) indexed attributes to optimize the execution of the Grabats query, due to the nature of the Grabats metamodel, we also have to use the derived attributes feature Hawk offers. This functionality allows any applicable EOL expression to be used to evaluate the values of derived attributes during the model indexing process. The derived attributes used are the following:

• isPublic (in MethodDeclaration), which denotes whether the MethodDeclaration is public (has a modifier with attribute public = true), defined by:

self.modifiers.exists( m:Modifier | m.public == true )

• isStatic (in MethodDeclaration), which denotes whether the MethodDeclaration is static (has a modifier with attribute static = true), defined by:

self.modifiers.exists( m:Modifier | m.static == true )

• isSameReturnType (in MethodDeclaration), which denotes that the return type of the MethodDeclaration is the same type as its containing TypeDeclaration (both have the same name), defined by:


self.returnType.isTypeOf(SimpleType) and
self.eContainer.isTypeOf(TypeDeclaration) and
self.returnType.name.fullyQualifiedName == self.eContainer.name.fullyQualifiedName

• singleton (in TypeDeclaration), which denotes that the TypeDeclaration fulfills all of the criteria posed by the Grabats query (has at least one method which is public and static and has this type as its return), defined by:

self.bodyDeclarations.exists( md:MethodDeclaration |
  md.modifiers.exists( mod:Modifier | mod.public == true ) and
  md.modifiers.exists( mod:Modifier | mod.static == true ) and
  md.returnType.isTypeOf(SimpleType) and
  md.returnType.name.fullyQualifiedName == self.name.fullyQualifiedName
)
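Programmatically, registering such a derived attribute could look like the sketch below. The IModelIndexer interface shown is a hypothetical stand-in reduced to the single call this example needs; the method name, parameters and derivation-language identifier are assumptions rather than a transcription of Hawk’s real API. The metamodel URI is the one seen in Listing 5.2.

// hypothetical stand-in for Hawk's indexer API, reduced to one method
interface IModelIndexer {
    void addDerivedAttribute(String metamodelUri, String typeName, String attributeName,
            String attributeType, boolean isMany, boolean isOrdered, boolean isUnique,
            String derivationLanguage, String derivationLogic);
}

class DerivedAttributeSetup {
    static void registerSingleton(IModelIndexer indexer) {
        indexer.addDerivedAttribute(
            "org.amma.dsl.jdt.dom",   // JDTAST metamodel URI, as seen in Listing 5.2
            "TypeDeclaration",
            "singleton",
            "Boolean", false, false, false,
            "EOL",                    // assumed identifier for the EOL query engine
            "self.bodyDeclarations.exists(md:MethodDeclaration|"
            + "md.modifiers.exists(mod:Modifier|mod.public==true) and "
            + "md.modifiers.exists(mod:Modifier|mod.static==true) and "
            + "md.returnType.isTypeOf(SimpleType) and "
            + "md.returnType.name.fullyQualifiedName == self.name.fullyQualifiedName)");
    }
}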

5.3.3.2. Query Definition and Execution Time

Table 5.6 shows the results for performing the first Grabats 2009 query on the various persisted models. It is worth noting that, in this case-study, the impact of keeping these derived and indexed attributes in Hawk (in terms of additional time needed during model insertion into Hawk) was negligible (less than 1% of total insertion time in each case). For these tests five queries have been written in EOL (Q1 – Q5):

• Q1 reads:

TypeDeclaration.all.select( td | td.bodyDeclarations.exists(
  md:MethodDeclaration |
    md.modifiers.exists( mod:Modifier | mod.public == true ) and
    md.modifiers.exists( mod:Modifier | mod.static == true ) and
    md.returnType.isTypeOf(SimpleType) and
    md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
) )

Table 5.6.: Grabats Query execution time results

Execution Time (in seconds)
            Original    DerivedOnly             Derived&Indexed
Model       Q1          Q2          Q3          Q4          Q5

Set0 0.040 0.031 0.021 0.062 0.020

Set1 0.081 0.051 0.030 0.072 0.042

Set2 1.850 1.282 0.262 0.302 0.072

Set3 3.664 2.688 0.732 0.852 0.080

Set4 4.124 2.847 0.772 0.902 0.081

This query (Q1) is the basic Grabats query using the original metamodel to insert the relevant models into Hawk. As such it only uses attributes found in the unaltered JDTAST metamodel and is used as a baseline for comparison.

• Q2 reads:

TypeDeclaration.all.select( td | td.bodyDeclarations.exists(
  md:MethodDeclaration |
    md.isPublic == true and
    md.isStatic == true and
    md.isSameReturnType == true
) )

This query (Q2) assumes that relevant derived attributes have been created by Hawk during insertion. As such, it uses attributes found in the original JDTAST metamodel as well as the derived attributes ‘isPublic’, ‘isStatic’ and ‘isSameReturnType’.

• Q3 reads:

TypeDeclaration.all.select( td | td.singleton == true )

This query (Q3) assumes that relevant derived attributes have been created by Hawk during insertion. As such, it uses attributes found in the unaltered JDTAST metamodel as well as the derived attribute ‘singleton’.

• Q4 reads:

MethodDeclaration.all.select( md |
  md.isPublic == true and
  md.isStatic == true and
  md.isSameReturnType == true
).collect( md | md.eContainer )

This query (Q4) assumes that relevant derived attributes have been created by Hawk during insertion and is a re-written form of Q2 which takes advantage of (database) indexing of the attributes ‘isPublic’, ‘isStatic’ and ‘isSameReturnType’ in order to optimize performance. Using the eContainer call (in the spirit of EMF’s homonymous method) we can get the TypeDeclarations that the MethodDeclarations are contained in and thus report the same output as Q2.

• Q5 is the same as Q3, as it can take advantage of attribute indexing as-is (should the relevant custom indexes exist in the store).

Similar to the discussion in [67] (which compares the use of derived attributes in an older version of Hawk), Q2 and Q3 (which only use the derived attribute feature of Hawk) perform substantially better than the original Q1, with observed improvements of 22.5% – 37.0% when comparing Q1 and Q2, and 47.5% – 85.8% when comparing Q1 and Q3. Comparing Q2 with Q4 (as they have access to the same derived attributes, but in Q4 these are also (database) indexed), we note that for the small model sizes the overhead of using database indexing results in execution times similar to those without it, but for the larger models we note a large improvement (of 68.3% – 76.4%).


Comparing Q3 with Q5 (as they have access to the same derived attributes, but in Q5 these are also (database) indexed), we can also see that for the small model sizes the overhead of using database indexing results in similar execution times, but for the larger models we note a large improvement (of 72.5% – 89.5%). These results support the idea that, for sufficiently large model sizes, the targeted use of custom indexes for attributes can greatly improve the performance of certain types of queries, while not compromising the querying of smaller models.

5.3.4. Benchmarking of Continuous Model Updates in Hawk

The BPMN MIWG repository presented in Section 5.2.2 is used to evaluate the performance and correctness of Hawk when dealing with a large collection of small, evolving models. This section presents the methodology used and the results obtained during this experiment.

Execution Environment Performance figures that have been measured on a PC with Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz, with 32GB of physical memory, a Solid State Drive (SSD) hard disk, and running the Windows 7 (64 bits) operating system are presented. Java Virtual Machine (JVM) version 1.8.0 65-b17 has been used and the database has been re-created from scratch for each repetition of each measure. In each case, 3GB of RAM has been allocated to the JVM (which includes any virtual memory used by the embedded Neo4J database server running the tests, but not any virtual memory used by OrientDB, as that memory does not use the Java Heap).

Methodology As the BPMN MIWG repository contains hundreds of commits (490 as of November 2015), a reasonable heuristic had to be created for deciding which commits to use in order to emulate a real-life model evolution scenario. The following data was gathered when analyzing the various commits:

• There were 485 commits which resulted in at least one BPMN file being changed

• There were 457 commits with over 10 BPMN files changed

• There were 397 commits with over 50 BPMN files changed

• There were 122 commits with over 250 BPMN files changed


Using this data, commits that resulted in the change of over 250 BPMN files were considered, and out of those commits only one in every five was selected, resulting in 25 commits being flagged.
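This selection heuristic can be expressed compactly; the Java sketch below assumes the repository-mining tooling provides, for each commit in chronological order, the number of BPMN files it changed (the Commit record here is hypothetical).

import java.util.ArrayList;
import java.util.List;

public class CommitSelection {
    record Commit(String sha, int changedBpmnFiles) {}

    static List<Commit> flag(List<Commit> chronological) {
        // keep only commits that changed more than 250 BPMN files
        List<Commit> large = new ArrayList<>();
        for (Commit c : chronological)
            if (c.changedBpmnFiles() > 250)
                large.add(c);
        // then take one in every five of those
        List<Commit> flagged = new ArrayList<>();
        for (int i = 0; i < large.size(); i += 5)
            flagged.add(large.get(i));
        return flagged;  // 122 qualifying commits yield the 25 flagged ones
    }
}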

Using these 25 flagged commits, Hawk was initially provided with the first (chronologically) commit and then with each successive commit to synchronize with. Figure 5.4 shows the changed (file-based) model resources/elements resulting from each successive commit, with respect to the previous one (the initial commit is with respect to an initially empty store). Note that in various commits the loaded and deleted resources do not add up to the total altered resources (Figure 5.4a), as some model files were either malformed or contained metamodels not present in Hawk, and hence the EMF-based Hawk BPMN model factory was not able to load the relevant resource into memory.

(Two logarithmic charts over the 25 relative commit numbers: (a) number of resources loaded in each commit — altered resource files, loaded resources and deleted resources; (b) total number of elements affected by each commit — added/updated elements and deleted elements.)

Figure 5.4.: BPMN benchmark model change results


5.3.4.1. Update Performance Results

Figure 5.5 displays the execution time of the synchronization process for each commit. As expected, the first commit takes the most time, as it has over 200 files (and successfully loaded resources) to fully insert into Hawk; the execution time of subsequent commits is substantially lower, as they only need to handle 17 model files on average (either changed (and successfully loaded into a resource) or removed), and in some cases can perform an incremental update (as 37 of the 194 subsequently loaded resources could be incrementally updated). Compared to one another, both back-end technologies perform very similarly, with a difference of ∼30% in overall time taken to complete the benchmark.

(Logarithmic chart of synchronization time in seconds against relative commit number, for Neo4J and OrientDB.)

Figure 5.5.: BPMN benchmark execution time (Hawk synchronizing with each commit)

Figures 5.6 and 5.7 display the memory use of (a representative run of) Hawk throughout this benchmark7 (using Neo4J and OrientDB respectively). As Java divides heap memory into three areas (eden, survivor and old gen)8, all three are presented here, but for the sake of simplicity we will only consider their sum at each given time period (i.e. the total memory use of the program at any given time). Line L1 denotes the time when the synchronization with the first commit finishes and line L2 denotes the time when the synchronization with the last commit finishes. Looking at the time-line between the start of the benchmark and L1, Hawk’s memory use is consistently low in both cases, as Hawk only loads one resource at a time to insert it into its back-end (so Java garbage collection can keep discarding the old objects used for managing previous files). In the case of Hawk[Neo4J], the memory used is largely for the embedded MMIO

7 using the YourKit Java profiler: https://www.yourkit.com/
8 http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf

of Neo4J (hence it rapidly reaches a peak (of around 250MB) and remains there for the duration of the update), while in Hawk[OrientDB] (whereby the MMIO is not embedded into the Java Heap but directly uses system RAM), the memory use rises consistently during the update, as OrientDB keeps strong references to all of its newly formed elements; Neo4J can be seen as more efficient in this case, as it does not need this memory but can quickly flush newly formed objects to disk, seen in the intermittent peaks in memory use during the update process. Looking at the time-line between L1 and L2, Hawk’s memory periodically spikes as numerous batch inserts, deletions or incremental updates are performed (incremental updates require data to be retrieved from disk so that the currently indexed version of the model can be compared to the changed version in the new commit). In the case of Hawk[Neo4J], the memory used keeps resetting down to near its MMIO baseline every time the Java garbage collector deems it necessary, while in the Hawk[OrientDB] case it keeps resetting down to nearly zero, as the MMIO is not embedded into the Java Heap. Finally, the time-line between L2 and the end of the recording of the benchmark shows a constant memory footprint, when no changes are detected by Hawk: the memory keeping the Neo4J caches of recently accessed nodes remains constant, and so does the memory used by OrientDB keeping soft references to its MMIO outside the Java Heap.

5.3.4.2. Update Validation Results

During the execution of this benchmark the Hawk validation listener (Section 5.1.1.2) was used to analyze the contents of the model index after each synchronization and compare them with any changed model files involved in that commit. For each of the 25 commits the validator reported that Hawk had a state consistent with the model files, which provides empirical evidence that Hawk’s batch injection and incremental update processes are both correct.


Figure 5.6.: Memory graph of full BPMN benchmark execution (Hawk[Neo4J]) — memory (MB) against time (minutes), showing used PS eden space, used PS survivor space, used PS old gen and allocated memory across all pools, with markers L1 and L2.


Figure 5.7.: Memory graph of full BPMN benchmark execution (Hawk[OrientDB]) — memory (MB) against time (minutes), showing used PS eden space, used PS survivor space, used PS old gen and allocated memory across all pools, with markers L1 and L2.


5.4. Hawk Tool Integration

Hawk has been integrated with several external MDE tools (in collaboration with various MONDO project partners – detailed in each section). This section provides insight into the integration and the reasoning behind it.

5.4.1. Epsilon Integration

The first tool integrated with Hawk was Epsilon, in order to provide it with a query engine (as discussed in Section 4.3.6.2). This allowed for empirical tests comparing the performance of querying Hawk using a native Java approach with that of using the Epsilon Object Language as a query expression language (Section 5.3.1.2). Even though this effort was not done collaboratively, it is mentioned here for completeness.

5.4.2. Exposing Hawk as an EMF Resource

As Hawk deals primarily with EMF-based models provided by its various model resource factories, and to allow easy integration with any EMF-compatible tool and language, Hawk can be exposed as a read-only EMF resource. When compared to a usual EMF resource, Hawk provides various useful optimizations to improve performance:

Lazy Loading An important ability of a Hawk EMF resource is to load its contents in a lazy manner. This functionality results in partial traversal of the Hawk index, based on various options such as lazy loading only of root elements, lazy loading only of proxies to model elements until their features are requested and lazy loading only of attributes of elements until the targets of references are needed.

Editor Integration A Hawk EMF resource can be lazily loaded in the EMF-based Exeed tree-based editor of Epsilon in order to be incrementally visualized, without having to front-load the entire model beforehand. This allows for large models that would normally take a long time to load, or would not load at all (as they would require more memory than that available to the tool in order to be loaded from disk), to be partially loaded using the various lazy loading strategies mentioned above.

Convenience Methods Hawk’s EMF resource also provides various convenience methods that can be used to improve the performance of using such resources for various model management operations:

• fetchNodes(eClass, fetchAttributes), which retrieves all model elements of type eClass and returns them as an EList of EObjects. Optionally it can also fetch all the attribute values of these elements. This method is similar to the Type.getAllOfType() method Epsilon provides.

• fetchValuesByEClassifier(dataType), which lists all values of EStructuralFeatures of type dataType.

• fetchTypesWithEClassifier(dataType), which maps all EClasses with an EStructuralFeature of type dataType to a List of their relevant EStructuralFeatures.

• fetchValuesByEStructuralFeature(feature), which maps all EObjects with a specific EStructuralFeature to their value of this EStructuralFeature.
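A minimal usage sketch of the first of these methods is shown below; HawkEmfResource is a hypothetical interface capturing only that method, and the exact class names, signatures and checked exceptions in Hawk’s implementation may differ.

import org.eclipse.emf.common.util.EList;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EObject;

// hypothetical stand-in exposing one of the convenience methods listed above
interface HawkEmfResource {
    EList<EObject> fetchNodes(EClass eClass, boolean fetchAttributes);
}

class HawkResourceClient {
    // equivalent in spirit to Epsilon's Type.getAllOfType()
    static int countInstances(HawkEmfResource resource, EClass eClass) {
        return resource.fetchNodes(eClass, false).size();
    }
}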

This work was done in collaboration with Dr. Antonio Garcia Dominguez.

5.4.3. Remote Query API (using Apache Thrift)

In order to connect to Hawk remotely, a remote API has been created and implemented using Apache Thrift9. This feature allows clients to connect to remote instances (of Hawk model indexers) and to perform the same operations as on local instances. Notifications provided by remote instances are handled using Apache ActiveMQ Artemis10; this remote queue allows for handling of disconnects and for compressing large change notifications to reduce network overhead. Results of queries are serialized using Thrift and can be de-serialized on the client side into various forms, such as EMF resources. This remote Hawk EMF resource provides similar functionality to the local one described in Section 5.4.2, allowing for lazy loading and resolution in order to reduce network overhead. This architecture can be seen in Figure 5.8. This work was done in collaboration with Dr. Antonio Garcia Dominguez.

9 https://thrift.apache.org/ 10 https://activemq.apache.org/artemis/
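As an illustration only, the sketch below shows what issuing a remote query through Thrift's generated Java bindings could look like; the service name Hawk, its query operation, and the host, port and engine identifier are assumptions made for this sketch, not Hawk's actual Thrift IDL.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Hypothetical generated client: assumes a Thrift service named "Hawk"
// exposing a query(indexName, queryString, language) operation.
TTransport transport = new TSocket("hawk.example.org", 8080);
transport.open();
TProtocol protocol = new TBinaryProtocol(transport);
Hawk.Client client = new Hawk.Client(protocol);

String result = client.query("myIndex",
        "return TypeDeclaration.all.size();", // an EOL query string
        "EOLQueryEngine");                    // illustrative engine name
transport.close();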


Figure 5.8.: Architecture of Hawk’s remote EMF API

5.4.4. EMF IncQuery Integration

As part of the efforts for the integration deliverable of the MONDO project, Hawk has been integrated with IncQuery. This section briefly introduces this tool and describes the evaluation opportunities it opened for Hawk.

5.4.4.1. EMF IncQuery

EMF IncQuery11 offers an incremental graph query engine for EMF-based models. Using its declarative query language, users are able to formulate graph pattern matching queries [77]. The tool uses an adaptation of the RETE algorithm [78] in order to incrementally calculate the results of queries based on previous results and the collection of changes since these results were obtained; it caches current results in memory and keeps them up to date with any changes made to the model. IncQuery also offers a BaseIndexer API12 that can be used to optimize the connection of any EMF-based persistence layer, so that the engine can take full advantage of the underlying store's capabilities. As such, implementing this API for Hawk (through Hawk's EMF resource implementation (Section 5.4.2)) has allowed Hawk model indexes to be exposed to IncQuery and used for answering queries performed through its language.

11 https://www.eclipse.org/incquery/ 12 https://wiki.eclipse.org/EMFIncQuery/UserDocumentation/API/BaseIndexer

5.4.4.2. The Train Benchmark

As part of the evaluation of IncQuery, a custom benchmark has been created. This benchmark's main goal is to measure the execution time of graph-based querying, with emphasis on the incremental re-evaluation of queries as models evolve13. It has been used in various publications, including [79–81]. It uses the metamodel shown in Figure 5.9 and systematically generates instance models of incrementally larger sizes, using various randomization points, such as the exact number of elements and cardinalities.

Figure 5.9.: The train benchmark metamodel

These models are then used to simulate a real-world scenario where faults emerge in the railway system and are repaired by engineers. This sequence of fault injection and repair comprises model evolution (in the form of in-place model transformations), and the tools used are expected to provide consistent results in each case (as well as being performant).

13 https://www.sharelatex.com/github/repos/FTSRG/trainbenchmark-docs/builds/latest/output.pdf


Hawk Correctness With respect to the Hawk integration, performance considerations have not yet been made, but the experiments performed do provide evidence of the correctness of Hawk's incremental update process, as well as that of the EMF resource API it offers. The train benchmark offers a collection of 24 JUnit14 test cases, each of which tests and evaluates the performance and correctness of different operations performed on a specific train model with respect to a set of well-formedness constraints (for example, the ConnectedSegments test case injects faulty sensors with more than five segments (Figure 5.10a) and then repairs some of these sensors (Figure 5.10b)). As Hawk has passed all of these test cases, further empirical evidence of Hawk's correctness is obtained, both for the contents of the model indexer after initial and incremental updates are performed, and for the results obtained when using Epsilon as a query engine. This work was done in collaboration with Dr. Antonio Garcia Dominguez and Gábor Szárnyas.

(a) Inject transformation (b) Repair transformation

Figure 5.10.: ConnectedSegments inject and repair transformations

5.5. Additional Drivers for Hawk

Even though Hawk's primary drivers work with SVN, EMF, Neo4J and Epsilon, various other drivers have been developed that support Hawk's extensibility argument:

14 http://junit.org/


5.5.1. Alternate Version Control Managers

• As presented in Section 4.3.3.2, Hawk supports indexing models found on the local machine through the LocalFolder driver.

• As presented in Section 4.3.3.4, Hawk can index models located in a local Eclipse workspace using the Workspace driver.

5.5.2. Alternate Model Factories

• As presented in Section 4.3.4.2, Hawk is able to index IFC models stored in XML or STEP formats through its IFC*ResourceFactory drivers.

• As presented in Section 4.3.4.2, Hawk is able to index Modelio UML models stored in Modelio’s XML format using its Modelio*ResourceFactory drivers.

5.5.3. Alternate Persistence Technologies

• A driver for OrientDB has been developed (and kept up to date in collaboration with one of the MONDO project partners – Dr. Antonio Garcia Dominguez), and can be used as an alternative back-end for Hawk. Section 5.3.1 compares the performance of Neo4J and OrientDB for indexing and querying a set of large models and Section 5.3.4 presents a similar comparison for indexing a large collection of smaller models and their evolution over time.

5.6. Research Hypothesis Evaluation

In the preceding sections, the empirical results obtained via extensive benchmarking of Hawk were presented; these benchmarks tested the various aspects of the system in terms of correctness and performance. The results provide sufficient confidence that the research hypothesis can be validated, showing that the overhead of computing model-element-level queries over large (collections of) models stored in a file-based VCS can be significantly reduced using a non-invasive model-indexing system orthogonal to the specific VCS or model representation format. The research objectives presented in Section 3.3 have been met:


(i) The prototype being evaluated (Hawk) provides sufficient validation of the research hypothesis as the evaluation results obtained as part of this research support the argument that a model indexing system can help reduce the query overhead on large (collections of) models.

(ii) An appropriate back-end has been selected, showing promising results in terms of model insertion, model update and query execution performance, when compared to various alternatives evaluated as part of this work, as well as to current state-of-the-art model persistence approaches.

(iii) Hawk offers support for three modeling technologies (EMF, BIM IFC, Modelio UML) and four model persistence formats (EMF XMI, IFC STEP, IFC XMI, Modelio XML).

(iv) Hawk offers support for SVN and Git repositories as well as local folders and Eclipse workspaces.

(v) Results on query execution show Hawk’s ability to offer a performant querying environment for both large models and large collections of model fragments.

(vi) Results on index content (insertion and updating) performance and correctness offer sufficient evidence of meeting Hawk's requirements of correctness and incrementality.

(vii) Results show Hawk competing with current state-of-the-art modeling tools, offering better results in most cases, for the queries in the benchmarks performed as part of this work.

These results encourage further development of this approach in the future, discussed in Section 6.4.


6. Conclusions

This chapter summarizes the findings of this thesis, presents the various contributions it has made to the field of software engineering and discusses various paths for extending this work, which can be taken in the future.

6.1. Summary

In this work we presented Hawk, a model indexing framework aimed at tackling some of the scalability issues identified in the field of MDE. Chapter 2 introduced the reader to the field of MDE and to file-based version control; it reviewed many of the state-of-the-art tools and technologies used in MDE today and discussed their capabilities; finally, it delved into model querying, one of the important aspects of MDE that commonly bottlenecks the process when very large models (or large collections of inter-connected models) are used. Chapter 3 synthesized the findings of the review, presented the research hypothesis this work is based upon and introduced the reader to the concept of model indexing; it discussed the research objectives and identified the scope of the research. Chapter 4 delved into the inner workings of the proposed framework by detailing the architectural, design and implementation decisions made, while presenting the various key processes, procedures and algorithms used. Finally, Chapter 5 detailed a thorough empirical evaluation of the various aspects of Hawk, which provided confidence that its intended capabilities (both functional and non-functional) are able to be met.

6.2. Contributions of the Thesis Research

This section presents the various contributions this work has offered to the field of software engineering, and in particular to the work on scalability in MDE. It is split into two subsections: one for the novel tools/techniques developed as part of this work and one for other important side-products the research has produced.


6.2.1. Novel Tools and Techniques

As part of this work, several novel approaches have been proposed for tackling the various scalability aspects presented.

6.2.1.1. Heterogeneous Model Indexing Platform

The primary contribution of this research is in the form of Hawk, a scalable heterogeneous model indexing framework. It provides validation of the research hypothesis of this work and offers an alternative that works alongside a widely-used paradigm in MDE:

Scalability. Hawk offers a scalable solution to indexing large collections of models found in VCS by using graph-based NoSQL stores to persist the data as a property graph. Empirical results presented in this work have provided confidence that the approach taken is effective both in terms of persisting and in terms of querying models of increasing sizes and numbers.

Modeling Technology Heterogeneity. Hawk supports indexing of EMF-based models, as well as offering the capability of adding drivers for conceptually any modeling technology to be used for providing models to Hawk. As examples of alternate technologies Hawk can handle, prototype drivers for Modelio UML and IFC BIM models have been created, in collaboration with the MONDO research partners.

Extensibility. Hawk offers extensibility mechanisms for adding new drivers for alternate VCSs in which models can be stored, as well as alternate back-end stores and model updaters. Multiple drivers have been created to validate this claim, including LocalFolder and Workspace drivers as well as one for OrientDB. Furthermore, Hawk has been integrated with various other modeling tools, such as Epsilon (for querying), IncQuery (for validation) and EMF (for allowing any EMF-based tool, for example the EMF tree editor, to work with Hawk), further validating this architecture.

6.2.1.2. Incremental Updating of Model Indexes

Another novel contribution regards the use of incremental updates in model indexes. This work has presented an approach using model element signatures for efficiently calculating the changed elements between two versions of a model, and hence updating the model index only with the elements which have actually changed. This automated process allows a semantic, model-level incremental update to be performed on model indexes, for which the empirical data gathered and presented here shows promising results.

6.2.1.3. Incremental Updating of Derived Attributes

A further contribution of this work is the novel approach used for incrementally re-calculating only the relevant derived attributes after each model index update. By only having to re-execute the expressions of derived attributes known to have been affected by a change, the large overhead of redundantly re-evaluating unchanged attributes is eliminated.

6.2.2. Notable Side-Products

While this work produced various novel tools, algorithms and techniques, a large part of it offered interesting insights and quantitative results which, even though not novel themselves, shed useful light on the various scalability concerns raised in this research.

6.2.2.1. Evaluation of Model Persistence Technologies

This research, alongside the work presented in [41] (which was in collaboration with one of the MONDO research partners), provides a thorough evaluation of various novel persistence technologies in the context of MDE; it offers insight into the functionality and limitations of the various state-of-the-art tools reviewed.

6.2.2.2. Use of Derived and Indexed Attributes in Model Indexes

This research has presented the use of derived and indexed attributes to improve the performance of certain classes of queries on model indexes.

Database Indexed Attributes. Indexed attributes allow efficient equality and comparison operations to be performed on relevant attributes of all elements of a type without having to traverse the persisted model index, with empirical data showing large performance gains when such attributes are used.


Derived Attributes. Derived attributes can be computed using one of Hawk's query engines and stored in the relevant model elements. Such attributes not only allow the solution to a complex expression to be quickly retrieved on demand, but can also provide significant performance benefits when used in certain classes of queries, as supported by the empirical results obtained as part of this research.

6.2.2.3. Scalable Model Querying

The model indexing framework presented in this work lists among its primary requirements the ability to efficiently answer global queries on the large collections of models it indexes. In this work, an OCL-like query language is used to investigate the performance of querying such a model index, as well as the effect of its various optimizations (using indexed and derived attributes). The empirical results presented here support this claim, as the observed execution times for a complex query performed on models of increasing sizes are very promising.

6.3. Applications

Every use-case of MDE is different, whether regarding the domain in question, the environment surrounding the specific effort or the concrete set of scenarios tackled. As such, it would be remiss to say that any tool, whether Hawk or another MDE-based framework, would fit every (or even most) situations; each tool has its own unique characteristics and offers a different approach to managing models. In particular, positioning Hawk in today's MDE landscape would leverage its non-invasive, orthogonal approach to one of the prominent practices used today, namely storing file-based (possibly fragmented) models in a VCS. As such, modeling tools which save models as files (like EMF) can use a Hawk model index to perform efficient global queries, without having to obtain or load the models locally. Hawk's extensible infrastructure can be used for providing new drivers on demand, if the modeling technology is not currently supported by Hawk. For example, consider a company managing a large model of their evolving web-site, stored as multiple XMI files on an SVN server, and generating the code used for running it using model transformations (e.g. in the spirit of Jekyll1). They can use Hawk as the source for re-executing the transformation every time the model changes.

1 https://jekyllrb.com/

This would leverage the efficient querying of Hawk to provide the transformation engine with the data it needs in a much more efficient manner than loading all of the (remotely stored) XMI files into memory and then accessing the required elements.

Alternatively, consider an offline collaborative modeling scenario where a large fragmented model representing the building details of a football stadium is manipulated by five teams of developers and stored as a collection of IFC STEP files in a remote VCS repository. In this example, each team would only work with a specific subset of the model and hence would only check out and load those model fragments it uses. Should any team need to perform a query that may require access to the remainder of the model (such as one asking what the total projected cost of the stadium is in the current model version), they can use Hawk to do so without having to manipulate the model, or having to check out or load any additional files.

A tool such as CDO would be beneficial when the user wishes to actually persist their models in a model repository, and hence leverage the benefits it provides, but is not limited to using XML-based or other file-based MDE tooling. For example, in an online collaborative modeling scenario where a large model representing the design of a power plant is being developed by five teams of developers, CDO can offer a centralized location where all teams can concurrently update and query the model as it is being developed.

A tool such as NeoEMF can provide a database persistence format for EMF models and would be useful when the users do not need a repository but want to store large models in an efficiently queriable format which offers lazy loading of elements. For example, a company owning a large model representing its internal business processes can use NeoEMF to store it, so that whenever it has to query it for information, it can do so without having to load the entire model into memory, with the limitation that version control would have to be handled separately.

Tools such as Modelio, MagicDraw and MetaEdit+ offer extensive graphical UI functionality for their respective domains and facilitate the creation of UML or domain-specific models, while also offering various alternatives for how these models can be persisted or exported. For example, a business creating a new UML model of their envisioned structure can use these tools to create it in a graphical and domain-specific manner, hence allowing non-experts such as managers to understand it and contribute their expertise to its development.


Finally, a tool directly using EMF and its default XMI serialization can be beneficial when a model, for example of a legacy system which needs to be analyzed (and is already stored as an XMI file), is used as a one-time source of data (or used very sparingly), for a single query or transformation. In this case, the overhead of migrating this model into another model persistence format, a model repository or a model index would surpass the actual time needed to load this model into memory and run the relevant query or transformation on it. Furthermore, XMI persistence of a small model (in the order of thousands of model elements or smaller) causes a manageable overhead when loaded into memory (both in terms of memory and time, on contemporary hardware), such that having to migrate it to another system may introduce additional needs (such as training to use the new system, or financial overheads) that can be avoided without much actual performance loss in practice. Any effort spent on deciding which technology fits the user's specific scenario is well spent, as it will inevitably benefit them in the long term, when contrasted with attempting to make a specific technology fit these needs. Table 6.1 summarizes the approaches, categorizing the tools reviewed in Section 2.2 (as a summary of Table 2.1).

Table 6.1.: Categorization of state-of-the-art model persistence and versioning tools

Category                      Tools

Model Persistence Format
    File-based                EMF
    Database                  Teneo, Morsa, MongoEMF, NeoEMF

Model Repository              ModelCVS, CDO, EMFStore, Modelio, MagicDraw, MetaEdit+

Model Index                   Hawk

6.4. Future Work

Evaluating this model indexing approach against current state-of-the-art modeling tools and approaches has provided us with sufficient confidence that the research hypothesis is validated, giving incentive for it to be extended in the future.

Firstly, it is worth noting that a model index, even though it currently contains a full copy of the model contents found on the relevant version control system, does not have to. In principle, if some contents of the model are not deemed useful, they can be omitted in order to gain an improvement in insertion, update and possibly query time (this can be done by creating custom updaters, and using them instead of – or in conjunction with – the default one used to create the representation described in Section 4.3.1). Furthermore, meta-information regarding aspects outside the model can be stored in the model index, either formulated in the programming language of the system (such as Java for Hawk), or using the derived attribute functionality seen in Section 4.3.8.1; for example, metadata regarding the commits made for each model file (such as author, time-stamp or file size) can be stored alongside information extracted from the models.

A second aspect of this work that can be further investigated is optimizing various components of the system. The VCS managers currently work with base operations such as add, delete and update, but could handle complex events such as move in a much more efficient manner (for example, in Hawk a file move event currently results in the indexed file being removed and re-added, whereas a better approach would be to simply update the file path in the system). The metamodel managers currently assume immutable metamodels; hence, if a new version is needed, it has to either be added after removing the old version (alongside the entire contents of the indexed files depending on it) or as a new metamodel with a new unique namespace. A better approach would be to support metamodel evolution and only require re-indexing of dependent models when breaking metamodel changes are discovered.

A third aspect of interest is offering a distributed back-end for the system. Currently, both back-ends work on single-node machines; a better approach would be to offer horizontal scalability (for example in a network cluster or in the cloud) so that the order of magnitude of models the system is capable of efficiently handling is increased. This can be achieved either by adding support for distributed management of the current technologies used, or by investigating the relative effectiveness of a distributed layout of the various technologies reviewed in this work.

A final aspect that can be looked at is further integration of this approach with current state-of-the-art tools and practices. For example, a tool like OpenBIM could in principle directly use a model indexer to respond to queries or to efficiently create partial visualizations of subsystems (views of the system). Currently, the integration would have to be through EMF or Epsilon, but other query engines could be added to provide either a more efficient way to connect to specific tools, or additional functionality.


Appendices


A. Details on Hawk Interfaces

This appendix presents the Core interfaces Hawk exposes; implementations can be either connected together programmatically to form a running Hawk server, or through the Hawk user interface (shown in Appendix C), which uses the plugin extension point mechanism provided by the Eclipse IDE, discussed in Section 4.4.1.

IModelIndexer This interface provides the API for the Core component of Hawk. It offers the required methods for setting up a Hawk model indexing server and running any necessary operations on it (startup, maintenance, querying etc.); it acts as the central point of control for the system. Key methods are described below, followed by a sketch of how they can be wired together:

• init() This method is used to initialize Hawk. It should be called after all the relevant factories have been created. Implementations need to create a synchronization schedule (which needs to run in a separate non-daemon thread, ensuring the JVM stays active), as well as schedule the initial insertion of any static metamodels of known metamodel factories into Hawk (if they are not already present).

• shutdown() This method is used to gracefully terminate Hawk. No other methods should be called after this, other than init() (in case Hawk needs to be restarted). Implementations need to offer a serialization strategy for saving Hawk's metadata to the platform used to host Hawk, need to call the shutdown methods of all VcsManagers as well as of the IGraphDatabase Hawk is currently using, and then cancel the running synchronization strategy used by Hawk (in order to allow for a potential termination of the JVM, if required).

• addVCSManager(vcs) This method is used to add an implementation of an IVcsManager to Hawk. This allows Hawk to monitor such version control systems.


• addMetaModelResourceFactory(metaModelFactory) This method is used to add an implementation of an IMetaModelResourceFactory to Hawk. This will allow Hawk to be able to parse metamodels of this type.

• addModelResourceFactory(modelFactory) This method is used to add an imple- mentation of an IModelResourceFactory to Hawk. This will allow Hawk to be able to parse models of this type.

• setDB(db) This method is used to set the back-end of Hawk to the selected imple- mentation of an IGraphDatabase.

• addQueryEngine(q) This method is used to add an implementation of an IQueryEngine to Hawk. This allows Hawk to use this engine for querying and for creating derived attributes (more information can be found in Section 4.3.8.1).

• addModelUpdater(updater) This method is used to add an implementation of an IModelUpdater to Hawk. A model updater is responsible for handling the propagation of model changes (from monitored VCSs) to Hawk.

• setMetaModelUpdater(metaModelUpdater) This method is used to set the implementation of the IMetaModelUpdater in Hawk. The metamodel updater is responsible for inserting metamodels into Hawk.

• registerMetamodel(f) This method registers a metamodel originating in a File (f) with Hawk. Registering a metamodel comprises using the relevant IMetaModelResourceFactory to parse it into an IHawkMetaModelResource and then using the IMetaModelUpdater(s) to insert it into Hawk.

• synchronize() This method is called according to the synchronization schedule of Hawk. It finds out which files of interest to Hawk have changed, uses the relevant IModelResourceFactories to parse them into IHawkModelResources and propagates the resulting resources to Hawk’s IModelUpdaters.
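As an illustration of how these methods fit together, the sketch below wires up a Hawk instance programmatically. The calls follow the interface described above, but every concrete class name (ModelIndexerImpl, Neo4jDatabase, SvnManager and so on) is a hypothetical stand-in for Hawk's actual implementations.

import java.io.File;

// All constructors below are illustrative assumptions.
IModelIndexer hawk = new ModelIndexerImpl("myHawk", new File("indexFolder"));

hawk.addMetaModelResourceFactory(new EmfMetaModelResourceFactory());
hawk.addModelResourceFactory(new EmfModelResourceFactory());
hawk.setDB(new Neo4jDatabase());                  // back-end graph store
hawk.setMetaModelUpdater(new GraphMetaModelUpdater());
hawk.addModelUpdater(new GraphModelUpdater());
hawk.addQueryEngine(new EolQueryEngine());
hawk.addVCSManager(new SvnManager("https://svn.example.org/models"));

hawk.init();                                      // starts the synchronization schedule
hawk.registerMetamodel(new File("JDTAST.ecore")); // parse and insert the metamodel
// ... Hawk now synchronizes periodically and can be queried ...
hawk.shutdown();                                  // graceful termination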

IVcsManager This interface exposes a version control system to Hawk. It is used to find out which files in it have changed between two revisions and subsequently fetch these files so they can be updated in Hawk. Key methods are described below, followed by a skeletal implementation sketch:

• getDelta(startRevision) This method takes an initial revision and returns the files which have changed in the repository since that revision.

• importFiles(path,temp) This method imports one or more files from the VCS, using path as the path to the file(s) and temp as the local directory to import them to. It is worth noting that calling this method should only request files of interest to Hawk, making it the responsibility of the IModelIndexer not to request unnecessary files (such as files Hawk cannot parse or unchanged files).
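The skeleton below sketches what a simple folder-based implementation of this interface could look like; the return types, the use of a last-modified time stamp as a revision surrogate and the class name are illustrative assumptions, not Hawk's actual LocalFolder driver.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Illustrative skeleton; signatures follow the descriptions above.
public class SimpleFolderManager implements IVcsManager {

    private final File root;

    public SimpleFolderManager(File root) { this.root = root; }

    // Treat a file's last-modified time stamp as its "revision" and
    // return every file changed since startRevision.
    public Collection<File> getDelta(String startRevision) {
        long since = Long.parseLong(startRevision);
        List<File> changed = new ArrayList<>();
        File[] files = root.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.lastModified() > since) changed.add(f);
            }
        }
        return changed;
    }

    // "Import" a file by copying it into the temporary working directory.
    public File importFiles(String path, File temp) throws IOException {
        File source = new File(root, path);
        File target = new File(temp, source.getName());
        Files.copy(source.toPath(), target.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}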

IMetaModelResourceFactory This interface exposes a modeling language (metamodel) persistence format to Hawk. This allows any metamodel written in that language (and persisted in that form) to be stored in Hawk. Key methods are described below:

• canParse(f) This method returns whether a File (f) can be parsed by this factory into an IHawkMetaModelResource. Implementations should aim to provide a lightweight approach for this method, such as by using the file extension.

• parse(f) This method returns the IHawkMetaModelResource created by parsing the File (f). More details on IHawkMetaModelResources can be found in Section 4.3.2.1.

• getStaticMetamodels() This method returns any static metamodel(s) provided by the modeling language, as IHawkMetaModelResource(s). Such metamodels are intended to be automatically added to any Hawk that knows about this IMeta- ModelResourceFactory, upon initialization.

• parseFromString(String name, String contents) This method returns an IHawkMetaModelResource created from a String containing the entire metamodel content. Certain modeling technologies (such as EMF) only allow models to be parsed into memory when their metamodel is also in memory; this limitation means that, during the execution of Hawk, all metamodels of such technologies need to be loaded in memory in their native format. For this requirement to be met, Hawk needs to persist the original metamodel in its original format upon registration; every time Hawk starts up, it provides this persisted metamodel to its relevant factory and hence has it in memory for as long as Hawk runs. This method allows the factory to receive the persisted metamodel (in the form of a String) and create the relevant resource (which, if required, will be registered with the relevant modeling technology for use). The content String is persisted in Hawk as an attribute found in every metamodel node in the model index, which happens during metamodel insertion.

IModelResourceFactory This interface exposes a model persistence format to Hawk. This allows any model written in that language (and persisted in that form) to be stored in Hawk. Key methods are described below, followed by a skeletal sketch:

• canParse(f) This method returns whether a File (f) can be parsed by this factory into an IHawkModelResource. Implementations should aim to provide a lightweight approach for this method, such as by using the file extension.

• parse(f) This method returns the IHawkModelResource created by parsing the File (f). More details on IHawkModelResources can be found in Section 4.3.2.1.

• getModelExtensions() This method returns a collection of Strings representing which file extensions this factory is able to parse. This method is used for discarding non-model files and should be as complete as possible, covering any model extension possibly recognized by the factory. For the actual parsing of a model into a resource, the canParse(f) method should be called before any costly parsing begins (in case any more checks need to be performed).
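As a sketch only, the skeleton below shows how such a factory could implement the lightweight checks described above; the class name, the chosen extensions and the XmiModelResource type are illustrative assumptions.

import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Illustrative skeleton; signatures follow the descriptions above.
public class SimpleXmiModelFactory implements IModelResourceFactory {

    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList(".xmi", ".model"));

    // Lightweight extension-based check, as recommended above.
    public boolean canParse(File f) {
        String name = f.getName().toLowerCase();
        return EXTENSIONS.stream().anyMatch(name::endsWith);
    }

    // Advertised extensions let Hawk discard non-model files early.
    public Collection<String> getModelExtensions() {
        return EXTENSIONS;
    }

    // Costly parsing only happens here, after canParse(f) has succeeded.
    public IHawkModelResource parse(File f) {
        return new XmiModelResource(f); // hypothetical resource class
    }
}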

IMetaModelUpdater This interface provides Hawk with a strategy for inserting IHawkMetaModelResources (created by IMetaModelResourceFactories) into its current back-end IGraphDatabase. Key methods are described below, followed by an illustrative call sequence:

• insertMetamodels(metaModelResources,hawk) This method takes a collection of IHawkMetaModelResources and uses the IGraphDatabase API to insert them into Hawk.

• addDerivedAttribute(...) This method adds a derived attribute to the relevant metamodel already present in Hawk. More details on derived attributes can be found in Section 4.3.8.1.

• addIndexedAttribute(...) This method adds an indexed attribute to the relevant metamodel already present in Hawk. More details on indexed attributes can be found in Section 4.3.8.3.
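The call sequence below illustrates how derived and indexed attributes could be registered through this interface. Since the parameter lists are elided above, every argument shown (metamodel URI, type name, attribute name, engine identifier, derivation expression and the graph handle) is an assumption made for illustration.

// Hypothetical parameter lists; the derived attribute below counts the
// body declarations of each TypeDeclaration using an EOL expression.
metaModelUpdater.addDerivedAttribute(
        "org.amma.dsl.jdt.dom",                 // metamodel URI (illustrative)
        "TypeDeclaration",                      // type owning the attribute
        "numBodyDeclarations",                  // derived attribute name
        "EOLQueryEngine",                       // engine used to compute it
        "return self.bodyDeclarations.size();", // derivation expression (EOL)
        graph);

// An indexed attribute enabling fast lookups of projects by name:
metaModelUpdater.addIndexedAttribute(
        "org.amma.dsl.jdt.core", "IJavaProject", "name", graph);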

IModelUpdater This interface provides Hawk with a strategy for inserting IHawkModelResources (created by IModelResourceFactories) into its current back-end IGraphDatabase. Key methods are described below:

• updateStore(modelResources,hawk) This method takes a collection of IHawkModelResources (alongside the versions of their originating files) and uses the IGraphDatabase API to update Hawk to reflect the latest version of each one. It returns the actual model-level changes Hawk had to perform to synchronize itself with the changes in these IHawkModelResources (compared to the current version of each one in Hawk).

• updateDerivedAttribute(...) This method updates a derived attribute of the relevant model already present in Hawk. More details on derived attributes can be found in Section 4.3.8.1, and more information on when such attributes are updated can be found in Section 4.3.8.2.

• updateIndexedAttribute(...) This method updates an indexed attribute of the relevant model already present in Hawk. More details on indexed attributes can be found in Section 4.3.8.3.

IQueryEngine This interface allows Hawk to connect with engines providing languages that can be used to query it. Such engines are of great importance, as they offer the main way information stored in Hawk model indexes can be retrieved. Key methods are described below, followed by a usage sketch:

• contextlessQuery(query,graph) This method takes a query in the form of a String and returns the results of running this query on Hawk's graph back-end. Implementations will need to either use the IQueryEngine's API to implement a connection with Hawk (through its IGraphDatabase layer) or will have to parse the query and convert it to a set of calls to Hawk's IGraphDatabase layer directly.

• contextfullQuery(query,graph,context) Similar to the above, this method takes a query, this time with a context map which can provide parameters to the IQueryEngine on how it should perform the query, or constraints it needs to place on the results provided back from the query.
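The sketch below exercises both entry points with an EOL query string in the style of the queries used in the evaluation; the concrete engine class, the graph variable and the context key are assumptions for illustration.

import java.util.HashMap;
import java.util.Map;

// "graph" is assumed to be the IGraphDatabase of a running Hawk (illustrative).
IQueryEngine eol = new EolQueryEngine(); // hypothetical implementation class

// Run a query over the whole index:
Object total = eol.contextlessQuery(
        "return TypeDeclaration.all.size();", graph);

// Run the same query, restricted through a context map; the key name
// "FILE" is an illustrative constraint, not an authoritative constant.
Map<String, Object> context = new HashMap<>();
context.put("FILE", "*/set0.xmi");
Object subset = eol.contextfullQuery(
        "return TypeDeclaration.all.size();", graph, context);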


B. Model Mutation Operations

This appendix presents the five model mutation operations used to change the source models for the experiments performed in Section 5.3.2.

Listing B.1: EOL model mutation operations

//deletes a random TypeDeclaration and keeps track of the total
//number of deleted elements (due to containments being deleted)
//in variable i
operation deleteTypeDeclaration() {
  var td = TypeDeclaration.all.random();
  var it = td.eAllContents;
  while (it.hasNext()) {
    it.next();
    i = i + 1;
  }
  i = i + 1;
  delete td;
}

//creates a new TypeDeclaration and gives it a random name
operation createTypeDeclaration() {
  var t = new TypeDeclaration;
  var count = Sequence{0..1000}.random();
  var name = new SimpleName;
  name.fullyQualifiedName = "synthetic_name_" + count;
  t.name = name;
}

//adds a new MethodDeclaration to a random TypeDeclaration, setting
//its returnType to a randomly named type, and setting one random
//Modifier of the MethodDeclaration
operation addMethodDeclaration() {
  var t = TypeDeclaration.all.random();
  var m = new MethodDeclaration;
  var returnType = new SimpleType;
  var count = Sequence{0..1000}.random();
  var name = new SimpleName;
  name.fullyQualifiedName = "synthetic_name_" + count;
  returnType.name = name;
  m.returnType = returnType;
  var mod = new Modifier;
  var choice = Sequence{0..5}.random();
  switch (choice) {
    case 0 : mod.public = true;
    case 1 : mod.protected = true;
    case 2 : mod.private = true;
    case 3 : mod.static = true;
    case 4 : mod.abstract = true;
    case 5 : mod.final = true;
  }
  m.modifiers.add(mod);
  t.bodyDeclarations.add(m);
}

//changes a random Modifier of a random MethodDeclaration from true
//to false (if at least one exists)
operation changeModifier() {
  var m = MethodDeclaration.all.random();
  var mods = m.modifiers;
  mods = mods.select(mod : Modifier | mod.public = true or mod.protected = true
    or mod.private = true or mod.static = true or mod.abstract = true
    or mod.final = true);
  if (mods.size() > 0) {
    var mod = mods.random();
    mod.public = false;
    mod.protected = false;
    mod.private = false;
    mod.static = false;
    mod.abstract = false;
    mod.final = false;
  }
}

//renames a random TypeDeclaration to a random name
operation renameTypeDeclaration() {
  var t = TypeDeclaration.all.random();
  var name = t.name;
  var count = Sequence{0..1000}.random();
  var n = "synthetic_name_" + count;
  name.fullyQualifiedName = n;
}


C. User Guide

Hawk offers an Eclipse-based user interface for running and maintaining model indexes. Figure C.1 shows how a new Hawk view looks in a fresh Eclipse instance running the required Hawk plugins (installation details and the plugins themselves can be found online1).

Figure C.1.: Initial Hawk View

A new Hawk instance is added by clicking on the "Add" button, found both in the menu shown when right-clicking anywhere within the Hawk view and as a button in the button bar on the top right-hand side of the view. The window shown in Figure C.2 allows configuring the new Hawk by choosing its name and location, as well as which back-end store should be used. Advanced options, such as setting the periodic check interval of Hawk or using remote Hawk instances, are also available but are not covered here. Once a running Hawk instance is selected, metamodels can be added to it. Firstly, clicking the "Configure" button (which is only enabled if a running Hawk server is currently selected in the view) will open the Hawk configuration dialog. The window in Figure C.3 allows configuring the metamodels in the selected Hawk instance. By clicking the "Add..." button in this window, a file chooser is shown, allowing the addition of a (set of) metamodel file(s) to Hawk. In this example the JDTAST.ecore metamodel (used in the evaluation, Section 5.2.1) is added to Hawk, which will now show in the Metamodels tab, as seen in Figure C.4.

1 https://github.com/mondo-project/mondo-hawk


Figure C.2.: Creating a new Hawk

Next, Hawk needs to be provided with a path so that it can pick up and index any models found there. In this example, the LocalFolder driver is used for simplicity and the local folder "exampleModels" is indexed, as shown in Figure C.5. After clicking the "OK" button in the window, the chosen folder is indexed by Hawk, as in Figure C.6. Now that a folder is indexed, in this case one named exampleModels, Hawk has automatically created the relevant model index, which can now be queried using the EOL query engine driver of Hawk. To run the EOL program called grabats.eol, the usual Epsilon "Run configurations" option in Eclipse can be used (by right-clicking the file) to create a new EOL Program run configuration, as shown in Figure C.7. This configuration points to the selected EOL file (in this case a file named grabats.eol). In the "Models" tab of the configuration, the "Add..." button can be used to add a new Hawk model index to the configuration, as shown in Figure C.8. This opens the configuration window shown in Figure C.9, where the model can be named and pointed to the specific Hawk model index it needs to be read from. Optionally, the query results can be limited to a specific set of files/repositories (as described in Section 4.3.6.2), if desired. Now the query can be run on the Hawk model by pressing the "Run" button, as shown in Figure C.10.

Figure C.3.: Metamodel configuration

Figure C.4.: JDTAST Metamodels added to Hawk

Figure C.5.: Indexed folder configuration

Figure C.6.: Folder added to Hawk

Figure C.7.: Setting the source EOL file

The Hawk model index can be further optimized by adding derived attributes (described in Section 4.3.8.1). This can be done by selecting the "Derived Attributes" tab of the configuration window of Hawk, as shown in Figure C.11. Clicking the "Add..." button in the window offers the option to create a new derived attribute for Hawk, as shown in Figure C.12. After clicking "OK", the newly added derived attribute is shown, as in Figure C.13.

Finally, (database) indexed attributes can be added, as described in Section 4.3.8.3. This is done by selecting the "Indexed Attributes" tab of the configuration window of Hawk, as shown in Figure C.14. Clicking the "Add..." button in the window offers the option to create a new indexed attribute for Hawk. After clicking "OK", the newly added indexed attribute is shown, as in Figure C.15.


Figure C.8.: Adding a Hawk Index

A complete set of screencasts for running Hawk can be found online2.

2 https://www.youtube.com/channel/UCfJydYvvfcEg6o0kaSzC-LQ

Figure C.9.: Configuring the Hawk Index

Figure C.10.: Running the EOL query


Figure C.11.: Configuring Derived Attributes

Figure C.12.: Adding a new Derived Attribute

Figure C.13.: Viewing current Derived Attributes

Figure C.14.: Adding a new Indexed Attribute


Figure C.15.: Viewing current Indexed Attributes

Bibliography

[1] Dimitrios Kolovos, Richard Paige, and Fiona Polack. The Epsilon Object Language (EOL). In Arend Rensink and Jos Warmer, editors, Model Driven Architecture Foundations and Applications, volume 4066 of Lecture Notes in Computer Science, pages 128–142. Springer Berlin / Heidelberg, 2006. doi:10.1007/11787044_11.

[2] The Unified Modeling Language (UML) [online], 2012. [Accessed 1 June 2012] Available at: http://www.uml.org/.

[3] The Business Process Model and Notation (BPMN) [online], 2015. [Accessed 1 September 2015] Available at: http://www.bpmn.org/.

[4] The Systems Modeling Language (SysML) [online], 2015. [Accessed 1 September 2015] Available at: http://sysml.org/.

[5] Archimate Modeling Language [online], 2015. [Accessed 1 September 2015] Available at: http://www.opengroup.org/subjectareas/enterprise/archimate.

[6] Steven Kelly and Juha-Pekka Tolvanen. Domain-Specific Modeling. John Wiley & Sons, 2007.

[7] The Web Modeling Language [online], 2012. [Accessed 1 June 2012] Available at: http://www.webml.org/.

[8] The Systems Biology Markup Language [online], 2012. [Accessed 1 June 2012] Available at: http://sbml.org/Main_Page.

[9] Richard Mark Soley. Object Management Architecture Guide, volume OMG Document 92-11-1. Object Management Group, 1992.


[10] Anneke G. Kleppe, Jos Warmer, and Wim Bast. MDA Explained: The Model Driven Architecture: Practice and Promise. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.

[11] OMG. Meta-Object Facility (MOF) Specification Version 2.4.1 [online], 2012. [Accessed 1 June 2012] Available at: http://www.omg.org/spec/MOF/2.4.1/PDF.

[12] Dick Grune and Ceriel J. H. Jacobs. Parsing techniques: a practical guide. Springer New York, 2008.

[13] Heiko Kern. The interchange of (meta) models between MetaEdit+ and Eclipse EMF using M3-level-based bridges. In 8th OOPSLA Workshop on Domain-Specific Modeling at OOPSLA, volume 2008, 2008.

[14] Dave Steinberg, Frank Budinsky, Marcelo Paternostro, and Ed Merks. EMF: Eclipse Modeling Framework (2nd Edition). Addison-Wesley Professional, 2008.

[15] Jordi Cabot and Martin Gogolla. Object Constraint Language (OCL): a definitive guide. In Formal Methods for Model-Driven Engineering, pages 58–90. Springer, 2012.

[16] Kolovos, D.S., Paige, R.F. and Polack, F.A. The Epsilon Object Language. In Proc. European Conference in Model Driven Architecture (EC-MDA) 2006, volume 4066 of LNCS, pages 128–142, Bilbao, Spain, July 2006.

[17] Tom Mens and Pieter Van Gorp. A Taxonomy of Model Transformation. Electronic Notes in Theoretical Computer Science, 152(0):125 – 142, 2006. Proceedings of the International Workshop on Graph and Model Transformation (GraMoT 2005).

[18] Sreedhar Reddy, R Venkatesh, and Zahid Ansari. A relational approach to model transformation using QVT Relations. UNU-IIST, 2008.

[19] Radomil Dvorak. Model transformation with operational QVT. In EclipseCon 08, Santa Clara, 2008.

[20] Dimitrios Kolovos, Richard Paige, and Fiona Polack. The Epsilon Transformation Language. In Antonio Vallecillo, Jeff Gray, and Alfonso Pierantonio, editors, Theory and Practice of Model Transformations, volume 5063 of Lecture Notes in Computer Science, pages 46–60. Springer Berlin / Heidelberg, 2008. doi:10.1007/978-3-540-69927-9_4.

[21] Frédéric Jouault and Ivan Kurtev. Transforming Models with ATL. In Jean-Michel Bruel, editor, Satellite Events at the MoDELS 2005 Conference, volume 3844 of Lecture Notes in Computer Science, pages 128–138. Springer Berlin / Heidelberg, 2006. doi:10.1007/11663430_14.

[22] Frédéric Jouault, Freddy Allilaire, Jean Bézivin, Ivan Kurtev, and Patrick Valduriez. ATL: a QVT-like transformation language. In Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications, OOPSLA '06, pages 719–720, New York, NY, USA, 2006. ACM.

[23] The Epsilon Generation Language [online], 2012. [Accessed 1 June 2012] Available at: http://www.eclipse.org/epsilon/doc/egl/.

[24] Xpand - M2T transformation language [online], 2012. [Accessed 1 June 2012] Available at: http://www.eclipse.org/projects/project.php?id=modeling.m2t.xpand.

[25] Acceleo - Code generation language [online], 2012. [Accessed 1 June 2012] Available at: http://www.acceleo.org/pages/welcome/en.

[26] Java Emitter Template (JET) [online], 2012. [Accessed 1 June 2012] Available at: https://www.eclipse.org/modeling/m2t/?project=jet.

[27] Parastoo Mohagheghi, Miguel Fernandez, Juan Martell, Mathias Fritzsche, and Wasif Gilani. MDE Adoption in Industry: Challenges and Success Criteria. In Models in Software Engineering, volume 5421 of Lecture Notes in Computer Science, pages 54–59. Springer, 2009.

[28] Dimitrios S. Kolovos, Richard F. Paige, and Fiona A.C. Polack. Scalability: The Holy Grail of Model Driven Engineering. In Proc. Workshop on Challenges in MDE, collocated with MoDELS ’08, Toulouse, France, 2008.

[29] Alix Mougenot, Alexis Darrasse, Xavier Blanc, and Michèle Soria. Uniform Random Generation of Huge Metamodel Instances. In Proceedings of ECMDA-FA '09, pages 130–145, Berlin, Heidelberg, 2009. Springer-Verlag.


[30] Dimitrios Kolovos, Richard Paige, and Fiona Polack. The Epsilon Transformation Language. In Antonio Vallecillo, Jeff Gray, and Alfonso Pierantonio, editors, Theory and Practice of Model Transformations, volume 5063 of Lecture Notes in Computer Science, pages 46–60. Springer Berlin / Heidelberg, 2008. doi:10.1007/978-3-540-69927-9_4.

[31] Dimitrios S. Kolovos, Louis M. Rose, Nicholas Matragkas, Richard F. Paige, Esther Guerra, Jesús Sánchez Cuadrado, Juan de Lara, István Ráth, Dániel Varró, Massimo Tisi, and Jordi Cabot. A research roadmap towards achieving scalability in model driven engineering. In Proceedings of the Workshop on Scalability in Model Driven Engineering, BigMDE '13, pages 2:1–2:10, New York, NY, USA, 2013. ACM.

[32] The Teneo/Hibernate Relational Database Model Store [online], 2012. [Accessed 1 June 2012] Available at: http://wiki.eclipse.org/Teneo/Hibernate.

[33] Javier Espinazo Pagán, Jesús Sánchez Cuadrado, and Jesús García Molina. A repository for scalable model management. Software & Systems Modeling, pages 1–21, 2013.

[34] Konstantinos Barmpis and Dimitrios S. Kolovos. Evaluation of contemporary graph databases for efficient persistence of large-scale models. Journal of Object Technology, 13-3:3:1–26, July 2014.

[35] Javier Espinazo Pagán, Jesús Sánchez Cuadrado, and Jesús García Molina. Morsa: a scalable approach for persisting and accessing large models. In Proceedings of MODELS'11, pages 77–92, Berlin, Heidelberg, 2011. Springer-Verlag.

[36] MongoDB Developers. MongoDB, Document-Store NoSQL Database [online], 2012. [Accessed 1 June 2012] Available at: www.mongodb.org/.

[37] Apache CouchDB Document Store Database [online], 2012. [Accessed 1 June 2012] Available at: http://couchdb.apache.org/.

[38] OrientDB Developers. OrientDB, Hybrid Document-Store and Graph NoSQL Database [online], 2012. [Accessed 1 June 2012] Available at: http://www.orientechnologies.com/.

[39] Neo4J Developers. Neo4J, Graph NoSQL Database [online], 2012. [Accessed 1 June 2012] Available at: http://neo4j.org/.


[40] InfiniteGraph Graph Database [online], 2012. [Accessed 1 June 2012] Available at: http://objectivity.com/products/infinitegraph/overview.

[41] Seyyed M. Shah, Ran Wei, Dimitrios S. Kolovos, Louis M. Rose, Richard F. Paige, and Konstantinos Barmpis. A framework to benchmark NoSQL data stores for large-scale model persistence. In Proc. 15th Conf. on Model-Driven Engineering Lang. and Systems, Models'14, 2014.

[42] Amine Benelallam, Abel Gómez, Gerson Sunyé, Massimo Tisi, and David Launay. Neo4EMF, a scalable persistence layer for EMF models. In Jordi Cabot and Julia Rubin, editors, Modelling Foundations and Applications, volume 8569 of Lecture Notes in Computer Science, pages 230–241. Springer International Publishing, 2014.

[43] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: amazon’s highly available key-value store. In Proc. 21st ACM SIGOPS symposium on Operating systems principles, SOSP ’07, pages 205–220, 2007.

[44] Abel Gómez, Massimo Tisi, Gerson Sunyé, and Jordi Cabot. Map-based transparent persistence for very large models. In Alexander Egyed and Ina Schaefer, editors, Fundamental Approaches to Software Engineering, volume 9033 of Lecture Notes in Computer Science, pages 19–34. Springer Berlin Heidelberg, 2015.

[45] Apache Subversion [online], 2012. [Accessed 1 June 2012] Available at: http://subversion.apache.org/.

[46] TWiki - the Open Source Enterprise Wiki and Web 2.0 Application Platform [on- line], 2012. [Accessed 1 June 2012] Available at: http://twiki.org/.

[47] PVCS Version Manager [online], 2012. [Accessed 1 June 2012] Available at: http://pvcs.synergex.com/products/pvcs_version_manager.aspx.

[48] IBM Rational ClearCase [online], 2012. [Accessed 1 June 2012] Available at: http://www-01.ibm.com/software/awdtools/clearcase/.

[49] Brian Berliner and Jeff Polk. Concurrent Versions System (CVS) [online], 2012. [Accessed 1 June 2012] Available at: www.cvshome.org.


[50] Git - distributed version control system [online], 2012. [Accessed 1 June 2012] Available at: http://git-scm.com/.

[51] Mercurial [online], 2012. [Accessed 1 June 2012] Available at: http://mercurial.selenic.com/.

[52] Bazaar [online], 2012. [Accessed 1 June 2012] Available at: http://bazaar.canonical.com/en/.

[53] Peer-to-peer Version Control for Distributed Development [online], 2012. [Accessed 1 June 2012] Available at: http://www.relisoft.com/co_op/index.htm.

[54] Ben Collins-Sussman, Brian Fitzpatrick, and Michael Pilato. Version Control with Subversion. O'Reilly Media, Inc., 2004.

[55] Haifeng Shen and Chengzheng Sun. Operation-based revision control systems. In Proceedings of the 3rd Annual International Workshop on Collaborative Editing Systems in conjunction with ACM Group Conference, 2001.

[56] Alessandro Rossini, Adrian Rutle, Yngve Lamo, and Uwe Wolter. A formalisation of the copy-modify-merge approach to version control in MDE. Journal of Logic and Algebraic Programming, 79(7):636 – 658, 2010. The 20th Nordic Workshop on Programming Theory (NWPT 2008).

[57] Kerstin Altmanninger, Martina Seidl, and Manuel Wimmer. A survey on model versioning approaches. International Journal of Web Information Systems, 5(3):271–304, 2009.

[58] Presentation of EMF Compare Utility [online], EclipseCon 2006, 2006. [Accessed 1 January 2016] Available at: https://www.eclipsecon.org/summiteurope2006/presentations/ESE2006-EclipseModelingSymposium10_EMFCompareUtility.pdf.

[59] Markus Scheidgen and Anatolij Zubow. Map/reduce on EMF models. In Proceedings of the 1st International Workshop on Model-Driven Engineering for High Performance and Cloud Computing, MDHPCL '12, pages 7:1–7:5, New York, NY, USA, 2012. ACM.


[60] G. Kramler, G. Kappel, T. Reiter, E. Kapsammer, W. Retschitzegger, and W. Schwinger. Towards a semantic infrastructure supporting model-based tool integration. In Proceedings of the 2006 International Workshop on Global Integrated Model Management, GaMMa '06, pages 43–46, New York, NY, USA, 2006. ACM.

[61] The Connected Data Objects model Repository [online], 2012. [Accessed 1 June 2012] Available at: http://wiki.eclipse.org/CDO.

[62] Maximilian Koegel and Jonas Helming. EMFStore: a model repository for EMF models. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, pages 307–308. ACM, 2010.

[63] Javier Espinazo Pagán and Jesús García Molina. Querying large models efficiently. Inf. Softw. Technol., 56(6):586–622, June 2014.

[64] Heiko Kern. The interchange of (meta) models between MetaEdit+ and Eclipse EMF using M3-level-based bridges. In 8th OOPSLA Workshop on Domain-Specific Modeling at OOPSLA, volume 2008, 2008.

[65] Konstantinos Barmpis and Dimitrios S. Kolovos. Hawk: towards a scalable model indexing architecture. In Proceedings of the Workshop on Scalability in Model Driven Engineering, BigMDE ’13, pages 6:1–6:9, New York, NY, USA, June 2013. ACM.

[66] Konstantinos Barmpis, Seyyed Shah, and Dimitrios S. Kolovos. Towards incremental updates in large-scale model indexes. In Proceedings of the 11th European Conference on Modelling Foundations and Applications, ECMFA'15, July 2015.

[67] Konstantinos Barmpis and Dimitrios S. Kolovos. Towards scalable querying of large-scale models. In Proceedings of the 10th European Conference on Modelling Foundations and Applications. ECMFA’14, July 2014.

[68] Paige, R.F., Kolovos, D.S., Rose, L.M., Drivalos, N. and Polack, F.A. The Design of a Conceptual Framework and Technical Infrastructure for Model Management Language Engineering. In Proc. 14th IEEE International Conf. on Engineering of Complex Computer Systems, Potsdam, Germany, 2009.

[69] Willink, E. Aligning OCL with UML. In Proceedings of the Workshop on OCL and Textual Modelling, Electronic Communications of the EASST, 2011.


[70] Kolovos, D.S., Rose, L., Garcia, A.D. and Paige, R.F. The Epsilon Book. 2008.

[71] Dimitrios S. Kolovos, Ran Wei, and Konstantinos Barmpis. An approach for efficient querying of large relational datasets with OCL-based languages. In XM 2013 – Extreme Modeling Workshop, page 48, 2013.

[72] Alexander Egyed. Instant consistency checking for the UML. In Proc. of the 28th International Conference on Software Engineering, ICSE '06, pages 381–390, New York, NY, USA, 2006. ACM.

[73] Alexander Egyed. Automatically detecting and tracking inconsistencies in software design models. Software Engineering, IEEE Transactions on, 37(2):188–204, 2011.

[74] Hugo Bruneliere, Jordi Cabot, Grégoire Dupé, and Frédéric Madiot. MoDisco: A model driven reverse engineering framework. Information and Software Technology, 56(8):1012–1032, 2014.

[75] GraBaTs. 5th Int. Workshop on Graph-Based Tools, 2009.

[76] Jean-Sébastien Sottet and Frédéric Jouault. Program comprehension. In Proc. 5th Int. Workshop on Graph-Based Tools, 2009.

[77] Gábor Bergmann, Ákos Horváth, István Ráth, Dániel Varró, András Balogh, Zoltán Balogh, and András Ökrös. Incremental evaluation of model queries over EMF models. In Dorina C. Petriu, Nicolas Rouquette, and Øystein Haugen, editors, Model Driven Engineering Languages and Systems, volume 6394 of Lecture Notes in Computer Science, pages 76–90. Springer Berlin Heidelberg, 2010.

[78] Gábor Bergmann, András Ökrös, István Ráth, Dániel Varró, and Gergely Varró. Incremental pattern matching in the VIATRA model transformation system. In Proceedings of the Third International Workshop on Graph and Model Transformations, GRaMoT '08, pages 25–32, New York, NY, USA, 2008. ACM.

[79] Zoltán Ujhelyi, Gábor Bergmann, Ábel Hegedüs, Ákos Horváth, Benedek Izsó, István Ráth, Zoltán Szatmári, and Dániel Varró. EMF-IncQuery: An integrated development environment for live model queries. Science of Computer Programming, 98, Part 1:80–99, 2015. Fifth issue of Experimental Software and Toolkits (EST): A special issue on Academics Modelling with Eclipse (ACME2012).


[80] Gábor Szárnyas, Benedek Izsó, István Ráth, Dénes Harmath, Gábor Bergmann, and Dániel Varró. IncQuery-D: A distributed incremental model query framework in the cloud. In Model-Driven Engineering Languages and Systems, pages 653–669. Springer, 2014.

[81] Gábor Szárnyas, Oszkár Semeráth, István Ráth, and Dániel Varró. The TTC 2015 Train Benchmark case for incremental model validation. 8th Transformation Tool Contest (TTC 2015), 2015.
