Gremlin-ATL: A Scalable Model Transformation Framework
Gwendal Daniel, Frédéric Jouault, Gerson Sunyé, Jordi Cabot

To cite this version:

Gwendal Daniel, Frédéric Jouault, Gerson Sunyé, Jordi Cabot. Gremlin-ATL: A Scalable Model Transformation Framework. ASE 2017: 32nd IEEE/ACM International Conference on Automated Software Engineering, Oct 2017, Urbana-Champaign, United States. DOI: 10.1109/ASE.2017.8115658. hal-01589582

HAL Id: hal-01589582 https://hal.archives-ouvertes.fr/hal-01589582 Submitted on 18 Sep 2017


Gwendal Daniel (AtlanMod Team, Inria, IMT Atlantique, LS2N), [email protected]
Frédéric Jouault (TRAME Team, Groupe ESEO), [email protected]
Gerson Sunyé (AtlanMod Team, Inria, IMT Atlantique, LS2N), [email protected]
Jordi Cabot (ICREA, UOC), [email protected]

Abstract—Industrial use of Model Driven Engineering tech- as UML [9] class diagrams, code documentation, or quality niques has emphasized the need for efficiently store, access, metrics. However, these tools typically face scalability issues and transform very large models. While scalable persistence when the input code base increases because the underlying frameworks, typically based on some kind of NoSQL database, have been proposed to solve the model storage issue, the same modeling frameworks are not designed to store and transform level of performance improvement has not been achieved for the large models efficiently. model transformation problem. Existing model transformation Scalable modeling storage systems [10]–[12] have been tools (such as the well-known ATL) often require the input models proposed to tackle this issue, focusing on providing a solution to be loaded in memory prior to the start of the transformation to store and manipulate large models in a constrained memory and are not optimized to benefit from lazy-loading mechanisms, mainly due to their dependency on current low-level APIs offered environment with minimal performance impact. Relational and by the most popular modeling frameworks nowadays. NoSQL databases are used to store models, and existing solu- In this paper we present Gremlin-ATL, a scalable and efficient tions rely on a lazy-loading mechanism to optimize memory model-to-model transformation framework that translates ATL consumption by loading only the accessed objects from the transformations into Gremlin, a supported by database. several NoSQL databases. With Gremlin-ATL, the transfor- mation is computed within the database itself, bypassing the While these systems have improved the support for man- modeling framework limitations and improving its performance aging large models, they are just a partial solution to the both in terms of execution time and memory consumption. Tool scalability problem in current modeling frameworks. 
In its support is available online. core, all frameworks are based on the use of low-level model Index Terms—ATL; Gremlin; OCL Scalability; Persistence handling APIs. These APIs are then used by most other Framework; model transformation; NoSQL MDE tools in the framework ecosystem to query, transform, and update models. These APIs are focused on manipulating I.INTRODUCTION individual model elements and do not offer support for generic Models are used in various engineering fields as abstract queries and transformations. views helping designers and developers understand, manipu- This low-level design is clearly inefficient when combined late, and transform complex systems. They can be manually with persistence framework because (i) the API granularity is constructed using high-level modeling tools, such as civil too fine to benefit from the advanced query capabilities of the engineering models [1], or automatically generated in model- backend, and (ii) an important time and memory overhead is reverse engineering processes such as software evolution tasks necessary to construct navigable intermediate objects needed [2] or schema discovery from existing documents [3]. to interact with the API. As shown in Figure 1, this is par- Models are then typically used in Model Driven Engineering ticularly true in the context of model transformations, which (MDE) processes that rely intensively on model transformation heavily rely on high-level model navigation queries (such as engines to implement model manipulation operations like view the allInstances() operation returning all instances of a extraction, formal verification or code-generation [4]. given type) to retrieve input elements to transform and create With the growing accessibility of big data (such as national the corresponding output elements. 
This mismatch between open data programs [5]) as well as the progressive adoption high-level modeling languages and low-level APIs generates of MDE techniques in the industry [6], [7], the volume and a lot of fragmented queries that cannot be optimized and diversity of data to model has grown to such an extent that computed efficiently by the database [13]. the scalability of existing technical solutions to store, query, To overcome this situation, we propose Gremlin-ATL, an al- and transform these models has become a major issue [8]. ternative transformation approach. Instead of translating high- For example, reverse engineering tools such as MoDisco [2] level model transformation specifications into a sequence of rely on MDE technologies to extract a set of high-level inefficient API calls, we translate them into database queries models representing an existing code base. These models are and execute them directly where the model resides, i.e. in the then operated by model transformations to create a set of database store. This approach is based on a Model Mapping artifacts providing a better understanding of the system, such that allows to access an existing database using modeling Model Transformation Modeling Database transformation rules and define libraries. The language defines Transformation Engine Framework three types of rules: (i) matched rules that are declarative API Call 1 and automatically executed, (ii) lazy rules that have to be

… invoked explicitly from another rule, and (iii) called rules

API Call 1 n which contain imperative code . ATL does not assume any order between matched rules, and keeps a set of trace links get(a ) 1 between the source and target models that are used to resolve get(a1, name) A.allInstances().name … elements, and set target values that have not been transformed get(a ) n yet. get(a , name) n The language also defines helpers expressed in OCL [15] Fig. 1. Model Transformation Engine and Modeling Framework Integration that are used to compute information in a specific context, provide global functions, and runtime attributes computed on the fly. OCL helpers can be invoked multiple times in a primitives, and on a Transformation Helper that lets modelers transformation, and are a good solution to modularize similar tune the transformation execution to fit their performance navigation chains and condition checking. needs. Finally, ATL programs are themselves described as models The rest of this paper is structured as follow: Section II that conform to the ATL metamodel. This feature allows to introduces the input transformation language and the output define high-order transformations, that take an ATL transfor- database query language of our approach and presents a run- mation as their input and manipulate it to check invariant ning example that will be used through the paper. Section III properties that should hold in the output model, infer type, or presents Gremlin-ATL and its key components, Section IV refine the transformation. Our approach relies on this reflective present how a transformation is executed from a user point feature to translate ATL transformations into efficient database of view. Sections V and VI present our prototype and the queries. benchmarks used to evaluate our solution. 
Section VII shows Note that in this paper we focus on ATL as our input an application example where Gremlin-ATL is used to specify language, but our approach can be adapted to other rule-based data extraction rules between two data sources. Finally, Sec- transformation languages, notably the QVT [16] standard. tion VIII presents the related work and Section IX summarizes the key points of the paper, draws conclusion, and presents our future work.

II.BACKGROUND In this section we introduce the key features of ATL [4], the model transformation language we use as the input of our framework, and Gremlin [14], a multi-database graph traversal query language we use as our output language. We also present along the section a running example that is used through this article to illustrate the different steps of our approach.

A. ATL Transformation Language (a) Metamodel (b) Instance Model In MDE, models are key elements in any software engineer- ing activity. Models are manipulated and refined using model- Fig. 2. Example Metamodel and Model to-model transformations, and the final software artifacts are Listing 1 shows a simple ATL transformation based on the (partially) generated with a model-to-text transformation. Each metamodel shown in Figure 2(a). The rule MethodToMethod- metamodel model conforms to a that describes its structure and Unit (line 10) matches all the Method elements from the input the possible relationships between model elements and their model that do not contain a typeParameter and creates the properties. corresponding MethodUnit in the target model. The attributes As an example, Figure 2(a) shows a simple metamodel and references of the created element are set using binding representing Types, Methods, and Blocks. Methods are defined specifications: the first one (line 13) checks if the source ele- by a name, a visibility, and contain a set of Blocks representing ment is a Method or a Constructor and sets the kind attribute their body.A Method is associated with a return Type, and accordingly. The second binding (line 15) sets the value of can have type parameters.A Constructor is defined as a the export attribute by calling the OCL helper getVisibility on subclass of Method. An instance of this metamodel is shown the source element. Finally, the codeElement reference is set in Figure 2(b). with the Block elements contained in the body reference of the ATL [4] is a declarative rule-based model transformation source Method. Note that an additional rule has to be specified language that operates at the metamodel level. 
Transformations written in ATL are organized in modules, that are used to group 1Imperative constructs are not discussed in this paper to map Block instances to their corresponding output elements, use it in another traversal, or define a custom step to handle that will be resolved by the ATL engine using its trace links a particular processing. mechanism. -- returnsa String representing the visibility ofa 1 Fig. 3. Persisted Model Method helper context java!Method def: getVisibility():String= 2 let result : VisibilityKind = self.visibility in 3 if result.oclIsUndefined() then 4 "unknown" 5 else 6 result.toString() 7 endif; 8 -- Transformsa Java Method intoa KDM Method unit 9 rule MethodToMethodUnit { 10 from src : java!Method(src.typeParameters.isEmpty()) 11 to tgt : kdm!MethodUnit( 12 kind <- if (src.oclIsKindOf(java!Constructor)) 13 then’constructor’ else’method’ endif, 14 export <- src.getVisibility(), 15 codeElement <- src.body->asSet() } 16 Figure 3 presents a possible representation Listing 1. Simplified Java2KDM Rule Example of the model shown in Figure 2(b): grey vertices represent Method, Type, and Block metaclasses that are linked to their B. Gremlin Query Language instance through instanceof edges. The Method m1 is linked NoSQL databases are an efficient option to store large to a Block instance through the body edge, and the Types models [17], [18]. Nevertheless, their diversity in terms of void and T using returnType and typeParameters edges, re- structure and supported features make them hard to unify spectively. Finally, m1 contains a property visibility that under a standard query language to be used as a generic holds a string representation of the VisibilityKind enumeration. solution for our approach. This graph mapping of metamodel instances is based on the Blueprints [19] is an interface designed to unify NoSQL one used in NeoEMF/Graph [20]. database access under a common API. 
Initially developed for In what follows, we describe some simple Gremlin exam- graph stores, it has been implemented by a large number of ples based on this model. A Gremlin traversal begins with a databases such as , OrientDB, and MongoDB. Blueprints Start step, that gives access to graph level information such as is, to our knowledge, the only interface unifying several indexes, vertex and edge lookups, and property based queries. NoSQL databases2. For example, the traversal below performs a query on the Blueprints is the base of the Tinkerpop stack: a set of tools classes index that returns the vertices indexed with the name to store, serialize, manipulate, and query graph databases. Method, representing the Method class in the Figure 2(a). In Gremlin [14] is the query language designed to query our example, this class matches vertex 1. Blueprints databases. It relies on a lazy data-flow framework g.idx("classes")[[name:"Method"]]; // ->v(1) and is able to navigate, transform, or filter a graph. transform Gremlin is a Groovy domain-specific language built on top The most common steps are steps, which allow outE(rel) inE(rel) of Pipes, a data-flow framework based on process graphs. A navigation in a graph. The steps and navigate process graph is composed of vertices representing computa- from input vertices to their outgoing and incoming edges, rel inV outV tional units and communication edges which can be combined respectively, using the relationship as filter. and to create a complex processing. In Gremlin terminology, these are their opposite: they compute head and tail vertices of complex processing vertices are called traversals, and are an edge. For example, the following traversal returns all the composed of a chain of simple computational units named vertices that are related to the vertex 4 by the relationship typeParameters Start g.v(4) steps. Gremlin defines three types of steps: (i) transform steps . 
The step is a vertex lookup that that compute values from their inputs, (ii) filter steps that returns the vertex with the id 4. select or rejects elements w.r.t a given condition, and (iii) side- g.v(4).outE("typeParameters").inV; // -> [v(6)] effect steps that compute side-effect operations such as vertex Filter steps are used to select or reject a subset of input creation or property updates. In addition, the step interface elements given a condition. They are used to filter vertices provides a set of built-in methods to access meta information, given a property value, remove duplicate elements in the such as the number of objects in a step, output existence, traversal, or get the elements of a previous step. For example, or the first element in a step. These methods can be called the following traversal returns all the vertices related to vertex inside a traversal to control its execution or check conditions 4 by the relationship typeParameters that have a property name on particular elements in a step. containing at least one character. Gremlin allows the definition of custom steps, functions, g.v(4).outE("typeParameters").inV and variables to handle query results. For example, it is .has("name").filter{it.name.length > 0}; // -> [v(6)] possible to assign the result of a traversal to a variable and Finally, side-effect steps modify a graph, compute a value, or 2Implementation list is available at https://github.com/tinkerpop/blueprints assign variables in a traversal. They are used to fill collections with traversal results, update properties, or create elements. B. ATLtoGremlin Transformation For example, it is possible to store the result of the previous 1) Transformation Mapping: Table I shows the mapping traversal in a table using the Fill step. used by Gremlin-ATL to translate ATL constructs into Gremlin def table = []; g.v(3).outE("typeParameters").inV.has("name") steps. 
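To make the lazy step-chaining model concrete, the behavior of transform, filter, and side-effect steps can be sketched in a few lines of Python. This is a hypothetical mini-pipeline of our own, not the actual Pipes/Gremlin implementation: each step wraps an iterator, so nothing is computed until a terminal step (here called iterate, mirroring Gremlin's terminology) consumes the chain.

```python
# Minimal sketch of Gremlin-style lazy step chaining (hypothetical;
# not the actual Pipes/Gremlin implementation). Each step wraps an
# iterator, so nothing is computed until a terminal step consumes it.
class Traversal:
    def __init__(self, items):
        self._it = iter(items)

    def transform(self, fn):
        # transform step: compute a value from each input
        return Traversal(fn(x) for x in self._it)

    def filter(self, pred):
        # filter step: select or reject elements w.r.t. a condition
        return Traversal(x for x in self._it if pred(x))

    def side_effect(self, fn):
        # side-effect step: run fn on each element, then pass it through
        def gen():
            for x in self._it:
                fn(x)
                yield x
        return Traversal(gen())

    def iterate(self):
        # terminal step: force evaluation and collect the results
        return list(self._it)

vertices = [{"id": 4, "name": "m1"}, {"id": 6, "name": ""}]
table = []
names = (Traversal(vertices)
         .filter(lambda v: len(v["name"]) > 0)    # like has(...).filter{...}
         .side_effect(lambda v: table.append(v))  # like fill(table)
         .transform(lambda v: v["name"])          # like a property access step
         .iterate())
# names == ["m1"]; table holds the vertex that passed the filter
```

The design point this sketch illustrates is that a traversal is just a composed process graph: adding a step never triggers computation, which is what lets a Gremlin engine evaluate the whole chain inside the database.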
An ATL Module is translated into a Gremlin script, that .filter{it.name.length > 0}.fill(table); // -> [v(6)] represents the top-level container storing the entire query to execute. III.GREMLIN-ATL FRAMEWORK Matched Rule Definitions inside the ATL module are mapped to a sequence of steps that access all the elements of In this section we present Gremlin-ATL, our proposal for the type of the rule. For example, the matched rule definition handling complex model-to-model transformations by taking MethodToMethodUnit in Listing 1 is translated into the Grem- advantage of the features of the database backends where lin expression g.allOfKind("Method") that searches in the the model resides. We first introduce an overview of the input graph all the elements representing Method instances. framework and its query translation process. We then present Abstract Rule Definitions are not translated, because they are the ATLtoGremlin transformation that maps ATL transforma- called only when a specialized rule is matched. Lazy rule tions into Gremlin traversals. Finally, we present the auxiliary definitions are translated into function definitions named with components used within the Gremlin query to specify the the rule’s identifier and containing their translated body. database on which to compute the query on and the specific configuration to use. Matched rule bodies are mapped to a transform step contain- ing the translated expressions representing rule’s out pattern A. Overview and bindings. This transform step is followed by an iterate step Figure 4 presents an overview of the translation used in that tells the Gremlin engine to execute the query. Abstract Gremlin-ATL to create Gremlin Traversals from ATL Trans- rule bodies are directly mapped without creating a transform formations. An input transformation is parsed into an ATL step, and generated Gremlin steps are added to the ones of the Transformation Model conforming to the ATL metamodel. 
translated bodies of the corresponding specialized rules. This This model constitutes the input of our ATLtoGremlin high- approach flattens the inheritance hierarchy of a transformation order transformation that creates an output Gremlin Traversal by duplicating parent code in each concrete sub-rule. representing the query to compute. Rule Guards defining the set of elements matched by a rule The ATLtoGremlin transformation uses two generic libraries are translated into a Gremlin filter step containing the trans- to produce the output query: (i) a Model Mapping Definition lated condition to verify. For example, the guard of the rule allowing to access a database as a model by mapping its im- MethodToMethodUnit is translated into the following Gremlin plicit schema to modeling primitives, and (ii) a Transformation expression that first navigates the reference typeParameters Helper Definition used to tune the transformation algorithm and searches if it contains at least one value: filter{!(src according to memory and execution time requirements. .getRef("typeParameters").hasNext())}. The generated traversal can be returned to the modeler Rules’ body can contain two types of ATL constructs: or directly computed in a Gremlin engine that manages the out patterns representing the target element to create, and underlying database. Note that our approach also allows to attribute/reference bindings describing the attribute and refer- compute directly the generated query in a preset NeoEMF ences to set on the created element. Out patterns are mapped database, as explained in Section V. to a variable definition storing the result of the createElement In the following we detail the ATLtoGremlin transformation, function which creates the new instance and the associated the Model Mapping and Transformation Helper Definition trace links. This instruction is followed by a resolveTraces used in the generated traversal. 
A dynamic view of our call that tells the engine to resolve the potential trace links approach is provided in Section IV. associated to the created element. In our example it generates the sequence tHelper.createElement("MethodUnit", src) that creates a new MethodUnit instance in the target ATL Metamodel model and associates it with the src element from the source implements

model. Attribute and Reference Bindings are respectively trans- Model Mapping Model Mapping Definition Impl. lated into the mapping operation setAtt and a Transformation

uses uses Helper’s link call. conformsTo Our mapping translates helper definitions into global meth- ods, which define a self parameter representing the context

uses of the helper and a list of optional parameters. This global ATL Transformation ATLtoGremlin uses Gremlin Traversal function is dynamically added to the Object metaclass to allow Model Transformation implements method-like invocation, improving query readability. Global Transformation Transformation Helper Definition Helper Impl helper definitions are also mapped to global methods, but do not define a self parameter. Finally, Global Variables are Fig. 4. Overview of the Mogwaï-ATL Framework translated into unscoped Gremlin variables. ATL embeds its own specification of OCL, which is used to registers contextual helpers to the Object metaclass (lines 10- navigate the source elements to find the objects to transform, 13), allowing method-like invocation in generated expressions. express the guard condition of the transformation rules, and Lazy_rule definitions are then translated into global func- define helpers’ body. We have adapted the mapping defined tions, and added to the Gremlin script container. Out pattern in the Mogwaï framework [21] to fit the OCL metamodel and bindings contained in the Lazy_rule body are translated embedded in ATL. In addition, we integrated our Model into their Gremlin equivalent following the mapping presented Mapping Definition component in the translation in order to in the previous section and appended in the generated function provide a generic translation based on an explicit mapping. body. The current version of Gremlin-ATL supports an important 1 // getVisibility() helper part of the OCL constructs, allowing to express complex 2 def getVisibility(var vertex) { 3 var result = vertex.getAtt("visibility"); navigation queries over models. 
As an example, the inline 4 if(result == null) if construct used in MethodToMethodUnit to check whether 5 return "unknown"; 6 else the source element represents a constructor can be translated 7 return result; into the equivalent Gremlin ternary operator: src.isKindOf 8 } 9 ("Constructor")?"constructor":"method" . 10 // Add getVisibility to Vertex method list 11 Vertex.metaClass.getVisibility = 12 { -> getVisibility(delegate) } TABLE I 13 ATL TO GREMLINMAPPING 14 // MethodToMethodUnit 15 g.allOfKind(Method).filter{ 16 def src = it; ATL expression Gremlin step 17 !(src.getRef("typeParameters").hasNext()) 18 }.transform{ module Gremlin Script 19 def src = it; matched_rule definition g.allOfType(type) 20 var tgt = tHelper.createElement("MethodUnit", src); abstract_rule definition not mapped 21 tHelper.resolveTraces(src, tgt); lazy_rule definition def name(type) { body } 22 tgt.setAtt("kind", src.isKindOf("Constructor") ? " matched_rule body transform{ body }.iterate() constructor" : "method"); abstract_rule body body3 23 tgt.setAtt("export", src.getVisibility()); specialized_rule body transform{ body U parent.body }.iterate() 24 tgt.setRef("codeElement", src.getRef("body").toList () as Set); rule_guard(condition) filter{condition} 25 }.iterate(); out_pattern(srcEl, tgtEl) var out = thelper .createElement(tgtEl.type, srcEl) Listing 2. Generated Gremlin Traversal tHelper.resolveTraces(srcEl,tgtEl); attribute_binding e1.setAtt(exp) Once this initial step has been performed the transformation reference_binding e1.link(exp) searches all the matched rules definitions and creates the helper_definition def name(var self, params){ expression } Object.metaClass.name = { associated Gremlin instructions. 
If the rule defines a guard (params) -> name(delegate, params)} the generated filter step is directly linked after the allOfKind obj.helper(params) obj.helper(params) operation in order to select the elements that satisfy the guard’s global_helper_definition def name(params) { expression } global_helper_computation name(params); condition. Then, the transform step (and the associated iterate global_variable def name = expression call) corresponding to the matched rule body is created and OCL_Expression Mogwaï4 added at the end of the traversal. In our example this operation generates lines 15 to 18. 2) Operation Composition: The above mappings explain Out pattern elements contained in the matched rule body are how ATL constructs are mapped individually into the corre- retrieved and the corresponding Gremlin instructions (lines 19 sponding Gremlin steps. In this section we present an overview to 24) are added to the transform step closure. Finally, Rule’s of the algorithm used to compose these generated steps into a bindings are transformed following the mapping and added in complete query. Listing 2 shows the final Gremlin output for the closure’s instructions. Note that helper calls inside bind- the transformation example shown in Listing 1. ing’s expressions are also translated into the corresponding In order to generate a complete Gremlin query, our trans- method calls during this operation. formation has to navigate the input ATL model and link the generated elements together. First, the transformation searches C. Auxiliary Components all the helper definitions (including global ones) and trans- 1) Model Mapping: The Model Mapping library defines the lates them according to the mapping shown in Table I. The basic modeling operations that can be computed by a given generated functions are added to the Gremlin script container, database storing a model. It provides a simple API allowing making them visible for the translated ATL rules. 
This first designers to express the implicit schema of their database with step generates the lines 1 to 8 for the helpers getVisibility. modeling primitives. Model Mapping can be manually imple- Note that this initial phase also generates the function that mented for a given database, or automatically extracted using schema inference techniques such as NoSQLDataEngineering 3 The body of abstract rules is duplicated in the transform step of all its [22]. This mapping is used within the generated Gremlin query sub-rules 4OCL expression are translated by an improved version of the Mogwaï to access all the elements of a given type, retrieve element’s framework attribute values, or navigate references. Table II summarizes the mapping operations used created from a single source an optional. bindingName in Gremlin-ATL and groups them into two categories: can be specified to tell the engine which one to choose. metamodel-level and model-level operations. The first group - isResolvable(el): computes whether an element can be provides high-level operations that operate at the metamodel resolved. level, such as retrieving the type of an element, type con- - isTarget(el): computes whether an element is in the target formance checks, new instances creation, and retrieve all the model. elements conforming to a type. The second group provides - isSource(el): computes whether an element is in the methods that compute element-based navigation, such as re- source model. trieving a referenced element, computing the parent/children The two first operations are wrappers around mapping of an element, and access its attributes. Finally, these model- operations that add model transformation specific behavior to level methods allow to update and delete existing references the mapping. The createElement(type, sourceElement and attributes. 
) operation delegates to the mapping operation newInstance Note that the ATLtoGremlin transformation only relies on (type), and adds a trace link between the created element the definition of these operations to generate a Gremlin query, and the source one. link(from, ref, to) delegates to the and is not tailored to a specific Model Mapping implementa- mapping operation setRef to create a regular reference between tion. from and to or a trace link if to is not yet part of the target model. TABLE II The five last operations define transformation specific be- GRAPH MAPPING API haviors, such as retrieving the target element from an input Operation Description one, resolve trace links, or compute whether an element is allOfType(type) Returns all the strict instances of the given type part of the source of the target model. Note that Gremlin- allOfKind(type) Returns all the instances of the given type or one of its sub-types ATL provides two implementations (one in memory and one getType(el) Returns the type of the element el in a database) of this library that can be directly plugged to isTypeOf(el, type) Computes whether el is a strict instance of type compute a transformation. isKindOf(el, type) Computes whether el is an instance of type or one of its sub-types IV. RUNTIME FRAMEWORK newInstance(type) Creates a new instance of the given type getParent(el) Returns the element corresponding to the par- The Gremlin script generated by the ATLtoGremlin trans- ent of el formation is generic and relies on the Model Mapping and getChildren(el) Returns the elements corresponding to the chil- dren of el Transformation Helper interfaces. 
In order to execute it on an getRef(from, ref) Returns the elements connected to from with existing database, the framework needs to bind these abstract the reference ref interfaces to their concrete implementations designed to access setRef(from, ref, to) Creates a reference ref between from and to delRef(from, ref, to) Deletes the reference ref between from and to a specific database. getAtt(from, att) Returns the value of the attribute att contained Figure 5 shows our framework from a user point of view: in the element from the user provides an ATL query to the Query Translation setAtt(from, att, v) Set the value of the attribute att of the element from to the given value v component (1), which compiles the query into a Gremlin delAtt(from, att) Deletes the attribute att contained in the ele- script and returns it to the user (2). The generated script ment from is then sent to the Query Execution API with the concrete implementation of the Model Mapping and a Transformation 2) Transformation Helper: The second component used by Helper (3) the user wants to use during the execution. The the ATLtoGremlin transformation is a Transformation Helper internal Gremlin engine finally computes the transformation library that provides transformation-related operations that can using these implementations (4) and returns the result (such be called within the generated Gremlin traversal. It is based as execution traces, or transformation errors if any) to the user on the ATL execution engine (presented in Section II-A), and (5). provides a set of methods wrapping a Model Mapping with This architecture allows to pre-compile transformations into transformation specific operations. Note that this library aims Gremlin scripts and execute them later with specific imple- to be generic and can be used directly in ad hoc Gremlin mentations of the Model Mapping. In addition, generated scripts to express transformation rules. 
scripts can be executed multiple times on different mappings Specifically, our TransformationHelper provides the follow- without recompilation. The framework can also process and ing interface: compile new input transformations on the fly, generating the - createElement(type, source): creates a new instance of corresponding Gremlin scripts dynamically. the given type mapped to the provided source element. As an example, we can define a possible Model Mapping - link(from, ref, to): creates a link between from and to. implementation for the persisted model of Figure 3 that would - resolveTraces(source, target): resolves the trace links compute the allOfType(Type) operation with an access connected to source with the target. to the vertex representing the metaclass Type and a backward - getTarget(source, bindingName): retrieves the target navigation of its incoming instanceof edges. The getAtt element mapped to source. If multiple target elements are and getRef operations can be translated into property access ATL Gremlin-ATL engine transparently, on top of a preset NeoEMF Transformation Model Mapping implementation. The resulting model is com- (1) Query patible with NeoEMF and its content can be reified and Gremlin manipulated using the standard modeling API if needed. This Script Translation (2) transparent integration allows the use of Gremlin-ATL for (3) (5) Gremlin critical model transformations, while keeping the rest of the T_Helper Script Mapping application code unchanged. We believe that this approach can (4) ease the integration of Gremlin-ATL into existing modeling Mapping Query applications that have to compute complex transformations on Execution T_Helper the top of large models. The current implementation embeds a set of predefined Fig. 5. User Point of View Model Mappings and Transformation Helpers. Both can easily be adapted to specify how you want transformation-related and edge navigation, respectively, and the newInstance operations to be computed. 
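The graph encoding used by this example mapping can be sketched with a toy reverse-indexed edge store. TinyGraph and its layout are illustrative assumptions, not NeoEMF's actual schema:

```java
import java.util.*;

// Illustrative sketch only: a toy reverse-indexed edge store where
// allOfType(t) is answered by the backward navigation of the incoming
// "instanceof" edges of the vertex representing metaclass t.
class TinyGraph {
    // edge label -> (edge target -> list of edge sources)
    private final Map<String, Map<String, List<String>>> incoming = new HashMap<>();

    void addEdge(String from, String label, String to) {
        incoming.computeIfAbsent(label, k -> new HashMap<>())
                .computeIfAbsent(to, k -> new ArrayList<>())
                .add(from);
    }

    // allOfType: follow the incoming "instanceof" edges of the metaclass vertex
    List<String> allOfType(String metaclass) {
        return incoming.getOrDefault("instanceof", Map.of())
                       .getOrDefault(metaclass, List.of());
    }
}
```

With two element vertices linked to a Type vertex by instanceof edges, allOfType("Type") returns both elements by touching only the neighborhood of the metaclass vertex, which mirrors how the backward instanceof navigation avoids iterating over the whole model.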
Once the Model Mapping interface is bound to its implementation, the modeler can choose the Transformation Helper to be used to compute the transformation. The choice of a specific implementation is a trade-off between execution time and memory consumption: a Transformation Helper storing transformation information (such as trace links) in memory will be efficient when computing transformations on top of relatively small models, but will not scale to larger ones. Conversely, an implementation relying on a dedicated database can scale to larger models, but is not optimal on small ones. Gremlin-ATL defines two default implementations (one in memory and one in a database) that can be directly plugged in to compute a transformation, and that can be extended to tune performance further.
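This trade-off essentially comes down to where trace links are stored. The sketch below contrasts the two flavors; all class and method names are hypothetical, since the paper does not show the helper's internal API:

```java
import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of the two Transformation Helper flavors: both record
// source -> target trace links, but one keeps them on the JVM heap while the
// other delegates to an external store (a database in Gremlin-ATL).
interface TraceStore {
    void put(String sourceId, String targetId);
    String get(String sourceId);
}

// Fast, but every trace link stays in memory: fine for small models only.
class InMemoryTraceStore implements TraceStore {
    private final Map<String, String> traces = new HashMap<>();
    public void put(String s, String t) { traces.put(s, t); }
    public String get(String s) { return traces.get(s); }
}

// Delegates each access to a backing store: slower per operation, but the
// memory footprint stays flat as the transformed model grows.
class DelegatingTraceStore implements TraceStore {
    private final BiConsumer<String, String> writer;
    private final Function<String, String> reader;
    DelegatingTraceStore(BiConsumer<String, String> w, Function<String, String> r) {
        this.writer = w;
        this.reader = r;
    }
    public void put(String s, String t) { writer.accept(s, t); }
    public String get(String s) { return reader.apply(s); }
}
```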
V. TOOL SUPPORT

Gremlin-ATL is released as a set of open source Eclipse plugins publicly available on GitHub.5

Input ATL transformations are parsed using the standard ATL parser, which creates a model representing the abstract syntax of the transformation. The generated model is sent to the ATLtoGremlin ATL transformation, which contains around 120 rules and helpers implementing the translation presented in Section III. Once the Gremlin model has been generated, it is transformed into a textual Gremlin query using a model-to-text transformation. A final step binds the Model Mapping and Transformation Helper definitions to their concrete implementations, and the resulting script is sent to a Gremlin script engine, which is responsible for computing the query on top of the database. The updated database is finally saved with the transformed content.

Additionally, a preconfigured implementation of Gremlin-ATL is bundled with the NeoEMF model persistence framework, extending the standard Resource API with transformation operations. ATL transformations are computed using the Gremlin-ATL engine transparently, on top of a preset NeoEMF Model Mapping implementation. The resulting model is compatible with NeoEMF, and its content can be reified and manipulated using the standard modeling API if needed. This transparent integration allows the use of Gremlin-ATL for critical model transformations while keeping the rest of the application code unchanged. We believe that this approach can ease the integration of Gremlin-ATL into existing modeling applications that have to compute complex transformations on top of large models.

The current implementation embeds a set of predefined Model Mappings and Transformation Helpers. Both can easily be adapted to specify how transformation-related operations should be computed. This is done by extending a set of abstract classes with limited implementation effort. As an example, the model mapping used to access Neo4j databases storing NeoEMF models defines 23 methods (around 250 LOC), and the in-database Transformation Helper defines 9 methods (around 100 LOC) implementing the transformation operations and storing intermediate results in the database.

5 https://github.com/atlanmod/Mogwai

VI. EVALUATION

In this section we evaluate the performance of the Gremlin-ATL framework by comparing the execution performance of the same set of ATL transformations using the standard ATL engine and the Gremlin-ATL framework. Note that we do not consider alternative transformation frameworks in this evaluation, because they are either based on the same low-level modeling API as ATL (such as QVT [16]), or are not designed to optimize the same transformation scenario (such as Viatra [23]).

Our evaluation aims to address the following research questions: (i) is Gremlin-ATL faster than standard ATL when applied to large models stored in scalable persistence frameworks? and (ii) does our approach scale better in terms of memory consumption?

In the following we benchmark our approach with two transformations: a toy one created on purpose for this analysis, and a more complex one taken from an industrial project involving a reverse engineering scenario. Transformations are evaluated on a set of models of increasing size, stored in Neo4j using the NeoEMF mapping. Transformed models are also stored in NeoEMF using the same mapping. Experiments are executed on a computer running Fedora 20 64 bits. Relevant hardware elements are: an Intel Core i7 processor (2.7 GHz), 16 GB of DDR3 SDRAM (1600 MHz), and an SSD hard disk. Experiments are executed on Eclipse 4.6.0 (Neon) running Java SE Runtime Environment 1.8.

A. Benchmark Presentation

Experiments are run over five models (sets 1 to 5) of increasing sizes, automatically constructed by the MoDisco [2] Java Discoverer, a reverse engineering tool that extracts low-level models from Java code. The first two models are generated from internal projects containing a few dozen classes, and the larger ones are computed from the MoDisco framework itself and the Java Development Tools (JDT) plugins embedded in Eclipse. Table III presents the size of the input models in terms of number of elements and XMI file size. These models are migrated to NeoEMF/Graph before executing the benchmark to enable scalable access to their contents.

TABLE III
BENCHMARKED MODELS

  Model   # Elements   XMI Size (MB)
  set1    659          0.1
  set2    6756         1.7
  set3    80665        20.2
  set4    1557007      420.6
  set5    3609354      983.7

In order to evaluate our approach, we perform two transformations that take as input Java models conforming to MoDisco's Java metamodel and translate them into models conforming to the Knowledge Discovery Metamodel (KDM) [24], a pivot metamodel used to represent software artifacts independently of their platform.

The AbstractTypeDeclaration2DataType transformation matches all the AbstractTypeDeclaration elements (declared classes, interfaces, enumerations, etc.) of the input model and creates the corresponding KDM DataTypes. It is composed of a single rule that matches all the subtypes of AbstractTypeDeclaration. The second benchmarked transformation is a subset of the Java2KDM transformation, extracted from an existing industrial transformation that takes a low-level Java model and creates the corresponding abstract KDM model. The transformations are run in a 512 MB JVM with the arguments -server and UseConcMarkSweepGC that are recommended by Neo4j.

B. Results

Tables IV and V present the results of executing the transformations on top of the input model sets. The first column contains the name of the input model of the transformation; the following columns present the execution time and the memory consumption of the ATL engine and Gremlin-ATL, respectively. Execution times are expressed in milliseconds and memory consumption in megabytes.

The correctness of the output models is checked by comparing the results of our approach with the ones generated by running the ATL transformation on the original input XMI file, using a virtual machine large enough to handle it. The comparison is performed using EMF Compare [25].

TABLE IV
ABSTRACTTYPEDECLARATION2DATATYPE RESULTS

          Execution Time (ms)     Memory Consumption (MB)
  Model   ATL       Gremlin-ATL   ATL     Gremlin-ATL
  set1    1633      710           2.0     8.3
  set2    3505      1139          3.2     10.0
  set3    11480     1649          17.6    11.7
  set4    67204     3427          99.3    23.0
  set5    OOM(6)    11843         OOM     100.0

6 The application threw an OutOfMemory error after two hours.

TABLE V
JAVA2KDM RESULTS

          Execution Time (ms)     Memory Consumption (MB)
  Model   ATL       Gremlin-ATL   ATL     Gremlin-ATL
  set1    1735      1341          3.2     11.2
  set2    4874      2469          11.0    12.8
  set3    33407     4321          45.2    23.2
  set4    5156798   38402         504.5   52.0
  set5    OOM       129908        OOM     96.0

C. Discussion

The results presented in Tables IV and V show that Gremlin-ATL is a good candidate to compute model transformations on top of large models stored in NeoEMF/Graph. Our framework is faster than the standard ATL engine for the two benchmarked transformations, outperforming it both in terms of execution time and memory consumption for the larger models.

The results also show that the complexity of the transformation has a significant impact on ATL's performance. This is particularly visible when the transformations are evaluated on the large models: Java2KDM is 2.9 times slower than AbstractType2DataType on set3, and up to 76 times slower on set4. This difference is explained by the large number of low-level modeling API calls that are generated by Java2KDM in order to retrieve the matching elements and to compute rules' conditions and helpers.

Gremlin-ATL's execution time is less impacted by the transformation complexity, because the generated Gremlin query is entirely computed by the database, bypassing the modeling API. The database engine optimizes the query to detect access patterns and cache elements efficiently, and benefits from the database indexes to retrieve elements efficiently. Results on set4 show that Gremlin-ATL is faster than ATL by a factor of 19 and 134 for AbstractType2DataType and Java2KDM, respectively.

The results also show that ATL's performance on set4 and set5 is tightly coupled to memory consumption. Indeed, in our experiments we measured that most of the execution time was spent in garbage collection operations. This high memory consumption is caused by the in-memory nature of the engine, which keeps in memory all the matched elements as well as the trace links between source and target models. When the input model grows, this in-memory information grows accordingly, triggering the garbage collector, which blocks the application until enough memory has been freed. In addition, the intensive usage of the low-level model handling API increases the memory overhead by reifying intermediate modeling elements that also stay in memory.

Conversely, Gremlin-ATL does not require these complex structures, because trace links are stored within the database itself and can be removed from memory if needed. This implementation avoids garbage collection pauses and allows scaling to very large models. Our approach operates on the optimized database structures, and thus does not have to reify modeling elements, improving the memory consumption. Looking at the Java2KDM transformation, this strategy reduces the memory consumption by a factor of 10 for set4.

ATL requires less memory than Gremlin-ATL to compute the transformations on top of the small models (sets 1 and 2). This difference is caused by the initialization of the Gremlin engine (and its underlying Groovy interpreter) used to evaluate the generated scripts. However, this extra memory consumption has a fixed size and depends neither on the evaluated transformation nor on the transformed model.

Note that the results only show the execution time and memory consumption related to the computation of the transformations: we did not take into account the time and memory required to generate an executable file that can be interpreted by the ATL engine, nor the time needed to produce the Gremlin script, because this extra cost is fixed for a given transformation and does not depend on the model size. In addition, Gremlin-ATL allows pre-compiling and caching existing Gremlin queries in order to limit script generation.

As a summary, our experiments report that using Gremlin-ATL to compute a well-known model transformation outperforms the standard ATL engine when applied on top of large models stored in current model persistence frameworks. Still, additional experiments could reinforce these conclusions. We plan to extend this preliminary evaluation by reusing transformations available in the ATL Zoo.7

7 https://www.eclipse.org/atl/atlTransformations/
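The factors quoted in this discussion can be recomputed directly from the raw numbers of Tables IV and V; the snippet below only restates that arithmetic (the text reports the floored values):

```java
// Recomputing the ratios quoted in the discussion from the raw
// execution times (ms) of Tables IV and V.
public class SpeedupCheck {
    public static void main(String[] args) {
        // ATL vs Gremlin-ATL on set4: ~19x and ~134x
        double speedupAbs  = 67204.0 / 3427.0;     // ~19.6
        double speedupJava = 5156798.0 / 38402.0;  // ~134.3
        // ATL only, Java2KDM vs AbstractType2DataType: ~2.9x (set3), ~76x (set4)
        double ratioSet3 = 33407.0 / 11480.0;      // ~2.9
        double ratioSet4 = 5156798.0 / 67204.0;    // ~76.7
        System.out.println(speedupAbs + " " + speedupJava + " "
                + ratioSet3 + " " + ratioSet4);
    }
}
```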
VII. AN APPLICATION TO DATA MIGRATION

We have seen how Gremlin-ATL can improve the performance of transformations for large models. In this section, we show how Gremlin-ATL can also be useful in the context of a data migration process when the schema of the data is unavailable (the common scenario in many NoSQL databases, which are schemaless). To do so, we implement a simple migration operation on top of two data sources, and emphasize that using ATL as a migration language shortens the amount of code required for the migration, with a limited impact on the application performance.

In this example, the input of our process is the open source ICIJ Offshore Leaks Database,8 which contains the results of the Panama Papers investigations stored in a Neo4j database. The output of the process is a simple H2 relational database9 containing the migrated data.

The input database contains one million elements divided into four categories represented as node labels: Officers, Entities, Addresses, and Intermediates. Connections between elements are represented by labeled edges; for example, the shareholder of an entity is represented by an edge with the label SHAREHOLDER_OF between the corresponding Officer and Entity nodes. Other edges are used to represent the different roles of an Officer, the connections between Entities and Intermediates, and their Addresses. Finally, node properties are used to represent additional information, such as the different fields of an Address record.

The operation to perform is a simple data extraction task that retrieves from the input database all the Officer elements that are shareholders of a company. Results are stored in a relational table containing the Officer names and the name of the corresponding Entity.

We define a Neo4j mapping allowing to navigate the input database as a model: allOfType(type) operations are computed by searching for the nodes labeled by type, and newInstance(type) instructions are mapped to the creation of a node with a label representing the type. We also define getRef(from, ref) navigation operations as edge navigations, and getAtt(from, att) operations as property accesses on the node representing from. Note that this mapping does not contain any information specific to the ICIJ Offshore Leaks Database, and can be reused on top of any Neo4j database using the same data representation.

To let the framework store the results in the desired relational database, we defined an additional mapping that is able to access a relational database and serialize model elements in tables representing their types. Attributes are mapped to table columns, and references to foreign keys. Due to the lack of space we do not detail this mapping, but the complete definition is available on the project repository.

Listing 3 shows the ATL transformation rule that expresses our operation: it matches all the Officer entities that correspond to Entity shareholders in the input database, and creates a corresponding CompanyShareholder record in the output model containing the names of the Officer and the related Entity. Listing 4 presents the corresponding Java code that uses the Neo4j and JDBC APIs to manipulate the databases.

  rule shareholder2relationalOfficer {
    from s : IN!Officer (not(s.name.oclIsUndefined())
        and not(s.SHAREHOLDER_OF->isEmpty()))
    to t : OUT!CompanyShareholder(
        name <- s.name,
        company <- s.SHAREHOLDER_OF.name)
  }

Listing 3. Company Shareholder Migration ATL Rule

When the transformation is computed, the source and target mappings are used to execute the modeling operations defined in the ATL transformation. For example, the navigation of the SHAREHOLDER_OF reference is delegated to the Neo4j mapping, while the creation of a CompanyShareholder instance is computed by the relational mapping, which creates a new table CompanyShareholder if it does not exist and inserts a new record in it containing the provided information.

  Connection c = ds.getConnection();
  c.createStatement().execute("create table if not exists "
      + "CompanyShareholder (id integer not null auto_increment, "
      + "name varchar(200), company varchar(200), primary key (id));");
  c.commit();
  try (Transaction tx = graphdb.beginTx()) {
    try (ResourceIterator<Node> officers =
        graphdb.findNodes(Label.label("Officer"))) {
      while (officers.hasNext()) {
        Node off = officers.next();
        Iterable<Relationship> rels = off.getRelationships(
            Direction.OUTGOING, RelationshipType.withName("SHAREHOLDER_OF"));
        for (Relationship r : rels) {
          Node head = r.getEndNode();
          if (off.hasProperty("name") && head.hasProperty("name")) {
            String officerName = (String) off.getProperty("name");
            String companyName = (String) head.getProperty("name");
            c.createStatement().execute("insert into CompanyShareholder values "
                + "(NULL, '" + officerName + "', '" + companyName + "');");
          }
        }
      }
    }
  }

Listing 4. Company Shareholder Migration Java

As illustrated by the provided listings, the ATL program is around two times shorter than the corresponding Java implementation. In addition, the ATL transformation only relies on ATL language constructs, and does not contain any explicit database-specific operation. We believe that using ATL as a common language to perform data migration is interesting when modelers have to integrate data from heterogeneous sources, and do not have the expertise to write efficient queries in both the input and output database languages.

The execution of this data migration operation creates 296041 records in the output database. The Java implementation is computed within 10342 ms, and the Gremlin-ATL one in 11006 ms. This execution time increase of 6.4% is caused by the underlying Gremlin engine, which adds a small execution overhead, and by our modeling layer, which wraps the native APIs. Still, we think that the gains in terms of code readability and maintainability can balance this slight overhead, and constitute an interesting solution to integrate data from multiple sources.

8 Available online at https://offshoreleaks.icij.org/pages/database
9 http://www.h2database.com
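The create-the-table-on-first-use behavior of the relational target mapping can be sketched with an in-memory stand-in. RelationalTargetStub is a hypothetical class for illustration; the real mapping issues the equivalent JDBC statements:

```java
import java.util.*;

// Hypothetical in-memory stand-in for the relational target mapping:
// the first element of a type creates its "table", later elements
// append one row each. The actual Gremlin-ATL mapping performs the
// equivalent steps through JDBC against the output database.
class RelationalTargetStub {
    private final Map<String, List<Map<String, String>>> tables = new LinkedHashMap<>();

    // Serializes one model element into the table named after its type.
    void insert(String type, Map<String, String> attributes) {
        tables.computeIfAbsent(type, k -> new ArrayList<>()) // create table if absent
              .add(new LinkedHashMap<>(attributes));
    }

    int rowCount(String type) {
        return tables.getOrDefault(type, List.of()).size();
    }
}
```

Inserting two CompanyShareholder elements creates a single table holding two rows, mirroring the create-if-absent-then-insert behavior described above.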
VIII. RELATED WORK

Several solutions have been proposed to parallelize and distribute model transformations in order to improve the efficiency and scalability of existing transformation engines. ATL-MR [26] is a map-reduce based implementation of the ATL engine that computes transformations on top of models stored in HBase. The tool benefits from the distributed nature of the database to compute ATL rules in parallel, improving the overall execution time. Parallel-ATL [27] is an implementation of the ATL engine able to benefit from a multicore environment by splitting the execution into several workers that access a global shared model asynchronously. The LinTra [28] platform is another solution that relies on the Linda coordination model to enable concurrency and distribution of model transformations.

Compared to these approaches, Gremlin-ATL does not require a parallel infrastructure to optimize the transformation, but instead relies on the scalability mechanisms of the database engine, one of the strong points and initial motivations for the apparition of NoSQL databases. For example, the distributed Neo4j database provides a Gremlin endpoint that is able to efficiently execute traversals on graph databases stored in a cluster, by splitting the computation according to the data localization. The Gremlin language itself provides native parallelization constructs (e.g., the split-merge step) that could be used to further improve transformation execution performance.

The Viatra project [23] is an event-driven and reactive model transformation platform that relies on an incremental pattern matching language to access and transform models. Viatra receives model update notifications and uses its incremental engine to re-compute queries and transformations in an efficient way, at the cost of a higher memory consumption. Viatra and Gremlin-ATL work best in different scenarios: Viatra is very efficient when a set of queries and transformations are executed multiple times on a model, while Gremlin-ATL is designed to efficiently perform single transformation computations.

Transforming large models stored in databases is also related to the schema matching and data migration problems targeted by the database community [29]. Several approaches such as COMA [30] or Cupid [31] have been proposed to detect equivalent constructs in data schemas (using schema information or instance analysis), and have been integrated into data migration frameworks [32] to semi-automate data migration between heterogeneous sources. While Gremlin-ATL could benefit from these approaches (e.g., by automatically constructing Model Mappings), they usually focus on the schema matching phase and not on the efficient execution of the data migration processes based on those mappings.

Our approach can also be combined with proposals on efficient graph transformation computation, such as the approach proposed by Krause et al. that parallelizes graph transformations using the Bulk Synchronous Parallel (BSP) framework [33]. The Gremlin language provides a connector to enable Giraph computation on top of graph databases,10 which could be used to link both approaches.

IX. CONCLUSION

In this article we presented Gremlin-ATL, a framework that computes rule-based transformations by reexpressing them as database queries written in the Gremlin graph traversal language. Our evaluation shows that using Gremlin-ATL to transform models significantly improves the performance both in terms of execution time and memory consumption.

As future work, we plan to extend our approach into a family of mappings adapted to different NoSQL backends, in order to provide a set of NoSQL modeling frameworks able to both store and manipulate models "natively". We also plan to complete our mapping with the ATL constructs that are not yet supported, and to study translations from other transformation languages, which could reuse many of the patterns defined for ATL. Finally, we want to explore how Gremlin-ATL can be used in other scenarios involving the query, evolution, and integration of unstructured data. Manipulating such data requires first discovering its implicit schema(s).
This schema can be naturally expressed as an explicit model, and would therefore be a natural fit for our framework, which could then be used to facilitate its manipulation.

X. ACKNOWLEDGEMENT

This work has been partially funded by the Electronic Component Systems for European Leadership Joint Undertaking under grant agreement No. 737494 (MegaM@Rt2 project) and the Spanish government (TIN2016-75944-R project).

10 http://tinkerpop.apache.org/providers.html

REFERENCES

[1] S. Azhar, "Building information modeling (BIM): Trends, benefits, risks, and challenges for the AEC industry," Leadership and Management in Engineering, vol. 11, no. 3, pp. 241-252, 2011.
[2] H. Bruneliere, J. Cabot, G. Dupé, and F. Madiot, "MoDisco: A model driven reverse engineering framework," Information and Software Technology, vol. 56, no. 8, pp. 1012-1032, 2014.
[3] J. L. C. Izquierdo and J. Cabot, "JSONDiscoverer: Visualizing the schema lurking behind JSON documents," Knowledge-Based Systems, vol. 103, pp. 52-55, 2016.
[4] F. Jouault, F. Allilaire, J. Bézivin, and I. Kurtev, "ATL: A model transformation tool," Science of Computer Programming, vol. 72, no. 1, pp. 31-39, 2008.
[5] N. Huijboom and T. Van den Broek, "Open data: an international comparison of strategies," European Journal of ePractice, vol. 12, no. 1, pp. 4-16, 2011.
[6] J. Hutchinson, J. Whittle, and M. Rouncefield, "Model-driven engineering practices in industry: Social, organizational and managerial factors that lead to success or failure," Science of Computer Programming, vol. 89, pp. 144-161, 2014.
[7] J. Whittle, J. Hutchinson, and M. Rouncefield, "The state of practice in model-driven engineering," IEEE Software, vol. 31, no. 3, pp. 79-85, 2014.
[8] D. S. Kolovos, L. M. Rose, N. Matragkas, R. F. Paige, E. Guerra, J. S. Cuadrado, J. De Lara, I. Ráth, D. Varró, M. Tisi et al., "A research roadmap towards achieving scalability in model driven engineering," in Proceedings of BigMDE'13. ACM, 2013, pp. 1-10.
[9] OMG, "UML Specification," 2017. [Online]. Available: www.omg.org/spec/UML
[10] Eclipse Foundation, "The CDO Model Repository (CDO)," 2017. [Online]. Available: http://www.eclipse.org/cdo/
[11] G. Daniel, G. Sunyé, A. Benelallam, M. Tisi, Y. Vernageau, A. Gómez, and J. Cabot, "NeoEMF: a multi-database model persistence framework for very large models," in Proceedings of the MoDELS 2016 Demo and Poster Sessions. CEUR, 2016, pp. 1-7.
[12] J. E. Pagán, J. S. Cuadrado, and J. G. Molina, "A repository for scalable model management," Software & Systems Modeling, vol. 14, no. 1, pp. 219-239, 2015.
[13] R. Wei and D. S. Kolovos, "An efficient computation strategy for allinstances()," Proceedings of BigMDE'15, pp. 32-41, 2015.
[14] Tinkerpop, "The Gremlin Language," 2017. [Online]. Available: www.gremlin.tinkerpop.com
[15] OMG, "OCL Specification," 2017. [Online]. Available: www.omg.org/spec/OCL
[16] OMG, "QVT Specification," 2017. [Online]. Available: http://www.omg.org/spec/QVT
[17] J. E. Pagán, J. S. Cuadrado, and J. G. Molina, "Morsa: A scalable approach for persisting and accessing large models," in Proceedings of the 14th MoDELS Conference. Springer, 2011, pp. 77-92.
[18] A. Benelallam, A. Gómez, G. Sunyé, M. Tisi, and D. Launay, "Neo4EMF, a scalable persistence layer for EMF models," in Proceedings of the 10th ECMFA. Springer, 2014, pp. 230-241.
[19] Tinkerpop, "Blueprints API," 2017. [Online]. Available: www.blueprints.tinkerpop.com
[20] A. Gómez, G. Sunyé, M. Tisi, and J. Cabot, "Map-based transparent persistence for very large models," in Proceedings of the 18th FASE Conference. Springer, 2015, pp. 19-34.
[21] G. Daniel, G. Sunyé, and J. Cabot, "Mogwaï: a framework to handle complex queries on large models," in Proceedings of the 10th RCIS Conference. IEEE, 2016, pp. 225-237.
[22] D. S. Ruiz, S. F. Morales, and J. G. Molina, "Inferring versioned schemas from NoSQL databases and its applications," in Proceedings of the 34th ER Conference. Springer, 2015, pp. 467-480.
[23] G. Csertán, G. Huszerl, I. Majzik, Z. Pap, A. Pataricza, and D. Varró, "Viatra: visual automated transformations for formal verification and validation of UML models," in Proceedings of the 17th ASE Conference. IEEE, 2002, pp. 267-270.
[24] OMG, "Knowledge Discovery Metamodel (KDM)," 2017. [Online]. Available: http://www.omg.org/technology/kdm/
[25] C. Brun and A. Pierantonio, "Model differences in the Eclipse Modeling Framework," UPGRADE, The European Journal for the Informatics Professional, vol. 9, no. 2, pp. 29-34, 2008.
[26] A. Benelallam, A. Gómez, M. Tisi, and J. Cabot, "Distributed model-to-model transformation with ATL on MapReduce," in Proceedings of the 8th SLE Conference. ACM, 2015, pp. 37-48.
[27] M. Tisi, S. Martinez, and H. Choura, "Parallel execution of ATL transformation rules," in Proceedings of the 16th MoDELS Conference. Springer, 2013, pp. 656-672.
[28] L. Burgueño, M. Wimmer, and A. Vallecillo, "A Linda-based platform for the parallel execution of out-place model transformations," Information and Software Technology, vol. 79, pp. 17-35, 2016.
[29] E. Rahm and P. A. Bernstein, "A survey of approaches to automatic schema matching," The VLDB Journal, vol. 10, no. 4, pp. 334-350, 2001.
[30] H.-H. Do and E. Rahm, "COMA: a system for flexible combination of schema matching approaches," in Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, pp. 610-621.
[31] J. Madhavan, P. A. Bernstein, and E. Rahm, "Generic schema matching with Cupid," in The VLDB Journal, vol. 1, 2001, pp. 49-58.
[32] C. Drumm, M. Schmitt, H.-H. Do, and E. Rahm, "QuickMig: automatic schema matching for data migration projects," in Proceedings of the 16th CIKM Conference. ACM, 2007, pp. 107-116.
[33] C. Krause, M. Tichy, and H. Giese, "Implementing graph transformations in the bulk synchronous parallel model," in Proceedings of the 17th FASE Conference, vol. 14, 2014, pp. 325-339.