Delta July 5, 2018

Architectural Document

Version 1.0

Project Team
D.P. van den Berg 0949036
R.T. van Bergen 0938857
D.J.C. Dekker 0936100
S. van den Eerenbeemt 0954445
J. Mols 0851883
B.F. Rongen 0858160
B.W.M. van Rooijen 0895073
R.P. Schellekens 0944330
A.A. Vast 0854060
G. Walravens 0904152
S. Wessel 0941508

Project Managers S.P.O. Oostveen A. Rajaraman

Project Supervisor dr. N. Zannone

Customer dr. L. Genga

Abstract

This is the Architectural Design Document (ADD) for the APD extension developed by Delta. The APD extension adds several features to the existing APD tool, such as user management, experiment management and project sharing. The architectural design decisions in this ADD satisfy the software requirements in the SRD [1]. This document complies with the Software Engineering Standard, as specified by the European Space Agency [2].

Contents

1 Introduction
1.1 Purpose
1.2 Scope
1.3 List of definitions and abbreviations
1.3.1 List of definitions
1.3.2 List of abbreviations
1.4 List of references
1.5 Overview

2 System overview
2.1 Background
2.2 Context and basic design
2.3 Design decisions
2.3.1 Extending the APD tool
2.3.2 Programming language
2.3.3 Operating system
2.3.4 Model-View-Controller design pattern
2.3.5 Client-server model
2.3.6 Web interface
2.3.7 Access control

3 System context
3.1 APD tool
3.2 MySQL
3.2.1 Implementation of MySQL
3.3 SMTP mail server
3.3.1 Implementation of the SMTP mail server

4 System design
4.1 Design method
4.2 Decomposition
4.2.1 Back-end decomposition
4.2.2 Front-end decomposition
4.2.3 File structure
4.2.4 ...

5 Components

6 Feasibility and resource estimates
6.1 Resource requirements
6.1.1 Using the web application
6.1.2 Hosting the Delta extension
6.1.3 Developing the Delta extension
6.2 Performance

7 Requirements traceability matrix
7.1 Software requirements to modules
7.2 Modules to software requirements
7.2.1 Back-end modules
7.2.2 Front-end modules

Document Status

General

Title of document: Architectural Design Document
Document identifier: Delta.ADD/1.0
Version: 1.0
Authors: D.P. van den Berg, R.T. van Bergen, D.J.C. Dekker, S. van den Eerenbeemt, J. Mols, B.F. Rongen, B.W.M. van Rooijen, R.P. Schellekens, A.A. Vast, G. Walravens, S. Wessel

History

Version 0.1, 2018-06-15
Author(s): D.P. van den Berg, S. van den Eerenbeemt, J. Mols, B.W.M. van Rooijen, S. Wessel
Reason: Initial draft.

Version 1.0, 2018-07-15
Author(s): D.P. van den Berg, S. van den Eerenbeemt, J. Mols, B.W.M. van Rooijen, R.P. Schellekens, S. Wessel
Reason: Added section 4 and implemented feedback.

Delta Architectural Design Document

Document Change Records

Section  Reason
4.2      Added component decomposition.

1 Introduction

1.1 Purpose

The Architectural Design Document (ADD) describes the fundamental design of the software that will be developed by the Delta team. In this document, an overview of the system is provided. This document also describes the design decisions that were made, and describes the decomposition of the software into several components. It is also specified which software requirements in the SRD [1] each component fulfills.

1.2 Scope

The APD tool is designed to extract anomalous patterns together with their correlations. These patterns are extracted from historical logging data from past process executions. Users can upload event logs and process models on which experiments can be run. The results of these experiments can be visualized after they are completed [3].

The purpose of the Delta extension is to extend the existing APD tool with several features. First of all, the extension will support multiple users, which means the tool needs proper user management controls. A user management system will be created, and several improvements have to be made to the user interface in order to realize these goals. Next to this, visualization of the results is important, as well as reporting on the progress of the experiments.

The tool is currently able to discover patterns and subgraphs from a business process model and an event log. These patterns and subgraphs give information on the business process and on anomalous traces in the event log. In the current version, however, comparing different result types is cumbersome. Delta will provide a more convenient way to manage experiments and their results, as well as the projects they belong to. Furthermore, the administrator will be able to manage the users and the projects created by users, and to monitor the tracked activity of each individual user. These controls will also be provided by Delta.

1.3 List of definitions and abbreviations

1.3.1 List of definitions

Administrator A registered user with the highest available access rights who manages both the tool and its users.

Anomalous subgraph discovery The extraction of recurrent subgraphs involving one or more deviations from the process model [3].

APD tool The APD tool is an extension of the Esub tool designed to extract anomalous patterns together with their correlations. These patterns are extracted from historical logging data from past process executions. Users can upload event logs and process models on which experiments can be run. After the experiments are completed, the tool supports the users in exploring the obtained results [3].

Business process A set of activities performed in an organizational and technical environment that are coordinated to deliver a certain product or service [4].

Child subgraph The child S of a subgraph S' is a subgraph which involves S' in its definition.

Component A part of a phase of an experiment.

Esub tool An online webtool supporting the visualization and exploration of the outcome of the frequent subgraph mining algorithm SUBDUE [5].

Experiment Both the anomalous subgraph discovery and partial order discovery together.

Experiment log A file that tracks all activities performed within an experiment.

Experiment phase Anomalous subgraph discovery and partial order discovery are the two phases of one experiment.

Event log file A file that consists of traces [3].

Final Result The outcome of an experiment.

Graph .g file A file that collects multiple graphs, each involving a set of edges and vertices.

Intermediate result The outcome of either a component or an experiment phase.

Maximal subgraph A subgraph s is maximal in a set of subgraphs if the set contains no other subgraph s' such that s' is a supergraph of s.

Minimal subgraph A subgraph s is minimal in a set of subgraphs if the set contains no other subgraph s' such that s' is a subgraph of s.

Parent subgraph A subgraph S is a parent of another subgraph S' if S' is a child of S.

Partial order discovery An experiment phase creating patterns from anomalous subgraphs and partially ordering them based on their location in the log traces [3].

Petri net A mathematical model used for the specification and the analysis of parallel processes [6].

Process model A representation of the prescribed behavior of a business process [3].

Project A combination of an event log, a process model, and a unique project name. The project is stored together with any experiments run under that project name.

Project owner The user who created the project.

Registered user A user with a registered account on the APD tool.

Responsive A website is responsive when dynamic changes are made to the appearance of the site depending on the screen size and orientation of the device being used to view it [7].

Result Either an intermediate result or final result.

Subgraph A graph whose vertices and edges are subsets of the vertices and edges of another graph [8].

Supergraph A graph S is a supergraph of graph S' if the vertices and edges of S' are a subset of the vertices and edges of S.

Support The support of a subgraph/pattern is equal to the fraction of graphs which involve the subgraph/pattern at least once [3].

Synchronous function A task that has to be completed before a new task can be called.

Trace A trace in a business process model is a sequence of events generated during a process execution.

Unregistered user A user who does not have an account on the APD tool.

User A person who is currently using the APD tool or who has previously used the APD tool.

User activity Creating a project, viewing a project, viewing or downloading a project’s files, deleting project files, sharing a project, starting or stopping an experiment phase, viewing results and status of an experiment phase, logging in, or logging out.

User activity log A file containing information on the past user activities on the APD tool.

User tracking The act of tracking the behavior of the user on the APD tool in the form of the user activity log.

Valid email An email address of a registered user is valid when it exists and the user has access to it.
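The definitions of maximal subgraph, minimal subgraph, and support above can be made concrete with a small sketch. It is written in Python for brevity and is not taken from the APD tool. As a simplifying assumption, a graph is represented only by its set of labelled edges, and "involves" is reduced to plain edge-set containment; the function names are hypothetical.

```python
# Assumption: a graph is a frozenset of (source, target) edges with labelled
# vertices, so "s is a subgraph of t" reduces to edge-set containment.

def is_subgraph(small, big):
    """True if every edge of `small` also occurs in `big`."""
    return small <= big

def maximal(subgraphs):
    """Subgraphs for which the set contains no other supergraph."""
    return [s for s in subgraphs
            if not any(s != t and is_subgraph(s, t) for t in subgraphs)]

def minimal(subgraphs):
    """Subgraphs for which the set contains no other subgraph."""
    return [s for s in subgraphs
            if not any(s != t and is_subgraph(t, s) for t in subgraphs)]

def support(pattern, graphs):
    """Fraction of graphs that involve the pattern at least once."""
    return sum(1 for g in graphs if is_subgraph(pattern, g)) / len(graphs)
```

In this representation, a subgraph that is contained in no other member of the set is maximal, and one that contains no other member is minimal; a subgraph can be both.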

1.3.2 List of abbreviations

AJAX Asynchronous JavaScript and XML

URD User Requirements Document

SRD Software Requirements Document

ADD Architectural Design Document

MVC Model-View-Controller

1.4 List of references

[1] Delta. Software requirements document, version 1.00. Technical report, Eindhoven University of Technology, 2018.
[2] ESA PSS-05-0 Issue 2. Software requirements and engineering process. Technical report, European Space Agency, 1991.
[3] L. Genga, M. Alizadeh, D. Potena, C. Diamantini, and N. Zannone. APD tool: Mining anomalous patterns from event logs. CEUR workshop proceedings, 2017.
[4] W.M.P. van der Aalst and C. Stahl. Modeling business processes: a Petri net-oriented approach. MIT Press, 2011.
[5] C. Diamantini, L. Genga, and D. Potena. Esub: Exploration of subgraphs. In Proceedings of the BPM Demo Session, pages 70–74, 2015.
[6] A. Finkel. The minimal coverability graph for Petri nets. In Proceedings of International Conference on Application and Theory of Petri Nets, pages 210–243. Springer, 1991.
[7] A. Schade. Responsive (RWD) and user experience. https://www.nngroup.com/articles/responsive-web-design-definition/, 2014. Accessed: 2014-05-04.
[8] P. Black. Subgraph. https://xlinux.nist.gov/dads/HTML/subgraph.html, 2004. Accessed: 2018-02-23.
[9] Delta. User requirements document, version 1.00. Technical report, Eindhoven University of Technology, 2018.
[10] C. Diamantini, L. Genga, and D. Potena. Esub: Exploration of subgraphs. Technical report, Università Politecnica delle Marche, 2015.
[11] N.S. Ketkar, L.B. Holder, and D.J. Cook. Subdue: Compression-based frequent pattern discovery in graph data. In Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, pages 71–76. ACM, 2005.
[12] L. Genga, M. Alizadeh, D. Potena, C. Diamantini, and N. Zannone. APD tool: Mining anomalous patterns from event logs. In Proceedings of the BPM Demo Track and BPM Dissertation Award, 2017.
[13] G.E. Krasner, S.T. Pope, et al. A description of the model-view-controller user interface paradigm in the Smalltalk-80 system. Journal of object oriented programming, 1(3):26–49, 1988.
[14] Materialize front-end framework. https://materializecss.com. Accessed: 2018-05-15.
[15] Google material design website. https://material.io/. Accessed: 2018-06-15.
[16] PHPMailer. https://github.com/PHPMailer/PHPMailer/. Accessed: 2018-06-14.
[17] Valgrind. http://valgrind.org/. Accessed: 2018-07-05.
[18] Massif visualizer. https://github.com/KDE/massif-visualizer. Accessed: 2018-07-05.

1.5 Overview

The remainder of this document is composed of sections that describe the architectural design of the Delta extension. Section 2 provides an overview of the system, together with all design choices that were made. For every design choice, a detailed description is presented, together with its alternatives and rationale. In Section 3, the system context is described by outlining the relations of the tool to external systems. Section 4 contains the system design, listing the design methods used as well as a description of the decomposition of the system into individual components. In Section 6, an estimate is provided of the computer resources needed to build, operate, and maintain the software. Finally, in Section 7, a requirements traceability matrix is provided. This matrix shows how the software requirements of the SRD [1] are linked to the components described in Section 4.2.

2 System overview

The APD tool and the Delta extension offer users a platform with which they can run experiments on their own process models and event logs. A general description of the APD tool and the Delta extension can be found in the User Requirements Document (URD) [9]. For a more in-depth description of the relevant background of the Delta extension and the environment of the project, Section 2.5 of the URD [9] and Section 2.7 of the SRD [1] are recommended reading.

2.1 Background

In 2015, the Esub tool was created by a group of researchers, one of whom is now the product owner [10]. This tool visualizes large sets of subgraphs generated by frequent subgraph mining algorithms such as Subdue [11]. Two years later, the Esub tool was extended with a plugin for performing anomalous subgraph discovery and partial order discovery, which became the APD tool [12].

The Delta team will extend the APD tool with additional functionality and improve the usability of the tool. The interface of the tool will be reworked to be more user-friendly and more convenient to work with. This is done by providing a more streamlined workflow and by having clearly sectioned components, both visually and logically.

Furthermore, the Delta extension will add functionality such that users are able to register for an account and manage their own projects and experiments. Users will also be able to share their projects with other users. In this sharing process, users can control the access rights that other users have on their projects.

Additionally, the Delta extension will add functionality such that users can see the status of their experiment phases. Users will be able to see the progress of the experiment phase so far, any intermediate results, and whether any errors have occurred. Next to this, the Delta extension will make the running of experiment phases more modular, making it possible to skip parts of a run if their intermediate results have already been uploaded by a user. Finally, the Delta extension will add more extensive possibilities to compare the results of different experiments.

2.2 Context and basic design

The APD tool is hosted on a server and users can access the tool via a web page. The web page is loaded in a local browser. This setup results in three main elements: the web interface, the Delta server, and the APD tool. When the web interface makes a request to the Delta server, a back-end interface located on the server handles the request instead of passing it directly to the APD tool. This interface sends the requested data back to the front-end such that the web interface can update its view. The Delta server makes use of its own database to store users, projects, and experiments. The results of completed experiments are stored separately in the databases of the APD tool, because the Delta components use the APD tool internally as a black box. Figure 1 gives an indication of the general system design.

2.3 Design decisions

In this section the design decisions by Delta are explained.

2.3.1 Extending the APD tool

The APD tool is already an existing application, and the goal of the Delta team is to introduce additional functionality. Delta will make as few changes as possible to the existing APD tool and use a black box separation model. The main reason for this choice is the complexity of the APD tool. Due to limited resources of Delta and the focus on new functionality, changing the existing software is beyond the scope of Delta. An additional benefit of this clear separation between the APD tool and the extension made by Delta is that both parts can be developed independently.

[Figure 1: System overview diagram. Elements shown: User, Web interface, Network or Internet, Delta server, Delta database, APD tool, Databases.]

2.3.2 Programming language

The APD tool was written in PHP version 5.3, which is a deprecated version of PHP that reached end-of-life in 2014. Delta has several options in choosing a programming language:

1. Also use PHP 5.3 for the Delta extension.
2. Migrate to a more recent version of PHP 5. This includes updating the APD tool to a different version of PHP 5.
3. Migrate to PHP 7, which is currently the recommended version of PHP. This includes updating the APD tool to PHP 7.
4. Use a different programming language for the Delta extension. This means that we need to find a way to communicate with the old code in another programming language.

If at all possible, we want to avoid using PHP 5.3. The fact that this version reached end-of-life and no longer receives security updates makes it an unfavorable choice.

We would prefer upgrading the tool to PHP version 7, as this is the recommended PHP version. It would increase the life span and maintainability of the tool. However, this involves migrating the existing APD tool source code, since PHP 7 introduces breaking changes compared to PHP 5.3. The number of changes that need to be made is of such a scale that it would require too much time to complete this task. In addition, the lack of (unit) tests on the original APD tool makes it hard to verify whether the migration would introduce unwanted behavior. Because of this, migrating to PHP 7 is outside the scope of Delta.

Using a programming language different from PHP for the Delta extension is another option. The main advantage is that we have more freedom in choosing the best tools for the tasks that we need to complete. However, part of the functionality the Delta extension needs to include requires making changes to the original APD tool. Therefore, changes to PHP code are required in any case, which is easier if the Delta extension and the original APD tool are written in the same language. This leaves PHP 5 as the preferred programming language for the Delta extension.

To increase maintainability, we still want to migrate to the most recent version of PHP 5, which is PHP 5.6. However, not all libraries that the original APD tool uses functioned properly with this version of PHP, in particular the Graphviz library. Since alternatives for these libraries require too many changes to the original APD tool source code, using PHP 5.6 is not feasible for this project. The most recent version for which all dependencies are available is PHP 5.5, which is the version of PHP the Delta extension will use.

2.3.3 Operating system

The server on which the original APD tool is hosted uses CentOS 5 as its operating system. This operating system reached end-of-life in 2017 and is not recommended for use for security reasons. Migrating to a more recent operating system is preferred. Since a migration is needed anyway, we decided to switch to Ubuntu, as it is more widely used. However, the operating system still needs to support PHP 5.5. This means that it is not possible to use a more recent version than Ubuntu 14.04 LTS, as this is the most recent version of Ubuntu which supports PHP 5.5. Therefore, Delta uses Ubuntu 14.04 LTS.

2.3.4 Model-View-Controller design pattern

For the architectural design, the Model-View-Controller (MVC) [13] design pattern is used. The MVC pattern distinguishes three interconnected components:

• Model: Deals with the data and the corresponding logic and rules.
• View: Displays the information output of the model.
• Controller: Acts as an interface between input devices and the model and view. It translates input into commands for the view or model.

The main motivation for applying the MVC design pattern is the clear separation between data, user interface and user interaction. This separation eases simultaneous development and code reuse. When the user interacts, the controller processes this action and the model is modified accordingly. When the model changes, the controller is again notified and updates the view.

An alternative to the MVC design pattern is the Model-View-Presenter (MVP) pattern. This pattern lets the view and presenter communicate with each other through an interface. This does require the view to have some logic, which we want to avoid in the Delta extension. A complete decoupling between the view and the model is also difficult to achieve. The experiment results are products of the model which the view would have to present almost as-is. With the MVC pattern, this decoupling is not needed.

The three components in the MVC pattern are represented by different parts of the Delta extension. The model is the part of the back-end which updates user, experiment, and project data whenever the controller orders it to do so. In other words, all data in the databases and all components of the Delta extension which are able to change this data make up the model component. The controller component consists of all our back-end handlers that handle incoming requests made by the view component. For every request there is a handler that makes the right changes to the model. Requests can range from a registration request to a request for interacting with experiment results. The web pages and front-end scripts together form the view component, as these parts show information and an interface to the user.

This works efficiently for the Delta extension, as the model almost never changes without input from a user. This way, a single functionality of the tool can usually be described by a request made by the view (a script calling the back-end) and a handler that works on this request by either retrieving data from, or updating data in, the model.
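The division of responsibilities described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the Delta extension itself is written in PHP, and it is not the Delta implementation: the class `ProjectModel`, the handler `handle_create_project`, the function `render_project_list`, and the in-memory storage are all hypothetical.

```python
class ProjectModel:
    """Model: owns the data and the rules for changing it.
    Assumption: an in-memory dict stands in for the Delta database."""

    def __init__(self):
        self._projects = {}

    def create(self, owner, name):
        if name in self._projects:
            raise ValueError("project name must be unique")
        self._projects[name] = {"owner": owner, "experiments": []}

    def list_for(self, owner):
        return sorted(n for n, p in self._projects.items() if p["owner"] == owner)


def handle_create_project(model, request):
    """Controller: turn a request from the view into a model update
    and return the data the view needs to refresh itself."""
    try:
        model.create(request["user"], request["name"])
    except ValueError as err:
        return {"ok": False, "error": str(err)}
    return {"ok": True, "projects": model.list_for(request["user"])}


def render_project_list(response):
    """View: present the handler's response without any business logic."""
    if not response["ok"]:
        return "Error: " + response["error"]
    return "Projects: " + ", ".join(response["projects"])
```

One request from the view corresponds to one handler call, which either reads from or writes to the model, matching the request-and-handler description in this section.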

2.3.5 Client-server model

The Delta extension also makes use of a client-server model. A user uses the web interface (client) to interact with the tool. The Delta server handles all requests and delivers responses. Because the Delta server uses the standard HTTP protocol, the client-server model is inherently used. The server is able to handle the actions of multiple users at the same time and makes collaboration between different users possible. An alternative is the peer-to-peer model, which decentralizes the application. Aside from providing more redundancy, the peer-to-peer model also distributes the processing of data among the peers. The Delta extension would not benefit from this, since it introduces unnecessary complexity, and thus this model was not chosen.

2.3.6 Web interface

The web interface uses the Materialize library [14] as front-end framework. Materialize is based on Material Design by Google [15] and aims to have a responsive, user-friendly interface. This framework was chosen because the design corresponds best with our preferences. Two important factors in this choice are the inherent responsiveness of the framework and the ease of developing with it.

The web interface is designed as a single page application. There is one main HTML file which consists of the navigation menu and the title bar. Within this main file the content of other pages is loaded using JavaScript. This type of design was chosen because all pages have the same layout, with a navigation menu and header, so loading only the content for each page reduces load times. This makes for a better user experience, as a user can navigate the tool more quickly.

For every page where information from the back-end is needed, the web interface makes use of POST requests via AJAX. With AJAX calls the web interface no longer needs to be refreshed; instead, only the updated sections of the page are reloaded. Using AJAX reduces the traffic payload between client and server, which leads to improved performance. Every request via an AJAX call is sent to a specific handler in the back-end. When the requested data is successfully returned, the web interface is updated with the retrieved information.

2.3.7 Access control

For access control, we use the role-based model, often referred to as role-based access control (RBAC). Each role can have zero or more access permissions assigned to it. Roles are hierarchical, meaning that one role can inherit permissions from other roles.

Permissions can be categorized into project permissions, experiment permissions and general permissions. Project permissions include downloading files, deleting files, deleting the project, sharing the project and creating experiments within the project. Experiment permissions include running an experiment phase and viewing the results. General permissions include, for example, enabling or disabling user tracking.

Roles include the administrator, unregistered user, registered user, project owner, project observer and project member. For example, project observers can only view the results of experiments in the project, while project members can also create and run experiments within this project. Hence, the project member inherits the permissions of the project observer. The project owner inherits the permissions of a project member, while also having additional permissions such as project deletion and project sharing.

An alternative to the role-based access control model is an access control matrix. This model uses a table with a row for each user and a column for each resource (for example an experiment). For every pair of user and resource, the matrix stores which operations are allowed. Although this system is easy to use, the matrix is very hard to maintain, as the hierarchy is not saved. Because user permissions are frequently changed, especially with the sharing of projects, maintainability is an important factor in this decision. Therefore, the access control matrix was not chosen. For more details on the user roles, see Section 2.4 of the URD [9].
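The inheritance between the project observer, project member, and project owner roles can be sketched as follows. This is an illustrative Python sketch rather than the Delta implementation (which is written in PHP); the permission identifiers such as `view_results` and the two lookup tables are hypothetical names chosen to mirror the roles described in this section.

```python
# Permissions granted directly to each role (hypothetical identifiers).
ROLE_PERMISSIONS = {
    "project_observer": {"view_results"},
    "project_member": {"create_experiment", "run_experiment_phase"},
    "project_owner": {"delete_project", "share_project"},
}

# Role hierarchy: each role inherits everything its parents may do.
ROLE_PARENTS = {
    "project_member": ["project_observer"],
    "project_owner": ["project_member"],
}

def permissions_of(role):
    """Collect a role's own permissions plus all inherited ones."""
    perms = set(ROLE_PERMISSIONS.get(role, set()))
    for parent in ROLE_PARENTS.get(role, []):
        perms |= permissions_of(parent)
    return perms

def is_allowed(role, permission):
    """Check a single permission against the role hierarchy."""
    return permission in permissions_of(role)
```

Because the hierarchy is stored explicitly, granting a new permission to the observer role automatically extends to members and owners, which is the maintainability advantage over an access control matrix noted above.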

3 System context

3.1 APD tool

The Delta extension of the APD tool is isolated, but it does communicate with the original tool. As described in Section 2.3.1, the original tool is used as a black box. This means that Delta uses a facade for all communication with the APD tool. The facade implements an interface describing all functionality that is needed from the APD tool. It mainly contains functions for starting experiment phases, checking whether they are finished and viewing their results. Expanding subgraphs is also handled by the facade. The following APD tool functionality is implemented in the facade interface:

• Start the execution of the anomalous subgraph discovery phase of a given experiment.
• Start the execution of the partial order discovery phase of a given experiment with a specified ordering relation threshold and a frequent itemset threshold. This phase can only be started after the anomalous subgraph discovery phase has finished.
• Check if the anomalous subgraph discovery phase of a given experiment is finished.
• Check if the partial order discovery phase of a given experiment is finished.
• Retrieve the result of the anomalous subgraph discovery phase of a given experiment.
• Retrieve the result of the anomalous subgraph discovery phase of a given experiment, where specified subgraphs are expanded. The result is returned as an SVG graph.
• Retrieve the result of the anomalous subgraph discovery phase of a given experiment, where only the specified subgraphs are displayed. The result is returned as an SVG graph.
• Retrieve the result of the partial order discovery phase of a given experiment. The result is filtered on a given support threshold and pattern type (minimal, maximal, or all). The result is returned as an SVG graph.
• Retrieve the result of the partial order discovery phase of a given experiment, where specified patterns are expanded. The result is filtered on a given support threshold and pattern type (minimal, maximal, or all). The result is returned as an SVG graph.
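A facade interface of this kind could look roughly like the sketch below. It is written in Python for illustration (the Delta extension is written in PHP) and covers only the start and check operations from the list above. The names `ApdFacade` and `FakeApd` and all method names are assumptions, and the fake implementation completes phases instantly, which the real APD tool does not.

```python
from abc import ABC, abstractmethod

class ApdFacade(ABC):
    """Hypothetical facade interface: the only way the extension talks to the APD tool."""

    @abstractmethod
    def start_subgraph_discovery(self, experiment_id): ...

    @abstractmethod
    def start_partial_order_discovery(self, experiment_id,
                                      ordering_threshold, itemset_threshold): ...

    @abstractmethod
    def is_subgraph_discovery_finished(self, experiment_id): ...

    @abstractmethod
    def is_partial_order_discovery_finished(self, experiment_id): ...


class FakeApd(ApdFacade):
    """In-memory stand-in, useful for testing code that depends on the facade."""

    def __init__(self):
        self._finished = set()  # (experiment_id, phase) pairs that have completed

    def start_subgraph_discovery(self, experiment_id):
        # Assumption: the fake finishes immediately instead of running a job.
        self._finished.add((experiment_id, "subgraph"))

    def start_partial_order_discovery(self, experiment_id,
                                      ordering_threshold, itemset_threshold):
        # Enforce the ordering rule: the first phase must have finished.
        if not self.is_subgraph_discovery_finished(experiment_id):
            raise RuntimeError("anomalous subgraph discovery has not finished")
        self._finished.add((experiment_id, "partial_order"))

    def is_subgraph_discovery_finished(self, experiment_id):
        return (experiment_id, "subgraph") in self._finished

    def is_partial_order_discovery_finished(self, experiment_id):
        return (experiment_id, "partial_order") in self._finished
```

Code written against `ApdFacade` does not need to know how the APD tool runs its phases, which is what allows the tool to be treated as a black box.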

3.2 MySQL

MySQL is a stable, free, and open source relational database management system. Delta uses this database system to store the user and project data. PHP 5.5 has a built-in interface for MySQL servers, mysqli. The Delta extension of the APD tool uses this interface to query data from the database and to store new data in the MySQL database.

3.2.1 Implementation of MySQL

For each back-end request, the mysqli interface is used to prepare a query statement that matches the needs of that request. These queries are then executed, which can result in selection, insertion, modification, or deletion of entries in the database. When entries from the database need to be retrieved, these entries can also be stored in variables by using the mysqli interface’s functions.
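The prepare-then-execute pattern can be illustrated with a short sketch. Since this example is in Python rather than PHP, it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for mysqli and MySQL; the `users` table, its columns, and the helper names are hypothetical. In PHP, the corresponding steps would use mysqli's `prepare`, `bind_param`, `execute`, and `bind_result`.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Delta MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")

def add_user(conn, email, name):
    """Insert a row; values are bound as parameters, never pasted into the SQL text."""
    cur = conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
    conn.commit()
    return cur.lastrowid

def find_user(conn, email):
    """Select a row; the fetched columns are returned as a tuple, or None if absent."""
    return conn.execute(
        "SELECT id, name FROM users WHERE email = ?", (email,)
    ).fetchone()
```

Binding values as parameters keeps user input out of the query text, so input such as `x' OR '1'='1` is treated as a literal email address rather than as SQL.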

3.3 SMTP mail server

Some functionality of the Delta extension depends on being able to send emails. For this purpose, access to an SMTP mail server is required. Delta sends emails to the users of the tool from an email address registered at this mail server.

The PHPMailer library [16] is used as an interface to the SMTP mail server. Using this library, emails can be sent using a secure protocol like ‘ssl’, while still running on the older PHP 5.5 version.

3.3.1 Implementation of the SMTP mail server

To send an email via the SMTP mail server, the email is fully prepared before sending, similar to how MySQL queries are prepared before execution. An email’s properties, such as the sender, receiver, content, and subject, are filled in. When the email object is complete, a call to the SMTP mail server is made to send the created email. As previously mentioned, a secure protocol to send the email can be selected.
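The prepare-then-send flow can be sketched as below. The Delta extension uses PHPMailer in PHP; this Python sketch uses the standard-library `email` and `smtplib` modules instead, purely to illustrate the two steps. The function names, host, port, and credentials are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage

def build_email(sender, receiver, subject, body):
    """Step 1: fill in all properties of the email before anything is sent."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_email(msg, host, port, username, password):
    """Step 2: hand the completed email to the SMTP server over an SSL connection.
    Assumption: the server accepts implicit SSL and login with these credentials."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

Separating the two steps means an incomplete email can be rejected before any connection to the mail server is opened.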

4 System design

This section describes the technical aspects of the Delta extension to the APD tool. Subsections contained here will describe the design method, the decomposition of the system, and the individual components of the system.

4.1 Design method

Only the design methods and architectural styles that apply to the Delta extension of the APD tool will be described here. In order to be able to use all functionality offered by the APD tool, an interface between the back-end and the APD tool is made. This reduces coupling between the APD tool and the Delta extension, thus allowing the APD tool to be used as a black box. The facade design pattern is used to realize this interface. A method was used to separate the Delta extension into several different modules. In doing so, coupling is reduced and cohesion is increased. Furthermore, the decomposition of the system increases maintainability and verifiability through running automated tests. The following section elaborates upon the different components in the decomposition of the Delta extension.

4.2 Decomposition

The decomposition of the Delta extension into individual components is based on the user requirements specified in the URD [9] and the specific requirements specified in the SRD [1].

The back-end of the Delta extension was split into several components, based on the functionality the Delta extension will add to the APD tool. The back-end has modules for the database and the database API, as well as for handling account management, user tracking and authentication. Furthermore, there are modules handling projects, experiments, experiment phases and components. Lastly, experiment results are handled in a separate module. All these modules are detailed in the section below.

The front-end of the Delta extension was split based on the different pages the Delta extension web application will consist of. The front-end was split into a module for page navigation, the home page, the project page, the experiment results page and the compare results page. Furthermore, registration and login are handled in a separate module, just like the admin tools page.

The request handler module is used to handle the communication between the front-end and back-end modules. All front-end requests are sent to the request handler, which in turn communicates the request to the appropriate back-end module.

Figures are used to describe the dependencies between modules. In each figure, an arrow indicates that one module depends on the other, which means that the module requires functionality from the other module in order to function properly.

4.2.1 Back-end decomposition

All modules of the back-end decomposition are described below. Figure 2 displays the communication between the back-end modules. Note that all back-end modules are connected to the request handler module and the database API. To keep the figure readable, these two modules are not included in it. Keep in mind, however, that they are part of the back-end.

APD module
The APD module provides an interface between the Delta extension and the APD tool. The APD module is used to start the execution of experiment components and to provide component and experiment result data. Furthermore, the APD module is used to retrieve data on expanding subgraphs and patterns in experiment results. This data can be displayed in the user interface of the Delta extension.
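The role of the APD module as a facade can be illustrated with a minimal sketch. It is not the Delta implementation: the class `ApdFacade`, the `ComponentResult` type, the command name `apd-tool` and the component name are all hypothetical, and Python is used purely for illustration. The point is that callers go through one narrow interface and never build tool invocations themselves, so the tool stays a black box.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ComponentResult:
    """Outcome of one component execution (hypothetical shape)."""
    exit_code: int
    output: str


# A runner takes a command line and returns its result.
Runner = Callable[[Sequence[str]], ComponentResult]


def subprocess_runner(command: Sequence[str]) -> ComponentResult:
    """Default runner: execute the command as a child process."""
    completed = subprocess.run(list(command), capture_output=True, text=True)
    return ComponentResult(completed.returncode, completed.stdout)


class ApdFacade:
    """Single entry point through which the extension talks to the tool."""

    def __init__(self, tool_command: Sequence[str],
                 runner: Runner = subprocess_runner) -> None:
        self._tool_command = list(tool_command)
        self._runner = runner

    def run_component(self, component: str, input_file: str) -> ComponentResult:
        # Callers pass domain-level arguments; the facade builds the command.
        return self._runner(self._tool_command + [component, input_file])


def fake_runner(command: Sequence[str]) -> ComponentResult:
    """Stub runner so the sketch can run without the real tool."""
    return ComponentResult(0, "ran " + " ".join(command))


# "apd-tool" and "subgraph-discovery" are made-up names for this example.
facade = ApdFacade(["apd-tool"], runner=fake_runner)
result = facade.run_component("subgraph-discovery", "log.g")
print(result.output)  # ran apd-tool subgraph-discovery log.g
```

Injecting the runner also keeps the facade testable without the external tool, which fits the stated goal of verifiability through automated tests.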

[Figure 2: Back-end communication. Modules shown: Authentication module, User Tracking module, Account module, Project module, Experiment module, Experiment result module, Experiment Phase module, Component module, APD module.]

[Figure 3: APD module communication. Shown: Delta extension, APD module, APD tool.]

[Figure 4: Request module communication. Shown within the back-end: Database API, Database.]

[Figure 5: Request module communication. Shown: Front-end (Navigation, Registration, ...), Request handler, Back-end (Account module, Authentication, ...).]

Database
The database component stores the Delta extension data according to the data model. The database management system is MySQL, which provides an interface to insert, manipulate and remove data.

Database API
The database API provides interfaces used to retrieve, insert, update and delete the data in the Delta database. This API is used by all back-end components. Note that these connections are not drawn in Figure 2 in order to keep the figure readable, but they do exist. Figure 4 displays the communication between the database API and the database.

Request handler module
The request handler module is responsible for handling data requests from the front-end by means of the database API. It is also responsible for passing front-end requests on to the back-end modules. Figure 5 displays the communication between the front-end, the request handler and the back-end modules. The request handler module is thus used by all back-end components. Note that these connections are not drawn in Figure 2 in order to keep the figure readable, but they do exist.
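The routing role of the request handler can be sketched as a small dispatcher. This is an illustrative Python sketch under assumptions, not the actual Delta code: the action name `project.create`, the request shape (`action` plus `payload`) and the response shape are hypothetical.

```python
from typing import Any, Callable, Dict

# A back-end handler takes a request payload and returns response data.
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class RequestHandler:
    """Routes each front-end request to the back-end module registered for it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, action: str, handler: Handler) -> None:
        self._routes[action] = handler

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        action = request.get("action")
        handler = self._routes.get(action)
        if handler is None:
            # Unknown actions are rejected instead of raising.
            return {"ok": False, "error": f"unknown action: {action}"}
        return {"ok": True, "data": handler(request.get("payload", {}))}


def create_project(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a back-end project module function."""
    return {"name": payload["name"]}


handler = RequestHandler()
handler.register("project.create", create_project)
response = handler.handle({"action": "project.create", "payload": {"name": "demo"}})
print(response)  # {'ok': True, 'data': {'name': 'demo'}}
```

With this shape, front-end modules only need to know action names, while the mapping to back-end modules lives in one place.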

Account module
The account module implements functionality and interfaces related to account management. This includes the creation and deletion of user accounts, retrieving and editing the account information of user accounts, as well as changing and resetting user passwords. This module also handles the consent acceptance of a user.

Authentication module
The authentication module is responsible for checking the access rights of users. Furthermore, it handles everything with regard to the authentication of users, including logging in and out. The authentication module checks whether a user can create, view, edit, delete or run a given object by means of authentication and permission checking. The sharing of projects is also handled by this module, in close relation with the project module.
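A permission check of this kind can be sketched as a lookup from a user's role on a project to the set of allowed actions. The five actions come from the description above. The roles (owner, member, observer) and, in particular, which role may perform which action are assumptions made for this example only, as are all names in the code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# The five actions named in the module description.
ACTIONS = {"create", "view", "edit", "delete", "run"}

# Assumed role-to-permission mapping; the real rules are defined in the SRD.
ROLE_PERMISSIONS = {
    "owner": {"create", "view", "edit", "delete", "run"},
    "member": {"create", "view", "edit", "run"},
    "observer": {"view"},
}


@dataclass
class Project:
    owner_id: int
    member_ids: Set[int] = field(default_factory=set)
    observer_ids: Set[int] = field(default_factory=set)


def role_of(user_id: int, project: Project) -> Optional[str]:
    """Return the user's role on the project, or None if there is none."""
    if user_id == project.owner_id:
        return "owner"
    if user_id in project.member_ids:
        return "member"
    if user_id in project.observer_ids:
        return "observer"
    return None


def is_allowed(user_id: int, action: str, project: Project) -> bool:
    """Check whether the user may perform the action on the project."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    role = role_of(user_id, project)
    return role is not None and action in ROLE_PERMISSIONS[role]


project = Project(owner_id=1, member_ids={2}, observer_ids={3})
print(is_allowed(1, "delete", project))  # True
print(is_allowed(3, "edit", project))    # False
```

Sharing a project then amounts to adding a user to one of the role sets, which is why sharing sits naturally next to permission checking.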

User tracking module
The user tracking module tracks the activities of all users of the tool. It handles both the retrieval and deletion of user activity logs, and controls whether tracking is enabled or disabled for a specific user.

Project module
The project module implements functionality and interfaces related to project management, including the creation and deletion of projects and project files. It also handles the creation of a batch of experiments within a project. Note that the demo project of the tool is also managed by this module. The calculation of the comparative results of several experiments is handled by this module as well, in close relation with the experiment results module.

Experiment module
The experiment module implements functionality and interfaces related to experiment management. This includes the creation and deletion of experiments as well as the retrieval and deletion of the experiment log.

Experiment phases module
The experiment phases module implements functionality and interfaces related to experiment phase management. This includes the creation, running, cancelling and deletion of experiment phases, the handling of the status of an experiment phase, and the handling of intermediate results uploaded by a registered user of the tool. The experiment phase log entries are also handled by this module. When running an experiment phase, this module operates in close relation with the component module.

Component module
The component module implements functionality and interfaces related to components, which includes the creation, running, cancelling and deletion of components. It also handles the status of the component. This module works in close relation with the APD module for the running of components and retrieving their results. It also handles the uploading of a custom component by an administrator of the tool.

Experiment results module
The experiment results module handles requests regarding experiment results. This includes the retrieval and deletion of experiment results, as well as the expansion, compression and filtering of provided subgraphs and patterns. Note that this module works in close relation with the APD module in order to achieve this. It handles the retrieval of specific information on subgraphs and patterns, such as the number of occurrences and the support value of a subgraph or pattern, in close relation with the component module. Exporting subgraphs to PNG or .G files is also handled by this module. Lastly, it handles the retrieval of the parents and children of a given subgraph.

4.2.2 Front-end decomposition

All modules of the front-end decomposition work in close relation with the request handler module of the back-end decomposition. All requests of front-end modules are communicated to the request handler module, which then transfers each request to the appropriate back-end module. All modules of the front-end decomposition are described below. Figure 6 displays the communication between the front-end modules. Note that the navigation module uses all modules except the registration and login modules. This is because there is one main HTML file, consisting of the navigation menu and the title bar, and the content of the other pages is loaded into this main file using JavaScript.

[Figure 6: Front-end communication. Modules shown: Registration module, Login module, Home module, Project module, Results module, Navigation module, Compare results module, User Information module.]

Navigation module
The navigation module handles the visualization and form validation of the main HTML page, the side navigation menu and the breadcrumbs menu. Furthermore, it implements functionality and handles requests regarding the retrieval of the breadcrumbs navigation of a user and the navigation between pages of the Delta extension, such as the home page, project page and experiment results page. Moreover, it handles the logging out of a user.

Registration module
The registration module handles the visualization and form validation of the registration page. Furthermore, it implements functionality and handles requests regarding the registration of user accounts.

Login module
The login module handles the visualization and form validation of the login page and the forgot password page. Furthermore, it implements functionality and handles requests related to the logging in of a user. This also includes the resetting of a user’s password, and transferring requests to view the demo project to the project module of the front-end decomposition.

Home module
The home module is responsible for the visualization and form validation of the home page, the modal for project creation and the modal for project sharing. Furthermore, it implements functionality and handles requests related to the retrieval, sorting and filtering of the projects list available on the home page. It also handles requests regarding the creation, deletion, searching, sharing and file retrieval of projects.

Project module
The project module is responsible for the visualization and form validation of a project page and the modals for experiment and experiment phase creation. Furthermore, it implements functionality and handles requests related to the retrieval and filtering of the experiment list available on a project page, the creation and deletion of experiments, and the running and cancelling of experiment phases. It handles the retrieval and visualization of an experiment phase status. Moreover, it handles the downloading and deletion of the experiment log and other experiment files. Lastly, it communicates requests to view and compare experiment results to the results module and the compare experiments module, respectively.

Results module
The results module is responsible for the visualization and form validation of a results page. Furthermore, it implements functionality and handles requests related to the selection, deselection, expansion, compression, filtering and searching of subgraphs and patterns. Moreover, it handles the opening of subgraphs in another tab, the downloading of result files, and the indication of parents and children of subgraphs.

Compare experiments module
The compare experiments module handles the visualization and form validation of the compare experiments page. It is responsible for retrieving the comparative results of several experiments as well as the retrieval and sorting of the experiment list displayed on the experiment page. It communicates requests to view and compare other experiment results to the project module.

User information module
The user information module is responsible for the visualization and form validation of the user information page. Furthermore, it implements functionality and handles requests related to the retrieval and editing of a user’s account information, as well as the retrieval and deletion of the user’s activity log and the changing of a user’s password. Lastly, it handles requests for user account deletion.

Admin tools module
The admin tools module is responsible for the visualization and form validation of the admin page. Furthermore, it implements functionality and handles requests related to the retrieval, creation and deletion of users of the tool as well as the uploading of custom components. Moreover, it handles requests regarding the enabling and disabling of tracking of a user's activities, the retrieval and deletion of a user's activity log and resetting the password of a user. Lastly, it handles the uploading of custom anomalous subgraph discovery algorithms.

4.2.3 File structure

The file structure of the project is given below. Files and code from the existing APD tool are separated as much as possible from the files and code added by Delta.

root ...... Application root folder
  database ...... Database structure files
  Esub ...... Main project folder
    delta ...... Delta extension source
      apd ...... APD tool communication
        esub ...... APD tool interface implementation files
      database ...... Database API
      exceptions ...... Custom exceptions
      files ...... User-uploaded project files
      handlers ...... Request handlers
      images ...... Static image resources
      model ...... Data model definition
      pages ...... Front-end pages
      requests ...... View logic of the request handlers
      scripts ...... Static front-end JavaScript files
      session ...... Session utilities
      settings ...... Application-wide settings
      styles ...... Static front-end CSS files
      utils ...... General utilities
        execution ...... Utilities related to system command execution
        mail ...... Utilities related to email systems
        validation ...... Data validation
    GraphManager ...... Original APD tool source
    lib ...... External libraries
      fonts ...... External fonts
      mail ...... Email libraries
      node_modules
        mdi ...... Material design icon resources
      random_compat ...... Password hashing compatibility library
      scripts ...... Vendor JavaScript files
      styles ...... Vendor CSS files
    test ...... Test files for Delta extension
      apd ...... APD tool communication tests
      database ...... Database API tests
      demofiles ...... Demo project files (test resources)
      model ...... Data model definition tests
      requests ...... View logic of the request handlers tests
      res ...... Test resources
      session ...... Session utilities tests
      settings ...... Application-wide settings tests
      testutils ...... Test utility files
      utils ...... General utilities tests
        execution ...... Execution tests
        mail ...... Email system tests
        validation ...... Data validation tests

4.2.4 Database Design

The Delta extension uses a traditional relational database model, implemented with MySQL. We use the following tables:

• user: Basic user account information.
• user_profile: Optional user profile information.
• user_profile_consent: Records when and how the user gave consent for the privacy statement (required for GDPR compliance).
• consent_text: Records the versions of the privacy notice consent text used on the registration page.
• reset_token: Records password reset tokens for users, if a user requested such a token.
• project: Records project information.
• experiment: Records experiment information.
• experiment_status_history: Records status changes for the experiment phases.
• experiment_status: Lists the possible experiment phase status descriptions.
• experiment_phase: Lists the experiment phases.

The database schema with indicated primary and foreign keys is depicted in Figure 7.
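To make the primary and foreign key relations concrete, the following minimal sketch creates two of the tables listed above and links a password reset token to its user. It runs against an in-memory SQLite database through Python's standard `sqlite3` module so that it needs no server, whereas the Delta extension itself uses MySQL. The reduced column sets, the column types and the sample data are illustrative assumptions, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed column sets for two of the tables.
conn.executescript("""
CREATE TABLE user (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    email    TEXT NOT NULL,
    is_admin INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE reset_token (
    token   TEXT PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(id),
    used    INTEGER NOT NULL DEFAULT 0
);
""")

conn.execute(
    "INSERT INTO user (id, username, email) VALUES (1, 'alice', 'alice@example.org')"
)
conn.execute("INSERT INTO reset_token (token, user_id) VALUES ('abc123', 1)")

# Resolve an unused token to the account it belongs to.
row = conn.execute(
    "SELECT u.username FROM reset_token t "
    "JOIN user u ON u.id = t.user_id "
    "WHERE t.token = ? AND t.used = 0",
    ("abc123",),
).fetchone()
print(row[0])  # alice
```

The foreign key makes the database itself reject a token that points at a non-existent user, so that check does not have to be repeated in every back-end module.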

[Figure 7: Database structure. Schema diagram showing the tables User, User Profile, User Profile Consent, Consent Text, Reset Token, User Tracking, Project, Project Observer, Project Member, Experiment, Experiment Status History, Experiment Status, Experiment Phase and Component with their columns, connected by the foreign key relations user_id:id, consent_text_id:id, creator_id:id, owner_id:id, project_id:id, experiment_id:id, status_id:id, component_id:id and phase_id:id.]

5 Components

This section is omitted.

CPU | 1.3 GHz x86 or equivalent
RAM | 256 MB
Software | Google Chrome version 66 or later

Table 1: Estimated minimum resource requirements for using the web application.

CPU | Dual core 2.0 GHz x86 or equivalent
RAM | 8 GB
Storage | 2 TB
Operating System | Ubuntu Server 14.04.4 LTS
Software | PHP 5.5.9 with modules json, mysql, tokenizer, and openssl, Apache 2.4.7, MySQL 14.14, Graphviz 2.36.0, Java Runtime Environment 1.8.0_171, SUBDUE 5.2.2, Autodue, libgv-php5 2.20.2

Table 2: Estimated minimum resource requirements for hosting the Delta extension.

6 Feasibility and resource estimates

This section gives an estimate for the required resources to run the Delta extension to the APD tool. Based on these requirements, an estimate for the performance of the Delta extension under these resources is given.

6.1 Resource requirements

In this section, a distinction is made between accessing the Delta extension as a user via the web browser, hosting the Delta extension on a web server, and developing for the Delta extension.

6.1.1 Using the web application

The requirements for accessing the web application as a user are given in Table 1. The estimates are based on the minimum system requirements for the Google Chrome web browser. Since most of the computational work is done on the server side, little extra memory and processing power is needed for the actual Delta extension web pages.

6.1.2 Hosting the Delta extension

The requirements for hosting the Delta extension on a web server are given in Table 2. They are based on the estimated requirements of the existing APD tool, as a large part of the processing is performed on the server side. We therefore recommend at least a dual-core processor, to allow for concurrent execution of experiment phases. Since experiment processing requires loading large graph and log files into memory, we also recommend 8 GB of available RAM. See Section 6.2 for more details on memory usage. The Delta extension of the APD tool enables users to upload their own project files, for which no specific size limit is set. Based on existing experiments with the APD tool, an experiment with both phases executed will not exceed 100 MB in size. Given the requirement that the Delta extension supports 200 registered users, we recommend a storage size of 2 TB.
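As a rough sanity check of the storage recommendation, the figures above can be combined in a few lines of Python. The sketch assumes decimal units and an even split of storage across users, and it ignores uploaded project files and system overhead, so the result is only an upper bound on the number of full-size experiments per user.

```python
# Decimal units are an assumption; the text does not say which convention is meant.
MB = 10**6
GB = 10**9
TB = 10**12

total_storage = 2 * TB          # recommended storage size
registered_users = 200          # required number of supported users
max_experiment_size = 100 * MB  # upper bound for one experiment with both phases

storage_per_user = total_storage // registered_users
experiments_per_user = storage_per_user // max_experiment_size

print(storage_per_user // GB)  # 10
print(experiments_per_user)    # 100
```

Under these assumptions, 2 TB leaves about 10 GB per registered user, or room for roughly 100 experiments of the maximum observed size each.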

6.1.3 Developing the Delta extension

The requirements for developing the Delta extension are given in Table 3. The development environment requirements are similar to the hosting environment requirements.

CPU | Dual core 2.0 GHz x86 or equivalent
RAM | 8 GB
Storage | 64 GB
Operating System | Ubuntu Server 14.04.4 LTS
Software | All software required for the hosting environment, including PHP modules dom, phar, xdebug, PHPUnit 4.8.36

Table 3: Estimated minimum resource requirements for developing the Delta extension.

Figure 8: Massif heap memory profiler visualization.

For the development environment, we include a number of extra tools for running unit tests, as well as a debugger, to ease the development process.

6.2 Performance

Using the heap memory profiler Massif, available in the Valgrind framework [17], we analysed the memory usage of processing a single experiment. We used a log file of 49 MB, containing 24 traces and 903 events. The output of Massif is visualized using Massif visualizer [18] and depicted in Figure 8. We see a peak heap usage of approximately 100 MB for a single, relatively small experiment. Note that the Delta extension allows multiple experiments (possibly by different users) to be processed in parallel. Also, the log files uploaded by users vary in size and can be multiple gigabytes. Taking these variables into account, we recommend a sizeable memory capacity for the system processing the experiments.

7 Requirements traceability matrix

7.1 Software requirements to modules

SR | Back-end modules | Front-end modules
SR-1 | Database API, Account module | Registration module
SR-2 | Database, Account module, User tracking module | -
SR-3 | Database, Account module, Authentication module | Registration module, Login module, User information module
SR-4 | Database, Account module | Registration module, Login module, User information module
SR-5 | Database, Account module, Authentication module | Registration module, Login module, User information module
SR-6 | Database, Account module | Registration module, User information module
SR-7 | Database, Account module | Registration module, User information module
SR-8 | Database, Account module | Registration module, User information module
SR-9 | Database, Account module | Registration module, User information module
SR-10 | Database, Account module | Registration module, User information module
SR-11 | Database, Account module | Registration module, User information module
SR-12 | Database, Account module | Registration module, User information module
SR-13 | Database, Account module, User tracking module | Admin tools module
SR-14 | Database, Account module | Registration module, User information module
SR-15 | Authentication module | Login module
SR-16 | Authentication module | Navigation module
SR-17 | Database API, Account module | Login module
SR-18 | Database API, Account module | User information module
SR-19 | Database API, Account module | User information module
SR-20 | Database API, User tracking module | User information module
SR-21 | Database API, Account module | User information module
SR-22 | Database API, User tracking module | User information module
SR-23 | Database API, Project module | Home module
SR-24 | - | Home module
SR-25 | - | Home module
SR-26 | Database API, Authentication module | Home module
SR-27 | Database API, Authentication module | Home module
SR-28 | Database API, Authentication module | Home module
SR-29 | Database API, Authentication module | Home module
SR-30 | Database API, Account module | Admin tools module
SR-31 | Database API, Account module | Admin tools module
SR-32 | Database API, Account module | Admin tools module
SR-33 | Database API, User tracking module | Admin tools module
SR-34 | Database API, User tracking module | Admin tools module
SR-35 | Database, User tracking module | -
SR-36 | Database, User tracking module | -
SR-37 | Database, User tracking module | -
SR-38 | Database, User tracking module | -
SR-39 | Database API, User tracking module | User information module
SR-40 | Database API, User tracking module | User information module
SR-41 | Database, Project module | -
SR-42 | Database, Project module | Home module
SR-43 | Database, Project module | Home module
SR-44 | Database, Project module | Home module
SR-45 | Database, Project module | Home module
SR-46 | Project module | Home module
SR-47 | Project module | Home module
SR-48 | Database API, Experiment module | Project module
SR-49 | Project module | Project module, Compare experiments module
SR-50 | - | Project module
SR-51 | - | Project module
SR-52 | Database API, Project module | Home module
SR-53 | Database, Experiment module | -
SR-54 | Database, Experiment module | Project module
SR-55 | Database, Experiment module | Project module
SR-56 | Database, Experiment module | Project module
SR-57 | Experiment module | Project module
SR-58 | Database API, Experiment module | Project module
SR-59 | Database, Experiment module | -
SR-60 | Database API, Experiment module | Project module
SR-61 | Database, Experiment phases module | -
SR-62 | Database, Experiment phases module | -
SR-63 | Database, Experiment phases module | -
SR-64 | Database, Experiment phases module | -
SR-65 | Database, Experiment phases module | Project module
SR-66 | Database, Experiment phases module | Project module
SR-67 | Database, Experiment phases module | Project module
SR-68 | Database, Experiment phases module | Project module
SR-69 | Database, Experiment phases module | Project module
SR-70 | Database, Experiment phases module | Project module
SR-71 | Database, Experiment phases module | Project module
SR-72 | Database API, Experiment phases module | Project module
SR-73 | Experiment phases module | Project module
SR-74 | Experiment phases module | Project module
SR-75 | Database API, Experiment phases module | Project module
SR-76 | Database, Experiment phases module | -
SR-77 | Database, Experiment phases module | Project module
SR-78 | Database, Experiment phases module | Project module
SR-79 | Database, Experiment phases module | Project module
SR-80 | Database, Experiment phases module | Project module
SR-81 | Database, Experiment phases module | Project module
SR-82 | Database API, Experiment phases module | Project module
SR-83 | Experiment phases module | Project module
SR-84 | Experiment phases module | Project module
SR-85 | Database API, Experiment phases module | Project module
SR-86 | Database, Component module | -
SR-87 | Database, Component module | Project module
SR-88 | Database, Component module | -
SR-89 | APD module, Component module | -
SR-90 | APD module, Component module | -
SR-91 | Database API, Component module | -
SR-92 | Database, Component module | -
SR-93 | Database, Component module | Project module
SR-94 | Database, Component module | -
SR-95 | Database, Component module | Admin tools module
SR-96 | Component module | -
SR-97 | Component module | -
SR-98 | Database API, Component module | -
SR-99 | Database, Experiment results module | -
SR-100 | Database, Experiment results module | Results module
SR-101 | APD module, Experiment results module | Results module
SR-102 | APD module, Experiment results module | Results module
SR-103 | APD module, Experiment results module | Results module
SR-104 | Experiment results module | Results module
SR-105 | Experiment results module | Results module
SR-106 | Experiment results module | Results module
SR-107 | Experiment results module | Results module
SR-108 | Experiment results module | Results module
SR-109 | Database API, Experiment results module | -
SR-110 | Database, Experiment results module | -
SR-111 | Database, Experiment results module | Results module
SR-112 | Database, Experiment results module | Results module
SR-113 | APD module, Experiment results module | Results module
SR-114 | Database API, Experiment results module | -

7.2 Modules to software requirements

7.2.1 Back-end modules

Module | Software requirements
APD module | SR-89, SR-90, SR-101, SR-102, SR-103, SR-113
Component module | SR-86, SR-87, SR-88, SR-89, SR-90, SR-91, SR-92, SR-93, SR-94, SR-95, SR-96, SR-97, SR-98
Authentication module | SR-3, SR-5, SR-15, SR-16, SR-26, SR-27, SR-28, SR-29
Project module | SR-23, SR-41, SR-42, SR-43, SR-44, SR-45, SR-46, SR-47, SR-49, SR-52
Experiment results module | SR-99, SR-100, SR-101, SR-102, SR-103, SR-104, SR-105, SR-106, SR-107, SR-108, SR-109, SR-110, SR-111, SR-112, SR-113, SR-114
User tracking module | SR-2, SR-13, SR-20, SR-22, SR-33, SR-34, SR-35, SR-36, SR-37, SR-38, SR-39, SR-40
Database | SR-2, SR-3, SR-4, SR-5, SR-6, SR-7, SR-8, SR-9, SR-10, SR-11, SR-12, SR-13, SR-14, SR-35, SR-36, SR-37, SR-38, SR-41, SR-42, SR-43, SR-44, SR-45, SR-53, SR-54, SR-55, SR-56, SR-59, SR-61, SR-62, SR-63, SR-64, SR-65, SR-66, SR-67, SR-68, SR-69, SR-70, SR-71, SR-76, SR-77, SR-78, SR-79, SR-80, SR-81, SR-86, SR-87, SR-88, SR-92, SR-93, SR-94, SR-95, SR-99, SR-100, SR-110, SR-111, SR-112
Account module | SR-1, SR-2, SR-3, SR-4, SR-5, SR-6, SR-7, SR-8, SR-9, SR-10, SR-11, SR-12, SR-13, SR-14, SR-17, SR-18, SR-19, SR-21, SR-30, SR-31, SR-32
Database API | SR-1, SR-17, SR-18, SR-19, SR-20, SR-21, SR-22, SR-23, SR-26, SR-27, SR-28, SR-29, SR-30, SR-31, SR-32, SR-33, SR-34, SR-39, SR-40, SR-48, SR-52, SR-58, SR-60, SR-72, SR-75, SR-82, SR-85, SR-91, SR-98, SR-109, SR-114
Request handler module | -
Experiment phases module | SR-61, SR-62, SR-63, SR-64, SR-65, SR-66, SR-67, SR-68, SR-69, SR-70, SR-71, SR-72, SR-73, SR-74, SR-75, SR-76, SR-77, SR-78, SR-79, SR-80, SR-81, SR-82, SR-83, SR-84, SR-85
Experiment module | SR-48, SR-53, SR-54, SR-55, SR-56, SR-57, SR-58, SR-59, SR-60

7.2.2 Front-end modules

Module | Software requirements
Results module | SR-100, SR-101, SR-102, SR-103, SR-104, SR-105, SR-106, SR-107, SR-108, SR-111, SR-112, SR-113
Project module | SR-48, SR-49, SR-50, SR-51, SR-54, SR-55, SR-56, SR-57, SR-58, SR-60, SR-65, SR-66, SR-67, SR-68, SR-69, SR-70, SR-71, SR-72, SR-73, SR-74, SR-75, SR-77, SR-78, SR-79, SR-80, SR-81, SR-82, SR-83, SR-84, SR-85, SR-87, SR-93
Compare experiments module | SR-49
Registration module | SR-1, SR-3, SR-4, SR-5, SR-6, SR-7, SR-8, SR-9, SR-10, SR-11, SR-12, SR-14
Login module | SR-3, SR-4, SR-5, SR-15, SR-17
Home module | SR-23, SR-24, SR-25, SR-26, SR-27, SR-28, SR-29, SR-42, SR-43, SR-44, SR-45, SR-46, SR-47, SR-52
Navigation module | SR-16
User information module | SR-3, SR-4, SR-5, SR-6, SR-7, SR-8, SR-9, SR-10, SR-11, SR-12, SR-14, SR-18, SR-19, SR-20, SR-21, SR-22, SR-39, SR-40
Admin tools module | SR-13, SR-30, SR-31, SR-32, SR-33, SR-34, SR-95
