<<

FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

A Container-based architecture for accelerating tests via setup state caching and parallelization

Nuno Miguel Ladeira Neto

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: João Miguel Rocha da Silva, PhD

July 30, 2019

A Container-based architecture for accelerating software tests via setup state caching and parallelization

Nuno Miguel Ladeira Neto

Mestrado Integrado em Engenharia Informática e Computação

Approved in oral examination by the committee:

Chair: Prof. João Correia Lopes
External Examiner: Prof. José Paulo Leal
Supervisor: Prof. João Miguel Rocha da Silva

July 30, 2019

Abstract

In software testing, the scope of the tests can be represented as a pyramid, often called the “Test Pyramid” [31], which categorizes tests along 3 levels: Unit, Service, and End-to-end. The latter traverse the entire system, attempting to simulate real user interaction, and are the focus of this dissertation. The problem addressed in this project is the long execution times typically associated with end-to-end tests.

This challenging and interesting problem was encountered during the development of Dendro [7], a data management platform for researchers, developed at FEUP InfoLab [13]. With over 2000 end-to-end tests and 4 hours to complete the pipeline, the developers found it hard to get quick feedback on their work and to integrate with CI tools like Travis.CI, which typically have a timeout period of 1 hour.

Several tools have been developed in an attempt to optimize the execution times of tests and builds, such as “CUT” [12]; “A Service Framework for Parallel Test Execution on a Developer’s Local Development Workstation” [35]; “Cloudbuild” [11]; and “Bazel” [2]. Although the first two target unit testing and the last two were designed to optimize builds, they all provide interesting solutions regarding time optimization. Some of these design solutions can be considered and adapted for optimizing local end-to-end execution.

The approach implemented will: 1. accelerate end-to-end tests by using a setup caching mechanism, which prevents repetition patterns at the beginning of every end-to-end test by creating and saving the state beforehand; 2. parallelize the tests by instantiating multiple namespaced environments, taking advantage of currently under-utilized CPU and I/O resources, which at the moment can only run a single build at a time.

This dissertation aims to: develop or deploy a framework to improve the execution times of end-to-end tests; successfully complete the test stage in under 1 hour so that common CI tools do not report timeout failures; and publish a research paper on container-driven testing. The evaluation was carried out through experiments with the large set of end-to-end tests for Dendro, running on the same hardware, and comparing the execution times with and without the developed solution.

After a successful implementation, the tests proved that it is possible to reduce execution times by 75% just by implementing setup caching. By introducing parallelization with 4 instances, the times can be further reduced by 66%. This solution converts a conventional run that takes 4 hours and 23 minutes into a run that takes 22 minutes with setup caching and 4 parallel instances.

Keywords: DevOps, software testing, pipelines, end-to-end tests, setup caching, parallelization, containers, Docker

Resumo

Nos testes de software, o escopo dos testes pode ser representado por uma pirâmide, também vulgarmente conhecida como a “Pirâmide de Testes” [31], a qual categoriza os testes em 3 níveis: Unitários, Serviço e End-to-End. Estes últimos percorrem todo o sistema, procurando simular interacções reais do utilizador, e serão o foco desta dissertação.

O problema abordado neste projecto é o longo tempo de execução tipicamente associado aos testes end-to-end. Este desafiante e interessante problema surgiu aquando do desenvolvimento do Dendro [7], uma plataforma de gestão de dados para investigadores, desenvolvida no InfoLab [13] da FEUP. Com cerca de 2000 testes end-to-end e 4 horas para concluir a pipeline, os programadores começaram a ter dificuldades em obter feedback rápido sobre o trabalho desenvolvido. Para além disso, a integração com plataformas de CI ficou cada vez mais difícil, visto que estas tipicamente têm um intervalo de timeout de cerca de 1 hora.

Várias ferramentas foram desenvolvidas numa tentativa de otimizar os tempos de execução de testes e builds. Por exemplo: “CUT” [12]; “A Service Framework for Parallel Test Execution on a Developer’s Local Development Workstation” [35]; “Cloudbuild” [11]; e “Bazel” [2]. Apesar de as duas primeiras terem como alvo testes unitários e as duas últimas terem sido desenhadas para otimizar builds, todas propõem soluções interessantes para otimizar tempos de execução. Algumas delas podem ser consideradas e adaptadas para otimizar a execução local de testes end-to-end.

Espera-se que a abordagem consiga: 1. acelerar os testes end-to-end através do uso do mecanismo de setup caching, que irá prevenir a repetição de padrões no início de cada teste através da persistência do estado numa fase prévia; e 2. paralelizar os testes através da instanciação de containers nomeados, tirando partido de recursos de CPU e I/O não utilizados, os quais, neste momento, só executam uma build de cada vez.

Esta dissertação procura: desenvolver e criar uma framework para melhorar a execução local dos testes end-to-end; completar com sucesso a fase de teste em menos de 1 hora, de modo a que as típicas ferramentas de CI não reportem falha por timeout; e publicar um artigo relacionado com execução de testes orientada a containers. A avaliação vai ser executada através de várias experiências tendo como base a grande quantidade de testes end-to-end do Dendro. As experiências irão correr na mesma máquina e comparar os tempos de execução com e sem a solução desenvolvida.

Depois de uma implementação bem sucedida, os testes provaram que é possível reduzir os tempos de execução em cerca de 75% apenas por implementar setup caching. Ao introduzir paralelismo com 4 instâncias, os tempos são reduzidos em mais 66%. A solução converte uma execução convencional que demora 4 horas e 23 minutos numa que demora apenas 22 minutos com setup caching e 4 instâncias paralelas.

Palavras-chave: DevOps, teste de software, pipelines, testes end-to-end, caching de setup, paralelização, containers, Docker

Acknowledgements

In this section, I give my acknowledgements to everyone that helped me during the course of the last five years and this dissertation.

First, I would like to give a special thanks to my supervisor, João Rocha da Silva. He is an example of what a true supervisor should be: always available to help, charismatic and organized. Throughout the dissertation process, he provided multiple enlightening meetings that were full of creative ideas and solutions. The results of this dissertation would not be the same without his guidance.

I would also like to thank my family. Even though they do not fully understand the topics of the course and my struggles, they were always supportive and never stopped believing in me. Without their inexhaustible support, I would not be here in the first place.

Finally, I would like to thank all my former colleagues. Those colleagues, who are now close friends, provided a pleasant environment, both academic and personal, where we shared both our problems and our happiness. It was key to looking forward and finishing this cycle.

Nuno Neto

Funding Acknowledgments

This work is financed by the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project POCI-01-0145-FEDER-016736.

“It is hard to fail, but it is worse never to have tried to succeed”

Theodore Roosevelt

Contents

1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Objectives

2 Background and State of the Art
  2.1 Software quality assurance
    2.1.1 Levels of tests
    2.1.2 Black box testing vs. white box testing
    2.1.3 The test pyramid
      2.1.3.1 Unit tests
      2.1.3.2 Service tests
      2.1.3.3 End-to-end tests
      2.1.3.4 Test environment virtualization
    2.1.4 Structure of a test
    2.1.5 TDD and BDD
    2.1.6 Continuous integration
    2.1.7 Continuous delivery
    2.1.8 Continuous deployment
    2.1.9 Pipelines
  2.2 Virtual environments for automated testing and deployment
    2.2.1 Virtual machine architecture
    2.2.2 Container-based architecture
      2.2.2.1 Orchestration
      2.2.2.2 Applications
    2.2.3 Virtual machines vs. containers
  2.3 State of the Art
    2.3.1 Discussion

3 Approach
  3.1 Setup caching
    3.1.1 Test execution behaviour with setup caching with 3 distinct tests
  3.2 Test parallelization
  3.3 Integration tool

4 Implementation
  4.1 Goal, tools and frameworks
  4.2 Docker-mocha
    4.2.1 Tests and setups file
    4.2.2 Compose file
    4.2.3 Execution and options
  4.3 Architecture
    4.3.1 Runner
    4.3.2 Manager
    4.3.3 DockerMocha
    4.3.4 NoDocker
    4.3.5 Other
  4.4 Class execution architecture
  4.5 Graph extraction
  4.6 Issues
    4.6.1 Docker for Windows and macOS
    4.6.2 Ambiguous networks
    4.6.3 No-volumes images

5 Validation and Results
  5.1 Preliminary test prototype
  5.2 Dendro: a research data management platform
    5.2.1 Dendro technology stack
    5.2.2 Current CI pipeline
  5.3 A preliminary benchmark
  5.4 Evaluation experiment
  5.5 Results
    5.5.1 Total execution time
    5.5.2 CPU usage
    5.5.3 Memory usage
    5.5.4 Disk read
    5.5.5 Disk write

6 Conclusions

A Appendix

References

List of Figures

2.1 Test Levels [42]
2.2 Test Pyramid by Sam Newman [31, p. 234]
2.3 Structure of a test by Gerard Meszaros [27]
2.4 Typical build pipeline
2.5 Comparison between Virtual Machines and Docker Container based architecture [16]
2.6 Comparison of the different existing solutions for test execution optimization

3.1 Example of test dependency tree
3.2 Environment State before t1
3.3 Environment State before t4
3.4 Environment State before t2
3.5 Estimated end-to-end tests without state caching
3.6 Estimated end-to-end tests with state caching
3.7 Environment before parallel approach
3.8 Environment after parallel approach
3.9 Expected CPU usage after parallelization

4.1 Docker-Mocha Architecture
4.2 Setup Class
4.3 Graph Example
4.4 Docker in Windows Architecture

5.1 Application Prototype Dependencies
5.2 The current CI pipeline for Dendro
5.3 Specifications of the test machine
5.4 Total dependencies passing
5.5 Hardware specifications of the test machine
5.6 Software Configuration
5.7 Time Gains
5.8 Average CPU usage throughout the test runs
5.9 Memory Used
5.10 Disk Read
5.11 Disk Write

A.1 Docker-Mocha Class

List of Tables

4.1 Additional parameters for each setup
4.2 Additional parameters for each test
4.3 Additional flags for docker-mocha
4.5 Environment and Example for the modes

5.1 Evaluation Scenario with Results

A.1 Functions of the Manager
A.2 Fields of Docker Mocha
A.5 Methods of DockerMocha
A.3 Possible arguments in docker-mocha calls

Abbreviations

API     Application Programming Interface
ASE     Automated Software Engineering Conference
BDD     Behaviour-Driven Development
CD      Continuous Delivery
CDep    Continuous Deployment
CI      Continuous Integration
CPU     Central Processing Unit
DSL     Domain Specific Language
DevOps  Development Operations
HDD     Hard Disk Drive
I/O     Input and Output
IoT     Internet of Things
JSON    JavaScript Object Notation
NPM     Node Package Manager
OS      Operating System
PTP     Predictive Test Prioritization
SDK     Software Development Kit
SSD     Solid-State Drive
TDD     Test-Driven Development
UI      User Interface
YAML    YAML Ain't Markup Language


Chapter 1

Introduction

This chapter introduces this dissertation. First, the context in which it is inserted is presented, followed by the motivation behind it and its objectives.

1.1 Context

This dissertation can be placed in software engineering research, more specifically in the software debugging sub-area. It is also inserted in an area of software engineering called “DevOps” (Development Operations) [8]. The main goal of DevOps is to support all software development phases with automation and monitoring practices.

1.2 Motivation

With the ever-increasing complexity of software development, the automation of builds and the presence of a comprehensive test suite are essential. Not only is it increasingly necessary to verify software reliability, but developers must also maintain software quality at higher levels of development complexity.

Testing is an essential aspect of software production. Not only does it allow developers to understand the isolated bugs that they introduce, but it also allows them to understand how reliable and sturdy the system is. This is only possible if the software is covered by a good test suite. Inside these test suites there are many types of tests: tests with a high scope, and tests which are very isolated and fast (unit tests). The higher-scoped tests (also known as end-to-end tests) validate the entire scope of a system; they usually simulate entire user interactions. A consequence of this high scope is the execution time: they tend to consume a considerable amount of time.

The good practices of software engineering state that a test suite should contain plenty of unit tests and only the essential amount of end-to-end tests. If the development feedback cycle starts to suffer due to the amount of time the end-to-end tests consume, then they should be converted and replaced with more lower-scope tests [31, p. 240]. But what if the developers need to have a high amount of end-to-end tests anyway? What if they need this high quantity of high-scope tests to validate the reliability of the system? What tools are there to allow the developers to locally optimize the execution of end-to-end tests?

This work tackles the optimization of local builds where end-to-end tests are numerous and take a long time to run. Our hypothesis can be expressed along the following lines.

It is possible to reduce the overall execution time of service and end-to-end tests through setup state caching and parallel execution mechanisms, provided by sets of isolated containers.

To validate the previous hypothesis, the following research questions must be answered:

1. RQ1: Is it possible to reduce build and test execution times by using containers with state caching and/or parallelization?

2. RQ2: How does the execution time scale with the number of parallel test instances?

Research question 1 is intended to determine whether it is possible to reduce the time it takes to execute the test phase of software projects by introducing the two mechanisms enumerated. Research question 2 is intended to determine whether the parallelization mechanism scales linearly, with a negative slope, or with diminishing returns (an inverted yield) as the number of parallel instances increases.
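One way to make RQ2 concrete is to sketch, under stated assumptions rather than as an established model, the expected wall-clock time of a run with $n$ parallel instances as

\[
T(n) \approx T_{\text{fixed}} + \frac{T_{\text{tests}}}{n} + c(n)
\]

where $T_{\text{fixed}}$ stands for work that cannot be split across instances (for example, building images and creating cached setup states once), $T_{\text{tests}}$ is the aggregate execution time of the tests themselves, and $c(n)$ is the overhead added by coordination, container start-up and I/O contention. If $c(n)$ grows with $n$, the curve flattens, or even turns upward, instead of decreasing linearly, which is the inverted yield referred to above.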

1.3 Objectives

In order to solve the given problem, different approaches will be explored and combined to create a complete solution. The different objectives are listed below:

1. Accelerate end-to-end tests by using a setup caching mechanism.

2. Parallelize the tests to take advantage of unused CPU and I/O resources in order to speed up builds.

3. Ensure state isolation of separate tests by implementing each in a different microservice group.

The first objective relates to the optimization of end-to-end tests using a setup caching approach. When starting an end-to-end test, there is usually a setup phase in which several services need to be initialized and the system put into a certain state, a process that can consume a considerable amount of time. Currently, this phase is repeated for every test. The setup caching approach creates a state and saves it once the setup phase of a test is concluded. The next tests will check whether the given state already exists and, if so, load it. Using this method, the setup phase can be reduced to simply loading saved states. Repeating this process for each test saves a lot of time.
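As a rough illustration of this check-load-or-create logic (a sketch only, not the docker-mocha implementation described in Chapter 4; all helper names are hypothetical), the idea can be expressed as follows:

```typescript
// Sketch of the setup-caching idea: load a previously saved state if it
// exists, otherwise run the setup phase once and save the resulting state.
// The StateStore could be backed by container snapshots (e.g. committed
// Docker images); these names are illustrative, not real docker-mocha APIs.

interface StateStore {
  exists(stateId: string): Promise<boolean>;
  load(stateId: string): Promise<void>;  // restore the environment to a saved state
  save(stateId: string): Promise<void>;  // snapshot the environment after setup
}

async function prepareEnvironment(
  stateId: string,
  runSetup: () => Promise<void>,
  store: StateStore
): Promise<void> {
  if (await store.exists(stateId)) {
    // Cache hit: skip the expensive setup phase and just restore the state.
    await store.load(stateId);
  } else {
    // Cache miss: run the setup once and save the resulting state so that
    // later tests depending on the same setup can simply load it.
    await runSetup();
    await store.save(stateId);
  }
}
```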

The second objective is to take advantage of unused CPU and I/O resources by using a parallel approach. When running tests, usually only a single process is created, and it is in this process that the sequence of tests is executed. However, this process only runs on one CPU core, while multiple other CPU cores are probably available. If these CPU cores are used to run other test sequences, then the overall test execution time can be reduced.

The third objective is to use containers to isolate tests. When running tests in parallel, concurrency problems and race conditions might become a serious issue. One way to solve this drawback is to use containers. Containers allow the isolation of any service from the current machine. By initializing multiple containers and running the tests in each isolated container, no more concurrency problems or race conditions should occur. Containers are also a good way to manage the multiple stages of the build pipeline.
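A minimal sketch of the parallelization idea follows (assumptions: a fixed number of instances, a simple round-robin split of test files, and a caller-supplied function that runs one test file inside the isolated environment of a given instance; none of this is the final design, which is presented in Chapters 3 and 4):

```typescript
// Sketch: distribute test files over a fixed number of isolated instances.
// Each instance processes its own bucket sequentially, while the buckets
// run concurrently and can therefore occupy separate CPU cores.

async function runInParallel(
  testFiles: string[],
  instances: number,
  runTestInInstance: (file: string, instance: number) => Promise<void>
): Promise<void> {
  // Round-robin assignment of test files to instances.
  const buckets: string[][] = Array.from({ length: instances }, () => []);
  testFiles.forEach((file, index) => buckets[index % instances].push(file));

  await Promise.all(
    buckets.map(async (bucket, instance) => {
      for (const file of bucket) {
        await runTestInInstance(file, instance);
      }
    })
  );
}
```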

Chapter 2

Background and State of the Art

This chapter provides an overview of existing works that relate to the subject of this dissertation. The research that supports this state of the art was structured based on the initial concept of this dissertation. The main focus of this dissertation is the implementation of setup caching and parallelization in the test phase of a given project pipeline.

The first section discusses whether it is possible to ensure software quality when Agile methods are used. Different standards are presented and discussed, starting with the levels of testing, followed by a comparison between black box and white box testing. Next, the concept of the test pyramid is introduced, together with the types of tests according to their scope; the structure of a test, as well as the differences between Test-Driven Development and Behaviour-Driven Development, is discussed afterwards. Finally, the concepts of Continuous Integration, Continuous Delivery, Continuous Deployment and build pipelines are introduced at the end.

The second section discusses different infrastructures for automated testing and deployment. Each type of infrastructure is illustrated with multiple platforms and solutions that support the given architecture.

The third section focuses on the main approaches used to optimize test execution. First, different scientific papers are discussed, followed by different existing tools. After that, a comparison table with an evaluation of all studied solutions is presented. A discussion section is included at the end, which lists the multiple conclusions derived from this research.

2.1 Software quality assurance

The rate at which software is produced nowadays creates a necessity to ensure software quality. Sequential methods, such as Waterfall, cannot provide support for continuous integration, development and delivery when creating software. To cope with rapidly changing requirements, agile methods were introduced, aiming to accelerate software development. However, with this speed-up comes the necessity of equally faster software validation processes.


The article “Software Quality and Agile Methods” [28] identifies multiple techniques that are used to ensure software quality in agile methods. These techniques belong to different development phases. For the requirements and analysis phase, techniques like System Metaphor, Architectural Spike and on-site customer feedback can be used. In the implementation phase, different tests can be written or, in order to directly review the code, techniques like Refactoring, Pair Programming and Stand-up Meetings help to identify early bugs. Finally, in the integration phase, Continuous Integration and Acceptance Testing techniques can be used to test the overall quality of the system before a release or a direct integration with existing instances. After a release, there are other ways to certify software quality: direct customer feedback, for example, can be used to understand customer needs, missing features or unplanned interactions. In the context of this dissertation, the main focus is the integration phase and, more specifically, Continuous Integration techniques, which will be addressed later in section 2.1.6.

Although the topic of this dissertation is not primarily about the importance of software quality assurance, it is essential to understand that it is because of it that the problem addressed in this dissertation exists. To ensure software quality, developers are often faced with other problems, particularly the validation phase itself, which can take a considerable amount of time. Such long times between test executions are undesirable, as they bring unforeseen consequences like not having feedback in good time or breaking the DevOps continuous integration and development pipeline.

It is also important to note that Agile methods differ from Waterfall methods by iteratively repeating the implementation and integration phases. While in Waterfall methods these phases are sequential, in Agile they can be repeated multiple times in order to adjust to changing requirements.

2.1.1 Levels of tests

The V-model [42] defines multiple levels of tests when validating software. The model integrates different levels for the development phase, but this section focuses on the testing phase only. The V-model is a standard model to follow when using Waterfall methods. However, the testing phase of the V-model can be applied both in Waterfall and Agile methods; the only difference is that in Waterfall this phase will only occur once, whereas in Agile it can be repeated multiple times along the several development iterations. An illustration of the V-model can be seen in figure 2.1. The arrow represents the time elapsed along the implementation phase: it starts with the creation of Component Tests and goes all the way to the Acceptance Tests.

Figure 2.1: Test Levels [42]

Next, each of these levels of tests is detailed.

Component Tests test the software units. They can also be defined as unit tests. Their main focus is to test software methods, modules or classes.

Integration Tests test the integration between multiple subsystems. Their main focus is to validate whether the produced software is able to integrate with other software modules so that they cooperate with each other without introducing faults.

System Tests test the overall system. Their main focus is to validate the correct behaviour of the system after every subsystem is successfully integrated. Within system testing, there are several techniques that allow the validation of the system. One of them is Regression testing, a common technique that re-runs previously executed tests in order to verify that the new changes did not introduce bugs.

Acceptance Tests cover the release of the system. They can be carried out by the customer, and their main focus is to validate whether the released software meets the customer's requirements and needs.

2.1.2 Black box testing vs. white box testing

Black box testing, also known as Specification-Based testing [17], is a type of testing in which the implementation is hidden from the tester: the system is exercised by simply feeding it input and analyzing the output. Using black box testing has some advantages, such as being independent of how the software is implemented and allowing testing to occur in parallel with the implementation. However, it also brings some disadvantages, such as the existence of redundancies between test cases.

White box testing, also known as Code-Based testing [17], is another type of testing. It allows the tester to see the implementation that would be hidden inside a black box test. By using this method, it is possible to obtain test coverage metrics. However, with a white box method, the tests are no longer independent of the implementation.

Both methods can be used in different perspectives and contexts. If there is a need to create software-independent tests, black box testing is the best approach. If the intent of the test is to analyze the system implementation, then white box testing is the best approach. There is no relation between these two testing methods and the types of tests referred to in the test pyramid.

2.1.3 The test pyramid

In software testing, the scope of the tests can be represented as a pyramid, otherwise known as the “Test Pyramid” [31]. This pyramid categorizes tests along 3 levels: at the bottom reside the tests with more isolation, and at the top the tests that give more confidence in the system as a whole. A representation of this pyramid can be observed in figure 2.2. In the next sub-sections, each level of the pyramid is discussed in detail.

Figure 2.2: Test Pyramid by Sam Newman [31, p. 234]

2.1.3.1 Unit tests

This type of test resides at the lowest level of the pyramid. It provides the highest level of isolation, helping pinpoint the root cause of unexpected behaviours. Unit tests typically test a function or a method very quickly. These tests exist to give very fast feedback, and they help to find the majority of bugs given their isolation and small scope. They are usually written by the same developer that designed the function or method. These tests can also be seen as validating small parts of a given service: they do not test services, but only some of their isolated and small components.

2.1.3.2 Service tests

A service test, as the name implies, tests an entire service by bypassing the UI (User Interface) layer. Service tests provide a higher level of abstraction but less isolation than unit tests. Their main purpose is to validate and find bugs in a single service, which might prove useful when testing a multiple-service platform. In terms of performance, they can be as quick as unit tests, depending on how simple the service is: it can be a simple service, a service running over a network, a service using a database, a service being virtualized or containerized, etc.
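As an illustration of how a service test bypasses the UI, the sketch below exercises a service directly through its HTTP API. It assumes the service is already running on localhost port 3000 (started, for example, during a setup phase) and that it exposes a /projects endpoint; both assumptions are hypothetical and not tied to any system in this dissertation.

```typescript
// Hypothetical service-level test: the service is driven through its HTTP
// API rather than its UI. Uses only Node's built-in http module and Mocha.
import assert from "node:assert";
import http from "node:http";

function get(path: string): Promise<{ status: number; body: string }> {
  return new Promise((resolve, reject) => {
    http
      .get({ host: "localhost", port: 3000, path }, (res) => {
        let body = "";
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => resolve({ status: res.statusCode ?? 0, body }));
      })
      .on("error", reject);
  });
}

describe("projects service", () => {
  it("lists projects as a JSON array", async () => {
    const response = await get("/projects");
    assert.strictEqual(response.status, 200);
    assert.ok(Array.isArray(JSON.parse(response.body)));
  });
});
```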

2.1.3.3 End-to-end tests

End-to-end tests are tests running against an entire system, attempting to simulate real user interactions. When running end-to-end tests it is necessary to set up the entire system, including the services, databases, UIs, etc. When an end-to-end test passes, there is a high level of confidence that the system works. However, when it fails, it is more difficult to identify the root cause of the malfunction. Because the entire system must be brought up before the actual tests and assertions are verified, these tests tend to take the highest amount of time to execute. It is recommended to reduce the number of these tests, since they might slow down the development pipeline.

2.1.3.4 Test environment virtualization

It is common practice in software testing to isolate or virtualize the testing environment, and there are multiple reasons for this tendency [31]. The first is the possibility of tests causing defects in the particular environment where they run; they might even fail because they are running in an environment that is not in accordance with the test environment specifications. Another reason is the need to run the same tests across multiple operating systems. For this, virtualization and isolation strategies are usually required by the developers or the DevOps team to run these kinds of tests.

2.1.4 Structure of a test

A test can be seen as a series of phases that report the validation status of a given feature. Gerard Meszaros, in his book “xUnit Test Patterns”, describes the Four-Phase Test pattern [27]. This pattern is displayed in Figure 2.3.

Figure 2.3: Structure of a test by Gerard Meszaros [27]

The initial phase, Setup, is tasked with mounting the required specifications of the test environment. The Exercise phase executes the given test in the previously configured environment. The Verify phase is where the outcome is compared with the expected values. Finally, the last phase, Teardown, is a shutdown phase where the original state of the system, prior to the test, is restored.
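To make the four phases concrete, the sketch below maps them onto a Mocha-style test (Mocha is the test framework of the project discussed later in this dissertation); the in-memory "database" is purely illustrative:

```typescript
// Sketch of the Four-Phase Test pattern expressed with Mocha hooks.
import assert from "node:assert";

describe("registering a user", () => {
  let users: Map<string, string>;

  // Setup: put the environment into the state required by the test.
  beforeEach(() => {
    users = new Map([["admin", "admin@example.org"]]);
  });

  it("stores the new user's e-mail address", () => {
    // Exercise: execute the behaviour under test.
    users.set("alice", "alice@example.org");

    // Verify: compare the outcome with the expected values.
    assert.strictEqual(users.get("alice"), "alice@example.org");
  });

  // Teardown: restore the original state of the system.
  afterEach(() => {
    users.clear();
  });
});
```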

2.1.5 TDD and BDD

Test-Driven Development (TDD) is a testing methodology that developers can adopt when building software. Instead of writing the code first, they write a test that initially fails (since the code is not written yet) and that describes the given feature; next, they write just enough code to make it pass. That way, developers are encouraged to write clear, better-designed, easier-to-maintain code with lower defect counts. However, TDD has some disadvantages, such as developers having difficulty knowing where to start, developers becoming focused on small details, and the growing number of unit tests becoming difficult to maintain [41, 3].

Behaviour-Driven Development (BDD) is a set of practices that help software development teams create and deliver better software, faster. It is a common language based on structured sentences that aims to simplify the communication between the development team and other members like customers or project owners. BDD has several advantages, such as the reduction of waste by focusing on developing only the needed features. Another is the reduction of costs that results from bugs being caught by the tests. Additionally, it is possible to introduce easier and safer changes given the easier communication with non-developer personnel, which ultimately leads to faster releases. Some disadvantages are present when using BDD as well: it needs high business commitment and cooperation; it is suited mostly for Agile development; it does not work well when the requirements analysis is done by the business team alone; and it might lead to tests that are difficult to maintain [41].

The use of TDD or BDD in a project depends on the team, the project itself and the additional people associated with the project. They must understand the advantages and disadvantages of both processes and create trade-offs between them in order to understand whether either of them adds value to the project development.
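As a small, self-contained illustration of the TDD cycle described above (not taken from any project in this dissertation): the test is written first and fails until the minimal implementation below it is added.

```typescript
// TDD illustration: red (failing test written first), then green (just
// enough code to make it pass). Run with Mocha; slugify is a made-up example.
import assert from "node:assert";

describe("slugify", () => {
  it("lower-cases the title and replaces spaces with dashes", () => {
    assert.strictEqual(slugify("My First Post"), "my-first-post");
  });
});

// Minimal implementation added after watching the test fail.
function slugify(title: string): string {
  return title.toLowerCase().replace(/\s+/g, "-");
}
```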

2.1.6 Continuous integration

Continuous integration (CI) is a set of techniques that support software development and deployment and is especially relevant in agile environments. It provides support to teams of considerable size, with high project complexity and rapidly changing requirements. By building, testing and integrating the newly added or changed code into the version control repository, continuous integration provides fast feedback and keeps the whole development team synchronized about the current state of the project. In order for this process to work, it is usually backed by a remote server, also known as the integration server. It is this server that is responsible for checking out the new version, building it and testing it [8]. The following sections describe some of the existing continuous integration tools.

Codacy

Codacy 1 is a continuous integration tool that automatically reviews code; it provides feedback regarding security, code coverage, code duplication and code complexity. It works as an online platform with support for cloud services and 28 different languages. Besides the free Startup version, Pro and Enterprise versions with additional features are also available.

1 Codacy: https://www.codacy.com/

Travis.ci

Travis.ci is a free, open-source service to test and deploy projects, providing full continuous integration and deployment support. By pushing a new version of the project to a given version control repository, Travis.ci can be triggered in order to build and test the new changes [4]. If all stages succeed, the new code can be merged, deployed and reported using additional services. Some of the services available include virtualization, containerization and pre-installed databases. An additional REST API is available in order to retrieve information regarding builds and pipeline status.

Teamcity

Teamcity is a continuous integration and deployment server developed by JetBrains 2. It supports several features like automatic detection of tool versions, framework testing support, code coverage and static code analysis. It provides interoperability between different version control services and is able to integrate with other third-party tools like cloud providers [25]. It has a free tier, but additional paid licenses are available which provide other premium features.

Bamboo

Bamboo 3 is a continuous integration and deployment server developed by Atlassian 4. It supports multi-stage build plans, triggers and agent assignment. It can run tests in parallel and automate processes, providing faster feedback. Finally, it is also able to deploy the project automatically across several distributors. Only paid subscriptions are available in order to use Bamboo.

Jenkins

Jenkins is a free and open-source automation server providing hundreds of plugins to support building, deploying and automating projects. It has great support for multiple operating systems and hardware, making it an excellent solution when requirements are dynamic. It is able to distribute work across multiple machines, allowing tests, builds and deployments to be faster. In order to define pipelines and other tasks, users can make use of the Jenkins DSL or the Groovy language. These languages allow Jenkins to encapsulate specific functionalities by parsing unique keywords [23]. In order to write the pipelines, users must create and define them in a Jenkinsfile or import other external files written using the Jenkins DSL. By using external files, users can manage multiple jobs, track history, check code differences, etc.

2 JetBrains: https://www.jetbrains.com/
3 Bamboo: https://www.atlassian.com/software/bamboo
4 Atlassian: https://www.atlassian.com/

With Jenkins, pipelines can be easily structured into multiple steps and users can be notified of their status via integration tools. Additional support for exception handling is available with simple try-catch-finally blocks. Jenkins also provides an intuitive and simple-to-use interface called Blue Ocean. In Blue Ocean, the graphical interface displays the pipeline structure, current stage and progress. It provides logs, making it easier to track information regarding the progress of the pipelines. A basic visual editor is available as well.

2.1.7 Continuous delivery

Continuous Delivery (CD) is a set of principles that complement continuous integration. Instead of only making sure the software can be integrated and merged with a previous version, continuous delivery allows the new changes to be deployed and a new version to be created and released [8].

2.1.8 Continuous deployment

Continuous Deployment (CDep) is the complement of continuous integration and continuous delivery. While CI only ensures that the new software can be integrated with a previously existing version and continuous delivery only ensures that the new changes can be deployed, continuous deployment is the act of actually deploying the new changes into a running production instance [8].

2.1.9 Pipelines

A pipeline [31, p. 217–224] is a process made up of different stages, running as a dependency net with parallel and sequential steps. Each stage has a different job assigned. It is very common to see a deployment pipeline in current software development projects, and current continuous integration, delivery and deployment tools are very useful when working with considerably large teams and/or under agile methodologies.

A basic deployment pipeline typically comprises 3 distinct stages: the Build stage, where the currently developed project is built and checked for setup errors; the Test stage, where the project undergoes a considerable number of tests of different kinds, which check the interaction with the software in order to detect implementation errors; and finally the Deployment stage, where the current version of the project is deployed to a machine ready to be used by developers and end-users. This last stage can also be used to upload the project to other build version control platforms. In Figure 2.4 it is possible to see a representation of a typical build pipeline, with a beginning and an end, and with build, test and deploy stages.
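As a minimal sketch of the three-stage structure described above (the npm script names are assumptions; a real pipeline would normally be defined in the CI tool's own configuration format rather than in a script like this):

```typescript
// Toy pipeline runner: executes the Build, Test and Deploy stages in order
// and aborts as soon as one stage fails, mirroring how a CI server stops a build.
import { execSync } from "node:child_process";

const stages = [
  { name: "Build", command: "npm run build" },
  { name: "Test", command: "npm test" },
  { name: "Deploy", command: "npm run deploy" },
];

for (const stage of stages) {
  console.log(`--- ${stage.name} stage ---`);
  // execSync throws if the command exits with a non-zero status,
  // which stops the pipeline at the failing stage.
  execSync(stage.command, { stdio: "inherit" });
}
```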

2.2 Virtual environments for automated testing and deployment

This section will provide detail on several infrastructures and approaches that can be used to automatically test and deploy software. 2.2 Virtual environments for automated testing and deployment 13

Figure 2.4: Typical build pipeline

Local deployment is the easiest way of deploying software. Usually, it is only necessary to create shell scripts, or even manually run the tests and manually start the system. This method also works for remote machines, usually via Secure Shell. This practice, although trivial and easy to use, is not recommended when working with complex projects. Using this mechanism might ultimately result in the invalidation of the machine for other configurations, builds or setups: by installing new features and setting up the system, the operating system layer is altered. The only solution in many cases is the re-installation of the entire operating system and the subsequent installation of the platform and associated services from scratch.

Another problem when using local deployment is the fact that the deployment becomes dependent on specific hardware and software configurations, in this case the software and hardware where the application is being deployed. Eventually, it becomes harder to maintain the solution and deploy it across different environments.

Besides local deployment, another common solution is virtual deployment. This solution allows software to be deployed in disposable scenarios that are independent of the operating system and the physical hardware they run on. There are two architectures that support virtual deployment: the virtual machine architecture and the container-based architecture.

2.2.1 Virtual machine architecture

A virtual machine is software that emulates an entire operating system on top of another operating system. In the topic of virtualization, there are two concepts that need to be clarified: host and guest [31, p. 217–224]. The Host is the native operating system that is installed and runs on the physical hardware; it allows the instantiation of multiple Guests. Guests are also operating systems; however, instead of being natively installed on the physical machine, they are emulated in order to run on top of the native operating system, hence the name “virtual machines”. Another term that needs to be clarified is instance: an instance refers to a currently running virtual machine.

Hosts can have multiple Guests instantiated and supported via a module called the Hypervisor [31, p. 217–224]. The Hypervisor can be seen as the supervisor of the multiple virtual machine instances. It is the Hypervisor that controls and manipulates the Guests, and it is also the Hypervisor that is responsible for mapping resources from the physical machine to the virtual machine.

Instead of having different machines with native installations or having other operating systems accessible via dual-boot methods, virtual machines make it possible to run and manage different operating systems independently of the physical hardware configuration.

This infrastructure is very useful when there is a need to use different operating systems on the same machine at the same time. Also, any alteration done inside the virtual environment will not affect the host. A virtual machine provides the entire software support as if it were natively installed on a machine: it supports features like desktop environments, peripherals, hardware configuration access, etc. However, given the nature of this architecture, virtual machines can consume considerable amounts of resources and time to start, run and execute.

Virtual machines are interesting for continuous integration because they allow the CI process to become hardware- and software-independent. By using virtual machines as disposable build and test environments, developers can use any machine that supports virtualization to install the test environment or deployment instance. This way, they become independent of the native operating system and do not harmfully modify the host.

In order to instantiate a virtual machine, a lot of resources must be allocated. Memory is immediately allocated and reserved for that instance, resulting in a shortage of memory capacity for the host and other possible Guests to work with. In a virtual machine, the entire operating system kernel is emulated, resulting in a higher processing load. In terms of storage, the situation is very similar to memory; the only difference is that storage becomes indefinitely allocated to that instance, even if it is not occupied. Some virtual machines also emulate complete desktop environments, resulting in the allocation of graphical resources and video memory.

Virtualbox

Virtualbox is a free, open-source solution for virtualizing different operating systems. Developed by Oracle 5, Virtualbox provides several features and supports virtualization for the most common operating systems. It provides high performance for both home and enterprise use [36].

Vagrant

Vagrant is a tool for building and managing virtual machines. It is free and open-source software developed by HashiCorp 6. It focuses on automation and attempts to improve setup time and production parity. It also provides portable work environments built on top of industry-standard technology [14].

VMware Workstation

VMware Workstation is a tool that provides virtualization of software and services. It is developed by VMware 7 and distributed in two different versions: VMware Workstation Player, which is free and allows only one virtual machine to be instantiated, and VMware Workstation Pro, a paid solution with unlimited options and features. Besides creating and managing virtual machines, it is also able to manage servers and virtual machines hosted in the cloud [46].

5 Oracle: https://www.oracle.com/index.html
6 HashiCorp: https://www.hashicorp.com/
7 VMware: https://www.vmware.com/

2.2.2 Container-based architecture

Containers are the main approach to supporting microservices as a platform. A container is also a virtual machine, but at a lower level of virtualization. Many simplifications are visible when comparing a virtual machine and containers. The first visible simplification is the elimination of the Hypervisor: instead of having the Hypervisor controlling and allocating resources, containers initialize their own process space and live there, while resource allocation becomes the responsibility of the native operating system kernel. This simplification and relocation of responsibilities allows containers to have resources dynamically allocated. Memory is no longer immediately allocated and is able to scale with the container's needs; the same happens with storage [31, p. 217–224].

Usually, a container does not emulate an entire operating system: it reuses the native operating system kernel and only emulates what is required in order to complete the task. Some features, such as the desktop environment, are not supported. This results in a middle ground between a local machine and a virtual machine. A container provides the isolation and replaceability required for some operations, which are also provided by virtual machines, but it is also very efficient in terms of resources and time, much like a local machine [31, p. 217–224]. Figure 2.5 displays a comparison between standard virtualization (virtual machines) and container-based virtualization. From the figure alone it is possible to understand how much simpler the container-based architecture is in comparison with standard virtualization.

Kubernetes

Kubernetes is an open-source system designed to automatically deploy, scale and manage containerized applications, providing a software platform for a container-based architecture. It was originally developed by Google and is constantly being improved by the large community around it. One of the benefits of using Kubernetes is speed: container software like Kubernetes ensures that tools can move quickly while staying available for interaction. Another benefit is being able to scale without the need to increase the development operations team size. It is highly flexible, being able to run locally or in an enterprise environment. It can be integrated very easily with common continuous integration tools and has another important feature: the self-healing system. Kubernetes containers are continuously taking action to make sure they match the desired state at every instant [15].

Docker

Docker is another tool that provides a solution for container-based architectures. It is developed by Docker, Inc. 8, which provides a free open-source version and an additional paid version, Docker Enterprise, aimed at large corporations. By using Docker, developers can create their containers more easily using pre-existing container images available in the huge Docker Hub repository.

8 Docker, Inc.: https://www.docker.com/

Docker increases productivity and reduces the time it takes to bring applications to the market. It is also easily integrated with common continuous integration tools and runs on every standard operating system [39, 24, 1] [31, p. 217–224]. An important aspect when using Docker is familiarization with Dockerfiles. A Dockerfile is a file that describes the steps required to create the image in question. Writing a Dockerfile usually starts by specifying the image, version and revision on which the new image will be based. Additional instructions allow registering environment variables, specifying different users, changing the working directory, running shell commands, etc. [24].
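As an illustration of the kind of instructions mentioned above, the sketch below generates a minimal, hypothetical Dockerfile for a Node.js service and builds an image from it through the Docker CLI; the base image tag, environment variable and commands are assumptions, not taken from any project in this dissertation.

```typescript
// Sketch: write a minimal Dockerfile and build an image with `docker build`.
import { writeFileSync } from "node:fs";
import { execSync } from "node:child_process";

const dockerfile = `
# Image, version and revision on which the new image is based
FROM node:10

# Environment variables and working directory
ENV NODE_ENV=test
WORKDIR /usr/src/app

# Shell commands: install dependencies and copy the application code
COPY package*.json ./
RUN npm install
COPY . .

# Command executed when a container is started from this image
CMD ["npm", "test"]
`;

writeFileSync("Dockerfile", dockerfile.trim() + "\n");

// `docker build -t <tag> <context>` creates the image described above.
execSync("docker build -t my-service:test .", { stdio: "inherit" });
```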

2.2.2.1 Orchestration

“Orchestration is the automated configuration, coordination, and management of computer systems and software” [10]. This kind of tool is usually used when automating tests and deployment. Orchestration tools instantiate and configure all the services needed to run the platform and the tests, so that they can communicate with each other. These configurations can include port numbers, I/O locations, and virtual machine or container initialization. Orchestration can be seen implemented in multiple solutions, one of them being container-based architecture solutions.

Vagrant orchestrate Vagrant orchestrate 9 is a Vagrant plugin that allows orchestrated deployments to already provisioned servers. To use this plugin, developers are required to write a file in which they specify all the configuration desired to instantiate the platform using the different services. It supports a cross-platform deployment environment.

Docker Compose Docker Compose is a tool for defining and running multi-container Docker applications in a coordinated manner. Users are required to write a configuration file in which they specify all the configuration needed for the different services, as well as the links that allow them to communicate with each other [39].
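A short sketch of how such a Compose configuration is typically driven from a test or build script follows; it assumes a docker-compose.yml describing the services already exists in the working directory, and the project name used for isolation is illustrative.

```typescript
// Sketch: bring a multi-container environment up before the tests and tear
// it down afterwards through the Docker Compose CLI. Using a distinct
// project name (-p) keeps the containers and networks of this run isolated.
import { execSync } from "node:child_process";

const project = "test-env-1"; // e.g. one project name per parallel instance

export function startEnvironment(): void {
  // Creates the networks and containers declared in docker-compose.yml.
  execSync(`docker-compose -p ${project} up -d`, { stdio: "inherit" });
}

export function stopEnvironment(): void {
  // Stops and removes the containers and networks of this project only.
  execSync(`docker-compose -p ${project} down`, { stdio: "inherit" });
}
```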

2.2.2.2 Applications

This section discusses multiple real applications of containers. First, the concept of microservices is presented, which is one of the most common applications of containers. Then, other research works that make use of container-based architectures to solve multiple problems are presented.

Given the high level of virtualization needed to instantiate a virtual machine, it might be inefficient to run the continuous integration and deployment process using virtual machines. Another concept, called “microservices”, helps to overcome this inconvenience. When working with microservices, problems like resource allocation and full kernel emulation cease to exist. Microservices can be seen as simple processes that only run the needed emulated operating system components in order to make a given service work [31, p. 15–34].

9 Vagrant Orchestrate: https://github.com/Cimpress-MCP/vagrant-orchestrate

“Microservices are small, autonomous services that work together” [31, p. 15–34]

Microservices, as the name implies, are small services that interact and work together. They are usually very resilient, scale very well, are easy to deploy, help maximize organizational alignment and are an easy solution to implement when considering the replacement of a pre-existing monolithic system. They have two key characteristics: they are small, in order to focus on doing one thing well, and they are completely autonomous [31, p. 15–34].

When referring to microservices as being small, it is important to understand what small means in this context, as it depends a lot on the service that is being considered. A good downsizing of a service (into multiple microservices) happens when each microservice starts to gain some independence and becomes easier to maintain. When the management complexity of having several moving parts distributed between several microservices increases, the desired limit has been reached [31, p. 15–34].

Another key characteristic of microservices is their isolation. Typically, they can be deployed as an isolated service or as their own operating system process. They typically expose an application programming interface (API) through which they communicate with each other using network calls. A microservice must have the ability to be deployed and changed without changing anything else [31, p. 15–34].

Studies have been carried out to determine whether it is possible to leverage the microservice architecture with Docker [26]. Despite identifying some limitations, this research concluded that Docker, when combined with other tools, is able to improve efficiency. An interesting way to use containers is to implement an elastic cloud platform. DoCloud [18] attempts to create one using Docker and is composed of several sub-systems, including a load balancer, the monitor and provisioning module, and a private Docker registry. This solution proved to be quite useful when implemented in platforms whose users' needs are very variable. In particular, it scales very well in cases of daily or unexpected peaks, and it also proved to work very well in environments where resource requirements remain stable.

Another document [37] shows an application of Docker in a distributed IoT system. Several low-level hardware components with support for Docker virtualization create a complex system that is able to provide reliability, system recovery and resilience. Another example of a real application that makes use of Docker is Dockemu [44], a network emulation tool that uses Docker as the virtualization framework. The tool is able to emulate both wired and wireless networks. Although there is still room for improvement, it turned out to be efficient and accurate given the Docker virtualization architecture.

The Docker application described in [35] is particularly relevant to the context of this dissertation. It is a framework that runs unit tests in a parallel environment using isolated Docker containers. This work will be detailed further in the next sub-section.

2.2.3 Virtual machines vs. containers

Figure 2.5 represents a comparison between a common virtual machine architecture and the Docker architecture (Docker being a tool with a container-based architecture). One of the visible differences is the replacement of the virtual machines' Hypervisor with the Docker Engine. The Docker Engine is the bridge between the user interface and the containers: it receives input from the user interface and interacts with the containers accordingly. Another important difference regards the containers themselves. The guest OS present in the virtual machines is removed; the containers use the host operating system layer and are executed as simple processes.

Figure 2.5: Comparison between Virtual Machines and Docker Container based architecture [16]

The document [38] further compares virtual machines and containers. The comparison was made by running multiple benchmarks for an application running in a virtual machine and in containers. In terms of responses, the containers were able to process 306% more requests, making them more useful for handling peaks. In terms of memory, the study showed that containers are able to reduce memory utilization by 82% compared with virtual machines. Regarding provisioning, it was observed that containers are more flexible and are able to prepare and serve at higher speeds when compared with virtual machines. These speeds also reflect on the failover mechanism: by consuming less time to boot up, containers provide a more reliable and faster recovery mechanism.

Not everything is perfect for containers, since the comparison also showed that some benchmarks favoured virtual machines, as is the case of inter-virtual-machine communication. Virtual machines proved to be better when there is a need to perform several heavy tasks inside the same isolated environment.

2.3 State of the Art

As stated and discussed in previous sections, testing is essential in software engineering projects. If the testing process takes too much time, it might become a bottleneck in the development process, which is utterly undesirable. Multiple solutions are available; however, not all of them solve the same specific problems. This section reviews several solutions and discusses in which specific cases they are useful.

A possible solution comes from the work “Test case permutation to improve execution time” [43], which focuses on studying how cache misses affect test execution time. The authors realized that the execution priority affected the time it took to run the test suite. This happens because, if several different tests are run in a non-orderly fashion, the cached data for specific instructions will expire before a similar test is executed. In theory, if similar tests are run consecutively, the cached data will be reused and the overall execution time will be shorter. They propose an algorithm that studies the similarities between the unit tests and attempts to minimize execution time by grouping similar tests and leaving tests with more dependencies for last. The study concluded that reordering test execution in order to reduce cache misses does in fact matter, showing an improvement of up to 57% in execution time for local execution.

Another proposed solution comes from the paper “Optimizing Test Prioritization via Test Distribution Analysis” [6]. This project proposes a PTP (Predictive Test Prioritization) technique based on machine learning. It is mainly focused on projects that constantly use regression testing techniques. Its main approach is to reorganize test execution in order to detect failures earlier and to provide faster feedback. To evaluate and reorganize the order of the unit tests, it uses three heuristics: test coverage, testing time, and coverage per unit time. These heuristics serve only to study and evaluate each test; they help develop the prioritization order when applied with the different algorithms proposed. In terms of algorithms, they propose three different algorithms based on three types of prioritization techniques:

• Cost-unaware refers to techniques that do not balance testing time against other factors;

• Cost-aware which balances testing time and other factors;

• Cost-only refers to techniques that only balance testing time.

To evaluate the proposed scenario, over 50 GitHub open-source projects were tested in a local execution environment. The results showed that, in general, the developed model was able to successfully predict the optimal prioritization algorithm for a given project.

“Prioritizing Browser Environments for Web Application Test Execution” [21] is another proposed solution. In this case, the focus is on prioritizing the browser environments. As client-side applications get more complex and changes become more frequent, the tests also get more complex, which is where regression testing techniques are used intensively. However, these techniques are resource-intensive and do not provide fast feedback for the developers. Since the paper focuses on web applications, the testing environments are mainly browsers. As there are dozens of browsers nowadays, testing an application in all of them is very time-consuming and the faulty tests only start being reported at the end. The final goal is to make the tests in the browser environments fail earlier. The paper proposes 6 techniques and compares them with 2 more. All of them take into account the browser failure history to choose between the different approaches. The two baseline techniques considered in the comparison are “No prioritization”, where no optimization is carried out, and “Random”, a reordering technique that chooses the browser order randomly. Six proposed techniques were then compared to the baselines:

• Exact matching-based prioritizes the browsers that failed recently;

• Similarity matching considers similarities between the browsers and gives priority to browsers that share more similarities;

• Failure frequency considers the environments that overall fail more and assigns them a higher priority;

• Machine learning learns the failure pattern and attempts to predict the failure probability for each browser, assigning higher priority to the browsers with a higher predicted probability of failing;

• Exact matching + Failure frequency is a combination of two previous techniques;

• Exact matching + Machine Learning is a combination of two previous techniques.

Overall, the 6 proposed techniques worked better at failure detection for the tested web apps when compared with the random and no-prioritization techniques. However, the results depend on the project being tested.

“CUT: Automatic Unit Testing in the Cloud” [22] is an automatic unit testing tool designed to provide both virtual machine and container environments to support unit testing. The main goal is to minimize test suites and regression testing selection. The solution makes several advances: full automation transparency, which automatically distributes the execution of tests over remote resources and hides low-level mechanisms; efficient resource allocation and flexibility, providing the ability for developers to allocate resources and choose whether they want to run locally or remotely; and test dependencies, where the tool takes into account the dependencies between test cases to achieve deterministic executions. The tool is developed in Java and uses JUnit as the testing framework. The distribution environment consists of using and reusing several Docker containers and virtual machines available in the cloud. By using these distribution and virtualization mechanisms, the tool proved to reduce the execution time by half each time the number of containers was doubled, up to 8 concurrent containers. No more concurrent containers were tested.

“Test suite parallelization in open-source projects: A study on its usage and impact” [5] is a paper that studies the usage and impact of test suite parallelization. The study discovered that developers prefer high predictability over high performance and that only 15.45% of large projects use a parallel testing approach. The developers also assert that they do not use more parallelism because of concurrency issues and the extra work involved in preparing the test suite. The study was conducted on open-source Java projects with the main goal of speeding up test execution. The analysis considered four factors to evaluate the scenarios tested:

• Feasibility, which measures the potential of parallelization to reduce testing costs.

• Adoption, which evaluates how often open-source projects use parallelization and how the developers perceive the technology involved.

• Speedup, which measures the impact in terms of execution time when running the tests in parallel.

• Trade-offs, which evaluates the observed impact between the execution time and possible problems that emerged.

The testing framework used for the tests was JUnit, given that the projects were mainly Maven projects. The study considered four methods when testing the parallelization suite:

• Fully sequential, where no parallelism is involved;

• Sequential classes and parallel methods, a configuration where the test classes are run sequentially and the test methods in parallel.

• Parallel classes and sequential methods, a configuration where the test methods are run sequentially and the test classes in parallel.

• Parallel classes and parallel methods, both classes and methods of the test are run in parallel.

The study concludes that, on average, using a parallel approach optimizes the execution time by 3.53%. Around 73% of the tested projects can be executed in a distributed environment without introducing flaky tests (tests that can pass or fail under the same configuration).

Another solution is “A service framework for parallel test execution on a developer’s local development workstation” [34]. This solution proposes a service framework for parallel test execution using Virtual Machines and also Docker Containers. The framework was created to support agile development methods such as Test-Driven Development and Behaviour-Driven Development, which were already discussed in this chapter. When tackling complex and large projects, the main problem faced by developers when implementing parallelism is concurrency in the file system, resources and databases. There is a need to isolate multiple services in order to prevent race conditions and fixture dependencies.

The main goal of this framework is to provide faster feedback, test execution environments on the go, avoidance of fixture dependencies and race conditions, and instant emulation of real-world scenarios. Conversely, the main challenges faced with this framework were providing process emulation, the existence of race conditions in tests and the difficulty of emulating real-world environments. The developed architecture is composed of several modules: the parallel runner, which is tasked with kicking off multiple instances; the test job and target configuration, which provides the test list with the required configurations; and the worker template generator, tasked with generating the blueprint for the environment. One of the features required by the users is the ability to manually configure the test groups (each group runs in parallel) and the running environment. The framework was therefore designed to support projects with multiple running services, such as databases, which run in a separate environment and on their own server. The tests were executed using PHPUnit complemented with a PostgreSQL database service. Execution time dropped from 45 minutes to 15 minutes.

“JExample: Exploiting dependencies between tests to improve defect localization” [20] is a paper that proposes a solution to improve defect localization. It is an extension of JUnit that proposes several new tags. By using those tags, JExample is able to recognize test dependencies and their relationships. The framework proved to work very well and to reduce the overall time of the test suite by simply ignoring tests whose parent test failed.

Another solution is “Finding and Breaking Test Dependencies to Speed Up Test Execution” [19]. This solution attempts to speed up test execution by devising a test dependency detection technique that can suitably balance efficiency and accuracy. The ultimate goal is to find test dependencies and break them. The proposed algorithm creates a directed acyclic graph, where tests are the nodes and dependencies are the edges, and then creates clusters of tests that can be run in parallel. In order to execute the multiple independent tests, the solution first runs a topological sort algorithm to linearize them and obtain a schedule which respects the test dependencies and enables parallel test execution. The conclusions in terms of efficiency and time proved to be counter-intuitive. However, the solution showed that it is possible to run tests in parallel by breaking test dependencies, all without introducing flaky tests.

An important solution to discuss is “CloudBuild: Microsoft’s Distributed and Caching Build Service” [11]. This project was developed by Microsoft and was designed to be a new in-house continuous integration tool. CloudBuild has several main goals: to execute builds, tests, and tasks as fast as possible; to on-board as many product groups as effortlessly as possible; to integrate into existing workflows; to ensure high reliability of builds and their necessary infrastructure; to reduce costs by avoiding separate build labs per organization; to leverage Microsoft’s resources in the cloud for scale and elasticity; and to consolidate disparate engineering efforts into one common service. Regarding its design, it follows several principles:

• Commitment to compatibility: given its goal of integration with existing build pipelines, it must be compatible with other tools and SDKs;

• Minimal new constraints: the system must be designed to accommodate legacy build specifications and maintain them without imposing new constraints;

• Permissive execution environment: CloudBuild ensures that the build results are deterministic.

It supports a wide range of testing frameworks and languages, such as VSTest, nUnit and xUnit. In terms of architecture, it is built on top of Microsoft’s Autopilot. Several worker machines are then created and managed; these workers are responsible for the builds. In order to detect build dependencies, it extracts the dependency graph and plans the build distribution according to that graph. Across builds, it uses build caches: if a task has the same input as a previously executed task, the cached result for that build is reused when available. The results of the tested scenarios reported that, with CloudBuild, execution speed can improve by a factor of 1.3 to 10 over the original.

Overall, Microsoft CloudBuild is not a solution for tests. It is a solution to improve the overall continuous integration execution time, with special focus on the build phase. For the build, it uses several approaches: parallelization, with dependency graph extraction, and build caches. These approaches help to reduce the overall continuous integration pipeline execution time.

“Bazel” [2] is another solution that aims to improve continuous integration execution time. Bazel is open-source and developed by Google. It uses high-level build languages and supports multiple languages and output formats. Bazel offers several advantages:

• High-level build language, by using an abstract and human-readable language to describe the build, it provides flexibility and intuition for the users.

• Fast and reliable, it caches previously completed work and tracks changes; this way it only rebuilds what is necessary.

• Multi-platform, running on all of the most common Operating Systems

• Scalable, it works well in complex projects.

• Extensible, it supports a wide range of languages.

Bazel is a tool to optimize continuous integration time, with a special focus on the build phase. It is also possible to write tests using Bazel with a parallel approach. No results were found comparing build execution times before and after adopting Bazel. Another solution is “SeleniumGrid” [40]. This solution runs several end-to-end tests over a network grid in a parallelized environment. No results were found comparing execution times with and without SeleniumGrid.

2.3.1 Discussion

Figure 2.6 is a table comparing the different existing tools. This comparison will serve as a guide and as comparison data when introducing the objectives and goals of this dissertation.

Figure 2.6: Comparison of the different existing solutions for test execution optimization

The evaluation was made using four topics. The first (Type) checks whether the tool was developed in a scientific or an industrial context. The second topic (Approaches) relates to the approaches that the solutions use to improve test execution time. The third topic (Types of Test) evaluates the solutions in terms of the types of tests supported; for the purpose of this dissertation, Service tests and End-to-End tests reside in the same evaluation space, since they can be considered the same given their similarities in terms of structure and execution process. Finally, the Execution Environment topic defines whether the solution is designed to support local execution in the developer’s workspace, cloud execution, or both.

The first conclusion that can be drawn is that there is a lack of tools in an industrial context that attempt to optimize end-to-end tests. The majority were made only for academic purposes, which creates an opportunity to explore the referred problem in a more technical and industrial context. Secondly, the parallelization approach is the most common one, followed by prioritization techniques. The setup caching technique is only used by Microsoft CloudBuild and Bazel, but unfortunately only for the build process. In terms of types of tests, Unit tests are preferred, leaving Service tests and End-to-End tests with a lack of support for improvement. Regarding the execution place, local execution is widely favoured, with cloud execution falling short. Another important conclusion is that the tools that support the execution of Service and End-to-End tests are mostly associated with the cloud execution environment. This might be due to the large amount of resources and time that Service and End-to-End tests take to execute.

Chapter 3

Approach

The state of the art analysis revealed that the current solutions for test optimization widely support Unit tests and local execution. However, support for non-Unit tests is scarce, and the tools that do focus on optimizing non-Unit tests are mostly associated with cloud execution. When analyzing the supported approaches, it is possible to conclude that Parallelization and Prioritization are the preferred alternatives, leaving out Setup caching and focusing more on build optimization through caching of build stages and products. With this in mind, there is an opportunity to explore local non-Unit test optimization using setup caching and parallelization approaches. This chapter describes in more detail the application of the Setup caching and Parallelization approaches in terms of implementation.

3.1 Setup caching

Some build optimization tools already use the concept of “Setup Caching”, an approach that consists in saving and reusing versions of builds. These cached builds do not necessarily correspond to release versions; a build might be composed of multiple smaller builds corresponding to independent modules or external dependencies. CloudBuild [11] and Bazel [2] use this concept when creating new software versions: instead of always recompiling the different modules and dependencies on every build, they save them and only recompile them when absolutely necessary. The setup caching approach that will be discussed in this section, and implemented in the test phase, is slightly different. A test can be divided into 4 phases [27]: Setup, Exercise, Verify and Teardown. Assuming that the Setup phase can be very time-consuming in non-Unit tests, and assuming that some tests share equivalent setup phases, saving that setup state (before running the test) will allow the other tests to simply load it and run more quickly. In short: save the setup state of each test so that other tests can load it and take less time to set up.
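As an illustration only, the sketch below shows how these four phases typically map onto a MochaJS test. The chai assertion library and the db helper module are assumptions introduced for the example and are not part of Dendro or docker-mocha.

const { expect } = require("chai");      // assumption: chai is used for assertions
const db = require("./helpers/db");      // hypothetical helper wrapping the services under test

describe("currency conversion", function () {
    // Setup phase: put the system in the state the test requires
    before(async function () {
        await db.clear();
        await db.createUser("alice");
    });

    // Exercise and Verify phases: run the action and check the outcome
    it("converts dollars to euros", async function () {
        const result = await db.convert("alice", 10, "USD", "EUR");
        expect(result.currency).to.equal("EUR");
    });

    // Teardown phase: leave no shared state behind
    after(async function () {
        await db.clear();
    });
});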


Test dependencies

In order to represent and discuss the next algorithms and figures, it is established that Test cases are represented by ti and Setup states are represented by si.

To represent the 4 phases of a test, the following convention is established: for setup, Setup(si); for exercise, Exercise(ti); for verification, Verify(); for teardown, Teardown(). In normal executions, if a test needs to be executed, then a setup must be executed first. The problems begin when the tests depend on a setup dependency tree. This means that some tests require the system being tested to be in a certain setup state before they can be executed. These setups are responsible for creating the correct system state in which the tests are supposed to be exercised and correctly validated. Figure 3.1 displays an example of a Setup and Test dependency tree. It is possible to observe that t1 depends on the execution of s1, and the same happens for the remaining tests as well: t2 depends on s2; t3 depends on s3; and t4 and t5 depend on s4. It becomes clear that the issues are not limited to the tests, as setup states also depend on other setups: for example, s4 depends on s2, which subsequently depends on s1.

Figure 3.1: Example of test dependency tree
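To make the tree concrete, the following is a minimal sketch of how the dependencies in figure 3.1 could be encoded as parent references. The object layout is an illustration only, not the format docker-mocha actually requires (that format is described in Chapter 4).

// Each setup state points to its parent (null for the root),
// and each test points to the setup state it requires.
const setups = {
    s1: { parent: null },
    s2: { parent: "s1" },
    s3: { parent: "s2" },
    s4: { parent: "s2" }
};

const tests = {
    t1: { state: "s1" },
    t2: { state: "s2" },
    t3: { state: "s3" },
    t4: { state: "s4" },
    t5: { state: "s4" }
};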

In a normal execution scenario, with no state caching, running every test requires the setup states to be re-executed multiple times. Algorithm 1 represents the steps that would be necessary to execute the tests in the dependency tree of figure 3.1 if no setup caching were implemented. For simplification purposes, only the functions directly associated with the 4 phases of test execution (Setup, Exercise, Verify, Teardown) are used. The algorithm represents an execution where no setup caching is present; by allowing the re-execution of multiple setups, it repeats this step multiple times. For the five tests, s1 is executed 5 times, s2 is executed 4 times and s4 is executed 2 times. Only s3 is executed once. In many cases, the Setup phase can take several minutes to execute, delaying the exercise of the test. If exercising a test requires executing several expensive setup phases, the whole test suite can be delayed to the point of not providing feedback to the developers within a reasonable time.

Algorithm 1: Case 0: Normal execution (no-cache)

Function ExecuteTestSuite():
    # Executing test 1
    Setup(s1)
    Exercise(t1)
    Verify()
    Teardown()
    # Executing test 2
    Setup(s1)
    Setup(s2)
    Exercise(t2)
    Verify()
    Teardown()
    # Executing test 3
    Setup(s1)
    Setup(s2)
    Setup(s3)
    Exercise(t3)
    Verify()
    Teardown()
    # Executing test 4
    Setup(s1)
    Setup(s2)
    Setup(s4)
    Exercise(t4)
    Verify()
    Teardown()
    # Executing test 5
    Setup(s1)
    Setup(s2)
    Setup(s4)
    Exercise(t5)
    Verify()
    Teardown()

What is proposed with the setup caching approach is the execution of each unique setup only once. This can be accomplished with the assistance of Store and Load operations, typically associated with disk or memory. Algorithm 2 represents what should be expected after the implementation of the setup caching approach. In this case, several setup executions were suppressed. To execute a test, it is now only required to execute one setup (when the setup is not cached) or none at all (when the setup already exists and is simply loaded). This is only possible due to the introduction of two new functions, Load and Save, which load and save setup states, respectively.

It is possible to see that, by using a caching approach, multiple setup executions were suppressed. In the case of t5, no setup was executed at all, given that the setup on which it depends (s4) was already executed and cached in the previous execution of t4.

Algorithm 3 was developed in order to support the implementation of setup caching. This algorithm allows the creation and loading of setup states during normal test execution. The algorithm is written in pseudo-code and assumes a series of simplifications in order to be more perceptible.

Before explaining the algorithms further, the primitive Setup must be introduced. This data structure represents a setup state (also represented by a circle in the dependency tree) and contains two internal variables: parent and state. The parent variable is a reference to the immediate parent in the dependency tree; only one parent or null is allowed. If it is null, that setup is currently the root and thus has no parent. The state variable represents the data associated with the caching of the setup; if this variable is null, the setup is not cached.

Algorithm 3 is responsible for executing a test. The first step is to load the setup state and verify whether it exists. If it does, it is restored, followed by the exercise of the test, verification and teardown. If the state does not exist, the latest ancestor state is loaded (detailed in Algorithm 4) and the intermediate states are created (Algorithm 5). Only then is the state restored and the Exercise, Verify and Teardown phases executed.

Algorithm 4 is responsible for getting the latest cached state that belongs to the hierarchy needed by a given test. It iterates over the dependency tree and verifies whether a cached setup exists; when one is found, it is returned. If it reaches the parent of the root, it returns null. This happens when there is no saved state and all of them have to be built from the root.

Algorithm 5 is responsible for building all the required states, from the latest in cache (which can be null) up to the setup required by the current test. It first retrieves all the intermediate setups between the last cached one and the one required for the test, using the auxiliary Algorithm 6. After retrieving the list with all the intermediate setups, the list is iterated and each setup is executed and saved; it is the function setup() that is responsible for executing and saving the state. Finally, the last setup (the one required by the current test) is returned.

Algorithm 6 is responsible for creating and returning the list with the intermediate setups between two setups. It pushes setups to the list while iterating over the dependency tree; when it reaches the root or the given setup, it breaks and returns.

Algorithm 2: Case 1: Execution with the implementation of setup caching

Function ExecuteTestSuite():
    # Executing test 1
    Setup(s1)
    Save(s1)
    Exercise(t1)
    Verify()
    Teardown()
    # Executing test 2
    Load(s1)
    Setup(s2)
    Save(s2)
    Exercise(t2)
    Verify()
    Teardown()
    # Executing test 3
    Load(s2)
    Setup(s3)
    Save(s3)
    Exercise(t3)
    Verify()
    Teardown()
    # Executing test 4
    Load(s2)
    Setup(s4)
    Save(s4)
    Exercise(t4)
    Verify()
    Teardown()
    # Executing test 5
    Load(s4)
    Exercise(t5)
    Verify()
    Teardown()

Algorithm 3: New Execute Test Function with Setup Caching

Function Execute(test):
    # try to get the test setup state
    setup ← getSetup(test)
    # if the setup state does not exist, create it using the latest one
    if setup.state() is null then
        lastSetup ← getLastCachedState(setup)
        setup ← buildStates(lastSetup, setup)
    # put the system in the correct state
    restoreState(setup)
    Exercise(test)
    Verify(test)
    Teardown()

Algorithm 4: Get Last Cached State

Function getLastCachedState(setup):
    if setup is null then
        return setup
    while setup.state() is null do
        setup ← setup.parent()
        if setup is null then
            break
    end
    return setup

Algorithm 5: Build States Function

Function buildStates(lastSetup, currentSetup):
    intermediateSetups ← getIntermediateSetups(lastSetup, currentSetup)
    forall setup in intermediateSetups do
        setup.setup()
    end
    return currentSetup

Algorithm 6: Get Intermediate Setups

Function getIntermediateSetups(lastSetup, setup):
    intermediateSetups ← []
    if setup is null then
        return intermediateSetups
    while setup is not lastSetup do
        intermediateSetups.push(setup)
        setup ← setup.parent()
        if setup is null then
            break
    end
    return intermediateSetups
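For concreteness, the following is a minimal JavaScript sketch of the same traversal logic. The SetupNode layout and the runSetup callback are assumptions made for the example; docker-mocha's actual implementation may differ.

// A setup node mirrors the primitive described above: a parent reference
// (null for the root) and a cached state (null while not cached).
class SetupNode {
    constructor(parent = null) {
        this.parent = parent;
        this.state = null;
    }
}

// Algorithm 4: walk up the tree until a cached state (or the root's parent) is found.
function getLastCachedState(setup) {
    while (setup !== null && setup.state === null) {
        setup = setup.parent;
    }
    return setup;
}

// Algorithm 6: collect the setups between the last cached one and the required one.
function getIntermediateSetups(lastSetup, setup) {
    const intermediateSetups = [];
    while (setup !== null && setup !== lastSetup) {
        intermediateSetups.push(setup);
        setup = setup.parent;
    }
    return intermediateSetups;
}

// Algorithm 5: execute and cache every missing setup, ancestors first.
function buildStates(lastSetup, currentSetup, runSetup) {
    const intermediateSetups = getIntermediateSetups(lastSetup, currentSetup);
    while (intermediateSetups.length > 0) {
        const setup = intermediateSetups.pop(); // pop() yields ancestors before descendants
        setup.state = runSetup(setup);          // hypothetical: executes the setup and returns its cached state
    }
    return currentSetup;
}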

3.1.1 Test execution behaviour with setup caching with 3 distinct tests

This section presents the expected behaviour of executing 3 distinct tests with the algorithms presented above, which support the setup caching approach. The tests are executed sequentially and in the same shared environment, and they follow the dependency tree of figure 3.1. The tests to be studied are t1, t4 and t2, in that order.

Executing Test 1

The execution environment before executing t1 is presented in figure 3.2. Since this is the beginning of the execution, there are no setup states cached, which is represented by the dotted circles around the setups.

Figure 3.2: Environment State before t1

The log regarding the execution of t1 is presented in algorithm 7. The variable snull appears in multiple occurrences; this variable represents the parent of the root setup state, so, in this case, it always has a null value. By analyzing the log, it is possible to conclude that this is a simple execution. The algorithm executes t1, which needs the setup s1. That setup is not cached, so it will be built.

Algorithm 7: Test 1 Execution

Execute(t1)
    s1 ← getSetup(t1)
    s1.state() is null
    snull ← getLastCachedState(s1)
        snull ← s1.parent()
        snull is null
        return snull
    s1 ← buildStates(snull, s1)
        intermediateSetups ← getIntermediateSetups(snull, s1)
            s1 is not snull
            intermediateSetups.push(s1)
            snull ← s1.parent()
            snull is null
            return intermediateSetups
        intermediateSetups.pop()
        s1.setup()
        return s1
    restoreState(s1)
    Exercise(t1)
    Verify(t1)
    Teardown()

Given that s1 is the first setup in the dependency tree, it will be the only one executed and saved. The system is restored based on that setup and t1 can finally be exercised, verified and torn down.

The resulting environment after the execution of t1 is represented in figure 3.3.

Executing Test 4

The execution environment before executing t4 is represented in figure 3.3. Only the setup s1 is cached following the previous execution of t1.

Figure 3.3: Environment State before t4

The log regarding the execution of t4 is presented in algorithm 8. By analyzing the log, it is possible to conclude that this was a more complex execution. In this case, the algorithm should execute t4, which needs the setup s4. That setup is not cached and needs to be built. However, the creation of s4 depends on the existence of s2, which depends on the existence of s1. The algorithm successfully identifies s1 as a state that already exists and focuses on creating only s2 and s4. After executing both setups, the system is restored based on s4 and t4 can finally be exercised, verified and torn down.

The resulting environment after the execution of t4 is represented in figure 3.4.

Executing Test 2

The execution environment before executing t2 is represented in figure 3.4. Three states are cached following the execution of t1 and t4: s1, s2 and s4.

Figure 3.4: Environment State before t2

Algorithm 8: Test 4 Execution

Execute(t4)
    s4 ← getSetup(t4)
    s4.state() is null
    s1 ← getLastCachedState(s4)
        s2 ← s4.parent()
        s2.state() is null
        s1 ← s2.parent()
        s1.state() is not null
        return s1
    s4 ← buildStates(s1, s4)
        intermediateSetups ← getIntermediateSetups(s1, s4)
            s4 is not s1
            intermediateSetups.push(s4)
            s2 ← s4.parent()
            s2 is not s1
            intermediateSetups.push(s2)
            s1 ← s2.parent()
            s1 is s1
            return intermediateSetups
        intermediateSetups.pop()
        s2.setup()
        intermediateSetups.pop()
        s4.setup()
        return s4
    restoreState(s4)
    Exercise(t4)
    Verify(t4)
    Teardown()

The log regarding the execution of t2 is presented in algorithm 9.

Algorithm 9: Test 2 Execution

Execute(t2)
    s2 ← getSetup(t2)
    s2.state() is not null
    restoreState(s2)
    Exercise(t2)
    Verify(t2)
    Teardown()

This is the simplest case possible and demonstrates the advantages of having setup caching implemented. When executing t2, the setup needed (s2) already exists, so the system is restored based on that state and the test is exercised, verified and torn down.

If, at this point, the dependency tree had multiple tests depending on s1, s2 or s4, this would be the expected behaviour: a simple restore of the setup followed by the execution of the test.

Setup caching load estimation

By applying to individual tests the caching approach that some tools use to optimize build execution time, the test phase execution time can be considerably reduced. Simply put, if a test saves the state of all its dependent services after its Setup stage finishes, the following tests that require that state only need to load it and proceed to the Exercise phase. For the next comparison, it is assumed that all the tests depend on the same state. Figure 3.5 represents the estimated CPU load of end-to-end execution without state caching. In this scenario, each test takes approximately the same amount of time to execute, given the necessity to run the setup from scratch at the beginning of each test. In Figure 3.6, the same tests are executed; however, after the first test, the estimated execution time of each test is much smaller. This is due to the fact that, after the first test, the state was saved, so every following test loads that state at the beginning and then runs. The usage peaks represent the initialization of the services that will execute the test and the low usage represents the execution of all the test phases.

3.2 Test parallelization

Given the multi-core architecture of most current CPUs and the increasing bandwidth of storage solutions such as SSDs, it becomes possible to execute multiple processes in parallel, even when intense I/O requirements are present. As such, parallelization approaches are widely used across different test and build tools, supporting both Unit and non-Unit tests. The concept of test parallelization is to run more than one test at the same time to take advantage of all the existing CPU cores and I/O capacity. If the test phase is segmented into several processes, more CPU resources will be used and it will finish sooner.

Figure 3.5: Estimated end-to-end tests without state caching

Figure 3.6: Estimated end-to-end tests with state caching

The importance of parallelization is prevalent in microservice architectures, which have the interesting benefit of scaling in a more elastic manner than their centralized counterparts. After starting the test phase, the platform environment needs to be set up. This means the start-up of multiple services, which will run on the application layer of the Operating System where the test phase was deployed. If the test phase is separated into multiple parallel test processes (each parallel process has a different and unrepeatable set of tests), the tests can suffer from concurrent modifications and race conditions if they, for example, all try to modify the same database running on one of these services. Such concurrent modifications happen when multiple entities want to interact with the same services at the same time, failing tests that would otherwise pass if run sequentially.

In order to prevent the problems associated with parallelization, the environments need to be isolated, creating an isolated environment in which multiple services are set up and can interact with each other without affecting the external system. If these isolated environments are replicated, it is possible to execute multiple sets of tests in parallel without the associated problems of parallelism. The tests will return the expected outcome and the resulting test suite will be concluded faster.

Figure 3.7 represents an example of the current test environment. The multiple system services, and the system itself, are set up in the application layer of the Operating System. Only one instance is able to run at any given time because, with parallelization in place, concurrency problems would occur: if multiple tests were running against that environment, then multiple modifications would take place and the tests would start to fail.

Figure 3.7: Environment before parallel approach

Figure 3.8 represents a possible example of the test environment when implemented with parallelization. The execution environment is replicated into multiple isolated environments; if a test is executed in one of these environments, no concurrency problems occur. Both environments are based on the dependency tree of figure 3.1.

Figure 3.8: Environment after parallel approach

This parallel approach is to be implemented together with the setup caching techniques already discussed in the previous section, to take advantage of under-utilized CPU resources, as shown in figure 3.6. By running multiple isolated environments and parallel tests, the expected CPU usage should be as represented in figure 3.9.

Figure 3.9: Expected CPU usage after parallelization

3.3 Integration tool

The plan for this dissertation is to also create an integration tool that automatically creates isolated environments on demand and splits the original test suite into multiple test processes. This will be a manager service, able to: create and initialize isolated environments; set up the application environments being tested; order test execution in the isolated environments; save setup states; and load cached states.

Chapter 4

Implementation

This chapter presents some details on the implementation of the project, including issues and challenges. First, the goal of the solution is presented, then the chosen tools and frameworks, and finally the developed solution is described.

4.1 Goal, tools and frameworks

In an initial phase it was decided to support NodeJS [45] applications, due to the fact that the test-bed application, Dendro, is powered by NodeJS. Moreover, there is great interest in this technology, which has strong support from a large community of developers. NodeJS is a Javascript run-time that uses an event-driven, non-blocking I/O model. It proves to be a good solution for data-intensive and real-time applications spread across distributed devices. It is open-source, cross-platform, and designed for developing server-side and networking applications. It is also important to introduce NPM [33] (Node Package Manager), which is a package manager for NodeJS. NPM contains over 800,000 packages, created by the community and shared with everyone for free. One of the goals of this dissertation is also the creation of an NPM package with the developed solution in order to share it with the community. By doing this, more developers will be able to easily integrate the solution into their applications.

For the test framework, the choice was to support MochaJS [30]. MochaJS is a Javascript test framework created for NodeJS, but it runs both in NodeJS applications and in browsers. Mocha supports Unit, Service and End-to-end tests and produces very detailed reports, including exception back-logs and individual test execution times. Mocha is the test framework used by the test-bed application used to validate this dissertation.

For the infrastructure platform, the requirements covered several topics: Isolation, Networking, State Caching, Parallelization, Speed and Lightweight. In terms of Isolation, the infrastructure needs to set up replicated environments, which do not depend on the Operating System, hardware configuration, or other environment variables.


The need for isolation is important because the tests must not fail because of changes in these environmental variables.

Networking is also important, as it is directly associated with isolation. The main modules that will be set up to create the test environment are services on which the tested application relies. These services are necessarily associated with one or more network ports. By default, in one Operating System, there is only one network interface associated with user services. This networking interface has a list of available ports to serve, and each of them can only be allocated to one service. If multiple services are replicated, those clones cannot listen on the same port. A naive solution would be to manage the free ports and assign them to different replicas of the same service. This is a bad practice, as it would be very hard to manage: allocating random ports on demand for services would mean that every service needs to know the new ports to communicate with, instead of the default ones. Since an environment can have multiple services and each service can listen on multiple ports, managing them all can be very challenging. Other issues are related to the direct association of these ports with services: it is much easier for the user to debug when they know which port belongs to which service. Finally, an obvious problem is the actual limit on the number of available ports. A better solution should involve the creation of sub-networks, virtualized in the host. These networks have to provide the basic infrastructure necessities to set up a test environment while ensuring network isolation between environments.

The next topic, State Caching, is also important. This feature allows the persistence of a given environment state into the host machine storage with an associated label. Later, that state can be quickly restored during the test execution. The state saving and restoring mechanism needs to be quick, because this caching feature aims to replace the time-consuming and repetitive setup steps in the test execution.

For Parallelization, the requirements include the ability to launch multiple environment replicas at the same time, to take advantage of all the hardware capabilities available. This feature needs to be well supported and implemented given that Parallelization will also be one of the core features of the solution.

Regarding Speed, this whole project is about speeding up test execution using different approaches. The chosen infrastructure platform should not impose speed constraints and should be close to the performance that would be observed when running on real hardware instead of a virtualized environment.

Finally, regarding Lightweight, the infrastructure cannot tax the machine to the point of compromising results by consuming a considerable amount of resources. It needs to be comparable to the small services and environments executed on the host machine.

These requirements all converge in Docker, which provides a container-based architecture for virtualization. Docker provides Isolation, by being a platform dedicated entirely to virtualization, with orchestration and support for environment variables. It provides Networking through dedicated features for network infrastructure virtualization. It enables State Caching by being able to save images (the complete state of a container) based on container names and to load containers based on tagged images. It provides Parallelization, since each container runs as a process in the host machine. It provides Speed, as containers are fast and are treated as separate processes in the host machine; furthermore, in Linux-based distributions, Docker uses the native Linux kernel to execute containers, providing additional performance [31]. Unlike Virtual Machines, containers do not need to emulate and replicate entire sections of systems, allowing the container-based architecture to perform at levels close to those of processes running directly on the host. Finally, it is a Lightweight approach: containers are, by nature, Operating System processes and only virtualize the core requirements needed to work, so physical resource penalties are usually low, making it possible to have multiple containers running at the same time.

NodeJS, MochaJS and Docker are the 3 main frameworks and applications that support the implementation of the solution. The next section details how the solution was designed and developed.

4.2 Docker-mocha

Docker-Mocha is an NPM package created in the context of this dissertation and published on GitHub1. Docker-Mocha is designed to optimize a particular set of test environments: environments with a considerable amount of service/end-to-end tests and with setup dependencies. In a normal execution, these test suites would run in the host machine, in a single process and with multiple re-creations of previous test setup states. The resulting unoptimized pipeline might yield very long execution times. Docker-Mocha solves this problem by using isolated Docker environments, each running on their own network, to run each test case. With this isolation, it is possible to run the whole suite in parallel. Additionally, every setup state is saved (cached) so that the successor tests can load it and execute more quickly. The requirements for installing Docker-Mocha in any NodeJS application are NodeJS itself, Docker Community Edition and NPM. If these requirements are met, the user can install Docker-Mocha globally, using:

npm install -g @feup-infolab/docker-mocha

and install it in the project being tested with:

npm install @feup-infolab/docker-mocha

1 Link: https://github.com/feup-infolab/docker-mocha.git

Installing both the global version and the local version is the recommended solution. To install it in the project scope only, one should include docker-mocha in the development dependencies stated in the package.json file. The global option is useful because the users can directly invoke the docker-mocha command instead of executing it via project NPM scripts.

4.2.1 Tests and setups file

After the installation, the user must create a JSON file listing all the setups and tests. Each setup and test must have a specific set of parameters as well as a unique identifier string. Tables 4.1 and 4.2 show the additional parameters for each setup and test, respectively. Snippet 1 displays an example of a JSON file listing all the setups and tests with the correct dependencies and parameters.

Table 4.1: Additional parameters for each setup

Parameter     Description
depends_on    Identifier of the parent state it depends on. If root, leave it null
path          The relative path of the setup file, from the project root. It can be null

Table 4.2: Additional parameters for each test

Parameter     Description
state         Identifier of the state it depends on. It cannot be null
path          The relative path of the test file, from the project root. It cannot be null

1 "states": 2 { 3 "init":{"depends_on": null, "path": null}, 4 "setDollar":{"depends_on": "init", "path": "setup/init.js"}, 5 "setPound":{"depends_on": "init", "path": "setup/init.js"}, 6 "testDollar":{"depends_on": "setDollar", "path": "setup/dollar/setDollar.js"}, 7 "testPound":{"depends_on": "setPound", "path": "setup/pound/setPound.js"} 8 }, 9 "tests": 10 { 11 "init":{"state": "init", "path": "test/init.js"}, 12 "setDollar":{"state": "setDollar", "path": "test/dollar/setDollar.js"}, 13 "setPound":{"state": "setPound", "path": "test/pound/setPound.js"}, 14 "testDollar":{"state": "testDollar", "path": "test/dollar/testDollar.js"}, 15 "testPound":{"state": "testPound", "path": "test/pound/testPound.js"} 16 }

Snippet 1: Example of test.json

4.2.2 Compose file

Another requirement is the creation of a docker-compose YAML file (e.g. docker-compose.yml) describing the environment of every service in the application (version 3.5 of the docker-compose file format is mandatory). If there is already a compose file for the platform, it is recommended to create a copy only for docker-mocha. After creating or copying the file, the compose file must obey a set of rules and alterations:

1. The container_name property for each service must exist. Also, the environment variable ${ENVIRONMENT}, followed by a dot, should be attached at the beginning of the value string. This is important because docker-mocha uses this environment variable and the container name to identify containers.

2. The image property for each service must exist, as well as its tag. Users are free to use whichever tag they want; if they do not need a specific image tag, they must use the latest tag. The environment variable ${STATE} must be attached at the end of the tag. Docker-mocha uses the image and the ${STATE} environment variable to identify different setup states of the same service.

3. The users must attach a network at the end of the compose file. That network must have the name ${ENVIRONMENT} and the driver must be bridge. This is a Docker-specific constraint, since it allows Docker to create isolated networks for these services.

4. (optional) The users are welcome to use the build property with Dockerfiles for custom projects in docker containers, as long as they follow rule number 2.

5. (optional) If the users require the containers to communicate with each other, then common localhost: addresses will not work. For each service in the docker-compose file, they must add an alias for the network created in rule number 3. Additionally, the configuration of each service might need to be changed from the default localhost to the new hostname specified in the alias.

Snippet 2 displays an example of a docker-compose.yml file correctly changed for docker-mocha.

4.2.3 Execution and options

After correctly configuring the files, users can invoke the docker-mocha command in the project root to run the tests. The additional options and flags are listed in tables A.3 and 4.3, respectively. Snippet 3 displays a small example of how to execute docker-mocha. In this case, the file tests-file.json is the tests and setups file; the run will use a maximum of 2 parallel instances; the base compose file is specified as docker-compose.yml; the entry-point service is dendro and the default service port is 8080. This call also specifies the flag --no-delete to prevent the initial deletion of previously cached states.

version: '3.5'

services:
  dendro:
    container_name: ${ENVIRONMENT}.dendro
    image: nuno/node-currency:latest${STATE}
    build:
      context: .
      dockerfile: dockerfiles/Dockerfile
    networks:
      custom_net:
        aliases:
          - dendro
  mongo:
    container_name: ${ENVIRONMENT}.mongo
    image: mongo:3${STATE}
    networks:
      custom_net:
        aliases:
          - mongo

networks:
  custom_net:
    name: ${ENVIRONMENT}
    driver: bridge

Snippet 2: Example of docker-compose.yml for docker-mocha

Table 4.3: Additional flags for docker-mocha

Flag              Description
--no-checkpoint   Does not use existing states. Always creates services with the base image and runs all the setups until it reaches the one needed by the current test
--no-delete       Prevents deletion of existing cached states at startup
--no-docker       Normal Mocha execution without setup-caching and parallelization. Useful for debugging tests

docker-mocha -f tests-file.json -t 2 -c docker-compose.yml -e dendro -p 8080 --no-delete

Snippet 3: Running docker-mocha with custom parameters

4.3 Architecture

The UML diagram of the architecture of docker-mocha is represented in figure 4.1. It is divided into 3 Packages and 1 class. The Runner depends on 2 modules (Manager and NoDocker) and 1 class (DockerMocha).

Figure 4.1: Docker-Mocha Architecture

4.3.1 Runner

The Runner module is the main entry-point of docker-mocha and requires all of the other modules. Its first task is to parse and validate the input information in order to properly execute the user's intent (mainly options and flags). It instantiates a new object of the class DockerMocha, which centralizes information and passes it around between all modules and sub-modules. The second task of the Runner is to understand which execution mode it is currently operating in. It can be one of 3:

• Manager Mode

• Setup Mode

• Test Mode

Each mode is triggered under different environment conditions. They are all launched through command-line arguments, but with different arguments. Table 4.5 shows the execution environment of each mode and an example of how to invoke it.

Table 4.5: Environment and Example for the modes

Mode       Environment              Example
Manager    Host Machine             docker-mocha -f tests-file.json -t 2 -c docker-compose.yml -e dendro -p 8080 --no-delete
Setup      Entry-point container    docker-mocha --setupFile setup/createUsers.js --config='test'
Test       Entry-point container    docker-mocha --setupFile setup/createUsers.js --testFile test/checkIfUseExists.js --config='test'

The entry-point container is the container inside a compose environment (a group of mutually communicating containers) that provides interaction with docker-mocha. This entry-point container is usually the application being tested, adapted to run in Docker containers; it is where the tests and the setups are executed. The entry-point container is first identified as soon as docker-mocha is executed on the host machine (Manager Mode), through the argument -e.

The Test Mode and Setup Mode exist because, in order to create setups and run tests inside the docker-compose environments, the docker-mocha command must be invoked there as well. If the command is invoked inside these virtual environments, then it can either create a setup state or execute a test. The mode in which the Runner executes is determined by the values of --testFile and --setupFile.

If both values are absent, the Runner understands it is running in Manager Mode and will be responsible for executing and managing all the tasks of the test suite, in particular launching new container environments for each test. This mode is executed directly on the host and, when the Runner executes in it, it will interact with and manage Docker. It will also coordinate the setup creation and test execution, by calling the docker-mocha instance inside each container with the --testFile and/or --setupFile flags set.

If only the value for --setupFile is present, docker-mocha understands it is running in Setup Mode and will invoke the proper behaviour to create the setup state. The value passed is the path to the setup file which will be executed in order to create the desired state.

If both --testFile and --setupFile are present, docker-mocha understands it is running in Test Mode and will invoke the proper behaviour to execute the test. The reason why both values are required is that there is an initialization step that always runs before the test, and it is specified in the setup file.
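As a simplified illustration of this mode selection, a sketch follows; the actual argument handling in docker-mocha may differ.

// Decide the Runner mode from the presence of the --setupFile and --testFile values.
function resolveMode(args) {
    if (args.setupFile && args.testFile) {
        return "test";      // run the setup's initialization, then execute the test
    }
    if (args.setupFile) {
        return "setup";     // create and save a setup state
    }
    return "manager";       // orchestrate containers and jobs on the host machine
}

// e.g. resolveMode({ setupFile: "setup/createUsers.js" }) returns "setup"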

The third task is only executed if the Runner is in Manager Mode. This task comprises creating and managing the task queue.

The queue can only accept 2 types of jobs: Setup State Creation and Test Execution. It starts by identifying the root setup (the one that has no parent) and pushing it to the queue to be created; it is always the first job to be executed. Both types of jobs have event listeners associated. The event listener for the Setup Creation jobs, at the end of the job and if successful, will fetch every direct Setup Creation and Test Execution child job. It will push to the queue the dependent Setup Creation jobs first and only then the Test Execution jobs. The event listener for the Test Execution jobs, at the end of the job, will report eventual failures and related statistics. Once the Runner queue runs out of jobs, it will calculate the results, export them to a CSV file (each with an associated timestamp) and exit properly. With this progressively unlocking architecture, every task is eventually executed. However, given the hierarchical structure, if some Setup jobs hang indefinitely due to reasons outside the docker-mocha context, then some parts of the dependency tree will be left unexplored. For jobs that hang indefinitely, a timeout of 10 minutes is applied; after that, if no more jobs are on the queue, docker-mocha will exit properly and will report the failed jobs and how many it was able to execute.
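The sketch below captures this progressively unlocking queue in simplified, sequential form. The childrenOf, testsOf, runSetupJob and runTestJob callbacks are assumptions for the example; the real Runner additionally dispatches jobs to parallel container environments and applies the 10-minute timeout.

// Jobs unlock their children only after the parent setup state has been cached successfully.
async function runSuite(rootSetup, childrenOf, testsOf, runSetupJob, runTestJob) {
    const queue = [{ kind: "setup", setup: rootSetup }];
    while (queue.length > 0) {
        const job = queue.shift();
        if (job.kind === "setup") {
            const succeeded = await runSetupJob(job.setup);   // creates and caches the state
            if (!succeeded) {
                continue;                                     // children of a failed setup stay unexplored
            }
            for (const child of childrenOf(job.setup)) {
                queue.push({ kind: "setup", setup: child });  // dependent setup jobs first
            }
            for (const test of testsOf(job.setup)) {
                queue.push({ kind: "test", test: test });     // then the tests of this state
            }
        } else {
            await runTestJob(job.test);                       // reports failures and statistics
        }
    }
}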

4.3.2 Manager

The Manager module is responsible for interacting with Docker. There is no API for Docker, so the only solution is to directly call the docker command. The child_process module in NodeJS is used to spawn Docker child processes as needed; it is also able to interact with a terminal console to log what is happening inside each container. The Manager exposes several functions, and the execution flow is controlled by the NodeJS module async, which assists in running the series of services that create each environment. The functions are listed in table A.1. These functions are organized in tiers: the higher-tier functions are the most complex, make use of multiple lower-tier ones, and are the main functions executed by the Runner module.
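A hedged sketch of this pattern follows. The exact commands, image tags and container names are assumptions based on the compose file in Snippet 2; they are not necessarily the commands docker-mocha issues.

const { exec } = require("child_process");

// Run a shell command and capture its output; this is how the docker CLI is driven.
function run(command) {
    return new Promise((resolve, reject) => {
        exec(command, (error, stdout, stderr) => {
            if (error) {
                return reject(new Error(stderr || error.message));
            }
            resolve(stdout);
        });
    });
}

// Example: start an isolated environment and snapshot one of its containers as a state image.
async function snapshotExample() {
    await run("ENVIRONMENT=env1 STATE= docker-compose -p env1 up -d");
    await run("docker commit env1.dendro nuno/node-currency:latest-s1"); // hypothetical state tag
}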

4.3.3 DockerMocha

DockerMocha is a singleton that stores information shared across docker-mocha. It is used by the Runner and helps exchange information with the Manager and other modules. The class is represented in figure A.1.

Fields

Table A.2 lists all the fields of the DockerMocha class with the respective type and description.

Methods

Table A.2 lists all the methods of the DockerMocha class with the respective input arguments, return values and a description.

4.3.4 NoDocker

NoDocker allows one to execute the tests using only Mocha, for easier debugging of test code before running the entire battery of tests. It executes the tests without Docker, the setup caching mechanism or parallelization.
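A minimal sketch of such a fallback using Mocha's programmatic API is shown below; the list of test files is taken from the example in Snippet 1 and is purely illustrative.

const Mocha = require("mocha");

// Run a list of test files sequentially with plain Mocha:
// no Docker, no setup caching and no parallelization.
function runWithoutDocker(testFiles) {
    const mocha = new Mocha();
    testFiles.forEach((file) => mocha.addFile(file));
    mocha.run((failures) => {
        process.exitCode = failures ? 1 : 0;   // non-zero exit code when any test fails
    });
}

runWithoutDocker(["test/init.js", "test/dollar/testDollar.js"]);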

4.3.5 Other

There is an additional file, utils.js. It is not a module; it is simply a file with utility functions used across the entire project.

4.4 Class execution architecture

In an earlier version, to create a state, the setup file was directly executed as a NodeJS script inside the container; the tests, in turn, were executed with the mocha command. However, for modularity and extensibility, it was necessary to create another version with a more object-oriented architecture. In the new version, both the setup and test files are executed using the docker-mocha command. The setup file or test file is loaded in the corresponding running mode of the Runner (setup or test), which then proceeds to create a setup or execute a test. In Setup Mode, the setup files are organized in a class hierarchy: the specified class is instantiated and the proper methods are executed. In Test Mode, after a proper initialization, a Mocha object is instantiated via the Mocha framework; that object receives the test file and executes it programmatically. The class that docker-mocha supports is represented in figure 4.2.

Figure 4.2: Setup Class

This class has three methods: init, load, and shutdown. The init function exists to start environments and services and works as a before phase. The load function is responsible for doing the actual setup: loading databases, adding files, etc. The shutdown function exists to perform a clean and secure exit before saving the state.

These methods are invoked both for setup and for test jobs. For a setup, init is invoked first, then load and finally shutdown. For a test, init is invoked first, then the test and finally shutdown; the load step is skipped in this case because it was previously executed and saved during the setup creation phase. This architecture exists to prevent code repetition and to keep some control over the process. The idea is for each setup state file to be a class that implements these 3 methods. Each setup should extend the class present in its parent state file, and the only file to extend Setup directly should be the root state. This way, the hierarchy represented in the common setup and test dependency JSON file is mirrored in the code. This class hierarchy structure is not mandatory, although it helps: in many cases the init or shutdown methods are the same, so instead of repeating code the user can easily back-track to the root by invoking the method with the same name in the superclass, through the super keyword provided by JavaScript ES6.
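A hedged sketch of such a hierarchy follows; the base class is a simplified stand-in for the one docker-mocha provides (figure 4.2), and the state names reuse the example from Snippet 1.

// Simplified stand-in for the Setup class of figure 4.2.
class Setup {
    async init() {}      // start environments and services (a "before" phase)
    async load() {}      // do the actual setup work: load databases, add files, etc.
    async shutdown() {}  // perform a clean and secure exit before the state is saved
}

// Root state: the only class expected to extend Setup directly.
class InitState extends Setup {
    async load() {
        // seed the base data shared by every descendant state
    }
}

// A child state reuses the parent's behaviour and adds its own on top via super.
class SetDollarState extends InitState {
    async load() {
        await super.load();
        // add the dollar-specific fixtures on top of the base data
    }
}

module.exports = SetDollarState;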

4.5 Graph extraction

Another feature developed for docker-mocha is the possibility to extract a visual layout of the entire dependency tree. It is invoked using the docker-mocha-graph command with the setup and test dependency file path as the first argument. For example:

docker-mocha-graph tests-setup-structure.json

Figure 4.3 displays an example of a graph extracted from a given setups and tests dependency file. The red circles represent states with their respective names; at the end of each name is a number that indicates how many tests depend on that state. The arrows represent dependencies. This tool was developed using the package “NetworkX” [9].

4.6 Issues

This section discusses issues found during the implementation.

4.6.1 Docker for Windows and macOS

Creating a tool that works in multiple operating systems was not a requirement. However, given the transparency that the frameworks used provide regarding operating system support, it was expected that they would work without any constraints. That is not the case for Windows and macOS.

Figure 4.3: Graph Example

To understand why docker-mocha is not supported on Windows and macOS it is imperative to understand how Docker is implemented in these operating system architectures (the Windows architecture is shown in figure 4.4). This example uses Windows, but these architectural considerations also apply to macOS. In Linux, Docker works on top of the operating system, with support at the kernel level. The same does not happen on Windows, where Docker is not natively supported. The architecture represented in figure 4.4 is set up when Docker is installed: the installation creates a Linux virtual machine using Hyper-V called MobyLinuxVM. It is inside this virtual machine that the Docker engine runs, and so it is inside this machine that the containers are executed. In order to communicate with the host and vice-versa, an additional virtual switch is created with two virtual interfaces, one attached to the Windows 10 host (defaultNAT with IP 10.0.75.1) and another attached to the MobyLinuxVM (hvint0 with IP 10.0.75.2). This way it is possible to interact with the Docker containers running inside the virtual machine. In many common applications, this architecture is transparent to the user. The docker commands work properly either in CMD or PowerShell, despite the fact that they are actually interacting with MobyLinuxVM; it is as if docker on Windows were actually a remote shell (ssh) to another machine. Why does this architecture affect docker-mocha? Docker-mocha uses docker commands in order to manage the states and containers. However, when checking if a service is online it cannot rely on docker to get that information.

Figure 4.4: Docker in Windows Architecture

For that, docker-mocha needs to use netcat to communicate with the container directly, using a NAT-translated IP address and port. Given that the containers are running in isolated networks, they all have different assigned IPs. These IPs are pingable by default on Linux given the Docker architecture in that OS, which includes kernel-level support. However, the same does not happen on Windows, where these IPs are only known by MobyLinuxVM. So, when using netcat on the Windows host with the given addresses, the netcat script will run indefinitely instead of communicating with the container. Because of this issue, until Docker networks are properly supported on Windows, docker-mocha does not support Windows without some workarounds.

Workaround A - Updating host route tables There is a workaround that makes docker-mocha work on Windows: updating the Windows host route tables with the sub-network where the containers are running. Example:

route -p ADD 172.20.0.0 MASK 255.255.0.0 10.0.75.2

The first IP address represents the sub-network where the containers are running. The second represents the subnet mask. The third represents the IP address of the hvint0 interface of the MobyLinuxVM, which is the interface that knows the routes to all the Docker networks and respective containers and is also able to communicate with the Windows 10 host. Unfortunately, this is a highly unreliable method for several reasons:

1. It is important to keep a high level of isolation for docker-mocha, so altering the routes of the host machine is a bad practice and not recommended.

2. Administrator permissions are required in order to alter the routing table of the host machine.

3. It is practically impossible to obtain the hvint0 interface address. This address is known exclusively by MobyLinuxVM, which is inaccessible through common and reliable methods.

4. Permanently using the hard-coded hvint0 address (10.0.75.2) is not good practice. Not only should hard-coded values be avoided, but this particular address is also highly unreliable: the interface address might change in the future, or it may already be a completely different address by the time Docker is installed on Windows.

Workaround B - Relying on wget running on the container Instead of verifying the container from the host machine, it is possible to verify it from inside the container itself. Not with netcat—given that most Docker images do not include this tool—but with wget, a program that retrieves files using HTTP and is very likely to exist in the Docker images used. If it is assumed that the user application handles HTTP requests and that wget is installed in the Docker image, then it is possible to use docker-mocha on Windows and macOS. Using netcat to test the TCP/IP connection would still be the ideal solution: TCP sits at the Transport Layer of the OSI model, while the HTTP used by wget sits at the top layer, the Application Layer, and the ideal is to test the connection as low as possible, TCP being the lowest possible in this scenario. Docker-mocha only needs to verify that a container is available, not necessarily the application: if it is only going to run NodeJS scripts or mocha files, it is irrelevant whether the application can handle HTTP requests, whereas knowing that a TCP connection can be established means the container is ready to process other tasks. Another reason why TCP is preferable is that it becomes ready sooner than HTTP request handling. The wget workaround therefore assumes that the application being tested handles HTTP requests.
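A sketch of this container-side readiness check is shown below (the container name, port and retry policy are illustrative assumptions, not the actual docker-mocha parameters):

// Illustrative readiness check: poll the application inside the container
// with wget until it answers on the given port, or give up after maxTries.
const { exec } = require("child_process");

function waitForContainer(containerName, port, maxTries, callback) {
  const attempt = (triesLeft) => {
    exec(`docker exec ${containerName} wget -q -O - http://localhost:${port}`, (err) => {
      if (!err) { return callback(null); }            // service answered
      if (triesLeft <= 0) { return callback(new Error("timeout")); }
      setTimeout(() => attempt(triesLeft - 1), 1000);  // retry after 1 second
    });
  };
  attempt(maxTries);
}

// Example usage with a hypothetical entry-point container.
waitForContainer("someEnvironment.dendro", 3000, 30, (err) => {
  console.log(err ? "container not ready" : "container ready");
});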

4.6.2 Ambiguous networks

Another problem appeared when attempting to use docker-mocha on macOS, this time related to the networks. While in Linux adding and removing networks is easy and reliable, macOS started showing multiple "ambiguous networks" errors after some executions: Docker for Mac could not reliably delete and re-create a network with the same identifier name. Once this problem happens, the whole docker-mocha execution breaks and introduces faults in the following setup and test executions. Moreover, the only solution is to reset the whole Docker network infrastructure with docker network prune, which deletes all the networks, leaving only the default ones. For all these reasons, it is possible to conclude that docker-mocha only works reliably on Linux.

4.6.3 No-volumes images

Docker containers have an internal file system managed by Docker. That data is mainly located inside the container, normally inaccessible from the host machine, and only visible to the running container. However, Dockerfiles provide a VOLUME statement, which allows certain data locations to be exposed to the host, helping to separate data from logic. In other words, volumes map data locations from the container to the host machine file system, making it possible to access, on the host, data produced by the container. This feature is particularly useful, for example, in containerized databases, as the container is able to process requests and save data in an isolated and disposable way; the data produced by the database becomes independent from the database software and easier to replicate for backup or migration. This design is intended for production software, but it can hinder testing or running multiple concurrent replicas of containers. All locations mapped to volumes are excluded from a docker commit command. This means that any changes made to a container whose Dockerfile specifies a VOLUME will be lost if they impact files or folders mapped in the volume. Practically all important images (MySQL, Elasticsearch, PostgreSQL, etc.) declare volumes in their Dockerfiles, specifically to map their data directories to folders in the host. Since those are all excluded from a docker commit, they would not be included in the saved states of docker-mocha, rendering it useless. Since a Dockerfile works by adding layers to an existing image, it inherits all existing VOLUMEs of the parent images, and they cannot be removed. There is a long-standing discussion on whether or not there should be an option to disable VOLUMEs in a Dockerfile that depends on an image where VOLUMEs have been specified, thus returning those locations to the scope of the docker commit command. This issue was reported 5 years ago and there is no expectation of it being resolved or introduced as a feature [29]. The only solution is to create a version of the images with all VOLUME statements in the Dockerfile commented out. These changes do not modify the behaviour of the service but allow the entire container state to be saved in the images created by docker commit commands. For the validation phase of this project, all the services used images that created volumes, so forked versions of those images had to be created with the volumes removed or commented out.

Chapter 5

Validation and Results

This chapter is dedicated to the validation of the solution, using Dendro as the test-bed, with several test cases already present in the platform. After describing the experimental scenario, the gathered data are analyzed and conclusions drawn. The goal is to prove the hypothesis of this dissertation, which is recalled here:

It is possible to reduce the overall execution time of service and end-to-end tests through setup state caching and parallel execution mechanisms, provided by sets of isolated containers.

5.1 Preliminary test prototype

While implementing docker-mocha an application prototype was used. It consisted of 5 states, 3 setups and 5 tests. Figure 5.1 shows the dependency graph of that application.

Figure 5.1: Application Prototype Dependencies

The application is a NodeJS currency application that performs Euro to Dollar and Euro to Pound conversions. The init state first prepares the application to store the values. The setDollar state stores the value of the dollar in euros, and getDollar carries out a conversion from euro to dollar. The setPound state stores the value of the pound in euros, and getPound converts from euro to pound. Given the independence between the dollar and the pound, each could exist and be converted without the other, which made it a good prototype to test setup caching and 2-instance parallelization. The test and setup dependency file is displayed in snippet 4 and the docker-compose file used is displayed in snippet 5; both were used earlier as examples of the test and setup dependency file and the docker-compose file.

1 "states": 2 { 3 "init":{"depends_on": null, "path": null}, 4 "setDollar":{"depends_on": "init", "path": "setup/init.js"}, 5 "setPound":{"depends_on": "init", "path": "setup/init.js"}, 6 "testDollar":{"depends_on": "setDollar", "path": "setup/dollar/setDollar.js"}, 7 "testPound":{"depends_on": "setPound", "path": "setup/pound/setPound.js"} 8 }, 9 "tests": 10 { 11 "init":{"state": "init", "path": "test/init.js"}, 12 "setDollar":{"state": "setDollar", "path": "test/dollar/setDollar.js"}, 13 "setPound":{"state": "setPound", "path": "test/pound/setPound.js"}, 14 "testDollar":{"state": "testDollar", "path": "test/dollar/testDollar.js"}, 15 "testPound":{"state": "testPound", "path": "test/pound/testPound.js"} 16 }

Snippet 4: Example of test.json

5.2 Dendro: a research data management platform

Dendro [7] is a research data management platform designed to help researchers store, describe and share the data produced in the context of their research work. The data model of the platform is built using ontologies, and metadata descriptors can be generic (e.g. Dublin Core) or come from domain-specific ontologies. Later, researchers can use Dendro to share datasets with different repository platforms on the web. Currently, Dendro has over 2000 individual end-to-end tests divided across 168 files, which makes it a suitable test-bed given the original goal of this dissertation.

5.2.1 Dendro technology stack

Dendro uses multiple technologies: the web server runs on Node.js and is backed by 3 databases: MongoDB, Virtuoso, and MySQL. To power its free-text search, it uses Elasticsearch, a document indexing service.

version: '3.5'

services:
  dendro:
    container_name: ${ENVIRONMENT}.dendro
    image: nuno/node-currency:latest${STATE}
    build:
      context: .
      dockerfile: dockerfiles/Dockerfile
    networks:
      custom_net:
        aliases:
          - dendro
  mongo:
    container_name: ${ENVIRONMENT}.mongo
    image: mongo:3${STATE}
    networks:
      custom_net:
        aliases:
          - mongo

networks:
  custom_net:
    name: ${ENVIRONMENT}
    driver: bridge

Snippet 5: Example of docker-compose.yml for docker-mocha

Figure 5.2: The current CI pipeline for Dendro

At this time, the solution uses Docker to instantiate the Dendro server and respective services, but it originally used a virtual machine built with Vagrant. Given the long time and considerable resources required to set up this virtual machine with a set of scripts1, this was later changed to a microservice approach using Docker containers.

5.2.2 Current CI pipeline

Figure 5.2 shows the current Continuous Integration pipeline for Dendro. The current pipeline runs on a Jenkins server and is made up of 5 different stages:

• Build, where Dendro is packaged and built;

• Test and Coverage, which runs the tests and the coverage tools; it is the most time-consuming stage;

• Report Coverage, where the coverage calculated in the previous stage will be reported to the appropriate services;

• Deploy in which an updated Dendro image will be deployed to the Docker Hub platform;

• Cleanup, which removes temporary files and frees allocated resources needed for the previous stages.

5.3 A preliminary benchmark

For illustrative purposes only, a preliminary test run was initially executed with the Dendro test suite. Even though there is no direct comparison between the results of this benchmark and the results of the evaluation part of this work, it allowed the identification of some issues: CPU under-utilization and re-execution of setup phases. The evaluation experiment in the next section will draw its own comparison baseline. The direct comparison is not possible for several reasons: different operating system and optimization; different overall architecture regarding I/O; different CPU generations (1st Gen Intel Core architecture in the evaluation vs 4th Gen in this benchmark). On a machine with the specifications shown in figure 5.3, the pipeline presented in figure 5.2 took over 4 hours to complete.

1 https://github.com/feup-infolab/dendro-install

Model Name: MacBook Pro
Model Identifier: MacBookPro11,2
Processor Name: Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
Processor Speed: 2 GHz
Turbo Boost: 3.2 GHz
Number of Processors: 1
Total Number of Cores: 4
Memory: 16 GB
I/O: up to 1.2 Gbps

Figure 5.3: Specifications of the test machine

5.4 Evaluation experiment

This experiment contains various scenarios, the first of which establishes the baseline: the value that needs to be improved upon to prove the hypothesis. This evaluation has the purpose of proving the hypothesis and answering the research questions: to prove hypothesis 1.2, it is intended to find answers to research questions 1 and 2. In order for the experiments to be comparable, a set of requirements must be met: the same machine (or the same hardware configuration), the same test suite and the same execution script must be used.

Metrics

The primary evaluation metric will be the execution time of the Dendro tests in the same environment while alternating between different configurations. Other metrics will also be registered: average CPU load; maximum CPU load; minimum CPU load; I/O usage; memory usage.

Design

The first experiment establishes the comparison baseline, executing the tests with none of the proposed solutions active; the second experiment executes them using setup state caching; the third and following experiments add parallelization, gradually increasing the number of parallel instances. Figure 5.4a shows the complete test dependencies and setups of Dendro. However, due to crashes and multiple different failures in some of the tests that prevented the suite from finishing, some of them were removed to obtain a test suite that passed and executed accordingly. These tests failed for reasons unrelated to docker-mocha. Figure 5.4b shows the resulting test and setup dependencies. From an original set of 168 test files, only 118 were present in the test suite used, as the ones excluded were not passing at the time when run in the existing pipeline. The graph was not made with the docker-mocha-graph tool, but produced manually to help visualization.

Figure 5.4: Test and setup dependencies of Dendro: (a) total dependencies; (b) total dependencies passing

Hardware and software configuration

The hardware specifications for the experiment are shown in figure 5.5 and the software specifications in figure 5.6.

Single Machine
Processor 1 Name: Intel Xeon @ 3.33 GHz, 6 Cores
Processor 2 Name: Intel Xeon @ 3.33 GHz, 6 Cores
Number of Processors: 2
Total Number of Cores: 12
Total Number of Threads: 24
Processor Speed: 3.33 GHz
Memory: 64 GB
Memory Type: DDR3
Memory Speed: 1333 MHz
Storage: 1x 480 GB SATA SSD Drive
Max Read Speed: 500 MB/s
Max Write Speed: 450 MB/s

Figure 5.5: Hardware specifications of the test machine

Operating System: Ubuntu 18.10
Docker: 18.09.04
Docker-Compose: 1.23.1
NodeJS: 8.11.4
npm: 5.8.0
Mocha: 5.1.1

Figure 5.6: Software Configuration

5.5 Results

Several experimental runs were carried out, with results shown in table 5.1.

Run | S. Caching | # Instances | Time | Avg. CPU | T. Read | T. Write | Avg. Mem.
1 | No | 1 | 4h:23min | 4.63% | 0.549 GB | 41.363 GB | 7.03 GB
2 | Yes | 1 | 1h:04min | 7.49% | 1.53 GB | 51.1 GB | 4.69 GB
3 | Yes | 2 | 0h:37min | 13.33% | 1.26 GB | 50.9 GB | 4.89 GB
4 | Yes | 4 | 0h:22min | 23.3% | 1.24 GB | 49.5 GB | 6.26 GB
5 | Yes | 8 | 0h:17min | 32.6% | 1.26 GB | 51.2 GB | 8.68 GB
6 | Yes | 12 | 0h:16min | 35.8% | 1.26 GB | 51.6 GB | 11.7 GB
7 | Yes | 24 | 0h:15min | 38.9% | 1.34 GB | 51.5 GB | 10.61 GB

Table 5.1: Evaluation Scenario with Results

From the initial results, it is possible to conclude that the execution time is significantly reduced by implementing state caching, and reduced further by implementing parallelization. By using more parallel instances, the time it takes to finish the test suite decreases overall, although not linearly. The average CPU usage increases with the number of instances, as does memory usage, while disk reads and writes stay approximately the same. This answers research question 1: by implementing setup caching and parallelization in the test phase, it is possible to reduce the overall execution time.

There is a small exception for the first case, where no approach was used. For this case, the total disk read is approximately 1 Gigabyte smaller than in any other case running with both approaches. Something similar happens with total disk writes, but with a 10 Gigabyte difference. In terms of memory, running with 8 instances or more consumes more memory than running without setup caching or parallel instances. As there are more Docker operations when running with setup caching, and more processes as the number of instances increases, it is only natural that the average CPU usage increases as well.

The difference of roughly 1 Gigabyte in disk reads between running without setup caching and running with it might be due to the fact that, without caching, it is always the same base image that is loaded from disk for each test. Because of that repetitive behaviour, some data is probably kept in the RAM cache and used to speed up reading. When running with parallel instances the value becomes steady, with about 1 more Gigabyte read in total; this can be explained by the fact that each test needs a specific image rather than the base images, so, given the greater variety of data to load, it follows that the total read data increases and then stays constant as the number of parallel instances grows.

The total disk writes report over 40 GB of data stored; this might be explained by the fact that every container is running setups and therefore creating data inside the containers' virtual hard drives, which are mapped onto the local hard drive. The additional 10 Gigabytes when running with setup caching is explained by the fact that, besides running setups, the containers are also being saved as images, which consumes more disk space.

The average memory usage is higher when running without setup caching. This is a result of the constant re-execution of previous setups: without setup caching, each test requires the execution of the setups from the root state to the one required by the test. This process is repeated over and over again without saving state data to disk, and as more operations run since the start of each container, the containers end up using more memory. Conversely, when running with setup caching, each individual container consumes less memory because its state was stored on disk and not created again in memory. With the increase in instances, however, more of these low-memory containers run at the same time, so the overall memory usage ends up higher.

5.5.1 Total execution time

Figure 5.7 shows the time gains between the different approaches. The percentages are relative to the build time immediately to the left.

The first column represents the total time with no setup caching or parallelization. The second and the remaining columns use setup caching and parallelization, from 1 instance to 24 instances. By using setup caching, the build time decreased by 75%. Running with 2 instances yields an additional decrease of 42%; with 4, an additional decrease of 40%; with 8, a decrease of 19%; with 12 and 24, the additional decrease is about 5% each. Running with 4 instances can be considered the optimal configuration for this hardware: it resulted in a decrease in total time from 4 hours and 23 minutes to only 22 minutes. From there on, the gains are negligible when converted to real time. This answers research question 2: by increasing the number of instances the time is reduced, albeit with diminishing returns as more instances are added.
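For clarity, the relative reductions in figure 5.7 correspond to the formula below; the two worked examples use the rounded times from table 5.1 (4h:23min = 263 min, 1h:04min = 64 min), so they differ slightly from the figure's percentages, which were computed from second-level measurements:

\[
\text{reduction}_i = \frac{t_{i-1} - t_i}{t_{i-1}} \times 100\%,
\qquad
\frac{263 - 64}{263} \approx 75.7\%,
\qquad
\frac{64 - 37}{64} \approx 42.2\%
\]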

Figure 5.7: Time gains (build time in seconds vs. number of instances; relative reductions: −75.37%, −42.76%, −40.61%, −19.82%, −5.09%, −5.07%)

5.5.2 CPU usage

The average CPU usage throughout the tests in the different scenarios, including setup caching and parallelization from 1 to 24 instances, is displayed in figure 5.8. The Y-axis represents the CPU usage and the X-axis the time in seconds; the blue line depicts the average CPU usage.

With 1 instance, the peak CPU usage stays below 50%, with an average of 4%, and there are no major usage spikes. With 2 instances, some localized spikes in CPU usage can be observed, but never above 85%; the average usage is now around 13% and starts to increase as time passes. With 4 instances the spikes increase in number and go from localized to nearly constant. The spikes never reach 100% usage, staying below 90%. The slope of the average increases further, but the average stays only between 25% and 30%. A pattern starts to stand out: the execution at the beginning and at the end usually stays below the 50% mark. With 8 instances, the spikes finally reach 100%, but are very localized; the pattern identified in the 4-instance scenario is even more visible, and the slope of the average increases further. With 12 instances, the 100% spikes increase in number, the pattern is easily identified and the slope of the average increases again. Finally, with 24 instances, the 100% spikes are practically constant in the middle, the pattern becomes even more defined and the slope of the average (and of its trendline) reaches its maximum among all scenarios.

The behaviour displayed is expected: as the number of instances increases, the average CPU usage increases and 100% usage becomes more frequent. However, as the instances increase, a pattern emerges: at the beginning and at the end the average CPU stays below 50%, while in the middle it is practically constant at 100%. This happens for two reasons: the nature of the dependency tree and the job queuing logic. By analyzing the setup dependency tree in figure 5.4b it is possible to observe that there are two sub-trees that have only one branch, at the top and at the bottom of the tree, with no tests in their intermediate setups. These two parts of the dependency tree, together with the queuing of state creation and test execution in docker-mocha, explain the pattern. Docker-mocha only unlocks the next job when its parent has finished, so a tree with a "one parent, one child" shape results in a linear execution as well, since there is no possibility of parallel execution. This is verified both at the beginning and at the end, both in the CPU average usage and in the dependency tree. Having multiple intermediate setups with no tests contributes even more to this linear behaviour.

From this analysis, we can conclude that, to take advantage of all available CPU cores, dependency trees should be wider and less deep. By having fewer empty setups—setups with no tests that depend on them—and more direct children, it is possible to unlock the full benefit of parallelization, as illustrated in the sketch below.
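As a contrived illustration of this recommendation (state names are hypothetical), the two fragments below describe the same five setups; the first forces sequential execution, while the second lets several states be created and tested in parallel:

// Deep chain: each state depends on the previous one, so states (and their
// tests) can only be processed one after another.
const deepStates = {
  "init": { "depends_on": null },
  "a":    { "depends_on": "init" },
  "b":    { "depends_on": "a" },
  "c":    { "depends_on": "b" },
  "d":    { "depends_on": "c" }
};

// Wide tree: all states depend directly on the root, so once "init" exists
// the four children can be created and their tests run in parallel.
const wideStates = {
  "init": { "depends_on": null },
  "a":    { "depends_on": "init" },
  "b":    { "depends_on": "init" },
  "c":    { "depends_on": "init" },
  "d":    { "depends_on": "init" }
};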

5.5.3 Memory usage

The memory usage over time is shown in figure 5.9. The chart represents the memory used, in Gigabytes, for the executions with setup caching and parallelization from 1 to 24 instances. For 1 and 2 instances the behaviour is constant, with a small increase at the end. This might be due to the nature of the setups in the last part of the dependency tree, which may require more data to be loaded or created. For 4 instances, the pattern identified in the previous section starts to be visible for memory usage as well. For 8, 12 and 24 instances the pattern becomes even more visible, with the memory used in the middle section gradually increasing as more instances are added.

Figure 5.8: Average CPU usage (all cores, %) over build time (seconds), for 1, 2, 4, 8, 12 and 24 instances

The increase in memory usage with the number of instances is explained by the fact that more instances mean more concurrent containers running, and therefore more data and resources being used.

Figure 5.9: Used memory (GB) over build time (seconds), for 1, 2, 4, 8, 12 and 24 instances

5.5.4 Disk read

Disk read activity over time is shown in figure 5.10. The chart represents the data read from disk, in Megabytes per second, over time (the X-axis represents seconds) for the executions with setup caching and parallelization from 1 to 24 instances.

It stays practically constant for 1 and 2 instances. At 4 instances there is a slight increase, and at 8, 12 and 24 instances the previously observed pattern starts to emerge. Despite the increase in the number of instances, the values do not seem to grow any further. The expected behaviour would be that increasing the number of containers loading data per unit of time still results in the same total amount of data read across all the charts: the area of the black region represents the total data read and should be the same in every chart. However, this is not the case, perhaps due to the operating system caching files in memory.

Figure 5.10: Disk I/O read (MB/s) over build time (seconds), for 1, 2, 4, 8, 12 and 24 instances

5.5.5 Disk write

Disk write activity over time is shown in figure 5.11. The chart represents the data written to disk, in Megabytes per second, over time for the executions with setup caching and parallelization from 1 to 24 instances. For 1 instance, it is practically constant. For 2 instances, some higher spikes become visible in the middle. For 4 instances it becomes constant again, but with a higher average. For 8, 12 and 24 instances the charts are practically the same, with the same average and the same constant value in the middle section. The increasing-average pattern seen previously in CPU and memory usage is still visible, although less accentuated.

Figure 5.11: Disk I/O write (MB/s) over build time (seconds), for 1, 2, 4, 8, 12 and 24 instances

Chapter 6

Conclusions

After a detailed analysis of the current test optimization solutions, it is possible to conclude that there is a lack of tools for optimizing the local execution of non-unit tests. The majority of tools focus on local optimization of unit tests, while the support for non-unit tests is usually deployed in the cloud and the optimization is usually conducted on the build stage, before the tests start to run.

This dissertation proposed two techniques for accelerating the execution of service and end-to-end tests: setup state caching and parallelization. The implementation of these techniques resulted in docker-mocha, a tool that helps optimize large, time-consuming test suites. Docker-mocha was used to optimize the tests of an open-source platform (Dendro) that had over 2000 end-to-end tests spread over 168 individual files. The execution took place on the same machine and was repeated multiple times with an increasing number of parallel instances. The results showed an overall decrease from a baseline of 4 hours and 23 minutes to 22 minutes. The baseline scenario had neither setup caching nor parallelization, and from the analysis conducted it is possible to conclude that the optimal scenario consisted of setup caching enabled and 4 parallel instances; after 4 instances, the improvements become negligible.

This dissertation showed that a setup caching mechanism, combined with an increasing number of parallel instances, drastically decreases the test suite execution time. This might change the work of software developers who write end-to-end tests to complement their unit tests: instead of skipping these time-consuming tests for faster feedback during development, they can now keep them in the test suite.

Some problems also appeared. The technical analysis concluded that adopting Docker for this purpose is only reliable on Linux machines. The Windows and macOS implementations of Docker are not fully reliable because of the way they handle networking and the translation of network addresses between containers and the host. Other issues are related to the dependency tree the users create for their test suites: having deep sub-trees with several empty setups—states with no dependent tests—results in a sequential execution of tests and state creation, which behaves as if there were no parallelism in the solution. Users should therefore make their dependency trees shallower and broader.


As future work, there are many topics that can be investigated. One of them is test prioritization: gathering past results to train a machine learning model in order to optimize different goals, such as failing faster or targeting the highest coverage possible in the least amount of time. This way, the priority order can be changed to target these specific goals, helping developers to have a quicker feedback cycle and optimizing development time.

Another topic that could be explored is how to broaden the approach to other test frameworks. Docker-mocha is exclusive to MochaJS; in the future it would be useful to other users if it supported additional testing frameworks.

A feature that was lost with docker-mocha is the code coverage report that was possible when the original mocha application was used. Given that the tests are now executed individually and in isolated environments, instead of in a single process running on the local machine, it is currently impossible to get coverage reports. Having this feature implemented in the future will certainly prove useful to developers.

Another issue is the existence of sub-trees with a single branch, where there are no tests to be run in each node. It would be interesting to identify these sub-trees and optimize their execution by finding the first ascendant with tests, directly executing all the setups on the children that have no dependent tests, and saving only the image of the last child. With this, it is possible to skip all the startup and shutdown cycles that happen during the creation of the images of the intermediate states, saving some execution time.

A paper on container-driven testing was also written. It was submitted to a top-tier software engineering conference and is currently under review.

Appendix A

Appendix

Table A.1: Functions of the Manager

restoreState (High tier): Responsible for restoring a state, even if it means creating the state or loading it from cache. It will first check if the --no-checkpoint flag is active and will execute startVanillaWithSetups if so. If not, it will check if the state exists with checkIfStateExists. If it exists, startEnvironment is invoked; if not, createState is invoked.

createState (High tier): Invoked when a given state does not exist. It will check if the parent state exists with checkIfStateExists. If the parent state does not exist, it will recursively call itself for the state parent (this recursive call is the mechanism that allows the creation of the parent setup states required for a given state, up to the root of the dependency tree if no state exists). After creating the parent state, it will stop the environment (which by default is left up and running after creation to save resources), then a new environment for the current setup is started with startEnvironment, based on the newly created parent state. The setup is executed with runSetup and the environment is saved with saveEnvironment. In the event that the parent state already exists, the environment is simply started, the setups executed and finally saved.


checkIfStateExists (High tier): Responsible for verifying if a given state exists in Docker. The first thing it does is check if the state to verify is null; in this case, it immediately returns true. This is because the null value represents the parent of the root setup: given that the root setup does not depend on a previously existing state, that value is null, and it is considered an already existing state. If the state to verify is not null, then all of the services with the respective image and tag are retrieved from docker-compose by getAllServicesInOrchestra. After that, the mapSeries function of the async module verifies, for all of the services, whether an image for the required state exists. If all of them exist, the state is considered as existing; if at least one of them does not exist, the state is considered non-existing.

runSetup (High tier): Executes the setups in the entry-point container. It will invoke a docker exec command that invokes the docker-mocha command with the proper arguments inside the entry-point container. An example of what this function executes: docker exec dendro docker-mocha -setupFile setup/createUsers.js -config='test'

runSetups (High tier): Gets the hierarchy from the root to the current setup and executes all setups with runSetup. The execution order is from the root to the current setup.

runTest (High tier): Executes the test in the entry-point container. It will invoke a docker exec command that invokes the docker-mocha command with the proper arguments inside the entry-point container. An example of what this function executes: docker exec dendro docker-mocha -setupFile setup/createUsers.js -testFile test/checkIfUseExists.js -config='test'

getAllServicesInOrchestra (Low tier): Iterates the docker-compose file and retrieves a list of services with the corresponding image and tag.

deleteAllStates (Low tier): Deletes all states based on the services in the docker-compose file. Used at the beginning of the Runner Manager mode when the user wants to create new states.

StopAndRemoveContainers (Low tier): Stops and removes all the containers that start with the environment name, to prevent conflicts. Used when creating a state or running a test.

RemoveNetworks (Low tier): Removes networks to prevent conflicts. Used when creating a state or running a test.

stopAllContainers (Low tier): Stops all Docker containers. Currently unused because it interferes with containers not created by docker-mocha.

removeAllContainers (Low tier): Removes all Docker containers. Currently unused because it interferes with containers outside the scope of docker-mocha.

removeAllVolumes (Low tier): Removes all unused volumes. Used by the Runner in Manager mode to free hard drive space.

startVanillaWithSetups (Low tier): Invoked when the flag --no-checkpoint is active. It starts the environment and runs all the setups from the root to the current desired setup.

startEnvironment (Low tier): Starts a given environment by invoking docker-compose up.

saveEnvironment (Low tier): Saves the current state. It will iterate all of the services in the docker-compose file and perform a docker commit on each of them for the new state.

stopEnvironment (Low tier): Stops a given running environment by invoking docker-compose down to stop all containers required by the environment.

logEntrypoint (Low tier): Logs the activity of the entry-point container with docker logs.

getContainerIP (Low tier): Retrieves a container's IP.

waitForConnection (Low tier): Checks if a given container is ready to interact with. This is important because sometimes the containers are not ready, and if a request is sent at that time it would result in a failure. It will first retrieve the container IP and loop the requests until the container responds successfully. The IP is currently not used, since the connection test is now made inside the container instead of from the host machine.

loopUp (Low tier): Used by waitForConnection. It will recursively call itself until checkConnection does not return an error or a timeout occurs.

checkConnection (Low tier): Checks if a given container is available to establish a TCP connection on a given port, passed through the docker-mocha execution arguments. In a first version, the combination <IP>:<port> was used from the host machine, which proved unreliable in some situations. A new version is now used where the test is performed inside the container, using wget localhost:<port>. Using this version allows docker-mocha to function on other operating systems such as macOS and Windows, since direct communication between the host machine and a container via IP is currently only possible in Linux.

runCommand (Low tier): Allows docker-mocha to run commands on a container using docker exec.

Table A.2: Fields of DockerMocha

statesMap (JSON): Saves all the data associated with the states. For each key—which represents the name of the setup state—there are two properties: one for the state it depends_on and another for the path to the setup file that is used to create it.

testsMap (JSON): Saves all the data associated with the tests. For each key—which represents the name of a test—there are two properties: one for the state required by that test and another for the path to the test file.

dependencyMap (JSON): Stores the children of a given state (a child state depends on the parent). For each key—which represents the name of the setup state—there can be multiple values that represent the states and tests that have the key as a direct parent or dependent state. This map is needed for the Runner to unlock jobs as the states that they depend on are created. It is updated each time a test or setup is added to this class.

composeContents (JSON): Stores the contents of the compose file. The compose file is originally a YAML file; however, with the help of the js-yaml [32] NPM package it is possible to convert it to a JSON object.

composeFile (string): Stores the path of the compose file.

entrypoint (string): Represents the name of the entry-point service used to run setups and tests.

port (int): Represents the port used by the entry-point service.

rootState (string): Represents the first job to be executed by the Runner. It is assigned only once, at the beginning, when the root setup is discovered.

noCheckpoint (Boolean): Represents the user's choice between running with the setup caching mechanism or using vanilla containers with all setups.

noDelete (Boolean): Represents the user's choice not to delete the already existing states at the beginning of the Runner execution.

noDocker (Boolean): Represents the user's choice between executing the test suite with Docker or with a plain linear Mocha execution.

deployment_config (string): Represents a configuration passed at run time to the application being tested.

Table A.5: Methods of DockerMocha

addState. Arguments: state (string). Returns: void. Adds a state and its information to this class; updates the statesMap and dependencyMap.

verifyAndWarnState. Arguments: void. Returns: void. Verifies and warns of irregularities in the statesMap dependencies, namely states that do not connect to the root and are thereby never executed. The verification for a missing root is done in the Runner, which immediately exits with an error.

loopStatesParent. Arguments: state (string). Returns: Boolean. Used by verifyAndWarnState; it helps to identify dependency irregularities.

getTestState. Arguments: test (string). Returns: string. Returns the state needed for a test.

getTestPath. Arguments: test (string). Returns: string. Returns the file path of a test.

getTestsList. Arguments: void. Returns: array of strings. Returns a list with the test identifiers only.

addTest. Arguments: test (string), parsed tests data (JSON). Returns: void. Adds a test to the map of tests to run; it updates the testsMap and dependencyMap.

addComposeFile. Arguments: compose file path (string), compose components (JSON). Returns: void. Adds the path of the docker-compose file and its contents.

getStateParent. Arguments: state (string). Returns: string. Returns the parent state of a given state.

getHierarchy. Arguments: state (string). Returns: array of strings. Returns all the direct connections from the root setup to the given setup. To be iterated correctly, the order is from the furthest ancestor setup to the closest ancestor setup.

print. Arguments: void. Returns: void. Prints information about the class and variable values to the console.

Table A.3: Possible arguments in docker-mocha calls

--file (alias -f, string, default: tests.json). The relative path to the tests and setups file, relative to the project root.

--compose (alias -c, string, default: docker-compose.yml). The relative path to the docker-compose file, relative to the project root.

--threads (alias -t, int, default: 4). The maximum amount of tasks running in parallel.

Entry-point service (string). The name of the service in the compose file where the docker-mocha tasks will be run. If not specified, the package.json file will be parsed and the project name will be used.

--port (alias -p, int, default: 3000). The port of the entry-point service. The execution only continues when the specified <IP>:<port> combination is ready to establish connections and handle requests.

--config (string, default: undefined). A specific config environment variable that will be passed to the application when running a test or creating a setup. This variable is useful if the application is able to choose different deployment configurations based on it.

--testFile (string, default: undefined). Indicates that docker-mocha is currently starting a test execution task, with the associated test file.

--setupFile (string, default: undefined). Indicates that docker-mocha is currently starting a setup creation task, with the associated setup file.

Figure A.1: Docker-Mocha Class

References

[1] Charles Anderson. Docker [software engineering]. IEEE Software, 32(3):102–c3, 2015.

[2] Bazel. Build and test software of any size, quickly and reliably. https://bazel.build/, 2015. [Online, accessed 26-November-2018].

[3] Kent Beck. Test-driven development: by example. Addison-Wesley, 2003.

[4] Moritz Beller, Georgios Gousios, and Andy Zaidman. Oops, My Tests Broke the Build: An Explorative Analysis of Travis CI with GitHub. IEEE International Working Conference on Mining Software Repositories, pages 356–367, 2017.

[5] Jeanderson Candido, Luis Melo, and Marcelo D'Amorim. Test suite parallelization in open-source projects: A study on its usage and impact. ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pages 838–848, 2017.

[6] Junjie Chen, Yiling Lou, Lingming Zhang, Jianyi Zhou, Xiaoleng Wang, Dan Hao, and Lu Zhang. Optimizing test prioritization via test distribution analysis. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2018, pages 656–667, 2018.

[7] João Rocha da Silva, Cristina Ribeiro, and João Correia Lopes. Ranking dublin core descriptor lists from user interactions: a case study with dublin core terms using the dendro platform. International Journal on Digital Libraries, pages 1–20, 2018.

[8] Jennifer Davis and Ryn Daniels. Effective DevOps: building a culture of collaboration, affinity, and tooling at scale, chapter 4, pages 38–39. O'Reilly, 2016.

[9] NetworkX Developers. Networkx. https://networkx.github.io/, 2019. [Online, accessed 28-April-2019].

[10] Thomas Erl. Service-oriented architecture: concepts, technology, and design. Prentice Hall Professional Technical Reference, 2011.

[11] Hamed Esfahani, Jonas Fietz, Qi Ke, Alexei Kolomiets, Erica Lan, Erik Mavrinac, Wolfram Schulte, Newton Sanches, and Srikanth Kandula. CloudBuild. Proceedings of the 38th International Conference on Software Engineering Companion - ICSE ’16, pages 11–20, 2016.

[12] Alessio Gambi, Sebastian Kappler, Johannes Lampel, and Andreas Zeller. Cut: Automatic unit testing in the cloud. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 364–367. ACM, 2017.


[13] InfoLab Information Systems Research Group. Infolab. http://infolab.fe.up.pt/, 2018. [Online, accessed 26-November-2018].

[14] Mitchell Hashimoto. Vagrant: up and running, chapter 1, pages 1–14. O'Reilly Media, 2013.

[15] Kelsey Hightower, Brendan Burns, and Joe Beda. Kubernetes: up and running: dive into the future of infrastructure. O'Reilly, 2017.

[16] Pablo Iorio. Container based architectures i/iii: Technical advantages. https://medium.com/@pablo.iorio/container-based-architecture-i-iii-technical-advantages-7176195456c5, 2017. [Online, accessed 16-May-2019].

[17] Paul Jorgensen. Software testing: a craftsman's approach, chapter 1, pages 3–12. CRC Press, 2013.

[18] Chuanqi Kan. DoCloud: An elastic cloud platform for Web applications based on Docker. International Conference on Advanced Communication Technology, ICACT, 2016-March:478–483, 2016.

[19] Sebastian Kappler. Finding and breaking test dependencies to speed up test execution. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016, pages 1136–1138, 2016.

[20] Adrian Kuhn, Bart Van Rompaey, Lea Haensenberger, Oscar Nierstrasz, Serge Demeyer, Markus Gaelli, and Koenraad Van Leemput. JExample: Exploiting dependencies between tests to improve defect localization. Proc. XP, pages 73–82, 2008.

[21] Jung-hyun Kwon and Gregg Rothermel. Prioritizing Browser Environments for Web Application Test Execution. 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 468–479, 2018.

[22] Jung-hyun Kwon and Gregg Rothermel. Prioritizing Browser Environments for Web Application Test Execution. 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 468–479, 2018.

[23] Brent Laster and Kohsuke Kawaguchi. Jenkins 2: up & running: evolve your deployment pipeline for next-generation automation, chapter 1, pages 1–21. O'Reilly, 2018.

[24] Karl Matthias and Sean P. Kane. Docker: up and running. O'Reilly, 2016.

[25] Volodymyr Melymuka. TeamCity 7 continuous integration essentials, chapter 1, pages 7–17. Packt Pub., 2012.

[26] Andreas Menychtas, Theodora Varvarigou, and Anna Gatzioura. A business resolution engine for cloud marketplaces. Proceedings - 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011, pages 462–469, 2011.

[27] Gerard Meszaros. XUnit test patterns: refactoring test code, chapter 19, pages 358–361. Addison-Wesley, 2010.

[28] Ming Huo, J. Verner, Liming Zhu, and M.A. Babar. Proceedings of the 28th annual international computer software and applications conference. Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004., 2004.

[29] Moby. docker commit data container with volume · issue #6999 · moby/moby. https://github.com/moby/moby/issues/6999, 2014. [Online, accessed 27-November-2018].

[30] MochaJS. The fun, simple, flexible JavaScript test framework. https://mochajs.org/, 2019. [Online, accessed 01-October-2018].

[31] Sam Newman. Building microservices: designing fine-grained systems. O'Reilly Media, 2015.

[32] nodeca. js-yaml. https://github.com/nodeca/js-yaml, 2012. [Online, accessed 27-April-2019].

[33] NPM, Inc. What is npm? https://www.w3schools.com/whatis/whatis_npm.asp, 2010. [Online, accessed 15-November-2018].

[34] Mazedur Rahman, Zehua Chen, and Jerry Gao. A service framework for parallel test execution on a developer's local development workstation. Proceedings - 9th IEEE International Symposium on Service-Oriented System Engineering, IEEE SOSE 2015, 30:153–160, 2015.

[35] Mazedur Rahman, Zehua Chen, and Jerry Gao. A service framework for parallel test execution on a developer's local development workstation. In Service-Oriented System Engineering (SOSE), 2015 IEEE Symposium on, pages 153–160. IEEE, 2015.

[36] Alfonso V. Romero. VirtualBox 3.1: beginner's guide: deploy and manage a cost-effective virtual environment using VirtualBox, chapter 1, pages 7–42. Packt Pub., 2010.

[37] Joao Rufino, Muhammad Alam, Joaquim Ferreira, Abdur Rehman, and Kim Fung Tsang. Orchestration of containerized microservices for IIoT using Docker. Proceedings of the IEEE International Conference on Industrial Technology, pages 1532–1536, 2017.

[38] Mathijs Jeroen Scheepers. Virtualization and containerization of application infrastructure: A comparison. In 21st twente student conference on IT, volume 1, pages 1–7, 2014.

[39] Sébastien Goasguen. Docker cookbook. O'Reilly, 2015.

[40] SeleniumHQ. Selenium Grid 2. https://github.com/SeleniumHQ/selenium/wiki/Grid2, 2016. [Online, accessed 27-October-2018].

[41] John Ferguson Smart. BDD in action: behavior-driven development for the whole software lifecycle. Manning, 2015.

[42] Andreas Spillner, Tilo Linz, and H. Schaefer. Software testing foundations, chapter 3, pages 39–77. Rocky Nook, 2014.

[43] Panagiotis Stratis and Ajitha Rajan. Test case permutation to improve execution time. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pages 45–50, New York, NY, USA, 2016. ACM.

[44] Marco Antonio To, Marcos Cano, and Preng Biba. DOCKEMU - A Network Emulation Tool. Proceedings - IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015, pages 593–598, 2015.

[45] Tutorialspoint.com. Node.js introduction. https://www.tutorialspoint.com/nodejs/nodejs_introduction.htm, 2009. [Online, accessed 15-November-2018].

[46] VMware Workstation. Using VMware Workstation. https://pubs.vmware.com/workstation-9/index.jsp?topic=%2Fcom.vmware.ws.using.doc%2FGUID-55FF3F07-6C2E-41F7-B361-C7D870BCC4D7.html, 2012. [Online, accessed 30-November-2018].