DEGREE PROJECT FOR MASTER OF SCIENCE IN ENGINEERING

GAME AND SOFTWARE ENGINEERING

Mocking SaaS Cloud for Testing

Johannes Henriksson | Simon Svensgård

Blekinge Institute of Technology, Karlskrona, Sweden, 2017

Supervisor: Mikael Svahnberg, Department of Software Engineering, BTH

Abstract

In this paper we evaluate how software testing is affected by the usage of a mock-object, a dummy implementation of a real object, in place of having data in a cloud that is accessed through an API. We define the problems for testing that having data in the cloud brings, which of these problems a mock-object can remedy and what problems there are with testing using the mock-object. We also evaluate whether testing using the mock-object can find the same faults as testing against the cloud and whether the same code can be covered by the tests. This is done at Blekinge Institute of Technology (BTH) by creating an integration system for the company Cybercom Sweden and Karlskrona Municipality. This integration system is made in C# and works by syncing schedules from Novaschem to a cloud service, Google Calendar. With this paper we show that a mock-object in place of a cloud is very useful for testing when it comes to clean-up, triggering certain states and avoiding query limitations.

Keywords: Mock-Object, Cloud Services, Testing, Test-Evaluation


Sammanfattning

I detta arbete utvärderar vi hur programvarutestning påverkas av användandet av ett mock-objekt, en dummy-implementation av ett riktigt objekt, istället för att ha data i ett moln som man kommer åt via ett API. Vi definierar de problem som uppkommer av att ha data i molnet, vilka problem som kan avhjälpas av mock-objektet och vilka problem mock-objektet medför. Vi utvärderar även om testning med ett mock-objekt kan finna samma fel som testning mot molnet och om samma kod kan täckas av testerna. Detta görs på Blekinge Tekniska Högskola (BTH) genom att skapa ett integrationssystem för företaget Cybercom Sweden och Karlskrona Kommun. Integrationssystemet görs i C# och fungerar som så att det synkar scheman från Novaschem till en molntjänst, Google Calendar. Med detta arbete visar vi att ett mock-objekt istället för molnet är väldigt användbart när det kommer till städning efter tester, att utlösa vissa tillstånd och för att undvika begränsningar.

Nyckelord: Mock-Objekt, Molntjänster, Testning, Testutvärdering


Preface

This thesis is the final step for our master in game and software engineering at BTH and represents 20 weeks of full time study.

At BTH, we would like to thank our supervisor Mikael Svahnberg and our examiner Emil Alégroth for valuable feedback during the project, as well as Bogdan Marculescu for help with insight into the world of academia.

We want to thank Alexander Andersson and Johan Persbeck at Cybercom Sweden for providing us with the idea and assistance throughout the project.

We also want to thank those who have read and given feedback on this thesis.


Nomenclature

Acronyms

API  Application Programming Interface
AWS  Amazon Web Services
FITTEST  Future Internet Testing
IaaS  Infrastructure as a Service
NIST  National Institute of Standards and Technology
PaaS  Platform as a Service
SaaS  Software as a Service
SUT  System Under Test
UI  User Interface


Table of Contents

Abstract
Sammanfattning (Swedish)
Preface
Nomenclature
   Acronyms
Table of Contents
1 Introduction
   1.1 Introduction
   1.2 Background
   1.3 Objectives
   1.4 Delimitations
   1.5 Thesis question and technical problem
2 Theoretical Framework
   2.1 Cloud Computing
   2.2 Testing
   2.3 Test Evaluation
3 Method - Design
   3.1 Literature Study
   3.2 Interview and Observations
   3.3 Testing
   3.4 Test Evaluation
4 Method - Execution
   4.1 Literature Study
   4.2 Interview and Observations
   4.3 System Under Test
   4.4 Testing
   4.5 Test Evaluation
   4.6 Validity Threats
5 Results
   5.1 Research Question 1
   5.2 Research Question 2
   5.3 Research Question 3
6 Discussion
   6.1 Coverage Measurements
   6.2 Mutation Score
   6.3 Advantages
   6.4 Disadvantages
   6.5 When to use a mock-object
   6.6 Sustainability
7 Conclusions
8 Recommendations and Future Work
References

A Interview Developer Cybercom
B Example unit tests from Test Suite
   B.1 EventExists
   B.2 NewActivityAdded
C Example Mock-Object Unit Test

1 INTRODUCTION

1.1 Introduction

Cloud computing [3] has been on the rise over the past decade[18], with a heavy increase in its use in the software industry. A cloud is, to put it simply, computer functionality that can be reached over a network. Cloud computing can, depending on the cloud service, be used to host executable code and/or data through different means of access: everything from simple Application Programming Interfaces (APIs) or User Interfaces (UIs) that give the user very little control, to full control over the system. The focus of this paper is the highest abstraction level of cloud services, Software as a Service (SaaS), where the user has the least control over the cloud system.

When it comes to testing of a system that uses data stored in a SaaS cloud, there are some extra challenges compared to having the data in a database. Since a SaaS cloud can only be accessed at a high abstraction level, such as through an API, the available requests can be limited both in how many requests can be sent and in how many different requests exist, which limits the testing that can be done. This area of testing is very relevant to the modern internet, and not much has been written about it before, which makes it an area with potential to evolve.

This study is focused on evaluating how testing such a system differs when using a mock of the SaaS, in the form of a database, compared to regular testing of the system. This is done in two main parts, with the first part being a classification of differences when testing an application that makes use of a SaaS cloud and a database respectively to store data. This classification is based on a literature study as well as observations and an interview at Cybercom.

The second part is done by implementing an integration system between a scheduling service by Nova Software, Novaschem[26], and a SaaS, Google Calendar[12]. This is done to see in practice what difficulties there are when it comes to testing this kind of system and whether a mock of the SaaS cloud is useful in a concrete scenario. In addition, the testing is evaluated using three test measurements: Block coverage, Branch coverage and Condition coverage.

With respect to cloud testing, there is plenty of research discussing how to use the cloud for testing[34][16]. However, very little is written about testing of applications that use cloud services. This study is made to see whether there are any specific problems and whether, and in what way, a mock pattern can be used to make testing easier.

1.2 Background

Cybercom Group Karlskrona has been tasked by the municipality of Karlskrona with creating an integration system between Novaschem, a scheduling system by Nova Software, and Google Calendar, a SaaS cloud. To make sure the system is robust, it needs to be well tested. Cybercom does not have any special routines for testing applications that use data in a cloud, and there are some challenges within this area that we thought could be explored more in depth, with the possibility of finding a better testing routine.

1.3 Objectives

The goal of this thesis project is to create a stable integration system between a scheduling service called Novaschem and Google Calendar. This integration system needs to be tested thoroughly to ensure that it is stable and functional. When testing, two different approaches will be compared: regular unit testing against Google Calendar, the SaaS cloud, and unit testing using a database as a mock object. These two approaches will be evaluated and compared both in terms of the challenges and experiences from applying the techniques and by measuring different unit test coverages for the two test suites.

1.4 Delimitations

The implementation of the integration system which will be used to compare and evaluate the testing will make use of a single SaaS, Google Calendar, and results could vary depending on which SaaS is used. However, some general conclusions can be drawn from the results when taking into account the other data sources for the case study, such as the theoretical framework and the documentation of other SaaS services. When looking at these SaaS services we limit ourselves to third-party public SaaS clouds.

The study is limited to unit testing only, and to how a database mock object in place of a SaaS can be used for this.

1.5 Thesis question and technical problem

The technical problem that has led to the research questions for this thesis is the implementation of a robust integration between a scheduling system called Novaschem, by Nova Software, and the cloud calendar system by Google, Google Calendar.

RQ1: When it comes to testing, what are the main differences between an application using a SaaS and one using databases for data storage?

RQ2: What are the challenges and experiences when unit testing against a mock-object compared to unit testing against the actual API when testing a cloud integration application?

RQ3.1: Given the proposed integration system, will the test coverage achieved differ between testing using the SaaS and using the mock-object?

RQ3.2: Given the proposed integration system, will the found defects differ between testing using the SaaS and using the mock-object?

2 THEORETICAL FRAMEWORK

The main theoretical parts of this study are cloud computing with a focus on SaaS clouds; testing using unit testing and mock-objects; and test evaluation using code coverage and mutations.

When it comes to testing and clouds there are many scientific papers about testing applications by using the cloud for computing power, called Testing as a Service (TaaS)[34][43], or about testing of systems located on a cloud, SaaS applications[16][32]. This part of testing and clouds is not relevant for this study and is not included in the theoretical framework.

When it comes specifically to testing of systems that use data in a SaaS cloud there is less material available. The framework thus contains separate research on cloud computing and mock testing, as well as the FITTEST research project, a more general research project spanning more than only SaaS clouds.

2.1 Cloud Computing

When talking about cloud computing, different researchers give different definitions of what it actually means.

In an article by Armbrust et al.[4] at UC Berkeley, cloud computing is defined as the sum of SaaS and Utility Computing. SaaS refers to software or services being delivered to customers across the internet, with the software running in datacenters instead of on client computers. Utility computing deals with the sale of computational resources, where clients are charged for computation time instead of paying flat rates for renting actual hardware. Armbrust et al.[4] also limit the definition of cloud computing to only include public clouds, which means that the systems are made available to the general public in some form, removing businesses' internal datacenters from the cloud computing definition.

The National Institute of Standards and Technology (NIST) in the USA has also made a more extensive definition of cloud computing, which states:

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models." [22]

The five essential characteristics are:

• On-demand self-service, which means that consumers should be able to provision computing server time, network storage and similar on their own without contacting service providers.

• Broad network access, which means that the service should be available over the network and accessible on different client platforms and devices such as phones, tablets, laptops or computers.

• Resource pooling, which means that the provider's computing resources should be pooled and serve multiple consumers by reassigning virtual resources dynamically to serve different consumer demands. The consumers should not have any direct knowledge about the location of physical resources except at a high abstraction level, such as where the datacenter is located.

• Rapid elasticity, which means that computing capabilities should be elastically provisioned and released, preferably automatically, so that they scale according to consumer demand, thus creating a sense of unlimited computing resources.

• Measured service, which means that cloud systems should automatically measure the service (bandwidth, storage, processing etc.) and use the measurements to control and optimize resource usage. The measurements can also be used for statistics for the consumers, as well as for the provider to calculate the price of the service depending on the consumer's usage.

The four deployment models deal with access to the cloud infrastructures, with private, public, community and hybrid models. The private cloud is a deployment that is exclusive to a single organization, while the community cloud serves a specific community of consumers that share specific concerns, e.g. security requirements. The public clouds are instead open to the general public rather than being restricted to specific companies. The hybrid clouds are a composition of at least two of the previous three deployment models that remain unique entities. However, the models are bound together by proprietary or standardized technology to enable portability for applications and data, which enables techniques like cloud bursting (using computational resources from the other cloud infrastructure in times of high demand).

Figure 2.1: Image showing the different service models with their different access levels.

The three service models represent different abstraction levels up from the hardware. Starting with the lowest level of abstraction we have Infrastructure as a Service (IaaS), which is more or less just providing hardware capabilities with an underlying cloud infrastructure. The consumers can choose everything from the operating system and upwards themselves, according to their needs.

At the next level of abstraction we have Platform as a Service (PaaS), which takes away certain control from the users. PaaS is used by users wanting to run their own application without keeping it running on their own hardware. The PaaS infrastructure does not let users control any of the underlying cloud infrastructure, which includes operating system, storage, network and servers. The users only control the deployed applications and configuration settings relating to the platform environment.

The final service model, with the highest abstraction level, is Software as a Service (SaaS), in which even the application is provided as a service. These applications run on the cloud infrastructure and are available to the users through different clients such as web clients or mobile application clients. The only configuration available to the users is specific settings for the application, leaving no control over underlying components like operating systems or servers. Some SaaS systems also provide an API for communicating with the service, to allow other developers to create their own systems that use the existing system or its data in some way.

The three different service models can easily be visualized as different layers of abstraction where the main difference is the access available, as illustrated in figure 2.1. As the figure illustrates, the service models can build upon each other, where each subsequent service model can be created based on the previous one. This means that a SaaS cloud can run upon a PaaS cloud, which in turn runs on an IaaS cloud. This, however, depends on the actual cloud provider, and different providers grant access to the different service models, usually at different price rates. One example is Microsoft Azure [31], which provides either a PaaS cloud or an IaaS cloud.

2.2 Testing

Software testing[29] is a practice that can be used during the entire development life-cycle of a piece of software. Starting from the requirements of the software, tests can be designed before the actual program and then be used to assess that the software fulfils the specifications[15]. For a released product, testing can be used both to further improve the system by finding previously hidden defects and to verify that new features and updates do not break previous functionality.

Testing can also be used to ensure a certain quality in the developed software[6]. Testing can be used to find defects in the software and to ensure that the software adheres to the specified design and purpose. The testing of the software can verify whether a system performs as expected, both during the development phase and during refactoring or maintenance. By discovering and eliminating defects in the software, the quality of the system is improved.

2.2.1 Unit Testing

Unit testing is a testing method where small parts of the code, units, are tested individually. In a system that uses an object oriented language and practices, the units are mostly the different classes of the system. Unit tests are used to test the functionality of the classes by calling different functions using specific input data. Due to inter-class dependencies in the system, a class can not always be isolated in unit tests. When the unit tests cover bigger and bigger parts of the system, the unit tests themselves generally become more and more complex as well, due to the inter-class dependencies[28].

2.2.1.1 Unit Testing with Mock-Objects

In 2001 Mackinnon et al. introduced the concept of unit testing with mock objects in the paper "Endo-Testing: Unit Testing with Mock Objects"[19]. In the paper they describe mock objects as dummy implementations that emulate the real functionality of the objects they represent. A unit testing pattern by Clifton[7] provides a brief description of the usage of mock objects for unit testing. The pattern makes use of abstract methods, interfaces and the factory pattern to be able to work with both the real implementation object and the mock object implementation.

Thomas and Hunt[37] present a list of seven reasons for using mock objects, simplified from the original paper by Mackinnon et al., as seen in the list below.

• The real object has non-deterministic behaviour.
• The real object is difficult to set up.
• The real object has behaviour that is hard to trigger.
• The real object is slow.
• The real object has (or is) a user interface.
• The test needs to ask the real object about how it was used.
• The real object does not exist yet.

However, there are also certain limitations associated with testing using mock objects. A big risk when using mock objects is the validity, or how well the mock can represent the actual object. Errors that are present in the mock object can both fail tests that should pass and pass tests that should fail, leaving errors that might have been caught when testing with the actual object.

The procedure for unit testing using mock objects described by Mackinnon et al.[19] can be described in a simple step-by-step format.

• Create instances
• Set the states in the mock objects (Setup preconditions)
• Set the expectations for the mock objects (Setup expected results)
• Call the domain code, using the mock objects (Run the actual test case)
• Verify the mock-objects (Verify the results with expected results, Asserts)
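To make the steps concrete, the following is a minimal C# sketch of a hand-written mock used in an MSTest unit test. The interface, the mock and the Synchronizer class are hypothetical illustrations of the pattern and are not taken from the thesis implementation.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Collections.Generic;

// Hypothetical collaborator that the domain code depends on.
public interface IEventStore
{
    void AddEvent(string eventId);
    bool Contains(string eventId);
}

// Minimal hand-written mock: records calls and exposes them for verification.
public class MockEventStore : IEventStore
{
    public List<string> AddedEvents { get; } = new List<string>();
    public void AddEvent(string eventId) => AddedEvents.Add(eventId);
    public bool Contains(string eventId) => AddedEvents.Contains(eventId);
}

// Hypothetical domain code under test: adds an event only if it is not already present.
public class Synchronizer
{
    private readonly IEventStore store;
    public Synchronizer(IEventStore store) { this.store = store; }

    public void Sync(string eventId)
    {
        if (!store.Contains(eventId))
            store.AddEvent(eventId);
    }
}

[TestClass]
public class SynchronizerTests
{
    [TestMethod]
    public void Sync_AddsEventOnlyOnce()
    {
        // 1. Create instances (mock and domain object).
        var mock = new MockEventStore();
        var sut = new Synchronizer(mock);

        // 2. Set the state of the mock (preconditions): the store starts empty.
        // 3. Set the expectations: exactly one event should end up in the store.

        // 4. Call the domain code using the mock.
        sut.Sync("lesson-1");
        sut.Sync("lesson-1");

        // 5. Verify the mock against the expectations.
        Assert.AreEqual(1, mock.AddedEvents.Count);
        Assert.IsTrue(mock.Contains("lesson-1"));
    }
}
```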

2.2.2 FITTEST Research Project

Between 2010 and 2013 there was a research project funded by the European Commission, led by Dr. Tanja E. J. Vos from Universidad Politécnica de Valencia. This research project was called Future Internet Testing (FITTEST) and had a focus on developing a test suite for Future Internet applications [11]. When testing Future Internet applications there are several challenges compared to testing a regular application [39]. The challenges that the researchers identified with testing Future Internet applications are described in the list below.

• Self Modification and Autonomic Behaviour: Many Future Internet applications make use of Service Level Agreements and dynamically loaded components to better be able to adapt to different use scenarios. Together with the autonomous behaviour of Future Internet applications, which makes them hard to properly define during the design phase, this creates a greater need for more testing; in the case of the FITTEST project this is complemented with Continuous Testing.

• Asynchronous Interactions: The highly asynchronous nature of Future Internet applications, with many clients accessing the applications and services with multiple requests, means that the testing requires additional concurrency aspects to achieve proper test coverage.

• Time and Load Dependent Behaviour: Reproducing bugs is made difficult by the effect that timing and load conditions have on the applications, which can cause specific errors to occur only under very specific conditions.

• Huge Feature Configuration Space: Future Internet applications usually have a large number of options and configurable features and environment details. This causes the applications to have a larger domain which requires testing.

• Ultra Large Scale: Generally speaking, Future Internet applications consist of systems of systems, which causes low test coverage even for good test situations, due to the inadequacy of traditional testing criteria for these types of systems.

• Low Observability: The systems that make up the Future Internet applications are increasingly third-party systems or services. These systems and services are often accessed in a black box fashion, which is harder to test compared to in-house systems.

To deal with the identified challenges, the researchers, Vos et al.[40], developed several different testing techniques catered to Future Internet applications. These techniques were then used in case studies at four different companies: IBM, Softeam, Sulake and Clavei. Each case study used a different subset of testing techniques depending on the needs of the company. The techniques developed for this research project are briefly described in the list below.

• Continuous Testing: For continuous testing they have developed a technique that creates new test cases based on logs generated by the end-users running the system. By using the logs they infer a finite state machine as a behaviour model for the System Under Test (SUT). By traversing the state machine, new test cases can be generated. To make sure that the new test cases are different from the logged executions, a combinatoric approach is used on the execution parameters. This technique uses oracles that are inferred from the same logs used for the test case generation. Because of this, all of the errors discovered need to be checked manually to make sure that they actually represent errors.

• Regression Testing: They use audit testing for their test suite as a form of regression testing. Their goal with this technique is to make sure that new services, or new releases of current services, are compliant with the system. Their approach is to perform a test case prioritization using a technique called Change Sensitivity Test Prioritization, which detects the most important test cases based on mutations of the semantics.

• Combinatorial Testing: This is a technique that designs test cases for a SUT by combining different input parameters. To improve on this they use a simulated annealing hyperheuristic search algorithm, a type of unsupervised machine learning that learns a combinatorial strategy to improve the test case generation.

• Rogue User Testing: This type of testing is focused on graphical UIs, as a way to automate the testing of all possible actions that the user can take from the interface provided by the application. The test uses the state of the graphical user interface as a starting point and selects and executes an action from all possible actions available in the current state. An oracle is then used to control the resulting state, saving sequences that produce invalid states for replay.

• Concurrency Testing: This technique deals with testing the issues that arise from concurrent users, such as data races and deadlocks. The FITTEST project has improved on an existing testing tool by IBM called IBM Concurrency Testing Tool. It works by inserting noise and delays into the program and then testing commands using different schedules to identify concurrency problems and unintended behaviour.

2.3 Test Evaluation

There are many different characteristics that can be used when evaluating different test suites. One can look at characteristics like the cost of generating tests, the time required to run the tests, different measurements that describe to what extent different areas of the code are tested, and the number of faults the tests detect.

2.3.1 Test Coverage

There exist many different types of measurements that can be evaluated for a test suite [13, 1, 44]; however, these measurements do not on their own specify how well a program is tested. Rather than being absolute quality measures, these measurements are used when comparing different test suites to each other. When combining the measurements with the results from the tests, such as errors and defects found, they can be used as a base for comparing test suites. In the list below, three different coverage measurements are described.

• Code coverage measures the percentage of the code that is covered by tests. Block coverage and Statement coverage are two coverage measurements with the same focus. Block coverage measures in terms of blocks, which are sequential statements without any outward or inward flow of control, while code coverage and statement coverage measure in terms of statements (lines of code). Code and Block coverage are two common coverage criteria used by many researchers [13, 1] when comparing test suites or coverage criteria. Full code or block coverage is achieved when all statements or blocks in the program are executed at least once by the tests in the test suite.

• Branch coverage, also known as decision coverage, is another common coverage criterion[13, 1, 44] which, instead of dealing with the program's total statement percentage like code coverage, measures how many branches have been executed. It measures how many branches have been traversed by the tests, that is, where a decision can evaluate to either true or false and take different paths depending on the result. As such, complete branch coverage is achieved when all of the branches have been executed at least once by the test suite. This means that all points of control flow in the program must evaluate to both true and false for complete coverage.

• Predicate coverage, also known as Condition coverage[13], can be seen as a more thorough version of branch coverage. Instead of looking only at the outcome of a decision, as is done in branch coverage, predicate coverage looks at every individual condition. This leads to complete predicate coverage requiring that every single predicate, or condition, in all of the program's decisions evaluates to both true and false during the execution of the test suite.
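As a small illustration of the difference between the criteria (the method below is hypothetical and not part of the SUT), consider a single decision that contains two conditions:

```csharp
// Hypothetical example used only to illustrate the coverage criteria.
public static class CoverageExample
{
    public static string Classify(bool isTeacher, bool isActive)
    {
        // A single decision containing two individual conditions.
        if (isTeacher && isActive)
        {
            return "sync";   // executing this block contributes to block/statement coverage
        }
        return "skip";
    }
}

// Classify(true, true) and Classify(false, true) make the decision evaluate to both true
// and false, which is enough for full branch coverage of the if-statement. Condition
// (predicate) coverage additionally requires the condition isActive to evaluate to false,
// so a call such as Classify(true, false) is needed as well (note that isActive is not even
// evaluated in Classify(false, true) because of short-circuit evaluation).
```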

2.3.2 Fault Detection

When evaluating and comparing different test suites, a measurement of how many actual faults or defects are found is good to take into consideration. A way to do this is by creating mutants of the program. A mutant is a program with a planted fault[45]. If a test suite detects a fault when running the mutated program, by having a different result on at least one test, the mutant is considered to be dead or killed. Test suites can then be evaluated and compared based on how many mutants the respective test suites manage to kill. The percentage of killed mutants out of the mutants created can be used as a measurement of mutation adequacy[45].
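Expressed as a formula, this mutation adequacy measurement (the mutation score used later in section 3.4.2) can be written as:

\[
\text{mutation score} = \frac{\text{number of killed mutants}}{\text{total number of mutants created}} \times 100\%
\]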


3 METHOD - DESIGN

A case study is used to investigate a contemporary phenomenon in a specific context while making use of multiple sources of evidence[33]. Because of these characteristics, a case study, following the guidelines provided by Runeson and Höst, was chosen as the base of the scientific method for this thesis. The case used in the case study was provided by Cybercom, an external company that needed a system implemented. The selection of the case can be seen as a revelatory study. This thesis makes use of different sources of data to answer the research questions, which is a defining criterion of a case study.

The data sources for the case study are the implementation of the integration system, which is used as the actual case, a literature study, field observations and an interview with a developer at Cybercom. The implementation part includes both the actual implementation of the system and the testing, coverage and fault detection measurements performed on the finished system. The data from the different sources is compared and combined to produce the results.

To get the results for RQ1, the literature, documentation for cloud services and the observations/interview at Cybercom are used to identify the differences. When the differences are identified, the results from RQ1 are used as a base for answering RQ2, by looking at how these differences are affected by the use of a mock-object. Observations are made during testing to see if there are additional experiences or challenges with using and implementing a mock-object for the SaaS cloud. Lastly, to answer RQ3, test coverage and fault detection are analysed using the developed mock-object.

Using an experiment as the scientific method was evaluated; however, in the scope of the entire thesis, a case study was deemed to be the better choice. The study makes use of both quantitative and qualitative data collection, while an experiment would look only at quantitative data[42]. These qualitative aspects make a case study a better choice than an experiment for this thesis.

Action research was also evaluated as a possible methodology, since it uses participant observation and interviews as key data collection[17]. While participant observation as well as an interview and field observations are used for data collection, a big focus of this research is on the actual implementation and testing, as well as on measuring the differences between test suites. As such, a case study is a better match as the methodology for this project.

The different data collection parts that are used for this study are described in greater detail in the sections below.

3.1 Literature Study

A literature study is a necessary part of the project, both to determine the research gap and which testing approaches are relevant for testing applications using data in the cloud, and to provide a solid theoretical framework for the project.

The literature study is used to derive the theoretical framework and also as a base for answering the first research question. The relevant scientific literature is used in combination with an interview with an employee of Cybercom Karlskrona and technical documentation of SaaS APIs as a base for mapping out the differences between testing an application using a SaaS and one using a database. This mapping is derived from a search for differences between cloud systems, with a specific focus on SaaS clouds, and databases.

3.2 Interview and Observations

Since the study is done for a company with experience in implementing and testing similar systems, developers at the company can give insight into the subject and what is common practice. Throughout the study, observations can be made.

Talking with developers who are knowledgeable about the subject adds a good base for knowing about the relevance of the subject, whether the current way of testing is in need of improvement, and also what is common practice when testing systems with data in a SaaS cloud.

In addition to observations made during informal meetings, a more formal semi-structured interview is planned[8]. This kind of interview uses a set of prepared open questions so that the interviewed person gets a clear topic but can stray a bit, so that unexpected perspectives may appear.

3.3 Testing

To see what difficulties and experiences there are when testing an application that uses data in a SaaS cloud, and to what extent a mock-object can be used for testing, a system with these properties is used. In order to compare the differences directly, a Mock-Object Pattern[7] which uses a database as the base for the mock-object can be used.

After implementing the mock-object, the system is tested using a test suite, which is run using both the mock-object connection and the real SaaS connection. This test suite is used to evaluate the usage of a database-backed mock-object for replacing the SaaS during testing. Tests that can not use both the mock-object and the SaaS connection are not part of the test suite used for the measurements comparing the mock-object and the SaaS. Some tests of this kind are, however, created to evaluate the challenges and experiences gained by testing with a mock-object and a SaaS.

3.4 Test Evaluation

The evaluation of the testing done on the system is performed in two parts: coverage measurements and fault detection.

3.4.1 Test Coverage

For the coverage measurements, three different types of coverage are evaluated:

• Statement coverage
• Branch coverage
• Condition coverage

The planned process for collecting the coverage data is described in figure 3.1.

The three coverage measurements are chosen as they represent which parts of the code are exercised when running the unit tests. Both the Statement and Branch coverage provide a good overview of which parts of the code are completely uncovered by tests, which can help in generating new test cases that exercise the uncovered code. The Condition coverage measurement, while similar to Branch coverage, is chosen because it represents more specific paths in the system. The Condition coverage will, when used in relation to the Branch coverage, be able to show where branches are not triggered by certain flags. This can point towards specific cases which are missed by the unit tests. While the coverages do not give any indication of whether the unit tests performed are of good quality or not, they can still provide useful information as to whether the test runs differ.

If the coverage results differ when running the test suites with different connections, it can point to different things. If the combined coverage is greater than the other two measurements, different code is executed when running the tests with the two different connections. This can potentially point towards validity concerns in the mock-object, since the results returned from the mock-object could be different from what is returned by the real SaaS object. The same can be said for when the coverages differ between the mock object and the real object: if the coverages are different, it points to inconsistencies between the two. While the mock-object should be a simplified implementation of the real object, the key functionality should still stay the same in order to be able to use it for testing. So having the same coverage for the two connection types is the desired outcome, since it points to the connection types running the same code.

Figure 3.1: Design of how the coverage measurements are taken. The code base will be tested by running two different test suites, as well as running them together. Coverage measurement tools are then used to obtain the coverage measurements for the two test suites as well as the measurements when combining the test suites.

3.4.2 Mutation Testing

When measuring the number of faults each test suite can find, the process in figure 3.2 is used. The system is injected with synthetic faults, creating mutated program versions called mutants. These mutant programs are then tested using the test suite, counting the number of mutants that are successfully eliminated (detected by the tests). The results from this process are presented as the percentage of the total number of mutants that each test suite is able to eliminate, called the mutation score.

If the mutation score differs between the two connection types, it can happen in two different ways. On one hand, if the real object manages to eliminate mutants that the mock-object does not, it points towards mock-objects not being able to comfortably replace the SaaS for testing purposes. When this happens it can be concluded that the mock-object can not be used to replace the real object for testing, as the test suite using the mock connection is not able to properly identify faults in the code.

On the other hand, if the mock-object instead manages to eliminate mutants that the real SaaS object does not, it becomes trickier. These faults then need to be examined, since they could represent actual errors in the code. But even if they represent an actual error, the question arises whether it is a relevant error. If the error can not be found when using the SaaS, is it an actual problem for the system? Fixing errors that can not happen in production could be seen as a waste of resources.

The goal is of course to have the same mutation score when using the mock-object connection and the real SaaS connection, as that points towards the mock-object being able to replace the SaaS object for testing purposes.

Figure 3.2: Design of how the number of faults detected is measured. The integration system receives planted faults, creating a set of mutants. These mutants are then tested using the test suites to obtain the percentage of eliminated mutants compared to the number of mutants created.

4 METHOD - EXECUTION

4.1 Literature Study

The literature study is executed using the reference databases Scopus[35] and Inspec[14], as well as with some assistance from researchers at BTH.

Since the cloud is a concept that is defined in many different ways, some care is needed to find relevant information. As noted in section 2, there is a lot of research on testing using clouds (TaaS) and on testing systems that are located on a cloud, which is irrelevant for this study. In order to avoid such results, it is necessary to filter them by reading titles and abstracts.

FITTEST[39] lists challenges with testing applications on the "Future Internet", which includes SaaS clouds. This list is used as a base for identifying the differences between testing an application using a SaaS and one using a database.

By looking at the characteristics that define cloud systems, and SaaS clouds in particular, different aspects that have an effect on the testing of applications using these systems can be found. The characteristics that define cloud systems are based on the theory and definitions of cloud systems. These definitions and theory can be examined together with the actual technical documentation of specific cloud systems to provide a set of constraints that affect the way applications that make use of these systems can be tested. These characteristics and constraints are then compared to how applications that use a database can be tested.

4.2 Interview and Observations

The company that the project is done for, Cybercom, has a Google integration consultant with experience in developing software that uses data in SaaS clouds as well as databases. Because of that, a person with knowledge of the subject was easily accessible for the semi-structured interview.

The interview, Appendix A, is conducted at the beginning of the project, taking place at Cybercom Karlskrona using questions that have been prepared before the interview. The interview is conducted face to face, using a recording device to record the audio and thereby avoid transcribing during the interview. The interview is then manually transcribed into text at a later date.

The questions used for the interview are created with the intent of seeing which problems are perceived as problems in the eyes of a professional developer with experience in the field. The problems identified by FITTEST that are applicable to systems using SaaS clouds are used as a base for the questions, while the questions are kept open in case there are other difficulties that were not identified prior to the interview.

As for the observations, regular meetings are held at Cybercom and problems and solutions are discussed with developers.

4.3 System Under Test

The system that is used for testing is an integration system between a scheduling software called Novaschem, by Nova Software, and Google Calendar's SaaS. An overview of the integration system's connections can be seen in figure 4.1. The testing of the system is done for the integration system (pink box), with the connection being controlled by a factory which instantiates either the real connection, to Google Calendar, or a Mock-Object which uses a MySQL database. The following subsections describe the system in greater detail.

Figure 4.1: An overview of the System under Test, with the connections mapped out. The outgoing connection is based on the Mock-Object pattern. The system makes use of a factory to create the mock-object or SaaS connection depending on the chosen setting.

4.3.1 Google Calendar

The SaaS that is used for this case study is Google Calendar. Google Calendar is a free-to-use online calendar service that can be accessed through a normal web browser such as Internet Explorer or Google Chrome, or through an API which is compatible with a wide array of programming languages, for example Python, Java, PHP, HTTP and C#/.NET.

All possible requests to the API, and how to use them, can be found on the Google developers webpage[12]. The core functionality provided by the API supports different methods of modifying the calendars and events. Events can be created, removed or edited in the different calendars owned by users. In a likewise fashion, calendars can also be created, edited and removed for users; however, the creation of calendars is more restricted, with heavier usage limits.

Google Calendar has several different usage limitations [5] when it comes to accessing the API, as is common for SaaS clouds. These limitations cover different aspects of the API and have different effects when they are exceeded. The limitations for Google Calendar can be seen in table 4.1. As the table shows, there is a hard limit for the number of queries or requests that can be made during a day. There is also a limit preventing too many queries from being made in a short period of time, in this case 100 seconds. These hard limits prevent any further requests for the duration the limit covers, either for the rest of the day or for the remainder of the 100 seconds.

For Google Calendar there are also more specific limitations imposed on the API, as seen in the bottom rows of table 4.1. These limits concern specific requests or actions that the user can perform, as a safeguard against either faulty or malicious use. When these specific limitations are exceeded, the SaaS goes into read-only mode for the user that exceeded them. The exact lockout time is not specified in the official documentation, which rather states that the lockout lasts for several hours. These limitations are above what Google considers normal usage of the SaaS and as such can only be reached when performing administrative tasks using third-party applications that make use of the API. To get a better understanding of the limits, and of the durations which are not specified in the documentation, a small test was conducted to help with deciding the overall control flow of the synchronization.

Type of Request     Limit       Limit Duration     Penalty for exceeding
All Queries*        1'000'000   Per Day            Hard limit - requests fail for limit duration
All Queries*        500         Per 100 Seconds    Hard limit - requests fail for limit duration
Create Calendar†    25          Short Duration     API becomes read only for a few hours
Create Event†       10'000      Short Duration     API becomes read only for a few hours
Invite to Event†    100-300     Short Duration     API becomes read only for a few hours

Table 4.1: Google Calendar usage limits
* https://developers.google.com/google-apps/calendar/pricing
† https://support.google.com/a/answer/2905486?hl=en
"Short Duration" is the limit duration mentioned in the documentation. Results from a short test to determine the duration can be seen in section 5.1.3.
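To illustrate how the hard limits in table 4.1 can surface in client code, the sketch below shows one way an application could react to them. The interface and exception type are hypothetical placeholders introduced only for this example; the real Google Calendar API signals exceeded limits through API error responses instead.

```csharp
using System;
using System.Threading;

// Hypothetical exception representing an exceeded usage limit from table 4.1.
public class RateLimitExceededException : Exception
{
    public bool IsDailyLimit { get; set; }
}

// Hypothetical connection abstraction (the real system hides the SaaS behind an
// interface in a similar way, see section 4.3.2.2).
public interface ICalendarConnection
{
    void CreateEvent(string userId, string eventData);
}

// Sketch: wait out the per-100-second window, but give up on the daily limit,
// since requests fail for the rest of the day.
public static class RateLimitAwareSync
{
    public static void CreateEventWithRetry(ICalendarConnection connection,
                                            string userId, string eventData)
    {
        try
        {
            connection.CreateEvent(userId, eventData);
        }
        catch (RateLimitExceededException e)
        {
            if (e.IsDailyLimit)
                throw;                                   // hard daily limit: nothing left to do today
            Thread.Sleep(TimeSpan.FromSeconds(100));     // per-100-second limit: wait out the window
            connection.CreateEvent(userId, eventData);   // single retry after the window has passed
        }
    }
}
```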

4.3.2 Integration System

The integration system is made in Microsoft Visual Studio using C# and .NET; C# and .NET are used since .NET is the only platform supported by the Novaschem API. The integration system is made for the Karlskrona Municipality to synchronize the schedules of pupils and teachers in the public schools to Google Calendar. Since all pupils and teachers already use Google Groups and Google Mail, they already have Google accounts and with that Google Calendar, so having the schedules in Google Calendar makes access to them much easier.

The system is made to be modular so that it can be modified into an integration between other calendar systems if needed. One part of the system is used for getting source data from Novaschem and parsing that data. If changes to the schedules have been made, these changes are interpreted into task objects. These tasks contain the data that is needed to perform a task, such as creating an event. All tasks are then sent in a list to another part of the system that manages the outgoing connections, either to Google Calendar or to the mock database.

The system is a Windows service, which means it runs in the background with no UI. Settings can be changed in a config file, and output such as errors and information messages can be sent to the administrator through the Windows event log and email.

4.3.2.1 Source Data Connection

The source data is taken from Novaschem through their API. This source data is received as a serialized file. The data representation is created from an XML schema using Microsoft's XML Schema Definition tool (XSD.exe)[24], which creates C# class representations from XML schemas. The source data can then be deserialized into the C# data structure representation or parsed directly in XML format.
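As an illustration of this step, the sketch below deserializes an XML file into XSD-generated classes using the standard XmlSerializer. The element names (Schedule, Activity) are made up for the example and do not come from the actual Novaschem schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes standing in for the ones XSD.exe generates from the schema
// (e.g. "xsd.exe novaschem.xsd /classes").
[XmlRoot("Schedule")]
public class Schedule
{
    [XmlElement("Activity")]
    public Activity[] Activities { get; set; }
}

public class Activity
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("Start")]
    public DateTime Start { get; set; }

    [XmlElement("End")]
    public DateTime End { get; set; }
}

public static class SourceDataReader
{
    // Deserializes the XML source data into the generated C# representation.
    public static Schedule Load(string path)
    {
        var serializer = new XmlSerializer(typeof(Schedule));
        using (var stream = File.OpenRead(path))
        {
            return (Schedule)serializer.Deserialize(stream);
        }
    }
}
```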

The source data contains the needed information about the events that are synchronized to Google Calendar. The data contains information about time, location and participants for the different events as well as information about course name and event descriptions.

When the source data is imported it is interpreted by the system which generates tasks. These tasks are designed to contain the necessary information required to synchronize a certain event or calendar with the SaaS cloud.

4.3.2.2 SaaS

The connection to the SaaS cloud, Google Calendar, is done through an interface. By going through an interface, the functionality that is needed can be defined in one place while the actual implementation of the functionality is taken care of elsewhere.

This is used in combination with the factory pattern to facilitate the creation of either the SaaS connection or the mock database connection. The usage of the same interface ensures that the SaaS connection and the mock database connection have the same outward functionality.
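A schematic C# sketch of this interface-and-factory arrangement is shown below. All names are hypothetical, and the mock is reduced to an in-memory dictionary here, whereas the actual mock-object is backed by MySQL (section 4.3.2.3).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface shared by both connection types (names are illustrative).
public interface ICalendarService
{
    string CreateEvent(string userId, string eventData);
    void DeleteEvent(string userId, string eventId);
}

// Stub standing in for the real wrapper around the Google Calendar API.
public class GoogleCalendarService : ICalendarService
{
    public string CreateEvent(string userId, string eventData)
    {
        throw new NotImplementedException("would call the SaaS API here");
    }

    public void DeleteEvent(string userId, string eventId)
    {
        throw new NotImplementedException("would call the SaaS API here");
    }
}

// Stand-in for the MySQL-backed mock-object, simplified to an in-memory dictionary.
public class MockCalendarService : ICalendarService
{
    private readonly Dictionary<string, string> events = new Dictionary<string, string>();
    private int nextId;

    public string CreateEvent(string userId, string eventData)
    {
        var id = (++nextId).ToString();
        events[userId + "/" + id] = eventData;
        return id;
    }

    public void DeleteEvent(string userId, string eventId)
    {
        events.Remove(userId + "/" + eventId);
    }
}

public enum ConnectionType { GoogleCalendar, MockDatabase }

// The factory is the only place that knows which implementation is used, so the
// synchronization code and the unit tests depend only on the interface.
public static class CalendarServiceFactory
{
    public static ICalendarService Create(ConnectionType type)
    {
        return type == ConnectionType.MockDatabase
            ? (ICalendarService)new MockCalendarService()
            : new GoogleCalendarService();
    }
}
```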

Because of the limitations on creating calendars and the system needing to be able to manage thousands of users, it is not possible in this case to create a calendar for each class group or user. Instead, every event is added directly into the primary calendar of every participant through impersonation. Impersonation means using a service account to gain access to a user in a domain and edit that user's calendars. Since the only calendars that are used are the primary calendars, the only types of access to the SaaS that are needed are create event, delete event and edit event.

4.3.2.3 Mock Object

The mock object for the SaaS cloud is created using a MySQL database. The database is reached through the same interface as the connection to Google Calendar. The mock database is created to resemble the functionality of the SaaS, to achieve the same results as when using the SaaS. However, since mock-objects are supposed to be simpler than the real code[19], completely identical functionality was neither the goal nor the actual outcome. Instead, the focus lies on simulating different states, which are then used to test the implementation that makes use of the mock-object.

To emulate the functionality of Google Calendar that the integration system makes use of, the database needs to be able to store events, users and connections between the users and events. This lets the database have states similar to those of Google Calendar, with a calendar being represented by all of the events that are connected to the user. In Google Calendar an event is connected to a specific calendar. However, due to the fact that this integration system only makes use of one calendar per user, the default primary calendar, the mock can be simplified by having the users directly connected to events instead of to multiple calendars, without losing any functionality.
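A possible table layout for such a mock database is sketched below as SQL strings held in a C# class. The table and column names are illustrative assumptions, not the actual schema used in the thesis implementation.

```csharp
// Hypothetical sketch of the tables a MySQL-backed mock-object could use to emulate the
// state of Google Calendar: users, events, and the connections between them.
public static class MockDatabaseSchema
{
    public const string CreateUsers = @"
        CREATE TABLE IF NOT EXISTS users (
            user_id    VARCHAR(64) PRIMARY KEY,            -- e-mail of the impersonated user
            read_only  TINYINT(1)  NOT NULL DEFAULT 0      -- manually triggered lockout state
        );";

    public const string CreateEvents = @"
        CREATE TABLE IF NOT EXISTS events (
            event_id    VARCHAR(64) PRIMARY KEY,
            summary     VARCHAR(255),
            location    VARCHAR(255),
            start_time  DATETIME NOT NULL,
            end_time    DATETIME NOT NULL
        );";

    // Each user has exactly one (primary) calendar, so events are linked directly
    // to users instead of to separate calendar rows.
    public const string CreateUserEvents = @"
        CREATE TABLE IF NOT EXISTS user_events (
            user_id  VARCHAR(64) NOT NULL,
            event_id VARCHAR(64) NOT NULL,
            PRIMARY KEY (user_id, event_id),
            FOREIGN KEY (user_id)  REFERENCES users(user_id),
            FOREIGN KEY (event_id) REFERENCES events(event_id)
        );";
}
```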

The states that are possible in Google Calendar are not only connected to the actual data that resides in the cloud. There are some states that are triggered when specific sequences of requests are made, e.g. when performing more than 500 requests in a 100-second time-frame. When this happens, the SaaS enters a lockout state in which further requests are denied for the remainder of the duration. These states are added to the mock database together with functions for triggering them manually.

To handle the data validation, like id-format checking, a C# layer between the system and the database is added. This layer handles the validation of parameters sent to the mock-object. It could have been implemented solely in the database, but it is easier to perform these checks in C# than directly in the database. The logic for this validation layer is based on the documentation of the SaaS (Google Calendar) to keep the mock-object as valid as possible.

The one thing that the mock-object does not support is the account authentication process. The login parts on their own could have been handled, since Google Calendar makes use of OAuth 2.0[27]. However, the permissions used by Google, with service accounts and domain authorization[36], can not be added to the mock-object, nor can the actual accounts. A completely fake authorization and permission system could have been added, but it would make the mock-object significantly more complex and would not provide much for the testing of the system.

The mock-object is tested manually, by performing actions against it and the respective actions against the real SaaS object. The resulting data from the mock-object is then checked and compared to the results gained from performing the actions against the SaaS. The same is done for "faulty actions", or actions where the SaaS would return error codes. Both the mock-object and the SaaS are subjected to the faulty actions, and the errors returned from the SaaS are compared against the results from the mock-object.

4.3.3 Limitations Test

The case study is based on the system described in section 4.3, which uses Google Calendar as the SaaS cloud, and the documentation on the query limitations is unspecific in some cases. Therefore, a small test was designed to gather more detailed information about the limit that looked like it would be a problem for both the SUT and the testing of the system. The limitation in question, from table 4.1, is that at most 25 calendars can be created within a "Short Duration" before the user goes into read-only mode for several hours. The documentation about this limit is rather brief and states neither what a short duration means nor how many hours the read-only mode lasts.

The test tries to determine the "Short Duration" mentioned in the documentation. This is done by letting an application create 30 calendars with a delay between each calendar created. By varying the delay and looking at how many calendars can be created before the API goes into read-only mode, a baseline is obtained for how long the "Short Duration" needs to be to avoid the read-only penalty.
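A sketch of how such a probe could be written is shown below. The createCalendar delegate is a hypothetical stand-in for the real API call and is assumed to throw when the API has gone into read-only mode.

```csharp
using System;
using System.Threading;

// Sketch of the limitations test: try to create 30 calendars with a fixed delay
// between calls and count how many succeed before the API goes read-only.
public static class CalendarLimitProbe
{
    public static int Run(Action<string> createCalendar, TimeSpan delayBetweenCalls)
    {
        int created = 0;
        for (int i = 0; i < 30; i++)
        {
            try
            {
                createCalendar("limit-test-" + i);
                created++;                        // request accepted
            }
            catch (Exception)
            {
                break;                            // read-only mode reached
            }
            Thread.Sleep(delayBetweenCalls);      // the delay that is varied between runs
        }
        return created;                           // calendars created before the lockout
    }
}
```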

4.4 Testing

Testing of the system was done using unit testing, written in the same language as the system, C#, using the Visual Studio unit test framework MSTest. A test suite is created for the system where the connection type is used as a setting to allow the suite to be executed using either the mock-object or the real SaaS object. The test suite contains 30 separate unit test cases which are implemented by us.

The test cases are written to cover the desired functionality of the system, representing the requirements of the system. The use cases taken from the requirements are then used to create tests in which the functionality is tested. An example of a use case is the addition of a new activity in the source data; this test can be seen in appendix B.2. In this test a new activity has been added to the source data, and the test checks that new events have been created, as well as checking the data of the added events.

The data used for these tests is taken from the test server provided by Nova Software and stored locally as XML files. This test data uses the same format as the real data, with the exception that it contains fake users in order to adhere to the Personal Records Act[30]. The data has then been altered slightly, so that the fake users in the data can be mapped against the test accounts available for Google Calendar.

The test suite also contains unit tests which focus on the system's handling of incorrect requests made to the mock-object/SaaS connection, and on the resulting behaviour of the synchronization queue. An example of this kind of test can be seen in appendix B.1. In this test the functionality for creating an identical event is tested, checking how the queue and logging handle this specific case. These test cases are of the shorter and simpler type. The test cases that are created from the system requirements are of a different type, using more advanced setup and testing bigger or more complex scenarios.

In addition to the created test suite, there are some standalone tests that are used to test the integration system. These tests are not used when evaluating the difference between using a mock-object and a SaaS; they are the types of tests that can not be run in the test suite for both the mock-object and the SaaS. One example of this kind of test is the test listed in appendix C, which tests the functionality of the system when the Daily Limit is triggered. This test can be run easily with the mock-object, since the limit can be manually triggered and reset. If it had to be done against the SaaS, the test would take a very long time to execute as well as prohibit all further access to the SaaS for the remainder of the day.

These standalone tests represent the types of tests that are lacking in the test suite. While these tests still help to test the system, as well as to evaluate the challenges of testing an application using a SaaS or a mock-object database, they are not used when measuring the coverage or mutation score for the two different connection types.

4.5 Test Evaluation

The test evaluation is done in two parts: measuring the coverage and evaluating fault detection using mutations. The evaluation is performed for both of the test suites described in section 4.4 above. The coverage measurements are also taken for the combined test suites, measuring how much the two test suites cover together. In addition, it was noted that the execution times for the suite varied a lot between using the SaaS and using the mock-object, so a test to see the difference is done by running the test suite ten times for each set-up and calculating the average test execution time. The set-ups tested are: against the SaaS, with the mock-object on an external computer within the same city, and with the mock-object on the same computer (localhost).
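A minimal sketch of how the average execution time over ten runs could be taken is shown below; runTestSuite is a hypothetical placeholder for executing the whole suite with a given set-up.

```csharp
using System;
using System.Diagnostics;

public static class TimingProbe
{
    // Runs the given action ten times and returns the average wall-clock time per run.
    public static TimeSpan AverageOverTenRuns(Action runTestSuite)
    {
        const int runs = 10;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            runTestSuite();
        }
        stopwatch.Stop();
        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / runs);
    }
}
```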

4.5.1 Test Coverage

The coverage measurements are taken using third-party tools. For Statement coverage, dotCover by JetBrains[10] is used; however, it does not support any other type of coverage measurement, so another tool is needed as well. To measure Branch and Condition coverage, the tool NCover[25] is used.

When measuring the Branch and Condition coverage the process described in figure 3.1 is used. The NCover tool automatically collects coverage during the execution of the test suites. The coverage measurements are presented both statistically, as percentages, and as a visual representation in the editor which allows the developers to see directly which code is covered. Both test suites are first executed separately to generate the coverage for the individual test suites, and then executed together to get the coverage of the combined test suites.

The Statement coverage is measured in a similar fashion to the other coverage measurements, but using dotCover instead. Like the NCover tool, dotCover provides both the actual measurement data and a visual representation in the form of highlighting in the editor.

4.5.2 Fault Detection

When evaluating the fault detection for the test suites the process described in figure 3.1 is used. This is done with the help of an external tool called VisualMutator[38]. By using the VisualMutator tool, mutation operators are applied to the original program, injecting faults into the source code.

The mutation operators used for this evaluation are all of the "Standard Operators" included with the VisualMutator tool; they can be seen in the list below.

• AOR - Arithmetic Operator Replacement
• SOR - Shift Operator Replacement
• LCR - Logical Connector Replacement
• LOR - Logical Operator Replacement
• ROR - Relational Operator Replacement
• OODL - Operator Deletion
• SSDL - Statement Block Deletion

For each mutant created by the tool, the test suite is run on the mutated program to identify whether the mutation is killed or left alive. If all of the tests in the test suite pass, the mutation is considered alive. If at least one test fails, the mutation is considered killed.
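As a small illustrative example (not code from the SUT), a Relational Operator Replacement (ROR) mutant and the kind of test that kills it could look as follows:

    // Original code (illustrative):
    //     if (failedAttempts >= maxAttempts) EnterBackoff();
    // ROR mutant generated by the tool ('>=' replaced with '>'):
    //     if (failedAttempts > maxAttempts) EnterBackoff();
    //
    // A test that sets failedAttempts equal to maxAttempts and asserts that backoff
    // is entered fails on the mutant but passes on the original, so the mutant is
    // counted as killed; if no test in the suite fails, the mutant stays alive.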

The total number of mutants generated by the tool and used for this evaluation is 281. These mutants are evaluated using the test suite described in section 4.4, consisting of 30 unit tests. The evaluation is done twice, once with the test suite using the mock-object and once using the real SaaS object.

4.6 Validity Threats

Because the tests against the SaaS use a public cloud, the test executions can be affected by uncontrollable factors. These factors include, for example, unexpected backend errors in the SaaS or unexpected downtime of the service. To alleviate these problems, both the mutation testing and the coverage measurements are taken multiple times, on different days, to ensure that the same results are obtained and that no unexpected factors affect the result.

The first plan for the mutation testing was to manually insert faults into the code. Because the test suite was created first, the mutations would suffer from bias if they were created and seeded by the same people who wrote the test suite. To get around this problem it was instead decided to use an external tool to generate the mutants, as described in section 4.5.2. Using a tool avoids the bias that manually injected faults would introduce, which would otherwise have been a large threat to the validity of the results.

5 RESULTS

5.1 Research Question 1

When it comes to testing, what are the main differences between an application using SaaS instead of databases for data storage?

Results summary: When it comes to testing, the three main differences are lack of environmental control, lack of transparency and stricter usage limitations.

Below is the list of challenges with testing of Future Internet Applications, according to FITTEST[39], together with whether each is applicable to systems with data in a SaaS cloud:

• Self Modification/Autonomic Behaviour - A SaaS is not controlled by the developers and can be changed at any time, but for commercially available SaaS services this should not happen frequently, and if the functionality of the service changes it affects more than just the testing.
• Asynchronous Interactions - Whether the system is threaded or not is not affected by where the data is stored.
• Time and Load Dependent Behaviour - Since a SaaS is not controlled by the developers, the state it is in is not fully configurable, and this could impact testing.
• Huge Feature Configuration Space - The size of the domain is not affected by where the data is stored.
• Ultra Large Scale - The scale of the SUT is not affected by where the data is stored.
• Low Observability - This is a defining point of a SaaS cloud.

Looking at this, it is possible to combine Self Modification/Autonomic Behaviour and Time and Load Dependent Behaviour into a more general term, lack of environmental control, since that is what creates those challenges.

In addition, another difference from databases can be seen in the documentation of SaaS services: there are many limitations on queries, as well as special limitations such as a maximum number of calls to a specific function per day.

5.1.1 Lack of environmental control

Summary: Limited access to functionality makes it harder to clean up after tests and to set up tests for the SaaS to return certain output.

Having to access a database through a UI or an API naturally lessens the control over it to some degree; compare having full control over the database with only having access to a set of stored procedures provided for it. This can be likened to a SaaS system, which by design is accessed through an API and/or a UI, as described in section 2.1.

When testing an application that only has access to the data through these APIs, the lack of control makes the testing harder, as the developers work at a higher abstraction layer with less control.

When new data is created in a database for testing, for example by adding new events, a return to the database's original state can be achieved through a simple roll-back. When working with a SaaS like Google Calendar[12], all access must go through the API. This means that to revert to the original state after a test that adds data, each query made must be matched by an additional query that performs the inverse action of the original tested query. In the example of Google Calendar, if the event-creation functionality of a program is tested, each event created must be removed with an additional deletion query instead of performing a simple roll-back. A single deletion query is not a problem, but in large quantities it can be. The deletion can also go wrong, preventing the clean-up, whereas a roll-back is guaranteed to succeed.
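The contrast can be sketched as follows. The database part uses standard ADO.NET transactions; the SaaS part uses an assumed calendarApi.DeleteEvent call, and all other names here are illustrative rather than taken from the SUT.

    // Local database: the whole test can run inside a transaction and be rolled back.
    using (var transaction = dbConnection.BeginTransaction())
    {
        CreateTestEvents(dbConnection, transaction);   // test actions against the database
        // ... assertions ...
        transaction.Rollback();                        // guaranteed return to the original state
    }

    // SaaS API: every created event must be undone with an explicit inverse query,
    // each of which can itself fail and counts against the usage limits.
    foreach (string eventId in createdEventIds)
        calendarApi.DeleteEvent(calendarId, eventId);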

By having access to the environment when testing, it is also possible to control the state of the connecting system or database. This can be used when testing by manually triggering certain states in which the system behaves differently. With manually triggered states, the preconditions for tests can be set instantly instead of replicating the steps which trigger a certain state. The steps required to trigger a state can be more time consuming and can also have lasting effects on subsequent tests if there is no quick way to revert to the original state.

5.1.2 Lack of transparency

Summary: Transparency makes defect analysis and removal easier and gives the individual tester a more comfortable situation.

Working with a SaaS means working with a black box, i.e. there is only access to the API/UI and the underlying functionality is hidden. In theory, knowing nothing more than the input for a query and its expected output could pose a problem for a developer. For example, in a function with full insight it is often possible to see where in the code things go wrong when the output is not as expected, but if data is sent into a black box and an unexpected output is returned it is much harder to know when and where the problem occurred.

The employee interviewed (Appendix A) said that he does not see this as a problem, since he expects Google's services to be stable and work as intended, but also that it is uncomfortable not to have full control.

FITTEST[39] mentions lack of transparency as a problem but does not motivate it further.

5.1.3 Query limitations

Summary: Limitations on the amount and speed of queries slow down testing and can lock out access for other tests, clean-up and development.

When it comes to commercial cloud services, most services have set usage limitations, which can be restrictions on virtual resources or hardware, or actual limits on the requests that can be made to the service. These limits vary between cloud service providers. In the list below, three big SaaS providers, Google (with focus on the calendar service), Amazon Web Services (AWS) and Microsoft, are presented with some of the limits they put on their services.

• Google - The official limits can be seen in Table 4.1. The first two rows are limits that apply to all Google services and the last three rows are special limits for Google Calendar. After some testing it could be noted that these limitations are not exact; the limit on creation of calendars was tested with varying sleep times between calendar creations. The results of this test, which can be seen in Table 5.1, show that the limit seems to be 37 calendars rather than 25, and that it is possible to create approximately three calendars per hour after reaching the limit. The sleep time does not seem to affect the number of calendars that can be created, so the "short amount of time" mentioned is fairly long, at least 37 ∗ 120 seconds = 74 minutes.

  Sleep time     Calendars created     Wait time     Calendars created
  10 seconds     37                    45 minutes    2
  30 seconds     37                    1 hour        3
  60 seconds     37                    2 hours       5/6
  120 seconds    37                    3 hours       8
  300 seconds    37                    6 hours       15

  Table 5.1: Limitations test results - Sleep time is the time in seconds until the next calendar is created, Calendars created is how many calendars could be created before going into read-only mode, and Wait time is the time after reaching the limit until the next try. All tests performed after going into read-only mode used a sleep time of 10 seconds.

• Microsoft - Microsoft provides several different SaaS options, like Office 365 and OneDrive. The OneDrive service can be accessed either through the standard user interfaces or through an API. When accessing through the API there are throttling limitations, similar to when using the Google SaaS APIs. Microsoft, however, does not provide any specific numbers on what the throttling limits actually are, since they reserve the right to change the limits at any given time[23].
• Amazon Web Services - Amazon has a long list of limitations[2] depending on which of their services is used; many of them are not relevant for this paper since Amazon does not only provide SaaS. The limits are mostly defined as requests per second or maximum numbers, e.g. of users or databases. Some limits can be increased on request.

This can be compared to the use of databases, where the only limitation is the actual hardware, which limits the processing power, together with the bandwidth limitations of the infrastructure. For some SaaS providers this can be the case as well, for example some of the AWS services, but in general there are more specific limitations for SaaS services, and the limitations can apply to specific queries as well. Observations at Cybercom showed that working against a SaaS cloud gives slow execution times for queries, which the employees think can be explained by built-in slowdowns by the provider.

Having limitations like these may naturally slow down the testing process if many tests need to be run. If a query limit is reached, this may also prevent tests from finishing, block new tests from running, and at times even stop other developers from running their code.

5.2 Research Question 2

What are the challenges and experiences when unit testing against a mock-object compared to unit testing against the actual API when testing a cloud integration application?

Results summary: A mock-object can be used to alleviate the problems of lack of environmental control and query limitations, but lack of transparency is a problem when designing the mock-object.

Thomas and Hunt[37] made a list, described in section 2.2.1.1, with seven reasons for using mock-objects to make testing easier. When looking at the testing of the integration system described in this paper the following reasons are found to be true for the SUT.

• The real object has behaviour that is hard to trigger.
• The real object is slow.
• The real object has (or is) a user interface.

Out of these three reasons, only the first two have an actual impact on the testing of the system. The third reason, "The real object has (or is) a user interface", does not affect testing for this system since the user interface of the SaaS, or real object, is not used at all by the system.

The other four reasons, listed below, given by Thomas and Hunt are not found to be relevant for this system.

• The real object has non-deterministic behaviour.
• The real object is difficult to set up.
• The test needs to ask the real object about how it was used.
• The real object does not exist yet.

When testing the integration system, described in section 4.3, several aspects were found in which the testing differs when using a mock-object. Some aspects are more challenging when making use of a mock-object, while others become easier.

Triggered States - The generation of tests for testing behaviours occurring during specific triggers, such as limitations lockout and network errors.

As noted in section 5.1.1, not having full access to the data can make it harder to trigger certain states, e.g. the rate limit for Google Calendar, when testing with data in a SaaS cloud. This makes a big difference when testing the behaviour of the system that occurs when these states are triggered. During the testing of the integration system it was found that using a mock-object made it easier to test the behaviour occurring when the SaaS reaches certain states. Testing with a mock-object can make use of what was also noted in section 5.1.1 for databases: manually triggering specific states. This is important when testing behaviour resulting from states that cannot be triggered by the developer but can be triggered by errors from the SaaS provider, like backend errors.

In certain cases the actions needed to trigger a state can also be unfit to carry out during testing. Considering the usage limits in Google Calendar, creating 10,000 events against live servers with actual test accounts is not feasible. In these cases a state cannot be triggered in a safe and workable way, which again is avoided by using a mock-object.

The time needed to test how the system handles certain output is much greater if the test involves actually reaching the limit. To examine this, a test was made to reach the rate limit for Google Calendar, which is 500 requests in 100 seconds. Since a request to Google Calendar takes some time to execute, the limit takes at least one minute to reach. After the limit is reached it blocks further requests, stalling the clean-up for the tests as well as the following tests. If a daily limit is reached, other testing and to some extent even development is stalled until the next day.
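A mock-object can instead simulate the limit without the waiting time. A minimal sketch, with assumed names that are not taken from the SUT, of a helper that counts requests in the 100-second window could look like this:

    using System;
    using System.Collections.Generic;

    // Illustrative only: simulates Google Calendar's 500-requests-per-100-seconds rate limit
    // so that the system's reaction can be tested in milliseconds instead of minutes.
    public class RateLimitSimulator
    {
        private readonly Queue<DateTime> requestTimes = new Queue<DateTime>();

        public bool RequestAllowed(DateTime now)
        {
            // Drop requests that fall outside the 100-second window.
            while (requestTimes.Count > 0 && requestTimes.Peek() < now.AddSeconds(-100))
                requestTimes.Dequeue();

            if (requestTimes.Count >= 500)
                return false;          // the mock would return the same error code as the SaaS here

            requestTimes.Enqueue(now);
            return true;
        }
    }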

Data Validation - The generation of tests that check for data validity faults and the resulting behaviour.

Mackinnon et al. say that "[t]he most difficult aspect is usually the discovery of values and structures for parameters that are passed into the domain code"[19] when it comes to mocking. The more complex the system that is mocked, the harder it is to guarantee that the mock behaves in the same way as the real object for a given input.

During the implementation and testing of the SUT it could also be noted that this is, in practice, the biggest disadvantage of testing with a mock-object. Testing how the mock-object itself handles faulty input says little about how robust the system is, and testing how the SaaS handles faulty input is not possible with a mock-object. What is useful, however, is testing how the system handles the output the mock/SaaS returns when faulty input is sent. Implementing error handling that works exactly like the SaaS can be very detailed and cumbersome, and it is mostly not worth the effort as long as the mock is able to return all output that is needed for testing the system. In the case of Google Calendar, a lot of faulty input results in the error "BadRequest". Implementing error handling in the mock-object that returns a "BadRequest" for only a single scenario is a way of creating a simpler mock-object that is able to return the error code without being able to handle all input that could produce it.
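A minimal sketch of this idea, with assumed names and an assumed Fail_BadRequest result code (not taken from the SUT), could look as follows inside the mock-object:

    // Illustrative sketch: the mock does not reproduce all of the SaaS input validation;
    // it only recognizes one known-bad scenario and returns the error code the SaaS would.
    public TaskResultCodes CreateEvent(CreateEvent task)
    {
        if (string.IsNullOrEmpty(task.EventID))          // the single faulty input the mock understands
            return TaskResultCodes.Fail_BadRequest;      // mirrors the SaaS "BadRequest" response

        events[task.EventID] = task;                     // events: the mock's in-memory store (assumed)
        return TaskResultCodes.Success;
    }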

Test Clean-up - Bringing the object back to its original state after performing tests.

In addition to the problems with data validation, lack of environmental control brings one more difficulty when testing against a SaaS cloud: as noted in section 5.1.1, test clean-up. A mock-object has a big advantage over testing using the SaaS because of the roll-back capability, which none of the examined SaaS providers offer.
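A sketch of how this could be used in the suite's clean-up is shown below; the Reset method and the class names are assumptions (not the SUT's actual code), while createdEvents and testPersonNr follow the naming used in appendix B.1. When running against the SaaS, the inverse deletion queries are still needed.

    [TestCleanup]
    public void CleanUp()
    {
        var mock = connection as MockCalendarConnection;
        if (mock != null)
        {
            mock.Reset();    // drop everything created during the test: the "roll-back" the SaaS lacks
        }
        else
        {
            foreach (string eventId in createdEvents[testPersonNr])
                connection.DeleteEvent(testPersonNr, eventId);   // inverse queries against the SaaS
        }
    }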

Test Execution Times

During testing of the integration system a big difference in execution times can be seen depending on which object the tests are run against. Running the test suite against the SaaS took on average 99.05 seconds. Running it against the mock-object located externally within the same city took on average 39.80 seconds, which is about 2.5 times faster than against the SaaS. Running it against the mock-object located on the same machine (localhost) took on average 5.76 seconds, which is about 17.2 times faster than running against the SaaS. These results will differ depending on which SaaS provider is used, where the mock-object is located and how it is implemented.

5.3 Research Question 3

RQ3.1: Given the proposed integration system, will the test coverage achieved differ between testing using the SaaS and using the mock-object?

Results Summary: No difference in coverage was found.

Figure 5.1: Graph displaying the three different coverage measurements, for both the SaaS connection and the mock-object connection.

As can be seen in figure 5.1 the coverage did not differ at all between testing using the SaaS and testing using the mock-object.

RQ3.2: Given the proposed integration system, will the found defects differ between testing using the SaaS and using the mock-object?

Results Summary: No difference in mutation score was found.

  Connection Type   Mutants Alive   Mutants Eliminated   Mutation Score
  SaaS              59              222                  79%
  Mock-Object       59              222                  79%

  Table 5.2: Results of the mutation testing described in section 4.5.2. 281 mutants in total were injected into the source code; the test suite contained 30 test cases.

The same pattern can be seen for the mutations, table 5.2: testing using the SaaS eliminated the same mutations as testing using the mock-object.
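For reference, the mutation score in table 5.2 follows directly from the counts: score = eliminated mutants / total mutants = 222 / 281 ≈ 0.79, i.e. 79% for both connection types.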

6 DISCUSSION

6.1 Coverage Measurements

As can be seen in figure 5.1, the two different connection types, mock and SaaS, achieve the same amount of coverage, regardless of the type of coverage measurement taken. From this data the conclusion can be drawn that the test suite covers the same amount of code regardless of the connection used. By looking at the combined bar, in which coverage is measured using the test suite with both connections in separate executions, it can be seen that the coverage is not just the same amount but covers the same actual code. If the coverage differed between the two connection types, the combined measurement would increase.
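As an illustrative example with made-up numbers: if the mock run covered statements 1-800 and the SaaS run covered statements 1-750 and 801-850, both individual bars would show the same count (800 statements), but the combined bar would rise to 850. Since the combined bar in figure 5.1 does not rise, the two runs must cover the same statements.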

While the graph in figure 5.1 shows that the test suite executes the same code in the integration system regardless of connection type, it does not show anything beyond the fact that the code has been executed when running the test suite. This is one of the biggest problems with test coverage measurements; as stated in section 2.3.1, test coverage cannot be used to measure how well a program is tested. That the coverage stays the same for both connection types therefore does not say much beyond confirming that the same code is executed, which can be seen as a goal for the mock-object connection in being able to replace the real SaaS object for testing.

6.2 Mutation Score

When looking at how the two different connection types actually performed, the ability to detect faults in the code is a much better measurement than test coverage, since it reports actual faults found rather than just code that has been executed. Because the tests verify that the data is valid after execution has finished, checking whether faults are actually detected by the test suite is the more meaningful measurement. This is examined through the fault detection, or mutation testing, described in section 4.5.2.

The results of the mutation testing, presented in table 5.2, show that both connection types performed the same for the test suite. This means that the same number of faults is found by the test suite regardless of which connection type is used. Since, as stated in section 5.3, it is the same mutations that have been eliminated, and not just the same mutation score, the test suite is as good at finding faults when using the mock-object as when using the real SaaS object.

By having the same mutation score for the two connection types it is shown that the mock-object works for replacing the real object for testing purposes. Combined with the results gained from the coverage measurements, it is also shown that the exact same code is executed for both connection types, further strengthening this conclusion. When using a mock-object for testing there are several advantages which are presented in the following section.

6.3 Advantages

As noted in section 5.2, the main advantages of testing with a mock-object instead of the SaaS are alleviating the problems of lack of environmental control and query limitations, as well as a possible decrease in execution time.

If it is possible to have the mock-object located on the same machine the testing is run on, test execution times can be greatly decreased compared to using the SaaS. But even having it at a different location can create a speed-up; for this project the mock-object executed the tests over 2 times faster with the database in the same city and about 17 times faster on the same machine. For a system with many developers wanting to access the SaaS, or with a lot of continuous testing, such a speed-up can be a big factor in favour of the mock-object.

However, this speed-up could also be a problem for the mock-object, as it does not perform the same as the real object in terms of execution times. The delay introduced by the SaaS could in theory affect certain tests. Running the test suite against the real SaaS object, or with artificial delays that mimic the delay of the SaaS, can catch faults caused by this. This could be done once in a while to ensure stability while still keeping the advantage of the speed-up.
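A minimal sketch of such an artificial delay, with assumed names that are not taken from the SUT, could be a configurable pause in the mock connection:

    using System;
    using System.Threading;

    public class DelayedMockConnection
    {
        // Optional artificial delay that mimics the round-trip time of the real SaaS.
        public TimeSpan ArtificialDelay { get; set; } = TimeSpan.Zero;

        public TaskResultCodes CreateEvent(string calendarId, string eventId)
        {
            if (ArtificialDelay > TimeSpan.Zero)
                Thread.Sleep(ArtificialDelay);     // simulate network and SaaS processing latency
            // ... normal mock behaviour ...
            return TaskResultCodes.Success;
        }
    }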

6.3.1 Lack of environmental control

Lack of environmental control is the main difference between a SaaS and a database or PaaS. For testing it comes with problems; the most notable are getting specific output returned from the SaaS, reaching certain states, and clean-up. This is an area where creating a mock of the SaaS is very useful, since full access to the data is gained.

Having full access makes it possible to return any output from the mock-object, which is handy for testing the system's behaviour for output that can be returned from the SaaS but is hard or time-consuming to trigger, e.g. "BackendError" for Google Calendar, query limitations, or edge cases like int.MaxValue. Some caution should be taken here, though: if output is hard to trigger in a test it may also never be triggered during real execution of the system, making a test for that case an unnecessary use of development time.

The mock-object is also a good tool for handling the clean-up, as mentioned in section 5.2. This is not always a necessary part of testing, but in some cases tests are designed in a way that requires a clean or specifically set up environment, which may be hard or even impossible to achieve in a SaaS. If such an environment is needed, a mock-object is a good way of achieving it. Another time clean-up can be important is when there are no dedicated test users and cluttering the data of real users is unwanted.

6.3.2 Query limitations

Query limitations are one characteristic that sets many SaaS apart from databases. While databases can also have throttling limits, a SaaS usually adds one more layer of limitations, having both a maximum query limit and stricter limits for specific actions.

When it comes to testing, query limits like these are normally not low enough to impact the number of tests that can be run in a smaller system like the one used here. In a larger system with more developers, or if the limits are very low, it could however happen. If these limits are reached during testing this becomes a problem, as queries fail or are slowed down. As a practical example, when using the Google Calendar API, after a certain number of requests has been made, as can be seen in table 4.1, all requests are blocked. This lockout could last up to a day, during which no further executions can be done, stopping testing, clean-up and development alike.

6.4 Disadvantages

The only real disadvantage of using a mock-object for testing is the development and maintenance of the mock-object itself. The development time is an upfront investment, during which the mock-object also needs to be validated to behave like the system it impersonates. This development time is proportional to how complex the SaaS to be mocked is; a more complex SaaS is harder and more time consuming to create a mock-object for. Maintaining the mock-object mainly comes down to keeping the functionality of the mock the same as that of the SaaS.
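One way such validation could be performed, sketched here under the assumption of the shared ICalendarConnection interface used in the earlier sketch (the names are illustrative, not the SUT's), is a small contract test that runs the same scenario against both objects and compares the observable results:

    [TestMethod]
    public void MockAndSaaSReportTheSameErrorForDuplicateEvents()
    {
        foreach (ICalendarConnection conn in new ICalendarConnection[]
                 { new MockCalendarConnection(), new GoogleCalendarConnection() })
        {
            string eventId = GenerateEventID();
            Assert.AreEqual(TaskResultCodes.Success, conn.CreateEvent(testPersonNr, eventId));

            // Both implementations are expected to reject the duplicate in the same way.
            Assert.AreEqual(TaskResultCodes.Fail_IDExists, conn.CreateEvent(testPersonNr, eventId));

            conn.DeleteEvent(testPersonNr, eventId);   // clean up the created event
        }
    }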

The development of the mock-object also depends on what information is available about the SaaS that should be mocked. This problem is related to the lack of transparency, section 5.1.2: the less transparent the SaaS, the more difficult the development of the mock-object. The more extensive the documentation available for the system to be mocked, the easier it becomes. If the developers need to manually find out which edge cases exist and how the SaaS handles them, the time required to develop and validate the mock-object increases drastically. The same applies when the documentation does not match reality, as described in section 5.1.3, where the SaaS limitations listed in table 4.1 do not correspond to the limitations observed when testing the SaaS, as listed in table 5.1.

6.5 When to use a mock-object

When deciding whether a mock-object should be used for testing, all these advantages and disadvantages should be taken into consideration: the complexity of implementing a mock of the SaaS versus the utility of the extra environmental control. The initial cost is higher, but simpler and faster testing returns the value over time.

A mock-object is also very useful if it is not possible to test against the SaaS, for example if the SaaS does not exist or can not be accessed yet.

6.6 Sustainability

The relevant sustainability aspects of this paper come from the usage of cloud computing. By researching more efficient ways to test applications that make use of SaaS, more applications could in theory make use of SaaS. Cloud computing makes heavy use of both resource pooling and measured service delivery, as described in section 2.1, which means that resources are shared between different customers, leading to better environmental sustainability[9, 41, 21, 20].

Looking at economic sustainability there are two interesting factors. The first is connected to the same point discussed for environmental sustainability: the usage of cloud computing. By making use of cloud computing a company can limit its spending on servers to the actual amount used, paying only for the consumed resources. Owning servers dimensioned for peak activity is usually a waste of resources compared to paying only for actual overall usage.

The other interesting factor for economic sustainability is the usage of the actual mock-object. By using a mock-object and getting past the limitations of a SaaS, as discussed in section 6.3.2, potentially more development and testing could be performed. This can in turn lead to better economic growth through the development of additional features and increased product quality.

7 CONCLUSIONS

This paper looks at differences between testing systems that are using data in a SaaS cloud instead of a database, how the problems with testing these systems can be remedied by using a mock-object instead of the SaaS and finally if the test coverage and mutation score would differ between testing with the mock-object or the SaaS.

The differences for testing between using a database and a SaaS cloud are that for SaaS clouds there is a lack of environmental control, lack of transparency and stricter usage limitations. Lack of environmental control makes certain output from the SaaS hard to trigger and makes clean-up harder. Lack of transparency makes it harder to identify faults and is less comfortable for the testers. Stricter limitations can slow down testing and can make the SaaS inaccessible for testers and developers.

It is shown that a mock-object is very useful for problems with usage limitations and lack of environmental control, as well as being able to provide a speed-up for test execution. The test coverage and mutation score did not differ for this project, meaning that the mock-object is as good at finding faults as using the SaaS directly.

The only disadvantages of creating a mock-object are the initial cost of implementing it and the lack of transparency. Lack of transparency is a problem if the SaaS has unexpected behaviour that makes it hard to recreate the same functionality in the mock-object. A more complex SaaS makes it harder to recreate all of its functionality and increases the implementation cost.

In conclusion it was shown that a mock-object can find the same faults as using the real SaaS and that the mock-object is a very powerful tool for testing when it comes to systems that are using data in a SaaS cloud.


8 RECOMMENDATIONS AND FUTURE WORK

This study could be expanded to examine different SaaS services and testing approaches. Different SaaS services have different properties, and seeing how the testing of systems using various SaaS services differs could be valuable, as could identifying which properties are hard to recreate in a mock-object. This study only uses unit testing, so similar studies with other testing approaches and techniques, such as system tests, regression tests or stress tests, may give other experiences. Looking at whether and how mock-objects are used instead of SaaS clouds for testing in companies would also be interesting, to see if work has been done commercially that is not available academically.

The design of the mock-object could be examined as well; this study uses a database, but other solutions deserve to be tried, for example having the mock-object be entirely memory-based or write to file. For smaller systems this could be a useful alternative since the implementation complexity should be lower.

Something that could be very useful for testing of this kind of system is a best practice guide for the mock-object. This study shows what a powerful tool the mock-object is but more work could be done to see how the mock-object is best created for a certain system.


REFERENCES

[1] Martijn Adolfsen. "Industrial validation of test coverage quality". In: (2011).
[2] Amazon Web Service Limitations. 2017. url: http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html.
[3] Michael Armbrust et al. "A view of cloud computing". In: Communications of the ACM 53.4 (2010), p. 50. issn: 00010782. doi: 10.1145/1721654.1721672.
[4] M. Armbrust et al. "Above the clouds: A Berkeley view of cloud computing". In: University of California, Berkeley, Tech. Rep. UCB (2009), pp. 07-013. issn: 00010782. doi: 10.1145/1721654.1721672.
[5] Calendar Limits. url: https://support.google.com/a/answer/2905486?hl=en#.
[6] Poonam Chaudhary and Seema Sangwan. "Software Testing: Affirming Software Quality". In: International Journal of Innovations in Engineering and Technology (IJIET) 5.3 (2015), pp. 378-383.
[7] Marc Clifton. Mock-Object Pattern. 2004. url: https://www.codeproject.com/Articles/5772/Advanced-Unit-Test-Part-V-Unit-Test-Patterns#Mock-Object%20Pattern25.
[8] D. Cohen and B. Crabtree. Qualitative Research Guidelines Project. 2006. url: http://www.qualres.org/HomeInte-3595.html.
[9] Konstantinos Domdouzis. "Sustainable Cloud Computing". In: Green Information Technology: A Sustainable Approach (2015), pp. 95-110. issn: 18670202. doi: 10.1016/B978-0-12-801379-3.00006-1.
[10] dotCover. url: https://www.jetbrains.com/dotcover/.
[11] Fittest Web. url: http://crest.cs.ucl.ac.uk/fittest/project.html.
[12] Google. Google Calendar API. url: https://developers.google.com/google-apps/calendar/.
[13] Atul Gupta and Pankaj Jalote. "An approach for experimentally evaluating effectiveness and efficiency of coverage criteria for software testing". In: International Journal on Software Tools for Technology Transfer 10.2 (2008), pp. 145-160. issn: 14332779. doi: 10.1007/s10009-007-0059-5.
[14] Inspec. url: http://www.theiet.org/resources/inspec/index.cfm.
[15] Cem Kaner. "What Is a Good Test Case?" In: Software Testing Analysis & Review Conference (STAR East) (2003), pp. 1-16.
[16] Manveen Kaur. "Testing in the Cloud: New Challenges". In: (2016), pp. 742-746.
[17] Ned Kock. "Action Research". In: The Encyclopedia of Human Computer Interaction. 2nd Edition. 2013. Chap. 33. isbn: 9788792964. url: https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed.
[18] Neal Leavitt. "Is cloud computing really ready for prime time?" In: Computer Society IEEE 42.1 (2009), pp. 15-25. issn: 00189162. doi: 10.1109/MC.2009.20. url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4755149.

[19] Tim Mackinnon, Steve Freeman, and Philip Craig. "Endo-Testing: Unit Testing with Mock Objects". In: Extreme Programming Examined (2001), pp. 287-301. url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.3214&rep=rep1&type=pdf.
[20] Alexandros Marinos and Gerard Briscoe. "Community cloud computing". In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5931 LNCS (2009), pp. 472-484. issn: 03029743. doi: 10.1007/978-3-642-10665-1_43.
[21] Dragan S. Markovic et al. "Smart power grid and cloud computing". In: Renewable and Sustainable Energy Reviews 24 (2013), pp. 566-577. issn: 13640321. doi: 10.1016/j.rser.2013.03.068. url: http://dx.doi.org/10.1016/j.rser.2013.03.068.
[22] Peter Mell and Timothy Grance. "The NIST definition of cloud computing". In: NIST Special Publication 145 (2011), p. 7. issn: 00845612. doi: 10.1136/emj.2010.096966. url: http://www.mendeley.com/research/the-nist-definition-about-cloud-computing/.
[23] Microsoft. OneDrive Limits. url: https://social.msdn.microsoft.com/Forums/en-US/c4aaa90c-dc75-441f-9b30-f5c45e402ac4/limitation-on-api-requestfile-size-limitationresume-upload?forum=onedriveapi.
[24] Microsoft. XSD-Tool. url: https://msdn.microsoft.com/en-us/library/x6c1kb0s(v=vs.110).aspx.
[25] NCover. url: https://www.ncover.com/.
[26] Nova Software. Novaschem. url: http://www.novaschem.com/.
[27] OAuth. url: https://oauth.net/2/.
[28] Benny Pasternak, Shmuel Tyszberowicz, and Amiram Yehudai. "GenUTest: A unit test and mock aspect generation tool". In: International Journal on Software Tools for Technology Transfer 11.4 (2009), pp. 273-290. issn: 14332779. doi: 10.1007/s10009-009-0115-4.
[29] Ron Patton. Software Testing, Second Edition. 2nd Edition. Sams, 2005. isbn: 0-672-32798-8.
[30] PUL. url: https://lagen.nu/1998:204.
[31] Carl Rabeler et al. SQL (PaaS) Database vs. SQL Server in the cloud on VMs (IaaS) | Microsoft Docs. 2017. url: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-paas-vs-sql-server-iaas.
[32] Leah Riungu-Kalliosaari, Ossi Taipale, and Kari Smolander. "Testing in the cloud: Exploring the practice". In: IEEE Software 29.2 (2012), pp. 46-51. issn: 07407459. doi: 10.1109/MS.2011.132.
[33] Per Runeson and Martin Höst. "Guidelines for conducting and reporting case study research in software engineering". In: Empirical Software Engineering 14.2 (2009), pp. 131-164. issn: 13823256. doi: 10.1007/s10664-008-9102-8.
[34] Dmitry Savchenko, Nikita Ashikhmin, and Gleb Radchenko. "Testing-as-a-service approach for cloud applications". In: Proceedings of the 9th International Conference on Utility and Cloud Computing - UCC '16 (2016), pp. 428-429. doi: 10.1145/2996890.3007890. url: http://dl.acm.org/citation.cfm?doid=2996890.3007890.
[35] Scopus. url: https://www.scopus.com/home.uri.

[36] Service Account Documentation. 2017. url: https://developers.google.com/identity/protocols/OAuth2ServiceAccount.
[37] Dave Thomas and Andy Hunt. "Mock objects". In: IEEE Software 19.3 (2002), pp. 22-24. issn: 07407459. doi: 10.1109/MS.2002.1003449.
[38] VisualMutator. url: https://visualmutator.github.io/web/.
[39] Tanja E. J. Vos et al. "Future Internet testing with FITTEST". In: Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR (2011), pp. 355-358. issn: 15345351. doi: 10.1109/CSMR.2011.51.
[40] Tanja E. J. Vos et al. "The FITTEST Tool Suite for Testing Future Internet Applications". In: Future Internet Testing: First International Workshop, FITTEST 2013, Istanbul, Turkey, November 12, 2013, Revised Selected Papers. Ed. by Tanja E. J. Vos, Kiran Lakhotia, and Sebastian Bauersfeld. Cham: Springer International Publishing, 2014, pp. 1-31. isbn: 978-3-319-07785-7. doi: 10.1007/978-3-319-07785-7_1. url: http://dx.doi.org/10.1007/978-3-319-07785-7_1.
[41] Daniel R. Williams, Peter Thomond, and Ian Mackenzie. "The greenhouse gas abatement potential of enterprise cloud computing". In: Environmental Modelling and Software 56 (2014), pp. 6-12. issn: 13648152. doi: 10.1016/j.envsoft.2013.11.012. url: http://dx.doi.org/10.1016/j.envsoft.2013.11.012.
[42] Claes Wohlin et al. Experimentation in Software Engineering. Springer International Publishing, 2012.
[43] Lian Yu et al. "Testing as a service over cloud". In: Proceedings - 5th IEEE International Symposium on Service-Oriented System Engineering, SOSE 2010 (2010), pp. 181-188. issn: 09505849. doi: 10.1109/SOSE.2010.36. arXiv: 0402594v3 [cond-mat].
[44] Yuen Tak Yu and Man Fai Lau. "A comparison of MC/DC, MUMCUT and several other coverage criteria for logical decisions". In: Journal of Systems and Software 79.5 (2006), pp. 577-590. issn: 01641212. doi: 10.1016/j.jss.2005.05.030.
[45] Hong Zhu, Patrick A. V. Hall, and John H. R. May. "Software unit test coverage and adequacy". In: ACM Computing Surveys 29.4 (1997), pp. 366-427. issn: 03600300. doi: 10.1145/267580.267590.

A INTERVIEW DEVELOPER CYBERCOM

Have you worked with developing applications that use data in the cloud?

Yes

What type of system have you developed?

It is an integration between the administrative system, Procapita, which is developed by Tieto and used by almost all municipalities. It is used to sync the data; Procapita is the master, and many programs depend on that data. So it is natural that when something new comes in, for example, you enter it there and it should then be synced so that there is a corresponding entry at Google. That is essentially what I have done. With user groups, for example, so that you get a pupil group and a teacher group at each school. When I say groups I really mean mailing lists, Google Groups, which you have surely heard of. One is also created for every class, which is very convenient when they use Google Classroom, since you can add a whole class to one mailing list. And of course you sync names and all the surrounding personal data and so on. What I find exciting about it is above all that much of it is completely given: if there is a person called Petter in Procapita, he should be called Petter on the other side, and so on. That is fairly obvious, not much can go wrong there, but then all kinds of other things come up. For example, just the other day I had a student in adult education (komvux), where you study course by course; it is not organized in classes like upper secondary or regular school. Their needs look a bit different; someone may work a full-time job and study at komvux on the side, and then it is very important that the Google account is ready when they arrive on the first introduction day, which may be the only time you meet the students apart from the roll call. So we have special rules just for komvux: they get their accounts 30 days earlier than all other students. There are many business rules like that, and it is very interesting how you handle them; it is not always easy to go from a concrete problem statement to breaking it down into requirements, and then to have something that is comprehensible and, above all, testable, so that all possible outcomes can be tested. That is not always easy.

How was the system tested, then?

Well, that is the thing; unit tests are the foundation. That may not say very much in itself; rather, you try to use some form of mocking. There is almost no good tool that does this: in the first step of the system you create an account, in the second step you change the account, in the third step you do something else with it, so where you are next depends on which steps have already been carried out, and so on. Then you are in, what should I call it, something that is not stateless. You have step after step building on the previous ones, and then you need a test framework that supports that, something that remembers which tests have been run before. They become very dependent on each other. It gets very difficult; you cannot just mock getting a 200 OK back or something like that. You do not get the complete test you would like, but on the other hand you can always write to a log file and so on. Then you send to this system product instead, and you manually verify that the request looks reasonable, but also that the request leading up to it is reasonable and that the sequence of requests is reasonable. That way you avoid the most common mistakes: if you add this user 30 times, you realize it is wrong. You may not need to automate that to see it. Still, you would of course like to be able to press compile, run the tests and know that everything works, but unfortunately it is not that convenient. There may be some way, but not one I have come across.

What are the main differences, regarding testing, between an application that uses its own database and one that uses the cloud instead?

I suspect that is the whole point: that you do not have any of that, what is it called again?

Johannes - Roll-back?

Yes, exactly. Rolling back and having a transaction, you do not really get that, and it would have been very nice to have something like it. But it is nothing I have seen Google support at all, and I do not think Google is uniquely bad in any way; it is something general about cloud services.

Johannes - Yes, there are cloud services where you can pay to get database control, but that is another level.

Yes, it is a bit vague to say cloud services. A database can be a cloud service too, and so on.

Johannes - Google does not support it, in any case.

No, not as far as I understand. There is some third-party system called google mock or whatever it was called, but it is nothing we have really used. We do not know whether we can trust it; should we start testing the test library itself? From some unknown developer on GitHub and so on.

Johannes - And then you have to be able to trust that the system is correct.

Yes, and it absolutely is, most of the time.

Johannes - Above all, it is your own connection to it that has to be correct.

Yes, exactly.

How have you handled problems such as sending the data to Google through an API that behaves like a black box, where you do not know what is inside it?

Hmm, well, what can I say? Not that much, really; I do not see it as that big a problem. I think of it like writing to a file on the system, for example: there you have abstracted away the file system. I feel the same way about the account creation itself; if I just send a create, I do not really care that it is a black box or what happens in the background. As long as it specifies what the expected output is based on the input, I do not see it as a major problem. That said, it is always quite uncomfortable, and you would of course like to have sudo, control over every bit.

How have you handled asynchronous, multithreaded requests from the clients, if there are any?

Well, right now it works so that it fills up a long queue with a lot of accounts; the bottleneck is creating everything, the network of course. So it fills a big queue, it gets very large, with very many users, and then it starts up whatever it is, 7 threads or something, that just start hammering accounts to Google. And it collects all requests that are identical. It is a blocking queue that is used; it is quite simple, nothing complex. No mutexes and such, or well, there may be one at the bottom of a blocking queue, but it solves the problem for you. They also only read from the same queue, so there is no writing going on that could cause race conditions and the like.

Have there been problems with many people using the system and things occurring that you cannot reproduce?

Ah, maybe that is what you were thinking of? Several users on the same system, with threads. Yes, we have experienced that. We have a sync loop that runs, and then there is an administration tool. The administration tool and the sync of course use the same database; there is a private, local database, and then there is Google, which writes to the same local database. It becomes problematic if, say, the sync has hung so it takes an extremely long time, and someone comes in at seven in the morning to do a quick fix while it is still using the database. We have never seen that happen, but it has happened that someone forgot to close the program in the evening; then it was not saved down as it should be on shutdown, and the program is still running when the sync is supposed to start during the night. So we have solved it so that if the program is started it cannot be started again; it is locked to one user. It is a super simple model: it creates a file in a folder, and if that file exists the program cannot be started, because then it has to be closed correctly first.

You talked earlier about accounts, that there is a lot of user configuration with names and personal data. How do you handle that in testing, if there are clashes and such when you create Google accounts?

It is fairly well specified what should be there. So there is just a rule they have to follow; if there is an 'ä' or something, it becomes an 'a' instead.

Johannes - What if there are duplicates, so several people have the same name?

Then there is just a counter that keeps incrementing, so it becomes andersandersson2 and so on.

Johannes - And have you tested that in any way? Or run into any problems?

No, we have never really had problems with it. Other than that it has been running for about a year and a half, it was never tested during development.


B EXAMPLE UNIT TESTS FROM TEST SUITE

B.1 EventExists

[TestMethod]
public void EventExists()
{
    // NOTE: the generic type parameters were lost in the original layout;
    // Dictionary<string, List<CreateEvent>> is assumed here based on how the dictionary is used.
    Dictionary<string, List<CreateEvent>> tasks = new Dictionary<string, List<CreateEvent>>();
    tasks.Add(testPersonNr, new List<CreateEvent>());

    //Create event id based on run time
    String eventID = GenerateEventID();

    //Add two create event tasks with the same ID to the synchronizer queue
    tasks[testPersonNr].Add(new CreateEvent(){ EventID = eventID });
    tasks[testPersonNr].Add(new CreateEvent(){ EventID = eventID });
    calSyncer.AddTasks(tasks);

    //Get number of warnings in the log before performing tasks
    int nrLogsBefore = Logger.Instance.GetNumberOfLogMessages("warning");

    //Setup the environment to contain a task with specific ID to test against
    TaskResultCodes resCode = calSyncer.PerformFirstTask();
    Assert.AreEqual(TaskResultCodes.Success, resCode);

    //Try to create an event with the same ID that was already created and check its return code
    resCode = calSyncer.PerformFirstTask();
    Assert.AreEqual(TaskResultCodes.Fail_IDExists, resCode);

    //Check that the logs contain a new warning
    Assert.AreEqual(nrLogsBefore + 1, Logger.Instance.GetNumberOfLogMessages("warning"));

    //Check that the queue contains a task
    Assert.AreEqual(1, calSyncer.GetTasksInQueue());

    //Check that the first task is an update task
    var firstTask = calSyncer.GetFirstTask();
    Assert.AreEqual(TaskTypes.UpdateEventTask, firstTask.TaskType);

    //Compare the data of the generated UpdateEvent task to the CreateEvent task
    CheckTaskData(firstTask, tasks[testPersonNr].Last());

    //Cleanup -- Store all created event IDs that need to be removed when using SaaS
    createdEvents[testPersonNr].Add(eventID);
}

B.2 NewActivityAdded

[TestMethod]
public void NewActivityAdded()
{
    var setupSchedule = LoadSetupSchedule(setupDataPath);
    var testSchedule = LoadTestSchedule(testDataPath);

    string testTime = GetTestTime();

    //Setup start environment
    SetupEnv(setupSchedule, testTime);

    //Check that the correct events exist
    Assert.AreEqual(true, CheckEvent(ID1, EventDataInfo));

    //Compute the changes between the test schedule and the setup schedule,
    //and make the event IDs unique for this test run
    scheduleComparer.GetChanges(testSchedule, setupSchedule);
    foreach (var taskList in scheduleComparer.tasks)
        for (int c = 0; c < taskList.Value.Count; c++)
            taskList.Value[c].EventID = taskList.Value[c].EventID + testTime;

    calSyncer.AddTasks(scheduleComparer.tasks);

    //Add created tasks to cleanup when using SaaS
    AddToCleanup();

    //Perform all queued tasks
    while (calSyncer.GetTasksInQueue() > 0)
        calSyncer.PerformFirstTask();

    //Check that the correct events exist
    Assert.AreEqual(true, CheckEvent(ID1, EventDataInfo));
    Assert.AreEqual(true, CheckEvent(ID2, EventDataInfo));
    Assert.AreEqual(true, CheckEvent(ID3, EventDataInfo));
}

C EXAMPLE MOCK-OBJECT UNIT TEST

[TestMethod]
public void CreateEventDailyLimit()
{
    //Trigger daily limit in database
    mockConnection.TriggerDailyLimit(true);

    // NOTE: the generic type parameters were lost in the original layout;
    // Dictionary<string, List<CreateEvent>> is assumed here based on how the dictionary is used.
    Dictionary<string, List<CreateEvent>> tasks = new Dictionary<string, List<CreateEvent>>();
    tasks.Add(testPersonNr, new List<CreateEvent>());

    //Create event id based on run time
    String eventID = GenerateEventID();

    //Create task and add to calendar syncer
    tasks[testPersonNr].Add(new CreateEvent(){ EventID = eventID });
    calSyncer.AddTasks(tasks);

    //Get number of warnings in the log before performing tasks
    int nrLogsBefore = Logger.Instance.GetNumberOfLogMessages("warning");

    //Perform task and see if it succeeded
    TaskResultCodes resCode = calSyncer.PerformFirstTask();
    Assert.AreEqual(TaskResultCodes.Fail_UsageLimits_DailyLimit, resCode);

    //Check that the queue now contains a create task
    Assert.AreEqual(1, calSyncer.GetTasksInQueue());

    //Check the first task
    var firstTask = calSyncer.GetFirstTask();
    Assert.AreEqual(TaskTypes.CreateEventTask, firstTask.TaskType);

    //Check that a warning has been added to the logger
    Assert.AreEqual(nrLogsBefore + 1, Logger.Instance.GetNumberOfLogMessages("warning"));

    //Check that backoff was activated and failed attempts set to 1
    Assert.AreEqual(1, calSyncer.GetFailedAttempts());

    //Disable daily limit after the test
    mockConnection.TriggerDailyLimit(false);
}

Blekinge Institute of Technology, Campus Gräsvik, 371 79 Karlskrona, Sweden