tApp: testing the future

by Karsten Westra

University of Groningen

March 17, 2014

1 Change log

Version  Date                      Author          Comment
0.1      01-02-2013                Karsten Westra  Initial version/plan.
0.2      16-07-2013                Karsten Westra  Added layout for application chapter (6). Illustrated basic layout using an example. Minor detailed description of general workflow.
0.8      29-08-2013                Karsten Westra  Changed title "Specification: tApp" to "tApp: Testing the future". Added change log. Added acknowledgment. Expanded applications chapter (6).
0.8.1    30-08-2013                Karsten Westra  Chapter 6: explained device package; explained device browser.
0.8.2    05-09-2013                Karsten Westra  Explained and implemented report browser and graphs. Stakeholder summary. Added references chapter.
0.8.3    07-09-2013 to 09-09-2013  Karsten Westra  Expanded chapter 7.7 with detailed cases: Monkeytalk demo app, Fit for Free, LG Klimaat.
0.9      10-09-2013                Karsten Westra  Written discussion.
1.0      11-09-2013                Karsten Westra  Written conclusion.
1.0.1    02-10-2013 to 05-10-2013  Karsten Westra  Thorough review.
1.1      07-10-2013                Karsten Westra  Added chapter with related work.
1.2      09-10-2013 to 11-10-2013  Karsten Westra  Thorough review of references to chapters, goals, figures and tables.
1.3      14-10-2013                Karsten Westra  Added specification, distribution, reporting layout to chapter 5 (previously known as implementation). Reviewed chapter 6 and added references to relevant requirements, figures and sections.
1.4      15-10-2013                Karsten Westra  Reviewed size and placement of figures in chapter 6. Reviewed and added references to figures.
1.5      16-10-2013                Karsten Westra  Merged chapter 5.4 with 6. Rewrote the implementation chapter and called it application. Rewrote the introduction and reviewed references.
1.6      17-10-2013                Karsten Westra  Reviewed the old applications chapter and renamed it ’experiment’.
1.7      17-10-2013                Karsten Westra  Review and rewrite of chapters 7 and 8.
2.0      18-10-2013                Karsten Westra  Reviewed and rewrote discussion and conclusion.
2.1      22-11-2013                Karsten Westra  Changed structure of chapter 2.7 into Specification, Distribution and Reporting. Added 7.1.2, 7.1.3 and 7.1.5 to discussion. Renamed reference labels. Replaced all uses of "you" with "a user", "a developer" and so on.
2.2      19-01-2014 to 12-02-2014  Karsten Westra  Refined chapter 2: added general comments about automated testing and how and why to tackle fragmentation with it; elaborated on how other researchers tackle fragmentation; added a section on the "perfect solution" in the conclusion subsection.

Contents

1 Introduction
  1.1 Scope
  1.2 Goal
  1.3 Thesis structure

2 Related Work
  2.1 Automated testing
  2.2 Tackle fragmentation
    2.2.1 Proposed solutions
    2.2.2 Useful theories
  2.3 Existing tools
    2.3.1 Monkeyrunner
    2.3.2 UIAutomator
    2.3.3 Robotium
    2.3.4 Seetest
    2.3.5 Telerik Test Studio
    2.3.6 Monkeytalk
  2.4 Conclusion
    2.4.1 UI element recognition
    2.4.2 Presentation of issues
    2.4.3 Cover representative group of device types
    2.4.4 Nice to haves

3 Concept
  3.1 Description of stakeholders
  3.2 Stakeholder requirements
  3.3 Summary

4 Design
  4.1 General data flow
    4.1.1 Project and versioning
    4.1.2 Test and device package
  4.2 Detailed data flow
  4.3 Refined design
    4.3.1 Create a project
    4.3.2 Specification perspective
    4.3.3 Distribution perspective
    4.3.4 Reporting perspective
  4.4 Summary

5 Application
  5.1 Specification
    5.1.1 Test package
  5.2 Distribution
    5.2.1 Test execution
    5.2.2 Device package
    5.2.3 Collected device information
  5.3 Reporting
    5.3.1 Stable behavior
    5.3.2 Verify expected values of components
    5.3.3 Execute native code
    5.3.4 Boundaries
  5.4 Equivalence class
    5.4.1 Predict behavior
  5.5 Summary

6 Use cases
  6.1 General usage
    6.1.1 Dashboard
    6.1.2 From nothing to test report
  6.2 Device package
    6.2.1 Device data
    6.2.2 Preparing an existing project
    6.2.3 Script/Suite exports
    6.2.4 Device package internals
  6.3 Device browser
  6.4 Report browser
  6.5 More complex examples
    6.5.1 Experiment
    6.5.2 Explanation of cases: ’Monkeytalk demo’, ’Fit For Free’ and ’LG Klimaat’
    6.5.3 Monkeytalk demo
    6.5.4 Fit for Free
    6.5.5 LG Klimaat

7 Discussion
  7.1 Challenges
    7.1.1 Specification
    7.1.2 Validation of specification
    7.1.3 Portability of specification
    7.1.4 Reporting
    7.1.5 Beyond reporting
    7.1.6 Running from background
    7.1.7 Test execution and app preparation
    7.1.8 Settings
    7.1.9 Record from device
    7.1.10 Equivalence classes and predict behavior
  7.2 Improvements
    7.2.1 Ease of test result inspection (G1)
    7.2.2 Level of detail (G2) and Queries (G3)
    7.2.3 Ease of test execution (G4)
    7.2.4 Scalability (G5)
  7.3 Summary

8 Conclusion
  8.1 Specification
  8.2 Distribution
    8.2.1 Settings
  8.3 Reporting
  8.4 Summary

A List of requirements
  A.1 Specification (FRS)
    A.1.1 Distribution (FRDI)
    A.1.2 Reporting (FRRE)
    A.1.3 Device package (FRDP)
    A.1.4 Test package (FRTP)
    A.1.5 Non-functional requirements (NFG)
    A.1.6 Evolution requirements (ERG)

B Scripts/suites used
  B.1 Monkeytalk demo
  B.2 Fit for free
  B.3 LG Klimaat
    B.3.1 Scenario part 1
    B.3.2 Scenario part 2

Acknowledgements

First of all I would like to thank my main supervisor, Alex Telea, for the large amounts of feedback and suggestions. He really helped me with the process from idea to working prototype. Thanks to Peperzaken, the app development company I work for, for having the patience to let me finish my studies next to work, and for lending me a workspace setup with test devices and apps. It really helped me to try everything on a range of devices with existing apps. Thanks to my family, who always supported my choices and wishes. You motivated me whenever possible when things got really challenging. Thanks to you all for helping me get to where I am now!

Abstract

Mobile phones are a fast-growing technology on the market. Nearly everybody has a smartphone nowadays, and there is an astonishing amount of different device types to choose from. It is important to adequately test an app on these device types, but owning them all is not feasible. There are two possible solutions: own a subset of the devices on the market, or try to reach all of them without owning them. We present a solution in the form of a testing tool called tApp. This tool makes it possible to execute a test on a device without owning it. Furthermore, we investigated whether we can group certain sets of devices and predict the behavior of software based on similar device types. We elaborate on the entire process from ’blank’ app, to test, to inspecting results, and on automating the execution process. The final goal was presenting these results in an insightful way.

1 Introduction

It is difficult for developers nowadays to develop an app for a mobile platform that runs flawlessly on every device out there. This is caused by the sheer amount of different types of devices that exist on the market. When testing an app for iOS, the mobile Operating System (OS) designed by Apple, it comes down to about five different phones and three types of tablets, which makes a total of eight devices. The more difficult case of testing an app for Android, the mobile operating system designed by Google, involves significantly more devices on the market. This is often referred to as fragmentation. The amount of different device types (phones and tablets) on the market that run the Android operating system created by Google is so high that an adequate overview of these types is very difficult to obtain. Let us begin with the notion that there are at least ten manufacturers that create different device types. Each of them has about five device types on the market (10 manufacturers * 5 devices = 50 devices). Then it becomes apparent that this is five times more than the eight device types Apple has on the market. And this is a very rough estimation of how many devices running Android exist today; the estimation is far from accurate, and there are probably more. The next difficulty to note is that manufacturers of device types that run Android each have their own slightly different version of the Android OS. Apple also releases new versions, but the amount of different OS versions is not as high as with Android. Accurate testing and/or trying to guarantee that a developed app works on all devices is an amazingly expensive and time-consuming task.

Figure 1: Mobile testing in practice

Business experience shows that testing an app consists of a list of actions that have to be performed in an app in a certain order. A test subject receives a device type and a list of actions. These actions are executed on this device and strange behavior is noted. When this is all done, the test subject has to repeat it all again with the remaining 49 devices. The first problem here is that this approach is incredibly time-consuming. This, in turn, creates economic problems, e.g. passing such costs on to customers. With this notion we also have to conclude that it is not feasible to own all these 50 devices; owning them all would get rather expensive. Another issue is that a device only shows that an app crashed. It does not show what piece of code causes this crash, which is unfortunately exactly what a developer wants to know. If a less technical person is the tester, then it is difficult to pass this information on to a developer.

1.1 Scope

We argued that the testing process of mobile apps is expensive. To reduce these costs we propose a software testing tool called tApp. tApp will try to simplify and generalize the testing process across different Operating Systems and device types. With detailed test reports tApp will help developers to see what goes wrong in an app they built. These test reports will also show when and where things go wrong. Furthermore, we will investigate whether tApp can predict the behavior of an app on a certain device type, or group of device types, based on past results. Ultimately we want to be able to make a statement about future test results based on a smaller set of device types.

1.2 Goal

tApp aims at providing the infrastructure, tools, and techniques allowing developers and end users to create and use a "bidirectional device-to-app stability mapping". The mapping encodes whether a given app runs stably on one or more devices, and conversely, which apps are run stably by a given device. End users can use this mapping to assess the stability of an app on their device types. Developers can use the mapping to assess the stability of their newly developed apps. The next list summarizes the top-level requirements. They are given a number so we can refer to them later; for example, the reference G1 refers to Goal 1. The top-level requirements for this mapping are as follows:

• Ease of test result inspection (G1): end users and developers should have an easy-to-use way to query the mapping, e.g. find out which apps are stable on a device type, on which device types a given app is stable, and similar;

• Level of detail (G2): the mapping (and presentation) should allow browsing the contained information at several levels of detail, e.g. by organizing device types, applications, and stability reports in a hierarchical manner;

• Queries (G3): besides the above simple queries, the mapping should support more advanced queries, such as finding out which apps, or device types, behave similarly with respect to stability;

• Ease of test execution (G4): developers should have an easy-to-use way to add new stability information about their app(s). In particular, test-running the app on a family of device types should be a lightweight process, which is executed as automatically as possible, and which does not require physical ownership of the device types;

• Scalability (G5): the presented solution should be scalable to accommodate a large number of apps, users, and device types (conservative bounds are tens of such instances).

An end user should not care why an app will or will not run on their device type, but only whether it runs on their device type. To be able to answer such a question from the end user, we need to say something sensible about that end user's device type with respect to what we have seen before in the available test cases. This could be achieved by defining equivalence classes. These classes divide device types into groups that should show similar, stable, behavior. When a user then wants to know if an app runs stably on their device type, we only have to compare the information of their device to the known device types and test cases in tApp's database. Based on this we can predict whether an app will show stable behavior. Chapter 4 elaborates further on what we mean by stable behavior and equivalence classes.
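To make the mapping and the equivalence-class idea more concrete, the sketch below shows one possible way to store and query such a bidirectional mapping, and to predict stability for a device type that has not been tested yet. This is an illustrative Python sketch only: the class names, the chosen device traits and the grouping key are assumptions, not tApp's actual implementation, which is described from chapter 4 onwards.

    from collections import defaultdict

    # Illustrative sketch of the bidirectional device-to-app stability mapping
    # (G1, G3) and of equivalence classes used to predict behavior. Names,
    # device traits and the grouping key are assumptions, not tApp's design.

    class StabilityMapping:
        def __init__(self):
            self.by_app = defaultdict(dict)     # app -> {device name: stable?}
            self.by_device = defaultdict(dict)  # device name -> {app: stable?}

        def record(self, app, device_name, stable):
            """Store one test outcome in both directions."""
            self.by_app[app][device_name] = stable
            self.by_device[device_name][app] = stable

        def stable_devices(self, app):
            """G1: on which device types is a given app stable?"""
            return [d for d, ok in self.by_app[app].items() if ok]

        def stable_apps(self, device_name):
            """G1: which apps are stable on a given device type?"""
            return [a for a, ok in self.by_device[device_name].items() if ok]

    def equivalence_class(device):
        """Group device types by traits assumed to drive behavior."""
        return (device["manufacturer"], device["os_version"], device["screen"])

    def predict_stable(mapping, app, device, known_devices):
        """Predict stability on an untested device from tested devices that
        fall into the same equivalence class; None means no evidence."""
        peers = [d for d in known_devices
                 if equivalence_class(d) == equivalence_class(device)
                 and d["name"] in mapping.by_app[app]]
        if not peers:
            return None
        return all(mapping.by_app[app][d["name"]] for d in peers)

The grouping key used here (manufacturer, OS version, screen) is only one plausible choice; which traits actually define a useful equivalence class is exactly the question revisited in chapters 4 and 5.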

1.3 Thesis structure

This document proposes a tool that tries to solve the difficulties with testing that exist in the mobile app development field. We start by looking at existing tools on the market and describe their features in chapter 2. After looking at existing tools we identify the stakeholders of the system in chapter 3 and explain the requirements that each of these stakeholders might have. Chapter 3 also explains on a conceptual level what problem tApp addresses and clarifies how the stakeholders fit in the picture. After that, chapter 4 proposes a design and in chapter 5 we elaborate on the implementation of the test tool. To verify whether the approach works we discuss possible applications of the tool in the form of use cases in chapter 6. When this verification is completed we discuss which parts of the requirements tApp covered and which proved to be difficult in chapter 7. Finally, in chapter 8 we conclude on tApp as a framework and its results.

2 Related Work

In the previous chapter we mentioned that testing a mobile application on the abundance of device types on the market is time-consuming. We begin by looking at activities in the academic world to see if we can find existing theories, solutions and/or ideas that can be used to propose a possible solution that tackles the fragmentation issue in the mobile testing field. Besides that, we look at existing test automation tools currently available on the market that could simplify this testing process. The most well-known tools are monkeyrunner, uiautomator, Robotium, Telerik Test Studio, Seetest and Monkeytalk. We will explain what each of these tools offers a user and assess their strengths and weaknesses with respect to our goal. Finally, we sketch a "perfect solution" and the kind of skill set a user of such "perfect" tools needs to be able to work with them.

2.1 Automated testing

An important question for a developer to ask himself is whether it is feasible to put a lot of effort into black-box testing of an application. There are only few references to theories and solutions that tackle the fragmentation issue by automating the test execution process. This could be explained by the fact that mobile platforms are relatively new. Another reason could be that a lot of solutions might work but prove to be difficult to use or hard to implement. At the time of writing there is no out-of-the-box solution that solves all our challenges. The software development industry is beginning to see that testing is important to stay at the top of the rankings of the ever-growing number of apps in the different app stores. A few negative reviews leave a lasting impression that a brand will not survive, and it is virtually impossible to recover from a bad name. This makes a well-tested app essential to its success. To achieve a perfect status there really is only one theoretical solution: test an app on all device types that exist in the world. We already mentioned that this is not feasible. Baride and Dutta [2], Dubinsky and Abadi [3] and Haller [1] all note that a good solution currently might be a cloud-based test platform. However, the decision which range of device types is sufficient to cover a significant part of the device types that are used in the world is a difficult hurdle to take. The perfect solution would be to connect every device out there to a "test cloud" wirelessly and choose a representative subset of them all to cover all relevant devices to test with.

2.2 Tackle fragmentation

The main challenge that we noted is the fragmentation issue. There are many different combinations of device types running different versions of an operating system built by different manufacturers. This abundance of device types makes developing an app for these platforms very time-consuming. Since different device types have different traits, it becomes vital that an app is tested properly. Untested apps might lead to low ratings in app stores if they do not function properly, which inevitably means no user will use the app.

2.2.1 Proposed solutions

Baride and Dutta [2] propose a cloud-based system with emulators and real devices to accurately test a mobile app. This approach indeed has the advantage that a developer can test an app on a phone connected to a cloud. They also mention that there are many aspects of an app that should be tested. Business-based apps (or apps that communicate over a network) are vastly more complex and require more extensive testing. Another useful notion is that an automated script should be abstracted from the UI of the app, since there are too many different devices running different operating systems. Nevertheless, automated testing is the solution according to Baride and Dutta [2]. Dubinsky and Abadi [3] have made an assessment of which parts of testing in mobile development are important. They list the issues that need to be addressed and propose an agenda on how to tackle these key drivers. Many of them point out that we have to cope with many device type platforms and the diversity of device types running these platforms. A solution proposed by Ridene and Barbier [4] is a Domain-Specific Modeling Language (DSML) which they call MaTEL (Mobile Applications Testing Language): a modeling language in which they can uniformly specify the behavior of an application that is, in itself, platform specific. Their solution offers a smart theory on how to control sensors in a device type (e.g. Wifi, GPS, etc.). They approach changing a sensor setting as a set of actions on a device leading to the correct setting of a sensor; navigating a phone is basically the same as navigating through an app. A very detailed description of the test process of a mobile application is given by Haller [1]. The trade-off between the different types of tests, the time spent to test extensively and the user reviews obtained afterwards is an extremely difficult process. Haller notes that, to make this process easier to grasp, a developer wants to automate different parts of this testing process.

2.2.2 Useful theories

A possible solution that all researchers propose is some sort of cloud-based test platform that connects real devices to a test bed and remotely starts tests on these device types. This could be done by attaching simulators/emulators to some sort of cloud. In practice, however, the large amount of different device types shows that simulators/emulators do not accurately mimic the behaviour of all device types on the market. This is due to the many (small) differences between versions of the operating systems out there. Since device types are so different, one could argue that a useful testing tool needs some sort of general specification of a test. A test can then be (automatically) executed on a device connected to a cloud. With an analysis of the current screen and applied image processing, a test tool could determine UI elements. These UI elements and the gestures performed on them are, in theory, a solid platform-independent script executable on a device type. Such a specification could be written down using a DSML. Another approach is to use natural language to describe these scenarios instead of a domain language. One could argue that a DSML is a more structured language for test specification. If, however, the goal is to save a developer time, natural language can be used to let someone other than a developer write down a test. There are a lot of proposed solutions on how to create a testing framework, preferably in a cloud, that automates the execution of a test on a certain device type. Nobody ever really elaborates on the step after test execution: how do we accurately present results from a test so that an analyst immediately sees what parts were successful or failed in a test run? Given the large amount of device types on the market, this would seem to be a very important part of the testing process that should get some extra attention. There is, however, hardly any work that elaborates on what a test report should look like. This could be because there is no perfect solution on how to automate test execution on a representative set of device types.

2.3 Existing tools

When a developer looks for a (commercial) tool that makes automated application testing possible, there are a few tools that offer some of the theories explained in the previous section. We will now list some existing testing tools and their functionalities. We can furthermore see implementations of some interesting theories which prove to be useful and feasible to implement in practice.

2.3.1 Monkeyrunner

Google provides a tool [5] for app developers with which they can test their apps. This tool is called monkeyrunner. It tests apps on a functional level. Google offers an API, written in Python, to their device types. This means that monkeyrunner can initiate gestures like pushing a hardware button or executing a gesture on a touch screen if it is present. It is not so much coupled to the actual UI of a certain app; it is more of a remote device controller. That is, it executes an action on a hardware or software part of a device type, after which one can see what an app does. Monkeyrunner runs outside and independent of an app. It is thus not necessary to have the actual source code of the app under test. It works more or less out of the box. Monkeyrunner is started from a developer's workstation. A test written in Python can be executed on multiple devices connected to that workstation. A developer can create a program which installs an app, then runs it, sends different gestures to it and takes screenshots during the process. A device must be connected to this workstation using the Android Debug Bridge (ADB) provided by Google. ADB is an interface between a workstation and a device type: a workstation can communicate with the operating system running on an Android device type through ADB. Monkeyrunner is specifically designed with Android in mind. A user needs to be a programmer to make use of monkeyrunner. The fact that it runs outside of an app gives it potential for a very generic specification of what to test. It works quite well but focuses too much on controlling the device rather than controlling the app under test. In the previous chapter we mentioned that physical ownership of many device types is expensive. Monkeyrunner needs a connection to ADB, which means that physical ownership is required.
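To give a concrete impression of what such a monkeyrunner test looks like, the sketch below installs an app on a connected device, starts it, sends a gesture and takes a screenshot, following Google's documented monkeyrunner API. The package name, activity and touch coordinates are placeholders chosen for illustration.

    # monkeyrunner scripts are run with the 'monkeyrunner' tool shipped in the
    # Android SDK (Jython, hence Python 2 syntax). The apk path, activity name
    # and touch coordinates below are placeholders for illustration only.
    from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice

    # Wait for a device attached through ADB (physical ownership required).
    device = MonkeyRunner.waitForConnection()

    # Install the app under test and start its main activity.
    device.installPackage('bin/AppUnderTest.apk')
    device.startActivity(component='com.example.app/.MainActivity')
    MonkeyRunner.sleep(2)

    # Send a gesture to a point on the touch screen (not to a named UI element).
    device.touch(240, 400, MonkeyDevice.DOWN_AND_UP)
    device.press('KEYCODE_BACK', MonkeyDevice.DOWN_AND_UP)

    # Take a screenshot so a tester can inspect the result afterwards.
    snapshot = device.takeSnapshot()
    snapshot.writeToFile('result.png', 'png')

Note how the script addresses screen coordinates and key codes rather than UI elements of the app, which is exactly the limitation discussed above.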

2.3.2 UIAutomator

Another tool [6] that Google provides for developers using Google's operating system is uiautomator. This testing framework gives developers opportunities to efficiently create automated functional tests for multiple Android device types. It supports Android API level 16 (version 4.1) and above. Many device types currently on the market are still using older versions of Android, which means that using uiautomator already excludes a large group of device types from testing. Uiautomator requires a connection to a workstation through ADB. With respect to monkeyrunner, uiautomator focuses more on controlling the UI instead of the device type it runs on. It is still based on the same principle of controlling the device, but uiautomator gives more control in testing a user interface than monkeyrunner does. The API that uiautomator offers has more functionality that lets a user select a certain user interface element, like a button. Another very strong feature of uiautomator is that a part of the tool shows a visual analysis of a user interface. It does this by creating a screenshot and analyzing the views that are currently visible. Selecting a specific element is easy, since the tool shows which elements are available together with information about their properties. It also shows what type they are and where they are located on the device type's screen. Uiautomator relies on the accessibility support of Android. Components are identified by the text on labels and the content description of a UI component. This requires specific knowledge of the app's UI structure and the naming of the UI elements. All tests are packaged in a single jar file, which makes reuse of parts of the scripts difficult.

Uiautomator is, like monkeyrunner, a good and simple tool for testing an app by controlling the device type it runs on. An improvement over monkeyrunner is that it focuses more on the UI. It gives more freedom in where to execute a gesture on a touch screen. A downside is that it only works on higher versions of Android.

2.3.3 Robotium

Robotium [8] is an open source testing framework for Android apps. An added bonus over monkeyrunner and uiautomator is that the latest version also supports testing hybrid apps. These apps use HTML for the UI instead of the native features Android offers, so Robotium can also test PhoneGap-based apps. Robotium is focused on the UI of an app. A developer sends gestures to an app using a simple Java framework. This framework can send all kinds of actions to a specific UI component in an app. This is different from monkeyrunner and uiautomator, which could only send a gesture to a certain point on a touch screen. Robotium can control a UI component without the developer knowing exactly where it is located on the screen: the developer names the component and lets the framework search for it and execute the gesture there. Because of this, Robotium can also verify certain properties of UI components. A developer knows how his/her app works and can write a test to verify a certain result of a screen that appears after a certain button is pressed. This gives a lot more flexibility than monkeyrunner and uiautomator offer. Because of this, the API Robotium offers is simpler than the ones offered by monkeyrunner and uiautomator, which depend on the developer to execute gestures on the location of a UI component. For Robotium to work, an Android device type needs to be connected to a workstation through ADB. This means we need physical ownership of the device. It is not necessary to have the source code of the app undergoing the test, so Robotium can also test apps that are pre-installed by the manufacturer of a device type. The result of a test is a unit test report. This comes down to a list of the created tests and marks that show whether each was successful or not. It does not take screenshots like monkeyrunner and uiautomator do. The visual result in the form of a screenshot of a device is useful to immediately see what is successful or not.

2.3.4 Seetest

An interesting commercial tool created by Experitest takes a somewhat different approach. This tool is called Seetest [7]. It takes a visual approach to mobile testing. Experitest offers support for testing on multiple platforms that have device types on the market: besides Android and iOS they also support Windows Phone and Blackberry. They rely on the accessibility features of the platforms they run on to recognize UI elements and execute actions on them. They however also introduce another way to search and find UI elements: image recognition of the screen. Seetest does this by recording gestures on a connected device type. A device has to be connected to a workstation or VPN cloud, as Experitest calls it. The source code of the app to test is not necessary: a developer can simply provide an application binary and record tests from it. Specification of a test is done in a visual editor. A user can simply connect the device, provide a binary and push a record button. All actions executed on the device are recorded by the tool. A developer can distribute his tests to all the supported platforms, so a specified script is portable enough to be executed on all device types running the supported platforms. This makes it a very strong tool for testing the same app on different platforms.

Reports of the tests are presented as a list of the recorded actions. This list shows the command as executed on the device, a symbol indicating its status and a screenshot after execution. It also shows a highlighted area on the screenshot. This is the area where the user executed his action; better said, it is the UI element that Seetest identified when it was recording. Another interesting feature that Seetest offers is the ability to set up a device hub. With this device hub a customer of Experitest can create a private cloud with device types connected to it to test on. This hub can be accessed from anywhere in the customer's VPN. This potentially covers all the device types that are available at the customer for testers to work with. It would still not cover all available device types in the world, but one could envision creating a subset of device types that covers what is generally available on the market. Experitest created an interesting tool that takes the visual approach to UI recognition and script specification. However, the image recognition does not always work correctly, which makes it somewhat difficult to work with: the correct UI elements are not always recognized. Distributing tests over different device types running different platforms is very interesting. The added feature of creating a user's own device cloud makes it a tool worth investigating. The downside is that it is a commercial tool that comes at a price.

2.3.5 Telerik Test Studio

A different approach to testing an application is used by Telerik. They offer a test studio for iOS apps called Test Studio [9]. This tool can reliably test native, hybrid and web apps. A test can be created on the device type itself. There is an app available in the app store that gives a developer the possibility to specify a test. Through this app a user can specify a set of actions to be executed; they are executed in order. The results are then sent back to a web portal. The test specifications can also be sent to the web portal. The app that Telerik offers can also synchronize scripts that were created on another device type. The cross-device-type playback of a recorded script and the web portal synchronization give flexibility with respect to ownership of the device: in theory anyone with the credentials for an account can execute the scripts that are connected to that account. Telerik explicitly mentions on their website that they do not use image-based detection to find a UI element and execute actions on it; they use object-based recording. It would be easy to assume that they use the accessibility features of iOS, like Robotium uses those of Android. Using image-based recognition like Seetest would mean that a redesign of the UI of a developer's app would lead to a redesign of the test specification, while a UI redesign does not necessarily mean redesigned functionality of an app. The web portal stores and shows all the created reports of a certain app. It lists the crash reports of an executed test. Besides that, it also gives a simple overview of how many tests succeed or fail. It does not show screenshots of the screen of the device type under test like monkeyrunner and uiautomator do; instead, it shows a list of commands and whether each succeeded or failed. Another welcome feature is the overview of all reports that have been executed. This gives a simple overview of which tests have been executed where, and what their results were.

2.3.6 Monkeytalk

The last tool we will look at is Monkeytalk [10], created by Gorilla Logic. They offer an open source tool that is free to modify and use. It offers support for tests across different platforms. They support the largest native platforms, namely iOS and Android; at the time of writing these are the largest platforms on the market. They furthermore also support testing websites and web apps. Tests can be executed on all the supported platforms with a provided Integrated Development Environment (IDE). Tests are written in a simple command language. This language is a simple form of English in which a script can be specified. Another scripting language in which a developer can specify the test is JavaScript. Another strong point is that a script can be recorded directly from a device: a developer pushes a record button and the commands come back and are combined into a script. Monkeytalk uses accessibility features to recognize UI elements in these scripts. A recorded/specified script can be executed on all the platforms that Monkeytalk supports. Since developers are busy people it seems like a good idea to leave test specification to other parts of a development team. However, this needs a side note. Writing down a test script that works on all supported platforms is rather difficult when someone does not consider naming conventions for the UI elements across platforms, and a non-developer generally might not know this. Since there are different fallback methods, like content descriptions and text on labels, it is possible to give all UI components names that are understood by all platforms. This however needs to be carefully communicated with all developers across a team, which generally requires more effort than other tools. To execute a test a user has to prepare an app with a submodule (a so-called agent) that listens for the commands to execute. Having the source code is required for this tool to work. The agent is a basic network component that listens for test commands through a network connection. This requires a little bit of extra preparation, but in return a developer can execute a test whenever and wherever they want, given that the IDE can reach the device through a network connection. This makes it a very strong tool. In theory a developer could execute a test on all the device types in the world, given that they have a network connection; in practice there is no network connection that covers the entire world. As briefly mentioned before, an app has to be reachable by the IDE. The reports that Monkeytalk provides are HTML pages with the different tests on them. The reports show the commands that were executed and whether they were successful or not. When a command fails, a screenshot is taken of the screen at the time of the error. This gives a good visual overview that makes it rather easy to see what went wrong. Failures are detected faster than when having to read through a stack trace that the iOS and Android IDEs give when a developer is debugging the source code of an app.

2.4 Conclusion

After looking at different fields of research we should have enough information to define a "perfect solution" for a testing tool that tackles the fragmentation issue. There is no out-of-the-box solution yet. However, existing implementations of certain theories tell us that it is feasible to develop a tool that covers the needs of a developer of an app. A good statement to make here is that creating a tool that makes testing less time-consuming is itself very time-consuming. The perfect solution has an incredibly large feature set, of which some features have not yet been thought out in great detail. We have however learned that the perfect solution is a cloud-based tool that preferably connects all the device types in the world. A test run can be started from a device and remotely. In the report that a test run presents, it is immediately clear what went wrong in the app under test. It furthermore gives an overview of the behavior of an app across different device types. We looked at some of the different tools that are available on the market for testing purposes. We furthermore looked at how they implement the theories we found before looking at these tools. They all differ in their support of different platforms, methods of recognizing UI elements and presentation of results. Some support only Android or iOS; others chose portability and support script execution across multiple platforms. We will now briefly summarize the core features we will use in our implementation of tApp. More on tApp in chapter 3.

2.4.1 UI element recognition

Monkeyrunner and uiautomator take the approach of controlling a device instead of the actual UI of an app. With respect to these two, Monkeytalk takes the approach of focusing on controlling the app instead of the device. The feature that Telerik's Test Studio and Monkeytalk offer in the form of actual script specification (or recording) on an actual device makes script specification easier and more visual. When offering support across different platforms, the only challenge that remains is detecting the same UI element in the same app on a different platform. Many of the tools succeed in this by using accessibility features. However, none of them recognize everything correctly close to 100% of the time.

2.4.2 Presentation of issues

An approach that nearly all tools take in presentation is listing executed commands and their statuses. Showing a screenshot at the time of error and success, like monkeyrunner, uiautomator, Seetest and Monkeytalk do, makes inspecting reports rather visual and intuitive. What all the tools seem to do is present report overviews in a web portal. This gives an overview of what happened in a test on a device type. What all tools appear to be missing is what happens across these different devices. How does an app behave across different device types? It might work well on some, and not at all on others. However, no tool really shows this in an overview.

2.4.3 Cover representative group of device types

Most of the tools we examined need to be connected to a workstation for testing to be successful or even start. Seetest takes an interesting approach by giving the ability to set up a private device cloud, but this is still limited to the device types that the organization that sets up this cloud owns. Monkeytalk takes an interesting approach to the distribution of a test. The network component that listens for commands and sends results gives flexibility as to where the device is in a network. This makes the range of where a device can be, and who owns it, larger compared to a USB cable connected to a workstation.

2.4.4 Nice to haves

A combination of all the features that the tools offer would be fairly useful. However, there is no tool that offers the right combination of features to tackle the fragmentation problem that exists. Support across multiple platforms is welcome, but this requires some compromises in the development of an app. UI element recognition does not appear to be trivial. A welcome web portal that shows app behavior across platforms and device types is not really present. A connection to a workstation is not ideal if a developer wants to test an app on all the mainstream device types on the market. Monkeytalk, developed by Gorilla Logic, is closest to what a developer wants in a test tool. In chapter 3 we will discuss the basic principles we have seen in the available tools to specify and execute a test. We will discuss why the current tools lack some features to effectively tackle the fragmentation problem. Besides that, we will propose a new tool, based on Monkeytalk's ideas, that combines the best features from all the examples we have seen in the existing tools.

3 Concept

Given the testing challenges mentioned in the previous chapter, our main research question is as follows: How can we provide tooling support that reduces the costs, and enhances the quality, of testing mobile apps on the Android platform? We will answer this question practically and pragmatically, by developing such a tool, called tApp. Since we cannot cover the entire list of potential requirements emerging from the above research question, we focus on a subset of these. Specifically, we aim to provide efficient and effective ways for users and developers to explore the correspondence, or mapping, between a set of tested apps and a set of device types. This offers high-level insight into how the various aspects of tested apps behave on the tested devices. As such, our redefined research question is: How can we provide insight into the test results of a set of apps on a set of device types? Concrete examples of the use cases covered by our proposal are answering questions such as: Which aspects of a given app work well (or not) on a set of device types? Which is the set of device types where a given app aspect passes (or fails) testing? To begin to understand how we can answer this research question, we first need a conceptual flow from test specification to test report. This chapter explains a test pipeline that separates the different steps in tApp's workflow. It furthermore defines the stakeholders of the test tool, the requirements that they have, and a priority which we use to prioritize the work that needs to be done. Based on the observations in chapter 2, the functionality tApp offers is separated into three different components. Figure 2 below shows the main pipeline of the most ideal test tool. Each of the roles uses the system in a different way. Furthermore, a stakeholder might not be interested in all of the output that each component offers. Therefore the next section identifies three stakeholders.

Figure 2: Test pipeline

tApp offers a three-step process for testing a mobile app. A developer starts with the specification step. In this step one or more scenarios are written for a certain app. A scenario in this case is a list of actions to be executed on the device; think of actions like pushing a button. We have seen multiple approaches to this specification in chapter 2: we could specify a scenario in code, record it from a device itself, or specify it in English. The scenario and the executable app are combined and exported to the distribution. This is a component which is responsible for multiple actions. The scenarios do not need to be dependent on a platform; in the ideal tool a scenario can be specified once and executed on all supported platforms.

When looking at Monkeytalk and Seetest, we noticed that some sort of web portal is useful for a good overview of a set of executed tests. tApp adopts this notion by offering a distribution perspective. In this perspective a tester can register a device, browse all registered devices, and prepare and enrich an exported specification from the specification step. The package can be enriched with a selection of known device types to run on and a set of sensors that can be switched on or off. The distribution finally collects the results and stores them. An analyst is the final stakeholder in tApp's workflow. He or she can access test results from various executed test scenarios. He or she can furthermore query the result set and see how an app behaves on different device types. The other way around is also possible: an analyst can see how a device behaves with different apps. A choice is offered to visualize these results so that it is clear which devices and/or parts of an app need more review from a developer.

3.1 Description of stakeholders

Because of the different use of components in tApp we identify three stakeholders. This division implies that tApp will probably incorporate or offer three different views. These views can be used separately by the stakeholders. The three stakeholders and a description of their concerns can be found in table 1 below.

Stakeholder     Description
App developer   A developer who wants to create tests and find out if his/her app shows stable behavior on a wide range of devices.
Tester          A tester who donates some time with his/her device to give developers the possibility to test the behavior of an app.
Analyst         A customer, for whom the app under test was built, who wants to see a test result showing which devices can run an app and which explains its behavior in a set of scenarios.

Table 1: identified stakeholders

3.2 Stakeholder requirements

The general system requirements are decomposed further on a per-role basis. Table 2 below refines the general requirements for each of the roles identified in the previous section. Furthermore, the requirements link back to the goals given in chapter 1.2. The G and the major number before the dot refer to the goals in chapter 1.2. The minor numbers are sub-goals that cover a small part of the main goals.

Stakeholder: App developer

Interoperability/Extensibility (G5.2): a test should be executable on the largest available operating systems on the market (Android, iOS).
Usability (G4.1): it should be possible to easily specify a test suite that is (automatically) executable on many devices.
Customizability (G2.1): a developer can customize the device options of a test package for optimal test coverage (manipulate sensors and network connections).
Portability/Interoperability (G5.1): a developer should be able to publish tests which are executable on most of the available devices on the market.

Stakeholder: Tester

Usability:
1. (G2.2) See relevant information on the behavior of a combination of apps and devices.
2. (G4.2) Run a specified test with as few actions as possible.
Scalability (G5.3): run a test on many devices without difficulties.

Stakeholder: Analyst

Usability:
1. (G2.3) See whether the app that is released is stable enough to see the light of day.
2. (G1.1) tApp should show clear and easy-to-read test reports.
Testability:
1. (G1.2/G2.4) Get an overview of flaws that exist in the app to be released.
2. (G3.2) Get an overview of the stability of similar apps on the same devices with respect to the behavior of apps on similar devices in past test results.
Level of detail (G2.5): an analyst should be able to change the level of detail that a report shows with as little effort as possible.
Simplicity (G3.1): see the behavior of an app on a specific device with respect to other devices.

Table 2: requirements for each stakeholder

3.3 Summary

There are three stakeholders that interact with the system. The app developer has created an app and wants to easily create a test, run it on a device, customize device settings, and publish these to the world in the specification perspective. The tester wants to run the actual exported scenarios and see a general overview of a test on devices in the distribution perspective. Finally, an analyst wants to be able to see an overview of app behavior and the details of a single run in the reporting perspective. That is, globally, what tables 1 and 2 describe. Table 2 is used extensively in the remainder of this thesis to keep track of what we covered from these requirements.

4 Design

In the previous chapter we identified our main research question and identified the stakeholders for tApp. It focuses on developers and testers of mobile apps that want to guarantee that their app is functionally perfect. Model 3 below proposes a design for a possible implementation of tApp. This chapter will furthermore explain the global functionality that each component/subsystem in the model offers. We will extensively use flow charts to illustrate all the different components of the tool. We will point out the workflow that guides the route from the specification of a test to getting the desired test results. We will explain how each component in figure 3 should be used. The model below shows the test pipeline from the previous chapter in more detail. A detailed list of all the requirements can be found in appendix A.

Figure 3: tApp’s proposed architecture

4.1 General data flow

The three steps in the test pipeline defined in chapter 3 can be seen in figure 3. A developer creates a specification in an editor. Here he/she also provides the location of an executable app. The app and specification are exported to the distribution system in a test package. This package is further explained in chapter 4.1.2. In the distribution a developer can prepare a test for publishing and enrich the scenario with specific devices to run on and settings for these devices. A tester can list and run tests from the device package. It shows all published projects provided by the distribution system and gives the option to download and run a project. All data necessary for running a test is retrieved from the distribution in the form of a test package. When the test is finished, the results are stored in the distribution system. This supports our goal of ease of test execution mentioned in chapter 1.2: once a test is specified, distributing it is achieved by selecting a test to execute from a list and pushing start to actually execute it. The reporting system retrieves the results from the distribution and visualizes them for the analyst. These visualizations support our bidirectional device-to-tested-app mapping introduced in chapter 1.2.

4.1.1 Project and versioning

To distinguish between different scenarios and their results we introduce a project in tApp. An app and a specification can change, which also means that the test results from certain scenarios can become obsolete. A project is a component that stores a certain scenario and the test results that it receives. When a specification and/or app change, the previous test results are archived. This introduces the notion of a version, and with it history. A version is a combination of a specification, an executable app and test results. When the specification of a scenario changes, the old version is replaced by a new one. The project is a container that stores these versions and makes the distribution able to distinguish between them. The distribution gives an analyst the option to visualize the history. This is useful for a good overview of how certain problems with a version of an app are fixed in newer versions. This improves the ease of result inspection with respect to different versions of an app introduced in chapter 1.2.

4.1.2 Test and device package

To make access, storage and listing of published projects easier, tApp introduces two structures that make it more convenient to handle test execution and specification. We start by defining a test package. This is a container that wraps a test specification, an executable app, and devices and settings. These can be downloaded and executed on a device. A mobile device should be capable of listing and running the available test packages. For this a device package is introduced. This package lists all the published tests and gives the option to start an app. Before running a test it also notifies the distribution system that it can expect test results in the future. This is to prevent test runs without test results in the distribution component.
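As an illustration of these two structures, the sketch below models in Python what a test package and a device package could minimally contain according to the description above. The field names and the distribution interface are assumptions made for illustration; the actual contents of both packages are elaborated in chapters 5 and 6.

    from dataclasses import dataclass, field
    from typing import Dict, List

    # Illustrative sketch only: field names and the distribution interface are
    # assumptions based on the design text, not tApp's actual package format.

    @dataclass
    class TestPackage:
        project: str
        version: str                  # couples results to one specification/app pair
        specification: str            # exported scenario (script or suite)
        app_binary: str               # executable app under test
        target_devices: List[str] = field(default_factory=list)
        sensor_settings: Dict[str, bool] = field(default_factory=dict)  # e.g. {"wifi": True}

    def execute_scenario(test: TestPackage, device_id: str) -> List[dict]:
        """Placeholder for the actual on-device execution of the scenario."""
        return [{"device": device_id, "command": "<recorded command>", "status": "skipped"}]

    @dataclass
    class DevicePackage:
        device_id: str
        published_tests: List[TestPackage] = field(default_factory=list)

        def run(self, distribution, test: TestPackage):
            # Announce the run first, so the distribution component never ends
            # up with a test run for which no results can be expected.
            distribution.expect_results(test.project, test.version, self.device_id)
            results = execute_scenario(test, self.device_id)
            distribution.store_results(test.project, test.version, self.device_id, results)

The point of the sketch is the ordering in run(): the device package announces the run to the distribution system before executing anything, which is exactly the safeguard described above.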

4.2 Detailed data flow

When a developer starts tApp he/she first has to create a project, or open an existing one. When a project is opened, the user sees a view which contains three perspectives. A developer can choose which of the perspectives below he/she wants to open:

• the specification perspective (decomposed in section 4.3.2);
• the distribution perspective (decomposed in section 4.3.3);
• the reporting perspective (decomposed in section 4.3.4).

Each perspective is a step in the test pipeline explained in chapter 3 and is further decomposed in the remainder of this chapter. The flow chart in figure 4 shows the basic workflow in tApp. There are a few actions that contain a ’*’. These actions are decomposed in chapter 4.3, which explains the usage of the different perspectives and their workflow in more detail.

Figure 4: General workflow

A developer should start in the specification perspective. In this perspective it is possible to specify a scenario. This can be a script or a suite; in a suite a developer can add multiple smaller scripts to be executed in sequence. This can also be done by recording actions on a device. A developer can switch between editors at will. More details on the specification perspective are given in section 4.3.2. When a developer has exported the scenario in the form of a test package, he/she can switch to the distribution perspective. It offers the possibility to publish the test package that was just created. Optional device settings can be added to the project for a selected scenario. Besides device-specific settings, he/she can add a range of device types from tApp's database to run the scenarios on. He/she can also set permissions to (dis)allow other devices to run the test, for optimal customizability. Everything is now ready to publish the test. More details on the distribution perspective are given in section 4.3.3. Finally there is the reporting perspective. In this perspective an analyst can look at all the test reports that are available. He/she can customize what kind of reports to see and also the level of detail, as introduced in chapter 1.2. Think of the behavior of an app on different devices, or different devices running an app. More details are given in section 4.3.4.

4.3 Refined design

This section explains the different necessary components. They consist of refined workflows from the previous sections in this chapter.

4.3.1 Create a project

Initially a user has to create a project or open an existing one. The developer initially has to provide a project name, the platforms he/she wants to create tests for, and a connection to one or more prepared app binaries for each platform. This makes it easier to query the mapping as introduced in chapter 1.2. The model in figure 5 refines the action "Create new project".

Figure 5: Create new project

4.3.2 Specification perspective

In the specification perspective a developer and/or tester can specify tests in an editor. A developer can specify a test by:

1. recording actions on a physical device;
2. writing a script.

A device can be connected to tApp through a network connection. A developer adds a device by entering its IP address, after which tApp and the device can communicate. tApp can record scripts through this connection. The developer can push a record button; when this button is touched, record mode is on. The device recording is converted, under the hood, to a script. In the script view a developer can make changes to the scenario by changing this script.

Figure 6: Specify test

The changes made in the script or by recording can be viewed in real time in both editor views. The written scenario can then be executed on all devices by exporting it as a test package. More on this in section 4.3.3.
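As a rough illustration of the record flow described above, the sketch below connects to a device by IP address, switches record mode on and collects the recorded actions as script lines. This is a deliberately simplified Python sketch: the port number and the JSON wire format are invented for illustration and are not tApp's (or MonkeyTalk's) actual protocol.

    import json
    import socket

    RECORD_PORT = 9000  # assumption: the port the on-device agent listens on

    def record_script(device_ip, max_commands=50):
        """Connect to a device by IP, switch record mode on and collect the
        recorded actions as script lines (illustrative sketch only)."""
        script = []
        with socket.create_connection((device_ip, RECORD_PORT)) as conn:
            conn.sendall(b'{"command": "start_recording"}\n')
            stream = conn.makefile()
            for _ in range(max_commands):
                line = stream.readline()
                if not line:
                    break  # device stopped recording or closed the connection
                # Assumed payload, e.g. {"component": "Button", "id": "OK", "action": "Tap"}
                action = json.loads(line)
                script.append("{component} {id} {action}".format(**action))
            conn.sendall(b'{"command": "stop_recording"}\n')
        return script

Whatever the concrete protocol looks like, the essential idea is the same: the recorded actions arrive as structured commands and are appended to the script that the developer can then edit in the script view.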

4.3.3 Distribution perspective

When a developer has specified a test, he/she now has to publish this test to the world. The specification perspective gives the option to export a test package. With a switch to the distribution perspective he/she can tweak the publish settings. A developer can set extra options like a specific device, or set of devices, to execute the tests on. The only requirement here is that these devices have to be registered with tApp. Besides choosing a device, there is also an option to choose which sensor and connection settings of a device should be switched on or off during the test. Model 7 refines the action "Prepare test publish" in figure 4.

Figure 7: Publish test

With the push of a button a developer can then publish the scenario, including the device settings. In the distribution perspective a developer can prepare everything that is necessary to start a public test run with this exported test package. When a tester wants to run a published test, tApp asks to download a device package that runs the tests and sends back the test results to the data collection component. A specific test can be downloaded as a test package that contains the published executable scenario. The returned data is stored and analyzed by the distribution perspective, which is able to show the test result to the user. This approach mainly supports goals 1 and 4 in chapter 1.2, namely ease of result inspection and ease of test execution.

4.3.4 Reporting perspective

After running a scenario the reporting perspective gives an analyst an overview of the results of all (past) tests and the devices they were executed on. When a test is completed an analyst should see in the blink of an eye what works well and what does not. Besides that, he/she should be able to get an overview of the devices on which the app does (not) work. tApp should additionally be able to show a report that can be presented to a customer or user who might have less technical knowledge. Model 8 below shows how this supports the goal of ease of result inspection (G1) we mentioned in chapter 1.2.

Figure 8: Show test report.

The test results that were obtained by the test runs are displayed in a simple but detailed manner. This makes sense as we defined the level of detail as one of our main goals in chapter 1.2. A tester is able to view different types of reports of the test runs. The report can be customized based on the viewer's needs. This is useful to make querying the mapping (G3) and controlling the level of detail (G2), as defined in chapter 1.2, easier. The test viewer shows:

• Details per device: which apps run stably and which do not.

• Details per app: which devices run the app stably and which do not.

• Details per OS version: which versions are capable of running the app and which are not.

4.4 Summary

In this chapter, we proposed a design for a test tool based on the concept described in chapter 3. We derived this concept from features that existing tools, described in chapter 2, offer. We defined three perspectives that support specification, distribution and reporting of tests. We have taken ease of result inspection (G1), controlling the level of detail and querying the mapping (G2 and G3) and ease of execution (G4) into account. With these goals and the design based on them in mind, we will explain the components necessary for a successful implementation in the next chapter.

5 Application

In the previous chapter we proposed a design for our test tool. It divided the tool into three perspectives, namely the specification, distribution and reporting perspective. This chapter gives an overview of the decomposed functionalities that tApp offers and elaborates on what is necessary to come to an implementation based on our design from chapter 4. We start this chapter by defining what a scenario or test is in tApp. With this definition in place we explain how a scenario flows from empty scenario to report. We explain how a developer can specify a test, after that how to distribute this specification, and finally how a report viewer is implemented. During this process we keep the concept and goals from chapter 3 in mind.

5.1 Specification

We repeatedly mentioned the words scenario and test in the previous chapters. However, the definition was kept abstract. Previously we identified a scenario as a test that has to be executed on a device type. Concretely, a scenario (in tApp) is a Monkeytalk script, or a chained list of scripts called a suite. This script contains commands that can be executed on a device type. Desirably, the commands should be platform independent to support execution on multiple platforms. We mentioned this in requirement G5.1 in table 2. In chapter 2 we mentioned Monkeytalk as one of the tools that follow an interesting approach to the test cycle. Because of the approach Gorilla Logic takes with Monkeytalk [10] and the fact that it is open source, we will use their system to implement the features of tApp we identified in the previous chapters. Monkeytalk scripts support multiple platforms. Our initial research question mentioned Android as our primary focus; however, keeping multiple platform support open for the future is useful. A script can be specified by recording actions on a device [11] or by writing MonkeyScript [12] or JavaScript [13]. A script or suite is stored in a file with the mt or mts extension respectively: mt stands for MonkeyTalk, the extra s stands for suite. These scenarios can be exported in a test package. The exported package can be executed on many devices. A test or scenario should be seen as a set of chained actions that fulfill a task in an app.

5.1.1 Test package

To start a specified test as defined in the previous section, a test package is downloaded by a device type. This is done using a device package, which is explained in section 5.2.2. The test package contains all the data that is specific to an app and the test that is about to be executed. As sketched below, the test package contains:

1. Location of a binary. A URL where the app binary that is used for testing purposes can be found.

2. A test suite. This suite contains the test scripts that a developer has specified. These can be executed on the binary.

3. Result location. A URL indicating where the results should be sent to.

4. A list with device settings. This list contains settings that should be toggled on or off for the current scenario.

The device package then runs the test projects based on the data in the test package. When the scenario is executed it collects the results and sends them back to tApp's distribution unit. This system has a data collector that stores the test results. The reporting component uses the information collected by the data collector. More details on the device package can be found in section 5.2.2.
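The four items above can be thought of as a small data model. The following Java sketch is one possible representation of such a test package; the class and field names are hypothetical and do not come from tApp's actual implementation.

import java.net.URL;
import java.util.List;
import java.util.Map;

// Hypothetical model of the test package described above (names are illustrative).
public class TestPackage {
    private final URL binaryLocation;                   // 1. where the prepared app binary can be downloaded
    private final List<String> testSuite;               // 2. the test scripts specified by the developer
    private final URL resultLocation;                   // 3. where the device package should send its results
    private final Map<String, Boolean> deviceSettings;  // 4. settings to toggle on/off for this scenario

    public TestPackage(URL binaryLocation, List<String> testSuite,
                       URL resultLocation, Map<String, Boolean> deviceSettings) {
        this.binaryLocation = binaryLocation;
        this.testSuite = testSuite;
        this.resultLocation = resultLocation;
        this.deviceSettings = deviceSettings;
    }

    public URL getBinaryLocation() { return binaryLocation; }
    public List<String> getTestSuite() { return testSuite; }
    public URL getResultLocation() { return resultLocation; }
    public Map<String, Boolean> getDeviceSettings() { return deviceSettings; }
}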

5.2 Distribution

When a developer has specified a scenario that meets his/her requirements, it has to be distributed to available device types. This section explains how a specification, or rather an exported test package, can be distributed to registered devices. This device package approach makes test execution easier and improves scalability, as mentioned in requirements G4.2 and G5.3 in table 2.

5.2.1 Test execution

As mentioned earlier, tApp offers functionality to specify and distribute a test scenario for a mobile app. This test can be executed given an app binary for some platform. For the actual execution of the test scenarios there is instrumentation available for monitoring the recorded data. Afterwards the result data is stored in tApp's distribution system. Before we start thinking about how to look at reports, we first need to define how Monkeytalk can test for us. To export a test a developer needs to provide a prepared binary. This binary is downloaded and started by the device package. The device package then executes the test in a background service provided by Android.

5.2.2 Device package

For the connection between tApp and a device we need to define a component that is able to register the current device type, download and start an app, run a test on the device and send back test results. The device package is a communication component between a device and the tApp system. This is typically an app that runs on a device type. Starting the package initiates an initial registration of the device for storage in our database. The device package can browse through published tests, start a test, collect device data and send back results. A device package contains:

1. A component that collects device info. To be able to predict behavior and divide mobile devices into equivalence classes we need information about hard- and software. This can be obtained through the android.os.Build [14] component.

2. A component that can change device settings. Think about (re)running a test with different sensors (dis)engaged, or turning sensors and network connections on or off.

3. A component that collects test results. This component stores test results in such a way that they can be sent back to tApp for analysis. With this information a test report is created.

4. A component that can run a scenario. A server-like component that can send requests with commands that initiate the automated test on the binary.

The device package shows all public test runs. Specific test settings and properties are downloaded in a test package when a tester starts a test. A tester can choose to participate in a test: a message is shown to the user, who can then (dis)allow participation. This approach helps us fulfill requirements G4.2 and G5.3 from table 2 on ease of test execution with as few actions as possible.

5.2.3 Collected device information

When a tester creates an account he/she gets the possibility to connect the current device type to that account. This step analyzes the device type's information. This analysis of a device only needs to happen once. When a test package is downloaded, the device package only needs to check if the current device is the one that the user connected to his/her account. If it is an unknown device, tApp needs to register the device and analyze its information. The distribution system processes the registration data and stores it. The report viewer can then generate test reports based on the user's wishes using the collected data from different devices. tApp will try to collect the following info if available:

1. Hardware

(a) CPU type and speed.

(b) Memory type and amount.

(c) GPU type and speed.

(d) Screen dimension and resolution.

2. Software

(a) OS information. Think of version, amount of memory, CPU power.

(b) Fingerprint that identifies an OS build. Could be used for equivalence division purposes.

(c) Device manufacturer and type.

All this information is used to generate a more detailed test result later on. This approach supports requirement G2.2 from table 2 about finding relevant information on device types on the market.
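As an illustration of how part of this information could be collected on Android, the sketch below uses the android.os.Build class mentioned above together with the standard DisplayMetrics API; the map keys are illustrative and not taken from tApp's implementation. CPU clock speed and total memory are not exposed by Build and would require other, platform-specific sources.

import android.content.Context;
import android.os.Build;
import android.util.DisplayMetrics;

import java.util.HashMap;
import java.util.Map;

// Hypothetical collector for the registration data described above.
public class DeviceInfoCollector {

    public static Map<String, String> collect(Context context) {
        Map<String, String> info = new HashMap<>();

        // Software traits from android.os.Build.
        info.put("manufacturer", Build.MANUFACTURER);
        info.put("model", Build.MODEL);
        info.put("osVersion", Build.VERSION.RELEASE);
        info.put("fingerprint", Build.FINGERPRINT);   // identifies an OS build

        // Screen dimension and resolution.
        DisplayMetrics metrics = context.getResources().getDisplayMetrics();
        info.put("screenWidthPx", String.valueOf(metrics.widthPixels));
        info.put("screenHeightPx", String.valueOf(metrics.heightPixels));
        info.put("screenDensityDpi", String.valueOf(metrics.densityDpi));

        // Number of processor cores; clock speed and memory need other sources.
        info.put("cpuCores", String.valueOf(Runtime.getRuntime().availableProcessors()));

        return info;
    }
}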

5.3 Reporting

When a developer has specified a test and a tester has provided a prepared binary, everything is ready for execution of the test. The device package gives the tester control over the execution of a test. The collected data from these tests can be examined in the reporting perspective. We first define why an analyst wants to view a report in the report viewer, and besides that how an analyst can query the data tApp obtained from executed tests. We first define what the main goal of tApp is and how we can see this in a report. Then we elaborate on what the device package measures and how. Finally we define some boundaries on what types of reports tApp is able to show.

5.3.1 Stable behavior

What a developer ultimately wants is stable behavior of his/her app. This means that the app runs as the developer specified it in his/her requirements. A tester should be able to perform the tasks and actions that a developer implemented without the app crashing or slowing down. The user should not be obstructed by the app in his/her ability to finish a desired task. An app is stable if it succeeds in the majority of the specified tests in tApp. What this majority is needs to be assessed by the analyst. Because of this, the reporting perspective must show reports that are easy to read.

5.3.2 Verify expected values of components

A test in Monkeytalk is the execution of a set of commands [15] on a device. When these are successful, a test is considered successful. A command can be a physical action like pushing a button. It can however also be the verification of a certain piece of data in the app. Monkeytalk gives testers the ability to verify expected values. When a test command is issued, two types of verification occur:

• Does a component exist in the view of an app?

• Do certain expected values of these components match the ones that appear in the app?

For the verification of expected elements Monkeytalk offers different verification commands to test these values. These commands offer support for overviews of flaws (G1.2/G2.4) as mentioned in table 2.

1. Verify - Verify that the component's value is equal to the argument.

2. VerifyNot - Verify that the component's value is NOT equal to the argument.

3. VerifyRegex - Verify that the component's value matches the Regular Expression provided in the argument.

4. VerifyNotRegex - Verify that the component's value does NOT match the Regular Expression provided in the argument.

5. VerifyWildcard - Verify that the component's value matches the wildcard expression provided in the argument.

6. VerifyNotWildcard - Verify that the component's value does NOT match the wildcard expression provided in the argument.
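The semantics of these six commands boil down to comparing a component's actual value with the expected argument. The following Java sketch only illustrates those matching rules, with wildcards translated to regular expressions; it is not MonkeyTalk's implementation.

import java.util.regex.Pattern;

// Illustrative matching rules behind the Verify* commands listed above.
public final class VerifyRules {

    public static boolean verify(String actual, String expected) {
        return actual.equals(expected);
    }

    public static boolean verifyNot(String actual, String expected) {
        return !actual.equals(expected);
    }

    public static boolean verifyRegex(String actual, String regex) {
        return Pattern.matches(regex, actual);
    }

    public static boolean verifyNotRegex(String actual, String regex) {
        return !Pattern.matches(regex, actual);
    }

    public static boolean verifyWildcard(String actual, String wildcard) {
        // Translate '*' (any sequence) and '?' (any single character) to a regex.
        String regex = ("\\Q" + wildcard + "\\E")
                .replace("*", "\\E.*\\Q")
                .replace("?", "\\E.\\Q");
        return Pattern.matches(regex, actual);
    }

    public static boolean verifyNotWildcard(String actual, String wildcard) {
        return !verifyWildcard(actual, wildcard);
    }

    private VerifyRules() { }
}

For example, verifyWildcard("Welcome, Karsten!", "Hello, *") evaluates to false, which is exactly the kind of mismatch reported in the Monkeytalk demo case in chapter 6.5.3.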

5.3.3 Execute native code

Monkeytalk offers additional support for inspecting behavior on a specific device type, as requirement G3.1 in table 2 dictates. For reaching the deeper, harder to test areas of an app, Monkeytalk offers the execution of native code. A developer can add an action to his/her app to read values from user settings and preferences. An Android app can call certain functions on a custom class; the only requirement is that the class is available in the app under test. With such custom code one could think of verifying values in storage, settings and everything else the native platform offers. This gives more insight into flaws that cover more parts of an app than what is visually available.

5.3.4 Boundaries

To conclude, we summarize the boundaries that tApp has when testing apps. tApp has different levels of test reporting:

• Minor detail results: Runs OK/Crash.

• General detail results: Verify expected values of views.

• High detail results: Test expected values not visually present on the screen.

5.4 Equivalence class

A tester can register his/her device and participate in a test. In this test tApp collects some device info to connect a test run to a device for reporting purposes. With this info we can create a database with test runs on different devices. In addition, a developer/tester should be able to request a prediction for a device that might not be in the test history of an app. To achieve this, querying the information that tApp offers needs to be simple and testable. We cover goal G3 from chapter 1.2, and more specifically requirements G3.1 and G3.2 from table 2, with this approach. To say something sensible about behavior on the large number of different devices we want to divide them into different classes. We call them equivalence classes, and construct them in such a way that device types within a class are seen as similar, or equivalent, from a testing perspective. This division into classes should be easy to use and provide a level of detail that makes it obvious what is successful (or not) in a test. A possible trade-off to consider is the potential increase in complexity, since this could conflict with requirement G3.1 which states that tApp should be simple to use. tApp will try to construct these classes based on the test results that it receives. But we first need an accurate definition of the equivalence class, and more specifically of when devices are considered equivalent, to create an accurate division. Equivalence is based on the device type info collected as explained in chapter 5.2.3. Some of the device traits might have a bigger impact on the behavior of an app than others. An example is that some device types do not have a GPU (Graphical Processing Unit) at all. Also, screen dimensions and resolution might not have that big an effect on the behavior of an app. Two devices are considered equivalent when they have:

• Looking at hardware

  – When looking at the CPU:
    ∗ the same clock speed.
    ∗ the same number of processor cores.
  – When looking at memory:
    ∗ about the same amount of memory.
    ∗ about the same memory speed.
  – When looking at the GPU:
    ∗ about the same amount of video memory (if any).
  – When looking at the screen:
    ∗ the same screen dimensions.
    ∗ the same resolution.

• Looking at software

  – When looking at the Operating System:
    ∗ exactly the same version (e.g. version 2.2 or 4.0.4).
  – When looking at the device fingerprint:
    ∗ exactly the same fingerprint.
  – When looking at manufacturer and type:
    ∗ exactly the same.

It seems impossible to have a perfect match. Class equivalence should not always be based on exact value identity. In some cases, what we want is that values (within the same class) are equal within a range. Besides that, the points above should be prioritized. The equivalence function of devices and apps could also be made customizable. Equivalence could then be customized by the tester based on specified test result parameters. Think of the following two functions:

Eq_device : Device × Device → {true, false}

Eq_app : App × App → {true, false}

Device and App in these functions denote the sets of all possible devices and applications in existence. The table below shows the priorities that might be important based on a tester's wishes. When the system is built and practice appears to be different, these priorities might change; they are general a priori estimates.

Hardware trait           Information           Priority
CPU                                            Medium
                         speed                 High
                         amount of cores       Low
Memory                                         Medium
                         amount                High
                         speed                 Low
GPU                                            Low
                         amount of memory      Low
Screen                                         Medium
                         dimensions            Medium
                         resolution            Medium

Software trait           Information           Priority
OS version info                                High
                         major version         High
                         is it customized      High
OS build fingerprint                           Medium
Manufacturer and type                          High

Table 3: Priority of device traits.
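To make the idea concrete, the sketch below expresses such an equivalence check in Java, assuming a simple DeviceInfo description with the traits from chapter 5.2.3. The tolerance value and the choice of which traits must match exactly loosely follow Table 3; they are illustrative assumptions, not part of an implemented system.

// Hypothetical device description with the traits collected in chapter 5.2.3.
class DeviceInfo {
    String manufacturer;
    String model;
    String osVersion;
    String fingerprint;
    int cpuCores;
    double cpuClockGhz;
    int memoryMb;
    int screenWidthPx;
    int screenHeightPx;
}

// Illustrative implementation of Eq_device from this section.
class DeviceEquivalence {

    // Memory is compared "within a range" rather than exactly.
    private static final double MEMORY_TOLERANCE = 0.10; // 10%

    static boolean equivalent(DeviceInfo a, DeviceInfo b) {
        boolean hardware =
                a.cpuCores == b.cpuCores
                && a.cpuClockGhz == b.cpuClockGhz
                && withinTolerance(a.memoryMb, b.memoryMb, MEMORY_TOLERANCE)
                && a.screenWidthPx == b.screenWidthPx
                && a.screenHeightPx == b.screenHeightPx;

        boolean software =
                a.osVersion.equals(b.osVersion)
                && a.fingerprint.equals(b.fingerprint)
                && a.manufacturer.equals(b.manufacturer)
                && a.model.equals(b.model);

        return hardware && software;
    }

    private static boolean withinTolerance(int x, int y, double tolerance) {
        return Math.abs(x - y) <= tolerance * Math.max(x, y);
    }
}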

5.4.1 Predict behavior

When we have constructed equivalence classes we can use them to aid testers in specifying and visualizing reports. Based on the collected hard- and software information and past test runs, tApp could give a prediction for a device based on the info that the device offers. This prediction should carry a clear message that it is a prediction and not a 100% certainty. The question is how, and with what, predicting behavior helps. The actual test result could then be used to verify whether the test shows the same behavior on a device or not. If this is not the case, then strange behavior on a device can be noted. We can then start to identify "strange" devices that do not behave the same as devices with similar hard- and software specs. This would really help analysts with ease of result inspection, as mentioned in table 2 as goal G1. Furthermore, past test results can be used to suggest test presets that worked in the past; think of retry delays on a known device. tApp would thus be able to improve the ease of test construction mentioned in table 2 as requirement G4.1.
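Building on the equivalence sketch from the previous section, a prediction could be as simple as aggregating past results of equivalent devices. The TestResult type and the aggregation below are hypothetical; they only serve to make the idea concrete.

import java.util.List;

// Hypothetical past result: which device ran the scenario and whether it passed.
class TestResult {
    DeviceInfo device;
    boolean passed;
}

class BehaviorPredictor {

    // Predicts the pass rate for a device without a test history of its own,
    // based on past results of devices in the same equivalence class.
    // Returns -1 when no equivalent device has run the scenario yet.
    static double predictPassRate(DeviceInfo candidate, List<TestResult> history) {
        int runs = 0;
        int passes = 0;
        for (TestResult result : history) {
            if (DeviceEquivalence.equivalent(candidate, result.device)) {
                runs++;
                if (result.passed) {
                    passes++;
                }
            }
        }
        return runs == 0 ? -1 : (double) passes / runs;
    }
}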

5.5 Summary

In this chapter, we identified what a scenario or test is in tApp. How a developer can specify a test and how it is exported is explained in section 5.1. We explained how a specification is executed by a tester in section 5.2. Finally, the reports that an analyst can generate and view are explained in section 5.3. The next chapter shows how the implemented parts of tApp can be used in practice and illustrates this with examples.

6 Use cases

Based on the design suggested in chapter 4 we implemented tApp. We tried to cover all the requirements we designed and elaborated on in chapter 5. This chapter first describes general usage of tApp based on ideas from the previous chapters. After a brief introduction it explains detailed cases to show which requirements we covered from our initial design and, even more interesting, what proved difficult to implement. For these cases we tried to test a set of existing apps from Peperzaken. The apps we used are: a demo app provided by Monkeytalk (the test framework used), an app for fitness organization Fit for Free and an app for the air-conditioning department of LG Electronics. The Fit for Free app provides all kinds of fitness facilities for its users; think of viewing training schedules, fitness news, and fitness device information and how to use it. The LG Klimaat app is an app for air-conditioning mechanics. They can browse manuals and look up errors from air-conditioning devices. As an added feature, the app helps mechanics fix these errors. More information on Fit for Free [16] and LG Klimaat [17] can be found in section 6.5. The design proposed in chapter 4 and decomposed in chapter 5 is implemented as a web application, or (web) hub. This web hub acts as a general starting point for the three subsystems defined in chapter 3. This helps us support the ease of test execution we defined as a goal in chapter 1.2. The hub gives users the ability to specify, export and distribute tests to the devices that are registered in the system and to view reports of executed tests. It is furthermore possible to query and view filtered reports based on a tester's wishes. More details on reporting can be found in section 6.4, which also explains how it covers ease of test inspection (G1), level of detail (G2) and queries (G3), defined in chapter 1.2. Communication from a device to the web hub is handled through an Application Programming Interface (API) provided by tApp.

6.1 General usage

We will now explain and illustrate the main usage of tApp. Details on all aspects of the workflow are explained in the remainder of chapter 6, from section 6.2 until section 6.4. We will go into more detail with illustrating examples in section 6.5, after explaining how the most important components work.

6.1.1 Dashboard

The very first view where a developer or tester starts is the dashboard (shown in figure 9). This main page, or dashboard, is the central point of tApp. It shows the projects that the current user created and details about existing tests.

Figure 9: The dashboard

The dashboard shows all the available projects and their descriptions. Below that a tester can see a summary of recent execution reports that are available in these existing projects. The navigation to the perspectives and functionality that tApp offers is always located in the top right. tApp currently shows three options there: Browse devices (explained in section 6.3), Browse reports (explained in section 6.4) and a green button for creating a new project (explained in section 6.1.2).

6.1.2 From nothing to test report

The general workflow as described in chapter 3 is illustrated through a demo app provided by the Monkeytalk framework. We will show some details of each perspective and component in the following sections. A developer starts by creating a project. For this he/she has to enter a name for the project; the description and icon image are optional. tApp creates the project and redirects the user to a summary page for this project. The summary shows some high-level details such as the number of scenarios and reports that exist for this project. It furthermore shows which scenarios exist and have been exported, or still need to be exported. This gives a developer an overview of which scenarios are available on connected devices and which need some attention, with which we want to achieve ease of test specification and execution (G4.1) as mentioned in table 2. It also summarizes the recent test executions for this project and provides a means to navigate to the most recent report. Both steps are illustrated in figures 10 and 11 below.

Figure 10: Project creation overview.

Figure 11: Project summary

A developer can now choose between the three perspectives that tApp offers. The navigation menu, in the top right of the hub, shows the buttons that lead to the specification, distribution and reporting perspectives. Next to that there is a button that navigates back to the active project summary. To show that the navigation is always visible, it has been highlighted with a green area in figure 11.

Figure 12: The specification perspective.

A tester starts in the specification perspective. Here he/she can create a new scenario. Recall from chapter 5.1 that a scenario is a script (or suite) that is executable on a device. No device- or platform-specific knowledge is necessary here if a developer wants to specify a scenario; a specified script can be executed on all supported platforms. The perspective is shown in figure 12. To the left there is a navigation menu with two sections: one for creating a new scenario and one for listing the existing ones. In the upper section a developer can create a new scenario. This asks the developer to enter a name and a type (script or suite) and stores an empty scenario in the system for the developer to edit. The upload button lets the developer upload an mt script created with the Monkeytalk IDE [18]. This asks for a name and a .mt or .mts file; tApp then creates a script or suite based on the extension of the uploaded file. More on how the export works in section 6.2.3. The right hand side shows a basic editor for the scripts and/or suites. Above the actual text editor there are a couple of buttons. With the save button the system stores the current script with the active scenario. The export button navigates to the distribution perspective; more on this later. The archive button is an entirely different case. When a scenario is executed on a device, a report is stored with a reference to the current scenario. This button supports goal G2 from chapter 1.2, and more concretely requirement G2.5 from table 2. When a developer clicks the archive button, all the reports are given an archive timestamp. To keep the reports relevant, the archived reports get a copy of the current scenario. When the active scenario is changed, the old reports still know the contents of the scenario when it was executed. This helps an analyst to see flaws (and how they are fixed) in new versions of the developed app under test. Recall that we specified 'overview of flaws' in requirements G1.2/G2.4 in table 2. When a script is specified to the developer's satisfaction, it needs to be exported. This can be done in the distribution perspective. This perspective shows all scenarios (or those for a selected project) and gives an option, per platform, to add a prepared binary/executable to the scenario. Once this has been done, the scenario becomes visible in the device package. This makes test execution, mentioned in table 2 as requirement G4.2, easier. When a device package runs a test scenario it sends a report back to the hub. The hub then gives the ability to view the standalone reports as shown in figure 13.

Figure 13: The reporting perspective

The left hand side shows all the scenarios for this project. After clicking on a scenario it unfolds and shows two options: active and archived reports. In the first option all the reports for the current scenario are shown. The second option shows a list of archives; clicking one opens its details. A report is connected to a scenario and a binary. When one of those changes, the report becomes less useful. For this the archiving function is implemented. All the "old" reports are given an archive date and are connected to a copy of the current scenario. The active scenario is editable and all reports generated with this new scenario are stored as the active reports. With this separation old reports remain useful. The list of archives contains a list of reports that in turn open the details for those reports. A feature for the future would be a view that presents what changed along a certain range of archives; this would cover the goal about ease of result inspection in chapter 1.2. Due to a lack of time this feature was left out of the initial version. The right hand side shows the report details. These are split up into a summary and results per script command. The summary shows the scenario, the device it was executed on and an execution timestamp. Besides that it shows the number of commands and the result for each command. The result statuses returned by the app are error, failure and skipped. These are shown in color so that an analyst immediately sees whether a report succeeded or not. The ease of report inspection is greatly increased because of this (G1.1 and G1.2/G2.4 in table 2); it becomes apparent what the status of the report is without reading the actual data of the report. The three status codes are:

• Error: an error in the script syntax instead of an execution failure.

• Failure: an issue with the execution of a command.

• Skipped: a command that is not executed, mostly comments.

6.2 Device package

To connect a device to the system and communicate with the device types we need to implement the device package explained in chapter 5.2.2. This is basically an app for the platform that we want to connect. Currently we only support Android, but there is no restriction on supporting other platforms in the future. A device owner logs in to the system. The Android app currently has three views (illustrated in figure 14) that are used:

• the project overview,

• the scenario, or project detail, view and

• the scenario details view.

(a) Project overview. (b) Scenario overview (c) Scenario details

Figure 14: Device package (screenshots taken from S3).

The entire app was kept simple because we decided to keep test execution simple and start it with as few steps as possible (G4.1 and G5.1 in table 2). The first view shows an overview of the projects created by the current user. A user can tap one to go to the project detail view. This shows the exported scenarios for the selected project. The developer can choose a scenario he/she wants to run. The next view is the final view before the actual execution. The scenario detail view shows the name of the scenario, the script export that it has to download and run and finally the binary to use for the execution. This view has two buttons. The install button installs the binary listed in the detail view. When the current binary is not installed the run button is inactive. It is activated when a user has successfully installed the binary. When this is done the run button starts the test execution. More on the execution process can be found in section 6.2.4.

6.2.1 Device data

We argued before that, based on device traits (hardware and software), a tester might see behavior in the field that the system might have missed. This covers requirements G2.2 and G3.2 from table 2. For this the device package obtains some device traits at registration with the system. It does this through the android.os.Build class [14]. This gives the most important information we defined in section 5.4. It however gave us less than we expected. Simple things such as the string describing a manufacturer differ across the product lines of these manufacturers. For example, two device types like the Galaxy S4 made by Samsung do not provide the same Android build information. The manufacturer's name in it might be different, and the amount of memory in this build information is not the total amount, but the amount that is currently free to use for the owner of a device type. This makes it very difficult to say something about equivalence classes between device types. There is no guarantee that two identical device types of the same manufacturer have the same fingerprint or even type specification. Google gave everybody the freedom to fill in their custom Android build versions as they please. Because of this, the decision was made to skip the implementation of equivalence classes (section 5.4) and behavior prediction (section 5.4.1).

6.2.2 Preparing an existing project

For tApp to be able to run a test we need to prepare an app for execution. We need to install a Monkeytalk agent in the app we want to test. This basically comes down to what Monkeytalk requires to listen and react to the commands that it receives. Monkeytalk offers an explanation [19] of what steps are necessary for tests to work. This preparation is necessary to give an app the ability to respond to commands sent from an external source and to report its execution status after execution. The agent starts a simple web server at localhost on port 16862. When we run the device package on the same device as the app we want to test, we can reach the agent at its local web server. A developer and tester can reach any device connected to a specific network. If this network were to cover the entire planet, we would have the ideal situation in which a developer and tester can execute tests on all devices that are available on the market. This covers requirements G4.1 and G5.1 from table 2.
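Because the agent listens on localhost port 16862, the device package can check whether the app under test is ready before sending any commands. A minimal sketch of such a check is shown below; the exact handshake MonkeyTalk uses is not shown, only a plain socket probe of the port mentioned in the text.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Checks whether the MonkeyTalk agent's local web server is reachable.
public class AgentProbe {

    private static final String AGENT_HOST = "localhost";
    private static final int AGENT_PORT = 16862;   // port mentioned in the text
    private static final int TIMEOUT_MS = 2000;

    public static boolean agentIsReachable() {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(AGENT_HOST, AGENT_PORT), TIMEOUT_MS);
            return true;
        } catch (IOException e) {
            return false;                           // agent not (yet) started
        }
    }
}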

6.2.3 Script/Suite exports

Normally it is Monkeytalk itself, not tApp, that communicates directly with the agent. The developers of Monkeytalk have documented a protocol [20] to communicate with an agent: commands are sent to the app and reports are received back. tApp makes smart use of this by taking a 'man in the middle' approach. The sequence diagrams in figure 16 show the execution of a test in tApp. Figure 15 shows the flow in the original version of Monkeytalk. These sequence diagrams show how tApp covers two of our goals from chapter 1.2. The registration step gives us insight into what kind of device type an app runs on; it gives us details on the processing power the soft- and hardware offer. Furthermore, we have more control over the execution process. In the original version of Monkeytalk an IDE starts a test on a device type (or app in figure 15) that is currently connected. The device package in tApp is capable of requesting a test package. The distribution then returns all the data necessary to download the app under test and the commands to execute. When finished, the device package sends the test report back to the distribution. With this, a device type can be anywhere with respect to the distribution; there is no need for a wired connection to the workstation the system runs on. These two notions cover requirement G3.1 with respect to the registration. The test execution flow of the device package in tApp covers requirements G5.1 and G5.3. These requirements are specified in table 2.

Figure 15: Test execution in monkeytalk

Figure 16: Test execution in tApp

We catch the commands a runner wants to send and send them to the agent ourselves. We then intercept the reports the agent wants to send back and return them ourselves again. This improves the ability to customize the conditions under which a test can be executed. Taking this 'man in the middle' approach covers requirement G2.1 from table 2. When a scenario is saved, tApp also creates a zip file containing the commands that need to be sent to the app under test. These commands are extracted by parsing a script/suite file and exporting each line into a JSON file. For this we needed to implement a small change in the Monkeytalk runner [21]. The runner is a simple Java client that runs a test given certain parameters. The change we made is that when an export directory is specified in the runner's parameters, it does not send the commands to the specified agent location but exports them to this directory as JSON-formatted files containing one command per file. tApp then creates a zip file with these JSON commands. When a device package downloads this zip file with commands, we can send them to the agent, get a status of the command execution and return it to our system.
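A simplified sketch of this export step is shown below: every non-empty line of a script becomes one JSON file whose name starts with the command index, and the files are bundled into a zip. This mirrors the idea only; the patched runner's actual code and the exact JSON layout are not reproduced here, so the field names are illustrative.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustrative export of a .mt script into one JSON command file per line, packed as a zip.
public class ScriptExporter {

    public static void export(Path script, Path zipFile) throws IOException {
        List<String> lines = Files.readAllLines(script, StandardCharsets.UTF_8);
        String scriptName = script.getFileName().toString();

        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            int index = 0;
            for (String line : lines) {
                String command = line.trim();
                if (command.isEmpty()) {
                    continue;                      // skip blank lines
                }
                // File name starts with the index so the device package can restore the order.
                String entryName = index + "_" + scriptName + ".json";
                String json = "{ \"index\": " + index
                        + ", \"script\": \"" + scriptName + "\""
                        + ", \"command\": \"" + command.replace("\"", "\\\"") + "\" }";

                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(json.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
                index++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: export a MonkeyTalk script to a command zip.
        export(Paths.get("Login.mt"), Paths.get("Login-export.zip"));
    }
}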

6.2.4 Device package internals

The device package uses a couple of simple steps to communicate with the hub and execute a test. The device package gives a tester the option to install a binary. When a tester taps the run button in a scenario detail view, the process starts. First a test package in the form of a zip file is downloaded. This contains an exported scenario: a set of files that represent the specified .mt(s) script/suite in a format that the Monkeytalk agent understands. The contents of the zip file are extracted to the storage of the device it runs on. The command files have a filename containing the index of the command in the script and the script name, to guarantee the correct order of execution. The device package sorts this command list based on the index, which is the first number in the filename. The device package now starts a background service [22]. This background service starts the installed app. It then sends the extracted JSON commands to the agent (from the background) and awaits the status that the agent returns. When all commands are executed, it creates an array of all the status messages it got back from the agent. The test service then sends the status array back to the tApp hub. The entire flow is illustrated in figure 16. Communication with the hub is implemented using an API.
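The execution loop described above can be condensed into the following sketch: unpack the test package, sort the command files by the index in their file names, send each command to the agent and collect the returned statuses. The Android service boilerplate and the agent's HTTP protocol are left out; sendToAgent is a placeholder for the actual call.

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified execution loop of the device package (run from a background service).
public class TestRunnerSketch {

    public List<String> runExtractedCommands(File extractedDir) {
        File[] commandFiles = extractedDir.listFiles((dir, name) -> name.endsWith(".json"));
        if (commandFiles == null) {
            return new ArrayList<>();
        }

        // The index is the first number in the file name; sorting restores execution order.
        Arrays.sort(commandFiles, Comparator.comparingInt(this::indexOf));

        List<String> statuses = new ArrayList<>();
        for (File commandFile : commandFiles) {
            statuses.add(sendToAgent(commandFile));   // one status per command
        }
        return statuses;                              // sent back to the tApp hub afterwards
    }

    private int indexOf(File file) {
        String name = file.getName();                 // e.g. "3_Login.mt.json"
        return Integer.parseInt(name.substring(0, name.indexOf('_')));
    }

    // Placeholder for the actual HTTP call to the MonkeyTalk agent on the device.
    private String sendToAgent(File commandFile) {
        return "ok";
    }
}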

6.3 Device browser

The device browser shows all the devices that are registered with the system. The left hand side presents a small summary of each registered device. When clicking one, the system shows detailed hard- and software results. In the future it could support some sort of remote test control: think of this as a request to a device to start the execution of a certain scenario. The browser (shown in figure 17) gives an analyst some extra insight into similar devices registered with tApp. This could increase awareness of them and improve ease of test inspection (G2.3 and G1.1) and the ability to see app behavior among a group of device types (G3.1). These requirements are explained in table 2.

Figure 17: Device browser

6.4 Report browser

In the report perspective a tester can see a scenario execution report for a single device. It is useful, however, to see a combination of devices and apps or projects. The report browser lets a tester browse the reports by selecting one device/project in combination with multiple projects/devices. This view is meant to present how a device runs all selected projects, or how a project runs on a set of selected devices. This covers requirements G3.1 and G3.2 from table 2. A developer can select one or multiple device(s) and project(s). This can, respectively, be achieved by clicking the blocks in the first and second sections of figure 18. The first two sections list all the devices and projects in the system. The only limitation of the selection is that a tester can compare only one item with a set: one project with a device set, or one device with a project set. Below the selection sections there is a summary of the selection. The blue 'generate report' button opens the report based on the analyst's selection. This supports the ability to see behavior of an app with respect to other device(s) and/or project(s), mentioned in table 2 as G3.1 and G3.2.

Figure 18: Select devices/projects

The browser then lists all scenarios and reports (in the top and left sections of figure 19) that were used to generate the overview. Besides that it presents the details of the test reports in three graphs (on the right hand side of figure 19). Above the graphs tApp shows the number of reports that were found matching a developer's selection. It furthermore shows how many succeeded and how many failed. It also shows the commands that were executed in all the reports matching the selection. Again the statuses of these commands are shown as: ok, failed or error.

Figure 19: Report browser overview

The first graph (a) in figure 20 shows the success rate of a project. It shows the executions of all scenarios of a single project on a set of devices, or the executions of a set of projects on a single device. The x axis shows the set of device types or projects. The y axis shows the number of executions on that device type or project. Each item on the x axis is split up into two components: successful and failed executions. The green bar shows successful executions, the red bar failed executions. This gives an overview of the general behavior of an app on many devices, or of many apps on a single device. When there is a large red bar and a small green one for a project or device type, it makes sense to investigate what causes the failures in these cases. The second graph (b) shows a so-called radar graph. The red line shows how a single selected device or project behaves with respect to a set of device types or projects. It shows the device types or projects that produce errors. The items around the radar are the selected set of device types or projects. The closer the red line moves toward an item, the more error prone that device type or project is; that means the more failed executions a test has shown using the selected projects or device. The third graph (c) shows the distribution of errors among the reports that exist for the selected combination of project and device types, or device type and projects. It shows how many errors occurred with these scenarios with respect to the total number of executions. This means that when a single project or device type is present as a large chunk in the pie chart, it is responsible for many failed executions. This helps an analyst to look at scenarios or device types that fail a lot and to point developers to flaws in their apps.

(a) Report success rate of item (device/project). (b) Error prone item (device/project).

(c) Error distribution over reports

Figure 20: Graphs in report browser

6.5 More complex examples

This section explains and elaborates on how to use tApp to get from nothing to a set of reports. This is illustrated using a couple of detailed examples. It also describes the problems encountered and other findings. We have sketched a basic overview of what we can expect of the system; now it is time to demonstrate this in more detail using the apps mentioned in the introduction of chapter 6. The next section (6.5.1) explains usage of the system in more detail using three more complex scripts and suites. These cases show how we get from the desire to test an app to an actual report, and show how tApp works in practice.

6.5.1 Experiment

For the detailed examples we defined three cases in the introduction of chapter 6. The three cases are tested on four devices. The Monkeytalk case is merely a demo showing that the Android client connected to tApp works. The Fit for Free and LG Klimaat cases are more complex examples using existing apps from Peperzaken. It is worth mentioning that nobody considered using Monkeytalk for testing during the development of these apps, so no account was taken of optimizing the apps for usage with Monkeytalk.

Figure 21: The devices used in this test.

The devices used in these three cases are summed up below. Each itemized element has the registered name and Android version behind it in parentheses. The reason for using three Samsung devices is that practice shows Samsung accounts for 15 of the top 20 devices measured to use the Peperzaken apps. The devices used (in figure 21 from left to right) are:

• HTC (Nexus One, Android 2.3.6)

• Samsung Galaxy Ace (Gt-S5830, Android 2.3.5)

• Samsung Galaxy S4 (Gt-9505, Android 4.2.2)

• Samsung Galaxy Tab 2 (Gt-P3110, Android 4.1.1)

6.5.2 Explanation of cases: 'Monkeytalk demo', 'Fit For Free' and 'LG Klimaat'

For the Monkeytalk demo case we first log in, verify that we get a 'hello, ' message and then log out. We then go to the forms tab and push the elements there. In the hierarchy tab we scroll to Zirconium, tap it and verify that it is ZR and element 40. We finally get back to the login screen. This case is merely to show that tApp is able to communicate and run scenarios on a device. The very first scenario here contained usage of the Scroller component in Monkeytalk. However, there is a bug in this component at the moment of writing, so the part about Zirconium was taken out. Secondly, for Fit for Free we have defined a simple scenario. A user logs in to the app. If this succeeds, a verification screen is shown with more data tied to the logged-in account: the pass number of the user, his email address and mobile phone number. When this is done we reach the dashboard. On the dashboard the user taps the profile button. Here the user verifies his membership data again. He now taps back to return to the dashboard. When this is done he/she logs out. Finally, for LG Klimaat we use two scenarios. A user first registers with his client number. He then has to provide some extra subscription data. Finally he logs out. This first scenario shows how the login process works. For the second scenario we use the precondition that a user is logged in and registered. A user now wants to know something about error codes 155 and 115. He first wonders how he can find an error code and presses "Handleidingen", or "Manuals" in English. He then verifies that he indeed reached the manuals page he chose. He taps "Open parameterlijst" but sees nothing that helps him and taps back. Now he/she enters the error codes he found in the text box and taps "Oplossen", or "Solve" in English. He enters both error codes and reads the data that he gets. He then pushes back and closes the application.

6.5.3 Monkeytalk demo

The scenario for the Monkeytalk demo described in the previous section (6.5.2) is a four-step scenario:

• Login using username and password

• Verify 'hello, ' and Logout

• Open the forms tab and interact with some of the elements

• Back to the login tab

(a) Login view (b) Forms view

Figure 22: MT App example

Specification For the first step we need to translate this to a scenario that tApp understands. This will be a suite that runs the four scripts described above. These scripts will be called: Login, VerifyWelcomeAndLogout, Forms and BackToLogin. The suite that runs all of them will be called doMT. We first specify all these scripts and one suite to run them all in the Monkeytalk language. This leads to the scripts and suite in appendix B.1. Here we notice the first difficulty with tApp. We have not developed this app ourselves, and therefore do not always immediately know which component of a view is shown on screen. The forms page in part (b) of figure 22 shows a set of radio buttons. In Monkeytalk this can be any number of things. The documentation [?] mentions a RadioButtons component; this, however, is not recognized by Monkeytalk at execution. When we use the ButtonSelector component type, we do execute the actions we specified.

Distribution For the distribution part we only need to specify a binary (or .apk file in this case). This can be achieved by simply clicking ’Add app binary’. The system then shows a file tree in which a developer can browse the files on his device and select the appropriate binary.

Figure 23: Distribution perspective.

Reporting The test runs on all four devices went well. However, there were issues with one command: verifying the 'Hello, ' label turned out to be problematic. This was not by accident: the app shows 'Welcome, ' instead of 'Hello, '. The responses we got from the four devices were different:

• 'Label logout_txt VerifyWildcard value: Expected 'Hello, *' but found 'Welcome, Karsten!'

• 'Label myTitle VerifyWildcard value: Expected 'Hello, *' but found 'Login''

• 'Label VerifyWildcard value: Expected 'Hello, *' but found 'Logging in…''

The last two are interesting since they show a completely different response. This can however easily be explained: when the login button is tapped, the script immediately verifies the text 'Hello, '. The login process however might not have finished yet, so the screen was still showing the login view, or a spinner with 'Logging in…' as text like the last bullet shows. Luckily, Monkeytalk offers timeout and thinktime parameters that can easily tackle this issue. This gives us the extra control over the execution process that we apparently need. Figure 24 shows one of the reports executed on the 'Samsung Galaxy Tab 2'.

Figure 24: Report from Samsung Galaxy Tab 2.

Browse report overview Another report view that tApp offers is the overview of test executions on multiple devices. For the Monkeytalk demo we can see the graphs it creates in figure 25. The first thing we notice in figures (b) and (c) is that we have 100% failure of all the reports we executed. We already identified that we got a different response on the welcome page than we expected, so that explains the failures. The overall notion here is that specifying a scenario is rather complex. The thinktime and timeout settings necessary for a scenario to succeed differ per device. That is not strange, but it makes general scenario specification difficult. The radar graph in part (c) of figure 25 shows again that all reports yield a negative result. The fourth view (d) in figure 25 shows that apparently all reports are equally error prone. This view shows the distribution of errors over the different scenario reports. In this example all reports failed; the pie graph shows this as 25% error for each scenario.

(a) Pick four devices and the monkeytalk demo project (b) Report success rate

(c) Error prone devices (d) And percentage of error in scenarios

Figure 25: Report browser with MonkeytalkDemo as project and the four devices

6.5.4 Fit for Free

This scenario shows how tApp and Monkeytalk handle verification of values of components in the Fit for Free app. The scenario described for Fit for Free is a five-part scenario:

• Login with pass number and postal code

• Verify known data

• Verify profile data in the profile view

• Go back to the dashboard

• Log out

Specification This scenario shows some difficulties with the script language. If we look at the third step, verify profile data, we see a problem. In figure 26 we can see a "Mijn profiel" or "my profile" button. When we specify the Monkeytalk command Button "Mijn profiel" Tap, we get the message that it is unable to find it. How does one tackle this issue when he/she is not the developer of the app? That is rather difficult. This can be solved either by recording from a device and letting that device take care of component identification, or by listing the components in some sort of graphical editor.

(a) The Fit for Free dashboard (b) Verification issue

Figure 26: Fit for Free examples

Distribution The distribution part is easy again. Just provide the Fit for Free binary and we are ready to go.

Figure 27: Fit for free distribution perspective

Reporting Fit for Free shows us a difficulty that we have seen before. The script we specified for verifying the profile has quite some "difficult" statements in it. Look at: Label ProfileOverviewDataCellphoneTextView Verify 0612345678. We are looking for a label component with ID ProfileOverviewDataCellphoneTextView. It is impossible for a non-developer to know this if he cannot see it somewhere while specifying a script or suite. This is quite a flaw in the system. It could be tackled by keeping Monkeytalk in mind when developing an app: a developer could use simpler IDs for the components in a view. It is worth mentioning that in this case no consideration was given to optimizing for Monkeytalk usage. Had this been done, specification would have been easier. It would however still require a developer to communicate this to a tester without knowledge of the internals/source code of the app under test. The issue can also be seen in figure 26(b).

Figure 28: Issue with Fit for Free

Browse report overview The Fit for Free app shows us a couple of issues. Figure 29 (a) shows that the 'Galaxy S4' and 'Galaxy Tab 2' are devices that give a lot of errors compared to the number of successes. The radar plot in figure 29 (b) also shows that there are two devices that have issues running the scenarios. The fact is that the two devices with issues are the faster devices when looking at hardware. The verification script states that verifying a pass number can be done with the command Label Pasnummer Verify . The slower devices are still busy loading data and actually show the label with name pasnummer. The faster devices however load this data almost instantly. This leads to the failure found in figure 28. This can again be solved by using a thinktime or timeout parameter. However, that still raises a question: how do we then verify the subscription number? The text on the label is gone on faster devices, and if we wait longer, it is gone on all devices. We could choose to verify it on label *, which would verify it on the first label it finds. This could work, but the welcome text above it probably also consists of labels. We could specify an index with format #. But how do I know I should use the label at index #0 and not another? Issues like this could be reduced by taking Monkeytalk optimization into account: if the IDs of components are easy and obvious, this issue would occur far less often than it does now.

(a) Report success rate (b) Error prone devices

(c) Distribution of error in scenarios

Figure 29: Report browser with FitForFree as project and the four devices

6.5.5 LG Klimaat

As mentioned before, the LG app is split up into a two-part scenario: Login and Solve. Scenario 2 is also executed directly after scenario 1 as one larger scenario. The login scenario contains two parts:

• Enter client number

• Provide additional data

The second scenario is described using two parts. The first scenario is actually a precondition for the second, but because of a bug in this version of the LG app we can show a case in which big errors can be illustrated.

• Get manual info and open the parameter list

• Solve two errors that do exist

This scenario shows the reusability of scripts with the Solve example: it reuses a script that opens a solution in the app, with a reaction to it based on the success. The scripts corresponding to the items above can be found in appendix B.3.

Specification We first specify the following five scripts for the scenarios: Login, ProvideData, GetInfo, Solve and VerifyGoodSolution. The thing to mention here is the Solve script (Algorithm 14 in appendix B.3). It uses a variable, code, whose value is entered in the error code text box. The suite to run this scenario can be found in appendix B.3. What we can see here is that the Solve test is reused twice. The only thing that differs is the reaction with a VerifyGoodSolution. The contents of the scripts can be found in appendix B.3.

Distribution The distribution is currently the easiest part of the process. Simply provide a binary and everything is ready. A close-up of the added binary and how tApp presents it can be found in figure 30.

Figure 30: Distribution for LG Klimaat

Reporting We found a problem with the app that tApp would have found if it had been used in practice. We specified that after login a user taps "manuals" and then searches for a solution. This works flawlessly on a Galaxy S4, as seen in figure 31a. The other devices however appear to fail after the activation part. Looking at figures 31b and 31c, we can see that the test run fails right after the activation part. After talking to a project manager at Peperzaken, this appears to be a known issue with registration. Registration uses an email address as a unique ID. We tried to register with the same email address on each device; that succeeded once, but failed three times. A case in which tApp proved able to identify issues when they arise.

60 (a) Successful on Galaxy S4

(b) First failure on Galaxy Tab 2

(c) Screenshots with the issue

Figure 31: LG Klimaat: login issue

Browse report overview The report browser confirms the outcome of the LG test executions. The success rate graph indeed shows that one device has a successful scenario while the rest of the tests fail. The error-prone-devices graph clearly shows 100% failure on three of the four devices. There are three reports responsible for the errors that occurred with the executions in this project. The graphs can be found in figure 32 below.

(a) Report success rate (b) Error prone devices

(c) Distribution of error in scenarios

Figure 32: Report browser with LG as project and the four devices

7 Discussion

We illustrated tApp's functionality with some detailed examples in the previous chapter. With that we mentioned which requirements the implementation choices cover; recall chapter 1.2 and table 2, which itemize the goals that tApp strives to achieve. When looking at the development process of tApp we have identified some challenges and issues that we encountered along the way. However, tApp also offers some interesting improvements with respect to other existing tools as described in chapter 2. Some parts of the process went surprisingly well; think about communicating with an app in the background on a device. Another rather challenging part was patching Monkeytalk so that it would support the indirection we wanted between device and app. Exporting commands instead of sending them over a network proved to be challenging. In the next section we will discuss in some detail what requirements we covered with our solution. Because of time constraints and the amount of technical challenges, some features that proved difficult to realize were skipped. More on this in the next sections.

7.1 Challenges

The implementation of tApp currently supports the largest part of what we described in the previous chapters. tApp is capable of creating projects, creating and specifying a test and exporting it as a test package. The device package is capable of obtaining a test package and executing it. The reports are stored in the distribution. Finally, the report viewers present individual reports. The most difficult challenges were in the specification and reporting part of tApp. One challenge was to get the reporting service consistent and easy to use; ease of test inspection (goal G1 in chapter 1.2) was difficult to implement correctly. Running commands from a background service appeared very challenging. Another challenge was test execution with an app: in particular, preparing an app is not always obvious. Goals G4 and G5 from chapter 1.2, ease of test execution and scalability, were not always straightforward.

7.1.1 Specification

One thing that makes usage of tApp in a production setting rather "difficult" is the fact that a tester really needs internal knowledge of how the views in an app are built, because the system has to be able to identify components in an app. This worked rather well out of the box for the case in chapter 6.5.1, but it requires a developer to program with easy and obvious component IDs in mind. The initial idea behind tApp was that a tester with less development knowledge would be able to specify a test (goal G4.1 in table 2); the editor, however, needs more guidance than it offers now. On the other hand, if a developer uses simple and predictable component IDs, scripts become really easy to create and reusable. This could even be supported by analyzing a view and telling a user how it is constructed, and we could argue that taking the burden of component identification away from the tester would improve tApp's specification perspective significantly. Recording from a device would also be a better way to specify a test without knowing the app. However, the default recording from a device offered by monkeytalk was already rather sensitive to the network connection: recordings did not arrive at all, and even sending a PING command to the app would not trigger recording in the app. Because of this, coverage of goal G4 from chapter 1.2 is rather difficult and complex. This could be greatly improved in a future version.
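
The sketch below (Java, not taken from tApp's code base) illustrates what this asks from a developer: every interactive view gets a stable, human-readable identifier that a script can address by name. The helper and the identifiers are hypothetical examples, and the assumption that an agent falls back to a view's content description or tag for its component name follows the general MonkeyTalk approach rather than a verified rule.

import android.view.View;

public final class ComponentIds {

    private ComponentIds() {
    }

    // Attach a stable name to a view. Agents in the MonkeyTalk style usually
    // resolve a component name from the resource ID, content description or
    // tag, so setting the latter two keeps scripts readable and reusable.
    public static <T extends View> T name(T view, String componentId) {
        view.setContentDescription(componentId);
        view.setTag(componentId);
        return view;
    }
}

With identifiers like these, a script line such as "TextArea login_username EnterText Karsten" stays valid across releases, as long as the developer keeps the identifier stable.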

7.1.2 Validation of specification

As explained in the previous section, it is difficult to recognize UI components for a specific app. When an app evolves, the test specification has to evolve with it, and since a tester and a developer do not have to be one and the same person, it is rather challenging to keep the specification in sync with the expectations that a developer and a tester have of it. A possible improvement is a verification method that checks a specification against an app. This might be realized by using the simulators or emulators that all of the largest platforms offer; these assisting development tools are generalizations of the device types that exist on the market. Another solution is to remove the archiving feature from tApp, so that a developer and a tester have to create a new specification for each change in an app. This creates awareness of which scenarios have which effects with respect to a certain version of an app. The final conclusion here is that specification is hard, and having a solid specification that can handle the evolution of an app is even harder. tApp could benefit from research in this field.

7.1.3 Portability of specification

The tools provided by Monkeytalk offer portability of a test specification to multiple platforms. There are techniques to analyze and interact with the user interface of an app, but these techniques differ per platform and are somewhat difficult to apply across platforms: they assume that apps are developed in roughly the same way, and recognition of UI elements remains hard. A further challenge is the large diversity of device types that exists for certain platforms; these device types differ in processing power, screen resolution, software support and more. What we are trying to point out here is that "specify once, execute everywhere" works in theory, but in practice it is rather difficult. The device types, the platforms they run on and the way an app is developed all contribute to the large number of variables that can make execution of a test complex. A specification created for device X works on device Y in theory, but in practice there are no guarantees. Simplifying test scenario portability would greatly improve the usability of tApp and is something to consider when continuing its development.

7.1.4 Reporting

A discussion the team at Peperzaken struggled with for a while is what kind of report is needed to satisfy a project manager and/or client. tApp in its current form offers some basic results that were well received, although there are some thoughts about improvements. They particularly liked seeing what went well or badly, together with the screenshots that accompanied erroneous results; this covers goal G1 from chapter 1.2, and in this respect tApp stands out among the other tools mentioned in chapter 2. It gave them an idea of what failed and needed attention. Another welcome feature was the graphs that show behavior among the registered devices, which also contributes to goal G1 on ease of result inspection from chapter 1.2. Test inspection in the other tools from chapter 2 offers reports for an execution of a test on a single device; tApp, however, gives stability information on a range of (similar or different) devices, a rather welcome feature. Again recall the fragmentation issue we introduced in chapter 1: tApp does not solve it, and solving it is not in tApp's scope, but it does provide useful insights into it. The team would also have liked a remote trigger for the execution of a scenario, so that they could ask the device package to execute a specific test that appears to show faulty behavior. This would be a nice feature for the future.

The pie plot in the report browser was less useful in practice. Instead of showing the distribution of error reports with respect to the total number of errors, it should show the distribution over scenarios, and fixed colors (shades of red, for example) would improve this graph compared to the current random colors. I agree with that: it was a strange design choice on my side, but also one that is easy to fix.
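
A minimal sketch of that fix, assuming a simplified report model (the Report class and its fields below are hypothetical placeholders for tApp's actual model):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScenarioErrorChart {

    // Hypothetical stand-in for a tApp report: which scenario ran and whether it failed.
    public static class Report {
        final String scenario;
        final boolean failed;

        public Report(String scenario, boolean failed) {
            this.scenario = scenario;
            this.failed = failed;
        }
    }

    // Fixed palette of red shades; the position of a scenario decides its shade,
    // so the same scenario always gets the same colour in every chart.
    private static final String[] RED_SHADES = {"#8B0000", "#B22222", "#DC143C", "#E9967A"};

    public static Map<String, Integer> failuresPerScenario(List<Report> reports) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Report report : reports) {
            if (!report.failed) {
                continue;
            }
            Integer current = counts.get(report.scenario);
            counts.put(report.scenario, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static String colourFor(String scenario, List<String> scenarioOrder) {
        int index = Math.max(scenarioOrder.indexOf(scenario), 0);
        return RED_SHADES[index % RED_SHADES.length];
    }
}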

7.1.5 Beyond reporting

A developer has created an app that should be tested and a tester needs to test it. Currently tApp shows which scenarios succeed and which fail, and behavior across devices can be inspected. Thinking a step beyond success and failure, it would be welcome if a developer and/or a tester could observe where things go wrong in the application's logic. This could be achieved by presenting a stack trace of the execution of the app; how to obtain this data matters less than the fact that presenting such context alongside a failure makes the failure considerably easier to understand. In a future release of tApp it would be nice to research how to present and visualize app-logic context for a certain scenario. This would greatly improve the usability of the reports that tApp offers.
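
One conceivable way to collect such context, sketched below under the assumption that the app under test may write a file the device package can later attach to a report (the file name and the hand-off to tApp are hypothetical):

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class CrashContextRecorder {

    // Install an uncaught exception handler in the app under test that writes
    // the stack trace to a file, so a failed scenario can be shown together
    // with the app logic that broke.
    public static void install(final File outputDir) {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();

        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread thread, Throwable error) {
                try {
                    PrintWriter out = new PrintWriter(
                            new FileWriter(new File(outputDir, "last-crash.txt")));
                    error.printStackTrace(out);
                    out.close();
                } catch (IOException ignored) {
                    // Recording context must never mask the original failure.
                }
                if (previous != null) {
                    // Let the platform's default crash handling continue as usual.
                    previous.uncaughtException(thread, error);
                }
            }
        });
    }
}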

7.1.6 Running from background

This seemed a really challenging problem when I first thought of it. A device is guarded by several permissions when installing apps or accessing the content of a phone, so my fear was that sending network requests to an app from the background would be impossible because of some permission that does not allow it. That fear turned out to be completely unfounded: it worked flawlessly on the first try and I never had any real issues with it again. The only problems I did have were caused by my own carelessness with the background service in Android.
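
A minimal sketch of such a background service, assuming the agent in the app under test listens on a local HTTP port. The port number and the plain-text payload are illustrative only (the real agent speaks the monkeytalk wire protocol [20]), and the service still has to be declared in the Android manifest.

import android.app.Service;
import android.content.Intent;
import android.os.IBinder;

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CommandService extends Service {

    @Override
    public IBinder onBind(Intent intent) {
        return null; // started service, not a bound one
    }

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        // Network calls may not run on the main thread, so hand the work off.
        new Thread(new Runnable() {
            @Override
            public void run() {
                sendCommand("Button LOGIN Tap");
            }
        }).start();
        return START_STICKY;
    }

    private void sendCommand(String command) {
        try {
            HttpURLConnection connection = (HttpURLConnection)
                    new URL("http://localhost:16862/").openConnection();
            connection.setDoOutput(true);
            OutputStream out = connection.getOutputStream();
            out.write(command.getBytes("UTF-8"));
            out.close();
            connection.getResponseCode(); // wait for the agent's reply
            connection.disconnect();
        } catch (Exception e) {
            // In tApp a failure here would end up in the report for this command.
        }
    }
}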

7.1.7 Test execution and app preparation

Monkeytalk offers support for the largest operating systems on the market, and tApp uses this to cover requirement G5.2. For tApp to support a platform, the only thing necessary is a device package for that platform that can communicate with the agent in the app under test. To make it possible for tApp to communicate with an app, however, the app needs special preparation. This is rather easy and well explained in [19], and for the cases described in the previous chapter it worked without much effort. The initial idea was to use two frequently used apps from Peperzaken, but the tablet app for local news stations could not be used in tApp. Looking at the app in figure 33, we see a screen that contains three subviews; under the hood they are implemented as three screens in one window, so to speak. When we install the agent in the app and try to run a scenario, we get an exception stating that the agent cannot locate the root view. This is an implementation choice of the app, and one of the issues with monkeytalk in particular. It is out of scope to tackle this issue now, but it is quite a major one.

Figure 33: RTV Noord HD

7.1.8 Settings

Because of time constraints and the difficulty of estimating the workload, I decided to drop this feature. Device sensor settings are not that easy to change: the major ones like WiFi, 3G and GPS are manageable, but others are more difficult. There are ways to work around this. A possible solution is to let the device package provide a list of required settings and refuse to run until a tester has physically changed them all in a settings screen. The iOS platform by Apple is even more restrictive in automatically changing settings: it simply won't let us switch sensors on or off without the user's consent. The overall conclusion was to skip the feature, even though it would be really nice to have.
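
A minimal sketch of that fallback, assuming a simplified list of required settings; which settings a test package can actually demand is a simplification here, and reading the WiFi and location state needs the corresponding Android permissions in the manifest.

import android.content.Context;
import android.location.LocationManager;
import android.net.wifi.WifiManager;

public class RequiredSettingsCheck {

    // Instead of toggling sensors itself, the device package checks the settings
    // a test asks for and refuses to start the scenario until they are correct.
    public static boolean readyToRun(Context context, boolean needWifi, boolean needGps) {
        WifiManager wifi = (WifiManager) context.getSystemService(Context.WIFI_SERVICE);
        LocationManager location =
                (LocationManager) context.getSystemService(Context.LOCATION_SERVICE);

        boolean wifiOk = !needWifi || wifi.isWifiEnabled();
        boolean gpsOk = !needGps || location.isProviderEnabled(LocationManager.GPS_PROVIDER);

        // The device package would list the settings that are still wrong and
        // ask the tester to change them by hand before running.
        return wifiOk && gpsOk;
    }
}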

7.1.9 Record from device

As mentioned in section 7.1.1, support for specification across multiple platforms is difficult. Recording gestures from a device would provide improved coverage of requirement G4.1 in table 2. Monkeytalk offers an Integrated Development Environment (IDE) that can record commands from a device type over a network connection, but in the end this is quite sensitive to that connection. The logs we inspected on a device type tell us that the record command reaches it, yet we never get a record status or even a single command that the device type recorded. The command line client might offer a better solution, but we have made no attempt to investigate how it works or whether it is more stable than the IDE, which probably uses the command line client under the hood. Because of the poor performance and results, the feature was moved to a future version. We thereby sacrifice some quality in the coverage of requirement G4.1 from table 2.

7.1.10 Equivalence classes and behavior prediction

Chapters 5.2.3 and 5.4 show that we needed to store some device-specific information. In chapter 5.2.3 we argued that soft- and hardware information is necessary to identify a device type and to identify equivalence between device types. This, however, turned out to be very difficult to achieve. The Android Build API [14] offers data about device traits, but the only things we can reliably get are the platform and version of the device type, for example Android 2.3.3. Whether a customized skin by HTC, Samsung or another manufacturer is installed is not easily retrievable: Android simply doesn't know this if a manufacturer does not provide it, and Google does not oblige them to. The hardware traits are equally difficult. For the processor we can get the number of cores, but information about processor speed is again restricted from Android's perspective, and dedicated video memory is not even present in most devices on the market. Screen information such as resolution can be obtained, but the density value that Android uses is not a decisive factor for the resolution or for saying something sensible about the screen. Available memory is also ambiguous: the system can report the amount of memory, but this is the memory available at that moment, not the total amount of physical memory. Combined with the limited number of relevant test devices, this led to the decision to make this a future improvement. It was supposed to be a useful and important feature, but implementing a solution of sufficient quality would take a lot of time, and it is not obvious that we can retrieve all the information a tester needs to say something sensible about equivalence (chapter 5.4) and behavior prediction (chapter 5.4.1). The conclusion is that Google gave manufacturers too much freedom in specifying platform information, which led to a compromise in the coverage of requirement G4.1 from table 2.
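
The sketch below illustrates which traits can be read and where the gaps are; it is not tApp's device package code, but it only uses standard platform calls for the traits discussed above.

import android.app.Activity;
import android.app.ActivityManager;
import android.content.Context;
import android.os.Build;
import android.util.DisplayMetrics;

public class DeviceTraits {

    public static String describe(Activity activity) {
        DisplayMetrics metrics = new DisplayMetrics();
        activity.getWindowManager().getDefaultDisplay().getMetrics(metrics);

        ActivityManager.MemoryInfo memory = new ActivityManager.MemoryInfo();
        ActivityManager activityManager =
                (ActivityManager) activity.getSystemService(Context.ACTIVITY_SERVICE);
        activityManager.getMemoryInfo(memory);

        return "manufacturer=" + Build.MANUFACTURER
                + " model=" + Build.MODEL
                + " android=" + Build.VERSION.RELEASE               // e.g. "2.3.3"
                + " cores=" + Runtime.getRuntime().availableProcessors()
                + " resolution=" + metrics.widthPixels + "x" + metrics.heightPixels
                + " densityDpi=" + metrics.densityDpi
                // Memory available right now, not the device's physical total;
                // skins, CPU speed and video memory have no reliable API at all.
                + " availMem=" + memory.availMem;
    }
}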

7.2 Improvements

Looking at the cases explained in chapter 6, we must conclude that tApp works rather well. There are some things to work out before it would be useful in a production environment, but with respect to the tools we examined in chapter 2, tApp offers some extra features that are quite useful in practice. In this section we briefly discuss the goals we set for tApp in chapter 1.2. In each of the subsections below we elaborate on how tApp covers the requirements in table 2 that were obtained from these goals.

7.2.1 Ease of test result inspection (G1)

An improvement over the other tools we examined is the ability to view reports across devices. Many of the tools currently on the market only present a report of a test execution on a single device type. The report browser in tApp is able to present reports on a user-provided set of device(s) and project(s), as explained in chapter 5.3. Because of this feature, tApp offers good coverage of requirements G1.1 and G1.2 in table 2.

7.2.2 Level of detail (G2) and Queries (G3)

Section 7.1.8 mentioned that changing device type settings was moved to a future version. This means tApp does not cover requirement G2.1 from table 2. In turn, however, there is excellent support for requirement G2.2: the report browser gives a tester or analyst a simple report of behavior on a selected set of device(s) and project(s). Based on the graphs in the report browser and the presentation of a report on a single device, it is quite possible to see whether or not an app is stable enough to be released to the public, as mentioned in requirement G2.3 in table 2.

Support for viewing differences between different versions (as explained in chapter 4.1.1) is not implemented; this is something for a future version. The report browser described in chapters 6.4 and 6.5 could use some improvements. It offers decent support for requirement G3.1, and the third graph shows the reports that cause the errors, but the graphs could benefit from some sort of navigation to the actual reports to see what they encompass. The scenario and report lists could benefit from such navigation as well.

7.2.3 Ease of test execution (G4)

Programming an app requires some consideration to optimize views for monkeytalk. Specification is sometimes difficult when someone does not know how the internal structure of a view is defined; guessing component types is difficult, if not impossible. Distribution is easy, although it would be nice to connect existing binaries to multiple scenarios instead of constantly uploading a new one.

7.2.4 Scalability (G5)

Monkeytalk offers scripting support for execution across multiple platforms, and tApp uses this feature to make cross-platform scripting possible. The only requirement is a device package for the platform under test. Compared to the original monkeytalk that Gorilla Logic offers, this is a little more work because we have to implement the device packages that communicate with tApp. Connecting to an IDE through a network connection is less complex, but in the end it is limited to the range of that network connection, whereas tApp is not. Requirement G5.2 is therefore somewhat more complex to cover, but distributing a test among registered devices is rather easy. This covers requirements G5.1 and G5.3 from table 2.

7.3 Summary

In this discussion we analyzed which of the goals from chapter 1.2 and which requirements from table 2 we covered. We based this discussion on the design in chapter 4 and the implementation choices from chapter 5, and we illustrated how tApp should be used in chapter 6. In the next chapter we conclude how tApp behaves with respect to the existing tools we examined in chapter 2.

8 Conclusion

In the previous chapter we discussed the design of tApp as a tool for testing mobile apps, and its pros and cons with respect to the existing tools explained in chapter 2. In chapter 5 we implemented the design proposed in chapter 4 and showed how to use it in chapter 6. Looking back at the development and research process needed for tApp as it is now leaves mixed feelings. It was a lot more challenging than anticipated beforehand: some parts of the implementation went really smoothly, while other things proved more difficult. It requires many behavioral changes in developing an app and specifying tests for it. It feels strange to have to think about testing while developing an app, but if a developer doesn't, testing becomes very difficult across multiple platforms. Testing should always be in the back of a developer's mind, yet changing development behavior seems awkward when thinking about it. One could argue it is necessary, but it is also quite radical if it is not a "standard" part of a company's business process.

8.1 Specification

The examples in chapter 6.5.1 (in particular Fit for Free and LG Klimaat) about specifying tests for an app show that one of the initial ideas, namely that less technical people can specify a scenario, might be more difficult to realize in a development process. They can write scenarios on paper, but making them work with tApp in the monkeytalk language is another issue. When considering implementation across multiple platforms and the communication between the development teams for those platforms, it becomes painfully obvious that this is challenging. It would mean that we have to simplify view component identifiers and, more importantly, synchronize them between development teams for different platforms. This requires quite a behavioral change that, even if we could introduce it in a company, might take time, and the tests with the tools provided by Monkeytalk already show questionable behavior. With respect to the other tools examined in chapter 2, tApp offers quite a simple language to specify scripts: it is easier than, for example, monkeyrunner, uiautomator and robotium, which require programming experience. tApp makes excellent use of the scripting features borrowed from Monkeytalk, and when looking at specification it is quite easy to use compared to other tools. Telerik, however, offers recording from a device; since monkeytalk offers this too, it is quite possible to imagine recording support for tApp in the future.

8.2 Distribution

A very positive feature, and one of the most important goals for tApp, is the ease of distribution of tests. It is really easy to publish a test to a device, run it and get a report. I had anticipated that this would become a very difficult issue, but it worked in a flash. What proved challenging to get right was the actual execution of a test in the background: how do we send the commands and how do we store the responses we get?

8.2.1 Settings

Investigation of the Android platform showed me that identifying in software which sensors a device supports is difficult. Internet connection, GPS and such are rather mainstream, but changing all of them is rather restricted. It would be a really nice feature to have, yet a description attached to a scenario, telling the user to switch a sensor on or off, could also give a developer the customizability of the device he runs on.

iOS, however, won't let a device do this without a user's consent. I decided to skip this feature because of the potential difficulties in getting it right. It is a nice-to-have for identifying and fixing concrete issues with these sensors and the features that depend on them, but it was not an essential feature in my opinion.

8.3 Reporting

tApp is really strong in the reporting field. Other existing tools offer reports of an execution of a test on a single device; these reports range in quality from a list with checkmarks for success and failure to a list of commands with screenshots of the device type on failure. However, none of these tools actually shows behavior among a set of device type(s) and app(s). tApp offers this feature in some detail, although it could benefit from some improvements in comparing tests among similar device type(s) and/or project(s).

8.4 Summary

Recall from chapter 3 that we introduced a refined research question:

How can we provide insight into the test results of a set of apps on a set of device types?

We proposed tApp as an answer to this question. Specification is general enough to support multi-platform execution, although this is somewhat difficult with respect to UI component registration and could be simplified by implementing recording from a device. Distribution of the test and of the app binary to test is simple, but preparing the app with the monkeytalk agent requires some experience. Finally, tApp offers simple and easy-to-read reports of a single execution on a device type. A great improvement in tApp is that an analyst can select a device/project and a set of device(s)/project(s), and with this compare executions among different device type(s) and/or project(s). We have not yet created the perfect test tool, but tApp, and testing frameworks like it in general, look promising. Pulling testing away from someone with internal knowledge of how the views are built remains a questionable idea, and usage of tApp showed this as well. The editor could be improved with respect to automation of UI component registration, although this seems difficult when looking at multiple platforms. The reports that we get from monkeytalk are rather high level. It is possible to create custom actions and program them in an app; think of an action that checks a user's settings for the occurrence of a value and verifies whether it is what we expected. Whether this gives significantly more insight is, however, not apparent. The final conclusion is that we can provide insight into execution among different device types, that distribution of tests is rather easy, and that the reports present details that help a developer, tester and analyst get an overview of how the fragmentation issue relates to their app.
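
A sketch of the Android half of such a custom check is given below; the preference file and key are hypothetical, and the wiring that exposes this as a custom monkeytalk action is deliberately left out.

import android.content.Context;
import android.content.SharedPreferences;

public class SettingsCheck {

    // Check a user setting for the value a scenario expects. A custom
    // verification action would report the actual and expected values on failure.
    public static boolean hasExpectedValue(Context context, String key, String expected) {
        SharedPreferences preferences =
                context.getSharedPreferences("app_settings", Context.MODE_PRIVATE);
        String actual = preferences.getString(key, null);
        return expected.equals(actual);
    }
}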

References

[1] K. Haller, Mobile Testing, ACM SIGSOFT Software Engineering Notes, pages 1-7, November 2013.
[2] S. Baride and K. Dutta, ACM SIGSOFT Software Engineering Notes, pages 1-4, May 2011.
[3] Y. Dubinsky and A. Abadi, Challenges and Research Questions for Testing in Mobile Development, report on mobile testing by IBM, 2013.
[4] Y. Ridene and F. Barbier, A Model-Driven Approach for Automating Mobile Applications Testing.
[5] Monkeyrunner for Android Developers, Google, 2013, accessed October 2013, http://developer.android.com/tools/help/monkeyrunner_concepts.html
[6] UIAutomator for Android Developers, Google, 2013, accessed October 2013, http://developer.android.com/tools/help/uiautomator/index.html
[7] Seetest mobile testing framework, Experitest, 2013, accessed October 2013, www.experitest.com
[8] Robotium, Google Code, 10-12-2009, accessed October 2013, https://code.google.com/p/robotium/
[9] Telerik Test Studio for iOS, Telerik, 2013, accessed October 2013, http://www.telerik.com/automated-testing-tools/ios-testing/ios-application-testing.aspx
[10] MonkeyTalk | Free, Open Source, Mobile App Testing Tool, Gorilla Logic, 2013, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk
[11] Recording from a device, Gorilla Logic, 2013, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-user-guide/command-recording
[12] Scripting in Monkeyscript, Gorilla Logic, 2013, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-language-reference/scripts
[13] Scripting in Javascript, Gorilla Logic, 2013, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-language-reference/scripting-javascript
[14] Android Build info, Google, 2013, accessed October 2013, http://developer.android.com/reference/android/os/Build.html
[15] Command reference, Gorilla Logic, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-language-reference/command-reference
[16] FFF (Fit for Free), Google Play Store, Peperzaken, 2013, https://play.google.com/store/apps/details?id=nl.fitforfree.serviceapp
[17] LG Klimaat, Google Play Store, Peperzaken, 2013, https://play.google.com/store/apps/details?id=com.lg.airco
[18] Install IDE, Gorilla Logic, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-getting-started/install-ide
[19] Monkeytalk agent usage, Gorilla Logic, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-getting-started/install-agent
[20] Monkeytalk wire protocol, Gorilla Logic, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-language-reference/monkeytalk-wire-protocol
[21] Monkeytalk Java runner, Gorilla Logic, 2013, accessed April to October 2013, http://www.gorillalogic.com/monkeytalk-documentation/monkeytalk-user-guide/java-runner
[22] Android background service, Google, 2013, accessed April to October 2013, http://developer.android.com/reference/android/app/Service.html

A List of requirements

This appendix lists all the requirements for the implementation; each requirement is shown with its priority in parentheses.

FRG-1 (High): The tool is split up in three separate subsystems/components to manage a project: 1. specification, 2. distribution, 3. reporting.
FRG-2 (Medium): Each component can be executed on a separate machine without the user noticing it.
FRG-3 (High): A developer can create a test script as a set of actions to be executed in sequence on a device.
FRG-4 (High): A developer can create a test suite as a set of scripts that can be executed in sequence on a device.
FRG-5 (High): A developer can create a project that stores: 1. a test suite, 2. device specific settings.
FRG-6 (High): tApp supports creating scenarios for Android.

A.1 Specification (FRS)

FRS-1 (High): A developer can specify a test script in a visual editor and in a script.
FRS-2 (Medium): A developer can specify a test suite, or scenario, in a visual editor and in a script.
FRS-4 (Low): A test script can be specified by recording the state of a connected device.
FRS-5 (Medium): A device can be connected to tApp through a network.
FRS-6 (Low): A device can be connected to tApp through a USB cable.
FRS-7 (Low): A developer can see in the visual or code editor the interaction with his/her device (if one is connected).
FRS-10 (High): The specification component gives the option to delay an action for a certain amount of time.
FRS-11 (High): The specification component gives the option to retry an action after a certain amount of time.
FRS-12 (High): The specification component gives the ability to set a timeout after which the monkeytalk runner stops retrying.

A.1.1 Distribution (FRDI)

FRDI-1 (Medium): A user can register multiple devices to his account.
FRDI-2 (Medium): A developer can specify sensor and connection settings for different test runs, like: 1. internet connection on/off, 2. GPS on/off, 3. Bluetooth on/off.
FRDI-3 (High): A developer can publish a test run that can be accessed by registered mobile devices.
FRDI-4 (Medium): A developer can request participation of a specific device type in a test run.
FRDI-5 (Low): A tester is able to enroll in specific device test runs when they have the device registered in tApp.
FRDI-6 (...): The distribution component can store test reports created by a device package.

A.1.2 Reporting (FRRE)

FRRE-1 (High): A tester can view test reports of the behavior of a specific app on all devices known to tApp which had a test run.
FRRE-2 (High): A user can view test reports of a specific device with all apps known to tApp which had a test run.
FRRE-3 (High): A tester can request a report about predicted behavior of a specified app on one of his/her registered devices.

A.1.3 Device package (FRDP)

FRDP-1 (High): A device package contains a component for: 1. collecting device info, 2. changing device settings, 3. collecting test results, 4. starting a scenario.
FRDP-2 (High): The device package should be able to install the binary of an app under test, which is specified in its settings.
FRDP-3 (Low): All cached and stored data created by the app under test can be removed by the device package.
FRDP-4 (High): The device package can register a device to an account.
FRDP-5 (High): The device package can analyze all the hard- and software in a device and store this with a device-to-account registration.
FRDP-6 (High): Test results are collected and sent back to the distribution unit.
FRDP-7 (Medium): The device package can do a test re-run with different device settings.

A.1.4 Test package (FRTP)

FRTP-1 (Medium): A test package contains the location of an app under test.
FRTP-2 (Medium): A test package contains a test suite with a different scenario.
FRTP-3 (High): A test package contains a URL with the location to store completed test results.
FRTP-4 (High): A test package contains a list with device settings for setting up a test.

A.1.5 Non-functional requirements (NFG)

NFG-1 (High): The distribution unit will not compromise sensitive device data to unknown third parties.
NFG-2 (Medium): The test report viewer will give an overview of the possible reports that a user can generate and view.
NFG-3 (Low): A report can be exported to portable formats, like PDF and/or HTML, that can be shared with users of tApp that do not have an account.
NFG-4 (Medium): A user can use tApp as a single tool that integrates the three parts without the user noticing it.

A.1.6 Evolution requirements (ERG)

ERG-1 (High): tApp should be able to offer support for "new" mobile operating systems in the future besides Android.
ERG-2 (High): All components should be portable and interoperable so that they can be migrated to a different machine without compromising tApp's overall performance and look and feel.

B Scripts/suites used

This part lists all the scripts that were used for the scenarios in chapter 6.5.1.

B.1 Monkeytalk demo

Algorithm 1 Login

TextArea username EnterText Karsten
TextArea password EnterText Westra
Button LOGIN Tap

Algorithm 2 VerifyWelcomeAndLogout

Label * VerifyWildcard "Hello, *"
Button LOGOUT Tap

Algorithm 3 Forms

TabBar * Select forms
ItemSelector * Select Nitrogen
CheckBox * Off
CheckBox * On
ButtonSelector * Select B
ButtonSelector * Select A
ButtonSelector * Select C
Slider * Select 75
Slider * Select 25
Slider * Select 50

Algorithm 4 BackToLogin

TabBar * Select login

Algorithm 5 doMT

Test LOGIN Run
Test VerifyWelcomeAndLogout Run
Test Forms Run
Test BackToLogin Run

B.2 Fit for free

Algorithm 6 Login

TextArea Pasnummer EnterText 12583950
TextArea Postcode EnterText 5343cd
Button Inloggen Tap

Algorithm 7 VerifyData

Label Pasnummer Verify 12583950
TextArea "Mobiele nummer" Verify 0612345678
TextArea Email Verify [email protected]
Button Overslaan Tap

Algorithm 8 VerifyProfileData

View DashboardProfileButton Tap
Label ProfileOverviewPasnumberTextView Verify 12583950
Label ProfileOverviewDataCellphoneTextView Verify 0612345678
Label ProfileOverviewDataEmailTextView Verify [email protected]

Algorithm 9 Logout

Device * Back
Button "Log uit" Tap

Algorithm 10 doFFF

Test Login Run %thinktime=2000
Test VerifyData Run %timeout=3000
Test VerifyProfileData Run
Test Logout Run

B.3 LG Klimaat

B.3.1 Scenario part 1

Algorithm 11 Login

TextArea Klantnummer EnterText 123456789
Button Activeren Tap
Button Ja Tap

Algorithm 12 ProvideData

TextArea Naam EnterText "Karsten Westra"
TextArea Telefoonnummer EnterText 0612345678
TextArea Emailadres EnterText [email protected]
TextArea Functie EnterText Developer
Button "Activatie afronden" Tap

B.3.2 Scenario part 2

Algorithm 13 GetInfo

View Handleidingen Tap
Label * Verify Handleidingen
View "Open parameterlijst" Tap
Device * Back
Device * Back

Algorithm 14 Solve

Vars * Define code
TextArea * EnterText ${code}
View SolutionsErrorCodeImageButton Tap

Algorithm 15 VerifyGoodSolution

# TODO: no clue how to verify...
Device * Back

Algorithm 16 doLG

Test GetInfo Run
Test Solve Run 115
Test VerifyGoodSolution Run 115 %thinktime=1000
Test Solve Run 155
Test VerifyGoodSolution Run 155 %thinktime=1000
