<<

Thesis no: MSSE-2016-14

Performance, Scalability, and Reliability (PSR) challenges, metrics and tools for web testing: A Case Study

Akshay Kumar Magapu Nikhil Yarlagadda

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information: Author(s): Akshay Kumar Magapu E-mail: [email protected]

Nikhil Yarlagadda E-mail: [email protected]

External advisor:
Saket Rustagi
Project Manager
Ericsson India Global Services Pvt. Ltd.
Gurgaon, India.

University advisor:
Michael Unterkalmsteiner
Department of Software Engineering

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

Internet : www.bth.se
Phone : +46 455 38 50 00
Fax : +46 455 38 50 57

Abstract

Context. Testing of web applications is an important task, as it ensures the functionality and quality of web applications. The quality of a web application is assessed through non-functional testing. There are many quality attributes, such as performance, scalability, reliability, usability, accessibility and security. Among these, performance, scalability and reliability (PSR) are the most important and most commonly used attributes in practice. However, very few empirical studies have been conducted on these three attributes.

Objectives. The purpose of this study is to identify metrics and tools that are available for testing these three attributes, and also to identify the challenges faced while testing these attributes, both from the literature and from practice.

Methods. In this research, a systematic mapping study was conducted in order to collect information regarding the metrics, tools, challenges and mitigations related to the PSR attributes. The required information was gathered by searching five scientific databases. We also conducted a case study to identify the metrics, tools and challenges of the PSR attributes in practice. The case study was conducted at Ericsson, India, where eight subjects were interviewed. Four subjects working in other companies (in India) were also interviewed in order to validate the results obtained from the case company. In addition, a few documents from previous projects at the case company were collected for data triangulation.

Results. A total of 69 metrics, 54 tools and 18 challenges were identified from the systematic mapping study, and 30 metrics, 18 tools and 13 challenges were identified from the interviews. Data was also collected through documents, from which a total of 16 metrics, 4 tools and 3 challenges were identified. Based on the analysis of these data, we formed a consolidated list of metrics, tools and challenges.

Conclusions. We found that the metrics available in the literature overlap with the metrics used in practice. However, the tools found in the literature overlap only to some extent with practice. The main reason for this deviation is the limitations identified in the tools, which led the case company to develop its own in-house tool.

We also found that the challenges partially overlap between the state of the art and the state of practice. We were unable to find mitigations in the literature for all of these challenges, and hence further research is needed. Among the PSR attributes, most of the literature addresses the performance attribute, and most of the interviewees were most comfortable answering questions related to performance. Thus, we conclude that there is a lack of empirical research on the scalability and reliability attributes. Our research deals with the PSR attributes in particular, and there is scope for further work in this area: the study can be extended to other quality attributes, and the research can be carried out on a larger scale (considering a larger number of companies).

Keywords: Web applications, Web testing, Performance, Scalability, Reliability, Quality.

Acknowledgments

We would like to thank our supervisor Michael Unterkalmsteiner for his tremendous and quick support whenever needed. We also thank Ericsson for providing us the opportunity to conduct the case study, and the interviewees from other organizations for participating in the interviews. Special credit goes to our family and friends for providing the support needed to complete this thesis.

The authors

Contents

Abstract i

Acknowledgments iii

1 Introduction 1 1.1 Web testing ...... 1 1.1.1 Functional testing ...... 2 1.1.2 Non-functional testing ...... 2 1.2 Problem statement ...... 3 1.3 Thesis structure ...... 3

2 Background and Related Work 5 2.1 Web applications ...... 5 2.2 Web testing ...... 7 2.2.1 Functional testing ...... 8 2.2.2 Non-functional testing ...... 10 2.3 Selected attributes ...... 13 2.4 Research scope ...... 14 2.5 Related work ...... 15 2.5.1 Literature related to metrics ...... 15 2.5.2 Literature related to tools ...... 15 2.5.3 Literature related to challenges ...... 16 2.5.4 Research gap ...... 17

3 Method 18 3.1 Research purpose ...... 18 3.1.1 Objectives ...... 18 3.2 Research questions ...... 19 3.2.1 Motivation ...... 20 3.3 Research method ...... 20 3.3.1 Systematic mapping study ...... 21 3.3.2 Case study ...... 31 3.4 Data analysis ...... 36 3.4.1 Familiarizing yourself with the data ...... 36

3.4.2 Generating initial codes ...... 37 3.4.3 Searching for themes ...... 38 3.4.4 Reviewing themes ...... 38 3.4.5 Defining and naming themes ...... 39 3.4.6 Producing the report ...... 39 3.5 Validity threats ...... 39 3.5.1 Construct validity ...... 39 3.5.2 Internal validity ...... 40 3.5.3 External validity ...... 40 3.5.4 Reliability ...... 41

4 Results and Analysis 43 4.1 Facet 1: Metrics for testing PSR attributes ...... 44 4.1.1 Systematic mapping study ...... 44 4.1.2 Interviews and documents ...... 48 4.1.3 Criteria for selection of metrics ...... 52 4.2 Facet 2: Tools for testing PSR attributes ...... 53 4.2.1 Systematic mapping study ...... 53 4.2.2 Interviews and documents ...... 56 4.2.3 Tool drawbacks and improvements ...... 61 4.3 Facet 3: Challenges faced by software testers ...... 62 4.3.1 Systematic mapping study ...... 62 4.3.2 Interviews and documents ...... 67 4.3.3 Does mitigations available in literature mitigates challenges in practice? ...... 72 4.4 Facet 4: Important attribute among PSR ...... 72 4.4.1 Interviews ...... 72

5 Discussion 75 5.1 Metrics for testing PSR attributes of web applications ...... 75 5.2 Tools for testing PSR attributes of web applications ...... 77 5.3 Challenges in PSR testing of web applications ...... 79 5.4 Most important attribute among PSR ...... 82 5.5 Implications ...... 82

6 Conclusions and Future Work 86 6.1 Research questions and answers ...... 86 6.1.1 RQ 1: Metrics used for testing the PSR attributes . . . . . 86 6.1.2 RQ 2: Tools used for testing the PSR attributes ...... 87 6.1.3 RQ 3 Challenges identified while testing the PSR attributes 89 6.1.4 RQ 4: Important attribute among PSR ...... 90 6.2 Conclusion ...... 90 6.3 Research contribution ...... 91

6.4 Future work ...... 92

Appendices 105

A Systematic maps 106

B SMS overview 108

C List of metrics 117

D List of tools 121

E List of challenges 122

F Interview questions 124 F.1 Technical Questions ...... 124 F.1.1 Tools ...... 124 F.1.2 Metrics ...... 125 F.1.3 Challenges ...... 126 F.1.4 General ...... 127

G MTC and IA identified between case company and other com- panies 128 G.1 Metrics ...... 128 G.2 Tools ...... 129 G.3 Challenges ...... 129 G.4 Important attribute ...... 129

H Consent form 130

List of Figures

1.1 Types of testing ...... 2 1.2 Thesis structure ...... 4

2.1 Requirements classification ...... 7 2.2 Types in functional testing ...... 8 2.3 Types in non-functional testing ...... 10 2.4 Research scope ...... 14

3.1 Systematic mapping study process ...... 22 3.2 Case study process steps ...... 31 3.3 Pyramid model for interview questions ...... 33 3.4 Steps for thematic analysis ...... 36 3.5 Themes formed in Nvivo tool for interviews ...... 42

4.1 Number of sources addressing the research attributes ...... 44 4.2 Thematic map for metrics from SMS ...... 45 4.3 Thematic map for metrics from interviews ...... 48 4.4 Thematic map for metrics from documents ...... 52 4.5 Thematic map for tools from SMS ...... 53 4.6 Thematic map for tools from interviews ...... 57 4.7 Types of tools obtained from interviews ...... 57 4.8 Thematic map for tools from documents ...... 61 4.9 Thematic map for challenges from SMS ...... 62 4.10 Number of articles addressed each theme from SMS ...... 63 4.11 Thematic map for challenges from interviews ...... 67 4.12 Number of interviewees addressed the themes ...... 68 4.13 Thematic map for challenges from documents ...... 71 4.14 Thematic map for important attribute from interviews ...... 73

5.1 Overlap and differences in metrics among all data sources . . . . 76 5.2 Overlap and differences in tools among all data sources ...... 78 5.3 Overlap and differences in challenges among all data sources . . . 81 5.4 Overlap and differences in challenge areas among all data sources 81

5.5 Overlap and differences in metrics between state of art and state of practice ...... 83 5.6 Overlap and differences in tools between state of art and state of practice ...... 84 5.7 Overlap and differences in challenges between state of art and state of practice ...... 85

A.1 Research parameters vs research attributes in SMS ...... 106 A.2 Research methods vs research attributes in SMS ...... 107 A.3 Research methods vs research parameters in SMS ...... 107

List of Tables

3.1 Keywords used for search string formulation ...... 23 3.2 Search strings used for selection of literature ...... 25 3.3 Search results before and after applying exclusion criteria . . . . . 28 3.4 Initial search selection results ...... 28 3.5 Search results after removing duplicate articles in each database . 29 3.6 Data extraction form ...... 30 3.7 Details of interviewee ...... 34 3.8 Overview of selected companies ...... 35 3.9 Research questions and their respective data collection technique . 35

4.1 Performance metrics ...... 46 4.2 Scalability metrics ...... 47 4.3 Reliability metrics ...... 48 4.4 Metrics obtained from documents ...... 52 4.5 Identified PSR tools ...... 55 4.6 Commercial tools obtained from interviews ...... 58 4.7 Frameworks obtained from interviews ...... 58 4.8 Monitoring tools obtained from interviews ...... 59 4.9 Open source tools obtained from interviews ...... 60 4.10 Tools obtained from documents ...... 61

B.1 SMS overview ...... 108

C.1 Metrics description ...... 117

E.1 List of challenges ...... 122

G.1 Identified metrics between case company and other companies . . 128 G.2 Identified tools between case company and other companies . . . . 129 G.3 Identified challenges between case company and other companies . 129 G.4 Identified important attribute between case company and other companies ...... 129

List of acronyms and abbreviations

GUI Graphical User Interface

IA Important Attribute

MTBF Mean Time Between Failures

MTC Metrics Tools Challenges

MTTF Mean Time To Failure

MTTR Mean Time To Repair

PS Performance Scalability

PSR Performance Scalability Reliability

SMS Systematic Mapping Study

SR Scalability Reliability

UI User Interface

WWW World Wide Web

Chapter 1 Introduction

1.1 Web testing

The internet has evolved a lot in recent years, and the number of users depending on internet resources is increasing exponentially. The internet acts as a medium between web applications and users. Web applications have become very popular and attract many users because of their flexibility and ease of access from anywhere. Because of this popularity, many software companies use web applications to provide their services directly to users. Web applications are used in various fields such as education, entertainment, business, manufacturing, cooperative work and scientific research in order to satisfy the requirements of the users [1]. The complexity of web applications is increasing day by day in order to satisfy these requirements. At the same time, organizations are deploying web applications to the market without proper testing because of time pressure and early releases [2]. Some web applications fail to satisfy the needs of users because of their poor quality. Users become dissatisfied and leave such web sites with a negative impression, which leads to a loss of users, sales and business opportunities. A recent study [3] stated that because of the poor quality of a website, some of its users stopped buying the product from the web site, while other users stopped using the product entirely. Because of the poor quality of the website, the organization lost the majority of its customers. Hence, users are the ultimate judges of the success or failure of web applications. Users mainly evaluate websites in terms of availability, reliability, response time, cost and accuracy [4]. Hence, satisfying users is becoming one of the important challenges, and companies need to keep these criteria in mind while developing web applications. Along with this, withstanding the competition from rival products in the market is also an important factor to be considered. User satisfaction and defeating competitive products in the market are the two major challenges that need to be addressed by software companies while developing web applications. This can be achieved by producing and delivering a quality-assured web application to the market, which in turn requires subjecting these web applications to various tests during the testing phase of the software development life cycle [5]. The testing phase is a significant activity which ensures quality and reliability. There are two types of testing that can be done for web applications, represented in figure 1.1 and discussed next.

Figure 1.1: Types of testing

1.1.1 Functional testing

Functional testing mainly concentrates on validating the functional requirements of web applications. These requirements focus on the activities and interactions the software shall support in order to fulfill the users' requirements. Functional testing consists of three approaches: white box testing, black box testing and grey box testing [6]. These approaches are further discussed in section 2.2.1.

1.1.2 Non-functional testing

Non-functional testing mainly concentrates on validating the non-functional requirements of web applications. Even though functional testing is important, quality testing is a must in a competitive market [7]. The core functionality is important, but without quality, users pay less attention to the functionality of the system. So the focus of research has shifted more towards testing non-functional parameters [1]. Testing non-functional attributes mainly depends on the runtime environment of an application. Recently, the focus of researchers and industry has shifted towards non-functional testing of web applications, and numerous methods are being developed to test the quality attributes of web applications effectively [3]. High-quality web applications attract the attention of customers, so the demand for developing quality web applications is increasing in the current market [8]. Various testing parameters can be considered to measure the quality of web applications, including GUI testing, PSR, safety, security and many more [9]. Users rate response time as an important factor: if a web application fails to provide a response within eight seconds, then 30% of the users leave the application [10, 11]. For example, in the case of a banking system, the security, performance and reliability factors are important [2]. The performance, scalability and reliability attributes are selected for our research as these are important attributes which are commonly used for testing non-functional requirements in many systems [1, 2]. The PSR attributes are perceived as a critical factor by users, and hence these attributes can judge the quality of web applications [2]. A lot of research has been done on these three attributes for software systems, i.e. traditional software [11]. However, PSR testing of web applications has gained more attention recently with the rise in popularity of web applications.

1.2 Problem statement

Web testing has gained popularity in the last few years, but there are few empirical studies that focus on the tools and metrics used for testing the PSR quality attributes [1, 12]. The challenges faced while testing web applications are also not well known, as there is a limited number of studies that concentrate on challenges regarding the PSR attributes [1, 13, 14]. In order to gain knowledge regarding these topics and to know how to perform testing without problems, there is a need for further research in this area. To address this problem, we conduct a study that provides a list of metrics, tools and challenges (MTC) for practitioners and researchers to build on. This research investigates and analyzes metrics, selection criteria for metrics, tools, drawbacks of tools from the software testers' perspective, and challenges, from both literature and practice. Along with that, the mitigation strategies available in the literature for these challenges are also analyzed. An empirical study is conducted at Ericsson, India to investigate the practical challenges faced by the testers and also to identify the tools and metrics used by software testers for web application testing.

1.3 Thesis structure

This section gives a general idea about the chapters of this thesis. The thesis consists of six chapters and each chapter has its own importance. The main focus and the details of each chapter are provided below and shown in figure 1.2.

Figure 1.2: Thesis structure

• Chapter 2 focuses on the topics involved in the research, providing basic information to the reader, and also explains some of the previous work that has been carried out in the studied area.

• Chapter 3 presents the research methodology chosen for conducting the research. It also provides the implementation process of the research method and the research operation.

• Chapter 4 provides the results obtained from research methods and analysis.

• Chapter 5 discusses the significant information retrieved from the analysis.

• Chapter 6 summarizes and concludes the present research and presents the future work that can be carried out in later stages.

Chapter 2 Background and Related Work

This chapter describes the background of the proposed research, the selection of attributes, the research scope and the related work carried out in the area of the proposed research. Section 2.1 describes the concept of web applications. Section 2.2 presents the concept of web testing. Section 2.3 concentrates on the attributes selected for the research among several quality attributes. Section 2.4 focuses on the scope of the research. Section 2.5 discusses the previous research on testing of the PSR attributes (related to MTC).

2.1 Web applications

The World Wide Web (WWW) has grown exponentially over the past twenty-two years. Web applications have evolved from static applications to dynamic and distributed applications [15]. Nowadays, web applications can be retrieved or viewed on devices ranging from desktops to smartphones [16]. The growth in the number of users and devices has led to a phenomenal increase in the use of web applications. The evolving nature of web applications and the services they offer to users have made them a fundamental part of users' lives. The main difference between web applications and traditional applications is that they can be accessed from any device and from anywhere without the need to install any module, which explains the success of web applications [17, 11]. Web applications provide services in many areas such as e-commerce, e-learning, e-business, entertainment, socializing and many more [14, 1]. People have started using web applications as a medium for day-to-day communication, and in recent years they have become a part of their lives [6]. Due to the vast growth of the web, many companies and businesses rely on web applications, and the structure and ease of use of these web applications can decide the success or failure of an enterprise. As observed by Hugo Menino et al. and Yu Qi et al. [17, 18], the accuracy and performance of these web applications are considered factors in deciding the success of enterprises. Many technologies have been introduced for developing web applications, and each technology has its own pros and cons. Hence, the appropriate technology is selected based on suitability and requirements. According to Nikfard [6], applications are classified into different classes based on their context and the information they are providing.

• Deliverable

• Customizable

• Transactional

• Acceptable

• Interactive

• Service-oriented

• Accessible

• Data warehouse

• Automatic

Web applications run on the server side, and their interface is accessed through the client side. A web application can be seen as a collection of files: the user creates a request, the request is sent to the server, and the server processes the request and generates a response to the client side. The communication between client and server is executed as mentioned above. Web applications are basically hosted on web servers, and the requests generated by the clients are handled by the hosting servers. Until now, we have considered only the basics of web applications; the complex part is developing and testing them. Web development is a tedious task, as even a minor error can cause chaos in the entire web application. Hence, web applications need to be developed with security in mind, so that attackers cannot access confidential data. In general, web applications have several other disadvantages, but a discussion of these is out of the scope of this thesis. Web applications are generally accessed with an internet connection, but nowadays they can also be accessed offline. Web applications are retrieved and viewed using a client-side browser. There are many browsers available in the market, and not every web application is supported by all browsers. Web applications can be accessed from any platform. One major reason web applications attract many users is the fact that they can be accessed without installing additional software [19]. As web applications are valued by both customers and organizations, proper testing needs to be done prior to deployment in the live environment [19]. Web testing and its types are discussed in section 2.2.
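
To make the request–response cycle described above concrete, the following minimal Python sketch (not part of the original study; the port, path and response text are illustrative assumptions) starts a small HTTP server in a background thread and issues a single client request against it:

import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server processes the client's request and generates a response.
        body = f"server processed request for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    server = HTTPServer(("localhost", 8080), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # The client creates a request; the hosted server handles it and replies.
    with urllib.request.urlopen("http://localhost:8080/index.html") as resp:
        print(resp.status, resp.read().decode())

    server.shutdown()


if __name__ == "__main__":
    main()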

2.2 Web testing

Testing is the technique used to ensure the functionality and quality of software. Testing of web applications is called web testing, and it is a branch of software testing. The aim of web testing is to find application failures. Generally, a failure is due to the presence of faults in the application. According to Di Lucca and Fasolino [14], faults can be caused by the running environment and by the interaction between client and server. According to Arora and Sinha [20], web testing is different from traditional software testing, since web applications may undergo many maintenance changes and even daily updates. Web applications do not have a specific or default number of users; the number of users accessing a web application can increase or decrease at any given point in time. It is very difficult to find the existing errors in web applications, as they have a multi-tier architecture [21]. Web applications are dynamic in nature, so their complexity is increasing in order to fulfill numerous requirements [22]. To maintain the functionality and quality of web applications, they need to be tested. There are two types of requirements for building web applications, as represented in figure 2.1.

Figure 2.1: Requirements classification

• Functional requirements The functional requirements are related to the application, so the functionalities of the application are described through these requirements. To test the functionality of a web application, the requirements are validated against the application. If the requirements are satisfied, then the application is said to have passed the test. The functionality testing of web applications is further discussed in section 2.2.1.

• Non-functional requirements The non-functional requirements are more related to the environment of the application and help to improve the quality of the application. These requirements are verified and validated through non-functional testing. Non-functional testing is further discussed in section 2.2.2.

2.2.1 Functional testing

Functional testing is used to validate the functional requirements against the application. The faults in a web application are identified through this testing. Functional testing validates the flow of the application to check whether it is progressing in the desired manner or not. Functional testing validates web links and detects faults such as broken links and page errors, and covers validation of inputs, validation of forms, validation of links and breadcrumbs, dynamic verification and content testing. It also validates the fields in the web application, for example when data in an incorrect format is entered. So functional testing verifies the whole application in order to check whether it works without any faults. Functional testing of web applications generally depends on four aspects: testing levels, test strategies, test models and the testing process [6]. The functional testing of web applications consists of three test approaches: white box testing, black box testing and grey box testing. White box testing mainly focuses on testing the structure of the application and the percentage of code covered during testing. Black box testing is related to testing the application behavior, in which test cases are written to test the functionality of the application. Grey box testing is a combination of both white box and black box testing: both the environment and the functionality of the application are tested. Grey box testing is more feasible for testing web applications, as it identifies the faults and failures that exist in the environment of the web application as well as in the flow of the application [6]. Functional testing consists of six sub-types for web applications, as shown in figure 2.2.

Figure 2.2: Types in functional testing
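
As a small illustration of the link validation mentioned above (a sketch only, not a tool used in this study; the target URL is a placeholder), the following Python script collects the anchors of a page and reports links that fail to load:

import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href attribute of every anchor tag on the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_links(page_url):
    # Fetch the page and extract all anchor targets.
    with urllib.request.urlopen(page_url) as resp:
        parser = LinkCollector()
        parser.feed(resp.read().decode(errors="replace"))

    broken = []
    for link in parser.links:
        target = urljoin(page_url, link)
        try:
            with urllib.request.urlopen(target) as resp:
                if resp.status >= 400:
                    broken.append((target, resp.status))
        except urllib.error.URLError as exc:  # covers HTTP errors and timeouts
            broken.append((target, str(exc)))
    return broken


if __name__ == "__main__":
    for target, reason in check_links("http://example.com/"):
        print("broken link:", target, reason)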

2.2.1.1 Smoke testing

When developers code the application, an initial test is performed, termed smoke testing. It is carried out to verify whether the written code works or not. If it fails to work, the developer knows that there is a fault in the written code and makes sure to rectify it [14, 6].

2.2.1.2 Unit testing

A certain function or unit piece of code is tested after it is developed. The testing mainly concentrates on the specific function and does not test the other features depending on it. This kind of testing is termed unit testing [14, 6].

2.2.1.3 Regression testing

The testing of new functional code together with previously implemented or modified code is called regression testing. This testing is performed when there is a small change or addition to the existing code, to verify whether the modified or added code introduces any faults in the application [14, 6].

2.2.1.4 Integration testing

After certain functionalities are developed, all of them are integrated together to perform integration testing. It is performed to ensure that the application works as expected even after all the individual components have been integrated [14, 6].

2.2.1.5 System testing

System testing is used to find defects in the entire web application. Generally, system testing is conducted using three approaches: black box, white box and grey box. The black box approach is used to identify failures in user functions and the external behavior of the application. The white box approach is used to identify defects related to incorrect links between pages, and grey box testing is used to identify defects related to the application's navigation structure as well as its external behavior [14, 6].

2.2.1.6 Acceptance testing

Acceptance testing is performed to ensure that the user requirements and business requirements are satisfied by the developed application and that it is ready to be deployed and used in the live environment. This testing is conducted as per the acceptance criteria prepared prior to testing, so it validates whether the developed application meets the acceptance criteria or not [23]. Functional testing can be done in both manual and automated ways. A test plan is designed on how to perform the tests. Later, test cases and test suites are prepared to test the web application. For web applications there are also capture-and-replay tools, which are used to capture or record the functionality and replay or retest it [7]. Using a capture-and-replay tool, many types of scenarios can be tested, and by repeated testing it can be ensured that the application works as expected in the live environment.
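
In the same spirit as the capture-and-replay tools mentioned above, a recorded scenario is essentially a script that can be replayed at will. The sketch below is an illustration written against the Selenium WebDriver API; the URL, element names and expected title are assumptions, and a local chromedriver is required. It replays a simple login scenario and checks its outcome:

from selenium import webdriver
from selenium.webdriver.common.by import By


def replay_login_scenario():
    driver = webdriver.Chrome()  # assumes a local chromedriver is available
    try:
        # Steps below mirror what a recorder would have captured.
        driver.get("http://example.com/login")
        driver.find_element(By.NAME, "username").send_keys("test_user")
        driver.find_element(By.NAME, "password").send_keys("secret")
        driver.find_element(By.ID, "submit").click()

        # Replaying the script re-checks the expected outcome each time.
        assert "Dashboard" in driver.title, "login scenario failed"
    finally:
        driver.quit()


if __name__ == "__main__":
    replay_login_scenario()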

2.2.2 Non-functional testing

Ensuring only the functionality of a web application is insufficient in the present competitive market. As in every field, the quality of web applications is a major concern, so validating the non-functional requirements is necessary. According to Hossain and Nikfard [3, 6], there are seven non-functional testing types for web applications, which ensure the quality of the web application. For performing testing on these attributes, benchmarks and test strategies need to be defined. The different attributes in non-functional testing of web applications are shown in figure 2.3.

Figure 2.3: Types in non-functional testing

2.2.2.1 Performance testing

Performance testing is performed to determine the application's performance in terms of response time, availability, etc. Response time is defined as the time taken to receive a response from the server when a user submits a request to the application. To assure the performance of a web application in the live environment, a setup with virtual users is created, simulating the behavior of real users through scenarios that perform certain operations, in order to measure the performance attribute. As web applications are very dynamic in nature, the performance test is a continuous process, so the application's performance needs to be analyzed through activity logs [14, 6]. The important subtypes of performance testing are discussed next.

2.2.2.1.1 Load testing

Load testing is defined as the ability of the system to handle the maximum amount of workload without any significant degradation in performance [12]. It is performed under minimum configuration and maximum activity levels to examine the time taken to complete a set of actions. A set of users is simulated and tested to obtain various scenarios. Load testing shows how much load the application can withstand while responding to every request it receives. It helps in identifying bottlenecks and failures of the system. The failures recognized through load testing are due to faults in the running environment conditions [14, 6].

2.2.2.1.2 Stress testing

Stress testing is carried out to verify whether the application is able to withstand load beyond the expected maximum. It helps in identifying bottlenecks of the system such as memory leakage, transactional problems, resource locking, hardware limitations and bandwidth limits [24]. Most of the failures or errors detected through stress testing are due to faults in running environment conditions such as hardware and software [14, 6, 25].
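
To illustrate the idea described in this subsection, the following Python sketch (a simplified stand-in for dedicated tools such as JMeter or Grinder; the URL, number of virtual users and request counts are assumptions) simulates a set of virtual users and summarizes the measured response times:

import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://example.com/"
VIRTUAL_USERS = 10
REQUESTS_PER_USER = 5


def virtual_user():
    """One simulated user: issue requests and record each response time."""
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        timings.append(time.perf_counter() - start)
    return timings


def run_load_test():
    # Each worker thread plays the role of one virtual user.
    with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
        results = pool.map(lambda _: virtual_user(), range(VIRTUAL_USERS))
    all_timings = [t for user_timings in results for t in user_timings]
    print(f"requests: {len(all_timings)}")
    print(f"mean response time:   {statistics.mean(all_timings):.3f} s")
    print(f"median response time: {statistics.median(all_timings):.3f} s")
    print(f"max response time:    {max(all_timings):.3f} s")


if __name__ == "__main__":
    run_load_test()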

2.2.2.2 Scalability testing

Scalability testing is defined as the flexibility of a system to deal with changes caused by an increase in load without deviating from predefined objectives [1]. It is performed to validate the balancing of load on the resources when a certain load is reached. Hardware resources are added and tested to determine the change in response time and the effect of adding the resource to the application. Failures detected through scalability testing are due to faults in the running environment and hardware resources. Scalability can be implemented in two ways: vertical and horizontal scalability [21].

2.2.2.2.1 Vertical scalability

Vertical scalability is achieved by adding extra resources, such as memory or processors, to an existing server. It is also known as scaling up. Vertical scalability has both positive and negative impacts on the system [21]. The positive impact is that it increases the performance and manageability of the system as resources like memory and processors are added to it, whereas the negative impact is that it decreases the availability and reliability of the system, as load balancing may become difficult among a larger number of resources.

2.2.2.2.2 Horizontal scalability

Horizontal scalability is also known as scaling out. It is obtained by adding extra servers to the system. Like vertical scalability, it has both positive and negative impacts on the system [21]. The positive impact is that it improves availability, reliability and performance, since more servers are added and if one of them fails, the others can continue to work. The negative impact is that it reduces the manageability of the system, as managing a larger number of servers becomes difficult.
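
One common way to quantify horizontal (or vertical) scalability, though not necessarily the metric used in this thesis, is scaling efficiency: the measured throughput with N servers divided by N times the single-server throughput. The short Python sketch below computes it for a set of invented sample measurements:

def scaling_efficiency(servers, throughput, baseline_throughput):
    """Efficiency of 1.0 means perfectly linear (ideal) scaling."""
    ideal = baseline_throughput * servers
    return throughput / ideal


if __name__ == "__main__":
    # (number of servers, measured requests per second) -- illustrative data
    measurements = [(1, 120.0), (2, 225.0), (4, 410.0), (8, 700.0)]
    baseline = measurements[0][1]
    for servers, throughput in measurements:
        eff = scaling_efficiency(servers, throughput, baseline)
        print(f"{servers} server(s): {throughput:7.1f} req/s, "
              f"efficiency {eff:.2f}")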

2.2.2.3 Usability testing

Usability testing is performed to measure the usability of the application in terms of ease of use, content, navigability, color and look, etc. Usability testing is necessary for web applications in order to determine how easy the application is to use. Based on the obtained results, the application can be improved further. Failures found in usability testing are due to faults identified in the application [14, 6].

2.2.2.4 Compatibility testing

Compatibility testing is carried out to validate the execution of the application in different environments and to determine in which environments the application fails to execute. Not all web applications are capable of running in every browser, so a test strategy is defined prior to testing which details the set of browsers that have to be tested. Failures found through this testing are due to faults in the application and in the running environment [14, 6].

2.2.2.5 Security testing

Security testing is carried out to validate how secure the application is against intruders or hackers who may steal confidential information. Security testing is a challenging task in web testing because, even when proper care is taken with the application, there may still be attack vectors present in it. Using security flaws as a medium, intruders can access confidential information, so security testing must be done with utmost care. Failures found through this testing may be due to faults in the application and in the running environment [14, 6].

2.2.2.6 Accessibility testing

Accessibility testing is performed to validate that the content of the application is accessible even on low-configuration systems and also to check whether the content can be retrieved by people with physical disabilities. Accessibility is a necessary attribute of an application if it is accessed by many users. Failures detected through accessibility testing are due to faults in the application and in the running environment [14, 6].

2.2.2.7 Reliability testing

Reliability testing is carried out to validate the application based on the time it can stay up and the time taken to recover in case it fails. Reliability testing is necessary for web applications; it helps to gain information regarding how long the application can remain available and how much time it requires to recover from a failure. The failures identified in this testing are mainly related to environment problems [14, 6].
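
The reliability metrics listed in the abbreviations (MTTF, MTTR, MTBF) and availability can be derived from a simple failure log using their usual textbook definitions. The Python sketch below illustrates the calculation on invented data; it is not taken from the case company's documents:

def reliability_metrics(incidents):
    """incidents: list of (uptime_hours_before_failure, repair_hours)."""
    uptimes = [up for up, _ in incidents]
    repairs = [rep for _, rep in incidents]
    mttf = sum(uptimes) / len(uptimes)          # mean time to failure
    mttr = sum(repairs) / len(repairs)          # mean time to repair
    mtbf = mttf + mttr                          # mean time between failures
    availability = mttf / mtbf                  # fraction of time available
    return mttf, mttr, mtbf, availability


if __name__ == "__main__":
    # Each entry: hours of operation before a failure, hours spent recovering.
    log = [(700.0, 2.0), (640.0, 1.5), (720.0, 3.0)]
    mttf, mttr, mtbf, availability = reliability_metrics(log)
    print(f"MTTF: {mttf:.1f} h, MTTR: {mttr:.1f} h, MTBF: {mtbf:.1f} h")
    print(f"Availability: {availability:.4f}")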

2.3 Selected attributes

The focus of this thesis is on non-functional testing of web applications. Selecting all non-functional attributes would increase the scope of the study too much, so to limit this, we have selected three attributes for the research: Performance, Scalability and Reliability (PSR). These three attributes are collectively called quality factors. An overview of these three attributes is provided in section 2.2.2. The motivation for selecting these attributes is provided below. The most important attributes in the quality criteria are provided by Svensson et al. [26], who conducted interviews in 11 different software companies. The results are as follows: usability, performance, reliability, stability and safety are the top five most important attributes compared to other quality attributes. Even though usability is the topmost quality attribute, we did not consider it, as a lot of research has already been done on this attribute [27, 28]. The importance of the performance, scalability and reliability attributes is mentioned by Iyer et al. [13], who investigated the issues in the testing methods for PSR. Smith and Williams [29] described the need for quality of service in web applications and showed that it is obtained by focusing on the scalability and performance attributes; the importance of scalability and its role in gaining user satisfaction is explained clearly. Data analyzed from the literature describe scalability as part of the performance attribute [30]. As mentioned by Svensson [26], the second and third most important attributes are performance and reliability. These are considered in our research because only a small amount of research has been done on testing these attributes collectively in web applications until now [13, 31, 32, 33]. Based on the above reasons, we narrowed the broad scope by restricting our thesis to three quality attributes: Performance, Scalability and Reliability (PSR). One of the main reasons for selecting the PSR attributes is that the case company focuses on these three attributes. The remaining two attributes, stability and safety, are not considered in our research because the case company does not consider them while testing web applications. The case company focuses on attributes such as performance, scalability, reliability, usability and compatibility, which is the main reason for considering the three PSR attributes for our research. According to [27, 28], the usability attribute has received a lot of research and hence we did not focus on it, whereas compatibility is more related to the browser and thus differs from PSR; moreover, compatibility is not among the top five quality attributes provided by Svensson et al. [26]. For these reasons, we mainly focused on the PSR attributes in this study [34].

2.4 Research scope

This section illustrates the scope of the research, which consists of three areas, depicted diagrammatically in figure 2.4. The three areas are testing, software, and quality attributes. As testing is a broad area, we limited our scope to web testing, which is further narrowed down to quality testing of web applications in terms of the PSR attributes.

Figure 2.4: Research scope

2.5 Related work

This section discusses the work that has been carried out over the past years, explains the gap addressed by this study and also discusses how that identified gap is filled. The section contains four subsections, which focus on the related work on metrics, tools and challenges, and describe the identified research gap.

2.5.1 Literature related to metrics

Dogan et al. [35] focused on identifying the tools and metrics available for testing web applications. They identified a few metrics, based on criteria such as cost and effectiveness, for testing web applications in general. Kanij et al. [32] explained some of the metrics considered while testing the performance attribute. They addressed two metrics for the performance attribute and explained the need for further research in finding related metrics. Xiaokai Xia et al. [36] proposed a model to evaluate and analyze the performance attribute of web applications. The proposed model mainly concentrates on finding the issues that remain unidentified during the testing phase; in order to identify them, the authors considered a set of metrics for building the model, which mainly concentrates on the performance attribute. Deepak Dagar and Amit Gupta [37] mainly focused on addressing the types, tools and methods used for testing web applications. As part of their research, they also explained some of the metrics related to performance testing. Rina and Sanjay Tyagi [38] compared testing tools in terms of some metrics. In this research, they evaluated the testing tools against these parameters by conducting an experiment to choose the appropriate tool for testing. This research mainly deals with the performance attribute. R. Manjula et al. [39] mainly concentrated on the reliability attribute. They explained the need for reliable web applications and proposed a reliability-based approach to evaluate web applications. As part of this research, they also mentioned some of the reliability parameters used for building this approach.

2.5.2 Literature related to tools

P. Lazarevski et al. [12] conducted a case study to evaluate the performance testing tool Grinder. Along with that, they also explained some other tools mainly related to performance testing and performance monitoring. The research is limited to AJAX-based web applications, and tools that may relate to other web applications are not considered. Wang and Du [40] proposed a framework integrating the functionalities of tools like JMeter and Selenium. This framework mainly concentrates on addressing different types of testing like UI testing, backend testing, load testing, etc. They also mentioned some of the tools related to performance testing, but failed to address the other quality attributes. Hamed and Kafri [23] compared web applications built with two different technologies, Java and .NET, by using performance testing tools and performance metrics. They evaluated both technologies in terms of response time and throughput and found that Java performs better than .NET for web applications. Arora and Bali [41] conducted research on the automated tools available for performance testing. As part of the research, they conducted a literature survey and identified 18 different automated tools for performance testing of web applications. Garousi and Mesbah [42] conducted a mapping study to identify the tools available for testing web applications, and Dogan et al. [35] also focused on identifying tools for testing web applications by conducting a systematic literature review. They identified a few tools along with each tool's availability, concentrating mainly on the performance attribute. Arora and Sinha [20] stated the need for testing web applications and focused on two different techniques, state-based and invariant-based testing. They mainly focused on web testing and provided information related to tools for both functional and non-functional attributes of web applications, which is, however, not clearly described. Rina and Tyagi [38] compared some of the performance testing tools in terms of some metrics. In this research, they evaluated the testing tools by conducting an experiment in order to select a suitable tool for testing.

2.5.3 Literature related to challenges

P. Lazarevski et al. [12] explained some of the tools mainly related to performance testing and performance monitoring. Along with these tools, they also addressed the drawbacks and limitations existing in the selected tools. This research mainly addressed issues related to the performance attribute. Iyer et al. [13] explained the process of conducting web testing and the issues in this process when dealing with the quality attributes. They focused mainly on the quality attributes performance, scalability and reliability, limited their research to finding the issues related to testing methods, and explained the need for further research on PSR. Junzan Zhou et al. [43] explained the testing methods and some of the traditional testing tools available for performance testing. Along with this, they also mentioned some of the challenges related to the area of performance testing tools. Arora and Sinha [41] stated the need for testing web applications and discussed some of the tools and methods. They mainly focused on web testing and provided some of the challenges related to the functionality of web applications. From the above sections, literature relating to the PSR attributes collectively is provided by only one article, which focuses on the issues in testing methods [13], whereas the remaining articles discuss the topics of tools, metrics and challenges separately. From the literature, we have noticed that only a few authors concentrated particularly on the tools, metrics and challenges of PSR. Research related to PSR testing of web applications is new to the field of software engineering, as there are very few existing studies. We came across one such article [13] in the literature, which mainly focuses on the issues of testing methods for the PSR attributes. To the best of our knowledge, we did not find any research dealing with the tools, metrics and challenges of the PSR attributes together.

2.5.4 Research gap

The quality attributes play a key role in web testing. In order to deploy a web application on the server, a testing process is conducted in which a set of metrics is considered to validate the web application. The metrics used for testing are not fixed for web applications; they vary from organization to organization, i.e. from small to large. Organizations face problems with the selection of metrics for testing due to constraints such as resources, cost and time. There exists little research on the selection of metrics by software testers in organizations [32]. Along with this, the general issues and challenges (related to tools, development, metrics and time) faced by software testers while testing the PSR attributes of web applications are not well known. Some of the challenges related to tools arise mainly because of their existing drawbacks. Drawbacks of the tools are not explicitly provided in the literature, the new features that testers need when using tools for testing the quality attributes are not clearly known, and the existing testing tools [44] that support quality attributes need to be identified; hence, there is a need for further research [13, 41, 14]. All these identified issues represent a gap in the research on web testing of the quality attributes (PSR) and provide the motivation for our research. We mainly focus on identifying the challenges (related to tools, development, metrics and time) faced by software testers while testing the PSR attributes of web applications. The research also deals with finding the tools and metrics used by software testers for testing the PSR attributes [34].

Chapter 3 Method

This chapter focuses on the purpose of the research and the process carried out to achieve it. It consists of five sections and is structured as follows:

• Section 3.1 describes the purpose of the research

• Section 3.2 provides the research questions selected for addressing the aim

• Section 3.3 focuses on the research method selected for answering the research questions and the techniques used for collecting the data

• Section 3.4 explains the method used for analyzing the collected data

• Section 3.5 discusses the validity threats.

3.1 Research purpose

The purpose of this research is to identify the challenges faced by the software testers while testing performance, scalability and reliability attributes in web applications and also to identify the available tools and metrics for testing the PSR attributes of web applications.

3.1.1 Objectives

For achieving the purpose of this research, six objectives were identified. They are as follows:

• O1: To identify the common metrics that need to be considered by software testers in general while testing the PSR attributes of web applications.

• O2: To identify a list of tools available for testing the PSR attributes by the software testers and also to find the tools used by software testers in practice.


• O3: To identify drawbacks in the tools used by software testers and also the improvements suggested by them.

• O4: To identify the list of challenges faced and mitigations used by the software testers while testing the PSR attributes of web applications.

• O5: To analyze whether the identified mitigations are useful for software testers to address the challenges faced while testing the PSR attributes in practice.

• O6: To identify the most important attribute, among the three selected attributes (PSR), according to the software testers.

3.2 Research questions

In order to achieve the goals of the research, we have framed the following research questions.

• RQ1: What metrics exist for testing PSR attributes of web applications?
  • RQ1.1: What metrics are suggested in the literature for testing PSR attributes?
  • RQ1.2: What metrics are used by software testers in practice for testing PSR attributes?
  • RQ1.3: Why are particular metrics used or not used by software testers?

• RQ2: What tools exist in general for testing PSR attributes of web applications, and what drawbacks are observed in these tools in practice?

  • RQ2.1: What tools are suggested in the literature for testing PSR attributes?
  • RQ2.2: What tools are used by software testers in practice for testing PSR attributes?
  • RQ2.3: What are the drawbacks of the tools used by software testers in practice, and what improvements are suggested by them?

• RQ3: What are the challenges faced by software testers in general, and what mitigation strategies are available in the literature for these challenges, while testing PSR attributes of web applications?

  • RQ3.1: What are the challenges faced by software testers, and what mitigation strategies are available in the literature, for testing PSR attributes?

  • RQ3.2: What are the challenges faced by software testers in practice while testing PSR attributes?
  • RQ3.3: Can the existing measures from the literature solve the challenges faced by software testers in practice?

• RQ4: Which attribute among the PSR attributes is considered the most important by software testers in practice?

3.2.1 Motivation

The main motivations for formulating the research questions are as follows [34]:

• RQ1: The main reason for framing this RQ is to identify the metrics available for testing the PSR attributes. Kanij et al. [32] discussed a few metrics and the need to consider other metrics while testing, but only little research has been carried out in this area. This research question helps to identify the metrics that are used in practice and reported in the literature for PSR.

• RQ2: There are many tools available for testing web applications, but a clear description of the availability, type of tool, language support, metrics and supported platforms for each tool is not provided in the literature for the PSR attributes. The drawbacks existing in tools are identified through experience in using them, and it is also possible to collect new information regarding drawbacks that is not available in the literature. So through this RQ, we provide a clear description of the tools and their characteristics, which is helpful for software testers or practitioners when selecting tools, and we also report the drawbacks observed by software testers.

• RQ3: Software testers may face challenges while testing web applications. In our case, the testing process at Ericsson consumes considerable time. Hence, this RQ helps in identifying the challenges faced by software testers while testing and, where possible, in providing mitigations based on the literature.

• RQ4: The most important attribute among PSR is identified through this research question. This helps software testers when there is a need to deliver the product early and little time is available for testing: the most important attribute can be tested first, and the other attributes can be tested later based on the available resources and time.

3.3 Research method

This section focuses on the methodology used for carrying out the research. In the software engineering discipline there are mainly four research methods available [45]. The research methods are: Chapter 3. Method 21

• Experiment

• Survey

• Case study

• Action research

Each research method has benefits and liabilities, and research methods are selected based on suitability and flexibility. The motivation regarding the selection of our method and the rejection of the others is explained below.

Experiment: An experiment can be used to identify the cause-and-effect relationship between selected variables. These variables are of two types, dependent and independent; the effect on the dependent variables can be identified by changing the independent variables in an experiment. Experiments are generally conducted in a specific or controlled environment, so repeating the same effect is difficult, i.e. creating a similar environment is very hard, and the results also cannot be generalized. As our research questions are not geared towards identifying cause-effect relationships, an experiment is not an appropriate method for our study.

Survey: A survey is a method to obtain generalized data from many sources globally. The data can be collected from a selected population by sampling from a larger population. A survey is not an appropriate research method for our study because of the time schedule, as it takes a long time to collect responses from various respondents, and in our case there is an opportunity to conduct a study in a real environment.

Case study: A case study is a method to better understand a phenomenon and explore the research area. A case study is conducted in real-world settings and has a high degree of realism [45]. As our research focuses on identifying the challenges faced by software testers while testing the non-functional attributes (PSR), the data should be collected from subjects who have experienced these challenges while testing. As we have access to the resources and it suits our research questions, we opted for a case study as our research method.

Action research: Action research is a method to investigate and improve the process in a research area. It is a type of case study in which the researcher investigates and changes the process, whereas a case study is purely observational [45]. As our research is more focused on exploration of the topic, we did not opt for action research as our method.

3.3.1 Systematic mapping study

A systematic mapping study (SMS) is a secondary research method which provides an overview of a topic to researchers and gives a brief idea about the topic. It provides the frequency of publications in the research area. In this research, we selected a systematic mapping study to provide the data regarding the available tools, metrics and challenges related to testing of PSR in web applications. The study also helps us design prompts for the interviews, i.e. it helps us stay on track during an interview by maintaining bullet points that are important for the research, so that the main data collected relate to the research goal. An overview of the SMS is provided in Appendix B. The systematic mapping study was conducted by following the guidelines provided by Petersen et al. [46]. The steps carried out in the systematic mapping study are shown in figure 3.1, and a description of each step is provided in the next subsections.

Figure 3.1: Systematic mapping study process

3.3.1.1 Design of systematic mapping study protocol

A protocol was designed in order to conduct the systematic mapping study. The protocol consists of the following sections.

3.3.1.1.1 Selection of keywords

To form a search string, we first need to identify the keywords to be included. We extracted keywords from the research questions and also considered their synonyms. The identified keywords, mapped to the research questions, are shown in table 3.1.

Table 3.1: Keywords used for search string formulation
C1 Web application: web applications, web application, website, websites, webapplication, webapplications (RQ1, RQ2, RQ3)
C2 Quality attributes: performance, scalability, reliability, reliable, scalable (RQ1, RQ2, RQ3)
C3 Testing: testing, verify, verification, validate, validation (RQ1, RQ2, RQ3)
C4 Tool: tool, tools, framework (RQ1)
C5 Metric: metric, metrics, measures, evaluate, evaluation (RQ2)
C6 Challenges: challenges, challenge, mitigations, strategy, strategies (RQ3)

3.3.1.1.2 Formation of search strings

3.3.1.1.2.1 Boolean operators for search

Two kinds of operators are used for searching the literature: AND and OR. The operator AND connects keywords that must all appear in a result, i.e. the obtained results have to contain every keyword mentioned. The operator OR requires at least one of the connected keywords to appear, i.e. a result contains at least one keyword referred to in the search. These operators act as connectors between the keyword sets.

3.3.1.1.2.2 Formation of string

The Boolean operators and keyword sets are combined to form the search strings, e.g. (set1) AND (set2). An example of a formed search string is:

• ((Web testing) OR (Web application testing) OR (Website testing)) AND ((Assessment) OR (Assess) OR (Evaluate) OR (Evaluating) OR (Measuring) OR (Measure) OR (Metrics) OR (Metric) OR (Web metrics)) AND ((Quality) OR (Attributes) OR (Performance) OR (Scalability) OR (Scalable) OR (Reliable) OR (Reliability)) AND ((Web applications) OR (Website) OR (Web application) OR (Webpage)).
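To make the combination rule concrete, the keyword sets and Boolean operators described above can also be expressed programmatically. The following Python sketch is purely illustrative and was not part of the study protocol; the keyword lists are abbreviated examples based on table 3.1, and each set is joined with OR while the sets themselves are joined with AND.

```python
# Illustrative only: abbreviated keyword sets mirroring table 3.1.
keyword_sets = {
    "web application": ['"web application"', '"web applications"', '"web site"',
                        '"web sites"', "website*", "webapplication*"],
    "quality attribute": ["performance", "reliab*", "scalab*"],
    "testing": ["testing", "verify", "verification", "validation", "validate"],
    "tool": ["tool*", "framework"],
}

def build_search_string(sets):
    """Combine each keyword set with OR and join the sets with AND."""
    groups = ["(" + " OR ".join(words) + ")" for words in sets.values()]
    return " AND ".join(groups)

if __name__ == "__main__":
    print(build_search_string(keyword_sets))
```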

3.3.1.1.3 Selection of scientific library databases

Based on popularity and relevance, the following databases were selected for searching the articles:

• INSPEC

• IEEE

• SCOPUS

• ACM

• WILEY

3.3.1.1.4 Study selection criteria

The criteria for selecting and rejecting the literature are provided in this section.

3.3.1.1.4.1 Inclusion criteria

The inclusion criteria consist of the factors we considered for selecting the literature.

• Literature regarding web application testing only.

• Literature which is peer-reviewed.

• Literature which is in English only.

• Literature from the years 2000 to 2016.

• Literature which covers the PSR attributes, either alone or in combination with other attributes.

• Literature which focuses on an individual attribute (i.e. only one of the PSR attributes) is also considered.

3.3.1.1.4.2 Exclusion criteria

The exclusion criteria consist of the factors we considered for rejecting the literature.

• Literature whose abstract is unrelated to the topic.

• Literature which cannot be accessed or is unavailable.

3.3.1.1.5 Formulation of search strings

Based on the keywords provided in section 3.3.1.1.1, search strings were formed for each database; they are listed in table 3.2.

Table 3.2: Search strings used for selection of literature

INSPEC
Tools: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (tool* OR framework)
Challenges: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (challenge* OR mitigations OR strategy OR strategies)
Metrics: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (metric* OR measures OR evaluate OR evaluation)

IEEE
Tools: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (tool* OR framework)
Challenges: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (challenge* OR mitigations OR strategy OR strategies)
Metrics: ("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (metric* OR measures OR evaluate OR evaluation)

SCOPUS
Tools: TITLE-ABS-KEY (("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (tool* OR framework))
Challenges: TITLE-ABS-KEY (("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (challenge* OR mitigations OR strategy OR strategies))
Metrics: TITLE-ABS-KEY (("web application" OR "web applications" OR "web sites" OR "web site" OR website* OR webapplication*) AND (performance OR reliab* OR scalab*) AND (testing OR verify OR verification OR validation OR validate) AND (metric* OR measures OR evaluate OR evaluation))

ACM
Tools: (+("web site" "website" "web application" "webapplication") +(performance reliability scalability scalable reliable) +(testing verify verification validation validate) +(tool framework))
Challenges: (+("web site" "website" "web application" "webapplication") +(performance reliability scalability "quality attribute" "quality requirement") +(testing verify verification validation validate) +(challenge strategy mitigation))
Metrics: (+("web site" "website" "web application" "webapplication") +(performance reliability scalability "quality attribute" "quality requirement") +(testing verify verification validation validate) +(metric measure evaluate evaluation))

WILEY
Tools: "web applications" OR "web application" OR "web site" OR "web sites" OR website* OR webapplication* in Abstract AND performance OR scalab* OR reliab* in Abstract AND testing OR verification OR verify OR validation OR validate in Abstract AND tool* OR framework in Abstract
Challenges: "web applications" OR "web application" OR "web site" OR "web sites" OR website* OR webapplication* in Abstract AND performance OR scalab* OR reliab* in Abstract AND testing OR verification OR verify OR validation OR validate in Abstract AND challenge* OR mitigations OR strategy OR strategies in Abstract
Metrics: "web applications" OR "web application" OR "web site" OR "web sites" OR website* OR webapplication* in Abstract AND performance OR scalab* OR reliab* in Abstract AND testing OR verification OR verify OR validation OR validate in Abstract AND metric* OR measures OR evaluate OR evaluation in Abstract

3.3.1.1.6 Execution of search strings and applying exclusion criteria

Initially, the designed search strings were executed, and the exclusion criteria were applied to the results obtained from the databases. The exclusion criteria mainly concern the year, document type, content type, language and subject area. The search results obtained before and after applying the exclusion criteria are shown in table 3.3.

3.3.1.1.6.1 Initial article selection and exclusion of duplicate articles

The relevant literature was selected by reading the abstracts: if the abstract is relevant to the research, the article is selected; otherwise it is rejected. The initial search results and the articles remaining after removal of duplicate articles are shown in table 3.4 and table 3.5.

Table 3.3: Search results before and after applying exclusion criteria (search results after execution → after applying exclusion criteria)
INSPEC: Tools 377 → 352; Challenges 154 → 147; Metrics 407 → 380
IEEE: Tools 377 → 368; Challenges 162 → 155; Metrics 583 → 564
SCOPUS: Tools 801 → 397; Challenges 368 → 165; Metrics 869 → 412
ACM: Tools 521 → 481; Challenges 481 → 447; Metrics 446 → 407
WILEY: Tools 80 → 75; Challenges 50 → 41; Metrics 126 → 108

Table 3.4: Initial search selection results (total search results → selected articles)
INSPEC: Tools 352 → 98; Challenges 147 → 42; Metrics 380 → 66
IEEE: Tools 368 → 76; Challenges 155 → 23; Metrics 564 → 52
SCOPUS: Tools 397 → 74; Challenges 165 → 27; Metrics 412 → 47
ACM: Tools 481 → 8; Challenges 447 → 14; Metrics 407 → 1
WILEY: Tools 75 → 2; Challenges 41 → 3; Metrics 108 → 6

Table 3.5: Search results after removing duplicate articles in each database
INSPEC: 206 articles, 36 repeated, 170 remaining
IEEE: 151 articles, 24 repeated, 127 remaining
SCOPUS: 148 articles, 28 repeated, 120 remaining
ACM: 23 articles, 0 repeated, 23 remaining
WILEY: 11 articles, 0 repeated, 11 remaining
All databases: 539 articles, 88 repeated, 451 remaining

A total of 451 articles were obtained from all the databases. These 451 articles still include duplicates across databases; for example, the articles obtained from the INSPEC database overlap to some extent with those obtained from the IEEE database. After merging the articles obtained from the five databases, 134 duplicate articles were identified. After removing all duplicates, a total of 317 articles remained.

3.3.1.1.7 Selected literature for systematic mapping study

After studying the introduction and conclusion of all 317 articles, we selected 97 articles that are relevant to our study. The articles were selected by both authors with cross-verification in order to avoid missing relevant articles. Verification was done during the article selection process, i.e. at the time of screening and search string execution. During search string execution, one author executed the framed search strings, while the second author re-executed them to verify that the obtained results were the same in both cases. During screening, both authors discussed whether the articles are relevant to the study.

3.3.1.1.8 Study and assess the articles

The selected articles were studied thoroughly in order to identify the tools, metrics and challenges for the PSR attributes. The articles were assessed against the study selection criteria, and the important data was highlighted while reading.

3.3.1.1.9 Data extraction strategy

For data extraction, we used the template shown in table 3.6. The template consists of data fields, each of which is a data key and value pair, mapped to the research question it supports.

Table 3.6: Data extraction form

General
Study ID: integer
Article title: name of the article
Year of publication: article publication date
Author name: names of the authors
Publication venue: domain in which the article was published
Research method: method used by the authors in the study

Process
Metrics (RQ 1): Which metrics are mentioned in the article? Which metrics are used for evaluation in the article? Which metrics are described in the article?
Attributes (RQ 1, RQ 2, RQ 3): Which attributes are mentioned? Which attributes are described in the article?
Tools (RQ 2): Which tools are mentioned in the article? Which tools are used and evaluated in the article? Which tools are given a brief description?
Challenges (RQ 3): What challenges are mentioned or described in the article? Which challenges is the article most focused on?
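For illustration only, the key–value structure of the extraction form can be represented as a simple data structure; the field names below are our own hypothetical choices and not part of the original template.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical representation of the data extraction form in table 3.6;
# field names are illustrative, not part of the original template.
@dataclass
class ExtractionRecord:
    study_id: int
    article_title: str
    year_of_publication: int
    author_names: List[str]
    publication_venue: str
    research_method: str
    metrics: List[str] = field(default_factory=list)      # metrics mentioned/used
    attributes: List[str] = field(default_factory=list)   # PSR attributes addressed
    tools: List[str] = field(default_factory=list)        # tools mentioned/evaluated
    challenges: List[str] = field(default_factory=list)   # challenges described

# Example record with made-up values.
record = ExtractionRecord(
    study_id=1,
    article_title="Example article",
    year_of_publication=2014,
    author_names=["A. Author"],
    publication_venue="Example venue",
    research_method="Case study",
    metrics=["Response time", "Throughput"],
    attributes=["Performance"],
    tools=["Apache JMeter"],
    challenges=["Workload modelling"],
)
```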

The data was extracted by both authors for each article separately, and the extracted data was cross-checked between the authors to verify the accuracy of the extraction. For most articles the data extracted by the two authors overlapped to a large extent; where the extracted data differed, the reasons for the difference were discussed and both authors came to a conclusion. The data was extracted from our selected primary studies. The mapping of research parameters, research attributes and research methods in the SMS is provided in Appendix A.

3.3.2 Case study

This section explains the methodology we followed for conducting the case study. We followed the guidelines provided by Runeson and Höst [45]. There are mainly five steps involved in the case study, and they are represented in figure 3.2.

Figure 3.2: Case study process steps

3.3.2.1 Case study design

A case study involves two important concepts: the case and the unit of analysis. In our research, the selected case is a typical large-scale company, Ericsson India, and the units of analysis are mainly the software testers working on web application projects. The reasons for selecting this case are its availability to the researchers and its suitability to the research. The case can also be replicated to add validity to the research.

3.3.2.2 Protocol preparation

A protocol was prepared prior to data collection. The protocol mainly consists of two sections: data collection and data analysis. The data collection section describes the methods used to collect data, and the analysis section describes the method used for analyzing the collected data. These two sections are explained in sections 3.3.2.3 and 3.4.

3.3.2.3 Data collection

For a case study, data collection can be done at three levels [45]: first degree, second degree and third degree.

• First degree: the researcher collects the required data directly from the subjects by interacting with them.

• Second degree: the researcher collects raw data from the subjects without interacting with them, e.g. by monitoring and observing the subjects.

• Third degree: the researcher collects information from previously available artifacts.

In our research, we collected data at two of these levels as part of data triangulation, which helps to validate the data obtained from the two sources. The two levels are first degree and third degree: for first degree we opted for interviews, and for third degree we opted for analysis of project documentation, i.e. test reports.

3.3.2.3.1 Interviews

The first step in conducting interviews is to select the appropriate type of interview. We selected semi-structured interviews for our research, as they allow a dialogue through which a discussion can develop during the interview. The interview population was selected using convenience sampling [47], which allows subjects to be selected based on their availability and convenience to the researchers. After fixing the type of interview, we designed a questionnaire consisting of two types of questions. The first type is demographic questions, which are as follows.

• Age

• Qualification

• Experience

• Number of projects worked

• Testing experience

The second type is technical questions, which relate to the research questions. The technical questions are divided into four parts.

• The general questions related to web testing and quality attributes

• The questions related to tools and the subjects' experience of using the tools

• The questions related to metrics and the subjects' experience of measuring the metrics

• The questions related to challenges in terms of tools, time, metrics and development

A consent form was prepared in order to ensure confidentiality and anonymity for the interviewees. Before questioning the subjects, the consent form was given to them to read, and the interview started only if they agreed to its terms. Two copies of the consent form were kept for each interview, one for the subject and one for the interviewer. The consent form contains the contact details of the researchers and the supervisor, so that the subject can contact them if they wish to share anything later. The consent form is provided in Appendix H. We mainly selected software testers as interviewees, as the research is related to web testing. A list of testers was collected and they were notified by e-mail to schedule the interviews. A beginning script and an ending script were prepared for the interviews. Each interview was conducted by introducing the research topic and the case study, asking the basic demographic questions and then moving on to the technical questions. Following the suggestions of Runeson and Höst [45], we used a pyramid model for the interview session, in which the session begins with specific questions (the demographic questions) followed by open questions (the technical questions), as shown in figure 3.3. The questions are provided in Appendix F.

Figure 3.3: Pyramid model for interview questions

The interviews were audio recorded and conducted in a private room where only two persons, the researcher and the subject, were present. After each interview, the audio recording was listened to again and transcribed into notes for analysis. The transcribed notes were provided to the respective subject for review, so that any mistakes could be reported to us; this strengthens the validity of the recorded data. A total of 12 interviewees were selected, of which eight are from the case company and four are from three other organizations. All three organizations are software companies in India and the interviewees are mainly experienced in web testing; the number of interviewees from each company is provided in table 3.8. The details of the 12 interviewees are given in table 3.7. Interviewees one to eight are from the case company and interviewees nine to 12 are from the other organizations.

Table 3.7: Details of the interviewees (ID: qualification, role, number of projects, experience in testing in years, interview duration in minutes)
1: B Tech in CSE, Test quality architect, 8 projects, 5.9 years, 34 min
2: BE in ECE, Senior solution integrator, 5 projects, 8.5 years, 40 min
3: MS in software systems, Verification specialist, 3 projects, 10.5 years, 38 min
4: B Tech, Senior QA, 10 projects, 6+ years, 34 min
5: B Tech, Senior solution integrator, 8 projects, 7.5 years, 36 min
6: MCA, Senior solution integrator, 2 projects, 5.8 years, 50 min
7: B Tech in CSE, Verification specialist, 6 projects, 10 years, 43 min
8: B Tech in CSE, Senior QA, 3 projects, 7 years, 32 min
9: BSc in computer science, Tester specialist, 6 projects, 8 years, 45 min
10: MCA, Senior tester, 5 projects, 7.4 years, 35 min
11: BE in mechanical, Tester specialist, 6 projects, 9+ years, 36 min
12: B Tech, Software tester, 2 projects, 3 years, 38 min

Table 3.8: Overview of selected companies
Case company: Telecom, 8 interviewees
Company 1: E-commerce, 2 interviewees
Company 2: E-commerce, 1 interviewee
Company 3: Retail, 1 interviewee

3.3.2.3.2 Documentation

In addition to the interviews, we selected documentation as another data source for collecting information useful for our study. The available documents of previous projects from the case company were collected based on convenience sampling [47], which gives the flexibility to collect the documents that are available to the researchers. The collected documents contain information about the non-functional testing process carried out in previous projects of the case company. It is not possible to obtain all information from the interviews; there is a chance of missing information during the interview process, as an interviewee might not provide all the information or might not remember it. A total of 18 documents from previous projects were collected for data triangulation and used to identify information that was not addressed in the interviews. The documents mainly consist of test reports addressing scalability and performance testing. They are also used to validate the results obtained from the interviews and help in identifying what work has actually been carried out in the company. The selected data collection techniques are used to answer the research questions and to fulfill the objectives, as presented in table 3.9.

Table 3.9: Research questions and their respective data collection techniques
RQ 1.1: SMS
RQ 1.2: interviews, documents (objective O1)
RQ 1.3: interviews
RQ 2.1: SMS (objective O2)
RQ 2.2: interviews, documents
RQ 2.3: interviews (objective O3)
RQ 3.1: SMS (objective O4)
RQ 3.2: interviews, documents
RQ 3.3: SMS, interviews (objective O5)
RQ 4: interviews (objective O6)

3.4 Data analysis

The data collected from the systematic mapping study and the interviews was analyzed using thematic analysis. Thematic analysis is a method for identifying the useful data, analyzing it, and observing and reporting themes. It was selected because it is mainly used for reporting the reality, meanings and experiences of participants [48]. There are other qualitative data analysis techniques, such as content analysis and grounded theory, but we selected thematic analysis because it identifies the important data in a large data corpus and provides a way to analyze data collected at different times and in different situations, which allows the data analysis to be improved iteratively [49]. According to Braun and Clarke [49], there are six phases in thematic analysis, as represented in figure 3.4; the way we approached these six phases is described next.

Figure 3.4: Steps for thematic analysis

3.4.1 Familiarizing yourself with the data

3.4.1.1 Systematic mapping study

In this phase, the data extracted while reading the literature was read again. The highlighted data was cross-studied by both authors to get an initial idea.

3.4.1.2 Interview

In this phase, the recorded interviews were listened to first and the data was transcribed into documents by playing the recorded audio at a very low speed. During transcription, the field notes were also checked to verify that the data was transcribed accurately. After transcribing the data, the authors thoroughly read the transcripts to get an initial idea. The transcribed documents were imported into the NVivo tool, a qualitative analysis tool in which the data can be read and analyzed more easily. A repeated reading was done before generating the initial codes.

3.4.2 Generating initial codes

3.4.2.1 Systematic mapping study

In this phase, the data collected from the literature was imported into the NVivo tool, which is used for analyzing, coding and visualizing the data; it presents the data in a simple and organized way. Open coding and closed coding were used for categorizing the data. Open coding identifies codes during and after the analysis; these codes are formed based on ideas that evolve during the analysis process. In closed coding, a set of codes is formed before the analysis, based on the framed research questions and the aim of the research. For example, in open coding we coded different categories of tools from the interview data: commercial, open source, internal, framework, freeware or trial, and monitoring. These categories are based on the collected data and were not defined before data collection, so open coding helps to identify codes from the data. In closed coding, for example, the main aim of RQ 1.1 is to identify the tools that are available in the literature, so based on the research question we defined three codes, performance, scalability and reliability, before the analysis. As the categorization basically depends on the research questions, and RQ 1.1, RQ 2.1, RQ 3.1 and RQ 3.3 draw on the literature, the initial classification of codes depends on these RQs. During the coding process, a set of new codes was also obtained from the literature that differs from the initial codes. The obtained codes are used in the implementation of the data extraction strategy provided in section 3.3.1.1.9.

3.4.2.2 Interview

As mentioned above, the same tool was used to analyze the data obtained from the interviews. The transcribed documents were imported into the tool and the data was initially coded based on RQ 1.2, RQ 1.3, RQ 2.2, RQ 2.3, RQ 3.2, RQ 3.3 and RQ 4. The codes were formed based on these RQs, and while reading the transcriptions we also created additional codes that were not formed initially. A total of 12 transcriptions were used for analyzing the data.

3.4.3 Searching for themes

3.4.3.1 Systematic mapping study

After the initial codes were defined, a set of themes was obtained. The obtained themes were further divided into subthemes, which simplifies the classification. Data that did not relate to the existing themes was formed into new themes.

3.4.3.2 Interview

With the documents imported into the tool, whenever a sentence or paragraph was found to be important while reading, an initial node was created in the tool. If data relevant to a created node was found while reading at a later stage, it was coded into that node; the tool provides a feature to code a selected sentence or paragraph into a node. In this way all the interviews were analyzed and arranged into codes. As this step is only the initial generation of codes, after thorough analysis of the collected data the nodes were rearranged into useful themes based on the extracted data. Extra information collected from the interviews was also given its own themes. The themes are classified according to the four research questions, and each theme is again divided into sub-themes, which help to classify the information at a lower level. The themes are organized by the tool, which makes the process easy to conduct.

3.4.4 Reviewing themes

3.4.4.1 Systematic mapping study and interview

The themes obtained from the previous step were reviewed by the authors. We compared the coded data with the data corpus to check whether the formed themes are relevant to the topic. The formed themes were rechecked to make sure that all the coded data is available in the relevant themes. Based on this review, some themes were added, and others were removed or merged into one theme when data was found not to be relevant to a theme. The themes were later used for generating the thematic maps.

3.4.5 Defining and naming themes

3.4.5.1 Systematic mapping study and interview

Choosing an appropriate name for a theme is a challenging task, as the name of a theme should itself explain the content it deals with. We therefore tried several names for each theme and settled on one once we were satisfied with it, preferring plain names that make it easy to identify what the theme consists of. We brainstormed (between the authors) several times until we found a proper name, and then fixed one name for each theme. All the theme names are provided in the results, section 4.

3.4.6 Producing the report

3.4.6.1 Systematic mapping study and interview

The analyzed data and the results derived from it need to be reported properly, so that the results are presented accurately without unnecessary data. The results are provided in an easy-to-follow way, organized by the RQs and the methods, and are documented in section 4. The basic thematic structure for the interviews is shown in figure 3.5; this structure was obtained from the NVivo tool.

3.5 Validity threats

Research results should not be biased and should be trustworthy; only then is the research considered to have been conducted in an ethical way [45]. In order to establish confidence in the conclusions of the research, validity threats need to be addressed and mitigated. Runeson and Höst [45] discuss four types of threats that need to be addressed:

3.5.1 Construct validity

Construct validity concerns whether what the researchers have in mind matches what they actually investigated in relation to the research questions [45]. The protocol prepared for the SMS was sent to the supervisor and implemented after approval; the search strings were reviewed four times. The data was analyzed by the researchers and, for validation purposes, cross-checked between the two researchers involved in this study. A limitation is that some relevant articles could not be included in the selection because they were not available for download. The interview protocol was prepared and a mock interview was conducted prior to the main interviews to validate the questionnaire; based on the feedback from the mock interview, some modifications were made. The questionnaire was sent to the supervisor, and the interviews were conducted after approval. To obtain accurate data, the subjects were provided with a consent form prior to the interview, which ensures the anonymity and confidentiality of the subject. We prepared an interview script containing basic definitions of the PSR attributes, so that interviewees would not be confused by the terms used in the questionnaire; this ensures the interviewer and interviewee are on the same track. To reduce errors in the data analysis, a tool was used for analyzing the data and interpreting the relations in the data corpus.

3.5.2 Internal validity

Internal validity concerns unknown factors affecting the studied factor: a third factor may cause the observed changes without the researcher being aware of it. Improper selection of literature can be a threat, so in our research the literature was selected by thoroughly reading the abstracts and later the main sections of the articles. The paper selection was done by both authors, so the chance of missing literature relevant to the study is low. Both authors performed the selection process individually and cross-checked with each other; articles on which the authors disagreed were discussed and selected only if both authors were satisfied. The selection of databases is a limitation of the study: it is not possible to search all technical databases, so only databases that are popular and known to the researchers were considered. The selection of interviewees may affect the collected data, so subjects with previous experience in web testing were selected based on convenience sampling, and the selected respondents have more than three years of experience; the chance of questions being misinterpreted is therefore very low. Since our study covers testing of the three PSR attributes, there is a risk of one attribute being confused with another, so care was taken to ensure that the interviewees considered only the PSR attributes and no others. We also used documentation as an additional data source; documentation is a third-degree method that helps validate the data collected from the interviews.

3.5.3 External validity

External validity concerns the generalization of the results. According to Runeson and Höst [45], external validity refers to “to what extent it is possible to generalize the findings, and to what extent the findings are of interest to other people outside the investigated case.”

The SMS was conducted by preparing a protocol, and the search strings were formed prior to the literature search. These search strings are consistent across all databases, and the selection of literature followed the inclusion and exclusion criteria. To the best of our knowledge, we selected the relevant literature for the study, but there is a chance of missing important literature due to unavailability; this is a limitation of our study. We can say that the results from the SMS can be generalized, as we covered all the relevant literature excluding the unavailable articles. The results may also be of interest because the systematic mapping study mainly reveals the state of the art in the research area; other researchers can identify the areas where very little research has been done and focus on them. The interviews were conducted with subjects in different roles, and additional interviews were conducted at other organizations in order to check the results against more companies and individuals outside the studied company. As we conducted only a small number of interviews at other organizations, we consider them a sanity check of whether the results from the case company are typical. The results of this study can therefore be considered partially generalizable. Even though the case study is a qualitative process, replication of the results is quite difficult, as the same environment and situations are hard to recreate; however, according to Runeson and Höst [45], a qualitative study is more about exploring the area and is less concerned with replication of findings. The details of the interviewees are given in table 3.8.

3.5.4 Reliability

Reliability concerns the extent to which the research depends on the particular researchers' perspective. For research to be reliable, it should not be affected by the researchers' prior assumptions, and the data should be analyzed as it was obtained. The systematic mapping study was conducted according to the initially prepared protocol, so no new details outside the protocol were introduced and the authors' perspective does not affect its results; nevertheless, some data may have been missed, which is a limitation of our research. The data collected from the interviews was in the form of audio recordings, which we transcribed into documents. The transcribed documents were sent to the interviewees to verify whether there were any errors or mistakes in the transcription; almost all interviewees replied with minor or no errors, so the data was not misinterpreted. As mentioned above, a mock interview was conducted prior to the interviews to validate the questionnaire, and the data analysis method was applied by studying the guidelines provided by Braun and Clarke [49]. The results derived from this research can therefore be replicated, which ensures reliability in the research.

Figure 3.5: Themes formed in the NVivo tool for interviews

Chapter 4: Results and Analysis

This chapter presents the results obtained from the systematic mapping study and case study. The collected data is analyzed by using thematic analysis. The thematic analysis is conducted by following certain guidelines provided by Braun and Clarke [49]. The data required for our research is gathered from three sources.

• Systematic mapping study: a total of 97 articles were selected, of which 85 focus on performance, 25 on scalability and 27 on reliability. Based on our research questions, three facets were formed in the systematic mapping study: metrics, tools and challenges.

• Interviews: a total of 12 interviewees were selected, eight from Ericsson and four from three other organizations; the latter were included in order to validate the results collected from Ericsson. Of these 12 interviewees, six addressed reliability, ten addressed scalability and all 12 addressed the performance attribute.

• Documents: the documents available from previous projects were also collected. The documents were collected only from the case organization, as we were unable to retrieve documents from the other organizations due to confidentiality and insufficient contacts. A total of 18 documents from previous projects were collected from the case organization. All the collected documents focus only on the performance and scalability attributes; we were unable to collect documents on reliability, as they were not available to us due to confidentiality. Of these 18 documents, 14 concentrate on performance and six on scalability.

The data collected from the three sources, i.e. SMS, interviews and documents, is presented in sections 4.1, 4.2 and 4.3. The number of sources addressing each research attribute is shown in figure 4.1.


Figure 4.1: Number of sources addressing the research attributes

4.1 Facet 1: Metrics for testing PSR attributes

In this section, we present the results of our first research question using the data collected from the systematic mapping study, the interviews and the documents. The section is structured into three subsections: the first presents the metrics obtained from the systematic mapping study, which answers RQ1.1; the second presents the metrics obtained from the interviews and documents, which answers RQ1.2; and the third presents the criteria for selection of metrics, which answers RQ1.3. In general, metrics are measures used for quantifying the attributes of software entities [50]. For this study, the metrics related to PSR were collected: 69 metrics were identified from the SMS, 30 from the interviews and 16 from the documents, giving a total of 115 metrics from all sources. A description of each metric is provided as a list in Appendix C.

4.1.1 Systematic mapping study

This section addresses the data collected from the systematic mapping study. Out of the 97 articles, 80 deal with metrics related to PSR. The metrics are classified into themes, and based on these themes the metrics were collected from the systematic mapping study. The thematic map for metrics is provided in figure 4.2.

Figure 4.2: Thematic map for metrics from SMS

According to the thematic map, the metrics are classified into three themes or schemas, as shown in figure 4.2. Each schema is described below with its list of metrics. A total of 39 metrics are categorized under performance, 17 under scalability and 13 under reliability. From this study, we also identified the two most frequently mentioned metrics for each attribute by calculating the percentage of their occurrence in the total of 80 articles. The metrics obtained from the SMS for each schema are provided below.

4.1.1.1 Performance

The performance schema relates to the performance attribute, in which the performance of the web application is measured using metrics. The metrics available in the literature were collected and analyzed. We observe that response time (80%) and throughput (57%) are the most covered metrics in the literature. More articles focus on performance-related metrics than on the scalability and reliability attributes, so we can observe the pattern that the performance of web applications has received more attention than scalability and reliability. The list of metrics is provided in table 4.1.

Table 4.1: Performance metrics (ID, metric, frequency count)
1 Response time (63)
2 Throughput (45)
3 Number of concurrent users (23)
4 CPU utilization (19)
5 Number of hits per sec (14)
6 Memory utilization (13)
7 Disk I/O (access) (8)
8 Latency (8)
9 Think time (7)
10 Elapsed time (disk) (6)
11 Processor time (5)
12 Roundtrip time (5)
13 Number of transactions per sec (http) (4)
14 Number of HTTP requests (3)
15 Load time (3)
16 Cache hit ratio (3)
17 Hit value (3)
18 Session length (3)
19 Capacity (2)
20 Cache hit (2)
21 Disk utilization (2)
22 Network traffic (bandwidth) (2)
23 Requests in bytes per sec (2)
24 Disk space (1)
25 Hit ratio (1)
26 Page load time and request time (1)
27 Availability (1)
28 Number of connections per sec (user) (1)
29 Session time (1)
30 Transaction time (1)
31 Connect time (1)
32 Request rate (1)
33 Total page size (1)
34 Total page download time (1)
35 First byte time (1)
36 DNS lookup time (1)
37 Cache memory usage (1)
38 Number of successful virtual users (1)
39 Available memory (1)

4.1.1.2 Scalability

The scalability schema relates to the scalability attribute, in which the scalability of the web application is measured using scalability metrics. The metrics available in the literature were collected and analyzed. According to Guitart et al. [21], scalability is considered a sub-part of performance: scalability testing checks whether the system is capable of handling heavy load without any performance degradation, and such degradation can be overcome by adding additional resources. From the analysis, we observed that scalability is mainly about measuring the performance of the system after the addition of resources; since it measures performance, some metrics overlap between scalability and performance. We can also observe that scalability is the second most mentioned attribute in the selected literature, and that response time (80%) and throughput (57%) are again the most covered metrics. The list of metrics identified for scalability is provided in table 4.2.

Table 4.2: Scalability metrics (ID, metric, frequency count)
1 Response time (63)
2 Throughput (45)
3 Number of concurrent users (23)
4 CPU utilization (19)
5 Number of hits per sec (14)
6 Memory utilization (13)
7 Disk I/O (access) (8)
8 Latency (8)
9 Number of connections per sec (user) (6)
10 Disk queue length (request) (5)
11 Number of transactions per sec (http) (4)
12 Disk space (1)
13 CPU model (1)
14 CPU clock (1)
15 Number of cores (1)
16 Max. CPU steal (1)
17 Available memory (1)
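Several of the resource-oriented metrics in table 4.2, such as CPU utilization, memory utilization and disk I/O, are typically sampled on the system under test while load is applied. The following minimal sketch uses the third-party psutil package, which is our assumption for illustration and not a tool reported in the reviewed literature.

```python
import psutil  # third-party package; assumed to be installed on the monitored host

def sample_resources(duration_s=10, interval_s=1):
    """Periodically sample CPU and memory usage, and total disk I/O, during a test run."""
    samples = []
    start_io = psutil.disk_io_counters()
    for _ in range(int(duration_s / interval_s)):
        samples.append({
            # cpu_percent blocks for interval_s seconds while measuring
            "cpu_percent": psutil.cpu_percent(interval=interval_s),
            "memory_percent": psutil.virtual_memory().percent,
        })
    end_io = psutil.disk_io_counters()
    disk_bytes = (end_io.read_bytes - start_io.read_bytes,
                  end_io.write_bytes - start_io.write_bytes)
    return samples, disk_bytes

if __name__ == "__main__":
    cpu_mem, (read_b, written_b) = sample_resources(duration_s=5)
    print(cpu_mem)
    print(f"disk read: {read_b} B, disk written: {written_b} B")
```

In a scalability test, such samples would be collected once per load level or resource configuration so that the utilization figures can be compared before and after resources are added.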

4.1.1.3 Reliability

The reliability schema relates to the reliability attribute, in which the reliability of the web application is measured using reliability metrics. The metrics available in the literature were collected and analyzed. We observed that MTBF (10%) and number of errors (10%) are the most covered reliability metrics in the literature. From the analysis, reliability is the least mentioned attribute in the selected literature; the number of articles addressing it is small compared to the other two attributes. The list of metrics identified for the reliability attribute is provided in table 4.3.

Table 4.3: Reliability metrics (ID, metric, frequency count)
1 MTBF (8)
2 Number of errors (8)
3 Number of sessions (5)
4 Failure rate (request) (4)
5 MTTF (3)
6 MTTR (2)
7 Errors percentage (2)
8 Error ratio (2)
9 Number of connection errors (2)
10 Number of timeouts (2)
11 Successful or failed hits (1)
12 Number of deadlocks (1)
13 Rate of successfully completed requests (goodput) (1)
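Request-level reliability metrics such as error percentage, failure rate and goodput can be derived directly from the outcomes of the requests issued during a test run. The following sketch is illustrative only; the outcome list and test duration are hypothetical values.

```python
def reliability_metrics(outcomes, duration_s):
    """outcomes: list of booleans (True = request succeeded); duration_s: test length in seconds."""
    total = len(outcomes)
    failures = outcomes.count(False)
    return {
        "error_percentage": 100.0 * failures / total if total else 0.0,
        "failure_rate_per_s": failures / duration_s,
        "goodput_req_per_s": (total - failures) / duration_s,
    }

# Hypothetical example: 980 successful and 20 failed requests over a 60-second run.
print(reliability_metrics([True] * 980 + [False] * 20, duration_s=60.0))
```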

4.1.2 Interviews and documents

The data obtained from the interviews is used to answer research question RQ1.2, i.e. the metrics used by software testers in practice for testing the PSR attributes. The metrics are classified into themes, and based on these themes the metrics were collected from the interviews. The thematic map for metrics is provided in figure 4.3.

Figure 4.3: Thematic map for metrics from interviews

The metrics theme shown in figure 4.3 is classified into three sub-themes: performance, scalability and reliability. The performance theme contains data coded from 12 interviewees, the scalability theme from eight interviewees and the reliability theme from four interviewees.

4.1.2.1 Performance

All interviewees (12 out of 12) reported metrics for the performance attribute, and all are familiar with the metrics they use in performance testing. They also mentioned some metrics which we did not find in the literature, such as rendezvous point, queue percentage, and ramp-up and ramp-down time.

• Ramp-up and ramp-down time: ramp-up gradually increases the load on the server to find the breakpoint, and ramp-down gradually decreases the load in order to recover from the ramp-up.

• Rendezvous point: the point at which all emulated virtual users wait until every expected user has been emulated, and then all virtual users send their requests at the same time.

• Queue percentage: the percentage of the work queue size currently in use.

The metrics used in practice for measuring the performance attribute are listed below; a minimal measurement sketch for two of them follows the list.

• Number of transactions per sec

• CPU utilization

• Memory utilization

• Processor time

• Throughput

• Disk I/O

• Number of hits per sec

• Number of requests per sec

• Number of concurrent users

• Network usage

• Server requests and response

• Speed

• Response time

• Rendezvous point

• Transactions pass and fail criteria

• Ramp-up time and ramp-down time

• Error percentage

• Queue percentage

• Bandwidth

• Network latency
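As a rough illustration of how two of the most frequently reported metrics, response time and throughput, can be obtained, the sketch below emulates a fixed number of concurrent users against a hypothetical local URL. It is a minimal example using only the Python standard library and is not a substitute for the load-testing tools discussed in section 4.2.

```python
import time
import concurrent.futures
import urllib.request

# Hypothetical target; replace with the web application under test.
URL = "http://localhost:8080/"

def timed_request(url):
    """Issue one HTTP GET and return its response time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

def run_load(url, concurrent_users=10, requests_per_user=20):
    """Emulate concurrent users and derive average response time and throughput."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(timed_request, url)
                   for _ in range(concurrent_users * requests_per_user)]
        response_times = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    return {
        "avg_response_time_s": sum(response_times) / len(response_times),
        "throughput_req_per_s": len(response_times) / elapsed,
    }

if __name__ == "__main__":
    print(run_load(URL, concurrent_users=10, requests_per_user=20))
```

Dedicated load-testing tools extend this idea with ramp-up schedules, rendezvous points and richer reporting, which is why they are preferred in practice.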

4.1.2.2 Scalability

Eight out of 12 interviewees mentioned metrics they use for measuring scalability in practice. One interviewee stated that scalability is the main attribute for the studied company, so they focus more on scalability since their domain is telecom. The scalability metrics obtained from the interviews are given below.

• Load distribution

• Throughput

• Number of concurrent users

• CPU utilization

• Response time

• Memory utilization

• Disk I/O

• Number of requests per sec

4.1.2.3 Reliability

Four out of 12 interviewees mentioned metrics they use for measuring the reliability attribute. Some interviewees are not familiar with reliability testing and therefore did not provide any information related to its metrics. The majority of the interviewees who answered provided the same type of metrics for reliability. The metrics used in practice for measuring reliability are given below.

4.1.2.3.1 MTBF

MTBF is the mean time between failures, defined as the time gap between one identified failure and the next. Four out of 12 interviewees provided this metric. The main reason mentioned by the interviewees is that the developed application should be fault tolerant, so to make sure the application is reliable it is tested under different scenarios.

One of the interviewees stated that “whatever is the scenario the application should work; the fault tolerance is a must in the developed application.” – Test specialist. Another interviewee mentioned that “There should be very less amount of errors or zero error in order to be web application reliable, if the application is life critical then it should definitely be reliable without any error chance.” – Senior solution integrator.

4.1.2.3.2 Number of failures

The number of failures an application has indicates its reliability, and the tolerance for failures depends on the type of application. If the application is a simple one that only provides basic information, failures are not a big issue, but if the number of failures is high in an e-commerce type of web application then the business will fail. The number of failures metric therefore records how many times an application has failed.
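For illustration, both metrics can be computed from a log of failure timestamps; the sketch below uses hypothetical values expressed in hours since the start of observation.

```python
# Hypothetical failure timestamps (hours since the start of observation).
failure_times_h = [12.0, 30.5, 55.0, 81.25]

def mtbf_hours(times):
    """Mean time between consecutive failures, in hours."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else float("inf")

print("number of failures:", len(failure_times_h))
print("MTBF (hours):", mtbf_hours(failure_times_h))
```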

4.1.2.4 Summary

Interviews were conducted both at the case company and at other organizations, and we find that the metrics provided by the eight interviewees from the case company overlap with the metrics provided by the interviewees from the other companies. The metrics obtained from the case company and the other companies are provided in Appendix G. The interviews at the other companies were conducted in order to check whether the results from the case company are typical.

4.1.2.5 Documents

The metrics collected from the test reports of previous projects are provided here. The documents were analyzed and the metrics for the PSR attributes were distinguished. The collected documents contain metrics for performance and scalability: of the 18 documents, 14 relate to performance and six to scalability. All the identified metrics for the performance and scalability attributes from the documents overlap with the metrics identified from the interviews. We were unable to collect reliability-related documents, as they were not available to us. The metrics are classified into themes, and based on these themes the metrics were collected from the documents. The thematic map for metrics is provided in figure 4.4. A total of 16 metrics were obtained from the documents, of which nine belong to performance and seven to scalability. The identified metrics are given in table 4.4.

Figure 4.4: Thematic map for metrics from documents

Table 4.4: Metrics obtained from documents
Performance: number of transactions per sec, CPU utilization, number of bytes per sec, number of sessions, execution time, response time, number of bytes per sec, latency, memory utilization
Scalability: load distribution, CPU utilization, memory utilization, response time, throughput, number of threads, number of users

4.1.3 Criteria for selection of metrics

The 12 interviews were also used to identify the criteria for the selection of metrics in practice. The interviewees described the criteria in a similar way: metric selection mainly depends on the type of web application being tested, metric dependencies, and customer and market requirements. One interviewee stated: "Metric selection mainly depends on customer and market requirements, as customer may use different configuration settings and hardware we need to test on their settings and provide figures (outputs or measured values) to them that the given applications is working on provided settings" – Verification specialist.

4.2 Facet 2: Tools for testing PSR attributes

In this section, we present the results of our second research question using the data collected from the systematic mapping study, the interviews and the documents. Tools are used to evaluate web applications against selected criteria; they simplify the testing process and help testers solve complex problems with less effort. Tools can be of two types, manual and automated. In our study, tools related to the PSR attributes were collected from the different data sources: 54 from the SMS, 18 from the interviews and four from the documents, giving a total of 76 tools from all sources. Information about the availability, language support, platform support, developer, URL, source type and quality attribute of each tool is provided as a list in Appendix D. This section is structured into three subsections: the first presents the tools obtained from the systematic mapping study, which answers RQ2.1; the second presents the tools obtained from the interviews and the documents, which answers RQ2.2; and the third presents the drawbacks of the tools and the improvements identified in practice, which answers RQ2.3.

4.2.1 Systematic mapping study

This section addresses the data collected from the systematic mapping study. Out of the 97 articles, 76 deal with tools related to PSR. The tools are classified into themes, and based on these themes the tools were collected from the systematic mapping study. The thematic map for tools is provided in figure 4.5.

Figure 4.5: Thematic map for tools from SMS

The tools obtained from the systematic mapping study are classified into different schemes: performance, scalability and reliability, as shown in figure 4.5. This classification provided an interesting outcome: tools related to scalability and reliability are mentioned far less often in the articles. A total of 53 tools were identified for the performance attribute, 23 for scalability and five for reliability. Based on the analysis, tools related to performance can also be used to measure the scalability attribute, because scalability testing is performed by adding additional resources and measuring the resulting performance deviations in the system. For reliability, the literature relies mostly on Markov chain models, and only five tools were identified. From this study, we also identified the two most frequently mentioned tools for each attribute by calculating the percentage of their occurrence in the total of 76 articles. The list of all tools obtained from the SMS is provided in table 4.5.

4.2.1.1 Performance

The performance scheme covers tools which can be used to measure performance. The performance tools are subdivided into load and stress testing tools, and performance monitoring and profiling tools. The most mentioned tools in the literature are LoadRunner (31%) and Apache JMeter (25%), based on the frequency count. The tools related to performance are listed in table 4.5.

4.2.1.2 Scalability

The scalability scheme covers tools which can be used to measure performance after adding additional resources. The scalability tools are subdivided into monitoring tools and scalability testing tools. The most mentioned tools in the literature are WebLOAD (9%) and Silk Performer (5%), based on the frequency count. The tools related to scalability are listed in table 4.5.

4.2.1.3 Reliability

The reliability scheme covers tools which can be used to measure error and fault metrics under certain criteria, such as the number of users, number of hits or number of sessions. Reliability tools are mentioned far less often than tools for the other two attributes. The most mentioned tool in the literature is TestComplete (3%), based on the frequency count. The tools related to reliability are listed in table 4.5.

Table 4.5: Identified PSR tools (ID, tool, attribute)
1 Apache JMeter: Performance (load testing)
2 LoadRunner: Performance (load testing)
3 WebKing: Performance (load testing), Reliability
4 iPerf: Performance
5 Tsung: Performance (load, stress), Scalability
6 WAPT: Performance (load, stress)
7 openSTA: Performance (load, stress)
8 SOAtest: Reliability, Performance (load, stress)
9 Microsoft Web Application Stress Tool: Performance (stress)
10 [tool name not given in the source]: Performance (load)
11 The Grinder: Performance (load)
12 WebLOAD: Performance (load, stress), Scalability
13 Silk Performer: Performance (load, stress), Scalability
14 Webserver Stress Tool: Performance (load, stress)
15 QAload: Performance (load, stress), Scalability
16 Wireshark: Performance
17 Firebug: Performance (web page performance analysis)
18 OProfile: Performance (performance counter monitoring/profiling tool)
19 Xenoprof: Performance (performance counter monitoring/profiling tool)
20 SoapUI: Performance (load testing)
21 CloudTest: Performance (load testing), Scalability
22 collectl: Performance (monitoring tool)
23 ApacheBench: Performance (load testing)
24 TestComplete: Performance, Scalability, Reliability
25 MBPeT (a performance testing tool): Performance, Scalability
26 collectd: Performance (load testing)
27 Cacti: Performance
28 FastStats Log File Analyzer: Reliability, Performance (load, stress)
29 Rational TestManager: Performance
30 Pylot: Performance, Scalability
31 LoadStorm: Performance (load testing)
32 Rational Performance Tester: Performance
33 TestMaker: Performance, Scalability
34 Siege: Performance (load testing)
35 LoadImpact: Performance (load testing)
36 Advanced Web Monitoring Scripting (KITE): Performance monitoring
37 Visual Studio: Performance (load, stress testing)
38 TestOptimal: Performance (load testing)
39 WebSurge: Performance (load, stress)
40 Application Center Test: Performance (stress, load), Scalability
41 e-TEST suite: Performance, Reliability
42 Watir-webdriver: Performance
43 Selenium WebDriver: Performance, Scalability
44 AppPerfect Load Test: Performance (load, stress)
45 YSlow: Performance analysis
46 BrowserMob: Performance (load), Scalability
47 NeoLoad: Performance (load, stress)
48 perf: Performance monitoring
49 BlazeMeter: Performance (load)
50 Zabbix: Scalability (monitoring tool)
51 Nagios: Scalability (monitoring tool)
52 Opsview: Scalability (monitoring tool)
53 HyperHQ: Scalability (monitoring tool)
54 HP QuickTest Professional: Performance (load)
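Of the tools listed above, Apache JMeter is the most frequently mentioned open-source option, and in automated test pipelines it is commonly driven in non-GUI mode. The following sketch assumes that JMeter is installed and on the PATH; the test plan and results file names are placeholders.

```python
import subprocess

# Placeholder file names; a real test plan (.jmx) must be prepared in JMeter first.
TEST_PLAN = "test_plan.jmx"
RESULTS = "results.jtl"

# Run JMeter in non-GUI mode (-n) with a test plan (-t) and a results log (-l).
subprocess.run(["jmeter", "-n", "-t", TEST_PLAN, "-l", RESULTS], check=True)
```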

4.2.2 Interviews and documents
The data obtained from the interviews are used to answer research question RQ2.2, i.e. which tools are used in practice by software testers for web testing. The tools are classified into themes, and the tools collected from the interviews are organised according to these themes. The thematic map for tools is provided in figure 4.6.

Figure 4.6: Thematic map for tools from interviews

The tools theme shown in figure 4.6 is classified into seven sub-themes: commercial, frameworks, freeware or trial, internal tool, monitoring tool, open source and simulators. The commercial theme contains data coded from nine interviewees, the frameworks theme from two, the freeware or trial theme from three, the internal tool theme from six, the monitoring tool theme from five, the open source theme from 11 and the simulators theme from two. These sub-themes and the number of interviewees for each are shown in the bar chart in figure 4.7.

Figure 4.7: Types of tools obtained from interviews

4.2.2.1 Commercial
The commercial theme consists of tools that are licensed products. Nine out of 12 interviewees listed commercial tools they have used in their experience. Commercial tools are used because the company provides the testers with licensed software; the selection of tools is not in the hands of the software testers but depends on the company. One reason identified from the interviews for selecting specific tools (and leaving out others) is customer requirements and company needs. The commercial tools for testing performance, scalability and reliability mentioned in the interviews are presented in table 4.6.

Table 4.6: Commercial tools obtained from interviews

S. No | Tool name | Number of interviewees who mentioned it
1 | HP LoadRunner (formerly Mercury LoadRunner) | 6
2 | VMware vCenter | 1
3 | QuickTest Professional | 4
4 | HP Quality Center | 2
5 | IBM RPT | 1
6 | Silk Performer | 1
7 | Sahi Pro | 1

4.2.2.2 Frameworks
Two out of 12 interviewees provided data about scalability tools: they use clustering frameworks to achieve high scalability in their applications. These clustering frameworks make it possible to identify a node that is about to fail and also make the application fault tolerant. The clustering frameworks mentioned are listed in table 4.7.

Table 4.7: Frameworks obtained from interviews

S. No | Tool name | Number of interviewees who mentioned it
1 | Akka clustering | 2
2 | ZooKeeper clustering | 1
3 | Oracle RAC clustering | 1

4.2.2.3 Freeware / trial
Three out of 12 interviewees mentioned freeware or trial-version tools for testing the non-functional attribute performance. All three interviewees mentioned the same tool, which is available in a standard version and a pro version with a trial. The tool is SoapUI, with which a tester can generate load and view the requests and responses exchanged with the application.

4.2.2.4 Internal
Six out of 12 interviewees mentioned a tool or framework developed within Ericsson that is used for non-functional testing. One of the interviewees stated that “Ericsson prefer in making their own tools rather than depending on other, so now we are working on a performance tool for non-functional testing.” – Test specialist. The reason for developing and preferring internal tools over commercial tools in the case company is the flexibility to update them and support the required features. Another interviewee mentioned that “As the commercial tools does not satisfy all the required needs of the company. Whereas in the case of internal tool we can develop the tool based on our own requirements” – Verification specialist.

4.2.2.5 Monitoring
Five out of 12 interviewees mentioned monitoring tools, which are used to monitor network usage, data, and memory and CPU utilisation. These tools are generally used in scalability and performance testing. The tools are listed in table 4.8.

Table 4.8: Monitoring tools obtained from interviews

S. No | Tool name | Number of interviewees who mentioned it
1 | M1 - Monitor One | 2
2 | Wireshark | 1

4.2.2.6 Open source
Eleven out of 12 interviewees mentioned open source tools; the company currently uses the JMeter tool. Open source tools are freely available online and do not need to be purchased, so many companies encourage their use. The open source tools mentioned in the interviews are listed in table 4.9.

Table 4.9: Open source tools obtained from interviews

S. No | Tool name | Number of interviewees who mentioned it
1 | Apache JMeter | 11
2 | Selenium | 3
3 | Ixia | 1

4.2.2.7 Simulators
Two out of 12 interviewees mentioned the use of simulators in performance testing of web applications. The simulators generate load and replay scenarios in order to measure performance. The simulator mentioned by the interviewees is the DMI simulator, which generates load on request. One interviewee stated: “for performing like if you want to create a lot of devices in general, a lot of traps to them and test the load on the given system you can use simulators.”

4.2.2.8 Summary
The interviews were conducted both at the case company and at other organisations; we find that the tools generally used by the case company and the other organisations are Apache JMeter and LoadRunner. The tools obtained from the case company and the other companies are provided in Appendix G. The data on tools collected from the interviewees is therefore general rather than specific to the case company.

4.2.2.9 Documents
From the analysis of test reports of previous projects we collected some tools, and these overlap with the tools provided by the interviewees. The tools are classified into themes, and the thematic map for tools is provided in figure 4.8. A total of four tools were identified from the available 18 documents and are provided in table 4.10. These documents are mainly performance and scalability test reports.

Figure 4.8: Thematic map for tools from documents

The tools identified from the documents are listed below.

Table 4.10: Tools obtained from documents

S. No | Tool name
1 | Apache JMeter
2 | SoapUI
3 | JConsole
4 | PureLoad Enterprise

4.2.3 Tool drawbacks and improvements
Seven out of 12 interviewees described drawbacks in the tools they use. The most common drawbacks concern the JMeter tool. Five drawbacks were identified from these seven interviewees:

• The number of virtual users supported by the JMeter tool is limited.

• JMeter makes the interaction between systems very complex during the simulation process.

• A bug identified in a commercial tool cannot be fixed by the testers themselves, as the tool is not their own proprietary (in-house) tool.

• JMeter also fails to cope when the number of hits from the virtual users increases, which leads to deadlocks.

• The LoadRunner tool fails to work properly on low-configuration systems, as it is a high-end tool.

The case company developed an internal tool to overcome these drawbacks. The interviewees did not mention any specific improvements; they only noted that the tools can be improved by overcoming these limitations and drawbacks.

4.3 Facet 3: Challenges faced by software testers

In this section, we present the results for our third research question, RQ3, using the data collected from the systematic mapping study, the interviews and the documents. Generally, challenges are the issues or problems faced. In this study, all the challenges related to PSR are collected and analysed. A total of 18 challenges related to metrics, development, users and tools were identified from the SMS, whereas from the interviews a total of 13 challenges related to tools, development, time, metrics and network were identified. Only three challenges, all related to metrics, were observed in the documents. This section is structured into three subsections: the first presents the challenges obtained from the systematic mapping study, answering RQ3.1; the second presents the challenges obtained from the interviews and documents, answering RQ3.2; and the third examines whether any mitigations from the literature address the challenges identified in practice.

4.3.1 Systematic mapping study
This section addresses the data collected from the systematic mapping study. Out of 97 articles, 33 deal with challenges related to PSR. The challenges are classified into themes, and the thematic map for challenges from the systematic mapping study is provided in figure 4.9.

Figure 4.9: Thematic map for challenges from SMS

In our research, the challenges faced by software testers collected from the SMS are classified into four schemes: metrics, user, development and tools. The metrics theme contains data coded from six articles, the user theme from three articles, the tools theme from 11 articles and the development theme from 10 articles. These sub-themes and the number of articles for each are shown in the bar chart in figure 4.10.

Figure 4.10: Number of articles addressed each theme from SMS

4.3.1.1 User
The user scheme consists of user-based challenges and issues that depend on user behaviour. The data coded from three articles represent three different challenges related to users, so from the literature we identified three challenges in this scheme. Each challenge is described below.

4.3.1.1.1 Challenge 1 According to Abbors et al. [51], user behaviour needs to be simulated based on the real environment. If the simulated user behaviour does not match real user behaviour, faults that were not observed during the testing phase may occur once the application is deployed in the live environment. The challenge is therefore how to simulate real user behaviour.

4.3.1.1.2 Challenge 2 According to Arkels and Makaroff [52], the challenge is to know whether improving an identified bottleneck improves the overall performance of the system or instead causes another bottleneck due to different user actions.

4.3.1.1.3 Challenge 3 According to Gao et al. [53], user satisfaction depends mainly on the performance of the application. The challenge is to find out how users react to different response times and what actions users perform in relation to server responses. Mitigation: To find the user reactions, the authors [53] developed a framework that monitors and retrieves user patterns from web logs and generates performance test cases from them.
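The sketch below illustrates the general idea behind such an approach under the assumption of a Common Log Format access log; the log file name is a placeholder and the actual framework of [53] is considerably more elaborate. The derived usage profile could then weight the scenarios of a performance test plan.

```python
# Sketch of deriving a simple usage profile from a web access log,
# which can then weight the scenarios in a performance test plan.
import re
from collections import Counter

# Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2015:13:55:36 +0200] "GET /search HTTP/1.1" 200 2326
LOG_LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def usage_profile(log_path):
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            match = LOG_LINE.search(line)
            if match:
                counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative frequency per URL: the share of virtual users that a
    # generated test case for that URL should receive.
    return {url: hits / total for url, hits in counts.most_common()}

if __name__ == "__main__":
    for url, share in usage_profile("access.log").items():  # hypothetical file
        print(f"{url}: {share:.1%} of requests")
```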

4.3.1.2 Tools
The tools scheme consists of challenges related to tools and their drawbacks. The data coded from 11 articles map onto five different challenges, so from the literature we identified five challenges in this scheme. Each challenge is described below.

4.3.1.2.1 Challenge 1 As already mentioned in the user theme, imposing real user behaviour is a challenge [54]. Shojaee et al. [55] identified two ways to simulate real user behaviour: one is to randomise the user data by replacing it with recorded user input, and the other is to provide the inputs manually. Neither approach is efficient: the first does not really simulate user behaviour, and the second is a laborious task.

4.3.1.2.2 Challenge 2 This challenge is related to the environment of the tool, i.e. an improperly set up tool environment may also cause problems in testing [16]. Mitigation: To overcome the challenge posed by the environment, the following factors need to be handled properly:

• Installation of the tool

• Tool setup

• Flexibility of the tool to perform the test

4.3.1.2.3 Challenge 3 This challenge is related to the Apache JMeter tool; many testers struggle to create larger numbers of virtual users, as JMeter only supports a limited number of them [56]. To overcome this, JMeter provides a distributed setup consisting of a master and slave servers. However, the distributed setup raises further challenges, such as configuration issues in the scripts when executing them in distributed mode. Another challenge with the JMeter tool is the generation of test scripts [56, 12]. External plugin support is needed to generate test scripts; although a plugin named Badboy exists, the compatibility between JMeter and Badboy is not efficient, so the challenge still needs to be mitigated. Kiran et al. [57] stated that “JMeter script does not capture all the dynamic values, such as SAML Request, Relay State, Signature Algorithm, Authorization State, Cookie Time, Persistent ID (PID), JSession ID and Shibboleth, generated using single sign-on mechanism of Unified Authentication Platform.” Further challenges of the JMeter tool are its inability to record test cases, its confusing charts and its unclear terminology.

4.3.1.2.4 Challenge 4 According to Quan et al. [58], most of the tools available for testing performance quality attributes only support the creation of simple test case scenarios. Such scenarios may not be sufficient to determine transaction times and the number of simultaneous users, and they also make it difficult to identify the bottlenecks in the application.

4.3.1.2.5 Challenge 5 Tools that use random user sessions or log-file-based sessions to simulate virtual users are not able to provide a realistic workload. Mitigation: Xu et al. [54] provide a configuration file for each virtual user based on a continuous Markov chain. It essentially captures the visiting paths, stay times and visiting moments.
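A minimal sketch of this idea is shown below; the pages, transition probabilities and mean stay times are hypothetical, and the configuration format used by Xu et al. [54] is not reproduced here. Each simulated session follows a visiting path drawn from the chain, with exponentially distributed stay times.

```python
# Sketch of driving one virtual user from a Markov chain: the next page
# is drawn from the transition probabilities of the current page and the
# stay (think) time is drawn from an exponential distribution.
import random

TRANSITIONS = {                      # hypothetical navigation model
    "home":     [("search", 0.6), ("login", 0.3), ("exit", 0.1)],
    "search":   [("product", 0.7), ("home", 0.2), ("exit", 0.1)],
    "product":  [("cart", 0.4), ("search", 0.4), ("exit", 0.2)],
    "cart":     [("checkout", 0.5), ("search", 0.3), ("exit", 0.2)],
    "checkout": [("exit", 1.0)],
    "login":    [("home", 1.0)],
}
MEAN_STAY_SECONDS = {"home": 5, "search": 8, "product": 12, "cart": 6,
                     "checkout": 15, "login": 4}

def simulate_session(start="home"):
    """Return one virtual-user session as a list of (page, stay_time) pairs."""
    page, session = start, []
    while page != "exit":
        stay = random.expovariate(1.0 / MEAN_STAY_SECONDS[page])
        session.append((page, round(stay, 1)))
        pages, weights = zip(*TRANSITIONS[page])
        page = random.choices(pages, weights=weights)[0]
    return session

print(simulate_session())
```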

4.3.1.3 Metric
The metric scheme consists of challenges related to metrics encountered while performing testing. The data coded from six articles map onto six different challenges, so from the literature we identified six challenges in this scheme. Each challenge is described below.

4.3.1.3.1 Challenge 1 According to Shams et al. [59], identifying the dependencies between requests, i.e. taking previous requests into account while validating the present request, is one of the main challenges in performance testing.

4.3.1.3.2 Challenge 2 Jiang et al. [60] state that the selection of parameters and criteria for testing is an important issue in performance testing. The selection of metrics is a challenge because many parameters are available, and choosing a suitable subset is a difficult task.

4.3.1.3.3 Challenge 3 Nikfard et al. [6] note that most of the challenges faced during performance testing are related to faults identified in the running environment, i.e. caused by poorly deployed resources.

4.3.1.3.4 Challenge 4 Guitart et al. [21] state that during scalability testing the main challenge is related to resources such as CPU, servers, memory and disk. Mitigation: This challenge can be mitigated by identifying the type of resource required; adding that particular resource and measuring its effect helps to reduce the challenge.
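One simple way to measure that effect, sketched below with hypothetical throughput figures, is to compare the measured throughput at each resource level against ideal linear scaling; a low scaling efficiency suggests that the added resource is not the bottleneck.

```python
# Sketch of quantifying the effect of adding resources: compare measured
# throughput against ideal linear scaling to see which addition pays off.
# The measurements below are hypothetical placeholders.
measurements = {1: 250.0, 2: 470.0, 4: 820.0, 8: 1150.0}  # nodes -> requests/s

baseline_nodes = min(measurements)
baseline = measurements[baseline_nodes]
for nodes, throughput in sorted(measurements.items()):
    speedup = throughput / baseline
    efficiency = speedup / (nodes / baseline_nodes)
    print(f"{nodes} node(s): speedup {speedup:.2f}, scaling efficiency {efficiency:.0%}")
```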

4.3.1.3.5 Challenge 5 Zhou et al. [61] state that problems related to CPU and I/O bottlenecks can be avoided by controlling the virtual users. A further challenge is related to network delay and server computation capacity, which cannot be controlled manually; such issues generally stem from the network connection and the server processor.

4.3.1.3.6 Challenge 6 According to Lutteroth et al. [62], specifying the load test parameters, such as the generation of forms and the recognition of the returned pages, is a major challenge. Mitigation: These challenges can be overcome by adding the specifications to a form-specific model.

4.3.1.4 Development
The development scheme consists of challenges related to the development area, coding errors and so on. The data coded from 10 articles map onto four different challenges, so from the literature we identified four challenges in this scheme. Each challenge is described below.

4.3.1.4.1 Challenge 1 Testing consists of scenarios to be tested, and in a large application there may be 100 scenarios for the login page alone. Handling a large number of scenarios is therefore a challenge. A mitigation is provided by Talib et al. [63]. Mitigation: A metric-based test case partitioning algorithm is used to generate the test cases; it produces three equivalence classes and thereby reduces the number of test cases.

4.3.1.4.2 Challenge 2 Errors in an application may not be found in the development environment, and the application may not work properly in another environment due to faults [64]. The challenge is to reveal the actual error before the end user encounters it.

4.3.1.4.3 Challenge 3 If a site becomes popular, the load considered during development is no longer sufficient [3], so it is a challenge to know how many users may hit the site at the same time.

4.3.1.4.4 Challenge 4 This development challenge concerns unnecessary sleep statements and garbage collection: garbage collection of the heap may cause socket errors, which leads to a decrease in the number of hits, and it may also slow the server response as the number of virtual users increases [24].

4.3.2 Interviews and documents
The challenges faced by software testers were obtained from the interviews. The challenges were analysed using thematic analysis, and the thematic map for challenges is presented in figure 4.11.

Figure 4.11: Thematic map for challenges from interviews

The challenges obtained from the interviews are classified into five themes, as shown in figure 4.11: metrics, network, development, time and tools. The metrics theme contains data coded from two interviewees, the network theme from three, the development theme from eight, the tools theme from seven and the time theme from five. These sub-themes and the number of interviewees for each are shown in the bar chart in figure 4.12.

Figure 4.12: Number of interviewees addressed the themes

4.3.2.1 Development
In this section the challenges related to development, such as developing a script or code for test cases, are presented. Eight out of 12 interviewees addressed challenges related to development, and the majority of these concern script issues. In general, testing is performed using scripts, and the software testers face challenges while developing these testing scripts; some of the script-related challenges are provided below. Regarding unclear or changing non-functional requirements, one interviewee stated: “As we are working in agile, the requirements sometimes are never clear and there may be a sudden change in requirements or from other module team so automating of scripts need to be reworked sometimes.” – Senior software engineer. Another interviewee stated: “Sometimes in LoadRunner while creating scenarios, we have to capture browser request while building scripts. While building those scripts sometimes the parameters which are captured are not exactly what we want. To achieve this, we need to add additional addins to capture those details because browsers do not support by default. Then there are several parameters which are changing the values dynamically. For one request it will be one value and for other request there will be other value, so that part needs to be captured. And it is very hard to know it, if you don’t see each and every request while building the scripts.” – Verification specialist.

The other challenges are related to technology expertise; some of those identified are provided below. One challenge is that it is difficult to learn the scripts and commands of other technologies. One interviewee stated: “Unix commands are needed in order to use some tools so it is difficult for me and also the database scripting is difficult for me but might be other guys able to do it.” – Senior solution integrator. Regarding simulator development, one interviewee stated: “Sometimes developing a simulator could be something challenging, because if I talk about myself I am not very good in java and most of the simulators are being built in java. So of course we need developers help sometimes for simulated environment.” – Senior test engineer. A further challenge relates to testability and test automation, which may arise during testing. One interviewee stated: “Only at the code level that we face challenges but not as far as I like, yes there are certain areas which are not testable or which are to be manually testable.” – Test quality architect. A challenge related to reliability may be due to developer faults; one interviewee stated: “If there is some issue with reliability, then the developer might have forgot to put a condition check. For example if the users are limited to 1000, then what if 1001 user arrives so validation check of user more than 1000 need to be there. Where developers are missing these logics while developing the code” – Test specialist.

4.3.2.2 Metrics
Two out of 12 interviewees mentioned challenges related to metrics; some of these are provided below. One interviewee stated: “metrics might cause to rerun the tests and also change in benchmarks may be a huge challenge as all the tests need to be rerun again to check the quality of the application. Again re-testing the whole process is taking so much time.” – Verification specialist. Another interviewee stated: “Suppose a maximum number of users for an application is provided, the tests are run by generating load. If there is any change in the number of users, then the load need to be regenerate for all the users. In this case, understanding the breakpoint is a challenge.” – Software tester.
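The breakpoint the second interviewee refers to can be approximated by stepping the load upwards until an agreed response-time or error-rate limit is violated. The sketch below illustrates this with hypothetical limits; run_load_test is a stand-in for driving an actual tool run at the given user count and here only returns synthetic numbers so the sketch runs.

```python
# Sketch of locating the breakpoint: step up the load until the agreed
# response-time or error-rate limit is exceeded. run_load_test is a
# placeholder for a real load-test run at the given number of users.
RESPONSE_TIME_LIMIT_MS = 2000      # hypothetical service-level limits
ERROR_RATE_LIMIT = 0.01

def run_load_test(users):
    # Synthetic measurements standing in for a real tool run.
    return {"p95_ms": 300 + users * 0.5,
            "error_rate": 0.0 if users < 3000 else 0.05}

def find_breakpoint(start=100, step=100, max_users=5000):
    users = start
    while users <= max_users:
        result = run_load_test(users)
        if (result["p95_ms"] > RESPONSE_TIME_LIMIT_MS
                or result["error_rate"] > ERROR_RATE_LIMIT):
            return users               # first load level that violates the limits
        users += step
    return None                        # no breakpoint within the tested range

print(f"approximate breakpoint: {find_breakpoint()} concurrent users")
```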

4.3.2.3 Network
Three out of 12 interviewees mentioned challenges related to the network. The major challenge faced by the software testers is network loss, which prolongs the tester’s work. One interviewee stated: “In case of performance testing, challenges are related to network issues, suppose I have to test a game, so it depends on many factors of that internal network or internet. Some times a game can work easily even on an old system with an old line of 512 kbps if the network loss is not too much. If you are getting 100% network, it will work. Same game may not work on 10 mbps line because 75% will be network loss if there are network issues.” – Test specialist.

4.3.2.4 Time
Five out of 12 interviewees mentioned challenges related to time. A major challenge we observed is that the time provided for testing is not sufficient for the software testers. The challenges provided by the interviewees are as follows. One interviewee stated: “Development team may have chance to complete their work late but whereas the testers team need to complete their job within the given time. Sometimes we cannot say certain amount of time is sufficient for testing there may be issues which we don’t know.” – Test quality architect. Another interviewee stated: “Time for testing is never sufficient, although you plan a lot of things, like you plan that two weeks for development work will finish and one week for testing, the real thing is development took 2 to 2.5 week and sometimes only 1 day available for testing which is not sufficient.” – Test specialist. Some interviewees mentioned that they have become used to the small amount of time available for testing. It is still a challenge that needs to be addressed, and it can be mitigated only if the developers finish their work on time.

4.3.2.5 Tool
Seven out of 12 interviewees mentioned that they face challenges related to tools. The data extracted from these seven interviewees yield three different challenges, provided below. The first challenges concern the lack of functionality provided by tools and the lack of knowledge in scripting. One interviewee stated: “So in JMeter it is not easy to simulate another system that is the main problem. Now If I need to use JMeter, I have to modify it in such way that I need to first simulate the required system or I need to use the simulator.” The same interviewee further stated: “if something like this arises, then because there are proprietary tools like internal tools in Ericsson we can update it but updating open source tools are not in our hand.” – Test engineer. Another interviewee stated: “The JMeter tool fails when you hit simultaneously thousand users or two thousand users, thread locks created may respond to the request but other request may be in deadlock state.” – Senior solution integrator. The next challenge relates to system configuration; one interviewee stated: “Tools like LoadRunner are quite heavy tools like if we are normally running on Pentium 2 or 4GB RAM then it is quite hard for the tool to run on these.” – Test specialist.

4.3.2.6 Summary
The interviews were conducted both at the case company and at other organisations; we find that the challenges identified at the other organisations largely overlap with the challenges identified at the case company. The challenges obtained from the case company and the other organisations are provided in Appendix G.

4.3.2.7 Documents
The challenges analysed from the previous project test reports are provided below. The thematic map for these challenges is provided in figure 4.13.

Figure 4.13: Thematic map for challenges from documents

The challenges obtained from the documents are classified into one scheme, as shown in figure 4.13: metrics, with three identified challenges. One challenge concerns the slow response of the server, which is caused by large amounts of data being requested. The solution to this challenge is to group sessions, i.e. to issue fewer requests with smaller amounts of data at a time.

Another challenge is related to the limited number of user connections; the application has to wait for some time until all the required data files are loaded. This is mitigated by loading the files once and caching them, so that repeated use does not have to wait for them. A further challenge during testing is the large amount of time needed to determine whether a test case passes, as test data has to be exchanged between client and server. To mitigate this challenge, the case company used a temporary solution, namely installing a local database.

4.3.3 Do mitigations available in the literature mitigate challenges in practice?
This section answers RQ3.3. First, the mitigations available for the challenges found in the SMS were identified. Then the challenges from practice were identified through the interviews and documents. We analysed the data by comparing the challenges identified from the state of the art and the state of practice and noticed that no proper mitigation strategies are available for the challenges identified in practice.

4.4 Facet 4: Important attribute among PSR

To answer research question RQ4, we opted for interviews as the source of data collection.

4.4.1 Interviews
The data obtained from the interviews are used to answer research question RQ4. The data collected from the interviews were analysed to identify the most important attribute among PSR in web testing. Based on the collected data, we formed three themes: all are important, application based and priority order based. The thematic map for the most important attribute among PSR is presented in figure 4.14. All interviewees answered this question. The all-are-important theme contains data coded from three interviewees, the application-based theme from six interviewees and the priority-order theme from three interviewees.

Figure 4.14: Thematic map for important attribute from interviews

4.4.1.1 All are important
Three out of 12 interviewees stated that all three attributes are equally important: regardless of the application, all three PSR attributes need to be tested in order to remain competitive in the market. By testing all three PSR attributes, the chances of the product failing in the live environment are reduced. One interviewee stated: “I think all of them are interdependent and each one has its own importance and all the three should go in hand and hand. Because if you take performance is good and if errors keep coming like number of users logged in are more. In this case, speed is good but if errors keep forming up then the reliability is not there in that. If it is the case, then it is a problem. If scalability is more like number of user’s it can accommodate, if more users it can accommodate then the speed will be low. It will not be good suppose if we load a page it takes 10-15 seconds then users usually get bored and they get irritated first of all. So it is a compromise of all those, I think opting should be there and three of them should be at the optimum level” – Software tester.

4.4.1.2 Application based
Six out of 12 interviewees stated that the importance of an attribute depends on the application being tested. For example, it is not always necessary to test scalability for a website with a small number of users, so the answer depends on the application type. In the words of one interviewee: “Actually it depends on application, normal application mainly goes with performance no need of scalability. Suppose if a web application has n number of users as a requirement then the application need to be delivered with a capability of n+1 users. For banking application reliability is main aspect then performance, they already have certain number of known users. There will be no sudden raise in the number of users. So you need not to go with performance testing always, whereas the reliability is most important as the application need to provide the service to all the users without any fail. For a simple web application, performance is most important and for telecom domain, scalability is very important. So it depends on application.” – Test specialist. Another interviewee stated that “the selection of attribute mainly depends on application type and also on the requirements provided by customer and market-driven.” – Verification specialist.

4.4.1.3 Priority order
Three out of 12 interviewees prioritised the attributes, i.e. ranked their importance. Analysing the data, we found that reliability and performance alternate between first and second place, while scalability is always ranked last. Of these three interviewees, two placed reliability first and one placed it second, whereas two placed performance second and one placed it first; all three placed scalability last. One interviewee stated: “Reliability is most important attribute as it is the basic thing system should be able to do, if it is not doing that then I think it is a fail complete fail. Followed by performance and then scalability.” – Senior software engineer.

4.4.1.4 Summary
The interviews were conducted both at the case company and at other organisations; we find that the most important attribute varies with each interviewee, regardless of whether they work at the case company or at another organisation, so the answer to this particular question reflects each interviewee’s perspective. The important attributes mentioned by the interviewees of the case company and the other companies are provided in Appendix G.

Chapter 5 Discussion

This chapter discusses the findings presented in chapter 4, relating them to the results obtained from all the sources, i.e. the systematic mapping study, the interviews and the documents. The structure of this chapter is as follows:

• Section 5.1 discusses the metrics existing for the PSR attributes.

• Section 5.2 discusses the tools existing for the PSR attributes.

• Section 5.3 discusses the challenges related to the PSR attributes.

• Section 5.4 discusses the most important attribute among the PSR attributes.

• Section 5.5 discusses the implications.

5.1 Metrics for testing PSR attributes of web applications

The metrics used for testing the PSR attributes of web applications were collected from the systematic mapping study, the interviews and the documents; the identified metrics are provided in section 4.1 of the results. From the available list of metrics, we observed that response time and throughput are the most commonly mentioned metrics across all data sources. The overlap and differences in metrics between the three data sources are provided in figure 5.1, which shows the metrics that overlap among all data sources as well as the remaining metrics identified in each individual source. The ellipses in the figure represent the data sources, the rectangles represent the remaining metrics identified from each data source, the rounded rectangle represents the metrics that overlap among all data sources, and the connections between them are represented by arrows. For example, the metrics collected from the SMS are obtained by combining the overlapping metrics (in the rounded rectangle) with the metrics identified only from the SMS (in the rectangle).


Figure 5.1: Overlap and differences in metrics among all data sources

In our systematic mapping study we identified response time and throughput as the most commonly mentioned metrics in every article that focused on metrics. Almost all interviewees also addressed these two metrics during the interview process, and most of the documents related to performance and scalability focused on them as well. Observing the data extracted from all the different data sources, we therefore conclude that response time and throughput are the most commonly used metrics. According to the authors of [7, 61, 65, 33], the response time and throughput metrics are commonly tested in both performance and scalability testing, which supports our results. Through the interviews we also learned about some metrics that are not specifically mentioned in the SMS, such as ramp-up time, ramp-down time and rendezvous point; descriptions of these metrics are provided in Appendix C. The ramp-up and ramp-down time metrics relate to manual configuration settings and are mentioned in only one article [7]. From the interviews we observed that MTBF and the number of failures are the most commonly mentioned metrics for reliability, whereas in the systematic mapping study we observed 13 metrics related to reliability, of which MTBF and the number of errors are the most commonly mentioned. According to Subraya and Subrahmanya [4], fault tolerance and recoverability are calculated using MTBF, MTTR and the number of failures. From both the interviews and the SMS we thus noticed that MTBF is a commonly used reliability metric.

We also observed that not all the metrics available in the literature are considered when testing web applications. From the interviews we identified that the selection of metrics for testing an application also depends on other metrics. One interviewee stated that “the selection of performance metrics is interlinked with other performance metrics. Hence, all the interlinked metrics must be considered while testing the web application”. In general, the selection of metrics for testing the PSR attributes is based on criteria such as customer requirements, market requirements, metric dependencies and application type. This observation is further supported by the literature: Jiang et al. [60] state that the selection of metrics is based on the type of application to be tested, and they also specify some metrics that are common to any type of application. According to Xia et al. [36], all performance metrics are interlinked with other performance metrics.

Finally, we observed that the number of articles addressing performance metrics in the SMS is the largest, and in the interviews all interviewees provided the required information about the performance attribute more readily than about scalability and reliability. The collected documents also focused mainly on performance. From this we conclude that considerably more research has been carried out on the performance attribute than on the other two attributes.
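For reference, the reliability metrics discussed above are related by the following standard definitions (our summary, not formulas quoted from [4]):

\[
\mathrm{MTBF} = \frac{\text{total operating time}}{\text{number of failures}}, \qquad
\mathrm{MTTR} = \frac{\text{total repair time}}{\text{number of repairs}}, \qquad
\text{Availability} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
\]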

5.2 Tools for testing PSR attributes of web applications

The tools that exist for testing the PSR attributes of a web application were retrieved from the systematic mapping study, the interviews and the documents; the list of identified tools from all data sources is presented in section 4.2 of the results. From the interviews we observed that Apache JMeter and LoadRunner are the most commonly used tools for testing web applications in practice. In addition, the systematic mapping study showed that most of the literature mentions LoadRunner and Apache JMeter as tools available for testing web applications, and the documents likewise show JMeter as the most commonly mentioned tool. Across all these data sources, we observed JMeter to be the most commonly used tool for testing web applications. The overlap and differences in tools between the data sources are provided in figure 5.2. The findings regarding tools are also supported by the literature: Xia et al. [36] state that Apache JMeter is an open source tool preferred by software testers because it provides extensive functionality even though it is open source.

Figure 5.2: Overlap and differences in tools among all data sources

Figure 5.2 provides information about the tools that overlap among all the data sources as well as the remaining tools identified in each individual source. The ellipses in the figure represent the data sources, the rectangles represent the remaining tools identified from each data source, the rounded rectangle represents the tools that overlap among all data sources, and the connections between them are represented by arrows. For example, the tools collected from the SMS are obtained by combining the overlapping tools (in the rounded rectangle) with the tools identified only from the SMS (in the rectangle). We observed that, after JMeter, LoadRunner is the most commonly addressed tool in both the SMS and the interviews. One interviewee stated that “LoadRunner is the perfect tool for performance testing of web applications”. As LoadRunner is a licensed product, however, the case company does not prefer to use it; one interviewee explained the reason for excluding LoadRunner as “Ericsson prefers to use open source tools or internal tools for testing web applications” – Test specialist.

From the above findings we identified JMeter and LoadRunner as the tools common to the interviews and the SMS. As the interviewees are familiar with these two tools, they also addressed some of their drawbacks: for JMeter, drawbacks related to the number of virtual users and to parameter scripting were mentioned in the interviews. According to Krizanic [12], JMeter has disadvantages related to the setup of virtual users, test case recording and improper terminology, which supports the findings from the interviews. From the SMS and the interviews we identified mainly tools related to performance testing; only a few tools were identified for the scalability and reliability attributes. The identified tools for testing the PSR attributes of web applications are provided in Appendix D. We faced a problem regarding the scalability and reliability attributes, as we did not find many tools related to these two attributes in the SMS, the interviews or the documents. The lack of research and knowledge on these two attributes may be the reason why the interviewees had difficulty answering questions related to scalability and reliability. Some performance tools are also used for testing the scalability of web applications, as scalability mainly focuses on limiting performance failures. In the case of reliability, Markov chain models were identified in the SMS. A Markov model contains all the information about the possible states of the system and the transition paths between them; in reliability analysis, the transitions hold the information about failures and repairs. Using such a model avoids common assumptions about failure rate distributions and is therefore suitable for appropriate reliability analysis of web applications. From the interviews we observed that the case company uses an internal tool, which supports the performance and reliability attributes.
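To make the Markov-model discussion concrete, the sketch below solves the steady-state distribution of a minimal two-state (up/down) reliability model with hypothetical failure and repair rates; models of real web applications contain more states, but the computation is the same. The steady-state probability of the up state is the long-run availability, which for this simple model coincides with MTBF/(MTBF + MTTR).

```python
# Two-state Markov reliability model: the system fails at rate lam and is
# repaired at rate mu. Solving pi Q = 0 with pi summing to 1 gives the
# steady-state availability, which equals mu / (lam + mu).
import numpy as np

lam = 1 / 400.0   # hypothetical failure rate: one failure per 400 hours
mu = 1 / 2.0      # hypothetical repair rate: two hours to repair

# Generator matrix Q for the states [up, down]
Q = np.array([[-lam, lam],
              [  mu, -mu]])

# Solve pi Q = 0 together with the normalisation constraint sum(pi) = 1.
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(f"steady-state availability: {pi[0]:.5f}")
print(f"closed form mu/(lam+mu):   {mu / (lam + mu):.5f}")
```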

5.3 Challenges in PSR testing of web applications

The challenges in PSR testing of web applications were retrieved from the systematic mapping study, the interviews and the documents; the challenges identified from these data sources are presented in section 4.3. The most frequent challenges identified from the SMS and the interviews are mainly related to tools. Challenges and issues related to the JMeter tool were encountered both in practice and in the SMS; according to Krizanic [12], Apache JMeter has issues related to the number of virtual users, test case recording and improper terminology. We also observed that, to overcome the challenges and drawbacks identified in the JMeter tool, Ericsson has started developing its own tool for non-functional testing.

Challenges related to the development of web applications were also identified from both the SMS and the interviews. These concern faults in code, script-related issues, environment-related issues and the programming skills of the testers. Articles [56, 57] also support these challenges, noting that issues related to scripts and environments are frequently observed in web applications.

Challenges related to time were not identified in the literature, whereas five interviewees mentioned them. One of the major time-related challenges faced by the software testers is that the time available for testing is not sufficient to carry out the testing process. Development methodologies such as agile and kanban are time efficient for projects, yet one interviewee stated that "Many organizations are failing to complete the project on time even though they are using time efficient methodologies in the project”. The main reason for the time-related challenge obtained from the interviews is thus the delay caused in the development phase. This challenge is further supported by the literature: according to Subraya [2], pressure on delivery leads to less time for the testing phase, which results in an improperly tested product.

Another challenge we noticed in both the SMS and the interviews relates to metrics. The number of metric-related challenges identified from the SMS is larger than from the interviews. The common challenge identified from both is the selection of parameters, i.e. metrics. According to Xia et al. [36], all metrics are interlinked, so metrics should be selected carefully by considering the type of web application. We also identified some metric-related challenges faced by the case company, as mentioned in section 4.3.2.2. The challenges identified from the documents belong to the metrics category, and the challenge areas overlapping across all data sources relate to metrics, as shown in figure 5.4. Finally, across all data sources we observed that more challenges relate to the performance attribute than to the other two attributes.

Figure 5.3 provides information about the challenges that overlap among all the data sources as well as the remaining challenges identified in each individual source. The ellipses in the figure represent the data sources, the rectangles represent the remaining challenges identified from each data source, the rounded rectangle represents the challenges that overlap among all data sources, and the connections between them are represented by arrows. Across all data sources, we did not find any individual challenge common to all of them.
Figure 5.4 provides information about the challenge areas that overlap among all the data sources as well as the remaining challenge areas identified in each individual source. The ellipses in the figure represent the data sources, the rectangles represent the remaining challenge areas identified from each data source, the rounded rectangle represents the challenge areas that overlap among all data sources, and the connections between them are represented by arrows. Across all data sources we find that challenges related to metrics are the most common.

Figure 5.3: Overlap and differences in challenges among all data sources

Figure 5.4: Overlap and differences in challenge areas among all data sources

5.4 Most important attribute among PSR

The most important attribute among the PSR attributes was investigated through the interviews. The data obtained from the interviews about the most important attribute are classified into three schemes, as described in section 4.4. The responses vary: the extracted data generally fall into one of three categories, namely all are important, application based and priority order based. Three interviewees stated that all three attributes are important for testing web applications, six stated that the selection of attributes depends on the type of application, and three prioritised the PSR attributes.

From the interviews we also found that the type of web application plays a crucial role in the selection of attributes: depending on the type of web application, different attributes may be selected. This is supported by several articles; the authors of [4, 11, 60] state that the attribute to consider depends on the type of web application to be tested, so the literature supports the responses provided by the interviewees. The main reason interviewees chose the all-are-important category is the importance they attach to quality. The priority-order category ranks the PSR attributes based on the interviewees’ experience, and this ranking varies from one interviewee to another; the variation may be due to differences in experience, lack of knowledge or their view of quality. Among these three interviewees, two highlighted reliability as the most important attribute and one pointed to performance. In our observation, performance is the most mentioned attribute in the literature, whereas from the interviews we identified reliability as the most important attribute within the priority-order theme. Hence, the results obtained from the two sources differ. As practitioners feel that reliability is an important attribute, there is more scope for research related to reliability.

5.5 Implications

The overlap and differences identified for metrics between the state of the art and the state of practice are provided in figure 5.5. The number of metrics is larger in the literature than in practice. The reason is that, in practice, not all metrics are considered while testing a web application; metrics are selected based on the type of application and on other criteria such as customer and market requirements and metric dependencies. The pattern observed in figure 5.5 is that almost all the metrics identified from the state of practice overlap with the state of the art. In addition, a few metrics identified from the state of practice are not available in the state of the art; this difference is due to the in-depth knowledge of the software testers in testing the PSR attributes and to some metrics that are specific to the company's applications.

Figure 5.5: Overlap and differences in metrics between state of art and state of practice

The overlap and differences identified for tools between the state of the art and the state of practice are provided in figure 5.6. The number of tools is larger in the literature than in practice. The reason is that, in practice, not all tools are considered while testing a web application; tools are selected based on the type of application and on company-provided guidelines. The pattern observed in figure 5.6 shows a larger difference between the state of the art and practice, as the tools identified from practice are newly available tools on the market, company-specific tools and tools known to practitioners through experience, whereas the literature does not contain current information about tools; further research is therefore needed to provide data on new tools. The overlapping tools are the tools most commonly used for testing web applications, as they were mentioned by the largest number of articles and interviewees.

Figure 5.6: Overlap and differences in tools between state of art and state of practice

The overlap and differences identified for challenges between the state of the art and the state of practice are provided in figure 5.7. The challenges from practice differ from the state of the art because software testers face different challenges while testing, and new challenges may arise depending on the situation and environment that are not commonly reported in the literature. The overlapping challenges stem from the studied case company's use of the JMeter tool, as most of the challenges identified in practice relate to this tool. In addition, network issues are a common challenge identified in practice; there are no proper mitigations for this challenge, as it mainly depends on the environment and the network strength.

Figures 5.5, 5.6 and 5.7 provide information about the metrics, tools and challenges that overlap between the state of the art and the state of practice, as well as the remaining metrics, tools and challenges identified in each. The ellipses in the figures represent the state of the art and the state of practice, the rectangles represent the remaining metrics, tools and challenges identified in each of them, the rounded rectangles represent the overlapping metrics, tools and challenges, and the connections between them are represented by arrows. For example, the metrics collected from the state of the art are obtained by combining the overlapping metrics (in the rounded rectangle) with the metrics identified from the SMS (in the rectangle).

Figure 5.7: Overlap and differences in challenges between state of art and state of practice

The information collected in this research will help practitioners gain knowledge that was previously not available to them, and it can act as a reference for new practitioners in the future. It also helps researchers to understand the current status of research on the PSR attributes and serves as a reference for carrying out further research in this area.

Chapter 6 Conclusions and Future Work

This chapter focuses on answering the research questions and presents the conclusions and future work.

6.1 Research questions and answers

This section answers the research questions stated in section 3.2. It is divided into four subsections, each answering one research question. In order to validate the results, we conducted eight interviews at the case company and four interviews at other organisations; senior members with extensive testing experience were selected from these three organisations. In addition to the interviews, documents were used to collect information that was missing from or not captured during the interviews; they were also used for data triangulation.

6.1.1 RQ 1: Metrics used for testing the PSR attributes
The answers to the following sub-questions together answer research question 1.

6.1.1.1 RQ1.1 What metrics are suggested in the literature for testing PSR attributes?
To identify the metrics used for testing the PSR attributes of web applications, a systematic mapping study was conducted. The data required for this study were collected from five databases and a total of 97 articles were selected. As this question deals with metrics, the articles addressing metrics were selected to answer it; a total of 80 articles were selected and analysed. After the complete analysis, we identified a total of 69 metrics related to PSR, of which 39 relate to performance, 17 to scalability and the remaining 13 to reliability. The identified metrics are specified in section 4.1.1. Among these metrics, we identified response time and throughput as the most important and most common metrics for both scalability and performance.


6.1.1.2 RQ1.2 What metrics are used by software testers in practice for testing PSR attributes?
To identify the metrics used in practice for testing the PSR attributes of web applications, interviews were conducted and documents, i.e. test reports from previous projects, were collected. A total of 12 interviews and 18 documents were used to gather the data required to answer this question. All 12 interviewees addressed the metrics used in their company for testing the PSR attributes of web applications. After analysing the interview data, a total of 30 metrics were identified, of which 20 relate to performance, eight to scalability and the remaining two to reliability. The identified metrics are specified in section 4.1.2. From the 18 collected documents, a total of 16 metrics were identified, of which nine relate to performance and seven to scalability; we did not observe any metrics related to reliability. The metrics collected from the documents are listed in section 4.1.2.5. We also found that the metrics collected from the documents overlap with the metrics collected from the interviews. The results from the documents and interviews together answer this question.

6.1.1.3 RQ1.3 Why are particular metrics used or not used by software testers?
We selected interviews as the means of answering this question. A total of 12 interviews were conducted, and all interviewees gave essentially the same reason. After analysing the interview data, the main reason we found is that the selection of metrics is not fixed; it varies depending on the type of web application, customer requirements, market requirements and metric dependencies.

6.1.2 RQ 2: Tools used for testing the PSR attributes

The answer to this research question is obtained by answering the research questions below. This research question focuses on identifying the tools used for testing the PSR attributes of web applications. All the tools identified from the systematic mapping study, interviews and documents are combined and presented in Appendix D, together with parameters such as developer, platform support, availability, testing attribute, resource URL, tool type, source and programming language for each tool.

6.1.2.1 RQ2.1 What tools are suggested in the literature for testing PSR attributes?

A systematic mapping study was used to answer this question, with data gathered from five different databases. Of the 97 selected articles, 76 address tools used for testing the PSR attributes. After analysing these 76 articles, a total of 54 tools were obtained: 46 relate to performance, 23 to scalability and five to reliability. We observed that JMeter and LoadRunner are the most commonly used tools for performance testing. All the identified tools are listed in Table 4.5 in Section 4.2.1.
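For illustration only, the sketch below shows one typical way such a tool is driven in an automated test run; it assumes JMeter is installed and available on the PATH, and the test plan and results file names are hypothetical placeholders rather than artefacts from the reviewed studies:

import subprocess

# Run an existing JMeter test plan in non-GUI mode (-n) and write the raw
# samples to a .jtl results file; response time and throughput can then be
# derived from that file. The file names below are placeholders.
subprocess.run(
    ["jmeter", "-n", "-t", "load_test.jmx", "-l", "results.jtl"],
    check=True,
)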

6.1.2.2 RQ2.2 What tools are used by the software testers in practice for testing PSR attributes?

This research question identifies the tools currently used by software testers in practice for testing the PSR attributes of web applications. To answer it, we used 12 interviews and 18 documents. The data collected from the 12 interviews contains a total of 18 tools, which are specified in Section 4.2.2. From these interviews we noticed that JMeter and LoadRunner were mentioned by most of the interviewees. We also identified the use of internal tools for testing web applications at the case company, as well as some new tools that were not observed in the mapping study. From the 18 documents, a total of four tools were identified; we did not come across any tools used for scalability or reliability in these documents. The tools identified from the documents are addressed in Section 4.2.2.9.

6.1.2.3 RQ2.3 What are the drawbacks of the tools used by software testers in practice and improvements suggested by them?

The current drawbacks of existing tools cannot be reliably identified from the literature alone, since drawbacks mentioned there may already have been resolved and we cannot tell whether this is the case. Therefore, to find the drawbacks that still exist, it is more appropriate to gather this information from the software testers who use the tools for testing the PSR attributes. For this purpose, we used the 12 interviews and 18 documents. Of the 12 interviewees, seven specified drawbacks of the existing tools. The drawbacks they mentioned mainly concern LoadRunner and JMeter, since these are the most commonly used performance testing tools according to the data obtained for RQ 2.2. The drawbacks reported by the interviewees are provided in Section 4.2.3.

In order to overcome these drawbacks, the case company developed an internal tool that provides the required functionality of JMeter together with some additional functionality. We also noticed that the reported drawbacks could be improved by addressing the limitations existing in the tools. From the 18 documents we did not observe any drawbacks related to the tools.

6.1.3 RQ 3: Challenges identified while testing the PSR attributes

This research question concentrates on identifying the challenges faced by software testers while testing the PSR attributes of web applications. It is answered through the questions below, whose answers were obtained from different sources: the literature on the one hand, and the interviews and documents on the other.

6.1.3.1 RQ3.1 What are the challenges faced by software testers and what are the mitigation strategies available in literature for testing PSR attributes?

The answer to this question was obtained from the systematic mapping study. A total of 97 articles were selected from five different databases to gather data on the challenges faced while testing the PSR attributes. We identified 18 challenges among the 33 articles that address challenges, and divided them into four categories: the user category contains three challenges, the metric category six, the tool category five and the development category four. All the identified challenges are mentioned in Section 4.3.1. The main challenges identified are simulating real user behaviour, tools being unable to support the required number of virtual users, network delay, identifying suitable metrics for testing, and handling a large number of scenarios.
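As an illustration of the first of these challenges, the minimal sketch below emulates a single virtual user whose think time between page requests is drawn from an exponential distribution; the target URL, the mean think time and the number of page views are assumptions made purely for the example and do not originate from the reviewed studies:

import random
import time
import urllib.request

# Emulate one virtual user: fetch a page, then pause for a randomly drawn
# "think time" before the next request, so that the generated load resembles
# a human visitor rather than a tight request loop.
URL = "http://example.com/"      # hypothetical application under test
MEAN_THINK_TIME_S = 5.0          # assumed average pause between user actions

for _ in range(10):              # ten page views for this virtual user
    start = time.perf_counter()
    with urllib.request.urlopen(URL) as response:
        response.read()
    print(f"response time: {time.perf_counter() - start:.3f} s")
    time.sleep(random.expovariate(1.0 / MEAN_THINK_TIME_S))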

6.1.3.2 RQ3.2 What are the challenges faced by software testers in practice while testing PSR attributes?

To answer this question, we conducted 12 interviews. All 12 interviewees described the challenges they faced during testing of the PSR attributes of web applications, based on their experience from previous and current projects. After analysing the interview results, a total of 13 challenges were identified; they are listed in Section 4.3.2. The analysis shows that the main challenges concern the scripts used for testing, limitations in the tools and issues related to development, as these challenges were specified in the majority of the interviews. In addition, a total of three challenges were identified from the documents; they are presented in Section 4.3.2.6.

6.1.3.3 RQ3.3 Can the existing measures from the literature solve the challenges faced by software testers in practice?

After analysing all the data collected from the literature and the interviews, we noticed that both sources point to some similar challenges. The mitigations identified from the literature are provided in Section 4.3.1. However, we found no proper mitigation strategies in the literature for the challenges identified in practice. Although some authors propose strategies and models, these are not validated empirically and do not constitute proper measures for these challenges. Further research is needed to address the challenges identified from the interviews.

6.1.4 RQ 4: Important attribute among PSR

Quality attributes are very important for staying competitive in the market, but because of factors such as early delivery and time pressure, web applications are often deployed without proper testing. With this question, we aimed to find the most important attribute to consider for testing, so that under time pressure a software tester can test the most important attribute first and exclude the remaining ones. To answer it, we conducted 12 interviews, and all 12 interviewees responded. After analysing their answers, we categorised them into three themes: all attributes are important, application based, and priority order. From these categories we noticed that the priority of the attributes varies and is not fixed. From our observations through the SMS and the interviews, we conclude that the importance of an attribute depends on the type of web application.

6.2 Conclusion

The study was conducted in order to identify the metrics, tools and challenges that exist while testing the PSR attributes of web applications. By conducting a systematic mapping study and a case study at Ericsson, we were able to accomplish the objectives of the research. In order to obtain the required information, we used three data sources: interviews, the SMS and documents. The documents and interviews were obtained from the case company; the available documents, i.e. test reports of previous projects, were collected for data triangulation. The data collected from all sources were analysed using thematic analysis with the help of the tool NVivo, which was used for coding the data.

Based on the obtained results, the existing metrics were identified from all data sources. We found that the literature mainly focuses on the performance attribute rather than on scalability and reliability. Response time and throughput emerged as the most commonly used and mentioned metrics for both performance and scalability across all sources. For reliability, we observed that MTBF is the metric most commonly mentioned in the interviews and the SMS.

The tools available for testing the PSR attributes were also collected from all data sources. An interesting finding is that, of all the identified tools, JMeter is the most important and most commonly mentioned open-source tool for testing the performance of web applications. From the interviews and the SMS we also identified LoadRunner, a commercial tool, as the second most mentioned after JMeter. These two tools are commonly preferred for performance testing of web applications, whereas for scalability and reliability very few tools were found in either the SMS or the interviews.

The challenges encountered while testing the PSR attributes were identified from the literature, the interviews and the documents. The majority of the challenges identified in this study are related to development, metrics and tools: the development challenges concern scripting issues, the metric challenges concern dependencies between metrics, and the tool challenges are mainly due to test case scenarios and the lack of proper input to the tool. To overcome the challenges posed by JMeter, the case company developed its own tool.

The most important attribute among the PSR attributes was identified from the interviews. We found that the selection of the attribute mainly depends on the type of application being tested and on customer and market requirements. All the analysed data regarding tools, metrics and challenges were used to generate lists that will help software testers gain this knowledge. Hence we conclude that the PSR attributes are essential quality factors which play a major role in deciding the quality of web applications during testing.
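For reference, MTBF is conventionally computed as the accumulated operating time divided by the number of observed failures (a standard definition, not a formula reported by the interviewees):

\[
\text{MTBF} = \frac{\sum_{i} t_{\text{uptime},i}}{n_{\text{failures}}}
\]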

6.3 Research contribution

The study deals with the metrics and tools used by software testers for testing the PSR attributes of web applications, and also identifies the challenges faced by software testers while testing these attributes. As there is a lack of research related to the PSR attributes of web applications, we carried out the present study to contribute to the existing body of knowledge in software engineering.

The study adds information to the existing knowledge regarding the metrics and tools available for testing the PSR attributes of web applications, whereas previous studies have paid little attention to providing such information. The contribution of this study will therefore help practitioners and researchers gain information about the tools and metrics available for testing the PSR attributes of web applications. The identified metrics and tools are provided in Sections 4.1 and 4.2 and listed in Appendices C and D.

The study also identified the challenges faced by software testers while testing the PSR attributes of web applications. The challenges identified from all data sources are provided in Section 4.3 and listed in Appendix E. To the best of our knowledge, there are no previous studies on the challenges faced by software testers while testing the PSR attributes of web applications, and prior studies have paid little attention to the PSR of web applications.

6.4 Future work

The study provides knowledge regarding the metrics, tools and challenges related to PSR testing of web applications. From the obtained results, it is clear that more research has been carried out on the performance attribute of web applications, whereas the research available on the scalability and reliability attributes is very limited. Future research can be carried out in four directions: scalability and reliability testing of web applications; a systematic literature review to consolidate all information related to performance testing of web applications; a study involving a larger number of companies to identify the metrics and tools available for all non-functional attributes; and an investigation of the difficulty level of testing these PSR attributes.

Bibliography

[1] V. Varadharajan. Evaluating the Performance and Scalability of Web Appli- cation Systems. Third International Conference on Information Technology and Applications (ICITA’05), 1:111–114, 2005. [2] B. M. Subraya, S. V. Subrahmanya, J. K. Suresh, and C. Ravi. Pepper: a new model to bridge the gap between user and perceptions. In Computer Software and Applications Conference, 2001. COMPSAC 2001. 25th Annual International, pages 483–488, 2001. [3] Md Safaet Hossain. Performance evaluation web testing for ecommerce web sites. In Informatics, Electronics & Vision (ICIEV), 2012 International Con- ference on, pages 842–846. IEEE, 2012. [4] B. M. Subraya and S. V. Subrahmanya. Object driven performance testing of web applications. In Quality Software, 2000. Proceedings. First Asia-Pacific Conference on, pages 17–26, 2000. [5] Amira Ali and Nagwa Badr. Performance testing as a service for web ap- plications. In 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS), pages 356–361. IEEE, 2015. [6] Hossein Nikfard, Ibrahim. A Comparative Evaluation of approaches for Web Application Testing. International Journal of Soft Computing and Software Engineering [JSCSE], 3(3):333–341, 2013. [7] Elder Rodrigues, Maicon Bernardino, Leandro Costa, Avelino Zorzo, and Flavio Oliveira. PLeTsPerf - A Model-Based Performance Testing Tool. 2015 IEEE 8th International Conference on Software Testing, Verification and Val- idation (ICST), pages 1–8, 2015. [8] M Pinzger and G Kotsis. AWPS - Simulation based automated web perfor- mance analysis and prediction. Proceedings - 7th International Conference on the Quantitative Evaluation of Systems, QEST 2010, (c):191–192, 2010.

[9] Jeff Tian and Li Ma. Web testing for reliability improvement. Advances in Computers, 67:177–224, 2006.


[10] Thanh Nguyen. Using control charts for detecting and understanding per- formance regressions in large software. Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012, pages 491–494, 2012.

[11] Ping Li, Dong Shi, and Jianping Li. Performance test and bottle analysis based on scientific research management platform. 2013 10th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pages 218–221, 2013. [12] J. Križanić, A. Grgurić, M. Mošmondor, and P. Lazarevski. Load testing and performance monitoring tools in use with ajax based web applications. In MIPRO, 2010 Proceedings of the 33rd International Convention, pages 428–434, May 2010.

[13] Lakshmi S. Iyer, Babita Gupta, and Nakul Johri. Performance, scalability and reliability issues in web applications. Industrial Management & Data Systems, 105(5):561–576, June 2005. [14] Giuseppe A. Di Lucca and Anna Rita Fasolino. Testing Web-based appli- cations: The state of the art and future trends. Information and Software Technology, 48(12):1172–1186, 2006. [15] Anna Rita Fasolino, Domenico Amalfitano, and Porfirio Tramontana. Web application testing in fifteen years of WSE. Proceedings of IEEE International Symposium on Web Systems Evolution, WSE, pages 35–38, 2013. [16] Rizal Suffian, Dhiauddin. Performance testing: Analyzing differences of re- sponse time between performance testing tools. In Computer & Information Science (ICCIS), 2012 International Conference on, volume 2, pages 919–923. IEEE, 2012.

[17] Xingen Wang, Bo Zhou, and Wei Li. Model-based load testing of web appli- cations. Journal of the Chinese Institute of Engineers, 36(1):74–86, 2013. [18] H.M.a Aguiar, J.C.a Seco, and L.b Ferrão. Profiling of real-world web ap- plications. In PADTAD 2010 - International Workshop on Parallel and Dis- tributed Systems: Testing, Analysis, and Debugging, pages 59–66, 2010.

[19] Amal Ibrahim. Quality Testing. Signal Processing, pages 1071–1076, 2007. [20] Arora A and Sinha M. Web Application Testing: A Review on Techniques, Tools and State of Art. International Journal of Scientific & Engineering Research, 3(2):1–6, 2012.

[21] Jordi Guitart, Vicenç Beltran, David Carrera, Jordi Torres, and Eduard Ayguadé. Characterizing secure dynamic web applications scalability. Pro- ceedings - 19th IEEE International Parallel and Distributed Processing Sym- posium, IPDPS, 2005. [22] Chia Hung Kao, Chun Cheng Lin, and Juei-Nan Chen. Performance Test- ing Framework for REST-Based Web Applications. 2013 13th International Conference on Quality Software, pages 349–354, 2013. [23] Hamed O. and Kafri N. Performance testing for web based application archi- tectures (.NET vs. Java EE). 2009 1st International Conference on Networked Digital Technologies, NDT 2009, pages 218–224, 2009. [24] M. Kalita, S. Khanikar, and T. Bezboruah. Investigation on performance testing and evaluation of PReWebN: a Java technique for implementing web application. IET Software, 5(5):434, 2011. [25] Kunhua Zhu, Junhui Fu, and Yancui Li. Research the performance test- ing and performance improvement strategy in web application. In 2010 2nd International Conference on Education Technology and Computer, volume 2, pages 328–332, June 2010.

[26] Richard Berntsson Svensson, Tony Gorschek, Björn Regnell, Richard Torkar, Ali Shahrokni, and Robert Feldt. Quality Requirements in Industrial Practice - An Extended Interview Study at Eleven Companies. IEEE Transactions on Software Engineering, 38(4):923–935, 2012. [27] Tasha Hollingsed and David G. Novick. Usability inspection methods after 15 years of research and practice. In Proceedings of the 25th Annual ACM International Conference on Design of Communication, SIGDOC ’07, pages 249–255. ACM, 2007.

[28] Muhammad Junaid Aamir and Awais Mansoor. Testing web application from usability perspective. In Computer, Control & Communication (IC4), 2013 3rd International Conference on, pages 1–7. IEEE, 2013. [29] Connie U Smith and Lloyd G Williams. Building responsive and scalable web applications. In Int. CMG Conference, pages 127–138, 2000. [30] Giovanni Denaro, Andrea Polini, and Wolfgang Emmerich. Early performance testing of distributed software applications. In ACM SIGSOFT Software Engineering Notes, volume 29, pages 94–103. ACM, 2004. [31] Sangeeta Phogat and Kapil Sharma. A statistical view of software reliability and modeling. In Computing for Sustainable Global Development (INDIACom), 2015 2nd International Conference on, pages 1726–1730. IEEE, 2015.

[32] Tanjila Kanij, Robert Merkel, and John Grundy. Performance assessment metrics for software testers. In 2012 5th International Workshop on Co- operative and Human Aspects of Software Engineering, CHASE 2012 - Pro- ceedings, pages 63–65, 2012. [33] Niclas Snellman, Adnan Ashraf, and Ivan Porres. Towards Automatic Per- formance and Scalability Testing of Rich Internet Applications in the Cloud. 2011 37th EUROMICRO Conference on Software Engineering and Advanced Applications, pages 161–169, 2011. [34] Akshay and Nikhil. Thesis project plan. pages 1–15, 2016.

[35] Serdar Doğan, Aysu Betin-Can, and Vahid Garousi. Web application testing: A systematic literature review. Journal of Systems and Software, 91:174–201, 2014.

[36] Xiaokai Xia, Qiuhong Pei, Yongpo Liu, Ji Wu, and Chao Liu. Multi-level logs based web performance evaluation and analysis. ICCASM 2010 - 2010 In- ternational Conference on Computer Application and System Modeling, Pro- ceedings, 4(Iccasm):37–41, 2010. [37] Deepak Dagar and Amit Gupta. Performance testing and evaluation of web applications using wapt pro. International Journal of Innovative Research in Computer and Communication Engineering, 3(7):6965–6975, 2015.

[38] Tyagi Rina. A Comparative Study of Performance Testing Tools. Inter- national Journal of Advanced Research in and Software Engineering, 3(5):1300–1307, 2013. [39] R Manjula and Eswar Anand Sriram. Reliability evaluation of web applica- tions from click-stream data. International Journal of Computer Applications, 9(5):23–29, 2010.

[40] Fei Wang and Wencai Du. A test automation framework based on WEB. Proceedings - 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, ICIS 2012, pages 683–687, 2012. [41] Isha Arora. A Brief Survey on Web Application Performance Testing Tools Literature Review. International Journal of Latest Trends in Engineering and Technology, 5(3):367–375, 2015. [42] Vahid Garousi, Ali Mesbah, Aysu Betin-Can, and Shabnam Mirshokraie. A systematic mapping study of web application testing. Information and Software Technology, 55(8):1374–1396, 2013. BIBLIOGRAPHY 97

[43] Junzan Zhou, Shanping Li, Zhen Zhang, and Zhen Ye. Position paper. Pro- ceedings of the 2013 international workshop on Hot topics in cloud services - HotTopiCS ’13, (April):55, 2013. [44] C. Kallepalli and J. Tian. Usage for statistical Web testing and reliability analysis. Proceedings Seventh International Software Metrics Symposium, pages 148–158, 2001. [45] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2):131–164, 2009.

[46] Kai Petersen, Robert Feldt, Shahid Mujtaba, and Michael Mattsson. System- atic mapping studies in software engineering. In 12th international conference on evaluation and assessment in software engineering, volume 17, pages 1–10. sn, 2008.

[47] Martin N Marshall. Sampling for qualitative research. Family practice, 13(6):522–526, 1996.

[48] Daniela S. Cruzes and Tore Dyba. Recommended Steps for Thematic Syn- thesis in Software Engineering. 2011 International Symposium on Empirical Software Engineering and Measurement, (7491):275–284, 2011.

[49] V. Braun and V. Clarke. Using thematic analysis in . Qualitative Research in Psychology, 3(May 2015):77–101, 2006. [50] Emilia Mendes, Nile Mosley, and Steve Counsell. Web metrics - estimating design and authoring effort. IEEE Multimedia, 8(1):50–57, 2001. [51] Fredrik Abbors, Tanwir Ahmad, Dragos Truscan, and Ivan Porres. MBPeT : A Model-Based Performance Testing Tool. Fourth International Conference on Advances in System Testing and Validation Lifecycle, (c):1–8, 2012. [52] A Arkles and D Makaroff. MT-WAVE: Profiling multi-tier web applications. In ICPE’11 - Proceedings of the 2nd Joint WOSP/SIPEW International Con- ference on Performance Engineering, pages 247–258, 2011. [53] Wu Gongxin Jinlong Gao, Tiantian. A Reactivity-based Framework of Au- tomated Performance Testing for Web Applications. In 2010 Ninth Interna- tional Symposium on Distributed Computing and Applications to Business, Engineering and Science, pages 593–597. IEEE, 2010. [54] L. Xu, W. Zhang, and L. Chen. Modeling users’ visiting behaviors for web load testing by continuous time markov chain. In Web Information Systems and Applications Conference (WISA), 2010 7th, pages 59–64, Aug 2010. BIBLIOGRAPHY 98

[55] Aida Shojaee, Nafiseh Agheli, and Bahareh Hosseini. Cloud-based load test- ing method for web services with vms management. In 2015 2nd International Conference on Knowledge-Based Engineering and Innovation (KBEI), pages 170–176. IEEE, 2015.

[56] Muhammad Arslan, Usman Qamar, Shoaib Hassan, and Sara Ayub. Au- tomatic performance analysis of cloud based load testing of web-application & its comparison with traditional load testing. In Software Engineering and Service Science (ICSESS), 2015 6th IEEE International Conference on, pages 140–144. IEEE, 2015.

[57] S. Kiran, A. Mohapatra, and R. Swamy. Experiences in performance test- ing of web applications with unified authentication platform using jmeter. In Technology Management and Emerging Technologies (ISTMET), 2015 Inter- national Symposium on, pages 74–78, Aug 2015. [58] Xiuxia Quan and Lu Lu. Session-based performance test case generation for web applications. In Supply Chain Management and Information Systems (SCMIS), 2010 8th International Conference on, pages 1–7. IEEE, 2010. [59] Diwakar Krishnamurthy, Mahnaz Shams, and Behrouz H Far. A model- based performance testing toolset for web applications. Engineering Letters, 18(2):92, 2010.

[60] Guangzhu Jiang and Shujuan Jiang. A quick testing model of web performance based on testing flow and its application. In 2009 Sixth Web Information Systems and Applications Conference, pages 57–61. IEEE, 2009. [61] Junzan Zhou, Bo Zhou, and Shanping Li. LTF: A Model-Based Load Testing Framework for Web Applications. 2014 14th International Conference on Quality Software, pages 154–163, 2014. [62] Christof Lutteroth and Gerald Weber. Modeling a realistic workload for performance testing. Proceedings - 12th IEEE International Enterprise Distributed Object Computing Conference, EDOC 2008, pages 149–158, 2008. [63] Manar Abu Talib, Emilia Mendes, and Adel Khelifi. Towards reliable web applications: Iso 19761. In IECON 2012-38th Annual Conference on IEEE Industrial Electronics Society, pages 3144–3148. IEEE, 2012. [64] Yuta Maezawa, Kazuki Nishiura, Hironori Washizaki, and Shinichi Honiden. Validating ajax applications using a delay-based mutation technique. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering, pages 491–502. ACM, 2014.

[65] FA Torkey, Arabi Keshk, Taher Hamza, and Amal Ibrahim. A new method- ology for web testing. In Information and Technology, 2007. ICICT 2007. ITI 5th International Conference on, pages 77–83. IEEE, 2007. [66] Yunming Pu, Mingna Xu, Pu Yunming, Xu Mingna, and Xu M Pu Y. Load testing for web applications. 2009 1st International Conference on Informa- tion Science and Engineering, ICISE 2009, (1):2954–2957, 2009. [67] R Thirumalai Selvi and N V Balasubramanian. Performance Measurement of Web Applications Using Automated Tools. I:13–16, 2013.

[68] J.W. Cane. Performance of Web applications. IEEE South- eastCon, 2003. Proceedings., 55(5):1599–1605, 2003. [69] Joydeep Mukherjee, Mea Wang, and Diwakar Krishnamurthy. Performance Testing Web Applications on the Cloud. 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops, pages 363–369, 2014.

[70] M. R. Dhote and G. G. Sarate. Performance testing complexity analysis on ajax-based web applications. IEEE Software, 30(6):70–74, Nov 2013. [71] Qinglin Wu and Yan Wang. Performance testing and optimization of J2EE-based web applications. 2nd International Workshop on Education Technology and Computer Science, ETCS 2010, 2:681–683, 2010. [72] Supriya Gupta and Lalitsen Sharma. Performance analysis of internal vs. external security mechanism in web applications. Int. J. Advan. Network Applic, 1(05):314–317, 2010. [73] R Thirumalai Selvi, Sudha, N V Balasubramanian, and E N G Ia. Performance analysis of proprietary and non-. Imecs 2008: International Multiconference of Engineers and Computer Scientists, Vols I and Ii, I:982–984, 2008. [74] Harry M. Sneed and Shihong Huang. WSDLTest - A tool for testing web services. Proceedings of the Eighth IEEE International Symposium on Web Site Evolution, WSE 2006, pages 14–21, 2006. [75] Jianfeng Yang, Rui Wang, Zhouhui Deng, and Wensheng Hu. Web software reliability analysis with Yamada exponential testing-effort. ICRMS’2011 - Safety First, Reliability Primary: Proceedings of 2011 9th International Conference on Reliability, and Safety, pages 760–765, 2011. [76] G. Ruffo, R. Schifanella, M. Sereno, and R. Politi. Walty: a tool for evaluating web application performance. In Quantitative Evaluation of Systems,

2004. QEST 2004. Proceedings. First International Conference on the, pages 332–333, Sept 2004.

[77] Filippo Ricca and Paolo Tonella. Testing processes of web applications. Annals of Software Engineering, 14(1):93–114, 2002. [78] K. I. Pun and Y. W. Si. Audit trail analysis for traffic intensive web ap- plication. In e-Business Engineering, 2009. ICEBE ’09. IEEE International Conference on, pages 577–582, Oct 2009. [79] Filippo Ricca. Analysis, testing and re-structuring of Web applications. IEEE International Conference on Software Maintenance, ICSM, pages 474– 478, 2004.

[80] Arlitt Martin Hashemian, Krishnamurthy. Overcoming bench- marking challenges in the multi-core era. Proceedings - IEEE 5th Interna- tional Conference on Software Testing, Verification and Validation, ICST 2012, pages 648–653, 2012. [81] Mehul Nalin Vora. A Nonintrusive Approach to Estimate Web Server Re- sponse Time. International Journal of Computer and Electrical Engineering, 5(1):93–97, 2013.

[82] R. Aganwal, B. Ghosh, S. Banerjee, and S. Kishore Pal. Ensuring website quality: a case study. In Management of Innovation and Technology, 2000. ICMIT 2000. Proceedings of the 2000 IEEE International Conference on, vol- ume 2, pages 664–670, 2000.

[83] J Zinke, J Habenschuß, and B Schnor. Servload: Generating representative workloads for . Simulation Series, 44(BOOK 12):82– 89, 2012.

[84] S. Mungekar and D. Toradmalle. W taas: An architecture of website analysis in a cloud environment. In Next Generation Computing Technologies (NGCT), 2015 1st International Conference on, pages 21–24, Sept 2015. [85] A. Keshk and A. Ibrahim. Ensuring the Quality Testing of Web Using a New Methodology. 2007 IEEE International Symposium on Signal Processing and Information Technology, 2007. [86] Hidam Kumarjit Singh and Tulshi Bezboruah. Performance metrics of a customized web application developed for monitoring sensor data. In 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), pages 157–162. IEEE, jul 2015.

[87] Hugo Saba, Eduardo Manuel, De Freitas Jorge, and Victor Franco Costa. Webteste: a Stress Test Tool. Proceedings of WEBIST 2006 - Second In- ternational Conference on Web Information Systems and Technologies, pages 246–249, 2006.

[88] I.a Jugo, D.b Kermek, and A.a Meštrović. Analysis and evaluation of web application performance enhancement techniques. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lec- ture Notes in Bioinformatics), 8541:40–56, 2014.

[89] Wen-Kui Chang and Shing-Kai Hon. Evaluating the Performance of a Web Site via Queuing Theory, pages 63–72. Springer Berlin Heidelberg, 2002. [90] R. Srinivasa Perumal and P. Dhavachelvan. Performance Analysis of Dis- tributed Web Application: A Key to High Perform Computing Perspective. In 2008 First International Conference on Emerging Trends in Engineering and Technology, pages 1140–1145. IEEE, 2008. [91] Zhang Huachuan, Xu Jing, and Tian Jie. Research on the parallel algo- rithm for self-similar network traffic simulation. Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009, pages 355–359, 2009. [92] C. Kallepalli and J. Tian. Usage measurement for statistical Web testing and reliability analysis. Proceedings Seventh International Software Metrics Symposium, 27(11):148–158, 2001. [93] Kaiyu Wang and Naishuo Tian. Performance evaluation of J2EE web appli- cations with queueing networks. Proceedings - 2009 International Conference on Information Technology and Computer Science, ITCS 2009, 1:437–440, 2009.

[94] Rizal Suffian, Dhiauddin. Performance testing: Analyzing differences of re- sponse time between performance testing tools. In Computer & Information Science (ICCIS), 2012 International Conference on, volume 2, pages 919–923. IEEE, 2012.

[95] Manish Rajendra Dhote and GG Sarate. Performance testing complexity analysis on ajax-based web applications. Software, IEEE, 30(6):70–74, 2013. [96] Reza NasiriGerdeh, Negin Hosseini, Keyvan RahimiZadeh, and Morteza AnaLoui. Performance analysis of web application in xen-based virtualized environment. In Computer and Knowledge Engineering (ICCKE), 2015 5th International Conference on, pages 256–261. IEEE, 2015.

[97] Vipul Mathur, Preetam Patil, Varsha Apte, and Kannan M Moudgalya. Adaptive admission control for web applications with variable capacity. In Quality of Service, 2009. IWQoS. 17th International Workshop on, pages 1– 5. IEEE, 2009.

[98] Ana Cavalli, Stephane Maag, and Gerardo Morales. Regression and perfor- mance testing of an e-learning web application: Dotlrn. Proceedings - Interna- tional Conference on Signal Image Technologies and Internet Based Systems, SITIS 2007, pages 369–376, 2007. [99] John W Cane. Measuring performance of web applications: empirical tech- niques and results. In SoutheastCon, 2004. Proceedings. IEEE, pages 261–270. IEEE, 2004.

[100] Elhadi Shakshuki, Chao Chen, Yihai Chen, Huaikou Miao, and Hao Wang. Usage-pattern based statistical web testing and reliability measurement. Pro- cedia Computer Science, 21:140 – 147, 2013. [101] Shyaamini B and Senthilkumar M. A novel approach for performance test- ing on web application services. volume 10, pages 38679–38683. Research India Publications, 2015.

[102] Kai Lei, Yining Ma, and Zhi Tan. Performance comparison and evaluation of web development technologies in php, python, and node. js. In Computa- tional Science and Engineering (CSE), 2014 IEEE 17th International Con- ference on, pages 661–668. IEEE, 2014. [103] Rigzin Angmo and Mukesh Sharma. Performance evaluation of web based automation testing tools. In Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference-, pages 731–735. IEEE, 2014.

[104] Martti Vasar, Satish Narayana Srirama, and Marlon Dumas. Framework for monitoring and testing web application scalability on the cloud. In Proceedings of the WICSA/ECSA 2012 Companion Volume, WICSA/ECSA ’12, pages 53–60. ACM, 2012.

[105] Izzat Alsmadi, Ahmad T. Al-Taani, and Nahed Abu Zaid. Web structural metrics evaluation. Proceedings - 3rd International Conference on Developments in eSystems Engineering, DeSE 2010, pages 225–230, 2010. [106] Thirumalai Selvi, N. V. Balasubramanian, and P. Sheik Abdul Khader. Quantitative evaluation of frameworks for web applications. International Journal of Computer, Electrical, Automation, Control and Information Engineering, 4(4):708 – 713, 2010.

[107] Breno Lisi Romano, Gláucia Braga E Silva, Henrique Fernandes De Cam- pos, Ricardo Godoi Vieira, Adilson Marques Da Cunha, Fábio Fagundes Silveira, and Alexandre Carlos Brandão Ramos. Software testing for web- applications non-functional requirements. ITNG 2009 - 6th International Conference on Information Technology: New Generations, pages 1674–1675, 2009.

[108] Elder M. Rodrigues, Rodrigo S. Saad, Flavio M. Oliveira, Leandro T. Costa, Maicon Bernardino, and Avelino F. Zorzo. Evaluating capture and replay and model-based performance testing tools: An empirical comparison. In Proceed- ings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’14, pages 9:1–9:8. ACM, 2014. [109] Minzhi Yan, Hailong Sun, Xu Wang, and Xudong Liu. Building a taas platform for web service load testing. In Cluster Computing (CLUSTER), 2012 IEEE International Conference on, pages 576–579. IEEE, 2012. [110] Xingen Wang, Bo Zhou, and Wei Li. Model based load testing of web ap- plications. Proceedings - International Symposium on Parallel and Distributed Processing with Applications, ISPA 2010, pages 483–490, 2010. [111] Sara Sprenkle, Holly Esquivel, Barbara Hazelwood, and Lori Pollock. We- bVizOR: A tool for applying automated oracles and analyzing test results of web applications. Proceedings - Testing: Academic and Indus- trial Conference Practice and Research Techniques, TAIC PART 2008, pages 89–93, 2008.

[112] Marc Guillemot and Dierk König. Web testing made easy. In Compan- ion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications, OOPSLA ’06, pages 692–693. ACM, 2006.

[113] Harry M Sneed and Shihong Huang. The design and use of wsdl-test: a tool for testing web services. Journal of Software Maintenance and Evolution: Research and Practice, 19(5):297–314, 2007. [114] Yasuyuki Fujita, Masayuki Murata, and Hideo Miyahara. Performance modeling and evaluation of web server systems. Electronics and Communica- tions in Japan(Part II Electronics), 83(12):12–23, 2000. [115] Jianhua Hao and Emilia Mendes. Usage-based statistical testing of web ap- plications. Proceedings of the 6th international conference on Web engineering, pages 17–24, 2006.

[116] Sebastian Lehrig, Hendrik Eikerling, and Steffen Becker. Scalability, Elasticity, and Efficiency in Cloud Computing: A Systematic Literature Review

of Definitions and Metrics. Proceedings of the 11th International ACM SIG- SOFT Conference on Quality of Software Architectures, (MAY):83–92, 2015. [117] Samer Al-Zain, Derar Eleyan, and Joy Garfield. Automated user interface testing for web applications and TestComplete. Proceedings of the CUBE International Information Technology Conference on - CUBE ’12, pages 350– 354, 2012.

[118] Tahani Hussain. An Approach to Evaluate the Performance of Web Ap- plication Systems. Proceedings of International Conference on Information Integration and Web-based Applications & Services - IIWAS ’13, pages 692– 696, 2013.

[119] Rubén Casado, Javier Tuya, and Muhammad Younas. Testing the reliability of web services transactions in cooperative applications. Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC ’12, page 743, 2012.

Appendices

Appendix A Systematic maps

Figure A.1: Research parameters vs research attributes in SMS


Figure A.2: Research methods vs research attributes in SMS

Figure A.3: Research methods vs research parameters in SMS

Appendix B SMS overview

Table B.1: SMS overview Author Name Quality At- Facet 1: Facet 2: Facet 3: tribute’s Metrics Tools Challenges J. Križani , A. Gr- Performance Provided Provided Provided guri , M. Mošmon- dor, P. Lazarevski [12] Performance Fei Wang, Wencai Not provided Provided Not provided Scalability Du [40] Chia Hung Kao, Performance Provided Provided Provided Chun Cheng Lin, Juei-Nan Chen [22] Pu Yunming, Xu Performance Provided Provided Not provided Mingna [66] Junzan Zhou, Bo Performance Provided Provided Provided Zhou, Shanping Li [61] R. Thirumalai Performance Provided Provided Not provided Selvi, N. V. Bal- asubramanian [67] John W. Cane [68] Performance Provided Not provided Not provided Performance, Osama Hamed, Reliability, Provided Provided Not provided Nedal Kafri [23] Scalability


Elder M. Rodrigues, Maicon Bernardino, Leandro T. Costa, Performance Provided Provided Not provided Avelino F. Zorzo, Flávio M. Oliveira [7] Joydeep Mukher- Performance Provided Provided Not provided jee, Mea Wang, Diwakar Krishna- murthy [69] Manish Rajendra Performance Not provided Provided Not provided Dhote, G.G. Sarate [70] Amira Ali, Nagwa Performance, Provided Provided Not provided Badr [5] Reliability Qinglin Wu, Yan Performance, Provided Provided Not provided Wang [71] Scalability Muhammad Dhi- Performance Provided Not provided Not provided auddin Mohamed Suffiani, Fairul Rizal Fahrurazi [16] Ping Li, Dong Shi, Performance Provided Provided Provided Jianping Li [11] Supriya Gupta, Performance Provided Provided Not provided Lalitsen Sharma [72] R. Thirumalai Performance Provided Provided Not provided Selvi, Sudha, N. V. Balasubramanian [73] Harry M. Sneed, Performance, Not provided Provided Not provided Shihong Huang [74] Reliability Thanh H. D. Performance Provided Not provided Not provided Nguyen [10] Zao-Bin GAN, Performance, Provided Provided Not provided Deng-Wen WEI, Scalability Vijay Varadhara- jan [1] Appendix B. SMS overview 110

Jianfeng Yang, Reliability Provided Not provided Not provided Zhouhui Deng, Rui Wang, Wensheng Hu [75] G. Ruffo, R. Schi- Performance, Provided Provided Not provided fanella, and M. Scalability Sereno [76] F. A Torkey, Arabi Performance, Provided Provided Provided Keshk, Taher Reliabilty Hamza, Amal Ibrahim [65] F. Ricca, P. Tonella Performance, Provided Provided Not provided [77] Reliabilty Performance, Jordi Guitart, Reliability, Provided Provided Provided Vicenç Beltran, Scalability David Carrera, Jordi Torres and Eduard Ayguadé [21] Performance, Tiantian Gao, Yu- Scalability, Provided Provided Provided jia Ge, Gongxin Reliability Wu, and Jinlong Ni [53] Ka-I Pun, Yain- Performance Provided Provided Provided Whar Si [78] Filippo Ricca [79] Reliability Provided Not provided Not provided P. Nikfard, S. bin Performance, Provided Not provided provided Ibrahim, M. Hos- Reliability sein [6] Diwakar Krishna- Performance Provided Provided Not provided murthy, Mahnaz Shams, Behrouz H. Far [80] Mehul Nalin Vora Performance, Provided Provided Not provided [81] Reliabilty Guangzhu Jiang, Performance, Provided Provided Provided Shujuan Jiang [60] Reliabilty Appendix B. SMS overview 111

Performance, Lakshmi S.Iyer, B. Scalability, Not provided Not provided Not provided Gupta, N. Johri Reliability [13] Niclas Snellman , Performance, Provided Provided Not provided Adnan Ashraf yz, Scalability Ivan Porres [33] Scalability, R. Aganwal, B. Reliability, Provided Provided Not provided Ghosh, S. Baner- Performance jee, S. Kishore Pal [82] Sandhya Kiran, Performance, Provided Provided Provided Akshyansu Mohap- Scalability atra, Rajashekara Swamy [57] scalability Jörg Zinke, Jan Provided Provided Not provided Performance Habenschuss, Bettina Schnor [83] M. Arslan, U. Qa- Performance Provided Provided provided mar, S. Hassan, S. Ayub [56] Performance, B.M. Subraya, S.V. Scalability, Provided Provided Not provided Subrahmanya, J.K. Reliabilty Suresh, C. Ravi [2] Christof Lutteroth, Performance Provided Provided Provided Gerald Weber [62] Kunhua Zhu, Jun- Performance, Provided Not provided Not provided hui Fu, Yancui Li Reliability [25] scalability M. Kalita, S. reliability Provided Provided Provided Khanikar, T. performance Bezboruah [24] scalability M. Kalita, T. reliability Provided Provided provided Bezboruah [24] performance Appendix B. SMS overview 112

Shraddha Performance, Not provided Provided Not provided Mungekar, Scalability Dhanashree Torad- malley [84] B.M. Subraya, S.V. Performance Provided Provided Not provided Subrahmanya [4] Arabi Keshk, Amal Performance, Provided Provided Not provided Ibrahim [85] Scalability Performance, Hidam Kumar- Scalability, Provided Provided Not provided jit Singh, Tulshi Reliability Bezboruah [86] Hugo Saba, Ed- Performance, Provided Provided Not provided uardo Manuel de Reliability Freitas Jorge, Vic- tor Franco Costa [87] Igor Jugo, Performance, Provided Provided Not provided Dragutin Kermek, Reliability Ana Mestrovi´c [88] G. Ruffo, R. Schi- Performance Provided Provided Not provided fanella, M. Sereno, R. Politi [76] Wen-Kui Chang, Performance Provided Provided Not provided Shing-Kai Hon [89] R. Srinivasa Peru- Performance, Provided Provided Not provided mal, P. Dhavachel- Reliabilty van [90] Weifeng Zhang, Performance Provided Provided Provided Lianjie Chen, Lei Xu [54] ZHANG Performance, Provided Provided Provided Huachuan, XU Reliabilty Jing, TIAN Jie [91] Chaitanya Performance, Provided Not provided Not provided Kallepalli, Jeff Reliability Tian [92] Md. Safaet Hossain Performance, Provided Not provided Provided [3] Reliability Appendix B. SMS overview 113

Xiaokai Xia, Qi- Performance Provided Not provided Not provided uhong Pei, Yongpo Liu, Ji Wu, Chao Liu [36] Kaiyu Wang, Performance Provided Provided Not provided Naishuo Tian [93] Martin Pinzger, Performance Provided Not provided Not provided Gabriele Kotsis [8] Xiuxia Quan, Lu Performance Not provided Provided Provided Lu [58] Muhammad Dhi- Performance Provided Provided Provided auddin Mohamed Suffiani, Fairul Rizal Fahrurazi [94] Manish Rajendra Performance, Not provided Provided Provided Dhote, G.G. Sarate Scalability [95] Reza Performance Provided Provided Provided NasiriGerdeh’l, Ne- gin Hosseinit, Key- van RahimiZadeh, Morteza AnaLoui [96] Vipul Mathur, Performance Provided Not provided Not provided Preetam Patil, Varsha Apte and Kannan M. Moudgalya [97] A. Shojaee, N. Performance Provided Provided Provided Agheli , B. Hos- seini [55] Ana Cavalli, Performance, Provided Provided Not provided Stephane Maag, Scalability Gerardo Morales [98] John W. Cane [99] Performance Not provided Provided Not provided Chao Chen, Yihai Reliability Provided Provided provided Chena, Huaikou Miao, Hao Wang [100] Appendix B. SMS overview 114

Manar Abu Talib, Reliability Not provided Not provided Provided Emilia Mendes, Adel Khelifi [63] Raoufehsadat Performance, Provided Provided Not provided Hashemian, Di- Scalability wakar Krishna- murthy, Martin Arlitt [80] Mahnaz Shams, Performance Provided Provided provided Diwakar Krishna- murthy , Behrouz Far [59] Ms.B.Shyaamini, Performance Provided Not provided Not provided Dr.M.Senthilkumar [101] Kai Lei1, Yining Performance, Provided Provided Not provided Ma, Zhi Tan [102] Scalability Ms. Rigzin Angmo, Performance Not provided Provided Not provided Mrs. Monika Sharma [103] Fredrik Abbors, Performance Provided Provided Provided Tanwir Ahmad, Dragos¸ Trus¸can, Ivan Porres [51] Martti Vasar, Scalability, Per- Provided Provided Not provided Satish Narayana formance Srirama, Marlon Dumas [104] Izzat Alsmadi, Ah- Performance, Provided Not provided Not provided mad T. Al-Taani, Reliability and Nahed Abu Zaid [105] Thirumalai Selvi, Performance Provided Provided Not provided N. V. Balasub- ramanian, and P. Sheik Abdul Khader [106] Appendix B. SMS overview 115

Breno Lisi Romano, Gláucia Braga scalability e Silva, reliability Provided Provided Not provided Henrique Fernandes performance de Campos, Ricardo [107] Chaitanya Reliability Provided Provided Not provided Kallepalli and Jeff Tian [44] Yuta Maezawa, Reliability Not provided Provided Provided Kazuki Nishiura, Shinichi Honiden, Hironori Washizaki [64] Elder M. Ro- Performance Not provided Provided Not provided drigues, Flavio M. Oliveira,. Maicon Bernardino, Ro- drigo S. Saad, Leandro T. Costa,Avelino F. Zorzo [108] Minzhi Yan, Hai- Performance Provided Provided Provided long Sun, Xu Wang, Xudong Liu [109] M. Kalita1, T. Performance, Provided Provided Not provided Bezboruah [24] Scalability, Reliability Anthony Arkles, Performance Not provided Provided Provided Dwight Makaroff [52] Xingen Wang, Bo Performance Provided Provided provided Zhou, Wei Li [110] Sara Sprenkle†, NA Not provided Provided provided Holly Esquivel, Barbara Hazel- wood, Lori Pollock [111] Marc Guillemot, Performance, re- Not provided Not provided Not provided Dierk König [112] liability Appendix B. SMS overview 116

Martin Arlitt, Performance Provided Provided Not provided Carey Williamson [80] Harry M. Sneed, Performance Not provided Provided provided Shihong Huang [113] Yasuyuki, Performance Provided Not provided Not provided Masayuki Mu- rata and Hideo Miyahara [114] Jianhua Hao, Reliability Provided Provided Not provided Emilia Mendes [115] Hugo Menino, Performance Not provided Not provided Not provided Aguiar João Costa Seco, Lúcio Ferrão [18] Sebastian Lehrig Scalability Provided Not provided Not provided ,Hendrik Eikerling, Steffen Becker [116] Samer Al-Zain, Performance Provided Provided Not provided Derar Eleyan, Joy Garfield [117] Tahani Hussain Performance Provided Provided Not provided [118] Rubén Casado, Reliability Provided Not provided Not provided Javier Tuya, Muhammad Younas [119] Appendix C List of metrics

Table C.1: Metrics description Metric Name Metric Description Response time Time taken from the request provided by user until the last character of response received [12, 25] Throughput Number of requests received per second by a net- work or server [12, 25] Number of concurrent Total number of users using the application at a users given period of time [12] CPU utilization Amount of work handled by CPU in order to ex- ecute task [12] Disk I/O (access) NA Memory utilization Amount of physical memory or (RAM) consumed by a process [12, 25] Number of transac- Total number of transactions completed in a sec- tions per sec(http) ond [12] Resource utilization Amount of resources utilized by a task MTBF Average time between failures of a application [65] MTTR Average time required to repair the failed appli- cation [65] Latency The time taken for sending a packet and receiving the packet sent by sender Processor time Amount of time the CPU is utilized by a appli- cation or task Think time Time the user pause between performing task Ramp up and ramp Ramup increases load on server and measure down breakpoint and Rampdown is decreasing the load gradually inorder to recover from ramup Disk space Amount of memory available in logical disk [12] Number of hits per sec The total number of hits on a web server for each second


Number of errors Total number of errors in an application Errors Percentage The number of samples failing (Percentage of re- quests with errors) Error ratio The number of samples failing/total no of samples passed MTTF Average time the application work before it fails [65] Failure rate (request) The frequency of failures as per number of re- quests [65] Number of HTTP re- Total number of requests received by a server in quests a given unit of time Capacity NA Load time Time taken to load the web page at client side Disk I/O transactions Read or write transactions per second and bytes per second Hit ratio Ratio of number of cache hits to number of misses [65] Page load time and re- Time taken to load a web page in seconds and quest response time request response time is time for single request Availability Probability of the application work when required during a period of time Roundtrip time The total time between the data sent and data received Cache hit Cache hit is the data required is found in cache memory [65] Cache hit ratio Ratio of number of cache hits to number of misses Network latency Time taken to transmit one packet of data from source to destination Physical disk Time Amount of time the read and write request are executed by disk Successful or Failed Total number of succesful hits and failed hits [65] Hits Number of connec- Total number of connections requested to server tions per sec (user) in a given second Number of deadlocks Measure of frequency of deadlocking in database Elapsed time (disk) Time between the request and response transmis- sion process Number of sessions Total number of times a users with unique IP address acceseed application [25] Number of requests The total number of requests received by a server per sec in a given second Appendix C. List of metrics 119

Hit value NA Execution time Time taken to execute a particular request Session time The time between the user enters and leaves the application Disk queue Average number of read and write request queued length(request) in the selecte disk [25] Transaction time A period of time where a data entry Disk utilization The usage of disk space by the application Session length The total time between the user enter and leaves the application [25] Successful requests The number of connection who have received the rate response for their requests Connect time The elapsed time the user connected to network or application Number of connec- The number of connections rejected by server and tion errors, Number of number of timeouts at client side due to time limit timeouts Request rate The number of requests requested to a server at a given time Computing power Metric is related to how fast a or server can perform a task Processing speed Number of instructions the computer executes per second Cache memory usuage NA Number of successful Total number of virtual users created to perform virtual users load testing Available memory The amount of physical memory which is free or not using by any resoure Requests in bytes per Number of bytes transmitted per a request from sec a server [25] Rate of successfully The total number of useful information bits per completed requests request delivered to a certain destination in a unit (goodput) time Connection The time taken for the user to connect with server time(server) Load distribution NA Rendezvous point Point where all expected users wait until all are emulated, and then all virtual users send request at one time Speed Speed at which processor executes instruction re- ceived Appendix C. List of metrics 120

Queue percentage Percentage of work queue size currently in use [25]

Appendix D List of tools

TID | Developer | Tool Name | Platform Support | Programming Language | Tool Type | Source | Reference Link | Availability | Attribute
1 | Apache | Apache JMeter™ | Cross-platform | Java | Manual | Open Source | http://jmeter.apache.org/ | Available | Performance (Load testing)
2 | HP | LoadRunner | Windows | C | Automated | Commercial | http://www8.hp.com/us/en/software-solutions/loadrunner-load-testing/index.html?jumpid=va_uwxy6ce9tr | Available | Performance (Load testing)
3 | Parasoft | WebKing | Windows, Linux, Solaris | Java | Automated | Commercial | https://www.parasoft.com/press/simplifies-functional-testing-in-webking/ | Not Available | Performance (Load testing), Reliability
4 | ESnet / Lawrence Berkeley National Laboratory | iPerf | Cross-platform | C | Manual | Open Source | https://iperf.fr/ | Available | Performance
5 | Ericsson | Tsung | Cross-platform | Erlang | Manual | Open Source | http://tsung.erlang-projects.org/ | Available | Performance (Load testing, Stress), Scalability
6 | Softlogica | WAPT | Windows | NA | Automated | Freeware | http://www.loadtestingtool.com/index.shtml | Available | Performance (Load, Stress)
7 | Cyrano | OpenSTA | Windows | C++ | Manual | Open Source | http://opensta.org/ | Available | Performance (Load, Stress)
8 | Parasoft | SOAtest | Cross-platform | NA | Manual and Automated | Commercial | https://www.parasoft.com/product/soatest/ | Available | Reliability, Performance (Load, Stress)
9 | Microsoft | Microsoft Web Application Stress Tool | Windows | NA | Manual and Automated | Freeware | http://www.microsoft.com/downloads/details.aspx?FamilyID=e2c0585a-062a-439e-a67d-75a89aa36495&DisplayLang=en | Not Available | Performance (Stress)
10 | HP | httperf | Linux, Windows | C | Manual | Open Source | http://www.labs.hpe.com/research/linux/httperf/ | Available | Performance (Load)
11 | Paco Gomez | The Grinder | Independent | Python or Jython | Manual | Open Source | http://grinder.sourceforge.net/ | Available | Performance (Load)
12 | RADVIEW | WebLOAD | Linux, Windows | C++ | Automated | Freeware | http://www.radview.com/ | Available | Performance (Load, Stress), Scalability
13 | Micro Focus International | Silk Performer | Windows | NA | Automated | Freeware | http://www.borland.com/en-GB/Products/Software-Testing/Performance-Testing/Silk-Performer | Available | Performance (Load, Stress), Scalability
14 | Paessler AG | Webserver Stress Tool | Windows | NA | Automated | Freeware | https://www.paessler.com/tools/webstress/features | Available | Performance (Load, Stress)
15 | Micro Focus International | QAload | Windows | NA | Automated | Freeware | http://www.borland.com/en-GB/Products/Other-Borland-products/Qaload | Not Available | Performance (Load, Stress), Scalability
16 | The Wireshark team | Wireshark | Cross-platform | C, C++ | Manual | Open Source | https://www.wireshark.org/ | Available | Performance
17 | Firebug Working Group | Firebug | Cross-platform | JavaScript, XUL, CSS | Automated | Open Source | http://getfirebug.com/ | Available | Performance (web page performance analysis)
18 | John Levon | OProfile | Cross-platform | C | Automated | Open Source | http://oprofile.sourceforge.net/news/ | Available | Performance (performance counter monitor profiling tools)
19 | HP & Open source | Xenoprof | Linux | NA | Automated | Open Source | http://xenoprof.sourceforge.net/ | Available | Performance (performance counter monitor profiling tools)
20 | SmartBear Software | SoapUI | Cross-platform | NA | Automated | Open Source (standard version) & Commercial (Pro version) | https://www.soapui.org/ , https://sourceforge.net/projects/soapui/ | Available | Performance (Load testing)
21 | SOASTA | CloudTest | Cross-platform | NA | Automated | Commercial | http://www.soasta.com/load-testing/ | Available | Performance (Load testing) & Scalability
22 | Mark Seger | collectl | Linux | NA | Manual | Open Source | https://sourceforge.net/projects/collectl/ | Available | Performance (monitoring tool)
23 | Apache | ApacheBench | Cross-platform | NA | Automated | Open Source | https://httpd.apache.org/docs/2.4/programs/ab.html | Available | Performance (Load testing)
24 | SmartBear Software | TestComplete | Windows | NA | Automated | Commercial | https://smartbear.com/product/testcomplete/overview/ | Available | Performance, Scalability and Reliability
25 | Tanwir Ahmad | MBPeT: A performance testing tool | NA | NA | Manual | NA | Not available | Not Available | Performance and Scalability
26 | Florian Forster | collectd | Unix-like | C | Manual | Open Source | http://collectd.org/ | Available | Performance (Load testing)
27 | The Cacti Group, Inc. | Cacti | Cross-platform | PHP, MySQL | Manual | Open Source | http://www.cacti.net/ | Available | Performance
28 | Mach5 | FastStats Log File Analyzer | Cross-platform | NA | Automated | Commercial | https://www.mach5.com/index.php | Available | Performance, Scalability and Reliability (log-based analysis tool)
29 | IBM | Rational TestManager | Windows, Linux | NA | Manual and Automated | Commercial | Not available | Not Available | Performance
30 | Corey Goldberg | Pylot | NA | Python | Manual and Automated | Open Source | http://www.pylot.org/ | Available | Performance and Scalability
31 | CustomerCentrix | LoadStorm | Cross-platform | NA | Automated and Manual | Commercial | http://loadstorm.com/ | Available | Performance (Load testing)
32 | Rational Software | Rational Performance Tester | Windows, Linux | NA | Automated | Commercial | http://www-03.ibm.com/software/products/en/performance | Available | Performance
33 | PushToTest | TestMaker | Cross-platform | NA | Automated and Manual | Commercial, Open Source, Trial | http://www.pushtotest.com/intrototm.html | Available | Performance and Scalability
34 | Armstrong World Industries | Siege | Cross-platform | NA | Manual | Open Source | https://www.joedog.org/siege-home | Available | Performance (Load testing)
36 | LOADIMPACT AB | LOADIMPACT | Cross-platform | NA | Automated | Commercial, Trial | https://loadimpact.com/ | Available | Performance (Load testing)
37 | Dynatrace | Advanced Web Monitoring Scripting (KITE) | Windows | NA | Automated | Freeware | http://www.keynote.com/solutions/monitoring/web-monitoring-scripting-tool | Available | Performance monitoring
38 | Microsoft | Visual Studio | Windows | C++, C# | Manual, Automated | Commercial, Trial | https://www.visualstudio.com/en-us/features/testing-tools-vs.aspx | Available | Performance (Load testing, Stress testing)
39 | testoptimal | testoptimal | Windows, Linux | NA | Automated | Commercial, Trial | http://testoptimal.com/ | Available | Performance (Load testing)
40 | Westwind | WebSurge | Windows | NA | Manual | Open Source | https://websurge.west-wind.com/ | Available | Performance (Load, Stress)
41 | Microsoft | Application Center Test | Windows | NA | Automated | Freeware | https://msdn.microsoft.com/en-us/library/aa287410(v=vs.71).aspx | Available | Performance (Stress, Load), Scalability
42 | EMPIRIX | e-TEST suite | Windows, Linux | NA | Automated | Commercial | http://www.empirix.com/ | Not Available | Performance, Reliability
43 | Watir | Watir-webdriver | Cross-platform | Ruby | Automated | Open Source | https://watir.com/ | Available | Performance
44 | SeleniumHQ | Selenium WebDriver | Cross-platform | Java | Automated, Manual | Freeware | http://docs.seleniumhq.org/ | Available | Performance, Scalability
45 | AppPerfect Corporation | AppPerfect Load Test | Cross-platform | NA | Automated | Commercial | http://www.appperfect.com/index.html | Available | Performance (Load, Stress)
46 | Yahoo | YSlow | Cross-platform | NA | Manual | Open Source | http://yslow.org/ | Available | Performance analysis
47 | neustar | BrowserMob | Cross-platform | NA | Automated | Commercial, Trial | NA | Not Available | Performance (Load), Scalability
48 | Neotys | NeoLoad | Cross-platform | Java | Automated | Commercial, Trial | http://www.neotys.com/ | Available | Performance (Load, Stress)
49 | Brendan Gregg | perf | Linux | C | Manual | NA | https://perf.wiki.kernel.org/index.php/Main_Page | Available | Performance monitoring
50 | Alon Girmonsky | BlazeMeter | Cross-platform | NA | Automated | Open Source | https://www.blazemeter.com/ | Available | Performance (Load)
51 | Zabbix Company | Zabbix | Cross-platform | C, PHP, Java | Automated | Open Source | http://www.zabbix.com/ | Available | Scalability (monitoring tool)
52 | Ethan Galstad | Nagios | Cross-platform | C | Automated | Open Source | https://www.nagios.org/ | Available | Scalability (monitoring tool)
53 | Opsview Limited | Opsview | Linux, Solaris | Perl, C, ExtJS | Automated | Trial, Commercial | https://www.opsview.com/ | Available | Scalability (monitoring tool)
54 | Atlassian | HyperHQ | Cross-platform | NA | Automated | Open Source | https://github.com/hyperic/hq | Available | Scalability (monitoring tool)
55 | HP, HP Software Division | HP QuickTest Professional | Windows | NA | Automated | Commercial | http://www8.hp.com/us/en/software-solutions/unified-functional-automated-testing/index.html | Available | Performance (Load)
56 | Oracle Corporation | JConsole | Windows NT, OS X, Linux, Solaris | Java | Automated | Open Source | http://docs.oracle.com/javase/8/docs/technotes/guides/management/jconsole.html | Available | Performance
57 | PureLoad Software Group AB | PureLoad Enterprise | Cross-platform | NA | Automated | Commercial, Trial | http://www.pureload.com/products-pureload | Available | Performance (Load)
58 | Ixia | IxExplorer | Windows | NA | Automated | Commercial, Trial | NA | NA | Performance (Load)
59 | HP/Mercury Interactive | HP Quality Center | Linux | NA | Automated | Proprietary | http://www8.hp.com/us/en/software-solutions/website-testing-stormrunner-load/index.html | Available | Performance (Load)
60 | IBM | IBM RPT | Windows, Linux | NA | Automated | Commercial | http://www-03.ibm.com/software/products/en/performance | Available | Performance (Load)
61 | Tyto software | Sahi Pro | Cross-platform | Java and JavaScript | Automated | Commercial | http://sahipro.com/ | Available | Performance (Load)
62 | VMware | VMware vCenter | Cross-platform | NA | Automated | Commercial | http://www.vmware.com/in/products/converter | Available | Performance and Scalability

Appendix E List of challenges

Table E.1: List of challenges

Challenge Area | Challenge
User | Simulating the real user behavior for testing
User | Improving the identified bottlenecks can improve the overall performance of the system, or it can lead to another bottleneck due to different user actions
User | How users react to different response times and what actions are performed by the users in relation to server responses
Tools | Improper environment of tools, i.e. system configuration, tool installation, tool setup, flexibility to perform the test
Tools | Creating a larger number of virtual users, as JMeter only supports a limited number of virtual users
Tools | The JMeter tool does not support the generation of test scripts
Tools | JMeter scripts do not capture all the dynamic values, such as SAML Request, Relay State, Signature Algorithm, Authorization State, Cookie Time, Persistent ID (PID), JSession ID and Shibboleth, generated using the single sign-on mechanism of the Unified Authentication Platform (see the sketch after this table)
Tools | In JMeter it is not easy to simulate another system, which is the main problem
Tools | The JMeter tool is unable to record test cases and it provides very confusing charts


Tools | Some of the tools available for testing the performance quality attributes only support the creation of simple test case scenarios; it may not be sufficient to know the transaction time and the number of simultaneous users from these scenarios, and it is also difficult to identify the bottlenecks existing in the application
Tools | Tools that use random user sessions and log-file-based sessions for simulating virtual users do not provide a realistic workload
Tools | Identifying the existing dependencies between requests
Metrics | Selection of parameters and the criteria for testing is an important issue in performance testing
Metrics | Scalability testing related to resources such as CPU, server, memory and disk
Metrics | Challenges related to the network connection and server processor
Metrics | Specifying the load test parameters, such as generation of forms and recognition of the returned pages
Metrics | Handling a large number of test scenarios is a challenge
Development | Knowing how many users may hit the site at the same time, i.e. loading and tuning challenges
Development | Challenges related to code, i.e. unnecessary sleep statements and loops
Development | Challenges related to the enhancement of test scripts
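To make the correlation challenge above more concrete, the following is a minimal, illustrative sketch of the manual work it refers to: extracting per-session single sign-on values (such as SAMLRequest and RelayState) from a server response and re-submitting them, instead of replaying recorded values. The sketch is not taken from the case company's test scripts; the endpoint URL, form-field names, credentials and regular expression are hypothetical assumptions used only for illustration.

    # Illustrative sketch only: correlating dynamic SSO values that
    # record-and-replay load scripts tend to miss. The URL, field names
    # and credentials below are assumptions, not the case company's setup.
    import re
    import requests

    IDP_LOGIN_URL = "https://idp.example.com/sso/login"  # hypothetical endpoint

    def extract_hidden_field(html: str, field_name: str) -> str:
        """Pull the value of a hidden form field out of an HTML response."""
        match = re.search(
            r'name="{}"\s+value="([^"]+)"'.format(re.escape(field_name)), html
        )
        if match is None:
            raise ValueError("dynamic value %s not found in response" % field_name)
        return match.group(1)

    with requests.Session() as session:
        # Step 1: request the login page; the server embeds per-session values.
        page = session.get(IDP_LOGIN_URL)

        # Step 2: correlate the dynamic values instead of replaying recorded ones.
        saml_request = extract_hidden_field(page.text, "SAMLRequest")
        relay_state = extract_hidden_field(page.text, "RelayState")

        # Step 3: re-submit the extracted values together with test credentials.
        session.post(
            IDP_LOGIN_URL,
            data={
                "SAMLRequest": saml_request,
                "RelayState": relay_state,
                "username": "testuser",      # placeholder credentials
                "password": "testpassword",
            },
        )

In JMeter the same idea is typically expressed with a post-processor such as the Regular Expression Extractor feeding the extracted value into subsequent requests; this per-value correlation effort is what the challenge describes.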

Appendix F Interview questions

F.1 Technical Questions

• Do you have any previous experience in Web testing? If so, how many years?

• What is your experience with PSR testing?

• What are the subtypes of testing that you would perform in order to conduct Performance, Scalability, and Reliability testing?

F.1.1 Tools

F.1.1.1 Performance

• What are the tools used in your current company for conducting the performance testing of a web application?

• Are there any other tools that you have worked with for conducting perfor- mance testing?

• What are the additional tools that you would specify for performance testing?

• What are the main reasons for considering the specific tools among existing tools?

• What are the difficulties that you have faced while working with these tools?

• What are the drawbacks that you noticed in the specified tools?

• What are the suggestions/improvements that you would suggest?

F.1.1.2 Scalability

• What are the tools used in your current company for conducting the scalability testing of a web application?


• Are there any other tools that you have worked with for conducting scalability testing?

• What are the additional tools that you would specify for scalability testing?

• What are the main reasons for considering the specific tools among existing tools?

• What are the difficulties that you have faced while working with these tools?

• What are the drawbacks that you noticed in the specified tools?

• What are the suggestions/improvements that you would suggest?

F.1.1.3 Reliability

• What are the tools used in your current company for conducting the reliability testing of a web application?

• Are there any other tools that you have worked with for conducting reliability testing?

• What are the additional tools that you would specify for reliability testing?

• What are the main reasons for considering the specific tools among existing tools?

• What are the difficulties that you have faced while working with these tools?

• What are the drawbacks that you noticed in the specified tools?

• What are the suggestions/improvements that you would suggest?

F.1.2 Metrics

F.1.2.1 Performance

• Which metrics are considered for conducting performance testing in your company?

• What other metrics do you know of for conducting performance testing?

• What are the reasons for considering only the specified metrics?

• What are the reasons for excluding the other metrics?

• Which is the most important metric among the specified metrics that need to be considered while testing? What is the reason?

• Which is the least important metric among the specified metrics? What is the reason?

F.1.2.2 Scalability

• Which metrics are considered for conducting scalability testing in your company?

• What other metrics do you know of for conducting scalability testing?

• What are the reasons for considering only the specified metrics?

• What are the reasons for excluding the other metrics?

• Which is the most important metric among the specified metrics that need to be considered while testing? What is the reason?

• Which is the least important metric among the specified metrics? What is the reason?

F.1.2.3 Reliability

• Which metrics are considered for conducting reliability testing in your company?

• What other metrics do you know of for conducting reliability testing?

• What are the reasons for considering only the specified metrics?

• What are the reasons for excluding the other metrics?

• Which is the most important metric among the specified metrics that need to be considered while testing? What is the reason?

• Which is the least important metric among the specified metrics? What is the reason?

F.1.2.4 Other

• Do the tools specified earlier address all the specified metrics?

• If not, how are these metrics handled? Are any tailored tools used?

F.1.3 Challenges

• What are the challenges faced while testing these PSR attributes?

• What are the causes for facing the specified challenges?

• Is your company able to address all the challenges identified during the testing process?

• What are the mitigation strategies that are employed by your company for overcoming the identified challenges?

• Are there any challenges that your company was unable to address?

F.1.4 General

• Do you think that testing the PSR attributes is necessary for web applications?

• Which is the most important attribute among PSR attributes? What is the reason?

• Which is the least important attribute among PSR attributes? What is the reason?

• Do you have any more suggestions regarding the PSR attributes that would help our research?

Appendix G MTC and IA identified between case company and other companies

G.1 Metrics

Table G.1: Identified metrics between case company and other companies

Case company | Other companies
Number of transactions per sec | Number of transactions per sec
CPU utilization | CPU utilization
Memory utilization | Memory utilization
Processor time | Processor time
Throughput | Throughput
Disk I/O | Disk I/O
Number of hits per sec | Number of hits per sec
Number of requests per sec | Number of requests per sec
Number of concurrent users | Number of concurrent users
Network usage | Network usage
Server requests and response | Speed
Speed | Response time
Response time | Error percentage
Rendezvous point | Number of failures
Transactions pass and fail criteria | Bandwidth
Rampup time and rampdown time | Transactions pass and fail criteria
Error percentage | Queue percentage
Bandwidth | Network latency
Load distribution | MTBF
Number of failures |


G.2 Tools

Table G.2: Identified tools between case company and other companies

Case company | Other companies
HP LoadRunner | HP LoadRunner
VMware Vcenter | IBM RPT
QualityTest Professional | Sahi pro
HP Quality Center | Apache JMeter
Silk performer | Selenium
AKKA clustering | QualityTest Professional
Zookeeper clustering |
Oracle RAC clustering |
M1 - Monitor One |
Wireshark |
Apache JMeter |
Selenium |
Ixia |

G.3 Challenges

Table G.3: Identified challenges between case company and other companies

Case company | Other companies
Limited number of virtual users in JMeter tool | Limited number of virtual users in JMeter tool
Technology expertise challenge | Insufficient time
Script related issues (capturing browser requests) | Compatibility issues
Metric related issues | Metric related issues
Insufficient time | Network related issues

G.4 Important attribute

Table G.4: Identified important attribute between case company and other companies

 | Application based | Priority order | All are important
Case company | 4 | 2 | 2
Other companies | 2 | 1 | 1

Appendix H Consent form

The following is a consent form for the research project "Performance, Scalability, and Reliability (PSR) challenges, metrics and tools for web testing: A Case Study", carried out by Akshay Kumar Magapu and Nikhil Yarlagadda at Blekinge Tekniska Högskola (BTH), Karlskrona, Sweden. Before the interview can start, the investigator and the interviewee should sign two copies of this form. The interviewee will be given one copy of the signed form.

Consent for Participation in Interview Research

1. I volunteer to participate in a research project conducted by the students carrying out this research. I understand that the project is designed to gather information about Performance, Scalability, and Reliability (PSR) challenges, metrics and tools for web testing. I will be one among the members being interviewed for this research.

2. My participation in this project is voluntary.

3. I understand that most interviewees will find the discussion interesting and thought-provoking. If, however, I feel uncomfortable in any way during the interview session, I have the right to decline to answer any question or to end the interview.

4. Participation involves being interviewed by researchers from BTH. The interview will last approximately 30-45 minutes. Notes will be written during the interview. An audio recording of the interview and subsequent dialogue will be made. If I do not want to be recorded, I will not be able to participate in the study.

5. I understand that the researcher will not identify me by name in any reports using information obtained from this interview, and that my confidentiality as a participant in this study will remain secure.

6. Employees from my company will neither be present at the interview nor have access to raw notes or transcripts. This precaution will prevent my individual comments from having any negative repercussions.

7. I have read and understand the explanation provided to me. I have had all my questions answered to my satisfaction, and I voluntarily agree to participate in this study.

8. I have been given a copy of this consent form.

______My Signature Date

______My Printed Name Signature of the Investigator

For further information, please contact:

Akshay Kumar Magapu Nikhil Yarlagadda [email protected] [email protected]

Supervisor:

Michael Unterkalmsteiner Postdoc at Blekinge Tekniska Högskola (BTH) [email protected]