Proceedings of the 6th IEEE International Symposium on Service Oriented System Engineering (SOSE 2011)

Cloud Testing Tools

Xiaoying Bai∗†, Muyang Li∗, Bin Chen∗, Wei-Tek Tsai∗‡, Jerry Gao∗§
∗Department of Computer Science and Technology, TNList, Tsinghua University, Beijing, China
†SKLSDE, Beijing University of Aeronautics and Astronautics, Beijing, China
‡School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, USA
§Department of Computer Science and Technology, San Jose State University, USA
Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—The cloud platform provides an infrastructure for resource sharing, software hosting and service delivery in a pay-per-use approach. To test cloud-based software systems, techniques and tools are necessary to address unique quality concerns of the cloud infrastructure such as massive scalability and dynamic configuration. The tools can also be built on the cloud platform to benefit from virtualized platforms and services, massive resources, and parallelized execution. This paper surveys representative approaches and typical tools for cloud testing. It identifies the needs for cloud testing tools, including multi-layer testing, SLA-based testing, large-scale simulation, and on-demand test environments. To address these needs, it investigates new architectures and techniques for designing testing tools for the cloud and in the cloud. Tool implementations are surveyed across different approaches, including migrated conventional tools, research tools, commercial tools, and facilities such as benchmarks and testbeds. Based on the analysis of state-of-the-art practices, the paper further investigates future trends of testing tool research and development from both capability and usability perspectives.

Keywords—Cloud Testing, Testing Tool

I. INTRODUCTION

Cloud computing proposes a new architecture of multi-layered resource sharing, such as infrastructure-as-a-service (IaaS), data-as-a-service (DaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) [1]. Resources can be dynamically allocated based on usage demands following a negotiated Service-Level Agreement (SLA) in a pay-per-use business model to achieve cost-effective performance and resource utilization. Cloud-based infrastructure has a significant impact on software technologies. It shifts the focus of software development from product-oriented activities to service-oriented reuse, composition, and online renting. It has the potential to enhance the scalability, portability, reusability, flexibility, and fault-tolerance capabilities of software systems, taking advantage of the new cloud infrastructure [2], [3].

However, cloud-based hosting introduces risks to the dependability of systems. Software is remotely deployed in a virtualized runtime environment using shared hardware/software resources, and hosted in a third-party infrastructure. The quality and performance of the software highly depend on a runtime environment that is usually out of the users' control. For example, Amazon provides a cloud infrastructure and web-hosting system, AWS (Amazon Web Services) [4], including services like EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). It promises to keep customers' sites up and running 99.95% of the year, i.e., it allows for only 4.4 hours of downtime per year. Unfortunately, an unexpected crash happened in April 2011 due to operation mistakes during network reconfiguration [5]. More than 70 organizations were affected, including FourSquare, the New York Times, and Reddit, who pay to use AWS to run their websites on EC2. Due to the accident, the performance of these websites greatly decreased, and some sites were even down for dozens of hours. Applications hosted on remote clouds may have lower controllability and observability compared with conventional in-house hosted applications.

In addition, the mechanisms of massive resource sharing and usage-based allocation introduce uncertainties. From the users' perspective, it is necessary to ensure that the system will provide its functionality and performance at reasonable cost. From the providers' perspective, the aim is to enhance resource utilization by exploiting unused capacities while guaranteeing system performance. An SLA between the two parties usually defines the QoS properties, including not only quality properties during normal operations, but also fault-tolerance and recovery capabilities in reaction to failures. However, the cloud infrastructure changes its status (e.g., the number of virtual machine instances, the number of applications, and usage loads) continuously, making testing a difficult task: the underlying infrastructure keeps changing, so performance will differ even when the same application is run.

Cloud testing issues have been addressed earlier [6]. This paper addresses tool support for cloud testing, specifically support for multi-layer testing, SLA-based testing, simulation-based testing, embedding testing with execution, and on-demand testing. Various tools are available [7], [8], [9], [10], [11], [12] to test cloud-based systems at various layers, including the hardware interface, platform interface, storage system, and application systems. One can make the following observations:

1) Testing faces the challenges of cost and scalability due to increases in software size and complexity. However, testing tools can also take advantage of the enormous resources of a cloud infrastructure, such as virtualized platforms and services, large computation power and memory, parallel operations, and the automated recovery mechanisms that come with a PaaS. For example, conventional testing tools, such as the JUnit test framework and symbolic execution tools, can be migrated to a cloud platform.

They can be refactored using cloud parallel programming techniques like MapReduce. By parallelizing job units, a tool's performance and scalability can be greatly enhanced.

2) To address cloud-unique quality problems, new testing methods, techniques and tools are necessary. Some frameworks are proposed to provide testing as continuous infrastructure services [13], [14]. Testing architectures like Collaborative Verification and Validation (CV&V), Testing-as-a-Service (TaaS), and self-testing of autonomic services have been proposed recently. Techniques such as simulation, service mocking, job parallelization, and environment virtualization can also be used to enhance testing capabilities.

3) To compare across heterogeneous cloud infrastructures and development stacks, new metrics are defined and benchmarks [15], [16], [17] are developed to set up the baseline for testing and evaluation.

Cloud testing is an emerging area. This paper suggests new trends for cloud testing tools, including the needs for online and adaptive testing, cross-cloud testing, SaaS multi-tenancy testing, real-time results processing, SLA conformance testing, and dependability testing such as security testing and reliability testing.

This paper is organized as follows. Section II analyzes the issues of cloud testing tools. Section III introduces novel architectures for cloud testing. Section IV reviews enabling techniques and typical tool implementations. Section V presents cloud benchmarks. Section VI briefly shows commercial testing tools. Section VII compares the tools and discusses future needs of tool development. Finally, Section VIII concludes this paper.

II. CHALLENGES AND NEEDS

TABLE I
COMPARISON OF SOFTWARE IN CLOUD AND IN-HOUSE ENVIRONMENTS

Architecture. In-house: centralized, limited parallelism and fault tolerance. Cloud: built-in distributed parallel computing, high fault-tolerance.
Configuration. In-house: pre-defined configuration, offline deployment, relatively fixed set-up. Cloud: dynamic deployment and online reconfiguration, transparent to software providers.
Resource allocation. In-house: stable dedicated resources, limited upper bound. Cloud: unlimited resource pools, dynamic allocation based on real-time usage.
Resource sharing. In-house: limited sharing. Cloud: multi-tenancy architecture, large-scale resource sharing.
Runtime environment. In-house: stable dedicated environment, in-house control. Cloud: virtualized computing services, unpredictable environment, low controllability.
Scalability. In-house: offline scale up/down. Cloud: online massive scalability with an unlimited resource pool, dynamic scale up/down in response to usage.

As shown in TABLE I, software hosted in a cloud environment differs from that in an in-house environment. Essentially, the in-house environment is usually fully controlled by software vendors with dedicated hardware and software resources, while cloud-based hosting provides infrastructure as leased services from a theoretically unlimited resource pool. Software deployed to the cloud is supposed to benefit from the cloud's built-in mechanisms of distributed parallel computing and fault-tolerance, using new techniques such as Hadoop [18] and GFS (Google File System) [19]. However, the new model of resource sharing introduces risks. Resources are shared in a virtualized and multi-tenancy architecture (MTA) [20], [21], [3]. To reduce the cost of leased resources, and to maximize resource utilization, usage-based dynamic resource allocation is a basic principle of cloud management. Under a guaranteed SLA, the cloud automatically allocates more resources when the load increases (scale up), and releases unused resources when the load decreases (scale down). In addition, it can also change the software configuration and runtime environment in response to SLA changes or the needs of failure recovery. The process is mostly carried out online following pre-defined policies, over which the users usually have low control.
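To make the scale-up/scale-down principle concrete, the following is a minimal sketch of such a usage-based allocation loop. The VmPool type, method names, and thresholds are hypothetical illustrations, not any particular provider's mechanism.

    // Minimal sketch of usage-based dynamic resource allocation (hypothetical API).
    interface VmPool {
        double averageUtilization(); // 0.0 - 1.0 across current VM instances
        int size();
        void addInstances(int n);
        void removeInstances(int n);
    }

    class UsageBasedScaler {
        private static final double SCALE_UP_THRESHOLD = 0.80;   // assumed SLA-derived bound
        private static final double SCALE_DOWN_THRESHOLD = 0.30; // assumed SLA-derived bound
        private final VmPool pool;

        UsageBasedScaler(VmPool pool) { this.pool = pool; }

        // Invoked periodically by the cloud management layer.
        void adjust() {
            double load = pool.averageUtilization();
            if (load > SCALE_UP_THRESHOLD) {
                pool.addInstances(1);        // scale up under rising load
            } else if (load < SCALE_DOWN_THRESHOLD && pool.size() > 1) {
                pool.removeInstances(1);     // scale down to release unused resources
            }
        }
    }

Because such a loop runs continuously and invisibly to the hosted application, the same test run may observe different performance at different times, which is precisely the controllability problem noted above.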
Many new techniques are introduced to support the desired features of cloud computing, such as virtualization and MTA. These techniques also introduce new fault models and threats to system quality [22], [23], [24], [25], [26]. For example, even if an application is allocated the promised hardware, such as CPU and disks, its performance cannot be guaranteed. The performance of an application in the cloud is affected by many factors, such as the number of virtual machine (VM) instances, VM management policies, the effectiveness of hypervisor scheduling, and the strategy of application migration and recovery in case of failures. Some of these factors change at runtime and thus cannot be fully predicted and controlled.

In the traditional software lifecycle, testing is usually a stage in the process, often performed offline by test engineers before product delivery. As cloud-based software has unique lifecycle models and quality issues, new testing capabilities are necessary to meet the needs of cloud testing, such as continuous online testing and massive scalability testing.

In [6], different testing types in the cloud environment are discussed, including service functional testing, integration testing, API and connectivity testing, performance & scalability testing, security testing, interoperability & compatibility testing, and regression testing. In particular, cloud testing has the following features.

Multi-layer testing: Faults may exist in various cloud components, including hardware, network, virtualization management and the storage system. As a large-scale distributed system, a cloud may also have complex fault-tolerance and failure recovery mechanisms, such as writing into three copies in GAE (Google App Engine).

It is difficult to locate faults in case an application fails. For thorough analysis, testing needs to be performed on each component at all of these layers. Each layer requires different testing focuses and techniques.

SLA-based testing: For conventional software, testing is based on source code or software specifications that describe expected software behavior using natural language or formal models. For software deployed on the cloud, source code may be unavailable. Instead, an SLA is negotiated between software and infrastructure providers, covering functionality and QoS properties. The SLA thus provides the basis not only for cloud provisioning, but also for test design, execution, and evaluation.

Large-scale simulation: Testing needs to simulate various inputs and scenarios. As an open platform, a public cloud allows for flexible access and operation. The number of possible usage scenarios is huge. The load is high and unexpected, and large fluctuations can occur. For example, Taobao [27] is a large e-commerce system in China. It has more than 300 million registered customers, with about 300 billion RMB in sales per year. It sold about 47,000 products per minute on average in 2010. Figure 1 shows an example of a common shop's order statistics over a week [27]. At the peak of the week, it has over 3500 orders a day, while at the low it is only about 100. In this example, each order is on average contributed by 318 page browses from 201 customers.

Fig. 1. Taobao order statistics [27]

To test the functionality and performance of such complex systems, large-scale simulation is needed. In fact, it needs to simulate not only the usage of the system, but also changes in the environment such as infrastructure configurations.

On-demand test environment: Testing needs to be triggered online whenever a change occurs in the cloud, including changes to the application, runtime environment, and infrastructure. An environment is helpful for test asset sharing, automatic test generation/selection/execution, and results collection and analysis. It usually takes effort to build a test environment and maintain it for regression testing. To test the cloud under various usage scenarios, it is necessary to design testing mechanisms so that various test environments can be deployed or invoked when needed.

Embedded continuous testing for SaaS: Most SaaS software uses MTA in a cloud environment [20], [28], [29], [3]. It is possible to embed monitoring and testing services in a cloud that continuously track the inputs and outputs of services whenever the services are activated for execution. Once a sufficient number of inputs and outputs are collected, a test oracle may be established statistically with voting [29], [30]. MTA can also be simulated [31].

Test case generation from metadata: One key aspect of SaaS and PaaS is that the system is driven by metadata. For example, GAE and Force.com both use metadata to maintain and track system and application status. Furthermore, failure of metadata nodes can have a disastrous effect on the system, as happened to GAE earlier. Another feature of metadata is that it can be used to generate test cases. Test cases can be generated by examining metadata: e.g., if the Income field of a customer must be 64 bits or so, some simple test cases can be randomized with 64 bits. One can generate a collection of customers with 64-bit values, another collection with 128 bits or any other width. This is another new feature to be explored in cloud testing.
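As an illustration of metadata-driven generation, the sketch below derives randomized, width-bounded test inputs from a declared field width. The class name and the Customer.Income field are hypothetical, used only to mirror the example above.

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Sketch: generate boundary-oriented test inputs from a metadata field width.
    public class MetadataTestGen {
        static List<BigInteger> randomValuesOfWidth(int bits, int count, Random rnd) {
            List<BigInteger> values = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                // Uniformly random non-negative value of at most 'bits' bits.
                values.add(new BigInteger(bits, rnd));
            }
            return values;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            // Metadata declares Customer.Income as a 64-bit field: generate one
            // collection at the declared width and one beyond it to probe the boundary.
            List<BigInteger> inRange  = randomValuesOfWidth(64, 100, rnd);
            List<BigInteger> overflow = randomValuesOfWidth(128, 100, rnd);
            System.out.println(inRange.size() + " 64-bit inputs, "
                    + overflow.size() + " 128-bit inputs generated");
        }
    }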
browses from 201 customers. The performance resource ratio is defined as: To test the functionality and performance of such complex systems, large-scale simulation is needed. Actually, it needs to Tw = Tq + Te (1) simulate not only the usage of the system, but also the changes in the environment such as infrastructure configurations. X On-demand test environment: Testing needs to be trig- CR = Ri ∗ Ti (2) gered online whenever a change occurs in the cloud including where T represents the waiting time, T the queueing time the application, runtime environment, and infrastructure. An w q and T the execution time, R is the allocation of resource i, environment is helpful for test assets sharing, automatic test e i which can be I/O bandwidth, CPU and memory usage and T generation/ selection/execution, results collection and analysis. i is the time resource i is used. PRR is defined as: It usually takes effort to build test environment and maintain it for regression testing. To test the cloud under various usage 1 1 PRR = ∗ (3) scenarios, it is necessary to design testing mechanisms so that Tw CR various test environments can be deployed or invoked when Given PRR, the scalability of the SUT is measured by the needed. PC (Performance Change) when workload changes. Embedded continuous testing for SaaS: Most SaaS soft- ware use MTA in a cloud environment [20], [28], [29], [3]. PRR(t)W (t) PC = (4) It is possible to embed monitoring and testing services in PRR(t0)W (t0)

III. ARCHITECTURES

Cloud testing systems are usually designed based on cloud platforms and service-oriented concepts. Some new architectures have been proposed in recent years to provide testing functionality as online services.

A. Collaborative Verification and Validation

Like a service-oriented system, an application in a cloud is often composed of services developed by different parties. For the application to work, each party needs to deliver its components right, and the integration must be successful. Thus the verification and validation of the application need to be contributed by multiple parties in a collaborative manner, hence CV&V (collaborative verification and validation). One such CV&V framework is WebStrar [34], [35], [36], where test cases can be published and ranked for their potency. The most potent test cases will be used first to test new software, to reduce the testing effort.

The conventional service broker can be extended with a check-in and check-out testing process by adding just-in-time service testing, evaluation, and ranking capacities, as shown in Fig. 2 [37]. The check-in test process ensures that only qualified services are accepted for publishing, while the check-out test is run on the candidate services satisfying a request, to assess whether the services have changed after publishing and what the impacts of the changes are.

Fig. 2. Trustworthy Service Broker Architecture [37]

This concept can be further extended to a contract-based architecture [38], as shown in Fig. 3. The test broker enables scalable and flexible collaborations among test participants. A test provider supplies test knowledge such as test cases, executable test scripts, test results, defects, test ranks, service ranks, and test/service evaluation models based on testing statistics. A tester carries the published test cases and simulates testing on the target services. Both test providers and testers can be any party, including service providers, service users, or third-party independent test participants. Contracts are identified from two perspectives: TSC (Testing Service Contracts) and TCC (Test Collaboration Contracts). TSC governs the communication between testing components and the SUT (service under test), including test requirements and test invocation protocols. TCC defines the way testing components collaboratively design test cases, execute the test plan, and evaluate test results. Figure 3 depicts the collaboration activities among the different parties.

Fig. 3. Contract-Based Collaborative Verification and Validation [38]
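A minimal sketch of how the two contract perspectives could be rendered as service interfaces follows; all type and method names here are hypothetical illustrations and not the API of [38].

    import java.util.List;

    // Placeholder types for the sketch.
    class TestCase {}
    class TestResult {}
    class TestPlan {}
    class TestRequirements {}
    class Verdict {}

    // TSC: communication between testing components and the service under test.
    interface TestingServiceContract {
        TestRequirements requirements();       // what the SUT should be tested against
        TestResult invoke(TestCase testCase);  // test invocation protocol
    }

    // TCC: how participants collaborate on design, execution, and evaluation.
    interface TestCollaborationContract {
        void contributeTestCase(TestCase testCase);       // a test provider publishes knowledge
        TestPlan agreeOnPlan(List<TestCase> candidates);  // participants negotiate a test plan
        Verdict evaluate(List<TestResult> results);       // collaborative evaluation of outcomes
    }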
B. Testing as a Service (TaaS)

To reduce the cost of test design, execution, and maintenance, the concept of TaaS was proposed to establish a unified service-based framework for promoting reuse of all test artifacts, including test data, tools and processes. It provides static/dynamic on-demand testing services in/on/over clouds for third parties at any time and all the time (24/7/365) [6], [39]. According to [40], the TaaS concept was initially introduced by Tieto in Denmark in 2009. TaaS has received wide attention due to its scalable testing environments, cost reduction, utility-based service models, and on-demand testing services.

As shown in Fig. 4, Yu et al. [12] defined a 5-layer TaaS framework based on cloud infrastructure services, including:

• Test service tenant and contributor layer: This layer provides the functionality supporting testing service tenants and contributors to interact with TaaS.
• Test task management layer: This layer is a middleware layer, supporting service registry and repository, scheduling and dispatching of test tasks, and other functionality.
• Testing resource management layer: This layer acts as the cloud infrastructure, taking responsibility for resource management and monitoring, and test task provisioning.
• Test layer: This layer is the kernel part of the platform, consisting of service composition, service pooling and test-reduce sub-layers.
• Testing database layer: This layer is used to store the test tasks of tenants, targets-under-test, service images, and bug tracking results.

Fig. 4. 5-layer architecture of Testing-as-a-Service [12]

Candea et al. [7] identified three categories of testing services: TaaSD for developers, TaaSH for end users and TaaSC as a certification service. They argue that, with a pricing model, TaaS can be operated as a public service and as a business, targeting the "long-tail" of small business companies.

These concepts have been incorporated into commercial products like Tieto [40] and Sogeti [41]. Sogeti provides a cloud-based provisioning platform so that partners' testing tools, such as IBM Rational and HP Mercury, can be automatically deployed and executed following the pay-per-use approach.

C. Test Support as-a-Service (TSaaS)

To enhance the testability of autonomic services, TSaaS was proposed so that each service exposes both a production and a test environment to external users. Test functions (such as specification, execution, configuration and reporting) are exposed as API services. King et al. [42], [43] applied autonomic computing concepts to the testing of adaptive systems, called autonomic self-testing (AST). The technique was then migrated to the cloud platform [9], called TSaaS, so that services hosted on a remote cloud platform can expose their test support APIs to partner providers. A self-test harness is developed to manage the testing workflow and activities. It monitors changes or updates on hosted services, utilizes necessary infrastructure services, and invokes TSaaS supporting services to validate the changes. Test operations exposed as supporting services include test setup, input, assertion, and teardown operations. These services are provided for cloud partners during the development, testing and maintenance of tailor-made cloud applications and services. They can also be used for the design, build, and deployment of automated tests across administrative domains.
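The following is a minimal sketch of the kind of test support interface such a service might expose next to its production API. The operation names mirror the setup/input/assertion/teardown operations described above, but the interface and signatures are hypothetical.

    // Hypothetical sketch of a TSaaS-style test support API exposed by a hosted service.
    interface TestSupportService {
        void setUp(String testConfiguration);      // prepare an isolated test-mode environment
        void input(String testData);               // feed test inputs to the service under test
        boolean assertState(String expectedState); // evaluate an assertion against service state
        void tearDown();                           // discard test state, return to production mode
    }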

IV. TECHNIQUES AND TYPICAL TOOL IMPLEMENTATIONS

This section identifies some techniques used in cloud testing systems and their typical tool implementations.

• Simulation. To reduce complexity and separate quality concerns, a cloud simulator is necessary for cloud system testing, so that testing can focus on quality problems of a particular cloud component and analyze system behavior under various scenarios.
• Service mocking. Systems hosted on the cloud may use services from various providers. Conventional stub and driver techniques need to be adapted to mock external functionality at the service interface level.
• Test job parallelization. Parallel programming becomes a natural practice for applications built upon the cloud. Tests can benefit from this practice by dividing test activities into independent jobs and parallelizing them to reduce testing time and cost.
• Environment virtualization. Throughout the software lifecycle, testing and regression testing usually need to maintain various testing environments for different versions, dependent software and platforms. It is resource-consuming to maintain these environments, and effort-consuming to set up the environment for each test execution. Virtual machines can help to ease and speed up the process, and to reduce test cost.

A. Simulation

1) CloudSim: CloudSim [44] is built by the CLOUDS (Cloud Computing and Distributed Systems) Laboratory at the University of Melbourne in Australia. It aims to provide a toolkit for modeling and simulating the behavior of various cloud components, including data centers, virtual machines and resource provisioning services. It can be used for analyzing and evaluating cloud strategies in a controlled, simulated environment. In particular, CloudSim facilitates initial performance testing with less time and effort to set up a test environment.

As shown in Fig. 5, CloudSim is built with two layers: the Simulation layer and the User Code layer. It is designed to support modeling and simulating typical cloud features such as network behavior, VM allocation, cloud federations, dynamic workloads and power consumption. It claims to have been used for research in companies like HP and in universities.

Fig. 5. CloudSim Architecture [44]

2) D-Cloud: D-Cloud [45] is developed by the University of Tsukuba in Japan. D-Cloud is a dedicated simulated test environment built upon Eucalyptus, an open-source cloud infrastructure providing similar functionality to Amazon EC2. It uses QEMU, an open-source virtual machine software, to build virtual machines for simulating faults in hardware, including disk, network and memory.

Fig. 6 shows the D-Cloud architecture. D-Cloud can simulate various typical faults and inject them into the host/guest OS. It supports the definition of fault types, fault injection times and fault durations. An XML-based language is designed for describing test environment configurations and test scenarios. A tester first submits a test plan in this language to set up infrastructure parameters, workload and fault injection scenarios. D-Cloud then initiates virtual machines to simulate the execution process following the test plan. In this way, D-Cloud provides a flexible and observable platform for testing distributed systems on the cloud.

Fig. 6. D-Cloud Architecture [45]

3) PreFail: PreFail [46], [47] is developed by ParLab (Parallel Computing Laboratory) at the University of California at Berkeley. A cloud built out of tens of thousands of unreliable computers usually experiences frequent and diverse failures. Failure testing is thus important to validate the correctness and efficiency of the cloud's recovery protocols. However, it is expensive to enumerate all of the possible failure scenarios. PreFail therefore introduces a framework to explore failures systematically and efficiently.

PreFail introduces a failure abstraction called the failure ID (FID), composed of an I/O ID (abstract information about an I/O call) and the injected failure, to identify every failure. Unlike D-Cloud, which provides simulated "actual" faults, PreFail inserts a "failure surface" into the target system, between the target system (e.g., HDFS [18]) and the OS library (e.g., the Java SDK). In this way, testers can flexibly program failure testing policies. PreFail provides optimization techniques to reduce redundant failure scenarios before fault injection and simulation, and a parallelization method to divide failure sequences into independent parts that can be exercised in parallel on different machines. PreFail has been adopted by Cloudera Inc. for Hadoop.

4) Distributed Load Simulation: Cloud storage systems are characterized by high parallelism and non-determinism. As part of the research and implementation of the Cloudy2 distributed database system, ETH Zurich proposed a new distributed testing architecture for simulating parallel jobs [10]. This framework contains two types of nodes: Master and Slave. The Master is uniquely identified and is responsible for the distribution, synchronization and management of all slave nodes. The Master is started with a given test. It waits for enough slaves to connect to it, then sends every slave its corresponding tasks. During the execution, the master controls the execution sequence of the slaves to guarantee that all tasks in a step start at the same time. The slaves run testing tasks and store test results locally, including node states collected by a daemon thread at each node. At the end of the test, the master collects every slave's results, analyzes them, and then generates statistics and graphs for analyzing the test execution.

A workload description is organized at three layers: task, step and test. A task is an atomic job in a test, which gives a clear set of instructions to execute, containing configurations about loads, actions, and limits. Each step is a collection of tasks, which are performed in parallel by a group of slaves. A test is a well-defined sequence of steps. In this way, the framework can simulate a variety of workload scenarios.
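A minimal data-structure sketch of that three-layer workload description follows; the class and field names are illustrative, not the framework's actual schema.

    import java.util.List;

    // Sketch of the task/step/test workload hierarchy described in [10].
    class Task {                  // atomic job: one clear set of instructions
        String action;            // e.g., "insert" or "read"
        int loadPerSecond;        // configured load
        int limitSeconds;         // stop condition
    }

    class Step {                  // all tasks of a step run in parallel on a group of slaves
        List<Task> parallelTasks;
    }

    class WorkloadTest {          // a test is a well-defined sequence of steps
        List<Step> orderedSteps;
    }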
B. Service Mocking

Software deployed on the cloud platform may use third-party services. Such dependencies make test set-up and fault localization hard. Stubs and drivers are widely used techniques for incremental integration testing. Stubs provide simulated implementations, while drivers provide simulated invocations, for functionality that is unavailable or irrelevant to the current integration test. Similar to stubs, service mocking is used to break a service's external dependencies so that it can be tested independently. Service mocking replaces a remote service with a simulated one which behaves as if the real one were called. The simulated implementation can provide the same interface functions, accept messages, and return results.

Service mocking has been used in many service testing systems, such as soapUI and InfoQ. iTKO LISA addresses the constraints in enterprise systems that restrict availability and accessibility for development and delivery. It provides a constraint-free cloud environment with fully virtualized services.
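The idea in code, as a minimal sketch (the service interface and the canned behavior are hypothetical): the mock implements the same interface as the remote service, so the system under test is wired to it without any remote call.

    // Hypothetical remote dependency of the system under test.
    interface PaymentService {
        String charge(String account, long amountCents);
    }

    // Mock: same interface, behaves as if the real service were called.
    class MockPaymentService implements PaymentService {
        @Override
        public String charge(String account, long amountCents) {
            // Accept the message and return a plausible canned result locally.
            return "APPROVED:" + account + ":" + amountCents;
        }
    }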

C. Test Job Parallelization

Compared with traditional software, the size and complexity of cloud testing increase tremendously in the following two respects:

1) Huge number of testing tasks. The cloud is becoming a common practice for software hosting, and massive numbers of services have been migrated to cloud platforms in recent years. For example, up to July 2011, Amazon had attracted over 400,000 customers and Google App Engine over 10 million users. To have online quality control over these services, the cloud platform has to devote a lot of time and resources to testing. According to a Google report [48], Google needs to support testing of over 500,000 builds of 20,000 projects per day. It maintains over 120K test suites in the code base, and over 7.5M test suites are exercised per day.

2) Large-scale testing. Cloud services, including IaaS, PaaS and SaaS, are expected to provide higher quality and dependability than traditional software, such as scalability and fault-tolerance capabilities. Accordingly, testing needs to be exercised on a large scale to verify and validate these properties.

Parallel techniques are thus used to speed up testing [49], [50], [51]. Lastovetsky [52] exercised a case study of the parallel execution of 588 Orbix test suites on a cluster of multiprocessor workstations. The results showed that it accelerates test executions by 3.8 to 3.9 times using a single Solaris 4-processor workstation, and 6.8 to 7.7 times using a cluster of two 4-processor workstations. Clouds by nature provide parallelization support with easy-to-use programming models. Some testing tools are developed on, or migrated to, the cloud, taking advantage of cloud virtual machines to parallelize test executions.

1) Cloud9: Cloud9 [8], an academic research project from EPFL in Switzerland, migrates symbolic execution to the cloud platform. Symbolic execution is an important testing technique introduced in the 1970s. It reasons about all possible executions by exploring the program path by path. In spite of over 30 years of research on path exploration, it still faces the challenging problem of scalability. With increasing program size and complexity, its memory and CPU consumption grows exponentially. It is difficult to apply to industry software in general, where software often contains millions of lines of code. Basically, Cloud9 parallelizes symbolic execution on large shared-nothing clusters of computers. To do this, it divides the path exploration work into independent jobs allocated to different worker nodes. The conventional search strategy is distributed to the workers, with reduced coupling and coordination among workers. A global load balancer is used to balance the workload of each worker, in order to achieve high efficiency of the testing service. Fig. 7 shows the architecture of Cloud9. Each worker consists of a runtime, a searcher and a constraint solver. It independently explores a sub-tree of the program's execution tree. An initial prototype is implemented with the single-node Klee symbolic execution engine running on the infrastructure service provided by Amazon EC2. It reduced the testing time of 32 real UNIX utilities on average by a factor of 47, with a maximum 250-fold speedup.

Fig. 7. Cloud9 Architecture [8]

2) HadoopUnit: HadoopUnit [53] migrates the JUnit test framework to the Hadoop platform. JUnit test cases are created as independent Hadoop MapReduce jobs. The map() function receives test jobs as <testname, testcommand> pairs. At each node, the command is executed as a process. The reducer gets <testname, testresult> from each map and combines all the results. Experiments show that a 150-node cluster can produce a 30x improvement compared with sequential test executions on a local computer.
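In the spirit of that design, a minimal mapper could look as follows. This is a sketch against the standard Hadoop MapReduce API, assuming an input format that yields <testname, testcommand> Text pairs (e.g., KeyValueTextInputFormat); it is not HadoopUnit's actual code, and error handling is elided.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch: run each <testname, testcommand> record as a local process
    // and emit <testname, testresult>.
    public class TestRunnerMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text testName, Text testCommand, Context context)
                throws IOException, InterruptedException {
            Process p = Runtime.getRuntime().exec(testCommand.toString());
            int exitCode = p.waitFor();
            context.write(testName, new Text(exitCode == 0 ? "PASS" : "FAIL"));
        }
    }

Because each test command runs as an independent process on whichever node receives the record, the cluster scheduler, rather than the test harness, takes care of distribution and load balancing.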
3) YETI: YETI (York Extensible Testing Infrastructure) [11] also provides a cloud version of its random testing tools. It uses MapReduce to parallelize the processing of test inputs and results. A preliminary evaluation was carried out using Amazon EC2. It showed clear performance improvements from employing more computers and distributing the jobs.

D. Environment Virtualization

The cloud can also be used for simulating domain-specific instruments for large-scale application testing. For example, a Network Management System (NMS) needs to manage networks comprised of a huge number of elements. To test NMS applications, one either tests with real elements or uses simulators to simulate the required environments, both of which require significant time, cost and resources. To address the challenge, cloud infrastructure services can facilitate simulating different virtual elements and test scenarios [13]. Agents that emulate the managed elements are deployed in the cloud using public infrastructure services. The virtual machines (VMs) created in the cloud simulate a heterogeneous distributed environment with different OS and runtime configurations. Metrics such as CPU utilization and memory utilization are collected through VM hypervisors for performance analysis.

Similarly, S. Baride et al. proposed a cloud-based approach for mobile application testing, where infrastructure services are used to simulate diversified mobile devices, hardware configurations, heterogeneous application platforms, and complex dependencies [14]. Based on the cloud infrastructure, a test environment is built including an emulator which provides unified interfaces to different simulated/actual mobile devices, emulators of mobile environments and service platforms, and testing tools to perform various types of testing such as security and performance testing.

OCT (Open Cloud Testbed) is a representative large-scale testbed [54]. As of 2009, it had 120 nodes in four data centers geographically distributed across Baltimore, Chicago and San Diego, connected with a high-performance 10 Gb/s network. It aims to investigate the performance of data-intensive computing systems with extremely large data sets and very high volume data streams. It has tested heterogeneous cloud platforms such as Eucalyptus, CloudStore, Hadoop, Sector/Sphere, and Thrift.

V. CLOUD BENCHMARKS

This section introduces typical benchmarks and testbeds developed to support cloud testing.

A. YCSB

To achieve objectives such as scale-out, elasticity and high availability, cloud systems typically sacrifice the sophisticated mechanisms of traditional ACID databases, such as complex joins and aggregates, and the strong transaction model with two-phase commit. Special design tradeoffs are made to optimize system performance, such as read versus write performance, latency versus durability, synchronous versus asynchronous replication, and data partitioning.

YCSB (Yahoo! Cloud Serving Benchmark) [15] is built by Yahoo! as a benchmark framework. It is designed to be extensible and portable to heterogeneous clouds, to give a reasonable comparison among cloud storage systems (e.g., Cassandra and HBase). The framework is intended to deal with various quality concerns, including performance, scalability, availability and replication. Two tiers, performance and scalability, have been published as open source tools. The performance tier tests the latency of requests with increasing throughput until the database system is saturated. The scaling tier tests how system performance changes with an increasing number of machines. Two scaling metrics are used: scale-up and elastic speedup. Scale-up shows the system's static scalability: the system should maintain constant performance when given a larger amount of data with more servers. Elastic speedup shows the system's dynamic scalability: the system should achieve a performance improvement as new servers are added while the system is still running.

The YCSB tool consists of two parts: a workload generator and a package of standard workloads. The workload package provides a collection of programs representing typical cloud operations like read/write, data sizes, and request distributions. The operations supported consist of insert, update, read and scan. It uses different distribution models to select the operations and records to perform. Four built-in distributions are used: Uniform, Zipfian, Latest, and Multinomial. Typical workload patterns include update heavy (50% Read, 50% Update), read heavy (95% Read, 5% Update), read only (100% Read), read latest (95% Read, 5% Insert), and short ranges (95% Scan and 5% Insert). YCSB supports workload generation using user-defined workload descriptions, which can be a mix of workloads in the core package, or completely user-defined workloads with new combinations of operations and distributions.
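To illustrate the workload-mix idea (this is a hypothetical sketch, not YCSB's actual API): an operation chooser draws the next operation from a configured probability mix and picks target records from a skewed distribution.

    import java.util.Random;

    // Illustrative sketch of distribution-driven workload generation.
    public class WorkloadMix {
        enum Op { READ, UPDATE, INSERT, SCAN }

        private final Random rnd = new Random();

        // "Read heavy" pattern: 95% Read, 5% Update.
        Op nextOperation() {
            return rnd.nextDouble() < 0.95 ? Op.READ : Op.UPDATE;
        }

        // Skewed record selection: popular low-numbered records are chosen
        // more often (a crude stand-in for a Zipfian distribution).
        int nextRecord(int recordCount) {
            double u = rnd.nextDouble();
            return (int) (u * u * u * recordCount);
        }
    }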
B. Enhanced TPC-W

C. Binnig et al. [17] identified the following novel features of cloud storage systems compared with transactional database systems:

• Elasticity to changing conditions. Conventional systems are mostly used in a managed environment with a fixed configuration. The cloud enables dynamic resource allocation and deallocation so that systems can adapt to load fluctuations on the fly.
• Tradeoffs between consistency and availability. It is impossible for cloud storage systems to provide availability and strong consistency together in the presence of network failures. Most cloud providers thus offer only weaker forms of consistency for the sake of availability.
• Pricing problem. The cloud aims to provide economy of scale. However, the promise of unlimited scalability is difficult to achieve. The various pricing plans and granularities of different providers, covering infrastructure services and platform services, lead to different overall costs.

Existing benchmarks like the TPC benchmarks are mostly built for transactional database systems; their metrics and architecture are not suitable for cloud systems. Binnig et al. [17] hence suggested a new benchmark system specifically for cloud scalability, pay-per-use and fault-tolerance testing and evaluation. The benchmark uses an e-commerce scenario and defines web interactions as benchmark drivers. In particular, the following four new metrics are defined for cloud storage system evaluation, as shown in Fig. 8:

Fig. 8. Metrics of Cloud Benchmark [17]

• Scalability. Cloud services are expected to scale linearly with a constant cost per web interaction. The paper suggests measuring the deviation of response times from the perfect linear scale by using the correlation coefficient R2, or by determining the parameters of a power function of the form f(x) = x^b.
• Cost. The economy of cloud performance is measured as $/WIPS, where WIPS stands for web interactions per second, as used by the conventional TPC-W benchmark. In addition, the standard deviation of the cost during scaling is also measured.
• Peaks. This measures how well a cloud can adapt to peak loads, including scale-up to reach peak loads and scale-down after peak loads. The adaptability is defined as the ratio between WIPS in RT (real-time) and Issued WIPS.
• Fault tolerance. Cloud infrastructures are usually based on huge numbers of commodity hardware components, so hardware failures are common for infrastructure services. Hence, this metric is introduced to analyze cloud self-healing capabilities. Given failures in a period of time, the recoverability is also defined as the ratio between WIPS in RT (real-time) and Issued WIPS.
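For example, with hypothetical numbers: if 500 WIPS are issued during a load peak but the system completes only 450 WIPS within the real-time bound, the adaptability ratio is 450/500 = 0.9; a perfectly adaptive cloud would keep this ratio at 1, both during peaks and while recovering from injected failures.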

C. TeraSort

The Hadoop distribution comes with a number of benchmarks, including TestDFSIO, nnbench, mrbench and TeraGen/TeraSort/TeraValidate. The TeraSort benchmark is popular for testing Hadoop clusters at both the HDFS and MapReduce layers. It aims to compare cluster performance by sorting 1 TB of data as fast as possible. Hadoop set a record in 2009 by sorting 100 TB of data at 0.578 TB/minute using 2800 nodes. The Hadoop Tera benchmark package is composed of three components:

• TeraGen generates random input data.
• TeraSort runs MapReduce sort jobs on the generated inputs.
• TeraValidate ensures that the outputs of TeraSort are correctly sorted.

D. Cloudstone

Cloudstone [16] is an open source project from the RAD Lab at UC Berkeley. It aims to address the performance characteristics of new system architectures and to compare across various development stacks. A benchmark toolkit is built using social applications and automated load generators that can be executed on different deployment environments. In accordance with the pay-as-you-go model of storage and computing, a new metric is introduced, called dollars-per-user-per-month, for evaluating cloud performance with regard to realistic usage and cost.

Cloudstone is built upon open source tools including Olio, a social-event calendar web application; Faban, a Markov-based workload generator; and automation tools to execute Olio following different workload patterns on various platforms. The benchmark has been exercised on different configurations of the Amazon EC2 platform.

E. MalStone

MalStone is developed by the Open Data Group [55]. It is specially designed for performance testing of cloud middleware for data-intensive computing. MalGen is developed to generate synthetic log-entity files that are used as testing inputs. It can generate tens of billions of events on a cloud with over 100 nodes.

VI. COMMERCIAL TESTING TOOLS

A. SOASTA

SOASTA [56] is motivated by the necessity to test in production, rather than in a laboratory environment. Today's web applications usually follow agile practices with frequent builds and high change rates. Load testing with legacy tools in the laboratory can be significantly different from testing in the production environment in terms of scale, configuration, user profiles and network environment. Running tests against production websites can thus achieve a higher degree of accuracy and confidence, compared with lab practices.

SOASTA CloudTest is a production performance testing tool for Web applications. It can simulate thousands of virtual users visiting a website simultaneously, using either private or public cloud infrastructure services. The worker nodes can be distributed across public and private clouds to cooperate in a large-scale test. Test results from distributed test agents are integrated for analysis. Memory-based analytic techniques are implemented to handle, in real time, the huge data volumes produced by large-scale testing. Provisioning data are displayed via an analytic dashboard on a synchronized timeline. Through an Ajax-based web UI, testers can operate and supervise the whole process, including launching hundreds of load generation servers, creating and running geographically distributed test agents, and analyzing test results.

Fig. 9 shows the SOASTA architecture. CloudTest is composed of three distributed services to support test creation, test execution and test results analytics, as listed below:

• The SOASTA repository is responsible for delivering SOASTA CloudTest objects such as test scenario recordings, test compositions, and performance data.
• The Analytics Dashboard is a memory-based analytic service to handle large sets of results and analytics from distributed load tests. It is able to correlate many data streams from distributed environments into a single one with a synchronized timeline.
• Maestro is a test engine built as a massively multi-threaded service, used for test execution (including sending and validating responses). Multiple Maestros can execute different parts of a test composition collaboratively, with the ability to be geographically distributed.

Fig. 9. SOASTA CloudTest architecture [56]

B. iTKO LISA

iTKO LISA [57] aims to provide a cloud-based environment and virtual services for composite application development, verification and validation. It claims to reduce the software delivery timeline by 30% or more using its innovative approach to support continuous integration for development and testing. Central to the LISA architecture is its virtualization technology. For unavailable or inaccessible resources, LISA provides virtualized services by simulating the target system's dynamic behavior so that they can respond as live systems. In this way, it breaks the dependency constraints of system integration and supports continuous testing.


LISA quality services are provided from three perspectives: LISA Test, LISA Validate, and LISA Pathfinder.

• LISA Test: The capabilities it offers to enhance testing include coverage-based testing for heterogeneous distributed architectures, codeless testing, UI testing, and load and performance testing. It brings testability to all components deployed in the cloud via the LISA development toolkit. It establishes a collaborative testing platform with portable and executable test cases that are shared across different teams and environments. It provides a codeless testing environment that allows QA, development and others to rapidly design and execute automated tests.
• LISA Validate: LISA provides continuous regression testing, triggered by change events, for each software build and in the production environment. Policies are enforced via a governance infrastructure to ensure that systems conform to quality requirements at design time, change time and runtime.
• LISA Pathfinder: It traces interactions step by step, reviewing the dataflow and control flow among the constituents of a composite application. It helps testers see through the system execution process to localize defects and identify performance bottlenecks.

C. Cloud Testing

Cloud Testing [58] was initiated by a group of architects and performance experts from the UK's largest website performance monitoring & load testing company. It aims to support cross-browser and functional testing of Web applications. As websites need to be compatible with various browsers and operating systems, Cloud Testing offers a shared test environment so that users need not set up and maintain various testing platforms to ensure website portability. Test scripts are recorded by users from local browsers using the Selenium IDE. The scripts are then submitted to Cloud Testing to be executed automatically in the cloud with various browsers on various operating systems.

VII. DISCUSSION

Compared with conventional testing methods, cloud testing puts more emphasis on system testing and online testing. This is due to the novel design and development methods imposed by cloud computing. This is still an emerging research area with many open problems.

To better understand the issues and needs of cloud testing, this paper looks into state-of-the-art tool implementations. TABLE II selects 13 representative tools introduced above and compares them from the following perspectives:

• Testing objectives. Large-scale performance testing, scalability testing, fault-tolerance testing, recovery testing, and cost-related testing are typical objectives.
• Testing activities. Testing is a process of environment set-up, input generation, test configuration, test executable deployment, execution, and results collection and analysis. For test tools developed on a cloud, special activities are needed, such as external service mocking, geographical simulation, parallel execution, resource provisioning, and results aggregation.
• Tool architecture. Some cloud testing tools are not developed on the cloud platform. For those on the cloud, testing with simulation support can further enhance cloud testability and observability.

The results show that most of the tools take advantage of the cloud platform to reduce testing cost and enhance tool scalability. However, many of the features are not well supported yet. For example, fault-tolerance is one of the major promises of cloud computing, yet only D-Cloud and PreFail attempt to perform fault-tolerance and recovery testing. Features that need future tool support include:

• Online and adaptive testing. This addresses the needs of dynamic service composition, deployment and online evolution.
• Cross-cloud testing. Services need to be able to migrate between clouds. Hence, a unified method is necessary to test and evaluate services on heterogeneous cloud platforms.
• Multi-tenancy testing. Services are shared across applications following the multi-tenancy architecture. That is, the same database and code can be configured to meet different customers' requirements. MTA causes new problems for instance performance, reliability and security.
• Real-time results mining. With enhanced productivity, testing can produce huge amounts of data, and results analysis could become a bottleneck for follow-up activities such as fault discovery, localization, and recovery. Timely analysis of log and event data is needed.

In addition, tools' usability can also be improved, such as supervision of the execution status of tools on the cloud, sharing mechanisms for testing assets (test data, test cases, test scripts, and results), and automation support (test generation, execution, and deployment).

TABLE II
COMPARISON OF CLOUD TESTING TOOLS
Tools compared: Cloud Testing [58], ITKO [57], SOASTA [56], YETI [11], Testbed [14], Testbed [13], Cloudstone [16], Benchmark [17], YCSB [15], ETHZ [10], CloudSim [44], PreFail [46], D-Cloud [45].

Test objectives:
• Large-scale performance testing: Cloud Testing [58], ITKO [57], SOASTA [56], Benchmark [17], YCSB [15], ETHZ [10]
• Scalability testing: Benchmark [17], YCSB [15], ETHZ [10]
• Fault-tolerance testing: PreFail [46], D-Cloud [45]
• Recovery testing: PreFail [46], D-Cloud [45]
• Cost-related testing: Cloudstone [16], Benchmark [17], CloudSim [44]

Test activities:
• Service mocking: ITKO [57], Testbed [14], Testbed [13], CloudSim [44]
• Geographical simulation: Testbed [14], Testbed [13]
• Parallel execution: ITKO [57], SOASTA [56], YETI [11], Testbed [13], Cloudstone [16], YCSB [15], PreFail [46]
• Test resource management: Cloud Testing [58], ITKO [57], SOASTA [56], Cloudstone [16], D-Cloud [45]
• Results aggregation: Cloud Testing [58], ITKO [57], SOASTA [56], YETI [11], Cloudstone [16], PreFail [46]

Tool architecture:
• Observability: ETHZ [10], CloudSim [44]
• Scalability: all tools except Benchmark [17] and YCSB [15]
• Cloud-based: Cloud Testing [58], ITKO [57], SOASTA [56], YETI [11], Testbed [14], Testbed [13], Cloudstone [16], D-Cloud [45]

VIII. CONCLUSION

Cloud computing changes the way software is deployed, hosted, delivered, and maintained, even though its long-term impacts on software engineering are not fully analyzed yet. This paper is motivated by the following two observations:

1) The new quality issues introduced by the cloud's collaborative and dynamic architecture. New techniques and implementations introduce new fault models.
2) The new opportunities to enhance testing system capabilities by using cloud infrastructure services and environments. Conventional testing techniques can be parallelized using cloud infrastructure services to overcome cost and resource limitations.

The paper shows that testing tools have been emerging in recent years from both of the above two aspects. In particular, cloud storage testing receives special attention, with standard benchmarks and evaluation results across various implementations. New architectures are proposed to provide continuous testing services and large-scale system testing capabilities. With growing maturity, some techniques have been incorporated into commercial tools, especially for performance testing of web applications.

However, many cloud concepts, such as virtualization and multi-tenancy, are not well defined yet. There are still quite a few quality issues not identified and well understood. Specific testing techniques are badly needed to address the unique problems of the cloud architecture.

ACKNOWLEDGMENT

This project is supported by National Science Foundation China (No. 61073003), National Basic Research Program of China (No. 2011CB302505), the Open Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2009KF-2-0X), U.S. National Science Foundation project DUE 0942453, and the European Regional Development Fund and the Government of Romania under grant no. 181 of 18.06.2010.

REFERENCES

[1] B. P. Rimal, E. Choi, and I. Lumb, “A Taxonomy and Survey of Cloud Computing Systems,” in Proceedings of Fifth Int. Joint Conf. INC, IMS and IDC (NCM ’09), 2009, pp. 44–51.
[2] W. Tsai, Q. Shao, Y. Huang, and X. Bai, “Towards a Scalable and Robust Multi-Tenancy SaaS,” in Proceedings of the Second Asia-Pacific Symposium on Internetware, 2010, p. 8.
[3] W. Tsai, Y. Huang, and Q. Shao, “EasySaaS: A SaaS development framework,” in SOCA ’11, 2011.
[4] Amazon Web Services. [Online]. Available: http://aws.amazon.com/
[5] “News Briefs,” Computer, vol. 44, pp. 18–20, 2011.
[6] J. Gao, X. Bai, and W. T. Tsai, “Cloud-Testing: Issues, Challenges, Needs and Practice,” Software Engineering: An International Journal, vol. 1, no. 1, pp. 9–23, 2011.
[7] G. Candea, S. Bucur, and C. Zamfir, “Automated Software Testing as a Service,” in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010, pp. 155–160.
[8] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea, “Cloud9: A Software Testing Service,” ACM SIGOPS Operating Systems Review, vol. 43, no. 4, pp. 5–10, 2010.
[9] T. King and A. Ganti, “Migrating Autonomic Self-Testing to the Cloud,” in Third International Conference on Software Testing, Verification, and Validation Workshops, 2010, pp. 438–443.
[10] J. Moreno, D. Kossmann, T. Kraska, and S. Loesing, “A Testing Framework for Cloud Storage Systems,” Master’s thesis, Swiss Federal Institute of Technology Zurich, 2010.

[11] M. Oriol and F. Ullah, “YETI on the cloud,” in Third International Conference on Software Testing, Verification, and Validation Workshops, 2010, pp. 434–437.
[12] L. Yu, W. Tsai, X. Chen, L. Liu, Y. Zhao, L. Tang, and W. Zhao, “Testing as a Service over Cloud,” in 2010 Fifth IEEE International Symposium on Service Oriented System Engineering, 2010, pp. 181–188.
[13] Z. Ganon and I. E. Zilbershtein, “Cloud-based Performance Testing of Network Management Systems,” in Proceedings of IEEE 14th Int. Workshop Computer Aided Modeling and Design of Communication Links and Networks (CAMAD ’09), 2009, pp. 1–6.
[14] S. Baride and K. Dutta, “A Cloud Based Software Testing Paradigm for Mobile Applications,” ACM SIGSOFT Software Engineering Notes, vol. 36, no. 3, pp. 1–4, 2011.
[15] B. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking Cloud Serving Systems with YCSB,” in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010, pp. 143–154.
[16] W. Sobel, S. Subramanyam, A. Sucharitakul, J. Nguyen, H. Wong, S. Patil, A. Fox, and D. Patterson, “CloudStone: Multi-platform, Multi-language Benchmark and Measurement Tools for Web 2.0,” in Proceedings of Cloud Computing and Its Applications, 2008.
[17] C. Binnig, D. Kossmann, T. Kraska, and S. Loesing, “How is the Weather Tomorrow? Towards a Benchmark for the Cloud,” in Proceedings of the Second International Workshop on Testing Database Systems, 2009, pp. 9:1–9:6.
[18] D. Borthakur. The Hadoop Distributed File System: Architecture and Design. [Online]. Available: http://hadoop.apache.org/common/docs/r0.18.0/hdfs_design.pdf
[19] S. Ghemawat, H. Gobioff, and S. Leung, “The Google File System,” in ACM SIGOPS Operating Systems Review, vol. 37, no. 5, 2003, pp. 29–43.
[20] W. Tsai, G. Qi, and X. Bai, “AgileSaaS: An agile SaaS development framework,” Arizona State University, Tempe, AZ, USA, 2011.
[21] W. Tsai, Q. Shao, and W. Li, “OIC: Ontology-based intelligent customization framework for SaaS,” in Proceedings of SOCA, 2010, pp. 1–8.
[22] X. Bai, W. T. Tsai, Y. Gong, and J. Huang, “Massive Scalability and Dynamic Reconfiguration: Challenges to Software System Testing Introduced by Service-Oriented Computing,” Communications of the CCF, vol. 9, pp. 35–41, 2010.
[23] C. Frye. Cloud Computing Creates Software Testing Challenges. [Online]. Available: http://searchcloudcomputing.techtarget.com/news/1355198/Cloud-computing-creates-software-testing-challenges
[24] T. Parveen and S. Tilley, “When to Migrate Software Testing to the Cloud?” in Third International Conference on Software Testing, Verification, and Validation Workshops, 2010, pp. 424–427.
[25] L. Riungu, O. Taipale, and K. Smolander, “Research Issues for Software Testing in the Cloud,” in 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), 2010, pp. 557–564.
[26] K. Yim, D. Hreczany, and R. Iyer, “HTAF: Hybrid Testing Automation Framework to Leverage Local and Global Computing Resources,” in Computational Science and Its Applications - ICCSA 2011. Springer Berlin / Heidelberg, 2011, vol. 6784, pp. 479–494. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-21931-3_37
[27] Taobao. [Online]. Available: http://www.Taobao.com/
[28] W. Tsai, Y. Huang, Q. Shao, and M. Barrett, “Testing the scalability of SaaS applications,” in IEEE RTSOAA ’11, 2011.
[29] W. Tsai, Y. Huang, Q. Shao, and X. Bai, “Data partitioning and redundancy management for robust multi-tenancy SaaS,” Int. J. Software Informatics, vol. 4, no. 4, 2010.
[30] W. Tsai, X. Zhou, Y. Chen, and X. Bai, “On Testing and Evaluating Service-Oriented Software,” Computer, vol. 41, no. 8, pp. 40–46, 2008.
[31] W. Tsai, W. Li, H. Sarjoughian, and Q. Shao, “SimSaaS: simulation software-as-a-service,” in Proceedings of the 44th Annual Simulation Symposium, 2011, pp. 77–86.
[32] P. Brebner and A. Liu, “Performance and Cost Assessment of Cloud Services,” in Proceedings of the 2010 International Conference on Service-Oriented Computing, 2011, pp. 39–50.
[33] A. Li, X. Yang, S. Kandula, and M. Zhang, “CloudCmp: Comparing Public Cloud Providers,” in Proceedings of the 10th Annual Conference on Internet Measurement, 2010, pp. 1–14.
[34] W. Tsai, R. Paul, L. Yu, A. Saimi, and Z. Cao, “Scenario-based Web Services Testing with Distributed Agents,” IEICE Transactions on Information and Systems, vol. 86, no. 10, pp. 2130–2144, 2003.
[35] W. T. Tsai, Y. Chen, R. Paul, N. Liao, and H. Huang, “Cooperative and Group Testing in Verification of Dynamic Composite Web Services,” in Proceedings of 28th Annual Int. Computer Software and Applications Conf. (COMPSAC 2004), vol. 2, 2004, pp. 170–173.
[36] W. T. Tsai, R. Paul, Z. Cao, L. Yu, and A. Saimi, “Verification of Web Services using an enhanced UDDI server,” in Proceedings of 8th Int. Workshop Object-Oriented Real-Time Dependable Systems (WORDS 2003), 2003, pp. 131–138.
[37] X. Bai, Z. Cao, and Y. Chen, “Design of a Trustworthy Service Broker and Dependence-based Progressive Group Testing,” International Journal of Simulation and Process Modelling, vol. 3, no. 1, pp. 66–79, 2007.
[38] X. Bai, Y. Wang, G. Dai, W.-T. Tsai, and Y. Chen, “A Framework for Contract-Based Collaborative Verification and Validation of Web Services,” in Component-Based Software Engineering. Springer Berlin / Heidelberg, 2007, vol. 4608, pp. 258–273.
[39] L. Riungu, O. Taipale, and K. Smolander, “Software Testing as an Online Service: Observations from Practice,” in Third International Conference on Software Testing, Verification, and Validation Workshops, 2010, pp. 418–423.
[40] Tieto. [Online]. Available: http://www.tieto.com/
[41] Sogeti. [Online]. Available: http://www.sogeti.com/testing
[42] T. King, A. Ramirez, R. Cruz, and P. Clarke, “An Integrated Self-Testing Framework for Autonomic Computing Systems,” Journal of Computers, vol. 2, no. 9, pp. 237–249, 2007.
[43] T. King, “A Self-Testing Approach for Autonomic Software,” Ph.D. dissertation, Florida International University, 2010.
[44] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, “CloudSim: a Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms,” Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[45] T. Banzai, H. Koizumi, R. Kanbayashi, T. Imada, T. Hanawa, and M. Sato, “D-Cloud: Design of a Software Testing Environment for Reliable Distributed Systems using Cloud Computing Technology,” in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010, pp. 631–636.
[46] P. Joshi, H. S. Gunawi, and K. Sen, “PreFail: a Programmable Tool for Multiple-Failure Injection,” in Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, 2011, pp. 171–188.
[47] H. S. Gunawi, T. Do, P. Joshi, P. Alvaro, J. Yun, J.-s. Oh, J. M. Hellerstein, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, K. Sen, and D. Borthakur, “FATE and DESTINI: A Framework for Cloud Recovery Testing,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2010-127, Sep 2010. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-127.html
[48] J. Penix, “Panel presentation,” in 33rd International Conference on Software Engineering. Google, 2011.
[49] A. Duarte, W. Cirne, F. Brasileiro, P. Duarte, and L. Machado, “Using the Computational Grid to Speed Up Software Testing,” in Proceedings of 19th Brazilian Symposium on Software Engineering, 2005.
[50] A. Duarte, W. Cirne, F. Brasileiro, and P. Machado, “GridUnit: Software Testing on the Grid,” in Proceedings of the 28th International Conference on Software Engineering, 2006, pp. 779–782.
[51] A. Duarte, G. Wagner, F. Brasileiro, and W. Cirne, “Multi-Environment Software Testing on the Grid,” in Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, 2006, pp. 61–68.
[52] A. Lastovetsky, “Parallel Testing of Distributed Software,” Information and Software Technology, vol. 47, no. 10, pp. 657–662, 2005.
[53] T. Parveen, S. Tilley, N. Daley, and P. Morales, “Towards a Distributed Execution Framework for JUnit Test Cases,” in IEEE International Conference on Software Maintenance, Sept. 2009, pp. 425–428.
[54] R. L. Grossman, Y. Gu, M. Sabala, C. Bennett, J. Seidman, and J. Mambretti, “The Open Cloud Testbed: A Wide Area Testbed for Cloud Computing Utilizing High Performance Network Services,” CoRR, vol. abs/0907.4810, 2009.
[55] C. Bennett, R. L. Grossman, D. Locke, J. Seidman, and S. Vejcik, “MalStone: Towards a Benchmark for Analytics on Large Data Clouds,” in Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD ’10), 2010, pp. 145–152.
[56] SOASTA. [Online]. Available: http://www.SOASTA.com/
[57] ITKO. [Online]. Available: http://www.itko.com/
[58] Cloud Testing. [Online]. Available: http://www.CloudTesting.com/
