Performance Testing in the Cloud. How Bad Is It Really?


Christoph Laaber, Department of Informatics, University of Zurich, Zurich, Switzerland
Joel Scheuner, Software Engineering Division, Chalmers | University of Gothenburg, Gothenburg, Sweden
Philipp Leitner, Software Engineering Division, Chalmers | University of Gothenburg, Gothenburg, Sweden

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.3507v1 | CC BY 4.0 Open Access | rec: 4 Jan 2018, publ: 4 Jan 2018

Abstract

Rigorous performance engineering traditionally assumes measuring on bare-metal environments to control for as many confounding factors as possible. Unfortunately, some researchers and practitioners might not have access, knowledge, or funds to operate dedicated performance testing hardware, making public clouds an attractive alternative. However, cloud environments are inherently unpredictable and variable with respect to their performance. In this study, we explore the effects of cloud environments on the variability of performance testing outcomes, and to what extent regressions can still be reliably detected. We focus on software microbenchmarks as an example of performance tests, and execute extensive experiments on three different cloud services (AWS, GCE, and Azure) and for different types of instances. We also compare the results to a hosted bare-metal offering from IBM Bluemix. In total, we gathered more than 5 million unique microbenchmarking data points from benchmarks written in Java and Go. We find that the variability of results differs substantially between benchmarks and instance types (from 0.03% to > 100% relative standard deviation). We also observe that testing using Wilcoxon rank-sum generally leads to unsatisfying results for detecting regressions due to a very high number of false positives in all tested configurations. However, simply testing for a difference in medians can be employed with good success to detect even small differences. In some cases, a difference as low as a 1% shift in median execution time can be found with a low false positive rate given a large sample size of 20 instances.

Keywords

performance testing, microbenchmarking, cloud

1 Introduction

In many domains, renting computing resources from public clouds has largely replaced privately owning resources. This is due to economic factors (public clouds can leverage economies of scale), but also due to the convenience of outsourcing tedious data center or server management tasks [6]. However, one often-cited disadvantage of public clouds is that the inherent loss of control can lead to highly variable and unpredictable performance (e.g., due to co-located noisy neighbors) [7, 13, 17]. This makes adopting cloud computing for performance testing, where predictability and low-level control over the used hard- and software is key, a challenging proposition.

There are nonetheless many good reasons why researchers and practitioners might be interested in adopting public clouds for their experiments. They may not have access to dedicated hardware, or the hardware that they have access to may not scale to the experiment size that the experimenter has in mind. They may wish to evaluate the performance of applications under "realistic conditions", which nowadays often means running them in the cloud. Finally, they may wish to make use of the myriad of industrial-strength infrastructure automation tools, such as Chef (https://www.chef.io) or AWS CloudFormation (https://aws.amazon.com/cloudformation), which ease the setup and identical repetition of complex performance experiments.

In this paper, we ask the question whether using a standard public cloud for software performance experiments is always a bad idea. To manage the scope of the study, we focus on a specific class of cloud service, namely Infrastructure as a Service (IaaS), and on a specific type of performance experiment: evaluating the performance of open source software products in Java or Go using microbenchmarking frameworks such as JMH (http://openjdk.java.net/projects/code-tools/jmh). In this context, we address the following research questions:

RQ 1. How variable are benchmark results in different cloud environments?

RQ 2. How large does a regression need to be to be detectable in a given cloud environment? What kind of statistical methods lend themselves to confidently detect regressions?

We base our research on 20 real microbenchmarks sampled from four open source projects written in Java or Go. Further, we study instances hosted in three of the most prominent public IaaS providers (Google Compute Engine, AWS EC2, and Microsoft Azure) and contrast the results against a dedicated bare-metal machine deployed using IBM Bluemix (formerly Softlayer). We also evaluate and compare the impact of common deployment strategies for performance tests in the cloud, namely running test and control experiments on different cloud instances, on the same instances, and as randomized multiple interleaved trials, as recently proposed as best practice [1].

We find that result variability ranges from 0.03% relative standard deviation to > 100%. This variability depends on the particular benchmark and the environment it is executed in. Some benchmarks show high variability across all studied instance types, whereas others are stable in only a subset of the studied environments. We conclude that there are different types of instability (variability inherent to the benchmark, variability between trials, and variability between instances), which need to be handled differently.

Further, we find that standard hypothesis testing (Wilcoxon rank-sum in our case) is an unsuitable tool for performance-regression detection in cloud environments due to false positive rates of 26% to 81%. However, despite the highly variable measurement results, medians tend to be stable. Hence, we show that simply testing for a difference in medians can be used with surprisingly good results, as long as a sufficient number of samples can be collected. For the baseline case (1 trial at 1 instance), we are mostly only able to detect regression sizes >20%, if regressions are detectable at all. However, with increased sample sizes (e.g., 20 instances, or 5 trials), even a median shift of 1% is detectable for some benchmarks.
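To make the two detection approaches concrete, the following is a minimal, illustrative sketch rather than the study's actual analysis pipeline: it contrasts a Wilcoxon rank-sum (Mann-Whitney U) test, here via the Apache Commons Math 3 library, with a simple check for a relative shift in medians. The measurement values, the class name, and the 1% threshold are made-up assumptions.

import java.util.Arrays;
import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

/**
 * Illustrative sketch (not the authors' analysis code): compare the two
 * regression-detection approaches on two hypothetical samples of benchmark
 * execution times (e.g., average ns/op per trial).
 */
public class RegressionCheck {

    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical control/test measurements; a real setup would gather
        // these from many trials and instances, as the study describes.
        double[] control = {102.1, 99.7, 100.4, 101.0, 98.9, 100.2, 99.5};
        double[] test    = {103.3, 101.1, 101.9, 102.4, 100.6, 101.7, 100.9};

        // Approach 1: Wilcoxon rank-sum (Mann-Whitney U) hypothesis test,
        // which the study found to produce many false positives in the cloud.
        double pValue = new MannWhitneyUTest().mannWhitneyUTest(control, test);
        System.out.printf("Wilcoxon rank-sum p-value: %.4f%n", pValue);

        // Approach 2: flag a regression only if the median shifts by more
        // than a practical threshold (1% here, an illustrative choice).
        double shift = (median(test) - median(control)) / median(control);
        System.out.printf("Relative median shift: %.2f%% -> regression: %b%n",
                shift * 100, shift > 0.01);
    }
}

A real setup would collect the two samples via the deployment strategies described above (different instances, same instance, or interleaved trials) rather than from hard-coded arrays.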
Following the findings, we can conclude that executing software microbenchmarks is possible on cloud instances, albeit with some caveats. Not all cloud providers and instance types have proven to be equally useful for performance testing, and not all microbenchmarks lend themselves to reliably finding regressions. In most settings, a substantial number of trials or instances is required to achieve good results. However, running test and control groups on the same instance, optimally in random order, reduces the number of required repetitions. Practitioners can use our study as a blueprint to evaluate the stability of their own performance microbenchmarks in the environments and experimental settings that are available to them.

2 Background

We now briefly summarize important background concepts that we use in our study.

2.1 Software Microbenchmarking

Performance testing is a common term used for a wide variety of different approaches. In this paper, we focus on one very specific technique, namely software microbenchmarking (sometimes also referred to as performance unit tests [12]). Microbenchmarks are short-running (e.g., < 1ms), unit-test-like performance tests, which attempt to measure fine-grained performance counters, such as method-level execution times, throughput, or heap utilization. Typically, frameworks repeatedly execute microbenchmarks for a certain time duration (e.g., 1s) and report their average execution time.

In the Java world, the Java Microbenchmark Harness (JMH) has established itself as the de facto standard to implement software microbenchmarks [24]. JMH is part of the OpenJDK implementation of Java. Listing 1 shows an example JMH benchmark from the RxJava project.

@State(Scope.Thread)
public class ComputationSchedulerPerf {

    @State(Scope.Thread)
    public static class Input
            extends InputWithIncrementingInteger {

        @Param({ "100" })
        public int size;
    }

    @Benchmark
    public void observeOn(Input input) throws InterruptedException {
        LatchedObserver<Integer> o =
            input.newLatchedObserver();
        input.observable.observeOn(
            Schedulers.computation()
        ).subscribe(o);
        o.latch.await();
    }
}

Listing 1: JMH example from the RxJava project.
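For context on how such a benchmark is executed, the sketch below launches Listing 1 through JMH's programmatic Runner API. The include pattern and the fork, warmup, and iteration counts are illustrative choices, not the configuration used in the study.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

/** Minimal sketch: run JMH benchmarks programmatically. The regex and the
 *  fork/iteration counts are illustrative, not the study's settings. */
public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("ComputationSchedulerPerf") // regex matching Listing 1
                .forks(1)                 // separate JVM fork(s) per benchmark
                .warmupIterations(10)     // JIT warmup before measurement
                .measurementIterations(20)
                .build();
        new Runner(opts).run();
    }
}

Alternatively, JMH benchmarks are commonly packaged as a self-contained JAR and started from the command line.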
2.2 Cloud Computing

Cloud services are commonly offered according to three models: IaaS, Platform as a Service (PaaS), and Software as a Service (SaaS). These levels differ mostly in which parts of the cloud stack are managed by the cloud provider and what is left to the customer. Most importantly, in IaaS, computational resources are acquired and released in the form of virtual machines or containers. Tenants do not need to operate physical servers, but are still required to administer their virtual servers. We argue that, for the scope of our research, IaaS is the most suitable model at the time of writing, as this model still allows for comparatively low-level access to the underlying infrastructure. Further, setting up performance experiments in IaaS is significantly simpler than doing the same in a typical PaaS system.

In IaaS, a common abstraction is the notion of an instance: an instance is a bundle of resources (e.g., CPUs, storage, networking capabilities, etc.) defined through an instance type and an image. The instance type governs how powerful the instance is supposed to be (e.g., what hardware it receives), while the image defines the software initially installed.
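As an editorial illustration of the instance-type-plus-image abstraction (not tooling from the study), the sketch below requests a single instance from AWS EC2 using the AWS SDK for Java (v1); the AMI ID and instance type are placeholders.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

/** Illustrative sketch: an IaaS instance is requested by naming an image
 *  (AMI) and an instance type. Both identifiers below are placeholders. */
public class AcquireInstance {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-12345678")  // image: software initially installed
                .withInstanceType("t2.micro") // instance type: hardware it receives
                .withMinCount(1)
                .withMaxCount(1);

        RunInstancesResult result = ec2.runInstances(request);
        System.out.println("Launched: " +
                result.getReservation().getInstances().get(0).getInstanceId());
    }
}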