CS 408

Lecture 5: Higher-Order Testing 1/31/17

Myers, Chapter 6

Basics

• A software error occurs when the program does not do what its end user reasonably expects it to do

• Even if you could perform an absolutely perfect module test, you still could not guarantee that you have found all software errors

• To complete testing, higher-order testing is necessary

• Software development is largely a process of communicating information about the eventual product and translating this information from one form to another

Basics

• Translate the software user's needs into a set of written requirements (Product Backlog in Scrum)

• Design -- partitions the system into individual programs, components, or subsystems, and defines their interfaces

• Translate the Design into source code

Most software errors stem from breakdowns in information communication

One solution is to orient distinct testing processes toward particular classes of errors

Need for higher-order testing increases along with the size of the program


Function Testing

• Function testing is the process of attempting to find discrepancies between the program and the external specification

• An external specification is a precise description of the program's behavior from the end-user point of view

• Function testing is normally a black-box activity

Applicable techniques: equivalence partitioning, boundary value analysis, cause-effect graphing, and error guessing

Covered back in Chapter 5
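A minimal sketch of how equivalence partitioning and boundary value analysis turn an external specification into black-box test cases. The function `shipping_fee` and its specification (weights of 1 to 50 kg are valid; the fee is 5 up to 10 kg and 12 above) are invented for this illustration and are not from Myers.

```python
def shipping_fee(weight_kg):
    """Hypothetical program under test; the spec is assumed for this example."""
    if not 1 <= weight_kg <= 50:
        raise ValueError("weight must be between 1 and 50 kg")
    return 5 if weight_kg <= 10 else 12


# One representative per valid equivalence class, plus each boundary value.
VALID_CASES = {1: 5, 5: 5, 10: 5, 11: 12, 30: 12, 50: 12}
# Invalid classes: just outside each boundary, plus far-out values.
INVALID_CASES = [0, -3, 51, 1000]


def run_function_tests():
    """Return the discrepancies found between program and specification."""
    failures = []
    for weight, expected in VALID_CASES.items():
        actual = shipping_fee(weight)
        if actual != expected:
            failures.append(f"weight={weight}: expected {expected}, got {actual}")
    for weight in INVALID_CASES:
        try:
            shipping_fee(weight)
            failures.append(f"weight={weight}: expected rejection")
        except ValueError:
            pass  # rejected, as the specification requires
    return failures
```

The test cases are derived only from the specification, never from the body of `shipping_fee`, which is what makes this a black-box activity.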

System Testing

• System testing is not a process of testing the functions of the complete system or program

• System testing has a particular purpose: to compare the system or program to its original objectives

• Several system test categories can be used. Many of these are related to non-functional attributes

Some of these will be important in CS 40800 (e.g., Performance Testing, ...), but you are not required to do all (or even any!) of these

Sanity and Smoke Testing

Sanity testing is a very brief run-through of the functionality of a program to assure that the software works roughly as expected. This is often done prior to a more exhaustive round of testing

Smoke Testing

Origin -- physical tests in which smoke is pumped into closed systems of pipes to detect cracks or breaks

A subset of test cases covering the most important functionality of a component or system is selected and run to ascertain whether the most crucial functions of a program work correctly

Purpose is to determine whether the application is so badly broken that further testing is unnecessary
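A rough sketch of a smoke suite: a handful of checks on the most crucial functions, stopping at the first failure because a broken build is not worth testing further. The `inventory` data and the three checks are hypothetical stand-ins for a real program's critical functionality.

```python
def run_smoke_suite(checks):
    """Run each critical check in order; return True only if all pass."""
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crash counts as a failed check
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
        if not ok:
            return False  # stop early: too broken for further testing
    return True


# Hypothetical stand-ins for a program's most important functions.
inventory = {"apple": 3}
smoke_checks = {
    "inventory loads": lambda: isinstance(inventory, dict),
    "known item is found": lambda: inventory.get("apple") == 3,
    "missing item is handled": lambda: inventory.get("pear", 0) == 0,
}
```

A build would move on to the exhaustive test rounds only when `run_smoke_suite(smoke_checks)` returns True.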

Performance Testing

Many programs have specific performance or efficiency objectives, stating such properties as response times and throughput rates under certain workload and configuration conditions

Test cases should be designed to show that the program does not satisfy its performance objectives

Volume Testing

Subject the program to heavy volumes of data

Purpose of volume testing is to show that the program cannot handle the volume of data specified in its objectives

Example: The system is supposed to be able to store, retrieve, and modify information concerning 1.5 million customers
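A minimal sketch of a volume test for the customer example. `CustomerStore` is a hypothetical in-memory stand-in; in a real volume test the same driver would run against the actual system and its real storage.

```python
class CustomerStore:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self._rows = {}

    def store(self, customer_id, name):
        self._rows[customer_id] = name

    def retrieve(self, customer_id):
        return self._rows[customer_id]

    def modify(self, customer_id, name):
        if customer_id not in self._rows:
            raise KeyError(customer_id)
        self._rows[customer_id] = name

    def count(self):
        return len(self._rows)


def volume_test(store, n_customers):
    """Load n_customers records, then spot-check retrieve and modify."""
    for customer_id in range(n_customers):
        store.store(customer_id, f"customer-{customer_id}")
    if store.count() != n_customers:
        return False
    # Spot-check the first, middle, and last records.
    for customer_id in (0, n_customers // 2, n_customers - 1):
        store.modify(customer_id, "updated")
        if store.retrieve(customer_id) != "updated":
            return False
    return True
```

Calling `volume_test(CustomerStore(), 1_500_000)` exercises the full volume stated in the objective; a smaller count is useful only for checking the driver itself.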

Stress Testing

Stress testing subjects the program to heavy loads or stresses

A heavy stress is a peak volume of data, or activity, encountered over a short span of time

If an air traffic control system is supposed to keep track of up to 200 planes in its sector, you could stress-test it by simulating the presence of 200 planes ... or more. If an operating system is supposed to support a maximum of 150 concurrent jobs, the system could be stressed by attempting to run 150 jobs simultaneously ... or more.

Web-based applications are common subjects of stress testing (Chapter 10)

You could stress a mobile device application -- a mobile phone operating system, for example -- by launching multiple applications that run and stay resident, then trying to make or receive one or more telephone calls
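A sketch of the "150 concurrent jobs ... or more" idea. `JobSystem` is a hypothetical model of a system with a fixed job limit; the driver releases all job requests at the same instant to create a peak, then counts how many were accepted.

```python
import threading


class JobSystem:
    """Hypothetical system that supports at most max_jobs concurrent jobs."""

    def __init__(self, max_jobs=150):
        self._slots = threading.Semaphore(max_jobs)

    def try_start(self):
        # Non-blocking: returns False when the system is already full.
        return self._slots.acquire(blocking=False)


def stress(system, n_jobs):
    """Submit n_jobs at once; return how many the system accepted."""
    accepted = []
    barrier = threading.Barrier(n_jobs)

    def submit():
        barrier.wait()  # hold every thread, then release them together
        accepted.append(system.try_start())

    threads = [threading.Thread(target=submit) for _ in range(n_jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(accepted)
```

Running `stress(JobSystem(150), 150)` tests the stated maximum, and `stress(JobSystem(150), 200)` tests beyond it; what matters in the second case is that the system degrades in a controlled way instead of failing.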

Usability Testing

Tasking the ultimate end user of an application with testing the software in a real-world environment

Covered in next chapter (Chapter 7)

Security Testing

Many programs now have specific security objectives

Security testing is the process of attempting to devise test cases that subvert the program's security checks

One way to devise such test cases is to study known security problems in similar systems and generate test cases that attempt to demonstrate comparable problems in the system you are testing

Web-based applications often need a higher level of security testing than do most applications (Chapter 10)
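A small sketch of devising test cases from a known class of security problem, here path traversal in file-serving applications. The check `is_allowed` is a hypothetical security check written for this example; the test inputs try to subvert it.

```python
from pathlib import PurePosixPath


def is_allowed(requested):
    """Hypothetical check: allow only relative paths that stay inside the root."""
    path = PurePosixPath(requested)
    if path.is_absolute():
        return False
    depth = 0
    for part in path.parts:
        depth += -1 if part == ".." else 1
        if depth < 0:
            return False  # the path climbs above the document root
    return True


# Attack inputs modeled on a well-known problem in similar systems.
ATTACKS = ["../etc/passwd", "docs/../../etc/passwd", "/etc/passwd", "a/../../b"]
# Legitimate inputs that must keep working.
LEGITIMATE = ["report.txt", "docs/readme.md", "docs/../report.txt"]


def subverted_checks():
    """Attack inputs that slipped past the security check."""
    return [p for p in ATTACKS if is_allowed(p)]


def wrongly_rejected():
    """Legitimate inputs that the check refused."""
    return [p for p in LEGITIMATE if not is_allowed(p)]
```

Any entry returned by `subverted_checks()` is a successful security test case in Myers's sense: it demonstrates that the check can be subverted.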

Storage Testing

Programs occasionally have storage objectives that state, for example, the amount of system memory the program uses and the size of temporary or log files

Configuration Testing

Some software must support a variety of hardware configurations, including various types and numbers of I/O devices and communications lines, or different memory sizes

Often, the number of possible configurations is too large to test each one

Test a representative subset
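One possible way to pick a representative subset is "each-choice" coverage: every value of every parameter appears in at least one tested configuration. This criterion is an assumption of this sketch, not something the slides prescribe (pairwise coverage is a stronger alternative), and the `OPTIONS` values are invented.

```python
from itertools import product

# Hypothetical configuration dimensions.
OPTIONS = {
    "memory_gb": [4, 8, 16],
    "io_devices": [1, 2, 4],
    "comm_lines": [1, 8],
}


def all_configurations(options):
    """Every combination of values: usually too many to test."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


def each_choice_subset(options):
    """Pick configurations so that each value appears at least once."""
    rounds = max(len(values) for values in options.values())
    return [
        {key: values[i % len(values)] for key, values in options.items()}
        for i in range(rounds)
    ]


def covers_every_value(options, subset):
    """Check that the subset really exercises every value of every parameter."""
    return all(
        {config[key] for config in subset} == set(values)
        for key, values in options.items()
    )
```

For the options above the full space has 18 configurations, while the each-choice subset has only 3.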

Configuration Testing

Hey, You Have Given Me Too Many Knobs! Understanding and Dealing with Over-Designed Configuration in System Software (Xu et al., FSE'15)

Tianyin Xu*, Long Jin*, Xuepeng Fan*‡, Yuanyuan Zhou*, Shankar Pasupathy†, and Rukma Talwadker†
*University of California San Diego, USA ‡Huazhong Univ. of Science & Technology, China †NetApp, USA
{tixu, longjin, xuf001, yyzhou}@cs.ucsd.edu {Shankar.Pasupathy, Rukma.Talwadker}@netapp.com

ESEC/FSE'15, August 30 – September 4, 2015, Bergamo, Italy. http://dx.doi.org/10.1145/2786805.2786852

ABSTRACT

Configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance becomes a daunting, error-prone task.

This paper makes a first step in understanding a fundamental question of configuration design: "do users really need so many knobs?" To provide the quantitative answer, we study the configuration settings of real-world users, including thousands of customers of a commercial storage system (Storage-A), and hundreds of users of two widely-used open-source system software projects. Our study reveals a series of interesting findings to motivate software architects and developers to be more cautious and disciplined in configuration design. Motivated by these findings, we provide a few concrete, practical guidelines which can significantly reduce the configuration space. Take Storage-A as an example, the guidelines can remove 51.9% of its parameters and simplify 19.7% of the remaining ones with little impact on existing users. Also, we study the existing configuration navigation methods in the context of "too many knobs" to understand their effectiveness in dealing with the over-designed configuration, and to provide practices for building navigation support in system software.

[Figure 1: The increasing number of configuration parameters with software evolution. Four panels (Storage-A, MySQL, Apache, and Hadoop with MapReduce and HDFS) plot the number of parameters against release time. Storage-A is a commercial storage system from a major storage company in the U.S.]

Categories and Subject Descriptors: D.2.10 [Software Engineering]: Methodologies

General Terms: Design, Human Factors, Reliability

Keywords: Configuration, Complexity, Simplification, Navigation, Parameter, Difficulty, Error

1. INTRODUCTION

1.1 Motivation

In recent years, configuration problems have drawn tremendous attention for their increasing prevalence and severity. For example, Yin et al. reported that configuration issues accounted for 27% of all the customer-support cases in a major storage company in the U.S., and were the most significant contributor (31%) among all the high-severity cases [75]. Rabkin and Katz reported that configuration issues were the dominant source of support cost in Hadoop clusters (based on data from Cloudera Inc.), in terms of both the number of support cases and the amount of supporting time [46].

Moreover, configuration errors, the after-effects of configuration difficulties, have become one of the major causes of system failures. Barroso and Hölzle reported that configuration errors were the second major cause of service-level disruptions at one of Google's main services [16]. Recently, a number of outages of Internet and cloud services, including Google, LinkedIn, Microsoft Azure, and Amazon EC2, were caused by configuration errors [35, 59, 63, 68].

One fundamental reason for today's prevalent configuration issues is the ever-increasing complexity of configuration, especially in system software. This is reflected by the large and still increasing number of configuration parameters ("knobs"), as well as various configuration constraints and consistency requirements [32, 39, 45, 72] (known as complexity of interaction and tightness of coupling in human error studies [41, 48]). For example, MySQL 5.6 database server has 461 configuration parameters; 216 of them are not with simple data types (e.g., Boolean or enumerative) but rather more complex ones. These parameters control different buffer sizes, timeouts, resource limits, etc. Setting them correctly requires domain-

Configuration Testing (Examples from Xu et al., OSDI'16, 12th USENIX Symposium on Operating Systems Design and Implementation)

Table 1: Severity of latent versus non-latent errors among the customers' configuration issues of COMP-A. LC errors contribute to 75% of the high-severity configuration issues.

    Severity level    Latent    Non-latent
    All cases         47.6%     52.4%
    High severity     75.0%     25.0%

Table 2: Diagnosis time of latent versus non-latent errors among customers' configuration issues of COMP-A. The time is normalized by the average time of all the reported issues.

    Error class    Mean    Median
    Latent         1.14    1.70
    Non-latent     0.87    0.41

Figure 1: A real-world LC error from Squid [37]. The error caused system hanging for 7+ hours, and resulted in 48 hours of diagnosis efforts. Later, a patch was added to check the existence of the configured path during initialization. Unfortunately, the patched check is still subject to LC errors such as incorrect file types and permissions.

    Configuration error: diskd_program = a non-existent path
    Initialization: Parse config files; store the settings in program vars.
    Serving requests: Use the setting of diskd_program for log rotation ("Hogging the CPU for 7+ hrs")
    Diagnosis (48 hrs):
      - 26 rounds of diagnostic conversations;
      - 5 collections of logs & runtime traces;
      - 2 incorrect patches.
    [Patch] Check existence of diskd_program during initialization

Figure 2: A real-world LC error from MapReduce [12]. When the exception handler caught the runtime exception induced by the LC error, it was already too late to avoid the downtime. After this incident, the user requested to check the configuration "at the initialization time."

    1. Configuration error: mapred.local.dir = directory path w/ wrong owner
       (mapred.local.dir is not used until exec. of MapReduce jobs)
    2. Impact: The TaskTrackers were trapped into infinite loops ("When I ran jobs on a big cluster, some map tasks never got started.")
    3. Code snippets:
       /* TaskTracker.java */
       // no check at initialization
       while (running) {                  // infinite loops
         try {
           ...
           access mapred.local.dir        // throws Exception
           ...
         } catch(Exception e) {
           LOG.log("Retrying!");          // too late to avoid the failure!
         }
       }
    User requests: "TaskTracker should check whether it can access to the local dir at the initialization time, before taking any tasks."

perform a comprehensive suite of test cases against configuration settings, especially for those hard-to-test ones (e.g., failure/error-handling related configurations) that may require complex setups and even fault injections.

Therefore, early detection should inevitably fall onto the shoulder of the system itself -- the system should automatically check as many configurations as possible at its early stages (the startup time). Unfortunately, many of today's systems either skip the checking or only check configurations right before the configuration values are used, as shown in our study (§2). Typically, at the startup time, only those configuration parameters needed for initialization are checked (or directly used), while many other parameters' checking is delayed much later until when they are used in special tasks. Since such configuration parameters are neither used nor checked during normal operations, errors in their settings go undetected until their late manifestation, e.g., under circumstances like error handling and fail-over. For simplicity, we refer to such errors as latent configuration (LC) errors.

LC errors can result in severe failures, as they are often associated with configurations used to control critical situations such as fail-over [44], error handling [42], backup [37], load balancing [9], mirroring [45], etc. As explained above, their detection or exposure is often too late to limit the failure damage. Take a real-world case as an example (c.f., §2: Figure 3a), an LC error in the fail-over configuration settings is detected only when the system encounters a failure (e.g., due to hardware faults or software bugs) and tries to fail-over to another component. In this case, the fail-over attempt also fails, making the entire system unavailable to all the clients.

Tables 1 and 2 compare the severity level and diagnosis time of real-world configuration issues caused by LC errors versus non-latent configuration errors (detected at the system's startup time) of COMP-A (1), a major storage company in the US. Although there have been fewer LC errors than non-latent ones, LC errors contribute to 75% of the high-severity issues and take much longer to diagnose, indicating their high impact and damage.

Figure 1 shows a real-world LC error from Squid, a widely used open-source Web proxy server. The LC error resided in diskd_program, a configuration parameter used only during log rotation. Squid did not check the configuration during initialization; thus, this error was exposed much later after days of execution. It caused 7+ hours of system downtime and cost 48 hours of diagnosis efforts. After the error was finally discerned, the Squid developers added a patch to proactively check the setting at system startup time to prevent such latent failures.

Figure 2 shows another real-world example in which an LC error failed a large-scale MapReduce job processing. This LC error was replicated to multiple nodes and crashed the TaskTrackers on those nodes. Specifically, the error caused a runtime exception on each node. The TaskTracker caught the exception and restarted the job. Unfortunately, as the error is persistent in the configuration file, restarting the job failed to get rid of the error but induced infinite loops. Note that when the exception handler caught the error, it was already too late to avoid downtime (the best choice is to terminate the jobs).

Preventing above LC-error issues would require software systems to check configurations early during the initialization time, even though the configuration values are only needed in much later execution or during special

(1) We are required to keep the company and its products anonymous.

Examples

Figure 3: New LC errors discovered in the latest versions of the studied software, both of which are found to have caused real-world failures [40, 41]. For all these LC errors, the correctness checking is implicitly done when the parameters' values are actually used in operations, which is unfortunately too late to prevent the failures.

(a) Missing initial checking

    Auto-failover configuration parameters: HDFS-2.6.0
      dfs.ha.fencing.ssh.connect-timeout
      dfs.ha.fencing.ssh.private-key-files
    1. LC Errors:
       Ill-formatted numbers (e.g., typos) for ssh timeout;
       Invalid paths for private-key files (e.g., non-existence, permission errors).
    2. Initial checks: None.
    3. Late execution:
       Parse the timeout setting to an integer value;
       Read the file specified by the key-files setting.
         /* hadoop-common/.../ha/SshFenceByTcpPort.java */
         public boolean tryFence(...) {
           ...
           int timeout = getInt("dfs.ha.fencing.ssh.connect-timeout");
           ...
           session.createSession();
             // ... getString("dfs.ha.fencing.ssh.private-key-files")
             // fis = new FileInputStream(prvFile);
           ...
         }
    4. Manifestation:
       IllegalArgumentException (when parsing timeout to an integer)
       IOException (when reading the key file)
    5. Consequence:
       HDFS auto-failover fails, and the entire HDFS service becomes unavailable.

(b) Incomplete initial checking

    Error-handling configuration parameter: Apache httpd-2.4.10
      CoreDumpDirectory
    1. LC Errors: The running program has no permission to access coredump directory.
    2. Initial checks: Check if the path points to an existent directory.
         if (apr_stat(&finfo, fname, APR_FINFO_TYPE) != APR_SUCCESS)
           return "CoreDumpDirectory does not exist";
         if (finfo.filetype != APR_DIR)
           return "CoreDumpDirectory is not a directory";
    3. Late execution: Change working directory (chdir) to the path.
         /* server/mpm_unix.c */
         static void sig_coredump(int sig) {
           ...
           apr_filepath_set(ap_coredump_dir, ...);
             // "CoreDumpDirectory" ...
             // if (chdir(rootpath) != 0)
             //   return errno;
           ...
         }
    4. Manifestation: Error code returned by the chdir call
    5. Consequence:
       Apache httpd cannot switch to the configured directory, and thus fails to generate the coredump file upon server crashing.

Table 5: The studied configuration parameters whose values are not used at the system's initialization phase.

    Software    Not used during initialization    Studied param.
    HDFS        17 (38.6%)                        44
    YARN         9 (25.7%)                        35
    HBase        3 (12.0%)                        25
    Apache       4 (28.6%)                        14
    Squid        4 (19.0%)                        21
    MySQL        6 (13.9%)                        43

adopt the lazy practice of using configuration values (3) -- parsing and consuming configuration settings only when the values are immediately needed for the operations, without any systematic configuration checking at the system's initialization phase.

With such a practice, even trivial errors could result in big impact on the system's dependability. Figure 3a exemplifies such cases using the new LC errors we discovered in our study. In HDFS, any LC errors (such as a naïve type error) in the auto-failover configurations could break the fail-over procedure upon the NameNode failures (as the values are not checked or used early), making the entire HDFS service become unavailable.

Apache, MySQL, and Squid all apply specific configuration checking procedures at initialization, mainly for checking data types and data ranges. However, for more complicated parameters, some checking is incomplete. Figure 3b shows another new LC error we discovered. In this case, though the initial checking code covers file existence and types, it misses other constraints such as file permissions. This leaves Apache subject to permission-related LC errors (which is reported as one common cause of core-dump failures upon server crash [41]).

As shown by Figure 3b, one configuration parameter could have multiple subtle constraints depending on how the system uses its value. For example, a configured file path used by chdir has different constraints from files accessed by open; even for files accessed by the same open call, different flags (e.g., O_RDONLY versus O_CREAT) would result in different constraints. Implementing code to check such constraints is tedious and error-prone.

Finding 2: Many (12.0%–38.6%) of the studied RAS configuration parameters are not used at all during the system's initialization phase.

Table 5 counts the studied configuration parameters that are not used at the system's initialization phase, but are consumed directly in late execution (e.g., when dealing with failures). Figure 3a is such an example. Since all these parameters are from RAS features, it is natural for their usage to come late on demand.

Some Java programs put the checking or usage code of the parameters in the class constructors, so that the errors can be exposed when the class objects are created (specially, this is used as the practice for quickly fixing LC errors [18, 19, 54]). However, this may not fundamentally avoid LC errors if the class objects are not created during the system's initialization phase.

Note: RAS configurations can be implemented with early usage at the system's initialization phase. As shown in Table 5, the majority of RAS configurations are indeed used during initialization. For example, all the studied systems choose to open error-log files at initialization time, rather than waiting until they have to print the error messages to the log files upon failures.

(3) This is a bad but commonly adopted practice in Java and Python programs which rely on libraries (e.g., java.util.Properties and configparser) to directly retrieve and use configuration values from configuration files on demand, without systematic early checks.

Configuration Testing

#Parameters ceptions and error code) and the emulated execution runs Software Description Lang. Total RAS in a short period. PCHECK inserts instructions to capture HDFS Dist. filesystem Java 164 44 the anomalies that may occur during the emulated execu- YARN Dataprocessing Java 116 35 tion, as the evidence to report configuration errors. HBase Distributed DB Java 125 25 As an enforcement, PCHECK encapsulates the emu- Apache Web server C 97 14 lated execution and error capturing code into checkers for Squid Proxy server C/C++ 216 21 MySQL DB server C++ 462 43 every configuration parameter, and invokes the checkers at the system’s initialization phase. This can minimize Table 3: The systems and the RAS parameters studied in §2. potential LC errors, and compensate for the missing and Deficiency of initial checking Studied incomplete configuration checks in real-world systems. Software Missing Incomplete param. We implement P C HECK for C and Java programs on HDFS 41 (93.2%) 3(6.9%) 44 top of the LLVM [4]andSoot[3]compilerframeworks. YARN 29 (82.9%) 5(14.3%) 35 We apply P C HECK to 58 real-world LC errors of various HBase 18 (72.0%) 5(2.0%) 25 error types occurred in widely-used systems (each leads Apache 4(28.6%) 2(14.3%) 14 Squid 9(42.9%) 4(19.0%) 21 to severe failure damage), including 37 new LC errors MySQL 6(14.0%) 6(14.0%) 43 that have not been exposed before. Our results show that Table 4: Number of configuration parameters that do not have any HECK PC can detect 75+% of these real-world LC errors initial checking code (“missing”) and that only have partialcheck- at the system’s startup time. Compared with the existing ing and thus cannot detect all potential errors (“incomplet15 e”). detection tools, it can detect 31% more LC errors. parameter setting at the system’s initialization phase2 (if 2UnderstandingRootCausesofLatent any) and the code that later uses the parameter’s value. 
Configuration Errors Then, we compare these two sets of code (checking ver- sus usage) and examineif the initial checking is sufficient To understand the root causes and characteristics of LC to detect configuration errors. If an error can escape from errors, we study the practices of the configuration check- the initialization phase and break the usage code, it is a ing and error detection in six mature, widely-deployed potential LC error. open-source software systems (c.f., Table 3). They cover We verify each LC error discovered from source code multiple functionalities and languages, and include both by exposing and observing the impact of the error. We single-machine and distributed systems. first inject the errors into the system’s configuration files We focus on configuration parameters used in compo- and launch the system; then we trigger the manifestation nents related to the system’s Reliability, Availability, and conditions to expose the error impact. For example, to Serviceability (known as RAS for short [50]). For each verify the LC errors in the HDFS auto-failover feature, system considered, we select all the configuration param- we start HDFS with the erroneous fail-over settings, trig- eters of RAS-related features based on the software’s of- ger the fail-over procedure by killing the active NameN- ficial documents, including error handling, fail-over, data ode, and examine if the fail-over can succeed. As all the backup, recovery, error logging and notification, etc. The LC errors are verified through their manifestation, there last column of Table 3 shows the number of the studied is no false positive in the reported numbers. RAS parameters. 
Compared with configurations of other system components, configurations used by RAS com- 2.2 Findings ponents are more likely to be subject to LC errors due to their inherently latent nature; moreover, the impact of Finding 1: Many (14.0%–93.2%) of the studied RAS errors in RAS configurations is usually more severe. parameters do not have any special code for checking Note: LC errors are not limited to RAS components. the correctness of their settings. Instead, the correctness Thus, the reported numbers may not represent the overall is verified (implicitly) when the parameters’ values are statistics of all the LC errors in the studied systems. In actually used in operations such as a file open call. addition, PCHECK,thetoolpresentedin§3,appliestoall Table 4 shows the number of the studied RAS parame- the configuration parameters; it does not require manual ters that rely on the usage code for verifying correctness, efforts to select out RAS parameters. because their initial checks are either missing or incom- plete.MostofthestudiedRASparametersinHDFS, 2.1 Methodology YARN, and HBase do not have any special code for checking the correctness of their settings. These systems We manuallyinspect the sourcecoderelated to RAS con- figuration parameters of the studied systems. First, for 2Asystem’sinitializationphaseisdefinedfromitsentrypoint to the each RAS parameter, we study the code that checks the point it starts to serve user requests or workloads.

622 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Reliability Testing

If the program's objectives contain specific statements about reliability, specific reliability tests might be devised

For example, medical monitoring software must perform for 100 days without need to restart

This is often difficult to test in the short run

May be possible to simulate long periods of use
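One way to simulate long periods of use is to give the program a clock that the test controls, so that 100 days of operation run in a few seconds. The following is a minimal sketch under stated assumptions: `FakeClock`, `Monitor`, and `run_reliability_test` are hypothetical names invented for illustration, and the monitor is a stand-in for real medical monitoring software.

```python
from collections import deque


class FakeClock:
    """Simulated clock: time advances only when the test says so."""

    def __init__(self):
        self.now = 0  # simulated seconds since start

    def advance(self, seconds):
        self.now += seconds


class Monitor:
    """Hypothetical monitoring program that records one reading per tick."""

    def __init__(self, clock, history_limit=1000):
        self.clock = clock
        # Bounded history, so memory use cannot grow over a long run.
        self.history = deque(maxlen=history_limit)
        self.restarts = 0  # would be incremented by a (hypothetical) restart path

    def tick(self, reading):
        self.history.append((self.clock.now, reading))


def run_reliability_test(days=100, tick_seconds=60):
    """Drive the monitor through `days` of simulated time without a restart."""
    clock = FakeClock()
    monitor = Monitor(clock)
    ticks = days * 24 * 60 * 60 // tick_seconds
    for i in range(ticks):
        monitor.tick(reading=60 + i % 40)  # synthetic sensor value
        clock.advance(tick_seconds)
    return monitor, clock


monitor, clock = run_reliability_test()
print(clock.now // 86400, "simulated days,", monitor.restarts, "restarts")
```

A simulated clock only exercises logic that depends on elapsed time and accumulated state. It does not reveal problems tied to real elapsed time, such as hardware wear, so it complements a long-running test and does not replace it.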

16

2/7/2017 unit tests - Asking for help with this Therac 25 bugged code. I don't understand the explanation - Software Quality Assurance & Testing Stack Exchange

Therac 25

Then I got stuck!

17

The Magnet process sets up all the magnets, and the entire process takes about 8 seconds. The bug is that PTime clears the magnet-setting flag in its first execution, and the flag is never set again. As a result, the paper indicates, PTime checks for an input change only during the first magnet setting. But what I don't understand is the line in the red rectangle.
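The effect of the flag being cleared too early can be illustrated with a small simulation. This is a simplified sketch based only on the description above, not the actual Therac-25 code. The function name, its parameters, and the non-buggy behavior (keeping the flag set for the whole setup) are assumptions made for illustration.

```python
def edit_detected(num_magnets, edit_during_magnet, buggy=True):
    """Return True if an operator edit made while magnet number
    `edit_during_magnet` (0-based) is being set is noticed."""
    magnet_flag = True  # set once, before the first magnet is set
    for magnet in range(num_magnets):
        editing = (magnet == edit_during_magnet)
        # PTime: wait for the magnet to settle, checking for edits
        # only while the flag is set.
        if magnet_flag and editing:
            return True
        if buggy:
            # The bug: the flag is cleared at the end of the first
            # PTime call and is never set again.
            magnet_flag = False
    return False


print(edit_detected(4, 0))               # edit during the first magnet
print(edit_detected(4, 2))               # edit during a later magnet
print(edit_detected(4, 2, buggy=False))  # same edit, flag kept set
```

In the buggy version only an edit during the first magnet setting is noticed. An edit made during any later magnet goes undetected, even though the full setup takes about 8 seconds.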

http://sqa.stackexchange.com/questions/9798/asking-for-help-with-this-therac-25-bugged-code-i-dont-understand-the-explanat

Recovery Testing

Recovery objectives state how the system is to recover from programming errors, hardware failures, and data errors

Programming errors can be purposely injected into a system to determine whether it can recover from them

Hardware failures can be simulated

Data errors such as noise on a communications line or an invalid pointer in a database can be created purposely or simulated to analyze the system's reaction
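A recovery test for the "noise on a communications line" case might look like the sketch below. It assumes a hypothetical framing scheme (payload plus CRC-32) and a hypothetical retransmission policy. The names `make_frame`, `flip_bit`, `receive`, `send_with_recovery`, and `noisy_once` are invented for illustration; only `zlib.crc32` is a real library call.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate line noise by flipping one bit of the frame."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def receive(frame: bytes):
    """Return the payload, or None if the checksum does not match."""
    payload, crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None
    return payload


def send_with_recovery(payload, channel, max_attempts=3):
    """Resend until a frame arrives intact; return (payload, attempts)."""
    for attempt in range(1, max_attempts + 1):
        received = receive(channel(make_frame(payload)))
        if received is not None:
            return received, attempt
    return None, max_attempts


def noisy_once():
    """Build a channel that corrupts only the first frame it carries."""
    state = {"first": True}

    def channel(frame):
        if state["first"]:
            state["first"] = False
            return flip_bit(frame, 3)
        return frame

    return channel


# Purposely injected data error: the first frame is corrupted in transit.
print(send_with_recovery(b"dose=5", noisy_once()))
```

The test passes if the system notices the corrupted frame and recovers by resending it. A channel that corrupts every frame checks the other recovery objective, namely that the system gives up cleanly instead of accepting bad data.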

18 Installation Testing

Some types of software systems have complicated installation procedures

Testing the installation procedure is an important part of the system testing process

Acceptance Testing

Usually performed by the program's customer or end user, and normally not considered the responsibility of the development organization

But the development organization should simulate this!

Alpha testing is simulated or actual operational testing by potential users/customers or an independent test team at the developers' site

Beta testing comes after alpha testing

Versions of the software, known as beta versions, are released to a limited audience outside of the programming team

This audience can include selected end users

19 System Tests

One of the most vital considerations in implementing the system test is determining who should do it

1. Programmers should not perform a system test on their own software

2. Of all the testing phases, this is the one that the organization responsible for developing the programs definitely should not perform

An ideal system test team might be composed of a few professional system test experts (people who spend their lives performing system tests) and a representative end user or two

20 Test Planning and Control

Immense project management challenge in planning, monitoring, and controlling the testing process

Major mistake most often made in planning a testing process is the tacit assumption that no errors will be found

Obvious result of this mistake is that the planned resources (people, calendar time, and computer time) will be grossly underestimated -- a notorious problem in the computing industry

People who will design, write, execute, and verify test cases, and the people who will repair discovered errors, should be identified

Define mechanisms for reporting detected errors, tracking the progress of corrections, and adding the corrections to the system

21 Test Completion Criteria

Criteria must be designed to specify when each testing phase will be judged to be complete

It is unreasonable to expect that all errors will eventually be detected

The two most common criteria are:

1. Stop when the scheduled time for testing expires

Can satisfy this by doing absolutely nothing!

2. Stop when all the test cases execute without detecting errors -- that is, stop when the test cases are unsuccessful

Subconsciously encourages you to write test cases that have a low probability of detecting errors

Since the goal of testing is to find errors, why not make the completion criterion the detection of some predefined number of errors?

• You might state that a module test of a particular module is not complete until three errors have been discovered

• Number of errors that exist in typical programs at the time that coding is completed (before a code walkthrough or inspection is employed) is approximately 4 to 8 errors per 100 program statements

* This would say that a 2500-line program would contain 100-200 defects
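The 100-200 figure follows directly from the rate of 4 to 8 errors per 100 statements. A short sketch of the arithmetic, assuming one statement per line (the function name is hypothetical):

```python
def expected_defects(statements, low_per_100=4, high_per_100=8):
    """Estimated (low, high) defect count at the end of coding."""
    return (statements * low_per_100 // 100,
            statements * high_per_100 // 100)


print(expected_defects(2500))  # (100, 200)
```

The same estimate can be used to set a target number of errors to find for a module of a given size.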

Best practice is to continue testing until discovery of new defects drops significantly
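One possible way to make "drops significantly" concrete is to compare the recent rate of discovery with the peak rate. The sketch below is only an illustration: the function name, the two-week window, and the 25% threshold are assumptions, not values from Myers.

```python
def should_stop(defects_per_week, window=2, drop_ratio=0.25):
    """Return True when the average of the last `window` weeks is at most
    `drop_ratio` times the peak weekly count seen so far."""
    if len(defects_per_week) <= window:
        return False  # not enough history to judge a trend
    recent = sum(defects_per_week[-window:]) / window
    peak = max(defects_per_week)
    return peak > 0 and recent <= drop_ratio * peak


print(should_stop([12, 15, 9, 3, 2]))  # discovery rate has fallen off
print(should_stop([12, 15, 14, 13]))   # still finding defects steadily
```

A rule like this avoids both common criteria above: it cannot be satisfied by doing nothing, because no defects found means there is no peak to drop from, and it does not reward test cases that fail to find errors.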

22 Independent Testing

Advantages usually noted are...

1. Increased motivation in the testing process

2. Healthy competition with the development organization

3. Removal of the testing process from under the management control of the development organization

4. Advantages of specialized knowledge that independent testers bring to bear on the problem

23