JavaScript: The (Un)covered Parts

Amin Milani Fard, University of British Columbia, Vancouver, BC, Canada ([email protected])
Ali Mesbah, University of British Columbia, Vancouver, BC, Canada ([email protected])

Abstract—Testing JavaScript code is important. JavaScript has grown to be among the most popular programming languages and is extensively used to create web applications on both the client and the server. We present the first empirical study of JavaScript tests to characterize their prevalence, quality metrics (e.g. code coverage), and shortcomings. We perform our study across a representative corpus of 373 JavaScript projects, with over 5.4 million lines of JavaScript code. Our results show that 22% of the studied subjects do not have test code. About 40% of projects with client-side JavaScript do not have any test, while this is only about 3% for the purely server-side JavaScript projects. Tests for server-side code have high quality (in terms of code coverage, test code ratio, test commit ratio, and average number of assertions per test), while tests for client-side code have moderate to low quality. In general, tests written in the Mocha, Tape, Tap, and Nodeunit frameworks have high quality and those written without using any framework have low quality. We scrutinize the (un)covered parts of the code under test to find the root causes of the uncovered code. Our results show that JavaScript tests lack proper coverage for event-dependent callbacks (36%), asynchronous callbacks (53%), and DOM-related code (63%). We believe that it is worthwhile for the developer and research community to focus on testing techniques and tools that achieve better coverage for difficult-to-cover JavaScript code.

Keywords—JavaScript applications; testing; empirical study; test quality; code coverage.

I. INTRODUCTION

JavaScript is currently the most widely used programming language according to a recent survey of more than 56K developers conducted by Stack Overflow [43], as well as an exploration of the programming languages used across GitHub repositories [22]. JavaScript is extensively used to build responsive modern web applications, and is also used to create desktop and mobile applications, as well as server-side network programs. Consequently, testing JavaScript applications and modules is important. However, JavaScript is quite challenging to test and analyze due to some of its specific features. For instance, the complex and dynamic interactions between JavaScript and the Document Object Model (DOM) make it hard for developers to test effectively [32], [29], [19].

To assist developers with writing tests, there exist a number of JavaScript testing frameworks, such as Mocha [6], Jasmine [4], QUnit [9], and Nodeunit [8], each having its own advantages [10]. The research community has also proposed automated testing tools and test generation techniques for JavaScript programs [33], [29], [32], [19], [23], though they are not considerably used by testers and developers yet. Some JavaScript features, such as DOM interactions, event-dependent callbacks, asynchronous callbacks, and closures (hidden scopes), are considered to be harder to test [1], [2], [12], [11], [29], [44]. However, there is no evidence of the extent to which this holds in real-world practice.

In this work, we study JavaScript (unit) tests in the wild from different angles. The results of this study reveal some of the shortcomings and difficulties of manual testing, which provide insights on how to improve existing JavaScript test generation tools and techniques. We perform our study across a representative corpus of 373 popular JavaScript projects, with over 5.4 million lines of JavaScript code. To the best of our knowledge, this work is the first study on JavaScript tests. The main contributions of our work include:

• A large-scale study to investigate the prevalence of JavaScript tests in the wild;
• A tool, called TESTSCANNER, which statically extracts different metrics in our study and is publicly available [18];
• An evaluation of the quality of JavaScript tests in terms of code coverage, average number of assertions per test, test code ratio, and test commit ratio;
• An analysis of the uncovered parts of the code under test to understand which parts are difficult to cover and why.

II. METHODOLOGY

The goal of this work is to study and characterize JavaScript tests in practice. We conduct quantitative and qualitative analyses to address the following research questions:

RQ1: How prevalent are JavaScript tests?
RQ2: What is the quality of JavaScript tests?
RQ3: Which part of the code is mainly uncovered by tests, and why?

A. Subject Systems

We study 373 popular open source JavaScript projects. 138 of these subject systems are the ones used in a study of JavaScript callbacks [21], including 86 of the most depended-on modules in the NPM repository [15] and 52 JavaScript repositories from GitHub Showcases [13] (GitHub Showcases include popular and trending open source repositories organized around different topics). Moreover, we added 234 JavaScript repositories from GitHub with over 4000 stars. The complete list of these subjects and our analysis results are available for download [18]. We believe that this corpus of 373 projects is representative of real-world JavaScript projects as they differ in domain (category), size (SLOC), maturity (number of commits and contributors), and popularity (number of stars and watchers).

TABLE I: Our JavaScript subject systems (60K files, 3.7 M production SLOC, 1.7 M test SLOC, and 100K test cases).
ID | Category | # Subject systems | Ave # JS files | Ave prod SLOC | Ave test SLOC | Ave # tests | Ave # assertions | Ave # stars
C1 | UI Components, Widgets, and Frameworks | 52 | 41 | 4.7K | 2.8K | 235 | 641 | 9.8K
C2 | Visualization, Graphics, and Animation Libraries | 48 | 53 | 10.2K | 3.8K | 425 | 926 | 7.5K
C3 | Web Applications and Games | 33 | 61 | 10.6K | 1.4K | 61 | 119 | 4K
C4 | Software Development Tools | 29 | 67 | 12.7K | 7.8K | 227 | 578 | 6.9K
C5 | Web and Mobile App Design and Frameworks | 25 | 91 | 22.3K | 6.9K | 277 | 850 | 14.4K
C6 | Parsers, Code Editors, and Compilers | 22 | 167 | 27K | 9.5K | 701 | 1142 | 5.5K
C7 | Editors, String Processors, and Templating Engines | 19 | 26 | 4.3K | 1.9K | 102 | 221 | 6.5K
C8 | Touch, Drag&Drop, Sliders, and Galleries | 19 | 10 | 1.9K | 408 | 52 | 72 | 7.9K
C9 | Other Tools and Libraries | 17 | 93 | 9.1K | 7.6K | 180 | 453 | 8.5K
C10 | Network, Communication, and Async Utilities | 16 | 19 | 4.1K | 7.6K | 279 | 354 | 7.6K
C11 | Game Engines and Frameworks | 13 | 86 | 17K | 1.2K | 115 | 293 | 3.5K
C12 | I/O, Stream, and Keyboard Utilities | 13 | 8 | 0.6K | 1K | 40 | 61 | 1.5K
C13 | Package Managers, Build Utilities, and Loaders | 11 | 47 | 3.4K | 5.4K | 200 | 300 | 8.5K
C14 | Storage Tools and Libraries | 10 | 19 | 4K | 7K | 222 | 317 | 5.5K
C15 | Testing Frameworks and Libraries | 10 | 28 | 2.8K | 3.6K | 271 | 632 | 5.7K
C16 | Browser and DOM Utilities | 9 | 45 | 5.6K | 7.1K | 76 | 179 | 5.2K
C17 | Command-line Interface and Shell Tools | 9 | 9 | 2.8K | 1K | 26 | 244 | 2.6K
C18 | Multimedia Utilities | 9 | 11 | 1.6K | 760 | 17 | 97 | 6.2K
C19 | MVC Frameworks | 9 | 174 | 40.1K | 15.2K | 657 | 1401 | 14.2K
Client-side | | 128 | 39 | 8.2K | 3.2K | 343 | 798 | 7.9K
Server-side | | 130 | 63 | 9.4K | 7.2K | 231 | 505 | 6.7K
Client and server-side | | 115 | 73 | 12.7K | 4.7K | 221 | 402 | 7.4K
Total | | 373 | 57 | 10.1K | 4.5K | 263 | 644 | 7.3K

We categorize our subjects into 19 categories using topics from the JSter JavaScript Libraries Catalog [14] and GitHub Showcases [13] for the same or similar projects. Table I presents these categories with average values for the number of JavaScript files (production code), source lines of code (SLOC) for production and test code, number of test cases, and number of stars in the GitHub repository for each category. We used SLOC [17] to count lines of source code, excluding libraries. Overall, we study over 5.4 million (3.7 M production and 1.7 M test) source lines of JavaScript code.

Figure 1 depicts the distribution of our subject systems with respect to client- or server-side code. Those systems that contain server-side components are written in Node.js (https://nodejs.org), a popular server-side JavaScript framework. We apply the same categorization approach as explained in [21]. Some projects, such as MVC frameworks, e.g. Angular, are purely client-side, while most NPM modules are purely server-side. We assume that client-side code is stored in directories such as www, public, static, or client. We also use code annotations such as /* browser:true */ to identify client-side code.

The 373 studied projects include 128 client-side, 130 server-side, and 115 client&server-side projects. While these groups have almost the same size in total, they differ per project category. For instance, subject systems in categories C1 (UI components), C2 (visualization), C8 (touch and drag&drop), C19 (MVC frameworks), and C18 (multimedia) are mainly client-side, and those in categories C4 (software dev tools), C6 (parsers and compilers), C12 (I/O), C13 (package and build managers), C14 (storage), C16 (browser utils), and C17 (CLI and shell) are mainly server-side.

Fig. 1: Distribution of studied subject systems (percentage of client-side, server-side, and client and server-side subjects per category and in total).

B. Analysis

To address our research questions, we statically and dynamically analyze the test suites of our subject programs. To extract some of the metrics in our study, we develop a static analyzer tool, called TESTSCANNER [18], which parses production and test code into an abstract syntax tree using Mozilla Rhino [7]. In the rest of this section we explain the details of our analysis for each research question.

1) Prevalence of tests (RQ1): To answer RQ1, we look for the presence of JavaScript tests written in any framework (e.g. Mocha, Jasmine, or QUnit). Tests are usually located in folders named tests, specs (for instance, Jasmine and Mocha tests are written as specs and are usually located in folders with similar names), or similar.

We further investigate the prevalence of JavaScript tests with respect to subject categories, client/server-side code, popularity (number of stars and watchers), maturity (number of commits and contributors), project size (production SLOC), and testing frameworks. To distinguish testing frameworks, we analyze package management files (such as package.json), task runner and build files (such as grunt.js and gulpfile.js), and the test files themselves.
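As an illustration of this heuristic, a project's package management file often names the testing framework directly. The fragment below is a hypothetical example (the project name and version ranges are made up, not taken from the studied subjects); both the npm test script and the devDependencies point to Mocha:

  {
    "name": "example-project",
    "scripts": {
      "test": "mocha test/"
    },
    "devDependencies": {
      "mocha": "^3.2.0",
      "chai": "^3.5.0"
    }
  }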

2) Quality of tests (RQ2): To address RQ2, for each subject with tests we compute four quality metrics, as follows:

Code coverage. Coverage is generally known as an indicator of test quality. We compute statement, branch, and function coverage for JavaScript code using JSCover [5] (for tests that run in the browser) and Istanbul [3]. To calculate coverage of minified JavaScript code, we beautify it prior to executing the tests. We also exclude dependencies, such as files under the node_modules directory, and libraries (unless the subject system is itself a library).

Average number of assertions per test. Code coverage does not directly imply test suite effectiveness [24], while assertions have been shown to be strongly correlated with it [49]. Thus, TESTSCANNER also computes the average number of assertions per test case as a test suite quality metric. Our analysis tool detects usage of well-known assertion libraries such as assert.js, should.js, expect.js, and chai.

Test code ratio. This metric is defined as the ratio of test SLOC to production and test SLOC. A program with a high test code ratio may have a higher quality test suite.

Test commit ratio. This metric is the ratio of test commits to total commits. A higher test commit ratio may indicate more mature and higher quality tests. We assume that every commit that touches at least one file in a folder named test, tests, spec, or specs is a test commit. In the rare cases where tests are stored elsewhere, such as the root folder, we manually extract the number of test commits by looking at the project's GitHub repository page and counting commits on test files.

We investigate these quality metrics with respect to subject categories, client/server-side code, and testing frameworks.
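For illustration, the following is a sketch of a typical Mocha test case using the chai assertion library; the module under test (list) and its behaviour are hypothetical. TESTSCANNER would count it as one test case containing two assertions:

  const { expect } = require('chai');
  const list = require('../src/list');   // hypothetical module under test

  describe('list.add()', function () {
    it('appends an item and updates the length', function () {
      const l = list.create();
      l.add('foo');
      expect(l.items).to.deep.equal(['foo']);  // assertion 1
      expect(l.length).to.equal(1);            // assertion 2
    });
  });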
3) (Un)covered code (RQ3): Code coverage is a widely accepted test quality indicator; thus, finding the root cause of why a particular statement is not covered by a test suite can help in writing higher quality tests. Some generic possible cases for an uncovered (missed) statement s are as follows:

1) s belongs to an uncovered function f, where
   a) f has no calling site in either the production or the test code. In this case, f could be (1) a callback function sent to a callback-accepting function (e.g., setTimeout()) that was never invoked, or (2) an unused utility function that was meant to be used in previous or future releases. Such unused code can be considered a code smell [28]. Consequently, we cannot pinpoint such an uncovered function to a particular cause.
   b) the calling site for f in the production code was never executed. This can happen if (1) f is used as a callback (e.g. event-dependent or asynchronous) that was never invoked, (2) the call to f was never reached because of an earlier return statement or an exception, or the function call falls in a never-met condition branch.
   c) f is an anonymous function. Possible reasons that f was not covered can be that (1) f is used as a callback that was never invoked (e.g. an event-dependent callback whose required event was not triggered, or an asynchronous callback for whose response the test did not wait), (2) f is a self-invoking function that was never executed, or (3) f is assigned to a variable and that variable was never used or its usage was not executed.
2) s belongs to a covered function f, where
   a) the execution of f was terminated, by a return statement or an exception, prior to reaching s; or
   b) s falls in a never-met condition in f (e.g. browser- or DOM-dependent statements).
3) The test case responsible for covering s was not executed due to a test execution failure.
4) s is dead (unreachable) code.

Uncovered statement in uncovered function ratio. If an uncovered statement s belongs to an uncovered function f, making f called could possibly cover s as well. This is important especially if f needs to be called in a particular way, such as through triggering an event. In this regard, our tool uses coverage report information (in json or lcov format) to calculate the ratio of the uncovered statements that fall within uncovered functions over the total number of uncovered statements. If this value is large, it indicates that the majority of uncovered statements belong to uncovered functions, and thus code coverage could be increased to a high extent if the enclosing functions were called by a test case.
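The following is a minimal sketch of how such a ratio could be derived from an Istanbul-style JSON coverage report. It is an illustrative approximation, not the TESTSCANNER implementation, and it assumes that fnMap[id].loc spans the entire body of each function in the report:

  const fs = require('fs');

  // Ratio of uncovered statements that fall inside never-executed functions.
  function usufRatio(coverageFile) {
    const report = JSON.parse(fs.readFileSync(coverageFile, 'utf8'));
    let uncovered = 0;
    let uncoveredInUncoveredFn = 0;

    for (const file of Object.keys(report)) {
      const { s, statementMap, f, fnMap } = report[file];
      // Line ranges of functions that were never executed.
      const deadRanges = Object.keys(f)
        .filter((id) => f[id] === 0)
        .map((id) => fnMap[id].loc);

      for (const id of Object.keys(s)) {
        if (s[id] > 0) continue;                 // statement was covered
        uncovered++;
        const loc = statementMap[id];
        const inDeadFn = deadRanges.some(
          (r) => loc.start.line >= r.start.line && loc.end.line <= r.end.line
        );
        if (inDeadFn) uncoveredInUncoveredFn++;
      }
    }
    return uncovered === 0 ? 0 : uncoveredInUncoveredFn / uncovered;
  }

  console.log(usufRatio('coverage/coverage-final.json'));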

Hard-to-test JavaScript code. Some JavaScript features, such as DOM interactions, event-dependent callbacks, asynchronous callbacks, and closures (hidden scopes), are considered to be harder to test [1], [2], [12], [11], [29], [44]. In this section we explain four main kinds of hard-to-test code, with an example code snippet depicted in Figure 2. We also refine the statement and function coverage metrics to investigate these hard-to-test constructs separately and in detail. To measure these coverage metrics, TESTSCANNER maps a given coverage report to the locations of hard-to-test code.

 1 function setFontSize(size){
 2   return function() {
 3     // this is an anonymous closure
 4     document.body.style.fontSize = size + 'px';
 5   };
 6 }
 7 var small = setFontSize(12);
 8 var large = setFontSize(16);
 9 ...
10 function showMsg() {
11   // this is an async callback
12   alert("Some message goes here!");
13 }
14 ...
15 $("#smallBtn").on("click", small);
16 $("#largeBtn").on("click", large);
17 $("#showBtn").on("click", function() {
18   // this is an event-dependent anonymous callback
19   setTimeout(showMsg, 2000);
20   $("#photo").fadeIn("slow", function() {
21     // this is an anonymous callback
22     alert("Photo animation complete!");
23   });
24 });
25 ...
26 checkList = $("#checkList");
27 checkList.children("input").each(function () {
28   // this is DOM-related code
29   if ($(this).is(':checked')) {
30     ...
31   } else {
32     ...
33   }
34 });

Fig. 2: A hard-to-test JavaScript code snippet.

DOM related code coverage. In order to unit test JavaScript code with DOM read/write operations, a DOM instance has to be provided as a test fixture in the exact structure expected by the code under test. Otherwise, the test case can terminate prematurely due to a null exception. Writing such DOM-based fixtures can be challenging due to the dynamic nature of JavaScript and the hierarchical structure of the DOM [29]. For example, to cover the if branch at line 29 in Figure 2, one needs to provide a DOM instance containing a #checkList element whose input children include a checked checkbox. To cover the else branch, a DOM instance with an unchecked checkbox child is required. If such fixtures are not provided, $("#checkList") does not find the expected element, and thus checkList.children causes a null exception and the test case terminates.

DOM related code coverage is defined as the fraction of covered over total DOM related statements. A DOM related statement is a statement that can affect or be affected by DOM interactions, such as a DOM API usage. To detect DOM related statements, TESTSCANNER extracts all DOM API usages in the code (e.g. getElementById, createElement, appendChild, addEventListener, $, and innerHTML) and their forward slices. Forward slicing is applied to the variables that were assigned a DOM element or attribute. For example, the forward slice of checkList at line 26 in Figure 2 is lines 27-34. A DOM API could be located in (1) a return statement of a function f, (2) a conditional statement, (3) a function call (as an argument), (4) an assignment statement, or (5) other parts within a scope. In case (1), all statements that call f are considered DOM related. In case (2), the whole conditional statement (the condition and its body) is considered DOM related. In case (3), the statements in the called function that use the DOM input are considered DOM related. In the other cases, the statement containing the DOM API is DOM related.
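A minimal sketch (not the authors' setup) of a Mocha test that provides the kind of DOM fixture needed to execute the DOM-dependent code of Figure 2 is shown below. It assumes the jsdom and jquery npm packages are available; the fixture mirrors the structure that lines 26-34 expect:

  const assert = require('assert');
  const { JSDOM } = require('jsdom');

  describe('checkList processing (Figure 2, lines 26-34)', function () {
    it('covers the if branch at line 29 with a checked checkbox fixture', function () {
      const dom = new JSDOM('<ul id="checkList"><input type="checkbox" checked></ul>');
      const $ = require('jquery')(dom.window);   // jQuery bound to the fixture window

      const checkList = $('#checkList');
      let checkedCount = 0;
      checkList.children('input').each(function () {
        if ($(this).is(':checked')) {             // the condition from line 29
          checkedCount++;
        }
      });
      assert.strictEqual(checkedCount, 1);
    });
  });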
Event-dependent callback coverage. The execution of some JavaScript code may require triggering an event, such as clicking on a particular DOM element. For instance, it is very common in client-side JavaScript code to have an (anonymous) function bound to an element's event, e.g. a click, which has to be simulated. The anonymous function in lines 17-24 is an event-dependent callback function. Such callback functions are only passed and invoked if the corresponding event is triggered. In order to trigger an event, testers can use methods such as jQuery's .trigger(event, data, ...) or .emit(event, data, ...) of the Node.js EventEmitter. Note that if an event needs to be triggered on a DOM element, a proper fixture is required, otherwise the callback function cannot be executed.

Event-dependent callback coverage is defined as the fraction of covered over total event-dependent callback functions. In order to detect event-dependent callbacks, our tool checks whether a callback function is passed to an event method such as bind, click, focus, hover, keypress, emit, addEventListener, onclick, onmouseover, or onload.

Asynchronous callback coverage. Callbacks are functions passed as an argument to another function, to be invoked either immediately (synchronous) or at some point in the future (asynchronous) after the enclosing function returns. Callbacks are particularly useful to perform non-blocking operations. Function showMsg in lines 10-13 is an asynchronous callback function as it is passed to the asynchronous setTimeout() API call. Testing asynchronous callbacks requires waiting until the callback is called, otherwise the test would probably finish unsuccessfully before the callback is invoked. For instance, QUnit's asyncTest allows tests to wait for asynchronous callbacks to be called.

Asynchronous callback coverage is defined as the fraction of covered over total asynchronous callback functions. Similar to a study of callbacks in JavaScript [21], if a callback argument is passed into a known deferring API call, we count it as an asynchronous callback. TESTSCANNER detects a set of asynchronous APIs including network calls (e.g. XMLHTTPRequest.open), DOM events (e.g. onclick), timers (setImmediate, setTimeout, setInterval, and process.nextTick), and I/O (e.g. APIs of fs, http, and net).
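To make these two challenges concrete, the following Mocha sketch shows one test that waits for an asynchronous callback via done() and one that explicitly emits an event so that its handler runs. The emitter, delay, and handlers are illustrative, not taken from the studied subjects:

  const assert = require('assert');
  const EventEmitter = require('events');

  describe('asynchronous and event-dependent callbacks', function () {
    it('covers a callback scheduled with setTimeout', function (done) {
      let called = false;
      setTimeout(function showMsg() {   // asynchronous callback under test
        called = true;
        assert.strictEqual(called, true);
        done();                         // the test finishes only after the callback ran
      }, 10);
    });

    it('covers an event-dependent callback by emitting the event', function () {
      const button = new EventEmitter();
      let clicks = 0;
      button.on('click', function () { clicks++; });  // event-dependent callback
      button.emit('click');                           // trigger the event explicitly
      assert.strictEqual(clicks, 1);
    });
  });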
Closure function coverage. Closures are nested functions that make it possible to create a hidden scope that shields variables and functions from the global scope in JavaScript. A closure function, i.e., the inner function, has access to all parameters and variables (except for this and arguments) of the outer function, even after the outer function has returned [20]. The anonymous function in lines 2-5 is an instance of a closure.

Such hidden functions cannot be called directly in a test case and thus testing them is challenging. In fact, writing a unit test for a closure function without code modification is impossible. Simple solutions such as making them public or putting the test code inside the closure are not good software engineering practices. One approach to test such private functions is to add code inside the closure that stores references to its local variables and functions inside objects and returns them to the outer scope [2]. Closure function coverage is defined as the fraction of covered over total closure functions.

Average number of function calls per test. Some code functionalities depend on the execution of a sequence of function calls. For instance, in a shopping application, one needs to add items to the cart prior to checking out. We perform a correlation analysis between the average number of unique function calls per test and code coverage. We also investigate whether JavaScript unit tests are mostly written at the single-function level or whether they execute sequences of function calls.

As a hedged sketch of the closure-testing approach cited above (returning references to closure internals to the outer scope [2]), the hypothetical module below exposes its private function through a test-only hook; the names counter and __testHooks are illustrative:

  var counter = (function () {
    var count = 0;                    // private state in the closure
    function increment(by) {          // private function, normally unreachable from tests
      count += by;
      return count;
    }
    return {
      tick: function () { return increment(1); },
      // Exposed purely for testing; gives tests a handle on the hidden scope.
      __testHooks: { increment: increment, reset: function () { count = 0; } }
    };
  })();

  // A test can now exercise the private function directly:
  // assert.strictEqual(counter.__testHooks.increment(5), 5);

III. RESULTS

A. Prevalence of Tests (RQ1)

The stacked bar charts in Figure 3(a) depict the percentage of JavaScript tests per system category (Table I), per client/server side, and in aggregate; the height of each bar indicates the percentage of subjects in that category. In total, among the 373 studied subjects, 83 (i.e., 22%) do not have JavaScript tests. The majority (78%) of subjects have at least one test case.

Finding 1: 22% of the subject systems that we studied do not have any JavaScript test, and 78% have at least one test case.

As shown in Figure 3(b), amongst subjects with tests, the majority of tests are written in Mocha (38%), Jasmine (19%), and QUnit (18%). 6% do not follow any particular framework and have their own tests. Less commonly used frameworks are Tap (5%), Tape (4%), Nodeunit (3%), Vows (3%), and others (4%), including Jest, Evidence.js, Doh, CasperJS, Ava, UTest, TAD, and Lab. We also observe that 3 repositories have tests written in two testing frameworks: 2 projects (server and client-server) with Nodeunit+Mocha tests, and one (client-server) with Jasmine+QUnit tests.

Finding 2: The most prevalently used test frameworks for JavaScript unit testing are Mocha (38%), Jasmine (19%), and QUnit (18%).

Fig. 3: Distribution of JavaScript tests: (a) distribution within all subjects; (b) testing frameworks distribution.

Fig. 4 panels: (a) number of stars (quartiles 1-4K, 4K-5.6K, 5.6K-8.9K, 8.9K-92K); (b) number of watchers (1-151, 151-262, 262-444, 444-6K); (c) number of commits (1-251, 254-701, 710-1.8K, 1.8K-27.6K); (d) number of contributors (1-19, 19-46, 47-102, 102-1.4K).

Fig. 4: Percentage of subjects with tests per quartile with respect to popularity (number of stars and watchers) and maturity (number of commits and contributors).

We also investigate the prevalence of UI tests and observe that only 12 projects (i.e., 3%) among all 373 have UI tests, of which 9 are written using Webdriverio and Selenium webdriver, and 3 use CasperJS. 7 of these projects are client and server side, 3 are client-side, and 2 are server-side. One of these subjects does not have any JavaScript unit test.

Finding 3: Only 3% of the studied repositories have functional UI tests.

Almost all (95%) of the purely server-side JavaScript projects have tests, while this holds for 61% of client-side and 76% of client&server-side ones. Note that the number of subjects in each category is not very different (i.e., 128 client-side, 130 server-side, and 115 client and server-side). Interestingly, the distribution of test frameworks looks very similar for client-side and client-server side projects.

As shown in Figure 3(a), all subject systems in categories C6 (parsers and compilers), C12 (I/O), C13 (package and build managers), C14 (storage), C19 (MVC frameworks), and C17 (CLI and shell) have JavaScript unit tests. Projects in all of these categories, except for C19, are mainly server-side, as depicted in Figure 1. In contrast, many of the subjects in categories C1 (UI components), C3 (web apps), C8 (touch and drag&drop), and C18 (multimedia), which are mainly client-side, do not have tests. Thus we can deduce that JavaScript tests are written more often for server-side code than for client-side, or client and server-side, code.

Finding 4: While almost all subjects (95%) in the server-side category have tests, about 40% of subjects in the client-side and client-server side categories do not have tests.

We believe the higher prevalence of tests for server-side code can be attributed to (1) the difficulties in testing client-side code, such as writing proper DOM fixtures or triggering events on DOM elements, and (2) the use of time-saving test scripts in most Node.js based projects, such as the npm test script that is included by default when initializing a new package.json file. This pattern is advocated in the Node.js community [16], and thus many server-side JavaScript projects, such as NPM modules, have test code.

(a) Statement coverage. (b) Branch coverage. (c) Function coverage.

Fig. 5: Boxplots of the code coverage of the executed JavaScript tests. Mean values are shown with (*).

We also consider how the popularity (number of stars and watchers) and maturity (number of commits and contributors) of subject systems relate to the prevalence of unit tests. Figure 4 shows the percentage of subjects with tests in each quartile. As popularity and maturity increase, the percentage of subjects with tests increases as well.

Fig. 6: Average number of assertions per test.

B. Quality of Tests (RQ2)

Code coverage. Calculating code coverage requires executing tests on a properly deployed project. In our study, however, we faced a number of projects with failures in building, deploying, or running tests. We tried to resolve such problems by quick changes in build/task configuration files or by retrieving a later version (i.e., some days after fetching the previous release). In most cases the build failure was due to errors in dependent packages or their absence. We could finally calculate coverage for 231 out of 290 (about 80%) subjects with tests. We could not properly deploy or run tests for 44 subject systems (41 with test run failures, freezes, or breaks, and 3 with build and deployment errors), and could not get coverage reports for 15 projects with complex test configurations.

Boxplots in Figure 5 show that in total the tests have a median of 83% statement coverage, 84% function coverage, and 69% branch coverage. Tests for server-side code have higher coverage in all aspects compared to those for client-side code.

We narrow down our coverage analysis to the different subject categories. As depicted in Table II, subjects in categories C6 (parsers and compilers), C10 (network and async), C12 (I/O), C13 (package and build managers), C14 (storage), C15 (testing frameworks), and C19 (MVC frameworks) on average have higher code coverage. Projects in these categories are mainly server-side. In contrast, subjects in categories C2 (visualization), C3 (web apps), C8 (touch and drag&drop), C11 (game engines), C17 (CLI and shell), and C18 (multimedia) have lower code coverage. Note that the subjects in these categories are mainly client-side.

Finding 5: The studied JavaScript tests have a median of 83% statement coverage, 84% function coverage, and 69% branch coverage. Tests for server-side code have higher coverage in all aspects compared to those for client-side code.

Table II also depicts the achieved coverage per testing framework. Tests written in Tape, Tap, and Mocha generally have higher code coverage. The majority of server-side JavaScript projects are tested using these frameworks. On the other hand, tests written in QUnit, which is used more often for the client-side than the server-side, generally have lower code coverage. Developers that use their own style of testing without popular frameworks write tests with the poorest coverage.

Finding 6: Tests written in the Tape, Tap, and Mocha frameworks generally have higher coverage compared to those written in QUnit or Nodeunit and those written without using any test framework.

Average number of assertions per test. Figure 6 depicts boxplots of the average number of assertions per test case. While the median values are very similar (about 2.2) in all cases, server-side code has a slightly higher mean value (3.16) compared to client-side (2.71). As shown in Table II, subjects in categories C3 (web apps), C11 (game engines), C15 (testing frameworks), C17 (CLI and shell), C18 (multimedia), and C19 (MVC frameworks) on average have a higher average number of assertions per test compared to the others. Interestingly, among these categories only C15 and C19 also have high code coverage, while it is low for the rest.

Finding 7: The studied test suites have a median of 2.19 and a mean of 2.96 for the average number of assertions per test. These values do not differ much between server-side and client-side code.
Also results shown in Table II indicate that tests written in QUnit, Tape, Nodeunit, other frameworks (e.g. Jest, CasperJS, TABLE II: Test quality metrics average values. Statement Branch Function Ave # Test Test coverage coverage coverage assertions code commit per test ratio ratio C1 77% 57% 76% 2.83 0.41 0.16 C2 67% 52% 65% 2.72 0.28 0.14 C3 60% 38% 58% 3.75 0.88 0.14 C4 79% 68% 78% 2.50 0.58 0.24 C5 75% 63% 75% 2.53 0.52 0.21 C6 87% 79% 88% 2.53 0.47 0.24 C7 80% 67% 72% 2.51 0.46 0.22 C8 64% 47% 60% 2.04 0.35 0.12 C9 73% 58% 69% 2.67 0.49 0.23 C10 91% 79% 90% 2.73 0.72 0.24 C11 64% 45% 57% 3.41 0.18 0.11 C12 90% 77% 89% 2.36 0.59 0.20

Subject category C13 86% 67% 84% 2.27 0.59 0.18 C14 88% 77% 87% 2.74 0.62 0.26 C15 81% 69% 79% 5.79 0.59 0.25 C16 78% 67% 79% 1.67 0.49 0.29 C17 67% 54% 63% 8.32 0.47 0.21 C18 60% 31% 62% 4.42 0.31 0.16 C19 81% 67% 80% 3.58 0.53 0.21 Mocha 82% 70% 79% 2.39 0.49 0.20 Jasmine 74% 60% 75% 1.93 0.41 0.21 QUnit 71% 54% 71% 3.93 0.41 0.16 Own test 61% 41% 58% 5.99 0.30 0.16 Tap 89% 80% 89% 1.56 0.58 0.21 Tape 93% 81% 94% 2.93 0.70 0.18 Others 80% 65% 77% 5.60 0.46 0.24

Testing framework Nodeunit 74% 63% 72% 6.20 0.57 0.24 Vows 74% 66% 72% 1.92 0.55 0.27 Client 70% 53% 70% 2.71 0.36 0.16 Server 85% 74% 83% 3.16 0.58 0.23 C&S 72% 56% 70% 2.9 0.4 0.18 Total 78% 64% 76% 2.96 0.46 0.2

Also, the results shown in Table II indicate that tests written in QUnit, Tape, Nodeunit, other frameworks (e.g. Jest, CasperJS, and UTest), and those written without using a framework have on average more assertions per test. The majority of server-side JavaScript projects are tested using these frameworks. Again we observe that only for tests written in the Tape framework is code coverage also high, while it is low for the rest.

Test code ratio. Figure 7 shows the test to total (production and test) code ratio comparison. The median and mean of this ratio are about 0.6 for server-side projects and about 0.35 for client-side ones. As shown in Table II, on average the subjects with a higher test code ratio belong to categories C3, C4, C5, C10, C12, C13, C14, C15, and C19, while those in C2, C8, C11, and C18 have a lower test code ratio. Also, tests written in Tap, Tape, Nodeunit, and Vows have a higher test code ratio, while tests written without using any framework have a lower test code ratio.

Fig. 7: Test to total code ratio.

We further study the relationship between test code ratio and total code coverage (average of statement, branch, and function coverage) through Spearman's correlation analysis (the non-parametric Spearman's correlation coefficient measures the monotonic relationship between two continuous random variables and does not require the data to be normally distributed). The result shows that there exists a moderate to strong correlation (ρ = 0.68, p = 0) between test code ratio and code coverage.

Finding 8: Tests for server-side code have a higher test code ratio (median and mean of about 0.6) compared to client-side code (median and mean of about 0.35). Also, there exists a moderate to strong correlation (ρ = 0.68, p = 0) between test code ratio and code coverage.

Test commit ratio. Figure 8 depicts the test to total (production and test) commit ratio comparison. The median and mean of this ratio are about 0.25 for server-side projects and about 0.15 for client-side ones. As shown in Table II, on average the subjects with a higher test commit ratio belong to categories C4, C6, C9, C10, C14, C15, and C16, while those in C1, C2, C3, C8, C11, and C18 have a lower test commit ratio.

Fig. 8: Test to total commits ratio.

Subject category C13 84% 71% 74% 60% 86% 85% 2.86 0.49 C14 87% 66% 36% 89% 88% – 2.98 0.62 C15 79% 70% 39% 62% 81% 58% 2.16 0.59 C16 79% 40% 5% 43% 78% 48% 2.69 0.39 C17 63% 7% 5% 56% 67% – 2.42 0.65 C18 62% – 0% 89% 60% 40% 2.19 0.86 C19 81% 61% 47% 76% 82% 62% 2.92 0.53 Mocha 79% 50% 34% 71% 82% 58% 3.62 0.56 Jasmine 75% 65% 34% 69% 74% 62% 2.28 0.71 QUnit 71% 53% 28% 76% 71% 68% 3.35 0.66 Own test 58% 45% 26% 66% 61% 51% 1.78 0.63 Tap 89% 68% 87% 94% 89% – 2.52 0.24 Tape 94% 79% 65% 92% 93% 88% 3.19 0.22 Others 77% 33% 30% 66% 80% 79% 2.14 0.48

Testing framework Nodeunit 72% 53% 63% 74% 74% 52% 4.08 0.62 Vows 72% 60% 38% 79% 74% 0% 1.60 0.6 Client 70% 46% 25% 69% 70% 66% 2.96 0.68 Server 83% 64% 48% 82% 85% 67% 3.19 0.45 C&S 70% 48% 29% 69% 72% 57% 2.93 0.69 Total 76% 53% 36% 74% 78% 63% 3.05 0.57

C11, and C18 have lower test commit ratio. Also tests written f, making f called could possibly cover c as well. As in Nodeunit, Vows, and other frameworks (e.g. Jest, CasperJS, described in Section II-B3, we calculate the ratio of uncovered and UTest) have higher test commit ratio while tests written in statements that fall within uncovered functions over the total QUnit or without using any framework have lower test commit number of uncovered statements. ratio. Table III shows average values for this ratio (USUF). The Similar to the correlation analysis for test code ratio, we mean value of USUF ratio is 0.57 in total, 0.45 for server- study the relationship between test commit ratio and total code side projects, and about 0.7 for client-side ones. This indicate coverage. The result indicates that there exists a moderate to that the majority of uncovered statements in client-side code low correlation (ρ = 0.49, p = 0) between test commit ratio belong to uncovered functions, and thus code coverage could and code coverage. be increased to a high extent if the enclosing function could Finding 9: While test commit ratio is relatively high for be called during test execution. server-side projects (median and mean of about 0.25), it is Finding 10: A large portion of uncovered statements fall moderate in total and relatively low for client-side projects in uncovered functions for client-side code (about 70%) (median and mean of about 0.15). Also there exists a compared to server-side code (45%). moderate to low correlation (ρ = 0.49, p = 0) between test commit ratio and code coverage. Hard-to-test-function coverage. We measure coverage for hard-to-test functions as defined in Section II-B3. While the average function coverage in total is 76%, the average C. (Un)covered Code (RQ3) event-dependent callback coverage is 36% and the average As explained earlier in Section II-B3, one possible root asynchronous callback coverage is 53%. The average value of cause for uncovered code is that the responsible test code closure function coverage in total is 74% and for server-side was not executed. In our evaluation, however, we observed subjects is 82% while it is 69% for client-side ones. that for almost all the studied subjects, test code had very Finding 11: On average, JavaScript tests have low coverage high coverage meaning that almost all statements in test code for event-dependent callbacks (36%) and asynchronous call- were executed properly. Thus the test code coverage does not backs (53%). Average values for client-side code are even contribute in the low coverage of production code. worse (25% and 46% respectively). The average, closure Uncovered statement in uncovered function (USUF) ratio. function coverage is 74%. If an uncovered code c belongs to an uncovered function We measure the impact of tests with event triggering meth- Node.js community [16]. To assist developers with testing ods on event-dependent callback coverage, and writing async their JavaScript code, we believe that it is worthwhile for the tests on asynchronous callback coverage through correlation research community to invest on developing test generation analysis. The results show that there exists a weak correlation techniques in particular for the client-side code, such as [33], (ρ = 0.22) between number of event triggers and event- [29], [32]. 
dependent callback coverage, and a very weak correlation (ρ = For RQ2, the results indicate that in general, tests written 0.1) between number of asynchronous tests and asynchronous for mainly client-side subjects in categories C2 (visualization), callback coverage. C8 (touch and drag&drop), C11 (game engines), and C18 Finding 12: There is no strong correlation between number (multimedia) have lower quality. Compared to the client-side of event triggers and event-dependent callback coverage. projects, tests written for the server-side have higher quality Also number of asynchronous tests and asynchronous call- in terms of code coverage, test code ratio, and test commit back coverage are not strongly correlated. ratio. The branch coverage in particular for client-side code is low, which can be ascribed to the challenges in writing tests This was contrary to our expectation for higher correlations, for DOM related branches. We investigate reasons behind the however, we observed that in some cases asynchronous tests code coverage difference in Section III-C. The higher values and tests that trigger events were written to merely target spe- for test code ratio and test commit ratio can also be due to the cific parts and functionalities of the production code without fact that writing tests for server-side code is easier compared covering most asynchronous or event-dependent callbacks. to client-side. DOM related code coverage. On average, JavaScript tests Developers and testers could possibly increase code cover- have a moderately low coverage of 63% for DOM-related age of their tests by using existing JavaScript test generator code. We also study the relationship of existence of DOM tools, such as Kudzu [41], ARTEMIS [19], JALANGI [42], fixtures and DOM related code coverage through correlation SymJS [27], JSEFT [32], and CONFIX [29]. Tests written in analysis. The result shows that there exists a correlation of ρ Mocha, Tap, Tape, and Nodeunit generally have higher test = 0.4, p = 0 between having DOM fixtures in tests and DOM quality compared to other frameworks and tests that do not related code coverage. Similar to the cases for event-dependent use any testing framework. In fact developers that do not write and async callbacks, we also observed that DOM fixtures were their test by leveraging an existing testing framework write mainly written for executing a subset of DOM related code. low quality tests almost in all aspects. Thus we recommend JavaScript developers community to use a well-maintained and Finding 13: On average, JavaScript tests lack proper cov- mature testing framework to write their tests. erage for DOM-related code (63%). Also there exists a As far as RQ3 is concerned, our study shows that JavaScript moderately low correlation (ρ = 0.4) between having DOM tests lack proper coverage for event-dependent callbacks, asyn- fixtures in tests and DOM related code coverage. chronous callbacks, and DOM-related code. Since these parts of code are hard to test they can be error prone and thus Average number of function calls per test. As explained requires effective targeted tests. For instance a recent empirical in Section II-B3, we investigate number of unique function study [36] reveals that the majority of reported JavaScript bugs calls per test. The average number of function calls per test and the highest impact faults are DOM-related. 
has a mean value of about 3 in total and also across server- It is expected that using event triggering methods in tests, in- side and client-side code. We further perform a correlation crease coverage for event-dependent callbacks, asynchronous analysis between the average number of function calls per test callbacks, and DOM-related statements. However, our results and total code coverage. The result shows that there exists a do not show a strong correlation to support this. Our manual weak correlation (ρ = 0.13, p = 0) between average number analysis revealed that tests with event triggering methods, of function calls per test and code coverage. async behaviours, and DOM fixtures are mainly written to Finding 14: On average, there are about 3 function calls cover only particular instances of event-dependent callbacks, to production code per test case. The average number of asynchronous callbacks, or DOM-related code. This again can function calls per test is not strongly correlated with code imply difficulties in writing tests with high coverage for such coverage. hard-to-test code. We believe that there is a research potential in this regard D. Discussion for proposing test generation techniques tailored to such uncovered parts. While most current test generation tools for Implications. Our findings regarding RQ1 indicate that the JavaScript produce tests at single function level, in practice majority (78%) of studied JavaScript projects and in particular developers often write tests that invoke about 3 functions per popular and trending ones have at least one test case. This test on average. It might also worth for researchers to develop indicates that JavaScript testing is getting attention, however, test generation tools that produce tests with a sequence of it seems that developers have less tendency to write tests for function calls per test case. client-side code as they do for the server-side code. Possible Finally, we observed that UI tests are much less prevalent reasons could be difficulties in writing proper DOM fixtures in the studied JavaScript projects. Our investigation of the or triggering events on DOM elements. We also think that coverage report did not show a significant coverage increase the high percentage of test for server-side JavaScript can on the uncovered event-dependent callbacks or DOM-related be ascribed to the testing pattern that is advocated in the code between UI and unit tests. Since UI tests do not need DOM fixture generation, they should be able to trigger more vulnerabilities in JavaScript have also been studied on remote of the UI events, compared to code level unit tests. It would be JavaScript inclusions [35], [47], cross-site scripting (XSS) interesting to further investigate this in JavaScript applications [46], and privacy violating information flows [25]. Milani Fard with large UI tests. et al. [28] studied code smells in JavaScript code. Nguyen et Test effectiveness. Another test quality metric that is interest- al. [34] performed usage patterns mining in JavaScript web ing to investigate is test effectiveness. An ideal effective test applications. suite should fail if there is a defect in the code. Mutation score, Researchers also studied test cases and mining test suites i.e., the percentage of killed mutants over total non-equivalent in the past. Inozemtseva et al. [24] found that code coverage mutants, is often used as an estimate of defect detection does not directly imply the test suite effectiveness. 
Zhang et capability of a test suite. In fact it has been shown that there al. [49] analyzed test assertions and showed that existence of exists a significant correlation between mutant detection and assertions is strongly correlated with test suite effectiveness. real fault detection [26]. In this work, however, we did not Vahabzadeh et al. [45] studied bugs in test code. Milani Fard consider mutation score as a quality metric as it was too costly et al. proposed Testilizer [30] that mines information from to generate mutants for each subject and execute the tests existing test cases to generate new tests. Zaidman et al. [48] on each of them. We believe that it is worthwhile to study investigated co-evolution of production and test code. the effectiveness of JavaScript tests using mutation testing These work, however, did not study JavaScript tests. Related techniques, such as Mutandis [31], which guides mutation to our work, Mirshokraie et al. [31] presented a JavaScript generation towards parts of the code that are likely to affect mutation testing approach and as part of their evaluation, the program output. This can help to find out which aspects assessed mutation score for test suites of two JavaScript of code are more error-prone and not well-tested. Apart from libraries. To the best of our knowledge, our work is the first test quality evaluation based on mutation score, studying (large scale) study on JavaScript tests and in particular their JavaScript bug reports [37] and investigating bug locations, quality and shortcomings. can give us new insights for developing more effective test V. CONCLUSIONSAND FUTURE WORK generation tools. JavaScript is heavily used to build responsive client-side Threats to validity. With respect to reproducibility of the web applications as well as server-side projects. While some results, our tool and list of the studied subjects are publicly JavaScript features are known to be hard to test, no empirical available [18]. Regarding the generalizability of the results to study was done earlier towards measuring the quality and other JavaScript projects, we believe that the studied set of coverage of JavaScript tests. This work presents the first empir- subjects is representative of real-world JavaScript projects as ical study of JavaScript tests to characterize their prevalence, they differ in domain (category), size (SLOC), maturity (num- quality metrics, and shortcomings. ber of commits and contributors), and popularity (number of We found that a considerable number of JavaScript projects stars and watchers). With regards to the subject categorization, do not have any tests and this is in particular for projects with we used some existing categories proposed by JSter Catalog client-sideJavaScript code. On the other hand, almost all purely [14] and GitHub Showcases [13]. server-side JavaScript projects have tests and the quality of There might be case that TESTSCANNER cannot detect those tests are higher compared to client-side tests. On average, a desired pattern in the code as it performs complex static JavaScript tests lack proper coverage for event-dependent code analysis for detecting DOM-related statements, event- callbacks, asynchronous callbacks, and DOM-related code. dependent callbacks, and asynchronous APIs. 
To mitigate this The results of this study can be used to improve JavaScript threat, we made a second pass of manual investigation through test generation tools in producing more effective test cases that such code patterns using grep with regular expressions in target hard-to-test portions of the code. We also plan to evalu- command line and manually validated random cases. Such ate effectiveness of JavaScript test by measuring their mutation a textual search within JavaScript files through grep was score, which reveals the quality of written assertions. Another especially done for a number of projects with parsing errors in possible direction could be designing automated JavaScript their code for which TESTSCANNER cannot generate a report code refactoring techniques towards making the code more or the report would be incomplete. Since our tool statically testable and maintainable. analyzes test code to compute the number of function calls per test, it may not capture the correct number of calls that ACKNOWLEDGMENT happen during execution. While dynamic analysis could help This work was supported by the National Science and with this regard, it can not be used for the unexecuted code Engineering Research Council of Canada (NSERC) through and thus is not helpful to analyze uncovered code. its Strategic Project Grants programme and Alexander Graham IV. RELATED WORK Bell Canada Graduate Scholarship.

There are number of previous empirical studies on REFERENCES JavaScript. Ratanaworabhan et al. [38] and Richards et al. [40] [1] Examples of hard to test JavaScript. https://www.pluralsight.com/blog/ studied JavaScript’s dynamic behavior and Richards et al. [39] software-development/6-examples-of-hard-to-test-. analyzed security issues in JavaScript projects. Ocariza et al. [2] How to unit test private functions in JavaScript. https://philipwalton. [37] performed study to characterize root causes of client- com/articles/how-to-unit-test-private-functions-in-javascript/. [3] Istanbul - a JS code coverage tool written in JS. https://github.com/ side JavaScript bugs. Gallaba et al. [21] studied the use of gotwarlost/istanbul. callback in client and server-side JavaScript code. Security [4] Jasmine. https://github.com/pivotal/jasmine. [5] Jscover. http://tntim96.github.io/JSCover/. Testing, Verification and Validation (ICST). IEEE Computer Society, [6] Mocha. https://mochajs.org/. 2013. [7] Mozilla Rhino. https://github.com/mozilla/rhino. [32] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Jseft: Automated [8] Nodeunit. https://github.com/caolan/nodeunit. JavaScript unit test generation. In Proceedings of the International [9] QUnit. http://qunitjs.com/. Conference on Software Testing, Verification and Validation (ICST), page [10] Which JavaScript test library should 10 pages. IEEE Computer Society, 2015. you use? http://www.techtalkdc.com/ [33] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Atrina: Inferring which-javascript-test-library-should-you-use-qunit-vs-jasmine-vs-mocha/. unit oracles from GUI test cases. In Proceedings of the International [11] Writing testable code in JavaScript: A brief overview. https://www. Conference on Software Testing, Verification, and Validation (ICST), toptal.com/javascript/writing-testable-code-in-javascript. page 11 pages. IEEE Computer Society, 2016. [12] Writing testable JavaScript. http://www.adequatelygood.com/ [34] H. V. Nguyen, H. A. Nguyen, A. T. Nguyen, and T. N. Nguyen. Writing-Testable-JavaScript.. Mining interprocedural, data-oriented usage patterns in JavaScript web [13] Github Showcases. https://github.com/showcases, 2014. applications. In Proceedings of the 36th International Conference on [14] JSter JavaScript Libraries Catalog. http://jster.net/catalog, 2014. Software Engineering, pages 791–802. ACM, 2014. [15] Most depended-upon NMP packages. https://www.npmjs.com/browse/ [35] N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. Van Acker, W. Joosen, depended, 2014. C. Kruegel, F. Piessens, and G. Vigna. You are what you include: large- [16] Testing and deploying with ordered npm run scale evaluation of remote JavaScript inclusions. In Proceedings of the scripts. http://blog.npmjs.org/post/127671403050/ 2012 ACM conference on Computer and communications security, pages testing-and-deploying-with-ordered-npm-run-scripts, 2015. 736–747. ACM, 2012. [17] SLOC (source lines of code) counter. https://github.com/flosse/sloc/, [36] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah. An empirical 2016. study of client-side JavaScript bugs. In Proceedings of the Interna- [18] TestScanner. https://github.com/saltlab/testscanner, 2016. tional Symposium on Empirical Software Engineering and Measurement [19] S. Artzi, J. Dolby, S. Jensen, A. Møller, and F. Tip. A framework for (ESEM), pages 55–64. IEEE Computer Society, 2013. automated testing of JavaScript web applications. In Proceedings of the [37] F. Ocariza, K. Bajaj, K. Pattabiraman, and A. 
Mesbah. A study of causes International Conference on Software Engineering (ICSE), pages 571– and consequences of client-side JavaScript bugs. IEEE Transactions on 580. ACM, 2011. Software Engineering (TSE), page 17 pages, 2017. [20] D. Crockford. JavaScript: the good parts. O’Reilly Media, Incorporated, [38] P. Ratanaworabhan, B. Livshits, and B. G. Zorn. JSMeter: Comparing 2008. the behavior of JavaScript benchmarks with real web applications. [21] K. Gallaba, A. Mesbah, and I. Beschastnikh. Don’t call us, we’ll In Proceedings of the 2010 USENIX Conference on Web Application call you: Characterizing callbacks in JavaScript. In Proceedings of the Development, WebApps’10, pages 3–3, Berkeley, CA, USA, 2010. ACM/IEEE International Symposium on Empirical Software Engineering USENIX Association. and Measurement (ESEM), pages 247–256. IEEE Computer Society, [39] G. Richards, C. Hammer, B. Burg, and J. Vitek. The eval that men do. 2015. In ECOOP 2011–Object-Oriented Programming, pages 52–78. Springer, [22] GitHut. A small place to discover languages in GitHub. http://githut.info, 2011. 2016. [40] G. Richards, S. Lebresne, B. Burg, and J. Vitek. An analysis of [23] P. Heidegger and P. Thiemann. Contract-driven testing of Javascript the dynamic behavior of JavaScript programs. In Conference on code. In Proceedings of the 48th International Conference on Objects, Programming Language Design and Implementation (PLDI), pages 1– Models, Components, Patterns, TOOLS’10, pages 154–172. Springer- 12. ACM, 2010. Verlag, 2010. [41] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. [24] L. Inozemtseva and R. Holmes. Coverage is not strongly correlated with A symbolic execution framework for JavaScript. In Proceedings of the test suite effectiveness. In Proceedings of the International Conference Symposium on Security and Privacy, pages 513–528. IEEE Computer on Software Engineering (ICSE), 2014. Society, 2010. [25] D. Jang, R. Jhala, S. Lerner, and H. Shacham. An empirical study of [42] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selective privacy-violating information flows in JavaScript web applications. In record-replay and dynamic analysis framework for JavaScript. In Proceedings of the 17th ACM conference on Computer and communi- Proceedings of the 9th Joint Meeting on Foundations of Software cations security, pages 270–283. ACM, 2010. Engineering, ESEC/FSE, pages 488–498. ACM, 2013. [26] R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser. [43] Stack Overflow. 2016 Developer Survey. http://stackoverflow.com/ Are mutants a valid substitute for real faults in software testing? In research/developer-survey-2016, 2016. Proceedings of the ACM SIGSOFT International Symposium on the [44] M. E. Trostler. Testable JavaScript. O’Reilly Media, Incorporated, 2013. Foundations of Software Engineering (FSE), FSE 2014, pages 654–665, [45] A. Vahabzadeh, A. Milani Fard, and A. Mesbah. An empirical study of New York, NY, USA, 2014. ACM. bugs in test code. In Proceedings of the International Conference on [27] G. Li, E. Andreasen, and I. Ghosh. SymJS: Automatic symbolic testing Software Maintenance and Evolution (ICSME), pages 101–110. IEEE of JavaScript web applications. In Proceedings of the ACM SIGSOFT Computer Society, 2015. International Symposium on the Foundations of Software Engineering [46] J. Weinberger, P. Saxena, D. Akhawe, M. Finifter, R. Shin, and D. Song. (FSE), page 11 pages. ACM, 2014. 
An empirical analysis of XSS sanitization in web application frame- [28] A. Milani Fard and A. Mesbah. JSNose: Detecting JavaScript code works. Electrical Engineering and Computer Sciences University of smells. In Proceedings of the International Conference on Source Code California at Berkeley, Technical Report, pages 1–17, 2011. Analysis and Manipulation (SCAM), pages 116–125. IEEE Computer [47] C. Yue and H. Wang. Characterizing insecure JavaScript practices on the Society, 2013. web. In Proceedings of the International World Wide Web Conference [29] A. Milani Fard, A. Mesbah, and E. Wohlstadter. Generating fixtures for (WWW), pages 961–970. ACM, 2009. JavaScript unit testing. In Proceedings of the IEEE/ACM International [48] A. Zaidman, B. van Rompaey, S. Demeyer, and A. van Deursen. Mining Conference on Automated Software Engineering (ASE), pages 190–200. software repositories to study co-evolution of production and test code. IEEE Computer Society, 2015. In Proceedings of the International Conference on Software Testing, [30] A. Milani Fard, M. Mirzaaghaei, and A. Mesbah. Leveraging existing Verification and Validation (ICST), pages 220–229, 2008. tests in automated test generation for web applications. In Proceedings [49] Y. Zhang and A. Mesbah. Assertions are strongly correlated with test of the IEEE/ACM International Conference on Automated Software suite effectiveness. In Proceedings of the joint meeting of the European Engineering (ASE), pages 67–78. ACM, 2014. Software Engineering Conference and the ACM SIGSOFT Symposium [31] S. Mirshokraie, A. Mesbah, and K. Pattabiraman. Efficient JavaScript on the Foundations of Software Engineering (ESEC/FSE), pages 214– mutation testing. In Proc. of the International Conference on Software 224. ACM, 2015.