A Sense of Time for Node.js: Timeouts as a Cure for Event Handler Poisoning

Anonymous

Abstract: The software development community has begun to adopt the Event-Driven Architecture (EDA) to provide scalable web services. Though the Event-Driven Architecture can offer better scalability than the One Thread Per Client Architecture, its use exposes service providers to a Denial of Service attack that we call Event Handler Poisoning (EHP).

This work is the first to define EHP attacks. After examining EHP vulnerabilities in the popular Node.js EDA framework and open-source npm modules, we explore various solutions to EHP-safety. For a practical defense against EHP attacks, we propose Node.cure, which defends a large class of Node.js applications against all known EHP attacks by making timeouts a first-class member of the JavaScript language and the Node.js framework.

Our evaluation shows that Node.cure is effective, broadly applicable, and offers strong security guarantees. First, we show that Node.cure defeats all known language- and framework-level EHP attacks, including real reported EHP attacks. Second, we evaluate its costs, finding that Node.cure incurs reasonable overheads in micro-benchmarks, and low overheads ranging from 1.02x (real servers) to 1.24x (middleware). Third, we show that Node.cure offers strong security guarantees against EHP attacks for the majority of the Node.js ecosystem.

I. INTRODUCTION

Web services are the lifeblood of the modern Internet. To handle the demands of millions of clients, service providers replicate their services across many servers. Since every server costs time and money, service providers want to maximize the number of clients each server can handle. Over the past decade, this goal has led the software community to seriously consider a paradigm shift in their software architecture: from the One Thread Per Client Architecture (OTPCA) used in Apache to the Event-Driven Architecture (EDA) championed by Node.js¹. Although using the EDA may increase the number of clients each server can handle, it also exposes the service to a new class of Denial of Service (DoS) attack unique to the EDA: Event Handler Poisoning (EHP).

¹ The EDA paradigm is also known as the reactor pattern.

The OTPCA and the EDA differ in the resources they dedicate to each client. In the OTPCA, each client is assigned a thread, and the overhead associated with each thread limits the number of concurrent clients the service can handle. In the EDA, a small pool of Event Handler threads is shared by all clients. The per-client overhead is thus smaller in the EDA, improving scalability.

Though researchers have investigated application- and language-level security issues that affect many services, we are the first to examine the security implications of using the EDA from a software architecture perspective. In §III we describe a new Denial of Service attack that can be used against EDA-based services. Our Event Handler Poisoning attack exploits the most important limited resource in the EDA: the Event Handlers themselves.

The source of the EDA’s scalability is also its Achilles’ heel. Multiplexing unrelated work onto the same thread reduces overhead, but it also moves the burden of time sharing out of the thread library or operating system and into the application itself. Where OTPCA-based services can rely on preemptive multitasking to ensure that resources are shared fairly, using the EDA requires the service to enforce its own cooperative multitasking [89]. An EHP attack identifies a way to defeat the cooperative multitasking used by an EDA-based service, ensuring that one or more Event Handlers remain dedicated to the attacker rather than handling all clients fairly. Our analysis identifies many EHP vulnerabilities among previously-reported npm vulnerabilities, and a sample of npm modules suggests that EHP vulnerabilities may be an epidemic (§IV).

In §V we define criteria for EHP-safety, and use these criteria to evaluate different EHP defenses (§VI). The general solution we propose is to relax but not eliminate the cooperative multitasking constraint of the EDA. Our prototype, Node.cure (§VII), makes timeouts a first-class member of the Node.js framework, and is the first framework we know of that does so. If an Event Handler in Node.cure blocks, Node.cure will emit a TimeoutError. Although cooperative multitasking is ultimately maintained, Node.cure thus informs every Event Handler when it ought to yield.

Our findings on EHP-safety have been corroborated by the Node.js community (§IX). We have developed a guide for practitioners on building EHP-proof systems (PR preliminarily approved), updated the Node.js documentation to warn developers about the perils of particular APIs (two PRs merged), and submitted a pull request to improve the safety of these APIs (PR preliminarily approved).

In our evaluation of Node.cure (§VIII), we describe the strong security guarantees against EHP that Node.cure offers, and demonstrate this by showing that Node.cure defeats representatives of each of the classes of EHP attack. Our cost evaluation on micro- and application benchmarks shows that Node.cure introduces acceptable monitoring overhead, as low as 0% on real server applications.

The first steps toward this paper were presented at a workshop². There we introduced the idea of EHP attacks and argued that some previously-reported security vulnerabilities were actually EHP vulnerabilities. We also proposed the “C-WCET Principle”, which we have evaluated and found wanting (§VI-C). Here we offer the complete package: we refine our previous definition of EHP, increase our survey of EHP vulnerabilities in the wild, and describe a prototype that offers strong security guarantees using timeouts.

² We omit the reference to meet the double-blind requirement.

In summary, our work makes the following contributions:
1) We are the first to define Event Handler Poisoning (EHP) (§III), a DoS attack against the EDA. EHP attacks consume Event Handlers, a limited resource in the EDA. Our definition of EHP accommodates all known EHP attacks.
2) After evaluating the solution space for EHP defenses (§VI), we describe a general antidote for Event Handler Poisoning attacks: first-class timeouts (§VII). We demonstrate its effectiveness with a prototype for the popular Node.js EDA framework. To the best of our knowledge, our prototype, Node.cure, is the first EDA framework to incorporate first-class timeouts.
3) We evaluate the security guarantees of timeouts (strong), their costs (small), and their effectiveness (quite) in the Node.js ecosystem (§VIII).
4) We released a guide on EHP-safe techniques for new developers in the EDA, which has been accepted. Our PR improving the safety of Node.js APIs has been accepted and our PRs documenting unsafe APIs were merged (§IX).

II. BACKGROUND

In this section we contrast the EDA with the OTPCA (§II-A). We then explain our choice of EDA framework for study (§II-B), and finish by introducing formative work on Algorithmic Complexity attacks (§II-C).

A. Overview of the EDA

There are two main architectures for scalable web servers, the established One Thread Per Client Architecture (OTPCA) and the upstart Event-Driven Architecture (EDA). In the OTPCA style, exemplified by the Apache Web Server [61], each client is assigned a thread or a process, with a large maximum number of concurrent clients. The OTPCA model isolates clients from one another, reducing the risk of interference. However, each additional client incurs memory and context switching overhead. In contrast, by multiplexing multiple client connections onto a single-threaded Event Loop and offering a small Worker Pool for asynchronous tasks, the EDA approach reduces a server’s per-client resource use. While the OTPCA and EDA architectures have the same expressive power [71], in practice EDA-based servers can offer greater scaling [84].

Although EDA approaches have been discussed and implemented in the academic and professional communities for decades (e.g. [94], [40], [63], [56], [98], [87]), historically the EDA was only applied in user interface settings like GUIs and web browsers. The EDA is increasingly relevant in systems programming today thanks largely to the explosion of interest in Node.js, an event-driven server-side JavaScript framework.

The most common version of the EDA on the server side is the Asymmetric Multi-Process Event-Driven (AMPED) architecture [83], used by all mainstream general-purpose EDA frameworks. The AMPED architecture (hereafter “the EDA”) is illustrated in Figure 1. In the EDA the operating system or a framework places events in a queue (e.g. through the poll system call³), and pending events are handled sequentially by an Event Loop. The Event Loop may offload expensive Tasks to the queue of a small, fixed-size Worker Pool, whose Workers execute Tasks and generate “Task Done” events for the Event Loop when they finish. One common use of the Worker Pool is to provide quasi-asynchronous (blocking but concurrent) access to the file system.

³ “[P]oll...shall identify those file descriptors on which an application can read or write data, or on which certain events have occurred” [35].

Fig. 1. This is the AMPED Event-Driven Architecture. Incoming events from clients A and B are stored in the event queue, and the associated Callbacks (CBs) will be executed sequentially by the Event Loop. The Event Loop may generate new events, to be handled by itself later or as Tasks offloaded to the Worker Pool. Alas, B has performed an EHP attack on this service. His evil input (third in the event queue) will eventually poison the Event Loop, and the events after his input will not be handled (§III-D).

Application developers must adopt the EDA paradigm when developing EDA-based software. Developers define the set of possible Event Loop Events for which they supply a set of Callbacks [60]. During the execution of an EDA-based application, events are generated by external activity or internal processing and routed to the appropriate Callbacks.

We refer to the Event Loop and the Workers as Event Handlers. Events for the Event Loop are supplied by the operating system or the framework, while events for the Worker Pool (i.e. Tasks) are supplied by the Event Loop. We say that the Event Loop “executes a Callback” in response to an event, and that a Worker “performs a Task”. A Callback is executed synchronously on the Event Loop, and a Task is performed synchronously on a Worker, asynchronously from the Event Loop’s perspective.

In the EDA, each Callback and Task is guaranteed atomicity: once scheduled, it runs to completion. Because there are relatively few Event Handlers, cooperative multitasking [89] is a must. Developers therefore partition the generation of responses into multiple stages (forming a Lifeline [38], a DAG) and regularly yield to other requests by deferring work until the next stage [52]. Unfortunately, in Figure 1 an attacker B carried out an EHP attack, defeating the cooperative multitasking and preventing other requests from being handled.

B. Node.js among other EDA frameworks

An EDA approach can be (and has been) implemented in most programming languages. Examples of EDA frameworks include Node.js (JavaScript) [17], libuv (C/C++) [11], EventMachine (Ruby) [5], Vert.x (Java) [28], Zotonic (Erlang) [31], Twisted (Python⁴) [27], and Microsoft’s P# [55]. These frameworks have been used to build a wide variety of industry and open-source services, at companies including LinkedIn [78], PayPal [66], eBay [82], and IBM [68], and for projects like Ubuntu One [33], Apple’s Calendar and Contacts Server [32], and IoT frameworks [9], [4].

⁴ In addition, Python 3.4 introduced native EDA support.

We selected Node.js for further investigation based on its community interest and maturity. To measure this, we surveyed each framework’s GitHub [7] repository for the number of contributors and forks (community interest), and the number of lifetime and recent commits (maturity). As of November 14, 2017, Node.js had the most community interest (1,782 contributors against libuv’s 269; 8,775 forks against Vert.x’s 1,177), and strong maturity (19,832 lifetime commits to Twisted’s 22,161; 2,382 commits in the last six months against Twisted’s 1,106). Thus, we selected Node.js (JavaScript) for closer investigation, though as described in §III, EHP attacks apply to any EDA-based service.

Node.js is a server-side event-driven framework for JavaScript that uses the architecture shown in Figure 1. Node.js executes an application’s JavaScript using Google’s V8 engine [3], and offers a set of core JavaScript modules with C++ bindings that implement system calls using libuv [11]. Introduced in 2009, Node.js claimed 3.5 million users in 2016 [34], and as of August 2017 its open-source package ecosystem, npm, is the largest of any programming language or framework [13], with 550,000 modules and counting.

C. Algorithmic complexity attacks

Our work is inspired by Algorithmic Complexity (AC) attacks, a form of Denial of Service attack ([76], [51]). In an AC attack, attackers force victims to take longer than expected to perform an operation by incurring the worst-case complexity of the victim’s algorithms. Rather than overwhelming the victim with volume, the attacker turns the victim’s vulnerable algorithms against him. AC vulnerabilities can therefore be exploited to launch DoS attacks, when the attacker forces the victim to spend a long time processing his request at the expense of legitimate clients, or when the attacker increases the cost even of legitimate requests.

Well-known examples of AC attacks include attacks on hash tables, whose worst-case O(N) behavior can be triggered by attackers [51], and Regular expression Denial of Service (ReDoS) attacks, which exploit the worst-case exponential-time behavior typical of regular expression (regexp) engines ([50], [88], [93], [70], [86], [95]).

As will be made clear in §III, our work is not simply the application of AC attacks to the EDA. The first distinction is a matter of perspective: where AC attacks focus on the algorithms the service employs, EHP attacks target the software architecture used by the service. The second distinction is one of definition: AC attacks rely on CPU-bound activities to achieve DoS, while EHP attacks may use either CPU-bound or I/O-bound activities to poison an Event Handler.

     1  const cb = (req, resp) => {
     2    let f = requestedFile(req);
     3    if (f.match(/(\/.+)+$/)) {          // ReDoS
     4      FS.readFile(f, (err, d) => {      // ReadDoS
     5        resp.end('Finished read');
     6      });
     7    }
     8    else { resp.end('Invalid file'); }
     9  }
    10  const srv = _HTTP.createServer(cb).listen(3000);

Fig. 2. Gist of our simple server vulnerable to EHP attacks. Line 3 is vulnerable to ReDoS. Line 4 is vulnerable to ReadDoS.

III. EVENT HANDLER POISONING ATTACKS

In this section we provide our threat model (§III-A), define Event Handler Poisoning (EHP) attacks more formally (§III-B), and present a taxonomy for EHP attacks (§III-C). To illustrate the problem, we demonstrate simple EHP attacks using ReDoS and ReadDoS (§III-D).

A. Threat model

Our threat model is straightforward: the attacker crafts evil input that triggers an expensive operation, and convinces the EDA-based victim to feed it to a Vulnerable API (§III-C). An ideal EHP attack is asymmetric, such that a small evil input triggers a large amount of work on the part of the poisoned Event Handler. Depending on how much work the evil input triggers, the attacker can achieve DoS outright, or can repeatedly apply poison in a Slow DoS attack [44]. Servers are in general vulnerable to DDoS attacks, whether OTPCA or EDA, and we exclude these from our model.

This model is reasonable because our victim is an EDA-based server; servers exist solely to process client input using APIs. If the server uses a Vulnerable API, and if it invokes it with unsanitized client input, the server can be poisoned. In §III-C we show examples of Vulnerable APIs in Node.js and other EDA frameworks, at the language, framework, and application levels.

B. What happens in an EHP attack?

Total/synchronous complexity and time. Before we can define EHP attacks, we must introduce a few definitions. First, recall the standard EDA formulation illustrated in Figure 1. Each client request is the initial event of a Lifeline [38] partitioned into one or more events and Tasks handled by the appropriate Event Handler. A Lifeline is thus a graph of event Callbacks and Tasks, whose vertices represent a Callback or Task and whose edges indicate events or Task submission. To illustrate a Lifeline, consider the server in Figure 2. The arrival of an HTTP request on port 3000 is the initial event. The corresponding Callback cb (lines 1-9) is handled by the Event Loop, which synchronously invokes an application-level API (line 2) and a language-level regexp API (line 3), and then creates another vertex in the Lifeline graph by submitting an asynchronous file system request. The Worker Pool handles this Task, and on completion it adds another vertex to the Lifeline by emitting a “Task Complete” event to the Event Loop (Callback defined on lines 4-6).
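
The Lifeline just described for the Figure 2 server can be written down as a small data structure. The following is a minimal sketch under stated assumptions, not the authors' tooling: the vertex names, the field names, the helper functions, and the millisecond figures are all hypothetical and chosen only for illustration. It records which Event Handler runs each vertex and computes two aggregates over the vertices, the sum of their times and the longest single time.

```javascript
// Hypothetical model of the Lifeline for one benign request to the
// Figure 2 server. Names and millisecond figures are invented.
const lifeline = [
  { name: 'cb (lines 1-9)', handler: 'EventLoop', kind: 'Callback', ms: 2 },
  { name: 'readFile (line 4)', handler: 'WorkerPool', kind: 'Task', ms: 15 },
  { name: 'read done (lines 4-6)', handler: 'EventLoop', kind: 'Callback', ms: 1 },
];

// Sum of the time spent in every vertex of the Lifeline.
function sumOfVertexTimes(vertices) {
  return vertices.reduce((acc, v) => acc + v.ms, 0);
}

// Longest time that any single vertex occupies its Event Handler.
function longestVertexTime(vertices) {
  return Math.max(...vertices.map((v) => v.ms));
}

// Vertex names grouped by the type of Event Handler that runs them.
function byHandler(vertices) {
  const groups = {};
  for (const v of vertices) {
    (groups[v.handler] = groups[v.handler] || []).push(v.name);
  }
  return groups;
}

console.log(byHandler(lifeline));
console.log('sum:', sumOfVertexTimes(lifeline), 'ms;',
            'longest vertex:', longestVertexTime(lifeline), 'ms');
```

In this toy model an evil input would show up as one vertex whose `ms` value dwarfs the others; the sum and the maximum then diverge from the benign case in the same way.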

We define the total complexity of a Lifeline as the combined complexity (cost) of each vertex (Callbacks and Tasks) as a function of the cumulative input to all vertices. The synchronous complexity of a Lifeline is the greatest individual complexity among all vertices in the Lifeline. Similarly, we define a Lifeline’s total time and synchronous time as the cumulative and greatest-individual time taken, respectively, by the vertices in the Lifeline. Since the EDA relies on cooperative multitasking, a Lifeline’s synchronous complexity and synchronous time provide theoretical and practical upper bounds on how effective an EHP attack against it might be. We also refer to the synchronous complexity and time of individual Callbacks and Tasks, defined as the individual complexity and time of the corresponding Lifeline vertex.

EHP attacks. Remember that each Callback and Task is handled in its turn by one of the Event Handlers, and that the Event Handlers are not dedicated one to each client, but shared among all of the clients. This sharing means that the responsiveness of the server to its clients depends not on OS preemption, but on cooperative multitasking; each Callback and Task must be handled quickly by its Event Handler to give Callbacks and Tasks from other clients a turn.

An EHP attack exploits this architecture by poisoning one or more Event Handlers with evil input. After identifying a vulnerable Lifeline, one with a large worst-case synchronous complexity, an attacker submits evil input that triggers this worst-case complexity. The Lifeline’s large synchronous complexity leads to large synchronous time, causing the affected Event Handler to block, perhaps indefinitely, while handling the expensive event or Task. While an Event Handler is blocked, it will not continue to process other events or Tasks, and the Lifelines of pending events and Tasks will starve. EHP attacks thus exploit a fundamental property of the EDA, namely the need to correctly implement fair cooperative multitasking.

An EHP attack can be carried out against either the Event Loop or the Workers in the Worker Pool. If the Event Loop is poisoned, neither pending nor future events will be handled. On the other hand, for each poisoned Worker the responsiveness of the Worker Pool will degrade, until every Worker is poisoned and all pending and future requests whose Lifelines submit a Task to the Worker Pool will receive no responses. Thus, an attacker’s aim is to poison either the Event Loop or enough of the Worker Pool to harm the throughput of the server.

You might be surprised that we name the Worker Pool as a potential target for EHP attacks. Remember, however, that in the EDA the Worker Pool is deliberately quite small, as part of the EDA’s low-overhead philosophy. For example, in a Node.js process the Worker Pool defaults to 4 Workers, with a maximum of 128 Workers [20]. Though a small Worker Pool is in keeping with the spirit of the EDA, its size makes it a realistic target. In contrast, though typical OTPCA-based servers also maintain a pool of threads (Workers) to bound the total number of concurrent connections, this pool is comparatively enormous. Apache’s httpd server has a default connection pool of size 256-400, with a maximum around 20,000 [61]. Due to the size of an OTPCA server’s “Worker Pool”, attempts to poison it would presumably be detected by an IDS or addressed through rate limiting before they poisoned enough of the pool to have an effect. By comparison, the number of requests needed to poison a vulnerable EDA server’s Worker Pool is too small to draw attention from network-level defenses, placing the burden of defense on the language, framework, and application developers or, as we propose in §VI-D, on the runtime system.

We believe EHP attacks on the EDA have not previously been explored because the EDA has only recently seen popular adoption by the server-side community. On the client side EHP attacks are not a concern, as misbehaving users will hurt only themselves. On the server side, the stakes are rather higher, especially as the EDA is adopted in safety-critical settings like IoT controllers [9] and robotics [4].

C. What makes an API Vulnerable?

Thus far we have relied on the intuitive notion of a Vulnerable API. We turn now to a more precise taxonomy. Using the terms from §III-B, a Vulnerable API begins a Lifeline that can have large synchronous time, due either to its large synchronous complexity or to the size of the input. A Vulnerable API poisons the Event Loop or a Worker. Its synchronous time can be attributed to heavy computation or the need for I/O. Note that an API that creates a Lifeline with large total time is not vulnerable so long as each vertex (Callback/Task) has a small synchronous time; it is large synchronous time that leads to EHP.

Table I classifies Vulnerable APIs along two axes. Along the first axis, a Vulnerable API affects one of the two types of Event Handlers, and it might be CPU-bound or I/O-bound. Along the second axis, a Vulnerable API can be found in one of three places: the language, the framework, or the application. Table I lists examples of Vulnerable APIs for applications written in JavaScript for the Node.js framework. In our evaluation we provide an exhaustive list of Vulnerable APIs at the language and framework levels (§VIII-A).

                 Event Loop (§VII-B)          Worker Pool (§VII-A)
    Vuln. APIs   CPU-bound      I/O-bound     CPU-bound      I/O-bound
    Language     Regexp, JSON   N/A           N/A            N/A
    Framework    Crypto, zlib   FS            Crypto, zlib   FS, DNS
    Application  while(1)       DB query      Regexp [15]    DB query

Table I. Taxonomy of Vulnerable APIs in Node.js, with examples. An EHP attack through a Vulnerable API poisons the Event Loop or a Worker, and its synchronous time is due to CPU-bound or I/O-bound activity. A Vulnerable API might be part of the language, framework, or application, and might be largely synchronous (Event Loop) or asynchronous (Worker Pool). zlib is the Node.js compression library. N/A: JavaScript has no native Worker Pool nor any I/O APIs.

Although the framework component of Table I is specific to Node.js, the same general classification can be applied to other EDA frameworks. For example, most mainstream languages (Java, Python, Ruby, etc.) support regular expressions and are vulnerable to ReDoS; and unless care is taken, file system or DNS requests to slow resources will poison Event Handlers in many EDA frameworks.

D. Two simple EHP attacks

To illustrate EHP attacks, we developed a small vulnerable file server; a simplified version is shown in Figure 2. This server has two EHP vulnerabilities. First, it sanitizes input with a regexp vulnerable to ReDoS. This regular expression ensures that paths are /-delimited, but the inclusive “.” will also match a “/”. The evil input “/.../\n” (50 /’s and a newline that “.” does not match) triggers exponential-time behavior in Node.js’s regexp engine while it backtracks in search of a division of the /’s that will match, poisoning the Event Loop in a CPU-bound EHP attack and blocking all subsequent requests.

The second EHP vulnerability might be more surprising. While the sanitization regexp fulfills its primary purpose, preventing readFile from throwing a “malformed path” error, it permits a client to read an arbitrary file, which is a directory traversal vulnerability. In the EDA, directory traversal vulnerabilities can be parlayed into I/O-bound EHP attacks, “ReadDoS”, provided the attacker can identify a Slow File⁵ or a Large File from which to read. Since line 4 uses the asynchronous framework API FS.readFile, each ReadDoS attack on this server will poison a Worker.

⁵ In addition to files exposed on network file systems, /dev/random is a good example of a Slow File. According to the Linux manual, “[R]eads from /dev/random may block...users will usually want to open it in nonblocking mode (or perform a read with timeout)” [36]; by default, Node.js on Linux does neither.

In Figure 3 you can see the baseline server throughput (circles) and the throughput while undergoing two attacks (ReDoS in triangles, ReadDoS in squares), averaged over five runs. In each condition we configured a Worker Pool of size 4, and used the ab benchmark [1] to make 2500 legitimate requests per second divided among 80 clients. We measured the instantaneous throughput of the server at 200 ms intervals.

Fig. 3. This figure shows the effect of evil input on the throughput of a simple server based on Figure 2, using either Baseline Node.js (grey) or our proposed runtime, Node.cure (black). For ReDoS (triangles), evil input was injected after three seconds, poisoning the Baseline Event Loop. For ReadDoS (circles), evil input was injected four times at one second intervals beginning after three seconds, eventually poisoning the Baseline Worker Pool. The lines for Node.cure show its effectiveness against these EHP attacks. Node.cure’s throughput on ReDoS dips until the first Callback times out (§VII-B), after which it catches up to pending requests from clients. Because the Worker Pool is not saturated, Node.cure does not suffer while it waits for the timeout of the first Task containing a ReadDoS request for /dev/random (§VII-A), and marks /dev/random’s inode Slow. Subsequent ReadDoS requests time out immediately (§VII-D).

Exploiting the ReDoS vulnerability has an immediate effect. After 1 second the attacker submits the evil input, poisoning the Event Loop and blocking all subsequent requests. In contrast, the ReadDoS attack only becomes noticeable after three evil requests because the Worker Pool is not saturated in this example. At seconds 1, 2, 3, and 4 an attacker submits one read request to a Slow File. Each evil input poisons one of the four Workers. The throughput dips after three malicious requests, as the legitimate requests saturate the sole surviving Worker. The fourth evil input entirely poisons the Worker Pool, flatlining the throughput.

The architecture-level behavior of the ReDoS attack, an EHP attack on the Event Loop, is illustrated in Figure 1. After client A’s benign request is sanitized (CB_A1), the file read Task goes to the Worker Pool (Task_A1), and when the read completes the Callback returns the file content to A (CB_A2). Then client B’s malicious request arrives, and the service’s input sanitization triggers ReDoS (CB_B1).

IV. EVENT HANDLER POISONING ATTACKS IN THE WILD

With a definition of EHP attacks in hand, we now assess the presence of EHP vulnerabilities in the Node.js ecosystem. We found that the Node.js documentation does not carefully outline the need for cooperative multitasking (§IV-A), which might contribute to the fact that many popular npm modules expose Vulnerable APIs and rely on the application to sanitize input (§IV-B), and that 35% of known npm module vulnerabilities can be used for EHP attacks (§IV-C).

A. What do the Node.js docs say about EHP attacks?

Users in search of Node.js documentation can look in three places. First, there is the JavaScript language specification [14]. Second, Node.js maintains documentation for the APIs of its core modules [21]. Third, the nodejs.org website offers a set of guides for new developers [19].

Although there are potential EHP vulnerabilities in an implementation of JavaScript, like ReDoS or large JSON manipulation, language specifications do not always discuss their risks. For example, in MDN’s documentation only eval has warnings.

The Node.js documentation does not indicate any vulnerabilities in its JavaScript engine. However, it does give hints about avoiding EHP attacks on its framework APIs. Node.js APIs include several modules with Vulnerable APIs, and developers are warned away from the synchronous versions. The asynchronous versions, which could poison the Worker Pool, bear the warning: “[These] APIs use libuv’s threadpool, which can have surprising and negative performance implications for some applications,” and refer to another page reading “Because libuv’s threadpool has a fixed size, it means that if for whatever reason any of these APIs takes a long time, other (seemingly unrelated) APIs that run in libuv’s threadpool will experience degraded performance.”

Missing from the Node.js new developer guides is a discussion about how to design a Node.js application to avoid EHP vulnerabilities. Later we describe the guide we have prepared on these topics (§IX); it has been preliminarily approved for placement on nodejs.org.

B. Do npm modules have Vulnerable APIs?

In December 2016 we examined 80 npm modules to determine how they handle potentially computationally expensive operations, i.e. application-level CPU-bound potentially-Vulnerable APIs (Table I). We manually inspected both their documentation and their implementation. Using npm’s “Best overall” sorting scheme, we examined the first 20 (unique) modules returned by npm searches for string, string manipulation, math, and algorithm, in hopes of striking a balance between popularity and potential complexity. These modules were downloaded over 200 million times that month.

We found that many of these modules had Vulnerable APIs, performing operations with high synchronous complexity and thus leaving callers vulnerable to EHP attack. Although knowing the synchronous complexity of an API is critical to preventing EHP, the documentation for the majority of the APIs did not state their running times. Nearly all of the hundreds of APIs we examined executed synchronously on the Event Loop; though two were asynchronous, one of these synchronously executed an exponential-time algorithm before yielding control. During our analysis of the source code we identified algorithms with complexities including O(1), O(n), O(n^2), O(n^3), and O(2^n).

In summary, in our survey of 80 popular npm modules we found examples of Vulnerable APIs in the majority of the modules. Module and application developers who use these APIs must weigh the risks of calling them, but this is difficult without knowing the synchronous complexity of the API (usually undocumented) or studying its implementation to learn what input might be evil (impractical).

Given the prevalence of Vulnerable APIs in these modules, we did not open bug reports against each of them. We believe the problem of EHP attacks is systemic in the Node.js community. Rather than pursuing ad hoc bug reports and patches, we took fundamental steps to improve the state of affairs, with a detailed guide for new developers (§IX) and a comprehensive defense against EHP in the form of Node.cure (§VII).

C. Are there examples of EHP vulnerabilities?

We identified two databases describing vulnerabilities in npm modules, one maintained by the Node Security Platform (NSP) [16] and the other by Snyk.io [25]. Our evaluation of the first several hundred vulnerabilities in each suggested that NSP’s database (355 entries) is a subset of Snyk.io’s database (713 entries), so we performed a detailed analysis only of Snyk.io’s database. In this section we summarize the vulnerabilities in the Snyk.io database and analyze them through the lens of EHP attacks.

First, we divided the vulnerabilities by category. Each raw Snyk.io database entry includes a title and description, and some have additional material like CWE labels and links to GitHub patches. Since the titles indicate the high-level exploit, e.g. “Directory Traversal in servergmf...”, we sorted the npm vulnerabilities into 17 common categories and one Other category for the remaining 40. Figure 4 shows the distribution, absorbing categories with fewer than 10 vulnerabilities into Other. A high-level CWE number is given next to each class.

Fig. 4. Classification of the 713 npm module vulnerabilities, by category and by usefulness in EHP attacks. Vulnerabilities were iteratively categorized using regexp. We analyzed the contents of the database as of 8 August 2017.

Then, we studied each vulnerability to see if it could be employed in an EHP attack. The dark bars in Figure 4 indicate the 186 vulnerabilities (35%) that could be used to launch an EHP attack under our threat model (§III-A). The 145 EHP-relevant Directory Traversal vulnerabilities are exploitable because they allow arbitrary file reads, which can poison the Worker Pool through ReadDoS. The 32 EHP-relevant Denial of Service vulnerabilities poison the Event Loop; 28 are ReDoS, 3 lead to infinite loops, and 1 allows an attacker to trigger worst-case algorithm behavior in input processing. Arbitrary File Write vulnerabilities, like ReadDoS, can be used for EHP attacks by writing to Slow Files.

V. WHAT DOES IT MEAN TO BE EHP-SAFE?

Having defined the EDA (§II), explained EHP attacks (§III), and surveyed their reality (§IV), we now ponder what it means to be EHP-safe. The fundamental cause of EHP vulnerabilities should be clear: EHP vulnerabilities stem from Vulnerable APIs that fail to honor cooperative multitasking. As categorized in Table I, Vulnerable APIs can poison the Event Loop or the Worker Pool, they can be CPU-bound or I/O-bound, and they occur at any level of the application stack: language (vulnerable constructs), framework (vulnerable core modules), or application (vulnerable code in the application or the library modules on which it depends).

To be EHP-safe, an application must use APIs whose synchronous time is bounded. If an application cannot bound the synchronous time of its APIs, it is vulnerable to EHP attack; conversely, if an application can bound the synchronous time of its APIs, then it is EHP-safe. This application-level guarantee must span the application’s full stack: the language constructs it uses, the framework APIs it invokes, and the application code itself.

In §VI, we outline several approaches to EHP-safety. How shall we choose between them? We suggest three criteria:
1) General. A developer must be able to use existing APIs or slight modifications thereof. This ensures that applications are general, that they can solve meaningful problems.
2) Developer-friendly. A developer should not need expertise in topics outside of their application domain.
3) Composable. If a developer uses only EHP-safe APIs, they should be unable to create a Vulnerable API. In a composably EHP-safe system, an API containing while(1); would not be Vulnerable, because it is composed only of non-Vulnerable “language APIs”. In contrast, a non-composably EHP-safe system has only ad hoc EHP safety, because in addition to making language- and framework-level APIs non-Vulnerable, application-level APIs composed of them must also be assessed.

VI. ANTIDOTES FOR EVENT HANDLER POISONING

Here we discuss four schemes by which to guarantee EHP-safety: removing Vulnerable APIs (§VI-A), only calling Vul-
In Other are 9 nerable APIs with safe input (§VI-B), refactoring Vulnerable

APIs into safe ones (§VI-C), and strictly bounding an Event Handler’s synchronous time at runtime (§VI-D). We evaluate these schemes using the criteria from §V.

A. Remove calls to Vulnerable APIs: The Blacklist Approach

To become EHP-safe, an application might eliminate calls to Vulnerable APIs. Because this Blacklist Approach fails the generality criterion, it does not merit further discussion.

B. Sanitize input to Vulnerable APIs: The Sanitary Approach

Many Vulnerable APIs are only problematic if they are called with unsafe input. Another approach to EHP-safety, then, is the Sanitary Approach: ensure that every Vulnerable API is always called with a safe input. The Sanitary Approach is developer-unfriendly, because it requires developers to identify evil input for the APIs they call. For example, Vulnerable APIs like regexp are themselves commonly used to identify invalid input. If the sanitization regexps are vulnerable to ReDoS, quis custodiet ipsos custodes?

C. Refactor Vulnerable APIs: The C-WCET Principle

A more promising path to EHP-safety is to refactor Vulnerable APIs into safe ones. Since the goal of EHP-safety is to bound the synchronous time of a Vulnerable API, a good rule of thumb might be to restrict the cost of every Callback/Task vertex in a Lifeline to Constant Worst-Case Execution Time (C-WCET) by partitioning the generation of a response into (ideally constant-time) chunks wherever possible. Such a C-WCET partitioning would bound the synchronous complexity of the Lifeline rooted at this Vulnerable API for all inputs. If all Vulnerable APIs were replaced with APIs following the C-WCET Principle, the application would be EHP-safe. The C-WCET Principle is the logical extreme of cooperative multitasking, extending input sanitization from the realm of data into the realm of the algorithms used to process it.

Evaluating the C-WCET Principle. The C-WCET Principle has two benefits. First, by bounding the synchronous complexity of all Callbacks and Tasks, every client request could be handled whether or not its input was evil. Second, the C-WCET Principle can be applied without changing the language specification. For example, as has previously been proposed [49], a non-exponential-time regexp engine could be used in place of Node.js’s exponential-time Irregexp engine [48], and the application would be none the wiser.

The C-WCET Principle meets the generality criterion; every API can be partitioned towards C-WCET. It is also somewhat developer-friendly: language- and framework-level APIs can be C-WCET partitioned by framework developers, leaving developers responsible for C-WCET partitioning their own APIs. However, since application developers rely heavily on third-party npm modules that might not have been C-WCET-partitioned, they might be called on to C-WCET partition modules outside their own expertise. Critically, like the Blacklist and Sanitary Approaches, the C-WCET Principle is not composable. Since Vulnerable APIs can always be composed from non-Vulnerable APIs⁶, developers would need to take responsibility for applying the C-WCET Principle to every API at the application and library levels.

⁶ As noted in §V, an API consisting of a while(1); loop is Vulnerable in the Blacklist, Sanitary, and C-WCET Approaches, even though it only calls the constant-time “language APIs” while(1) and ;.

Nevertheless, because the C-WCET Principle has attractive benefits, we developed a Node.C-WCET prototype to evaluate its practicality at the language and framework levels. Node.C-WCET aimed to defeat ReDoS (language-level) and ReadDoS (framework-level) EHP attacks using the C-WCET Principle. To combat ReDoS, wherever possible Node.C-WCET uses the linear-time RE2 regexp engine [22] instead of V8’s exponential-time regexp engine. And to combat ReadDoS, we replaced the file system read API’s synchronous-on-a-Worker implementation with truly asynchronous kernel AIO⁷ so that a Worker would not be poisoned by a Slow File, and further ensured that all reads of Large Files are partitioned. Due to space limitations we omit a detailed description of Node.C-WCET and our experiments.

⁷ Unlike Posix asynchronous I/O, which is implemented by libc using a threadpool, true Linux kernel AIO via io_submit gives the kernel responsibility for managing the I/O.

Ultimately, we deemed Node.C-WCET a failure, and rejected the C-WCET Principle as a solution to EHP attacks. From an engineering perspective, we found that Node.C-WCET introduced significant overheads (e.g. Linux kernel AIO requires O_DIRECT), and that the effort to understand and fully refactor all language- and framework-level Vulnerable APIs would be considerable. But more importantly, from a security perspective Node.C-WCET remained unsound in two ways. First, Node.C-WCET was still vulnerable to real ReDoS and ReadDoS attacks from known vulnerabilities, because not all regexps can be evaluated in linear time, and Linux’s kernel AIO does not support device files like the Slow File /dev/random. Second, the C-WCET Principle is ad hoc, not composable: even if Node.C-WCET partitioned all lower-level Vulnerable APIs, application-level APIs composed of these APIs could still be Vulnerable.

D. Dynamically bound running time: The Timeout Approach

The goal of the C-WCET Principle was to bound an API’s synchronous complexity as a way to bound its synchronous time. Instead of statically bounding an API’s synchronous complexity, what if we could dynamically bound its synchronous time? Then the complexity of each Callback and Task would become irrelevant, because it would be unable to take more than the quantum provided it by the runtime. In this Timeout Approach, the runtime detects and aborts long-running Callbacks and Tasks by emitting a TimeoutError, thrown from Callbacks and returned from Tasks.

The Timeout Approach provides EHP-safety. Like the C-WCET Principle, the Timeout Approach is general. It is more developer-friendly because it only requires developers to handle TimeoutErrors in their own APIs. And where the C-WCET Principle was ad hoc, the Timeout Approach is composable. If the EDA runtime had a sense of time, then a long-running Callback/Task could be interrupted no matter where it was, whether in the language, the framework, or the application. An application that calls only APIs from the language and the framework would always be EHP-safe.

Although the Timeout Approach fulfills all of the EHP-safety criteria, like C-WCET Partitioning it requires application-level refactoring. Because the TimeoutErrors emitted by the runtime are not part of the language and framework specification, existing applications would need to be ported. We do not anticipate this to be a heavy burden for well-written applications, which should already be exception-aware to catch errors like buffer overflow (RangeError) and null pointer exceptions (ReferenceError), and which should test the result of a Task for errors.

Though the C-WCET Principle and the Timeout Approach both meet the generality criterion, the Timeout Approach is more developer-friendly, and is the only EDA-centric solution we analyzed that is composable⁸. In §VII we describe our Node.cure prototype, which applies the Timeout Approach to all levels of the Node.js stack.

Fig. 5. This figure illustrates Node.cure’s timeout-aware Worker Pool, including the roles of Event Loop, Executors (Worker Pool and Priority), and Hangman. Grey entities were present in the original Worker Pool, black are new. The Event Loop can synchronously access the Priority Executor, or asynchronously offload Tasks to the Worker Pool. If an Executor’s Manager sees its Worker time out, it creates a replacement Worker and passes the Dangling Worker to a Hangman.
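The porting burden described for the Timeout Approach in §VI-D can be illustrated with a minimal sketch. It assumes a runtime that throws a TimeoutError from Callbacks and passes one as the error argument of a Task's completion callback. Stock Node.js has no such error, so the TimeoutError class below is a hypothetical stand-in, and handleRequest, onTaskDone, okApi, and evilApi are illustrative names rather than part of any real API.

```javascript
// Hypothetical stand-in: stock Node.js has no built-in TimeoutError.
class TimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = 'TimeoutError';
  }
}

// Callback side: a TimeoutError thrown out of synchronous code is caught
// like any other exception and mapped to an error response.
function handleRequest(input, api) {
  try {
    return { status: 200, body: api(input) };
  } catch (err) {
    if (err instanceof TimeoutError) {
      return { status: 503, body: 'request timed out' };
    }
    throw err; // unrelated errors propagate unchanged
  }
}

// Task side: a timed-out Task reports the TimeoutError through the
// error-first argument of its completion callback.
function onTaskDone(err, data) {
  if (err instanceof TimeoutError) {
    return { status: 503, body: 'task timed out' };
  }
  if (err) {
    return { status: 500, body: err.message };
  }
  return { status: 200, body: data };
}

// Simulated runtime behavior, for illustration only.
const okApi = (s) => s.toUpperCase();
const evilApi = () => {
  throw new TimeoutError('callback exceeded its quantum');
};

console.log(handleRequest('abc', okApi));
console.log(handleRequest('abc', evilApi));
console.log(onTaskDone(new TimeoutError('task exceeded its quantum')));
```

An application that already wraps its handlers in try/catch and checks the error argument of its completion callbacks would, under these assumptions, only need the extra instanceof branch.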

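The C-WCET partitioning of §VI-C can be sketched as follows, under the assumption that yielding with setImmediate between chunks is an acceptable way to split one long Callback into many bounded ones. The function name partitionedSum and its parameters are hypothetical and chosen only for illustration.

```javascript
// Sum `items` in chunks of at most `chunkSize` elements, yielding to the
// event loop between chunks so that each callback does a bounded amount
// of work regardless of the input length.
function partitionedSum(items, chunkSize, done) {
  let index = 0;
  let total = 0;

  function step() {
    const end = Math.min(index + chunkSize, items.length);
    while (index < end) {
      total += items[index];
      index += 1;
    }
    if (index < items.length) {
      setImmediate(step); // other queued callbacks may run before the next chunk
    } else {
      done(null, total);
    }
  }

  setImmediate(step); // always complete asynchronously
}

const data = Array.from({ length: 10 }, (_, i) => i + 1); // 1..10
partitionedSum(data, 3, (err, total) => {
  console.log(total); // 55
});
```

Each invocation of step performs at most chunkSize additions, which is the per-vertex bound the principle asks for. The sketch also shows why the principle is not composable: nothing prevents a caller from placing an unbounded loop around otherwise bounded steps.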
⁸ We discuss other approaches to EHP-safety in §IX.

VII. NODE.CURE: TIMEOUTS FOR NODE.JS

Our Node.cure prototype implements the Timeout Approach for Node.js by bounding the synchronous time of every Callback and Task. To do so, Node.cure makes timeouts a first-class member of Node.js, extending the JavaScript specification by introducing a TimeoutError to abort long-running Tasks and Callbacks. A long-running Task poisons its Worker. Such a Task is aborted and fulfilled with a TimeoutError (§VII-A). A long-running Callback poisons the Event Loop; in Node.cure these throw a TimeoutError whether they are in JavaScript or in a Node.js C++ binding (§VII-B). As a result, Node.cure meets the composability criterion of §V: the Node.js language and framework APIs are timeout-aware, and an application whose APIs are composed of these APIs will be EHP-safe (§VII-C). Node.cure also has an exploratory Slow Resource Policy to avoid unnecessary timeouts (§VII-D).

We describe the Timeout Approach for Tasks first, because it introduces machinery used to apply the Timeout Approach to Callbacks.

A. Timeout-aware Tasks

Node.cure defends against all EHP attacks that target the Worker Pool by bounding the synchronous time of Tasks. To make Tasks timeout-aware, we modified both the Worker Pool and the Node.js APIs that use it.

Timeout-aware Worker Pool. Asynchronous Vulnerable APIs submit long-running Tasks that can poison a Worker on the Worker Pool. We modified the libuv Worker Pool to be timeout-aware, replacing libuv’s Workers with Executors that combine a permanent Manager with a disposable Worker. If a Task times out, the Executor creates a new Worker to resume handling Tasks and creates a Hangman to pthread_cancel the poisoned Worker. These roles are illustrated in Figure 5.

Here is a slightly more detailed description of our timeout-aware Worker Pool. Node.js’s Worker Pool is implemented in libuv, exposing a Task submission API uv_queue_work, which we extended as shown in Table II. The original Workers of the Worker Pool would simply invoke work and then notify the Event Loop to invoke done. This is also the behavior of our timeout-aware Workers on the “good path”. When a Task takes too long, however, the potentially-poisoned Worker’s Manager invokes the Task’s timed_out callback. If the submitter does not request an extension, the Manager creates a replacement Worker so that it can continue to process subsequent Tasks, creates a Hangman thread for the poisoned Worker, and notifies the Event Loop that the Task is complete. The Event Loop then invokes its done Callback with a TimeoutError. Concurrently, once the Hangman successfully kills the Worker thread, it invokes the Task’s killed callback and returns.

    Callback          Description
    void work         Perform Task.
    int timed_out*    When Task has timed out. Can request extension.
    void done         When Task is done. Special error code for timeout.
    void killed*      When a timed out Task’s thread has been killed.

Table II. Summary of the Worker Pool API. work is invoked on the Worker. done is invoked on the Event Loop. The new callbacks, timed_out and killed, are invoked on the Manager and the Hangman, respectively. On a timeout, work, timed_out, and done are invoked, in that order; there is no ordering between the done and killed callbacks, which sometimes requires reference counting for safe memory cleanup. *New callbacks.

One difficulty with concurrently creating a replacement Worker and assigning the poisoned Worker to a Hangman is that the poisoned Worker may not have been canceled before its TimeoutError is delivered to the Event Loop. Such a Dangling Worker may be in a system call like write that modifies externally-visible state. It is therefore critical that if a timeout occurs, a Dangling Worker cannot corrupt shared state. In general this can be dealt with using transactional techniques like undo-logging or write buffering, but we investigate a simpler alternative for the file system in §VII-D. In addition to external shared state like the file system, memory safety is also a consideration for Dangling Workers because the Node.js Worker Pool is implemented using threads. Workers cannot use memory allocated by the Task submitter because a Dangling Worker may exist after its Task’s done is invoked.

Although the new uv_queue_work API is more complex than the previous one, only experts like framework developers will need to use it. This and other libuv-level changes are only visible to the average application-level developer in the form of TimeoutErrors.

Timeout-aware Worker Pool Customers. The Node.js APIs affected by this change (i.e. that create Tasks) are from the file system, DNS, encryption, and compression modules.

Node.js implements its file system and DNS APIs by relying on libuv’s file system and DNS support, which on

Linux make the appropriate calls to libc. When asynchronous file system and DNS calls time out in libc, we allow the Manager to dispose of its Dangling Worker. We modified the libuv file system and DNS implementations to use buffered memory for memory safety of Dangling Workers. We used pthread_setcancelstate to ensure that threads will not be canceled while in a non-cancelable libc API [8]. Although this requires the Hangman to await the completion of the blocked API, the Manager concurrently replaces its poisoned Worker.

It remains to handle Node.js’s asynchronous encryption and compression APIs, implemented in the Node.js C++ bindings by invoking APIs from openssl and zlib, respectively. If the Worker Pool notifies these APIs of a timeout, they allow the Task to be canceled and wait for it to be killed before returning, to ensure that the Worker is no longer modifying state in these “black-box” libraries nor accessing memory that might be released after done is invoked.

Finally, when the Event Loop sees that a Task timed out, it invokes the application’s Callback with a TimeoutError.

B. Timeouts for Callbacks

Node.cure defends against EHP attacks that target the Event Loop by bounding the synchronous time of Callbacks. To make Callbacks timeout-aware, we introduce a TimeoutWatchdog that monitors the start and end of each Callback and ensures that no Callback exceeds the timeout threshold. We time out JavaScript instructions using V8’s interrupt mechanism (§VII-B1), and we modify Node.js’s C++ bindings to ensure that Callbacks that enter these bindings will also be timed out (§VII-B2).

1) Timeouts for JavaScript:

TimeoutWatchdog. Our TimeoutWatchdog instruments every Callback using the experimental Node.js async-hooks module [18]. These hooks cannot be disabled by an application.

Before a Callback begins, our TimeoutWatchdog starts a timer. If the Callback completes before the timer expires (“good path”), we erase the timer. If the timer expires, the watchdog signals V8 to interrupt JavaScript execution by throwing a TimeoutError. The watchdog then starts another timer, ensuring that timeouts while handling the previous TimeoutError(s) are also detected.

Although a precise TimeoutWatchdog can be implemented to carefully honor the timeout threshold, the resulting communication overhead between the Event Loop and the TimeoutWatchdog can be non-trivial (§VIII-B). A cheaper alternative is a lazy TimeoutWatchdog, which simply wakes up at intervals of the timeout threshold and checks whether the same Callback it last observed is still executing; if so, it emits a TimeoutError. A lazy TimeoutWatchdog reduces the overhead of making a Callback, but decreases the precision of the TimeoutError threshold: it can only guarantee to catch Callbacks that take at least twice the threshold.

V8 interrupts. To handle the TimeoutWatchdog’s request for a TimeoutError, Node.cure extends the interrupt infrastructure of Node.js’s V8 JavaScript engine [3] to support timeouts. In V8, low priority interrupts like a pending garbage collection are checked regularly (e.g. each loop iteration, function call, etc.), but no earlier than after the current JavaScript instruction finishes. High priority interrupts take effect “immediately”, and can interrupt a running JavaScript instruction. The only high priority interrupt is TerminateExecution, which throws an uncatchable termination exception. Timeouts require the use of a high priority interrupt because they must be able to interrupt long-running individual JavaScript instructions like str.match(regexp) (possible ReDoS).

To support a TimeoutError, we modified V8 as follows: (1) We added the definition of a TimeoutError into the Error class hierarchy; (2) We added a TimeoutInterrupt into the list of high-priority interrupts; and (3) We exposed an API by which Node.js can trigger a TimeoutInterrupt. The TimeoutWatchdog calls this API, which interrupts the current JavaScript stack by throwing a TimeoutError.

The only JavaScript instructions that V8 instruments to be interruptible are regexp matching and JSON parsing; these are the language-level Vulnerable APIs. Other JavaScript instructions are treated as effectively constant-time, so these interrupts are evaluated e.g. at the end of basic blocks. We agreed with the V8 developers in this⁹, and did not attempt to instrument other JavaScript instructions to poll for pending interrupts.

⁹ For example, we found that string operations complete in milliseconds even when a string is hundreds of MBs long.

2) Timeouts for the Node.js C++ bindings:

The TimeoutWatchdog described in §VII-B1 will interrupt any Vulnerable APIs implemented in JavaScript, including language-level APIs like regexp and application-level APIs that contain blocking code like while(1);. It remains to give a sense of time to the Node.js C++ bindings that allow Node.js applications to interface with the broader world.

Asynchronous C++ bindings are already timeout-aware (§VII-A), because these perform a fixed amount of work to launch a Task and then return. However, Node.js also has synchronous C++ bindings that complete the entire operation before returning. Node.js includes an asynchronous API for the synchronous operations deemed Vulnerable by the developers, e.g. fs.readSync/fs.read; we made the synchronous versions of these APIs timeout-aware.

In Node.js, the synchronous versions of these APIs execute on the Event Loop, either by calling libuv APIs without a callback (file system), or by invoking the openssl and zlib APIs directly (encryption and compression). Node.js does not expose a synchronous DNS API. We reviewed the rest of the Node.js APIs and determined that execSync also appeared to be Vulnerable. We did not make it timeout-aware because this API is not intended for use in Callbacks in the server context, per our threat model.

Because the Event Loop holds the state of all pending clients, we cannot time it out using pthread_cancel, as this would result in the DoS the attacker desired. Another alternative is to offload the request to the Worker Pool and await its completion, but this would incur high request latencies when the Worker Pool’s queue is not empty. Instead, we extended the Worker Pool paradigm with a dedicated Priority Executor whose queue is exposed via a

new API: uv_queue_work_prio. This Executor follows the same Manager-Worker-Hangman paradigm as the Executors in Node.cure’s Worker Pool. To make these Vulnerable synchronous APIs timeout-aware, we offloaded them to the Priority Executor using the existing asynchronous version, and had the Event Loop await the result. Because these synchronous APIs are performed on the Event Loop as part of a Callback, when using a precise TimeoutWatchdog we propagate the Callback’s remaining time to this Executor’s Manager to ensure that the TimeoutWatchdog’s timer is honored.

C. Timeouts for application-level Vulnerable APIs

Let us revisit Node.cure’s composability guarantee. As described above, Node.cure makes Tasks (§VII-A) and Callbacks (§VII-B) timeout-aware to defeat EHP attacks against language and framework APIs. The Timeout Approach is composable, so an application whose APIs are composed of calls to these APIs will be EHP-safe.

However, applications can still escape the reach of the Timeout Approach by defining their own C++ bindings. These bindings would need to be made timeout-aware, following the example we set while making Node.js’s Vulnerable C++ bindings timeout-aware (file system, DNS, encryption, and compression). Without refactoring, applications with their own C++ bindings will not be EHP-safe. In our evaluation we found that application-defined C++ bindings are rare (§VIII-C).

D. Exploring policies for slow resources

Our Node.cure runtime detects and aborts long-running Callbacks and Tasks executing on Node.js’s Event Handlers. For unique evil input this is the best we can do, as accurately predicting whether a not-yet-seen input will time out is difficult. If an attacker might re-use the same evil input multiple times, however, we can track whether or not an input led to a timeout and short-circuit subsequent requests that use this input with an early timeout.

While evil input memoization could in principle be applied to any API, the size of the input space to track is a limiting factor. The evil inputs that trigger CPU-bound EHP attacks (e.g. ReDoS) exploit properties of the vulnerable algorithm and are thus usually not unique. In contrast, the evil inputs that trigger I/O-bound EHP attacks like ReadDoS must name a particularly slow resource, presenting an opportunity to short-circuit requests on this slow resource.

In Node.cure we implemented a slow resource management policy for libuv’s file system APIs, targeting those that reference a single resource (e.g. open, read, close). Node.cure instruments libuv’s open and close to maintain a mapping from file descriptors to inode numbers. When one of the APIs we manage times out, we mark the associated inode number as slow, as well as the file descriptor if one was provided. We took the simple approach of permanently blacklisting these aliases, immediately aborting all subsequent accesses¹⁰. This policy is appropriate for the file system, where access times are not likely to change¹¹. For DNS queries, timeouts are likely due to a network hiccup, and a temporary blacklist might be more appropriate.

¹⁰ To avoid fd leaks, we do not eagerly abort close.

¹¹ Of course, if the slow resource is in a networked file system like NFS or GPFS, slowness might be due to a network hiccup, and incorporating temporary device-level blacklisting might be more appropriate.

Blacklisting slow resources simplifies the management of Dangling Workers that were accessing these resources. The moment a Worker times out, its Manager marks the resource it was accessing as slow, and subsequent accesses will be timed out without interacting with the resource. Any modifications made by the Dangling Worker are thus invisible to the rest of the application.

E. Prototype details

Node.cure is built on top of Node.js v8.8.1¹², the most recent long-term support version of Node.js. Our prototype is for Linux, and added 4,000 lines of C, C++, and JavaScript code across 50 files spanning V8, libuv, the Node.js C++ bindings, and the Node.js JavaScript libraries.

¹² Specifically, Node.js v8.8.1 commit dc6bbb44da on Oct. 25, 2017.

With an appropriate choice of timeouts, Node.cure passes all but 42 of the 1,966 Node.js core tests. Six of the failures are due to rarely-used file system APIs we did not make timeout-aware. The rest are due to interactions between experimental or deprecated features and the experimental async-hooks module on which our TimeoutWatchdog relies, rather than fundamental limitations of our prototype.

In Node.cure, timeouts for Callbacks and Tasks are controlled by separate environment variables. Our implementation would readily accommodate a fine-grained assignment of timeouts for individual Callbacks and Tasks.

VIII. EVALUATING NODE.CURE

We evaluated Node.cure in terms of its effectiveness (§VIII-A), costs (§VIII-B), and security guarantees (§VIII-C). With a lazy TimeoutWatchdog, Node.cure defeats all known EHP attacks with low overhead on the “good path”, estimated at 1.3x using micro-benchmarks and manifesting at 1.0x-1.24x using real applications. Node.cure offers EHP-safety to all Node.js applications that do not define their own C++ bindings, a rare practice.

All measurements provided in this section were obtained on an otherwise-idle desktop running Ubuntu 16.04.1 (Linux 4.8.0-56-generic), 16GB RAM, Intel i7 @3.60GHz, 4 physical cores with 2 threads per core. For a baseline we used Node.js LTS v8.8.1 (dc6bbb44da) from which Node.cure was derived, compiled with the same flags. We ran Node.js with 4 Workers.

A. Effectiveness

To evaluate the effectiveness of Node.cure, we developed an EHP test suite that makes EHP attacks of all types shown in Table I. Our suite is comprehensive and conducts EHP attacks using every Vulnerable API we identified, including the language level (regexp, JSON), framework level (all relevant APIs from the file system, DNS, cryptography, and compression modules), and application level (infinite loops, long string operations, array sorting, etc.). This test suite includes each type of real EHP from our study of EHP vulnerabilities in npm modules (§IV-C). Node.cure defeats all 92 EHP attacks in this suite: each Vulnerable synchronous API

throws a TimeoutError, and each Vulnerable asynchronous API returns a TimeoutError.

To finish our suite, we ported a vulnerable npm module to Node.cure. We selected the node-oniguruma [15] module, which offers asynchronous regexp APIs. Since the Oniguruma regexp engine is vulnerable to ReDoS, its APIs are Vulnerable, but they evaluate the regexps in a Task rather than in a Callback, poisoning a Worker rather than the Event Loop. We made node-oniguruma timeout-aware by using the modified Worker Pool uv_queue_work API (Table II). Node.cure now defends against ReDoS attacks on this module’s Vulnerable APIs.

B. Costs

The Timeout Approach introduces runtime overheads, which we evaluated using micro-benchmarks and macro-benchmarks. It also may require some refactoring.

Overhead: Micro-benchmarks. Whether or not they time out, Node.cure introduces several sources of overheads to monitor Callbacks and Tasks. We evaluated the following overheads using micro-benchmarks:

1) Every time V8 checks for interrupts, it now tests for a pending timeout as well.
2) Both the lazy and precise versions of the TimeoutWatchdog must instrument every asynchronous Callback using async-hooks, with noticeable overhead dependent on the complexity of the Callback.
3) To ensure memory safety for Dangling Workers, Workers operate on buffered data that must be allocated when the Task is submitted. For example, Workers must copy the I/O buffers supplied to read and write twice.

New V8 interrupt. The overhead of our V8 Timeout interrupt was negligible, simply a test for one more interrupt in V8’s interrupt infrastructure.

TimeoutWatchdog’s async hooks. We measured the additional cost of invoking a Callback due to TimeoutWatchdog’s async hooks. A lazy TimeoutWatchdog increases the cost of invoking a Callback by 2.4x, and a precise TimeoutWatchdog increases the cost by 7.9x. Of course, this overhead is most noticeable in empty Callbacks. As the number of instructions in a Callback increases, the cost of the async-hooks amortizes. For example, if the Callback executes 500 empty loop iterations, the lazy overhead drops to 1.3x and the precise overhead drops to 2.7x; at 10,000 empty loop iterations, the lazy and precise overheads are 1.01x and 1.15x, respectively.

Worker buffering. Our timeout-aware Worker Pool requires buffering data to accommodate Dangling Workers, affecting DNS queries and file system I/O. While this overhead will vary from API to API, our micro-benchmark indicated a 1.3x overhead using read and write calls with a 64KB buffer.

Overhead: Macro-benchmarks. Our micro-benchmarks suggested that the overhead introduced by Node.cure may vary widely depending on what an application is doing. Applications that make little use of the Worker Pool will pay the overhead of the additional V8 interrupt (minimal) and the TimeoutWatchdog’s async hooks, whose cost is strongly dependent on the size of the Callbacks. Applications that use the Worker Pool will also pay the overhead of Worker buffering (variable, perhaps 1.3x).

We chose macro-benchmarks using a GitHub potpourri technique: we searched GitHub for “language:JavaScript”, sorted by “Most starred”, and identified server-side projects from the first 50 results. To add additional complete servers, we also included LokiJS [12], a popular Node.js-based key-value store, and IBM’s Acme-Air airline simulation [2].

Table III lists the macro-benchmarks we used and the performance overhead for each type of TimeoutWatchdog. These results show that Node.cure introduces minimal overhead on real server applications, and they confirm the value of a lazy TimeoutWatchdog. Matching our micro-benchmark assessment of the TimeoutWatchdog’s async-hooks overhead, the overhead from Node.cure increased as the complexity of the Callbacks used in the macro-benchmarks decreased: the middleware benchmarks sometimes used empty Callbacks to handle client requests. In non-empty Callbacks like those of the real servers, this overhead is amortized.

    Benchmark            Description                   Overheads (lazy, precise)
    LokiJS [12]          Server, Key-value store       1.00, 1.00
    Node Acme-Air [2]    Server, Airline simulation    1.02, 1.03
    [29]                 Server, P2P torrenting        1.02, 1.02
    ws [30]              Utility, websockets           1.00, 1.00*
    Three.js [26]        Utility, graphics library     1.08, 1.09
    Express [6]          Middleware                    1.06, 1.24
    Sails [24]           Middleware                    1.14, 1.23*
    Restify [23]         Middleware                    1.14, 1.63*
    Koa [10]             Middleware                    1.24, 1.60

Table III. Results of our macro-benchmark evaluation of Node.cure’s overhead. Where available, we used the benchmarks defined by the project itself. Otherwise, we ran its test suite. Overheads are the ratio of Node.cure’s performance to that of the baseline Node.js, averaged over several steady-state runs. We report the average overhead because we observed no more than 3% standard deviation in all but LokiJS, which averaged 8% standard deviation across our samples of its sub-benchmarks. *: Median of sub-benchmark overheads.

Refactoring. Node.cure adds a TimeoutError to the JavaScript and Node.js specifications. See §IX.

C. Security Guarantees

We discussed the security guarantees of the Timeout Approach in §VI-D. Here we discuss the precise security guarantees of our Node.cure prototype. As described in §VII, we gave a sense of time to JavaScript (including interrupting long-running regexp and JSON operations before their completion) as well as to framework APIs identified by the Node.js developers as long-running (file system, DNS, cryptography, and compression). If any other language or framework APIs might be long-running, Node.cure could be extended to give them a sense of time as well.

Because the Timeout Approach is composable, application-level APIs composed of these timeout-aware language and framework APIs are also timeout-aware. However, Node.js also permits applications to add their own C++ bindings, and these will not be timeout-aware without refactoring.

To evaluate the extent of this limitation, we measured how frequently developers use C++ bindings. To this end, we examined a subset of the 550,155 modules in the npm ecosystem as of 22 November 2017. We npm install’d 125,173 modules (about 20%) using node v8.9.1 (npm v5.5.1),

11 resulting in 23,070 successful installations13. Of these, 695 be able to tolerate arbitrary TimeoutErrors, and developers (3%) had C++ bindings14. should pursue low-variance Callbacks and Tasks. From this we conclude that the use of user-defined C++ Respond to TimeoutErrors. When a Callback or Task bindings is uncommon in npm modules, not surprising since due to a client request times out, the application can attempt many developers in this community come from client-side to recover, or can return an error to the client. The easiest JavaScript contexts and may not have strong C++ skills. This path is to return an error; any resources held on behalf of implies, but does not directly indicate, the popularity of C++ the client must be released. Refactoring along these lines will bindings in Node.js applications, as there may be applications be easier for applications that have minimal, well-structured that use C++ bindings not published to npm. Nevertheless, we interactions between client requests. Appropriate try-catch conclude that the need to refactor C++ bindings to be timeout- blocks should be added to Callbacks, and timeout-handling aware would not place a significant burden on the community. code should be added to completion Callbacks for Tasks. As a template, we performed several such refactorings our- selves in the Node.js framework and node-oniguruma. Minimize variance. The Timeout Approach reduces the cost of handling an EHP’s evil input from “arbitrarily long” to “the difference between the timeout threshold and the cost of IX.DISCUSSION legitimate client requests”. A “tight” timeout threshold, one Questions abound: What should the EDA community do close to the average cost of legitimate client requests, will without first-class timeouts? Conversely, what might timeout- minimize the effect of an EHP attack: malicious requests will aware programs look like? How generalizable is the Timeout be afforded roughly the same time as legitimate ones. Approach? 
And are there other ways to achieve EHP-safety? An ideal application will thus have a tight distribution of What should the EDA community do without timeouts? Lifelines’ synchronous times for legitimate client requests. Although they lack first-class timeouts, developers in the EDA If the standard deviation is small, with few or no long community must still be concerned about EHP attacks. We outliers, then developers can configure a tight timeout without believe their best approach is the C-WCET Principle (§VI-C). discomfiting legitimate users. On the other hand, applications in which the distribution of Lifelines’ synchronous times has a We have pursued a two-pronged approach to help develop- 15 long tail require configuring a more generous timeout, allowing ers in the Node.js community follow the C-WCET Principle . malicious users to occupy more of the server’s time and First, we have prepared a guide for the nodejs.org website degrade its throughput. providing guidelines for building EHP-safe EDA-based appli- cations. This guide has received interest from key players in Developers can draw on existing tools for performance the Node.js community and been preliminarily approved. Our measurement to identify the distribution of synchronous times. guide will sit on the nodejs.org website, where new Node.js They can use this to minimize variation in client requests, e.g. developers go to learn best practices for Node.js development, by applying the C-WCET Principle. and we believe that it will give developers important insights into secure Node.js programming practices. How generalizable is the Timeout Approach? The Timeout Approach can be applied to any EDA framework. Second, we studied the Node.js implementation and Callbacks must be made interruptible, and Tasks must be made identified several unnecessarily Vulnerable APIs in abortable. 
While these properties are more readily obtained in Node.js (fs.readFile, crypto.randomFill, and an interpreted language like JavaScript (applicable to frame- crypto.randomBytes, all of which submit a single works in Python, Ruby, etc.), they could in principle be en- unpartitioned Task). Our PRs warning developers about forced in compiled or VM-driven languages using techniques the risks of EHP attacks using these APIs have been like dynamic instrumentation and binary rewriting. merged. We also submitted a trial pull request repairing the simplest of these, by partitioning fs.readFile, which Are there other avenues toward EHP-safety? In §VI we has been preliminarily approved after a discussion on the discussed several EHP-safe designs. There are of course other performance-security tradeoff involved. In summary, the approaches, like significantly increasing the size of the Worker Node.js community seems open to learning more about EHP Pool, performing speculative concurrent execution [46], or attacks and to seeing their framework and documentation using explicitly preemptable Callbacks and Tasks. However, hardened against them. each of these is a variation on the same theme: dedicating resources to every client, i.e. the One Thread Per Client What might timeout-aware programs look like? As Architecture. The OTPCA is generally safe from EHP attacks, discussed in §VII, applying the Timeout Approach while but only at the cost of scalability. If the server community preserving the EDA changes the language and framework spec- wishes to use the EDA, which offers high responsiveness and ifications. In particular, all synchronous APIs may now throw scalability through the use of cooperative multitasking, we a TimeoutError, and all asynchronous APIs may be fulfilled believe the Timeout Approach is a good path to EHP-safety. with a TimeoutError. 
These changes have two implications for application developers: Timeout-aware applications must X.RELATED WORK 13Module installations failed for reasons including “No such module”, “Unsupported operating system”, and “Unsupported Node.js version”. Throughout the paper we have referred to specific related 14We determined the presence of C++ bindings by searching a module’s work. Here we place our work in its broader context, situated directory tree for files with endings like “.h” and “.c[pp]”. at the intersection of the Denial of Service literature and the 15We omit references to meet the double-blind requirement. Event-Driven Architecture community.
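Returning briefly to the “Respond to TimeoutErrors” guidance of §IX, the following is a minimal sketch of what such refactoring might look like in application code. It is an illustration under stated assumptions, not the Node.cure implementation: stock Node.js has no built-in TimeoutError, so the class below is a stand-in, and parseRequestBody, handleRequest, onTaskDone, and the resources object are hypothetical names introduced only for this example.

```javascript
// Hypothetical stand-in: stock Node.js does not define TimeoutError.
class TimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = 'TimeoutError';
  }
}

// Hypothetical synchronous API. Under the Timeout Approach any synchronous
// API may throw a TimeoutError; here the timeout is simulated by a flag.
function parseRequestBody(body, { simulateTimeout = false } = {}) {
  if (simulateTimeout) throw new TimeoutError('synchronous budget exceeded');
  return JSON.parse(body);
}

// A Callback wrapped in try-catch: on timeout, return an error to the
// client. Resources held for the client are released on every path.
function handleRequest(body, resources, opts) {
  try {
    return { status: 200, body: parseRequestBody(body, opts) };
  } catch (err) {
    if (err instanceof TimeoutError) {
      return { status: 503, body: 'request timed out' };
    }
    throw err; // errors other than timeouts are not handled here
  } finally {
    resources.release();
  }
}

// A completion Callback for a Task: an asynchronous API may be fulfilled
// with a TimeoutError, so the error argument is checked for it explicitly.
function onTaskDone(err, data) {
  if (err instanceof TimeoutError) return { status: 503, body: 'task timed out' };
  if (err) return { status: 500, body: 'internal error' };
  return { status: 200, body: data };
}
```

The sketch takes the “easiest path” described in §IX, returning an error rather than attempting recovery; the choice of a 503 status is an assumption of this example.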

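The “Minimize variance” argument of §IX can also be illustrated numerically. The sketch below summarizes a set of measured synchronous times and derives a timeout threshold from them. The policy used here (largest observed legitimate time multiplied by a headroom factor of 1.5) and the sample data are assumptions made for illustration only; the function names are hypothetical.

```javascript
// Summarize measured synchronous times (in milliseconds).
function summarize(timesMs) {
  const n = timesMs.length;
  const mean = timesMs.reduce((a, b) => a + b, 0) / n;
  const variance = timesMs.reduce((a, t) => a + (t - mean) ** 2, 0) / n;
  return { mean, stddev: Math.sqrt(variance), max: Math.max(...timesMs) };
}

// Assumed policy: the threshold must not cut off any observed legitimate
// request, so it is set to the observed maximum times a headroom factor.
function chooseThreshold(timesMs, headroom = 1.5) {
  const stats = summarize(timesMs);
  const thresholdMs = stats.max * headroom;
  // How much more time an attacker's request can occupy than an
  // average legitimate request before it is timed out.
  const attackerAdvantage = thresholdMs / stats.mean;
  return { ...stats, thresholdMs, attackerAdvantage };
}

// Made-up samples: a tight distribution and one with a long tail.
const tight = chooseThreshold([4, 5, 5, 6, 5]);
const longTail = chooseThreshold([4, 5, 5, 6, 80]);
console.log(tight.thresholdMs, longTail.thresholdMs);
```

With the long-tailed sample, the single outlier forces a much more generous threshold, which is the situation the text warns about.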
Denial of Service attacks. Research on DoS can be broadly divided into network-level attacks (e.g. distributed DoS attacks) and application-level attacks [37]. Since EHP attacks exploit the semantics of the application, they are application-level attacks, not easily defeated by network-level defenses.

DoS attacks seek to exhaust the resources critical to the proper operation of a server, and various kinds of exhaustion have been considered. The bulk of the literature has focused on exhausting the CPU. Some have examined the gap between common-case and worst-case performance in algorithms and data structures, for example in quicksort [76], hashing [51], and exponentially backtracking algorithms like ReDoS [50], [88] and rule matching [90]. Other CPU-centric DoS attacks have targeted more general programming issues, showing how to trigger infinite recursion [47] and how to trigger [91] or detect [42] infinite loops. But the CPU is not the only vulnerable resource, as shown by Olivo et al.’s work polluting databases to increase the cost of queries [80]. We are not aware of prior research work that incurs DoS using the file system, as do our Large/Slow File ReadDoS attacks, though we have found a handful of CVE reports to this effect¹⁶.

¹⁶ For Large Files, CVE-2001-0834, CVE-2008-1353, CVE-2011-1521, and CVE-2015-5295 mention DoS by ENOMEM using /dev/zero. For Slow Files (/dev/random), see CVE-2012-1987 and CVE-2016-6896.

Our work identifies and shows how to exploit the most limited resource of the EDA: Event Handlers. The OTPCA approach ensures that for production-grade web services, attackers must submit enough queries to overwhelm the operating system’s preemptive scheduler, no mean feat. In contrast, in the EDA there are only a few Event Handlers that need be poisoned. Although we prove our point using previously-reported attacks like ReDoS, the underlying resource we are exhausting is the small, fixed-size set of Event Handlers deployed in EDA-based services.

Our work also takes a somewhat unusual tack in the application-level DoS space by providing a defense. Because application-level vulnerabilities rely on the semantics of the service being exploited, defenses must often be prepared on a case-by-case basis. In consequence, many works on application-level DoS focus on providing detectors (e.g. [47], [91]), not defenses. When the application-level exploits take advantage of an underlying library, however, researchers have offered replacement libraries (e.g. [51]). Our Node.cure prototype shows how to achieve strong EHP safety by making timeouts a first-class member of Node.js.

Security in the EDA. Though researchers have studied security vulnerabilities in different EDA frameworks, most prominently client-side applications, the security properties of the EDA itself have not been researched in detail. Some examples include security studies on client-side JavaScript/Web [72], [69], [53], [77] and Java/Android [58], [57], [41], [67] applications. These have focused on language issues (e.g., eval in JavaScript [92]) or platform-specific issues (e.g., DOM in web browsers [72]). However, there has been little academic engagement with the security aspects of server-side EDA frameworks. Ojamaa et al. [79] performed a Node.js-specific high-level survey, and De Groef et al. [54] proposed a method to securely integrate third-party modules from npm. Ours is the first work to identify the EHP problem, and we have taken a systems perspective to resolve it.

Synchronous EHP attacks. The server-side EDA practitioner community is aware of the risk of DoS due to EHP on the Event Loop. A common rule of thumb is “Don’t block the Event Loop”, advised by many tutorials as well as recent books about EDA programming for Node.js [96], [45]. Wandschneider suggests worst-case linear-time partitioning on the Event Loop [96], while Casciaro advises developers to partition any computation on the Event Loop, and to offload computationally expensive tasks to the Worker Pool [45]. Our work offers a more rigorous taxonomy and evaluation of EHP attacks, and in particular we extend the idea of “Don’t block the Event Loop” to the Worker Pool.

We are not aware of research work discussing the use of the general EDA on the server side and the resulting EHP vulnerability. However, EHP attacks on the server side correspond to performance bugs in client-side EDA programs. Liu et al. studied such bugs [74], and Lin et al. proposed an automatic refactoring to offload expensive tasks from the Event Loop to the Worker Pool [73]. This approach is sound on a client-side system like Android, where the demand for threads is limited, but is not applicable to the server-side EDA because the small Worker Pool is shared among all clients. For the server-side EDA, Lin et al.’s approach would need to be extended to incorporate automatic C-WCET partitioning.

As future work, we believe that research into computational complexity measurement ([85], [62], [43]) and estimation ([81], [65]) might be adapted to the Node.js context for EHP vulnerability detection and automatic refactoring using the C-WCET Principle. This is made difficult, however, by the complexity of statically analyzing JavaScript and the EDA ([75], [39], [64], [97], [59]).

XI. CONCLUSION

The Event-Driven Architecture holds great promise for scalable web services, and it is increasingly popular in the software development community. In this paper we defined an Event Handler Poisoning attack, a novel class of DoS attack against the EDA that exploits the cooperative multitasking at its heart. We showed that there are EHP attacks hidden among previously-reported vulnerabilities, and observed that the EDA community is incautious about the risks of EHP attacks.

We proposed and evaluated two prototype defenses against EHP attacks: the C-WCET Principle and the Timeout Approach. Though we deemed the C-WCET Principle impractical for strong security guarantees, our Timeout Approach prototype, Node.cure, can readily defend Node.js applications against all known EHP attacks. Everything needed to reproduce our results is available on GitHub¹⁷.

¹⁷ We omit the reference to meet the double-blind requirement.

Our findings can be directly applied by the EDA community, and we hope they influence the design of existing and future EDA frameworks. As the EDA grows in popularity, EHP attacks will become an increasingly critical DoS vector, a vector that can be defeated using the Timeout Approach.
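The “Don’t block the Event Loop” rule of thumb and the partitioning advice discussed in §X can be sketched as follows. This is a generic illustration using the standard Node.js setImmediate timer, not code from Node.cure or from the cited books; the function name sumPartitioned and the default chunk size are assumptions of this example.

```javascript
// Sum an array in bounded chunks, yielding to the Event Loop between
// chunks so that Callbacks for other clients can run in the meantime.
function sumPartitioned(values, chunkSize = 1000) {
  return new Promise((resolve) => {
    let total = 0;
    let i = 0;

    function step() {
      // Each step does at most chunkSize units of synchronous work.
      const end = Math.min(i + chunkSize, values.length);
      for (; i < end; i++) total += values[i];

      if (i < values.length) {
        setImmediate(step); // defer the next chunk to a later loop turn
      } else {
        resolve(total);
      }
    }

    step();
  });
}
```

A caller would use it as `sumPartitioned(bigArray).then(total => ...)`. Bounding the work per step bounds how long any single Callback holds the Event Loop, at the cost of some scheduling overhead per chunk.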

REFERENCES

[1] “ab – benchmarking tool,” https://httpd.apache.org/docs/2.4/programs/ab.html.
[2] “acmeair-node,” https://github.com/acmeair/acmeair-nodejsh.
[3] “Chrome V8.” [Online]. Available: https://developers.google.com/v8/
[4] “Cylon.js,” https://cylonjs.com/.
[5] “EventMachine.” [Online]. Available: http://rubyeventmachine.com/
[6] “express,” https://github.com/expressjs/express.
[7] “Github,” https://github.com/.
[8] “Gnu libc – safety concepts,” https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html.
[9] “iot-nodejs,” https://github.com/ibm-watson-iot/iot-nodejs.
[10] “Koa,” https://github.com/koajs/koa.
[11] “libuv,” https://github.com/libuv/libuv.
[12] “Lokijs,” https://github.com/techfort/LokiJS.
[13] “Module Counts.” [Online]. Available: http://www.modulecounts.com/
[14] “Mozilla developer network: Javascript,” https://developer.mozilla.org/en-US/docs/Web/JavaScript.
[15] “Node-oniguruma regexp library,” https://github.com/atom/node-oniguruma.
[16] “Node security platform,” https://nodesecurity.io/advisories.
[17] “Node.js,” http://nodejs.org/.
[18] “Nodejs async hooks,” https://nodejs.org/api/async_hooks.html.
[19] “Node.js developer guides,” https://nodejs.org/en/docs/guides/.
[20] “Node.js thread pool documentation,” http://docs.libuv.org/en/v1.x/threadpool.html.
[21] “Node.js v8.9.1 documentation,” https://nodejs.org/dist/latest-v8.x/docs//.
[22] “Re2 regular expression engine,” https://github.com/google/re2.
[23] “restify,” https://github.com/restify/node-restify.
[24] “sails,” https://github.com/balderdashy/sails.
[25] “Snyk.io,” https://snyk.io/vuln/.
[26] “three.js,” https://github.com/mrdoob/three.js.
[27] “Twisted.” [Online]. Available: https://twistedmatrix.com/trac/ https://github.com/twisted/twisted
[28] “Vert.x,” http://vertx.io/.
[29] “webtorrent,” https://github.com/webtorrent/webtorrent.
[30] “ws: a node.js websocket library,” https://github.com/websockets/ws.
[31] “Zotonic,” http://zotonic.com/.
[32] “The Calendar and Contacts Server,” https://github.com/Apple/Ccs-calendarserver, 2007.
[33] “Ubuntu One: Technical Details,” https://wiki.ubuntu.com/UbuntuOne/TechnicalDetails, 2012.
[34] “New Node.js Foundation Survey Reports New “Full Stack” In Demand Among Enterprise Developers,” 2016. [Online]. Available: https://nodejs.org/en/blog/announcements/nodejs-foundation-survey/
[35] “Posix poll specification,” http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html, 2016.
[36] “Random(4),” http://man7.org/linux/man-pages/man4/random.4.html, 2017.
[37] M. Abliz, “Internet Denial of Service Attacks and Defense Mechanisms,” Tech. Rep., 2011.
[38] S. Alimadadi, A. Mesbah, and K. Pattabiraman, “Understanding Asynchronous Interactions in Full-Stack JavaScript,” ICSE, 2016.
[39] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel, “Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” in PLDI, 2014.
[40] B. Atkinson, Hypercard. Apple Computer, 1988.
[41] D. Barrera, H. G. Kayacik, P. C. van Oorschot, and A. Somayaji, “A methodology for empirical analysis of permission-based security models and its application to android,” in CCS, 2010.
[42] J. Burnim, N. Jalbert, C. Stergiou, and K. Sen, “Looper: Lightweight detection of infinite loops at runtime,” in International Conference on Automated Software Engineering (ASE), 2009.
[43] J. Burnim, S. Juvekar, and K. Sen, “WISE: Automated Test Generation for Worst-Case Complexity,” in International Conference on Software Engineering (ICSE), 2009.
[44] E. Cambiaso, G. Papaleo, and M. Aiello, “Taxonomy of Slow DoS Attacks to Web Applications,” Recent Trends in Computer Networks and Distributed Systems Security, pp. 195–204, 2012.
[45] M. Casciaro, Node.js Design Patterns, 1st ed., 2014.
[46] G. Chadha, S. Mahlke, and S. Narayanasamy, “Accelerating Asynchronous Programs Through Event Sneak Peek,” in International Symposium on Computer Architecture (ISCA), 2015.
[47] R. Chang, G. Jiang, F. Ivančić, S. Sankaranarayanan, and V. Shmatikov, “Inputs of coma: Static detection of denial-of-service vulnerabilities,” in IEEE Computer Security Foundations Symposium (CSF), 2009.
[48] E. Corry, C. P. Hansen, and L. R. H. Nielsen, “Irregexp, Google Chrome’s New Regexp Implementation,” 2009. [Online]. Available: https://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
[49] R. Cox, “Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...),” 2007. [Online]. Available: https://swtch.com/~rsc/regexp/regexp1.html
[50] S. Crosby, “Denial of service through regular expressions,” USENIX Security work in progress report, 2003.
[51] S. A. Crosby and D. S. Wallach, “Denial of Service via Algorithmic Complexity Attacks,” in USENIX Security, 2003.
[52] J. Davis, A. Thekumparampil, and D. Lee, “Node.fz: Fuzzing the Server-Side Event-Driven Architecture,” in European Conference on Computer Systems (EuroSys), 2017.
[53] W. De Groef, D. Devriese, N. Nikiforakis, and F. Piessens, “Flowfox: A web browser with flexible and precise information flow control,” ser. CCS, 2012.
[54] W. De Groef, F. Massacci, and F. Piessens, “NodeSentry: Least-privilege library integration for server-side JavaScript,” in Annual Computer Security Applications Conference (ACSAC), 2014.
[55] A. Desai, V. Gupta, E. Jackson, S. Qadeer, S. Rajamani, and D. Zufferey, “P: Safe asynchronous event-driven programming,” in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2013.
[56] S. C. Drye and W. C. Wake, Java Swing reference. Manning Publications Company, 1999.
[57] W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri, “A study of android application security,” in USENIX Security, 2011.
[58] W. Enck, M. Ongtang, and P. McDaniel, “Understanding android security,” IEEE Security and Privacy, 2009.
[59] Y. Feng, S. Anand, I. Dillig, and A. Aiken, “Apposcopy: Semantics-based detection of android malware through static analysis,” in FSE, 2014.
[60] S. Ferg, “Event-driven programming: introduction, tutorial, history,” 2006. [Online]. Available: http://eventdrivenpgm.sourceforge.net/
[61] A. S. Foundation, “The Apache web server.” [Online]. Available: http://www.apache.org
[62] S. F. Goldsmith, A. S. Aiken, and D. S. Wilkerson, “Measuring Empirical Computational Complexity,” in Foundations of Software Engineering (FSE), 2007.
[63] D. Goodman and P. Ferguson, Dynamic HTML: The Definitive Reference, 1st ed. O’Reilly, 1998.
[64] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, “Information flow analysis of android applications in droidsafe,” in NDSS, 2015.
[65] S. Gulwani, K. K. Mehra, and T. Chilimbi, “SPEED: Precise and Efficient Static Estimation of Program Computational Complexity,” in Principles of Programming Languages (POPL), 2009.
[66] J. Harrell, “Node.js at PayPal,” 2013. [Online]. Available: https://www.paypal-engineering.com/2013/11/22/node-js-at-paypal/

[67] S. Heuser, A. Nadkarni, W. Enck, and A.-R. Sadeghi, “Asm: A programmable interface for extending android security,” in Proceedings of the 23rd USENIX Conference on Security Symposium, ser. SEC’14, 2014.
[68] IBM, “Node-RED.” [Online]. Available: https://nodered.org/
[69] X. Jin, X. Hu, K. Ying, W. Du, H. Yin, and G. N. Peri, “Code injection attacks on html5-based mobile apps: Characterization, detection and mitigation,” in CCS, 2014.
[70] J. Kirrage, A. Rathnayake, and H. Thielecke, “Static Analysis for Regular Expression Denial-of-Service Attacks,” Network and System Security, vol. 7873, pp. 35–148, 2013.
[71] H. C. Lauer and R. M. Needham, “On the duality of operating system structures,” ACM SIGOPS Operating Systems Review, vol. 13, no. 2, pp. 3–19, 1979.
[72] S. Lekies, B. Stock, and M. Johns, “25 million flows later: Large-scale detection of dom-based xss,” in CCS, 2013.
[73] Y. Lin, C. Radoi, and D. Dig, “Retrofitting Concurrency for Android Applications through Refactoring,” in ACM International Symposium on Foundations of Software Engineering (FSE), 2014.
[74] Y. Liu, C. Xu, and S.-C. Cheung, “Characterizing and detecting performance bugs for smartphone applications,” in International Conference on Software Engineering (ICSE), 2014.
[75] M. Madsen, F. Tip, and O. Lhotak, “Static analysis of event-driven Node.js JavaScript applications,” in ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, 2015.
[76] M. D. Mcilroy, “Killer adversary for quicksort,” Software - Practice and Experience, vol. 29, no. 4, pp. 341–344, 1999.
[77] N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. Van Acker, W. Joosen, C. Kruegel, F. Piessens, and G. Vigna, “You are what you include: Large-scale evaluation of remote javascript inclusions,” in CCS, 2012.
[78] J. O’Dell, “Exclusive: How LinkedIn used Node.js and HTML5 to build a better, faster app,” 2011. [Online]. Available: http://venturebeat.com/2011/08/16/linkedin-node/
[79] A. Ojamaa and K. Duuna, “Assessing the security of Node.js platform,” in 7th International Conference for Internet Technology and Secured Transactions (ICITST), 2012, pp. 348–355.
[80] O. Olivo, I. Dillig, and C. Lin, “Detecting and Exploiting Second Order Denial-of-Service Vulnerabilities in Web Applications,” ACM Conference on Computer and Communications Security (CCS), 2015.
[81] O. Olivo, I. Dillig, and C. Lin, “Static Detection of Asymptotic Performance Bugs in Collection Traversals,” PLDI, 2015.
[82] S. Padmanabhan, “How We Built eBays First Node.js Application,” 2013. [Online]. Available: http://www.ebaytechblog.com/2013/05/17/how-we-built-ebays-first-node-js-application/
[83] V. S. Pai, P. Druschel, and W. Zwaenepoel, “Flash: An Efficient and Portable Web Server,” in USENIX Annual Technical Conference (ATC), 1999.
[84] D. Pariag, T. Brecht, A. Harji, P. Buhr, A. Shukla, and D. R. Cheriton, “Comparing the performance of web server architectures,” in European Conference on Computer Systems (EuroSys). ACM, 2007.
[85] P. P. Puschner and C. Koza, “Calculating the Maximum Execution Time of Real-Time Programs,” Real-Time Systems, vol. 1, no. 2, pp. 159–176, 1989.
[86] A. Rathnayake and H. Thielecke, “Static Analysis for Regular Expression Exponential Runtime via Substructural Logics,” CoRR, 2014.
[87] R. Rogers, J. Lombardo, Z. Mednieks, and B. Meike, Android application development: Programming with the Google SDK, 1st ed. O’Reilly Media, Inc., 2009.
[88] A. Roichman and A. Weidman, “VAC - ReDoS: Regular Expression Denial Of Service,” Open Web Application Security Project (OWASP), 2009.
[89] A. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts, 9th ed. Wiley Publishing, 2012.
[90] R. Smith, C. Estan, and S. Jha, “Backtracking Algorithmic Complexity Attacks Against a NIDS,” in ACSAC, 2006, pp. 89–98.
[91] S. Son and V. Shmatikov, “SAFERPHP Finding Semantic Vulnerabilities in PHP Applications,” in Workshop on Programming Languages and Analysis for Security (PLAS), 2011, pp. 1–13.
[92] C.-A. Staicu, M. Pradel, and B. Livshits, “Understanding and Automatically Preventing Injection Attacks on Node.js,” Tech. Rep., 2016.
[93] B. Sullivan, “Security Briefs - Regular Expression Denial of Service Attacks and Defenses,” Tech. Rep., 2010. [Online]. Available: https://msdn.microsoft.com/en-us/magazine/ff646973.aspx
[94] R. E. Sweet, “The Mesa programming environment,” ACM SIGPLAN Notices, vol. 20, no. 7, pp. 216–229, 1985.
[95] B. Van Der Merwe, N. Weideman, and M. Berglund, “Turning Evil Regexes Harmless,” in SAICSIT, 2017. [Online]. Available: https://doi.org/10.1145/3129416.3129440
[96] M. Wandschneider, Learning Node.js: A Hands-on Guide to Building Web Applications in JavaScript. Pearson Education, 2013.
[97] F. Wei, S. Roy, X. Ou, and Robby, “Amandroid: A precise and general inter-component data flow analysis framework for security vetting of android apps,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’14, 2014.
[98] M. Welsh, D. Culler, and E. Brewer, “SEDA: An Architecture for Well-Conditioned, Scalable Internet Services,” in Symposium on Operating Systems Principles (SOSP), 2001.
