Analysing the Use of Outdated Javascript Libraries on the Web
Total Page:16
File Type:pdf, Size:1020Kb
Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web Tobias Lauinger, Abdelberi Chaabane, Sajjad Arshad, William Robertson, Christo Wilson and Engin Kirda Northeastern University {toby, 3abdou, arshad, wkr, cbw, ek}@ccs.neu.edu Abstract—Web developers routinely rely on third-party Java- scripts or HTML into vulnerable websites via a crafted tag. As Script libraries such as jQuery to enhance the functionality of a result, it is of the utmost importance for websites to manage their sites. However, if not properly maintained, such dependen- library dependencies and, in particular, to update vulnerable cies can create attack vectors allowing a site to be compromised. libraries in a timely fashion. In this paper, we conduct the first comprehensive study of To date, security research has addressed a wide range of client-side JavaScript library usage and the resulting security client-side security issues in websites, including validation [30] implications across the Web. Using data from over 133 k websites, we show that 37 % of them include at least one library with a and XSS ([17], [36]), cross-site request forgery [4], and session known vulnerability; the time lag behind the newest release of fixation [34]. However, the use of vulnerable JavaScript libraries a library is measured in the order of years. In order to better by websites has not received nearly as much attention. In understand why websites use so many vulnerable or outdated 2014, a series of blog posts presented cursory measurements libraries, we track causal inclusion relationships and quantify highlighting that major websites included known vulnerable different scenarios. We observe sites including libraries in ad hoc libraries ([25], [26], [24]). These findings echo warnings from and often transitive ways, which can lead to different versions of other software ecosystems like Android [3], Java [32] and the same library being loaded into the same document at the same Windows [21], which show that vulnerable libraries continue time. Furthermore, we find that libraries included transitively, or to exist in the wild even when they are widely known to via ad and tracking code, are more likely to be vulnerable. This contain severe vulnerabilities. Given that JavaScript dependency demonstrates that not only website administrators, but also the dynamic architecture and developers of third-party services are management is relatively primitive and corresponding tools to blame for the Web’s poor state of library management. are not as well-established as in more mature ecosystems, these findings suggest that security issues caused by outdated The results of our work underline the need for more thorough JavaScript libraries on the Web may be widespread. approaches to dependency management, code maintenance and third-party code inclusion on the Web. In this paper, we conduct the first comprehensive study on the security implications of JavaScript library usage in websites. We seek to answer the following questions: I. INTRODUCTION The Web is arguably the most popular contemporary • Where do websites load JavaScript libraries from (i.e., programming platform. Although websites are relatively easy to first or third-party domains), and how frequently are create, they are often composed of heterogeneous components these domains used? such as database backends, content generation engines, multiple • How current are the libraries that websites are using, scripting languages and client-side code, and they need to deal and do they contain known vulnerabilities? with unsanitised inputs encoded in several different formats. • Are web developers intentionally including JavaScript Hence, it is no surprise that it is challenging to secure websites libraries, or are these dependencies caused by adver- because of the large attack surface they expose. tising and tracking code? • Are existing remediation strategies effective or widely One specific, significant attack surface are vulnerabilities used? related to client-side JavaScript, such as cross-site scripting (XSS) and advanced phishing. Crucially, modern websites often • Are there additional technical, methodological, or include popular third-party JavaScript libraries, and thus are at organisational changes that can improve the security risk of inheriting vulnerabilities contained in these libraries. For of websites with respect to JavaScript library usage? example, a 2013 XSS vulnerability in the jQuery [13] library Note that the focus of this paper is not measuring the security before version 1.6.3 allowed remote attackers to inject arbitrary state of specific JavaScript libraries. Rather, our goal (and primary contribution) is to empirically examine whether website Permission to freely reproduce all or part of this paper for noncommercial operators keep their libraries current and react to publicly purposes is granted provided that copies bear this notice and the full citation disclosed vulnerabilities. on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author Answering these questions necessitated solving three funda- (for reproduction of an entire paper only), and the author’s employer if the mental methodological challenges. First, there is no centralised paper was prepared within the scope of employment. repository of metadata pertaining to JavaScript libraries and NDSS ’17, 26 February - 1 March 2017, San Diego, CA, USA Copyright 2017 Internet Society, ISBN 1-1891562-46-0 their versions, release dates, and known vulnerabilities. To http://dx.doi.org/10.14722/ndss.2017.23414 address this, we manually constructed a catalogue containing all “release” versions of 72 of the most popular open-source • We present results on the origins of vulnerable libraries, including detailed vulnerability information on a JavaScript library inclusions, which allows us to subset of 11 libraries. Second, web developers often modify contrast the security posture of website developers with JavaScript libraries by reformatting, restructuring or appending third-party modules such as WordPress, advertising or code, which makes it difficult to detect library usage in the tracking networks, and social media widgets. wild. We solve this problem through a combination of static • We show that a large number of websites include and dynamic analysis techniques. Third, to understand why JavaScript libraries in unexpected ways, such as mul- specific libraries are loaded by a given site, we need to track all tiple inclusions of different library versions into the of the causal relationships between page elements (e.g., script same document, which may impact their attack surface. s1 in frame f1 injects script s2 into frame f2). To solve this, • We find existing remediation strategies to be ineffective we developed a customised version of Chromium that records at mitigating the threats posed by vulnerable JavaScript detailed causality trees of page element creation relationships. libraries. For example, less than 3 % of websites could Using these tools, we crawled the Alexa Top 75 k websites fix all their vulnerable libraries by applying only patch- and a random sample of 75 k websites drawn from a snapshot level updates. Similarly, only 1 % of websites use the of the .com zone in May 2016. These two crawls allow us to version-aliasing services offered by JavaScript CDNs. compare and contrast JavaScript library usage between popular and unpopular websites. In total, we observed 11,141,726 inline II. BACKGROUND scripts and script file inclusions; 87.7 % of Alexa sites and JavaScript has allowed web developers to build highly inter- 46.5 % of .com sites used at least one well-known JavaScript active websites with sophisticated functionality. For example, library, with jQuery being the most popular by a large majority. communication and production-related online services such as Analysis of our dataset reveals many concerning facts Gmail and Office 365 make heavy use of JavaScript to create about JavaScript library management on today’s Web. More web-based applications comparable to their more traditional than a third of the websites in our Alexa crawl include at desktop counterparts. In this paper, we focus exclusively on least one vulnerable library version, and nearly 10 % include aspects of client-side JavaScript executed in a browser, not the two or more different vulnerable versions. From a per-library recent trend of using JavaScript for server-side programming. perspective, at least 36.7 % of jQuery, 40.1 % of Angular, 86.6 % of Handlebars, and 87.3 % of YUI inclusions use a vulnerable A. JavaScript Libraries version. Alarmingly, many sites continue to rely on libraries like YUI and SWFObject that are no longer maintained. In fact, In many cases, to make their lives easier, web developers the median website in our dataset is using a library version rely on functionality that is bundled in libraries. For example, 1,177 days older than the newest release, which explains why so jQuery [13] is a popular JavaScript library that makes HTML many vulnerable libraries tend to linger on the Web. Advertising, document traversal and manipulation, event handling, animation, tracking and social widget code can cause transitive library and AJAX much simpler and compatible across browsers. inclusions with a higher rate of vulnerability, suggesting that In the simplest case, a JavaScript library is a plain-text script these problems extend beyond individual