Malicious Browser Extensions at Scale: Bridging the Observability Gap Between Web Site and Browser
Total Page:16
File Type:pdf, Size:1020Kb
Malicious Browser Extensions at Scale: Bridging the Observability Gap between Web Site and Browser Louis F. DeKoven1, Stefan Savage1, Geoffrey M. Voelker1, and Nektarios Leontiadis2 1University of California, San Diego 2Facebook Abstract capabilities allow extensions to offer complex and rich Browser extensions enhance the user experience in a va- modifications to the user experience and support the im- riety of ways. However, to support these expanded ser- plementation of services that would otherwise be impos- vices, extensions are provided with elevated privileges sible to implement. However, these same capabilities can that have made them an attractive vector for attackers provide a powerful vehicle for performing malicious at- seeking to exploit Internet services. Such attacks are par- tacks [3]. Unsurprisingly, this problem has evolved from ticularly vexing for the sites being abused because there one of abstract potential into a concrete threat, and today is no standard mechanism for identifying which exten- the problem of Malicious Browser Extensions (MBE) is sions are running on a user’s browser, nor is there an es- widely understood to be real and growing [3,7,8,10,11]. tablished mechanism for limiting the distribution of ma- We consider MBEs to be extensions that take actions on licious extensions even when identified. behalf of a user without their consent, or replace Face- In this paper we describe an approach used at Face- book’s key functionality or content. book for dealing with this problem. We present a Unfortunately, detecting MBEs is challenging because methodology whereby users exhibiting suspicious online the malicious nature of a given extension can manifest behaviors are scanned (with permission) to identify the dynamically and the online targets of its abuse have no set of extensions in their browser, and those extensions natural way to attribute those behaviors back to partic- are in turn labelled based on the threat indicators they ular extensions. More concretely, while a browser ven- contain. We have employed this methodology at Face- dor or extension marketplace is in a position to inspect book for six weeks, identifying more than 1700 lexically extension code, inferring malicious intent may not be distinct malicious extensions. We use this labelling to possible from that vantage point. In addition to the tra- drive user device clean-up efforts as well to report to anti- ditional challenges with such code analysis approaches malware and browser vendors. (e.g., polymorphic encoding), extensions routinely fetch resources from third-party sites and, as a result, an exten- 1 Introduction sion may only exhibit malicious actions at certain times Today, Web browsers encapsulate dynamic code, interact or when certain Web services are visited. Conversely, with users and are implicated in virtually every activity from the vantage point of a targeted Web service, abu- performed on computers from e-mail to game playing. sive actions may be clear, but the source of those actions While some of these activities have been made possible can be murky. Extensions frequently hide by emulating by enhancements to the standard languages and capabil- a normal user’s interactions and there are no standard ities supported by the browsers themselves, many oth- mechanisms to link browser actions back to a particular ers are made possible via browser extensions designed to extension (or even to enumerate the extensions present augment this baseline functionality. on a user’s browser). Indeed, because the viewpoint of Notably, browser extensions enable customization not the Web service provider is limited to the Document Ob- only with respect to visual appearance (e.g., by chang- ject Model (DOM) there is no shared language by which ing the look and feel of the browser), but also on a deep they can crisply share threat intelligence with browser behavioral level (i.e., in the way the browser interacts vendors or extension marketplaces. with Web sites). Browsers enable the latter functional- In this paper, we have started to bridge this gap be- ity by allowing extensions to use a set of permissions tween the Web site and browser vantage points, which we that Web sites do not normally have. For example, ex- believe will enable more effective interventions against tensions are capable of modifying HTTP headers, by- the threat of MBEs. In particular, we examine MBEs passing the Content Security Policy (CSP) [13] set by from the perspective of Facebook — which, among oth- Web site owners and hiding the results of any actions by ers, is extensively targeted by such extensions (e.g. [7]). rewriting Web site content before it is displayed. These We describe our approach for automatically collecting browser extensions of interest, detecting the malicious opportunities to enable malicious behavior. For example, ones (within seconds of reaching Facebook’s infrastruc- extensions can violate typical cross-site request forgery ture) and then working to remove such extensions from (CSRF) or cross-site scripting (XSS) protections, inject our customer ecosystem. In particular, this paper de- arbitrary code in a page’s DOM, rewrite its content, and scribes the following contributions: access Web traffic as a page is being loaded (including all cookies and POST parameters). Indeed, the permissions • A methodology for collecting browser extensions available are sufficiently powerful that they can even pre- from devices suspected of malware compromise. vent the user from removing an extension once loaded. • A methodology for automated labeling of malicious Users frequently load browser extensions via online extensions using indicators we extract from the col- marketplaces (e.g., the Chrome Web store), which try to lected samples and threat indicators previously as- vet both code and authors, and remove extensions that are sociated with abusive behavior. clearly abusive. However, browsers also allow a range of • Deploying this methodology at Facebook, we iden- alternate “sideloading” options including manual instal- tify more than 1700 malicious Chrome and Fire- lation, operating system administrative policies, and na- fox extensions1 out of a total of more than 34000 tive binaries. While browser vendors are actively reduc- scanned extensions during a 6-week period span- ing such sideloading opportunities, attackers have shown ning late 2016 into 2017. great creativity in bypassing ad hoc limits. Moreover, even when the browser is configured to prevent sideload- • We show that existing anti-malware and anti-abuse ing, we have observed one class of malware (BePush) mechanisms only offer limited effectiveness against enabling sideloading by simply installing an older ver- MBEs, addressing a small fraction of the samples sion of the browser that lacks such protections. we detect (and with far greater delay when they do). These issues have been understood for some time, with The remainder of this paper is organized as follows: Dhawan and Ganapathy identifying early malicious ex- Section2 provides an overview of browser extensions tensions in 2009 and proposing techniques to protect and related work; Section3 outlines Facebook’s ap- against them [3]. Researchers have shown similar prob- proach for detecting compromised user accounts, and lems exist in modern browsers [2,5,9, 12] and large- collecting and analyzing browser extensions; Section4 scale empirical measurements [8] and operational expe- describes the automated extension labeling system, and rience [7] show that malicious browser extensions are a Section5 evaluates it, characterizing the volume of ex- widespread problem. Much of the existing research in tensions Facebook deals with and system behavior over this space has focused on how to either better harden the time; Section6 motivates the need for the new labeling browser [4,6,9,10] or to provide a better mechanism for system; and Section7 concludes. vetting code in extension marketplaces [1]. Our work builds on ideas from these prior efforts. Our 2 Background approach is data driven, like the work of Jagpal et al. [7] and Kapravelos et al. [8], but is based on static analy- Browser authors have tried to provide as dynamic and sis using Facebook’s own contemporaneous threat indi- programmable experience as possible, but inevitably cator data (e.g., abusive domains / URLs) to label exten- some applications have required capabilities beyond sions. This allows our approach to be browser-agnostic those provided via standard Web programming inter- and adapt quickly to changes in the kinds of abuse be- faces. To address this need, virtually all major browsers ing perpetrated on Facebook. Of course, Rice’s theorem support extension interfaces. Extensions written to these says there is no way to figure out whether a piece of code interfaces are allowed to execute code, interact with the will be malicious, so there is no way to make a promise to browser core and initiate network calls — all indepen- catch every MBE. We further describe a soup-to-nuts op- dent of particular Web pages being viewed. While exten- erational workflow — including how we obtain samples, sions can use a variety of technologies and languages, for process and label them, and remediate affected users. this paper we will be focusing on HTML and JavaScript (JS) which are used predominantly in the development of 3 Collecting browser malware Chrome and Firefox extensions. Because browser extensions are given permission to Facebook has collected more than 1 700 unique mali- interact with the browser in manners that would other- cious samples over the 6-week analysis period.2 Natu- wise be classified as “high-risk” [9], there is a range of rally, manual analysis of extensions at this scale is in- 1While most recent browsers support extensions (e.g. Safari, Opera, feasible, so Facebook has developed new techniques to Internet Explorer, etc.), we focus on Chrome and Firefox since visitors automate the collection and analysis of samples. using these two browsers constitute 54% and 13% of browser traffic respectively seen per day at Facebook. 2Uniqueness is based on MD5 hashes of extension contents.