
Malicious Browser Extensions at Scale: Bridging the Observability Gap between Web Site and Browser

Louis F. DeKoven¹, Stefan Savage¹, Geoffrey M. Voelker¹, and Nektarios Leontiadis²
¹University of California, San Diego   ²Facebook

Abstract

Browser extensions enhance the user experience in a variety of ways. However, to support these expanded services, extensions are provided with elevated privileges that have made them an attractive vector for attackers seeking to exploit Internet services. Such attacks are particularly vexing for the sites being abused because there is no standard mechanism for identifying which extensions are running on a user's browser, nor is there an established mechanism for limiting the distribution of malicious extensions even when identified.

In this paper we describe an approach used at Facebook for dealing with this problem. We present a methodology whereby users exhibiting suspicious online behaviors are scanned (with permission) to identify the set of extensions in their browser, and those extensions are in turn labelled based on the threat indicators they contain. We have employed this methodology at Facebook for six weeks, identifying more than 1700 lexically distinct malicious extensions. We use this labelling to drive user device clean-up efforts as well as to report to anti-malware and browser vendors.

1 Introduction

Today, Web browsers encapsulate dynamic code, interact with users and are implicated in virtually every activity performed on computers, from e-mail to game playing. While some of these activities have been made possible by enhancements to the standard languages and capabilities supported by the browsers themselves, many others are made possible via browser extensions designed to augment this baseline functionality.

Notably, browser extensions enable customization not only with respect to visual appearance (e.g., by changing the look and feel of the browser), but also on a deep behavioral level (i.e., in the way the browser interacts with Web sites). Browsers enable the latter functionality by allowing extensions to use a set of permissions that Web sites do not normally have. For example, extensions are capable of modifying HTTP headers, bypassing the Content Security Policy (CSP) [13] set by Web site owners, and hiding the results of any actions by rewriting Web site content before it is displayed. These capabilities allow extensions to offer complex and rich modifications to the user experience and support the implementation of services that would otherwise be impossible to implement. However, these same capabilities can provide a powerful vehicle for performing malicious attacks [3]. Unsurprisingly, this problem has evolved from one of abstract potential into a concrete threat, and today the problem of Malicious Browser Extensions (MBE) is widely understood to be real and growing [3,7,8,10,11]. We consider MBEs to be extensions that take actions on behalf of a user without their consent, or that replace Facebook's key functionality or content.

Unfortunately, detecting MBEs is challenging because the malicious nature of a given extension can manifest dynamically, and the online targets of its abuse have no natural way to attribute those behaviors back to particular extensions. More concretely, while a browser vendor or extension marketplace is in a position to inspect extension code, inferring malicious intent may not be possible from that vantage point. In addition to the traditional challenges with such code analysis approaches (e.g., polymorphic encoding), extensions routinely fetch resources from third-party sites and, as a result, an extension may only exhibit malicious actions at certain times or when certain Web services are visited. Conversely, from the vantage point of a targeted Web service, abusive actions may be clear, but the source of those actions can be murky. Extensions frequently hide by emulating a normal user's interactions, and there are no standard mechanisms to link browser actions back to a particular extension (or even to enumerate the extensions present on a user's browser). Indeed, because the viewpoint of the Web service provider is limited to the Document Object Model (DOM), there is no shared language by which they can crisply share threat intelligence with browser vendors or extension marketplaces.

In this paper, we have started to bridge this gap between the Web site and browser vantage points, which we believe will enable more effective interventions against the threat of MBEs. In particular, we examine MBEs from the perspective of Facebook — which, among others, is extensively targeted by such extensions (e.g., [7]). We describe our approach for automatically collecting browser extensions of interest, detecting the malicious ones (within seconds of reaching Facebook's infrastructure), and then working to remove such extensions from our customer ecosystem. In particular, this paper describes the following contributions:

• A methodology for collecting browser extensions from devices suspected of malware compromise.

• A methodology for automated labeling of malicious extensions using indicators we extract from the collected samples and threat indicators previously associated with abusive behavior.

• Deploying this methodology at Facebook, we identify more than 1700 malicious Chrome and Firefox extensions out of a total of more than 34000 scanned extensions during a 6-week period spanning late 2016 into 2017. (While most recent browsers support extensions, we focus on Chrome and Firefox since visitors using these two browsers constitute 54% and 13%, respectively, of daily browser traffic seen at Facebook.)

• We show that existing anti-malware and anti-abuse mechanisms offer only limited effectiveness against MBEs, addressing a small fraction of the samples we detect (and with far greater delay when they do).

The remainder of this paper is organized as follows: Section 2 provides an overview of browser extensions and related work; Section 3 outlines Facebook's approach for detecting compromised user accounts, and collecting and analyzing browser extensions; Section 4 describes the automated extension labeling system, and Section 5 evaluates it, characterizing the volume of extensions Facebook deals with and system behavior over time; Section 6 motivates the need for the new labeling system; and Section 7 concludes.

2 Background

Browser authors have tried to provide as dynamic and programmable an experience as possible, but inevitably some applications have required capabilities beyond those provided via standard Web programming interfaces. To address this need, virtually all major browsers support extension interfaces. Extensions written to these interfaces are allowed to execute code, interact with the browser core, and initiate network calls — all independent of the particular Web pages being viewed. While extensions can use a variety of technologies and languages, in this paper we focus on HTML and JavaScript (JS), which are used predominantly in the development of Chrome and Firefox extensions.

Because browser extensions are given permission to interact with the browser in manners that would otherwise be classified as "high-risk" [9], there is a range of opportunities to enable malicious behavior. For example, extensions can violate typical cross-site request forgery (CSRF) or cross-site scripting (XSS) protections, inject arbitrary code into a page's DOM, rewrite its content, and access Web traffic as a page is being loaded (including all cookies and POST parameters). Indeed, the permissions available are sufficiently powerful that they can even prevent the user from removing an extension once loaded.

Users frequently load browser extensions via online marketplaces (e.g., the Chrome Web Store), which try to vet both code and authors, and remove extensions that are clearly abusive. However, browsers also allow a range of alternate "sideloading" options including manual installation, operating system administrative policies, and native binaries. While browser vendors are actively reducing such sideloading opportunities, attackers have shown great creativity in bypassing ad hoc limits. Moreover, even when the browser is configured to prevent sideloading, we have observed one class of malware (BePush) enabling sideloading by simply installing an older version of the browser that lacks such protections.

These issues have been understood for some time, with Dhawan and Ganapathy identifying early malicious extensions in 2009 and proposing techniques to protect against them [3]. Researchers have shown that similar problems exist in modern browsers [2,5,9,12], and large-scale empirical measurements [8] and operational experience [7] show that malicious browser extensions are a widespread problem. Much of the existing research in this space has focused on how to either better harden the browser [4,6,9,10] or provide a better mechanism for vetting code in extension marketplaces [1].

Our work builds on ideas from these prior efforts. Our approach is data driven, like the work of Jagpal et al. [7] and Kapravelos et al. [8], but is based on static analysis using Facebook's own contemporaneous threat indicator data (e.g., abusive domains and URLs) to label extensions. This allows our approach to be browser-agnostic and to adapt quickly to changes in the kinds of abuse being perpetrated on Facebook. Of course, by Rice's theorem no analysis can decide in general whether a piece of code is malicious, so no system can promise to catch every MBE. We further describe a soup-to-nuts operational workflow — including how we obtain samples, process and label them, and remediate affected users.

3 Collecting browser malware

Facebook has collected more than 1700 unique malicious samples over the 6-week analysis period, where uniqueness is based on MD5 hashes of extension contents. Naturally, manual analysis of extensions at this scale is infeasible, so Facebook has developed new techniques to automate the collection and analysis of samples.
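Sample uniqueness above is defined by an MD5 hash over extension contents. As a minimal sketch (the function name and the directory-walking scheme are illustrative assumptions, not Facebook's implementation), a content-based identity for an unpacked extension could be computed as:

```python
import hashlib
from pathlib import Path

def sample_fingerprint(extension_dir: str) -> str:
    """Hash an unpacked extension's contents (relative paths plus file
    bytes) into one stable MD5 fingerprint, so lexically identical
    samples collapse to a single identity regardless of which device
    they were collected from."""
    digest = hashlib.md5()
    for path in sorted(Path(extension_dir).rglob("*")):
        if path.is_file():
            # Include the file's relative path so renamed files change the hash.
            digest.update(path.relative_to(extension_dir).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

Sorting the file list makes the fingerprint independent of filesystem enumeration order.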


Figure 1: An overview of our system highlighting the detection, malware scanner, and static analysis steps. The dashed arrows describe normal user interaction, and solid arrows are transitions within the described system.
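The scan step summarized in Figure 1 — report a sample's hash, upload it only if it has never been seen, and remove it when the verdict is malicious — can be sketched as follows. The `backend` interface and return values are hypothetical illustrations, not Facebook's real API:

```python
import hashlib

def scan_extension(sample_bytes: bytes, backend) -> str:
    """Sketch of the client-side scan loop: report the sample's MD5,
    upload the sample only if the backend has never seen it, then act
    on the verdict. `backend` is an assumed interface exposing
    verdict_for(md5) -> "MALICIOUS" | "UNKNOWN" | "NEW" and
    upload(md5, data)."""
    md5 = hashlib.md5(sample_bytes).hexdigest()
    verdict = backend.verdict_for(md5)
    if verdict == "NEW":
        # Never-seen sample: upload for real-time server-side analysis.
        backend.upload(md5, sample_bytes)
        verdict = backend.verdict_for(md5)
    if verdict == "MALICIOUS":
        # A real scanner would start a cleanup routine here.
        return "REMOVED"
    return "KEPT"
```

Hashing first keeps the common case (a known extension) to a single cheap round trip.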

In this section we describe some of the ways Facebook detects malware-compromised user accounts. We further outline what happens after detecting such accounts and, specifically, how we collect and analyze the responsible malware samples from malware victims' computers via a custom malware scanner. Finally, we report on the initial analysis we perform on the collected extensions.

3.1 Detecting compromised user accounts

The process of acquiring new malware samples starts by detecting user accounts suspected of being compromised with malware. At a high level, this process is guided by the clustering and classification systems (shown in Figure 1) using as input (i) signals of abnormal activity, (ii) client-side third-party injected code in Facebook's DOM, and (iii) user-reported objectionable content. While the detailed process of detecting malware-compromised accounts is mainly beyond the scope of this work, in the following paragraphs we present some examples of related signals.

Negative Feedback. In the event a user account is compromised with malware, the malware may attempt either to use the compromised account for monetization — e.g., by posting links that redirect to ad-filled pages — or to spread the infection by posting links to malware. The latter usually happens via clickbait. In either case, Facebook users have the ability to report the content as objectionable (e.g., abusive, malicious, etc.), and links to such content may eventually get blacklisted.

Spiking Content. Facebook's real-time abuse detection systems monitor the time series of user activity to detect anomalies based on diurnal patterns of normal activity. Such anomalies fall into two high-level categories: anomalies that can be remediated automatically, and ones that need an analyst to examine and take action. An example of the latter case would be auto-generated objectionable content being shared on Facebook (e.g., adult content) with similar characteristics, e.g., directing viewers to the same external domain that results in a drive-by malware infection. In such a case, the analyst would typically add the related domains to a blacklist and enqueue the users participating in such activity into a malware cleanup flow. Anomalies that can be auto-remediated are either simple anomalies that make use of other high-quality signals (e.g., spiking negative feedback) or anomalies that have been seen previously and whose responses are already codified.

DOM-based indicators. Facebook uses client-side code to self-inspect its own rendered DOM for injected third-party code. One challenge with client-side code is that an MBE may attempt to prevent such code from running. In the event third-party code is discovered, code-specific features are analyzed by Facebook's clustering and classification systems. When features related to malware are identified, users' devices containing such features may get enrolled into a malware cleanup flow.

3.2 Malware scanner and cleanup

Once Facebook identifies an account suspected of having been compromised by malware, the account may be enrolled in a process that is capable of detecting and remediating malware via an online scan session (if the account was recently enrolled, it may not be re-enrolled). This process is shown in Figure 1 under "Malware Cleanup". Following user consent (see Figure 2), the user downloads a one-time malware scanner that runs on the potentially compromised system. If user consent is not provided, the user can continue to access their account using other devices. After a cool-down period the potentially-infected device is allowed to access Facebook again.

Figure 2: The user consent prompt explaining actions the Facebook scanner will take if the user agrees. In this instance the scanner is paired with a third-party scanner responsible for removing other types of infections.

Once the scanner process starts, it inspects locations on the file system known to hold Firefox and Chrome extensions. For each observed sample, the scanner communicates the file hash with Facebook's infrastructure, which in turn provides a verdict on whether Facebook believes the sample is malicious. Suspicious extensions or files that have not been seen before (e.g., based on their extension ID) are uploaded to Facebook's infrastructure for real-time analysis. When Facebook's server-side infrastructure indicates to the scanner that a sample is malicious, the scanner attempts to start a cleanup routine that removes the offending sample.

3.3 Static analysis

After we collect and store samples on ThreatExchange (https://developers.facebook.com/products/threat-exchange) — Facebook's threat intelligence infrastructure — and while the malware scanner is still running on the user's device, we initiate a static analysis pipeline that extracts threat intelligence from the samples.

Our decision to use static rather than dynamic analysis is based on the understanding that the specific malicious extensions being analyzed are already exhibiting their malicious behavior at collection time. Consequently, we do not have to overcome issues of, e.g., time-gating that a dynamic analysis would be helpful for [7]. Although we execute several distinct analysis functions, the following three are relevant to this paper.

Unpacking. We start by unpacking the sample. Then, we recursively schedule analysis for any files contained within the extracted object. This function can be unsafe in the case where the archive is compressed and has malicious intent, which we handle via sanity checks, such as a limited recursion depth.

Indicator extraction. We attempt to extract threat indicators from each potential malware sample without parsing binaries or code. Instead, we treat each file as a plain text document. This approach, although naive, still generates actionable intelligence from each sample. We use a series of regular expressions to extract Uniform Resource Locators (URLs), IP addresses, domain names, cryptographic hashes, browser extension IDs, and email addresses. In addition, we attempt to deobfuscate, decompress, decode, and otherwise clean up the code contained in the extensions we analyze. We also make a reasonable effort to repair broken URLs and other malformed data. Once we have the set of initially extracted indicators, we make a second pass, but this time on the extracted data. During this pass we attempt to find more indicators using the type of the original indicator as a clue. For example, for URL indicators that contain API keys, the first pass extracts the URL and the second pass extracts the API key from the URL.

External sharing. We share the full collected samples with ThreatExchange and VirusTotal only if either of the following two conditions is met: (i) the number of users having the specific sample is beyond a specific threshold, or (ii) in the case of Chrome extensions, the extension is live on the Chrome store. We are able to detect the latter by constructing and accessing a URL that points to the extension on the Chrome Web store.

4 Browser extension labeling

In this section we describe our methodology for labeling browser extensions. Labels represent a status of maliciousness, and we assign extensions one of two values in decreasing order of severity:

• MALICIOUS samples are those deemed with high confidence to be malicious. The malware scanner described in Section 3.2 will subsequently remove them when users agree to an anti-virus scan.

• UNKNOWN is the default status for all samples for which we do not have a definitive opinion.

We describe the rules our system uses to propagate labels from individual indicators all the way to entire extensions, and also how changing labels propagate through the system. While the system automatically extracts indicators and propagates labels, there are some situations where traditional manual analysis still plays a role, and we end by discussing how the system incorporates input from analysts.

4.1 Automated extension labeling

We start by assigning high-quality labels to individual threat indicators (e.g., URLs). These indicators come primarily from the system responsible for identifying spam activity, as described in Section 3.1, which the labeling system assumes to be ground truth. In essence, the malware labeling process is designed to apply these vetted threat labels onto the indicators extracted from samples via the static analysis pipeline (Section 3.3).

All indicators receive an initial label, but Facebook also maintains a feedback process to flag and re-evaluate them over time if it learns new information. As a result, a URL erroneously marked as MALICIOUS, for example, will be appropriately re-labeled. This update will then automatically propagate to the relevant samples, which will subsequently be queued for re-labeling.

4.1.1 Propagating maliciousness labels

At a high level, our automated browser extension labeling system operates under the basic assumption that if a text file (e.g., a JS file) contains indicators marked as MALICIOUS in the ground truth data, then we can deterministically propagate this label to the containing file. Furthermore, if a file labeled as MALICIOUS is part of a container (e.g., a browser extension), then we can deterministically propagate that label to the container. For example, if the URL http://www.example.com/evil. is considered MALICIOUS, and a file background.js contains this URL, then the file will be labeled as MALICIOUS. And if a Chrome extension goats.crx contains background.js, then the extension will also be labeled as malicious.

In practice, there are also cases that require an explicit policy decision on how to propagate labels. Although the policies we have chosen may introduce noise into the analysis, our experience has been that overall the system has a sufficient number of strong indicators that it overcomes that noise when it ultimately labels extensions.

Shared resources. If an indicator represents a shared resource — e.g., an IP address used as a Network Address Translation (NAT) gateway — it can be used by both benign and bad actors concurrently. In this case, labeling a file that contains the named IP address as MALICIOUS would be equivalent to erroneously marking all traffic originating from that IP as MALICIOUS. For simplicity of implementation, our policy is to still propagate labels even on indicators for shared resources, rather than to try to identify and differentiate between shared and non-shared situations.

Inactive code. Another example is inactive blocks of code referencing MALICIOUS indicators. If the malicious block of code is not executable, why label the file as MALICIOUS? Our policy is to still propagate the label from indicators on inactive code to the containing object. We argue that, if an actor has the capability to add any type of code into a file, then they may also have the ability to activate previously inactive malicious code.

Gating. Finally, indicators with geographically or temporally gated malice have the potential of erroneously labeling samples when the labeling action occurs outside such boundaries. For geographic gating, our policy is to disregard the boundary and apply globally the label of the indicators with the highest severity. If any users experience malicious behavior, our goal is to protect all users. However, temporal gating requires more attention, specifically for indicators that have been malicious in the past but, after re-evaluation, we can positively characterize as not malicious.

4.1.2 Cleaning up false positives

The system needs to react quickly and automatically to the discovery of false positives. If the system incorrectly labels an extension as MALICIOUS, then the extension will be removed by the malware scanner the next time devices with the extension are scanned.

Using the same rule engine used for propagating labels from indicators to files and extensions, we also create a set of rules to automate correcting false positives. Specifically, if an indicator changes status from MALICIOUS to a lesser severity (e.g., UNKNOWN), (i) we identify all malware samples containing the indicator, (ii) we filter out all samples that have any status other than MALICIOUS, and (iii) for the remaining samples we set their status to UNKNOWN if they do not have any remaining MALICIOUS indicators. Similarly, when a MALICIOUS sample receives a new, less severe status, we re-compute the status of its containers by applying the same set of rules used for updating indicators.

4.1.3 Known false positives

Throughout the 6-week measurement period our system collected over 34k unique extensions, of which 124 are known to have been incorrectly labeled as MALICIOUS. Additionally, the median time to identify a false positive is 18 days. As a result, in 0.8% of total scan sessions, our system removed one or more of these 124 extensions erroneously. After the extension is removed, the user can re-install the extension and likely will not be re-enrolled in the malware cleanup process described in Section 3.2. We consider this number of false positives to be small and of an acceptable magnitude.

4.2 Manual labeling

While we consider our automated MBE labeling system highly effective, there are also cases where a threat analyst may need to manually examine a sample to decide its status. Such cases include: (i) Suspicious extensions with highly obfuscated code that circumvents the static analysis pipeline's ability to extract threat indicators. (ii) Suspicious samples that may contain evolved adversarial capacity to bypass our detection capabilities. Even if the sample was correctly labeled as MALICIOUS, in such cases analysts are responsible for examining the samples and communicating their findings within Facebook. (iii) Malicious indicators or samples that are responsible for labeling multiple samples within a short period of time and beyond a certain alerting threshold. Such cases are indicative of a commonly reused, erroneously labeled MALICIOUS sample, and the manual analysis step will prevent many false positives.

Any threat status manually applied by an analyst always dominates labels originating from the automated systems. Therefore, such systems are configured to never overwrite an analyst-originating threat label.

4.3 A real world example

To make this process more concrete, we conclude with an end-to-end example of a botnet that targets Facebook users to disseminate malicious content. A user's browser becomes infected when the MBE (e.g., MD5 a369ecc2e8ca5924ddf1639993ffa3aa) is installed via the Chrome Web Store (it is unknown whether installation is a result of user choice or of another attack vector). Once installed, the extension monitors all Web content that the user accesses by sending the URLs to a command-and-control (C&C) server. Additionally, the extension periodically requests from a C&C server remote resources that are executed in the user's browser. Most of the time, the resources do nothing, allowing the MBE to appear benign until the botnet operator initiates an attack. This behavior helps explain why VirusTotal's 57 anti-virus engines consider the extension to be non-malicious, and why it was on the Chrome Web Store.

When active, the MBE manipulates Facebook's DOM with side effects that are detectable both by the DOM scanner and by users themselves (who subsequently reported issues to Facebook). As a result, when the botnet began executing malicious payloads, these side effects provided the first signals of its existence to our system, which resulted in automated classification of the sample as MALICIOUS.

Figure 3 shows the detection and remediation of this MBE over time. The x-axis shows the number of days after the MBE was first detected by Facebook. The y-axis, normalized by the number of devices scanned daily, shows both the proportion of scanned devices detected as being infected, and the proportion of devices with the MBE cleaned from their system. The lag between when an indicator is detected on a device and when the device performs malware cleanup is due to the two processes being independent. Detection peaks in less than a week, while Facebook's malware scanner actively cleaned infected devices. After two weeks, almost all extensions had been removed from the browsers of Facebook's users.

Figure 3: Daily proportion of user devices detected with a DOM-based indicator of the botnet, and the proportion of user devices that have the botnet remediated.

5 System evaluation

We now evaluate our system for automatically labeling malicious browser extensions using extension data collected over a period of six weeks, spanning the end of 2016 through early 2017. We start by characterizing the volume of data our infrastructure processes, focusing on Chrome and Firefox extensions. The volume underscores the need for an automated system. We then show the system in operation and its behavior over time.

5.1 Extensions collected

Table 1 shows a high-level breakdown of the browser extensions we collected over the six-week period. Overall, Facebook's malware scanner collected a total of 34559 distinct browser extensions from 741k distinct scan sessions. We uniquely identify a browser extension by its XPI identifier for Firefox extensions, and by its CRX identifier for Chrome extensions. Extensions are more popular among Chrome users, as the majority of collected distinct extensions (67.6%) came from Chrome.

As shown in Table 1, throughout the six-week period our system extracted more than 85000 unique HTML and JS files, with 79.5% of them originating from Chrome extensions. Note that a small number of JS files (454 in total) appear in both Chrome and Firefox extensions, and are cases of libraries like jQuery commonly shared among JS-based applications. Additionally, three HTML files appear in both Chrome and Firefox extensions, and are related to the Potentially Unwanted Program (PUP) Conduit.A.

Among the collected extensions, our infrastructure extracted a total of 73281 unique indicators. Most of these indicators (90%) were embedded in samples extracted from Chrome extensions. As with the JS files, 9398 indicators overlap across the two types of extensions due to common references to certain resources like domains, URLs, and email addresses.

                     All extensions    Malicious ext.      Extension contents    Extracted indicators         Scan sessions
                     #        %        #      % of total   JS       HTML         Total #   Malicious (#/%)    #        %
Chrome extensions    23 376   67.6     1 697  7.3          67 380   720          66 134    1 559 (2.4%)       718 497  96.9
Firefox extensions   11 183   32.4     88     0.8          17 979   16           19 004    609 (3.2%)         257 164  34.7
Total unique         34 559   100.0    1 785  5.2          84 905   733          73 281    1 516 (2.1%)       741 276  100.0
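The two-pass indicator extraction described in Section 3.3 can be illustrated with a small sketch. The regular expressions and the function below are simplified assumptions for exposition, not the production patterns:

```python
import re

URL_RE = re.compile(r"https?://[^\s'\"<>]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Second-pass clue: URLs that embed API keys yield an extra indicator.
KEY_RE = re.compile(r"[?&](?:api_?key|key)=([\w-]+)")

def extract_indicators(text: str) -> set[str]:
    """Treat the sample as plain text; first pass pulls out URLs and
    email addresses, second pass re-scans each extracted URL for
    embedded API keys, using the indicator's type as a clue."""
    found = set(URL_RE.findall(text)) | set(EMAIL_RE.findall(text))
    for url in set(found):
        for key in KEY_RE.findall(url):
            found.add(key)
    return found
```

Because the input is never parsed as code, the same pass works on JS, HTML, or opaque blobs alike.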

Table 1: Collected browser extensions broken down by browser name, status, contained samples and indicators, and by number of scan sessions reporting a specific type of extension. A scan session may collect both Firefox and Chrome extensions if both browsers are present on a given machine, and thus these percentages add up to more than 100%.
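The deterministic bottom-up propagation of Section 4.1.1 (indicator to file, file to container) can be sketched as follows; the inputs are simplified stand-ins for the real indicator store:

```python
def label_extension(extension_files: dict[str, str],
                    malicious_indicators: set[str]) -> str:
    """Bottom-up label propagation: a file containing any MALICIOUS
    indicator becomes MALICIOUS, and a container holding any
    MALICIOUS file becomes MALICIOUS; everything else stays UNKNOWN."""
    for name, text in extension_files.items():
        if any(ind in text for ind in malicious_indicators):
            # The file's label deterministically trickles up to the container.
            return "MALICIOUS"
    return "UNKNOWN"
```

For example, an extension whose background.js contains a blacklisted URL is labeled MALICIOUS, mirroring the goats.crx example in Section 4.1.1.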

150 23376 total extensions to have been on ’s Web Chrome extensions Store at least once. The high proportion of extensions Firefox extensions

likely installed via sideloading (90.6%) is not surpris- 100 ing as Facebook’s malware scanner runs on devices sus- pected of being infected, and Google removes extensions 50 they consider malicious from the Web Store. # of malicious extensions 5.2 Malicious extensions detected 0 0 10 20 30 40 We now consider the methodology for automated MBE

[Figure 4: The number of unique extensions labeled as malicious each day of the six-week measurement period.]

Of the 34559 extensions from this period, we classified 5.2% of them as malicious. As expected, attackers clearly target Chrome more often. Of the 11183 Firefox extensions, only 0.8% are labeled malicious; yet of the 23376 Chrome extensions, 7.3% are malicious. This bias naturally reflects browser market share, as Facebook sees predominantly more traffic from Chrome and attackers concentrate on the platform most popular with users. Of the malicious Chrome extensions identified, 24.9% have been accessible on the Web Store at least one time throughout the measurement period.

The small portion of extracted threat indicators labeled malicious in Table 1 (2.1% of all extracted indicators) highlights the effectiveness of our labeling methodology in trickling up known badness. The malicious indicators used to label MBEs are primarily domains and URIs, with the exception of a single email address that resulted in labeling one Chrome extension as malicious.

Figure 4 shows the behavior of the automated labeling system over time as it detects and labels MBEs. Each point represents the number of new extensions labeled as malicious on a given day (x-axis), even if the extension was first seen on a different day. The spike spanning days 32–35 is linearly correlated with the fluctuation in the number of users clearing the malware checkpoint over the same period. On an average day the system labels 39.5 (median: 37) Chrome extensions and 2 (median: 1) Firefox extensions as malicious.

In general, identifying new malicious extensions is immediate: for over 90% of newly-collected browser extensions, the system labels them as MALICIOUS with a median time of 21 seconds after collection. However, some extensions are initially labeled benign and are only later discovered to be malicious when their embedded indicators are associated with abusive behavior. In our measurement period, we found only 143 (8.0%) extensions that are eventually labeled MALICIOUS more than one day after they are first collected, and these extensions are found on ≈ 9% of all users cleaned during the measurement period. Delayed discovery is expected with an indicator-based labeling system, as the status of an indicator can change over time, and we consider the number to be acceptably low for an operational system.

6 Evaluating alternatives

The system evaluation shows that Facebook's MBE labeling is effective at detecting, labeling, and cleaning malicious extensions. A related question is whether it is necessary to create a new system to perform this task. Next we evaluate alternatives to underscore the need for developing a new system to protect Facebook and its users from large-scale abuse via browser extensions. For this evaluation, we focus on Chrome extensions since they dominate what we encounter on users' devices. In particular, we examine 2200 extensions once available as "public" or "unlisted" on the Chrome Web Store, of which Facebook labeled 422 (19.2%) as malicious and 1778 (80.8%) as unknown. Recall that these are extensions from users that exhibited suspicious activity on Facebook and triggered an anti-virus scan, so we would expect a greater concentration of malicious extensions in this smaller set.

6.1 VirusTotal

We first use VirusTotal to evaluate whether Facebook can use general databases of malware to detect malicious extensions. VirusTotal is a popular online system owned by Google that analyzes malware files using a suite of 57 anti-virus products, and reports which A/V products label a file as malicious (if any).
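A baseline check of this kind can be sketched against VirusTotal's current v3 REST API. The endpoint and x-apikey header below follow VirusTotal's public documentation (the API available at the time of the paper's measurements differed); the helper functions and their names are our own illustration, not part of the system described here.

```python
import json
import urllib.error
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"

def fetch_report(file_hash, api_key):
    """Fetch a VirusTotal report for a file hash; None means VT has never seen it."""
    req = urllib.request.Request(VT_FILE_URL.format(file_hash),
                                 headers={"x-apikey": api_key})
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # hash unknown to VirusTotal
            return None
        raise

def detections(report):
    """Count the A/V engines flagging the file as malicious (0 if unseen)."""
    if report is None:
        return 0
    return report["data"]["attributes"]["last_analysis_stats"]["malicious"]
```

In these terms, an extension's CRX counts as "known" if fetch_report returns any report at all, and as "detected" if detections(...) is nonzero; those are the two rates measured below.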
We initially use the set of new extensions publicly available on the Chrome Web Store overlapping with our measurement period. The authors of the Hulk system [8] kindly shared these extensions with us, totaling 9172 unique CRXs. As a baseline, we submitted the shared public extensions their system collected to VirusTotal. VirusTotal was aware of only 73 (0.8%) of them, and considered only 5 (0.1%) malicious.

Additionally, of the 422 MALICIOUS extensions as labeled by Facebook, only 22.7% are identified as malicious by one or more anti-virus engines. We conclude that a general malware database like VirusTotal is insufficient for detecting MBEs for sites like Facebook.

6.2 Chrome Web Store

Google also has a vested interest in maintaining the health of the Chrome extension ecosystem, and therefore also actively removes extensions that it determines to be malicious. Since another option for Facebook would be to rely upon Google's efforts, as a final step we quantify the benefits that Facebook's MBE labeling system provides beyond what Google provides to all Chrome users.

When an extension is removed from the Chrome store, we conservatively assume that the extension was removed because Google considered it malicious. Since extensions may be removed for other reasons (e.g., developers removing their own extensions), this represents an upper bound on Google's detection capability. By the end of the measurement period, Google removed 367 of the 9172 extensions from the Chrome store (70 MALICIOUS and 297 UNKNOWN according to our labels).

In addition to cleaning up malicious extensions, another goal of our MBE labeling system is to reduce the time that they are active and profitable to attackers. Using the public user counts listed on the Web Store, we estimate that these 70 malicious extensions have been installed 1009806 times. Of the 70 MBEs, Facebook always labels the extensions as malicious before Google removes them, with a median difference of 67.3 hours, thus reducing the median monetization window of malicious extensions by over 2.8 days.

7 Conclusions

Malicious extensions are a vexing problem, and one that is challenging to address from any single vantage point. While browser vendors are in a position to restrict which extensions are distributed and, in principle, which extensions may be installed, they have limited insight into which extensions act abusively in the wild. Indeed, some extensions' malicious code is only loaded at run-time, and even then may only be activated for particular sites. Conversely, abused sites directly experience malicious behaviors, but they are not in a position to identify which extensions are implicated in a given attack because this information is not available through the Web interface. In over six weeks of deployment at Facebook, our system has identified more than 1700 malicious Chrome and Firefox extensions. Comparing our findings with both contemporaneous anti-malware detections (as reflected in VirusTotal) and takedowns from the Chrome Web Store reveals a considerable detection gap in the existing abuse ecosystem. We hope that by highlighting this issue and sharing our data we can encourage a broader and more collaborative focus on this under-addressed attack vector.^6

^6 MD5 hashes of the 422 identified Chrome MBEs available in VirusTotal and ThreatExchange: https://pastebin.com/nzVGPLnr

References

[1] S. Bandhakavi, S. T. King, M. Parthasarathy, and M. Winslett. Vetting Browser Extensions for Security Vulnerabilities with VEX. In Proc. of USENIX Security, 2010.
[2] N. Carlini, A. P. Felt, and D. Wagner. An Evaluation of the Google Chrome Extension Security Architecture. In Proc. of USENIX Security, 2012.
[3] M. Dhawan and V. Ganapathy. Analyzing Information Flow in JavaScript-based Browser Extensions. In Proc. of ACSAC, 2009.
[4] V. Djeric and A. Goel. Securing Script-Based Extensibility in Web Browsers. In Proc. of USENIX Security, 2010.
[5] M. Finifter, J. Weinberger, and A. Barth. Preventing Capability Leaks in Secure JavaScript Subsets. In Proc. of NDSS, 2010.
[6] A. Guha, M. Fredrikson, B. Livshits, and N. Swamy. Verified Security for Browser Extensions. In Proc. of IEEE S&P, 2011.
[7] N. Jagpal, E. Dingle, J. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas. Trends and Lessons from Three Years Fighting Malicious Extensions. In Proc. of USENIX Security, 2015.
[8] A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson. Hulk: Eliciting Malicious Behavior in Browser Extensions. In Proc. of USENIX Security, 2014.
[9] L. Liu, X. Zhang, G. Yan, and S. Chen. Chrome Extensions: Threat Analysis and Countermeasures. In Proc. of NDSS, 2012.
[10] M. T. Louw, J. S. Lim, and V. N. Venkatakrishnan. Enhancing web browser security against malware extensions. Journal in Computer Virology, 2008.
[11] H. Shahriar, K. Weldemariam, T. Lutellier, and M. Zulkernine. A Model-Based Detection of Vulnerable and Malicious Browser Extensions. In Proc. of SERE, 2013.
[12] J. Wang, X. Li, X. Liu, X. Dong, J. Wang, Z. Liang, and Z. Feng. An Empirical Study of Dangerous Behaviors in Firefox Extensions. In Proc. of ICISC, 2012.
[13] M. West, A. Barth, and D. Veditz. Content Security Policy Level 3. W3C, 2016.