Understanding Malvertising Through Ad-Injecting Browser Extensions

Xinyu Xing Wei Meng Byoungyoung Lee Georgia Institute of Georgia Institute of Georgia Institute of Technology Technology Technology [email protected] [email protected] [email protected] Udi Weinsberg Anmol Sheth Facebook Inc. A9.com/Amazon [email protected] [email protected] Roberto Perdisci Wenke Lee University of Georgia Georgia Institute of [email protected] Technology [email protected]

ABSTRACT Keywords Malvertising is a malicious activity that leverages advertising to Malvertising; Browser Extension; distribute various forms of . Because advertising is the key revenue generator for numerous Internet companies, large ad networks, such as Google, Yahoo and Microsoft, invest a lot of 1. INTRODUCTION effort to mitigate malicious ads from their ad networks. This drives is a powerful way to deliver brand messages adversaries to look for alternative methods to deploy malvertising. to potential customers. To monetize their online services and appli- In this paper, we show that browser extensions that use ads as cations, most modern websites act as ad publishers and reserve ad their monetization strategy often facilitate the deployment of malver- space on their web pages where online ads are displayed to their tising. Moreover, while some extensions simply serve ads from ad visitors. Ad networks work as brokers between advertisers and networks that support malvertising, other extensions maliciously publishers. Joining an ad network frees websites from having to alter the content of visited webpages to force users into installing set up their own ad servers and invest in tracking software. Conse- malware. To measure the extent of these behaviors we developed quently, some ad networks attract a very large number of publishers Expector, a system that automatically inspects and identifies browser and produce huge revenues. extensions that inject ads, and then classifies these ads as malicious As online advertising became increasingly popular and perva- or benign based on their landing pages. Using Expector, we auto- sive, miscreants have started to abuse this convenient channel to matically inspected over 18,000 Chrome browser extensions. We conduct malicious activities. Malvertising is one of such activities, found 292 extensions that inject ads, and detected 56 extensions where an attacker maliciously uses advertising to distribute various that participate in malvertising using 16 different ad networks and forms of malware. Malvertising can result in serious consequences, with a total user base of 602,417. because an attacker can purchase ad space to publish malicious ads on many popular websites. Therefore, the maliciously crafted con- tent could reach a very large audience. In addition, users may be Categories and Subject Descriptors unaware of the fact that they could encounter malicious content H.3.5 [Online Information Services]: Web-based services; K.6.2 while browsing highly reputable websites, which may put them at [Installation Management]: Performance and usage measurement; an even higher risk. K.6.5 [Security and Protection (D.4.6, K.4.2)]: Invasive software Recent reports indicate that some ad networks have started to of- (e.g., viruses, worms, Trojan horses) fer browser extension developers an opportunity to monetize their “free” work [3,9]. In other words, some ad networks are extend- ing their business by connecting advertisers to browser extension General Terms developers. Given this new business trend and little studies on Security, Design, Experimentation malvertising in this new context, we conduct a comprehensive study on malvertising activities of ad-injecting extensions, which insert ads on webpages without user consents. In particular, we aim to an- swer a number of fundamental questions: How many ad-injecting extensions are present on popular repositories like the Chrome Ex- tension store? How many extensions are conducting malvertis- Copyright is held by the International World Wide Web Conference Com- ing activities? What are the characteristics of malvertising via ad- mittee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the injecting extensions? author’s site if the Material is used in electronic media. To answer the aforementioned questions, we developed Expector WWW 2015, May 18–22, 2015, Florence, Italy. ACM 978-1-4503-3469-3/15/05. (Extension Inspector), a tool that is able to automatically iden- http://dx.doi.org/10.1145/2736277.2741630. tify ad-injecting extensions. We ran Expector on extensions served by the Chrome Extension store, and observed several in- to serving malicious advertisements, and ad arbitration (syndica- teresting behaviors. First, compared to ads “naturally” published tion) process can facilitate the distribution of malicious advertise- (without any browser extension intercession) by popular websites, ments. Different from these previous studies, we focus on unrav- ad-injecting extensions tend to serve a larger fraction of malicious eling those characteristics of malvertising unique to ad-injecting ads. We attribute this to the observation that popular websites part- browser extensions. ner with large reputable ad networks, whereas extensions utilize Browser Extension. Though not considered malicious, ad-injecting smaller ad networks that devote insufficient efforts to identifying activities of browser extensions are misbehaving practice. A series and mitigating malicious advertisers. Second, in contrast to previ- of issues in misbehaving browser extensions have been studied for ous studies that found only a low fraction of malicious ads, we ob- years, mostly focusing on preventing user data leakage through ma- served that some ad networks serve only malicious ads. This is pre- licious extensions [20,24,25,27,29] and protecting against privilege sumably because these ad networks and miscreants collude to con- abuse of extensions [22]. The security model of Chrome extensions duct malvertising, or because miscreants effectively own these ad was criticized in [27]. The authors showed that extensions intro- networks and use them for serving malicious ads. Finally, we found duce attacks on the Chrome browser itself, and proposed a enforc- that ad-injecting extensions can make malvertising more detrimen- ing micro-privileges at the DOM element level. Egele et al. [20,24] tal, because some ad networks are unscrupulous on abusing the proposed several detection solutions to spyware that silently steals privileges that ad-injecting extensions offer. user information. By leveraging both static and dynamic analysis In summary, the main contributions of this paper are as follows: techniques the authors show that it is possible to track the flow of sensitive information as it is processed by the web browser and any • We present a comprehensive study that exposes and quanti- extension installed. [22] showed that many Chrome extensions are fies the extent to which browser extensions facilitate malver- over-privileged, which can potentially cause security risks. To ad- tising. Furthermore, we show that the high privileges granted dress the shortcomings of existing extension mechanisms, the au- to extensions are often abused to increase the likelihood of thors propose a comprehensive new model for extension security. infecting users with malware, e.g., through click hijacking. Our work differs from these studies because we focus on a little studied yet important issue in misbehaving browser extensions – • We present Expector, a measurement framework that au- ad injection. tomatically identifies browser extensions that inject ads on A pioneering work in this problem space is Hulk [23], a dy- webpages. Our evaluation shows that Expector has both namic analysis system that automatically detects Chrome browser low false positives (3.6%) and low false negatives (3%). extensions with malicious behaviors. Using Hulk, Kapravelos et • We ran Expector on all extensions available on the Chrome al.found thousands of suspicious extensions including those with extension store (almost 18,000), and identified 292 exten- ad-injecting practice. Despite effectiveness on identifying misbe- sions that embed ad-injection capabilities. Of these, 56 de- having browser extensions, Hulk cannot provide us only with a cor- liver ads that lead users to sites that host malware, and have pus of ad-injecting extensions for studying their malvertising activ- an overall user base of 602,417. Furthermore, we found 16 ities because it is designed for identifying various misbehaviors in unique ad-network domains that inject malicious ads, leading Chrome extensions. In addition, our study needs not only a tool users to 117 distinct malware-serving domains. to identify ad-injecting extensions but also an automatic solution to categorize ads placed by these extensions. Considering the lim- • We show that a user installing an ad-injecting extension is itation of Hulk as well as our unique requirement, we built a cus- more likely to be exposed to malvertising threat, compared tomized measurement tool – Expector– to facilitate our study. to users that do not install such browser extensions. In addition to Hulk, other tools have been recently developed to assist users in identifying ad-injecting Chrome extensions [6,7]. The rest of this paper is structured as follows. We begin with the These tools operate by comparing extensions installed by a user discussion of related work in Section2. In Section3 we provide es- against a list, oftentimes crowd-sourced, of known adware, and sential background. We detail the design of Expector and show alerting the user if such extensions are found. Such an approach the results from applying it to the Google Chrome extension store is limited by the accuracy and completeness of its list, and cannot in Section4. We then perform a detailed study for malvertising ex- handle new adware that was not yet reported. Therefore, our study tensions identified by Expector in Section5 and6. Finally, we does not rely on manually curated lists. conclude the paper in Section7. Adware. As an ad-injecting browser extension is one instance of adware, analysing malvertising activities of ad-injecting extensions 2. RELATED WORK also falls under the category of the study on adware. Edelman et al. [18] provide an overview of the adware ecosystem, studying In this section, we discuss three lines of work most related to the ad networks, exchanges, and practices used by ad injectors. In ours – (1) malvertising, (2) misbehaving browser extension, and a more recent work [19], the same authors study fraud in online (3) adware. affiliate marketing, and how adware takes part of such frauds. Both Malvertising. Malvertising has been a growing concern for many works focus more on the business aspects of the ecosystem, and ad networks. Prior research on this threat has shown the ram- unlike our work, they do not attempt to conduct a comprehensive pancy of malvertising through normal Web browsing [21, 26, 28, analysis of adware, but instead extract the participants of manually 30]. Provos et al. [28] studied various channels for distributing selected adware extensions. drive-by downloads. The authors found that 2% of the landing sites deliver malware via advertisements, which were often reached through multiple ad syndication. Ford et al. [21] studied malware 3. AD-INJECTING EXTENSIONS distribution through Flash ads and developed a tool for malicious Browser extensions are programs that enhance the functionali- Flash ad detection. According to recent studies [26,30], researchers ties of the browser. These programs are written using a combina- found that about 1% ads lead users to malicious content. In addi- tion of HTML, JavaScript, and CSS, and are typically hosted in tion, Zarras et al.observed that smaller ad networks are more prone online stores, such as the Chrome web store [10] and Mozilla Ad- networks and ad exchanges, there exists a thriving market of ad networks that provide JavaScript ad injection libraries for exten- sion developers to integrate with their applications. These libraries inject ads on webpages by modifying the DOM structure of the HTML and inserting additional HTML iframe’s that contain the injected ad content. The ad provider may trigger the ad injection only on specific websites (e.g., retail websites), and/or inject ads when a user performs a specific operations (e.g., mouse hover on a product image). 4. EXPECTOR In this section, we provide details for the design and implemen- tation of Expector, our browser extension analysis and measure- ment framework that aims to automatically detect and characterize ad-injection practices used by browser extensions. We then discuss how we use Expector to detect ad-injecting extensions (with ei- ther “regular” ads or malvertising) in the Chrome Extension store.

Figure 1: A Google search injected with ads from “Translate Se- 4.1 Design lection” Chrome extension. Identifying Triggering Websites. Some ad-injecting extensions inject ads only when the user visits a specific set of websites. There- fore, Expector needs to identify the websites that may trigger Ons store [8]. Extensions interact with the current webpage loaded the ad injection functionalities of each extension to be analyzed. in the browser by injecting JavaScript to read or modify the DOM To identify such websites, Expector performs static analysis of structure of the webpage, communicating with external servers us- the extension’s code. As some of the ad injection code may be XMLHttpRequest ing , and leveraging the browser APIs and fea- loaded from third-party library at runtime, Expector also spawns tures. The permissions requested by the extension are listed in a a browser with the extension installed to intercept all the scripts manifest file that is reviewed by the user at installation time. As is transmitted via the network. More specifically, Expector in- evident from this description, browser extensions hold significant stalls a virtual proxy between the browser’s JavaScript engine and privileges, thus leaving the door open to malpractices and security the extension. Using the remote debugging protocols supported and privacy risks. In the rest of the section we describe three pri- by browsers (e.g., Chrome [12]), the virtual proxy intercepts all mary types of ad injection malpractices that we observed through the function invocations related to JavaScript executions. Once the our study of Chrome browser extensions. Then, we briefly intro- JavaScript code executed by the extension is obtained, it searches duce JavaScript libraries used by ad-injecting extensions. for references to one of the ten most popular top-level domains 3.1 Ad Injection Practices (TLDs) [17] (e.g., .com, .net, .org). We limit the search to these top-ten TLDs because we assume that extensions will most Ad Injection on Search Result Pages. Consider the screenshot in likely target popular websites, almost all of which are registered Figure1, showing the result page of a Google search for the query under the top-ten TLDs. “bags”. The user has installed the “Translate Selection” Chrome Triggering Events for Ad Injection. Besides injecting ads based extension whose primary purpose is to help users quickly translate on websites the user visits, extensions also inject ads on specific the selected text between different languages. However, this ex- user generated events. For example, the extension code may reg- tension bundles with it the functionality to inject ads in the search ister an event listener on a product image of a retail site and listen results returned to the user. The top highlighted region in Figure1 to the “MouseOver” event. The callback for injecting ads of sim- shows the standard Google sponsored ads. The bottom highlighted ilar products is only executed when the user hovers his mouse on region shows another set of ads of bags with links to online stores. this image. In order to effectively identify ad injection extension, Ad Injection on Retail Websites. Another type of ad injection Expector tries to trigger as many as possible potential events that practice is related to injecting ads on webpages related to online the extension might be interested in. To this end, for all DOM el- retail (amazon.com, .com, etc.). When a potential shopper ements that have a JavaScript event handler to process user events, browses a product on an online retail website, the extension sends like onClick, onMouseOver, etc., we instruct the browser to execute the context of a browsing session to a third-party advertisement the corresponding JavaScript function. service, which retrieves similar or related products. These products A caveat to the above described event trigger mechanism is that are shown in the form of an ad to the user, overlayed on the existing some extensions inject ads only after some time has elapsed since website content. the user installed the extension. We assume these practices are used Ad Injection on Unrelated Websites. The third type of ad injec- to reduce user dissatisfaction from the injected ads and to reduce tion practice consists of extensions that aggressively insert ads on the chance to be detected by an extension reviewer. We therefore almost every webpage that the user browses. This practice can of- “tricked” these extensions by setting the system clock before we ten degrade the user’s experience by shown pop-ups or other forms install an extension to a random long enough time backwards, then of annoying ads. install the extension, launch the browser and then set the system clock back. 3.2 JavaScript Libraries for Ad Injection Identifying Extension-Injected Elements. As a first step for de- It is relatively straightforward for extension developers to mon- tecting injected ads, the measurement framework should identify etize their extensions through ad injection. Similar to existing ad the suspicious DOM elements that are potentially inserted by the extension for ad injection. A naive approach is to directly compare browser to visit the URL. Note that we ignore iframe’s, since the the HTML source of two different pages – one with the extension URL points to the source of the iframe. For the remaining elements loaded, and another one without. However this approach will result (iframe’s or elements with no URLs), we instruct the browser to in a large number of false positives as most webpages load a large trigger a click event to emulate a user clicking inside the element. amount of dynamic content that can be served by different hosts on For all the above cases, if the element contains an ad, the adver- each reload of the webpage. For example, each reload of a webpage tiser’s website will be loaded usually through a series of redirects, can potentially result in ads served from different ad providers. first to one or more ad networks, and then to the ad landing page. To address this, Expector uses both the DOM structure as well To detect such redirection patterns, we implemented a lightweight as the content of the DOM elements to identify potential extension- Chrome extension, which is loaded before the other extension is injected elements. This is achieved by the following steps: installed, that logs all HTTP requests made by the browser. HTTP 1. We assume that all hosts that serve content on the webpage requests issued as a result of the above process are analyzed for without the extension installed are trusted and generate a searching redirections. A redirection is detected when a visit (click) list of trusted host names by loading the webpage multiple results in more than one domain in the traffic trace, and the last do- times without the extension. The reason we trust such exist- main (e.g., the ad landing page) is not the same as the original web ing hosts is that extensions use different ad networks than site. If such redirection is identified, the extension is labeled for mainstream websites, mostly due to restrictions enforced further inspection. by the large ad networks [2]. 4.2 Implementation 2. We load the webpage using two instances of the browser We implement Expector using Node.js [11] and Selenium [14]. (with and without the extension loaded) and record the DOM We use Node.js to spawn a Chrome instance and load a Chrome tree. extension for pre-parsing (identifying triggering websites). Us- 3. We trigger user events in the instance with extension in- ing the remote debugging protocol of Chrome, we configure a vir- stalled as described above. tual proxy working as a Chrome Developer Tool that intercepts 4. The DOM tree structure is flattened into a list of elements all the JavaScript function invocations. Specifically, we listen on by performing a pre-order traversal over the correspond- all Debugger.scriptParsed events and log all the JavaScript code ing DOM tree, i.e., for each node in the list, its descen- parsed by the V8 JavaScript engine. The rest of the components dants are located at its right side and its ancestors are at the of Expector are implemented with Selenium and the LCS algo- left side of the node. Each item in the list consists of the rithm is implemented in Python. tag name of the DOM element and the domain associated To ensure fast and scalable processing of a large number of Chrome with the DOM element (if such domain exists). For exam- extensions, we deploy Expector across 60 Linux Debian 7 vir- ple, an iframe tag like