Impression Fraud in On-Line Advertising Via Pay-Per-View Networks
Total Page:16
File Type:pdf, Size:1020Kb
Impression Fraud in On-line Advertising via Pay-Per-View Networks Kevin Springborn, Broadcast Interactive Media; Paul Barford, Broadcast Interactive Media and University of Wisconsin—Madison This paper is included in the Proceedings of the 22nd USENIX Security Symposium. August 14–16, 2013 • Washington, D.C., USA ISBN 978-1-931971-03-4 Open access to the Proceedings of the 22nd USENIX Security Symposium is sponsored by USENIX Impression Fraud in Online Advertising via Pay-Per-View Networks Kevin Springborn Paul Barford Broadcast Interactive Media Broadcast Interactive Media [email protected] University of Wisconsin-Madison [email protected] ABSTRACT The online ad ecosystem can roughly be divided into three Advertising is one of the primary means for revenue gener- groups: advertisers, publishers and intermediaries. Adver- ation for millions of websites and mobile apps. While the tisers pay publishers to place a specified volume of creative majority of online advertising revenues are based on pay- content with embedded links (i.e., text, display or video ads) per-click, alternative forms such as impression-based dis- on websites and apps. Intermediaries (e.g., ad servers and ad play and video advertising have been growing rapidly over exchanges) are often used to facilitate connections between the past several years. In this paper, we investigate the prob- publishers and advertisers. Intermediaries typically place a lem of invalid traffic generation that aims to inflate adver- surcharge on the fees paid by advertisers to publishers for ad tising impressions on websites. Our study begins with an placements and/or ad clicks. What is immediately obvious analysis of purchased traffic for a set of honeypot websites. from this simple description is that publisher and intermedi- Data collected from these sites provides a window into the ary platform revenues are directly tied to the number of daily basic mechanisms used for impression fraud and in partic- visits to a website or app. Thus, there are strong incentives ular enables us to identify pay-per-view (PPV) networks. for publishers and intermediaries to use any means available PPV networks are comprised of legitimate websites that use to drive user traffic to publisher sites. JavaScript provided by PPV network service providers to There are certainly legitimate methods for traffic gener- render unwanted web pages "underneath" requested content ation for publisher sites. The most widely used are the on a real user’s browser so that additional advertising im- text-based ad words that appear in search results e.g., from pressions are registered. We describe the characteristics of Google or Bing. However, it can be quite difficult and ex- pensive to drive large traffic volumes to target sites using the PPV network ecosystem and the typical methods for de- 1 livering fraudulent impressions. We also provide a case study ad words alone. Thus, other methods for traffic genera- of scope of PPV networks in the Internet. Our results show tion have emerged, many of which are deeded as fraudulent by advertisers and intermediaries. Google defines invalid that these networks deliver hundreds of millions of fraudu- 2 lent impressions per day, resulting in hundreds of millions (fraudulent) traffic as follows: of lost advertising dollars annually. Characteristics unique Invalid traffic includes both clicks and impres- to traffic delivered via PPV networks are also discussed. We sions that Google suspects to not be the result conclude with recommendations for countermeasures that of genuine user interest [21]. can reduce the scope and impact of PPV networks. Standard methods for generating invalid traffic includes (i) 1. INTRODUCTION using employees at publisher companies to view sites and click on ads, (ii) hiring 3rd parties to view sites and click Advertising is one of the primary methods for generating on ads, (iii) click/view pyramid schemes and (iv) using soft- revenues from websites and mobile apps. A recent report ware and/or botnets to automate views/clicks [21]. The chal- from the Internet Advertising Bureau (IAB) places ad rev- lenges for advertisers and intermediaries focused on offering enues in the US for the first half of 2012 at $17B, which rep- trustworthy platforms are to understand these and potentially resents a 14% increase over the previous year [15]. While other threats so that effective countermeasures can be de- the majority of that revenue is search-based, ad words ad- ployed. vertising, display and video advertising have been growing. In this paper, we investigate a relatively new threat for dis- Indeed, a recent report places display and video advertis- play and video advertising called Pay-Per-View (PPV) net- ing in the US at $12.7B for FY2012, growing at 17% annu- 1 ally [27]. At a high level the basic notion of selling space on This has led to the emergence of a large number of Search Engine Optimization companies in recent years. web pages and apps for advertising is simple. However, the 2While Google is not the only company in this domain, we refer to mechanisms and infrastructure that are required for online them as an authoritative source of information due to their size and advertising are highly diverse and complex. experience in online advertising. 1 USENIX Association 22nd USENIX Security Symposium 211 works. The basic idea for PPV networks is to pay legitimate can be compiled through programmatic enumeration of PPV publishers to run specialized JavaScript when users access destinations. Finally, referer fields can be queried at the time their sites that will display other publishers websites in a of advertisement load in order to identify traffic originating camouflaged fashion. This will result in impressions and po- from known PPV domains. tentially even clicks that are registered on the camouflaged The remainder of this paper is organized as follows. In pages without "genuine user interest" i.e., invalid traffic gen- Section 2 we provide a description of the online advertis- eration. Legitimate publishers view this as another way to ing ecosystem and an overview of invalid traffic generation monetize their sites without impact to their users. PPV net- threats. In Section 3, we describe the details of our hon- works sell their traffic generation capability by touting real eypot websites and our traffic purchases for these sites. In and unique users, geolocation and context specificity among Section 4, we describe the details of the evaluations that we other things. The fact that pages are appearing on real users’ conduct on our data including analyses of additional data systems makes detecting and preventing PPV traffic genera- sets and measurements that enable us to project some of the tion challenging. broader characteristics of PPV networks. We provide rec- To study PPV networks, we employ a small set of honey- ommendations for counter measures that can be employed pot websites that we use as the target for traffic generation. to reduce the impact of PPV networks in Section 5. We dis- These sites were constructed to include what appears to be cuss prior studies that inform our work in Section 6. We legitimate content and advertising. We then use search to summarize, conclude and discuss future work in Section 7. identify a wide variety of traffic generation offerings on the Internet. We purchased impressions for our honeypot sites in 2. ONLINE ADVERTISING ECOSYSTEM various quantities from a selection of different traffic gener- In this section we provide an overview of the online adver- ation services over the course of a 3.5 month period. By en- tising ecosystem including both the business framework and gaging with traffic generation services directly, we were able technical framework for delivering advertisements to pub- to uncover the basic mechanisms of PPV networks and ini- lisher websites and apps. Some prior studies have provided tiate additional measurements to characterize their deploy- similar overviews including [16, 34, 41]. We also provide an ments. overview of invalid traffic generation threats and the chal- The characteristics of the traffic purchased for our honey- lenges they pose in the ecosystem. pot sites is dictated at a high level by the service offerings, which enable volume, time frame and geographic location, 2.1 Business Framework etc. of users to be specified. Our results show that impres- sions are typically spread in a somewhat bursty fashion over As mentioned in Section 1, there are three main partici- the specified time frame and that user characteristics are well pant groups in ad networks: advertisers, intermediaries and matched with specifications. By considering the referer field publishers. As shown in Figure 1 there are two other im- of the incoming traffic, we were able to identify the fact that portant groups: brands and users. Brands pay advertisers our honeypot sites were being loaded into a frame (along to help them sell their products and services. Internet-based with as many as ten other sites) for display on remote sys- campaigns are attractive to brands and advertisers since con- tems. By considering names of a small selection of traf- sumers/users spend a growing proportion of their time on- fic generation services, we use a recent, publicly-available, line. An important appeal of online advertising (especially Internet-wide web crawl to identify the scope of PPV net- for consumer goods) is that it offers the opportunity to tie works. We find tags from these services are, in fact, widely ad campaigns and associated costs directly to sales e.g., by deployed – on tens of thousands of sites. By appealing to tracking clicks from online ads to purchases on a brand’s MuStat [29], we conservatively estimate the number of in- ecommerce site. valid impressions that are generated from this small set of Advertisers are companies that create and manage adver- PPV networks to be on the order of 500 million per day.