An Empirical Study of Mobile Ad Targeting
Total Page:16
File Type:pdf, Size:1020Kb
An Empirical Study of Mobile Ad Targeting Theodore Book Dan S. Wallach Rice University Rice University Abstract of Android advertising libraries to access permission- protected user data [15, 31, 22, 6] as well as the behavior Advertising, long the financial mainstay of the web of applications that directly pass user private information ecosystem, has become nearly ubiquitous in the world to their ad libraries [7]. What remains little understood of mobile apps. While ad targeting on the web is fairly is the way that information is used after it has been col- well understood, mobile ad targeting is much less stud- lected. ied. In this paper, we use empirical methods to collect In this work, we measure a more complex factor that a database of over 225,000 ads on 32 simulated devices is also critical to the understanding of user privacy—the hosting one of three distinct user profiles. We then ana- interaction between advertising libraries and their host lyze how the ads are targeted by correlating ads to poten- servers. Because we do not have direct access to the pro- tial targeting profiles using Bayes’ rule and Pearson’s chi prietary algorithms used in processing ad requests and squared test. This enables us to measure the prevalence serving advertisements, we choose to treat the data center of different forms of targeting. We find that nearly all ads as a black box, observing the relationship between inputs show the effects of application- and time-based targeting, (ad requests) and outputs (provided ads). This resembles while we are able to identify location-based targeting in the methodology used by other researchers in measuring 43% of the ads and user-based targeting in 39%. the targeting of web applications [21, 3, 8, 26]. In-app mobile ads present interesting opportunities for 1 Introduction ad targeting, and bring related concerns regarding user privacy. Advertisers face the task of delivering their ads 1.3 million of the 1.5 million apps available on Google’s to a receptive audience. Some forms of Internet advertis- Play Store are available for free, including 99% of the ing, such as search advertising, allow effective targeting apps with at least 50,000 downloads [2]. This is largely without any user profile at all—search keywords, alone, made possible by advertising, which accounts for the ma- can indicate products and services for which a user might jority of mobile app revenue in the United States, and be shopping. Other forms of advertising, such as on so- reached USD 19.3B world-wide in 2013 [19, 18]. Other cial networks, rely on extensive profiles that the user has arXiv:1502.06577v1 [cs.CR] 23 Feb 2015 forms of app revenue, such as paid app sales and in-app entrusted to the service, giving the service provider the purchases, make up the rest of the mobile app monetiza- responsibility of using the data for ad targeting without tion space. This huge market depends on delivering ads revealing its contents to third parties [11, 20]. to people who are likely to respond to them. In-app mobile ads, by contrast, present relatively lit- Mobile ads are delivered through advertising libraries tle information that advertisers can use for targeting. At which are embedded in the application at compile time. their most basic level, they provide a prospective adver- These libraries generally use an embedded web browser, tiser with little more than the name of a host application Android’s WebView, to display the ads and handle user and an IP address that can be used for rough geographical interaction. (These WebViews are subject to many of targeting. Because of this limitation, advertising libraries the same security concerns that impact other dated web have long used some form of unique identifier to identify browsers [34].) These libraries then handle the work of the device, and hence, the user [6]. When unique identi- requesting the ad from a central server, displaying it to fiers are correlated with other data—passed by the appli- the user, and tracking the user’s interaction with the ad. cation or obtained from other sources—opportunities for In prior work, researchers have measured the ability more precise ad targeting begin to emerge. Our goal is to 1 measure and quantify the targeting that takes place, with stock applications on physical devices. This allowed us an eye to elucidating the privacy implications that stem the freedom of manipulating various parameters and ob- from that targeting. serving the results. We began by capturing traces of the In order to maintain a reasonable scope for our work, AdMob library’s communications on a number of real we chose to focus on a single platform (Android), and a and emulated devices. Because these exchanges are un- single ad library (Google’s AdMob.) Android represents encrypted, it was easy for us to analyze them and identify 84.4% of current mobile device shipments, making it a critical components. 1 good proxy for the entire mobiles space [17]. Likewise, By using a variety of applications in generating our AdMob represents 35% of the install-weighted ad library requests, we were able to capture the behavior of various market on the Android platform, making it a reasonable versions of the AdMob Android library, and thus build proxy for the entire mobile ad space [7]. It is worth not- a fairly complete understanding of its behavior. While ing that AdMob is one of the more conservative advertis- the AdMob library makes a variety of HTTP requests in ing libraries in regard to the amount of user data that it the course of its operation, we found that all of the data collects [6]. Other libraries, with access to more personal necessary to request an ad is passed in a single HTTP data, may engage in additional forms of targeting. GET request [33]. In studying AdMob, we are not only studying the be- We analyzed our request dataset and identified 88 dif- havior of the Google’s proprietary systems. Rather, we ferent parameters set by the AdMob library. Our next are studying the entire AdMob ecosystem, which in- task was to determine the significance of each of those cludes not only Google, with its engineers and systems, parameters in order to emulate the library’s behavior. We but also the advertisers that make use of those systems to proceeded by decompiling the AdMob library and exam- place ads based on various criteria. This includes many ining the code used to generate the request. In this way, third parties that use ad exchanges, such as Google’s we were able to identify the code used to set the param- DoubleClick, to place ads on the AdMob network. In eter, and identify the sources of the information that was this way, our research represents an attempt to yield an encoded. end-to-end understand of a complex multi-vendor sys- While most of the parameters related to the mechan- tem. ics of requesting and serving ads, some parameters with significance for user privacy are listed in Table 1. As Ad- 2 Methodology Mob uses ordinary HTTP to make requests and serve ads, the content of most of these parameters is sent in clear Measuring mobile ads presents methodological chal- text. The one exception is the UULE parameter, which lenges due to the need to measure an ever changing data transmits the device location, and is encrypted with a key set without the ability to take random samples [16]. New embedded in the AdMob library. ad campaigns are constantly being introduced, old ones retired, and ads targeting affects each request. We sought 2.2 Building an AdMob Emulator to overcome these factors by focusing on narrow seg- ments of mobile ads, with a limited number of profiles, After reverse engineering the AdMob request syntax it over a restricted time period. In order to generate re- became possible to develop an AdMob emulator, which quests, collect ads, and correlate responses to the re- would build requests that would be interpreted as legit- quests, we first reverse engineered the AdMob ad request imate by AdMob servers. We constructed the emulator protocol. Following this, we built an AdMob emulator in Python, taking application characteristics as inputs, that would request ads based on app parameters that we and generating syntactically valid AdMob requests. The supplied. We then developed a collection of user profiles completed emulator allowed us to vary parameters re- that would enable us to measure ad targeting. We next lated to ad targeting while continuing to generate cor- rolled out a collection infrastructure consisting of numer- rect values for the built-in checksum and other changing ous collection stations that reported ads back to our cen- parameters. We tested our emulator to verify that it re- tral server. Finally we analyzed the data we collected trieved valid ads from the AdMob network. to determine the correlation between possible targeting Of course, the URL parameters are not the only pieces methodologies and the ads that we received. of information that AdMob servers are able to gather from a request. The request timing and IP address are also significant. Because of this, it was necessary to 2.1 Reverse Engineering AdMob develop an infrastructure that would enable us to send In order to give ourselves maximal flexibility in prob- 1It also makes it easy for attackers to make use of identifying in- ing AdMob’s behavior, we chose to reverse engineer the formation in the request to profile individuals. Indeed, it is known that AdMob ad request API rather than send requests from nation-state attackers have made use of this information in the past [29].