Proceedings on Privacy Enhancing Technologies 2015; 2015 (1):92–112

Amit Datta*, Michael Carl Tschantz, and Anupam Datta

Automated Experiments on Ad Privacy Settings

A Tale of Opacity, Choice, and Discrimination

Abstract: To partly address people's concerns over web tracking, Google has created the Ad Settings webpage to provide information about and some choice over the profiles Google creates on users. We present AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact. AdFisher can run browser-based experiments and analyze data using machine learning and significance tests. Our tool uses a rigorous experimental design and statistical analysis to ensure the statistical soundness of our results. We use AdFisher to find that the Ad Settings was opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse changed the ads shown but not the settings page. We also found that setting the gender to female resulted in getting fewer instances of an ad related to high paying jobs than setting it to male. We cannot determine who caused these findings due to our limited visibility into the ad ecosystem, which includes Google, advertisers, websites, and users. Nevertheless, these results can form the starting point for deeper investigations by either the companies themselves or by regulatory bodies.

Keywords: blackbox analysis, information flow, behavioral advertising, transparency, choice, discrimination

DOI 10.1515/popets-2015-0007
Received 11/22/2014; revised 2/18/2015; accepted 2/18/2015.

*Corresponding Author: Amit Datta: Carnegie Mellon University, E-mail: [email protected]
Michael Carl Tschantz: International Computer Science Institute, E-mail: [email protected]
Anupam Datta: Carnegie Mellon University, E-mail: [email protected]

1 Introduction

Problem and Overview. With the advancement of tracking technologies and the growth of online data aggregators, data collection on the Internet has become a serious privacy concern. Colossal amounts of collected data are used, sold, and resold for serving targeted content, notably advertisements, on websites (e.g., [1]). Many websites providing content, such as news, outsource their advertising operations to large third-party ad networks, such as Google's DoubleClick. These networks embed tracking code into webpages across many sites, providing the network with a more global view of each user's behaviors.

People are concerned about behavioral marketing on the web (e.g., [2]). To increase transparency and control, Google provides Ad Settings, which is "a Google tool that helps you control the ads you see on Google services and on websites that partner with Google" [3]. It displays inferences Google has made about a user's demographics and interests based on his browsing behavior. Users can view and edit these settings at http://www.google.com/settings/ads. Yahoo [4] and Microsoft [5] also offer personalized ad settings.

However, these tools provide little information about how they operate, leaving open the question of how completely the settings describe the profile the ad network has about a user. In this study, we explore how a user's behaviors, either directly with the settings or with content providers, alter the ads and settings shown to the user and whether these changes are in harmony. In particular, we study the degree to which the settings provide transparency and choice, as well as checking for the presence of discrimination. Transparency is important for people to understand how the use of data about them affects the ads they see. Choice allows users to control how this data gets used, enabling them to protect information they find sensitive. Discrimination is an increasing concern about machine learning systems and one reason people like to keep information private [6, 7].

To conduct these studies, we developed AdFisher, a tool for automating randomized, controlled experiments for studying online tracking. Our tool offers a combination of automation, statistical rigor, scalability, and explanation for determining the use of information by web advertising algorithms and by personalized ad settings, such as Google Ad Settings. The tool can simulate having a particular interest or attribute by visiting webpages associated with that interest or by altering the ad settings provided by Google. It collects ads served by Google and also the settings that Google provides to the simulated users. It automatically analyzes the data to determine whether statistically significant differences between groups of agents exist. AdFisher uses machine learning to automatically detect differences and then executes a test of significance specialized for the difference it found.

Someone using AdFisher to study behavioral targeting only has to provide the behaviors the two groups are to perform (e.g., visiting websites) and the measurements (e.g., which ads) to collect afterwards. AdFisher can easily run multiple experiments exploring the causal connections between users' browsing activities and the ads and settings that Google shows.

The advertising ecosystem is a vast, distributed, and decentralized system with several players, including the users consuming content, the advertisers, the publishers of web content, and ad networks. With the exception of the user, we treat the entire ecosystem as a blackbox. We measure simulated users' interactions with this blackbox, including page views, ads, and ad settings. Without knowledge of the internal workings of the ecosystem, we cannot assign responsibility for our findings to any single player within it, nor rule out that they are unintended consequences of interactions between players. However, our results show the presence of concerning effects, illustrating the existence of issues that could be investigated more deeply by either the players themselves or by regulatory bodies with the power to see the internal dynamics of the ecosystem.

Motivating Experiments. In one experiment, we explored whether visiting websites related to substance abuse has an impact on Google's ads or settings. We created an experimental group and a control group of agents. The browser agents in the experimental group visited websites on substance abuse while the agents in the control group simply waited. Then, both groups of agents collected ads served by Google on a news website.

Having run the experiment and collected the data, we had to determine whether any difference existed in the outputs shown to the agents. One way would be to intuit what the difference could be (e.g., more ads containing the word "alcohol") and test for that difference. However, developing this intuition can take considerable effort. Moreover, it does not help find unexpected differences. Thus, we instead used machine learning to automatically find differentiating patterns in the data. Specifically, AdFisher finds a classifier that can predict which group an agent belonged to from the ads shown to that agent. The classifier is trained on a subset of the data. A separate test subset is used to determine whether the classifier found a statistically significant difference between the ads shown to each group of agents. In this experiment, AdFisher found a classifier that could distinguish between the two groups of agents by using the fact that only the agents that visited the substance abuse websites received ads for Watershed Rehab.

We also measured the settings that Google provided to each agent on its Ad Settings page after the experimental group of agents visited the webpages associated with substance abuse. We found no differences (significant or otherwise) between the pages for the agents. Thus, information about visits to these websites is indeed being used to serve ads, but the Ad Settings page does not reflect this use in this case. Rather than providing transparency, in this instance, the ad settings were opaque as to the impact of this factor.

In another experiment, we examined whether the settings provide choice to users. We found that removing interests from the Google Ad Settings page changes the ads that a user sees. In particular, we had both groups of agents visit a site related to online dating. Then, only one of the groups removed the interest related to online dating. Thereafter, the top ads shown to the group that kept the interest were related to dating, but not the top ads shown to the other group. Thus, the ad settings do offer users a degree of choice over the ads they see.

We also found evidence suggestive of discrimination from another experiment. We set the agents' gender to female or male on Google's Ad Settings page. We then had both the female and male groups of agents visit webpages associated with employment. We established that Google used this gender information to select ads, as one might expect. The interesting result was how the ads differed between the groups: during this experiment, Google showed the simulated males ads from a certain career coaching agency that promised large salaries more frequently than the simulated females, a finding suggestive of discrimination. Ours is the first study that provides statistically significant evidence of an instance of discrimination in online advertising when demographic information is supplied via a transparency-control mechanism (i.e., the Ad Settings page).

While neither of our findings of opacity or discrimination are clear violations of Google's privacy policy [8], and we do not claim these findings to generalize or imply widespread issues, we find them concerning and warranting further investigation by those with visibility into the ad ecosystem. Furthermore, while our finding of discrimination in the non-normative sense of the word is on firm statistical footing, we acknowledge that people may disagree about whether we found discrimination in the normative sense of the word. We defer discussion of whether our findings suggest unjust discrimination until Section 7.

Contributions. In addition to the experimental findings highlighted above, we provide AdFisher, a tool for automating such experiments. AdFisher is structured as a Python API providing functions for setting up, running, and analyzing experiments. We use Selenium to drive Firefox browsers and the scikit-learn library [9] for implementations of classification algorithms. We use the SciPy library [10] for implementing the statistical analyses of the core methodology.

AdFisher offers rigor by performing a carefully designed experiment. The statistical analysis techniques applied do not make questionable assumptions about the collected data. We base our design and analysis on a prior proposal that makes no assumptions about the data being independent or identically distributed [11]. Since advertisers update their behavior continuously in response to unobserved inputs (such as ad auctions) and the experimenters' own actions, such assumptions may not always hold. Indeed, in practice, the distribution of ads changes over time and simulated users, or agents, interfere with one another [11].

Our automation, experimental design, and statistical analyses allow us to scale to handling large numbers of agents for finding subtle differences. In particular, we modify the prior analysis of Tschantz et al. [11] to allow for experiments running over long periods of time. We do so by using blocking (e.g., [12]), a nested statistical analysis not previously applied to understanding web advertising. The blocking analysis ensures that agents are only compared to the agents that start out like them and then aggregates together the comparisons across blocks of agents. Thus, AdFisher may run agents in batches spread out over time while only comparing those agents running simultaneously to one another.

AdFisher also provides explanations as to how Google alters its behaviors in response to different user actions. It uses the trained classifier model to find which features were most useful for the classifier to make its predictions. It provides the top features from each group to provide the experimenter/analyst with a qualitative understanding of how the ads differed between the groups.

To maintain statistical rigor, we carefully circumscribe our claims. We only claim statistical soundness of our results: if our techniques detect an effect of the browsing activities on the ads, then there is indeed one with high likelihood (made quantitative by a p-value). We do not claim that we will always find a difference if one exists, nor that the differences we find are typical of those experienced by users. Furthermore, while we can characterize the differences, we cannot assign blame for them since either Google or the advertisers working with Google could be responsible.

Contents. After covering prior work next, we present, in Section 3, privacy properties that our tool AdFisher can check: nondiscrimination, transparency, and choice. Section 4 explains the methodology we use to ensure sound conclusions from using AdFisher. Section 5 presents the design of AdFisher. Section 6 discusses our use of AdFisher to study Google's ads and settings. We end with conclusions and future work.

Raw data and additional details about AdFisher and our experiments can be found at http://www.cs.cmu.edu/~mtschant/ife/

AdFisher is freely available at https://github.com/tadatitam/info-flow-experiments/

2 Prior Work

We are not the first to study how Google uses information. The work with the closest subject of study to ours is by Wills and Tatar [13]. They studied both the ads shown by Google and the behavior of Google's Ad Settings (then called the "Ad Preferences"). Like us, they find the presence of opacity: various interests impacted the ads and settings shown to the user, and ads could change without a corresponding change in Ad Settings. Unlike our study, theirs was mostly manual, small scale, lacked any statistical analysis, and did not follow a rigorous experimental design. Furthermore, we additionally study choice and discrimination.

Other related works differ from us in both goals and methods. They all focus on how visiting webpages changes the ads seen. While we examine such changes in our work, we do so as part of a larger analysis of the interactions between ads and personalized ad settings, a topic they do not study.

Barford et al. come the closest in that their recent study looked at both ads and ad settings [14]. They do so in their study of the "adscape", an attempt to understand each ad on the Internet. They study each ad individually and cast a wide net to analyze many ads from many websites while simulating many different interests. They only examine the ad settings to determine whether they successfully induced an interest. We rigorously study how the settings affect the ads shown (choice) and how behaviors can affect ads without affecting the settings (transparency). Furthermore, we use focused collections of data and an analysis that considers all ads collectively to find subtle causal effects within Google's advertising ecosystem. We also use a randomized experimental design and analysis to ensure that our results imply causation.

The usage study closest to ours in statistical methodology is that of Tschantz et al. [11]. They developed a rigorous methodology for determining whether a system like Google uses information. Due to limitations of their methodology, they only ran small-scale studies. While they observed that browsing behaviors could affect Ad Settings, they did not study how this related to the ads received. Furthermore, while we build upon their methodology, we automate the selection of an appropriate test statistic by using machine learning whereas they manually selected test statistics.

The usage study closest to ours in terms of implementation is that of Liu et al. in that they also use machine learning [15]. Their goal is to determine whether an ad was selected due to the content of a page, by using behavioral profiling, or from a previous webpage visit. Thus, rather than using machine learning to select a statistical test for finding causal relations, they do so to detect whether an ad on a webpage matches the content on the page to make a case for the first possibility. Thus, they have a separate classifier for each interest a webpage might cover. Rather than perform a statistical analysis to determine whether treatment groups have a statistically significant difference, they use their classifiers to judge the ratio of ads on a page unrelated to the page's content, which they presume indicates that the ads were the result of behavioral targeting.

Lécuyer et al. present XRay, a tool that looks for correlations between the data that web services have about users and the ads shown to users [16]. While their tool may check many changes to a type of input to determine whether any of them has a correlation with the frequency of a single ad, it does not check for causation, as ours does.

Englehardt et al. study filter bubbles with an analysis that assumes independence between observations [17], an assumption we are uncomfortable making. (See Section 4.4.)

Guha et al. compare ads seen by three agents to see whether Google treats differently the one that behaves differently from the other two [18]. We adopt their suggestion of focusing on the title and URL displayed on ads when comparing ads to avoid noise from other less stable parts of the ad. Our work differs by studying the ad settings in addition to the ads and by using larger numbers of agents. Furthermore, we use rigorous statistical analyses. Balebako et al. run similar experiments to study the effectiveness of privacy tools [19].

Sweeney ran an experiment to determine that searching for names associated with African-Americans produced more search ads suggestive of an arrest record than names associated with European-Americans [20]. Her study required considerable insight to determine that suggestions of an arrest were a key difference. AdFisher can automate not just the collection of the ads, but also the identification of such key differences by using its machine learning capabilities. Indeed, it found on its own that simulated males were more often shown ads encouraging the user to seek coaching for high paying jobs than simulated females.

3 Privacy Properties

Motivating our methodology for finding causal relationships, we present some properties of ad networks that we can check with such a methodology in place. As a fundamental limitation of science, we can only prove the existence of a causal effect; we cannot prove that one does not exist (see Section 4.5). Thus, experiments can only demonstrate violations of nondiscrimination and transparency, which require effects. On the other hand, we can experimentally demonstrate that effectful choice and ad choice are complied with in the cases that we test since compliance follows from the existence of an effect. Table 1 summarizes these properties.

Property Name     | Requirement                                                                | Causal Test                                                        | Finding
Nondiscrimination | Users differing only on protected attributes are treated similarly        | Find that presence of protected attribute causes a change in ads   | Violation
Transparency      | User can view all data about him used for ad selection                    | Find attribute that causes a change in ads, not in settings        | Violation
Effectful choice  | Changing a setting has an effect on ads                                   | Find that changing a setting causes a change in ads                | Compliance
Ad choice         | Removing an interest decreases the number of ads related to that interest | Find that a setting causes a decrease in relevant ads              | Compliance

Table 1. Privacy Properties Tested on Google’s Ad Settings

3.1 Discrimination

At its core, discrimination between two classes of individuals (e.g., one race vs. another) occurs when the attribute distinguishing those two classes causes a change in behavior toward those two classes. In our case, discrimination occurs when membership in a class causes a change in ads. Such discrimination is not always bad (e.g., many would be comfortable with men and women receiving different clothing ads). We limit our discussion of whether the discrimination we found is unjust to the discussion section (§7) and do not claim to have a scientific method of determining the morality of discrimination.

Determining whether class membership causes a change in ads is difficult since many factors not under the experimenter's control or even observable to the experimenter may also cause changes. Our experimental methodology determines when membership in certain classes causes significant changes in ads by comparing many instances of each class.

We are limited in the classes we can consider since we cannot create actual people that vary by the traditional subjects of discrimination, such as race or gender. Instead, we look at classes that function as surrogates for those classes of interest. For example, rather than directly looking at how gender affects people's ads, we instead look at how altering a gender setting affects ads or at how visiting websites associated with each gender affects ads.

3.2 Transparency

Transparency tools like Google Ad Settings provide online consumers with some understanding of the information that ad networks collect and use about them. By displaying to users what the ad network may have learned about the interests and demographics of a user, such tools attempt to make targeting mechanisms more transparent.

However, the technique for studying transparency is not clear. One cannot expect an ad network to be completely transparent to a user. This would involve the tool displaying all other users' interests as well. A more reasonable expectation is for the ad network to display any inferred interests about that user. So, if an ad network has inferred some interest about a user and is serving ads relevant to that interest, then that interest should be displayed on the transparency tool. However, even this notion of transparency cannot be checked precisely, as the ad network may serve ads about some other interest correlated with the original inferred interest, but not display the correlated interest on the transparency tool.

Thus, we only study the extreme case of the lack of transparency, opacity, and leave complex notions of transparency open for future research. We say that a transparency tool has opacity if some browsing activity results in a significant effect on the ads served, but has no effect on the ad settings. If there is a difference in the ads, we can argue that prior browsing activities must have been tracked and used by the ad network to serve relevant ads. However, if this use does not show up on the transparency tool, we have found at least one example which demonstrates a lack of transparency.

3.3 Choice

The Ad Settings page offers users the option of editing the interests and demographics inferred about them. However, the exact nature of how these edits impact the ad network is unclear. We examine two notions of choice.

A very coarse form is effectful choice, which requires that altering the settings has some effect on the ads seen by the user. This shows that altering settings is not merely a "placebo button": it has a real effect on the network's ads. However, effectful choice does not capture whether the effect on ads is meaningful. For example, even if a user adds interests for cars and starts receiving fewer ads for cars, effectful choice is satisfied. Moreover, we cannot find violations of effectful choice. If we find no differences in the ads, we cannot conclude that users do not have effectful choice since it could be the result of the ad repository lacking ads relevant to the interest.

Ideally, the effect on ads after altering a setting would be meaningful and related to the changed setting. One way such an effect would be meaningful, in the case of removing an inferred interest, is a decrease in the number of ads related to the removed interest. We call this requirement ad choice. One way to judge whether an ad is relevant is to check it for keywords associated with the interest. If, upon removing an interest, we find a statistically significant decrease in the number of ads containing some keywords, then we will conclude that the choice was respected. In addition to testing for compliance in ad choice, we can also test for a violation by checking for a statistically significant increase in the number of related ads to find egregious violations. By requiring the effect to have a fixed direction, we can find both compliance and violations of ad choice.

Fig. 1. Experimental setup to carry out significance testing on eight browser agents comparing the effects of two treatments. Each agent is randomly assigned a treatment which specifies what actions to perform on the web. After these actions are complete, they collect measurements which are used for significance testing.

4 Methodology

The goal of our methodology is to establish that a certain type of input to a system causes an effect on a certain type of output of the system. For example, in our experiments, we study the system of Google. The inputs we study are visits to content providing websites and users' interactions with the Ad Settings page. The outputs we study are the settings and ads shown to the users by Google. However, nothing in our methodology limits us to these particular topics; it is appropriate for determining I/O properties of any web system. Here, we present an overview of our methodology; Appendix B provides details of the statistical analysis.
4.1 Background: Significance Testing

To establish causation, we start with the approach of Fisher (our tool's namesake) for significance testing [21] as specialized by Tschantz et al. for the setting of online systems [11]. Significance testing examines a null hypothesis, in our case, that the inputs do not affect the outputs. To test this hypothesis the experimenter selects two values that the inputs could take on, typically called the control and experimental treatments. The experimenter applies the treatments to experimental units. In our setting, the units are the browser agents, that is, simulated users. To avoid noise, the experimental units should initially be as close to identical as possible as far as the inputs and outputs in question are concerned. For example, an agent created with the Firefox browser should not be compared to one created with the Internet Explorer browser since Google can detect the browser used.

The experimenter randomly applies the experimental (control) treatment to half of the agents, which form the experimental (control) group. (See Figure 1.) Each agent carries out actions specified in the treatment applied to it. Next, the experimenter takes measurements of the outputs Google sends to the agents, such as ads. At this point, the experiment is complete and data analysis begins.

Data analysis starts by computing a test statistic over the measurements. The experimenter selects a test statistic that she suspects will take on a high value when the outputs to the two groups differ. That is, the statistic is a measure of distance between the two groups. She then uses the permutation test to determine whether the value the test statistic actually took on is higher than what one would expect by chance unless the groups actually differ. The permutation test randomly permutes the labels (control and experimental) associated with each observation, and recomputes a hypothetical test statistic. Since the null hypothesis is that the inputs have no effect, the random assignment should have no effect on the value of the test statistic. Thus, under the null hypothesis, it is unlikely that the actual value of the test statistic is larger than the vast majority of hypothetical values.

The p-value of the permutation test is the proportion of the permutations where the test statistic was greater than or equal to the actual observed statistic. If the value of the test statistic is so high that under the null hypothesis it would take on as high of a value in less than 5% of the random assignments, then we conclude that the value is statistically significant (at the 5% level) and that causation is likely.
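To make the test concrete, here is a minimal sketch of the unblocked permutation test in Python. The function names, the agent representation (a list of ad texts per agent), and the example statistic are illustrative assumptions, not AdFisher's actual API.

```python
import random

def permutation_test(control, experimental, statistic, rounds=100000, seed=0):
    """Estimate the p-value that `statistic` separates the groups by chance.

    `control` and `experimental` are lists of per-agent measurements;
    `statistic` maps two groups to a number that grows as the groups look
    more different. Labels are repeatedly shuffled and the statistic recomputed.
    """
    rng = random.Random(seed)
    observed = statistic(control, experimental)
    pooled = list(control) + list(experimental)
    n = len(control)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)                            # random relabeling of agents
        if statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return hits / rounds                               # proportion at least as extreme

def keyword_difference(group_a, group_b):
    """Example statistic: difference in mean number of ads mentioning "rehab"."""
    count = lambda ads: sum("rehab" in ad.lower() for ad in ads)
    mean = lambda group: sum(count(ads) for ads in group) / len(group)
    return abs(mean(group_a) - mean(group_b))
```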

4.2 Blocking

In practice, the above methodology can be difficult to use since creating a large number of nearly identical agents might not be possible. In our case, we could only run ten agents in parallel given our hardware and network limitations. Comparing agents running at different times can result in additional noise since ads served to an agent change over time. Thus, with the above methodology, we were limited to just ten comparable units. Since some effects that the inputs have on Google's outputs can be probabilistic and subtle, they might be missed looking at so few agents.

To avoid this limitation, we extended the above methodology to handle varying units using blocking [12]. To use blocking, we created blocks of nearly identical agents running in parallel. These agents differ in terms of their identifiers (e.g., process id) and location in memory. Despite the agents running in parallel, the operating system's scheduler determines the exact order in which the agents operate. Each block's agents were randomly partitioned into the control and experimental groups. This randomization ensures that the minor differences between agents noted above should have no systematic impact upon the results: these differences become noise that probably disappears as the sample size increases. Running these blocks in a staged fashion, the experiment proceeds block after block. A modified permutation test now only compares the actual value of the test statistic to hypothetical values computed by reassignments of agents that respect the blocking structure. These reassignments do not permute labels across blocks of observations.

Using blocking, we can scale to any number of agents by running as many blocks as needed. However, the computation of the permutation test increases exponentially with the number of blocks. Thus, rather than compute the exact p-value, we estimate it by randomly sampling the possible reassignments. We can use a confidence interval to characterize the quality of the estimation [12]. The p-values we report are actually the upper bounds of the 99% confidence intervals of the p-values (details in Appendix B).
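Below is a sketch of the block-respecting variant, assuming the same illustrative data layout as before: labels are permuted only within each block, the p-value is estimated from a random sample of reassignments, and the final normal-approximation bound stands in for the exact 99% confidence interval detailed in Appendix B. The names are ours, not AdFisher's.

```python
import random
from math import sqrt

def blocked_permutation_test(blocks, statistic, samples=100000, seed=0):
    """`blocks` is a list of (control_agents, experimental_agents) pairs, one
    pair per block of agents that ran in parallel. Labels are only permuted
    within a block, never across blocks."""
    rng = random.Random(seed)

    def flatten(pairs):
        control, experimental = [], []
        for ctrl, expt in pairs:
            control.extend(ctrl)
            experimental.extend(expt)
        return control, experimental

    observed = statistic(*flatten(blocks))
    hits = 0
    for _ in range(samples):
        relabeled = []
        for ctrl, expt in blocks:
            agents = list(ctrl) + list(expt)
            rng.shuffle(agents)                          # reassign within this block only
            relabeled.append((agents[:len(ctrl)], agents[len(ctrl):]))
        if statistic(*flatten(relabeled)) >= observed:
            hits += 1

    p_hat = hits / samples
    # Crude 99% upper bound on the sampled p-value (normal approximation).
    upper = p_hat + 2.576 * sqrt(max(p_hat * (1 - p_hat), 1e-12) / samples)
    return p_hat, min(upper, 1.0)
```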
4.3 Selecting Test Statistics

The above methodology leaves open the question of how to select the test statistic. In some cases, the experimenter might be interested in a particular test statistic. For example, an experimenter testing ad choice could use a test statistic that counts the number of ads related to the removed interest. In other cases, the experimenter might be looking for any effect. AdFisher offers the ability to automatically select a test statistic. To do so, it partitions the collected data into training and testing subsets, and uses the training data to train a classifier. Figure 2 shows an overview of AdFisher's workflow.

Fig. 2. Our experimental setup with training and testing blocks. Measurements from the training blocks are used to build a classifier. The trained classifier is used to compute the test statistic on the measurements from the testing blocks for significance testing.

To select a classifier, AdFisher uses 10-fold cross validation on the training data to select among several possible parameters. The classifier predicts which treatment an agent received, only from the ads that get served to that agent. If the classifier is able to make this prediction with high accuracy, it suggests a systematic difference between the ads served to the two groups that the classifier was able to learn. If no difference exists, then we would expect the accuracy to be near the guessing rate of 50%. AdFisher uses the accuracy of this classifier as its test statistic.

To avoid the possibility of seeing a high accuracy due to overfitting, AdFisher evaluates the accuracy of the classifier on a testing data set that is disjoint from the training data set. That is, in the language of statistics, we form our hypothesis about the test statistic being able to distinguish the groups before seeing the data on which we test it to ensure that it has predictive power. AdFisher uses the permutation test to determine whether the degree to which the classifier's accuracy on the test data surpasses the guessing rate is statistically significant. That is, it calculates the p-value that measures the probability of seeing the observed accuracy given that the classifier is just guessing. If the p-value is below 0.05, we conclude that it is unlikely that the classifier is guessing and that it must be making use of some difference between the ads shown to the two groups.
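A compact sketch of this selection step with scikit-learn follows; the candidate regularization grid and the helper name are our own choices, not AdFisher's. When the resulting accuracy is used in the permutation test, only the labels of the testing blocks are permuted and the trained classifier is re-scored.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def accuracy_statistic(train_X, train_y, test_X, test_y):
    """Train on the training blocks and return held-out accuracy on the
    testing blocks; that accuracy is the automatically chosen test statistic."""
    search = GridSearchCV(
        LogisticRegression(penalty="l2", max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},   # candidate regularization strengths
        cv=10,                                       # 10-fold CV on training data only
    )
    search.fit(train_X, train_y)
    return search.score(test_X, test_y)              # fraction of test agents classified correctly
```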
4.4 Avoiding Pitfalls

The above methodology avoids some pitfalls. Most fundamentally, we use a statistical analysis whose assumptions match those of our experimental design. Assumptions required by many statistical analyses appear unjustifiable in our setting. For example, many analyses assume that the agents do not interact or that the ads are independent and identically distributed (e.g., [14, 17]). Given that all agents receive ads from the same pool of possible ads governed by the same advertisers' budgets, these assumptions appear unlikely to hold. Indeed, empirical evidence suggests that they do not [11]. The permutation test, which does not require these assumptions, allows us to ensure the statistical soundness of our analysis without making them [22].

Our use of randomization implies that many factors that could be confounding factors in an unrandomized design become noise in our design (e.g., [12]). While such noise may require us to use a large sample size to find an effect, it does not affect the soundness of our analysis.

Our use of two data sets, one for training the classifier to select the test statistic and one for hypothesis testing, ensures that we do not engage in overfitting, data dredging, or multiple hypothesis testing (e.g., [23]). All these problems result from looking for so many possible patterns that one is found by chance. While we look for many patterns in the training data, we only check for one in the testing data.

Relatedly, by reporting a p-value, we provide a quantitative measure of the confidence we have that the observed effect is genuine and not just by chance [24]. Reporting simply the classifier accuracy, or merely that some difference occurred, fails to quantify the possibility that the result was a fluke.

4.5 Scope

We restrict the scope of our methodology to making claims that an effect exists with high likelihood as quantified by the p-value. That is, we expect our methodology to only rarely suggest that an effect exists when one does not.

We do not claim "completeness" or "power": we might fail to detect some use of information. For example, Google might not serve different ads upon detecting that all the browser agents in our experiment are running from the same IP address. Despite this limitation in our experiments, we found interesting instances of usage.

Furthermore, we do not claim that our results generalize to all users. To do so, we would need to take a random sample of all users, their IP addresses, browsers, and behaviors, which is prohibitively expensive. We cannot generalize our results if, for example, instead of turning off some usage upon detecting our experiments, Google turns it on. While our experiments would detect this usage, it might not be experienced by normal users. However, it would be odd if Google purposefully performed questionable behaviors only with those attempting to find them.

While we use webpages associated with various interests to simulate users with those interests, we cannot establish that having the interest itself caused the ads to change. It is possible that other features of the visited webpages cause the change, a form of confounding called "profile contamination" [14], since the pages cover other topics as well. Nevertheless, we have determined that visiting webpages associated with the interest does result in seeing a change, which should give pause to users visiting webpages associated with sensitive interests.

Lastly, we do not attempt to determine how the information was used. It could have been used by Google directly for targeting or it could have been used by advertisers to place their bids. We cannot assign blame. We hope future work will shed light on these issues, but given that we cannot observe the interactions between Google and advertisers, we are unsure whether it can be done.

5 AdFisher

In this section, we describe AdFisher, a tool implementing our methodology. AdFisher makes it easy to run experiments using the above methodology for a set of treatments, measurements, and classifiers (test statistics) we have implemented. AdFisher is also extensible, allowing the experimenter to implement additional treatments, measurements, or test statistics. For example, an experimenter interested in studying a different online platform only needs to add code to perform actions and collect measurements on that platform. They need not modify methods that randomize the treatments, carry out the experiment, or perform the data analysis.

To simulate a new person on the network, AdFisher creates each agent from a fresh browser instance with no browsing history, cookies, or other personalization. AdFisher randomly assigns each agent to a group and applies the appropriate treatment, such as having the browser visit webpages. Next, AdFisher makes measurements of the agent, such as collecting the ads shown to the browser upon visiting another webpage. All of the agents within a block execute and finish the treatments before moving on to collect the measurements, to remove time as a factor. AdFisher runs all the agents on the same machine to prevent differences based on location, IP address, operating system, or other machine-specific differences between agents.

Next, we detail the particular treatments, measurements, and test statistics that we have implemented in AdFisher. We also discuss how AdFisher aids an experimenter in understanding the results.
Treatments. A treatment specifies what actions are to be performed by a browser agent. AdFisher automatically applies treatments assigned to each agent. Typically, these treatments involve invoking the Selenium WebDriver to make the agent interact with webpages.

AdFisher makes it easy to carry out common treatments by providing ready-made implementations. The simplest stock treatments we provide set interests, gender, and age range in Google's Ad Settings. Another stock treatment is to visit a list of webpages stored on a file.

To make it easy to see whether websites associated with a particular interest cause a change in behavior, we have provided the ability to create lists of webpages associated with a category on Alexa. For each category, Alexa tracks the top websites sorted according to their traffic rank measure (a combination of the number of users and page views) [25]. The experimenter can use AdFisher to download the URLs of the top webpages Alexa associates with an interest. By default, it downloads the top 100 URLs. A treatment can then specify that agents visit this list of websites. While these treatments do not correspond directly to having such an interest, they allow us to study how Google responds to people visiting webpages associated with those interests.

Often in our experiments, we compared the effects of a certain treatment applied to the experimental group against the null treatment applied to the control group. Under the null treatment, agents do nothing while agents under a different treatment complete their respective treatment phase.
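As an illustration, a visit-webpages treatment can be as small as the sketch below. The function name and dwell time are ours; AdFisher's actual implementation additionally handles the randomization, blocking, and measurement logging described above.

```python
import time
from selenium import webdriver

def visit_webpages(urls, dwell_seconds=5):
    """Illustrative treatment: drive a fresh Firefox instance through a list of
    URLs, e.g. the top Alexa sites for an interest category."""
    driver = webdriver.Firefox()          # new instance: no history, cookies, or profile
    try:
        for url in urls:
            driver.get(url)               # navigate and wait for the page to load
            time.sleep(dwell_seconds)     # dwell briefly so trackers can fire
    finally:
        driver.quit()
```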
Measurements. AdFisher can currently measure the values set in Google's Ad Settings page and the ads shown to the agents after the treatments. It comes with stock functionality for collecting and analyzing text ads. Experimenters can add methods for image, video, and flash ads.

To find a reasonable website for ad collection, we looked to news sites since they generally show many ads. Among the top 20 news websites on alexa.com, only five displayed text ads served by Google: theguardian.com/us, timesofindia.indiatimes.com, bbc.com/news, reuters.com/news/us and bloomberg.com. AdFisher comes with built-in functionality to collect ads from any of these websites. One can also specify for how many reloads ads are to be collected (default 10), or how long to wait between successive reloads (default 5s). For each page reload, AdFisher parses the page to find the ads shown by Google and stores the ads. The experimenter can add parsers to collect ads from other websites.

We run most of our experiments on Times of India as it serves the most (five) text ads per page reload. We repeat some experiments on the Guardian (three ads per reload) to demonstrate that our results are not specific to one site.

Classification. While the experimenter can provide AdFisher with a test statistic to use on the collected data, AdFisher is also capable of automatically selecting a test statistic using machine learning. It splits the entire data set into training and testing subsets, and examines a training subset of the collected measurements to select a classifier that distinguishes between the measurements taken from each group. From the point of view of machine learning, the set of ads collected by an agent corresponds to an instance of the concept the classifier is attempting to learn.

Machine learning algorithms operate over sets of features. AdFisher has functions for converting the text ads seen by an agent into three different feature sets. The URL feature set consists of the URLs displayed by the ads (or occasionally some other text if the ad displays it where URLs normally go). Under this feature set, the feature vector representing an agent's data has a value of n in the ith entry iff the agent received n ads that display the ith URL, where the order is fixed but arbitrary.

The URL+Title feature set looks at both the displayed URL and the title of the ad jointly. It represents an agent's data as a vector where the ith entry is n iff the agent received n ads containing the ith pair of a URL and title.

The third feature set AdFisher has implemented is the word feature set. This set is based on word stems, the main part of the word with suffixes such as "ed" or "ing" removed, in a manner similar to the work of Balebako et al. [19]. Each word stem that appeared in an ad is assigned a unique id. The ith entry in the feature vector is the number of times that words with the ith stem appeared in the agent's ads.
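The following sketch shows how an agent's ads might be turned into URL+title count vectors. It assumes each ad is a dict with "url" and "title" keys, which is an illustrative representation rather than AdFisher's internal one.

```python
from collections import Counter

def url_title_features(agents_ads):
    """Convert each agent's ads into a count vector over (URL, title) pairs.

    `agents_ads` has one entry per agent; each entry is a list of ads. The
    i-th feature counts how many of the agent's ads showed the i-th pair;
    the ordering of pairs is fixed but arbitrary.
    """
    pairs = sorted({(ad["url"], ad["title"]) for ads in agents_ads for ad in ads})
    vectors = []
    for ads in agents_ads:
        counts = Counter((ad["url"], ad["title"]) for ad in ads)
        vectors.append([counts.get(pair, 0) for pair in pairs])
    return vectors, pairs
```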
We explored a variety of classification algorithms provided by the scikit-learn library [9]. We found that logistic regression with an L2 penalty over the URL+title feature set consistently performed well compared to the others. At its core, logistic regression predicts a class given a feature vector by multiplying each of the entries of the vector by its own weighting coefficient (e.g., [26]). It then takes the sum of all these products. If the sum is positive, it predicts one class; if negative, it predicts the other.

While using logistic regression, the training stage consists of selecting the coefficients assigned to each feature to predict the training data. Selecting coefficients requires balancing the training-accuracy of the model with avoiding overfitting the data with an overly complex model. We apply 10-fold cross-validation on the training data to select the regularization parameter of the logistic regression classifier. By default, AdFisher splits the data into training and test sets by using the last 10% of the data collected for testing.

Explanations. To explain how the learned classifier distinguished between the groups, we explored several methods. We found the most informative to be the model produced by the classifier itself. Recall that logistic regression weights the various features of the instances with coefficients reflecting how predictive they are of each group. Thus, with the URL+title feature set, examining the features with the most extreme coefficients identifies the URL+title pair most used to predict the group to which agents receiving an ad with that URL+title belong.

We also explored using simple metrics for providing explanations, like ads with the highest frequency in each group. However, some generic ads get served in huge numbers to both groups. We also looked at the proportion of times an ad was served to agents in one group relative to the total number of times it was observed across all groups. However, this did not provide much insight since the proportion typically reached its maximum value of 1.0 for ads that only appeared once. Another choice we explored was to compute the difference in the number of times an ad appears between the groups. However, this metric is also highly influenced by how common the ad is across all groups.
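A sketch of this training-and-explanation step, again with scikit-learn, follows. The use of LogisticRegressionCV and the size of the coefficient listing are our simplifications of the procedure described above, not AdFisher's exact code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def train_and_explain(train_X, train_y, feature_names, top_k=5):
    """Fit an L2 logistic regression, choosing the regularization strength by
    10-fold cross validation, then list the features with the most extreme
    coefficients for each group (labels 0 and 1)."""
    model = LogisticRegressionCV(Cs=10, cv=10, penalty="l2", max_iter=1000)
    model.fit(train_X, train_y)
    coefs = model.coef_[0]
    order = np.argsort(coefs)
    group0 = [(feature_names[i], coefs[i]) for i in order[:top_k]]          # most negative
    group1 = [(feature_names[i], coefs[i]) for i in order[-top_k:][::-1]]   # most positive
    return model, group0, group1
```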
6 Experiments

In this section, we discuss experiments that we carried out using AdFisher. In total, we ran 21 experiments, each of which created its own testing data sets using independent random assignments of treatments to agents. We analyze each test data set only once and report the results of each experiment separately. Thus, we do not test multiple hypotheses on any of our test data sets, ensuring that the probabilities of false positives (p-values) are independent, with the exception of our analyses for ad choice. In that case, we apply a Bonferroni correction.

Each experiment examines one of the properties of interest from Table 1. We found violations of nondiscrimination and data transparency and cases of compliance with effectful and ad choice. Since these summaries each depend upon more than one experiment, they are the composite of multiple hypotheses. To prevent false positives in these summaries, for each property, we report p-values adjusted by the number of experiments used to explore that property. We use the Holm-Bonferroni method for our adjustments, which is uniformly more powerful than the commonly used Bonferroni correction [27]. This method orders the component hypotheses by their unadjusted p-values, applying a different correction to each until reaching a hypothesis whose adjusted value is too large to reject. This hypothesis and all remaining hypotheses are not rejected, regardless of their unadjusted p-values. Appendix C provides details.

Table 2 in Appendix A summarizes our findings.
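For reference, a self-contained sketch of the Holm-Bonferroni adjustment follows; this is the textbook step-down procedure rather than AdFisher's exact code, and the function name is ours.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return Holm-Bonferroni adjusted p-values and reject decisions.

    Hypotheses are ordered by raw p-value; the smallest is multiplied by m,
    the next by m-1, and so on, with adjusted values made monotone. Once one
    hypothesis fails to be rejected, all later ones are retained as well.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    reject = [False] * m
    for i in order:                       # step down until the first failure
        if adjusted[i] <= alpha:
            reject[i] = True
        else:
            break
    return adjusted, reject
```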

6.1 Nondiscrimination

We use AdFisher to demonstrate a violation of the nondiscrimination property. If AdFisher finds a statistically significant difference in how Google treats two experimental groups, one consisting of members having a protected attribute and one whose members do not, then the experimenter has strong evidence that Google discriminates on that attribute. In particular, we use AdFisher's ability to automatically select a test statistic to check for possible differences to test the null hypothesis that the two experimental groups have no differences in the ads they receive.

As mentioned before, it is difficult to send a clear signal about any attribute by visiting related webpages since they may have content related to other attributes. The only way to send a clear signal is via Ad Settings. Thus, we focus on attributes that can be set on the Ad Settings page. In a series of experiments, we set the gender of one group to female and the other to male. In one of the experiments, the agents went straight to collecting ads; in the others, they simulated an interest in jobs. In all but one experiment, they collected ads from the Times of India (TOI); in the exception, they collected ads from the Guardian. In one experiment, they also visited the top 10 websites for the U.S. according to alexa.com to fill out their interests.1 Table 3 in Appendix A summarizes results from these experiments.

AdFisher found a statistically significant difference in the ads for male and female agents that simulated an interest in jobs in May, 2014. It also found evidence of discrimination in the nature of the effect. In particular, it found that females received fewer instances of an ad encouraging the taking of high paying jobs than males. AdFisher did not find any statistically significant differences among the agents that did not visit the job-related pages or those operating in July, 2014. We detail the experiment finding a violation before discussing why we think the other experiments did not produce significant results.

Gender and Jobs. In this experiment, we examine how changing the gender demographic on Google Ad Settings affects the ads served and interests inferred for agents browsing employment related websites. We set up AdFisher to have the agents in one group visit the Google Ad Settings page and set the gender bit to female while agents in the other group set theirs to male. All the agents then visited the top 100 websites listed under the Employment category of Alexa.2 The agents then collected ads from the Times of India.

AdFisher ran 100 blocks of 10 agents each. (We used blocks of size 10 in all our experiments.) AdFisher used the ads of 900 agents (450 from each group) for training a classifier using the URL+title feature set, and used the remaining 100 agents' ads for testing. The learned classifier attained a test-accuracy of 93%, suggesting that Google did in fact treat the genders differently. To test whether this response was statistically significant, AdFisher computed a p-value by running the permutation test on a million randomly selected block-respecting permutations of the data. The significance test yielded an adjusted p-value of < 0.00005.

We then examined the model learned by AdFisher to explain the nature of the difference. Table 4 shows the five URL+title pairs that the model identifies as the strongest indicators of being from the female or male group. How the ads identifying the two groups differ is concerning. The two URL+title pairs with the highest coefficients for indicating a male were for a career coaching service for "$200k+" executive positions. Google showed the ads 1852 times to the male group but just 318 times to the female group. The top two URL+title pairs for the female group were for a generic job posting service and for an auto dealer.

The discrimination found in this experiment came predominately from a pair of job-related ads for the same service, making the finding highly sensitive to changes in the serving of these ads. A closer examination of the ads from the same experimental setup run in July, 2014, showed that the frequency of these ads dropped from 2170 to just 48, with one of the ads completely disappearing. These 48 ads were only shown to males, continuing the pattern of discrimination. This pattern was recognized by the machine learning algorithm, which selected the ad as the second most useful for identifying males. However, the ads were too infrequent to establish statistical significance. A longer running experiment with more blocks might have succeeded.

1 http://www.alexa.com/topsites/countries/US
2 http://www.alexa.com/topsites/category/Top/Business/Employment

6.2 Transparency

AdFisher can demonstrate violations of individual data use transparency. AdFisher tests the null hypothesis that two groups of agents with the same ad settings receive ads from the same distribution despite being subjected to different experimental treatments. Rejecting the null hypothesis implies that some difference exists in the ads that is not documented by the ad settings.

In particular, we ran a series of experiments to examine how much transparency Google's Ad Settings provided. We checked whether visiting webpages associated with some interest could cause a change in the ads shown that is not reflected in the settings.

We ran such experiments for five interests: substance abuse, disabilities, infertility,3 mental disorders,4 and adult websites.5 Results from the statistical analysis of these experiments are shown in Table 5 of Appendix A.

We examined the interests found in the settings for the two cases where we found a statistically significant difference in ads: substance abuse and disability. We found that the settings did not change at all for substance abuse and changed in an unexpected manner for disabilities. Thus, we detail these two experiments below.

Substance Abuse. We were interested in whether Google's outputs would change in response to visiting webpages associated with substance abuse, a highly sensitive topic. Thus, we ran an experiment in which the experimental group visited such websites while the control group idled. Then, we collected the Ad Settings and the ads shown to the agents at the Times of India. For the webpages associated with substance abuse, we used the top 100 websites on the Alexa list for substance abuse.6

AdFisher ran 100 blocks of 10 agents each. At the end of visiting the webpages associated with substance abuse, none of the 500 agents in the experimental group had interests listed on their Ad Settings pages. (None of the agents in the control group did either since the settings start out empty.) If one expects the Ad Settings page to reflect all learned inferences, then he would not anticipate ads relevant to those website visits given the lack of interests listed.

However, the ads collected from the Times of India told a different story. The learned classifier attained a test-accuracy of 81%, suggesting that Google did in fact respond to the page visits. Indeed, using the permutation test, AdFisher found an adjusted p-value of < 0.00005. Thus, we conclude that the differences are statistically significant: Google's ads changed in response to visiting the webpages associated with substance abuse. Despite this change being significant, the Ad Settings pages provided no hint of its existence: the transparency tool is opaque!

We looked at the URL+title pairs with the highest coefficients for identifying the experimental group that visited the websites related to substance abuse. Table 6 provides information on the coefficients and URL+titles learned. The three highest were for "Watershed Rehab". The top two had URLs for this drug and alcohol rehab center. The third lacked a URL and had other text in its place. Figure 3 shows one of Watershed's ads. The experimental group saw these ads a total of 3309 times (16% of the ads); the control group never saw any of them nor contained any ads with the word "rehab" or "rehabilitation". None of the top five URL+title pairs for identifying the control group had any discernible relationship with rehab or substance abuse.

Fig. 3. Screenshot of an ad with the top URL+title for identifying agents that visited webpages associated with substance abuse.

These results remain robust across variations on this design, with statistical significance in three variations. For example, two of these ads remain the top two ads for identifying the agents that visited the substance abuse websites in July using ads collected from the Guardian.

One possible reason why Google served Watershed's ads could be remarketing, a marketing strategy that encourages users to return to previously visited websites [28]. The website thewatershed.com features among the top 100 websites about substance abuse on Alexa, and agents visiting that site may be served Watershed's ads as part of remarketing. However, these users cannot see any changes on Google Ad Settings despite Google having learnt some characteristic (visited thewatershed.com) about them and serving ads relevant to that characteristic.

Disabilities. This experiment was nearly identical in setup but used websites related to disabilities instead of substance abuse. We used the top 100 websites on Alexa on the topic.7

For this experiment, AdFisher found a classifier with a test-accuracy of 75%. It found a statistically significant difference with an adjusted p-value of less than 0.00005.

Looking at the top ads for identifying agents that visited the webpages associated with disabilities, we see that the top two ads have the URL www.abilitiesexpo.com and the titles "Mobility Lifter" and "Standing Wheelchairs". They were shown a total of 1076 times to the experimental group but never to the control group. (See Table 7.)

This time, Google did change the settings in response to the agents visiting the websites. However, none of the newly listed interests are directly related to disabilities, suggesting that Google might have focused on other aspects of the visited pages. Once again, we believe that the top ads were served due to remarketing, as abilitiesexpo.com was among the top 100 websites related to disabilities.

3 http://www.alexa.com/topsites/category/Top/Health/Reproductive_Health/Infertility
4 http://www.alexa.com/topsites/category/Top/Health/Mental_Health/Disorders
5 http://www.alexa.com/topsites/category/Top/Adult
6 http://www.alexa.com/topsites/category/Top/Health/Addictions/Substance_Abuse
7 http://www.alexa.com/topsites/category/Top/Society/Disabled

6.3 Effectful Choice

We tested whether making changes to Ad Settings has an effect on the ads seen, thereby giving the users a degree of choice over the ads. In particular, AdFisher tests the null hypothesis that changing some ad setting has no effect on the ads.

First, we tested whether opting out of tracking actually had an effect by comparing the ads shown to agents that opted out after visiting car-related websites to ads from those that did not opt out. We found a statistically significant difference.

We also tested whether removing interests from the settings page actually had an effect. We set AdFisher to have both groups of agents simulate some interest. AdFisher then had the agents in one of the groups remove interests from Google's Ad Settings related to the induced interest. We found statistically significant differences between the ads both groups collected from the Times of India for two induced interests: online dating and weight loss. We describe one in detail below.

Online Dating. We simulated an interest in online dating by visiting the website www.midsummerseve.com/, a website we chose since it sets Google's ad setting for "Dating & Personals" (this site no longer affects the setting). AdFisher then had just the agents in the experimental group remove the interest "Dating & Personals" (the only one containing the keyword "dating"). All the agents then collected ads from the Times of India.

AdFisher found statistically significant differences between the groups with a classifier accuracy of 74% and an adjusted p-value of < 0.00003. Furthermore, the effect appears related to the interests removed. The top ad for identifying agents that kept the romantic interests has the title "Are You Single?" and the second ad's title is "Why can't I find a date?". None of the top five ads for identifying the group that removed the interests were related to dating (Table 9). Thus, the ad settings appear to actually give users the ability to avoid ads they might dislike or find embarrassing. In the next set of experiments, we explicitly test for this ability.

We repeated this experiment in July, 2014, using the websites relationshipsurgery.com and datemypet.com, which also had an effect on Ad Settings, but did not find statistically significant differences.

6.4 Ad Choice

Whereas the other experiments tested merely for the presence of an effect, testing for ad choice requires determining whether the effect is an increase or decrease in the number of relevant ads seen. Fortunately, since AdFisher uses a one-sided permutation test, it tests for either an increase or a decrease, but not for both simultaneously, making it usable for this purpose. In particular, after removing an interest, we check for a decrease to test for compliance using the null hypothesis that either no change or an increase occurred, since rejecting this hypothesis would imply that a decrease in the number of related ads occurred. To check for a violation, we test the null hypothesis that either no change or a decrease occurred. Due to testing two hypotheses, we use an adjustment to the p-value cutoff considered significant to avoid finding significant results simply from testing multiple hypotheses. In particular, we use the standard Bonferroni correction, which calls for multiplying the p-value by 2 (e.g., [29]).
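As an illustration of the two directional checks just described, here is a minimal sketch rather than AdFisher's actual implementation: a one-sided permutation test run in each direction over per-agent keyword counts, with the Bonferroni factor of 2 applied. The counts and the number of resamples are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_pvalue_greater(a, b, n_perm=10_000):
    """One-sided permutation p-value for the alternative that group b's
    mean keyword count exceeds group a's."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pooled[len(a):].mean() - pooled[:len(a)].mean() >= observed:
            count += 1
    return count / n_perm

# Hypothetical per-agent counts of dating-related ads.
removed = [3, 1, 0, 2, 1, 0, 4, 1, 2, 0]          # agents that removed the interest
kept = [9, 12, 7, 15, 11, 8, 13, 10, 14, 10]      # agents that kept it

p_decrease = perm_pvalue_greater(removed, kept)   # compliance: removal lowered the count
p_increase = perm_pvalue_greater(kept, removed)   # violation: removal raised the count
# Bonferroni correction: two hypotheses tested on the same data, so multiply by 2.
print("compliance p-value:", min(1.0, 2 * p_decrease))
print("violation p-value: ", min(1.0, 2 * p_increase))
```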
We ran three experiments checking for ad choice. The experiments followed the same setup as the effectful choice ones, but this time we used all the blocks for testing a given test statistic. The test statistic counted the number of ads containing keywords. In the first, we again tested online dating using relationshipsurgery.com and datemypet.com. In particular, we found that removing online dating resulted in a significant decrease (p-value adjusted for all six experiments: 0.0456) in the number of ads containing related keywords (from 109 to 34). We detail the inconclusive results for weight loss below.

Weight Loss. We induced an interest in weight loss by visiting dietingsucks.blogspot.com. Afterwards, the agents in the experimental group removed the interests "Fitness" and "Fitness Equipment and Accessories", the only ones related to weight loss. We then used a test statistic that counted the number of ads containing the keyword "fitness". Interestingly, the test statistic was higher on the group with the interests removed, although not to a statistically significant degree. We repeated the process with a longer keyword list and found that removing interests decreased the test statistic this time, but also not to a statistically significant degree.

7 Discussion and Conclusion

Using AdFisher, we conducted 21 experiments using 17,370 agents that collected over 600,000 ads. Our experiments found instances of discrimination, opacity, and choice in targeted ads of Google. Discrimination is, at some level, inherent to profiling: the point of profiling is to treat some people differently. While customization can be helpful, we highlight a case where the customization appears inappropriate, taking on the negative connotations of discrimination. In particular, we found that males were shown ads encouraging the seeking of coaching services for high paying jobs more than females (§6.1).

We do not, however, claim that any laws or policies were broken. Indeed, Google's policies allow it to serve different ads based on gender. Furthermore, we cannot determine whether Google, the advertiser, or complex interactions among them and others caused the discrimination (§4.5). Even if we could, the discrimination might have resulted unintentionally from algorithms optimizing click-through rates or other metrics free of bigotry. Given the pervasive structural nature of gender discrimination in society at large, blaming one party may ignore context and correlations that make avoiding such discrimination difficult. More generally, we believe that no scientific study can demonstrate discrimination in the sense of unjust discrimination since science cannot demonstrate normative statements (e.g., [30]).

Nevertheless, we are comfortable describing the results as "discrimination". From a strictly scientific viewpoint, we have shown discrimination in the non-normative sense of the word. Personally, we also believe the results show discrimination in the normative sense of the word. Male candidates getting more encouragement to seek coaching services for high-paying jobs could further the current gender pay gap (e.g., [31]). Thus, we do not see the found discrimination in our vision of a just society, even if we are incapable of blaming any particular parties for this outcome.

Furthermore, we know of no justification for such customization of the ads in question. Indeed, our concern about this outcome does not depend upon how the ads were selected. Even if this decision was made solely for economic reasons, it would continue to be discrimination [32]. In particular, we would remain concerned if the cause of the discrimination was an algorithm run by Google and/or the advertiser automatically determining that males are more likely than females to click on the ads in question. The amoral status of an algorithm does not negate its effects on society.

However, we also recognize the possibility that no party is at fault and such unjust effects may be inadvertent and difficult to prevent. We encourage research developing tools that ad networks and advertisers can use to prevent such unacceptable outcomes (e.g., [33]).

Opacity occurs when a tool for providing transparency into how ads are selected and the profile kept on a person actually fails to provide such transparency. Our experiment on substance abuse showed an extreme case in which the tool failed to show any profiling but the ad distributions were significantly different in response to behavior (§6.2). In particular, our experiment achieved an adjusted p-value of < 0.00005, which is 1000 times more significant than the standard 0.05 cutoff for statistical significance. This experiment remained robust to variations, showing a pattern of such opacity.

Ideally, tools such as Ad Settings would provide a complete representation of the profile kept on a person, or at least the portion of the profile that is used to select ads shown to the person.
Two people with identical profiles might continue to receive different ads due to other factors affecting the choice of ads, such as A/B testing or the time of day. However, systematic differences between ads shown at the same time and in the same context, such as those we found, would not exist for such pairs of people.

In our experiments testing transparency, we suspect that Google served the top ads as part of remarketing, but our blackbox experiments do not determine whether this is the case. While such remarketing may appear less concerning than Google inferring a substance abuse issue about a person, its highly targeted nature is worrisome, particularly in settings with shared computers or shoulder surfing. There is a need for a more inclusive transparency/control mechanism which encompasses remarketed ads as well. Additionally, Google states that "we prohibit advertisers from remarketing based on sensitive information, such as health information" [28]. Although Google does not specify what they consider to be "health information", we view the ads as in violation of Google's policy, thereby raising the question of how Google should enforce its policies.

Lastly, we found that Google Ad Settings does provide the user with a degree of choice about the ads shown. In this aspect, the transparency/control tool operated as we expected.

Our tool, AdFisher, makes it easy to run additional experiments exploring the relations between Google's ads and settings. It can be extended to study other systems. Its design ensures that it can run and analyze large-scale experiments to find subtle differences. It automatically finds differences between large data sets produced by different groups of agents and explains the nature of those differences. By completely automating the data analysis, we ensure that an appropriate statistical analysis determines whether these differences are statistically significant before we draw conclusions from them.

AdFisher may have cost advertisers a small sum of money. AdFisher never clicked on any ads to avoid per-click fees, which can run over $4 [34]. Its experiments may have caused per-impression fees, which run about $0.00069 [35]. In the billion dollar ad industry, its total effect was about $400.
8 Future Work

We would like to extend AdFisher to study information flow on other advertising systems, such as Facebook and Bing. We would also like to analyze other kinds of ads, such as image or flash ads. We also plan to use the tool to detect price discrimination on sites like Amazon or Kayak, or to find differences in suggested posts on blogs and news websites, based on past user behavior. We have already mentioned the interesting problem of how ad networks can ensure that their policies are respected by advertisers (§7).

We would also like to assign blame where it is due. However, doing so is often difficult. For example, our view on blame varies based on why females were discriminated against in our gender and jobs experiment. If Google allowed the advertiser to easily discriminate, we would blame both. If the advertiser circumvented Google's efforts to prevent such discrimination by targeting correlates of gender, we would blame just the advertiser. If Google decided to target just males with the ad on its own, we would blame just Google. While we lack the access needed to make this determination, both Google and the advertiser have enough information to audit the other with our tool.

As another example, consider the results of opacity after visiting substance abuse websites. While we suspect remarketing is the cause, it is also possible that Google is targeting users without the rehab center's knowledge. In this case, it would remain unclear whether Google is targeting users as substance abusers or due to some other content correlated with the webpages we visited to simulate an interest in substance abuse. We would like to find ways of controlling for these confounding factors.

For these reasons, we cannot claim that Google has violated its policies. In fact, we consider it more likely that Google has lost control over its massive, automated advertising system. Even without advertisers placing inappropriate bids, large-scale machine learning can behave in unexpected ways. With this in mind, we hope future research will examine how to produce machine learning algorithms that automatically avoid discriminating against users in unacceptable ways and automatically provide transparency to users.

Acknowledgements. We thank Jeannette M. Wing for helpful discussions about this work. We thank Augustin Chaintreau, Roxana Geambasu, Qiang Ma, Latanya Sweeney, and Craig E. Wills for providing additional information about their works. We thank the reviewers of this paper for their helpful comments. This research was supported by the National Science Foundation (NSF) grants CCF0424422 and CNS1064688. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

References

[1] J. R. Mayer and J. C. Mitchell, "Third-party web tracking: Policy and technology," in IEEE Symposium on Security and Privacy, 2012, pp. 413–427.

[2] B. Ur, P. G. Leon, L. F. Cranor, R. Shay, and Y. Wang, "Smart, useful, scary, creepy: Perceptions of online behavioral advertising," in Proceedings of the Eighth Symposium on Usable Privacy and Security. ACM, 2012, pp. 4:1–4:15.
[3] Google, "About ads settings," https://support.google.com/ads/answer/2662856, accessed Nov. 21, 2014.
[4] Yahoo!, "Ad interest manager," https://info.yahoo.com/privacy/us/yahoo/opt_out/targeting/details.html, accessed Nov. 21, 2014.
[5] Microsoft, "Microsoft personalized ad preferences," http://choice.microsoft.com/en-us/opt-out, accessed Nov. 21, 2014.
[6] Executive Office of the President, "Big data: Seizing opportunities, preserving values," Posted at http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf, 2014, accessed Jan. 26, 2014.
[7] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning fair representations," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), S. Dasgupta and D. Mcallester, Eds., vol. 28. JMLR Workshop and Conference Proceedings, May 2013, pp. 325–333. [Online]. Available: http://jmlr.org/proceedings/papers/v28/zemel13.pdf
[8] Google, "Privacy policy," https://www.google.com/intl/en/policies/privacy/, accessed Nov. 21, 2014.
[9] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[10] E. Jones, T. Oliphant, P. Peterson et al., "SciPy: Open source scientific tools for Python," 2001, http://www.scipy.org/.
[11] M. C. Tschantz, A. Datta, A. Datta, and J. M. Wing, "A methodology for information flow experiments," ArXiv, Tech. Rep. arXiv:1405.2376v1, 2014.
[12] P. Good, Permutation, Parametric and Bootstrap Tests of Hypotheses. Springer, 2005.
[13] C. E. Wills and C. Tatar, "Understanding what they do with what they know," in Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society, 2012, pp. 13–18.
[14] P. Barford, I. Canadi, D. Krushevskaja, Q. Ma, and S. Muthukrishnan, "Adscape: Harvesting and analyzing online display ads," in Proceedings of the 23rd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2014, pp. 597–608.
[15] B. Liu, A. Sheth, U. Weinsberg, J. Chandrashekar, and R. Govindan, "AdReveal: Improving transparency into online targeted advertising," in Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks. ACM, 2013, pp. 12:1–12:7.
[16] M. Lécuyer, G. Ducoffe, F. Lan, A. Papancea, T. Petsios, R. Spahn, A. Chaintreau, and R. Geambasu, "XRay: Increasing the web's transparency with differential correlation," in Proceedings of the USENIX Security Symposium, 2014.
[17] S. Englehardt, C. Eubank, P. Zimmerman, D. Reisman, and A. Narayanan, "Web privacy measurement: Scientific principles, engineering platform, and new results," Manuscript posted at http://randomwalker.info/publications/WebPrivacyMeasurement.pdf, 2014, accessed Nov. 22, 2014.
[18] S. Guha, B. Cheng, and P. Francis, "Challenges in measuring online advertising systems," in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, 2010, pp. 81–87.
[19] R. Balebako, P. Leon, R. Shay, B. Ur, Y. Wang, and L. Cranor, "Measuring the effectiveness of privacy tools for limiting behavioral advertising," in Web 2.0 Security and Privacy Workshop, 2012.
[20] L. Sweeney, "Discrimination in online ad delivery," Commun. ACM, vol. 56, no. 5, pp. 44–54, 2013.
[21] R. A. Fisher, The Design of Experiments. Oliver & Boyd, 1935.
[22] S. Greenland and J. M. Robins, "Identifiability, exchangeability, and epidemiological confounding," International Journal of Epidemiology, vol. 15, no. 3, pp. 413–419, 1986.
[23] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[24] D. D. Jensen, "Induction with randomization testing: Decision-oriented analysis of large data sets," Ph.D. dissertation, Sever Institute of Washington University, 1992.
[25] Alexa, "Is popularity in the top sites by category directory based on traffic rank?" https://support.alexa.com/hc/en-us/articles/200461970, accessed Nov. 21, 2014.
[26] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[27] S. Holm, "A simple sequentially rejective multiple test procedure," Scandinavian Journal of Statistics, vol. 6, no. 2, pp. 65–70, 1979.
[28] Google, "Google privacy and terms," http://www.google.com/policies/technologies/ads/, accessed Nov. 22, 2014.
[29] H. Abdi, "Bonferroni and Šidák corrections for multiple comparisons," in Encyclopedia of Measurement and Statistics, N. J. Salkind, Ed. Sage, 2007.
[30] D. Hume, A Treatise of Human Nature: Being an Attempt to Introduce the Experimental Method of Reasoning into Moral Subjects, 1738, book III, part I, section I.
[31] Pew Research Center's Social and Demographic Trends Project, "On pay gap, millennial women near parity — for now: Despite gains, many see roadblocks ahead," 2013.
[32] T. Z. Zarsky, "Understanding discrimination in the scored society," Washington Law Review, vol. 89, pp. 1375–1412, 2014.
[33] R. S. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning fair representations," in Proceedings of the 30th International Conference on Machine Learning, ser. JMLR: W&CP, vol. 28. JMLR.org, 2013, pp. 325–333.
[34] Adgooroo, "Adwords cost per click rises 26% between 2012 and 2014," http://www.adgooroo.com/resources/blog/adwords-cost-per-click-rises-26-between-2012-and-2014/, accessed Nov. 21, 2014.
[35] L. Olejnik, T. Minh-Dung, and C. Castelluccia, "Selling off privacy at auction," in Network and Distributed System Security Symposium (NDSS). The Internet Society, 2013.
[36] C. J. Clopper and E. S. Pearson, "The use of confidence or fiducial limits illustrated in the case of the binomial," Biometrika, vol. 26, no. 4, pp. 404–413, 1934.

A Tables

Table 2 summarizes the results. Table 3 covers the discrimination experiments, with Table 4 showing the top ads for the experiment on gender and jobs. Table 5 covers the opacity experiments, with Table 6 showing the top ads for the substance-abuse experiment and Table 7 showing them for the disability experiment. Table 8 shows the experiments for effectful choice, with Table 9 showing the top ads for online dating. Tables 10 and 11 cover ad choice.

B Details of Methodology

Let the units be arranged in a vector ~u of length n. Let ~t be a treatment vector, a vector of length n whose entries are the treatments that the experimenter wants to apply to the units. In the case of just two treatments, ~t can be half full of the first treatment and half full of the second. Let a be an assignment of units to treatments, a bijection that maps each entry of ~u to an entry in ~t. That is, an assignment is a permutation on the set of indices of ~u and ~t.

The result of the experiment is a vector of observations ~y where the ith entry of ~y is the response measured for the unit assigned to the ith treatment in ~t by the assignment used. In a randomized experiment, such as those AdFisher runs, the actual assignment used is selected at random uniformly over some set of possible assignments A.

Let s be a test statistic of the observations of the units. That is, s : Y^n → R where Y is the set of possible observations made over units, n is the number of units, and R is the range of s. We require R to be ordered numbers such as the natural or real numbers. We allow s to treat its arguments differently; that is, the order in which the observations are passed to s matters.

If the null hypothesis is true, then we would expect the value of s to be the same under every permutation of the arguments since the assignment of units to treatments should not matter under the null hypothesis. This reasoning motivates the permutation test. The value produced by a (one-tailed signed) permutation test given observed responses ~y and a test statistic s is
$$\frac{|\{\, a \in A \mid s(\vec{y}) \le s(a(\vec{y})) \,\}|}{|A|} = \frac{1}{|A|} \sum_{a \in A} \mathbb{I}[s(\vec{y}) \le s(a(\vec{y}))] \qquad (1)$$
where the assignments in A only swap nearly identical units and I[·] returns 1 if its argument is true and 0 otherwise.

Blocking. For the blocking design, the set of units U is partitioned into k blocks B1 to Bk. In our case, all the blocks have the same size. Let |Bi| = m for all i. The set of assignments A is equal to the set of functions from U to U that are permutations not mixing up blocks. That is, a such that for all i and all u in Bi, a(u) ∈ Bi. Thus, we may treat A as k permutations, one for each Bi. Thus, A is isomorphic to $\times_{i=1}^{k} \Pi(B_i)$ where Π(Bi) is the set of all permutations over Bi. Thus, $|\times_{i=1}^{k} \Pi(B_i)| = (m!)^k$. Thus, (1) can be computed as
$$\frac{1}{(m!)^k} \sum_{a \in \times_{i=1}^{k} \Pi(B_i)} \mathbb{I}[s(\vec{y}) \le s(a(\vec{y}))] \qquad (2)$$

Sampling. Computing (2) can be difficult when the set of considered arrangements is large. One solution is to randomly sample from the assignments A. Let A' be a random subset of A. We then use the approximation
$$\frac{1}{|A'|} \sum_{a \in A'} \mathbb{I}[s(\vec{y}) \le s(a(\vec{y}))] \qquad (3)$$

Confidence Intervals. Let P̂ be this approximation and p be the true value of (2). p can be understood as the frequency of arrangements that yield large values of the test statistic, where largeness is determined to be at least as large as the observed value s(~y). That is, the probability that a randomly selected arrangement will yield a large value is p. P̂ is the frequency of seeing large values in the |A'| sampled arrangements. Since the arrangements in the sample were drawn uniformly at random from A and each draw has probability p of being large, the number of large values will obey the binomial distribution. Let us denote this value as L and |A'| as n. Since P̂ = L/n, P̂·n also obeys the binomial distribution. Thus,
$$\Pr[\hat{P} = \hat{p} \mid n, p] = \binom{n}{\hat{p}\,n}\, p^{\hat{p}\,n}\, (1-p)^{(1-\hat{p})\,n} \qquad (4)$$
Thus, we may use a binomial proportion confidence interval. We use the Clopper-Pearson interval [36].
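The following is a minimal sketch of the sampled, block-respecting permutation test and the Clopper-Pearson interval described above. It is illustrative rather than AdFisher's actual code; the test statistic, block size, number of sampled assignments, and response values are assumptions.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def block_permutation_pvalue(y, s, block_size, n_samples=10_000):
    """Approximate (3): sample assignments that permute responses only
    within blocks and count how often s(a(y)) >= s(y)."""
    y = np.asarray(y)
    n = len(y)
    assert n % block_size == 0, "units must divide evenly into blocks"
    observed = s(y)
    large = 0
    for _ in range(n_samples):
        permuted = y.copy()
        for start in range(0, n, block_size):
            rng.shuffle(permuted[start:start + block_size])  # permute within the block only
        if s(permuted) >= observed:
            large += 1
    return large, n_samples

def clopper_pearson(large, n, alpha=0.05):
    """Exact binomial (Clopper-Pearson) confidence interval for the
    true p-value estimated by large / n."""
    lower = 0.0 if large == 0 else beta.ppf(alpha / 2, large, n - large + 1)
    upper = 1.0 if large == n else beta.ppf(1 - alpha / 2, large + 1, n - large)
    return lower, upper

# Hypothetical example: 20 units in blocks of 2, where the first unit of each
# block received the experimental treatment and the second the control.
responses = np.array([5, 1, 6, 2, 4, 1, 7, 3, 5, 2, 6, 1, 4, 2, 5, 1, 6, 3, 7, 2])
stat = lambda v: v[0::2].sum() - v[1::2].sum()   # experimental minus control totals

large, n = block_permutation_pvalue(responses, stat, block_size=2)
print("estimated p-value:", large / n, "95% CI:", clopper_pearson(large, n))
```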

Property Treatment Other Actions Source When Length (hrs) # ads Result
Nondiscrimination
Gender - TOI May 10 40,400 Inconclusive
Gender Jobs TOI May 45 43,393 Violation
Gender Jobs TOI July 39 35,032 Inconclusive
Gender Jobs Guardian July 53 22,596 Inconclusive
Gender Jobs & Top 10 TOI July 58 28,738 Inconclusive
Data use transparency
Substance abuse - TOI May 37 42,624 Violation
Substance abuse - TOI July 41 34,408 Violation
Substance abuse - Guardian July 51 19,848 Violation
Substance abuse Top 10 TOI July 54 32,541 Violation
Disability - TOI May 44 43,136 Violation
Mental disorder - TOI May 35 44,560 Inconclusive
Infertility - TOI May 42 44,982 Inconclusive
Adult websites - TOI May 57 35,430 Inconclusive
Effectful choice
Opting out - TOI May 9 18,085 Compliance
Dating interest - TOI May 12 35,737 Compliance
Dating interest - TOI July 17 22,913 Inconclusive
Weight loss interest - TOI May 15 31,275 Compliance
Weight loss interest - TOI July 15 27,238 Inconclusive
Ad choice
Dating interest - TOI July 1 1,946 Compliance
Weight loss interest - TOI July 1 2,862 Inconclusive
Weight loss interest - TOI July 1 3,281 Inconclusive

Table 2. Summary of our experimental results. Ads are collected from the Times of India (TOI) or the Guardian. We report how long each experiment took, how many ads were collected for it, and what result we concluded.

Treatment Other visits Measurement Blocks # ads (# unique ads): female, male Accuracy Unadj. p-value Adj. p-value
Gender Jobs TOI, May 100 21,766 (545) 21,627 (533) 93% 0.0000053 0.0000265*
Gender Jobs Guardian, July 100 11,366 (410) 11,230 (408) 57% 0.12 0.48
Gender Jobs & Top 10 TOI, July 100 14,507 (461) 14,231 (518) 56% 0.14 n/a
Gender Jobs TOI, July 100 17,019 (673) 18,013 (690) 55% 0.20 n/a
Gender - TOI, May 100 20,137 (603) 20,263 (630) 48% 0.77 n/a

Table 3. Results from the discrimination experiments sorted by unadjusted p-value. TOI stands for Times of India. * denotes statistically significant results under the Holm-Bonferroni method.

Title URL Coefficient Appears in # agents: female, male Total appearances: female, male
Top ads for identifying the simulated female group
Jobs (Hiring Now) www.jobsinyourarea.co 0.34 6 3 45 8
4Runner Parts Service www.westernpatoyotaservice.com 0.281 6 2 36 5
Criminal Justice Program www3.mc3.edu/Criminal+Justice 0.247 5 1 29 1
Goodwill - Hiring goodwill.careerboutique.com 0.22 45 15 121 39
UMUC Cyber Training www.umuc.edu/cybersecuritytraining 0.199 19 17 38 30
Top ads for identifying agents in the simulated male group
$200k+ Jobs - Execs Only careerchange.com −0.704 60 402 311 1816
Find Next $200k+ Job careerchange.com −0.262 2 11 7 36
Become a Youth Counselor www.youthcounseling.degreeleap.com −0.253 0 45 0 310
CDL-A OTR Trucking Jobs www.tadrivers.com/OTRJobs −0.149 0 1 0 8
Free Resume Templates resume-templates.resume-now.com −0.149 3 1 8 10

Table 4. Top URL+titles for the gender and jobs experiment on the Times of India in May.

Treatment Other visits Measurement # ads (# unique ads): experimental, control Accuracy Unadj. p-value Adj. p-value
Substance abuse - TOI, May 20,420 (427) 22,204 (530) 81% 0.0000053 0.0000424*
Substance abuse - TOI, July 16,206 (653) 18,202 (814) 98% 0.0000053 0.0000371*
Substance abuse Top 10 TOI, July 15,713 (603) 16,828 (679) 65% 0.0000053 0.0000318*
Disability - TOI, May 19,787 (546) 23,349 (684) 75% 0.0000053 0.0000265*
Substance abuse - Guardian, July 8,359 (242) 11,489 (319) 62% 0.0075 0.03*
Mental disorder - TOI, May 22,303 (407) 22,257 (465) 59% 0.053 0.159
Infertility - TOI, May 22,438 (605) 22,544 (625) 57% 0.11 n/a
Adult websites - TOI, May 17,670 (602) 17,760 (580) 52% 0.42 n/a

Table 5. Results from transparency experiments. TOI stands for Times of India. Every experiment for this property ran with 100 blocks. ∗ denotes statistically significant results under the Holm-Bonferroni method.

Title URL Coefficient Appears in # agents: control, experimental Total appearances: control, experimental
Top ads for identifying agents in the experimental group (visited websites associated with substance abuse)
The Watershed Rehab www.thewatershed.com/Help −0.888 0 280 0 2276
Watershed Rehab www.thewatershed.com/Rehab −0.670 0 51 0 362
The Watershed Rehab Ads by Google −0.463 0 258 0 771
Veteran Home Loans www.vamortgagecenter.com −0.414 13 15 22 33
CAD Paper Rolls paper-roll.net/Cad-Paper −0.405 0 4 0 21
Top ads for identifying agents in the control group
Alluria Alert www.bestbeautybrand.com 0.489 2 0 9 0
Best Dividend Stocks dividends.wyattresearch.com 0.431 20 10 54 24
10 Stocks to Hold Forever www.streetauthority.com 0.428 51 44 118 76
Delivery Drivers Wanted get.lyft.com/drive 0.362 22 6 54 14
VA Home Loans Start Here www.vamortgagecenter.com 0.354 23 6 41 9

Table 6. Top URL+titles for the substance abuse experiment on the Times of India in May.

Title URL Coefficient Appears in # agents: control, experimental Total appearances: control, experimental
Top ads for identifying agents in the experimental group (visited websites associated with disability)
Mobility Lifter www.abilitiesexpo.com −1.543 0 84 0 568
Standing Wheelchairs www.abilitiesexpo.com −1.425 0 88 0 508
Smoking MN Healthcare www.stillaproblem.com −1.415 0 24 0 60
Bike Prices www.bikesdirect.com −1.299 0 24 0 79
$19 Car Insurance - New auto-insurance.quotelab.com/MN −1.276 0 6 0 9
Top ads for identifying agents in the control group
Beautiful Women in Kiev anastasiadate.com 1.304 190 46 533 116
Melucci DDS Ads by Google 1.255 4 2 10 6
17.2% 2013 Annuity Return advisorworld.com/CompareAnnuities 1.189 30 5 46 6
3 Exercises To Never Do homeworkoutrevolution.net 1.16 1 1 3 1
Find CNA Schools Near You cna-degrees.courseadvisor.com 1.05 22 0 49 0

Table 7. Top URL+titles for disability experiment on the Times of India in May.

Experiment Blocks # ads (# unique ads): removed/opt-out, kept/opt-in, total Accuracy Unadj. p-value Adj. p-value
Opting out 54 9,029 (139) 9,056 (293) 18,085 (366) 83% 0.0000053 0.0000265*
Dating (May) 100 17,975 (518) 17,762 (457) 35,737 (669) 74% 0.0000053 0.0000212*
Weight Loss (May) 83 15,826 (367) 15,449 (427) 31,275 (548) 60% 0.041 0.123
Dating (July) 90 11,657 (727) 11,256 (706) 22,913 (1,014) 59% 0.070 n/a
Weight Loss (July) 100 14,168 (917) 13,070 (919) 27,238 (1,323) 52% 0.41 n/a

Table 8. Results from effectful choice experiments using the Times of India sorted by unadjusted p-value. * denotes statistically significant results under the Holm-Bonferroni method.

Title URL Coefficient Appears in # agents: kept, removed Total appearances: kept, removed
Top ads for identifying the group that kept dating interests
Are You Single? www.zoosk.com/Dating 1.583 367 33 2433 78
Top 5 Online Dating Sites www.consumer-rankings.com/Dating 1.109 116 10 408 13
Why can't I find a date? www.gk2gk.com 0.935 18 3 51 5
Latest Breaking News www.onlineinsider.com 0.624 2 1 6 1
Gorgeous Russian Ladies anastasiadate.com 0.620 11 0 21 0
Top ads for identifying agents in the group that removed dating interests
Car Loans w/ Bad Credit www.car.com/Bad-Credit-Car-Loan −1.113 5 13 8 37
Individual Health Plans www.individualhealthquotes.com −0.831 7 9 21 46
Crazy New Obama Tax www.endofamerica.com −0.722 19 31 22 51
Atrial Fibrillation Guide www.johnshopkinshealthalerts.com −0.641 0 6 0 25
Free $5 - $25 Gift Cards swagbucks.com −0.614 4 11 5 32

Table 9. Top URL+titles for the dating experiment on the Times of India in May.

Experiment Keywords # ads (# unique ads): removed, kept Keyword appearances: removed, kept
Dating dating, romance, relationship 952 (117) 994 (123) 34 109
Weight Loss (1) fitness 1,461 (259) 1,401 (240) 21 16
Weight Loss (2) fitness, health, fat, diet, exercise 1,803 (199) 1,478 (192) 2 15

Table 10. Setup for and ads from the ad choice experiments. All experiments used 10 blocks. The same keywords are used to remove ad interests as well as to create the test statistic for the permutation test.

Experiment Unadjusted p-value Bonferroni p-value Holm-Bonferroni p-value Unadjusted flipped p-value Bonferroni flipped p-value Holm-Bonferroni flipped p-value

Dating 0.0076 0.0152 0.0456* 0.9970 1.994 n/a
Weight Loss (2) 0.18 0.36 0.9 0.9371 1.8742 n/a
Weight Loss (1) 0.72 1.44 n/a 0.3818 0.7636 n/a

Table 11. P-values from ad choice experiments sorted by the (unflipped) p-value. The Bonferroni adjusted p-value is only adjusted for the two hypotheses tested within a single experiment (row). The Holm-Bonferroni adjusts for all 6 hypotheses. ∗ denotes statistically significant results under the Holm-Bonferroni method.

Test Statistic. The statistic we use is based on a classifier c. Let c(y_i) = 1 mean that c classifies the ith observation as having come from the experimental group and c(y_i) = 0 as from the control group. Let ¬(0) = 1 and ¬(1) = 0. Let ~y be ordered so that all of the experimental group comes first. The statistic we use is
$$s(\vec{y}) = \sum_{i=1}^{n/2} c(y_i) + \sum_{i=n/2+1}^{n} \neg c(y_i)$$
This is the number of observations correctly classified.
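A small sketch of this statistic follows, assuming a fitted scikit-learn-style classifier exposing a `predict` method and the ordering convention above (experimental observations first); the feature matrix and model names are hypothetical.

```python
import numpy as np

def classifier_statistic(predict, X):
    """s(y): number of observations assigned to the correct group, assuming
    the first half of X is experimental, the second half is control, and
    predict() returns 1 for 'experimental' and 0 for 'control'."""
    labels = np.asarray(predict(X))
    half = len(labels) // 2
    # Experimental half counted when predicted 1; control half when predicted 0.
    return int(labels[:half].sum() + (1 - labels[half:]).sum())

# Hypothetical usage with any classifier trained on held-out blocks:
#   s_observed = classifier_statistic(trained_model.predict, features_for_test_blocks)
```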

C Holm-Bonferroni Correction

The Holm-Bonferroni Correction starts by ordering the hypotheses in a family from the hypothesis with the smallest (most significant) p-value p_1 to the hypothesis with the largest (least significant) p-value p_m [27]. For a hypothesis H_k, its unadjusted p-value p_k is compared to an adjusted level of significance
$$\alpha'_k = \frac{\alpha}{m + 1 - k}$$
where α is the unadjusted level of significance (0.05 in our case), m is the total number of hypotheses in the family, and k is the index of the hypothesis in the ordered list (counting from 1 to m). Let k† be the lowest index k such that p_k > α'_k. The hypotheses H_k where k < k† are accepted as having statistically significant evidence in favor of them (more technically, the corresponding null hypotheses are rejected). The hypotheses H_k where k ≥ k† are not accepted as having significant evidence in favor of them (their null hypotheses are not rejected).

We report adjusted p-values to give an intuition about the strength of evidence for a hypothesis. We let p'_k = p_k(m + 1 − k) be the adjusted p-value for H_k provided k < k†, since p_k > α'_k iff p'_k > α. Note that the adjusted p-value depends not just upon its unadjusted value but also upon its position in the list. For the remaining hypotheses, we provide no adjusted p-value since their p-values are irrelevant to the correction beyond how they order the list of hypotheses.
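The procedure is easy to reproduce. Below is a minimal sketch of the stepwise correction as described above, using the six unadjusted ad-choice p-values from Table 11 as an example input with α = 0.05; it is illustrative only and not the analysis code used for the paper.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return (adjusted p-values, significance flags) in the input order,
    following the stepwise Holm-Bonferroni procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [None] * m
    significant = [False] * m
    rejecting = True
    for rank, idx in enumerate(order, start=1):          # rank is k in the text
        p_k = pvalues[idx]
        adjusted[idx] = p_k * (m + 1 - rank)              # p'_k = p_k (m + 1 - k)
        if rejecting and p_k <= alpha / (m + 1 - rank):
            significant[idx] = True
        else:
            rejecting = False                             # k-dagger reached; stop rejecting
    return adjusted, significant

# Example: the six unadjusted ad-choice p-values reported in Table 11.
pvals = [0.0076, 0.18, 0.72, 0.9970, 0.9371, 0.3818]
adj, sig = holm_bonferroni(pvals)
for p, a, s in zip(pvals, adj, sig):
    print(f"p={p:<7} adjusted={a:.4f} significant={s}")
```

Running this reproduces the adjusted value 0.0456 for the dating experiment, which is the only hypothesis in the family found significant at the 0.05 level.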