Addendum for Advertising-based Measurement: A Platform of 7 Billion Mobile Devices

Mark D. Corner, Brian N. Levine, Omar Ismail, and Angela Upreti College of Information and Computer Sciences University of Massachusetts Amherst, MA, USA mcorner,levine,oismail,[email protected] July 24, 2017

APPENDIX Given the controversial nature of studying censorship, we had to remove the following information from our submission to MobiCom [4]. We received negative feed- back from the MobiSys program committee on this issue and felt that it was not worth the risk in having it re- Figure 1: The 88×88 image we retrieved from , blocked in China and . jected at MobiCom. We disagree with the conclusion reached by the MobiSys TPC and in particular we do not believe that the non-interrogative nature of TPC is the best place to evaluate human subjects concerns. ernment may block or tamper with images or data re- You are welcome to cite this work separately from the quested by the browser. In addition, page access is ruled MobiCom paper. by the same-origin policy, which is a rule applied by all Before conducting the research described here, we browsers disallowing a script coming from one domain sought and obtained the consent of our own Institu- from requesting data from other domains. tional Review Board (IRB) under protocol numbers 2016- One of the rare exceptions to same-origin is that HTML 3112 and 2016-3141. Before starting the censorship image tags for arbitrary domains can be placed in a doc- study we sought additional IRB approval, as similar ument, though the script cannot access the contents of measurements taken from a website have been consid- those images. By placing an image tag pointing to an- ered an area of ethical concern by some in our com- other domain and receiving javascript callbacks on im- munity [5]. In our study of censorship, users unwit- age loads, AaaP can measure how long an image takes tingly retrieve blocked images. Burnett et al[3] used a to load, if the image was fetched correctly, and get the similar technique of background image loading to mea- original (i.e., natural) image width and height. Using sure censorship, but relied on using their own webpage. image tags, AaaP can measure if particular domains are Their project was submitted to the Georgia Tech and blocked in the device’s network environment, in partic- Princeton IRBs, which declined to review the project ular imposed by governments. because it was not collecting PII. Our IRB reviewed A similar technique for embedding images was de- our project and has also allowed the study, with restric- scribed in Encore [3] using web pages rather than ad- tions that include using what the Encore paper calls vertisements. The advantages in using AaaP to measure “ubiquitous, yet uninteresting URLs”[3]. These URLs censorship is that we have extremely fine-grained con- are often found embedded in webpages and constitute trol over how and when the measurement occurs. In non-controversial content (e.g., the logo), even if a web page, one has to entice users to visit the page they come from sites generally blocked by governments, and thus measurements may be poorly distributed and such as and Google; see Fig. 1. irregularly timed, whereas with advertisements specific geographic areas can be targeted to give fine grained A. ADVERSARIAL ENVIRONMENTS AND resolution. For example, we can find if specific areas of a country are censored, such as blocks often placed in CENSORSHIP certain regions in Turkey [1]. Challenge: Web retrievals made by an adver- To measure censorship AaaP includes javascript to tisement may be blocked or severely hampered add an image tag with a source taken from a set of by the network environment and browser. For image URLs we hypothesize to be censored or not cen- instance, a firewall operated by a company or a gov- sored in various countries. Based on our IRB proto-

1 . Proc. ACM . Snowbird, Utah, , 17(3–5), 2012. , 45(4):653–667, 2015. ACM SIGCOMM Computer , pages 17–19. ACM, 2015. First Monday DigiKala, a Persian shopping site ex- Proceedings of the 23rd ACM (iii) ://turkeyblocks.org/2016/10/27/new- internet-shutdown-turkey-southeast- offline-diyarbakir-unrest/ Smith. Censorship and deletionsocial practices media. in chinese Lightweight measurement of web censorshipcross-origin with requests. Communication Review and Angela Upreti. Advertising-based measurement: A platform ofdevices. 7 In billion mobile International Conference on Mobileand Computing Networking (MobiCom) USA, October 2017. Paxson, and Nick Weaver. Ethicalcensorship concerns measurement. for In SIGCOMM Workshop on EthicsSystems in Research Networked Most importantly, these results demonstrate a gener- Images from blockedblocked sites by all in ISPs. Chinaalways For example, allowed and Facebook by Iran was ISPs almost Hong are in Guangdong not Kong) (which and contains Network. the Chinahibits Education poor and performance, Research of except access in is Iran, notwith and due poor the to hosting lack censorship,effective, performance. and but inexpensive rather method Overall, antime for measurement issue AaaP of inexpensive blocked is real- andnetworks. an excessively hampered ally useful feature in AaaP:and the to ability remote to resources test on bothmore the from Internet. techniques that There are weadditional even can exploration, such use as in usingsources, CORS-enabled AaaP, JSON-P re- but to require testimages, content APIs, and rather WebRTC to than measure just P2P accesses. B. REFERENCES [1] [2] David Bamman, Brendan O’Connor, and Noah [3] Sam Burnett and Nick Feamster. Encore: [4] Mark D. Corner, Brian N. Levine, Omar Ismail, [5] Ben Jones, Roya Ensafi, Nick Feamster, Vern 2 (ii) CN IRN VNM SAU USA

GoogleHK GoogleVN ; the image GoogleSA Google result. The ex- null Youtube slowly null Facebook

failed DigiKala

Twitter and correctly (we verified

slow Blogfa fast Zing

fast Haraj Varzesh3 ICANN QQ Baidu 0% 0% 0% 0% 0% 50% 25% 25% 75% 50% 25% 75% 75% 50% 25% 75% 50% 75% 50% 25% to load; or the image failed to load by the time 100% 100% 100% 100% 100% The primary result is that we can effectively mea- We can conclude a number of things from the results: failed the advertisement closed, termedperiment a comprises more than 27king measurements, $10. cost- We furtherusing verified the open results web throughof proxies testing the in experiment, the includingfrom the countries. and sites the countries the The we images tested results came are shown in(i) Figure 2. sure censorship of Facebook,China and Google, the and censorship ofQQ YouTube and in Youtube Zing.vn—both in of Iran, these assaging companies well provide and as mes- chat applicationsfor which censorship. are In typicalin China, targets a images timely manner, typicallyate rather fail error. than to Interestingly, producing the load from an results immedi- show that is anof not image Twitter blocked, even is though known the to content be blocked in China [2]. the native height and width ofSSL the to image further and fetch verify using age); the client the received image the loaded correct correctly im- but col, we used onlyimages “uninteresting” and such non-controversial as logosFigure and 1). user-interface We then elementscurred: determine (e.g., the which image of loaded four results oc- Figure 2: Measurementsthan are 3 marked seconds.load slow by if the NULL time loading= measurements the requires $10.06; ad means Impressions more closed = the by 27,760; the image user Efficiency failed or = application. 100%. to Total cost