Measuring Smart TV Advertising and Tracking
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings on Privacy Enhancing Technologies ; 2020 (2):129–154 Janus Varmarken†*, Hieu Le†, Anastasia Shuba, Athina Markopoulou, and Zubair Shafq The TV is Smart and Full of Trackers: Measuring Smart TV Advertising and Tracking Abstract: In this paper, we present a large-scale mea- 1 Introduction surement study of the smart TV advertising and track- ing ecosystem. First, we illuminate the network behav- Smart TV adoption has steadily grown over the last few ior of smart TVs as used in the wild by analyzing net- years, with more than 37% of US households owning at work traÿc collected from residential gateways. We fnd least one smart TV in 2018, a 16% increase over the that smart TVs connect to well-known and platform- previous year [1]. This growth is driven by two trends. specifc advertising and tracking services (ATSes). Sec- First, over-the top (OTT) video streaming services such ond, we design and implement software tools that sys- as Hulu and Netfix have become popular, with more tematically explore and collect traÿc from the top-1000 than 28 million and 60 million subscribers in the US, re- apps on two popular smart TV platforms, Roku and spectively [2]. Second, smart TV solutions are available Amazon Fire TV. We discover that a subset of apps at relatively a˙ordable prices, with many of the external communicate with a large number of ATSes, and that smart TV boxes/sticks priced less than $50, while built- some ATS organizations only appear on certain plat- in smart TVs now cost only a few hundreds dollars [3]. forms, showing a possible segmentation of the smart As a result, a diverse set of smart TV platforms have TV ATS ecosystem across platforms. Third, we evaluate emerged, and have been integrated into various smart the (in)e˙ectiveness of DNS-based blocklists in prevent- TV products. For example, Apple TV integrates tvOS, ing smart TVs from accessing ATSes. We highlight that and TCL and Sharp TVs integrate Roku. even smart TV-specifc blocklists su˙er from missed ads Many of these platforms have their own respective and incur functionality breakage. Finally, we examine app store, where the vast majority of smart TV apps our Roku and Fire TV datasets for exposure of person- are ad-supported [4]. OTT advertising, which includes ally identifable information (PII) and fnd that hun- smart TV, is expected to increase by 40% to $2 billion in dreds of apps exfltrate PII to third parties and plat- 2018 [5]. Roku and Fire TV are two of the leading smart form domains. We also fnd evidence that some apps TV platforms in number of ad requests [6]. Despite their send the advertising ID alongside static PII values, ef- increasing popularity, the advertising and tracking ser- fectively eliminating the user’s ability to opt out of ad vices (“ATSes”) on smart TVs are currently not well personalization. understood by users, researchers, and regulators. In this Keywords: Smart TV; privacy; tracking; advertising; paper, we present one of the frst large-scale measure- blocklists ment studies of the emerging smart TV advertising and tracking ecosystem. DOI 10.2478/popets-2020-0021 Received 2019-08-31; revised 2019-12-15; accepted 2019-12-16. In the Wild Measurements (§3). First, we analyze the network traÿc of smart TV devices in the wild. We instrument residential gateways of 41 homes and collect fow-level summary logs of the network traÿc gener- *Corresponding Author: Janus Varmarken†: University ated by 57 smart TVs from seven di˙erent platforms. of California, Irvine, E-mail: [email protected] The comparative analysis of network traÿc by di˙er- † Hieu Le : University of California, Irvine, E-mail: ent smart TV platforms uncovers similarities and di˙er- [email protected]. (†The frst two authors made equal contribu- ences in their characteristics. As expected, we fnd that tions and share the frst authorship). Anastasia Shuba: Broadcom Inc. (The author was a student a substantial fraction of the traÿc is related to popular at the University of California, Irvine at the time the work was video streaming services such as Netfix and Hulu. More conducted), E-mail: [email protected] importantly, we fnd generic as well as platform-specifc Athina Markopoulou: University of California, Irvine, E- ATSes. Although realistic, the in the wild dataset does mail: [email protected] not provide app-level visibility, i.e., we cannot deter- Zubair Shafq: University of Iowa, E-mail: zubair- [email protected] mine which apps generate traÿc to ATSes. To address this limitation, we undertake the following major e˙ort. The TV is Smart and Full of Trackers 130 Controlled Testbed Measurements (§4). We de- by multiple apps (“app prevalence”) and keyword search sign and implement two software tools, Rokustic for (based on ATS related words like “ads” and “measure”). Roku and Firetastic for Amazon Fire TV, which sys- PII Exposures (§6). We further examine the net- tematically explore apps and collect their network traf- work traces of our testbed datasets and fnd that hun- fc. We use Rokustic and Firetastic to exercise the top- dreds of apps exfltrate personally identifable informa- 1000 apps of their respective platforms, and refer to the tion (PII) to third parties and platform-specifc parties, collected network traÿc as our testbed datasets. We an- mostly for non-functional advertising and tracking pur- alyze the testbed datasets w.r.t. the top Internet des- poses. Alarmingly, we fnd that many apps send the ad- tinations the apps contact, at the granularity of fully vertising ID alongside static PII values such as the de- qualifed domain names (FQDNs), e˙ective second level vice’s serial number. This eliminates the user’s ability domains (eSLDs), and organizations. We use “domain”, to opt out of personalized advertisements by resetting “endpoint”, and “destination” interchangeably in place the advertising ID, since the ATS can simply link an old of FQDN and eSLD when the distinction is clear from advertising ID to its new value by joining on the serial the context. We further separate destinations as frst, number. We evaluate the blocklists’ ability to prevent third, and platform-specifc party, w.r.t. to the app that exposures of PII and fnd that they generally perform contacts them. well for Roku, but struggle to prevent exfltration of the First, we fnd that the majority of apps contact few device’s serial number and the device ID to third parties ATSes, while about 10% of the apps contact a large and the platform-specifc party for Fire TV. number of ATSes. Interestingly, many of these more Contributions. In this paper, we analyze the network concerning apps come from a small set of developers. behavior of smart TVs, both in the wild and in the lab. Second, we fnd what appears to be a segmentation of Our contributions include the following: (1) providing the smart TV ATS ecosystem across Roku and Fire TV an in-depth comparative analysis of the ATS ecosystems as (1) the two datasets have little overlap in terms of of Roku, Fire TV, and Android; (2) illuminating the ATS domains; (2) some third party ATSes are among key players within the Roku and Fire TV ATS ecosys- the key players on one platform, but completely absent tems by mapping domains to eSLDs and parent orga- on the other; and (3) apps that are present on both plat- nizations; (3) evaluating the e˙ectiveness and adverse forms have little overlap in terms of the domains they e˙ects of an extensive set of blocklists, including smart contact. Third, we compare the top third party ATS TV specifc blocklists; (4) instrumenting long experi- domains of the testbed datasets to those of Android. ments per app to uncover approximately twice as many Evaluation of DNS-Based Blocklists (§5). Users domains as [11]; and (5) making our tools, Rokustic and typically rely on DNS-based blocking solutions such as Firetastic, and our testbed datasets available [12]. Pi-hole [7] to prevent in-home devices such as smart Outline. The structure of the rest of the paper is as fol- TVs from accessing ATSes. Thus, we evaluate the e˙ec- lows. Section 2 provides background on smart TVs and tiveness of DNS-based blocklists, selecting those that reviews related work. Section 3 presents the in the wild are most relevant to smart TVs. Specifcally, we ex- measurement and analysis of 57 smart TVs from seven amine and test four popular blocklists: (1) Pi-hole De- di˙erent platforms in 41 homes. Section 4 presents our fault blocklist (PD) [7], (2) Firebog’s recommended ad- systematic testing approach and comparative analysis vertising and tracking lists (TF) [8], (3) Mother of all of the top-1000 Roku and Fire TV apps. Section 5 eval- Ad-Blocking (MoaAB) [9], and (4) StopAd’s smart TV uates four well-known DNS-based blocklists and shows specifc blocklist (SATV) [10]. Our comparative analysis their limitations through analysis of missed ATSes and shows that block rates vary, with Firebog having the app breakage. Section 6 investigates exfltration of PII. highest coverage across di˙erent platforms and StopAd Section 7 concludes the paper, discusses limitations, and blocking the least. We further investigate potential false outlines directions for future work. The appendix pro- negatives (FN) and false positives (FP). We discover vides additional details and results, as well as an evalu- that blocklists miss di˙erent ATSes (FN), some of which ation and discussion of the limitations of our approach. are missed by all blocklists, while more aggressive block- lists can su˙er from false positives that result in break- ing app functionality.