
A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists

Quirin Scheitle (Technical University of Munich)
Oliver Hohlfeld (RWTH Aachen University)
Julien Gamba (IMDEA Networks Institute/Universidad Carlos III de Madrid)
Jonas Jelten (Technical University of Munich)
Torsten Zimmermann (RWTH Aachen University)
Stephen D. Strowes (RIPE NCC)
Narseo Vallina-Rodriguez (IMDEA Networks Institute/ICSI)

arXiv:1805.11506v2 [cs.NI] 23 Sep 2018

ABSTRACT
A broad range of research areas including Internet measurement, privacy, and network security rely on lists of target domains to be analysed; researchers make use of target lists for reasons of necessity or efficiency. The popular Alexa list of one million domains is a widely used example. Despite their prevalence in research papers, the soundness of top lists has seldom been questioned by the community: little is known about the lists’ creation, representativity, potential biases, stability, or overlap between lists.

In this study we survey the extent, nature, and evolution of top lists used by research communities. We assess the structure and stability of these lists, and show that rank manipulation is possible for some lists. We also reproduce the results of several scientific studies to assess the impact of using a top list at all, which list specifically, and the date of list creation. We find that (i) top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and (ii) some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability. We conclude our paper with specific recommendations on the use of top lists, and how to interpret results based on top lists with caution.

CCS CONCEPTS
• Networks → Network measurement;

ACM Reference Format:
Quirin Scheitle, Oliver Hohlfeld, Julien Gamba, Jonas Jelten, Torsten Zimmermann, Stephen D. Strowes, and Narseo Vallina-Rodriguez. 2018. A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists. In 2018 Internet Measurement Conference (IMC ’18), October 31-November 2, 2018, Boston, MA, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3278532.3278574

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IMC ’18, October 31-November 2, 2018, Boston, MA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-5619-0/18/10...$15.00
https://doi.org/10.1145/3278532.3278574

1 INTRODUCTION
Scientific studies frequently make use of a sample of DNS domain names for various purposes, be it to conduct lexical analysis, to measure properties of domains, or to test whether a new algorithm works on real domains. Internet top lists, such as the Alexa or Cisco Umbrella Top 1M lists, serve the purpose of providing a reputedly representative sample of Internet domains in popular use. These top lists can be created with different methods and data sources, resulting in different sets of domains.

The prevalence and opacity of these lists could have introduced an unchecked bias in science: for 10 networking venues in 2017 alone, we count 69 publications that use a top list. This potential bias is based on the fact that curators of such top lists commonly conceal the data sources and ranking mechanism behind those lists, which are typically seen as a proprietary business asset in the search engine optimisation (SEO) space [4]. This leaves researchers using those lists with little to no information about content, stability, biases, evolution and representativity of their contents.

In this work, we analyse three popular top lists (Alexa Global [1], Cisco Umbrella [2], and Majestic Million [3]) and discuss the following characteristics:

Significance: In a survey of 687 networking-related papers published in 2017, we investigate if, to what extent, and for what purpose, these papers make use of Internet top lists. We find that 69 papers (10.0%) make use of at least one top list (cf., §3).

Structure: Domain properties in different top lists, such as the surprising amount of invalid top-level domains (TLDs), low intersections between various lists (<30%), and classifications of disjunct domains, are investigated in §5.

Stability: We conduct in-depth longitudinal analyses of top list stability in §6, revealing daily churn of up to 50% of domains.

Ranking Mechanisms: Through controlled experiments and reverse engineering of the Alexa toolbar, we shed light on the ranking mechanisms of different top lists. In one experiment, we place an unused test domain at a 22k rank in Umbrella (cf., §7).

Research Result Impact: Scientific studies that use top lists for Internet research measure characteristics of the targets contained in each list, or the related infrastructure. To show the bias inherent in any given target list, we run several experiments against top lists and the general population of all com/net/org domains. We show that top lists significantly exaggerate results, and that results even depend on the day of week a list was obtained (cf., §8).

We discuss related work and specific recommendations in §9.

Throughout our work, we aim to adhere to the highest ethical standards, and aim for our work to be fully reproducible. We share code, data, and additional insights under https://toplists.github.io

2 DOMAIN TOP LISTS
This section provides an overview of various domain top lists and their creation process. Each of these lists is updated daily.

Alexa: The most popular and widely used top list is the Alexa Global Top 1M list [1]. It is generated based on web activity monitored by the Alexa browser plugin¹, “directly measured sources” [6] and “over 25,000 different browser extensions” [7] over the past three months [8] from “millions of people” [6]. No information exists on the plugin’s user base, which opens questions on potential biases in terms of, e.g., geography or age of its user base. Alexa lists are generally offered for sale, with few free offerings. Paid offerings include top lists per country, […], or region. The Global Top 1M list is the most popular free offering, available with no explicit license, and was briefly suspended in late 2016.

¹ Available for Firefox and Chrome. Internet Explorer discontinued June 2016 [5].

Cisco Umbrella: Another popular top list is the Cisco Umbrella 1M, a top list launched in mid-December 2016. This list contains the Top 1M domains (including subdomains) as seen by Cisco’s OpenDNS service [2]. This DNS-based nature is fundamentally different from collecting visits or links. Hence, the Umbrella list contains Fully Qualified Domain Names (FQDN) for any kind of Internet service, not just web sites as in the case of Alexa or Majestic. Without explicit license, it is provided “free of charge”.

Majestic: The third top list is the Majestic Million [3], released in October 2012. This creative commons licensed list is based on Majestic’s web crawler. It ranks sites by the number of /24 IPv4-subnets linking to that site [9]. This is yet another data collection methodology and, similar to Alexa, heavily web-focused. While the Majestic list is currently not widely used in research, we still include it in our study for its orthogonal mechanism, its explicitly open license, and its availability for several years.

Other Top Lists: There are few other top lists available, but as those are little used, not consistently available, or fluctuate in size, we will not investigate them in detail in this paper. Quantcast [10] provides a list of the Top 1M most frequently visited websites per country, measured through their web intelligence plugin on sites. Only the US-based list can be downloaded; all other lists can only be viewed online and show ranks only when paid. The Statvoo [11] list provides an API and a download for their Top 1M sites, but has frequently been inaccessible in the months before this publication. Statvoo does not offer insights about the metrics they use in their creation process. The Chrome UX report [12] publishes telemetry data about domains popular with Chrome users. It does not, however, rank domains or provide a static-sized set of domains. We also exclude the SimilarWeb Top Sites ranking [13] as it is not available for free and little used in science.

3 SIGNIFICANCE OF TOP LISTS
Scientific literature often harnesses one or more of the top lists outlined in §2. To better understand how often and to what purpose top lists are used by the literature, we survey 687 recent publications.

3.1 Methodology
We survey papers published at 10 network-related venues in 2017, listed in Table 1. First, we search the 687 papers published at these venues for keywords² in an automated manner. Next, we inspect matching papers manually to remove false positives (e.g., Amazon’s Alexa home assistant, or an author named Alexander), or papers that mention the lists without actually using them as part of a study. Finally, we review the remaining 69 papers (10.0%) that made use of a top list, with various aims in mind: to understand the top lists used (§3.2), the nature of the study and the technologies measured (§3.3), whether the study was dependent on the list for its results (§3.4), and whether the study was possibly replicable (§3.5). Table 1 provides an overview of the results.

² Those being: “alexa”, “umbrella”, and “majestic”.

We find the field of Internet measurement to be most reliant on top lists, used in 22.2% of the surveyed papers. Other fields also use top lists frequently, such as security (8.5%), systems (6.4%) and web technology (7.9%).

3.2 Top Lists Used
We first investigate which lists and what subsets of lists are typically used; Table 1 provides an overview of lists used in the studies we identified. We find 29 studies using the full Alexa Global Top 1M, the most common choice among inspected publications, followed by a surprising variety of Alexa Top 1M subsets (e.g., Top 10k).

All papers except one [69] use a list collated by Alexa. This paper instead uses the Umbrella Top 100 list to assess importance of ASes showing BGP bursts. No paper in our study used the Majestic list.

A study may also use multiple distinct subsets of a list. For example, one study uses the Alexa Global Top 1k, 10k, 500k and Top 1M at different stages of the study [61]. We count these as distinct use-cases in the right section of Table 1.

We also find that 59 studies exclusively use Alexa as a source for domain names. Ten papers use lists from more than one origin; one paper uses the Alexa Global Top 1M, the Umbrella Top 1M, and various DNS zone files as sources [21]. In total, two studies make use of the Cisco Umbrella Top 1M [21, 67].

Category and country-specific lists are also being used: eight studies use country-specific lists from Alexa, usually choosing only one country; one study selected 138 countries [26]. Category-based lists are rarer still: two studies made use of category subsets [17, 71].

3.3 Characterisation of Studies
To show that top lists are used for various types of studies, we look at the range of topics covered and technologies measured in our surveyed papers. For each paper we assigned a broad purpose, and the network layer in focus.

Purposes: For all papers, we reviewed the broad area of study. The largest category we identified encompasses various aspects

Table 1: Left: Use of top lists at 2017 venues. The ‘dependent’ column indicates whether we deemed the results of the study to rely on the list used (‘Y’), or that the study relies on a list for verification (‘V’) of other results, or that a list is used but the outcome does not rely on the specific list selected (‘N’). The ‘date’ column indicates how many papers stated the date of list download or measurement. Right: Type of lists used in 69 papers from left. Multiple counts for papers using multiple lists.

Left (rows sorted by share of papers using a list within each area):

Venue            Area          Papers  Using list       Dependent    Date stated    References
                                       #      %         Y   V   N    List  Study
ACM IMC          Measurements  42      11     26.2%     8   2   1    1     3        [14–24]
PAM              Measurements  20      4      20.0%     3   1   0    0     0        [25–28]
TMA              Measurements  19      3      15.8%     1   1   1    0     0        [29–31]
USENIX Security  Security      85      12     14.1%     8   4   0    2     0        [32–43]
IEEE S&P         Security      60      5      8.3%      3   2   0    1     1        [44–49]
ACM CCS          Security      151     11     7.3%      4   5   2    1     1        [50–60]
NDSS             Security      68      3      4.4%      2   0   1    0     0        [61–63]
ACM CoNEXT       Systems       40      4      10.0%     2   1   1    0     1        [64–68]
ACM SIGCOMM      Systems       38      3      7.9%      3   0   0    0     0        [69–71]
WWW              Web Tech.     164     13     7.9%      11  1   1    2     3        [72–84]
Total                          687     69     10.0%     45  17  7    7     9

Right (number of uses per list type):

Alexa Global Top ...
  1M    29      5k    2
  100k  2       1k    5
  75k   1       500   8
  50k   2       400   1
  25k   2       300   1
  20k   1       200   1
  16k   1       100   8
  10k   11      50    3
  8k    1       10    1
Alexa Country:   2
Alexa Category:  2
Umbrella 1M:     3
Umbrella 1k:     1

of security, across 38 papers in total: this includes attacks [81, 82], session safety during redirections [83], and domain squatting [58], to name a few. Nine more papers study aspects of privacy & censorship, such as the Tor overlay network [61], or user tracking [35]. Network or application performance is also a popular area: ten papers in our survey focus on this, e.g., HTTP/2 push [72], mobile web performance [71], and Internet latency [26]. Other studies look at economic aspects such as hosting providers.

Layers: We also reviewed the network layers measured in each study. Many of the papers we surveyed focus on web infrastructure: 22 of the papers are concerned with content, 8 focus on the HTTP(S) protocols, and 7 focus on applications (e.g., browsers [39, 40]). Studies relating to core network protocols are commonplace: DNS [32, 36, 51, 52, 61] (we identified 3 studies relating to domain names as separate from DNS protocol measurements [24, 58, 63]), TCP [19, 31], IP [14, 15, 18, 30, 64, 69], and TLS/HTTPS [21, 37, 38, 50, 57, 76, 83] layer measurements are common in our survey. Finally, we identify 12 studies whose experimental design measures more than one specific layer; e.g., cases studying a full connection establishment (from initial DNS query to HTTP request).

We conclude from this that top lists are frequently used to explicitly or implicitly measure DNS, IP, and TLS/HTTPS characteristics, which we investigate in depth in §8.

3.4 Are Results Dependent on Top Lists?
In this section, we discuss how dependent study results are on top lists. For this, we fill the “dependent” columns in Table 1 as follows:

Dependent (Y): Across all papers surveyed, we identify 45 studies whose results may be affected by the list chosen. Such a study would take a list of a certain day, measure some characteristic over the set of domains in that list, and draw conclusions about the measured characteristic. In these cases, we say that the results depend on the list being used: a different set of domains in the list may have yielded different results.

Verification (V): We identify 17 studies that use a list only to verify their results. A typical example may be to develop some algorithm to find domains with a certain property, and then use a top list to check whether these domains are popular. In such cases, the algorithm developed is independent of the list’s content.

Independent (N): Seven studies cite and use a list, but we determine that their results are not necessarily reliant on the list. These papers typically use a top list as one source among many, such that changes in the top list would likely not affect the overall results.

3.5 Are Studies Replicable?
Repeatability, replicability, and reproducibility are ongoing concerns in Computer Networks [85, 86] and Internet Measurement [87]. While specifying the date of when a top list was downloaded, and the date when measurements were conducted, are not necessarily sufficient to reproduce studies, they are important first steps.

Table 1 lists two “date” columns that indicate whether the list download date or the measurement dates were given³. Across all 69 papers using top lists, only 7 stated the date the list was retrieved, and 9 stated the measurement date. Unfortunately, only 2 papers give both the list and measurement dates and hence fulfil these basic criteria for reproducibility. This does not necessarily mean that the other papers are not reproducible: authors may publish the specific top list used as part of data, or authors might be able to provide the dates or specific list copies upon inquiry. However, recent investigations of reproducibility in networking hint that this may be an unlikely expectation [87, 88]. We find two papers that explicitly discuss instability and bias of top lists, and use aggregation or enrichment to stabilise results [45, 67].

³ We require a specific day to be given to count a paper; the few papers just citing a year or month were counted as no date given.

3.6 Summary
Though our survey has a certain level of subjectivity, we consider its broad findings meaningful: (i) that top lists are frequently used, (ii) that many papers’ results depend on list content, and (iii) that few papers indicate precise list download or measurement dates.

We also find that the use of top lists to measure network and security characteristics (DNS, IP, HTTPS/TLS) is common. We further investigate how top list use impacts result quality and stability in studies by measuring these layers in §8.

4 TOP LISTS DATASET
For the three lists we focus on in this study, we source daily snapshots as far back as possible. Many snapshots come from our own archives, and others were shared with us by other members of the research community, such as [89–91]. Table 2 gives an overview of our datasets along with some metrics discussed in §5. For the Alexa list, we have a dataset with daily snapshots from January 2009 to March 2012, named AL0912, and another from April 2013 to April 2018, named AL1318. The Alexa list underwent a significant change in January 2018; for this we created a partial dataset named AL18 after this change. For the Umbrella list, we have a dataset spanning 2016 to 2018, named UM1618. For the Majestic Million list, we cover June 2017 to April 2018.

As many of our analyses are comparative between lists, we create a JOINT dataset, spanning the overlapping period from June 2017 to the end of April 2018. We also sourced individual daily snapshots from the community and the […] [92], but only used periods with continuous daily data for our study.

5 STRUCTURE OF TOP LISTS
In this section, we analyse the structure and nature of the three top lists in our study. This includes questions such as top level domain (TLD) coverage, subdomain depth, and list intersection.

DNS Terms used in this paper, for clarity, are the following: for www.net.in.tum.de, .de is the public suffix⁴ (and top level domain), tum.de is the base domain, in.tum.de is the first subdomain, and net.in.tum.de is the second subdomain. Hence, www.net.in.tum.de counts as a third-level subdomain.

5.1 Depth and Breadth
A first characteristic to understand about top lists is the scope of their coverage: how many of the active TLDs do they cover, and how many do they miss? How deep are they going into specific subdomains, choosing trade-offs between breadth and depth?

TLD Coverage is a first indicator of list breadth. Per IANA [94, 95], 1,543 TLDs exist as of May 20th, 2018. Based on this list, we count valid and invalid TLDs per list. The average coverage of valid TLDs in the JOINT period is ≈700 TLDs, covering only about 50% of active TLDs. This implies that measurements based on top lists may miss up to 50% of TLDs in the Internet.

At the Top 1k level we find quite different behaviour with 105 valid TLDs for Alexa, 50 for Majestic, but only 13 (com/net/org and few other TLDs) for Umbrella. We speculate that this is rooted in DNS administrators from highly queried DNS names preferring the smaller set of professionally managed and well-established top level domains over the sometimes problematic new gTLDs [96–98].

Invalid TLDs occur neither in any Top 1k domains nor in the Alexa Top 1M domains, but as a minor count in the Majestic Top 1M (7 invalid TLDs, resulting in 35 domain names), and as a significant count in the Umbrella Top 1M: there, we can find 1,347 invalid TLDs⁵, in a total of 23k domain names (2.3% of the list). This is an early indicator of a specific characteristic in the Umbrella list: invalid domain names queried by misconfigured hosts or outdated software can easily get included into the list.

Comparing valid and invalid TLDs also reveals another structural change in the Alexa list on July 20th, 2014: before that date, Alexa had a fairly static count of 206 invalid and 248 valid TLDs. Perhaps driven by the introduction of new gTLDs from 2013 [99], Alexa changed its filtering: after that date, invalid TLDs have been reduced to ≈0, and valid TLDs have shown continued growth from 248 to ≈800. This confirms again that top lists can undergo rapid and unannounced changes in their characteristics, which may significantly influence measurement results.

Subdomain Depth is an important property of top lists. Base domains offer more breadth and variety in setups, while subdomains may offer interesting targets besides a domain’s main […]. The ratio of base to subdomains is hence a breadth/depth trade-off, which we explore for the three lists used. Table 2 shows the average number of base domains (µ_BD) per top list. We note that Alexa and Majestic contain almost exclusively base domains with few exceptions (e.g., for blogspot). In contrast, 28% of the names in the Umbrella list are base domains, i.e., Umbrella emphasises depth of domains. Table 2 also details the subdomain depth for a single-day snapshot (April 30, 2018) of all lists. As the Umbrella list is based on DNS lookups, such deep DNS labels can easily become part of the Umbrella list, regardless of the origin of the request. In fact, Umbrella holds subdomains up to level 33 (e.g., domains with extensive www prefixes or ‘.’-separated OIDs).

We also note that the base domain is usually part of the list when its subdomains are listed. On average, each list contains only a few hundred subdomains whose base domain is not part of the list.

Domain Aliases are domains with the same second-level domain, but different top-level domains, e.g., google.com and google.de. Table 2 shows the number of domain aliases as DUP_SLD. We find a moderate level of ≈5% of domain aliases within various top lists, with only 1.5% for Majestic. Analysis reveals a very flat distribution, with the top entry google at ≈200 occurrences.

5.2 Intersection between Lists
We next study intersection between lists. All 3 lists in our study promise a view on the most popular domains (or websites) in the Internet, hence measuring how much these lists agree⁶ is a strong indicator of bias in list creation. Figure 1a shows the intersection between top lists over time during the JOINT period. We see that the intersection is quite small: for the Top 1M domains, Alexa and Majestic share 285k domains on average during the JOINT duration.

⁴ Per Public Suffix List [93], a browser-maintained list aware of cases such as co.uk.
⁵ Examples for invalid TLDs: […], localdomain, server, cpe, 0, big, cs.
⁶ To control for varying subdomain length, we first normalise all lists to unique base domains (cf. µ_BD in Table 2, reducing e.g., Umbrella to 273k base domains).

Table 2: Datasets: mean of valid TLDs covered (µ_TLD), mean of base domains (µ_BD), mean of sub-domain level spread (SD_n for share of n-th level subdomains, SD_M for maximum sub-domain level), mean of domain aliases (DUP_SLD), mean of daily change (µ_∆) and mean of new (i.e., not included before) domains per day (µ_NEW). Footnote 4: Average after Alexa’s change in January 2018.

List      Top  Dataset  Dates             µ_TLD ± σ   µ_BD ± σ     SD_1    SD_2    SD_3   SD_M  DUP_SLD    µ_∆       µ_NEW
Alexa     1M   AL0912   29.1.09–16.3.12   248 ± 2     973k ± 2k    1.6%    0.4%    ≈0%    4     47k ± 2k   23k       n/a
Alexa     1M   AL1318   30.4.13–28.1.18   545 ± 180   972k ± 6k    2.2%    0.1%    ≈0%    4     49k ± 3k   21k       5k
Alexa     1M   AL18     29.1.18–30.4.18   771 ± 8     962k ± 4k    3.7%    ≈0%     ≈0%    4     45k ± 1k   483k      121k
Alexa     1M   JOINT    6.6.17–30.4.18    760 ± 11    972k ± 7k    2.6%    ≈0%     ≈0%    4     51k ± 4k   147k      38k
Umbrella  1M   JOINT    6.6.17–30.4.18    580 ± 13    273k ± 13k   49.9%   14.7%   5.9%   33    15k ± 1k   100k      22k
Majestic  1M   JOINT    6.6.17–30.4.18    698 ± 14    994k ± 617   0.4%    ≈0%     ≈0%    4     49k ± 1k   6k        2k
Alexa     1k   JOINT    6.6.17–30.4.18    105 ± 3     990 ± 2      1.3%    0.0%    0.0%   1     22 ± 2     9 (78)⁴   4 (8)⁴
Umbrella  1k   JOINT    6.6.17–30.4.18    13 ± 1      317 ± 6      52.0%   14%     ≈0%    6     11 ± 2     44        2
Majestic  1k   JOINT    6.6.17–30.4.18    50 ± 1      939 ± 3      5.9%    0.1%    0.1%   4     32 ± 1     5         0.8
Umbrella  1M   UM1618   15.12.16–30.4.18  591 ± 45    281k ± 16k   49.4%   14.5%   5.7%   33    15k ± 1k   118k      n/a
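The two churn columns of Table 2 (mean daily change and mean new domains per day) can be illustrated with a minimal sketch. It assumes each daily snapshot is available as a plain list of domain names; the function names `daily_removed` and `churn_stats` and the toy data are hypothetical and are not taken from the authors' published code.

```python
from typing import Iterable, List, Set, Tuple


def daily_removed(day_n: Iterable[str], day_n1: Iterable[str]) -> int:
    """Count domains present on day n but absent on day n+1."""
    return len(set(day_n) - set(day_n1))


def churn_stats(snapshots: List[List[str]]) -> Tuple[float, float]:
    """Return (mean daily change, mean new domains per day).

    A domain counts as new only if it has not appeared in any earlier
    snapshot; a domain that leaves and later rejoins is not new.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two daily snapshots")
    seen: Set[str] = set(snapshots[0])  # every domain observed so far
    removed_counts = []
    new_counts = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        removed_counts.append(daily_removed(prev, curr))
        curr_set = set(curr)
        new_counts.append(len(curr_set - seen))
        seen |= curr_set
    n = len(removed_counts)
    return sum(removed_counts) / n, sum(new_counts) / n


# Toy example with three "days" of a three-entry list (made-up domains).
toy = [
    ["a.com", "b.com", "c.com"],
    ["a.com", "b.com", "d.com"],  # c.com leaves, d.com is new
    ["a.com", "c.com", "d.com"],  # b.com leaves, c.com rejoins (not new)
]
mean_change, mean_new = churn_stats(toy)
```

Tracking the set of all domains seen so far is what separates genuinely new domains from in-and-out domains that merely rejoin the list.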

Alexa and Umbrella agree on 150k, Umbrella and Majestic on 113k, and all three only on 99k out of 1M domains.

For the Top 1k lists, the picture is more pronounced. On average during the JOINT period, Alexa and Majestic agree on 295 domains, Alexa and Umbrella on 56, Majestic and Umbrella on 65, and all three only on 47 domains.

This disparity between top domains suggests a high bias in the list creation. We note that even both web-based lists, Alexa and Majestic, only share an average of 29% of domains.

Standing out from Figure 1a is the fact that the Alexa list has changed its nature in January 2018, reducing the average intersection with Majestic from 285k to 240k. This change also introduced a weekly pattern, which we discuss further in §6.2. We speculate that Alexa might have reduced its 3-month sliding window [8], making the list more volatile and susceptible to weekly patterns. We contacted Alexa about this change, but received no response.

5.3 Studying Top List Discrepancies
The low intersection between Umbrella and the other lists could be rooted in the DNS vs. web-based creation. Our hypothesis is that the web-based creation of Alexa and Majestic lists tends to miss domains providing embedded content as well as domains popular on mobile applications [67, 100]. In this section, we explore the origin of discrepancies across domain lists.

We aggregate the Alexa, Umbrella, and Majestic Top 1k domains from the last week of April 2018, and analyse the set of 3,005 disjunct domains across these lists, i.e., those found only in a single list. 40.7% of these domains originate from Alexa, 37.1% from Umbrella, and 22.1% from Majestic. Subsequently, we identify whether the disjunct domains are associated with mobile traffic or third-party advertising and tracking services not actively visited by users, but included through their DNS lookups. We opt against utilizing domain classifiers such as the OpenDNS Domain Tagging service [101], as it has been reported that categories are vague and coverage is low [100].

Instead, we use the data captured by the Lumen Privacy Monitor [102] to associate domains with mobile traffic for more than 60,000 Android apps, and use popular anti-tracking blacklists such as MalwareBytes’ hpHosts ATS file [103]. We also check if the domains from a given top list can be found in the aggregated Top 1M of the other two top lists during the same period of time. Table 3 summarises the results. As we suspected, Umbrella has significantly more domains flagged as “mobile traffic” and third-party advertising and tracking services than the other lists. It also has the lowest proportion of domains shared with other Top 1M lists.

Table 3: Share of one-week Top 1k disjunct domains present in hpHosts (blacklist), Lumen (mobile), and Top 1M of other top lists.

List      # Disjunct  % hpHosts  % Lumen  % Top 1M
Alexa     1,224       3.10%      1.55%    99.10%
Umbrella  1,116       20.16%     39.43%   25.63%
Majestic  665         1.95%      3.76%    93.63%

This confirms that Umbrella is capable of capturing domains from any device using OpenDNS, such as mobile and IoT devices, and also includes domains users are not aware of visiting, such as embedded third-party trackers in websites. Alexa and Majestic provide a web-specific picture of popular Internet domains.

6 STABILITY OF TOP LISTS
Armed with a good understanding of the structure of top lists, we now focus on their stability over time. Research has revealed hourly, daily and weekly patterns on ISP traffic and service load, as well as significant regional and demographic differences in accessed content due to user habits [104–107]. We assess whether such patterns also manifest in top lists, as a first step towards understanding the impact of studies selecting a top list at a given time.

6.1 Daily Changes
We start our analysis by understanding the composition and evolution of top lists on a daily basis. As all top lists have the same size, we use the raw count of daily changing domains for comparison.

Figure 1b shows the count of domains that were removed daily, specifically the count of domains present in a list on day n but not on day n+1. The Majestic list is very stable (6k daily change), the Umbrella list offers significant churn (118k), and the Alexa list

Figure 1: Intersection, daily changes and average stability of top lists (y-axis re-scaled at 10% in right plot). (a) Intersection between Top 1M lists. (b) Daily changes of Top 1M entries. (c) Average % daily change over rank.

used to be stable (21k), but drastically changed its characteristic in January 2018 (483k), becoming the most unstable list.

The fluctuations in the Umbrella list, and in the Alexa list after January 2018, are weekly patterns, which we investigate closer in §6.2. The average daily changes are given in column µ∆ of Table 2.

Which Ranks Change? Previous studies of Internet traffic revealed that the distribution of accessed domains and services follows a power-law distribution [68, 104–107]. Therefore, the ranking of domains in the long tail should be based on significantly smaller and hence less reliable numbers.

Figure 1c displays the stability of lists depending on subset size. The y-axis shows the mean number of daily changing domains in the top X domains, where X is depicted on the x-axis. For example, an x-value of 1000 means that the lines at this point show the average daily change per list for the Top 1k domains. The figure shows instability increasing with higher ranks for Alexa and Umbrella, but not for Majestic. We plot Alexa before and after its January 2018 change, highlighting the significance of the change across all its ranks: even its Top 1k domains have increased their instability from 0.62% to 7.7% of daily change.

New or In-and-out Domains? Daily changes in top lists may stem from new domains joining, or from previously contained domains re-joining. To evaluate this, we cumulatively sum all the unique domains ever seen in a list in Figure 2a, i.e., a list with only permutations of the same set of domains would be a flat line. Majestic exhibits linear growth: every day, about 2k previously not included domains are added to it, approximately a third of the 6k total changing domains per day (i.e., 4k domains have rejoined). Over the course of a year, the total count of domains included in the Majestic list is 1.7M. Umbrella adds about 20k new domains per day (out of 118k daily changing domains), resulting in 7.3M domains after one year. Alexa grows by 5k (of 21k) and 121k (of 483k) domains per day, before and after its structural change in January 2018. Mainly driven by the strong growth after Alexa’s change, its cumulative number of domains after one year is 13.5M. This means that a long-term study of the Alexa Top 1M will, over the course of this year, have measured 13.5M distinct domains.

Across all lists, we find an average of 20% to 33% of daily changing domains to be new domains, i.e., entering the list for the first time. This also implies that 66% to 80% of daily changing domains are domains that are repeatedly removed from and inserted into a list. We also show these and the equivalent Top 1k numbers in column µNEW of Table 2.

This behaviour is further confirmed in Figure 2b. In this figure, we compute the intersection between a fixed starting day and the upcoming days. We compute it seven times, with each day of the first week of the JOINT dataset as the starting day. Figure 2b shows the daily median value between these seven intersections.

This shows several interesting aspects: (i) the long-term trend in temporal decay per list, confirming much of what we have seen before (high stability for Majestic, weekly patterns and high instability for Umbrella and the late Alexa list), and (ii) the fact that for Alexa and Umbrella, the decay is non-monotonic, i.e., a set of domains is leaving and rejoining at weekly intervals.

For How Long are Domains Part of a Top List? We investigate the average number of days a domain remains in both the Top 1M and Top 1k lists in Figure 2c. This figure displays a CDF with the number of days from the JOINT dataset in the x-axis, and the normalised cumulative probability that a domain is included on the list for X or fewer days. Our analysis reveals significant differences across lists. While about 90% of domains in the Alexa Top 1M list are in the list for 50 or fewer days, 40% of domains in the Majestic Top 1M list remain in the list across the full year. With this reading, lines closer to the lower right corner are “better” in the sense that more domains have stayed in the list for longer periods, while lines closer to the upper left indicate that domains get removed more rapidly. The lists show quite different behaviour, with Majestic Top 1k being the most stable by far (only ≈ 26% of domains present on < 100% of days), and being followed by Majestic Top 1M, Umbrella Top 1k, Alexa Top 1k, Umbrella Top 1M, and Alexa Top 1M. The Majestic Top 1M list offers stability similar to the Alexa and Umbrella Top 1k lists.

6.2 Weekly Patterns

We now investigate the weekly⁷ pattern in the Alexa and Umbrella lists as observed in Figure 1b. We generally do not include Majestic as it does not display a weekly pattern. In this section, we resort to various statistical methods to investigate those weekend patterns. We will describe each one of them in their relevant subsection.

How Do Domain Ranks Change over the Weekends? The weekly periodical patterns shown in Figure 1b show that list content depends on the day of the week. To investigate this pattern statistically, we calculate a weekday and weekend distribution of the rank position of a given domain and compute the distance between those two distributions using the Kolmogorov-Smirnov (KS)

⁷ It is unclear what cut-off times list providers use, and how they offset time zones.
For our analysis, we map files to days using our download timestamp.
Significance, Structure, and Stability of Internet Top Lists IMC ’18, October 31-November 2, 2018, Boston, MA, USA

Figure 2: Run-up and run-down of domains; share of days that a domain spends in a top list for the JOINT dataset. (a) Cumulative sum of all domains ever included in Top 1M lists (Top 1k similar). (b) List intersection against a fixed starting set (median value of seven different starting days). (c) CDF of % of domains over days included in Top 1M and Top 1k lists.
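The split between new and re-joining domains, and the cumulative count of unique domains plotted in Figure 2a, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `churn_breakdown` and the toy domains are hypothetical, and it assumes that a "daily changing domain" is one present today but absent on the previous day.

```python
def churn_breakdown(daily_lists):
    """Split the domains entering a list each day into 'new' (never seen
    before) and 'rejoined' (seen on some earlier day), and track the
    cumulative number of unique domains ever seen."""
    seen = set(daily_lists[0])        # every domain observed so far
    previous = set(daily_lists[0])    # yesterday's list
    cumulative = [len(seen)]          # Figure 2a style running total
    breakdown = []
    for today in daily_lists[1:]:
        today = set(today)
        entered = today - previous    # assumption: present today, absent yesterday
        new = entered - seen
        rejoined = entered & seen
        breakdown.append((len(new), len(rejoined)))
        seen |= today
        cumulative.append(len(seen))
        previous = today
    return breakdown, cumulative


# Toy data with hypothetical domains
days = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "b.example", "d.example"],   # d.example is new
    ["a.example", "c.example", "d.example"],   # c.example rejoins
]
breakdown, cumulative = churn_breakdown(days)
print(breakdown)    # [(1, 0), (0, 1)]
print(cumulative)   # [3, 4, 4]
```

A list that only permutes the same set of domains would yield a constant `cumulative` series, which corresponds to the flat line mentioned in the text.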

Figure 3: Comparison of weekday vs. weekend distributions and dynamics in Second-Level-Domains (SLDs). (a) Kolmogorov-Smirnov (KS) distance between weekend and weekday distributions. (b) Weekday/weekend dynamics in Alexa Top 1M Second-Level-Domains (SLDs). (c) Weekday/weekend dynamics in Umbrella Top 1M SLDs.

test. This method allows us to statistically determine to what degree the distribution of a domain’s ranks on weekdays and weekends overlap, and is shown in Figure 3a. We include Majestic as a base line without a weekly pattern. For Alexa Top 1M, we can see that ≈35% of domains have a KS distance of one, meaning that their weekend and weekday distributions have no data point in common. This feature is also present in Umbrella’s rank, where over 15% of domains have a KS distance of 1. The changes are less pronounced for the Top 1k Alexa and Umbrella lists, suggesting that the top domains are more stable. As a reference, the KS distance when comparing weekdays to weekdays and weekends to weekends is much lower. For 90% of domains in Alexa or Umbrella (Top 1k or Top 1M) the distance is lower than 0.05. The KS distance is lower than 0.02 for all of the domains in Majestic rankings (Top 1k or Top 1M). This demonstrates that a certain set of domains, the majority of them localised in the long-tail, present disjunct rankings between weekends and weekdays.

What Domains are More Popular on Weekends? This leads to the question about the nature of domains changing in popularity with a weekly pattern. To investigate this, we group domains by “second-level-domain” (SLD), which we define as the label left of a public suffix per the Public Suffix list [93]. Figures 3b and 3c display the time dynamics of SLD groups for which the number of domains varies by more than 40% between weekdays and weekends. For Alexa, we can see stable behaviour before its February 2018 change. We see that some groups such as blogspot.*⁸ or .com are significantly more popular on weekends than on weekdays. The opposite is true for domains under sharepoint.com (a web-based Microsoft Office platform). Umbrella shows the same behaviour, with nessus.org (a threat intelligence tool) more popular during the week, and ampproject.org (a dominant website performance optimisation framework), and nflxso.net (a Netflix domain) more popular on weekends. These examples confirm that different Internet usage on weekends⁹ is a cause for the weekly patterns.

6.3 Order of Domains in Top Lists

As top lists are sorted, a statistical analysis of order variation completes our view of top lists’ stability. We use the Kendall rank correlation coefficient [108], commonly known as Kendall’s τ coefficient, to measure rank correlation, i.e., the similarity in the order of lists. Kendall’s correlation between two variables will be high when observations have a similar order between the two variables, and low when observations have a dissimilar (or fully different for a correlation of -1) rank between the two variables.

In Figure 4, we show the CDF of Kendall’s τ rank correlation coefficient for the Alexa, Umbrella and Majestic Top 1k domains in two cases: (i) for day to day comparisons; (ii) for a static comparison to the first day in the JOINT dataset. For analysis, we can compare the percentage of very strongly correlated ranks, i.e., the ranks for which Kendall’s τ is higher than 0.95. For day to day comparisons, Majestic is clearly most similar at 99%, with Alexa (72%) and Umbrella (40%) both showing considerable dissimilarities.

When compared for a reference day, very strong correlation drops below 5% for all lists. This suggests that the order variations are not perceived in the short term, but may arise when considering longer temporal windows.
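The KS distance used for Figure 3a can be illustrated with a small, dependency-free sketch of the two-sample Kolmogorov-Smirnov statistic, i.e., the largest gap between the two empirical CDFs. The function name `ks_distance` and the rank values are hypothetical; the paper does not state which implementation was used.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute difference between the
    empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value occurring in either sample.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical ranks of one domain on weekdays and on weekends
weekday_ranks = [120, 130, 125, 128, 131]
weekend_ranks = [900, 870, 910]
print(ks_distance(weekday_ranks, weekend_ranks))   # 1.0 (no overlap at all)
print(ks_distance([5, 6, 7], [5, 6, 7]))           # 0.0 (identical samples)
```

A distance of 1 arises exactly when the two rank samples occupy completely separate ranges, which matches the reading given in the text.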
⁸ We include all blogspot.* domains in the same group.
⁹ Our data indicates prevailing Saturday and Sunday weekends.
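The rank correlation from §6.3 can be sketched as below. This is only an illustration under stated assumptions: it computes Kendall's τ (the tau-a variant, valid without ties) over the domains common to two lists, using list position as rank. How the paper treats domains that appear in only one of the two lists is not stated, so restricting to the intersection is an assumption, as are the function names and toy domains.

```python
from itertools import combinations


def kendall_tau(ranks_x, ranks_y):
    """Kendall's tau-a for two equally long rank vectors without ties."""
    n = len(ranks_x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (ranks_x[i] - ranks_x[j]) * (ranks_y[i] - ranks_y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_between_lists(list_a, list_b):
    """Compare the order of two top lists on their common domains."""
    pos_b = {domain: rank for rank, domain in enumerate(list_b)}
    common = [d for d in list_a if d in pos_b]   # assumption: intersection only
    ranks_a = list(range(len(common)))           # order within list_a
    ranks_b = [pos_b[d] for d in common]         # positions within list_b
    return kendall_tau(ranks_a, ranks_b)


day1 = ["a.example", "b.example", "c.example", "d.example"]
day2 = ["a.example", "b.example", "d.example", "c.example"]  # one swapped pair
print(tau_between_lists(day1, day1))         # 1.0
print(tau_between_lists(day1, day1[::-1]))   # -1.0
print(tau_between_lists(day1, day2))         # about 0.67
```

The pairwise loop is quadratic in the list length, which is acceptable for Top 1k subsets but would call for an O(n log n) implementation on Top 1M lists.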

Table 4: Rank variation for some more and less popular websites in the Top 1M lists.

                 Highest rank                  Median rank                   Lowest rank
Domain           Alexa    Umbrella  Majestic   Alexa    Umbrella  Majestic   Alexa    Umbrella  Majestic
google.com       1        1         1          1        1         1          2        4         8
.com             3        1         2          3        6         2          3        8         19
netflix.com      21       1         455        32       2         515        34       487       572
jetblue.com      2,284    14,291    4,810      3,133    29,637    4,960      5,000    56,964    5,150
mdc.edu          25,619   177,571   24,720     35,405   275,579   26,122     88,093   449,309   30,914
puresight.com    183,088  593,773   687,838    511,800  885,269   749,819    998,407  999,694   869,872
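The three statistics reported per domain in Table 4 can be computed along the following lines. This is a sketch with hypothetical names and toy data; it assumes each daily list is ordered by rank and that days on which a domain is absent are simply skipped, which the paper does not specify.

```python
from statistics import median


def rank_summary(daily_lists, domain):
    """Return (highest, median, lowest) rank of a domain over all days on
    which it is listed, or None if it never appears. 'Highest' is the
    numerically smallest rank; ranks are 1-based list positions."""
    ranks = [day.index(domain) + 1 for day in daily_lists if domain in day]
    if not ranks:
        return None
    return min(ranks), median(ranks), max(ranks)


days = [
    ["a.example", "c.example", "b.example"],
    ["a.example", "b.example", "c.example"],
    ["c.example", "a.example", "b.example"],
]
print(rank_summary(days, "a.example"))   # (1, 1, 2)
print(rank_summary(days, "b.example"))   # (2, 3, 3)
```

For real Top 1M lists, a per-day dictionary from domain to rank would avoid the linear `index` lookup used here for brevity.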

Figure 4: CDF of Kendall’s τ between top lists.

Investigating the Long Tail: To compare higher and lower ranked domains, we take three exemplary domains each from the Top 100 and from the lower ranks. Table 4 summarises the results. For each of the six domains, we compute the highest, median, and lowest rank over the duration of the JOINT dataset. The difference of variability between top and bottom domains is striking and in line with our previous findings: the ranks of top domains are fairly stable, while the ranks of bottom domains vary drastically.

6.4 Summary

We investigate the stability of top lists, and find abrupt changes, weekly patterns, and significant churn for some lists. Lower ranked domains fluctuate more, but the effect heavily depends on the list and the subset (Top 1k or Top 1M). We can confirm that the weekly pattern stems from leisure-oriented domains being more popular on weekends, and give examples for domain rank variations.

7 UNDERSTANDING AND INFLUENCING TOP LISTS RANKING MECHANISMS

We have seen that top lists can be rather unstable from day to day, and hence we investigate what traffic levels are required and at what effort it is possible to manipulate the ranking of a certain domain. As discussed previously, the Alexa list is based on its browser toolbar and “various other sources”, Umbrella is based on OpenDNS queries, and Majestic is based on the count of subnets with inbound links to the domain in question. In this section, we investigate the ranking mechanisms of these top lists more closely.

7.1 Alexa

Alexa obtains visited through “over 25,000 different browser extensions” to calculate site ranks through visitor and page view statistics [7, 109]. There is no further information about these toolbars besides Alexa’s own toolbar. Alexa also provides data to The Internet Archive to add new sites [92]. It has been speculated that Alexa provides tracking information to feed the Amazon recommendation and profiling engine since Amazon’s purchase of Alexa in 1999 [110]. To better understand the ranking mechanism behind the Alexa list, we reverse engineer the Alexa toolbar¹⁰ and investigate what data it gathers. Upon installation, the toolbar fetches a unique identifier which is stored in the browser’s local storage, called the Alexa ID (aid). This identifier is used for distinctly tracking the device. During installation, Alexa requests information about age, (binary) gender, household income, ethnicity, , children, and the toolbar installation location (home/work). All of these are linked to the aid. After installation, the toolbar transfers for each visited site: the page URL, screen/page sizes, referer, window IDs, tab IDs, and loading time metrics. For a scarce set of 8 search engine and shopping URLs¹¹, referer and URL are anonymised to their host name. For all other domains, the entire URL, including all GET parameters, is transmitted to Alexa’s servers under data.alexa.com. Because of the injected JavaScript, the visit is only transmitted if the site actually exists and was loaded. In April 2018, Alexa’s API DNS name had a rank of ≈30k in the Umbrella list, indicating at least 10k unique source IP addresses querying that DNS domain name through OpenDNS per day (cf. §7.2).

Due to its dominance, the Alexa rank of a domain is an important criterion in domain trading and optimisation. Unsurprisingly, there is a gray area industry of sites promising to “optimise” the Alexa rank of a site for money [111–113]. Although sending synthetic data to Alexa’s backend API should be possible at reasonable effort, we refrain from doing so for two reasons: (i) in April 2018, the backend API changed, breaking communication with the toolbar, and (ii) unclear ethical implications of actively injecting values into this API. We refer the interested reader to le Pochat et al. [114], who have recently succeeded in manipulating Alexa ranks through the toolbar API.

7.2 Umbrella

As the Umbrella list is solely based on DNS queries through the OpenDNS public resolver, it mainly reflects domains frequently resolved, not necessarily domains visited by humans, as confirmed in §5.3. Examples are the Internet scanning machines of various research institutions, which likely show up in the Umbrella ranking

¹⁰ We detail the reverse engineering process in our dataset.
¹¹ As of 2018-05-17, these are google.com, instacart.com, shop.rewe.de, .com, search.yahoo.com, jet.com and ocado.com.

Figure 5: Umbrella rank depending on probe count, query frequency, and weekday (Friday left, Sunday right). Empty fields indicate the settings did not result in a Top 1M ranking.

through automated forward-confirmed reverse-DNS at scanned hosts, and not from humans entering the URL into their browser. Building a top list based on DNS queries has various trade-offs and parameters, which we aim to explore here. One in particular is the TTL value of a DNS domain name. As the DNS highly relies on caching, TTL values could introduce a bias in determining popularity based on DNS query volume: domain names with higher Time-To-Live values can be cached longer and may cause fewer DNS queries at upstream resolvers. To better understand Umbrella’s ranking mechanism and query volume required, we set up 7 RIPE Atlas measurements [115], which query the OpenDNS resolvers for DNS names under our control.

Probe Count versus Query Volume: We set up measurements with 100, 1k, 5k, and 10k RIPE Atlas probes, and at frequencies of 1, 10, 50, and 100 DNS queries per RIPE Atlas probe per day [115]. The resulting ranks, stabilised after several days of measurement, are depicted in Figure 5. A main insight is that the number of probes has a much stronger influence than the query volume per probe: 10k probes at 1 query per day (a total of 10k queries) achieve a rank of 38k, while 1000 probes at 100 queries per day (a total of 100k queries) only achieve rank 199k.

It is a reasonable and considerate choice to base the ranking mechanism mainly on the number of unique sources, as it makes the ranking less susceptible to individual heavy hitters.

Upon stopping our measurements, our test domains quickly (within 1-2 days) disappeared from the list.

TTL Influence: To test whether the Umbrella list normalises the potential effects of TTL values, we query DNS names with 5 different TTL values from 1000 probes at a 900s interval [116]. We could not determine any significant effect of the TTL values: all 5 domains maintain a distance of less than 1k list places over time. This is coherent with our previous observation that the Umbrella rank is mainly determined from the number of clients and not the query volume per client: as the TTL value would mainly impact the query volume per client, its effect should be marginal.

7.3 Majestic

The Majestic Million top list is based on a custom web crawler mainly used for commercial link intelligence [117]. Initially, Majestic ranked sites by the raw number of referring domains. As this had an undesired outcome, the link count was normalised by the count of referring /24-IPv4-subnets to limit the influence of single IP addresses [118]. The list is calculated using 90 days of data [119]. As this approach is similar to PageRank [120], except that Majestic does not weigh incoming links by the originating domain, it is to be expected that referral services can increase a domain’s popularity. We cannot, however, see an efficient way to influence a domain’s rank in the Majestic list without using such referral services. Le Pochat et al. [114] recently influenced a domain’s rank in the Majestic list through such purchasing of back links.

8 IMPACT ON RESEARCH RESULTS

§3 highlighted that top lists are broadly used in networking, security and systems research. Their use is especially prevalent in Internet measurement research, where top lists are used to study aspects across all layers. This motivates us to understand the impact of top list usage on the outcome of these studies. As the replication of all studies covered in our survey is not possible, we evaluate the impact of the lists’ structure on research results in the Internet measurement field by investigating (i) common layers, such as DNS and IP, that played a role in many studies, and (ii) a sample of specific studies across a variety of layers, aiming for one specific study per layer.

We evaluate those scientific results with 3 questions in mind: (i) what is the bias when using a top list as compared to a general population of all com/net/org domains?¹² (ii) what is the difference in result when using a different top list? (iii) what is the difference in result when using a top list from a different day?

8.1 Domain Name System (DNS)

A typical first step in list usage is DNS resolution, which is also a popular research focus (cf. §3). We split this view into a record type perspective (e.g., IPv6 adoption) and a hosting infrastructure perspective (e.g., CDN prevalence and AS mapping). For both, we download lists and run measurements daily over the course of one year.

8.1.1 Record Type Perspective. We investigate the share of NXDOMAIN domains and IPv6-enabled domains, and the share of CAA-enabled domains as an example of a DNS-based measurement study [122]. Results are shown in Table 5 and Figure 6.

Assessing list quality via NXDOMAIN: We begin by using NXDOMAIN as a proxy measure for assessing the quality of entries in the top lists. An NXDOMAIN error code in return to a DNS query means that the queried DNS name does not exist at the respective authoritative nameserver. This error code is unexpected for allegedly popular domains. Ideally, a top list would only provide existing domains. Surprisingly, we find the amount of NXDOMAIN responses in both the Umbrella (11.5%) and the Majestic (2.7%) top lists higher than in the general population of com/net/org domains (0.8%). This is in alignment with the fact that already ≈23k domains in the Umbrella list belong to non-existent top-level domains (cf., §5.1). Figure 6a shows that the NXDOMAIN share is, except for Umbrella, stable over time. We found almost no NXDOMAINs among Top 1k ranked domains. One notable exception is teredo.ipv6.microsoft.com, a service discontinued in 2013

¹² com/net/org is still only a 45% sample of the general population (156.7M of 332M domains as per [121]), but more complete and still unbiased samples are difficult to obtain due to ccTLDs’ restrictive zone file access policies. [21, 122–125]

Table 5: Internet measurement characteristics compared across top lists and general population, usually given as µ ± σ. For each cell, we highlight if it significantly (50%⁶) exceeds ▲ or falls behind ▼ the base value (1k / 1M, 1M / com/net/org), or not ■. In almost all cases (▲ and ▼), top lists significantly distort the characteristics of the general population.

Study                      Alexa 1K          Umbrella 1K       Majestic 1K       Alexa 1M          Umbrella 1M       Majestic 1M       com/net/org (157.24M ± 172K)
NXDOMAIN¹                  ▼ ∼0.0% ± 0.0%    ▼ ∼0.0% ± 0.0%    ▼ ∼0.0% ± 0.0%    ▼ 0.13% ± 0.02    ▲ 11.51% ± 0.9    ▲ 2.66% ± 0.09    0.8% ± 0.02
IPv6-enabled²              ▲ 22.7% ± 0.6     ▲ 22.6% ± 1.0     ▲ 20.7% ± 0.4     ▲ 12.9% ± 0.9     ▲ 14.8% ± 0.8     ▲ 10.8% ± 0.2     4.1% ± 0.2
CAA-enabled¹               ▲ 15.3% ± 0.9     ▲ 5.6% ± 0.3      ▲ 27.9% ± 0.3     ▲ 1.7% ± 0.1      ▲ 1.0% ± 0.0      ▲ 1.5% ± 0.0      0.1% ± 0.0
CNAMEs³                    ■ 53.1% ± 1.1     ▲ 44.46% ± 0.43   ▲ 64.8% ± 0.34    ■ 44.1% ± 1       ▼ 27.86% ± 1      ▲ 39.81% ± 0.15   51.4% ± 1.7
CDNs (via CNAME)³          ▲ 27.5% ± 0.89    ▲ 29.9% ± 0.37    ▲ 36.1% ± 0.22    ▲ 6% ± 0.6        ▲ 10.14% ± 0.63   ▲ 2.6% ± 0.01     1.3% ± 0.004
Unique AS IPv4 (avg.)³,⁴   256 ± 5           132 ± 4           250 ± 3           19511 ± 597       16922 ± 584       17418 ± 61        34876 ± 53
Unique AS IPv6 (avg.)³,⁴   44 ± 5            26 ± 2            48 ± 30           1856 ± 56         2591 ± 157        1236 ± 793        3025 ± 9
Top 5 AS (Share)³          ▲ 52.68% ± 1.74   ▲ 53.33% ± 1.75   ▲ 51.74% ± 1.73   ▲ 25.68% ± 0.67   ■ 33.95% ± 1.06   ▲ 22.29% ± 0.17   40.22 ± 0.09
TLS-capable⁵               ▲ 89.6%           ▲ 66.2%           ▲ 84.7%           ▲ 74.65%          ■ 43.05%          ▲ 62.89%          36.69%
HSTS-enabled HTTPS⁵        ▲ 22.9%           ■ 13.0%           ▲ 27.4%           ▲ 12.17%          ▲ 11.65%          ■ 8.44%           7.63%
HTTP2³                     ▲ 47.5% ± 0.75    ▲ 36.3% ± 2.4     ▲ 36.6% ± 0.72    ▲ 26.6% ± 0.88    ▲ 19.11% ± 0.63   ▲ 19.8% ± 0.15    7.84% ± 0.08

1: µ Apr, 2018
2: µ of JOINT period (6.6.17–30.4.18)
3: µ Apr, 2018 - 8. May, 2018
4: no share, thus no ▼, ■, or ▲
5: Single day/list in May, 2018
6: For base values over 40%, the test for significant deviation is 25% and 5σ.
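One plausible reading of the marker rule in the caption of Table 5 is a relative-deviation test against the base value. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name and parameters are hypothetical, the base is taken to be the corresponding Top 1M value for a Top 1k cell and the com/net/org value for a Top 1M cell, and the additional σ criterion mentioned in the table footnote is not modelled.

```python
def deviation_marker(value, base, threshold=0.5,
                     high_base=40.0, high_base_threshold=0.25):
    """Classify a measured share against its base value.

    Assumed rule: a deviation of more than 50% of the base counts as
    significant; for base values over 40 (percent), 25% suffices.
    The sigma-based part of the published rule is NOT modelled here.
    """
    limit = high_base_threshold if base > high_base else threshold
    relative = (value - base) / base
    if relative > limit:
        return "▲"   # significantly exceeds the base value
    if relative < -limit:
        return "▼"   # significantly falls behind the base value
    return "■"       # no significant deviation


# Values taken from Table 5 (Top 1M cell vs. com/net/org base)
print(deviation_marker(11.51, 0.8))     # ▲  NXDOMAIN, Umbrella 1M
print(deviation_marker(0.13, 0.8))      # ▼  NXDOMAIN, Alexa 1M
print(deviation_marker(33.95, 40.22))   # ■  Top 5 AS share, Umbrella 1M
print(deviation_marker(43.05, 36.69))   # ■  TLS-capable, Umbrella 1M
```

Because the σ condition is left out, this simplified rule should not be expected to reproduce every marker in the table.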

Figure 6: DNS characteristics in the Top 1M lists and general population of about 158M domains. (a) % of NXDOMAIN responses. (b) % of IPv6 Adoption. (c) % of CAA-enabled domains.

and unreachable, but still commonly appearing at high ranks in Umbrella, probably through requests from legacy clients.

This also highlights a challenge in Majestic’s ranking mechanism: while counting the number of links to a certain website is quite stable over time, it also reacts slowly to domain closure.

Tracking IPv6 adoption has been the subject of several scientific studies such as [126, 127]. We compare IPv6 adoption across top lists and the general population, for which we count the number of domains that return at least one routed IPv6 address as an AAAA record or within a chain of up to 10 CNAMEs. At 11–13%, we find IPv6 enablement across top lists to significantly exceed the general population of domains at 4%. Also, the highest adoption lies with Umbrella, a good indication for IPv6 adoption: when the most frequently resolved DNS names support IPv6, many subsequent content requests are enabled to use IPv6.

CAA Adoption: Exemplary for other record types, we also investigate the adoption of Certification Authority Authorization (CAA) records in top lists and the general population. CAA is a rather new record type, and has become mandatory for CAs to check before certificate issuance, cf. [122, 128]. We measure CAA adoption as described in [122], i.e., the count of base domains with an issue or issuewild tag set. Similar to IPv6 adoption, we find CAA adoption among top lists (1–2%) to significantly exceed adoption among the general population at 0.1%. Even more stunning, the Top 1k lists feature a CAA adoption of up to 28%, distorting the 0.1% in the general population by two orders of magnitude.

Takeaway: The DNS-focused results above highlight that top lists may introduce a picture where results significantly differ from the general population, a popularity bias to be kept in mind. Figure 6 also shows that Umbrella, and recently Alexa, can have different results when using a different day. The daily differences, ranging, e.g., from 1.5–1.8% of CAA adoption around a mean of 1.7% for Alexa, are not extreme, but should be accounted for.

8.1.2 Hosting Infrastructure Perspective. Domains can be hosted by users themselves, by hosting companies, or a variety of CDNs. The hosting landscape is subject to a body of research that is using top lists to obtain target domains. Here, we study the share of hosting infrastructures in different top lists.

CDN Prevalence: We start by studying the prevalence of CDNs in top lists and the general population of all com/net/org domains. Since many CDNs use DNS CNAME records, we perform daily DNS resolutions in April 2018, querying all domains both raw and www-prefixed. We match the observed CNAME records against a list of CNAME patterns for 77 CDNs [129] to identify CDN use. We first observe that the prevalence of CDNs differs by list and domain rank (see Table 5), with all Top 1M lists exceeding the general population by at least a factor of 2, and all Top 1k lists exceeding the general population by at least a factor of 20. When

[Figure 7 panels. (a) Ratio of detected CDNs by list (x-axis) & weekday (y-axis). (b) Share of top 5 CDNs, Top 1k vs. Top 1M vs. com/net/org. (c) Share of top 5 CDNs, daily pattern (Mon - Sun). (d) Share of top 5 ASes, Top 1k vs. Top 1M vs. com/net/org.]

Values shown in panel (a):

Weekday   Alexa 1k   Alexa 1M   Umbrella 1k   Umbrella 1M   Majestic 1k   Majestic 1M
M         0.265      0.069      0.297         0.087         0.362         0.026
T         0.280      0.055      0.295         0.102         0.361         0.026
W         0.279      0.055      0.303         0.106         0.361         0.026
T         0.282      0.055      0.302         0.105         0.363         0.026
F         0.279      0.056      0.301         0.106         0.363         0.026
S         0.275      0.059      0.299         0.105         0.360         0.026
S         0.263      0.068      0.297         0.100         0.361         0.026

CDN legend entries (panels b and c): Akamai, Amazon, Zenedge, Google, Fastly, WordPress, Highwinds, Incapsula, Instart, Facebook, CHN Net.
AS legend entries (panel d): Akamai (20940), Microsoft (8075), Cloudflare (13335), GoDaddy (26496), Google (15169), OVH (16276), Amazon (16509), 1&1 (8560), Amazon (14618), Confluence (40034), Fastly (54113).

Figure 7: Overall CDN ratio, ratio of top 5 CDNs, and ratio of top 5 ASes, dependent on list, list type, and weekday.

grouping the CDN ratio per list by weekdays (see Figure 7a), we observe minor influences of weekends vs. weekdays due to the top list dynamics described in §6.2.

After adoption of CDNs in general, we study the structure of CDN adoption. We analyse the top 5 CDNs and show their distribution in Figure 7 to study if the relative share is stable over different lists. We thus show the fraction of domains using one of the top 5 CDNs for both a subset of the Top 1k and the entire list of Top 1M domains per list. We first observe that the relative share of the top 5 CDNs differs by list and rank (see Figure 7b), but is generally very high at >80%. The biggest discrepancy is between using a top list and focusing on the general population of com/net/org domains. Google dominates the general population with a share of 71.17% due to many (private) Google-hosted sites. Domains in top lists are more frequently hosted by typical CDNs (e.g., Akamai). Grouping the CDN share per list by weekday in Figure 7c shows a strong weekend/weekday pattern for Alexa, due to the rank dynamics discussed in §6.2. Interestingly, the weekend days have a higher share of Google DNS, indicating that more privately-hosted domains are visited on the weekend.

These observations highlight that using a top list or not has significant influence on the top 5 CDNs observed, and, if using Alexa, the day of list creation as well.

ASes: We next analyse the distribution of Autonomous Systems (AS) that announce a DNS name’s A record in BGP, as per Route Views data from the day of the measurement, obtained from [130]. First, we study the AS diversity by counting the number of different ASes hit by the different lists. We observe lists to experience large differences in the number of unique ASes (cf., Table 5); while Alexa Top 1M hits the most ASes, i.e., 19511 on average, Umbrella Top 1M hits the fewest, i.e., 16922 on average. To better understand which ASes contribute the most IPs, we next focus on studying the top ASes. Figure 7d shows the top 5 ASes for the Top 1k and Top 1M domains of each list, as well as the set of com/net/org domains. We observe that both the set and share of involved ASes differ by list.

We note that the general share of the top 5 ASes is 40% in the general population, compared to an average of 53% in the Top 1k and an average of 27% in the Top 1M lists.

In terms of structure, we further observe that GoDaddy (AS26496) clearly dominates the general population with a share of 25.99%, while it only accounts for 2.74% on the Alexa Top 1M and for 4.45% on the Majestic Top 1M.

While Alexa and Majestic share a somewhat similar distribution for both the Top 1M and Top 1k lists, Umbrella offers a quite different view, with a high share of Google/AWS hosted domains, which also relates to the CDN analysis above.

This view is also eye-opening for other measurement studies: with a significant share of a population hosted by 5 different ASes, it is of no surprise that certain higher layer characteristics differ.

8.2 TLS

In line with the prevalence of TLS studies amongst the surveyed top list papers in §3, we next investigate TLS adoption among lists and the general population. To probe for TLS support, we instruct zgrab to visit each domain via HTTPS for one day per list in May 2018. As in the previous section, we measure all domains with and without www prefix (except for Umbrella that contains subdomains), as we found greater coverage for these domains. We were able to successfully establish TLS connections with 74.65% of the Alexa, 62.89% of the Majestic, 43.05% of the Umbrella, and 36.69% of the com/net/org domains (cf., Table 5). For Top 1k domains, TLS support further increases by 15–30% per list.

These results show TLS support to be most pronounced among Alexa-listed domains, and that support in top lists generally exceeds the general population.

HSTS: As one current research topic [21], we study the prevalence of HTTP Strict Transport Security (HSTS) among TLS enabled domains. We define a domain to be HSTS-enabled if the domain provides a valid HSTS header with a max-age setting >0. Out of the TLS-enabled domains, 12.17% of the Alexa, 11.65% of the Umbrella, 8.44% of the Majestic, and 7.63% of the com/net/org domains provide HSTS support (see Table 5). Only inspecting Top 1k domains again increases support significantly to 22.9% for Alexa, 13.0% for Umbrella, and 27.4% for Majestic. HSTS support is, again, over-represented in top lists.

8.3 HTTP/2 Adoption

One academic use of top lists is to study the adoption of upcoming protocols, e.g., HTTP/2 [125, 131]. The motivation for probing top listed domains can be based on the assumption that popular domains are more likely to adopt new protocols and are thus promising targets to study. We thus exemplify this effect and the influence of different top lists by probing domains in top lists and the general population for their HTTP/2 adoption.
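The HSTS criterion from §8.2 (a header carrying a max-age greater than zero) can be sketched as a small parser for the Strict-Transport-Security header value. The function name `hsts_enabled` is hypothetical and the check is deliberately minimal; what exactly the authors count as a "valid" header is not specified, so the handling below is an assumption.

```python
def hsts_enabled(header_value):
    """Return True if a Strict-Transport-Security header value contains a
    max-age directive with a value greater than zero."""
    if not header_value:
        return False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')   # the value may be quoted
            # Only plain ASCII digits are accepted as a delta in seconds.
            return value.isascii() and value.isdigit() and int(value) > 0
    return False   # no max-age directive present


print(hsts_enabled("max-age=31536000; includeSubDomains"))   # True
print(hsts_enabled("max-age=0"))                             # False
print(hsts_enabled("includeSubDomains"))                     # False
```

In a measurement pipeline, this check would be applied to the response headers collected by the TLS scan; a max-age of 0 is treated as not enabled because it instructs clients to drop the policy.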

Figure 8: HTTP/2 adoption over time for the Top 1k and Top 1M lists and com/net/org domains. (y-axis: Share [%], 0 to 60; x-axis: date, 2018-04-11 to 2018-05-08; series: Alexa 1M, Umbrella 1M, Majestic 1M, c/n/o, Alexa 1k, Umbrella 1k, Majestic 1k.)

We try to fetch the domains’ via HTTP/2 by using the nghttp2 library. We again www-prefix all domains in Alexa and Majestic. In case of a successfully established HTTP/2 connection, we issue a GET request for the / page of the domain. We follow up to 10 redirects and if actual data for the landing page is transferred via HTTP/2, we count the domain as HTTP/2-enabled. We probe top lists on a daily basis and the larger zone file on a weekly basis.

We show HTTP/2 adoption in Figure 8. First, we observe that the HTTP/2 adoption of all com/net/org domains is 7.84% on average and thus significantly lower than for domains listed in Top 1M lists (up to 26.6% for Alexa), and even more so for Top 1k lists, which show adoption around 35% or more.

One explanation is that, as shown above, popular domains are more likely hosted on progressive infrastructures (e.g., CDNs) than the general population.

We next investigate HTTP/2 adoption between top lists based on Figure 8. Unsurprisingly, we observe HTTP/2 adoption differs by list and by weekday for those lists with a weekday pattern (cf., §6.2). We also note the extremely different result when querying the Top 1k lists as compared to the general population.

8.4 Takeaway

We have analysed the properties of top lists and the general population across many layers, and found that top lists (i) generally show significantly more extreme measurement results, e.g., protocol adoption. This effect is pronounced, up to 2 orders of magnitude, for the Top 1k domains. Results can (ii) be affected by a weekly pattern, e.g., the % of protocol adoption may yield a different result when using a list generated on a weekend as compared to a weekday. This is a significant limitation to be kept in mind when using top lists for measurement studies.

9 DISCUSSION

We have shown in §3 that top lists are being frequently used in scientific studies. We acknowledge that using top lists has distinct advantages: they provide a set of relevant domains at a small and stable size that can be compared over time. However, the use of top lists also comes with certain disadvantages, which we have explored in this paper.

First, while it is the stated purpose of a top list to provide a sample biased towards the list’s specific measure of popularity, these samples do not represent the general state in the Internet well: we have observed in §8 that almost all conceivable measurements suffer significant bias when using a Top 1M list, and excessive bias in terms of orders of magnitude when using a Top 1k list. This indicates that domains in top lists exhibit behaviour significantly different from the general population; quantitative insights based on top list domains likely will not generalise.

Second, we have shown that top lists can significantly change from day to day, rendering results of one-off measurements unstable. A similar effect is that lists may be structurally different on weekends and weekdays, yielding differences in results purely based on the day of week when a list was downloaded.

Third, the choice of a certain top list can significantly influence measurement results as well, e.g., for CDN or AS structure (cf., §8.1.2), which stems from different lists having different sampling biases. While these effects can be desired, e.g., to find many domains that adopt a certain new technology, it leads to bad generalisation of results to “the Internet”, and results obtained from measuring top lists must be interpreted very carefully.

9.1 Recommendation for Top List Use

Based on our observations, we develop specific recommendations for the use of top lists. §3 has revealed that top lists are used for different purposes in diverse fields of research. The impact of the specific problems we have discussed will differ by study purpose, which is why we consider the following a set of core questions to be considered by study authors, and not a definite guideline.

Match Choice of List to Study Purpose: Based on a precise understanding of what the domains in a list represent, an appropriate list type should be carefully chosen for a study. For example, the Umbrella list represents DNS names queried by many individual clients using OpenDNS (not only PCs, but also mobile devices and IoT devices), some bogus, some non-existent, but overall a representation of typical DNS traffic, and may be a good base for DNS analyses. The Alexa list gives a solid choice of functional websites frequently visited by users, and may be a good choice for a human web-centered study. Through its link-counting, the Majestic list also includes “hidden” links, and may include domains frequently loaded, but not necessarily knowingly requested by humans. To obtain a reasonably general picture of the Internet, we recommend scanning a large sample, such as the “general population” used in §8, i.e., the set of all com/net/org domains.

Consider Stability: With lists changing up to 50% per day, insights from measurement results might not even generalise to the next day. For most measurement studies, stability should be increased by conducting repeated, longitudinal measurements. This also helps to avoid bias from weekday vs. weekend lists.

Document List and Measurement Details: Studies should note the precise list (e.g., Alexa Global Top 1M), its download date, and the measurement date to enable basic replicability. Ideally, the list used should be shared in a paper’s dataset.

9.2 Desired Properties for Top Lists

Based on the challenges discussed in this work, we derive various properties that top lists should offer:

Consistency: The characteristics, mainly structure and stability, of top lists should be kept static over time. Where changes are required due to the evolving nature of the Internet, these should be announced and documented.

Transparency: Top list providers should be transparent about their ranking process and biases to help researchers understand and potentially control those biases. This may, of course, contradict the business interests of commercial list providers.

Stability: List stability faces a difficult trade-off: While capturing the ever-evolving trends in the Internet requires recent data, many typical top list uses are not stable to changes of up to 50% per day. We hence suggest that lists should be offered as long-term (e.g., a 90-day sliding window) and short-term (e.g., only the most recent data) versions.

al. [90] discuss the challenges of using top lists for web measurements. They demonstrate that results vary when including www subdomains, and investigate root causes such as routing failures. The aforementioned recent work by le Pochat et al. [114] focuses on manipulating top lists.

11 CONCLUSION

To the best of our knowledge, this is the first comprehensive study of the structure, stability, and significance of popular Internet top lists. We have shown that use of top lists is significant among networking papers, and found distinctive structural characteristics per list. List stability has revealed interesting highlights, such as up to 50% churn per day for some lists.
We have closely investigated ranking 9.3 Ethical Considerations mechanisms of lists and manipulated a test domain’s Umbrella rank in a controlled experiment. Systematic measurement of top list We aim to minimise harm to all stakeholders possibly affected by domain characteristics and reproduction of studies has revealed our work. For active scans, we minimise interference by following that top lists in general significantly distort results from the general best scanning practices [132], such as maintaining a blacklist, us- population, and that results can depend on the day of week. We ing dedicated servers with meaningful rDNS records, websites, and closed our work with a discussion on desirable properties of top abuse contacts. We assess whether data collection can harm individ- lists and recommendations for top list use in science. We share code, uals and follow the beneficence principle as proposed by [133, 134]. data, and additional insights under Regarding list influencing in §7, the ethical implications of insert- ing a test domain into the Top 1M domains is small and unlikely to https://toplists.github.io cause any harm. In order to influence Umbrella ranks, we generated For long-term access, we provide an archival mirror at the TUM DNS traffic. For this, we selected query volumes unlikely tocause University Library: https://mediatum.ub.tum.de/1452290. problems with the OpenDNS infrastructure or the RIPE Atlas plat- Acknowledgements: We thank the scientific community for form. Regarding the RIPE Atlas platform, we spread probes across the engaging discussions and data sharing leading to this publica- the measurements as carefully as possible: 10k probes queried spe- tion, specifically Johanna Amann, Mark Allman, Matthias Wählisch, cific domains 100, 50, 10, and 1 times per day. In addition, 100, 1000, Ralph Holz, Georg Carle, Victor le Pochat, and the PAM’18 poster and 5000 probes performed an additional 100 queries per day. Per session participants. 
We thank the anonymous reviewers of the probe, that means 6,100 probes generated 261 queries per day (fewer IMC’18 main and shadow PCs for their comments, and Zakir Du- rumeric for shepherding this work. This work was partially funded than 11 queries per hour), and another 3,900 probes generated 161 by the German Federal Ministry of Education and Research under queries per day. Refer to Figure 5 to visualise the query volume. project X-Check (grant 16KIS0530), by the DFG as part of the CRC That implies a total workload of around 2,220,000 queries per day. 1053 MAKI, and the US National Science Foundation under grant As the OpenDNS service is anycasted across multiple locations, it number CNS-1564329. seems unlikely that our workload was a problem for the service. REFERENCES [1] Alexa. Top 1M sites. https://www.alexa.com/topsites, May 24, 2018. http: 10 RELATED WORK //s3.dualstack.us-east-1.amazonaws.com/alexa-static/top-1m.csv.zip. We consider our work to be related to three fields: [2] Cisco. Umbrella Top 1M List. https://umbrella.cisco.com/blog/blog/2016/12/14/ cisco-umbrella-1-million/. Sound Internet Measurements: There exists a canon of work [3] Majestic. https://majestic.com/reports/majestic-million/, May 17, 2018. with guidelines on sound Internet measurements, such as [132, 135– [4] Matthew Woodward. Ahrefs vs Majestic SEO – 1 Million Reasons Why Ahrefs 137]. These set out useful guidelines for measurements in general, Is Better. https://www.matthewwoodward.co.uk/experiments/ahrefs-majestic- seo-1-million-domain-showdown/, May 23, 2018. but do not specifically tackle the issue of top lists. [5] Alexa. The Alexa Extension. https://web.archive.org/web/20160604100555/http: Measuring Web Popularity: Understanding web popularity //www.alexa.com/toolbar, June 04, 2016. [6] Alexa. Alexa Increases its Global Traffic Panel. https://blog.alexa.com/alexa- is important for marketing as well as for business performance panel-increase/, May 17, 2018. analyses. 
A book authored by Croll and Power [138] warns site [7] Alexa. Top 6 Myths about the Alexa Traffic Rank. https://blog.alexa.com/top-6- owners about the potential instrumentation biases present in Alexa myths-about-the-alexa-traffic-rank/, May 22, 2018. [8] Alexa. What’s going on with my Alexa Rank? https://support.alexa.com/hc/en- ranks, specially with low-traffic sites. Besides that, there is aset us/articles/200449614, May 17, 2018. of blog posts and articles from the SEO space about anecdotal [9] Majestic. Majestic Million CSV now free for all, daily. https://blog.majestic. problems with certain top lists, but none of these conduct systematic com/development/majestic-million-csv-daily/, May 17, 2018. [10] Quantcast. https://www.quantcast.com/top-sites/US/1. analyses [4, 139]. [11] Statvoo. https://statvoo.com/top/sites, May 17, 2018. Limitations of Using Top Lists in Research: Despite the fact [12] Google. Chrome User Experience Report. https://developers.google.com/web/ tools/chrome-user-experience-report/, May 15, 2018. that top lists are widely used by research papers, we are not aware [13] SimilarWeb Top Websites Ranking. https://www.similarweb.com/top-websites. of any study focusing on the content of popular lists. However, a [14] Vasileios Giotsas, Philipp Richter, Georgios Smaragdakis, Anja Feldmann, number of research papers mentioned the limitations of relying on Christoph Dietzel, and Arthur Berger. Inferring BGP Blackholing Activity in the Internet. In Proceedings of the 2017 Internet Measurement Conference, IMC those ranks for their specific research efforts [45, 67]. Wählisch et ’17, November 2017. IMC ’18, October 31-November 2, 2018, Boston, MA, USA Scheitle et al.

[15] Srikanth Sundaresan, Xiaohong Deng, Yun Feng, Danny Lee, and Amogh Dhamdhere. Challenges in Inferring Internet Congestion Using Throughput Measurements. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[16] Zhongjie Wang, Yue Cao, Zhiyun Qian, Chengyu Song, and Srikanth V. Krishnamurthy. Your State is Not Mine: A Closer Look at Evading Stateful Internet Censorship. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[17] Savvas Zannettou, Tristan Caulfield, Emiliano De Cristofaro, Nicolas Kourtelris, Ilias Leontiadis, Michael Sirivianos, Gianluca Stringhini, and Jeremy Blackburn. The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[18] Austin Murdock, Frank Li, Paul Bramsen, Zakir Durumeric, and Vern Paxson. Target Generation for Internet-wide IPv6 Scanning. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[19] Jan Rüth, Christian Bormann, and Oliver Hohlfeld. Large-scale Scanning of TCP’s Initial Window. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[20] Umar Iqbal, Zubair Shafiq, and Zhiyun Qian. The Ad Wars: Retrospective Measurement and Analysis of Anti-adblock Filter Lists. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[21] Johanna Amann, Oliver Gasser, Quirin Scheitle, Lexi Brent, Georg Carle, and Ralph Holz. Mission Accomplished?: HTTPS Security After Diginotar. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[22] Joe DeBlasio, Stefan Savage, Geoffrey M. Voelker, and Alex C. Snoeren. Tripwire: Inferring Internet Site Compromise. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[23] Shehroze Farooqi, Fareed Zaffar, Nektarios Leontiadis, and Zubair Shafiq. Measuring and Mitigating Oauth Access Token Abuse by Collusion Networks. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[24] Janos Szurdi and Nicolas Christin. Typosquatting. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, November 2017.
[25] Enrico Bocchi, Luca De Cicco, Marco Mellia, and Dario Rossi. The Web, the Users, and the MOS: Influence of HTTP/2 on User Experience. In International Conference on Passive and Active Network Measurement, pages 47–59. Springer, 2017.
[26] Ilker Nadi Bozkurt, Anthony Aguirre, Balakrishnan Chandrasekaran, P Brighten Godfrey, Gregory Laughlin, Bruce Maggs, and Ankit Singla. Why is the Internet so Slow?! In International Conference on Passive and Active Network Measurement, pages 173–187. Springer, 2017.
[27] Stephen Ludin. Measuring What is Not Ours: A Tale of 3rd Party Performance. In Passive and Active Measurement: 18th International Conference, PAM 2017, Sydney, NSW, Australia, March 30-31, 2017, Proceedings, volume 10176, page 142. Springer, 2017.
[28] Kittipat Apicharttrisorn, Ahmed Osama Fathy Atya, Jiasi Chen, Karthikeyan Sundaresan, and Srikanth V Krishnamurthy. Enhancing WiFi Throughput with PLC Extenders: A Measurement Study. In International Conference on Passive and Active Network Measurement, pages 257–269. Springer, 2017.
[29] Alexander Darer, Oliver Farnan, and Joss Wright. FilteredWeb: A framework for the Automated Search-based Discovery of Blocked URLs. In Network Traffic Measurement and Analysis Conference (TMA), 2017, pages 1–9. IEEE, 2017.
[30] Jelena Mirkovic, Genevieve Bartlett, John Heidemann, Hao Shi, and Xiyue Deng. Do You See Me Now? Sparsity in Passive Observations of Address Liveness. In Network Traffic Measurement and Analysis Conference (TMA), 2017, pages 1–9. IEEE, 2017.
[31] Quirin Scheitle, Oliver Gasser, Minoo Rouhi, and Georg Carle. Large-scale Classification of IPv6-IPv4 Siblings with Variable Clock Skew. In Network Traffic Measurement and Analysis Conference (TMA), 2017, pages 1–9. IEEE, 2017.
[32] Paul Pearce, Ben Jones, Frank Li, Roya Ensafi, Nick Feamster, Nick Weaver, and Vern Paxson. Global Measurement of DNS Manipulation. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[33] Rachee Singh, Rishab Nithyanand, Sadia Afroz, Paul Pearce, Michael Carl Tschantz, Phillipa Gill, and Vern Paxson. Characterizing the Nature and Dynamics of Tor Exit Blocking. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[34] Tao Wang and Ian Goldberg. Walkie-Talkie: An Efficient Defense Against Passive Website Fingerprinting Attacks. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[35] Sebastian Zimmeck, Jie S Li, Hyungtae Kim, Steven M Bellovin, and Tony Jebara. A Privacy Analysis of Cross-device Tracking. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[36] Taejoong Chung, Roland van Rijswijk-Deij, Balakrishnan Chandrasekaran, David Choffnes, Dave Levin, Bruce M Maggs, Alan Mislove, and Christo Wilson. A Longitudinal, End-to-End View of the DNSSEC Ecosystem. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[37] Katharina Krombholz, Wilfried Mayer, Martin Schmiedecker, and Edgar Weippl. "I Have No Idea What I’m Doing" – On the Usability of Deploying HTTPS. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[38] Adrienne Porter Felt, Richard Barnes, April King, Chris Palmer, Chris Bentzel, and Parisa Tabriz. Measuring HTTPS Adoption on the Web. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[39] Ben Stock, Martin Johns, Marius Steffens, and Michael Backes. How the Web Tangled Itself: Uncovering the History of Client-Side Web (In)Security. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[40] Pepe Vila and Boris Köpf. Loophole: Timing Attacks on Shared Event Loops in Chrome. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[41] Jörg Schwenk, Marcus Niemietz, and Christian Mainka. Same-Origin Policy: Evaluation in Modern Browser. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[42] Stefano Calzavara, Alvise Rabitti, and Michele Bugliesi. CCSP: Controlled Relaxation of Content Security Policies by Runtime Policy Composition. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[43] Fang Liu, Chun Wang, Andres Pico, Danfeng Yao, and Gang Wang. Measuring the Insecurity of Mobile Deep Links of Android. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), August 2017.
[44] Paul Pearce, Roya Ensafi, Frank Li, Nick Feamster, and Vern Paxson. Augur: Internet-Wide Detection of Connectivity Disruptions. In IEEE Symposium on Security and Privacy, 2017.
[45] Sumayah Alrwais, Xiaojing Liao, Xianghang Mi, Peng Wang, Xiaofeng Wang, Feng Qian, Raheem Beyah, and Damon McCoy. Under the Shadow of Sunshine: Understanding and Detecting Bulletproof Hosting on Legitimate Service Provider Networks. In IEEE Symposium on Security and Privacy, 2017.
[46] Oleksii Starov and Nick Nikiforakis. XHOUND: Quantifying the Fingerprintability of Browser Extensions. In IEEE Symposium on Security and Privacy, 2017.
[47] Chaz Lever, Platon Kotzias, Davide Balzarotti, Juan Caballero, and Manos Antonakakis. A Lustrum of Network Communication: Evolution and Insights. In IEEE Symposium on Security and Privacy, 2017.
[48] James Larisch, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove, and Christo Wilson. CRLite: A Scalable System for Pushing All TLS Revocations to All Browsers. In IEEE Symposium on Security and Privacy, 2017.
[49] Nethanel Gelernter, Senia Kalma, Bar Magnezi, and Hen Porcilan. The Password Reset MitM Attack. In IEEE Symposium on Security and Privacy, 2017.
[50] Milad Nasr, Amir Houmansadr, and Arya Mazumdar. Compressive Traffic Analysis: A New Paradigm for Scalable Traffic Analysis. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[51] Daiping Liu, Zhou Li, Kun Du, Haining Wang, Baojun Liu, and Haixin Duan. Don’t Let One Rotten Apple Spoil the Whole Barrel: Towards Automated Detection of Shadowed Domains. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[52] Thomas Vissers, Timothy Barron, Tom Van Goethem, Wouter Joosen, and Nick Nikiforakis. The Wolf of Name Street: Hijacking Domains Through Their Nameservers. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[53] Ada Lerner, Tadayoshi Kohno, and Franziska Roesner. Rewriting History: Changing the Archived Web from the Present. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[54] Yinzhi Cao, Zhanhao Chen, Song Li, and Shujiang Wu. Deterministic Browser. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[55] Yizheng Chen, Yacin Nadji, Athanasios Kountouras, Fabian Monrose, Roberto Perdisci, Manos Antonakakis, and Nikolaos Vasiloglou. Practical Attacks Against Graph-based Clustering. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[56] Sebastian Lekies, Krzysztof Kotowicz, Samuel Groß, Eduardo A. Vela Nava, and Martin Johns. Code-Reuse Attacks for the Web: Breaking Cross-Site Scripting Mitigations via Script Gadgets. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[57] Milad Nasr, Hadi Zolfaghari, and Amir Houmansadr. The Waterfall of Liberty: Decoy Routing Circumvention That Resists Routing Attacks. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[58] Panagiotis Kintis, Najmeh Miramirkhani, Charles Lever, Yizheng Chen, Rosa Romero-Gómez, Nikolaos Pitropakis, Nick Nikiforakis, and Manos Antonakakis. Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.

[59] Doowon Kim, Bum Jun Kwon, and Tudor Dumitraş. Certified Malware: Measuring Breaches of Trust in the Windows Code-Signing PKI. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[60] Peter Snyder, Cynthia Taylor, and Chris Kanich. Most Websites Don’t Need to Vibrate: A Cost-Benefit Approach to Improving Browser Security. In CCS ’17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, November 2017.
[61] Benjamin Greschbach, Tobias Pulls, Laura M. Roberts, Phillip Winter, and Nick Feamster. The Effect of DNS on Tor’s Anonymity. In 24th Annual Network and Distributed System Security Symposium, NDSS 2017, NDSS ’17, February 2017.
[62] Tobias Lauinger, Abdelberi Chaabane, Sajjad Arshad, William Robertson, Christo Wilson, and Engin Kirda. Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web. In 24th Annual Network and Distributed System Security Symposium, NDSS 2017, NDSS ’17, February 2017.
[63] Najmeh Miramirkhani, Oleksii Starov, and Nick Nikiforakis. Dial One for Scam: A Large-Scale Analysis of Technical Support Scams. In 24th Annual Network and Distributed System Security Symposium, NDSS 2017, NDSS ’17, February 2017.
[64] Marc Anthony Warrior, Uri Klarman, Marcel Flores, and Aleksandar Kuzmanovic. Drongo: Speeding Up CDNs with Subnet Assimilation from the Client. In CoNEXT ’17: Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. ACM, December 2017.
[65] Shinyoung Cho, Rishab Nithyanand, Abbas Razaghpanah, and Phillipa Gill. A Churn for the Better. In CoNEXT ’17: Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. ACM, December 2017.
[66] Wai Kay Leong, Zixiao Wang, and Ben Leong. TCP Congestion Control Beyond Bandwidth-Delay Product for Mobile Cellular Networks. In CoNEXT ’17: Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. ACM, December 2017.
[67] Mario Almeida, Alessandro Finamore, Diego Perino, Narseo Vallina-Rodriguez, and Matteo Varvello. Dissecting DNS Stakeholders in Mobile Networks. In CoNEXT ’17: Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. ACM, December 2017.
[68] David Naylor, Richard Li, Christos Gkantsidis, Thomas Karagiannis, and Peter Steenkiste. And Then There Were More: Secure Communication for More Than Two Parties. In CoNEXT ’17: Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. ACM, December 2017.
[69] Thomas Holterbach, Stefano Vissicchio, Alberto Dainotti, and Laurent Vanbever. Swift: Predictive fast reroute. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM ’17. ACM, August 2017.
[70] Costas Iordanou, Claudio Soriente, Michael Sirivianos, and Nikolaos Laoutaris. Who is Fiddling with Prices?: Building and Deploying a Watchdog Service for E-commerce. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM ’17. ACM, August 2017.
[71] Vaspol Ruamviboonsuk, Ravi Netravali, Muhammed Uluyol, and Harsha V. Madhyastha. Vroom: Accelerating the Mobile Web with Server-Aided Dependency Resolution. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM ’17. ACM, August 2017.
[72] Sanae Rosen, Bo Han, Shuai Hao, Z. Morley Mao, and Feng Qian. Push or Request: An Investigation of HTTP/2 Server Push for Improving Mobile Web Performance. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[73] Elias P. Papadopoulos, Michalis Diamantaris, Panagiotis Papadopoulos, Thanasis Petsas, Sotiris Ioannidis, and Evangelos P. Markatos. The Long-Standing Privacy Debate: Mobile Websites vs Mobile Apps. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[74] Sanae Rosen, Bo Han, Shuai Hao, Z. Morley Mao, and Feng Qian. Extended Tracking Powers: Measuring the Privacy Diffusion Enabled by Browser Extensions. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[75] Deepak Kumar, Zane Ma, Zakir Durumeric, Ariana Mirian, Joshua Mason, J. Alex Halderman, and Michael Bailey. Security Challenges in an Increasingly Tangled Web. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[76] Gareth Tyson, Shan Huang, Felix Cuadrado, Ignacio Castro, Vasile C. Perta, Arjuna Sathiaseelan, and Steve Uhlig. Exploring HTTP Header Manipulation In-The-Wild. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[77] Luca Soldaini and Elad Yom-Tov. Inferring Individual Attributes from Search Engine Queries and Auxiliary Information. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[78] Kyungtae Kim, I Luk Kim, Chung Hwan Kim, Yonghwi Kwon, Yunhui Zheng, Xiangyu Zhang, and Dongyan Xu. J-Force: Forced Execution on JavaScript. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[79] Dolière Francis Some, Nataliia Bielova, and Tamara Rezk. On the Content Security Policy Violations due to the Same-Origin Policy. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[80] Milivoj Simeonovski, Giancarlo Pellegrino, Christian Rossow, and Michael Backes. Who Controls the Internet? Analyzing Global Threats using Property Graph Traversals. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[81] Ajaya Neupane, Nitesh Saxena, and Leanne Hirshfield. Neural Underpinnings of Website Legitimacy and Familiarity Detection: An fNIRS Study. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[82] Qian Cui, Guy-Vincent Jourdan, Gregor Bochmann, Russell Couturier, and Vio Onut. Tracking Phishing Attacks Over Time. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[83] Li Chang, Hsu-Chun Hsiao, Wei Jeng, Tiffany Hyun-Jin Kim, and Wei-Hsi Lin. Security Implications of Redirection Trail in Popular Websites Worldwide. In Proceedings of the 26th International Conference on World Wide Web, 2017.
[84] Enrico Mariconti, Jeremiah Onaolapo, Sharique Ahmad, Nicolas Nikiforou, Manuel Egele, Nick Nikiforakis, and Gianluca Stringhini. What’s in a Name? Understanding Profile Name Reuse on . In Proceedings of the 26th International Conference on World Wide Web, 2017.
[85] ACM. Result and Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-badging, Acc. Jan 18 2017.
[86] Quirin Scheitle, Matthias Wählisch, Oliver Gasser, Thomas C. Schmidt, and Georg Carle. Towards an Ecosystem for Reproducible Research in Computer Networking. In ACM SIGCOMM 2017 Reproducibility Workshop, 2017.
[87] Matthias Flittner, Mohamed Naoufal Mahfoudi, Damien Saucez, Matthias Wählisch, Luigi Iannone, Vaibhav Bajpai, and Alex Afanasyev. A Survey on Artifacts from CoNEXT, ICN, IMC, and SIGCOMM Conferences in 2017. SIGCOMM Comput. Commun. Rev., 48(1):75–80, April 2018.
[88] Damien Saucez and Luigi Iannone. Thoughts and Recommendations from the ACM SIGCOMM 2017 Reproducibility Workshop. ACM SIGCOMM Computer Communication Review, 48(1):70–74, 2018.
[89] Mark Allman. Comments On DNS Robustness. IMC, 2018.
[90] Matthias Wählisch, Robert Schmidt, Thomas C Schmidt, Olaf Maennel, Steve Uhlig, and Gareth Tyson. RiPKI: The tragic story of RPKI deployment in the Web ecosystem. In Proceedings of the 14th ACM Workshop on Hot Topics in Networks, page 11. ACM, 2015.
[91] Ralph Holz, Lothar Braun, Nils Kammenhuber, and Georg Carle. The SSL Landscape - A Thorough Analysis of the X.509 PKI Using Active and Passive Measurements. In IMC, Nov. 2011.
[92] The Internet Archive. Alexa Crawls. https://archive.org/details/alexacrawls, May 22, 2018.
[93] Mozilla. Public Suffix List: commit 2f9350. https://github.com/publicsuffix/list/commit/85fa8fbdf, Apr. 20, 2018.
[94] IANA. TLD Directory. http://data.iana.org/TLD/tlds-alpha-by-domain.txt, May 20, 2018.
[95] ICANN. Notices of Termination and Status of gTLD. https://www.icann.org/resources/pages/gtld-registry-agreement-termination-2015-10-09-en, Apr. 20, 2018.
[96] Nick Parsons. Stop using .IO Domain Names for Production Traffic. https://hackernoon.com/stop-using-io-domain-names-for-production-traffic-b6aa17eeac20, May 21, 2018.
[97] Matthew Bryant. The .io Error – Taking Control of All .io Domains With a Targeted Registration. https://thehackerblog.com/the-io-error-taking-control-of-all-io-domains-with-a-targeted-registration/, May 21, 2018.
[98] Tomislav Lombarovic. Be aware: How domain registrar can kill your business. https://www.uptimechecker.io/blog/how-domain-registrar-can-kill-your-business, May 21, 2018.
[99] ICANN. new gTLD Program Timeline. https://newgtlds.icann.org/en/program-status/timelinesen, Apr. 20, 2018.
[100] Abbas Razaghpanah, Rishab Nithyanand, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Mark Allman, Christian Kreibich, and Phillipa Gill. Apps, Trackers, Privacy, and Regulators: A Global Study of the Mobile Tracking Ecosystem. In NDSS, 2018.
[101] OpenDNS. Domain Tagging. https://domain.opendns.com.
[102] Abbas Razaghpanah, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Christian Kreibich, Phillipa Gill, Mark Allman, and Vern Paxson. Haystack: A multi-purpose mobile vantage point in user space. arXiv preprint arXiv:1510.01419, 2015.
[103] hpHosts. hpHosts Domain Blacklist, May 21, 2018. https://hosts-file.net/.
[104] Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric D Kolaczyk, and Nina Taft. Structural Analysis of Network Traffic Flows. In ACM SIGMETRICS Performance Evaluation Review, volume 32, pages 61–72. ACM, 2004.
[105] Konstantina Papagiannaki, Nina Taft, Z-L Zhang, and Christophe Diot. Long-term Forecasting of Internet Backbone Traffic: Observations and Initial Models. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications. IEEE Societies, volume 2, pages 1178–1188. IEEE, 2003.
[106] Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. Youtube Traffic Characterization: A View from the Edge. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 15–28. ACM, 2007.
[107] Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 1–14. ACM, 2007.
[108] Maurice G Kendall. A New Measure of Rank Correlation. Biometrika, 30(1/2):81–93, 1938.
[109] Alexa. How are Alexa’s traffic rankings determined? https://support.alexa.com/hc/en-us/articles/200449744, May 17, 2018.
[110] Adam Feuerstein. E-commerce loves Street: Critical Path plans encore. San Francisco Business Times, https://www.bizjournals.com/sanfrancisco/stories/1999/05/24/newscolumn4., May 1999.
[111] Alexa Specialist. http://www.improvealexaranking.com/, May 22, 2018.
[112] Rankboostup. https://rankboostup.com/, May 22, 2018.
[113] UpMyRank. http://www.upmyrank.com/, May 22, 2018.
[114] Victor Le Pochat, Tom Van Goethem, and Wouter Joosen. Rigging Research Results by Manipulating Top Websites Rankings. arXiv preprint arXiv:1806.01156, June 4, 2018.
[115] RIPE Atlas. Measurement IDs 124307{26,28-33}.
[116] RIPE Atlas. Measurement IDs 124674{03-10}.
[117] Majestic. About Majestic. https://blog.majestic.com/about/, May 22, 2018.
[118] Majestic. Majestic Million – Reloaded! https://blog.majestic.com/company/majestic-million-reloaded/, May 22, 2018.
[119] Majestic. A Million here... A Million there.... https://blog.majestic.com/case-studies/a-million-here-a-million-there/, May 22, 2018.
[120] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[121] Verisign. The Domain Name Industry Brief 2017Q4, 2018.
[122] Quirin Scheitle, Taejoong Chung, Jens Hiller, Oliver Gasser, Johannes Naab, Roland van Rijswijk-Deij, Oliver Hohlfeld, Ralph Holz, Dave Choffnes, Alan Mislove, and Georg Carle. A First Look at Certification Authority Authorization (CAA). ACM SIGCOMM CCR, April 2018.
[123] Roland van Rijswijk-Deij, Mattijs Jonker, Anna Sperotto, and Aiko Pras. A High-Performance, Scalable Infrastructure for Large-Scale Active DNS Measurements. IEEE JSAC, 2016.
[124] Oliver Gasser, Quirin Scheitle, Sebastian Gebhard, and Georg Carle. Scanning the IPv6 Internet: Towards a Comprehensive Hitlist. In TMA, 2016.
[125] T. Zimmermann, J. Rüth, B. Wolters, and O. Hohlfeld. How HTTP/2 pushes the web: An empirical study of HTTP/2 Server Push. In 2017 IFIP Networking Conference (IFIP Networking) and Workshops, pages 1–9, June 2017.
[126] Jakub Czyz, Mark Allman, Jing Zhang, Scott Iekel-Johnson, Eric Osterweil, and Michael Bailey. Measuring IPv6 Adoption. In ACM SIGCOMM, 2014.
[127] Steffie Jacob Eravuchira, Vaibhav Bajpai, Jürgen Schönwälder, and Sam Crawford. Measuring web similarity from dual-stacked hosts. In Network and Service Management (CNSM), 2016 12th International Conference on, pages 181–187. IEEE, 2016.
[128] Jukka Ruohonen. An Empirical Survey on the Early Adoption of DNS Certification Authority Authorization. arXiv preprint arXiv:1804.07604, 2018.
[129] Google. WebPagetest CDN domain list, cdn.h. https://github.com/WPO-Foundation/webpagetest/blob/master/agent/wpthook/cdn.h.
[130] University of Oregon. Route Views Project. http://www.routeviews.org.
[131] Matteo Varvello, Kyle Schomp, David Naylor, Jeremy Blackburn, Alessandro Finamore, and Konstantina Papagiannaki. Is the Web HTTP/2 Yet? In Thomas Karagiannis and Xenofontas Dimitropoulos, editors, Passive and Active Measurement, pages 218–232, Cham, 2016. Springer International .
[132] Zakir Durumeric, Eric Wustrow, and J. Alex Halderman. ZMap: Fast Internet-wide Scanning and Its Security Applications. In USENIX Security, 2013.
[133] David Dittrich, Erin Kenneally, et al. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. US Department of Homeland Security, 2012.
[134] Craig Partridge and Mark Allman. Ethical Considerations in Network Measurement Papers. Communications of the ACM, 2016.
[135] Vern Paxson. Strategies for Sound Internet Measurement. In Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pages 263–271. ACM, 2004.
[136] Mark Allman. On Changing the Culture of Empirical Internet Assessment. ACM Computer Communication Review, 43(3), July 2013. Editorial Contribution.
[137] Mark Allman, Robert Beverly, and Brian Trammell. Principles for measurability in protocol design. ACM SIGCOMM Computer Communication Review, 47(2):2–12, 2017.
[138] Alistair Croll and Sean Power. Complete web monitoring: watching your visitors, performance, communities, and competitors. O’Reilly Media, Inc., 2009.
[139] Michael Arrington. Alexa’s Make Believe Internet; Alexa Says YouTube Is Now Bigger Than Google. Alexa Is Useless. https://techcrunch.com/2007/11/25/alexas-make-believe-internet/, 2007.