Linköping University | Department of Computer and Information Science Bachelor’s thesis, 16 ECTS | Informationsteknologi 2019 | LIU-IDA/LITH-EX-G--19/037--SE

DNS Performance – A study of free, public and popular DNS servers in 2019

DNS prestanda – En studie av gratis, publika och populära DNS servrar år 2019

Filip Ström Felix Zedén Yverås

Supervisor : Niklas Carlsson Examiner : Marcus Bendtsen

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to down- load, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Filip Ström, Felix Zedén Yverås

Students in the 5 year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates in demonstrating a working product and a written report documenting the results of the practical development process, including requirements elicitation. During the final stage of the semester, students form small groups and specialise in one topic, resulting in a bachelor thesis. The current report presents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.

Abstract

The Domain Name System (DNS) is an integral part of making the internet a more human-friendly place. However, it comes with the cost of an added abstraction layer that introduces extra latency in many aspects of the modern computing experience - a great selling point for many DNS services. In this thesis we look at the performance of DNS services and servers through the scope of 51 unique free, public and popular DNS servers. We use a purpose-built tool, DNSHoarder, to collect 714,000 datapoints covering 250 different hostnames of varying popularity over seven days. From this data we find that most DNS servers exhibit a similar relative distribution of response times and that performance differences between IPv4 and IPv6 are minor or nonexistent. We also find that network distance and quality have a large effect on DNS performance, and that network latency is a major limiting factor for further DNS performance improvements.

Acknowledgments

First, a big thank you to our supervisor Niklas Carlsson for his support and feedback during the writing of this thesis. We would also like to thank Xiangfeng Yang of the Department of Mathematics for taking the time to help out two random students suddenly appearing at his door. Additionally, we would like to extend our gratitude to Philippe Biondi and the Scapy community for enabling this work with their excellent Scapy tool. Finally we would like to thank two members of the Scapy community in particular, Gabriel (@gpotter2) and Guillaume Valadon (@guedou), for their personal assistance. Without your help, the results presented herein would have suffered greatly.

Contents

Abstract iv

Acknowledgments v

Contents vi

List of Figures viii

List of Tables ix

List of Code x

1 Introduction 1
1.1 Motivation 2
1.2 Aim 2
1.3 Research Questions 2
1.4 Delimitations 2

2 Background 3
2.1 The Domain Name System (DNS) 3
2.2 Related Work 5

3 Method 8
3.1 Tool Development 8
3.2 Automated Data Collection 10

4 Results 12
4.1 Overview of Failed DNS Queries 12
4.2 IPv4 vs IPv6 Performance 17
4.3 Performance Variation Between DNS Servers 23
4.4 Performance Based on Hostname Popularity 26

5 Discussion 27
5.1 Results 27
5.2 Method 28
5.3 The Work in a Wider Context 30

6 Conclusion 31
6.1 Going Further 31

Bibliography 33

A Structure of DNSHoarder Output Data 36

B Additional Graphs 41

C DNSHoarder CLI Arguments 49

D .gitlab-ci.yml 50

E DNS Servers 52

F Hostnames 54

List of Figures

3.1 How input files are combined into DNSHoarder jobs ...... 9

4.1 DNS servers failing 100% of queries 13
4.2 Excerpt of traceroute for DNS servers failing 100% of queries 13
4.3 Excerpt of DNS servers failing some queries, but not all 14
4.4 Excerpt of traceroute for DNS servers failing some queries, but not all 14
4.5 Excerpt of DNS servers intermittently failing 100% of queries 15
4.6 Excerpt of traceroute for DNS servers intermittently failing 100% of queries 15
4.7 Excerpt of DNS servers completing almost 100% of queries 16
4.8 Excerpt of traceroute for DNS servers completing almost 100% of queries 17
4.9 Average performance over time, per DNS server (IPv4 / A record) 18
4.10 Average performance over time, per DNS server (IPv6 / AAAA record) 19
4.11 Performance per DNS server (IPv4 / A record) 20
4.12 Performance per DNS server (IPv6 / AAAA record) 21
4.13 Performance per day of week 22
4.14 Excerpt of response sizes per DNS server 23
4.15 IPv4 performance comparison based on median and average performance values of primary and secondary DNS servers. Blue favors the primary DNS server. 24
4.16 IPv6 performance comparison based on median and average performance values of primary and secondary DNS servers. Blue favors the primary DNS server. 24
4.17 Performance per day of week 25
4.18 Median performance per hostname 26

5.1 Disparity in response size where the 209.88.198.133 and 208.76.50.50 DNS servers yielded significantly smaller responses than other DNS servers 27
5.2 Comparison of the heatmap of failed requests before and after considering empty DNS responses invalid 28

A.1 Hierarchical structure of DNSHoarder’s output data ...... 37

B.1 Failed DNS queries over time, per DNS server (IPv4 / A record) 42
B.2 Failed DNS queries over time, per DNS server (IPv6 / AAAA record) 43
B.3 Average distance and ping performance of the routes to each DNS server as measured by traceroute 44
B.4 Average failed DNS queries per DNS server, over time 45
B.5 Average failed DNS queries per DNS server, over time (excluding DNS servers that fail 100% of the requests) 45
B.6 Response size per DNS server (IPv4 / A record) 46
B.7 Response size per DNS server (IPv6 / AAAA record) 47
B.8 Performance per day of week 48

List of Tables

3.1 DNS services providing a primary and secondary DNS server ...... 11

A.1 Attributes for the entry 36
A.2 Attributes for the IP entry 38
A.3 Attributes for the UDP entry 38
A.4 Attributes for the DNS entry 39
A.5 Attributes for each qd entry 39
A.6 Attributes for each an entry 39
A.7 Attributes for each ns entry 40
A.8 Attributes for each ar entry 40

E.1 DNS servers used for data collection ...... 53

F.1 Hostnames used for data collection ...... 61

List of Code

C.1 Available arguments and associated descriptions for DNSHoarder 49
D.1 Gitlab CI configuration file (.gitlab-ci.yml) used to automate data collection 50

1 Introduction

The Domain Name System (DNS) is an integral part of making the internet a more human-friendly place, allowing the use of memorable hostnames instead of series of numbers - and, with the introduction of IPv6, even letters - for addressing individual, interconnected devices. The caveat is an additional layer of abstraction that introduces extra latency in many aspects of the modern computing experience. To combat the overhead introduced by DNS, multiple free and public DNS services have appeared, claiming to offer ever better performing services. This thesis was based around and grew from the following description:

Characterization and comparison of HTTPS-based public DNS services
Although we heavily rely on DNS to obtain server-to-IP mappings for all our web traffic, the DNS queries and replies are typically sent in plain text, making it possible to intercept, eavesdrop or even perform man-in-the-middle attacks on this traffic. In addition to DNSSEC (which has seen slow deployment), a number of public, end-to-end encrypted services have recently appeared. In this thesis you are expected to set up a measurement methodology to evaluate and compare the performance observed (plus other things such as infrastructure contacted by your example clients, the answers provided by these services, etc.) when using public DNS services such as Google’s 8.8.8.8, CloudFlare’s 1.1.1.1, and others. Baseline comparisons should also be done against regular DNS from some example locations. The methodology needs to be carefully designed, and the experiments should be automated and sufficiently large that we can use the dataset to answer meaningful questions and do statistical comparisons for different subsets of websites (e.g., from the top-million websites). We want to be able to go beyond the information provided by public websites such as dnsperf (https://www.dnsperf.com/#!dns-resolvers). The goal is that the datasets collected with the tool can be used to help answer some example research questions. As with the code projects above, the dataset and tools should not be shared publicly until we potentially publish a research article using these tools and datasets. This project requires good programming skills. Familiarity with using web APIs is also recommended.


1.1 Motivation

For this thesis we want to explore these claims and the actual performance of DNS servers, the underlying reasons for performance differences and the current performance challenges. We also want to explore the performance impact of the introduction of IPv6 addresses as well as the performance differences between hostnames of varying popularity. By exploring these aspects we hope to provide a snapshot of the current DNS landscape and highlight areas in need of further study.

1.2 Aim

The purpose of this thesis is to study the performance of current free, public and popular DNS servers and what these results can tell about factors affecting DNS performance in general.

1.3 Research Questions

From the perspective of today’s free, public and popular DNS services:

1. How and why do IPv4 (A record) and IPv6 (AAAA record) DNS lookups differ in re- spect to observed performance?

2. How big is the variation in observed DNS performance between current free, public and popular DNS servers?

3. Are there any differences in DNS performance between commonly looked up host- names and less commonly looked up hostnames?

1.4 Delimitations

This thesis will mainly focus on the performance aspect of DNS servers that are currently (i) free, (ii) publicly available and (iii) popular. We use the following definitions:

• Performance The time between the sending of a DNS query packet and the receipt of all response packets, as measured by the DNSHoarder tool.

• Free DNS DNS services that do not require a user fee for usage.

• Publicly Available DNS DNS services that do not require registration and may be used by anyone simply by knowing the IP address of the service’s DNS servers.

• Popular DNS DNS services likely to be found and used by an average user looking for an alternative to their ISP provided DNS service.

Further, focus is primarily placed on the performance of IPv4 (A record) and IPv6 (AAAA record) lookups, as we believe these to be some of the more common DNS lookups. The comparison of IPv4 and IPv6 is also of special interest with regard to the ongoing migration from IPv4 to IPv6. Due to limited resources, the number of DNS servers and hostnames selected for data collection had to be severely limited. This is the main reason for the focus on popular DNS services, the low number of runs per day and the rather extreme narrowing of the hostname list used in the study.

2 Background

In this chapter, some general information about DNS and its inner workings is presented before diving deeper into some related work.

2.1 The Domain Name System (DNS)

The Domain Name System (DNS) is a system of servers that helps us navigate between web- sites in an easy manner. Instead of remembering every IP address for different resources, humans can just use the domain names and then have a DNS server provide them with the correct IP address. This makes it a lot easier to navigate between resources on the internet.

2.1.1 Basic Principles

DNS is a distributed system of computers that helps humans navigate on the internet. It can be likened to a phonebook, but instead of containing different peoples’ phone numbers, it contains the IP addresses of different peoples’ servers. There are three major components in DNS. The first component consists of the domain name space and resource records (RR) that make up a tree structure. A user can ask for specific information about a host by using the host’s domain name and a specific resource type. The second component is the name servers that hold the information about the previously mentioned tree structure. A name server usually only caches complete information on a subset of the total domain tree. If the requested information is not available, the server points to other servers that have the complete information. A name server is said to be an authority for the parts it has complete information about, and this authoritative information is organised in zones to create redundancy. The third component is the resolver. The resolver is the program that takes the information from the name server and gives it to the client. The resolver’s task is to answer the client’s query. This is done either by connecting to the right name server or by getting information about a server that holds the requested information. Therefore a resolver must have at least one name server accessible [11, 28, 29]. The DNS message format is divided into two types of messages: queries and replies. Both queries and replies consist of a header - which contains a four bit field known as the opcode, used to separate queries - and the same four sections: question (qd), answer (an), authority (ns) and additional (ar) [29]. The first section is the question section, which contains the query name (qdname) and other parameters. The answer section contains the RRs that answer the query in the question section. The authority section carries RRs that describe other authoritative servers. The last section, the additional section, contains RRs which might be helpful when using the RRs from other sections.
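The message layout described above (header with opcode and section counts, followed by a question section) can be illustrated by hand-building a query with only Python’s standard library. This is a minimal sketch: the function name, transaction ID and hostname are example choices for illustration, not part of any implementation discussed in this thesis.

```python
import struct

def build_dns_query(qname: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query message (header + question section)."""
    # Header: 16 bit ID, flags (QR=0 for a query, opcode=0, RD=1), then the
    # four section counts: qdcount, ancount, nscount, arcount.
    flags = 0x0100  # recursion desired
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # Question (qd) section: the qname as length-prefixed labels ending in a
    # zero byte, followed by qtype and qclass (1 = IN).
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    ) + b"\x00"
    return header + labels + struct.pack("!HH", qtype, 1)

query = build_dns_query("example.com")
print(len(query))  # 29 bytes: 12 byte header + 17 byte question section
```

A reply reuses the same header and question layout but sets the QR flag and fills in the answer, authority and additional sections.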

2.1.2 How DNS Lookups are Performed

CloudFlare cites 8 steps in a DNS lookup, some of which may not be required due to local browser caching [25]:

The 8 steps in a DNS lookup:

1. A user types ’example.com’ into a web browser and the query travels into the Internet and is received by a DNS recursive resolver.
2. The resolver then queries a DNS root nameserver (.).
3. The root server then responds to the resolver with the address of a Top Level Domain (TLD) DNS server (such as .com or .net), which stores the information for its domains. When searching for example.com, our request is pointed toward the .com TLD.
4. The resolver then makes a request to the .com TLD.
5. The TLD server then responds with the IP address of the domain’s nameserver, example.com.
6. Lastly, the recursive resolver sends a query to the domain’s nameserver.
7. The IP address for example.com is then returned to the resolver from the nameserver.
8. The DNS resolver then responds to the web browser with the IP address of the domain requested initially.

As can be noted from the above excerpt from CloudFlare, four different types of servers are involved in a DNS lookup: a root nameserver, a Top Level Domain (TLD) nameserver, an authoritative nameserver and a recursive resolver [24, 30]. The root nameservers are the servers at the top of the DNS hierarchy and are known by every DNS resolver. This means that if no data relating to a particular query is known by a resolver, it will contact the root nameservers. The root nameserver can then point to the correct TLD nameservers, which will hopefully be able to help resolve the request. Although there are many root nameservers, there are only 13 different root server IP addresses. The reason for this lies in limitations in the original architecture [22, 23]. The TLD nameserver maintains information about the different top level domains, such as .com and .net, and holds information about all websites with that extension. When requesting information from a TLD nameserver, it will provide details on an authoritative nameserver that contains the requested information. The authoritative nameserver is the last server in a DNS lookup and is the server that actually holds the requested information, such as the IP address of a web site. The recursive resolver acts as the middleman between the user and the nameserver. It might have the requested data cached but can make a separate request to a root nameserver if it does not. From the root nameserver, the recursive resolver will in that case get an IP address to a TLD nameserver, which can be further queried for an authoritative nameserver. This authoritative nameserver can then be queried by the recursive resolver to find the requested IP address.
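To make the chain of referrals concrete, the following toy sketch walks the same root, TLD and authoritative steps using invented in-memory tables in place of real servers; the server names are illustrative only, and the returned address is simply an example value.

```python
# Invented stand-in tables; a real resolution contacts actual servers.
ROOT = {"com": "tld-com"}                        # root knows the TLD servers
TLD = {"tld-com": {"example.com": "ns-example"}} # TLD knows authoritative NS
AUTH = {"ns-example": {"example.com": "93.184.216.34"}}  # NS holds A records

def resolve(hostname: str) -> str:
    tld = hostname.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]              # steps 2-3: root points to the TLD
    ns = TLD[tld_server][hostname]      # steps 4-5: TLD points to the NS
    return AUTH[ns][hostname]           # steps 6-7: NS returns the address

print(resolve("example.com"))  # 93.184.216.34
```

A real recursive resolver would additionally cache each intermediate answer, which is why several of the eight steps can often be skipped.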


2.1.3 Resource Record Types

The DNS system utilizes a 16 bit value to specify different types of data present in a resource record (RR) [28, 29]. By querying for different resource types, different pieces of information about a domain name may be requested. Among others, the information available includes the authoritative name server for an alias (NS record), preferentially ranked mail exchanges for an alias (MX record) and the canonical name for an alias (CNAME record) [28, 29]. As the definitions for available resource types are spread out across RFCs, it is not easy to know their exact number without a deeper investigation. However, as an indication, the Internet Assigned Numbers Authority (IANA) lists around 85 different RR types [3], and Wikipedia’s List of DNS record types [42] includes 40 resource records, 4 other types and pseudo resource records, as well as 37 obsolete record types. As this thesis mainly utilizes A and AAAA records, these record types are discussed in more detail below.

A Record

An A record contains a 32 bit internet address in the form of four octets [29, 32], more generally known as an IPv4 address [40]. This record type was introduced at the same time as DNS [29] and was assigned the RR type value 1 [3]. It may be noted that a host may have multiple IPv4 addresses associated with it at a time, in which case it will also have multiple A records associated with it [29].

AAAA Record

Like the A record, the AAAA record also contains an internet address, however in the form of an IPv6 address [40]. An IPv6 address consists of 128 bits split into eight 16 bit pieces [19], which makes it unsuitable for the 32 bit A record [40]. In response, the AAAA record was introduced. The AAAA record type has been assigned the RR type value 28 (decimal) [3, 40] and, as with A records, a single host may have multiple AAAA records associated with it. In the case of an AAAA query, all associated AAAA resource records will be returned [40].
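The size difference between the two record types can be seen by decoding example record data (rdata) with Python’s standard library; the two addresses below are merely illustrative values.

```python
import socket

# A record rdata: four octets holding a 32 bit IPv4 address.
a_rdata = bytes([93, 184, 216, 34])
ipv4 = socket.inet_ntop(socket.AF_INET, a_rdata)
print(ipv4)  # 93.184.216.34

# AAAA record rdata: sixteen octets holding a 128 bit IPv6 address
# (eight 16 bit pieces).
aaaa_rdata = bytes.fromhex("26062800022000010248189325c81946")
ipv6 = socket.inet_ntop(socket.AF_INET6, aaaa_rdata)
print(ipv6)  # 2606:2800:220:1:248:1893:25c8:1946
```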

2.2 Related Work

DNS has been used for a long time and many studies related to DNS have been performed since its introduction in the 1980s. The following section strives to provide insight into some of this work and will act as the staging ground for our own research.

2.2.1 DNS Robustness

In his 2018 paper ‘Comments on DNS Robustness’ [2], Mark Allman notes the importance of robustness for the domain name system. Robustness is what ensures a DNS service will remain operational, even during partial outages. A DNS service lacking in robustness could potentially cause delays or even connection failures for clients. However, “DNS was designed to facilitate robustness”, Allman states, referencing RFC 1034 and 2182. RFC 1034 [28] formalizes the requirement that each DNS zone maintain at least two DNS servers for redundancy purposes, and RFC 2182 [14] recommends using more. RFC 2182 additionally recommends the physical separation of nameservers to ensure local disruptions do not affect the entire service. While Allman notes that these requirements “[...] are insufficient to achieve robust operation”, he later finds that roughly 28% of second-level domains (SLDs) do not yet fulfill them. Not fulfilling even these basic requirements makes the SLDs more vulnerable, and Allman recommends that owners of the involved SLDs take steps to resolve the situation.


A second-level domain is the next highest domain type, directly below the top-level domain (TLD). For example, the example.com domain consists of the TLD .com and the SLD example.

2.2.2 Distance to DNS Servers

In a 2011 study, Huang et al. [20] presented comparisons between local DNS servers (LDNS) and public DNS servers. In the study, Huang et al. noted that only about 2.5% of studied clients were using public DNS at the time. The study mainly focused on three public DNS services - Google, Level3 and OpenDNS - and was conducted using LDNS and public DNS in Europe, North America and South America. The authors found that in Europe, 80% of clients were located within 253 km of their closest LDNS, while the best public DNS service, in this case Level3, was located within 1228 km of these same clients. In North America the result was similar to the European one, with public DNS services found to be further away from clients than their local counterparts. Here, OpenDNS was found to be the closer of the public DNS services, with 80% of the clients being located within 616 km of their ISP assigned LDNSs and within 1504 km of OpenDNS. In the South American comparison it was found that Level3 and OpenDNS did not have any servers in the area, since the closest server for South Americans using Level3 and OpenDNS was located in North America. Google did provide a local server, which was nevertheless located over 3000 km away for 80% of the clients. The same figure for LDNS was within a few hundred kilometers.

2.2.3 DNS Caching

Sometimes when a DNS lookup is performed, the first contacted server will not hold the requested record. In this case the request is redirected to another DNS server that may hold it instead. This process might take a few redirects, with more and more time stacking up for each extra request. To alleviate this problem, save time and reduce the number of unnecessary redirects, caching is performed along the way. There are two main ways for a cached DNS record to be deleted. The first is when the server’s cache capacity is reached. In this case the least requested records will be deleted. The second is when a record is no longer valid. The validity of a record is defined by the authoritative nameserver that distributed the record and is called the time to live (TTL). The TTL is a 32 bit signed integer which dictates for how long servers are allowed to distribute the record [29]. In a study by Schomp et al. [37], the authors found that cache evictions due to capacity limits are infrequent. Instead, the more common reason for cache eviction was found to be the records’ TTL value. One reason for this turned out to be local modifications of the TTL value set by the authoritative nameserver. This was found to be more common for large TTLs than for small ones, with 64% of large TTLs found to be reduced whereas only 11% of small TTLs were.
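A minimal sketch of TTL-driven eviction (the second deletion mechanism above) might look as follows; the class and its API are invented for illustration and are unrelated to any real resolver’s cache.

```python
import time

class TTLCache:
    """Illustrative TTL-bound DNS cache: records expire, capacity ignored."""

    def __init__(self):
        self._store = {}  # name -> (record, expiry timestamp)

    def put(self, name, record, ttl):
        # The TTL set by the authoritative nameserver bounds how long
        # this record may be served from the cache.
        self._store[name] = (record, time.monotonic() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        record, expires = entry
        if time.monotonic() >= expires:  # TTL elapsed: evict the record
            del self._store[name]
            return None
        return record

cache = TTLCache()
cache.put("example.com", "93.184.216.34", ttl=60)
print(cache.get("example.com"))  # 93.184.216.34, still within its TTL
```

A capacity-limited cache would additionally evict the least requested entries when full, the first mechanism described above.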

2.2.4 DNS Cache Poisoning and DNSSEC

The prominent and vital position of DNS has made it the alluring target of many attacks. One of these attacks is known as DNS cache poisoning. The goal of a DNS cache poisoning attack is for the attacker to be able to provide false DNS responses to clients, potentially redirecting them to malicious hosts. In a study named ‘Internet-wide study of DNS cache injections’ [26], the authors evaluate injection vulnerabilities allowing cache poisoning attacks. From these evaluations it is found that a vast majority of platforms show significant security problems and risk persistent poisoning with small effort, including platforms operated by large enterprises and internet service providers.


A way of counteracting these kinds of attacks is allowing for and enforcing the verification of DNS records, which is exactly what the DNS security extension (DNSSEC) was designed to do. DNSSEC allows clients to verify the integrity and authenticity of DNS records, and its security properties are derived from its hierarchical public key infrastructure (PKI). The DNSSEC PKI establishes a chain of trust that mirrors the infrastructure of the DNS hierarchy, with the root of trust located at the DNS root zone. This means that a domain in a zone can be authenticated only if the zones above it support DNSSEC. Though significant work has been done to introduce DNSSEC, data from 2017 by Chung et al. [7] showed that actual adoption has been very slow. While 90.5% of generic TLDs and 47% of country-code TLDs had enabled DNSSEC, only 0.6% of the .com SLDs and 1% of the .org SLDs had actually published the DNSKEY records used by resolvers to verify their signature. Based on the low level of adoption among SLDs, Chung et al. note that very few DNS responses provide the security and authenticity DNSSEC could provide. Likewise, whereas 83% of the resolvers were found to request the relevant records for DNSSEC, only 12% of these were found to actually perform any validation [6].

3 Method

From the initial description of the thesis project, it was clear that a tool for data collection was to be developed and that the thesis would center on a set of collected data. In this chapter, the developed tool is first described in more depth. Then follows a description of how the investigated dataset was selected. Finally, we describe how the automated data collection was executed. Some figures and tables related to these descriptions may also be found in appendices A, C, D, E and F.

3.1 Tool Development

The tool developed for data collection came to be known as DNSHoarder and was developed in Python during the initial phases of the thesis work. To provide as much data as possible, the tool was designed as a packet sniffer, meaning it is able to capture individual network packets. The Scapy Python module [5] was used to provide most of DNSHoarder’s core functionality, i.e. sending DNS queries and capturing the response packets.

3.1.1 Tool Overview

The DNSHoarder tool accepts multiple command line arguments - the complete list with descriptions can be found in appendix C. Three of these make up the input data, which consists of three separate CSV-formatted files:

1. a list of DNS servers to contact,

2. a list of hostnames to perform DNS lookups on, and

3. a list of resource record types.

From these lists, individual jobs containing one item from each list are created such that all possible combinations are covered by exactly one job (Figure 3.1). The jobs are ordered so as to spread out requests to individual DNS servers as much as possible. The purpose of this is to reduce the load on individual DNS servers and to reduce the likelihood of triggering any countermeasures due to a high request volume.


DNS Server   Hostname   Resource Record Type
DNS 1        host 1     type 1
DNS 2        host 2     type 2
DNS 3        host 3     type 3

Job #   DNS Server   Hostname   Resource Record Type
1       DNS 1        host 1     type 1
2       DNS 2        host 1     type 1
3       DNS 3        host 1     type 1
4       DNS 1        host 2     type 1
5       DNS 2        host 2     type 1
6       DNS 3        host 2     type 1
7       DNS 1        host 3     type 1
8       DNS 2        host 3     type 1
9       DNS 3        host 3     type 1
10      DNS 1        host 1     type 2
11      DNS 2        host 1     type 2
12      DNS 3        host 1     type 2
13      DNS 1        host 2     type 2
14      DNS 2        host 2     type 2
15      DNS 3        host 2     type 2
16      DNS 1        host 3     type 2
17      DNS 2        host 3     type 2
18      DNS 3        host 3     type 2
19      DNS 1        host 1     type 3
20      DNS 2        host 1     type 3
21      DNS 3        host 1     type 3
22      DNS 1        host 2     type 3
23      DNS 2        host 2     type 3
24      DNS 3        host 2     type 3
25      DNS 1        host 3     type 3
26      DNS 2        host 3     type 3
27      DNS 3        host 3     type 3

Figure 3.1: How input files are combined into DNSHoarder jobs
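The combination scheme of Figure 3.1 can be sketched in a few lines of Python. The list contents are the figure’s placeholder names, and this is an illustrative reconstruction rather than DNSHoarder’s actual code.

```python
from itertools import product

# Placeholder names matching Figure 3.1.
servers = ["DNS 1", "DNS 2", "DNS 3"]
hostnames = ["host 1", "host 2", "host 3"]
rr_types = ["type 1", "type 2", "type 3"]

# One job per combination. Iterating the DNS server fastest means two
# consecutive jobs never target the same server, spreading out requests.
jobs = [(s, h, t) for t, h, s in product(rr_types, hostnames, servers)]

print(len(jobs))  # 27
print(jobs[0])    # ('DNS 1', 'host 1', 'type 1')
print(jobs[1])    # ('DNS 2', 'host 1', 'type 1')
```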


Each job is picked up by one of a configurable number of worker threads - by default the number of CPU cores times two, minus one - and attempted exactly once. The high number of threads is motivated by the fact that each job sends a very low number of packets and spends a relatively long time waiting for a response. If a response is not received within a configurable time limit (default 3000 ms), the job is reported as timed out. When a worker thread has completed a job, the results are submitted back to the main thread, after which a new job is picked up if available. While the worker threads perform data collection, the main thread is responsible for continuously outputting the results. Initial testing with a total of 23 threads, with output on an SSD, indicated that, on average, the single main thread was able to output results to disk faster than the worker threads were able to submit new results. This, in combination with the HDF5 output file format used by DNSHoarder, implies that the tool is able to process large amounts of response data with relatively low memory usage.
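The worker/main-thread split described above can be sketched with Python’s standard library as follows; run_job and the example job tuples are placeholders standing in for actual DNS queries, not DNSHoarder’s real implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Default worker count described above: CPU cores * 2 - 1. A high count
# is viable because each job mostly waits on the network.
workers = (os.cpu_count() or 1) * 2 - 1

def run_job(job, timeout_ms=3000):
    # Placeholder for one DNS query: a real job would send the query and
    # wait up to timeout_ms for a response, reporting a timeout otherwise.
    server, hostname, rr_type = job
    return {"server": server, "hostname": hostname, "status": "ok"}

jobs = [("8.8.8.8", "example.com", "A"), ("1.1.1.1", "example.com", "AAAA")]
with ThreadPoolExecutor(max_workers=workers) as pool:
    # The main thread collects results as workers complete their jobs.
    results = list(pool.map(run_job, jobs))
print(len(results))  # 2
```

Many threads waiting on I/O cost little; the design would not pay off for CPU-bound jobs.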

3.1.2 Output Data Format

The DNSHoarder tool stores result data in the HDF5 format [39] using the h5py [9] Python module. HDF5 is a binary data format [9] that is “[...] arranged in a POSIX-style hierarchy [...]” based on datasets (which can be likened to arrays) and groups (which can be likened to folders) [10]. The format also utilizes attributes for storing extra data associated with a dataset or group [10]. One key benefit of the HDF5 format is that it does not require the complete output file to be kept in memory. This greatly reduces DNSHoarder’s memory footprint and allows for larger scale data collection. The structure of the output files from DNSHoarder may be viewed in appendix A.
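The following sketch shows the kind of group/dataset/attribute layout h5py offers. The run_0001 group, attribute name and dataset contents are invented for illustration and do not reflect DNSHoarder’s actual schema (see appendix A for that).

```python
import os
import tempfile
import h5py  # third-party module, the same one DNSHoarder uses

path = os.path.join(tempfile.mkdtemp(), "results.h5")
with h5py.File(path, "w") as f:
    run = f.create_group("run_0001")         # groups act like folders
    run.attrs["dns_server"] = "8.8.8.8"      # attributes hold metadata
    run.create_dataset("response_times_ms",  # datasets act like arrays
                       data=[12.4, 13.1, 11.9])

# Data can be read back piecewise, without loading the whole file.
with h5py.File(path, "r") as f:
    server = f["run_0001"].attrs["dns_server"]
    times = list(f["run_0001/response_times_ms"][:])
print(server, times)
```

Because datasets are read and written in slices, only the slice being processed needs to reside in memory, which is the property the low-memory claim above rests on.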

3.2 Automated Data Collection

The data used in this report was collected over the span of seven days, starting on Friday the 3rd of May 2019 and concluding on Thursday the 9th of May. The data was collected every six hours using the aforementioned DNSHoarder tool, run with four worker processes. Each run was also set up to collect information on distance and route quality to each DNS server in the background using the traceroute application. The complete data collection setup may be found in appendix D. Each run took approximately 35 minutes and resulted in about 350 MB of data, for a total of almost 10 GB of data, or 714,000 data points. To automate the data collection, Gitlab’s Continuous Integration (CI) feature was used.
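The background traceroute collection might look roughly as follows. The invocation shown is an assumption based on common Linux traceroute usage; the exact commands used are those in appendix D.

```python
import subprocess

def trace(server_ip, max_hops=30):
    """Launch traceroute towards a DNS server without blocking.

    The flags are an assumption based on common Linux traceroute usage
    (-m sets the maximum hop count, matching the 30-hop limit mentioned
    in chapter 4); the actual setup used is given in appendix D.
    """
    return subprocess.Popen(
        ["traceroute", "-m", str(max_hops), server_ip],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )

# Typical usage: start all traces, run the DNS measurements, then
# collect the traceroute output afterwards.
# procs = [trace(ip) for ip in servers]
# outputs = [p.communicate()[0] for p in procs]
```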

3.2.1 Data Sources

The DNS servers used for the data collection were selected using the search query “best free public DNS servers 2019” on Google. The search yielded approximately 13,400,000 results, from which the first five results were selected. Collectively, these five sites listed 53 different DNS servers, of which 51 were listed as active (two servers belonging to Norton ConnectSafe were listed as retired) [1, 4, 16, 38, 43]. The complete list of DNS servers may be found in appendix E. During the selection of these DNS servers we noted that multiple DNS servers would often be listed for a single DNS service. At times, these DNS servers would even be listed as either primary or secondary. From the original list of 51 DNS servers, a subset of 7 DNS services were selected based on official documentation or promotion expressly preferring one DNS server over the other. The result may be found in Table 3.1. The hostnames used in the data collection were selected to represent both commonly and uncommonly looked up hostnames. Based on recommendations by Scheitle et al. [36], the Cisco Umbrella Top 1M list [8], which is based on actual DNS traffic, was selected to serve this purpose. This list was divided into five ranges, from each


DNS Service    Primary         Secondary        Source
CloudFlare     1.1.1.1         1.0.0.1          [33]
Quad9          9.9.9.9         149.112.112.112  [34]
DNS.Watch      84.200.69.80    84.200.70.40     [12]
Dyn            216.146.35.35   216.146.36.36    [13]
Yandex.DNS     77.88.8.8       77.88.8.1        [44]
Verisign       64.6.64.6       64.6.65.6        [41]
OpenDNS Home   208.67.222.222  208.67.220.220   [31]

Table 3.1: DNS services providing a primary and secondary DNS server

of which 100 linearly spaced hostnames were selected. These ranges were: 1-100, 101-1,000, 1,001-10,000, 10,001-100,000 and 100,001-1,000,000. Due to resource restrictions this list was then further reduced by removing every second entry, resulting in the list of 250 hostnames available in appendix F.
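The selection procedure described above, 100 linearly spaced ranks per range followed by dropping every second entry, can be sketched as follows. The exact rounding used when producing the list in appendix F may differ.

```python
import numpy as np

# Rank ranges from the Cisco Umbrella Top 1M list, as described above.
RANGES = [(1, 100), (101, 1_000), (1_001, 10_000),
          (10_001, 100_000), (100_001, 1_000_000)]

def select_ranks():
    """100 linearly spaced ranks per range, then every second entry
    removed, leaving 50 ranks per range and 250 in total."""
    ranks = []
    for lo, hi in RANGES:
        spaced = np.linspace(lo, hi, num=100).round().astype(int)
        ranks.extend(int(r) for r in spaced[::2])  # keep every second entry
    return ranks

ranks = select_ranks()
```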

3.2.2 Gitlab CI

To run the DNSHoarder tool and collect the data at regular intervals, Gitlab’s Continuous Integration (CI) feature was used. The service makes it possible to run preconfigured scripts at regular intervals using so-called runners. These runners work by setting up an environment - optionally in a Docker container - before cloning an associated git repository and executing a series of commands in this environment. After the commands have been executed, directories and files may optionally be saved as build artifacts. The complete Gitlab CI configuration used is available in appendix D and was scheduled in Gitlab using the cron syntax string 0 */6 3-9 5 * in the UTC timezone. The free tier available at Gitlab offers the use of shared runners run on the Google Cloud platform [18] for up to 2000 minutes per month [17]. The data collection performed used 1032 of these minutes.

4 Results

From the week of data collection, a number of graphs were generated to provide insight into the data. Most are presented below; some that were deemed too large or superfluous may be found in appendix B. It should be noted that most graphs are sorted Monday to Sunday instead of the chronologically correct Friday to Thursday. The reason for this was initial plans to compare daily fluctuations in performance, which were later abandoned.

4.1 Overview of Failed DNS Queries

Some of the first data explored was the number of failed DNS queries. Figures 4.1, 4.3, 4.5 and 4.7 show the failure rate of DNS requests as heatmaps, with each cell representing 250 requests made to a specific DNS server at a specific time. Note that these figures are excerpts from two larger heatmaps found in appendix B: Figures B.1 and B.2. Figures 4.2, 4.4, 4.6 and 4.8 are excerpts of Figure B.3. These figures visualize the average number of hops to reach each individual DNS server, as well as ping times for each hop as measured by traceroute. In these figures, each cell represents the average ping times measured at the specified hop distance, with grey signifying no response. In some cases, traceroute reached the upper limit of 30 hops for a server. The figures have been cut off at 14 hops, as the servers reaching 30 hops were the only ones still showing any data at the 14th hop, and that data consisted only of grey cells. Overall the data is quite similar for both IPv4 and IPv6 lookups. The one exception is the 209.88.198.133 DNS server, which seems to fail a significantly higher number of IPv6 lookups than IPv4 lookups (Figures 4.3a and 4.3b). Taking an overall impression of the collected dataset, we find four different groups of DNS servers.


(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.1: DNS servers failing 100% of queries

Figure 4.2: Excerpt of traceroute for DNS servers failing 100% of queries

The first group (Figure 4.1) makes up almost 12% of the dataset. These are servers failing 100% of requests received. When these servers are compared with the averaged traceroute data collected at each run (Figure 4.2) we find that most failing servers seem to have been unreachable though we cannot say for what reason.


(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.3: Excerpt of DNS servers failing some queries, but not all

Figure 4.4: Excerpt of traceroute for DNS servers failing some queries, but not all

The second group (Figure 4.3) consists of 3 servers: 209.88.198.133, 99.192.182.101 and 99.192.182.100. These servers appear to consistently fail a certain portion of requests but far from all. We believe this could be due to a number of reasons, including refusal to serve the number of requests sent by our tool and bad network conditions. The traceroute data in Figure 4.4 shows that all three of these servers are located relatively far away with moderate ping times.


(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.5: Excerpt of DNS servers intermittently failing 100% of queries

Figure 4.6: Excerpt of traceroute for DNS servers intermittently failing 100% of queries

The third group (Figure 4.5) consists of the servers either completing or failing 100% of requests. This behavior could indicate times of high load, either on the DNS servers or en route. While requests seem to fail in the same way for both IPv4 and IPv6, the failures themselves do not seem to follow a specific pattern. We also note that the servers in this group belong to two different DNS services, and that overlaps causing an entire service to be offline are very few. The traceroute data in Figure 4.6 generally shows a short route with short response times along the way. This data is compiled from multiple measurements, however, and can neither confirm nor deny the theory that the intermittent failures are due to temporary network overload.


(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.7: Excerpt of DNS servers completing almost 100% of queries


Figure 4.8: Excerpt of traceroute for DNS servers completing almost 100% of queries

The fourth and final group consists of the remaining DNS servers. These servers rarely fail a single request and, as might be expected, make up the clear majority. The traceroute data in Figure 4.8 shows a variety of route lengths and response times, as well as one DNS server failing the traceroute lookup.

4.2 IPv4 vs IPv6 Performance

Figures 4.9 and 4.10 show the average response times for individual DNS servers at the measured points in time. The color of a cell provides a quick overview of this average, and the rounded value is visible in the cell’s middle. Grey cells signify points in time where the specific DNS server did not return any responses.


Figure 4.9: Average performance over time, per DNS server (IPv4 / A record)


Figure 4.10: Average performance over time, per DNS server (IPv6 / AAAA record)

Figures 4.11 and 4.12 provide an overview of the distribution of response times per DNS server based on the total measured data. Note that these figures exclude servers failing all their requests. The whiskers signify the maximum and minimum values for each individual DNS server, with boxes encompassing 50% of the response times, from quartile 1 through quartile 3. The median value of each distribution is visible as an additional vertical line inside each box.
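The box plot elements described above correspond to a five-number summary, which could be computed along these lines (this is an illustrative sketch, not the plotting code used for the figures):

```python
import numpy as np

def box_stats(response_times_ms):
    """Five-number summary matching the box plots described above:
    whiskers at min/max, box from quartile 1 to quartile 3, and a line
    at the median."""
    data = np.asarray(response_times_ms, dtype=float)
    return {
        "min": float(data.min()),
        "q1": float(np.percentile(data, 25)),
        "median": float(np.median(data)),
        "q3": float(np.percentile(data, 75)),
        "max": float(data.max()),
    }

# Hypothetical response times in milliseconds, for illustration only.
stats = box_stats([12, 13, 14, 20, 100])
```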


Comparing the response time heatmap for IPv4 lookups (Figure 4.9) with that for IPv6 lookups (Figure 4.10), there are no clear and major differences in average response time for servers based on the record type alone. Neither are there any shared or obvious discrepancies between the distributions of response times for individual DNS servers (Figures 4.11 and 4.12). The most notable exception is the 1.0.0.1 DNS server, which seems to perform better for IPv4 lookups.

Figure 4.11: Performance per DNS server (IPv4 / A record)


Figure 4.12: Performance per DNS server (IPv6 / AAAA record)

Figure 4.13a shows the median response time based on all requests sent at a particular time, for IPv4 and IPv6. Figure 4.13b is similar but instead based on the average response time. It should be noted that the distribution of responses visible in Figures 4.11 and 4.12 indicates that this second value may be more unstable and influenced by outlying values. From these figures, we find a small indication that IPv6 lookups may be up to around 1 ms slower than IPv4 lookups. However, we are not confident our dataset is large enough to


(a) Median

(b) Average

Figure 4.13: Performance per day of week

draw this conclusion. Instead, we conclude that there may be a performance difference, but if there is, it is likely quite small. In Figure 4.13b, the general average response times are seen to be higher during the weekend. We do not believe these values indicate a general trend, but instead believe them to be related to the 109.69.8.51 DNS server, which shows increased response times for the period in question in Figures 4.9 and 4.10. A final interesting insight into the performance difference between IPv4 and IPv6 comes from the size difference in the answers returned for each type of lookup. Figure 4.14 shows the size distribution of all successful DNS requests made to a specific server. Again, the whiskers denote maximum and minimum values, whereas boxes start at quartile 1 and end at quartile 3. As before, the median value is denoted by a vertical line inside the boxes. From the size distribution in Figure 4.14a, the lower size limit seems to be more or less common across DNS servers. The distribution of response sizes, and the upper limit in particular, does however show some variation. Compared to Figure 4.14b, we find IPv6 lookups to return responses with very similar size distributions, indicating more widespread standardization. It is also clear that IPv6 lookups tend to return more data, making the previous statements about IPv6 lookups performing comparably to IPv4 lookups seem more impressive.


(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.14: Excerpt of response sizes per DNS server

4.3 Performance Variation Between DNS Servers

Returning to the heatmaps in Figures 4.9 and 4.10, there is a clear difference in average response time between different DNS servers, with values falling in the range of 10-680 ms. However, looking at the average response time for individual DNS servers over the week, their performance seems to be fairly consistent. This observation is further confirmed by the response time distributions in Figures 4.11 and 4.12. While there is a great deal of individual variation in response time, the overall distribution of responses appears similar for most DNS servers. Most responses are found at the lower end of the individual server’s performance spectrum, in a range of up to 100 ms, with the median response time at the lower end of this range. This suggests that most DNS servers essentially perform similarly and that, at a high level, variation in performance between DNS servers mainly originates from constant delays. While these delays could have multiple and overlapping explanations, such as time spent processing DNS requests, one major factor would appear to be route performance. When comparing the lower end of a DNS server’s performance spectrum (visible in Figures 4.11 and 4.12) to the corresponding distance and ping time visible in Figures 4.2, 4.4, 4.6 and 4.8 (or as a single figure in appendix B, Figure B.3), a pattern is visible. Servers with higher lower-bound response times generally have longer and worse performing routes to the client than servers with lower lower-bound response times.


4.3.1 Performance Variation Between Primary and Secondary DNS Servers

Some DNS services provide primary and secondary DNS servers, or promote an individual server in a way that clearly favors it over others provided by the service. A subset of DNS services doing this were listed in Table 3.1. By looking at the difference in response time between the server listed as secondary and the server listed as primary, we hoped to make observations on how using a DNS service’s primary DNS server would affect performance compared to its secondary DNS server. The results of these measurements are visible in Figures 4.15 and 4.16. The data is based on taking either the average or the median response time of all successful requests for both the primary and the secondary DNS server. The value for the primary server was then subtracted from the value for the secondary, with positive values indicating better primary response times and negative values indicating better secondary response times. The figures show the absolute value of these calculations, with blue bars representing positive values, i.e. better performing primary servers, and orange bars representing negative values, i.e. better performing secondary servers.
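The comparison described above, subtracting the primary server's aggregate response time from the secondary's, can be sketched as follows, using hypothetical response times for illustration:

```python
import statistics

def primary_advantage(primary_times, secondary_times, agg=statistics.median):
    """Aggregate response time of the secondary server minus that of the
    primary. Positive values mean the primary server performed better,
    matching the sign convention described above."""
    return agg(secondary_times) - agg(primary_times)

# Hypothetical response times in milliseconds, for illustration only.
median_diff = primary_advantage([12, 13, 15], [18, 19, 30])
mean_diff = primary_advantage([12, 13, 15], [18, 19, 30],
                              agg=statistics.mean)
```

Passing a different aggregation function produces the median-based and average-based variants shown in Figures 4.15 and 4.16.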

(a) Median performance (b) Average performance

Figure 4.15: IPv4 performance comparison based on median and average performance values of primary and secondary DNS servers. Blue favors the primary DNS server.

(a) Median performance (b) Average performance

Figure 4.16: IPv6 performance comparison based on median and average performance values of primary and secondary DNS servers. Blue favors the primary DNS server.

Before examining the results further, a short note on the meaning of the median and the average response difference (e.g. Figure 4.15a and Figure 4.15b). As is clear from Figures 4.11 and 4.12, the median response time is usually quite low, whereas the average response time can be quite strongly affected by requests that are slower to process. A difference in median response time should therefore be indicative of the general performance difference a user could expect from using one server over the other. A difference in average response time, on the other hand,

could be indicative of performance differences in cases involving requests that tend to take longer to answer. It should be noted, though, that the range and distribution of the involved values may make observations based on the average values less reliable. In Figures 4.11 and 4.12 we see that most servers appear to reach a lower limit in response time. This limit is further pronounced when the distributions of all requests are plotted together, as in Figure 4.17.

(a) IPv4 / A record

(b) IPv6 / AAAA record

Figure 4.17: Performance per day of week

These figures show the distribution of all response times based on the day of week, with each dot signifying the response time of a single request. The boxes, as before, start at quartile 1 and reach quartile 3, containing 50% of the values, and the horizontal line in each box marks the median value. Unlike before, the whiskers are now placed in a way meant to signify limits outside which individual values are outliers. This is a side effect of our graphing tool, however, and not one that was intended. From Figure 4.17 we find the previously mentioned lower limit to be around 12 ms, regardless of the day of week. This provides a means of comparison, where a time differential of 12 ms between primary and secondary server equals the time required to perform a DNS query to a highly performant DNS server. The first and most obvious observation is that the Verisign DNS service delivered substantially better performance on its primary DNS server than on its secondary. The difference appears to be mostly constant across the figures, and when the distance and response time from Figure 4.8 are taken into account, we find that Verisign’s primary DNS server has a shorter and better performing route to the client than its secondary DNS server.


In the cases of the remaining DNS services, the individual server routes are mostly of uniform length, or at least perform comparably. This is also consistent with the median differences for IPv4 (Figure 4.15a) and IPv6 (Figure 4.16a). For three of these DNS services the difference in median response time is nearly zero. For the other three, there are individual differences that reach close to five milliseconds. While the differences are slightly more pronounced in the case of IPv6, it does not appear that the DNS server listed as primary generally performs better than the DNS server listed as secondary. The individual differences that are visible could be explained by multiple factors, such as insufficient data, load balancing issues or fluctuations in route quality. The average performance difference between primary and secondary DNS servers tells a different story. In the IPv6 case, visible in Figure 4.16b, differences mostly fall within five milliseconds of zero, with DNS.Watch breaking the pattern to more clearly favor its secondary DNS server. While still close to zero, Cloudflare breaks a different pattern - that of being closest to zero in regards to median performance difference for both IPv4 and IPv6 - to slightly favor its secondary DNS server. In the IPv4 case (Figure 4.15b), the tendency to favor the secondary DNS servers is even clearer, with Cloudflare displaying a 20 ms time differential. It should be noted, however, that the primary DNS server provided by Cloudflare holds a response time distribution with an upper limit far exceeding that of its secondary DNS server (Figure 4.11), which could affect the average values.

4.4 Performance Based on Hostname Popularity

Figure 4.18 visualizes the median response time across all DNS servers for each individual hostname as blue dots. To make any trends more prominent, a regression line is shown in orange. The hostnames are ordered by popularity, from most popular on the left to least popular on the right (see appendix F for the original Cisco Umbrella list position of each hostname).

(a) IPv4 / A record (b) IPv6 / AAAA record

Figure 4.18: Median performance per hostname

For more popular hostnames, the response time tends to cluster and reaches a high of around 100 ms. For more uncommon hostnames there is a larger spread, with some median response times almost reaching 500 ms. These observations are confirmed by the regression line (orange), which holds a positive slope for both IPv4 and IPv6 lookups. The difference in clustering and the positive slope of the regression line may be explained by the fact that it is inefficient to cache more uncommon data, and that requests for more uncommon hostnames therefore require extra processing. A comparison of the data for IPv4 lookups with the data for IPv6 lookups (Figure 4.18) does not reveal any obvious differences in the distribution of the median response times. What may be noted is that the regression line is slightly steeper in the IPv4 case, indicating that IPv6 tends to have slightly more balanced performance between lookups for common and uncommon hostnames.
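The regression line described above can be reproduced with an ordinary least-squares line fit. The sketch below uses synthetic median response times for illustration only:

```python
import numpy as np

def popularity_trend(median_times_ms):
    """Fit a straight line to median response time versus popularity
    index (0 = most popular). A positive slope indicates that less
    popular hostnames tend to be answered more slowly."""
    x = np.arange(len(median_times_ms))
    slope, intercept = np.polyfit(x, np.asarray(median_times_ms, float), 1)
    return float(slope), float(intercept)

# Synthetic medians trending upwards, for illustration only.
slope, intercept = popularity_trend([20.0, 25.0, 30.0, 60.0, 120.0])
```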

5 Discussion

With the results chapter presenting and analyzing most of the collected data, this chapter instead takes a step back to look at some broader takeaways and things that made us think differently about our dataset. The chapter also takes a more critical stance, discussing limitations and factors affecting the reliability of the presented results and the produced tool. Finally, the importance of DNS performance in a wider context is discussed.

5.1 Results

The information extractable from the collected dataset is considerable, and even the results presented barely scratch the surface of what is possible to learn. The data in our graphs has surprised us on multiple occasions and forced us to think about and discuss our data in new ways.

Figure 5.1: Disparity in response size, where the 209.88.198.133 and 208.76.50.50 DNS servers yielded significantly smaller responses than other DNS servers.


Figure 5.2: Comparison of the heatmap of failed requests before and after considering empty DNS responses invalid.

One example of this is a disparity we found in the size of responses for one DNS server (Figure 5.1). This disparity prompted an investigation, in which we found a few DNS servers that would return valid DNS responses, except for the fact that they did not contain any of the requested information. This discovery led to a discussion on what should be considered a valid response. The result was a redefinition, where DNS responses having the fields ancount, arcount and nscount set to 0 (i.e. DNS responses missing any answer data) were to be considered invalid. This proved to affect multiple graphs, most notably the heatmaps of failed requests (appendix B, Figures B.1 and B.2), with dramatic effects on the handful of DNS servers that would return empty DNS responses (Figure 5.2). Another example was found in Figure 4.17, where a subset of the DNS queries placed themselves well below the otherwise seemingly lower response time limit of 12 ms, with some response times reaching into fractions of a millisecond. Further investigation and comparison with Figures 4.11 and 4.12 showed that the queries in question had been made to the 8.8.8.8 and 8.8.4.4 DNS servers, both belonging to Google. As noted in chapter 3, the servers used by Gitlab’s CI service are run by Google. The low response times would therefore indicate that Google’s DNS service is, at least partially, served from the same data center, or even server, that was used by Gitlab CI to run our data collection tool. We also noted that not all of the requests made to Google’s DNS servers would take less than 12 ms (Figures 4.11 and 4.12), potentially indicating requests where the Google DNS would need to perform external queries before responding.
When taking into account the traceroute data (full figure available in appendix B, Figure B.3) and the distinct line at 12 ms visible in Figure 4.17, this would seem to suggest that DNS servers are able to handle some queries very fast and that network latency is a major factor limiting the lower end of DNS performance.
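The redefinition of a valid DNS response introduced above (ancount, arcount and nscount all zero means invalid) can be sketched as a simple predicate. The field names mirror Scapy's DNS layer, but the function below is an illustration rather than the actual DNSHoarder code:

```python
def is_valid_response(ancount, nscount, arcount):
    """Redefined validity check: a DNS response carrying no answer,
    authority or additional records is treated as failed even if it is
    otherwise well-formed. The arguments mirror the ancount, nscount
    and arcount fields of Scapy's DNS layer."""
    return (ancount, nscount, arcount) != (0, 0, 0)

# An "empty" response from a misbehaving server versus a normal one.
empty_ok = is_valid_response(0, 0, 0)
normal_ok = is_valid_response(1, 0, 0)
```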

5.2 Method

Two major factors that had an effect on the results in this thesis were time and resources. The design of DNSHoarder as a packet sniffer, and the size of the generated data, placed certain requirements on the host machine, such as root access and simple artifact management. These requirements and the nature of the program made it harder to find suitable hosts. After discovering that DNSHoarder would work if run as root inside a Docker container, it was decided that the Gitlab CI service, with which we had previous experience, would likely be a good choice. However, this choice did come with major restrictions, such as a maximum of 2000 run minutes per month. It is unknown how much of an effect using a Docker container may have had on our results. Initially the plan was to perform data collection over several weeks with more data points per day. However, the limitations imposed by the Gitlab CI service made this difficult, and eventually we were forced to reduce the number of DNS servers, hostnames and DNS records

to look up drastically. Narrowing these down to the dataset used in this thesis took a long time, eventually leaving time to perform only a single data collection run. With fewer run restrictions, a larger dataset could have been investigated, resulting in broader and more conclusive results. During the data collection we also noted irregular offsets of a few hours between the scheduled and actual data collection runs. While data was still collected four times at spaced intervals each day, these offsets prevent us from looking closer at patterns throughout each day. One final issue stemming from the use of Gitlab’s CI service for data collection is that we are not entirely sure where the data collection has geographically taken place. It is likely that the geographical location of the client affects DNS response time, especially if a DNS service’s servers are located on another continent. We also cannot be sure whether the data collection location has been the same for all data collection runs, and what effects this may have had on the collected dataset. The main focus of this thesis has explicitly been free, public and popular DNS servers. However, our general impression after working through the DNS list in appendix E is that some of the servers investigated may not fulfill these requirements to the expected extent. An example of this are the 4.2.2.X DNS servers, which were recommended by multiple of the sources used to build the original DNS server list. Further investigation called the public aspect of these servers into question, however, with multiple sources, e.g. tummy.com [35], expressly advising against the use of these servers by the public. It is unknown how many of the DNS servers found with the current method are at risk of violating the limitations set, and we recommend that similar studies require official endorsement of public use from each DNS server’s underlying DNS service as part of the definition of a publicly available DNS server.
With most DNS servers providing answers to all DNS queries, we do not believe potential violations to have impacted the presented data to a major degree. Violations could, however, offer an alternative explanation for some of the DNS servers failing 100% of requests. We also note that the DNSHoarder tool has some shortcomings. During development we found the documentation of Scapy to be lacking, and the code responsible for performing DNS queries is therefore mostly based on example code from The Very Unofficial Dummies Guide To Scapy [27]. This means that the tool may contain hidden factors that could potentially affect the results. One example of such a hidden factor is that Scapy by default uses the same id for all DNS queries [15]. Since DNSHoarder utilizes multiple worker threads, this resulted in packets being recognized as responses to the wrong requests and, in turn, negative response times. This particular issue was partially addressed during development by implementing random ids for DNS queries. After a few days of data collection and a few hundred thousand requests, only five were found to have negative response times, after which positive response times were made part of the success criteria for responses. These five requests have been filtered out of the final dataset. Other known shortcomings of the DNSHoarder tool include the use of unreliable UDP packets for lookups and unknown handling of fragmented IP packets. DNSHoarder was designed without support for retrying failed packet transmissions, and combined with the unreliability of UDP, this makes the tool very sensitive to network conditions. A final aspect of great interest that was not taken into consideration is the DNS security extension (DNSSEC). DNSSEC was first developed in 2005 with the intention of providing a layer of data integrity and data origin authentication to DNS. However, a study from 2009 [11] noted that only 0.02% of zones were using and maintaining DNSSEC at the time.
A more recent study showed that 90.5% of generic TLDs and 47% of country-code TLDs supported DNSSEC. A new look at the prevalence of DNSSEC might have been interesting, but was ultimately decided against due to limited knowledge about the inner workings of the Scapy module.
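Returning to the Scapy transaction-id issue noted above, the workaround of assigning random ids so that responses can be matched to the right requests can be sketched as follows. This is a simplified illustration, not the actual DNSHoarder implementation:

```python
import random

def new_query_id(pending_ids):
    """Pick a random 16-bit DNS transaction id that is not currently in
    flight, so each response can be matched to exactly one outstanding
    query. (Scapy otherwise defaults to the same id for every DNS
    query, which caused the mismatches described above.)"""
    while True:
        qid = random.randint(0, 0xFFFF)
        if qid not in pending_ids:
            pending_ids.add(qid)
            return qid

def match_response(pending_ids, response_id):
    """Accept a response only if its id belongs to an outstanding query."""
    if response_id in pending_ids:
        pending_ids.remove(response_id)
        return True
    return False

pending = set()
qid = new_query_id(pending)
```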


5.3 The Work in a Wider Context

The main focus of this thesis has been the performance of DNS services and differences in performance between individual DNS servers. While DNS performance is an important aspect to consider in the selection of a DNS service, another aspect is privacy. DNS queries are generally not encrypted, meaning anyone monitoring the network could potentially capture user data. However, not even encryption can protect the client from having its data abused by a legitimate DNS server. Examples of DNS service abuse include redirection to advertisements, user tracking or even governmental censorship. An example of the latter took place in 2014, when the Turkish government used the national ISPs’ DNS resolvers to block access to Twitter, forcing people to use alternative DNS servers to circumvent the imposed restrictions [33]. In conclusion, DNS performance is an important aspect - but one of many.

6 Conclusion

In this thesis we have analyzed the performance of current free, public and popular DNS servers based on approximately 714,000 data points. The analyzed dataset was collected at regular intervals over the span of one week and was based on IPv4 (A record) and IPv6 (AAAA record) queries for 51 DNS servers (appendix E) and 250 hostnames (appendix F). The data collection was performed using a tool specifically developed for this thesis and automated using the Gitlab CI service. No significant performance differences were found between IPv4 and IPv6 lookups in the data studied in this thesis. Median response times indicated the possibility of IPv6 lookups generally being up to 1 ms slower than IPv4, but the investigated dataset is not large enough to provide a conclusive answer. What was noted in the comparison of IPv4 and IPv6 was a generally larger and more consistent DNS response size in the IPv6 case, indicating better standardization, which in itself could provide a performance benefit that is hard to measure. Performance variation between individual servers was observed to be quite large, at times reaching into the hundreds of milliseconds. It could however mostly be explained by the individual server’s network distance and route quality. Another observation was that while most DNS servers had response times reaching above 1000 ms at times, DNS responses were in general consistently clustered very close to the best performing query. From this we concluded that most DNS servers perform similarly and that network latency is an important factor in the perceived DNS performance. With regard to performance variation based on hostname popularity, a trend showing more spread and higher response times for less popular hostnames was found. The probable reason for this is that more popular hostnames are more likely to (i) be cached and (ii) require fewer additional lookups in cases where they are not.
Using regression lines, we also found an indication that IPv6 lookups may be more balanced with regards to performance across hostname popularity, though a larger dataset should be investigated before this can be said with certainty.
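The median comparison described above can be reproduced on any response-time dataset with a few lines of standard Python. The following sketch uses fabricated response times purely for illustration - none of the numbers below are measured data:

```python
from statistics import median

# Fabricated response times in milliseconds -- NOT measured data.
a_times = [12.1, 13.0, 12.4, 55.2, 12.8]     # IPv4 / A record lookups
aaaa_times = [13.2, 13.9, 13.1, 60.0, 13.6]  # IPv6 / AAAA record lookups

# The median is robust to the occasional very slow outlier query.
median_gap_ms = median(aaaa_times) - median(a_times)
```

Using the median rather than the mean keeps a single long-running query (such as the 55-60 ms outliers above) from dominating the comparison.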

6.1 Going Further

One of the main takeaways from this thesis is the importance of network conditions for pushing DNS performance further. Further research into the locations of, and reasons behind, network bottlenecks might help improve DNS performance overall. Also, reducing the number of long-running queries is likely to produce tangible performance improvements for DNS. To do this, we recommend further investigation into why some DNS queries take orders of magnitude longer to complete than others and how the density of current response time clustering can be increased.

An additional area of interest is the effect of DNSSEC and related technologies on the overall performance of the domain name system. While adoption is still low, there is steady growth - and if the system introduces any performance penalties, are they really worth it?

Finally, the dataset used throughout this thesis is too small to answer some of the questions posed and we would welcome repeat research involving an even larger dataset that could answer these questions more conclusively. Such a repeat study could also take a step further and investigate whether DNS responses differ based on which DNS server was queried. If so, it would be interesting to see whether there are any differences in performance of requests to the returned addresses, as well as whether there exists a correlation between the queried DNS server and the returned address families.

Bibliography

[1] Rachit Agarwal. 7 Best DNS Servers in 2019 (Free and Public). Feb. 2019. URL: https://beebom.com/best-dns-servers/ (visited on 26/04/2019).

[2] Mark Allman. 'Comments on DNS Robustness'. In: Proceedings of the Internet Measurement Conference (IMC). Boston, MA, USA: ACM, 2018, pp. 84-90. DOI: 10.1145/3278532.3278541.

[3] Roy Arends, Frederico AC Neves, Olafur Gudmundsson and Ray Bellis. Domain Name System (DNS) Parameters. URL: https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-4 (visited on 01/04/2019).

[4] Hammad Baig. 15 Best Free Public DNS Servers List For Faster Internet 2019. Apr. 2019. URL: https://twitgoo.com/best-free-dns-servers/ (visited on 02/05/2019).

[5] Philippe Biondi and the Scapy community. Scapy. URL: https://secdev.github.io/ (visited on 05/04/2019).

[6] Taejoong Chung, Roland van Rijswijk-Deij, Balakrishnan Chandrasekaran, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove and Christo Wilson. 'A Longitudinal, End-to-End View of the DNSSEC Ecosystem'. In: Proceedings of the USENIX Security Symposium (USENIX Security). Vancouver, BC: USENIX Association, 2017, pp. 1307-1322.

[7] Taejoong Chung, Roland van Rijswijk-Deij, David Choffnes, Dave Levin, Bruce M. Maggs, Alan Mislove and Christo Wilson. 'Understanding the role of registrars in DNSSEC deployment'. In: Proceedings of the Internet Measurement Conference (IMC). London, United Kingdom: ACM Press, 2017, pp. 369-383. DOI: 10.1145/3131365.3131373.

[8] Cisco Umbrella. Cisco Popularity List. URL: http://s3-us-west-1.amazonaws.com/umbrella-static/index.html (visited on 28/03/2019).

[9] Andrew Collette and contributors. HDF5 for Python. URL: https://www.h5py.org/ (visited on 08/04/2019).

[10] Andrew Collette and contributors. Quick Start Guide. Revision 3e828c87. URL: http://docs.h5py.org/en/latest/quick.html (visited on 08/04/2019).


[11] Casey Deccio, Chao-Chih Chen, Prasant Mohapatra, Jeff Sedayao and Krishna Kant. 'Quality of name resolution in the Domain Name System'. In: Proceedings of the IEEE International Conference on Network Protocols (ICNP) (Oct. 2009). DOI: 10.1109/icnp.2009.5339693.

[12] DNS.WATCH. Fast, free and uncensored. DNS.WATCH. URL: https://dns.watch/index (visited on 14/05/2019).

[13] DYN. Internet Guide Setup | Dyn Help Center. URL: https://help.dyn.com/internet-guide-setup/ (visited on 14/05/2019).

[14] R. Elz, R. Bush, S. Bradner and M. Patton. Selection and Operation of Secondary DNS Servers. BCP 16. RFC Editor, July 1997. URL: http://www.rfc-editor.org/rfc/rfc2182.txt.

[15] FelixZY, guedou and gpotter2. 'sent.sent_time - received.time' yields negative results. URL: https://github.com/secdev/scapy/issues/1952 (visited on 09/04/2019).

[16] Tim Fisher. Free and Public DNS Servers. May 2019. URL: https://www.lifewire.com/free-and-public-dns-servers-2626062 (visited on 02/05/2019).

[17] GitLab. GitLab Pricing. URL: https://about.gitlab.com/pricing/ (visited on 14/05/2019).

[18] GitLab. GitLab.com settings | GitLab. URL: https://docs.gitlab.com/ee/user/gitlab_com/index.html#shared-runners (visited on 23/05/2019).

[19] R. Hinden and S. Deering. IP Version 6 Addressing Architecture. RFC 4291. RFC Editor, Feb. 2006. URL: http://www.rfc-editor.org/rfc/rfc4291.txt.

[20] Cheng Huang, David A. Maltz, Jin Li and Albert Greenberg. 'Public DNS system and Global Traffic Management'. In: Proceedings of IEEE INFOCOM (Apr. 2011). DOI: 10.1109/infcom.2011.5935088. URL: http://ieeexplore.ieee.org/document/5935088/.

[21] A. Hubert and R. van Mook. Measures for Making DNS More Resilient against Forged Answers. RFC 5452. RFC Editor, Jan. 2009. URL: http://www.rfc-editor.org/rfc/rfc5452.txt.

[22] IAB. IAB Technical Comment on the Unique DNS Root. RFC 2826. RFC Editor, May 2000. URL: http://www.rfc-editor.org/rfc/rfc2826.txt.

[23] Cloudflare Inc. DNS Root Server. URL: https://www.cloudflare.com/learning/dns/glossary/dns-root-server/ (visited on 05/04/2019).

[24] Cloudflare Inc. DNS Server Types. URL: https://www.cloudflare.com/learning/dns/dns-server-types/ (visited on 05/04/2019).

[25] Cloudflare Inc. What Is DNS? | How DNS Works. URL: https://www.cloudflare.com/learning/dns/what-is-dns/ (visited on 05/04/2019).

[26] Amit Klein, Haya Shulman and Michael Waidner. 'Internet-wide study of DNS cache injections'. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM). Atlanta, GA, USA: IEEE, May 2017, pp. 1-9. DOI: 10.1109/INFOCOM.2017.8057202.

[27] Adam Maxwell. The Very Unofficial Dummies Guide To Scapy. May 2012. URL: https://theitgeekchronicles.files.wordpress.com/2012/05/scapyguide1.pdf (visited on 24/05/2019).

[28] P. Mockapetris. Domain names - concepts and facilities. STD 13. RFC Editor, Nov. 1987. URL: http://www.rfc-editor.org/rfc/rfc1034.txt.

[29] P. Mockapetris. Domain names - implementation and specification. STD 13. RFC Editor, Nov. 1987. URL: http://www.rfc-editor.org/rfc/rfc1035.txt.


[30] NS1. DNS Protocol Explained. Aug. 2018. URL: https://ns1.com/resources/dns-protocol (visited on 05/04/2019).

[31] OpenDNS. Windows 10 Configuration. URL: http://support.opendns.com/hc/en-us/articles/228007207-Windows-10-Configuration (visited on 14/05/2019).

[32] J. Postel. Internet Protocol. STD 5. RFC Editor, Sept. 1981. URL: http://www.rfc-editor.org/rfc/rfc791.txt.

[33] Matthew Prince. Announcing 1.1.1.1: the fastest, privacy-first consumer DNS service. Apr. 2018. URL: https://blog.cloudflare.com/announcing-1111/ (visited on 28/05/2019).

[34] Quad9. Quad9 Frequently Asked Questions. URL: https://www.quad9.net/faq/ (visited on 14/05/2019).

[35] Sean Reifschneider. 4.2.2.2: The Story Behind a DNS Legend – tummy.com, ltd. URL: https://www.tummy.com/articles/famous-dns-server/ (visited on 23/05/2019).

[36] Quirin Scheitle, Oliver Hohlfeld, Julien Gamba, Jonas Jelten, Torsten Zimmermann, Stephen D. Strowes and Narseo Vallina-Rodriguez. 'A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists'. In: Proceedings of the Internet Measurement Conference (IMC). Boston, MA, USA: ACM, 2018, pp. 478-493. DOI: 10.1145/3278532.3278574.

[37] Kyle Schomp, Tom Callahan, Michael Rabinovich and Mark Allman. 'On measuring the client-side DNS infrastructure'. In: Proceedings of the Internet Measurement Conference (IMC). Barcelona, Spain: ACM Press, 2013, pp. 77-90. DOI: 10.1145/2504730.2504734.

[38] Team Veditto. 10 Best DNS Server for Gaming In 2019: Free & Public DNS server for Gamers!! Apr. 2019. URL: https://veditto.com/10-best-dns-server-gaming-free-public-2019 (visited on 02/05/2019).

[39] The HDF Group. The HDF5® Library & File Format. URL: https://www.hdfgroup.org/solutions/hdf5/ (visited on 08/04/2019).

[40] S. Thomson, C. Huitema, V. Ksinant and M. Souissi. DNS Extensions to Support IP Version 6. RFC 3596. RFC Editor, Oct. 2003. URL: http://www.rfc-editor.org/rfc/rfc3596.txt.

[41] Verisign. Verisign Public DNS Set Up: Configuration Instructions. URL: https://publicdnsforum.verisign.com/discussion/13/verisign-public-dns-set-up-configuration-instructions (visited on 14/05/2019).

[42] Wikipedia contributors. List of DNS record types — Wikipedia, The Free Encyclopedia. 2019. URL: https://en.wikipedia.org/wiki/List_of_DNS_record_types (visited on 29/03/2019).

[43] Mike Williams. Best free and public DNS servers of 2019. Apr. 2019. URL: https://www.techradar.com/news/best-dns-server (visited on 02/05/2019).

[44] Yandex.DNS. Yandex.DNS. URL: https://dns.yandex.com/advanced/ (visited on 14/05/2019).

A Structure of DNSHoarder Output Data

DNSHoarder outputs data in the HDF5 format. This data is organized in the tree-like structure visible in Figure A.1. Each entry in the tree is a HDF5 group and most entries contain additional information in their attributes. This additional information is described in Tables A.1-A.8.

Key        Example value                       Notes
success    True                                Whether the associated job was successful. Note that the record's group will not contain any subgroups (i.e. IP etc.) if success is False.
timestamp  1553897822                          POSIX timestamp.
timeout    5000                                Timeout limit for the job.
rtt        4.693                               Time from DNS request to answer received. Only available if success is True. Note that this value may be negative if two DNS packets are sent with the same id [15].
exception  Traceback (most recent call last):  Python traceback. Only available if success is False.
           File [...]

Table A.1: Attributes for the entry
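As an illustration of how attributes like those in Table A.1 can be read back, the following sketch (assuming the h5py module [9] is installed) builds an in-memory HDF5 file with a hypothetical record group - the group path is a made-up example, not necessarily the exact layout DNSHoarder produces - and reads the attributes back the way an analysis script would:

```python
import h5py

# In-memory HDF5 file; nothing is written to disk.
f = h5py.File("sketch.h5", "w", driver="core", backing_store=False)

# Hypothetical record group path (server/hostname/record type).
record = f.create_group("1.1.1.1/facebook.com/A")
record.attrs["success"] = True
record.attrs["timestamp"] = 1553897822
record.attrs["timeout"] = 5000
record.attrs["rtt"] = 4.693

# Read the attributes back, guarding on the success flag as Table A.1 suggests.
rec = f["1.1.1.1/facebook.com/A"]
rtt_ms = float(rec.attrs["rtt"]) if rec.attrs["success"] else None
```

The `driver="core", backing_store=False` combination keeps the whole file in memory, which is convenient for experimenting with the structure before reading the real dataset.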

[Figure A.1 (tree diagram): the root group / contains nested intermediate groups (...), each record group contains the groups IP, UDP and DNS, and the DNS group contains the numbered subgroups qd 0, an 0, ns 0 and ar 0.]

Figure A.1: Hierarchical structure of DNSHoarder’s output data

Key      Example value   Notes
version  4
ihl      5
tos      192
len      74
id       63869
flags    DF
frag     0
ttl      57
proto    17
chksum   12192
src      1.0.0.1
dst      192.168.86.28
options  []              The options entry was not found to be set (or if so, to cause any issues) during testing of DNSHoarder. It is currently not known how the tool would react to a value in this field.

Table A.2: Attributes for the IP entry

Key     Example value   Notes
sport   53
dport   53              Section 9.2 of RFC 5452 requires DNS resolvers to use a randomized port number [21]. However, DNSHoarder will always use port 53.
len     54
chksum  47792

Table A.3: Attributes for the UDP entry

Key      Example value                  Notes
length   Empty(dtype=dtype('float32'))  Empty is an internal representation used by the h5py module to represent null values.
id       2160                           Randomized in accordance with section 9.2 of RFC 5452 [21].
qr       1
opcode   0
aa       0
tc       0
rd       1
ra       1
z        0
ad       0
cd       0
rcode    0
qdcount  1
ancount  1
nscount  0
arcount  0

Table A.4: Attributes for the DNS entry
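The fields in Table A.4 map directly onto the 12-byte DNS wire header defined in RFC 1035 [29] (with the AD and CD bits coming from later DNSSEC extensions). As a sketch, the example-column values can be packed into, and unpacked from, wire format with Python's struct module:

```python
import struct

def pack_dns_header(ident, qr, opcode, aa, tc, rd, ra, z, ad, cd, rcode,
                    qdcount, ancount, nscount, arcount):
    """Pack the 12-byte DNS header (RFC 1035, section 4.1.1)."""
    # Fold the one-bit flags and 4-bit opcode/rcode into the 16-bit flags word.
    flags = (qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9) | (rd << 8) \
          | (ra << 7) | (z << 6) | (ad << 5) | (cd << 4) | rcode
    return struct.pack("!HHHHHH", ident, flags, qdcount, ancount, nscount, arcount)

# Values from the example column of Table A.4 (a typical answer packet).
header = pack_dns_header(2160, qr=1, opcode=0, aa=0, tc=0, rd=1, ra=1, z=0,
                         ad=0, cd=0, rcode=0,
                         qdcount=1, ancount=1, nscount=0, arcount=0)

# Unpack the id and flags word again, e.g. to extract the RD bit.
ident, flags = struct.unpack("!HH", header[:4])
rd_bit = (flags >> 8) & 1
```

Note that with qr, rd and ra set the flags word becomes 0x8180, a pattern commonly seen in successful recursive answers.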

Key     Example value
qname   facebook.com.
qtype   1
qclass  1

Table A.5: Attributes for each qd entry

Key     Example value                  Notes
rrname  facebook.com.
type    1
rclass  1
ttl     90
rdlen   Empty(dtype=dtype('float32'))  Empty is an internal representation used by the h5py module to represent null values.
rdata   31.13.72.36

Table A.6: Attributes for each an entry

Key     Example value                  Notes
rrname  apple.com.
type    6
rclass  1
ttl     677
rdlen   Empty(dtype=dtype('float32'))  Empty is an internal representation used by the h5py module to represent null values.
rdata   0561646e[...]                  Hexadecimal data.

Table A.7: Attributes for each ns entry

Key     Example value                  Notes
rrname  ns1.k-msedge.net.
type    1
rclass  1
ttl     158579
rdlen   Empty(dtype=dtype('float32'))  Empty is an internal representation used by the h5py module to represent null values.
rdata   13.107.18.1

Table A.8: Attributes for each ar entry

B Additional Graphs

Some graphs generated to investigate the dataset were not included in the main part of the thesis, either because they took up too much space or because they did not provide any additional insights. These graphs are instead listed below. Figures B.1 and B.2 show the fail rate of DNS requests in two heatmaps, with each cell signifying 250 requests made to a specific DNS server at a specific time. In Chapter 4, excerpts from these graphs were used instead.

Figure B.1: Failed DNS queries over time, per DNS server (IPv4 / A record)

Figure B.2: Failed DNS queries over time, per DNS server (IPv6 / AAAA record)

Figure B.3: Average distance and ping performance of the routes to each DNS server as measured by traceroute

Figure B.3 displays the average number of hops needed to reach each individual DNS server as well as ping times for each hop, as measured by traceroute. In this heatmap, each cell signifies the average ping times measured at the specified hop distance, with grey signifying no response. In some cases, traceroute reached the upper limit of 30 hops for a server. The figure has been cut off at 14 hops, as the servers reaching 30 hops were the only ones still showing any data at the 14th hop, and this data consisted only of grey cells. Excerpts from this graph were used in Chapter 4.
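The per-hop averages behind Figure B.3 come from ordinary traceroute output. A minimal sketch of extracting the hop number and average round-trip time from one such line (the address and timings below are fabricated for illustration):

```python
import re

def parse_hop(line):
    """Return (hop number, average RTT in ms) for one traceroute output line,
    or None for hops that never responded (lines containing only '*')."""
    fields = line.split()
    hop = int(fields[0])
    # Collect every "<number> ms" probe timing on the line.
    rtts = [float(m) for m in re.findall(r"([\d.]+) ms", line)]
    if not rtts:
        return None  # unresponsive hop -> a grey cell in the heatmap
    return hop, sum(rtts) / len(rtts)

# A fabricated line of the usual "<hop>  <address>  <rtt> ms  <rtt> ms  <rtt> ms" form.
hop, avg_rtt = parse_hop(" 5  192.0.2.1  10.123 ms  11.456 ms  9.789 ms")
```

Running a parser like this over the per-server traceroute files produced by the CI jobs (appendix D) yields one (hop, average RTT) pair per responsive hop, i.e. one heatmap cell.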

Figure B.4: Average failed DNS queries per DNS server, over time

Figure B.5: Average failed DNS queries per DNS server, over time (excluding DNS servers that fail 100% of the requests)

Figure B.4 takes the total number of failed requests and splits these equally across all DNS servers to provide an average number of failed requests per DNS server. This data is strongly influenced by the six servers failing 100% of requests, visible in Figures B.1 and B.2, and was therefore updated to exclude these servers in Figure B.5. While the number of failed queries was slightly higher for AAAA records, we found the differences to be small and judged that including further graphs on the subject would deviate too far from the intended subject of DNS performance.

Figure B.6: Response size per DNS server (IPv4 / A record)

Figures B.6 and B.7 visualize the size of individual DNS responses. The whiskers denote maximum and minimum values, whereas the boxes start at the first quartile and end at the third quartile. The median value is denoted by a vertical line within each box. Excerpts from these graphs were used in Chapter 4.

Figure B.7: Response size per DNS server (IPv6 / AAAA record)

(a) Performance per day of week (A record - excluding answers that took 12 ms or more).

(b) Performance per day of week (AAAA record - excluding answers that took 12 ms or more).

Figure B.8: Performance per day of week

Figure B.8 zooms in on the DNS queries from Figure 4.17 that took less than 12 ms. These figures were used during the investigation of those queries before Figures 4.11 and 4.12 were generated. After the generation of Figures 4.11 and 4.12, Figure B.8 was deemed less clear in showing that the Google DNS servers were responsible for the response times below 12 ms, and therefore of less interest.

C DNSHoarder CLI Arguments

Below is a listing of the available command line arguments for the DNSHoarder tool, as printed by the argparse module when passing the --help argument to DNSHoarder.

$ python3 main.py --help
usage: main.py [-h] --dns DNS --hosts HOSTS --records RECORDS
               [--timeout TIMEOUT] --out OUT [--workers WORKERS]
               [--verbose | --silent]

optional arguments:
  -h, --help         show this help message and exit
  --dns DNS          DNS servers to contact (CSV formatted file)
  --hosts HOSTS      Hostnames to perform lookup on (CSV formatted file)
  --records RECORDS  Record types to request (CSV formatted file)
  --timeout TIMEOUT  Maximum time to wait for response from DNS servers in
                     millis (default 3000)
  --out OUT, -o OUT  Output file (hdf5 format)
  --workers WORKERS  Number of threads to run queries from (defaults to 23
                     for this system)
  --verbose, -v      Enable debug-level logging
  --silent, -s       Prevent logging below warning-level

Code C.1: Available arguments and associated descriptions for DNSHoarder
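A help listing of the shape above can be produced by a standard argparse setup. The following is a minimal sketch reconstructed from the listing alone - the option names, defaults and help strings are taken from it, but the tool's actual implementation may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the DNSHoarder CLI listing (sketch)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--dns", required=True,
                        help="DNS servers to contact (CSV formatted file)")
    parser.add_argument("--hosts", required=True,
                        help="Hostnames to perform lookup on (CSV formatted file)")
    parser.add_argument("--records", required=True,
                        help="Record types to request (CSV formatted file)")
    parser.add_argument("--timeout", type=int, default=3000,
                        help="Maximum time to wait for response in millis")
    parser.add_argument("--out", "-o", required=True,
                        help="Output file (hdf5 format)")
    parser.add_argument("--workers", type=int, default=23,
                        help="Number of threads to run queries from")
    # --verbose and --silent are mutually exclusive in the usage line.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--verbose", "-v", action="store_true",
                       help="Enable debug-level logging")
    group.add_argument("--silent", "-s", action="store_true",
                       help="Prevent logging below warning-level")
    return parser

# Parse a sample command line (the CSV file names are placeholders).
args = build_parser().parse_args(
    ["--dns", "dns.csv", "--hosts", "hosts.csv",
     "--records", "records.csv", "--out", "out.hdf5"]
)
```

Note that the "(defaults to 23 for this system)" wording in the real listing suggests the worker default was computed from the host machine rather than hard-coded as here.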

D .gitlab-ci.yml

The following is the configuration file used to run the DNSHoarder tool using Gitlab's CI service. From the configuration file it can be seen that the tool was run in a docker container based on Debian stretch running Python version 3.7.2. The file specifies two main configurations. First, a test - DNSHoarder Test - run for each git commit pushed to Gitlab to ensure the code was operational. Second, the configuration used for the data collection - DNSHoarder Data Collection. Each run would start with a fresh clone of the DNSHoarder git repository inside a docker container based on the python:3.7.2-slim-stretch docker image. The commands listed under before_script would then be executed to install required packages. After these commands, the commands listed under script in each respective configuration would be executed. The first of these would start collecting traceroute data - in the background, to optimize time utilization - and the second would start DNSHoarder with a specific configuration (see available configuration options in appendix C). Finally, after each run, the output data from the src/output/ directory and logs from the src/logs/ directory would be saved as job artifacts to be downloaded at a later time.

image: python:3.7.2-slim-stretch

stages:
  - build
  - test
  - run

before_script:
  - apt-get update
  - apt-get install -y libhdf5-100 build-essential traceroute
  - pip install -r requirements.txt
  - cd src/
  - mkdir -p logs/
  - mkdir -p output/

after_script:
  - cd $CI_PROJECT_DIR

DNSHoarder Test:
  stage: test
  script:
    - (cat ../testfiles/test/dns.csv | xargs -I{} -n1 sh -c "traceroute -m 30 -n -w 1,2,5 -T -q 5 -p 53 {} 2>&1 > output/{}.traceroute") &
    - python3 main.py --workers 4 --verbose --dns ../testfiles/test/dns.csv --hosts ../testfiles/test/hosts.csv --records ../testfiles/test/records.csv --out "output/DNSDatasetTest.$(date +'%y%m%d.%H%M%S').hdf5"
  artifacts:
    name: "DNSDatasetTest"
    paths:
      - src/output/
      - src/logs/
    expire_in: 3 hour
  except:
    - schedules

DNSHoarder Data Collection:
  stage: run
  script:
    - (cat ../testfiles/data_collection/dns.csv | xargs -I{} -n1 sh -c "traceroute -m 30 -n -w 1,2,5 -T -q 5 -p 53 {} 2>&1 > output/{}.traceroute") &
    - python3 main.py --workers 4 --silent --dns ../testfiles/data_collection/dns.csv --hosts ../testfiles/data_collection/hosts.csv --records ../testfiles/data_collection/records.csv --out "output/DNSDataset.$(date +'%y%m%d.%H%M%S').hdf5"
  artifacts:
    name: "DNSDataset"
    paths:
      - src/output/
      - src/logs/
    expire_in: 7 days
  only:
    - schedules

Code D.1: Gitlab CI configuration file (.gitlab-ci.yml) used to automate data collection

E DNS Servers

The 51 DNS servers used for data collection are listed below in the order they appeared when fed as input data to DNSHoarder. To make the data more human-readable, each server is listed together with its associated DNS service.

DNS Service         DNS Server
DNS.Watch           84.200.70.40
CloudFlare          1.1.1.1
CleanBrowsing       185.228.168.9
Neustar             156.154.70.1
CenturyLink         4.2.2.5
Tenta               99.192.182.100
AdGuard DNS         176.103.130.130
Verisign            64.6.64.6
OpenDNS Home        208.67.222.222
Comodo Secure DNS   8.20.247.20
Yandex.DNS          77.88.8.1
GreenTeamDNS        81.218.119.11
DNS.Watch           84.200.69.80
Dyn                 216.146.35.35
AdGuard DNS         176.103.130.131
Google              8.8.8.8
FreeDNS             37.235.1.177
CenturyLink         4.2.2.4
CenturyLink         209.244.0.3
Fourth Estate       45.77.165.194
CleanBrowsing       185.228.169.9
Alternate DNS       23.253.163.53
Google              8.8.4.4
FreeDNS             45.33.97.5
Comodo Secure DNS   8.26.56.26
CenturyLink         209.244.0.4
puntCAT             109.69.8.51
SafeDNS             195.46.39.39
OpenDNS Home        208.67.220.220
GreenTeamDNS        209.88.198.133
CenturyLink         4.2.2.2
Dyn                 216.146.36.36
SafeDNS             195.46.39.40
Verisign            64.6.65.6
CenturyLink         4.2.2.3
Hurricane Electric  74.82.42.42
CloudFlare          1.0.0.1
Neustar             156.154.71.1
Alternate DNS       198.101.242.72
UncensoredDNS       91.239.100.100
SmartViper          208.76.51.51
CenturyLink         4.2.2.1
Quad9               9.9.9.9
OpenNIC             23.94.60.240
FreeDNS             37.235.1.174
SmartViper          208.76.50.50
Yandex.DNS          77.88.8.8
Tenta               99.192.182.101
Quad9               149.112.112.112
UncensoredDNS       89.233.43.71
OpenNIC             128.52.130.209

Table E.1: DNS servers used for data collection

F Hostnames

The 250 hostnames used for data collection are listed below in the order they appeared when fed as input to DNSHoarder (i.e. most popular to least popular). To aid the reader, the original position of each hostname in the Cisco Umbrella Top 1M list has been included.

Cisco Umbrella List Position  Hostname
1     netflix.com
3     prod.netflix.com
5     google.com
7     microsoft.com
9     doubleclick.net
11    facebook.com
13    googleads.g.doubleclick.net
15    data.microsoft.com
17    live.com
19    secure.netflix.com
21    www.google-analytics.com
23    settings-win.data.microsoft.com
25    googleusercontent.com
27    amazonaws.com
29    www.facebook.com
31    officeapps.live.com
33    skype.com
35    accounts.google.com

37    akamaiedge.net
39    ocsp.digicert.com
41    graph.facebook.com
43    adservice.google.com
45    googleadservices.com
47    www.apple.com
49    play.google.com
51    office365.com
53    ytimg.com
55    www.googletagmanager.com
57    fbcdn.net
59    pagead2.googlesyndication.com
61    lh3.googleusercontent.com
63    itunes.apple.com
65    outlook.office365.com
67    i.ytimg.com
69    connect.facebook.net
71    akadns.net
73    events.data.microsoft.com
75    com.akadns.net
77    play.googleapis.com
79    windows.net
81    scorecardresearch.com
83    mp.microsoft.com
85    securepubads.g.doubleclick.net
87    nexus.officeapps.live.com
89    aria.microsoft.com
91    weather.microsoft.com
93    twitter.com
95    customerevents.netflix.com
97    adnxs.com
99    prod.ftl.netflix.com
109   apple-dns.net
127   v10.events.data.microsoft.com
145   outlook.com
163   e4478.a.akamaiedge.net
181   everesttech.net

199   idsync.rlcdn.com
217   newrelic.com
235   pr-bh.ybp.yahoo.com
253   pool.ntp.org
271   trc.taboola.com
289   outbrain.com
307   prod.cms.msn.com
325   bam.nr-data.net
343   ocsp.godaddy.com
361   cdninstagram.com
379   lijit.com
397   i.s-microsoft.com
415   sls.update.microsoft.com
433   dropbox.com
451   bidr.io
469   dis.criteo.com
487   ads.linkedin.com
505   sharethrough.com
523   digitru.st
541   pippio.com
559   appsflyer.com
577   nrb.footprintdns.com
595   ocsp.pki.goog
613   sc.omtrdc.net
631   cm.adgrx.com
649   nflxvideo.net
667   contentstorage.osi.office.net
685   us-west-2.amazonaws.com
703   dsum.casalemedia.com
721   partnerad.l.doubleclick.net
739   i1.ytimg.com
757   mxptint.net
775   socdm.com
793   apple.news
811   districtm.io
829   srv.stackadapt.com
847   visitor.omnitagjs.com

865   t.mookie1.com
883   mid.rkdms.com
901   pxl.connexity.net
919   gsp10-ssl.ls.apple.com
937   adrta.com
955   crazyegg.com
973   adentifi.com
991   prod.mozaws.net
1090  ads.scorecardresearch.com
1270  geo1.ggpht.com
1450  ad.crwdcntrl.net
1630  live.rezync.com
1810  typekit.com
1990  vzwwo.com
2170  mobile.yahoo.com
2350  ugc.bazaarvoice.com
2530  attachments.office.net
2710  config.mobile.yahoo.com
2890  dc-storm.com
3070  stats.avast.com
3250  scontent-dfw5-1.xx.fbcdn.net
3430  analytics.rayjump.com
3610  wac.thetacdn.net
3790  wikipedia.firstpartyapps.oaspapps.com
3970  twitter.map.fastly.net
4150  b.tiles.mapbox.com
4330  global.siteimproveanalytics.io
4510  bannerflow.com
4690  servebom.com
4870  d3.sc.omtrdc.net
5050  americanexpress.com
5230  tps11032.doubleverify.com
5410  consent.google.com
5590  p1.opendns.com
5770  dm-us.hybrid.ai
5950  g-bing-com.a-0001.a-msedge.net
6130  swrve.com

6310    yellowhammerflashint188671193078.s.moatpixel.com
6490    rambler.ru
6670    pubmine.com
6850    44.courier-push-apple.com.akadns.net
7030    p47-keyvalueservice.icloud.com
7210    prod-video-us-west-1.pscp.tv
7390    pay.sandbox.google.com
7570    adroll.com.edgekey.net
7750    ae.nflximg.net
7930    assets.ubembed.com
8110    comcluster.cxense.com
8290    line.me
8470    api-cf.affirm.com
8650    api.vungle.akadns.net
8830    cdn.viafoura.net
9010    comodo.com
9190    www.amazon.com.au
9370    h5.m.taobao.com
9550    api.circularhub.com
9730    e4900.dsca.akamaiedge.net
9910    navvy.media.net
10,900  fota-cloud-dn.ospserver.net
12,700  lax-1-apex.go.sonobi.com
14,500  e12767.d.akamaiedge.net
16,300  convertlanguage.com
18,100  s3-r-w.eu-west-1.amazonaws.com
19,900  paylocity.com
21,700  31043-77b51.api.pushwoosh.com
23,500  enterprise.com
25,300  ocsp.root-x1.letsencrypt.org
27,100  www.androidcentral.com
28,900  campaigner.com
30,700  r5---sn-5hnekn7k.googlevideo.com
32,500  gamma.cachefly.net
34,300  guc3-accesspoint-c-xgpv.ap.spotify.com
36,100  popupstats.brontops.com
37,900  awsdns-13.org

39,700   appleinsider.com
41,500   myuhc.com
43,300   attribution.bankrate.com
45,100   services.gasbuddy.com
46,900   s-usc1c-nss-269.firebaseio.com
48,700   tms.vclk.akadns.net
50,500   event.lucidchart.com
52,300   storylineonline.net
54,100   andr-785f3ec7eb-cbc62794911ff31b-a93c5fcf2ef7a8b7c6-2420327.na.api.amazonvideo.com
55,900   markmonitor.com
57,700   ktplay.com
59,500   p1web.accessredbox.net
61,300   notifier.win-rar.com
63,100   s-ssl.wordpress.com
64,900   static.cbox.ws
66,700   svc.ovi.com.a662.p01.hereglb.com
68,500   live-community.platform.intuit.com
70,300   emcbmm10.webex.com
72,100   cdn-main1.123formbuilder.com
73,900   secure8.i-doxs.net
75,700   familysafety.microsoft.com
77,500   ushipcdn.com
79,300   www.kayak.pl
81,100   gigenet.com
82,900   angieslist.tt.omtrdc.net
84,700   newark.com
86,500   control-eu-local.adap.tv.akadns.net
88,300   prevapp.hyundaiusa.com
90,100   adv.google-global.com
91,900   ns1.p19.dynect.net
93,700   p5cdn3static.sharpschool.com
95,500   smetrics.ticketsatwork.com
97,300   sg-emails.global.ssl.fastly.net
99,100   r4---sn-q4fl6nsl.c.2mdn.net
109,000  clipartkid.com
127,000  shopper.ipsy.com

145,000  fy4.b.yahoo.com
163,000  www.pewforum.org
181,000  m31.oiuzqem.net
199,000  m39.lfpoolp.biz
217,000  m29.gwryfkd.cc
235,000  www.xnxx.net
253,000  images.netflix.com.edgesuite.net
271,000  m2.klrfyid.cc
289,000  udp.push.mob.com
307,000  mzima.net
325,000  images.lecho.be
343,000  play.doubledowncasino2.com
361,000  bfmtv.eulerian.net
379,000  link.grubstreet.com
397,000  resource.vipkid.com.cn
415,000  tribuneonlineng.com
433,000  appliedcloudservices.com
451,000  a1964.b.akamai.net
469,000  redcheetah.com
487,000  production-lb-1-972052434.us-west-2.elb.amazonaws.com
505,000  enews.bluestoneperennials.com
523,000  elmananerodiario.com
541,000  content.pivotal.io
559,000  mail.itjobs-online.com
577,000  a4892.casalemedia.com
595,000  search.centamap.com
613,000  timedotcom-files-wordpress-com.cdn.ampproject.org
631,000  afsp.donordrive.com
649,000  api.90strivia.com
667,000  stat.reportide.com.cdnga.net
685,000  www.ubu.ru
703,000  www.fanzo.me
721,000  login.case.edu
739,000  se.wild.pccc.com
757,000  minneapolis.edu
775,000  netd.com.tr
793,000  x26.gmrnhlsn.com

811,000  ad.harman.com
829,000  espeakers.com
847,000  pod-000-1103-02.backblaze.com
865,000  adserver.getblue.io
883,000  s00.static-shell.com
901,000  game-a4.mbga.jp
919,000  usa-rwc-cm-ibmp.wipro.com
937,000  blog.niche.co
955,000  forum-console.worldoftanks.com
973,000  id57431.hotelgrande.si
991,000  gseonline.us

Table F.1: Hostnames used for data collection
