Bachelor Degree Project

Current practices for DNS Privacy - Protection towards pervasive surveillance

Author: Songho Lee
Supervisor: Ola Flygt
Semester: VT 2019
Subject: Computer Science

Abstract

Current usage of the DNS system is a significant loophole in Internet users’ privacy, as the queries and answers for resolving web addresses are not protected in most cases. The report elaborates on which privacy interests Internet users have, and presents the current technologies for enhancing DNS Privacy through a systematic literature review. The report also explores the limitations of the current practices and presents several proposals, such as DNS-over-Tor and methods to periodically change the trusted recursive resolver, to mitigate the current limitations.

Keywords: DNS, DNS-over-HTTPS, DNS-over-TLS, DNS Privacy

Sammanfattning

The current use of the DNS system is a significant loophole in Internet users’ privacy, since the queries and answers required to convert a web address into an IP address are not protected in most cases. The report identifies Internet users’ privacy interests and presents the current technology aimed at improving DNS privacy through a systematic literature review. The report also examines the limitations of current practice and presents several proposals, such as DNS-over-Tor and methods that enable the recursive resolver to be changed periodically, which are expected to minimise privacy leaks.

Nyckelord (keywords): DNS, DNS-over-HTTPS, DNS-over-TLS, DNS Privacy

Preface

I express my sincere appreciation of the good diplomatic relations between the Kingdom of Sweden and the Republic of Korea, which have facilitated the exchange of human resources through their agreement on a working holiday programme (SÖ 2010:22), and of the generosity of Swedish legislation regarding tuition fees as set out in Ordinance (2010:543) 2 § sub-section 4. These circumstances and help from my beloved parents enabled the financing of my studies. My gratitude also goes to my manager Patrick and the colleagues of the UMI team during my time at Ericsson, for having supported me when I decided to finish the degree project. I cannot forget to mention my friends and family members who have supported me emotionally throughout the process, especially Fangyan, Fengyuan, Jieun and Suryeon. Furthermore, the office door of my supervisor, Ola, has always been open for discussions, and I thank Ola for having helped me whenever I faced difficulties throughout the project.

Contents

1 Introduction
    1.1 Background
        1.1.1 DNS
        1.1.2 DNS Servers
        1.1.3 DNS Query process
        1.1.4 EDNS(0) and Client Subnet
        1.1.5 CIA-triad
    1.2 Related work
    1.3 Problem formulation
    1.4 Motivation
    1.5 Objectives
    1.6 Scope/Limitation
    1.7 Target group
    1.8 Outline

2 Method
    2.1 Systematic literature review
    2.2 Design Science
    2.3 Reliability and Validity
    2.4 Ethical considerations

3 Privacy infringement in scenarios
    3.1 Affected subjects
    3.2 Sensitive information
    3.3 Controlled disclosure
    3.4 Privacy breaching Scenarios
        3.4.1 Metadata in DNS query
        3.4.2 Impact on individuals
        3.4.3 Impact on corporates/organisations
    3.5 DNS queries as a digital fingerprint

4 Status of mitigative methods
    4.1 Encipherment of communication channels
        4.1.1 Reusing the current secure transport protocols
        4.1.2 Implementing own transport protocols
    4.2 Information redactions
    4.3 Architectural shift

5 Limitation of DNS Privacy methods
    5.1 Risk areas of DNS Privacy
    5.2 Two phases of DNS Query process
        5.2.1 Insufficient measurements on recursive-to-auth link
        5.2.2 Location of Recursive DNS resolvers
    5.3 Observation on packets’ size
    5.4 Privacy leaks by Transitive trust
    5.5 Availability concerns on Oblivious DNS
    5.6 Prevalence of hierarchical DNS
    5.7 Privacy limitations of Namecoin
    5.8 Integrity limitations on decentralised approaches
    5.9 Opportunistic security

6 Evaluation of alternative approaches
    6.1 Use of Traffic anonymisation
        6.1.1 Lack of UDP support from Tor
        6.1.2 Variation of latency
        6.1.3 Tor on DNS Query Phase 1
        6.1.4 Tor on DNS Query Phase 2
    6.2 Use of multiple trusted recursive resolvers
        6.2.1 Actions required on local DNS Resolver admin
        6.2.2 Actions required on Network admin

7 Discussion
    7.1 Privacy leaking components apart from the DNS
    7.2 DNS Privacy - possible aids for the criminals
    7.3 Recursive resolver centralisation
        7.3.1 DNS Privacy promoting the use of Public DNS servers
        7.3.2 Allegations on performance decrease
    7.4 IP as a human identifiable information
    7.5 QNAME minimisation
    7.6 Vulnerability of Tor

8 Conclusion
    8.1 Scientific contribution
    8.2 Current milestone
    8.3 Future work

References

A Appendix: Test script for validating DNS Privacy
    A.1 Processing frequently visited web domain list
    A.2 Script for automating the web traffic simulation
    A.3 Common based source

B Appendix: Experiment of DNS-over-HTTPS
    B.1 Experiment background
    B.2 Experiment setup
        B.2.1 Hypothesis
    B.3 Results
        B.3.1 Firefox’s DNS Lookup log
    B.4 Evaluation
        B.4.1 Possible interference caused by web automation
        B.4.2 Bootstrap procedure
        B.4.3 Unexplained causes of DoH query failures
    B.5 Conclusion of the DoH Experiments

1 Introduction

This chapter describes what the Domain Name System (DNS) is, and how the legacy design of DNS has become a privacy threat. Before discussing the privacy risks of DNS, the background section introduces the relevant structure and mechanisms. Readers who are familiar with DNS and the Client Subnet function may skip to Section 1.3.

1.1 Background

Digital transformation has moved online many activities that were carried out in person decades ago. At work, people hold video conference calls instead of taking business trips when travel is not necessary. When shopping, people fill in credit card numbers for cross-border payments rather than visiting a bank branch to issue cheques. In other words, the stage where people work has shifted to cyberspace in recent decades, and the Internet has become an essential part of everyday life.

Despite the ubiquity of the Internet, users’ online activities are collected under pervasive monitoring by different actors. Pervasive monitoring means “widespread attack on privacy [1].” Information collected in this way could lead to a breach of users’ privacy by re-identifying users based on their traffic [2], or could aid the launch of active attacks such as masquerade and Denial of Service (DoS). Unfortunately, the insecure architecture of the Domain Name System allows pervasive monitoring, and this should therefore be mitigated. Before the privacy problems of the DNS are discussed, the DNS and its components are introduced, since they are essential to the discussion.

1.1.1 DNS

Almost every activity on the web begins with entering a human-friendly domain name in a web browser. Once a user enters a domain name to visit a website, DNS resolves the name to the Internet Protocol (IP) address of the web server that hosts the website. When multiple websites are hosted on a single server, the entered fully qualified domain name (FQDN) is used to differentiate the virtual hosts on that web server [3]. DNS is therefore a critical component of the Internet.

1.1.2 DNS Servers

The DNS involves four types of components: stub resolvers, recursive resolvers, authoritative servers and forwarding DNS servers. Resolvers are programmes that obtain information from name servers upon clients’ requests [4].

A stub resolver is a resolver that serves as the entry point for DNS queries from applications and directs search requests to the nearest recursive resolver [5]. As it cannot complete domain name resolution by itself, a stub resolver is dependent on a recursive resolver [6].

A recursive resolver is a server that receives a DNS query from a stub resolver and obtains the final answer to the query by (1) answering from its local cache or (2) sending queries to other DNS servers [6]. After a recursive resolver has sent a query to authoritative name servers, it is expected to store the answer in its local cache. It is the first server in the DNS query flow that contacts other servers to get the answer for the client.

An authoritative (name) server is a server that has “authority over one or more DNS zones [6]” and “can answer queries without needing to query on other servers as it knows the content of the queried DNS zone by local knowledge [7].”

A DNS forwarding server is a server that forwards queries to a recursive resolver or to other forwarding servers. It does not perform the resolution process for the stub resolver itself.

1.1.3 DNS Query process

Due to the hierarchical structure of the Domain Name System with delegations of authority [8], obtaining the exact IP address of a given domain name involves several DNS servers. Figure 1.1 shows an example of querying “saimei.ftp.acc.umu.se.”.

Figure 1.1: DNS Query sequence diagram

In the diagram, steps two and three return the top-level domain (TLD) from the root servers. Steps four and five obtain the authoritative name server of the Swedish TLD. The .SE TLD returns the name server of Umeå University in steps six and seven. Finally, the name server of umu returns the IPv4 address (A record) of the given name, so that the recursive resolver can provide the answer to the stub resolver. These steps are performed under the assumption that none of the queries is cached.
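The sequence above can be sketched as a walk over a delegation table. The sketch below is illustrative only: the server labels (`root`, `se-tld`, `umu-ns`), the delegation data and the address `192.0.2.10` are hypothetical placeholders rather than real records, and no network traffic is generated. It records which server receives which query name, which shows that the full name is sent at every step of a classic, uncached resolution.

```python
# Hypothetical delegation data mirroring the example in Figure 1.1.
# Server labels and the IP address are placeholders, not real records.
DELEGATIONS = {
    "root": {"se.": "se-tld"},
    "se-tld": {"umu.se.": "umu-ns"},
}
ANSWERS = {
    "umu-ns": {"saimei.ftp.acc.umu.se.": "192.0.2.10"},
}


def resolve(qname, server="root"):
    """Walk the delegation chain and record which server saw which name."""
    trace = []
    while True:
        # The complete QNAME is sent to every server along the chain.
        trace.append((server, qname))
        if qname in ANSWERS.get(server, {}):
            return ANSWERS[server][qname], trace
        # Follow the referral for the longest zone that is a suffix of the name.
        referrals = DELEGATIONS.get(server, {})
        matches = [zone for zone in referrals
                   if qname == zone or qname.endswith("." + zone)]
        if not matches:
            raise LookupError(f"{server} has no referral for {qname}")
        server = referrals[max(matches, key=len)]


address, trace = resolve("saimei.ftp.acc.umu.se.")
for server, seen_name in trace:
    print(f"{server} received a query for {seen_name}")
```

In this toy model the recursive resolver's cache is left out on purpose, matching the assumption that none of the queries is cached.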

1.1.4 EDNS(0) and Client Subnet

The extension mechanisms for DNS (EDNS) are specified in RFC 6891. EDNS allows both DNS servers and clients to send “larger DNS packet than the original 512 octet limit [9]”, which makes it possible to send long IPv6 addresses and Domain Name System Security Extensions (DNSSEC) signatures. As of February 2019, major DNS resolver operators require authoritative servers to support EDNS [10, 11].

EDNS(0) provides several options, one of which is the Client Subnet (ECS) feature described in RFC 7871 [12]. When ECS is used, recursive DNS servers include a truncated client IP address in their DNS queries to the upstream authoritative servers to permit “topologically localised answers for Content Delivery Networks (CDN) [13]”.
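As a rough illustration of the truncation that ECS relies on, the following sketch masks a client address down to a network prefix using Python's standard `ipaddress` module. The function name `ecs_prefix` and the default prefix lengths (24 bits for IPv4, 56 bits for IPv6) are assumptions chosen for the example; an actual resolver may be configured differently, and the sketch does not build the EDNS option itself.

```python
import ipaddress


def ecs_prefix(client_ip, v4_bits=24, v6_bits=56):
    """Return the truncated network a resolver might disclose upstream.

    The prefix lengths are illustrative defaults, not values mandated here.
    """
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error.
    return ipaddress.ip_network(f"{addr}/{bits}", strict=False)


# Documentation-range addresses are used as stand-ins for real clients.
print(ecs_prefix("198.51.100.77"))
print(ecs_prefix("2001:db8:1234:5678::1"))
```

Even after truncation, the remaining prefix still narrows the client down to a small network, which is why ECS is relevant to the privacy discussion.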

1.1.5 CIA-triad

In information security discussions, the threat mitigations of a system are analysed from three perspectives: confidentiality, integrity and availability. Together, these properties are known as the CIA triad or the security triad. Achieving every aspect of the CIA triad is often not feasible, as enhancing one dimension may interfere with the others [14].

1.2 Related work

The project acknowledges the pioneering work “DNS Privacy Considerations (RFC 7626) [15]”, which has provided the theoretical foundations for the analysis of DNS privacy. As of 2019, several studies have been presented that aim to mitigate privacy issues based on that analysis. These studies are presented later in Chapter 4.

P. Werneck and J.H.C. van Heugten presented surveys of DNS privacy-enhancing methods. In 2014, Werneck evaluated approaches to improving the privacy of DNS and stated the limitations of the identified approaches [16]. In 2018, van Heugten evaluated existing solutions for enhancing DNS Privacy [17]. As the standardisation of DNS-over-HTTPS was finalised only recently [18], the study by van Heugten reflects more recent changes.

While the current report attempts to describe privacy infringement situations related to DNS operations in scenarios, the threat model of pervasive surveillance as a whole is elaborated on in RFC 7624 [19].

1.3 Problem formulation

Currently, almost all DNS traffic is sent in clear text [15] over the UDP protocol [20], which makes DNS queries vulnerable to being hijacked or used to filter users’ traffic. All participants in the DNS query process, as illustrated in Figure 1.1, exchange messages intensively, and these messages are in plaintext. It is also noteworthy that all participating authoritative name servers receive the same question, although the authoritative servers higher up in the hierarchy do not need to know the complete domain name in question. S. Bortzmeyer has analysed that particular fields in a DNS packet [21], such as the query name (QNAME) and the source IP address, reveal “communication relationships [15]”.

These observations indicate that there are risks of information leakage in the following places: (1) tapping on the wire “between the stub resolvers and the recursive resolvers”, and (2) information leaks in the servers. The following research questions are formulated with regard to these privacy breaching circumstances.

RQ1  Which benefits and possible disadvantages would DNS Privacy bring to different actors?
RQ2  Which are the areas that the existing DNS Privacy enhancement methods could not address?
RQ3  Which combination of technologies would be suitable to address the current limitations?

Table 1.1: Research questions
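To make the plaintext exposure described in Section 1.3 concrete, the sketch below assembles a classic DNS query message by hand and shows that the queried labels appear unmodified in the bytes that would be placed in a UDP datagram. The helper name `build_query` and the query ID are illustrative assumptions; nothing is sent over the network.

```python
import struct


def build_query(qname, qtype=1, query_id=0x1234):
    """Build an unencrypted DNS query message (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length and the name ends with 0.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


packet = build_query("saimei.ftp.acc.umu.se")
# Anyone who can observe the datagram can read the labels directly.
print(packet)
```

An on-path observer needs no key material to recover the name: the labels are visible as ASCII in the printed bytes, next to the source IP address carried by the enclosing IP packet.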

1.4 Motivation

Most Internet activities begin with a DNS query, hence DNS is vital. Notwithstanding the importance of DNS, the designers of the current DNS protocol did not take “confidentiality of protocol metadata” [22] into consideration. DNS queries therefore reveal communication flows, and this property of the DNS protocol is used in different contexts by different actors. Examples are traffic monitoring for network management, limiting the influence of malicious websites by DNS footprinting of malware [23], and detecting malware infections [24].

Other examples of how this property of DNS is used are nation-state surveillance [25], privacy-unfriendly activities of commercial sectors [26], and illegal actions by criminals. Surveillance causes stress and anxiety in individuals [27], as well as behavioural changes such as self-censorship [28]. RFC 6973 notes that privacy harms involve “harms to financial standing, reputation, solitude, autonomy, and safety [28]” of individuals.

S. Farrell et al. state in RFC 7258 that allowing monitoring by benevolent actors and defending privacy against nefarious actors do not go hand in hand, as the actions required to achieve both are indistinguishable regardless of the motivations [1]. The disadvantages incurred by the lack of DNS privacy significantly outweigh the advantages, and the lack of DNS privacy should therefore be mitigated by any feasible practice.

1.5 Objectives

The following objectives are set to answer the research questions and to break the tasks into smaller pieces; the reasoning behind each objective follows below. Hereafter, an objective is abbreviated as O and a research question as RQ. Table 1.2 summarises the objectives.

O1  Investigate different end-users’ privacy infringement in scenarios without DNS Privacy.
O2  Explore the status of mitigative methods to enhance DNS Privacy.
O3  Identify areas which the selected methods could not address.
O4  Explore alternative approaches to mitigating the current limitations.

Table 1.2: Objectives

RQ1 aims to explore the benefits of DNS privacy for different groups of people. As it can be challenging to motivate the benefits of privacy when obscurity prevails, O1 is defined to analyse scenarios in which privacy is violated because DNS queries are insecure or exposed. RQ2 is to find use cases of the existing DNS privacy mitigative methods. However, in order to discuss use cases, the state of these methods needs to be examined. Therefore, O2 is set to explore the state of the art of the methods and O3 is set to identify areas for improvement. RQ3 attempts to provide useful interventions by finding a combination of technologies that overcomes the limitations of current DNS Privacy enhancements. To serve this purpose, O4 discusses possible approaches that use other technologies.

1.6 Scope/Limitation

From a security perspective, the project focuses on improving privacy. In other words, in terms of the Confidentiality, Integrity, Availability (CIA) triad, enhancing integrity and availability is given lower priority in the current project. Issues and challenges of DNS security as a whole can be found in other studies, such as the one conducted by Ning Hu et al. [29].

1.7 Target group

The project aims to provide insight into DNS Privacy for Internet users and recursive resolver providers, with the purpose of improving users’ privacy. Readers are expected to be familiar with the Internet protocol suite (TCP/IP) [30].

1.8 Outline

The majority of scientific reports follow the Introduction, Methods, Results, Analysis and Discussion (IMRaD) pattern, and this report is in line with that pattern. After the method chapter, four chapters follow, each presenting the result of one objective incrementally. In other words, Chapter 3 presents the result of O1, Chapter 4 presents O2, and so on up to Chapter 6, which describes the result of the last objective, O4. Afterwards, the discussion chapter follows, where relevant ethical discussions and consequences of DNS Privacy can be found. Finally, the conclusion chapter summarises the discussions and the results for all research questions that the thesis answers.

There are two appendices in this report. Appendix A introduces a Python script that can be used in future studies, and Appendix B presents a controlled experiment conducted during the project to become acquainted with one of the DNS Privacy enhancing methods. The experiment does not answer any of the objectives set in the study; it provides a reference for one of the arguments. It is therefore presented in the appendix, to provide an angle for discussing the limitations of the current practice.

2 Method

This chapter describes the scientific methods chosen to answer the research questions (Table 1.1) and meet the objectives (Table 1.2). The study used a systematic literature review and design science. The following sections motivate the choice of each method for meeting the objectives.

2.1 Systematic literature review

A systematic literature review was performed to accomplish O2, which is to explore mitigative methods for DNS privacy. Although a recently published article provided an overview of DNS Privacy enhancing methods [17], performing a systematic literature review was necessary to eliminate possible biases and to minimise the risk of missing suitable solutions.

To make the review process systematic, the search criterion was set to list the articles that cite RFC 7626 in the Google Scholar database. RFC 7626 was chosen because its analysis provides a clear insight into DNS privacy issues [15], and since around four years had passed since its publication, it was anticipated that fellow researchers had tried to solve or list the risks identified in it.

As an inclusion criterion, the references of the articles found were examined further, since well-established solutions could otherwise be missed, possibly because the methods had been introduced before the publication of RFC 7626.

Exclusion criteria were set to ensure that the contents of the articles were relevant to the defined problem. Therefore, any study that addresses other security aspects of DNS, such as availability, without addressing the privacy problems was excluded. Duplicated entries in the search results were excluded. In case of having a series of revisions of the same article, the proceeding study was chosen to present.

2.2 Design Science

Design science “creates and evaluates Information Technology artefacts to solve identified organisational problems [31]”, and it is highly relevant for addressing O3 and O4, which are to identify the areas that the studied DNS Privacy methods could not address and to explore alternative approaches to overcome the limitations.

In order to refine the problem statements for the existing practices, which serve as input data for the design science method, a literature review was made to analyse privacy infringement in various user scenarios, as defined in O1, and the limitations of the current methods were explored, as defined in O3, before the design science method was applied.

2.3 Reliability and Validity

The results of the literature review are considered reliable, as the same results can be obtained by repeating the search described in the previous section. Appendix A includes the source code of the experiment scripts that were used to support an argument in the study, and other researchers are therefore expected to obtain similar results.

2.4 Ethical considerations

Ethical considerations are less significant for the chosen methods, as no real data about any natural person is collected without consent. However, it can be questioned whether it is appropriate to elaborate on privacy breaching scenarios in detail in order to use the information for formulating problems for the design science method. Despite these concerns, describing the problematic situations as they are was necessary, as the project aims to improve the scenarios in which privacy enhancements are insufficient.

3 Privacy infringement in scenarios

Privacy means having control over access to confidential information. In everyday language, privacy means “the right to control who knows certain things about you”, according to Pfleeger [14]. He introduces three aspects of information privacy: affected parties, sensitive data and controlled disclosure. This chapter analyses DNS-related privacy issues for different subjects.

3.1 Affected subjects

The subjects were classified as private persons and organisations. Organisations were further divided into large organisations that operate their own directory server with DNS, and smaller organisations that do not operate DNS resolvers in their networks.

3.2 Sensitive information

Defining what information should be considered sensitive is subjective, and the sensitivity of information cannot be measured on an absolute scale. However, several common areas of sensitive information are listed below.

For natural persons, the EU [32] has defined sensitive information as personal data revealing “(1) ethnic origin, religious or philosophical beliefs, (2) trade-union membership, (3) health-related data, and (4) data concerning a person’s sex life.”

For organisations, information assets such as copyright (the expression of an idea), trade secrets and privileged information are examples of sensitive information [14].

3.3 Controlled disclosure

Information assets differ from physical assets in that they can be duplicated without limit at relatively low cost and without harming the asset. This unique characteristic of information creates a “propagation problem”: affected subjects lose control of the information about themselves once it has been disclosed.

3.4 Privacy breaching Scenarios

This section analyses, for each affected subject, the privacy breaching circumstances that arise when DNS is observed. Before the impact on each subject is examined, the common privacy impacts are introduced by elaborating on the privacy-related metadata. Bortzmeyer distinguished between the “DNS data itself and a particular transaction” and argued that, while the data is public, a transaction should not be [15]. The transaction refers to the DNS lookup in this context. The following sections demonstrate how DNS reveals sensitive information about the affected subjects.

3.4.1 Metadata in DNS query

Figure 3.2 identifies the relevant metadata created in a DNS transaction. The metadata created when looking up a DNS record includes the query name (QNAME or request name), the query type and the client’s IP address. Contrary to the belief that metadata carries only minimal information, the client IP address can be linked with WHOIS records [33]. Furthermore, the geographical location of the client and its organisational affiliation can be derived.

Figure 3.2: Privacy related components on DNS transaction

The recursive resolver can see whether a query was answered from its cache or not, and this information provides an insight into the client’s behaviour. One study found that “DNS queries on Chinese Top-Level-Domains(TLD) server had Zipf-like distribution [34]”. If this phenomenon also applies to the other TLDs, and a sufficient number of clients share the same recursive resolver, popular names will usually be answered from the cache. The administrators of the recursive DNS resolver could then infer from a cache miss that a user is attempting to visit a rarely visited web server.
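The inference described above can be illustrated with a simple frequency count. The sketch below is only a toy model: the query log and the domain names are invented placeholders, and plain counting stands in for whatever analysis a real resolver operator might perform. It ranks names by how often they were queried and flags the names in the long tail of the distribution.

```python
from collections import Counter

# Hypothetical query log of a shared recursive resolver (placeholder names).
QUERY_LOG = (
    ["popular.example"] * 6
    + ["news.example"] * 3
    + ["forum.example"] * 2
    + ["niche-community.example"]
)


def rank_frequency(query_log):
    """Return (rank, name, count) tuples, most frequently queried first."""
    counts = Counter(query_log)
    return [
        (rank, name, count)
        for rank, (name, count) in enumerate(counts.most_common(), start=1)
    ]


def rare_names(query_log, max_count=1):
    """Names queried at most max_count times, i.e. the long tail."""
    counts = Counter(query_log)
    return sorted(name for name, count in counts.items() if count <= max_count)


for rank, name, count in rank_frequency(QUERY_LOG):
    print(rank, name, count)
print("rarely queried:", rare_names(QUERY_LOG))
```

With a skewed distribution of this kind, the few names in the tail stand out, and a query for one of them says more about the individual client than a query for a popular name does.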

3.4.2 Impact on individuals

Sensitive information about individuals can be derived by observing the metadata created in the DNS lookup process. The example scenarios are given in the order of the information categories defined in Section 3.2.

A domain may be shared mainly by people with common philosophical beliefs, particularly in countries whose legislation obliges website administrators to censor content. People who share certain philosophical or political beliefs are likely to host a dedicated website for their community rather than use large social platforms that apply censorship [35]. In such a case, a visit to the website may give those who can eavesdrop on the traffic a sufficient signal of the user’s philosophical beliefs, even though the detailed activities on the website are protected by encryption.

An employee who frequently visits a certain trade union’s website from a corporate network may indicate to the IT department that the person is either a member of the union or sympathises with the organisation.

Observing DNS queries may also reveal the health condition of a user. If a patient with a chronic disease wears a smart sensor (such as a Continuous Glucose Monitoring device) that securely sends measurement data to medical institutes [36, 37], her health condition may be inferred by observing her DNS traffic. If someone actively visits the websites of several clinics that specialise in a particular disease and of a relevant medical insurance company, the accompanying DNS queries provide meaningful insight into the person’s health status.

DNS queries related to online dating sites could also be observed. However, no evidence has been found that the sexual activity of individuals can be inferred from the frequency of visits without considering their sociosexuality [38]. Although sexual activity cannot be derived, the sexual orientation of individuals could be inferred if the queried address is visited mainly by sexual minorities, such as that of Grindr, a gay dating app [39].

3.4.3 Impact on corporates/organisations

This section illustrates privacy breaching scenarios that affect corporate interests. First, scenarios for a large organisation are described. A scenario that is common to organisations of any size follows afterwards.

Assuming that a large-scale enterprise has a sophisticated intranet and assigns managed work laptops to its employees, an employee would use the web browser’s bookmarks to store company-specific websites on the work computer. In this case, even if the user does not actually visit the corporate intranet sites, the web browser may prefetch the bookmarks [40, 41], which in turn leaks the DNS queries. If the employee is travelling, the leaked domain names can reveal the employee’s identity to any observer in public places, and the domain names can be collected to facilitate sophisticated and active attacks in the future. In addition to the leaks caused by browser bookmarks, a query value of ‘_ldap._tcp.hostname’ indicates a client’s dependency on a directory server [42] and reveals the existence of the service at the hostname.

Activities related to new product development or sensitive assignments may be derived from the DNS query activities originating from the corporate IP address. Section 3.4.1 mentioned the WHOIS records of IP addresses. Information on the Autonomous System (AS), an administrative entity, might have a more substantial impact on identity linking for corporations than for individuals once it is correlated with the content and frequency of the DNS queries. Exchanging e-mails with other organisations or clients also involves DNS resolution, and in this case the query type matters, since the mail exchange server information is fetched through the MX record of the DNS.

3.5 DNS queries as a digital fingerprint

Kirchler et al. discussed ‘behaviour-based tracking’, which links user sessions with the help of unsupervised learning on their DNS traffic patterns. According to the authors, it is an “unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time - in spite of changing IP addresses [43].” The results of the study indicated that “unsupervised learning techniques could be employed to build comprehensive behavioural profiles [43]” from a DNS query log. The potential use of such a technique may be especially alarming for members of organisations with critical missions, and not only for private individuals.
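The general idea of linking sessions by their DNS behaviour can be sketched with a much simpler stand-in than the unsupervised learning used by Kirchler et al.: comparing the sets of queried domains with the Jaccard similarity. The sketch below is not the authors' method, and the profile data, user labels and function names are hypothetical; it only illustrates why a changed IP address does not by itself unlink a user from earlier sessions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of domain names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_session(session_domains, known_profiles):
    """Return the known profile whose domain set is most similar."""
    return max(
        known_profiles,
        key=lambda user: jaccard(session_domains, known_profiles[user]),
    )


# Hypothetical profiles built from earlier sessions (placeholder names).
profiles = {
    "user-a": {"mail.example", "intranet.example", "forum.example"},
    "user-b": {"video.example", "games.example"},
}

# A later session seen from a different IP address.
new_session = {"mail.example", "forum.example", "weather.example"}
print(link_session(new_session, profiles))
```

In practice the query log would be far noisier and richer than these sets, which is where the learning techniques described in the study come in.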

4 Status of mitigative methods

This chapter presents the result of a systematic literature review of studies related to DNS Privacy. Only studies with suggestions for privacy improvement were chosen. The mitigative methods were classified into three groups, based on each method’s approach to reducing the privacy risks.

The studies shown in Table 4.3 attempted to secure the communication channel of the DNS query. In other words, these studies suggested applying an encipherment mechanism to deliver connection confidentiality as defined in X.800 [44]. Applying channel encipherment methods to secure DNS queries mitigates the privacy risks by placing trust in the chosen recursive resolver.

ID    Title                                                                            Year  Cites
[45]  Specification for DNS over Transport Layer Security (TLS)                        2016  27
[18]  DNS Queries over HTTPS (DoH)                                                     2018  5
[46]  DNS over Datagram Transport Layer Security (DTLS)                                2017  3
[47]  An opportunistic encryption extension for the DNS protocol                       2015  2
[48]  Usage Profiles for DNS over TLS and DNS over DTLS                                2018  1
[49]  Design and implementation of a lightweight privacy extension of DNSSEC protocol  2017  0
[50]  Specification of DNS over Dedicated QUIC Connections                             2019  0
[51]  DNSCrypt                                                                         2015  0
[52]  DNSCurve                                                                         2009  0

Table 4.3: Works of literature categorised as communication channel encrypting methods

Table 4.4 shows a list of studies which aimed to minimise privacy leaks by addressing the content of the packets that were generated in the DNS query process.

ID    Title                                             Year  Cites
[53]  DNS query name minimisation to improve privacy    2016  33
[54]  Oblivious DNS - Strong Privacy for DNS Queries    2019  0
[55]  Mitigating Client Subnet Leakage in DNS Queries   2018  0

Table 4.4: Works of literature categorised as securing content

There were several research and design proposals for new architectures that would replace the current DNS system. These are found in Table 4.5.

ID    Title                                                                                       Year  Cites
[56]  Security and privacy analysis of national science foundation future internet architectures  2018  3
[57]  The GNUnet System                                                                           2017  1
[58]  A Paged Domain Name System for Query Privacy                                                2017  0
[59]  Namecoin                                                                                    2014  12

Table 4.5: Works of literature categorised as Architectural proposal

In addition to the classifications made in the above tables, the identified methods were further sorted into two approaches based on the designers’ view of the existing DNS model. Figure 4.3 shows two trees. The larger tree contains methods that preserve the current hierarchical DNS structure, and the other tree represents proposals that take a radical approach by suggesting a structural change to a decentralised DNS.

Figure 4.3: Categorisation of DNS Privacy enhancing methods based on the approach

The leaves in Figure 4.3 carry one of two labels, (A) or (C). (A) stands for a categorisation proposed in this study, and leaves with the (C) label are methods proposed by other research groups.

4.1 Encipherment of communication channels

Communication channel encrypting methods, as presented in Table 4.3, attempt to alleviate privacy risks on the wire. Methods encrypting the data transport channel were sorted into those reusing current Internet standards and those proposing customised protocols.

4.1.1 Reusing the current secure transport protocols

DNS-over-TLS (DoT) [45], DNS-over-HTTPS (DoH) [18] and DNS-over-DTLS [46] are techniques that reuse proven Internet standards to secure the connection channel between a stub resolver and a recursive resolver. Technologies in this field can be understood as encapsulations of DNS operations with minimal changes to the existing Internet protocols. DoT and DoH apply encryption on top of a connection-oriented protocol (i.e. TCP), and judging by the number of citations and client implementations, these were the most prominent technologies within the DNS Privacy field. DNS-over-DTLS, in contrast, applies encryption over UDP; in other words, it utilises Datagram Transport Layer Security (DTLS) [60].

4.1.2 Implementing own transport protocols

The developing standard DNS over Dedicated QUIC Connections [50] also runs over UDP, using a transport called QUIC. QUIC is a "UDP-Based Multiplexed and Secure Transport [61]". At the time of the study, QUIC was still in a development phase. However, QUIC is a prominent candidate for becoming the next Internet standard transport, as it is on the IETF's standardisation track and the specification is actively being discussed.

DNSCrypt [51] and DNSCurve [52] are non-IETF-standardised techniques that use "elliptic-curve cryptography (e.g. XSalsa20Poly1305 [62] for channel encryption and for key exchange) for enciphering communications. [17]"

4.2 Information redactions

Applying the encipherment mechanism on stub-recursive communications shifts the trust towards the recursive resolver. Design enhancements of the packets' content additionally reduce the risks associated with data in the DNS request and with data leaks in the servers. Query Name (QNAME) minimisation [53] and Oblivious DNS [54] are the techniques of this category.

QNAME minimisation reduces query leaks along the upper part of the authoritative name server chain by presenting each authoritative server with only the part of the domain name it needs in order to answer, instead of querying with the fully qualified domain name (FQDN). As a result, only the name server responsible for the name acquires the full QNAME and query type. Thus, it reduces privacy risks on the recursive-auth link. The distinction between the different links of DNS communication is presented in the next chapter. The risk of a data leak still remains at the last authoritative NS in the chain.
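The effect of QNAME minimisation can be illustrated with a minimal sketch. It assumes, for simplicity, that there is a zone cut at every label, which real deployments do not guarantee; the function names are hypothetical and are not taken from [53].

```python
def minimised_queries(fqdn):
    """Names revealed to each server in the chain, from the root downwards.

    Simplifying assumption: every label corresponds to a zone cut.
    """
    labels = fqdn.rstrip(".").split(".")
    # Build suffixes from the TLD outwards: "com.", "example.com.", ...
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


def full_queries(fqdn):
    """Without minimisation, every server in the chain sees the full name."""
    name = fqdn.rstrip(".") + "."
    return [name] * len(name.rstrip(".").split("."))


if __name__ == "__main__":
    for step, (minimised, full) in enumerate(
        zip(minimised_queries("www.example.com"), full_queries("www.example.com")), 1
    ):
        print(f"server {step}: minimised={minimised} full={full}")
```

In this sketch only the last server in the chain receives the complete name, which mirrors the remaining risk noted above.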

Figure 4.4: An overview of Oblivious DNS [63]

Oblivious DNS (ODNS) aims to decouple the client IP address from the DNS query content, so that no single party is able to see both [54]. To decouple client and query name information, ODNS requires two types of servers: a normal recursive resolver and an ODNS authoritative resolver. A client with a dedicated stub resolver fetches an encryption key by contacting the ODNS server without revealing its desired query; instead, it sends a query for 'special.odns'. The procedure results in a unique session key with which the stub resolver encrypts its DNS query. The stub resolver appends the 'odns' TLD to the encrypted query, and the resulting DNS query is sent to the normal recursive resolver. The recursive resolver forwards the query to one of the ODNS authoritative resolvers, which in turn acts as the 'recursive resolver' for the original query. Figure 4.4 presents an overview of ODNS; the illustration is taken from its project page [63].

4.3 Architectural shift

Studies that aimed towards a decentralised DNS system utilised peer-to-peer (P2P) implementations; examples of such solutions are Namecoin and the GNU Name System (GNS). Namecoin [59] is described as a "timeline-based system that relies on a P2P network to manage updates and store the timeline [64]", and it settles any commits related to 'key-value mapping' through transactions that are published in an append-only hash chain (a.k.a. blockchain) [65].

GNS [64, 22] is a 'privacy-preserving' domain name lookup system which takes a radical deviation from the conventional domain resolvers by utilising a P2P network and a Distributed Hash Table (DHT). A DHT provides "support for an operation: given a key, it maps the key onto a node [66]", and this property is used in the GNS domain lookup (resolving) process. In other words, GNS utilises "distributed storage of DNS records in P2P overlay networks [22]". In addition to the DHT, GNS is built upon a petname system [67] and utilises Simple Distributed Security Infrastructure (SDSI).
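The key-to-node operation of a DHT quoted above can be sketched with a toy hash ring. This is a generic illustration and not the routing scheme used by GNS or GNUnet; the class name, node names and the 32-bit ring size are assumptions made for the example.

```python
import bisect
import hashlib


def _ring_position(value):
    # Hash a string to a position on a 2**32 ring (truncated SHA-256).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class ToyDHT:
    """Maps a key onto one of a fixed set of nodes (hypothetical sketch)."""

    def __init__(self, node_names):
        self._ring = sorted((_ring_position(n), n) for n in node_names)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        # First node clockwise from the key's position, wrapping around.
        index = bisect.bisect_left(self._positions, _ring_position(key))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    dht = ToyDHT(["node-a", "node-b", "node-c"])
    print(dht.node_for("www.example.gnu"))
```

The point of the sketch is only that every participant can compute, from the key alone, which node is responsible for a record, without a central registry.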

5 Limitation of DNS Privacy methods

This section discusses limitations of the DNS privacy-enhancing methods presented in the previous chapter. The investigations of the limitations are presented in the same order as the previous categorisation. In the first section, the limitations of channel encipherment are introduced. Afterwards, the focus shifts to information redactions and proposals for structural shifts.

5.1 Risk areas of DNS Privacy

DNS Privacy methods can be analysed further based on which privacy risk they address. S. Bortzmeyer identified the following risk areas of DNS privacy in RFC 7626 [15]:

1. Data in the DNS request

2. On the wire

3. In the servers

4. Re-identification and other inferences

5.2 Two phases of DNS Query process

To keep the further analysis concise, the communication channel (transport channel) of DNS is abstracted into two phases, or paths, based on the DNS query process. Phase 1 refers to steps 1 and 10 of Figure 1.1. These two steps are performed when the stub resolver queries the recursive resolver for a domain name and the recursive resolver replies to the stub resolver. Phase 1 is also denoted as the stub-to-resolver link. Phase 2 refers to steps 2 to 9 in Figure 1.1, where the recursive resolver finds the final answer for the queried name by contacting the relevant authoritative servers in turn. Phase 2 is also called the recursive-to-auth link.

5.2.1 Insufficient measurements on recursive-to-auth link

The distinction between the two phases is noteworthy, as the current implementations that preserve the hierarchical DNS structure do not secure all communication paths towards all parties involved in DNS resolution. As an example, the majority of the methods do not encrypt communications in Phase 2 (recursive-to-auth link), in contrast to Phase 1 (stub-to-resolver link). The major obstacle to applying encryption in Phase 2 is that an authoritative server has a one-to-many server-client relationship with the recursive resolvers. To explain why this relationship is an obstacle: DoH [18] and DoT [45] use the Transport Layer Security (TLS) protocol [45] for encryption, and an authoritative server that has to process many TLS sessions is likely to end up exhausting its computational resources [68], similar to the situation under a Distributed DoS (DDoS) attack. The Internet draft "Next step for DPRIVE: resolver-to-auth link [69]" discusses the aforementioned challenges. Furthermore, authentication mechanisms are missing in Phase 2 [69], and the lack of authentication of the authoritative server may potentially enable a man-in-the-middle (MITM) attack.

5.2.2 Location of Recursive DNS resolvers

The location of the recursive resolver results in different attack surfaces, regardless of whether DNS Privacy methods are applied. In addition, the insufficient protection of recursive-to-auth links complicates matters further. Therefore, it is worthwhile to consider the location of the resolver when addressing the limitations of the channel encipherment approach.

From the end user's point of view, a recursive DNS resolver can be located on a local machine, be provided by the Internet Service Provider (ISP), or be a public DNS server [17]. The choice of location affects the user's privacy differently in terms of cache-sharing [17, 34], obfuscation and logging. Section 1.1.2 described caching on recursive resolvers.

Figure 5.5: A simplified network map when using local recursive resolver server

When a user utilises a local recursive resolver, as illustrated in Figure 5.5, channel encipherment methods do not add value to the user's privacy, considering that operations in Phase 2 are often not encrypted. Supposing that the user does not share the local recursive resolver with others, the DNS queries which the user makes will not be answered from a shared cache but, instead, by all the Authoritative Name Servers (NS) involved. Sending queries in clear text in Phase 2 leaves a possibility for all parties involved in the network packet transmission to monitor the QNAME, query type and source IP of the traffic towards the authoritative NS. Given the assumption that the local recursive resolver is unique to one person, there is no room for obfuscation, since no one else is querying from the IP address of the recursive resolver. However, utilising the local recursive resolver eliminates the risk of queries being logged (i.e. risk (3), in the servers) during Phase 1.

Using an ISP-provided recursive resolver is the most common scenario, as most ISPs offer a DNS resolver to their users via the Dynamic Host Configuration Protocol (DHCP). The resolver of the ISP is shared with other subscribers of the network, which increases the chance that a query has already been cached for another user who resolved the same address previously. Reusing the cache reduces the need for Phase 2 in the response process [34] and thus generates less 'often-insecure' traffic towards authoritative NS. In Phase 2 of DNS resolving, authoritative name servers see the source IP address of the ISP's resolver, but not the IP of the individual, provided that EDNS Client Subnet (ECS) is not in place. When ECS is used, the authoritative NS may see a truncated IP address of the client [13], but this does not cause additional privacy harm, as the ISP's recursive resolver is often in the same subnet IP range and the authoritative NS already knows the source IP of the recursive DNS resolver. Privacy risks incurred by logging may exist with an ISP-provided resolver, as ISPs may be obliged to retain logs due to legal requirements of the countries they operate in [70]. Channel encipherment in Phase 1 adds value to users' privacy only to a limited extent, as tapping the wire between the ISP's recursive resolver and a stub resolver is often more feasible for the ISP itself than for third parties.

Figure 5.6: A simplified network map when using the ISP-provided recursive resolver server

Figure 5.7: A simplified network map when using public recursive resolver server

A scenario of using a public recursive resolver (also known as a Public DNS resolver) is represented in Figure 5.7. A public DNS server is likely to be shared by a broader public than ISP-provided resolvers, which increases the chance of queries already being cached. Authoritative servers see requests from the IP address of the public resolver instead of the stub resolver's when ECS is not applied. This scenario benefits users the most when channel encipherment is applied, because the DNS query contents in Phase 1 are not visible to parties in the middle of the network path. It brings significant obfuscation against tracking down the end user by analysing the network traffic. However, public DNS servers may log the DNS queries and information about the client. Therefore, the privacy risk in the servers remains.

5.3 Observation on packets' size

Siby et al. presented preliminary results of observing DNS-over-HTTPS traffic [71]. They set up two types of experiments: single-query and multi-query. Multi-query refers to simulating the loading of a web page and letting the browser make follow-up queries for the embedded contents of the website. Their single-query simulation reveals that the DNS query-response size tuple tends to be distinct for each queried address, and in the multi-query simulation the degree of uniqueness associated with the DNS query-response sizes increased [71]. If more researchers make similar observations and the conclusion holds, it becomes a concern, since the result indicates that privacy attackers could observe the sizes of the encrypted query-response packets and infer the actual query without decrypting the packets.
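The intuition behind size-based inference can be sketched on the query side alone. The example below is a simplified illustration and not the method of Siby et al.: it computes the unpadded wire-format size of a plain DNS query and assumes that encryption adds a roughly constant overhead and that no padding is applied. The function names and candidate names are hypothetical.

```python
from collections import defaultdict


def query_wire_size(name):
    """Bytes in an unpadded DNS query: 12-byte header, encoded QNAME,
    then 2 bytes each for QTYPE and QCLASS."""
    labels = name.rstrip(".").split(".")
    # Each label costs one length byte plus its characters; +1 for the root label.
    qname_length = sum(1 + len(label) for label in labels) + 1
    return 12 + qname_length + 4


def names_by_size(names):
    """Group candidate names by the size of the query they would produce."""
    groups = defaultdict(list)
    for name in names:
        groups[query_wire_size(name)].append(name)
    return dict(groups)


def uniquely_sized(names):
    """Names whose query size is shared with no other candidate."""
    return sorted(group[0] for group in names_by_size(names).values() if len(group) == 1)


if __name__ == "__main__":
    candidates = ["example.com", "example.org", "a.example.net"]
    print(names_by_size(candidates))
    print(uniquely_sized(candidates))
```

An observer with a list of candidate names could narrow down a query whenever its size falls into a small group; adding the response size to the tuple, as in [71], shrinks the groups further.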

5.4 Privacy leaks by Transitive trust

Shulman demonstrated that 'straightforward application of the encryption alone may not suffice [42]' for protecting DNS Privacy, due to possible disruption of DNS availability and privacy leaks caused by 'transitive trust [72]'. Shulman further analysed transitive trust in terms of (1) fan-out and (2) chain-length. Chain-length refers to "A number of name servers involved in a resolution of a record that initiates the chain" and fan-out to the "number of (transitive-trust) chains involved in a resolution of a domain name" [42]. Transitive trust at authoritative servers poses a potential risk of revealing the query and the client. While DNS QNAME minimisation [53] limits the scope of query name leaks, the extensive usage of DNS in the Content Delivery Network (CDN) context [73] may have increased the chain-length, and it leads to interoperation issues [74].

5.5 Availability concerns on Oblivious DNS

There are several criteria to define availability. The criteria introduced by Pfleeger are (1) having a timely response to a given request, (2) ensuring fair allocation of resources and (3) being fault-tolerant [14]. This report defines availability as the ability of a chosen recursive resolver to resolve the queried address and provide the correct IP address promptly. High availability of DNS is important, since web activities cannot be initiated without name resolution (as mentioned in Section 1.1.1). High performance of DNS (performance here refers to low latency) is also an important factor to consider, because DNS resolutions are a significant cause of long Web waits [75, 76] and have a notable impact on 'user-perceived latency'.

Security concerns arise in terms of availability for Oblivious DNS, as ODNS operations require ODNS authoritative servers in addition to the conventional recursive resolvers, and imposing such a design creates a new 'central point of failure [77]'. Also, questions concerning the confidentiality of the keys used in ODNS and issues with fallback have not been answered [77].

5.6 Prevalence of hierarchical DNS

Although the primary focus of this study is on the domain name resolving part, DNS resolving is only a portion of the entire DNS, as there are diverse actors (such as administrators and domain owners) involved in the system. Figure 5.8, which is presented by SUNET [78], illustrates the machines and servers involved in the system as a whole.

Figure 5.8: Parties involved in DNS as a whole [78]

In the figure, sections are divided vertically. The first column represents organisations or individuals that desire to acquire a new domain name. The third column shows the administrators of delegation on each autonomous domain hierarchy. The legacy of the system is often neglected in attempts to secure the domain name resolution process, which is presented in green colour in the figure.

The naming hierarchy of the DNS "ties into systems such as the Public Key Infrastructure (PKI) [79]", and the architecture of a decentralised DNS, such as Namecoin [59], may not have considered the PKI structure in its design.

5.7 Privacy limitations of Namecoin

Preserving user privacy in Namecoin requires "replication of the full blockchain at the user's end system [64]", and this requirement received the criticism that it "may be impractical for some devices [64]." The criticism can be interpreted as saying that applying a blockchain does not add significant value in terms of preserving the privacy of users, since privacy breaches in the DNS resolving process could equally be addressed in the existing hierarchical system if DNS records could be replicated on the client's system.

5.8 Integrity limitations on decentralised approaches

Ensuring the integrity of DNS records is important, as is evident from 'phishing attacks [80, 81]'. Radical protocol design proposals, however, may harm the integrity of DNS records, as their censorship-proof or legal-attack-proof designs may enable anyone to claim legitimate ownership of a specific domain.

Solutions using a DHT in their design may leave the system vulnerable to Sybil attacks [82, 83]. A Sybil attack is a type of attack scenario where an attacker 'subverts the reputation system by creating a considerable amount of pseudonymous identities [84]' 'in the absence of identification authority [85].'

Intervening against malicious records is more challenging in a decentralised design than in the hierarchical structure of the current DNS, and this could lead to concerns about enabling phishing attacks. Enhancing the confidentiality aspect of DNS security is essential, but it should not compromise the integrity aspect.

5.9 Opportunistic security

Stenberg, a contributor to the client implementation of DNS-over-HTTPS in Firefox, mentioned the following common Secure-DNS challenges [86]: (1) discovery of which recursive resolver to use, (2) probing of service, (3) opportunistic encryption and (4) blocking force-downgrade.

In the case of Firefox's implementation of DNS-over-HTTPS, the modes of operation are (1) race, (2) Trusted Recursive Resolver (TRR) first, (3) TRR only and (4) shadow mode [87]. Race and shadow mode issue DNS queries simultaneously over both the conventional (insecure) DNS resolving method and the secure DNS method. From a DNS privacy perspective, these modes are excluded from the analysis. TRR-first mode attempts to resolve a DNS query over DoH and, when this fails after a pre-set number of attempts, falls back to the conventional DNS resolver. At the current stage, a preliminary experiment, conducted in Appendix B, showed that a limited number of DNS queries still fall back to the conventional resolver and reveal the website the user intended to visit. The actual causes of these query leaks have not been dissected at the moment.
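The difference between the TRR-first and TRR-only modes can be sketched as follows. This is a hypothetical illustration and not Firefox's code: the function, the mode strings, the exception type and the retry count of three are assumptions, and both lookups are injected as plain callables.

```python
class ResolutionError(Exception):
    """Raised by a lookup callable when it cannot resolve a name."""


def resolve(name, mode, secure_lookup, plain_lookup, max_attempts=3):
    """Resolve `name` in "trr_first" or "trr_only" mode.

    Returns (address, channel), where channel is "secure" or "plain".
    max_attempts is an illustrative value, not a documented default.
    """
    if mode not in ("trr_first", "trr_only"):
        raise ValueError(f"unsupported mode: {mode}")
    for _ in range(max_attempts):
        try:
            return secure_lookup(name), "secure"
        except ResolutionError:
            continue
    if mode == "trr_only":
        raise ResolutionError(f"secure resolution failed for {name}")
    # TRR-first: the name is now sent in clear text to the conventional resolver.
    return plain_lookup(name), "plain"
```

The last line of the function marks the point at which a query leaks in TRR-first mode; in TRR-only mode the same failure results in an error instead, trading availability for confidentiality.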

6 Evaluation of alternative approaches

This section provides the result of evaluating possible options for mitigating the limitations of DNS Privacy enhancing methods. DNS Privacy methods that require an architectural shift were excluded from the evaluation.

J. Heugten evaluated combinations of different DNS Privacy techniques in his study [17]. The combinations paired channel-enciphering technologies (e.g. DNSCrypt, DoH and DoT) with QNAME minimisation. The attack surface was evaluated based on the location of the recursive resolver: a local recursive resolver, a private-remote recursive resolver and a public resolver. The collective result of his evaluation showed that "Eavesdroppers who are close to the last chain of the authoritative servers or close to the recursive resolver will be able to obtain a full frame of DNS packet", and that using a public resolver may result in third parties eavesdropping on or logging DNS communication [17]. If the cache of the remote-private recursive resolver is not shared among multiple users, privacy decreases, since the permanent IP address of the remote resolver can be linked to an individual user.

6.1 Use of Traffic anonymisation

Beyond J. Heugten's evaluation, this report further examined combining traffic anonymisation techniques, Tor in particular, with the combinations mentioned earlier. Traffic anonymisation technologies can be applied on the recursive-to-auth link to circumvent the limitations of channel encryption methods; examples of such technologies are FreeNet [88], GNUNet [57] and Tor [89].

Tor, the 'low-latency' traffic anonymisation tool, encrypts the traffic using multiple envelopes, relays the encrypted traffic over three nodes and forwards it to one of its exit nodes. By this method, the traffic is encrypted and made relatively anonymous. The evaluations focus on Tor instead of other anonymising services because FreeNet and GNUNet have a higher round-trip delay than Tor [90].

6.1.1 Lack of UDP support from Tor

Using Tor for regular DNS resolution traffic may not be feasible in a conventional way, since the majority of DNS traffic is sent over UDP and Tor does not support forwarding UDP traffic to its exit nodes [91, 89]. DNS can also be sent over a TCP connection, but this occurs only in limited circumstances, where the response exceeds the 512-byte limit [92], and the authoritative servers contacted must honour TCP requests and reply over TCP for the DNS query to work.

6.1.2 Variation of latency

Tor employs a multi-hop circuit of onion relay servers to increase the anonymity of TCP connections between a user and the exit node. There are several algorithms for selecting relay servers, but by default, Tor gives higher weights to relay servers which are "more longstanding routers and providing higher bandwidth [93]." Since the relay servers are chosen using these criteria, they may be chosen without geographical relevance. Due to the limitations of the physical medium of the communication link [94], once relay servers on a different continent are selected, latency is higher than without Tor. The fact that the choice of relay servers differs for each circuit results in latency variation. In addition, "load-induced queueing contributes significantly to the overall latency variation [95]", and considering that a Tor exit node processes multiple requests, there could be bandwidth limitations that result in queueing. Thus, applying Tor is considered latency-expensive.

6.1.3 Tor on DNS Query Phase 1

DNS Privacy methods with channel encryption already exist. If Tor is nevertheless chosen for Phase 1, this implies that the recursive resolver is not trusted with regard to the risks in the servers. This section evaluates Tor based on the location of the recursive resolver. Using a local resolver does not involve Phase 1 over the network and is therefore not examined.

Using an ISP-provided resolver over Tor does not add value in protection against attackers near the ISP network, as (1) queries are logged in the servers, (2) DNS transactions in Phase 2 are revealed and (3) although the requester is obscured, the connection can be correlated by session-time observation.

Using a public resolver over Tor reduces the risk of eavesdropping on the wire thanks to Tor's encryption, and the risks in the servers are also reduced, since the recursive resolver only sees Tor's exit node when ECS is not used. Eavesdroppers close to the last authoritative servers in the chain remain, but without ECS they can only trace the user back to the exit node of a Tor circuit. However, it is questionable whether the additional benefit of confidentiality at the recursive resolver outweighs the sacrifice of availability.

6.1.4 Tor on DNS Query Phase 2

As previously mentioned, applying Tor to DNS queries is costly. Since DNS channel encryption methods have limitations in Phase 2, applying the anonymity which Tor provides to Phase 2 of the DNS query could be beneficial. However, having control of Phase 2 limits the choice of the recursive resolver. For individuals, it implies using a local recursive resolver. For a group of individuals or an organisation, it implies that outgoing DNS queries from the managed recursive resolver towards authoritative name servers are anonymised using the Tor network.

Despite the variation of latency imposed by the Tor circuit, the performance degradation caused by Tor in Phase 2 could be mitigated by means of 'proactive caching [75]', which actively manages the cache stored on the recursive resolver rather than relying on passively created cache entries that expire after their Time-to-Live (TTL) period.
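A minimal sketch of the proactive caching idea is given below. It is an illustration under stated assumptions and not the scheme of [75]: the class name, the 30-second refresh margin and the injected `lookup` callable (which stands in for a slow Phase 2 resolution, for example over Tor) are hypothetical.

```python
import time


class ProactiveCache:
    """Cache that re-resolves entries shortly before their TTL expires."""

    def __init__(self, lookup, refresh_margin=30.0, clock=time.monotonic):
        self._lookup = lookup            # callable: name -> (address, ttl_seconds)
        self._margin = refresh_margin    # illustrative value, in seconds
        self._clock = clock
        self._entries = {}               # name -> (address, expiry_time)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None or entry[1] <= self._clock():
            # Cache miss or expired entry: the client waits for Phase 2.
            return self._refresh(name)
        return entry[0]

    def _refresh(self, name):
        address, ttl = self._lookup(name)
        self._entries[name] = (address, self._clock() + ttl)
        return address

    def refresh_expiring(self):
        """Run periodically: re-resolve entries that expire within the margin."""
        now = self._clock()
        due = [name for name, (_, expiry) in self._entries.items()
               if expiry - now <= self._margin]
        for name in due:
            self._refresh(name)
        return due
```

If `refresh_expiring` is called from a periodic background task, clients are served from the cache while the latency of the anonymised Phase 2 lookup is paid in the background.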

6.2 Use of multiple trusted recursive resolvers

This section describes a proposal to address the privacy risk of re-identification and other inferences. Regarding the re-identification risk, Section 3.5 introduced a potential privacy violation caused by 'behaviour-based tracking', as proposed by Kirchler et al. [43]. Their study focused on introducing tracking methods using an unsupervised learning technique and suggested that users could circumvent such tracking attempts by changing their IP address frequently. Although the study did not specify how the researchers obtained the logs of the DNS queries of various clients, it is anticipated that once DNS privacy-enhancing methods are in place, the most probable site of privacy violation will be inside the DNS recursive resolver.

To overcome the risks, this section divides the tasks between two groups of administrators: the DNS resolver admin and the local network admin. If the DNS resolver is not managed within the same organisation, the recommendations of Section 6.2.1 can be ignored.

6.2.1 Actions required on local DNS Resolver admin

Administrators could prepare a list of trusted recursive resolvers (TRRs) based on log retention policy and deployment of QNAME minimisation. Once the list is prepared, they could configure the local server to forward DNS queries leaving the organisation to resolvers from the list of TRRs. The designated forwarder could be changed at a certain time interval, such as every six hours.

If it is desirable to obfuscate the frequency of visits or the temporal information of visiting websites, the DNS resolver admin could choose to manage the DNS cache actively by means of 'proactive caching [75]'. Alternatively, 'multi-queries [71]' of frequently visited websites, such as those on the Alexa Top lists, could be performed to maximise the cache-sharing effect [34]. Appendix A can be consulted for achieving this.
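The periodic change of the designated forwarder can be sketched as a simple time-window selection. The sketch is an assumption about how such a rotation might be scripted, not a verified deployment; the function name and the resolver names are placeholders, and applying the result to a concrete resolver configuration is left out.

```python
import time


def current_trr(trr_list, interval_seconds=6 * 3600, now=None):
    """Pick the forwarder for the current time window, cycling through the list.

    The six-hour default follows the example interval in the text.
    """
    if not trr_list:
        raise ValueError("trr_list must not be empty")
    if now is None:
        now = time.time()
    window = int(now // interval_seconds)
    return trr_list[window % len(trr_list)]


if __name__ == "__main__":
    trrs = ["trr-a.example", "trr-b.example", "trr-c.example"]  # placeholder names
    print(current_trr(trrs))
```

A fixed round-robin order is predictable to an observer; choosing the next TRR at random within each window would be an alternative design, at the cost of occasionally reusing the same resolver twice in a row.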

6.2.2 Actions required on Network admin

The DPRIVE working group of the IETF has ongoing standardisation tasks for "DoT announcements using DHCP or Router Advertisements [96]" and "DoH resolver announcement Using DHCP or Router Advertisements [97]". Once the standardisation process is finalised and the networking industry implements the new standards in its products, local network administrators can configure the DHCP server to announce which secure DNS resolver to use. If network administrators change the DHCP configuration to direct clients to a different set of TRRs at a time interval of their choice, it is anticipated that the end clients' TRR configurations will change according to the pre-set DHCP lease time. The network administrators could therefore use the lease time for periodically changing the trusted DNS recursive resolver. However, no study at the moment has verified the current proposal.

7 Discussion

The project examined securing domain name queries as a method of enhancing end users' privacy against pervasive monitoring and presented sections answering the defined objectives. In this chapter, arguments on the current direction of the development of DNS Privacy methods are introduced. Arguments on whether it is ethical to empower DNS Privacy also follow.

7.1 Privacy leaking components apart from the DNS

A question may arise as to why the study mainly focuses on securing DNS, although there exist other factors which disclose users' privacy. To answer the question, the mechanisms of a web filter are examined, as it is a commonly found practical example of large-scale monitoring [98].

A web filter, also known as content-control software, is software that restricts access to content delivered on the Web. Wazen et al. categorise the mechanisms of legacy web filtering into the following techniques: (a) port-based, (b) DNS, (c) IP address, (d) certificate, (e) payload-based and (f) HTTP proxy filtering [99]. Except for DNS filtering, these methods are regarded as well mitigated due to recent developments of the web environment.

Among the filtering techniques mentioned above, methods (a) and (c) are considered less efficient due to changes in the Internet ecosystem in the recent decade: Internet firms such as Google, Facebook and Amazon show a strong presence [100], and this phenomenon may have reduced the diversity of the IP addresses of traffic endpoints. Moreover, it has become more common to have web services deployed in cloud environments [101], and IaaS providers extensively use 'Virtual Host [3]', which means that various web servers correspond to the same IP address. It also eliminates the need to utilise different ports to co-host services. Thus, port usage is normalised.

Another notable change is that the adoption of HTTPS on the web has increased significantly [102]. The change has increased the cost of performing technique (e) and brought challenges in payload-based traffic classification [103]. It has also made (f) less applicable, as a proxy does not directly process encrypted traffic [99]. Furthermore, the combination of the wide deployment of HTTPS and virtual hosting has made technique (d) inefficient, because "many companies share the same certificate across different services and domain names [99]". However, these changes of the Internet have not brought additional challenges to Domain Name System (DNS) filtering. Therefore, the project studies how to remedy the weakest point of users' privacy, which in this case is DNS.

7.2 DNS Privacy - possible aids for the criminals

At the beginning of the report, the thesis argued that securing the architecture of DNS is necessary, as its vulnerability can be exploited not only by legitimate parties but also by malevolent ones. It is feasible that perpetrators may use DNS Privacy enhancement methods to hide their activities, and DNS Privacy, when in good practice, makes it difficult for investigators to eavesdrop on queries.

In the U.S. and other common law jurisdictions, hindering lawful enforcement may be seen as "Obstruction of justice". From this perspective, it could be questioned whether hiding DNS queries by means of encryption constitutes concealing or covering up with the intent to impede or obstruct the investigation or proper administration, as 18 U.S. Code §1591 states [104].

M. Bay performed a thought experiment on the relation between civil disobedience and unbreakable encryption [105], based on John Rawls' theory of justice. Not all DNS Privacy enhancing methods are directly linked to 'unbreakable encryption', since encryption is breakable with 'the given sufficient time and computational resources [106, 107]'. However, the aspect is worth introducing, as exercising DNS Privacy is similar in the sense that DNS Privacy methods aim to achieve an unbreakable-encryption-like situation. Bay made three assumptions before applying Rawlsian principles to encryption [105]:

1. Unbreakable/impenetrable encryption is indeed impenetrable

2. The encryption in question is available to citizens and is not exclusive to certain institutions within society

3. The society examined here can be described as well-ordered [108, 109] in Rawls’ terminology

The thought experiment led to the following points: "In a well-ordered society, an obstruction of law enforcement is at the same time an obstruction of principles of justice agreed upon by the society's citizens." and "if society is just, citizens must comply with its institutions in order for it to remain just." However, in the case of "societies are only partially just or in which, say, an unjust war is waged by an otherwise just society. Then, the basic liberties of the individual take precedence, in the push for the restoration of justice [109]." Nevertheless, according to Bay, civil disobedience and conscientious refusal lead to a violation of Rawls' principles under the third assumption, namely that the society is well-ordered.

With regard to applying utilitarian arguments to encryption, M. Bay concluded that "utilitarian reasoning does not provide us with a solution with regard to the conflict between encryption, privacy and enforcement of justice, since it simply becomes a version of the age-old conflict between security and freedom at a higher level of abstraction [105]."

Judging whether empowering DNS privacy is just or not is a debatable question in many respects. However, encryption, or having privacy, facilitates freedom of speech in an unjust society, and this aspect shall not be disregarded. A distinction between the ethical discussion on encryption and DNS Privacy is that DNS operators are obliged to retain their logs for criminal investigation purposes, and therefore, encouraging this privacy can hardly be seen as promoting illegal activities.

7.3 Recursive resolver centralisation

E. Nygren from Akamai Technologies expressed concerns about the increasing use of public resolvers. He claimed that the tendency carries ‘the risk of consolidating key parts of the Internet to rely on few services’ and results in ‘significantly impaired Internet performance’ for some use cases when ECS information is not sent [79].

7.3.1 DNS Privacy promoting the use of Public DNS servers

It is, however, questionable whether DNS Privacy technologies encourage users to choose public recursive resolvers run by the most popular recursive resolver operators. Because manual configuration of the TRR is difficult, users are likely to keep the default TRR, which are the [110] for Android 9’s DoT implementation [111] and Cloudflare in the case of Firefox. On the other hand, if more Internet service providers supported DoH or DoT resolvers and gained the users’ trust, users would have no necessary reason to switch deliberately to public resolvers instead of using ISP-provided resolvers, once the DHCP standardisations [96, 97] are finalised and the configuration is in practice.

7.3.2 Allegations on performance decrease

Not providing ECS information could impair the performance of an Internet service, because insufficient information on the geographical location of users misleads the choice of a ‘topologically localised [13]’ server for serving the client.

However, before blaming users for choosing recursive resolvers with ECS turned off, or recursive DNS operators for not supporting ECS, the lack of client-side means for users to opt out of ECS should be addressed. According to Kintis et al., “nearly five years after the ECS draft was proposed, there are still no client-centric tools that empower users to control how much of their IP address is revealed [13]”. It is anticipated that once users gain control of how much of their IP address to truncate, the decision on whether to sacrifice availability for confidentiality will rest with the users. It is not for the content provider to prohibit Internet users from making their own choices for the sake of performance.
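The kind of user-controlled truncation discussed above can be illustrated with a minimal sketch. It only shows how a chosen prefix length determines which part of an address would be disclosed; the function name truncate_for_ecs and the sample addresses (taken from documentation address ranges) are assumptions made for illustration, and the sketch is not part of any existing ECS client tool.

```python
import ipaddress


def truncate_for_ecs(client_ip: str, prefix_len: int) -> str:
    """Return the network prefix that would be disclosed for client_ip."""
    # strict=False accepts a host address and zeroes the bits beyond the prefix.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network)


if __name__ == "__main__":
    # A shorter prefix discloses less of the address; 0 discloses nothing.
    for bits in (32, 24, 16, 0):
        print(bits, truncate_for_ecs("198.51.100.77", bits))
```

A shorter prefix improves confidentiality but gives the content provider less information for selecting a nearby server, which is the trade-off described in this section.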

7.4 IP as a human identifiable information

Opponents of ECS often argue that ECS may reveal the IP address of the end-user behind the DNS query, and therefore may compromise the person’s privacy. However, whether a truncated IP address by itself should be seen as a person-identifying factor is an ongoing debate. A client’s IP address has the potential to harm privacy, but the main factors that contribute to privacy infringement may be the distinctness (uniqueness) of the address and whether the IP address can be linked with other identifiers.
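The notion of distinctness can be made concrete with a small calculation: the shorter the disclosed prefix, the more addresses share it. The sketch below counts the addresses covered by a prefix. The function name addresses_sharing_prefix is an assumption for illustration, and the count is only an upper bound on the number of clients, since not every address in a prefix is in use.

```python
import ipaddress


def addresses_sharing_prefix(prefix: str) -> int:
    """Number of addresses that collapse to the same truncated prefix."""
    return ipaddress.ip_network(prefix, strict=False).num_addresses


if __name__ == "__main__":
    for prefix in ("198.51.100.77/32", "198.51.100.0/24", "198.51.0.0/16"):
        print(prefix, addresses_sharing_prefix(prefix))
```

Whether such a group is large enough to prevent identification still depends on whether the prefix can be linked with other identifiers, as noted above.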

7.5 QNAME minimisation

BIND, the most commonly deployed software for DNS servers, supports QNAME minimisation by default from version 9.14.0 [112]. This is a significant achievement for the DNS Privacy field, considering the market dominance of BIND. Since the technology is enabled by default upon update, deploying the privacy technique is easy for recursive DNS server administrators.

When it comes to choosing which recursive resolver to use, DNS server operators and the literature on DNS Privacy often recommend examining whether the server supports ECS or not. Since the implementation of QNAME minimisation is officially in place, whether an arbitrary recursive resolver supports QNAME minimisation could also be a parameter to consider in choosing the TRR.
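The effect of QNAME minimisation [53] can be sketched as follows. Instead of sending the full name to every authoritative server, the resolver reveals one more label at each delegation level. The sketch assumes, for simplicity, a zone cut at every label, which is not true in general; the function name minimised_queries is hypothetical and this is not how BIND implements the feature.

```python
def minimised_queries(qname: str) -> list:
    """List the names a minimising resolver would reveal, from the root downwards."""
    labels = qname.rstrip(".").split(".")
    # Reveal one additional label per level: "com.", "example.com.", ...
    return [".".join(labels[-depth:]) + "." for depth in range(1, len(labels) + 1)]


if __name__ == "__main__":
    # Without minimisation, every server on the path would see the full name.
    for name in minimised_queries("www.example.com"):
        print(name)
```

Under this assumption, only the last server on the path sees the full name, whereas without minimisation the root and top-level-domain servers would see it as well.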

7.6 Vulnerability of Tor

The current study presented DNS-over-Tor as a possible solution for addressing the limitations of current DNS Privacy technology. However, utilising Tor implies that such practice inherits the vulnerabilities of Tor as well.

As an example of the vulnerabilities of the Tor network, Johnson et al. presented a study showing that Tor is susceptible to correlation attacks [113]. In other words, if the entry node and the exit node of a Tor circuit were operated by the same party, or the metadata of the communication were shared by these parties, correlating the circuit session is feasible. The study concluded that an adversary could deanonymise any given user who uses Tor regularly with over 50% probability within three months and over 80% within six months.

This fact raises the question of whether applying Tor to DNS traffic is an adequate choice, considering the technology’s high cost of variable latency. Nevertheless, DNS operations over Tor possess slightly different characteristics from regular web browsing, because DNS queries generate much less network traffic than loading websites does. For an arbitrary attacker, correlating DNS queries over Tor would require sophisticated efforts. Furthermore, the attack is costly in terms of time. If an attacker managed to correlate the Tor traffic only after three months, the information might no longer be valuable. However, to assess whether the attack is worthwhile, the log retention period for subscriber information and the DHCP allocation of the client’s Internet service provider need to be considered.
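The intuition behind the growth of the deanonymisation probability over time can be illustrated with a toy model. The sketch assumes that each circuit picks its entry and exit nodes independently and that the adversary controls a fixed share of each; both assumptions are simplifications (Tor clients keep their entry guards for long periods), and this is not the model used by Johnson et al. [113]. The function names and the sample shares are hypothetical.

```python
def circuit_compromise_probability(guard_share: float, exit_share: float) -> float:
    """Probability that one circuit uses an adversarial entry and an adversarial exit."""
    return guard_share * exit_share


def compromise_within(circuits: int, guard_share: float, exit_share: float) -> float:
    """Probability that at least one of the given circuits is compromised."""
    per_circuit = circuit_compromise_probability(guard_share, exit_share)
    # Complement of "every circuit escaped compromise", assuming independence.
    return 1.0 - (1.0 - per_circuit) ** circuits


if __name__ == "__main__":
    # Illustrative shares only; they are not measurements of the Tor network.
    for n in (1, 10, 100, 1000):
        print(n, round(compromise_within(n, 0.05, 0.05), 4))
```

Even a small per-circuit probability accumulates as more circuits are built, which is why the duration of regular use matters in the figures cited above.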

8 Conclusion

As its first research question, the study aimed to explore the benefits and disadvantages of DNS Privacy to different actors. The study demonstrated harms caused by a lack of DNS Privacy on both the individual and the organisational level. Consequently, identifying privacy threats showed the benefits of having DNS Privacy. However, the disadvantages of DNS Privacy have not been sufficiently identified.

For the second research question, identifying areas that the existing DNS Privacy enhancement methods could not address, the study demonstrated the limitations of DNS Privacy technologies based on the categorisation the study made. Channel encipher methods were not sufficient in Phase 2, information redaction approaches raised availability concerns, and proposals to change the architecture of DNS showed integrity issues.

To answer the last research question, which is to identify a suitable combination of technologies to overcome the limitations, the study evaluated using Tor and the potential use of the ongoing DHCP standardisation for frequently changing the trusted recursive resolver. The end-users, whose interests are analysed in the report, could refer to the report to meet their interests, and the approach used in the analysis could be applied in examining other emerging innovations in DNS Privacy.

As the conclusion of its investigations into privacy impacts on different user groups and circumventions of the DNS Privacy technologies, the report recommends the following practices. Private individuals could utilise the already standardised techniques of DNS-over-HTTPS and DNS-over-TLS, but changing the trusted recursive resolver at an arbitrary interval is recommended. Corporations or organisations could benefit from utilising the already available QNAME minimisation in their recursive resolvers, and could also consider using DNS-over-Tor as presented in the report. To address the privacy infringement scenarios, organisations could configure their managed devices to forward all DNS queries through their trusted recursive resolver to avoid identity leaks.
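The recommendation to change the trusted recursive resolver at an interval could be realised in many ways. The following is a minimal sketch of one of them: a deterministic rotation that maps the current time slot to an entry in a resolver list. The function name pick_resolver, the one-hour interval and the resolver URLs (placeholders under the reserved .example domain) are all assumptions made for illustration; the study itself does not prescribe a selection scheme.

```python
import time


def pick_resolver(resolvers, interval_seconds, now=None):
    """Pick the resolver for the current time slot, cycling through the list."""
    if not resolvers:
        raise ValueError("at least one resolver is required")
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    current = time.time() if now is None else now
    # Each slot lasts interval_seconds; consecutive slots use consecutive resolvers.
    slot = int(current // interval_seconds)
    return resolvers[slot % len(resolvers)]


if __name__ == "__main__":
    # Hypothetical resolver endpoints, not real services.
    trusted = [
        "https://resolver-a.example/dns-query",
        "https://resolver-b.example/dns-query",
        "https://resolver-c.example/dns-query",
    ]
    print(pick_resolver(trusted, interval_seconds=3600))
```

A round-robin schedule is predictable; choosing the next resolver at random would be an alternative design, at the cost of a less even distribution of queries over short periods.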

8.1 Scientific contribution

Although other studies of DNS Privacy techniques exist, this report distinguishes itself by being transparent about its survey method and by providing an approach to categorise the DNS Privacy methods.

8.2 Current milestone

The report presented the status of client implementations of different DNS Privacy techniques and the progress of Internet standardisation. Although certain limitations of DNS Privacy methods exist, as presented earlier, DNS Privacy is making steady progress.

8.3 Future work

Due to time limitations, the proof-of-concept proposal made in this study has not been thoroughly examined. The actual implementation of the proposal, especially DNS-over-Tor in Phase 2, and an examination of the privacy contribution of using multiple trusted resolvers can be done in future studies. More specifically, future studies could explore how the privacy contribution correlates with the interval at which Secure DNS servers (Trusted Recursive Resolvers) are changed.

References

[1] S. Farrell and H. Tschofenig, “Pervasive Monitoring Is an Attack,” RFC 7258, May 2014. [Online]. Available: ://rfc-editor.org/rfc/rfc7258.txt

[2] D. Herrmann, C. Gerber, C. Banse, and H. Federrath, “Analyzing characteristic host access patterns for re-identification of web user sessions,” in Information Security Technology for Applications, T. Aura, K. Järvinen, and K. Nyberg, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 136–154.

[3] The Apache Software Foundation. (2019) Apache virtual host documentation. [Online]. Available: https://httpd.apache.org/docs/2.4/vhosts/

[4] P. Mockapetris, “Domain names - concepts and facilities,” Internet Requests for Comments, RFC Editor, STD 13, November 1987. [Online]. Available: http://www.rfc-editor.org/rfc/rfc1034.txt

[5] R. Braden, “Requirements for internet hosts - application and support,” Internet Requests for Comments, RFC Editor, STD 3, October 1989.

[6] P. E. Hoffman, A. Sullivan, and K. Fujiwara, “DNS Terminology,” RFC 8499, Jan. 2019. [Online]. Available: https://rfc-editor.org/rfc/rfc8499.txt

[7] R. Elz, R. Bush, S. Bradner, and M. Patton, “Selection and operation of secondary dns servers,” Internet Requests for Comments, RFC Editor, BCP 16, July 1997.

[8] J. Postel, “Domain name system structure and delegation,” Internet Requests for Comments, RFC Editor, RFC 1591, March 1994. [Online]. Available: http://www.rfc-editor.org/rfc/rfc1591.txt

[9] J. Damas, M. Graff, and P. A. Vixie, “Extension Mechanisms for DNS (EDNS(0)),” RFC 6891, Apr. 2013. [Online]. Available: https://rfc-editor.org/rfc/rfc6891.txt

[10] P. Špaček, “Will your dns break in 2019?” Oct 2018, RIPE 77 Workshop, Amsterdam. [Online]. Available: https://ripe77.ripe.net/presentations/7-flagday.pdf

[11] P. Špaček, Ólafur Guðmundsson, and O. Surý, “Minimal EDNS compliance requirements,” Internet Engineering Task Force, Internet-Draft draft-spacek-edns-camel-diet-01, May 2018, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-spacek-edns-camel-diet-01

[12] C. Contavalli, W. van der Gaast, D. C. Lawrence, and W. A. Kumari, “Client Subnet in DNS Queries,” RFC 7871, May 2016. [Online]. Available: https://rfc-editor.org/rfc/rfc7871.txt

[13] P. Kintis, Y. Nadji, D. Dagon, M. Farrell, and M. Antonakakis, “Understanding the privacy implications of ecs,” in Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9721, ser. DIMVA 2016. Berlin, Heidelberg: Springer-Verlag, 2016, pp. 343–353. [Online]. Available: https://doi.org/10.1007/978-3-319-40667-1_17

[14] C. P. Pfleeger, S. L. Pfleeger, and J. Margulies, Security in computing, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2015.

[15] S. Bortzmeyer, “DNS Privacy Considerations,” RFC 7626, Aug 2015. [Online]. Available: https://rfc-editor.org/rfc/rfc7626.txt

[16] P. Werneck, “Dns privacy,” in Network Architectures and Services, vol. 141. Technische Universität München, 2014.

[17] J. van Heugten, “Privacy analysis of dns resolver solutions,” Master’s thesis, University of Amsterdam, 2018.

[18] P. E. Hoffman and P. McManus, “DNS Queries over HTTPS (DoH),” RFC 8484, Oct. 2018. [Online]. Available: https://rfc-editor.org/rfc/rfc8484.txt

[19] R. Barnes, B. Schneier, C. Jennings, T. Hardie, B. Trammell, C. Huitema, and D. Borkmann, “Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement,” RFC 7624, Aug. 2015. [Online]. Available: https://rfc-editor.org/rfc/rfc7624.txt

[20] M. Thomas and D. Wessels. (2014, Oct) An analysis of tcp traffic in root server ditl data. [Online]. Available: https://indico.dns-oarc.net/event/20/contributions/270/attachments/249/465/rtt_Oarc_ThomasWessels_v1.pdf

[21] P. Mockapetris, “Domain names - implementation and specification,” Internet Requests for Comments, RFC Editor, STD 13, November 1987. [Online]. Available: http://www.rfc-editor.org/rfc/rfc1035.txt

[22] M. Wachs, M. Schanzenbach, and C. Grothoff, “A censorship-resistant, privacy-enhancing and fully decentralized name system,” in International Conference on Cryptology and Network Security. Springer, 2014, pp. 127–142.

[23] E. Stoner, “Dns footprint of malware,” in Proceedings of 2010 OARC Workshop, vol. 2, 2010.

[24] R. Lemos, “Got malware? three signs revealed in dns traffic,” Dark Reading, informatech, May 2013. [Online]. Available: https://www.darkreading.com/analytics/security-monitoring/got-malware-three-signs-revealed-in-dns-traffic/d/d-id/1139680

[25] National Security Agency. (2014, Mar) Case studies of integrated cyber operation techniques. [Online]. Available: https://search.edwardsnowden.com/docs/CaseStudiesofIntegratedCyberOperationTechniques2014-03-12_nsadocs_snowden_doc

[26] N. Weaver, C. Kreibich, and V. Paxson, “Redirecting dns for ads and profit.” FOCI, vol. 2, pp. 2–3, 2011.

[27] A. Oulasvirta, A. Pihlajamaa, J. Perkiö, D. Ray, T. Vähäkangas, T. Hasu, N. Vainio, and P. Myllymäki, “Long-term effects of ubiquitous surveillance in the home,” in Proceedings of the 2012 ACM Conference on Ubiquitous Computing, ser. UbiComp ’12. New York, NY, USA: ACM, 2012, pp. 41–50. [Online]. Available: http://doi.acm.org/10.1145/2370216.2370224

[28] A. Cooper, H. Tschofenig, B. D. Aboba, J. Peterson, J. Morris, M. Hansen, and R. Smith, “Privacy Considerations for Internet Protocols,” RFC 6973, Jul 2013. [Online]. Available: https://rfc-editor.org/rfc/rfc6973.txt

[29] N. Hu, W. Deng, and S. Yao, “Issues and challenges of internet dns security,” Chinese Journal of Network and Information Security, vol. 3, no. 3, pp. 13–21, 2017.

[30] R. T. Braden, “Requirements for Internet Hosts - Communication Layers,” RFC 1122, Tech. Rep. 1122, Oct. 1989. [Online]. Available: https://rfc-editor.org/rfc/rfc1122.txt

[31] R. H. Von Alan, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS quarterly, vol. 28, no. 1, pp. 75–105, 2004.

[32] “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (text with EEA relevance),” OJ, vol. L 119, pp. 1–88, 2016-05-04.

[33] Internet Corporation for Assigned Names and Numbers. (2019) About WHOIS. [Online]. Available: https://whois.icann.org/en/about-whois

[34] Z. Wang, “Analysis of dns cache effects on query distribution,” The Scientific World Journal, vol. 2013, 2013.

[35] R. MacKinnon, “China’s censorship 2.0: How companies censor bloggers,” First Monday, vol. 14, no. 2, 2009.

[36] Medtronic. (2019) Installing and using the carelink uploader. [Online]. Available: https://www.medtronicdiabetes.com/customer-support/carelink-software-support/installing-and-uploading

[37] ——. (2019, April) Medtronic and ibm watson health: Surfacing new insights together. [Online]. Available: https://www.medtronic.com/content/dam/medtronic-com/transforming-healthcare/documents/medtronic-ibm-watson-health-sugariq-diabetes-app_paper_tp_mdt_corpmark.pdf

[38] B. Sevi, T. Aral, and T. Eskenazi, “Exploring the hook-up app: Low sexual disgust and high sociosexuality predict motivation to use tinder for casual sex,” Personality and Individual Differences, vol. 133, pp. 17–20, 2018.

[39] W. C. Goedel and D. T. Duncan, “Geosocial-networking app usage patterns of gay, bisexual, and other men who have sex with men: survey among users of grindr, a mobile dating app,” JMIR public health and surveillance, vol. 1, no. 1, p. e4, 2015.

[40] M. Brinkmann. (2017, Jul) Disable the preloading of firefox autocomplete urls. [Online]. Available: https://www.ghacks.net/2017/07/24/disable-preloading-firefox-autocomplete-urls/

[41] J. Roskind. (2008, Sep) Dns prefetching (or pre-resolving). [Online]. Available: https://blog.chromium.org/2008/09/dns-prefetching-or-pre-resolving.html

[42] H. Shulman, “Pretty bad privacy: Pitfalls of dns encryption,” in Proceedings of the 13th Workshop on Privacy in the Electronic Society, ser. WPES ’14. New York, NY, USA: ACM, 2014, pp. 191–200. [Online]. Available: http://doi.acm.org/10.1145/2665943.2665959

[43] M. Kirchler, D. Herrmann, J. Lindemann, and M. Kloft, “Tracked without a trace: Linking sessions of users by unsupervised learning of patterns in their dns traffic,” in Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, ser. AISec ’16. New York, NY, USA: ACM, 2016, pp. 23–34. [Online]. Available: http://doi.acm.org/10.1145/2996758.2996770

[44] ITU-T Recommendation X.800, “Security architecture for open systems interconnection for CCITT applications.” 1991.

[45] Z. Hu, L. Zhu, J. Heidemann, A. Mankin, D. Wessels, and P. E. Hoffman, “Specification for DNS over Transport Layer Security (TLS),” RFC 7858, May 2016. [Online]. Available: https://rfc-editor.org/rfc/rfc7858.txt

[46] K. T. Reddy, D. Wing, and P. Patil, “DNS over Datagram Transport Layer Security (DTLS),” RFC 8094, Feb 2017. [Online]. Available: https://rfc-editor.org/rfc/rfc8094.txt

[47] T. H. Bucuti and R. Dantu, “An opportunistic encryption extension for the dns protocol,” in 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, 2015, pp. 194–194.

[48] S. Dickinson, D. K. Gillmor, and K. T. Reddy, “Usage Profiles for DNS over TLS and DNS over DTLS,” RFC 8310, Mar. 2018. [Online]. Available: https://rfc-editor.org/rfc/rfc8310.txt

[49] T. Saraj and M. Yousaf, “Design and implementation of a lightweight privacy extension of dnssec protocol,” in 2017 13th International Conference on Emerging Technologies (ICET). IEEE, 2017, pp. 1–6.

[50] C. Huitema, M. Shore, A. Mankin, S. Dickinson, and J. Iyengar, “Specification of DNS over Dedicated QUIC Connections,” Internet Engineering Task Force, Internet-Draft draft-huitema-quic-dnsoquic-06, Mar. 2019, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-huitema-quic-dnsoquic-06

[51] F. Denis and Y. Fu, “Dnscrypt,” 2015.

[52] M. Dempsky, “DNSCurve: Link-Level Security for the Domain Name System,” Internet Engineering Task Force, Internet-Draft draft-dempsky-dnscurve-01, Feb. 2010, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-dempsky-dnscurve-01

[53] S. Bortzmeyer, “DNS Query Name Minimisation to Improve Privacy,” RFC 7816, Mar 2016. [Online]. Available: https://rfc-editor.org/rfc/rfc7816.txt

[54] A. Edmundson, P. Schmitt, N. Feamster, and A. Mankin, “Oblivious DNS - Strong Privacy for DNS Queries,” Internet Engineering Task Force, Internet-Draft draft-annee-dprive-oblivious-dns-00, Jul. 2018, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-annee-dprive-oblivious-dns-00

[55] L. Pan, X. Zhang, A. Hu, X. Yuchi, and J. Wang, “Mitigating client subnet leakage in dns queries,” in 2018 16th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2018, pp. 1–2.

[56] M. Ambrosin, A. Compagno, M. Conti, C. Ghali, and G. Tsudik, “Security and privacy analysis of national science foundation future internet architectures,” IEEE Communications Surveys Tutorials, vol. 20, no. 2, pp. 1418–1442, Secondquarter 2018.

[57] C. Grothoff, “The gnunet system,” Ph.D. dissertation, Université de Rennes 1, 2017.

[58] D. E. Asoni, S. Hitz, and A. Perrig, “A paged domain name system for query privacy,” in International Conference on Cryptology and Network Security. Springer, 2017, pp. 250–273.

[59] A. Loibl and J. Naab, “Namecoin,” namecoin.info, 2014.

[60] E. Rescorla and N. Modadugu, “Datagram Transport Layer Security,” RFC 4347, Tech. Rep. 4347, Apr 2006. [Online]. Available: https://rfc-editor.org/rfc/rfc4347.txt

[61] J. Iyengar and M. Thomson, “QUIC: A UDP-Based Multiplexed and Secure Transport,” Internet Engineering Task Force, Internet-Draft draft-ietf-quic-transport-20, Apr. 2019, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-ietf-quic-transport-20

[62] S. Nick. (2015, Feb) Do the chacha: better mobile performance with cryptography. [Online]. Available: https://blog.cloudflare.com/do-the-chacha-better-mobile-performance-with-cryptography/

[63] A. Edmundson, P. Schmitt, and N. Feamster. (2018, Mar) Odns: Oblivious dns. [Online]. Available: https://odns.cs.princeton.edu/

[64] C. Grothoff, M. Wachs, M. Ermert, and J. Appelbaum, “Nsa’s morecowbell: knell for dns,” Technical report, Tech. Rep., 2017.

[65] H. A. Kalodner, M. Carlsten, P. Ellenbogen, J. Bonneau, and A. Narayanan, “An empirical study of namecoin and lessons for decentralized namespace design.” in WEIS. Citeseer, 2015.

[66] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” ACM SIGCOMM Computer Communication Review, vol. 31, no. 4, pp. 149–160, 2001.

[67] M. Stiegler, “An introduction to petname systems,” in Advances in Financial Cryptography Volume 2. Ian Grigg. Citeseer, 2005.

[68] S. Bhople, “Server based dos vulnerabilities in ssl/tls protocols,” Master’s thesis, Technische Universiteit Eindhoven, 2012.

[69] S. Bortzmeyer, “Next step for DPRIVE: resolver-to-auth link,” Internet Engineering Task Force, Internet-Draft draft-bortzmeyer-dprive-step-2-05, Dec. 2016, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-bortzmeyer-dprive-step-2-05

[70] U.K. Parliament. Investigatory Powers Act 2016 chapter 25. [Online]. Available: http://www.legislation.gov.uk/ukpga/2016/25/contents/enacted

[71] S. Siby, M. Juarez, N. Vallina-Rodriguez, and C. Troncoso, “Dns privacy not so private: the traffic analysis perspective,” 2018, the 18th Privacy Enhancing Technologies Symposium, Barcelona.

[72] V. Ramasubramanian and E. G. Sirer, “Perils of transitive trust in the domain name system,” in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, ser. IMC ’05. Berkeley, CA, USA: USENIX Association, 2005, pp. 35–35. [Online]. Available: http://dl.acm.org/citation.cfm?id=1251086.1251121

[73] Z. Wang, J. Huang, and S. Rose, “Evolution and challenges of dns-based cdns,” Digital Communications and Networks, vol. 4, no. 4, pp. 235 – 243, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2352864817300731

[74] S. Huque, “Query name minimization and authoritative dns server behavior,” May 2015, DNS-OARC 2015 Spring Workshop, Amsterdam. [Online]. Available: https://indico.dns-oarc.net/event/21/contributions/298/attachments/267/487/qname-min.pdf

[75] E. Cohen and H. Kaplan, “Proactive caching of dns records: Addressing a performance bottleneck,” Computer Networks, vol. 41, no. 6, pp. 707–726, 2003.

[76] J. Jung, E. Sit, H. Balakrishnan, and R. Morris, “Dns performance and the effectiveness of caching,” IEEE/ACM Transactions on networking, vol. 10, no. 5, pp. 589–603, 2002.

[77] B. Haberman, “Minutes ietf102: dprive,” IETF, Minutes minutes-102-dprive-00, July 2018. [Online]. Available: https://datatracker.ietf.org/doc/minutes-102-dprive/

[78] J. Städje. (2018, Jan) Dns och dnssec utan facksnack. [Online]. Available: https://www.sunet.se/blogg/dns-och-dnssec-utan-facksnack/

[79] E. Nygren. (2018, Oct) Architectural paths for evolving the dns. [Online]. Available: https://blogs.akamai.com/2018/10/architectural-paths-for-evolving-the-dns.html

[80] S. Ariyapperuma and C. J. Mitchell, “Security vulnerabilities in dns and dnssec,” in The Second International Conference on Availability, Reliability and Security (ARES’07). IEEE, 2007, pp. 335–342.

[81] G. Ollmann, “The phishing guide–understanding & preventing phishing attacks,” NGS Software Insight Security Research, 2004.

[82] L. Wang and J. Kangasharju, “Real-world sybil attacks in bittorrent mainline dht,” in 2012 IEEE Global Communications Conference (GLOBECOM), Dec 2012, pp. 826–832.

[83] E. Sit and R. Morris, “Security considerations for peer-to-peer distributed hash tables,” vol. 2429. Springer Verlag, 2002, pp. 261–269.

[84] Z. Trifa and M. Khemakhem, “Sybil nodes as a mitigation strategy against sybil attack,” Procedia Computer Science, vol. 32, pp. 1135 – 1140, 2014, the 5th International Conference on Ambient Systems, Networks and Technologies (ANT-2014), the 4th International Conference on Sustainable Energy Information Technology (SEIT-2014). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050914007443

[85] J. R. Douceur, “The sybil attack,” in International workshop on peer-to-peer systems. Springer, 2002, pp. 251–260.

[86] S. Daniel, “Dns-over-https - the good, the bad and the ugly,” Feb 2019, [Videorecording] FOSDEM 19, Brussels. [Online]. Available: https://ftp.heanet.ie/mirrors/fosdem-video/2019/Janson/dns_over_http.mp4

[87] ——. (2018, Jun) Inside Firefox’s DoH engine. [Online]. Available: https://daniel.haxx.se/blog/2018/06/03/inside-firefoxs-doh-engine/

[88] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, “Freenet: A distributed anonymous information storage and retrieval system,” in Designing privacy enhancing technologies. Springer, 2001, pp. 46–66.

[89] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” Naval Research Lab Washington DC, Tech. Rep., 2004.

[90] S. Brack, R. Muth, S. Dietzel, and B. Scheuermann, “Anonymous datagrams over dns records,” in 2018 IEEE 43rd Conference on Local Computer Networks (LCN), Oct 2018, pp. 536–544.

[91] Tor project et al. (2017, May) UDP over Tor. [Online]. Available: https://trac.torproject.org/projects/tor/ticket/7830

[92] J. Dickinson, S. Dickinson, R. Bellis, A. Mankin, and D. Wessels, “DNS Transport over TCP - Implementation Requirements,” RFC 7766, Tech. Rep. 7766, Mar 2016. [Online]. Available: https://rfc-editor.org/rfc/rfc7766.txt

[93] C. Wacek, H. Tan, K. S. Bauer, and M. Sherr, “An empirical evaluation of relay selection in tor.” in NDSS, vol. 13, 2013, pp. 24–27.

[94] A. Singla, B. Chandrasekaran, P. B. Godfrey, and B. Maggs, “The internet at the speed of light,” in Proceedings of the 13th ACM Workshop on Hot Topics in Networks, ser. HotNets-XIII. New York, NY, USA: ACM, 2014, pp. 1:1–1:7. [Online]. Available: http://doi.acm.org/10.1145/2670518.2673876

[95] T. Høiland-Jørgensen, B. Ahlgren, P. Hurtig, and A. Brunstrom, “Measuring latency variation in the internet,” in Proceedings of the 12th International on Conference on Emerging Networking EXperiments and Technologies, ser. CoNEXT ’16. New York, NY, USA: ACM, 2016, pp. 473–480. [Online]. Available: http://doi.acm.org/10.1145/2999572.2999603

[96] T. Peterson, “DNS over Transport Layer Security announcements using DHCP or Router Advertisements,” Internet Engineering Task Force, Internet-Draft draft-peterson-dot-dhcp-00, Apr. 2019, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-peterson-dot-dhcp-00

[97] ——, “DNS over HTTP resolver announcement Using DHCP or Router Advertisements,” Internet Engineering Task Force, Internet-Draft draft-peterson-doh-dhcp-00, Apr. 2019, work in Progress. [Online]. Available: https://datatracker.ietf.org/doc/html/draft-peterson-doh-dhcp-00

[98] S. J. Murdoch and R. Anderson, “Tools and technology of internet filtering,” Access denied: The practice and policy of global internet filtering, vol. 1, no. 1, p. 58, 2008.

[99] W. M. Shbair, T. Cholez, A. Goichot, and I. Chrisment, “Efficiently bypassing sni-based https filtering,” in 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), May 2015, pp. 990–995.

[100] J. Haucap and U. Heimeshoff, “Google, facebook, amazon, ebay: Is the internet driving competition or market monopolization?” International Economics and Economic Policy, vol. 11, no. 1, pp. 49–61, Feb 2014. [Online]. Available: https://doi.org/10.1007/s10368-013-0247-6

[101] Statista. (2018, Feb) Current and planned usage of public cloud platform services running applications worldwide in 2018. [Online]. Available: https://www.statista.com/statistics/511467/worldwide-survey-public-coud-services-running-application/

[102] A. P. Felt, R. Barnes, A. King, C. Palmer, C. Bentzel, and P. Tabriz, “Measuring HTTPS adoption on the web,” in 26th USENIX Security Symposium (USENIX Security 17). Vancouver, BC: USENIX Association, 2017, pp. 1323–1338. [Online]. Available: https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/felt

[103] Y. Xue, D. Wang, and L. Zhang, “Traffic classification: Issues and challenges,” in 2013 International Conference on Computing, Networking and Communications (ICNC), Jan 2013, pp. 545–549.

[104] “Destruction, alteration, or falsification of records in federal investigations and bankruptcy,” U.S.C. §1519, vol. 18, pp. 107–204, 2002.

[105] M. Bay, “The ethics of unbreakable encryption: Rawlsian privacy and the san bernardino iphone,” First Monday, vol. 22, no. 2, 2017. [Online]. Available: https://journals.uic.edu/ojs/index.php/fm/article/view/7006/5860

[106] C. Ellison and B. Schneier, “Ten risks of pki: What you’re not being told about public key infrastructure,” Comput Secur J, vol. 16, no. 1, pp. 1–7, 2000.

[107] J. Chau, “Application security–it all starts from here,” Computer Fraud & Security, vol. 2006, no. 6, pp. 7–9, 2006.

[108] J. D. Moon, Well-ordered society. Cambridge University Press, 2014, p. 874–877.

[109] J. Rawls, A theory of justice, ser. Oxford paperbacks ; 301. Oxford: Oxford Univ. Press, 1973.

[110] D. Sara. (2019, May) Dns privacy test servers. [Online]. Available: https://dnsprivacy.org/wiki/display/DP/DNS+Privacy+Test+Servers

[111] K. Erik. (2018, Apr) Dns over tls support in android p developer preview. [Online]. Available: https://android-developers.googleblog.com/2018/04/dns-over-tls-support-in-android-p.html

[112] V. Risk. (2019, Mar) Qname minimization and your privacy. [Online]. Available: https://www.opentech.fund/results/supported-projects/bind9-qname-minimization/

[113] A. Johnson, C. Wacek, R. Jansen, M. Sherr, and P. Syverson, “Users get routed: Traffic correlation on tor by realistic adversaries,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ser. CCS ’13. New York, NY, USA: ACM, 2013, pp. 337–348. [Online]. Available: http://doi.acm.org/10.1145/2508859.2516651

[114] A. Holmes and M. Kellogg, “Automating functional tests using selenium,” in AGILE 2006 (AGILE’06). IEEE, 2006, pp. 6–pp.

[115] P. McManus. (2018, Jun) Improving dns privacy in firefox. [Online]. Available: https://blog.nightly.mozilla.org/2018/06/01/improving-dns-privacy-in-firefox/

[116] D. Stenberg. (2018, Sep) Doh in curl. [Online]. Available: https://daniel.haxx.se/blog/2018/09/06/doh-in-curl/

A Appendix: Test script for validating DNS Privacy

This section presents a set of Python scripts that can potentially be used to verify the proof of concept presented in this study. The following sections contain the code for processing a website list, a script to simulate web traffic, and the common base code on which both depend. Analysing and capturing the generated web traffic is outside the scope of this section.

A.1 Processing frequently visited web domain list

The website list is fetched from the Alexa top one million global chart and further classified by Top-Level Domain (TLD). Below is the code for a script that converts the Alexa list into a dictionary format.

#!/usr/bin/python3
from sys import exit

import util


def main():
    # Download the Alexa list first if no local CSV copy exists.
    if not util.is_there_csvfile():
        print("A CSV of Alexa list does not exist!")
        if not util.fetch_alexa_list():
            print("It could not fetch the file!")
            exit(1)

    # Build the dictionary and export it as JSON and as a bookmark file.
    website_dict = util.read_csv_file(10)
    print(website_dict['de'])
    util.write_dict_to_json(website_dict)
    util.dict_to_bookmark(website_dict)


if __name__ == "__main__":
    main()

A.2 Script for automating the web traffic simulation

Selenium is a web automation test tool [114] that is typically used to test web applications. It can be used to visit a list of websites in order to generate DNS queries. In the script below, Selenium is combined with the Firefox Gecko driver to control the web browser from a Python script. Python 3.5 or higher and pip3 are required, and the selenium Python package is expected to be installed on the system (json is part of the Python standard library). The Gecko driver needs to be reachable through the operating system's PATH environment variable. The code assumes that the Firefox browser is installed on the local PC.

    #!/usr/bin/python3
    import sys
    import time

    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    import util


    def main():
        doh_profile = __get_dns_over_https_profile()
        driver = webdriver.Firefox(firefox_profile=doh_profile)

        website_dict = util.read_json_to_dict()
        if not website_dict:
            print("Dictionary could not be loaded!")
            sys.exit(1)

        # Visit the German (.de) websites one after another.
        for alexawebsite in website_dict['de']:
            try:
                driver.get("http://" + alexawebsite)
            except WebDriverException as identifier:
                # str() is needed: an exception cannot be concatenated to a string.
                print(alexawebsite + " was not accessible!\n" + str(identifier))
            time.sleep(3)

        driver.close()


    def __get_dns_over_https_profile():
        firefox_profile = webdriver.FirefoxProfile()
        # Mode 2: use DNS-over-HTTPS first, fall back to conventional DNS.
        firefox_profile.set_preference('network.trr.mode', 2)
        firefox_profile.set_preference(
            'network.trr.uri', 'https://mozilla.cloudflare-dns.com/dns-query')
        return firefox_profile


    if __name__ == "__main__":
        main()

A.3 Common base source

Below is the code for util.py, the common base script.

    #!/usr/bin/python3
    import json
    import os
    import urllib.request
    import zipfile
    from collections import defaultdict
    from glob import glob

    dir_name, _ = os.path.split(os.path.abspath(__file__))
    websites_dir = os.path.join(dir_name, 'websites')
    csv_filepath = os.path.join(websites_dir, 'top-1m.csv')
    json_filepath = os.path.join(websites_dir, 'web_dic.json')
    html_filepath = os.path.join(websites_dir, 'bookmark.html')


    class WebsitePair(object):
        def __init__(self, rank, url):
            self.rank = rank
            self.url = url

            # The last label of the domain name is taken as its TLD.
            splitted_url = url.split('.')
            if splitted_url[-1]:
                self.tld = splitted_url[-1]
            else:
                self.tld = '.'

        def __str__(self):
            return self.url

        # https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable
        def __repr__(self):
            return json.dumps(self, default=lambda o: o.__dict__,
                              sort_keys=True, indent=4)


    def is_there_csvfile():
        return os.path.exists(csv_filepath)


    def read_csv_file(max_entry=-1):
        """
        This method reads an Alexa rank CSV file.
        :param max_entry: maximum entry per TLD (-1 for no limit)
        :return: a dictionary collection sorted by TLD if a file exists
        """
        website_dict = defaultdict(list)
        if not is_there_csvfile():
            return ""

        with open(csv_filepath, 'r') as alexafile:
            for line in alexafile:
                data = line.rstrip().split(',')
                temp = WebsitePair(data[0], data[1])
                # Append until the per-TLD limit is reached; -1 disables the limit.
                if max_entry == -1 or len(website_dict[temp.tld]) < max_entry:
                    website_dict[temp.tld].append(temp.url)

        return website_dict


    def fetch_alexa_list():
        """
        The method downloads a zip file of Alexa top 1 million statistics
        and unzips the content.
        :return: path of the extracted CSV file
        """
        if not os.path.exists(websites_dir):
            os.mkdir(websites_dir)
        url = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"
        filename, headers = urllib.request.urlretrieve(
            url, filename=os.path.join(websites_dir, 'top-1m.csv.zip'))

        with zipfile.ZipFile(filename, 'r') as alexa_zipfile:
            alexa_zipfile.extractall(websites_dir)

        return glob(os.path.join(websites_dir, '*.csv'))

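    def write_dict_to_json(website_dict):
        """
        :param website_dict:
        :return:
        """
        with open(json_filepath, 'w') as output_file:
            json.dump(website_dict, output_file)


    def read_json_to_dict():
        website_dic = defaultdict(list)

        if os.path.exists(json_filepath):
            with open(json_filepath, 'r') as file_reader:
                website_dic = json.load(file_reader)

        return website_dic


    def dict_to_bookmark(website_dict):
        # NOTE: the HTML tags of this listing were not preserved in the text.
        # The markup below is an assumed reconstruction that follows the
        # Netscape bookmark file format, which Firefox can import.
        with open(html_filepath, 'w') as output_file:
            output_file.write('<!DOCTYPE NETSCAPE-Bookmark-file-1>\n'
                              '<META HTTP-EQUIV="Content-Type" '
                              'CONTENT="text/html; charset=UTF-8">\n'
                              '<TITLE>Bookmarks</TITLE>\n'
                              '<H1>Bookmarks Menu</H1>\n'
                              '<DL><p>\n')
            # One bookmark folder per TLD, one bookmark per website.
            for tld in website_dict.keys():
                output_file.write('\t<DT><H3>' + tld + '</H3>\n')
                output_file.write('\t<DL><p>\n')
                for site in website_dict[tld]:
                    output_file.write('\t\t<DT><A HREF="http://' + site + '">'
                                      + site + '</A>\n')
                output_file.write('\t</DL><p>\n')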

B Appendix: Experiment of DNS-over-HTTPS

This section describes an experimental setup for verifying a DNS-over-HTTPS client.

B.1 Experiment background

The interest lies in identifying the incoming and outgoing traffic around a recursive resolver, since the recursive DNS resolver learns both the stub resolver's information and its queries. The choice of location for a recursive resolver leads to different degrees of privacy enhancement, as described in Section 5.2.2. Each experiment aims to verify a situation in which a privacy enhancement is anticipated.

B.2 Experiment setup

A remote recursive resolver (i.e. a public DNS server) is used, because the focus of the experiment lies on observing the on-the-wire traffic between the stub client and the resolver (i.e. Phase 1). The DoH clients available for testing were Firefox web browsers later than version 62 [115], curl later than 7.62 [116] and DNSCrypt-proxy. Firefox's DoH client is chosen for evaluation, and DNS traffic is generated in two ways:

1. Automated website visits using Selenium over the Gecko driver, i.e. sequentially visiting a list of websites.
2. Simultaneous visits to the list of websites from the bookmark.

The following configurations are made in Firefox:

1. Set trr.mode to 2, i.e. use DNS-over-HTTPS first and, if it fails, use conventional DNS.
2. Set trr.mode to 3, i.e. use DNS-over-HTTPS only.

For each experiment, the following steps are performed to minimise possible caching effects:

    # sudo systemd-resolve --flush-caches
    # sudo systemd-resolve --statistics
    $ wireshark -i enp4s0 -k & python3 visitwebsites.py
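As an illustration of what the DoH clients listed above send to the resolver, the sketch below builds the URL of a DoH GET request as specified in RFC 8484: a DNS query in wire format, encoded as base64url without padding, carried in the dns parameter. It is a minimal sketch and not the code of Firefox, curl or DNSCrypt-proxy. The function names build_dns_query and doh_get_url are hypothetical, and only queries for ASCII hostnames with valid label lengths are handled.

```python
import base64
import struct


def build_dns_query(hostname, qtype=1, query_id=0):
    """Build a DNS query message in wire format (RFC 1035).

    qtype 1 requests an A record. The ID defaults to 0, which RFC 8484
    recommends so that identical requests can be cached over HTTP.
    """
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(resolver_uri, hostname):
    """Return the URL of an RFC 8484 GET request for the given hostname."""
    wire = build_dns_query(hostname)
    # The "dns" parameter is base64url-encoded with the padding removed.
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return resolver_uri + "?dns=" + encoded
```

Calling doh_get_url('https://mozilla.cloudflare-dns.com/dns-query', 'www.example.com') yields a dns parameter of AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB, which matches the example request given in RFC 8484. Since the query travels inside HTTPS, an observer on the wire sees neither this parameter nor the answer.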

B.2.1 Hypothesis

Regardless of the method chosen to generate web traffic, and regardless of whether Firefox operates DoH in mode 2 or mode 3, no DNS traffic will be sent in clear text. Therefore, the tcpdump and Wireshark packet captures of each experiment set will not reveal any DNS query when UDP port 53 is observed.
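The hypothesis can be checked mechanically on a saved capture. The following is a minimal sketch, assuming that the capture was saved in the classic libpcap format (for example with tcpdump -w; the pcapng format that Wireshark writes by default is not handled), that it was taken on an Ethernet interface, and that it contains IPv4 traffic without VLAN tags. The function names are hypothetical. A count of zero would be consistent with the hypothesis.

```python
import struct

ETHERNET_HEADER_LEN = 14
LINKTYPE_ETHERNET = 1
IP_PROTO_UDP = 17
DNS_PORT = 53


def is_udp53(frame):
    """Return True if an Ethernet frame carries IPv4/UDP to or from port 53."""
    # Shortest frame of interest: Ethernet + minimal IPv4 + UDP header.
    if len(frame) < ETHERNET_HEADER_LEN + 20 + 8:
        return False
    if frame[12:14] != b"\x08\x00":  # EtherType 0x0800 means IPv4
        return False
    ip = frame[ETHERNET_HEADER_LEN:]
    header_len = (ip[0] & 0x0F) * 4  # IHL field, counted in 32-bit words
    if ip[0] >> 4 != 4 or ip[9] != IP_PROTO_UDP or len(ip) < header_len + 8:
        return False
    src_port, dst_port = struct.unpack("!HH", ip[header_len:header_len + 4])
    return DNS_PORT in (src_port, dst_port)


def count_cleartext_dns(pcap_bytes):
    """Count UDP port 53 packets in the bytes of a classic pcap file."""
    if len(pcap_bytes) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number reveals the byte order used by the capturing host.
    magic = struct.unpack("<I", pcap_bytes[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled")

    count = 0
    offset = 24  # the first packet record follows the global header
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, captured length, original length.
        captured_len = struct.unpack(
            endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + captured_len]
        offset += 16 + captured_len
        if is_udp53(frame):
            count += 1
    return count
```

A capture file could be examined with count_cleartext_dns(open(path, 'rb').read()), where path is the location of the saved file. The sketch only counts packets; it does not decode the queried names.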

B.3 Results

When Selenium was used, configuration (1) with DoH-first mode resulted in all DNS queries being revealed, and the DoH engine was not used. When Selenium was used, configuration (2) with DoH-only mode resulted in no website being fetched. When the bookmark method was used, configuration (1) with DoH-first mode resulted in some DNS traffic being revealed through conventional DNS. When the bookmark method was used, configuration (2) with DoH-only mode resulted in no website being fetched.

B.3.1 Firefox's DNS lookup log

While the experiments were performed, the DNS-related log files from Firefox were collected. At debug level, the log contained lines with the message 'D/nsHostResolver TRR lookup url.tld' followed by 'OK' or 'FAILED'. Of the 638 lines, 407 entries led to OK and 231 entries indicated FAILED. Approximately 63.79% of the queries went through DNS-over-HTTPS.
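The share reported above follows from a simple tally of the log lines. The sketch below shows one way to compute it, assuming that the result word appears on the same line as the 'TRR lookup' message and directly after the hostname; the exact line layout of the Firefox log is an assumption, and the names TRR_LINE, tally_trr_results and doh_share are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "... TRR lookup <hostname> OK" or "... FAILED".
TRR_LINE = re.compile(r"TRR lookup\s+(\S+)\s+(OK|FAILED)")


def tally_trr_results(lines):
    """Count how many TRR lookups ended with OK and how many with FAILED."""
    counts = Counter()
    for line in lines:
        match = TRR_LINE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def doh_share(counts):
    """Return the percentage of lookups that were resolved over DoH."""
    total = counts["OK"] + counts["FAILED"]
    return 100.0 * counts["OK"] / total if total else 0.0
```

With the figures of this experiment, doh_share(Counter(OK=407, FAILED=231)) evaluates to about 63.79, which leaves about 36.21% of the lookups as failed.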

B.4 Evaluation

This section provides an interpretation of the observed results.

B.4.1 Possible interference caused by web automation

Although Appendix A discussed the potential usefulness of the web automation tool, using Selenium with the Gecko Firefox webdriver resulted in DNS-over-HTTPS not being used. Since additional software is involved in the traffic simulation, there is a higher chance that an error in the automation tool leads to misleading results, considering that the bookmark-based method did not show the same result.

B.4.2 Bootstrap procedure

The 'DNS-over-HTTPS only' mode is assumed to have failed to load any website because the DoH engine could not initiate the connection to the DoH server URI 'https://mozilla.cloudflare-dns.com/dns-query'. This is interpreted as a consequence of the experiments not having taken the setup of the bootstrap procedure into consideration.
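A bootstrap address tells the browser the IP address of the DoH server, so that its hostname does not need to be resolved first. The sketch below indicates how the profile of Appendix A might be extended for a repeated experiment; it was not part of the experiments reported here. The preference name network.trr.bootstrapAddress is given as it was documented for Firefox releases of the period and should be checked against the version in use. The function names are hypothetical, and the IP address shown is a documentation placeholder that has to be replaced by the real address of the chosen resolver.

```python
def doh_only_preferences(resolver_uri, bootstrap_ip):
    """Return Firefox preferences for DoH-only mode with a bootstrap address."""
    return {
        # Mode 3: use DNS-over-HTTPS only, without fallback.
        "network.trr.mode": 3,
        "network.trr.uri": resolver_uri,
        # Assumed preference name; verify it for the Firefox version in use.
        "network.trr.bootstrapAddress": bootstrap_ip,
    }


def apply_preferences(profile, preferences):
    """Apply the preferences to any object that offers set_preference()."""
    for name, value in preferences.items():
        profile.set_preference(name, value)
    return profile


# Placeholder address from the documentation range, not a real resolver.
EXAMPLE_PREFERENCES = doh_only_preferences(
    "https://mozilla.cloudflare-dns.com/dns-query", "192.0.2.1")
```

The profile argument can be the webdriver.FirefoxProfile object created in the script of Appendix A.2, since that class provides a set_preference method.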

B.4.3 Unexplained causes of DoH query failures

There is no clear explanation for why 36.2% of the queries failed when the DNS-over-HTTPS resolver was used. The author of the DoH engine has not given a clear explanation of the causes of such failures either [86]. It is noteworthy, however, that the simulated web visits were made simultaneously: browsing of twenty websites was initiated at the same time. The simultaneous visits could have been a factor leading to an overload situation.
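To examine whether simultaneous visits contribute to the failures, the website list could be opened in smaller groups and the failure rate compared between group sizes. The following is a minimal sketch of such a split and was not part of the experiment performed here; the function name batches and the chosen batch size are hypothetical.

```python
def batches(sites, batch_size):
    """Yield consecutive groups of at most batch_size websites."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sites), batch_size):
        yield sites[start:start + batch_size]


def describe_batches(sites, batch_size):
    """Return one line of text per batch, for a quick overview of the plan."""
    return [
        "batch %d: %s" % (number, ", ".join(group))
        for number, group in enumerate(batches(sites, batch_size), start=1)
    ]
```

Twenty websites with a batch size of five give four groups. Each group could then be opened together, with the DNS caches flushed between groups as described in Section B.2.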

B.5 Conclusion of the DoH experiments

For the given experiment setups, using DoH in Firefox resulted in some DNS queries being leaked to the conventional DNS resolver, because DNS resolution through its DoH engine failed.
