
Linköping University | Department of Computer and Information Science Master thesis, 30 ECTS | Datateknik 2019 | LIU-IDA/LITH-EX-A--19/079--SE

Characterizing the HTTPS Trust Landscape – A Passive View from the Edge

Karaktärisering av HTTPS Förtroende-Landskap

Gustaf Ouvrier

Supervisor: Niklas Carlsson, Martin Arlitt
Examiner: Niklas Carlsson

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Gustaf Ouvrier

Abstract

Our society increasingly relies on the Internet for common services like online banking, shopping, and socializing. Many of these services heavily depend on secure end-to-end transactions to transfer personal, financial, or other sensitive information. At the core of ensuring secure transactions are the TLS/SSL protocol and the “trust” relationships between all involved partners. In this thesis we passively monitor the HTTPS traffic between a campus network and the Internet, and characterize the certificate usage and trust relationships in this complex landscape. By comparing our observations against known vulnerabilities and problems, we provide an overview of the actual security that typical Internet users (such as the people on campus) experience. Our measurements cover both mobile and stationary users, consider the involved trust relationships, and provide insights into how the HTTPS protocol is used and the weaknesses observed in practice.

Contents

Abstract
Contents
List of Figures
List of Tables

1 Introduction
    1.1 Motivation
    1.2 Aim
    1.3 Research Questions
    1.4 Delimitations
    1.5 Contributions

2 Theory
    2.1 Security Aspects
    2.2 Cryptographic Primitives
    2.3 Transport Layer Security
    2.4 Public Key Infrastructure
    2.5 Validating Certificate
    2.6 Certificate Issuance

3 Method
    3.1 Data Collection
    3.2 Data Processing
    3.3 Analyzing the Data

4 Results
    4.1 Summary of Dataset
    4.2 Ratio between HTTP and HTTPS
    4.3 Trust in Browsers
    4.4 Trust in Certificate Authorities
    4.5 Trust in Protocol Version and Cipher Suite Selection
    4.6 Session Quality Evaluation

5 Discussion
    5.1 Ratio between HTTP and HTTPS
    5.2 Trust in Browsers
    5.3 Trust in Certificate Authorities
    5.4 Trust in Protocol Version and Cipher Suite Selection
    5.5 Methodology
    5.6 Wider Context

6 Conclusion
    6.1 What are the Most Significant Trust Relationships in HTTPS Communication and How Trustworthy are They Actually in Practice?
    6.2 Are There Any Significant Differences between the Security of Mobile and Stationary User Devices in HTTPS Communication?
    6.3 What Is the Quality of the Actual Security that Typical Users Experience when Accessing the Internet Using HTTPS?
    6.4 Future Work

Bibliography

A Appendix

List of Figures

1.1 Relationships and involved parties.

2.1 Hash function.
2.2 Symmetric-key cryptography.
2.3 Message authentication code.
2.4 Asymmetric-key cryptography.
2.5 Digital signature scheme.
2.6 Man in the middle attack scenario.
2.7 Full TLS handshake.
2.8 Abbreviated TLS handshake.
2.9 Simplified scenario of PKIX in TLS protocol.
2.10 Certificate landscape example.
2.11 Schematic view of X.509 version 3 certificate format.
2.12 Chrome browser certificate validation indicator.

3.1 Log file processing.

4.1 Number of established sessions plotted over time. Shows total number of sessions as well as the subsets of sessions using HTTP and HTTPS.
4.2 Cumulative distribution function (CDF) of certificate validity period lengths.
4.3 Complementary cumulative distribution function (CCDF) of number of domain names per certificate.
4.4 Clustered histogram showing the relation between offered and used protocol versions. Each cluster shows the observed shares for the total number of sessions as well as the mobile and stationary subsets. The Y-axis is represented with a log-scale for better presentation of the small measurements.
4.5 Top-15 selected encryption ciphers by the server in the cipher suite selection process. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.
4.6 Top-15 offered encryption ciphers by the client in the cipher suite selection process. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.
4.7 Complementary cumulative distribution function (CCDF) of cipher suite list sizes offered by client and cumulative distribution function (CDF) of downgrades by the server.
4.8 Cumulative distribution function (CDF) of RC4 cipher position when chosen by server.
4.9 Clustered histogram of the session quality evaluation based on the four-level security classification.

List of Tables

4.1 Dataset overview.
4.2 Browser share.
4.3 Chrome version distribution.
4.4 version distribution.
4.5 version distribution.
4.6 version distribution.
4.7 Top 10 organizations signing leaf certificates.
4.8 Top-six certificate authorities signing EV certificates.
4.9 Certificate signature algorithms grouped on type.
4.10 Certificate public key grouped on type and key sizes.
4.11 Domain name validation.
4.12 Certificate validation.
4.13 Key exchange algorithms used.
4.14 Top-10 key exchange algorithms offered.
4.15 Export-grade key exchange algorithms offered.
4.16 Encryption algorithms used.
4.17 Encryption algorithms offered.
4.18 MAC algorithms used.
4.19 MAC algorithms offered by client.

5.1 Browser usage and version distribution.

A.1 Protocol versions offered/used. Data points for Figure 4.4.
A.2 Cipher suite used. Data points for Figure 4.5.
A.3 Cipher suite offered. Data points for Figure 4.6.
A.4 Offered key exchange algorithms. Full version of Table 4.14.
A.5 List size and downgrades. The majority of the data points for Figure 4.7.
A.6 RC4 cipher position when chosen by server. Data points for Figure 4.8.

1 Introduction

1.1 Motivation

We are living in an information society in which organizations and individual users increasingly rely on the end-to-end security and privacy offered by the HTTPS protocol. With HTTPS, regular Hypertext Transfer Protocol (HTTP) requests and responses are securely transferred over an end-to-end connection encrypted using Transport Layer Security (TLS) or its predecessor Secure Sockets Layer (SSL).

With the increased value of the information exchanged over the Internet, it is perhaps not surprising that HTTPS usage is increasing [21]. HTTPS can provide secure end-to-end transfers of money and other sensitive information, and is often used by authentication-based services such as online banking, shopping sites, and social networking services. With increased awareness of wiretapping and manipulation of network traffic, HTTPS has also become common among services that have not traditionally used secure end-to-end connections.

Figure 1.1: Relationships and involved parties.


The “trust” relationships between all involved parties and the TLS/SSL protocol suite are the basis for ensuring secure transactions over HTTPS. To untangle the trust relationships, consider a user (Figure 1.1) accessing a website using HTTPS.

First, the user must trust the browser’s implementation of HTTPS. Additionally, a good browser must be up-to-date to protect against the latest known security vulnerabilities.

Second, the browser (and implicitly the user) needs to trust the Certificate Authorities (CAs) in the browser’s root store. If a trusted CA is compromised or mistakenly starts generating certificates for non-trusted servers, this significantly compromises the security that a browser provides. Unsurprisingly, browsers differ in which CAs they select to trust, and different web services select different CAs for their certificates.

Third, the browser needs to trust the server that it communicates with. This trust is often built around X.509 certificates signed by the CAs in the root store or by other trusted entities to which the CAs have delegated part of this responsibility. These chained trust relationships are further complicated by (chained) certificates often being valid for different time periods and being difficult to invalidate when trust relationships are broken.

Finally, the user must trust the cipher suite negotiated between the browser and server during the TLS handshake in the HTTPS connection establishment. Ciphers have different cryptographic strengths, and many ciphers in use today have known vulnerabilities and can therefore pose a significant risk to the confidentiality of the information transferred between the two parties.

Naturally, not all relationships are equally trustworthy. Hidden from a typical user, web sessions involve a diversity of servers, certificate authorities, certificates, and ciphers. This is why it is important to regularly check for security weaknesses in this complicated ecosystem.

1.2 Aim

The aim of this thesis is to characterize the trust landscape and the security risks observed in practice for each of the above-described relationship types and discuss our findings in the context of known vulnerabilities. This goal will be achieved by passively collecting and analyzing the HTTPS traffic between a campus network and the Internet. The traffic in this kind of network includes a great diversity of traffic from both mobile and stationary users, and will thereby provide an overview of the actual security that typical Internet users experience when browsing the web, as well as the trust relationships that are often invisible to the end user.

1.3 Research Questions

The aim of this thesis is formulated in the following research questions:

1. What are the most significant trust relationships in HTTPS communication and how trustworthy are they actually in practice?

2. Are there any significant differences between the security of mobile and stationary user devices in HTTPS communication?

3. What is the quality of the actual security that typical users experience when accessing the Internet using HTTPS?

1.4 Delimitations

This thesis will not evaluate the HTTPS and TLS protocols from a security perspective, but rather observe how the protocols are utilized in practice.


1.5 Contributions

The main contribution of this thesis is a characterization of how HTTPS is used in practice, which includes a head-to-head comparison of the usage observed among mobile and stationary users within the same campus network. Some of the major contributions of this thesis have been published in the following research articles:

1. Characterizing the HTTPS Trust Landscape: A Passive View from the Edge [23] Gustaf Ouvrier, Michel Laterman, Martin Arlitt, and Niklas Carlsson, IEEE Communications Magazine, volume 55, issue 7, July 2017, pp. 36–42.

2. A first look at the CT landscape: Certificate Transparency logs in practice [15] Josef Gustafsson, Gustaf Ouvrier, Martin Arlitt, and Niklas Carlsson, In Proc. Passive and Active Measurement Conference (PAM), Sydney, Australia, Mar. 2017, pp. 87-99.

2 Theory

2.1 Security Aspects

Cryptography can provide security for many different aspects of a system. In this section, we will give a brief description of a few basic security aspects relevant to the thesis: confidentiality, data integrity, and authentication.

2.1.1 Confidentiality

Confidentiality is the property whereby information/communication is made private or secret from unauthorized parties. Using cryptography, this is achieved through encryption. Encryption uses a key to render data unintelligible to everyone except authorized entities. Depending on the type of cryptographic algorithm, the same key or a different related key is used to decrypt the data. Decrypting the data renders the data intelligible again. In order to provide confidentiality, the cryptographic algorithm must be designed and implemented in such a way that an unauthorized party can neither determine the keys used for encryption/decryption nor derive the plaintext directly without having access to the correct keys.

2.1.2 Data Integrity

Data integrity is the property whereby unauthorized modification of information/communication can be detected. Modification of data includes insertion, deletion, and substitution. Using cryptography, this can be achieved through the use of digital signatures, hash functions, or message authentication codes.

2.1.3 Authentication

Authentication is the act of confirming the identity of something or someone. When it comes to computers this is normally the identity of a specific computer system or the identity of a user. Examples of authentication in a regular person’s life are plentiful: the simple act of unlocking your smartphone using a swipe pattern or PIN code, paying in a shop with a bank card and PIN code, or logging into a web service such as Facebook with a username and password.

One of the most common methods of authentication for a user is the requirement of a username and password combination when they log in to a computer system or service. In this case, the username and password represent a piece of information that only an authorized user should know. However, there are many other ways in which authentication can be done. The different types of authentication can be divided into three separate categories, also known as factors of authentication. These factors describe the different properties that can be used in order to authenticate an entity.

• Knowledge Factor: A knowledge factor is a secret which only the authenticating party should know. Typical examples are passwords, PIN codes, or answers to personal questions.

• Ownership Factor: An ownership factor is something that only the authenticating party has access to. In computer systems this can for instance be a smart card or it could be a certificate used to authenticate web servers. They can be represented in many ways using many types of technology.

• Inherence Factor: An inherence factor should describe something the authenticating party is or does. For a user/person this usually means biometrics such as fingerprints or retina patterns, but it could also be a behavioral trait such as the way someone writes their signature.

2.2 Cryptographic Primitives

Cryptographic primitives can be seen as the basic building blocks of cryptographic systems. A cryptographic system is rarely designed completely from scratch. This is both because it is very difficult to design cryptographically secure algorithms and because analyzing and verifying that an algorithm is secure is much harder than verifying that it works. Instead, well-established concepts and methods are often reused in order to create a cryptographic system that has the required properties (there is no need to reinvent the wheel every time you design a new car). In this section the most widely used primitives are briefly described.

2.2.1 Cryptographic Hash Functions

A cryptographic hash function is a cryptographic construct used in many information security applications and protocols such as digital signatures, integrity checking, and authentication. Just like a normal hash function, it takes a message M of any length as input and creates a short, fixed-length hash value, also known as digest D, as shown in Figure 2.1. The hash value can be seen as a fingerprint of the input data. Unlike a normal hash function, the cryptographic hash function should also have the following properties:

Figure 2.1: Hash function.

• Pre-image resistance (one-way): It should be easy to compute the hash value but computationally hard to recreate the original message from its hash value. I.e., given D it should be difficult to find M such that D = H(M).

• Weakly collision-free: Given a message and its corresponding hash value, it should be computationally hard to find a different message for which the hash function produces the same hash value. I.e., given M1 it should be difficult to find M2 ≠ M1 such that H(M1) = H(M2).

• Strongly collision-free: It should be computationally hard to find any two different messages for which the hash function produces the same hash value. I.e., it should be difficult to find any pair M1 ≠ M2 such that H(M1) = H(M2).

The last two properties are similar, with the third being the stricter requirement. Not all cryptographic hash functions fulfill the third property. These properties ensure that digests created by a hash function can be trusted and that the hash function cannot be used to maliciously create or modify messages to have specific digests. Hash functions with these properties are considered cryptographic hash functions. Common cryptographic hash functions include:

• MD5: Was once a widely used hash function, but is now considered cryptographically broken and usage is highly discouraged.

• SHA-1: Designed by NSA and published by the National Institute of Standards and Technology (NIST). Although cryptographically weak it is still widely used. It is being phased out in favor of SHA-2.

• SHA-2: The successor to SHA-1 and the current recommendation by NIST. The SHA-2 family consists of six hash functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

• SHA-3: The newest SHA standard. It was the result of a competition that ended in 2012.
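The fixed-length digest described above can be illustrated with a minimal sketch using Python's standard `hashlib` module. The helper name `digest` and the example messages are assumptions made for illustration only. SHA-256 is chosen simply because it is one of the SHA-2 functions listed above.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return the SHA-256 digest D = H(M) as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


# Two hypothetical messages that differ in a single character.
m1 = b"Transfer 100 SEK to Alice"
m2 = b"Transfer 900 SEK to Alice"

d1, d2 = digest(m1), digest(m2)

# The digest length is fixed (256 bits = 64 hex characters),
# regardless of how long the input message is.
print(len(d1), len(digest(b"x" * 1_000_000)))

# A small change in the message gives a different digest.
print(d1 == d2)
```

The sketch only shows the fingerprint behaviour. It cannot demonstrate pre-image or collision resistance, since those are claims about what is computationally infeasible.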

2.2.2 Symmetric-Key Cryptography

Symmetric-key cryptography, also known as private key cryptography, refers to cryptographic algorithms that use a single shared secret to perform both encryption and decryption. Figure 2.2 shows a typical scenario of using symmetric-key cryptography. Alice uses the key K to encrypt the message M, creating cipher C. On the other end, Bob uses the same key K to decrypt the cipher C, recreating M.

Figure 2.2: Symmetric-key cryptography.

Symmetric-key algorithms can be implemented in two different ways: as either stream or block ciphers. The difference between the two is how they process the data. Block ciphers work with fixed-length blocks of data and encrypt/decrypt each block as a whole, while stream ciphers process individual bits or bytes one at a time.

Cryptographic algorithms based on symmetric keys are in general fast and computationally cheap compared to other types, and are therefore the most common choice for encrypting the bulk data sent over communication channels. However, the reason for their speed is also their main drawback. Symmetric-key algorithms depend on the existence of a shared secret, but do not define how it can be established. Securely sharing a secret between two remote parties is a very difficult problem. Communication sent over a network such as the Internet can be intercepted and modified, and it can therefore not be trusted to directly transfer a secret. The secret can of course be agreed upon beforehand, but that is not feasible in most situations. Different ways of establishing a key over an insecure channel are described in Section 2.2.5. Common symmetric-key cryptographic algorithms include:

• RC4: One of the most popular stream ciphers, favoured for its simplicity and speed. However, it has known weaknesses and has repeatedly been shown to be easy to implement in very insecure ways, as shown by WEP [13]. Recent research has also shown more vulnerabilities [2, 17, 22] and RFC 7465 [24] prohibits its use in TLS.

• Data Encryption Standard (DES): DES was once the predominant standard but has now been withdrawn as a standard by NIST. The use of all older versions of DES is highly discouraged and the strongest variant, 3DES, is only recommended for legacy systems for compatibility purposes.

• Advanced Encryption Standard (AES)-128, 192, 256: AES is the established NIST standard. It is still considered strong encryption. NIST recommends AES-256 for new systems. AES is by itself a block cipher, but there are optional modes, such as the counter (CTR) mode, that allow it to function as a stream cipher.
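The Alice and Bob scenario of Figure 2.2 can be sketched with a stream cipher. Since the Python standard library has no AES implementation, the sketch below uses a plain-Python RC4. It is for illustration only, because RC4 is considered insecure, as noted above. The function name `rc4` and the key and message values are assumptions for the example.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the same operation does both)."""
    # Key-scheduling algorithm: permute the state S using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation: XOR each data byte with one keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


K = b"Key"        # shared secret known to both Alice and Bob
M = b"Plaintext"  # message Alice wants to send

C = rc4(K, M)          # Alice encrypts M with K, creating C
recovered = rc4(K, C)  # Bob decrypts C with the same K
print(C.hex(), recovered)
```

Because a stream cipher simply XORs a key-dependent keystream onto the data, applying it a second time with the same key recovers the message. The sketch does not address how Alice and Bob came to share K, which is the key-distribution problem discussed above.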

2.2.3 Message Authentication Code

A Message Authentication Code (MAC) algorithm takes both an arbitrarily long message and a secret key as input to create a short, fixed-length value. This value is called the MAC and is attached to the message to protect its data integrity as well as ensure its authenticity.

Figure 2.3: Message authentication code.

Figure 2.3 shows a typical scenario of using a MAC. Alice uses the key K to create a MAC of the message M. Both the message M and the MAC are sent. On the other end, Bob uses the key K to recreate the MAC in the same way as Alice did. Bob can then compare the two MACs and verify the authenticity and integrity of the message M.

Since the MAC algorithm requires a shared secret key, every party that knows this secret can both create and verify the MAC. This means that a MAC can only be used to verify that the message has been transferred without modification from a party that also knows the secret key. In cases with many involved parties, a MAC cannot be used to verify which specific party created it. The use of a shared secret also means that MAC algorithms have the same key-distribution problems as symmetric-key cryptography.

A common method of implementing MAC algorithms is to utilize cryptographic hash functions. This type of MAC is called an HMAC and is specified in RFC 2104 [18].
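The create-and-compare flow of Figure 2.3 can be sketched with Python's standard `hmac` module, which implements HMAC. The helper names `make_mac` and `verify_mac`, the key, and the messages are assumptions for illustration. SHA-256 is an arbitrary choice of underlying hash function.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Alice's side: compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """Bob's side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_mac(key, message), mac)


K = b"shared-secret-key"  # hypothetical key known to Alice and Bob
M = b"Pay 100 SEK to Bob"

tag = make_mac(K, M)                             # sent together with M
print(verify_mac(K, M, tag))                     # unmodified message
print(verify_mac(K, b"Pay 900 SEK to Bob", tag)) # modified in transit
```

Anyone holding K could have produced the tag, so this check confirms that the message came from some holder of the key. It does not identify which one.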

2.2.4 Asymmetric-Key Cryptography

The difference between symmetric- and asymmetric-key cryptography is that while symmetric-key cryptography uses the same key for both encryption and decryption, asymmetric-key cryptography uses two different but mathematically related keys. The keys are referred to as the private key, which is to be kept secret, and the public key, which can be freely distributed to anyone. The existence of the public key is also why asymmetric-key cryptography is commonly known as public key cryptography. Asymmetric-key cryptography algorithms require significantly more computation and are therefore slow in relation to symmetric-key ciphers. This is why they are mostly used during connection establishment for authentication and key exchange purposes. Asymmetric-key cryptography can mainly be used in two different ways, either as a public-key encryption system or as a digital signature scheme.

Figure 2.4: Asymmetric-key cryptography.

In a public-key encryption system the public key is used to encrypt a message that thenceforth can only be decrypted using the paired private key. This allows anyone in possession of the public key to send encrypted messages to the private key holder. Figure 2.4 shows a typical scenario of using a public-key encryption system. Alice uses the public part of the key K to encrypt the message M, creating cipher C. On the other end, Bob uses the private part of the key K to decrypt the cipher C, recreating M.

Figure 2.5: Digital signature scheme.

In contrast, a digital signature scheme works the other way around. The private key holder creates a digital signature by signing a message. The signature can then be verified using the paired public key. Figure 2.5 shows a typical scenario of using a digital signature scheme. Alice uses the private part of the key K to create a signature S of the message M. Both the message M and the signature S are sent. On the other end, Bob uses the public part of the key K to verify that signature S was indeed created from message M using the paired private key by comparing the result with message M.

Just like a MAC, a digital signature ensures both the integrity and the authenticity of the message. Due to the use of asymmetric-key cryptography, it is possible to determine exactly who signed the message and therefore also establish non-repudiation. Digital signatures are central to many security schemes such as Public Key Infrastructures (PKI). Common asymmetric-key cryptography algorithms include:

• RSA: One of the first and still one of the most popular asymmetric-key ciphers. There are variants both for public-key encryption and digital signature systems. NIST currently recommends a minimum key size of 2048 bits. Smaller key sizes are discouraged.

• ElGamal: Another asymmetric-key cipher that is based on the Diffie-Hellman key exchange. There exist variants both for public-key encryption and digital signature systems as well as variants based on Elliptic Curve Cryptography (ECC). NIST currently recommends a minimum key size of 224-255 bits.

• Digital Signature Algorithm (DSA): Also referred to as NIST’s Digital Signature Standard (DSS). DSA is a digital signature scheme based on ElGamal and also exists in an ECC variant. NIST currently recommends a minimum key size of 2048 bits, and 224-255 bits for the ECC variant.
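The sign-with-private, verify-with-public flow of Figure 2.5 can be sketched with textbook RSA. This is a toy under stated assumptions: the primes `P` and `Q` are two small Mersenne primes picked for convenience, there is no padding, and the key is far below the 2048-bit minimum mentioned above. All names (`P`, `Q`, `N`, `E`, `D`, `sign`, `verify`) are illustrative and not taken from any library.

```python
import hashlib

# Toy key generation (requires Python 3.8+ for the modular inverse via pow).
P = 2**61 - 1                      # Mersenne prime, illustration only
Q = 2**89 - 1                      # Mersenne prime, illustration only
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def _hash_to_int(message: bytes) -> int:
    """Reduce the SHA-256 digest of the message to an integer below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: create signature S from the message digest and private key D."""
    return pow(_hash_to_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: check S against the message using only the public key (N, E)."""
    return pow(signature, E, N) == _hash_to_int(message)


M = b"I owe Bob 100 SEK"
S = sign(M)
print(verify(M, S))                     # genuine message and signature
print(verify(b"I owe Bob 1 SEK", S))    # altered message
```

Only the holder of D can produce S, while anyone with (N, E) can check it. This is what distinguishes a signature from a MAC and makes non-repudiation possible.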

2.2.5 Key Exchange

For symmetric-key ciphers and MACs to work, both communicating parties must be in possession of a shared secret key. This can be accomplished by agreeing upon the key beforehand or by using a secure side channel (for example, a courier). However, this is not a feasible option for online situations where two previously unknown entities want to communicate with each other. It is in such situations that key exchange algorithms come into play. A key exchange algorithm allows two parties to collectively establish a shared secret over an insecure channel without it being exposed to a listening third party. Common key exchange algorithms include:

• RSA/DSA key exchange process: Based on a public-key encryption system. One party sends its public key to the other, which in turn uses it to respond with an encrypted message containing the newly generated secret key.

• Diffie-Hellman key agreement: Invented in 1976 by Whitfield Diffie and Martin Hellman. It is not based on encryption and decryption, but instead relies on mathematical functions that enable the two parties to separately arrive at the same key by sending each other parts that are based on generated random values. Even though a third party can see both transmitted parts, they cannot derive the secret key without any of the random values.
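The Diffie-Hellman agreement described above can be sketched in a few lines. The parameters are toy assumptions: `P` is a 127-bit Mersenne prime and `G = 3` is an arbitrarily chosen base, neither of which is suitable for real use. The helper name `keypair` is illustrative.

```python
import secrets

P = 2**127 - 1  # public prime modulus (toy size, illustration only)
G = 3           # public base (assumed for the example)


def keypair():
    """Generate a random private value and the public part derived from it."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)             # the part sent over the network
    return private, public


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Each side combines its own random value with the part it received.
alice_secret = pow(bob_public, alice_private, P)
bob_secret = pow(alice_public, bob_private, P)

print(alice_secret == bob_secret)
```

An eavesdropper sees only `alice_public` and `bob_public`. Both parties arrive at the same value because (G^a)^b and (G^b)^a are equal modulo P.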

In the ideal situation, where both parties are communicating directly with each other and are who they say they are, key exchange algorithms work very well. However, what if a malicious third party intercepts the messages and completes the key exchange separately with each of the two original parties? The attacker can then continue to secretly relay, alter and possibly inject data into the communication between the two parties, who still believe they are communicating directly with each other. Such a scenario is commonly known as a man-in-the-middle (MITM) attack. In the context of key exchange, the issue of authenticating the other party is called the identity authentication problem.

Figure 2.6: Man in the middle attack scenario.

Figure 2.6 shows a typical scenario of such a MITM attack. Alice attempts to perform a key exchange with Bob, but the messages are intercepted by the MITM who, in Alice’s place, initiates the key exchange with Bob using its own parameters. The MITM completes the key exchanges separately with both Alice and Bob, and can thenceforth freely decrypt and encrypt any communication sent between them.

Solutions to the identity authentication problem are typically based on one of two concepts: either a centralized authority solution, e.g., Public Key Infrastructure (PKI), or a web of trust where the responsibility is spread out between all users, e.g., OpenPGP [8].
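The MITM scenario of Figure 2.6 can be sketched against an unauthenticated Diffie-Hellman exchange. As before, this is a toy under stated assumptions: the modulus `P`, the base `G`, and the helper names `keypair` and `shared` are chosen for illustration only.

```python
import secrets

P = 2**127 - 1  # public prime modulus (toy size, illustration only)
G = 3           # public base (assumed for the example)


def keypair():
    """Generate a random private value and the matching public part."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared(their_public: int, my_private: int) -> int:
    """Derive the shared key from the peer's public part."""
    return pow(their_public, my_private, P)


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
mitm_private, mitm_public = keypair()

# The MITM intercepts both public parts and forwards its own instead.
alice_key = shared(mitm_public, alice_private)  # Alice believes this is Bob
bob_key = shared(mitm_public, bob_private)      # Bob believes this is Alice

# The MITM completes one exchange with each victim.
mitm_key_with_alice = shared(alice_public, mitm_private)
mitm_key_with_bob = shared(bob_public, mitm_private)

print(alice_key == mitm_key_with_alice, bob_key == mitm_key_with_bob)
print(alice_key == bob_key)
```

Alice and Bob each hold a key shared with the MITM rather than with each other. Nothing in the exchange itself reveals this, which is why the public parts need to be authenticated by one of the approaches named above.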

2.3 Transport Layer Security

Transport Layer Security (TLS) and its predecessor Secure Sockets Layer (SSL) are cryptographic protocols designed to provide privacy, data integrity and authentication between two communicating applications on the Internet, typically a client and a server. The protocols operate directly on top of a reliable transport layer protocol, almost always TCP, and support a large number of popular protocols including HTTPS, Internet Message Access Protocol (IMAP), Simple Mail Transfer Protocol (SMTP) and Extensible Messaging and Presence Protocol (XMPP).

2.3.1 History

SSL/TLS are protocols with long histories. SSL was originally developed by Netscape in 1995 along with their flagship browser Netscape Navigator. The first SSL version was never publicly released, but it is known to have had serious security issues. Their second attempt, SSL 2.0, now deprecated [32], was also rapidly discarded due to a series of severe weaknesses. The first successful version, SSL 3.0 [14] in 1996, was a major rework and fixed a number of conceptual flaws in the earlier versions. Security improvements upon SSL 2.0 included the addition of support for SHA-1 based ciphers and certificate authentication. SSL 3.0 is now also a deprecated protocol [19].

Further development and maintenance of the protocol was handed over to the Internet Engineering Task Force (IETF), which renamed it TLS to avoid legal issues. TLS 1.0 [3] was published in 1999 and is typically seen as a minor editorial update to SSL 3.0. However, the differences were significant enough to preclude interoperability with earlier versions. From a security standpoint, TLS 1.0 was more desirable than SSL 3.0 because TLS 1.0 added SHA-1 support while SSL 3.0 depended on the weak MD5 hash function for master key derivation. TLS 1.1 [10] was published in 2006 and mainly added protection against CBC attacks and support for IANA registration parameters. TLS 1.2 [11] was published in 2008 and included significant improvements like support for GCM and CCM modes of AES and the use of more secure hash functions. The latest version, TLS 1.3 [25], was published in 2018.

2.3.2 Record Protocol

TLS is a layered protocol. Inside a TLS connection all messages are sent using the TLS record protocol. This protocol functions as an intermediary layer between the TCP connection and the sub-protocols of TLS. The record protocol defines how messages are framed and performs the operations that maintain the secure channel. A message using this format is called a TLS record. A record always contains information about the content type, version and length of the record. Messages from the sub-protocols are contained inside the record, and if a secure connection has been established a MAC is added. If a block cipher is used, some extra padding may also be added. When messages are transmitted, the record protocol performs the following four operations:

• Fragmentation: Messages are divided into blocks of at most 2^14 bytes (16 KB), and multiple messages of the same type are potentially coalesced into a single record.


• Compression: Optional compression of messages is performed.

• Message Authentication: A MAC is created using the HMAC algorithm and appended to the record.

• Encryption: The negotiated cipher is used to encrypt the message and MAC.

On the receiving end of the communication channel, the inverse operations are performed in the reverse order to recover the messages. The actual keys, cryptographic hash function and encryption method used to secure the record protocol are agreed upon during the handshake, and therefore the initial messages of a TLS session are sent in the clear.
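As an illustration, the fragmentation, framing and message authentication steps can be sketched in Python as follows. This is a minimal sketch and not a conforming TLS implementation: the function names are hypothetical, HMAC-SHA1 is assumed as the negotiated MAC, and the compression and encryption steps are omitted.

```python
import hashlib
import hmac
import struct

MAX_FRAGMENT = 2 ** 14  # maximum plaintext fragment size in bytes


def fragment(payload: bytes, size: int = MAX_FRAGMENT) -> list:
    """Split a message into fragments of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def build_record(content_type: int, version: tuple, data: bytes,
                 mac_key: bytes, seq_num: int) -> bytes:
    """Frame one fragment as header + fragment + MAC (hypothetical helper).

    The MAC covers the sequence number, content type, version,
    fragment length and the fragment itself.
    """
    type_and_version = struct.pack("!BBB", content_type, version[0], version[1])
    mac_input = (struct.pack("!Q", seq_num) + type_and_version
                 + struct.pack("!H", len(data)) + data)
    mac = hmac.new(mac_key, mac_input, hashlib.sha1).digest()
    body = data + mac  # a real implementation would encrypt `body` here
    return type_and_version + struct.pack("!H", len(body)) + body
```

The receiver would parse the five-byte header, read `length` bytes, and then recompute and compare the MAC before passing the fragment on.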

2.3.3 Handshake Protocol

The handshake protocol operates on top of the record protocol and is used to establish an SSL/TLS session. The goal of the handshake is to agree upon which protocol version to use, which cryptographic algorithms to employ, cryptographic keys, and compression methods. During the handshake both parties also have the possibility to authenticate each other, although in a client-server situation typically only the server is authenticated.

Figure 2.7: Full TLS handshake.

The handshake consists of several messages sent back and forth between the two parties. Figure 2.7 illustrates the full handshake between a typical client and a server. The full handshake is generally initiated by the client sending a ClientHello message that states the client’s intention to use SSL/TLS.

• ClientHello: Contains preference-ordered lists of supported protocol versions, cryptographic algorithms, and compression methods. It also contains the “Client Random” (a nonce) and optional extensions. In the case of session resumption the client can send a previously used session ID to resume the session (see abbreviated handshake below).

In SSL/TLS the different combinations of cryptographic algorithms are called cipher suites, and each is identified by a unique 16-bit value with a corresponding symbolic name. For example, the cipher suite DHE_RSA_WITH_AES_256_CBC_SHA means that the record protocol will use HMAC-SHA1 and AES encryption in Cipher Block Chaining mode with a 256-bit key, and that the key exchange is performed using ephemeral Diffie-Hellman authenticated with RSA. Initially the cipher suite and compression method are set to null, which means no encryption and no compression.

If the server finds a combination of protocols and algorithms that it also supports, it will send the ServerHello message in response. Depending on the chosen cipher suite, several other messages may additionally be sent.

• ServerHello: Contains the server-chosen protocol version, session ID, cipher suite, and compression method, as well as the “Server Random” (a nonce) and the optional extensions that will be used for the connection.

• Certificate: An optional message that is used for server authentication. In practice it is almost always sent, since the server is almost always authenticated. It contains the server’s certificate, which in turn contains the server public key (more details in the Certificate and Authentication section).

• ServerKeyExchange: An optional message only used for the more complicated Key Exchange algorithms, e.g., Diffie-Hellman.

• CertificateRequest: An optional message that the server can send to request that the client also authenticate itself. It contains a list of “Root Certificates” that the server will use to check the validity of the client’s certificate.

• ServerHelloDone: Is a marker message that indicates that the server will not send any more messages, and the client can proceed.
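To make the cipher suite naming convention described earlier concrete, the following sketch splits a symbolic cipher suite name into its components. The function name and the returned field names are hypothetical, and the sketch only covers names of the form KEYEXCHANGE_AUTH_WITH_CIPHER_MAC, such as the example used in the text; other naming patterns (for instance suites using AEAD modes) would need additional handling.

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a symbolic cipher suite name into its parts (illustrative only)."""
    if name.startswith("TLS_"):
        name = name[len("TLS_"):]
    key_part, sep, record_part = name.partition("_WITH_")
    if not sep:
        raise ValueError("unexpected cipher suite name: " + name)
    key_fields = key_part.split("_")
    record_fields = record_part.split("_")
    return {
        "key_exchange": key_fields[0],             # e.g. DHE
        "authentication": key_fields[-1],          # e.g. RSA
        "cipher": "_".join(record_fields[:-1]),    # e.g. AES_256_CBC
        "mac": record_fields[-1],                  # e.g. SHA
    }


suite = parse_cipher_suite("DHE_RSA_WITH_AES_256_CBC_SHA")
```

For the example suite, the parts correspond to ephemeral Diffie-Hellman key exchange, RSA authentication, AES-256 in CBC mode, and an HMAC based on SHA-1.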

The client then responds with:

• Certificate: An optional message that is used for client authentication and is only sent in response to the server’s CertificateRequest message. It contains the client’s certificate.

• ClientKeyExchange: This message is always sent and contains the client part of the actual key exchange. The message-content depends on the negotiated cipher suite.

• CertificateVerify: An optional message that contains a digital signature computed over all previous handshake messages. This message is only sent in response to the server’s CertificateRequest message. Its purpose is to prove to the server that the client really owns the public key in the certificate sent.

• ChangeCipherSpec: This message is actually not a handshake message, but has its own message type: change_cipher_spec and will therefore be sent in its own record. Its content is purely symbolic and signals that the client will from now on start to encrypt the messages using the negotiated settings.

• Finished: This message contains a cryptographic checksum computed over all the previous handshake messages. Since it is sent after the ChangeCipherSpec message it is encrypted with the negotiated cipher suite and keys. The purpose of the message is to protect against alterations and to serve as proof that the server has talked to the same client all along.

The server completes the handshake by sending its own ChangeCipherSpec and Finished messages. At this point the client and server can begin exchanging other types of messages.

The full handshake must be performed for all new connections, but if the client recently connected to the target server and would like to reuse the same parameters, the abbreviated handshake can be used. In the case of an abbreviated handshake the client will send the previously used session ID in the ClientHello message. If the server also remembers the parameters it may send the same session ID back in the ServerHello message and then move on directly to the ChangeCipherSpec and Finished messages as shown in Figure 2.8. The main advantages of the abbreviated handshake are the smaller number of messages required and the absence of costly asymmetric cryptographic computations. This greatly reduces the overall latency, and the abbreviated handshake is therefore frequently used by modern browsers and servers.

Figure 2.8: Abbreviated TLS handshake.

A handshake is not required to be initiated by the client, nor is a connection limited to a single handshake. At any time, within an established SSL/TLS connection, the client can send a new ClientHello message or the server can send a HelloRequest message to initiate a new handshake. A typical scenario where this functionality is used is when the server, after seeing the full request path, requires that the client authenticates itself.
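The server-side decision between the full and the abbreviated handshake can be illustrated with a small session cache. This is a sketch under simplifying assumptions: the class and method names are hypothetical, cached sessions never expire, and the stored parameters are represented by a plain dictionary.

```python
import os


class SessionCache:
    """Toy server-side cache mapping session IDs to negotiated parameters."""

    def __init__(self):
        self._sessions = {}

    def store(self, params: dict) -> bytes:
        """Remember the parameters of a completed full handshake."""
        session_id = os.urandom(32)
        self._sessions[session_id] = params
        return session_id

    def choose_handshake(self, offered_session_id: bytes):
        """Return the handshake type and any remembered parameters."""
        params = self._sessions.get(offered_session_id)
        if params is not None:
            # The server echoes the session ID and skips the key exchange.
            return "abbreviated", params
        # Unknown or empty session ID: fall back to the full handshake.
        return "full", None


cache = SessionCache()
sid = cache.store({"cipher_suite": "DHE_RSA_WITH_AES_256_CBC_SHA"})
```

A client that offers `sid` in its ClientHello would be answered with the abbreviated handshake, while any other session ID leads to the full handshake.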

2.3.4 Change Cipher Spec Protocol

The change cipher spec protocol is a minimal protocol that is used to signal changes in cipher strategies. The protocol consists of a single message containing a type that has a fixed value. In a standard handshake the message is sent before the Finished message. It indicates that the following messages will be secured using the newly negotiated cipher suite and keys.

2.3.5 Application Protocol

The application data protocol is used for sending the application data between the communicating parties after the connection has been established. The protocol simply frames the data by providing type, version and the message length.

2.3.6 Alert Protocol

The alert protocol is used when the connected parties need to notify each other of problems during a connection. Alert messages can be sent at any time. The alert message contains a level and a description. The level indicates the problem’s severity and can be either warning or fatal. A fatal alert always leads to the closing of the connection.
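As a small illustration, an alert message body can be decoded as two bytes, one for the level and one for the description. The sketch below assumes the level values 1 (warning) and 2 (fatal) and includes only a handful of description codes; the function name is hypothetical.

```python
ALERT_LEVELS = {1: "warning", 2: "fatal"}

# A small subset of alert descriptions, for illustration only.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    40: "handshake_failure",
    42: "bad_certificate",
    45: "certificate_expired",
    48: "unknown_ca",
}


def parse_alert(body: bytes):
    """Decode a two-byte alert into (level, description, must_close)."""
    if len(body) != 2:
        raise ValueError("an alert body is exactly two bytes")
    level = ALERT_LEVELS.get(body[0], "unknown")
    description = ALERT_DESCRIPTIONS.get(body[1], "unknown(%d)" % body[1])
    # A fatal alert always leads to the connection being closed.
    return level, description, level == "fatal"
```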

2.4 Public Key Infrastructure

The authentication process performed on servers during the TLS handshake relies on a Public Key Infrastructure (PKI) built on the ITU-T standard X.509, which the IETF adopted for use in TLS. The IETF specifies their X.509 PKI (referred to as PKIX) in the standard RFC 5280 [9] and its update RFC 6818 [33].

2.4.1 Certificate Authorities

The PKIX essentially consists of trusted third parties, called Certificate Authorities (CAs), which are organizations responsible for issuing digital certificates and administering their validity. In the PKIX arrangement the CA agrees to vouch for the identity of the Server by issuing a certificate, which essentially is a cryptographic binding of the Server’s identity and public key. The binding is achieved using a digital signature scheme where the CA holds the private key and distributes the related public key to clients. If the Client trusts the CA then the Server can authenticate itself to the Client by providing the certificate, which the Client can verify using the CA’s public key.

Figure 2.9: Simplified scenario of PKIX in TLS protocol.

Figure 2.9 shows a simplified usage scenario of the PKIX in the TLS/SSL handshake protocol. The Server sends a message to the CA requesting that the CA vouches for the Server’s identity by issuing a certificate. The CA performs identity checks, validating the server identity, before continuing to issue the certificate. Now a Client opens a connection to the server and initiates the TLS/SSL handshake. The server responds and provides the certificate. The Client then uses the CA’s public key to verify the validity of the certificate and checks that the certificate’s identity matches the identity of the server it wants to connect to. If the authentication is successful the Client can use the public key contained inside the certificate to perform the key exchange and complete the handshake.

In the PKIX landscape, a strict tree-like hierarchy is assumed where the CAs reside at the top. Each CA controls a handful of authority certificates, which are certificates with the privilege to issue further certificates, vouched for by them directly. These certificates are called root certificates and are used to issue further certificates, which can be either another authority certificate, called an intermediate certificate, or an end-entity certificate, called a leaf certificate. Every new intermediate certificate can issue further intermediate certificates, resulting in a chain of trust referred to as a certificate chain or path.

Figure 2.10: Certificate landscape example.

Figure 2.10 shows a simple hypothetical certificate landscape example. R1, R2 and R3 are root certificates of the respective CAs, while I1, I2, I3 and I5 are intermediate certificates directly signed by the root certificates. L1-L8 are end entities, e.g., web sites, vouched for by the corresponding CAs. The case of I3 signing I4 is called cross signing and is useful in situations where one CA resides in the Root Store while the other does not. The Root Store is a browser-dependent selection of root certificates that are automatically trusted when installing the browser in question. In the example only R1 and R2 reside in the Root Store, so a website with one of the certificates L1-L6 will be considered trusted by the browser while L7-L8 are untrusted. The reason for the Root Store is to relieve common users, who might not even know about certificates, from having to specify which CAs they trust themselves. The pre-specified Root Store is not locked and a user can remove or add trusted CAs at will.

The use of intermediate certificates has its advantages. From a security standpoint it is preferable to keep the private key of a root certificate offline and keep an easily accessible, directly signed intermediate certificate for online signing purposes. It also helps to spread the workload of identity checking and signing procedures to so-called intermediate certificate authorities, especially for globally operating CAs, which can delegate such tasks to local authorities. However, each intermediate certificate can be used to issue a valid certificate for any domain, so every new intermediate certificate increases the number of points of attack. If even a single CA or a subordinate intermediate trusted by a Root Store is compromised, the whole system becomes vulnerable. The attacker can issue valid certificates for any web site, circumventing the identity authentication, and consequently perform MITMAs despite the whole PKIX implementation.
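The trust decision in the landscape example can be sketched as a search from a leaf certificate upwards through its issuers until a certificate in the Root Store is reached. Since the figure is not reproduced here, the exact issuer relationships in `ISSUERS` below are an assumed layout that is merely consistent with the description in the text (R1 and R2 in the Root Store, I4 cross-signed by I3, L1-L6 trusted and L7-L8 untrusted). The function name is hypothetical, and signature verification is not modelled.

```python
# Assumed issuer relationships (child -> possible issuers), illustrative only.
ISSUERS = {
    "I1": ["R1"], "I2": ["R1"], "I3": ["R2"], "I5": ["R3"],
    "I4": ["I3", "I5"],  # I4 is cross-signed by I3
    "L1": ["I1"], "L2": ["I1"], "L3": ["I2"], "L4": ["I3"],
    "L5": ["I4"], "L6": ["I4"], "L7": ["I5"], "L8": ["I5"],
}
ROOT_STORE = {"R1", "R2"}


def find_trusted_path(cert, issuers=ISSUERS, root_store=ROOT_STORE, path=None):
    """Return a chain from `cert` up to a trusted root, or None if none exists."""
    path = (path or []) + [cert]
    if cert in root_store:
        return path
    for issuer in issuers.get(cert, []):
        if issuer in path:  # guard against signing loops
            continue
        found = find_trusted_path(issuer, issuers, root_store, path)
        if found is not None:
            return found
    return None
```

Under the assumed layout, L5 is trusted only because of the cross signature: the path through I5 ends at R3, which is not in the Root Store, while the path through I3 ends at R2.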

2.4.2 X.509 Certificate

The certificate used in the PKIX is structured according to the X.509 Certificate standard specified in RFC 5280 [9]. Figure 2.11 shows a schematic view of the certificate format.

• Version: Describes the version of the certificate (1, 2, or 3). Depending on version not all fields are present.

• Serial Number: Is a positive integer assigned by the CA to each certificate. Together with the issuer name, the serial number uniquely identifies the certificate.

• Signature Algorithm Identifier: Contains the identifier of the algorithm used by the CA to sign the certificate.

• Issuer: Identifies the entity that issued the certificate.


Figure 2.11: Schematic view of X.509 version 3 certificate format.

• Validity: Contains the time interval during which the certificate is valid.

• Subject: Identifies the entity associated with the public key stored in the certificate.

• Subject Public Key Info: Contains the value of the public key bound to the subject identity and the identifier of the algorithm with which the key is used.

• Issuer and Subject Unique Identifier: Are used in situations where the issuer and/or subject names are reused over time. The fields are optional and require certificate version 2 or 3.

• Extensions: Contains a sequence of possible certificate extensions. This field is optional and requires certificate version 3.

• Signature: Contains the identifier of the algorithm used to sign this certificate and the actual signature value. It must be the same identifier as in the Signature Algorithm Identifier field.

2.5 Validating Certificate

In the previous section we briefly outlined how a certificate is validated, but did not go into much detail about what is actually done. In reality, the standardized specifications and browsers’ actual implementations differ in many ways. This is because browsers not only need to implement the functionality, but are also required to support backwards compatibility and a wide variety of sometimes erroneous behavior by both clients and servers. Browsers also develop new features and functionality on their own, separate from the standards, creating further deviations.

2.5.1 Building the Certificate Chain

After receiving a certificate in the SSL/TLS handshake, the client must build the certificate chain from the end-entity certificate, through the possible intermediates, up to the root certificate. A certificate is not considered trusted unless this chain is complete, i.e., each certificate in the chain is successfully verified by the next until a trusted root certificate is reached. In the ideal case the server will provide any and all intermediates as well as the end-entity certificate, but in practice this is not always the case and browsers instead depend on caches and extensions to help build the chain.

2.5.2 Verifying the certificates

Along with building the certificate chain, the validity of each individual certificate in the chain must be verified. Each certificate is only valid for a specific period, as specified by the validity field in the certificate. The validity period is specified by a start date, “not before”, and an end date, “not after”. Depending on the kind of certificate, the length of the period can vary greatly. An end-entity certificate is usually only valid for several months to a few years, while a root certificate is valid for much longer, sometimes decades.

All certificates must also be checked for revocation to ensure that they have not been marked as untrusted. Certificates are revoked when CAs are compromised or when individual certificates are detected being misused. Mechanisms for checking the revocation status of certificates include the standard Certificate Revocation List (CRL) [9] and the OCSP service [28], as well as Google’s Certificate Transparency (CT) system [20].

Apart from checking the expiration and revocation status of the certificates, specific name and path length constraints placed by the CAs on intermediates must also be checked. If a CA has placed a path length constraint on an intermediate certificate, it limits how long the chain is allowed to be below that intermediate. Similarly, a name constraint can restrict an intermediate to only be allowed to issue certificates for specific subdomains. For example, if an intermediate certificate is restricted to the subdomain “*.example.com” and a path length of zero, it is not allowed to issue certificates for “*.other.com” or any further intermediates.
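The validity period, path length and name constraint checks described above can be sketched as three small predicates. This is a simplified illustration with hypothetical function names: it works on already-parsed values rather than real certificates, treats the name constraint as a plain domain suffix, and does not cover revocation checking.

```python
from datetime import datetime, timezone


def within_validity(not_before: datetime, not_after: datetime,
                    now: datetime) -> bool:
    """True if `now` lies inside the certificate's validity period."""
    return not_before <= now <= not_after


def path_length_ok(constraint, intermediates_below: int) -> bool:
    """True if the number of intermediates below respects the constraint."""
    return constraint is None or intermediates_below <= constraint


def name_permitted(hostname: str, permitted: str) -> bool:
    """True if `hostname` falls under a constraint such as '*.example.com'."""
    suffix = permitted.lower().lstrip("*").lstrip(".")
    host = hostname.lower().rstrip(".")
    return host == suffix or host.endswith("." + suffix)


utc = timezone.utc
observed_at = datetime(2015, 10, 11, tzinfo=utc)
currently_valid = within_validity(
    datetime(2015, 1, 1, tzinfo=utc), datetime(2016, 1, 1, tzinfo=utc),
    observed_at)
```

With a path length constraint of zero, `path_length_ok(0, 1)` is false, which corresponds to the intermediate in the example not being allowed to issue any further intermediates.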

2.6 Certificate Issuance

One of the more important aspects of using certificates is the issuance process. To achieve the necessary trust in online transactions, the CAs are required to thoroughly investigate the identity of the server before issuing a certificate. However, the increasing market demand for certificates has led some commercial CAs to introduce simpler and cheaper kinds of certificates that require less stringent identity checks.

2.6.1 Domain Validation

Domain validation level certificates are the simplest type of certificate. They provide only a baseline level of proof that you are communicating with the correct server. They are issued as soon as the CA confirms that the certificate requester is the actual owner of the target domain. The advantage of domain validation certificates is that they make secure online communication available to a wider market, but the fact that anyone can get them means that they carry little weight as proof of identity.

Figure 2.12: Chrome browser certificate validation indicator.

A successfully verified domain validation level certificate is usually indicated with a “padlock” somewhere in the address bar. In the case of Google’s Chrome browser it is a green padlock right before “...” as shown to the right in Figure 2.12.


2.6.2 Organization Validation

Organizational validation level certificates are issued to companies and provide a higher level of proof than domain level validation. These certificates require that the ownership of the domain, as well as the company itself, is verified by the CA before being issued. The advantage of organizational over domain level validation is that the certificates not only guarantee domain ownership, but also provide a certain level of trust in the company. Some browsers indicate a successfully verified organization validated certificate by coloring the address bar.

2.6.3 Extended Validation

Extended Validation (EV) certificates were introduced in 2007 as an initiative to provide a high standard certificate for organizations where trust is essential for the business, e.g., online banking, and to some degree restore the waning user trust in certificates. To obtain an EV certificate the company must go through an extensive vetting process and all details about the company must be verified. Not every CA is allowed to offer EV certificates; this is restricted to CAs that have passed an independent qualified audit review.

While an EV certificate may seem similar to an organization validation certificate, the key difference is the level of validation that is required. The EV certificate itself also includes extra identification information that further allows the certificate to be identified on an organizational level during the certificate validation process. The guidelines for EV certificates are managed by the CA/Browser Forum.

A successfully verified EV certificate usually, in addition to the “padlock”, includes the name of the company and the issuing CA in the colored display in the address bar. The left bar in Figure 2.12 shows how Google’s Chrome browser indicates an EV certificate.

3 Method

The objective of this thesis is to observe how the HTTPS protocol is used in practice. For this purpose we used data collected passively from a network. This chapter describes how the data was collected, processed, and analyzed.

3.1 Data Collection

The data used in this thesis was exclusively collected from the University of Calgary, Canada, through passively monitoring the traffic between the campus network and the Internet. Passive monitoring is a technique used to collect data from a monitored network by copying the traffic via a network tap. The network tap is an external hardware device that is inserted at a specific point in a network to mirror the traffic that passes through it, in this case the university’s multi-Gbps ingress/egress link. The traffic on this network originates from more than 30,000 users (students, staff and professors). This provides good examples of what typical HTTPS communication looks like, both in terms of what is accessed and the devices used (e.g., tablets, desktops, servers).

Privacy Concerns: We only gathered statistical data for the purpose of analyzing properties of the TLS/SSL communication. We did not conduct any analyses to identify the activities of individual users on the campus. The rules regarding the distribution of data collected at the University of Calgary are very strict and do not allow recorded data containing IP addresses of users to leave the university. All the data processing, as described in Section 3.2, was done on the University of Calgary’s servers and only the final aggregated log files, containing exclusively statistical data, left the campus. Furthermore, any actionable information regarding security on the campus network was shared with the campus IT staff.

3.2 Data Processing

The network traffic was processed using the network security monitor Zeek version 2.4.1 (called Bro at the time of analysis) [31]. Zeek provides a comprehensive analysis framework for both general network traffic analysis and more specialized analysis of the TLS/SSL communication. Using the Zeek framework allowed us to log specific information about the non-encrypted part of the TLS/SSL handshake and all digital certificates sent.

3.2.1 Zeek Scripts

The scripting language in Zeek uses an event-driven approach where writing scripts involves handling the events generated by Zeek as it processes network traffic. All events are placed into an ordered "event queue", allowing scripted event handlers to process the events on a first-come, first-served basis. Typical events are generated, for example, when a new HTTP or HTTPS connection is initiated or closed. We use these events for creating new data storage objects and writing the data to a log file, respectively.

We developed several Zeek scripts for the purpose of recording data from HTTPS communication. Each script produces log files, stored and compressed in intervals. The three main scripts are as follows: a script to summarize statistics of TLS/SSL communications, a script to record statistics of certificate usage, and a script to record the ratio between HTTP and HTTPS traffic.

TLS/SSL Communications Script: A script that records a summary about every HTTPS session initiated and established. The information is stored in a log file with one entry per session. The summary record for each session includes:

• Client Protocol Version: The highest TLS/SSL protocol version that the Client supports.

• Protocol Version Used: The TLS/SSL protocol version that the Server chooses to use for the session.

• Client list of Cipher Suites: The preference-ordered list of cipher suites that the Client supports.

• Cipher Suite Used: The cipher suite that the Server chooses to use for the session.

• Certificate Chain Length: The length of the certificate chain.

• User-Agent String: The user-agent string of the client, used to distinguish between mobile/stationary clients and their browser versions. See Section 3.2.2 for how this is derived.

• Validation Status: Status of the session validation.

• Heartbleed Status: Status of possible Heartbleed attacks during this session.

Certificate Statistics Script: A script that records a summary of every individual certificate sent to the browser during the authentication stage of the TLS handshake. The information is stored in a log file with one entry per certificate. For each certificate the summary includes:

• Signature Algorithm: The algorithm used to sign the certificate.

• Public Key Algorithm: The algorithm used for the public key.

• Period Validity: The period validity status and validity period duration.

• Extended Validation: Whether the certificate is an EV certificate.

• Basic Constraints: Whether the certificate utilizes any basic constraints, for example the path length constraint.

• Subjects: The subject or subjects associated with the certificate.

• Issuer Certificate: The authority certificate used to sign the certificate.


Figure 3.1: Log file processing.

Ratio between HTTP and HTTPS Traffic Script: A script that periodically summarizes the share of HTTPS connections compared to HTTP connections. Information is stored in one hour batches.

3.2.2 Identifying Mobile and Stationary Devices

Identifying mobile and stationary user devices based only on passive HTTPS information is non-trivial since the necessary information, the user-agent string, is sent in the encrypted part of the session establishment. In this thesis we leverage the fact that in typical web sessions a client, even when only visiting a single website, issues requests to many different servers; some accessed with HTTP and some with HTTPS. For HTTPS session classification the script therefore temporarily (in a 5 minute rolling window, not written to disk) keeps track of IP-to-user-agent mappings of observed HTTP sessions and compares the IP address of each new HTTPS session with those of recent HTTP sessions. Matches are assumed to originate from the same client, and HTTPS sessions are classified as either mobile or stationary based on the corresponding user-agent string.
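The matching logic can be sketched in Python as follows. This is not the Zeek script used in the thesis but a simplified model of the idea: the class name, the method names, and in particular the keyword list used to recognize mobile user agents are assumptions made for illustration.

```python
WINDOW_SECONDS = 5 * 60  # rolling window for IP-to-user-agent mappings

# Hypothetical keywords; the thesis does not list its classification rules.
MOBILE_HINTS = ("mobile", "android", "iphone", "ipad")


def is_mobile(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)


class UserAgentTracker:
    """Keeps the most recent HTTP user agent per client IP, in memory only."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self._last_seen = {}  # ip -> (timestamp, user_agent)

    def observe_http(self, ip: str, user_agent: str, ts: float) -> None:
        self._last_seen[ip] = (ts, user_agent)

    def classify_https(self, ip: str, ts: float) -> str:
        entry = self._last_seen.get(ip)
        if entry is None:
            return "unknown"
        seen_ts, user_agent = entry
        if ts - seen_ts > self.window:
            del self._last_seen[ip]  # mapping has expired
            return "unknown"
        return "mobile" if is_mobile(user_agent) else "stationary"
```

HTTPS sessions without a recent HTTP session from the same IP address remain unclassified, which is one reason why the identified mobile and stationary sessions do not add up to the total number of sessions.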

3.2.3 Local University of Calgary Servers

To avoid analysis results skewed towards the behavior of servers local to the University of Calgary, we decided to filter out all such sessions based on IP prefix and focus on the traffic between local clients and remote servers (i.e., servers located outside the campus).
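A prefix-based filter of this kind can be sketched with Python's standard `ipaddress` module. The prefix below is a placeholder from the documentation address range and not the actual campus prefix, and the function names are hypothetical.

```python
import ipaddress

# Placeholder prefix (documentation range); the real campus prefixes differ.
LOCAL_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]


def is_local(ip: str) -> bool:
    """True if the address belongs to one of the local prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LOCAL_PREFIXES)


def keep_session(client_ip: str, server_ip: str) -> bool:
    """Keep only sessions between a local client and a remote server."""
    return is_local(client_ip) and not is_local(server_ip)
```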

3.3 Analyzing the Data

Each Zeek script produces log files stored and compressed in intervals, thus producing many separate log files to encompass the whole time period. In order to analyze the data we were first required to process all the separate log files to compile one data file for each log-file type. This process is illustrated by Figure 3.1. After the aggregation process, analyzing scripts could be run on each data file individually as well as together for cross-data file analysis.

4 Results

In this chapter we describe the results from our analysis. Following a summary of our dataset, the chapter is divided into one section for each trust relationship, and ends with a section summarizing the observed overall HTTPS session quality. For each relationship we describe its relevance for the security of HTTPS, and present the results of our analysis.

An important distinction that we sometimes make is whether we analyzed the distribution of unique certificates or observed certificates. The difference is that the former refers to a distribution of each individual certificate regardless of how many times it has been observed. The latter refers to the full distribution of observed certificates and the result is consequently weighted towards the set of certificates observed more frequently.
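The difference between the two distributions can be illustrated with a small sketch. The function name and the record layout (a certificate identifier paired with a dictionary of attributes) are assumptions made for this example, not the format of our log files.

```python
from collections import Counter


def distributions(observations, attribute):
    """Return (unique, observed) distributions of one certificate attribute.

    `observations` is an iterable of (certificate_id, attributes) pairs,
    one pair per time a certificate was seen.
    """
    observed = Counter()
    per_certificate = {}
    for cert_id, attrs in observations:
        observed[attrs[attribute]] += 1            # weighted by frequency
        per_certificate[cert_id] = attrs[attribute]  # counted once per cert
    unique = Counter(per_certificate.values())
    return unique, observed


seen = [
    ("cert-a", {"sig": "sha256"}),
    ("cert-a", {"sig": "sha256"}),
    ("cert-a", {"sig": "sha256"}),
    ("cert-b", {"sig": "sha1"}),
]
unique, observed = distributions(seen, "sig")
```

In this toy input the two signature algorithms are equally common among unique certificates, while the observed distribution is weighted three to one towards the certificate that was seen more often.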

4.1 Summary of Dataset

For this thesis we gathered data during a one week period (Oct. 11-17, 2015). Table 4.1 summarizes our datasets, broken down on a per-session and per-certificate basis. In total, we observed 232,640,189 HTTPS sessions. Of these sessions, 157,225,583 (67.58%) contained certificates, while the rest (32.42%) were session resumptions using the abbreviated TLS handshake to resume a previously established HTTPS session (see Section 2.3.3).

Table 4.1: Dataset overview.

Sessions            Observed (Share)        Observed Mobile         Observed Stationary
With Certificates   157,225,583 (67.58%)    32,877,565 (70.08%)     74,432,271 (67.94%)
Resumption          75,414,606 (32.42%)     14,036,068 (29.92%)     35,117,577 (32.06%)
Total               232,640,189             46,913,633              109,549,848

Certificates        Unique                  Observed
Leaf                66,912 (98.89%)         319,612,494 (57.86%)
Authorities         750 (1.11%)             232,774,694 (42.14%)
Total               67,664                  552,387,188
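The shares in Table 4.1 follow directly from the raw counts, as the short calculation below shows. The helper name `share` is hypothetical; the counts are taken from the table.

```python
def share(part: int, total: int) -> float:
    """Percentage of `total` made up by `part`, rounded to two decimals."""
    return round(100.0 * part / total, 2)


with_certificates = 157_225_583
resumptions = 75_414_606
total_sessions = with_certificates + resumptions

leaf_observed = 319_612_494
authority_observed = 232_774_694
total_observed = leaf_observed + authority_observed

session_shares = (share(with_certificates, total_sessions),
                  share(resumptions, total_sessions))
certificate_shares = (share(leaf_observed, total_observed),
                      share(authority_observed, total_observed))
```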

We further managed to identify 46,913,633 sessions from clients using mobile devices and 109,549,848 sessions from stationary devices. These individual subsets showed a similar ratio between sessions with certificates and resumptions. For the mobile sessions there were 32,877,565 (70.08%) and for the stationary sessions there were 74,432,271 (67.94%) that contained certificates.

In total, across all sessions, we observed 67,664 unique certificates. Together these certificates were observed a total of 552,387,188 times, with the majority of sessions sending multiple certificates in the respective certificate chains. Of the unique certificates, 750 (1.11%) were authority certificates, while the remaining 66,912 (98.89%) were leaf certificates. In contrast, the smaller set of authority certificates was observed 232,774,694 (42.14%) times in total, while the much larger set of leaf certificates was observed 319,612,494 (57.86%) times. The skew in shares between unique and total observed certificates is due to the fact that many leaf certificates are signed by the same authority certificate.

Figure 4.1: Number of established sessions plotted over time. Shows total number of sessions as well as the subsets of sessions using HTTP and HTTPS.

4.2 Ratio between HTTP and HTTPS

While this thesis is primarily focused on HTTPS, an interesting aspect to look at is the ratio of sessions using HTTPS compared to HTTP. Inspecting the established connections also allows us to confirm whether the collected data seems plausible and thereby conclude that the data gathering process was successful. For this reason we recorded each established session with a timestamp. Figure 4.1 shows the number of established sessions as well as the subsets of sessions using HTTP and HTTPS plotted over time. In our dataset, Oct 11 (a Sunday) and Oct 17 (a Saturday) fell on weekends and Oct 12 was a statutory holiday. With the exception of Oct 15, our data shows one spike for each day. For Oct 11, 12, 15 and 17 the shares of HTTP and HTTPS sessions are very similar. For Oct 13, 14, and 16 the share of HTTPS sessions is higher than the share of HTTP sessions.


Table 4.2: Browser share.

Name                Observed (Share)
Chrome              178,042,643 (51.48%)
Safari              77,990,330 (22.55%)
Firefox             65,638,870 (18.98%)
Internet Explorer   23,255,519 (6.72%)
Opera               890,904 (0.26%)
SeaMonkey           29,251 (0.01%)
Chromium            8,772 (0.00%)

4.3 Trust in Browsers

The browser plays a key role in the HTTPS landscape. It is perhaps the most explicit choice of trust for a user. The browser is responsible for managing the implementation of HTTPS as well as the selection of the CAs that are currently considered trusted. When a new security vulnerability is found it is important that a good browser immediately patches the implementation to protect against it being exploited. For this reason it is important that browsers are kept up-to-date with the latest versions. With this in mind we investigated how up-to-date the browsers in practice actually are. When considering the browser version we looked at the latest officially released stable version and regard a browser to be behind if it does not have the latest security update. We do not take into account Beta or developer versions. The data considered in this analysis is taken from both the observed HTTP sessions as well as the subset of HTTPS sessions where we could identify the user agent string as described in Section 3.2.2.

Browser Distribution: Table 4.2 shows the distribution of different browsers observed in our dataset. Not surprisingly, the result is heavily skewed towards a small set of very popular browsers that have in common that they are developed and supported by large organizations. The most popular browser is Google’s Chrome, which is observed in 178,042,643 sessions in total. This makes up 51.48% of all observed sessions. Chrome is followed by Apple’s Safari, observed in 77,990,330 (22.55%) sessions, and Mozilla’s Firefox, observed in 65,638,870 (18.98%) sessions. The fourth most used browser is Microsoft’s Internet Explorer, observed in 23,255,519 (6.72%) sessions. We further observed the usage of three minor browsers: Opera, observed in 890,904 (0.26%) sessions, SeaMonkey, observed in 29,251 (0.01%) sessions, and Chromium, observed in 8,772 (0.00%) sessions.

Google Chrome: Table 4.3 shows the distribution of the top ten most observed Chrome browser versions sorted by number of updates behind. In October 2015, the latest stable version of Chrome was Chrome/46.0.2490. This was observed in 39,889,597 (22.40%) sessions and is the second largest share. The most used version was Chrome/45.0.2454, observed in 115,154,160 (64.68%) sessions. This represents a browser that was one update behind the latest version. The third largest share, Chrome/42.0.2311, is observed in 7,414,204 (4.16%) sessions and is four updates behind. Chrome/44.0.2403 (two updates behind) and Chrome/43.0.2357 (three updates behind) are observed in 3,240,668 (1.82%) and 1,951,253 (1.10%) sessions, respectively. The oldest version seen in the dataset was Chrome/0.2.149, observed in 680 (0.00038%) sessions.

Apple Safari: Table 4.4 shows the distribution of the top ten most observed Safari browser versions sorted by number of updates behind. The latest version of Safari in October 2015 was Safari/9.0.1, which is observed in 486,346 (0.62%) sessions and is not located in the top ten. The most used version is Safari/9.0. This was one update behind and was observed in 32,378,275 (41.52%) sessions. The second largest share, Safari/8.0, is ten updates behind


Table 4.3: Chrome version distribution.

| Name | Observed Share | Behind |
|---|---|---|
| Chrome/46.0.2490 | 39,889,597 (22.40%) | 0 |
| Chrome/45.0.2454 | 115,154,160 (64.68%) | 1 |
| Chrome/44.0.2403 | 3,240,668 (1.82%) | 2 |
| Chrome/43.0.2357 | 1,951,253 (1.10%) | 3 |
| Chrome/42.0.2311 | 7,414,204 (4.16%) | 4 |
| Chrome/41.0.2272 | 859,049 (0.48%) | 5 |
| Chrome/39.0.2171 | 913,737 (0.51%) | 7 |
| Chrome/38.0.2125 | 1,662,113 (0.93%) | 8 |
| Chrome/34.0.1847 | 1,326,076 (0.74%) | 12 |
| Chrome/31.0.1650 | 1,672,425 (0.94%) | 15 |

Table 4.4: Safari version distribution.

| Name | Observed Share | Behind |
|---|---|---|
| Safari/9.0 | 32,378,275 (41.52%) | 1 |
| Safari/8.0.8 | 8,080,378 (10.36%) | 2 |
| Safari/8.0.7 | 3,954,857 (5.07%) | 3 |
| Safari/8.0.6 | 1,518,559 (1.95%) | 4 |
| Safari/8.0.5 | 1,993,624 (2.56%) | 5 |
| Safari/8.0.3 | 1,488,709 (1.91%) | 7 |
| Safari/8.0 | 8,786,587 (11.27%) | 10 |
| Safari/7.1.8 | 1,698,240 (2.18%) | 11 |
| Safari/4.0 | 3,735,701 (4.79%) | 17 |
| Safari/7.0 | 1,941,555 (2.49%) | 26 |

Table 4.5: Firefox version distribution.

| Name | Observed Share | Updates Behind |
|---|---|---|
| Firefox/41.0 | 43,911,471 (66.90%) | 2 |
| Firefox/40.0 | 11,983,891 (18.26%) | 5 |
| Firefox/39.0 | 912,329 (1.39%) | 7 |
| Firefox/38.0 | 2,136,386 (3.25%) | 26 |
| Firefox/37.0 | 290,093 (0.44%) | 29 |
| Firefox/36.0 | 403,193 (0.61%) | 34 |
| Firefox/34.0 | 2,453,535 (3.74%) | 38 |
| Firefox/33.0 | 266,921 (0.41%) | 44 |
| Firefox/22.0 | 338,179 (0.52%) | 87 |
| Firefox/12.0 | 395,862 (0.60%) | 121 |

Table 4.6: Internet Explorer version distribution.

| Name | Observed Share | Updates Behind |
|---|---|---|
| MSIE 11.0 | 511,508 (2.20%) | 0 |
| MSIE 10.0 | 10,645,456 (45.76%) | 1 |
| MSIE 9.0 | 3,296,946 (14.17%) | 2 |
| MSIE 8.0 | 1,907,360 (8.20%) | 3 |
| MSIE 7.0 | 5,063,839 (21.77%) | 4 |
| MSIE 6.0 | 1,682,342 (7.23%) | 8 |
| MSIE 5.5 | 6,625 (0.03%) | 9 |
| MSIE 5.0 | 141,241 (0.61%) | 11 |
| MSIE 4.0 | 5,279 (0.02%) | 13 |

and is observed in 8,786,587 (11.27%) sessions. Safari/8.0.8 (two updates behind) and Safari/8.0.7 (three updates behind) are observed in 8,080,378 (10.36%) and 3,954,857 (5.07%) sessions, respectively. The oldest version seen in the dataset was Safari/1.0, observed in 35 (0.000045%) sessions.

Mozilla Firefox: Table 4.5 shows the distribution of the top ten most observed Firefox browser versions sorted by number of updates behind. In October 2015 the current release of Firefox was Firefox/41.0.2. This version was rarely observed (only 680 sessions) and was not located in the top ten. The most used version was Firefox/41.0, which was two updates behind and was observed in 43,911,471 (66.90%) sessions. The second most used version was Firefox/40.0, which was five updates behind and was observed in 11,983,891 (18.26%) sessions. Following the two largest shares there were even older versions that were seven or more updates behind. The oldest version seen in the dataset was Firefox/0.8, observed in 1,604 (0.0024%) sessions.

A difference between Firefox and the other browsers is the frequency with which security updates are released. Firefox releases a considerably larger number of security updates compared to the others. This is reflected in the overall higher number of updates behind for the observed Firefox versions.

Microsoft Internet Explorer: Table 4.6 shows the distribution of the top ten most observed Internet Explorer browser versions sorted by number of updates behind. The latest version of Internet Explorer available in October 2015 was MSIE 11.0. This was observed in 511,508 (2.20%) sessions and had the sixth largest share. The most used version was MSIE 10.0, which was one version behind and was observed in 10,645,456 (45.76%) sessions. The second most used version was MSIE 7.0, which was four versions behind and was observed in 5,063,839 (21.77%) sessions. MSIE 9.0 (two versions behind) and MSIE 8.0 (three versions behind) were observed in 3,296,946


Table 4.7: Top 10 organizations signing leaf certificates.

| Organization | Share |
|---|---|
| Comodo CA Limited | 22.94% |
| Go Daddy | 18.08% |
| GeoTrust | 16.27% |
| DigiCert | 9.39% |
| GlobalSign | 6.83% |
| Symantec Corporation | 5.17% |
| Thawte | 3.46% |
| Entrust | 2.14% |
| Internet2 | 1.80% |
| Verizon | 1.75% |

Table 4.8: Top-six certificate authorities signing EV certificates.

| Name | Unique Share | Observed Share |
|---|---|---|
| Symantec Corporation | 757 (22.88%) | 1,125,757 (56.15%) |
| DigiCert | 496 (14.99%) | 552,585 (27.56%) |
| Go Daddy | 429 (12.97%) | 68,742 (3.43%) |
| GeoTrust | 437 (13.21%) | 63,948 (3.19%) |
| Entrust | 142 (4.29%) | 54,514 (2.72%) |
| Comodo CA Limited | 518 (15.66%) | 53,724 (2.68%) |

(14.17%) and 1,907,360 (8.20%) sessions, respectively. The oldest version seen in the dataset was MSIE 3.0, observed in 313 (0.0013%) sessions. In contrast to the other browsers, the user agent string for Internet Explorer provides no information about individual updates, only the main version. This obscures how out of date a user’s browser actually is, and it explains why the number of updates behind is lower than for, e.g., Firefox.

4.4 Trust in Certificate Authorities

Another cornerstone of secure SSL/TLS communication is the set of CAs entrusted to issue certificates. With a few exceptions, any organization with control of a signing certificate can issue certificates for any domain. If even a single signing certificate is compromised, the whole system therefore becomes vulnerable [4]. To meet increasing market demands, many identity checks have become less stringent over time. Extended Validation (EV) certificates were introduced to help restore the resulting waning user trust in certificates. EV certificates are intended to follow the stricter issuing criteria needed by organizations for which secure communication is essential.

For these reasons we investigated the certificate landscape to see how good the security of certificates is in practice. We looked at the distribution of certificate authorities, the cryptography in certificates, certificate validity durations, multiple domains, and other certificate extensions. Furthermore, we validated the certificate chains in all observed sessions using the Mozilla root store.

Certificate Authorities Distribution: Although our dataset includes 750 authority certificates, we find that the vast majority of non-self-signed leaf certificates are issued by only a handful of organizations. Table 4.7 shows the share of the top-ten organizations signing leaf certificates that we could identify. The top-three organizations are Comodo CA Limited with 22.94%, Go Daddy with 18.08%, and GeoTrust with 16.27%. These are followed by DigiCert with 9.39%, GlobalSign with 6.83%, Symantec Corporation with 5.17%, Thawte with 3.46%, Entrust with 2.14%, Internet2 with 1.80%, and Verizon with 1.75%.

Extended Validation Certificates: Of all observed certificates, 4.29% are EV certificates, and those are used in 6.27% of all observed sessions. Table 4.8 shows the top-six organizations signing EV certificates, considering both certificates on an individual basis and observed share in sessions. Of all observed unique EV certificates Symantec Corporation signed 22.88%, Comodo CA Limited 15.66%, DigiCert 14.99%, GeoTrust 13.21%, Go Daddy 12.97%, and Entrust 4.29%. In comparison, the usage shares of EV certificates in observed sessions


Table 4.9: Certificate signature algorithms grouped on type.

| Signature Algorithm | Authority: Unique Share | Authority: Observed Share | Leaf: Unique Share | Leaf: Observed Share |
|---|---|---|---|---|
| SHA256 (RSA) | 321 (42.80%) | 178,706,984 (76.77%) | 48,348 (72.26%) | 242,072,990 (75.74%) |
| SHA1 (RSA) | 377 (50.27%) | 53,253,477 (22.88%) | 16,686 (24.94%) | 77,159,924 (24.14%) |
| SHA256 (ECDSA) | | | 1,738 (2.60%) | 190,692 (0.06%) |
| SHA1 (DSA) | | | 5 (0.01%) | 116,610 (0.04%) |
| MD5 (RSA) | 10 (1.33%) | 1,695 (7 · 10^-6%) | 97 (0.14%) | 65,274 (0.02%) |
| SHA384 (RSA) | 34 (4.53%) | 594,770 (0.26%) | 6 (0.01%) | 928 (3 · 10^-6%) |
| SHA384 (ECDSA) | 5 (0.67%) | 190,268 (0.08%) | 3 (4 · 10^-5%) | 22 (7 · 10^-8%) |
| SHA512 (RSA) | 1 (0.13%) | 27,416 (0.01%) | 21 (0.03%) | 5,228 (1 · 10^-5%) |
| SHA1 (ECDSA) | 2 (0.27%) | 84 (4 · 10^-7%) | 3 (4 · 10^-5%) | 110 (3 · 10^-7%) |

are as follows: Symantec Corporation 56.15%, DigiCert 27.56%, Go Daddy 3.43%, GeoTrust 3.19%, Entrust 2.72%, and Comodo CA Limited 2.68%.

4.4.1 Certificate Cryptography

Every certificate is cryptographically signed by an authority certificate directly above it in the certificate chain. The signature value and the cryptographic algorithm type are stored in each certificate; the latter is referred to as the certificate signature algorithm. It is critical to the security of the whole PKI that unbroken cryptographic algorithms, not susceptible to known weaknesses, are used. The danger here is that someone with malicious intent can create valid rogue certificates. If someone manages to create a rogue authority certificate trusted by a browser, then they can sign any number of certificates that will all validate as trusted. This is illustrated by Sotirov et al. [30].

Every certificate also contains a public key value and the cryptographic algorithm used to create it. This cryptographic algorithm is referred to as the certificate public key algorithm. The public key value is used to perform a key exchange. It is important that an unbroken algorithm and a large enough key are used, because otherwise the communication is susceptible to MITM attacks, as discussed in Section 2.2.5.

Certificates can be issued to be valid for different time periods, with the validity period being a potential security factor. Due to continuous advances in computer and cryptographic technologies, certificates valid for an extended time period can become viable targets for attack during their lifetime. Using shorter validity periods is therefore a good practice.

For these reasons we investigated the cryptographic algorithms and key sizes used in certificates as well as the certificate validity period durations. For more detailed descriptions of these mechanisms see Section 2.4.1.

Certificate Signature Algorithm Results: Table 4.9 provides a breakdown of signature algorithms used in observed certificates. For more information about the cryptographic algorithms mentioned below and in the table, see Section 2.2.1 about cryptographic hash functions and Section 2.2.4 about asymmetric cryptography.

We find that the majority of the certificates are signed using SHA256 with RSA encryption. It is used in 42.80% of all unique authority certificates, which are observed in 76.77% of sessions. Similarly, it signed 72.26% of all unique leaf certificates, which are observed in 75.74% of sessions.

The next largest share of certificates was signed using SHA1 with RSA encryption. It was responsible for signing 50.27% of all unique authority certificates, observed in 22.88% of sessions, and 24.94% of all unique leaf certificates, observed in 24.14% of sessions.

The remaining authority certificates were signed using SHA384 with RSA encryption (4.53% of the certificates and 0.26% of the sessions), MD5 (1.33% of the certificates and


Table 4.10: Certificate public key grouped on type and key sizes.

| Algorithm (Key) | Authority: Unique Share | Authority: Observed Share | Leaf: Unique Share | Leaf: Observed Share |
|---|---|---|---|---|
| RSA (2,048-bit) | 668 (89.07%) | 193,370,575 (83.07%) | 58,831 (87.92%) | 263,145,871 (82.33%) |
| ECDSA (256-bit) | 5 (0.67%) | 190,268 (0.08%) | 1,802 (2.69%) | 49,467,671 (15.48%) |
| RSA (1,024-bit) | 10 (1.33%) | 246,425 (0.11%) | 3,757 (5.61%) | 3,927,601 (1.23%) |
| RSA (4,096-bit) | 53 (7.07%) | 38,775,992 (16.66%) | 2,325 (3.47%) | 2,695,285 (0.84%) |
| ECDSA (384-bit) | 4 (0.53%) | 190,552 (0.08%) | 4 (0.01%) | 28 (9 · 10^-8%) |

7 · 10^-6% of the sessions), SHA384 with ECDSA encryption (0.67% of the certificates and 0.08% of the sessions), SHA1 with ECDSA encryption (0.27% of the certificates and 4 · 10^-7% of the sessions), and SHA512 with RSA encryption (0.13% of the certificates and 0.01% of the sessions).

The remaining leaf certificates were signed using SHA256 with ECDSA encryption (2.60% of the certificates and 0.06% of the sessions), MD5 (0.14% of the certificates and 0.02% of the sessions), SHA512 with RSA encryption (0.03% of the certificates and 1 · 10^-5% of the sessions), SHA384 with RSA encryption (0.01% of the certificates and 3 · 10^-6% of the sessions), SHA1 with DSA encryption (0.01% of the certificates and 0.04% of the sessions), SHA384 with ECDSA encryption (4 · 10^-5% of the certificates and 7 · 10^-8% of the sessions), and SHA1 with ECDSA encryption (4 · 10^-5% of the certificates and 3 · 10^-7% of the sessions).
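The difference between the unique share and the observed share reported above can be made concrete with a small sketch. It assumes one record per session, consisting of a certificate fingerprint and the signature algorithm of that certificate; the record layout and the function name are illustrative assumptions, not the pipeline used in this thesis.

```python
from collections import Counter


def shares(observations):
    """Return (unique_share, observed_share) as dicts: algorithm -> fraction.

    `observations` is an iterable of (cert_fingerprint, algorithm) pairs,
    one pair per session in which the certificate was seen.
    """
    observed = Counter()
    algorithm_of = {}
    for fingerprint, algorithm in observations:
        observed[algorithm] += 1               # every session counts
        algorithm_of[fingerprint] = algorithm  # every certificate counts once
    total_observed = sum(observed.values())
    if total_observed == 0:
        return {}, {}
    unique = Counter(algorithm_of.values())
    total_unique = len(algorithm_of)
    unique_share = {a: n / total_unique for a, n in unique.items()}
    observed_share = {a: n / total_observed for a, n in observed.items()}
    return unique_share, observed_share
```

A certificate that is served in many sessions therefore weighs heavily in the observed share while contributing only once to the unique share, which is why the two columns of a table can differ substantially.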

Certificate Public Key Algorithm Results: Table 4.10 provides a breakdown of public key algorithms used with the observed certificates. For more information about the cryptographic algorithms mentioned see Section 2.2.4 about asymmetric cryptography.

The vast majority of all observed certificates used RSA with a 2048-bit key: 89.07% of unique authority certificates (observed in 83.07% of the sessions) and 87.92% of unique leaf certificates (observed in 82.33% of the sessions).

ECDSA with a 256-bit key, which is similar in cryptographic strength to RSA 2048-bit, made up most of the remaining share of leaf certificates in sessions but only a fraction of the authority certificates: 0.67% of unique authority certificates (observed in 0.08% of the sessions), and 2.69% of unique leaf certificates (observed in 15.48% of the sessions).

RSA with a 4096-bit key and ECDSA with a 384-bit key are the cryptographically strongest public key types we observed. RSA 4096-bit made up a significant share of authority certificates but only a small number of leaf certificates: 7.07% of unique authority certificates (observed in 16.66% of the sessions), and 3.47% of unique leaf certificates (observed in 0.84% of the sessions). ECDSA 384-bit, however, was only observed in a minimal number of certificates: 0.53% of unique authority certificates (observed in 0.08% of the sessions), and 0.01% of unique leaf certificates (observed in 9 · 10^-8% of the sessions).

In addition, we observed a number of certificates using RSA with a cryptographically weak key size of 1024 bits: 1.33% of unique authority certificates (observed in 0.11% of the sessions), and 5.61% of unique leaf certificates (observed in 1.23% of the sessions).

Certificate Validity Period Durations Result: Figure 4.2 shows the cumulative distribution function (CDF) of the validity period lengths of observed leaf and authority certificates. Typically, authority certificates have longer validity periods (e.g., 4, 10, and 15 years) than leaf certificates (e.g., 1, 2, or 3 years), but we also identified certificates with validity periods of up to 37 years.
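As an illustration of how such a distribution can be derived, the following sketch computes the validity period length from a certificate’s notBefore and notAfter timestamps and builds an empirical CDF. The function names and the 365.25-day year are assumptions made for this example.

```python
from datetime import datetime


def validity_years(not_before, not_after):
    """Length of the validity period in years (using 365.25-day years)."""
    return (not_after - not_before).total_seconds() / (365.25 * 24 * 3600)


def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            points[-1] = (x, rank / n)  # collapse ties onto the highest rank
        else:
            points.append((x, rank / n))
    return points
```

Computing the CDF separately for leaf and authority certificates yields the two curves that are compared in a figure of this kind.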

4.4.2 Path-Length Constraints

An authority certificate can specify how many further signing certificates may appear below it in a certificate chain by specifying a path length constraint on certificates that it signs [9].


Figure 4.2: Cumulative distribution function (CDF) of certificate validity period lengths.

This is useful for preventing an intermediate authority from further delegating the ability to issue certificates. The maximum path length of a certificate is decremented for each non-self-issued certificate in the path. It limits the length of a potential certificate chain and the trust delegation that is possible. Specifying a path length provides an extra measure of protection from misuse and helps mitigate mistakes such as issuing authority certificates instead of leaf certificates. A mistake of this kind happened in 2013 [7] and allowed fraudulently issued certificates for Google domains. Despite this extra measure of protection, we observed that 26.7% of all authority certificates did not specify any path length constraint.
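The bookkeeping behind a path length constraint can be sketched as follows. The sketch follows the decrement-and-tighten logic of the basic path validation procedure in RFC 5280, but it is a simplified illustration: the dictionary keys are hypothetical, the trust anchor’s own constraint is not modelled, and none of the other validation steps (signatures, validity dates, name constraints) are performed.

```python
def path_length_ok(ca_chain):
    """Check pathLenConstraint along a chain of CA certificates.

    `ca_chain` lists the CA certificates from the one directly below the
    trust anchor down to the issuer of the leaf. Each entry is a dict with
    the hypothetical keys:
      "path_len":    int constraint, or None if the certificate has none
      "self_issued": True if issuer and subject names are identical
    """
    # Start with a budget that an unconstrained chain can never exhaust.
    max_path_length = len(ca_chain)
    for cert in ca_chain:
        if not cert["self_issued"]:
            if max_path_length <= 0:
                return False  # a constraint above forbids this CA certificate
            max_path_length -= 1
        # A certificate may tighten, but never loosen, the remaining budget.
        if cert["path_len"] is not None and cert["path_len"] < max_path_length:
            max_path_length = cert["path_len"]
    return True
```

For example, an intermediate with a constraint of 0 may still issue leaf certificates, but any further CA certificate below it makes the chain invalid.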

4.4.3 Wildcard and Multi-domain Certificates

Wildcard certificates are certificates that are valid for all subdomains of a domain. For example, a certificate valid for *.domain.com is valid for both subone.domain.com and subtwo.domain.com. While the wildcard feature is convenient for administrators, it also poses the risk of validating rogue or buggy hosts [27]. Another feature that allows multiple domain names to be specified for a certificate is the Subject Alternative Name (SAN) extension. While not inherently a security concern, having one single certificate for multiple different domains increases the attack surface, since the private key for a certificate must be stored on each physical machine. For these reasons we investigated the usage of wildcards and multiple domain names in our dataset. Furthermore, we looked at domain name validation of observed sessions.


Figure 4.3: Complementary cumulative distribution function (CCDF) of the number of domain names per certificate.

Wildcard and Multi-Domain Results: In our dataset, 35.68% of all unique certificates used the wildcard feature, and these certificates were observed in 71.62% of the sessions.

We also observed significant usage of certificates valid for multiple domains. Figure 4.3 shows the complementary cumulative distribution function (CCDF) of the number of domains per certificate. The data shows that 67.44% of all unique certificates are valid for at least 2 domains, 18.33% for at least 3 domains, 9.16% for at least 10 domains, 5.85% for at least 25 domains, 2.02% for at least 50 domains, and 0.31% for at least 100 domains. Of all sessions, 42.27% used a certificate with at least 2 domains, 28.56% with at least 3 domains, 19.03% with at least 10 domains, 11.32% with at least 25 domains, 8.98% with at least 50 domains, and 0.31% with at least 100 domains. The certificate with the highest number of domains that we observed is valid for 866 domains.
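The "at least k domains" figures are points on a CCDF, evaluated either per unique certificate or weighted by the number of sessions in which each certificate appeared. A minimal sketch under these assumptions (the function name and input layout are illustrative):

```python
def ccdf_at(domain_counts, thresholds, weights=None):
    """Fraction of observations with at least t domains, for each threshold t.

    `domain_counts[i]` is the number of domain names in certificate i.
    With `weights` omitted every certificate counts once (unique share);
    passing per-certificate session counts gives the session-weighted share.
    """
    if weights is None:
        weights = [1] * len(domain_counts)
    total = sum(weights)
    if total == 0:
        raise ValueError("no observations")
    return {
        t: sum(w for c, w in zip(domain_counts, weights) if c >= t) / total
        for t in thresholds
    }
```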

Domain Matches Results: In every session the domain name of the server is validated by checking it against the domain name(s) specified in the subject field of a certificate. To get insight into how strict or lenient the rules for this process are in practice, we made our own comparison for all certificates observed in sessions that used the Server Name Indication (SNI) extension. SNI is an extension to the TLS protocol by which the client indicates which server name it is attempting to connect to. The results are presented in Table 4.11. For each session we checked what type of domain name match occurred, using the following match types:

• Exact Match: The server domain name and the certificate domain name are an exact match. (e.g., Server: domain.com is equal to Certificate: domain.com)


Table 4.11: Domain name validation.

| Domain Name Match | Observed Share | Observed Mobile | Observed Stationary |
|---|---|---|---|
| Wildcard Match | 97,031,242 (41.71%) | 20,876,537 (44.50%) | 46,620,034 (42.56%) |
| Not Enough Data | 86,708,210 (37.27%) | 16,681,076 (35.56%) | 39,690,223 (36.23%) |
| Exact Match | 48,178,629 (20.71%) | 9,234,995 (19.69%) | 22,936,786 (20.94%) |
| No Match | 620,629 (0.27%) | 105,741 (0.23%) | 255,433 (0.23%) |
| WWW Mismatch | 77,785 (0.03%) | 10,455 (0.02%) | 39,953 (0.04%) |
| Relaxed Wildcard Match | 23,694 (0.01%) | 4,829 (0.01%) | 7,419 (0.01%) |

Table 4.12: Certificate validation.

| Name | Total Share | Mobile Share | Stationary Share |
|---|---|---|---|
| Certificate Valid | 146,683,813 (94.80%) | 31,691,187 (96.39%) | 71,040,654 (95.44%) |
| Certificate Invalid | 6,502,478 (4.20%) | 814,909 (2.48%) | 2,738,200 (3.68%) |
| Self-signed Certificate in Chain | 1,035,914 (0.67%) | 252,721 (0.77%) | 430,463 (0.58%) |
| Self-signed Certificate | 479,025 (0.31%) | 113,572 (0.35%) | 208,359 (0.28%) |
| Certificate has Expired | 21,819 (0.014%) | 3,903 (0.012%) | 12,089 (0.016%) |
| Certificate is not yet Valid | 4,537 (0.003%) | 1,270 (0.004%) | 2,504 (0.003%) |

• No Match: The server domain name and the certificate domain name do not match.

• Wildcard Match: The server domain name matches a wildcard certificate. (e.g., Server: sub.domain.com is equal to Certificate: *.domain.com)

• WWW Mismatch: The server domain name uses the "www" prefix while the certificate doesn’t, or vice versa. (e.g., Server: www.domain.com is equal to Certificate: domain.com or Server: domain.com is equal to Certificate: www.domain.com)

• Relaxed Wildcard Match: The same as Wildcard match but the * matches multiple sub domains. (e.g., Server: sub.sub.domain.com is equal to Certificate: *.domain.com)

• Not Enough Data: Not enough data to compare the server and certificate domain names.
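The match types listed above can be expressed as a small classifier. This is a sketch of one possible reading of the definitions, not the comparison code used for Table 4.11: it compares the server name against a single certificate name, whereas a certificate may carry several names, and the function name and normalization steps are assumptions.

```python
def classify_match(server_name, cert_name):
    """Classify how a server name relates to one certificate domain name."""
    if not server_name or not cert_name:
        return "Not Enough Data"
    server = server_name.lower().rstrip(".")
    cert = cert_name.lower().rstrip(".")
    if server == cert:
        return "Exact Match"
    if cert.startswith("*."):
        suffix = cert[1:]  # e.g. ".domain.com"
        if server.endswith(suffix) and len(server) > len(suffix):
            labels = server[: -len(suffix)]  # part covered by the "*"
            if "." in labels:
                return "Relaxed Wildcard Match"  # "*" spans several labels
            return "Wildcard Match"
        return "No Match"
    if server == "www." + cert or cert == "www." + server:
        return "WWW Mismatch"
    return "No Match"
```

For a certificate with several names, one option is to classify the server name against each of them and report the strictest match found.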

Of all observed sessions, only 20.71% were matched exactly: slightly fewer (19.69%) for mobile sessions and slightly more (20.94%) for stationary sessions. The most common match was Wildcard Match with 41.71% of all sessions, 44.50% for mobile, and 42.56% for stationary. Disregarding the sessions lacking enough data, we observed three small shares of No Match, WWW Mismatch, and Relaxed Wildcard Match. We observed No Matches in 0.27% of all sessions, 0.23% of the mobile sessions, and 0.23% of the stationary sessions. The corresponding percentages for WWW Mismatches were 0.03% (all sessions), 0.02% (mobile), and 0.04% (stationary). Finally, the corresponding percentages for Relaxed Wildcard Matches were 0.01% (all sessions), 0.01% (mobile), and 0.01% (stationary).

4.4.4 Certificate Validation

Using the Mozilla root store, we validated the certificate chains of each non-resumption HTTPS session. The results are presented in Table 4.12. While most sessions were validated successfully (94.80%), a non-negligible share (4.20%) were not. Furthermore, about 1% of sessions contained self-signed certificates, and in a few cases the certificate was not within its validity period. A reason for the invalid sessions may be that the Mozilla root store specifically does not trust those certificates, but the results are significant nonetheless.


4.5 Trust in Protocol Version and Cipher Suite Selection

The security of an HTTPS session heavily depends on which TLS/SSL version and cryptographic algorithms are used. These details are negotiated during the TLS handshake, where the browser, taking the role of client, first informs the server about the highest protocol version it supports and sends a list of supported cipher suites ordered by preference. It is then the server’s responsibility to determine the final protocol version and select a cipher suite from this list. In this section we present the results for protocol versions, cipher suites, key exchange ciphers, encryption ciphers, and MACs, as well as measurements of cipher suite list sizes, server cipher suite downgrades, and the list position of the RC4 encryption cipher when it is chosen.
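The negotiation described above can be sketched in a few lines. The sketch is a deliberately simplified model: it assumes that the server picks the highest version it supports that does not exceed the client’s maximum and then selects a cipher suite by its own preference order. Real servers may instead honour the client’s order, and the function and variable names are illustrative.

```python
# Protocol versions ordered from oldest to newest, as named in this chapter.
PROTOCOL_ORDER = ["SSLv2", "SSLv3", "TLSv10", "TLSv11", "TLSv12"]


def negotiate(client_max_version, client_suites, server_versions, server_suites):
    """Return (version, cipher_suite), or None if no common choice exists."""
    client_rank = PROTOCOL_ORDER.index(client_max_version)
    usable = [v for v in server_versions if PROTOCOL_ORDER.index(v) <= client_rank]
    if not usable:
        return None
    version = max(usable, key=PROTOCOL_ORDER.index)
    # Assumption: the server walks its own preference list and takes the
    # first suite that the client also offered.
    for suite in server_suites:
        if suite in client_suites:
            return version, suite
    return None
```

Under this model the outcome depends on both sides: a client offering TLSv12 still ends up on TLSv10 if that is the newest version the server supports.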

4.5.1 TLS/SSL Protocol Version

The choice of protocol version is important since older versions lack or have less support for sufficiently secure ciphers and extensions, and offer fewer countermeasures against known attacks. The oldest protocol versions, SSLv2 and SSLv3, are deprecated due to being flawed and broken. Using the latest protocol version should be the norm for every connection.

With this in mind we analyzed the protocol version usage in our data. We investigated both the distribution of the protocol versions the clients offered and the protocol versions the servers chose for the sessions. Analyzing how these distributions differ gives insight into whether either group is lagging behind in upgrading protocol support. Using our mobile and stationary subsets we also checked whether these differed in a significant way. In this measurement we focused on the standard protocols SSLv2, SSLv3, TLSv10, TLSv11 and TLSv12. Our dataset did reveal the use of a variety of other non-standard protocols, although their shares were very small.

Protocol Version Results: The protocol version results are displayed in Figure 4.4. The graph presents the data as a clustered histogram where each cluster shows the observed shares of each measurement for that protocol. For each protocol we show how frequently the client offered it and how frequently the server chose it, for the whole dataset as well as for the mobile and stationary subsets. The Y-axis uses a log scale for better presentation of the small measurements. A table with the data points for the figure can be found in the Appendix.

The dominant protocol version across all datasets is TLSv12, with 82.72% of the clients offering it and 80.76% of the servers choosing it for their sessions. The mobile subset shows higher shares, with 87.32% of the clients offering it and 83.42% of the servers choosing it. The stationary subset lands in between, with 85.26% of the clients offering it and 81.74% of the servers choosing it.

The second newest protocol version, TLSv11, however, sees comparatively little use across all datasets. In total, only 0.30% of the clients offered it and 0.32% of the servers chose it. The mobile and stationary subsets show 0.28% and 0.25% of the clients offering it, respectively, with 0.31% and 0.28% of the servers choosing it.

The second most used protocol is TLSv10, with 16.13% of the clients offering it and 18.84% of the servers choosing it for their sessions. In contrast to TLSv12, the mobile subset shows lower values: 12.20% of the clients offering it and 16.24% of the servers choosing it. The stationary subset shows 14.02% of the clients offering it and 17.93% of the servers choosing it.

SSLv3 has a relatively meager presence, with just 0.27% of the clients offering it and 0.08% of the servers choosing it. The mobile and stationary subsets both show even smaller shares: 0.03% and 0.05% of the clients offering it, respectively, with 0.03% and 0.05% of the servers choosing it.

Likewise, the even older protocol SSLv2 shows a meager, although slightly larger, share of 0.59% of the clients offering it, but with less than 0.01% of the servers choosing it. The mobile


Figure 4.4: Clustered histogram showing the relation between offered and used protocol versions. Each cluster shows the observed shares for the total number of sessions as well as the mobile and stationary subsets. The Y-axis uses a log scale for better presentation of the small measurements.

and stationary subsets show the same pattern: small shares of 0.02% and 0.43% of the clients offering it, respectively, with less than 0.01% of the servers choosing it.

4.5.2 Cipher Suite Selection

The cipher suite is a named combination of the key exchange, encryption, and Message Authentication Code (MAC) hash algorithms selected in the negotiation during the TLS handshake, as discussed in Section 2.3.3. The selection of a good cipher suite is important because it determines the overall cryptographic strength of a connection. A cipher suite may not be equally strong in all aspects. For example, a cipher suite could have very strong encryption and MAC algorithms, while the key exchange algorithm is flawed. In these situations a flaw in one of the ciphers could be enough to compromise a connection’s security. This is why it is important that each individual cipher included in a cipher suite is secure.

In our study we obtained statistics on both the cipher suites most frequently offered by the clients and the cipher suites that were ultimately selected for the sessions by the servers. We further investigated the number of cipher suites offered by clients in each session and how frequently the servers chose to downgrade by not picking the client’s top candidate. To gain additional insight into the cipher suite selection process, we also made a deeper investigation of the cases when the RC4 cipher, which is considered broken [24], appeared in our dataset.
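The measurements described above can be illustrated with a short sketch that splits a cipher suite name into its components, flags a downgrade in the sense used here (the server not picking the client’s first choice), and reports the list position of a selected RC4 suite. The helper names are hypothetical, and the name parsing only covers suites of the form TLS_<key exchange>_WITH_<encryption>_<MAC>; suites without a trailing hash component are not handled.

```python
def split_cipher_suite(name):
    """Split 'TLS_<kx>_WITH_<enc>_<mac>' into (key_exchange, encryption, mac)."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError(f"unexpected cipher suite name: {name}")
    key_exchange, rest = name[len("TLS_"):].split("_WITH_", 1)
    encryption, mac = rest.rsplit("_", 1)
    return key_exchange, encryption, mac


def is_downgrade(offered, selected):
    """True if the server did not pick the client's most preferred suite."""
    return bool(offered) and selected != offered[0]


def rc4_position(offered, selected):
    """1-based position of a selected RC4 suite in the client's list, else None."""
    if "_RC4_" not in selected or selected not in offered:
        return None
    return offered.index(selected) + 1
```

A selected RC4 suite that sits far down the client’s list indicates that the server preferred RC4 even though the client ranked other suites higher.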


Figure 4.5: Top-15 selected encryption ciphers by the server in the cipher suite selection pro- cess. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.

Server Selected Cipher Suite Results: The server selected cipher suite results are displayed in Figure 4.5. The graph presents the top-15 selected cipher suites as a horizontal clustered histogram where each cluster shows the result for the total dataset as well as four different subsets: mobile client sessions, stationary client sessions, sessions using protocol version TLSv10, and sessions using protocol version TLSv12. A table with data points for the figure can be found in the Appendix.

The most prevalent cipher suite, used in 25.17% of all sessions, is TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256. This cipher suite uses ECDHE+RSA for key exchange and authentication, AES-128 in GCM mode for encryption, and SHA-256 for MAC. The mobile subset share is slightly lower (25.03%), while the stationary share is higher (27.16%). The share in the subset of TLSv10 sessions is minimal (0.00%) because AES in GCM mode is not supported by that protocol version. In contrast, TLSv12 has the highest share (31.17%).

The second largest share, 14.77%, belongs to more or less the same cipher suite, with the only difference being the key exchange algorithm ECDHE_ECDSA. This is for certificates that use an ECDSA key instead of an RSA key. This cipher suite is used slightly more frequently (15.16%) on mobile devices and less frequently (14.22%) on stationary devices. For TLSv10 the case is the same, and TLSv12 is still the highest category with an 18.29% share.

The third most prevalent cipher suite is TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 with a 10.00% total share, a 10.48% share for mobile sessions, a 9.42% share for stationary sessions, again a minimal 0.02% for TLSv10 sessions, and a 12.38% share for TLSv12 sessions.

Looking at cipher suite usage for TLSv10, the most prevalent is TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA with a 30.05% share, which has a share of 8.43% across all sessions (6.33% for mobile, and 8.10% for stationary sessions), and 3.33% for TLSv12.

This is followed by TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA with a 25.92% TLSv10 share, a total share of 8.91% of the sessions (9.99% mobile and 8.62% stationary), and a 4.86% share for TLSv12.


Figure 4.6: Top-15 offered encryption ciphers by the client in the cipher suite selection process. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.

The third largest share for TLSv10 is TLS_RSA_WITH_RC4_128_MD5 with a 9.50% share, a total share of 5.89% (7.43% mobile and 5.38% stationary), and a 3.33% share for TLSv12. The remaining cipher suites in the top-15 all have shares below 5%.
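The way a cipher suite name breaks down into key exchange, encryption, and MAC parts, as described for TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, can be sketched in a few lines of Python. This is a minimal illustration and not the tooling used for the measurements; the function name split_cipher_suite is hypothetical, and the sketch only handles names of the form TLS_<key exchange>_WITH_<cipher>_<MAC>. It follows the naming used in this chapter, where the trailing hash is reported as the MAC.

```python
def split_cipher_suite(name: str) -> dict:
    """Split a name of the form TLS_<kx>_WITH_<cipher>_<mac> into its parts.

    Hypothetical helper for illustration; signalling values such as
    TLS_EMPTY_RENEGOTIATION_INFO_SCSV do not fit this form and are rejected.
    """
    prefix, sep, rest = name.partition("_WITH_")
    if not prefix.startswith("TLS_") or not sep or "_" not in rest:
        raise ValueError(f"unsupported cipher suite name: {name}")
    # The last underscore-separated token is treated as the MAC.
    encryption, _, mac = rest.rpartition("_")
    return {
        "key_exchange": prefix[len("TLS_"):],
        "encryption": encryption,
        "mac": mac,
    }


parts = split_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
print(parts)
```

The same split applied to TLS_RSA_WITH_RC4_128_MD5 yields RSA, RC4_128, and MD5, which matches the grouping used in the key exchange, encryption, and MAC tables of this section.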

Client Offered Cipher Suites Results: The client-offered cipher suite results are displayed in Figure 4.6. The graph presents the top-15 offered cipher suites as a horizontal clustered histogram where each cluster shows the results for the total dataset as well as four different subsets: mobile client sessions, stationary client sessions, sessions using protocol version TLSv10, and sessions using protocol version TLSv12. Each subset of client-offered cipher suites is compared to the corresponding subset of used cipher suites. The values for the TLSv12 subset can therefore exceed 100%, because clients offer TLSv12 more often than servers choose to use it. A table with data points for the figure can be found in the Appendix. Both TLS_RSA_WITH_AES_128_CBC_SHA and TLS_RSA_WITH_AES_256_CBC_SHA, which differ only in the encryption cipher (AES_128_CBC versus AES_256_CBC), are offered in close to 100% of all sessions, with 99.97% and 99.82% respectively for the total dataset. The mobile and stationary subsets show similar values of 99.20% and 99.65% as well as 98.87% and 99.35% respectively. The TLSv10 subset is significantly lower at 78.46% and 77.30% respectively, while the TLSv12 subset is higher at 104.56% and 104.54%. The third and fourth most frequently offered cipher suites have the same encryption and MAC ciphers as the first two but a different key exchange, TLS_ECDHE_RSA. In total this pair of cipher suites is offered in 98.62% and 98.24% of all sessions respectively (98.49% and 97.73% for mobile sessions, and 98.47% and 98.22% for stationary sessions). In the TLSv10 subset the frequency is significantly lower at 72.90% and 72.61% respectively, while in the TLSv12 subset it is higher at 104.59% and 104.20%.


Similarly, the fifth and sixth most frequently offered cipher suites have the same encryption and MAC ciphers as the first two pairs but a different key exchange, TLS_ECDHE_ECDSA. In total this pair of cipher suites is offered in 97.33% and 97.27% of all sessions respectively (97.81% and 97.79% for mobile sessions, and 97.51% and 97.45% for stationary sessions). In the subset of TLSv10 sessions these cipher suites are again offered with significantly lower frequencies of 67.57% and 67.30% respectively, while in the TLSv12 subset the frequency is higher at 104.24% and 104.23%. TLS_RSA_WITH_3DES_EDE_CBC_SHA is offered in 91.56% of all sessions, with a slightly lower share of 86.84% for mobile sessions and a higher share of 92.26% for stationary sessions. In the subset of TLSv10 sessions the frequency is again much lower at 71.61%, and in TLSv12 sessions it is higher at 95.44%. The eighth and ninth most frequently offered cipher suites, TLS_DHE_RSA_WITH_AES_128_CBC_SHA and TLS_DHE_RSA_WITH_AES_256_CBC_SHA, are again a pair where only the encryption cipher differs (AES_128_CBC versus AES_256_CBC). In total this pair of cipher suites is offered in 70.64% and 70.38% of all sessions respectively (71.58% and 71.32% of mobile sessions, and 71.12% and 70.89% of stationary sessions). In the subset of TLSv10 sessions the frequency is significantly lower at only 48.25% and 47.24% respectively, while the frequency for TLSv12 sessions is again higher at 75.18% and 75.16%. TLS_EMPTY_RENEGOTIATION_INFO_SCSV is the tenth most frequently offered cipher suite. It does not specify any ciphers, but is part of the TLS Renegotiation Indication Extension [26]. It is offered in 69.08% of all sessions, with a significantly higher frequency (81.75%) in mobile sessions and a lower one (64.44%) in stationary sessions, and in 44.86% of TLSv10 sessions and 74.14% of TLSv12 sessions.
The eleventh, twelfth, and fourteenth most frequently offered cipher suites, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, and TLS_RSA_WITH_AES_128_GCM_SHA256, share the same encryption and MAC ciphers, AES_128_GCM_SHA256, but have different key exchanges: TLS_ECDHE_ECDSA, TLS_ECDHE_RSA, and TLS_RSA. This group of cipher suites is offered in 68.89%, 62.62%, and 61.50% of all sessions respectively. The mobile subset differs very little, with shares of 67.34%, 64.10%, and 63.61% respectively. The same holds for the stationary subset, with shares of 70.76%, 63.74%, and 61.37%. In TLSv10 sessions these cipher suites are unsurprisingly offered in only 5.25%, 5.25%, and 5.16% of the sessions, since AES in GCM mode is not supported in TLSv10. In contrast, the TLSv12 subset shows much higher frequencies of 84.07%, 76.30%, and 74.94%. The last two cipher suites in the top-15 are the thirteenth, TLS_RSA_WITH_RC4_128_SHA, and the fifteenth, TLS_RSA_WITH_RC4_128_MD5. These have the same key exchange and encryption ciphers, TLS_RSA_WITH_RC4_128, but differ in the MAC (SHA versus MD5). In total this pair of cipher suites is offered in 61.62% and 54.59% of all sessions respectively. In the mobile subset the frequencies are higher at 69.49% and 58.71% respectively, while in the stationary subset the frequencies are lower at 57.53% and 51.35% respectively. These cipher suites are most frequently offered in the TLSv10 subset, at 71.85% and 70.00% respectively, and less frequently in the TLSv12 subset, at 58.39% and 50.12% respectively.

4.5.3 Key Exchange Algorithms Results

Server Selected Key Exchange Algorithms Results: A breakdown of all key exchange algorithms used in our dataset is presented in Table 4.13. Overall, the most common key exchange, with a 62.11% total share, is ECDHE_RSA, where ECDHE is the key exchange algorithm and RSA is used for authentication. The mobile, stationary, and TLSv12 subsets do not deviate significantly, with shares of 63.36%, 62.35%, and 63.37% respectively. The TLSv10 subset share is a few percentage points lower at 57.13%.


Table 4.13: Key exchange algorithms used.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   TLS_ECDHE_RSA           62.11%      63.36%      62.35%      57.13%      63.37%
2   TLS_ECDHE_ECDSA         20.32%      20.77%      19.79%      8.29%       23.11%
3   TLS_RSA                 16.31%      14.72%      16.62%      29.61%      13.15%
4   TLS_DHE_RSA             1.25%       1.15%       1.23%       4.97%       0.37%
5   TLS_DHE_DSS             0.0016%     0.0026%     0.0014%     0.0023%     0.0014%
6   TLS_DH_ANON             0.00037%    0.00013%    0.000055%   0.0020%     0.0000005%
7   TLS_RSA_EXPORT          0.00019%    0.000038%   0.000079%   0.00018%    < 10^-7%
8   TLS_ECDH_ANON           0.000064%   0.000011%   0.0000073%  0.000054%   0.000067%
9   TLS_DHE_RSA_EXPORT      0.000012%   < 10^-7%    < 10^-7%    0.000058%   < 10^-7%
10  SSLv20_CK_RC2_128_CBC   0.0000040%  < 10^-7%    < 10^-7%    < 10^-7%    < 10^-7%
11  TLS_PSK                 0.0000009%  0.0000021%  0.0000009%  < 10^-7%    < 10^-7%
12  TLS_ECDH_ECDSA          0.0000009%  < 10^-7%    0.0000009%  < 10^-7%    0.0000011%
13  SSLv20_CK_RC4_128       0.0000009%  < 10^-7%    0.0000018%  < 10^-7%    < 10^-7%
14  TLS_NULL                0.0000004%  < 10^-7%    < 10^-7%    < 10^-7%    < 10^-7%
15  SSL_RSA_FIPS            0.0000004%  < 10^-7%    < 10^-7%    0.0000023%  < 10^-7%

Table 4.14: Top-10 key exchange algorithms offered.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   TLS_RSA                 626.55%     651.51%     607.03%     442.41%     665.26%
2   TLS_ECDHE_ECDSA         515.91%     560.64%     502.51%     249.63%     578.79%
3   TLS_ECDHE_RSA           506.27%     555.10%     490.55%     265.63%     563.13%
4   TLS_DHE_RSA             334.38%     343.65%     329.29%     194.05%     364.56%
5   TLS_ECDH_ECDSA          169.97%     205.84%     151.89%     145.56%     175.21%
6   TLS_ECDH_RSA            169.96%     205.85%     151.90%     145.64%     175.19%
7   TLS_DHE_DSS             160.96%     137.96%     154.37%     188.88%     151.02%
8   TLS_DH_RSA              29.43%      37.75%      24.29%      0.41%       36.16%
9   TLS_DH_DSS              29.43%      37.75%      24.29%      0.41%       36.16%
10  TLS_RSA_EXPORT          9.50%       9.71%       8.29%       36.20%      1.71%

The second most common key exchange, with a 20.32% total share, is ECDHE_ECDSA. The results are again very similar for mobile sessions, with a 20.77% share, and stationary sessions, with a 19.79% share. In the TLSv10 subset the share is lower at 8.29%, while in the TLSv12 subset the share is higher at 23.11%. The first two key exchanges make up the majority, more than 82%, of the whole dataset, and the third, RSA with a 16.31% total share, makes up most of the remainder. A cipher suite that names only RSA uses RSA for both key exchange and authentication. In the TLSv10 subset the RSA share spikes to 29.61%, which makes up for the lower shares of the top two. In the stationary subset the result is very similar to the total with a 16.62% share, while in the mobile subset the share is lower at 14.72%. In TLSv12 sessions the share is even lower at 13.15%. The last key exchange with a share above 1% is DHE_RSA with a 1.25% total share, a 1.15% mobile share, a 1.23% stationary share, a 4.97% TLSv10 share, and a 0.37% TLSv12 share.

Client Offered Key Exchange Algorithms Results: A breakdown of the top-10 offered key exchange algorithms is presented in Table 4.14. A table with the complete set of offered algorithms can be found in the Appendix (Table A.4). Each measurement is compared with the corresponding subset in the used category. Furthermore, each occurrence of a key exchange algorithm in the list of offered cipher suites is counted. Consequently, a value of 200% means that the algorithm is offered on average two times per observed session.
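The counting rule above can be illustrated with a short Python sketch. It assumes that the denominator is the number of sessions in the corresponding used subset, which is one reading of the comparison described above; the function name offered_frequency and the toy input are hypothetical and are not taken from the measurement pipeline.

```python
from collections import Counter


def offered_frequency(offered_lists, used_sessions):
    """Return the offered frequency in percent for each algorithm.

    offered_lists: one list per session, holding the key exchange algorithm
                   of every cipher suite the client offered (with repeats).
    used_sessions: number of sessions in the corresponding used subset
                   (assumed denominator).
    """
    counts = Counter()
    for offered in offered_lists:
        counts.update(offered)  # every occurrence is counted
    return {alg: 100.0 * n / used_sessions for alg, n in counts.items()}


# Two toy sessions: TLS_RSA appears in four offered suites in total.
sessions = [
    ["TLS_RSA", "TLS_RSA", "TLS_RSA", "TLS_ECDHE_RSA"],
    ["TLS_RSA", "TLS_ECDHE_RSA"],
]
freq = offered_frequency(sessions, used_sessions=len(sessions))
print(freq)
```

In this toy input TLS_RSA occurs four times over two sessions, i.e., on average twice per session, which corresponds to the 200% case described above.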


The top-3 most frequently offered key exchange algorithms are each offered on average in more than five cipher suites per session. The most frequently offered key exchange algorithm is plain TLS_RSA, which is offered more than six times per session, at a frequency of 626.55% for the total dataset. It is offered more frequently (651.51%) in mobile sessions and less frequently (607.03%) in stationary sessions, and even less frequently (442.41%) in TLSv10 sessions but more frequently (665.26%) in TLSv12 sessions. The second most frequently offered key exchange algorithm is TLS_ECDHE_ECDSA, which is offered more than five times per session, at a frequency of 515.91% for the total dataset. This algorithm appears more frequently (560.64%) in mobile sessions and less frequently (502.51%) in stationary sessions. It appears significantly less frequently (249.63%) in TLSv10 sessions and more frequently (578.79%) in TLSv12 sessions. The third most frequently offered key exchange algorithm is TLS_ECDHE_RSA, which is offered more than five times per session, at a frequency of 506.27% for the total dataset. This algorithm appears more frequently (555.10%) in mobile sessions and less frequently (490.55%) in stationary sessions. It further appears significantly less frequently (265.63%) in TLSv10 sessions and more frequently (563.13%) in TLSv12 sessions. After a drop-off from the top-3, the fourth most frequently offered key exchange algorithm, TLS_DHE_RSA, is offered more than three times per session, at a frequency of 334.38% for the total dataset. The mobile and stationary subsets diverge only slightly, with frequencies of 343.65% and 329.29% respectively. The frequency is significantly lower (194.05%) for TLSv10 sessions and slightly higher (364.56%) for TLSv12 sessions. The drop-off continues with the fifth through seventh most frequently offered key exchange algorithms. The fifth, TLS_ECDH_ECDSA, has a frequency of 169.97% for the total dataset. This algorithm sees a higher frequency (205.84%) for mobile sessions and a slightly lower one (151.89%) for stationary sessions. In the TLSv10 subset the frequency is 145.56% and in the TLSv12 subset it is 175.21%. TLS_ECDH_RSA is offered more or less in tandem with TLS_ECDH_ECDSA, at a frequency of 169.96% for the total dataset, 205.85% for mobile, 151.90% for stationary, 145.64% for TLSv10, and 175.19% for TLSv12. The seventh most frequently offered key exchange algorithm, TLS_DHE_DSS, is close to TLS_ECDH_ECDSA in total frequency, at 160.96% for the total dataset. This algorithm is significantly less frequent (137.96%) in mobile sessions and slightly less frequent (154.37%) in stationary sessions. It is more frequent (188.88%) in TLSv10 sessions and less frequent (151.02%) in TLSv12 sessions. TLS_DH_RSA and TLS_DH_DSS are offered in tandem, with the exact same frequency of 29.43% for the total dataset. They were observed at a higher frequency (37.75%) in mobile sessions and a lower one (24.29%) in stationary sessions, and at a frequency of only 0.41% in TLSv10 sessions compared to 36.16% in TLSv12 sessions. The tenth most frequently offered algorithm is TLS_RSA_EXPORT, an export-grade cipher. It is presented in more detail in the next paragraph.

Client Offered Export Grade Key Exchange Algorithms Results: Table 4.15 shows a breakdown of export-grade key exchange algorithms in offered cipher suites. The tenth most frequently offered key exchange algorithm overall, and the most frequently offered export-grade one, is TLS_RSA_EXPORT, which is offered at a frequency of 9.50% for the total dataset (9.71% for mobile and 8.29% for stationary). In the subset of TLSv10 sessions the frequency is significantly higher at 36.20%, while in the TLSv12 subset it is much lower at only 1.71%. TLS_DHE_RSA_EXPORT and TLS_DHE_DSS_EXPORT are offered more or less in tandem as the eleventh and twelfth most frequently offered key exchange algorithms, with frequencies of 3.76% and 3.75% respectively. In the mobile subset the frequency is slightly higher at 4.22% and 4.21%, while in the stationary subset it is lower at 3.25%. In the subset of TLSv10 sessions the frequency is considerably higher at 15.17% and 15.13%, while in the TLSv12 subset it is again much lower at 0.64% for both. Other export-grade key exchange algorithms with smaller offered frequencies include TLS_RSA_EXPORT1024 at 0.39%, TLS_DHE_DSS_EXPORT1024 at 0.14%, and TLS_DH_ANON_EXPORT at 0.12%. The first two are mostly offered in the stationary and TLSv10 subsets, while TLS_DH_ANON_EXPORT is offered more often in TLSv12 sessions (0.14%) than in TLSv10 sessions (0.0042%). A very small number of sessions also had the client offer TLS_DH_RSA_EXPORT at a frequency of 0.0068%, TLS_DH_DSS_EXPORT at 0.0067%, and TLS_KRB5_EXPORT at 0.00095%.

Table 4.15: Export-grade key exchange algorithms offered.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
10  TLS_RSA_EXPORT          9.50%       9.71%       8.29%       36.20%      1.71%
11  TLS_DHE_RSA_EXPORT      3.76%       4.22%       3.25%       15.17%      0.64%
12  TLS_DHE_DSS_EXPORT      3.75%       4.21%       3.25%       15.13%      0.64%
22  TLS_RSA_EXPORT1024      0.39%       0.095%      0.28%       1.46%       0.0000011%
27  TLS_DHE_DSS_EXPORT1024  0.14%       0.038%      0.13%       0.69%       < 10^-7%
28  TLS_DH_ANON_EXPORT      0.12%       0.013%      0.045%      0.0042%     0.14%
33  TLS_DH_RSA_EXPORT       0.0068%     0.0090%     0.0055%     0.000077%   0.0082%
34  TLS_DH_DSS_EXPORT       0.0067%     0.0089%     0.0054%     0.000087%   0.0082%
38  TLS_KRB5_EXPORT         0.00095%    < 10^-7%    0.0010%     0.0038%     0.00025%

Table 4.16: Encryption algorithms used.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   AES_128_GCM             40.67%      40.76%      42.23%      < 10^-7%    50.36%
2   AES_256_CBC             23.65%      20.57%      23.02%      40.29%      19.74%
3   AES_128_CBC             20.16%      20.84%      20.05%      44.45%      14.33%
4   RC4_128                 7.64%       9.28%       6.92%       13.58%      6.21%
5   AES_256_GCM             4.76%       6.01%       4.41%       < 10^-7%    5.90%
6   CHACHA20_POLY1305       2.48%       2.19%       2.66%       < 10^-7%    3.08%
7   3DES_EDE_CBC            0.63%       0.34%       0.69%       1.64%       0.38%
8   CAMELLIA_256_CBC        0.0041%     0.0013%     0.0059%     0.020%      0.000035%
9   NULL                    0.0018%     0.00043%    0.0027%     0.0097%     0.0000005%
10  IDEA_CBC                0.00069%    0.000015%   0.0011%     0.0036%     < 10^-7%
11  DES_CBC                 0.00047%    0.00061%    0.00039%    0.0024%     0.000015%
12  RC4_40                  0.00019%    0.000038%   0.000079%   0.00017%    < 10^-7%
13  CAMELLIA_128_CBC        0.00013%    0.00023%    0.00013%    0.00067%    0.0000005%
14  DES40_CBC               0.000014%   < 10^-7%    < 10^-7%    0.000063%   < 10^-7%
15  RC2_CBC_40              0.0000018%  < 10^-7%    < 10^-7%    0.0000047%  < 10^-7%
16  SEED_CBC                0.0000013%  < 10^-7%    0.0000018%  0.0000023%  0.0000011%

4.5.4 Encryption Cipher Results

Server Selected Encryption Cipher Results: The encryption ciphers used in observed sessions are presented in Table 4.16. The majority of sessions use AES-based encryption algorithms. The most prevalent is AES_128_GCM with a 40.67% total share. The mobile share of 40.76% and stationary share of 42.23% are very similar, while the TLSv12 subset share of 50.36% is significantly higher. In the TLSv10 subset the share is 0% because AES in GCM mode is not supported. The second and third most prevalent ciphers are AES_256_CBC with a share of 23.65% and AES_128_CBC with a share of 20.16%. The stronger variant AES_256_CBC is used in fewer (20.57%) mobile sessions and a similar share (23.02%) of stationary sessions, while for AES_128_CBC both subsets are similar, with a mobile share of 20.84% and a stationary share of 20.05%. AES in CBC mode is used for the majority of TLSv10 sessions: 40.29% for AES_256_CBC and 44.45% for AES_128_CBC. Since most TLSv12 sessions use AES in GCM mode, the TLSv12 subset shares are lower: 19.74% for AES_256_CBC and 14.33% for AES_128_CBC. AES_256_GCM is the fifth most used cipher with a share of 4.76%; the share is higher (6.01%) for mobile sessions and lower (4.41%) for stationary sessions, again 0% for TLSv10 sessions, and higher at 5.90% for TLSv12 sessions. The fourth most prevalent encryption cipher is RC4_128 with a 7.64% total share. In the mobile subset the share is higher at 9.28%, while in the stationary subset it is lower at 6.92%. It is used significantly more in TLSv10 sessions (13.58%) than in TLSv12 sessions (6.21%). CHACHA20_POLY1305 is used in 2.48% of all sessions, 2.19% of mobile sessions, 2.66% of stationary sessions, again 0% of TLSv10 sessions, and 3.08% of TLSv12 sessions. 3DES_EDE_CBC is used in 0.63% of all sessions, 0.34% of mobile sessions, 0.69% of stationary sessions, 1.64% of TLSv10 sessions, and 0.38% of TLSv12 sessions.

Client Offered Encryption Cipher Results: The encryption ciphers offered by clients in our dataset are presented in Table 4.17. In total, the most frequently offered encryption ciphers are AES_128_CBC with a frequency of 710.27% and AES_256_CBC with a frequency of 706.49%. These are more frequently offered in mobile sessions, with frequencies of 759.29% and 755.53% respectively, and similarly in TLSv12 sessions, with frequencies of 766.71% and 763.54% respectively. In the stationary subset the frequencies are lower at 682.08% and 678.30%. The TLSv10 sessions have even lower frequencies of 460.78% and 455.59%. 3DES_EDE_CBC is the third most frequently offered encryption cipher with a frequency of 302.11%. The frequency is again higher (334.26%) for mobile sessions and lower (282.55%) for stationary sessions. The TLSv10 and TLSv12 subsets do not vary significantly from the total dataset, with frequencies of 306.02% and 298.02% respectively. The fourth most frequently offered encryption cipher is RC4_128 with a frequency of 273.95%, i.e., on average as part of 2.74 cipher suites per session. The frequency is higher (329.22%) for mobile sessions and a little lower (251.52%) for stationary sessions. Again, the TLSv10 and TLSv12 subsets do not vary significantly from the total dataset, with frequencies of 288.89% and 267.02% respectively. AES in GCM mode, which unlike AES in CBC mode is not supported in TLSv10, accounts for the fifth and sixth most frequently offered encryption ciphers. In total AES_128_GCM is offered with a frequency of 265.66%, with small deviations for mobile and stationary sessions (267.32% and 265.99%, respectively). In the TLSv10 subset the frequency is only 18.43%, while in the TLSv12 subset it is, unsurprisingly, higher at 324.58%. The more secure variant AES_256_GCM is offered with a frequency of 154.74%. It is slightly higher (180.69%) for mobile sessions and lower (140.01%) for stationary sessions. A frequency very similar to that of AES_128_GCM (18.02%) was observed for TLSv10 sessions, and a higher one (187.32%) for TLSv12 sessions. CHACHA20_POLY1305 is offered with a frequency of 65.06% for the total dataset, 52.68% for mobile and 73.19% for stationary sessions. The algorithm was offered with a minimal frequency (0.076%) in TLSv10 sessions and a larger one (80.54%) in TLSv12 sessions. It is followed by DES_CBC with a frequency of 30.75% (37.02% for mobile and 25.64% for stationary sessions). The frequency is significantly higher (50.58%) for TLSv10 sessions and lower (24.56%) for TLSv12 sessions. Another variant of DES is DES40_CBC, which is the eleventh most frequently offered encryption cipher with a frequency of 11.44% for the total dataset (12.76% for mobile sessions and 9.86% for stationary sessions). The frequency is significantly higher (45.50%) for TLSv10 sessions and lower (2.01%) for TLSv12 sessions. Yet another variant, DES_CBC_40, is offered with a frequency of only 0.00043% and is the 26th most frequently offered.


Table 4.17: Encryption algorithms offered.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   AES_128_CBC             710.27%     759.29%     682.08%     460.78%     766.71%
2   AES_256_CBC             706.49%     755.53%     678.30%     455.59%     763.54%
3   3DES_EDE_CBC            302.11%     334.26%     282.55%     306.02%     298.02%
4   RC4_128                 273.95%     329.22%     251.52%     288.89%     267.02%
5   AES_128_GCM             265.66%     267.32%     265.99%     18.43%      324.58%
6   AES_256_GCM             154.74%     180.69%     140.01%     18.02%      187.32%
7   CHACHA20_POLY1305       65.06%      52.68%      73.19%      0.076%      80.54%
8   DES_CBC                 30.75%      37.02%      25.64%      50.58%      24.56%
9   CAMELLIA_256_CBC        13.18%      8.07%       13.30%      14.31%      12.01%
10  CAMELLIA_128_CBC        13.18%      8.07%       13.30%      14.31%      12.01%
11  DES40_CBC               11.44%      12.76%      9.86%       45.50%      2.01%
12  SEED_CBC                8.47%       5.41%       6.92%       16.95%      6.11%
13  RC4_40                  4.08%       4.33%       3.46%       15.80%      0.72%
14  NULL                    2.26%       0.90%       0.49%       8.67%       0.76%
15  CAMELLIA_256_GCM        2.11%       1.44%       3.24%       0.00025%    2.62%
16  CAMELLIA_128_GCM        2.11%       1.44%       3.24%       0.00026%    2.62%
17  IDEA_CBC                1.77%       0.93%       1.55%       2.38%       1.54%
18  RC2_CBC                 1.65%       1.10%       1.53%       5.28%       0.42%
19  RC2_CBC_40              1.63%       1.08%       1.52%       5.21%       0.42%
20  RC4_56                  0.21%       0.061%      0.15%       0.78%       0.0000011%
21  RC2_CBC_56              0.018%      0.018%      0.015%      0.070%      < 10^-7%
22  AES_256_CCM             0.014%      0.011%      0.0071%     0.00017%    0.017%
23  AES_128_CCM             0.014%      0.011%      0.0071%     0.00016%    0.017%
24  AES_256_CCM_8           0.0037%     0.0051%     0.0027%     0.000091%   0.0046%
25  AES_128_CCM_8           0.0037%     0.0051%     0.0027%     0.000080%   0.0046%
26  DES_CBC_40              0.00043%    < 10^-7%    0.00051%    0.0019%     0.000083%
27  28147_CNT               0.00032%    0.00039%    0.00029%    0.00014%    0.00035%
28  ARIA_128_CBC            0.00014%    < 10^-7%    < 10^-7%    0.00029%    0.000025%
29  ARIA_256_CBC            0.00013%    < 10^-7%    < 10^-7%    0.00028%    0.000024%
30  ARIA_256_GCM            0.00012%    < 10^-7%    < 10^-7%    0.00027%    0.000025%
31  ARIA_128_GCM            0.00012%    < 10^-7%    < 10^-7%    0.00026%    0.000023%
32  FORTEZZA_CBC            0.000012%   < 10^-7%    < 10^-7%    0.000023%   0.0000022%

Another offered encryption cipher is SEED_CBC, the twelfth most frequently offered, with a frequency of 8.47%. Both the mobile and stationary subsets are lower than the total, with frequencies of 5.41% and 6.92% respectively. In the TLSv10 subset the frequency is significantly higher at 16.95%, while in the TLSv12 subset it is 6.11%. Similarly, but with lower frequency, IDEA_CBC is offered with a frequency of 1.77% in total, 0.93% for mobile sessions, 1.55% for stationary sessions, 2.38% for TLSv10 sessions, and 1.54% for TLSv12 sessions. While RC4_128 is the most popular, the family of RC ciphers continues with RC4_40 at a frequency of 4.08%. It is offered mostly in TLSv10 sessions, where the frequency is significantly higher (15.80%), while in the TLSv12 subset the frequency is only 0.72%. RC2_CBC and RC2_CBC_40 are offered at frequencies of 1.65% and 1.63% respectively, again mostly in TLSv10 sessions, with frequencies of 5.28% and 5.21%. They are followed by RC4_56, which is offered at a frequency of 0.21% in total, 0.15% for stationary sessions, and 0.78% for TLSv10 sessions, but only 0.061% for mobile sessions and an insignificant 0.0000011% for TLSv12 sessions. The last RC cipher offered is RC2_CBC_56, at a frequency of 0.018% for all sessions. Another family of encryption ciphers appears as the ninth, tenth, fifteenth, and sixteenth most frequently offered encryption ciphers. CAMELLIA_256_CBC and CAMELLIA_128_CBC are both offered at a frequency of 13.18%, with a lower frequency of 8.07% for mobile sessions. In the TLSv10 and TLSv12 subsets the frequencies are about one percentage point higher and lower than the total, at 14.31% and 12.01% respectively. The other two ciphers in the same family are less popular. CAMELLIA_256_GCM and CAMELLIA_128_GCM are offered with a total frequency of 2.11%, a lower frequency of 1.44% for mobile sessions, higher frequencies of 3.24% and 2.62% for stationary and TLSv12 sessions respectively, and an insignificant frequency for TLSv10 sessions. AES in the less common CCM mode is offered very infrequently. AES_256_CCM and AES_128_CCM are only offered with a frequency of 0.014%, mostly in mobile and TLSv12 sessions, with frequencies of 0.011% and 0.017% respectively. AES_256_CCM_8 and AES_128_CCM_8 are only offered with a frequency of 0.0037%, mostly in TLSv12 sessions (0.0046%). Another rarely offered family of ciphers is ARIA. ARIA_128_CBC, ARIA_256_CBC, ARIA_256_GCM, and ARIA_128_GCM are only offered with frequencies of 0.00012% to 0.00014%, mostly in the TLSv10 subset with frequencies of 0.00026% to 0.00029%.

Table 4.18: MAC algorithms used.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   SHA256                  48.36%      47.73%      50.01%      0.000089%   59.88%
2   SHA1                    30.98%      28.35%      30.78%      90.48%      16.79%
3   SHA384                  14.77%      16.49%      13.83%      0.018%      18.28%
4   MD5                     5.89%       7.43%       5.38%       9.50%       5.04%
5   NULL                    0.0000004%  < 10^-7%    < 10^-7%    < 10^-7%    < 10^-7%

Table 4.19: MAC algorithms offered by client.

#   Cipher                  Total       Mobile      Stationary  TLSv10      TLSv12
1   SHA1                    1528.59%    1627.17%    1460.13%    1450.86%    1531.38%
2   SHA256                  677.40%     715.14%     661.94%     116.22%     811.38%
3   SHA384                  302.03%     353.63%     277.02%     64.10%      358.91%
4   MD5                     63.02%      64.92%      58.57%      91.18%      51.44%
5   CCM                     0.021%      0.012%      0.0089%     0.00016%    0.026%
6   CCM_8                   0.0074%     0.010%      0.0054%     0.00017%    0.0091%
7   NULL                    0.0020%     0.00045%    0.0028%     0.010%      0.0000016%
8   IMIT                    0.00032%    0.00039%    0.00029%    0.00014%    0.00035%
9   RMD                     0.000059%   < 10^-7%    < 10^-7%    0.00014%    0.0000016%
10  GOSTR3411               0.000014%   < 10^-7%    < 10^-7%    0.000035%   0.0000005%

4.5.5 MAC Cipher Results

Server Selected MAC Cipher Results: The MAC ciphers used in observed sessions are presented in Table 4.18. The largest share (48.36%) of all sessions and the majority (59.88%) of TLSv12 sessions use SHA256 as the MAC cipher. In the mobile and stationary subsets there are small variations, with shares of 47.73% and 50.01% respectively. The second most used MAC is SHA1 with a total share of 30.98% and a dominant majority (90.48%) of TLSv10 sessions. In the TLSv12 subset the share is significantly lower, at 16.79%. There are again small variations in the mobile and stationary subsets, with shares of 28.35% and 30.78% respectively. The strongest SHA variant, SHA384, is used in 14.77% of all sessions and is mostly present in TLSv12 sessions, with an 18.28% share. The share is somewhat larger (16.49%) for mobile sessions and slightly lower (13.83%) for stationary sessions. The fourth most frequently used MAC is MD5 with a total share of 5.89%. It is most frequently used in mobile sessions, with a 7.43% share, and TLSv10 sessions, with a 9.50% share. Lastly, the NULL cipher, i.e., no MAC, is used in only a few instances.


Figure 4.7: Complementary cumulative distribution function (CCDF) of cipher suite list sizes offered by client and cumulative distribution function (CDF) of downgrades by the server.

Client Offered MAC Cipher Results: The MAC ciphers offered by clients in our dataset are presented in Table 4.19. In total, the most frequently offered MAC cipher is SHA1 with a frequency of 1528.59% (i.e., on average as part of 15.29 offered cipher suites per session). In the mobile subset the frequency is slightly higher (1627.17%), while in the stationary and TLSv10 subsets it is lower (1460.13% and 1450.86%, respectively). The second most frequently offered MAC is SHA256 with a frequency of 677.40%, with small variations in the mobile and stationary subsets. In the TLSv10 subset the frequency (116.22%) is significantly lower, while in the TLSv12 subset the frequency (811.38%) is higher. SHA384 is offered at a frequency of 302.03%, with a subset pattern similar to that of SHA256. In the mobile and TLSv12 subsets the frequencies are higher (353.63% and 358.91%, respectively), while in the stationary and TLSv10 subsets they are lower (277.02% and 64.10%, respectively). The last MAC with a significant offered frequency is MD5 with 63.02%. In the mobile and TLSv10 subsets the frequency is higher (64.92% and 91.18%, respectively), while in the stationary and TLSv12 subsets it is lower (58.57% and 51.44%, respectively). Other MAC ciphers offered with minimal frequencies are CCM, CCM_8, NULL, IMIT, RMD, and GOSTR3411.

4.5.6 Cipher Suite Offered List Sizes and Downgrades Results

During the TLS session establishment the client sends a list of supported cipher suites ordered by preference. A large list indicates that compatibility is prioritized over security, since it typically means that weak or broken ciphers have not been omitted. This is why it is interesting to look at the list sizes offered by the clients.


From the list of cipher suites offered by the client, the server chooses one cipher suite to use for the session. Since the list is ordered by preference, we refer to the situation where the server does not pick the client's first choice as a downgrade. It is important to note that the client is not required to order the list from most secure to least secure; this is just a working assumption. Figure 4.7 shows both the list size data as a complementary cumulative distribution function (CCDF) and the downgrade data as a cumulative distribution function (CDF). A table with the complete set of data points for the graph can be found in the Appendix (Table A.5).
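The downgrade notion defined above can be made concrete with a small Python sketch. It assumes that a downgrade of k steps means the server picked the entry k positions below the client's first choice; the helper names downgrade_steps and share_at_most and the toy data are hypothetical, not the code used for the analysis.

```python
def downgrade_steps(offered, chosen):
    """Number of positions the chosen suite sits below the client's first choice.

    0 means the first choice was picked, i.e., no downgrade.
    """
    return offered.index(chosen)


def share_at_most(steps, k):
    """CDF value in percent: share of sessions with a downgrade of at most k steps."""
    return 100.0 * sum(1 for s in steps if s <= k) / len(steps)


offered = [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
# Four toy sessions as (offered list, server-selected suite) pairs.
sessions = [
    (offered, offered[0]),
    (offered, offered[0]),
    (offered, offered[1]),
    (offered, offered[2]),
]
steps = [downgrade_steps(o, c) for o, c in sessions]
print(steps, share_at_most(steps, 0), share_at_most(steps, 1))
```

In the toy data the first choice is picked in two of four sessions, so the CDF starts at 50% for zero steps and reaches 75% when one-step downgrades are included.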

Cipher Suite Offered List Sizes Result: In more than 99% of both mobile and stationary sessions the client offers a list larger than 10 cipher suites. At least 16 cipher suites are offered in 90.31% of the mobile sessions and 81.52% of the stationary sessions. Mobile clients offer larger lists of cipher suites than stationary clients. A list larger than 20 cipher suites is offered in 70.69% of the mobile sessions, and in a significantly lower 55.17% of the stationary sessions. A list larger than 25 cipher suites is offered in 59.14% of mobile sessions and in 47.03% of stationary sessions. In 30.45% of mobile sessions and 22.99% of stationary sessions a list of at least 36 cipher suites is offered. This drops rapidly to 7.31% of mobile sessions and 6.09% of stationary sessions for a list size of more than 37 cipher suites. A list of at least 70 cipher suites is offered in 4.80% of mobile sessions and 3.14% of stationary sessions, and at least 76 cipher suites are offered in 0.22% of mobile sessions and 0.27% of stationary sessions.
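Each list-size figure above is a point on a CCDF, i.e., the share of sessions whose offered list has at least a given size. A minimal Python sketch, with the hypothetical helper ccdf_at_least and made-up list sizes, shows the computation:

```python
def ccdf_at_least(sizes, n):
    """CCDF value in percent: share of sessions offering at least n cipher suites."""
    return 100.0 * sum(1 for s in sizes if s >= n) / len(sizes)


# Made-up list sizes for four sessions, for illustration only.
sizes = [12, 16, 20, 36]
print(ccdf_at_least(sizes, 16), ccdf_at_least(sizes, 37))
```

Three of the four toy sessions offer at least 16 cipher suites, so the CCDF value at 16 is 75%, and it falls to 0% at 37 because no toy session offers that many.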

Downgrades Results: The client's first choice is picked by the server in 15.02% of mobile sessions and in 18.12% of stationary sessions. The client's first or second choice is picked in 26.89% of mobile sessions and 37.71% of stationary sessions. When the third choice is included, the percentages increase to 37.41% of mobile and 44.68% of stationary sessions. The percentages then increase in small increments: the server makes at most a five-step downgrade in 44.62% of mobile sessions and 54.59% of stationary sessions, and at most a ten-step downgrade in 76.88% of mobile sessions and 79.77% of stationary sessions. At most a 20-step downgrade is made in 86.26% of mobile sessions and 89.73% of stationary sessions, and at most a 35-step downgrade in 96.80% of mobile sessions and 97.76% of stationary sessions. Downgrades of up to 79 steps are observed.

RC4 Cipher in Cipher Suites Investigation Results: To gain additional insight into the cipher suite downgrades, we looked closer at the position of the RC4 encryption cipher in the client's offered list when it was chosen by the server. This result is presented as a cumulative distribution function (CDF) in Figure 4.8. The RC4 cipher is rarely chosen from positions 1-4 of the client's offered list of cipher suites. For mobile clients the data jumps to 24.97% for up to position 6 in the offered list. It continues on a plateau before it jumps to 29.40% for up to position 12. The data slowly increases to 32.91% for the RC4 cipher being in positions 1-25 of the offered list when chosen. The data goes up to 61.47% for up to position 26, 71.08% for up to position 31, and 99.22% for up to position 37. The highest position in a list when chosen is 93. For stationary clients the graph is overall lower than for mobile clients. The RC4 cipher is chosen from up to position 7 in the offered list in 5.84% of sessions. It continues on a plateau before it jumps to 17.42% for up to position 12. The percentages increase in small increments to 23.88% for positions 1-25 in the offered list. The data goes up to 48.95% for up to position 26, 68.31% for up to position 31, and 97.88% for up to position 37.

Figure 4.8: Cumulative distribution function (CDF) of RC4 Cipher when chosen by server.

4.6 Session Quality Evaluation

In the final part of the analysis we evaluate the observed HTTPS sessions to provide an overview of the security typical users experience when accessing the Internet.

Four-Level Classification: A four-level security classification is defined for the evaluation ranging from "Weak", "Acceptable", "Good" to "Strong" sessions. A session was, in 2015, classified as "Acceptable" if it complies with all of the following criteria:

• Uses protocol version TLSv1.0 or better

• Uses a NIST approved encryption cipher with at least 112-bit security strength

• Has a version-3 leaf certificate with a validity period of at most 25 months (2 years and a grace period)

• Uses a signature algorithm SHA1 or better

• Uses public keys with at least 112-bit security strength

Sessions that do not satisfy all of these minimum requirements are classified as "Weak". The reason for requiring a list of minimum criteria is the weakest-link aspect: it does not matter much that a session has 128-bit encryption if it at the same time uses export grade key exchange. "Good" sessions further require that the certificate uses a stronger signature algorithm than SHA1 and either a key exchange algorithm with "perfect forward secrecy" or encryption with at least 128-bit security strength.

Figure 4.9: Clustered histogram of the session quality evaluation based on the four-level security classification.

Finally, for "Strong" sessions, we further restrict the protocol version to TLSv1.1 or better, the certificate validity period to at most 13 months, and the public key to use at least 128-bit security strength.
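The four-level classification can be expressed as a small decision function. The following Python sketch is one reading of the criteria listed above and is not the implementation used in the thesis. The `Session` record and its field names are hypothetical, and the rankings of protocol versions and signature hashes are assumptions made for the illustration.

```python
from dataclasses import dataclass

# Assumed orderings, from weakest to strongest.
PROTOCOL_RANK = {"SSLv2": 0, "SSLv3": 1, "TLSv10": 2, "TLSv11": 3, "TLSv12": 4}
HASH_RANK = {"MD5": 0, "SHA1": 1, "SHA256": 2, "SHA384": 3, "SHA512": 4}


@dataclass
class Session:
    """Hypothetical per-session summary; field names are illustrative only."""
    protocol: str
    cipher_nist_approved: bool
    cipher_strength_bits: int
    cert_version: int
    validity_months: float
    signature_hash: str
    pubkey_strength_bits: int
    forward_secrecy: bool


def classify(s):
    """Return "Weak", "Acceptable", "Good", or "Strong" for a session."""
    acceptable = (
        PROTOCOL_RANK[s.protocol] >= PROTOCOL_RANK["TLSv10"]
        and s.cipher_nist_approved
        and s.cipher_strength_bits >= 112
        and s.cert_version == 3
        and s.validity_months <= 25
        and HASH_RANK[s.signature_hash] >= HASH_RANK["SHA1"]
        and s.pubkey_strength_bits >= 112
    )
    if not acceptable:
        return "Weak"
    good = HASH_RANK[s.signature_hash] > HASH_RANK["SHA1"] and (
        s.forward_secrecy or s.cipher_strength_bits >= 128
    )
    if not good:
        return "Acceptable"
    strong = (
        PROTOCOL_RANK[s.protocol] >= PROTOCOL_RANK["TLSv11"]
        and s.validity_months <= 13
        and s.pubkey_strength_bits >= 128
    )
    return "Strong" if strong else "Good"


print(classify(Session("TLSv12", True, 128, 3, 12, "SHA256", 112, True)))
```

Because every level builds on the previous one, a single failed minimum criterion (for example an export grade cipher) makes the whole session "Weak" regardless of its other properties.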

Session Quality Evaluation Results: Figure 4.9 summarizes the results of the session quality evaluation using the four-level classification. The measurements show that a majority (53.8%) of the sessions were classified as "Good" and an additional 18.9% as "Acceptable". Only a very small share (0.031%) were classified as "Strong". In total, this results in 72.7% of sessions being "Acceptable" or better. While the bulk of sessions have at least "Acceptable" quality, the many "Weak" sessions (27.3%) cause concern. The mobile and stationary subsets show 27.75% and 26.90% "Weak", 19.45% and 18.38% "Acceptable", 52.78% and 54.69% "Good", and finally 0.026% and 0.031% "Strong", respectively.

5 Discussion

In this chapter we discuss the results. The chapter is divided into one section for each trust relationship, as well as a section discussing the ratio between HTTP and HTTPS, a section discussing limitations of the methodology, and a section discussing the thesis in a wider context. In each section we provide insight in the form of reflections.

5.1 Ratio between HTTP and HTTPS

In this section we discuss the data collection ratios. See section 4.2 for results. The data, with the exception of October 15, shows what we would expect of normal Internet usage. We can consequently conclude that our data was gathered successfully. It was surprising to see the ratio of HTTPS sessions being on par with or greater than the ratio of HTTP sessions. We did expect a large portion of the established sessions to use HTTPS, but not this large. A major reason could be that frequently used websites like Facebook and Google use HTTPS. It also suggests that many websites take advantage of the availability of less strict CAs issuing certificates.

5.2 Trust in Browsers

In this section we summarize and discuss the results on trust in browsers. See section 4.3 for the results. Table 5.1 shows a summary of the browser analysis. The overall browser distribution is skewed towards the three major browsers: Google’s Chrome (51.48%), Apple’s Safari (22.55%), and Mozilla’s Firefox (18.98%). Microsoft’s Internet Explorer has a significantly lower share of only 6.72%, and various other browsers have marginal shares.

Table 5.1: Browser usage and version distribution.

Browser            Observed Share  Up-to-date  One behind  Two behind  More behind
Chrome             51.48%          22.40%      64.68%      1.82%       11.12%
Safari             22.55%          0.62%       41.52%      10.36%      47.5%
Firefox            18.98%          0.00%       0.00%       66.90%      33.1%
Internet Explorer  6.72%           2.20%       45.76%      14.17%      37.87%

We observed 22.40% of the Chrome sessions being up-to-date, with the majority (64.68%) being one security update behind. For Safari we observed 0.62% being up-to-date, 41.52% one update behind, and 10.36% two updates behind. For Firefox we observed almost no sessions (0.00%) that were one or fewer security updates behind, and the majority (66.90%) were two updates behind. For Internet Explorer, we saw only 2.20% using MSIE 11.0, 45.76% using MSIE 10.0, 13.48% using MSIE 9.0, and the remaining percentage spread out over even older versions. Overall, the percentage of up-to-date browsers is rather low, which highlights a significant delay in the rollout of new security updates. It may be easy to point the finger at the browser companies, but it may very well be the fault of users who choose not to install updates.

5.3 Trust in Certificate Authorities

In this section we discuss the results on trust in certificate authorities and provide some reflections. See section 4.4 for results.

Certificate Authorities: We find that the vast majority of leaf certificates are signed by a handful of organizations. Part of this skew is due to rich-get-richer effects, as buyers often select popular CAs because these may be less likely to be removed from root stores [4]. Some trust in Symantec may be inherited through the acquisition of VeriSign’s authentication business unit in 2010. Unfortunately, even the most highly used (and trusted) CAs can be compromised or make mistakes that degrade the overall security. For example, it was discovered that Symantec had issued test certificates for 76 domains they did not own (including for Google domains) and another 2,458 unregistered domains [29]. Google has since demanded that Symantec log its certificates in publicly auditable Certificate Transparency (CT) logs [15]. If a signing certificate that has signed a substantial portion of the Internet were compromised, it would significantly undermine the security and privacy of Web communications. Given past compromises of CAs, this raises a concern about the resilience of HTTPS/TLS to future compromises of even a few CAs.

Extended Validation Certificates: When considering EV certificates, an interesting observation is that the skew is both higher and towards different CAs than for regular certificates, with Symantec Corporation and DigiCert making up the majority (83.8%) of the observed sessions but less than 38% of the observed unique certificates. When compared to the active scans of Holz et al. [16] in 2011, we can note an overall small increase in the usage of EV certificates.

Certificate Signature Algorithms: See section 4.4.1 for the results. Despite being susceptible to known attacks and CAs no longer signing new certificates with SHA1, SHA1 with RSA encryption is the second most used signature algorithm in our dataset. SHA1 is responsible for signing 50.27% of the authority certificates and 24.94% of the leaf certificates. The recommended signature algorithm [5], SHA256 with RSA encryption, has so far replaced SHA1 to a greater extent for leaf certificates (72.26%) than for authority certificates (42.80%). For EV certificates the result is slightly better, with 84.6% SHA256 and 15.2% SHA1, where SHA1 is observed in 25.3% of all sessions. Although we observed improvements compared to the 98.7% share of SHA1 that Durumeric et al. [12] observed in 2013, there is still a long way to go. While decisions by Mozilla, Microsoft, and Google, for example, to phase out SHA1 (e.g., not showing a padlock symbol or to various degrees blocking SHA1 usage) may speed up this process, there have been setbacks in the phase-out as some of the browser companies have softened their decisions, including Mozilla re-enabling support for SHA1 in Firefox. Service providers such as Facebook have also suggested a delay in the phase-out of SHA1, due to concern that millions of users with older devices would lose access to their services. A significantly more serious observation is the 10 (1.33%) authority certificates and 97 (0.14%) leaf certificates still using MD5, almost seven years after Sotirov et al. [30] successfully created a rogue CA certificate.

Certificate Public Key Algorithms: See section 4.4.1 for the results. While the vast majority of certificates use public key algorithms with recommended key sizes, a non-negligible number of authority (1.33%) and leaf (5.61%) certificates use weak 1,024-bit RSA keys, which NIST recommended to stop using in 2013 [5]. The number of certificates with stronger than the minimum recommended key sizes is disappointingly low. Only 7.60% of unique authority certificates, observed in 16.74% of sessions, use larger key sizes. For leaf certificates it is even lower, with only 3.48% of unique certificates and 0.84% of sessions. Even when inspecting exclusively EV certificates, which one would assume to use stronger keys, we did not find a significantly larger share of strong keys. A positive observation, however, is that not a single EV certificate used weak key sizes.
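To compare key sizes across algorithms, they can be mapped to an approximate security strength in bits. The sketch below uses the comparable-strength values commonly cited from NIST SP 800-57 Part 1 (for example, 1,024-bit RSA at roughly 80 bits and 2,048-bit RSA at roughly 112 bits). The function names and the three category labels are hypothetical, and the thresholds used for "weak" and "stronger than minimum" are assumptions that may differ from the exact rules applied in the thesis.

```python
# (minimum key size in bits, approximate security strength in bits)
RSA_STRENGTH = [(15360, 256), (7680, 192), (3072, 128), (2048, 112), (1024, 80)]
EC_STRENGTH = [(512, 256), (384, 192), (256, 128), (224, 112), (160, 80)]


def security_strength(algorithm, key_bits):
    """Approximate security strength for an RSA or elliptic-curve key."""
    if algorithm == "RSA":
        table = RSA_STRENGTH
    elif algorithm == "EC":
        table = EC_STRENGTH
    else:
        raise ValueError("unsupported algorithm: " + algorithm)
    for min_bits, strength in table:
        if key_bits >= min_bits:
            return strength
    return 0  # smaller than the smallest size listed in the table


def key_category(algorithm, key_bits):
    """Assumed grouping: below 112 bits is weak, exactly 112 is the minimum."""
    strength = security_strength(algorithm, key_bits)
    if strength < 112:
        return "weak"
    if strength == 112:
        return "recommended minimum"
    return "stronger than minimum"


for algorithm, bits in [("RSA", 1024), ("RSA", 2048), ("RSA", 4096), ("EC", 256)]:
    print(algorithm, bits, security_strength(algorithm, bits), key_category(algorithm, bits))
```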

Certificate Validity Period Durations: See section 4.4.1 for the results. With NIST currently predicting that 112-bit security will be acceptable through 2030 [5], and certificates with weaker security put aside, there is no immediate need for concern. However, the persistent presence of certificates with weaker than recommended security might be a telling sign that using shorter validity periods is a better practice. Barring the most observed certificates, we do find a variety of certificates with validity periods of up to 37 years, which is far beyond their predicted security.

Wildcard and Multi-Domain Certificates: See section 4.4.3 for the results. Regarding the wildcard feature, we note that while only about a third of all certificates use the feature, these certificates are observed in a much larger share of sessions. This clearly indicates that the most frequently visited websites utilize the wildcard feature. We observe that the largest share (49.12%) of observed unique certificates have two domain names, typically a combination of domain.com and *.domain.com or domain.com and www.domain.com. The first combination is very popular and it is easy to understand why: it covers both the primary domain and any subdomains. The second combination is less popular but in a strict sense more secure, since it covers only two domains. However, while certificates with two domain names have the largest share of observed unique certificates, they only make up 13.7% of observed sessions. The largest share (57.7%) of observed sessions use single domain name certificates, which only make up 32.56% of unique certificates. While it is good to see that single domain name certificates are the most frequently used certificates, the number of certificates with a high number of domain names is significant. The top-five certificates by number of domain names belong to GlobalSign (514 and 510 domains), Google (two different certificates with 503 domains), and Technische Universitaet Muenchen (429 domains). In the top-10 we observed another four universities and another Google certificate.

Domain Matches Discussion: From the data we can derive that the name validation rules are rather lenient. While Exact Matches and Wildcard Matches are the most common, WWW Mismatches and Relaxed Wildcard Matches occur in a significant number of sessions. Security and convenience are for the most part two opposing forces. In the case of domain matches we see a push towards more convenience and consequently less security.
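The match categories named above can be illustrated with a small matcher. The precise definitions belong to the methodology and are not restated in this chapter, so the sketch below makes its own assumptions: a "wildcard match" lets "*" replace exactly the leftmost label, a "www mismatch" means the host and a certificate name differ only by a leading "www.", and a "relaxed wildcard match" lets "*" cover more than one label. The function name and example domains are hypothetical.

```python
def match_type(host, cert_names):
    """Classify how a requested host name relates to the names in a certificate."""
    host = host.lower().rstrip(".")
    names = [name.lower() for name in cert_names]
    if host in names:
        return "exact match"
    # Base domains of wildcard entries, e.g. "*.example.com" -> "example.com".
    wildcards = [name[2:] for name in names if name.startswith("*.")]
    parent = host.split(".", 1)[1] if "." in host else None
    if parent in wildcards:
        return "wildcard match"
    if any(host == "www." + name or name == "www." + host for name in names):
        return "www mismatch"
    # Assumed relaxed rule: the wildcard may span several labels.
    if any(host.endswith("." + base) for base in wildcards):
        return "relaxed wildcard match"
    return "no match"


print(match_type("mail.example.com", ["example.com", "*.example.com"]))
print(match_type("www.example.com", ["example.com"]))
print(match_type("a.b.example.com", ["*.example.com"]))
```

Under these assumptions, only the first two categories correspond to strict name validation; the other two show how a lenient validator could accept a certificate that a strict one would reject.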

5.4 Trust in Protocol Version and Cipher Suite Selection

In this section we will discuss the result of the trust in protocol versions and cipher suite selection and provide insight in the form of reflections. See section 4.5 for results.

Protocol Version: See section 4.5.1 for the results. The distribution of the most frequently offered and selected protocol versions is almost completely divided between TLSv10 and TLSv12, with total observed shares of 18.84% and 80.76% respectively. While in most cases the server chooses the version the client offers, we find that in 2% of the sessions the server downgrades to a lower version. In both subsets we observe a larger share of TLSv12 sessions compared to the total dataset, with TLSv12 being slightly more frequent in mobile sessions (83.42%) than in stationary sessions (81.74%).

An interesting observation is the relatively insignificant share of TLSv11. In the dataset it is only offered by the client in 0.30% and selected in 0.32% of the sessions. With the protocol version development being a progression from TLSv10 to TLSv11 to TLSv12, one would assume a stepwise larger share up to the newest protocol. Instead we have observed a very low presence of TLSv11, which could imply a trend of transitioning directly to TLSv12 from TLSv10, or possibly that those devices that still only support TLSv10 did not upgrade to TLSv11 when it was released either.

The deprecated protocols SSLv2 and SSLv3 also have an insignificant share. SSLv2 is offered in 0.59% of sessions, but selected in less than 0.001%. SSLv3 is similarly offered in only 0.27% and selected in 0.08% of sessions. While it is good to see that these protocols have a marginal presence, they should not be used at all because of the security risks.

Offered and Selected Key Exchange Ciphers: See section 4.5.3 for the results. In general, we find that the vast majority of the sessions use computationally secure algorithms for key exchange. Based on the best attacks known, equal key sizes for EDH, DSS, and RSA give comparable levels of security [6]. Elliptic-curve based ciphers have long been recommended by security experts (and still are). It should, however, be noted that ECDHE-based solutions in particular have recently come under scrutiny due to the influence that the National Security Agency (NSA) of the United States has in their design [1].

Key exchange ciphers using DHE-based algorithms provide forward secrecy (i.e., previously recorded sessions would not be compromised by long-term keys being compromised in the future), while those that rely on the certificate’s public key, e.g. TLS_RSA, for both key exchange and authentication do not. Our data show that most sessions provide forward secrecy, but the third largest share, TLS_RSA (16.31%), does not. TLS_RSA sessions make up 29.61% of sessions using protocol TLSv10. In contrast to the key exchange actually used, we find that TLS_RSA is the most frequently offered option, with a frequency of 626.55% (i.e., on average in more than six cipher suites per session).

In 2015 alone, two attacks were discovered targeting the key exchange algorithm. The FREAK and Logjam attacks exploit bugs in the TLS/SSL implementation to downgrade sessions of servers that still support RSA-EXPORT and DHE-EXPORT grade ciphers, respectively [1]. Such downgrades allow an attacker to passively eavesdrop on the session. In total, we identified 428 (0.00019%) instances of TLS_RSA_EXPORT and 27 (0.000012%) instances of TLS_DHE_RSA_EXPORT being used. When looking at the subsets, we can see that these are almost exclusively mobile devices and sessions using protocol version TLSv10. We did not observe any session using protocol version TLSv12 that also used RSA_EXPORT.
When looking at the cipher suites offered by clients, we observe an alarmingly high frequency of offered cipher suites with an export grade key exchange. A cipher suite with TLS_RSA_EXPORT is offered as an option with a frequency of 9.50% for the total dataset and 36.20% for the TLSv10 subset. Positively, the TLSv12 subset shows a much lower frequency of 1.71%. A reason for this could be that devices that have not been updated to support the newer protocol versions have not been updated to support better cipher suites either.
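An offered frequency above 100%, such as the 626.55% reported for TLS_RSA, arises because one client can offer several cipher suites that share the same key exchange. The following Python sketch shows one way such a frequency and the forward secrecy property could be derived from IANA-style cipher suite names. It is an illustration under stated assumptions rather than the procedure used in the thesis; the function names and the two sample offer lists are hypothetical.

```python
def key_exchange(suite):
    """Key exchange part of an IANA-style name, e.g. TLS_ECDHE_RSA_WITH_... -> TLS_ECDHE_RSA."""
    return suite.split("_WITH_")[0]


def offered_frequency(offered_lists, kx):
    """Occurrences of key exchange `kx` over all offered lists, as a percentage of sessions."""
    occurrences = sum(
        1 for offered in offered_lists for suite in offered if key_exchange(suite) == kx
    )
    return 100.0 * occurrences / len(offered_lists)


def has_forward_secrecy(suite):
    """True for ephemeral Diffie-Hellman key exchanges (DHE and ECDHE).

    Note that this also returns True for export grade DHE suites, which are
    ephemeral but use keys that are too short to be considered secure.
    """
    return key_exchange(suite).startswith(("TLS_DHE_", "TLS_ECDHE_"))


# Two hypothetical client offers.
offers = [
    [
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
        "TLS_RSA_WITH_RC4_128_SHA",
    ],
    ["TLS_RSA_WITH_AES_256_CBC_SHA", "TLS_RSA_WITH_3DES_EDE_CBC_SHA"],
]
print(offered_frequency(offers, "TLS_RSA"))
print(offered_frequency(offers, "TLS_ECDHE_RSA"))
```

In this toy input TLS_RSA appears four times across two sessions, so its offered frequency is 200%, i.e. on average two TLS_RSA cipher suites per session.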

Offered and Selected Encryption Ciphers: See section 4.5.4 for the results. While the majority of sessions use known secure ciphers, like AES_128 and AES_256, which are currently the most commonly used encryption algorithms, we discovered many cases where ciphers with weaker encryption are both offered and selected. For example, despite being prohibited from use in TLS [24] and viable attacks against RC4 being published in 2013 by AlFardan et al. [2], the RC4_128 cipher is still offered (54.7%) and used (7.64%) in many sessions. On a positive note, the overall numbers are down from what Holz et al. [16] observed in 2011. When comparing the subsets, we find that the RC4_128 cipher is more prevalent in mobile sessions (9.28%) than in stationary sessions (6.92%), and in TLSv10 sessions (13.58%) than in TLSv12 sessions (6.21%).

Other even weaker ciphers are also used, although in significantly fewer sessions. For example: DES_CBC in 0.00047% of all sessions, RC4_40 in 0.00019%, DES40_CBC in 0.000014%, and RC2_CBC_40 in 0.0000018%. 3DES variants are acceptable compatibility ciphers, and 3DES_EDE_CBC is used in 0.63% of all sessions. Not surprisingly, it is slightly more prevalent in TLSv10 sessions, at 1.64%.

While we can conclude that the majority of sessions use known secure encryption ciphers, there is still a significant number of sessions with weak encryption. When comparing with older measurements we can, in the case of the RC4 cipher, see a good trend.

Offered and Selected MACs: See section 4.5.5 for the results. Compared to other ciphers, fewer concerns have been raised regarding the MACs used for data integrity and authenticity during TLS sessions. This result is consistent with the relatively shorter lifetime of session MACs (compared to certificate MACs) and the relatively strong (session) MACs that we observed, including SHA256 (48.36%), SHA1 (30.98%), and SHA384 (14.77%). While MD5 (5.89%) is no longer acceptable in situations where collision resistance is required, such as for digital signatures, it is not urgent to stop using MD5 in HMAC-MD5 schemes [32]. MD5 is almost exclusively used together with the RC4 cipher.

For sessions using TLSv10, which does not support SHA-2 family MACs, the vast majority (90.48%) of sessions use SHA1 and the remaining 9.50% use MD5. Looking at TLSv12 sessions, we see the majority (59.88%) of sessions using SHA256 and 18.28% using SHA384. But there is still a considerable number of TLSv12 sessions using SHA1 and MD5 (shares of 16.79% and 5.04% respectively). Is this the fault of badly configured clients or servers? If we look at the MACs offered by clients, we can see that SHA1 is offered at a frequency of 1450.86% for TLSv10 sessions and 1531.38% for TLSv12 sessions, and MD5 is offered at a frequency of 91.18% for TLSv10 sessions and 51.44% for TLSv12 sessions. So the frequency increases for SHA1 but decreases for MD5. This could be due to SHA1 replacing MD5 as the legacy-compatible option. However, with SHA1 being cryptographically weakened and better options being available, it should decrease in usage along with TLSv10.

When comparing mobile and stationary in the case of MACs, we find that the difference is small. Interestingly, mobile clients have slightly larger shares of the most secure MAC, SHA384, and of the least secure MAC, MD5, compared to stationary clients (16.49% and 7.43% compared to 13.83% and 5.38% respectively).
Conversely, stationary clients have slightly more sessions with SHA256 and SHA1 than mobile clients (50.01% and 30.78% compared to 47.73% and 28.35% respectively).

Offered Cipher Suite List Sizes and Downgrades: See section 4.5.6 for the results. Throughout our dataset, the clients offer a large list of cipher suites during the TLS negotiation. In more than 99 percent of all sessions the clients offer at least nine cipher suites, and often substantially more. In the majority of cases there are at least 20 cipher suites offered, with the most common list size being either 26 or 37. The number of offered cipher suites is also found to be noticeably higher for mobile devices compared to stationary devices.

As these lists of cipher suites often include weak or even broken ciphers, these results suggest that clients often prioritize compatibility (by giving servers many options) over security. Offering fewer options can help limit the choices and thereby prevent badly configured servers from selecting insecure ciphers. The lists are supposed to be ordered by client preference, which would lead one to assume that the clients order the list from most secure to least secure, but that is rarely the case in our dataset.

Regarding downgrades, we find that servers typically do not pick the clients’ top candidate and sometimes perform substantial downgrades. For example, the most preferred option is only chosen in 36 percent of all sessions, and in 20 percent of the cases an option outside the top 10 is selected. There is a noticeable difference between mobile and stationary clients, where a less preferred option is generally chosen more often for mobile devices than for stationary devices.

To gain additional insights into the downgrades and the preference order of the lists, we looked closer at the position of the RC4 cipher in the offered list when it was chosen by the server. In more than 80 percent of the cases it is outside the top 10, and in 60 percent of the cases it is outside the top 25. This observation suggests that the choice to use RC4 may be the result of poorly configured servers. But that conclusion does not absolve the clients from fault, since they should not be offering the RC4 cipher as an option at all. We observe noticeably earlier positions of the RC4 cipher when chosen for mobile devices compared to stationary devices. This suggests that mobile clients order the RC4 cipher higher in their lists.
Another observation is that servers appear to be prioritizing user experience over security by not turning away clients not supporting strong security options.

5.5 Methodology

In this section we discuss the impact of potential limitations in the methodology (Chapter 3).

Data Collection: The data for this thesis was collected in one place (University of Calgary, Canada) and over a single week. While we are confident that the monitored campus network provides a good example of typical modern HTTPS communication, there is still undeniably a concern that the data is biased towards this specific location and does not reflect general behavior. Likewise, one can also raise concerns regarding the length of the collection period. A week is certainly long enough to provide an adequate amount of data for the kind of analysis performed in this thesis, but it does not necessarily provide enough data to draw sweeping conclusions about general behavior. Regarding the timing of the data collection, it was done at a time when we could be sure that there would be regular activity on the network, in our case in the middle of a semester. There might be reasons, unknown to us, why this particular time might have been unfavourable. We consider this aspect negligible, however.

Passive vs Active Monitoring: In this thesis we relied solely on passive monitoring to collect our data. Another option is to use active monitoring, which involves actively scanning the IP address space and initiating HTTPS sessions with the servers. This technique has the advantage that data can be collected from the whole IP address space and that the behavior as well as the cipher support of the servers can be determined. In contrast, passive monitoring only provides information about the servers actually contacted by users on the network, but the advantage is that it provides data about the typical behavior and cipher support of both the servers and the browsers.

Identifying Mobile and Stationary Devices: The scheme we used to identify sessions with mobile and stationary devices worked, in our estimation, very well. Tests made in controlled settings and sample tests in the whole dataset were both very accurate. This led us to conclude that we had successfully identified subsets of mobile/stationary sessions. However, there is still a possibility that the process was less accurate when applied to the whole large network. If we, in our study, had discovered significant differences in the security of mobile and stationary sessions, the true accuracy of the identification scheme would have become more important.

5.6 Wider Context

Our society increasingly relies on web-based services and solutions. People read news and interact on social media online, shop online, do their banking online, and so on. Businesses, for example, increasingly utilize online conferencing and provide employees with ways to work from home. These services and solutions depend on secure end-to-end communication that provides privacy, data integrity, and mutual authentication. While our technology for handling information and communication advances at an ever increasing pace, a critical question to ask is whether our capability to secure it is keeping up. In the vast and complex systems of technology, laws, and regulations that ensure security for private persons and businesses alike, this thesis is meant to assist in answering the question by providing a characterization of how HTTPS is used in practice, mainly by highlighting and providing concrete examples of current shortcomings in the different trust relationships. For example: the low share of up-to-date browser versions, the significant skew of CAs, slow transitions from broken or weakened ciphers and protocols, bad practice in certificate validity periods that affects weak keys and their long-term security, and the lack of strong security for extra sensitive information like financial transactions and online banking.

6 Conclusion

In this chapter we restate the purpose of the thesis in the form of the three research questions and explain how and to what extent each aim was achieved. The chapter is divided into one section for each research question and a section for possible future work.

6.1 What are the Most Significant Trust Relationships in HTTPS Communication and How Trustworthy are They Actually in Practice?

In the quest to answer this research question, we identified four trust relationships of consequence: Trust between User and Browser, Trust between Browser and Certificate Authorities, Trust between Browser and Server, and Trust between User and the Cipher Suite Negotiation. For each of these trust relationships we selected measurement parameters that were available to us from the passive monitoring system at Calgary and analyzed the result.

Trust between User and Browser: To evaluate the trust between user and browser, we recorded the user agent string of each session. This allowed us to determine how up-to-date the browsers were on security updates. The result was surprising, with a low up-to-date share (Chrome 22.40%, Safari 0.62%, Firefox 0.00%, Internet Explorer 2.20%) for all the common browsers. Most browsers were one update behind (Chrome 64.68%, Safari 41.52%, Firefox 0.00%, Internet Explorer 45.76%), while the majority of Firefox browsers were two updates behind (Chrome 1.82%, Safari 10.36%, Firefox 66.90%, Internet Explorer 14.17%). This result shows that regardless of which browser you use, you have to be wary of current security threats and cannot fully rely on your browser being updated to handle them.

Trust between Browser and Certificate Authorities: The trust relationship between browser (and implicitly the user) and certificate authorities was evaluated by analyzing the distribution of certificate authorities among the observed certificates, the use of the path length constraint extension among observed authority certificates, the strength of the cryptography in observed certificates, and the distribution as well as the cryptography of EV certificates among observed certificates and the CAs that issued them.

Regarding the CA distribution, a significant skew towards a handful of CAs was observed. The top three CAs in our dataset are Comodo CA Limited (22.94% of sessions), Go Daddy (18.08%), and GeoTrust (16.27%). This skew can partly be attributed to the "rich-get-richer" effect, but when an ever increasing portion of the Internet is signed by only a few signing certificates, these certificates become critical security points which, if compromised, would significantly undermine the security and privacy of Internet communication. Given the numerous past compromises of CAs, this raises significant security concerns.

The path length constraint extension limits the length of a potential certificate chain and the trust delegation that is possible. Despite it providing an extra measure of protection from misuse and helping to mitigate mistakes like issuing authority certificates instead of leaf certificates, we observed that 26.7% of all authority certificates did not specify any path length constraint.

When considering the cryptography in certificates, we looked at both the signature algorithm and the public key algorithm. Regarding the signature algorithm, we made two interesting discoveries. First, despite being susceptible to known attacks, SHA1 is still responsible for signing 50.27% of the authority certificates and 24.94% of the leaf certificates. Second and significantly more serious, we observed 10 authority certificates (1.33%) and 97 leaf certificates (0.14%) still using the broken hash function MD5. Similarly, we also made two interesting observations about the public key algorithms. First, while the vast majority of certificates use recommended key sizes, a non-negligible number of authority (1.33%) and leaf (5.61%) certificates use weak 1,024-bit RSA keys, which NIST recommended to stop using in 2013.
Second, the share of certificates with stronger keys was disappointingly low: only 7.60% of unique authority certificates, observed in 16.74% of sessions, and only 3.48% of unique leaf certificates, observed in 0.84% of sessions.

The EV certificate distribution was even more skewed than for regular certificates, but towards different CAs: Symantec Corporation with a 56.2% share and DigiCert with a 27.6% share of all observed EV sessions, and together 37.9% of all unique EV certificates. The cryptography in EV certificates is slightly better. The SHA1 share of the signature algorithms is comparatively low at 15.2% of certificates, with a 25.3% share of sessions. However, even for exclusively EV certificates the share of stronger keys for the public key algorithm was very low. Positively, not a single EV certificate was observed with weak key sizes.
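The effect of the path length constraint discussed above can be illustrated with a simplified check. In X.509 (RFC 5280), the constraint on a CA certificate gives the maximum number of intermediate CA certificates that may follow it on the way to the leaf. The Python sketch below models a chain as a plain list of constraint values and ignores the special treatment of self-issued certificates; the function name and the example chains are hypothetical.

```python
def violates_path_length(ca_path_lens):
    """Check a chain against its path length constraints.

    `ca_path_lens` lists the constraint of each CA certificate, ordered from
    the CA that issued the leaf up towards the root. None means that the
    certificate does not specify a constraint.
    """
    for cas_below, path_len in enumerate(ca_path_lens):
        # `cas_below` CA certificates sit between this CA and the leaf.
        if path_len is not None and cas_below > path_len:
            return True
    return False


# Issuing CA with constraint 0, intermediate with constraint 1, unconstrained root.
print(violates_path_length([0, 1, None]))
# A CA limited to 0 intermediates nevertheless has a sub-CA below it.
print(violates_path_length([0, 0]))
# Without any constraints, a chain of arbitrary length passes this check.
print(violates_path_length([None, None, None, None]))
```

The last case shows why unconstrained authority certificates are a concern: nothing in the certificate itself limits how far trust can be delegated.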

Trust between Browser and Server: The trust relationship between the browser (and implicitly the user) and the server was evaluated by analyzing the use of wildcards and multiple domains in certificates, performing an independent domain match, analyzing the validity periods of certificates, and independently validating the certificate chain of each observed session. The wildcard and multi-domain features are very popular. It is easy to understand why: they are convenient. The main issue from a security standpoint is that the attack surface increases with the number of domains covered by a certificate. Wildcard certificates were used in 71.62% of sessions and in 35.68% of unique certificates. Regarding multi-domain certificates, 67.44% of all unique certificates are valid for at least 2 domains (observed in 42.27% of sessions), 9.16% for at least 10 domains (observed in 19.03% of sessions), and 2.02% for at least 50 domains (observed in 8.98% of sessions). The largest number of domains observed was a certificate valid for 866 domains. These results suggest that websites often prioritize convenience or cost over security. To gain insight into how domain validation works in practice, we performed an independent domain match for each observed session. This yielded mostly wildcard matches (41.71%) and exact matches (20.71%), but also a noticeable number of "no matches" (0.27%), "www mismatches" (0.03%), and "relaxed wildcard matches" (0.01%). The result shows that the name validation rules are rather lenient.

Due to continuous advances in computer and cryptographic technologies, certificates valid for an extended time period can quickly become viable targets for attack. Using shorter validity periods is therefore good practice. In our dataset, the typical validity periods were 4, 10, or 15 years for authority certificates and shorter (1, 2, or 3 years) for leaf certificates. Beyond the most frequently observed certificates, a variety of certificates had even longer lifespans, with validity periods of up to 37 years, which is far beyond their predicted security. For each observed non-resumption session, the certificate chain was independently validated using the Mozilla root store. While most sessions were validated successfully (94.8%), a non-negligible share (4.2%) were not. About 1% of sessions contained self-signed certificates, and in a few cases the certificate was outside its validity period (21,819 expired and 4,537 not yet valid). One reason why sessions failed validation may be that the Mozilla root store does not trust those certificates.
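The independent domain match described above can be illustrated with a minimal Python sketch. The category names come from the text, but the function name `classify_domain_match` and the exact rules for "www mismatch" and "relaxed wildcard" are assumptions made for illustration. They are one plausible reading, not necessarily the rules applied to the dataset.

```python
def classify_domain_match(hostname, cert_names):
    """Classify how a requested hostname relates to the names in a certificate.

    Hypothetical helper: the "www mismatch" and "relaxed wildcard" rules
    below are assumed definitions, not taken from the thesis.
    """
    host = hostname.lower().rstrip(".")
    names = [n.lower().rstrip(".") for n in cert_names]

    # Exact match: the hostname appears verbatim in the certificate.
    if host in names:
        return "exact"

    labels = host.split(".")
    wildcard_suffixes = [n[2:] for n in names if n.startswith("*.")]

    # Strict wildcard: "*" replaces exactly one left-most label.
    parent = ".".join(labels[1:])
    if len(labels) > 1 and parent in wildcard_suffixes:
        return "wildcard"

    # Assumed rule: the names differ only by a leading "www." label.
    www_variant = host[4:] if host.startswith("www.") else "www." + host
    if www_variant in names:
        return "www mismatch"

    # Assumed rule: "*" is allowed to span more than one label.
    if any(host.endswith("." + suffix) for suffix in wildcard_suffixes):
        return "relaxed wildcard"

    return "no match"
```

For example, under these assumed rules `classify_domain_match("a.b.example.com", ["*.example.com"])` falls into the relaxed wildcard category, because the wildcard would have to cover two labels.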

Trust between User and the Cipher Suite Negotiation: The trust relationship between the user and the cipher suite negotiation was evaluated by analyzing the negotiation of the TLS/SSL protocol version, the negotiation of the cipher suite with individual results for its component ciphers (key exchange, encryption, and MAC), the sizes of the offered cipher suite lists, and the downgrades made by servers when choosing from them. Regarding the SSL/TLS protocol version, the distribution is divided between TLSv10 (18.84%) and TLSv12 (80.76%), with a marginal presence of TLSv11 (0.32%) and the deprecated protocols SSLv3 (0.08%) and SSLv2 (< 0.001%). In most cases the server chooses the version offered by the client, but in approximately 2% of sessions the server downgrades to a lower version. The relatively insignificant share of TLSv11 is interesting and implies a trend of transitioning directly from TLSv10 to TLSv12, or possibly that devices that still only support TLSv10 did not upgrade to TLSv11 when it was released either. While the marginal presence of deprecated versions is good, those versions should not be present at all because of the security risks. In general, the observed sessions use computationally secure key exchange ciphers. The vast majority use elliptic curve Diffie-Hellman, either TLS_ECDHE_RSA (62.11%) or TLS_ECDHE_ECDSA (20.32%), both of which provide "perfect forward secrecy". There is also a significant share that does not, e.g. the third largest share, TLS_RSA (16.31%). TLS_RSA is more prevalent in TLSv10 (29.61%) than in TLSv12 (13.15%), and it is still the most frequently offered key exchange cipher (as part of more than six cipher suites per session). While elliptic curves have long been recommended by security experts (and still are), it should be noted that ECDHE-based solutions, in particular, have recently come under scrutiny due to the influence that the National Security Agency (NSA) of the United States has had on their design.
Furthermore, in 2015 alone, two attacks targeting the key exchange were discovered. The FREAK and Logjam attacks exploit weaknesses in TLS/SSL to downgrade sessions with servers that still support RSA-EXPORT and DHE-EXPORT grade ciphers, respectively. In our dataset, 428 instances of TLS_RSA_EXPORT and 27 instances of TLS_DHE_EXPORT were observed. Export grade key exchange ciphers are also still offered frequently, e.g. TLS_RSA_EXPORT is offered as part of a cipher suite in 9.50% of observed sessions.

While the majority of sessions use known secure ciphers, the most common being AES-128 and AES-256, we observed many instances where weaker ciphers were both offered and selected. Despite being prohibited from use, the RC4_128 cipher is still offered in 54.7% of all sessions and used in 7.64%. Other, even weaker ciphers also have a marginal presence, e.g. DES_CBC (0.00047%), RC4_40 (0.00019%), and DES40_CBC (0.000014%). 3DES variants are acceptable compatibility ciphers, and 3DES_EDE_CBC is used in 0.63% of all sessions. While we can conclude that the majority of sessions use known secure encryption ciphers, there is still a significant number of sessions with weak encryption. Compared with older measurements, however, the overall trend is positive.

Compared to key exchange and encryption ciphers, fewer concerns have been raised regarding the MACs used in TLS. The observed MACs are distributed between SHA256 (48.36%), SHA1 (30.98%), SHA384 (14.77%), and MD5 (5.89%). With SHA1 being cryptographically weakened, we should see a decline in its use. This is evidently the case, with SHA1 accounting for 90.48% of TLSv10 sessions and only 16.79% of TLSv12 sessions. Regarding MD5, it is no longer acceptable in situations where collision resistance is required, but it is not urgent to stop using MD5 in HMAC-MD5 schemes. In our data, MD5 is almost exclusively used together with the RC4 encryption cipher.

Throughout our dataset, clients offer large lists of cipher suites. In more than 99% of all sessions at least nine cipher suites are offered, and often substantially more. In the majority of cases at least 20 cipher suites are offered, with the most common list size being either 26 or 37. As these lists often include weak ciphers, these results suggest that clients often prioritize compatibility (by giving servers many options) over security. Unfortunately, servers typically do not pick the client's top candidate and sometimes perform substantial downgrades. The most preferred option is chosen in only 36% of all sessions, and in 20% of the cases an option outside the top 10 is selected. To gain further insight into the downgrades, we took a closer look at the position of the RC4 cipher when it was chosen by the server. In more than 80% of the cases it was outside the top 10 cipher suites offered, and in 60% of the cases it was outside the top 25. This suggests that the choice to use RC4 may be the result of poorly configured servers.
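The per-component results above (key exchange, encryption, and MAC) presuppose that each cipher suite name is split into its parts. The following is a minimal Python sketch of such a split, assuming the conventional `TLS_<key exchange>_WITH_<encryption>_<hash>` naming. The function names are hypothetical and this is not necessarily how the analysis was implemented.

```python
def split_cipher_suite(name):
    """Split 'TLS_<kx>_WITH_<enc>_<hash>' into (key exchange, encryption, MAC label).

    Illustrative helper only. For AEAD suites (GCM, CHACHA20_POLY1305) the
    trailing hash names the PRF hash rather than a separate MAC; it is
    returned here simply as the label found in the suite name. "SHA" in a
    suite name denotes SHA-1.
    """
    kx, sep, rest = name.partition("_WITH_")
    if not sep:
        # Signalling values such as TLS_EMPTY_RENEGOTIATION_INFO_SCSV
        # are not real cipher suites and have no components.
        raise ValueError(f"not a regular cipher suite name: {name}")
    enc, _, mac = rest.rpartition("_")
    return kx, enc, mac


def offers_forward_secrecy(kx):
    """Ephemeral (EC)DHE key exchanges provide forward secrecy; static RSA does not."""
    return kx.startswith(("TLS_ECDHE_", "TLS_DHE_"))


kx, enc, mac = split_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
print(kx, enc, mac, offers_forward_secrecy(kx))
```

Forward secrecy is independent of key strength: an export grade suite such as one using TLS_DHE_RSA_EXPORT is ephemeral but still weak, so a real analysis would need a separate check for export grade and other weak components.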

Conclusion: In conclusion, using the available parameters, we have provided relevant insight into the trustworthiness of the trust relationships affecting the security of online users. We have highlighted risks associated with the lack of adherence to best practices, including the slow phasing out of weak protocols and the similarly slow adoption of new versions. For example, while modern browsers may be quick with security updates and patches, users typically run versions that are far from the latest. There is a significant skew in the organizations signing certificates, with differences between regular leaf certificates and EV certificates suggesting substantial differences in websites' trust in different CAs. Wildcard and multi-domain certificates are very popular, suggesting that websites often prioritize convenience or cost over security. Many certificates are valid for extended durations and do not limit path lengths to prevent authority certificates from further delegating signing ability. We have also seen that some clients offer broken ciphers (e.g. RC4), and that servers sometimes choose them over better options.

6.2 Are There Any Significant Differences between the Security of Mobile and Stationary User Devices in HTTPS Communication?

Comparing the security of mobile and stationary clients was at first deemed difficult, because the relevant information is communicated in the encrypted part of the TLS session establishment. However, through a correlation scheme and a dispensation from Calgary regarding the use of IP addresses (which were only stored temporarily in a rolling window and were not part of the dataset), we obtained data that allowed us to compare mobile and stationary results in our dataset. In general, the per-session comparison between mobile and stationary users did not reveal many instances of significant discrepancy. However, there were a few interesting cases. For example:

Protocol Versions: Stationary devices offer lower protocol versions more often than mobile devices: 14.02% of stationary devices offer TLSv10, compared to 12.20% of mobile devices. Similarly, 0.43% of stationary devices offer SSLv2, compared to only 0.02% of mobile devices. These differences are not significant, but they show that mobile devices are slightly more up to date in their TLS protocol support.

Export Grade Key Exchange: Regarding export grade key exchange ciphers, we found that in the majority of cases mobile devices offer them more often than stationary devices. For example, TLS_RSA_EXPORT is offered with a frequency of 9.71% for mobile and 8.29% for stationary, and TLS_DHE_RSA_EXPORT/TLS_DHE_DSS_EXPORT are offered with a frequency of 4.22%/4.21% for mobile and 3.25%/3.25% for stationary. Again, the differences are not significant, but they are interesting regardless.

Encryption Ciphers: Perhaps the largest difference between mobile and stationary devices was observed in the encryption cipher category. The now deprecated encryption cipher RC4_128 is offered with a frequency of 329.22% per mobile session (i.e. on average as part of more than three offered cipher suites) and 251.52% per stationary session. Consequently, RC4_128 is used in 9.28% of mobile sessions and 6.92% of stationary sessions. Furthermore, even weaker ciphers like DES_CBC and RC4_40 are offered with a frequency of 37.02% and 4.33%, respectively, per mobile session, compared to only 25.64% and 3.46% per stationary session.
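An offered frequency above 100% arises because a single session can offer several cipher suites that contain the same cipher. As a minimal sketch of one way to compute such a per-session frequency (the function name and the toy sessions are hypothetical, and matching by substring is a simplifying assumption):

```python
def offer_frequency(sessions, component):
    """Average number of offered suites containing `component`, as percent per session.

    `sessions` is a list of offered cipher suite lists, one list per session.
    A value above 100 means the component appears, on average, in more than
    one offered suite per session.
    """
    if not sessions:
        return 0.0
    hits = sum(
        sum(1 for suite in offered if component in suite)
        for offered in sessions
    )
    return 100.0 * hits / len(sessions)


# Toy data for illustration only, not taken from the dataset.
toy_sessions = [
    ["TLS_RSA_WITH_RC4_128_SHA", "TLS_RSA_WITH_RC4_128_MD5", "TLS_RSA_WITH_AES_128_CBC_SHA"],
    ["TLS_ECDHE_RSA_WITH_RC4_128_SHA"],
]
print(offer_frequency(toy_sessions, "RC4_128"))  # 150.0
```

In the toy example, three RC4_128 suites are offered across two sessions, which gives 150%.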

MACs: For MACs, we interestingly found that mobile devices were observed more frequently with both the most secure MAC, SHA384 (16.49%), and the least secure, MD5 (7.43%), compared to stationary devices (for which the corresponding shares were 13.83% and 5.38%, respectively).

Offered Cipher Suite List: Throughout the dataset, the number of offered cipher suites is noticeably higher for mobile devices than for stationary devices. There is also a noticeable difference in downgrades: a less preferred option is generally chosen more often for mobile devices than for stationary devices.
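A downgrade of this kind can be quantified by the position of the chosen suite in the client's offered list, which TLS clients send in order of preference. The following is a minimal Python sketch with hypothetical function names and toy inputs. It treats depth 0 as the client's top choice, which is an assumption about how the downgrade is counted.

```python
def downgrade_depth(offered, chosen):
    """Number of more-preferred suites the server skipped (0 = client's top choice).

    Raises ValueError if the server chose a suite the client did not offer.
    """
    return offered.index(chosen)


def cumulative_share(depths, limit):
    """Percentage of sessions whose downgrade depth is at most `limit`."""
    if not depths:
        return 0.0
    return 100.0 * sum(1 for d in depths if d <= limit) / len(depths)


# Toy depths for illustration only, not taken from the dataset.
toy_depths = [0, 0, 3, 12]
print(cumulative_share(toy_depths, 0))   # 50.0
print(cumulative_share(toy_depths, 10))  # 75.0
```

Computing the cumulative share separately for mobile and stationary sessions gives two curves that can be compared directly.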

Conclusion: In conclusion, we are convinced that our comparisons across the whole dataset show, with a high degree of certainty, that there are no significant differences between the security of mobile and stationary user devices in HTTPS communication. There are, however, several cases where the data diverge noticeably, though not significantly enough to warrant a "yes" to this research question.

6.3 What Is the Quality of the Actual Security that Typical Users Experience when Accessing the Internet Using HTTPS?

The quality of the actual security is an all-encompassing term, meaning that all factors related to the security of an HTTPS session need to be considered. In many cases a vulnerability in a single factor compromises the security of the whole session. For this reason we designed a four-level classification (Weak, Acceptable, Good, and Strong), where each level above "Weak" has a set of requirements that a session needs to fulfill to be considered for that level. Any session that did not fulfill all the requirements for "Acceptable", "Good", or "Strong" was considered "Weak". The "Strong" class is meant to illustrate where we observed the upper bound of session security in practice to be.
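The structure of such a classification can be sketched in Python as follows. Only the four level names and the rule that a session failing every requirement set is "Weak" come from the text. The concrete requirements and session fields below are hypothetical placeholders chosen for illustration and are not the criteria used in this thesis.

```python
# Hypothetical placeholder criteria; NOT the thesis's actual requirements.
WEAK_CIPHERS = {"RC4_128", "RC4_40", "DES_CBC", "DES40_CBC"}

REQUIREMENTS = {
    "Strong": [
        lambda s: s["version"] == "TLSv12",
        lambda s: s["forward_secrecy"],
        lambda s: s["cipher"] == "AES_256_GCM",
    ],
    "Good": [
        lambda s: s["version"] == "TLSv12",
        lambda s: s["cipher"] not in WEAK_CIPHERS,
    ],
    "Acceptable": [
        lambda s: s["version"] in ("TLSv10", "TLSv11", "TLSv12"),
        lambda s: s["cipher"] not in WEAK_CIPHERS,
    ],
}


def classify_session(session):
    """Return the highest level whose requirements are all met, else 'Weak'."""
    for level in ("Strong", "Good", "Acceptable"):
        if all(check(session) for check in REQUIREMENTS[level]):
            return level
    return "Weak"


print(classify_session({"version": "TLSv12", "forward_secrecy": True, "cipher": "RC4_128"}))
```

Checking the levels from strongest to weakest means a single failed requirement at every level is enough to classify the session as "Weak", which mirrors the observation that one vulnerable factor can compromise the whole session.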

Result: The session quality evaluation showed that a significant share of sessions (27.3%) were classified as "Weak" and 18.9% as "Acceptable". The majority of sessions (53.8%) were classified as "Good", and a negligible share (0.03%) as "Strong".

Stationary vs Mobile: In line with the other measurements, the session quality evaluation did not show any significant differences between sessions with mobile and stationary client devices. Mobile sessions have a small but noticeably larger share of "Weak" (27.75% compared to 26.90%) and "Acceptable" (19.45% compared to 18.38%) sessions than stationary sessions.


Conclusion: In conclusion, we are convinced that the session-based four-level classification adequately answers the question of what security quality typical users experience when accessing the Internet. We debated whether to make a more elaborate classification with more levels and labels, but in the end we decided that fewer, easily distinguishable labels better conveyed the result for this research question. Our per-session quality evaluation showed that a significant portion of sessions have weak security quality. These results highlight the fact that many browsers and servers prioritize user experience over security.

6.4 Future Work

Combining Passive with Active Monitoring: Observing and analyzing communication systems in practice is a never-ending task that should be repeated regularly. During this thesis we had to limit the scope of the work, and we found that there were cases where access to active monitoring data in addition to the passive data would have been beneficial. For example, active measurements on all servers observed during the passive measurement could give insight into the operation of those servers and pinpoint which of them are badly configured. Alternatively, they could show that the browsers are actually at fault.

Methods for Ensuring Faster Transition of Ciphers and Protocols: One of the main conclusions of this thesis is the slow phasing out of weak protocols and the slow adoption of new TLS versions. It would be interesting to investigate ideas for ensuring faster transitions. One idea would be to have an indicator in the browser that signals to the user the security quality of the connection. This would incentivize server administrators, especially those of information-sensitive services, to make sure their websites conform to a good security standard. The metric used for evaluating the security would, however, need to be continuously updated.

Identify other Types of Clients: For this thesis, we limited the analysis to identifying browser versions and distinguishing between mobile and stationary user devices. In a future analysis, it would be interesting to identify other types of clients (e.g., HTTPS libraries in various programming languages).

Classify Sessions based on Business Type and Compare the Security: In this thesis, we compared the security of mobile and stationary user devices. It would be interesting to compare the security of different types of businesses (e.g., finance, e-commerce, banking, cloud services, entertainment, media, ...) and see whether certain business domains take security measures statistically more seriously than others.

59 Bibliography

[1] David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, et al. “Imperfect forward secrecy: How Diffie-Hellman fails in practice”. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 2015. [2] Nadhem J AlFardan, Daniel J Bernstein, Kenneth G Paterson, Bertram Poettering, and Jacob CN Schuldt. “On the Security of RC4 in TLS.” In: Proceedings of the Usenix Security. 2013. [3] Christopher Allen and Tim Dierks. “The TLS protocol version 1.0”. In: RFC 2246, IETF (1999). [4] Hadi Asghari, Michel Van Eeten, Axel Arnbak, and Nico ANM van Eijk. “Security eco- nomics in the HTTPS value chain”. In: Twelfth Workshop on the Economics of Information Security (WEIS 2013), Washington, DC. 2013. [5] Elaine Barker and Quynh Dang. “Recommendation for Part 1: Gen- eral Revision 4”. In: NIST Special Publication 800-57 (Jan. 2016). URL: https : / / nvlpubs . nist . gov / nistpubs / SpecialPublications / NIST . SP . 800 - 57pt1r4.pdf. [6] Elaine Barker and Allen Roginsky. “Transitions: Recommendation for Transitioning the Use of Cryptographic Algorithms and Key Lengths”. In: NIST Special Publication 800- 131A (Jan. 2011). URL: http://csrc.nist.gov/publications/nistpubs/800- 131A/sp800-131A.pdf. [7] Kurt Baumgarter. TURKTRUST CA Problems. Incident report. 2013. URL: https:// securelist.com/blog/incidents/34893/turktrust-ca-problems-21/. [8] Jon Callas, Lutz Donnerhacke, Hal Finney, and Rodney Thayer. “OpenPGP message format”. In: RFC 2440, IETF (1998). [9] David Cooper, Stefan Santesson, Stephen Farrell, Sharon Boeyen, Rusell Housley, and William Polk. “Internet X. 509 public key infrastructure certificate and certificate revo- cation list (CRL) profile”. In: RFC 5280, IETF (2008). [10] Tim Dierks and Eric Rescorla. “The transport layer security (TLS) protocol version 1.1”. In: RFC 4346, IETF (2006).

60 Bibliography

[11] Tim Dierks and Eric Rescorla. “The transport layer security (TLS) protocol version 1.2”. In: RFC 5246, IETF (2008). [12] Zakir Durumeric, James Kasten, Michael Bailey, and J Alex Halderman. “Analysis of the HTTPS certificate ecosystem”. In: Proceedings of the Internet measurement conference. 2013. [13] Scott Fluhrer, Itsik Mantin, and Adi Shamir. “Weaknesses in the key scheduling algo- rithm of RC4”. In: Proceedings of the International Workshop on Selected Areas in Cryptog- raphy. 2001. [14] Alan Freier, Philip Karlton, and Paul Kocher. “The secure sockets layer (SSL) protocol version 3.0”. In: RFC 6101, IETF (2011). [15] Josef Gustafsson, Gustaf Ouvrier, Martin Arlitt, and Niklas Carlsson. “A first look at the CT landscape: Certificate Transparency logs in practice”. In: Proceedings of the Inter- national Conference on Passive and Active Network Measurement. 2017. [16] Ralph Holz, Lothar Braun, Nils Kammenhuber, and Georg Carle. “The SSL landscape: a thorough analysis of the x. 509 PKI using active and passive measurements”. In: Pro- ceedings of the Internet measurement conference. 2011. [17] Takanori Isobe, Toshihiro Ohigashi, Yuhei Watanabe, and Masakatu Morii. “Full plain- text recovery attack on broadcast RC4”. In: Proceedings of the International Workshop on Software Encryption. 2013. [18] Hugo Krawczyk, Ran Canetti, and Mihir Bellare. “HMAC: Keyed-hashing for message authentication”. In: RFC 2104, IETF (1997). [19] Adam Langley, Alfredo Pironti, Richard Barnes, and Martin Thomson. “Deprecating Secure Sockets Layer Version 3.0”. In: RFC 7568, IETF (2015). [20] , Adam Langley, and Emilia Kasper. “Certificate transparency”. In: RFC 6962, IETF (2013). [21] David Naylor, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger, Marco Mellia, Maurizio Munafò, Konstantina Papagiannaki, and Peter Steenkiste. “The cost of the s in HTTPS”. In: Proceedings of the ACM International on Conference on emerging Networking Experiments and Technologies. 2014. 
[22] Toshihiro Ohigashi, Takanori Isobe, Yuhei Watanabe, and Masakatu Morii. “How to recover any byte of plaintext on RC4”. In: Proceedings of the International Conference on Selected Areas in Cryptography. 2013. [23] Gustaf Ouvrier, Michel Laterman, Martin Arlitt, and Niklas Carlsson. “Characterizing the HTTPS trust landscape: a passive view from the edge”. In: IEEE Communications Magazine 55.7 (2017), pp. 36–42. [24] Andrey Popov. “Prohibiting RC4 Cipher Suites”. In: RFC 7465, IETF (2015). [25] Eric Rescorla. “The Transport Layer Security (TLS) Protocol Version 1.3”. In: RFC 8446, IETF (2018). [26] Eric Rescorla, Marsh Ray, Steve Dispensa, and Nasko Oskov. “Transport layer security (TLS) renegotiation indication extension”. In: RFC 5746, IETF (2010). [27] P. Saint-Andre, S. Santesson, and J. Hodges. “Representation and Verification of Domain-Based Application Service Identity within Internet Public Key Infrastructure Using X.509 (PKIX) Certificates in the Context of Transport Layer Security (TLS)”. In: RFC 6125, IETF (2011). [28] S Santesson, M Myers, R Ankney, A Malpani, S Galperin, and C Adams. “X. 509 internet public key infrastructure online certificate status protocol-ocsp”. In: RFC 6960, IETF (2013).

61 Bibliography

[29] Ryan Sleevi. “Sustaining digital certificate security”. In: Google blog post: https://googleonlinesecuritys. blogspot. com/2015/12/sustaining-digital-certificate-security. 28 (2015). [30] Alexander Sotirov, Marc Stevens, Jacob Appelbaum, Arjen K Lenstra, David Molnar, Dag Arne Osvik, and Benne de Weger. “MD5 considered harmful today, creating a rogue CA certificate”. In: 25th Annual Chaos Communication Congress. CONF. 2008. [31] The Zeek Network Security Monitor. URL: https://www.zeek.org. [32] Sean Turner and Tim Polk. “Prohibiting secure sockets layer (SSL) version 2.0”. In: RFC 6176, IETF (2011). [33] Peter Yee. “Updates to the Internet X. 509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile”. In: RFC 6818, IETF (2013).

A Appendix

This chapter presents complementary tables with additional data points beyond the summary tables in prior sections, or the exact data points used for some of the key figures in prior sections.

Table A.1: Protocol versions offered/used. Data points for Figure 4.4.

                                Dataset
Protocol Version      Total     Mobile    Stationary
SSLv2     Offered     0.59%     0.16%     0.43%
          Used        0.00%     0.00%     0.00%
SSLv3     Offered     0.27%     0.03%     0.05%
          Used        0.08%     0.03%     0.05%
TLSv10    Offered     16.13%    12.20%    14.02%
          Used        18.84%    16.24%    17.93%
TLSv11    Offered     0.30%     0.28%     0.25%
          Used        0.32%     0.31%     0.28%
TLSv12    Offered     82.72%    87.32%    85.26%
          Used        80.76%    83.42%    81.74%

Table A.2: Cipher suite used. Data points for Figure 4.5.

#   Cipher Suite                                    Total    Mobile   Stationary  TLSv10   TLSv12
1   TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256           25.17%   25.03%   27.16%      0.00%    31.17%
2   TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256         14.77%   15.16%   14.22%      0.00%    18.29%
3   TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384           10.00%   10.48%   9.42%       0.02%    12.38%
4   TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA              8.91%    9.99%    8.62%       25.92%   4.86%
5   TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA              8.43%    6.33%    8.10%       30.05%   3.33%
6   TLS_RSA_WITH_RC4_128_MD5                        5.89%    7.43%    5.38%       9.50%    5.04%
7   TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384           4.40%    5.78%    3.93%       0.00%    5.45%
8   TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA            3.99%    4.33%    3.91%       8.28%    2.88%
9   TLS_RSA_WITH_AES_256_CBC_SHA                    3.72%    2.89%    4.06%       8.34%    2.61%
10  TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256           3.44%    3.80%    3.36%       0.00%    4.26%
11  TLS_RSA_WITH_AES_128_CBC_SHA                    2.57%    1.62%    2.87%       7.36%    1.45%
12  TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256   1.50%    1.21%    1.61%       0.00%    1.86%
13  TLS_RSA_WITH_AES_256_CBC_SHA256                 1.08%    0.52%    1.00%       0.00%    1.34%
14  TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256     0.98%    0.98%    1.06%       0.00%    1.21%
15  TLS_RSA_WITH_RC4_128_SHA                        0.97%    0.86%    0.83%       2.94%    0.48%

Table A.3: Cipher suite offered. Data points for Figure 4.6.

#   Cipher Suite                                    Total    Mobile   Stationary  TLSv10   TLSv12
1   TLS_RSA_WITH_AES_128_CBC_SHA                    99.97%   99.20%   99.65%      78.46%   104.56%
2   TLS_RSA_WITH_AES_256_CBC_SHA                    99.82%   98.87%   99.35%      77.30%   104.54%
3   TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA              98.62%   98.49%   98.47%      72.90%   104.59%
4   TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA              98.24%   97.73%   98.22%      72.61%   104.20%
5   TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA            97.33%   97.81%   97.51%      67.57%   104.24%
6   TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA            97.27%   97.79%   97.45%      67.30%   104.23%
7   TLS_RSA_WITH_3DES_EDE_CBC_SHA                   91.56%   86.84%   92.26%      71.61%   95.44%
8   TLS_DHE_RSA_WITH_AES_128_CBC_SHA                70.64%   71.58%   71.12%      48.25%   75.18%
9   TLS_DHE_RSA_WITH_AES_256_CBC_SHA                70.38%   71.32%   70.89%      47.24%   75.16%
10  TLS_EMPTY_RENEGOTIATION_INFO_SCSV               69.08%   81.75%   64.44%      44.86%   74.14%
11  TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256         68.89%   67.34%   70.76%      5.25%    84.07%
12  TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256           62.62%   64.10%   63.74%      5.25%    76.30%
13  TLS_RSA_WITH_RC4_128_SHA                        61.62%   69.49%   57.53%      71.85%   58.39%
14  TLS_RSA_WITH_AES_128_GCM_SHA256                 61.50%   63.61%   61.37%      5.16%    74.94%
15  TLS_RSA_WITH_RC4_128_MD5                        54.59%   58.71%   51.35%      70.00%   50.12%

Table A.4: Offered key exchange algorithms. Full version of Table 4.14.

#   Cipher                          Total       Mobile      Stationary  TLSv10      TLSv12
1   TLS_RSA                         626.55%     651.51%     607.03%     442.41%     665.26%
2   TLS_ECDHE_ECDSA                 515.91%     560.64%     502.51%     249.63%     578.79%
3   TLS_ECDHE_RSA                   506.27%     555.10%     490.55%     265.63%     563.13%
4   TLS_DHE_RSA                     334.38%     343.65%     329.29%     194.05%     364.56%
5   TLS_ECDH_ECDSA                  169.97%     205.84%     151.89%     145.56%     175.21%
6   TLS_ECDH_RSA                    169.96%     205.85%     151.90%     145.64%     175.19%
7   TLS_DHE_DSS                     160.96%     137.96%     154.37%     188.88%     151.02%
8   TLS_DH_RSA                      29.43%      37.75%      24.29%      0.41%       36.16%
9   TLS_DH_DSS                      29.43%      37.75%      24.29%      0.41%       36.16%
10  TLS_RSA_EXPORT                  9.50%       9.71%       8.29%       36.20%      1.71%
11  TLS_DHE_RSA_EXPORT              3.76%       4.22%       3.25%       15.17%      0.64%
12  TLS_DHE_DSS_EXPORT              3.75%       4.21%       3.25%       15.13%      0.64%
13  TLS_SRP_SHA_RSA                 2.12%       2.53%       1.45%       5.77%       1.20%
14  TLS_SRP_SHA_DSS                 2.12%       2.53%       1.45%       5.77%       1.20%
15  TLS_ECDH_ANON                   2.09%       0.12%       0.24%       8.52%       0.48%
16  TLS_DH_ANON                     0.70%       0.049%      0.26%       0.032%      0.85%
17  SSLv20_CK_RC4_128               0.55%       0.14%       0.39%       < 10^-7%    < 10^-7%
18  SSLv20_CK_DES_192_EDE3_CBC      0.54%       0.14%       0.39%       < 10^-7%    < 10^-7%
19  TLS_PSK                         0.43%       0.47%       0.38%       0.013%      0.52%
20  SSLv20_CK_RC4_128_EXPORT40      0.42%       0.13%       0.38%       < 10^-7%    < 10^-7%
21  SSLv20_CK_DES_64_CBC            0.42%       0.13%       0.38%       < 10^-7%    < 10^-7%
22  TLS_RSA_EXPORT1024              0.39%       0.095%      0.28%       1.46%       0.0000011%
23  SSL_RSA_FIPS                    0.31%       0.14%       0.33%       0.96%       0.017%
24  SSLv20_CK_RC2_128_CBC           0.28%       0.045%      0.27%       < 10^-7%    < 10^-7%
25  SSLv20_CK_RC2_128_CBC_EXPORT40  0.27%       0.040%      0.27%       < 10^-7%    < 10^-7%
26  TLS_SRP_SHA                     0.19%       0.0077%     0.076%      0.018%      0.20%
27  TLS_DHE_DSS_EXPORT1024          0.14%       0.038%      0.13%       0.69%       < 10^-7%
28  TLS_DH_ANON_EXPORT              0.12%       0.013%      0.045%      0.0042%     0.14%
29  SSLv20_CK_IDEA_128_CBC          0.039%      0.019%      0.033%      < 10^-7%    < 10^-7%
30  TLS_DHE_PSK                     0.017%      0.023%      0.013%      0.00041%    0.021%
31  TLS_RSA_PSK                     0.015%      0.021%      0.012%      0.00035%    0.019%
32  TLS_ECDHE_PSK                   0.0100%     0.014%      0.0075%     0.00029%    0.012%
33  TLS_DH_RSA_EXPORT               0.0068%     0.0090%     0.0055%     0.000077%   0.0082%
34  TLS_DH_DSS_EXPORT               0.0067%     0.0089%     0.0054%     0.000087%   0.0082%
35  TLS_NULL                        0.0020%     0.00045%    0.0028%     0.010%      0.0000016%
36  TLS_PSK_DHE                     0.0015%     0.0020%     0.0011%     0.000035%   0.0018%
37  TLS_KRB5                        0.0011%     < 10^-7%    0.0015%     0.0057%     0.000016%
38  TLS_KRB5_EXPORT                 0.00095%    < 10^-7%    0.0010%     0.0038%     0.00025%
39  TLS_GOSTR341001                 0.00030%    0.00039%    0.00029%    0.00010%    0.00034%
40  SSL_RSA                         0.000078%   < 10^-7%    0.00012%    0.00013%    < 10^-7%
41  TLS_GOSTR341094                 0.000032%   < 10^-7%    < 10^-7%    0.000068%   0.0000082%
42  SSL_FORTEZZA_KEA                0.000023%   < 10^-7%    < 10^-7%    0.000049%   0.0000038%

Table A.5: List size and downgrades. The majority of the data points for Figure 4.7.

List Size Offered by Client
#     Mobile    Stationary
1     100.00%   100.00%
2     100.00%   100.00%
3     99.93%    99.95%
4     99.89%    99.90%
5     99.89%    99.90%
6     99.88%    99.89%
7     99.77%    99.87%
8     99.25%    99.19%
9     99.24%    99.14%
10    99.14%    99.06%
11    95.97%    90.40%
12    93.68%    84.31%
13    93.28%    83.60%
14    91.42%    82.20%
15    90.31%    81.52%
16    86.08%    74.44%
17    73.73%    57.53%
20    70.69%    55.17%
21    70.41%    54.66%
22    62.59%    50.31%
24    59.74%    47.67%
25    59.14%    47.03%
26    35.96%    26.87%
27    35.54%    26.33%
28    34.88%    25.81%
29    34.65%    25.55%
31    34.54%    25.44%
33    33.82%    24.98%
35    30.45%    22.99%
37    7.31%     6.09%
38    6.90%     5.77%
41    6.86%     5.74%
43    6.58%     5.65%
44    6.45%     5.58%
46    6.14%     5.28%
48    6.07%     5.23%
49    5.57%     4.83%
50    5.51%     4.79%
54    5.44%     4.74%
60    5.35%     4.21%
66    5.02%     3.35%
69    4.80%     3.14%
75    0.22%     0.27%
76    0.17%     0.23%
80    0.05%     0.08%
87    0.04%     0.07%
91    0.01%     0.03%
98    0.00%     0.02%
322   0.00%     0.00%

Downgrade by Server
#     Mobile    Stationary
0     15.02%    18.12%
1     26.89%    37.71%
2     37.41%    44.68%
3     38.89%    47.10%
4     43.61%    52.79%
5     44.62%    54.59%
6     50.14%    58.98%
7     55.40%    63.77%
8     62.34%    69.16%
9     73.71%    77.09%
10    76.88%    79.77%
11    78.39%    81.14%
12    79.49%    82.74%
13    81.30%    85.10%
14    82.98%    86.68%
15    84.13%    88.04%
16    84.31%    88.33%
17    84.50%    88.53%
20    86.26%    89.73%
21    86.63%    90.36%
22    86.67%    90.42%
24    87.08%    90.86%
25    90.92%    93.52%
26    91.20%    93.79%
27    91.36%    93.92%
28    95.26%    96.51%
29    95.74%    96.85%
31    95.97%    97.10%
33    96.63%    97.63%
35    96.80%    97.76%
37    99.87%    99.85%
38    99.87%    99.85%
41    99.89%    99.87%
43    99.90%    99.87%
44    99.90%    99.87%
46    99.90%    99.89%
48    99.94%    99.93%
49    99.94%    99.93%
50    99.94%    99.93%
54    99.94%    99.95%
60    99.95%    99.95%
66    99.99%    99.99%
69    100.00%   100.00%
78    100.00%   100.00%
79    100.00%   100.00%
93    100.00%   100.00%

Table A.6: RC4 cipher position when chosen by server. Data points for Figure 4.8.

RC4 position when chosen
Position  Mobile    Stationary
1         0.00%     0.01%
2         0.21%     0.87%
3         0.53%     2.25%
4         0.53%     2.39%
6         24.97%    2.39%
7         25.01%    5.84%
8         25.01%    5.84%
9         25.01%    5.85%
11        25.06%    5.92%
12        29.40%    17.42%
13        30.01%    17.44%
14        30.04%    17.46%
15        30.11%    18.64%
16        30.31%    19.05%
18        30.45%    19.38%
19        30.65%    20.77%
20        31.32%    21.57%
21        32.91%    23.12%
24        32.91%    23.12%
25        32.91%    23.88%
26        61.47%    48.95%
27        61.76%    52.04%
28        61.83%    52.35%
29        61.83%    52.64%
30        65.35%    66.65%
31        71.08%    68.31%
33        72.84%    69.07%
34        72.84%    69.21%
35        72.91%    69.25%
36        72.91%    69.32%
37        99.22%    97.88%
40        99.50%    98.05%
44        99.50%    98.05%
45        99.50%    98.47%
50        99.50%    98.48%
56        99.50%    98.48%
58        99.50%    98.48%
60        99.52%    98.48%
64        99.52%    99.89%
68        99.61%    99.89%
75        100.00%   99.93%
86        100.00%   99.94%
