Measuring and Applying Invalid SSL Certificates: The Silent Majority

Taejoong Chung∗, Yabing Liu∗, David Choffnes∗, Dave Levin†, Bruce M. Maggs‡, Alan Mislove∗, Christo Wilson∗
∗Northeastern University  †University of Maryland  ‡Duke University and Akamai Technologies

ABSTRACT
SSL and TLS are used to secure the most commonly used Internet protocols. As a result, the ecosystem of SSL certificates has been thoroughly studied, leading to a broad understanding of the strengths and weaknesses of the certificates accepted by most web browsers. Prior work has naturally focused almost exclusively on "valid" certificates (those that standard browsers accept as well-formed and trusted) and has largely disregarded certificates that are otherwise "invalid." Surprisingly, however, this leaves the majority of certificates unexamined: we find that, on average, 65% of SSL certificates advertised in each IPv4 scan that we examine are actually invalid.

In this paper, we demonstrate that despite their invalidity, much can be understood from these certificates. Specifically, we show why the web's SSL ecosystem is populated by so many invalid certificates, where they originate, and how they impact security. Using a dataset of over 80M certificates, we determine that most invalid certificates originate from a few types of end-user devices and possess dramatically different properties than their valid counterparts. We find that many of these devices periodically reissue their (invalid) certificates, and we develop new techniques that allow us to track these reissues across scans. We present evidence that this technique allows us to uniquely track over 6.7M devices. Taken together, our results open up a heretofore largely ignored portion of the SSL ecosystem to further study.

1. INTRODUCTION
Secure Sockets Layer (SSL) and Transport Layer Security (TLS)¹ are responsible for securing Internet traffic for a variety of common protocols (HTTP, SMTP, IMAP, etc.). Coupled with a Public Key Infrastructure (PKI), SSL provides authenticated identities via certificate chains and private communication via encryption.

The web's SSL certificate ecosystem has been studied extensively [13, 14, 17, 25, 30, 53], with the broad goal of better understanding how resilient websites and browsers are to attacks on end-to-end authentication and confidentiality. As such, the vast majority of these studies (we discuss some exceptions in §3) naturally focus on the valid certificates found on the web, that is, the certificates that are well-formed, are within their validity periods, have a certificate chain that verifies at each level, and are rooted in a widely trusted set of root certificates [8]. The prior studies focused almost exclusively on valid certificates because, after all, if a certificate is not valid, one cannot confidently attribute it to the websites under study.

In this paper, we take another look at the SSL certificate ecosystem, focusing not only on the valid certificates, but also on the invalid ones. Using a dataset of over 80M certificates collected from 222 full IPv4 scans over three years, we find, surprisingly, that almost 88% of the certificates we observe across all these IPv4-wide scans are invalid. In other words, most prior studies of the SSL certificate ecosystem have focused on a mere 12% of the overall space of SSL certificates.
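The validity criteria listed above (well-formed, within the validity period, a chain that verifies at each level, rooted in a trusted root) can be made concrete with a small classifier. What follows is a minimal sketch, not the authors' code. `Cert` and `classify` are hypothetical names. A record with a precomputed `signature_ok` flag stands in for real X.509 parsing and cryptographic signature checks, and well-formedness is simply assumed. The sketch also separates the two kinds of untrusted chains that the paper counts separately: a lone self-signed leaf, and a chain that ends at some other certificate that is not trusted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Cert:
    """Hypothetical, simplified stand-in for a parsed X.509 certificate."""
    fingerprint: str           # unique identifier for this certificate
    subject: str               # subject name (e.g., the Common Name)
    issuer: str                # subject name of the certificate that signed it
    not_before: datetime       # start of the validity period
    not_after: datetime        # end of the validity period
    signature_ok: bool = True  # assumed precomputed: does the signature verify
                               # under the issuer's public key?


def classify(chain: List[Cert], trusted_roots: Set[str],
             now: Optional[datetime] = None) -> str:
    """Label a chain ordered leaf first, root last.

    `trusted_roots` holds fingerprints of trusted root certificates.
    If `now` is None, validity periods are not checked at all.
    """
    if not chain:
        return "invalid: empty chain"
    # Every signature in the chain, including the root's self-signature
    # (if any), must verify.
    if not all(cert.signature_ok for cert in chain):
        return "invalid: bad signature"
    # Each certificate must be issued by the next one up the chain.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return "invalid: broken chain"
    # Optionally require every certificate to be within its validity period.
    if now is not None and not all(c.not_before <= now <= c.not_after for c in chain):
        return "invalid: outside validity period"
    root = chain[-1]
    if root.fingerprint in trusted_roots:
        return "valid"
    # Untrusted root: distinguish a lone self-signed leaf from a chain
    # that ends at some other, untrusted certificate.
    if len(chain) == 1 and root.issuer == root.subject:
        return "invalid: self-signed"
    return "invalid: untrusted root"


if __name__ == "__main__":
    start, end = datetime(2013, 1, 1), datetime(2016, 1, 1)
    ca = Cert("ca-fp", "Example Root CA", "Example Root CA", start, end)
    site = Cert("site-fp", "www.example.com", "Example Root CA", start, end)
    device = Cert("dev-fp", "192.168.1.1", "192.168.1.1", start, end)
    trusted = {"ca-fp"}
    print(classify([site, ca], trusted))  # valid
    print(classify([device], trusted))    # invalid: self-signed
    print(classify([site, ca], trusted, now=datetime(2017, 1, 1)))  # invalid: outside validity period
```

Calling `classify` with `now=None` mirrors the paper's choice to ignore expiry across a multi-year dataset. A real validator would also check revocation, key usage, and whether the subject matches the requested name. This sketch omits all of those checks.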
The broad goal of this paper is to understand why so much of the web's PKI consists of invalid certificates, to evaluate where these certificates originate, to understand the security implications of these certificates, and to demonstrate the value in this long-overlooked portion of the certificate ecosystem.

¹ TLS is the successor of SSL, but both use the same certificates. We refer to "SSL certificates," but our findings apply equally to both.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IMC 2016, November 14–16, 2016, Santa Monica, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-4526-2/16/11. $15.00
DOI: http://dx.doi.org/10.1145/2987443.2987454

We make the following contributions. First, we perform a study of all invalid SSL certificates collected from full IPv4 port 443 scans over three years. We find that invalid certificates differ considerably from their valid counterparts in terms of validity periods, lifetimes, expiration dates, and sharing of public keys.

Second, we evaluate the origins of these invalid certificates and find that they largely originate from users' end-devices, including wireless access points, printers, VoIP phones, and cable modems (for which the certificates are used to enable "secure" remote administration). Moreover, we find that invalid certificates tend to be advertised by many fewer hosts, and that they are advertised in a very different portion of the IP address space than valid certificates.

Third, we find that many of these devices periodically reissue new (invalid) certificates, and we evaluate the behavior of such reissues. Tracking reissues of invalid certificates proves to be considerably more difficult than for valid ones, as invalid certificates often have non-unique Common Names or modify their Common Name on each reissue (whereas a valid website generally maintains its domain name as its Common Name in all of its certificates). We develop a set of techniques that allow us to link multiple invalid certificates that are likely to come from a single device.

Fourth, applying our techniques to link together different certificates, we demonstrate that invalid certificates can be used as a means to track millions of user devices as they change IP addresses. Our techniques offer a complementary view to those provided by other device-tracking schemes [1, 2, 40, 47] in that our techniques do not require us to recruit users and can be performed at scale; we show that we are able to track 6.7M unique end-user devices for over a year.

We make all of our code and data publicly available to the research community at https://securepki.org.

The remainder of this paper is organized as follows: we provide background in §2 and an overview of related work in §3. We describe our dataset and methodology in §4. In §5, we evaluate the properties of the invalid certificates and compare them to those of valid certificates. We present and evaluate our methodology for detecting reissues of invalid certificates in §6. In §7, we explore using our techniques to track end-user devices, and we conclude in §8.

2. BACKGROUND
SSL and TLS now secure the vast majority of online communication.

[…] certificates, starting from a root certificate through zero or more intermediate certificates, to a leaf certificate. Each certificate is signed with the private key corresponding to the certificate in the higher level, except the self-signed root certificate. When a client connects to an HTTPS-secured site, it must verify that the certificate advertised by the server is valid.

On the Internet, X.509 [8] is the most commonly used certificate management standard. X.509 certificates typically include a subject and public key, a serial number (unique for the issuer), a validity period, acceptable usage of the key, and ways to check whether the certificate has been revoked [30].

Invalid certificates. The X.509 RFC [8] defines a certificate as invalid if a client is unable to validate it at some point in time. There are multiple reasons why a client could find a certificate invalid: it could be outside of its validity period, it could have been revoked by its CA, its subject could be incorrect, its signature could be wrong, and so on. Because our dataset spans years (§4), we define a certificate as invalid if no client with a standard set of root certificates would ever be able to validate it (i.e., we ignore expiry warnings). The most common cause of invalidity that we have observed is a certificate signed by an unknown or untrusted root; if the client does not trust the root of a certificate chain, it transitively does not trust the rest of the chain. Specifically, in our dataset, we found that 88.0% of invalid certificates are self-signed (i.e., the root of the chain is the leaf certificate itself) and a further 11.99% are signed by a different, untrusted certificate (i.e., the root of the chain is some other certificate that is not in the set of trusted root certificates).²

Internet-connected devices. Internet-connected devices are widely popular today, including end-user routers, printers, cable/DSL modems, IP cameras, VoIP telephones, thermostats, and network-attached storage devices. Many of these devices provide a web server to allow end users to access and manage the device. A recent trend is to enable both HTTP and HTTPS versions of this web server; devices that do so need an SSL certificate for the HTTPS site.

While some devices allow users to upload a certificate, we find that most generate and use an invalid certificate by default. There are several reasons for this behavior. First, until recently [31], obtaining valid SSL certifi-
