The Tangled Web of Password Reuse
Total Page:16
File Type:pdf, Size:1020Kb
The Tangled Web of Password Reuse Anupam Das∗, Joseph Bonneauy, Matthew Caesar∗, Nikita Borisov∗ and XiaoFeng Wangz ∗University of Illinois at Urbana-Champaign fdas17, caesar, [email protected] yPrinceton University [email protected] zIndiana University at Bloomington [email protected] Abstract—Today’s Internet services rely heavily on text-based password meters to help users understand the strength of their passwords for user authentication. The pervasiveness of these passwords. Studies have shown that password composition services coupled with the difficulty of remembering large numbers policies along with password meters (or verbal notifications) of secure passwords tempts users to reuse passwords at multiple do help users to choose stronger passwords [35], [44], [46]. sites. In this paper, we investigate for the first time how an However, they also increase user fatigue. attacker can leverage a known password from one site to more easily guess that user’s password at other sites. We study several Unfortunately, the number of passwords a user must re- hundred thousand leaked passwords from eleven web sites and member continues to increase, with typical Internet user es- conduct a user survey on password reuse; we estimate that 43- timated to have 25 distinct online accounts [10], [11], [32]. 51% of users reuse the same password across multiple sites. Because of this, users often reuse passwords across accounts We further identify a few simple tricks users often employ to on different online services. Password reuse introduces a secu- transform a basic password between sites which can be used by an attacker to make password guessing vastly easier. We develop the rity vulnerability as an attacker who is able to compromise one first cross-site password-guessing algorithm, which is able to guess service can compromise other services protected by the same 30% of transformed passwords within 100 attempts compared to password, reducing overall security to that of the weakest site. just 14% for a standard password-guessing algorithm without Password reuse cannot be prevented by traditional composition cross-site password knowledge. policies or meters, as these tools only see passwords at a single site. Recent high-profile leaks of large numbers of I. INTRODUCTION passwords (including 55 000 accounts from Twitter [1], [12], [16], 450 000 accounts from Yahoo [17]–[19], and 6.5 million Text passwords are the most common mechanism for accounts from LinkedIn [2], [6], [8]) demonstrate that attackers authenticating human users of computing systems, particularly can potentially gain huge lists of valid credentials for use in on the Internet. Password security has become a key research cross-site password attacks. interest due to the pervasiveness of modern web services and their increasingly critical nature. Passwords form the Beyond attacks exploiting exact password reuse, it is an foundation of security policy for a broad spectrum of online open question if an attacker can use knowledge of a user’s services, protecting users’ financial transactions, health records password at one site to more easily guess a different password and personal communications, as well as blocking intrusions chosen by the same user at another site. In this work, we study into corporate, power grid, and military networks. this question. We examine several leaked password data sets to measure password reuse across Internet sites and find that Security can be undermined if passwords are easy to guess exact reuse of passwords is often mitigated by the fact that and research has consistently shown that users tend to choose different sites have different complexity policies. However, we simple passwords that are easy to remember [20], [33]. To also find that users often use simple tricks to work around counter this, online services often make use of password these different policies, for example making small edits to a composition policies (e.g., “the password must contain a mix common passphrase (e.g., adding a number 1 to the end of of upper- and lower-case letters and at least one number”), or a password used at another site). We find that users typically use a very small set of simple rules to make these edits which can vastly improve an attacker’s ability to guess passwords Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation at other sites. For example, we were able to guess 30% of on the first page. Reproduction for commercial purposes is strictly prohibited non-identical leaked password pairs within 100 attempts (10% without the prior written consent of the Internet Society, the first-named author password pairs required less than 10 attempts) while existing (for reproduction of an entire paper only), and the author’s employer if the password cracking libraries (like John the Ripper [4]) were paper was prepared within the scope of employment. able to crack 14% of the passwords. NDSS ’14, 23-26 February 2014, San Diego, CA, USA Copyright 2014 Internet Society, ISBN 1-891562-35-5 In this work, we make the following key contributions: http://dx.doi.org/doi-info-to-be-provided-later • An empirical estimate of the rate of direct password 1: Uppercase characters (A through Z) reuse for accounts controlled by the same user at 2: Lowercase characters (a through z) different websites, 43%, based on the largest data set 3: Base 10 digits (0 through 9) yet collected for this purpose. 4: Non-alphanumeric ASCII characters: ˜!@%ˆ&* -+=—()fg[]:;”‘’<>,.?$n • Extensive analysis of the similarity of non-identical passwords from the same users across different online More advanced policies can be used (e.g., NIST guide- accounts. This sheds light on how users modify their lines [25]) which may further increase guessing difficulty. password for reuse across different sites, as well as the However, password policies usually instill a tradeoff between specific transformation patterns users use when faced usability and security. Complex policies lead to password with various password composition policies. fatigue, complicating usability of the service and potentially decreasing security by encouraging users to write down or • We conduct a survey to understand users’ behavior electronically store lists of passwords. Because of this, the ma- in password construction across different online ac- jority of online services today use relatively simple password counts. This survey sheds light on how people use composition policies, leaving password choice up to users. transformation rules to modify their existing pass- words for different online accounts. To help users understand the security of their passwords a password meter can be used. However, password meters have • A cross-site guessing algorithm which uses a leaked been shown experimentally [46] and in practice [21] to make password at one site to produce guesses for passwords a relatively modest impact on password choices. Furthermore, potentially used by the same user at other sites. designers of password meters must make assumptions about We evaluate the feasibility of our guessing algorithm the way attackers will guess passwords (e.g., brute force). An under both offline and online attack scenarios. attacker with better knowledge of how the user constructed the The rest of the paper is organized as follows: SectionII password may be able to easily guess a password designated gives an overview of some password composition policies and as strong by a password meter. related work. Section III introduces our measurement studies and results. We summarize our survey results in SectionIV. We B. Related Work describe the basic design principle of our guessing algorithm There are alternatives to passwords, such as biometrics, in SectionV. Finally we discuss some implications of our public-key certificates, and hardware tokens. However, au- research in SectionVI and conclude in Section VII. thentication using text-based passwords remains the de facto standard for authentication in today’s Internet. Passwords have II. BACKGROUND AND RELATED WORK several advantages over alternative schemes, including the lack of need for custom hardware, ease of deployment and To better understand how passwords are composed we incorporation with existing services, their simplicity, and lack first look at some of the most frequently used password of need to memorize or store complex cryptographic keys [22]. composition policies. We will then summarize recent academic literature in the field of password analysis. A large body of research has been conducted in understand- ing the security of user-created passwords. For example, there A. Password Composition Policies has been studies that look into understanding users’ reaction to visual and textual based feedback provided by password meters The hardest passwords to crack are random character [46]. Others have looked at the strength of different password strings. However, such passwords are considered too difficult composition policies [21], [35], [41], [44]. These studies have to remember in many applications. Given the choice, human shown that users do tend to create stronger passwords in the users typically produce a highly-skewed distribution of pass- presence of password composition policies accompanied with word choices reflecting common semantics that ease recall password meters. Interestingly, Bonneau et al. [24] showed that (e.g., incorporating a word or number sequence with special most websites do have some password policy, but provide no meaning). To prevent users from selecting passwords that are strength meter. too simple (and therefore potentially common), designers of password authentication mechanisms often impose a password There have also been studies that analyze password attack composition policy, comprising a set of rules that constrain the strategies beyond brute force and dictionary attacks. The length and formation of passwords. Here is an example policy: most well-known password guessing algorithms are based on Markov models and probabilistic context free grammars. I. Passwords must not contain the user’s entire Markov chain-based modeling of substrings with parametric name/user ID.