Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff
Total Page:16
File Type:pdf, Size:1020Kb
Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff Arvind Narayanan and Vitaly Shmatikov The University of Texas at Austin {arvindn,shmat}@cs.utexas.edu ABSTRACT 1. INTRODUCTION Human-memorable passwords are a mainstay of computer It is often said that humans are always the weakest link security. To decrease vulnerability of passwords to brute- in the security chain. Social engineering is more frequently force dictionary attacks, many organizations enforce com- successful in penetrating a system than buffer overflow at- plicated password-creation rules and require that passwords tacks. Similarly, cracking a password is rarely accomplished include numerals and special characters. We demonstrate by breaking the cipher used and more often by exploiting that as long as passwords remain human-memorable, they the circumstance that it was generated by a human. are vulnerable to “smart-dictionary” attacks even when the The problem of guessing human-generated passwords has space of potential passwords is large. been studied for a long time. Morris and Thompson [27] Our first insight is that the distribution of letters in easy- in their 1979 paper on UNIX password security describe to-remember passwords is likely to be similar to the distribu- brute-force and dictionary attacks that are all too familiar to tion of letters in the users’ native language. Using standard the modern reader. The former is based on the observation Markov modeling techniques from natural language process- that short strings are easier for humans to remember, and ing, this can be used to dramatically reduce the size of the the latter on the fact that meaningful words are far more password space to be searched. Our second contribution is memorable than random character sequences. an algorithm for efficient enumeration of the remaining pass- The state of the art in dictionary attacks has not ad- word space. This allows application of time-space tradeoff vanced much since then. Current password crackers such techniques, limiting memory accesses to a relatively small as John the Ripper [29] use essentially the same two tech- table of “partial dictionary” sizes and enabling a very fast niques, plus a few rules to turn each dictionary word into a dictionary attack. set of related words (such as suffixing a digit). The brute- We evaluated our method on a database of real-world force attack can be made somewhat more efficient by us- user password hashes. Our algorithm successfully recovered ing a time-space tradeoff such as Oechslin’s “rainbow” tech- 67.6% of the passwords using a 2 × 109 search space. This is nique [28, 1], which employs precomputation to speed up a much higher percentage than Oechslin’s “rainbow” attack, the process of cracking individual passwords. A common which is the fastest currently known technique for searching defense is the use of password-creation rules, called composi- large keyspaces. These results call into question viability of tion rules, that require passwords to be drawn from certain human-memorable character-sequence passwords as an au- regular languages, and require passwords to include digits thentication mechanism. and non-alphanumeric characters. The goal is to increase the size of the search space, making naive dictionary attacks Categories and Subject Descriptors: infeasible. K.6.5[Security and Protection]: Authentication; We demonstrate that fast, devastating “smart dictionary D.4.6[Security and Protection]: Authentication; attacks” can be staged even on large password spaces that E.3[Data Encryption]: Code breaking are not vulnerable to brute-force or straightforward dictio- nary attacks. Our insight is that passwords, even if long General Terms: Security and sprinkled with extra characters, must still be human- memorable. Human limitations imply that the entropy of Keywords: Passwords, Dictionary Attack, Time-Space passwords is rather small: an NIST document [7] estimates Tradeoff, Cryptanalysis, Markov Models that user-generated 8-character passwords have between 18 and 30 bits of randomness, even when English dictionary words are filtered out and composition rules are enforced. In a very general sense, our attacks work by rapidly enu- merating candidate passwords that can be produced with Permission to make digital or hard copies of all or part of this work for a small amount of randomness. For instance, the string personal or classroom use is granted without fee provided that copies are asasasasasasasasasasasasasasasas, at 32 characters, is not made or distributed for profit or commercial advantage and that copies far beyond the reach of brute-force attacks using current bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific hardware. Intuitively, however, this string is not very ran- permission and/or a fee. dom, and should be easily guessable. CCS'05, November 7–11, 2005, Alexandria, Virginia, USA. Formally, this can be modeled by saying that the Kol- Copyright 2005 ACM 1-59593-226-7/05/0011 ...$5.00. mogorov complexity [23] of this string is low. The Kol- (i.e., more than two thirds of the passwords were success- mogorov complexity (a.k.a. K-complexity) of a string is de- fully recovered), using a search space of size 2×109. Detailed fined as the length of the shortest Turing Machine that results are presented in Section 5. The filters we used can outputs it and then halts. The sequence of instructions be found in Appendix A. By contrast, Oechslin’s attack repeat print "as" 16 times in any reasonable encoding achieves coverage of only 27.5%1. scheme should have a complexity of around 30 bits, which We believe that with a larger search space (around 3 × is in keeping with our intuition that the resulting string is 1012, which is what the passcracking.com implementation easily guessable. uses), and more comprehensive regular expressions, coverage An obvious approach would be to try to enumerate all of up to 90% can be achieved. A more rigorous and larger strings which have a K-complexity smaller than some thresh- experiment with the rainbowcrack [33] implementation is old, and use the result as the dictionary. There are two prob- in progress. lems with this. First, K-complexity is uncomputable. Sec- Defenses against dictionary attacks. Salting of pass- ond, human perception of randomness in different from com- word hashes defeats offline dictionary attacks based on pre- putational randomness. Subjective randomness has been computation, and thus foils our hybrid attack. studied by cognitive scientists Griffiths and Tenenbaum [15, Using an inefficient cipher slows the attacker down by a 16], who argue that we need a different way to measure the constant factor, and this is in fact done in the UNIX crypt() perceived information content of a string. What is needed is implementation. This technique, however, can only yield a a rigorous multidisciplinary study of human password gen- limited benefit because of the range of platforms that the eration processes, as well as the algorithms for very efficient client may be running. Javascript implementations in some enumeration of the resulting password spaces. The existence browsers, for example, are extremely slow. of such algorithms would be a strong indicator of vulnera- Feldmeier and Karn [10] surveyed methods to improve bility to fast dictionary attacks. password security and concluded that the only technique We take the first steps in both directions. We posit that offering a substantial long term improvement is for users to phonetic similarity with words in the user’s native language increase the entropy of the passwords they generate. As we is a major contributor to memorability. We capture this show, achieving this is far harder than previously supposed. property using “Markovian filters” based on Markov mod- There is also a large body of work, subsequent to the els, which is a standard technique in natural language pro- above survey, on password-authenticated cryptographic pro- cessing [32]. Our other observation is that the way in which tocols and session key generation from human-memorable humans use non-alphabetic characters in passwords can be passwords [3, 24, 2, 6, 19, 14, 13]. The objective of these modeled by finite automata, which can be considered a gen- protocols is to defeat offline dictionary attacks on protocols eralization of the “rules” used by John the Ripper and other where participants share a low-entropy secret. password cracking tools. One drawback of password-authenticated key exchange Secondly, we present an algorithm to efficiently enumer- (PAKE) protocols is that they typically rely on unrealistic ate all strings of a given length that satisfy a Markovian assumptions such as multiple noncooperating servers or both filter, and an algorithm to efficiently enumerate all strings parties storing the password in plaintext (one exception is of a given length accepted by a deterministic finite automa- the PAK-X protocol of [6]). Storing client passwords on the ton. More precisely, we give algorithms that, given an index th server is very dangerous in practice, yet even for “provably i, generate the i string satisfying each of these proper- secure” PAKE protocols, security proofs implicitly assume ties (without searching through the entire space!). Next, that the server cannot be compromised. we combine these two algorithms into a hybrid algorithm th Furthermore, our attacks apply in a limited sense even that produces the i string satisfying both the Markovian to PAKE protocols protocols because our Markovian filters filter and the regular language. Finally, we show how to use also make online dictionary attacks much faster. Thus, our this algorithm to implement a time-space tradeoff over this attacks call into our question whether it is ever meaningful search space. for humans to generate their own character-sequence pass- The best-known brute-force attack technique using the words.