<<

& security 73 (2018) 507–518

Available online at www.sciencedirect.com ScienceDirect

journal homepage: www.elsevier.com/locate/cose

LPSE: Lightweight -strength estimation for password meters

Yimin Guo *, Zhenfeng Zhang

Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China

ARTICLE INFO ABSTRACT

Article history: User-created strong are the to guaranteeing the security of password au- Received 20 April 2017 thentication. In practice, users often choose passwords that feel safe and that they can Received in revised form 7 July 2017 remember easily. However, the user’s perception of the strength of passwords is inconsis- Accepted 11 July 2017 tent with the actual strength of these passwords. To encourage users to create strong Available online 21 December 2017 passwords, many websites use password meters to visualize the strengths of user-chosen passwords, whereas the existing password meters have limited accuracy. The state-of-the- Keywords: art password-guessing approaches have high accuracy in testing the strengths of passwords, Password meter but these algorithms are not suitable for detecting user password strength directly on the Password-strength estimation client side, due to the long running time and the data storage problem. In this paper, we Cosine-length similarity propose a lightweight password-strength estimation method (LPSE). By testing the strong Edit-length similarity and weak passwords selected by a state-of-the-art -algorithm, we ob- Password guessing served that our LPSE algorithm is superior to the existing lightweight password-strength estimation algorithms in the accurate identification of strong passwords and weak pass- words. Moreover, the LPSE algorithm requires notably little storage space and is sufficiently fast for client-side measurement of password strength. © 2017 Elsevier Ltd. All rights reserved.

assigned passwords can be hard to guess, but they are also 1. Introduction difficult to remember for users. Password composition poli- cies require that each password meet certain requirements (e.g. Password is the first line of defense in protect- at least 8 characters in length and at least from three of four ing network systems. Text passwords are a common form of character classes), which can also make passwords difficult to authentication (Stobert and Biddle, 2014). To ensure the secu- guess (Komanduri et al., 2011). However, a strict password com- rity of text-password authentication, many systems require position policy can lead to user frustration, and the user may users to choose strong passwords. User-created passwords are choose the password in a simple and predictable way to meet easy to remember, but are also vulnerable to guessing attacks the policy requirements (Komanduri et al., 2011; Shay et al., (Das et al., 2014; Stobert and Biddle, 2014; Ur et al., 2016; Wash 2010). Another way to permit users to select strong pass- et al., 2016). To prevent the emergence of weak passwords, words is to employ proactive password checking (Bishop and system administrators have adopted a variety of measures, in- Klein, 1995). Recently, proactive password checkers are being cluding system-assigned passwords (Al-Ameen et al., 2015; Huh deployed as password meters on many websites to encour- et al., 2015) and strict password composition policies (Inglesant age users to choose strong passwords (Shay et al., 2015). The and Sasse, 2010; Shay et al., 2016; Weir et al., 2010). System- password meter is usually a visual representation of password

* Corresponding author. E-mail address: [email protected] (Y. Guo). https://doi.org/10.1016/j.cose.2017.07.012 0167-4048/© 2017 Elsevier Ltd. All rights reserved. 508 computers & security 73 (2018) 507–518

strength on the screen with text and color bars. Evidence shows niques are computationally intensive, require a large amount that users are influenced by password meters in their pass- of storage space, and are not suitable for deployment on the word choices when informed about password strength (Egelman client side for measuring password strength (Kelley et al., 2012; et al., 2013; Ur et al., 2012). Ur et al. (2012) found that various Melicher et al., 2016). It is also not appropriate to adopt a visual password meters encourage users to create longer pass- common password dictionary on the client side, because a words. The password meters also affect the users’ act of common password dictionary not only needs some storage password creation, and lead users to choose different charac- space but also increases the searching time of the password- ters with which to build their passwords, such as digits, special evaluation algorithms. In addition, a common password symbols, and uppercase letters. dictionary includes words in only one language, and the most To help users choose stronger passwords, many websites common words in passwords in other languages are not have deployed password meters to provide visual feedback on recognized. password strength. However, previous study has shown that To evaluate accurately the strengths of user-created pass- the existing password meters have limited accuracy as they words in real time, we propose a lightweight password- often label weak passwords as strong and mark strong pass- strength estimation method (LPSE) that is suitable for running wordsasweak(Castelluccia et al., 2012), and different password entirely on the client side. LPSE measures the strength of a pass- meters give highly inconsistent strength outcomes for the same word by comparing the similarity in the structure and the password (de Carnavalet and Mannan, 2014, 2015; Ji et al., 2015). distance between a given password and a standard strong pass- The limited accuracy of existing password meters may confuse word. A large number of strong passwords and weak passwords, users in choosing a stronger password. Most of the existing selected by the state-of-the-art password-cracking algo- password meters are deployed on either the server or the client, rithm, were tested. We found that the similarity-evaluation and they employ rule-based methods to measure password method can more fully capture the strong and weak pass- strength. Simple rule-based strength-estimation algorithms word features and can determine the password strength more cannot capture the complexity of passwords, making it diffi- accurately than the existing password-strength meters. cult to accurately estimate the strengths of given passwords Our main contributions are the following. (Melicher et al., 2016). Recent researches (de Carnavalet and Mannan, 2015; Melicher et al., 2016) show that the zxcvbn al- (1) We represent passwords by vectors that contain the com- gorithm (Wheeler, 2016) uses some reasonable evaluation rules, ponents of the different character types and lengths that and gives more accurate strength estimates than other rule- make up the password. By determining the similarity based password-strength algorithms. between a given password vector and the standard It is a challenging task to design a password-strength evalu- strong-password vector in terms of structure, vector ation method that is suitable for password meters. Due to the modulus and password distance, it can accurately capture complexity of passwords, there are still some deviations in the characteristics of a given password from multiple strength measurements of the same password using the most perspectives. advanced password-guessing techniques (Ur et al., 2015). We (2) The password-evaluation method proposed in this paper believe that an efficient lightweight password meter should be is lightweight: it not only can accurately identify the capable of accurately determining the strength of a proposed password strength but also has such advantages as password on the user’s client machine. There are security and small storage space, fast running speed, and ease of latency problems with the password meters deployed on the implementation. It is suitable for client-side password server side (Melicher et al., 2016; Van Acker et al., 2015). If the checking. password-strength estimation is done on the client side, it is (3) The accuracies of our scheme and several existing necessary to run the password strength meter in real time with schemes are evaluated by performing a number of ex- small storage. de Carnavalet and Mannan (2015) suggest that periments. We compared the impacts of these schemes several factors must be considered in designing a high- on the usability and security of user-chosen pass- accuracy password checker: inherent patterns in user choice, words, and found that LPSE has the lowest false-negative dictionaries used in cracking tools, exposure of large pass- rate in determining the password strength. word databases, and user adaptation against password policies. Designing such a password checker would apparently be a dif- ficult task. One cannot know what dictionaries the attackers 2. Related work are using, nor can one know all the leaked password data- bases. To simplify these challenges, de Carnavalet and Mannan There are two primary ways to measure password strength. (2015) suggest that the primary goal of password meters is only The first approach relies on the complexity of the password to detect weak passwords by leveraging known password- itself, such as evaluating password strength as Shannon entropy cracking techniques and common password dictionaries. (Bonneau et al., 2015; Lin, 1991) or using some statistical Current password-cracking techniques, such as probability- methods (Bonneau, 2012a, 2012b; Weir et al., 2010). The second based guessing algorithms (Ma et al., 2014; Narayanan and approach is to simulate the adversary’s password-guessing ca- Shmatikov, 2005; Weir et al., 2009), are the most accurate way pabilities, such as measuring password strength as the number to measuring password strength. They use guessability (how of guesses that an attacker would need to guess a given pass- many guesses a given algorithm, with a given set of param- word (Ur et al., 2015). eters and training sets, requires to guess a password) as the Early efforts to measure password strength often use pass- password-strength metric. Probability-based guessing tech- word entropy (Florencio and Herley, 2007; Forget et al., 2008). computers & security 73 (2018) 507–518 509

Many studies have shown that the password-entropy metric the number of guesses for long passwords and/or passwords is not suitable for measuring the strengths of user-chosen pass- that do not appear in dictionaries. The most well-known pass- words and is only suitable for assessing the strengths of word probabilistic attack methods are probabilistic context- randomly generated passwords (Egelman et al., 2013; Florêncio free grammars (Weir et al., 2009) and methods based on Markov et al., 2014). For user-created passwords, the entropy metric models (Narayanan and Shmatikov, 2005). The underlying idea needs to know the probability distribution of passwords. An of probabilistic context-free grammar is that a typical pass- accurate measure of the password probability distribution re- word has a certain structure, and the probabilities of different quires a large number of password samples (Bonneau, 2012a; structures are extracted from the list of leaked passwords, which Paninski, 2003). Password checkers for the password meters are then used to generate password guesses. The technique con- often use the “entropy” or “score” of the password to measure sists of two steps: the first is constructing the probabilistic password strength, such as NIST password entropy (Burr et al., context-free grammar from a leaked password training set, and 2013) and password checker. This type of approach the second is automatically extracting mangling rules by train- is also referred to as rule-based password-strength mea- ing the context-free grammar, and generating the actual guesses sures. Rule-based methods measure the “entropy” of a password in descending order of probability. Narayanan and Shmatikov using various bonus and decrement rules. Password scores are (2005) first used the Markov model to represent the password usually determined by password length, the use of digits, low- structures and calculate the probability of each password struc- ercase letters, uppercase letters, and special characters in the ture. The basic idea of the Markov model is that the adjacent password, and whether or not the password contains black- letters in a human-generated password are not chosen inde- listed words. zxcvbn (Wheeler, 2016) is a very good rule- pendently but follow some specific regularity. Narayanan and based password measure algorithm, which divides a given Shmatikov’s algorithm cannot enumerate passwords in de- password into several patterns, and then evaluates the entropy scending order of probability. Dürmuth et al. (2015) then of each pattern separately. The following patterns are consid- implemented this feature. ered: repeat (e.g., 111, sss, 10gf10gf); sequence (e.g., 123, 246, To compute directly the number of guesses for a given pass- efgh); reversed (to reverse a word, e.g., DrowssaP); keyboard (e.g., word, Kelley et al. (2012) designed a guess-number calculator qwerty, qAzxcde3); date (e.g., 7/8/1990, 07081992). If any of these for password-guessing algorithms. It maps a password to the weak patterns appear in a given password, the algorithm only number of guesses required to guess the password. The guess- considers that particular “part” of the password to be drawn number calculator does not need to run a guessing algorithm from a small space of possibilities in estimating the overall but can figure out the number of guesses for each password. strength. zxcvbn also embeds multiple dictionaries to check For example, for the PCFG algorithm, the guess-number cal- the given password: if a subpart of the password is found in culator creates a lookup table based on a training set, and then a dictionary, the entropy value is assigned based on the rank computes the number of guesses for each password. However, of the subpart in the dictionary. The remaining part of a given it is computationally slow to create this lookup table. Dell’Amico password that is not contained in the dictionary is consid- et al. (2010) adopted an approximation technique to estimate ered a random string. The final password entropy is calculated the number of guesses when using n-gram models. Dell’Amico as the sum of the entropy of each pattern. and Filippone (2015) proposed a method to count the number Password guessing, which measures the strength of a pass- of guesses for a given password that does not need to run a word from the perspective of an attacker’s ability, is an approach password-guessing algorithm. This method requires less that is in many ways similar to password estimating. The training-data resources, and has good convergence proper- password-guessing techniques enumerate the guesses in de- ties. Castelluccia et al. (2012) presented the design of password scending order of probability, which means that most frequent meters using Markov models. This approach adopts the Markov passwords are first tested, and a password that is guessed later model to assign probabilities for each password, and then takes is stronger. Therefore, researchers often use the number of the probability of the password as the password strength. guesses to indicate password strength (Ur et al., 2015). Melicher et al. (2016) proposed a password-guessing attack John the Ripper (JtR) (OpenWall, 1990) is one of the most method using an artificial neural network, and the neural popular password-cracking tools. It supports multiple modes network can be compressed to several hundred kilobytes to generate password guesses. In dictionary mode, a diction- without substantially diminishing the effectiveness of pass- ary and various mangling rules are used to generate guesses. word guessing. Similar to Markov models, neural networks in Mangling rules are used to model common behaviors in how this method are trained to generate the next character of a pass- users create passwords. They can generate variants of words word given the preceding characters of a password. from an input dictionary, which allows for more efficient guess- Rule-based algorithms estimate strength from the perspec- work. For example, a mangling rule may append a digit to a tive of the password structure and some weak password word, or replace the letter a with @. patterns, and their accuracies depend on the ability to capture At present, the state-of-the-art password-guessing algo- the characteristics of the user-chosen passwords. Due to the rithms are based on the probability password models complexity of the passwords that the users choose, the current (Narayanan and Shmatikov, 2005; Weir et al., 2009). Probabi- rule-based password-strength measure is not sufficiently ac- listic approaches establish a model by using plaintext password curate (de Carnavalet and Mannan, 2015). Password-guessing training data (usually obtained from publicly leaked pass- techniques are considered the most accurate measures of words), in which each password has a given probability of being password strength (Dell’Amico and Filippone, 2015; Kelley selected, and then enumerate the guesses in descending order et al., 2012), but these techniques require very expensive of probability. Probabilistic attack methods can greatly reduce computational efforts and very large disk space, thus they 510 computers & security 73 (2018) 507–518

are not well-suited to measure password strength for pass- the two passwords and, hence, the given password may be word meters. assumed to be a strong password. To design an effective password-strength evaluation method For a given password α, its cosine-length similarity to the that can run on the client side, we propose a similarity- standard strong password is expressed as sc(α), and its password- evaluation method, which measures password strength by distance similarity to the standard strong password is denoted evaluating the similarity between a given password and a stan- as sp(α). Then the strength of the password α is represented dard strong password in many ways. The similarity method as Tsα = ( cp()αα, s ()). evaluates password strength from the aspects of password structure, password vector modulus and password distance; 3.2. Password-vector calculation therefore, it can capture the characteristics of a password more comprehensively than the existing rule-based password- The numbers of different types of characters appearing in the strength methods, and it has higher accuracy. From the password are mapped to a frequency, together with the pass- experimental results, it can be seen that the false-negative rate word length. A password can be expressed as a vector α = of the similarity-evaluation method is significantly lower than ()xxxxx12345,,,, , where x1, x2, x3, x4, x5 represent, respec- those of other methods. tively, the vector components of digits, lowercase letters, uppercase letters, special characters, and password length. The general rules for mapping different types of characters to vector values are shown in Table 1. The vector values for uppercase 3. Our LPSE algorithm letters and special characters are larger than the vector values for digits and lowercase letters, since passwords that contain 3.1. Overview uppercase letters and special characters are stronger (Li et al., 2014; Ur et al., 2015). The vector value for two identical or con- Our LPSE algorithm adopts two kinds of similarity to measure secutive characters is equivalent to the vector values of a the strength of a given password, namely, cosine-length simi- character, such as 1*1* and 1* are the same vector values, A1B2 larity and password-edit distance similarity. The password is and A1 also have the same vector values. Assuming a pass- represented by a vector that consists of digits, lowercase letters, word is aa35*TX1, the vector values of this password string can uppercase letters, special characters, and the length of the pass- be expressed as α = (3,1,4,3,8). word. Password strength can be measured by comparing the Many studies have shown that users prefer to choose some similarity between a given password vector and a standard weak password patterns (Das et al., 2014; Ji et al., 2015; Li et al., strong-password vector. 2014). In the calculation of a given password vector, we first We determine the similarity between the two password determine whether there is a weak password pattern, and the vectors from three aspects: the structure of the password (that weak pattern will be assigned a low vector value. The common is, what kinds of characters compose the password and the pro- weak password patterns are shown in Table 2. portions of various characters); the password length; and the Certain password-strength evaluation algorithms often use number of insertion, substitution, and deletion operations re- a dictionary to check whether a given password contains quired to transform a given password into a standard strong common words, such as zxcvbn (Wheeler, 2016). The advan- password. Obviously, if a given password is more similar to a tage of using a dictionary check is that when the user chooses standard strong password in terms of structure and pass- a password containing common words, they can be quickly word length, and the number of insertion, substitution, and identified, but the use of the dictionary will take up a certain deletion operations required to transform it into a standard amount of storage space, and there are such issues as slow strong password is lower, then the password is stronger. matching and low coverage. In addition, when the user chooses Cosine similarity determines the difference in direction a word that is not in the checked dictionary, the system de- between the two vectors, so it can be used to check whether termines that it is a random word. the two password vectors are similar in structure (Bayardo et al., To identify whether a password contains common words, 2007). However, cosine similarity cannot determine the simi- we determine whether a password contains the most common larity in length between the two vectors. To this end, we propose a cosine-length similarity, which is an improved method of cosine similarity. The cosine-length similarity can determine Table 1 – General rules for mapping characters to the similarity between two password vectors in password struc- vectors. ture and password-vector modulus. Patterns Vector value Example The edit distance is the minimum number of operations re- (character→vector quired to convert a string into another string by inserting, value) replacing and deleting (Ristad and Yianilos, 1998). Since the Digits 1 8→1 standard strong password refers to a class of passwords with Lowercase letters 1 d→1 certain characteristics, rather than a specific password, the im- Uppercase letters 2 G→2 proved password-edit distance is the minimum number of Special characters 3 &→3 insertion, substitution, and deletion operations required to Two identical Equivalent to one aa→1, 3a3a→2 transform a given password into a standard strong password. characters character vector → → The smaller the edit distance between the given password and Two consecutive Equivalent to one AB 2,1a2b 2 characters character vector a standard strong password the greater the similarity between computers & security 73 (2018) 507–518 511

password is of equal probability (C−L). An attacker cannot search Table 2 – The common weak password patterns. the password space in descending order of probability, and Patterns Examples would have no better strategy than to guess passwords at Password string contains Password “Password123” contains random, so the search cost is CL. The length of a strong pass- common words the “password” word should also meet certain requirements. In their study, Shay Keyboard pattern Qwer, zxcvb, qazwsx et al. (2014) found that simple 16-character passwords are stron- Password string matches Assuming the user name is zhouh ger than short complex passwords. Most passwords are less the user name and the password is zhou356 than 12 characters long based on the passwords that have been Password string matches Assuming the registration is in the registered site name hotmail and the password is leaked, and only a small number of passwords are longer than hotmail123 16 characters (Ji et al., 2015). Based on this evidence, we believe Date pattern 910212, 20120828, 070867 that a strong password should be randomly generated, and the Leeting pattern a→@, s→$, 1→! password length should be greater than 16 characters. For example, to construct a standard strong password with a length of 18 characters, suppose there are 94 characters that can be used to form the password and these 94 characters consist of two-character combinations, three-letter combinations, and 10 digits, 26 lowercase letters, 26 uppercase letters, and 32 starting and ending letters. The 30 most frequently occurring special characters. Since each character appears with equal two-letter combinations in English include: th he in er an re probability, a strong password with a length of 18 characters ed on es st en at to nt ha nd ou ea ng as or ti is et it ar te se hi should contain 2 digits, 5 lowercase letters, 5 uppercase letters, of. The three-letter combinations with the highest frequen- and 6 special characters. The password vector is formed ac- cies (in descending order of frequency) are: the ing and her ere cording to the frequencies of the characters and their weights. ent tha nth was eth for dth hat she ion int his sth ers ver. Half Thus a standard strong-password vector with a length of 18 of all English words begin with t, a, s, or w, and half end in e, characters can be αs = (2,5,10,18,18), where the component values s, d, or t. If these frequently occurring patterns are found, they respectively represent digits, lowercase letters, uppercase letters, are mapped to the lowest possible values. The password- special characters, and password length. vector mapping rules for the weak mode are shown in Table 3. 3.4. Password similarity 3.3. Standard strong-password vector 3.4.1. Cosine-length similarity The probability distribution of the most secure passwords is Assuming that a given password vector is denoted by evenly distributed, so we select the standard strong pass- α = ()xxxxx12345,,,, , and the standard strong-password vector word that satisfies two conditions: is denoted by αS = ()yyyyy12345,,,, , the cosine similarity of the two vectors is: (1) The password is randomly generated. 5 (2) The password is sufficiently long (e.g. at least 12 char- × ∑()xyii acters in length). = cos()θ = i 1 5 5 ()2 × 2 ∑∑xyi ()i We believe that if a password is sufficiently long and the i==1 i 1 different types of characters that make up the password string and of equal probability then we consider the password to be The cosine similarity can only represent the structural simi- strong. For example, for a randomly generated password with larity between two passwords, and does not reflect the similarity a length of L, and drawn from an alphabet of C characters, each in vector modulus between the two passwords. Considering the

Table 3 – Vector-mapping rules for weak-password patterns. Weak-password patterns Vector calculation Example (password: character→vector value)

The most common 2-letter and 2-letter and 3-letter combinations are each Goodidea47: ea→1 3-letter combinations, and starting calculated as one letter vector; Starting and estimation33: ion→1 and ending letters ending letters are mapped to 0 password123: d→0 Keyboard pattern Equivalent to a continuous character vector qwer123: qw→1,er→1 (see Table 1) Password string matches the user User name is equivalent to a lowercase-letter zhou356: zhou→1 name vector (zhou356 is a password, zhouh is the user name) Password string matches the Site name is equivalent to a lowercase-letter hotmail123: hotmail→1 registered site name vector (hotmail123 is a password, hotmail is the site name) Date pattern Equivalent to a 3-digit vector 07081967: 07081967→3 Leeting pattern Only the 3 most common leeting transforms P@ssword123: @→1 (a→@, s→$, 1→!) are considered. Equivalent to a lowercase-letter vector 512 computers & security 73 (2018) 507–518

modulus lengths of the vectors, cosine-length similarity can The greater the edit distance between a given password and be defined as: a standard strong password, the greater the difference between the two passwords. Conversely, smaller edit distance between min()XY , a given password and a standard strong password denotes that Pxy = cos()θ i , max()XY , the given password contains the properties of a standard strong password. Therefore; the similarity between them should be where |X| and |Y| denote the moduli of the given password vector smaller. Using passworddist (P, T) to denote the edit distance and the standard strong-password vector, respectively. between a given password P and a standard strong password T, the password-edit distance similarity is: 5 5 XxYy= ∑∑()2 ,.= ()2 i i passworddist() P, T i==1 i 1 Passwordsimilarity =−1 , max()PT ,

3.4.2. Improved password-distance similarity where |P| denotes the sum of the components of the given pass- The improved password-edit distance is the minimum cost for word and |T| denotes the sum of the components of the transforming a given password into a standard strong pass- standard strong password. For example, supposing the stan- word by insertion, deletion, and substitution operations. We dard strong-password vector is (2,5,10,18,18), the |T|valueis53. specify that any two adjacent characters in a standard strong password are of different types. 3.5. Password-strength evaluation The edit-distance operation rules for converting a given password to a standard strong-password type are as To evaluate the strength of a given password, we first need to follows: determine the thresholds of the cosine-length similarity and edit-distance similarity, that is, determine the ranges of values (1) The edit-distance rule for a deletion is a subtraction op- of these two similarities in which the password is a strong pass- eration: for the deletion of a digit or lowercase letter, word or a weak password. To calculate the thresholds, we subtract 1 from the edit distance; for the deletion of an analyze a variety of strong passwords and weak passwords that uppercase letter, subtract 2 from the edit distance; for have been selected randomly from a large number of leaked the deletion of a special character, subtract 3 from the password lists using the most advanced password-cracking edit distance 3. tools. Section 4.3 details the analysis process and results. There- (2) The edit-distance rule for an insertion is an addition op- fore, with the threshold values of the two similarities, we can eration: for the insertion of a digit or a lowercase letter, finally determine the strength of a given password. add 1 to the edit distance; for the insertion of an up- percase letter, add 2 to the edit distance; for the insertion of a special character, add 3 to the edit distance. (3) The edit-distance rule for a substitution may be an ad- 4. Evaluating LPSE performance dition or a subtraction operation. If a lowercase letter or a digit is replaced with an uppercase letter, the edit dis- In this section, we first determine the thresholds for the two tance is incremented by 1. If a lowercase letter or a digit similarities and later compare the LPSE meter with other light- is replaced with a special character, the edit distance is weight password-strength evaluation methods. incremented by 2. Lowercase letters are converted to digits, and digits are converted to lowercase letters, 4.1. Datasets without changing the edit distance. If a special charac- ter is replaced with a lowercase letter, the edit distance Due to various reasons, in the past few years, a large number should be decremented by 2. of user passwords have been leaked. These leaked password lists provide the largest samples of real-world passwords. To The password-edit distance also takes into account the fol- test the LPSE performance, we randomly select passwords from lowing special calculation rules. the multiple leaked password lists and these passwords are in totally different languages and from totally different (1) If adjacent characters in a password are of different types, populations. the edit distance is reduced by 0.5 for each occurrence; These selected passwords are divided into 3 subsets: train- if adjacent characters are of the same type, the edit dis- ing dataset, test dataset 1 and test dataset 2. The training dataset tance is increased by 0.5 for each occurrence. is used to train the PCFG algorithm to calculate the number (2) If the given password length is less than 7 characters, of guesses of passwords. Test dataset 1 is used to determine the edit distance is set to the maximum value. the threshold values of the two similarities for strong and weak (3) If the given password contains only one type of char- passwords, and test dataset 2 is used to test the perfor- acters, the edit distance is increased by 6; if it includes mances of different password-strength estimation algorithms. two types of characters, the edit distance does not We randomly chose 11,561,483 passwords from the leaked change; if it includes three types of characters, the edit RockYou and Tianya password lists as the training data, of which distance is decreased by 2; and if it includes 4 types of 7,599,861 passwords were selected from RockYou and 3,961,622 characters, the edit distance is decreased by 6. passwords were selected from Tianya. 198,270 passwords were computers & security 73 (2018) 507–518 513

with these selected strong and weak passwords, and the test Table 4 – The details of three datasets. results are shown in Fig. 1. It can be seen from Fig. 1(a) that Data Total the cosine-length similarities of the strong passwords are mostly Training dataset RockYou, Tianya 11,561,483 within the interval [0.2–0.6], and the cosine-length similari- Test dataset 1 7k7k, Yahoo 198,270 ties of the weak passwords are within the interval [0.13–0.3]. Test dataset 2 Duduniu, 178, abc, aha, casio, hellfire, 999,211 If the cosine-length similarity is greater than 0.3, or less than mayhem, opnkorea, rootkit, sunrise, 0.2, we can distinguish whether a given password is strong tomsawyer, walla and whitefox or weak. We can also see some overlap between the values 0.2 and 0.3 of cosine-length similarity. In other words, the cosine-length similarity within this range is unable to solely chosen from the 7k7k and Yahoo leaked password lists as test distinguish strong passwords and weak passwords. From dataset 1, and 999,211 passwords as test dataset 2 were se- Fig. 1(b), the password-distance similarities of strong pass- lected from the leaked password lists of Duduniu, 178, abc, aha, words are mainly in the range of [0.45–0.8], while those of casio, hellfire, mayhem, opnkorea, rootkit, sunrise, tomsawyer, walla the weak passwords are mainly distributed between [0.05– and whitefox. The details of the training dataset and test dataset 0.46]. It can be seen that a reasonable choice of the value of 1and 2 are summarized in Table 4. the password-distance similarity can distinguish strong pass- words and weak passwords. 4.2. Testing process There are some differences between the two similarities of strong and weak passwords. The distance similarity and cosine- We calculate the number of guesses of passwords using the length similarity of a strong password are basically proportional Monte Carlo method (Dell’Amico and Filippone, 2015), which to each other (Fig. 1(c)), but the two similarities of a weak pass- does not need to run a probability-guessing algorithm to cal- word do not have a directly proportional relation (Fig. 1(d)). For culate the number of guesses for a given password, and only some weak passwords with large cosine-length similarity, the a small amount of training data is required. The testing process password-distance similarity is relatively small, and the consists of three steps. First, the PCFG algorithm is trained on password-distance similarity is large for some weak pass- the training data set, and the Monte Carlo method is used to words whose cosine-length similarity is small. There are a few calculate the numbers of guesses of test passwords. Second, outliers for which the two similarities are relatively large for according to the numbers of guesses of passwords, the strong weak passwords. The different characteristics of the two simi- and weak passwords are selected from test dataset 1 and the larities for strong and weak passwords make it easily determine threshold values of the LPSE algorithm are determined by using the thresholds to distinguish strong passwords from weak these strong and weak passwords. Finally, the strong and weak passwords. passwords are selected from test dataset 2, and the perfor- Based on the above test results, we give a function to dis- mances of the LPSE algorithm and other password-strength tinguish strong from weak passwords as follows: evaluation algorithms are compared using these strong and weak passwords. ⎧strongif scp()αα≥ 04.. or s ()≥ 055 ⎪ The number of guesses for a password is often used to de- if 03. ≤ s ()α < 04. ⎪ c termine the strength of the password (Ur et al., 2015). Florêncio strong ≤ ()α < Tsα = ( cp()αα, s ()) = ⎨ and 04..sp 055 et al. (2014, 2016) believe that to resist online attacks a pass- ⎪ ()αα≤ ()< ⎪weakif scp019.. or s 035 word must survive 106 guesses and to withstand offline attacks ⎩⎪mediium otherwise a password must survive 1014 or more guesses. Since the number of guesses for a password is related to the password-guessing where sc(α) and sp(α) represent the cosine-length similarity and algorithms and the training dataset, we choose the weak and password-distance similarity of the password, respectively. strong passwords according to the following conditions. We propose a weak password cannot resist more than 1010 guesses, 4.4. Performance-comparison results and a strong password should meet three conditions: (1) it must 14 be able to survive at least 10 guesses, (2) it must contain at We compare the performance of the LPSE algorithm with those least two different types of characters, or its length must be of PM (PM: Passwordmeter) and zxcvbn (Wheeler, 2016). The at least 12, (3) it must contain at least five different charac- password-strength scale of the LPSE algorithm is Weak, Medium, ters (for example, a and b are different characters). Strong. The strength scales of the PM algorithm and zxcvbn algo- 4.3. Determining the LPSE thresholds rithm are expressed in terms of 5-item scales; for comparison, the password strength is handled in accordance with the 3-item

The standard strong-password vector αS = (2,5,10,18,18) is taken scale for all algorithms. The “Very Weak” and “Weak” scale cat- as an example to confirm the threshold values of the LPSE egories of the PM algorithm are denoted as “Weak”, “Good”as algorithm. There are a total of 198,270 passwords in test dataset “Medium”, and “Strong” and “Very Strong”as“Strong”. Likewise, 1; 152,530 passwords remain after deleting repeated pass- the “Very Weak” and “Weak” scale categories of the zxcvbn al- words and unrecognized garbled characters. The method gorithm are denoted as “Weak”, “So-so” is expressed as “Medium”, described in Section 4.2 is used to determine that test dataset and “Good” and “Great” are denoted as “Strong”. We use the 1 contains 5960 strong passwords and 120,419 weak pass- 999,211 passwords in test dataset 2 to analyze the effects of words. The two similarities of the LPSE algorithm are tested the password-strength algorithms. After deleting repeated 514 computers & security 73 (2018) 507–518

Fig. 1 – Comparison of two similarities between strong and weak passwords.

passwords and unrecognized garbled characters, the number shown in Figs. 3 and 4, which clearly indicate the advantages of valid passwords in test dataset 2 is 651,854. In accordance of LPSE for identifying strong and weak passwords. with the criteria for determining strong and weak passwords The results of the performance comparison of the three al- given in Section 4.2, test dataset 2 has 48,813 strong pass- gorithms are shown in Table 5, where βSW− is the false- words, 460,156 weak passwords, and 142,885 medium negative rate for strong passwords that the password-evaluation passwords. The password-strength distribution in test dataset algorithm identified as weak passwords; βWS− is the false- 2 is shown in Fig. 2. negative rate for weak passwords that the password-evaluation

The selected strong and weak passwords are used to test algorithm identified as strong passwords; and βWSM− is the rate the accuracies with which LPSE, PM and zxcvbn evaluate these at which weak passwords were identified by the password- passwords. The accuracy of a good password-strength evalu- evaluation algorithm as strong passwords or medium ation approach should be close to those of the most advanced passwords. The running time is the time required by the password-guessing techniques. The results of identifying the password-strength algorithm to determine the strength of a strong and weak passwords with these three algorithms are password, which is the average time required to run 10 con- secutive tests of 290,786 passwords in the configuration of CPU AMD A4-4300M 2.50G, RAM 4G.

As seen from Table 5, βSW− and βWS− of the LPSE algorithm are significantly smaller than those of the other two algo- rithms, that is to say, the LPSE algorithm is closest to the state- of-the-art password guessing techniques in accurately determining the password strength. An interesting phenom-

enon is that the zxcvbn algorithm has the highest βSW− value, which means that the algorithm has the highest proportion of strong passwords misjudged as weak passwords. When strong passwords contain several weak patterns, the zxcvbn algo- rithm gives very low “entropy”, which may be due to the zxcvbn algorithm being too severe for weak password patterns. It can also be seen from Table 5 that the effects of the three algorithms on the usability and security of user-chosen pass- words are different. Websites or organizations generally reject Fig. 2 – Password-strength distribution in test dataset 2. weak passwords and accept medium or strong passwords. If computers & security 73 (2018) 507–518 515

Fig. 3 – Result of the three algorithms for identifying strong passwords.

Fig. 4 – Result of the three algorithms for identifying weak passwords.

a password-strength evaluation method gauges the strong pass- βWSM− , indicating that it has the lowest misjudgment rate for word selected by the user as a weak password, the strong weak passwords. As seen from Table 5, the storage space re- password is rejected, which reduces the usability of pass- quired by the LPSE algorithm is 33 k, and the running time for words. If a password-strength evaluation method identifies a measuring the strength of a password is 0.181 ms, which is suit- weak password as a strong password or a medium password, able for evaluating the password strength on the client side. this weak password is accepted, which decreases the secu- Here we compare the relationship between LPSE, PM, zxcvbn rity of passwords. and the ideal password strength function using Spearman’s

The value of βSW− can reflect the impact of a password- rank correlation. The Spearman’s rank correlation coefficient evaluation algorithm on the usability of passwords. βSW− is is a nonparametric technique for evaluating the degree of linear smallest for LPSE among the three algorithms, indicating that association or correlation between two independent vari- the LPSE algorithm has the lowest false-positive rate for strong ables (Gautheir, 2001). The basic idea is that for a given password passwords. βWSM− can reflect the impact of a password-evaluation test set, the password-strength measure algorithm can sort the algorithm on the security of passwords. LPSE has the smallest passwords’ strength values in the password set. The greater the Spearman’s correlation coefficient between the two strong sequences output by a password-strength measure method and the ideal password measure method, the closer the accuracy Table 5 – Performance comparison of the three of the two measure methods is. Spearman’s correlation coef- algorithms. ficients are between −1 and 1, where 1 means that the two β − β − β − Storage T(ms) SW WS WSM sequences are completely positive correlation, −1 indicates LPSE 0 0.014 0.201 33 k 0.181 perfect negative correlation, and 0 indicates no correlation. PM 0.184 0.133 0.471 6.4 k 0.011 The optimal strategy for an attacker is to guess passwords zxcvbn 0.284 0.104 0.298 1.5M 2.030 in descending order of probability. Guessing entropy represents 516 computers & security 73 (2018) 507–518

Fig. 5 – Spearman’s correlation coefficient against the ideal password strength function.

the average number of guesses an attacker needs to guess pass- performance. LPSE algorithm is a combination of the pass- words in an optimal way. Let χ be a probability distribution, word cosine-length similarity and password-distance similarity, the sample space size is N, and the event probability in the reasonably set the thresholds of two similarities, and can there- probability distribution χ satisfies the monotonically decreas- fore more accurately determine whether a password is strong ing sequence p1 ≥ p2≥…≥pN. The guessing entropy G(χ) is defined password or a weak password. But LPSE algorithm some- as: times cannot distinguish which password is stronger or weaker.

For example, suppose the strength of the password α1 is (0.51, N α ()χ =⋅ 0.43), the strength of the password 2 is (0.50, 0.45). According Gip∑ i. i=1 to the strong and weak passwords threshold given in Section

4.3, we can know that α1 and α2 are all the strong passwords, If the order of the output of a function is the same as that but we are unable to determine which one is stronger. This de- of the optimal password guessing sequence, the function can ficiency of the LPSE algorithm does not prevent it to become be used as an ideal password strength measure function. The an excellent password checker for password meter. We believe optimal password guessing order is guessed in descending order that a general password meter only needs to determine whether of probability, so we can use the probability of each password a proposed password is strong or weak and does not need to in the password set to design an ideal password strength func- determine which password is stronger or weaker. tion. Assuming that each password has a fixed probability on a given space of passwords, an ideal password strength func- tion f (x)is(Castelluccia et al., 2012): 5. Conclusions fx()=−log() px() . We propose a new lightweight password-strength measure- ment approach that can comprehensively capture the The PCFG algorithm is used to determine the probability of complexity of user-chosen passwords by comparing the simi- passwords in the 651,854 test dataset 2, which can be approxi- larity of a given password with a standard strong password. mated as an ideal password strength measure function. In the Our LPSE meter’s accuracy is significantly higher than those PCFG algorithm, passwords are grouped by templates, and the of the existing password-evaluation methods for password guessing probability of a given password is the frequency of meters. Using the most advanced password-guessing algo- the password templates multiplied by the frequency of each rithm to select the strong and weak passwords, the false- pattern in the template, such as P(guo@126) = P(L3S1D3) × P(L3→guo) negative rates for the strong passwords and weak passwords

× P(S1→@) × P(D3→126), where P(L3→guo) is the frequency of “guo” of LPSE are obviously less than those of other algorithms. The in three-lowercases groups, P(S1→@) is the frequency of “@” in LPSE algorithm has a memory size of only 33 k and an average one-special character groups, and P(D3→126) denotes the fre- run time of 0.181 ms, which is suitable for assessing the strength quency of “126” in three-digits groups. of user-chosen passwords on the client side. Fig. 5 shows the Spearman’s correlation of the password cosine-length similarity, the password-distance similarity, the REFERENCES zxcvbn and PM, and the ideal password-strength measure func- tion. As a result of the test, it can be seen that the accuracy of cosine-length similarity is superior to that of other methods. Al-Ameen MN, Fatema K, Wright M, Scielzo S. The Impact of The distance similarity, zxcvbn and PM perform similarly poor Cues and User Interaction on the Memorability of computers & security 73 (2018) 507–518 517

System-Assigned Recognition-Based Graphical Passwords. Huh JH, Oh S, Kim H, Beznosov K, Mohan A, Rajagopalan SR. Proceedings of the Eleventh Symposium On Usable Privacy Surpass: System-initiated user-replaceable passwords. and Security (SOUPS 2015). 2015: 185-196. Proceedings of the 22nd ACM SIGSAC Conference on Bayardo RJ, Ma Y, Srikant R. Scaling up all pairs similarity search. and Communications Security. ACM, 2015: Proceedings of the 16th international conference on World 170-181. Wide Web. ACM, 2007: 131-140. Inglesant PG, Sasse MA. The true cost of unusable password Bishop M, Klein DV. Improving system security via proactive policies: password use in the wild. Proceedings of the SIGCHI password checking. Comput Secur 1995;14(3):233–49. Conference on Human Factors in Computing Systems. ACM, Bonneau J. The science of guessing: analyzing an anonymized 2010: 383-392. corpus of 70 million passwords. Proceedings of the 2012 IEEE Ji S, Yang S, Wang T, Liu C, Lee WH, Beyah R. PARS: A Uniform Symposium on Security and Privacy. IEEE, 2012a: 538-552. and Open-source Password Analysis and Research System. Bonneau J. Statistical metrics for individual password strength. Proceedings of the 31st Annual Computer Security Proceedings of the 20th international conference on Security Applications Conference. ACM, 2015: 321-330. Protocols. Springer-Verlag, 2012b: 76-86. Kelley PG, Komanduri S, Mazurek ML, Shay R, Vidas T, Bauer L, Bonneau J, Herley C, Van Oorschot PC, Stajano F. Passwords and et al. Guess again (and again and again): measuring password the evolution of imperfect authentication. Commun ACM strength by simulating password-cracking algorithms. 2015;58(7):78–87. Proceedings of the 2012 IEEE Symposium on Security and Burr WE, Dodson DF, Polk WT. NIST Special Publication 800-63-2 Privacy. IEEE, 2012: 523-537. Electronic Authentication Guideline. Computer Security Komanduri S, Shay R, Kelley PG, Mazurek ML, Bauer L, Christin N, Division, Information Technology Laboratory, National et al. Of passwords and people: measuring the effect of Institute of Standards and Technology; 2013. password-composition policies. Proceedings of the SIGCHI Castelluccia C, Dürmuth M, Perito D. Adaptive Password-Strength Conference on Human Factors in Computing Systems. ACM, Meters from Markov Models. Proceedings of the Network and 2011: 2595-2604. Distributed System Security Symposium (NDSS). 2012: 1-15. Li Z, Han W, Xu W. A large-scale empirical analysis of Chinese de Carnavalet XDC, Mannan M. From very weak to very strong: web passwords. Proceedings of the 23rd USENIX Security analyzing password-strength meters. Proceedings of the Symposium (USENIX Security). 2014: 559-574. Network and Distributed System Security Symposium (NDSS). Lin J. Divergence measures based on the Shannon entropy. IEEE 2014, 14: 23-26. Trans Inf Theory 1991;37(1):145–51. de Carnavalet XDC, Mannan M. A large-scale evaluation of high- Ma J, Yang W, Luo M, Li N. A study of probabilistic password impact password strength meters. ACM Transactions on models. Proceedings of the IEEE Symposium on Security and Information and System Security (TISSEC) 2015;18(1):1. Privacy. IEEE, 2014: 689-704. Das A, Bonneau J, Caesar M, Borisov N, Wang X. The Tangled Web Melicher W, Ur B, Segreti SM, Komanduri S, Bauer L, Christin N, of Password Reuse. Proceedings of the Network and et al. Fast, lean and accurate: modeling password guessability Distributed System Security Symposium (NDSS). 2014, 14: 23- using neural networks. Proceedings of the 25st USENIX 26. Security Symposium (USENIX Security). 2016, 175-191. Dell’Amico M, Filippone M. Monte Carlo strength evaluation: Fast Narayanan A, Shmatikov V. Fast dictionary attacks on passwords and reliable password checking. Proceedings of the 22nd ACM using time-space tradeoff. Proceedings of the 12th ACM SIGSAC Conference on Computer and Communications conference on Computer and communications security. ACM, Security. ACM, 2015: 158-169. 2005: 364-372. Dell’Amico M, Michiardi P, Roudier Y. Password Strength: An OpenWall, John the Ripper. 1990. Available from: http://www Empirical Analysis. Proceedings of the 29th conference on .openwall.com/john. [Accessed January 6, 2018]. Information communications. IEEE Press, 2010: 983-991. Paninski L. Estimation of entropy and mutual information. Dürmuth M, Angelstorf F, Castelluccia C, Perito D, Abdelberi C. Neural Comput 2003;15(6):1191–253. OMEN: faster password guessing using an ordered Markov PM: Passwordmeter. 2009. Available from: http://www enumerator. Proceedings of theInternational Symposium on .passwordmeter.com. [Accessed January 6, 2018]. Engineering Secure Software and Systems. Springer Ristad ES, Yianilos PN. Learning string-edit distance. IEEE Trans International Publishing, 2015: 119-132. Pattern Anal Mach Intell 1998;20(5):522–32. Egelman S, Sotirakopoulos A, Muslukhov I, Beznosov K, Herley C. Shay R, Komanduri S, Kelley PG, Leon PG, Mazurek ML, Bauer L, Does my password go up to eleven?: the impact of password et al. Encountering stronger password requirements: user meters on password selection. Proceedings of the SIGCHI attitudes and behaviors. Proceedings of the Sixth Symposium Conference on Human Factors in Computing Systems. ACM, on Usable Privacy and Security. ACM, 2010: 2. 2013: 2379-2388. Shay R, Komanduri S, Durity AL, Huh PS, Mazurek ML, Florencio D, Herley C. A large-scale study of web password Segreti SM, et al. Can long passwords be secure and habits. Proceedings of the International Conference on World usable? Proceedings of the 32nd annual ACM conference Wide Web, WWW 2007, Banff, Alberta, Canada, May. 2007:657- on Human factors in computing systems. ACM, 2014: 2927- 666. 2936. Florêncio D, Herley C, Van Oorschot PC. An administrator’s guide Shay R, Bauer L, Christin N, Cranor LF, Forget A, Komanduri S, to password research. Proceedings of the28th Large et al. A Spoonful of Sugar?: The Impact of Guidance and Installation System Administration Conference (LISA14). 2014: Feedback on Password-Creation Behavior. Proceedings of the 44-61. 33rd Annual ACM Conference on Human Factors in Florêncio D, Herley C, Van Oorschot PC. Pushing on string: the Computing Systems. ACM, 2015: 2903-2912. ‘don‘t care’ region of password strength. Commun ACM Shay R, Komanduri S, Durity AL, Huh PS, Mazurek ML, Segreti SM, 2016;59(11):66–74. et al. Designing password policies for strength and usability. Forget A, Chiasson S, van Oorschot PC, Biddle R. Improving text ACM Transactions on Information and System Security passwords through persuasion. Proceedings of the 4th (TISSEC) 2016;18(4):13. symposium on Usable privacy and security. ACM, 2008: 1-12. Stobert E, Biddle R. The password life cycle: user behaviour in Gautheir TD. Detecting trends using Spearman’s rank correlation managing passwords. Proceedings of the Symposium On coefficient. Environ Forensics 2001;2(4):359–62. Usable Privacy and Security (SOUPS). 2014: 243-255. 518 computers & security 73 (2018) 507–518

Ur B, Kelley PG, Komanduri S, Lee J, Maass M, Mazurek ML, Weir M, Aggarwal S, De Medeiros B, Glodek B. Password cracking et al. How does your password measure up? the effect of using probabilistic context-free grammars. Proceedings of the strength meters on password creation. Proceedings of the 30th IEEE Symposium on Security and Privacy. IEEE, 2009: 391- 21st USENIX Security Symposium (USENIX Security). 2012: 405. 65-80. Weir M, Aggarwal S, Collins M, Stern H. Testing metrics for Ur B, Noma F, Bees J, Segreti SM, Shay R, Bauer L, et al. “I Added ‘!’ password creation policies by attacking large sets of revealed at the End to Make It Secure”: Observing Password Creation in passwords. Proceedings of the 17th ACM conference on the Lab. Proceedings of the 11th Symposium On Usable Computer and communications security. ACM, 2010: Privacy and Security (SOUPS). 2015: 123-140. 162-175. Ur B, Segreti SM, Bauer L, Christin N, Cranor LF, Komanduri S, Wheeler DL. zxcvbn: low-budget password strength estimation. et al. Measuring real-world accuracies and biases in modeling Proceedings of the 25st USENIX Security Symposium (USENIX password guessability. Proceedings of the 24st USENIX Security). 2016, 157-173. Security Symposium (USENIX Security). 2015: 463-481. Ur B, Bees J, Segreti SM, Bauer L, Christin N, Cranor LF. Do Users’ Yimin Guo was born in 1992. She received the M.S. degree in de- Perceptions of Password Security Match Reality? Proceedings partment of computer science from Shaanxi Normal University. She of the 2016 CHI Conference on Human Factors in Computing is currently pursuing the Ph.D. degree at Trusted Computing and Systems. ACM, 2016: 3748-3760. Information Assurance Laboratory, Institute of Software from Van Acker S, Hausknecht D, Joosen W, Sabelfeld A. Password Chinese Academy of Sciences. Her research interests include pass- meters and generators on the web: from large-scale empirical words, modern cryptography and information security. study to getting it right. Proceedings of the 5th ACM Conference on Data and Application Security and Privacy. Zhenfeng Zhang received the Ph.D. degree in Chinese Academy of ACM, 2015: 253-262. Sciences. He is now a Research Professor at Trusted Computing and Wash R, Rader E, Berman R, Wellmer Z. Understanding Password Information Assurance Laboratory, Institute of Software from Choices: How Frequently Entered Passwords are Re-used Chinese Academy of Sciences. His research interests include theo- Across Websites. Proceedings of the Symposium on Usable retical cryptography, applied cryptography, security model and Privacy and Security (SOUPS). 2016, 175-188. provable security of security protocols.