Statistical Metrics for Individual Password Strength

Joseph Bonneau
[email protected]
University of Cambridge

Abstract. We propose several possible metrics for measuring the strength of an individual password or any other secret drawn from a known, skewed distribution. In contrast to previous ad hoc approaches which rely on textual properties of passwords, we consider the problem without any knowledge of password structure. This enables rating the strength of a password given a large sample distribution without assuming anything about password semantics. We compare the results of our generic metrics against those of the NIST metrics and other previous "entropy-based" metrics for a large password dataset; the comparison suggests over-fitting in previous metrics.

1 Introduction

It is often desirable to estimate the resistance to guessing provided by a specific password. This is useful for research purposes, to compare password choices between two groups when there is insufficient data to construct full distributions [9,11,14]. It can also be used to build a proactive password checker which indicates to a user the strength of a particular password during enrolment [1]. Current practice usually relies on rules of thumb based on entropy estimates of English, such as those standardised by NIST [3]. These have been argued to be inaccurate estimators of a password's vulnerability to cracking attacks [15].

Instead, we advocate using probability estimates based on an approximation of the distribution $\mathcal{X}$ from which a password $x$ was drawn. We'll call such a measure the estimated strength and denote it as $S_\mathcal{X}(x)$. It is important to recognise that any estimated strength is only as accurate as our approximation of $\mathcal{X}$ is for the target population.
For example, the password tequiero (which roughly translates to iloveyou) is much weaker within a distribution of passwords chosen by Spanish speakers than by English speakers.¹ We'll assume initially, for the purposes of this paper, that the population-wide distribution of passwords $\mathcal{X}$ is completely known, before exploring the implications of relaxing this assumption.

¹ This problem was illustrated by the research of Dell'Amico et al. [7], who found vastly different guessing efficiency based on the language of a password dictionary and the users who chose passwords in a dataset.

2 Desired properties

We can describe two important properties that we would like for a strength metric:

1. Consistency for uniform distributions: For a discrete uniform distribution $\mathcal{U}_N$ in which each of $N$ events is equally likely, such as randomly chosen 64-bit keys, we want any strength metric to indicate that each value has $\lg N$ bits of strength. Specifically:

$$\forall x \in \mathcal{U}_N \quad S_{\mathcal{U}_N}(x) = \lg N \tag{1}$$

2. Monotonicity: A strength metric should rate any event as no stronger than events which are less common in the underlying distribution $\mathcal{X}$:

$$\forall x, x' \in \mathcal{X} \quad p_x \ge p_{x'} \iff S_\mathcal{X}(x) \le S_\mathcal{X}(x') \tag{2}$$

3 Strength metrics

We formalise several purely statistical techniques for estimating a password's strength in this way, which require no general assumptions about human tendencies in password selection. In all cases, we convert our strength estimates into units of bits to satisfy our desired consistency property.

3.1 Probability metric

The simplest approach is to take the estimated probability $p_x$ from $\mathcal{X}$ and estimate the size of a uniform distribution in which all elements have probability $p_x$:

$$S^P_\mathcal{X}(x) = -\lg p_x \tag{3}$$

This is, in fact, the classic definition of the self-information of $x$ (also called the surprisal), which is the underlying basis for Shannon entropy $H_1$ [13].
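The probability metric in (3) can be sketched in a few lines. This is a minimal illustration, assuming $p_x$ is estimated as a raw frequency from a table of counts; the function names and the toy sample are hypothetical and are not taken from the paper or from any real dataset.

```python
import math
from collections import Counter


def probability_strength(counts, x):
    """Surprisal -lg p_x in bits, with p_x estimated as count / total.

    Raises ValueError for an event with zero count, since lg 0 is undefined.
    """
    total = sum(counts.values())
    return -math.log2(counts[x] / total)


def shannon_entropy(counts):
    """Probability-weighted mean surprisal over the observed events, in bits."""
    total = sum(counts.values())
    return sum((c / total) * -math.log2(c / total) for c in counts.values() if c > 0)


# Hypothetical toy sample (illustrative counts only, not real password data).
sample = Counter({"123456": 4, "password": 2, "tequiero": 1, "encryption": 1})

for pw in sample:
    print(pw, probability_strength(sample, pw))
print("H1 =", shannon_entropy(sample))

# Consistency check: in a uniform distribution over 8 events, every event gets lg 8 bits.
uniform = Counter({f"key{i}": 1 for i in range(8)})
print(probability_strength(uniform, "key0"))
```

Working from counts rather than floating-point probabilities keeps the estimate tied to the sample it came from, which matters once the assumption of a fully known distribution is relaxed.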
It is easy to see that for a randomly drawn value $x \xleftarrow{R} \mathcal{X}$, the expected value of this metric is:

$$E\left[S^P_\mathcal{X}(x) \,\middle|\, x \xleftarrow{R} \mathcal{X}\right] = \sum_{i=1}^{|\mathcal{X}|} p_i \cdot \left(-\lg p_i\right) = H_1(\mathcal{X}) \tag{4}$$

Previous metrics which attempt to measure the probability of a password, for example using Markov models [6,12,4] or probabilistic context-free grammars [16], can be seen as attempts to approximate $S^P_\mathcal{X}(x)$. Applied to guessing, this model measures the (logarithmic) expected number of guesses before $x$ is tried by an attacker who is guessing random values $x \xleftarrow{R} \mathcal{X}$ with no memory. This attack model might apply if an attacker is guessing randomly according to the distribution to evade an intrusion-detection system. The metric has no direct relation, however, to optimal sequential guessing attacks.

3.2 Index metric

To estimate the strength of an individual event against an optimal guessing attack, the only relevant fact is the index $i_x$ of $x$, that is, one more than the number of items in $\mathcal{X}$ of greater probability than $p_x$, so that the most probable event has index 1. We can convert this to an effective key-length by considering the size of a uniform distribution which would, on average, require $i_x$ guesses. A uniform distribution of size $N$ requires $\frac{N+1}{2}$ guesses on average, so $N = 2 \cdot i_x - 1$. This gives us a strength metric of:

$$S^I_\mathcal{X}(x) = \lg\left(2 \cdot i_x - 1\right) \tag{5}$$

Intuitively, this metric is related to real-world attempts to measure the strength of an individual password by estimating when it would be guessed by password-cracking software [10].

A practical problem with this metric is that many events may have the same estimated probability. If we break ties arbitrarily, then in the case of the uniform distribution $\mathcal{U}_N$ the formula won't give $\lg N$ for all events. In fact, it won't even give $\lg N$ on average over all events, but instead $\approx \lg N - (\lg e - 1)$ (proved in Appendix B). A better approach for a sequence of $j$ equiprobable events $x_i \cdots x_{i+j}$, where $p_i = \ldots = p_{i+j}$, is to assign index $\frac{i+j}{2}$ to all events when computing the index strength metric.
This is equivalent to assuming an attacker will choose randomly given a set of remaining candidates with similar probabilities, and it does give a value of $\lg N$ to all events in the uniform distribution. This is slightly unsatisfactory however, as it means that for a distribution $\mathcal{X} \approx \mathcal{U}_N$ which is "close" to uniform but with a definable ordering of the events, the average strength will appear to jump down by $(\lg e - 1) \approx 0.44$ bits.

A second problem is that the most probable event in $\mathcal{X}$ is assigned a strength of $\lg 1 = 0$. This is a necessary limitation of satisfying our consistency requirement, since for $\mathcal{U}_1$ we must assign the single possible event a strength of $\lg 1 = 0$.

3.3 Partial guessing metric

The probability metric doesn't model a sequential guessing attack, while the index approach has a number of peculiarities. A better approach is to consider the minimum amount of work done per account by an optimal partial guessing attack which will compromise accounts using $x$. The α-guesswork $\tilde{G}_\alpha$, defined in [2], reflects exactly the (logarithmic) expected amount of work per account required of an attacker desiring to break a proportion $\alpha$ of accounts. We provide the full definition for computing $\tilde{G}_\alpha$ in Appendix A.

For example, if a user chooses the password encryption, an optimal attacker performing a sequential attack against the RockYou distribution will have broken 51.8% of accounts before guessing encryption. Thus, a user choosing the password encryption can expect to be safe from attackers who aren't aiming to compromise at least this proportion of accounts, which takes $\tilde{G}_{0.518}$ work on average per account. We can turn this into a strength metric as follows:

$$S^G_\mathcal{X}(x) = \tilde{G}_{\alpha_x}(\mathcal{X}) : \quad \alpha_x = \sum_{i=1}^{i_x} p_i \tag{6}$$

Because $\tilde{G}_\alpha(\mathcal{U}_N) = \lg N$ for all $\alpha$, the consistency property is trivially satisfied. Moreover, for "close" distributions $\mathcal{X} \approx \mathcal{U}_N$ where $\left|p_i - \frac{1}{N}\right| < \varepsilon$ for all $i$, we'll have $S^G_\mathcal{X}(x_i) \to \lg N$ for all $i$ as $\varepsilon \to 0$, unlike for $S^I$, where strength will vary as long as there is any defined ordering.
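The index metric (5) and the partial guessing metric (6) can be compared with a short sketch. Several things here are assumptions rather than the paper's method: tied events are detected by equal sample counts and given the mean 1-based rank of their run, which is one way to make the uniform case come out at $\lg N$; the partial guessing function breaks ties by input order instead; and the closed form used for $\tilde{G}_\alpha$ is a reading of the definition in [2] that should be checked against Appendix A. All function names are hypothetical.

```python
import math
from itertools import groupby


def mean_ranks(counts_desc):
    """1-based ranks for counts sorted in descending order.

    Every run of equal counts receives the mean rank of that run
    (assumed tie rule: average of the first and last rank in the run).
    """
    ranks = []
    start = 1
    for _, run in groupby(counts_desc):
        n = len(list(run))
        ranks.extend([start + (n - 1) / 2] * n)
        start += n
    return ranks


def index_strength(rank):
    """Index metric lg(2 * i_x - 1) in bits for a 1-based rank i_x."""
    return math.log2(2 * rank - 1)


def partial_guessing_strength(probs_desc, k):
    """Partial guessing metric for the k-th most probable event (1-based).

    alpha_x is the cumulative probability of the first k events, so the
    attacker's dictionary has exactly k entries. Assumed closed form:
        G      = (1 - lam) * k + sum_{i<=k} p_i * i
        G~     = lg(2 * G / lam - 1) + lg(1 / (2 - lam))
    where lam is the cumulative probability of the first k events.
    """
    head = probs_desc[:k]
    lam = sum(head)
    work = (1 - lam) * k + sum(p * i for i, p in enumerate(head, start=1))
    return math.log2(2 * work / lam - 1) + math.log2(1 / (2 - lam))


# Uniform check over N = 8 events: both metrics should report lg 8 = 3 bits.
uniform_counts = [1] * 8
uniform_probs = [1 / 8] * 8
print([index_strength(r) for r in mean_ranks(uniform_counts)])
print([partial_guessing_strength(uniform_probs, k) for k in range(1, 9)])

# Hypothetical skewed distribution: the index metric gives the top event 0 bits,
# while the partial guessing metric assigns it a positive strength.
skewed_probs = [0.5, 0.25, 0.125, 0.125]
print(index_strength(1), partial_guessing_strength(skewed_probs, 1))
```

Passing the rank `k` directly, rather than searching for the smallest dictionary whose cumulative probability reaches $\alpha_x$, avoids floating-point comparisons against a running sum.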
As with $S^I_\mathcal{X}$ though, we encounter the problem of ordering for events with statistically indistinguishable probabilities. We'll apply the same tweak and give each event in a sequence of equiprobable events $x_i \cdots x_{i+j}$ the index $\frac{i+j}{2}$.

4 Estimation from a sample

Just as for distribution-wide metrics [2], there are several issues when estimating strength metrics using an approximation for $\mathcal{X}$ obtained from a sample.

4.1 Estimation for unseen events

All of the above metrics are undefined for a previously unobserved event $x' \notin \mathcal{X}$. This is a result of our assumption that we have perfect information about $\mathcal{X}$. Without that assumption, we inherently need to rely on some approximation. As a consequence of the monotonicity requirement, $S(x' \notin \mathcal{X})$ must be $\ge \max\left(S(x \in \mathcal{X})\right)$. The naive formula for $S^P_\mathcal{X}(x')$ assigns a strength estimate of $\infty$ though, which is not desirable. We should therefore smooth our calculation of strength metrics by using some estimate $p(x') > 0$ even when $x' \notin \mathcal{X}$.

Good-Turing techniques [8] do not address this problem, as they provide an estimate for the total probability of seeing a new event, but not the probability of a specific new event. A conservative approach is to add $x'$ to $\mathcal{X}$ with a probability $\frac{1}{N+1}$, on the basis that if it is seen in practice in a distribution we're assuming is close to our reference $\mathcal{X}$, then we can treat $x'$ as one additional observation about $\mathcal{X}$. This is analogous to the common heuristic of "add-one smoothing" in word frequency estimation [8]. This correction gives an estimate of $S^P_\mathcal{X}(x') = \lg(N + 1)$ for unobserved events.

For the index metric, this correction produces the smoothed estimate $S^I_\mathcal{X}(x') = \lg(2N + 1)$, an increase of roughly 1 bit, due to the aforementioned instability for a distribution $\mathcal{X} \approx \mathcal{U}_N$.
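The smoothing rule for unseen events can be illustrated with a small sketch. It rests on two interpretive assumptions that the text leaves open: for the probability metric, $N$ is taken to be the number of observations in the sample, and for the index metric, $N$ is taken to be the number of distinct observed events, so that an unseen event is ranked $N + 1$. Estimates for seen events are left unsmoothed, and ties among seen events are broken by sort order for brevity. The function names and counts are hypothetical.

```python
import math


def smoothed_probability_strength(counts, x):
    """Probability metric with the add-one style correction for unseen events.

    Unseen events get lg(N + 1), where N is the sample size (an assumption).
    Seen events keep the unsmoothed estimate -lg(count / N).
    """
    total = sum(counts.values())
    seen = counts.get(x, 0)
    if seen == 0:
        return math.log2(total + 1)
    return -math.log2(seen / total)


def smoothed_index_strength(counts, x):
    """Index metric where an unseen event is ranked after every observed event.

    With N distinct observed events, an unseen event gets rank N + 1 and
    therefore lg(2 * (N + 1) - 1) = lg(2N + 1) bits.
    """
    observed = [k for k, c in counts.items() if c > 0]
    if counts.get(x, 0) == 0:
        rank = len(observed) + 1
    else:
        ordered = sorted(observed, key=counts.get, reverse=True)
        rank = ordered.index(x) + 1  # ties broken by sort order in this sketch
    return math.log2(2 * rank - 1)


# Hypothetical counts: 10 observations over 3 distinct events.
toy_counts = {"a": 5, "b": 3, "c": 2}
print(smoothed_probability_strength(toy_counts, "never-seen"))  # lg 11
print(smoothed_index_strength(toy_counts, "never-seen"))        # lg 7
```

Under both functions an unseen event is rated at least as strong as every observed one, which is what the monotonicity requirement demands.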
