
On the Accuracy of Password Strength Meters

Maximilian Golla
Ruhr University Bochum
Bochum, Germany
[email protected]

Markus Dürmuth
Ruhr University Bochum
Bochum, Germany
[email protected]

ABSTRACT

Password strength meters are an important tool to help users choose secure passwords. Strength meters can only provide reasonable guidance when they are accurate, i.e., their score correctly reflects password strength. A strength meter with low accuracy may do more harm than good and guide the user to choose passwords with a high score but low actual security. While a substantial number of different strength meters is proposed in the literature and deployed in practice, we are lacking a clear picture of which strength meters provide high accuracy, and thus are most helpful for guiding users. Furthermore, we lack a clear understanding of how to compare accuracies of strength meters.

In this work, (i) we propose a set of properties that a strength meter needs to fulfill to be considered to have high accuracy, (ii) we use these properties to select a suitable measure that can determine the accuracy of strength meters, and (iii) we use the selected measure to compare a wide range of strength meters proposed in the academic literature, provided by password managers and operating systems, and those used on websites. We expect our work to be helpful in the selection of good password strength meters by service operators, and to aid the further development of improved strength meters.

CCS CONCEPTS

• Security and privacy → Authentication; Usability in security and privacy; Web application security;

KEYWORDS

Strength Meter; Password; User Authentication

ACM Reference Format:
Maximilian Golla and Markus Dürmuth. 2018. On the Accuracy of Password Strength Meters. In 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS '18), October 15–19, 2018, Toronto, ON, Canada. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3243734.3243769

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CCS '18, October 15–19, 2018, Toronto, ON, Canada
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-5693-0/18/10...$15.00
https://doi.org/10.1145/3243734.3243769

1 INTRODUCTION

Password-based authentication is still in widespread use, specifically for online authentication on the web and for hard-disk encryption. Passwords are easy to understand for laypersons, easy to implement for the operator, don't require additional hardware, and are supported by a broad ecosystem such as password managers. In all likelihood, password-based authentication will stay for the foreseeable future.

Password strength meters (PSMs) are designed to help with one of the central problems of passwords, namely weak user-chosen passwords. From leaked password lists we learn that up to 20 % of passwords are covered by a list of only 5,000 common passwords [63]. A PSM (also called strength meter, password checker, or similar) displays an estimation of the strength of a password when chosen by the user, and either helps or forces the user to pick passwords that are strong enough to provide an acceptable level of security.

The accuracy with which a PSM measures the actual strength of passwords is crucial; as people are known to be influenced by PSMs [61] (or even forced to comply), an inaccurate PSM can do more harm than good. If weak passwords are rated strong, users might end up choosing such a password, actually harming security; similarly, if strong passwords are rated weak, the meter drives people away from those strong passwords. Traditionally, ad-hoc approaches such as counts of lower- and uppercase characters, digits, and symbols (LUDS) have been used to measure the strength of passwords. Although it is well known that these do not accurately capture password strength [68], they are still used in practice. More recently, sounder constructions for PSMs based on precise models capturing user choice have been proposed, e.g., based on Markov models [13], on probabilistic context-free grammars (PCFGs) [34, 65], on neural networks [46, 59], and others [71]. Surprisingly, very little work has been performed on a fair comparison of these different proposals, and it remains unclear which password meter is best suited for the task of estimating password strength. Even worse, we lack consensus on how to determine the accuracy of strength meters, with the techniques used ranging from Spearman and Kendall correlation to ad-hoc measures.

In this work, we propose a sound methodology for measuring the accuracy of PSMs, based on a clear set of requirements and a careful selection of a metric, and we will use this metric to compare a variety of different meters. In more detail, our contributions are:

(i) We discuss properties an accurate strength meter needs to fulfill, and create a number of test cases from these requirements.
(ii) We report tests of 19 candidate measures (and have tested several more) from a wide range of types and select good metrics for the accuracy of strength meters.
(iii) We address the challenge of estimating the accuracy from limited datasets and show that meters can be reasonably approximated with a small number of random samples.
(iv) We provide an extensive overview of the current state of the art of strength meters.
(v) We use the derived measures and provide a comparison of a broad selection of 45 password meters in 81 variations, ranging from academic proposals over meters deployed in password managers and operating systems to meters in practical use on websites.
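The LUDS-style counting mentioned above can be made concrete with a small sketch. This is a toy scorer of our own devising (function name and weighting are illustrative assumptions, not any deployed meter): one point per character class present, plus one point per character.

```python
def luds_score(password: str) -> int:
    # Toy LUDS scorer: +1 for each character class present
    # (Lower, Upper, Digit, Symbol), plus +1 per character of length.
    score = len(password)
    if any(c.islower() for c in password):
        score += 1
    if any(c.isupper() for c in password):
        score += 1
    if any(c.isdigit() for c in password):
        score += 1
    if any(not c.isalnum() for c in password):  # anything else counts as a symbol
        score += 1
    return score
```

Such a scorer rates a predictable pattern like `P4ssw0rd!` far above `abc123`, even though both are common human choices — exactly the kind of inaccuracy that [68] criticizes.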
More important than the results of this work are the methods we developed. They provide means to select suitable similarity metrics that match the requirements for specific use cases. We hope that this will foster the future development of strength meters and will simplify the selection process for service operators.

2 RELATED WORK

In the following, we review some material related to password strength, including password choice, password guessability, and metrics that were proposed for measuring strength.

2.1 Password Choice

Jakobsson and Dhiman [37] found that users produce passwords using only a small set of rules and components such as dictionary words, replacement strategies, and misspellings. Veras et al. [64] explored the semantics of passwords. They found male/female names and concepts relating to love, profanity, animals, food, and money. Recently, Ur et al. [60] investigated the relationship between users' perceptions of the strength of specific passwords and their actual strength. They report serious misconceptions about the consequences of constructing passwords by using common phrases and including digits and keyboard patterns. A possible countermeasure to prevent users from choosing easy-to-guess passwords are so-called password composition policies (e.g., requiring the password to contain a digit or symbol). Komanduri et al. [42] analyzed the effect of such policies by investigating their impact on password strength, user behavior, and user sentiment. Based on their findings, they produced recommendations for password-composition policies that result in stronger passwords without burdening users too much. In 2014, Florêncio et al. [24] highlighted the importance of limiting the number of online guesses that can be made, by rate-limiting and by blacklisting the most common passwords like "123456". Recently, Habib et al. [33] analyzed how users react to password creation attempts failing because of blacklists. They conclude that blacklist checks need to go beyond exact comparisons and also check for any form of reuse of blacklisted passwords. Furthermore, they recommend providing textual feedback to help users understand reuse and simple modifications of blacklisted password attempts.

2.2 Password Guessing

Guessing passwords is in many ways related to password strength. Assuming an attacker cannot simply invert a password hash, the optimal strategy is to test passwords in decreasing order of likelihood, i.e., most frequent passwords first. There are different proposals to enumerate passwords with decreasing likelihood, in other words, with increasing strength. The most relevant for GPU-based password cracking use large dictionaries and ad-hoc mangling rules to generate password candidates. Narayanan and Shmatikov [48] proposed Markov models to overcome some of the problems of dictionary-based attacks. Dürmuth et al. [21] improved the approach by generating password candidates according to their occurrence probabilities, i.e., by outputting the most likely passwords first. Weir et al. [70] suggested a method exploiting structural patterns from a password leak using probabilistic context-free grammars (PCFGs). Veras et al. [64] extended the approach by building a semantically meaningful PCFG-based password guesser. An empirical study on the effectiveness of different attacks was done by Dell'Amico et al. [19]. Another study targeting probabilistic password modeling approaches was done by Ma et al. [44]. Recently, Ur et al. [62] did a large-scale comparison and found that running a single guessing algorithm often yields a very poor estimate of password strength. In 2016, Melicher et al. [46] proposed using recurrent neural networks (RNNs) for probabilistic password modeling.

2.3 Password Strength

Trying to estimate password strength as a measure to defend against guessing attacks has a long history. In 1979, Morris and Thompson [47] did password checking by attempting to crack hashed passwords. The ones successfully cracked were marked as weak, and the users were notified. Subsequently, one started to check the strength of a password before it is accepted by a system via proactive password checkers or password strength meters, using certain rule-sets that try to exclude weak passwords [7, 41, 56]. Schechter et al. [53] classified passwords as weak by counting the number of times a certain password is present in the password database. Bonneau [8] proposed α-guesswork as a proper metric for estimating the guessing entropy of a fraction α of accounts. Kelley et al. [40] proposed a guess-number calculator to determine if and when a given password-guessing algorithm would guess a specific password. Dell'Amico and Filippone [18] proposed a method to estimate the number of guesses required to find a password.

Many of the password strength meters in the current literature are based on the aforementioned password guessing approaches: using neural networks by Melicher et al. [46] and Ur et al. [59], using PCFGs by Houshmand and Aggarwal [34] and Wang et al. [65], and using Markov models by Castelluccia et al. [13]. Furthermore, there is a meter using a set of advanced heuristics by Wheeler [71], the official NIST entropy estimation [11], and others [22, 32, 61]. We provide a description of the meters in Section 6.3. Ur et al. [61] found that strength meters, depending on the visual feedback, led users to create longer passwords or caused them to place less importance on satisfying the meter. Egelman et al. [22] studied the impact of password strength meters on the password selection process and found that meters result in stronger passwords when users are forced to change existing passwords on "important" accounts. de Carné de Carnavalet and Mannan [16] conducted an analysis of deployed strength meters in 2014. They found evidence that the commonly used meters are highly inconsistent and fail to provide coherent feedback.
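The Markov-model approach to enumerating passwords by likelihood can be illustrated with a minimal character-bigram sketch. This is a toy model with add-alpha smoothing; the sentinel characters, smoothing constant, and assumed alphabet size are our own choices, not those of [13, 48]:

```python
import math
from collections import Counter

START, END = "\x02", "\x03"  # sentinels marking password start/end

def train_bigram(passwords):
    # Count character bigrams (and bigram-prefix occurrences) over a training list.
    pairs, prefix = Counter(), Counter()
    for pw in passwords:
        s = START + pw + END
        for a, b in zip(s, s[1:]):
            pairs[(a, b)] += 1
            prefix[a] += 1
    return pairs, prefix

def log_likelihood(pw, pairs, prefix, alpha=0.1, sigma=100):
    # Smoothed bigram log-probability (sigma = assumed alphabet size).
    # Higher values = more likely = weaker password.
    s = START + pw + END
    return sum(math.log((pairs[(a, b)] + alpha) / (prefix[a] + alpha * sigma))
               for a, b in zip(s, s[1:]))
```

Trained on a leak, such a model assigns common strings like `password` a much higher likelihood than random-looking strings, so candidates can be output in decreasing order of likelihood, as in the guessers above.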
3 PASSWORD STRENGTH METERS

In this section, we discuss password strength meters and how to measure their accuracy.

3.1 Approximating Strength

"Weak" passwords such as passw0rd or abc123 are not insecure per se (e.g., based on some "magical" property they fulfill). They are insecure because they are chosen commonly by humans, and thus an adversary trying to guess passwords will guess those common passwords early in an attack. (Similar observations have been made recently by Wang et al. [65].)

An ideal strength meter, thus, assigns each password its likelihood, e.g., approximated by the relative frequency from a large enough password corpus. However, this straightforward idea is hard to use in practice: The relative frequencies can in principle be accurately approximated for "relatively likely" passwords (cf. [8]), e.g., those that are particularly relevant for online guessing attacks. Estimating frequencies for less likely passwords, relevant for offline guessing attacks, is next to impossible due to the amount of data required. Therefore, practical strength meters should aim at approximating the true strength using compactly representable functions. The traditional LUDS meter allows for a very compact representation (of a few bytes), at the cost of limited accuracy [68], while other approaches based on Markov models [13] or PCFGs [65, 69] have been demonstrated to be more accurate, at the expense of increased storage size.

For the remainder of this work we assume a PSM is a mechanism f that takes as input a password, i.e., a string of characters over an alphabet Σ, and outputs a score or strength value: f : Σ* → ℝ. We assume the score to be a real-valued number. Some meters aim at providing an estimate for the probability of a password (e.g., [13, 34, 65]), i.e., values are in the interval [0, 1]; others aim at estimating the guess number (e.g., [46, 59, 71]), i.e., values are integer-valued; most meters deployed at websites output a textual description of the password strength, e.g., [Too short, Weak, Fair, Good, Strong] for Google's PSM. In this case we convert these textual descriptions to natural numbers between 1 and the number of classes.

PSMs can be either informative, when they are used merely to inform the user about the strength of the password (nudging the user towards more secure choices), or enforcing, when passwords that are considered weak are not accepted by the system. Most deployed systems we analyzed actually fall in the middle, enforcing a certain minimal strength, and informing (and nudging) the user towards more secure passwords beyond those minimal requirements.

3.2 Measuring Accuracy

Accuracy is one of the central factors of PSMs, and several PSMs have been proposed over the past few years. However, little work was done towards a fair comparison of different meters, and even on the question of what constitutes a fair comparison, there is no agreement.

The preferred method to measure the accuracy of a strength meter is to compare it to an ideal reference, measuring the similarity between the reference and the meter output. This idea is based on the intuition that weak passwords are those that are common and have been used before [13, 65, 71]. However, the techniques for comparing reference and tested meters in previous work were ad-hoc and ranged from measures counting overestimation errors to rank correlation metrics. In the following, we will systematically study which measures are most suited for performing this comparison. Specifically, we will show that previously used similarity measures have significant shortcomings limiting their validity and usefulness.

Before discussing specific similarity measures, it is instructive to consider properties that these measures should fulfill. To this end, we specify which differences between the meter and the reference should yield high and low similarity. There is no absolute truth as to which requirements are desirable or not, and for specific applications, there may be additional requirements that are desired. We provide a list of requirements based on extensive experience with passwords and PSMs, and believe it captures requirements suitable for common online use.

By explicitly stating the desired requirements, the selection process becomes much more transparent, and we will see that most previously used similarity measures fail to fulfill even some fundamental requirements, highlighting the importance of a systematic treatment. (Specific test cases derived from these abstract requirements are provided in the following section.)

(1) Tolerance to Monotonic Transformations: The output scores given by strength meters are often not directly comparable. A score can be based on the number of guessing attempts, different forms of entropy, on arbitrary scales like [Weak, Fair, Strong] vs. [Terrible, Weak, Good, Excellent, Fantastic], and other home-brewed measures of strength. Assuming that the underlying sorting of passwords is identical, these differences can be modeled as monotone functions. A good similarity measure should tolerate such monotone transformations and assign high similarity to transformed strength estimations.

(2) Tolerance to Quantization: A particular case of monotone transformations is quantization, e.g., strength meters that divide the reported values into a small number of bins, often three to five. A good similarity measure should tolerate such quantization. Note that with a very low number of bins, e.g., 2 bins [reject, accept], the comparison becomes less meaningful, and scores will typically be low, even for otherwise reasonable measures. In the case of an enforcing PSM, the strength policy becomes particularly interesting. In this case, all passwords that are not accepted effectively end up in the lowest bin (commonly called "Too short," "Too easily guessed," or "Too weak"). The stricter the policy is set, the larger this lowest bin gets, reducing the overall precision. A good similarity measure should tolerate moderately large reject-bins.

(3) Tolerance to Noise: Small deviations in the strength estimations are frequent, based on slight differences in the used models, the training data, or other factors. A good measure should tolerate such minor deviations.

(4) Sensitivity to Large Errors: While small differences don't have a significant effect on the usefulness of a strength meter, large deviations, in particular overestimates, can be harmful. A good measure needs to be sensitive to large variations in strength, even for a small set of passwords.

(5) Approximation Precision: A similarity score is easier to compute, and thus more useful, if it doesn't need full knowledge of the meter. Specifically, strength meters deployed on websites limit the number of samples one can obtain, either through the slowness of the process or through more specific restrictions, like the number of allowed queries. Thus, a good measure should be easy to approximate from a limited number of samples.
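The conversion of heterogeneous meter outputs onto a single real-valued scale f : Σ* → ℝ, as described above, can be sketched as follows. The label set and the branch heuristics are illustrative assumptions (we dispatch on the Python type and value range), not a fixed part of the methodology:

```python
import math

LABELS = ["Too short", "Weak", "Fair", "Good", "Strong"]  # example textual scale

def to_score(output):
    # Map one PSM output onto a common real-valued scale.
    if isinstance(output, str):
        return float(LABELS.index(output) + 1)   # textual class -> 1..k
    if isinstance(output, float) and 0.0 < output <= 1.0:
        return -math.log2(output)                # probability -> surprisal in bits
    return math.log2(output)                     # guess number -> bits
```

Note that any strictly monotone mapping of a meter's output would preserve its ranking of passwords, which is why requirement (1) asks similarity measures to tolerate such transformations.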
4 EVALUATED PASSWORD DATASETS

In this section, we discuss factors that influence password choice, introduce the password datasets that we will use to evaluate a broad selection of PSMs, and describe our reference password distribution which we will use for comparing different accuracy metrics.

4.1 Influencing Factors

When evaluating strength meters one must consider that password strength is contextual and influenced by many factors [4, 23, 50].

(1) Password leaks originating from a single web service follow a distribution partially specific to this site. For example, the password "linkedin" appears in the LinkedIn leak with a probability of 0.12 %, but does not appear in the RockYou leak. In contrast, the password "rockyou" appears with a probability of 0.06 % in the RockYou leak, but only with a probability of 0.000028 % in the LinkedIn leak. Often passwords from a service reflect the category of the service and include the name or semantic theme of the service [67].

(2) Website administrators often enforce password composition policies [42] (e.g., requiring the password to contain a digit or symbol, or to be of a certain length) that force users into choosing different passwords which are compliant with the respective policy.

(3) Florêncio et al. showed that not using any weak passwords, or not reusing some passwords, becomes impossible with a growing number of accounts. If no password manager is used, account grouping and reusing passwords becomes the only viable solution [25]. Given a fixed time-effort budget [5], it is suboptimal to spend the same amount of effort on all accounts. Florêncio et al. [24] proposed to classify accounts into categories from "don't-care" to "ultra-sensitive" based on, e.g., the consequences of account compromise.

(4) A password strength meter might be tuned and more intensively tested with a specific password leak. Specifically, academic meter proposals, which are based on probabilistic password models, require a lot of real-world password data. Some strength meters even include small blacklists of very common passwords.

While it is difficult to avoid all factors, we try to minimize their influence by testing three very different datasets in our experiments that differ by service, policy, and leak date. We selected the datasets to allow easy verification and generate reproducible results based on publicly available data. Our findings are limited to predominantly English-speaking users and their password preferences. To reason about the strength of a password distribution considering a best-case attacker, we provide the Min-entropy H̃∞ as a lower bound and the partial guessing entropy (α-guesswork) G̃α for α = 0.25, as described by Bonneau [8].

4.2 Datasets

An overview of the used datasets, which are described in the following, is given in Table 1:

Table 1: Evaluated Datasets

  Name        Year  Service         Policy¹        H̃∞    G̃0.25
  RockYou     2009  Social Games    5+             6.81   15.89
  LinkedIn    2012  Social Network  6+             7.27   19.08
  000Webhost  2016  Web Hosting     6+ [a-Z][0-9]  9.26   20.69

  ¹ We list the policy active at the time when the data breach happened.

• RockYou: This is a well-established leak used extensively in previous work. 32 million plaintext passwords leaked from the RockYou web service in December 2009 via an SQL injection attack, which means that no bias was introduced. We include RockYou in our evaluation because of its popularity in the community. However, its passwords should be considered relatively weak (G̃0.25 = 16 bits).

• LinkedIn: The social networking website LinkedIn was hacked in June 2012. The full leak became public in late 2016. The leak contains an SQL database dump that includes approx. 163 million unsalted SHA-1 hashes. In the following, we use a 98.68 % recovered plaintext version, resulting in approx. 161 million plaintext passwords. We expect the bias introduced by ignoring 1.32 % of (presumably strong) passwords to be low, as we are mostly interested in passwords whose probability can reasonably be approximated by their count. We include LinkedIn in our evaluation because we consider those passwords to be a reasonable candidate for medium-strong passwords (G̃0.25 = 19 bits).

• 000Webhost: Leaked from a free web space provider for PHP and MySQL applications. The data breach became public in October 2015. The leak contains 15 million plaintext passwords. Based on the official statement, a hacker breached the server by exploiting a bug in an outdated PHP version, which again means that no bias was introduced. We include 000Webhost in our evaluation because of its enforcement of a lowercase-and-digits password composition policy, which results in a different password distribution containing relatively strong passwords (G̃0.25 = 21 bits).

To avoid processing errors in later steps (querying online meters), we cleaned the leaks by removing all passwords that were longer than 256 characters or non-ASCII. This cleaning step removed 0.06 %, 0.09 %, and 0.19 % of the passwords from RockYou, LinkedIn, and 000Webhost, respectively.

4.3 Reference

To reason about various candidate metrics that might be suitable to measure the accuracy of a strength meter, we created a fourth dataset. The dataset only contains the frequent passwords of the LinkedIn leak. We have chosen LinkedIn because it was the largest available leak at our disposal. As has been shown by Bonneau [8] and Wang et al. [65], approximating strength for unlikely passwords is error-prone. To avoid such approximation errors, we limited the LinkedIn file to include only ASCII passwords that occur 10 or more times (count ≥ 10), which resulted in the reference password file containing approx. 1 million unique passwords.

We use the dataset as a) ideal reference and as b) strength meter output. For this, we divided the set into two disjoint sets REF-A and REF-B of about equal size by random sampling. In the following experiments, REF-A will be used as the reference, whereas REF-B will be used as a basis for the test cases and thus simulates the meter output. The experiments described in Section 5.2 operate on the count values; if the password abc123 occurs 36,482 times in LinkedIn, then REF-A and REF-B each include a count value of ∼18,240 for this password.
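The REF-A/REF-B split can be sketched as a per-occurrence random split of each password's count. This is our own simplified reading of the sampling step (assigning every individual occurrence to one side with probability 1/2); the paper's exact procedure may differ:

```python
import random

def split_reference(counts, seed=42):
    # Split each password's occurrence count into two halves (REF-A, REF-B)
    # by assigning every single occurrence to one side at random.
    rng = random.Random(seed)
    ref_a, ref_b = {}, {}
    for pw, n in counts.items():
        a = sum(1 for _ in range(n) if rng.random() < 0.5)  # ~ Binomial(n, 1/2)
        ref_a[pw], ref_b[pw] = a, n - a
    return ref_a, ref_b
```

For a password with count 36,482 this yields roughly 18,240 occurrences on each side, matching the abc123 example above.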

In Section 5.3 we report on the reliability of this reference by performing additional tests that include uncommon passwords, as well as the other leaks (RockYou and 000Webhost).

5 SIMILARITY MEASURES

In this section, we describe the process of selecting a suitable similarity metric.

5.1 Test Cases

An overview of the test cases described below is given in Table 2.

5.1.1 Monotonic Transformations. We prepared several cases to test a measure's tolerance to monotonic transformations. DOUBLE: For this test case we double the count values in REF-B. This represents the case that two strength meters use a different scale (e.g., one sets the cutoff for the Strong class at a different threshold than the other). This would naturally occur when two strength meters use the expected time to crack a password (such as zxcvbn [71]) but assume different speeds of the cracking hardware. HALF: For this test case we halve the values in REF-B before applying the measure to calculate the similarity with the ideal strength meter. LOG: For this test case we take the logarithm to base 2 of the count values in REF-B. This occurs naturally when one strength meter reports strength in "expected number of guesses," and one in "bits of entropy." SQR/SQRT: Further, we added test cases by applying the square operation and the square root to REF-B, respectively.

5.1.2 Quantization. A substantial fraction of online meters uses binned output. Thus, such test cases are highly relevant in practice. Q4-equi/Q10-equi: For this test case we use quantization into four/ten bins, with about the same number of passwords per bin (counting with multiplicities). Q4-alt/Q10-alt: Similar to the test case above, we use quantization into four/ten bins, but in this case splitting into bins of equal size based on unique passwords (without counting multiplicities).

5.1.3 Disturbances. We have a number of test cases testing the tolerance and sensitivity to disturbances in the data. RAND: We use random values drawn from a uniform distribution between 1 and the maximum count value. This test case can be seen as a calibration of low similarities, as any matching only happens by chance. ADD-RAND: We add small random disturbances to REF-B, drawn according to a uniform distribution between 1 and the respective count of a password. INV-WEAK-5: We modify the weakest 5 % of passwords (with multiplicities) by setting their usually very large count to 0 (i.e., we invert their scoring to very strong). INV-STRONG-5: We modify the strongest 5 % of passwords (with multiplicities) by setting their usually very small count to the maximum count value (i.e., we invert their scoring to very weak).

5.2 Testing Different Metrics

Next, we describe a number of similarity measures and evaluate them on the test cases defined above to understand their properties and usefulness. The results are shown in Table 2; we will discuss these results in depth in the remainder of this section.

5.2.1 Correlation. A straightforward way to measure similarity, which has been used in most prior work, is the correlation between the reference and the observed values.

Pearson Correlation Coefficient: Probably the best-known correlation measure. It is defined as the covariance divided by both standard deviations. Pearson correlation has several problems as a similarity measure for PSMs: First, it is sensitive to monotonic transformations (e.g., the correlation of REF-A and LOG is 0.13), which is undesirable. Even worse, it is highly sensitive to quantization (which we typically encounter for most web-based meters); the correlation between REF-A and the quantized versions Q4-equi/Q10-equi/Q4-alt/Q10-alt is close to zero (between approximately 0.1 and 0.05). Another issue is that it does not capture well the case INV-STRONG-5, where 5 % of strong passwords are given a weak score (arguably not a big problem at all), yet the similarity drops to around zero (−0.02). Two properties of Pearson correlation underlie this undesirable behavior. First, it is a parametric measure and is computed from the given values (instead of, e.g., from ranks, as Spearman correlation is), which makes it sensitive to non-linear transformations of the data. Second, it gives each data point equal weight, even though the weak passwords have a much higher count (by definition); thus Pearson correlation weights deviations for strong passwords more strongly, relatively speaking.

Spearman Rank Correlation Coefficient: Defined as Pearson correlation over the ranks of the data. Thus it is based on the ranks of the (sorted) data only. Spearman correlation has been used by previous work on password strength [13, 65]. Spearman is robust against monotonic transformations and quite tolerant to quantization, which is an improvement over Pearson. Still, it gives too much weight to strong passwords, similarly to Pearson correlation. One additional problem is visible for Spearman: the correlation between REF-A and REF-B should be (close to) 1, as we expect perfect correlation; however, Table 2 shows a correlation of 0.73. The underlying reason is again the missing weights, which leads to the situation that the strong passwords dominate the similarity score (around 50 % of passwords have a count of less than 20 in REF-A), and the (small) errors from sampling on those strong passwords pull the score from 1 (what would be expected) down to around 0.7.

Kendall Rank Correlation Coefficient: Kendall's tau coefficient is quite similar to Spearman correlation, but conceptually simpler (it only takes into account whether ranks are wrong and in which direction, but not how big the difference is). Previous work, in fact, showed very similar results for Spearman and Kendall [65]. However, it has the disadvantage that naïve implementations (as in standard R) are computationally expensive for larger samples, requiring O(N²) operations. While Kendall is expected to be robust to monotonic transformations, a problem similar to Spearman's reduces the correlation to 0.56. Furthermore, adding randomness (ADD-RAND) introduces enough variation to reduce the similarity to 0.54, and the impact of quantization is stronger than for Spearman.
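The contrast between Pearson and Spearman discussed above can be reproduced on a toy example. These are self-contained helper functions of our own (no tie handling, so this sketch assumes distinct values):

```python
import math

def pearson(xs, ys):
    # Covariance divided by both standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Pearson correlation computed over the ranks of the data.
    def ranks(vs):
        order = sorted(range(len(vs)), key=vs.__getitem__)
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return pearson(ranks(xs), ranks(ys))

counts = [1000, 500, 100, 20, 5]        # count values as in REF-B
logs = [math.log2(c) for c in counts]   # the LOG test case
```

Here `spearman(counts, logs)` is exactly 1, since the monotone logarithm leaves all ranks unchanged, while `pearson(counts, logs)` drops to about 0.87 — the sensitivity to monotonic transformations criticized above.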

5.2.2 Weighted Correlation. One common problem with the above correlation measures is that they treat frequent and infrequent passwords as equally weighted data points, i.e., an error in a single infrequent password is rated equally to an error in a frequent password, which influences many more accounts. Weighted correlation measures give specific weights to the data points, which we take to be the frequency in the reference dataset. (To the best of our knowledge, neither weighted Pearson nor weighted Spearman correlation has been used to compare PSMs before.)

Table 2: Comparing REF-A with modified REF-B using various similarity measures.

                        (Weighted) Correlation Metrics        (Weighted) Mean Error Metrics                         (Weighted) One-Sided/Pairwise Error Metrics
Test Cases   Sim.  Pear.  Spear. Kend.  wPear. wSpear.  MAE     MSE     rMAE  rMSE  wrMAE  wrMSE   wrLAE  wrLSE    PE    PE-5  PU    wPE   wPE-5  wPU
REF-B        H     1.00   0.73   0.56   1.00   0.99     4       54      0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
Monotonic transformations:
DOUBLE       H     1.00   0.73   0.56   1.00   0.99     28      300402  0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
HALF         H     1.00   0.73   0.56   1.00   0.99     14      75703   0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
LOG          H     0.13   0.73   0.56   0.49   0.99     24      301526  0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
SQR          H     0.93   0.73   0.56   0.99   0.99     3.E+05  7.E+16  0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
SQRT         H     0.45   0.73   0.56   0.96   0.99     23      300014  0.16  0.05  2.77   13      1.39   7        1.00  0.68  0.95  0.96  0.24   0.99
Quantization:
Q4-alt       H     0.05   0.89   0.78   0.08   0.73     25      301711  0.10  0.02  5.41   18803   4.34   18799    1.00  0.69  0.75  1.00  0.81   0.75
Q10-alt      H     0.06   0.91   0.80   0.09   0.86     22      301465  0.09  0.01  2.79   3006    1.86   3003     1.00  0.54  0.90  1.00  0.20   0.90
Q4-equi      H     0.12   0.72   0.61   0.21   0.97     26      301766  0.16  0.04  3.76   29      2.58   25       1.00  0.78  0.36  1.00  0.36   0.75
Q10-equi     H     0.11   0.90   0.80   0.25   0.99     25      301607  0.09  0.02  1.83   6       0.96   3        1.00  0.57  0.71  1.00  0.19   0.90
Disturbances:
RAND         L     0.00   0.00   0.00   0.12   0.04     3.E+05  9.E+10  0.33  0.17  22.78  1.E+05  20.37  1.E+05   1.00  0.90  1.00  1.00  0.92   1.00
ADD-RAND     H     0.99   0.70   0.54   1.00   0.99     15      29799   0.17  0.05  3.09   16      1.62   9        1.00  0.70  0.97  0.97  0.26   0.99
INV-WEAK-5   M     0.25   0.73   0.56   -0.04  0.70     6       283087  0.16  0.05  5.52   1.E+06  4.13   1.E+06   1.00  0.68  0.95  1.00  0.29   0.99
INV-STRO-5   M     -0.02  -0.13  0.01   0.50   0.72     1.E+05  7.E+10  0.37  0.19  14.93  28742   12.89  28727    1.00  0.96  0.92  1.00  0.99   0.98
Sim.: Expected similarity with REF-A; L = Low similarity, M = Medium similarity, H = High similarity.
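To make the weighted measures in Table 2 (wPear./wSpear.) concrete, here is a minimal self-contained sketch of weighted Pearson and weighted Spearman correlation. This is our own illustration; the paper itself uses R's wCorr package:

```python
import math

def ranks(xs):
    """Ranks of the values, with ties assigned the average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def weighted_pearson(x, y, w):
    """Pearson correlation with per-data-point weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def weighted_spearman(x, y, w):
    """Weighted Pearson correlation on the ranks."""
    return weighted_pearson(ranks(x), ranks(y), w)

ref = [500, 120, 30, 8, 3, 1]              # hypothetical reference counts
log_meter = [math.log(c) for c in ref]     # LOG: a monotonic transformation
print(weighted_spearman(ref, log_meter, ref))  # close to 1.0: rank order preserved
```

As in the paper, the reference counts double as the weight vector, so errors on frequent (weak) passwords dominate the score.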

Weighted Pearson Correlation: Is defined as (normal) Pearson correlation but weighting each data point with a weight vector, where we use the reference (REF-A) as the weight vector. This similarity measure exhibits similar problems as unweighted Pearson correlation for monotonic transformations, as expected, even though the effect is less pronounced, and remains highly sensitive to quantized data.

Weighted Spearman Correlation: Is defined as weighted Pearson correlation on the ranks. It is the most promising similarity measure considered so far. As ordinary (unweighted) Spearman correlation, it handles monotonic transformations well. It gives a correlation close to 1 to the test case REF-B (and the other monotone transformations, including most quantized test cases, which was problematic before), as now the weights prevent the over-representation of strong passwords. It also handles the INV-WEAK-5 and INV-STRONG-5 cases well, where it assigns roughly the same correlation of around 0.7 to both cases, a moderate but noticeably lower value than 1.

5.2.3 Mean Error. Another set of similarity measures is mean squared error (MSE) and related concepts. We tested variations, inspired by the above results and techniques used in previous work.

Mean Absolute Error (MAE): Is defined as the average absolute error, with equal weight for each data point. A similar measure was used recently [71], where a logarithmic error was used. Our test cases reveal the following problems: It is highly sensitive to monotonic transformations, even linear ones (and previous work [71] needed to adapt the scales of the meters to get a reasonable comparison). Large deviations in the rating for single passwords only have moderate impact on the similarity (due to taking absolute errors only). Its sensitivity to deviations in frequent passwords is low (the error for INV-WEAK-5 is 6, only marginally larger than the error due to random sampling, REF-B, with an error of 4).

Mean Squared Error (MSE): Is defined as the average over the squared error, giving more weight to large deviations. Properties of MSE are very similar to those of MAE.

Ranked Mean Absolute/Squared Error (rMAE/rMSE): Here we first rank the data (assigning ties the average rank), and compute the MAE or MSE of the ranks. As expected, the resulting measures are resistant to monotonic transformations. However, as they are non-weighted, they fail to capture errors for few frequent passwords (INV-WEAK-5). This means, in the badly performing PSM test case (INV-WEAK-5), both rMAE and rMSE fail to show any difference to the reference, making them unsuitable.

5.2.4 Weighted Mean Error. All error measures discussed in the previous subsection are unweighted, and thus fail to capture errors in few frequent passwords. In this subsection, we consider weighted variants.

Weighted and Ranked Mean Abs./Sq. Error (wrMAE/wrMSE): When we use both ranked and weighted data points, the resulting similarity measure becomes more discriminative, e. g., it allows to distinguish the INV-WEAK-5 and DOUBLE test cases. Both measures work very well on our test cases (remember that lower values mean more similarity) and seem to be a reasonable choice.

5.2.5 One-Sided Errors. As described before, password strength approximations can be under- or overestimates. Previous work [46, 71] observed that a meter underestimating the security of strong passwords (e. g., INV-STRO-5) is less problematic than overestimating the strength of weak passwords (e. g., INV-WEAK-5). The former results in a user simply selecting another (presumably secure) password, whereas in the latter case the user believes having selected a secure password, where in reality it is weak.

Weighted and Ranked Mean Abs./Squared One-Sided Lower Error (wrLAE/wrLSE): One can define versions of MAE/MSE that only take one-sided errors into account. If this measure operates on count values, this approach favors meters that generally underestimate security: a meter that rates all passwords insecure (i. e., a high count value) will get a high rating. This can be prevented by operating on ranked data. On the tested datasets and test cases the resulting measures wrLAE/wrLSE perform similarly to their two-sided versions wrMAE/wrMSE. A likely explanation is that wrLAE/wrLSE operate on ranked data. Therefore, overestimating the strength of one password generally leads to underestimating the strength of another password. (Weights and squaring differences (wrLSE) mean that the results still can differ; however, these effects seem to even out on the dataset that we considered.) For applications that call for one-sided metrics, one should consider non-ranked similarity metrics at the cost of losing the ability to tolerate monotonic transformations.

5.2.6 Pairwise Errors. In preliminary tests, we observed that several similarity measures give a low similarity score to quantized data. This behavior is undesirable, as heavy quantization loses information about the distribution. We tried to address this problem by designing a similarity score that is based on two individual metrics: an error metric which describes how many passwords are not in the "correct" order, and a utility metric which describes if the meter provides "useful" and discriminative output. (To illustrate this problem, consider a strength meter with binary output, where only a few very strong passwords are "accepted," and the other passwords are "rejected." This meter would have a low error rating, as it mostly preserves the order, but a low utility rating, as most passwords are in the same bin.) This mechanism is based on the rank. We evaluated several variants of this basic idea.

Pairwise Error/Utility Rate (PE/PU): These consider the relative ranking of all pairs of passwords. PE considers the fraction of pairs where the meter and the reference disagree (where a tie in one of the two is not counted as a disagreement), whereas the PU considers the fraction of pairs where the meter sees no tie. (A meter outputting the same strength for all passwords, i. e., using a single bin, has a PE of 0, but also a PU of 0.)

Pairwise Error Rate More Than 5 % (PE-5): As small deviations are typically considered a non-problem, for this variant we tolerate any deviation that is less than 5 % (in terms of rank) and do not count it towards the error.

5.2.7 Weighted Pairwise Errors. We have argued before that unweighted measures, not taking into account the specific probabilities of passwords, systematically bias results.

Weighted Pairwise Error/Utility Rate (wPE/wPU) / Weighted Pairwise Error Rate More Than 5 % (wPE-5): We use weighted versions of the three measures introduced before, where we weight each pair with the product of the probabilities of the two passwords.

Implementation: All measures are implemented using R v3.4.4 (March 2018). For Pearson and Spearman, we use standard R. For weighted Pearson and Spearman we use the wCorr package². For calculating the Kendall correlation, we use an O(n log n) optimized version from pcaPP³.

5.3 Reference Validation
To confirm our findings and test the reliability of our reference, which is based on the common LinkedIn passwords, we repeated our analysis using RockYou and 000Webhost. The leaks are different in size; thus the resulting numbers of passwords tested were different. While the reference had approx. 1 million passwords that occurred 10 or more times, RockYou only includes 250,000 and 000Webhost 62,000 unique passwords. Across different leaks we observed only minor differences. The tendencies for correlation, mean error, one-sided error, and pairwise error metrics, which can be observed in Table 2, remain the same independent of the tested password leak. For example, for the three leaks the wSpear. metric results vary only around ±0.04 across all test cases.
Furthermore, we repeated our tests with a LinkedIn set that included uncommon passwords (count ≥ 2). Including uncommon passwords is expected to be more error-prone [8, 65]. While the common variant included approx. 1 million passwords, the uncommon version consisted of 31 million unique passwords. However, our results show that the tendencies from Table 2 remain the same. For example, for the uncommon variant the wSpear. metric results vary around ±0.07 across all test cases.
To summarize, in those additional tests we found only minor differences in the behavior of the similarity measures across password leaks. Moreover, including uncommon passwords had a bigger, albeit overall negligible, impact on the results.

5.4 Recommendation
We report results for 19 candidates for a similarity measure. We considered 5 correlation-based similarity measures, 6 variants that are mean absolute/squared error based, as well as 8 one-sided/pairwise error metrics, and evaluated them on a number of test cases. Those tests included commonly observed cases like logarithmic transformation and quantization, but also meters that incorrectly judge strength, simulated via disturbances.
We have seen that measures that are not weighted largely fail to capture essential aspects of the (highly skewed) distributions of passwords. Consequently, sensible measures should be weighted. Furthermore, we observed that measures based not on rank (but rather on values) are generally too sensitive to monotonic transformations and quantization to be useful.
In our evaluation the metrics wSpear., wrMAE, wrMSE, wrLAE, wrLSE, and wPE-5/wPU are weighted and ranked metrics that performed well and seem suitable as comparison metrics. For the remainder of this work we have selected weighted Spearman correlation. It is not perfect, especially on quantized output, and it does not differentiate between under- and overestimating strength, but it performed well on most test cases. Furthermore, it is a standard metric and easy to interpret, relatively good to approximate from sampled data (cf. Section 5.5), and implementations are easily available. Also, (unweighted) Spearman correlation has been used before to evaluate strength meters [13, 65].

5.5 Sampling
Collecting data from online sources is often cumbersome (e. g., previous work [16] that evaluated data from (online) password strength meters went through great effort to collect large amounts of data). Therefore, we want to determine confidence intervals for our measures to select the amount of data we need to collect. Determining accurate bounds is non-trivial, and to the best of our knowledge, no bounds are known that are applicable to our problem.
We determine empirical confidence intervals for the weighted Spearman measure (as it was the most promising one in the previous section) by repeated sub-sampling from the reference REF-A and the test cases. We sample subsets of varying sizes, ranging from 100 to 10,000, and compute the similarity on those subsets, using the full data available to determine the strength score of the reference.

²Package: wCorr (Weighted Correlations), Version 1.9.1, May 2017, https://cran.r-project.org/package=wCorr, as of September 10, 2018
³Package: pcaPP (Robust PCA by PP), Version 1.9-73, January 2018, https://cran.r-project.org/package=pcaPP, as of September 10, 2018

Table 3: (Empirical) confidence intervals for REF-A vs. Q4-equi/LOG for different sample sizes and the weighted Spearman similarity measure. All are determined using 10,000 iterations, and a 5 % confidence level. Given is the width of the confidence interval, as well as the boundaries (in brackets).

# Samples   100                      500                      1000                     5,000                    10,000
Q4-equi     0.146 [ 0.852 , 0.998 ]  0.069 [ 0.916 , 0.985 ]  0.044 [ 0.928 , 0.972 ]  0.022 [ 0.944 , 0.966 ]  0.027 [ 0.940 , 0.966 ]
LOG         0.081 [ 0.919 , 1.000 ]  0.033 [ 0.966 , 0.999 ]  0.024 [ 0.974 , 0.998 ]  0.013 [ 0.983 , 0.996 ]  0.011 [ 0.985 , 0.997 ]
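The sub-sampling procedure behind Table 3 can be sketched as follows. This is a simplified, self-contained illustration of our own: toy data and plain, unweighted Spearman correlation stand in for the paper's reference datasets and weighted variant.

```python
import random

def spearman(x, y):
    """Plain Spearman correlation (the toy data below has no ties)."""
    def rk(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order):
            r[i] = pos + 1
        return r
    rx, ry = rk(x), rk(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def empirical_ci(ref, meter, sample_size, iterations, rng, level=0.05):
    """Repeatedly sub-sample, compute the similarity on each subset,
    and report the width and bounds of the central (1 - level) interval."""
    sims = []
    idx = range(len(ref))
    for _ in range(iterations):
        s = rng.sample(idx, sample_size)
        sims.append(spearman([ref[i] for i in s], [meter[i] for i in s]))
    sims.sort()
    lo = sims[int(level / 2 * iterations)]
    hi = sims[int((1 - level / 2) * iterations) - 1]
    return hi - lo, (lo, hi)

rng = random.Random(7)
ref = [rng.random() for _ in range(2000)]        # toy reference scores
meter = [r + rng.gauss(0, 0.1) for r in ref]     # noisy meter scores
width, (lo, hi) = empirical_ci(ref, meter, 200, 500, rng)
```

As in Table 3, larger sample sizes generally shrink the interval; the sketch only captures random sampling error, not systematic error.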

We repeat this process 10,000 times and determine the interval that contains 95 % of all similarity values (with 2.5 % larger and 2.5 % smaller). We report both the width of the interval and the actual interval. We perform this process for two different datasets, namely for Q4-equi and for LOG. While this does not give a formal guarantee that the actual value can be found in this interval, it gives us reasonable confidence and determines rough boundaries. Note that this process only takes into account (random) errors caused by sampling; it does not take into account any systematic errors that may be introduced, e. g., by the smaller sample size. The summary of results is shown in Table 3.
An example of a histogram of the correlation values is given in Figure 1, which was determined for weighted Spearman correlation with 10,000 samples and 10,000 iterations. The histogram has a median of 0.990, min./max. of 0.980/1.000, and 2.5 %/97.5 %-percentiles of 0.985/0.997, resulting in a width of the 95 % confidence interval of 0.011. We see that, as expected, the width of the confidence interval decreases with increasing sample size. For weighted Spearman, we find widths of 0.027 and 0.011, respectively, and we will assume differences greater than 0.05 to be significant.

Figure 1: Histogram example for the monotonic transformation error LOG using weighted Spearman correlation, 10,000 samples, and 10,000 iterations. (Axes: Weighted Spearman Correlation vs. Frequency.)

6 EVALUATION
Guided by practical requirements, we distinguish between two different application scenarios.
(1) PSMs deployed to protect online accounts, i. e., the most prevalent online logins for social networks, email providers, etc. For online accounts, the operator can and should implement measures to limit the effectiveness of online guessing attacks, such as rate-limiting. Typically one considers between 100 and 1000 allowed guesses within 30 days [9, 28, 31, 65, 66].
(2) Strength meters deployed to protect local authentication, such as hard disk encryption. In this scenario the number of guesses the adversary can test is only limited by the computational power of the adversary; in real-world attacks, the number of guesses per day on a single GPU is in the order of 10⁹ to 10¹² guesses [30]; some even consider up to 10¹⁴ guesses to be reasonable [26].

6.1 Online Guessing
For online account PSMs, techniques such as rate-limiting can reduce the risk of online guessing attacks. Based on previous work [65, 66], which describes 1000 guesses as a reasonable limit an attacker can perform in an online guessing attack, and based on the assumption that the attacker acts rationally and guesses the most likely passwords first, we deduce that the most interesting set of passwords relevant for this kind of attack is the most likely 10,000 passwords. If each user omits passwords from the "easier half" of this set, then overall security will greatly be improved.
Sampling strength meter scores for these 10,000 passwords of all three datasets (RockYou, LinkedIn, 000Webhost) would put an unnecessary burden on the server infrastructure, and might even trigger server-side alerts. Given the results on sampling accuracy in Section 5.5, we try to avoid such implications by restricting ourselves to 1000 samples from these online services: i. e., out of the 10,000 most common RockYou passwords, we uniformly at random sample 1000 passwords. We repeat this process for LinkedIn and 000Webhost, respectively, to obtain three different online guessing datasets. For the most likely passwords, we have very accurate frequency estimates. So for those common passwords, we can use the sample frequency as a ground truth for their strength in an online attack.

6.2 Offline Guessing
For offline guessing attacks, there is no limit on the number of attempts an attacker can perform, depending on the computing capabilities and the password hashing function deployed for storing the password. Consequently, the sample frequency in the datasets does not provide useful information about the strength in an offline attack, as the number of guesses (by far) exceeds the size of the dataset. Instead, we use the performance of common guessing tools as the reference. More specifically, we use the results of the Password Guessability Service (PGS) [62], which allows researchers to send lists of passwords (in plaintext); the service evaluates when these passwords will be guessed by common configurations of widely used password guessing tools. Work by Ur et al. [62] found that the attribute min_auto is a good measure of resistance to guessing attacks, even with human password experts involved. We use this recommended configuration without a password composition policy (1class1) as the ground truth for the offline guessing evaluation. For each of the three datasets (RockYou, LinkedIn, 000Webhost) we sampled 10,000 passwords to obtain the three different offline guessing datasets.

6.3 Selected Meters
Academic Proposals. We considered different meter proposals from the literature.
• Heuristic/NIST: In 2004 the NIST published SP 800-63 Ver. 1.0 [11], which includes heuristics based on length and compliance to a composition policy for entropy estimation. The heuristic also considers a bonus if the password passes a common dictionary check. The latest version, SP 800-63B [31] from June 2017, no longer includes the ad-hoc heuristic.
• Markov Model/OMEN: In 2012 Castelluccia et al. [13] proposed to train n-gram Markov models on the passwords of a service to provide accurate strength estimations. The estimation is thus based on the probabilities of the n-grams a password is composed of. The meter provides adaptive estimations based on a target distribution but is limited to server-side implementations.
• Heuristic/Comp8: In 2012 Ur et al. [61] investigated how PSMs can be used to nudge users towards stronger passwords. They outlined a scoring algorithm derived from a composition policy called "Comprehensive 8." Due to the lack of better alternatives, this scoring function was used to estimate the strength of a password. While this LUDS-based approach should no longer be used, we include it for completeness.
• Heuristics/zxcvbn: In 2012 Daniel Wheeler proposed a PSM in a Dropbox Inc. blog post. It is based on advanced heuristics that extend the LUDS approach by including dictionaries, considering leetspeak transformations, keyboard walks, and more. Due to its easy-to-integrate design, it is deployed on many websites. The meter was recently backed up by scientific analysis [71].
• PCFG/fuzzyPSM: In 2012 Houshmand and Aggarwal [34] proposed a system to analyze the strength of a password. For this, they used a PCFG-based approach. In 2016 Wang et al. [65] extended the concept by proposing a fuzzy PCFG to model password strength, based on which mangling rules are required to modify a basic dictionary to match a training distribution of stronger passwords.
• Heuristic/Eleven: In 2013 Egelman et al. [22] studied password selection in the presence of PSMs. For their meter, they decided to use a similar metric for strength as NIST. Similar to LUDS approaches, the meter considers character set sizes and length.
• RNN/DD-PSM: In 2016 Melicher et al. [46] proposed to use a recurrent neural network for probabilistic password modeling. For our analysis, we use the guess number estimations provided by the RNN. The authors also describe a method that allows a client-side implementation using a special encoding and a Bloom filter. In 2017, Ur et al. [59] extended the concept by adding data-driven feedback using 21 heuristics that try to explain how to improve the password choice. We use Ur's website⁴ for additional measurements.
• Heuristic/LPSE: In 2018 Guo et al. [32] proposed a lightweight client-side meter. It is based on cosine-length and password-edit distance similarity. It transforms a password into a LUDS vector and compares it to a standardized strong-password vector using the aforementioned similarity measures.

Password Managers. We also tested meters that protect high-value encrypted password vaults. If no further protection mechanism is deployed [27], especially the security of cloud-based password managers depends on the use of a high-entropy secret. Thus, service providers need to give accurate strength estimates. Furthermore, vaults that offer the ability to store user-chosen credentials might show strength estimates for their stored secrets, too. For our work, we analyzed the PSMs of 11 popular password managers, including 1Password [2], Bitwarden [1], Dashlane [14], Enpass [55], KeePass [51], Keeper [39], LastPass [43], and more.

Popular Websites. We queried password strength meters from popular web services within the top 100 ranking published by Alexa Internet. Our samples include large sites like Apple, Baidu, Dropbox, Google, reddit, Twitter, Sina Weibo, Yandex, and more. For better comparability, we tried to include sites that were queried in previous work by de Carné de Carnavalet and Mannan [16]. The authors published their findings on all the meter codes via a "Password Multi-Checker Tool" on a self-hosted website [17], allowing one to compare their results with the currently implemented versions.

Operating Systems. We analyzed password strength meters from standard operating systems. Microsoft's Windows and Apple's macOS do not provide a strength estimation during account creation. However, Apple includes a Password Assistant with a strength estimation functionality that is used by the Keychain Access application while generating passwords, e. g., for file encryption. Canonical's Ubuntu distribution shows a PSM during the account creation and hard disk encryption setup. It is part of the graphical live CD installer Ubiquity and based on Mozilla's Seamonkey PSM function.

6.4 Querying Meters
To query the website PSMs, we used similar techniques as previous work [16]. For JavaScript and server-side implementations, we used the Selenium framework [35] to automate a headless Google Chrome browser. As all JavaScript (that involves no server communication) is evaluated on the client only, one can obtain large quantities of strength estimates in a short period. The browser automation approach allows executing JavaScript; thus intermediate results that are not displayed in the GUI of the meter, but exist in the Document Object Model (DOM), are accessible, too.
For the academic proposals, a more involved approach was required, as some meters require training or no implementation was available. For the training, we sampled 10 million passwords from each sanitized dataset (excluding the respective offline and online passwords). Please note, not all meters make full use of all available training data; e. g., fuzzyPSM, OMEN, and the RNN-based approach have specific requirements. For example, the Markov model approach used by OMEN does not allow training passwords shorter than the n-gram size. Similarly, fuzzyPSM does not require a training corpus larger than 1 million passwords. For the RNN-based approach we were forced to limit the training set to passwords no longer than 30 characters.

⁴Data-Driven PSM: https://cups.cs.cmu.edu/meter/, as of September 10, 2018

Table 4: We computed the weighted Spearman correlation as the best similarity score (cf. Section 5). The table lists the online use case on the left, the offline use case on the right. We highlighted if and on how many bins a meter quantized its output and list whether a meter runs on client- or server-side.

                                                  Online Attacker                    Offline Attacker
ID  Meter                        Type  Quant.  RockYou  LinkedIn  000Webhost  RockYou  LinkedIn  000Webhost
1A  Comprehensive8 [61]          C     -       -0.652   -0.589    0.251       -0.476   -0.616    0.441
2   Eleven [22]                  C     -       0.670    0.912     0.492       0.755    0.951     0.733
3   LPSE [32]                    C     Q3      0.584    0.669     0.508       0.544    0.718     0.693
4C  Markov (Multi) [27]          S     -       0.721    0.998     0.902       0.997    0.995     0.777
5B  NIST (w. Dict.) [11]         C     -       0.669    0.910     0.472       0.756    0.953     0.816
6   PCFG (fuzzyPSM) [65]         S     -       1.000    0.994     0.963       0.998    0.999     0.899
7C  RNN Target [46]              C     -       0.951    0.913     0.965       0.896    0.860     0.885
8A  zxcvbn (Guess Number) [71]   C     -       0.989    0.990     0.554       0.989    0.999     0.868
Type: C = Client, S = Server; Quantization: Q3–Q6 = Number of bins, e. g., Q5 = [Terrible, Weak, Good, Excellent, Fantastic].

For Comp8, LPSE, and fuzzyPSM we contacted the authors, who kindly shared their source code or evaluated their implementation and shared the results with us. For the RNN PSM, we used an implementation by Melicher et al. [45]. We tested multiple variants: i) based on the guess numbers of a pre-trained (generic) password distribution; ii) based on a client-side JavaScript implementation (using a different password composition policy) [58]; iii) based on the guess numbers of a self-trained RNN using a matching distribution (targeted), following the recommended construction guidelines; iv) based on a self-trained RNN using a matching distribution (targeted) including a Bloom filter made of the top 2 million training set passwords (following the recommendations in the original paper [46]). For the Markov approach, we modified a version of the Ordered Markov ENumerator (OMEN) [3] by Dürmuth et al. [21] (a password guesser implementing the approach of Castelluccia et al. [13]) and used the aforementioned training set to obtain strength estimates. As this approach uses quantized (level-based) strength estimates only, we also implemented an approach described by Golla et al. [27] that outputs probabilities instead of quantized scores and increases precision by training one model per password length. For zxcvbn, we used the official JavaScript implementation [20]. For NIST we used a JavaScript implementation of the meter [15]. We built a dictionary consisting of the top 100,000 passwords from Mark Burnett's 10 million password list [10], which has been used as a blacklist by previous work [33].
For most of the password managers (1Password, Bitwarden, Keeper, etc.), we were able to query their respective web interface version using Selenium. For RoboForm and True Key we automated the respective Chrome extensions using Selenium. For KeePass, we used the KPScript plugin on Windows [52]. For Dashlane we used the Appium framework [38] and its Windows Driver to automate the Windows desktop client. While analyzing Enpass's PSM we found the use of the official zxcvbn implementation [54] (including the same dictionaries); thus we did not query Enpass and instead report the zxcvbn results.
For the operating systems, we queried Ubuntu's PSM using the original Python script from the Ubiquity source code [12]. For macOS we used PyObjC [49], a Python Objective-C bridge, to query Apple's Security Foundation framework.

7 RESULTS
Next, we present and discuss the results of the evaluation of the different meters, both for the online and offline use case. Some of the academic PSM results are summarized in Table 4. We present the results for the online use case on the left, and for the offline use case on the right. We report results for both use cases for all strength meters, even though some are designed for one specific use case only (e. g., password meters deployed on websites are intended for the online use case). We computed the weighted Spearman correlation as the best similarity score selected in Section 5.
The full table for all 81 password strength meter variations can be found in Appendix A. The primary results are separated into four categories (academic proposals, password managers, operating systems, and websites). A fifth category is included for comparison and is based on the "Password Multi-Checker Tool" [17] by previous work [16]. A version of our results that allows an easier comparison, provides bar charts of the quantizations, and more can be found online [29]. Please note, not all tested mechanisms, like ID: 5A/B NIST or ID: 33 Have I Been Pwned?, are intended to be used as a strength meter. Thus, the reported results for those estimators cannot be directly compared with others, as their parameters can likely be augmented to perform better.

7.1 Overall Performance
The three best-performing academic meters are ID: 6 fuzzyPSM (0.899 − 1.000), ID: 7C RNN Target (0.860 − 0.965), and ID: 4C Markov (Multi) (0.721 − 0.998) for both online and offline use cases. A number of other PSM variants perform well, including ID: 8A zxcvbn (Guess Number) (0.554 − 0.999).
For password managers we found ID: 13A KeePass and ID: 14B Keeper to be accurately ranking meters (0.284 − 0.884). Furthermore, we found the zxcvbn-based ID: 17B RoboForm to be precise (0.528 − 0.962). Across the binning PSMs we found some of the zxcvbn (Score)-based meters, e. g., ID: 17A RoboForm (Q4), ID: 17C RoboForm Business (Q6), ID: 18 True Key (Q5), and ID: 12 Enpass (Q5), to be accurately ranking (0.341 − 0.827). The PSM in ID: 10A Bitwarden shows significant problems. Similar, but less pronounced, are the inaccuracies of ID: 16B LogMeOnce and ID: 19A Zoho Vault. All three are LUDS-based meters.

[Figure 2: three matrices of password counts, MinGuess Number ranges (≥ 1e0 up to ≥ 1e10) vs. meter bins I–V. Panels: (a) ID: 13B (KeePass), bins Very Weak–Very Strong, wSpear: 0.002; (b) ID: 8B (zxcvbn, Score), bins 0–4, wSpear: 0.567; (c) ID: 41B (Twitter), bins Obvious–Very Strong, wSpear: 0.665.]

Figure 2: Number of passwords per bin: An accurate and correctly binning PSM produces a diagonal green line. A meter that only assigns the weak/strong bin is visualized via a vertical bar on the left (purple)/right (red). Quantization degrades the precision: if bin thresholds are incorrectly chosen (cf. Figure 2(a)), the relative ranking is lost and the correlation degrades.
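The loss of discriminativeness that binning causes can be quantified with the pairwise error/utility rates from Section 5.2.6. The following is a minimal unweighted sketch of our own, with hypothetical toy scores:

```python
from itertools import combinations

def pairwise_rates(reference, meter):
    """PE: fraction of pairs the meter ranks opposite to the reference
    (a tie on either side does not count as a disagreement).
    PU: fraction of pairs the meter distinguishes (does not tie)."""
    disagree = meter_ties = total = 0
    for (r1, m1), (r2, m2) in combinations(zip(reference, meter), 2):
        total += 1
        if m1 == m2:
            meter_ties += 1
        elif r1 != r2 and (r1 < r2) != (m1 < m2):
            disagree += 1
    return disagree / total, (total - meter_ties) / total

reference = [1, 2, 3, 4, 5, 6]   # hypothetical guess numbers (higher = stronger)
one_bin = [0, 0, 0, 0, 0, 0]     # meter that puts everything in a single bin
two_bins = [0, 0, 0, 1, 1, 1]    # coarse but order-preserving meter
print(pairwise_rates(reference, one_bin))    # → (0.0, 0.0): no errors, no utility
print(pairwise_rates(reference, two_bins))   # → (0.0, 0.6): order kept, some utility
```

The single-bin meter makes no ordering errors but provides no useful feedback, which is exactly the pathology the PU metric is meant to expose. The paper's wPE/wPU variants additionally weight each pair by the product of the two password probabilities.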

To summarize, the academic contributions to strength estimation outperform many other meters and have brought up several concepts that improve the estimates. Some other factors may contribute to those meters performing well: specifically, the academic proposals often provide continuous scores, and the meters are trained on the password distribution.

The PSMs in password managers perform reasonably well, with a few exceptions (ID: 10A Bitwarden, ID: 16B LogMeOnce, and ID: 19A Zoho Vault). Similar to previous work, we measured a good accuracy for ID: 13A KeePass. The high accuracy of ID: 17B RoboForm and others is explained by the use of zxcvbn.

PSMs in operating systems are not popular, even though, when present, they are used for security-sensitive operations. Current implementations are LUDS-based meters that lack any helpful guidance and should be replaced. Here, we found very negative results: while macOS does not prominently display its PSM, Ubuntu uses the PSM during account creation and hard disk encryption, and we found both meters to perform poorly. An analysis of Ubuntu's PSM source code revealed a LUDS meter: it counts the number of uppercase, digit, and symbol characters and multiplies these counts with some magic constants. The estimation function is a re-implementation of Mozilla's SeaMonkey meter, which dates back to 2006. First bug reports about the poor quality and inconsistency of its assessments date back to 2012 [6].

Website-based PSMs are doing reasonably well, with correlations up to 0.951. Most of the website PSMs we evaluated are client-side JavaScript meters; hybrids are also popular, while server-side implementations were rare in our evaluation set. We speculate there are several reasons why websites are not using better meters: lack of awareness, lack of guidance on the quality of meters, and the usually larger size of academic meters, which need to store their model parameters.

While the weighted Spearman correlation decreases due to the effects of quantization (cf. Section 7.2), we observed a relatively good accuracy for some of the website PSMs, too, for example, the non-binning ID: 35B Microsoft (v3), ID: 25A Best Buy, ID: 28A Drupal, and ID: 41A Twitter PSMs (0.424 − 0.951). Among the binning PSMs, we found some of the zxcvbn (Score)-based meters, e. g., ID: 36 reddit (Q5) and ID: 40A Twitch (Q5) (0.197 − 0.817), and some non-zxcvbn-based PSMs, like ID: 41B Twitter (Q5), ID: 32A Google (Q5), and ID: 35A Microsoft (v3) (Q4), to be ranking accurately (0.487 − 0.763).

Based on our measurements, ID: 33 Have I Been Pwned? performs excellently. This is likely owed to the fact that all tested datasets are part of the Pwned Passwords list [36]; thus, different results are expected if non-breached passwords are evaluated. Surprising are the results of ID: 27A Dropbox (0.056 − 0.611), the developers of the zxcvbn meter. On their website, they rely on a Q4 score-based implementation (they discard the first bin, i. e., all passwords with a guess number below 10^3). Based on our ID: 8B zxcvbn (Score) findings, we expected better results. Note that ID: 27B Dropbox, an older implementation measured by previous work [16], shows a similarly low performance.

7.2 Effect of Quantization

Almost all PSMs on websites provide a binned output, as users rely on tangible feedback like Weak or Strong instead of a more abstract probability or guess number. Binning will reduce the accuracy of a meter; however, weighted Spearman is relatively robust against this effect. In the following, we qualitatively analyze the estimates of binning meters to provide a more intuitive way to compare the weighted Spearman correlation with the over- and underestimates of the meters. For this, we use the PGS [62] min_auto guess number and a logarithmic binning similar to zxcvbn (Score). The results are visualized in Figure 2.

7.2.1 The ≥ 10^3 | V Bin. This bin includes passwords that are weak (10^3 ≤ guess number < 10^6) but misjudged by the meter to be strong. An analysis of ID: 41B Twitter's bin revealed weaknesses in detecting keyboard walks and leet transformations: the password !QAZxsw2 (a keyboard walk on US keyboards) as well as P@$$w0rd, jessica#1, and password@123 were incorrectly ranked. ID: 8B zxcvbn (Score)'s bin includes misdosamores and mardelplata (film and city names), as well as oportunidades (a common Spanish term).

7.2.2 The ≥ 10^10 | II Bin. This bin includes passwords that are strong but misjudged by the meter to be somewhat weak. ID: 41B Twitter's bin revealed problems with digit-only passwords like 9371161366 that are usually cracked using a mask attack. Furthermore, it includes all-lowercase phrases like itsababydog that attackers crack by running a combinator attack using two dictionaries. ID: 8B zxcvbn (Score)'s bin includes zxcvbvcxz (a keyboard walk varying the meter's name) usually cracked via a dictionary or mask attack. Additionally, we found phrases like atlantasports, which is likely cracked with a combinator attack.

To further study the binning effect for real-world data, we look at those meters that provide both quantized and non-quantized feedback. Interesting is the case of ID: 13B KeePass (Q5).
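To see why a ten-digit password can sit in a "strong" bin yet fall quickly: a mask attack enumerates exactly the 10^10 ten-digit strings, a keyspace that fast hashing hardware exhausts in seconds. A back-of-the-envelope sketch; the hash rate is an assumed round number, merely in the ballpark of published GPU benchmarks [30]:

```python
# Keyspace of a mask attack covering all 10-digit passwords
# (e.g., the hashcat-style mask ?d?d?d?d?d?d?d?d?d?d for 9371161366).
keyspace = 10 ** 10  # 10 billion candidates

# Assumed rate for a fast unsalted hash on one modern GPU
# (an illustrative round number, not a measured benchmark).
hashes_per_second = 10 ** 10

seconds_to_exhaust = keyspace / hashes_per_second  # about one second
```

So a guess number of 10^10 says little when the password's structure is this predictable: the entire bin boundary is covered by a single cheap attack.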

[Figure 3: four scatter-plot panels — (a) Markov (Probability), (b) PCFG (Probability), (c) RNN (Guesses), (d) zxcvbn (Guesses); x-axis: password count, y-axis: estimated strength.]

Figure 3: PSM scatter plots: Increasing password counts on the x-axis. Strong (less common) passwords are on the left, weak (more common) passwords are on the right. Estimated strength values (measured as probability/guess number) on the y-axis.

While the strength estimates of ID: 13A KeePass are relatively precise (0.393 − 0.884), the binning had severe consequences: the PSM enforces very strong passwords (i. e., "Very weak" for a score < 64 bit), resulting in the majority of passwords falling into the weakest bin. (KeePass does not display this textual feedback in its password manager software.) In comparison, we see that for ID: 9A 1Password (0.276 − 0.807), the binning of the strength estimates by ID: 9B 1Password (Q5) (0.276 − 0.813) had close to no effect on the accuracy. There are also cases where binning improves the score, e. g., the unbinned version ID: 10A Bitwarden (−0.635 − 0.676) performs substantially worse than the binned version ID: 10B Bitwarden (Q3) (0.258 − 0.725). The reason is that binning can eliminate some types of errors of a meter, depending on the precise binning boundaries.
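The effect of quantization on rank correlation can be reproduced with a minimal sketch. Note the simplification: the code below computes a plain, unweighted Spearman correlation (Pearson over average ranks), whereas the weighted variant used in our comparison additionally weights each password, e. g., by its frequency. All numbers are illustrative.

```python
def average_ranks(values):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group equal values (they are contiguous after sorting)
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Unweighted Spearman: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Ground truth: log10 guess numbers of six passwords.
truth = [2, 4, 6, 8, 10, 12]
continuous_meter = [0.9, 1.7, 3.1, 4.0, 5.2, 6.8]  # preserves the ordering
coarse_meter = [0, 0, 0, 1, 1, 1]                  # only two bins

assert spearman(truth, continuous_meter) == 1.0
print(spearman(truth, coarse_meter))  # ties introduced by binning lower it
```

The continuous meter ranks all six passwords correctly and reaches a correlation of 1.0; collapsing the same estimates into two bins introduces ties and drops the correlation below 0.9, even though no ordering error was made.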
7.3 Performance Over Time

It is interesting and illustrative to compare our results with those from de Carné de Carnavalet and Mannan [16], which were collected in June/July 2013. By analyzing the matching set of websites, one can observe positive and negative developments in the past 5 years. First of all, ID: 26A vs. 26B (China Railway) as well as ID: 31B vs. 31C (FedEx) did not change at all. Second, some meters, most notably ID: 23A vs. 23C (Apple), as well as ID: 27A vs. 27B (Dropbox) and ID: 28B vs. 28C (Drupal), show a degraded ranking accuracy. When analyzing the Apple hybrid PSM, we found server-side blacklisting of the entire 14 million RockYou passwords in combination with a basic LUDS approach that checks for symbols and length. Finally, we can report a positive development for ID: 32A vs. 32B (Google), ID: 42B vs. 42C (Yandex), and ID: 41B vs. 41C (Twitter). Note that Twitter changed its quantization over time; thus, these results are not directly comparable. If we consider all websites in our evaluation set, one observes slightly better performing PSMs than in the complete set of 2013, but no significant change. One can observe that the majority of websites did change their PSM over time. (During our crawler development and data collection, we observed how reddit changed its meter from a simple LUDS approach to zxcvbn.)

7.4 Recent Proposals and Future Directions

Recent academic proposals are among the best-scoring approaches. Specifically, instances of Markov models, PCFGs, RNNs, as well as zxcvbn, score excellently. Figure 3 shows the distribution of the strength estimations for the LinkedIn offline dataset. On the x-axis, one can see increasing password counts; this way, strong (less common) passwords are on the left, and weak (more common) passwords are on the right. While none of the approaches outperforms the others in terms of accuracy, other factors become more critical, for example, decreasing the dependency of the accuracy on the password distribution. We chose the different password evaluation sets to show the varying performance based on the (trained) distribution. A representative example for this is ID: 7B RNN Generic (Web) (0.421 − 0.777) vs. ID: 7C RNN Target (0.860 − 0.965): it shows the generic meter's performance (using a different composition policy) in comparison to a targeted (distribution-matching) trained variant.

Also relevant are the storage requirements of the meters: While the inaccurate and straightforward LUDS meters can fit into a couple of bytes, the (uncompressed) n-gram databases of meters based on Markov models can occupy hundreds of megabytes to gigabytes of disk space. The current database underlying the Have I Been Pwned? meter requires 30 GB (or enough trust to send partial password hashes to a third-party service); optimized variants fit into 860 MB of memory using a Bloom filter [57]. zxcvbn has a size of around 800 KB. Decreasing the size of meters while maintaining high accuracy seems a worthwhile research goal. The RNN approach by Melicher et al. [46] is a first good example showing that reasonably accurate meters can fit into a couple of megabytes.

Furthermore, we need to better understand and mitigate the negative effects of quantization. While the non-binned academic proposals perform well, we need to find a way to transfer their accuracy to the quantized output that is required for users. In the set of evaluated password strength meters, we found score-based (i. e., > 42), percentage-based (i. e., > 75 %), and logarithmic (i. e., ≥ 10^6) binning approaches, equal-sized and unequal-sized bins, magic constants, and rule-based binning.

Finally, what we have learned from the success of zxcvbn (which is implemented at several sites according to our findings) is that providing readily deployable implementations in multiple programming languages helps adoption.

7.5 Limitations

Albeit we carefully selected the datasets for our evaluation, we only simulated real-world password choice using breached passwords. As mentioned in Section 4, password distributions are influenced by many factors; thus, the three evaluated sets only reflect a small set of passwords. In particular, they mostly reflect an English-speaking community. The impact of password-composition policies was not studied, primarily because constraining the passwords in each leak to those that satisfy a given policy does not reflect real user behavior [40].

While the weighted Spearman correlation was the best in our set of tested metrics for quantized strength estimations, it still is not perfectly accurate. Thus, results for quantized meters need to be interpreted carefully. In general, the lower the number of bins, the less precise the results, as explained in Section 5. Furthermore, different application contexts may place different demands on meters and thus may require different choices of similarity metrics.

Finally, measuring the accuracy alone is not enough to assess the overall performance of a meter. Usability and deployability aspects are vital for a complete assessment but are not part of our analysis.

8 CONCLUSIONS

In this work, we considered the accuracy of password strength meters. We have demonstrated that the currently used measures to determine the accuracy of strength meters (such as Pearson and Kendall correlation) are not precise. We conducted a large comparison of different similarity measures and argued that the weighted Spearman correlation is best suited to precisely and robustly estimate the accuracy of strength meters.

We applied this measure to 45 different strength meters and determined their accuracy for an online and an offline use case. We found that the academic PSM proposals based on Markov models, PCFGs, and RNNs perform best. We also found several websites and password managers to have quite accurate strength meters. However, the strength meters used in practice are less accurate than academic proposals, and we see no significant improvement of meter accuracy when comparing with meters from 5 years ago.

High accuracy is one important aspect that impacts the security of a password strength meter. Also vital are usability and deployability aspects; those are independent of the presented work. We hope our work aids further improvements of PSMs and provides helpful guidance and a metric for the selection of accurate PSMs, and thus helps to improve the situation of password-based user authentication.

REFERENCES

[1] 8bit Solutions, LLC. 2018. bitwarden (Web) – Free Open Source Password Manager. https://bitwarden.com, as of September 10, 2018.
[2] AgileBits, Inc. 2018. 1Password (Web) – Password Manager. https://1password.com, as of September 10, 2018.
[3] Fabian Angelstorf and Franziska Juckel. 2017. OMEN v0.3.0 – C Implementation of a Markov Model-based Password Guesser. https://github.com/RUB-SysSec/OMEN, as of September 10, 2018.
[4] Daniel V. Bailey, Markus Dürmuth, and Christof Paar. 2014. Statistics on Password Re-use and Adaptive Strength for Financial Accounts. In Security and Cryptography for Networks (SCN '14). Springer, Amalfi, Italy, 218–235.
[5] Adam Beautement, M. Angela Sasse, and Mike Wonham. 2008. The Compliance Budget: Managing Security Behaviour in Organisations. In New Security Paradigms Workshop (NSPW '08). ACM, Lake Tahoe, California, USA, 47–58.
[6] Sebastian Benvenuti. 2012. Ubiquity – Ubuntu Should Encourage Stronger Passwords. https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1044868, as of September 10, 2018.
[7] Matt Bishop and Daniel V. Klein. 1995. Improving System Security via Proactive Password Checking. Computers & Security 14, 3 (1995), 233–249.
[8] Joseph Bonneau. 2012. The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords. In IEEE Symposium on Security and Privacy (SP '12). IEEE Computer Society, San Francisco, California, USA, 538–552.
[9] Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. 2012. The Quest to Replace Passwords: A Framework for Comparative Evaluation of Web Authentication Schemes. In IEEE Symposium on Security and Privacy (SP '12). IEEE Computer Society, San Jose, California, USA, 553–567.
[10] Mark Burnett. 2015. Today I Am Releasing Ten Million Passwords. https://xato.net/today-i-am-releasing-ten-million-passwords-b6278bbe7495, as of September 10, 2018.
[11] William E. Burr, Donna F. Dodson, and W. Timothy Polk. 2004. Electronic Authentication Guideline: NIST SP 800-63 Ver. 1.0 (2004) to 800-63-2 (2013). https://csrc.nist.gov/publications/detail/sp/800-63/ver-10/archive/2004-06-30, as of September 10, 2018.
[12] Javier Carranza and Contributors. 2018. Ubiquity – Ubuntu Live CD Installer. https://launchpad.net/ubuntu/+source/ubiquity, as of September 10, 2018.
[13] Claude Castelluccia, Markus Dürmuth, and Daniele Perito. 2012. Adaptive Password-Strength Meters from Markov Models. In Symposium on Network and Distributed System Security (NDSS '12). The Internet Society, San Diego, California, USA.
[14] Dashlane, Inc. 2018. Dashlane (Windows) – Password Manager. https://www.dashlane.com, as of September 10, 2018.
[15] "dcopi". 2013. NIST – Password Strength Meter Example. https://github.com/dcopi/PWStrength, as of September 10, 2018.
[16] Xavier de Carné de Carnavalet and Mohammad Mannan. 2014. From Very Weak to Very Strong: Analyzing Password-Strength Meters. In Symposium on Network and Distributed System Security (NDSS '14). The Internet Society, San Diego, California, USA.
[17] Xavier de Carné de Carnavalet and Mohammad Mannan. 2014. Password Multi-Checker Tool. https://madiba.encs.concordia.ca/software/passwordchecker/, as of September 10, 2018.
[18] Matteo Dell'Amico and Maurizio Filippone. 2015. Monte Carlo Strength Evaluation: Fast and Reliable Password Checking. In ACM Conference on Computer and Communications Security (CCS '15). ACM, Denver, Colorado, USA, 158–169.
[19] Matteo Dell'Amico, Pietro Michiardi, and Yves Roudier. 2010. Password Strength: An Empirical Analysis. In Conference on Information Communications (INFOCOM '10). IEEE, San Diego, California, USA, 983–991.
[20] Dropbox, Inc. and Contributors. 2017. zxcvbn v4.4.2 – JavaScript Implementation of the zxcvbn Strength Meter. https://github.com/dropbox/zxcvbn, as of September 10, 2018.
[21] Markus Dürmuth, Fabian Angelstorf, Claude Castelluccia, Daniele Perito, and Abdelberi Chaabane. 2015. OMEN: Faster Password Guessing Using an Ordered Markov Enumerator. In International Symposium on Engineering Secure Software and Systems (ESSoS '15). Springer, Milan, Italy, 119–132.
[22] Serge Egelman, Andreas Sotirakopoulos, Ildar Muslukhov, Konstantin Beznosov, and Cormac Herley. 2013. Does My Password Go Up to Eleven? The Impact of Password Meters on Password Selection. In ACM Conference on Human Factors in Computing Systems (CHI '13). ACM, Paris, France, 2379–2388.
[23] Dinei Florêncio and Cormac Herley. 2007. A Large-scale Study of Web Password Habits. In Conference on World Wide Web (WWW '07). ACM, Banff, Alberta, Canada, 657–666.
[24] Dinei Florêncio, Cormac Herley, and Paul C. van Oorschot. 2014. An Administrator's Guide to Internet Password Research. In Large Installation System Administration Conference (LISA '14). USENIX, Seattle, Washington, USA, 44–61.
[25] Dinei Florêncio, Cormac Herley, and Paul C. van Oorschot. 2014. Password Portfolios and the Finite-Effort User: Sustainably Managing Large Numbers of Accounts. In USENIX Security Symposium (SSYM '14). USENIX, San Diego, California, USA, 575–590.
[26] Dinei Florêncio, Cormac Herley, and Paul C. van Oorschot. 2016. Pushing on String: The "Don't Care" Region of Password Strength. Commun. ACM 59, 11 (Oct. 2016), 66–74.
[27] Maximilian Golla, Benedict Beuscher, and Markus Dürmuth. 2016. On the Security of Cracking-Resistant Password Vaults. In ACM Conference on Computer and Communications Security (CCS '16). ACM, Vienna, Austria, 1230–1241.
[28] Maximilian Golla, Theodor Schnitzler, and Markus Dürmuth. 2018. "Will Any Password Do?" Exploring Rate-Limiting on the Web. In Who Are You?! Adventures in Authentication Workshop (WAY '18). USENIX, Baltimore, Maryland, USA.
[29] Maximilian Golla, Ibrahim Sertkaya, and Markus Dürmuth. 2018. Password Strength Meter Comparison Website. https://password-meter-comparison.org, as of September 10, 2018.
[30] Jeremi M. Gosney. 2017. Nvidia GTX 1080 Ti Hashcat Benchmarks. https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505, as of September 10, 2018.
[31] Paul A. Grassi, James L. Fenton, and William E. Burr. 2017. Digital Identity Guidelines – Authentication and Lifecycle Management: NIST SP 800-63B (2017).
[32] Yimin Guo and Zhenfeng Zhang. 2018. LPSE: Lightweight Password-Strength Estimation for Password Meters. Computers & Security 73 (March 2018), 507–518.
[33] Hana Habib, Jessica Colnago, William Melicher, Blase Ur, Sean Segreti, Lujo Bauer, Nicolas Christin, and Lorrie Cranor. 2017. Password Creation in the Presence of Blacklists. In Workshop on Usable Security (USEC '17). The Internet Society, San Diego, California, USA.
[34] Shiva Houshmand and Sudhir Aggarwal. 2012. Building Better Passwords Using Probabilistic Techniques. In Annual Computer Security Applications Conference (ACSAC '12). ACM, Orlando, Florida, USA, 109–118.
[35] Jason Huggins and SeleniumHQ Contributors. 2017. Selenium – Web Browser Automation. http://www.seleniumhq.org, as of September 10, 2018.
[36] Troy Hunt. 2018. 500m Pwned Passwords List. https://haveibeenpwned.com/Passwords, as of September 10, 2018.
[37] Markus Jakobsson and Mayank Dhiman. 2012. The Benefits of Understanding Passwords. In USENIX Workshop on Hot Topics in Security (HotSec '12). USENIX, Bellevue, Washington, USA.
[38] JS Foundation. 2018. Appium – Automation Made Awesome. http://appium.io, as of September 10, 2018.
[39] Keeper Security, Inc. 2018. Keeper (Web) – Password Manager. https://keepersecurity.com, as of September 10, 2018.
[40] Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, and Julio López. 2012. Guess Again (and Again and Again): Measuring Password Strength by Simulating Password-Cracking Algorithms. In IEEE Symposium on Security and Privacy (SP '12). IEEE Computer Society, San Jose, California, USA, 523–537.
[41] Daniel V. Klein. 1990. "Foiling the Cracker": A Survey of, and Improvements to, Password Security. In USENIX Security Workshop. USENIX, Berkeley, California, USA, 5–14.
[42] Saranga Komanduri, Richard Shay, Patrick Gage Kelley, Michelle L. Mazurek, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, and Serge Egelman. 2011. Of Passwords and People: Measuring the Effect of Password-Composition Policies. In ACM Conference on Human Factors in Computing Systems (CHI '11). ACM, Vancouver, British Columbia, Canada, 2595–2604.
[43] LogMeIn, Inc. 2018. LastPass (Web) – Password Manager. https://www.lastpass.com, as of September 10, 2018.
[44] Jerry Ma, Weining Yang, Min Luo, and Ninghui Li. 2014. A Study of Probabilistic Password Models. In IEEE Symposium on Security and Privacy (SP '14). IEEE, San Jose, California, USA, 689–704.
[45] William Melicher. 2017. Source Code – Cracking Passwords with Neural Networks. https://github.com/cupslab/neural_network_cracking, as of September 10, 2018.
[46] William Melicher, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2016. Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks. In USENIX Security Symposium (SSYM '16). USENIX, Austin, Texas, USA, 175–191.
[47] Robert Morris and Ken Thompson. 1979. Password Security: A Case History. Commun. ACM 22, 11 (1979), 594–597.
[48] Arvind Narayanan and Vitaly Shmatikov. 2005. Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff. In ACM Conference on Computer and Communications Security (CCS '05). ACM, Alexandria, Virginia, USA, 364–372.
[49] Ronald Oussoren. 2018. PyObjC – The Python Objective-C Bridge. https://pythonhosted.org/pyobjc/, as of September 10, 2018.
[50] Sarah Pearman, Jeremy Thomas, Pardis Emami Naeini, Hana Habib, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, Serge Egelman, and Alain Forget. 2017. Let's Go in for a Closer Look: Observing Passwords in Their Natural Habitat. In ACM Conference on Computer and Communications Security (CCS '17). ACM, Dallas, Texas, USA, 295–310.
[51] Dominik Reichl. 2018. KeePass (Windows) – Password Manager. http://keepass.info/help/kb/pw_quality_est.html, as of September 10, 2018.
[52] Dominik Reichl. 2018. KPScript (Windows) – Scripting KeePass. http://keepass.info/help/v2_dev/scr_index.html, as of September 10, 2018.
[53] Stuart Schechter, Cormac Herley, and Michael Mitzenmacher. 2010. Popularity Is Everything: A New Approach to Protecting Passwords from Statistical-Guessing Attacks. In USENIX Workshop on Hot Topics in Security (HotSec '10). USENIX, Washington, District of Columbia, USA.
[54] Sinew Software Systems. 2016. Enpass Release Notes – Use of the zxcvbn Strength Meter. https://www.enpass.io/release-notes/windowspc/, as of September 10, 2018.
[55] Sinew Software Systems. 2018. Enpass (Windows) – Password Manager. https://www.enpass.io, as of September 10, 2018.
[56] Eugene H. Spafford. 1992. Observing Reusable Password Choices. In USENIX Security Symposium (SSYM '92). USENIX, Berkeley, California, USA, 299–312.
[57] Richard Tilley. 2018. Blooming Password. https://www.bloomingpassword.fun, as of September 10, 2018.
[58] Blase Ur. 2017. Source Code – Data-Driven Password Meter. https://github.com/cupslab/password_meter, as of September 10, 2018.
[59] Blase Ur, Felicia Alfieri, Maung Aung, Lujo Bauer, Nicolas Christin, Jessica Colnago, Lorrie Faith Cranor, Henry Dixon, Pardis Emami Naeini, Hana Habib, Noah Johnson, and William Melicher. 2017. Design and Evaluation of a Data-Driven Password Meter. In ACM Conference on Human Factors in Computing Systems (CHI '17). ACM, Denver, Colorado, USA, 3775–3786.
[60] Blase Ur, Jonathan Bees, Sean M. Segreti, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2016. Do Users' Perceptions of Password Security Match Reality? In ACM Conference on Human Factors in Computing Systems (CHI '16). ACM, Santa Clara, California, USA, 3748–3760.
[61] Blase Ur, Patrick Gage Kelley, Saranga Komanduri, Joel Lee, Michael Maass, Michelle L. Mazurek, Timothy Passaro, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2012. How Does Your Password Measure Up? The Effect of Strength Meters on Password Creation. In USENIX Security Symposium (SSYM '12). USENIX, Bellevue, Washington, USA, 65–80.
[62] Blase Ur, Sean M. Segreti, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, Saranga Komanduri, Darya Kurilova, Michelle L. Mazurek, William Melicher, and Richard Shay. 2015. Measuring Real-World Accuracies and Biases in Modeling Password Guessability. In USENIX Security Symposium (SSYM '15). USENIX, Washington, D.C., USA, 463–481.
[63] Ashlee Vance. 2010. If Your Password Is 123456, Just Make It HackMe. http://www.nytimes.com/2010/01/21/technology/21password.html, as of September 10, 2018.
[64] Rafael Veras, Christopher Collins, and Julie Thorpe. 2014. On the Semantic Patterns of Passwords and their Security Impact. In Symposium on Network and Distributed System Security (NDSS '14). The Internet Society, San Diego, California, USA.
[65] Ding Wang, Debiao He, Haibo Cheng, and Ping Wang. 2016. fuzzyPSM: A New Password Strength Meter Using Fuzzy Probabilistic Context-Free Grammars. In Conference on Dependable Systems and Networks (DSN '16). IEEE Computer Society, Toulouse, France, 595–606.
[66] Ding Wang, Zijian Zhang, Ping Wang, Jeff Yan, and Xinyi Huang. 2016. Targeted Online Password Guessing: An Underestimated Threat. In ACM Conference on Computer and Communications Security (CCS '16). ACM, Vienna, Austria, 1242–1254.
[67] Miranda Wei, Maximilian Golla, and Blase Ur. 2018. The Password Doesn't Fall Far: How Service Influences Password Choice. In Who Are You?! Adventures in Authentication Workshop (WAY '18). USENIX, Baltimore, Maryland, USA.
[68] Matt Weir, Sudhir Aggarwal, Michael Collins, and Henry Stern. 2010. Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords. In ACM Conference on Computer and Communications Security (CCS '10). ACM, Chicago, Illinois, USA, 162–175.
[69] Matt Weir, Sudhir Aggarwal, Breno de Medeiros, and Bill Glodek. 2009. Password Cracking Using Probabilistic Context-Free Grammars. In IEEE Symposium on Security and Privacy. IEEE Computer Society, Oakland, California, USA, 391–405.
[70] Matt Weir, Sudhir Aggarwal, Breno De Medeiros, and Bill Glodek. 2009. Password Cracking Using Probabilistic Context-Free Grammars. In IEEE Symposium on Security and Privacy (SP '09). IEEE, Berkeley, California, USA, 391–405.
[71] Daniel Lowe Wheeler. 2016. zxcvbn: Low-Budget Password Strength Estimation. In USENIX Security Symposium (SSYM '16). USENIX, Austin, Texas, USA, 157–173.

A METER COMPARISON

In the following, we list the full results of our data collection.
We separated the five categories Academic Proposals, Password Managers, Operating Systems, Websites, and Previous Work into two tables. A colorful version that allows easier comparison can be found online [29].

A.1 Academic Proposals, Password Managers, and Operating Systems

Table 5: We computed the weighted Spearman correlation as the best similarity score (cf. Section 5). The table lists the online use case on the left and the offline use case on the right. We highlighted if and on how many bins a meter quantized its output. Additionally, we list whether a meter runs on client- or server-side and how the meter visualizes the strength to the user.

ID | Meter | Type | Quant. | Visu. | Online Attacker: RockYou, LinkedIn, 000Webhost | Offline Attacker: RockYou, LinkedIn, 000Webhost

Academic Proposals
1A Comprehensive8 [61] C - - -0.652 -0.589 0.251 -0.476 -0.616 0.441
1B Comprehensive8 [61] C Q5 Text -0.331 -0.084 0.409 -0.128 -0.123 0.421
2 Eleven [22] C - - 0.670 0.912 0.492 0.755 0.951 0.733
3 LPSE [32] C Q3 - 0.584 0.669 0.508 0.544 0.718 0.693
4A Markov (OMEN) [13] S - - 0.721 0.697 0.410 0.701 0.669 0.660
4B Markov (Single) [27] S - - 0.718 0.998 0.817 0.828 0.991 0.872
4C Markov (Multi) [27] S - - 0.721 0.998 0.902 0.997 0.995 0.777
5A NIST [11] C - - 0.670 0.912 0.492 0.755 0.951 0.733
5B NIST (w. Dict.) [11] C - - 0.669 0.910 0.472 0.756 0.953 0.816
6 PCFG (fuzzyPSM) [65] S - - 1.000 0.994 0.963 0.998 0.999 0.899
7A RNN Generic [46] C - - 0.632 0.542 0.427 0.535 0.520 0.800
7B RNN Generic (Web) [59] C - - 0.473 0.649 0.421 0.449 0.688 0.777
7C RNN Target [46] C - - 0.951 0.913 0.965 0.896 0.860 0.885
7D RNN Target (w. Bloom) [46] C - - 0.951 0.913 0.965 0.896 0.860 0.882
8A zxcvbn (Guess Number) [71] C - - 0.989 0.990 0.554 0.989 0.999 0.868
8B zxcvbn (Score) [71] C Q5 - 0.341 0.490 0.359 0.373 0.567 0.817

Password Managers
9A 1Password (Web) C - Bar 0.276 0.433 0.441 0.401 0.621 0.807
9B 1Password (Web) C Q5 Text (Int.) 0.276 0.433 0.407 0.401 0.621 0.813
10A Bitwarden (Web) C - Bar -0.635 -0.490 0.418 -0.457 -0.540 0.676
10B Bitwarden (Web) C Q3 Text (Int.) 0.258 0.372 0.494 0.333 0.340 0.725
11 Dashlane 5.5 (Windows) C - Text 0.686 0.785 0.241 0.698 0.820 0.410
12 Enpass 5.6.8 (Windows)⁵ C Q5 Bar a. Text [Z] 0.341 0.490 0.359 0.373 0.567 0.817
13A KeePass 2.38 (Windows) C - Bar 0.856 0.785 0.393 0.884 0.870 0.744
13B KeePass 2.38 (Windows) C Q5 Text 0.000 0.000 0.045 0.003 0.002 0.321
14A Keeper (Web) C Q5 Bar 0.200 0.258 0.400 0.223 0.238 0.589
14B Keeper (Web) C - Score (Int.) 0.805 0.719 0.284 0.869 0.824 0.476
15 LastPass (Web) C Q5 Bar [Z] 0.197 0.428 0.266 0.232 0.510 0.717
16A LogMeOnce (Web) C - Bar 0.425 0.559 0.245 0.410 0.602 0.503
16B LogMeOnce (Web) C Q5 Text 0.053 0.138 0.315 0.070 0.130 0.541
17A RoboForm 8.4.8.8 (Chrome) C Q4 Text [Z] 0.740 0.773 0.477 0.711 0.827 0.759
17B RoboForm 8.4.8.8 (Chrome) C - Score (Int.) [Z] 0.685 0.932 0.528 0.781 0.962 0.725
17C RoboForm Business (Web) C Q6 Text [Z] 0.523 0.693 0.402 0.553 0.727 0.738
18 True Key 2.8.5711 (Chrome) C Q5 Text [Z] 0.341 0.490 0.359 0.373 0.567 0.817
19A Zoho Vault (Web) C Q3 Bar a. Text 0.088 0.134 0.120 0.104 0.107 0.506
19B Zoho Vault (Web) C - Score (Int.) 0.464 0.502 0.509 0.450 0.468 0.727

Operating Systems
20A macOS High Sierra 10.13.4 C - Bar -0.667 -0.513 0.450 -0.488 -0.569 0.726
20B macOS High Sierra 10.13.4 C Q4 Text 0.072 0.204 0.469 0.094 0.171 0.728
20C macOS High Sierra 10.13.4 C - Bar (Hover) -0.667 -0.513 0.449 -0.488 -0.569 0.727
21A Ubuntu 18.04 (Ubiquity) C Q5 Text -0.849 -0.808 -0.141 -0.792 -0.851 0.132
21B Ubuntu 18.04 (Ubiquity) C - Score (Int.) -0.818 -0.817 -0.189 -0.779 -0.855 0.002

Type: C=Client, S=Server, H=Hybrid; Quantization: Q3–Q6=Number of bins, e. g., Q5=[Terrible, Weak, Good, Excellent, Fantastic]; Visualization: Bar=Bar-based meter, Text=Textual strength description, (Int.)=Internal value not displayed, [Z]=zxcvbn-based meter.

⁵ Not crawled; our analysis showed the use of plain zxcvbn (Score), MID 8B.

A.2 Websites and Comparison with Previous Work

Table 6: We computed the weighted Spearman correlation as the best similarity score (cf. Section 5). The table lists the online use case on the left and the offline use case on the right. We highlighted if and on how many bins a meter quantized its output. Additionally, we list whether a meter runs on client- or server-side and how the meter visualizes the strength to the user.

ID | Meter | Type | Quant. | Visu. | Online Attacker: RockYou, LinkedIn, 000Webhost | Offline Attacker: RockYou, LinkedIn, 000Webhost

Websites
22 Airbnb C Q3 Text 0.054 0.113 0.331 0.063 0.141 0.605
23A Apple H Q4 Bar 0.000 0.020 0.102 0.000 0.046 0.345
23B Apple H Q3 Text 0.000 0.020 0.102 0.000 0.046 0.345
24 Baidu S Q3 Text 0.829 0.828 0.154 0.825 0.875 0.350
25A Best Buy C - Bar 0.676 0.912 0.424 0.765 0.949 0.645
25B Best Buy C Q3 Text 0.074 0.102 0.331 0.077 0.095 0.511
26A China Railway (12306.cn) C Q3 Bar 0.161 0.226 0.346 0.166 0.197 0.571
27A Dropbox C Q4 Bar [Z] 0.056 0.094 0.087 0.076 0.108 0.611
28A Drupal 8.5.3 C - Bar 0.677 0.788 0.490 0.688 0.822 0.732
28B Drupal 8.5.3 C Q4 Text 0.022 0.019 0.157 0.022 0.039 0.356
29 eBay (PW Change) H Q4 Bar 0.031 0.120 -0.373 0.157 0.081 -0.146
30 Facebook (PW Change) C Q4 Text -0.066 0.372 0.498 0.118 0.339 0.725
31A FedEx C Q3 Bar 0.000 0.090 0.147 0.007 0.071 0.345
31B FedEx C Q5 Score (Int.) 0.000 0.090 0.147 0.007 0.071 0.345
32A Google H Q5 Bar 0.522 0.692 0.586 0.551 0.729 0.763
33 Have I Been Pwned? S - Text 0.992 0.996 0.739 0.991 0.997 0.939
34 The Home Depot C Q3 Bar a. Text 0.362 0.548 0.475 0.420 0.604 0.731
35A Microsoft (v3)⁶ C Q4 Bar 0.521 0.694 0.487 0.551 0.726 0.724
35B Microsoft (v3)⁶ C - Score (Int.) 0.670 0.912 0.491 0.755 0.951 0.734
36 reddit C Q5 Bar [Z] 0.341 0.490 0.359 0.373 0.567 0.817
37 Sony Entertainment Network C Q3 Bar 0.115 0.185 0.188 0.128 0.169 0.511
38 Sina Weibo C Q4 Text 0.427 0.779 0.544 0.500 0.803 0.502
39 Tumblr S Q6 Bar 0.499 0.576 0.165 0.550 0.514 0.394
40A Twitch C Q5 Bar [Z] 0.197 0.428 0.266 0.232 0.510 0.717
40B Twitch C Q3 Text [Z] 0.197 0.427 0.300 0.232 0.510 0.712
41A Twitter H - Bar 0.643 0.637 0.509 0.581 0.674 0.769
41B Twitter H Q5 Score (Int.) 0.554 0.629 0.585 0.526 0.665 0.681
42A Yandex H - Bar -0.392 -0.082 0.494 -0.224 -0.117 0.733
42B Yandex H Q4 Text 0.370 0.724 0.502 0.475 0.775 0.799

de Carné de Carnavalet and Mannan (2014) [16]
23C Apple H Q4 - 0.521 0.694 0.462 0.551 0.726 0.707
26B China Railway (12306.cn) C Q3 - 0.160 0.226 0.346 0.165 0.197 0.571
27B Dropbox C Q5 - [Z] 0.085 0.104 0.121 0.121 0.131 0.654
28C Drupal C Q4 - -0.187 0.256 0.148 -0.095 0.233 0.350
31C FedEx C Q5 - 0.000 0.090 0.147 0.007 0.071 0.345
32B Google S Q5 - 0.521 0.694 0.507 0.551 0.726 0.717
35C Microsoft (v3) C Q4 - 0.521 0.694 0.487 0.551 0.726 0.724
43 PayPal H Q4 - 0.521 0.694 0.444 0.552 0.727 0.736
44 QQ C Q4 - 0.844 0.874 0.492 0.867 0.918 0.721
41C Twitter C Q6 - 0.223 0.638 0.514 0.271 0.674 0.658
45 Yahoo! C Q4 - -0.187 0.256 0.142 -0.095 0.233 0.346
42C Yandex S Q4 - 0.150 0.581 0.398 0.217 0.624 0.709

Type: C=Client, S=Server, H=Hybrid; Quantization: Q3–Q6=Number of bins, e. g., Q5=[Terrible, Weak, Good, Excellent, Fantastic]; Visualization: Bar=Bar-based meter, Text=Textual strength description, (Int.)=Internal value not displayed, [Z]=zxcvbn-based meter.

⁶ MID 35A/35B have been deprecated by Microsoft in 2016.