Effect of Grammar on Security of Long Passwords

Ashwini Rao, Carnegie Mellon University, [email protected]
Birendra Jha, Massachusetts Institute of Technology, [email protected]
Gananand Kini, Carnegie Mellon University, [email protected]

ABSTRACT

Use of long sentence-like or phrase-like passwords such as "abiggerbetterpassword" and "thecommunistfairy" is increasing. In this paper, we study the role of grammatical structures underlying such passwords in diminishing the security of passwords. We show that the results of the study have direct bearing on the design of secure password policies, and on password crackers used for enforcing password security. Using an analytical model based on Parts-of-Speech tagging we show that the decrease in search space due to the presence of grammatical structures can be more than 50%. A significant result of our work is that the strength of long passwords does not increase uniformly with length. We show that using a better dictionary, e.g. the Google Web Corpus, we can crack more long passwords than previously shown (20.5% vs. 6%). We develop a proof-of-concept grammar-aware cracking algorithm to improve the cracking efficiency of long passwords. In a performance evaluation on a long password dataset, 10% of the total dataset was exclusively cracked by our algorithm and not by state-of-the-art password crackers.

Categories and Subject Descriptors: D.4.6 [Operating Systems]: Security and Protection
General Terms: Security, Algorithms, Performance
Keywords: Password; Passphrase; Cracking; Grammar; Policy

1. INTRODUCTION

Text-based password authentication is a widely deployed user authentication mechanism. Use of text-based passwords involves a trade-off between usability and security. System-assigned passwords and user-selected passwords subject to complex constraints (e.g. including mixed-case, symbols and digits) are harder to guess, but less usable[22]. Conversely, simple, memorable user-selected passwords offer poor resilience to guessing.

To obtain a good compromise between security and usability, researchers and organizations are recommending the use of longer user-selected passwords with simpler composition requirements, e.g. minimum 16 character passwords[21] and sentence-like or phrase-like passphrases[6, 11, 3, 12, 27]. In the minimum 16 character policy, the only restriction is that passwords cannot contain spaces. An example of a passphrase policy is "choose a password that contains at least 15 characters and at least four words with spaces between the words"[6]. The increase in the length of the password supposedly makes the password difficult to guess.

To memorize longer passwords users may rely on memory aids such as rules of English language grammar. Users may use memory aids voluntarily or due to policy recommendations. Our analysis of a set of 1434 passwords of 16 characters or more from a published study[21] shows that more than 18% of users voluntarily chose passwords that contain grammatical structures. Each of these passwords contains a sequence of two or more dictionary words. An example is "abiggerbetterpassword", which contains the grammatical structure "Determiner Adjective Adjective Noun". Table 1 provides more examples. In addition to grammatical structures we also found other types of structures such as postal addresses, email addresses and URLs. Given the evidence of use of structural patterns in long passwords, we are motivated to investigate their effect on password security. Studies on password security so far have focused only on structural dependencies at the character level[35, 33, 21].

Main Contributions: (1) We propose an analytical framework to estimate the decrease in search space due to the presence of grammatical structures in long passwords. We use a simple natural language processing technique, Parts-of-Speech (POS) tagging, to model the grammatical structures. (2) We show that the strength of a long password does not necessarily increase with the number of characters or words in the password. Due to the presence of structures, two passwords of similar length may differ in strength by orders of magnitude. (3) We develop a novel cracking algorithm to increase the cracking efficiency of long passwords. Our cracking algorithm automatically combines multiple words using our POS tagging framework to generate password guesses. (4) We show that it is necessary to analyze the distribution of grammatical structures underlying password values, in addition to the distribution of password values themselves, to quantify the decrease in guessing effort.

CODASPY'13, February 18-20, 2013, San Antonio, Texas, USA.
Copyright 2013 ACM 978-1-4503-1890-7/13/02 ...$15.00.

2. BACKGROUND AND RELATED WORK

Parts-of-Speech Tagging: It is the process of assigning a part of speech to each word in a sentence[25]. In the English language, parts of speech are noun, verb, adjective etc. For example, the parts of speech for the sentence "She runs fast" are "Pronoun Verb Adverb". Given a sequence of words (word1 word2 ... wordn), a POS tagger such as CLAWS[19] can output a sequence of tags, one tag per word ((word1, tag1) (word2, tag2) ... (wordn, tagn)).

Table 1: Examples of phrases in long password dataset
Category | Password Example | Phrase | Total
Simple | abiggerbetterpassword | a bigger better password | 178
Substitution | thereisnomored0ts | there is no more dots | 20
Extra Symbol | longestpasswordever8 | longest password ever | 70
Total out of 1434 | | | 268
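The tagging step above can be sketched as follows. The paper uses the CLAWS tagger; this self-contained toy instead uses a hypothetical hand-rolled word-to-tag lookup, purely for illustration:

```python
# Toy POS tagging sketch (Section 2). A real system would use a trained
# tagger such as CLAWS; the lookup table here is a hypothetical stand-in.
TAG_DICT = {
    "she": "PRONOUN", "runs": "VERB", "fast": "ADVERB",
    "the": "DETERMINER", "handsome": "ADJECTIVE", "king": "NOUN",
}

def pos_tag(words):
    """Map a word sequence to (word, tag) pairs, one tag per word."""
    return [(w, TAG_DICT.get(w.lower(), "NOUN")) for w in words]

print(pos_tag("She runs fast".split()))
# [('She', 'PRONOUN'), ('runs', 'VERB'), ('fast', 'ADVERB')]
```

The output is the ((word1, tag1) ... (wordn, tagn)) sequence described above; unknown words default to "NOUN" here only to keep the sketch total.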

Natural Language Corpora: The field of natural language processing commonly uses collections of real data samples, or corpora, to train and test tools[25]. Two examples are the Google Web Corpus[17] and the Brown Corpus[23]. The Google Web Corpus is a corpus of 1 trillion word tokens of English text collected from web pages, and it contains 1 to 5 word n-grams and their frequency counts. The Brown Corpus is a corpus of printed English of more than 1 million words. It contains articles from 15 genres such as fiction, government, news, and user reviews. Because of the presence of multiple genres, the Brown Corpus is considered a balanced corpus that well represents the entire printed English language. Sentences in the Brown Corpus are POS tagged. The Simple Brown POS tag set consists of 30 tag types. Fig. 1 contains the statistics of the Brown Corpus and the unique word counts for popular tag types.

Figure 1: Brown Corpus statistics (Words 1161192; Unique Words 49815; Sentences 57340; Characters per Word 4.26; Words per Sentence 20.25; Unique Characters 58; Content Genres 15) and count of unique words (Unique Word Count) for the top 21 Parts-of-Speech tags (POS Tag) in the Brown Corpus. N, NP, ADJ, ... correspond to Noun, Noun Proper, Adjective, ... The uneven distribution of word counts among POS tags has important implications on password search space and guessing effort.

Password Security: Current password security primarily focuses on relationships at the character level. In [33] the authors estimate the password search space using Shannon entropy[32] at the character level. Password crackers that enumerate the password search space use zeroth or higher order Markov models trained on character probability distributions[7, 26]. In [26] the authors assume that, for memorability, a user models a password as a sequence of characters whose distribution is similar to the distribution of characters in her native language. In [16] the authors studied the linguistic properties of the Amazon Payphrase[1] dataset, where the majority of Payphrases are a sequence of two words. The authors investigate whether users choose their Payphrases as a sequence of words that occurs as-is in an existing natural language corpus. Further, they conjecture about the guessing effort of passphrases if the distribution of passphrases is identical to the distribution of phrases in a natural language corpus such as the Google Web Corpus. In this work, we assume that users model their password as a sequence of words following the rules of grammar such as "Determiner Adjective Noun". The sequence need not occur as-is in a natural language corpus. An example of such a password is "the communist fairy", which occurs in the long-password dataset presented in Table 1 but not in the Google Web Corpus. By making a relaxed assumption we model a more powerful adversary who can attack defenses such as the use of nonsensical phrases[27].

3. SEARCH SPACE ANALYSIS

The password search space is the set of all possible unique password values. In this section we investigate how the presence of grammatical structures modifies the password search space. One can consider a password value as a sequence of characters, a sequence of words, or a sequence of words generated using the rules of grammar. We propose an analytical framework to estimate the size of the password search space under each of the three assumptions. By comparing the three estimated sizes we can understand the level of reduction in the size of the password search space when grammar structures are present. Via numerical evaluation, we show that the reduction in search space could be 50% or more.

Table 2: grammar password search space as a percentage of word password search space (from Fig. 2). Note the significant decrease in password search space due to the presence of grammatical structures.
#words         | 1   | 2     | 3     | 4     | 5
grammar/word % | 100 | 99.92 | 96.90 | 80.66 | 46.95

3.1 Computing Search Space Size

Consider a password that contains up to n words. If the words are from a dictionary D = {the, run, king, handsome, ...} that contains numw unique words, the size of the password search space of all possible word sequences is

    G(word) = Σ_{i=1}^{n} numw^i    (1)

Consider a word as any sequence of characters, for example "llmmnn", and not only the elements in a standard dictionary. Now the password search space is bigger than G(word). Let numc be the number of unique characters possible and avgc be the average number of characters per word. We can approximate the size of the password search space of all possible character sequences as

    G(char) = Σ_{i=1}^{n} numc^(avgc×i)    (2)

Let us now consider a password as a sequence of words created using grammatical rules. For example, a user may pick "thehandsomeking" based on the grammatical rule "Determiner Adjective Noun". To estimate the search space under this assumption, we need to define the set of valid grammatical rules. Modeling the rules of natural language grammar is a difficult problem[25]. English language parsers and generators use Context Free Grammar (CFG) or the more powerful Context Sensitive Grammar (CSG) to approximately model the rules of grammar. A CFG or CSG can recursively generate infinitely long sentences.
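Equations (1) and (2) above can be evaluated numerically with the Fig. 1 statistics. The sketch below is our own illustration, not the authors' code; it reports search-space sizes in bits, consistent in spirit with the char and word curves of Fig. 2:

```python
# Numerical evaluation of equations (1) and (2) using the Brown Corpus
# statistics from Fig. 1: numw = 49815 unique words, numc = 58 unique
# characters, avgc = 4.26 characters per word.
import math

NUMW, NUMC, AVGC = 49815, 58, 4.26

def g_word(n):
    """Eq. (1): G(word) = sum_{i=1..n} numw^i, sequences of up to n words."""
    return sum(NUMW ** i for i in range(1, n + 1))

def g_char(n):
    """Eq. (2): G(char) ~ sum_{i=1..n} numc^(avgc*i), character sequences
    of the same average length (a float approximation)."""
    return sum(NUMC ** (AVGC * i) for i in range(1, n + 1))

# Bits of search space for passwords of 1 to 5 words; char >> word.
for n in range(1, 6):
    print(n, round(math.log2(g_char(n)), 1), round(math.log2(g_word(n)), 1))
```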

For long passwords, it is unlikely that we will need to generate infinitely long sentences. For finite length sentences, a Regular Language is sufficient, and it reduces computational complexity from O(n³) to O(n). The Parts-Of-Speech (POS) tagging technique described in Section 2 is equivalent to using a Regular Language, and we use it to model the rules of grammar.

We consider each grammatical rule as a sequence of POS tags. Consider the set of all POS tags {Noun, Verb, Adjective, Determiner, ...}. Each POS tag is associated with a dictionary of words, e.g. the dictionary of "Noun" is {king, queen, ...}. Examples of POS tag sequences include "Determiner Adjective Noun" and "Determiner Determiner Noun". We consider a POS tag sequence as grammatical if it is observed in a corpus. We refer to a grammatical tag sequence as a tag-rule. "Determiner Adjective Noun" is a tag-rule because it is present in the Brown Corpus. However, "Determiner Determiner Noun" is not a tag-rule because it is not present in the corpus. The search space of a tag-rule is the set of word sequences it generates. For example, the tag-rule "Determiner Adjective Noun" generates the set of word sequences {"the handsome king", "the beautiful queen", ...}. The size of the search space of a tag-rule is equal to the product of the sizes of the dictionaries of the individual tags in the sequence. The size of the grammatical password search space, G(grammar), is equal to the sum of the sizes of the search spaces of the tag-rules. For precise equations, refer to [30].

3.2 Numerical Evaluation

We need a corpus to numerically evaluate the password search space model proposed in Section 3.1. We use the Brown Corpus, a balanced corpus that contains representative grammatical structures (tag-rules) for the English language. We believe users will model their long passwords using tag-rules similar to the tag-rules in the Brown Corpus; we find that 84% of the long passwords from the "Simple" category in Table 1 were generated using tag-rules from the Brown Corpus. Using the Brown Corpus should provide useful insights into the effect of structure on the password search space.

We extract POS tag sequences from a POS-tagged corpus that is representative of a long password dataset. This approach of generating grammatical rules is similar to expanding the CFG rewrite rules up to a finite length sans the complexity of the CFG rewrite rules. We can modify the grammar by including or excluding POS tag sequences.

To evaluate the size of the password search space of character sequences, G(char) in (2), and word sequences, G(word) in (1), we use the character and word statistics from Fig. 1: the number of unique characters numc = 58, the number of unique words in the dictionary numw = 49815, and the average number of characters in a word avgc = 4.26. In Fig. 2 we plot the size of the password search space char = 58^(4.26×i) and word = 49815^i as a function of the number of words i in the password.

To evaluate the password search space of word sequences generated by tag-rules, G(grammar), we need a set of POS tags, a dictionary for each POS tag, and the set of tag-rules. We use the Simple Brown Corpus POS tag set with 30 tags. We get the dictionary for each tag from the Brown Corpus (Fig. 1). We extract the tag-rules from the Brown Corpus as explained in [30]. In Fig. 3, we plot the number of observed tag sequences (tag-rules) and possible tag sequences. The number of observed tag sequences is much less than the number of possible tag sequences, and the difference increases with length. The observed and possible tag sequences generate the grammar and word search spaces in Fig. 2.

Figure 2: Comparison of the size of the password search space treating a password as a sequence of characters (char), a sequence of words (word), and a sequence of words generated using grammatical structures (grammar). Numbers are based on the Brown Corpus statistics. n is the number of words in the password. The difference between word and grammar widens as n increases. We also plot the search space estimation using 1.75 bits per character of Shannon entropy (fixed entropy) and the actual number of unique n-word sequences in the Brown Corpus (sample). We explain their significance in Section 6.3.

Figure 3: Comparison of the number of tag sequences observed in the Brown Corpus with the number of tag sequences possible with 30 POS tags. The observed tag sequences, which are a small fraction of the possible tag sequences, generate the grammar password search space in Fig. 2.

From Fig. 2 we can compare the password search space sizes of char, word, and grammar. Note that char ≫ word > grammar. To emphasize the decrease in password search space due to the presence of grammar, we tabulate the ratio of grammar to word in Table 2. Observe that for a password of length 5 words, the decrease is more than 50%.

4. DISTRIBUTION ANALYSIS

When password values have underlying grammatical structures, it is important to understand the role of these structures in decreasing the guessing effort. Guessing effort can be defined as the number of values an attacker has to enumerate to guess a password. Guessing effort is a function of (a) the size of the password search space, which is the set of all possible unique password values, and (b) the distribution of password values, which depends on how users choose password values from the password search space. So far, research involving analysis of password distributions[16, 15, 31] has not considered the effect of underlying grammatical structures. In Section 3 we showed that grammatical structures reduce the password search space, which implies reduced guessing effort. This is because the maximum number of values an attacker has to enumerate is equal to the size of the search space.
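The tag-rule search-space computation of Section 3.1 (per-rule size as a product of tag-dictionary sizes, G(grammar) as their sum) can be sketched as follows; the dictionary sizes are made-up toy numbers, not the Brown Corpus counts of Fig. 1:

```python
# G(grammar) sketch (Section 3.1): the search space of a tag-rule is the
# product of its tags' dictionary sizes; G(grammar) sums over tag-rules.
# Tag-dictionary sizes below are illustrative, not Brown Corpus counts.
from math import prod

TAG_DICT_SIZE = {"Determiner": 50, "Adjective": 5000, "Noun": 20000}

def rule_size(rule):
    """Number of word sequences one tag-rule can generate."""
    return prod(TAG_DICT_SIZE[tag] for tag in rule)

def g_grammar(tag_rules):
    """Size of the grammatical password search space."""
    return sum(rule_size(rule) for rule in tag_rules)

tag_rules = [("Determiner", "Adjective", "Noun"), ("Adjective", "Noun")]
print(rule_size(tag_rules[0]))  # 50 * 5000 * 20000
print(g_grammar(tag_rules))
```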
In this section we show that the distribution of grammatical structures can also reduce the guessing effort. This reduction is in addition to the reduction due to the distribution of the password values themselves.

4.1 Reduction in Guessing Effort

A uniform distribution maximizes the guessing effort[28]. Conversely, a non-uniform distribution reduces the guessing effort. Usually, user-chosen password distributions are not uniform[15]. Given a set of user-chosen passwords such as {mypassword, mypassword, iloveu}, non-uniformity is evident as password values are not unique. In the past, research has associated uniformity solely with uniqueness of password values. For example, [31] ensures that password values do not repeat often and [15] computes the distribution by counting the repetition of password values. However, for passwords generated using grammatical structures, the underlying structure may cause non-uniformity even if the password values are unique. For example, the values in the set {"tangy food", "pretty cat", "naughty kid"} are unique, but all values are generated using "Adjective Noun". Hence, uniqueness of password values is a necessary, but not sufficient, condition to ensure uniformity.

Grammatical structures, or tag-rules, split the password search space unevenly; the sizes of the search spaces of individual tag-rules are different, e.g. the size of "Noun Noun" is greater than the size of "Adjective Noun". Recall from Section 3 that the search space of a tag-rule is the number of word sequences it generates. The effort required to guess a password generated by a tag-rule is a function of the size of its search space. Given a set of user-chosen passwords, we can analyze how the underlying tag-rules are distributed. If users are using certain rules more often than others, an attacker can use this information to reduce her guessing effort. For example, if the password set contains only the tag-rule "Adjective Noun", then the attacker need not enumerate other tag-rules. Specifically, if users are choosing weaker tag-rules more often than stronger tag-rules, the reduction in guessing effort can be higher. In Fig. 4 we group the tag-rules from the Brown Corpus by their search space size expressed in bits (log2). Observe that some tag-rules have small search spaces, e.g. for tag-rules of length 3 (3-gram), 8.9% of the rules have 10-19 bits of strength. Further, our analysis of unique 2-word and 3-word sequences from the Brown Corpus shows that weaker tag-rules occur more often than stronger tag-rules. More details about this analysis are available in [30].

Figure 4: Tag-rules of length 2 to 5 grouped by the size of their search space (in bits, or log2). Numbers outside indicate the range of bits, and numbers inside indicate the percentage of tag-rules with that many bits. Tag-rules divide the password search space unevenly, and many tag-rules have low strength. For example, 8.9% of 3-gram tag-rules have 10-19 bits of strength.

To ensure a uniform distribution over a set of password values, (a) password values have to be unique, and (b) each tag-rule should have proportional representation. Intuitively, proportional representation implies that a tag-rule with a larger search space should generate more password values in the set. Unless these conditions are satisfied, the distribution is not uniform, and the guessing effort decreases. We prove this in [30] by posing the problem of computing an attacker's guessing effort as an optimization problem.

Table 3: Data set ExSet
Password | Phrase
Ihave3cats | I have 3 cats
Ihave4dogs | I have 4 dogs
Ihave5fish | I have 5 fish
Ihad1cat. | I had 1 cat.
Ihad1goat | I had 1 goat

5. PASSWORD CRACKERS

We investigate whether state-of-the-art cracking tools such as John the Ripper (JTR)[7], Hashcat[5], and the Weir Algorithm[35] can crack long passwords efficiently. This is important because crackers are used in auditing user passwords and estimating the strength of password policies[21]. We discuss the shortcomings of these crackers in cracking long passwords. We show ways to improve cracking efficiency both for long passwords in general and for long passwords generated using grammatical structures. We propose a novel algorithm to improve cracking efficiency. Our cracking algorithm uses the POS tag framework introduced in Section 3.

5.1 Shortcomings of Current Crackers

A password cracker tries to recover the plain text value of a password hash value. The cracker generates candidate password guesses, hashes them, and compares them with the available hashes until a match is found. A heuristic dictionary-based cracker[7, 5, 35] uses a dictionary of values to generate candidate passwords. A dictionary may contain leaked passwords[21], words from many languages[10], common quotes, music lyrics, movie titles[24] etc. A cracker may use the dictionary values as-is or transform them by applying mangling rules. An example of a mangling rule is "capitalize first alphabet", which transforms a dictionary value "password" to "Password". Alternatively, an intelligent brute-force cracker eventually enumerates the entire password search space[7]. Below we explain the main shortcomings of current crackers in cracking long passwords using the example data set ExSet in Table 3 and a dictionary ExD = {I, have, had, cats, dogs, fish, cat, goat}.

JTR in Wordlist mode and Hashcat are dictionary-based crackers. Their mangling rules can combine a single dictionary value in different ways, for example "catscats" or "catsstac" from "cats". They can append, prefix or insert specific strings to a dictionary value, and delete parts of the dictionary value. However, JTR cannot combine multiple values from the dictionary to form longer passwords. To crack passwords such as "Ihave3cats" from ExSet using dictionary ExD, the user has to (1) write multiple mangling rules, for example "prefix I", "append had" or "prefix Ihave", or (2) explicitly add the value "Ihave3cats" to the dictionary ExD. To add longer values to the dictionary, the user has to generate the values himself or collect them from existing sources such as books, the Web etc. Hashcat can combine up to two values from the input dictionary, but for more values it has issues similar to JTR.

Table 4: Dictionaries used for evaluating crackers. Dictionary "L" is a large dictionary combining the datasets Myspace, Rockyou, Brown, 1gram, Dic-0294, Basic Full, Basic Alphabetic, Free Full, Free Alphabetic, Paid Alphabetic, Paid Lowercase. In the Name column "-x" indicates the minimum length of the words in the dictionary.
Name | #Words | Description
L-8 | 35267653 | Minimum length 8 values from L
LASCII | 39251222 | All length ASCII values from L
GW25-8 | 3625636435 | Google Web Corpus 2-5 grams
B210 | 5942441 | Brown 2-10 gram word sequences

The Weir Algorithm is another dictionary-based technique. It improves cracking efficiency by improving the order in which mangling rules are applied to the values in the dictionary. The Weir Algorithm generates a set of base structures from a training corpus. A base structure in the Weir Algorithm is a sequence of "L", "D", and "S" that denote "Letter", "Digit", and "Special Symbol". Each base structure is assigned a probability. The Weir Algorithm learns the digits and special symbols to insert into "D" and "S" from the training corpus. For letter sequences in a base structure, the Weir Algorithm tries to fit values from the dictionary whose length exactly matches the length of the letter sequence.
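The Fig. 4-style grouping of tag-rules by search-space bits can be sketched as follows; the rule sizes below are hypothetical stand-ins for the products of tag-dictionary sizes computed in Section 3, not the Brown Corpus values:

```python
# Grouping tag-rules into 10-bit-wide strength bands, the view shown in
# Fig. 4. Rule sizes are hypothetical, not Brown Corpus tag-rule sizes.
import math
from collections import Counter

rule_sizes = {
    ("Adjective", "Noun"): 2 ** 17,
    ("Noun", "Noun"): 2 ** 29,
    ("Determiner", "Adjective", "Noun"): 2 ** 32,
}

# Bucket each rule by its strength in bits, e.g. the 10-19 bit band.
bands = Counter(int(math.log2(size)) // 10 * 10 for size in rule_sizes.values())
for lo in sorted(bands):
    share = 100.0 * bands[lo] / len(rule_sizes)
    print(f"{lo}-{lo + 9} bits: {share:.1f}% of tag-rules")
```

A skewed band histogram like this is what lets an attacker prioritize the low-bit tag-rules first.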
SC? indicates if “LLLL” in a base structure “LLLLDLLLS”, it tries to fit experimental session completed after Guesses. values {have, cats, dogs, fish, goat} from dictionary ExD. Experiment %Cracked Total SC? It cannot combine shorter values such as “I” and “had” to P16 P16S Guesses form a longer value “Ihad” that fits the sequence “LLLL”. JTR L-8 13.6 6.9 2.31E10 Yes This shortcoming is similar to that of JTR and Hashcat. JTR L-8 NM 8.5 4.8 3.42E7 Yes Although trained on passwords in dataset ExSet, Weir Al- JTR GW25-8 20.5 34.7 2.48E12 Yes JTR GW25-8 NM 11.08 27.7 3.4E9 Yes gorithm cannot crack any password from dataset ExSet us- Weir LASCII 12 4.8 1.07E12 No ing the dictionary ExD; it cannot create “Ihave” and “Ihad” Weir Space LASCII 7.6 3.4 1.26E12 No from “I”, “have” and “had”. JTR Incremental 0 0 2.48E12 No To force Weir Algorithm to generate longer values, we can train it with passwords containing words separated by forth, P16S. We are not aware of datasets that exclusively a single space. Weir Algorithm treats space as a special contain user-selected long passwords with underlying gram- symbol. We have to remove spaces from the generated pass- matical structures. P16S contains 144 passwords. To cre- word guesses. If we train Weir Algorithm on phrases listed in ate P16S, we manually examine each password in P16 us- the dataset ExSet, it generates a base structure“LSLLLSD- ing tools such as the Microsoft Word Breaker[34], and iden- SLLLL”. Now, it can crack passwords such as“Ihad3cats”us- tify passwords with multiple words. We initially include all ing dictionary ExD. However, this approach generates guesses passwords with two or more words (e.g. “compromisede- such as {Ihad1had, Ifish5have, Icats3fish ... } that may be mail”, “thereisnomored0ts”) except those that contain repe- in the search space of all word sequences, G(word), but not titions of a single word (e.g. “elephantelephant”). 
We further in the search space of word sequences generated using gram- categorize the passwords into three groups: simple phrase, matical structures, G(grammar). From Section 3, G(word) phrase with symbol substitution and phrase with extra sym- can exceed G(grammar) by more than 50% for 5-word length. bols. Table 1 shows example for each category. For our ex- JTR Incremental mode, an intelligent brute-force cracker, periments, we use passwords with simple phrases containing uses a Markov model trained on 3-gram letter frequency 2 to 5 words. One of our experiments uses the Google Web distribution to generate password guesses. Based on our Corpus that contains n-grams of length up to 5. Hence, test- experiments, letter frequencies are effective for conventional ing on passwords with more than 5 words will not permit fair short passwords consisting of sequence of characters but not comparison. Table 4 describes our dictionaries;“L” contains for longer passwords with multiple words. publicly available datasets, “GW” has data from the Google Web Corpus and “B” has data from the Brown Corpus. 5.2 Evaluation on Long Password Dataset We tabulate our experimental results in Table 5. We allow We evaluate the cracking efficiency of JTR, Hashcat and all experiments to make up to 2.5E12 guesses; we indicate if Weir Algorithm with experiments on a published long pass- an experiment terminated earlier in the “SC?” column. We word dataset[21]. We introduce this dataset in Section 1. terminated two experiments before 2.5E12 guesses due to It contains 1434 passwords of minimum length 16 charac- their excessive memory consumption. For brevity, we omit ters, and was collected as part of a field study. Subjects the less significant experimental results from Table 5. In could create their passwords using any character except the our experiments, we try to overcome the main shortcoming space character. 
To test the cracking efficiency on long passwords, we use the complete long password dataset, henceforth referred to as P16. To test the cracking efficiency on long passwords with underlying grammatical structures, we use a subset of P16 containing simple phrases, henceforth referred to as P16S.

Our approach addresses a limitation of current crackers: they do not generate longer values automatically; the user has to generate and add longer values to the dictionary. Specifically, we do the following.

Use a better dictionary of long values: given the evidence that, for longer passwords, users choose word sequences, we hypothesize that a corpus of word sequences is a better dictionary. Experiments on long password datasets have not tested this hypothesis until now. We test the hypothesis using word-grams from the Google Web Corpus and the Brown Corpus, but other corpora may also be explored. Using the Google Web Corpus as the dictionary, we crack 20.5% of long passwords from dataset P16 ("JTR GW25-8"). For the same number of guesses, published experiments using publicly available datasets similar to dictionary "L" crack 6% of P16[21]. Using the Brown Corpus, we crack only 0.3% of P16 ("JTR B210").

Use workarounds to generate longer values automatically: we use the workaround for the Weir Algorithm explained in Section 5.1. From experiment "Weir Space LASCII", we find that this approach generates a large number of guesses and fails to improve the cracking efficiency of long passwords.

JTR L-8, JTR L-8 NM, JTR GW25-8, JTR GW25-8 NM: we run JTR in word mode using dictionaries L-8 and GW25-8, with and without mangling rules. Dictionary values have a minimum of 8 characters, as mangling rules can concatenate them to form 16-character guesses. JTR GW25-8 outperforms the other experiments: using 2.48E12 guesses, it cracks 20.5% of passwords from P16 and 34.7% of passwords from P16S. Even with a lower number of guesses it outperforms the other experiments.

Weir LASCII: we train the Weir Algorithm on minimum 16-character passwords from the Myspace and Rockyou datasets, and run it using the dictionary LASCII. We terminated this experiment before 2.5E12 guesses as the memory consumption became unwieldy. Weir LASCII does not match the performance of JTR GW25-8.

Weir Space LASCII: we train the Weir Algorithm on sequences of 1 to 10 words from the Brown Corpus, with the words separated by a single space, and run it using the dictionary LASCII. We strip spaces from the generated guesses and check whether they crack any passwords. Weir Space LASCII does not match the performance of JTR GW25-8.

JTR Incremental: we train JTR Incremental mode on minimum 16-character passwords from the Myspace and Rockyou datasets, and configure it to generate passwords between 16 and 23 characters in length, inclusive. The JTR Incremental experiment fails to crack any passwords.

Experimental results show that using a good dictionary of long password values can improve cracking efficiency. However, relying only on existing sources to build dictionaries may not be ideal. Existing sources contain values that people use often; by relying on them, we may fail to crack passwords that contain uncommon and nonsensical phrases, e.g. "the communist fairy" in P16S. Building a Markov model based on word-gram frequencies, as opposed to letter-gram frequencies, may be useful. However, training such word-gram models on existing sources can run into similar issues. We explore a novel technique to automatically generate longer password values.

5.3 Grammar Aware Cracking

A cracker should ideally emulate user behavior to generate password candidates; if passwords contain grammatical structures, crackers should use grammatical structures. We develop a proof-of-concept of such a cracker using the POS tag framework of Section 3. It automatically combines words into longer password values using sequences of POS tags. The main challenges in our approach are to (1) identify a set of POS tag sequences that are grammatical (tag-rules), which can be extracted from an existing corpus, e.g. the Brown Corpus, or from long password datasets, and (2) build a dictionary for individual POS tags. The level of difficulty in building tag dictionaries depends on the type of POS tag. Closed tags such as "Determiner" and "Conjunction" (e.g. the, and) contain a small number of values that do not change much with time. Open tags such as "Noun" and "Noun Proper" have large dictionaries that also grow with time.

Our main goal is to evaluate the value of a grammar-aware cracking approach. Will a grammar-aware cracker allow an attacker to crack passwords that cannot otherwise be cracked? We assume that an attacker has access to a good set of tag-rules; we believe that assuming otherwise leads to a security-through-obscurity model. As use of long passwords increases, it is likely that long password datasets will become public, and attackers can extract tag-rules from these datasets. To simulate a scenario that provides maximum advantage to an attacker, we extract tag-rules from the P16S long password dataset used in Section 5.2. We tag the passwords in P16S using the CLAWS[19] POS tagger.

First, we build a dictionary for each POS tag using the Brown Corpus and a small web text corpus[9]. We include the web text corpus because the Brown Corpus, compiled in 1961, does not contain Internet-related words. We use all words, including repetitions, from the Brown Corpus and all words that occurred at least 10 times in the web text corpus (373 words). Words in the Brown Corpus are already tagged; we tag the words in the web text corpus using CLAWS. If a word has a noun tag, we assign it to the noun dictionary, and so on. We refer to this first set of tag dictionaries as BWeb. Next, we build an alternate set of tag dictionaries from BWeb. To reduce the size of a tag dictionary, we compute the cumulative probability distribution of word frequencies within the dictionary and discard words that do not meet the 0.9 cumulative probability cutoff. We apply this reduction to all POS tags except "Noun", "Proper Noun", "Adjective", and "Cardinal Number". We call this alternate set BWeb90. The intuition for BWeb90 is that users draw on a few popular words for closed tags, but not for open tags.

The inputs to the grammar-aware cracker are a set of tag-rules, a set of tag dictionaries, and the maximum number of guesses it can make. The cracker computes the size of each tag-rule (Section 3), sorts the tag-rules by their size, and selects the subset of smallest tag-rules whose sizes add up to the number of guesses. The cracker generates word sequences from the identified tag-rules using the tag dictionaries, and outputs them as password candidates.

Table 6: Performance of grammar-aware cracker on long passwords with underlying grammatical structures (P16S). %Cracked with BWeb is the % of P16S cracked using dictionary BWeb. %Cracked with BWeb90 is the % of P16S cracked with dictionary BWeb90. %Exclusive is the % of P16S cracked by the grammar-aware cracker, but not by other crackers in Table 5.

Guesses   %Cracked with BWeb   %Cracked with BWeb90   %Exclusive
5E10      9.7                  18.7                   4.8
1E12      14.5                 25                     9
2.5E12    15.2                 27                     10.4
10E12     20.1                 29.1                   11.8
40E12     25.6                 35.4                   13.8

Table 7: Comparison of passphrase strength; strength is not a direct function of length. Tag-Rule lists the tag-rule that can generate the passphrase. Guesses is based on the size of the search space of Tag-Rule and the effort required to mangle the phrase. Time estimates the time required to guess a given passphrase.

Passphrase                       Tag-Rule           Guesses   Time
Th3r3 can only b3 #1!            EX MOD V DET PRO   1.3E12    22 min
Hammered asinine requirements.   VD ADJ N           12.6E12   3.5 h
Superman is $uper str0ng!        NP V ADJ ADV       12.3E15   142 d
My passw0rd is $uper str0ng!     PRO NP V ADJ ADV   1.7E18    56 yr
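The guess-generation loop of the grammar-aware cracker described in Section 5.3 can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper; the tag names, tag-rules, and tiny dictionaries are hypothetical toy inputs.

```python
from itertools import product

def candidates(tag_rules, tag_dicts, max_guesses):
    """Yield password candidates from POS tag-rules.

    The size of a tag-rule is the product of the sizes of its tag
    dictionaries. Rules are tried smallest first, and a rule is used
    only while its size still fits within the remaining guess budget.
    """
    def size(rule):
        n = 1
        for tag in rule:
            n *= len(tag_dicts[tag])
        return n

    budget = max_guesses
    for rule in sorted(tag_rules, key=size):
        if size(rule) > budget:
            break
        budget -= size(rule)
        # Expand the rule into every word sequence it can generate.
        for words in product(*(tag_dicts[tag] for tag in rule)):
            yield "".join(words)

# Hypothetical toy inputs for illustration.
dicts = {"DET": ["the", "a"],
         "ADJ": ["communist", "bigger"],
         "N": ["fairy", "password"]}
rules = [["DET", "ADJ", "N"], ["ADJ", "N"]]
guesses = list(candidates(rules, dicts, max_guesses=100))
# "ADJ N" (size 4) is exhausted before "DET ADJ N" (size 8),
# producing guesses such as "communistfairy" and "thecommunistfairy".
```

A production cracker would also apply mangling rules to each candidate and could order rules by estimated probability rather than raw size; the sketch captures only the size-sorted enumeration.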

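The Time column of Table 7 follows from dividing the guess count by the guessing rate of 1 billion guesses per second assumed in Section 6.1. A rough check of that arithmetic (the last row comes out near 54 years rather than the reported 56, presumably because the published guess counts are rounded):

```python
# Convert a guess count into wall-clock time at a fixed guessing rate.
RATE = 1e9  # guesses per second, as assumed in Section 6.1

def time_to_guess(guesses):
    seconds = guesses / RATE
    for unit, span in [("yr", 365.25 * 86400), ("d", 86400),
                       ("h", 3600), ("min", 60)]:
        if seconds >= span:
            return f"{seconds / span:.1f} {unit}"
    return f"{seconds:.1f} s"

# Guess counts taken from Table 7.
for phrase, guesses in [("Th3r3 can only b3 #1!", 1.3e12),
                        ("Hammered asinine requirements.", 12.6e12),
                        ("Superman is $uper str0ng!", 12.3e15),
                        ("My passw0rd is $uper str0ng!", 1.7e18)]:
    print(f"{phrase}: {time_to_guess(guesses)}")
```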
Table 6 details the performance of our grammar-aware cracker. The columns "%Cracked with BWeb" and "%Cracked with BWeb90" list the percentage of dataset P16S cracked using dictionaries BWeb and BWeb90 respectively. The "%Exclusive" column lists the percentage of P16S that was cracked exclusively by the grammar-aware cracker, but not by other crackers in Table 5. For 5E10 guesses, the grammar-aware cracker cracked 18.7% of passwords, which is better than "Weir Algorithm" (4.8%) and "JTR Incremental" (0%). Although not listed in Table 6, it also performs better than "JTR L-8" for 2.3E10 guesses. At 2.5E12 guesses it cracks 27% of passwords, which does not match the performance of "JTR GW25-8" (34.7%). However, the grammar-aware cracker cracks 10% new passwords from the P16S dataset; this 10% was not cracked by "JTR GW25-8" or any other experiment. The grammar-aware cracker consumes <10MB of storage versus >50GB for "JTR GW25-8". It scales as the number of guesses increases, is easier to handle in operations such as network transfer and dictionary sorting, and provides flexibility in targeting different user groups, e.g. using different "Proper Noun" dictionaries for users from North America and Asia. From these initial results, we believe the grammar-aware approach has potential to improve the cracking efficiency of long passwords.

Figure 5 (character and word histograms): User behavior on long passwords containing simple phrases. The total number of characters in the passwords tends to remain the same as the number of words increases (shaded oval). Users seem to meet the policy requirement of minimum 16 characters with fewer long words ("compromisedemail") or more short words ("thosedamnhackers").

6. POLICY IMPLICATIONS

In this section, we highlight the need for policy makers to understand the impact of grammatical structures on the security of long passwords. We examine some of the implicit assumptions within passphrase policies in light of the results on search space and guessing from Sections 3, 4 and 5. Many policies consider a passphrase as a long sequence of characters or words[8, 2, 3, 6] and estimate security metrics such as size of search space and guessing effort accordingly[6, 14]. However, when users choose sentence-like or phrase-like passphrases, grammatical structures cause the search space and guessing effort to decrease. Further, because of structure, the strength of a passphrase does not increase uniformly with length, i.e. a longer passphrase is not necessarily stronger than a shorter passphrase.

6.1 Passphrase Strength and Length Relation

Consider the passphrase examples in Table 7. The examples "Th3r3 can only b3 #1!", "Superman is $uper str0ng!" and "My passw0rd is $uper str0ng!" are from technology and academic websites[4, 12, 3]. The example "Hammered asinine requirements." is a synthetic example based on a recommendation to use nonsensical phrases[27]. The Tag-Rule column lists one of the grammatical structures that can generate the passphrase. The number of guesses is an estimate of the total number of passphrases the tag-rule can generate; it considers the size of the tag-rule search space and a fixed number of additional guesses required to mangle the phrase. The Time column estimates the time required to guess passphrases generated by the tag-rule at a guessing rate of 1 billion guesses per second. This rate is realistic considering that current state-of-the-art GPU-accelerated machines achieve up to 33 billion comparisons per second and can be built for less than USD 3000[13].

From the guessing effort and time estimates, we see that passphrase strength is not a direct function of the number of words or characters in the passphrase. The passphrase "Th3r3 can only b3 #1!" has more words than "Hammered asinine requirements.", but is one order of magnitude weaker. Similarly, "Hammered asinine requirements." has more characters than "Superman is $uper str0ng!", but is three orders of magnitude weaker. Underlying structures, and not just the number of characters or words, determine the strength of a passphrase. Passphrase policies that do not consider this[3] may unwittingly allow passphrases such as "Th3r3 can only b3 #1!" and "My passw0rd is $uper str0ng!" that differ in strength by six orders of magnitude.

6.2 User Behavior and Passphrase Policy

While studying the long password data set[21], we observed that users tend to choose fewer long words or more short words to generate a password that meets the policy requirement of minimum 16 characters. For example, the passwords "compromisedemail" and "thosedamnhackers" contain 16 characters, but two and three words respectively. Fig. 5 plots the word and character statistics for long passwords in the simple category from Table 1. All passwords in the shaded oval have 19 characters (x-axis) and between two and eight words (y-axis). Similar behavior is observed for passwords with 16, 17, etc. characters. We found some explanation for this behavior in the field of cognitive psychology. Jahnke and Nowaczyk say regarding experiments on word length and short-term memory[20, Chap. 4], "Strings of short words (e.g. cup, war) given in tests of immediate memory are much more likely to be recalled correctly than are equally long strings of equally familiar long words (e.g. opportunity, university). Stated alternatively, subjects can remember for immediate recall about as many words as they can say in 2 s, and obviously they can say more short than long words in that time." From Section 6.1, we know that two passwords of equal length (same number of characters) but different numbers of words may vary in strength by orders of magnitude. Hence, passphrase policies such as "choose a password that contains at least 15 characters and at least 4 words with spaces between the words"[6] may allow weaker passphrases unless they consider user behavior and the effect of structure.

6.3 Passphrase Entropy

To determine the size of the passphrase search space, and subsequently passphrase security, some evaluations[29] use a fixed entropy estimate of the English language[32]. We explain below why this approach may be incorrect. Informally, entropy measures how much the values emitted by a source can be compressed. When values repeat more often, compression is higher and entropy is lower. Since the search space of a source is the set of unique values it emits, we can use entropy to estimate the size of the search space. However, an accurate estimate of entropy is necessary to do so. It is incorrect to use an estimate derived from one source (e.g. English language) for another (e.g. long passwords), as they can have different distributions of values. To illustrate this point, in Fig. 2 we plot the search space estimate using the equation 2^(1.75 x 4.26 x i), where i is the number of words in the password. We use the fixed entropy estimate of 1.75 bits per character for printed English from[18], which derived the estimate using a 3-word-gram language model trained on large amounts of printed English sources and tested on the Brown Corpus. We use the 4.26 average word length statistic from the Brown Corpus (Fig. 1). We observe from Fig. 2 that the estimate for 3-word phrases closely matches the true number of unique 3-word phrases in the Brown Corpus. For other lengths, there are varying degrees of inaccuracy.

7. CONCLUSIONS

Long passwords are a promising user authentication mechanism. However, to achieve the level of security and usability envisioned with long passwords, we have to understand the effect of structures present in them. Further, we have to make policies and enforcement tools cognizant of the effect of structures. As a first step, we developed some techniques to achieve these goals. We studied grammatical structures, but other types of structures such as postal addresses, email addresses and URLs present within long passwords may have a similar impact on security. More research is necessary to fully understand the effect of structures on long passwords.

8. REFERENCES

[1] Amazon payphrase. www.amazon.com/payphrase, 2011.
[2] Bitcoin passphrase policy. https://en.bitcoin.it/wiki/Securing_your_wallet#Password_Strength, 2012.
[3] Carnegie Mellon University passphrase policy and FAQ. http://www.cmu.edu/iso/governance/guidelines/password-management.html; http://www.cmu.edu/computing/doc/accounts/passwords/faq.html#8, 2012.
[4] Cheap GPUs rendering strong passwords useless. http://it.slashdot.org/story/11/06/05/2028256/cheap-gpus-rendering-strong-passwords-useless, 2012.
[5] Hashcat advanced password recovery. http://hashcat.net/oclhashcat-plus/, 2012.
[6] Indiana University passphrase policy and strength estimation. http://kb.iu.edu/data/acpu.html#atiu; http://protect.iu.edu/cybersecurity/safeonline/passphrases, 2012.
[7] John The Ripper password cracker. http://www.openwall.com/john/, 2012.
[8] Massachusetts Institute of Technology passphrase policy. http://ist.mit.edu/security/passwords, 2012.
[9] NLTK Web Text Corpus. http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml, 2012.
[10] Openwall wordlists collection. http://www.openwall.com/wordlists/, 2012.
[11] University of Maryland passphrase policy. http://www.security.umd.edu/protection/passwords.html, 2012.
[12] University of Minnesota passphrase policy. http://www.oit.umn.edu/security/topics/choose-password/index.htm, 2012.
[13] Whitepixel GPU Hash Auditing. http://whitepixel.zorinaq.com/, 2012.
[14] T. Baekdal. Passphrase usability and strength estimation. http://www.baekdal.com/insights/password-security-usability, 2012.
[15] J. Bonneau. The science of guessing: analyzing an anonymized corpus of 70 million passwords. In Proceedings of the IEEE Symposium on Security and Privacy, pages 538-552, 2012.
[16] J. Bonneau and E. Shutova. Linguistic properties of multi-word passphrases. In Workshop on Usable Security, pages 1-13, 2012.
[17] T. Brants and A. Franz. Web 1T 5-gram Version 1. Linguistic Data Consortium, Philadelphia, PA, 2006.
[18] P. F. Brown, V. J. D. Pietra, R. L. Mercer, S. A. D. Pietra, and J. C. Lai. An estimate of an upper bound for the entropy of English. Comput. Linguist., 18(1):31-40, 1992.
[19] CLAWS Part-of-Speech tagger for English. http://ucrel.lancs.ac.uk/claws/, 2012.
[20] J. C. Jahnke and R. H. Nowaczyk. Cognition. Prentice-Hall, Inc., Upper Saddle River, New Jersey, USA, 1998.
[21] P. G. Kelley et al. Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms. In Proceedings of the IEEE Symposium on Security and Privacy, 2012.
[22] S. Komanduri et al. Of passwords and people: Measuring the effect of password-composition policies. In Proceedings of the Annual Conference on Human Factors in Computing Systems, pages 2595-2604. ACM, 2011.
[23] H. Kucera and W. N. Francis. Computational analysis of present-day American English. Brown University Press, Providence, RI, 1967.
[24] C. Kuo, S. Romanosky, and L. F. Cranor. Human selection of mnemonic phrase-based passwords. In Proceedings of the Symposium on Usable Privacy and Security, pages 67-78. ACM, 2006.
[25] C. D. Manning and H. Schutze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA, 1999.
[26] A. Narayanan and V. Shmatikov. Fast dictionary attacks on passwords using time-space tradeoff. In Proceedings of the ACM Conference on Computer and Communications Security, pages 364-372, 2005.
[27] PGP. FAQ: How do I choose a good password or phrase? http://www.unix-ag.uni-kl.de/~conrad/krypto/passphrase-faq.html, 2012.
[28] J. O. Pliam. On the incomparability of entropy and marginal guesswork in brute-force attacks. In Proceedings of the International Conference on Progress in Cryptology, pages 67-79. Springer-Verlag, 2000.
[29] S. N. Porter. A password extension for improved human factors. Computers & Security, 1(1):54-56, 1982.
[30] A. Rao, B. Jha, and G. Kini. Effect of Grammar on Security of Long Passwords. Technical Report CMU-ISR-12-113, ISR, Carnegie Mellon University, 2012.
[31] S. Schechter, C. Herley, and M. Mitzenmacher. Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks. In Proceedings of the USENIX Conference on Hot Topics in Security, pages 1-8, 2010.
[32] C. E. Shannon. Prediction and entropy of printed English. Bell Systems Technical Journal, 30:50-64, 1951.
[33] R. Shay et al. Encountering stronger password requirements: User attitudes and behaviors. In Proceedings of the Symposium on Usable Privacy and Security, pages 1-20. ACM, 2010.
[34] K. Wang, C. Thrasher, and B.-J. P. Hsu. Web scale NLP: A case study on URL word breaking. In Proceedings of the International Conference on World Wide Web, pages 357-366. ACM, 2011.
[35] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek. Password cracking using probabilistic context-free grammars. In Proceedings of the IEEE Symposium on Security and Privacy, pages 391-405, 2009.