
Analyzing Password Strength

Martin M.A. Devillers

July 2010

ABSTRACT

Password authentication is still the most used authentication mechanism in today's computer systems. In most systems, the password is set by the user and must adhere to certain password requirements. Additionally, password checkers rank the strength of a password to give the user an indication of how secure their password is. In this paper, we take a look at a large database of user-chosen passwords to determine the current state of affairs. In the end, we extract a model from the database and provide our own password checker, which ranks passwords in various ways. We ran this checker against our dataset, which shows that over 90% of the passwords are highly insecure.

INTRODUCTION

As we perform increasingly important tasks from our living room computer, the topic of computer security also becomes increasingly important. And indeed, many advances have been made in the field of computer security to protect home users from digital crime. For instance, the wireless transmission protocol, Wi-Fi, has had several (much overdue) security overhauls to protect home users from being eavesdropped on or worse (1). However, when we look at what we typically use to authenticate ourselves with, we are stuck with a system that dates back to the Roman empire (2): passwords.

Although an ancient concept, password authentication is, and most likely will be for a long time, the most used authentication mechanism for computer users. Password authentication requires no specialized hardware (unlike, for example, fingerprint authentication), can be easily implemented by developers and is just as easily used by users. In short: it's usable.

But is it safe? The topic of "Security versus Usability" has always been much debated in the computer security world (3). Users must be protected from harm by security controls, but these controls may not interfere (much) with the tasks the users want to perform. For instance, a firewall that simply blocks all traffic can be considered secure, but heavily impedes the overall usability of the system.

Security experts or system developers are usually the ones who have to make this tradeoff, but with password authentication, this task is essentially passed on to the user (3): one can either choose a short and simple password, which is easy to remember but also easy to crack, or a very long and complicated password, which is hard to remember but also hard to crack.

Unfortunately, most users do not see a tradeoff: they see an obstacle and they will choose the path of least resistance to overcome that obstacle. Thus, short and simple passwords that are easy to remember, but also easy to guess, are used (4).

To stop users from using weak passwords, most systems enforce certain requirements that a password must meet before it gets set. Examples of common requirements are a minimal length of the password, the occurrence of uppercase letters, digits and/or symbols in the password, and inequality with the user's username or e-mail address (5).
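The kind of requirement check described above can be sketched in a few lines of Python. This is only an illustration, not the policy of any particular system: the function name `unmet_requirements`, the default minimum length of 8 and the use of `string.punctuation` as the symbol set are all assumptions made for the example.

```python
import string


def unmet_requirements(password, username, email, min_length=8):
    """Return the names of the (hypothetical) requirements the password fails."""
    failures = []
    if len(password) < min_length:
        failures.append("length")
    if not any(c.isupper() for c in password):
        failures.append("uppercase")
    if not any(c.isdigit() for c in password):
        failures.append("digit")
    # Assumption: "symbol" means any ASCII punctuation character.
    if not any(c in string.punctuation for c in password):
        failures.append("symbol")
    # The password may not simply repeat the username or e-mail address.
    if password.lower() in (username.lower(), email.lower()):
        failures.append("equals username or e-mail")
    return failures
```

A password such as 'house1' passes the digit requirement while still failing on length, uppercase and symbols, which illustrates how little a single requirement adds on its own.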

However, holding on to the principle of the path of least resistance, one can expect users to try and 'circumvent' these requirements in a predictable manner (4). For instance, given that a user starts out with an actual word such as 'house' and is required to include at least one digit in the password, one might expect the user to simply suffix the word with a single digit. In the findings which we will present, you will see that 15% of the passwords were a word or name suffixed with the number one.

In this paper, we will first discuss the dataset which we used to perform our research. After this, we will show you the results of our preliminary tests on this dataset. In the next chapter, we will go one step further and extract actual patterns from the passwords. After this, we will show you how we used a probabilistic approach to password analysis. In the final chapter, we will present our password checker, which combines the results of all the previous chapters.

DATASET

On December 4th 2009, a hacker breached a company database of RockYou!¹ containing the usernames and unencrypted passwords of about 32 million users (3). This database was subsequently published to the Internet and is now in wide circulation. Obviously, we don't condone hacking, but the presence of this database gives us a unique opportunity to perform a large-scale empirical study on passwords.

¹ RockYou! (originally known as RockMySpace), based in Redwood City, California, is a publisher and developer of applications and other social network services. As of December 2007 it is the most successful widget maker for the Facebook platform in terms of total installations.

Radboud University Nijmegen, Martin M. A. Devillers

The most notable fact about the aforementioned affair is not that a large database was hacked, but that the passwords inside the database were stored in unencrypted form (so-called plaintext passwords). These days, it is common practice to salt and hash passwords before permanently storing them, which makes it generally hard to study passwords even when granted access to the right databases.

The RockYou! database was acquired in the form of a long text file where each password resides on its own line. The file contained no other information, such as the usernames. Since the source of the data was questionable at best, we ran various tests and filters to ensure the quality of the data. Various noise factors were discovered:

1. Analysis showed that entries longer than 30 characters were more often noise than actual passwords. A lot of entries were pieces of HTML or JavaScript, which might be indicative of an injection attack or some kind of input anomaly. Others contain a single character repeated up to a hundred times. The most likely explanation for this anomaly is users who want to sign up quickly and fill in bogus information to get a one-time account. Such entries cannot be considered to be actual passwords, since the user does not have the intention of memorizing or re-using the password. Since these entries are much longer than normal entries, a filter was applied that removes any entry longer than 30 characters.

2. At various points in the database, the character encoding switches. For instance, the first 3 million passwords contain Unicode characters in their HTML-encoded form. Thereafter, Unicode characters are stored in their actual form, which might be indicative of the backend switching to UTF-8 encoding support. Similarly, the file itself is partly encoded in ANSI and partly in UTF-8. Character analysis indicated where these switches occurred, and a program was written to rewrite the database in one uniform encoding scheme (UTF-8).

3. Various (uncommon) passwords occur multiple times in close proximity to each other. This might be indicative of a single user registering multiple accounts. A filter was applied that removes these extra occurrences. Common passwords were exempt from this filter, as they may naturally occur multiple times in close proximity.

PRELIMINARY TESTS

Before we started with the actual analysis, various basic tests were run to gain some insight into the database. These include a letter frequency analysis, a character type analysis, a length distribution analysis and a common password analysis.

Letter frequency analysis helps us in various ways. Knowing the frequency of each letter gives us the ability to define a more fine-grained metric for measuring password strength, by grading the chance of occurrence of each individual character rather than grading each letter a flat one-in-twenty-six chance of occurrence. Furthermore, letter frequency analysis allows us to measure the degree to which the passwords conform to actual words. This helps us to better understand if the dataset follows a language and what language this might be (6).

[Figure 1: bar chart; x-axis: letters a through z; y-axis: percentage (0% to 16%); series: RockYou, English, Spanish]

Figure 1 – Letter frequencies of the RockYou! dataset, English and Spanish language

Figure 1 shows the letter frequencies of the dataset and those of the English and Spanish languages. As you can see, the distribution of the RockYou! dataset shows great similarity to those of the English and Spanish languages. When we do a quick search for English and Spanish words in the dataset, we find that 10% of the entries match English words and 2.5% match Spanish words.

Character type analysis looks at every password and flags what kind of characters make up the password. We distinguish between the following character types:

• Lowercase, which are the standard lowercase letters of the alphabet.
• Uppercase, which are the standard uppercase letters of the alphabet.
• Digits, which are the digits zero through nine.
• Symbols, which are any characters found in the non-extended ASCII set that do not belong to the above categories. Most of these can be found on a standard American keyboard.
• Unicode, which are any characters that do not belong to the above categories. Examples of these are the euro sign and the Japanese alphabet.

Given these categories, $2^5 = 32$ combinations are possible. However, we expect that only a few will be really prominent.

[Figure 2: pie chart; Lowercase 42%, Lowercase and Digit 33%, Digit 16%, Other 9%]

Figure 2 – Character type analysis

Figure 2 shows the results of our character type analysis. The largest category, which makes up nearly half the dataset, consists of passwords made up solely of lowercase characters. This is troublesome, considering that most passwords are only 6 to 8 characters long and the letter distribution of the passwords conforms to that of a language.

This would suggest that a significant part of our dataset consists of words or names. Another troublesome result is the share of passwords that consist solely of digits, which represents 16% of the dataset. Numeric passwords have limited complexity.

Length distribution analysis gives us insight into the common lengths of user-chosen passwords.

[Figure 3: bar chart; x-axis: password length (5 through 12); y-axis: percentage (0% to 30%)]

Figure 3 – Password length distribution

The results of the analysis, as shown in Figure 3, do not show a normal distribution, but rather a truncated form. We argue that this is a result of minimum password length requirements. Strangely, the database contained entries as short as one character. At the moment of writing, the RockYou! website enforces a minimum password length of 8 characters, but this minimum was most likely lower (or non-existent) in the past (10).

In the previous chapter we already noted that we ignored any entry longer than 30 characters. The range of passwords of size 15 through 29 covered about 2 percent of the database. Half the passwords of length 20 and above were e-mail addresses, which explains their unusual length.

Finally, we take a look at the most common passwords. We expect these passwords to be really weak, as a password that is widely used is most likely one you can logically guess, find in a common password listing, or derive from the context (such as the name of the service).

Password     Count    Percentage
123456       290731   0.8918%
12345        79078    0.2426%
123456789    76790    0.2356%
password     59463    0.1824%
iloveyou     49952    0.1532%
princess     33291    0.1021%
1234567      21727    0.0666%
rockyou      20903    0.0641%
12345678     20553    0.0630%
abc123       16648    0.0511%

Table 1 – Top 10 passwords

Table 1 shows the top 10 passwords with their absolute counts. The use of numerical sequences immediately stands out, as 5 of the 10 passwords belong to this class.

When we break the numbers down to percentages, two troublesome conclusions can be drawn:

1. Roughly one out of every hundred passwords is the sequence '123456'.

2. The top 10, 100, 1,000 and 10,000 passwords cover respectively 2, 5, 11 and 22 percent of all passwords.

In other words, suppose an attacker tries to gain access to a system by trying the top 10 most common passwords against a listing of known accounts. Since the top 10 covers roughly 2 percent of all passwords, he could expect, on average, to succeed within about 50 accounts, costing him only about 500 guesses. Given more accounts and just the password '123456', which covers roughly 0.9 percent of all passwords, success could be expected after about 112 accounts, costing only about 112 guesses. This attack is feasible without automation and low-profile enough not to raise any alarms.

We compare our results with other top tens of common passwords (4) (5) in Table 2. Entries marked with an asterisk are passwords that are also present in our top 10. It should be noted that most other passwords are in our top 100 and all are present in our top 1000.

RockYou!     PCMag        M. Burnett
123456       password*    123456*
12345        123456*      password*
123456789    qwerty       12345678*
password     abc123*      1234
iloveyou     letmein      pussy
princess     monkey       12345*
1234567      myspace1     dragon
rockyou      password1    qwerty
12345678     blink182     696969
abc123       Firstname    mustang

Table 2 – Top 10 comparison (the last entry of PCMag, 'Firstname', is a placeholder for an actual first name)

Although these overlapping lists strengthen our belief that the RockYou! dataset is representative, it at the same time saddens us to see that so many users still use very predictable passwords.
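Two of the preliminary tests lend themselves to a short sketch. The Python below is an illustration under stated assumptions, not the program used for this study. The names `character_types`, `coverage` and `expected_accounts` are hypothetical; the dataset size is inferred from the first row of Table 1 (290731 occurrences at 0.8918%); and each account's password is assumed to be an independent draw from the observed distribution, so that the number of accounts tried until the first hit follows a geometric distribution with mean 1/p.

```python
# Counts of the ten most common passwords, copied from Table 1.
TOP10_COUNTS = {
    "123456": 290731, "12345": 79078, "123456789": 76790,
    "password": 59463, "iloveyou": 49952, "princess": 33291,
    "1234567": 21727, "rockyou": 20903, "12345678": 20553,
    "abc123": 16648,
}
# Assumption: total number of entries, inferred from 290731 being 0.8918%.
TOTAL_ENTRIES = round(290731 / 0.008918)


def character_types(password):
    """Return the set of character categories that occur in the password."""
    types = set()
    for ch in password:
        if "a" <= ch <= "z":
            types.add("lowercase")
        elif "A" <= ch <= "Z":
            types.add("uppercase")
        elif "0" <= ch <= "9":
            types.add("digit")
        elif ord(ch) < 128:
            types.add("symbol")   # remaining non-extended ASCII characters
        else:
            types.add("unicode")  # everything outside ASCII
    return frozenset(types)


def coverage(passwords):
    """Fraction of all entries that use one of the given top-10 passwords."""
    return sum(TOP10_COUNTS[p] for p in passwords) / TOTAL_ENTRIES


def expected_accounts(passwords):
    """Mean number of accounts to try before one uses a listed password."""
    return 1 / coverage(passwords)


if __name__ == "__main__":
    top10 = list(TOP10_COUNTS)
    print(sorted(character_types("abc123")))
    print(f"top 10 coverage: {coverage(top10):.2%}")
    print(f"accounts needed (top 10): {expected_accounts(top10):.0f}")
    print(f"accounts needed ('123456' only): {expected_accounts(['123456']):.0f}")
```

Multiplying `expected_accounts(top10)` by the ten guesses spent on each account gives the corresponding expected number of guesses. The category test in `character_types` follows the five character types defined earlier, so at most 32 distinct sets can be returned.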

PATTERN ANALYSIS

The previous chapter showed several ways of defining characteristics of a password. Now, we want to go a step further and extract actual patterns from passwords. This will help us to better understand how passwords are formed and ultimately allow us to construct an improved password checker.

M. Dell'Amico (6) studied a much smaller database of roughly 10,000 entries. Various regular expressions were run against the database, in an attempt to recognize patterns in passwords. We repeat their experiment on the RockYou! database and present both their results and ours below in Table 3.

Expression         Example   IIMS     RockYou
[a-z]+             abcdef    51.20%   41.69%
[A-Z]+             ABCDEF    0.29%    1.50%
[A-Za-z]+          AbCdEf    53.74%   44.05%
[0-9]+             123456    9.10%    15.93%
[a-zA-Z0-9]+       A1b2C3    93.43%   96.20%
[a-z]+[0-9]+       abc123    14.51%   27.69%
[a-zA-Z]+[0-9]+    aBc123    16.30%   30.16%
[0-9]+[a-zA-Z]+    123aBc    1.80%    2.75%
[0-9]+[a-z]+       123abc    1.65%    2.53%

Table 3 – Regular expressions

The most notable difference is the use of digits. Most percentages in the IIMS database indicate a reliance on lowercase characters, whilst in the RockYou! database patterns that contain digits are more prominent. The patterns that describe a word that is suffixed by a number are almost twice as popular in the RockYou! database. Passwords that are made purely of digits are also more popular in the RockYou! database.

Based on these results we can argue that the users of RockYou! have been trained to create more secure passwords, most likely by other applications which enforce stricter requirements. One exception would be the passwords that consist purely of digits. These passwords are used more often in the RockYou! dataset and we consider this kind of password to be very insecure, due to its limited complexity.

Besides the aforementioned expressions, we propose our own set of supplemental expressions. We are most interested in passwords that start with letters and end with digits, as they make up one third of our dataset and regular expressions can help us better understand this set. Therefore, we repeat the regular expression '[a-zA-Z]+[0-9]+' and keep track of all the numbers that are matched. We then look at the most popular numbers and convert these back to regular expressions.

Number   Count     Percentage
1        1476941   16.37%
123      325963    3.61%
2        284354    3.15%
12       213870    2.37%
3        166762    1.85%
13       150069    1.66%
7        147951    1.64%
11       122630    1.36%
5        120376    1.33%
22       107444    1.19%
23       106425    1.18%
01       102756    1.14%
4        101573    1.13%
07       100693    1.12%
21       100370    1.11%
14       95288     1.06%
10       92655     1.03%
06       86495     0.96%
08       86065     0.95%
8        83819     0.93%
15       83708     0.93%
69       81299     0.90%
16       78506     0.87%
6        76798     0.85%
18       71343     0.79%

Table 4 – Top 25 numeric suffixes

Table 4 shows the 25 most used numeric suffixes, which together make up half the passwords matched by the regular expression '[a-zA-Z]+[0-9]+'. We expected single digits to be popular suffixes and indeed, the digit '1' covers over a million passwords that use it as a suffix. However, double-digit numbers occur twice as often in the listing as single digits, which we did not expect. Interestingly, the digit 9 is only present once in the listing (and not even as a single-digit number).

Besides the top 25, we present a second list which contains entries longer than 2 characters from the top 100.

Number   Count   Percentage
101      51065   0.57%
1234     49619   0.55%
2007     30731   0.34%
2006     29122   0.32%
666      24317   0.27%
2008     24300   0.27%
12345    20276   0.22%
2005     18694   0.21%
007      18261   0.20%
420      16470   0.18%
123456   15811   0.18%
1994     14288   0.16%
1993     13695   0.15%
1992     13609   0.15%
777      13223   0.15%
1995     13149   0.15%
2000     12613   0.14%
111      12426   0.14%
1991     12358   0.14%
1990     11740   0.13%
2004     11466   0.13%
321      11073   0.12%
1989     10940   0.12%
1987     10689   0.12%

Table 5 – Entries from top 100 numeric suffixes with length larger than 2 characters

Table 5 lists these entries. The length of the entries exposes very apparent patterns. Apart from the sequence 1234, all entries of length 4 are years. We argue that the gap between 1995 and 2004 (in which only 2000 appears) is indicative of users using either the current year during sign-up or a birth year. We know that the user base of RockYou! consists largely of teenagers. Entries longer than 4 characters represent numeric sequences, while entries of length 3 represent repetitive digits or numbers from pop culture such as the number of the beast (666), lucky sevens (777), James Bond's call sign (007) and four-twenty from the cannabis subculture (420).

All two-digit numbers are represented within the top 180 entries, which gives any entry in that range a more than 50% chance of falling in this category. We formulate this in a regular expression that matches any two-digit number.

Expression                        Description
^(1|12|123|1234|12345|123456)$    Numeric sequence
^[0-9]$                           Single digit
^[0-9]{2}$                        Double digit
^(19[7-9][0-9]|200[0-9])$         Year between 1970 and 2009

Table 6 – Regular expressions for numeric suffixes

Table 6 shows the patterns described as regular expressions. It should be noted that the order is of importance. For instance, we consider the use of years as a suffix to be more secure than a single-digit suffix. Regular expressions and their order will play an important role in our own password checker, which we will present later on.

N-GRAM APPROACH

Another approach to analyzing the dataset is to count the occurrences of the short character sequences that make up the password. Such an approach delivers a so-called n-gram model, which is one of the approaches used in a paper by M. Dell'Amico (6) to generate passwords. An n-gram model is a type of probabilistic model for predicting the next item in a sequence. n-grams are used in various areas of statistical natural language processing and genetic sequence analysis.

An n-gram model can be useful in two ways:

1. It can be used to measure the likelihood of a password as a product of the relative occurrences of the n-grams it contains. This gives us yet another metric to measure passwords.

2. It can be used to generate new passwords, based on the n-gram model. This could form the basis of a new kind of password attack strategy, which is smarter than a brute force approach. (8)

To build an n-gram based model, every sequence of n consecutive characters of every password in the dataset must be extracted. This is required to solve the probability term of an n-gram based likelihood calculation, which can be described with the term:

$$P(x_i \mid x_{i-n+1}, \ldots, x_{i-1})$$

To calculate the probabilistic likelihood of a password, the following formula is applied:

$$P(w) = P(|w|) \cdot \prod_{i=1}^{|w|} P(w_i \mid w_{i-n+1} \ldots w_{i-1})$$

Where $n$ is the size of the n-gram, $w$ is the password, $|w|$ is the length of the password and $P(|w|)$ is the probability of a password being that length.

The formula $P(w_i \mid w_{i-n+1} \ldots w_{i-1})$ is the conditional probability of the substring $w_{i-n+1} \ldots w_i$ given all possible strings that can be created from $w_{i-n+1} \ldots w_{i-1}$ when appended with a single character. This is an embodiment of the term $P(x_i \mid x_{i-n+1}, \ldots, x_{i-1})$, which is used by the n-gram model to predict $x_i$ based on $x_{i-n+1}, \ldots, x_{i-1}$, and allows us to calculate the probability of an n-gram.
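The likelihood formula above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for this paper, and several choices are assumptions made for the example: the class name `NGramModel`, the estimation of the conditional probabilities from raw n-gram counts, the add-one smoothing of those counts, the padding of positions before the first character with a '»' marker, and the use of log-probabilities to avoid floating-point underflow on long passwords.

```python
import math
from collections import Counter

START = "\u00bb"  # '»', assumed padding marker for positions before the first character


class NGramModel:
    """Character n-gram model over a list of training passwords (illustrative)."""

    def __init__(self, passwords, n=3):
        self.n = n
        self.ngram_counts = Counter()    # occurrences of each n-gram
        self.context_counts = Counter()  # occurrences of each (n-1)-character context
        self.length_counts = Counter()   # occurrences of each password length
        self.alphabet = set()            # distinct characters seen in training
        self.total = 0
        for password in passwords:
            self.total += 1
            self.length_counts[len(password)] += 1
            self.alphabet.update(password)
            padded = START * (n - 1) + password
            for i in range(len(password)):
                gram = padded[i:i + n]
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def conditional(self, context, char):
        """P(char | context) from counts, with add-one smoothing (an assumption)."""
        return (self.ngram_counts[context + char] + 1) / (
            self.context_counts[context] + len(self.alphabet)
        )

    def length_probability(self, length):
        """P(|w|), smoothed so that an unseen length does not zero the score."""
        longest = max(self.length_counts, default=0)
        return (self.length_counts[length] + 1) / (self.total + longest + 1)

    def log_likelihood(self, password):
        """log P(w) = log P(|w|) + sum over i of log P(w_i | previous n-1 characters)."""
        padded = START * (self.n - 1) + password
        score = math.log(self.length_probability(len(password)))
        for i in range(len(password)):
            context = padded[i:i + self.n - 1]
            char = padded[i + self.n - 1]
            score += math.log(self.conditional(context, char))
        return score
```

A model trained on a handful of passwords, for example `NGramModel(["123456", "12345", "password", "123456"], n=3)`, assigns a higher `log_likelihood` to '123456' than to a string of unseen characters of the same length. A higher likelihood corresponds to a more predictable, and therefore weaker, password.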

Since we are calculating the product of a series of probabilities, the assumption is made that the collection of occurrences is mutually independent, which means that we assume that the following property of mutual independence holds for the events $A_1, \ldots, A_k$ involved:

$$P\left(\bigcap_{i=1}^{k} A_i\right) = \prod_{i=1}^{k} P(A_i)$$

Although we know that this property does not actually hold, the assumption that it does is required for us to simplify the problem. Through this independence assumption, our model assumes the Markov property, which enables reasoning and computation with the model that would otherwise be intractable.

The length of the password determines the number of iterations, which generally means that the longer the password, the more iterations there are and the smaller the overall outcome of the formula will be. This is taken into account by explicitly including the probability of the length of the password, as denoted by $P(|w|)$, in our formula.

In the event that $i < n$, which happens in the first $n - 1$ iterations of the product, the substring references a non-existing character index of zero or smaller. These references are substituted with the '»' sign to denote the fact that we are dealing with a starting sequence. Thus, we get the convention:

$$w_j = \text{»} \quad \text{for all } j \le 0$$

The set of possible characters is determined by the distinct characters which we encounter in the dataset. Thus, we do not include the full Unicode character set by default. Given the previous description of $P$, the corresponding formula looks like:

$$P(w_i \mid w_{i-n+1} \ldots w_{i-1}) = \frac{C(w_{i-n+1} \ldots w_i)}{\sum_{c \in \Sigma} C(w_{i-n+1} \ldots w_{i-1} c)}$$

The formula $C$ denotes the number of occurrences of the input string inside the model. The numerator of the fraction represents the number of occurrences of the string that we are interested in (the n-gram), whilst the denominator represents the number of occurrences of all possible variations of that string in its last character. In the denominator, $c$ indicates a character from $\Sigma$, the set of all possible characters as encountered in our dataset.

The actual value of $n$ has a huge impact on the behavior of the model. In the next few paragraphs, we will discuss various values of $n$ and their implications.

Using a value of one for $n$ (unigrams) will make the algorithm contextless, as the algorithm will only look at the frequency of single letters. This is synonymous with using the character frequency analysis from the previous chapter to grade or predict passwords.

If one were to choose a very large $n$, then the algorithm would lose its ability to dissect passwords and simply grade a password by its entire string of characters. This is synonymous with calculating the frequency of a given password within the original dataset. A password generator with such a model would (almost) never generate any passwords outside the dataset.

On another note, a large $n$ would also pose problems of the computational kind. As the value of $n$ increases, the number of possible n-grams increases exponentially. Given a dataset of our size and an $n$ of 5, our algorithm used up nearly a gigabyte of system memory. For a larger $n$ we had to resort to using a smaller subset of our dataset.

Finally, a large $n$ also poses another problem for the effectiveness of our algorithm. As we said earlier, one of the powerful features of using n-grams is that they can be used to analyze or produce new (unseen) data based on known data.

However, this imposes some requirements on the known data. Say we want to calculate the strength of the phrase "password" with $n = 4$, thus giving the formula $P(\text{"password"})$. This would mean calculating the product of the probabilistic frequencies of all 4-grams contained within the phrase "password". Consider the 4-gram "word", which requires the calculation:

$$P(\text{"word"} \mid \text{"wor"}) = \frac{C(\text{"word"})}{\sum_{c \in \Sigma} C(\text{"wor"}c)}$$

Now, consider a model which has never seen the 4-gram "word". This would result in $C(\text{"word"}) = 0$ and thus $P(\text{"word"} \mid \text{"wor"}) = 0$. As we're taking the product of $P$ in our algorithm, this would ultimately lead to $P(\text{"password"}) = 0$.

One might argue that this is favorable, since we cannot grade something when we have no background data. However, given the previous example, we do have actual values of $P$ for all the 4-grams preceding "word". Thus there is certainly some measure of occurrence present in the calculation, which is lost when we multiply this by zero. We overcame this limitation by altering the algorithm to raise all occurrences by one, so that any item that never occurs is counted as one instead of zero. This has little impact on n-grams that occur many times and makes our algorithm more robust when dealing with unseen data. The downside is that n-grams that occur very few times will have a less accurate representation.

We used two thirds of our dataset as a training set for our algorithm and the remaining one third as our test set.

N=1       N=2      N=3        N=4
aaaaa     123456   123456     123456
eeeee     123123   1234567    1234567
ale       12345    12345      12345678
aanaa     121234   12345678   123456789
aaaan     112345   234567     passwo
aaaaaa    123234   milove     12345
aaeaaa    123412   112345     passwor
aeaeae    123450   lover      ilovey
ann       123452   012345     password

Table 7 – Most frequent passwords

Table 7 shows the most likely passwords as graded by our algorithm for various sizes of n. As we noted earlier, an analysis based on unigrams (1-grams) is basically a character frequency analysis. As vowels are the most frequently occurring characters in natural language, passwords that contain vowels are graded higher than those that do not. When we look further down the list we see mostly short pronounceable passwords.

When using bigrams (2-grams), numbers take the upper hand as they form the most frequent two-character pairs in the dataset. As we've seen in the previous chapter, there are simply a lot of passwords that consist of or contain numerical sequences. Moreover, the chance that a given digit follows another is larger than the chance that a given letter follows another, simply because there are only 10 digits as opposed to 26 letters.

When using trigrams (3-grams), text-based passwords start to show up, but numerical sequences still prevail. This list, however, shows similarities to the actual top 10 of most occurring passwords in the RockYou! dataset.

Finally, when using quadrigrams (4-grams), numerical sequences are overly present, but the larger list of results shows a great deal of overlap with our own top 1000.

From our findings we concluded that using trigrams was a good trade-off between accuracy and usability (in terms of performance).

PASSWORD CHECKER

The previous chapters illustrated various ways in which we can grade password strength. In this chapter, we combine these metrics into a program that can help users assess the safety of their passwords. Each step employs one or more metrics, which leads to one or more results. Each result has a name, a short description in natural language of what the result represents and a severity level to indicate the impact of the result. Some results are purely informative and therefore have no severity impact.

At the core, our checker does various things.

1. The length of the password is analyzed, as this is a simple metric that can rule a password out as weak without even looking at the actual content of the password.

2. The password is checked against common lists of passwords. These are lists acquired from the internet or calculated by ourselves from our own dataset.

3. Pattern analysis is employed to determine if the password fits a common pattern. One of the most important parts of this step is to actually break up the password into letter(s), digit(s) and symbol(s). If pattern analysis fails to find a suitable pattern, an unmangling strategy is applied. Some users try to add complexity to passwords by replacing certain characters with digits or symbols (e.g. password becomes p4$$w0rd) (14). This prevents regular expression based pattern matching from finding actual patterns. The unmangling strategy attempts to revert said mangled passwords to their letter-based form. Once a matching pattern has been found, the algorithm looks at what groups the pattern captured.

a. If any words were found during the pattern analysis step, they are checked against common dictionaries of passwords, words and names. We prefer to use small dictionaries of a specific type, since this gives us more power and certainty in describing what the word represents.

b. If any numbers were found during the pattern analysis step, they are inspected to see if they fit a common form (such as a year).

c. If any symbols were found during the pattern analysis step, they are inspected for common occurrence.

4. Finally, we use a 3-gram model based on our dataset to grade the password. To make our 3-gram based algorithm suitable for password grading, we need to map the range of possible values from our algorithm to a set of gradations. First off, the actual range of $P$ lies between 1 and a lower limit, a very small power of ten, that is a constraint set by our computer environment. We determined the boundary values for the gradations by manually checking the results of our algorithm on several sets of passwords. For instance, running the algorithm on the top 1000 we computed earlier gave us a good boundary value for passwords that are 'Critical'.

When all results are calculated, they are presented to the user in order of severity. Each result is accompanied by a short message describing the 'problem' that was found.

We used two thirds of the RockYou! dataset as the training set and the remaining one third as the test set. From the training set, we constructed the various metrics required by our password checker, such as the n-gram model and the length distribution. We then ran the checker on our test set. 50 percent of the passwords were reported as being 'Critical', which means such a password should never be used. 45 percent were reported as being 'High', which indicates that the password should be changed or only used for non-critical purposes. Only 5 percent had a severity of 'Medium' or lower.

CONCLUSION

Summarizing, when we look at the RockYou! dataset, the preliminary tests indicate that passwords are generally short, conform to existing language patterns and show a great deal of overlap. In other words, a significant part of the user base has an insecure password.

Moving on to pattern analysis, we have shown that users who do try to enhance their password do so in a very predictable manner: over a million users chose a password consisting of a word suffixed by the digit one. We extrapolated common patterns from the dataset and presented their occurrences.

The results from our n-gram approach showed how probability calculation can help us assess password strength. We looked at various values of n for determining the optimal size of the model and determined that trigrams (3-grams) were best suited for grading passwords.

The above findings were incorporated in our own password checker, which we then ran against our dataset. The results were, again, troublesome, as 95% of the passwords were marked as being highly unsafe. One mitigating argument is that RockYou! provides a very simple non-critical service, thus users may not feel inclined to provide a strong password. Yet, with half a million users choosing a short numerical sequence as a password, one may wonder if security in terms of password authentication is present at all.

REFERENCES

1. Alliance, Wi-Fi. Wi-Fi Protected Access: Strong, standards-based, interoperable security for today's Wi-Fi networks. Wi-Fi Alliance. [Online] April 29, 2003. http://www.wi-fi.org/files/wp_8_WPA%20Security_4-29-03.pdf.
2. Paton, W.R. The histories of Polybius / with an English translation by W.R. Paton. s.l. : Heinemann ; Harvard University Press, 1922.
3. Security and usability: the case of the user authentication methods. Braz, Christina and Robert, Jean-Marc. s.l. : ACM, 2006. pp. 199-203.
4. Users are not the enemy. Adams, Anne and Sasse, Martina A. s.l. : ACM, 1999, Commun. ACM, Vol. 42, pp. 40-46. ISSN 0001-0782.
5. A large-scale study of web password habits. Florencio, Dinei and Herley, Cormac. s.l. : ACM, 2007. pp. 657-666.
6. Microsoft. Create strong passwords. Microsoft Online Safety. [Online] October 4, 2009. [Cited: November 19, 2009.] http://www.microsoft.com/protect/fraud/passwords/create.aspx.
7. Human selection of mnemonic phrase-based passwords. Kuo, Cynthia, Romanosky, Sasha and Cranor, Lorrie F. s.l. : ACM, 2006. pp. 67-78.
8. Siegler, M.G. One Of The 32 Million With A RockYou Account? You May Want To Change All Your Passwords. Like Now. TechCrunch. [Online] TechCrunch, December 14, 2009. [Cited: February 16, 2010.] http://techcrunch.com/2009/12/14/rockyou-hacked/.
9. Prediction and Entropy of Printed English. Shannon, C. E. Vol. 30, 1951, Bell System Technical Journal, pp. 50-64.
10. RockYou Hack: From Bad To Worse. TechCrunch. [Online] December 14, 2009. http://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/.
11. 10 Most Common Passwords. PCMag. [Online] May 2007. http://www.pcmag.com/article2/0,2817,2113976,00.asp.
12. Burnett, Mark and Kleiman, Dave. Perfect Passwords: Selection, Protection, Authentication. Rockland, MA : Syngress, 2005.
13. Measuring Password Strength: An Empirical Analysis. Dell'Amico, Matteo, Michiardi, Pietro and Roudier, Yves. Jul 20, 2009.
14. Fast dictionary attacks on passwords using time-space tradeoff. Narayanan, Arvind and Shmatikov, Vitaly. s.l. : ACM, 2005. pp. 364-372.
15. Improving text passwords through persuasion. Forget, Alain, et al. s.l. : ACM, 2008. pp. 1-12.
