Guessing Human-Chosen Secrets
Joseph Bonneau
University of Cambridge
Churchill College
April 2012

This dissertation is submitted for the degree of Doctor of Philosophy

Declaration

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. No parts of this dissertation have been submitted for any other qualification. This dissertation does not exceed the regulation length of 60,000 words, including tables and footnotes.

In memory of Fletcher (1998-2012). I wish you were around to see this dissertation and fall asleep on it.
Joseph Bonneau, April 2012

Acknowledgements

I am grateful to my supervisor Ross Anderson for help every step of the way, from answering my emails when I was a foreign undergraduate to pushing me to finally finish the dissertation. He imparted countless research and life skills along the way, in addition to helping me learn to write in English all over again. I was also fortunate to be surrounded in Cambridge by a core group of "security people" under Ross' leadership willing to ask the sceptical questions needed to understand the field. In particular, I've benefited from the mentorship of Frank Stajano and Markus Kuhn, the other leaders of the group, as well as informal mentorship from Richard Clayton, Bruce Christianson, Mike Bond, George Danezis, Claudia Diaz, Robert Watson and Steven Murdoch amongst many others. I thank Arvind Narayanan for his support and mentorship from afar. I am most appreciative of the personal mentorship extended to me by Saar Drimer through my years in the lab, which always pushed me to be more honest about my own work. I am grateful to all of my collaborators, particularly my fellow students Andrew Lewis, Sören Preibusch, Jonathan Anderson, Rubin Xu and Ekaterina Shutova.
I was also fortunate to be able to collaborate remotely with Cormac Herley and Paul van Oorschot, senior researchers who always treated me as an equal. I owe special thanks to Hyoungshick Kim, whose patience and positivity let me spend thousands of hours peacefully sharing a small office. My research on passwords would not have been possible without the gracious cooperation and support of many people at Yahoo!, in particular Richard Clayton for helping to make the collaboration happen, my mentor Henry Watts, Elizabeth Zwicky, who provided extensive help collecting and analysing data, and Ram Marti, Clarence Chung and Christopher Harris, who helped set up data collection experiments. My research on PINs depended on many people's help, including Alastair Beresford for assistance with survey design, Daniel Amitay for sharing data, and Bernardo Bátiz-Lazo for comments about ATM history.

I never would have made it to Cambridge without many excellent teachers along the way. From Stanford, I thank Ilya Mironov, Dan Boneh, and John Mitchell for inspiring me to pursue computer security research as an undergraduate. I thank Robert Plummer for his mentorship while I was at Stanford, for inspiring me to love teaching and for encouraging me to study at Cambridge. From earlier on, I thank all of the teachers who showed me how to learn: Mike Kelemen, Steve Hettleman, David Goldsmith, David Goldman, Michael Collins, and David Nelson.

My research depended on a large suite of free software. I am indebted to the entire free software movement, in particular the developers of the GNU, Linux, Ubuntu, GNOME, Mozilla, TeX/LaTeX, Python, matplotlib, SciPy, NumPy, and R projects.

My time in Cambridge was supported financially by the Gates Cambridge Trust. I am particularly grateful to Gordon Johnson and James Smith for personal help and encouragement, as well as all the officers of the Gates Scholars' Council during my time as president.
I thank all of my friends in Cambridge for helping me adjust to life in a new country and the frustrations of life as a graduate student. I'll particularly remember my housemates Andrew Marin, Niraj Lal and Matt Warner, as well as Andra Adams, Marianne Bauer, Lindsay Chura, Justine Drennan, Molly Fox, Talia Gershon, Simone Haysom, Stella Nordhagen, Adeline Oka, Sri Raj, Megan Sim, Jessica Shang, Brian Spatocco, Elsa Treviño, and Cleo Tung for close friendships, all of which turned a day around for me at some point during my time in Cambridge.

Thanks to modern technology, I also received considerable support from friends overseas, which kept my spirits up throughout my time in England. I thank my friends Alexandra Bowe, Dave Emme, Alissa Chow and Brent Newhouse for being there when I wanted to talk to a familiar voice, as well as the entire Smitty league of Keegan Dresow, Will Helvestine, Tyler Jank, Jon Levine, Bobby Simon and Steve Zabielskis for listening to my rants and giving me a reason to laugh just about every day.

Above all I am grateful for support from my family, who may be few in number and small in stature but have remained a big presence in my life through it all: my cousins Selim and Sinan, uncle Turhan and aunt Phyllis for welcoming me in Turkey after my years-delayed trip, my aunt Amy for chocolate and weekly trivia questions, my grandmother Anne for making sure I keep warm, my grandmother Margaret for teaching me to love words, my siblings Buzzy and Alissa for making sure I can laugh at myself, and my mother and father for giving me so much and teaching me to always appreciate it. I love you all.

Summary

Authenticating humans to computers remains a notable weak point in computer security despite decades of effort.
Although the security research community has explored dozens of proposals for replacing or strengthening passwords, passwords appear likely to remain entrenched as the standard mechanism of human-computer authentication on the Internet for years to come. Even in the optimistic scenario of eliminating passwords from most of today's authentication protocols by using trusted hardware devices or trusted servers to perform federated authentication, passwords will persist as a means of "last-mile" authentication between humans and these trusted single sign-on deputies.

This dissertation studies the difficulty of guessing human-chosen secrets, introducing a sound mathematical framework that models human choice as a skewed probability distribution. We introduce a new metric, α-guesswork, which accurately models the resistance of a distribution against the full range of possible guessing attacks. We also study the statistical challenges of estimating this metric from empirical data sets, which can be modeled as a large random sample from the underlying probability distribution.

This framework is then used to evaluate several representative data sets from the most important categories of human-chosen secrets to provide reliable estimates of security against guessing attacks. This includes collecting the largest-ever corpus of user-chosen passwords, with nearly 70 million passwords, the largest list of human names ever assembled for research, the largest data sets of real answers to personal knowledge questions and the first data published about human choice of banking PINs. This data provides reliable numbers for designing security systems and highlights universal limitations of human-chosen secrets.
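The summary's central quantities can be illustrated with a short Python sketch. It assumes definitions that this excerpt does not state, restated here as an illustration rather than quoted from the dissertation: an attacker guesses values in decreasing order of probability; the α-work-factor μ_α is the smallest number of guesses whose cumulative probability reaches α; α-guesswork G_α is the expected number of guesses per account when the attacker gives up after μ_α guesses; and G_α is converted to bits so that a uniform distribution over N values scores lg N for every α. Probabilities are estimated naively from a sample by relative frequency, which is only trustworthy when the sample is large relative to the number of distinct values; that is the estimation problem the summary raises. All function names are hypothetical.

```python
import math
from collections import Counter


def sorted_probs(sample):
    """Naive (plug-in) estimate: the empirical frequency of each distinct
    value in the sample, sorted from most to least common."""
    counts = Counter(sample)
    n = len(sample)
    return sorted((c / n for c in counts.values()), reverse=True)


def alpha_work_factor(probs, alpha):
    """mu_alpha: the fewest guesses, made in decreasing order of probability,
    whose cumulative probability reaches alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    total = 0.0
    for j, p in enumerate(probs, start=1):
        total += p
        if total >= alpha - 1e-12:  # small tolerance for float rounding
            return j
    return len(probs)


def alpha_guesswork(probs, alpha):
    """G_alpha: expected guesses per account for an attacker who stops after
    mu_alpha guesses; each unbroken account costs the full mu_alpha guesses."""
    mu = alpha_work_factor(probs, alpha)
    covered = sum(probs[:mu])  # fraction of accounts actually broken (>= alpha)
    hits = sum(i * p for i, p in enumerate(probs[:mu], start=1))
    return (1 - covered) * mu + hits


def alpha_guesswork_bits(probs, alpha):
    """Assumed conversion of G_alpha to bits, chosen so that a uniform
    distribution over N values scores lg N for every alpha."""
    mu = alpha_work_factor(probs, alpha)
    covered = sum(probs[:mu])
    g = alpha_guesswork(probs, alpha)
    return math.log2(2 * g / covered - 1) + math.log2(1 / (2 - covered))


if __name__ == "__main__":
    # Toy sample; real data sets contain millions of values.
    sample = ["123456", "123456", "password", "letmein"]
    probs = sorted_probs(sample)  # [0.5, 0.25, 0.25]
    for alpha in (0.5, 1.0):
        print(alpha, alpha_guesswork_bits(probs, alpha))
```

On this toy sample the skewed distribution scores below the lg 3 ≈ 1.58 bits of a uniform choice among its three distinct values at both values of α, which is the kind of gap between human choice and uniform choice that the metric is meant to expose.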
Contents

1 Introduction
  1.1 Model of authentication and guessing attacks
  1.2 Outline of this dissertation
  1.3 Prerequisites
  1.4 Mathematical notation
  1.5 Previous publications and collaboration
  1.6 Statement on research ethics
2 Background
  2.1 History
  2.2 Practical aspects of password authentication
  2.3 Improvements to passwords
  2.4 Password cracking
  2.5 Evaluating guessing difficulty
3 Metrics for guessing difficulty
  3.1 Traditional metrics
  3.2 Partial guessing metrics
  3.3 Relationship between metrics
  3.4 Application in practical security evaluation
4 Guessing difficulty of PINs
  4.1 Human choice of other 4-digit sequences
  4.2 Surveying banking PIN choices
  4.3 Approximating banking PIN strength
  4.4 Security implications
5 Estimation using sampled data
  5.1 Naive estimation
  5.2 Known negative results
  5.3 Sampling error for frequent events
  5.4 Good-Turing estimation of probabilities
  5.5 The region of stability for aggregate metrics
  5.6 Parametric extension of our approximations
6 Guessing difficulty of passwords
  6.1 Anonymised data collection
  6.2 Analysis of Yahoo! data
  6.3 Comparison with other password data sets
  6.4 Comparison with natural language patterns
7 Guessing difficulty of personal knowledge questions
  7.1 Sources of data
  7.2 Analysis of answers
  7.3 Security implications
8 Sub-optimal guessing attacks
  8.1 Divergence metrics
  8.2 Applications
9 Individual-item strength metrics
  9.1 Strength metrics
  9.2 Estimation from a sample
  9.3 Application to individual passwords
  9.4 Application to small data sets
10 Conclusions and perspectives
Bibliography
A Glossary of symbols
B Additional proofs of theorems
  B.1 Lower bound on $G_1$ for mixture distributions
  B.2 Bounds between $\tilde{G}_\alpha$ and $\tilde{\mu}_\alpha$
  B.3 Non-comparability of $\tilde{\lambda}_\beta$ with $\tilde{G}_1$ and $H_1$
  B.4 Non-additivity of partial guessing metrics
  B.5 Expected value of index strength metric $\sigma_I(x)$ for a uniform distribution
C PIN survey detail
D List of password data sets
E Sources of census data

Computers are useless. They can only give you answers.
Pablo Picasso, 1968

Chapter 1
Introduction

Secret knowledge stored in human memory remains the most widely deployed means of human-computer authentication.