Guessing Human-Chosen Secrets
UCAM-CL-TR-819
Technical Report
ISSN 1476-2986
Number 819

Computer Laboratory

Guessing human-chosen secrets
Joseph Bonneau
May 2012

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 2012 Joseph Bonneau

This technical report is based on a dissertation submitted May 2012 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/techreports/

Summary

Authenticating humans to computers remains a notable weak point in computer security despite decades of effort. Although the security research community has explored dozens of proposals for replacing or strengthening passwords, passwords appear likely to remain entrenched as the standard mechanism of human-computer authentication on the Internet for years to come. Even in the optimistic scenario of eliminating passwords from most of today's authentication protocols using trusted hardware devices or trusted servers to perform federated authentication, passwords will persist as a means of "last-mile" authentication between humans and these trusted single sign-on deputies.

This dissertation studies the difficulty of guessing human-chosen secrets, introducing a sound mathematical framework modeling human choice as a skewed probability distribution. We introduce a new metric, α-guesswork, which can accurately model the resistance of a distribution against all possible guessing attacks. We also study the statistical challenges of estimating this metric using empirical data sets which can be modeled as a large random sample from the underlying probability distribution.

This framework is then used to evaluate several representative data sets from the most important categories of human-chosen secrets to provide reliable estimates of security against guessing attacks.
This includes collecting the largest-ever corpus of user-chosen passwords, with nearly 70 million, the largest list of human names ever assembled for research, the largest data sets of real answers to personal knowledge questions and the first data published about human choice of banking PINs. This data provides reliable numbers for designing security systems and highlights universal limitations of human-chosen secrets.

To Fletcher, for teaching me the value of hard work. I'm glad you're back.
Joseph Bonneau, May 2012

Acknowledgements

I am grateful to my supervisor Ross Anderson for help every step of the way, from answering my emails when I was a foreign undergraduate to pushing me to finally finish the dissertation. He imparted countless research and life skills along the way, in addition to helping me learn to write in English all over again.

I was also fortunate to be surrounded in Cambridge by a core group of "security people" under Ross' leadership willing to ask the sceptical questions needed to understand the field. In particular, I've benefited from the mentorship of Frank Stajano and Markus Kuhn, the other leaders of the group, as well as informal mentorship from Richard Clayton, Bruce Christianson, Mike Bond, George Danezis, Claudia Diaz, Robert Watson and Steven Murdoch amongst many others. I thank Arvind Narayanan for his support and mentorship from afar. I am most appreciative of the personal mentorship extended to me by Saar Drimer through my years in the lab, which always pushed me to be more honest about my own work.

I am grateful to all of my collaborators, particularly my fellow students Andrew Lewis, Sören Preibusch, Jonathan Anderson, Rubin Xu and Ekaterina Shutova. I was also fortunate to be able to collaborate remotely with Cormac Herley and Paul van Oorschot, senior researchers who always treated me as an equal.
I owe special thanks to Hyoungshick Kim, thanks to whose patience and positivity I spent thousands of hours peacefully sharing a small office.

My research on passwords would not have been possible without the gracious cooperation and support of many people at Yahoo!, in particular Richard Clayton for helping to make the collaboration happen, Henry Watts, my mentor, Elizabeth Zwicky who provided extensive help collecting and analysing data, as well as Ram Marti, Clarence Chung, and Christopher Harris who helped set up data collection experiments. My research on PINs depended on many people's help, including Alastair Beresford for assistance with survey design, Daniel Amitay for sharing data, and Bernardo Bátiz-Lazo for comments about ATM history.

I never would have made it to Cambridge without many excellent teachers along the way. From Stanford, I thank Ilya Mironov, Dan Boneh, and John Mitchell for inspiring me to pursue computer security research as an undergraduate. I thank Robert Plummer for his mentorship, for inspiring me to love teaching and encouraging me to study at Cambridge. From earlier on, I thank all of the teachers who showed me how to learn: Mike Kelemen, Steve Hettleman, David Goldsmith, David Goldman, Michael Collins, and David Nelson.

My research depended on a large suite of free software. I am indebted to the entire free software movement, in particular the developers of the GNU, Linux, Ubuntu, GNOME, Mozilla, TeX/LaTeX, Python, matplotlib, SciPy, NumPy, and R projects.

My time in Cambridge was supported financially by the Gates Cambridge Trust. I am particularly grateful to Gordon Johnson and James Smith for personal help and encouragement, as well as all the officers of the Gates Scholars' Council during my time as president.

I thank all of my friends in Cambridge for helping me adjust to life in a new country and the frustrations of life as a graduate student.
I'll particularly remember my housemates Andrew Marin, Niraj Lal and Matt Warner, as well as Andra Adams, Marianne Bauer, Lindsay Chura, Justine Drennan, Molly Fox, Talia Gershon, Simone Haysom, Julia Fan Li, Stella Nordhagen, Adeline Oka, Sri Raj, Megan Sim, Jessica Shang, Brian Spatocco, Elsa Treviño, and Cleo Tung for close friendships, all of which turned a day around for me at some point during my time in Cambridge.

Thanks to modern technology I also received considerable support from friends overseas which kept my spirits up throughout my time in England. I thank my friends Alexandra Bowe, Dave Emme, Alissa Chow and Brent Newhouse for being there when I wanted to talk to a familiar voice, as well as the entire Smitty league of Keegan Dresow, Will Helvestine, Tyler Jank, Jon Levine, Bobby Simon and Steve Zabielskis for listening to my rants and giving me a reason to laugh just about every day.

Above all I am grateful for support from my family, who may be few in number and small in stature but have remained a big presence in my life through it all: my cousins Selim and Sinan, uncle Turhan and aunt Phyllis for welcoming me in Turkey after my years-delayed trip, my aunt Amy for chocolate and weekly trivia questions, my grandmother Anne for making sure I keep warm, my grandmother Margaret for teaching me to love words, my siblings Buzzy and Alissa for making sure I can laugh at myself, and my mother and father for giving me so much and teaching me to always appreciate it. I love you all.
Contents

1 Introduction
    1.1 Model of authentication and guessing attacks
    1.2 Outline of this dissertation
    1.3 Prerequisites
    1.4 Mathematical notation
    1.5 Previous publications and collaboration
    1.6 Statement on research ethics
2 Background
    2.1 History
    2.2 Practical aspects of password authentication
    2.3 Improvements to passwords
    2.4 Password cracking
    2.5 Evaluating guessing difficulty
3 Metrics for guessing difficulty
    3.1 Traditional metrics
    3.2 Partial guessing metrics
    3.3 Relationship between metrics
    3.4 Application in practical security evaluation
4 Guessing difficulty of PINs
    4.1 Human choice of other 4-digit sequences
    4.2 Surveying banking PIN choices
    4.3 Approximating banking PIN strength
    4.4 Security implications
5 Estimation using sampled data
    5.1 Naive estimation
    5.2 Known negative results
    5.3 Sampling error for frequent events
    5.4 Good-Turing estimation of probabilities
    5.5 The region of stability for aggregate metrics
    5.6 Parametric extension of our approximations
6 Guessing difficulty of passwords
    6.1 Anonymised data collection
    6.2 Analysis of Yahoo! data
    6.3 Comparison with other password data sets
    6.4 Comparison with natural language patterns
7 Guessing difficulty of personal knowledge questions
    7.1 Sources of data
    7.2 Analysis of answers
    7.3 Security implications
8 Sub-optimal guessing attacks
    8.1 Divergence metrics
    8.2 Applications
9 Individual-item strength metrics
    9.1 Strength metrics
    9.2 Estimation from a sample
    9.3 Application to individual passwords
    9.4 Application to small data sets
10 Conclusions and perspectives
Bibliography
A Glossary of symbols
B Additional proofs of theorems
    B.1 Lower bound on $G_1$ for mixture distributions
    B.2 Bounds between $\tilde{G}_\alpha$ and $\tilde{\mu}_\alpha$
    B.3 Non-comparability of $\tilde{\lambda}_\beta$ with $\tilde{G}_1$ and $H_1$
    B.4 Non-additivity of partial guessing metrics
    B.5 Expected value of index strength metric $S_I(x)$ for a uniform distribution
C PIN survey detail
D List of password data sets
E Sources of census data

Computers are useless. They can only give you answers.
Pablo Picasso, 1968

Chapter 1

Introduction

Secret knowledge stored in human memory remains the most widely deployed means of human-computer authentication.