Florida State University Libraries

Electronic Theses, Treatises and Dissertations
The Graduate School

2007

Toward Usable, Robust Memometric Authentication: An Evaluation of Selected Password Generation Assistance

Peter Thomas Henry


THE FLORIDA STATE UNIVERSITY

COLLEGE OF INFORMATION

TOWARD USABLE, ROBUST MEMOMETRIC AUTHENTICATION:

AN EVALUATION OF SELECTED PASSWORD GENERATION ASSISTANCE

By

PETER THOMAS HENRY

A Dissertation submitted to the College of Information in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Summer Semester, 2007

Copyright © 2007 Peter Thomas Henry All Rights Reserved

The members of the Committee approve the dissertation of Peter Thomas Henry defended on May 21, 2007.

Charles R. McClure, Professor Directing Dissertation

Michael Burmester, Outside Committee Member

John Carlo Bertot, Committee Member

Gary Burnett, Committee Member

Approved:

Lawrence C. Dennis, Dean, College of Information

The Office of Graduate Studies has verified and approved the above named committee members.


ACKNOWLEDGEMENTS

This dissertation was possible in large part because of the guidance of my committee members, and I want to express my deep appreciation of their help during all stages of the process. Dr. Michael Burmester assisted me immensely with all issues mathematic and cryptographic. Dr. Gary Burnett eased me into compliance with the English language, and in many ways increased the overall usability of this usability research. Dr. John Carlo Bertot graciously provided methodological guidance throughout the development of the research. Finally, the Chairman of my committee, Dr. Charles McClure was by far my greatest source of guidance and encouragement. I am grateful to all the participants of this study. I demanded a lot from them to complete eleven distinct steps and to use passwords that were far longer than usual. They must remain anonymous to the reader, even if their passwords cannot. I owe much gratitude to the many friends and colleagues who have helped me during my studies. They include such computer scientists as Leo Kermes, Sahil Cooner, Dr. Peter Jorgensen, Keeffee Haynes, and Dr. Breno DeMedeiros; such colleagues and teachers as Linda Most, Tommy Snead, David Miner, Dr. Corrine Jorgensen, and Dr. Darrell Burke. The highest gratitude belongs to Judie Mulholland, who inspired me to go for the doctorate and helped me in countless ways. I would also like to thank my friends and family, who know full well what they have gone through, for all their support. Special thanks and love always to my dedicated parents, Robert and Janet Henry, who encouraged me from higher and drier regions.


TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1. INTRODUCTION
2. LITERATURE REVIEW
3. METHODOLOGY
4. PASSWORD STRENGTH TESTING
5. PASSWORD-GENERATION SCHEME TESTING
6. GENERATION STAGE TREATMENT ASSESSMENT
7. CONCEPTUAL FRAMEWORK ASSESSMENT
8. CONCLUSION

APPENDIX A: GLOSSARY
APPENDIX B: HUMAN SUBJECTS APPROVAL
APPENDIX C: THINK-ALOUD PROTOCOL
APPENDIX D: FINAL QUESTIONNAIRE
APPENDIX E: GROUP 1 PROTOCOL
APPENDIX F: GROUP 2 PROTOCOL
APPENDIX G: GROUP 3 PROTOCOL
APPENDIX H: GROUP 4 PROTOCOL

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

Table 2.1: Computation times with a PDP-11/70
Table 2.2: The Effect of Search Space on Cracking Time
Table 2.3: Attack Times versus Password Length
Table 2.4: Attack Times on Representative Passwords
Table 2.5: Password Restrictions of Common Operating Systems
Table 2.6: Frequency of English Letters
Table 2.7: Frequency of Numbers in Written English
Table 2.8: Frequencies of English letters at Word Boundaries
Table 2.9: The Effect of Search Space on Entropy
Table 2.10: Results of Password Attacks, by Test Group
Table 2.11: Responses to Email Survey
Table 3.1: Research Design
Table 3.2: Subject Groups by Treatment
Table 3.3: Relation of Research Questions to Data Collection Methods
Table 4.1: The Effect of Search Space on Entropy
Table 4.2: Reinhold’s Survey Results
Table 4.3: Engelfriet’s Passphrase Security Scores
Table 4.5: Entropy Scores of Selected Passwords
Table 5.1: Analysis of Five Password-generation Schemes
Table 5.2: Selected Passwords by Generation Scheme
Table 5.3: Study Passwords by Scheme
Table 5.4: Authentication Failure Rate by Scheme
Table 5.5: Manual Password Resets
Table 5.6: Participants completing each Step
Table 5.7: Dropouts by Scheme
Table 5.8: Password Generation Scheme by Reason
Table 5.9: Plaintext Password Failure by Scheme
Table 5.10: Final Scheme Choice
Table 6.1: Participants completing each Step
Table 6.2: Subject Groups by Treatment
Table 6.3: Authentication Failure Rate by Group
Table 6.4: Password Resets by Group
Table 7.1: Episodic v. Semantic Memory
Table 7.2: Think-aloud Participant Performance
Table 7.3: Alternative Authentication Preference
Table 7.4: Participant Password Change Frequency
Table 7.5: Password-generation Scheme Choice
Table 8.1: Authentication Failure Rate by Scheme
Table 8.2: Password Input and Recall Error Rates
Table 8.3: Authentication Failure Rate by Group
Table 8.4: No Failure and Total Failure by Group


LIST OF FIGURES

Figure 1.1: Tulving’s GAPS – Elements of Episodic Memory and their Relations
Figure 1.2: Preliminary Password GAPS Model
Figure 6.1: Participants by Age
Figure 6.2: Current Passwords
Figure 6.3: Ciphertext Password Recall
Figure 6.4: Plaintext Password Recall
Figure 6.5: Length of Important Passwords
Figure 7.1: Password-generation Schemes within the Password GAPS Model
Figure 7.2: Study Treatments within the Password GAPS Model
Figure 8.1: Reformulated Password GAPS Model
Figure 8.2: Initial Scheme Preference
Figure 8.3: Final Scheme Preference


ABSTRACT

This dissertation explored the effects of various types of assistance on the generation, recall, and input of robust passwords containing at least twenty characters. Passwords are desirable memometric authentication secrets for many reasons, but their continued effectiveness depends on increasing their resistance to emerging attacks. Resistance to attacks is increasingly a function of length. Although previous password research revealed widespread use of short, weak passwords and conventional wisdom considers users incapable of reliably generating, recalling, and accurately inputting strong passwords, this study investigated ways to assist users in meeting the specific challenges of robust password management. Interventions in the password-generation stage of this study introduced participants to five password generation schemes, supplied various numbers of example passwords, and required reentry of passwords immediately after generation to explore possible benefits on subsequent authentication performance. 
Key findings of this research were that:
• Twenty-character passwords can be as strong as their corresponding 128-bit hashes;
• Acrostic password-generation schemes produced strong passwords;
• Confessional and Unexpected Nonsense schemes produced memorable passwords;
• Supplying example passwords led to stronger passwords;
• All participants easily generated 20-character passwords and most experienced few problems in the vague recall of them;
• 30% of participants generated and used very strong passwords without failure for seven weeks;
• The input of the precise formulation of robust passwords was the greatest single cause of authentication failure;
• Exposure to 5 or 10 additional password examples during the generation stage did not improve subsequent password performance;
• Reentry of passwords four times during the generation stage did not improve subsequent password performance;
• Although education and training are beneficial, the actual study treatments were not universally effective; and


• The population of password users and the reasons for password failure are complex, and users who experience difficulties require additional attention and resources on a contingency basis.


CHAPTER 1 INTRODUCTION

1.1 The “Problem” with Passwords

This study addressed a persistent challenge in information security: the generation, recall, and input of strong passwords. Passwords are nearly ubiquitous components of access control imposed on authorized users of networked computer systems, and advances in the abilities of attackers in environments such as the internet and other networks increasingly demand the use of stronger passwords. This study argues that stronger passwords are necessarily longer passwords. Researchers, security experts, administrators, and users have made efforts to assist users in the process of generating passwords that are sufficiently strong, easy to remember, and easy to input reliably. This study contributes to research into the usability of human interaction with password-based authentication mechanisms by evaluating the effectiveness of popular password-generation schemes and other means of assisting users in the process.

As information technology becomes increasingly interconnected and integrated with human activity, access control to networked computer systems grows more important as a means to protect the connected resources, identities, and other sensitive data that these systems process from unauthorized use, theft, or compromise. Access control relies on the accurate and positive authentication of authorized users. Effective access control minimizes false positives (unauthorized access), ideally to zero, while also minimizing the frustration of false negatives (the denial of access to authorized users) caused by the brittleness of access control mechanisms that demand reproduction of the precise formulation of the pre-established password. There are many different mechanisms in use to authenticate users with IT systems, but all are based on the pre-established unique relationship between the authorized user and the system.
The principal means to authenticate users to computer systems are based on three traits of the authorized user: “what you know,” “what you have,” and “what you are” (Kaufman et al., 2002, p. 237). Passwords are the most salient examples of what you know. Tokens, such as hardware dongles and smart cards, are examples of what you have. Biometric characteristics, as detected by hardware retinal scanners, fingerprint readers, etc., are examples of what you are. In addition to the above three traits, Bishop adds location, or “where you are,” such as at a particular terminal, IP address, or within a security perimeter (Bishop, 2003, p. 309). Yet another trait is based on “how you act,” or how the user responds to a series of questions, types on the keyboard, or selects graphic “pass-faces” from an array of images (Dharmija & Perrig, 2000, p. 41; Bastroff & Sasse, 2000). This study will use the following categories to refer to these mechanisms: memometrics, authentication based on the memory recall of the user; cognometrics, based on the thought process of the user; biometrics, based on the physiology of the user; tokenometrics, based on physical possessions of the user; behaviorometrics, based on the behavior of the user; and locometrics, based on the physical or logical location of the user. Because IT is a rapidly developing field with a dynamic vocabulary, Appendix A: Glossary defines all technical terms and industry jargon used in this text as a reference for readers.

Among the many types of access control mechanisms in current use, the primary authentication method for networked computers has been, and remains, the memometric combination of pre-negotiated username and alphanumeric password. This mechanism is advantageous for several reasons. First, it has grown out of legacy systems, so it works with nearly all combinations of existing hardware devices, software, systems, and networks. Second, it is relatively easy to implement because all computing systems can readily manipulate the text strings used and perform the cryptographic computations needed for encryption, hashing, and other processing of those strings. Third, the usernames and passwords that comprise a shared secret between the system and the user can be strengthened by session key amplification protocols, resulting in secure remote authentication messages over insecure channels, such as the internet, even when weak passwords are used.
Fourth, sophisticated zero-knowledge protocols can dramatically reduce vulnerabilities caused by storage of memometric authentication data. Fifth, passwords can be combined with other methods of authentication into multi-factor schemes to provide multiple layers of defense. Because of these five advantages, passwords will likely remain an important component in IT authentication protocols for the foreseeable future. Despite their above-mentioned advantages, passwords suffer from three significant deficiencies. First, their cryptographic “search space” – their set of constituent elements – is historically small, typically the 94 characters of the ASCII character set available on the American English keyboard. Second, passwords are generally short (e.g. 5- to 8-character) formulations drawn from this limited search space, and some legacy systems limit or truncate


their length. The combination of these two characteristics of passwords results in a cryptographically weak authentication secret. The third and most significant weakness of passwords is that human users view access control as a hindrance to their primary work or other computing tasks, and, as a result, typically use only the minimum allowable passwords so that they are easy for them to remember and input. Because users often choose passwords that are short, comprised of common words, and that utilize only a small part of the available search space, such passwords are relatively easy to guess, infer, or pre-compute, and are thus cryptographically weak. If passwords are to remain an effective part of access control, they must be strong, and if passwords are to be strong, this study argues that they must be long. Over the years, researchers, system administrators, and others in information security have devised several password generation schemes to help users with long passwords, and this study explored the relative merits of these approaches in light of Endel Tulving’s GAPS memory model (Tulving, 1972, 1983). This study focused on the Human-Computer Interaction Security (HCISec) problem, in terms of cognitive load, that long password use places on users, and evaluated attempts by security researchers, system administrators, and users in the field to help users with password management in various ways. Although passwords are nearly ubiquitous in IT access control mechanisms, they remain some of the weakest authentication tokens in use today and their use imposes significant cognitive challenges upon users. To overcome these two limitations, much research is underway into the alternative authentication mechanisms listed above, and thus one may legitimately question the need for further research into the problematic use of memometric authentication. The next section assesses the continued importance of research into password use.
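The relationship between search space, length, and strength described above can be made concrete with a short calculation. The following Python sketch is an illustration added here, not part of the study's instruments; the function names are hypothetical, and it assumes that every character is drawn uniformly at random from the 94 keyboard characters mentioned in the text. It therefore gives only an upper bound, since passwords chosen by people use far less of the search space.

```python
import math

# Assumption: the 94 printable ASCII keyboard characters cited in the text.
PRINTABLE_ASCII = 94


def search_space_bits(length: int, alphabet_size: int = PRINTABLE_ASCII) -> float:
    """Upper bound, in bits, on the entropy of a uniformly random password."""
    return length * math.log2(alphabet_size)


def min_length_for_bits(target_bits: int, alphabet_size: int = PRINTABLE_ASCII) -> int:
    """Smallest length whose upper-bound entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


if __name__ == "__main__":
    for n in (5, 8, 20):
        print(f"{n:>2} characters: at most {search_space_bits(n):.1f} bits")
    print("characters needed for 128 bits:", min_length_for_bits(128))
```

Under these assumptions each character contributes about 6.55 bits, so an 8-character password stays below 53 bits, while 20 characters is the first length to exceed 128 bits. This is consistent with the study's argument that strong passwords must be long.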

1.2 Importance of Password Research

Although passwords are often weak compared to newer alternatives as authentication secrets, usually because of human information-processing limitations, they remain the most widespread method of authentication for several reasons. First, human beings have found them useful for the identification of strangers for centuries, as in the “open sesame” formula in The 1001 Nights, and users intuitively understand their use in such cultural contexts. Second, they are straightforward and inexpensive to deploy on all computer systems because of their minimal bandwidth, graphical, and other hardware requirements. Third, they are relatively easy to encrypt, hash, salt, and otherwise cryptographically manipulate. As compensation for their potential weakness, designers often use them as part of multi-factor authentication schemes along with some combination of cognometrics, behaviorometrics, and biometrics. In all, there are three primary reasons that password research remains important: (i) passwords are attractive targets for attackers, (ii) they are central to HCISec interaction, and (iii) there are no clear or singular alternatives to them. The following sub-sections discuss these reasons in greater detail.

1.2.1 Passwords are Major Targets for Attackers.

Bishop formalizes the password as “information associated with an entity that confirms the entity’s identity,” in a system where “subjects act on behalf of some other, external entity. The identity of the entity controls the actions that its associated subjects may take. Hence, the subjects must bind to the identity of that external entity” (Bishop, 2002, p. 309-10). Authentication is the process of binding an identity to an authorized subject, and it occurs when the external entity (e.g. the authorized user or a computational process) supplies confirmatory evidence of its identity to the system. The breaching of this binding is the goal of a large number of attacks on computer systems, as this binding is a well-known vulnerability. Most successful attackers (e.g. Kevin Mitnick) avoid brute force attacks on security mechanisms in favor of social engineering, by which they fool inside users into revealing passwords or leaking other information that can be used to crack the system. Because of the success of social hacking and social engineering attacks, crackers consider the user the weakest link in the chain of computer security.

Historically, as computers became increasingly networked, passwords became more vulnerable, sometimes to online sniffing of the password within transmitted TCP/IP packets, or to offline dictionary attacks that compare the entire contents of a dictionary to the password file on the authentication server. In response to such rudimentary attacks, system designers deployed cryptographic hashing countermeasures to store the processed password. Advanced attacks now utilize huge “rainbow tables” of pre-computed hashes and their corresponding password strings and are not directed toward any particular password, but are brute force attacks on all the permutations of the ASCII character set. Sophisticated encryption protocols such as Secure Sockets Layer (SSL), IPSec, Secure Shell (SSH), Secure Remote Password (SRP), etc. greatly complicate sniffing attacks by encrypting all messages that might leak information to a listener. Robust protocols such as Encrypted Key Exchange (EKE), Strong Password Exponential Key Exchange (SPEKE), SRP, and others (viz. AMP, SNAPI, AuthA, OKE) combine sophisticated public key exchange with zero-knowledge techniques to thwart both online and insider attempts to uncover passwords. Secure operating systems complicate insider attacks by adding extra random bits, or “salt,” to passwords and then applying one-way hashing algorithms before storing the result. Interestingly, some widely used operating systems today do not add the salt, and thus provide the opportunity for dictionary attacks. This study showed that, for now, the use of passwords longer than 20 characters remains an effective defense against all but the most sophisticated pre-computing attacks, and sought to discover techniques to generate passwords that meet that security parameter while remaining manageable for the user.
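The difference between unsalted and salted password storage can be sketched in a few lines of Python. This is an illustrative example only, not the mechanism of any operating system discussed in this study. The choice of SHA-256 and PBKDF2, the 16-byte salt, the iteration count, the three-word dictionary, and all function names are assumptions made for the illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def unsalted_hash(password: str) -> str:
    """One-way hash with no salt: equal passwords always give equal hashes."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a stored value from the password plus random salt (assumed parameters)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    _, digest = salted_hash(password, salt)
    return hmac.compare_digest(digest, expected)


# A toy pre-computed table: hash -> word, built once and reusable against
# every unsalted password file (hypothetical three-word dictionary).
DICTIONARY = ["password", "letmein", "sesame"]
PRECOMPUTED = {unsalted_hash(word): word for word in DICTIONARY}

if __name__ == "__main__":
    stolen = unsalted_hash("sesame")
    print("unsalted lookup recovers:", PRECOMPUTED.get(stolen))

    salt_a, hash_a = salted_hash("sesame")
    salt_b, hash_b = salted_hash("sesame")
    print("same password, different stored values:", hash_a != hash_b)
    print("verification still succeeds:", verify("sesame", salt_a, hash_a))
```

Without salt, a single pre-computed table serves against every stored password, which is the opportunity for dictionary attacks noted above. With a random salt per password, the attacker must repeat the computation for each salt, although a short or common password remains guessable once its salt is known.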

1.2.2 Passwords are Central to HCISec Interaction.

Usability and security are often considered antithetical and inversely proportional to each other, and, generally speaking, the more usable a computer system is, the less secure it is. All security mechanisms ultimately depend upon human beings, and vulnerabilities can occur when humans fail to do their part. Consequently, designers, administrators, and security professionals must account for human frailties and engage users in the overall security process, and users must understand the need for security and the consequences of bad practices. For example, users often circumvent authentication mechanisms because they either consider them annoying or trivial, or do not fully understand the vulnerabilities created by doing so. Studies have found that people in group-based technological environments frequently subvert technology that they do not consider directly benefiting them or their work (Grundin, 1987, p. 809). Research has also shown that many vulnerabilities result from ignoring the human factors in security mechanisms (Davis & Price, 1987; Hitchings, 1995; Adams & Sasse, 1999; Sasse et al., 2001), and human interface designers increasingly strive for ease of use so that the “user’s mental image of his protection matches the mechanisms he must use,” to minimize resultant vulnerabilities (Seltzer & Schroeder, 1975, p. 1301). Clearly, there are many causes of poor security practice, including users, security administrators, and designers.


This study sought ways to ameliorate the usability problems plaguing memometric authentication. System designers and security administrators increasingly demand strong, unique, and limited-term passwords, while the users who are expected to generate, recall, and input them are often poorly served by the existing computer security culture. Long passwords, multiple passwords, and mandated periodic password changes place significant demands on users who fail to comprehend the need for security, and are insufficiently trained in password management. In particular, infrequently used passwords are particularly problematic because their recall is not part of a normal pattern, and they are often composed hurriedly and without great care. Memory research has shown that people remember frequently used items more easily (Kinsbourne & George, 1974, p. 67), and this has consequences on password recall for systems that are rarely used. Because they are forced to use multiple discrete computer accounts, users and administrators, some of whom require dozens of long passwords, have adopted different techniques to cope with password management. Some physically write the passwords, unencrypted, on a piece of paper. This practice creates another vulnerability if that list is compromised, but it is often tolerated since users are generally familiar with the safekeeping of physical objects, and since physical security is always a necessary precondition for cybersecurity. Increasingly, sophisticated users who need to remember many long passwords have adopted technological aids, such as encrypted spreadsheets, databases, or removable storage devices, as secure vehicles for their multiple passwords. This is better than a written, unencrypted list, but still requires physical security measures, and the fragility of the hardware used introduces a single point of failure with potentially catastrophic consequences if backup regimes are not followed. 
Computer systems are meaningless without users, but when the security of the system does not match the mental image of the users, the users can compromise the system. Systems without access control become unusable in hostile networked environments, but systems with onerous access control become unusable when authorized users cannot gain access. Thus, because access control is necessary, it remains central to HCISec research, and because memometric access control remains problematic and nearly ubiquitous, the need for password research remains.


1.2.3 There are no Simple Alternatives to Passwords.

As discussed above, memometric authentication schemes are generally considered weak and increase the cognitive load of users, so much research is underway into alternatives. This section assesses the viability of alternative access control mechanisms including biometrics, behaviorometrics, tokenometrics, cognometrics, and locometrics.

Biometrics seem to present attractive alternative authentication secrets, since they are based on the unique “fingerprints, voice, iris or retina, vein pattern, face, hand or finger geometry, or even ear shape” of the authorized user (Renaud, 2005, p. 109), but implementations have proven problematic for several reasons. First, they are easy to spoof in an uncontrolled or remote environment. Second, biometric mechanisms are relatively slow, both on initial enrollment and at subsequent authentication attempts. Third, they require dedicated hardware at each authentication point and no universal standards exist.

Behaviorometrics can be based on “mouse usage patterns, keystroke latencies or dynamics, or signature dynamics” (Renaud, 2005, p. 109) that are highly characteristic of the authorized user, but they also have weaknesses. First, they suffer from the three shortcomings of biometrics: ease of spoofing, slowness, and reliance on dedicated hardware or software. Second, they are subject to false negatives because authorized users often behave differently in different authentication situations. Third, they are subject to false positives because behavior patterns and dynamics, although characteristic, are not necessarily unique to an individual user.

Tokenometrics are represented by increasingly sophisticated technological solutions such as the familiar magnetic swipe card, the “Smart Card,” RSA Security’s “SecurID,” or the Universal Serial Bus (USB) token.
Although tokens can provide very robust authentication for their holder because they can manipulate very strong secrets, they also have distinct disadvantages. First, they require special hardware for enrollment and for each subsequent authentication point, and this hardware can be relatively expensive and non-standard. Second, they require physical possession by the user, and can easily be misplaced, just as people lose physical keys. Third, there is no guarantee that their holder is the authorized user of the system, and not an imposter, unless they are used as part of a multi-factor authentication scheme. So, to avoid misuse by an unauthorized person, they often require a password or Personal Identification Number (PIN), which is yet another, although typically short, memometric.


Cognometrics typically involve graphical authentication using recognition-based or position-based systems. Prototypical recognition-based systems (e.g. Passfaces, Déjà Vu, Visual Identification Protocol) rely on visual memory and the ability to recognize previously seen visual objects. Position-based systems (e.g. Graphical Password, Draw-a-Secret, and Jiminy) rely on both visuo-spatial memory and either the precise identification of target objects in an image or the ability to recreate a pre-drawn image on a grid (Renaud, 2005, p. 112). Both systems seek to exploit findings from classic cognitive studies that indicate an immense, innate human capacity for remembering pictures (Madigan, 1983), and evidence that visual memory is minimally affected by the cognitive decline associated with aging (Park, 1997), but there are several problems with such systems. First, they require special hardware and high-resolution graphical user interfaces (GUI). Second, unless many stages of cognition are used, they are as weak as short passwords. Third, although humans easily recognize previously selected images, they can have difficulty remembering the exact sequence of previous selection.

Although much research into alternatives to passwords is underway, and although passwords have usability and security problems, no single alternative has been as widely deployed as passwords. Alternative authentication schemes seek to circumvent the usability and security issues of passwords, and if passwords are to remain a part of IT access control mechanisms, research is needed to help individual users generate passwords that meet the seemingly conflicting requirements of:
• Large cryptographic search space, or randomness,
• Sufficient length (e.g. twenty characters or more),
• Ease of formulation,
• Ease of input, and
• Memorability.
To this end, researchers, system administrators, and individual users have suggested schemes to assist users in the process of generating strong passwords that are memorable and relatively easy to input. This study was significant because it evaluated such schemes in terms of these requirements with the aim of making strong passwords more usable.

1.3 Human-Computer Interaction Research and Information Security


The Association for Computing Machinery (ACM) defines HCI as “a discipline concerned with the design, production, and evaluation of interactive computing systems for human use and with the study of major issues involving them” (SIGCHI, 2005). Denning, et al., listed human-computer communication as one of nine sub-areas of computer science, and major topics of HCI research are:
• joint performance of tasks by humans and machines;
• structure of communication between human and machine;
• human capabilities to use machines;
• engineering and programming of the interface; and
• design compromises (Denning et al., 1988, p. 2).

HCI includes the interaction between any number of humans and computational machines as computer networks take on increasingly sophisticated forms. The scope of HCI in commercial applications has grown so much that, within modern software, HCI typically comprises more than half a project’s code. HCI research is inter-disciplinary and uses methods from computer science, cognitive psychology, sociology, anthropology, and industrial design. The major thrust of the HCI research agenda has been to increase the usability of computer systems to make them more valuable, and even “friendly,” to their users.

Computer systems have evolved into able and necessary players in today’s society, to the point that it has become inconceivable for modern culture and economy to continue without them. Because of this tight interaction on a daily and critical basis, HCI research has also focused on improving the reliability of input and output, and the accessibility of computing systems for people with disabilities. Although Information Technology (IT) has evolved exponentially in the last fifty years, human beings, at least on the biological and physiological levels, have changed only at an evolutionary rate. Human users still have two hands with five fingers and two eyes, and are still limited in attention span, vision, posture, reading and typing speed, etc.
Early HCI research questions centered on the basics of the human-computer interface, and hardware and software advancement led to increasingly more usable systems as more computer power was dedicated to improving interaction. As systems improved, research questions included inquiries into the sequence of interactive events via input and output devices; and as computer applications


became more capable and networked, thousands of questions arose concerning interaction refinement.

Human-Computer Interaction Security (HCISec) deals with the intersection of HCI and security. Ironically, the increased functionality, accessibility, and convenience brought about by widespread adoption of high-speed internet connectivity have complicated interaction for users in many ways. Personal computers at home, work, or school are increasingly used in online banking, commerce, communication, education, dating, socializing, etc., and they now store unprecedented quantities of sensitive personal information that has become the target for increasingly sophisticated online fraud. Attackers include innovative and competent hackers (many of whom live in economically-depressed, but connected, areas of the world), scam artists, spammers, spies-for-hire, script-kiddiez, and their agents: always-on, always-probing, always-scanning, networked computers. These computers, which can include hijacked zombies, increasingly work together to utilize the power and reach of the internet to hijack the sensitive data and resources of unsuspecting users.

The dangers lurking on the internet demand countermeasures in the form of firewalls, spam filters, anti-virus software, etc. that in turn impose increased overhead and cognitive load on legitimate users. The increasingly sophisticated nature of internet scams, including viruses, worms, spyware, Trojan Horses, pharming and phishing attacks, rootkits, and other technical subterfuges necessitates correspondingly sophisticated defensive countermeasures. Some of these threats are so insidious that the average user has no indication that her identity has been stolen or that her computer is now a zombie server for lease on the market. In particular, the rootkit is so covert in hiding its actions and thwarting attempts to discover its presence that in many cases a complete reinstallation of the operating system is the only remedy.
Because of widespread ignorance of the actual threats involved, because of the widespread use of insecure operating systems and network appliances, and, ironically, because the market demands for usability have driven HCI researchers to hide the complexity of computer processes from the user, there may be no ultimate solution to this problem on the user level. This situation has led to the emerging field of HCISec, which seeks methods of helping users, administrators, and other stakeholders cope with the constant threats present in a hostile networked environment. Security countermeasures increase the cognitive load on users because they combat threats from a dynamic hostile environment and require extra attention above and beyond the fundamental HCI concerns of functionality, accessibility, and usability. HCI is premised on the assumptions that the only interaction between humans and computers is user-task driven, and that the computer is a sophisticated tool to do the bidding of the user, but this view obscures the underlying complexity of the networked computer. The growing threats enabled by the internet environment can turn the machine against the user, the system administrator, security staff, the network, the enterprise, and, potentially, national security. This threat is ignored only at great peril, and the mere recognition of it increases the cognitive load of the user. The increased cognitive load caused by the dynamic between internet threats and countermeasures to them is an unwanted addition to HCI, and is antithetical to the convenience and usability that the internet promises. Nevertheless, security has become a necessary evil, and is analogous to insurance. Like insurance, security is a negative deliverable since it comes at a premium and, in ideal situations, nothing happens. If the threat is ignored, however, vulnerabilities remain and the potential for loss continues. In networked environments, security is crucial because of the ability of unknown outsiders to access computing resources and sensitive data. Organizational environments are usually regulated by a security policy and maintained by an IT staff that assists users in the access and use of networked resources within security parameters. These top-down policies are often a source of frustration for users because of the restrictions they impose on functionality. In non-organizational environments, as the internet increasingly extends into more private spheres in users’ lives, the burden of security is placed directly on the users. Thus, users face the same threats, but without the help of an IT staff, often resulting in dangerous vulnerability leading to resource or identity theft.
Access control has become critically important for the protection of sensitive information, critical records, and vital resources in networked computing environments. To protect these resources, a computer or access control device assigns permission levels to them according to a database of authorized users. Access control mechanisms are designed to let authorized users in and to keep everyone else out. To do this, they must test each user’s account according to pre-negotiated secrets to confirm the identity of the person. These secrets (viz. username and password) form a bond between the individual and the computer’s digital representation of the individual, and significant HCISec research remains to be done in the effective and efficient binding of human beings with their online or system-based identities.

Strong shared secrets between users and memometric authentication mechanisms potentially impose a significant increase in cognitive load upon users, and this study explored methods of assisting users in generating, recalling, and inputting the primary bond in computer access control – the password.

1.4 Purpose, Goals, and Objectives

The purpose of this study was to contribute to the usability of human interaction with secure memometric authentication systems. One goal of the study was to help users generate, input, and remember the cryptographically strong passwords necessary for such secure memometric authentication systems. A second goal was to inform the field of information security of the potential for robust password management on the part of everyday users, if sufficiently engaged in the security process. A third goal of this study was to identify user education issues and suggest strategies for improvement in terms of practice and research. Specific objectives of this study were to:
1. Identify effective password-generation schemes;
2. Contrast the relative strengths and weaknesses of identified schemes;
3. Measure the effects of examples and reentry during the password-generation stage on subsequent password recall and input;
4. Assess the long-term memorability of generated passwords;
5. Evaluate the relative effectiveness of each scheme; and
6. Propose future research into improving robust password management.

This study evaluated the robustness, recall, and memorability of passwords that users self-select based on personal memories using schemes that assist them in the process. Typical computer users retain memories of events that are potentially both unforgettable and resistant to social engineering attacks. These memories, when formulated into long, easy to type, and easy to remember strings including characters from an expanded cryptographic search space, can result in passwords resistant to all but the most resourceful attackers. Effective password-generation schemes can contribute to the usability and practice of human interaction with secure memometric authentication systems. They can extend the usefulness of passwords in memometric authentication, either alone or as part of multi-factor protocols, into the foreseeable future. They can relieve some of the burden of access control assistance by system administrators, and they can minimize the frustration and cost of denied access caused by forgotten passwords. To fulfill its greater purpose and to meet its specific objectives, this study used five data collection methods to test the effectiveness of five password-generation schemes in terms of password strength, ease of generation, ease of recall, and ease of input.

1.5 Conceptual Framework

Early research into password problems proceeded from a cryptographic framework, and focused on the robustness of the password in terms of cryptographic strength and resistance to cracking (Saltzer & Schroeder, 1975; Morris & Thompson, 1979; Leung, 1991; Spafford, 1992; Klein, 1990; Bishop & Klein, 1992). It became clear that strong passwords themselves were not the problem, since computers can generate pseudo-random passwords of any length, but that user ability to remember and input the precise formulation of the shared secret, as needed for authentication, appeared to be limited. In a comparative empirical evaluation in 1993, Zviran and Haga examined the memorability and subjective preference of various password mechanisms on students (Zviran & Haga, 1993). Adams and Sasse, and Sasse et al. used grounded theory techniques and interviews with users in the field, and found high user frustration levels with password mechanisms (Adams & Sasse, 1999; Sasse et al., 2001). Such research indicates that security is a social problem in which IT designers and administrators often act as condescending technocrats, while users remain unconcerned with the need for security and the potential for loss. As noted above, although much access control research has gone into the development of alternatives to the password to bypass human memory limitations, the password remains nearly ubiquitous and problematic. Passwords are the primary part of the secret shared by authorized users and computer systems. The computer remembers them indirectly as the product of cryptographic algorithms, but the user must be able to recall the exact string for each successful authentication. For a multitude of reasons, users can fail to accurately recall their passwords, and this study sought ways to help users better manage their passwords in part by drawing upon empirical evidence from human memory research.
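
The asymmetry described above, in which the system stores only a cryptographic product of the password while the user must reproduce the exact string, can be illustrated with a minimal sketch. It is not the authentication server used in this study. The function names `enroll` and `verify`, the choice of PBKDF2 with SHA-256, the salt length, and the iteration count are all assumptions made for illustration.

```python
import hashlib
import hmac
import os


def enroll(password: str, iterations: int = 200_000) -> tuple[bytes, bytes, int]:
    """Derive a salted digest at enrollment; the password itself is not stored.

    PBKDF2-HMAC-SHA256 is an illustrative choice, not the study's mechanism.
    """
    salt = os.urandom(16)  # random per-user salt (length is an assumption)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest, iterations


def verify(attempt: str, salt: bytes, digest: bytes, iterations: int) -> bool:
    """Recompute the digest from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

Because the comparison is made on derived digests, an attempt that differs from the enrolled password by a single character, including letter case, fails outright. This is the sense in which the shared secret is brittle and the user's recall must be exact.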
There are three major conceptual approaches to understanding human memory. These are (i) the study of biochemical changes in the neural pathways of the brain, (ii) the study of the physical structure of the brain, and (iii) cognitive psychology. Of these three, cognitive psychology identifies and models unobservable mental mechanisms and processes to explain observed patterns of behavior and private subjective experiences. Cognitive psychological memory research focuses less on the molecular biology and physical characteristics of the physiological storehouse of memory – the brain – and more on the processes and events involved in remembering. Cognitive psychologists have suggested many categories of human memory including episodic, semantic, declarative, procedural, implicit, and explicit (Conway, 1996, p. 165). Among these, episodic memories are specific, if not unique, to the individual, and thus show great promise for use as the shared secret – the password – between the human and the computer system. The concept of memory is central to understanding mental behavior, and cognitive psychology emphasizes the internal representations of past experiences and their utilization in mental activities (Gregg, 1986, p. 2). Cognitive memory research distinguishes three primary ways that humans remember: uncued recall, cued recall, and recognition. Uncued recall makes no provision for the user to recall an item from memory, and because no meaning or other mental “hook” is associated with the item (e.g. non-cognate words in a foreign language), this is the least efficient way to remember. Cued recall supplies a cue or reestablishes a context to aid in the recall of an episodic memory (Tulving, 1983, p. 159). Research has demonstrated the effectiveness of cues in the memory and recall of words in laboratory tests (Tulving & Osler, 1968). Recognition involves the presentation of something previously experienced and requires only that the subject recognize that the item has been previously encoded in memory, and it places the least cognitive load of the three on the user. Cognometric access mechanisms try to leverage this tendency to reduce the number of false negatives and to provide users with a pleasant authentication process.
This study was concerned with the memometric challenge to users that the cued recall of the login prompt presents. In most authentication mechanisms, the user is presented only with a login prompt as a cue, and must recall and accurately input the precise password formulation. Although long, complex passwords place great cognitive load on authorized users, this study sought to utilize memories already present in the individual experience as hooks or cues to recall the event of password generation. In Tulving’s terms, it is the repeated Encoding, Recoding, Ecphory (retrieval), and Conversion of this event that reinforces the robust password formulation within the memory of the user. This is the effective result of frequently used passwords in the everyday world of IT security, and this study explored the feasibility of making the formulation and recall of the robust password into a purposeful event, in and of itself, in the user’s life, with the aim of improving subsequent password performance in terms of recall and accurate input. Tulving originally distinguished episodic memory in 1972 as preserving the knowledge of the spatio-temporal context of an individual’s experience (Tulving, 1972). For Tulving, episodic memory was the primary focus of memory research: it is “that aspect of the mind, or the brain, that makes the successful completion of individual acts of remembering possible” (Tulving, 1983, p. 135). Tulving developed the General Abstract Processing System (GAPS) as a framework for the study of episodic memory (Tulving, 1972, p. 130), and his conception of GAPS is shown in Figure 1.1. In the GAPS framework, Tulving uses an “act of remembering” as the basic unit of conceptual analysis. This act begins with an original event that a rememberer perceives, encodes, recodes, and recollects. Based on findings from years of laboratory experiments, the GAPS model includes four observables: the original event, the interpolated event, the retrieval cue, and memory performance. From these observables, Tulving infers four processes, which he considers events: Encoding, Recoding, Ecphory (retrieval), and Conversion. From these events, he further extrapolates five internal cognitive states: (i) the Cognitive Environment, (ii) the Original Engram, (iii) the Recoded Engram, (iv) Ecphoric Information, and (v) Recollective Experience. Tulving argues that these thirteen elements are jointly necessary, and perhaps sufficient, to understand the phenomenon of remembering. The arrows connecting the thirteen elements of GAPS represent a relation that could be translated as “influences,” “effects,” or “brings about” (Tulving, 1983, p. 136).
Taking issue both with Craik and Lockhart’s emphasis on processes, and with Gestalt theorists’ (e.g. Asch, Koffka, & Köhler) emphasis on hypothetical states, Tulving contends that Encoding, Recoding, and Ecphory are “instantaneous changes in state rather than activities enduring in time” (Tulving, 1983, p. 139). Thus, he argues, they are better conceived as events than as processes. The GAPS focus on events makes it especially useful as a framework for the study of the events of password generation and retrieval. Because the GAPS model focuses on events of cue and recall, the investigator modified it for the purpose of this exploratory study, adapting it to the events of password generation, recall, and input. The GAPS framework’s observables, other than the Original Event, corresponded with the generation and recall of the password on cue. They provided actions that could be empirically observed and analyzed for performance in real-world security situations. Its processes and states provided a preliminary framework to understand the internal events necessary for accurate recall. It also provided a temporal framework for the methodological treatment of subject groups for this study. This study conceptualized each of the events in the GAPS model as potentially meaningful personal episodes as a possible means for users to meet the password challenge and overcome the security culture estrangement that researchers have uncovered.

Figure 1.1: Tulving’s GAPS – Elements of Episodic Memory and their Relations

Figure 1.2: Preliminary Password GAPS Model

Secure memometric access mechanisms are unforgiving because they rely on inefficient uncued recall, and because the shared secrets – the username and password – must be input by the user 100% accurately. This is a common cause of frustration for users and other stakeholders when many passwords are mandated, when passwords are infrequently used, or when password-aging policies are enforced. Figure 1.2 illustrates the Password GAPS Model used in this study. In the formulation of a robust password, it is important to utilize an original event that is not discoverable by potential attackers who might use social engineering to crack it. Rather, this event ideally remains a distant, but unforgettable, episode in the user’s life. In this model, the Original Events and Engrams are ideally three in number, and the Interpolated Event is the password-generation stage in which study treatments introduce participants to password-generation schemes, practical password examples, and the password in paper hardcopy. Recoding becomes Password Generation and the Recoded Engram becomes the Password Engram, which is also an observable in the form of the hardcopy. The Retrieval Cue becomes the Login Prompt and Memory Performance is observable as Password Input. Section 3.3.2 describes the relation between study methodology and the Password GAPS Model in greater detail, Section 7.2 evaluates its applicability to password research, and Section 8.6 suggests possible revisions to it for use in future research.

1.7 Research Questions and Methodology

The purpose of this study was to contribute to the usability of human interaction with secure memometric authentication systems. The goal of the study was to discover ways to help users generate, input, and remember the cryptographically strong passwords necessary for such secure memometric authentication systems, to advocate the use of password-generation techniques, and to suggest strategies for user education in password management. The objectives of this study were to:
1. Identify effective password-generation schemes;
2. Contrast the relative strengths and weaknesses of identified schemes;
3. Measure the effects of examples and reentry during the password-generation stage on subsequent password recall and input;
4. Assess the long-term memorability of generated passwords;
5. Evaluate the relative effectiveness of each scheme; and
6. Propose future research into improving robust password management.

In pursuit of these objectives the following four research questions drove the research in this study:
1. What level of cryptographic strength is necessary for contemporary secure passwords?
2. What available password-generation schemes best combine security and usability?
3. Can password examples and multiple reentries during the password-generation stage improve subsequent recall and input of robust passwords?
4. Does the modified GAPS framework used in this study contribute to the usability of long passwords?

To answer these four research questions, five methods were used for data collection. First, cryptanalysis and expert testing determined an effective threshold of cryptographic strength for secure passwords to be used in modern memometric authentication systems, and all passwords generated and used in this study met or exceeded this threshold. Second, log analysis, think-aloud user testing, and survey questionnaires tested the usefulness of all password-generation schemes used in this study. Third, log analysis and survey questionnaires tested the applicability of the GAPS conceptual framework for meeting the password challenge. An extensive survey identified candidate password-generation schemes in use in the field or recommended by security experts. To have been considered for evaluation in this study, these schemes had to provide both an assurance of password strength and an appeal to usability for the user. Expert testing operationalized password strength in terms of length (i.e. twenty or more characters), search space (i.e. at least one upper case letter, number, and symbol from the 94-character ASCII set), and entropy enhancement (e.g. semantic nonsense, word boundary obfuscation, etc.). The research design of the treatment stage of this study subjected four randomly selected groups from the study’s participants to four different instruments instructing them in password generation.
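
The length and search-space criteria in the operationalization of password strength above lend themselves to a mechanical check. The following is a minimal sketch and not the expert testing procedure used in this study. The function name `meets_threshold` is hypothetical, the 94-character set is assumed to be the printable ASCII characters excluding the space, and the third criterion (entropy enhancement) is deliberately left out because it requires expert judgment.

```python
import string

# Assumed composition of the 94-character set:
# 26 lower case + 26 upper case + 10 digits + 32 symbols (space excluded).
PRINTABLE_94 = string.ascii_letters + string.digits + string.punctuation


def meets_threshold(password: str, min_length: int = 20) -> bool:
    """Check only the length and character-class criteria.

    Entropy enhancement (semantic nonsense, word boundary obfuscation)
    is not assessed here.
    """
    return (
        len(password) >= min_length
        and any(ch in string.ascii_uppercase for ch in password)
        and any(ch in string.digits for ch in password)
        and any(ch in string.punctuation for ch in password)
    )
```

A check of this kind can only confirm that a password reaches the stated minimum. It cannot show that the password resists guessing, since a long string built from discoverable personal facts would pass it.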
Instruments introduced participants to five different password-generation schemes to compose passwords, but also allowed participants to use a scheme of their own. Instruments also facilitated this password-generation process with examples and encouragement. All participants were also allowed to print out or write the password as a backup to use in the case of forgetfulness. The two key differences between the treatments were:
1. the number of example passwords provided, and
2. the number of password reentries used to reiterate the Recoding, Ecphory, and Conversion events central to the GAPS model.

Instruments supplied two control groups with one example of passwords generated under each scheme, another group with five, and another with ten. They also required two groups to reenter their new passwords successfully only once, as is the common practice, to confirm their accuracy, and required two groups to reenter their new passwords successfully five times with the expectation of making password generation a memorable event in its own right. The combination of these two treatments required four subject groups, as listed in Table 3.3. This study required participants to log in to a remote authentication server once per week for seven successive weeks. The purpose of this was to test participant recall and input of the password over time. The server logged each attempt to collect empirical data of the success rates of participant password input. These server logs provided a quantitative perspective of these two variables to compare with participant self-report data on password performance.
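
The success rates drawn from the server logs can be pictured as a simple tally per week. The sketch below is illustrative only. The record layout (participant identifier, week number, success flag) and the function name `weekly_success_rates` are assumptions, since the actual log format of the study's server is not described in this chapter.

```python
from collections import defaultdict


def weekly_success_rates(attempts):
    """Return {week: fraction of successful login attempts}.

    `attempts` is an iterable of (participant_id, week, succeeded) tuples;
    this layout is hypothetical.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for _participant, week, succeeded in attempts:
        totals[week] += 1
        if succeeded:
            successes[week] += 1
    return {week: successes[week] / totals[week] for week in sorted(totals)}
```

Grouping by treatment group instead of by week would follow the same pattern with a different key.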

1.8 Impacts and Benefits

This study had a number of impacts in the areas of information security, practical security culture, asset and trust loss prevention, IT staff resource appropriation, cognitive load on users, and memory theory. Usability and security are often seen as opposite ideals, so even incremental means of improving the usability of security are desirable. The benefits of networked computing are only sustainable in a secure environment, and robust, usable passwords remain a crucial element of IT security. The theoretical impacts of this study involved applying episodic memory theory to the generation of passwords to make them more usable in terms of generation, recall, and input. Recall and input failure on the part of the user are the main impediments to robust password usage. Tulving conceptualized all internal processes in an act of remembrance as events (Tulving, 1983, p. 139), and by treating password generation, recall, and input as events, this study explored GAPS’ applicability to the specific problem of password use. This study purposively tested the effects of reiterating the GAPS events of Recoding, Ecphory, and Conversion, as applied to the specific password events of prompting, recall, and input. The findings of this study impacted many practical aspects of information security. Studies have repeatedly revealed weaknesses in user-selected passwords and the policies that allow them (Morris & Thompson, 1979; Spafford, 1992; Klein, 1991; Bishop & Klein, 1992). Other studies have shown the frustration that users experience if random passwords are imposed upon them (Saltzer & Schroeder, 1975; Adams & Sasse, 1999; Sasse et al., 2001; Bastroff & Sasse, 2003; Yan et al., 2004; Sasse & Flechais, 2005). If organizations can see improvements to the effective use of robust passwords by users, they may respond with greater allocation of resources to user security practice, education, and training. Furthermore, memorable and easy to use passwords can reduce the need, frequency, and cost of password resetting. The findings of this study had an impact on all stakeholders of online commerce and finance: consumers, vendors, and service providers. The popularity of the personal computer and widespread access to high-speed internet have necessitated authentication practice beyond the physical confines of the organization and its IT staff. Sasse et al. argue that security must become “a visible and integral part of an organisation’s long-term goals and its daily activities” (Sasse et al., 2001, p. 130), but this is applicable to individual computing as well, and the personal computer user must be motivated and cognizant of the threats. The Federal Trade Commission 2005 report listed complaints about identity theft as 37% of the 686,683 complaints filed, and recommended strong passwords among the six primary means of deterrence (Federal Trade Commission, 2006). Reports of losses of sensitive personal data have a chilling effect on online commerce and finance, and although most of these losses are the result of poor security strategies on the part of vendors and data aggregators, weak passwords create a significant, but unnecessary, vulnerability. There were also practical impacts of this study for government and national security. The loss of personal information and identity through insecure computing habits is large and growing.
The Federal Bureau of Investigation reports that 97% of stolen PCs are never recovered. A notable exception was the potentially catastrophic theft of a notebook computer from a Department of Veterans Affairs employee, which ended with the return of the computer; the machine contained sensitive details on more than 26 million U.S. military veterans (C|net News.com, 2006). Mandated robust password protection and hard drive encryption can greatly inhibit access to such sensitive data, even in extreme cases such as these, when the attacker has physical possession of the computer. Another serious vulnerability to government systems is the insider attack. As a salient example, four times in 2004, Joseph Colon, a government consultant working on the FBI’s “Trilogy” computer system upgrade who became frustrated over bureaucratic obstacles, used open-source password hash cracking tools to breach the FBI’s classified computer system and access the passwords of 38,000 employees, including that of FBI Director Robert S. Mueller III. Using a password that an agent gave him, Colon accessed an encrypted database in March 2004, and later returned to the system three times to re-crack the password list. To assess the potential for sensitive information loss or misuse, the FBI temporarily shut down its network and committed thousands of man-hours and millions of dollars (Weiss, 2006). This is an example of social engineering combined with a sophisticated pre-computing attack that could be averted by long (i.e. 20+ character) passwords that are currently infeasible to pre-compute.
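
The claim that 20-character passwords are out of reach of pre-computation can be made concrete with a short calculation. The sketch below assumes the 94-character printable ASCII set and, importantly, characters chosen uniformly at random. User-generated passwords are not uniformly random, so these figures are upper bounds on the attacker's workload and not measurements of the passwords collected in this study.

```python
import math

ALPHABET_SIZE = 94  # printable ASCII without the space (assumption)


def search_space(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct strings of exactly `length` characters."""
    return alphabet_size ** length


def bits(length: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Equivalent key length in bits for a uniformly random string."""
    return length * math.log2(alphabet_size)


for n in (8, 20):
    print(f"{n} characters: {search_space(n):.3e} candidates, about {bits(n):.1f} bits")
```

Under these assumptions an 8-character password spans roughly 6.1 × 10^15 candidates (about 52 bits), while a 20-character password spans roughly 2.9 × 10^39 (about 131 bits). The longer space is larger by a factor of 94^12, which is why a table pre-computed for short passwords offers no help against long ones.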

1.9 Study Limitations and Assumptions

Available resources limited this exploratory study. It tested the effectiveness of five widely published textual password-generation schemes directly, but other, more effective, schemes may have been in use clandestinely by security-conscious organizations. As partial compensation, it allowed each participant to use an alternate scheme, along with a description of it and rationale for its use, with the aim of discovering new schemes with promise. The duration of the study imposed another limitation, since the very long-term memorability of passwords can only be measured over a very long time. However, the security practice of periodically demanding new passwords from users and password aging in organizational environments, and the reuse of privately used passwords by users both served to partially overcome this limitation. This exploratory study tested the effects of different types of assistance given to participants who had to generate, recall, and input passwords that some considered very long. A significant limitation of this study was the generalizability of its findings as discussed in Section 3.8. Another limitation was the relatively small sample sizes used, which limit the extent of statistical inference that can be drawn from the findings. This study did not directly address the “multiple password” problem, instead limiting itself to facilitating new password generation. Another limitation was the study’s exclusive focus on long password usage. No previous password study tested long passwords exclusively, and there is very little data on the ways that people generate and use long passwords (Zviran & Haga, 1993). This study of long password use was exploratory, as previous password studies that tested the effects of password selection advice did not focus on long passwords or passphrases. Zviran and Haga tested the usability of passphrases against four other password-generation schemes, including self-generated, “associative,” “cognitive,” and system-generated schemes. They limited passwords to a maximum of twenty characters in their study, but suggested a maximum of eighty characters for passphrases, while actually providing a write-in space of forty characters for participants on their instrument (Zviran & Haga, 1993, p. 236). Yan et al. tested a passphrase-based group against a control group that self-selected passwords with eight characters including at least one number, and a group using 8-character random passwords. All the passwords used in their study were 8-character, and even the passphrase group reduced a phrase containing at least eight words to an 8-character acrostic (Yan et al., 2004, p. 28). This study assumed that users can generate, input, and recall long text-based passwords if adequate guidance and practice were provided, and the following methods explored the effects of guidance and practice as introduced by the study. This study used Tulving’s GAPS framework purposively. This cognitive psychological approach required neither the intrusive manipulation of the brain, nor the study of the abnormal. It did, however, assume that some mental processes could be explained without recourse, in strict terms, to material substances or events. This study assumed that, although security and usability are seemingly opposite trajectories in human-computer interaction, a change in security culture is possible in which users are more closely involved and trained in keyboard-based password management practices needed in secure networked environments. Second, it assumed that memometrics – at least in the form of long textual passphrases – would remain a viable component in future authentication mechanisms.
Third, it assumed that effective and usable password-generation schemes will exploit prominent human memory characteristics that have been identified by memory research. Fourth, it assumed that the necessary increase in the length of textual passwords need not unduly increase the cognitive load on password users.

1.10 Chapter Summary

As computers and computer networks become increasingly connected and vital to modern life, they become more vulnerable to emerging threats on insecure networks. The field of HCI has in many ways increased the usability of networked computers while hiding the underlying complexity of their operation and vulnerabilities caused by their interconnectedness. This increase in usability and the widespread use of internet-connected computers have created unforeseen vulnerabilities. HCISec has arisen as a sub-discipline of HCI to cope with the problems that security countermeasures introduce. Attackers have long considered the user as the weakest link in network security, and have used a wide variety of social engineering and hacking techniques to exploit user-caused vulnerabilities. Passwords are the traditional shared secret used to authenticate authorized users to remote or secure computer systems, and are likely to remain important despite emerging alternatives such as biometrics, behaviorometrics, tokenometrics, cognometrics, etc. Weak passwords are perhaps the greatest single vulnerability that users introduce into the endeavor of network security, and attackers have used increasingly sophisticated techniques to crack them. Users choose weak passwords because they consider security a low priority and because the requirements for a robust password impose an unwanted hurdle that impedes their primary work. There is a clear need to assist users in the generation of passwords that are strong, yet easy to remember and input. Memory research has investigated the large capacity of human memory and the phenomena involved in memory. Tulving suggested a framework to conceptualize the process of remembering as a series of events that showed promise for overcoming the problem of password generation, recall, and long-term remembrance. This study sought to contribute to the usability of human interaction with authentication systems that require strong passwords. To this end, it identified, operationalized, and evaluated five password-generation schemes that assist users in the process of composing unique passwords that are strong, easy to remember, and easy to reliably input.
To evaluate these schemes, five methodologies collected data from four groups of participants selected to explore the effects of (i) the five schemes or alternative schemes preferred by participants, (ii) exposure to practical example passwords, and (iii) mandatory password reentry during the password generation stage on subsequent password performance. The specific methodology is detailed in Chapter 3 and was designed to provide multiple measures of these two effects as well as the effectiveness and user preference of the five purposefully selected password-generation schemes.

1.11 Structure of Subsequent Chapters This chapter serves as an introduction to the study and guide for the details of this research. Chapter 2 presents the literature review of the study, detailing the historical and current


issues in password research, and provides the context and justification for the present study. Chapter 3 details the specific methodological components and research design used in this project. Chapter 4 establishes a threshold of cryptographic strength for contemporary secure passwords, and reports the findings of cryptanalysis and expert testing of participant passwords. Chapter 5 tests the five prominent password-generation schemes in terms of the security and usability of the passwords that participants generated using them. Chapter 6 explores the effects of password examples and multiple input during the password generation stage on subsequent recall and input of robust passwords. Chapter 7 assesses the applicability of the Password GAPS conceptual framework for the study of robust password use. Chapter 8 summarizes all findings, assesses the achievement of study objectives, suggests future research, and makes practical recommendations for password usage.


CHAPTER 2 LITERATURE REVIEW

2.1 Introduction This study investigated the effectiveness of various types of assistance in the generation, recall, input, and long-term retention of robust passwords. To frame this investigation, this chapter reviews a wide range of previous literature surrounding password research, including:
• the need and application of computer access control;
• the role that authentication of authorized users plays in network security;
• the nature of the threat environment;
• the increasing cryptographic strength requirements of passwords;
• hardware and operating system requirements and limitations;
• human user requirements and limitations, including:
  o shared secret retention,
  o accurate input of brittle secret,
  o accurate recall despite infrequent use,
  o demands caused by multiple passwords,
  o demands caused by policies of password aging, and
  o methods used in HCISec user research;
• computer security culture; and
• password-generation schemes developed to help users.
This chapter surveys previous research in these areas to frame the discourse, and to identify gaps in password practice, research, and knowledge. Because the passwords that study participants generated needed to exceed a minimum cryptographic threshold, this chapter includes a discussion of how those criteria have changed over the short history of networked computing. Because of the importance of passwords as the final result of the password-generation techniques assessed, and because they are the shared secret between the user and the computer system, a review of literature delineating their characteristics and describing their validation by the system is necessary. Because this study


explored the effects of password-generation schemes on user memory, this chapter traces the history of memory research by cognitive psychologists and the development of memory theory. Because this study sought to contribute to theoretical, practical, and methodological knowledge surrounding password research, this chapter includes a review of similar user-centered studies measuring password strength, usability, and memorability, and the methodologies used in those studies.

2.2 Authentication Without Passwords Prior to a survey of extant literature in password research, it is important to note that some security experts consider passwords to be obsolescent, obsolete, or even dangerous, and strongly advocate alternatives (Summers & Bosworth, 2004; Ives et al., 2004; Renaud, 2005). In 2002, security experts considered passwords to be the second most serious computing vulnerability (SANS Institute, 2002), and Schrage argues that, “password protection is pervasive, annoying, inconvenient, and does little to deter anyone intent on doing harm.” Deeming password authentication schemes “little more than security placebos,” he argues that, “they perversely inspire abuse, misuse, and criminal mischief by deliberately making users the weakest link in the security chain” (Schrage, 2005). The widespread recognition of the vulnerability of passwords and the state of password practice has inspired a great deal of research into other memometric mechanisms based on recall of sounds (Kung et al., 2004) or graphic images (Brostoff & Sasse, 2000; Jansen et al., 2003; Thorpe et al., 2005; Wiedenbeck et al., 2005). Other, non-memometric mechanisms include cognometrics, based on the thought process and recognition capabilities of the user (Monrose & Reiter, 2005; Renaud, 2005); biometrics, based on the physiology of the user; tokenometrics, based on tokens held by the user (rsasecurity.com, 2006; Verisign.com, 2006); behaviorometrics (one form of which is also called keystroke biometrics), based on the behavior of the user (Monrose et al., 1999; Peacock et al., 2005); and locometrics, based on the location of the user (Bishop, 2005). All of these alternatives come with their own advantages and disadvantages, and research is underway to refine each of them (Coventry, 2005; Just, 2005; Monrose & Reiter, 2005; Peacock et al., 2005; Renaud, 2005).
A common strategy used to overcome specific limitations of these mechanisms is to combine multiple authentication mechanisms together in multi-factor schemes. FIPS 201-1


specifies the latest U.S. federal Personal Identity Verification (PIV) system for use by federal employees and contractors (NIST, 2006). The PIV-II standard requires three factors: 1. a tamper-proof PIV card and readers, 2. biometric data and readers, and 3. a Personal Identification Number (PIN) and input mechanism as part of its highest security regimen to verify the identity of users seeking physical access to government facilities or electronic access to government information systems. In this system, the use of a short PIN instead of a strong password is possible because of the combined strength of the other factors. The National Security Agency (NSA) also mandates multifactor authentication for high security applications. Although much research into alternatives to passwords is ongoing to circumvent the usability and security issues of passwords, no single alternative has been as widely deployed as passwords. Long, yet usable, passwords maintain clear advantages over alternatives.

2.3 Technical Methods of Assisting Users with Passwords Despite their detractors, passwords remain the most commonly used authentication method for networked computer systems. Deterministic computer systems are well suited to reliably exchange robust cryptographic keys and messages, but the process of positively authenticating human beings using memometric secrets remains problematic, primarily because of human indifference to information security, lack of education in the matter, and memory limitations. This study assumes that the bulk of the password challenge lies in the prevailing security culture surrounding their use, and investigates means to assist users with robust password management. Nevertheless, effective HCI designs typically include both human-factor and technical components, and it is important to note the progress in technological methods of assisting users with passwords, specifically with the additional requirements of long passwords. Authentication policies and mechanisms typically lock an account after a predetermined number of failures in an attempt to keep out unauthorized imposters. A simple method of increasing the usability of passwords is to increase the number of failed attempts allowed by the authentication mechanism (Brostoff & Sasse, 2003). Although typical security policies allow only three failed attempts before locking the user out, there is very little vulnerability introduced


and a great usability gain achieved simply by increasing this number by a factor of ten or more. Studies have found that multiple passwords significantly increase cognitive load for the user (Adams & Sasse, 1999; Sasse et al., 2001). A common solution to the problem of multiple passwords is the password “key ring,” or “wallet,” either as an encrypted spreadsheet or portable storage device with a simple database. This solution still requires a master password, however, and creates a single point of failure if adequate backup regimes are not used. As longer passwords are required by authentication mechanisms and the security policies that drive them, the probability of user input error increases, even when the user otherwise correctly recalls the exact formulation of the password. To counter the threats of “shoulder surfing” by malicious persons on site, and of screen capture software that can clandestinely send screenshots to remote attackers, authentication mechanisms have long obscured, even to the user herself, the input of the password on the monitor. Command-line terminals typically suppress the echoing of password characters to the standard output on the monitor, while GUI access control mechanisms typically reveal only asterisk characters (“marching dots”) in lieu of the actual password characters. While this technique reduces some vulnerability to malicious voyeurs, it increases the possibility of input error and adds to the cognitive load of authorized users. This load increases significantly as the user is required to enter ever-longer passwords containing difficult-to-type characters. Tognazzini and Blaser developed a passphrase entry system that replaces the marching dots with a “rolling blackout” that reveals the last few entered characters in a low-contrast font for a few seconds.
This adjustable rendering of the password on the fly is ideally long enough to help the user notice input errors as they occur, but too short to leak significant information to a voyeur (Tognazzini, 2005, p. 34). This customizable temporary display of input is an excellent example of using technology to assist users in security, and systems that incorporate such functionality could greatly increase the reliability of long password input. Using technology and computing power to assist users, designers, administrators, and other stakeholders with HCI challenges is a central thrust in HCI research, and shows promise for helping users with difficult HCISec challenges. The following sections trace the increasing demands that the threat environment has placed on users of memometric authentication mechanisms.


2.4 A Brief History of Computer Passwords and Attacks on Them Passwords are important to information security because they typically provide the first, and often only, line of defense in the access control of most computing systems. In terms of length, they can range from zero (or null) to very long passphrases containing hundreds of characters. In terms of complexity, they can include characters drawn from expanded sets such as numbers, non-alphanumeric characters, and even the 65K-character Unicode set. In the early days of computing, when machines were application- and location-specific, and generally not interconnected, there was little need for anything beyond physical access control. With the increased use of multi-user, business-oriented, and networked computers, and with the advent of portable operating systems such as UNIX, access control became a necessary component of computer security. Metcalfe recognized the weakness of ARPANET passwords very early in the history of networked computing (Metcalfe, 1973), and the problem has only grown with the increased sophistication and resourcefulness of the attackers, the massive interconnectedness of the internet, the importance of networked resources, and the value of the content. The first widely used, interconnected, multi-user computer operating system was UNIX. UNIX systems initially wrote the plaintext passwords of all users to a single file (viz. /etc/passwd), and this simple technique soon proved inadequate for many reasons. First, all privileged users could view and copy the contents of this file. Second, the file was corruptible when being edited. Third, all system backups containing this file became potential sources of information leakage. Fourth, because it required that essential user information be distributed among several files, system updates were problematic.
In response to the vulnerabilities of plaintext passwords, system designers applied cryptographic techniques to maintain the confidentiality of the UNIX password file. The first UNIX password encryption algorithm emulated the World War II-era U.S. Army M-209 cipher machine in software (Morris & Thompson, 1979, p. 594). This fast algorithm allowed even an early PDP-11/70 computer to crack 800 passwords per second, and its speed was soon considered to be a security threat. In 1976, to increase cracking time, UNIX incorporated the crypt() program, which uses the Data Encryption Standard (DES) algorithm, an NSA modification of the IBM “Lucifer” algorithm, to perform a one-way cryptographic “hash” function (that is nearly impossible to reverse) to transform the plaintext password into an


obfuscated, yet thoroughly deterministic, ciphertext. To do this, crypt() applies the DES algorithm 25 times to a constant (typically a string of zeros), using the user-supplied password as the cryptographic key, and mixing in a random 2-character “salt,” to generate an 11-character hash string used by the system to authenticate the authorized user. This scheme proved relatively difficult to decrypt despite its relatively short (by today’s standards) 56-bit key. It was so slow that even the much faster µVAX-II computer could check only 3.6 passwords per second (Morris & Thompson, 1979, p. 595). Using this algorithm on this computer took 19 hours to check a 250,000-word dictionary, and, because of the appended salt component, checking the passwords on a 50-user system took on average 40 days (Morris & Thompson, 1979, p. 596). Computational power increased dramatically by 1991, when hardware implementations of the DES algorithm enabled decryption times of 6µs (Leung, 1991), and cracking times of 1.5 seconds using a 250,000-word dictionary. To stop such “dictionary attacks,” UNIX vendors, beginning with Sun, wrote the encrypted passwords from the traditional /etc/passwd file into a special “shadow password” (viz. /etc/shadow) file that is readable only (i.e. not writeable) by the root super-user. UNIX incorporated password security features in an ad hoc and reactionary manner as engineers responded to security threats that were unforeseen by the original designers. Morris and Thompson called them a “collection of programs whose elaborate and strange design is the outgrowth of many years of experience with earlier versions” (Morris & Thompson, 1979, p. 594). Despite these systemic measures to harden systems from attacks, users, when left to their own devices, introduced vulnerabilities by using weak passwords. 
In response, UNIX systems began adding proactive password checkers to analyze a suggested password in plaintext upon enrollment, and assess its strength against the security policy of the organization and basic parameters of password security (e.g. length, upper/lower case requirements, non-alphanumeric character requirements, etc). Using pattern-matching techniques, password checkers also look for characteristics of weak passwords such as “license plate” formats, simple words or word combinations in common languages, keyboard-derived patterns such as “qwertyu” or “zzzzzzz,” etc.
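The timing figures quoted earlier in this section for crypt() follow from simple arithmetic, which the short Python sketch below reproduces. It is offered only as an illustration: the function and variable names are hypothetical, the inputs are the rates and sizes quoted in the text, and it assumes that each of the 50 accounts carries a distinct salt (so that the dictionary must be re-hashed once per account) and that one 6 µs hardware operation corresponds to one guess, as the quoted 1.5-second figure implies.

```python
# Illustrative arithmetic only; all names here are hypothetical.
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def dictionary_pass_seconds(words, checks_per_second):
    """Seconds needed to hash every dictionary word once at a given rate."""
    return words / checks_per_second


WORDS = 250_000        # dictionary size quoted in the text
SOFTWARE_RATE = 3.6    # crypt() checks per second quoted in the text
USERS = 50             # accounts on the example system

one_pass = dictionary_pass_seconds(WORDS, SOFTWARE_RATE)
hours_per_pass = one_pass / SECONDS_PER_HOUR

# Assumption: every account has its own salt, so each needs a full pass.
days_all_users = USERS * one_pass / SECONDS_PER_DAY

# 1991 hardware figure: 6 microseconds per DES operation.
hardware_pass = WORDS * 6e-6

print(f"{hours_per_pass:.1f} h, {days_all_users:.1f} d, {hardware_pass:.1f} s")
# prints: 19.3 h, 40.2 d, 1.5 s
```

Under these assumptions the results agree with the rounded figures of 19 hours, 40 days, and 1.5 seconds cited above.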


2.5 Attacks on Passwords As system designers and administrators deployed memometric access control in response to growing threats, crackers devised a variety of attacks upon passwords and the mechanisms that process them. These attacks fit into three major groups: social engineering attacks, technical subterfuges, and guessing attacks. Social engineering attacks bypass security measures by using psychological manipulation or other non-technical means to detect leaked information and include dumpster diving, shoulder surfing, burglary, extortion, blackmail, and phishing. Subterfuge attacks include TCP/IP packet sniffing, passing-the-hash, rootkits, pharming, keylogging, screenscraping, wiretapping, login spoofing, timing attacks, acoustic cryptanalysis, identity management system attacks, and other means of compromising host security. Guessing attacks include educated guesses, dictionary attacks, brute force guessing, and pre-computing. It is important to note that no authentication mechanism, memometric or otherwise, is safe against the first two groups of attacks, which bypass the security of the mechanism itself. The vast majority of password research, including this study, is concerned with foiling guessing attacks. Educated guesses are only successful against default, short, or obvious passwords chosen by the user. Dictionary attacks attempt a probabilistic match with common words and simple permutations. Brute force attacks extend the dictionary technique to include all possible character permutations. Pre-computing attacks use readily available lookup tables, or rainbow tables, that are pre-compiled and indexed to reduce the problem to simple hash pattern matching. The following sections describe these guessing attacks in greater detail.

2.5.1 Dictionary and Brute force Attacks. Dictionary and brute force attacks are among the most common guessing attacks on passwords. Dictionary attacks attempt to discover passwords in real time by hashing words drawn from a dictionary and comparing the result with the hashed passwords of users found on the authentication server. This is a relatively simple probabilistic technique that is effective only when users choose weak passwords. Brute force attacks attempt all permutations of the characters in the search space, also in real time, and are also feasible only with short passwords. Because password security has historically arisen as a reaction to unforeseen security threats, early passwords were frequently very weak, and early systems naïvely allowed them. The majority of previous password research focused on the strength of the password against dictionary and brute force attacks.
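The dictionary attack described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reconstruction of any tool cited in this chapter: SHA-256 stands in for whatever hash the target system actually uses, no salt is applied, and the account names, word list, and function names are hypothetical.

```python
import hashlib


def hash_password(password):
    # Assumption: unsalted SHA-256 as a stand-in for the system's real hash.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(stored_hashes, wordlist):
    """Hash each dictionary word and compare it with every stored hash.

    stored_hashes maps account name -> hash; returns account -> recovered word.
    """
    recovered = {}
    for word in wordlist:
        candidate = hash_password(word)
        for account, stored in stored_hashes.items():
            if stored == candidate:
                recovered[account] = word
    return recovered


# Hypothetical password file: one weak password, one stronger formulation.
stored = {
    "alice": hash_password("sunshine"),
    "bob": hash_password("B33r&Mug"),
}
cracked = dictionary_attack(stored, ["password", "sunshine", "dragon"])
print(cracked)
```

Only the account whose password appears in the word list is recovered, which is the sense in which the technique is effective only against weak passwords.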


In 1979, Morris and Thompson conducted experiments to determine the strength of passwords when users were given no guidance by the system or security policy. Table 2.1 shows the computation times of passwords up to six characters, and the effects of greater search space on attack times.

Table 2.1: Computation times with a PDP-11/70

Search space sizes: 26 = lower-case letters; 36 = lower-case letters + digits; 62 = all-case letters + digits; 95 = printable ASCII characters; 128 = ASCII characters.

Password    Search space
length      26           36            62             95              128
1           .030 s       .040 s        .080 s         .120 s          .160 s
2           .800 s       2 s           5 s            11 s            20 s
3           22 s         58 s          300 s          100 s           2640 s
4           600 s        2,100 s       18,000 s       100,800 s       334,800 s
5           14,400 s     75,600 s      1,144,800 s    9,676,800 s     43,200,000 s
6           385,200 s    2,736,000 s   69,379,200 s   914,544,000 s   5,487,264,000 s

(Source: Morris & Thompson, 1979, p. 596)

It is clear from this that doubling password length provides more security than doubling the search space. For example, a 3-character password from a search space of 36 can be cracked in 58 seconds, a 3-character password from a search space of 62 can be cracked in 300 seconds, while a 6-character password can only be cracked in 2,736,000 seconds, even when drawn from a search space of 36. Morris and Thompson found many weak examples among the 3,289 passwords they tested, and 86% fell into the following categories:
• 15 were a single ASCII character
• 72 were strings of two ASCII characters
• 464 were strings of three ASCII characters
• 477 were strings of four alphanumeric characters
• 706 were five letters, all upper-case or all lower-case
• 605 were six letters, all lower-case
• 492 could be found in dictionaries, name lists, and the like.

Because of the extreme weakness of the passwords they studied, Morris and Thompson were able to find one third of them with a five-minute dictionary search, even with a PDP-11/70 computer (Morris & Thompson, 1979, p. 596). Clearly, either through indifference or ignorance,


the users were not participating in the overall security of the system, and the system allowed weak passwords. Spafford recognized early on that, “security is tied to the level of sophistication of each user to make appropriate choices” (Spafford, 1992, p. 2), during a study of 19,100 passwords collected at Purdue University. He found that 5,309 of the passwords were duplicates and that only 20% of those tested were not detectable by a dictionary attack using a 30MB list of common words drawn from eleven languages, proper names, an atlas, and slang terms (Spafford, 1992, p. 4). Although the trend has been towards system policies that require stronger passwords, the link between user sophistication and overall system security remains. Table 2.2 shows the raw cracking times at 100,000 encryption operations per second with password lengths from 3 to 12. The numbers in the top row indicate the number of characters in the alphabet, or search space. 26 is the number of lower-case letters, 36 is the number of letters plus digits, 52 is the number of mixed-case letters, 68 is the number of single-case letters with digits, symbols, and punctuation, and 94 is the American keyboard enabled ASCII character set. The numbers in the first column are the password lengths in characters. The times shown are the times to process the entire set of passwords, but the practical average time to brute force passwords is one half the listed times. On average, a 12-character password drawn from the 94-character ASCII set cannot be broken for 75 billion years at 100,000 operations per second. Thus, a user willing to deploy such a 12-character password is essentially immune to a brute force attack at 100,000 hits per second. On an American English keyboard, alphanumeric passwords can be drawn from 94 ASCII characters. This means that a 7-character password has 94^7 possibilities, a 14-character password has 94^14 possibilities, and so on.
Thus, an arithmetic increase in password length yields an exponential gain in security.
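The relationship between alphabet size, password length, and exhaustive search time can be made concrete with a brief calculation. The Python sketch below is illustrative only: the function names are hypothetical, a 365-day year is assumed, and the guessing rate is the 100,000 operations per second used in the discussion of Table 2.2. It is intended to show the form of the arithmetic, not to regenerate every entry of the published table.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def search_space(alphabet_size, length):
    """Number of distinct passwords of exactly the given length."""
    return alphabet_size ** length


def exhaustive_years(alphabet_size, length, guesses_per_second):
    """Years needed to try every password in the search space."""
    seconds = search_space(alphabet_size, length) / guesses_per_second
    return seconds / SECONDS_PER_YEAR


full_search = exhaustive_years(94, 12, 100_000)
average_search = full_search / 2  # on average, half the space is searched

# Each added character multiplies the space by the alphabet size,
# whereas doubling the alphabet multiplies it by only 2 ** length.
growth_per_character = search_space(94, 8) // search_space(94, 7)
growth_from_doubling_alphabet = search_space(52, 3) // search_space(26, 3)

print(f"{full_search:.3e} years full, {average_search:.3e} years on average")
print(growth_per_character, growth_from_doubling_alphabet)
```

With these assumptions, the full search for a 12-character password over 94 symbols comes to roughly 150 billion years, and the average to roughly 75 billion, in line with the figure cited above.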


Table 2.2: The Effect of Search Space on Cracking Time

Units: s = seconds, h = hours, d = days, mo = months, y = years, c = centuries.

Password    Alphabet size
length      26          36            52               68                 94
3           .18 s       .47 s         1.41 s           3.14 s             8.3 s
4           4.6 s       16.8 s        73.2 s           214 s              780 s
5           112 s       606 s         3,820 s          3,740 s            73,400 s
6           3,090 s     21,800 s      13.7 d           2.24 mo            2.63 mo
7           22.3 h      9.07 d        3.91 mo          2.13 y             20.6 y
8           24.2 d      10.7 mo       17.0 y           145 y              1,930 y
9           1.72 y      32.2 y        882 y            9,860 y            182,000 y
10          44.8 y      1,160 y       45,800 y         670,000 y          17,079,000 y
11          11.6 c      41,700 y      2,384,000 y      45,582,000 y       1,605,461,000 y
12          30,300 y    1,503,000 y   123,946,000 y    3,099,562,000 y    150,913,342,000 y

(Source: GoedSoft, 2005)

Table 2.3: Attack Times versus Password Length

Length  Combinations               Class A    Class B   Class C   Class D   Class E   Class F
2       9,216                      Instant    Instant   Instant   Instant   Instant   Instant
3       884,736                    88.5s      9s        Instant   Instant   Instant   Instant
4       85,000,000                 2.25h      14m       1.5m      8.5s      Instant   Instant
5       8,000,000,000              9.5d       22.5h     2.25h     13.5m     1.25m     8s
6       782,000,000,000            2.5y       90d       9d        22h       2h        13m
7       75,000,000,000,000         238y       24y       2.5y      87d       8.5d      20h
8       7,200,000,000,000,000      22,875y    2,287y    229y      23y       2.25y     83.5d

(Source: Lucas, 2006)

The computing power available to attackers also grows exponentially, roughly according to “Moore’s Law,” a persistent observation that the complexity of similarly priced integrated circuits doubles every 18 months. Remarkably, this tendency has held true for computing resources in general, and it is central to emerging threats to passwords. Tables 2.3 and 2.4 show the result of the following six classes of attack (Lucas, 2006):
A. 10,000 Passwords/sec. (Pentium 100)
B. 100,000 Passwords/sec.
C. 1,000,000 Passwords/sec.
D. 10,000,000 Passwords/sec.
E. 100,000,000 Passwords/sec.
F. 1,000,000,000 Passwords/sec.


Table 2.3 shows average cracking times of sample passwords according to these six classes of attack, and illustrates the exponential effect of password length with the arithmetic effect of these classes. The advantage lies with the defender because a small increase in password length requires an exponential increase in attacking power to compensate, but the exponential growth in computing power consistently raises the threat level.
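The balance between these two exponential trends can be estimated with a rough calculation. The sketch below is a simplification under stated assumptions, not a forecast: it takes the 18-month doubling period at face value, assumes that attack speed tracks that doubling exactly, and uses the 96-symbol alphabet implied by the combination counts in Table 2.3 (for example, 96^2 = 9,216). The function names are hypothetical.

```python
import math

DOUBLING_PERIOD_YEARS = 1.5  # "every 18 months", taken at face value


def attacker_speedup(years):
    """Factor by which attack speed grows after the given number of years."""
    return 2 ** (years / DOUBLING_PERIOD_YEARS)


def years_to_offset(alphabet_size, extra_characters=1):
    """Years of doubling needed to absorb the given number of added characters."""
    doublings_needed = extra_characters * math.log2(alphabet_size)
    return doublings_needed * DOUBLING_PERIOD_YEARS


one_character = years_to_offset(96)
print(f"One extra character offsets about {one_character:.1f} years of growth")
print(f"Speed-up after 3 years: {attacker_speedup(3):.0f}x")
```

Under these assumptions a single additional character absorbs close to a decade of growth in attacking power, which illustrates why the advantage lies with the defender who is willing to lengthen a password.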

Table 2.4: Attack Times on Representative Passwords

Password    Combinations               Class A    Class B   Class C   Class D   Class E   Class F
darren      308,900,000                8.5h       51.5m     5m        30s       3s        Instant
Land3rz     3,500,000,000,000          11y        1y        41d       4d        10h       58m
B33r&Mug    7,200,000,000,000,000      22,875y    2,287y    229y      23y       2.25y     83.5d

(Source: Lucas, 2006)

Table 2.4 shows examples of the benefits of both password length and complexity. The fact that the formulation “B33r&Mug,” which is relatively easy to remember, can keep a supercomputer busy for well over a month demonstrates the present impracticality of brute force attacks on long passwords. The combination of modest length and complexity thus thwarts dictionary and brute force attacks by all but the most resourceful attacker.
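The combination counts in Table 2.4 appear to follow from the character classes present in each sample password. The Python sketch below tests that reading; it is an inference from the table rather than a description of the method Lucas used. The alphabet sizes (26 for lower-case only, 62 for mixed case with digits, 96 once a symbol is present) and the function names are assumptions made for illustration.

```python
def alphabet_size(password):
    """Infer an alphabet size from the character classes a password uses."""
    if any(not c.isalnum() for c in password):
        return 96  # assumption: the full 96-symbol set implied by Table 2.3
    size = 0
    if any(c.islower() for c in password):
        size += 26
    if any(c.isupper() for c in password):
        size += 26
    if any(c.isdigit() for c in password):
        size += 10
    return size


def combinations(password):
    """Size of the search space an exhaustive attack would have to cover."""
    return alphabet_size(password) ** len(password)


def exhaustive_seconds(password, guesses_per_second):
    return combinations(password) / guesses_per_second


for sample in ("darren", "Land3rz", "B33r&Mug"):
    print(sample, f"{combinations(sample):,}")

class_f_days = exhaustive_seconds("B33r&Mug", 1_000_000_000) / 86_400
print(f"Class F, B33r&Mug: {class_f_days:.1f} days")
```

The three counts round to the values shown in the table, and the Class F time for the third password comes to about 83.5 days, which suggests that the tabulated times correspond to searching the entire space.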

2.5.2 Pre-Computing Attacks. The preceding discussion described simplistic brute force and other online attacks, in which the cracker attempts to match an encrypted and/or hashed password by trying every possible permutation of characters drawn from the search space in real time. This is a very inefficient method of attack because it requires computation on the fly, and designers of hash algorithms have deliberately slowed the necessary computation time since the introduction of DES in 1976. To better utilize available computing power and reduce required connection time, crackers have devised more efficient attacks by pre-computing all possible password permutations, indexing them into lookup tables, and leaving only a simple comparison to be performed on the fly. Thus, “the bad guys are really attacking your keyboard” (Schrage, 2005), and no longer need to mount probabilistic attacks using words likely to be found in dictionaries.


An increasingly popular method of pre-computing attack uses rainbow tables. These are readily available look-up tables containing the pre-computed results of every possible password, indexed by permutation. 0phtcrack and RainbowCrack are two salient examples of programs that utilize rainbow tables. Using this method, the cracker must simply find a match of the resultant hash and look up the password in the table. Rainbow table attacks are particularly efficient against weak systems, such as Microsoft’s legacy LM scheme, and others that do not salt the raw password before hashing. Systems that salt the password complicate current rainbow table attacks. As powerful as rainbow tables are, they must be compiled for each hashing algorithm, and require massive amounts of storage to index all combinations of passwords to their resultant hashes. For example, the tables able to crack even the weak 40-bit keys used by old versions of Microsoft Office and Adobe Acrobat require 2.7 TB of storage, and tables able to crack the LM hashes require much more (AccessData.com, 2007). To reduce the massive storage requirements of rainbow tables, sophisticated crackers use Markov chain algorithms that compute portions of a hash on the fly, without the need to write all possible combinations to disk. This technique trades some of the speed that pattern-matching pre-computed rainbow tables offer to drastically reduce storage requirements.
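The principle behind pre-computing attacks, and the reason salting complicates them, can be illustrated with a deliberately tiny example. The sketch below builds a plain exhaustive lookup table, which is a simplification: the rainbow-table tools named above use a chained structure to save storage, and that structure is not reproduced here. SHA-256 stands in for the target hash, and the three-letter alphabet, the salt value, and the function names are hypothetical.

```python
import hashlib
from itertools import product


def unsalted_hash(password):
    # Assumption: SHA-256 as a stand-in for the target system's hash.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password, salt):
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


def build_lookup_table(alphabet, max_length):
    """Pre-compute hash -> password for every string up to max_length."""
    table = {}
    for length in range(1, max_length + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            table[unsalted_hash(candidate)] = candidate
    return table


table = build_lookup_table("abc", 3)  # 3 + 9 + 27 entries

# Against an unsalted hash, the attack is a single dictionary lookup.
recovered = table.get(unsalted_hash("cab"))

# A salted hash of the same password is absent from the table.
missed = table.get(salted_hash("cab", "x7"))

print(len(table), recovered, missed)
```

The table grows with every added character or symbol, which is the storage burden noted above, and a table built without the salt is of no use against salted hashes unless it is rebuilt for each salt value.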

2.6 Systemic Password Vulnerabilities The weak passwords chosen by computer users have multiple causes. First, there is a general unawareness among users of the vulnerabilities that weak passwords introduce. Second, users often consider security a nuisance that keeps them from doing productive work, and participate only minimally. Third, the hardware and software of the systems have created vulnerabilities. Table 2.5 shows the password length and cryptographic strength limitations that popular operating systems introduce. Systems have continued the ad hoc addition of security over the years, and designers have been steadily adding security features to operating systems in response to growing threats enabled by greater interconnectedness. Backward-compatible protocols, such as Microsoft’s LANManager, perpetuate significant vulnerabilities, and other early design decisions resulted in significant security limitations. For example, UNIX syntax gives white space (viz. spaces and tabs) semantic value, and early systems disallowed passphrases containing spaces. Early Windows and Macintosh systems ran as “root” or “Administrator” out of the box, often with no password protection at all. As multi-user support


Table 2.5: Password Restrictions of Common Operating Systems

System                      Restrictions                    Detail
Windows XP, Windows 2000    Maximum 127 characters          Unless the weak LANManager scheme is enabled.
Windows NT, Windows 98      Maximum 14 characters           LANManager scheme converts all passwords to uppercase and is weak.
Mac OS X 10.3+              Maximum 256 characters          OS X 10.3 uses a stronger MD5 encryption algorithm with a 256-character limitation. The user can choose between DES and MD5. Apple has added a 12-bit salt to the pre-hashed password since OS 10.3.
Mac OS X 10.2 and prior     Maximum 8 characters            OS X 10.2 and prior use the old DES encryption algorithm with 8-character limit. Longer passwords are truncated.
UNIX                        DES: maximum 8 characters;      Most UNIX systems use the old DES encryption algorithm with the 8-character limitation. Some systems have custom enhancements that allow for longer passwords. New versions use an MD5 algorithm with a 256-character limitation. Varies widely by distribution.
                            MD5: maximum 256 characters
Linux                       DES: 8-256 characters;          Older Linux systems use an old DES encryption algorithm with an 8-character limitation. New Linux distributions use the newer MD5 algorithm that has a 256-character limitation. Varies by distribution, but the user can choose between DES and MD5.
                            MD5: maximum 256 characters

and internet connectivity became prominent, stricter access control arose. Generally speaking, however, modern operating systems now allow for greater password security than users typically deploy.

2.7 Proactive Password Selection One systemic means of enforcing strong passwords is to proactively check the passwords that users submit upon enrollment into an authentication mechanism. The UNIX adoption of crypt() initiated this technique, in which users enter candidate passwords and a parsing program


analyzes the plaintext candidate password according to the parameters of security policy in place (Bishop, 1991, p. 169). Spafford’s OPUS system was the first proactive password checker to use Bloom filters to increase lookup speed (Spafford, 1992b, p. 99). Bloom filters provide a mechanism to compact very large dictionaries into a representational dictionary of integers derived from hashes of the original dictionary entries. Davies and Ganesan incorporated trigrams and a Markov model to achieve even greater speed and compression in their program Bapasswd. Bapasswd extracts information from the Markov trigrams, normalizes the results, and then rejects passwords that can be generated from the Markov model (Davies & Ganesan, 1993, p. 11). Bergadano et al. achieved greater checking speeds by deploying decision trees and high dictionary compression in their ProCheck program (Bergadano et al., 1997). A properly configured proactive password checker can prevent the use of weak passwords submitted by users (Yan, 2001) and systematically enforce organizational password policy. Klein recommends configuring the checker to search for the following password permutations upon enrollment:
• Passwords based on the user’s account name;
• Passwords based on the user’s initials or given name;
• Passwords that exactly match a word in a dictionary (not limited to /usr/dict/words);
• Passwords that match a word in the dictionary with some or all letters capitalized;
• Passwords that match a reversed word in the dictionary;
• Passwords that match a reversed word in the dictionary with some or all letters capitalized;
• Passwords that match a word in a dictionary with an arbitrary letter turned into a control character;
• Passwords that match a dictionary word with the numbers ‘0’, ‘1’, ‘2’, and ‘5’ substituted for the letters ‘o’, ‘l’, ‘z’, ‘s’, etc.;
• Passwords that are simple conjugations of a dictionary word (e.g. plurals, adding “ing” or “ed” to the end of the word, etc.);
• Passwords that are patterns from the keyboard (e.g. “aaaaaa” or “qwerty”);
• Passwords that are shorter than a specific length (e.g. nothing shorter than six characters);
• Passwords that consist solely of numeric characters (e.g. Social Security numbers, telephone numbers, house addresses or office numbers);
• Passwords that do not contain mixed upper and lower case, or mixed letters and numbers, or mixed letters and punctuation; and
• Passwords that mimic a state-issued license plate number (Klein, 1991, p. 9).

In addition to those on this list, Klein and Bishop also include passwords that:

• Match a word in a dictionary with an arbitrary letter turned into a control character;
• Are based on repetitions of the user’s account name;
• Are based on repetitions of the user’s initials or given name;
• Are common misspellings of dictionary words; and
• Are based on other local host names (Bishop & Klein, 1992, p. 7).
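A few of the checks listed above can be sketched in code. The following is a minimal illustration only, not the implementation of any checker cited in this section: the function name, the four-word dictionary, and the keyboard runs are hypothetical stand-ins, and a production checker would consult a large word list and many more rules.

```python
# Hypothetical mini-dictionary; a real checker would load a large word list.
DICTIONARY = {"password", "dragon", "monkey", "secret"}

# Digit-for-letter substitutions from the list above: 0->o, 1->l, 2->z, 5->s.
SUBSTITUTIONS = str.maketrans("0125", "olzs")

# A few illustrative keyboard runs; not an exhaustive set.
KEYBOARD_PATTERNS = ("qwerty", "asdfgh", "zxcvbn")


def check_password(candidate, account_name, min_length=6):
    """Return the reasons a candidate is rejected (empty list if accepted)."""
    reasons = []
    lowered = candidate.lower()

    if len(candidate) < min_length:
        reasons.append("shorter than minimum length")
    if candidate.isdigit():
        reasons.append("numeric characters only")
    if account_name and account_name.lower() in lowered:
        reasons.append("based on account name")
    if lowered in DICTIONARY:
        reasons.append("dictionary word, ignoring capitalization")
    elif lowered.translate(SUBSTITUTIONS) in DICTIONARY:
        reasons.append("dictionary word with digit substitutions")
    if lowered[::-1] in DICTIONARY:
        reasons.append("reversed dictionary word")
    # A single repeated character or a known keyboard run counts as a pattern.
    if len(set(lowered)) == 1 or any(p in lowered for p in KEYBOARD_PATTERNS):
        reasons.append("keyboard pattern")
    return reasons


print(check_password("Pa55w0rd", "alice"))
print(check_password("correct-Horse7battery", "alice"))
```

Returning every reason for rejection, rather than stopping at the first, reflects the practice of telling the user why a candidate password is unacceptable.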

Most modern operating systems use some form of proactive password checking. Apple’s Mac OS X includes a particularly refined password checker that can suggest many alternative passwords to users. UNIX-based systems (including GNU/Linux, BSD, and System V, with appropriate extensions) now deploy pwcheck, which is a part of the passwd+ password-changing program and which can implement the above policies. Upon password enrollment, it evaluates the strength of the password by string matching against rules written in an internal “little language.” If the candidate password fails the test, it tells the user why the password is unacceptable. In addition to standard password checkers, dual-use programs are available for both attackers and security administrators. Muffet’s popular password-cracking tool crack can be used for brute force and dictionary-based cracking, and his proactive password checking library cracklib can detect weak passwords upon enrollment (Muffet, 2005). Although effective password checkers prevent weak passwords before deployment, there is a common usability problem with their use. When users submit their candidate passwords to the scrutiny of a proactive password checker, they frequently meet with rejection for any of the reasons listed above. Although good checkers will identify the specific problem with the candidate and suggest modifications to strengthen it, frustration can arise for users whose candidates are repeatedly rejected. The final password to emerge from this iterative process may be esthetically unpleasant to the user, difficult to type, or difficult to remember, and such frustration frequently leads to counterproductive security behavior by users (Adams & Sasse, 1999; Sasse et al., 2001).

2.8 Password Entropy

Previous security research has investigated techniques to leverage user strengths with computer strengths. In terms of passwords, usable and memorable passwords are weak, while strong passwords are difficult to use because they must be complex in terms of search space and length. Much of cryptanalysis involves attacking the user's password or phrase because the user's choice of password typically represents a much simpler cryptographic key than optimal for the
encryption algorithm being used, which provides a potential cryptanalytic advantage for the cracker. Multiple character elements, such as letters, numbers, and symbols, increase strength, especially when non-grammatical and nonsensical sequences are used, while easy-to-remember chunks, such as words or phrases, can decrease strength. Johansson demonstrates simple techniques, such as misspelling, that dramatically increase the strength of passphrases (Johansson, 2006). Cox advocates using “shocking nonsense” within a phrase or sentence that is memorable, nonsensical, and shocking in the culture of the user as a means for increasing strength (Cox, 1998). Since current pre-computing attacks can easily crack all permutations of short passwords drawn from the entire keyboard search space, Burnett argues that secure passwords should now contain a minimum of twenty characters (Burnett, 2006, pp. 121-4), continuing the practice of long passwords, or passphrases, that Porter proposed (Porter, 1981) and Pfleeger advocated (Pfleeger, 1989). Password strength arises from two factors, alphabet size and length, and together these factors determine the entropy of the password, a quantity derived from the probability distribution of its possible values. Tables 2.2, 2.3, and 2.4 directly show the effects of alphabet size and password length on attack times, but the topic of entropy is more complex. Entropy is a measure of disorder in a system, and in information systems it can be viewed as a measure of the lack of information in a sequence. Shannon demonstrated that the entropy of a discrete random variable x, over a set of n possible outcomes, is:

$$H(x) = \sum_{i=1}^{n} p(i) \log_2 \frac{1}{p(i)}$$

or, that the entropy of an event x is the sum, over all possible outcomes i of x, of the product of the probability of outcome i and the log of the inverse of the probability of i (Shannon, 1948). The choice of the logarithmic base of two is arbitrary, but it yields entropy in terms of bits, or binary digits, and this is useful when working with digital cryptosystems. The entropy of each random character in a digital text string is the base-2 logarithm of the number of possibilities, and the entropy of the entire string is the product of the number of characters and the entropy per character. Randomness in terms of the character set and sequence of characters determines the entropy in a password, and information entropy is a direct measure of password strength. The usability of the passphrases involved in this study depends on ease of typing and remembrance, and these follow from the use of natural language words arranged in meaningful
phrases. For the purpose of this study, the primary natural language discussed is English, but all languages share similar patterns. English letters are mathematically distributed in a unilateral frequency distribution. For example, in normal written English, the percentage of each respective letter is approximately:

Table 2.6: Frequency of English Letters (percent)

Letter   A     B     C     D     E     F     G     H     I     J     K     L     M
Percent  7.3   0.9   3.0   4.4   13.0  2.8   1.6   3.5   7.4   0.2   0.3   3.5   2.5

Letter   N     O     P     Q     R     S     T     U     V     W     X     Y     Z
Percent  7.8   7.4   2.7   0.3   7.7   6.3   9.3   2.7   1.3   1.6   0.5   1.9   0.1

(Source: Friedman & Callimahos, 1985)
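Shannon's formula can be applied directly to the single-letter percentages in Table 2.6. The sketch below is illustrative only; the function and variable names are assumptions introduced here, and the calculation treats each letter as independent, so it ignores the digraph and word-level structure that lowers the entropy of real English further.

```python
import math

# Percentages transcribed from Table 2.6 (A through Z).
LETTER_PERCENT = [7.3, 0.9, 3.0, 4.4, 13.0, 2.8, 1.6, 3.5, 7.4, 0.2, 0.3, 3.5, 2.5,
                  7.8, 7.4, 2.7, 0.3, 7.7, 6.3, 9.3, 2.7, 1.3, 1.6, 0.5, 1.9, 0.1]


def shannon_entropy(weights):
    """Entropy in bits: sum of p * log2(1/p) over outcomes with p > 0."""
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return sum(p * math.log2(1.0 / p) for p in probabilities)


english_bits = shannon_entropy(LETTER_PERCENT)   # skewed distribution
uniform_bits = shannon_entropy([1] * 26)         # equals log2(26)
print(round(english_bits, 2), round(uniform_bits, 2))
```

The skewed distribution yields fewer bits per letter than the uniform case, which is the sense in which natural language letters carry less entropy than random ones.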

It is clear that the vowels E, I, O, A, and U stand out as the most frequently occurring letters, and that the consonants T, N, R, D, and L follow. K, Q, X, J, and Z are very infrequently occurring letters. The six vowels comprise 40% of English text, the five high frequency consonants (D, N, R, S, and T) comprise 35%, the ten medium frequency consonants (B, C, F, G, H, L, M, P, V, and W) comprise a mere 24%, and the five low frequency consonants (J, K, Q, X, and Z) make up only 1% (Friedman & Callimahos, 1985). The distribution for any particular text, such as a particular password, Beowulf, or the email communication of an enemy, however, can be quite different. If a password were encrypted with a simple, classical cryptosystem relying on mono-alphabetic substitution (e.g. the shift cipher or the substitution cipher), the ciphertext would have similar frequencies, and Friedman’s Index of Coincidence (IC) could be calculated to statistically analyze the ciphertext. The formula for the index of coincidence is:

$$IC = \frac{\sum_{i=1}^{n} f_i (f_i - 1)}{N (N - 1)}$$

where N is the length of the text and f_i is the frequency (count) of the i-th of the n letters (Friedman, 1925). If all n letters of the alphabet are randomly distributed, the index is approximately 1/n. In the case of English’s 26 letters (ignoring case), that would be 1/26, or 0.03846, but actual written English has an IC of around 0.066. The lower the IC, the higher the entropy, and the higher the entropy, the stronger the password. The numbers in normal written English also occur in a unilateral frequency distribution approximately as follows:

Table 2.7: Frequency of Numbers in Written English

Number    0     1     2     3     4     5     6     7     8     9
Percent   10%   21%   13%   9%    8%    8%    8%    8%    7%    9%

(Source: Friedman & Callimahos, 1985)
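The index of coincidence discussed earlier in this section can be computed from letter counts alone. The following sketch uses the standard form of Friedman's statistic; the function names and the sample sentence are hypothetical, and the shift cipher is included only to show that a mono-alphabetic substitution relabels the letters without changing the index.

```python
from collections import Counter


def index_of_coincidence(text):
    """IC = sum of f_i * (f_i - 1) over letters, divided by N * (N - 1)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n_letters = len(letters)
    if n_letters < 2:
        raise ValueError("at least two letters are required")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n_letters * (n_letters - 1))


def shift_cipher(text, key):
    """Mono-alphabetic shift of the letters A-Z; other characters are dropped."""
    return "".join(chr((ord(c) - 65 + key) % 26 + 65)
                   for c in text.upper() if "A" <= c <= "Z")


sample = "the quick brown fox jumps over the lazy dog and then rests"
plain_ic = index_of_coincidence(sample)
cipher_ic = index_of_coincidence(shift_cipher(sample, 3))
print(plain_ic, cipher_ic)
```

Because the statistic depends only on how often letters repeat, not on which letters they are, it survives the substitution and can be measured on the ciphertext.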

Clearly, the number 1 stands out, with the number 2 as another outlier. In addition to single letters and numbers, other patterns emerge within natural language usage. For example, the most frequent English digraphs according to their frequency per 200 letters are: TH-50, ER-40, ON-39, AN-38, RE-36, HE-33, IN-31, ED-30, ND-30, HA-26, AT-25, EN-25, ES-25, OF-25, OR-25, NT-24, EA-22, TI-22, TO-22, IT-20, ST-20, IO-18, LE-18, IS-17, OU-17, AR-16, AS-16, DE-16, RT-16, and VE-16. The most frequent English trigraphs according to their frequency per 200 letters are: THE-89, AND-54, THA-47, ENT-39, ION-36, TIO-33, FOR-33, NDE-31, HAS-28, ACE-27, EDT-27, TIS-25, OFT-23, STH-21, and MEN-20 (Friedman & Callimahos, 1985). Distribution patterns also emerge at word boundaries in natural language. For example, the frequencies of initial and final letters in written English are as follows:

Table 2.8: Frequencies of English Letters at Word Boundaries

Letter   A   B   C   D   E   F   G   H   I   J   K   L   M
Initial  9   6   6   5   2   4   2   3   3   1   1   2   4
Final    1   -   -   10  17  6   4   2   -   -   1   6   1

Letter   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Initial  2   10  2   -   4   5   17  2   -   7   -   3   -
Final    9   4   1   -   8   9   11  1   -   1   -   8   -

(Source: Friedman & Callimahos, 1985)

If the attacker can determine the word boundaries, these characteristics will leak information about the text. The English language is almost 75% redundant (Stinson, 2002, p. 64), and a password made entirely of English words has only about 25% of the entropy of a random sequence of the same length. The predictability of simple English is further illustrated by the fact that there are only 17,000 eight-letter words, of which only 500 are in common use, although there are 208,827,064,576 (26^8) possible combinations of eight letters (Burnett, 2006, p. 30). This lack of randomness is even more
apparent in actual user selection of passwords in use, but good passwords can partially compensate for this lack of randomness by incorporating numbers, non-alphanumeric characters, and unexpected syntax. Another means of entropy enhancement is permutation, which represents the number of combinations possible when order is significant and an object can be chosen more than once. The number of possibilities in such a case is:

$$\frac{(n + r - 1)!}{r!\,(n - 1)!}$$

where n is the total number of objects to choose from and r is the number to be chosen (Knuth, 1997). Thus, on keyboards with 95 available characters, the number of 8-character passwords is (95 + 8 − 1)! / (8! (95 − 1)!) = 102! / (8! 94!) ≈ 2.19 × 10^11 possibilities. This expression counts selections without regard to order; when order is significant, the count is n^r, or 95^8 ≈ 6.63 × 10^15. A completely random pattern of 8 characters drawn from the 95-character printable ASCII set will normally contain over 52 bits of entropy, while a typical eight character password of single case letters and digits contains 41 bits. The entropy within any text is always additive, so within a particular search space, extra entropy requires extra keystrokes, and, all other factors being equal, longer passwords contain more entropy than short ones. Security experts and administrators have long advocated using the largest alphabet available to increase the entropy and difficulty of attack. In the case of an American English keyboard, the practical limit is the 94-character ASCII set, but it is also possible to include Unicode characters with modern authentication mechanisms, typically by typing a 4-digit number while holding the Alt key. Table 2.9 shows the effect of search space on password entropy. The lower case English alphabet has 26 letters and an entropy of 4.7 bits per letter. The 62 possibilities of upper and lower case English letters combined with the 10 numerical digits yield an entropy of 5.95 bits per character, and the addition of the symbols drawn from the keyboard’s 94-character ASCII set yields 6.55 bits per character.
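The counts and entropy figures in the preceding paragraph can be checked with a few lines of arithmetic. This is a sketch under the stated assumptions of n = 95 characters and r = 8 positions; the variable names are introduced here for illustration. It evaluates both the expression (n + r − 1)! / (r! (n − 1)!) used in the worked example and the count n^r of ordered sequences, whose base-2 logarithm gives the entropy of a random password.

```python
import math

n, r = 95, 8   # printable ASCII characters, password length

ordered = n ** r                      # order significant, repetition allowed
unordered = math.comb(n + r - 1, r)   # (n + r - 1)! / (r! (n - 1)!)

print(ordered, round(math.log2(ordered), 1))
print(unordered, round(math.log2(unordered), 1))
print(round(8 * math.log2(36), 1))    # eight single-case letters and digits
```

The entropy of a random 8-character password follows from the ordered count, since two passwords containing the same characters in a different order are different passwords.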

Table 2.9: The Effect of Search Space on Entropy

Search Space                                   N     Entropy
Digits only (0-9)                              10    3.32 bits/symbol
Lower case letters (a-z)                       26    4.70 bits/symbol
Lower case letters and digits (a-z, 0-9)       36    5.17 bits/symbol
Mixed case letters and digits (a-z, A-Z, 0-9)  62    5.95 bits/symbol
All standard U.S. keyboard characters          94    6.55 bits/symbol
Non-ANSI Unicode characters                    700   9.50 bits/symbol

(Source: Diceware, 2006)
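The first five rows of Table 2.9 follow from taking the base-2 logarithm of the search space size N. The sketch below reproduces those values and, as a further illustration, estimates how many random symbols are needed to reach a target entropy. The 80-bit target and the function names are assumptions made for this example, not figures from the sources cited.

```python
import math

SEARCH_SPACES = [
    ("Digits only (0-9)", 10),
    ("Lower case letters (a-z)", 26),
    ("Lower case letters and digits", 36),
    ("Mixed case letters and digits", 62),
    ("All standard U.S. keyboard characters", 94),
]


def bits_per_symbol(alphabet_size):
    """Entropy of one uniformly random symbol, in bits."""
    return math.log2(alphabet_size)


def symbols_needed(target_bits, alphabet_size):
    """Smallest length at which a random string reaches target_bits."""
    return math.ceil(target_bits / bits_per_symbol(alphabet_size))


for name, size in SEARCH_SPACES:
    print(f"{name}: {bits_per_symbol(size):.2f} bits/symbol, "
          f"{symbols_needed(80, size)} symbols for 80 bits")
```

Because entropy is additive with length, a smaller alphabet can reach the same target by adding symbols, which is the arithmetic behind the case for long passphrases.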

Researchers have determined the entropy for the 700 non-ANSI Unicode characters to be 9.5 bits per symbol, and estimate that each subsequent doubling of the search space increases the entropy per symbol by one bit (Neosmart, 2006). Entropy provides a direct measure of password strength whenever natural language is contained in the text. It is not a fixed physical quantity, and the cryptanalyst can exploit cultural biases and contexts, byte frequencies, digraphs, word boundary patterns, and whole-word correlations to reduce the search space, and such intelligent pattern recognition provides a likely future attack vector. Because password entropy is additive with length, this vulnerability is especially significant with short passwords, and this is the reason that Porter initially proposed the long passphrase as a countermeasure in 1981 (Porter, 1981). Pfleeger mathematically demonstrated the difficulty of attacking the one trillion possible permutations of a 30-character passphrase (Pfleeger, 1989). The noted cryptographer and author of the open source cryptosystem PGP, Zimmerman, has strongly advocated the use of long passphrases for remote authentication since 1995 (Zimmerman, 1995). Burnett argues for a minimum length of 20 complex characters (Burnett, 2006, p. 41), and Johansson advocates adding substitutions and misspellings to all passphrases to significantly increase entropy (Johansson, 2006). Clearly, the future effectiveness of the password lies in increased entropy, which is primarily a function of its length.

2.9 The Study of Memory

This study worked from the assumption that the heart of the keyboard-based password problem lies in user behavior, and that security culture influences user behavior. It also assumed that experimental memory research and practical security experience in the field would reach
similar conclusions about strong and usable memorable passwords. This section traces the development of memory research using cognitive psychological methods, culminating in Tulving’s GAPS framework. The cognitive psychological study of memory initially focused on perception, and the uses that the mind makes of perceptions, to conceptualize the higher-order mental processes behind observed behavior. The cognitive approach assumes that “observed patterns of behaviour, together with private subjective experiences, depend on unobservable ‘mental’ events involving mental mechanisms and processes” (Gregg, 1986, p. 2). This cognitive model of memory is the result of a long history of thought on the action of memory and memory models built from self-observation and from observation of memory experimentation. The earliest recorded philosopher of memory, Plato, used the word aletheia for “truth” and for “no forgetting” (Cott, 2005, p. 65), and feared that writing would lead to forgetfulness (lethe). He argued in the dialogue Theaetetus that sensory images leave impressions on the mind just as a signet ring makes an impression on wax. This wax tablet model of memory compares the phenomenon of forgetting to the effacement of a wax surface over time. It allows for the variance in individual memory capacity by allowing for tablets of various sizes and for the persistence of memory by allowing for various consistencies in the wax. Gratoroli, Harris, Locke, Watson, and James adopted the wax tablet model over the centuries, but even Plato recognized that it contains no mechanism for recall other than recognition of similarity to previous impressions. To deal with this recall issue, Plato suggested an alternative “aviary” model, in which the learner captures ideas (birds) in memory (the aviary), and recalls previous memories just as one retrieves captive birds (Gregg, 1986, p. 9).
Aristotle differentiated two aspects of memory: (i) a “likeness” or “image,” and (ii) the emotional resonance that serves as a hook within the network of human experience. On this view, this strong emotion causes strong memories (Cott, 2005, p. 67). Roman orators such as Cicero famously used imaginary visual hooks in the construction of luminous memory palaces to aid in the recall of long speeches. Later thinkers increasingly combined emerging scientific understanding with memory theory. Hartley used Newtonian thermodynamics to attribute memories to the “vibration of medullary particles” (Gregg, 1986, p. 9). Luys suggested that memory retention was analogous to lingering phosphorescence. Mill developed a model based on “mental chemistry” that allowed for the dynamism and synergy observed in human memory.

Feineigle pioneered the “warehouse” model of memory, which “enables its owner to find any article that he may require, with the utmost readiness,” and which forms the basis of later spatial theories, generation/recognition theories, and activation theories (Tulving, 1983, p. 5). Ebbinghaus first applied empirical methodology to the study of human memory. Basing his studies on Fechner’s earlier investigations of perception, and using himself as the subject of memory experiments, he attempted “indirectly approaching the problem … in a small and definitely limited sphere, and by means of keeping aloof for awhile from any theory, perhaps of constructing one” (Ebbinghaus, 1913, p. 65). Applying experimental rigor to isolate variables and statistical analysis to interpret findings, he concluded that previous mental states are recalled in three different ways: (i) voluntarily by an exertion of will, (ii) involuntarily or spontaneously with no act of the will, and (iii) subconsciously. Measuring his own tendency to forget items over the course of time, he argued that, “very great is the dependence of retention and reproduction upon the intensity of the attention and interest which were attached to the mental states the first time they were present” (Ebbinghaus, 1913, p. 2). Although Ebbinghaus was aware of wide individual variance in memory performance, he used only himself as the subject of his experiments. Because he limited his studies to a sample size of one, and because he used the experimental method to decouple memory from other mental functions, the external validity of his results can be called into question (Gregg, 1986, p. 14). Despite these limitations, Ebbinghaus’ experimental method has been widely used by cognitive science in the study of memory ever since, and he demonstrated the effectiveness of making numerical determinations of interdependent causes and effects within controlled experimental conditions.
Because Ebbinghaus used only nonsense syllables in his experiments, Bartlett argues that he avoided the “search for meaning” central to human cognition (Bartlett, 1932, p. 12). Bartlett emphasized remembering as a cultural, schematic process in which schemata, or world-views based on previous experience, provide an interpretive framework for new stimuli, and memories are retained as schematized versions of events used during active recall to reconstruct the event. Because Bartlett’s holistic model of memory allows the interpretation of new material not consistent with these schemata in terms of them, allows for the complexity of memory, and allows for the rememberer’s self-perception, it has enjoyed widespread influence in modern memory theory and experimentation (Gregg, 1986, p. 15).

Memory research has long focused on universally recognized limitations of human memory span across different content. Kirkpatrick (1894) and Calkins (1898) found in early psychological studies that people have significantly better recall of concrete words than abstract words. Miller’s pioneer work isolated the persistent recurrence of the “magical number” of seven items as the limit of short-term retention. Noting that many more than seven individual items (e.g. numbers, letters, words) could be readily remembered if further combined into “chunks,” he maintained that such chunking is often used to overcome the limitation by using up to seven chunks (Miller, 1956). More recent studies have suggested that subjects have a practical limit of eight meaningful items that can be repeated with perfect accuracy (Mandler & Pearlstone, 1966; Mandler, 1967; Simon, 1974; Coombs et al., 1986). Other research showed that taxonomical relationships (Postman, 1972) or hierarchical organization (Bower, 1970) could extend Miller’s seemingly universal limit, and that the limit applied more to the retrieval than to the storage of items (Tulving & Pearlstone, 1966). Chunking also demonstrates that human memory is more complex than the purely mathematical series of linked responses that simple behaviorism posits (Roediger & Goff, 1998). The chunking of multiple characters into meaningful aggregates circumvents the limited capacity of short-term memory and has been used for the generation of long passwords. Gasser studied the use of phonemic (i.e. pronounceable) passwords in an early Multics computer system, and his technique involved pronounceable phonemes as chunks that he found subjects could more easily remember (Gasser, 1975). In addition to the above-mentioned general limitations on human memory, researchers have discovered significant individual differences in long-term memory. 
First, people vary greatly in general knowledge and learning (Gagne, 1967; Cronbach & Snow, 1977; Karis et al., 1984; Kyllonen et al., 1991). Second, there is marked variation in the way individuals organize retained knowledge in memory (Loftus & Loftus, 1974; Underwood et al., 1978; Coltheart & Evans, 1981; Geiselman et al., 1982; Brown et al., 1991; Roznowski, 1993). Third, intellectual ability, especially verbal ability, corresponds directly with retrieval speed of well-known information from long-term memory (Perruche & Baveux, 1989; Reber et al., 1991; Woltz & Schute, 1993). Much laboratory memory study focuses on the subject’s ability to recall items from arbitrary lists of low perceived value to the subject, and this is the cause of low recall rates. Lax security culture often causes users to consider passwords to be of low value as well.

Because of the threat of social engineering attacks, passwords based on memories of distant private experience are stronger than those based on general knowledge. Although recent research has isolated many categories of memory, Tulving distinguished two primary types of memory being studied by cognitive psychologists (Tulving, 1972). Episodic memory is “concerned with unique, concrete, personal experiences dated in the rememberer’s past” (Tulving, 1983, p. v). In contrast, semantic memory “refers to a person’s abstract, timeless knowledge of the world that he shares with others” (Tulving, 1983, p. v), that is necessary for the use of language. Episodic memories are much more specific to the experience of the individual, while semantic memories are of a more general and abstract nature. Because memory is enhanced by repetition of recoding and recall, subjects remember frequently used semantic memory items better than rarely recollected episodic memories. Tulving argues that episodic memories necessarily preserve knowledge of an event’s spatio-temporal context, while semantic memories preserve only “factual and conceptual knowledge abstracted from the contexts in which that knowledge had been acquired” (Conway, 1996, p. 166). Because the details of episodic memories are unique, concrete experiences from the user’s past, they are better suited for use at the core of secret passwords than the general knowledge of semantic memories. Later memory research both supports and calls into question Tulving’s initial semantic- episodic distinction. Although Jacoby and Dallas found evidence supporting it (Jacoby & Dallas, 1981), Conway, and Anderson and Ross found evidence to the contrary (Conway, 1996; Anderson & Ross, 1980). 
For the purpose of this study, however, a means to generate passwords of personal nature is desirable, and a clear-cut distinction between types of memory conceptualized by researchers is less important than the overall process of remembering, as applied to the formulation and recall of a unique password. Later memory models incorporated information processing analogies and the way that computers “accept information from the outside world, store it, manipulate it, and respond to it on the basis of information acquired previously” (Gregg, 1986, p. 11). The information-processing model of memory distinguishes three forms of memory storage:
1. Buffers of inputs from the five senses,
2. Short-term storage not retained after use, and
3. Long-term storage of short-term items that have undergone either rehearsal stages or elaborative processing by which new information is connected to the organized mass of long-term memories.

This model has become so widely accepted that cognitive psychology itself “is often taken to mean the modeling of mental processes based on such devices rather than in the sense of a general concern with mental events” (Gregg, 1986, p. 10). The information-processing model is especially productive because it readily explains how extremely complex hardware (viz. the brain) can be controlled by similarly complex, yet flexible and adaptable software (viz. language and culture). As physiological explanations of cognitive activity became prominent through study of the structure of the brain, “the coming of the computer provided a much-needed reassurance that cognitive processes were real: that they could be studied and perhaps understood” (Neisser, 1976, pp. 5-6). Because the processes and components of the computer are deterministic, orderly, and understandable, the information-processing model offered a disciplined approach to mental causality. The determinism of the computer also underscored a significant distinction between humans and machines, however, since it provides the computer with the capacity to do, far better than the human, the two essential tasks involved in authentication: “the ability to store a high-quality cryptographic key, and the ability to perform cryptographic operations” upon such a key (Kaufman et al., 2002, p. 237). In response to this emphasis on process, Tulving notes that, although “the total process of remembering an event, from its original perception to recollective experience, or its expression in behavior, necessarily is extended in time,” its component processes (viz. Encoding, Recoding, Ecphory) are best understood as “instantaneous changes in state rather than activities enduring in time” (Tulving, 1983, p. 139).
Thus, the “processes” in the second column of the GAPS framework (see Figure 1.1) are better understood as events. As an example, he suggests that the “act of encoding is like the act of reaching a destination: at one moment in time the event is not encoded, at another it is, denoting a change in state” (Tulving, 1983, p. 139). This study investigated means to meet the password challenge by emphasizing and reiterating the two events of password formulation (Encoding) and recall (Ecphory). Memory testing by cognitive psychologists typically follows Ebbinghaus’ experimental method of supplying a list of items to be remembered (TBR), then giving a cue to invoke recall, and tests are classified by the type of cue given to the subject. Cues in recall tests may be general
requests to recall TBR items from the list, or more specifically, invoking rhyme or some type of categorical relationship. Cues in recognition tests include actual TBR items, and the subject need only to decide if an item had been included in a previous presentation context. Subjects in recognition tests score much higher, and recognition is considered easier than cued recall. Cognometric authentication schemes seek to exploit this relative ease of recognition, typically using visual TBR items, to maximize reliability while minimizing cognitive load. The recall of the password as a response to the login prompt of the authentication server can be viewed as a special case of the memory recall experiment pioneered by Ebbinghaus. This study investigates the effect of making password generation and recall a more semantic, holistic experience for the user, as in Bartlett’s model, and reliant upon personally meaningful episodic events in the user’s life as in Tulving’s model. The core of the password problem is defined in large part by user inability to remember and accurately input strong or infrequently used passwords, when facing the login prompt. The relative difficulty of cued recall and the phenomenon of failure to remember have driven much memory research, and it was frequently observed that subjects could later recall items, either unaided or with a cue, that they failed to recall in early tests. Tulving distinguished between memory failure caused by the unavailability of knowledge in memory, and that caused by its inaccessibility (Tulving, 1968, 1974; Tulving & Pearlstone, 1966). Tulving and Osler developed the “encoding specificity hypothesis” to explain the benefit of attaching cue information to the TBR word at the time of encoding (Tulving & Osler, 1968), but also found detrimental effects of supplying retrieval cues. 
Other studies demonstrated the negative impact on recall caused by supplying “intra-list cues” drawn from the TBR items in the list (Brown, 1968; Hudson & Austin, 1970; Rundus, 1973; Mueller & Watkins, 1977) and “extra-list cues” (Watkins, 1975). Shriffin attributed this effect partially to “output interference” (Shriffin, 1970), while Watkins has suggested “cue-overloading” (Watkins, 1979). Recognition tests, because they supply subjects with actual TBR items, can be considered to provide the most specific retrieval cues possible. Because of the identity of these supplied cues, the contrast of recognition tests vis-à-vis recall tests has driven much of recent memory theory. Kintsch’s “two-stage” model posits that recall consists of (i) a process of generating words from memory using tagged semantic relationships, followed by (ii) a process of recognition of the familiarity of a word’s memory trace (Kintsch, 1970, 1974). Anderson and Bower developed a “contextual” generate-recognize model of recall in which Kintsch’s “familiarity” in the recognition stage is replaced by “contextual elements,” and this accounts for some anomalies caused by the experimental context itself (Anderson & Bower, 1973). Tulving proposed the “episodic ecphory” theory as an alternative to the generate-recognize model. He rejected both the independent stages and the tagging of the semantic memory system of Kintsch, Anderson, and Bower. Instead, he argued that episodic memory alone is the source of both recall and recognition phenomena (Tulving, 1976). In this view, semantic memory is not tagged with episodic information, but episodic memory is tagged with semantic information. Ecphory is central to Tulving’s GAPS framework. He defines it as “the (hypothetical) event-process that converts the relevant information in the retrieval environment and the (Original or Recoded) Engram into Ecphoric Information” (Tulving, 1983, p. 175). He borrows the term (as well as the term engram) from Semon, who used it to describe the activation of a latent engram into an active state in the subject’s consciousness. Ecphory in the GAPS framework results from the matching of the temporal and contextual information implicit in a cue with knowledge in the episodic trace. The result of this interaction is Ecphoric Information, which passes to Recollective Experience, which in turn is converted to observable behavior.
Although Tulving argued early on that recall and recognition differ only in the efficiency of the retrieval cues given to subjects during the test (Tulving, 1976), and although he seemingly conflated recall and recognition, his mature “synergistic ecphory” model of retrieval allows for a distinction, particularly in regard to the minimum amount of ecphoric information needed to overcome the “conversion threshold.” There is evidence that humans use different types of memory to remember different stimuli such as faces, melodies, landmarks, lyrics, etc. Burnett found that the most memorable passwords used combinations of these: the words have meaning to us, the characters form some pattern, the password creates a sound in the head, and even the typing of the password forms a kinesthetic pattern (Burnett, 2006, p. 77). This study explored the effects of reiteration and password-generation schemes that combine a holistic series of semantically significant events with the chunking effects of natural language and paradigmatic examples to overcome robust password memory failure.


2.10 User-centered Studies of Security and Passwords

This study sought effective means for users to satisfy computer security needs with passwords that are strong, meaningful, relatively easy to input, and resistant to forgetting. To this end, a methodology to explore the effects of reiteration and various password-generation schemes was needed. This section surveys previous studies that specifically tested the usability of passwords and memometric authentication systems. The study of the usability of access control mechanisms is a prominent sub-field of HCISec, and it is part of an overall trend among engineers and software developers to make IT more effective. ISO 9241-11 establishes usability as the “extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (ISO 9241-11, 1998), and ISO/IEC 9126-1 defines usability more narrowly within the field of software engineering (ISO/IEC 9126-1, 2001). Usability studies have consistently found a conflict between the usability and the security of computing systems (Adams et al., 1997; Schultz et al., 2001; Smetters & Grinter; Dourish & Redmiles, 2002), and HCISec research seeks to reconcile the two. Whitten and Tygar have suggested that security software is usable if the people who are expected to use it:
1. Are reliably made aware of the security tasks they need to perform
2. Are able to figure out how to successfully perform those tasks
3. Don’t make dangerous errors
4. Are sufficiently comfortable with the interface to continue using it (Whitten & Tygar, 1999, p. 173).

This definition includes the education, cognition, practice, and esthetic sensibilities of users as they participate in the overall security process. It addresses the issues of effectiveness, efficiency, and satisfaction delineated in ISO 9241-11 with respect to security applications. Studies have found many vulnerability-causing work practices and indifference to security that need to be addressed (NIST, 1985; DeAlavare, 1990; Davis & Price, 1987; Ford, 1994; Gordon, 1995; Besnard & Arief, 2004). When computer users are ignorant of their important role in security, they create vulnerabilities for all who interact with them (Weirich & Sasse, 2002). Although Whitten and Tygar’s requirements are general, they are also directly applicable to access control, and since access control is a necessary function in the networked computing environment, users of memometric systems need to know the consequences and dangers of weak passwords. The first and fourth requirements are particularly applicable to password security. This study seeks ways to contribute to the usability of security by making passwords that users can generate, recall and input with a minimum of cognitive load, input error, and recall failure. Whitten and Tygar further argue that the HCISec challenge for effective interface design has five properties:
1. The unmotivated user property,
2. The abstraction property,
3. The lack of feedback property,
4. The barn door property, and
5. The weakest link property (Whitten & Tygar, 1999, p. 174).

In the specific case of memometric authentication, the unmotivated user property causes users to bypass good password practice because they see access control as a nuisance. The abstraction property imposes access restriction on users in ways they cannot understand or appreciate, and this results in further loss of motivation and cooperation. The lack of feedback property perpetuates the lack of motivation and misunderstanding of the security situation, particularly in organizational settings, but increasingly among all internet users. The barn door property punishes users who fail to implement good security from the start, since once a breach occurs, data loss is irreversible and drastic measures must be taken. The weakest link property points to the unfortunate truth that attackers and social hackers (e.g. Mitnick) avoid the robust sections of the security façade and focus their attacks on the vulnerabilities (e.g. weak passwords) that users make available. This property is the most salient to those researchers in the HCISec community who seek to engage users more closely within the security process, to overcome condescension among security staff, and to look for ways to overcome the lax work practices and low security motivation that result in weak passwords. Adams and Sasse argue that “users must be shown, proactively, how to construct memorable passwords that do not circumvent security mechanisms” (Adams & Sasse, 1999, p. 46). They contend that social hackers and attackers concentrate more on the human link in the security chain than security designers do. Based on their work with users, they further suggest that “providing feedback during the password construction process not only assists users in the construction of secure passwords, it also is an example of security in action and increases users’ awareness of system security and its importance” (Adams & Sasse, 1999, p. 46). Using an exploratory, grounded theory approach and a web-based questionnaire with 139 respondents in the workforce, they identified four major factors influencing effective password usage:
1. Multiple passwords;
2. Password content;
3. Perceived compatibility with work practices; and
4. Users’ perception of organizational security and information sensitivity (Adams & Sasse, 1999, p. 41).

Multiple passwords further exacerbate the password problem for users, many of whom can produce one or two adequately robust passwords, but struggle to come up with other unique ones for other systems, and Adams and Sasse suggest the technological solutions of single sign-on (SSO) or “Smart Cards” as possible ways to alleviate this memory problem (Adams & Sasse, 1999, p. 46). Working on the problem of multiple passwords from a stimulus-response context (Proctor & Vu, 2006), Vu suggests writing a sentence that encapsulates the password, and using the context of that sentence to provide cues for the recall of the password, but finds that “people remember the gist of the sentence [but], without the specific cues, the password cannot be remembered” (Ericson, 2006). Vu also suggests that the key to effective password usage is “less a matter of not forgetting and more a matter of training yourself to remember” (Ericson, 2006). The multiple password problem is also compounded by password aging, a common policy mandating password changes on a periodic basis with the hope of staying ahead of the attackers. Evidence of user misunderstanding of the content of passwords emerges in all studies, but Adams and Sasse focus on the organizational problems caused by a lack of communication and mutual understanding between the users, whose primary concern is their individual or team work in the organization, and the IT staff, whose concern is the overall security of the system, data, and resource availability. They identify two central problems in password usage: “system factors, which users perceive they are forced to circumvent, and external factors, which are perceived as incompatible with working procedures” (Adams & Sasse, 1999, p. 43). Zviran and Haga performed a comparative empirical evaluation of the memorability and subjective preference of various password mechanisms on 103 graduate students.
Although they were interested in “secondary” passwords in multi-level access control, they found that “cognitive” and “associative” passwords showed promise in terms of memorability and subjective preference among users (Zviran & Haga, 1993, p. 227). The authors define cognitive passwords as a non-trivial series of five questions and answers, drawn from a pre-established set of twenty passwords, based on an individual user’s perceptions, personal interests, and personal history. These “cultural passwords” use the cued recall of episodic memory to lessen the cognitive load on users (Podd et al., 1996; Renaud, 2005, p. 111). There are several problems with this type of cognitive password. First, the nature of the secret question limits the range of possible answers, thus making contemporary brute force attacks more feasible. Second, the answers are necessarily brittle, so the user must remember and input the exact formulation of twenty different passwords, some of which may be used infrequently. Third, attempts to make the mechanism more flexible (viz. by allowing one or two wrong answers) increase vulnerability to social engineering attacks. Fourth, the answers are usually fixed by the history of the user and draw on a limited character set, and, if that history is researchable, they are vulnerable to social engineering attacks by persons with knowledge of the user (especially disgruntled personal associates). Zviran and Haga’s notion of associative passwords is adapted from Smith’s earlier work on authenticating users by word association (Smith, 1987). Similar to their own conception of the cognitive password, it has the user construct a list of twenty non-trivial single-word cues and corresponding one-word responses. Thus, in spite of the deliberate use of non-trivial questions, the associative password is also vulnerable to social engineering attacks. Despite these shortcomings, Zviran and Haga concur with Smith and recommend associative and cognitive passwords as best suited for “secondary” (i.e. part of a two-factor scheme) authentication on the basis of ease of use and recall (Zviran & Haga, 1993, p. 235).
Zviran and Haga found that only 21.4% of respondents were able to recall their passphrases, which averaged 23 characters, compared to 68% of respondents who correctly recalled between 12 and 18 cognitive items, and 58.3% of respondents who correctly recalled 14 or more of the 20 associative items (Zviran & Haga, 1993, p. 233). These findings are problematic for two reasons. First, any good 23-character passphrase is significantly stronger than the partial recall of items drawn from a much more limited range of possible answers. Second, Zviran and Haga may have skewed the result by providing more coaching for the cognitive and associative schemes than they did for passphrase generation. Zviran and Haga’s comparative investigation is methodologically similar to this study. Although they concluded that passphrases were less suited for secondary passwords than cognitive or associative passwords, this study explores the feasibility of primary passwords, and argues that passphrases remain viable candidates for the task. Despite this, their methodology is useful in that it divides subjects into testing groups, supplies the groups with different password-generation protocols, and compares the recall of the various alternative password techniques. Yan et al. performed a controlled trial of the effects of giving users different password-generation advice, and in this way, their study is also methodologically similar to this one. Building on Adams and Sasse’s recommendation that organizations “provide instruction and training on how to construct usable and secure passwords” (Adams & Sasse, 1999, p. 45), Sasse et al.’s findings that 90% of users in a survey were frustrated with password mechanisms and welcomed assistance in the process (Sasse et al., 2001, p. 129), and noticing a general tendency to emphasize strength over memorability among organizations, they investigated the trade-off between security and memorability. The subjects included 288 students at the University of Cambridge’s School of Natural Sciences randomly assigned to three groups. The instructions to the control group were “your password should be at least seven characters long and contain at least one non-letter.” The instructions to the random password group were to blindly pick eight characters from a sheet with all the letters and numbers, then keep a written record until they memorized the password. The instructions to the passphrase group were to formulate a password based on a mnemonic phrase. Yan et al. then waited one month and subjected all passwords to four attacks: dictionary, permutation of words and numbers, user information, and brute force. The results of the attacks are listed in Table 2.10.

Table 2.10: Results of Password Attacks, by Test Group

Group        Cracked in First Three Attacks    Cracked with Brute Force
Control      30 (32%)                          3
Random        8 (8%)                           3
Passphrase    6 (6%)                           3
Comparison   33 (33%)                          2

(Source: Yan et al., 2004, p. 29)

Both the random and passphrase groups produced stronger passwords than the control group, and very few subjects required a password reset. As expected by the authors, random passwords proved stronger than naïvely selected passwords. Significantly, the passphrase group produced passwords roughly as strong as the random group.


Four months after the beginning of the experiment, the authors sent out an email survey asking two questions: 1. How hard did you find it to memorize your password, on a scale from 1 (trivial) to 5 (impossible)? 2. For how long did you have to carry around a written copy of the password to refer to? Please estimate the length of time in weeks.

The results of this survey are listed in Table 2.11.

Table 2.11: Responses to Email Survey

Group        Responses    Difficulty Level (1-5)    Weeks
Control      80           1.52                      0.7
Random       71           3.15                      4.8
Passphrase   78           1.67                      0.6

(Source: Yan et al., 2004, p. 29)

These results confirmed the authors’ initial expectations that random passwords are more difficult to remember, and, significant to the present study, showed that passphrases were no more difficult to remember than weak passwords. They also found a non-compliance rate of 10% among the random and passphrase groups. Because of the weak passwords resulting from this non-compliance, they recognized the challenge of developing compliance enforcement mechanisms effective in mnemonic passphrase generation. They also left the issue of multiple passwords to future study.

2.11 Methodology from Previous Studies

This study drew upon four previous password studies for its research agenda. Zviran and Haga (1993), Brostoff & Sasse (1999), Wiedenbeck et al. (2004), and Yan et al. (2004) investigated the strength, usability, and memorability of different password schemes in various ways, and their methods served as a foundation for this study. The specifics of their methodology, in terms of instruments used, protocols followed, and analyses performed, are included in this section. Zviran and Haga (1993) used two similar questionnaires administered to 103 graduate students. Subjects were 85% male, 15% female, and the average age was 33, with a 5-year average experience with computers. They administered the questionnaires three months apart to test the recall of five types of passwords: an 8-character system-generated random password, an 8-character system-generated “pronounceable” password, self-generated passphrases, “cognitive” passwords, and “associative” passwords. After collecting data, the authors performed the Cramer’s V test on the null hypothesis (that there is no difference in recall success across the various groups), a Chi-squared test to check for statistical significance, and a Spearman’s rank order correlation to compare “the rankings of respondents’ opinions about ease of recall and liking of different types of passwords with their actual success at recalling the different types” (Zviran & Haga, 1993, p. 234). Brostoff & Sasse (1999) tested 386 undergraduate computer science students by examining system logs of student accounts on web-based courseware during a 10-week UK semester. This log analysis indicated the success of login attempts and the use of a special “password reminder” facility. Wiedenbeck et al. (2004) tested graphical passwords on forty experienced computer users in three sessions. The enrollment phase lasted 35 minutes, during which the researchers explained the procedure, the system enforced minimum requirements, and the password was displayed as feedback. During the learning phase, participants entered the password repeatedly until they achieved ten correct password inputs, with binary feedback confirming the correctness of each input. If subjects had difficulty, a “Show my password” feedback function was available. At the end of the session, subjects completed an online questionnaire. Researchers measured retention three times: at the end of the first session, one week later, and four weeks later, with only one correct entry required to pass the test. Upon four failures, the “Show my password” function was enabled. Yan et al. (2004) tested the effects of different advice given to assist users in password generation using 288 subjects randomly assigned to three groups.
They subjected all passwords to four types of attack: dictionary, permutation of words and numbers, user information from password files, and brute force (although brute force attacks were limited to less than 6 characters). They tested validity by launching the same attacks on computer accounts of 100 students not involved in the study at all. Finally, four months after beginning the experiment, they tested the usability of the passwords by means of a survey. This study utilized methodology from all of these previous studies, as discussed in Chapter 3.
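The recall comparison that Zviran and Haga describe rests on a chi-squared statistic and Cramér’s V computed over a contingency table of password type against recall outcome. A minimal Python sketch of that arithmetic follows. The function names and the counts in the example table are hypothetical placeholders chosen for illustration; they are not data or code from Zviran and Haga (1993).

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V: chi-squared rescaled to the range 0 (none) to 1 (complete)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_squared(table) / (n * k))


# Hypothetical counts: rows are two password types,
# columns are (recalled, not recalled). Invented for illustration only.
example = [[30, 70],
           [60, 40]]
print(f"chi-squared = {chi_squared(example):.2f}")
print(f"Cramer's V  = {cramers_v(example):.2f}")
```

A value of V near 0 would indicate that recall success is unrelated to password type, while a value near 1 would indicate a strong association; judging statistical significance additionally requires comparing the chi-squared statistic against its reference distribution, which this sketch leaves out.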


2.12 Selected Password-Generation Schemes

This study sought effective and usable password generation schemes, and this section surveys the field to identify the most promising candidate schemes in use today. Because many of the details of network security are trade secrets, this survey cannot be comprehensive, but serves to isolate representative examples of the types of schemes currently in use. To this end, this section lists five password-generation schemes that have come from a variety of sources including developers, system administrators, security experts, and software vendors. U.S. federal regulation supplies a useful baseline for password-generation. The 1985 Federal Information Processing Standards (FIPS) publication 112 delineated four levels of password security. The highest-level passwords require six to eight characters compiled by an automated password generator from the entire 94-character ASCII set available on the American English keyboard (NIST, 1985). The result is a pseudo-random password of relatively high entropy (e.g. “%*4BY#w3”) that users often find esthetically unpleasant, difficult to remember and type, and that has become vulnerable to modern pre-computing attacks. When security policy imposes such short, cryptic, and difficult to type passwords from above, users learn them through repeated recall and input. Because of the vulnerability of these relatively short passwords to pre-computing attacks, advocates of machine-generated passwords have begun to recommend longer versions. Although computers can easily meet that challenge, long, cryptic passwords imposed from above typically meet with user resistance and result in high authentication failure rates. To meet the need for longer passwords, the following schemes take a more user-centered approach. The example passwords that follow are in the range of 20+ characters.
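The strength of the machine-generated passwords described above can be estimated with a short calculation. The sketch below assumes that each character is drawn uniformly and independently from the 94-character set, which is an idealization of any real generator; the function name is illustrative only.

```python
import math


def random_password_bits(length: int, alphabet_size: int = 94) -> float:
    """Entropy in bits of a password whose characters are chosen uniformly
    and independently from an alphabet of the given size (an assumption)."""
    return length * math.log2(alphabet_size)


# Compare the six-to-eight character range with a 20-character password.
for n in (6, 8, 20):
    print(f"{n:2d} characters: {random_password_bits(n):6.1f} bits")
```

Under this assumption an eight-character password carries roughly 52 bits, and each additional random character adds about 6.5 bits, which is why length matters so much for this class of password.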

2.12.1 Password Generation Scheme 1: The Old Address. A commonly used, but seemingly apocryphal, password-generation technique is to simply spell out an old, unforgettable address, such as “819 Ash St., Keokuk, Iowa.” This example contains twenty-six characters, including numbers, uppercase letters, and symbols, in a formulation that makes common sense and that places little additional cognitive load on the former resident.

2.12.2 Password Generation Scheme 2: Unexpected Nonsense. Contending that “passphrases are certainly the best way of storing a secret in a human brain,” security experts Ferguson and Schneier recommend unexpectedly nonsensical passphrases (Ferguson & Schneier, 2003, p. 350). For example, their nonsense passphrase, “Pink curtains meander across the ocean” is not difficult to recall and input, and its imagery makes it easy to remember. Nevertheless, it contains 38 characters and approximately 57-76 bits of entropy. If expanded slightly to “Pink dotty curtains meander over seas of Xmas wishes,” the result is a phrase of 52 characters with 78-104 bits of entropy (Ferguson & Schneier, 2003, p. 349).
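The entropy range quoted for the 38-character phrase corresponds to a per-character estimate of 1.5 to 2 bits for English text (57/38 and 76/38). A minimal sketch of that arithmetic follows; the function name and the default rates are assumptions inferred from the figures above, and the result is a rough estimate, not a measurement.

```python
def passphrase_entropy_range(phrase: str, low: float = 1.5, high: float = 2.0):
    """Return (length, low estimate, high estimate) in bits, assuming a
    fixed entropy rate per character for grammatical English."""
    n = len(phrase)
    return n, n * low, n * high


length, low_bits, high_bits = passphrase_entropy_range(
    "Pink curtains meander across the ocean"
)
print(f"{length} characters, about {low_bits:.0f}-{high_bits:.0f} bits")
```

Because the estimate scales linearly with length, adding a few unexpected words is a cheap way for a user to raise the strength of a passphrase.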

2.12.3 Password Generation Scheme 3: The Acrostic. The above examples contain relatively low entropy per character because of their reliance on grammatical English. One scheme to increase the entropy per character uses the acrostic, which draws only one letter (usually the first) of each word of an easily remembered phrase. Thus, “Wtnitmtstsaaoof,ottaaasot,aboet.” is a 32-character string, regenerated on the fly, by a user who remembers Hamlet’s line, “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them.” The sequence of characters in this string is much more random than sequences of letters appearing in natural English (Bishop, 2003, p. 318; Ferguson & Schneier, 2003, p. 349). Spafford, however, warns of the potential danger of “using the first letters of a favorite phrase … [because] many of the users may use the exact algorithm, thus making an attack easier” (Spafford, 1992, p. 3).
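The acrostic construction can be sketched in a few lines of Python. The rule used here (keep the first letter of each word plus any punctuation that trails the word) is an assumption that happens to reproduce the Hamlet example; the function and constant names are illustrative.

```python
PUNCTUATION = ",.;:!?"


def acrostic(phrase: str) -> str:
    """Build an acrostic password: the first letter of each word,
    followed by any punctuation attached to the end of that word."""
    pieces = []
    for word in phrase.split():
        # Skip leading marks such as the apostrophe in 'tis.
        first = next((ch for ch in word if ch.isalpha()), "")
        stripped = word.rstrip(PUNCTUATION)
        trailing = word[len(stripped):]
        pieces.append(first + trailing)
    return "".join(pieces)


HAMLET = ("Whether 'tis nobler in the mind to suffer the slings and arrows "
          "of outrageous fortune, or to take arms against a sea of troubles, "
          "and by opposing end them.")

password = acrostic(HAMLET)
print(password, len(password))
```

As Spafford’s warning suggests, the weakness of this approach lies in the choice of a well-known source phrase, not in the transformation itself.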

2.12.4 Password Generation Scheme 4: The Old Memory. Burnett advocates building passwords on personal experience, not simply choosing something arbitrary. Rejecting commonly used secret questions, which are similar to Zviran and Haga’s cognitive passwords, because their answers are usually fixed and researchable, he suggests others such as:
• What is the first and last name of your first boyfriend or girlfriend?
• Which phone number do you remember most from your childhood?
• What was your favorite place to visit as a child?
• Who was your favorite actor, musician, or artist when you were 16? (Burnett, 2006, p. 91).
Answers to these types of questions draw on remote, yet unforgettable episodic memories that are resistant to cursory research. “My favorite place was Lake Geode,” for example, is a 32-character answer, in the form of a sentence, to the third question.


2.12.5 Password Generation Scheme 5: The Confession. One of the weaknesses of passwords is that users often share them with others to provide temporary access, either for convenience or emergency. This practice is highly undesirable from a security perspective, and can be discouraged by the use of embarrassing or confessional passwords. Examples include: “I pick my nose at stoplights,” (Burnett, 2006, p. 93) and “The PHB is fat and stupid!”

Developers, administrators, and security experts have suggested the above five schemes for users to generate long passwords. As discussed in Chapter 3, treatment instruments in this study presented these schemes to participants to assist in the generation of study passwords, suggested techniques to bolster entropy, and offered various quantities of practical example passwords to subject groups.

2.13 Chapter Summary

This chapter surveyed extant literature in the field of computer authentication and password research, traced the history of memometric authentication in networked computing, and briefly summarized the history of cognitive memory research. Just as networked IT systems have added access control, passwords, cryptographic techniques, etc. in an ad hoc manner in response to unforeseen threats, HCISec increasingly investigates the adequacy of security culture and user training for countering emerging security threats. The majority of previous password research focused overwhelmingly on password strength. Although many in the past suggested increasing password strength through complexity of search space, the cryptanalysis section in Chapter 4 of this study argues that the primary source of password strength is length, not search space. Early security system designers considered users incapable of or resistant to long passwords, but research has shown that chunking and entropy enhancing techniques greatly assist users of long passwords. The investigator assumed that users are capable of using long textual passwords, and sought the best methods to assist them in such use. Previous research has suggested that users are insufficiently engaged in the security process and pointed to the negative consequences for password practice. This study used findings from cognitive psychology to inform the proactive engagement of users in the password security process in several ways. As Yan et al. discovered, subjects using passphrases generated passwords with equal or greater strength than those using “random,” system-generated formulations. Such strong passphrases generally include chunking to minimize cognitive load by limiting the number of TBR items to Miller’s “magic” seven or fewer. Some password-generation schemes use personal, unforgettable memories, as Bartlett suggested, to generate holistic, cultural schemata as the core of the central secret of the password. It was expected that reiteration and salient examples during the generation stage could stimulate the attention and interest in the TBR items that Ebbinghaus found essential for recall success. Tulving’s GAPS framework, although a general model of cue and response memory testing, shows promise in the refinement of password theory and practice. This study tested the applicability of the GAPS model on password use, and explored ways to refine it for that purpose. This review has shown the positive effects of chunking on cue and response that memory research has discovered. The cue and response interaction used to test memory in cognitive experiments is similar to the prompt and authenticate interaction of authentication mechanisms. Tulving proposed the GAPS framework based on thousands of memory experiments, but no HCISec research has used it to inform the inquiry into password research. The GAPS model is readily extensible, and the iterative events of recoding are tested as corresponding events in password generation. This study sought to engage GAPS as an exploratory model for password research.

2.14 Conclusions

This study tested the effects of various password-generation schemes, password examples, and mandatory reentry on user password generation, recall, and input, and this chapter traced the history of relevant memory research and the development of memory theory by cognitive psychologists. The prompt and input interaction used by most memometric authentication mechanisms closely resembles the interaction of uncued response in memory research. The findings from such memory research contributed to the conceptual framework used in this study, which sought to contribute to theoretical knowledge and methodology involving password research as well as practical application of password research to security culture in the field. This chapter included a review of similar user-centered studies measuring password strength, usability, and memorability, and relevant methodologies used.


Although Porter first proposed long passphrases in 1981, and Zimmermann strongly advocated them for the privacy conscious users of PGP, many in the security community do not currently promote their use among users. This practice perpetuates the vulnerability of the weakest link, and represents a failure to engage users more in the security process. This study hoped to inform users, security professionals, system administrators, and all other information security stakeholders that the use of robust 20-character passphrases is well within the capabilities of users. This literature review has also revealed the ill effects of dysfunctional security culture on memometric authentication. When users are disengaged, they use weak passwords, write passwords in insecure ways, reuse them in unauthorized ways, and become frustrated by denied access. This study tested three means of engaging users in the password generation process: (i) providing a choice of practical password generation schemes, (ii) providing multiple examples of robust passwords, and (iii) requiring reiteration of the password upon enrollment to the system. Such exclusive practical usability testing of user engagement with robust password usage was conspicuously absent in the literature. Finally, although practical password culture has in many ways confirmed the results of the cryptanalysis, chunking, and iteration discovered in this literature review, and security professionals and researchers in the field have suggested useful schemes to assist users in password generation, no systematic evaluation or comparison of these schemes has been done. This study sought to help users by making such schemes available, refining them when possible, and evaluating their effectiveness in robust password usage.


CHAPTER 3 METHODOLOGY

3.1 Introduction

This study sought to contribute to the usability of computer security, specifically in the area of memometric authentication interaction. Growing security threats increasingly demand longer passwords from users, many of whom are ill prepared to manage them. In response, users, administrators, security experts, and researchers have proposed a variety of means to assist users in the generation of passwords that meet the seemingly contrary requirements of security and usability. This study sought out the most promising of these, compared their capabilities, contrasted their potential weaknesses, measured their effects on actual authentication interaction with participants, and evaluated them in terms of usability, resultant password security and usability, and user preference. This study used four groups of participants who were free to use any of five suggested password-generation schemes, or to use one of their own choosing, to test (i) the effectiveness of password-generation schemes, (ii) the effects of providing practical examples of passwords generated using these schemes, and (iii) the effects of requiring participants to reenter passwords five times immediately after generating them. To measure these three variables, this study used five primary research methods: cryptanalysis, expert testing, log analysis, think-aloud user testing, and surveys. Table 3.1 outlines the specific steps in the research design of this study. The majority of these steps were taken sequentially, but quantitative log data collection and analysis were continuous during the course of the study. Participants were required to log in once per week for the seven-week duration of the study. The initial and final surveys and all password enrollments and logins were administered online.


Table 3.1: Research Design

Step  Task                                                                    Schedule
1.    perform a review of the literature (Chapter 2)                          8/06
2.    identify five key password-generation schemes to be tested              8/06
3.    obtain FSU Human Subjects Committee approval and consent                9/06
4.    perform a cryptanalysis on the factors of password strength and         Week 1
      usability (Chapter 4)
5.    establish a threshold of cryptographic strength for all passwords       Week 1
      to be used in the study
6.    identify key password usability issues in terms of user strengths       Week 1
      and weaknesses
7.    compare and contrast the relative strengths and weaknesses of the       Week 1
      five schemes
8.    modify the five password-generation schemes to meet established         Week 1
      cryptographic threshold (See Appendix E)
9.    pretest study instruments with expert researchers and programmers       Week 2
      (Appendices C & D)
10.   enlist participants                                                     Weeks 1-3
11.   pilot test study instrument with representative subjects                Week 3
12.   configure automated logging software to assess password input           Weeks 1-3
      accuracy longitudinally
13.   perform expert usability testing of modified password-generation        Week 4
      schemes (Appendix E)
14.   survey participants                                                     Week 4
15.   perform think-aloud usability testing of password-generation            Week 4
      schemes (Appendix C)
16.   survey participants (Appendix D)                                        Weeks 4-9
17.   analyze password input logs (Chapters 4-7)                              Weeks 4-9
18.   compile study findings (Chapters 4-7)                                   Week 10
19.   draw conclusions in terms of:                                           Week 10
      a. usability of passwords generated,
      b. memorability of passwords generated, and
      c. conceptual framework (See Chapter 8)
20.   refine conceptual framework (Chapters 7 & 8)                            Week 11
21.   make recommendations in terms of:                                       Week 12
      a. security practice, and
      b. security policy (Chapter 8)
22.   suggest further research (Chapter 8)                                    Week 12

3.2 Research Questions
The purpose of this study was to contribute to the usability of human interaction with secure memometric authentication systems. The goal of the study was to discover effective means of assisting users to generate, input, and recall the cryptographically strong passwords necessary for such secure memometric authentication systems. The specific objectives of this study were to:
1. Identify prominent password-generation schemes;
2. Contrast the relative strengths and weaknesses of identified schemes;
3. Measure recall and input success rates of passwords generated using those schemes;
4. Assess the long-term memorability of generated passwords;
5. Evaluate the relative effectiveness of each scheme; and
6. Propose future research for improving robust password management.
In pursuit of these objectives the following four research questions drove the data collection in this study:
1. What level of cryptographic strength is necessary for contemporary secure passwords?
2. What available password-generation schemes best combine security and usability?
3. Can password examples and multiple input during the password-generation improve subsequent recall and input of robust passwords?
4. Does the conceptual framework used in this study contribute to research into the usability of human interaction with secure memometric authentication systems?
The multiple methods of data collection used in this study provided meaningful answers to these research questions. Cryptanalysis of password strength and attack feasibility established a reasonable strength threshold for viable passwords in the current threat environment. Side-by-side analysis of password-generation schemes identified relative strengths and weaknesses in terms of potential password strength and usability. Log analysis indicated longitudinal trends in the input accuracy of passwords. Log analysis and a final user survey assessed the longer-term memorability and usability of generated passwords, and allowed inference into user perspectives and preferences.

3.3 Justification of Methodology
The methodology used in this study provided an exploratory assessment of the effects of password-generation schemes on user interaction with secure authentication mechanisms. Most previous studies of password issues have been concerned primarily with password strength, but modern proactive password checking programs have trivialized such concerns by automating such assessments during the password enrollment stage. Effective proactive password checkers can require any level of password strength from users, and can be configured to enforce the specific requirements of organizational security policy.
This study explored the human aspects of the generation, recall, and input of long passwords. Previous studies have repeatedly found that, although authentication mechanisms put few limitations on the possibilities available for the formulation of robust passwords, users typically resort to simple, short, or easily guessed formulations that compromise the security of networked resources and identities. Further, top-down, policy-driven requirements enforced by password checkers often fail to engage users in the security process, with the result that users often create vulnerabilities by bypassing sound security practice whenever possible.

3.4 Relation of Methodology to the Conceptual Framework
There were no previously established conceptual frameworks specifically designed for the study of password usability and memorability. As users are required to manage increasingly long passwords, and present security culture leaves them ill prepared to do so, the usability and memorability of strong passwords have become an important issue in information security. This study used Tulving’s GAPS framework to isolate discrete events in the phenomenon of password remembrance, as observed in cue-response memory testing. Tulving conceived the GAPS framework to be general in scope, and it readily lends itself to extensibility. Based on the cognitive psychological methodology of treating subjects in memory tests within a controlled environment, GAPS conceptualizes the sequences of events and mental states required to explain observations made during memory testing using the standard cue and response of TBR items. The password-generation, recall, and input events that occur in modern authentication mechanisms correspond well with the Interpolated Event, cue, and response sequence in the controlled cue-response experiments that Tulving and other cognitive psychologists, following pioneers such as Ebbinghaus and Semon, perform in the laboratory.
This study initially modified the GAPS framework in two ways. First, it suggested practical examples to encourage five iterations of the interpolated event stage per password-generation scheme. Second, it reconceptualized the GAPS Recoding-Ecphory-Conversion sequence into a generation-prompt-input loop to test the effect of multiple password reentry during the generation stage. Thus, the password generation and enrollment stage of this study was specifically designed to:
1. test the effect of supplying different quantities of password examples, and
2. test the effect of reentry iteration on the Recoding, Ecphory, and Conversion of the password.
The first test explored the effect of providing assistance in the form of practical password examples during the critical password generation stage. The five or ten examples provided per password-generation scheme (in groups 2 and 4, respectively) reiterated the GAPS interpolated event-recoding sequence to appeal to the participant’s Original Engrams. The second test explored the effect of iteration of GAPS’ central Recoding, Ecphory, and Conversion events that participants undergo in response to cues given by researchers. The reiteration imposed on treatment groups in this study operationalized this sequence as an iterative loop to test its effect on fixing the new password generation sequence in the participant’s recollective experience. Participants in two treatment groups reentered their passwords repeatedly to create a series of events that reinforce the idea, the format, and the tactile input unique to the generated test-based password with the assumption that improved subsequent recall and input would result.
Table 3.2 shows the treatments introduced by subject group. Using both quantitative and qualitative techniques to collect data on the effects of two interventions on participant password generation, input, and recall, this study tested the effects of engaging users in the password-generation process by (i) providing numerous examples, and (ii) requiring multiple password reentries during the generation stage.

Table 3.2: Subject Groups by Treatment

Group   Example Passwords Provided   Password Entries Required
1       1                            2
2       5                            2
3       1                            5
4       10                           5

The investigator allowed all participants to select one of the suggested password-generation schemes introduced in Section 2.12, or use a scheme of their own, and then tested all candidate passwords to see if they exceeded the cryptographic strength threshold established in Chapter 4. Group 1 served as a control group, in which the password enrollment script provided participants with a single example password and required participants to enter the candidate password only twice, as per the industry norm. The treatment instrument for group 2 supplied participants with five example passwords per scheme, and the treatment instrument for group 4 supplied participants with ten example passwords per scheme, all to explore the effects of practical coaching during the creative stage of password generation. The treatment instrument for groups 3 and 4 required participants to reenter their candidate passwords four additional times to explore the effect of reiteration of the prompt, recall, and input sequence in terms of subsequent password performance.
This study used 139 participants drawn from students at the Florida State University College of Information, five information security expert testers, and four doctoral student participants drawn from the Florida State University College of Information doctoral program. Although there is no universal consensus concerning the number of participants to use in usability testing, the range of three to ten is typically recommended (Lazar, 2006; Krug, 2000; Nielsen, 2000). The investigator randomly assigned participants to the four test groups. The use of more than fifty participants resulted in groups larger than ten, and allowed for expected dropouts.
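The group design in Table 3.2 and the random assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the enrollment script used in the study: the names `TREATMENTS`, `assign_groups`, and `enrollment_ok` are hypothetical, the participant IDs are placeholders, the fixed seed is there only to make the sketch repeatable, and a 20-character minimum length stands in for the strength threshold established in Chapter 4.

```python
import random

# Treatments from Table 3.2: group -> (example passwords per scheme, entries required).
TREATMENTS = {1: (1, 2), 2: (5, 2), 3: (1, 5), 4: (10, 5)}

MIN_LENGTH = 20  # assumption: length stands in for the Chapter 4 strength threshold


def assign_groups(participants, seed=0):
    """Shuffle participants and deal them round-robin into the four groups."""
    rng = random.Random(seed)  # hypothetical fixed seed, for a repeatable sketch
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = sorted(TREATMENTS)
    return {p: groups[i % len(groups)] for i, p in enumerate(shuffled)}


def enrollment_ok(group, entries):
    """Accept an enrollment only if the group's required number of entries
    were typed, all entries match, and the password is long enough."""
    _, required = TREATMENTS[group]
    if len(entries) != required:
        return False
    if len(set(entries)) != 1:
        return False
    return len(entries[0]) >= MIN_LENGTH


# Placeholder IDs; 139 matches the number of student participants reported above.
assignment = assign_groups([f"p{i:03d}" for i in range(139)])
sizes = {g: sum(1 for v in assignment.values() if v == g) for g in TREATMENTS}
```

Dealing a shuffled list round-robin keeps the four groups within one participant of each other in size, which is one simple way to leave every group well above ten members even after dropouts.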

3.5 Purpose of Methodology
This study used multiple methods to explore means to improve the usability of robust passwords, and went beyond the assessment of password strength that was the focus of most previous password research. To this end, the cryptanalysis section in Chapter 4 illustrated the usability gains provided by long, easy-to-type passwords, and suggested means to increase password strength with minimal loss of usability. The majority of the data collected concerned the ease of password generation and of subsequent recall and input over the test period. Because infrequently used passwords are a source of great frustration for users, the frequency of subsequent input of generated passwords was logged by software behind the authentication mechanism developed for this study. Because forgetfulness is one of the most significant problems users have with passwords, the recall of the password at the end of the study was an important indicator of the effectiveness of the treatments under test. A final survey solicited user experience of password recall and input throughout the study.


All preliminary methodological components of this study were preludes to the survey questionnaire that concluded it. Usability studies have increasingly utilized experts and cognitive walkthroughs for expediency (Nielsen, 2000), but the password challenge remains a central feature of both security culture and HCISec research, and this study used both experts and users to uncover usability issues and user perspectives associated with long password management.

3.6 Relation of Research Questions to Method
The research questions of this study determined the methodology used, and this section shows the relation between research question and methodology. Avoiding the emphasis on cryptographic strength that characterized the majority of previous password research, this study purposely tested the usability and recall of passwords that surpass a high cryptographic strength threshold. Table 3.3 shows the data collection methods used to answer research questions.

Table 3.3: Relation of Research Questions to Data Collection Methods.

Data Collection Methods (columns): Cryptanalysis; Expert Testing; Log Analysis; User Testing (Think Aloud); User Testing (Survey)

Research Question
1. What level of cryptographic strength is necessary for secure passwords? Yes, Yes
2. What available password-generation schemes best combine security and usability? Yes, Yes, Yes, Yes
3. Can password examples and multiple input during the password-generation improve subsequent recall and input of robust passwords? Yes, Yes, Yes
4. Does the conceptual framework used in this study contribute to the usability of human interaction with secure memometric authentication systems? Yes, Yes, Yes

Cryptographic analysis and expert testing determined the cryptographic strength threshold of passwords used for this study’s usability testing. Pilot testing isolated potential validity problems with treatment instruments and software configuration. Think-aloud protocols used with password enrollment schemes collected data about participants’ thought processes at each stage of the password enrollment, and an initial survey solicited user feelings, preferences, and suggestions based on their previous experiences with password management. Log analysis software using a web interface and background database (viz. a customizable open-source PHP/MySQL UCCASS package) collected numerical data concerning participants, passwords, and login accuracy. A final survey solicited user feelings, preferences, and suggestions based on their experience during the study. Thus, four methods collected data concerning the usability of schemes to generate passwords: expert testing, log analysis, think-aloud testing, and surveys; and three methods collected data determining the relative memorability of passwords generated by the various schemes: log analysis, think-aloud interviews, and surveys.

3.7 Data Collection
This study explored the usability of password-generation schemes that generate long passwords or passphrases with a minimum of twenty characters. To test these schemes, and means to engage users in the use of them, five methods were used: cryptanalysis, expert testing, log analysis, think-aloud user testing, and survey questionnaires. This section delineates the use of each of these methods in terms of participant selection, unit of analysis, validity of data, and reliability of data, where applicable.

3.7.1 Cryptanalysis. Researchers and security experts, beginning with Porter, have long argued that short passwords are vulnerable, and advances in computing power and storage capabilities have greatly enabled attacks on passwords shorter than 15 characters. The units of cryptanalysis were 20-character passwords typical of those generated by participants in the study. Cryptanalysis of password search space and length determined a pragmatic strength threshold for passwords viable for present systems and systems of the foreseeable future. A mathematical comparison of the relative gains to password security available through increasing password search space and length demonstrated the advantage of length, and the validity of these findings is expressed in mathematical terms.
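The comparison between enlarging the search space and lengthening the password can be illustrated with a short calculation. The sketch below is not taken from Chapter 4; it assumes every character is chosen uniformly at random (user-chosen passwords carry less entropy than this upper bound), and the alphabet sizes and lengths are illustrative values rather than the study's figures. The function name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of entropy for a uniformly random password: length * log2(alphabet)."""
    return length * math.log2(alphabet_size)


# Illustrative values only (not the study's figures):
short_complex = entropy_bits(94, 8)   # 8 characters from 94 printable ASCII symbols
long_simple = entropy_bits(26, 20)    # 20 lowercase letters

# Growing the alphabet adds bits only logarithmically, while each extra
# character adds a full log2(alphabet_size) bits.
gain_from_alphabet = entropy_bits(94, 8) - entropy_bits(26, 8)
gain_from_length = entropy_bits(26, 20) - entropy_bits(26, 8)
```

Under these assumptions a 20-character lowercase password has a larger search space than an 8-character password drawn from the full printable set, which is the sense in which length offers the greater gain.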

3.7.2 Expert Testing. Information security expert testing assessed (i) the password-generation schemes used in this study, and (ii) the cryptographic strength of passwords that users generate using them, in terms of strength, input accuracy, and usability. In the first case, the unit of analysis was the password-generation scheme, and in the second, the resultant password. Expert testing determined the specific merits of each scheme, and looked for potential weaknesses in passwords generated.


Expert testing searched for specific cryptographic and usability problems, and provided a better overall understanding of the relative merits of the schemes than strict user testing alone could have provided, by identifying the breadth and depth of the usability problems associated with each scheme and its corresponding passwords. This study also used two existing testing instruments developed to quantify password strength. In addition to password strength and usability analysis, Chapter 4 determined modifications to each scheme for use in this study with the aim of further increasing the entropy of resultant passwords. The results of expert testing on the schemes were applied to modify them for use in the treatment instruments used in the pre-testing and password-generation stages.

3.7.3 Pilot-testing. An initial small-scale pilot test preceded the usability testing. Pre-testing used four participants, chosen from the same participant population as the main study, with the aim of identifying problems with the treatment instruments. The units of analysis were the four specific treatment instruments developed for the four participant groups. Feedback from pilot test subjects allowed the investigator to refine the instruments for the password generation stage of this study.

3.7.4 Usability Testing: Think Aloud Method. This study sought feedback from participants during the password generation stage through the think-aloud method. The first stage of the password generation process used a scripted walk-through, which is a set list of instructions, questions, and tasks to guide participants through the process to determine what problems would arise. This walkthrough process ensured that participants were informed of the security requirements of long passwords and the variety of assistance available for generating them. In the second stage, participants were free to choose a scheme of their preference, use it, and enroll the candidate password. During and after the performance of these steps, the investigator asked participants to express their thoughts, reactions, and preferences in terms of their own experiences. The specific actions in the scripted protocol for this study tested participants’ preferences of the schemes by leading them to discover the range of assistance available and exposing them to practical examples of passwords generated using the various schemes.


In addition to the five suggested password-generation schemes, participants were free to use a password-generation scheme of their own choosing. While participants fulfilled tasks, the investigator asked them to express thoughts and reactions to the schemes and the process of generating their passwords. Thus, the think-aloud method solicited input on users' perspectives of their experiences during the process.
The unit of analysis for the think-aloud stage was the password-generation scheme chosen by the participant. The think-aloud process provided insight into the thought processes of users as they performed each step of the process of choosing a preferred scheme, generating a password, using it, and reentering the new password into the system. Thus, think-aloud testing provided qualitative data about preferences and usability issues as they happened during the process.
This study chose participants purposively from students at the Florida State University College of Information. These students had repeatedly dealt with the authentication requirements of networked computer systems that are common to the very large population of the networked computing world. Thus, the study assumed that such student experiences with the treatments involved could uncover usability issues and provide meaningful findings for the population at large.

3.7.5 Log Analysis. This study used web-based delivery of password enrollment, initial survey questionnaire, subsequent password input, and final survey questionnaire. A dedicated, custom website ran the Linux 2.6.16-25 kernel, the Apache 2.06 web server, PHP/MySQL database software, and the Unit Command Climate Assessment and Survey System (UCCASS) survey software to collect and authenticate usernames, passwords, and survey input. The Apache web server collected data regarding the number of .htaccess attempts, the time of login attempts, and password input successes and failures, and logged this data into a database for further analysis. The units of analysis were the individual records within the server logs, and analysis of these logs provided circumstantial data about participant password performance in terms of both input accuracy and overall memorability. Linux kernel and TCP/IP traffic logs, combined with final questionnaire data, confirmed the validity of these logs.
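The kind of summary that can be drawn from such login records is sketched below. The record layout (username, week label, success flag) and the sample rows are hypothetical and are not the study's actual log format or data; the functions `success_rates` and `weekly_success` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical records: (username, ISO week label, login succeeded?)
attempts = [
    ("alice", "2006-W40", False),
    ("alice", "2006-W40", True),
    ("alice", "2006-W41", True),
    ("bob", "2006-W40", True),
    ("bob", "2006-W41", False),
    ("bob", "2006-W41", False),
]


def success_rates(records):
    """Return {username: successful attempts / total attempts}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for user, _week, ok in records:
        totals[user] += 1
        if ok:
            successes[user] += 1
    return {user: successes[user] / totals[user] for user in totals}


def weekly_success(records):
    """Return {week: {username: True if any attempt that week succeeded}}."""
    weeks = defaultdict(dict)
    for user, week, ok in records:
        weeks[week][user] = weeks[week].get(user, False) or ok
    return dict(weeks)


rates = success_rates(attempts)
by_week = weekly_success(attempts)
```

The per-attempt rate speaks to input accuracy, while the per-week flag (did the participant get in at all that week) is closer to a memorability indicator; keeping the two separate mirrors the distinction drawn in the text.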


3.7.6 Usability Testing: Survey Questionnaires. The second and the fifth data collection methods used in this study were surveys in the form of web-based questionnaires. These questionnaires solicited information from users about the decisions they made in choosing one scheme, rejecting others, generating a password, inputting the password, and recalling the password over the long term. Their purpose was to gain a better understanding of user preferences and decision-making steps when faced with the need to generate and use robust passwords. They gathered data to compare user perceptions of the usability of long passwords in each stage of use with the data collected using other data collection methods. The unit of analysis in the usability testing phase was the password-generation scheme chosen by the participant. Standardized stimuli and carefully worded questions revised with findings from the expert testing stage served to increase the reliability of data collected. The validity of surveys is always problematic, and the investigator limited analysis of exploratory survey results to participant perceptions of the usability of the schemes and treatments under test.

3.8 Data Analysis and Reporting
The data collected using different methods of usability testing formed a composite picture of (i) the various password-generation schemes under test, (ii) different levels of assistance, (iii) the reiteration of password input, and (iv) the usability of passwords generated. This study used five data collection methods and the analysis and reporting of each was appropriate to the nature of the data.
Cryptanalysis examined the scope of password strength in terms of alphabet (search space), length, and other entropy considerations. This positive analysis examined the 20-character passwords in plaintext to identify weaknesses and other characteristics.
Expert testing relied on the expertise of information security experts to assess the strength and usability of passwords generated by the five password-generation schemes under test. The units of analysis were the example passwords supplied by treatment instruments and the actual passwords generated by participants in the study. It also determined the relative merits of the various password-generation schemes, and the analysis was reported side by side in terms of the relative merits of the five schemes.


Think-aloud user testing sought qualitative data of participant experience of the process of password generation. It was used in both the pre-testing and usability testing stages of the study, and the units of analysis were the treatments under test. This study administered eight survey questionnaires that were online and stored the data collected in a database. The questionnaires sought participant perspectives acquired during password generation, recall, and input stages in the study. Whenever appropriate, the questions solicited answers in terms of Likert psychometric scales, user comments, or both. This allowed discernment of participant agreement with statements according to an ordinal level of measurement, and normative analysis of participants’ opinions. Additionally, some of the questions in the initial survey reappeared in the final survey, providing the longitudinal advantage of excluding time-invariant unobserved individual differences.
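Why repeating a question on the initial and final surveys removes time-invariant individual differences can be seen in a small paired comparison: each participant serves as his or her own baseline, so only the within-person change remains. The sketch below uses hypothetical 5-point responses and hypothetical names (`paired_changes`, `direction`); it is not the study's analysis, and it reports the median and sign counts rather than a mean because Likert data are ordinal.

```python
from statistics import median

# Hypothetical 5-point Likert answers (1 = strongly disagree, 5 = strongly agree)
# to one question repeated on the initial and final surveys.
initial = {"p01": 2, "p02": 4, "p03": 3, "p04": 5}
final = {"p01": 3, "p02": 4, "p03": 5, "p05": 1}


def paired_changes(before, after):
    """Within-participant change, for participants who answered both surveys."""
    common = sorted(before.keys() & after.keys())
    return {p: after[p] - before[p] for p in common}


changes = paired_changes(initial, final)
median_change = median(changes.values())
direction = {
    "up": sum(1 for d in changes.values() if d > 0),
    "same": sum(1 for d in changes.values() if d == 0),
    "down": sum(1 for d in changes.values() if d < 0),
}
```

Participants who answered only one of the two surveys (here `p04` and `p05`) drop out of the paired comparison, which is one reason dropout matters for this kind of longitudinal measure.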

3.9 Quality of Data
This study used five data collection methods to answer its research questions, and used various techniques to assure the quality of data derived from each in terms of reliability and validity. In this multimethod approach, triangulation of findings from the various methods used to answer the same questions pointed out inconsistencies in the results. The use of multiple methods in the measurement of a psychological trait ensured that any variance observed was not the result of the methods themselves, but accurately represented the trait (Brewer & Hunter, 1989), and triangulation reduced potential bias introduced by the individual data sources (Jick, 1979).
This study utilized multiple methods to enhance the validity of its findings. The multimethod approach offered five distinct advantages:
1. triangulation for convergence of results,
2. discovery of complementary facets of a phenomenon,
3. use of the first method to inform the second sequentially and developmentally,
4. discovery of emerging perspectives and contradictions, and
5. expansion of the scope and breadth of the study (Creswell, 1994, p. 175).
This study purposively utilized multiple data collection methods to allow triangulation of password usability results from quantitative and qualitative methods. Specifically, comparison of participant perspectives of their password remembrance and input accuracy with actual input logs compiled by the web server and actual input passwords in a database revealed bias in participant self-perception. The think-aloud protocol, use of multiple password-generation schemes, and the ability of participants to use their own schemes all served to uncover multiple facets and personal preferences of contemporary password management. Expert testing, cryptanalysis, and think-aloud testing sequentially informed password parameters and treatment instruments, and aided in the development and refinement of web-based protocols for the study. This study extended over seven weeks, and participants logged in and reported on their successes and difficulties weekly in short, access-controlled surveys. This approach provided a diachronic perspective of participant experiences of long password remembrance and input that serves to test current perceptions of the user as the weakest link in information security. Finally, the multiple methods contrasted actual password performance with participant perception of password performance, and allowed a forum for participant voices regarding long password strategy, preference, and usability.
Because participants in this study were free to use the password-generation scheme of their choice, the investigator made efforts to assess the consistency of the findings of the individual methods of the various tests. By focusing on a relatively small number (five suggested, plus alternatives preferred by participants) of password-generation schemes, user-testing instruments were able to delve into greater detail and dedicate more time to each scheme, thus providing a more complete picture of the usability of each scheme. This research also assessed the findings from each individual participant for internal consistency to limit problems of internal validity. Because testing did not continue longer than four months, maturation, historical issues, and experimental mortality errors were limited.
Because participants were not required to engage in the process more than once, maturation, mortality, regression to the mean, retesting, and instrumentation errors were minimized.
The methods used in this study were designed to ensure that the data collected was valid and reliable. In addition to the parameters noted above, this study addressed issues of validity, reliability, and consistency in terms of the following specific criteria:
• Content validity – Testing accounted for each primary component of password usability: ease of generation, ease of input, and memorability. The initial test of ease of password generation occurred during the think-aloud sessions, held in late September, 2006 among 19 respondents, and in the initial survey, issued in early October, 2006, which requested opinions from all participants regarding the usability of the schemes (Appendix C.2). Subsequent weekly surveys and the final questionnaire (Appendix D) solicited participants’ opinions about ease of input and memorability;
• Criterion validity – Cross-comparison of related data from each method determined if they made sense in terms of the data provided by other methods, and established that the methods were performing complementary assessments. This study distinguished three aspects of password usability: generation, recall, and input. To gain a complementary assessment of these phenomena, it compared quantitative measures of password input accuracy and memorability from webserver logs and database entries with qualitative participant perceptions expressed in surveys. This assessment was performed at the end of the data collection stage in early December, 2006;
• Construct validity – This study clearly defined and measured password usability by the standards of generation ease, recall ability, and input ease. After the pilot test in late September, 2006, the investigator checked the reliability of the scales used in surveys for internal consistency, and refined the scales to better capture the range of participant response;
• Face validity – During the expert testing phase in mid-September, 2006, outside researchers and programmers with expertise in information security evaluated the study instruments and made recommendations. The investigator then refined the instruments, and pilot-tested them in late September, 2006 to improve the questions, format, and scales used;
• Internal validity – The investigator compared data provided by different tests within the same method (e.g. multiple user tests of the same password-generation scheme), looked for consistent results, and interviewed nineteen key informants in think-aloud sessions in late September, 2006. The initial survey, weekly short surveys, and the final survey in early December, 2006 solicited feedback from participants, as member checks, to ask whether conclusions were correct;
• External validity – In recognition of the specifics of the participant groups, which were purposively chosen from students at the Florida State University College of Information, this study limited the generalization of findings to the parameters of the data. Despite this limitation, the investigator randomly assigned participants to treatment groups, and made repeated efforts to reengage non-responders with a sequence of status notifications via email correspondence;
• Reliability – To increase the reliability of data, the investigator: pre-tested all instruments for clarity of language and relevance to the topic in pilot tests in mid-September, 2006; had a single investigator perform all assessments of the usability of password-generation schemes during the generation stage in September, 2006; recorded participant comments during the think-aloud sessions for later verification; posed questions to participants that were relevant to their direct experiences during the stages of password generation, recall, and input over the course of the study; repeated some questions from the initial survey in late September, 2006 on the final survey in early December, 2006 (i.e. the test-retest method) to make the same measurement twice, and checked the answers for variation; checked the reliability of the scales used in surveys for internal consistency; used statistics appropriate to the units of measurement to compare groups; used web-based format and delivery of this study and its instruments (except consent forms) to increase its replicability, regardless of geographical limitations; and measured the input accuracy of generated passwords in terms of participant perception, automated logging, and participant entry into an online database;
• Consistency – Comparison of data derived from different methods checked for agreement about the usability (in terms of recall and input accuracy) and security (in terms of entropy) of password-generation schemes and their resultant passwords. The investigator also checked the reliability of the scales used in surveys for internal consistency across the pilot test (mid-September, 2006), initial survey (late September, 2006), final survey (early December, 2006), and a survey of non-respondents and dropouts (early December, 2006).
This study included pilot testing of survey instruments to establish their face validity and to improve the questions, format, and scales used. This pilot testing involved social researchers and the investigator incorporated its results into subsequent instrument revisions.
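The text does not name the statistic used to check the survey scales for internal consistency. One common choice for multi-item Likert scales is Cronbach's alpha, and the sketch below shows how it could be computed; this is an assumption for illustration, not a description of the study's procedure, and the responses shown are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of per-item response lists, one value per respondent,
    all lists in the same respondent order.
    """
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    totals = [sum(responses) for responses in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three Likert items answered by five respondents.
scale = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
alpha = cronbach_alpha(scale)
```

Alpha approaches 1 when the items rise and fall together across respondents, which is the sense in which a scale is internally consistent.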


Because this study required participants to interact multiple times – consent, password generation, initial survey, six logins with short surveys, and a final survey – all using their generated passwords, it was expected that the dropout rate would be high. Table 5.1 summarizes the number of returns and non-returns, and a response/non-response bias procedure determined the effect of non-responses and dropouts on survey estimates by contacting non-respondents to determine if their responses differed from those of respondents (Creswell, 1994, p. 124). Additionally, to enhance validity, reliability, and overall consistency of data, the investigator reevaluated these criteria during the course of the study to inform the data collection process and the instruments used.

3.10 Methodological Limitations and Assumptions

This study used multiple methods of usability testing on the assumption that they collectively would provide more consistency in results. The investigator assumed that quantitative authentication logs would produce results consistent with results from surveys and think-aloud procedures. Following previous studies, this study assumed that the user survey is the best means of assessing password usability (Zviran & Haga, 1993; Yan et al., 2004), but it also used expert testing and log analysis to acquire data from both user and expert perspectives. Cryptanalysis provided further breadth to the data by establishing the context of password strength as a cryptographic threshold. This study purposively used Florida State University students as participants because they represent the population of networked computer users in many ways. Their exclusive use, however, limited the generalizability of findings to the general public because there was no assurance that they were typical of the universe of secure networked computer users; they may differ from it in unforeseen ways. The small number of subjects also limited the statistical significance of findings for the population at large. This study limited testing to the two password-generation advice interventions considered most important by the investigator: (i) password examples provided and (ii) password reentries required. This consideration stemmed from research that uncovered security vulnerabilities resulting from inadequate user involvement in the security process (Adams & Sasse, 1999; Sasse et al., 2001), and from research that demonstrated the recall gains resulting from reuse of TBR items (Tulving, 1982). Because this study purposively relied on participant self-selection of the


five password-generation schemes used, it did not directly test the schemes by assigning them to groups randomly. The assumptions behind this were that (i) self-selection would reveal participant preferences among the individual schemes and (ii) participant interest in a particular scheme would increase usability. This study assumed that the bulk of the password problem is situated at the human end of the HCISec spectrum. If this assumption is correct, meeting the keyboard based password challenge requires extensive usability testing with actual users. This exploratory study used data collection methods designed to uncover usability problems and user perspectives associated with schemes to assist users in password generation.

3.11 Chapter Summary

The methodology used in this study was intended to test the effect of assistance, in the form of password-generation schemes, practical examples, and multiple reentries during the generation stage, on the usability and ultimate memorability of robust passwords. To this end, it allowed participants to use the password-generation scheme of their choice, and used five primary data collection techniques: cryptanalysis, expert testing, log analysis, think-aloud user testing, and two surveys of user experience with the process. The methodological components of this study were complementary. Together, they sought answers to fundamental research questions surrounding the generation, recall, and input of robust passwords. For a clear understanding of current threats to password security, the cryptographic analysis in Chapter 4 determines the minimum requirements of robust passwords to endure contemporary and projected attacks. This study required that all passwords surpass this cryptographic threshold, and all instruments and authentication mechanisms used in this study incorporated this requirement. This study followed a strict research design. Randomly selected groups from the study’s participants used different instruments instructing them in password generation. Instruments suggested five password-generation schemes to assist participants while generating passwords, but also allowed participants to use a scheme of their own. Treatment instruments facilitated this password-generation process with examples and encouragement provided to help subjects in the process. Participants were also free to print out or write down the password as a backup to use in case they forgot it. The specific differences between the four groups’ instruments were (i) the


number of example passwords provided, and (ii) the amount of iteration used to replicate the Recoding, Ecphory, and Conversion events central to the GAPS model. Instruments supplied two of the treatment groups a single example of passwords generated under each scheme, five to the third group, and ten to the fourth group. Instruments also required two groups to reenter their new passwords successfully only once, as is the common practice, to confirm accuracy, and the other two groups to reenter their new passwords successfully four times. The methods used in this study collectively evaluated the effect of engaging the user in the security process in select ways, and thus went beyond the assessment of password strength that was the focus of most previous password research. To this end, the data collected concerned the ease of password generation and of subsequent recall and input over the test period. An important purpose of this study’s methodology was the use, assessment, and refinement of Tulving’s GAPS framework as specifically applied to the password challenge. In particular, the password generation stage of two treatment groups was designed to make a deliberate loop out of the cueing and memory performance stages of GAPS by reiterating the Recoding, Ecphory, and Conversion sequence five times, using the password as the TBR item. Using Tulving’s GAPS as a framework, the study engaged subjects in creating a series of purposeful events to reinforce the concept, format, and kinesthetic input of the password.


CHAPTER 4: PASSWORD STRENGTH TESTING

4.1 Introduction

The standard view among security professionals is that usable and memorable passwords are weak, while strong passwords are difficult to use because they must be long and complex in terms of random letters, numbers, and symbols. On this view, long passwords, while more resistant to cracking than trivial passwords, place an unacceptable cognitive load on users in terms of memorability and input effort and accuracy. This study researched the question of what level of cryptographic strength is necessary for secure passwords. To answer this research question, this study used cryptanalysis and expert testing, and this chapter reports the findings from these two methods. Section 4.2 provides an overview of the cryptographic strength that passwords require to withstand specific current attacks, and establishes the threshold of cryptographic strength that all passwords under investigation must exceed. Section 4.3 examines the desirable password property of entropy and effective methods to increase it while minimizing input errors and recall failures. Section 4.4 presents two empirical methods of measuring password entropy. Section 4.5 presents the results of expert testing on selected example passwords generated by participants during the study.

4.2 Password Cryptographic Strength

Because information security has proven persistently and inversely proportional to information system usability, techniques to combine user strengths with computer strengths are desirable for the design and operation of secure information systems. Spafford isolates seven “failure modes” of passwords, depending on where they are used and the threats arrayed against them. These are: disclosure, inference, exposure, loss, guessing, cracking, and snooping (Spafford, 2006). All passwords, regardless of cryptographic strength, are subject to failure in the modes of disclosure, exposure, loss, and snooping, but the long passwords tested in this study are more resistant to the remaining modes of inference, guessing, and cracking than the typical


random 8-character password. Inference occurs when there is a pattern to the way the passwords are generated that can be discerned or when the generation algorithm is predictable. Guessing is the primitive application of likely password formulations based on common passwords used in the past, or on personal knowledge of the user. Inference and guessing rely primarily on social engineering and the probability that the password is naïve, and they favor an attacker with personal knowledge of the user. Cracking involves the capture and algorithmic attack on a processed form of the password such as a hash (Spafford, 2006). The steadily increasing power of attackers’ machines enables cracking on ever higher levels, but the state of the art in password cracking essentially ignores long passphrases, so passphrases, especially those over fourteen characters, are currently highly resistant to even modern cracking techniques. Password cracking techniques are increasingly required because modern authentication mechanisms use unkeyed cryptographic hash functions to assure password integrity. If h is a hash function, and x a password, the fingerprint, or “message digest,” y can be represented as: y = h(x). If x is changed to x’, even by one bit, the result, y’, derived from computing y’ = h(x’) is dramatically changed. Thus y’ ≠ y. The security of hash functions is based on resistance to pre-image, second pre-image, and collision attacks (Stinson, 2002, p. 119). The strength of the hashing algorithm is variable: MD5 digests are 128 bits, SHA-1 digests are 160 bits, and the SHA-2 family provides 224-, 256-, 384-, and 512-bit digests. This resistance forces attackers to resort to pre-computing or more advanced Markov chain algorithms.
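The relation y = h(x) and the effect of a one-bit change in x can be illustrated with a minimal sketch. It assumes Python's standard `hashlib` module; the helper names `fingerprint`, `digest_bits`, and `differing_bits` and the sample passphrase are illustrative only and are not part of this study's instruments.

```python
import hashlib


def fingerprint(password: bytes, algorithm: str = "sha256") -> str:
    """Return the hex message digest y = h(x) for the given password x."""
    return hashlib.new(algorithm, password).hexdigest()


def digest_bits(hex_digest: str) -> int:
    """Each hex character encodes four bits of the digest."""
    return len(hex_digest) * 4


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


# x and x' differ in exactly one bit (the lowest bit of the first byte).
x = b"A day that will live in infamy"
x_prime = bytes([x[0] ^ 0x01]) + x[1:]

y = fingerprint(x)
y_prime = fingerprint(x_prime)

print(digest_bits(y), "bit digest")
print(differing_bits(y, y_prime), "bits differ between y and y'")
```

Swapping the `algorithm` argument (for example to `"sha1"`) shows how digest length varies between hash functions while the one-bit sensitivity remains.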
Although many advanced cryptographic hashing techniques are available for use today, the ubiquity of the Microsoft Windows operating system and the advantages of backward compatibility mean that a large percentage of systems today use the Microsoft LanManager (LM) security hash, which truncates passwords to 14 case-insensitive characters. In addition, the LM scheme splits the password into two seven-character halves, and both can be compared against a single hash table, further reducing the effective number of characters to seven. Thus, the total number of permutations that results from these limitations is: $68 + 68^2 + 68^3 + 68^4 + 68^5 + 68^6 + 68^7 = 6.823331935 \times 10^{12}$ (NeoSmart, 2006, p. 4). This number is well within the processing power of current machines using rainbow tables and Markov chain algorithms. Despite this limitation perpetuated by backward compatibility with legacy LM systems, UNIX, Linux, Apple MacOS, and modern Windows OS’s allow up to 256-character


passwords, including Unicode characters, and all but Windows further salt the password before hashing. In a seminal early study, Klein found that 25% of 14,000 passwords could be found in a dictionary of only $3 \times 10^6$ words (Klein, 1990). This suggests that the users were choosing primarily easily remembered and guessable passwords from the much larger potential password space, which, for 8-character passwords, is about $2 \times 10^{14}$. Because of this, the password remained a primary target of the attacker’s cryptanalysis: the user’s choice of password was typically much weaker than optimal for the hashing algorithm being used. Because of advancements in pre-computing attacks, attackers can now easily crack all permutations of short passwords drawn from the entire keyboard search space, and all passwords protecting non-trivial resources must now be substantially longer than eight characters. Porter foresaw this, and introduced the modern concept of passphrases in 1981 to overcome the weaknesses of short, guessable passwords (Porter, 1981), and security experts now recommend that a secure password protecting valuable resources should contain a minimum of twenty characters (Burnett, 2006, pp. 121-4). In addition to the security requirements of robust passwords, which are essential for keeping intruders out of the system, their effectiveness also depends on the ability of the authorized user to recall and accurately input them. The previous password studies surveyed in Chapter 2 discovered strong user aversion to strong passwords, and if the longer passwords required for security are to gain acceptance among users, methods of increasing their usability are desirable. As discussed initially in Chapter 2, password strength can be viewed as the product of two main factors: 1. a – The size of the alphabet, or character set from which the password is drawn, and 2.
n – The length of, or number of characters used in, the password.

To crack an unknown password, the attacker must compute $a^n$ possibilities, although, on average, $a^n/2$ attempts are required, and this indicates a significant defender’s advantage provided by the long password. To illustrate the correlation of password length to password strength, consider the effect of increasing the size of the alphabet by a factor of d on the search. It is clear that the result is $(da)^n$. On the other hand, if d is factored into the length, the result is $a^{nd}$. Since $(da)^n = d^n \times a^n$ and $a^{nd} = a^n \times a^{n(d-1)}$, the common $a^n$ can be factored out, leaving $d^n$, a polynomial gain, for alphabet enhancement, and $a^{n(d-1)}$, an exponential gain, for length enhancement. Thus, it is clear that increasing the password alphabet size results in a polynomial gain in security, while increasing the password length results in an exponential gain. Leveraging this exponential gain is essential to maintain the defender’s advantage over attackers, whose computing power, as noted in the previous chapter, also increases exponentially, roughly according to Moore’s law. Despite the clear advantages of length on password strength, many security experts advocate increasing the alphabet size to increase security. For example, NeoSmart suggests the use of an expanded set of non-printable Unicode characters, including commonly used symbols, accented characters, and nonsense symbols. The reasoning is that any password employing these characters is secure, since the cracker is unaware of the huge alphabet in use. Although the number of possible permutations using the ~700 common Unicode characters is approximately $8.25 \times 10^{19}$ (NeoSmart, 2006, p. 4), this is generally considered bad security practice. According to Kerchkhoffs’ principle, cryptographic security must be based on the assumption that the opponent knows everything about the cryptosystem except the key (Kerchkhoffs, 1886), which, in this case, is the password itself. Conformance with Kerchkhoffs’ principle makes it a fallacy to assume that the attacker is unaware of the use of Unicode characters, even though their use undoubtedly makes passwords stronger and cracking them more difficult. The gain from incorporating Unicode characters into a password is to increase the search space from 94 to approximately 700, but this gain comes at the expense of the four combination keystrokes required to generate each of them. For example, to enter Unicode characters in Windows OS, the user must type four numeric keystrokes while holding the Alt key, or Alt+(n+n+n+n).
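The keyspace arithmetic in this section can be checked with a short sketch. It recomputes the LM permutation total quoted from NeoSmart and compares the gain from multiplying the alphabet by d, (da)^n / a^n = d^n, with the gain from multiplying the length by d, a^(nd) / a^n = a^(n(d-1)). The function names and the sample values a = 26, n = 8, d = 2 are illustrative assumptions, not values used in the study.

```python
def lm_keyspace(alphabet: int = 68, max_len: int = 7) -> int:
    """Total candidates for one LM half: every length from 1 to max_len."""
    return sum(alphabet ** k for k in range(1, max_len + 1))


def alphabet_gain(a: int, n: int, d: int) -> int:
    """Factor gained by enlarging the alphabet d-fold: (d*a)^n / a^n = d^n."""
    return (d * a) ** n // a ** n


def length_gain(a: int, n: int, d: int) -> int:
    """Factor gained by lengthening d-fold: a^(n*d) / a^n = a^(n*(d-1))."""
    return a ** (n * d) // a ** n


a, n, d = 26, 8, 2  # illustrative: lower-case alphabet, 8 characters, doubling
print(f"LM keyspace:   {lm_keyspace():.9e}")
print(f"alphabet gain: {alphabet_gain(a, n, d):,}")
print(f"length gain:   {length_gain(a, n, d):,}")
```

With these sample values, doubling the alphabet multiplies the search space by 2^8, while doubling the length multiplies it by 26^8.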
Because the Unicode character set is potentially 15 times larger than the 94-character ASCII set, this trade-off is approximately $15a^{n+1}$ versus $a^{n+4}$. This represents a significant polynomial gain, but only at the expense of the even greater exponential gain achievable by simply using four additional characters instead of Alt+(n+n+n+n). Also, in terms of input accuracy, the use of Alt-key combinations requires either two hands or a high degree of dexterity with one hand and is thus a potential source of password input error. Unicode is also technically problematic because it is unevenly implemented across natural languages, it provides multiple means to generate identical symbols, and it is not backward compatible with some legacy systems. In the absence of Unicode, modest improvements to password strength are available by including at least one character from: (i) lower-case letters, (ii) upper-case letters, (iii) numbers,


and (iv) symbols from the 94 printable ASCII characters. The gain from such inclusion remains polynomial, however, since the search space increases less than four-fold from the 26 lower-case, easy-to-type letters to the full 94-character set, many of which are also difficult to type reliably. A simple passphrase composed solely of six English dictionary words is semantically weak but cryptographically strong, since a long passphrase can easily compensate with its sheer length for its lack of character complexity, which itself decreases the reliability of input. Even if attackers had access to a hash of the passphrase and a priori knowledge that the 32-character password was constructed entirely of lowercase characters and the space key, they would still need to attempt an average of $3.18 \times 10^{45}$ guesses to crack it using brute force. As a reference, a 3 GHz Pentium-class computer generating 5,000,000 attempts per second would require an average of $2 \times 10^{31}$ years to crack it. A more sophisticated passphrase cracker that used words instead of characters would need to test $7.8 \times 10^{21}$ guesses, on average, to brute force a six-word passphrase drawn from a 5000-word vocabulary, and the 3 GHz Pentium machine would require only $5 \times 10^7$ years. Thus, even with a reduced character set (plain English), the 6-word passphrase is roughly equivalent to an 11-character random password (itso.iu.edu, 2006), but is typically easier for users to input reliably and to remember than an arbitrary password. Porter, Zimmerman, and others in the security community were early advocates of long passphrases made of memorable phrases because they are stronger than the shorter, but much more random, passwords in common usage. Thus, a long passphrase, such as “A day that will live in infamy,” is not only stronger than a typical 8-character random password, such as “&3Tw9#p!,” but easier to remember and to type.
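The brute-force estimates above follow from the stated inputs: a 27-symbol alphabet (lower-case letters plus space) over 32 characters, a 5000-word vocabulary over six words, and 5,000,000 attempts per second. The sketch below recomputes them; the function names and the 365.25-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def average_guesses(symbols: int, length: int) -> int:
    """On average, half of the symbols^length candidates must be tried."""
    return symbols ** length // 2


def years_to_crack(guesses: int, rate_per_second: int = 5_000_000) -> float:
    """Time needed to make the given number of guesses at a fixed rate."""
    return guesses / rate_per_second / SECONDS_PER_YEAR


char_guesses = average_guesses(27, 32)    # character-by-character attack
word_guesses = average_guesses(5000, 6)   # word-by-word attack

print(f"character attack: {char_guesses:.2e} guesses, "
      f"{years_to_crack(char_guesses):.1e} years")
print(f"word attack:      {word_guesses:.2e} guesses, "
      f"{years_to_crack(word_guesses):.1e} years")
```

The comparison shows why an attacker who knows the passphrase is built from dictionary words gains many orders of magnitude, yet still faces an impractical search.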
Such a passphrase is strong, but not as strong as it could be because it naïvely uses words instead of random characters. This has the effect of greatly increasing the alphabet to a dictionary of modest length, but the number of words is only seven, and there is no doubt that crackers are developing means to attack such naïve formulations. The next section introduces simple means to dramatically increase the cryptographic strength, or entropy, of simple passphrases.

4.3 Password Entropy Enhancement Entropy is a measure of the probability distribution of the disorder within a system, and in information systems it can be viewed as a measure of the lack of information in a sequence.


Alphabet size, length, and randomness together determine the cumulative entropy of a password. Shannon proved that the entropy, H, of a discrete random variable x, over a set of n possible outcomes, is:

$H(x) = \sum_{i=1}^{n} p(i) \log_2 \frac{1}{p(i)}$

or that the entropy of an event x is the sum, over all possible outcomes i of x, of the product of the probability of outcome i times the log of the inverse of the probability of i (Shannon, 1948). The choice of the logarithmic base of two is arbitrary, but is useful because it yields entropy directly in terms of bits, or binary digits. This is because the entropy, in bits, of each random character or symbol in a password is the base-2 logarithm of the number of possibilities. If all characters of a password are truly random, the entropy of the password is the number of characters times the entropy per character. The total entropy in a password is cumulative and determined by the number, order, and variety of characters within it. Tables 2.1, 2.2, 2.3, and 2.4 directly show the effects of alphabet size and password length on attack times, but the determination of a password’s entropy is more complex, as illustrated in Table 2.9. Because the elusive quality of randomness, in terms of the character set and sequence of characters, determines the entropy in a password, information entropy is a difficult parameter of password strength to measure. Just as a password’s strength can be increased with random characters, a passphrase can be strengthened by using unexpected sequences of words with no semantic or historical relation to each other. Although the entropy of natural language text is low compared to random text (see Section 2.8), the use of odd or unexpected punctuation, capitalization, and intentional misspelling can strengthen even common sentences or phrases. Johansson considers a 6-word passphrase roughly as strong as a completely random 9-character password, and argues that, since most people are better able to remember a 6-word passphrase than a totally random 9-character password, the passphrase may be advantageous (Johansson, 2006).
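The definition just given, the sum over outcomes of the probability times the base-2 log of its inverse, can be sketched directly. For a uniformly random symbol it reduces to the base-2 logarithm of the alphabet size. The function names are illustrative; only Python's standard `math` module is assumed.

```python
import math


def shannon_entropy(probabilities) -> float:
    """H = sum over outcomes i of p(i) * log2(1 / p(i)), in bits."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy of one uniformly random symbol from the alphabet."""
    return math.log2(alphabet_size)


def password_entropy(alphabet_size: int, length: int) -> float:
    """Entropy of a password whose characters are all truly random."""
    return length * bits_per_symbol(alphabet_size)


for size in (10, 26, 36, 62, 94):
    print(f"{size:3d} symbols: {bits_per_symbol(size):.2f} bits/symbol")

# A skewed distribution carries less entropy than a uniform one.
print(f"biased coin: {shannon_entropy([0.9, 0.1]):.2f} bits")
```

The last line illustrates why predictable choices, such as natural-language letter frequencies, yield less entropy per character than the uniform figure for the same alphabet.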
Length is the single most important security consideration for passwords because it readily compensates for the lower entropy caused by using a smaller character space and the patterns occurring in natural language, illustrated in Tables 2.6, 2.7, and 2.8, that are typical of passphrases. Although length is by far the most important factor in password entropy enhancement, the inclusion of only a few unexpected special characters remains highly desirable if it is not the cause of undue additional input error.


Table 4.1 shows the effect of cryptographic search space on entropy. The lower case alphabet has 26 letters, and entropy of 4.7 bits per symbol. The 62 permutations of upper and lower case letters combined with the 10 numbers yields entropy of 5.95 bits per character. The addition of symbols in the 94-character ASCII set yields 6.55 bits per character. Thus, an

Table 4.1: The Effect of Search Space on Entropy

| Search Space | N | Entropy (bits/symbol) |
|---|---|---|
| Digits only (0-9) | 10 | 3.32 |
| Single case letters (a-z) | 26 | 4.70 |
| Single case letters and digits (a-z, 0-9) | 36 | 5.17 |
| Mixed case letters and digits (a-z, A-Z, 0-9) | 62 | 5.95 |
| All standard U.S. keyboard characters | 94 | 6.55 |
| Non-ANSI Unicode characters | 700 | 9.50 |

(Source: Diceware, 2006)

eight-character password of single case letters and digits would have approximately 41 bits of entropy. The same length password selected at random from all U.S. computer keyboard characters would have approximately 52 bits of entropy; however, such a password would be harder to memorize and enter on keyboards. Researchers have determined the entropy for the approximately 700 non-ANSI Unicode characters to be 9.5 bits per symbol, and estimate that each doubling of the number of symbols increases the entropy per symbol by one bit (NeoSmart, 2006). A relatively new form of brute force attack on compromised password hashes uses rainbow tables. Rainbow tables are pre-computed lookup tables of specific hash sets indexed by all possible passwords within pre-defined search space and length parameters. If the attacker can acquire the password hashes of authorized users, it is relatively easy to compare them to those in the rainbow table and drill down to the original password. There are several impediments to this attack, however. Obtaining even the relatively weak Microsoft LM hash is difficult for the “man in the middle” on modern, switched networks, and, even if hashes are known, the rainbow table for the LM hash computed from the alphanumeric keyspace and the 14 symbols on top of a US English keyboard takes about 17 GB of storage (Johansson, 2005). The Rainbow Crack LM configuration #6 produces a rainbow table capable of cracking 14-character LM passwords. With a keyspace of $2^{43}$, and a success probability of 99.9%, the table is 64 GB, and takes years to


generate with a PC-class computer. Rainbow Crack uses Oechslin’s time-memory trade-off technique, but it is not practical when applied to stronger hash algorithms, salted hashes, or to LM or NTLM “challenge-response” style hashes (Project Rainbow Crack, 2006). For LM passwords beyond 14 characters, rainbow tables become extremely unwieldy to generate, as the required table size and storage requirements increase exponentially. These are enormous hurdles for casual attackers, but attacks are possible for those with vast resources. There are sophisticated techniques, such as Markov chain algorithms, that substantially reduce the amount of intermediate storage required, and attackers with overwhelming expertise, computing power, and storage can no doubt pre-compute permutations beyond 14 characters. The threat posed by the dictionary attack is another justification for the use of long passwords. Cox argues that a naïve password, such as ‘david’, although five characters chosen from an alphabet of twenty-six, with $26^5$, or $1.19 \times 10^7$, possibilities, is actually very vulnerable to simple dictionary attacks because common names appear in short 6,500-word dictionaries, yielding an effective search space of only $6.5 \times 10^3$ (Cox, 1998). Because attackers can exploit knowledge of cultural biases and personal contexts, passwords remain preferred targets of dictionary attacks if they contain fewer bits of entropy than the cryptographic key of the underlying algorithm being used. For example, even the obsolescent Data Encryption Standard (DES) is considered to have about 55 bits (~$4 \times 10^{16}$) of entropy and the IDEA, MD5, and SHA-1 algorithms are considered to have about 128 bits (~$3.5 \times 10^{38}$). Because of dictionary attacks, the use of common English words and punctuation as passphrase elements is problematic because such distinct and common elements are guessable, and attackers can design cracking tools that use words rather than individual characters (Epps, 2006).
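Cox's point about ‘david’ can be made concrete with a toy pre-computed lookup. The sketch below hashes every word in a small list once, then recovers a password from its hash by table lookup, and compares the nominal search space (26^5) with the effective one (a 6,500-word dictionary) in bits. The five-word list, the function names, and the use of unsalted SHA-256 are illustrative assumptions; this is a plain dictionary table, not the chained rainbow-table construction used by Rainbow Crack.

```python
import hashlib
import math


def build_lookup(words):
    """Pre-compute an unsalted hash -> word table for a word list."""
    return {hashlib.sha256(w.encode()).hexdigest(): w for w in words}


def crack(hash_hex: str, table: dict):
    """Return the original word if its hash was pre-computed, else None."""
    return table.get(hash_hex)


# Hypothetical stand-in for a short dictionary of common names.
toy_dictionary = ["alice", "david", "maria", "peter", "susan"]
table = build_lookup(toy_dictionary)

stolen_hash = hashlib.sha256(b"david").hexdigest()
print("recovered:", crack(stolen_hash, table))

nominal_bits = math.log2(26 ** 5)   # five random lower-case letters
effective_bits = math.log2(6500)    # one word from a 6,500-word dictionary
print(f"nominal: {nominal_bits:.1f} bits, effective: {effective_bits:.1f} bits")
```

A salted hash would defeat this particular table, because each stored hash would require its own pre-computation.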
If the words, phrases, or components of a passphrase are findable in a dictionary, an attacker has a greater chance of cracking the password by an automated dictionary attack. However, the required effort is dramatically increased if enough words are used in the passphrase and if those words are pseudo-random or unexpected. Current password cracking tools are necessarily limited in terms of computation and storage, but they have displayed remarkable improvement in recent years. Still, the mathematical advantage always lies with the defender, especially those who use simple techniques, such as misspelling and bad grammar, to dramatically increase the strength of usable passphrases (Johansson, 2006). For ease of typing, password characters may be mostly lower-case, but capitals, numbers, punctuation, spaces, etc. should be included nonsensically to the


attacker, yet meaningfully to the user, to increase strength. For example, to increase the entropy of a 29-character passphrase, Johansson shows that adding substitutions to the words in the passphrase essentially removes them from the dictionary. This makes cracking much more difficult, because adding 10 symbols to the 26 letters gives a 36-character set with 5.17 bits of entropy per character, yielding a cumulative 150 bits of entropy (Johansson, 2005). In addition to the above methods of strengthening passphrases, Ellison et al. suggest a promising threshold cryptography scheme in which users strengthen their passwords by turning “personal entropy” from their own lives into multiple, simple passphrases that can be combined in various ways. This technique promises cumulative entropy that is the sum of the entropy of the individual passphrases. The minimum number of “shares” required to recover the secret must be protected with answers that have at least as much combined entropy as an adequate brute force work factor. This is a practical application of Shamir's secret sharing scheme, which in turn is based on Lagrange interpolation (Shamir, 1979). This study required all participants to generate and use passwords containing at least twenty characters. This is in part to overcome the industry standard weakness caused by the LM scheme, but also to determine if users could make and share secrets as strong as the hashing algorithm used. When a password’s entropy exceeds the entropy of its resultant hash, the cracker is reduced to mere guessing. A robust 128-bit hash is equivalent to 16 random bytes, so any password shorter than 17 characters cannot be stronger than its hash. This study imposed the 20-character minimum with the aim of obtaining passwords stronger than an MD5 or SHA-1 hash.
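The length thresholds discussed above can be reproduced with a small calculation: given a target in bits and an alphabet size, find the shortest fully random password whose entropy strictly exceeds the target. The function name `min_length` and the choice of alphabets are illustrative assumptions; the calculation treats every character as uniformly random, which real passphrases are not.

```python
import math


def min_length(target_bits: float, alphabet_size: int) -> int:
    """Shortest length n with n * log2(alphabet_size) > target_bits."""
    return math.floor(target_bits / math.log2(alphabet_size)) + 1


# Johansson's example: 29 characters drawn from a 36-symbol set.
print(f"29 chars x log2(36) = {29 * math.log2(36):.0f} bits")

# Characters needed to exceed a 128-bit hash, by alphabet.
for label, size in (("random bytes", 256), ("94-key ASCII", 94),
                    ("letters+digits", 36), ("lower-case", 26)):
    print(f"{label:15s} {min_length(128, size)} characters")
```

Under the uniform-randomness assumption, 17 random bytes or 20 characters from the 94-key set are the first lengths to exceed 128 bits, which matches the figures given in the text.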

4.4 Password Entropy Measurement Techniques

To compare the strength of passwords, it is desirable to establish a methodology to measure entropy empirically, even though, as discussed above, information security is both art and science (Bishop, 2003). Although the factors of password strength – alphabet, length, and overall entropy – are mathematically demonstrable using Shannon’s formula, researchers have suggested only two practical methods of measuring password strength in terms of entropy. Reinhold (1996) estimates the entropy of passphrases using the following simple formula:

Entropy = 15 × dictionary_words + 5.5 × chars × (1 − dictionary_words / total_words)

This estimate assigns 15 bits of entropy to each English dictionary word and 5.5 bits per character to non-dictionary words. Reinhold used this formula to analyze data from a small


survey of PGP passphrases, and Table 4.2 lists the resultant passphrase entropy for the responses. This reveals the tendency of most users to use passphrases with

Table 4.2: Reinhold’s Survey Results

| Statistic | Passphrase entropy |
|---|---|
| Minimum | 30 bits |
| First Quartile | 60 bits |
| Median | 75 bits |
| Third Quartile | 157 bits |
| Maximum | 473 bits |

(Source: Reinhold, 1996)

substantially less entropy than the standard PGP IDEA 128-bit session key. It should be noted that Reinhold considers this formula to be crude and preliminary, and that the respondents to this small survey were advocates of privacy and active users of the PGP cryptosystem. As an alternative, Engelfriet (2005) suggests the following formula to measure the security of PGP passphrases containing entropy enhancement:

PS = RW/8 + RC/20 + RL/28 + LC/107 × FF

in which:
• PS = passphrase security;
• FF = fudge factor (this is an attempt to include variables like nonsense phrases, odd spelling, punctuation, capitalization and numbers);
• RW = random words (not a nonsense phrase);
• RC = random characters;
• RL = random letters;
• OC = odd characters (not lower case letters);
• LC = total character count (letters in whole words, spaces ignored) (don't count if a totally random system is used.);
• F1 = 0.5 = nonsensical phrases hooked together;
• F2 = odd spelling/misspelling, punctuation and capitalization (This is a permutation dependent on the number of characters changed and the length of the words used. To simplify, use F2 = 4 × OC/LC);
• F3 = .09 = random numbers (exclude if F2 is used); and
• FF = 1 + F1 + F2 + F3 (Engelfriet, 2005).

Using this formula, he estimates that passphrases with PS values less than .35 are unacceptably weak for use in a robust PGP system, and can currently be broken in less than a year. Table 4.3 lists his examples of passphrases and the PS numbers associated with them.

Table 4.3: Engelfriet’s Passphrase Security Scores

| PS | Phrase | Description |
|---|---|---|
| .280 | There is a sucker born every minute | Average phrase |
| .761 | Ignorance is bliss. spgemxk Education cures ignorance. | Phrases with some random letters |
| .855 | betty was smoking tires in her peace of pipe organs and playing tuna fish | Nonsense phrase |
| 1.050 | A6:o@6 Ls+\` uGX%3y[k | A random bunch of characters |
| 1.125 | paper factors difference votes behind chain treaties never group | Random words |
| 1.340 | Web oF thE Trust is BrokEn cAn You Glue it Back ToGether? and give it xRays | Odd capitalization, punctuation, and nonsense |

(Source: Engelfriet, 2005)

The PS score is thus the sum of the various entropy enhancing components within a password, and this is consistent with Shannon’s determination of entropy as the probability distribution of the uncertainty within a system. The following section measures the entropy of this study’s participant passwords using both Reinhold’s and Engelfriet’s passphrase strength measurement formulae.
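Both estimators can be written as small functions, shown here as a sketch rather than as the scoring procedure actually used in this study. The inputs (word counts, character counts, and the nonsense and random-number flags) must be judged by the evaluator; they are not computed from the passphrase text. The Engelfriet formula is read exactly as printed, so FF multiplies only the LC/107 term; that reading, the handling of F3, and all function and parameter names are assumptions.

```python
def reinhold_entropy(dictionary_words: int, total_words: int, chars: int) -> float:
    """Reinhold (1996): 15 bits per dictionary word plus 5.5 bits per
    character, weighted by the share of non-dictionary words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 15 * dictionary_words + 5.5 * chars * (1 - dictionary_words / total_words)


def fudge_factor(nonsense: bool = False, odd_chars: int = 0,
                 letter_count: int = 0, random_numbers: bool = False) -> float:
    """FF = 1 + F1 + F2 + F3, with F2 simplified to 4 * OC / LC."""
    f1 = 0.5 if nonsense else 0.0
    f2 = 4 * odd_chars / letter_count if odd_chars and letter_count else 0.0
    f3 = 0.09 if random_numbers and f2 == 0 else 0.0  # F3 excluded if F2 is used
    return 1 + f1 + f2 + f3


def engelfriet_ps(rw: int = 0, rc: int = 0, rl: int = 0,
                  lc: int = 0, ff: float = 1.0) -> float:
    """PS = RW/8 + RC/20 + RL/28 + LC/107 x FF, read as printed."""
    return rw / 8 + rc / 20 + rl / 28 + lc / 107 * ff


# Six dictionary words, 30 characters in total: Reinhold counts words only.
print(reinhold_entropy(dictionary_words=6, total_words=6, chars=30))
# Nine random words, no other components.
print(engelfriet_ps(rw=9))
```

Under this reading, nine random words score 9/8 = 1.125, which agrees with the “Random words” row of Table 4.3; the other rows depend on evaluator judgments that the formula does not fix.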

4.5 Expert Testing

Based on the cryptanalysis in section 4.3 and the recommendations of security experts, the investigator established twenty characters as the minimum length of passwords allowed for use in this study. The choice of twenty characters is somewhat arbitrary, but it well exceeds the cracking capabilities of brute force, dictionary, and basic rainbow table attacks, many of which are limited to passwords within the LM 14-character cut-off point. In addition to this minimum length parameter, instructions given to participants also required them to enhance the entropy of their passwords in several ways. These enhancements included unusual punctuation and


capitalization, inclusion of numbers and symbols, and the elision of spaces between passphrase words. Five information security graduate students evaluated passwords generated by participants in the study in terms of cryptographic strength and plaintext entropy. These experts were able to view the passwords in plaintext and to assess relative strengths and weaknesses directly. They selected the eight strongest, the eight weakest passwords, and the eight passwords most cryptic in plaintext, as listed in Table 4.5, for further testing. In addition to the expert testing, Reinhold’s and Engelfriet’s formulae provided other means to evaluate the strength of the passwords selected by the experts as the weakest and strongest generated by participants in the study. Table 4.5 also lists the scores of the passwords according to entropy scores using Engelfriet’s and Reinhold’s formulae. The first category contains those examples selected as the weakest eight, the second category contains the eight that are most cryptic in plaintext, and the third category contains the strongest eight. There is some overlap between categories, such as Passwords 22 and 23, which are in the Strongest 8, although they are also cryptic acrostics. The experts had the advantage of viewing these passwords in plaintext for their assessments, and this made it relatively easy for them to compare the cryptographic strength of the passwords. All experts agreed that even Passwords 1 and 4, the weakest among all generated by study participants, would be difficult to crack, even using rainbow tables because of their length and high entropy. As displayed in this table, the PS column lists the scores according to Engelfriet’s Password Security index, the RE column lists the scores according to Reinhold’s estimation of entropy, and the SG column indicates the study treatment group of the participant who generated and used the password. 
Table 4.5: Entropy Scores of Selected Passwords

| No. | Category | Password | PS | RE | SG |
|---|---|---|---|---|---|
| 1 | Weakest 8 | 94PontiacGrandPrixSE | .69 | 57.5 | 2 |
| 2 | Weakest 8 | BUTCHERblockcantstop | .68 | 60.0 | 1 |
| 3 | Weakest 8 | 1505WestTharpeStreet. | .63 | 62.2 | 4 |
| 4 | Weakest 8 | FredfredfredwasDEAD! | .52 | 75.1 | 3 |
| 5 | Weakest 8 | ShoesOffTheOtherFoot | .62 | 75.9 | 3 |
| 6 | Weakest 8 | 130390THStNorthBergen | .86 | 77.7 | 1 |
| 7 | Weakest 8 | IneedtogototheBATHROOM | .77 | 82.2 | 3 |
| 8 | Weakest 8 | 801SW56TERR,Plantation,FL | .88 | 85.0 | 1 |
| 9 | 8 Most Cryptic | 1994Wrefilwe!waaka19 | .99 | 110.0 | 1 |
| 10 | 8 Most Cryptic | n3hbwesittiobingenaJ | .97 | 110.0 | 2 |
| 11 | 8 Most Cryptic | ILT#MGITa3DBSIaBWASG | 1.03 | 110.0 | 1 |
| 12 | 8 Most Cryptic | Mmlmlnoeitwww#1Ilhsm | 1.13 | 110.0 | 3 |
| 13 | 8 Most Cryptic | MyLoBaCa2SuMeThWoWee | 1.51 | 110.3 | 4 |
| 14 | 8 Most Cryptic | 7tsd3fru3fubar@063084 | 1.34 | 115.5 | 2 |
| 15 | 8 Most Cryptic | 1iptLuiDutiptrmB2iaatbz! | 1.44 | 132.0 | 2 |
| 16 | 8 Most Cryptic | HYETTAS?ynu...tnsnihniwyta | 1.56 | 137.5 | 2 |
| 17 | Strongest 8 | TakeiThomechewiTiT’sdelicious | .96 | 92.9 | 4 |
| 18 | Strongest 8 | OpinionsRlikeASSHOLES-everybodyhas1 | 1.12 | 96.2 | 3 |
| 19 | Strongest 8 | 14straightBraves>12,14sinceSEMINOLES | 1.58 | 109.5 | 3 |
| 20 | Strongest 8 | Meetmeatourfavoritespot2672blue#147 | 1.26 | 128.3 | 4 |
| 21 | Strongest 8 | iloveasianfoodfromFAREASTCUISINE4073443502 | 1.95 | 144.4 | 3 |
| 22 | Strongest 8 | 1PUMPKINayanna@MAMlamar#KIKIvere | 2.04 | 147.0 | 3 |
| 23 | Strongest 8 | tml,tmltmwicsidc,isfycttsfm | 1.64 | 148.5 | 2 |
| 24 | Strongest 8 | thePurPlePorPoiselosthis19 | | | |

These two entropy-measuring formulae, previously discussed in Section 4.4, provided the method of deriving the comparative scores listed, and the passwords are listed by group in ascending order of RE score. Although Engelfriet’s PS formula allows for more sophisticated input, and perhaps for greater precision, the RE score has the advantage of being expressed directly in bits of entropy, and, as a consequence, a password with an RE score over 128 is theoretically stronger than its corresponding 128-bit hash. Table 4.5 clearly shows three well-known aspects of password strength in terms of entropy. First, the weakest passwords are generally short, while the strongest are long. Second, passwords that are cryptic in plaintext are stronger, in terms of entropy bits per symbol, than passphrases using dictionary words. Third, passwords containing no natural language sequences and odd characters contain the most entropy per character. It is also significant to note that the weakest passwords generated for this study had entropies of at least 57.5 bits, and even this compares favorably to the 52-bit maximum of the truly random 8-character password discussed


in Section 4.3. This is in spite of the increased relative input accuracy and memorability of these passwords relative to those of the random 8-character password. The PS and RE scores of the above selected passwords demonstrate a high degree of correlation, although with notable exceptions. As examples, Passwords 4 and 5 achieved a low PS score because of common or repeated words, and Password 19 scored a low RE because it uses only four common words and eight random characters. Such discrepancies between the two entropy measurement methods increase when measuring longer passwords. As examples, the random capitalization in Passwords 13, 15, 19, and 22 significantly boosts their PS scores. Among the Weakest 8 passwords, those generated using the Old Address scheme, such as Passwords 3, 6 and 8, were relatively weak for their length and susceptible to social engineering attacks. The same is true of Password 1, which used a very commonly used car model theme that is highly susceptible to social engineering. The weakest of the passwords generated in this study were also the minimum 20-character length, except Passwords 3 and 6, which contained 21 characters. The passwords included in the 8 Most Cryptic category were all generated using versions of the Acrostic scheme. As discussed in Section 4.5, passwords created in this way have a high degree of plaintext entropy, making them difficult to crack, even when partially compromised. As suggested by this analysis, they also are relatively strong because of the unexpected relationship of adjacent characters. The strongest passwords selected by the researchers include some of the longest used in the study, ranging from 35 to 42 characters. A significant exception is Password 23, an acrostic containing 27 characters. Passwords 13 and 24 illustrate the advantage of unexpected capitalization and other nonsense, and Password 21 shows the strengthening effect of sheer length. 
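The agreement between the two measures can be checked with a Pearson correlation over the 23 passwords in Table 4.5 for which both scores are listed. The following is a minimal sketch rather than the investigator's procedure; the function and variable names are illustrative, and Password 24 is omitted because its scores do not appear in the table as reproduced here.

```python
import math

# PS and RE scores for Passwords 1-23, in the order given in Table 4.5
PS = [.69, .68, .63, .52, .62, .86, .77, .88, .99, .97, 1.03, 1.13,
      1.51, 1.34, 1.44, 1.56, .96, 1.12, 1.58, 1.26, 1.95, 2.04, 1.64]
RE = [57.5, 60.0, 62.2, 75.1, 75.9, 77.7, 82.2, 85.0, 110.0, 110.0,
      110.0, 110.0, 110.3, 115.5, 132.0, 137.5, 92.9, 96.2, 109.5,
      128.3, 144.4, 147.0, 148.5]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(PS, RE)
print(f"Pearson r between PS and RE: {r:.2f}")
```

A coefficient close to 1 would support the claim of high correlation, while exceptions such as Passwords 4, 5, and 19 pull it below 1.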
This analysis suggests a slight effect of this study’s treatments on password strength. Although one participant from group 4 produced one of the Weakest 8 passwords in the study, Password 3, and participants from group 1 produced the highly cryptic and strong Passwords 9 and 11, none of the Strongest 8 Passwords were from group 1. Participants from groups 2 and 4 produced five of the 8 Most Cryptic, and this may indicate a positive effect of the provision of example passwords during the generation stage. In addition, seven of the Strongest 8 were from groups 3 and 4, and this suggests that the requirement to enter the new password five times may have led to stronger passwords.


4.6 Chapter Summary

This chapter established a cryptographic strength threshold that all passwords under investigation had to exceed. Dictated by the current threat environment, this threshold provides high resistance to contemporary brute force, dictionary, and rainbow table attacks. Section 4.3 examined the metric of cryptographic strength known as entropy and methods of increasing it in passwords, without unduly increasing recall failure or input error. Section 4.4 introduced Reinhold’s and Engelfriet’s empirical methods of measuring password strength. Section 4.5 presented findings from expert testing on selected example passwords generated by participants during the study. This chapter argued that password length, when combined with ease of recall and input, shows the greatest promise for increasing the robustness of passwords in the projected threat environment without unduly reducing their usability in terms of recall and input accuracy. Clearly, passwords gain strength exponentially with length, but naïve passphrases suffer from the low entropy caused by the predictable patterns within natural language. These patterns leave naïve passphrases relatively vulnerable, so this chapter also included a discussion of practical techniques to bolster the entropy of passphrases without unduly affecting their usability and memorability. This chapter concluded with an analysis of actual passwords generated and used by participants. This analysis pointed to the weaknesses of short passwords and those not using entropy-enhancing techniques. It also suggested that this study’s interventions during the password generation stage had a slight positive effect on resultant password entropy.
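The exponential relationship between length and strength can be made concrete for uniformly random passwords, whose entropy in bits is the length multiplied by log2 of the alphabet size. The sketch below assumes a 95-character printable ASCII alphabet, which is an assumption made here for illustration; under that assumption a random 8-character password carries between 52 and 53 bits, in line with the 52-bit figure cited earlier in this chapter. The calculation does not apply to natural-language passphrases, whose characters are far from uniformly random.

```python
import math


def random_password_bits(length, alphabet_size=95):
    """Entropy in bits of a uniformly random password.

    alphabet_size=95 assumes printable ASCII; this is an illustrative default.
    """
    return length * math.log2(alphabet_size)


# Compare a typical 8-character password, the LM 14-character cut-off,
# and the 20-character minimum used in this study
for n in (8, 14, 20):
    print(n, "characters:", round(random_password_bits(n), 1), "bits")
```

Each additional random character adds log2(95), or roughly 6.6 bits, which multiplies the number of candidate passwords by 95.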


CHAPTER 5 PASSWORD-GENERATION SCHEME TESTING

5.1 Introduction

This study was based on the assumption that robust passwords need not place an unacceptable cognitive load on users in terms of recall and accurate input. Ever since Porter foresaw the advantage of the long passphrase, security conscious users increasingly have preferred long, memorable passphrases to short, cryptic passwords. Researchers, system administrators, and security experts have suggested various mnemonic schemes to assist users in the generation of robust passphrases to protect IT resources and identities, and this study asked the question of what available password-generation schemes best combined the two desirable and generally incommensurable features of security and usability. To answer this research question, this study identified five prominent password-generation schemes currently in use, and tested them using expert analysis, log analysis, think-aloud interviews, and surveys. This chapter continues the cryptographic strength analysis begun in Chapter 4, and then focuses on the usability of the schemes used to generate those passwords by reporting the findings of these four methods of data collection. Section 5.2 revisits the quality of data issues introduced in Chapter 3 that are relevant to the remaining chapters. Section 5.3 discusses the findings from expert testing. Section 5.4 reports the findings from server log analysis. Section 5.5 includes participant input obtained during think-aloud interviews. Section 5.6 presents self-report data collected by online surveys during the seven-week course of the study.

5.2 Quality of Data Issues

This exploratory study tested the effects of different types of assistance given to participants who were required to generate, recall, and input passwords that were much stronger than those they currently used. This exclusive focus on long password usability is unprecedented and driven by advances in password cracking technology. Very few previous studies of password performance focused on long passwords or passphrases. This section addresses the specific quality of data issues germane to the remaining chapters of this study.


The investigator assumed that participants could generate, input, and recall long text-based passwords if adequate guidance and practice were provided. This study tested the effectiveness of five widely published, but untested, textual password-generation schemes. It also allowed each participant to use an alternate scheme, along with a description of it and rationale for its use, with the aim of discovering new ways to assist users. Because computers and IT systems are designed to manipulate, store, and retrieve data (e.g. memometric authentication secrets) deterministically, this study assumed that the bulk of the password problem is situated at the human end of the HCISec spectrum, and that meeting the keyboard-based password challenge requires training of, and testing with, actual users. The research design of this study specifically tested the effect of assistance in the form of password-generation schemes, practical examples, and multiple reentries during the generation process on the usability of robust passwords. The investigator used random selection techniques to assign the participants to the four treatment groups to ensure that group assignment had no effect on observed findings. This study used five complementary methods to uncover facets of participants’ experiences and personal preferences of contemporary password management. It used expert testing, cryptanalysis, pilot testing, and think-aloud testing to establish password strength criteria and to develop online instruments sequentially. It used eight access-controlled surveys to discover emerging perspectives and to expand data collection longitudinally. Finally, the multiple methods triangulated data drawn from actual password performance as recorded in server logs with self-report data of participant password performance, while providing a forum for participant voices regarding robust password strategy, preference, and usability. 
As discussed in Section 3.7, errors arising from historical issues, maturation, and experimental mortality did not significantly threaten internal validity because testing lasted less than four months. Further, maturation, mortality, regression to the mean, retesting, and instrumentation errors were minimal because participants were not required to engage in the process more than once. Specific testing of three primary components of password usability (ease of generation, ease of input, and memorability) bolstered content validity. The initial test of ease of password generation occurred during the think-aloud sessions, held in late September 2006 among 19 respondents, and in the initial survey, issued in early October 2006, which requested opinions from all participants regarding the usability of the schemes. In addition, subsequent weekly surveys


solicited participants’ opinions about ease of input and memorability to provide a longitudinal perspective. Cross-comparison of related data collected using five different methods determined if they were performing complementary assessments and increased criterion validity. This study obtained a complementary assessment of password usability by comparing quantitative measures of password input accuracy and memorability, drawn from webserver logs and database entries from participants during the study, with qualitative participant perceptions expressed in surveys. The investigator performed this assessment at the end of the data collection stage in early December 2006. To improve construct validity, a pilot test in late September 2006 checked the reliability of the scales used in surveys for internal consistency, and the investigator refined the scales to best capture the range of participant response. During the expert testing phase in mid-September, 2006, outside researchers and programmers with expertise in password strength and usability evaluated the study instruments and the passwords produced by using the schemes under test to establish face validity. The investigator then refined the instruments, and pilot-tested them in late September 2006 to improve the questions, format, and scales used. The investigator increased internal validity by comparing data provided by different tests within the same method (e.g. multiple user tests of the same password-generation scheme) and looking for consistent results. The investigator interviewed nineteen key informants in think-aloud sessions in late September 2006, and solicited feedback from participants as member checks to ask whether conclusions were correct via the initial survey, weekly short surveys, and the final survey in early December 2006. 
This exploratory study increased external validity by strictly limiting the generalization of findings to the parameters of the data in recognition of the specifics of the participant groups, which were purposively chosen from students at the Florida State University College of Information. The investigator randomly assigned participants to treatment groups, and multiple efforts were made to reengage non-responders with a sequence of status notifications via personal contact and email correspondence. The investigator took multiple measures to ensure data reliability. He pre-tested all instruments for clarity of language and relevance to the topic in mid-September, 2006, recorded all participant comments during the think-aloud sessions for later verification, and performed all


assessments of the usability of password-generation schemes during the generation stage. He repeated some questions from the initial survey in late September, 2006 on the final survey in early December, 2006 (i.e. the test-retest method) to make the same measurement twice, checked the answers for variation, and checked the scales used in surveys for internal consistency. He measured the input accuracy of generated passwords in terms of participant perception, automated logging, and participant entry into an online database to check for inconsistencies. All questions posed to participants were relevant to their direct experiences during the stages of password generation, recall, and input over the course of the study, and all statistics used to compare groups were appropriate to the units of measurement. In addition, the web-based format and delivery of this study and its instruments (except consent forms) increased its replicability, regardless of geographical limitations. The investigator assured consistency of data through comparison of findings derived from different methods, and checked for agreement about the usability (in terms of ease of generation, ease of input, and remembrance) and security (in terms of cryptographic strength) of password-generation schemes and their resultant passwords. In addition, he checked the reliability of the scales used in surveys for internal consistency across the pilot test (mid-September, 2006), initial survey (late September, 2006), final survey (early December, 2006), and a survey of non-respondents and dropouts (early December, 2006). He used multiple methods of usability testing on the assumption that they collectively provide more consistency in results. It was assumed that quantitative authentication logs would produce results consistent with results from surveys and think-aloud procedures. 
As discussed in Section 5.3, server authentication logs required some interpretation to overcome the ambiguities introduced by the .htaccess protocol. Following previous studies, this study assumed that the user survey is the best means of assessing password usability (Zviran & Haga, 1993; Yan et al., 2004), but this study also used expert testing and log analysis to acquire data from both user and expert perspectives. Cryptanalysis provided further breadth to the data by establishing the context of password strength as a cryptographic threshold for strong passwords that participants were required to manage, and by providing means to measure password strength. For a clear understanding of the threat to memometric authentication, the cryptographic analysis in Chapter 4 indicated the minimum requirements of robust passwords if they are to endure sophisticated contemporary and projected attacks. This study required that all


passwords surpass this somewhat arbitrary cryptographic threshold, and instruments and authentication mechanisms used in this study enforced this requirement. Available resources limited this exploratory study. This study used Florida State University College of Information students as participants purposively because they represented the population of networked computer users in many ways, because they were familiar with networked computer and password usage, and because they were easily contacted via email and personally in the classroom. Their exclusive use, of course, limited the generalizability of findings to the general public because the participants were undoubtedly atypical of the population of secure networked computer users in unforeseen ways. The small number of subjects and high dropout rate encountered in this exploratory study also limited the statistical significance of findings in the population at large. The duration of the study also imposed a limitation, since the very long-term memorability of passwords can only be measured over a very long time. Although this study introduced participants to five password-generation schemes and allowed them to use alternative schemes of their choice, it limited testing to the two password-generation advice interventions considered most important by the investigator: (i) password examples provided and (ii) password reentries required. This consideration stemmed from research that uncovered security vulnerabilities resulting from inadequate user involvement in the security process (Adams & Sasse, 1999; Sasse et al., 2001), and from research that demonstrated the recall gains resulting from reuse of TBR items (Tulving, 1982). This study experienced several unforeseen external difficulties. First, the use of email to communicate with and remind participants proved problematic because of aggressive spam filtering enforced by IT policy within the school. 
Many participants did not reliably receive messages from the investigator and dropped out of the study. Although internet technology proved useful for traffic and authentication failure analysis, communication was frequently disrupted by FSU IT policy and practice. Second, because the passwords in this study did not protect valuable resources or accounts, participants undoubtedly gave them lower priority and attention than active passwords, and this could be overcome by investigating active passwords. Third, in spite of raffle rewards offered for completion of the study, many participants lost interest because of other priorities and lack of time, although only one formally requested to be dropped.


5.3 Expert Analysis of Password-generation Schemes

The five password-generation schemes presented to participants in this study are widely recommended by security experts, researchers, and administrators, but they are by no means exhaustive. Because of the secrecy prevalent in the information security industry, there may be effective password-generation schemes in use that are not publicly disclosed. With the goal of exploring the range of available schemes, participants in this study were free to use alternative schemes, and, although many of these alternatives were merely combinations of the five suggested schemes, some distinct alternatives emerged. Table 5.1 presents the findings of expert analysis of the five schemes in terms of extensibility, plaintext entropy, input ease, and memorability.

Table 5.1: Analysis of Five Password-generation Schemes

| Feature | Acrostic | Address | Confession | Memory | Nonsense |
|---|---|---|---|---|---|
| Extensibility – Alphabet | High | High | Moderate | Moderate | Moderate |
| Extensibility – Length | Moderate | High | Low | Low | Low |
| Plaintext Entropy | High | Low | Low | Low | Moderate |
| Input Ease | Low | Moderate | High | High | Moderate |
| Memorability | Moderate | High | High | Moderate | High |

Alphabet extensibility is the degree to which numbers, symbols, Unicode characters, etc. can be included meaningfully in generated passwords. Length extensibility refers to the relative ease of meaningfully extending the password beyond the 20-character threshold used in this study. Plaintext entropy is the ability to obfuscate the user-centric semantic content of the password, even if viewed by an attacker in plaintext before the hashing and salting processes. This is a desirable property for passwords that are archived on physical media such as paper because it obscures the algorithm used to generate it and allows for further plaintext obfuscation on the part of the user. Input ease is a general measure of the reliability of password entry, with consideration of the potential for errors introduced by incorporating unusual keys, multiple key combinations, and other patterns not common in natural language typing. Memorability is the ability of the user to correctly recall and input the precise formulation of the password, a feature especially useful for infrequently used passwords.


Passwords generated using the Old Address scheme are highly extensible in terms of alphabet, since they contain punctuation and numbers that can be easily altered, but the numbers and punctuation characters in them can be relatively difficult to input. They are very easy to remember, but are highly vulnerable to social engineering attacks and very obvious in plaintext. Passwords from the Unexpected Nonsense scheme are moderate in most categories, but they are easy to remember because they use well-known phrases, which makes them somewhat difficult to extend and, without entropy enhancement, vulnerable to social engineering attacks. The Acrostic scheme makes passwords that are highly extensible in terms of alphabet substitutions and that are highly cryptic even in plaintext, but that are relatively difficult to input because users must regenerate them from their source phrase on the fly. Passwords from the Old Memory scheme are typically easy phrases to input, but can be difficult to extend because they rely on a specific event in the past. They are also typically very obvious in plaintext. Finally, the Confession/Embarrassment scheme tends to produce passwords that are easy to input and remember, but difficult to extend and obvious in plaintext. This study suggested the five schemes discussed above, but it also allowed participants to use alternative password-generation schemes, and 39% of participants reported using their own scheme to generate their passwords. Despite the 20-character threshold, there was great variance in the strength of passwords generated using those alternative schemes. 
Examples of alternative schemes that yielded relatively weak passwords were:
• an old street address combined with the user’s year of birth;
• mother’s name with an added symbol;
• a basic password, favorite numbers, and the user’s middle name;
• common words modified with special characters and numbers;
• a short phrase or a medium size word, with some numbers or characters substituted for letters (e.g. 3 for e, $ for s);
• symbols, capital letters and substituting some words such as “are” and “one” to “R” and “1”;
• a nickname plus favorite numbers;
• the names of family members; and
• the birthplace of the user.


These are susceptible to basic social engineering or use common character substitution methods that are trivial to attack.

Examples of other user-selected schemes that yielded stronger passwords were:
• comic book nonsense;
• combinations of many old passwords;
• upper- and lower-case sections with appended symbols;
• an aspiration from the bartender trade;
• a mix of old girlfriend names, phone numbers, and favorite systems;
• a conscious thought at the time of password generation;
• a television commercial;
• a phrase of upper- and lower-case letters with intermittent symbols; and
• parent’s native tongue, numbers, uppercase letters, and English words.

The user-defined password-generation schemes that yielded the strongest passwords included:
• a hated video game character with the numeric code of rival soccer country;
• an unusual phrase heard in a recurring dream;
• an inside joke shared with a deceased grandfather;
• a combination of acrostic and short familiar phrases; and
• the names and nicknames of familiar people plus the first two symbols on an acrostic of the first letters from a combination of two numbered quotes, with meaningful words capitalized.

Despite the relative distinctions drawn here, it is important to note that all passwords generated and used in this study were very strong by industry standards. This is because all passwords had to exceed the cryptographic strength threshold discussed in Chapter 4. As determined in Section 4.5, even the password selected by experts as the weakest of all contained more entropy than the totally random 8-character password. All participants seemed to understand the need to obfuscate the password, and most used personally meaningful elements to do so. An unusual scheme suggested by one participant was to use the left hand only for the password (viz. fredfredfredwasdead!) and leave the right hand free to operate the mouse.
Although this makes prior or subsequent mouse interaction more efficient, it limits the character set, which resulted in one of the Weakest 8 passwords (see Table 4.5), and could make key combination input (e.g. ‘!’) more difficult for some users.
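The weakness of common character substitutions, noted earlier in this section, can be illustrated by enumerating every variant an attacker would need to try for a given word. The sketch below is only an illustration: the substitutions "3 for e" and "$ for s" come from the participant schemes described above, while the other table entries and all names in the code are assumptions added here.

```python
from itertools import product

# "3 for e" and "$ for s" come from the text; the other entries are assumed
SUBSTITUTIONS = {"e": "3", "s": "$", "a": "@", "o": "0"}


def substitution_variants(word):
    """Return every spelling of word with each substitutable letter kept or swapped."""
    choices = [
        (ch, SUBSTITUTIONS[ch]) if ch in SUBSTITUTIONS else (ch,)
        for ch in word.lower()
    ]
    return {"".join(combo) for combo in product(*choices)}


variants = substitution_variants("password")
print(len(variants))  # 16
```

Each substitutable letter only doubles the number of candidates, so a substitution adds at most one bit of entropy per affected letter, which a dictionary attack with substitution rules covers easily.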


Chapter 4 reported the findings of an expert analysis of the cryptographic strength of actual passwords generated and used by study participants in terms of industry standards. As listed in Table 4.5, information security experts categorized study passwords as the Weakest 8, the 8 Most Cryptic, and the Strongest 8. Table 5.2 displays those same 24 passwords, along with the generation scheme used, the passwords’ Reinhold Entropy (RE) scores, and the failure rate experienced by participants as they attempted to authenticate themselves with them. Predictably, the Acrostic scheme

Table 5.2: Selected Passwords by Generation Scheme

Scheme Password Category RE Failures Attempts % n3hbwesittiobingenaJ 8 Most Cryptic 110 4 9 44 Mmlmlnoeitwww#1Ilhsm 8 Most Cryptic 110 2 2 100 1994Wrefilwe!waaka19 8 Most Cryptic 110 - - - ILT#MGITa3DBSIaBWASG 8 Most Cryptic 110 - - - MyLoBaCa2SuMeThWoWee 8 Most Cryptic 110.3 23 25 92 7tsd3fru3fubar@063084 8 Most Cryptic 115.5 - - - Acrostic 1iptLuiDutiptrmB2iaatbz! 8 Most Cryptic 132 0 1 0 HYETTAS?ynu...tnsnihniwyta 8 Most Cryptic 137.5 - - - tml,tmltmwicsidc,isfycttsfm Strongest 8 148.5 1 1 100 30 38 79 1505WestTharpesStreet. Weakest 8 62.2 1 8 12.5 130390THStNorthBergen Weakest 8 77.7 0 6 0 801SW56TERR,Plantation,FL Weakest 8

Address Address 85 1 3 33 2 17 12 BUTCHERblockcantstop Weakest 8 60 - - - IneedtogototheBATHROOM Weakest 8 82 4 8 50 TakeiThomechewiTiT'sdelicious Strongest 8 92.9 2 9 22 iloveasianfoodfromFAREASTCUISINE4073443502 Strongest 8 Confession Confession 144.4 4 5 80 10 22 45 94PontiacGrandPrixSE Weakest 8 57.5 - - - 14sraightBraves>12,14sinceSEMINOLES Strongest 8 109.5 - - - 1PUMPKINayanna@MANlamar#KIKIvere Strongest 8 147 0 2 0 Memory 0 2 0 ShoesOffTheOtherFoot Weakest 8 75.9 - - - OpinionsRlikeASSHOLES-everybodyhas1 Strongest 8 96.2 0 1 0

Nonsense 0 1 0 fredfredfredwasDEAD! Weakest 8 75.1 1 1 100 Meetmeatourfavoritespot2672blue#147 Strongest 8 128.3 2 6 33

Own thePurPlePorPoiselosthis19

106

produced the most cryptic passwords, and those passwords were relatively strong for their length. 44% of participants using these acrostic passwords dropped out of the study early, and the rest experienced the highest failure rate of 79%. The Old Address scheme produced three of the Weakest 8 passwords, but participants using them experienced a low 12% failure rate. The Confession scheme produced two of the Weakest 8 and two of the Strongest 8 with a 45% failure rate. The Old Memory scheme produced one of the Weakest 8 and two of the Strongest 8, but a high dropout rate among participants led to an inconclusive 0% failure rate. The Unexpected Nonsense scheme produced one of the Weakest 8 and one of the Strongest 8, but a high dropout rate among participants also led to an inconclusive 0% failure rate. Participants using their own schemes produced one of the Weakest 8 and two of the Strongest 8 with a relatively high 71% failure rate. 65 participants completed Step 3 and generated passwords for this study, and the investigator categorized them as reported in Table 5.3. Participants generally favored the

Table 5.3: Study Passwords by Scheme

Scheme      Number of Passwords
Acrostic    7
Address     14
Confession  14
Memory      14
Nonsense    5
Own         11
Total       65

Address, Confession, Memory, and their own schemes over the Acrostic and Nonsense schemes. The analysis in this section has shown that the Acrostic and the participants’ own schemes produced both the strongest passwords and the highest failure rates. In contrast, the Old Address scheme produced weak, but memorable, passwords. Participants favored the Old Address, Confession, Old Memory, and their own schemes over the Acrostic and Unexpected Nonsense schemes. Additionally, this analysis found that participants using their own schemes generated passwords of widely varying strength levels, although all exceeded the cryptographic strength threshold established for this study. The remainder of this chapter reports the usability of these schemes in terms of input success and inferred memorability of the passwords that participants generated using them.

5.4 Log Analysis of Scheme Usability

The instruments, surveys, and authentication mechanisms used in this study were all web-based and hosted on a web server located off-campus. The server ran Ubuntu Linux 6.06, Linux kernel 2.6.16-25, and the Apache web server 2.06. The survey application was the Unit Command Climate Assessment Survey System (UCCASS v.1.8.1), open-source software using Personal Home Page (PHP) scripting and the MySQL 5.02 open-source database. The investigator configured the Linux operating system, the Apache web server, the UCCASS application, and the MySQL database to generate copious logs of all participant interaction with the Apache web server, its .htaccess authentication mechanism, and the UCCASS online survey application. This section presents a quantitative perspective on scheme usability through analysis of the web-based activity of all participants as they performed the eleven steps of this study. In addition to authentication logs, which indicate password input success and failure in binary terms, the online surveys required participants to type their passwords into a text box during each step of the study. This text box revealed the password in plaintext rather than the “marching dots” that the .htaccess protocol and the user’s browser use to obfuscate the password during input. Section 5.6 reports the results of these plaintext inputs of the passwords.

Although the systems and applications used in this study generated over 19.7 MB of logs over the course of the study, the Apache server logs proved particularly germane to this analysis since they directly indicated the failure rates of the participant authentication attempts. Even after the deletion of irrelevant Apache log entries, most of which represented attacks on the server, 11,246 entries remained for analysis. The investigator parsed these logs looking for authentication failures on the part of study participants.
In the following examples drawn from the logs, the username of the investigator (viz. pth03) replaces the actual username of the participant. The first component of each log entry is the IP address from which the GET request originated, the second is a hyphen in the client identity field (Apache does not write the password itself to the access log), and the third is either a hyphen or the username entered into the dialog box of the user’s browser. Within the square brackets are the date, time, and time zone of the server. In a successful authentication exchange, as listed below, the first GET request elicits the authentication dialog box, the second includes the obfuscated password and the username, and the third begins the serving of the access-restricted components of the survey (viz. background.gif):

69.252.189.213 - - [04/Nov/2006:20:46:02 -0500] "GET /UCCASSv1.8.1/survey.php?sid=32 HTTP/1.1" 401 490 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
69.252.189.213 - pth03 [04/Nov/2006:20:46:03 -0500] "GET /UCCASSv1.8.1/survey.php?sid=32 HTTP/1.1" 200 5151 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
69.252.189.213 - pth03 [04/Nov/2006:20:46:04 -0500] "GET /UCCASSv1.8.1/templates/Default/images/background.gif HTTP/1.1" 304 - "http://68.35.244.224/UCCASSv1.8.1/survey.php?sid=32" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
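The field layout described above corresponds to Apache's combined log format, so a single entry can be split into named fields with a regular expression. The following Python sketch is illustrative only and is not the investigator's actual tooling; the field names and the parse_entry helper are assumptions introduced here.

```python
import re
from datetime import datetime

# Combined log format: host, identity, user, [time], "request",
# status, size, "referer", "user-agent". Field names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Timestamps look like 04/Nov/2006:20:46:03 -0500
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


sample = ('69.252.189.213 - pth03 [04/Nov/2006:20:46:03 -0500] '
          '"GET /UCCASSv1.8.1/survey.php?sid=32 HTTP/1.1" 200 5151 "-" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"')
entry = parse_entry(sample)
print(entry["host"], entry["user"], entry["status"], entry["time"].isoformat())
```

A status of 401 with a username present marks a rejected credential, while a status of 200 with a username marks an accepted one, which is the distinction the analysis relies on.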

In contrast to this successful exchange, the following example represents two unsuccessful authentication attempts followed by a successful one:

68.84.18.219 - - [24/Oct/2006:12:38:10 -0400] "GET /UCCASSv1.8.1/survey.php?sid=30 HTTP/1.1" 401 490 "http://68.35.244.224/steps.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
68.84.18.219 - [email protected] [24/Oct/2006:12:38:40 -0400] "GET /UCCASSv1.8.1/survey.php?sid=30 HTTP/1.1" 401 490 "http://68.35.244.224/steps.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
68.84.18.219 - [email protected] [24/Oct/2006:12:39:01 -0400] "GET /UCCASSv1.8.1/survey.php?sid=30 HTTP/1.1" 401 490 "http://68.35.244.224/steps.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
68.84.18.219 - pth03 [24/Oct/2006:12:39:27 -0400] "GET /UCCASSv1.8.1/survey.php?sid=30 HTTP/1.1" 200 4546 "http://68.35.244.224/steps.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"

In this example, the password entered by the user was incorrect, and Apache served the authentication dialog box again. This is a clear distinction for log analysis, but the use of the .htaccess protocol introduced a problem because it allows the user’s browser to “remember” or “manage” the password. The following is an example of a participant’s use of browser password management:

69.252.189.213 - - [04/Nov/2006:20:46:02 -0500] "GET /UCCASSv1.8.1/survey.php?sid=32 HTTP/1.1" 401 490 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
69.252.189.213 - pth03 [04/Nov/2006:20:46:03 -0500] "GET /UCCASSv1.8.1/survey.php?sid=32 HTTP/1.1" 200 5151 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
69.252.189.213 - pth03 [04/Nov/2006:20:46:04 -0500] "GET /UCCASSv1.8.1/templates/Default/images/background.gif HTTP/1.1" 304 - "http://68.35.244.224/UCCASSv1.8.1/survey.php?sid=32" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

At first blush, this appears to be a successful authentication sequence, and it is, but the time difference between the first and second GET requests is only one second. Since this study required participants to use at least twenty characters in their passwords, it is reasonable to assume that one second is insufficient for successful input. Because of this phenomenon, each survey asked participants if they were using password management, and, if so, to disable it. In the above example, it is clear that the participant could not have entered the password, but the following example is more problematic:

72.236.184.250 - - [10/Nov/2006:18:43:52 -0500] "GET /UCCASSv1.8.1/survey.php?sid=33 HTTP/1.1" 401 490 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
72.236.184.250 - pth03 [10/Nov/2006:18:43:58 -0500] "GET /UCCASSv1.8.1/survey.php?sid=33 HTTP/1.1" 200 5151 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
72.236.184.250 - pth03 [10/Nov/2006:18:43:58 -0500] "GET /UCCASSv1.8.1/templates/Default/style.css HTTP/1.1" 200 1930 "http://68.35.244.224/UCCASSv1.8.1/survey.php?sid=33" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"

Here the time difference between the first and second entries is six seconds, and a fast and accurate typist might have entered a long password successfully in that time. For consistency, the following log analysis treated successful authentication sequences with gaps of seven seconds or less as ‘Automated’ and did not count them as actual participant password inputs.

Table 5.4 presents the authentication failure rates found through server log analysis. In this table a “+” sign indicates a successful authentication attempt, an “A” indicates a successful automated authentication using the browser’s password management capability, a number indicates the number of unsuccessful authentication attempts, and a hyphen indicates that the participant made no attempt to log in to the survey. Automated authentications and surveys with no login attempt are excluded from the calculation of the failure rates. The columns S1 through S8 represent access to the eight online surveys involved in this study. As indicated, the seven participants using the Acrostic scheme experienced a 65% failure rate, the fourteen using the Old Address scheme had a 68% failure rate, the fourteen using the Confession scheme had a 32% failure rate, the fourteen using the Old Memory scheme had a 47% failure rate, the five using the Unexpected Nonsense scheme had a 31% failure rate, and the eleven using their own scheme had a 52% failure rate. This analysis suggests a memorability advantage for passwords made using the Unexpected Nonsense scheme, and relative disadvantages for the Acrostic and Old Address schemes.

Several participants had difficulty remembering and inputting their long passwords, and nine experienced total failure, requiring the investigator to reset their passwords manually during the study. Table 5.5 lists those resets by scheme, Step involved, and date.
Participants using the Confession and Old Address schemes made over half of the password reset requests, but participants using all schemes found it necessary to make requests for resets. Table 5.6 lists the number of participants completing each of the eleven steps in this study. A total of 133 participants initially signed consent forms, 83 of these continued to participate by

Table 5.4: Authentication Failure Rate by Scheme

Scheme      S1  S2  S3  S4  S5  S6  S7  S8  Failures  Attempts  %
Acrostic    +   +   +   +   +   -   +   +   0         7         0
Acrostic    1   6   1   +   1   A   14  +   23        25        92
Acrostic    1   +   +   +   2   +   +   1   4         9         44
Acrostic    2   -   -   -   -   -   -   -   2         2         100
Acrostic    1   -   -   -   -   -   -   -   1         1         100
Acrostic    +   -   -   -   -   -   -   -   0         1         0
Acrostic    +   -   -   -   -   -   -   -   0         1         0
Acrostic    Total                           30        46        65
Address     +   +   A   A   +   +   +   +   0         6         0
Address     1   +   +   +   +   +   +   +   1         8         12
Address     2   +   A   +   1   +   +   +   3         8         38
Address     1   -   -   -   -   -   -   -   1         1         100
Address     +   8   +   +   +   +   +   +   8         15        53
Address     +   +   +   +   +   +   +   +   0         8         0
Address     1   +   +   A   A   -   A   -   1         3         33
Address     1   +   A   +   -   -   -   -   1         3         33
Address     +   +   +   +   +   -   +   1   1         7         14
Address     9   +   +   1   +   +   +   A   10        15        67
Address     +   A   -   -   -   +   +   +   0         4         0
Address     8   1   -   A   3   -   -   +   12        13        92
Address     4   A   A   A   -   -   -   -   4         5         80
Address     +   +   +   +   +   +   +   +   0         8         0
Address     Total                           72        106       68
Confession  +   +   +   +   +   +   +   +   0         8         0
Confession  6   +   +   +   +   +   3   +   11        16        69
Confession  +   +   1   +   +   +   +   +   1         8         12
Confession  4   +   +   +   A   +   A   A   4         8         50
Confession  +   +   -   -   A   -   +   A   0         3         0
Confession  2   +   1   -   -   1   -   -   4         5         80
Confession  +   +   1   +   +   +   +   +   1         8         12
Confession  +   +   +   +   +   1   +   +   1         8         12
Confession  2   +   +   +   +   +   -   +   2         9         22
Confession  2   -   -   1   +   +   +   -   3         6         50
Confession  +   +   +   1   +   +   +   +   1         8         12
Confession  +   1   +   +   +   +   +   +   1         8         12
Confession  +   2   +   +   +   +   +   +   2         9         22
Confession  +   1   3   A   -   A   A   1   4         5         80
Confession  Total                           35        109       32
Memory      +   +   -   -   -   -   -   -   0         2         0
Memory      +   +   +   +   +   +   +   +   0         8         0
Memory      +   -   +   +   +   A   +   A   0         6         0
Memory      +   +   +   +   +   -   +   1   1         7         14
Memory      4   +   -   +   +   +   -   +   4         11        44
Memory      +   +   +   1   +   +   -   +   1         7         14
Memory      4   -   +   2   +   1   +   +   7         11        63
Memory      +   +   -   +   1   -   A   A   1         4         25
Memory      4   A   1   1   +   -   +   1   7         9         78
Memory      2   A   +   +   +   +   +   +   2         8         25
Memory      +   A   +   +   +   +   +   +   7         7         100
Memory      6   A   -   +   A   +   A   -   6         8         75
Memory      3   4   -   -   -   -   -   1   8         8         100
Memory      2   -   -   -   -   -   -   -   2         2         100
Memory      Total                           46        98        47
Nonsense    3   +   +   +   +   +   +   +   3         10        30
Nonsense    +   +   +   1   +   +   +   A   1         7         14
Nonsense    +   +   A   +   +   1   +   +   1         7         14
Nonsense    1   A   1   +   -   1   +   2   5         7         71
Nonsense    +   -   -   -   -   -   -   -   0         1         0
Nonsense    Total                           10        32        31
Own         1   +   1   2   6   +   +   2   12        14        86
Own         4   1   A   A   A   A   A   A   5         5         100
Own         +   +   +   +   +   +   +   +   0         8         0
Own         +   -   +   +   +   +   +   +   7         7         100
Own         4   +   +   -   +   +   +   +   4         10        40
Own         2   +   A   +   +   -   A   +   2         6         33
Own         +   +   4   1   +   +   +   +   5         10        50
Own         +   +   +   +   +   +   +   +   7         7         100
Own         +   +   +   +   +   +   +   +   0         8         0
Own         +   A   +   +   +   +   +   +   0         7         0
Own         +   -   -   -   -   -   -   -   1         1         100
Own         Total                           43        83        52
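Following the legend given for Table 5.4, the totals in a row can be recomputed from its eight survey cells. The Python sketch below assumes that numbered cells are summed as failures, that each "+" counts as one successful attempt, and that "A" and "-" cells are skipped; this reading reproduces the first three Acrostic rows but is an inference, and the function name is hypothetical.

```python
def failure_rate(cells):
    """Return (failures, attempts, percent) for one row of survey cells.

    Assumed reading of the legend: digits are failed attempts, '+' is one
    successful attempt, and 'A' (automated) and '-' (no attempt) are ignored.
    """
    failures = sum(int(cell) for cell in cells if cell.isdigit())
    successes = sum(1 for cell in cells if cell == "+")
    attempts = failures + successes
    percent = round(100 * failures / attempts) if attempts else None
    return failures, attempts, percent


# Second Acrostic row of Table 5.4: S1 through S8.
row = "1 6 1 + 1 A 14 +".split()
print(failure_rate(row))
```

For the row shown, the function yields 23 failures in 25 attempts, or 92%, matching the values printed in the table.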

Table 5.5: Manual Password Resets

Scheme      Step  Date
Acrostic    7     11-30-06
Address     4     10-20-06
Address     4     10-23-06
Confession  3     10-15-06
Confession  3     10-16-06
Memory      6     11-11-06
Memory      6     11-20-06
N/A         6     11-14-06
Own         6     11-8-06

generating a password, 61 completed Survey 1, and 51 completed Survey 8. In all, 82 participants dropped out of the study, 14 of them after Survey 1. Table 5.7 lists the percentage of these 14 dropouts by scheme. Participants using the Acrostic scheme had the highest incidence of

Table 5.6: Participants completing each Step

Step  Description          Date   Participants
1     Consent              10-10  133
2     Interview            10-10  19
3     Password Generation  10-10  65
4     Survey 1             10-17  61
5     Survey 2             10-24  61
6     Survey 3             10-31  47
7     Survey 4             11-07  52
8     Survey 5             11-14  52
9     Survey 6             11-21  47
10    Survey 7             11-28  51
11    Survey 8             12-5   51

dropout, and those using the Confession scheme had the lowest. This log analysis has found that participants using the Confession and Unexpected Nonsense schemes had the lowest

Table 5.7: Dropouts by Scheme

Scheme      Dropout Percentage
Acrostic    57
Address     21
Confession  7
Memory      14
Nonsense    20
Own         18

authentication failure rates, while those using the Acrostic and Old Address schemes experienced the highest failure rates. Although server authentication logs are highly reliable sources of data, they require some interpretation to overcome the ambiguities introduced by the .htaccess protocol, and this is a potential source of error.
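The interpretation rule used in this section, under which a successful login arriving seven seconds or less after the preceding request counts as automated, can be sketched in Python as follows. This is a hedged illustration rather than the investigator's procedure: the classify_visit function, the tuple layout, and the decision to measure the gap from the immediately preceding request are assumptions, and pth03 stands in for every participant username.

```python
from datetime import datetime

AUTOMATED_GAP_SECONDS = 7  # threshold stated in the text


def parse_time(stamp):
    """Parse an Apache-style timestamp such as '04/Nov/2006:20:46:02 -0500'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def classify_visit(entries):
    """Classify one survey visit from (timestamp, username, status) tuples.

    Returns the failure count as a string, '+' for a typed success,
    'A' for an automated success, or '-' if no credentials were sent.
    """
    failures = 0
    succeeded = False
    automated = False
    previous = None
    for stamp, user, status in entries:
        when = parse_time(stamp)
        if user != "-" and status == 401:
            failures += 1  # credentials were sent and rejected
        elif user != "-" and status == 200:
            succeeded = True
            if previous is not None:
                gap = (when - previous).total_seconds()
                automated = gap <= AUTOMATED_GAP_SECONDS
            break
        previous = when
    if failures:
        return str(failures)
    if succeeded:
        return "A" if automated else "+"
    return "-"


one_second = [("04/Nov/2006:20:46:02 -0500", "-", 401),
              ("04/Nov/2006:20:46:03 -0500", "pth03", 200)]
two_failures = [("24/Oct/2006:12:38:10 -0400", "-", 401),
                ("24/Oct/2006:12:38:40 -0400", "pth03", 401),
                ("24/Oct/2006:12:39:01 -0400", "pth03", 401),
                ("24/Oct/2006:12:39:27 -0400", "pth03", 200)]
print(classify_visit(one_second), classify_visit(two_failures))
```

Because the threshold is a single constant, the sensitivity of the reported failure rates to the seven-second choice could be checked by rerunning the classification with other values.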

5.5 Think-aloud Interview Findings of Scheme Usability

During the initial enrollment in late September 2006, nineteen participants agreed to think-aloud interviews to test the web-based password-generation instruments. During two sessions, on September 26 and September 28, 2006, these participants met one-on-one with the investigator in a usability lab located in the Louis Shores Building on the main campus of the Florida State University. Participants accessed the password-generation instruments via the web and a browser on a computer in the lab, and they were free to navigate the interface as desired.

The investigator made handwritten field notes and recorded all nineteen interviews with a digital audio recorder. The investigator instructed the nineteen volunteer think-aloud participants to read their group’s respective instrument and to comment out loud while doing so. After reading the instrument and responding to it, all participants generated the password to be used in the course of the study. In addition, the investigator asked the participants for their reactions, impressions, and responses to the idea of robust password management, the five schemes suggested in the instruments, and the process of generating a password on the spot using either one of the five schemes or one of their choosing. If a participant decided to use a scheme of their own, the investigator asked them to explain their choice.

Five of the participants used the Old Address scheme, three used the Unexpected Nonsense scheme, two used the Acrostic scheme, three used the Old Memory scheme, and one used the Confession scheme. Five participants used their own password-generation schemes. One used an old phone number, one used an obscure email address, one combined Unexpected Nonsense with Old Memory, one combined an Old Memory with an Acrostic, and one combined an Acrostic of the first letter of family members with a favorite number. All participants of the think-aloud interview wrote their passwords on notepaper and were allowed to refer to that note for the remainder of the study. In addition, the investigator sent them their passwords via email correspondence.

The vast majority of participants found the instruments and password-generation schemes to be straightforward and acceptable for use online. One student found the Acrostic to be confusing, and used the Old Address scheme to generate the password. Two respondents thought the 20-character minimum length to be excessive, but all others considered it to be an appropriate request. One respondent inquired about the case-sensitivity requirements of the passwords under study, and noted that the requirement that all passwords in this study are case-sensitive was not made explicit in the instrument. Several students were confused about some of the wording in the instruments, and the investigator corrected these identified points of confusion. The think-aloud protocol offered a glimpse at the thought processes of participants as they used the schemes to generate passwords. Many of the challenges and experiences facing users who must use long passwords emerged for observation. The input received from interviewees served to refine the instruments for the rest of the study participants.

5.6 Survey Findings of Scheme Usability

This study required participants to complete eleven distinct steps, including password generation, an initial survey, six short interim surveys, and a final survey, all of which necessitated successful input of the passwords generated for this study via the .htaccess protocol enforced by the Apache web server. As listed in Table 5.6, a total of 133 participants initially signed consent forms, 83 of these continued to participate by generating a password, 61 completed Survey 1, and 51 completed Survey 8.

The Interview and Password Generation Steps used instruments that suggested the five password generation schemes listed in Table 5.1: the Old Address, Unexpected Nonsense, the Acrostic, the Old Memory, and the Confession. In addition to the five schemes, study participants were free to use an alternate password-generation scheme of their choosing or to combine techniques from multiple schemes. Some of these were simple modifications or combinations of the five suggested schemes, while others demonstrated previous experience or mnemonic creativity on the part of the participant. The investigator examined the passwords and categorized them as listed in Table 5.8, with descriptions and reasons paraphrased from self-report survey data. No clear trends emerge from these data, although several participants using their own or the Confession scheme listed humor as a reason, whereas those using the Acrostic and Address schemes did not. Familiarity was popular for users of the Old Address scheme, remembrance was the reason for those using the Acrostic, Confession, Unexpected Nonsense, and Own schemes, and ease was important for those using the Acrostic and Own schemes.

Table 5.8: Password Generation Scheme by Reason

Scheme      Password                             Reason
Acrostic    myveryedumjsu9PIZZAS                 familiarity
Acrostic    MyLoBaCa2SuMeThWoWee                 ease of use
Acrostic    n3hbwesittiobingenaJ                 security
Acrostic    Mmlmlnoeitwww#1Ilhsm                 ease of use and remembrance
Acrostic    tml,tmltmwicsidc,isfycttsfm          familiarity
Acrostic    1994Wrefilwe!waaka19                 ease of use
Acrostic    1iptLuiDutiptrmB2iaatbz!             ease of remembrance
Address     130390THStNorthBergen                familiarity and ease of use
Address     1505W.TharpesStFL32303               familiarity and ease of use
Address     244066thTerraceSouth                 ease of use
Address     BrianRode>351WestLafayetteStreet     security and remembrance
Address     8080ShadyGroveRdGrandridge           familiarity
Address     2apenn-adamsburgroad                 familiarity
Address     3677Ballestero>Dr.S.                 familiarity
Address     801SW56TERR, Plantation, FL          ease of remembrance
Address     8805Norfolkblvd,jaxfl32208           familiarity
Address     POBox188BristolFL!@#                 familiarity
Address     8HooVerAvenuePeaBodyMass             ease of remembrance and use
Address     926A$port$treetPomonaCA              ease of remembrance
Address     2021MartinKingjrTallyFlorida         security
Address     tupacSHAKUR&BONE!east893st.          familiarity
Confession  myBOYFRIENDandiareENGAGED            ease of remembrance
Confession  !/IloveWATCHINGtheSIMPSONS!?         ease of use and familiarity
Confession  BartendingGetsMeLaid5TimesaDay       ease of remembrance
Confession  TakeiThomechewiTiT'sdelicious        humor
Confession  1TIME,ifellinthepool!                ease of use
Confession  Idontlikegoingtoclassat8am!          humor
Confession  I-will-WIN-the-iPOD!!!               spontaneity
Confession  Mypasswordis8505753502               familiarity
Confession  i1/2beenaN0lefan5inceiwas9           familiarity
Confession  IloveWATCHINGtheSIMPSONS             ease of use and familiarity
Confession  IneedtogototheBATHROOM!?             humor and remembrance
Confession  sleeping6through4classISfun          remembrance and humor
Confession  Thisisareallylongpassword!           ease of remembrance
Confession  IlovevisitingZabcikville             ease of use and security
Memory      1Chri$ti@nAm@zingAll                 familiarity
Memory      BenneTt*AmanDa2256+osx5              security
Memory      The*Loveless^5quall!                 familiarity
Memory      shakespaereLeon850307                ease of remembrance
Memory      iGodakaBarefooter314                 security
Memory      ?ChocolateSmellsvERyweird?           remembrance and security
Memory      mydogweighs65lbsinthewinter          ease of remembrance
Memory      FLYINGbearscantstoptheMustardMan!    humor and security
Memory      C0mm@nder1PWN1$yourG@wd              ease of remembrance
Memory      Battlestar33POUNDYdrums!             ease of remembrance
Memory      1PUMPKINayanna@MANlamar#KIKIvere     ease of use
Memory      BabyBlue87HondaAccordDX              familiarity
Nonsense    thebigpeacockDROVEMYVAN              humor and remembrance
Nonsense    SupermanCouldKickBatmans@$$          ease of remembrance
Nonsense    SHEsellsSEASHELLSbytheSEASHORE       rhyme and familiarity
Nonsense    APPLY_DIRECTLY_TO_FOREHEAD>>HEAD_ON  ease of remembrance
Nonsense    OpinionsRlikeASSHOLES-everybodyhas1  ease of remembrance
Own         954748MY0ldNum1s3298                 ease of remembrance
Own         Bluejay1CB323038675309               ease of remembrance
Own         AREWETHEREYET?nostopasking!          ease of use, humor
Own         thePurPlePorPoiselosthis19MYSELF     ease of remembrance
Own         fredfredfredwasDEAD!1                familiarity and right hand freedom

Table 5.9 lists the self-report data of plaintext password recall. In addition to the authentication step to access each survey, each survey instrument required participants to reenter their passwords in a plaintext box, and then inquired whether the participant could recall the password without reference to a written version. The Logins column lists the total number of attempted logins by participants using each scheme, and the Ciphertext column lists the failure rate from Table 5.4. The S1 through S8 columns indicate a failure of plaintext input with a “No”, and the Plaintext column lists the total from columns S1 through S8 divided by the total logins per scheme. Participants using the Confession and Old Address schemes clearly experienced less failure in plaintext entry.

Table 5.9: Plaintext Password Failure by Scheme

Scheme      Logins  Ciphertext (%)  "No" entries (S1-S8)  Plaintext (%)
Acrostic    46      65              9                     20
Address     106     68              10                    9
Confession  109     32              1                     <1
Memory      98      47              17                    17
Nonsense    32      31              5                     16
Own         83      52              17                    20

Survey 8 requested participants to generate an additional 20-character password to explore scheme preferences at the end of the study. Table 5.10 displays the initial and final scheme choices of these participants. Of the 51 respondents to Survey 8, three had initially used the Acrostic scheme, ten the Old Address scheme, twelve the Confession scheme, ten the Old Memory scheme, six the Unexpected Nonsense scheme, and ten had used their own. Free to use the scheme of their choice to generate their final password, one chose the Acrostic scheme, sixteen the Old Address scheme, ten the Confession scheme, ten the Old Memory scheme, ten the Unexpected Nonsense scheme, and four used their own. Thus, participants tended to shun the Acrostic and participants’ own schemes and to favor the Old Address and Unexpected Nonsense.

Table 5.10: Final Scheme Choice

Scheme      Initial Total  Final Total
Acrostic    3              1
Address     10             16
Confession  12             10
Memory      10             10
Nonsense    6              10
Own         10             4
Total       51             51

Although survey data show no overwhelming trends in participant scheme choice, several participants using their own or the Confession scheme listed humor as a reason, and none of those using the Acrostic and Address did. Familiarity was popular for users of the Old Address scheme, remembrance was the reason for those using the Acrostic, Confession, Unexpected Nonsense, and Own schemes, and ease was important for those using the Acrostic and Own schemes. Participants using the Confession and Old Address schemes clearly experienced less failure in plaintext entry. At the end of the study, participants tended to shun the Acrostic and participants’ own schemes and to favor the Old Address and Unexpected Nonsense when faced with the challenge of generating a new password.
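The Plaintext column of Table 5.9 is described as the number of “No” responses across the eight surveys divided by the total logins for the scheme. A minimal Python sketch of that division follows; the “No” counts used here (9 for Acrostic and 17 for Memory) are assumptions read off the table, and the function name is hypothetical.

```python
def plaintext_failure_percent(no_responses, logins):
    """Percentage of logins for which the participant reported a plaintext failure."""
    return round(100 * no_responses / logins)


# Logins are taken from Table 5.9; the "No" counts are assumed tallies.
schemes = {"Acrostic": (9, 46), "Memory": (17, 98)}
for scheme, (no_responses, logins) in schemes.items():
    print(scheme, plaintext_failure_percent(no_responses, logins))
```

Under these assumed counts the results agree with the percentages printed for the two schemes.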

5.7 Chapter Summary

This chapter reported findings on the usability of various schemes to assist users in the generation of robust passphrases and on the security of the passwords that participants generated using those schemes. Chapter 4 identified five prominent password-generation schemes currently in use, and tested their security and usability using cryptanalysis and expert analysis. Building on these analyses, this chapter reported the findings from log analysis, think-aloud interviews, and surveys. Section 5.2 addressed the quality of data issues surrounding the data collection methods used in this study. Section 5.3 continued the discussion of the findings of expert testing. Section 5.4 analyzed the findings from server log analysis. Section 5.5 reported participant input obtained during think-aloud interviews. Section 5.6 presented self-report data collected by online surveys during the study.

In terms of password security, all passwords exceeded the cryptographic strength threshold established for this study. Cryptanalysis and expert testing showed that the Acrostic scheme and certain of the participants’ own schemes produced the strongest passwords, and that the Old Address scheme produced the weakest passwords. In terms of password usability, log analysis found that participants using the Confession and Unexpected Nonsense schemes had the lowest authentication failure rates, while those using the Acrostic and Old Address schemes experienced the highest failure rates. Think-aloud interview and survey data showed no clear trends in participant scheme choice, but participants commonly reported humor, familiarity, and ease of remembrance as reasons for their choice. Surveys also indicated that participants using the Confession and Old Address schemes experienced less failure in plaintext recall and input of their passwords. At the end of the study, participants tended to shun the Acrostic and participants’ own schemes and to favor the Old Address and Unexpected Nonsense when faced with the challenge of generating a new password.

CHAPTER 6 GENERATION STAGE INTERVENTION TESTING

6.1 Introduction

This study explored the effects of three interventions designed to assist users in the generation, input, and memorability of strong passwords. These interventions, provided to treatment groups during the password generation stage, were (i) introducing multiple password-generation schemes, (ii) supplying a variety of practical examples of strong passwords generated with them, and (iii) requiring multiple re-input of the newly generated password. Chapter 5 reported findings about the usability of the password-generation schemes and the strength of the passwords that participants produced using them, and other issues arising during the process. This chapter explores the effects of the provision of various numbers of practical example passwords and the requirement of password reentry on password recall and input. To explore possible benefits of these interventions in terms of recall and input, this study used four data collection methods: think-aloud user interviews, expert testing of generated passwords, access log analysis, and eight surveys.

Section 6.2 describes the participants and groups in terms of demographics, treatment, and dropout rates. Section 6.3 describes the experiences of nineteen participants during think-aloud interviews. Section 6.4 reports the findings of expert testing of the passwords generated by participants in the various groups. Section 6.5 reports the effects of treatments as reflected in the web server’s access and authentication logs, in terms of the failure rates of participant authentication with the access-controlled surveys. Section 6.6 presents the effects of the study’s treatments as indicated by participant responses to web-based surveys. These surveys included initial and final questionnaires and six short interim questionnaires, all of which were access-controlled.

6.2 Participants and Groups

This study demanded much input from participants. It required them to complete eleven distinct steps, including password generation, an initial survey, six short interim surveys, and a final survey, all of which necessitated successful input of the passwords generated for this study into a web-based authentication server. To encourage participation and to lessen the expected high dropout rate, the investigator requested interaction weekly via email correspondence with each participant, offered three prizes in a raffle among participants completing all steps at the end of the data collection stage on December 5, 2006, and gave two entries in the raffle to participants who engaged in the think-aloud interview. Participant recruitment began on September 26, 2006 and ended on October 10, 2006. Table 6.1 shows the number of participants completing each of the eleven steps in this study.

Table 6.1: Participants completing each Step

Step  Description          Date   Participants  Group 1  Group 2  Group 3  Group 4
1     Consent              10-10  133           33       32       34       34
2     Interview            10-10  19            7        6        -        6
3     Password Generation  10-10  83            17       18       14       19
4     Survey (1)           10-17  61            15       16       15       15
5     Survey 2             10-24  61            17       16       14       14
6     Survey 3             10-31  47            13       10       12       12
7     Survey 4             11-07  52            13       13       13       13
8     Survey 5             11-14  52            14       13       12       13
9     Survey 6             11-21  47            11       12       11       13
10    Survey 7             11-28  51            13       13       11       14
11    Survey (8)           12-5   51            13       13       11       14

A total of 133 participants initially enrolled and signed a consent form. The investigator randomly assigned them to four groups. Step 2 was an interview using a think-aloud protocol with only nineteen volunteers. Despite the promise of raffle prizes and the weekly email encouragement of the investigator, only 83 of the 133 originally enrolled participants actually completed Step 3 via email correspondence. Of the 62 participants who completed Step 4, the Initial Survey, 42 (68%) were male and 20 (32%) were female. Figure 6.1 shows the age distribution of study participants. Of the 51 participants who completed Survey 8, there were two clusters: a large one from age 20 to 24 and a smaller one from ages 26 to 28. These graduate and undergraduate students were enrolled in a highly technological major and were well above average in computer literacy and competence. The majority of respondents replied that they currently managed four or more passwords, as displayed in Figure 6.2, and these numbers represent all passwords, regardless of the value of

Figure 6.1: Participants by Age (horizontal axis: Age; vertical axis: Number)

Figure 6.2: Current Passwords (horizontal axis: Passwords; vertical axis: Participants)

the resources protected. Respondents also reported that the frequency of their password change varied, and that the phenomenon of password sharing was widespread. Of the respondents, 45% indicated that they had used a password given to them by someone else, and 55% indicated that they had given a password to someone else. Fewer than 14% of respondents indicated that they had made provision for others to access password-protected accounts in an emergency or should they become incapacitated.

To explore the effects of study interventions, the investigator divided the participants into the four groups listed in Tables 6.1 and 6.2. Group 1 served as a control group, for which the password enrollment instrument provided only a single example password and required entry of the candidate password only twice, as per the industry standard. The treatment instrument for groups 2 and 4 supplied participants with five and ten example passwords, respectively, to explore their effect during the creative stage of password generation. The treatment instrument for groups 3 and 4 required participants to reenter their candidate passwords four additional times to explore the effect of reiteration of the prompt, recall, and input sequence.

Table 6.2: Subject Groups by Treatment

Group  Example Passwords Provided  Password Entries Required
1      1                           2
2      5                           2
3      1                           5
4      10                          5

As presented in Table 6.1, 51 of the 133 enlisted participants completed the 11 steps of the study. Of these 51, 13 were in group 1, 13 were in group 2, 11 were in group 3, and 14 were in group 4. After participants had generated and submitted their passwords via email correspondence, the investigator tested all plaintext candidate passwords to ensure that all passwords exceeded the cryptographic strength threshold established in Chapter 4. Chapter 5 reported the experiences of participants according to the password-generation scheme they used to make their passwords, and this chapter specifically contrasts the findings across these four test groups.
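The group treatments in Table 6.2 and the enrollment and completion counts in Table 6.1 can be combined to compare retention across the four groups. The Python sketch below is illustrative only; the dictionary layout and the completion_percent helper are assumptions, while the numbers are those reported in the two tables.

```python
# group: (example passwords provided, password entries required), per Table 6.2
TREATMENTS = {1: (1, 2), 2: (5, 2), 3: (1, 5), 4: (10, 5)}

# group: (participants at Step 1, participants at Step 11), per Table 6.1
COUNTS = {1: (33, 13), 2: (32, 13), 3: (34, 11), 4: (34, 14)}


def completion_percent(group):
    """Percentage of a group's enrolled participants who completed Survey 8."""
    enrolled, completed = COUNTS[group]
    return round(100 * completed / enrolled, 1)


for group, (examples, entries) in TREATMENTS.items():
    print(f"Group {group}: {examples} example(s), {entries} entries, "
          f"{completion_percent(group)}% completed")
```

The group totals sum to the 133 enrolled and 51 completing participants reported in the text, which gives a quick consistency check on the table.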

6.3 Interview Findings

The investigator developed four password-generation instruments to be administered to the four respective groups in this study in late August 2006, and refined them in collaboration with experienced social researchers. As reported in Section 5.3, nineteen participants agreed to think-aloud interviews to test the web-based password-generation instruments. The investigator assigned nineteen volunteer think-aloud participants to the four treatment groups listed in Table 5.1 and instructed them to read their group’s respective instrument and to comment out loud while doing so. After reading the instrument and responding to it, all participants generated the 20-character password to be used in the course of the study. In addition, the investigator asked the participants for their reactions, impressions, and responses to the idea of long password management, the five schemes suggested in the instruments, and the process of generating a password on the spot using either one of the five schemes or one of their choosing.

After completion of the think-aloud interviews, the investigator also applied the recommendations of the participants to further revise the instruments. Appendices E, F, G, and H contain the resultant instruments for the four respective groups. The think-aloud interviews used preliminary versions of these same instruments, but participants were not required to reenter their passwords five times, as required by the instruments for groups 3 and 4. Upon enrollment in the larger, online study, the investigator required those interview participants assigned to their groups to reenter their passwords according to the groups’ respective protocols.

The vast majority of participants found the password-generation schemes and example passwords to be straightforward and acceptable for use by other participants online. Although two respondents thought the 20-character minimum length to be excessive, others considered it to be an appropriate request. One respondent inquired about the case-sensitivity requirements of the passwords under study, and noted that the requirement that all passwords in this study be case-sensitive was not made explicit in the instrument.

6.4 Log Analysis

Section 5.4 discusses the survey and authentication mechanisms used in this study, and this section differentiates the performance of the eleven steps of this study by the four participant groups. Table 5.4 presented the findings of authentication failure rate in terms of password generation scheme from server log analysis, and Table 6.3 displays the findings of authentication failure rate in terms of treatment groups. As in Table 5.4, this table uses a “+” sign to indicate a successful authentication attempt, an “A” to indicate a successful automated authentication using the browser’s password management capability, a number to indicate unsuccessful authentication attempts, and a hyphen to indicate that the participant made no attempt to login to the survey. All automated and unsuccessful attempts are displayed in shaded cells, and are not considered in the calculation of the failure rates. The columns S1 through S8 represent access to the eight online surveys involved in this study, and the numbers in them represent the number of authentication failures.

Table 6.3: Authentication Failure Rate by Group

Group  S1  S2  S3  S4  S5  S6  S7  S8  Failures  Attempts    %
    1   +   -   -   -   -   -   -   -         0         1    0
    1   2   -   -   -   -   -   -   -         2         2  100
    1   1   +   A   +   -   -   -   -         1         3   33
    1   +   +   -   +   1   -   A   A         1         4   25
    1   4   1   A   A   A   A   A   A         5         5  100
    1   +   +   A   A   +   +   +   +         0         6    0
    1   +   -   +   +   +   A   +   A         0         6    0
    1   +   A   +   +   +   +   +   +         0         7    0
    1   +   +   +   +   +   -   +   1         1         7   14
    1   +   +   +   1   +   +   +   A         1         7   14
    1   +   A   +   +   +   +   +   +         7         7  100
    1   +   +   +   +   +   +   +   +         0         8    0
    1   +   +   +   +   +   +   +   +         0         8    0
    1   +   +   +   +   +   +   +   +         0         8    0
    1   2   +   A   +   1   +   +   +         3         8   38
    1   4   +   +   -   +   +   +   +         4        10   40
    1   6   +   +   +   +   +   3   +        11        16   69
Group 1 total                                36       113   32
    2   +   -   -   -   -   -   -   -         0         1    0
    2   1   -   -   -   -   -   -   -         1         1  100
    2   +   +   -   -   A   -   +   A         0         3    0
    2   +   1   3   A   -   A   A   1         4         5   80
    2   4   A   A   A   -   -   -   -         4         5   80
    2   1   A   1   +   -   1   +   2         5         7   71
    2   +   +   +   +   +   +   +   +         7         7  100
    2   +   +   +   +   +   +   +   +         0         8    0
    2   +   +   +   +   +   +   +   +         0         8    0
    2   +   +   1   +   +   +   +   +         1         8   12
    2   +   +   1   +   +   +   +   +         1         8   12
    2   6   A   -   +   A   +   A   -         6         8   75
    2   2   +   +   +   +   +   -   +         2         9   22
    2   1   +   +   +   2   +   +   1         4         9   44
    2   3   +   +   +   +   +   +   +         3        10   30
    2   1   +   1   2   6   +   +   2        12        14   86
Group 2 total                                50       111   45
    3   +   -   -   -   -   -   -   -         0         1    0
    3   +   -   -   -   -   -   -   -         1         1  100
    3   +   +   -   -   -   -   -   -         0         2    0
    3   2   -   -   -   -   -   -   -         2         2  100
    3   1   +   +   A   A   -   A   -         1         3   33
    3   2   +   1   -   -   1   -   -         4         5   80
    3   +   +   A   +   +   1   +   +         1         7   14
    3   +   +   +   1   +   +   -   +         1         7   14
    3   +   +   +   +   +   1   +   +         1         8   12
    3   2   A   +   +   +   +   +   +         2         8   25
    3   4   +   +   +   A   +   A   A         4         8   50
    3   +   +   4   1   +   +   +   +         5        10   50
    3   4   +   -   +   +   +   -   +         4        11   36
    3   4   -   +   2   +   1   +   +         7        11   63
    3   9   +   +   1   +   +   +   A        10        15   67
Group 3 total                                43        99   43
    4   1   -   -   -   -   -   -   -         1         1  100
    4   +   A   -   -   -   +   +   +         0         4    0
    4   2   +   A   +   +   -   A   +         2         6   33
    4   2   -   -   1   +   +   +   -         3         6   50
    4   +   +   +   +   +   -   +   +         0         7    0
    4   +   +   +   +   +   -   +   1         1         7   14
    4   +   -   +   +   +   +   +   +         7         7  100
    4   +   +   +   +   +   +   +   +         0         8    0
    4   +   1   +   +   +   +   +   +         1         8   12
    4   +   +   +   1   +   +   +   +         1         8   12
    4   1   +   +   +   +   +   +   +         1         8   12
    4   3   4   -   -   -   -   -   1         8         8  100
    4   +   2   +   +   +   +   +   +         2         9   22
    4   4   A   1   1   +   -   +   1         7         9   78
    4   8   1   -   A   3   -   -   +        12        13   92
    4   +   8   +   +   +   +   +   +         8        15   53
    4   1   6   1   +   1   A  14   +        23        25   92
Group 4 total                                76       148   51
Total                                       205       471   44

As indicated, the seventeen participants in group 1 experienced a 32% failure rate, the sixteen in group 2 had a 45% failure rate, the fifteen in group 3 had a 43% failure rate, and the seventeen in group 4 had a 51% failure rate. This analysis suggests that participants in groups 2


and 4, who were supplied with 5 and 10 example passwords, respectively, had higher failure rates than those in groups 1 and 3. Despite expectations, participants in group 1, which received no special assistance during the password-generation process, outperformed all others in terms of password recall and successful input.
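The tallying behind Table 6.3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the investigator's actual analysis procedure: the function names are hypothetical, and the rule that automated ("A") logins and skipped surveys ("-") are excluded from the attempt count is an assumption based on the table legend. The group totals are copied from the table.

```python
def score_row(cells):
    """Return (failures, attempts, percent) for one participant's S1-S8 cells.

    Assumed notation, taken from the table legend: "+" is a successful manual
    login, "A" an automated browser login, "-" no attempt, and a number is a
    count of failed attempts. "A" and "-" cells do not count as attempts.
    """
    failures = sum(int(c) for c in cells if c.isdigit())
    successes = sum(1 for c in cells if c == "+")
    attempts = failures + successes
    percent = round(100 * failures / attempts) if attempts else 0
    return failures, attempts, percent


# (failures, attempts) per treatment group, copied from the subtotal rows.
GROUP_TOTALS = {1: (36, 113), 2: (50, 111), 3: (43, 99), 4: (76, 148)}


def group_rates(totals):
    """Failure rate per group as a whole-number percentage."""
    return {g: round(100 * f / a) for g, (f, a) in totals.items()}


def overall(totals):
    """Failures, attempts, and failure rate across all groups."""
    failures = sum(f for f, _ in totals.values())
    attempts = sum(a for _, a in totals.values())
    return failures, attempts, round(100 * failures / attempts)


if __name__ == "__main__":
    print(score_row("1 + A + - - - -".split()))  # (1, 3, 33)
    print(group_rates(GROUP_TOTALS))             # {1: 32, 2: 45, 3: 43, 4: 51}
    print(overall(GROUP_TOTALS))                 # (205, 471, 44)
```

Run against the subtotals, the sketch reproduces the 32%, 45%, 43%, and 51% group failure rates and the overall figure of 205 failures in 471 attempts (44%).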

6.5 Survey Results

Section 5.5 discussed the delivery details of this study’s eight surveys. This section indicates the success rates of participants in the recall and input of the passwords they used during the study. The data for this analysis came from weekly, four-question surveys during the seven-week run of the study. The instruments for these surveys are included in Appendices H and I. Because of the length of the password, participants were free to write their passwords and refer to that written version in the case of recall failure. Figure 6.3 lists participant ability to recall their password from memory, without the need to refer to a written version, and input it into their browsers using the .htaccess protocol. As expected, recall of these regularly used passwords improved with usage. These data show mixed results from the reentry of the password during the generation stage since groups 3 and 4 underperformed in Surveys 2 and 4 and outperformed on Survey 7. On average, all groups, regardless of treatment, performed equally.

Survey     2    3    4    5    6    7   Average
Group 1   73   86   79   93   75   71        84
Group 2   77   50   85   85   92   83        78
Group 3   54   64   91   90   90  100        80
Group 4   50   54   93   93   85   93        77

Figure 6.3: Ciphertext Password Recall (% of participants, by survey)


Figure 6.4 displays participant ability to recall their password from memory and input it into a plaintext text box in the online survey. Plaintext input is less prone to error because the password is clearly visible to the participant during input; despite this, the data show only a slight advantage over ciphertext recall. There is also no clear distinction between the success rates of the various groups.

Survey     1    2    3    4    5    6    7    8
Group 1   81   73   93   86   87   92   86   85
Group 2   62   78   50   85   92   92   92   86
Group 3   86   77   82   75  100  100   91   91
Group 4   80   67   85   86   86   78   93   93

Figure 6.4: Plaintext Password Recall (% of participants, by survey)

To frame the issue of password security, the Initial Survey inquired into participants’ experience with online data and identity theft and previous password management. 98.4% of participants indicated a familiarity with the phenomenon of online identity theft, 46.8% reported that they personally knew a victim, and 8% said that they had personally been victims of identity theft. Among the crimes reported were six physical or insider credit card thefts, eleven online credit card thefts, four bank credential thefts, two online community credential thefts, one eBay credential theft, and one PayPal credential theft. The eBay and PayPal thefts were via online “phishing” scams. All of these instances of identity theft reportedly involved the compromise of passwords.

Survey 8 requested more detail about participants’ personal management of passwords, and Figure 6.5 displays self-report data on the length of participants’ important passwords. The majority of participants reported that their important passwords were in the six- to eleven-character range. The sharp spike at eight is no doubt the result of the recommended industry 8-character password standard. The outliers at 3 and 4 represent very weak security, just as the outliers at 14, 17, and 24+ represent rare instances of robust password usage.

Figure 6.5: Length of Important Passwords (bar chart; x-axis: password length in characters, 1 through 20, 21-25, and 26-30; y-axis: number of participants)

Table 6.4: Password Resets by Group

Group   Date                             Resets
1       10/16/06, 11/8/06                2
2       10/20/06, 11/20/06               2
3       10/15/06, 10/23/06, 11/11/06     3
4       11/14/06, 11/30/06               2

One of the goals of this study was to reduce the number of requests for password resets by users. Resetting passwords requires IT staff resources and time and is a source of frustration for users. During this study the investigator assumed the role of IT staff and manually reset passwords upon request by participants nine times. As indicated in Table 6.4, no significant distinctions in terms of groups emerged from the reset requests.

6.6 Chapter Summary

This chapter explored the effects of two interventions designed to assist users in the generation, input, and memorability of strong passwords. These interventions, provided to treatment groups during the password generation stage, were (i) giving a variety of practical examples of strong passwords produced using the password-generation schemes, and (ii) requiring multiple re-input of the newly generated password. To answer the research questions surrounding possible effects of these interventions, this study used four data collection methods: think-aloud user interviews, expert testing of generated passwords, access log analysis, and eight surveys.

Section 6.2 described the participants and groups in terms of demographics, treatment, and dropout rates, and noted the expert testing of the passwords generated by participants in the various groups. Section 6.3 introduced relevant experiences of nineteen participants during think-aloud interviews. Section 6.4 reported the effects of treatments as reflected in access and authentication logs on the web server used to deliver the surveys. Section 6.5 presented the effects of the study’s treatments as indicated by participant self-report data in web-based surveys.

The students participating in this study were above average in computer competence and were familiar with the management of multiple passwords. Although very few of them previously had used long passwords, they successfully authenticated themselves online 266 times in 471 attempts (Table 6.3 records 205 failures), using 20+ character passwords. This chapter showed that the effects of multiple examples and multiple reentries on subsequent password recall and input accuracy were negative. Log analysis suggested that group 1, which received only one example password, and which required only one reentry of the candidate password, had the lowest failure rate during subsequent authentication attempts, although survey data indicated no significant recall rate differences between the respective groups.
These findings suggest that the ability of users to meet the password challenge is determined by further complex factors that are not affected by the simple interventions used in Step 3 of this study.


CHAPTER 7 CONCEPTUAL FRAMEWORK ASSESSMENT

7.1 Introduction

This study hoped to contribute to the usability of human interaction with secure memometric authentication systems, to help users generate, input, and remember strong passwords, to inform the field of information security of the potential for robust password management on the part of everyday users, and to identify user education issues and suggest strategies for improvement and future research. To these ends, this study used conceptual guidance from theories used in human memory research. Specifically, it utilized a modified version of Endel Tulving’s GAPS framework, discussed in Chapters 1 and 2, to investigate areas of potential improvement in the security and usability of robust password interaction.

This study’s final research question surrounds the applicability of this GAPS framework for the study of the usability of secure memometric authentication systems. Section 7.2 describes the GAPS framework and the modifications made to it for the purpose of this study. Section 7.3 reports think-aloud data from the interviewees in Step 3, Section 7.4 discusses the log analysis of Sections 5.4 and 6.4 on the conceptual level, and Section 7.5 revisits the survey findings from Sections 5.5 and 6.5 in the assessment of the Password GAPS model.

7.2 The Password GAPS Framework

This study viewed the password problem as a Human-Computer Interaction Security (HCISec) problem, in terms of the cognitive load that authentication with IT systems places on users to remember and accurately input precise secrets in the form of cryptographically strong passwords. It conceptualized the core of the password problem primarily as user inability to recall and reliably input robust passwords when prompted by the IT authentication mechanism. To frame its concepts, it looked primarily to the field of cognitive psychology, where much memory research has investigated the problems of cued recall and recall failure.


Cognitive memory research distinguishes three primary ways that humans remember: uncued recall, cued recall, and recognition. Uncued recall makes no provision, or “hook,” for the user to recall an item from memory, is the least efficient way to remember because it relies on rote memorization, and is unfortunately typical of legacy username/password authentication mechanisms. This study was concerned with the memometric challenge that the uncued recall of the login prompt presents to users. In most authentication mechanisms, the user is presented with only a login prompt as a cue, and must recall and accurately input the precise password formulation. The vulnerabilities of increasingly networked IT devices have created the need for stronger passwords that, in turn, place ever-greater cognitive load on authorized users.

Within the field of cognitive psychology, the two fundamental categories of memory are procedural and propositional. Procedural memories are task-based, have no truth values, are acquired through extensive practice, and are exhibited in skilled behavior that can appear almost automatic. Propositional memories are demonstrated in a wide variety of behaviors, typically have discernable truth values, can be acquired during a single event, and require directed attention for expression. This study conceptualized the password challenge as an exercise of uncued propositional memory recall because of (i) the specific behavior demanded by IT access control mechanisms, (ii) the binary pass / fail modes of authentication, (iii) the ad hoc way that most users compose passwords, and (iv) the directed attention that they require of the user facing the login prompt.

Episodic memory and semantic memory are sub-categories of propositional memory. Tulving argued as early as 1972 that most research in verbal learning and memory since Ebbinghaus actually had been concerned with episodic memory, or events (Tulving, 1972).
Although he maintained that the episodic memory system is functionally different from the semantic memory (Tulving, 1982, p. 8), he nevertheless viewed them as parallel and partially overlapping. Table 7.1 paraphrases Tulving’s synopsis of the episodic/semantic distinction.

Table 7.1: Episodic v. Semantic Memory

               Episodic Memory                  Semantic Memory
Knowledge      Temporal, Localized              Words, Concepts, Relations
Reference      Autobiographical, Experiential   Cognitive, Social
Retrieval      Recodes Engram                   Reinforces Engram
Interference   Vulnerable                       Relatively Immune

Chapter 4 argued that robust passwords derive their strength from entropy, or unexpectedness, and that entropy enhancement of a phrase or a personal memory with unexpected (to the attacker) characters, words, or phrases is ideally a unique creative event on the part of the user. The localized and autobiographical aspects of episodic memory seemingly make it better suited than semantic memory for passwords, since unique, personal, shared secrets are resistant to social engineering and are not part of the common knowledge. For password application, the temporal and localized features of episodic knowledge contain more entropy than the common words, phrases, concepts, and other relations of semantic knowledge, and the autobiographical and experiential nature of episodic memories is superior to the cognitive and socially constructed references implicit in semantic memory. Thus, according to these criteria, episodic memory seemed well suited for password usage.

Despite the apparent advantages of episodic memory, according to Tulving, the retrieval of episodic memory necessarily recodes the Engram, and the episodic Engram is more vulnerable to interference from other mental processes. These are highly undesirable features for effective and repeated password recall and input, a specific retrieval task in which the precise formulation of the shared secret must be accurately input for successful authentication and which should be distinguished from other secrets shared with other authentication systems. Despite these tendencies, this study sought to utilize episodic memory for robust passwords by introducing password-generation schemes to situate the generation, recall, and input of a password as a memorable and repeatable event within the participant’s unique life experience.

Tulving developed the GAPS model, Figure 1.1, to identify and explain unobservable mental mechanisms and processes in light of patterns of behavior and private subjective experiences observed in memory research. GAPS is a black box approach to the study of episodic memory that is logically compatible with many specific models drawn from empirical evidence obtained in laboratory experiments employing verbal stimulus events, especially the psychological paradigm of remembering as the function of an information-processing system.
Although it is an internally consistent set of concepts, it contains no original elements, and Tulving insists that it is neither a theory nor a Kuhnian paradigm and that it makes no attempt at explanation or prediction of specific phenomena (Tulving, 1982, p. 129). The GAPS model is unique in the fact that it focuses on episodic memory and it conceptualizes Ecphory and Conversion as distinct sub-processes of retrieval (Tulving, 1982, p. 130).


Tulving developed GAPS as a general, unconstrained model to be challenged and improved by further memory research. GAPS is abstract in that many of the processes within it are unexplained black boxes. Although based loosely on the information-processing view of memory, it de-emphasizes the preoccupation with the structure of the mind as a machine in favor of a more evolutionary approach in which the user’s cognitive environment and recollective experience cannot be ignored. GAPS focuses on episodic memories that are specific, and often unique, to the individual, and this study investigated the possibilities that episodic memories show for use as the shared secret between the human and the IT system. As discussed in Chapter 1, the GAPS framework includes four Observables in memory research: the Original Event, the Interpolated Event, the Retrieval Cue, and Memory Performance. The GAPS includes two distinct modes: Encoding and Retrieval, and both are necessary to complete an act of remembering. Despite this conceptual distinction, Tulving argues that rememberers typically cannot detect the difference between the two modes (Tulving, 1982, p. 142), and that the processes of Recoding and Ecphory are closely related to, and almost indistinguishable from, each other. He conceptualizes them as implying each other, although Ecphory is necessarily a conscious event, while Recoding is often unconscious (Tulving, 1982, p. 178). The unconscious Recoding of the Engram is problematic for password performance since binary pass / fail mechanisms typically rely on the precise entry of the password for their security. The GAPS framework guided this study on practical, theoretical and methodological grounds, and Figure 1.2 shows the preliminary Password GAPS Model used in this study. 
Because the GAPS model focuses on events of cue and recall, the investigator considered it well suited for the purpose of this exploratory study of schemes designed to assist users with the specific events of password generation, recall, and prompted input. Its observables correspond closely with the generation, recall, and input of the password on prompt, and they are actions that can be empirically observed and tested for performance in real world IT authentication situations. Its processes and states provide a framework to understand the internal events necessary for successful recall, and it provides a temporal framework for the methodological treatment of subject groups for this study.


Figure 7.1: Password-generation Schemes within the Password GAPS Model

The preliminary Password GAPS Model differs from Tulving’s GAPS in several significant aspects. First, Tulving’s Original Event is ideally combined with other events or memories to increase the unexpectedness of the Password Engram. Second, Tulving’s Interpolated Event consists of the password-generation schemes and password examples introduced by the study’s password-generation instruments, and can optionally include the formulation and writing of the password on physical media as an Observable. Third, Tulving’s Recoding Process is limited to password generation, an event that is optionally observable as well. Fourth, all Recoded Engrams are Password Engrams, and participants were free to write their passwords on physical media, thus blurring the distinction between Hypothetical State and Observable. Fifth, the Retrieval Cue is limited to the Login Prompt. Sixth, Memory Performance is limited to the specific case of successful Password Input.

Figure 7.2: Study Treatments within the Password GAPS Model

This study utilized three interventions to assist users in the generation, input, and memorability of strong passwords. These interventions, provided to treatment groups during the password generation stage, were (i) providing multiple password-generation schemes, (ii) giving a variety of practical examples of strong passwords generated with them, and (iii) requiring multiple reentry of the newly generated password.

Figure 7.1 displays the password-generation schemes within the Password GAPS Model. This study introduced participants to five password-generation schemes, which take the place of Tulving’s Interpolated Event. Participants recoded their Original Events using these schemes to generate their Password Engrams. Because participants could use alternative schemes with which they were familiar, these schemes also qualify as Original Engrams.

Figure 7.2 includes the Step 3 password examples and reentries within the Password GAPS Model. Step 3 instruments provided groups 1 and 3 with only one example password per scheme, group 2 with five examples, and group 4 with ten examples. Participants from groups 1 and 2 entered their passwords twice in the generation stage of Step 3, and participants from groups 3 and 4 entered their passwords five times.

7.3 Think-aloud Findings

The advantage of the think-aloud interview process is that it made many of the hypothetical “elements of episodic memory” in the password-generation stages into observables. Tulving’s GAPS Model includes only four observables, and considers the Hypothetical Processes and States to be unobservable. Because the think-aloud interviewees expressed their conscious mental states and thought processes during the interview, these hypotheticals within the Password GAPS Model became observable to the investigator through self-report data.

Table 7.2 displays the details of think-aloud interviewee performance in the study. Because interviewees were not required to reenter their passwords into the online authentication server during the interview, and because the only difference between the groups was the number of example passwords supplied, the investigator did not assign any of them to group 3, which required five entries on enrollment. Because of time constraints and because all interviewees wrote their new passwords on paper, they did not reenter their passwords during the interview. After the interview sessions, the investigator assigned the interviewees to groups 1, 2, and 4, and those in group 4 reentered their passwords four additional times at that time according to the protocol for that group.

Table 7.2: Think-aloud Participant Performance

Group  Password                        Scheme      F. Scheme  Failures  Attempts    %
1      130390THStNorthBergen           Address     Address           0         6    0
1      2apenn-adamsburgroad            Address     Address           0         8    0
1      Bluejay1CB323038675309          Own         Memory            5         5  100
1      801SW56TERR,Plantation,FL       Address                       1         3   33
1      ILT#MGITa3DBSIaBWASG            Acrostic
1      UnknwPrdigy200begttctr          Acrostic
1      pineTREESareTREESwithPINE357    Nonsense
1      BUTCHERblockcantstop            Nonsense
       Group 1 total                                                 6        21   29
2      954748MY0ldNum1s3298            Confession  Nonsense         12        14   86
2      BabyBlue87HondaAccordDX         Memory                        6         8   75
2      15055W.TharpesStFL32303         Address
2      myfavoritetripwastocanadA       Memory
2      7ksd3fru3fubar@063084           Memory
2      ShoesOffTheOtherFoot            Nonsense
       Group 2 total                                                18        22   82
4      C0mm@nder1PWN1$yourG@wd         Memory      Acrostic          7         9   78
4      8080ShadyGroveRdGrandridge      Address     Address           8        15   53
4      BartendingGetsMeLaid5TimesaDay  Confession  Memory            1         8   12
4      MyLoBaCa2SuMeThWoWee            Acrostic    Own              23        25   92
4      11117SW134thPLACE

Eight of the original nineteen interviewees completed the eleven Steps of the study. Among them, those using the Old Address scheme had the lowest authentication failure rate, and among the groups, group 1 had the lowest failure rate. Most think-aloud interview volunteers found the schemes to be useful, especially when accompanied by multiple examples, but several expressed difficulties encountered in long password generation. Two participants became confused about the Acrostic scheme, and expressed that the examples did not help them in the process, and most interviewees found the Confession scheme example passwords to be humorous. Of the eight participants who completed all eleven Steps and generated a second password, only three of them used the same scheme again.


7.4 Log Analysis Findings

Log analysis provided a quantitative measure of participants’ performance with their passwords during the study. The Password GAPS Model conceptualized password generation, recall, and input at the prompt as a special category of the verbal cue experiments used in memory research. Rather than the remembering of a random, and often meaningless, TBR item that forms the empirical basis of GAPS, it sought to establish the Password Engram as a fixed episodic memory that was not subject to the recoding and interference of normal episodic memory. The assumption was that the repetition caused by repeated logins would fix the Password Engram as a familiar pattern within the conscious and tactile experience of the user, thus leading to improved remembrance and lower input failure rates.

As reported in Section 5.4, the seven participants using the Acrostic scheme experienced a 65% failure rate, the fourteen using the Old Address scheme had a 68% failure rate, the fourteen using the Confession scheme had a 32% failure rate, the fourteen using the Old Memory scheme had a 47% failure rate, the five using the Unexpected Nonsense scheme had a 31% failure rate, and the eleven using their own scheme had a 52% failure rate. These data suggest a memorability advantage for passwords made using the Unexpected Nonsense scheme, and relative disadvantages for the Acrostic and Old Address schemes. Participants using the Confession and Unexpected Nonsense schemes had the lowest authentication failure rates, while those using the Acrostic and Old Address schemes experienced the highest failure rates.

As indicated in Section 6.4, the seventeen participants in group 1 experienced a 32% failure rate, the sixteen in group 2 had a 45% failure rate, the fifteen in group 3 had a 43% failure rate, and the seventeen in group 4 had a 51% failure rate.
These data suggest that participants in groups 2 and 4, who were supplied with 5 and 10 example passwords, respectively, had higher failure rates than those in groups 1 and 3. Thus, despite expectations, participants in group 1, which received no special assistance during the password-generation process, outperformed all others in terms of password recall and successful input.


7.5 Survey Findings

Although this study sought to help users with the management of long passwords, participants were well aware of other types of authentication, and many expressed an interest in using them. Table 7.3 lists participants’ preferences in alternative authentication methods.

Table 7.3: Alternative Authentication Preference

Authentication Method   Responses
Biometric               35
Tokenometric            4
Hash or PIN             4
Behaviorometric         1
Locometric              1
Multifactor             7
N/A                     4

Participants responded overwhelmingly in favor of biometric methods, including fingerprint, thumbprint, and retina scan. Although many participants expressed a desire for the ease of biometric authentication, most recognized its limitations in hardware requirements and remote application. Seven indicated a preference for the security advantage of multifactor authentication.

To frame the issue of password security, Survey 1 inquired about participants’ experiences with online data and identity theft and previous password management. 98.4% of participants indicated a familiarity with the phenomenon of online identity theft, 46.8% reported that they personally knew a victim, and 8% said that they had personally been victims of identity theft. Among the crimes reported were six physical or insider credit card thefts, eleven online credit card thefts, four bank credential thefts, two online community credential thefts, one eBay credential theft, and one PayPal credential theft. The eBay and PayPal thefts were via online “phishing” scams. All of these instances of identity theft reportedly involved the compromise of passwords.

Participants in this study were above average in computer skills, and many worked in the IT sector. Despite their familiarity with security issues and identity theft, 90.3% of respondents currently used passwords from six to eleven characters. Table 7.4 lists survey results on password change frequency. Fully 20% of respondents responded that they never change their passwords unless required to do so by system policy, and 64% reported that they change their passwords once per year or less. 10% responded that they change their passwords at least once per month.

Table 7.4: Participant Password Change Frequency

Password Changes per Year   Percentage of Respondents
Never                       20%
<1                          32%
1                           12%
2                           16%
4                           8%
6                           4%
>12                         10%

To test scheme preference at the end of the study, Survey 8 asked participants to generate a new password. Table 7.5 lists self-report data of the password-generation scheme choices of respondents to Survey 8. Notably, the Unexpected Nonsense and Old Address schemes gained popularity among respondents, while alternative schemes lost favor. Despite the entropy advantage provided by the Acrostic scheme, none of the three participants who completed Survey 8 chose it again. By contrast, fully 60% of participants who used the weaker Old Address scheme and 67% of those who used the Unexpected Nonsense scheme used them again. Participants using acrostically generated passwords had high dropout and failure rates in this study, and it is telling that none of those who completed all eleven Steps made an acrostic password at the end.

Passwords made using the Unexpected Nonsense and Old Address schemes are based on familiar common phrases or personal history and often include dictionary words. These features are the sources of their relative weakness, but make password generation relatively quick and easy. Because password generation is an infrequent creative process, participants were attracted to the ease that these schemes provided in the process.


Table 7.5: Password-generation Scheme Choice

Scheme      Percentage  Final Percentage  Final Scheme                                 Percentage Repeated
Acrostic    6%          4%                Address, Nonsense, Own                       0% (3)
Address     20%         32%               Address, Confession, Memory, Nonsense, Own   60% (10)
Confession  24%         20%               Address, Confession, Memory, Nonsense        42% (12)
Memory      20%         14%               Acrostic, Address, Confession, Memory, Own   20% (10)
Nonsense    12%         28%               Address, Nonsense                            67% (6)
Own         20%         6%                Address, Confession, Memory, Nonsense        0% (10)

7.6 Chapter Summary

This study utilized a modified version of Tulving’s GAPS framework, introduced in Chapters 1 and 2, to investigate areas of potential improvement in the security and usability of robust passwords. This chapter assesses the applicability of the resultant Password GAPS Model for the study of the usability of secure memometric authentication systems. Section 7.2 described the GAPS framework and the modifications made to it for the purpose of this study. Section 7.3 reported think-aloud data from the interviewees in Step 3, Section 7.4 discussed the log analysis of Sections 5.3 and 6.5 on the conceptual level, and Section 7.5 revisited the survey findings from Sections 5.5 and 6.6. The Password GAPS Model, as formulated and operationalized for this study, proved to be inadequate for the goals of this study. In particular, it failed to predict what interventions would improve password performance. First, although all passwords used in this study were very strong by industry standards, analysis suggested that the Acrostic scheme produced the strongest among them. Nevertheless, even participants who expressed that security was their primary


concern did not adopt the Acrostic initially or finally, and the schemes that produced the weakest passwords, the Old Address and Unexpected Nonsense, were the most popular. Second, as reported in Chapter 6, the provision of password examples during the generation stage seemed to have no positive effect on password performance. Third, as reported in Chapter 6, multiple reentries of passwords during the generation stage also had no correlation with improved subsequent password performance. Section 8.6 suggests possible revisions to the Password GAPS model and alternative research designs that may produce improved password performance.


CHAPTER 8 CONCLUSIONS AND FUTURE RESEARCH

8.1 Introduction

The preceding four chapters reported the findings of this study in the investigation of the specific research questions listed in Section 3.2. This final chapter summarizes these findings, assesses the achievement of study objectives, suggests future research trajectories toward the long-term goal of discovering effective means of assisting users with robust password use, and makes practical recommendations for stakeholders. Section 8.2 evaluates the usefulness of password-generation schemes for meeting the password challenge in terms of the strength, usability, and memorability of the passwords that participants generated using them. Section 8.3 assesses the study’s research design and evaluates the applicability of the study treatments to answering the research question into the effect of supplying examples and requiring reentry on subsequent password performance. Section 8.4 discusses the implications and impacts of this study on IT security stakeholders. Section 8.5 assesses the applicability of the Password GAPS conceptual framework to password research. Section 8.6 suggests further research strategies, including possible reformations of the Password GAPS Model. Section 8.7 makes specific, practical recommendations in terms of security practice, security policy, and user education. Finally, Section 8.8 summarizes findings and achievements in terms of meeting stated objectives.

8.2 Password-generation Scheme Assessment

To explore the relative effectiveness of the various schemes, this study allowed participants to use any of five suggested password-generation schemes or one of their own choosing. This approach assumed that (i) self-selection would reveal participant preferences among the individual schemes, and that (ii) participant interest in a particular scheme would increase password usability. Thus, it did not directly test the schemes by randomly assigning them to groups. As evident in the resultant passwords, Table 5.1, the distinctions between passwords made using these five schemes can be subtle, and the desirable techniques of chunking, scheme


combining, and entropy enhancement tend to blur the lines between the schemes. As examples, participants chunked old fragments of (e.g. 1PUMPKINayanna@MANlamar#KIKIvere), incorporated favorite words, nicknames, or numbers (e.g. BenneTt*AmanDa2256+osx5), and combined schemes unexpectedly (e.g. tupacSHAKUR&BONE!east893st). Entropy enhancement also naturally leads to unexpectedness (e.g. seNorCHEEKS!ismyKRazydog, thePurPlePorPoiselosthis19

Table 8.1: Authentication Failure Rate by Scheme

Scheme      Failures  Attempts  Failure Rate
Acrostic    30        46        65%
Address     72        106       68%
Confession  35        109       32%
Memory      46        98        47%
Nonsense    10        32        31%
Own         43        83        52%

particularly in terms of entropy per character, the participants who used it experienced a 65% failure rate. This high failure rate suggests that the acrostic password, which is the most cryptic of the five, even in plaintext, is the most difficult to recall and accurately input. The Old Address scheme produced passwords that were generally weak and susceptible to social engineering attacks, and participants who used it suffered an even greater 68% failure rate.


Passwords generated using the Confession and Nonsense schemes resulted in the lowest failure rates of all, possibly because of the highly personal or familiar mental images they created in the minds of participants. These findings suggest that the Acrostic affords the highest level of protection, and that the Confession and Nonsense schemes may be most effective for users with recall challenges.
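The failure rates in Table 8.1 follow directly from the failure and attempt counts. As a minimal sketch (the function and variable names are illustrative and are not part of the study's instruments), the rates can be recomputed and ranked from lowest to highest as follows:

```python
# Failure and attempt counts copied from Table 8.1.
TABLE_8_1 = {
    "Acrostic": (30, 46),
    "Address": (72, 106),
    "Confession": (35, 109),
    "Memory": (46, 98),
    "Nonsense": (10, 32),
    "Own": (43, 83),
}


def failure_rate(failures: int, attempts: int) -> float:
    """Return the share of attempts that failed, as a fraction from 0 to 1."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return failures / attempts


def rates_by_scheme(table: dict) -> dict:
    """Map each scheme to its failure rate, ordered from lowest to highest."""
    rates = {scheme: failure_rate(f, a) for scheme, (f, a) in table.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    for scheme, rate in rates_by_scheme(TABLE_8_1).items():
        print(f"{scheme:<10} {rate:.0%}")
```

Ranking by rate places the Nonsense and Confession schemes at the low end and the Acrostic and Address schemes at the high end, which matches the ordering discussed above. The sketch says nothing about statistical significance, since the schemes were self-selected rather than randomly assigned.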

8.3 Research Design Assessment

The preliminary Password GAPS Model, Figure 1.2, modified Tulving’s GAPS framework in two fundamental ways specific to the act of remembrance needed to meet the password challenge. First, it incorporated password-generation schemes and practical examples of passwords generated using each scheme, as illustrated in Figure 7.1. Second, it reconceptualized the GAPS Recoding-Ecphory-Conversion sequence into a Generation-Prompt-Input loop, as illustrated in Figure 7.2, to investigate the effect of multiple password reentry during the generation stage on subsequent password performance. Thus, the Password GAPS Model specifically (i) introduced various numbers of example passwords to respective groups, and (ii) required various numbers of immediate password reentries during the password-generation stage to study their effects on subsequent password performance. The first test explored the effect of providing various numbers of practical examples of passwords generated using each scheme. This served to reiterate the GAPS Interpolated Event / Recoding sequence with the aim of recalling the participant’s Original Engrams for use as the Password Engram. The second test explored the effect of iteration of the GAPS central Recoding, Ecphory, and Conversion sequence. The reiteration imposed on treatment groups in this study operationalized this sequence as a mandatory iterative loop to explore the effect of such repetition on subsequent recall. Conceptually, the reentry of passwords during the generation stage aimed to reinforce the semantic content, format, and tactile input unique to the password with the assumption that improved subsequent recall and input would result. The investigator expected that examples and reentry during the generation stage would result in improved subsequent password recall and input. 
Group 1, however, which received only one example password, and which performed only one reentry of the candidate password during Step 3, had the lowest failure rate during subsequent authentication attempts. The seventeen participants in group 1 experienced a 32% failure rate, the sixteen in group 2 had a 45% failure


rate, the fifteen in group 3 had a 43% failure rate, and the seventeen in group 4 had a 51% failure rate. These data suggest that participants in groups 2 and 4, who were supplied with 5 and 10 example passwords, respectively, had higher failure rates than those in groups 1 and 3. Thus, despite initial expectations, participants in group 1, who received no special assistance during the password-generation process, outperformed all others in terms of successful password recall and input. Although these findings suggest that the specific interventions used did not affect future password performance, they are nonetheless encouraging because of the great strength of the passwords generated. These findings indicate that there are more powerful factors in play than simple generation stage treatments used in this study, and point to the need for further research, as discussed in Section 8.6.

8.4 Implications and Impacts

The findings of this study reflect the complexity of human behavior pertaining to the process of robust memometric authentication. Previous password research has noted the difficulty of getting users to use secure passwords, and although the robust passwords used in this study proved problematic for most participants, nearly a third of participants experienced no difficulty at all with them. Specifically, participants found password generation to be relatively straightforward, but long-term recall and accurate input to be much more problematic. Despite the challenges involved in the management of such robust passwords, 31% of participants experienced a 100% success rate when using their study passwords, and this suggests that future password research needs to focus on ways to target those users who encounter recall or input failure. Section 8.6 includes possible research strategies to enquire specifically into the causes of authentication failure among such users. Participants demonstrated a wide variety of password generation styles, and many used creative combinations of episodic memories and entropy enhancement. It is encouraging that all participants completing Step 3 generated a password with an entropy over 50 bits, and, as evident in Table 4.5, several participants managed passwords containing over 30 characters and over 140 bits of entropy. This level of security is very high by industry standards, and such passwords provide protection against all but the most resourceful attackers. A key finding was that accurate input of robust passwords with entropy enhancement into authentication mechanisms was the greatest cause of authentication failure. To authenticate


participants, this study used the .htaccess protocol, which demanded 100% accurate password input for success as per industry standard, and which obfuscates the password during input into the browser window. Thus, participants who otherwise recalled their passwords experienced failure if they wrongly input even one character among the twenty or more. This is partially because of the semantic unexpectedness caused by entropy enhancement within the password and partially because of the obfuscation used by the authentication mechanism. Table 8.2 lists the percentages of respondents who reported that they remembered their passwords, but mistyped them while authenticating for Surveys 2 through 7. Although there was slight improvement over time, mistyping these long passwords remained the major source of authentication failure. According to previous studies, the most common user frustration with password management is recall, and Table 8.2 also lists the percentages of participants who

Table 8.2: Password Input and Recall Error Rates

Survey  Mistyped Password  Forgotten Password  Referred to Hardcopy
2       10.3%              4.4%                14.7%
3       14.3%              8.2%                8.2%
4       9.3%               0%                  7.4%
5       3.7%               1.8%                3.7%
6       10.4%              0%                  4.2%
7       7.7%               3.8%                1.9%

reported forgotten passwords, and those who needed to refer to written versions of their passwords. Overall, while participants made gradual improvement in password recall over the course of the study, the most distinct trend that emerged was a dramatically lower need to refer to a hardcopy. Thus a significant implication of this study is that, with the password-generation advice and practical examples given, all participants were capable of generating robust passwords. Additionally, nearly one third of users were capable of using 20-character passwords with no further assistance over a seven-week period. Although robust password use led to


increased mistyping and recall failure, participants demonstrated improving recall rates and much less need to look at the paper version with repeated use. Another implication of this study is that the specific generation stage interventions used were not sufficient to improve subsequent password recall and input success. Nevertheless, the interventions helped some participants in the creative process of generating passwords. Therefore, password-generation scheme education and example provision can be considered useful for increasing the entropy of long passwords. The findings of this study have many potential impacts on the stakeholders of IT security practice and development. These stakeholders include password users; organizations, system owners, employers, managers, governments, etc. with security policies and digital resources to protect with passwords; system administrators and security professionals who directly manage authentication mechanisms and deal with corresponding user issues; software and hardware developers who deploy memometric authentication; and researchers in HCISec. One impact of this study on users is that password-generation schemes and practical examples using them are beneficial to the process of generating the robust passwords increasingly required of them. This study also demonstrated that many users are quite capable of managing robust passwords without the frustrations of recall and input failure. Another impact is that the confessional and nonsensical password generation schemes are potentially most suited for use by users who are prone to forget passwords. A potential impact for organizations is that the vulnerabilities of weak passwords can be avoided by instituting and enforcing robust password policies, such as requiring passwords over 15 characters and assisting users with them. 
The advantages of memometric authentication listed in Chapter 2 can be extended, even in today’s threat environment, although more resources should be dedicated to targeted user education and training in the management of robust passwords. Specifically, the significant vulnerability associated with the loss of notebook computers can be dramatically reduced using encrypted drives protected by high-entropy passwords. Another impact for organizations is that single sign-on memometric mechanisms can be used in combination with locometric techniques so that the robust password is required only once for a 12-hour period, and allowing short passwords after interim screen locks. One impact for administrators is that the weakest link in information security is dramatically strengthened by passwords over 14 characters. Another impact is that the


exponential strength gains provided by robust passwords enable a significant increase in the number of input errors allowed for authorized user login before assuming an attack and locking the account. For example, if a password contains 100 bits of entropy, and if a short delay that is not noticeable to the user is deployed to thwart automated remote attacks, there is only insignificant risk added by allowing dozens, or even hundreds, of additional user authentication attempts before locking the account. This will avoid many of the common costs and inconveniences caused by help desk and administrator intervention. The caveat remains, however, concerning key-logging spyware that can detect the plaintext of any password. Another way of making password generation and recall a more social and holistic experience for the user, as in Bartlett’s model of remembrance, is the “password day” (Burnett, 2006, pp. 125-8). Such a password day is an informal, semi-annual event, sponsored by the management, during which the focus is exclusively on organizational information security in the form of password generation and refinement. There are many potential impacts for security software and hardware developers. As noted in Chapter 3, most alternative, multifactor, and threshold authentication schemes still rely on passwords at some level, despite the fact that most of the alternative authentication mechanisms being proposed or developed are intended to overcome the password “problem.” Robust password practice greatly reduces the inherent vulnerabilities of memometrics, and lessens the need for costly, proprietary, non-standard, or even multi-factor alternatives. Another impact of this study is that assistive technologies can synergistically extend the usefulness of passwords. 
As in all information security research, progress is incremental, and advances are cumulative, but there is a defender's advantage in information security, and technology is appearing to assist user interaction with access control mechanisms. For example, this study found that input error was the greatest single cause of authentication failure, and this could be minimized with technologies such as the “rolling blackout” that displays passwords during input to authorized users while providing variable levels of protection against voyeurs. Another impact is the increasing suitability of single sign-on mechanisms, since long passwords are themselves a cause of frustration for users and they should not be required after short lockouts. Other salient examples are hardware keyrings, robust protocols such as SSH and SSL, and session key amplification techniques. In addition all these technological improvements could


be combined with long passwords such as those used in this study into robust multi-factor mechanisms. A significant impact on HCISec researchers is that 31% of participants in this study proved fully capable of successfully using robust passwords, and that the usefulness of memometric authentication can thus be extended by focusing on users with difficulties. Much work remains, however, in determining ways to help the majority of users with passwords, and that should be the focus of future password research. The treatments used in the password-generation stage proved to be useful for most participants, at least in terms of generating creative and strong passwords, and the instruments used could be refined and used for future research. Specific recommendations for future research are listed in Section 8.6.
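The lockout argument made earlier in this section, that a 100-bit password tolerates many extra login attempts at negligible added risk, can be checked with simple arithmetic. The following is a rough sketch under an idealized assumption that is not made in the study itself: the password is treated as equally likely to be any of 2^100 values, and the attacker never repeats a guess. Human-chosen passwords are not uniformly distributed, so real risk is higher than this bound suggests. The 500-attempt limit and the 0.1-second delay are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def guess_success_probability(entropy_bits: int, allowed_attempts: int) -> Fraction:
    """Chance that an attacker hits the password within the allowed attempts.

    Idealized model: the password is uniformly drawn from 2**entropy_bits
    values and every guess is distinct.
    """
    if entropy_bits < 0 or allowed_attempts < 0:
        raise ValueError("arguments must be non-negative")
    keyspace = 2 ** entropy_bits
    return Fraction(min(allowed_attempts, keyspace), keyspace)


def expected_years_to_guess(entropy_bits: int, delay_seconds: float) -> float:
    """Approximate average search time when each guess costs delay_seconds."""
    average_guesses = 2 ** (entropy_bits - 1)  # roughly half the keyspace
    return average_guesses * delay_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical policy: 500 attempts before lockout, 0.1 s delay per attempt.
    risk = guess_success_probability(100, 500)
    print(f"Chance of a correct guess before lockout: {float(risk):.2e}")
    print(f"Average online search time: {expected_years_to_guess(100, 0.1):.2e} years")
```

Under these assumptions, raising the lockout threshold from a handful of attempts to several hundred changes the attacker's chance of success by an amount far too small to matter in practice. As the text notes, none of this helps against key-logging spyware, which captures the password regardless of its entropy.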

8.5 Conceptual Framework Assessment

There were no previously established conceptual frameworks specifically designed for the HCISec study of password usability and memorability. This study turned to cognitive psychology and modified Tulving’s GAPS Model, derived from cue-response memory testing, to conceptualize the sequence of events involved in password remembrance. The resultant Password GAPS Model, Figure 1.2, initially modified the GAPS framework in two ways. First, it provided practical examples to iterate Tulving’s Interpolated Event stage for each password-generation scheme introduced to participants. Second, it reconceptualized Tulving’s Recoding-Ecphory-Conversion sequence into a repeatable Generation-Prompt-Input loop to explore the effect of multiple password reentry during the generation stage. Participants in two treatment groups reentered their passwords repeatedly to create a series of events intended to reinforce the idea, the format, and the tactile input unique to the generated text-based password with the assumption that improved subsequent recall and input would result. Study findings, however, indicated no positive correlation between the interventions and improved performance, and this indicates the need to continue the search for means to assist those users who need further assistance. The following section discusses possible reformulations of the Password GAPS Model and strategies for targeted focus on problem users, and sketches research designs to test them.


8.6 Future Research

This multi-method study asked fundamental questions concerning several aspects of robust password management, and found that all participants were capable of generating very strong and distinctive passwords. Many new research questions emerged, however, especially concerning the improvement of subsequent use of the password. The wide discrepancy among participants’ authentication success rates that emerged in this study and the lack of correlation between password performance and study treatments suggest greater complexity in the human behavior involved in meeting the password challenge than the investigator initially assumed, and that other factors have a greater effect on password performance. The remainder of this section introduces questions and strategies for future research. An obvious future research question concerns how to achieve greater participation and larger treatment groups to more clearly distinguish treatment effects within the wider population of users. Further questions into the reasons for the observed effectiveness of the Confession and Unexpected Nonsense schemes may discover user memory characteristics useful for users of other schemes. Research into reformulating or replacing the Password GAPS conceptual model, with the aim of discovering more effective treatments, is needed. Finally, the most pressing future questions concern follow-up assistance for those users who experience authentication failure and determination of the specific causes of such failure. Specific research is also needed into a conceptual model that acknowledges the wide variety of user capacity and allows for special treatment of problem cases on a contingency basis. In many senses, the environment surrounding this study represented a bad case scenario. Participants had no digital resources to protect, nor did they face any penalty for non-compliance. 
Although they had a chance of winning prizes for completing the eleven steps of the study, they were not stakeholders in the sense of employees of an organization, account holders at a financial institution, or administrators of online resources, etc. Further, although they were computer literate and familiar with password management, many of them became distracted, lost interest in the study, and partially or fully dropped out of it. A simple future research strategy to remedy this phenomenon is to rerun the study with live passwords and user accounts in a real-world scenario. For example, with management support, an organization could implement a 20-character password policy with training sessions and instruments such as those used in this study. Because participants would have a much greater stake in a successful authentication,


the dropout rate should be dramatically lower, although resets and help desk requests would consequently be higher. With increased or even daily usage, higher success rates could be expected. If previous authentication logs using weaker passwords were available, a baseline could be drawn to compare resulting differences. In such a scenario, the research questions remain the same, and short online surveys could collect data concerning user experience longitudinally over a much longer timeframe. Because this study explored the feasibility of using very long passwords, it allowed all participants to write their passwords on paper, or to store them electronically. Although it did not count automatic authentication by browsers in the determination of authentication success, it did not specifically distinguish between reading the written password and memory failure on the part of participants. Although findings indicated that participants experienced less need to refer to written versions of their passwords with use, future research disambiguating referral to a written password and pure recall is desirable. Much work remains in meeting the password challenge, especially among those users who experience difficulties. It is relatively easy to specify the requirements of a robust password, but inspiring users to dedicate the cognitive will to effectively use them remains problematic. For example, Burnett suggests the following practical tests for robust passwords: 1. More than twenty characters; 2. Inclusion of capitals, numbers, and symbols; 3. Ease of input; 4. Younger than six months; 5. Unique to the world; 6. Securely stored in hardcopy, spreadsheet, or key-ring; 7. Unknown to other people; 8. Three elements; and 9. Resist googling (Burnett, 2006), and the passwords generated in this study easily passed most of these tests. 
All were over twenty characters, and the vast majority (except joisfromwashingtondc) included capitals, numbers, or symbols, but many proved difficult for participants to recall and input. As examples, MyLoBaCa2SuMeThWoWee resulted in a 92% failure rate, and thePurPlePorPoiselosthis19


resisted searches on the google.com engine, but many did not contain three distinct elements. Thus, even Burnett’s recommendations are concerned mostly with the security, not the usability, of robust password management. This continues the trend, noted in Chapter 2, of the systems, rather than the user, approach to memometric authentication. This study took a distinctly user-centric approach, but it did not impose Burnett’s three-element suggestion, and this may prove a promising area for future research. Three unrelated elements naturally create a high degree of unexpectedness within a password, and if these elements were semantic non-sequiturs, they could be composed primarily of easy-to-type characters, thus increasing input reliability. Future research into the effects of frequency of password use on authentication success rates is desirable. This study explored only the effects of once-weekly authentication over a seven-week period, but potentially productive alternatives include daily, biweekly, or monthly logins and longer runs to uncover the effects of frequency on performance. Since the Password GAPS Model and the research design built on it failed to achieve significant password performance improvement, the model must be either revised or rejected. A possible problem with the Password GAPS Model may be its reliance on episodic memory. As shown in Table 7.1, Tulving notes that the retrieval of episodic memory necessarily recodes an Engram, and that the episodic Engram is much more susceptible to interference from other mental processes than the semantic Engram. These are highly undesirable features for effective and reliable password recall and input, a specific retrieval case in which the precise formulation of the shared secret must be accurately input for successful and repeatable authentication, and that secret must be distinguished from other secrets shared with other authentication systems. 
A potential strategy to overcome this is to use semantic memory as the basis for passwords. As Tulving notes, semantic memory is much less susceptible to Recoding and interference than episodic memory. Although these characteristics are desirable, passwords based on semantic memory could fail some of Burnett’s tests listed above. For example, most well known semantic memory could not be younger than six months, unique to the world, unknown to other people, or resist googling. Long semantic passwords using unexpected, semantically unrelated elements and entropy enhancement techniques could prove effective, and after such modification, they would also share the episodic traits of being unique to the rememberer, and being based on a creation event. Such hybrid memories blur the semantic / episodic distinction and thus may again become susceptible to Recoding and interference.
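Only some of Burnett's tests listed above can be evaluated from the password text alone. As a minimal sketch, the first two tests (length and character composition) might be checked as follows. The function name, the returned keys, and the use of Python's `string.punctuation` as the definition of "symbol" are assumptions made for illustration and do not come from Burnett or from this study. The remaining tests (age, uniqueness, storage, secrecy, three elements, and resistance to searching) depend on information outside the password and are not attempted here.

```python
import string


def check_burnett_subset(password: str, min_length: int = 20) -> dict:
    """Check the tests that can be evaluated from the password text alone.

    min_length follows the "more than twenty characters" test, so a password
    must be strictly longer than min_length to pass the length check.
    """
    return {
        "more_than_min_length": len(password) > min_length,
        "has_capital": any(c.isupper() for c in password),
        "has_number": any(c.isdigit() for c in password),
        # Assumption: a "symbol" is any ASCII punctuation character.
        "has_symbol": any(c in string.punctuation for c in password),
    }


if __name__ == "__main__":
    # Example passwords quoted in Section 8.2.
    for example in ("tupacSHAKUR&BONE!east893st", "seNorCHEEKS!ismyKRazydog"):
        print(example, check_burnett_subset(example))
```

A check of this kind addresses only the security-oriented tests; it gives no indication of how easily a password can be recalled or typed, which is the usability gap discussed in this section.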


Research into overcoming the unwanted effects of Recoding and interference is central to increasing the effectiveness of passwords based on episodic memories. Just as the details of vivid and emotional dreams quickly fade for those who do not write them down, the details of even the

Figure 8.1: Reformulated Password GAPS Model


most creative, humorous, and distinctive of passwords can fall victim to the forces of Recoding and interference. One possible strategy is to reformulate the Password GAPS Model as indicated in Figure 8.1. This approach conceptualizes the formulation and recall of events surrounding the Password Engram as a specific Recollective Experience in the user’s life. In Tulving’s model, the Recollective Experience is the end of an act of remembering and the conscious awareness of Ecphoric Information, which in turn is the product of the experience of conscious Ecphory and a “subset of the information available in the Engram” (Tulving, 1982, p. 184). In the reformulated Password GAPS Model, such a subset could be consciously limited to a fixed expression of a unique, personalized password that may prove more resistant to the Recoding and interference tendencies of episodic memory. Thus, a reformulated Password GAPS Model might overcome the limitations of episodic memory by making the Recollective Experience the central concept in the loop. The reformulated Password GAPS Model in Figure 8.1 conceptualizes password recall as a loop that situates the Recollective Experience as a central conscious event in the cycle of password ecphory. Such cognitive fixing of the new password generation sequence in the participant’s Recollective Experience might serve to reinforce the idea, the format, and the tactile input unique to the generated text-based password and to improve subsequent performance. In this model, the Recollective Experience in turn becomes an Original Event, but Recoding is suppressed by emphasizing it strictly as identical to the Password Engram and the Ecphoric Information to be retrieved at the login prompt. To operationalize this, users who encounter recall failure can be invited to subsequent sessions to recall the Recollective Experience and further fix it in the consciousness through repetition. 
Future research should enquire into ways to assist users who experience difficulties. A first step is to recognize users who have no problems, then focus on the others by testing further steps for them. Online password reminder mechanisms could be used to help whenever users experience recall failure. Participants could be required to make and secure a hardcopy of their passwords for future reference. A promising approach could apply contingency theory to the problem by recognizing that the population is composed of users with vastly different capacities of managing robust passwords. This study found that 31% of its participants were able to authenticate themselves successfully seven times over seven weeks with the instruments provided in Step 3. Since the other 69% of participants experienced some degree of failure,


further assistance could be given to them, and users who experience authentication failure could be invited to further training or to repeat the password-generation session until the password becomes fixed. The range of participant success in managing their long passwords is consistent with the history of memory thought surveyed in Chapter 2. Plato noted the variance of human mental ability, and argued that individual memory capacity was analogous to the size of the “wax tablet” on which impressions of a “signet ring” were made. Aristotle argued that emotional resonance caused strong memories and served as a hook for recall. Ebbinghaus, the pioneer of empirical memory research, argued that “very great is the dependence of retention and reproduction upon the intensity of the attention and interest which were attached to the mental states the first time they were present” (Ebbinghaus, 1913, p. 2). Later researchers discovered individual differences in long-term memory, general knowledge, learning, and memory retention. Miller saw these limitations and how chunking partially overcomes them, and Gasser applied chunking to phonemic IT passwords. Adams and Sasse argue for proactive password construction using feedback during the password construction process to increase awareness of system security and its importance. Noting the trouble that users have remembering the precise formulation of a password, Vu argues that the solution is a matter of memory self-training. Further research into leveraging emotion and meaning as the defender’s advantage is promising for both usability and security.

8.7 Recommendations

This study differed from previous password studies by requiring all participants to use passwords containing at least twenty characters, and all findings and recommendations concern the management of such robust passwords. The online security situation will undoubtedly continue to deteriorate as attackers apply greater resources to the enterprise of cracking passwords, and stronger passwords are a necessary requirement of effective security. This section makes pragmatic recommendations for the major stakeholders in IT security based on study findings.

Users should use long passwords when protecting anything of value on insecure networks and computers, such as confidential information, bank accounts, important email and shell accounts, servers providing online services, etc. This study found that all participants were capable of generating passwords containing over 50 bits of entropy, and many achieved 100% authentication rates using these passwords. Although 69% of participants experienced some degree of authentication failure, the investigator recommends long passwords. Passwords made using the Acrostic scheme should be used for maximum strength, and passwords made using the Confession and Unexpected Nonsense schemes should be used for greatest memorability. All of Burnett’s tests are useful for increasing the strength of a password, and his advice concerning the incorporation of three semantically unrelated elements is a recommended application of memory chunking. Users should also write their important passwords down, preferably in an obfuscated way decipherable only to them, and store these hardcopies as they would any other valuable. If necessary, the robust passwords that users make can be further deployed to protect encrypted folders, spreadsheets, databases, etc. containing still other passwords.

Organizations should mandate, or at least encourage, the use of strong passwords and introduce users to the schemes and examples found in the Appendices, since they have been found to help users create good passwords. This should be done as part of a larger strategy of dedicating more resources to user education and assistance, as essential components of overall IT security. Because stronger passwords have a much longer effective life, their resistance to current attacks enables much more lenient lockout and aging policies that can thus reduce two major sources of user frustration. The cost effectiveness of deploying assistive technologies, such as the rolling blackout, single sign-on protocols, etc., should be considered as they become available. Dedication of resources to password education, such as the preliminary instruments and examples used in this study, and a periodic password day, should also prove cost effective for organizations.
Especially in the case of traveling notebook computers containing sensitive data, the investigator highly recommends hard drive encryption with robust password, and possibly multifactor, authentication.

System and security administrators should recognize the advantages of training users in robust password management. Upfront and personal attention to password generation, exposure to generation schemes, examples, and tolerance should result in greater overall security and minimization of the need for resets and aggressive aging policies. With the exponential gains achievable by long passwords, the threat from brute force, dictionary, and rainbow table attackers is dramatically reduced, and greater attention can then be given to social engineering attacks such as phishing, and subterfuge attacks such as keylogging. In a corporate environment, robust password user training should engage users in the security of the enterprise and decrease any existing hostility between IT and other departments. Burnett’s concept of the password day merits consideration, since it focuses needed attention on the security problem in a non-threatening environment. Training in a governmental setting depends on the level of security involved, and robust multifactor authentication is no doubt more feasible in tightly controlled security environments. Password training in an academic institution could take advantage of the relatively high computer literacy of students, and personal users should have access to online or operating system resources to explain the need and practice of password security. In all of the above scenarios, this study’s instruments and examples could be refined and made available online for user training, and wikis or blogs could be initiated via internet or intranet to further the discourse of robust password culture.

Developers should continue the refinement of such excellent secure password and session key amplification protocols as SSH and SSL, and focus on other means of harnessing the computer’s power to exploit the differences between authorized users and attackers. The most salient example is Tognazzini’s and Blaser’s rolling blackout used in Tresor 2.2 (Tognazzini, 2005), especially since this study found that the single greatest cause of authentication failure among participants was the inaccurate input of the long password. Participants reported that they often mistyped one or more of the twenty characters in their passwords, which they otherwise recalled. This is because current mechanisms, such as the .htaccess password protection used in this study, in the attempt to thwart shoulder surfing and screen-scraping attacks, obfuscate passwords from users during the input stage, exactly when authorized users need to see them.
Additionally, multifactor authentication mechanisms and single sign-on protocols, perhaps using biometrics or locometrics, could increase security and minimize robust password reentry requirements after temporary screen locks. Finally, protocols and mechanisms, in general, should have variable security levels, to allow for different scenarios and security policies.

The ultimate purpose of this study remains unfulfilled, and researchers should investigate interventions that will result in increased authentication success rates. The password-generation scheme education and example provision used in this study proved useful for the creative process of generating long, high-entropy passwords, and the investigator recommends them as a baseline for research into more effective instruments. The investigator also recommends taking the research into real-world situations to better understand the behavioral complexity of the population of users in the search for the elusive balance between security and usability.
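
The “exponential gains achievable by long passwords” noted in these recommendations can be made concrete with a small calculation. The sketch below is not part of the study’s method. It assumes that every character is drawn uniformly at random from the 94 printable ASCII characters named in the glossary’s Search Space entry, so it gives an upper bound rather than the entropy estimates this study reports for user-chosen passwords. The function names are illustrative only.

```python
import math

# Alphabet size taken from the glossary's Search Space entry (assumption:
# every position is chosen uniformly at random from this alphabet).
PRINTABLE_ASCII = 94


def search_space_size(length: int, alphabet_size: int = PRINTABLE_ASCII) -> int:
    """Number of candidate passwords an exhaustive search would have to cover."""
    return alphabet_size ** length


def search_space_bits(length: int, alphabet_size: int = PRINTABLE_ASCII) -> float:
    """Upper-bound entropy in bits: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    # Compare a conventional eight-character password with the
    # twenty-character minimum required in this study.
    for n in (8, 20):
        print(f"{n:2d} characters: {search_space_size(n):.3e} candidates, "
              f"about {search_space_bits(n):.1f} bits")
```

Under these assumptions each additional character multiplies the attacker’s work by 94, which is why lengthening a password is more effective than any single substitution within it. Passwords built from words or phrases, like those in the Appendices, will contain fewer bits than this bound.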

8.8 Conclusions

This study sought to contribute to the usability of human interaction with secure memometric authentication systems by discovering effective means of assisting users in the generation of strong and usable passwords. To this end, it identified five password-generation schemes, and included them in instruments designed to assist participants in formulating passwords containing twenty or more characters. The twenty-character threshold was shown to provide sufficient cryptographic strength for protection against the vast majority of current attacks, and passwords used in this study ranged from 50 to over 160 bits of entropy. The schemes introduced to participants were the Acrostic, the Old Address, the Confession, the Unexpected Nonsense, and the Old Memory, and participants were free to use any among them, or to use any alternative with which they were familiar. As indicated in Figure 8.2, participants showed a slight preference for the Old Address, the Confession, and the Old Memory schemes.

Figure 8.2: Initial Scheme Preference (chart values: Old Address 21%, Old Memory 21%, Confession 21%, Own 17%, Acrostic 11%, Unexpected Nonsense 9%)

In the final Step of the study, participants generated a new password, again using any scheme of their preference, and Figure 8.3 displays those final scheme choices. After seven weeks of using their initial passwords, participants showed a slightly increased preference for the Old Address and Unexpected Nonsense schemes, and a dramatically decreased preference for their own and the Acrostic schemes.

Figure 8.3: Final Scheme Preference (chart values: Old Address 32%, Unexpected Nonsense 28%, Confession 20%, Old Memory 13%, Own 5%, Acrostic 2%)

This study also enquired into the effect of providing practical password examples and multiple password reentry immediately after generation on password performance. Despite the expectations of the investigator, Table 8.2 shows no positive correlation between the treatments listed in Table 3.2 and subsequent password performance. Specifically, group 1, which received only one example password and was required to reenter the new password only once, actually had a lower failure rate than all other groups. On the other hand, group 4, which was supplied with ten example passwords and was required to reenter passwords four additional times, actually experienced the highest failure rate.

Table 8.3: Authentication Failure Rate by Group

Group    Failures    Attempts    Failure Rate
1        36          113         32%
2        50          111         45%
3        43          99          43%
4        76          148         51%
Total    205         471         44%

These findings suggest that the password problem is complex, and that other factors have greater effect on password performance. As further evidence, 31% of participants experienced no problems whatsoever, regardless of their treatment group, while other participants had repeated failures even with their username, or with the relation between their username and their password. Table 8.3 lists the percentages, by treatment group, of these extreme cases: participants who experienced no authentication failure, and those who experienced total failure. These observations further suggest a negative correlation between study treatments and password performance, since 44% of participants who encountered no authentication failure were in group 1 and received no intervention. The participants who encountered profound difficulties at every step were almost evenly spread among the study groups, and this also confirms the negative correlation between treatment and password performance.

Table 8.4: No Failure and Total Failure by Group

Group    No Failure    Total Failure
1        44%           30%
2        25%           20%
3        12%           20%
4        19%           30%

Positive findings were that supplying example passwords led to stronger and more distinctive passwords. Several think-aloud interviewees expressed surprise at the scope and variety of possible passwords, and others quickly recognized the benefits of chunking and entropy enhancement and combined schemes and techniques in highly personalized ways. The Acrostic scheme produced the strongest passwords, but the Confession and Unexpected Nonsense schemes produced more memorable ones. The writing of the password on physical media also proved useful to many participants, and reduced the number of manual resets required.

This study experienced a high dropout rate among participants. Although the investigator personally invited dozens of candidates to participate by means of public and private invitation, only 139 actually signed consent forms. Of these, 65 completed Step 3 and generated a study password, and only 51 completed all eleven steps of the process. Although the investigator offered prizes in a raffle among participants who completed all steps, this study demanded a lot of interaction from participants and the high dropout rate was not surprising. There is evidence that dropouts were primarily due to loss of interest or miscommunication, and not because of the cognitive load introduced by having to remember the long password. Participants informed the investigator that the email filtering disruptions discussed in Chapter 5 and apathy were the two main reasons for dropping out. Dropouts may have slightly colored the observed success rates, but 78% of those monitored did not drop out.

This study was exploratory in nature, and was designed to better understand the nature of the password challenge presented by the need for long passwords. Its aim was not predictive, but to point to future research into meeting this HCISec challenge. The investigator sought to improve user performance with memometric authentication systems since information security experts had widely considered the user to be the weakest link in IT security. Despite the encouraging performance of nearly one third of participants, this study reinforces that conclusion and points to the continuing difficulty of engaging users in security processes.

Memometric authentication is both art and science. The art is to make passwords usable in terms of generation, recall, and accurate input, and the science is to make them secure from attackers. Incremental progress continues on both of these seemingly opposing trajectories, but the user remains the greatest cause of authentication failure. This study found that over 30% of participants were capable of robust password management using password-generation schemes, and that input error, rather than outright forgetfulness, was the greatest cause of authentication failure. This suggests that the most promising work remains in identifying the specific causes of authentication failure among other users who experience difficulties, and developing effective means of improving the password performance of those users.
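
The failure rates reported in Table 8.3 follow directly from the failure and attempt counts for each treatment group. As a check on that arithmetic, the minimal sketch below recomputes them. The counts are copied from the table, while the dictionary and function names are illustrative and not part of the study’s instruments.

```python
# (failures, attempts) per treatment group, copied from Table 8.3.
TABLE_8_3 = {
    1: (36, 113),
    2: (50, 111),
    3: (43, 99),
    4: (76, 148),
}


def failure_rate(failures: int, attempts: int) -> float:
    """Failure rate as a percentage of authentication attempts."""
    return 100.0 * failures / attempts


def summarize(table):
    """Return per-group rounded rates plus the pooled totals and pooled rate."""
    per_group = {g: round(failure_rate(f, a)) for g, (f, a) in table.items()}
    total_failures = sum(f for f, _ in table.values())
    total_attempts = sum(a for _, a in table.values())
    pooled = round(failure_rate(total_failures, total_attempts))
    return per_group, total_failures, total_attempts, pooled


if __name__ == "__main__":
    per_group, failures, attempts, pooled = summarize(TABLE_8_3)
    for group, rate in per_group.items():
        print(f"Group {group}: {rate}%")
    print(f"Total: {failures} failures in {attempts} attempts ({pooled}%)")
```

The pooled rate is computed from the summed counts rather than by averaging the four group percentages, because the groups made different numbers of attempts.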

APPENDIX A GLOSSARY OF TERMS (some paraphrased from wikipedia.com)

ASCII: the American Standard Code for Information Interchange

Collision attack: A collision attack on a cryptographic hash looks for different inputs to produce the same hash value (hash collision). Unlike a preimage attack, a collision attack normally aims to find an alternate input that still makes sense, rather than just a nonsense input.

Cracker: A cracker is a malicious hacker who seeks to exploit computer systems and resources.

DES: The Data Encryption Standard was an early encryption algorithm with a 56-bit key. Although superseded by AES, it is still widely used as 3DES, or Triple DES, which uses a 112-bit or 168-bit key.

Dictionary attack: A dictionary attack tries to crack a password by searching the possibilities that are most likely to succeed, typically derived from a list of words in a dictionary.

Dongle: A dongle is a small hardware device that connects to a computer to authenticate some piece of software or a user.

Dumpster diving: Dumpster diving is the practice of rummaging through commercial or residential trash to find useful items that have been discarded.

Ecphory: Term used by Semon and Tulving to denote the internal event of recall of an engram in response to a cue.

Engram: An engram is a hypothetical means by which memory traces are stored as physical or biochemical change in the brain (and other neural tissue) in response to external stimuli. The existence of engrams is posited by some scientific theories to explain the persistence of memory and how memories are stored in the brain.

Entropy: Entropy is a measure of the probability distribution of the disorder within a system, and in information systems it can be viewed as a measure of the lack of information in a sequence. Alphabet size, length, and randomness together determine the cumulative entropy of a password.

GAPS: The General Abstract Processing System is Endel Tulving’s conceptual framework for studying episodic memory.

GUI: The Graphical User Interface, in contrast to the Command Line Interface (CLI), uses graphics, iconic themes, and various input devices such as the mouse, in addition to the keyboard, for human computer interaction.

Hash: A hash is a one-way cryptographic function that transforms the plaintext password into an obfuscated, yet thoroughly deterministic, ciphertext, and that is nearly impossible to reverse.

HCI: Human-Computer Interaction is the academic study of human factors surrounding computing. The ACM uses the alternative acronym CHI.

HCISec: Human-Computer Interaction Security. A sub-discipline of HCI dealing specifically with the impact of security on HCI.

.htaccess: .htaccess is a per-directory configuration file of the Apache web server that, among other uses, can password-protect web content through HTTP authentication.

Keylogging: Keylogging was originally a diagnostic used in software development to capture the programmer’s keystrokes. It is now incorporated into spyware to bypass other security measures and obtain passwords, credit card numbers, or other sensitive data.

Keyring: A keyring is an encrypted IT hardware device (usually USB) that stores encryption keys. It can be password protected and contain other passwords.

LM, LanMan, LANManager: A widely used legacy Microsoft network protocol that creates a weak hash of the password.

Man in the middle attack: A man-in-the-middle attack allows an attacker to read, insert, and modify at will messages between two parties without the knowledge of either party.

Packet sniffing: Packet sniffers intercept and log network TCP/IP packet traffic, and then decode and analyze content.

Passphrase: A passphrase is a type of long password made of semantic strings of words and typically including space characters.

PGP: Pretty Good Privacy. A robust, highly adaptable cryptosystem developed by Phil Zimmermann. Originally open-source only, it is now also a commercial product.

Pharming: Pharming exploits DNS server software to acquire the domain name for a site, and to redirect it, typically to a fraudulent site.

Phishing: Phishing, a pun derived from “password harvesting,” is a form of online fraud using social engineering techniques. It is characterized by attempts to fraudulently acquire sensitive information, such as passwords and credit card details, by masquerading as a trustworthy person or business in an apparently official electronic communication. Phishing is typically initiated with email or an instant message.

Preimage attack: The preimage attack on a cryptographic hash is an attempt to find a message that has a specific hash value. There are two types of preimage attacks: First: given a hash h, find a message m such that hash(m) = h. Second: given a message m1, find a message m2 such that hash(m2) = hash(m1).

Rainbow table: A rainbow table is a special type of lookup table offering a time-memory tradeoff used in recovering the plaintext password from a ciphertext generated by a one-way hash. A common application is to make brute force attacks against hashed passwords more feasible.

Rootkit: A rootkit is a set of software tools deployed after gaining access to a computer system. The rootkit conceals running processes, files, or system data from authorized users.

Salt: Salt is a string of random bits often appended to a password before hashing to increase the entropy of the password and to ensure that even if two users use the same password, the resultant hashes will be distinct.

Screen scraping: Screen scraping is a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was nominally intended for human consumption, not machine interpretation.

Script kiddie: Script kiddie is a derogatory term for an inexperienced cracker who uses scripts and programs developed by others.

Search Space: The search space of a password is the alphabet from which it is drawn. On an American English keyboard, it is typically the set of 94 printable ASCII characters, but it can be extended with additional ASCII and Unicode characters.

Shoulder surfing: Shoulder surfing refers to using direct observation techniques, such as looking over someone’s shoulder, to get information.

Social Engineering: Social engineering obtains confidential information by manipulation of legitimate users, usually against policies. By this method, social engineers exploit the natural tendency of a person to trust another person’s word, rather than exploiting computer security holes.

Social Hacking: Social hacking encourages activity among online groups, often by breaking social norms. Social hacking is closely related to hacktivism, the fusion of hacking and activism, and attempts to leverage small causes to yield large effects in society.

Smart Card: A smart card is a microprocessor card of credit card dimensions with various tamper-resistant properties and capable of providing security services.

Spamming: Spamming is the abuse of electronic messaging systems to send unsolicited, bulk messages, usually in the form of email spam.

TBR Item: The “To Be Remembered” item supplied by researchers to participants in cue-recall memory testing.

Trojan Horse: The Trojan horse is a malicious program covertly delivered by legitimate software. Often the term is shortened to simply trojan, even though this turns the adjective into a noun, reversing the myth.

Zombie: A zombie computer has been compromised by a security cracker, a computer virus, or a trojan horse. Often one of many in a “botnet,” the zombie performs malicious tasks under remote direction, unbeknownst to its owner.
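
The Hash and Salt entries in this glossary can be illustrated together. The following is a minimal sketch, assuming SHA-256 as the hash function and a 16-byte salt appended to the password as the Salt entry describes; neither choice is taken from the study, and the function names are hypothetical. Deployed systems generally prefer a deliberately slow key-derivation function over a single hash pass.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, hex digest) of SHA-256 over the password followed by the salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt; the 16-byte length is an assumption
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt, digest


def verify(password: str, salt: bytes, expected_digest: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = salted_hash(password, salt)
    return hmac.compare_digest(digest, expected_digest)


if __name__ == "__main__":
    # Example password taken from the Unexpected Nonsense scheme in Appendix E.
    password = "PINKCURTAINSmeanderacrosstheocean"
    salt_a, digest_a = salted_hash(password)
    salt_b, digest_b = salted_hash(password)
    print("same password, different salts, same hash?", digest_a == digest_b)
    print("verifies with stored salt?", verify(password, salt_a, digest_a))
```

Because each call draws a new salt, two users with the same password receive different stored hashes, which is the property the Salt entry describes and which defeats a precomputed rainbow table built without that salt.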

APPENDIX B HUMAN SUBJECTS APPROVAL AND INFORMED CONSENT

APPENDIX C USER TESTING INSTRUMENTS INCLUDING THINK ALOUD PROTOCOL AND PRELIMINARY QUESTIONNAIRE

C.1 Password Generation – Thinking Aloud

The increasing threat to user identities and data requires longer passwords than people are accustomed to using. This study investigates ways to assist users in the management of long passwords to protect them from online fraud. We will look at five different password generation schemes.

1. Take a few moments and read the paragraph describing each scheme.

2. Which scheme seems most suited for you to use?

3. What specific advantages does this scheme have for you over the others in terms of usability, ease of use, and memorability?

4. Using example passwords provided, generate your own password and enter it into the system as requested.

5. What difficulties did you have in the process?

6. You may use your new password for other servers if you like.

7. You will be requested via email to input this new password weekly during this study.

8. Would you like a paper printout of the password in case you may forget it? Yes ___ No ___

C.2 Preliminary Questionnaire

1. Have you ever heard of online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

2. Do you know anyone who has experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

3. Have you ever experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

4. How many characters were in the first password you ever used? ______

5. a. How many total characters were in the longest password you have ever used? ______(Please do not reveal the password!) b. How many of those characters were upper case? ______c. How many of those characters were numbers? ______d. How many of those characters were symbols or other special characters? ______

6. Which Password generation scheme did you use to make your password in this study? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. episodic memory event e. confession / embarrassment f. my own – Please describe:______

7. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other – Please explain: ______

APPENDIX D FINAL QUESTIONNAIRE

1. Which Password generation scheme did you use to make your password? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. episodic memory event e. confession / embarrassment f. my own Please explain:______

2. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other Please explain:______

3. How many times did you enter your password during the study? a. 0 b. 1 c. 2 d. 3 e. 4 or more

4. How much trouble did you have remembering the password? a. none at all b. very little c. moderate d. quite a bit e. too much Please explain:______

5. How much trouble did you have inputting the password? a. none at all b. very little c. moderate d. quite a bit e. too much Please explain:______

APPENDIX E GROUP 1 PROTOCOL

The increasing threat to user identities and data requires longer passwords than people are accustomed to using. This study investigates ways to assist users in the management of long passwords to protect them from online fraud. We will look at five different password generation schemes.
1. Take a few moments and read the paragraph describing each scheme.
2. For this test, you are to generate a password of at least twenty characters using any of the five following methods, or a method of your own choosing. These methods are designed to help you come up with memorable passwords that are relatively easy to input, despite their length. Strengthen your password by:
a. Running all individual words together,
b. Using the caps lock key unexpectedly, and
c. Including unexpected numbers, punctuation, or symbols.

A. The Old Address. A simple password-generation technique is to spell out an old, unforgettable address, such as “819 Ash St., Keokuk, Iowa.” Strengthening yields passwords like: 819ASHST>Keokuk,iowa

B. Unexpected Nonsense. Unexpectedly nonsensical passphrases can be easy to recall and input, and their imagery makes them easy to remember. The nonsense phrase “Pink curtains meander across the ocean” can become: PINKCURTAINSmeanderacrosstheocean

C. The Acrostic. The acrostic draws only one letter (usually the first) of each word of an easily remembered phrase. Thus, Hamlet’s line, “Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them” can become: w’tnitmtstsaaooFORTUNE

D. The Old Memory. Use a personal experience to generate an answer to one of the following questions:
• What is the first and last name of your first boyfriend or girlfriend?
• Which phone number do you remember most from your childhood?
• What was your favorite place to visit as a child?
• Who was your favorite actor, musician, or artist when you were 16?

Try to draw on remote, yet unforgettable memories. A possible answer to the third question is “My favorite place was Lake Geode.” After strengthening, it becomes: myfavoriteplacewaslakeGEODE

E. The Confession. The sharing of passwords with others can be discouraged by the use of embarrassing or confessional passwords. An example is the following: “I pick my nose at stoplights,” which becomes: ipickmyNOSEatstoplights

3. Which scheme seems most suited for you to use?

4. What specific advantages does this scheme have for you over the others in terms of usability, ease of use, and memorability?

5. Using the above password-generation schemes, or using one of your own, generate your own password. You may write it on paper and keep that hardcopy secure until you feel that it is no longer needed.

6. Enter the password into the web browser based authentication system, and reenter it to confirm.

7. What difficulties did you have in the process?

8. You may use your new password for other servers if you like.

9. You will be requested via email to input this new password weekly during this study.

10. Would you like a paper printout of the password in case you may forget it? Yes ___ No ___

Preliminary Questionnaire

1. Have you ever heard of online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

2. Do you know anyone who has experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

3. Have you ever experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

4. How many characters were in the first password you ever used? ______

5. a. How many total characters were in the longest password you have ever used? ______(Please do not reveal the password!) b. How many of those characters were upper case? ______c. How many of those characters were numbers? ______d. How many of those characters were symbols or other special characters? ______

8. Which Password generation scheme did you use to make your password in this study? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. old memory e. confession f. my own – Please describe:______

9. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other – Please explain: ______

APPENDIX F

GROUP 2 PROTOCOL

The increasing threat to user identities and data requires longer passwords than people are accustomed to using. This study investigates ways to assist users in the management of long passwords to protect them from online fraud. We will look at five different password generation schemes. Please complete the following steps:
1. Take a few moments and read the paragraph describing each scheme.
2. For this test, you are to generate a password of at least twenty characters using any of the five following methods, or a method of your own choosing. These methods are designed to help you come up with memorable passwords that are relatively easy to input, despite their length. Strengthen your password by:
a. Running all individual words together,
b. Using the caps lock key unexpectedly, and
c. Including unexpected numbers, punctuation, or symbols.
The most effective means to strengthen any password is to make it longer. You may notice that some of the longer examples contain almost entirely lower case letters for ease of input. This is encouraged; if your password is longer than twenty-five characters, you do not need to include special characters or even capitals.

A. The Old Address. A simple password-generation technique is to spell out an old, unforgettable address. Thus, “819 Ash St., Keokuk, Iowa” becomes something like:
• 819ASHST>Keokuk,iowa
• 10008JUNIPERdr.o.p.ks
• 678GAUVA,chulavista,ca
• 2445collins.M>BEACH.fl
• TYCOBBRD.3306tycobbrd.
• 806nninth,LINCOLNtall.fl32310
• 3445stuart_APTBBURLINGTON
• greaterkailash-IP>
Believe it or not, these are all former addresses (and passwords) of mine. They are very strong passwords, but are easy to type using the caps lock key. I cannot forget them, although I must take care to remember the unexpected capitalization and characters.

B. Unexpected Nonsense. Unexpectedly nonsensical passphrases can be easy to recall and input, and their imagery makes them easy to remember. For example, the nonsense phrase “Pink curtains meander across the ocean” can become:
• PINKCURTAINSmeanderacrosstheocean
• hismother’sbeardISHALFCONSTRUCTED
• colorlessGREENIDEASdreamfuriously
• thenewangelskillALLPLANETwaves
• aSQUARErootoftheCIRCULARpi
• thepregnantSNAKEOFFINNIGAN
• FLITSTHEelephantinarubarbtree
• twofinedaysintheMIDDLEofnight
• OH!PENSAYS:ohmeohmeohme
• theCLAPOFONEhandsounding
These nonsense phrases contain colorful ideas that are easy to remember and type. The trick is to remember the exact formulation of the capitals and punctuation.

C. The Acrostic. The acrostic draws only one letter (usually the first) of each word of an easily remembered phrase. Thus,
• Hamlet’s “Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune” becomes: w’tnitmtstsaaooFORTUNE
• John 3:16: For God so loved the world that he gave his only begotten Son: FgodsltworldthghobSON
• Francis Scott Key: O say, does that star-spangled banner yet wave? Os,dtstarspangledbanneryw?
• Milton: How soon hath Time, the subtle thief of youth, stolen on his wing my three and twentieth year! hshtime,thstoy,sohwm3a20y!
• Shakespeare’s Sonnet 116: If this be error and upon me proved, I never writ, nor no man ever loved. itbeaumproved,inw,nnmeloved
• Johnny Cash: I keep a close watch on this heart of mine…Because you’re mine, I walk the line. Ikacwothominebymineiwtline
• Beatles: Obladi oblada life goes on bra. Lala how the life goes on. obladiobladalgob.lalahtlgo
• Ray Charles: Tell your momma, tell your pa, Gonna move you back to Arkansas. tym,typa,gmybtARKANSAS
• Bob Dylan: The answer my friend is blowin’ in the wind, the answer is blowin’ in the wind. tamfiblowin’itw,taibitwind
• Qur’an: In the name of God, most Gracious, most Compassionate. inthenameofALLAH

D. The Old Memory. Try to draw on remote, yet unforgettable memories. Use a personal experience to generate an answer to one of the following questions:
1. What was your favorite place to visit as a child?
2. What is the first and last name of your first boyfriend or girlfriend?
3. Which phone number do you remember most from your childhood?
4. Who was your favorite actor, musician, or artist when you were 16?
5. What was the first movie you saw on a date, and with whom?
6. What was your favorite book as a child?
7. What brand of beer was the first you ever drank?
8. Who was the first teacher you had a crush on?
9. What were the make, model, color, and style of your first car?
10. What was the first foreign country (or distant state) you visited, and when?
A possible answer to the first question is “My favorite place was Lake Geode.” After strengthening, it becomes:
1. myfavoriteplacewaslakeGEODE
2. SUSANmyfirstgirlfriendDELANO
3. friendscalledmeatMI9-5726inkansas
4. at16,iwasintoclint’thegood’eastwood
5. myfirstDATEMOVIEwasstarwarswithcheryl
6. myfavoritebookasaKIDwasasimov’sfoundation
7. thefirstBEERieverdrankwasfalstaff
8. in3rdgrade,ihadacrushonms.Dumont
9. myfirsttruckwasagreen’51Chevy3100pickup
10. thefirstforeigncountryivisitedwasindiain’82

E. The Confession. The sharing of passwords with others can be discouraged by the use of embarrassing or confessional passwords. An example is the following: “I pick my nose at stoplights,” which becomes:
• ipickmyNOSEatstoplights
• iamhavingaBADHAIRlife
• ivotedforBUSH/cheneyin’04
• ireallydon’tthinkthatmyfartsstink
• inthe90’s,ithoughtMadonnawashot
• oh,i’msorry.ithoughtyouwereawoman
• thePHBisfatandstupid,andhasbadbreath
• ifihadthenerve,iwouldcomeoutofthecloset
• what’sWRONGwitheatingyourBUGERS?
• whenilookinthemirror,iknowthatmybuttistoobig

3. Which scheme seems most suited for you to use?

4. What specific advantages does this scheme have for you over the others in terms of usability, ease of use, and memorability?

5. Using the above password-generation schemes, or using one of your own, generate your own password. You may write it on paper and keep that hardcopy secure until you feel that it is no longer needed.

6. Enter the password into the web browser based authentication system, and reenter it to confirm.

7. What difficulties did you have in the process?

8. You may use your new password for other servers if you like.

9. You will be requested via email to input this new password weekly during this study.

10. Would you like a paper printout of the password in case you may forget it? Yes ___ No ___

Preliminary Questionnaire

1. Have you ever heard of online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

2. Do you know anyone who has experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

3. Have you ever experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

4. How many characters were in the first password you ever used? ______

5. a. How many total characters were in the longest password you have ever used? ______ (Please do not reveal the password!) b. How many of those characters were upper case? ______ c. How many of those characters were numbers? ______ d. How many of those characters were symbols or other special characters? ______

6. Which password generation scheme did you use to make your password in this study? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. old memory e. confession f. my own – Please describe: ______

7. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other – Please explain: ______

APPENDIX G GROUP 3 PROTOCOL

The increasing threat to user identities and data requires longer passwords than people are accustomed to using. This study investigates ways to assist users in the management of long passwords to protect them from online fraud. We will look at five different password generation schemes.
1. Take a few moments and read the paragraph describing each scheme.
2. For this test, you are to generate a password of at least twenty characters using any of the five following methods, or a method of your own choosing. These methods are designed to help you come up with memorable passwords that are relatively easy to input, despite their length. Strengthen your password by:
a. Running all individual words together,
b. Using the caps lock key unexpectedly, and
c. Including unexpected numbers, punctuation, or symbols.

A. The Old Address. A simple password-generation technique is to spell out an old, unforgettable address, such as “819 Ash St., Keokuk, Iowa.” Strengthening yields passwords like: 819ASHST>Keokuk,iowa

B. Unexpected Nonsense. Unexpectedly nonsensical passphrases can be easy to recall and input, and their imagery makes them easy to remember. The nonsense phrase “Pink curtains meander across the ocean” can become: PINKCURTAINSmeanderacrosstheocean
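As a rough illustration of the strengthening steps used throughout these protocols, the following Python sketch runs the words of a phrase together and capitalizes a chosen subset of them. The function name `strengthen` and its `caps_words` parameter are hypothetical and not part of the study materials; deciding which words to capitalize, and where to add numbers or symbols, remains the participant's choice.

```python
def strengthen(phrase, caps_words=()):
    """Run the words of a phrase together, upper-casing the words whose
    zero-based positions appear in caps_words (hypothetical helper)."""
    words = phrase.split()
    return "".join(
        word.upper() if index in caps_words else word.lower()
        for index, word in enumerate(words)
    )


# The protocol's nonsense phrase, with the first two words typed under caps lock.
print(strengthen("Pink curtains meander across the ocean", caps_words={0, 1}))
```

With the first two words capitalized, the sketch reproduces the example PINKCURTAINSmeanderacrosstheocean.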

C. The Acrostic. The acrostic draws only one letter (usually the first) of each word, of an easily remembered phrase. Thus, Hamlet’s line, “Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them” can become: w’tnitmtstsaaooFORTUNE
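The acrostic step can be sketched in Python as follows. This is only one reading of the Hamlet example: it assumes that each word but the last contributes its first letter (together with any leading apostrophe), and that the last word is kept whole in capitals. The names `leading_fragment` and `acrostic` are hypothetical, and the phrase is cut at "fortune" to match the example password.

```python
def leading_fragment(word):
    """Return the word up to and including its first letter or digit,
    so that "'tis" yields "'t" and "Whether" yields "W"."""
    fragment = []
    for ch in word:
        fragment.append(ch)
        if ch.isalnum():
            break
    return "".join(fragment)


def acrostic(phrase):
    """Lower-case acrostic of every word but the last; the last word is
    kept whole and upper-cased (an assumed reading of the example)."""
    words = phrase.split()
    if not words:
        return ""
    head = "".join(leading_fragment(w).lower() for w in words[:-1])
    tail = words[-1].strip(".,;:!?").upper()
    return head + tail


line = ("Whether 'tis nobler in the mind to suffer "
        "the slings and arrows of outrageous fortune")
print(acrostic(line))
```

Applied to the line as typed here with a straight apostrophe, the sketch yields w'tnitmtstsaaooFORTUNE.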

D. The Old Memory. Use a personal experience to generate an answer to one of the following questions:
• What is the first and last name of your first boyfriend or girlfriend?
• Which phone number do you remember most from your childhood?
• What was your favorite place to visit as a child?
• Who was your favorite actor, musician, or artist when you were 16?

Try to draw on remote, yet unforgettable memories. A possible answer to the third question is “My favorite place was Lake Geode.” After strengthening, it becomes: myfavoriteplacewaslakeGEODE

E. The Confession. The sharing of passwords with others can be discouraged by the use of embarrassing or confessional passwords. An example is the following: “I pick my nose at stoplights,” which becomes: ipickmyNOSEatstoplights

3. Which scheme seems most suited for you to use?

4. What specific advantages does this scheme have for you over the others in terms of usability, ease of use, and memorability?

5. Using the above password-generation schemes, or using one of your own, generate your own password. You may write it on paper and keep that hardcopy secure until you feel that it is no longer needed.

6. Enter the password into the web browser based authentication system, and reenter it five times to confirm it and practice typing it.

7. What difficulties did you have in the process?

8. You may use your new password for other servers if you like.

9. You will be requested via email to input this new password weekly during this study.

10. Would you like a paper printout of the password in case you may forget it? Yes ___ No ___

Preliminary Questionnaire

1. Have you ever heard of online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

2. Do you know anyone who has experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

3. Have you ever experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

4. How many characters were in the first password you ever used? ______

5. a. How many total characters were in the longest password you have ever used? ______ (Please do not reveal the password!) b. How many of those characters were upper case? ______ c. How many of those characters were numbers? ______ d. How many of those characters were symbols or other special characters? ______

6. Which password generation scheme did you use to make your password in this study? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. old memory e. confession f. my own – Please describe: ______

7. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other – Please explain: ______

APPENDIX H GROUP 4 PROTOCOL

The increasing threat to user identities and data requires longer passwords than people are accustomed to using. This study investigates ways to assist users in the management of long passwords to protect them from online fraud. We will look at five different password generation schemes. Please complete the following steps:
1. Take a few moments and read the paragraph describing each scheme.
2. For this test, you are to generate a password of at least twenty characters using any of the five following methods, or a method of your own choosing. These methods are designed to help you come up with memorable passwords that are relatively easy to input, despite their length. Strengthen your password by:
a. Running all individual words together,
b. Using the caps lock key unexpectedly, and
c. Including unexpected numbers, punctuation, or symbols.
The most effective means to strengthen any password is to make it longer. You may notice that some of the longer examples contain almost entirely lower case letters for ease of input. This is encouraged; if your password is longer than twenty-five characters, you do not need to include special characters or even capitals.
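To make the advice about length concrete, the sketch below estimates the size of a brute-force search space in bits, assuming every character is drawn uniformly at random from a fixed alphabet. That assumption is not part of this protocol and does not hold for human-chosen phrases, so the figures are upper bounds for comparison only. The function name `search_space_bits` and the alphabet sizes (26 lower-case letters, 94 printable ASCII characters) are illustrative choices.

```python
import math


def search_space_bits(length, alphabet_size):
    """Bits needed to enumerate every string of the given length over the
    alphabet: length * log2(alphabet_size). Assumes uniform random choice."""
    return length * math.log2(alphabet_size)


long_lowercase = search_space_bits(25, 26)  # 25 lower-case letters
short_mixed = search_space_bits(8, 94)      # 8 printable ASCII characters
print(f"25 lower-case letters: {long_lowercase:.1f} bits")
print(f"8 mixed characters:    {short_mixed:.1f} bits")
```

Under these assumptions, a 25-character lower-case password has a much larger search space than an 8-character password drawn from all 94 printable characters.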

A. The Old Address. A simple password-generation technique is to spell out an old, unforgettable address. Thus, “819 Ash St., Keokuk, Iowa” becomes something like:
• 819ASHST>Keokuk,iowa
• 10008JUNIPERdr.o.p.ks
• 678GAUVA,chulavista,ca
• 2445collins.M>BEACH.fl
• TYCOBBRD.3306tycobbrd.
• 806nninth,LINCOLNtall.fl32310
• 3445stuart_APTBBURLINGTON
• greaterkailash-IP>
Believe it or not, these are all former addresses (and passwords) of mine. They are very strong passwords, but are easy to type using the caps lock key. I cannot forget them, although I must take care to remember the unexpected capitalization and characters.

B. Unexpected Nonsense. Unexpectedly nonsensical passphrases can be easy to recall and input, and their imagery makes them easy to remember. For example, the nonsense phrase “Pink curtains meander across the ocean” can become:
• PINKCURTAINSmeanderacrosstheocean
• hismother’sbeardISHALFCONSTRUCTED
• colorlessGREENIDEASdreamfuriously
• thenewangelskillALLPLANETwaves
• aSQUARErootoftheCIRCULARpi
• thepregnantSNAKEOFFINNIGAN
• FLITSTHEelephantinarubarbtree
• twofinedaysintheMIDDLEofnight
• OH!PENSAYS:ohmeohmeohme
• theCLAPOFONEhandsounding
These nonsense phrases contain colorful ideas that are easy to remember and type. The trick is to remember the exact formulation of the capitals and punctuation.

C. The Acrostic. The acrostic draws only one letter (usually the first) of each word of an easily remembered phrase. Thus,
• Hamlet’s “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune” becomes: w’tnitmtstsaaooFORTUNE
• John 3:16: For God so loved the world that he gave his only begotten Son: FgodsltworldthghobSON
• Francis Scott Key: O say, does that star-spangled banner yet wave? Os,dtstarspangledbanneryw?
• Milton: How soon hath Time, the subtle thief of youth, stolen on his wing my three and twentieth year! hshtime,thstoy,sohwm3a20y!
• Shakespeare’s Sonnet 116: If this be error and upon me proved, I never writ, nor no man ever loved. itbeaumproved,inw,nnmeloved
• Johnny Cash: I keep a close watch on this heart of mine…Because you’re mine, I walk the line. Ikacwothominebymineiwtline
• Beatles: Obladi oblada life goes on bra. Lala how the life goes on. obladiobladalgob.lalahtlgo
• Ray Charles: Tell your momma, tell your pa, Gonna move you back to Arkansas. tym,typa,gmybtARKANSAS
• Bob Dylan: The answer my friend is blowin’ in the wind, the answer is blowin’ in the wind. tamfiblowin’itw,taibitwind
• Qur’an: In the name of God, most Gracious, most Compassionate. inthenameofALLAH

D. The Old Memory. Try to draw on remote, yet unforgettable memories. Use a personal experience to generate an answer to one of the following questions:
1. What was your favorite place to visit as a child?
2. What is the first and last name of your first boyfriend or girlfriend?
3. Which phone number do you remember most from your childhood?
4. Who was your favorite actor, musician, or artist when you were 16?
5. What was the first movie you saw on a date, and with whom?
6. What was your favorite book as a child?
7. What brand of beer was the first you ever drank?
8. Who was the first teacher you had a crush on?
9. What were the make, model, color, and style of your first car?
10. What was the first foreign country (or distant state) you visited, and when?
A possible answer to the first question is “My favorite place was Lake Geode.” After strengthening, it becomes:
1. myfavoriteplacewaslakeGEODE
2. SUSANmyfirstgirlfriendDELANO
3. friendscalledmeatMI9-5726inkansas
4. at16,iwasintoclint’thegood’eastwood
5. myfirstDATEMOVIEwasstarwarswithcheryl
6. myfavoritebookasaKIDwasasimov’sfoundation
7. thefirstBEERieverdrankwasfalstaff
8. in3rdgrade,ihadacrushonms.Dumont
9. myfirsttruckwasagreen’51Chevy3100pickup
10. thefirstforeigncountryivisitedwasindiain’82

E. The Confession. The sharing of passwords with others can be discouraged by the use of embarrassing or confessional passwords. An example is the following: “I pick my nose at stoplights,” which becomes:
• ipickmyNOSEatstoplights
• iamhavingaBADHAIRlife
• ivotedforBUSH/cheneyin’04
• ireallydon’tthinkthatmyfartsstink
• inthe90’s,ithoughtMadonnawashot
• oh,i’msorry.ithoughtyouwereawoman
• thePHBisfatandstupid,andhasbadbreath
• ifihadthenerve,iwouldcomeoutofthecloset
• what’sWRONGwitheatingyourBUGERS?
• whenilookinthemirror,iknowthatmybuttistoobig

3. Which scheme seems most suited for you to use?

4. What specific advantages does this scheme have for you over the others in terms of usability, ease of use, and memorability?

5. Using the above password-generation schemes, or using one of your own, generate your own password. You may write it on paper and keep that hardcopy secure until you feel that it is no longer needed.

6. Enter the password into the web browser based authentication system, and reenter it five times to confirm it and practice inputting it.

7. What difficulties did you have in the process?

8. You may use your new password for other servers if you like.

9. You will be requested via email to input this new password weekly during this study.

10. Would you like a paper printout of the password in case you may forget it? Yes ___ No ___

Preliminary Questionnaire

1. Have you ever heard of online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

2. Do you know anyone who has experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

3. Have you ever experienced online data or identity theft? a. Yes ___ b. No ___ c. Don’t Know ___

4. How many characters were in the first password you ever used? ______

5. a. How many total characters were in the longest password you have ever used? ______ (Please do not reveal the password!) b. How many of those characters were upper case? ______ c. How many of those characters were numbers? ______ d. How many of those characters were symbols or other special characters? ______

6. Which Password generation scheme did you use to make your password in this study? a. very old address b. unexpected nonsense c. acrostic of song, verse, poem, etc. d. old memory e. confession f. my own – Please describe:______

7. Why did you use it instead of the others? a. ease of use b. familiarity c. humor d. other – Please explain: ______

REFERENCES

Accessdata.com (2007). Rainbow Tables. Available: http://accessdata.com.

Adams, A. & Sasse, M.A. (1999). Users are not the enemy: Why users compromise security mechanisms and how to take remedial measures. Communications of the ACM, 42, 12, 40-46.

Adams, A., Sasse, M., & Lunt, P. (1997). Making passwords secure and usable. People and Computers XII: Proceedings of HCI’97, 1-19. Berlin: Springer.

Anderson, J., & Bower, G. (1973). Human Associative Memory. Washington, DC: Winston.

Anderson, J., & Ross, B. (1980). Evidence against a semantic-episodic distinction. Journal of Experimental Psychology: Human Learning and Memory, 6, 441-65.

Babbie, E. (2001). The Practice of Social Research, 9th Edition. Belmont, CA: Wadsworth.

Bartlett, F. (1932). Remembering. Cambridge: Cambridge University Press.

Brostoff, S., & Sasse, M. (2000). Are Passfaces More Usable Than Passwords? A Field Trial Investigation. In S. McDonald, Y. Waern, and G. Cockton, (eds), People and Computers XIV - Usability or Else. (Proceedings of HCI 2000).

Brostoff, S., & Sasse, M. (2003). “Ten strikes and you're out”: Increasing the number of login attempts can improve password usability. Workshop on Human-Computer Interaction and Security Systems, CHI2003, April 5-10, 2003, Fort Lauderdale, Florida.

Bergadano, F. & (1998). High dictionary compression for proactive password checking, ACM Transactions on Information and System Security, 1, 1, November, 1998.

Bergadano, F. & (1997). Proactive password checking with decision trees. ACM Conference on Computer and Communications Security, 1997, Zurich.

Besnard, D., & Arief, B. (2004). Computer security impaired by legitimate users, Computers & Security, 23, 3, 253-264, May 2004.

Bishop, M. (1991). A Proactive Password Checker. In D.T. Lindsay & W.L. Price (eds.) Information Security, 169-180. New York: North-Holland.

Bishop, M. (2005). Psychological Acceptability Revisited. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Bishop, M., & Klein, D. (1992). Improving System Security via Proactive Password Checking. Computers & Security, 14, 3, 233-49 (1995).

Bower, G. (1970). Organizational factors in memory. Cognitive Psychology, 1, 18-46.

Brewer, J., & Hunter, A. (1989). Multimethod research: A synthesis of styles. Newbury Park, CA: Sage.

Burnett, M. (2006). Perfect Passwords: Selection, Protection, Authentication. Rockland, MA: Syngress.

Calkins, M. (1898). Short Studies in Memory and Association from the Wellesley College Laboratory. Psychological Review, 5, 451-62.

Coltheart, V., & Evans, J. (1981). An investigation of semantic memory in individuals. Memory & Cognition, 9, 524-32.

Conway, M. (1996). Autobiographical Memory. In Bjork, E. & Bjork, R. (eds.). Memory, 165-94. San Diego: Academic Press.

Coombs, C., Dawes, R., & Tversky, A. (1981). Mathematical Psychology: An Elementary Introduction. Ann Arbor, MI: Mathesis Press.

Cott, J. (2005). On the Sea of Memory: A Journey from Forgetting to Remembering. New York: Random House.

Coventry, L. (2005). Usable Biometrics. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Cox, B. (1998). PGP Passphrase FAQ. FAQ: How do I choose a good password or phrase? Available: http://www.virtualschool.edu/mon/Crypto/PGPPassPhraseFAQ.html.

Creswell, J. (1994). Research Design: Qualitative & Quantitative Approaches. Thousand Oaks, CA: Sage Publications.

Cronbach, L., & Snow, R. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.

Dourish, P., & Redmiles, D. (2002). An approach to usable security based on event monitoring and visualization. Proceedings of the 2002 Workshop on New Security Paradigms, 75-81. New York: ACM Press.

Davies, C., & Ganesan, R. (1993). BApasswd: A new proactive password checker. 16th National Computer Security Conference, Baltimore, MD, September, 1993, 1-15.

Davis, D., & Price, W. (1987). Security for Computer Networks. Chichester: Wiley.

DeAlvare, A. (1990). How Crackers Crack Passwords or What Passwords to Avoid. Proceedings of UNIX Security Workshop II (Portland, 1990).

Diceware. (2006). Diceware Passphrase Home Page. Available: http://world.std.com/~reinhold/diceware.html.

Dhamija, R., & Perrig, A. (2000). Déjà vu: A User Study Using Images for Authentication. Proceedings of USENIX Security Symposium.

Ebbinghaus, H. (1913). Memory: A Contribution to Experimental Psychology. H. Ruger (trans.). New York: Teachers College, Columbia University. (Original German edition, 1885).

Epps, D. (2006). Blog. Available: http://silverstr.ufies.org/blog/archives/000657.html

Ellison, C., Hall, C., Milbert, R., & Schneier, B. (2000). Protecting Secret Keys with Personal Entropy. Future Generation Computer Systems, 16, 311-318. Available: http://www.schneier.com/paper-personal-entropy.html.

Engelfriet, A. (2005). Passphrase FAQ: Strength of the passphrase Available: http://www.iusmentis.com/security/passphrasefaq/strength.

Ericson, J. (2006). Password Protection? Forget It. Dr. Dobb's Journal | Departments | Security | Security Blog. Available: http://www.ddj.com/blog/securityblog/archives/2006/05/password_protec.html.

Exchange Security. (2004). Passwords vs passphrases, redux, July 30, 2004. Available: http://www.e2ksecurity.com/archives/001140.html.

Federal Trade Commission. (2006). FTC Releases Top 10 Consumer Fraud Complaint Categories: Identity Theft Again Leads the List. Available: http://www.ftc.gov/opa/2006/01/topten.htm.

Ferguson, N., & Schneier, B. (2003). Practical Cryptography. Indianapolis, IN: Wiley.

Friedman, W. (1925). The index of coincidence and its applications in cryptanalysis. Technical Paper, War Department, Office of the Chief Signal Officer, United States Government Printing Office (Washington, D.C., 1925).

Friedman, W., & Callimahos, L. (1985). Military Cryptanalytics Part I - Volume 1 & 2. Laguna Hills, CA: Aegean Park Press.

Ford, W. (1994). Computer Communications Security: Principles, Standard Protocols and Techniques. Englewood Cliffs, NJ: Prentice Hall.

Gagne, R. (1967). Learning and individual differences. Columbus, OH: Merrill.

Gasser, M. (1975). A Random Word Generator for Pronounceable Passwords. Technical Report ESD-TR-75-97. Bedford, MA: Electronic Systems Division, Hanscom Air Force Base.

GeodSoft. (2005). Good and Bad Passwords. How-To. Available: http://geodsoft.com/howto/password/cracking_passwords.htm.

Gordon, S. (1995). Social Engineering: Techniques and Prevention. Proceedings of the 12th World Conference on Computer Security: Audit and Control, October 25-27, 1995, 445-451. Oxford, U.K.: Elsevier.

Gregg, V. (1986). Introduction to Human Memory. London: Routledge & Kegan Paul.

Grudin, J. (1987). Social Evaluation of User Interfaces. Who Does the Work and Who Gets the Benefit? In H-J Bullinger and B. Shackel (eds.), Proceedings of INTERACT 1987 IFIP Conference of Human Computer Interaction (Elsevier, 1987), 805-11.

Hitchings, J. (1995). Deficiencies of the Traditional Approach to Information Security and the Requirements for a New Methodology. Computers and Security, 14, 377-383.

Indiana University. (2006). Passwords are Passé. Available: http://itso.iu.edu/Passwords_are_passe.

ISO 9241-11 (1998). Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) – Part 11: Guidance on Usability. Geneva: International Organization for Standardization.

Ives, B., Walsh, K., & Schneider, H. (2004). The Domino Effect of Password Reuse. Communications of the ACM, 47, 4, 75-8. (April, 2004).

Jablon, D. (1996). Strong Password-Only Authenticated Key Exchange. Computer Communication Review, 26, 5, 5-26, October 1996.

Jacoby, L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-40.

Jansen, W., Gavrila, S., Korolev, V., Ayers, R., & Swanstrom, R. (2003). Picture Password: A Visual Login Technique for Mobile Devices. Computer Security Division, Information Technology Laboratory, NIST. Gaithersburg, MD: NIST.

Jick, T. (1979). Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly, 24, 602-11.

Johansson, J. (2006). The Great Debates: Pass Phrases vs. Passwords. Part 3 of 3. Available: http://www.microsoft.com/technet/security/secnews/articles/itproviewpoint110104.mspx.

Johansson, J. (2005). How strong is my passphrase? Security Management - October 2005. Frequently Asked Questions About Passwords. Available: http://www.microsoft.com/technet/security/secnews/articles/itproviewpoint110104.mspx.

Just, M. (2005). Designing Authentication Systems with Challenge Questions. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Karis, A., Fabini, M., & Donchin, E. (1984). P300 and memory: Individual differences in the von Restorff effect. Cognitive Psychology, 16, 177-216.

Kaufman, C., Perlman, R., & Speciner, M. (2002). Network Security: Private Communication in a Public World. Upper Saddle River, NJ: Prentice Hall PTR.

Kerckhoffs, A. (1883). La cryptographie militaire. Journal des sciences militaires, IX, 5-83.

Kintsch, W. (1970). Models for free recall and recognition. In D.A. Norman (ed.), Models of Human Memory. New York: Academic Press.

Kintsch, W. (1974). The Representation of Meaning in Memory. Potomac, MD: Lawrence Erlbaum Associates.

Kinsbourne, M., & George, J. (1974). The Mechanisms of the Word Frequency Effect on Recognition Memory. Journal of Verbal Learning and Verbal Behavior, 13, 63-9.

Kirkpatrick, E. (1894). An Experimental Study of Memory. Psychological Review, 1, 602-9.

Klein, D. (1990). Foiling the Cracker: A Survey of, and Improvements to, Password Security. Proceedings of the USENIX Security Workshop. Portland, Oregon: USENIX Association, Summer 1990; expanded as a technical report from SEI, 1992.

Knuth, D. (1997). The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. New York: Addison-Wesley, 1997.

Krug, S. (2000). Don’t make me think: A common sense approach to Web usability. Indianapolis: New Riders.

Kung, S., Mak, M., & Lin, S. (2004). Biometric Authentication: A Machine Learning Approach. Upper Saddle River, NJ: Prentice Hall PTR.

Kyllonen, P., Tirre, W., & Christal, R. (1991). Knowledge and processing speed as determinants of associative learning. Journal of Experimental Psychology: General, 120, 57-79.

Landauer, T. (1986). How much do people remember? Some estimates of the quantity of learned information in long-term memory. Cognitive Science, 10, 477-493.

Lazar, J. (2006). Web usability: A user-centered design approach. Boston: Pearson.

Loftus, E., & Loftus, G. (1974). Changes in memory structure and retrieval over the course of instruction. Journal of Educational Psychology, 66, 315-8.

Lucas, I. (2006). Password Recovery Speeds: How long will your password stand up? Lockdown.co.uk - The Home Computer Security Centre. Available: http://www.lockdown.co.uk/?pg=combi&s=articles.

Madigan, A. (1983). Picture Memory. In J.C. Yuille (ed.). Imagery, Memory, and Cognition: Essays in Honour of Allan Paivio. Erlbaum.

Mandler, G. (1967). Organization and memory. In K.W. Spence & J.T. Spence (eds), The Psychology of Learning and Motivation, Volume 1. New York: Academic Press.

Mandler, G., & Pearlstone, Z. (1966). Free and constrained concept learning and subsequent recall. Journal of Verbal Learning and Verbal Behavior, 5, 126-31.

Metcalfe, B. (1973). The Stockings Were Hung by the Chimney with Care. RFC 602.

Miller, G. (1956). The magical number seven, plus or minus two: Some limits of our capacity for processing information. Psychological Review, 63, 81-7.

Monrose, F., & Reiter, M. (2005) Graphical Passwords. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Monrose, F., Reiter, M., & Wetzel, S. (1999). Password Hardening Based on Keystroke Dynamics. Proceedings of the 6th ACM Conference on Computer and Communications Security, 73-82.

Morris, R., & Thompson, K. (1979). Password Security: A Case History. Communications of the ACM, 22, 11, 594-7.

Muffett, A. (2005). CrackLib: a proactive password sanity library. Available: http://www.users.dircon.co.uk/~crypto/download/cracklib.2.7.txt.

National Institute of Standards and Technology. (2006). Federal Information Processing Standards Publication 201-1. Available: http://csrc.nist.gov/publications/fips/fips201-1/FIPS-201-1-v5.pdf.

National Institute of Standards and Technology. (1985). Federal Information Processing Standards Publication 112. Available: http://wwww.itl.nist.gov./fisppubs/fip112.htm.

Neisser, U. (1976). Cognition and Reality. San Francisco: Freeman.

NeoSmart Technologies. (2006). Innovating the road to tomorrow. Available: http://www.neosmart.net/.

NeoSmart Technologies. (2006). The Advent of Uncrackable Passwords, 3rd Edition.

Nielsen, J. (2002). Alertbox: Top Research Laboratories in Human-Computer Interaction (HCI). useit.com. March 31, 2002. Available: http://www.useit.com/alertbox/20020331.html.

Nielsen, J. (2000). Designing Web Usability. Indianapolis: New Riders.

Park, D. (1997). Aging and Memory: Mechanisms Underlying Age Differences in Performance. Proceedings of the 1997 World Congress of Gerontology.

Peacock, A., Ke, X., & Wilkerson, M. (2005). Identifying Users from Their Typing Patterns. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Perruchet, P., & Baveux, P. (1989). Correlational analyses of explicit and implicit memory performance. Memory & Cognition, 17, 77-86.

Pfleeger, C. (1989). Security in Computing. Englewood Cliffs, NJ: Prentice-Hall.

Podd, J., Bunnel, J., & Henderson, R. (1996). Cost-effective Computer Security: Cognitive and Associative Passwords. Proceedings of the 6th Australian Conference on Computer-Human Interaction (OZCHI '96), 304-5.

Porter, S. (1981). A Password Extension for Improved Human Factors. In A. Gersho (ed.), Advances in Cryptology (Santa Barbara, California) 1981, 81. Also: Computers & Security, 1, 1, 54-6. 1982.

Postman, L. (1972). A pragmatic view of organization theory. In E. Tulving & W. Donaldson (eds), Organization of Memory. New York: Academic Press.

Povey, D. (2000). Optimistic Security: A New Access Control Paradigm. Proceedings of the 1999 Workshop on New Security Paradigms (ACM Press, 2000), 40-5.

Proctor, R., & Vu, K. (2006). Stimulus-Response Compatibility Principles: Data, Theory, and Application. London: CRC Press.

Project Rainbow Crack. (2006). Rainbow Table. Available: http://www.antsight.com/zsl/rainbowcrack/.

Realuser.com (2006). Technology. Available: http://www.realuser.com/technology.

Reber, A., Walkenfeld, F., & Hernstadt, R. (1991). Implicit and explicit learning: Individual differences and IQ. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 888-96.

Roediger, H. & Goff, L. (1998). Memory. In W. Bechtel and G. Graham (eds.), A Companion to Cognitive Science. Oxford, UK: Blackwell.

Reinhold, A. (1996). Results of a Survey on PGP Pass Phrase Usage. Available: http://world.std.com/~reinhold/passphrase.survey.asc.

Renaud, K. (2005). Evaluating Authentication Mechanisms. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Roznowski, M. (1993). Measures of cognitive processes: Their stability and other psychometric and measurement properties. Intelligence, 17, 361-88.

The SANS Institute. (2002). The Twenty Most Critical Internet Security Vulnerabilities (Updated). http://www.sans.org/top20.htm.

Saltzer, J., & Schroeder, M. (1975). The Protection of Information in Computer Systems. Proceedings of the IEEE, 63, 9, 1278-1308.

Sasse, M., Brostoff, S., & Weirich, D. (2001). Transforming the ‘weakest link’ — a human/computer interaction approach to usable and effective security. BT Technology Journal, 19, 3, 122-131.

Schrage, M. (2005). The Password Is Fayleyure. TechnologyReview.com. March 2005. Available: http://www.technologyreview.com/articles/05/03/issue/review_password.asp.

Schultz, E., Proctor, R., Lien, M., & Salvendy, G. (2001). Usability and security: An appraisal of usability issues in information security methods. Computers & Security, 20, 7, 620-34.

Shamir, A. (1979). How to share a secret. Communications of the ACM, 22, 612-613.

Shannon, C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423, 623–656.

Simon, H. (1974). How big is a chunk? Science, 183, 482-8.

Smetters, D., & Grinter, R. (2002). Moving from the design of usable security technologies to the design of useful secure applications. Proceedings of the 2002 Workshop on New Security Paradigms, 82-9. New York: ACM Press.

Spafford, E. (2006). CERIAS Weblogs; Security Myths and Passwords. Available: http://www.cerias.purdue.edu/weblogs/author/spaf/.

Spafford, E. (1992a). Observing Reusable Password Choices. Purdue Technical Report CSD-TR 92-049. West Lafayette, IN: Purdue University.

Spafford, E. (1992b). OPUS: Preventing Weak Password Choices. Computers and Security, 11, 3, 273-278.

Spector, Y., & Ginzberg, J. (1994). Pass-Sentence – A New Approach to Computer Code. Computers and Security, 13, 145-60.

Smith, S. L. (1987). Authenticating Users by Word Association. Computers and Security, 6, 464-470.

Stinson, D. (2002). Cryptography: Theory and Practice, Second Edition. Boca Raton, FL: Chapman & Hall/CRC.

Summers, W., & Bosworth, E. (2004). Password Policy: The Good, The Bad, and The Ugly. ACM International Conference Proceeding Series, 58. Proceedings of the winter international symposium on Information and communication technologies, Cancun, Mexico.

Thorpe, J., van Oorschot, P., & Somayaji, A. (2005). Pass-Thoughts: Authentication With Our Minds. ACSA 2005 New Security Paradigms Workshop, Sept. 2005, Lake Arrowhead, California.

Tognazzini, B. (2005). Design for Usability. In L.F. Cranor & S. Garfinkel (eds.), Security and Usability: Designing Secure Systems That People Can Use. Sebastopol, CA: O’Reilly.

Tulving, E. (1968). Theoretical issues in free recall. In T.R. Dixon & D.L. Horton (eds.), Verbal Behavior and General Behavior Theory. Englewood Cliffs, NJ: Prentice-Hall.

Tulving, E. (1974). Cue-dependent forgetting. American Scientist, 62, 74-82.

Tulving, E. (1983). Elements of episodic memory. Oxford: Clarendon Press.

Tulving, E., & Osler, S. (1968). Effectiveness of retrieval cues in memory for words. Journal of Experimental Psychology, 77, 593-601.

Tulving, E., & Pearlstone, Z. (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5, 381-91.

UCCASS. (2006). Unit Command Climate Assessment and Survey System (UCCASS). Available: http://www.bigredspark.com/survey.html.

Weirich, D., & Sasse, M. (2002). Pretty Good Persuasion: A First Step towards Effective Password Security in the Real World. New Security Paradigms Workshop, Proceedings of the 2001 Workshop on New Security Paradigms, Cloudcroft, New Mexico, 137-43.

Weiss, E. (2006). Consultant Breached FBI's Computers: Frustrated by Bureaucracy, Hacker Says Agents Approved and Aided Break-Ins. Washington Post, Thursday, July 6, 2006, A05. Available: http://www.washingtonpost.com/wp-dyn/content/article/2006/07/05/AR2006070501489.html.

Whitten, A., & Tygar, J. (1999). Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0. Proceedings of the 8th USENIX Security Symposium (Washington, D.C., August 23-6, 1999), 169-184.

Wiedenbeck, S., Waters, J., Birget, J., Brodskiy, A., & Memon, N. (2005). Authentication Using Graphical Passwords: Basic Results. Human-Computer Interaction International (HCII 2005), Las Vegas, July 25-27, 2005.

Williams, R. (2006). The PGP Passphrase FAQ. Available: http://www.iusmentis.com/security/passphrasefaq/.

Woltz, D., & Shute, V. (1993). Individual difference in repetition priming and its relationship to declarative knowledge acquisition. Intelligence, 17, 333-59.

Yan, J. (2001). A Note on Proactive Password Checking. ACM New Security Paradigms Workshop, New Mexico, USA, September 2001.

Yan, J., Blackwell, A., Anderson, R., & Grant, A. (2004). The Memorability and Security of Passwords: Empirical Results. Cambridge University Computer Laboratory. IEEE Security & Privacy, September/October 2004, 25-31.

Zhu, S. (2005). Project RainbowCrack. Available: http://www.antsight.com/zsl/rainbowcrack.

Zimmermann, P. (1995). PGP Source Code and Internals. Boston: MIT Press.

Zviran, M., & Haga, W. (1993). A Comparison of Password Techniques for Multilevel Authentication Mechanisms. The Computer Journal, 36, 3, 227-37.

Zviran, M., & Haga, W. (1990). Cognitive Passwords: The Key to Easy Access Control. Computers and Security, 9, 723-36.

BIOGRAPHICAL SKETCH

Peter Thomas Henry

EDUCATION

Ph.D., College of Information, Florida State University, Tallahassee, FL (2007)

Master of Science in Library & Information Studies, Florida State University, Tallahassee, FL (2002)

Master of Arts in Religion, Florida State University, Tallahassee, FL (1997)

Bachelor of Arts in Religion, Florida State University, Tallahassee, FL (1996)

PUBLICATIONS

Breeden, R., Cantey, M., Cureton, B., Henry, P., Mulholland, J., Sprague, W., Stokes, C., & Watson, J. (2006). The Phlorida Autopsy Report. Journal of Digital Forensics Practice, 1, 3, 203-222.

Burmester, M., Henry, P., & Kermes, L. (2005). Tracking Cyberstalkers: a Cryptographic Approach. Computers and Society, September, 2005.

Aggarwal, S., Henry, P., Kermes, L., & Mulholland, J. (2005). Evidence in Proactive Cyberstalking Investigations: The PAPA Approach. SADFE 2005 Conference Proceedings. IEEE, November Conference Proceedings.

Aggarwal, S., Henry, P., Kermes, L., & Mulholland, J. (2005). Anti-Cyberstalking: The Predator and Prey Alert (PAPA) System. SADFE 2005 Conference Proceedings. IEEE, November Conference Proceedings.
