Why Johnny Still, Still Can’t Encrypt: Evaluating the Usability of a Modern PGP Client

Scott Ruoti, Jeff Andersen, Daniel Zappala, Kent Seamons Brigham Young University {ruoti, andersen} @ isrl.byu.edu, {zappala, seamons} @ cs.byu.edu

ABSTRACT In our study of 20 participants, grouped into 10 pairs of par- This paper presents the results of a laboratory study involv- ticipants who attempted to exchange encrypted , only ing Mailvelope, a modern PGP client that integrates tightly one pair was able to successfully complete the assigned tasks with existing providers. In our study, we brought using Mailvelope. All other participants were unable to com- in pairs of participants and had them attempt to use Mailve- plete the assigned task in the one hour allotted to the study. lope to communicate with each other. Our results shown that This demonstrates that encrypting email with PGP, as imple- more than a decade and a half after Why Johnny Can’t En- mented in Mailvelope, is still unusable for the masses. crypt, modern PGP tools are still unusable for the masses. Our results also shed light on several ways that PGP-based We finish with a discussion of pain points encountered using tools could be improved. First, integrated tutorials would Mailvelope, and discuss what might be done to address them be helpful in assisting first time users in knowing what they in future PGP systems. should be doing at any given point in time. Second, an ap- Author Keywords proachable description of public key cryptography could help Security, usability, secure email, PGP users correctly manage their own keys. Third, in line with previous work by Atwater et al. [1], we find that PGP-based ACM Classification Keywords tools would be well served by offering automatically gener- H.1.2. Models and Principles: User/Machine Systems—hu- ated for unknown recipients asking them to install the man factors; H.5.2. Information Interfaces and Presentation PGP software, generate a public key, and share it with the (e.g. HCI): User Interfaces—user-centered design sender. Finally, the PGP block itself could be enhanced to help non-PGP users who receive an encrypted email know INTRODUCTION how to work with their friend to get an encrypted message Usable, secure email is still an open problem more than 15 they will be able to read. years after it was first studied by Whitten et al. [10]. Six years after the original Johnny paper, Sheng et al. showed RELATED WORK that PGP 9 was still difficult for users to operate correctly [9]. Whitten and Tygar [10] conducted the first formal user study In this paper, we attempt to see if in the last decade, modern of a secure email system (i.e., PGP 5), uncovering serious us- PGP-based tools have improved to the point where users can ability issues with key management and users’ understanding successfully send encrypted email. of the underlying public key cryptography. It was found that a majority of users were unable to successfully send encrypted We elected to test Mailvelope, a modern PGP tool, for our email in the context of a hypothetical political campaign sce- study. Mailvelope is a that integrates with nario. The results of the study took the security community users’ webmail systems. It is the only system currently being by surprise and became responsible for shaping modern us- 1 promoted by the EFF’s secure message score card that in- able security research. Sheng at al. demonstrated that despite tegrates with users’ webmail providers, an important feature improvements made to PGP in the seven years since Whitten arXiv:1510.08555v2 [cs.CR] 13 Jan 2016 for many users. It is also highly rated on the Chrome Web and Tygar’s original publication, key management was still Store, with 242 users collectively giving it 4.6 out of 5 stars. a challenge for users. Furthermore, they showed that in the In our own testing of PGP alternatives, we found Mailvelope new version of PGP encryption and decryption had become to be roughly as usable as other alternatives (i.e., GPG Tools, so transparent that users were unsure if a message they re- Enigmail, Google’s End-to-End Encryption). ceived had actually been encrypted. 1https://www.eff.org/secure-messaging-scorecard While there have been some attempts at implementing email using key escrow [6,7], these approaches have degraded se- curity when compared to traditional PGP. Atwater et al. cre- ated a mock-up of Mailvelope that automated key creation and sharing. This approach falls somewhere in between tra- ditional PGP and key escrow on the spectrum of usability and security tradeoffs. Unfortunately, it was not possible to in- clude Atwater et al.’s mock-up in our study, as we discovered that it relied on hard-coded keys for email recipients, and the effort required to implement a working key management sys- to send him the necessary sensitive information (e.g., SSN). tem into their mock-up would have exceeded our resources. Once Participant B had received this information, he was in- Furthermore, the mock-up did not correctly simulate the need structed to use Mailvelope to respond to Participant A with a for participants to wait on the recipient to set up keys, which confirmation code (encrypted using Mailvelope) to conclude made the mock-up incompatible with our study.2 the task.

METHODOLOGY After the instructions were given, Participant A was provided with the the Mailvelope website and instructed to begin the We conducted an IRB-approved user study wherein pairs of 6 participants used secure email to transmit sensitive informa- task. While participants waited for email from each other, tion to each other. This section gives an overview of the study they were told that they could browse the Internet, use their and describes the scenario, tasks, study questionnaire, and phones, or engage in other similar activities. This was done post-study interview. In addition, we discuss how the study to provide a more natural setting for the participants, as well was developed and its limitations. as to avoid frustration if participants had to wait for an ex- tended period of time while their friends figured out how to 7 Study Setup use Mailvelope. The study ran for two weeks, beginning Tuesday, Septem- Study coordinators were allowed to answer questions related ber 8, 2015 and ending Friday, September 18, 2015. In to- to the study, but were not allowed to provide instructions on tal, 10 pairs of participants (20 total participants) completed how to use Mailvelope. If participants became stuck and the study. Participants were allocated sixty minutes to com- asked for help, they were told that they could take whatever plete the study, with about 35-40 minutes spent using Mail- steps they normally would to solve a similar problem, in- velope. Participants were compensated $15 USD for their cluding using an Internet search service. Additionally, when participation. Participants were required to be accompanied asked for help, if the study coordinator believed communi- by a friend, who served as their counterpart for the study. cation between the two parties could help, study coordina- For standardization and requirements of the systems tested tors could remind participants that they were completing this in the study, both participants were required to have study with their friend and were free to communicate with accounts. their friend however they wanted, and that only the sensitive When participants arrived, they were read a brief introduction information was required to be transmitted over secure email. detailing the study and their rights as participants. Partici- pants were informed that they would be in in separate rooms Study Questionnaire during the study and would use email to communicate with We administered our study using the Qualtrics web-based sur- each other.3 Participants were also informed that a study co- vey software. Before beginning the survey, participants were ordinator would be with them at all times, and could answer read an introduction by the study coordinator and then asked questions they might have. to answer a set of demographic questions. Participants then completed the study task. Demographics Immediately upon completing the study task, participants We recruited Gmail users for our study at a local university. were asked several questions related to their experience with Participants were two-thirds male: male (13; 65%), female that system. First, participants completed the ten System Us- (7; 35%). Participants skewed young: 18 – 24 years old (18; ability System questions [4,5]. Second, participants were 90%), 25 – 34 years old (2; 10%). also asked to describe in their own words what they liked We distributed posters broadly across campus to avoid biasing about Mailvelope, what they would change, and why they our results to any particular major. All participants were uni- would change it. If Participant B had never received an en- versity students,4 with the majority being undergraduate stu- crypted email from Participant A, they were not required to dents: undergraduate students (17; 85%), graduate students complete the study questionnaire, as they had no experience (3; 15%). with Mailvelope.

Scenario and Task Design Post-study Interview During the study, participants were asked to role-play a sce- After the completion of the survey, participants were inter- nario regarding completing taxes. Participant A was told they viewed by their respective study coordinators. Study coordi- needed Participant B’s help with filing taxes. Participant A nators focused on issues that had arisen during the study and was also told they since they were sending sensitive informa- probed for more details regarding areas of confusion. After tion (e.g., SSN) that they should encrypt this information us- the participants completed their individual post-study inter- ing Mailvelope.5 Participant B was told to wait for his friend views, they were brought together for a final post-study in- 2This incorrect simulation also calls into question their results re- terview. Participants were asked to explain what they had garding the high usability of automated-PGP. experienced to each other, and notes were taken on what 3The study coordinators ensured that the participants knew each was said by the study coordinators. Additionally, participants other’s email addresses. 4We did not require this. 6https://www.mailvelope.com/ 5Participants used sensitive information we generated and not their 7In some cases Participant B never actually received an email from own information. Participant A. td oriaoswudalwte og pt e minutes ten to up go to progress, them making allow first would were coordinators had end participants study A to If Participant after instructed Mailvelope. elapsed were installed minutes coordinators thirty if com- Study task to the minutes tasks. 45 the and 30 plete between given were Participants acceptable”. “Not Failures as labeled is and “F” 15 of the grade as below letter rated a falls is given It 34.5 Us- usability. of “Poor” score 1). having SUS Figure Mailvelope’s is (see ratings, it deviations adjective these standard the ing of adjective with terms these correlated in is to for score closest ranges a generated that to we such 8]. order ratings, data, 3, in this [2, on SUS scores SUS Based used system’s describing which ratings each who adjective-based studies of researchers give several of meaning of hundreds work the the analyzed leverage to we score, context SUS greater give SUS To the as used is value mean [4]. The com- score and 1. or Table system B, in given Participant each is A, for bined) Participant score (i.e., participant SUS of Scale the type Usability of breakdown System A the (SUS). using Mailvelope evaluated We Scale Usability System made mistakes Mailvelope, the for list participants. and scores by rate, SUS success the the on on details report provide we section, this In RESULTS addressed. are problems glaring when more fares its Mailvelope how of see some to conducted stud- be also Future could ies professionals). security profession- scientists, technical this computer possi- (e.g., rerun als, all populations to different interesting of especially using indicative difficulties be study not would show It is to outcomes. whom it enough ble of Mailveope was all with this While associated participants, twenty students. included were only study Our while instruc- reference task to study. the participants the the out completing to printed on them we provided Based study, and pilot tions total). the participants of with results (six study the participants pilot of a conducted pairs we three study, the developing After Limitations and Development Survey unspo- remain partici- otherwise systems, that ideal ken. preferences design reveal to often asked experience pants when our that designers, shown system has not function. are would system participants email While secure ideal an how asked were Combined B Participant A Participant 16 10 6 Count 4515.3 34.5 10.9 41.3 16.6 30.5

al .SSScores SUS 1. Table Mean

Standard Deviation 753.54. 6957.5 16.9 41.3 33.75 27.5 . 503. 5657.5 45.6 35.0 57.5 41.3 25.0 28.8 0.0 21.3 0.0 Min

Q1

th Median ecnie is percentile,

Q3

Max esedtermidro hsscinrpriglessons reporting section this masses, of the by remainder usable be the ever will spend PGP if we unclear is it While example, For called.” and study. up Mail- the use said, during to also did trying M3A they up given before in have long doing.” that would velope was indicate they I to world what participants real several was the led that also and difficulty use, The ever this would of you expression ware M3A: comical frustration from most expressed coming the frustration participants with All Mailvelope, participants with email. of majority their the help encrypt to failed clearly Mailvelope session). and study A same Participant the were in M1B respectively, and B, Participant M1A paired (e.g., were during other number same each played the with participant with the participants role and study, which the to The refers M[1-10][A,B]. identifier letter running unique Participants final while a assigned participants. found been from all we quotes have include themes and discuss study we this section this In DISCUSSION the sending without task the to key. accomplish needed private and they more that some informed try re- were the they transmitted In information, had so received. participants quired friend had the he his though even message to case, the password this decrypt keyring could his friend key his with private that along his it block. exported sent PGP eventually and the participant before one area the Finally, adding to window), encrypt information compose sensitive PGP to after their the block key in PGP still public the (while that modified encryption participant use One to in- message. tried friend their their then with and pair of key Three formation, a generated successful. pairs eventually participant was for the that including pair pairs, key. participant participant public the the sender’s of seven the for with occurred message mis- This a common encrypting most The was take mistakes. made pairs participant All Mistakes finish to ability their influenced limit. heavily time this the within that likely cryptography. key is public It partic- about that the learned of in previously one unique had where participants was ipants of pair pair successful only the The were forty- they so. full the do required to task minutes the five complete did that pair one The public their sharing after do pair to this keys. what though about one keys, confused public Only still traded was message. actually pairs the nine read they the to that of unaware Mailvelope was install and mes- to email any needed PGP encrypted mys- send A completely the was to participants by B tified Mailvelope Participant pairs, pairs, use two nine another to In the sage. how of out two figured In successfully never to unable task. were the nine to complete pairs, time participant study ten overall the Of the cause not did hour. it one exceed as long as longer, Atrfiemnts ol aejs given just have would I minutes, five “After Iaietesuietsoft- stupidest the “Imagine . Figure 1. Adjective-based Ratings to Help Interpret SUS Scores learned from the study that could assist the design of PGP- also suggested by Atwater et al. [1] and our experience cor- based secure email tools. roborates their suggestions.

Integrated Tutorials Better Text to Accompany PGP Block Every single participant constantly flipped back and forth be- When first seeing the PGP block generated by Mailvelope, tween Gmail and Mailvelope’s instructions. At no stage was no participant in the Participant B role was clear what they it intuitive what they should do next based on the Mailvelope were supposed to do with it. One participant noted that they UI. Nearly all participants indicated that they wished Mail- thought it was an image that had gotten garbled during email velope had provided instructions that were integrated with transmission. The one participant that managed to success- the Mailvelope software, and would walk them through, step- fully use Mailvelope, M10B, even stated, “It was like a puz- by-step, in setting up Mailvelope and sending their first en- zle, I only got a link to Mailvelope. I then had to go there and crypted email. This has been shown to greatly improve the explore.” usability of other secure email systems [7], and would greatly To address this issue, PGP-based tools could adopt a tech- assist first-time users in acclimating to PGP. nique used by other secure email tools: including plaintext Key steps that could be addressed by tutorials are: (1) helping instructions detailing the nature of the encrypted email and participants generate their PGP key pair, (2) discussing how how to decrypt it [7]. While it is true that instructions on how to share public keys, (3) inviting their friends to setup Mail- to decrypt the message are meaningless—as PGP-based tools velope, (4) importing their friend’s public keys, (5) sending can only encrypt messages for a recipient who has already es- their first encrypted email, and (6) decrypting their first en- tablished and shared their public key with the sender—there is crypted email. still room for instructions helping the unexpected recipient of an encrypted email know what to do next. These instructions Explanation of Public Key Cryptography could invite the recipient to install the PGP software, gener- The only participant pair that successfully completed the ate a key pair, and then send their public key to their friend. study task likely did so because one of the participants in the While these instructions aren’t foolproof, they are certainly a pair had previous knowledge related to public key cryptogra- positive improvement to the current situation. phy. Additionally, the only other pair that made progress did so because they realized that they needed each other’s public CONCLUSION keys, but even that pair did not know how to then use those We studied Mailvelope, a browser-based PGP secure email shared public keys. For the remaining eight participant pairs, tool that integrates tightly with users’ existing webmail the post-study interview made it clear that they did not under- providers. In our study of 20 participants, grouped into 10 stand how public and private keys were used. pairs of participants who attempted to exchange encrypted email, only one pair was able to successfully complete the To help address this, a simple explanation of PGP needs to assigned tasks using Mailvelope. All other participants were be created that is accessible to the masses. Several partici- unable to complete the assigned task in the one hour allotted pants indicated that they would prefer these concepts to be to the study. Even though a decade has passed since the last displayed in a simple video. Ideally, whatever form the de- formal study of PGP [9], our results show that Johnny has still scription takes, it would be be integrated with the tutorials, not gotten any closer to encrypt his email using PGP. allowing new users to be introduced to public key cryptogra- phy as a natural extension of their usage of Mailvelope. While our results are disheartening, we also discuss several ways that participant experiences and responses indicate how Automatic Email Invites for Recipients PGP could be improved. First, integrated tutorials could help All participants in the Participant A role were confused about first time users with step-by-step instructions. Second, an what their friend needed to do in order for their friend to approachable description of public key cryptography would receive encrypted email. An easy way to address this issue give novice individuals a “fighting chance” at using PGP. would be to modify PGP-based tools to send an email to re- Third, PGP-based tools would be more effective if, when en- cipients for which there isn’t an associated public key, asking countering an unknown recipient, the recipient were to be au- that individual to install the appropriate software, generate a tomatically sent an email telling them what they needed to do key pair, and reply with the public key. This technique was receive encrypted email. Finally, the PGP block itself could be prefixed with instructions that would help non-PGP users 5. John Brooke. 2013. SUS: A Retrospective. Journal of understand how to overcome the problem of receiving an en- Usability Studies 8, 2 (2013), 29–40. crypted message that they can’t decrypt. 6. Sascha Fahl, Marian Harbach, Thomas Muders, Matthew Smith, and Uwe Sander. 2012. Helping Johnny ACKNOWLEDGEMENTS 2.0 to encrypt his Facebook conversations. In We would like to thank Scott Heidbrink, Mark ONeill, El- Proceedings of the Eighth Symposium on Usable ham Vaziripour, and Justin Wu for their assistance as study Privacy and Security. ACM, 11. coordinators. 7. Scott Ruoti, Nathan Kim, Ben Burgon, Timothy Van REFERENCES Der Horst, and Kent Seamons. 2013. Confused Johnny: 1. Erinn Atwater, Cecylia Bocovich, Urs Hengartner, Ed when automatic encryption leads to confusion and Lank, and Ian Goldberg. 2015. Leading Johnny to Water: mistakes. In Proceedings of the Ninth Symposium on Designing for Usability and Trust. In Proceedings of the Usable Privacy and Security. ACM, 5. Eleventh Symposium on Usable Privacy and Security. 8. Jeff Sauro. 2011. A practical guide to the system 2. Aaron Bangor, Philip Kortum, and James Miller. 2008. usability scale: Background, benchmarks & best An Empirical Evaluation of the System Usability Scale. practices. Measuring Usability LLC. International Journal of Human–Computer Interaction 9. S. Sheng, L. Broderick, CA Koranda, and JJ Hyland. 24, 6 (2008), 574–594. 2006. Why Johnny still can’t encrypt: evaluating the 3. Aaron Bangor, Philip Kortum, and James Miller. 2009. usability of email encryption software. In Proceedings Determining What Individual SUS Scores Mean: of the Second Symposium On Usable Privacy and Adding An Adjective Rating Scale. Journal of Usability Security - Poster Session. Studies 4, 3 (2009), 114–123. 10. Alma Whitten and J. D. Tygar. 1999. Why Johnny can’t 4. John Brooke. 1996. SUS — a quick and dirty usability encrypt: A usability evaluation of PGP 5.0. In 8th scale. In Usability Evaluation in Industry. CRC Press. USENIX Security Symposium. citeseer.nj.nec.com/whitten99why.html