User Identification Through Keystroke Biometrics at an Internet Scale

Calhoun: The NPS Institutional Archive DSpace Repository Theses and Dissertations 1. Thesis and Dissertation Collection, all items 2019-12 USER IDENTIFICATION THROUGH KEYSTROKE BIOMETRICS AT AN INTERNET SCALE Veazey, Mark W. Monterey, CA; Naval Postgraduate School http://hdl.handle.net/10945/64089 Downloaded from NPS Archive: Calhoun NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS USER IDENTIFICATION THROUGH KEYSTROKE BIOMETRICS AT AN INTERNET SCALE by Mark W. Veazey December 2019 Thesis Advisor: Vinnie Monaco Second Reader: John D. Fulp Approved for public release. Distribution is unlimited. THIS PAGE INTENTIONALLY LEFT BLANK Form Approved OMB REPORT DOCUMENTATION PAGE No. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington, DC 20503. 1. AGENCY USE ONLY 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED (Leave blank) December 2019 Master's thesis 4. TITLE AND SUBTITLE 5. FUNDING NUMBERS USER IDENTIFICATION THROUGH KEYSTROKE BIOMETRICS AT AN INTERNET SCALE 6. AUTHOR(S) Mark W. Veazey 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING Naval Postgraduate School ORGANIZATION REPORT Monterey, CA 93943-5000 NUMBER 9. SPONSORING / MONITORING AGENCY NAME(S) AND 10. SPONSORING / ADDRESS(ES) MONITORING AGENCY N/A REPORT NUMBER 11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. 12a. DISTRIBUTION / AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE Approved for public release. Distribution is unlimited. A 13. ABSTRACT (maximum 200 words) Identification of users on the internet has broad-reaching implications in the computer science discipline regarding cyber security and privacy. Keystroke biometrics leverages the unique dynamics of how a user types to perform identification; however, current methods of authentication and identification using keystroke dynamics do not scale well beyond a few hundred users. This thesis investigates the feasibility of using conventional machine learning and deep learning techniques to identify users at an internet scale. By analyzing free-text keystroke information from a collection of over 100,000 users, several methods to perform user identification and profiling are identified, with a focus on determining how the size of the dataset affects identification accuracy. This thesis includes a novel method of representing keystroke data in a two-dimensional format suitable for a convolutional neural network, and it examines to what extent keystroke biometrics has implications for privacy on the internet. 14. SUBJECT TERMS 15. NUMBER OF artificial intelligence, machine learning, neural networks, keystroke biometrics, keystroke PAGES dynamics, authentication, identification, cyber security, fingerprinting 85 16. PRICE CODE 17. SECURITY 18. SECURITY 19. SECURITY 20. LIMITATION OF CLASSIFICATION OF CLASSIFICATION OF THIS CLASSIFICATION OF ABSTRACT REPORT PAGE ABSTRACT Unclassified Unclassified Unclassified UU NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89) Prescribed by ANSI Std. 239-18 i THIS PAGE INTENTIONALLY LEFT BLANK ii Approved for public release. Distribution is unlimited. USER IDENTIFICATION THROUGH KEYSTROKE BIOMETRICS AT AN INTERNET SCALE Mark W. Veazey Lieutenant, United States Navy BS, U.S. Naval Academy, 2011 Submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE from the NAVAL POSTGRADUATE SCHOOL December 2019 Approved by: Vinnie Monaco Advisor John D. Fulp Second Reader Peter J. Denning Chair, Department of Computer Science iii THIS PAGE INTENTIONALLY LEFT BLANK iv ABSTRACT Identification of users on the internet has broad-reaching implications in the computer science discipline regarding cyber security and privacy. Keystroke biometrics leverages the unique dynamics of how a user types to perform identification; however, current methods of authentication and identification using keystroke dynamics do not scale well beyond a few hundred users. This thesis investigates the feasibility of using conventional machine learning and deep learning techniques to identify users at an internet scale. By analyzing free-text keystroke information from a collection of over 100,000 users, several methods to perform user identification and profiling are identified, with a focus on determining how the size of the dataset affects identification accuracy. This thesis includes a novel method of representing keystroke data in a two-dimensional format suitable for a convolutional neural network, and it examines to what extent keystroke biometrics has implications for privacy on the internet. v THIS PAGE INTENTIONALLY LEFT BLANK vi TABLE OF CONTENTS I. INTRODUCTION..................................................................................................1 A. PROBLEM STATEMENT .......................................................................1 B. RESEARCH QUESTIONS .......................................................................2 C. SCALE ........................................................................................................2 D. BENEFITS OF RESEARCH ....................................................................3 E. ORGANIZATION .....................................................................................4 II. BACKGROUND ....................................................................................................5 A. BIOMETRICS............................................................................................5 1. Biometric Types .............................................................................5 2. Limitations ......................................................................................6 3. Biometric Menagerie .....................................................................7 4. Performance Measures ..................................................................7 B. KEYSTROKE DYNAMICS .....................................................................9 1. Static versus Dynamic Typing ......................................................9 2. Mechanics of a Keystroke .............................................................9 3. Computing Features ....................................................................10 4. Typing Performance Factors ......................................................11 C. INTERNET PRIVACY ...........................................................................13 1. General Authorship .....................................................................14 2. Fingerprinting ..............................................................................14 3. Methods of Anonymity ................................................................15 D. KEYSTROKE DYNAMICS AND INTERNET PRIVACY ................16 1. Keystroke Dynamics versus Browser Fingerprinting ..............16 2. Role of Keystroke Dynamics .......................................................18 III. THE KEYSTROKE DATASET .........................................................................19 A. DATA COLLECTION ............................................................................19 1. How Participants Were Chosen ..................................................19 2. Demographics ...............................................................................19 3. How Data Was Collected .............................................................20 4. How Data Was Presented ............................................................21 5. Keystroke Data Visualization .....................................................22 B. LIMITATIONS OF THIS DATASET ...................................................25 IV. BASELINE APPROACH ....................................................................................27 A. METHODOLOGY ..................................................................................28 vii 1. Feature Extraction .......................................................................28 2. Preprocessing................................................................................31 3. Classification ................................................................................33 4. Accuracy Metrics .........................................................................33 5. Profiling ........................................................................................34 B. RESULTS .................................................................................................34 1. Accuracy .......................................................................................34 2. Time ...............................................................................................36 3. CDFs ..............................................................................................37 4. Profiling Users ..............................................................................40

Load more