Identifying Users Using Keystroke Dynamics and Contextual Information

Programa de doctorat de la Universitat d’Andorra Identifying users using Keystroke Dynamics and contextual information Identificació d’usuaris mitjançant cadència de tecleig i dades contextuals Aleix Dorca Josa Direcció: Dr. Jose Antonio Morán Moreno i Dra. Eugènia Santamaría Pérez Identificador: TD-049-100018/201710 Data de defensa: 5 de febrer de 2018 ADVERTIMENT. La consulta d’aquesta tesi queda condicionada a l’acceptació de les següents condicions d’ús: La difusió d’aquesta tesi per mitjà del servei TDX (www.tdx.cat) ha estat autoritzada pels titulars dels drets de propietat intel lectual · únicament per a usos privats emmarcats en activitats d’investigació i docència. No s’autoritza la seva reproducció amb finalitats de lucre ni la seva difusió i posada a disposició des d’un lloc aliè al servei TDX. No s’autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant al resum de presentació de la tesi com als seus continguts. En la utilització o cita de parts de la tesi és obligat indicar el nom de la persona autora. WARNING. On having consulted this thesis you’re accepting the following use conditions: Spreading this thesis by the TDX (www.tdx.cat) service has been authorized by the titular of the intellectual property rights only for private uses placed in investigation and teaching activities. Reproduction with lucrative aims is not authorized neither its spreading nor the availability from a site foreign to the TDX service. Introducing its content in a window or frame foreign to the TDX service is not authorized (framing). These rights affect to the presentation summary of the thesis as well as to its contents. In the using or citation of parts of the thesis it’s obliged to indicate the name of the author. Programa de doctorat de la Universitat d’Andorra Identifying users using Keystroke Dynamics and contextual information Identificació d’usuaris mitjançant cadència de tecleig i dades contextuals PhD Thesis Tesi doctoral Aleix Dorca Josa supervised by Eugènia Santamaría Pérez Jose Antonio Morán Moreno October 6, 2017 To my wife Nina and sons Aleix & Jordi Sorry... “So long, and thanks for all the fish” Douglas Adams Contents List of Figures vii List of Tables ix List of Listings xii List of Abbreviations xiii Declaration xv Abstract xvi Resum xvii Acknowledgments xviii 1 Introduction1 1.1 Justification and research context.....................2 1.2 Document structure.............................4 2 State of the Art6 2.1 Biometrics..................................6 2.1.1 Introduction.............................6 2.1.2 Basic biometric steps........................7 2.1.3 Common biometric techniques...................8 2.1.4 Multimodal biometric techniques................. 10 2.1.5 Keystroke Dynamics over other techniques............ 11 2.2 Keystroke Dynamics............................ 12 2.2.1 Feature selection.......................... 14 2.2.2 Fixed text vs. Free text...................... 17 i 2.2.3 System vs. Application data recollection............. 18 2.2.4 Authentication, Verification and Identification.......... 19 2.2.5 Gender recognition......................... 19 2.3 Biometric evaluation............................ 20 2.3.1 Accuracy............................... 20 2.3.2 FAR and FRR........................... 20 2.3.3 Equal Error Rate.......................... 21 2.3.4 Receiver Operating Characteristic curves............. 22 2.4 Methodology applied to Keystroke Dynamics............... 23 2.5 Classification techniques.......................... 25 2.5.1 Statistical.............................. 26 2.5.2 Distance measurements....................... 26 2.5.3 Machine learning.......................... 30 2.6 Other techniques.............................. 31 2.6.1 Fusion................................ 31 2.6.2 Weighting features......................... 31 2.7 Bibliography analysis............................ 32 2.8 Relevant results from previous research.................. 34 2.8.1 Free text studies.......................... 35 2.8.2 Fixed text studies.......................... 39 2.9 Keystroke Dynamics applications..................... 42 2.10 Advantages of using Keystroke Dynamics................. 43 2.11 Summary.................................. 46 3 Objectives and Hypotheses 49 3.1 Objectives.................................. 49 3.2 Hypotheses................................. 50 3.3 Summary.................................. 50 4 Methodology 51 4.1 Contextual information and behavioral features............. 53 4.1.1 Context applied to Keystroke Dynamics............. 55 4.1.2 Behavioral features......................... 56 4.2 The Dataset................................. 56 4.2.1 Software developed to collect samples............... 57 ii 4.2.2 Samples gathering......................... 60 4.2.3 Ethics in samples gathering.................... 61 4.2.4 Keystroke dataset.......................... 62 4.2.5 Selecting users and groups..................... 66 4.3 Model description.............................. 71 4.3.1 Interval analysis........................... 71 4.3.2 Straight tree model......................... 73 4.3.3 Inverted tree model......................... 73 4.3.4 Combined tree model........................ 74 4.3.5 Forest of trees model........................ 75 4.3.6 n-graph frequency model...................... 76 4.4 Testing the models............................. 77 4.4.1 Size, quality and searching parameters.............. 77 4.4.2 Behavioral features......................... 83 4.4.3 Comparing new samples to the model............... 86 4.4.4 Determining the owner of a session................ 90 4.4.5 Authentication........................... 100 4.4.6 Age group and gender....................... 102 4.4.7 Cross-validation methodology................... 103 4.5 Summary.................................. 103 5 Results 105 5.1 Using Relative and Absolute distances.................. 107 5.1.1 Results using the n-graph methodology.............. 109 5.2 Test 1 – Quality and size of the model.................. 110 5.2.1 Model building methodology.................... 111 5.2.2 Samples verification methodology................. 112 5.2.3 Evaluated parameters........................ 113 5.2.4 Number of independent tests performed............. 114 5.2.5 Determining the owner of a session................ 114 5.2.6 Results for the Quality and size of the model test........ 116 5.2.7 Performance evaluation....................... 120 5.2.8 Test 1 summary........................... 121 5.3 Test 2 – Most relevant model parameters................. 123 5.3.1 Initial model parameters...................... 123 iii 5.3.2 Samples verification methodology................. 124 5.3.3 Evaluated parameters........................ 124 5.3.4 Number of independent tests performed............. 125 5.3.5 Determining the owner of a session................ 126 5.3.6 Results for the Most relevant model parameters test....... 126 5.3.7 Feature selection.......................... 131 5.3.8 Performance evaluation....................... 131 5.3.9 Test 2 summary........................... 132 5.4 Test 3 – Distances and methods to identify users............. 133 5.4.1 Initial model parameters...................... 133 5.4.2 Samples verification methodology................. 133 5.4.3 Distances and methods evaluated................. 134 5.4.4 Number of independent tests performed............. 134 5.4.5 Results for the identification of users test............. 134 5.4.6 Cleaning sessions of large values.................. 144 5.4.7 Test 3 summary........................... 145 5.5 Test 4 – Features related to user behavior................ 146 5.5.1 Behavioral features......................... 146 5.5.2 Initial model parameters...................... 147 5.5.3 Number of independent tests performed............. 147 5.5.4 Results when evaluating user behavior.............. 148 5.5.5 Test 4 summary........................... 149 5.6 Test 5 – User group size.......................... 149 5.6.1 Number of independent tests performed............. 150 5.6.2 Results for the user group sizes test................ 150 5.6.3 Test 5 summary........................... 154 5.7 Test 6 – Authenticating users....................... 154 5.7.1 Number of independent tests performed............. 155 5.7.2 Results for the authentication tests................ 156 5.7.3 Test 6 summary........................... 156 5.8 Test 7 – Dealing with age group and gender............... 159 5.8.1 Number of independent tests performed............. 161 5.8.2 Gender separation......................... 161 5.8.3 Results for the gender separation test............... 162 iv 5.8.4 Age group separation........................ 163 5.8.5 Results for the age group separation test............. 165 5.8.6 Age group and gender separation analyzing mistakes...... 167 5.8.7 Results when separating by age group and gender........ 167 5.8.8 Test 7 summary........................... 174 5.9 Summary.................................. 176 6 Conclusions 177 6.1 Conclusions on the proposed Objectives.................. 179 6.1.1 On the validity of the model.................... 179 6.1.2 On the underlying methodology.................. 181 6.1.3 On the parameters to build and search the models........ 182 6.1.4 On age group and gender separation............... 183 6.1.5 On authentication.......................... 183 6.1.6 On behavioral features....................... 184 6.1.7 On the main objective....................... 185 6.2 Conclusions on the

Identifying Users Using Keystroke Dynamics and Contextual Information

Using Xgboost to Classify the Beihang Keystroke Dynamics Database

Keystroke Dynamic Analysis Using Relative Entropy & Timing Sequence

Identification of User Behavioural Biometrics for Authentication Using Keystroke Dynamics and Machine Learning

Shared Data Set for Free-Text Keystroke Dynamics Authentica- Tion Algorithms

Behavioral Biometric Verification of Student Identity in Online Course Assessment and Authentication of Authors in Literary Works

Enhancing Online Banking Authentication Using Keystroke Dynamics

Downloaded From

Biometric Authentication Techniques: a Study on Keystroke Dynamics

Keystroke Dynamics for User Authentication and Identification by Using Typing Rhythm

Authentication System Based on Keystroke Dynamics

Soft Biometrics for Keystroke Dynamics Syed Zulkarnain Syed Idrus

Keystroke Dynamics