Identifying Users Using Keystroke Dynamics and Contextual Information

Doctoral Program of the Universitat d’Andorra

Title: Identifying users using Keystroke Dynamics and contextual information
Catalan title: Identificació d’usuaris mitjançant cadència de tecleig i dades contextuals
Author: Aleix Dorca Josa
Supervisors: Dr. Jose Antonio Morán Moreno and Dr. Eugènia Santamaría Pérez
Identifier: TD-049-100018/201710
Defense date: February 5, 2018

WARNING. By consulting this thesis you accept the following conditions of use: Dissemination of this thesis through the TDX service (www.tdx.cat) has been authorized by the holders of the intellectual property rights only for private use in research and teaching activities. Reproduction for profit is not authorized, nor is its dissemination or availability from a site outside the TDX service. Presenting its content in a window or frame outside the TDX service (framing) is not authorized. These rights apply to the presentation summary of the thesis as well as to its contents. When using or citing parts of the thesis, the name of the author must be indicated.
PhD Thesis by Aleix Dorca Josa, supervised by Eugènia Santamaría Pérez and Jose Antonio Morán Moreno. October 6, 2017.

To my wife Nina and sons Aleix & Jordi
Sorry...

“So long, and thanks for all the fish”
Douglas Adams

Contents

List of Figures
List of Tables
List of Listings
List of Abbreviations
Declaration
Abstract
Resum (Catalan abstract)
Acknowledgments

1 Introduction
  1.1 Justification and research context
  1.2 Document structure

2 State of the Art
  2.1 Biometrics
    2.1.1 Introduction
    2.1.2 Basic biometric steps
    2.1.3 Common biometric techniques
    2.1.4 Multimodal biometric techniques
    2.1.5 Keystroke Dynamics over other techniques
  2.2 Keystroke Dynamics
    2.2.1 Feature selection
    2.2.2 Fixed text vs. Free text
    2.2.3 System vs. Application data recollection
    2.2.4 Authentication, Verification and Identification
    2.2.5 Gender recognition
  2.3 Biometric evaluation
    2.3.1 Accuracy
    2.3.2 FAR and FRR
    2.3.3 Equal Error Rate
    2.3.4 Receiver Operating Characteristic curves
  2.4 Methodology applied to Keystroke Dynamics
  2.5 Classification techniques
    2.5.1 Statistical
    2.5.2 Distance measurements
    2.5.3 Machine learning
  2.6 Other techniques
    2.6.1 Fusion
    2.6.2 Weighting features
  2.7 Bibliography analysis
  2.8 Relevant results from previous research
    2.8.1 Free text studies
    2.8.2 Fixed text studies
  2.9 Keystroke Dynamics applications
  2.10 Advantages of using Keystroke Dynamics
  2.11 Summary

3 Objectives and Hypotheses
  3.1 Objectives
  3.2 Hypotheses
  3.3 Summary

4 Methodology
  4.1 Contextual information and behavioral features
    4.1.1 Context applied to Keystroke Dynamics
    4.1.2 Behavioral features
  4.2 The Dataset
    4.2.1 Software developed to collect samples
    4.2.2 Samples gathering
    4.2.3 Ethics in samples gathering
    4.2.4 Keystroke dataset
    4.2.5 Selecting users and groups
  4.3 Model description
    4.3.1 Interval analysis
    4.3.2 Straight tree model
    4.3.3 Inverted tree model
    4.3.4 Combined tree model
    4.3.5 Forest of trees model
    4.3.6 n-graph frequency model
  4.4 Testing the models
    4.4.1 Size, quality and searching parameters
    4.4.2 Behavioral features
    4.4.3 Comparing new samples to the model
    4.4.4 Determining the owner of a session
    4.4.5 Authentication
    4.4.6 Age group and gender
    4.4.7 Cross-validation methodology
  4.5 Summary

5 Results
  5.1 Using Relative and Absolute distances
    5.1.1 Results using the n-graph methodology
  5.2 Test 1 – Quality and size of the model
    5.2.1 Model building methodology
    5.2.2 Samples verification methodology
    5.2.3 Evaluated parameters
    5.2.4 Number of independent tests performed
    5.2.5 Determining the owner of a session
    5.2.6 Results for the Quality and size of the model test
    5.2.7 Performance evaluation
    5.2.8 Test 1 summary
  5.3 Test 2 – Most relevant model parameters
    5.3.1 Initial model parameters
    5.3.2 Samples verification methodology
    5.3.3 Evaluated parameters
    5.3.4 Number of independent tests performed
    5.3.5 Determining the owner of a session
    5.3.6 Results for the Most relevant model parameters test
    5.3.7 Feature selection
    5.3.8 Performance evaluation
    5.3.9 Test 2 summary
  5.4 Test 3 – Distances and methods to identify users
    5.4.1 Initial model parameters
    5.4.2 Samples verification methodology
    5.4.3 Distances and methods evaluated
    5.4.4 Number of independent tests performed
    5.4.5 Results for the identification of users test
    5.4.6 Cleaning sessions of large values
    5.4.7 Test 3 summary
  5.5 Test 4 – Features related to user behavior
    5.5.1 Behavioral features
    5.5.2 Initial model parameters
    5.5.3 Number of independent tests performed
    5.5.4 Results when evaluating user behavior
    5.5.5 Test 4 summary
  5.6 Test 5 – User group size
    5.6.1 Number of independent tests performed
    5.6.2 Results for the user group sizes test
    5.6.3 Test 5 summary
  5.7 Test 6 – Authenticating users
    5.7.1 Number of independent tests performed
    5.7.2 Results for the authentication tests
    5.7.3 Test 6 summary
  5.8 Test 7 – Dealing with age group and gender
    5.8.1 Number of independent tests performed
    5.8.2 Gender separation
    5.8.3 Results for the gender separation test
    5.8.4 Age group separation
    5.8.5 Results for the age group separation test
    5.8.6 Age group and gender separation analyzing mistakes
    5.8.7 Results when separating by age group and gender
    5.8.8 Test 7 summary
  5.9 Summary

6 Conclusions
  6.1 Conclusions on the proposed Objectives
    6.1.1 On the validity of the model
    6.1.2 On the underlying methodology
    6.1.3 On the parameters to build and search the models
    6.1.4 On age group and gender separation
    6.1.5 On authentication
    6.1.6 On behavioral features
    6.1.7 On the main objective
  6.2 Conclusions on the
