Fingerprinting for Web Applications: from Devices to Related Groups

Christine Blakemore

Thesis to obtain the Master of Science Degree in Telecommunications and Informatics Engineering

Supervisors: Prof. Miguel Nuno Dias Alves Pupo Correia Eng. João Vasco de Oliveira Redol

Examination Committee Chairperson: Prof. Paulo Jorge Pires Ferreira Supervisor: Prof. Miguel Nuno Dias Alves Pupo Correia Member of the Committee: Prof. Maria Dulce Pedroso Domingos

May 2016

Dedicated to my family, who always supported me in every decision.

Acknowledgments

I would like to thank my supervisor Prof. Miguel Pupo Correia for his never-ending support and enthusiasm, as well as my co-supervisor Eng. João Redol, for all the help granted during the course of this work.

I warmly thank Prof. Miguel Pardal, Prof. Nuno Santos and Dra. Ibéria Medeiros for their comments on a preliminary version of the work.

To my family, Margarida Mendes, Craig Blakemore, Rosa Mendes, Abílio Mendes, Lurdes Mendes, José Mendes, Sofia Seromenho, Carolina Mendes and Constança Mendes: thank you all for your love and support, without which I would never have been able to achieve this goal.

Lastly, I am grateful to my friends, for their support and companionship throughout my academic progress.

Thank you all!

Resumo

Identifying users and their devices is as important in web applications as in many other contexts. User identification in web applications usually involves an authentication process, for example, entering a username and a password. This identification is also possible without explicit authentication, using cookies or device fingerprints. Device fingerprinting is useful for several purposes, for example, to serve as a second factor of authentication. This work aims to be a step towards cross-device fingerprinting, i.e., the identification of the same user on different devices using fingerprinting methods. However, we target a variant of the problem that we call related group fingerprinting. We define a related group as a set of persons (e.g., a family) that share the same home network. A related group fingerprinting scheme was devised and evaluated experimentally with data from hundreds of users. This evaluation suggests that related group fingerprinting is feasible.

Keywords: Web Fingerprinting, Authentication, Web Applications, Cross-device Fingerprinting, Cross-browser Fingerprinting, Related Group Fingerprinting

Abstract

Identifying users and user devices is as important in web applications as in many other contexts. In web applications, user identification usually involves an authentication process, e.g., providing a username and a password. Identification is also possible without explicit authentication using cookies or device fingerprints. Device fingerprinting is also useful for several purposes, e.g., to serve as a second factor of authentication. Recently some interest appeared in the problem of cross-device fingerprinting, i.e., of the identification of the same user in different devices using fingerprinting. We target a variation of the problem that we call related group fingerprinting. We define a related group as a set of persons (e.g., a family) that share the same home network. We devised a related group fingerprinting scheme that we evaluated experimentally with data from hundreds of users. This evaluation suggests that group fingerprinting is feasible.

Keywords: Web fingerprinting, Authentication, Web applications, Cross-device fingerprinting, Cross-browser fingerprinting, Related Group Fingerprinting

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures

1 Introduction
1.1 Problem Statement
1.2 Methodology and Contributions
1.3 Document Structure

2 Background
2.1 Current Mechanisms
2.1.1 IP Address and HTTP Referrer
2.1.2 Cookies
2.1.3 Biometrics-Based Authentication
2.2 Fingerprinting Mechanisms
2.2.1 Browser Fingerprinting
2.2.2 Cross-Browser Fingerprinting
2.3 Behavior-Based Mechanisms
2.3.1 Location Tracking
2.3.2 Site Preferences
2.3.3 Keyboard and Mouse Dynamics
2.4 Protection Mechanisms

3 The Fingerprinting Approach
3.1 Device Fingerprinting
3.2 Cross-Device Fingerprinting
3.3 Related Group Fingerprinting
3.4 Testing Tool
3.4.1 Website and Datasets
3.4.2 User Statistics
3.5 Metrics
3.5.1 Hamming distance
3.5.2 Entropy
3.5.3 Precision and accuracy

4 Experimental Evaluation
4.1 Device Fingerprinting
4.2 Cross-Device Fingerprinting
4.3 Related Group Fingerprinting

5 Conclusions
5.1 Challenges
5.2 Future Work

Bibliography

List of Tables

2.1 Comparison of Various Biometric Technologies
3.1 Fingerprinting Features Collected in the Experiments
3.2 Dataset Examples
3.3 Website Questions
3.4 Hamming Distance Between Fingerprints A, B, C and D
3.5 Entropy Values
4.1 Fingerprint Feature Subsets
4.2 ISPs and their Number of Related Groups
4.3 Groups that Are Not Related Groups
4.4 Website Accesses for a User of Dataset II
4.5 Related Group Analysis for Dataset I

List of Figures

3.1 Website's First Page
3.2 Website's Second Page
3.3 Website's Third Page
3.4 Participants Age Group
3.5 Participants Gender
3.6 Question 1 (Q1)
3.7 Question 2 (Q2)
3.8 Question 3 (Q3)
3.9 Question 4 (Q4)
3.10 Features with Top 10 Entropy Values

Chapter 1

Introduction

Identifying users and user devices is as important in web applications as in many other contexts. In web applications, user identification usually involves an authentication process, e.g., providing credentials like a username, a password, or a token-generated code. Identification is also possible without explicit authentication, using cookies or device fingerprinting. Device fingerprinting, or web-based device fingerprinting, consists of gathering multiple pieces of information about the client's device and browser for identification purposes, e.g., the fonts and plugins installed, and the model, version, and language of the browser [1, 2]. Device fingerprinting is useful for several purposes. It can serve as a second or third factor of authentication [3], complementing the above-mentioned credentials. It can also serve to detect lost or stolen user devices, when they contact a server that uses this form of authentication. Both examples are extremely important today, with the increasing number of phishing attacks and thefts of personal devices.

We use the term fingerprint to designate a set of features that identifies a device, browser, or user. To be useful, fingerprints have to combine information that allows devices to be uniquely identified. The more diverse a feature's values, the more identifying the feature is, as its values are less likely to be shared by multiple devices. Every time a user accesses a web page that includes fingerprinting software, the device fingerprint is collected and compared to a database of known devices. If the device is not in the database, it is added, increasing the number of known devices [4, 5].
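To make this collect-and-match loop concrete, the sketch below hashes a set of collected features into a fingerprint and checks it against a server-side store. It is a minimal illustration only: the feature set, the hashing choice, and the in-memory knownDevices map are hypothetical simplifications, not the scheme evaluated in this work.

```javascript
// Minimal sketch of fingerprint matching on the server (Node.js).
const crypto = require("crypto");

const knownDevices = new Map(); // fingerprint hash -> first-seen timestamp

// features: an object of browser/device properties reported by the client.
function fingerprintOf(features) {
  const serialized = Object.keys(features).sort()
    .map(key => `${key}=${features[key]}`).join("|");
  return crypto.createHash("sha256").update(serialized).digest("hex");
}

function lookupOrRegister(features) {
  const fp = fingerprintOf(features);
  if (!knownDevices.has(fp)) {
    knownDevices.set(fp, Date.now()); // unknown device: add it to the database
    return { fingerprint: fp, known: false };
  }
  return { fingerprint: fp, known: true }; // device seen before
}
```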

1.1 Problem Statement

Recently, some interest has appeared in the problem of cross-device fingerprinting, i.e., the identification of the same user accessing a website or web application from different devices (e.g., tablet, laptop, smartphone) without explicit authentication. However, this problem is challenging and may require collecting personal user data (e.g., mouse dynamics data), so we consider a variation of the problem that we call related group fingerprinting. We define a related group as a set of persons (e.g., a family) that share the same home network. Related group fingerprinting consists of detecting whether a device belongs to a related group. We aim to solve this problem without using personal data.

In terms of applications, related group fingerprinting is useful for a spectrum of use cases. Usually there is some level of trust inside related groups, so related group fingerprinting may be used to assign levels of risk to certain operations without explicitly knowing who the user is. For example, it may be used as an additional form of authentication in certain online operations, like accessing home banking services or school websites.

1.2 Methodology and Contributions

We performed an experimental evaluation of the proposed scheme with data from hundreds of users. First, we developed two websites for volunteers to access. The websites contained both a short questionnaire and mechanisms to extract fingerprints. We then studied three main aspects:

• Is it still possible to perform device fingerprinting, given the very recent privacy mechanisms in web browsers that prevent the extraction of browser history data, one of the most powerful fingerprinting mechanisms [6]? We concluded that it is still realistic to perform device fingerprinting. There is also a contribution towards finding a minimal subset of features able to successfully perform device fingerprinting (Section 4.1).

• Given the fingerprints collected, is it possible to perform cross-device fingerprinting, i.e., to recognize the same user across different devices? The answer was negative (Section 4.2).

• Is it possible to perform related group fingerprinting? This time the answer was positive, as we had hoped (Section 4.3).

The related group fingerprinting scheme was designed based on the analysis of the experimental data obtained (see Chapter 3). For privacy reasons, this data does not include personal data like biometric data or browser history. Similarly to other fingerprinting schemes, each time a user accesses the website, a fingerprint is extracted and, if the device fingerprint is not yet in the database, stored. Moreover, the group of the device is also obtained and stored if it has not been stored before, i.e., if it is not yet in the set of groups of that particular device. A group is essentially a network to which a device is connected. The group identifier is either the device's IP address as observed by the server, if the IP address is owned by an Internet Service Provider (ISP), or the name of the organization obtained from the whois command, if the address belongs to a company, governmental agency, etc. Notice that a device can belong to several groups; for example, a smartphone connected to a home network, a 4G network, and an office network belongs to three different groups. A sketch of this group-identifier derivation is given after the list of contributions below. The contributions of this dissertation are the following:

1. the definition of the problem of related group fingerprinting;

2. a scheme to perform related group fingerprinting;

3. an experimental study of this scheme;

4. an experimental study of device and cross-device fingerprinting.
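As a minimal sketch of the group-identifier rule described above, the following assumes a hypothetical whoisLookup helper that wraps the whois command and reports the owning organization and whether it is an ISP; the actual lookup details of our scheme are described in Chapter 3.

```javascript
// Hypothetical sketch: deriving a group identifier from a client IP address.
// whoisLookup is assumed to wrap the whois command and resolve to
// { organization: string, isISP: boolean }.
async function groupIdentifier(clientIp, whoisLookup) {
  const record = await whoisLookup(clientIp);
  if (record.isISP) {
    // IP owned by an ISP: the (home) network is identified by the IP itself.
    return clientIp;
  }
  // Otherwise (company, governmental agency, ...): use the organization name.
  return record.organization;
}
```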

1.3 Document Structure

In short, the remainder of this document is organized as follows. In Chapter 2 we describe multiple methods and concepts from previous work on user identification. Chapter 3 presents our approach, as well as the testing tool and metrics used, while Chapter 4 describes the experimental evaluation. Finally, Chapter 5 concludes the document, summarizing the work performed and its challenges, and pointing out directions for future work.

Chapter 2

Background

In this chapter we summarize previous work on the multiple areas concerning user identification and the privacy issues that revolve around it, including detailed information about systems that have raised user concern. In Section 2.1, we present the most common techniques currently used for user tracking. Section 2.2 is dedicated to a more recent topic known as fingerprinting. In Section 2.3 we describe several techniques for observing user behavior, from tracking a user's device location, to monitoring which websites they visit on a daily basis, to studying their interaction with the keyboard and mouse. Section 2.4 presents available protection systems and methods that may help users avoid being tracked on the web.

2.1 Current Mechanisms

2.1.1 IP Address and HTTP Referrer

One of the simplest ways of identifying a user and their location is via the IP address. With an IP address, a website can estimate a device's approximate location. Nowadays it is likely that a computer shares an IP address with other devices in the same space (house or office). As IP addresses change regularly and may be assigned to different users, they fail as a good way to track a single user over time. In this sense, an IP address is problematic as a unique identifier and should be combined with other techniques in order to identify users.

Users' browsers inform advertising companies of which page is being viewed by sending the browser's HTTP referrer, and there are several situations where the user's information is revealed using this approach. For instance, every time a user clicks on a link, the browser loads the page and tells the website where the user came from. In other words, if a user clicks on a link from a travel site, for example www.travel.com, and this takes him to an outside website, this outside website is going to see the address of the www.travel.com page from which he came. This information is stored in the HTTP referrer header. Another case is when a web page includes an ad or tracking script.
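As an illustration of how much a third-party script learns from the referrer alone, the snippet below is a minimal sketch of a tracker logging the embedding page and the referring URL; the collection endpoint is a hypothetical example domain.

```javascript
// Minimal sketch: a tracking script reading the page and referrer URLs.
(function logVisit() {
  const visit = {
    page: window.location.href,    // the page embedding this script
    referrer: document.referrer,   // the page the user came from ("" if none)
    time: new Date().toISOString(),
  };
  // A real tracker would report this to its server, e.g. with a beacon:
  navigator.sendBeacon("https://tracker.example/collect", JSON.stringify(visit));
})();
```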

2.1.2 Cookies

Websites providing free information or content depend on advertising to continue operating. They rely on other websites to manage and run the advertising on their pages, turning to third-party companies that serve this advertising. In order to serve websites their ads, these companies resort to the use of cookies [7]. A cookie is a sequence of bytes that is created by the server and sent to the client; then, every time the client (browser) sends a request to the server, the cookie is sent along. The cookie's data is transmitted inside an HTTP header, using fields such as the Set-Cookie field and the Cookie field. The server sends a Set-Cookie header (Set-Cookie: name=value) in the response to set the cookie. Then, once a cookie is set, the browser sends a Cookie header (Cookie: name=value) in its request headers. The Cookie header field value contains only the cookie value for a certain URL, while the Set-Cookie header field specifies the cookie's value string [8]. When a user is browsing a given website, this website generates the cookie, which is stored in a file belonging to the browser and processed by the browser software. This cookie file is stored in the browser's folder, and the cookie contains the following attributes (a header example is given after the list):

• Expires and Max-Age: The Expires attribute defines a specific date and time for when the browser should delete the cookie. In case the expiration date is not specified, the cookie will be deleted as soon as the page is closed. The Max-Age attribute can be used to set the cookie’s expiration as an interval of seconds in the future, relative to the time the browser received the cookie.

• Domain and Path: The Domain and Path attributes define the scope of the cookie. They essentially tell the browser what website the cookie belongs to. A cookie can only be read, and therefore used, by the server that created it.

• Secure and HttpOnly: The Secure attribute is meant to keep cookie communication limited to encrypted transmission, directing browsers to use cookies only via secure connections. The HttpOnly attribute directs browsers not to expose cookies through channels other than HTTP (and HTTPS) requests.
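As a concrete illustration of these header fields and attributes, here is a minimal sketch of a server setting and reading a cookie using Node's built-in http module; the cookie name, value, and domain are hypothetical.

```javascript
// Minimal sketch: setting and reading a cookie with Node's http module.
const http = require("http");

http.createServer((req, res) => {
  // Read the Cookie request header (e.g. "sid=abc123"), if one was sent.
  const cookie = req.headers.cookie || "(no cookie yet)";
  // Set a persistent, scoped cookie via the Set-Cookie response header.
  res.setHeader("Set-Cookie",
    "sid=abc123; Max-Age=86400; Domain=example.com; Path=/; Secure; HttpOnly");
  res.end(`Your browser sent: ${cookie}`);
}).listen(8080);
```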

Stored on users' computers by web browsers, cookies allow a website to notice which users are returning. For instance, with cookies, websites are able to know when a user has already provided authentication information, which many users find useful as they are then spared from repeatedly providing their usernames and passwords. Besides, websites may even remember certain settings that users have previously personalized and intend to keep. However, the use of cookies raises plenty of issues regarding users' privacy, since cookies are in some cases aggressively used to build profiles of users' surfing habits. There are two kinds of cookies: session cookies and persistent cookies. Session cookies are temporary and only remain in the browser's cookie file until the user leaves the website. Persistent cookies, on the other hand, remain in the cookie file for much longer, depending on the cookie's lifetime [9]. By using cookies, websites are granted the ability to recognize returning customers and even keep track of a given session's state over the HTTP protocol. Companies can monitor which sites their users visit, making it possible to catalog their tastes and interests, in order to build a dossier on their behavior. Since the most common way of tracking web browsers nowadays is in fact via HTTP cookies, this has led to a growing awareness among users with respect to their privacy, causing great discomfort [10].

As cookies have caught everyone's attention, several measures have been taken by the user community in response to this threat. There are currently browser extensions available that identify and reveal tracking [11, 12]. Also, modern browsers allow the rejection of all cookies and add the possibility of using a "Private Mode" or "Incognito Mode", where users may visit websites without leaving any kind of trace [13]. Supercookies are similar to regular cookies, except they cannot be erased by, for instance, using browser settings such as "private" or "incognito", posing various security and privacy issues. They are stored in a different location from regular cookies, such as a file used by a plug-in, Flash being the most common. Supercookies are very hard to find, assuming the user is even aware of their existence. Adding to the difficulty of finding these supercookies, users have a hard time erasing them, as they may have to dig through the file system in order to do so; once a user has found them, he has to delete them manually. A few websites also use supercookies as a way to regenerate common cookies that a user has already deleted, a technique known as zombie cookies. With zombie cookies, a user's previous cookie ID is linked with a newly assigned cookie ID [10]. However, when it comes to having various devices, the use of cookies fails. When swapping devices, a user loses his previously stored login credentials, requiring new authentication in order to restore saved information. For instance, a user logged into a shopping website on his laptop who tries to continue his shopping on his tablet will find an empty shopping cart, forcing him to re-authenticate in order to recover the saved contents.

2.1.3 Biometrics-Based Authentication

In the past years, the importance of reliable user authentication has progressively increased, as users now face more threats. Current user authentication techniques, which involve the use of user IDs (identifiers, usernames for example) and passwords or PINs (Personal Identification Numbers), have many limitations, as these credentials are easy to acquire illicitly. With an insecure authentication system, corporations face the worst consequences, including the loss of confidential information, denial of service, and compromised data integrity. Once an intruder acquires a user's identifier and respective password, he has full access to the user's resources, and there is no protection against repudiation. In general, even though data is sent over the web using secure protocols, current systems are unable to assure that it is the credentials' rightful owner initiating certain actions. One way to identify users and guarantee access control without common username-and-password authentication is through biometric authentication. The use of biometrics provides a more accurate and reliable user authentication method, where the concern is to identify a person based on his or her physiological or behavioral characteristics. Physiological characteristics are related to the human body, and include fingerprint, face, retina, iris, and speech recognition. Behavioral characteristics, however, are related to the pattern of a person's behavior, including typing rhythm, gait, and voice, for instance. As in any other system, hackers always find ways of discovering weak points that are vulnerable to attack, and biometrics are no exception. Password systems are prone to brute-force dictionary attacks, where an attacker, typically running software instead of trying manually, tests all possible passwords systematically, beginning with words that have a higher probability of being used [14]. It is called a "dictionary" attack because the attacker tries as many possibilities as there are words in an actual dictionary, exhausting hundreds or even millions of words in the attempt to determine the user's password [15]. Although biometric systems require more effort for this sort of attack, several attacks are possible in this domain. Such attacks may not be applicable if there is some kind of supervision during authentication, although in other circumstances attackers may have the opportunity and time to attempt physical violation of the client's integrity. Biometric-based authentication has several usability advantages over more conventional systems such as passwords. Biometrics are difficult to duplicate or steal, and almost impossible to share, although their properties can in some cases be lost in serious accidents. One problem with this kind of authentication occurs when the data associated with a given biometric feature is compromised. With authentication systems that use physical tokens, such as cards or badges, compromised tokens can simply be canceled and replaced with new ones; likewise, usernames and passwords are effortlessly changed. With biometrics, when data is compromised, the user may quickly run out of features to use for authentication, as the number of available biometric features is limited. Table 2.1 compares various biometric technologies according to four parameters: Performance, Cost, Acceptability and Circumvention [16, 17].

Table 2.1: Comparison of Various Biometric Technologies.

                Performance   Cost     Acceptability   Circumvention
Face            Low           Low      High            High
Fingerprint     High          Low      Medium          Medium
Hand Geometry   Medium        High     Medium          Medium
Iris            High          High     Low             Low
Retina          High          High     Low             Low
Signature       Low           Medium   High            High
Voice           Low           Low      High            High

• Performance: Refers to the achievable speed and accuracy of the recognition, taking into account operational and environmental factors that may affect accuracy and speed.

• Cost: Represents the additional cost of including biometric technology in applications.

• Acceptability: Indicates the extent to which people are willing to accept using a particular biometric characteristic in their day-to-day lives.

• Circumvention: Reflects how easily a system can be fooled by fraudulent methods.

All the features that contribute to the strength of biometrics, together with the fact that biometrics do not change over time, end up constituting their greatest weakness and liability: once a set of biometric features has been compromised, it stays compromised forever.

2.2 Fingerprinting Mechanisms

Used against users who limit cookies, fingerprinting arises as a hard-to-detect method that leaves no persistent evidence of tagging on users' computers. Web-based device fingerprinting is the process of collecting information about a series of unique characteristics of the user's computer, through its web browser, in order to identify the device [1]. Device fingerprinting captures these browser element properties through JavaScript, Flash, or other plugins, and forms a nearly unique identifier, a fingerprint [2]. A fingerprint is nothing more than that collected information: a set of attributes, collected from the user's system, whose combination of values is very likely unique for each device, forming a device identifier. These system attributes include: time zone, screen resolution, versions of installed software, installed fonts, installed plugins, and enabled cookies. Some of these can be inferred from the HTTP headers alone, while others must be collected using client-side JavaScript, which sends the information back to the server [2]. The more diverse an attribute's values, such as the list of fonts, the more identifying it is considered, as its values are not shared by many devices, unlike, for example, the version of the operating system. Fingerprints are used as identifiers to track a user's device on the web, linking browsing sessions together and even creating a link to the user's identity. Every time a user accesses a webpage that includes fingerprinting software, the device fingerprint is collected and compared to a database of known devices. If the device is unknown, it is added to the database; otherwise, it can be matched with its previous entry. This way, each time a user visits a monitored webpage, the database is augmented with the device's information [4, 5]. We consider that fingerprinting mechanisms can be classified in terms of what they aim to identify:

Browser Fingerprinting: The goal is to identify the web browser, or the pair (device, web browser) or, possibly, the trio (user, device, web browser).

Cross-Browser Fingerprinting: The goal is to identify the device independently of the used web browser, or possibly the pair (user, device).

Cross-Device Fingerprinting: The goal is to uniquely identify the user even when using different devices.

Fingerprinting is a very common method used by advertising and anti-fraud companies, as it allows overcoming the existing limitations of cookies. Additionally, the use of fingerprinting allows companies to track users and their activities. Due to its stateless nature, which makes it hard to detect and almost impossible to opt out of, fingerprinting raises various privacy issues. Browser settings such as "private" or "incognito", a measure taken by cookie-conscious users in order to perform privacy-sensitive operations, do not prevent fingerprinting [4]. In a similar way to cookies, fingerprinting can be used both in constructive and destructive ways. Constructively, it may be used to combat fraud, confirming whether someone who is trying to log into a web-based server is in fact a legitimate user rather than an attacker who got hold of stolen login credentials. It can also be used to combat click fraud, which happens when someone displays an advertisement on their website in return for payment each time the ad is clicked: by running an automated script or computer program that imitates a legitimate user clicking on the ad, charges per click are generated without any actual interest in the target of the ad's link. On the other hand, fingerprinting can be used to track users by monitoring which sites they visit, without their knowledge and consent, and without giving them an option to opt out. Besides being used for tracking, it can also be used by attackers to deliver exploits [13]. Fingerprinting has become so precise that it has rendered browsers' privacy-protection measures useless: the more users try to hide their identity with the help of privacy-protecting extensions, the more they stand out in the end. The EFF (Electronic Frontier Foundation) has implemented a fingerprinting algorithm, Panopticlick, which anonymously logs the configuration and version information of a user's operating system (OS), browser, and plugins, and compares it to a database of many other users' configurations [10]. The goal is to understand how identifiable users are, and to evaluate the capabilities of Internet tracking and advertising companies that try to record users' activities. In this experiment, 94.2% of the browsers using Flash or Java Runtime Environment were unique in the sample used.

2.2.1 Browser Fingerprinting

Browser fingerprinting is commonly used as part of anti-fraud and advertisement systems, as it may avoid the need for authentication with credentials provided by the user and for the use of cookies [4]. Similarly to cookies, fingerprinting can be used both for legitimate and illegitimate ends. Legitimately, it may be used to combat fraud, confirming whether someone who is trying to log into a web-based server is in fact a legitimate user rather than an attacker who got hold of stolen credentials. It can also be used to combat click fraud, i.e., illegitimate clicking of advertisements to increase payment [13]. Illegitimately, it may be used to track the actions of a person, violating his privacy in some sense.

JavaScript-Based

Much user data, such as installed fonts or cookies, can be obtained using JavaScript, an object-oriented scripting language used for various purposes such as interactive web content, asynchronous communication, and dynamic alteration of document content [18]. Allowing dynamic content to be executed in web browsers grants developers the ability to create rich and interactive web interfaces. By enabling asynchronous communication with servers, it is possible to constantly update data without a page refresh. JavaScript provides APIs that grant access to device and browser-related properties. According to [4], JavaScript has become a powerful fingerprinting tool. It discloses sensitive information including user-agent, architecture, OS language, system time, and screen resolution. The two JavaScript APIs that have been mainly investigated and exploited for fingerprinting purposes are listed below, followed by a short sketch of their use:

• Navigator object: represents the properties of the device and browser environment (browser name and version, supported plugins and MIME types, and OS and browser architecture);

• Screen object: contains information about the settings of the device's screen displaying the browser (screen resolution, and colour and pixel depth).
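A minimal sketch of collecting features from these two objects follows; the exact feature set is an illustrative choice, not the one used by any particular fingerprinter.

```javascript
// Sketch: collecting navigator and screen properties in the browser.
function collectFeatures() {
  return {
    userAgent: navigator.userAgent,    // browser name, version, platform
    language: navigator.language,      // OS/browser language
    plugins: Array.from(navigator.plugins, p => p.name).join(","),
    mimeTypes: Array.from(navigator.mimeTypes, m => m.type).join(","),
    screen: [screen.width, screen.height, screen.colorDepth].join("x"),
    timezoneOffset: new Date().getTimezoneOffset(), // minutes from UTC
  };
}
```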

JavaScript has been used for fingerprinting in other ways, making use of performance [5], the HTML5 canvas [19], and browsing history [6, 13]. An experiment conducted by Mayer in 2009, in which 1328 web clients were fingerprinted, reported that approximately 96% were uniquely identified. To do so, he concatenated the contents of the navigator, screen, navigator.plugins (which asks the browser what plugin types it supports) and navigator.mimeTypes (which lists all the MIME types supported by the browser) objects, and then performed a hash of these contents [20]. In 2010, Eckersley fingerprinted just under half a million browsers. By adding fonts, time zones and the browser's Accept headers to the set of contents used by Mayer, roughly 94% of the fingerprints were unique. Eckersley also showed that the list of installed fonts is one of the system's most identifying features. One can obtain this list of fonts in two ways: with browser plugins or with JavaScript [4]. The scripting language of Flash provides APIs that include font discovery methods. These methods were originally meant to ensure the proper appearance of text rendered by browser plugins, but they can also be used to fingerprint the system. The list of fonts may also be collected with JavaScript, by measuring and afterwards comparing the dimensions of text rendered with different fonts. This approach rapidly discovers which fonts are present on the OS, even with a long list of candidate fonts; on the other hand, less popular fonts may not be detected (a sketch of this measurement technique is given after the list of results below).

In [5] Mowery et al. introduce two new browser fingerprinting techniques, both using JavaScript, based on timing and performance patterns. The first technique does not depend on functionality but on performance differences between browsers, meaning that even when JavaScript functionality is limited, this approach can still be used. The second technique allows the entries in the user's NoScript whitelist to be queried by attackers. NoScript may provide extra protection for users, as it blocks scripts from running automatically on most websites. In order for web content to be executed, the site hosting the content has to be considered trustworthy, and therefore has to be added to a whitelist. As less content is loaded once NoScript is enabled, page loading times also improve. Users may customize their NoScript whitelists for their favourite websites, as with JavaScript disabled most sites do not work well. This technique shows that by requesting scripts from domains and later inspecting whether the scripts were in fact successfully executed, it is possible to deduce whitelists from the NoScript plugin, which can be used as an extra fingerprinting feature. To know whether the scripts were indeed successfully executed, JavaScript objects in the global address space have to be searched. For the execution of these techniques the authors used a combination of 39 individual well-established JavaScript benchmarks, and generated a fingerprint from the runtime patterns. In total, the runtime for fingerprinting was high, at 190.8 seconds per user, caused partly by an intentional 800 ms timeout between tests. With a similar approach, the authors of [21] obtained what were considered superior results:

• The runtime proved much faster: less than 200 ms (instead of 190 s).

• It can be implemented in a few hundred lines of JavaScript and is undetectable by the user.
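The JavaScript font-measurement technique mentioned above can be sketched as follows: text is rendered with each candidate font, falling back to a generic font, and a difference in rendered dimensions reveals that the candidate is installed. The candidate list and test string are illustrative.

```javascript
// Sketch of font detection by measuring rendered text dimensions.
function detectFonts(candidates) {
  const span = document.createElement("span");
  span.textContent = "mmmmmmmmmmlli"; // wide and narrow glyphs amplify differences
  span.style.cssText = "position:absolute; left:-9999px; font-size:72px;";
  document.body.appendChild(span);

  span.style.fontFamily = "monospace"; // generic fallback baseline
  const baseWidth = span.offsetWidth;
  const baseHeight = span.offsetHeight;

  const present = candidates.filter(font => {
    span.style.fontFamily = `'${font}', monospace`;
    // Different dimensions mean the candidate font was actually used.
    return span.offsetWidth !== baseWidth || span.offsetHeight !== baseHeight;
  });

  document.body.removeChild(span);
  return present;
}

// Example: detectFonts(["Arial", "Calibri", "Comic Sans MS"]);
```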

Canvas Fingerprinting

Canvas fingerprinting is a type of browser or device fingerprinting technique that uses the browser's Canvas API to draw invisible images and extract a persistent fingerprint without the user knowing about it [22]. Depending on the operating system, font library, graphics card, graphics driver and web browser, the same text can be rendered in several different ways on different devices. Researchers find that there does not appear to be a way to block canvas fingerprinting automatically without false positives that end up blocking legitimate functionality. The following steps indicate the basic flow of operations in canvas fingerprinting (a sketch follows the steps):

1. On every visit a web page receives, the fingerprinting script draws text, with a font and size of its choice, using the fillText() method, and adds background colours.

2. Then, the script calls the toDataURL() method of the Canvas API, in order to get the canvas pixel data in data URL format.

3. At last, the script takes a hash of the text-encoded pixel data, using it as the unique fingerprint and combining it with other browser properties, such as the list of plugins, the fonts, or the userAgent string. The userAgent is sent every time a user connects to a website, giving the website information about the user's browser and operating system.
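The sketch below follows these three steps; the drawn string is arbitrary, and a simple 32-bit string hash stands in for whatever digest a real script would combine with the other browser properties.

```javascript
// Sketch of canvas fingerprinting following steps 1-3 above.
function canvasFingerprint() {
  const canvas = document.createElement("canvas");
  canvas.width = 220;
  canvas.height = 30;
  const ctx = canvas.getContext("2d");

  ctx.textBaseline = "top";
  ctx.font = "14px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(0, 0, 220, 30);              // background colour
  ctx.fillStyle = "#069";
  ctx.fillText("fingerprint,sample", 2, 2); // step 1: draw text

  const dataUrl = canvas.toDataURL();       // step 2: read the pixel data

  // Step 3: hash the text-encoded pixel data (illustrative 32-bit hash).
  let hash = 0;
  for (let i = 0; i < dataUrl.length; i++) {
    hash = ((hash << 5) - hash + dataUrl.charCodeAt(i)) | 0;
  }
  return hash.toString(16);
}
```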

The methodology presented in [22] is split into two steps. The first consists of the identification of canvas fingerprinting detection methods and the development of a web crawler for exploratory crawls. A web crawler is a computer program that searches the web in an organized and methodical way for the purpose of web indexing. This step allowed the development of a formal and automated method based on early findings, as a crawler is able to copy all the pages it visits for later processing. The second step consists of the analysis method and the detection of canvas fingerprinting. The return value of toDataURL() and the arguments of the fillText() and strokeText() methods were logged, so that the strings drawn onto the canvas could be found. All function call logs were parsed and combined into an SQLite database, allowing the crawl data to be analysed. Each time a user visited a site, cookies, localStorage items, cache metadata, HTTP request and response headers, and request bodies were added to the SQLite database, and false positives were removed. After examining the distinctive features of the false positives and of the fingerprinting scripts found in the initial experiments, the following conditions were applied in order to eliminate false positives:

• Calls to the toDataURL() and fillText() or strokeText() methods should both be present, coming from the same URL.

• The canvas image read by the script had to contain more than one colour and its aggregate size had to be greater than 16x16 pixels, as imposing a 16x16-pixel size limit made it possible to filter out scripts that read too few pixels to extract a canvas fingerprint efficiently.

• The image should not be requested in a lossy compression format (JPEG, for example), as with lossy compression the returned image may lose the slight differences that are essential for fingerprinting.

With these checks applied, the false positive ratio for the 100,000-site crawl results decreased to zero.

Plugin-Based

A plugin is a software component that can be embedded inside a web page to add specific abilities or functionalities to it, allowing interaction between web pages and third parties. In short, a plugin only affects the page in which it is placed. The use of plugins allows developers to learn the capabilities of the running system (operating system version) and application (plugin version). Plugins also work with browser extensions, allowing them to perform certain tasks such as blocking advertisements or adding applications inside the browser. Extensions affect the browser itself and not only the web page. They are included in the program and extend the functionality of existing software applications (such as Mozilla Firefox or Google Chrome). For instance, an extension might change the user interface by simply adding a button or toolbar. Two examples of popular web browser plugins are Java and Adobe Flash Player (Flash). The Java plugin is used to display interactive web content through Java applets. A Java applet is no more than a Java program that (only with user permission) runs in the web browser. Java has become less and less popular, as multiple vulnerabilities keep appearing. Flash is used for displaying videos and games in the web browser. Like JavaScript, Flash is used by companies to fingerprint users' environments. Flash has an API for enumerating the system's fonts, favouring fingerprinters intent on collecting as much information about the user's system as possible. In addition to font extraction, fingerprinting companies use Flash to circumvent HTTP proxies that are set up by the user and to get more specific information about the device, such as the OS kernel version. This is problematic not only from a privacy perspective, but also in terms of security, as a malicious web server can launch an attack aimed not only at a browser but also at a specific kernel. Another API call that behaves differently is the one revealing the user's screen resolution: if a user is utilizing a dual-monitor setup, Flash will report the width of the screen as the sum of the two individual screens, allowing a fingerprinting service to detect this multi-monitor setup [13, 4].

Extension-Based

With more recent browsers, developers are able to perform a series of actions, such as adding new functionalities or removing unnecessary features, by creating browser extensions. An extension is a program, typically written in JavaScript, that adds functionality to the browser. Some extensions may change the user interface by adding, for example, a toolbar. Browser extensions, like plugins, may also be used by fingerprinters to collect information about the user's browser. The popular NoScript [12] allows the execution only of web objects that are found on the user's URL whitelist (trusted URL list). However, it has been shown that the NoScript plugin can itself be used for fingerprinting purposes [5]. The custom whitelisted URLs can be collected by requesting scripts from various domains: if a requested script is executed, then its URL is present in the whitelist. This enables the fingerprinter to create his own list of URLs and check whether the user has these URLs on his whitelist or not [4]. A sketch of this probing technique follows.
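A minimal sketch of the probing technique follows; the probe URL and the global marker variable the probe script is expected to define are hypothetical.

```javascript
// Sketch of whitelist probing: load a script from a domain and check whether
// it executed by searching the global address space for a marker it defines.
function probeDomain(scriptUrl, globalName) {
  return new Promise(resolve => {
    const s = document.createElement("script");
    s.src = scriptUrl;
    s.onload = () => resolve(globalName in window);  // did the script run?
    s.onerror = () => resolve(false);                // blocked or unreachable
    document.head.appendChild(s);
    setTimeout(() => resolve(globalName in window), 3000); // fallback timeout
  });
}

// Example (hypothetical probe): checking one candidate whitelist entry.
// probeDomain("https://ads.example/probe.js", "__probeLoaded")
//   .then(whitelisted => console.log({ whitelisted }));
```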

Header-based and Server-side

HTTP headers provide information such as the HTTP Accept headers, the userAgent string, etc. The userAgent string, for instance, provides data such as the name, version and platform of the browser being run. This information about the user's browser is sent to web servers with every HTTP request. On the server side, these requests are then easily fingerprinted and analysed, allowing the user's web browser to be identified [4].
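Server-side, collecting these headers takes one lookup per request; the sketch below logs a few fingerprintable headers with Node's http module (hashing and storage are omitted).

```javascript
// Minimal sketch: server-side logging of fingerprintable request headers.
const http = require("http");

http.createServer((req, res) => {
  const headerFingerprint = [
    req.headers["user-agent"],      // browser name, version, platform
    req.headers["accept"],
    req.headers["accept-language"],
    req.headers["accept-encoding"],
  ].join("||");
  console.log(headerFingerprint);  // a real system would hash and store this
  res.end("ok");
}).listen(8080);
```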

2.2.2 Cross-Browser Fingerprinting

The previous techniques may be designated as browser fingerprinting, in the sense that they aim to identify a browser running on a device. The problem of device fingerprinting or cross-browser fingerprinting is different because there may be several browsers installed on a device.1 There is some preliminary work on such cross-browser fingerprinting, which aims to identify a device irrespective of the browser being used [24]. These techniques rely on features that have to be independent of the browser, for example: OS name and version, layout engine type and version, list of fonts, screen resolution, and time zone. That work has shown that it is possible to do cross-browser (or device) fingerprinting based on such features. It also shows that the scheme is resilient to the modification of one of the features, i.e., that it still identifies devices correctly if one of the individual fingerprints changes.

To the best of our knowledge there is no work on cross-device or related group fingerprinting.

There has been some work on preventing device tracking and fingerprinting due to privacy concerns [25]. TrackingFree is an anti-tracking browser system that, instead of blocking the use of identifiers like cookies and Flash files, isolates these identifiers into different units or browser principals, blocking third-party tracking [26]. PriVaricator uses a policy randomization process [27]. For offset measurements such as offsetHeight, offsetWidth and getBoundingClientRect, instead of returning the original offset value, the proposed policies return, respectively, zero, a random number between 0 and 100, or the original offset value with ±5% noise. This approach generates plausible offset values while creating enough noise to confuse fingerprinting. Bloom cookies use Bloom filters as a privacy-preserving data structure to provide a better trade-off between privacy, personalization, and network efficiency [28].
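The PriVaricator randomization policies just described can be sketched as below. This is a simplified, page-level illustration: the real system patches the browser's property getters, whereas the wrapper function here is hypothetical.

```javascript
// Sketch of PriVaricator-style offset randomization policies (values per [27]).
const policies = {
  zero: () => 0,                                   // always return zero
  random: () => Math.floor(Math.random() * 101),   // random value in [0, 100]
  noise: original => original * (0.95 + Math.random() * 0.10), // +/-5% noise
};

// Hypothetical wrapper: reading offsetWidth through the noise policy.
function noisyOffsetWidth(element) {
  return Math.round(policies.noise(element.offsetWidth));
}
```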

1 There is a related problem called device fingerprinting that aims at actively identifying computers by sending them packets, e.g., [23]. Our problem is different: to identify devices that access a web server.

2.3 Behavior-Based Mechanisms

2.3.1 Location Tracking

Current technological advances in wireless location tracking present unprecedented opportunities for monitoring users' movements. While such technology supports useful location-based services, privacy and security concerns seriously trouble users. These privacy concerns are due to the fact that previous systems kept databases with information concerning users' actual locations. When it comes to users' devices, smartphones tend to be within reach, meaning their location is practically the same as their owner's location: to track the location of a smartphone is to track the location of the person it belongs to. As most users prefer their location not to be tracked, all mobile platforms treat the device's location as sensitive information and recognize that it needs to be protected [29]. As such, applications require user permission to access, for instance, the phone's Global Positioning System (GPS). Location-based services can be classified into three categories: Position Awareness, Sporadic Queries and Location Tracking [30].

• Position Awareness devices monitor users' positions, as with in-car navigation systems, where the information is only used internally. The service relies on the knowledge that the device has of its own location.

• Sporadic Queries apply to services where the user initiates the transfer of position information to an outside service provider (for example, to find out which is the nearest hospital). This information only contains the user's current position.

• Location Tracking services are based on other parties (mobile service providers, for example) tracking the user by receiving frequent updates of the user's position. An example is helping GPS applications reroute the driver's path.

To address users' concerns, a location-support system for in-building, mobile, location-dependent applications called Cricket [31] was designed, helping client devices learn their location and decide whether to advertise this information, and to whom. Using listeners that hear and analyse information from beacons spread all over the building, it allows applications running on mobile and static nodes to find out their location. A beacon is a small device that is attached somewhere within the geographic space it advertises, placed somewhere unobtrusive such as a ceiling or wall. These beacons are used in combination with radio frequency (RF) and ultrasound signals in a decentralized and uncoordinated architecture, and their transmission schedules are random and independent. The receiver decoding algorithm uses information from different beacons to compute a maximum likelihood estimate of the location. The Cricket project showed that many location-dependent applications, such as in-building active maps and device control, can be developed effortlessly or with simple manual configuration. On modern mobile platforms, there are applications that are able to read the aggregate power consumption of a user's mobile phone, by reading its ampere-meter over a period of a few minutes, allowing information on the user's location to be collected. This makes it easy for attackers to infer the target mobile phone's location, as long as they have some knowledge of the general area in which the user moves. In [29], a machine learning method based on previously collected power consumption information is used to try to identify which routes are taken by the user. The authors start by studying three types of user tracking goals:

1. Route distinguishability: This is a classification problem, in an attempt to understand whether the attacker can tell which out of several possible routes the user is taking. New samples are classified based on a previously collected set of power profiles associated with known routes. A power profile is a time series, which is then compared with other time series. After each comparison, a score is assigned in order to select which route is most likely the correct one.

2. Real-time motion tracking: Assuming the route the user is taking is known, the goal is to track the mobile device during movement and find out whether it is possible to localize the device along the route and track its position in real time. Reference power profiles are collected in advance for a specific route, and new power measurements are constantly received from an application installed on the device.

3. New route inference: Aims to understand whether an attacker can identify an arbitrary route taken by the user in some given area, assuming the attacker has previously measured the power profiles of every short route the user has taken in that same area. In contrast to 1), where the potential routes are supposedly known in advance, here the future potential routes are not explicitly known. However, the area in which the device moves is known, although the number of possible routes may be too large to pre-record each one. In order to overcome this problem, only the power profiles of all road segments within the area are pre-recorded, as every possible route will be a concatenation of road segments.

For the performed experiments, real power consumption data had to be collected from smartphone devices along several different routes. The authors developed an Android application called PowerSpy that collected various measurements, such as signal strength, voltage, current, GPS coordinates, temperature, state of discharge, and cell id. Several experiments were performed with the route distinguishability algorithm. One of the best results came from an experiment with 43 profiles, for 4 different routes (where different directions, such as driving from point A to B and from point B to A, count as two different routes) of 19 kilometres each. This resulted in a 93% successful classification rate. With another dataset of only 13 profiles and two different directions along the same road, the achieved classification rate was 100%. In order to evaluate the real-time motion tracking algorithm, a set of 10 training profiles plus an extra test profile was used. Real-time tracking was simulated by serially feeding samples to the algorithm as if they were being received from an actual application installed on the device. Initially, when there were few power samples, the estimation error was high, as the location estimation was extremely inaccurate. A couple of minutes in, the true location was locked. The estimation was quite precise up until 20 minutes, but as velocity increased it started to diverge, resulting in a large estimation error, which the authors argue is easy to prevent. 80% of the estimation errors were less than 1 km.

2.3.2 Site Preferences

When obtaining users' web browsing histories, it is crucial to preserve the users' anonymity, as these personal preferences and histories reveal information about a particular user. The web has always been a source of security and privacy issues, where users expect to be able to browse free of harm, valuing their anonymity above many other features. Now more than ever, characteristics such as speed, appearance and interactivity are other crucial aspects that a system is required to have. Therefore, long-term state has to be stored inside the client's browser, and hiding this information from malicious attackers is not an easy task [32]. This is why it is important to understand the consequences of web authors being able to determine which websites users have visited, as a user's web browsing preferences are an identifying characteristic, since they relate to the user's interests. These preferences are in fact believed to be individual, like a fingerprint or retinal pattern. There are several methods considered reliable and effective to determine which particular websites a user has visited. One approach is to query URLs in the browser's history using the CSS (Cascading Style Sheets) :visited history detection vector. CSS-based history detection was discussed as a potential threat to users' privacy in several analyses of the web [6, 20]. It works by allowing the attacker to determine whether a particular URL has been visited by a user's browser, by applying CSS styles and distinguishing between visited and unvisited links. For an attacker to collect history information, he must supply the client with a list of URLs to check, and infer which of these URLs exist in the client's history by examining the computed CSS values on the client side [33]. About 25% of desktop user agents, and an even higher proportion of mobile browsers, are susceptible to this technique (a sketch of the vector is given at the end of this subsection). Alternatively, a timing analysis, which allows the detection of items in a web browser's cache, is also a well-known approach.

In [6] a large-scale analysis of web browsing histories is presented and studied. The authors also investigate whether a user's browsing history, i.e., the list of websites visited by the user, constitutes a unique fingerprint that is able to identify and track the user. When testing 500 web pages, with a dataset of just under 370,000 web histories in total, a majority of users (over 69%) were detected as having a unique fingerprint; in other words, their browsing history is unique. The system architecture was divided into two phases: the first detects "primary links", such as www.google.com, and the second uses this information to query for secondary URLs associated with each detected primary link. In addition to the history results, the system also collected the user's browser version information, IP address and test date. Side information such as cookies, Flash cookies or any other persistent storage technique to track users was not used. Olejnik et al. [6] concluded that, besides most users (69%) having a unique browsing history, users for whom at least 4 visited websites were detected were successfully and uniquely identified by their browser histories in most cases (97%).
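A minimal sketch of the :visited vector follows, assuming a stylesheet that colours visited links red; note that modern browsers deliberately return the unvisited style from getComputedStyle precisely to defeat this historical technique.

```javascript
// Sketch of CSS :visited history detection (historical technique).
// Assumes a stylesheet rule such as:  a:visited { color: rgb(255, 0, 0); }
function wasVisited(url) {
  const link = document.createElement("a");
  link.href = url;
  document.body.appendChild(link);
  const color = getComputedStyle(link).color; // visited links pick up the style
  document.body.removeChild(link);
  return color === "rgb(255, 0, 0)";
}

// Example: probing a candidate list of URLs supplied by the attacker.
// ["https://example.com", "https://example.org"].filter(wasVisited);
```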

2.3.3 Keyboard and Mouse Dynamics

In order to verify an individual's identity, keystroke dynamics may be used, based on the person's typing rhythm patterns. Keystroke dynamics is the process of monitoring a user's keyboard inputs and studying the way they type, as a way of recognizing and authenticating them based on common patterns in their typing rhythm [34]. Keystroke dynamics is not an intrusive method, making it easy to apply and suitable for computer access security [35]. Generally, most biometrics are non-intrusive, simply requiring the placement of a finger, a look in the proper direction, or a statement said aloud. To design an automatic pattern recognition system, there are three steps: representation, extraction, and classification. The representation of the input data measures the pattern's or the recognizable object's characteristics. The extraction phase is often referred to as a feature extraction problem, where only a selected number of measurements from the input are considered, either because these selected features are enough to identify the user, or because adding more features would increase the computational complexity of the problem. Finally, classification and identification involve determining an optimum decision procedure [34]. The process used in [36] to verify a user's identity when logging in uses a modified login sequence. In addition to the user's login credentials (name and password), the system requires two extra strings: the user's first and last names. The authors believed that by typing familiar strings instead of a large number of unfamiliar ones, while still having to read text beforehand in order to do so, the method becomes less error prone. The verifier achieved an impostor pass rate of less than one percent, even when the impostor knew the user's login name, first and last names, and password.

According to the authors of [35], researchers have presented studies in the past showing that different individuals do in fact have typing characteristics that are remarkably personal, and they argue that those same characteristics can be successfully used for identification purposes. In [35], the authors collected profiles over a period of 7 weeks, where users' typing proficiency was not required and users ran the experiment on their own machines. Contrary to the previous experiment, the participants were asked to read and retype a series of phrases and/or simply type some sentences at that precise moment. Three classifiers were used in recognition: Euclidean Distance Measure, Non-Weighted Probability, and Weighted Probability. With the Euclidean Distance Measure, approximately 80% of the users were correctly recognized, based on samples derived from structured text. With the Non-Weighted Probability measure, approximately 85.6% of the users were correctly recognized. Performing better than both previous classifiers, the Weighted Probability measure correctly recognized approximately 90% of users. In [34] the performance results were more extensive in comparison to the previous experiment, as they were based on a database of profiles collected over a period of 11 months, for 63 users. As in the study performed in [35], users' typing proficiency was not required and users ran the experiment on their own machines. They were asked to download and execute the experiment on their local machines, and the results were then automatically sent back to the authors via email. The presented classification techniques were based on template matching and Bayesian likelihood models. The determined accuracy levels of the classifiers ranged from approximately 83% to 92%, depending on which of the models was used.

Another biometric system, suitable for both intrusion detection and access control, is mouse dynamics. Requiring no special hardware device for data collection, mouse dynamics are collected passively and verified throughout the session. Mouse dynamics are the characteristics of a specific user's actions, received from the mouse input device, while interacting with a specific graphical user interface. In order to understand the actions received from the input device, the authors of [37] consider that identifying the categories in which these actions fall is crucial. Therefore, a mouse action can be classified as one of the following:

• Mouse-Move (MM): regular mouse movement;

• Drag-and-Drop (DD): action with the mouse button pressed, movement, then mouse button released;

• Point-and-Click (PC): regular mouse movement, followed by a click or double click;

• Silence: no movement.
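The following sketch (our own illustration, not the detector of [37]) labels raw browser mouse events with these categories; the Silence threshold is an assumed value, and the direction helper anticipates the eight 45-degree sectors described next.

// Minimal mouse-action labelling; thresholds and names are illustrative.
const SILENCE_MS = 2000;       // assumption: a 2-second pause counts as Silence
let lastMove = performance.now();
let buttonDown = false;
let movedWhileDown = false;

function direction(dx, dy) {
  // one sector per 45 degrees, numbered 1 to 8 (screen y grows downwards)
  const deg = (Math.atan2(-dy, dx) * 180 / Math.PI + 360) % 360;
  return Math.floor(deg / 45) + 1;
}

document.addEventListener('mousemove', (e) => {
  const now = performance.now();
  if (now - lastMove > SILENCE_MS) console.log('Silence');
  lastMove = now;
  if (buttonDown) movedWhileDown = true;
  console.log('MM, direction', direction(e.movementX, e.movementY));
});
document.addEventListener('mousedown', () => { buttonDown = true; movedWhileDown = false; });
document.addEventListener('mouseup', () => {
  // movement with the button held is DD; a (double) click after regular movement is PC
  console.log(movedWhileDown ? 'DD' : 'PC');
  buttonDown = false;
});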

Eight directions were considered for the mouse movement, one for every 45-degree sector, numbered from 1 to 8. For example, direction 1 represents all actions performed with angles between 0 and 45 degrees, while direction 2 represents those between 45 and 90 degrees. A set of factors represents the components of what was termed the mouse dynamics signature of a specific user. This signature may be used to verify users' identities. Examples of the factors collected are: the average speed for each distance travelled; the average speed in each of the movement directions; and the average distance travelled in a specific period of time (with respect to different movement directions). A detection unit was designed before any validation experiment was conducted, using five users.

The data collection software was deployed onto these users' machines and they were asked to conduct their usual activities without restrictions. For each user, an average of 5,000 records was collected. None of these five users participated in the experiments described ahead. The parameters that proved to affect the detector's accuracy were the screen resolution, the operating system's mouse pointer speed and acceleration settings, and the mouse button configuration. Then, three series of validation experiments were conducted. The main experiment, with 22 participants, reproduced real operating conditions by allowing the participants to choose their operating environments and applications. It resulted in a total of 284 hours of raw mouse data, collected during 998 sessions, with an average of 45 sessions per user. The two remaining experiments involved only 7 participants, providing a basis for studying the confounding factors that arose from the main experiment by fixing the environment variables. The experiments resulted in a false acceptance rate (FAR) of 2.4649% and a false rejection rate (FRR) of 2.4614%. Although the results were not as good as expected, they were still better than those of other well-established biometrics such as voice and face recognition systems. In [38] the authors chose an approach that focuses on fine-grained angle-based metrics. These metrics have two advantages compared to previously studied ones: they can recognize a user accurately with very few mouse clicks, and they are relatively independent of the user's operating environment, making them suitable for online re-authentication. The system consists mainly of a recorder, responsible for gathering the mouse dynamics of a specific user, and a support vector machine (SVM) classifier, responsible for recognizing a user as either an impostor or an authenticated party. The conducted experiment gathered two datasets: one with only 30 users (under controlled circumstances) and another with over 1,000 users (on a forum website). When tested for verification accuracy and time, the system achieved an equal error rate (EER) of 1.3% with just 20 mouse clicks.

2.4 Protection Mechanisms

With the purpose of avoiding user tracking, researchers have proposed various solutions that target two of the third-party tracking steps: disabling third-party cookies to mitigate the unique identifier, and eliminating requests for private information, e.g., by blacklisting known tracking servers. TrackingFree [26] is an anti-tracking browser that, instead of disabling unique identifiers such as cookies and Flash files, isolates these identifiers into different units or browser principals, blocking third-party tracking practices. This means the identifiers still exist, except they are not unique among different websites; third-party tracking websites are therefore unable to correlate the user's requests sent from several principals with those identifiers. TrackingFree is divided into two parts: the isolation unit (web principal) and the kernel. Each principal, which runs on the kernel, has its persistent storage isolated. This prevents third-party contents of different principals from sharing the same identifiers. In addition to this isolation, content allocation is just as important: the principal manager dynamically determines how to put different frames into different principals based on the user's activities, frame properties, and principal organization. Principal communication is also important, both for browser privacy and compatibility, and is handled by the message policy enforcer (which restricts the range of explicit communication for privacy-preserving purposes) and the public history manager (which provides a secure history sharing channel). The system also gives users the flexibility to balance anti-tracking against user experience. TrackingFree's evaluation showed that all 647 trackers found on Alexa's Top 500 websites were blocked with affordable overhead.

As browser-based fingerprinting mechanisms have surfaced as an effective alternative to cookie-based tracking, privacy-enhancing technologies have started to appear to counter browser fingerprinting. According to [27], the main problem with fingerprinting is the linkability of the fingerprint, not its uniqueness: a fingerprint is only useful to a tracker if it can be linked across multiple subsequent visits to a website. The authors propose a browser fingerprinting prevention method, PriVaricator, which attempts to protect users against explicit fingerprinting, such as the capture of browser environment details. PriVaricator uses carefully randomized policies. For offset measurements such as offsetHeight, offsetWidth and getBoundingClientRect, instead of returning the original offset value, the proposed policies return, respectively, zero, a random number between 0 and 100, or the original offset value with ±5% noise. These approaches were chosen because they both generate plausible offset values and create enough noise to confuse fingerprinting efforts. The policies are controlled by a lying threshold (θ), which controls how soon PriVaricator starts lying (e.g., after how many accesses to the offsetHeight value the policy kicks in), and a lying probability (P(lie)), which specifies the probability of lying. For the policy concerning the randomization of plugins, the authors define the probability (P(plug_hide)) of hiding each individual entry in the browser's plugin list (whenever the navigator.plugins list is populated). To exemplify, with the configuration (Rand Policy = Zero, θ = 50, P(lie) = 20%, P(plug_hide) = 30%), PriVaricator starts lying after 50 offset accesses, lies in 20% of the cases, returns the value 0 when lying, and hides approximately 30% of the browser's plugins. After running their experiments, the authors found that PriVaricator was able to deceive all of the tested fingerprinters for a large fraction of the tested configuration settings. They also concluded that most fingerprinting providers derive fingerprints through approaches more complicated than simply hashing fingerprintable attributes together.
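The following sketch illustrates how the Zero policy on offsetHeight could be emulated from a page script, using the example configuration above (θ = 50, P(lie) = 20%). It is our illustration of the idea only; the actual PriVaricator is implemented inside the browser's rendering engine, not in page JavaScript.

// Emulation of a PriVaricator-style lying policy for offsetHeight.
(function () {
  const theta = 50;   // lying threshold: offset accesses before lying starts
  const pLie = 0.2;   // lying probability once the threshold is passed
  let accesses = 0;
  const native = Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'offsetHeight');
  Object.defineProperty(HTMLElement.prototype, 'offsetHeight', {
    get() {
      accesses++;
      if (accesses > theta && Math.random() < pLie) return 0; // the Zero policy
      return native.get.call(this);                           // tell the truth
    }
  });
})();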

Chapter 3

The Fingerprinting Approach

This chapter presents our approach. We start with Section 3.1, where we present our device fingerprinting scheme, as it is the basis of our related group fingerprinting scheme. Section 3.2 discusses how the problem of cross-device fingerprinting might be solved, and Section 3.3 presents our related group fingerprinting scheme. The developed testing tool and resulting datasets are presented in Section 3.4, followed by the chosen evaluation metrics in Section 3.5.

3.1 Device Fingerprinting

With the help of JavaScript we are able to extract multiple properties from users' devices. As previously mentioned in Chapter 2, the cross-browser fingerprinting method of [24] used features such as lists of installed fonts, OS version, screen resolution, etc. That work concluded that it was possible to create a unique fingerprint based on a set of these features and that, even if one of them changed, the scheme would still identify the device. This suggests that some of the features may change and, in fact, this is true for most of them. For instance, the installed fonts tend to change as more fonts may be installed, and the OS version tends to change due to system updates. Notice that we use the soft expression tend to in order to suggest that this is what is expected, although changes may not in fact occur. By developing a website with which users may interact, we run this same kind of JavaScript script, gathering a set of data entries where each entry is a fingerprint. The first time a device contacts the website, an entry with the fingerprint is stored in the server database. Our goals are to study whether device fingerprinting is in fact possible with the collected features, as well as to study the fingerprints' resilience and stability. We also intend to determine whether we can find a minimum subset of features able to perform device fingerprinting successfully. These studies are featured in Section 4.1.
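The sketch below illustrates this kind of collection script; it covers a subset of the features of Table 3.1 and is not the exact script used in our experiments.

// Collection of a subset of the fingerprinting features of Table 3.1.
function collectFingerprint() {
  return {
    plugins: Array.from(navigator.plugins).map(p => p.name),  // Fp2
    userAgent: navigator.userAgent,                           // Fp3
    cookiesEnabled: navigator.cookieEnabled,                  // Fp5
    display: [screen.colorDepth, screen.pixelDepth,
              screen.width, screen.height],                   // Fp6
    language: navigator.language,                             // Fp8
    timeZone: -new Date().getTimezoneOffset() / 60,           // Fp10
    touch: 'ontouchstart' in window,                          // Fp11
    platform: navigator.platform,                             // Fp18
    doNotTrack: navigator.doNotTrack                          // Fp19
  };
}
// the resulting entry is then sent to the server, e.g., with an HTTP POST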

3.2 Cross-Device Fingerprinting

Cross-device fingerprinting is about identifying the same user across different devices. If cross-browser fingerprinting requires features that are independent of the browser, cross-device fingerprinting requires features that are independent of the device (e.g., tablet, laptop). This is equivalent to saying that the fingerprints have to be related to who the user is and how he acts. We can envisage three options:

• Fingerprints with configurations the users apply across devices: our experience suggested that this would not work, but we evaluated experimentally whether it is possible to create fingerprints with such data (Section 4.2).

• Fingerprints with static biometric data (fingerprints from fingers, face geometry, iris, etc. [39] – Subsection 2.1.3): such biometric data is used for authentication, so it would work, but we are interested in schemes that do not require the user to authenticate and that are not blatant violations of privacy, so we excluded it.

• Fingerprints with dynamic biometric data (keystroke dynamics [35], mouse dynamics [37], etc. – Subsection 2.3.3): this would be possible, but it would require the user to interact with the application for some time so the fingerprint could be collected, making it application-dependent, not very practical, and problematic from the privacy angle, so we discarded it.

3.3 Related Group Fingerprinting

As already explained, the purpose of related group fingerprinting is to identify whether a device belongs to a related group, i.e., to a group of persons who live together, e.g., a family. In practice, we define a related group as a set of persons that share the same home network. We use the term group to designate a set of devices that share the same company or organization network. A device may belong to several groups and related groups (e.g., a company network and a home network). We assume that home networks use private IP addressing, i.e., are behind a network address translation (NAT) router. Therefore, the identifier of a related group is the public IP address of the NAT router, which is the sender IP observed by the application's web server in the IP datagrams it receives from the device. Using, for instance, the whois command, it is possible to understand whether an IP address belongs to an ISP, and thus whether the corresponding group is a candidate for being considered a related group. The identifier of a group that is not a home group is the name of the network, e.g., the content of the netname field returned by the whois command. The name cannot be an IP address, as the company or organization may have many public IP addresses. The decision of whether a group is a related group can be based on one or more criteria, for instance (a small sketch applying Criterion 1 follows the list):

1. ISP-provided IP address: a related group must have an IP address assigned by an ISP as observed in whois, as we do not envisage a related group having another kind of IP address;

2. Limited number of devices: a related group cannot contain more than Dthresh devices (e.g., 15), to exclude other networks with an IP owned by an ISP (e.g., the ISP network itself);

3. Used mostly out of business hours: accesses from the network are mostly made out of business hours (at night, during weekends), again to exclude other networks.
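To illustrate Criterion 1, the following Node.js sketch runs the whois command for a sender IP address and matches the returned netname against a list of known ISP network names. The list below is a placeholder assumption, not the one used in our analysis.

// Criterion 1: is the IP assigned by an ISP (candidate related group)?
const { execFile } = require('child_process');

const ISP_NETNAMES = ['MEO', 'NOS', 'VODAFONE', 'CABOVISAO', 'GVT'];  // placeholder

function isRelatedGroupCandidate(ip, callback) {
  execFile('whois', [ip], (err, stdout) => {
    if (err) return callback(err);
    const match = /netname:\s*(\S+)/i.exec(stdout);
    const netname = match ? match[1].toUpperCase() : '';
    callback(null, ISP_NETNAMES.some(n => netname.includes(n)));
  });
}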

The related group fingerprinting scheme is represented in Algorithm 1. The algorithm uses Criterion 1 and is executed by the server when a message with a fingerprint is received from a browser. The algorithm does not return anything; it only adds data to the server database. The main types of queries to that database are the following:

Algorithm 1: Related group fingerprinting algorithm for collecting a fingerprint (executed by the server)

Input: HTTP request M received from the client
if M is the initial request from the client then
    add fingerprinting scripts to the reply;
else if M contains a fingerprint then
    F ← get_fingerprint(M);
    G ← get_group(M);
    if F ∉ database then
        add_device(database, F, G);
    else
        add_group_to_device(database, F, G);
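A minimal Express-style sketch of Algorithm 1 follows; the helper functions and the database object are hypothetical stand-ins for the operations named in the algorithm.

// Server-side sketch of Algorithm 1 (helpers are hypothetical).
const express = require('express');
const app = express();
app.use(express.json());

app.get('/', (req, res) => {
  // initial request branch: add the fingerprinting scripts to the reply
  res.send(renderPageWithFingerprintingScripts());
});

app.post('/fingerprint', (req, res) => {
  const F = getFingerprint(req);        // F <- get_fingerprint(M)
  const G = getGroup(req);              // G <- get_group(M): sender IP or netname
  if (!database.hasDevice(F)) {
    database.addDevice(F, G);           // first time this fingerprint is seen
  } else {
    database.addGroupToDevice(F, G);    // known device seen in a (possibly new) group
  }
  res.sendStatus(204);                  // the algorithm returns nothing
});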

• Is group G a related group?

• To which related group(s) does the device with fingerprint F belong?

There are a few pathological cases that deserve some discussion:

• What happens if the IP address of a home network with users accessing the web application changes? In that case, the server will keep data about two related groups, identified by the old and the new IP addresses, although the related group is actually the same. We do not expect this to be problematic for most applications, as long as these events are not too frequent, which is the case with major ISPs.

• What happens if two different devices from unrelated persons have the same fingerprint? In that case, the database will recognize the two as a single device that belongs to a set of groups containing the groups of both devices. We do not expect this to be problematic if these collisions are rare.

3.4 Testing Tool

In order to test and analyse our fingerprinting mechanisms, we needed a way of collecting an experimental dataset from a group of users willing to contribute. We opted to create a website with which users could interact: accessing it is a very simple and quick task, and it allows us to collect all the necessary features for the fingerprint. Thus, for the experimental evaluation, two different websites were developed, each intended for a different data analysis.

3.4.1 Website and Datasets

The first website (Website 1) aimed to interact with as many users as possible, so it collected fingerprints and also asked the users to answer a few questions. This experiment generated Dataset I, for which we registered a total of 410 distinct users who willingly participated. The second website (Website 2) had a different purpose, which was to study the fingerprints' stability, and did not require users to answer any questions. Only 5 users participated in this experiment, which consisted of accessing the website for two weeks using the browsers installed on their devices. The data collected from this experiment resulted in Dataset II. Both websites ran a JavaScript script responsible for collecting the information from the devices.

The website architecture has two main parts: server side and client side. The client side runs on the user's web browser, performing multiple HTTP requests to the server. We used Bootstrap, a framework for responsive design that allows faster and easier front-end (CSS and HTML) web development. The server side is the backend of the web application, responsible for processing and storing the data provided by the user. PostgreSQL was used as the backend relational database.

Table 3.1: Fingerprinting Features Collected in the Experiments

Reference | Feature | Description
Fp2 | Plugins | Plugins installed in the browser, e.g., Java and Flash
Fp3 | UserAgent | HTTP header field that identifies the browser and provides system details
Fp4 | Browser | Browser used when accessing the website
Fp5 | Cookies Enabled | True or false according to the cookies being enabled or disabled in the browser
Fp6 | Display | Display's color depth, pixel depth, screen width, screen height
Fp7 | System Fonts | List of installed system fonts
Fp8 | Browser Language | Language preferred by the browser
Fp9 | OS | Operating system in which the browser is running
Fp10 | Time Zone | Time zone of the browser based on its geo-location data
Fp11 | Touch | True or false according to whether the device has a touch screen
Fp12 | IP Address | IP address of the device, or its public address in case it is NATed
Fp13 | Latitude | Latitude of the browser (only if the user allows the browser to provide the location)
Fp14 | Longitude | Longitude of the browser (only if the user allows the browser to provide the location)
Fp15 | HTTP Accept | HTTP header field that gives the accepted media types (for the HTTP response)
Fp16 | HTTP Accept Encode | HTTP header field that gives the accepted content encodings
Fp17 | HTTP Accept Language | HTTP header field that gives the set of preferred natural languages
Fp18 | Platform | Platform of the browser, e.g., MacIntel or Win32
Fp19 | Do Not Track | Do Not Track enabled or disabled in the browser

Table 3.1 shows the fingerprinting features obtained using the two experimental websites, with a brief description of each (Fp1 is not shown, as it designates the timestamp, which is not really a feature). Dataset I is composed of 531 fingerprints, whereas Dataset II has a total of 168 fingerprints collected from the above-mentioned 5 users. This means we had a total of 699 complete accesses to the two websites. The data was collected in January and February 2016. In order to perform the study we needed to know which fingerprints belong to the same user, so the websites asked for data that was then used to create a user ID.

The purpose was to have data that uniquely identified the user without compromising his privacy. Therefore, we asked for two reasonably harmless pieces of personal data – the user's first name and the last 3 digits of his personal phone number – and calculated a SHA-256 hash of that data in the browser [40]. This hash is the user ID and is the only thing sent to the server (unlike the name and digits, which never leave the browser), guaranteeing the confidentiality of the user's data and identity. This ID is used to compare and confirm results, identifying which results belong to the same user.
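The sketch below illustrates how such an ID can be computed in the browser with the Web Crypto API; the exact concatenation of the two values and the endpoint name are assumptions for illustration.

// Derive the user ID as SHA-256(first name + last 3 phone digits), hex-encoded.
async function computeUserId(firstName, last3Digits) {
  const data = new TextEncoder().encode(firstName + last3Digits);
  const digest = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}
// only the hash leaves the browser:
// computeUserId('Maria', '123').then(id => fetch('/id', { method: 'POST', body: id }));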

Figure 3.1: Website’s First Page

Figure 3.2: Website’s Second Page

Figure 3.1 shows the front page of both websites. It holds an introduction to the project and allows users to choose between the English and the Portuguese language. The next page, shown in Figure 3.2, gives brief instructions on how users are supposed to perform the test. It explains that the test should be repeated on the user's multiple devices and browsers, and also informs the user about the confidentiality of the collected data. Finally, Figure 3.3 shows the page where the users are asked to answer the questions stated in Table 3.3. The only difference between the two websites is that Website 2 does not have the questions from Figure 3.3. Both websites also have the following tabs: Fingerprinting, with a very brief explanation of the matter; About, where users can read more about the project's goals; and Contact.

Figure 3.3: Website's Third Page

Out of the 531 entries of Dataset I, 410 different IDs were registered; in other words, 410 users participated in this experiment. Also, from the total of 531 data entries, 506 devices were classified as unique, meaning that only 25 entries (4.71%) were repeated for a certain device (indicating a different browser). To study and analyze the retrieved user data, a testing tool was developed. This tool calculates parameters such as the Hamming distance between two data entries and the entropy values for each feature of the fingerprint. Table 3.2 presents three examples of dataset entries with the corresponding collected features. Examples 1 and 2 have the same UserID and indicate that the same device was used, accessed from two distinct browsers.

Table 3.2: Dataset Examples

Reference | Feature | Example 1 | Example 2 | Example 3
UID | UserID | 1476041306a6e... | 1476041306a6e... | 035196bea0635...
Fp1 | Timestamp | Sat Feb 06 2016 22:31:51 GMT-0200 (BRST) | Sat Feb 06 2016 22:33:40 GMT-0200 (BRST) | Sat Feb 13 2016 22:56:02 GMT+0000 (GMT Standard Time)
Fp2 | Plugins | Shockwave, Flash, Acrobat Reader | Shockwave, Flash, Acrobat Reader, Java | Shockwave, Flash, Acrobat Reader
Fp3 | UserAgent | mozilla/5.0 (macintosh; intel mac os x 10 11 1) applewebkit 537.36 (khtml, like gecko) chrome 47.0.2526.111 safari/537.36 opr 34.0.2036.50; MacIntel; pt-BR | mozilla/5.0 (macintosh; intel mac os x 10 11 1) applewebkit 601.2.7 (khtml, like gecko) version 9.0.1 safari/601.2.7; MacIntel; pt-br | mozilla/5.0 (windows nt 5.1) applewebkit 537.36 (khtml, like gecko) chrome 48.0.2564.109 safari/537.36; Win32; en-US
Fp4 | Browser | Chrome 47 | Safari 9 | Chrome 48
Fp5 | Cookies Enabled | TRUE | TRUE | TRUE
Fp6 | Display | Color Depth: 24; Pixel Depth: 24; Screen Width: 1280; Screen Height: 800; Screen Available Width: 1280; Screen Available Height: 730 | Color Depth: 24; Pixel Depth: 24; Screen Width: 1280; Screen Height: 800; Screen Available Width: 1280; Screen Available Height: 730 | Color Depth: 24; Pixel Depth: 24; Screen Width: 1280; Screen Height: 800; Screen Available Width: 1280; Screen Available Height: 800
Fp7 | System Fonts | Arial, Batang, Bauhaus 93, Bell MT, Braggadocio, Calibri, Cambria Math, Cambria, Candara, Casual, Chalkduster, Corbel, Courier New, Eurostile, Gulim, Harrington, Hiragino Sans GB, Impact, Kino MT, Lucida Console, Lucida Sans Unicode, MS Gothic, MS Mincho, MS PMincho, MS Reference Sans Serif, Matura MT Script Capitals, Menlo, Microsoft Sans Serif, Microsoft Tai Le, MingLiU, MingLiU-ExtB, MingLiU HKSCS, MingLiU HKSCS-ExtB, Mongolian Baiti, Perpetua, Plantagenet Cherokee, Playbill, Rockwell, SimHei, SimSun, SimSun-ExtB, Times New Roman, Verdana | (same list as Example 1) | Agency FB, Arial Black, Arial, Bauhaus 93, Bell MT, Bodoni MT, Bookman Old Style, Broadway, Calibri Light, Calibri, Californian FB, Cambria Math, Cambria, Candara, Castellar, Centaur, Century Gothic, Colonna MT, Comic Sans MS, Consolas, Constantia, Copperplate Gothic Light, Corbel, Courier New, Engravers MT, Eras Bold ITC, Forte, Franklin Gothic Heavy, Franklin Gothic Medium, French Script MT, Georgia, Gigi, Goudy Old Style, Haettenschweiler, Harrington, Impact, Informal Roman, Kartika, Lucida Bright, Lucida Console, Lucida Fax, Lucida Sans Unicode, MS Mincho, MS Reference Sans Serif, Magneto, Marlett, Matura MT Script Capitals, Microsoft Sans Serif, Niagara Solid, Palace Script MT, Palatino Linotype, Papyrus, Perpetua, Playbill, Rockwell, Script MT Bold, Segoe UI Light, Segoe UI Semibold, Segoe UI, Showcard Gothic, Snap ITC, Sylfaen, Symbol, Tahoma, Times New Roman, Trebuchet MS, Tw Cen MT Condensed Extra Bold, Verdana, Vladimir Script, Vrinda, Webdings, Wide Latin, Wingdings
Fp8 | Browser Language | lang=pt-BR; syslang=; userlang= | lang=pt-br; syslang=; userlang= | lang=en-US; syslang=; userlang=
Fp9 | OS | Mac OSX El Capitan; Unknown | Mac OSX El Capitan; Unknown | Windows XP; 32 bits
Fp10 | Time Zone | -2 | -2 | 0
Fp11 | Touch | FALSE | FALSE | FALSE
Fp12 | IP Address | 189.26.138.232 | 189.26.138.232 | 2.82.122.136
Fp13 | Latitude | -27.6074476 | -27.60745094921905 | 39.4048977
Fp14 | Longitude | -48.525459299999994 | -48.5255664830993 | -9.135404
Fp15 | HTTP Accept | text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 | text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Fp16 | HTTP Accept Encode | gzip, deflate, lzma | gzip, deflate | gzip, deflate
Fp17 | HTTP Accept Language | pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4 | pt-br | en-US,en;q=0.8
Fp18 | Platform | navigatorPlatform: MacIntel | navigatorPlatform: MacIntel | navigatorPlatform: Win32
Fp19 | Do Not Track | doNotTrack: unknown | doNotTrack: unknown | doNotTrack: unknown

3.4.2 User Statistics

Users who accessed the website had to answer a few questions for statistical purposes. They were asked both about personal information (e.g., age group, gender) and about device/browser interaction. Figure 3.4 shows that over 60% of the participants were between 18 and 25 years old, and Figure 3.5 shows that almost 3/4 of the participants were male. Figures 3.6, 3.7, 3.8 and 3.9 show the percentage of answers for the remaining questions, described in Table 3.3.

Table 3.3: Website Questions

Question | Answers
Q1: How many days per week (average) do you use this device? | 1-7 days
Q2: Number of hours per day that you use this device? | 0-5 hours / 5-10 hours / 10-15 hours
Q3: How do you use this device? | Professionally / Privately / Both
Q4: Is this the browser you use the most (on this device)? | Yes / No

Figure 3.4: Participants Age Group Figure 3.5: Participants Gender

Figure 3.6: Question 1 (Q1) Figure 3.7: Question 2 (Q2)

Figure 3.8: Question 3 (Q3) Figure 3.9: Question 4 (Q4)

3.5 Metrics

This section presents four metrics we use to analyze the experimental data: Hamming distance, entropy, precision, and accuracy.

3.5.1 Hamming distance

We use a generalization of the Hamming distance to measure the difference between two fingerprints, or the error in case they are two fingerprints of the same device [41]. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. For instance, between the words ABCDE and ABCFF the Hamming distance is 2. We consider a fingerprint to be a vector of feature values (19 values in our case, cf. Table 3.1). For use with fingerprints, we generalized the metric so that what is compared are not symbols but feature values. For the collected 531 data entries, we created a 531x531 matrix where each position holds the Hamming distance between two fingerprints. For example, for the four fingerprints A=[Firefox], B=[Chrome], C=[Chrome] and D=[Chrome], we would have the 4x4 matrix described in Table 3.4:

Table 3.4: Hamming Distance Between Fingerprints A, B, C and D

Hamming Distance | A Firefox | B Chrome | C Chrome | D Chrome
A Firefox | 0 | 1 | 1 | 1
B Chrome | 1 | 0 | 0 | 0
C Chrome | 1 | 0 | 0 | 0
D Chrome | 1 | 0 | 0 | 0

To calculate the number of unique devices from the Hamming distance matrix of Table 3.4, the steps are the following:

• First we calculate the total number of zeros, α = 10 in this case.

• Then, we subtract the number of zeros in the diagonal, which equals the number of fingerprints, 4. We now have α = 6.

• As the matrix is symmetric, each matching pair is counted twice, so we divide α by two, leaving us with α = 3.

• We use an auxiliary list called accessed_list. This list has n positions, one per unique device, and each position holds the number of times that device's fingerprint appears. In this case accessed_list = [1, 3], because we have two unique fingerprints: one that appears once and one that appears three times. This list only matters for fingerprints that appear more than twice. In our case, only the second position of the list matters, and we have to subtract one more unit from α, leaving us with α = 2.

• Final value of unique devices = 4 - α = 2 devices.

For our 531x531 Hamming distance matrix, the steps were the following (a small sketch implementing the procedure follows the list):

• Total number of zeros: α = 589

• Subtract diagonal zeros: α = 589 - 531 = 58

• Divide by two: α = 58 ÷ 2 = 29

• Here accessed_list has only four positions with values higher than 2, all of them equal to 3. Hence α = 29 - (3 - 2) × 4 = 29 - 4 = 25.

• Final value of unique devices = 531 - α = 506 devices.
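The sketch below implements the same computation; instead of building the full matrix, it counts a fingerprint as a duplicate when an exact copy (Hamming distance zero) appeared earlier in the dataset, which yields the same unique-device count as the α-based procedure.

// Generalized Hamming distance over feature vectors and unique-device count.
function hamming(f1, f2) {
  let d = 0;
  for (let i = 0; i < f1.length; i++) if (f1[i] !== f2[i]) d++;
  return d;
}

function countUniqueDevices(fingerprints) {
  let unique = 0;
  for (let i = 0; i < fingerprints.length; i++) {
    let duplicate = false;
    for (let j = 0; j < i; j++) {
      if (hamming(fingerprints[i], fingerprints[j]) === 0) { duplicate = true; break; }
    }
    if (!duplicate) unique++;   // first occurrence of this exact fingerprint
  }
  return unique;
}
// countUniqueDevices([['Firefox'], ['Chrome'], ['Chrome'], ['Chrome']]) === 2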

3.5.2 Entropy

Once we have collected the necessary features, it is important to understand which ones help obtain unique fingerprints. The entropy allows us to calculate how unique a certain feature is, based on the amount of information it contains [10, 24, 42]. Using this metric it is possible to understand which combination of features produces unique fingerprints and which features are the most useful to reach that uniqueness [42]. To calculate the entropy of each fingerprint feature, we use Shannon's entropy formula [43]. For a given feature f, with ρi the probability of occurrence of the i-th value and n the number of fingerprints (n = 531 for Dataset I), the entropy H, or the produced information, is given by (in bits):

H(f) = -\sum_{i=1}^{n} \rho_i \log_2 \rho_i    (3.1)

For example, if there was a feature f′ with different values for all 531 browser fingerprints of Dataset I, it would have the maximum entropy of H(f′) = log2(531) ≈ 9.05 bits. Furthermore, a fingerprint for that dataset might be composed of that single feature, as it would uniquely identify every browser and device. To evaluate the importance of features in the fingerprint, we calculated their entropy for Dataset I: the higher the entropy, the more a feature contributes to a unique fingerprint. Figure 3.10 shows the 10 features with the highest entropy values. The maximum entropy value, 8.52 bits, corresponds to the IP address, meaning that if we randomly choose a browser from the 531 browsers, it will share the same IP address with at most one browser in the other 2^8.52 = 367 browsers. As we can see in Figure 3.10, after the IP address, the User Agent, display properties and fonts have the highest entropy values. However, entropy is not the only consideration to take into account when selecting features to compose a fingerprint; e.g., the IP address is a dynamic feature, so it is not adequate.
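A minimal sketch of this computation follows: given the list of values one feature takes across all fingerprints, it returns the entropy of Equation (3.1) in bits.

// Shannon entropy of one fingerprint feature, in bits.
function entropy(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / values.length;   // rho_i: fraction of fingerprints with value i
    h -= p * Math.log2(p);
  }
  return h;
}
// a feature with 531 all-distinct values yields log2(531) ≈ 9.05 bits (the f′ case)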

Figure 3.10: Features with Top 10 Entropy Values

The authors in [44] had a much larger dataset than ours, totaling 1,523 browser fingerprints, with a maximum entropy value of 10.57 bits. Their top 5 features with the highest registered entropy values were: plugins, system fonts, JavaScript fonts, MIME types, and User Agent. There is a big difference between the entropy values of the plugins feature, for example, and the reason may be that our collected plugin lists do not include the plugin versions. On the other hand, we found that the system fonts and the User Agent features are common to both studies and result in good identifiers, as they have high entropy values. Table 3.5 presents the calculated entropy values and the number of unique values for each feature.

3.5.3 Precision and accuracy

The question we are interested in is essentially: given two fingerprints F and F′, does F′ correspond to the same entity (device, browser, or user, depending on the case) as F? We say that there is a false positive (FP) if the fingerprinting mechanism says the entity is the same (a positive P), i.e., that there is a match (the Hamming distance is zero), and this is false. We say that there is a false negative (FN) if the mechanism says the entities are different (a negative N) when in fact they are the same. In the case of related groups, the relevant condition is not being the same entity but belonging to the related group; otherwise the idea is the same. Given these definitions we are interested in two metrics [45]. The first is the precision, which measures the confidence we can have when the mechanism says it is the same entity, i.e., that there is a match. The second is the accuracy, which measures in some sense the correctness of the mechanism, in terms of the ratio between correct classifications and the total (we abuse the notation and use TP to mean the number of true positives, P the number of positives, etc.):

precision = \frac{TP}{TP + FP}    (3.2)

accuracy = \frac{TP + TN}{P + N}    (3.3)
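For reference, both metrics are trivial to compute once the outcome counts are known; the example values in the comment below are illustrative, not taken from our datasets.

// Equations (3.2) and (3.3) as code.
function precision(tp, fp) { return tp / (tp + fp); }
function accuracy(tp, tn, p, n) { return (tp + tn) / (p + n); }
// e.g., precision(97, 3) === 0.97 and accuracy(97, 895, 100, 900) === 0.992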

Table 3.5: Entropy Values

Fingerprint Reference | Entropy Value (bits) | Unique Values (number)
Fp2 | 2.66 | 28
Fp3 | 7.84 | 330
Fp4 | 3.23 | 29
Fp5 | 0.04 | 2
Fp6 | 5.54 | 143
Fp7 | 5.42 | 129
Fp8 | 2.33 | 17
Fp9 | 3.15 | 17
Fp10 | 0.41 | 8
Fp11 | 0.96 | 2
Fp12 | 8.52 | 415
Fp13 | 3.08 | 155
Fp14 | 3.08 | 154
Fp15 | 1.18 | 4
Fp16 | 0.54 | 6
Fp17 | 4.40 | 80
Fp18 | 2.32 | 9
Fp19 | 0.88 | 6

Chapter 4

Experimental Evaluation

This chapter presents in detail the experiments we performed in order to understand the problem and to assess our related group fingerprinting scheme. We analyse device fingerprinting in Section 4.1, cross-device fingerprinting in Section 4.2, and finally related group fingerprinting in Section 4.3.

4.1 Device Fingerprinting

For the device fingerprinting studies we intend to answer the following questions:

Question 1 – Device Fingerprinting: Is it possible to uniquely identify a certain device using only the features of Table 3.1?

This question is related to the challenge we mentioned of current privacy mechanisms in web browsers preventing the use of some of the most efficient features, namely browsing history [6]. Given these restrictions, we limited the fingerprinting features to those of Table 3.1. To determine whether or not it is possible to perform device fingerprinting with the static features, we developed a program that calculates both the Hamming distance and the entropy values for each fingerprint feature. This same program also determines how many unique devices exist in the chosen dataset (Dataset I) and returns a list of these devices. The program identified 506 unique devices, as there were 25 pairs of entries for which the Hamming distance was zero. There were no false positives or false negatives, which means the precision and the accuracy were both 1.

Question 2 – Fingerprinting Resilience: Does the fingerprint of a certain device tolerate changes in features?

This question is related to the issue of some features changing over time, e.g., the version of the browser. The objective is to understand whether fingerprinting still works after such changes. To test the fingerprinting resilience we used the same program from Question 1. The difference was the data sample used, as we created 3 new datasets based on Dataset I. Each of these datasets is identical to Dataset I except that we removed one of the following features, respectively: System Fonts, OS, and Plugins. For the 3 datasets there were no false negatives (FN), as the mechanism never failed to identify two fingerprints corresponding to the same device as being so. However, there were false positives (FP), as there was some confusion in identifying some of the devices. The best results were found with the exclusion of the OS feature, with 506 unique devices, which corresponds to a precision and accuracy of 1. By excluding plugins, we also obtained a very good result, with 505 unique devices and 1 false positive, which means a precision of 0.998 and an accuracy of 1. Lastly, without system fonts, 491 unique devices were detected, with 15 false positives, which gives a precision of 0.97 and an accuracy of 1. This allows us to conclude that, with our Dataset I, the device fingerprinting mechanism tolerates the change of one feature well, as even the lowest precision is quite high (0.97). With changes to more than one feature, results will necessarily be worse.

Question 3 – Fingerprint Stability: Are there any significant changes in the data over the days? Are the features more or less static?

With the purpose of studying the stability of the fingerprint, we used Dataset II and tried to determine whether it presented any changes over time. For the 16 devices present in this dataset, the only features that changed were the IP address and the geo-location (latitude and longitude), as some of the devices accessed the website from different networks and places. This is good news, as stability is needed for device fingerprinting and related group fingerprinting to work as expected.

Question 4 – Minimum Subset: What is the minimum subset of fingerprint features able to successfully identify a user in Dataset I?

Another interesting matter is to understand if we can find a suitable subset of features able to identify a user. For this we used Dataset I and the features with the highest entropy levels (Table 3.5), with the exception of the IP address, longitude and latitude (which are the most dynamic, making them bad candidates). Table 4.1 registers the number of unique devices and the precision for five different subsets of features, for a total of 531 data entries (a small sketch of this subset evaluation follows the table). Clearly, the minimum subset depends on what is considered an acceptable precision.

Table 4.1: Fingerprint Feature Subsets

Subset | Features in the Fingerprint | Unique Devices | Precision
1 | Fp3, Fp6, Fp7 | 358 | 0.72
2 | Fp3, Fp6, Fp7, Fp17 | 457 | 0.91
3 | Fp3, Fp6, Fp7, Fp17, Fp4 | 487 | 0.96
4 | Fp3, Fp6, Fp7, Fp17, Fp4, Fp9 | 487 | 0.96
5 | same as above plus Fp2 | 505 | 1.00
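The sketch below illustrates how such a table can be produced: each fingerprint of the dataset is projected onto the chosen feature subset and the distinct projections are counted. Here, datasetI is a hypothetical array of fingerprint records keyed by feature reference.

// Count unique fingerprints under a feature subset.
function uniqueWithSubset(fingerprints, featureKeys) {
  const seen = new Set();
  for (const fp of fingerprints) {
    seen.add(featureKeys.map(k => fp[k]).join('|'));  // projection onto the subset
  }
  return seen.size;
}
// e.g., uniqueWithSubset(datasetI, ['Fp3', 'Fp6', 'Fp7'])  // subset 1 of Table 4.1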

4.2 Cross-Device Fingerprinting

In relation to cross-device fingerprinting we have a single question, as the answer is negative:

Question 5 – Cross-Device Fingerprinting: Using Dataset I, is it possible to perform cross-device fingerprinting, i.e., to identify the same user behind two devices using only the features collected?

As discussed in Section 3.2, cross-device fingerprinting is not an easy task. The distinguishing features would have to concern user behaviour (which we do not want to obtain) or configurations that are uniform across devices (which we did not expect to exist, but wanted to check). Recall that all entries in the datasets have a user identifier, so we know whether two devices belong to the same user or not. With the features we were able to collect, we cannot tell whether two distinct devices in fact belong to the same user. If two devices have different operating systems, for instance, the hypothesis of these devices belonging to the same user is immediately rejected. This is the usual case, as users have more than one device but the devices are different, e.g., a laptop running Windows and a tablet running Android. Therefore, the collected features are indeed insufficient to successfully perform cross-device fingerprinting, which is a negative result.

Table 4.2: ISPs and their Number of Related Groups

ISP | No. Related Groups
MEO Mobile | 17
PT | 95
NOS | 101
Vodafone | 87
Cabovisao | 5
GVT | 2
Total | 307

4.3 Related Group Fingerprinting

This section evaluates the related group fingerprinting scheme presented in Section 3.3. For this purpose we enhanced both datasets with data about the IP owner, obtained using the whois command. In this study we considered that a group is a related group using a single one of the criteria we listed: whether or not the IP address is provided by an ISP (Criterion 1). The extended Dataset I emulates the database of the related group fingerprinting scheme.

Question 6 – Related Groups: What groups and related groups exist in Dataset I and what are their characteristics?

We start by distinguishing the related groups from the other groups by identifying the IP addresses assigned by ISPs. We did this manually, as more than 95% of the accesses were made from our country (Portugal) and it was trivial to identify the ISPs. Doing this at a global level may require using a database of ISPs. We identified 6 ISPs and 26 other networks. In the ISPs there were 307 individual IP addresses, which correspond to that same number of related groups applying only Criterion 1 (see Table 4.2), and a total of 333 groups. Notice that considering the 307 IP addresses to be related groups is a simplification that derives from the use of a single criterion to identify related groups; in reality some of these addresses may be from other networks. These addresses might be excluded using the other criteria of Section 3.3, but that would require a larger dataset. In relation to the groups that are not related groups, there are three main ones (see Table 4.3). The largest is UTL, our university (an outdated name; now ULisboa), which is not a surprise as we asked colleagues/students to access the websites; INESC-ID is our research lab.

Question 7 – Related Group Fingerprinting: Is it possible to identify the devices of a related group? What is the precision and accuracy of our related group fingerprinting scheme?

Table 4.3: Groups that Are Not Related Groups

Group | No. Devices | No. Users
UTL | 100 | 79
INESC-ID | 12 | 9
PT Prime | 12 | 12
Others | 30 | 25

Table 4.4: Website Accesses for a User of Dataset II

Date | MacOS | Android1 | Android2 | Acc./day
Jan 21st | 1 | 0 | 0 | 1
Jan 22nd | 2 | 1 | 1 | 4
Jan 23rd | 1 | 1 | 1 | 3
Jan 24th | 1 | 1 | 1 | 3
Jan 25th | 2 | 1 | 1 | 4
Jan 26th | 2 | 1 | 1 | 4
Jan 27th | 2 | 1 | 1 | 4
Jan 28th | 2 | 1 | 1 | 4
Jan 29th | 2 | 1 | 1 | 4
Jan 30th | 2 | 1 | 1 | 4
Jan 31st | 2 | 1 | 1 | 4
Feb 1st | 4 | 0 | 1 | 5
Feb 2nd | 2 | 1 | 1 | 4
Feb 3rd | 2 | 1 | 1 | 4
Feb 4th | 4 | 2 | 2 | 8
Total (devices) | 31 | 14 | 15 |

To start answering this question, we analyzed the data in Dataset II for one user – one of the authors – with several devices and daily accesses between January 21st and February 4th, 2016. Table 4.4 shows data about these accesses, which were made from three different devices. Once one of the devices made an access from the home network of the user, we put that entry and those below it in italics, to mean that the device was recognized as being part of the related group. This allows observing that after two days all three of the user's devices were recognized as being part of the related group. After that exercise, we analyzed Dataset I looking for other users of the same related group. In fact we found two other users from the same related group (who we know to be persons from the same family in this case): (1) a user that made an access with an iOS (iPad) device on Sat Feb 20 2016 18:42:15 GMT; (2) a user that made an access with an Android device on Sat Feb 20 2016 18:50:43 GMT. This analysis confirms that, as expected, it is possible to identify devices from the same related group. We completed this analysis by inspecting the extended Dataset I, where we identified all the related groups. As most devices accessed the page only once, many ended up being excluded. This way, for a total of 85 users and 101 unique devices, Table 4.5 summarizes the data for 280 identified related groups. For each related group the table shows: a number we assigned to the related group (1st column); the ISP of the related group's home network (2nd); the number of devices belonging to the related group (3rd); the number of devices from this related group that made accesses from the home network (4th); the number of devices from this related group that made accesses from other networks (5th); the number of positives (P), which is equal to the value in the 4th column (6th); and the number of false negatives (FN), which are the devices from the related group that did not make accesses from the home network and therefore could not be identified (7th). Given the data in the table, we calculated the precision of the scheme for this data, which was 1, and the accuracy,

which was 0.999. We made one simplification, which was to consider that the number of false positives (FP) is zero, thus the precision of 1; we do not have enough data to confirm this, but we see no reason to believe there were false positives (devices wrongly assigned to a related group). On the contrary, the number of false negatives (FN) used was the one in the table, which is pessimistic. In fact, as seen in Table 4.4, devices may not be added to the related group on first contact, but they will eventually be included in the group.

Table 4.5: Related Group Analysis for Dataset I

Related Group No. | ISP | No. devices of RG | No. devices accessed home net. | No. devices w/accesses other nets | P | FN
1-15 | MEO | 1 | 1 | 0 | 1 | 0
16-17 | MEO | 2 | 1 | 1 | 1 | 1
18-78 | NOS | 1 | 1 | 0 | 1 | 0
79 | NOS | 1 | 1 | 1 | 1 | 0
80-86 | NOS | 2 | 1 | 1 | 1 | 1
87 | NOS | 2 | 1 | 2 | 1 | 1
88 | NOS | 3 | 1 | 2 | 1 | 2
89 | NOS | 5 | 1 | 3 | 1 | 4
90-106 | NOS | 2 | 2 | 0 | 2 | 0
107 | NOS | 2 | 2 | 0 | 2 | 0
108 | NOS | 3 | 2 | 1 | 2 | 1
109-110 | NOS | 4 | 4 | 0 | 4 | 0
111-180 | PT | 1 | 1 | 0 | 1 | 0
181 | PT | 1 | 1 | 1 | 1 | 0
182 | PT | 1 | 1 | 0 | 1 | 0
183 | PT | 1 | 1 | 0 | 1 | 0
184 | PT | 2 | 1 | 1 | 1 | 1
185-186 | PT | 2 | 1 | 1 | 1 | 1
187 | PT | 2 | 1 | 1 | 1 | 1
188 | PT | 3 | 1 | 2 | 1 | 2
189-194 | PT | 2 | 2 | 0 | 2 | 0
195 | PT | 2 | 2 | 1 | 2 | 0
196 | PT | 2 | 2 | 0 | 2 | 0
197 | PT | 3 | 2 | 1 | 2 | 1
198-199 | PT | 3 | 3 | 0 | 3 | 0
200 | PT | 4 | 4 | 0 | 4 | 0
201-242 | Vodafone | 1 | 1 | 0 | 1 | 0
243-244 | Vodafone | 1 | 1 | 0 | 1 | 0
245-247 | Vodafone | 2 | 1 | 1 | 1 | 1
248-250 | Vodafone | 3 | 1 | 2 | 1 | 2
251-252 | Vodafone | 5 | 1 | 3 | 1 | 4
253-267 | Vodafone | 2 | 2 | 0 | 2 | 0
268 | Vodafone | 3 | 2 | 1 | 2 | 1
269-270 | Vodafone | 3 | 3 | 0 | 3 | 0
271 | Vodafone | 3 | 3 | 1 | 3 | 0
272 | Vodafone | 6 | 3 | 3 | 3 | 3
273 | Vodafone | 11 | 6 | 4 | 6 | 5
274-278 | Cabovisao | 1 | 1 | 0 | 1 | 0
279 | GVT | 1 | 1 | 0 | 1 | 0
280 | GVT | 2 | 2 | 0 | 2 | 0

Chapter 5

Conclusions

We define and explore the problem of related group fingerprinting in websites. We show that, despite recent mechanisms that prevent access to browser history (a positive privacy measure), it is still possible to do device (or cross-browser) fingerprinting, at least with our main dataset of 500+ fingerprints. Related group fingerprinting leverages device fingerprinting, and we have shown that it is possible to do it with high precision and accuracy. We also argue that cross-device fingerprinting cannot be done using similar features.

5.1 Challenges

Throughout the project there were a few challenges to overcome. The first has been mentioned before: the users' browsing history could not be collected, due to browsers' most recent privacy restrictions. We expected to use browsing history as a means for cross-device fingerprinting, so the impossibility of using this information led to a change of direction. Another factor that had an impact on the project's results was the number of users who participated in the experiment. Although the number of users, totalling 410 participants, was higher than initially expected, we believe the results would have improved if this number were higher still. The instructions and the website requested users to perform the experiment on every device they used, accessing the website once on every installed browser of every device. Unfortunately, most users only accessed the website once, in other words, on one browser only. This definitely had an impact on the results, as from 410 users we only obtained a total of 531 data entries.

5.2 Future Work

The following directions are proposed as a development of the present research:

• Repeat the performed experiments with a larger number of users and, consequently, a larger dataset. We believe that by increasing the dataset, the results may improve.

• Develop a cross-device fingerprinting technique with the help of biometric or behaviour-based solutions, with the user's consent. We concluded that cross-device fingerprinting was not possible with the chosen approach; however, it may be possible by adding user behaviour analysis into the evaluation.

• Related group fingerprinting may be further studied by trying to obtain deeper information about the related groups, for instance their locations. As mentioned before, we believe that the larger the dataset, the better the results.

Bibliography

[1] N. Nikiforakis, G. Acar, and D. Saelinger. Browse at your own risk. IEEE Spectrum, 51(8):30–35, 2014.

[2] D. Kim. Poster: Detection and prevention of web-based device fingerprinting. In Proceedings of the 35th IEEE Symposium on Security and Privacy, 2014.

[3] A. Bhargav-Spantzel, A. C. Squicciarini, S. Modi, M. Young, E. Bertino, and S. J. Elliott. Privacy preserving multi-factor authentication with biometrics. Journal of Computer Security, 15(5):529–560, 2007.

[4] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gurses,¨ F. Piessens, and B. Preneel. FPDetective: dusting the web for fingerprinters. In Proceedings of the ACM SIGSAC Conference on Computer & Communications Security, pages 1129–1140, 2013.

[5] K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham. Fingerprinting information in JavaScript implementations. In Proceedings of the Web 2.0 Security and Privacy Workshop, 2011.

[6] L. Olejnik, C. Castelluccia, and A. Janc. Why Johnny can’t browse in peace: On the uniqueness of web browsing history patterns. In Proceedings 5th Workshop on Hot Topics in Privacy Enhancing Technologies, 2012.

[7] D. M. Kristol. HTTP cookies: Standards, privacy, and politics. ACM Transactions on Internet Technology, 1(2):151–198, 2001.

[8] A. Barth. HTTP state management mechanism. IETF Request for Comments: RFC 6265, 2011.

[9] S. Fiebrandt. What are cookies? What are the differences between them (session vs. persistent)? URL http://www.cisco.com/c/en/us/support/docs/security/web-security-appliance/117925-technote-csc-00.pdf, 2014.

[10] P. Eckersley. How unique is your web browser? In Privacy Enhancing Technologies, pages 1–18. Springer, 2010.

[11] Ghostery homepage. URL https://www.ghostery.com/en/.

[12] NoScript. URL https://noscript.net/.

[13] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, and G. Vigna. Cookieless monster: Exploring the ecosystem of web-based device fingerprinting. In Proceedings of the IEEE Symposium on Security and Privacy, pages 541–555, 2013.

[14] N. K. Ratha, J. H. Connell, and R. M. Bolle. Enhancing security and privacy in biometrics-based authentication systems. IBM Systems Journal, 40(3):614–634, 2001.

[15] E. H. Spafford. Opus: Preventing weak password choices. Computers & Security, 11(3):273–278, 1992.

[16] A. K. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4–20, 2004.

[17] S. Pankanti, R. M. Bolle, and A. Jain. Biometrics: The future of identification. Computer, 33(2):46–49, 2000.

[18] G. Richards, S. Lebresne, B. Burg, and J. Vitek. An analysis of the dynamic behavior of JavaScript programs. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1–12, 2010.

[19] K. Mowery and H. Shacham. Pixel perfect: Fingerprinting canvas in HTML5. In Proceedings of the Web 2.0 Security and Privacy Workshop, 2012.

[20] J. R. Mayer. "Any person... a pamphleteer": Internet anonymity in the age of Web 2.0. Undergraduate Senior Thesis, Princeton University, 2009.

[21] M. Mulazzani, P. Reschl, M. Huber, M. Leithner, S. Schrittwieser, and E. Weippl. Fast and reliable browser identification with JavaScript engine fingerprinting. In Proceedings of the Web 2.0 Security and Privacy Workshop, volume 5, 2013.

[22] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz. The web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, pages 674–689, 2014.

[23] T. Kohno, A. Broido, and K. C. Claffy. Remote physical device fingerprinting. IEEE Transactions on Dependable and Secure Computing, 2(2):93–108, 2005.

[24] K. Boda, Á. M. Földes, G. G. Gulyás, and S. Imre. User tracking on the web via cross-browser fingerprinting. In Proceedings of the 16th Nordic Conference on Information Security Technology for Applications, pages 31–46. Springer, 2012.

[25] X. Cai, R. Nithyanand, T. Wang, R. Johnson, and I. Goldberg. A systematic approach to developing and evaluating website fingerprinting defenses. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 227–238, 2014.

[26] X. Pan, Y. Cao, and Y. Chen. I do not know what you visited last summer: Protecting users from third-party web tracking with TrackingFree browser. In Proceedings of the Network & Distributed System Security Symposium, 2015.

[27] N. Nikiforakis, W. Joosen, and B. Livshits. PriVaricator: Deceiving fingerprinters with little white lies. In Proceedings of the 24th International Conference on World Wide Web, pages 820–830, 2015.

[28] N. Mor, O. Riva, S. Nath, and J. Kubiatowicz. Bloom cookies: Web search personalization without user tracking. In Proceedings of the Network & Distributed System Security Symposium, 2015.

[29] Y. Michalevsky, G. Nakibly, A. Schulman, and D. Boneh. Powerspy: Location tracking using mobile device power analysis. arXiv preprint arXiv:1502.03182, 2015.

[30] M. Gruteser and X. Liu. Protecting privacy in continuous location-tracking applications. IEEE Security & Privacy, 2(2):28–34, 2004.

[31] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan. The cricket location-support system. In Proceedings of the 6th ACM Annual International Conference on Mobile Computing and Networking, pages 32–43, 2000.

[32] C. Jackson, A. Bortz, D. Boneh, and J. C. Mitchell. Protecting browser state from web privacy attacks. In Proceedings of the 15th International Conference on the World Wide Web, pages 737–744, 2006.

[33] A. Janc and L. Olejnik. Web browser history detection as a real-world privacy threat. In Computer Security – ESORICS 2010, pages 215–231. Springer, 2010.

[34] F. Monrose and A. D. Rubin. Keystroke dynamics as a biometric for authentication. Future Generation Computer Systems, 16(4):351–359, 2000.

[35] F. Monrose and A. Rubin. Authentication via keystroke dynamics. In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 48–56, 1997.

[36] R. Joyce and G. Gupta. Identity authentication based on keystroke latencies. Communications of the ACM, 33(2):168–176, 1990.

[37] A. A. E. Ahmed and I. Traore. A new biometric technology based on mouse dynamics. IEEE Transactions on Dependable and Secure Computing, 4(3):165–179, 2007.

[38] N. Zheng, A. Paloski, and H. Wang. An efficient user verification system via mouse movements. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 139–150, 2011.

[39] B. Miller. Vital signs of identity [biometrics]. IEEE Spectrum, 31(2):22–30, 1994.

[40] H. Gilbert and H. Handschuh. Security analysis of SHA-256 and sisters. Selected Areas in Cryptography, pages 175–193, 2003.

[41] R. W. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147–160, 1950.

[42] M. Vaites. The effectiveness of a browser fingerprint as a tool for tracking. Master’s thesis, Open University, UK, 2013.

[43] C. E. Shannon. Prediction and entropy of printed English. Bell System Technical Journal, 30(1):50–64, 1951.

[44] A. Faiz Khademi. Browser fingerprinting: Analysis, detection, and prevention at runtime. Master's thesis, Queen's University, Kingston, Canada, 2014.

[45] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
