Inferring Sensitive Information from Seemingly Innocuous Smartphone Data
Total Page:16
File Type:pdf, Size:1020Kb
Inferring Sensitive Information from Seemingly Innocuous Smartphone Data Anthony Quattrone Submitted in total fulfillment of the requirements of the degree of Doctor of Philosophy Department of Computing and Information Systems The University of Melbourne, Australia December 2016 Copyright c 2016 Anthony Quattrone All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author except as permitted by law. Abstract Smartphones have become ubiquitous and provide considerable benefits for users. Personal data considered both sensitive and non-sensitive is commonly stored on smartphones. It has been established that the use of smartphones can lead to users enjoying less privacy when a data leak occurs. In this thesis, we aim to determine if smartphone stored non-sensitive data can be analyzed to derive sensitive information. We also aim to develop methods for protecting smartphone users from privacy attacks. While privacy research is an active area, more work is needed to determine the types of inferences that can be made about a person and how accurate a profile is derivable from smartphone data. We demonstrate how straightforward it is for third party app developers to embed code in inconspicuous apps to capture and mine data. Our studies show that there are a large number of apps found in popular app stores commonly requesting special app permissions in order to gain access to sensitive data. In most cases, we could not find any functional benefits in exchange for accessing the sensitive data. Additional data not local to the device but rather stored on a social networking service can also be extracted and unnoticed via mobile apps provided the user has social networking apps installed. Current research shows that users do not easily comprehend the implications of granting apps access to sensitive permissions. Apps can transfer captured data to cloud services easily via high-speed wireless networks. This is difficult for users to detect since mobile platforms do not provide alerts for when this occurs. With access to sensitive mobile data, we developed and performed a number of case studies to learn details about an individual. Modern smartphones are capable of sending continuous location updates to services providing a near real-time proxy of where a user is located. We were able to determine where a user was traveling simply by analyzing the point of interest results from continuous location queries sent to a location-based service without referencing user location data. With continuous location data, we were able to determine ii the personal encounters between individuals and relationships in realtime. We also found that even diagnostic data commonly used to debug apps that appears to be anonymous is useful in identifying an individual. Mobile devices disclose the indoor location information of the user indirectly via the wireless signals emitted by the device. We were able to locate users indoors by analyzing Bluetooth beacons within a 1m accuracy by using a localization scheme we developed. With the understanding that mobile data is sometimes needed by apps to provide functionality, we developed a system called PrivacyPalisade designed to protect users from revealing sensitive information. The system detects when apps are requesting uncharacteristic app permissions based on the app’s category. Overall we found that mobile smartphones store seemingly non-sensitive data that can reveal sensitive information upon further inspection and that this information is easily ac- cessible. In turn, an analyst can use this data to build a detailed individual profile of private and sensitive information. With the growing number of users expressing privacy concerns, techniques to better protect privacy are needed to allow manufacturers to meet their users’ privacy requirements. Our proposed protection methods demonstrated in PrivacyPalisade can be adopted to make smartphone platforms more privacy aware. Thesis Contributions: In this thesis, we show how sensitive information can be in- ferred from seemingly innocuous data and propose a protection system by performing the following: • Provide a comprehensive literature review of the current state of privacy research and how it relates to the use of smartphones. • Demonstrate an inference attack on trajectory privacy by reconstructing a route using only query results. • Develop an algorithm that combines range-based and range-free localization schemes to perform indoor localization using Bluetooth with accuracies of up to 1m. • Analyze diagnostic data commonly sent by smartphones and used it to identify users in a dataset with accuracies of up to 97%. • Develop an algorithm to infer potential encounters of smartphone users in realtime by proposing the use of a constraint nearest neighbor (c-NN) spatial query. • Develop and demonstrate PrivacyPalisade, a system developed for Android with the aim of protecting against privacy attacks. iii Declaration This is to certify that 1. the thesis comprises only my original work towards the PhD except where indicated in the preface, 2. due acknowledgment has been made in the text to all other material used, 3. the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliogra- phies, and appendices. Anthony Quattrone, December 2016 iv Preface The research conducted throughout the completion of this thesis has led to publications contributing to the field. I am the primary author of these papers: • Chapter 3 is based on the work Tell Me What You Want and I Will Tell Others Where You Have Been, Anthony Quattrone, Elham Naghizade, Lars Kulik, Egemen Tanin - CIKM ’14 Proceedings of the 23rd ACM International Conference on Con- ference on Information and Knowledge Management. Please note that a preliminary study on the use of Voronoi diagrams was completed in an earlier honors research project. • Chapter 4 is based on the work Combining Range-Based and Range-Free Meth- ods: A Unified Approach for Localization, Anthony Quattrone, Lars Kulik, Egemen Tanin - SIGSPATIAL’15 Proceedings of the 23rd SIGSPATIAL International Confer- ence on Advances in Geographic Information Systems. • Chapter 5 is based on the work Mining City-Wide Encounters in Real-Time, An- thony Quattrone, Lars Kulik, Egemen Tanin - SIGSPATIAL’16 - Proceedings of the 24rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. • Chapter 6 is based on the work Is This You?: Identifying a Mobile User Using Only Diagnostic Features, Anthony Quattrone, Tanusri Bhattacharya, Lars Kulik, Egemen Tanin, James Bailey - MUM ’14 Proceedings of the 13th International Con- ference on Mobile and Ubiquitous Multimedia. • Chapter 7 is based on the work PrivacyPalisade: Evaluating App Permissions and Building Privacy into Smartphones, Anthony Quattrone, Lars Kulik, Egemen v Tanin, Kotagiri Ramamohanarao, Tao Gu - ICICS ’15 Proceedings of the 10th Inter- national Conference on Information, Communications and Signal Processing. In addition, I contributed to the following additional publications as a co-author while com- pleting my PhD: • Protecting Privacy for Group Nearest Neighbor Queries with Crowdsourced Data and Computing, Tanzima Hashem, Mohammed Eunus Ali, Lars Kulik, Ege- men Tanin, Anthony Quattrone - UbiComp ’13 Proceedings of the 2013 ACM Inter- national Joint Conference on Pervasive and Ubiquitous Computing. • On the Effectiveness of Removing Location Information from Trajectory Data for Preserving Location Privacy, Amina Hossain, Anthony Quattrone, Egemen Tanin, Lars Kulik - IWCTS ’16 Proceedings of the 9th ACM SIGSPATIAL International Workshop on Computational Transportation Science. vi Dedicated to my family Acknowledgements Completing this dissertation has been a tremendous undertaking that has helped me de- velop both intellectually and personally. Firstly, I would like to thank my supervisors Prof. Lars Kulik and Assoc. Prof Egemen Tanin who made this research possible and provided tremendous support and guidance throughout this journey. My sincerest gratitude and thanks goes out to my principal supervisor Lars for encour- aging and convincing me to pursue graduate research when I was at a major crossroads. The continued support and advice provided over a number of years has been invaluable. I feel I have greatly benefited from Lars’s mentorship in many areas. Particularly, in how to approach, define and find elegant solutions to complex problems in which I was always con- tinuously amazed by his ability. I am thankful to Lars for all the help and support provided that has helped me develop as a researcher. A special thanks to my co-supervisor Egemen Tanin for his strong encouragement and continued support. Particularly in ensuring this research stayed on track and progressed smoothly. I am grateful to Egemen for his continued advice and feedback that greatly helped me consistently ensure the work remained at a high standard. Egemen helped me in pushing the envelope further, ensuring experimental results hold up to scrutiny and greatly assisting me in finding solutions to complex implementation issues. I am thankful to Egemen for all the time and support offered. I am thankful to my committee chair member Prof. Rajkumar Buyya who provided valu- able advice and encouragement at key milestone reviews. I much appreciate the perspective Raj offered in how to ensure publications stay relevant and the great feedback he provided at my seminars including at