University of Bradford Ethesis
Total Page:16
File Type:pdf, Size:1020Kb
University of Bradford eThesis This thesis is hosted in Bradford Scholars – The University of Bradford Open Access repository. Visit the repository for full metadata or to contact the repository team © University of Bradford. This work is licenced for reuse under a Creative Commons Licence. SOCIAL DATA MINING FOR CRIME INTELLIGENCE H. ISAH PHD 2017 Social Data Mining for Crime Intelligence: Contributions to Data Quality Assessment and Prediction Methods Haruna ISAH Submitted for the Degree of Doctor of Philosophy Department of Computer Science University of Bradford 2017 Abstract Haruna Isah Social Data Mining for Crime Intelligence: Contributions to Social Data Quality Assessment and Prediction Methods Keywords: Social networks analysis, data mining, social network data quality, digital crime intelligence With the advancement of the Internet and related technologies, many traditional crimes have made the leap to digital environments. The successes of data mining in a wide variety of disciplines have given birth to crime analysis. Traditional crime analysis is mainly focused on understanding crime patterns, however, it is unsuitable for identifying and monitoring emerging crimes. The true nature of crime remains buried in unstructured content that represents the hidden story behind the data. User feedback leaves valuable traces that can be utilised to measure the quality of various aspects of products or services and can also be used to detect, infer, or predict crimes. Like any application of data mining, the data must be of a high quality standard in order to avoid erroneous conclusions. This thesis presents a methodology and practical experiments towards discovering whether (i) user feedback can be harnessed and processed for crime intelligence, (ii) criminal associations, structures, and roles can be inferred among entities involved in a crime, and (iii) methods and standards can be developed for measuring, predicting, and comparing the quality level of social data instances and samples. It contributes to the theory, design and development of a novel framework for crime intelligence and algorithm for the estimation of social data quality by innovatively adapting the methods of monitoring water contaminants. Several experiments were conducted and the results obtained revealed the significance of this study in mining social data for crime intelligence and in developing social data quality filters and decision support systems. i Declaration The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others. Haruna Isah ii Acknowledgements To God be all the glory. I couldn’t have achieved such a great feat without the support of many people. First and foremost, I am forever indebted to my wife, my two kids, my parents, family and friends. They are my root and have made countless sacrifices to provide the optimal starting conditions and motivations for me to flourish. I wouldn’t be in a position of enrolling and completing a PhD thesis without their endless support and love. They have been a welcome distraction. I wish to express my unreserved heartfelt gratitude to my supervisory team which includes my principal supervisor Professor Daniel Neagu and my associate supervisor Dr Paul Trundle for their valuable support before my enrolment into the PhD programme, during funding applications and throughout the entire PhD journey. I couldn’t have endured in completing this thesis without their support and understanding. Special thanks to the panel of examiners (Dr Ian Knopke, Dr Dhaval Thakker, and Professor Crina Oltean-Dumbrava) for their comments and feedback which has greatly enhanced the quality of this thesis. Special thanks to the Commonwealth Scholarship Commission for providing me with the required studentship and grants. Am also indebted to the Federal University of Technology (FUT) Minna for supporting me throughout my studies. Same goes to the staff of the School of Electrical Engineering and Computer Science, other staff of the University of Bradford (especially Sue Baker, the International Student Adviser), Churches Together in Britain and Ireland (CBTI), WAW 2015 School organisers, and many other individuals and organisations who assisted me in achieving my dreams. Finally, I would like to extend my thanks to the entire Tangale and my Church communities (especially my Pastors, Dr and Mrs Akpo Onduku, Dr and Mrs Amos Fatokun, Col and Mrs Solomon Inuwa), researchers, my guardians (Prof. Nebath N. Tanglang and Prof. Solomon L. Lamai), all my guarantors, and organisations that have provided me with different kind of grants, feedback on my research, publication reviews, and datasets used in my experiments. iii Table of Contents Abstract ................................................................................................................ i Acknowledgements ............................................................................................ iii Table of Contents ............................................................................................... iv List of Figures ................................................................................................... vii List of Tables...................................................................................................... ix Glossary .............................................................................................................. x Chapter One: Introduction ................................................................................... 1 1.1. Research Motivation ................................................................................. 8 1.2. Thesis Statement ................................................................................... 10 1.3. Scope ..................................................................................................... 11 1.4. Research Questions ............................................................................... 11 1.5. Methodology ........................................................................................... 12 1.6. Contributions and Publications ............................................................... 13 1.7. Outline .................................................................................................... 15 Chapter Two: Social Data Content Mining for Crime Intelligence ..................... 17 2.1. Introduction ............................................................................................. 17 2.2. Literature Review ................................................................................... 19 2.3. Social Data Definition and Representation ............................................. 22 2.3.1. Representation of Rating Data ......................................................... 23 2.3.2. Representation of Text Data ............................................................ 26 2.4. Rating Aggregation ................................................................................. 32 2.5. Topic Mining ........................................................................................... 37 2.6. Sentiment Analysis ................................................................................. 41 2.7. Product Safety Framework ..................................................................... 43 2.8. Experimental Work ................................................................................. 49 2.8.1. Datasets ........................................................................................... 49 2.8.2. Exploration and Aggregation of Rating Data .................................... 51 2.8.3. Textual Feedback Processing .......................................................... 55 2.8.4. Mining Topics from User Feedback ................................................. 62 iv 2.8.5. Mining Sentiments from User Feedback .......................................... 66 2.9. Issues in Social Data Content Mining ..................................................... 72 2.10. Conclusions .......................................................................................... 74 Chapter Three: Social Network Mining for Crime Intelligence ........................... 76 3.1. Introduction ............................................................................................. 76 3.2. Literature Review ................................................................................... 78 3.3. Network Analysis Fundamentals ............................................................ 83 3.3.1. Network Representation .................................................................. 84 3.3.2. Network Structures and Properties .................................................. 87 3.3.3. Network Communities ...................................................................... 93 3.4. Inferring Hidden Ties in Crime Data ....................................................... 98 3.4.1. Network of Items Co-occurrences .................................................... 98 3.4.2. Bipartite Network Projection ........................................................... 100 3.5. Experimental Work ............................................................................... 102 3.5.1. Datasets ......................................................................................... 102 3.5.2. Rogue Manufacturer-Manufacturer Network .................................. 102 3.5.3. Darknet Vendor-Vendor Network ..................................................