Automatic Detection of Cybercrime-Suspected Threads in Online Underground Forums
Graduate Theses, Dissertations, and Problem Reports, 2018

Jian Liu

Recommended Citation: Liu, Jian, "Automatic Detection of Cybercrime-suspected Threads in Online Underground Forums" (2018). Graduate Theses, Dissertations, and Problem Reports. 7204. https://researchrepository.wvu.edu/etd/7204

Thesis submitted to the Benjamin M. Statler College of Engineering and Mineral Resources at West Virginia University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Yanfang Ye, Ph.D., Committee Chairperson
Xin Li, Ph.D.
Elaine M. Eschen, Ph.D.
Lane Department of Computer Science and Electrical Engineering
Morgantown, West Virginia
2018

Keywords: underground forums, cybercrime, Heterogeneous Information Network

Copyright 2018 Jian Liu

Abstract

Recently, online underground markets in the form of underground forums have enabled cybercriminals to exchange knowledge and trade in illicit products or services, and these forums have come to play a central role in the cybercriminal ecosystem. To facilitate the deployment of effective countermeasures that disrupt these illicit activities, it is important to analyze underground forums and gain deep insight into the ecosystem. Unfortunately, given the number of forums, their size, and the expertise required, manual exploration of their behavioral processes is infeasible.

To address these challenges, this thesis proposes a novel framework that automates the analysis of underground forums for the detection of cybercrime-suspected threads. To determine whether a given thread is cybercrime-suspected, we not only analyze the content of the thread but also utilize the relations among threads, users, replies, and topics. To model these rich semantic relationships (i.e., thread-user, thread-reply, thread-topic, reply-user, and reply-topic relations), we introduce a structured Heterogeneous Information Network (HIN) for representation, which can comprise different types of entities and relations. To capture complex relationships (e.g., two threads are relevant if they were posted by the same user and discussed the same topic), we use a meta-structure-based approach to characterize the semantic relatedness over threads.
As different meta-structures depict the relatedness over threads from different views, we then build a classifier that uses Laplacian scores to aggregate the similarities formulated by the different meta-structures and make predictions. Comprehensive experiments on data collected from online underground forums validate the effectiveness of the developed system in cybercrime-suspected thread detection through comparisons with alternative approaches.

To my family

Acknowledgments

I would like to express my deepest appreciation to my committee chair, Dr. Yanfang Ye, for the patient guidance, encouragement, and advice she has provided throughout my time as her student. I have been extremely lucky to have a supervisor who cared so much about my work and who responded to my questions and queries so promptly. I would like to thank my committee members, Dr. Li and Dr. Eschen, for investing their time. I feel proud and honored that you agreed to serve on my committee. I also want to thank my group mates Yiming Zhang and Yujie Fan. Without their help, this thesis could not have been accomplished.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Background and Motivation
1.2 Research Objective
1.3 Major Contributions
1.4 Organization of the Thesis
Chapter 2. Related Work
Chapter 3. Proposed Method
3.1 Feature Extraction
3.2 Heterogeneous Information Network Construction
3.3 Meta-structure Based Relatedness
3.4 Classification Model Construction
Chapter 4. System Architecture
Chapter 5. Experimental Results and Analysis
5.1 Data Collection and Annotation
5.2 Evaluation of the Proposed Method
5.3 Comparisons with Alternative Methods
5.4 Case Studies
Chapter 6. Conclusion and Future Work
References

List of Figures

Figure 1: Facebook hacking tool advertised in Hack Forums
Figure 2: Example of relatedness over threads in underground forum
Figure 3: Network schema for HIN in our application
Figure 4: Meta-paths (left) and meta-structures (right) on HIN
Figure 5: System architecture
Figure 6: Effectiveness of Laplacian score
Figure 7: Parameter sensitivity evaluation
Figure 8: Scalability evaluation
Figure 9: Stability evaluation
Figure 10: Examples of cybercrime-suspected vs. benign threads

List of Tables

TABLE I: The description of each matrix and its element
TABLE II: Performance indices for different methods
TABLE III: Evaluation of the proposed method
TABLE IV: Comparisons with other alternative methods
TABLE V: Distribution of detected cybercrime-suspected threads
TABLE VI: Different topics discussed in the detected threads

Chapter 1. Introduction

1.1 Background and Motivation

As the Internet becomes increasingly ubiquitous, computing devices connected to it have permeated all facets of people's daily lives (e.g., online shopping, social networking). Driven by innovation centered on the Internet ecosystem, e-commerce has grown significantly despite the overall sluggish economy [1]: worldwide e-commerce sales reached over $2.14 trillion in 2017 and are expected to increase to about $4 trillion in 2020 [2]. Though the Internet has become one of the most important drivers of the worldwide economy, it also provides an open and shared platform that dissolves barriers so that everyone has the opportunity to realize his/her innovations, which implies higher prospects for illicit profits at lower degrees of risk. That is, the Internet provides a natural platform for illegal Internet-based activities, commonly known as cybercrimes [3] (e.g., hacking, online scams, and credit card and identity fraud).
Cybercrime has become increasingly dependent on online underground markets, especially underground forums, through which cybercriminals can not only acquire the tools, methods, and ideas to commit cybercrimes, but also trade illicit products (e.g., stolen credit cards), resources (e.g., website traffic), and services (e.g., Distributed Denial of Service (DDoS) attacks). For example, Fig. 1 shows a Facebook hacking tool advertised in an underground forum (i.e., Hack Forums). The emerging underground forums have enabled cybercriminals to realize considerable profits. For example, the estimated annual revenue for