Key Player Identification in Underground Forums Over
Total Page:16
File Type:pdf, Size:1020Kb
Session: Long - Graph Nerual Network II CIKM ’19, November 3–7, 2019, Beijing, China Key Player Identification in Underground Forums over Atributed Heterogeneous Information Network Embedding Framework Yiming Zhang, Yujie Fan, Liang Zhao Chuan Shi Yanfang Ye∗ Department of IST School of CS Department of CDS, Case Western George Mason University Beijing University of Posts and Reserve University, OH, USA VA, USA Telecommunications, Beijing, China ABSTRACT ACM Reference Format: Online underground forums have been widely used by cybercrimi- Yiming Zhang, Yujie Fan, Yanfang Ye, Liang Zhao, and Chuan Shi. 2019. nals to exchange knowledge and trade in illicit products or services, Key Player Identifcation in Underground Forums over Attributed Hetero- geneous Information Network Embedding Framework. In which have played a central role in the cybercriminal ecosystem. In The 28th ACM order to combat the evolving cybercrimes, in this paper, we propose International Conference on Information and Knowledge Management (CIKM ’19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages. and develop an intelligent system named iDetective to automate https://doi.org/10.1145/3357384.3357876 the analysis of underground forums for the identifcation of key players (i.e., users who play the vital role in the value chain). In 1 INTRODUCTION iDetective, we frst introduce an attributed heterogeneous informa- tion network (AHIN) for user representation and use a meta-path As the Internet has become one of the most important drivers in the based approach to incorporate higher-level semantics to build up global economy (e.g., worldwide e-commerce sales reached over relatedness over users in underground forums; then we propose $2.3 trillion dollars in 2017 and its revenues are projected to grow to $4.88 trillion dollars in 2021 [31]), it also provides an open and Player2Vec to efciently learn node (i.e., user) representations in shared platform by dissolving the barriers so that everyone has AHIN for key player identifcation. In Player2Vec, we frst map the constructed AHIN to a multi-view network which consists of opportunity to realize his/her innovations, which implies higher multiple single-view attributed graphs encoding the relatedness prospects for illicit profts at lower degrees of risk. That is, the over users depicted by diferent designed meta-paths; then we em- Internet can virtually provide a natural and excellent platform for ploy graph convolutional network (GCN) to learn embeddings of illegal Internet-based activities, commonly known as cybercrimes each single-view attributed graph; later, an attention mechanism is (e.g., hacking, online scam, credit card fraud). Cybercrimes have designed to fuse diferent embeddings learned based on diferent become increasingly dependent on the online underground forums, single-view attributed graphs for fnal representations. Comprehen- through which cybercriminals can exchange knowledge (including sive experiments on the data collections from diferent underground ideas, methods and tools) and trade in illicit products (e.g., malware, forums (i.e., Hack Forums, Nulled) are conducted to validate the ef- stolen credit cards) or services (e.g., hacking services, bogus Ama- zon reviews). The emerging underground forums, such as Hack fectiveness of iDetective in key player identifcation by comparisons with alternative approaches. Forums [14], Nulled [28], and Black Hat World [4], have enabled cybercriminals to realize considerable profts. For example, the estimated annual revenue for an individual credit card steal orga- CCS CONCEPTS nization was $300 millions [26]; it’s also revealed that a group of • Security and privacy → Web application security; • Net- cybercriminals could proft $864 millions per year by renting out works → Online social networks; • Computing methodologies the DDoS attacks [20]. → Machine learning algorithms. Underground forums, which provide the platforms for cybercrim- inals to exchange knowledge and trade in illicit products or services KEYWORDS that facilitate all stages of cybercrimes, have played a central role in Underground forums; key player identifcation; attributed hetero- the cybercriminal ecosystem. As one of the most prevalent under- geneous information network; network embedding ground forums, Hack Forums consists of 640,820 registered users with 3,621,989 posted threads containing 790,286 illicit products or ∗Corresponding author: [email protected] services. We here use Hack Forums as a showcase to investigate the proft model and monetization process. As shown in Figure Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed 1, the participants in the value chain can be categorized into dif- for proft or commercial advantage and that copies bear this notice and the full citation ferent groups according to the roles they play: (1) Key players: on the frst page. Copyrights for components of this work owned by others than ACM This group of users play a vital role in the value chain, as they are must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specifc permission and/or a the “decathlon” who are capable of exploiting and disseminating fee. Request permissions from [email protected]. vulnerabilities, developing and testing malicious tools, selling and CIKM ’19, November 3–7, 2019, Beijing, China monetizing illicit products or services. For example, as shown in © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6976-3/19/11...$15.00 Figure 1, we found that a Hack Forums user “Ban***s” (we here https://doi.org/10.1145/3357384.3357876 anonymize his user name) frst ○1 analyzed the market demand 549 Session: Long - Graph Nerual Network II CIKM ’19, November 3–7, 2019, Beijing, China for social media account hacking services, then ○2 - ○3 devised a to combat the evolving cybercrimes. Based on the large-scale method to develop cracking tools exploiting a cookie vulnerabil- data collected from diferent underground forums (i.e., Hack Fo- ity to perform brute-force attacks on social media accounts, later rums, Nulled) and the pre-labeled ground-truth, comprehensive ○4 purchased some compromised Facebook accounts to test his experimental studies are performed to validate the efectiveness developed tools, and thus ○5 - ○7 sold and monetized his hacking of iDetective in key player identifcation by comparisons with services. We also found that “Ban***s” shared his developed crack- baselines. Although we use underground forums as a showcase, ing tools with other Hack Forums users for free, who could further the proposed method and the developed system can be easily use it for profts. The skilled individuals like “Ban***s” are heavily expanded to other social media platforms. relied upon by peers within the communities (e.g., novice hackers or newcomers can gain knowledge from them to cultivate their own specialties). (2) Non-key players: Other users including vendors, buyers, victims, technical enthusiasts, administrators and modera- tors are categorized as non-key player. Considering the vital role of key players play in the value chain (i.e., their vulnerability ex- ploiting capability, technical skills, cashing channel, and infuence on others), it is important to identify key players in underground forums to facilitate the deployment of efective countermeasures to disrupt the illicit activities. Toward this goal, human analysts need to continually spend a multitude of time to keep the latest statuses and variances of the activities in underground forums under obser- vation. This calls for novel tools and methodologies to automate the identifcation of key players to enable the law enforcement and security practitioners to devise efective interventions. To address the above challenges, in this paper, we design and de- velop an intelligent system called iDetective to automate the analysis of underground forums for key player identifcation. In iDetective, we frst introduce an attributed heterogeneous information network (AHIN) [32] to represent the rich relationship among users, threads, replies, comments and sections, and then use a meta-path based approach [33] to incorporate higher-level semantics to build up relatedness over users in underground forums. Then, we propose Player2Vec to efciently learn node (i.e., underground forum user) representations in AHIN for key player identifcation. In Player2Vec, (1) we frst map the constructed AHIN to a multi-view network which consists of K single-view attributed graphs encoding the re- latedness over users depicted by K designed meta-paths; and then (2) we employ graph convolutional network (GCN) to learn embed- dings of each single-view attributed graph; later, (3) an attention Figure 1: Participants in the value chain in Hack Forums. mechanism is designed to fuse diferent embeddings learned based The rest of the paper is organized as follows. Section 2 presents on diferent single-view attributed graphs for fnal representations. the system overview of iDetective. Section 3 introduces our proposed The major contributions of our work are summarized as follows: method in detail. Based on the large-scale data collections from diferent underground forums and the pre-labeled ground-truth, • We propose a novel yet natural feature representation to depict Section 4 systematically evaluates the performance