Towards Automatic Discovery of Cybercrime Supply Chains
Rasika Bhalerao∗, Maxwell Aliapoulios∗, Ilia Shumailov†, Sadia Afroz‡, Damon McCoy∗
∗New York University, †University of Cambridge, ‡International Computer Science Institute

arXiv:1812.00381v2 [cs.CR] 4 Dec 2018

Abstract—Cybercrime forums enable modern criminal entrepreneurs to collaborate with other criminals into increasingly efficient and sophisticated criminal endeavors. Understanding the connections between different products and services can often illuminate effective interventions. However, generating this understanding of supply chains currently requires time-consuming manual effort.

In this paper, we propose a language-agnostic method to automatically extract supply chains from cybercrime forum posts and replies. Our supply chain detection algorithm can identify 36% and 58% relevant chains within major English and Russian forums, respectively, showing improvements over the baselines of 13% and 36%, respectively. Our analysis of the automatically generated supply chains demonstrates underlying connections between products and services within these forums. For example, the extracted supply chain illuminated the connection between hack-for-hire services and the selling of rare and valuable ‘OG’ accounts, which has only recently been reported. The understanding of connections between products and services exposes potentially effective intervention points.

Index Terms—Security, Cybercrime, Natural Language Processing

I. INTRODUCTION

Cybercrime-as-a-Service lowers the barrier to entry for new cybercriminals by commoditizing different parts of attacks [37]. For example, without commoditization, a spammer needs to find a way to send e-mails, acquire mailing lists, create storefront websites, contract with web hosting, register domains, manage product fulfillment, accept online payments, and provide customer service. With commoditization, the spammer can outsource different responsibilities to different criminals specialized in one specific task. Cybercriminals often rely on underground cybercrime forums to establish these trade relationships that can facilitate the exchange of illicit goods and services. These cybercrime forums thus play a crucial role in increasing efficiency and promoting innovation in the cybercrime ecosystem. However, these dependencies can also provide opportunities to undermine a spammer at more vulnerable points such as the online payment channel [25].

The supply chain of a cybercrime can illuminate the sequence of processes involved in the criminal activities. Prior work has shown that analyzing these supply chains can result in identifying weak points which could enable effective interventions [8]. There have been several studies exploring specific instances of these commoditized cybercrime offerings [4], [7], [40] and how some attacks can be more effectively undermined once their dependencies on other services are understood [18], [28], [29], [38]. However, we as a community do not have any systematic methods of identifying these supply chains that enable more sophisticated and streamlined attacks. Currently, analysts often manually investigate cybercrime forums to understand these supply chains, which is a time-consuming process [19].

In this paper, we propose, implement, and evaluate a framework to systematically identify relevant supply chains present in cybercrime forums. Our framework is composed of several components which include automated methods based on supervised Natural Language Processing (NLP) techniques and a graph-traversal algorithm. Our approach classifies the product category from a forum post, identifies the replies indicating that a user bought or sold a product, then builds an interaction graph and uses a graph traversal algorithm to discover links of related product buying and subsequent selling posts. Our approach is language-agnostic and does not require manual categorizations of products, an improvement over prior work on product detection [33].

We used our end-to-end supply chain identification pipeline to analyze two publicly available cybercrime forums and are able to identify 36% and 58% relevant links in our English language and Russian language forums, respectively. These are supply chain links where users are buying products that are likely used to facilitate subsequent product offerings (e.g., a user buying online social network (OSN) reputation boosting services to groom accounts that are then sold to scammers) or users reselling products after they are no longer useful to their original owner. This is an increase from our baselines of 13% and 36%, respectively.

The main contributions of our paper are the following:

* We develop, implement and evaluate an automated approach for discovering the cybercrime supply chain (Section IV). Our method uses language-agnostic NLP methods and graph traversal to automate the discovery of the supply chains.
* We perform an analysis of our automatically generated supply chains to provide an understanding of how some commodity cybercrime products depend on other offerings within these forums (Section VI).
* We distill our findings from the detected supply chains into several qualitative case studies (Section VII). These case studies highlight that we were able to efficiently discover supply chains exposing the connection between the purchasing of hack-for-hire services and the selling of valuable online accounts. Despite this connection being present in the forums for years, it has only recently been discovered based on manual analysis [22]. This connection suggests a potentially more effective method of mitigating the theft of valuable online accounts by avoiding account authentication and recovery methods based on mobile phone numbers.

The rest of this paper is structured in the following way. Section II discusses background and related work in this area, including past work which uses the same data. Section III outlines the data used to validate our approach. Section IV outlines our approach and contributions in classification and supply chain discovery. We evaluate our work empirically to demonstrate how our approach performs better than the baseline in Section V, and analyze the forums using our results in Section VI. Section VII outlines several real-world scenarios where the generated supply chains add value to an investigation. Section VIII identifies limitations and discusses the implication of our results. We conclude in Section IX.

II. BACKGROUND AND RELATED WORK

Cybercriminal forums provide a unique opportunity to understand how criminal markets operate. Criminals rely on these forums to establish trade relationships and facilitate exchanges of goods and information. The typical forum structure follows the subforum > thread > post > reply hierarchy. A subforum typically pertains to a particular subject: for example, marketplace or introductions. Users then can create threads within subforums where a thread is a collection of messages. Each thread will always contain a first post, which in this paper, we will use interchangeably with product post. We will also use the term reply post to refer to the posts which come after the product post in each thread. This is a key part of detecting relevant supply chains based on public information.

Several prior works studied the organization of the cybercrime forums [1], [24], [26], [30], profiling key actors [31], [...]

[...] longitudinal data from eight structured online anonymous marketplaces over six years to understand the value change and commoditization of the criminal markets as “cybercrime-as-a-service”. They concluded that commoditization in the cybercrime market is still limited. Our work complements and extends this line of research by providing a method for detecting connections between products.

Our approach goes beyond understanding the trust establishment, organization, aggregate activity, and classification performed in prior work. We use the results of our classifiers to identify semantically meaningful forum interactions and automatically discover supply chains of the products that can improve our understanding of how these markets function in practice. We analyzed the connections between products in unstructured cybercrime forums and noticed mostly business-to-business (we scope this to mean “sale to the trade” where the products being bought and sold often have no value except as a building block to enable an attack) transactions. This allows us to study the criminal-to-criminal supply chains that enable attacks. Some of these were previously studied from the direct attackers and victims’ perspectives, such as romance scams [15]. Our new understanding of the underlying supply chains can illuminate different and potentially more effective methods of undermining these threats [25].

III. FORUMS

To evaluate our approach, we chose two popular forums: Antichat (Russian) and Hack Forums (English) (Table I). We chose these forums because they are large, publicly available, and have been used to evaluate cybercrime analysis methodologies in prior studies [10], [33]. We also have access to more recent data from these two forums and other large cybercrime forums through an agreement with a security company. However, this data is private and our agreement prevents us from publicly sharing these datasets. Thus, in this paper we chose to limit our evaluation to public datasets so as to enable reproducibility of our major findings.
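
To make the pipeline outlined in the Introduction more concrete (classify product posts, identify replies that indicate buying or selling, build an interaction graph, then traverse it for buy-then-sell links), the following is a minimal sketch under stated assumptions. It is not the authors' implementation. The `Interaction` record, its field names, and both function names are hypothetical, and the classifier outputs are assumed to be already available as labeled events. The traversal shown is a plain per-user temporal join followed by depth-first path enumeration, chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One classified forum event (hypothetical schema, not from the paper)."""
    user: str        # forum user who bought or sold
    action: str      # "buy" or "sell", as an assumed classifier would label it
    category: str    # product category assigned by an assumed product classifier
    thread_id: int   # thread whose first post is the product post
    timestamp: int   # time of the post or reply, e.g. Unix seconds


def build_link_graph(interactions):
    """Link each thread where a user bought to threads where that user later sold."""
    by_user = defaultdict(list)
    for event in interactions:
        by_user[event.user].append(event)

    graph = defaultdict(set)
    for events in by_user.values():
        buys = [e for e in events if e.action == "buy"]
        sells = [e for e in events if e.action == "sell"]
        for buy in buys:
            for sell in sells:
                # Keep only sales strictly after the purchase, in a different thread.
                if sell.timestamp > buy.timestamp and sell.thread_id != buy.thread_id:
                    graph[buy.thread_id].add(sell.thread_id)
    return graph


def enumerate_chains(graph, start, max_len=3):
    """Depth-first enumeration of simple thread-id paths that begin at `start`.

    A path is reported once it cannot be extended further, either because it
    reached `max_len` threads or because no unvisited successor exists.
    """
    chains = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        extended = False
        if len(path) < max_len:
            for nxt in sorted(graph.get(path[-1], ())):
                if nxt not in path:  # avoid revisiting a thread
                    stack.append(path + [nxt])
                    extended = True
        if not extended and len(path) > 1:
            chains.append(path)
    return chains
```

In this sketch a candidate chain is a sequence of threads in which some user bought in one thread and later sold in the next. Whether such a link is actually relevant (the bought product plausibly enabling the sold one) is a separate judgment that the sketch does not attempt; the `category` field is carried along only so that such a filter could be applied afterwards.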