Darknet Data Mining: A Canadian Cyber-Crime Perspective
Edward Crowder
Institute of Technology and Advanced Learning
Sheridan College
Oakville, Canada
[email protected]

Jay Lansiquot
Institute of Technology and Advanced Learning
Sheridan College
Oakville, Canada
[email protected]

Abstract— Exploring the darknet can be a daunting task; this paper explores the application of data mining to the darknet from a Canadian cybercrime perspective. Measuring activity through marketplace analysis and vendor attribution has proven difficult in the past. This work observes different aspects of the darknet and implements methods of monitoring and collecting data in the hope of connecting contributions to darknet marketplaces to and from Canada. The significant findings include a small Canadian presence, a measurement of product categories, and the attribution of one cross-marketplace vendor through data visualization. The results were made possible through a multi-stage processing pipeline that includes data crawling, scraping, and parsing. The primary future work is to extend the pipeline to other media, such as web forums, chatrooms, and email. Applying machine learning models such as natural language processing or sentiment analysis could also prove beneficial during investigations.

Keywords— Darknet, Canada, Marketplace, Data Mining, Privacy, Threat Intelligence, Cybersecurity, Cybercrime

I. INTRODUCTION

To fully understand the threat landscape, one must first correctly identify and understand the threat model of an enterprise or country. This research project explores the darknet and its applications by means of data mining. The results include an analysis of current and past darknet marketplaces, a data model capable of supporting further machine learning for indicators of compromise (IOC) analysis, and a value analysis for identifying threats on the darknet. A sample application is presented that includes a web interface of organized threat information, visualized so that a qualified analyst can make strategic cyber decisions.

To maintain a common terminology, the darknet is a resource that cannot be accessed without The Onion Router ("Tor") [1, 3]. The Tor Project, a 501(c)(3) US nonprofit, advocates for human rights and the defense of users' privacy online through free software and open networks [16]. There are many benefits to the darknet, such as online anonymity and enhanced privacy [20]. However, in a recent survey of 25,229 general internet users by the Centre for International Governance Innovation (CIGI), conducted across North America, Latin America, Europe, the Middle East, Africa, and the Asia-Pacific region, respondents perceived these same tools as exacerbating cybercrime [19]. The project discusses the risks associated with using the darknet as both a user and a cybercrime analyst, followed by an outline of the proposed system design.

Considering both the benefits and the threats that Tor may pose is therefore worthwhile. There was an increase in Tor network usage, specifically in Canada, between 2019 and 2020 [16]. At a minimum, this project may uncover why more Canadians are turning to the Tor network, as shown in Fig. 1, and what their intended usage is.

Fig. 1. Tor relay users for Canada between 2019-01-01 and 2020-06-27 [16]

II. RELATED WORK

Web crawlers have been around since the early 1990s [10], the most notable of all web crawler projects being S. Brin and L. Page's Google [11]. As the internet grew, it segregated into multiple layers, known as the Clearnet, Deepnet, and Darknet [12]. Exploring the Darknet provides many benefits if done correctly and in a time-sensitive manner. Researchers have found great value in extracting indicators for use in private companies, government, and personal protection [2, 4, 5, 6]. As a result, many large data sets, such as the Darknet Marketplace Archive (DNM) covering 2011-2015 [7], are publicly available. However, it was decided not to use these data sets for several reasons. First, with only 6 of the 89 DNMs remaining accessible [3], data-mining active markets instead of referring to dead sites gives a better representation of the technologies used in new darknet marketplaces.

Understanding the potential ethical, moral, and physical risks surrounding the darknet is also essential. Martin et al. [1] explore the significant uncertainties regarding the ethical dimensions of cryptomarket research. Furthermore, the fact that there are so many different environments (e.g., web pages, chat rooms, e-mail), and that new ones are continually emerging, means that explicit [ethical] rules are not possible [1]. Ethical problems are demonstrated by example, through the use of known ethical principles and collaborations with others involved in the study of cryptomarkets. Martin et al. further discuss the risks and threats of assessment, geographical concerns, copyright issues, the effects on the public, and academic research considerations such as determining national jurisdiction, self-critical awareness of the potential for bias, and many more. Their research concludes with open-ended questions encouraging researchers to apply metacognition to the decisions within their projects.

Dittus et al. [4] performed a large-scale systematic data collection of the darknet in mid-2017, which claimed to cover 80% of the darknet. Their findings show that 70% of global trades are attributable to the "top five" countries: the USA, the U.K., Australia, Germany, and Holland. Their research shows Canada falls in sixth place. The research also suggests that the darknet is not revolutionizing this crime; it changes only the "last mile," and only in high-consumer countries, leaving old drug trafficking routes intact [4, 13]. Publishing on this research topic is a prime example of the ethical dilemmas Martin et al. discussed: this paper's results could negatively influence public funding surrounding the darknet drug trade. The project therefore aims to create a system where information is provided for a qualified analyst to weigh against their own experience, rather than one that overstates risks.

Nunes et al. [2] present an operational system for cyber threat intelligence gathering from various sites on the darknet. They focused on malicious indicators such as threat actor names, private sales of data, and executables, which they used to fulfill their primary intelligence requirements for emerging threat detection. The creation of a focused web crawler, as opposed to a generic web crawler, was required to collect a vast amount of data. Static processing was done after mass collection to extract indicators of interest. Specializing in cross-site connections, Nunes et al. created a connected graph depicting their indicator attributions to underground threat actor profiles.
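To make the distinction between a generic and a focused crawler concrete, the sketch below is a minimal illustration rather than Nunes et al.'s (or this project's) actual implementation: the seed URL and keyword list are placeholders, and it assumes Tor's SOCKS proxy is listening on 127.0.0.1:9050 with requests[socks] and beautifulsoup4 installed. It fetches hidden-service pages through Tor and only enqueues links whose anchor text matches marketplace-related keywords, leaving indicator extraction to a later static-processing pass.

# Minimal focused-crawler sketch: fetch .onion pages through a local Tor
# SOCKS proxy and only enqueue links that match marketplace-related keywords.
# Assumptions: Tor listens on 127.0.0.1:9050; the seed URL and keywords are
# placeholders, not real marketplace details.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h resolves .onion names inside Tor
    "https": "socks5h://127.0.0.1:9050",
}
KEYWORDS = ("vendor", "listing", "market", "product")   # focus criteria (assumed)
SEED = "http://exampleonionmarketxxxxxxx.onion/"        # hypothetical seed URL


def crawl(seed, max_pages=50):
    """Breadth-first crawl that only follows links passing the focus filter."""
    queue, seen, pages = [seed], set(), {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
        except requests.RequestException:
            continue  # unreachable hidden services are common; skip and move on
        pages[url] = resp.text
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            text = a.get_text(" ", strip=True).lower()
            if any(keyword in text for keyword in KEYWORDS):  # focused filter
                queue.append(urljoin(url, a["href"]))
    return pages


if __name__ == "__main__":
    collected = crawl(SEED)
    print(f"collected {len(collected)} pages for later static processing")

Keeping collection separate from extraction in this way mirrors the collect-first, parse-later split described above, and lets archived pages be re-parsed without re-crawling.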
Lawrence et al. [3] continue to work in this direction with their product, D-Miner. D-Miner is a darknet-focused web scraper that collects and parses out specific darknet marketplace features. By utilizing JSON, Lawrence et al. gain the benefit of indexing the data in Elasticsearch. Elasticsearch is a search engine based on the Apache Lucene library, and its power allows Lawrence et al. to use features such as full-text search and REST APIs [9]. Data visualization is made possible through Kibana, an open-source data visualization dashboard for Elasticsearch [14].
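As a rough illustration of that workflow, and not D-Miner's actual schema or code, the sketch below assumes a local, unsecured Elasticsearch instance on its default port 9200; the index name and listing fields are invented for the example. A scraped listing serialized as JSON is pushed to, and then queried from, the cluster through its REST API.

# Sketch of indexing a scraped marketplace listing into Elasticsearch via its
# REST API. Assumptions: Elasticsearch runs locally on port 9200 with security
# disabled; the index name and document fields are illustrative only.
import json
import requests

ES_URL = "http://localhost:9200"
INDEX = "darknet-listings"          # hypothetical index name

listing = {
    "marketplace": "example-market",            # placeholder values
    "vendor": "Olaf",                           # username taken from Fig. 2
    "category": "digital-goods",
    "price_btc": 0.0042,
    "scraped_at": "2020-06-27T00:00:00Z",
}

# POST to /<index>/_doc lets Elasticsearch assign the document _id.
resp = requests.post(
    f"{ES_URL}/{INDEX}/_doc",
    headers={"Content-Type": "application/json"},
    data=json.dumps(listing),
    timeout=10,
)
resp.raise_for_status()
print("indexed document:", resp.json()["_id"])

# Full-text search over the same index through the _search endpoint.
query = {"query": {"match": {"vendor": "Olaf"}}}
hits = requests.get(
    f"{ES_URL}/{INDEX}/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
    timeout=10,
).json()["hits"]["hits"]
print(f"{len(hits)} matching listing(s)")

Kibana can then be pointed at the same index for dashboards, reflecting the division of labour described above: the scraper emits JSON, Elasticsearch indexes it, and Kibana visualizes it.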
A primary issue surrounding the Nunes et al. project was the use of anti-scraping technologies deployed on the darknet. The solution proposed was Death by Captcha (DBC), a paid captcha-solving service used to automate that step [3]. Hayes et al. [5] take a similar approach in their analysis by identifying vendors. They explore the use of AppleScript and the Maltego investigation platform to generate a cross-site, connected graph of threat actors [15]. Notably, the authors outsourced the requirement of solving captchas to the analyst, and the extraction and enrichment process to Maltego's built-in investigation transforms. Combined, this made a robust framework for the manual analyst. However, it was not scalable to the extent desired, and it was therefore decided to continue with Python and custom interfacing options.

III. RISK ASSESSMENT

This project uses passive fieldwork; that is, it only observes publicly available material and does not require direct communication with, or responses from, potentially nefarious actors, except for account verification (Fig. 2). The alternative would be active fieldwork, which would include participation within the darknet communities [21].

Fig. 2. Active fieldwork exception: account verification post on a darknet marketplace using the username "Olaf"

Identity and geolocation protection is achieved by utilizing the Amazon Web Services ("AWS") public cloud as a technical security mechanism. If the crawler, scraper, or any other communication with the darknet were somehow compromised, due to misconfiguration or otherwise, the researcher's primary machines would remain safe. AWS also provided an additional hop beyond the Tor circuits, among many other development benefits. Secondly, one nuance of operating this project within Canada is the law surrounding illicit images found on the darknet; the current minimum penalty for possession of, or "accessing," child sexual exploitation material is six months of imprisonment.

cURL is a command-line utility that supports a comprehensive set of protocols and is the backbone of numerous applications. cURL, coupled with Python and Bash,