U. C. Berkeley Dissertation

Total Page:16

File Type:pdf, Size:1020Kb

U. C. Berkeley Dissertation The Dark Net: De-Anonymization, Classification and Analysis by Rebecca Sorla Portnoff A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Wagner, Chair Professor Vern Paxson Professor David Bamman Spring 2018 The Dark Net: De-Anonymization, Classification and Analysis Copyright c 2018 by Rebecca Sorla Portnoff 1 Abstract The Dark Net: De-Anonymization, Classification and Analysis by Rebecca Sorla Portnoff Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Wagner, Chair The Internet facilitates interactions among human beings all over the world, with greater scope and ease than we could have ever imagined. However, it does this for both well-intentioned and malicious actors alike. This dissertation focuses on these malicious persons and the spaces online that they inhabit and use for profit and plea- sure. Specifically, we focus on three main domains of criminal activity on the clear web and the Dark Net: classified ads advertising trafficked humans for sexual services, cy- ber black-market forums, and Tor onion sites hosting forums dedicated to child sexual abuse material (CSAM). In the first domain, we develop tools and techniques that can be used separately and in conjunction to group Backpage sex ads by their true author (and not the claimed author in the ad). Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or an independent sex worker is a very difficult task. Very little concrete ground truth (i.e., ads definitively known to be posted by a trafficker) exists in this space. In the first chapter of this dissertation, we develop a machine learning classifier that uses stylometry to distinguish between ads posted by the same vs. different authors with 90% TPR and 1% FPR. We also design a linking technique that takes advantage of leakages from the Bitcoin mempool, blockchain and sex ad site, to link a subset of sex ads to Bitcoin public wallets and transactions. Finally, we demonstrate via a 4-week proof of concept using Backpage as the sex ad site, how an analyst can use these automated approaches to potentially find human traffickers. In the second domain, we develop machine learning tools to classify and extract information from cyber black-market forums. Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and crimi- nal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to under- stand the markets makes manual exploration of these forums unscalable. In the second 2 chapter of this dissertation, we propose an automated, top-down approach for analyz- ing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices. In the third domain, we develop a set of features for a principal component analysis (PCA) based anomaly detection system to extract producers (those actively abusing children) from the full set of users on Tor CSAM forums. These forums are visited by tens of thousands of pedophiles daily. The sheer quantity of users and posts make manual exploration and analysis unscalable. In the final chapter of this dissertation, we demonstrate how to extract producers from unlabeled, public forum data. We use four distinct forums to assess our tools; these forums remain unnamed to protect law enforcement investigative efforts. We have released our code written for the first two domains, as well as the proof of concept data from the first domain, and a sub-set of the labeled data from the second domain, allowing replication of our results.1 1As of the filing of this dissertation, our code and data are available online at https://github.com/rsportnoff/anti-trafficking-tools and https://evidencebasedsecurity.org/forums/ i To my daughter, Esther Eunmee, and all the other kids. Contents Contents ii List of Figures iv List of Tables v 1 Introduction 1 1.1 Classified Ads Selling Sex . .1 1.2 Cyber Black-Market Forums . .3 1.3 Child Sexual Abuse Material Forums . .4 2 Classified Ads Selling Sex 7 2.1 Introduction . .7 2.2 Related Work . .8 2.2.1 Sex Trafficking Online . .8 2.2.2 Bitcoin . 10 2.3 Datasets . 11 2.3.1 Backpage . 11 2.3.2 Bitcoin . 12 2.4 Author Classifier . 15 2.4.1 Labeling Ground Truth . 15 2.4.2 Models . 15 2.4.3 Validation Results . 16 2.5 Linking Ads to Bitcoin Transactions . 18 2.6 Grouping Ads by Owner . 20 2.6.1 Grouping by Shared Author . 21 2.6.2 Grouping by Persistent Bitcoin Identities . 22 2.7 Case Study . 23 2.7.1 Price Reconstruction . 23 2.7.2 Linking Backpage Ads to Bitcoin Transactions . 24 iii 2.7.3 Results . 26 2.8 Discussion . 33 2.9 Conclusion . 35 3 Cyber Black-Market Forums 36 3.1 Introduction . 36 3.2 Related Work . 37 3.2.1 Ecosystem Analysis . 37 3.2.2 Classification for Black-Markets . 38 3.2.3 NLP Tools . 39 3.3 Forum Datasets . 40 3.4 Automated Processing . 41 3.4.1 Type-of-Post Classification . 42 3.4.2 Product Extraction . 45 3.4.3 Price Extraction . 50 3.4.4 Currency Exchange Extraction . 52 3.5 Analysis . 54 3.5.1 End-to-end error analysis . 54 3.5.2 Broadly Characterizing a Forum . 54 3.5.3 Performance . 54 3.6 Case Studies . 55 3.6.1 Identifying Account Activity . 56 3.6.2 Currency Exchange Patterns . 57 3.7 Conclusion . 57 4 CSAM Forums 60 4.1 Introduction . 60 4.2 Related Work . 61 4.2.1 Anomaly Detection in Social Network Users . 61 4.2.2 CSAM . 62 4.3 Forum Datasets . 66 4.4 Extracting Producers . 67 4.4.1 Labeling Ground Truth . 68 4.4.2 Features . 68 4.4.3 Public Data . 68 4.4.4 Validation Results . 69 4.5 Conclusion . 75 5 Conclusion 76 A 87 iv List of Figures 2.1 Example of how a new transaction is added to Bitcoin's peer-to-peer network. 14 2.2 Example of a false positive case with possibly flawed ground truth. 18 2.3 Example of a false negative case where the ads appear in different sections. 19 2.4 Linking Ads to Bitcoin Transactions . 19 2.5 Shared Authors . 20 2.6 Persistent Bitcoin Identities . 22 3.1 Example post and annotations from Darkode, with one sentence per line. We underline annotated product tokens. The second exhibits our annotations of both the core product (mod DCIBot) and the method for obtaining that product (sombody). .................... 46 3.2 Number of transactions of each type observed in each forum. Numbers indicate how many posts had each type of transaction. If multiple cur- rencies appear for either side of the transaction, then we apportion a fraction to each option. Colors indicate... 58 4.1 Number of Producers in top % of Ranked Users, with varying Principal Components: Forum 1 . 70 4.2 Number of Producers in top % of Ranked Users, with varying Principal Components: Forum 2 . 71 4.3 Number of Producers in top % of Ranked Users, with varying Principal Components: Forum 3 . 72 4.4 Number of Producers in top % of Ranked Users, with varying Principal Components: Forum 4 . 73 4.5 Number of Producers in top % of Ranked Users, with varying Principal Components: All Forums . 74 v List of Tables 2.1 Backpage . 12 2.2 Classification accuracy for same vs. different author. 17 2.3 Distribution of Ads by Category during 4-week Case Study . 22 2.4 Price calculation correctness. 24 2.5 Transaction to ad linking correctness (Exact Match - EM, Multiple Match - MM). 26 2.6 Multiple match transaction to No. ads matched. 27 2.7 Number of Pairs of Authors Linked within SA Wallet using location and Jaccard . 27 2.8 SA Wallets with Zero FP by Location and Zero FP by Jaccard threshold 0.9...................................... 28 2.9 SA Wallets with between 0%-25% of pairs of authors linked by Jaccard threshold 0.9 . 29 2.10 Number of Transactions Linked to Exact Match within PBI single exact match Wallet using author, area code and location . 31 2.11 Number of Pairs of Authors Linked within MEM Wallet using author, area code and location for Exact Matches . 32 2.12 Final SA Cluster Statistics by Jaccard > 0.9 . 33 2.13 Final PBI Statistics . 33 3.1 General properties of the forums considered. 40 3.2 Number of labeled posts by class for type-of-post classification. 44 3.3 Classification accuracy on the buy vs. sell task. MFL refers to a baseline that simply returns the most frequent label in the training set. Note that the test sets for within-forum evaluation... 44 3.4 Classification accuracy on the buy vs.
Recommended publications
  • Key Player Identification in Underground Forums Over
    Session: Long - Graph Nerual Network II CIKM ’19, November 3–7, 2019, Beijing, China Key Player Identification in Underground Forums over Atributed Heterogeneous Information Network Embedding Framework Yiming Zhang, Yujie Fan, Liang Zhao Chuan Shi Yanfang Ye∗ Department of IST School of CS Department of CDS, Case Western George Mason University Beijing University of Posts and Reserve University, OH, USA VA, USA Telecommunications, Beijing, China ABSTRACT ACM Reference Format: Online underground forums have been widely used by cybercrimi- Yiming Zhang, Yujie Fan, Yanfang Ye, Liang Zhao, and Chuan Shi. 2019. nals to exchange knowledge and trade in illicit products or services, Key Player Identifcation in Underground Forums over Attributed Hetero- geneous Information Network Embedding Framework. In which have played a central role in the cybercriminal ecosystem. In The 28th ACM order to combat the evolving cybercrimes, in this paper, we propose International Conference on Information and Knowledge Management (CIKM ’19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages. and develop an intelligent system named iDetective to automate https://doi.org/10.1145/3357384.3357876 the analysis of underground forums for the identifcation of key players (i.e., users who play the vital role in the value chain). In 1 INTRODUCTION iDetective, we frst introduce an attributed heterogeneous informa- tion network (AHIN) for user representation and use a meta-path As the Internet has become one of the most important drivers in the based approach to incorporate higher-level semantics to build up global economy (e.g., worldwide e-commerce sales reached over relatedness over users in underground forums; then we propose $2.3 trillion dollars in 2017 and its revenues are projected to grow to $4.88 trillion dollars in 2021 [31]), it also provides an open and Player2Vec to efciently learn node (i.e., user) representations in shared platform by dissolving the barriers so that everyone has AHIN for key player identifcation.
    [Show full text]
  • Supervised Discovery of Cybercrime Supply Chains
    Mapping the Underground: Supervised Discovery of Cybercrime Supply Chains Rasika Bhalerao∗, Maxwell Aliapoulios∗, Ilia Shumailov†, Sadia Afroz‡, Damon McCoy∗ ∗ New York University, † University of Cambridge, ‡ International Computer Science Institute [email protected], [email protected], [email protected], [email protected], [email protected], Abstract—Understanding the sequences of processes needed expertise and connections. Machine learning has been used to perform a cybercrime is crucial for effective interventions. to automate some analysis of cybercrime forums, such as However, generating these supply chains currently requires time- identifying products that are bought and sold [26], however, consuming manual effort. We propose a method that leverages machine learning and graph-based analysis to efficiently extract using it to discover the trade relationship between products supply chains from cybercrime forums. Our supply chain de- has not been explored yet. tection algorithm can identify 33% and 42% relevant chains In this paper, we propose an approach to systematically within major English and Russian forums, respectively, showing identify relevant supply chains from cybercrime forums. Our improvements over the baselines of 11% and 5%, respectively. approach classifies the product category from a forum post, Our analysis of the supply chains demonstrates underlying connections between products and services that are potentially identifies the replies indicating that a user bought or sold the useful understanding and undermining the illicit activity of these product, then builds an interaction graph and uses a graph forums. For example, our extracted supply chains illuminate cash traversal algorithm to discover links between related product out and money laundering techniques and their importance to buying and subsequent selling posts.
    [Show full text]
  • Measuring Cybercrime As a Service (Caas) Offerings in a Cybercrime Forum
    Measuring Cybercrime as a Service (CaaS) Offerings in a Cybercrime Forum Ugur Akyazi Michel van Eeten Carlos H. Gañán Delft University of Technology Delft University of Technology Delft University of Technology [email protected] [email protected] [email protected] ABSTRACT could be purchased for around $200 and a month of SMS spoofing The emergence of Cybercrime-as-a-Service (CaaS) is a critical evo- for only $20. In addition to technical tools, it is also possible to hire lution in the cybercrime landscape. A key area of research on CaaS services, such as targeted account takeover [35]. The overall impact is where and how the supply of CaaS is being matched with demand. of CaaS is to make cybercrime more accessible to new criminals, Next to underground marketplaces and custom websites, cyber- as well as to support business models for advanced criminals via crime forums provide an important channel for CaaS suppliers to specialized business-to-business services [22]. attract customers. Our study presents the first comprehensive and A key area of research on CaaS is where and how the supply longitudinal analysis of types of CaaS supply and demand on a of these services is being matched with demand. This question is cybercrime forum. We develop a classifier to identify supply and critical for developing effective disruption strategies by law enforce- demand for each type and measure their relative prevalence and ment. Simply put: how do CaaS suppliers find their customers? One apply this to a dataset spanning 11 years of posts on Hack Forums, of the promises of CaaS is that it is accessible for new entrants, so one of the largest and oldest ongoing English-language cybercrime it cannot operate effectively within old and constrained model of forum on the surface web.
    [Show full text]
  • Hackers Gonna Hack: Investigating the Effect of Group Processes and Social Identities Within Online Hacking Communities
    Hackers gonna hack: Investigating the effect of group processes and social identities within online hacking communities Helen Thackray Thesis submitted for the degree of Doctor of Philosophy Bournemouth University October 2018 This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and due acknowledgement must always be made of the use of any material contained in, or derived from, this thesis. 1 2 Hackers gonna hack: Investigating the effect of group processes and social identities within online hacking communities Helen Thackray Abstract Hacking is an ethically and legally ambiguous area, often associated with cybercrime and cyberattacks. This investigation examines the human side of hacking and the merits of understanding this community. This includes group processes regarding: the identification and adoption of a social identity within hacking, and the variations this may cause in behaviour; trust within in the social identity group; the impact of breaches of trust within the community. It is believed that this research could lead to constructive developments for cybersecurity practices and individuals involved with hacking communities by identifying significant or influencing elements of the social identity and group process within these communities. For cybersecurity, the positive influence on individual security approaches after the hacker social identity adoption, and the subsequent in-group or out-group behaviours, could be adapted to improve security in the work place context. For individuals involved in the communities, an increase in the awareness of the potential influences from their adopted social identities and from other members could help those otherwise vulnerable to manipulation, such as new or younger members.
    [Show full text]
  • Towards Automatic Discovery of Cybercrime Supply Chains
    Towards Automatic Discovery of Cybercrime Supply Chains Rasika Bhalerao∗, Maxwell Aliapoulios∗, Ilia Shumailovy, Sadia Afrozz, Damon McCoy∗ ∗ New York University, y University of Cambridge, z International Computer Science Institute, [email protected], [email protected], [email protected], [email protected], [email protected], Abstract—Cybercrime forums enable modern criminal en- [28], [29], [38]. However, we as a community do not have trepreneurs to collaborate with other criminals into increasingly any systematic methods of identifying these supply chains that efficient and sophisticated criminal endeavors. Understanding the enable more sophisticated and streamlined attacks. Currently connections between different products and services can often illuminate effective interventions. However, generating this un- analysts often manually investigate cybercrime forums to derstanding of supply chains currently requires time-consuming understand these supply chains, which is a time-consuming manual effort. process [19]. In this paper, we propose a language-agnostic method to In this paper, we propose, implement, and evaluate a automatically extract supply chains from cybercrime forum posts framework to systematically identify relevant supply chains and replies. Our supply chain detection algorithm can identify 36% and 58% relevant chains within major English and Russian present in cybercrime forums. Our framework is composed of forums, respectively, showing improvements over the baselines of several components which include automated
    [Show full text]
  • The Gateway Trojan
    THE GATEWAY TROJAN Volume 1, Version 1 TABLE OF CONTENTS About This Report ....................................................................................................................................1 Why This Malware?..................................................................................................................................2 The Basic Questions About RATs..........................................................................................................2 Different Breeds of RATs ........................................................................................................................5 Symantec’s Haley Subcategories of RATs .........................................................................................6 Dissecting a RAT .......................................................................................................................................7 Category II: Common RATs ....................................................................................................................9 Back Orifice ...........................................................................................................................................9 Bifrost ................................................................................................................................................. 10 Blackshades ....................................................................................................................................... 11 DarkTrack .........................................................................................................................................
    [Show full text]
  • Automatic Detection of Cybercrime-Suspected Threads in Online Underground Forums
    Graduate Theses, Dissertations, and Problem Reports 2018 Automatic Detection of Cybercrime-suspected Threads in Online Underground Forums Jian Liu Follow this and additional works at: https://researchrepository.wvu.edu/etd Recommended Citation Liu, Jian, "Automatic Detection of Cybercrime-suspected Threads in Online Underground Forums" (2018). Graduate Theses, Dissertations, and Problem Reports. 7204. https://researchrepository.wvu.edu/etd/7204 This Thesis is protected by copyright and/or related rights. It has been brought to you by the The Research Repository @ WVU with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you must obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/ or on the work itself. This Thesis has been accepted for inclusion in WVU Graduate Theses, Dissertations, and Problem Reports collection by an authorized administrator of The Research Repository @ WVU. For more information, please contact [email protected]. Automatic Detection of Cybercrime-suspected Threads in Online Underground Forums Jian Liu Thesis submitted to the Benjamin M. Statler College of Engineering and Mineral Resources at West Virginia University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science Yanfang Ye, Ph.D., Committee Chairperson Xin Li, Ph.D. Elaine M.
    [Show full text]
  • The Internet of Things in the Cybercrime Underground
    The Internet of Things in the Cybercrime Underground Stephen Hilt, Vladimir Kropotov, Fernando Mercês, Mayra Rosario, and David Sancho TREND MICRO LEGAL DISCLAIMER Contents The information provided herein is for general information and educational purposes only. It is not intended and should not be construed to constitute legal advice. The information contained herein may not be applicable to all 4 situations and may not reflect the most current situation. Nothing contained herein should be relied on or acted Introduction upon without the benefit of legal advice based on the particular facts and circumstances presented and nothing herein should be construed otherwise. Trend Micro reserves the right to modify the contents of this document 7 at any time without prior notice. The IoT in Underground Communities Translations of any material into other languages are intended solely as a convenience. Translation accuracy is not guaranteed nor implied. If any questions arise related to the accuracy of a translation, please refer to 40 the original language official version of the document. Any Case Studies on IoT Underground discrepancies or differences created in the translation are not binding and have no legal effect for compliance or Criminals enforcement purposes. Although Trend Micro uses reasonable efforts to include accurate and up-to-date information herein, Trend Micro 43 makes no warranties or representations of any kind as to its accuracy, currency, or completeness. You agree Predictions that access to and use of and reliance on this document and the content thereof is at your own risk. Trend Micro disclaims all warranties of any kind, express or implied.
    [Show full text]
  • The Internet of Things in the Cybercrime Underground
    The Internet of Things in the Cybercrime Underground Stephen Hilt, Vladimir Kropotov, Fernando Mercês, Mayra Rosario, and David Sancho TREND MICRO LEGAL DISCLAIMER Contents The information provided herein is for general information and educational purposes only. It is not intended and should not be construed to constitute legal advice. The information contained herein may not be applicable to all 4 situations and may not reflect the most current situation. Nothing contained herein should be relied on or acted Introduction upon without the benefit of legal advice based on the particular facts and circumstances presented and nothing herein should be construed otherwise. Trend Micro reserves the right to modify the contents of this document 7 at any time without prior notice. The IoT in Underground Communities Translations of any material into other languages are intended solely as a convenience. Translation accuracy is not guaranteed nor implied. If any questions arise related to the accuracy of a translation, please refer to 40 the original language official version of the document. Any Case Studies on IoT Underground discrepancies or differences created in the translation are not binding and have no legal effect for compliance or Criminals enforcement purposes. Although Trend Micro uses reasonable efforts to include accurate and up-to-date information herein, Trend Micro 43 makes no warranties or representations of any kind as to its accuracy, currency, or completeness. You agree Predictions that access to and use of and reliance on this document and the content thereof is at your own risk. Trend Micro disclaims all warranties of any kind, express or implied.
    [Show full text]
  • Towards a New Cyber Threat Actor Typology
    Mark de Bruijne, Michel van Eeten, Carlos Hernández Gañán, Wolter Pieters Towards a new cyber threat actor typology A hybrid method for the NCSC cyber security assessment Towards a new cyber threat actor typology A hybrid method for the NCSC cyber security assessment By Mark de Bruijne, Michel van Eeten, Carlos Hernández Gañán, Wolter Pieters Faculty of Technology, Policy and Management Delft University of Technology 1 Preface This report could not have been made without the help of a large number of people. We cannot mention all of these people by name, but our thanks extends to all of them. First of all, the researchers would like to thank all the interviewees, who were promised anonymity, for their precious time and valuable feedback. They have contributed a lot to the report and our understanding of cyber actors and the methods which can be used to classify them. We, furthermore, would like to extend these thanks to the members of the supervisory committee. The committee consisted of Prof. Stijn Ruiter (chair), drs. Olivier Hendriks, drs. Noortje Henrichs, dr. Jan Kortekaas, Prof. Eric Verheul, and drs. Wytske van der Wagen. We appreciated their critical and highly constructive feedback during the entire process. Needless to say, the usual disclaimer applies: The contributions from respondents or members of the supervisory committee do not mean that the respondents, members of the supervisory committee or these institutions automatically agree with the complete content of the report. Also, we would like to emphasize that the report does not necessarily reflect the opinion of or the Minister or the Ministry of Security and Justice.
    [Show full text]
  • Lord of the Flies
    LORD OF THE FLIES: AN OPEN-SOURCE INVESTIGATION INTO SAUD AL-QAHTANI Table of Contents Introduction .................................................................................................................................................................................................... 1 I. Key Findings...............................................................................................................................................................................................3 II. From Poet To Enforcer ........................................................................................................................................................................ 4 III. Leaked Contact Information Owned By Al-Qahtani ............................................................................................................8 Al-Qahtani Twitter Connected To [email protected] And Cell Number ....................................................................8 Royal Court Email Used To Provide Al-Qahtani Phone Number, Email ................................................................... 10 Gmail Linked To Same Al-Qahtani Phone Number And Email As Twitter .............................................................. 11 IV. Hack Forums Activity ....................................................................................................................................................................... 15 Scammed On Day One .....................................................................................................................................................................
    [Show full text]
  • Selling “Slaving”
    SELLING “SL AV ING” OUTING THE PRINCIPAL ENABLERS THAT PROFIT FROM PUSHING MALWARE AND PUT YOUR PRIVACY AT RISK JULY 2015 TABLE OF CONTENTS TABLE OF CONTENTS ..................................................................................................................................................I TABLE OF IMAGES ..........................................................................................................................................................II ABOUT THIS REPORT ..................................................................................................................................................1 EXECUTIVE SUMMARY .............................................................................................................................................3 Dirty Rats: How Hackers Are Peeking Into Your Bedroom ...............................................3 WHAT CAN RATTERS DO? .................................................................................................................................... 6 SELLING “SLAVING” ......................................................................................................................................................7 CHEAP, EASY-TO-USE MALWARE THAT PAYS FOR ITSELF............................................... 8 STORIES OF CRUELTY—“RATS” ON THE ATTACK .......................................................................10 YOUTUBE HAS A RAT PROBLEM................................................................................................................
    [Show full text]