
UNIVERSITY OF CALIFORNIA, SAN DIEGO

Understanding URL Abuse for Profit

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science

by

Neha Chachra

Committee in charge:
Stefan Savage, Co-Chair
Geoffrey M. Voelker, Co-Chair
James H. Fowler
Kirill Levchenko
Lawrence K. Saul

2015

Copyright
Neha Chachra, 2015
All rights reserved.

The Dissertation of Neha Chachra is approved and is acceptable in quality and form for publication on microfilm and electronically:

Co-Chair
Co-Chair

University of California, San Diego
2015

DEDICATION

To mom for instilling a love of science in me.

TABLE OF CONTENTS

Signature Page
Dedication
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract of the Dissertation

Chapter 1 Introduction

Chapter 2 Using Crawling to Study Large-Scale Fraud on the Web
2.1 Introduction
2.2 Related Work
2.3 Architecture
2.3.1 Selena
2.3.2 Oliver-I
2.3.3 Oliver-II
2.3.4 Stallone
2.3.5 Charlotte
2.4 Browser Instrumentation using Custom Extensions
2.5 Responding to Deterrence
2.6 Summary
2.7 Acknowledgements

Chapter 3 Characterizing Affiliate Marketing Abuse
3.1 Introduction
3.2 Background
3.3 Methodology
3.3.1 Identifying Affiliate URLs and Cookies
3.3.2 User Study
3.3.3 Crawling
3.3.4 Browser Extension Analysis
3.4 Results
3.4.1 Networks Affected by Cookie-Stuffing
3.4.2 Prevalence of Cookie-Stuffing Techniques
3.4.3 Fraudulent Browser Extensions
3.4.4 Prevalence of Affiliate Marketing
3.5 Summary
3.6 Acknowledgements

Chapter 4 Characterizing Domain Abuse and the Revenue Impact of Blacklisting
4.1 Introduction
4.2 Background
4.3 Data Sets
4.3.1 Authenticity and Ethics
4.3.2 GlavMed and SpamIt
4.3.3 URIBL
4.3.4 Spam Feeds
4.4 Domain Abuse
4.4.1 Overall Observations
4.4.2 Advertising Vectors
4.4.3 Infrastructure Domains
4.4.4 Purchased Traffic
4.5 Blacklisting
4.5.1 Blacklisting Speed
4.5.2 Coverage
4.5.3 Blacklisted Resource
4.5.4 Blacklisting Penalty
4.6 Discussion
4.6.1 A Simple Revenue Model
4.6.2 Changing Blacklisting Penalty
4.6.3 Increasing Coverage
4.7 Related Work
4.8 Summary
4.9 Acknowledgements

Chapter 5 Conclusion
5.1 Dissertation Summary
5.2 Future Directions and Final Thoughts

Bibliography

LIST OF FIGURES

Figure 2.1. System design for Oliver-I, the second version of the Web crawler built in 2010.
Figure 2.2. System design for the proof-of-concept crawler, Charlotte, built in 2015.
Figure 3.1. Different actors and revenue flow in the affiliate marketing ecosystem. The left half of the figure depicts a potential customer receiving an affiliate cookie, while the right half shows the use of the affiliate cookie to determine payout upon a successful transaction.
Figure 3.2. Stuffed cookie distribution for top 10 categories of impacted merchants.
Figure 4.1. Revenue from clicks on different kinds of referrers.
Figure 4.2. Spammers seamlessly switch from one free hosting site to another in the face of takedowns.
Figure 4.3. Revenue of domains before and after blacklisting. Note that the x-axis is non-linear.
Figure 4.4. The highest cost of domain a spammer can afford (y-axis) against the time delay (x-axis) in blacklisting.

LIST OF TABLES

Table 2.1. Different versions of the Web crawler we built for studying fraudulent ecosystems.
Table 2.2. The table shows some of the supported features for interacting with Web pages and the corresponding challenges we faced.
Table 3.1. Examples of affiliate.