
Automatic Detection of Click Fraud in Online Advertisements

by

Abhishek Agarwal, M.S.

A Thesis in COMPUTER SCIENCE
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE

Approved
Dr. Rattikorn Hewett, Chair of Committee
Dr. Sunho Lim
Dr. Eunseog Youn
Peggy Gordon Miller, Dean of the Graduate School

August, 2012

Texas Tech University, Abhishek Agarwal, August 2012

ACKNOWLEDGMENTS

I would like to thank Dr. Rattikorn Hewett for her guidance throughout my Master's research. Her in-depth knowledge of the subject and her focus on clarity and quality of work have helped me learn skills that will serve me for the rest of my career. Her guidance on this research has been invaluable and has helped me cope with the challenges I faced throughout the course of this work.

TABLE OF CONTENTS

Acknowledgments
Abstract
List of Tables
List of Figures
Motivation
    Contributions
Background Work
Preliminaries
    Terms
    Problem Statement
    Assumptions
    Mathematical Theory of Evidence
        Mass Functions
        Combination Rule
Proposed Dempster-Shafer Theory for Click Fraud Detection
    The Core Element of Dempster-Shafer Theory
    Mass Functions for Click Fraud Detection
        Evidence 1: Number of clicks on the ad
        Evidence 2: Time spent in browsing
        Evidence 3: Ad-Visit after non-ad visit
        Evidence 4: Time of Click
        Evidence 5: Place of origin of click
        Evidence 6: Creation of membership
        Evidence 7: Adding a product to the shopping cart
Data Set & Illustration
    Data Description
    Example of belief computation using mass function and combination
Evaluation
    Case Study 1
    Case Study 2
Discussion & Conclusions
Bibliography

ABSTRACT

Advances in Internet technology, together with its increasing accessibility and availability, have driven rapid growth in the number of Internet users over the last decade. This has made online advertising a popular venue for many companies to market their products and services. Today, online advertising is one of the most important sources of revenue, affecting the economy of many large enterprises. In online advertising, an advertiser pays a broker (e.g., Google, Yahoo), which normally operates a search engine, to post its advertisement on any appropriate publisher site. For each click on an advertisement posted on its site, the publisher earns revenue from the broker, while the advertiser is charged. Consequently, an excessive number of clicks can quickly exhaust a rival company's advertising budget and drive it out of the competition for advertisement placement, while each click adds revenue for the publisher. This creates an incentive for click fraud: malicious acts that generate clicks, without real interest in the advertised products or services, in order to increase revenue or drive away competitors.
Identifying click fraud is difficult because click behavior is dynamic: some clicks are generated by humans and others by automated software called bots. Previous work has attempted to identify click fraud using various techniques, but these approaches tend to be limited by the types of data they use, the way they process it, or assumptions that do not always hold. This thesis presents an approach to automatically detecting click fraud in online advertising. The approach uses the mathematical theory of evidence (Dempster-Shafer theory) to estimate the likelihood that a click is fraudulent or genuine from web log data of a user's activities on the advertiser's website. One advantage of the proposed approach is that the likelihood can be computed for each incoming click, giving an online computation of belief that suits the dynamic behavior of users. The thesis describes the approach and evaluates its validity using two real-world case studies. We believe the approach is general in that it can be applied to any online advertising scenario.

LIST OF TABLES

4.1 Fraud certification rules
5.1 Sample log data
5.2 Input from server log
5.3 Coefficient values
5.4 Mass function beliefs for illustrated example
6.1 Computed belief values for Case Study 1
6.2 Computed belief values for first IP
6.3 Computed belief values for second IP
6.4 Computed belief values for third IP

LIST OF FIGURES

1.1 % change of revenue for advertising media (GeekWire, 2012)
1.2 Google's revenue source distribution in 2011 (Google Earnings Report, 2011)
1.3 Scenario before click fraud occurred
1.4 Scenario after click fraud occurred
4.1 Click fraud detection framework using D-S theory
5.1 Legends for timeline diagram
5.2 Timeline diagram for sample data in Table 5.1
5.3 Timeline diagram for Table 5.2
5.4 Combined belief of fraud for input in Figure 5.3
6.1 Timeline input for Case Study 1
6.2 Belief of fraud from mass function 1
6.3 Belief of ~fraud from mass function 2
6.4 Belief of ~fraud from mass function 3
6.5 Belief of fraud from mass function 4
6.6 Belief of fraud from mass function 5
6.7 Belief of ~fraud from mass function 6
6.8 Belief of ~fraud from mass function 7
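The abstract describes combining several pieces of evidence from a user's web log into a belief that a click is fraudulent, and the table of contents names the tools: mass functions and a combination rule from Dempster-Shafer theory. As a minimal sketch, assuming a two-hypothesis frame {fraud, ~fraud}, the standard textbook form of Dempster's rule of combination can be written as follows. This is not the thesis's implementation: the mass values are hypothetical placeholders rather than the thesis's mass functions for Evidence 1 to 7, and the names `combine`, `belief`, `FRAUD`, `GENUINE`, and `THETA` are illustrative.

```python
from functools import reduce
from itertools import product

# Assumed frame of discernment for a single click (hypothetical labels).
FRAUD = frozenset({"fraud"})
GENUINE = frozenset({"~fraud"})
THETA = FRAUD | GENUINE  # "fraud or ~fraud": mass here expresses ignorance


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps focal elements (frozensets of hypotheses)
    to masses that sum to 1.
    """
    combined = {}
    conflict = 0.0  # K: total mass that would fall on the empty set
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # Normalize by 1 - K so the remaining masses again sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Bel(A): sum of the masses of all focal elements contained in A."""
    return sum(v for s, v in m.items() if s <= hypothesis)


if __name__ == "__main__":
    # Hypothetical masses, e.g. one piece of evidence pointing to fraud
    # and one pointing to a genuine click; not values from the thesis.
    m1 = {FRAUD: 0.6, THETA: 0.4}
    m2 = {GENUINE: 0.3, THETA: 0.7}
    m3 = {FRAUD: 0.5, THETA: 0.5}

    m12 = combine(m1, m2)
    print(round(belief(m12, FRAUD), 3))  # -> 0.512

    # Fold evidence in one piece at a time, as it arrives.
    m_all = reduce(combine, [m1, m2, m3])
    print(round(belief(m_all, FRAUD), 3))  # -> 0.737
```

Folding each new mass function into the running result with `reduce` illustrates how a belief can be updated incrementally as evidence about a click arrives. Normalizing by 1 - K discards the conflicting mass, which is the standard behavior of Dempster's rule; under near-total conflict this normalization can produce counterintuitive results, so the sketch refuses to combine when K reaches 1.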