
AUCTION SHILL DETECTION FRAMEWORK BASED ON SVM

A Thesis

Submitted to the Faculty of Graduate Studies and Research

In Partial Fulfillment of the Requirements

For the Degree of

Master of Science

in

Computer Science

University of Regina

By

Swati Ganguly

Regina, Saskatchewan

December, 2016

Copyright 2016: Swati Ganguly

UNIVERSITY OF REGINA

FACULTY OF GRADUATE STUDIES AND RESEARCH

SUPERVISORY AND EXAMINING COMMITTEE

Swati Ganguly, candidate for the degree of Master of Science in Computer Science, has presented a thesis titled, Auction Shill Detection Framework Based on SVM, in an oral examination held on December 9, 2016. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material.

External Examiner: Dr. Eman Almehdawe, Faculty of Business Administration

Supervisor: Dr. Samira Sadaoui, Department of Computer Science

Committee Member: Dr. Malek Mouhoub, Department of Computer Science

Committee Member: Dr. Lisa Fan, Department of Computer Science

Chair of Defense: Dr. Yang Zhao, Department of Mathematics and Statistics

ABSTRACT

Online auctioning has attracted serious in-auction fraud, such as shill bidding, given the huge amount of money involved and the anonymity of users. Since shill bidding is difficult both to detect and to prove, very few researchers have been successful in designing online shill detection systems that can be adopted by auction sites. We introduce an efficient SVM-based two-phased In-Auction Fraud Detection

(IAFD) model. This supervised model is first trained offline for identifying ‘Normal’ and

‘Suspicious’ bidders. For this process, we identify a collection of the most relevant fraud classification features rather than uncertain or general features, like feedback ratings.

The model can then be launched online at the end of the bidding period, before the auction is finalized, to detect suspicious bidders and redirect them for further investigation.

This will be beneficial for legitimate bidders who might otherwise be victimized if an infected auction is finalized and payment made. We propose a robust process to build the optimal IAFD model, which comprises data cleaning, scaling, clustering, labeling and sampling, as well as learning via SVM. Since labelled auction data are scarce, we apply hierarchical clustering and our own labelling technique to generate a high-quality training dataset. We utilize a hybrid method of over-sampling and under-sampling, which proved to be more effective in solving the issue of highly imbalanced fraud datasets. Numerous pre-processing and classification experiments are carried out using different functions in the Weka toolkit, firstly to verify their applicability with respect to the training dataset and secondly to determine how these functions impact the model performance. Once the final model is built incorporating the relevant functions, it is tested with commercial auction data from eBay to detect shill bidders. The classification results exhibit excellent performance in terms of detection and false alarm rates. Moreover, when compared to other SVM-based fraud detection systems, our model outperforms them.


ACKNOWLEDGEMENT

I take this opportunity to express my heartfelt gratitude towards my supervisor Dr.

Samira Sadaoui for giving me a chance to work as a research assistant under her guidance and complete my thesis successfully. Her profound knowledge and careful supervision have acted as invaluable support factors in motivating me to pursue the best during the two-year journey of my graduate studies. I thank her for the generous financial support which helped me in achieving my goals.

I deeply appreciate all the academic support I received from the Faculty of Graduate

Studies and Research and the Department of Computer Science.

I would like to specially thank Dr. Malek Mouhoub, Dr. Yiyu Yao and Dr. Lisa Fan for teaching me some interesting and relevant courses which helped me in developing a strong hold in the topics of Data Mining, Machine Learning and Artificial Intelligence.

I would like to express my deepest regards for all the valuable remarks I received from my fellow graduate students and friends to enhance my knowledge.

It’s natural to have supportive parents but it’s a blessing to have an equally supportive husband. I thank Homarghya Homroy for being the most caring and motivating husband.

Above all, who I am today is solely because of the constant support and sacrifices made by my parents and my elder brother. I owe all my success and achievements to them and thank God for bestowing me with such a loving family.


TABLE OF CONTENTS

ABSTRACT ...... i

ACKNOWLEDGEMENT ...... iii

TABLE OF CONTENTS ...... iv

LIST OF TABLES ...... vi

LIST OF FIGURES ...... viii

Introduction ...... 1
  1.1 Problem Statement and Motivation ...... 1
  1.2 Research Contribution ...... 4
  1.3 Thesis Organization ...... 5

Overview of Auction Fraud ...... 7
  2.1 Types of Auction Fraud ...... 9
  2.2 In-Auction Fraud: Shill Bidding ...... 11
  2.3 Conclusion ...... 13

Review of SVM Methodology ...... 14
  3.1 Foundations of SVM ...... 14
  3.2 Benefits of SVM ...... 26
  3.3 Performance Evaluation Metrics for Classification ...... 28
  3.4 Conclusion ...... 31

Related Works ...... 32
  4.1 Machine Learning for Auction Fraud Detection ...... 33
  4.2 SVM in Fraud Detection Area ...... 36
  4.3 SVM and Imbalanced Data ...... 40
  4.4 Conclusion ...... 41

Overview of In-Auction Fraud Detection Model ...... 43
  5.1 Phasing of the Model ...... 43
  5.2 SVM and the Model ...... 46
  5.3 An Integrated SVM Tool for IAFD ...... 48
  5.4 Conclusion ...... 51

Building Training Dataset for IAFD ...... 52
  6.1 Selection of Classification Features ...... 53
  6.2 Construction of Training Dataset ...... 59
  6.3 Labeling Data through Clustering ...... 62
  6.4 Balancing Data through Sampling ...... 64
  6.5 Conclusion ...... 68

A Robust Classification Process for In-Auction Fraud Detection ...... 69
  7.1 Kernel Selection for IAFD Model ...... 69
  7.2 Cross-validation in IAFD Model ...... 71
  7.3 Parameter Tuning in Weka ...... 73
  7.4 Selection of Performance Metrics for IAFD ...... 75
  7.5 Conclusion ...... 77

Experiments and Evaluation ...... 78
  8.1 Commercial Auction Data ...... 78
  8.2 Pre-processing of the Training Data ...... 80
  8.3 Data Clustering ...... 87
  8.4 Data Sampling ...... 90
  8.5 Cross-validation of Training Set ...... 93
  8.6 Testing using Test Dataset ...... 95
  8.7 Conclusion and Comparison ...... 99

Conclusion and Future Work ...... 101
  9.1 Conclusion ...... 101
  9.2 Future Work ...... 104

References ...... 106


LIST OF TABLES

Table 1. Common Shill Behavior ...... 12

Table 2. Kernels and Parameters ...... 25

Table 3. SVM Benefits Comparison ...... 27

Table 4. Confusion Matrix ...... 28

Table 5. Performance Metrics and Formulas ...... 28

Table 6. Comparison of SVM and Other ML Techniques...... 39

Table 7. Environment and Software ...... 48

Table 8. Data Mining Tool and Library ...... 49

Table 9. Fraud Pattern Formulas ...... 57

Table 10. Fraud Patterns by their Weights ...... 61

Table 11. Comparison of Clustering Techniques ...... 63

Table 12. Weka Parameter List ...... 74

Table 13. Performance Metrics for IAFD ...... 76

Table 14. PDA Data Statistics ...... 79

Table 15. Data Attributes and Descriptions ...... 79

Table 16. Fraud Pattern and Attributes Dependency ...... 82

Table 17. Results - With and Without Scaling ...... 86

Table 18. Clustering Analysis for IAFD ...... 88

Table 19. Results - Only SMOTE ...... 91

Table 20. Results - Only SpreadSubsample ...... 92


Table 21. Results – Hybrid of SMOTE and SpreadSubsample ...... 93

Table 22. Results - Cross-validation with k=5 and k=10 ...... 93

Table 23. Results - Hybrid Method Considering Both Classes...... 94

Table 24. Final Parameter Values for LIBSVM...... 95

Table 25. Test Results - Confusion Matrix ...... 96

Table 26. Test Results - Time Efficiency ...... 98

Table 27. Results Comparison...... 100


LIST OF FIGURES

Figure 1. eBay Revenue Graph ...... 8

Figure 2. Fraud across Auction Stages ...... 11

Figure 3. SVM Classification Model ...... 15

Figure 4. SVM for Linearly Separable Data ...... 16

Figure 5. SVM with Error Function ...... 20

Figure 6. Flow Diagram for SVM ...... 22

Figure 7. SVM for Linearly Inseparable Data...... 23

Figure 8. In-Auction Fraud Detection Model...... 44

Figure 9. Training SVM Model ...... 47

Figure 10. SVM Testing Model ...... 48

Figure 11. Weka Tool GUI ...... 50

Figure 12. LIBSVM Package in Weka ...... 51

Figure 13. Training Dataset Construction ...... 52

Figure 14. Fraud Pattern Classification ...... 54

Figure 15. Auction Data Sample ...... 83

Figure 16. Training Data Instance ...... 85

Figure 17. LIBSVM – Command for Testing ...... 96

Figure 18. Test Results - Area under ROC curve ...... 98


Chapter 1

Introduction

The advancement in information and communication technology has become a key factor in the digitization of many industries, giving birth to online business applications like e-commerce. Online shopping, one of the most popular forms of e-commerce, has made buying and selling easier and hassle-free. Similar to online shopping, online auctioning is a virtual service in which buyers and sellers, interested in buying or selling goods and services, engage in a rigorous bidding process from various locations with the aim of winning the auction at the best possible price.

1.1 Problem Statement and Motivation

The ever-growing industry of online auctions has encouraged people across the world to participate in this exciting and alluring journey of procuring items of their choice at a price of their choice. Millions of dollars are involved in this sector, making it attractive for buyers as well as vulnerable to crimes by illegitimate money-makers. The evolution of this service has shown that more money attracts more unethical activities. Given the complexity of the online auction process and the different types of users involved in it, like buyer, seller, auctioneer, third-party vendor, acquiring banks and online payment systems, the scope of the criminal activities ranges from selling illegitimate goods, to inflating the price of an item, to non-payment for goods delivered.

Owing to the variety of crimes involved, it has become immensely important to develop different fraud monitoring methodologies to analyze and track these various kinds of fraud. Usual fraud prevention mechanisms like fraud awareness, user authentication and feedback-based reputation systems can be reliable for small-scale frauds, but they are not foolproof enough to protect users from frauds that are complex to detect. In this work we concentrate mainly on detecting shill bidding activities in online auctions, which are both difficult to detect and to prove given their similarity to usual bidding behavior. Over the years it has become essential for researchers to analyse bidders' and sellers' behavior and bidding activities to come up with systems that can detect probable shill bidding and prevent honest bidders from becoming victims. To be successful, this analysis must focus on two important aspects:

• Dealing with a large amount of bidding data, with details like bidder, seller and auction specification information along with data like bid time and bid amount.

• The ability to learn from existing fraudulent data characteristics and detect evolving fraudulent behaviors.

Both these requirements can be addressed by applying Machine Learning (ML) algorithms to this kind of problem. Fraud detection can be formulated as a classification problem, and several classifiers like Naive Bayes, Decision Trees, Logistic Regression, Random Forest and SVM are considered when handling fraud in industries like online auctions, credit cards, telecommunications, banking and insurance, and power utilities. Each classifier has its own set of pros and cons when applied to different application domains. Across all these domains, the Support Vector Machine (SVM) has been found to be one of the most efficient and scalable binary classification algorithms. SVM has largely been accepted as the most favoured classifier in application areas like pattern recognition, text and image classification, bioinformatics, and intrusion and fraud detection. Although many publications report its effectiveness in areas like credit card, power utility and telecom fraud detection, very little research has been done on handling online auction fraud, especially shill bidding, using SVM.

Some of the works done to date address frauds related to third-party payment like escrow services [Abbasi2009] and non-delivery of goods using an SVM-based text classification process [Almendra2011]. [Yoshida2010] used a one-class SVM classifier to detect outliers based on a set of known normal behaviors and users' ratings, whereas [Lin2014] showed how neighbor density calculation can affect the fraudster classification process by SVM when applied to real-time data. A work by [Ochaeta2008] demonstrated how the SVM classification process can make use of different variables obtained through the analysis of offline data, like item price, user profiles, transactions and reputation feedback, fetched from a network of proven fraudsters. However, all the above systems, given their dependency on users' feedback ratings, which as discussed later are not very reliable, may not be termed foolproof solutions. In contrast to an offline system [Ochaeta2008] that concentrates more on learning the behavioral trends of already identified fraudsters, it is important nowadays to design an online system that can detect shill patterns practised by unidentified suspicious bidders by following their behavior at runtime during an auction, and suspend the event if required.

1.2 Research Contribution

Our system is a two-phase binary classification and detection model where, based on common shilling activities, we train a model to detect fraudsters in live auctions. In the first phase we train an SVM model based on a set of fraud patterns derived from the common behavior practiced by shill bidders and a label which determines whether a bidder is behaving normally or suspiciously. This supervised process tries to find a relation between these fraud patterns and the label. In the next phase, when the model is fed with the set of fraud patterns from an ongoing auction, it tries to determine the label of the data based on the relationship found. Our model has exhibited high efficiency in detecting suspicious bidders based on the formulated fraud patterns, with a Recall of 66% and an AUC of 80% on unseen data, while the learning model achieved an overall AUC of 85%, which is not much higher than the result on unseen test data. This demonstrates the generalization capability of our model, which is able to overcome the problem of overfitting. We have also demonstrated how a hybrid sampling technique benefits the handling of imbalanced auction data, which is a common situation in the fraud detection area.

As it is very difficult to find labelled auction data, we have used clustering techniques to label the data. We found the hierarchical clustering technique to be more appropriate than other clustering methods like the k-means algorithm, since the number of clusters is unknown in the beginning. Experimentally we showed that 10-fold cross-validation works better than 5-fold cross-validation while building an SVM model for shill detection. We have carried out rigorous experiments with the different Weka and LIBSVM functions and their parameters to prove their suitability in building the detection system. Once online, this system has the ability to launch the SVM model at the end of the bidding period of each auction in order to determine which bidders are behaving suspiciously or normally with respect to the fraud pattern values. In real-world scenarios the auction site administrators might take necessary action by putting the payment and delivery on hold for an auction detected as infected by our model, and further manual investigation may be carried out.
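To make the offline phase of this process concrete, the sketch below walks through the same sequence of steps: labelling unlabelled bidder records via hierarchical clustering, re-balancing the two classes with a hybrid of over- and under-sampling, and 10-fold cross-validating an RBF-kernel SVM. It is only an illustration: Python and scikit-learn stand in for the Weka and LIBSVM tools actually used in this thesis, the data are synthetic, and the cluster count, sampling ratios and kernel parameters are hypothetical.

```python
# Illustrative sketch only: the thesis uses Weka/LIBSVM; scikit-learn stands in here.
# All data, ratios and parameters below are hypothetical placeholders.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical fraud-pattern features for 1000 bidders, scaled to [0, 1].
X = MinMaxScaler().fit_transform(rng.random((1000, 6)))

# Step 1: label the unlabelled data with hierarchical (agglomerative) clustering.
# The cluster whose members have the highest mean fraud-pattern values is treated
# as the 'suspicious' class (+1); the rest are 'normal' (-1).
clusters = AgglomerativeClustering(n_clusters=4).fit_predict(X)
cluster_scores = [X[clusters == c].mean() for c in range(4)]
suspicious_cluster = int(np.argmax(cluster_scores))
y = np.where(clusters == suspicious_cluster, 1, -1)

# Step 2: hybrid sampling to ease class imbalance -- duplicate minority instances
# (a stand-in for SMOTE) and randomly drop majority instances (a stand-in for
# Weka's SpreadSubsample).
minority = np.where(y == 1)[0]
majority = np.where(y == -1)[0]
minority_up = rng.choice(minority, size=2 * len(minority), replace=True)
majority_down = rng.choice(majority, size=min(len(majority), 4 * len(minority)), replace=False)
keep = np.concatenate([minority_up, majority_down])
X_bal, y_bal = X[keep], y[keep]

# Step 3: train and cross-validate an RBF-kernel SVM (10-fold), scoring by ROC AUC.
svm = SVC(kernel="rbf", C=10, gamma=0.5)
auc = cross_val_score(svm, X_bal, y_bal, cv=10, scoring="roc_auc")
print("10-fold AUC: %.3f +/- %.3f" % (auc.mean(), auc.std()))
```

In the actual framework these steps correspond to Chapters 6 and 7, where the labelling, the SMOTE and SpreadSubsample filters, and the LIBSVM parameter tuning are carried out with Weka.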

We found one study that classifies shill bidders using binary SVM [Ochaeta2008] and has some similarity with our work, but it also differs from our work in many aspects, like the features used, the system scope and its dependencies. That work concentrated more on learning shill patterns from a network of fraudulent transactions than on detecting shill bidders within live auctions. Still, our system shows better performance during the learning phase, with an F-measure of 87.8% compared to the 80.7% achieved by that system. Our approach is indeed the first SVM-based fraud classification mechanism that uses features directly related to shill bidding behaviors, rather than uncertain features like feedback ratings, to detect suspicious bidders at runtime during an auction event.

1.3 Thesis Organization

The remainder of this thesis is organized as follows:


In Chapter 2 we discuss the types of auction fraud that are commonly addressed by researchers and industry experts. Shill bidding, one of the most common but difficult-to-detect frauds, is discussed in this chapter along with its characteristics.

In Chapter 3 we review the SVM methodology, broadly discussing the derivation and formulation of this algorithm and its benefits compared to other ML algorithms. We also list a set of classification performance metrics.

In Chapter 4 we highlight related works with respect to three domains: machine learning in general for auction fraud, SVM in fraud detection areas other than auctions, and how SVM has proved its efficiency in handling imbalanced data, as in fraud and anomaly detection.

In Chapter 5 the overview of the model architecture is discussed with respect to its two main phases, offline and online. The role SVM plays as a supervised learning technique in this framework and the use of the Weka tool are also discussed here.

In Chapter 6 the methodology for constructing an efficient training dataset and building the SVM model is explained in detail.

In Chapter 7 we discuss the different Weka functions and parameters involved in building the robust shill detection model.

In Chapter 8 using commercial data we carry out several experiments to compare and discuss the results.

In Chapter 9 we finally present the conclusion and future work related to our research.


Chapter 2

Overview of Auction Fraud

Online auction fraud may be defined as illegitimate activities carried out before, during or after a bidding process, with the sole purpose of gaining dishonest advantage, mainly financial, over another person or organization. As per the Internet Crime Complaint

Center (IC3), a partnership of the FBI and the National White Collar Crime Center, online auction fraud is one of the most frequent crimes reported on the web over the years. Some of the major causes that may be held responsible for these crimes are

[Chua2002]:

• A high degree of anonymity for auction users.

• Relaxed legal constraints against auction fraud.

• Very low cost of accessing the auction services.

eBay, one of the most favoured online auction sites with around 162 million registered users, reported a total revenue of 8.6 billion USD in 2015. Following the trend of the last five years, although there was an ascent between 2011 and 2013, the revenue dropped by almost 45% in 2014 and did not recover much in 2015, as shown in Figure 1 [Statista2016].


[Bar chart: eBay annual revenue, 2011–2015]

Figure 1. eBay Revenue Graph

Though the split from PayPal is the major cause behind this drop, the Wall Street Journal also stated in one of its articles that eBay's failure to attract customers is another reason behind this revenue fall¹. For many years, customers have complained that eBay is unable to do enough to regulate and combat the fraudulent activities happening on the site [Cohen2009]. Contrary to eBay's claim that less than 0.1% of all transactions are fraudulent, it is estimated that the fraud percentage is 15%, 18% and 28% respectively in the three top selling categories of eBay: Computer and Networking, Health and Beauty, and eBay Motors [Nikitkov2008].

The IC3 2013 report shows losses of about 43 million USD related to auto-auction fraud, compared to 8 million USD in 2011. In the IC3 2014 annual report, a total loss of 11 million USD was reported across 9,847 auction fraud cases. From the prevention tips listed in the report, it seems that most of the auction complaints are related to pre-auction and post-auction activities like product delivery, payment authorization, fee stacking and product insurance. It is always difficult to detect fraud happening during the bidding process, that is in-auction fraud, which we are specifically concerned about. There is not much information available about the financial loss related to this kind of fraud. Still, considering the wide range of empirical studies involving commercial data carried out over the years to detect and combat in-auction frauds like shill bidding [Trevathan2008, Dong2010, Sadaoui2016, Nikitkov2015], it may be claimed that this fraud is very common [Zhang2013].

¹ Wall Street Journal article “Ebay Reports Another Drop in Revenue”, January 27, 2016

2.1 Types of Auction Fraud

Online auction fraud can be of several types, categorized mainly along two dimensions:

• User role – buyer and seller as victims.

• Auction stage – pre-auction, in-auction and post-auction fraud.

The English auction is the most common type of auction protocol: an item is set up for sale by a seller and several bidders bid for that item through an incremental bidding process. Whoever offers the highest bid within a stipulated time frame wins the auction. Both buyers and sellers may be victims of auction fraud. The common ways that fraud occurs in online auctions are categorized below by victim type [Dong2009, Noufidali2013, Bangcharoensap2015]:

• Buyer as victim:

- Non-delivery of goods – The product that is bought in the auction is not delivered to the buyer even after payment.

- Misrepresentation of goods – The product details like value, condition, and authenticity may be overstated.

- Fee stacking – Fees, usually related to shipping charges, hidden in the beginning are added to the cost after the sale has been finalized.

- Shill bidding – The artificial inflation of the price of an item by use of fake bids from fake IDs or accomplices.

- Stolen or black market goods – Selling of illegal, stolen or pirated goods.

- Triangulation – This fraud may involve three parties, a fraudulent seller, an innocent bidder and a legitimate third-party e-commerce vendor, where the seller gains profit by selling goods, illegally bought from the vendor, to the unsuspecting bidder.

• Seller as victim:

- Non-payment of delivered goods – In spite of receiving the product, the payment is not made or some fake credit cards are used to make the payment.

- Bidding rings – Several fraudulent bidders form a ring, and the ring members have an agreement not to bid against each other in order to procure an item at a much lower price.

- Bid shading – In sealed bid auctions, a bidder may use unfair means to learn the highest bid before the bids are disclosed, and then insert a bid just above the highest bid to win the auction.

- Multiple bidding – Using high bids from fake accounts to run up the price and scare off potential buyers, the actual bidder then retracts the higher bids, getting the item at a much lower price than he would have otherwise.


Auction fraud may also be classified along the other dimension, the auction stage, with respect to the time of execution during an auction event. These stages are namely the pre-auction, in-auction and post-auction stages. Figure 2 presents the different auction stages and the occurrence of fraud across these stages [Ochaeta2008].

Figure 2. Fraud across Auction Stages

While most pre-auction and post-auction frauds are offline activities and have clear indications, in-auction fraud is very complex to detect and track.

2.2 In-Auction Fraud: Shill Bidding

Shill bidding is one of the most common kinds of in-auction fraud. Fraudulent sellers and their accomplices try to inflate the price of an auction item by submitting aggressive bids through phony accounts. Once the price is high enough and the highest bid comes from a legitimate bidder, the fake bidders, or shills, stop bidding, ensuring that the legitimate bidder wins the auction. This way the winner actually pays more than the product is worth. The seller himself sometimes creates fake accounts to bid for his own item, or his accomplices create phony accounts to carry out these kinds of collusive activities on his behalf. As estimated by Jarrod Trevathan in an interview in 2008, shill bidding has caused financial losses of about 250 million USD [Cohen2009].

Moreover, it is found that English auctions attract more shill bidders than other types of auctions [Trevathan2008]. It is not very easy to detect shill bidders, and thus the actual number of occurrences of shill bidding is always higher than what is reported at any forum or complaint centre [Gregg2008]. Some of the common behaviors of shill bidders and the probable motives behind them are discussed in Table 1.

Table 1. Common Shill Behavior

Behavior – Reason

Submits bids in the early stage of an auction – To allure people to participate in the bidding process

Outbids oneself with consecutive bids, called nibbling, without winning the auction – To attract bidders and raise the current price of the auction

Submits bids with an unusually low average increment – To avoid submitting bids that might be very high, resulting in winning the auction

Bids aggressively in the middle stage of an auction – To raise the auction price and attract higher bids from other bidders

Becomes inactive at the final stage of an auction – To prevent oneself from winning the auction

In spite of bidding aggressively, hardly wins an auction – The target of a shill is not to win an auction but to raise the price

Participates more for a particular seller rather than a diversified lot – The collusive act involves the fraudulent seller and accomplices who act as bidders to raise the item price on his behalf

Submits bids immediately after a legitimate bid – To encourage other legitimate bidders to bid further by gradually raising the price
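The fraud patterns used later in this thesis (Chapter 6) are derived from behaviors such as those in Table 1. Purely as a hypothetical illustration of how such behaviors could be quantified from raw bid records, the sketch below computes two simple indicators per bidder: how early in an auction the bidder starts bidding, and how concentrated the bidder's activity is on a single seller. The record format, field names, example data and the two indicators are invented for illustration and are not the thesis's actual fraud pattern formulas.

```python
# Hypothetical illustration only: quantify two shill-like behaviors from bid records.
# Field names, example data and indicators are invented; the thesis defines its
# actual fraud patterns in Chapter 6.
from collections import defaultdict

# Each record: (bidder, seller, auction_id, bid_time_fraction), where bid_time_fraction
# is the moment of the bid as a fraction of the auction duration (0 = start, 1 = end).
bids = [
    ("b1", "s1", "a1", 0.05), ("b1", "s1", "a1", 0.10), ("b2", "s1", "a1", 0.80),
    ("b1", "s1", "a2", 0.02), ("b3", "s2", "a3", 0.50), ("b1", "s1", "a3", 0.04),
]

per_auction = defaultdict(lambda: 1.0)   # (bidder, auction) -> earliest bid time
sellers = defaultdict(list)              # bidder -> sellers bid on (one entry per bid)
for bidder, seller, auction, t in bids:
    per_auction[(bidder, auction)] = min(per_auction[(bidder, auction)], t)
    sellers[bidder].append(seller)

first_bid = defaultdict(list)            # bidder -> earliest bid time in each auction
for (bidder, auction), t in per_auction.items():
    first_bid[bidder].append(t)

for bidder in sorted(first_bid):
    # Early bidding indicator: closer to 1 means the bidder tends to bid very early.
    early_bidding = 1.0 - sum(first_bid[bidder]) / len(first_bid[bidder])
    # Seller concentration: share of bids placed with the bidder's most frequent seller.
    top_seller_share = max(sellers[bidder].count(s) for s in set(sellers[bidder])) / len(sellers[bidder])
    print(f"{bidder}: early_bidding={early_bidding:.2f}, top_seller_share={top_seller_share:.2f}")
```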


Auction fraud prevention mechanisms may be online as well as offline. User education and awareness are the most important steps to prevent users from becoming victims. Legal institutions like the FBI, the RCMP and IC3, through their newsletters and blogs, try to educate people about patterns of ongoing fraud and how to avoid them. The reinforcement of severe punishment may help in reducing these kinds of fraud [Conradt2012]. There are laws covering pre-auction and post-auction related charges like unauthorized credit card use, wire and postal fraud, and identity theft, but very few legal codes specific to in-auction fraud like shill bidding are in place. This is mainly because these kinds of frauds are hard to detect and very few systems have been developed to date to uncover and combat them. For shill bidding and other bid-rigging related crimes, New York's anti-trust law of 1899, the Donnelly Act, prohibits price fixing, bid rigging and monopolization among others. This law may enforce a fine of up to 1 million USD and imprisonment of 4 years if found guilty. Some of the basic ways to prevent users from being victimized by in-auction fraud are: digital guardianship, enforcing identity verification during account creation, and pricing rule enforcement.

2.3 Conclusion

In this chapter we discussed the different types of auction fraud and the measures that are taken to regulate these illegitimate activities. Among these, the most difficult to discover and control are the in-auction frauds like shill bidding owing to their complex nature.

The different shill strategies and corresponding behaviors are listed in this chapter.


Chapter 3

Review of SVM Methodology

The Support Vector Machine (SVM) is a supervised ML technique that may be employed for classification and regression analysis. It works as a non-probabilistic binary classifier which helps in classifying instances into one of two classes. Though it was invented in 1963, the current incarnation was first published by Cortes and Vapnik in 1995. It was primarily designed to address linearly separable datasets, but later, with the introduction of kernel functions, non-linear classification was made possible. The method has been further improved in several ways to address other problems in the domain of data classification and regression, namely clustering (unsupervised learning), multi-class classification, transductive or semi-supervised learning, and incremental learning.

3.1 Foundations of SVM

The SVM methodology is based on statistical learning theory and the principle of Structural Risk Minimization (SRM) introduced by Vapnik and Chervonenkis in 1974. SVM follows the theory of statistical learning where, by gaining knowledge from a set of classified training data, a prediction model is built which in turn helps in making decisions regarding the classification of a set of unseen data. The SRM method aims at minimizing the upper bound on the generalization error, which is the risk of not classifying unseen or test data correctly in spite of classifying all training examples accurately. SRM provides a trade-off between the model complexity and the training error, which means the model is neither too simple nor too complex in terms of successfully fitting the data.

The inputs to SVM are the features and the corresponding class of each instance, and the output is the learned model. The concept of these inputs and outputs is somewhat similar to that of Neural Networks [Berwick2003] and is depicted in Figure 3.

Figure 3. SVM Classification Model

In a binary classification problem, SVM constructs a hyperplane which works as a decision boundary by dividing a set of data into two classes. The aim is to find the hyperplane that produces the maximum separation between the two classes; that is, SVM builds a margin which has the maximum distance from the nearest data points of both classes. This hyperplane is called the maximum margin hyperplane. The larger the margin, the better the generalization capability of the classifier. Generalization capability measures a classifier's ability to predict unseen data accurately, and this is how SVM satisfies the concept of SRM. It may also be stated that if the margin is not maximal, then the chance of misclassification is higher.


The data points that are nearest to the hyperplane from each class are said to be the support vectors and those are the data points that solely impact the performance of the classification model. Thus the complexity of the SVM algorithm depends on the set of support vectors rather than the entire dimension of the input space.

A. Classification with Linearly Separable Data

Two classes of data are said to be linearly separable when a straight line separating the two classes can be distinctly drawn, as shown in Figure 4. As can be seen, many hyperplanes may be drawn to separate the data into two classes (shown by dotted lines), but SVM concentrates on the maximum margin hyperplane (denoted by the solid line) for classifying the data. When the data can be linearly separated without any error or noise, it is called hard-margin classification. By error or noise we mean that there is no outlier or data point from a different class present within the boundary of one class, which implies that all the data points present within a class rightly belong to that class.

Figure 4. SVM for Linearly Separable Data


A training dataset in binary classification is represented by tuples $(x_i, y_i)$, where $x_i$ represents a training record and $y_i$ the corresponding class. X is a set of n training instances, each having M features, which help in identifying the class Y of that training instance [Nagi2010].

$X = \{x_i\}$ where $x_i \in \mathbb{R}^M$ and $i = 1, 2, \ldots, n$    (3.1.1)

And the class may be denoted by

$Y = \{y_i\}$ where $y_i \in \{-1, +1\}$ and $i = 1, 2, \ldots, n$    (3.1.2)

where -1 and +1 are the negative and positive classes respectively. The positive and negative class labels are determined based on the domain and the data to be classified.

While dealing with fraud data, positive class is the set of data which are labelled as fraud transactions whereas negative class is the set of data which are labelled as normal or legitimate transactions.

Thus the linear decision boundary is represented by a linear function f on a training example x:

$f(x) = w \cdot x + b$, $w \in \mathbb{R}^M$, $b \in \mathbb{R}$    (3.1.3)

where w is the weight vector and b is the bias term, an offset of the hyperplane from the origin, as shown in Figure 4.

Thus for positive examples, data points should satisfy $w \cdot x + b \ge +1$, and for negative examples, $w \cdot x + b \le -1$.


As illustrated in Figure 4, in order to maximize the margin, the value of $\frac{2}{\|w\|}$ needs to be maximized. [Hamel2009] stated that any maximization problem may be turned into a minimization problem. Thus we derive the convex optimization problem in the form of the following objective function [Hejazi2012]:

$g(w) = \min \frac{\|w\|^2}{2}$    (3.1.4)

subject to

$y_i (w \cdot x_i + b) \ge 1$ where $1 \le i \le n$    (3.1.5)

This ensures that data points do not fall inside the margins of the two classes. A convex optimization problem will always find a solution with the global minimum [Hamel2009], that is, a solution with the smallest value of the objective function. This may be achieved via quadratic programming.
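As a small concrete illustration of the margin maximization described by (3.1.4) and (3.1.5), the sketch below fits a linear SVM on a toy, linearly separable dataset and recovers the margin width 2/||w|| from the learned weight vector. Python and scikit-learn are used here purely for convenience (the thesis itself works with Weka/LIBSVM), and the toy data are made up.

```python
# Illustrative sketch: recover the maximum margin 2/||w|| from a linear SVM.
# scikit-learn is used here for convenience; the data below are synthetic.
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (class -1 and class +1).
rng = np.random.default_rng(1)
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(50, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([-1] * 50 + [+1] * 50)

# A large C approximates the hard-margin classifier of equations (3.1.4)-(3.1.5).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]           # weight vector w
b = clf.intercept_[0]      # bias term b
margin = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width 2/||w|| =", margin)
print("support vectors:", len(clf.support_vectors_))  # the points defining the margin
```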

The above convex minimization problem is further represented in the form of Lagrange dual problem to make the algorithm more dynamic in case of its usage in higher dimensional space. This is because in the Lagrangian formulation the training data appears only as the dot product of vectors.

The Lagrange duality is formulated in terms of the Lagrange multipliers ($\alpha_i \ge 0$) [Hejazi2012]:

$L(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{n} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right)$    (3.1.6)


The above Lagrangian objective function L has to be minimized with respect to the primal variables w and b, and maximized with respect to the dual variables α. It is easier to solve the dual Lagrangian problem than the primal problem of convex optimization. It is also found that solving the dual problem is the only way to train SVM while working in the feature spaces [Chen2005]. Setting the derivatives of L with respect to w and b to zero, we have [Hejazi2012]:

$w = \sum_{i=1}^{n} \alpha_i y_i x_i$    (3.1.7)

$\sum_{i=1}^{n} \alpha_i y_i = 0$    (3.1.8)

Reformulating the Lagrange dual problem with these derivatives and considering the constraint α ≥ 0 we obtain [Nagi2010]:

$\max_{\alpha} W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$    (3.1.9)

subject to

$\sum_{i=1}^{n} \alpha_i y_i = 0$, $\alpha_i \ge 0$, $i = 1, 2, \ldots, n$

So the objective function is in terms of $\alpha_i$ only, and if either the value of w or all of the $\alpha_i$ are known, the other is known [Law2006].

By the Karush-Kuhn-Tucker complementarity condition, we may get the following constraints from the equation $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ [Nagi2010]:

$\alpha_i \left( y_i (w \cdot x_i + b) - 1 \right) = 0$, $i = 1, 2, \ldots, n$, $\alpha_i \ge 0$    (3.1.10)


This corresponds to the specific training instances, called support vectors, which lie on the margin hyperplanes where $y_i (w \cdot x_i + b) = 1$ and $\alpha_i > 0$; when $\alpha_i = 0$, the training instances are some other data points, outside the margin, on which the decision boundary calculation does not depend [Junli2000].

Thus we may finally rewrite the decision boundary function $f(x) = w \cdot x + b$ as:

$f(x) = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$    (3.1.11)

B. Classification with Linearly Inseparable Data

Data from real-world applications are not always linearly separable, mainly due to their complexity with respect to the large number of feature vectors. When the data are not linearly separable, there is a chance that, while trying to separate them linearly, some training instances fall on the wrong side of the boundary, as shown in Figure 5 [Cherkassky2007]. Since noise or error is involved in this scenario, this is called soft-margin classification.

Figure 5. SVM with Error Function


Here the deviation of a misclassified training instance relative to its margin is represented by a slack variable $\xi_i \ge 0$, and the penalty parameter C acts as a regularization term to trade off between maximizing the margin and minimizing the misclassification error [Cherkassky2007]. Thus the quadratic optimization problem (3.1.4) may now be rewritten as [Hejazi2012]:

$g(w) = \min \frac{\|w\|^2}{2} + C \sum_{i=1}^{n} \xi_i$    (3.1.12)

subject to

$y_i (w \cdot x_i + b) \ge 1 - \xi_i$ where $1 \le i \le n$    (3.1.13)

The Lagrange dual problem in this case is as follows [Nagi2010]:

$\max_{\alpha} W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$    (3.1.14)

subject to $\sum_{i=1}^{n} \alpha_i y_i = 0$, $C \ge \alpha_i \ge 0$, $i = 1, 2, \ldots, n$    (3.1.15)

which is similar to the case of linearly separable data, except that when $\alpha_i = C$ and $\xi_i \ge 1$, the data points are misclassified.

And the decision boundary function is as follows [Hejazi2012]:

$f(x) = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$    (3.1.16)

In both the above cases a maximum margin classifier with a linear decision boundary has been considered. Another way to classify linearly inseparable data is by non-linear SVM classification, discussed hereafter. A flow diagram for the SVM method formulation is given in Figure 6.


Figure 6. Flow Diagram for SVM


C. Kernels and Parameters

Classification of linearly inseparable data, apart from linear classification with slack variables, may also be achieved via non-linear classification. When a set of data is not linearly separable, SVM transforms the data from its input space into a high-dimensional feature space with the help of kernel functions. This transformation is helpful mainly because of two factors [Law2006]:

• Non-linearity in the input space, when transformed, results in linearity in the feature space.

• The classification task becomes easier when transformed to a higher dimensional feature space.

In the higher dimensional feature space, the data may be linearly separable and a hyperplane may be constructed to create a margin between the two classes as presented in Figure 7 [Hejazi2012].

Figure 7. SVM for Linearly Inseparable Data


One of the most important features of SVM is the non-linearity it achieves through kernel functions, which map the input vectors. Looking again at the decision boundary function in equation (3.1.16), the data points appear only as a dot product $(x_i \cdot x)$:

$f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$    (3.1.17)

A feature mapping function $\Phi$ is used to map the input vectors from an input space of dimension n to a Euclidean space H of dimension d:

$\Phi : \mathbb{R}^n \to H$    (3.1.18)

Thus the decision function (3.1.17) may now be reformulated as:

$f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i \left( \Phi(x_i) \cdot \Phi(x) \right) + b \right)$    (3.1.19)

But in a high dimensional feature space it is both costly and complex to compute the dot product of feature vectors [Hamel2009], so we use a kernel function K to replace it. K is a positive definite function which satisfies Mercer's condition [Law2006]. The inner product of two data points in the kernel function usually measures the similarity between these data points by measuring the distance between them in the input space:

$K(x_i, x) = \Phi(x_i) \cdot \Phi(x)$

The kernel function K allows the inner product to be calculated in the feature space by virtue of the kernel trick, without taking into consideration the transformation by feature mapping, thus making the calculation inexpensive and easier [Hamel2009]. The decision boundary function is now:


$f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)$    (3.1.20)
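To make equation (3.1.20) concrete, the sketch below trains an RBF-kernel SVM on synthetic data and then recomputes the decision values directly from the learned dual coefficients (the products $\alpha_i y_i$), the support vectors and the bias b, checking that they match the library's own output. Python and scikit-learn are used purely as an illustration; the data and parameter values are made up.

```python
# Illustrative sketch: evaluate f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
# from the dual solution of a trained SVM. Synthetic data; scikit-learn for convenience.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.random((200, 4))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)   # a simple made-up labelling rule

gamma = 0.5
clf = SVC(kernel="rbf", C=10, gamma=gamma).fit(X, y)

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs of rows."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

X_new = rng.random((5, 4))

# dual_coef_ holds alpha_i * y_i for the support vectors; intercept_ is b.
K = rbf_kernel(clf.support_vectors_, X_new, gamma)
f_manual = clf.dual_coef_ @ K + clf.intercept_        # sum_i alpha_i y_i K(x_i, x) + b
f_library = clf.decision_function(X_new)

print(np.allclose(f_manual.ravel(), f_library))        # True: same decision values
print(np.sign(f_manual.ravel()))                       # predicted classes (+1 / -1)
```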

There are several kernel functions that are in use according to their significance in the application domain. The most common kernel functions used are given in Table 2.

Table 2. Kernels and Parameters

Kernel – Formula – Parameters

Linear – $K(x_i, x_j) = x_i \cdot x_j$ – not applicable

Polynomial – $K(x_i, x_j) = (\gamma \, x_i \cdot x_j + C)^d$ – $\gamma > 0$ (kernel parameter), $C > 0$ (cost function), $d \ge 2$ (polynomial degree)

Radial Basis Function (RBF) – $K(x_i, x_j) = \exp(-\gamma \, \|x_i - x_j\|^2)$ – $\gamma > 0$ (kernel parameter)

Sigmoid – $K(x_i, x_j) = \tanh(\gamma \, x_i \cdot x_j + C)$ – $\gamma > 0$ (kernel parameter), $C > 0$ (cost function)

C is the cost function used in kernels. The value of C when very large may lead to overfitting while the opposite may lead to under-fitting. Thus it is very important to choose the parameter values in such a way that the trade-off between these values return an efficient solution.

Overfitting is a common phenomenon which leads to an inefficient classification model if not handled properly. It happens when the complexity of the model is too high compared to the size of the training dataset, that is, when the training samples are few compared to the features, which are many and complex in nature. In that case, instead of returning a generalized model, the classification process returns one that might work well for the training dataset but not for the testing dataset. Apart from margin maximization, other ways to handle overfitting with SVM are: accumulating more training data, cross-validation, and tuning of the kernel parameters.
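As an illustration of the cross-validation and kernel parameter tuning just mentioned, the sketch below searches over the RBF-kernel parameters C and γ of Table 2 with 10-fold cross-validation. Python and scikit-learn stand in for the Weka/LIBSVM workflow used in this thesis, and the data and parameter grids are hypothetical.

```python
# Illustrative sketch: tune C and gamma of an RBF-kernel SVM by cross-validated
# grid search. scikit-learn stands in for Weka/LIBSVM; the data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.random((300, 6))
y = np.where(X.sum(axis=1) > 3.0, 1, -1)       # made-up labelling rule

param_grid = {
    "C": [0.1, 1, 10, 100],                    # trade-off between margin and training error
    "gamma": [0.01, 0.1, 0.5, 1.0],            # RBF kernel width
}

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    cv=10,                                     # 10-fold cross-validation
    scoring="roc_auc",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated AUC: %.3f" % search.best_score_)
```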


3.2 Benefits of SVM

SVM has important characteristics which help make it one of the most sought-after learning algorithms for applications involving binary classification [Gordon2004]:

• Global minimum – SVM solves a quadratic optimization problem and thus avoids issues related to local minima. SVM always finds a global minimum because for a quadratic problem the surface is a paraboloid. Thus the SVM solution is unique, optimal and global, and may be found in polynomial time.

• Structural Risk Minimization (SRM) – SVM is based on the concept of SRM, which means that this type of classifier tries to minimize the upper bound on the expected risk of misclassifying test data. This is done by selecting the simplest model that minimizes the training error [Vapnik1974]. SVM achieves this by maximizing the margin between the hyperplane and the class instances.

• Generalization capability – This is the ability of the SVM model to perform well on test or unseen data. This too is achieved by maximizing the margin between the hyperplane and the support vectors.

• Time and space complexity – For SVM, these grow with the square of the number of training samples and thus may be handled by limiting the size of the training dataset. SVM is also said to be memory efficient because its solution depends only on a subset of the data points, the support vectors.

• Limited parameters – For SVM, choosing only two parameters suffices for building the classification model.

• Adaptability – Training is relatively easy in SVM. Also, the same algorithm may solve a variety of problems with little tuning.

When compared to other machine learning methods, SVM has several benefits, as demonstrated in Table 3.

Table 3. SVM Benefits Comparison

Local minima – SVM: does not suffer from local minima since it provides a globally optimal solution [Cao2003]. Others: the local minima issue is present in Neural Networks [Suykens1999].

Noisy data – SVM: though sensitive towards noisy data, it is relatively robust compared to other classifiers, except Naive Bayes, when handling noisy imbalanced data [Seiffert2014]. Others: K-Nearest Neighbor, Decision Trees and Logistic Regression are less robust than SVM in processing increasing levels of noisy data [Seiffert2014].

Imbalanced data – SVM: performs well with moderately imbalanced data [Akbani2004]. Others: SVM outperforms classifiers like Naive Bayes and Decision Trees in handling imbalanced data [Zhang2015].

Dependency on data points (sparseness of the solution) – SVM: the optimal solution depends only on the support vectors [Cao2003]. Others: the computational complexity of Neural Networks depends on the dimensionality of the entire input space.²

Non-traditional inputs – SVM: supports non-traditional inputs, like strings and trees, other than feature vectors [Boulares2014]. Others: Decision Trees and Neural Networks operate only on feature vectors [Law2006].

Large feature space – SVM: has the potential to deal with a high dimensional feature space [Joachims1998]. Others: in the case of Neural Networks, K-Nearest Neighbors and Decision Trees, low dimensionality needs to be maintained [Spruyt2014].

2 http://www.svms.org/anns.html. “Support Vector Machines vs Artificial Neural Networks”


3.3 Performance Evaluation Metrics for Classification

Most of the well-known performance measures that are generally used in the case of a binary classification process are based on the confusion matrix presented in Table 4.

Table 4. Confusion Matrix

Actual Positive, Predicted Positive – True Positive (positive instances correctly classified as positive)

Actual Positive, Predicted Negative – False Negative (positive instances incorrectly classified as negative)

Actual Negative, Predicted Positive – False Positive (negative instances incorrectly classified as positive)

Actual Negative, Predicted Negative – True Negative (negative instances correctly classified as negative)

This matrix is prepared based on the actual number of positive and negative instances of data against the obtained numbers for both the classes as predicted by a classifier. The most commonly used measures, their definitions and formulas are listed in Table 5

[Gu2009]. The usage of each of the measures may vary according to factors like application domain [Gu2009], ML technique applied, type of data, the classification results expected and the kind of analysis to be done with the results.

Table 5. Performance Metrics and Formulas

Accuracy – The degree of right prediction of a classification model – $\frac{TP + TN}{TP + TN + FP + FN}$

Sensitivity or Recall – The probability of positive instances correctly classified as positive – $\frac{TP}{TP + FN}$

Specificity – The probability of negative instances correctly classified as negative – $\frac{TN}{TN + FP}$

Precision – The probability of instances correctly classified as positive – $\frac{TP}{TP + FP}$

F-measure – Harmonic mean of Precision and Recall – $2 \times \frac{Precision \times Recall}{Precision + Recall}$

G-means – Measures the balanced performance of an algorithm between Sensitivity and Specificity – $\sqrt{Sensitivity \times Specificity}$

Receiver Operating Characteristic (ROC) – A graphical plot to display the performance of a binary classifier as the discrimination threshold varies – plots Sensitivity vs. (1 - Specificity) across a range of values

Area under Curve (AUC) or Area under ROC (AUROC) – The probability that a classifier will rank a randomly chosen positive instance over a negative one – $P(X_1 > X_0)$, where $X_1$ = score for a positive instance and $X_0$ = score for a negative instance
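To show how the formulas of Table 5 are applied in practice, the short sketch below computes them from a hypothetical confusion matrix; it is written in Python purely as an illustration, and the counts are made up. ROC and AUC are not included because they are computed from the classifier's scores across many thresholds rather than from a single confusion matrix.

```python
# Illustrative sketch: compute the metrics of Table 5 from a made-up confusion matrix.
import math

# Hypothetical counts for a binary classifier (positive = suspicious, negative = normal).
TP, FN = 40, 10    # actual positives: correctly / incorrectly classified
FP, TN = 20, 130   # actual negatives: incorrectly / correctly classified

accuracy    = (TP + TN) / (TP + TN + FP + FN)
recall      = TP / (TP + FN)            # sensitivity
specificity = TN / (TN + FP)
precision   = TP / (TP + FP)
f_measure   = 2 * precision * recall / (precision + recall)
g_mean      = math.sqrt(recall * specificity)

print(f"Accuracy    {accuracy:.3f}")
print(f"Recall      {recall:.3f}")
print(f"Specificity {specificity:.3f}")
print(f"Precision   {precision:.3f}")
print(f"F-measure   {f_measure:.3f}")
print(f"G-mean      {g_mean:.3f}")
```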

• Accuracy – This performance measure is used to evaluate the prediction of all the classes by a classifier. It works efficiently when the data is balanced and evenly distributed with instances from all classes, in the case of binary classification both positive and negative. An incorrect prediction, that is, labeling an instance with the incorrect class, impacts the accuracy.

• Sensitivity or Recall – This metric is mainly used for predicting the minority class accurately. In fraud detection the amount of fraud data is much smaller than that of normal transactions, so it is considered the minority class. Since the minority class is the primary focus here, sensitivity is frequently applied as a performance metric for fraud detection.

• Specificity – This is used for predicting the majority class accurately and is important since the training model is built based on both majority and minority class instances. In some application areas it is required to identify the majority class accurately, that is, predicting the majority class incorrectly has a higher impact on the results than predicting the minority class incorrectly.

• Precision – It is a measure of classifier exactness which determines how many instances are correctly classified as positive among all the instances classified as positive.

• F-measure – In many real-world applications like fraud detection and medical diagnosis, where one of the classes outnumbers the other with a high ratio, a linear performance measure like accuracy does not yield good results, leading to the selection of non-linear measures like the F-measure.

• G-means – The geometric mean is also a non-linear performance metric. Its most significant property is that it is independent of the distribution of instances between classes, and its value is not impacted by differences in this distribution between the training and testing datasets. A higher rate of misclassification of positive instances denotes a higher cost.

• ROC – ROC represents the performance of a binary classifier by plotting the True Positive Rate against the False Positive Rate at various threshold points.

• AUC – AUC is used to analyze which classifier predicts the classes most efficiently. AUC is a popular metric for measuring model performance. When dealing with more than one classifier, this metric helps in comparing them and identifying the most suitable classifier.


3.4 Conclusion

In this chapter we have discussed in detail how the SVM methodology works and the factors that make SVM a preferred binary classifier over other classification techniques. Along with the benefits discussed above, some of the reasons for choosing SVM as the preferred binary classifier for our fraud detection model are:

• SVM's generalization capability ensures the effectiveness of the model when applied to unseen or new data. In fraud detection applications it is very important to detect new frauds accurately.

• SVM works well when there is a balance between the number of features and the number of training instances. In our case the combination of training data size and number of features is well suited to achieving good classification results.

• Owing to the fact that we are dealing with a moderately sized training dataset, our model is both time and space efficient with respect to SVM complexity.

• Since our data is highly imbalanced, we applied a sampling technique to balance the data, and it is usually found that SVM works well with moderately balanced data.

• SVM is good at handling noisy data, which is an important characteristic of fraud data.

The commonly used performance metrics to evaluate a classifier's performance and their applicability have also been explored in this chapter. Later we decide on which metrics to use.


Chapter 4

Related Works

Several useful works related to auction fraud detection and applications of ML techniques to the fraud detection area have been carried out in the last decade owing to the market demand. In this chapter we broadly classify our discussion into three parts:

1. Auction fraud detection and methodologies being used.

2. SVM and its application in different fraud detection problems.

3. Handling imbalanced data using SVM.

Over the years, many methodologies have been applied to detect online auction fraud, like feedback reputation systems, social network analysis and machine learning techniques [Noufidali2013]. For many years, feedback-based reputation systems have proved useful in detecting probable fraudsters, but as pointed out by [Rensick2000], these systems may not be fully trusted because feedback is not always honest and may sometimes be overshadowed by irrelevant grievances. At times, fake accounts are used to accumulate positive feedback, and users do not always provide their feedback on the system, leading to insufficient data for assessing a seller or buyer. Social Network Analysis (SNA) is a graph-based approach which sometimes requires a list of proven fraudsters to initialize the interactions and transactions between users and predict the patterns of fraudulent behaviors. In [Bangcharoensap2015] and [Khomnotai2015] the authors used SNA as an effective method to identify collusive behaviors of users. But one of the disadvantages of using SNA is that a list of proven fraudsters is a prerequisite to start building the network, which may not always be published by online auction sites owing to factors like privacy and dignity. The process itself is also very complex, and the growing network can impact both time and space. With the evolution of fraud techniques and the growing number of transactions to be dealt with, it is found that data mining techniques, involving the use of ML algorithms, are more useful in detecting illegitimate activities.

4.1 Machine Learning for Auction Fraud Detection

Extensive studies have been carried out to fight online auction crimes using ML techniques, and they have proved to be more efficient and faster than other methods. Every ML technique has its own set of strengths and weaknesses, and through studies and experiments we have tried to find out how they are relevant for auction fraud detection. In this section we discuss how different ML techniques have been implemented in the field of auction fraud detection.

In [Gupta2015] a 2-layered online hybrid model for auctions (OHMA) is designed based on the Hidden Markov Model (HMM) algorithm to both detect and prevent fraud from happening during the in-auction phase of an auction process. The model firstly aims to prevent unauthorized access to the auction system. Secondly, it aims to identify the probability of the presence of shill bidding in an auction by tracking bidding behavior using the HMM algorithm. Bidding features, like the number of bids and the bid values placed by each bidder in several past auctions, are used by the K-means clustering algorithm to understand the bidding behavior of bidders. This behavior is then compared with each of the bids placed by the bidder in the current auction. Considering each bid in the detection procedure can largely impact the system performance, and tracing only one bid cannot provide a definite picture of the user behavior.

It is vital for an online auction site to prevent shill bidding from taking place rather than merely detecting it. [Ford2012] proposed an incrementally trained feed-forward back-propagation neural network system to detect shill bidders in real time and stop them. For the experiment, a commonly auctioned item like 'PlayStation' is used. The system is effective in adapting to new shill patterns and learns fast from new data points. Feedback rating is one of the user attributes used to create clusters from past auction data. But, as we discussed earlier, feedback rating is not always a dependable factor for determining user reputation, as it can be manipulated. Another disadvantage of the system seems to be the regular adjustments needed for a set of parameters to ensure the model's effectiveness.

[Chau2005] demonstrated how Decision Trees can be successfully used in discovering potential fraudsters. Since feedback ratings do not always portray actual reputation, features related to users' buying and selling activities are used to determine their intentions. Vital features which are closely related to shill bidding behavior are employed to detect fraudulent bidders among a network of users. This method returns a precision of 82% and a recall of 83%, proving it to be useful. Along with transaction history and profile history, this experiment also requires a published list of fraudsters to build the decision tree structure. Usually online auction sites do not publish lists of detected fraudsters owing to reasons like reputation and privacy, so this can be a hurdle when applying this method in other real-world scenarios. Another method [Ku2007], combining SNA and DT, achieved a hit rate of 90% in detecting fraud in online auctions. It has been found that over time shill bidders change their behavior and adopt new tricks to deceive users. Thus it is important for a system to adapt to these new behaviors, but we know that a Decision Tree is not very suitable for incorporating new data points, as it requires complete restructuring in the case of incremental learning.

A fraud detection mechanism is a form of anomaly or outlier detection where transactions which do not match the criteria of normal behavior are considered outliers or suspicious. In [Teixeira2008] a distance-based approach using the K-nearest neighbor algorithm improves outlier detection by increasing the efficiency and robustness of the detection process. The system achieves this through a clustering and ranking heuristic applied to rank-order search. This method is found to be very effective for moderately sized and dimensioned databases but has not been tested on comparatively larger databases.

As an improvement over general rule-based moderation systems, a proactive moderation system is formulated as an optimized bounded generalized linear model for detecting auction fraud [Zhang2011]. This model introduces the concept of positive weights and imbalanced/weighted loss functions. In another paper, [Chang2012] proposes an effective but inexpensive mechanism for early fraud detection by carefully choosing a set of rich attributes and then improving the process through a complement phased modelling procedure. Apart from enhancing accuracy, this phased model also helps in reducing the computational effort since not all the attributes are equally important in every phase of an auction.

Another fast and scalable system called Netprobe [Pandit2007] was proposed to discover potential fraudsters by tracking and analyzing users' transactions with other users. Starting from a list of known fraudsters, Markov Random Field techniques help represent the problem in graphical format where the nodes are the users and the edges are the transactions. This undirected graph can interpret the likelihood of fraud by analyzing the pattern of known fraudsters. The Belief Propagation algorithm then works towards discovering the suspicious bidders following the pattern discovered in the first stage.

One of the noted works which used SVM for shill detection is by [Ochaeta2008]. Here SVM and Decision Trees have been used to analyze bidder behavior. However, this is more of an offline system which focused on learning the fraud patterns from a list of blacklisted users using feedback rating. We have already discussed the disadvantages of using feedback ratings and fraudster lists.

4.2 SVM in Fraud Detection Area

SVM has gained major popularity as a binary classifier in several fraud areas. This is mainly due to the advantages that SVM has over other classifiers and also due to the efficient performance of the classifier with simple as well as complex data. In this section we discuss the application areas where SVM has been successfully applied to detect and prevent fraud.

Credit card fraud is the most common and large-scale fraud that takes place in the e-commerce industry. Every year, card misuse and theft put many victims at risk of losing millions of dollars. [Rajak2014] proposed a hybrid method of Danger Theory and SVM to first extract abnormal transactions using Danger Theory and then apply SVM to classify them further as fraudulent. The experimental results showed that SVM returns a very high accuracy while classifying the abnormal transactions. It proved to be the best classifier for the training data and, owing to the SRM principle it follows, is also less prone to overfitting than other methods. In another paper [Zareapoor2012] SVM is compared with other machine learning techniques to show that it has better time efficiency and generalization ability. SVM achieves these through two of its important characteristics, kernel representation and margin optimization.

Another fraud area which closely rivals financial fraud is telecom fraud. According to the Communications Fraud Control Association, the 2015 Global Fraud Loss Survey report shows a loss of 38.1 billion USD annually. [Farvaresh2011] has illustrated that SVM plays an efficient role in identifying customers committing subscription fraud, which is one of the top categories of telecom fraud. SVM outperforms ANN and DT in terms of accuracy. In another type of telecom fraud, SIM box fraud [Sallehuddin2014], it is also seen that SVM returns better results than ANN in terms of the False Negative rate, meaning SVM is more successful in correctly classifying fraudulent subscribers than ANN. SVM is also found to require less building and training time as well as fewer computational resources than the other method. Among the kernel functions of SVM, RBF shows higher accuracy than the Linear and Polynomial kernels.

Two other types of financial fraud, banking fraud and insurance fraud, have also witnessed a surge in criminal activities in the last few years. Banking fraud may include activities such as money laundering [Abdelhamid2014]. The proposed hybrid approach of supervised and unsupervised methods first classifies the data into fraud and normal transactions and then performs outlier detection on the normal transactions. Binary SVM and Single-Class SVM have been combined to implement this model, which proves to be more accurate than either of the classifiers applied individually.

Another application [DeBarr2013] of a hybrid model, combining two Single-Class SVMs and a Random Forest classifier, in the area of insurance fraud yields a profit of 13.6% when applied in an auto-insurance company. The two Single-Class SVMs help in identifying the recently discovered fraudulent and legitimate claims, and the Random Forest classifier is trained with the SVM-labeled data to find fraudulent claims.

Power utility is also a popular area affected by non-technical losses due to the illegitimate activities carried out by fraudsters. In [Yap2012] SVM has been found to be more efficient than the Back-propagation Neural Network and the Online Sequential Extreme Learning Machine algorithm in terms of a high inspection hit-rate in detecting losses in the power utility sector. Due to SVM's generalization ability, it can predict the classes of new or unseen data more appropriately. In another research work [Nagi2010], customer load profiles, which are part of the historical consumption data, are analyzed and then classified using a Multi-Class C-SVM classifier into four categories depending upon the profile behavior. Grid search has been implemented in this experiment to find the optimal values for the SVM parameters. This system, when applied on real-time data, achieved an acceptable fraud detection hit-rate of around 50%.

Table 6 presents a snapshot of the comparisons done by [Zareapoor2012] and [Bhavsar2012] between some of the well-known machine learning techniques in terms of measurement parameters like learning and classification speed, accuracy and cost. These comparisons were based on experiments related to the credit card fraud area.

Table 6. Comparison of SVM and Other ML Techniques

Methods                    | Learning Speed | Classification Speed | Performance | Noise Tolerance
Hidden Markov Model        | Not so Fast    | Fast                 | Low         | Low
Artificial Neural Network  | Not so Fast    | Very Fast            | Medium      | Medium
Decision Tree              | Fast           | Fast                 | Medium      | Medium
Bayesian Network           | Very Fast      | Very Fast            | Low         | High
K-Nearest Neighbor         | Very Fast      | Not so Fast          | Medium      | Low
Support Vector Machine     | Not so Fast    | Very Fast            | High        | Medium

It can be observed from the above table that SVM has the highest performance and a very fast classification speed, but it lags in factors like learning speed and cost. Learning speed is not always a major concern for application areas where the learning and training process is done offline. In the case of fraud detection, however, classification accuracy and classification speed are of utmost importance, as both play crucial roles in detecting fraudsters correctly and without delay to avoid losses due to victimization. Thus SVM is a preferred classifier in the fraud detection area.


4.3 SVM and Imbalanced Data

The handling of imbalanced datasets, that is, datasets where positive instances occur far less often than negative instances, has been a constant area of study in many real-world applications, for example fraud detection, intrusion detection, medical diagnosis and text classification. SVM has proved to be a very efficient classifier which performs well under this kind of imbalanced situation.

SVM may handle this by assigning a different misclassification cost to each class. The total misclassification cost $C\sum_{i=1}^{m} \xi_i$ in equation 3.1.12 is replaced by two terms [Ben-Hur2010]:

$$C\sum_{i=1}^{m} \xi_i \;\longrightarrow\; C_{+}\sum_{i \in I_{+}} \xi_i + C_{-}\sum_{i \in I_{-}} \xi_i$$

where $C_{+}$/$C_{-}$ is the soft-margin constant for positive/negative instances, and $I_{+}$/$I_{-}$ are the sets of positive/negative instances. The total penalty for each class should be equal.
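To make the equal-penalty idea concrete, the short sketch below (an illustration added here, not part of the thesis implementation) picks the two soft-margin constants so that C+ multiplied by the number of positive instances equals C- multiplied by the number of negative instances; the class counts and the base cost are hypothetical example values.

// Illustrative sketch: choose per-class soft-margin constants so that
// C+ * (number of positives) equals C- * (number of negatives).
// The counts and the base cost are hypothetical example values.
public class ClassCostExample {
    public static void main(String[] args) {
        int nPositive = 100;     // minority ("Suspicious") instances
        int nNegative = 1000;    // majority ("Normal") instances
        double cMinus = 1.0;     // base cost assigned to the majority class

        // Equal total penalty per class: C+ * n+ = C- * n-
        double cPlus = cMinus * nNegative / nPositive;

        System.out.printf("C+ = %.1f, C- = %.1f%n", cPlus, cMinus); // prints C+ = 10.0, C- = 1.0
    }
}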

The paper [Japkowicz2002] illustrates that SVM may produce robust classification results when applied on imbalanced data. According to [Akbani2004], SVM works well with moderately imbalanced data. The authors regard SVM as a preferred machine learning algorithm for imbalanced datasets owing to its strong theoretical foundation and learning capability, as it concentrates on the limited set of instances closest to the boundary (the support vectors) and not on those which are far away.

As per [Koknar-Tezel2009, He2009], it is important to sample any highly imbalanced dataset before applying a classifier. In [Seiffert2014] it is found that, once sampling is applied, SVM outperforms major classifiers like k-Nearest Neighbor, Decision Tree and Logistic Regression when dealing with noisy and imbalanced data. In [Al-Shahib2005] and [Zhang2015], SVM has been found to outperform two popular classifiers, Naive Bayes and the C4.5 Decision Tree, on imbalanced datasets. In [Zhang2015], a comparison of these three popular classifiers has been carried out on a set of five datasets with varying imbalance ratios. It has been found that SVM outperforms the other two classifiers in terms of identifying the minority class once a certain level of sampling is applied to the datasets. The authors used metrics like Sensitivity, Specificity and G-means to measure the performance, as accuracy is not a good performance measure when working with highly imbalanced data [He2009]. This is because of the following facts:

• Sensitivity provides the accuracy of the minority class, which is the main target when classifying imbalanced data.

• Specificity is useful when the majority class is of more importance.

• G-means is a chosen metric for classifications related to imbalanced data when both classes are of equal importance.

In [Zhang2015] it has also been found that, for the larger datasets, the RBF kernel performs faster than the Linear and Polynomial kernels. It is also observed that SVM returns the most accurate results when one of the classes is considered more important.

4.4 Conclusion

In this chapter, several empirical studies related to our work have been explored. Some of the features that make our work different from most auction fraud related works are:

• In most of the auction fraud related papers, reputation or feedback rating is used as a classification feature, which is not very reliable as discussed earlier.

• Many systems require a list of proven fraudsters to initialize the learning process. This information is not always available from the auction sites due to privacy issues.

On the other hand, SVM has proved to be a very effective classification method in other fraud areas, as revealed in the related works. Along with the points discussed earlier in section 3.2, we choose SVM also for the following reasons:

• SVM performs better than other classifiers while working with fraud data.

• SVM, when applied on a sampled imbalanced dataset, works better than other classifiers.

• It is very important to build a real-time auction fraud detection model with a fast response time, which ensures that immediate action can be taken to minimize the loss due to shill bidding. SVM is very fast with respect to classification speed.

We have also discussed the contribution of SVM in handling imbalanced data such as fraud data.


Chapter 5

Overview of In-Auction Fraud Detection Model

We propose a fraud detection model, In-Auction Fraud Detection (IAFD), which enables online auction companies to differentiate between legitimate and illegitimate bidders, in our case shill bidders, based on the behavior they demonstrate while bidding in an auction. Our model prevents legitimate bidders from buying an over-priced item by detecting the shill bidders who try to raise the item price.

This model is launched at the end of the bidding period and before the payment is made to ensure that shill-affected auctions are not finalized. If any bidder in an auction is found to be behaving suspiciously, necessary offline action is taken to verify the findings; once they are confirmed, it is up to the auction house to decide whether to finalize the auction and process payment or to cancel the entire auction and inform the other legitimate bidders about the findings. The manual verification and action are out of the scope of our work.

5.1 Phasing of the Model

Different online auction companies have different policies and strategies for auctioning an item through their site. Though the policies can differ, the basic framework is the same for English auctions. Also, the transactional records are formed of a similar set of standard attributes, like bidder, seller and product information, along with important attributes like bid amount, bid time and starting price. Depending on the policies, there can be some additional attributes like a reserve price. Our model deals with the set of attributes which are common and can be found in the transactional records of almost all online auction sites. The IAFD model consists of two major phases, as depicted in Figure 8:

A. Offline Phase – This includes data collection, pre-processing, transformation and building the model.

B. Online Phase – In this phase runtime auction data is fed into the model to test the model.

Figure 8. In-Auction Fraud Detection Model


As visible in Figure 8, the offline phase involves various stages like data collection, data processing and model building. In the data collection stage, in order to train the SVM model, we collect offline auction data from the auction site or any trusted source. This data collection is made keeping in mind a few factors like product type and price, behavioral details of the bidders and sellers involved in these auctions, and the auction execution period. After collecting the data, it is processed and made ready for training the SVM model, as discussed in detail in chapters 6 and 7. The model building process includes both training and validating the model through cross-validation. The IAFD model is the output of this offline phase.

This model is then applied to the online auction data as part of the online phase. As demonstrated earlier in Figure 2, an online auction can be divided into four overall stages, where stage one, user registration, is a one-time task, whereas the other three stages of setting up an auction, bidding activity and finalizing the auction are common every time an auction takes place. Our model comes into effect at the end of the bidding period and before the auction is finalized and payment made, as shown in Figure 8. Once all the bid transaction details are available from an auction, bidder-wise fraud features are calculated for all the participants of the auction. These features, in the form of testing data whose format is the same as that of the training data, are then used as input to the IAFD model, which determines whether a bidder behaved normally or suspiciously in the auction. A "Suspicious" bidder is a bidder who performs abnormal activities with respect to shilling behaviors, and a "Normal" bidder is someone who performs bidding activities as expected and does not practice the shill bidding approach. If any bidder is classified as suspicious by the SVM classification process, further manual investigation is performed on that bidder for verification purposes. Instead of directly labelling bidders as "Fraud", the IAFD system labels them as "Suspicious", as the finding requires confirmation before cancelling any real-time auction. Thus, instead of jumping to any conclusion, we prefer sending the suspicious bidder details for further manual investigation to avoid a misclassification in the finalization stage, which might result in financial and time loss.
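As an illustration of this online step, the sketch below (not the thesis code) loads a previously trained model and classifies each bidder of a just-ended auction; the file names, and the assumption that the class attribute is the last one with the values "Normal" and "Suspicious", are hypothetical.

// Sketch of the online phase: load a trained model and classify each bidder of a
// finished auction as "Normal" or "Suspicious". File names are hypothetical.
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifyAuctionBidders {
    public static void main(String[] args) throws Exception {
        // Model previously saved during the offline phase (e.g. as a Weka .model file).
        Classifier model = (Classifier) SerializationHelper.read("iafd.model");

        // One row per bidder of the just-ended auction, in the same format as the training data.
        Instances bidders = DataSource.read("auction_bidders.arff");
        bidders.setClassIndex(bidders.numAttributes() - 1);

        for (int i = 0; i < bidders.numInstances(); i++) {
            double label = model.classifyInstance(bidders.instance(i));
            String decision = bidders.classAttribute().value((int) label);
            System.out.println("Bidder " + i + " -> " + decision); // "Normal" or "Suspicious"
        }
    }
}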

5.2 SVM and the Model

The IAFD model is built based on the SVM binary classification technique. SVM, with its efficient classification capability, helps in determining the two classes of bidders: "Normal", the negative class, and "Suspicious", the positive class. The two main phases where SVM is involved are given below:

A. Building the Model – As discussed in chapter 3, SVM is a supervised ML technique, which means it requires a set of training data from which the model learns how to classify the data. Training data is a set of already classified instances which is used to train the model. A training instance is composed of two parts, as shown in Figure 9: first, a set of features, Feature 1 to Feature n, and second, the class or label, like Label A or Label B. In each training instance, the values of the features are derived from the application's raw data, which usually represents a set of transactions, and the label denotes which class the instance belongs to depending on the values of the feature set. Once these training instances are provided to the SVM algorithm, it tries to determine the relationship between the set of features and the label, which is the decision boundary function that separates one class of data from another. Thus the output of this training phase is the decision function, as demonstrated in Figure 9. In our case, the SVM algorithm aims at obtaining the decision function which separates the class of normal bidders from the class of suspicious bidders. Our model uses a set of fraud patterns as the feature set, discussed in detail in chapter 6, to identify a bidder as normal or suspicious.

Figure 9. Training SVM Model

B. Testing the Model – Once the decision function is learnt from the training data, it is used on the testing data to determine their labels. A testing instance is an unclassified instance which has only the set of feature values but is not labelled. The SVM model, with the help of the decision function and the set of feature values, determines the label of the testing instance. Thus for this phase the decision function and the set of feature values are the input and the label is the output, as demonstrated in Figure 10.


Figure 10. SVM Testing Model

5.3 An Integrated SVM Tool for IAFD

In this section we discuss the software, tools and library packages that have been used to carry out the experiments. Several software packages have been used for the data processing and preparation phase as well as for managing input/output files, as shown in Table 7. We have used different open-source systems for developing our model, keeping in mind the cost, maintainability and scalability of the overall system. The open-source relational database management system MySQL and the Java programming language are extensively used for the fraud pattern modelling task.

Table 7. Environment and Software

Type                      | Name                     | Version
Database Server           | MySQL                    | 5.0.24a
DB Administration – GUI   | Navicat for MySQL        | 11.1.13 – Enterprise Edition
Programming Software      | Eclipse IDE              | Mars 5.0
Data Processing Software  | Microsoft Office – Excel | Office 365

The main tool used for the training and classification purposes of the model is the Java library suite developed by the University of Waikato called the Waikato Environment for Knowledge Analysis, or Weka [Witten1999]. Weka is data mining software developed in Java which includes a set of ML algorithms for various tasks like classification, clustering, regression and visualization. Weka not only has its own set of ML algorithms but also supports other extensible library packages that can be integrated with the system. LIBSVM is one such library, with the SVM algorithm implemented in C++. It has been developed and updated by Chang and Lin since the year 2000 [Chang2011]. Though Weka has its own SVM implementation, LIBSVM is the most widely used, easy-to-integrate and efficient library for SVM. LIBSVM is also found to be much faster than the SVM package embedded within the Weka tool [Hsu2003]. The LIBSVM library is downloaded as a separate package and integrated with Weka on our Windows machine. It is important to configure the 'CLASSPATH' variable properly while integrating Weka and LIBSVM. The versions used for these tools are given in Table 8.

Table 8. Data Mining Tool and Library

Name   | Version | Description
Weka   | 3.6     | Data mining software in Java which includes several data mining algorithms and tasks
LIBSVM | 3.20    | A library with a C++ implementation of the SVM algorithm
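As a quick way to confirm the integration, the small sketch below (not part of the thesis) checks that both the Weka LibSVM wrapper class and the core LIBSVM classes are reachable on the CLASSPATH; the class names follow the Weka 3.6 / LibSVM wrapper conventions and may differ in other versions.

// Sanity check that the LibSVM wrapper and the LIBSVM library are on the CLASSPATH.
// The class names are assumptions based on the Weka 3.6 LibSVM integration.
public class CheckLibSVMIntegration {
    public static void main(String[] args) {
        try {
            Class.forName("weka.classifiers.functions.LibSVM"); // Weka wrapper class
            Class.forName("libsvm.svm");                        // core LIBSVM class
            System.out.println("Weka LibSVM wrapper and LIBSVM library found.");
        } catch (ClassNotFoundException e) {
            System.out.println("Missing on CLASSPATH: " + e.getMessage());
        }
    }
}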

As shown in Figure 11, the screenshot of the Weka tool shows its different utilities highlighted by four colored boxes: red, green, yellow and purple. The red box highlights the different tabs that are available in Weka for performing different types of data mining tasks; in this work we mainly utilize the Preprocess, Cluster, Classify and Visualize tabs. The green box highlights the details of the input file which has been selected for classification and other processing, such as the number of instances, the attributes and the names of the attributes. The yellow box provides details about each of the attributes listed in the green box and selected for viewing, while the purple box demonstrates the graphical visualization of the attribute value range.

Figure 11. Weka Tool GUI

In Figure 12 we can see a snapshot of the LIBSVM package once it is integrated with the Weka tool. There are several parameters which need to be adjusted as per the classification requirements, and the results of the classification are presented in the white box on the right-hand side, named Classifier output, in terms of the major performance metrics like Precision, Recall, F-measure and AUC. Weka also provides options to save the experiment results, such as saving the model as a .model file, saving a test output file or visualizing the ROC curve as a graph.


Figure 12. LIBSVM Package in Weka

We have also used the clustering function in Weka for partitioning the data and the visualization function for obtaining a graphical measurement of the classifier performance, which we will discuss later.

5.4 Conclusion

This chapter explains in detail the theoretical foundation of the IAFD model. There are mainly two phases, the offline training phase and the online testing phase. We have discussed how the functioning of the two phases, with the help of the SVM algorithm, helps in identifying the class labels, "Normal" and "Suspicious", of the unknown test instances using the known training instances. We have also described the Weka tool environment and its several built-in features that have been utilized to facilitate different classification requirements like pre-processing, clustering, classification and performance evaluation.


Chapter 6

Building Training Dataset for IAFD

For any supervised algorithm, constructing an effective training dataset is the most important task, as it determines how efficiently the classification model learns from the known (training) instances and how accurately it predicts the class of the unknown (testing) instances. In order to obtain the best detection results using the SVM classifier, we need to build a high-quality training dataset for our model. Figure 13 shows the different stages that are followed to construct the training dataset for the IAFD model.

Figure 13. Training Dataset Construction


The different phases are data collection, data pre-processing and, finally, training set construction. Data collection, pre-processing and part of the training set configuration are done outside the Weka tool, while data clustering and sampling are done in the tool itself using its built-in algorithms. The input of the construction phase is the raw auction dataset retrieved from appropriate sources, and the output of this entire phase is the complete training dataset, which is then used by the SVM classifier to train and build the SVM model. This is part of the offline phase, which finally leads to building the SVM detection model.

6.1 Selection of Classification Features

As part of the data collection process, raw auction data is first collected from a trusted source and then analysed to check whether it may be utilized to compute features for the SVM classification process. Feature selection is one of the major challenges in machine learning; the selection of the most relevant set of features ensures the correct working of the classification model. In auction data, basic transactional attributes like bidder information, bid amount, bid time, seller and product information are not always enough to train a shill detection model like IAFD. Thus we have come up with a list of shill bidding patterns as part of feature selection [Sadaoui2016]. These fraud patterns are categorized into three types based on the entity they relate to: Buyer Property, Auction Property and Bid Property, as shown in Figure 14.


Figure 14. Fraud Pattern Classification

There are in total ten fraud patterns, among which two, Buyer Tendency and Winning Ratio, belong to the Buyer Property category as they represent information specific to a bidder. Another two, Auction Bids and Starting Average Price, are part of the Auction Property category as they exhibit details about certain auction-related figures. The remaining six patterns belong to the Bid Property category as they represent information about the bidding behavior of a bidder. The descriptions of the ten shill bidding patterns are as follows:

• Early Bidding (EB) – Shill bidders often have a tendency to compete early in the auction to attract legitimate bidders to participate in the auction.

• Buyer Tendency (BT) – This pattern denotes the tendency of a buyer to participate in auctions of the same seller over time. For shill bidders, this value is usually high as most of the time they are accomplices of the seller.

• Bidding Ratio (BR) – Shill bidders bid more frequently with a motive to raise the price. This increases the bidding ratio for the malicious bidder: the higher the number of bids submitted by a bidder, the higher the Bidding Ratio.

• Starting Price (SP) – Sellers often try to attract the attention of bidders by offering a very low starting price. The chances of shilling in this type of auction are higher as the seller will also try to inflate the price quickly to gain maximum profit.

• Nibble Bidding (NB) – Sometimes a bidder submits consecutive bids without outbidding the previous bids, only to encourage other bidders to participate aggressively. This indicates the presence of shill bidding when that bidder is not the winner.

• Last Bidding (LB) – Shill bidders always avoid winning the auction by becoming inactive during the last stage of the auction. Thus their last bidding time is always far from the end time of the auction.

• Winning Ratio (WR) – In spite of aggressive bidding, the winning ratio is very low for shill bidders as they do not intend to win the auction. This is calculated with respect to the participation history of the bidder.

• Auction Bids (AB) – This pattern compares the number of bids in a particular auction to the average number of bids in all the auctions. Auctions with shilling have a higher number of bids than auctions without shilling for the same product type.

• Successive Outbidding (SO) – In order to raise the price quickly, a shill bidder successively outbids himself even when he is the highest bidder. Also, the bid increments are very low when shill bidders are bidding successively. This differs from nibble bidding in that here the immediate bids are higher than the previous bid, which is not the case in nibble bidding.

• Bid Increment Average (BI) – Shill bidders tend to bid immediately after a legitimate bid with a minimal hike, so the increment average is very low for a dishonest bidder. This pattern compares the increment average of a bidder to the overall average in other auctions of the same product type.

Based on the shill behaviors discussed in Table 1, these fraud patterns are designed and formulated. This collection of ten shill bidding patterns is deployed for classifying bidders into the aforementioned two classes, "Normal" and "Suspicious". The combination of the values of these ten shill bidding patterns influences the classification decision, which means that the classification label is not dependent on any single pattern. If the decision were made from a single pattern's perspective, the classification would be erroneous, given that normal bidders who have an intention to win an auction may often exhibit one of these patterns or behaviors individually. For example, a normal bidder with an intention to buy a product may bid more often in an auction than others, raising the 'Bidding Ratio' value; thus a high value of any one of these patterns does not always denote fraud. Therefore, identifying a suspicious bidder requires analysing the combination of the shill bidding patterns. These ten fraud patterns are calculated from the raw data extracted from online auction transactions.

We measure most of these fraud patterns based on the formulas found in [Sadaoui2016]. However, the fraud pattern 'Nibble Bidding' is introduced in this work. Nibble bidding, also termed nibbling, as mentioned on the eBay website [eBay2008], is one of the important characteristics of shill bidders. In Table 9 all the fraud patterns along with their formulas are listed.


Table 9. Fraud Pattern Formulas

Early Bidding (EB):
    earlyBidding(uj, ai) = 1 - (firstBidTime(uj, ai) - startTime(ai)) / duration(ai)
    where firstBidTime = the time at which the bidder uj submitted one's first bid; startTime = start time of the auction ai, which is always 0; duration = duration of the auction ai in number of days.

Buyer Tendency (BT):
    if |auctionPart(uj)| > 1 then buyerTendency(uj, sk) = |auctionSeller(uj, sk)| / |auctionPart(uj)|
    else buyerTendency(uj, sk) = 0
    where auctionPart = total number of auctions in which bidder uj participated; auctionSeller = total number of auctions by a particular seller sk in which bidder uj participated.

Bidding Ratio (BR):
    biddingRatio(uj, ai) = totalUserBids(uj, ai) / totalBids(ai)
    where totalUserBids = total number of bids submitted by bidder uj in an auction ai; totalBids = total number of bids in the auction ai.

Start Price Average (SP):
    if auctionStartPrice(ai) < startPriceAvg(a) x 0.5 then startPrice(ai) = 1 - |auctionStartPrice(ai)| / |startPriceAvg(a)|
    else startPrice(ai) = 0
    where auctionStartPrice = start price of the particular auction ai; startPriceAvg = the average starting price in other auctions selling the same kind of product.

Nibble Bidding (NB):
    if bidder(uj) ≠ winner(ai) and successiveBid(uj, ai) > 3 then nibbleBid(uj, ai) = 1
    else nibbleBid(uj, ai) = 0
    where successiveBid = number of successive bids submitted by a bidder uj in an auction ai.

Last Bidding (LB):
    lastBidding(uj, ai) = (endTime(ai) - lastBidTime(uj, ai)) / duration(ai)
    where endTime = end time of the auction ai; lastBidTime = time at which the bidder uj submitted one's last bid; duration = duration of the auction ai in number of days.

Winning Ratio (WR):
    winningRatio(uj) = 1 - |auctionWon(uj)| / |auctionPartHigh(uj)|
    where auctionPartHigh(uj) = {ai | biddingRatio(uj, ai) > 0.05}; auctionWon = number of auctions won by bidder uj; auctionPartHigh = number of auctions in which the bidder's bidding ratio exceeds this threshold.

Auction Bids (AB):
    if (averageBids(a)
    where totalBids = total number of bids in an auction ai.

Successive Outbidding (SO):
    succOutBid(uj, ai) = 0;
    if succBid(uj, ai, 3) then succOutBid(uj, ai) = 1;
    else if succBid(uj, ai, 2) then succOutBid(uj, ai) = 0.5
    where succBid = number of successive bids by a bidder uj in an auction ai.

Bid Increment Average (BI):
    if IncrAvg(uj, ai) < auctionIncrAvg(ai) then bidIncrAvg(uj, ai) = 1
    else bidIncrAvg(uj, ai) = 0
    where IncrAvg = the average increment in bid amounts for a bidder uj in a particular auction ai; auctionIncrAvg = the average of the bid increments in other auctions of the same product type.


These fraud patterns are calculated against each bidder participating in an auction, which means that each row in the training set represents a bidder in a particular auction along with one’s derived fraud patterns and the class to which the bidder belongs based on the fraud pattern values.
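To make the per-bidder computation concrete, the following sketch (an illustration added here, not the thesis implementation) computes two of the simpler Table 9 patterns, Bidding Ratio and Early Bidding, from a list of bid records; the Bid class, field names and sample values are hypothetical, while the formulas follow Table 9.

// Sketch of computing two Table 9 patterns for one bidder in one auction.
// The Bid record and its fields are hypothetical; the formulas follow Table 9.
import java.util.List;

public class FraudPatternExample {

    // A minimal, hypothetical bid record: who bid, and when (in days from auction start).
    static class Bid {
        String bidderId;
        double bidTimeDays;
        Bid(String bidderId, double bidTimeDays) {
            this.bidderId = bidderId;
            this.bidTimeDays = bidTimeDays;
        }
    }

    // Bidding Ratio (BR) = totalUserBids(uj, ai) / totalBids(ai)
    static double biddingRatio(List<Bid> auctionBids, String bidderId) {
        long userBids = auctionBids.stream().filter(b -> b.bidderId.equals(bidderId)).count();
        return (double) userBids / auctionBids.size();
    }

    // Early Bidding (EB) = 1 - (firstBidTime - startTime) / duration, with startTime = 0
    static double earlyBidding(List<Bid> auctionBids, String bidderId, double durationDays) {
        double firstBidTime = auctionBids.stream()
                .filter(b -> b.bidderId.equals(bidderId))
                .mapToDouble(b -> b.bidTimeDays)
                .min().orElse(durationDays);
        return 1.0 - firstBidTime / durationDays;
    }

    public static void main(String[] args) {
        List<Bid> bids = List.of(
                new Bid("u1", 0.2), new Bid("u2", 0.5),
                new Bid("u1", 0.6), new Bid("u1", 1.1), new Bid("u3", 6.5));
        System.out.println("BR(u1) = " + biddingRatio(bids, "u1"));      // 3 of 5 bids = 0.6
        System.out.println("EB(u1) = " + earlyBidding(bids, "u1", 7.0)); // first bid on day 0.2 of 7
    }
}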

6.2 Construction of Training Dataset

As explained earlier, a training instance is a tuple of two elements: a feature vector, which describes the data in terms of a set of input characteristics, and a label, which denotes the class to which that row of data belongs. These feature vectors may be raw data extracted directly from the application or, as in our case, features derived from the raw data which characterize the behavior of a row of data.

Data pre-processing, the process of converting raw data into a set of usable feature vectors, is the next step after the data is collected and analyzed and the fraud features are finalized. This transformation phase may include several sub-phases like data cleansing, converting the raw data into usable features, data formatting and, finally, finalizing the training dataset by eliminating or modifying irrelevant information.

A. Data cleansing – This is done as part of analysing the raw data and deciding which attributes to use for training purposes. It is also an important step for removing any kind of noise from the data, like redundant, incomplete or bad data, which may affect the classification process.

B. Classification feature computation – The fraud pattern values are calculated in this phase and utilized as the classification features for training the model. Though the feature vectors are calculated in this step, the labelling is done later. In this stage we calculate the fraud patterns for each of the bidders in each of the auctions using the formulas specified earlier, with the cleaned raw data as input. Auction data collected from online sources is usually formed of bid transaction details, like bid amount and bid time, along with some bidder, seller and product details. Our fraud patterns are calculated from these transactional details as well as some overall user and product behavioral data computed from a set of auctions of the same product type.

C. Data transformation – Every tool used for data mining and other data processing functions has its own set of pre-defined formats in which it accepts input files for processing, which can also vary with version changes. Even the data types supported by different tools may vary, and thus the data requires some formatting before it can be utilized for the final input file. An example input file is sketched below.

D. Training set finalization – This phase involves the task of modifying or removing any information from the training set which might bias the final SVM classifier. It differs from data formatting in that it is done at the end of the data transformation phase, to ensure that modifying or removing information at this stage does not affect the data pre-processing phase.
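As an illustration of the tool-specific input format mentioned in item C, the excerpt below shows what a training file could look like in Weka's ARFF format; the attribute names mirror the ten fraud patterns, and the two data rows are hypothetical placeholder values, not actual thesis data.

% Hypothetical excerpt of an IAFD training file in Weka's ARFF format.
@relation iafd_training

@attribute BT numeric
@attribute BR numeric
@attribute EB numeric
@attribute SP numeric
@attribute NB numeric
@attribute LB numeric
@attribute WR numeric
@attribute AB numeric
@attribute SO numeric
@attribute BI numeric
@attribute class {Normal,Suspicious}

@data
0.10,0.15,0.60,0,0,0.40,0.20,0,0,0,Normal
0.85,0.55,0.90,1,1,0.75,0.95,1,1,1,Suspicious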

Another intermediate task, once the fraud patterns are finalized, is assigning a weight to each of the fraud patterns. Weight assignment is essential to measure the impact of each pattern on the label, which finally decides whether a bidder is normal or suspicious. As specified earlier, these shill bidding behaviors are sometimes similar to normal bidding behavior, making it difficult to differentiate between normal and suspicious bidders. For example, a seller sometimes offers a low starting price (SP) to attract the attention of users only to sell the product easily. Also, there can be bidders who bid aggressively (BR) in an auction with a motive only to win an item by outbidding others. In these scenarios, neither of these two patterns individually implies that shill bidding is taking place in the auction. In order to avoid this ambiguity, we assign low weights to those patterns that have a higher probability of being similar to normal bidding behaviors, whereas fraud patterns like NB and WR strongly suggest shilling, indicating that some bidders, in spite of competing rigorously, hardly ever win auctions. We categorize the fraud patterns into three weight levels, low, medium and high, as shown in Table 10.

Table 10. Fraud Patterns by their Weights

Low Weight: Early Bidding (EB), Bidding Ratio (BR), Auction Bids (AB), Starting Price Average (SP)
Medium Weight: Buyer Tendency (BT), Bid Increment Average (BI), Last Bidding (LB)
High Weight: Successive Outbidding (SO), Nibble Bidding (NB), Winning Ratio (WR)

As discussed earlier, while calculating these fraud patterns both the current auction transactions and the bidders' behavioral details are taken into consideration. For example, to calculate the BT of a bidder we require details like how many times the bidder has participated in auctions of that particular seller in the past, along with the seller and bidder information from the current auction. There are mainly three categories of behavioral data we require:


• Bidder – Against each bidder we may require behavioral details like how many auctions that bidder has participated in, how many auctions one has won, the average number of bids submitted by the bidder in past auctions, and how many auctions of a particular seller the bidder has participated in.

• Seller – For each seller we may need to store details like how many auctions have been set up by that particular seller and how many times a bidder has participated in that seller's auctions.

• Product – Some product-related details are required to understand the bidding trend for a particular product type and measure the deviation shown by a suspicious auction, if any. For a particular product type, details like the average starting price, the average number of bids and the average number of bidders are necessary to build the buying trend for that product type.

The overall transaction data is gathered per specific product category, since this behavioral data changes with respect to the product but usually remains constant for the same product type.

6.3 Labeling Data through Clustering

The most critical part of constructing the training set is to label the data if they are not already labelled. It is always preferable to use already labelled data, but such data is equally hard to find. In the case of shill bidding, as it is very hard to detect, it is extremely rare to find labelled fraudulent data from commercial auction sites or other sources. Thus we adopt a common unsupervised approach for labelling the auction data. Clustering is the most common unsupervised method of labelling data used in the fraud detection area [Phua2010, Sabau2012]. There are many types of clustering processes, each with its own set of pros and cons. Here we consider some of the techniques widely used for clustering, namely K-means, EM (Expectation Maximization), Self-Organizing Map (SOM), X-means and Hierarchical clustering. We compare these methods [Abbas2008, Do2008, Pang2003] in Table 11 to find the one which best suits our model and then apply it for labelling the training data.

Table 11. Comparison of Clustering Techniques

Method       | Number of Clusters (k)                                                     | Dataset Size                                                | Optimal Solution?                                                 | Accuracy (Quality) | Complexity (Computational)
K-means      | Value of k to be pre-specified                                             | Works well with huge dataset                                | Suffers from local minima issue                                   | Low                | Low
EM           | Value of k to be pre-specified                                             | Works well with huge dataset                                | Robust to local minima but converges to local maxima              | Low                | High
SOM          | Value of k to be pre-specified                                             | Works well with large dataset and requires sufficient data  | Generates sub-optimal solution and depends on certain parameters  | High               | Low
X-means      | Need to specify the minimum and maximum number of clusters and iterations | Works well with huge dataset                                | Partially resolves local minima issue                             | Medium             | High
Hierarchical | Number of clusters determined at run-time by the algorithm                | Works better with datasets of average size                  | Produces sub-optimal solution                                     | High               | High


We choose hierarchical clustering owing to the following reasons:

• Hierarchical clustering does not need the number of clusters to be specified. In our case, instead of hard-clustering the data into two clusters, we wanted the algorithm to create the clusters for us; we then labelled each of the clusters as "Normal" or "Suspicious" based on its properties.

• Since our dataset is moderate in size, we prefer hierarchical clustering as it works faster and has higher accuracy with respect to cluster quality.

• In the past, several studies [Ford2012, Sabau2012] successfully used this clustering technique for grouping fraud data. This method is found to produce better-quality clusters than others.

Though the computational complexity of hierarchical clustering is high, given that our dataset is not very large and the offline training phase is a one-time operation, this will not have a major impact on our system.

As a result of clustering our dataset, we generate several partitions which are methodically labelled as "Normal" or "Suspicious". A thorough analysis of each of the clusters is necessary to determine their labels.
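A minimal sketch of this step with Weka's HierarchicalClusterer is given below (an illustration, not the thesis code); the input file name is hypothetical, and the centroid link type follows the parameter choice listed later in Table 12, while other options are left at their defaults.

// Sketch: cluster the unlabelled fraud-pattern data so the resulting clusters can be
// analysed manually and labelled as "Normal" or "Suspicious". File name is hypothetical.
import weka.clusterers.HierarchicalClusterer;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class ClusterForLabelling {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("fraud_patterns_unlabelled.arff"); // hypothetical file

        HierarchicalClusterer clusterer = new HierarchicalClusterer();
        clusterer.setOptions(Utils.splitOptions("-L CENTROID")); // centroid linkage, as in Table 12
        clusterer.buildClusterer(data);

        // Count how many bidders fall into each cluster; the clusters are then
        // inspected and assigned the "Normal" or "Suspicious" label.
        int[] counts = new int[clusterer.numberOfClusters()];
        for (int i = 0; i < data.numInstances(); i++) {
            counts[clusterer.clusterInstance(data.instance(i))]++;
        }
        for (int c = 0; c < counts.length; c++) {
            System.out.println("Cluster " + c + ": " + counts[c] + " bidders");
        }
    }
}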

6.4 Balancing Data through Sampling

Imbalanced data, as discussed earlier, is a scenario in which the number of instances in one class outnumbers the other class by a huge margin. In any real-world fraud detection scenario, the number of negative instances, that is, legitimate records or normal transactions, far outnumbers the number of positive instances, that is, illegitimate records or fraud transactions. Sometimes illegitimate records are removed from the transaction database, which makes it even more difficult for researchers to analyse the fraud patterns. In the fraud detection domain, the ratio of negative to positive instances may be as high as 100:1 [Provost2001]. In our case, the number of negative or "Normal" instances hugely outnumbers the positive or "Suspicious" instances. This is because fraud records are usually not easily available from online auction sites: either the fraudsters are blocked from the site or the fraudulent transactions are removed. Synthetic data is also not preferred for research owing to its dissimilarity with real auction data [Tsang2012]. However, as we know, a highly imbalanced dataset will yield incorrect classification results, as discussed in section 4.3. To deal with this issue, we maintain a ratio of 2:1 between normal and suspicious instances, as used in other research papers [ChangW2011].

As already discussed in section 4.3, though SVM performs well with moderately imbalanced data, it is not very effective in the case of highly imbalanced data, which therefore requires some processing. There are several ways to address this problem, among which are the following:

• Sampling the data, that is, over-sampling the minority class, under-sampling the majority class, or both. This process is also called resampling.

• Assigning weights or costs to the classes, i.e. assigning a higher weight or cost to the positive class, which has fewer instances than the negative class. This is also called cost-sensitive learning.


For our model we chose sampling over the cost-sensitive method for the following reasons [Weiss2007]:

• The sampling method performs with similar, if not better, effectiveness than cost-sensitive learning.

• Cost-sensitive learning algorithms work well with very large datasets, which is not appropriate in our case.

• Determining the cost parameter values in cost-sensitive learning is a challenging task in itself.

Random over-sampling and under-sampling methods have been used widely, but both have some disadvantages. In random under-sampling, there is a possibility of information loss through the removal of majority instances. Over-sampling avoids information loss, but it may lead to a higher computational cost by increasing the training dataset size and may also result in overfitting [Koknar-Tezel2009]. Moreover, neither of them is a clear winner on its own, and which kind of sampling yields the better result largely depends on the dataset to which it is applied [Weiss2007]. Rather than the random methods, modified sampling techniques like SMOTE and SpreadSubsample perform efficiently.

Synthetic Minority Over-sampling Technique, or SMOTE, is a sampling technique where new positive instances are created by identifying the nearest positive neighbors of a positive instance and then placing the synthetic instances randomly between the instance and its neighbors [Chawla2002]. In contrast to random over-sampling techniques, SMOTE creates new positive instances and is found to be more useful in creating a generalized decision region [Akbani2004]. SMOTE achieves relatively better results when compared to other resampling methods and probabilistic estimation techniques [Chawla2003].

SpreadSubsample is an under-sampling method which produces a random subsample of the data by allowing users to specify the "spread" frequency between the rarest and the most common class. A spread frequency of 10 implies that for every 10 majority instances there will be 1 minority instance. As it has been found that a ratio of 2:1 is preferable to achieve balance between non-fraudulent and fraudulent instances in a training set [ChangW2011], we set the spread frequency to 2 while applying this sampling method.

SMOTE creates near-duplicates of the minority class by over-sampling and, inversely, SpreadSubsample under-samples the majority class. Apart from reviewing them individually, we also consider a hybrid of both methods to balance the dataset; it has been argued that the hybrid of over-sampling and under-sampling works better than either one alone [Chawla2002]. We verify this by applying each of the methods independently, and then the hybrid of both, on our dataset in chapter 8 to validate our model performance.

After applying sampling, instances of a particular class may sometimes become dense in certain regions of the dataset. If cross-validation is performed on such an unevenly spread dataset, some of the folds may be classified wrongly as they may contain only positive or only negative instances. To avoid this, randomization of the instances, that is, random shuffling of their order according to the seed set as a parameter, is performed after every sampling iteration.
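A minimal sketch of this resampling-and-shuffling sequence with the Weka filters is given below (not the thesis code); the input file name is hypothetical, the option strings follow the values in Table 12, and the SMOTE filter may need to be installed separately depending on the Weka version.

// Sketch of the hybrid resampling step: SMOTE over-samples the minority class,
// SpreadSubsample caps the majority/minority ratio at 2:1, and Randomize shuffles
// the result before cross-validation. File name and options are assumptions.
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SMOTE;
import weka.filters.supervised.instance.SpreadSubsample;
import weka.filters.unsupervised.instance.Randomize;

public class ResampleTrainingSet {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iafd_training_labelled.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        SMOTE smote = new SMOTE();
        smote.setOptions(Utils.splitOptions("-P 100 -K 5"));   // 100% new minority instances, 5 neighbours
        smote.setInputFormat(data);
        data = Filter.useFilter(data, smote);

        SpreadSubsample spread = new SpreadSubsample();
        spread.setOptions(Utils.splitOptions("-M 2.0"));       // at most 2 majority per minority instance
        spread.setInputFormat(data);
        data = Filter.useFilter(data, spread);

        Randomize shuffle = new Randomize();
        shuffle.setOptions(Utils.splitOptions("-S 42"));       // seed 42, as in Table 12
        shuffle.setInputFormat(data);
        data = Filter.useFilter(data, shuffle);

        System.out.println("Balanced training set size: " + data.numInstances());
    }
}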


6.5 Conclusion

The most important task while building the IAFD model is the construction of the training dataset. In order to build an effective training set, it is important to take care of the following factors:

• The classification features are chosen carefully to ensure that they provide enough information for the ML algorithm to classify each instance correctly. We have come up with a set of 10 fraud patterns that work as shill classification features.

• Each fraud pattern is categorized into one of the three weight categories, High, Medium and Low, based on its impact on the shill classification process.

• Data pre-processing tasks like cleansing, feature calculation and transformation are carried out during the training set construction phase.

• It is very important that the training data are labelled correctly. Since labelled auction fraud data is very rare to find on the internet, we use the hierarchical clustering technique to label them.

• Owing to the imbalanced nature of fraud data, sampling is required before any classification method is applied. We consider the over-sampling technique SMOTE and the under-sampling technique SpreadSubsample to balance our dataset.


Chapter 7

A Robust Classification Process for In-Auction Fraud Detection

One of the most important factors in building a robust IAFD model is to tune and set the correct function parameter values. In the previous chapter, we have discussed all the major techniques and steps we follow for our IAFD model to generate the most efficient training dataset to train the SVM model. In this chapter, we will discuss the classification parameters we need to choose and how we can tune the parameters for the other methods like sampling and clustering to make the model more effective and robust in detecting shills in online auctions.

7.1 Kernel Selection for IAFD Model

The evolving nature of the bidding process and the distinctive features of shill bidding make it almost impossible to linearly separate the auction data [Ford2012], making it difficult for SVM to draw a linear hyperplane. As we know, kernels are essential to deal with non-linearly separable data, so selecting the right kernel function is very important when we are dealing with fraud in auction data. Though there are several kernel functions available, as listed previously in chapter 3, the Radial Basis Function (RBF) is found to be very effective in different fraud detection applications. While selecting the kernel function, one should remember that a linear kernel function is much faster than a non-linear one, and if the original dataset has a large number of features then it is not necessary to map the data to a higher-dimensional space, making a non-linear kernel unnecessary [Hsu2003]. We also have to keep in mind that reducing the dimensionality of the feature space helps in overcoming the risk of overfitting. In our case, the reasons for choosing RBF are as follows:

• The number of features is small compared to the number of instances, so we use the RBF kernel for the fraud classification process.

• As noted in some of the related works, RBF performs better than the other kernels when dealing with fraud or imbalanced datasets.

• The RBF kernel is also less complex and numerically easier to work with than other non-linear kernels [Hsu2003].

RBF is a squared exponential kernel which is nonparametric in nature, meaning it is potentially stronger and allows more complex functions to be accommodated in the future if required. The RBF kernel is as follows:

$$K(x_i, x_j) = \exp(-\gamma \, \lVert x_i - x_j \rVert^2), \quad \gamma \geq 0 \qquad (7.1.1)$$

$$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) \qquad (7.1.2)$$

where $K(x_i, x_j)$ is the kernel function and $\varphi(x)$ is the transformation function for mapping the data to a high-dimensional feature space.
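As a small numeric illustration of equation 7.1.1 (added here, not part of the thesis), the method below evaluates the RBF kernel for two feature vectors; the vectors are placeholder fraud-pattern values.

// Illustration of equation 7.1.1: K(xi, xj) = exp(-gamma * ||xi - xj||^2).
public class RbfKernelExample {
    static double rbf(double[] xi, double[] xj, double gamma) {
        double squaredDistance = 0.0;
        for (int d = 0; d < xi.length; d++) {
            double diff = xi[d] - xj[d];
            squaredDistance += diff * diff;
        }
        return Math.exp(-gamma * squaredDistance);
    }

    public static void main(String[] args) {
        double[] a = {0.6, 0.97, 0.2};   // e.g. three fraud-pattern values of one bidder (placeholders)
        double[] b = {0.1, 0.40, 0.0};   // and of another bidder (placeholders)
        System.out.println(rbf(a, b, 0.5)); // close to 1 for similar bidders, near 0 for distant ones
    }
}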


Once the kernel function is decided, we need to select values for the two kernel parameters, C (the cost parameter) and ɤ (the Gaussian kernel, or bandwidth, parameter). This is done by the cross-validation process, which we discuss in the next section, where the model tries various values of C and ɤ until it achieves the best result.

7.2 Cross-validation in IAFD Model

Cross-validation in SVM is a process to find the best values for the parameters C and ɤ, the two parameters used by the RBF kernel. These values are finally used by the classifier to create the ultimate model. For different applications with different datasets, the parameter values vary. Cross-validation is generally k-fold in nature, where k is a positive whole number which decides how many folds, with an equal number of instances in each fold, the training dataset is divided into. One of the folds is considered as unlabelled or testing data and the remaining folds as labelled or training data. Various combinations of C and ɤ are tried and then their average is selected. The steps of the cross-validation method are as follows:

1. Select the value k for k-fold cross-validation.

2. Divide the training data into k subsets of equal size, each containing almost the same number of data instances.

3. Use k-1 subsets of the data for training and the remaining one for validation testing.

4. Carry out the training process k times by changing the training-testing combination, with each fold acting as testing data exactly once.

5. Try various values of C and ɤ during this k-fold training process.

6. Select the best values for the parameters C and ɤ by averaging the values used during the k-fold training.

This process helps in evaluating the accuracy of the training model being built, as it validates the model by using each training instance as a testing or validation instance exactly once during the k-fold cross-validation. That is, each instance is predicted once, increasing the efficiency of the training model [Hsu2003].

We have tried different values of k in our experiments to choose the one which is most suitable for our application, as revealed in chapter 8. The value of k varies largely depending on several factors, and thus no single value can be considered standard across all applications. Apart from data characteristics like the number of attributes, the number of classes, the sample size and the distribution, other factors that influence the value of k are [cbeleites2012]:

• A larger value of k implies larger variance and less bias in the model estimation.

• The value of k is chosen such that the number of instances in all the sub-samples is the same, preferably a divisor of the total sample size.

• A very large value of k means that there will be fewer instances in each sub-sample to carry out the modelling process, which will not give an efficient result.


In Weka, to start a classification process involving cross-validation, we have to select at least some initial values for C and ɤ to verify their usability. Choosing these initial values randomly may require many manual trials and much testing effort, and the values may still be far from the appropriate ones. So, instead of randomly choosing the initial values, we perform an organized search using the CVParameterSelection function in Weka. This function, when given a range of values for a particular parameter, returns the best parameter value applicable to a particular dataset. It is only used to identify suitable parameter values for a classifier, dependent on the dataset, and has no connection with the training process of the model. We apply this function to obtain the values for both C and ɤ during every iteration of our experiment. CVParameterSelection performs a k-step iteration, where k is specified by the user; different values from the specified parameter range are applied to the training set through the k steps to obtain the best value.
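The sketch below (not the thesis code) ties the pieces of this section together: CVParameterSelection searches the C and ɤ ranges listed in Table 12 around the LibSVM classifier, and the tuned classifier is then evaluated with 10-fold cross-validation. The file name, the class value name "Suspicious" and the exact option syntax are assumptions that may vary across Weka/LibSVM versions.

// Sketch: parameter search over C and gamma with CVParameterSelection, followed by
// 10-fold cross-validation of the tuned LibSVM classifier. Names are assumptions.
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LibSVM;
import weka.classifiers.meta.CVParameterSelection;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class TuneAndValidate {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iafd_training_balanced.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        LibSVM svm = new LibSVM();
        svm.setOptions(Utils.splitOptions("-S 0 -K 2")); // C-SVC with the RBF kernel

        CVParameterSelection search = new CVParameterSelection();
        search.setClassifier(svm);
        search.addCVParameter("C 0.01 10.0 10"); // cost range from Table 12, 10 steps
        search.addCVParameter("G 0.01 0.9 10");  // gamma range from Table 12, 10 steps
        search.buildClassifier(data);
        System.out.println("Selected options: " + Utils.joinOptions(search.getBestClassifierOptions()));

        // 10-fold cross-validation of the tuned classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(search, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println("Recall (Suspicious): "
                + eval.recall(data.classAttribute().indexOfValue("Suspicious")));
    }
}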

7.3 Parameter Tuning in Weka

In this section we list and discuss the Weka functions and the parameters we have used for generating the best model for IAFD. Table 12 provides the list of functions, their usage, their parameters with descriptions, and the parameter values used during our experiments for different purposes like clustering, sampling and classification. We have listed only a few of the relevant and important parameters per function which required tuning while generating the IAFD model. For all these functions, the parameter values are subject to change when the data on which they are applied changes.


Table 12. Weka Parameter List

Category   | Function                | Parameter          | Description                                                                 | Value
Sampling   | SMOTE                   | NearestNeighbors   | Number of nearest neighbor instances used to generate synthetic data       | 5
Sampling   | SMOTE                   | percentage         | Number of synthetic instances to be created with respect to minority class | 100 (iterated multiple times to reach 2:1 ratio)
Sampling   | SpreadSubsample         | DistributionSpread | Value specifying the distribution frequency in class difference            | 2.0 (to maintain 2:1 between positive and negative class)
Sampling   | SpreadSubsample         | randomSeed         | Sets the random number seed for subsampling                                | 1 (default value used)
Sampling   | Randomize               | randomSeed         | The random number seed to follow while shuffling                           | 42
Clustering | Hierarchical Clustering | DistanceFunction   | The similarity function used to measure distance between two instances     | Euclidean Distance
Clustering | Hierarchical Clustering | linkType           | Method used to measure distance between two clusters                       | Centroid
Classifier | CVParameterSelection    | classifier         | The classifier to be used is chosen                                        | LIBSVM
Classifier | CVParameterSelection    | CVParameters       | The kernel parameters are specified along with the range and iteration    | Range for C is [0.01 to 10.0] and range for ɤ is [0.01 to 0.9]
Classifier | CVParameterSelection    | numFolds           | Number of iterations performed to return best parameter values            | 10 iterations for both C and ɤ
Classifier | Cross-validation        | Folds (k)          | Number of folds into which the data is to be divided                       | k = 5, k = 10
Classifier | LIBSVM                  | SVMType            | Type of SVM used                                                            | C-SVC (classification)
Classifier | LIBSVM                  | Cost               | The cost or penalty parameter                                               | Varies for different scenarios
Classifier | LIBSVM                  | KernelType         | The kernel type to be used, in our case the RBF kernel                     | Radial Basis Function
Classifier | LIBSVM                  | Gamma              | Gamma parameter to be used for the RBF kernel function                     | Varies for different scenarios

The value of every parameter has been chosen through rigorous trial and error during the implementation phase. These values depend entirely on factors such as the dataset used, its size, the attributes and their values, and the labels of the training instances.

7.4 Selection of Performance Metrics for IAFD

There are several evaluation metrics, detailed in section 3.3, that may be used for classification, but not all of them are suitable for evaluating every fraud detection model. For the IAFD model, the metrics are chosen carefully, keeping in mind aspects such as data quality, feature selection, classification criteria and the type of algorithm used. Out of the 8 metrics discussed, we selected 4 and discarded the other 4. Table 13 provides the reasons for selecting or not selecting each metric.


Table 13. Performance Metrics for IAFD

Used metrics:

• Sensitivity or Recall: In auction fraud detection it is more vital to predict the minority class accurately than the majority class. If a fraud warning is sent to a legitimate user, further investigation can clear the warning, but if a fraudster is ignored the consequences may lead to major financial losses. In [Chang2012] the Recall rate is calculated to demonstrate the effectiveness of a multi-phase modeling method in detecting potential online auction fraudsters.

• Precision: It does not deal with the negative instances and thus puts more focus on the positive instances, which in auction fraud are the fraud instances. It also serves as a component of other important measures such as the F-measure, which is widely used in auction fraud detection [Zhang2015, Chang2012].

• F-measure: This metric is often chosen when working with imbalanced datasets [Zhang2015, Zhang2013] such as auction fraud data. In [Chang2012] the F-measure is considered a "suitable" metric for the fraud detection area and measures the effectiveness of detecting potential auction fraudsters.

• AUC or AUROC: In [Zhang2011] AUC is one of the metrics used to measure the performance of their incremental statistical machine learning techniques. [Zhang2015] also shows that AUC is a very effective metric for evaluating classifiers on imbalanced data.

Not used metrics:

• Accuracy: Accuracy is the most commonly used metric for measuring the performance of a classification process when the data is balanced. For an imbalanced dataset such as auction fraud data, it does not give a good picture of classifier performance: even if the classifier predicts that all the data belong to the majority class, accuracy will be high because the majority instances greatly outnumber the minority instances [Joshi2001].

• G-mean: G-mean is a well-known metric for evaluating performance on imbalanced datasets such as auction fraud data. We do not use it because, while it measures the balance between the majority and minority classes being correctly classified, we are more interested in the minority class being correctly classified.

• Specificity: In IAFD we concentrate more on detecting the minority class than the majority class. Since Specificity deals mainly with the majority class, we found it less useful for our application.

• ROC: Instead of the ROC curve we directly use the Area Under ROC (AUC), which summarizes the performance in a single number rather than the visual representation given by ROC.


7.5 Conclusion

In this chapter we discussed the parameters of SVM and the other techniques that we use to build our dynamic IAFD model. Fraud data is linearly inseparable in nature and requires a kernel function to define the decision hyperplane that separates the two classes; we concluded that RBF is the most appropriate kernel for classifying our data with SVM. To reduce the chances of overfitting, cross-validation is a critical step during training of the SVM model.

As there is no standard value of k in k-fold cross-validation, instead of using an arbitrary number we experiment with different values of k to verify which suits our dataset best. We also listed the corresponding Weka functions and the important parameters and values that play a key role in building the classification model. Once the classification phase is complete, it is essential to measure the performance of the model in terms of correctly and incorrectly classified instances. Out of all the performance metrics discussed in section 3.3, we selected Precision, Recall, F-measure and AUC for our application, keeping in mind the nature of our dataset and our classification goal.


Chapter 8

Experiments and Evaluation

So far we have discussed the different techniques, and their parameters, that we apply to build the framework for an efficient SVM-based shill bidding detection model, IAFD. In this chapter we show how these techniques are implemented with real-world data obtained from eBay. Following the results, we discuss the techniques and parameter values that yield the best results for our training model. Finally, we use this model to test a set of unseen runtime data as part of the online phase of our model.

8.1 Commercial Auction Data

For our experiment, we employ actual auction data from eBay that has been made public at www.modelingonlineauctions.com/datasets. These data were used in the book "Modeling Online Auctions" [Jank2010], whose mission is to better understand the online auction world through empirical techniques applied to the huge amount of e-commerce data available in the market. The site provides data for three popular high-value items auctioned on eBay: Xbox game consoles, Cartier wristwatches and Palm Pilot PDAs. Based on the 'most watched item' category and the richness of the available data, we chose the PDA dataset. This item is in high demand, attracted a large number of bidders and, according to eBay, belongs to the top 12 most-sold categories out of 34. Additionally, the item has a good price range and a long auction duration (7 days); the higher the item price, the greater the possibility of shill bidding [Dong2012]. The successful auctions (English, forward, single-item) of PDAs were extracted over a period of two months. Table 14 gives some statistics of the obtained data, and Table 15 gives details of the data attributes, which comprise both auction attributes and bid attributes. In total, we have 149 auctions, 1024 unique bidders and 3166 transactions.

Table 14. PDA Data Statistics

Statistic | PDA
Total Auctions | 149
Total Bidders (unique) | 1024
Total Bids | 3166
Avg. No. of Bidders (per auction) | 7
Avg. No. of Bids (per auction) | 21
Avg. Winning Price | 229.04
1-Bid Auctions | NA
2-Bid Auctions | 2.6%

The data consist of users participating in auctions by submitting bids at different times during the auction duration. The following information is recorded for each auction and bid:

Table 15. Data Attributes and Descriptions

Data Attribute | Description
Auction ID | Unique identifier of an auction
Seller ID | Unique identifier of a seller
Seller Rating | Feedback rating of a seller
Bidder ID | Unique identifier of a bidder
Bidder Rating | Feedback rating of a bidder
Bid Amount | Bid price placed by a bidder
Bid Time | Time of the placement of a bid
Opening Bid | Opening bid set by a seller
Closing Price | Final price of an auction, i.e. the highest bid submitted within the stipulated time
No. of Bids | Total number of bids submitted in an auction
Duration (Declared) | 7 days for all the auctions (constant)
Start Time (Derived) | Initialized to 0
End Date | Date on which the auction ended
End Time (Derived) | Start Time + Duration

Splitting of Data – We further divide these 149 auctions into two parts:

• 90% of the data (134 auctions) is used for training the model (offline IAFD).

• 10% of the data (15 auctions) is used for testing (online IAFD).

The split is done date-wise, i.e. based on the 'End Date' of the auctions sorted in ascending order: the latest 15 auctions are reserved for testing, while the earlier 134 auctions are used for training.
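A minimal sketch of this date-wise split is given below; the Auction record and the loading step are hypothetical, and only the sorting and 90/10 split follow the procedure described above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DateWiseSplit {
    // Hypothetical minimal auction record; only the fields needed for splitting.
    record Auction(String auctionId, LocalDate endDate) { }

    public static void main(String[] args) {
        List<Auction> auctions = new ArrayList<>(); // assume the 149 auctions are loaded elsewhere

        // Sort by End Date so the most recent auctions go to the test set.
        auctions.sort(Comparator.comparing(Auction::endDate));

        int testSize = (int) Math.round(auctions.size() * 0.10);  // ~15 auctions
        int trainSize = auctions.size() - testSize;               // ~134 auctions

        List<Auction> training = auctions.subList(0, trainSize);
        List<Auction> testing  = auctions.subList(trainSize, auctions.size());

        System.out.println("Training auctions: " + training.size()
                + ", testing auctions: " + testing.size());
    }
}
```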

8.2 Pre-processing of the Training Data

As described in section 6.2, there are four main stages of data pre-processing: data cleansing, classification feature computation, data transformation and training set finalization. We discuss each of them in detail in this section with respect to the collected eBay data. Initially, a thorough analysis of the data is carried out to determine the prerequisites for all these stages.


A. Data Cleansing

The raw data obtained from the auction site is not always in a favourable condition for learning and thus requires some cleaning before the training set is built. We remove records that provide very little information for fraud classification, and we additionally mask user identifiers:

• Bid records with a blank 'Bidder' username are removed.

• Auctions with fewer than three bids are removed; as shown in Table 14, this deletes the 2.6% of 2-bid auctions from the dataset.

• Irrelevant data attributes are removed. 'End Date' is not needed since 'Start Time' and 'Duration' are used instead. 'Seller Rating' and 'Bidder Rating' are also unnecessary because, as discussed earlier, the unreliability of the rating procedure makes these attributes ambiguous for validating a user's reputation.

• When experimenting with commercial or private data it is very important to respect user privacy. To protect bidder and seller information, masking is applied to the bidder and seller IDs, where all letters except the first and the last are replaced by '*' (a minimal sketch of this masking follows the list).
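The masking itself is a one-line transformation; the sketch below is an illustrative helper (the method name and example ID are hypothetical).

```java
public class IdMasking {
    /** Replaces every character of a user ID except the first and last with '*'. */
    static String mask(String id) {
        if (id == null || id.length() <= 2) {
            return id; // nothing to hide between the first and last character
        }
        StringBuilder masked = new StringBuilder();
        masked.append(id.charAt(0));
        for (int i = 1; i < id.length() - 1; i++) {
            masked.append('*');
        }
        masked.append(id.charAt(id.length() - 1));
        return masked.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("happybidder"));  // prints h*********r
    }
}
```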

B. Classification Feature Computation

This phase involves the calculation of the fraud patterns as formulated in section 6.1.

This processing is done on the RDBMS and Java platform and required a considerable amount of code to integrate the current auction data with user and product behavioural information in order to build each feature. Although we have presented a set of 10 fraud patterns that are vital for detecting shill bidders, not all of these features are suitable for our experiment once the raw data is taken into consideration. Table 16 lists the fraud patterns used and not used, along with the data attributes on which they depend.

Table 16. Fraud Pattern and Attributes Dependency

Fraud Patterns – Used | Data Attributes
Early Bidding (EB) | Bidder ID, Bid Time, Start Time, Duration
Buyer Tendency (BT) | Auction ID, Seller ID, Bidder ID
Bidding Ratio (BR) | Bidder ID, No. of Bids
Last Bidding (LB) | Bidder ID, End Time, Bid Time, Duration
Winning Ratio (WR) | Bidder ID, Bid Amount, Closing Price
Auction Bids (AB) | Auction ID, Number of Bids
Starting Price Average (SP) | Auction ID, Opening Bid
Nibble Bidding (NB) | Bidder ID, Bid Time, Bid Amount

Fraud Patterns – Not Used | Data Attributes
Successive Outbidding (SO) | Bidder ID, Bid Amount, Bid Time
Bid Increment Average (BI) | Auction ID, Bid Amount, Bid Time

The two fraud patterns Successive Outbidding and Bid Increment Average could not be utilized in our experiment because of the kind of raw data involved. For example, the snapshot of the raw data for Auction ID '3013787547' shown in Figure 15 presents the bids submitted in that auction by different bidders at different timestamps.

Figure 15. Auction Data Sample

We see that the submitted bids do not always increase with time. For example, the bid of 185 posted by bidder 'm***********r' is less than the current highest bid of 210.69. We assume that, in the past, eBay allowed users to place bid amounts lower than the previous one. For this reason, the two fraud patterns SO and BI, both of which rely on the assumption that each bid is always higher than the previous one, could not be utilized in our experiment. When applying this model to other auctions where the bids are strictly ascending, these two patterns may well be usable.

Instead, the successive bidding strategy that is very common across most auctions in our dataset follows a different behaviour: rather than outbidding through successive bids, there are instances where a bidder bids successively without outbidding the current highest bid. For example, the bidder h******a can be seen bidding consecutively from 22 up to 40 without exceeding the current highest bid of 175 submitted by s*******y. This behaviour, also termed 'nibbling', is common for illegitimate bidders [eBay2008]. eBay has also stated in its "Buying Guides" that there is a high chance a bidder is shilling if that bidder nibbles with low increments without finally winning the auction; this matches our case, where the winner is t*****c and not h******a. Moreover, [Cohen2009] expressed concern about this type of bidding, stating that a seller's shills often do this to encourage genuine bidders to raise the price. In the end, we use only 8 shill patterns in our experiment.
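To make the nibbling behaviour concrete, the sketch below counts, for one auction, the consecutive bids a given bidder places below the running highest bid. It is only an illustration of the behaviour described above, under assumed record types and values, and not the NB formula defined in section 6.1.

```java
import java.util.ArrayList;
import java.util.List;

public class NibbleCheck {
    // Hypothetical minimal bid record: who bid and how much, in submission order.
    record Bid(String bidderId, double amount) { }

    /**
     * Counts how many times a bidder submits consecutive bids that stay below
     * the running highest bid of the auction ("nibbling"), as described above.
     */
    static int nibbleCount(List<Bid> bidsInTimeOrder, String bidderId) {
        double highestSoFar = 0.0;
        String previousBidder = null;
        int nibbles = 0;
        for (Bid bid : bidsInTimeOrder) {
            boolean consecutive = bidderId.equals(bid.bidderId())
                    && bidderId.equals(previousBidder);
            if (consecutive && bid.amount() < highestSoFar) {
                nibbles++;
            }
            highestSoFar = Math.max(highestSoFar, bid.amount());
            previousBidder = bid.bidderId();
        }
        return nibbles;
    }

    public static void main(String[] args) {
        List<Bid> bids = new ArrayList<>(List.of(
                new Bid("s*******y", 175.0),
                new Bid("h******a", 22.0),
                new Bid("h******a", 30.0),
                new Bid("h******a", 40.0)));
        System.out.println(nibbleCount(bids, "h******a")); // 2 consecutive low bids
    }
}
```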

C. Data Transformation

As we use LIBSVM integrated with Weka for our experiment, we need to format the data as required by the interface. Weka supports the ARFF (Attribute-Relation File Format) format, which we use for processing our input data. Once prepared, the training dataset is first exported in CSV format and then transformed into an ARFF file for further processing (a conversion sketch is given below). We also need to ensure that the data forming the classification features are of a data type supported by the Weka tool for LIBSVM processing. This task consists of formatting the data, i.e. the features and their labels, in a LIBSVM-supported format where each data instance is represented as a vector of real numbers; text attributes must be converted into numeric ones, as only numeric attributes are supported by LIBSVM.
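A minimal sketch of the CSV-to-ARFF step using Weka's converter classes is shown below; the file names are hypothetical.

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; the CSV holds one row per bidder per auction,
        // with the eight numeric fraud patterns and the class label.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("PDA_TrainingSet.csv"));
        Instances data = loader.getDataSet();

        // LibSVM only handles numeric predictors, so any text attribute
        // (other than the nominal class label) must already be numeric here.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("PDA_TrainingSet.arff"));
        saver.writeBatch();
    }
}
```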

D. Training Set Finalization

The final step before the training dataset is finalized is to remove any unwanted features that do not carry valuable information for classification [Zhang2015]. The attributes Auction ID and Bidder ID are needed while labelling the data and for identifying each row, but they are removed during the actual classification process, as they may interfere with and bias the results of the experiment. Hence, during classification a typical training data instance looks like the one in Figure 16.

Figure 16. Training Data Instance

After successfully calculating the shill bidding patterns for each bidder in each auction and performing the necessary pre-processing of the auction data, we are left with the following:

• Number of instances for training: 1488 (922 unique bidders)

• Number of instances for testing: 151 (102 unique bidders)

• Number of classification features: 8 (excluding Auction ID and Bidder ID, which are used for record identification but not for classification)

• Label: Normal or Suspicious (for the training set this value is decided through the clustering method; for the test set it is initially blank and is later determined by the SVM model during testing)

Scaling is a very important pre-processing step for LIBSVM: it prevents attributes with large numeric ranges from dominating those with small ranges, and it avoids numerical difficulties when the kernel works with the feature vectors [Hsu2003]. The goal of this task is to map very large or very small numerical values into a specified range, such as [0, 1] or [-1, 1]. Our training set contains very small decimal values (of the order of 10^-5) for features such as EB and LB. We therefore carried out the experiment with and without scaling to understand how it affects our model (Table 17). The Weka function Normalize is applied to the original training set to map the numerical attributes with very small values into the default range of [0, 1].
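A minimal sketch of applying the Normalize filter through the Weka Java API is shown below (the training file name is hypothetical).

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class ScaleFeatures {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("PDA_TrainingSet.arff"); // hypothetical file
        train.setClassIndex(train.numAttributes() - 1);

        // Normalize maps every numeric attribute into [0, 1] by default;
        // the nominal class attribute is left untouched.
        Normalize scaler = new Normalize();
        scaler.setInputFormat(train);
        Instances scaled = Filter.useFilter(train, scaler);

        System.out.println("Scaled instances: " + scaled.numInstances());
    }
}
```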

Table 17. Results - With and Without Scaling

C - Cost | ɤ - Gamma | Recall (without / with scaling) | F-measure (without / with scaling) | AUC (without / with scaling)
1.5 | 0.1 | 0.824 / 0.813 | 0.483 / 0.483 | 0.596 / 0.596
2.3 | 0.1 | 0.800 / 0.778 | 0.665 / 0.724 | 0.717 / 0.750
2.5 | 0.1 | 0.882 / 0.853 | 0.724 / 0.724 | 0.750 / 0.724

Through the CVParameterSelection function we obtain optimal values of C and ɤ of [2.3, 0.1] for our un-sampled dataset, and we tried these values and some neighbouring values to verify whether scaling has any impact on the model. In both cases the highest AUC is 75%, so the overall performance appears to be the same whether or not scaling is applied. We therefore deduce that, although our dataset contains very small values, its overall range is not large, and efficient results can be achieved without scaling as well.

We also see that the best values of C and ɤ for 'without scaling' are [2.5, 0.1], slightly different from the CVParameterSelection values of [2.3, 0.1]. It is therefore necessary to check neighbouring parameter values in any experiment, as the best values might not always be returned by CVParameterSelection.


8.3 Data Clustering

In this section we discuss in detail how the clustering method is applied to label the training data. In Chapter 6 we already explained that our preferred clustering technique is hierarchical clustering and why we chose it.

We used the "Centroid" linkage in hierarchical clustering to partition the set of bidders in our training data. Centroid linkage is chosen as the similarity measure because it is less affected by outliers, which are common in fraud data, than single-link or complete-link measures [Ford2010]. As shown in Table 18, the 1488 training instances are grouped into 8 clusters. We categorize the fraud patterns into two property groups, 'Low' and 'High', based on their average value in each cluster. For example, in Cluster 1 the bidders have 'High' values for the fraud patterns EB, NB, WR and SP. Among these four, two (NB and WR) also belong to the 'High' weight category, indicating that bidders with higher values for these two patterns are probably shill bidders. Since both of the high-weighted patterns are in the 'High' property category, we label this cluster 'Suspicious'. Similarly, in Cluster 0 all fraud patterns except SP, which is of low weight, are 'Low' in property, which means these bidders may be labelled 'Normal'. In this way, based on which property category the patterns of each weight fall into, we label the clusters as 'Normal' or 'Suspicious'. This analysis is heuristic and requires manual intervention to thoroughly check each cluster and its corresponding fraud pattern values.
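A minimal sketch of this clustering step through the Weka Java API is given below; it assumes an unlabelled ARFF file with the two identifier attributes in the first two positions (both the file name and that layout are hypothetical) and uses the centroid link type and Euclidean distance listed in Table 12.

```java
import weka.clusterers.HierarchicalClusterer;
import weka.core.EuclideanDistance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class LabelByClustering {
    public static void main(String[] args) throws Exception {
        // Hypothetical unlabelled file: Auction ID, Bidder ID and the eight fraud patterns.
        Instances data = DataSource.read("PDA_Unlabelled.arff");

        // Drop the two identifier attributes so only the fraud patterns drive the clustering.
        Remove dropIds = new Remove();
        dropIds.setAttributeIndices("1,2");
        dropIds.setInputFormat(data);
        Instances features = Filter.useFilter(data, dropIds);

        // Agglomerative clustering with Euclidean distance and centroid linkage;
        // the number of clusters (8) matches what emerged in our experiments.
        HierarchicalClusterer hc = new HierarchicalClusterer();
        hc.setOptions(Utils.splitOptions("-L CENTROID"));
        hc.setDistanceFunction(new EuclideanDistance());
        hc.setNumClusters(8);
        hc.buildClusterer(features);

        // Each bidder instance gets a cluster id; the cluster itself is then
        // labelled Normal or Suspicious by inspecting its average fraud-pattern values.
        for (int i = 0; i < features.numInstances(); i++) {
            int cluster = hc.clusterInstance(features.instance(i));
            System.out.println("instance " + i + " -> cluster " + cluster);
        }
    }
}
```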


Table 18. Clustering Analysis for IAFD

Cluster 0 (approx. 79%) – Label: Normal
  Low: Early Bidding (L), Bidding Ratio (L), Nibble Bidding (H), Auction Bids (L), Buyer Tendency (M), Last Bidding (M), Winning Ratio (H)
  High: Starting Price Average (L)

Cluster 1 (approx. 3%) – Label: Suspicious
  Low: Bidding Ratio (L), Auction Bids (L), Buyer Tendency (M), Last Bidding (M)
  High: Early Bidding (L), Nibble Bidding (H), Winning Ratio (H), Starting Price Average (L)

Cluster 2 (approx. 15%) – Label: Normal
  Low: Early Bidding (L), Bidding Ratio (L), Nibble Bidding (H), Auction Bids (L), Buyer Tendency (M), Last Bidding (M), Winning Ratio (H), Starting Price Average (L)
  High: (none)

Cluster 3 (approx. 1%) – Label: Suspicious
  Low: Early Bidding (L), Bidding Ratio (L), Auction Bids (L), Last Bidding (M), Starting Price Average (L)
  High: Nibble Bidding (H), Buyer Tendency (M), Winning Ratio (H)

Cluster 4 (<1%) – Label: Normal
  Low: Bidding Ratio (L), Nibble Bidding (H), Auction Bids (L), Winning Ratio (H), Starting Price Average (L)
  High: Early Bidding (L), Buyer Tendency (M), Last Bidding (M)

Cluster 5 (<1%) – Label: Suspicious
  Low: Auction Bids (L), Starting Price Average (L)
  High: Early Bidding (L), Bidding Ratio (L), Nibble Bidding (H), Buyer Tendency (M), Last Bidding (M), Winning Ratio (H)

Cluster 6 (approx. 2%) – Label: Normal
  Low: Early Bidding (L), Bidding Ratio (L), Auction Bids (L), Last Bidding (M), Winning Ratio (H)
  High: Nibble Bidding (H), Buyer Tendency (M), Starting Price Average (L)

Cluster 7 (approx. 1%) – Label: Suspicious
  Low: Bidding Ratio (L), Buyer Tendency (M), Winning Ratio (H)
  High: Early Bidding (L), Nibble Bidding (H), Auction Bids (L), Last Bidding (M), Starting Price Average (L)

(The letters in parentheses indicate the weight of each pattern: L = low, M = medium, H = high.)

Among the 8 clusters, 4 clusters containing around 5% of the instances show strong indications of shill bidding based on the values of the 8 shill patterns and are labelled Suspicious. In these 4 clusters, the fraud patterns with high weights mostly belong to the 'High' property category, implying that the values of these fraud patterns are higher than average. The remaining 4 clusters, with 95% of the instances, are labelled Normal because most of the fraud patterns with high weights are in the 'Low' property category, or those in the 'High' property category are of low weight. In conclusion, we obtain a PDA training dataset with 8 features and 1 class label, totalling 1488 instances: 95% Normal instances (the majority class) and 5% Suspicious instances (the minority class). In total, around 70 bidders have fraud pattern values indicating that they behave suspiciously.

This is an imbalanced dataset, and sampling is required before the actual classification is performed in order to obtain an efficient model.

8.4 Data Sampling

For sampling, as mentioned earlier, we use SMOTE and SpreadSubsample as the over-sampling and under-sampling techniques, respectively. In the remainder of this section we discuss the results of applying these two sampling methods individually as well as together on our dataset, and how they impact the resulting model. We apply the three approaches to the unbalanced dataset in three separate runs to obtain one sampled dataset from each. The three sampled datasets are then used individually as training input for LIBSVM classification, to determine which of them yields the best IAFD model; the approach that returns the best result is selected as our sampling technique for producing the final balanced training dataset (a sketch of the hybrid pipeline follows).
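The sketch below shows the hybrid pipeline through the Weka Java API, assuming the SMOTE package is installed and using a hypothetical file name; the parameter values follow Table 12.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SMOTE;
import weka.filters.supervised.instance.SpreadSubsample;

public class HybridSampling {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("PDA_TrainingSet.arff"); // hypothetical labelled file
        train.setClassIndex(train.numAttributes() - 1);

        // Over-sampling: SMOTE creates synthetic Suspicious instances; 100% doubles
        // the minority class. Applied once here; in our experiments it is iterated
        // until the class ratio approaches 2:1.
        SMOTE smote = new SMOTE();
        smote.setNearestNeighbors(5);
        smote.setPercentage(100.0);
        smote.setInputFormat(train);
        Instances oversampled = Filter.useFilter(train, smote);

        // Under-sampling: SpreadSubsample caps the majority class at twice the
        // size of the minority class (distribution spread = 2.0).
        SpreadSubsample spread = new SpreadSubsample();
        spread.setDistributionSpread(2.0);
        spread.setRandomSeed(1);
        spread.setInputFormat(oversampled);
        Instances balanced = Filter.useFilter(oversampled, spread);

        System.out.println("Balanced training instances: " + balanced.numInstances());
    }
}
```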


Tables 19 and 20 present the results of applying the two sampling methods individually, and Table 21 shows the result of applying their hybrid. All results are reported with respect to the Suspicious class, as we are more interested in the correct or incorrect classification of Suspicious bidders than of Normal bidders. At the end we provide the average performance with respect to both Normal and Suspicious bidders, to show how well our model works for both classes.

To reach a ratio of 2:1 on our labelled dataset, which is originally highly imbalanced at almost 20:1, we have to apply SMOTE several times. After applying the SMOTE technique we obtain the over-sampled dataset, for which CVParameterSelection gives C and ɤ values of [8.8, 0.5]. Table 19 shows that when the LIBSVM algorithm is applied to this sampled dataset, the highest AUC achieved is 80%. When we tried other neighbouring values of C and ɤ, both AUC and Recall decreased, resulting in more classification errors.

Table 19. Results - Only SMOTE

C - Cost | ɤ - Gamma | Precision | Recall | F-measure | AUC
1 | 0.5 | 0.750 | 0.601 | 0.667 | 0.762
5 | 0.5 | 0.784 | 0.616 | 0.690 | 0.775
8.5 | 0.5 | 0.789 | 0.622 | 0.695 | 0.779
8.8 | 0.8 | 0.805 | 0.663 | 0.727 | 0.800
8.8 | 0.9 | 0.795 | 0.644 | 0.711 | 0.790
10 | 0.8 | 0.802 | 0.661 | 0.725 | 0.799

Next, instead of over-sampling, we used only the under-sampling method SpreadSubsample on the originally imbalanced dataset. After applying this method we are left with a total of 210 instances, distributed 140:70 between Normal and Suspicious bidders. Through CVParameterSelection we derived the best values of C and ɤ as [1.12, 0.3] for this under-sampled dataset. As Table 20 shows, an AUC of 71% is achieved when only SpreadSubsample is applied, which is lower than the 80% obtained with SMOTE alone; the Recall of 52% is also much lower than the 66% obtained with SMOTE.

Table 20. Results - Only SpreadSubsample

C - Cost | ɤ - Gamma | Precision | Recall | F-measure | AUC
1 | 0.1 | 0.667 | 0.438 | 0.529 | 0.664
1 | 0.3 | 0.706 | 0.493 | 0.581 | 0.695
1.12 | 0.3 | 0.731 | 0.521 | 0.608 | 0.712
1.3 | 0.3 | 0.731 | 0.521 | 0.608 | 0.712
1.12 | 0.5 | 0.717 | 0.521 | 0.603 | 0.709
1.12 | 0.9 | 0.685 | 0.507 | 0.583 | 0.695

This shows that over-sampling yields better results than under-sampling in our case. In addition, because the number of instances becomes very small when only under-sampling is applied to a moderately sized imbalanced dataset like ours, the resulting training set is computationally insufficient for building a supervised classification model.

Next, instead of using only SMOTE or only SpreadSubsample, we employ a hybrid of both methods to balance the dataset, with the aim of showing that the hybrid performs better than either method applied individually [Chawla2002]. In this case CVParameterSelection returns best values of C and ɤ of [7.78, 0.6]; the cost is lower than the one used with SMOTE alone, so the penalty is smaller in this case. When the hybrid method is applied to the original dataset, the highest AUC achieved through classification is 86%, as shown in Table 21.


Table 21. Results – Hybrid of SMOTE and SpreadSubsample

C - Cost | ɤ - Gamma | Precision | Recall | F-measure | AUC
1 | 0.5 | 0.833 | 0.784 | 0.808 | 0.853
1 | 0.9 | 0.840 | 0.778 | 0.807 | 0.852
7.78 | 0.6 | 0.867 | 0.784 | 0.823 | 0.862
7.78 | 0.9 | 0.854 | 0.782 | 0.816 | 0.858
10 | 0.5 | 0.847 | 0.784 | 0.814 | 0.856
10 | 0.9 | 0.846 | 0.780 | 0.811 | 0.854

This result is better than applying SMOTE alone, which gives an AUC of only 80%. We also observe a considerable improvement in Recall, which increases from 66% with SMOTE alone to 78% with the hybrid method. Recall is an important metric for the classification accuracy of the Suspicious class, which is the class we care about most. We therefore choose the hybrid method as our final sampling technique.

8.5 Cross-validation of Training Set

In this section we carry out an experiment to determine which value of k in k-fold cross-validation generates the best results for the IAFD model. Through CVParameterSelection we select the initial C and ɤ values of [2.3, 0.1] to start the experiment. Table 22 presents the results of employing 5-fold and 10-fold cross-validation.

Table 22. Results - Cross-validation with k=5 and k=10

C | ɤ | Precision (k=5 / k=10) | Recall (k=5 / k=10) | F-measure (k=5 / k=10) | AUC (k=5 / k=10)
1.5 | 0.1 | 0.692 / 0.778 | 0.310 / 0.483 | 0.429 / 0.596 | 0.654 / 0.740
1.5 | 0.9 | 0.708 / 0.773 | 0.586 / 0.586 | 0.642 / 0.667 | 0.791 / 0.791
2.3 | 0.1 | 0.760 / 0.792 | 0.655 / 0.665 | 0.704 / 0.717 | 0.826 / 0.826
2.3 | 0.9 | 0.680 / 0.696 | 0.586 / 0.552 | 0.630 / 0.615 | 0.791 / 0.774
2.5 | 0.1 | 0.760 / 0.778 | 0.655 / 0.724 | 0.704 / 0.750 | 0.826 / 0.860
3.0 | 0.1 | 0.731 / 0.769 | 0.655 / 0.690 | 0.691 / 0.727 | 0.825 / 0.843


We can see that when k is 10 the model performs better, achieving an AUC of 86% as opposed to 82% when k is 5. We therefore use k = 10, i.e. 10-fold cross-validation, for building our fraud detection model.
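A minimal sketch of running this comparison through the Weka Java API is given below; the file name and the class value name 'Suspicious' are assumptions, and the starting parameter values are those returned by CVParameterSelection.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LibSVM;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareFolds {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("PDA_Balanced.arff"); // hypothetical sampled training file
        train.setClassIndex(train.numAttributes() - 1);

        // C-SVC with an RBF kernel and the starting values C = 2.3, gamma = 0.1.
        LibSVM svm = new LibSVM();
        svm.setOptions(Utils.splitOptions("-S 0 -K 2 -C 2.3 -G 0.1"));

        for (int k : new int[] {5, 10}) {
            Evaluation eval = new Evaluation(train);
            eval.crossValidateModel(svm, train, k, new Random(1));
            int suspicious = train.classAttribute().indexOfValue("Suspicious"); // assumed label name
            System.out.printf("k=%d  Recall=%.3f  F-measure=%.3f  AUC=%.3f%n",
                    k, eval.recall(suspicious), eval.fMeasure(suspicious),
                    eval.areaUnderROC(suspicious));
        }
    }
}
```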

Apart from the practical evidence demonstrated above, there are also theoretical grounds for choosing k = 10:

• With a fixed set of examples, 10-fold cross-validation is the standard way of predicting the error rate of a learning technique [Witten2005].

• Extensive tests with many machine learning methods have shown that 10 folds is the standard choice for estimating error on different kinds of datasets [Witten2005].

• 10-fold cross-validation is generally used to reduce the chance of the model overfitting the training data [Nagi2010].

After selecting k = 10, we build the final SVM model based on the hybrid sampling technique and 10-fold cross-validation, with C and ɤ values of [7.78, 0.6]. The previous experiments were reported with respect to the Suspicious class; having finalized the model, we present in Table 23 the average result over both classes, Normal and Suspicious.

Table 23. Results - Hybrid Method Considering Both Classes

Method | Precision | Recall | F-measure | AUC
LIBSVM | 0.878 | 0.879 | 0.878 | 0.854

We achieve an AUC of 85% and Recall of 87% which are quite satisfactory.


8.6 Testing using Test Dataset

In this final stage of our work we discuss the testing phase, in which, based on the final parameter values deduced from the previous experiments, we set up the LIBSVM algorithm and run the training process in Weka to build the final model. The parameters and values used for building the final SVM model are listed in Table 24.

Table 24. Final Parameter Values for LIBSVM

Method / Parameter | Value
k-fold cross-validation | k = 10
Cost parameter C | C = 7.78
Kernel | Radial Basis Function
Kernel parameter ɤ | ɤ = 0.6

As described earlier, we divided our original dataset into two parts: 134 auctions (90%) are used for training and building the model with the above parameters, while the other 15 auctions (10%) are used for testing. After running LIBSVM on the training data we obtain the final SVM model, which we name 'PDA_SVMModel.model'. We run this model on the unlabelled test instances to determine the class to which each test record belongs. The structure of the testing dataset has to be similar to that of the training set, with the 8 derived shill bidding patterns but no label. We name the testing data file 'PDA_TestingSet.arff'; it contains 151 bidder records with no class label.
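Equivalently, the offline training, model serialization and online labelling steps can be scripted through the Weka Java API. The sketch below is a minimal example that assumes a hypothetical balanced training file and that the test file keeps an (empty) label column so the headers match.

```java
import weka.classifiers.functions.LibSVM;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainAndTest {
    public static void main(String[] args) throws Exception {
        // Offline phase: train the final LibSVM model with the tuned parameters.
        Instances train = DataSource.read("PDA_Balanced.arff"); // hypothetical balanced training file
        train.setClassIndex(train.numAttributes() - 1);

        LibSVM svm = new LibSVM();
        svm.setOptions(Utils.splitOptions("-S 0 -K 2 -C 7.78 -G 0.6"));
        svm.buildClassifier(train);
        SerializationHelper.write("PDA_SVMModel.model", svm);

        // Online phase: load the model and label each unseen bidder record.
        LibSVM model = (LibSVM) SerializationHelper.read("PDA_SVMModel.model");
        Instances test = DataSource.read("PDA_TestingSet.arff");
        test.setClassIndex(test.numAttributes() - 1); // label column present but left unfilled

        for (int i = 0; i < test.numInstances(); i++) {
            double pred = model.classifyInstance(test.instance(i));
            System.out.println("bidder record " + i + " -> "
                    + test.classAttribute().value((int) pred));
        }
    }
}
```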

Using the two files, the model file and the testing file, as input, we run the below testing command in Weka Console to detect the label of the test instances. The result can be found in the result file ‘PDATest.result’ as seen in Figure 17.


Figure 17. LIBSVM – Command for Testing

We present our results in Table 25 in the form of a confusion matrix.

Table 25. Test Results - Confusion Matrix

Actual \ Predicted | Normal | Suspicious
Normal | 132 (TN) | 7 (FP)
Suspicious | 4 (FN) | 8 (TP)

After a thorough manual analysis of the 15 auctions used for testing, the actual numbers of normal and suspicious bidders are found to be 139 and 12, respectively. From the matrix we can see that out of the 139 Normal instances, 132 are correctly predicted as Normal and 7 are incorrectly predicted as Suspicious, while out of the 12 Suspicious instances, 8 are correctly predicted as Suspicious and 4 are incorrectly predicted as Normal.

We calculate the performance metrics based on the above confusion matrix values:

 Accuracy: (TP+TN)/Total = (132+8)/151 = 0.92

 Misclassification rate: (1- Accuracy) = (FP+FN)/Total = (7+4)/151 = 0.08

 True Positive Rate or Recall: TP/(TP+FN) = 8/(8+4) = 0.66

 Precision: TP/(TP+FP) = 8/(8+7) = 0.53

The overall accuracy is 92%, which is quite high, but this figure also counts the Normal bidders correctly predicted as Normal: the test data is highly imbalanced and un-sampled, with a large majority of Normal bidders. We are more interested in the number of Suspicious bidders correctly predicted. From the Recall value, 66% of the Suspicious bidders are detected successfully by our model; the remaining 34% go undetected and are labelled Normal, which corresponds to the false negatives. Moreover, of all the bidders predicted as Suspicious, only 53% are actually Suspicious, meaning that around 47% of the bidders flagged as Suspicious are in fact Normal, i.e. false alarms. Although this is not an ideal figure, it is generally accepted in the fraud detection area that minimizing false negatives is more important than minimizing false positives [Phua2010, Sallehuddin2014]: bidders flagged as Suspicious can be further investigated and cleared of the accusation, whereas bidders wrongly labelled Normal, once authenticated as legitimate, can never be tracked thereafter. Our model works effectively in keeping the proportion of missed Suspicious bidders (34%) lower than the proportion of false alarms among flagged bidders (47%).

We obtain an AUC of 80% during testing, compared to 85% during training, so it can be said that our model generalizes well. The graphical representation of the test AUC is shown in the left-hand graph of Figure 18, whereas the ideal ROC curve, with AUC = 1, is shown in the right-hand graph of Figure 18.


Figure 18. Test Results - Area under ROC curve

Time Efficiency – Table 26 presents the time efficiency of our IAFD model, measured for both the offline and the online phase. Although we have a moderate amount of data for our experiment, auction houses sometimes have a large number of transactions across different product categories and time periods; keeping that in mind, we have analysed the time behaviour of our model.

Table 26. Test Results - Time Efficiency

Phase | Time Efficiency | Scalability
Offline | Training: time taken to build the model – 294 seconds (1488 instances) | Scalability is not a factor, as the training phase is offline.
Online | Testing: 4 seconds (151 instances) | Here we tested around 15 auctions in which 151 bidders participated in total; in real time the model runs after each auction, where the number of instances is the number of bidders in that auction. This rarely exceeds 50, and running the model on 50 records takes far less time.

Thus we claim that the IAFD model has no scalability issue given the fact that while detecting shill bidders online it will be required to deal with transactions from one auction at a time.


8.7 Conclusion and Comparison

In this section we sum up the conclusions drawn from our experiments. We also compare our model with other works that used SVM for shill detection.

• Commercial auction data from eBay has been used to carry out the experiments. We use the auction data of PDAs, one of the most commonly auctioned and costly products.

• 90% of the data is used for training, while the remaining 10% is used for testing.

• Among the 10 fraud patterns, 8 are utilized for the IAFD model, given the quality of the available data.

• Although scaling is generally an important pre-processing step for LIBSVM, it proved to have no benefit for our dataset.

• Hierarchical clustering produced 8 clusters, of which 4 are labelled Suspicious and the others Normal after a thorough analysis of the clusters.

• The hybrid of the over-sampling technique SMOTE and the under-sampling technique SpreadSubsample proved more efficient than either of them alone: it returned an AUC of 86%, compared to 80% for SMOTE and 71% for SpreadSubsample.

• Another reason for not choosing SpreadSubsample alone is that the number of instances remaining after under-sampling a moderately sized imbalanced dataset is not sufficient to train a classification model.

• For k-fold cross-validation, k = 10 achieves an AUC of 86% compared to 82% for k = 5; thus 10-fold cross-validation is applied for the final model.

• When our model is compared to those of [Ochaeta2008] and [Lin2014], we find it more efficient and reliable for the reasons summarized in Table 27 (a direct comparison of performance metrics is not strictly possible, since the datasets used in the three works are not the same; this is only a general comparison).

Table 27. Results Comparison

System | Scope | Classification Features | Dependencies | Performance Metrics
[Ochaeta2008] | Offline system that focuses on learning | Features include feedback rating | Fraudster profiles used to build training data | Precision – 81.0%, Recall – 80.4%, F-measure – 80.7%
[Lin2014] | Offline system that focuses on learning | Features include feedback rating | Fraudster profiles used to build training data | Precision – 72.3%, Recall – 76.8%, F-measure – 70.7%
Our IAFD model | Offline as well as online system | Features relate entirely to shill bidding behaviours | Regular bidding data, which is easily available | Precision – 87.8%, Recall – 87.9%, F-measure – 87.8%

• The final model, when applied to the unseen test dataset, attains an AUC of 80% and identifies 8 out of 12 suspicious bidders correctly, with a Recall of 66%.


Chapter 9

Conclusion and Future Work

This chapter presents the conclusions we draw from our theoretical analysis and experiments, along with the contributions we plan as future work in continuation of our research.

9.1 Conclusion

Online auctioning has become very popular with the growing use of e-commerce applications to address consumer demand in every area. Business-to-Business, Business-to-Customer, Customer-to-Customer and Business-to-Government markets are some of the key areas where online auctions have strengthened their grip by providing a dynamic environment for procuring goods at competitive prices. But in order to gain the trust of users, it is also very important to make this environment safe and healthy for participants. Apart from common e-commerce frauds such as credit card fraud and counterfeit products, online English auctions are particularly vulnerable to shill bidding, where the prices of items are raised in an unethical manner. The nature and complexity of shill bidding make it hard to detect and harder to protect against.

• We have developed a two-phased shill detection model named In-Auction Fraud Detection (IAFD) to detect shill bidders at runtime during an auction event. If any bidder is detected behaving suspiciously, necessary action is taken to put that particular auction on hold and investigate that bidder's activities thoroughly before payment is processed.

• We use the SVM classification method to distinguish between normal and suspicious bidders. The reasons for choosing SVM are:

  o SVM's generalization capability ensures the effectiveness of the model when applied to unseen or new data. In fraud detection applications it is very important to detect new fraud types accurately.

  o SVM works well when there is a balance between the number of features and the number of training instances. In our case the combination of training data size and number of features is well suited to achieving good classification results.

  o Since we are dealing with a moderately sized training dataset, our model is both time and space efficient with respect to SVM complexity.

  o Our data is highly imbalanced, so we applied a sampling technique to balance it, and SVM is known to work well with moderately balanced data.

  o SVM is good at handling noisy data, which is an important characteristic of fraud data.

• A set of bidding behaviours is found to be consistent among shill bidders, and based on these behaviours 10 fraud patterns have been formulated for use as classification features.

• Real-world data from eBay is used to carry out the experiments. As it is extremely difficult to obtain labelled auction fraud data, we used clustering, an unsupervised ML technique, to label the eBay data based on the formulated fraud patterns. We chose hierarchical clustering for the following reasons:

  o Hierarchical clustering does not need the number of clusters to be specified. Instead of hard-clustering the data into two clusters, we let the algorithm create the clusters and then labelled each cluster as 'Normal' or 'Suspicious' based on its properties.

  o Since our dataset is moderate in size, we prefer hierarchical clustering, which works fast and achieves high cluster quality.

  o In the past, several studies [Ford2012, Sabau2012] successfully used this clustering technique for grouping fraud data; it is found to produce better-quality clusters than other methods.

  o Although the computational complexity of hierarchical clustering is high, our dataset is not very large and the offline training phase is a one-time operation, so this has no major impact on the present system.

• Real-world fraud data is imbalanced in nature: the number of fraudulent transactions is far smaller than that of normal transactions. We handled this issue through sampling and observed that the hybrid sampling method yields a better AUC of 86% compared to either over-sampling (80%) or under-sampling (71%) applied independently.

• Our IAFD model achieves a Recall of 66% and an AUC of 80% when detecting Suspicious bidders within an unlabelled test dataset.

• The IAFD model achieved an AUC of 80% during the testing phase and 86% during the training phase. Given this small difference, we can claim that the IAFD model does not suffer from overfitting and minimizes the generalization error.

• Our IAFD model achieved an F-measure of 87%, whereas [Ochaeta2008] and [Lin2014] achieved 80% and 70%, respectively, while using binary SVM for analysing shill bidding fraud.

9.2 Future Work

There are several directions for future contributions with respect to our current work. Some of them are discussed below:

• In this work we mainly concentrated on using SVM as the classification method and on experimenting with the parameter values and methods that yield the best results for non-linearly classifying the fraud data with the IAFD model. In the future we plan to build this model with other classification techniques, such as decision trees and artificial neural networks, and compare the results.

• Another very important improvement will be modifying the SVM algorithm to learn incrementally by adding new data points from already-tested data: an auction record, once tested and verified as fraudulent by the IAFD model, would be added to the training set to keep it up to date with new fraud instances. To implement this we may use a hybrid of hierarchical and partitional clustering techniques, through which the risks of each individual technique can be minimized [Sabau2012].

• It is very difficult to obtain current online auction datasets from the internet, so for our experiment we had to use eBay data that is not very recent. It is possible to extract data from online auction sites through programs such as web crawlers or scrapers, but there are several issues. Firstly, not all websites allow scraping, for reasons such as data security and confidentiality [Jank2010]. Secondly, it is very time and space consuming to work with such huge volumes of scraped raw data and to construct transaction histories from them [Chang2012]. We plan to create a crawling program that helps extract meaningful data in an organized manner.

• In order to use the eBay data for both training and testing, we split the data in a 9:1 ratio using a simple splitting technique based on the auction End Date. In the future we plan to use a more robust stratified splitting technique to divide the data into training and testing sets.

• Our model currently works on one auction at a time. We could design a multi-agent system that runs in parallel to address concurrent auctions on an online auction site.


References

[Abbas2008] Abbas, Osama Abu. "Comparisons Between Data Clustering Algorithms." Int. Arab J. Inf. Technol. 5.3 (2008): 320-325. [Abbasi2009] Abbasi, Ahmed, and Hsinchun Chen. "A comparison of fraud cues and classification methods for fake escrow website detection." Information Technology and Management 10.2-3 (2009): 83-101. [Abdelhamid2014] Abdelhamid, Djeffal, Soltani Khaoula, and Ouassaf Atika. "Automatic Bank Fraud Detection Using Support Vector Machines." The International Conference on Computing Technology and Information Management (ICCTIM). Society of Digital Information and Wireless Communication, 2014. [Akbani2004] Akbani, Rehan, Stephen Kwek, and Nathalie Japkowicz. "Applying support vector machines to imbalanced datasets." Machine learning: ECML 2004. Springer Berlin Heidelberg, 2004. 39-50. [Al-Shahib2005] Al-Shahib, Ali, Rainer Breitling, and David Gilbert. "Feature selection and the class imbalance problem in predicting protein function from sequence." Applied Bioinformatics 4.3 (2005): 195-203. [Almendra2011] Almendra, Vinicius, and Denis Enachescu. "A supervised learning process to elicit fraud cases in online auction sites." Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2011 13th International Symposium on. IEEE, 2011. [Bangcharoensap2015]Bangcharoensap, Phiradet, et al. "Two Step graph-based semi- supervised Learning for Online Auction Fraud Detection." Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2015. 165-179. [Ben-Hur2010] Ben-Hur, Asa, and Jason Weston. "A user’s guide to support vector machines." Data mining techniques for the life sciences (2010): 223-239.


[Berwick2003] Berwick, R.: An Idiot’s guide to Support Vector Machines (SVMs) http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf Accessed Feb. 3, 2016. (2003) [Bhavsar2012] Bhavsar, Hetal, and Amit Ganatra. "A comparative study of training algorithms for supervised machine learning." International Journal of Soft Computing and Engineering (IJSCE) 2.4 (2012): 2231-2307. [Boulares2014] Boulares, Mehrez, and Mohamed Jemni. "Combined Methodology Based on Kernel Regression and Kernel Density Estimation for Sign Language Machine Translation." Advances in Neural Networks–ISNN 2014. Springer International Publishing, 2014. 374-384. [Cao2003] Cao, Li-Juan, and Francis Eng Hock Tay. "Support vector machine with adaptive parameters in financial time series forecasting." IEEE Transactions on neural networks 14.6 (2003): 1506-1518. [cbeleites2012] cbeleites. “Choice of K in K-fold cross-validation” [Blog Post]. Retrieved from http://stats.stackexchange.com/questions/27730/choice-of-k-in-k- fold-cross-validation. (2012) [Chang2011] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27. [ChangW2011] Chang, Wen-Hsi, and Jau-Shien Chang. "A novel two- stage phased modeling framework for early fraud detection in online auctions." Expert Systems with Applications 38.9 (2011): 11244-11260. [Chang2012] Chang, Wen-Hsi, and Jau-Shien Chang. "An effective early fraud detection method for online auctions." Electronic Commerce Research and Applications 11.4 (2012): 346-360. [Chau2005] Chau, Duen Horng, and Christos Faloutsos. "Fraud detection in electronic auction." European Web Mining Forum at ECML/PKDD. 2005. [Chawla2002] Chawla, Nitesh V., et al. "SMOTE: synthetic minority over- sampling technique." Journal of artificial intelligence research 16 (2002): 321- 357.


[Chawla2003] Chawla, Nitesh V. "C4. 5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. "Proceedings of the ICML. Vol. 3. 2003. [Chen2005] Chen, Pai‐Hsuen, Chih‐Jen Lin, and Bernhard Schölkopf. "A tutorial on ν‐support vector machines." Applied Stochastic Models in Business and Industry 21.2 (2005): 111-136. [Cherkassky2007] Cherkassky, Vladimir, and Filip M. Mulier. Learning from data: concepts, theory, and methods. John Wiley & Sons, 2007. [Chua2002] Chua, Cecil, and Jonathan Wareham. "Self-regulation for online auctions: An analysis." Self 12 (2002): 31-2002. [Cohen2009] Cohen, Philip. “Shill Bidding on eBay: a Case Study".ECommerce Bytes. 18 July 2009. Retrieved on 10 March 2016, from http://www.ecommercebytes.com/forums/vbulletin/showthread.php?t=22659. [Conradt2012] Conradt, Christine. "Online auction fraud and criminological theories: The Adrian Ghighina case." International Journal of Cyber Criminology 6.1 (2012): 912. [DeBarr2013] DeBarr, Dave, and Harry Wechsler. "Fraud Detection Using Reputation Features, SVMs, and Random Forests." Proceedings of the International Conference on Data Mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2013. [Do2008] Do, Chuong B., and Serafim Batzoglou. "What is the expectation maximization algorithm?." Nature biotechnology 26.8 (2008): 897-899. [Dong2009] Dong, F., Shatz, S., Xu, H.: Combating online in-auction frauds: Clues, techniques and challenges. Computer Science Review 3(4), 245–258 (2009) [Dong2010] Dong, F., Shatz, S., and Xu, H. An empirical evaluation on the relationship between final auction price and shilling activity in online auctions. In Proceedings of 22nd Intl. Conf. on Software Engineering ; Knowledge Engineering, San Francisco Bay, 2010, 286-291


[Dong2012] Dong, F., Shatz, S.M., Xu, H., Majumdar, D.: Price comparison: A reliable approach to identifying shill bidding in online auctions? Electronic Commerce Research and Applications 11(2), 171–179 (2012) [eBay2008] www.ebay.com. “How To Identify Shill Bidding Techniques And Avoid Them” [Farvaresh2011] Farvaresh, Hamid, and Mohammad Mehdi Sepehri. "A data mining framework for detecting subscription fraud in telecommunication."Engineering Applications of Artificial Intelligence 24.1 (2011): 182-194. [Ford2010] Ford, Benjamin J., Haiping Xu, and Iren Valova. "Identifying Suspicious Bidders Utilizing Hierarchical Clustering and Decision Trees." IC-AI. 2010. [Ford2012] Ford, Benjamin J., Haiping Xu, and Iren Valova. "A real-time self-adaptive classifier for identifying suspicious bidders in online auctions." The Computer Journal (2012): bxs025. [Gordon2004] Gordon, G. Support Vector Machines and Kernel Methods, http://www.cs.cmu.edu/ ggordon/SVMs/new-svms-and-kernels.pdf, 2004. [Gregg2008] Gregg, Dawn G., and Judy E. Scott. "A typology of complaints about eBay sellers." Communications of the ACM 51.4 (2008): 69-74. [Gu2009] Gu, Qiong, Li Zhu, and Zhihua Cai. "Evaluation measures of the classification performance of imbalanced data sets." Computational Intelligence and Intelligent Systems. Springer Berlin Heidelberg, 2009. 461-471. [Gupta2015] Gupta, Priyanka, and Ankit Mundra. "Online in-auction fraud detection using online hybrid model." Computing, Communication & Automation (ICCCA), 2015 International Conference on. IEEE, 2015. [Hamel2009] Hamel, Lutz H. Knowledge discovery with support vector machines. John Wiley & Sons, 2009. [He2009] He, Haibo, and Edwardo A. Garcia. "Learning from imbalanced data." IEEE Transactions on knowledge and data engineering 21.9 (2009): 1263- 1284.


[Hejazi2012] Hejazi, M., and Y. P. Singh. "Credit data fraud detection using kernel methods with support vector machine." Journal of Advanced Computer Science and Technology Research 2.1 (2012): 35-49. [Hsu2003] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. "A practical guide to support vector classification." (2003): 1-16. [Jank2010] Jank, Wolfgang, and Galit Shmueli. Modeling online auctions. Vol. 91. John Wiley & Sons, 2010. [Japkowicz2002] Japkowicz, Nathalie, and Shaju Stephen. "The class imbalance problem: A systematic study." Intelligent data analysis 6.5 (2002): 429-449. [Joachims1998] Joachims, Thorsten. Text categorization with support vector machines: Learning with many relevant features. Springer Berlin Heidelberg, 1998. [Joshi2001] Joshi, Mahesh V., Vipin Kumar, and Ramesh C. Agarwal. "Evaluating boosting algorithms to classify rare classes: Comparison and improvements." Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. IEEE, 2001. [Junli2000] Junli, C. and Licheng, J. “Classification mechanism of support vector machines,” IEEE Proceedings of ICSP, pp. 1556-1559, 2000. [Khomnotai2015] Khomnotai, Laksamee, and Jun-Lin Lin. "Detecting Fraudsters in Online Auction Using Variations of Neighbor Diversity." International Journal of Engineering and Technology 5.3 (2015): 156-164. [Koknar-Tezel2009] Köknar-Tezel, Suzan, and Longin Jan Latecki. "Improving SVM classification on imbalanced data sets in distance spaces." 2009 Ninth IEEE International Conference on Data Mining. IEEE, 2009. [Ku2007] Ku, Yungchang, Yuchi Chen, and Chaochang Chiu. "A proposed data mining approach for internet auction fraud detection." Pacific-Asia Workshop on Intelligence and Security Informatics. Springer Berlin Heidelberg, 2007. [Law2006] Law, Martin. "A simple introduction to support vector machines." Lecture for CSE 802 (2006).


[Lin2014] Lin, Jun-Lin, and Laksamee Khomnotai. "Using Neighbor Diversity to Detect Fraudsters in Online Auctions." Entropy 16.5 (2014): 2629- 2641. [Nagi2010] Nagi, Jawad, et al. "Nontechnical loss detection for metered customers in power utility using support vector machines." Power Delivery, IEEE Transactions on 25.2 (2010): 1162-1171. [Nikitkov2008] Nikitkov, Alex, and Darlene Bay. "Online auction fraud: Ethical perspective." Journal of 79.3 (2008): 235-244. [Nikitkov2015] Nikitkov, Alexey, and Darlene Bay. "Shill bidding: Empirical evidence of its effectiveness and likelihood of detection in online auction systems."International Journal of Accounting Information Systems 16 (2015): 42- 54. [Noufidali2013] Noufidali, V. M., Jobin S. Thomas, and Felix Arokya Jose. "E- auction frauds-a survey." International Journal of Computer Applications 61.14 (2013). [Ochaeta2008] Ochaeta, K. “Fraud Detection for Internet Auctions: A Data Mining Approach”, College of Technology Management, National Tsing-Hua University, Hsinchu, Taiwan, 2008 [Pandit2007] Pandit, Shashank, et al. "Netprobe: a fast and scalable system for fraud detection in online auction networks." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. [Pang2003] Pang, Kevin. “Self-organizing maps”. https://www.cs.hmc.edu/~kpang/nn/som.html. (2003) [Phua2010] Phua, Clifton, et al. "A comprehensive survey of data mining- based fraud detection research." arXiv preprint arXiv:1009.6119 (2010). [Provost2001] Provost, Foster, and Tom Fawcett. "Robust classification for imprecise environments." Machine learning 42.3 (2001): 203-231. [Rajak2014] Rajak, Isha, and K. James Mathai. "Immune Support Vector Machine Approach for Credit Card Fraud Detection System." (2014) [Resnick2000] Resnick, Paul, et al. "Reputation systems." Communications of the ACM 43.12 (2000): 45-48.


[Sabau2012] Sabau, Andrei Sorin. "Survey of clustering based financial fraud detection research." Informatica Economica 16.1 (2012): 110. [Sadaoui2016] Sadaoui, Samira, and Xuegang Wang. "A dynamic stage-based fraud monitoring framework of multiple live auctions." Applied Intelligence (2016): 1-17. [Sallehuddin2014] Sallehuddin, Roselina, Subariah Ibrahim, and Abdikarim Hussein Elmi. "Classification of SIM Box Fraud Detection Using Support Vector Machine and Artificial Neural Network." International Journal of Innovative Computing4.2 (2014). [Seiffert2014] Seiffert, Chris, et al. "An empirical study of the classification performance of learners on imbalanced and noisy software quality data." Information Sciences 259 (2014): 571-595. [Statista2016] Statista2016. “eBay's annual net revenue from 2004 to 2014 (in billion U.S. dollars)”. Retrieved from https://www.statista.com/statistics/266195/ebays-annual-net-revenue/. (2016) [Spruyt2014] Spruyt, Vincent, Alessandro Ledda, and Wilfried Philips. "Robust arm and hand tracking by unsupervised context learning." Sensors 14.7 (2014): 12023-12058. [Suykens1999] Suykens, Johan AK, and Joos Vandewalle. "Least squares support vector machine classifiers." Neural processing letters 9.3 (1999): 293-300. [Teixeira2008] Teixeira, Carlos HC, et al. An efficient algorithm for outlier detection in high dimensional real databases. Technical report, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil, 2008. [Trevathan2008] Trevathan, Jarrod, and Wayne Read. "Detecting shill bidding in online English auctions." Handbook of research on social and organizational liabilities in information security 446 (2008). [Tsang2012] Tsang, Sidney, Gillian Dobbie, and Yun Sing Koh. "Evaluating fraud detection algorithms using an auction data generator." 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE, 2012. [Vapnik1974] Vapnik V. and Chervonenkis A. 1974. Theory of Pattern Recognition [in Russian]. Nauka, Moscow. (German Translation: Wapnik W. &


Tscherwonenkis A., Theorie der eichenerkennung, Akademie-Verlag, Berlin, 1979). [Weiss2007] Weiss, Gary M., Kate McCarthy, and Bibi Zabar. "Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs?." DMIN 7 (2007): 35-41. [Witten1999] Witten, Ian H., et al. "Weka: Practical machine learning tools and techniques with Java implementations." (1999). [Witten2005] Witten, Ian H., and Eibe Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2005. [Yap2012] Yap, Keem Siah, et al. "Comparison of supervised learning techniques for non-technical loss detection in power utility." International Review on Computers and Software (I. RE. CO. S.) 7.2 (2012): 1828-6003. [Yoshida2010] Yoshida, Tsuyoshi, and Hayato Ohwada. "Shill bidder detection for online auctions." Pacific Rim International Conference on Artificial Intelligence. Springer Berlin Heidelberg, 2010. [Zareapoor2012] Zareapoor, Masoumeh, K. R. Seeja, and M. Afshar Alam. "Analysis on Credit Card Fraud Detection Techniques: Based on Certain Design Criteria."International Journal of Computer Applications 52.3 (2012). [Zhang2011] Zhang, Liang, et al. "A machine-learned proactive moderation system for auction fraud detection." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011. [Zhang2013] Zhang, Yong, and Dapeng Wang. "A cost-sensitive ensemble method for class-imbalanced datasets." Abstract and applied analysis. Vol. 2013. Hindawi Publishing Corporation, 2013. [Zhang2015] Zhang, Shu, Samira Sadaoui, and Malek Mouhoub. "An Empirical Analysis of Imbalanced Data Classification." Computer and Information Science 8.1 (2015): 151.
