Applying Simulation to the Problem of Detecting Financial Fraud
Total Page:16
File Type:pdf, Size:1020Kb
OF DETECTING FINANCIAL FRAUD OF DETECTING FINANCIAL APPLYING SIMULATION TO THE PROBLEM TO SIMULATION APPLYING ABSTRACT This thesis introduces a financial simulation The resulting behaviour of the agents generate model covering two related financial domains: a synthetic log of all transactions as a result of Mobile Payments and Retail Stores systems. the simulation. This synthetic data can be used to APPLYING SIMULATION TO further advance fraud detection research, without The problem we address in these domains is diffe- leaking sensitive information about the underly- THE PROBLEM OF DETECTING rent types of fraud. We limit ourselves to isolated ing data or breaking any non-disclose agreements. cases of relatively straightforward fraud. However, FINANCIAL FRAUD in this thesis the ultimate aim is to introduce our Using statistics and social network analysis (SNA) approach towards the use of computer simu- on real data we calibrated the relations between lation for fraud detection and its applications in our agents and generate realistic synthetic data financial domains. Fraud is an important problem sets that were verified against the domain and that impact the whole economy. Currently, there validated statistically against the original source. is a lack of public research into the detection of We then used the simulation tools to model com- fraud. One important reason is the lack of tran- mon fraud scenarios to ascertain exactly how saction data which is often sensitive. To address Edgar Alonso Lopez-Rojas effective are fraud techniques such as the simplest this problem we present a mobile money Payment form of statistical threshold detection, which is Simulator (PaySim) and Retail Store Simulator perhaps the most common in use. The preliminary (RetSim), which allow us to generate synthe- results show that threshold detection is effecti- tic transactional data that contains both: normal ve enough at keeping fraud losses at a set level. customer behaviour and fraudulent behaviour. This means that there seems to be little economic These simulations are Multi Agent-Based Simula- room for improved fraud detection techniques. tions (MABS) and were calibrated using real data We also implemented other applications for the from financial transactions. We developed agents simulator tools such as the set up of a triage model that represent the clients and merchants in Pay- and the measure of cost of fraud. This showed Sim and customers and salesmen in RetSim. The to be an important help for managers that aim to normal behaviour was based on behaviour prioritise the fraud detection and want to know observed in data from the field, and is codified in how much they should invest in fraud to keep the the agents as rules of transactions and interaction loses below a desired limit according to different between clients and merchants, or customers and experimented and expected scenarios of fraud. salesmen. Some of these agents were intentionally designed to act fraudulently, based on observed Edgar Alonso Lopez-Rojas patterns of real fraud. We introduced known signatures of fraud in our model and simulations to test and evaluate our fraud detection methods. Blekinge Institute of Technology Doctoral Dissertation Series No. 2016:06 2016:06 ISSN: 11653-2090 Department of Computer Science and Engineering 2016:06 ISBN: 978-91-7295-329-1 Applying Simulation to the Problem of Detecting Financial Fraud Edgar Alonso Lopez-Rojas Blekinge Institute of Technology Doctoral Dissertation Series No 2016:06 Applying Simulation to the Problem of Detecting Financial Fraud Edgar Alonso Lopez-Rojas Doctoral Dissertation in Computer Science Department of Computer Science and Engineering Blekinge Institute of Technology SWEDEN 2016 Edgar Alonso Lopez-Rojas Department of Computer Science and Engineering Publisher: Blekinge Institute of Technology SE-371 79 Karlskrona, Sweden Printed by Lenanders Grafiska, Kalmar, 2016 ISBN: 978-91-7295-329-1 ISSN 1653-2090 urn:nbn:se:bth-12932 “Let’s take flight simulation as an example. If you’re trying to train a pilot, you can simulate almost the whole course. You don’t have to get in an airplane until late in the process.” Roy Romer Abstract This thesis introduces a financial simulation model covering two related financial domains: Mobile Payments and Retail Stores systems. The problem we address in these domains is different types of fraud. We limit ourselves to isolated cases of relatively straightforward fraud. However, in this thesis the ultimate aim is to introduce our approach towards the use of computer simulation for fraud detection and its applications in financial domains. Fraud is an important problem that impact the whole economy. Currently, there is a lack of public research into the detection of fraud. One important reason is the lack of transaction data which is often sensitive. To address this problem we present a mobile money Payment Simulator (PaySim) and Retail Store Simulator (RetSim), which allow us to generate synthetic transactional data that contains both: normal customer behaviour and fraudulent behaviour. These simulations are Multi Agent-Based Simulations (MABS) and were calibrated using real data from financial transactions. We developed agents that represent the clients and merchants in PaySim and customers and salesmen in RetSim. The normal behaviour was based on behaviour observed in data from the field, and is codified in the agents as rules of transactions and interaction between clients and merchants, or customers and salesmen. Some of these agents were intentionally designed to act fraudulently, based on observed patterns of real fraud. We introduced known signatures of fraud in our model and simulations to test and evaluate our fraud detection methods. The resulting behaviour of the agents generate a synthetic log of all transactions as a result of the simulation. This i synthetic data can be used to further advance fraud detection research, without leaking sensitive information about the underlying data or breaking any non-disclose agreements. Using statistics and social network analysis (SNA) on real data we calibrated the relations between our agents and generate realistic synthetic data sets that were verified against the domain and validated statistically against the original source. We then used the simulation tools to model common fraud scenarios to ascertain exactly how effective are fraud techniques such as the simplest form of statistical threshold detection, which is perhaps the most common in use. The preliminary results show that threshold detection is effective enough at keeping fraud losses at a set level. This means that there seems to be little economic room for improved fraud detection techniques. We also implemented other applications for the simulator tools such as the set up of a triage model and the measure of cost of fraud. This showed to be an important help for managers that aim to prioritise the fraud detection and want to know how much they should invest in fraud to keep the loses below a desired limit according to different experimented and expected scenarios of fraud. ii to my beloved family: Helena, Isabella and Linnea iii Acknowledgements Many people had contributed to this work. It’s been an honour for me to gain such an incredible and valuable experience during this time with them. First, I would like to thank my main supervisor Dr. Stefan Axelsson, for being directly involved in my work and co-author of my publications. I would like to thank Prof. Bengt Carlsson for his guidance as a senior researcher and immense support to my work. I have to extend my thanks to all doctoral students, professors and researchers in our department (DIDD) at BTH for sharing this amazing journey with me over the last 5 years. Many of my colleagues have actively participated and contributing to my work with their comments to increase the quality of it. Blekinge Institute of Technology has been economically supporting my research all over these 5 years as a PhD student. I have only words of gratitude for them. This thesis work was possible due to the research project "Scalable resource-efficient systems for big data analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden. Finally, I would like to thank my family. They are an important source of inspiration, support and motivation to continue my academic journey. My wife Helena, for her unconditional support and my two daughters, Isabella and Linnea. I extend my thanks to Margareta, Tommy, Stefan and Karolina who are my Swedish family. I have the biggest gratitude to my parents Jesus and Soledad who are always there for me. My sister Paula and my other relatives that came to visit me all the way from Colombia and brought part of my native country to my new home country Sweden. Edgar Lopez September 2016, Karlskrona, Sweden. v Preface This thesis is based on the work presented in the following six papers. The papers I, III, IV and V are published in peer-reviewed conference proceedings. Paper II is published in a journal and Paper VI is been submitted to a journal and is currently under peer-reviewing. The included papers have been modified to fit this format, but the content is unchanged. Paper I E. A. Lopez-Rojas, S. Axelsson, and D. Gorton. “RetSim: A Shoe Store Agent-Based Simulation for Fraud Detection”. In: The 25th European Modeling and Simulation Symposium (2013). (Best Paper Award) Paper II E. A. Lopez-Rojas, D. Gorton, and S. Axelsson. “Using the RetSim simulator for fraud detection research”. In: International Journal of Simulation and Process Modelling 10.2 (2015), p. 144 Paper III E. A. Lopez-Rojas and S. Axelsson. “Social Simulation of Com- mercial and Financial Behaviour for Fraud Detection Research”. In: Advances in Computational Social Science and Social Simulation. Barcelona, Spain, 2014 Paper IV E. A. Lopez-Rojas. “Extending the RetSim Simulator for Estimating vii the Cost of fraud in the Retail Store Domain”.