International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue VII, July 2018 | ISSN 2321–2705 Enabled Fraud Detection in Credit Card Transactions

Reshma R S# #Research Scholar, Department of Mechanical Engineering APJ Abdul Kalam Technological University, College of Engineering Trivandrum, Kerala, India

Abstract-Fraudulent activities exist in all fields of financial (AUC). One with higher AUC value is considered to be a domain; credit card is not an exception. The increased fraud in better model. credit card transaction is obviously due to its wide spread popularity since the introduction of online banking and e- A. Literature Review commerce platforms. The purpose of the fraud may be to obtain goods without paying, or to obtain unauthorized funds from an [1] discussed credit card fraud detection using account. We cannot rely on pattern based fraud detection since and restricted . In this work three credit the fraudster rarely follows any pattern. As technology advances card fraud data, German , European and Australian data were the fraudulent activities will also increase. The project aims to used . The models were built and roc were plotted. It is shown introduce deep learning technique to detect fraud in credit card that the European credit card data shows greater roc curve transactions. Deep learning consists of neural networks with compared to other, both on autoencoder and restricted many hidden layers. The project uses deep learning techniques Boltzmann machine model. This is mainly due to the huge such as Autoencoder, Restricted Boltzmann Machine, number of data points available for this model. Hence for this Variational Autoencoder and Deep Belief Network to detect work European credit card data is used. The work also shows fraud in the credit card transactions. The aim is to find the best technique among them. Unsupervised deep learning method is that the tanh function work well with autoencoder based used to evaluate fraud since the model learn from data and not model so tanh based function is used as the activation function from labels. both in hidden layer, encoder and decoder section to improve the accuracy of the function. The data is divided in the ratio Keywords: , Deep Learning, Autoencoder, 80: 20 since the amount of fraud is much lesser so this method Variational Autoencoder, Restricted Boltzmann Machine, Deep is adapted to this work too. The work shows that deep belief network learning is the most efficient method for fraud detection. H2O framework was used by the model to understand the pattern in I. INTRODUCTION the underlying dataset. H2O is an open source available for echnology is advancing day by day and this improvement analysing big data. Multi-layered, feed forward neural T in technology has significant influence on various areas network is used by the author to find patterns in credit card including financial transactions. Nowadays credit card is the fraud. [2] Discussed the comparison of available techniques most widely used transaction method, this increased for fraud detection and their demerits. The paper points out popularity of credit card is mainly due to the ease of that simple can be using in case of linear transaction, the increasing popularity of network banking and data and it will fail if the data is non linear. In case of decision mobile banking. Since technology is a two sided coin, there is tree even small changes in the data can affect the structure of also a great improvement in the type of fraud that can occur. the tree. Choosing splitting criteria is another demerit since its Fraudster keeps on changing his fraud pattern and rarely uses selection is too complex. It cannot handle real time data too. same pattern for fraudulent activities. This shows us that From [1] the project adopts the following features that are simple fraud detection method based on labelled data cannot mentioned below. The project aims to improve the fraud work well with all type of fraud. Its high time we should think detection model mentioned in the work by adding variational about unsupervised learning method where we are not autoencoder and deep belief network to it. The tanh activation bounded to the limitations of labelled data. The paper aims to function is used in both hidden and visible layers for getting develop unsupervised deep learning models to detect fraud in better AUC value. The method of finding AUC for credit card transactions. Credit card fraud increases day by determining efficiency of the model is also adopted from the day. The way by which fraud occur rarely follows pattern. work. Above all from the three dataset analysed in the work Unsupervised deep learning method can be used to detect the project choose European credit card fraud data since the fraud in financial transactions. The paper aims to develop four work proves that this is the best data set available for fraud unsupervised deep learning methods to detect anomaly in detection with more data points. The only disadvantage of the credit cards. The data belongs to a European credit card data data is that the data is PCA transformed with unknown which consist of about 284,807 transactions out of which only parameters so the attributes that makes the transaction fraud is few are fraud. The aim is to detect the fraudulent transactions still unknown and we cannot validate the model using other and find out the area under receiver operation characteristic data since we have to mask the data with same parameter www.rsisinternational.org Page 111

International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue VII, July 2018 | ISSN 2321–2705 which is unknown. The advantage of the dataset is that since also contains an input, hidden and output layer. The aim of the data set is already PCA transformed we don’t need to each AE model is to reproduce the input at its output with mask the data. The dataset consist of about 2 lakh transactions minimal or zero error. During each step, reconstruction error out of which only few are fraud, this is equivalent to the real is measured and a threshold for the reconstruction error is set case problems where the fraud cases are fewer than the actual so that the model learns to reproduce with minimal error. cases. An autoencoder consist of encoders, lots of hidden layer B. Data and Methodology and decoders. The input is compressed when it traverse through the hidden layer, where the code which helps in the The datasets is credit card fraud data from a European credit regeneration of input at the output layer lies. The autoencoder card company, it is obtained from kaggle. The data contains model in this project is created using tanh fuction. Let x be transactions made by credit cards holder in 2013 September. the input of the function, h the hidden layer and x’ be the Transactions that had happened in two days is shown in the output function. The model created includes 4 layers, two dataset, It is given that the data contains only 492 frauds out encoders and two decoders. The input to the input layer is the of 284,807 transactions which accounts only 0.172 % of all input dimension of the data set. The activation function is transactions. All input variables are converted to numerical represented by a(x). In the model developed, the activation value by PCA transformation due to confidentiality issues, the function used in tanh. The equation for the encoder layer is original features are not provided and Features V1, V2, ... V28 given below. are the principal components obtained after PCA transformation , the features that are not PCA transformed are i(x) (Encoder) = f(a(x)) = tanh (Wx + b) ’Time’ and ’Amount’. Time represents the difference in The input is given to the encoder model where it is seconds between the particular transaction and first multiplied by the bias function It is then compressed in the transaction. ’Amount’ represents the money transaction that hidden layer. The input is reconstructed from its reduced form had occurred, Feature ’Class’ is a variable that shows whether by using the transpose of the activation function in the hidden a transaction is fraudulent or not value 1 shows that the layer and decoder layers. By reducing the mean square error transaction is fraud else it is non fraudulent. The fig 3.8 shows the output model is optimised ie, The model contains 4 layers the distribution of data on the data set. From this we could which contains neuron numbers 29, 7,7, 29. One of the understand that the fraud data is much lesser than the non peculiarities of the model is that autoencoder wants the fraudulent data. When time is plotted against transaction it neurons in the input and output layers to be equal. The first doesn’t show any abnormality so it is proven that we can do two layers are the encoders and the last two layers are the nothing with the time and only the data matters. So we should decoders. The model is trained for 100 epochs and batch size develop a model that should learn from the data. The of 32 and save the best performing model to a file. The Model screenshot of a part of the data is shown below. Checkpoint in Keras is used for this task . The accuracy of the model is determined by measuring the AUC score of the model. ROC curves are very useful tool for understanding the performance of unbalanced data. The true positive rate Vs false positive rate over different threshold values is plotted using ROC. The line should be as closer as possible to the upper left corner which defines high accuracy. The ROC curve with high area denotes high precision which is a sign of low false positive rate, and high recall is related to a low false negative rate. High score in AUC can be obtained if the model generates best output ie, detect all fraud cases without any failure.

2. Restricted Boltzmann Machine (RBM) Figure 1. Dataset Restricted Boltzmann Machine Restricted Boltzmann C. Models Developed machine can be used in regression, , classification, etc. RBMs have two layers of neural networks 1. Autoencoder Model (AE) that is used as layers of DBMs. RBM have an input layer which is also called visible layer. Hidden layer represents the Autoencoder (AE) tries to approximate the following second layer. There is no output layer for the model since the identity function: transformed input at the hidden layer is feeded as input F(x) ≈ X AE is that type of neural network which is teached to produce the input at the output layer. Like any other neural network it www.rsisinternational.org Page 112

International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue VII, July 2018 | ISSN 2321–2705

of all the data is splitted into training and validation sets, and train the model on training set, while evaluate the performance on validation. Starting with the hidden layer which have dimensions smaller than the input layer, the learning rate is set to be a small value (such as 0.001), the validation data set is monitored for reconstruction error (not the actual error) The reconstruction error is basically the mean of square of the difference between predicted â and the Figure 2 : Layers of RBM actual data x, averaged over the entire batch. in the second cycle. Data travels from visible layer to hidden 3 Variational Autoencoder layer and vice versa to obtain minimal reconstruction error. A variational autoencoder (VAE) is obtained by combining a series of . Here the input variable is converted to a latent representation and that variable is further converted to the output. The advantage of the model is that the underlying pattern of the data can be obtained. While using generative models a random new output which looks similar to the training data is required and this can be achieved with VAE. It is better to work with VAE than Standard autoencoders when the data is to be changed in a desired way or random way. The model learns to build Figure 3: Working of RBM compact representations and reconstructs the inputs well. Each circles in the figure 2 denotes nodes, and calculations The basic problem for generation in a basic take place there. Nodes are connected across layers, but two autoencoder is that the latent space converts the inputs to Same nodes are not connected, The communications where the encoded vectors lies, It is not necessary that the between each layers are restricted in RBM. The input is data required is continuous. On the autoencoder, for MINST processed to generate outputs and the output is given to the data each model required distinct encodings type which hidden layer which re-transmits the data to the input layer to makes it easier for decoding them. This is ok if the obtain the exact representation of the input with minimum replication of input is just to be done. Since a generative reconstruction error on the input side of RBM. Here bias model is required, random sampling of latent function for each layers are initialized randomly. In majority of cases the hidden and visible layers are binary valued but Space using a continuous data is required. If the space is for fraud detection it is better to use the Gaussian function discontinuous the decoder will generate an output which is for fraud detection. As shown in the figure 3 whenever the unrealistic since the decoder has no idea to deal with the input signal traverse from input to the ouput layer the data latent space. One of the unique properties that separate will be multiplied by the weight w, and will get added by the autoencoders from VAE is that their latent spaces are bias function b, of the hidden layer. At last the output is continuous which allows random sampling and squezed between zero and one using the sigmoid fuction interpolation. which shows the probability of each hidden layer being 4 Deep Belief Network used. The data is splitted in the ratio 80:20 , and will train the model on the training set. Figure 3 shows Reconstruction A Deep Belief Network (DBN) aims to learn the structure of of RBM, Restricted boltzmann machine is an energy based the input dataset and features detector layers that can be one model. The Energy function E(x,h) is given below. or more than one. A DBN is a combination of that can either be directed or undirected. An Energy function : undirected RBM constitutes the top layer of the network E(x,h) = −hTWx−cTx−bTh where as lower layers are downwardly directed. Deep Belief Networks (DBN) are formed by stacking RBMs in greedy = X j Xk Wj,khjxk −Xkckxk −Xjbjhj manner. The structure of the training data is extracted deeply Distribution : p(x,h) = exp(−E(x,h)) / Z using DBMs.With RBMs as the building blocks; layer-wise training in unsupervised manner can be applied to DBNs the Where Z is the partition function and p(x,h) is the joint steps are as follows probability distribution. x represents visible layer and hidden layer is represented using ’h’. From the energy 1. The input is given to the first RBM layer which act as the function we can derive the value of free energy of x. Free input layer. Energy; F(x) is what we need for test data from which 2. The data in the first layer is processed and the underlying distribution to detect anomalies is obtained. The greater the pattern is analyzed. free energy, the greater the chance for x to be a fraud. First www.rsisinternational.org Page 113

International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue VII, July 2018 | ISSN 2321–2705

3. The second layer is trained as another RBM and the output The ROC curve plotted gave an Area of 0.9626. This is given obtained is given to the input of the other layer in the figure 4. The reconstruction error is also plotted it is shown on figure 5. The AUC value is much closer to one 4. Steps second and third can be repeated in required number which states that the model can work well with the data and of times. Here the number of hidden layer chosen is 3 will generate minimum reconstruction error. The threshold for 5. The data is then pre-processed using logistic regression to reconstruction error is set close to zero so the model will obtain the output generate greater output. 6. The output obtained is the result required 2 Restricted Boltzmann Machine Fine-tuning can be done to improve the result using logistic The result obtained when the model is developed using regression unsupervised training strategy used to determine restricted botzmann machine is given below. Here we the weight and bias functions. obtained the area under ROC curve to be 9.600.

II. RESULTS AND DISCUSSION Deep learning based unsupervised learning models were developed using autoencoder, variational autoencoder, restricted boltzmann machine and deep belief network. The result is plotted by using area under ROC curve (AUC). ROC is the receiver operation characteristics. This method is used to determine the precision of the model since the data is highly imbalanced. 1. Autoencoder Model Fig: 6 : Confusion Matrix of RBM model Autoencoder model was developed in python using keras and tensor flow and the following results were obtained. The confusion matrix showing the normal and fraudulent behaviour is given below.

Fig: 7 : ROC curve of RBM model 3. Variational Autoencoder Model Variational autoencoder is the new model introduced in the project. As explained earlier the result is expected to be more than that of the other two model including RBM and autoencoder. Variational autoencoder is a variation of the Autoencoder model which can deal with more unbalanced Fig:4 : Confusion Matrix of AE model data than the former. The result obtained and the ROC curve obtained are given below which shows that the result obtained is the greatest. So this model can be adapted to other fraud detection models.

Fig: 5 : ROC curve of Autoencoder Model Fig: 8 : ROC curve of Variational Autoencoder Model www.rsisinternational.org Page 114

International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue VII, July 2018 | ISSN 2321–2705

For the fraud detection model developed variational autoencoder generated a larger AUC value of 0.9655 followed by deep belief network, autoencoder and restricted Boltzmann machine. If the data set is one in a real scenario the AUC score obtained would be higher than this data. For the same dataset different model generated different AUC curve . Cascading auto encoder to form variational autoencoder proves a better alternative.

AUC score Models AUC Value Autoencoder 0.9626 Variational Autoencoder 0.9655 Restricted Boltzmann Machine 0.9600 Fig: 9 : Confusion Matrix of Variational Autoencoder Model Deep Belief Network 0.9631 4. Deep Belief Network Deep belief network (DBN) is many layered restricted III. CONCLUSIONS boltzmann machine. The output obtained is shown in the The credit card fraud detection model using fig:10. The model produced an AUC value of 0.9631. Autoencoder, Variational Autoencoder, Restricted Boltzmann machine and Deep Belief Network were succesfully implemented and ROC curve were plotted. The result obtained in case of the variational autoencoder is higher than the others. This fraud detection model is trained using the European credit card fraud data. The data availability in case of credit card fraud detection is scarce. Since model learn from data and not from labels it can be transfered to other data set too. ROC curves were plotted and AUC was taken as accuracy criterion since the data is highly imbalanced. The variational autoencoder model created a greater AUC of 0.9655 which is greater than the other model used. If the model is used with real time data the AUC value could be improved. ACKNOWLEDGMENT Author is thankful to the faculty at College of Engineering, Trivandrum for guidance and technical support. Fig: 10 : Confusion Matrix of Deep Belief Network REFERENCES [1]. Pumsirirat,A.andL.Yan(2018). Credit card fraud detection using deep learning based on auto-encoder and restricted boltzmann machine. (IJACSA) International Journal of Advanced Computer Science and Applications 9, 18–25. [2]. Malini, N. and M. Pushpa (2017). Analysis on credit card fraud identification techniques based on knn and outlier detection. 3rd International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics(AEEEICB17) 978 1(4), 1–4. [3]. https://www.kaggle.com/mlg-ulb/creditcardfraud [4]. https://weiminwang.blog/2017/08/05/credit-card-fraud-detection- 2-using-restricted-boltzmann-machine-in-tensorflow

Fig: 11 : ROC curve of Deep Belief Network www.rsisinternational.org Page 115