Arxiv:2101.04285V1 [Cs.LG] 12 Jan 2021 Ever, Declined Transactions Also Contain Risk Indicators and 1 Introduction Can Be Utilized in an Unsupervised Setting
Total Page:16
File Type:pdf, Size:1020Kb
Explainable Deep Behavioral Sequence Clustering for Transaction Fraud Detection Wei Min1, Weiming Liang1, Hang Yin1, Zhurong Wang1, Mei Li1, Alok Lal2 1eBay China 2eBay USA Abstract and is harder to fabricate, therefore brings opportunities to boost the capability of risk management. Recently, with the In e-commerce industry, user-behavior sequence data has booming of deep learning, there is a growing trend to lever- been widely used in many business units such as search age user behavioral data in risk management by learning and merchandising to improve their products. However, it is rarely used in financial services not only due to its 3V charac- the representation of click-stream sequence. For example, teristics – i.e. Volume, Velocity and Variety – but also due to e-commerce giants such as JD, Alibaba use recurrent neu- its unstructured nature. In this paper, we propose a Financial ral network to model the sequence of user clicks for fraud Service scenario Deep learning based Behavior data repre- detection(Wang et al. 2017; Li et al. 2019), and Zhang et.al. sentation method for Clustering (FinDeepBehaviorCluster) (Zhang, Zheng, and Min 2018) use both convolution neu- to detect fraudulent transactions. To utilize the behavior se- ral network and recurrent neural network to learn the em- quence data, we treat click stream data as event sequence, bedding of the click stream in online credit loan application use time attention based Bi-LSTM to learn the sequence em- process for default prediction. bedding in an unsupervised fashion, and combine them with However, the common practice for risk management is intuitive features generated by risk experts to form a hy- to use a predictive framework, which is largely relying on brid feature representation. We also propose a GPU powered HDBSCAN (pHDBSCAN) algorithm, which is an engineer- feedback that is often lagged. According to Gartner Re- ing optimization for the original HDBSCAN algorithm based search, ”By 2021, 50% of enterprises will have added un- on FAISS project, so that clustering can be carried out on supervised machine learning to their fraud detection solu- hundreds of millions of transactions within a few minutes. tion suites”, quoted from Begin Investing now in Enhanced The computation efficiency of the algorithm has increased Machine Learning Capabilities for Fraud Detection. Unsu- 500 times compared with the original implementation, which pervised methods, especially clustering techniques are better makes flash fraud pattern detection feasible. Our experimen- suited to discover new types of unseen fraud. tal results show that the proposed FinDeepBehaviorClus- ter framework is able to catch missed fraudulent transactions 1. Fraud is a rare event, outlier detection framework pro- with considerable business values. In addition, rule extrac- vides a different angle to catch bad users that were missed tion method is applied to extract patterns from risky clus- by existing classification models; ters using intuitive features, so that narrative descriptions can 2. Fraud is dynamic, supervised predictive learning can only be attached to the risky clusters for case investigation, and help us learn existing fraud patterns, but unsupervised unknown risk patterns can be mined for real-time fraud de- tection. In summary, FinDeepBehaviorCluster as a comple- clustering is more capable of discovering unknown pat- mentary risk management strategy to the existing real-time terns; fraud detection engine, can further increase our fraud detec- 3. Risk predictive models are usually trained on labeled data, tion and proactive risk defense capabilities. with a performance tag from approved transactions. How- arXiv:2101.04285v1 [cs.LG] 12 Jan 2021 ever, declined transactions also contain risk indicators and 1 Introduction can be utilized in an unsupervised setting. User behavior analysis provides new insights into con- Therefore, clustering techniques are effective complemen- sumers’ interactions with a service or product, many busi- tary solutions to the existing risk predictive models. How- ness units of e-commerce platforms rely on user behaviors ever, it can be argued that the outcome (the membership of heavily and to a great extent. For instance, search and mer- data points) of the clustering task itself does not necessarily chandise are heavily driven by stochastic behaviors of users. explicate the intrinsic patterns of the underlying data. From However, user behavioral data is unstructured and sparse, an intelligent data analysis perspective, clustering explana- it is rarely used in traditional financial services. User be- tion/description techniques are highly desirable as they can havior describes the unique digital signature of the user, provide interesting insights for pattern mining, business rule extraction and domain knowledge discovery. Copyright © 2021, Association for the Advancement of Artificial By combining the advantages of utilizing behavior se- Intelligence (www.aaai.org). All rights reserved. quential data and clustering techniques, we propose a frame- Density-Based Clustering distinctive clusters work called FinDeepBehaviorCluster: firstly, we use time- in the data, based on the idea that a cluster in a data space is a contiguous Cluster region of high point density, separated from attention based deep sequence model to learn behavior se- other such clusters by contiguous regions of Partitional algorithms typically Te chnique s low point density quence embedding in an unsupervised fashion, and com- determine all clusters at once. Hierarchical algorithms find bine them with intuitive features from risk experts to form successive clusters using a hybrid behavior representation; secondly,we use HDB- Partition Hierarchical previously established clusters Density-Based SCAN to perform clustering on behavior features, to im- Clustering clustering Clustering prove the computational efficiency, we propose a GPU ac- DBSCAN HDBSCAN celerated version of HDBSCAN (Leland McInnes 2016) Centroid Top Down: Bottom Up: called pHDBSCAN ; thirdly, risky clusters are extracted and Division Agglomerative K-means clustering explanation techniques are used to describe the Agglomerative Hierarchical clusters in conditional statements. We will give a detailed Clustering explanation of the algorithm in Section 3. Figure 1: Commonly used Cluster Techniques To summarize, our key contributions are: • An automatic clustering based fraud detection frame- work utilizing behavioral sequence data, called FinD- 2.2 Clustering Techniques eepBehaviorCluster. Based on experimental results, our There are commonly used clustering techniques such as proposed framework can catch fraudulent transactions partitioning clustering (K-means), hierarchical clustering missed by existing predictive risk models and signifi- (Top Down: Division, and Bottom up: Agglomerative) and cantly reduce the transaction loss. Density-Based clustering (DBSCAN(Ester et al. 1996)), il- • Engineering Excellence: To address the challenge of clus- lustrated in Figure 1. The limitations of K-means clustering tering on industry-scaled data sets, we have a new imple- include: need a pre-defined k, badly impacted by outliers, mentation of GPU powered HDBSCAN (pHDBSCAN) sensitive to initial seeds and not suitable for special data which is several orders of magnitude faster on tens of mil- structures. Hierarchical clustering has lower efficiency, as lions of transactions. it has a time complexity of O(n3), and meanwhile is sen- sitive to outliers and noise. DBSCAN doesn’t perform well when the clusters are of varying densities because the ep- 2 Related Work silon parameter serves as a global density threshold which In this section, several key research areas related to our work can cause challenges on high-dimensional data. Recently, are reviewed. Hierarchical Density-based spatial clustering of applications with noise(HDBSCAN) has become more widely used in 2.1 Click Stream Data for Fraud Detection various industries. It better fits real life data which contains irregularities, such as arbitrary shapes and noise points. The Zhongfang et.al. (Zhongfang Zhuang 2019) proposed a algorithm provides cluster trees with less parameters tuning, framework to learn attributed sequence embedding in an un- and is resistant to noise data. However, the computational supervised fashion, where they used encoder-decoder set- cost has limited its application in large scale data sets. Espe- ting to define the attribute network, used sequence predic- cially in transaction risk domain, high performance cluster- tion setting to define the sequence network, then learned the ing can reduce the decision time and mitigate the loss. embedding by training the integrated network, which set up As mentioned earlier, clustering algorithms lead to clus- a core foundation for user behavior analysis. Longfei et.al. ter assignments which are hard to explain, partially be- (Li et al. 2019) proposed a unified framework that combined cause the results are associated with all the features in a learned embedding from users’ behaviors and static profiles more complicated way. While, explainable AI is a must-have altogether to predict online fraudulent transactions in a su- in financial services, which can increase transparency and pervised fashion. Recurrent layers were used to learn the trust-worthiness of the black-box models. As a best prac- embedding of dynamic click stream data. Their proposed tice in clustering tasks, decision rule generation method is model managed to boost the benchmark