Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection∗

Hanghang Tong
IBM T.J. Watson Research Center
[email protected]

Ching-Yung Lin
IBM T.J. Watson Research Center
[email protected]

Abstract

Given an IP source-destination traffic network, how do we spot misbehaving IP sources (e.g., port scanners)? How do we find strange users in a user-movie rating graph? Moreover, how can we present the results intuitively, so that it is relatively easier for data analysts to interpret?

We propose NrMF, a non-negative residual matrix factorization framework, to address such challenges. We present an optimization formulation as well as an effective algorithm to solve it. Our method can naturally capture abnormal behaviors on graphs. In addition, the proposed algorithm is linear wrt the size of the graph, and is therefore suitable for large graphs. The experimental results on several data sets validate its effectiveness as well as efficiency.

1 Introduction

Graphs appear in a wide range of settings, e.g., social networks, computer networks, user-movie rating graphs in collaborative filtering, the world wide web, biological networks, and many more. How can we find patterns, e.g., communities and anomalies, in a large sparse graph?

Naturally, low-rank approximations on the adjacency matrices of the graph provide powerful tools to answer the above questions. Formally, let A be the adjacency matrix of the graph; a rank-r approximation of matrix A is a matrix Ã, where Ã is of rank r and the residual matrix (A − Ã) has small norm. The low-rank approximation is usually presented in a factorized form, e.g., A = Ã + R = FG + R, where F and G are the factorized matrices of rank r, and R is the residual matrix. The factorized matrices F and G can naturally reveal the community structure in the graph. The residual matrix R, on the other hand, is often a strong indicator for anomalies on graphs (e.g., a large norm of the residual matrix R suggests a significant deviation from low-rank structure in the graph).

From the algorithmic aspect, a recent trend in matrix factorization is to improve the interpretability of such graph mining results. To name a few, non-negative matrix factorization methods [31] restrict the entries in F and G to be non-negative; example-based methods [15] generate sparse decompositions by requiring the columns of the matrix F to be actual columns of the original matrix A; etc. By imposing such non-negativity and/or sparseness constraints on the factorized matrices, it is relatively easier to interpret the community detection results. Indeed, it is now widely realized that non-negativity is a highly desirable property for interpretation, since negative values are usually hard to interpret. However, most, if not all, of these constraints (i.e., non-negativity, sparseness, etc.) are imposed on the factorized matrices. Consequently, these existing methods are tailored for the task of community detection. It is not clear how to improve the interpretability for the task of anomaly detection from the algorithmic aspect. Can we impose similar constraints (e.g., non-negativity) on the residual matrix R to improve the interpretability for graph anomaly detection?

From the application side, it is often the case that anomalies on graphs correspond to some actual behaviors/activities of certain nodes. For instance, we might flag an IP source as a suspicious port scanner if it sends packets to a lot of destinations in an IP traffic network [37]; an IP address might be under a DDoS (distributed denial-of-service) attack if it receives packets from many different sources [37]; a person is flagged as 'extremely multi-disciplinary' if s/he publishes papers in many remotely related fields in an author-conference network [2]; in certain collusion-type fraud in a financial transaction network, a group of users always gives good ratings to another group of users in order to artificially boost the reputation of the target group [9]; etc. If we map such behaviors/activities (e.g., 'sends/receives packets', 'publishes papers', 'gives good ratings', etc.) to the language of matrix factorization, it also suggests that the corresponding entries in the residual matrix R should be non-negative.

∗ Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

In response to such challenges, in this paper we propose a new matrix factorization (NrMF) for the task of graph anomaly detection. To the best of our knowledge, we are the first to address the interpretability issue of matrix factorization for the purpose of graph anomaly detection. The major contributions of the paper can be summarized as follows:

1. Problem formulation, presenting a new formulation for matrix factorization (NrMF) tailored for graph anomaly detection;
2. An effective algorithm (AltQP-Inc) to solve the above optimization problem, linear wrt the size of the graph;
3. Proofs and analysis, showing the effectiveness as well as the efficiency of the proposed method;
4. Experimental evaluations, demonstrating both the effectiveness and efficiency of the proposed method.

The rest of the paper is organized as follows: we introduce notation and formally define the problem (NrMF) in Section 2. We present and analyze the proposed solution for NrMF in Section 3 and Section 4, respectively. We provide experimental evaluations in Section 5. The related work is reviewed in Section 6. Finally, we conclude in Section 7.

2 Problem Definitions

Table 1: Symbols

Symbol      | Definition and Description
A, B, ...   | matrices (bold upper case)
A(i, j)     | the element at the ith row and jth column of matrix A
A(i, :)     | the ith row of matrix A
A(:, j)     | the jth column of matrix A
A′          | transpose of matrix A
a, b, ...   | column vectors (bold lower case)
F, G        | factorized matrices of A
R           | residual matrix of A
n           | number of type 1 objects in A
l           | number of type 2 objects in A
m           | number of edges in A
r           | rank size

Table 1 lists the main symbols we use throughout the paper. In this paper, we consider the most general case of bipartite graphs. We represent a general bipartite graph by its adjacency matrix¹. Following standard notation, we use capital bold letters for matrices (e.g., A) and lower case bold letters for vectors (e.g., a). We denote the transpose with a prime (i.e., A′ is the transpose of A). We use subscripts to denote the size of matrices/vectors (e.g., A_{n×l} means a matrix of size n × l). When the size of a matrix or a vector is clear from the context, we omit such subscripts for brevity. Also, we represent the elements in a matrix using a convention similar to Matlab: e.g., A(i, j) is the element at the ith row and jth column of the matrix A, A(:, j) is the jth column of A, etc.

¹ In practice, we store these matrices using an adjacency list representation, since real graphs are often very sparse.

With the above notation, a general matrix factorization problem can be formally defined as follows:

PROBLEM 1. Matrix Factorization
Given: A graph A_{n×l}, and the rank size r;
Find: Its low-rank approximation structure. That is, find two factorized matrices F_{n×r} and G_{r×l}, and the residual matrix R_{n×l}, such that (1) A_{n×l} ≈ F_{n×r}G_{r×l}, and (2) R_{n×l} = A_{n×l} − F_{n×r}G_{r×l}.

Existing matrix factorization techniques can be viewed as different instantiations of Problem 1. They differ from each other mainly in the following two aspects: (1) the metric used to measure the approximation accuracy (some norm of the residual matrix R); and (2) the constraints imposed on the factorized matrices F and G. For example, non-negative matrix factorization requires the factorized matrices to be non-negative (see Section 6 for a review).

In this paper, we present another instantiation of Problem 1 by imposing non-negativity constraints on the residual matrix R. Our problem, Non-Negative Residual Matrix Factorization (NrMF), is formally defined as follows:

PROBLEM 2. Non-Negative Residual Matrix Factorization (NrMF)
Given: A graph A_{n×l}, and the rank size r;
Find: Its low-rank approximation structure. That is, find two factorized matrices F_{n×r} and G_{r×l}, and the residual matrix R_{n×l}, such that (1) A_{n×l} ≈ F_{n×r}G_{r×l}; (2) R_{n×l} = A_{n×l} − F_{n×r}G_{r×l}; and (3) for all A(i, j) > 0, R(i, j) ≥ 0.

Problem 2 is tailored for the task of graph anomaly detection, where we explicitly require the corresponding elements R(i, j) in the residual matrix R to be non-negative if there exists an edge between node i and node j in the original graph (i.e., A(i, j) > 0). As explained earlier in Section 1, the residual matrix R is often a good indicator for anomalies on graphs. Moreover, many abnormal behaviors/activities (e.g., port scanning, DDoS, etc.) can be mapped to non-negative entries in the residual matrix R. For instance, a large entry in R might indicate a strange interaction between two objects; a heavy row/column of R might indicate a suspicious object (e.g., a port scanner, or an IP address that is under DDoS attack). In NrMF, we aim to capture such abnormal behaviors/activities by explicitly imposing non-negativity constraints on the residual matrix R. Moreover, NrMF directly brings non-negativity, an interpretation-friendly property, to the task of graph anomaly detection, since negative values are usually hard to interpret. For example, with existing matrix factorization methods, the data analyst has to look at a (somewhat abstract) residual matrix, which contains both positive and negative entries, and calculate the reconstruction errors to spot anomalies. In contrast, thanks to the non-negativity constraints in NrMF, we can present the residual matrix itself as a residual graph, which might be more intuitive for the data analyst to interpret.
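To make the contrast concrete, the following is a minimal NumPy sketch (ours, not part of the original paper) that builds a rank-r approximation of a toy adjacency matrix via truncated SVD and checks the residual on observed edges; the toy matrix and variable names are illustrative assumptions only.

import numpy as np

# Toy 4x5 bipartite adjacency matrix (weighted edges).
A = np.array([[3., 2., 0., 0., 0.],
              [2., 3., 0., 0., 0.],
              [0., 0., 4., 5., 1.],
              [0., 0., 5., 4., 0.]])

r = 1  # rank size

# Rank-r truncated SVD: A ~ F G with F = U_r S_r, G = V_r'.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
F = U[:, :r] * s[:r]          # n x r
G = Vt[:r, :]                 # r x l
R = A - F @ G                 # residual matrix

# Residual entries on observed edges (A(i,j) > 0): SVD gives mixed signs.
obs = A > 0
print("min residual on edges (SVD):", R[obs].min())   # typically negative

# Problem 2 (NrMF) would additionally require R[obs] >= 0:
print("NrMF-feasible:", bool((R[obs] >= -1e-12).all()))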

3 The Proposed Solutions for NrMF

In this section, we formally present our solutions for non-negative residual matrix factorization (NrMF). We first formulate Problem 2 as an optimization problem, and then we present effective algorithms to solve it.

3.1 Optimization Formulations

General Formulation of Problem 2. Formally, Problem 2 can be formulated as the following optimization problem:

(3.1)  argmin_{F,G}  ‖R_{n×l} ⊗ W_{n×l}‖_F²  =  Σ_{i=1}^{n} Σ_{j=1}^{l} (A(i,j) − F(i,:)G(:,j))² W(i,j)²
       s.t.  for all A(i,j) > 0:  F(i,:)G(:,j) ≤ A(i,j)

In eq. (3.1), '⊗' means element-wise multiplication. In other words, here we use a weighted squared Frobenius norm of the residual matrix R to measure the approximation accuracy, through a weight matrix W_{n×l}. For every edge in the graph (i.e., A(i,j) > 0), we require that F(i,:)G(:,j) ≤ A(i,j), which means that the corresponding residual entry R(i,j) should satisfy R(i,j) = A(i,j) − F(i,:)G(:,j) ≥ 0.

0/1 Weight Matrix for eq. (3.1). In eq. (3.1), the weight matrix W reflects the user's preference among all n × l reconstructed entries. In this paper, we focus on a special case of the weight matrix W: W(i,j) = 1 for A(i,j) > 0, and W(i,j) = 0 otherwise. This means that we only measure the element-wise loss on the observed edges; and among all these edges, we treat the element-wise loss equally (referred to as the '0/1 Weight Matrix'). This type of weight matrix is widely used in the literature, especially in the context of collaborative filtering [5, 35].

With such a 0/1 weight matrix, eq. (3.1) can be simplified as:

(3.2)  argmin_{F,G}  Σ_{i,j: A(i,j)>0} (A(i,j) − F(i,:)G(:,j))²
       s.t.  for all A(i,j) > 0:  F(i,:)G(:,j) ≤ A(i,j)

In the rest of this paper, we will focus on eq. (3.2) for clarity. However, we would like to point out that the upcoming techniques can be naturally applied to a general, arbitrary weight matrix W. We present such algorithms in the appendix for completeness.

Rank-1 Approximation for eq. (3.2). In eq. (3.2), if we restrict the rank of the factorized matrices F and G to be 1, we have the following rank-1 approximation of eq. (3.2), where f is an n×1 column vector and g is a 1×l row vector:

(3.3)  argmin_{f,g}  Σ_{i,j: A(i,j)>0} (A(i,j) − f(i)g(j))²
       s.t.  for all A(i,j) > 0:  f(i)g(j) ≤ A(i,j)

3.2 The Proposed Optimization Algorithms

Next, we present our algorithms to solve eq. (3.2). We first analyze the challenges of optimizing eq. (3.2) directly, and then present an incremental alternating optimization strategy.

3.2.1 Challenges

Unfortunately, the optimization problem formulated in eq. (3.2) is not convex wrt F and G jointly, due to the coupling between F and G in both the objective function and the inequality constraints. Therefore, it might be unrealistic to seek a global optimal solution. A natural way to handle this issue is to find F and G alternately. In fact, we can show that if we fix either G or F in (3.2), the resulting optimization problem is a convex quadratic programming problem wrt the remaining matrix (F or G). This suggests the following greedy optimization strategy (referred to as AltQP-Batch; see the appendix for the formal description): after some initialization, we alternately update F and G using convex quadratic programming until convergence. With AltQP-Batch, we can find a local minimum for eq. (3.2), which is acceptable in terms of optimization quality for a non-convex problem. However, most, if not all, existing convex quadratic programming methods are polynomial wrt the number of variables. This makes the overall complexity of AltQP-Batch polynomial, which might not scale very well for large graphs.

To address these challenges, in this paper we propose an effective and efficient algorithm, AltQP-Inc. The basic idea of AltQP-Inc is to find the resulting F and G incrementally: at each iteration, we try to find a rank-1 approximation of the current residual matrix by solving eq. (3.3). As we will show soon, this strategy bears the same greedy nature as AltQP-Batch, so it also leads to a local minimum for eq. (3.2). Yet its time complexity is linear wrt the size of the graph, which makes the algorithm more suitable for large graphs. Next, we first present our algorithm (AltQP-Inc-1) for solving eq. (3.3), and then present our algorithm (AltQP-Inc) for solving eq. (3.2).
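As a reading aid, here is a small Python helper (our illustration; the function name nrmf_objective is hypothetical) that evaluates the objective and the feasibility of eq. (3.2) for candidate factors F and G, touching only the observed edges of a sparse A as the 0/1 weight matrix prescribes.

import numpy as np
from scipy.sparse import coo_matrix

def nrmf_objective(A_coo, F, G, tol=1e-12):
    """Return (loss, feasible) for eq. (3.2): loss sums squared
    residuals over observed edges; feasible checks F(i,:)G(:,j) <= A(i,j)."""
    loss, feasible = 0.0, True
    for i, j, a in zip(A_coo.row, A_coo.col, A_coo.data):
        approx = F[i, :] @ G[:, j]
        loss += (a - approx) ** 2
        if approx > a + tol:        # residual would be negative
            feasible = False
    return loss, feasible

# Tiny usage example with an all-zero (trivially feasible) factorization.
A = coo_matrix(np.array([[3., 0.], [0., 2.]]))
F, G = np.zeros((2, 1)), np.zeros((1, 2))
print(nrmf_objective(A, F, G))   # (13.0, True)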

3.2.2 AltQP-Inc-1: Proposed Algorithm for eq. (3.3)

Again, eq. (3.3) is not convex wrt f and g jointly, due to the coupling between f and g. Therefore, we adopt an alternating strategy: the algorithm iterates as follows until convergence: (1) update f while keeping g fixed; and (2) update g while keeping f fixed.

Formally, let us consider how to update g while keeping f fixed (updating f is similar to updating g). In this case, eq. (3.3) can be further simplified as:

(3.4)  argmin_g  Σ_{i,j: A(i,j)>0} (A(i,j) − f(i)g(j))²
       s.t.  for all A(i,j) > 0:  f(i)g(j) ≤ A(i,j)

It is easy to show that eq. (3.4) is convex wrt g. The proposed algorithm (Update-g) for solving eq. (3.4) is summarized in Alg. 1. At each outer loop of Alg. 1, we update a single entry g(j) (j = 1, ..., l), which is in turn done by a closed formula (steps 19-25). The main difference between Update-g and AltQP-Batch is as follows: in Update-g, g is a row vector and we have a computationally cheap closed formula to solve eq. (3.4). In contrast, in AltQP-Batch we have to call some expensive convex quadratic programming package to find the optimal solution.

Algorithm 1 Update-g (For Solving eq. 3.4)
Input: The original matrix A_{n×l}; and a column vector f_{n×1}
Output: A row vector g_{1×l}
1: for j = 1 : l do
2:   Initialize the lower bound low = −inf, upper bound up = inf, t = 0 and q = 0;
3:   for each i, s.t., A(i, j) > 0 do
4:     Update: q ← q + f(i)A(i, j)
5:     Update: t ← t + f(i)²
6:     if f(i) > 0 then
7:       Update: up = min(up, A(i, j)/f(i))
8:     else if f(i) < 0 then
9:       Update: low = max(low, A(i, j)/f(i))
10:    else
11:      Continue;
12:    end if
13:  end for
14:  if t == 0 then
15:    Set: g(j) = 0;
16:    Continue;
17:  end if
18:  Set: q ← q/t
19:  if q <= up and q >= low then
20:    Output: g(j) = q;
21:  else if q > up then
22:    Output: g(j) = up;
23:  else
24:    Output: g(j) = low;
25:  end if
26: end for

Based on Alg. 1, we present Alg. 2 (Rank-1-Approximation) to solve eq. (3.3): after some initialization (step 1), Alg. 2 alternates between the following two steps until convergence: (1) update g while keeping f fixed by calling Update-g (step 3); and (2) update f while keeping g fixed by calling Update-g (step 4).

Algorithm 2 Rank-1-Approximation (For Solving eq. 3.3)
Input: The original matrix A_{n×l}
Output: A column vector f_{n×1}; and a row vector g_{1×l};
1: Initialize f_{n×1} and g_{1×l};
2: while Not convergent do
3:   Update: g ← Update-g(A, f)
4:   Set: f̃ ← Update-g(A′, g′)
5:   Update: f = f̃′
6: end while

3.2.3 AltQP-Inc: Proposed Algorithm for eq. (3.2)

Based on Alg. 2, our algorithm (AltQP-Inc) for solving the original eq. (3.2) is summarized in Alg. 3. It is an incremental algorithm: at each iteration, it calls Alg. 2 to find a rank-1 approximation of the current residual matrix R (steps 3-4). Notice that since eq. (3.2) is an instantiation of eq. (3.1) using the 0/1 weight matrix, we only need to update the residual entries where there exists an edge in the original graph (i.e., A(i, j) > 0), in steps 5-7.

Algorithm 3 AltQP-Inc (For Solving eq. 3.2)
Input: The original matrix A_{n×l}, and rank size r
Output: An n × r matrix F; an r × l matrix G; and an n × l matrix R
1: Initialize F = 0_{n×r}, G = 0_{r×l}, and R = A
2: for k = 1 : r do
3:   (f, g) ← Rank-1-Approximation(R)
4:   Set F(:, k) = f, and G(k, :) = g
5:   for every (i, j), s.t., A(i, j) > 0 do
6:     Update R(i, j) ← R(i, j) − f(i)g(j)
7:   end for
8: end for

4 Analysis of the Proposed Algorithms

In this section, we analyze the effectiveness as well as the efficiency of the proposed algorithms. Our main results are (1) the proposed algorithms find (at least) a local optimal solution for the corresponding optimization problems; and (2) the complexity of the proposed algorithms is linear in both time and space.

4.1 Effectiveness of the Proposed Algorithms

The effectiveness of the proposed algorithms is summarized in Lemma 4.1, which basically says that the proposed AltQP-Inc finds a local minimum of eq. (3.2). Given that the optimization problem in eq. (3.2) is not convex wrt F and G jointly, such a local minimum is acceptable in terms of the optimization quality.

LEMMA 4.1. Effectiveness. (P1) Update-g in Alg. 1 gives the global optimal solution for the optimization problem in eq. (3.4); (P2) Rank-1-Approximation in Alg. 2 finds a local minimum of the optimization problem in eq. (3.3); and (P3) AltQP-Inc in Alg. 3 finds a local minimum for the optimization problem in eq. (3.2).

Sketch of Proof: For brevity, we only give the proof for (P1), since (P2) and (P3) are relatively straightforward based on (P1).

Here, the key point is that eq. (3.4) can be decomposed into the following l independent optimization problems, each of which only involves a single variable g(j) (j = 1, ..., l):

(4.5)  For j = 1, ..., l:
       argmin_{g(j)}  Σ_{i: A(i,j)>0} (A(i,j) − f(i)g(j))²
       s.t.  for all A(i,j) > 0:  f(i)g(j) ≤ A(i,j)

For a given j, eq. (4.5) is equivalent to²:

(4.6)  argmin_{g(j)}  g(j)² − 2q·g(j)
       s.t.  low ≤ g(j) ≤ up
       where:  q   = (Σ_{i: A(i,j)>0} f(i)A(i,j)) / (Σ_{i: A(i,j)>0} f(i)²)
               low = max_{f(i)<0, A(i,j)>0} {A(i,j)/f(i)}
               up  = min_{f(i)>0, A(i,j)>0} {A(i,j)/f(i)}

In eq. (4.6), we have a quadratic objective function wrt a single variable g(j), where g(j) has a boundary constraint (low ≤ g(j) ≤ up). It is easy to verify that each outer loop of Alg. 1 gives the global optimal solution for (4.6). Therefore, the whole Alg. 1 gives the global optimal solution for eq. (3.4), which completes the proof. □

² We have dropped a constant term from the objective function since it does not affect the optimal solution.

4.2 Time Efficiency of the Proposed Algorithms

The time complexity of the proposed algorithms is summarized in Lemma 4.2, which basically says that all three proposed algorithms are linear wrt the size of the graph (m, n and l). Therefore, they are scalable for large graphs.

LEMMA 4.2. Time Complexity. (P1) Update-g in Alg. 1 requires O(m + l) time; (P2) Rank-1-Approximation in Alg. 2 requires O(mt + nt + lt) time; and (P3) AltQP-Inc in Alg. 3 requires O(nrt + mrt + lrt) time, where t is the maximum iteration number in Alg. 2.

Proof of P1: The time cost for step 2 of Alg. 1 is O(1). Let m_j be the total number of non-zero elements in the jth column of matrix A; we have Σ_{j=1}^{l} m_j = m. The time cost for steps 3-13 is O(m_j), since we need O(1) operations for each non-zero element in A(:, j). The cost for steps 14-17 is O(1). We need another O(1) time for step 18. Finally, for steps 19-25, we need O(1) time. Therefore, the total cost for Alg. 1 is Σ_{j=1}^{l} (O(1) + O(m_j)) = O(l) + O(Σ_{j=1}^{l} m_j) = O(m + l), which completes the proof. □

Proof of P2: Step 1 in Alg. 2 takes O(l + n) time. Based on (P1), we need O(m + l) and O(m + n) for steps 3 and 4, respectively. We need another O(n) for step 5. Therefore, the overall time complexity of Alg. 2 is O(l + n) + (O(m + l) + O(m + n) + O(n))·t = O(mt + nt + lt), which completes the proof. □

Proof of P3: Step 1 in Alg. 3 takes O(nr + lr + m) time. Let m̃_k be the number of non-zero elements in R in the kth iteration of Alg. 3; we have m̃_1 = m and m̃_k ≤ m (k = 2, ..., r). Based on (P2), we need O(m̃_k·t + nt + lt) for step 3. For step 4, we need O(n + l) time. We need additional O(m) time for updating R (steps 5-7). Putting these together, the overall time complexity of Alg. 3 is O(nr + lr + m) + Σ_{k=1}^{r} O(m̃_k·t + nt + lt + m + n + l) = O(mrt + nrt + lrt), which completes the proof. □

4.3 Space Efficiency of the Proposed Algorithms

The space complexity of the proposed algorithms is summarized in Lemma 4.3, which basically says that for all three proposed algorithms, the space complexity is linear wrt the size of the graph (m, n and l). Therefore, they are scalable for large graphs.

LEMMA 4.3. Space Complexity. (P1) Update-g in Alg. 1 requires O(m + n + l) space; (P2) Rank-1-Approximation in Alg. 2 requires O(m + n + l) space; and (P3) AltQP-Inc in Alg. 3 requires O(m + nr + lr) space.

Proof of P1: In Alg. 1, we need O(m), O(n), and O(l) space to keep the original matrix A, the column vector f, and the row vector g, respectively. All the remaining steps in Alg. 1 require O(1) space each. Among the different iterations of Alg. 1, we can re-use the space from the previous iteration. Therefore, the overall space complexity of Alg. 1 is O(m + n + l), which completes the proof. □

Proof of P2: In Alg. 2, we need O(m) space for the original matrix A. The initialization in step 1 needs O(n + l) space. By (P1), we need O(m + n + l) space for steps 3-4, respectively. Step 5 takes another O(n) space. Among the different iterations of Alg. 2, we can re-use the space from the previous iteration. Therefore, the overall space complexity of Alg. 2 is O(m) + O(n + l) + O(2m + 2n + 2l) + O(n) = O(m + n + l), which completes the proof. □

Proof of P3: In Alg. 3, we need O(m) space for the original matrix A. The initialization in step 1 needs O(nr + lr + m) space. Let m̃_k be the number of non-zero elements in R in the kth iteration; we have m̃_1 = m and m̃_k ≤ m (k = 2, ..., r). Based on (P2), we need O(m̃_k + n + l) space for step 3. Steps 4-7 do not require extra space. Finally, among different iterations, we can reuse the space from the first iteration since m̃_k ≤ m̃_1 = m (k = 1, ..., r). Therefore, the overall space complexity of Alg. 3 is O(m) + O(nr + lr + m) + O(max_k m̃_k + n + l) = O(m + nr + lr), which completes the proof. □
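For concreteness, the following is a compact NumPy transcription of Algs. 1-3 (a sketch under simplifying assumptions: dense arrays instead of the adjacency-list storage the complexity analysis assumes, a fixed iteration count in place of a convergence test, and all-ones initialization, which the paper leaves unspecified).

import numpy as np

def update_g(A, f):
    """Alg. 1 (Update-g): closed-form, per-column solution of eq. (3.4)."""
    n, l = A.shape
    g = np.zeros(l)
    for j in range(l):
        low, up, t, q = -np.inf, np.inf, 0.0, 0.0
        for i in np.nonzero(A[:, j] > 0)[0]:
            q += f[i] * A[i, j]
            t += f[i] ** 2
            if f[i] > 0:
                up = min(up, A[i, j] / f[i])
            elif f[i] < 0:
                low = max(low, A[i, j] / f[i])
        if t == 0:
            continue                   # g(j) stays 0 (step 15)
        q /= t
        g[j] = min(max(q, low), up)    # clip q into [low, up] (steps 19-25)
    return g

def rank1_approximation(A, t_max=20):
    """Alg. 2: alternate Update-g on A (for g) and on A' (for f)."""
    n, l = A.shape
    f, g = np.ones(n), np.ones(l)      # initialization is unspecified in the paper
    for _ in range(t_max):             # simplified convergence test
        g = update_g(A, f)
        f = update_g(A.T, g)
    return f, g

def altqp_inc(A, r, t_max=20):
    """Alg. 3 (AltQP-Inc): r incremental rank-1 fits on the residual."""
    n, l = A.shape
    F, G, R = np.zeros((n, r)), np.zeros((r, l)), A.astype(float).copy()
    mask = A > 0                       # 0/1 weight matrix: observed edges only
    for k in range(r):
        f, g = rank1_approximation(R, t_max)
        F[:, k], G[k, :] = f, g
        R[mask] -= np.outer(f, g)[mask]
    return F, G, R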

5 Experimental Results

In this section, we present experimental evaluations, after we introduce the data sets. All the experiments are designed to answer the following two questions:

• Effectiveness: What kinds of anomalies can the proposed AltQP-Inc detect?
• Efficiency: How fast is the proposed AltQP-Inc? How does it scale?

5.1 Data Sets

We use four different data sets in our experiments, summarized in Table 2.

Table 2: Data sets used in evaluations
Name      | n × l          | m
MIT-DP    | 103 × 97       | 5,449
NIPS-PW   | 2,037 × 13,649 | 1,624,335
CIKM-PA   | 1,895 × 952    | 2,664
MovieLens | 6,040 × 3,952  | 575,281

The first data set (MIT-DP) is from the MIT Reality Mining project³. Rows represent the Bluetooth devices and columns represent the persons. The un-weighted edges represent the scanning activities between the devices and persons. In total, there are 103 devices, 97 persons and 5,449 scanning activities.

NIPS-PW is from the NIPS proceedings⁴. Rows represent papers and columns represent words. Weighted edges represent the count of the words that appear in the corresponding papers. In total, there are 2,037 papers, 13,649 words, and 1,624,335 edges.

CIKM-PA is an author-paper graph constructed from the CIKM proceedings⁵. Rows represent the authors and columns represent the papers. We connect a given paper with all of its co-authors by edges. In total, we have 1,895 authors, 952 papers and 2,664 edges.

MovieLens is a user-movie rating graph⁶. Rows represent users and columns represent movies. If a user has given a positive rating (4 or 5) to a particular movie, we connect them with an edge. Here, the edge weight is the actual rating (4 or 5). In total, we have 6,040 users, 3,952 movies, and 575,281 edges.

³ http://reality.media.mit.edu/
⁴ http://www.cs.toronto.edu/~roweis/data.html
⁵ http://www.informatik.uni-trier.de/~ley/db/conf/cikm/
⁶ http://www.grouplens.org/

5.2 Effectiveness Results

In this paper, we focus on the following four types of anomalies on bipartite graphs:

1. Strange connection (referred to as 'strange connection'). It is a connection between two nodes which belong to two remotely connected communities, respectively. For example, in an author-conference graph, this could be the case that an author publishes a paper in a conference which is remotely related to his/her major research interest (e.g., a systems person publishes a paper in a theory conference) [36].
2. Port-scanning like behavior (referred to as 'port-scan'). It is a type-1 node that is connected to many different type-2 nodes in the bipartite graph. For example, in an IP traffic network, this could be an IP source which sends packets to many different IP destinations (therefore it might be a suspicious port scanner) [37].
3. DDoS like behavior (referred to as 'ddos'). It is a type-2 node that is connected to many different type-1 nodes in the bipartite graph. For example, in an IP traffic network, this could be an IP destination which receives packets from many different IP sources (therefore it might be under a DDoS, distributed denial-of-service, attack) [37].
4. Collusion type of fraud (referred to as 'bipartite core'). It is a group of type-1 nodes and a group of type-2 nodes which are tightly connected with each other. For example, in a financial transaction network, this could be a group of users who always give good ratings to another group of users in order to artificially boost the reputation of the target group [9].

Since we do not have the ground truth for the anomalies, we use the following methodology for evaluation: we randomly inject one of the above anomalies into the original (normal) graph, and see if the proposed algorithm can spot it from the top-k edges of the residual matrix R.

Qualitative Results. Since the residual elements in R produced by the proposed AltQP-Inc are non-negative, we can plot the residual R itself as a residual graph as follows. The residual graph has the same node sets as the original graph A. For each edge (i, j) in A (i.e., A(i, j) > 0), we put an edge between node i and node j in the residual graph if R(i, j) > 0, with weight R(i, j). Compared with the traditional matrix factorization methods (where one has to calculate and look at the abstract re-construction error for anomalies), the residual graph might provide a more intuitive way to spot anomalies on graphs. Figure 1 presents an illustrative example on synthetic graphs. For each sub-figure, we inject one of the four anomalies into the normal graph, and plot the original adjacency matrix (left), the top-k edges in the residual matrix by AltQP-Inc (middle), and the residual matrix by singular value decomposition (SVD) (right).
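The residual-graph construction and the top-k evaluation step described above can be summarized by the short sketch below (our illustration; altqp_inc refers to the sketch after Section 4, and k = 20 is an arbitrary choice).

import numpy as np

def residual_graph_edges(A, R, k=20):
    """Top-k edges of the residual graph: observed edges (A(i,j) > 0)
    ranked by their non-negative residual weight R(i,j)."""
    idx = np.argwhere(A > 0)
    weights = np.array([R[i, j] for i, j in idx])
    order = np.argsort(-weights)[:k]
    return [(int(i), int(j), float(w))
            for (i, j), w in zip(idx[order], weights[order])]

# Usage: F, G, R = altqp_inc(A, r=5); then flag the heaviest residual edges.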

[Figure 1: Anomaly detection on synthetic graphs. Four sub-figures: (a) strange connection, (b) port scanning, (c) ddos, and (d) bipartite core. Each sub-figure shows the original adjacency matrix (left), the residual matrix by NrMF (middle), and the residual matrix by SVD (right). Each blue dot in the figures represents an edge (or non-zero element) in the graph or in the residual matrices. The anomalies detected by the proposed AltQP-Inc are marked by red circles. (best viewed in color)]

It can be seen that in all cases, the corresponding anomalies clearly stand out in the residual matrix by the proposed AltQP-Inc (middle figures). On the other hand, (1) SVD does not always capture the corresponding anomalies (e.g., (a)), and/or (2) there might be some noise in the residual matrix by SVD (e.g., (b-d)). In addition, since the residual entries in SVD can be both positive and negative, we cannot plot the residual matrix by SVD as an intuitive residual graph.

Quantitative Results. We also present the quantitative results on the four real data sets. For each data set, we randomly inject one of the four anomalies. We then run the proposed AltQP-Inc to find the residual matrix and output its top-k edges as anomalies. We repeat each such experiment 20 times and report the mean accuracy and variance in Figure 2. It can be seen that AltQP-Inc achieves high detection accuracy for all four types of anomalies, across all four data sets.

[Figure 2: Anomaly detection on real graphs by the proposed AltQP-Inc. Mean accuracy (y-axis, 0-1.2) on MovieLens, MIT-DP, NIPS-PW and CIKM-PA for the four anomaly types ('strange connection', 'port scan', 'ddos', 'bipartite core'). The proposed AltQP-Inc achieves high accuracy in detecting all four types of anomalies.]

5.3 Efficiency Results

Here, we evaluate the efficiency of the proposed AltQP-Inc. The results reported in this subsection were obtained on the same machine with four 3.0GHz Intel(R) Xeon(R) CPUs and 16GB memory, running Linux (2.6 kernel). We repeat the experiments 10 times and report the mean wall-clock time.

First, we compare the wall-clock time between the proposed AltQP-Inc and AltQP-Batch (see the appendix for the description of AltQP-Batch). The result is presented in Figure 3. In Figure 3, the number inside the parentheses beside the name of each data set is the ratio between the re-construction error by AltQP-Inc and that by AltQP-Batch. It can be seen that the proposed AltQP-Inc is much faster than AltQP-Batch. For example, AltQP-Inc is 51x faster (3.6 sec. vs. 186 sec.) than AltQP-Batch on the MovieLens data set. Note that the ratio between the re-construction error by AltQP-Inc and that by AltQP-Batch is always less than or equal to 1, indicating that the optimization solution by AltQP-Inc is better than (MIT-DP and MovieLens) or similar to (NIPS-PW and CIKM-PA) that by AltQP-Batch.

Next, we test the scalability of AltQP-Inc using subsets of the MovieLens data set with different rank sizes r. The result is presented in Figure 4. It can be seen that the proposed AltQP-Inc scales linearly wrt the graph size (n, l and m).

[Figure 3: Comparison of wall-clock time between AltQP-Batch and AltQP-Inc on MIT-DP (0.50), CIKM-PA (0.97), MovieLens (0.31) and NIPS-PW (1.00). The wall-clock time (sec.) axis is in logarithmic scale (0.1 to 1,000). The number inside the parentheses beside the name of each data set is the ratio between the re-construction error by AltQP-Inc and that by AltQP-Batch. The proposed AltQP-Inc is much faster than AltQP-Batch, with better or similar re-construction error.]

[Figure 4: Scalability of the proposed AltQP-Inc with different rank sizes r (r = 1, 2, 5, 10, 20): (a) wall-clock time (sec.) vs. n, (b) wall-clock time vs. l, (c) wall-clock time vs. m. AltQP-Inc scales linearly wrt the size of the graph.]

6 Related Work

In this section, we review the related work, which can be categorized into three parts: matrix factorization, anomaly detection and general graph mining.

Matrix Factorization. Matrix factorization [21, 15, 1] plays a very important role in graph mining. The most popular choices include SVD/PCA [21, 26] and random projection [25]. However, these methods often ignore the sparseness and non-negativity of many real graphs, and lead to dense and negative results, which makes the results hard to interpret. A recent trend in matrix factorization has been devoted to improving the interpretability of the mining results. For example, to address the sparseness issue, example-based factorization methods have been proposed [15, 37, 38]. By requiring the columns of the factorized matrix F to be actual columns from the original matrix A, the factorization is naturally sparse and therefore good for interpretation. To address the non-negativity issue, non-negative matrix factorization has been studied in the past few years. Pioneering work in this thread can be traced back to [31], and there is a lot of follow-up work in this direction [13, 29, 28, 12]. There are also efforts to address both the sparseness and non-negativity issues [23, 24]. It is worth pointing out that most, if not all, of these modifications (i.e., sparseness and non-negativity constraints) are imposed on the factorized matrices. As a result, they mainly improve the interpretability for the task of community detection. It is unclear how these efforts can also help to improve the interpretability for the task of anomaly detection. This is exactly one major motivation of this work. By imposing the non-negativity constraints on the residual matrix, instead of the factorized matrices, we bring this interpretation-friendly property (i.e., non-negativity) to graph anomaly detection.

Anomaly Detection. Noble et al. were among the first to detect abnormal sub-graphs using the MDL (minimum description length) criterion [34]. Follow-up work along this line includes [7, 16]. In [2], the authors proposed using ego-nets to detect abnormal nodes on weighted graphs. In [36], the authors proposed using proximity to detect abnormal nodes and edges. The work in [37, 38] is most related to ours. In [37, 38], the authors use matrix factorization to detect port-scanning like behavior by looking at the reconstruction error (certain norms of the residual matrix). One limitation of [37, 38] is that the residual matrix can contain arbitrary numbers (either positive or negative). Therefore, the result might be too abstract and not intuitive for data analysts to interpret. We restrict the residual matrix to be non-negative so that we can plot it as an intuitive residual graph. From the application side, many graph anomalies correspond to some actual behaviors/activities of certain nodes (e.g., a port scanner in an IP traffic network has connections to many different IP destinations). Such abnormal behaviors can be naturally captured by the corresponding non-negative entries in the residual matrix R. For anomaly detection on other types of data, please refer to a recent comprehensive survey [8].

General Graph Mining. There is a lot of research work

on static graph mining, including pattern and law mining [3, 14, 17, 6, 33], frequent substructure discovery [41], influence propagation [27], social network compression [11] and community mining [18, 19, 20], etc. More recently, there has been increasing interest in mining time-evolving graphs, such as densification laws and shrinking diameters [32], community evolution [4], proximity tracking [39], conversation dynamics [30] and dynamic communities [10], etc.

7 Conclusion

In this paper, we present a novel matrix factorization (NrMF) paradigm, which aims to detect abnormal behaviors/activities on graphs in a more interpretable way. Our main contributions are:

1. Problem formulation, presenting a new formulation for matrix factorization tailored for graph anomaly detection;
2. An effective algorithm (AltQP-Inc) to solve the above optimization problem, linear wrt the size of the graph;
3. Proofs and analysis, showing the effectiveness as well as the efficiency of the proposed method (e.g., Lemma 4.1, Lemma 4.2, etc.);
4. Experimental evaluations, demonstrating both the effectiveness and efficiency of the proposed method.

Future research directions include (1) extending AltQP-Inc to time-evolving graphs, and (2) parallelizing AltQP-Inc using Hadoop⁷.

⁷ http://hadoop.apache.org/

References

[1] D. Achlioptas and F. McSherry. Fast computation of low-rank matrix approximations. J. ACM, 54(2), 2007.
[2] L. Akoglu, M. McGlohon, and C. Faloutsos. oddball: Spotting anomalies in weighted graphs. In PAKDD (2), pages 410–421, 2010.
[3] R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130–131, 1999.
[4] L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44–54, 2006.
[5] R. M. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In KDD, pages 95–104, 2007.
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In WWW Conf., 2000.
[7] D. Chakrabarti. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD, pages 112–124, 2004.
[8] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3), 2009.
[9] D. H. Chau, S. Pandit, and C. Faloutsos. Detecting fraudulent personalities in networks of online auctioneers. In PKDD, pages 103–114, 2006.
[10] Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In KDD, pages 153–162, 2007.
[11] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In KDD, pages 219–228, 2009.
[12] C. H. Q. Ding, T. Li, and M. I. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell., 32(1):45–55, 2010.
[13] D. L. Donoho and V. Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In NIPS, 2003.
[14] S. Dorogovtsev and J. Mendes. Evolution of networks. Advances in Physics, 51:1079–1187, 2002.
[15] P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 2005.
[16] W. Eberle and L. B. Holder. Mining for structural anomalies in graph-based data. In DMIN, pages 376–389, 2007.
[17] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, pages 251–262, Aug-Sept. 1999.
[18] G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3), Mar. 2002.
[19] D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In 9th ACM Conf. on Hypertext and Hypermedia, pages 225–234, New York, 1998.
[20] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, 99(12):7821–7826, 2002.
[21] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[22] S.-P. Hong and S. Verma. A note on the strong polynomiality of convex quadratic programming. Mathematical Programming, 68:131–139, 1995.
[23] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457–1469, 2004.
[24] S. Hyvönen, P. Miettinen, and E. Terzi. Interpretable nonnegative matrix decompositions. In KDD, pages 345–353, 2008.
[25] P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS, pages 189–197, 2000.
[26] K. V. R. Kanth, D. Agrawal, and A. K. Singh. Dimensionality reduction for similarity searching in dynamic databases. In SIGMOD Conference, pages 166–176, 1998.
[27] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[28] J. Kim and H. Park. Toward faster nonnegative matrix factorization: A new algorithm and comparisons. In ICDM, pages 353–362, 2008.
[29] R. Kompass. A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19(3):780–791, 2007.

[30] R. Kumar, M. Mahdian, and M. McGlohon. Dynamics of conversations. In KDD, pages 553–562, 2010.
[31] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562, 2000.
[32] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, pages 177–187, 2005.
[33] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
[34] C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631–636, 2003.
[35] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In KDD, pages 650–658, 2008.
[36] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM, pages 418–425, 2005.
[37] J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In SDM, 2007.
[38] H. Tong, S. Papadimitriou, J. Sun, P. S. Yu, and C. Faloutsos. Colibri: fast mining of large static and dynamic graphs. In KDD, pages 686–694, 2008.
[39] H. Tong, S. Papadimitriou, P. S. Yu, and C. Faloutsos. Proximity tracking on time-evolving bipartite graphs. In SDM, pages 704–715, 2008.
[40] P. Tseng. Simple polynomial-time algorithm for convex quadratic programming. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 1988.
[41] D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709–720, 2005.

A Appendix

Throughout this paper, we have focused on the optimization problem in eq. (3.2) by restricting ourselves to the 0/1 weight matrix. In this section, we give our algorithms for solving the optimization problem in eq. (3.1) with a general weight matrix W, for the purpose of completeness. We first generalize the proposed AltQP-Inc to handle the general weight matrix W (AltQP-Inc-General), and then give the alternating optimization algorithm (AltQP-Batch) for solving eq. (3.1) using convex quadratic programming.

A.1 Generalized AltQP-Inc for eq. (3.1)

In order to generalize the proposed AltQP-Inc to solve eq. (3.1) with a general weight matrix W, we first give the algorithm (Update-General-g) to solve the sub-problem expressed in eq. (1.7):

(1.7)  argmin_g  Σ_{i,j} ((A(i,j) − f(i)g(j)) · W(i,j))²
       s.t.  for all A(i,j) > 0:  f(i)g(j) ≤ A(i,j)

Algorithm 4 Update-General-g (For Solving eq. 1.7)
Input: The original matrix A_{n×l}, the weight matrix W_{n×l}, and a column vector f_{n×1}
Output: A row vector g_{1×l}
1: for j = 1 : l do
2:   Initialize the lower bound low = −inf and upper bound up = inf;
3:   Compute: a = diag(W(:, j)) · A(:, j)
4:   Compute: b = diag(W(:, j)) · f
5:   Compute: t = b′b
6:   if t == 0 then
7:     Set: g(j) = 0;
8:     Continue;
9:   end if
10:  Compute: q = a′b/t
11:  for each i, s.t. A(i, j) > 0 do
12:    if f(i) > 0 then
13:      Update: up = min(up, A(i, j)/f(i))
14:    else if f(i) < 0 then
15:      Update: low = max(low, A(i, j)/f(i))
16:    else
17:      Continue;
18:    end if
19:  end for
20:  if q <= up and q >= low then
21:    Output: g(j) = q;
22:  else if q > up then
23:    Output: g(j) = up;
24:  else
25:    Output: g(j) = low;
26:  end if
27: end for

Update-General-g works for an arbitrary weight matrix W and is a natural generalization of Update-g. In Update-General-g, diag(W(:, j)) is a diagonal matrix with diagonal elements W(i, j) (i = 1, ..., n). Similar to Update-g, in Update-General-g we update g(j) one by one in each outer loop. Each g(j) can be solved by a closed formula (steps 20-26). This is due to the fact that the optimization problem in (1.7) can be decomposed into l independent optimization problems, each of which only involves a single variable g(j) (j = 1, ..., l).

Based on Alg. 4, we have Alg. 5 (AltQP-Inc-General) to solve eq. (3.1). AltQP-Inc-General is a natural generalization of AltQP-Inc. Similar to AltQP-Inc, AltQP-Inc-General tries to find the factorized matrices F and G in an incremental way. At each outer loop of Alg. 5, it finds a rank-1 approximation of the current residual matrix R (steps 2-11). At the inner loop of Alg. 5 (steps 4-8), it calls Update-General-g to alternately update f and g, respectively. This alternating process is iterated until convergence. After it finds a rank-1 approximation, we update the current residual matrix in step 10.
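A minimal NumPy sketch of Update-General-g follows (our illustration; dense arrays for readability, with the diag(W(:, j)) products written as element-wise multiplications):

import numpy as np

def update_general_g(A, W, f):
    """Alg. 4 (Update-General-g) sketch: weighted closed-form update of g,
    one entry per column, for an arbitrary weight matrix W."""
    n, l = A.shape
    g = np.zeros(l)
    for j in range(l):
        low, up = -np.inf, np.inf
        a = W[:, j] * A[:, j]          # diag(W(:,j)) . A(:,j)
        b = W[:, j] * f                # diag(W(:,j)) . f
        t = b @ b
        if t == 0:
            continue                   # g(j) stays 0 (step 7)
        q = (a @ b) / t
        for i in np.nonzero(A[:, j] > 0)[0]:
            if f[i] > 0:
                up = min(up, A[i, j] / f[i])
            elif f[i] < 0:
                low = max(low, A[i, j] / f[i])
        g[j] = min(max(q, low), up)    # project q onto [low, up] (steps 20-26)
    return g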

Algorithm 5 AltQP-Inc-General (For Solving eq. 3.1)
Input: The original matrix A_{n×l}, the weight matrix W, and rank size r
Output: An n × r matrix F; an r × l matrix G; and an n × l matrix R;
1: Initialize F = 0_{n×r}, G = 0_{r×l}, and R = A
2: for k = 1 : r do
3:   Initialize f and g
4:   while Not convergent do
5:     Update: g ← Update-General-g(R, W, f)
6:     Set: f̃ ← Update-General-g(R′, W′, g′)
7:     Update: f = f̃′
8:   end while
9:   Set F(:, k) = f, and G(k, :) = g
10:  Update R ← R − f · g
11: end for

A.2 AltQP-Batch for eq. (3.1)

We can also solve the optimization problem in eq. (3.1) by convex quadratic programming. To this end, let us assume that we have a package (x = QpProg(T, S, u, v)) to solve the following quadratic programming problem in eq. (1.8):

(1.8)  x_{d×1} = argmin_x  x′T_{d×d}x + u′_{d×1}x
       s.t.  S_{s×d}x ≤ v_{s×1}

In eq. (1.8), x is a d×1 vector that we want to solve for, and the inequality holds element-wise. If T is semi-positive definite, QpProg() requires at least O(d^k) time⁸, where k > 1 and depends on the actual method used to solve the quadratic program (e.g., k = 3.5 for the method in [40], k = 3 for the method in [22], etc.).

To solve eq. (3.1) by convex quadratic programming, we first give the algorithm (Batch-Update-G) to solve the following optimization problem in eq. (1.9), which is a sub-problem of the optimization problem in eq. (3.1):

(1.9)  argmin_G  Σ_{i=1}^{n} Σ_{j=1}^{l} (A(i,j) − F(i,:)G(:,j))² W(i,j)²
       s.t.  for all A(i,j) > 0:  F(i,:)G(:,j) ≤ A(i,j)

Algorithm 6 Batch-Update-G (For Solving eq. 1.9)
Input: The original matrix A_{n×l}, the weight matrix W_{n×l}, and left matrix F_{n×r}
Output: The right matrix G_{r×l}
1: for j = 1 : l do
2:   Compute: a_{n×1} = diag(W(:, j)) · A(:, j)
3:   Compute: B_{n×r} = diag(W(:, j)) · F
4:   for i = 1 : n do
5:     if A(i, j) > 0 then
6:       Set: v(i) = A(i, j)
7:     else
8:       Set: v(i) = inf
9:     end if
10:  end for
11:  Compute: T = B′B
12:  Compute: u = −2B′a
13:  Set: S = F
14:  Solve: G(:, j) ← QpProg(T, S, u, v)
15: end for

Batch-Update-G is similar to Update-g, except that in each outer loop of Batch-Update-G, we find a single r × 1 column vector G(:, j) (j = 1, ..., l), whereas in each outer loop of Update-g, we find a single variable g(j) (j = 1, ..., l). This subtle point leads to a big difference in terms of the time complexity. In Batch-Update-G, we have to use expensive convex quadratic programming to find G(:, j), whereas in Update-g, we can use a computationally cheap closed formula to find g(j). It can be shown that the quadratic programming problem in step 14 is semi-positive definite, which takes at least O(r^k) time, and the overall Batch-Update-G requires at least O(m + nlr² + lr^k) time.

Based on Batch-Update-G, we have Alg. 7 (AltQP-Batch) to solve the problem in eq. (3.1). In AltQP-Batch, after some initialization (step 1), we alternately call Batch-Update-G to update F and G, fixing one of them. This alternating process is iterated until convergence. It can be shown that the time complexity of AltQP-Batch is at least O(mt + nlr²t + lr^k·t + nr^k·t), where t is the maximum iteration number in AltQP-Batch and k > 1 relates to the actual method used to solve the convex quadratic programs. Compared with the complexity of the proposed AltQP-Inc (O(mrt + nrt + lrt)), AltQP-Batch is much more time consuming.

Algorithm 7 AltQP-Batch (For Solving eq. 3.1)
Input: The original matrix A_{n×l}, the weight matrix W, and rank size r
Output: An n × r matrix F; an r × l matrix G; and an n × l matrix R;
1: Initialize F_{n×r} and G_{r×l}
2: while Not convergent do
3:   Update: G ← Batch-Update-G(A, W, F)
4:   Set: F̃ ← Batch-Update-G(A′, W′, G′)
5:   Update: F = F̃′
6: end while
7: Output: R = A − FG

⁸ Besides the polynomial term, there is usually an additional term in the time complexity which relates to the encoding length of the quadratic programming problem.
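QpProg() is an abstract placeholder in the paper. As one possible instantiation, the per-column QP of Batch-Update-G could be handed to an off-the-shelf convex solver; the sketch below uses CVXPY (our substitution, not the authors' implementation) and solves the equivalent least-squares form of the step-14 QP directly.

import numpy as np
import cvxpy as cp

def batch_update_g(A, W, F):
    """Alg. 6 sketch: per-column QP for G, with CVXPY in place of the
    paper's abstract QpProg(T, S, u, v) routine (our substitution)."""
    n, l = A.shape
    r = F.shape[1]
    G = np.zeros((r, l))
    for j in range(l):
        a = W[:, j] * A[:, j]            # diag(W(:,j)) . A(:,j)
        B = W[:, j][:, None] * F         # diag(W(:,j)) . F
        obs = A[:, j] > 0
        x = cp.Variable(r)
        # Objective ||a - B x||^2 expands to x'(B'B)x - 2(B'a)'x + const,
        # matching T = B'B and u = -2B'a in Alg. 6.
        objective = cp.Minimize(cp.sum_squares(a - B @ x))
        # Constraints F(i,:) x <= A(i,j) on observed edges only (S = F, v as above).
        constraints = [F[obs] @ x <= A[obs, j]] if obs.any() else []
        cp.Problem(objective, constraints).solve()
        G[:, j] = x.value
    return G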
