Outlier Detection Through Null Space Analysis of Neural Networks

Matthew Cook¹, Alina Zare¹, Paul Gader²

¹ Department of Electrical & Computer Engineering, University of Florida, Gainesville, FL, USA 32611. ² Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL, USA 32611. Correspondence to: Matthew Cook <matthew.cook@ufl.edu>.

Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning. Copyright 2020 by the author(s).

arXiv:2007.01263v1 [cs.LG] 2 Jul 2020

Abstract

Many machine learning classification systems lack competency awareness. Specifically, many systems lack the ability to identify when outliers (e.g., samples that are distinct from and not represented in the training data distribution) are being presented to the system. The ability to detect outliers is of practical significance since it can help the system behave in a reasonable way when encountering unexpected data. In prior work, outlier detection is commonly carried out in a processing pipeline that is distinct from the classification model. Thus, for a complete system that incorporates outlier detection and classification, two models must be trained, increasing the overall complexity of the approach. In this paper we use the concept of the null space to integrate an outlier detection method directly into a neural network used for classification. Our method, called Null Space Analysis (NuSA) of neural networks, works by computing and controlling the magnitude of the null space projection as data is passed through a network. Using these projections, we can then calculate a score that can differentiate between normal and abnormal data. Results are shown that indicate networks trained with NuSA retain their classification performance while also being able to detect outliers at rates similar to commonly used outlier detection algorithms.

1. Introduction

Artificial neural network (ANN) and deep learning-based approaches are increasingly being incorporated into real-world applications and systems. Unlike in curated benchmark datasets, outliers and unexpected data are commonly encountered in systems deployed for application. Thus, the ability to accurately identify outliers to ensure reliability of ANN-dependent systems is essential. Due to this, outlier detection has been a key aspect of machine learning for some time (Rousseeuw & Driessen, 1999; Breunig et al., 2000; Kingma & Welling, 2014). In standard outlier detection, the goal is to identify inputs that are distinct from the training data distribution (Bansal et al., 2016). These distinctions can be subtle, as illustrated by the work in adversarial examples (Szegedy et al., 2013; Fawzi et al., 2018), and can also be samples from an unknown class (e.g., presenting an image of a dog to an ANN trained to differentiate between species of cats). Without mechanisms to detect outliers, ANNs will classify every sample (sometimes with high confidence (Moosavi-Dezfooli et al., 2015)) to a class found in the training data. In this paper, we focus on the outlier detection paradigm of unknown classes and propose a method for ANNs to identify when data does not match its trained purpose.

A variety of outlier detection methods have been developed in the literature. A common outlier detection method is to use distances to nearby points (Breunig et al., 2000; Goldstein & Dengel, 2012). An example of this is to use the distance to the k nearest neighbors as an outlier score (Ramaswamy et al., 2000). Another example is Angle-Based Outlier Detection (Kriegel et al., 2008), which uses both the distance and angle between points to identify outliers. Another class of outlier detection algorithms is one-class classifiers. A popular example of these algorithms is the One-Class Support Vector Machine (Schölkopf et al., 2001). In this case the Support Vector Machine (SVM) is used to estimate a hyperplane that encompasses the training data, and outliers can be detected based on their distance from the hyperplane. Using the major and minor principal components of known “good” data to generate a hyperplane for comparison has also been proposed (Shyu et al., 2003). The Isolation Forest (Liu et al., 2008) is another method that uses a random forest to find outliers. Methods that identify outliers by having a large reconstruction error using a model fit with the training data have also been used. Recently, the most popular examples of these methods are auto-encoders (Aggarwal, 2015) or variational auto-encoders (Kingma & Welling, 2014). However, all of these methods are capable of only one thing, outlier detection. We propose a method using Null Space Analysis (NuSA) that is capable of encoding outlier detection directly into an ANN, thereby increasing the competency awareness of the ANN.

In our NuSA approach, we leverage the null space associated with the weight matrix of a layer in the ANN to perform outlier detection concurrently with classification using the same network. As a refresher, the null space of a matrix is defined as:

$$\mathcal{N}(A) = \{ z \in \mathbb{R}^n \mid Az = 0 \}, \tag{1}$$

where $A$ is a linear mapping. In other words, the null space of a matrix $A$ defines the region of the input space that maps to zero. The motivation to leverage the null space is related to the study of adversarial samples such as those shown in (Nguyen et al., 2014) and to experiences in handwritten word recognition in the 1990s (Chiang & Gader, 1997; Gader et al., 1997). The NuSA approach is a partial, but important, solution to the problem of competency awareness of ANNs; it is unlikely that there is one method alone that can alleviate this problem. Outlier samples derived from the null space of a weight matrix of a network can theoretically have arbitrarily large magnitude and be added to non-outlier samples with no deviation in the ANN output. This statement will be made more precise in the next section.

2. Null Space Analysis of ANNs

Each layer of an ANN is the projection of a sample by a weight matrix followed by the application of an activation function. This sample is either the input data point at the first layer or the output of a previous layer for any hidden layers. Every weight matrix has an associated null space, although some may be trivial (containing only the zero vector). However, any weight matrix that projects into a lower dimensional space (i.e., input dimensionality is larger than the output/subsequent layer dimensionality) has a non-trivial null space. The overwhelming majority of commonly used deep learning architectures consist of several subsequent layers that project into lower dimensional spaces and, thus, have an associated series of non-trivial null spaces. Any sample with a non-zero projection into any null space in the network cannot be distinguished by the network from those samples without that null space component.

For clarity, the null space concept is first illustrated using a simple one-layer ANN with $K$ inputs and $M$ outputs and $K > M$. It is then illustrated for multi-layer networks. Assume $X = \{x_1, x_2, \ldots, x_N\}$ is a collection of samples drawn from the joint distribution of the classes of interest, that is, $x_n \sim p_D$, and that a network $f_X$ has been trained to approximate a function $f$ with $y_n = f(x_n)$. In the one-layer case, $y_n = W x_n$. Since $K > M$, there is a non-trivial null space $\mathcal{N}(W)$ of dimension at least $K - M$ (exactly $K - M$ when $W$ has full row rank). If $z \in \mathcal{N}(W)$, then $Wz = 0$ so

$$\forall \lambda > 0: \quad W(x_n + \lambda z) = W x_n + \lambda W z = W x_n = y_n, \tag{2}$$

which implies that there are infinitely many possibilities for mapping unknown inputs to apparently meaningful outputs. In a multi-layer network, there are many layers of matrix-vector multiplications, which can be expressed as

$$\begin{aligned}
x_{n,1} &= \sigma_1(W_1 x_n) \rightarrow x_{n,2} \\
x_{n,2} &= \sigma_2(W_2 x_{n,1}) \rightarrow \cdots \rightarrow x_{n,N_H} \\
x_{n,N_H} &= \sigma_{N_H}(W_{N_H} x_{n,N_H-1})
\end{aligned}$$

where $x_n$ is an input, $N_H$ is the number of hidden layers, and $\sigma_h$, $h = 1, 2, \ldots, N_H$, are nonlinear functions. More succinctly,

$$f_X(x_n) = \sigma_{N_H}(W_{N_H}\,\sigma_{N_H-1}(W_{N_H-1}\,\sigma(\cdots \sigma_1(W_1 x_n) \cdots))) \tag{3}$$

Consider a non-outlier sample $x \sim p_D$. Let $z \in \mathbb{R}^K$ and $z_h = \sigma_{h-1}(W_{h-1}\,\sigma(\cdots \sigma_1(W_1 x)))$ for any $h = 1, 2, \ldots, N_H$. If $z_h \in \mathcal{N}(W_h)$ then $W_h z_h = 0$, so $f_X(x + z) = f_X(x)$. Therefore, any input sample $z$ that is the inverse image of a sample $z_h$ of the null space of any of the weight matrices $W_h$ can be added to a legitimate input $x$ and the output will not change. This is one source of “adversarial” examples that can cause outliers that are nothing like any of the true samples to have high outputs for at least one class.

The description above outlines an interesting and unique class of adversarial samples for a network. Some adversarial samples are defined via stability: a small change in an input sample (e.g., imperceptible to a human) can produce a large change in output. The NuSA approach is focused on the opposite problem, i.e., large changes in an input sample can produce small (or no) changes in output. A human would easily disregard this heavily corrupted sample as an outlier but, as pointed out in (Chiang & Gader, 1997; Gader et al., 1997; Nguyen et al., 2014), the network would not be able to distinguish the sample from the valid sample. An example of this is shown in Figure 1.

Figure 1. Left: Original image. Center: Additive null space noise. Right: Final image, indistinguishable from the original image according to the network from which the noise in the center column is sampled.
