The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering

Xiaoyu Qin, Monash University, Victoria, Australia 3800, [email protected]
Kai Ming Ting, Federation University, Victoria, Australia 3842, [email protected]
Ye Zhu, Deakin University, Victoria, Australia 3125, [email protected]
Vincent CS Lee, Monash University, Victoria, Australia 3800, [email protected]

Abstract

A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity, and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.

Introduction

Similarity measure is widely used in various data mining and machine learning tasks. In clustering analysis, its impact on the quality of the result is critical (Steinbach, Ertöz, and Kumar 2004). A recent proposal of data dependent similarity called Isolation Kernel has enabled SVM to produce better classification accuracy by simply replacing the commonly used data independent kernel (such as Gaussian kernel) with Isolation Kernel (Ting, Zhu, and Zhou 2018). This is made possible on datasets of varied densities because Isolation Kernel is adaptive to the local data distribution such that two points in a sparse region are more similar than two points of equal inter-point distance in a dense region. Despite this success, the kernel characteristic has not been formally proven yet.

This paper extends this line of research by investigating a different implementation of Isolation Similarity. We provide a formal proof of the characteristic of the Isolation Similarity for the first time since its introduction. In addition, we focus on using Isolation Similarity to improve the clustering performance of density-based clustering.

This paper identifies shortcomings of the tree-based method currently employed in inducing Isolation Similarities (Ting, Zhu, and Zhou 2018). Instead, we investigate a different method to induce the data dependent Isolation Similarity, and evaluate its clustering performance using DBSCAN (Ester et al. 1996) in comparison with the two existing improvements of density-based clustering, i.e., DScale (Zhu, Ting, and Angelova 2018) and DP (Rodriguez and Laio 2014).

The rest of the paper is organised as follows. We reiterate Isolation Kernel, identify the shortcomings of using the tree-based method to induce Isolation Similarity, provide the proposed alternative that employs a nearest neighbour method, the lemma and proof of the characteristic of Isolation Similarity, and the investigation in using Isolation Similarity in density-based clustering. The descriptions of existing works are framed in order to clearly differentiate them from the contributions we make here.

Isolation Kernel

Isolation Kernel/Similarity is first proposed by (Ting, Zhu, and Zhou 2018) as a new similarity which can adapt to the density structure of the given dataset, as opposed to commonly used data independent kernels such as Gaussian and Laplacian kernels.

In the classification context, Isolation Kernel has been shown to be an effective means to improve the accuracy of SVM, especially in datasets which have varied densities in the class overlap regions (Ting, Zhu, and Zhou 2018). This is achieved by simply replacing the commonly used data independent kernel such as Gaussian and Laplacian kernels with the Isolation Kernel.

In the context of SVM classifiers, Isolation Kernel (Ting, Zhu, and Zhou 2018) has been shown to be more effective than existing approaches such as distance metric learning (Zadeh, Hosseini, and Sra 2016; Wang and Sun 2015), multiple kernel learning (Rakotomamonjy et al. 2008; Gönen and Alpaydin 2011) and the random forest kernel (Breiman 2000; Davies and Ghahramani 2014).

The characteristic of Isolation Kernel is akin to one aspect of human-judged similarity as discovered by psychologists (Krumhansl 1978; Tversky 1977), i.e., humans will judge the same two Caucasians as less similar when compared in Europe (which has many Caucasians) than in Asia.

We restate the definition and kernel characteristic (Ting, Zhu, and Zhou 2018) below.

[Figure 1: Examples of two isolation partitioning mechanisms, (a) axis-parallel splitting versus (b) nearest neighbour (NN) partitioning, on a dataset having two (uniform) densities, i.e., the right half has a higher density than the left half.]

Let D = {x_1, . . . , x_n}, x_i ∈ ℝ^d be a dataset sampled from an unknown probability density function x_i ∼ F. Let H_ψ(D) denote the set of all partitionings H that are admissible under D, where each isolating partition θ ∈ H isolates one data point from the rest of the points in a random subset 𝒟 ⊂ D, and |𝒟| = ψ.

Definition 1 For any two points x, y ∈ ℝ^d, the Isolation Kernel of x and y wrt D is defined to be the expectation taken over the probability distribution on all partitionings H ∈ H_ψ(D) that both x and y fall into the same isolating partition θ ∈ H:

K_ψ(x, y | D) = E_{H_ψ(D)}[I(x, y ∈ θ | θ ∈ H)]    (1)

where I(B) is the indicator function which outputs 1 if B is true; otherwise, I(B) = 0.

In practice, K_ψ is estimated from a finite number of partitionings H_i ∈ H_ψ(D), i = 1, . . . , t, as follows:

K_ψ(x, y | D) = (1/t) Σ_{i=1}^{t} I(x, y ∈ θ | θ ∈ H_i)    (2)

The characteristic of Isolation Kernel is: two points in a sparse region are more similar than two points of equal inter-point distance in a dense region, i.e.,

Characteristic of K_ψ: ∀x, y ∈ X_S and ∀x', y' ∈ X_T such that ‖x − y‖ = ‖x' − y'‖, K_ψ satisfies the following condition:

K_ψ(x, y) > K_ψ(x', y')    (3)

where X_S and X_T are two subsets of points in sparse and dense regions of ℝ^d, respectively; and ‖x − y‖ is the distance between x and y.

To get the above characteristic, the required property of the space partitioning mechanism is to create large partitions in the sparse region and small partitions in the dense region such that two points are more likely to fall into the same partition in a sparse region than two points of equal inter-point distance in a dense region.

Shortcomings of tree-based isolation partitioning

Isolation Kernel (Ting, Zhu, and Zhou 2018) employs isolation trees or iForest (Liu, Ting, and Zhou 2008) to measure the similarity of two points because its space partitioning mechanism produces the required partitions, which have volumes that are monotonically decreasing wrt the density of the local region.

Here we identify two shortcomings in using isolation trees to measure Isolation Similarity, i.e., each isolation tree (i) employs axis-parallel splits; and (ii) is an imbalanced tree. Figure 1(a) shows an example partitioning due to axis-parallel splits of an isolation tree. The tree-based isolating partitions generally satisfy the requirement of small partitions in the dense region and large partitions in the sparse region. However, they produce an undesirable effect, i.e., some partitions are always overextended for the first few splits close to the root of an imbalanced tree¹. These are manifested as elongated rectangles in Figure 1(a).

¹ Imbalanced trees are a necessary characteristic of isolation trees (Liu, Ting, and Zhou 2008) for their intended purpose of detecting anomalies, where anomalies are expected to be isolated with few splits, and normal points can only be isolated using a large number of splits.

While using balanced trees can be expected to overcome this problem, the restriction to hyper-rectangles remains due to the use of axis-parallel splits.

To overcome these shortcomings of isolation trees, we propose to use a nearest neighbour partitioning mechanism which creates a Voronoi diagram (Aurenhammer 1991) where each cell is an isolating partition (i.e., isolating one point from the rest of the points in the given sample). An example is provided in Figure 1(b). Note that these partitions also satisfy the requirement of small partitions in the dense region and large partitions in the sparse region, but they do not have the undesirable effect of elongated rectangles. We provide our implementation in the next section.

Nearest neighbour-induced Isolation Similarity

Instead of using trees as in its first implementation (Ting, Zhu, and Zhou 2018), we propose to implement Isolation Similarity using nearest neighbours.

Like the tree method, the nearest neighbour method also produces each H model, which consists of ψ isolating partitions θ, given a subsample of ψ points. Rather than representing each isolating partition as a hyper-rectangle, it is represented as a cell in a Voronoi diagram (Aurenhammer 1991), where the boundary between two points is equidistant from these two points.

While the Voronoi diagram is nothing new, its use in measuring similarity is new.

Using the same notations as used earlier, H is now a Voronoi diagram, built by employing ψ points in 𝒟, where each isolating partition or Voronoi cell θ ∈ H isolates one data point from the rest of the points in 𝒟. We call the point which determines a cell the cell centre.

Given a Voronoi diagram H constructed from a sample 𝒟 of ψ points, the Voronoi cell centred at z ∈ 𝒟 is:

θ[z] = {x ∈ ℝ^d | z = argmin_{ẑ ∈ 𝒟} ℓ_p(x − ẑ)}

where ℓ_p(x, y) is a distance function and we use p = 2 as Euclidean distance in this paper.

Definition 2 For any two points x, y ∈ ℝ^d, the nearest neighbour-induced Isolation Similarity of x and y wrt D is defined to be the expectation taken over the probability distribution on all Voronoi diagrams H ∈ H_ψ(D) that both x and y fall into the same Voronoi cell θ ∈ H:

K_ψ(x, y | D) = E_{H_ψ(D)}[I(x, y ∈ θ[z] | θ[z] ∈ H)]
             = E_{𝒟⊂D}[I(x, y ∈ θ[z] | z ∈ 𝒟)]
             = P(x, y ∈ θ[z] | z ∈ 𝒟 ⊂ D)    (4)

where P denotes the probability.

The Voronoi diagram has the required property of the space partitioning mechanism to produce large partitions in a sparse region and small partitions in a dense region. This yields the characteristic of Isolation Similarity: two points in a sparse region are more similar than two points of equal inter-point distance in a dense region.

The use of nearest neighbours facilitates a proof of the above characteristic that was previously hampered by the use of trees. We provide the proof in the next section.

Lemma and Proof of the characteristic of Isolation Similarity

Let ρ(x) denote the density at point x; a lemma based on Definition 2 is given below:

Lemma 1 ∀x, y ∈ X_S (sparse region) and ∀x', y' ∈ X_T (dense region) such that ∀z ∈ X_S, z' ∈ X_T, ρ(z) < ρ(z'), the nearest neighbour-induced Isolation Similarity K_ψ has the characteristic that ℓ_p(x − y) = ℓ_p(x' − y') implies

P(x, y ∈ θ[z]) > P(x', y' ∈ θ[z']) ≡ K_ψ(x, y | D) > K_ψ(x', y' | D)

Sketch of the proof: (i) If two points fall into the same Voronoi cell, then the distances of these individual points to this cell centre must be shorter than those to every other cell centre (or at most equal to those to one other cell centre) in a Voronoi diagram formed by all these cell centres. (ii) In a subset of ψ points, sampled from D, used to form a Voronoi diagram, the probability of two points falling into the same Voronoi cell can then be estimated based on the condition stated in (i). (iii) The probability of two points of equal inter-point distance falling into the same Voronoi cell is a monotonically decreasing function wrt the density of the cell.

PROOF 1 Let a local region V(x, y) covering both x and y be a ball centred at the middle between x and y, having ℓ_p(x, y) as the diameter of the ball. Assume that the density in V(x, y) is uniform and denoted as ρ(V(x, y)).

Let N_ε(x) be the ε-neighbourhood of x, i.e., N_ε(x) = {y ∈ D | ℓ_p(x, y) ≤ ε}. The probability that both x and y are in the same Voronoi cell θ[z] is equivalent to the probability of a point z ∈ 𝒟 being the nearest neighbour of both x and y wrt all other points in 𝒟, i.e., the probability of selecting ψ−1 points which are all located outside the region U(x, y, z), where U(x, y, z) = N_{ℓ_p(x,z)}(x) ∪ N_{ℓ_p(y,z)}(y).

To simplify notation, z ∈ 𝒟 is omitted. Then the probability of x, y ∈ θ[z] can be expressed as follows:

P(x, y ∈ θ[z] | z ∈ V(x, y)) = P(z_1, z_2, . . . , z_{ψ−1} ∉ U(x, y, z))
                              ∝ (1 − E_{z∼V(x,y)}[|U(x, y, z)|]/|D|)^{ψ−1}

where |W| denotes the cardinality of W.

Assume that U(x, y, z) is also uniformly distributed, having the same density ρ(V(x, y)); the expected value of |U(x, y, z)| can then be estimated as:

E_{z∼V(x,y)}[|U(x, y, z)|] = E_{z∼V(x,y)}[υ(U(x, y, z)) × ρ(V(x, y))] = E_{z∼V(x,y)}[υ(U(x, y, z))] × ρ(V(x, y))

where υ(W) denotes the volume of W.

Thus, we have

P(x, y ∈ θ[z] | z ∈ V(x, y)) ∝ (1 − E_{z∼V(x,y)}[υ(U(x, y, z))] × ρ(V(x, y))/|D|)^{ψ−1}    (5)

In other words, the higher the density in the area around x and y, the smaller P(x, y ∈ θ[z] | z ∈ V(x, y)) is, as the volume of V(x, y) is constant given x and y.

Given two pairs of points from two different regions but of equal inter-point distance as follows: ∀x, y ∈ X_S (sparse region) and ∀x', y' ∈ X_T (dense region) such that ℓ_p(x, y) = ℓ_p(x', y').

Assume that data are uniformly distributed in both regions, and we sample z, z' ∈ 𝒟 from D such that z ∈ V(x, y) and z' ∈ V(x', y'). We have

E_{z∼V(x,y)}[υ(U(x, y, z))] = E_{z'∼V(x',y')}[υ(U(x', y', z'))]

because the volume of V(x, y) is equal to that of V(x', y') for ℓ_p(x, y) = ℓ_p(x', y'), independent of the density of the region.

Suppose that we choose a sufficiently large sample size ψ of 𝒟 which contains points from both V(x, y) and V(x', y'). When the data are uniformly distributed in U(x, y, z) ∈ X_S and U(x', y', z') ∈ X_T, based on Equation 5, we have

P(x, y ∈ θ[z] | z ∈ V(x, y)) > P(x', y' ∈ θ[z'] | z' ∈ V(x', y'))
≡ K_ψ(x, y | D) > K_ψ(x', y' | D)

This means that x' and y' (in a dense region) are more likely to be in different cells than x and y in V(x, y) (in a sparse region), as shown in Figure 1. ∎
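The estimator in Equations (2) and (4) is straightforward to realise with nearest-neighbour assignments: build t Voronoi diagrams, each defined by ψ points sampled from D, and count how often two points share the same cell centre. The following is a minimal NumPy sketch of this idea, not the authors' Matlab/GPU implementation; the function name, parameter values and the toy data are our own illustrative assumptions.

```python
import numpy as np

def anne_isolation_similarity(X, queries, psi=16, t=200, seed=None):
    """Estimate the nearest-neighbour-induced Isolation Similarity of
    Equations (2) and (4): K_psi(x, y) is the fraction of t Voronoi
    diagrams (each built from psi points sampled from X) in which x and
    y fall into the same cell, i.e. share the same nearest sampled point."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[0], queries.shape[0]
    same_cell = np.zeros((m, m))
    for _ in range(t):
        centres = X[rng.choice(n, size=psi, replace=False)]   # the sample of size psi
        d2 = ((queries[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        cell = d2.argmin(axis=1)                               # index of each query's cell centre
        same_cell += cell[:, None] == cell[None, :]            # I(x, y in the same cell)
    return same_cell / t                                       # estimated K_psi for the queries

# Illustrative check of the characteristic (Lemma 1): two pairs with equal
# inter-point distance, one pair in a sparse region and one in a dense region.
rng = np.random.default_rng(0)
sparse = rng.uniform(0.0, 4.0, size=(200, 2))            # low-density region
dense = 4.0 + rng.uniform(0.0, 1.0, size=(800, 2))       # high-density region
D = np.vstack([sparse, dense])
pairs = np.array([[1.0, 1.0], [1.2, 1.0],                # x, y in the sparse region
                  [4.4, 4.5], [4.6, 4.5]])               # x', y' in the dense region
K = anne_isolation_similarity(D, pairs, psi=16, t=200, seed=1)
print(K[0, 1], K[2, 3])   # we expect K(x, y) > K(x', y')
```

Because each diagram only requires every query's nearest sampled point, the whole estimate reduces to distance-matrix and argmin operations.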

[Figure 2: (a) Reference points used in the simulations (two regions of different densities), where the inter-point distance ‖x − y‖ = ‖x' − y'‖ increases. Simulation results (b) as ψ increases (inter-point distance = 0.2); and (c) as the inter-point distance increases (ψ = 15). t = 10000 is used.]

A simulation validating the above analysis is given in Figure 2. It compares P(x, y ∈ θ_S) and P(x', y' ∈ θ_T) when x, y are from a sparse region and x', y' are from a dense region with equal inter-point distance. Given a fixed ψ < |D| or a fixed inter-point distance, the properties observed from Figure 2 are as follows:

1. P(x, y ∈ θ_S) > P(x', y' ∈ θ_T).
2. The rate of decrease of P(x', y' ∈ θ_T) is faster than that of P(x, y ∈ θ_S). Thus P(x', y' ∈ θ_T) reaches 0 earlier.

Isolation Dissimilarity and contour maps

To be consistent with the concept of distance as a kind of dissimilarity, we use Isolation Dissimilarity hereafter:

Isolation dissimilarity: pı(x, y) = 1 − K_ψ(x, y).

Like the ℓ_p norm, ∀x, pı(x, x) = 0 and pı(x, y) = pı(y, x). However, ∀x ≠ y, pı(x, y) depends on the data distribution and how pı is implemented, not on the geometric positions only.

We denote the nearest-neighbour-induced Isolation Dissimilarity pı-aNNE, and the tree-induced version pı-iForest. An example comparison of the contour maps produced by the two dissimilarities is given in Figure 3. Note that the contour maps of pı depend on the data distribution, whereas that of ℓ_2 does not. Also, compared to ℓ_2, the pı dissimilarity changes more slowly in areas far from the centre point and faster in areas close to the centre point.

[Figure 3: Contour plots of pı on the Thyroid dataset (mapped to 2 dimensions using MDS (Borg, Groenen, and Mair 2012)): (a) pı-aNNE and (b) pı-iForest. ψ = 14 is used in aNNE and iForest.]

We examine the impact of Isolation Dissimilarity on density-based clustering in the rest of the paper. We describe the neighbourhood density function commonly used in density-based clustering and its counterpart, called the neighbourhood mass function, in the next section.

Neighbourhood density and mass functions

The neighbourhood mass function (Ting et al. 2016) was first proposed as a way to model data in terms of a mass distribution, analogous to the neighbourhood density function used in modelling data as a density distribution. The key difference is the measure ∂ used in the following function of x: #{y ∈ D | ∂(x, y) ≤ cutoff}, where the boundary of the region within which the points are counted is set by a user-specified constant cutoff. When a distance measure is used, it becomes a neighbourhood density function, as the ball has a fixed volume when the cutoff is fixed. It is a neighbourhood mass function when a data dependent dissimilarity is used, as the volume of the 'ball' varies depending on the local distribution even if the cutoff is fixed.

Mass-connected clusters

Here we define mass-connected clusters in terms of the neighbourhood mass function:

M_α(x) = #{y ∈ D | pı(x, y) ≤ α}

Definition 3 Using an α-neighbourhood mass estimator M_α(x) = #{y ∈ D | pı(x, y) ≤ α}, mass-connectivity with threshold τ between x_1 and x_p via a sequence of p unique points from D, i.e., {x_1, x_2, x_3, ..., x_p}, is denoted as MConnect_α^τ(x_1, x_p), and it is defined as:

MConnect_α^τ(x_1, x_p) ↔
  [(pı(x_1, x_2) ≤ α) ∧ ((M_α(x_1) ≥ τ) ∨ (M_α(x_2) ≥ τ))]
  ∨ [∃_{x_1, x_2, ..., x_p} (∀i ∈ {2,...,p}: pı(x_{i−1}, x_i) ≤ α)
     ∧ (∀i ∈ {2,...,p−1}: M_α(x_i) ≥ τ)]    (6)

The second line denotes direct connectivity between two neighbouring points when p = 2. The last two lines denote transitive connectivity when p > 2.

Definition 4 A mass-connected cluster C̃, which has a mode c = argmax_{x ∈ C̃} M_α(x), is a maximal set of points that are mass-connected with its mode, i.e., C̃ = {x ∈ D | MConnect_α^τ(x, c)}.
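A minimal sketch of how the Isolation Dissimilarity pı, the α-neighbourhood mass M_α, and the mass-connectivity of Definition 3 could be computed from a precomputed similarity matrix. This continues the hypothetical anne_isolation_similarity sketch above; the names, the depth-first expansion, and the parameter values are our own assumptions, not the paper's code.

```python
import numpy as np

def isolation_dissimilarity(K):
    """pı(x, y) = 1 - K_psi(x, y), applied to a precomputed n x n similarity matrix."""
    return 1.0 - K

def alpha_neighbourhood_mass(p, alpha):
    """M_alpha(x) = #{y in D | pı(x, y) <= alpha}, computed for every x (one row of p per point)."""
    return (p <= alpha).sum(axis=1)

def mass_connected_to(p, M, alpha, tau, start):
    """Indices mass-connected to `start` under Definition 3: paths whose consecutive
    points satisfy pı <= alpha and whose intermediate points have M_alpha >= tau;
    a direct link only needs one of its two endpoints to reach the mass threshold."""
    n = p.shape[0]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    stack = [start]
    while stack:
        i = stack.pop()
        for j in np.where((p[i] <= alpha) & ~visited)[0]:
            # i may act as an intermediate point only if M[i] >= tau; the start point
            # may also link directly to a neighbour whose own mass reaches tau.
            if M[i] >= tau or (i == start and M[j] >= tau):
                visited[j] = True
                stack.append(j)
    return np.where(visited)[0]

# Usage with the aNNE sketch above on the full dataset D (hypothetical values):
# K = anne_isolation_similarity(D, D, psi=16, t=200, seed=1)
# p = isolation_dissimilarity(K)
# M = alpha_neighbourhood_mass(p, alpha=0.8)
# cluster_of_mode = mass_connected_to(p, M, alpha=0.8, tau=10, start=int(M.argmax()))
```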

Note that density-connectivity and density-connected clusters are similarly defined in DBSCAN (Ester et al. 1996) when M_α(x) = #{y ∈ D | pı(x, y) ≤ α} is replaced with N_ε(x) = #{y ∈ D | ℓ_p(x, y) ≤ ε} in the above two definitions. In other words, DBSCAN (Ester et al. 1996), which uses N_ε, detects density-connected clusters; whereas DBSCAN which uses M_α detects mass-connected clusters.

The only difference between a density-connected cluster and a mass-connected cluster is the dissimilarity measure used in Equation 6. We call the DBSCAN procedure which employs M_α MBSCAN, since it detects mass-connected clusters rather than density-connected clusters.

Condition under which MBSCAN detects all mass-connected clusters

Let a valley between two cluster modes be the points having the minimum estimated M_α(·), i.e., g_ij, along any path linking cluster modes c_i and c_j. A path between two points x and y is a non-cyclic linking of a sequence of unique points starting with x and ending with y, where adjacent points lie in each other's α-neighbourhood: pı(·, ·) ≤ α.

Because pı (unlike ℓ_2 used in N_ε) is adaptive to the density of the local data distribution, it is possible to adjust ψ and α to yield an M_α distribution such that all valley-points have sufficiently small values, if there exist such ψ and α. In other words, for some data distributions F, there exist some ψ and α such that the distribution of M_α(·) satisfies the following condition:

min_{k ∈ {1,...,ℵ}} M_α(c_k) > max_{i ≠ j ∈ {1,...,ℵ}} ĝ_ij    (7)

where ĝ_ij is the largest of the minimum estimated M_α(·) along any path linking cluster modes c_i and c_j.

In data distributions F, MBSCAN is able to detect all mass-connected clusters because a threshold τ can be used to break all paths between the modes by assigning regions with estimated M_α(·) less than τ to noise, i.e.,

∃τ ∀k, i ≠ j ∈ {1,...,ℵ}: M_α(c_k) > τ > ĝ_ij

An example that F subsumes G, derived from the same dataset, is shown in Figure 4, where a hard distribution G in which DBSCAN fails to detect all clusters is shown in Figure 4(a); but MBSCAN succeeds².

² Note that the above condition was first described in the context of using mass-based dissimilarity (Ting et al. 2018), but not in relation to mass-connected clusters. We have made the relation to mass-connected clusters more explicit here.

In other words, the mass distribution afforded by M_α is more flexible than the density distribution generated by N_ε, which leads directly to MBSCAN's enhanced cluster detection capability in comparison with DBSCAN, though both are using exactly the same algorithm, except for the dissimilarity.

[Figure 4: (a) A hard distribution for DBSCAN as estimated by N_ε (using ℓ_2), where DBSCAN (which uses N_ε) fails to detect all clusters using a threshold. (b) The distribution estimated by M_α (using pı-aNNE) from the same dataset, where MBSCAN (which uses M_α) succeeds in detecting all clusters using a threshold.]

[Figure 5: Change of neighbourhood density/mass wrt its parameter: (a) N_ε vs ε and (b) M_α vs α. N_ε uses ℓ_2; and M_α uses pı-aNNE. The peak numbers and valley numbers refer to those shown in Figure 4(a).]

Figure 5 shows the change of the neighbourhood function values wrt the change in their parameter, for N_ε using ℓ_2 and M_α using pı-aNNE. This example shows that no ε exists which enables DBSCAN to detect all three clusters. This is because the line for Peak#3 (which is the mode of the sparse cluster) has N_ε values in-between those of the two valleys. In contrast, many settings of α of M_α can be used to detect all three clusters because the lines of the two valleys are lower than those of the three peaks.

Experiments

The aim of the experiments is to compare the clustering performance of DBSCAN using different dissimilarities relative to that of the state-of-the-art density-based clustering algorithm DP (Rodriguez and Laio 2014). In addition to the three dissimilarity measures, i.e., ℓ_2, pı-iForest and pı-aNNE, two recent distance transformation methods called ReScale (Zhu, Ting, and Carman 2016) and DScale (Zhu, Ting, and Angelova 2018) are also included. Note that DBSCAN using pı is denoted as MBSCAN, as it is a mass-based clustering method.

All algorithms used in our experiments are implemented in Matlab (the source code with demo can be obtained from https://github.com/cswords/anne-dbscan-demo). We produced the GPU accelerated versions of all implementations. The experiments ran on a machine having CPU: i5-8600k 4.30GHz processor, 8GB RAM; and GPU: GTX Titan X with 3072 1075MHz CUDA (Owens et al. 2008) cores & 12GB graphic memory.

A total of 20 datasets³ are used in the experiments.
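The paper's implementation is in Matlab (link above). As a rough illustration of the setup, MBSCAN can be emulated by feeding a precomputed pı-aNNE dissimilarity matrix to an off-the-shelf DBSCAN and leaving the procedure itself unchanged. A hedged sketch using scikit-learn, which is our substitution and not the authors' code; the parameter values are placeholders:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mbscan(p, alpha, min_pts):
    """MBSCAN = the unchanged DBSCAN procedure run on a precomputed pı matrix:
    eps plays the role of alpha and min_samples the role of the mass threshold
    (MinPts); labels of -1 mark points assigned to noise."""
    return DBSCAN(eps=alpha, min_samples=min_pts, metric='precomputed').fit_predict(p)

# D: an (n, d) data matrix; reuse the hypothetical sketches above.
# K = anne_isolation_similarity(D, D, psi=16, t=200, seed=1)   # n x n similarity
# p = isolation_dissimilarity(K)                               # pı = 1 - K_psi
# labels = mbscan(p, alpha=0.8, min_pts=4)                     # placeholder parameters
```

Replacing p with the pairwise ℓ_2 distance matrix recovers the original DBSCAN, which makes the comparison a like-for-like swap of the dissimilarity only.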

Table 1: Clustering results in F1 scores. The best performer is boldfaced; the second best is underlined.

                                                 DP      DBSCAN                     MBSCAN
Name                 #Points  #Dim.  #Clusters   ℓ2      ℓ2      ReScale  DScale    iForest  aNNE
Artificial data (average⇒)                       0.961   0.852   0.941    0.985     0.969    0.981
aggregation          788      2      7           1.000   0.997   0.996    1.000     0.996    1.000
compound             399      2      6           0.867   0.791   0.862    0.942     0.875    0.918
jain                 373      2      2           1.000   0.976   1.000    1.000     1.000    1.000
pathbased            300      2      3           0.943   0.828   0.864    0.987     0.986    0.995
hard distribution    1500     2      3           0.994   0.667   0.985    0.995     0.987    0.992
High-dimensional data (average⇒)                 0.627   0.446   0.521    0.491     0.568    0.727
ALLAML               72       7129   2           0.706   0.484   0.729    0.484     0.747    0.820
COIL20               1440     1024   20          0.724   0.842   0.861    0.839     0.865    0.952
Human Activity       1492     561    6           0.595   0.331   0.352    0.374     0.402    0.502
Isolet               1560     617    26          0.517   0.194   0.234    0.426     0.289    0.605
lung                 203      3312   5           0.703   0.489   0.544    0.489     0.649    0.921
TOX 171              171      5748   4           0.519   0.336   0.403    0.336     0.454    0.563
General data (average⇒)                          0.876   0.680   0.820    0.860     0.873    0.896
breast               699      9      2           0.970   0.824   0.951    0.966     0.963    0.964
control              600      60     6           0.736   0.531   0.663    0.844     0.738    0.854
gps                  163      6      2           0.811   0.753   0.811    0.811     0.819    0.766
iris                 150      4      3           0.967   0.848   0.905    0.926     0.966    0.973
seeds                210      7      3           0.909   0.750   0.885    0.871     0.907    0.922
shape                160      17     9           0.761   0.581   0.680    0.722     0.725    0.787
thyroid              215      5      3           0.868   0.584   0.850    0.828     0.915    0.916
WDBC                 569      30     2           0.933   0.600   0.765    0.894     0.895    0.927
wine                 178      13     3           0.933   0.645   0.866    0.881     0.927    0.959
Grand Average                                    0.823   0.653   0.760    0.781     0.805    0.867
Number of datasets with the best F1 score        5       0       1        4         2        14
#wins/#draws/#losses wrt MBSCAN-pı-aNNE          5/2/13  0/0/20  1/2/18   4/2/14    1/1/18   -

They are from three categories: 5 artificial datasets, 6 high-dimensional datasets, and 9 general datasets. They are selected because they represent diverse datasets in terms of data size, number of dimensions and number of clusters. The data characteristics of these datasets are shown in the first four columns of Table 1. All datasets are normalised using the min-max normalisation so that each attribute is in [0,1] before the experiments begin.

³ The artificial datasets are from http://cs.uef.fi/sipu/datasets/ (Gionis, Mannila, and Tsaparas 2007; Zahn 1971; Chang and Yeung 2008; Jain and Law 2005), except that the hard distribution dataset is from https://sourceforge.net/p/density-ratio/ (Zhu, Ting, and Carman 2016); 5 high-dimensional datasets are from http://featureselection.asu.edu/datasets.php (Li et al. 2016), and the rest of the datasets are from http://archive.ics.uci.edu/ml (Dheeru and Karra Taniskidou 2017).

We compared all clustering results in terms of the best F1 score⁴ (Rijsbergen 1979) that is obtained from a search of the algorithm's parameter. We search each parameter within a reasonable range. The ranges used for all algorithms/dissimilarities are provided in Table 2. Because pı used in MBSCAN is based on randomised methods, we report the mean F1 score over 10 trials for each dataset.

⁴ F1 = (1/k) Σ_{i=1}^{k} 2 p_i r_i / (p_i + r_i), where p_i and r_i are the precision and the recall for cluster i, respectively. F1 is preferred over other evaluation measures such as Purity (Manning, Raghavan, and Schütze 2008) and Normalized Mutual Information (NMI) (Strehl and Ghosh 2002) because these measures do not take into account noise points which are identified by a clustering algorithm. Based on these measures, algorithms can obtain a high clustering performance by assigning many points to noise, which can be misleading in a comparison.

Table 2: Search ranges of parameters used.

            Description                 Candidates
DP          Target cluster number       k ∈ [2...40]
            neighbourhood size in N_ε   ε ∈ [0.001...0.999]
DBSCAN      MinPts                      MinPts ∈ [2...40]
MBSCAN      neighbourhood size in N_ε   ε ∈ [0.001...0.999]
ReScale     precision factor            f = 200 *
DScale      neighbourhood size in N_η   η ∈ [0.05...0.95]
aNNE        Ensemble size               t = 200
iForest     Subsample size              ψ ∈ [2, ⌈n/2⌉] †

* The f parameter is required for ReScale only.
† A search of 10 values with equal interval in the range.

Clustering results

Table 1 shows that MBSCAN using pı-aNNE has the best performance overall. Its F1 scores are the best on 14 out of 20 datasets. The closest contender DP has the best F1 scores on 5 datasets only. In two other performance measures, MBSCAN using pı-aNNE has 13 wins, 2 draws and 5 losses against DP; and has a higher average F1 score too (0.867 versus 0.823).

One notable standout is on the high-dimensional datasets: MBSCAN using pı-aNNE has the largest gap in average F1 score in comparison with other contenders among the three categories of datasets. With reference to DBSCAN, the gap is close to 0.3 F1 score; even compared with the closest contender DP, the gap is 0.1 F1 score. The superiority of pı-aNNE over pı-iForest is also highlighted on these high-dimensional datasets.

MBSCAN using pı-iForest wins over the original DBSCAN on all datasets except one (aggregation). This version of MBSCAN uplifted the clustering performance of DBSCAN significantly to almost the same level as DP.

A significance test is conducted over MBSCAN with pı-aNNE, MBSCAN with pı-iForest and DP. Figure 6 shows the result of the test: MBSCAN using pı-aNNE performs the best and is significantly better than DP and MBSCAN using pı-iForest; and there is no significant difference between DP and MBSCAN using pı-iForest.

[Figure 6: Critical difference (CD) diagram of the post-hoc Nemenyi test (α = 0.10). A line is shown between two algorithms if their rank gap is smaller than CD; otherwise, the difference is significant.]

DBSCAN is known to be sensitive to its parameter settings. As MBSCAN is using exactly the same algorithm, it has the same sensitivity.

A caveat is in order. Although Isolation Similarity consistently outperforms the distance measure in DBSCAN, our preliminary experiment using DP shows that the result is mixed. An analysis of DP, similar to that provided in this paper, is required in order to ascertain the condition(s) under which DP performs well and Isolation Similarity can help.

Complexity and runtime

MBSCAN with pı-aNNE is much faster than MBSCAN with pı-iForest because the time complexity to build a maximum-size isolation tree and test one point is O(ψ²), whereas aNNE takes O(ψ). The space complexity to store the trained aNNE model is O(t · ψ); that of iForest is O(t · ψ · log ψ).

Table 3 shows the GPU runtime results on the four largest datasets. In contrast to DBSCAN and DP, MBSCAN needs to pre-compute the dissimilarity matrix in the pre-processing step, and this takes O(n²) time. This pre-processing constitutes most of the time of MBSCAN reported in Table 3. pı-aNNE is still faster than ReScale on high-dimensional datasets, though it is one order of magnitude slower than DBSCAN and DP.

Table 3: Runtime in GPU seconds.

               DP     DBSCAN               MBSCAN
Datasets              Original  ReScale    iForest  aNNE
Hard dist.     0.11   0.07      0.08       154      0.50
COIL20         0.03   0.02      3.74       762      0.45
Human Act.     0.10   0.03      2.12       146      0.48
Isolet         0.08   0.03      2.64       472      0.48

Summary: aNNE versus iForest implementations of Isolation Similarity

We find that the aNNE implementation of Isolation Similarity is better than the iForest implementation because the aNNE implementation is more:

• Effective on datasets with varied densities and high-dimensional datasets.
• Amenable to GPU acceleration because aNNE can be implemented in almost pure matrix manipulations. Thus, aNNE runs many orders of magnitude faster than iForest if a large ψ is required, because aNNE in the GPU implementation has almost constant runtime wrt ψ. (A vectorised sketch of this matrix-based pre-computation is given after the Conclusions.)
• Stable because aNNE's randomisation is a result of sampling data subsets only.

Conclusions

We make four contributions in this paper:

1) Identifying shortcomings of tree-induced Isolation Similarity; proposing a nearest neighbour-induced Isolation Similarity to overcome these shortcomings; and establishing three advantages of the nearest neighbour-induced Isolation Similarity over the tree-induced one.

2) Formally proving the characteristic of the nearest neighbour-induced Isolation Similarity. This is the first proof since the introduction of Isolation Kernel (Ting, Zhu, and Zhou 2018).

3) Providing a formal definition of mass-connected clusters and an explanation of why detecting mass-connected clusters is a better approach in overcoming the shortcoming of DBSCAN (which detects density-connected clusters) in datasets with varied densities. This differs fundamentally from the existing density-based approaches of the original DBSCAN, DP and ReScale, which all employ a distance measure to compute density.

4) Conducting an empirical evaluation to validate the advantages of (i) nearest-neighbour-induced Isolation Similarity over tree-induced Isolation Similarity; and (ii) mass-based clustering using Isolation Similarity over four density-based clustering algorithms, i.e., DBSCAN, DP, ReScale and DScale.

In addition, we show for the first time that it is possible to uplift the clustering performance of the classic DBSCAN, through the use of nearest-neighbour-induced Isolation Similarity, to surpass that of DP, the state-of-the-art density-based clustering algorithm.
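As noted in the Summary and the Complexity and runtime subsections above, an aNNE model is just t samples of ψ points, so the n × n pı matrix can be pre-computed with plain matrix operations, which is what makes a GPU implementation straightforward. Below is a hedged NumPy sketch of that pre-computation (batching, memory management and GPU transfer are omitted; the function name and defaults are ours, not the authors'):

```python
import numpy as np

def anne_dissimilarity_matrix(X, psi=16, t=200, seed=None):
    """Pre-compute the n x n pı-aNNE matrix using matrix operations only:
    for each of the t samples of psi points, assign every point to its
    nearest sampled point (its Voronoi cell) and accumulate how often
    each pair of points shares a cell. This corresponds to the O(n^2)
    pre-processing step reported for MBSCAN in Table 3."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sq_norms = (X ** 2).sum(axis=1)
    same = np.zeros((n, n))
    for _ in range(t):
        C = X[rng.choice(n, size=psi, replace=False)]          # one sample of psi centres
        # squared Euclidean distances of all n points to the psi centres
        d2 = sq_norms[:, None] - 2.0 * (X @ C.T) + (C ** 2).sum(axis=1)[None, :]
        cell = d2.argmin(axis=1)                               # Voronoi cell membership, shape (n,)
        onehot = np.zeros((n, psi))
        onehot[np.arange(n), cell] = 1.0
        same += onehot @ onehot.T                              # 1 where two points share a cell
    return 1.0 - same / t                                      # pı = 1 - K_psi

# p = anne_dissimilarity_matrix(D, psi=16, t=200, seed=1)      # hypothetical usage
```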

Acknowledgements

This material is based upon work supported by eSolutions of Monash University (Xiaoyu Qin); and partially supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-17-1-4034 (Kai Ming Ting).

References

Aurenhammer, F. 1991. Voronoi diagrams—A survey of a fundamental geometric data structure. ACM Computing Surveys 23(3):345–405.

Borg, I.; Groenen, P. J.; and Mair, P. 2012. Applied Multidimensional Scaling. Springer Science & Business Media.

Breiman, L. 2000. Some infinity theory for predictor ensembles. Technical Report 577, Statistics Dept., UCB.

Chang, H., and Yeung, D.-Y. 2008. Robust path-based spectral clustering. Pattern Recognition 41(1):191–203.

Davies, A., and Ghahramani, Z. 2014. The random forest kernel and creating other kernels for big data from random partitions. arXiv:1402.4293.

Dheeru, D., and Karra Taniskidou, E. 2017. UCI machine learning repository.

Ester, M.; Kriegel, H.-P.; Sander, J.; and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, volume 96, 226–231.

Gionis, A.; Mannila, H.; and Tsaparas, P. 2007. Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1):4.

Gönen, M., and Alpaydin, E. 2011. Multiple kernel learning algorithms. Journal of Machine Learning Research 12:2211–2268.

Jain, A. K., and Law, M. H. 2005. Data clustering: A user's dilemma. In International Conference on Pattern Recognition and Machine Intelligence, 1–10. Springer.

Krumhansl, C. L. 1978. Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review 85(5):445–463.

Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Robert, T.; Tang, J.; and Liu, H. 2016. Feature selection: A data perspective. arXiv:1601.07996.

Liu, F. T.; Ting, K. M.; and Zhou, Z.-H. 2008. Isolation forest. In The Eighth IEEE International Conference on Data Mining, 413–422. IEEE.

Manning, C. D.; Raghavan, P.; and Schütze, H. 2008. Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press.

Owens, J. D.; Houston, M.; Luebke, D.; Green, S.; Stone, J. E.; and Phillips, J. C. 2008. GPU computing. Proceedings of the IEEE 96(5):879–899.

Rakotomamonjy, A.; Bach, F. R.; Canu, S.; and Grandvalet, Y. 2008. SimpleMKL. Journal of Machine Learning Research 9(Nov):2491–2521.

Rijsbergen, C. J. V. 1979. Information Retrieval. Newton, MA, USA: Butterworth-Heinemann, 2nd edition.

Rodriguez, A., and Laio, A. 2014. Clustering by fast search and find of density peaks. Science 344(6191):1492–1496.

Steinbach, M.; Ertöz, L.; and Kumar, V. 2004. The challenges of clustering high dimensional data. In New Directions in Statistical Physics. Springer. 273–309.

Strehl, A., and Ghosh, J. 2002. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3(Dec):583–617.

Ting, K. M.; Zhu, Y.; Carman, M.; Zhu, Y.; and Zhou, Z.-H. 2016. Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1205–1214. ACM.

Ting, K. M.; Zhu, Y.; Carman, M.; Zhu, Y.; Washio, T.; and Zhou, Z.-H. 2018. Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms. Machine Learning, doi.org/10.1007/s10994-018-5737-x.

Ting, K. M.; Zhu, Y.; and Zhou, Z.-H. 2018. Isolation kernel and its effect on SVM. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2329–2337. ACM.

Tversky, A. 1977. Features of similarity. Psychological Review 84(4):327–352.

Wang, F., and Sun, J. 2015. Survey on distance metric learning and dimensionality reduction in data mining. Data Mining and Knowledge Discovery 29(2).

Zadeh, P.; Hosseini, R.; and Sra, S. 2016. Geometric mean metric learning. In International Conference on Machine Learning, 2464–2471.

Zahn, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers 100(1):68–86.

Zhu, Y.; Ting, K. M.; and Angelova, M. 2018. A distance scaling method to improve density-based clustering. In Phung, D.; Tseng, V. S.; Webb, G. I.; Ho, B.; Ganji, M.; and Rashidi, L., eds., Advances in Knowledge Discovery and Data Mining, 389–400. Cham: Springer International Publishing.

Zhu, Y.; Ting, K. M.; and Carman, M. J. 2016. Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognition 60:983–997.
