
LETTER Deep Clustering for Improved Inter-Cluster Separability and Intra-Cluster Homogeneity with Cohesive Loss

Byeonghak KIM†, Murray LOEW††a), David K. HAN†††b), Nonmembers, and Hanseok KO††††c), Member

SUMMARY   To date, many studies have employed clustering for the classification of unlabeled data. Deep clustering applies deep learning models to conventional clustering algorithms to more clearly separate the distributions of the clusters. In this paper, we employ a convolutional autoencoder to learn the features of input images. Following this, k-means clustering is conducted using the encoded-layer features learned by the convolutional autoencoder. A center loss function is then added to aggregate the data points into clusters and increase the intra-cluster homogeneity. Finally, we calculate and increase the inter-cluster separability. We combine all loss functions into a single global objective function. Our new deep clustering method surpasses the performance of existing clustering approaches when compared in experiments under the same conditions.
key words: separate clustering, convolutional autoencoder, intra-cluster homogeneity, inter-cluster separability

Manuscript received October 26, 2020.
Manuscript revised December 31, 2020.
Manuscript publicized January 28, 2021.
† The author is with the Dept. of Visual Information Processing, Korea University, Seoul, 02841, Korea.
†† The author is with the Dept. of Biomedical Engineering, George Washington University, Washington DC, USA.
††† The author is with the Dept. of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA.
†††† The author is with the School of Electrical Engineering, Korea University, Seoul, 02841, Korea.
a) E-mail: [email protected]
b) E-mail: [email protected]
c) E-mail: [email protected] (Corresponding author)
DOI: 10.1587/transinf.2020EDL8138

1. Introduction

In machine learning, data classification is a particularly important task. However, individually labeling data points requires significant time and effort, and it is often impossible to fully label datasets for research applications. To overcome this problem, clustering via unsupervised learning has been proposed and is widely utilized. Clustering effectively groups unlabeled data based on specific criteria of similarity and can automatically extract semantic information that humans cannot abstract.

Many data-mining researchers have investigated various types of clustering. Both hard and soft clustering are possible, depending on whether an observation point belongs to one or to multiple clusters. Hard clustering includes k-means clustering [1], k-medoid clustering [2], density-based spatial clustering of applications with noise (DBSCAN) [3], hierarchical clustering [4], and random binary pattern of patch clustering (RBPPC) [5]. Soft clustering includes Gaussian mixture model-based clustering [6] and fuzzy clustering [7]. Recently, deep clustering methods that extract features to be used for data representation have been proposed. An autoencoder is an unsupervised feature extraction tool used in deep clustering. Representative autoencoder-based deep clustering algorithms include autoencoder-based data clustering (ABDC) [8], deep embedded clustering (DEC) [9], improved deep embedded clustering (IDEC) [10], discriminatively boosted clustering (DBC) [11], and deep embedded regularized clustering (DEPICT) [12]. They are all based on the idea that the neural network learns features that are suitable for clustering. In addition, deep embedded clustering with data augmentation (DEC-DA) [13] applies random rotation, cropping, shearing, and shifting to the data to generalize the model. Yet, these techniques exhibit neither strong inter-cluster separability nor robust intra-cluster homogeneity, leaving room for improvement.

In this paper, we propose a new clustering algorithm that separates clusters more effectively than existing deep clustering algorithms by increasing the inter-cluster separability and the intra-cluster homogeneity. Our deep separate clustering algorithm makes scattered samples characterized by a Gaussian distribution more cohesive when data points are very similar and, when the similarity is low, separates the data points into their corresponding clusters. This process is also effective when processing data with overlapping clusters. Our proposed method greatly improves on the performance of existing deep clustering algorithms when tested on public datasets.

In Sect. 2, the proposed method is described. The process and results of the experiments are presented in Sect. 3. Finally, Sect. 4 provides the conclusions.

2. Proposed Deep Clustering with Cohesive Loss

In this section, we propose a deep clustering algorithm based on the separation between clusters, using four loss functions for global optimization. First, deep features are learned while reconstructing unlabeled data with a convolutional autoencoder (CAE). Upon completion of the pre-training of the autoencoder, the second stage focuses on clustering the deep features in an end-to-end manner, using the distances between features assigned to the same cluster and the distances between data points assigned to different clusters.


Fig. 1   The structure of the feature extraction process with a convolutional autoencoder for the Fashion-MNIST dataset. The size of the embedded features is smaller than that of the input X. The learned features can be used for clustering.

2.1 Feature Extraction with a Convolutional Autoencoder

An autoencoder is a deep learning approach that learns the features of unlabeled data and is widely employed as a feature extractor. The structure is shown in Fig. 1.

In the training set $X = \{x_i \in \mathbb{R}^D\}_{i=1}^{m}$, $x_i$ denotes the $i$-th training data point from $m$ data points, and $D$ is the dimension of $x_i$. The autoencoder loss function is

$L_{ae} = \| X - \hat{X} \|^{2}$   (1)

In addition, because the autoencoder learns features in order to reconstruct the input $X$ as accurately as possible through the encoder and decoder, producing the output $\hat{X}$, the autoencoder loss is also referred to as the reconstruction loss. Equation (1) can thus be expressed as follows:

$L_r = \frac{1}{m} \sum_{i=1}^{m} \| x_i - \hat{x}_i \|_2^2$   (2)

The embedded features learned by Eq. (2) become the input for the subsequent clustering algorithm described in Sects. 2.2 and 2.3.
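For concreteness, the following Keras sketch shows one way such a CAE can be built for 28 × 28 inputs. The filter counts follow the description in Sect. 3.2 (32 and 64 filters of size 5 × 5 with stride 2, then 128 filters of size 3 × 3); the embedding size d = 10, the padding choices, and the optimizer are our assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a convolutional autoencoder for 28x28 grayscale inputs.
# Layer widths follow Sect. 3.2 of the paper; everything else is an assumption.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cae(input_shape=(28, 28, 1), d=10):
    x_in = layers.Input(shape=input_shape)
    # Encoder: three convolutional layers, then a dense embedding of size d.
    h = layers.Conv2D(32, 5, strides=2, padding="same", activation="relu")(x_in)   # 14x14x32
    h = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(h)      # 7x7x64
    h = layers.Conv2D(128, 3, strides=2, padding="valid", activation="relu")(h)    # 3x3x128
    z = layers.Dense(d, name="embedding")(layers.Flatten()(h))
    # Decoder: mirror of the encoder, reconstructing the input.
    g = layers.Dense(3 * 3 * 128, activation="relu")(z)
    g = layers.Reshape((3, 3, 128))(g)
    g = layers.Conv2DTranspose(64, 3, strides=2, padding="valid", activation="relu")(g)  # 7x7x64
    g = layers.Conv2DTranspose(32, 5, strides=2, padding="same", activation="relu")(g)   # 14x14x32
    x_rec = layers.Conv2DTranspose(1, 5, strides=2, padding="same", name="reconstruction")(g)
    return models.Model(x_in, [x_rec, z], name="cae")

# Pre-training minimizes the reconstruction loss L_r of Eq. (2), e.g.:
# cae = build_cae()
# cae.compile(optimizer="adam", loss={"reconstruction": "mse"})
# cae.fit(x_train, {"reconstruction": x_train}, batch_size=256, epochs=200)
```

For USPS (16 × 16 inputs), the spatial dimensions of the layers would have to be adapted accordingly.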

2.2 Loss Function for Deep Embedded Clustering

Autoencoder-based clustering is also known as deep embedded clustering. The main idea of deep embedded clustering algorithms is to perform clustering on the features obtained from the autoencoder. As mentioned earlier, DEC-DA [13], among others, also improves model generalization using data augmentation techniques. For this reason, we employ data augmentation as well.

The features in deep embedded clustering generally have smaller dimensions than the input and output data. In this paper, we perform clustering on the embedded features $Z = \{z_i \in \mathbb{R}^d\}_{i=1}^{m}$, where $d$ is the embedded feature size and $m$ is the number of data points. The clustering loss function is calculated by applying the Kullback–Leibler divergence (KLD) [15] to the Student's t-distribution [14] results obtained by performing soft deep clustering with the network described in Sect. 2.1.

In more detail, k-means clustering is performed on $Z$ to obtain $M = \{\mu_j \in \mathbb{R}^d\}_{j=1}^{n}$, where $\mu_j$ is the $j$-th of the $n$ centroids. The output distribution $q_{ij}$ resulting from clustering with $Z$ and $M$ can be defined as follows based on the Student's t-distribution:

$q_{ij} = \dfrac{\left(1 + \| z_i - \mu_j \|^2\right)^{-1}}{\sum_{j'} \left(1 + \| z_i - \mu_{j'} \|^2\right)^{-1}}$   (3)

The target distribution $p_{ij}$ can be expressed based on $q_{ij}$:

$p_{ij} = \dfrac{Q_{ij}^{\,T} / \sum_i q_{ij}}{\sum_{j'} \left( Q_{ij'}^{\,T} / \sum_i q_{ij'} \right)}$   (4)

where $Q_{ij} = (e^{q_{ij}} - 1)/(e - 1)$ and $T > 0$. We set this $T$ value to 3. The range of $Q_{ij}$ is 0 to 1 because $q_{ij}$ takes values from 0 to 1. In Eq. (4), the exponential function constructs $p$ by adding nonlinearity to $q$. Therefore, the target distribution $p$ enhances the prediction by giving more emphasis to the cluster assignments with high probability in $q$. In addition, the loss is regularized by the soft cluster frequency $\sum_i q_{ij}$ to prevent distortion of the entire feature space caused by clusters of different densities contributing unequally to the loss. Finally, the clustering loss function is

$L_c = D_{KL}(P \parallel Q) = \sum_i \sum_j p_{ij} \log \dfrac{p_{ij}}{q_{ij}}$   (5)
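As an illustration of Eqs. (3)–(5), the following NumPy sketch computes the soft assignments, the sharpened target distribution, and the KL clustering loss. The order of the normalization steps in Eq. (4) follows our reading of the reconstructed formula, so treat this as a sketch rather than the authors' implementation.

```python
import numpy as np

def soft_assignment(Z, M):
    # Eq. (3): Student's t similarity between embedded points z_i and centroids mu_j.
    d2 = ((Z[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean distances
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)                    # normalize over clusters

def target_distribution(q, T=3.0):
    # Eq. (4): sharpen the assignments with the exponential rescaling Q and the power T,
    # and divide by the soft cluster frequency sum_i q_ij to balance cluster sizes.
    Q = (np.exp(q) - 1.0) / (np.e - 1.0)      # maps q in [0, 1] to Q in [0, 1]
    w = Q ** T / q.sum(axis=0, keepdims=True)
    return w / w.sum(axis=1, keepdims=True)   # normalize over clusters

def clustering_loss(p, q, eps=1e-12):
    # Eq. (5): KL divergence D_KL(P || Q).
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Usage would look like `q = soft_assignment(Z, M)`, `p = target_distribution(q)`, and `loss = clustering_loss(p, q)`, with `Z` the embedded features and `M` the current centroids.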
2.3 The Proposed Discriminative Cluster Loss Function

The final goal of clustering is to assign data points with a strong similarity to the same cluster under a certain similarity measurement. Therefore, to achieve robust clustering, both the intra-cluster homogeneity and the inter-cluster separability should be increased. In this study, $L_W$ is used to minimize the intra-cluster distance, and $L_B$ reduces the inter-cluster proximity. $L_W$ is a function that represents the variation between the cluster centroids and the inner cluster points, as in [16]. $L_W$ increases the homogeneity of the inner cluster points by making the data points in the same cluster gather around their centroid. $L_W$ is represented by (6):

$L_W = \dfrac{1}{2} \sum_{i=1}^{m} \| z_i - \mu_{y_i} \|_2^2$   (6)

where $y_i$ is the predicted cluster label for the $i$-th sample and $\mu_{y_i} \in \mathbb{R}^d$ is the centroid of the $y_i$-th cluster. $\mu_{y_i}$ can be updated during the iterative training process. $L_B$ is a function calculated from the inter-cluster cosine distance; thus, a smaller $L_B$ means larger inter-cluster distances. $L_B$ can be derived as follows:

$L_B = \dfrac{1}{2} \cdot \dfrac{1}{{}_{n}C_{2}} \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} \log \left( \mathrm{ReLU} \left( \dfrac{\mu_j \cdot \mu_k}{\| \mu_j \|_2 \, \| \mu_k \|_2} \right) + 1 + \epsilon \right)$   (7)

where $n$ denotes the number of clusters, and $\mu_j$ and $\mu_k$ are the $j$-th and $k$-th centroids of the embedded features from the CAE, respectively. ${}_{n}C_{2}$ represents the number of all combinations when pairing two of the $n$ clusters. The ReLU activation function is used to prevent a negative value of the cosine similarity between the centroids of each cluster pair; the constants 1 and $\epsilon$ prevent the output of the log function from becoming negative or zero, respectively. Because $L_B$ is significantly larger than our other loss functions, it is log-transformed for normalization.
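The two cohesive losses can likewise be sketched in NumPy, continuing the snippet above. The small constant `eps` stands for the $\epsilon$ term in Eq. (7), and summing over all centroid pairs, including $j = k$, follows our reading of the reconstructed formula.

```python
import numpy as np

def center_loss(Z, M, y):
    # Eq. (6): L_W pulls each embedded point z_i toward the centroid of its cluster y_i.
    return 0.5 * float(np.sum((Z - M[y]) ** 2))

def separation_loss(M, eps=1e-6):
    # Eq. (7): L_B penalizes centroid pairs with positive cosine similarity, i.e. clusters
    # pointing in similar directions; minimizing it pushes the centroids apart.
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)   # unit-norm centroids
    cos = Mn @ Mn.T                                     # pairwise cosine similarities
    n = M.shape[0]
    n_pairs = n * (n - 1) / 2.0                         # nC2
    return 0.5 / n_pairs * float(np.sum(np.log(np.maximum(cos, 0.0) + 1.0 + eps)))
```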

Finally, the global loss function in Eq. (8) is created by combining the loss functions in Eqs. (2), (5), (6), and (7):

$L = \alpha \cdot L_r + \beta \cdot L_c + \gamma \cdot L_W + \omega \cdot L_B$   (8)

where $\alpha$, $\beta$, $\gamma$, and $\omega$ are heuristically determined loss weight parameters. Competition among these losses may cause oscillations in the learning process; however, combining them into a single global loss and carefully selecting the weight parameters ensure stable and effective learning.

The process of the proposed method is summarized in Algorithm 1. First, the initial $z_i$ is obtained by pre-training the CAE on the public database. Second, k-means clustering is performed to obtain the initial $\mu_j$. Then, $q_{ij}$ and $p_{ij}$ are initialized by Eqs. (3) and (4). After that, the network is updated repeatedly until the appropriate stopping criterion is satisfied. During this stage, the model's global loss function is optimized by a stochastic gradient descent algorithm.
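A possible joint update step combining Eqs. (2)–(8), with the MNIST weights from Sect. 3.2 as defaults, is sketched below in TensorFlow. It assumes `cae` is a two-output Keras model like the sketch in Sect. 2.1, `centroids` is a trainable variable of shape (n, d), and `p` is the target distribution recomputed periodically via Eq. (4); this illustrates the global objective, not the authors' released code.

```python
import tensorflow as tf

def train_step(x, p, cae, centroids, opt,
               alpha=1.0, beta=0.8, gamma=1.0, omega=1.0, eps=1e-6):
    with tf.GradientTape() as tape:
        x_rec, z = cae(x, training=True)
        # L_r (Eq. (2)): mean squared reconstruction error per sample.
        l_r = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_rec), axis=[1, 2, 3]))
        # q_ij (Eq. (3)): Student's t soft assignments of embeddings to centroids.
        d2 = tf.reduce_sum(tf.square(z[:, None, :] - centroids[None, :, :]), axis=-1)
        q = 1.0 / (1.0 + d2)
        q = q / tf.reduce_sum(q, axis=1, keepdims=True)
        # L_c (Eq. (5)): KL divergence to the precomputed target distribution p.
        l_c = tf.reduce_sum(p * tf.math.log((p + 1e-12) / (q + 1e-12)))
        # L_W (Eq. (6)): center loss w.r.t. the currently assigned centroid.
        y = tf.argmax(q, axis=1)
        l_w = 0.5 * tf.reduce_sum(tf.square(z - tf.gather(centroids, y)))
        # L_B (Eq. (7)): inter-cluster cosine separation loss over centroid pairs.
        mu = tf.math.l2_normalize(centroids, axis=1)
        cos = tf.matmul(mu, mu, transpose_b=True)
        n = tf.cast(tf.shape(centroids)[0], tf.float32)
        l_b = 0.5 / (n * (n - 1.0) / 2.0) * tf.reduce_sum(
            tf.math.log(tf.nn.relu(cos) + 1.0 + eps))
        # Global objective (Eq. (8)).
        loss = alpha * l_r + beta * l_c + gamma * l_w + omega * l_b
    variables = cae.trainable_variables + [centroids]
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```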

3. Experiments

To verify our proposed method, we compare the performance of many prominent clustering algorithms on four image datasets. As a result of the experiments, our proposed method exhibits superior performance in terms of clustering accuracy (ACC) and normalized mutual information (NMI) when compared to state-of-the-art clustering algorithms.

Fig. 2   If the inter-cluster distance increases, the distribution of the data points changes and the degree of separation of the clusters increases. The left side shows the case with small inter-cluster separability, while the right side shows a large separability between the clusters (i.e., the distance is suitably large). (a), (c), (e), and (g) show distributions prior to clustering, while (b), (d), (f), and (h) show distributions after clustering for the MNIST-full, MNIST-test, USPS, and Fashion-MNIST datasets, respectively.

3.1 Databases

• MNIST: A total of 70,000 handwritten digit images (60,000 training images and 10,000 test images), 10 classes, 28 × 28 grayscale.
• MNIST-test: A total of 10,000 handwritten digit images, 10 classes, 28 × 28 grayscale.
• USPS: A total of 9,298 handwritten digit images (7,291 training images and 2,007 test images), 16 × 16 grayscale.
• Fashion-MNIST: A total of 70,000 fashion item images (60,000 training images and 10,000 test images), 10 classes, 28 × 28 grayscale.

We normalized all of these images to pixel values from 0 to 1.

3.2 Implementation and Results of the Experiments

As shown in Fig. 2, after performing the proposed clustering on the sample points in (a), (c), (e), and (g), the inter-cluster separability and intra-cluster homogeneity are further improved in (b), (d), (f), and (h). In particular, the proposed method is effective at separating adjacent clusters, as in (a), (c), (e), and (g), where the clusters overlap.
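For reference, ACC is conventionally computed by finding the best one-to-one mapping between predicted cluster labels and ground-truth classes with the Hungarian algorithm, and NMI is available directly in scikit-learn. The paper does not list its evaluation code, so the snippet below is a standard sketch of these two metrics rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # Build the contingency matrix between predicted clusters and true classes,
    # then find the best one-to-one mapping with the Hungarian algorithm.
    k = int(max(y_pred.max(), y_true.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(-cost)          # maximize matched counts
    return cost[row, col].sum() / len(y_true)

# Example: acc = clustering_accuracy(y_true, y_pred)
#          nmi = normalized_mutual_info_score(y_true, y_pred)
```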

Table 1 Comparison of experimental results with other prominent clustering methods.

In our experiments, the reconstruction, center, separation, and clustering loss functions are all used to improve clustering performance. Each loss function has its own weighting parameter, and these were determined heuristically:
MNIST: α = 1.0, β = 0.8, γ = 1.0, ω = 1.0.
MNIST-test: α = 20.0, β = 1.0, γ = 0.5, ω = 0.5.
USPS: α = 1.0, β = 0.5, γ = 0.1, ω = 0.1.
Fashion-MNIST: α = 1.0, β = 0.01, γ = 0.1, ω = 0.1.

The CAE used to extract the learned features is divided into an encoder and a decoder section, each consisting of three layers. The first and second layers of the encoder have 32 and 64 filters, respectively, with a kernel size of 5 × 5 and a stride of 2. The third layer has 128 filters and a kernel size of 3 × 3. The decoder section has the same structure as the encoder layers in reverse order.

We compare the results of our experiments with 15 existing state-of-the-art clustering algorithms to verify the performance of our proposed method (Table 1). The numbers in red and blue represent the best and second-best performance, respectively. As shown in the table, our proposed algorithm outperforms the other algorithms for all datasets and both metrics (ACC and NMI). Compared to DEC, a prominent deep clustering algorithm, the proposed method achieved a 15.8% improvement in ACC and a 19.9% gain in NMI on the Fashion-MNIST dataset.

4. Conclusions

In this paper, we proposed a novel approach for deep clustering. Experimental results have shown that our proposed algorithm can effectively cluster unlabeled datasets by applying cohesive loss functions. The proposed clustering method includes reconstruction, intra-cluster homogeneity, inter-cluster separability, and KL-divergence loss functions, and it is shown to exceed the performance of existing clustering algorithms. In addition, our proposed method can be customized according to the task by adjusting the loss weights.

Acknowledgments

This research was supported by the Government-wide R&D Fund project for infectious disease research (GFID), Republic of Korea (grant number: HG19C0682). The work of Murray Loew was supported by GWU (KU-GWU Joint Research Fund).

References

[1] J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281–297, 1967.
[2] L. Kaufman and P.J. Rousseeuw, "Clustering by means of medoids," Faculty of Mathematics and Informatics, 1987.
[3] R.F. Ling, "On the theory and construction of k-clusters," The Computer Journal, vol.15, no.4, pp.326–332, 1972.
[4] L. Rokach and O. Maimon, "Clustering methods," Data Mining and Knowledge Discovery Handbook, Springer, Boston, MA, pp.321–352, 2005.
[5] H. Wang, Z. Xu, and H. Ko, "Random binary local patch clustering transforms based image matching for nonlinear intensity changes," Mathematical Problems in Engineering, 2018.
[6] C. Améndola, J.-C. Faugère, and B. Sturmfels, "Moment varieties of Gaussian mixtures," arXiv preprint arXiv:1510.04654, 2015.
[7] J.C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, pp.32–57, 1973.
[8] C. Song, F. Liu, Y. Huang, L. Wang, and T. Tan, "Auto-encoder based data clustering," Iberoamerican Congress on Pattern Recognition, Springer, Berlin, Heidelberg, pp.117–124, 2013.
[9] J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," International Conference on Machine Learning, pp.478–487, 2016.
[10] X. Guo, L. Gao, X. Liu, and J. Yin, "Improved deep embedded clustering with local structure preservation," IJCAI, pp.1753–1759, 2017.
[11] F. Li, H. Qiao, and B. Zhang, "Discriminatively boosted image clustering with fully convolutional auto-encoders," Pattern Recognition, vol.83, pp.161–173, 2018.
[12] K.G. Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang, "Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization," Proc. IEEE International Conference on Computer Vision, pp.5736–5745, 2017.
[13] X. Guo, et al., "Deep embedded clustering with data augmentation," Asian Conference on Machine Learning, pp.550–565, 2018.
[14] Student, "The probable error of a mean," Biometrika, vol.6, no.1, pp.1–25, 1908.
[15] S. Kullback and R.A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol.22, no.1, pp.79–86, 1951.
[16] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, "A discriminative feature learning approach for deep face recognition," European Conference on Computer Vision, Springer, Cham, pp.499–515, 2016.
[17] X. Chen and D. Cai, "Large scale spectral clustering with landmark-based representation," Twenty-Fifth AAAI Conference on Artificial Intelligence, pp.313–318, 2011.
[18] D. Cai, et al., "Locality preserving nonnegative matrix factorization," IJCAI, pp.1010–1015, 2009.
[19] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," Proc. 25th International Conference on Machine Learning, pp.1096–1103, 2008.
[20] W. Zhang, X. Wang, D. Zhao, and X. Tang, "Graph degree linkage: Agglomerative clustering on a directed graph," European Conference on Computer Vision, Springer, Berlin, Heidelberg, pp.428–441, 2012.
[21] B. Yang, et al., "Towards k-means-friendly spaces: Simultaneous deep learning and clustering," International Conference on Machine Learning, pp.3861–3870, 2017.
[22] Z. Jiang, et al., "Variational deep embedding: An unsupervised and generative approach to clustering," arXiv preprint arXiv:1611.05148, 2016.