
Nonlinear Dimensionality Reduction for Data Visualization: An Unsupervised Fuzzy Rule-based Approach

Suchismita Das and Nikhil R. Pal, Fellow, IEEE

Abstract—Here, we propose an unsupervised fuzzy rule-based dimensionality reduction method primarily for data visualization. It considers the following important issues relevant to dimensionality reduction-based data visualization: (i) preservation of neighborhood relationships, (ii) handling data on a non-linear manifold, (iii) the capability of predicting projections for new test data points, (iv) interpretability of the system, and (v) the ability to reject test points if required. For this, we use a first-order Takagi-Sugeno type model. We generate rule antecedents using clusters in the input data. In this context, we also propose a new variant of the Geodesic c-means clustering algorithm. We estimate the rule parameters by minimizing an error function that preserves the inter-point geodesic distances (distances over the manifold) as Euclidean distances on the projected space. We apply the proposed method on three synthetic and three real-world data sets and visually compare the results with four other standard data visualization methods. The obtained results show that the proposed method behaves desirably and performs better than or comparable to the methods compared with. The proposed method is found to be robust to the initial conditions. The predictability of the proposed method for test points is validated by experiments. We also assess the ability of our method to reject output points when it should. Then, we extend this concept to provide a general framework for learning an unsupervised fuzzy model for data projection with different objective functions. To the best of our knowledge, this is the first attempt at manifold learning using unsupervised fuzzy modeling.

Index Terms—Fuzzy rules, geodesic distance, predictability, visualization, Takagi-Sugeno system (TS system).

Suchismita Das and Nikhil R. Pal are with the Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, 700108, India (e-mail: {suchismitasimply, nrpal59}@gmail.com).

I. INTRODUCTION

Visualization is one of the prominent exploratory schemes as it provides insights into the data. We come across high dimensional data in various real-world problems related to, as examples, finance, meteorology, computer vision, medical imaging, multimedia information processing, and so on [1]–[5]. However, plotting more than three dimensions directly is not feasible. Data visualization schemes provide ways to visualize high dimensional data. They can be roughly divided into two categories. Methods that fall under the first category provide some mechanism to display more than two dimensions graphically. The Chernoff faces [6] is an example of this category. The second category reduces the dimensionality of the data to two or three. They aim to represent the data in a lower dimension keeping the 'relevant' information of the original data as intact as possible. Note that the schemes under the first category do not explicitly extract/summarize any information of the data. On the other hand, dimensionality reduction-based schemes try to carry the information present in the original data to its lower dimensional representation. Dimensionality reduction for data visualization via projection can be achieved in many ways such as Principal Component Analysis (PCA) [7], Multi-Dimensional Scaling (MDS) [8], and manifold learning [9]. Some of these methods are linear; for example, PCA, canonical correlation analysis [10], linear discriminant analysis [11], factor analysis [12], and locality preserving projections [13], which are not suitable if the data set has non-linear structures. Note that a special case of linear projection is feature selection [14]–[17]. When data are projected by feature selection, the features in the reduced space maintain their original identity, while in other cases of projections, the new features are difficult to interpret. Non-linear projections such as Sammon's projection [18] and manifold learning algorithms preserve some geometric properties of the data. Although the physical meaning of such features is difficult to comprehend, they produce more useful visualization. A class of non-linear data projection methods for visualization can be categorized as manifold learning algorithms. In a p-dimensional manifold, each point has a local neighborhood that is homeomorphic to the Euclidean space of the same dimension. In manifold learning for data visualization, the objective is to produce a low dimensional (usually two or three) representation of the high dimensional data by preserving the local neighborhood (local geometry). This helps to understand the intrinsic dimensionality of the manifold. There have been many attempts at manifold learning [19]–[22].

Whether linear or non-linear, dimensionality reduction methods optimize some objective function to get a low dimensional representation. Objective functions could be either convex or non-convex. Dimensionality reduction methods having convex objectives do not suffer from getting stuck at local optima. However, the study in [4] suggests that for dimensionality reduction, convex methods do not necessarily perform better than non-convex methods. This study also claims that non-convex dimensionality reduction methods, like multi-layer auto-encoders [23], perform well. Another important aspect is whether the dimensionality reduction method is equipped with predictability or not. Methods that are parametric, such as PCA and multi-layer auto-encoder based methods, provide a direct mapping from the high dimensional space to the low dimensional space. Thus the trained parametric model can produce the lower dimensional representation for any test points. With non-parametric methods, such predictions for new points are not possible. For example, methods such as locally linear embedding (LLE) [19], isometric mapping (ISOMAP) [20], and t-distributed stochastic neighbor embedding (t-SNE) [24] are non-parametric and as such do not have predictability. However, in [25] the authors proposed out-of-sample extensions of methods like LLE and ISOMAP.

Fuzzy rule-based models are parametric models that are extensively used in different tasks such as control, classification, and forecasting [26]–[28].
They learn a function from a given set of training points and directly predict outputs for test points. Fuzzy rule-based systems can handle non-linear relationships between input and output. Moreover, they are easy to understand and develop. They provide systems that are "explainable"/"comprehensible" at least to some extent. They store the knowledge as a set of easy-to-interpret fuzzy rules. For these characteristics, fuzzy rule-based systems seem to be a suitable candidate for implementing dimensionality reduction based data visualization models. However, the literature is not rich in this area. Apart from the work in [29] we could not find any investigation employing fuzzy rule-based systems for data visualization applications through dimensionality reduction.

Here, we have proposed a fuzzy rule-based dimensionality reduction method for visualization of data. We have used a first order Takagi-Sugeno (TS) [26] type model. In the proposed model, a fuzzy rule represents a small region of the input space and its associated output region is represented by a hyperplane. The rule antecedents model the local geometry of the input data while the consequents model the lower dimensional representation of the input. The rule parameters are learned by minimizing an objective function that preserves approximate distances over the manifold on or near which the data lie. The contributions in this work can be summarized as follows:
1) To the best of our knowledge, this is the first unsupervised method based on fuzzy rules for nonlinear manifold learning.
2) Most unsupervised methods cannot project test data points (points not used in the training data), but ours can.
3) To the best of our knowledge, this is the first method of nonlinear data projection that has the reject option.
4) The proposed method also enjoys some level of interpretability and, because of the structure of the fuzzy reasoning, it is not likely to make a poor generalization.
5) The proposed method is quite robust to the initial condition.
6) The proposed model provides a general framework for utilizing fuzzy systems for unsupervised dimensionality reduction. We can use different objective functions to preserve different characteristics of the original data in the projected space. For example, in [30], we have used the Sammon's Stress [18] as the objective function.
7) We also proposed a modified form of k-means clustering in the proposed method that ensures that the cluster centers are always on the manifold, thereby providing better initial rules.
8) Finally, the proposed method performs better than or comparable to several non-fuzzy methods.

The rest of the paper is organized as follows. In section II, we give a short literature review of dimensionality reduction methods for data visualization. Section III elaborates the proposed method. Section IV discusses the experiments and results. We finally conclude in section V.

II. LITERATURE REVIEW

Let X = {x_i = (x_i1, x_i2, ..., x_id_h) ∈ R^d_h : i ∈ {1, 2, ..., n}} be the input data set, where n is the number of instances and d_h is the dimension of the data. To visualize the data, it will be mapped to a d_l-dimensional space, where d_l < d_h. Let the lower dimensional data be represented by Y = {y_i = (y_i1, y_i2, ..., y_id_l) ∈ R^d_l : i ∈ {1, 2, ..., n}}. Dimensionality reduction methods estimate a map from R^d_h to R^d_l keeping some relevant information of X as intact as possible in Y. Generally, for visualization purposes, d_l is chosen as two or three. Other than visualization, dimensionality reduction methods are also applied in data de-noising, compressing, extracting suitable features for classification, clustering, and so on [2], [4], [31]. We discuss here only the methods which attempt to preserve the structure of the data and aid in visualization. The methods aiming at other applications are not discussed here.

Dimensionality reduction methods could be divided into two groups: linear and non-linear. Principal component analysis (PCA) and classical multi-dimensional scaling (MDS) are among the extensively used linear methods [31]. They focus on preserving the large distances of the original space in the obtained lower dimensional space. However, for data sets where points lie near or on a non-linear manifold, preserving small distances or local structure is more important. In those scenarios, linear mapping based methods fail to represent the desirable characteristics of the data. Non-linear dimensionality reduction methods like LLE [19] and ISOMAP [20] attempt to preserve the local structure of the data and consequently unfold the non-linear structure. In the case of classification, where structure preservation may not be important, non-linear methods are also advantageous. Consider a data set consisting of points from a sphere and a spherical shell surrounding the sphere, where the sphere and shell represent two classes. These two classes cannot be separated by a hyperplane in the original space. Their projections obtained by linear dimensionality reduction methods also cannot be separated linearly into two classes even if the classes are well separated. But there can be non-linear methods which can project such data into a space where the two classes are linearly separable. Linear methods are extensively used despite their limitations in handling non-linear structure present in the data. The most widespread linear method is PCA [7]. PCA maximizes the data variance or, equivalently, minimizes the error in the reconstruction of the original data from the generated lower dimensional data. A special case of PCA is classical (linear) MDS [8]. Classical MDS, precisely the 'maximal variance' PCA, aims to maximize the scatter of the lower dimensional projection to yield the most informative projection.
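As a point of reference for the linear baselines discussed above, a two-component PCA projection can be obtained in a couple of lines. This is only an illustrative sketch (the data matrix below is a random stand-in), not part of the proposed method.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 10)               # stand-in for an (n, d_h) data matrix
Y = PCA(n_components=2).fit_transform(X)  # (n, 2) linear projection with maximal variance
```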

Maximum auto-correlation factors (MAF) [32], [33] preserves temporally interesting structure present in the input to the projected output data. This method also considers a linear projection. MAF assumes that there is an underlying d_l dimensional temporal signal which is smooth and the remaining d_h − d_l dimensions are noise with little temporal correlation. MAF estimates the orthogonal projection M^T which maximizes the correlation between adjacent output points y_t, y_{t+δ} through an objective function [33].

We note that all the above mentioned methods consider a global objective. This leads to distortions when subjected to data with non-linear structure and in the presence of outliers [31]. To avoid this, a popular alternative is to consider the local neighborhood structure of the data. Two such popular methods are locality preserving projections [13] based on Laplacian Eigenmaps [22] and neighborhood preserving embedding [34] based on LLE. A detailed review of various linear dimensionality reduction methods is available in [31]. We discuss non-linear methods next.

A group of non-linear methods tries to preserve the inter-point distances in the high dimensional space as the inter-point distances in the estimated low dimensional space as much as possible. Examples of such methods are Sammon's projection [18], curvilinear component analysis (CCA) [35], ISOMAP [20], and curvilinear distance analysis (CDA) [36]. Sammon's projection extends the idea of classical multidimensional scaling, i.e., preservation of pairwise distances. The following cost function is minimized with respect to the y_i's, i ∈ {1, 2, ..., n}:

E = ( 1 / Σ_i Σ_{j≠i} d_ij ) Σ_i Σ_{j≠i} (d_ij − ||y_i − y_j||)² / d_ij ,   (1)

where d_ij represents the pairwise Euclidean distance between the high dimensional data points x_i and x_j. Sammon's projection, however, tends to fail on data sets such as a Swiss Roll [19]. In a Swiss Roll, the Euclidean distance between two points can be small but their distance over the manifold on which the data points reside can be large. To model such data sets we should consider the pairwise distance computed using the distances over the manifold, i.e., the geodesic distance. In differential geometry, a geodesic is a curve defining the shortest path between a pair of points on a surface or in a Riemannian manifold. The length of the geodesic defines the geodesic distance between the two points. Thus it is the shortest distance between two points while moving along the surface. The geodesic generalizes the notion of "straight line" to a Riemannian manifold. Mathematically, the distance between two points p and q over a Riemannian manifold is defined to be the infimum of the lengths L(γ) over all the piece-wise smooth curve segments γ from p to q [37].

ISOMAP is one of the methods which considers the geodesic distance as the distance measure for the input space. It applies the classical MDS over the input geodesic distance matrix to compute the lower dimensional embedding Y. The vectors y_i, i ∈ {1, 2, ..., n}, are chosen to minimize the cost function E = ||τ(D_G) − τ(D_Y)||_L2. Here, D_G denotes the matrix of geodesic distances over the input space and D_Y denotes the matrix of Euclidean distances over the output space. ||A||_L2 is the L2 norm, ||A||_L2 = sqrt(Σ_ij A_ij²). The τ operator converts distances to inner-products [20].

Almost at the same time as ISOMAP, another manifold learning algorithm named LLE was introduced. Although this method also makes use of neighborhoods, the underlying concept is quite different. For data points residing on or approximately on a manifold, we can assume a small neighborhood of a point to be linear, i.e., points in a small neighborhood are on a linear patch. The entire manifold is made of numerous such small linear patches. This local linear model for each data point is estimated in the LLE. The low dimensional data representation is computed in such a way that the local geometry of the original space is preserved. This is done in two steps. First, the w_ij's are estimated by minimizing Φ_i(x) = ||x_i − Σ_{x_j ∈ N(x_i)} w_ij x_j||² for i = 1 to n, where N(x_i) represents the set of neighbors of the point x_i. Second, after the w_ij's are estimated for each x_i, the corresponding lower dimensional representations y_i are estimated by minimizing the objective function ψ_i(y) = ||y_i − Σ_{x_j ∈ N(x_i)} w_ij y_j||², keeping the w_ij's constant, for i = 1 to n.

The Laplacian Eigenmaps [22] is another approach similar to the LLE. Here the local property is characterized by the distances of a point to its neighbors. The weight w_ij associated with a data point x_i and its neighbor x_j ∈ N(x_i) is computed using (2).

w_ij = exp(−||x_i − x_j||² / 2σ²) if x_j ∈ N(x_i), and w_ij = 0 otherwise,   (2)

where σ > 0 is the spread of the Gaussian function. The corresponding lower dimensional representations y_i are computed by minimizing the cost function φ(Y) = Σ_ij w_ij ||y_i − y_j||².

Another method named t-SNE [24] is able to create a single map that reveals structure at many different scales. The t-SNE converts the Euclidean inter-point distances in the high dimensional input space to symmetrized conditional probabilities p_ij = (p_{j|i} + p_{i|j}) / 2n, where n is the number of data points and p_{j|i} signifies the conditional probability that x_i would pick x_j as its neighbor. The conditional probability p_{j|i} follows a Gaussian distribution. Similarly, the joint probability of the lower dimensional outputs y_i and y_j is represented by q_ij and it is considered to follow the Student's t distribution with one degree of freedom. The lower dimensional representations y_i are obtained by minimizing the Kullback-Leibler divergence between the joint probability distributions of the input and its lower dimensional representation. There are several other non-linear methods. To name a few: Kernel PCA [38], Maximum variance unfolding (MVU) [39], Diffusion Maps [40], Hessian LLE [21], and Local tangent space analysis [41]. A common drawback of all of these methods is that they are non-parametric in nature and consequently, they cannot predict projections for test data points.

Now we shall discuss some parametric models. The study in [42] proposed a learning system based on the multi-layer perceptron (MLP) for structure preserving dimensionality reduction. In this work, Sammon's projections of the sampled input or of the input prototypes generated by a self organizing map were used as target outputs in a supervised gradient descent based learning scheme. The investigations [43], [44] explored MLP for dimensionality reduction. In [43] Sammon's stress function was used as the learning objective whereas in [44] multiple networks were proposed with different objective functions. In [23], [45] multi-layer autoencoders with an odd number of hidden layers were employed for dimensionality reduction. To reduce the original data to d_l dimensions, the middle hidden layer is chosen to have d_l nodes. After training the autoencoder, when an input x_i is applied, the d_l-dimensional representation obtained at the middle hidden layer is taken as the corresponding dimensionality reduced representation y_i. However, these models do not take into account the interrelationship between data points and the underlying manifold structure. The work in [46] proposes a scheme called "generalized autoencoder" (GAE) which incorporates concepts of manifold learning. The GAE extends the traditional autoencoder in two aspects: (i) each instance x_i reconstructs a set of instances {x_j} rather than reconstructing only itself; (ii) the reconstruction error of an instance x_i is weighted by a relational function of x_i and x_j defined on the learned manifold. GAE learns the underlying manifold by minimizing the weighted distance between the original and reconstructed ones.

Work exploring fuzzy rule-based systems to reduce the data dimensionality for visualization purposes is scarce. In [29] a supervised fuzzy rule-based model is proposed for structure preserving dimensionality reduction. Here, given the high dimensional inputs and their lower dimensional projections (generated by any method such as Sammon's projection), a fuzzy rule-based system is used to learn the projecting map. In this context both Mamdani-Assilian (MA) [47] and TS [26] models have been used.

Apart from these computational intelligence based works, there are studies that find an explicit map between higher and lower dimensional data. In [48], the authors proposed an algorithm named neighborhood preserving polynomial embedding (NPPE). The method considers the objective function of LLE as its learning objective, where the lower dimensional output vectors are represented as a polynomial function of the higher dimensional input vectors. The unknown coefficients of the polynomial function are estimated from the modified LLE objective function by solving a generalized eigenvalue problem.
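As an illustration of the distance-preserving stress functions used in this section and the next, the sketch below (not the authors' code) evaluates the pairwise stress of an embedding Y against a target distance matrix: with Euclidean input distances it corresponds, up to the normalizing constant, to Sammon's stress (1); with geodesic input distances it becomes the objective (5) used in Section III.

```python
import numpy as np

def pairwise_stress(D_high, Y):
    """Stress of an embedding Y (n, d_l) against a target distance matrix D_high (n, n).
    Each distinct pair (i < j) contributes (D_high[i,j] - ||y_i - y_j||)^2 / D_high[i,j]."""
    diff = Y[:, None, :] - Y[None, :, :]
    D_low = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances in the embedding
    iu = np.triu_indices(len(Y), k=1)           # count each pair once
    return np.sum((D_high[iu] - D_low[iu]) ** 2 / D_high[iu])
```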
III. PROPOSED METHOD

In this work, we aim to produce a lower dimensional representation of the original data in such a way that the local neighborhood structure present in the original space is preserved. We also want to transform the data globally in such a way that if the data lie on any non-linear high dimensional manifold, they would be flattened in the lower dimensional space preserving the local neighborhood structure of the input space. For this, we consider a fuzzy rule-based system as the most appropriate tool. Although an MA model could be used, considering greater flexibility we use the TS model. Since this is an unsupervised method, the main challenge is to define a suitable objective function. The second challenge is how to generate the initial rule base. Typically, we use a clustering algorithm to generate the initial rules. But when the data points are on a manifold whose intrinsic dimension is lower than the original dimension of the data, the c-means or the fuzzy c-means may find cluster centers that are significantly away from the manifold.

As mentioned earlier, the input data set is denoted by X = {x_1, x_2, ..., x_n}, x_i ∈ R^d_h. The projected output data set is denoted by Y = {y_1, y_2, ..., y_n}, y_i ∈ R^d_l, where y_i is the projection of x_i, i = 1, 2, ..., n. For the mth output variable, let there be n_c rules of the form

R_km: If x_1 is F_k1 AND x_2 is F_k2 AND ... AND x_d_h is F_kd_h then y_m = a_km0 + Σ_{q=1}^{d_h} a_kmq x_q   (3)

where k = 1, 2, ..., n_c; m = 1, 2, ..., d_l; F_kq is the kth fuzzy set (linguistic value) defined on the qth feature, and the a_kmq's are consequent parameters. Let us define the matrix A = (A_1, A_2, ..., A_nc)^T where A_k = (a_k10, a_k11, ..., a_k1d_h, a_k20, a_k21, ..., a_k2d_h, ..., a_kd_l0, a_kd_l1, ..., a_kd_ld_h). Let the firing strength of the rule R_km for the point x_i be α_k,i; k = 1, 2, ..., n_c. Note that, for an antecedent 'x_1 is F_k1 AND x_2 is F_k2 AND ... AND x_d_h is F_kd_h', there are d_l consequents, 'y_m^k = a_km0 + Σ_{q=1}^{d_h} a_kmq x_q'; m = 1, 2, ..., d_l, resulting in d_l rules. But the firing strength for each of these d_l rules remains the same (α_k,i).

y_im = Σ_{k=1}^{n_c} ( α_k,i / Σ_{p=1}^{n_c} α_p,i ) y_im^k   (4a)
     = Σ_{k=1}^{n_c} ( α_k,i / Σ_{p=1}^{n_c} α_p,i ) ( a_km0 + Σ_{q=1}^{d_h} a_kmq x_iq )   (4b)

where i = 1, 2, ..., n; m = 1, 2, ..., d_l. Typically, for designing a fuzzy rule-based system, the target outputs for the training data are known and the rule base parameters are estimated by minimizing the square error defined by the target outputs and the estimated outputs. But for us the target output is not available. So we need to define a suitable objective function that can help to learn the manifold of the input data X. One promising approach would be to preserve the geodesic neighborhood relationship, i.e., the geodesic distance structure of the manifold, in the projected lower dimension. Let gd_ij be the geodesic distance between x_i and x_j, x_i, x_j ∈ R^d_h, and ed_ij^Y be the Euclidean distance between y_i and y_j, y_i, y_j ∈ R^d_l. A good objective function to estimate the y_i's is (5).

E = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (gd_ij − ed_ij^Y)² / gd_ij   (5)

The objective (5) was introduced by Yang in [49]. Note that (5) is similar to Sammon's stress function, which uses Euclidean distance in both the high and low dimensional spaces. In place of (5) we can use other functions also. Next, we address the issue of identification of the rule base.

A. Initial Rule Extraction

When both input and output data are provided there are many ways of generating an initial rule base and refining it [29], [50]–[55]. Some of the popular methods use evolutionary algorithms [52] or clustering [29], [54], [55] of input-output data. Here we do not have the output data. So we cluster the input data into a predefined number of clusters and translate each cluster into a rule (in particular, into a rule antecedent). A natural choice for the clustering algorithm appears to be the c-means (often called k-means) or the fuzzy c-means (FCM). The proposed method being a fuzzy rule-based scheme, the obvious choice seems to be the FCM clustering algorithm. But we did not do so for two reasons. First, for the FCM, the cluster centers may not fall on the data manifold. Since our idea is to use a cluster centroid to model a set of points in the neighborhood of the cluster centroid, it is better to have the centroid on the data manifold. Second, the geodesic distance is not defined in terms of an inner-product induced norm and hence the FCM convergence theory does not hold for the geodesic distance based FCM. So we use a slightly modified c-means called the Geodesic c-Means (GCM) algorithm. Here we cluster the input, aiming to obtain some representative points on the input manifold. To extract information regarding the input manifold structure, we use the geodesic distance as the dissimilarity measure for clustering. Like the conventional c-means, data points are assigned to a cluster using the minimum distance (here minimum geodesic distance) criterion and the cluster centroids are computed as the mean vector of the points in a cluster. Since such centroids may not lie on the manifold, we use an extra step. For each computed centroid, we find the input data point closest to the centroid and use that data point as a cluster centroid. We note here that the use of geodesic distance in the c-means clustering algorithm has been investigated in other studies [56]–[58], but their approaches are different from ours. In [56] the authors introduced a class of geodesic distances which took into account local density information and employed that geodesic distance in the c-means clustering algorithm. The study in [57] incorporated geodesic distance in a soft kernel c-means algorithm. The authors in [58] integrated the geodesic distance measure into the fuzzy c-means clustering algorithm. The algorithmic description of our clustering algorithm (GCM) is given in Algorithm S-1.

To estimate the geodesic distance we have followed an approach similar to that in ISOMAP [20]. First, we construct a neighborhood graph of the input data points based on the Euclidean distance. For approximating the geodesic distance, every edge is assigned a weight/cost which is basically the Euclidean distance between the pair of points on which the edge is incident. For any point, the geodesic distances to its neighboring points are approximated by the Euclidean distance. The geodesic distance between two points which are not neighbors, i.e., not connected by an edge, is approximated by evaluating the shortest path on the neighborhood graph. For the shortest path distance estimation, we use the Floyd-Warshall algorithm [59].

The kth cluster centroid, v_k = (v_k1, v_k2, ..., v_kd_h), obtained by GCM is translated into the antecedent of the kth rule, R_km, as follows.

If x_1 is "close to v_k1" AND x_2 is "close to v_k2" AND ... AND x_d_h is "close to v_kd_h".   (6)

The fuzzy set F_kq in (3) is defined linguistically as "close to v_kq"; q = 1, 2, ..., d_h. Thus the kth rule given in (3) becomes

R_km: If x_1 is "close to v_k1" AND x_2 is "close to v_k2" AND ... AND x_d_h is "close to v_kd_h" then y_m = Σ_{q=0}^{d_h} a_kmq x_q   (7)

where x_0 = 1, k = 1, 2, ..., n_c; m = 1, 2, ..., d_l. We model 'x_q is "close to v_kq"' using a Gaussian membership function with the center at v_kq. Although any π-type membership function can be used, to exploit the differentiability property, we use Gaussian membership functions. This also makes the antecedent of the kth fuzzy rule model a hyper-ellipsoidal volume centered at v_k in the input space. For the ith point, the membership to the set "close to v_kq" is computed as

μ_kq,i = exp( −(x_iq − v_kq)² / 2σ_kq² ).   (8)

Here, σ_kq is the spread of the Gaussian centered at v_kq, which is the membership function of the fuzzy set F_kq. We use the product as the T-norm to aggregate the μ_kq,i's to calculate α_k,i:

α_k,i = Π_{q=1}^{d_h} μ_kq,i.   (9)

We denote the matrix of cluster centroids as V = (v_1, v_2, ..., v_nc)^T = [v_kq]_{n_c×d_h}. Similarly, the matrix of spreads is Σ = (σ_1, σ_2, ..., σ_nc)^T = [σ_kq]_{n_c×d_h}. An initial estimate of σ_kq can be taken as the standard deviation of the qth feature in the kth cluster or can be initialized using some other method. The set of consequent parameters A is initialized randomly. Note that these are initial choices which will be refined during the training.

B. The Objective Function and its Optimization

To learn the lower dimensional representations Y of the given input X, the error function defined in (5) is minimized with respect to the rule base parameters V, Σ, and A. Using (8) and (9) in (4b) we obtain the following relation.

y_im = Σ_{k=1}^{n_c} [ Π_{q=1}^{d_h} exp( −(x_iq − v_kq)² / 2σ_kq² ) · Σ_{q=0}^{d_h} a_kmq x_iq ] / [ Σ_{p=1}^{n_c} Π_{q=1}^{d_h} exp( −(x_iq − v_pq)² / 2σ_pq² ) ]   (10)

Here, x_i0 = 1. Now, y_i = (y_i1, y_i2, ..., y_id_l). Hence y_i depends on v_kq, σ_kq, and a_kmq, ∀ k = 1 to n_c, q = 1 to d_h and m = 1 to d_l, which form V, Σ and A, respectively. It is evident that we can write the lower dimensional vectors y_i as a function of the rule base parameters V, Σ, and A:

y_i = f(V, Σ, A; x_i).   (11)

Replacing ed_ij^Y in (5) by ||y_i − y_j|| and then using (11) we get

E = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (gd_ij − ||y_i − y_j||)² / gd_ij = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (gd_ij − ||f(V,Σ,A;x_i) − f(V,Σ,A;x_j)||)² / gd_ij.   (12)

The optimum values of V, Σ, and A are obtained as

(V_opt, Σ_opt, A_opt) = arg min_{V, Σ, A} (E).   (13)

To minimize the error function E in (13), a momentum based gradient descent approach is used here. The overall process is summarized in Algorithm 1.
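Step 1 of Algorithm 1 requires the pairwise geodesic distances. A sketch of the neighborhood-graph approximation described in Section III-A is given below; it substitutes standard scikit-learn/SciPy routines for the procedure in the paper (Euclidean k-NN graph followed by all-pairs shortest paths, here with SciPy's Floyd-Warshall option), so it should be read as an assumption-laden stand-in rather than the authors' code.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, n_neighbors):
    """Approximate geodesic distances: Euclidean k-NN graph + shortest paths on it."""
    G = kneighbors_graph(X, n_neighbors, mode='distance')   # edges weighted by Euclidean distance
    gd = shortest_path(G, method='FW', directed=False)      # gd[i, j] ~ geodesic distance
    if np.isinf(gd).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    return gd
```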

Algorithm 1: Unsupervised Fuzzy Rule-based Dimensionality Reduction

1. Determine the geodesic distance gd_ij between every pair of input data points in X.
2. Cluster the input data X into n_c clusters using the geodesic distance based c-means clustering algorithm (Algorithm S-1). Let the obtained cluster centroids be v_1, v_2, ..., v_nc; V = (v_1, v_2, ..., v_nc)^T.
3. Translate each cluster (v_k) into a rule as explained in (7) and (8). Initialize the centers of the Gaussian membership functions (described in (8)) using V and the spreads by Σ. Here we use a predefined initial value for σ_kq ∀ k, q. The consequent parameters A are initialized randomly.
4. Initialize the iteration count t = 1. ∆V^t, ∆Σ^t, and ∆A^t are the updates at time step t corresponding to the centers, spreads, and consequent parameters. ∆V^0, ∆Σ^0, and ∆A^0 are set to zeros.
5. while t < MaximumIteration do
6.   Differentiate the objective function E in (5) with respect to V, Σ and A to obtain ∇_V E, ∇_Σ E and ∇_A E. Calculate ∆V^t, ∆Σ^t, ∆A^t and update V^t, Σ^t and A^t as follows:
        ∆V^t = −∇_V E + α ∆V^{t−1};   ∆Σ^t = −∇_Σ E + α ∆Σ^{t−1};   ∆A^t = −∇_A E + α ∆A^{t−1}   (14)
        V^t = V^{t−1} + η ∆V^t;   Σ^t = Σ^{t−1} + η ∆Σ^t;   A^t = A^{t−1} + η ∆A^t   (15)
7.   α > 0 and η > 0 are the momentum and learning coefficients, respectively.
8. end
9. return the outputs Y and the fuzzy rule base parameters V, Σ, A.
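The momentum updates (14)-(15) can be written compactly as below. This is a minimal NumPy sketch under the assumption that the gradients of E have already been computed (analytically or by automatic differentiation, as in the TensorFlow implementation described in Section IV-B); the default eta and alpha are the learning-rate and momentum values reported there.

```python
import numpy as np

def momentum_step(params, grads, velocity, eta=0.1, alpha=0.5):
    """One update of (14)-(15) on a dict of parameter arrays, e.g. {'V':..., 'S':..., 'A':...}.
    `velocity` should be initialized to zero arrays, as in step 4 of Algorithm 1."""
    for name in params:
        velocity[name] = -grads[name] + alpha * velocity[name]   # eq. (14)
        params[name] = params[name] + eta * velocity[name]       # eq. (15)
    return params, velocity
```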

C. Scalability Analysis

Scalability is a desirable property for every scheme concerning both the data size (n) and the dimension (d_h). The error function (5) of our method considers all the inter-point distances, i.e., a total of n(n−1)/2 distances. In each iteration, n(n−1)/2 values need to be computed to update the rule parameters. So for a total of t iterations, the computational complexity of the proposed method is O(n²t) for a fixed input dimension. In Table I, we have compared the computational complexity of the proposed method with four standard dimensionality reduction based data visualization schemes: Sammon's projection, ISOMAP, LLE, and t-SNE. We have also used these four schemes in our experiments. Note that LLE and ISOMAP perform convex optimization whereas Sammon's projection, t-SNE, as well as our proposed method perform non-convex optimization with iterative steps. Table I demonstrates that our proposed method has the same computational complexity as the other two iterative methods. In terms of space requirement, like most of the other methods, our method also exhibits a complexity of O(n²). Since both the complexities are quadratic functions of the number of data points, for a very large n, it would be difficult to use the proposed method directly. Use of GPUs can reduce the effective complexities significantly. Moreover, a proper sampling scheme [42], [60] that produces a subset of the original data which adequately represents the original data distribution may be used.

TABLE I: Computational complexity of various methods

Method                      Computational    Space
Proposed Method             O(n²t)           O(n²)
Sammon's Projection [4]     O(n²t)           O(n²)
ISOMAP [4]                  O(n³)            O(n²)
LLE [4]                     O(pn²)           O(pn²)
t-SNE [24]                  O(n²t)           O(n²)

n = number of instances, t = total number of iterations, p = ratio of nonzero to total number of elements in the sparse matrix.

It is also important to understand the dependency of an algorithm on the dimension of the input feature vectors (d_h). In our method, the number of rule parameters varies linearly with the input dimension. So both the computational and space complexity scale up linearly with the number of input features. The rule firing strength α_k,i is calculated by combining the antecedent clauses (memberships) using the product. The number of atomic antecedent clauses in a rule equals the number of input features. For data sets having a large number of features, it may be possible that for many features the memberships are sufficiently low for all the rules. Since these membership values are multiplied while calculating the firing strengths, for some inputs no rule may fire with significant strength, creating an undesired situation. So effectively any fuzzy rule-based system involving a large number of features (say, 10,000 or more) encounters numerical difficulties.

IV. EXPERIMENTATION AND RESULTS

A. Data Set Description

For our experiments, we consider three synthetic data sets and three real-world data sets. These data sets are used by others as benchmark data sets. The synthetic data are Swiss Roll, S Curve, and Helix as shown in Figs. 1a, S-2a (figures in the Supplementary Materials are numbered as S-1, S-2, and so on), and 2a, respectively. They are all three dimensional data but contained completely within a two dimensional space. Each of these synthetic data sets consists of 2000 points. The first two data sets are generated using the 'scikit-learn' package [61] (version 0.19.2) of Python. The Helix data set is generated by the equations: z = t/√2; x = cos z; and y = sin z. The variable t is varied from −20 to 20 with steps of 0.02. The real-world data sets considered are Frey face [62], COIL [63], and USPS handwritten digits [62].
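The three synthetic sets can be generated as sketched below, following the description above; the random seed and any further scaling are not specified in the text, so they are left at their defaults.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll, make_s_curve

# Swiss Roll and S Curve, 2000 points each, via scikit-learn as stated in the paper
X_swiss, _ = make_swiss_roll(n_samples=2000)
X_scurve, _ = make_s_curve(n_samples=2000)

# Helix: z = t/sqrt(2), x = cos z, y = sin z, with t swept from -20 to 20 in steps of 0.02
t = np.arange(-20.0, 20.0, 0.02)          # 2000 samples
z = t / np.sqrt(2.0)
X_helix = np.column_stack([np.cos(z), np.sin(z), z])
```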

Fig. 1: Visualization of the Swiss Roll data with (a) original input space, (b) proposed method, (c) Sammon's projection, (d) ISOMAP, (e) LLE, and (f) t-SNE.

Fig. 2: Visualization of the Helix data with (a) original input space, (b) proposed method, (c) Sammon's projection, (d) ISOMAP, (e) LLE, and (f) t-SNE.

Frey face consists of 1965 gray scale images, each of size 20×28 pixels, i.e., the input dimension is 560. These are images of the face of a single person with different face orientations and facial emotions. The COIL data set consists of images of 20 different objects viewed from 72 equally spaced angles. We have considered the first object of the COIL data set. Each image is of resolution 32×32. Thus, the input dimension is 1024 and the number of instances is 72. The USPS handwritten digits data set contains 1100 images for each of the handwritten digits between 0 and 9. The images are of dimension 16×16, resulting in inputs of dimension 256. We have considered the images of 0 only. We choose these data sets as they are commonly used in manifold learning studies [18], [19], [24], [48].

B. Experiment Settings

We compare the results obtained by the proposed method on the above mentioned data sets with four other data visualization methods: Sammon's projection, ISOMAP, LLE, and t-SNE. When testing the predictability of the system, the proposed method has to be compared with methods equipped with predictability. For that, we experiment on one synthetic data set (Swiss Roll) and one real-world data set (Frey face) using the proposed method and four other suitable methods, namely, out of sample extensions of ISOMAP and LLE [25], autoencoder [23], and simplified NPPE (SNPPE) [48]. For all the synthetic data sets we apply the proposed method on the data set directly (without any processing). For image data sets, pixel values are divided by 255 to have their values in [0, 1]. Following [48] we choose the number of nearest neighbors, ε = 1% of the training samples (rounded to the nearest integer). For the first object of the COIL data set (we shall refer to it as COIL), the number of instances is 72. So we use ε = 5 instead of ε = 1 to construct a reasonable graph. To choose the number of fuzzy rules, n_c, we perform an experiment. Details of the experiment and its results are included in Section S-I of the Supplementary Materials. From the experiment, we decide to set n_c = 1% of the training samples (rounded to the nearest integer). Here also, for the COIL data set, to avoid under-fitting, we choose n_c = 5 instead of n_c = 1. For the synthetic data sets, we initialize the spreads of the Gaussian memberships, the σs, in two different ways: 0.2 times and 0.3 times the feature-specific range. For high dimensional real data sets, we need to choose higher values of the σs to avoid generating several zero rule-firing cases. For the 0s of the USPS data set (we shall refer to it as USPS) and Frey face we initialize the σs as 0.4 times the feature-specific range. For the COIL data set, the σs are initialized as 0.5 times the feature-specific range. In all cases, the consequent parameters are initialized with random values in (−0.5, 0.5).

For both synthetic and real-world data sets, we repeat the experiments five times for each choice of the spread. For the three synthetic data sets, there are two initial choices for the spread. So ten runs are conducted, giving ten rule-based systems for each synthetic data set. On the other hand, for the three real-world data sets, we initialize the spread in a single way. So a total of five runs are conducted, giving five rule-based systems. Then, based on the minimum value of the objective function in (5), we choose the best rule-based system for a data set. We do not look at any test data, and the best result can always be chosen in an automatic manner without any human intervention based on (5), as this gives the training error.

To optimize the objective function (5) we use an optimizer from a standard machine learning framework, 'TensorFlow' [64]. We use stochastic gradient descent with momentum to search for an optimal solution of the given objective. We implement this with the help of the 'train.MomentumOptimizer' class of 'TensorFlow'. We set the values of the learning rate and the momentum to 0.1 and 0.5, respectively. For the four comparing data visualization methods, we apply the synthetic data sets directly. For real-world data sets, as in the case of the proposed method, we perform feature-wise zero-one normalization. For computing Sammon's projection, we use the MATLAB implementation of multi-dimensional scaling [8], mdscale, with the parameter 'Criterion' set to 'sammon'. The dimension of the output is set to two. For the other parameters, the default values are used. The other three methods are implemented using the 'scikit-learn' (sklearn) package [61] (version 0.19.2) of Python. The classes 'manifold.Isomap', 'manifold.LocallyLinearEmbedding', and 'manifold.TSNE' of 'sklearn' are used to implement ISOMAP, LLE, and t-SNE, respectively. As mentioned in Section II, these three methods preserve the local structure, for which they consider a neighborhood. For comparing the results we use the same neighborhood size for all three methods and the proposed method. For LLE and ISOMAP, for every data set, we use the same neighborhood size (number of nearest neighbors) as used for the proposed method. For t-SNE, there is no provision for setting the neighborhood size directly. The 'perplexity' [24] parameter of t-SNE provides an indirect measure of the effective number of neighbors. So we set the 'perplexity' parameter to the number of nearest neighbors used in the proposed method. For these three methods, all the other parameters are kept at their default values with respect to the routines we use to implement them. For predictability testing, except for SNPPE, the other three comparing methods are executed using the Matlab Toolbox for Dimensionality Reduction (drtoolbox) by Laurens van der Maaten [65]. The results of SNPPE are generated by the code provided by the authors of SNPPE. In these cases also, the neighborhood size, when required, is set to the value used in the proposed method. All the other parameters are kept at their default values.
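The scikit-learn baselines with the shared neighborhood size (and the perplexity substitution for t-SNE) described above can be set up as sketched below. The Swiss Roll stand-in for the training set and the exact library version are our assumptions; other parameters are left at their defaults, as in the paper.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

X_train, _ = make_swiss_roll(n_samples=2000)         # stand-in training set
n_neighbors = max(1, round(0.01 * len(X_train)))     # epsilon = 1% of the training samples

Y_iso = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(X_train)
Y_lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2).fit_transform(X_train)
Y_tsne = TSNE(n_components=2, perplexity=n_neighbors).fit_transform(X_train)  # perplexity ~ neighborhood size
```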
C. Results and Comparisons

From Figs. 1b, S-2b, and 2b we can observe that the proposed method successfully unfolds the non-linear structures present in the Swiss Roll, S Curve, and Helix data, respectively, to a nearly linear structure. Employing the geodesic distance as the pairwise input distance in Sammon's stress, the proposed method has produced a significant departure from the result of Sammon's projection. For all three data sets, Sammon's projection cannot unfold the original data to its intrinsic linear structure, but it tries to preserve the global geometric structure. ISOMAP tries to preserve the pairwise distances over the manifold by using the geodesic distance in the classical multidimensional scaling error function [20]. Preserving geodesic distance, ISOMAP is able to unfold the non-linear structure as shown in Figs. 1d and S-2d. But its drawback is that large distances play a stronger role compared to small distances in its objective. As a result, in the case of the Helix data set, as seen in Fig. 2d, it cannot unfold the data as desired. Both LLE and t-SNE try to capture the local structure present in the original data in the generated lower dimensional space. Figures 1e, S-2e, and 2e reveal that LLE manages to unfold the underlying structure to some extent, but the quality of unfolding is far from that of the proposed method and ISOMAP. Similarly, t-SNE, although it has the same aim, is not able to unfold the data sets properly, as revealed by Figs. 1f, S-2f and 2f. In summary, of the five methods, only the fuzzy rule-based method is successful for all three data sets, and the next best performing method is ISOMAP.

The proposed method can also faithfully project the real-world data sets in two dimensions. In Fig. 3, the two dimensional embedding of the Frey face data set is shown. Figure 3 shows that the gradual change in the face orientation from right to left is roughly mapped adjacently on the 2D space along with the increment of the extracted feature x0.

Fig. 3: Visualization of the Frey face data set with the proposed method.

Similarly, the change in expressed emotion varies roughly with the extracted feature y0. Smiling, neutral, and annoyed faces are mapped in the top, middle, and bottom regions, respectively. The results of the four comparing methods are placed in the Supplementary Materials. Sammon's projection, being a Euclidean distance preserving method, has placed the faces expressing different emotions in comparatively overlapping regions, as seen in Fig. S-3. The other neighborhood preserving methods have successfully unfolded the high dimensional manifold. Comparing the results in Figs. S-4, S-5, and S-6 with the result in Fig. 3, we can infer that the proposed method has mapped the variations of the Frey face data set in a comparatively more consistent and expected manner. Figure S-7 shows the results of the two dimensional embedding of instances of the COIL data set. For this data set also, we got the expected outcome from the proposed method (Fig. S-7a) as well as from ISOMAP (Fig. S-7c) and t-SNE (Fig. S-7e). However, the results obtained from Sammon's projection and LLE, shown in Figs. S-7b and S-7d, cannot recover the manifold faithfully in the two dimensional space. Fig. S-8 shows the visualization of the handwritten character 0s of the USPS data set. In this case also, the proposed method successfully projects zeros with different orientations and line widths in different regions in a continuous manner. The projections by the other four methods, as shown in Figs. S-9, S-10, S-11, and S-12, reveal that LLE (Fig. S-11) and t-SNE (Fig. S-12) clearly failed to represent the data in a way such that the orientations and line widths of the character vary smoothly in the projected space.

D. Impact of Initial Rules

The proposed method initializes the rule antecedents using the centroids of the clusters obtained by Algorithm S-1. These centroids are used as the centers of the Gaussian membership functions of the fuzzy sets which compose the rule antecedents. To show the impact of the initial rules, we have initialized the rule antecedents using uniformly distributed random values generated from the smallest hyperbox containing the training data. The obtained projection for the Swiss Roll data is shown in Fig. S-13b. Figure S-13a shows the result obtained by the proposed method with rule antecedents defined by the cluster centroids. From Figs. S-13a and S-13b it is evident that, with or without judicious initialization of the rule antecedents, the proposed method can unfold the non-linear structure satisfactorily. This observation is also true in the case of the Frey face data set. Fig. S-14 shows the visualization of the Frey face data set using the proposed method with random antecedents initialized as mentioned above. To investigate further, we have computed the average of the final error values over the ten/five runs (performed as explained in sub-section IV-B) for the two considered data sets with both types of rule antecedent initializations. The error referred to here is the error function in equation (5). Table II shows that the averages of the final error values are lower in the case of initialization of the rule antecedents by cluster centroids compared to random rule antecedents for both data sets. This suggests that although in both cases the algorithm lands up in useful local minima, for the cluster-based initial rules, generally the minima are superior.

TABLE II: Average of the final error values for different In Figs. 4d, S-16b, S-17b, S-19b, and S-20b color assignment initializations to the projected test points by the label vector shows that the test data set is unfolded in the desired manner in all Data Set Initial rule antecedents defined by Cluster centroids Random five cases. For the Frey face data set, we have selected 15 Swiss Roll 0.04740 0.06490 random instances as the test set and rest of the points as Frey face 0.50459 0.50500 the training set. Figure S-21 illustrates the projected output by the proposed method for training and test data together where the images with black rectangular borders correspond We have also done some experiments with rules which are to the test instances. For all comparing methods on this data initialized with random values in [0,1] hypercube. We have we follow the same representation of the training and test data. done 10 such experiments on the Swiss Roll data. The results In Figs. S-21, S-22, S-23 we can see that the test instances are shown in Fig. S-15. Even in this case, the results are are projected to the regions where the training instances have reasonably good, but not as good as the cases with judiciously the same/similar facial expression and orientation. Thus, for initialized rules. In a few cases, the system could not unfold. high dimensional image data sets also the proposed method, Thus the system is quite robust with respect to the initial as well as LLE and ISOMAP, are successful in prediction. The choice of rules, but certainly cluster-based rules help. autoencoder based lower dimensional projection, shown in fig. S-24 clustered different facial expressions and different face E. Validation of Predictability orientations in different regions. However, autoencoder based method is unable to unfold the intrinsic structure properly as One of the key advantages of utilizing a fuzzy rule-based in some cases it placed the opposite face orientations (left and system is that it is parametric. The antecedent and consequent right) closer to each other compared to the faces in straight parameters of the TS model are learned from the given data. orientations. A similar placement has occurred for opposite Consequently, we get an explicit mapping function that maps expressions (e.g. happy and annoyed) and neutral expressions. the high dimensional inputs on a lower dimensional one. So, From Figs. S-25, S-26 it is evident that SNPPE based methods after the system is trained it is possible to predict the lower neither unfolded the higher dimensional data nor predicted the dimensional embedding for new test points. As mentioned ear- positions of the test points in a nice way. lier, to compare the predictability performance of the proposed The second test is also done with the Swiss Roll data set. A method we consider four methods: out of sample extension contiguous portion of the data set consisting of fifty points has of ISOMAP and LLE [25] and two parametric methods: been removed and the rest 1950 points serve as the training autoencoder [23] and SNPPE [48]. Note that SNPPE involves data set as seen in Fig. 5a. The removed portion serves as the a polynomial mapping. Following [48] we use 2nd order and test set. Figure 5c displays the output of the proposed method 3rd order polynomial based SNPPE. corresponding to the training set. Figure 5d shows the test To validate the predictability of the system, we have designed set outputs of the proposed method along with the training three tests: set. 
The test set points are demarcated with plus (+) symbol. 1) Randomly dividing the data set into training and test For the comparing methods we represent the results in the sets. same manner. Figures S-27a, S-28a, S-29a, S-30a, and S-31a 2) Using a contiguous portion of the manifold as the test show the training set projections in lower dimensional space set and rest as the training set. for ISOMAP, LLE, autoencoder, 2nd order SNPPE, and 3rd 3) Leave one out validation. order SNPPE respectively. Similarly, Figs. S-27b, S-28b, S- We have performed the first test on the Swiss Roll and 29b, S-30b, and S-31b show the test set outputs (demarcated Frey face data sets. We use 75% of the Swiss Roll data with plus (+) symbol) along with the training set for ISOMAP, set, i.e., 1500 randomly selected points to from the training LLE, autoencoder, 2nd order SNPPE, and 3rd order SNPPE set and rest 25%, i.e., 500 points to form the test set. The respectively. The results show that the proposed method, as training and test data sets are shown in Fig. 4a and Fig. 4b well as the comparing methods except for the autoencoder, respectively. While partitioning the data set, the associated have appropriately interpolated the test points. label vectors are also partitioned accordingly. Figure 4c and The third test uses the leave one out validation on the COIL Fig. 4d are the results corresponding to the training and data set. Out of the 72 instances, each one is considered as test data sets by the proposed method. From Fig. 4d it is the test instance and rest 71 instances are used as the training evident the predicted locations determined by the proposed set. So, for 72 such training set-test sets outputs are generated. method for the test points are at their expected positions in Results of six such tests are displayed in the Supplementary the lower dimension. Figs. S-16, S-17, S-18, S-19, and S-20 Materials. By choosing every twelfth instance in the given show training and test set representation in lower dimension sequence of 72 instances as test instance, six partitions are by ISOMAP, LLE, autoencoder, 2nd order SNPPE, and 3rd generated. We name them Set 1, Set 2 and so on upto Set 6. order SNPPE respectively. Except for the autoencoder, the Test instances used in Set 1 to Set 6 are instance number 1, 13, other comparing methods successfully unfold the Swiss Roll 25, 37, 49 and 51 respectively. For every set all the other data structure in the training phase and appropriately predict the test points except the corresponding test point form the training points. From Fig. S-18 we observe that the autoencoder based set. Results corresponding to these six test sets are shown in method is unable to preserve the geometry of the training data. Figs. S-32. The training points are demarcated in green and 11

Fig. 4: For experiment 1 on validation of predictability with the Swiss Roll data: (a) Training Set, (b) Test Set, (c) Proposed method output for Training Set, and (d) Proposed method output for Test Set.

Fig. 5: For experiment 2 on validation of predictability with the Swiss Roll data: (a) Training Set, (b) Training Set and Test Set, (c) Proposed method output for Training Set, and (d) Proposed method outputs for Training Set and Test Set.

the test point is demarcated in red. Instead of showing the images of the object, the consecutive change in the viewing angle of the images is represented by consecutive numbers. In most of the cases, it is found that the test object is placed in the desired position, i.e., in between points corresponding to input images captured within plus/minus five degrees of the viewing angle of the test point.

for the training and test data shown in Fig. S-33a without exercising the rejection option. It is clear that the prediction by the proposed method for the test points (red circles) is inappropriate and should be discarded. It is worthy to mention that the other dimensionality reduction schemes which have predictability, such as the methods mentioned in the previous sub-section, do not have any provision for rejecting outputs.

F. Rejection of Outputs G. Generalized Nature of the Proposed Model The proposed method is capable of identifying the test cases The proposed model gives a general framework for learning an where the system may not produce reliable outputs. In a TS unsupervised fuzzy rule-based system for manifold learning or type fuzzy rule-based system, the output is computed by (4). data projection. Instead of using the objective function in (5), The output is a convex sum of the individual rule outputs. we can use different objectives also. For example, the objective Individual rule outputs are multiplied by the corresponding function of Sammon’s projection (1) can be used. Figure S- firing strengths. The rule having the maximum firing strength 34 shows the Swiss Roll data and its corresponding lower mainly characterizes the output. Essentially for test points far dimensional output using the proposed method with (1) as the away from the training data, no fuzzy rule is expected to fire objective function for learning. Similarly, Figs. S-35 and S- strongly. Consequently, for those points, the outputs obtained 36 depict the output produced by the fuzzy rule-based system may not be reliable. Figure S-33a shows a set of training using (1) as the objective function for the S Curve and Helix points and test points. The training points are demonstrated data, respectively. In both cases the projection reveals what with blue markers and test points with red markers. Test points Sammon’s method is expected to do. For these experiments, are located away from the training points. The histogram of the the other settings are kept the same as described in subsection maximum rule firing strengths is shown in Fig. S-33b. Here IV-B. Note that, this system can predict the projections for also training and test firing strengths are indicated with blue test points and that is an advantage over the usual Sammon’s and red colours respectively. It is clear from Fig. S-33b that the method. We also note that unlike the fuzzy system in [29], maximum rule firing strengths of the test points are much less the proposed system does not require any target output for compared to the maximum rule firings of the training points. learning. This information can be used to decide when the system should reject its output. For example, if the maximum firing strength V. CONCLUSION is less than 0.15, the system output can be rejected. Note In this paper, we have proposed a framework for designing that this threshold can be decided based on the training data a Takagi Sugeno fuzzy rule-based system to reduce the di- only. In Fig. S-33c we have depicted the projected outputs mensionality of data for visualization purposes. The proposed 12 system can learn a manifold to make a lower dimensional [2] D. Meng, Y. Leung, T. Fung, and Z. Xu, “Nonlinear dimensionality representation. We have also proposed an appropriate scheme reduction of data lying on the multicluster manifold,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, for learning the parameters of the system. Fuzzy rule-based pp. 1111–1122, 2008. systems are efficient for various machine learning tasks but [3] A. Talwalkar, S. Kumar, and H. Rowley, “Large-scale manifold learn- unfortunately, almost unexplored for manifold learning or data ing,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1–8, IEEE, 2008. visualization purposes. Here, the rule base is designed in such [4] L. Van Der Maaten, E. Postma, and J. 
V. CONCLUSION

In this paper, we have proposed a framework for designing a Takagi-Sugeno fuzzy rule-based system to reduce the dimensionality of data for visualization purposes. The proposed system can learn a manifold to make a lower dimensional representation. We have also proposed an appropriate scheme for learning the parameters of the system. Fuzzy rule-based systems are efficient for various machine learning tasks but, unfortunately, almost unexplored for manifold learning or data visualization purposes. Here, the rule base is designed in such a manner that the geodesic distance structure is preserved as the Euclidean distance in the projected space. We have tested our system on three synthetic and three high dimensional real-world data sets (image data sets) that are popular for benchmarking manifold learning algorithms. The results show that our approach can successfully unfold the high dimensional data sets in a low dimensional space. For both synthetic and real-world data sets, it provides good visual outputs. We have compared the performance of the proposed system with Sammon's method, ISOMAP, LLE, and t-SNE. The proposed fuzzy rule-based system is found to perform either better or comparably. An additional advantage of the proposed scheme is that it is a parametric system and, consequently, an explicit mapping for projecting high dimensional data into a lower dimensional space is obtained. Using this mapping, outputs for new points can be predicted, i.e., the underlying system is equipped with predictability. The predictability of the proposed model is validated using three types of experiments involving both synthetic and real-world data sets. If a test data point is located far from the training set, a machine learning system should not produce any output. Most machine learning systems do not have this capability of rejection; our proposed system can reject an output when it should. One of the drawbacks of the proposed scheme, like many other manifold learning schemes, is that it is sensitive to the number of neighbors used to construct the neighborhood graph. In our study, the number of fuzzy rules has been chosen manually. The choice of the number of fuzzy rules is related to the number of clusters we choose for clustering the training data. We did not explore the possibilities of using cluster validity indices or other schemes to decide on the optimal number of clusters, as this was not the primary objective of this study. Moreover, a cluster validity index helps to find the 'optimal' number of clusters in the pattern recognition sense; but here, even if the data do not have any clusters in the pattern recognition sense, we can use clustering to find the initial rule base. We have restricted our work to data sets lying on a single manifold. Through neighborhood graph construction we can only approximate the geodesic distance over a single manifold or overlapping manifolds. If the data lie on two (or more) widely separated manifolds, using a reasonable number of nearest neighbors we are not likely to get a connected neighborhood graph. We have not proposed any technique to approximate geodesic distances over a multi-component neighborhood graph.
REFERENCES

[1] A. M. Álvarez-Meza, J. A. Lee, M. Verleysen, and G. Castellanos-Dominguez, "Kernel-based dimensionality reduction using Renyi's α-entropy measures of similarity," Neurocomputing, vol. 222, pp. 36–46, 2017.
[2] D. Meng, Y. Leung, T. Fung, and Z. Xu, "Nonlinear dimensionality reduction of data lying on the multicluster manifold," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, pp. 1111–1122, 2008.
[3] A. Talwalkar, S. Kumar, and H. Rowley, "Large-scale manifold learning," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1–8, IEEE, 2008.
[4] L. Van Der Maaten, E. Postma, and J. Van den Herik, "Dimensionality reduction: a comparative review," J Mach Learn Res, vol. 10, pp. 66–71, 2009.
[5] Y. Wang, K. Feng, X. Chu, J. Zhang, C.-W. Fu, M. Sedlmair, X. Yu, and B. Chen, "A perception-driven approach to supervised dimensionality reduction for visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 5, pp. 1828–1840, 2018.
[6] H. Chernoff, "The use of faces to represent points in k-dimensional space graphically," Journal of the American Statistical Association, vol. 68, no. 342, pp. 361–368, 1973.
[7] R. O. Duda, D. G. Stork, and P. E. Hart, "Pattern classification," 2001.
[8] I. Borg and P. Groenen, "Modern multidimensional scaling: theory and applications," Journal of Educational Measurement, vol. 40, no. 3, pp. 277–280, 2003.
[9] L. Cayton, "Algorithms for manifold learning," Univ. of California at San Diego Tech. Rep, vol. 12, no. 1-17, p. 1, 2005.
[10] H. Hotelling, "Relations between two sets of variates," Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
[11] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
[12] H. Akaike, "Factor analysis and AIC," in Selected Papers of Hirotugu Akaike, pp. 371–386, Springer, 1987.
[13] X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems, pp. 153–160, 2004.
[14] E. Tuv, A. Borisov, G. Runger, and K. Torkkola, "Feature selection with ensembles, artificial variables, and redundancy elimination," Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1341–1366, 2009.
[15] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.
[16] K. Nag and N. R. Pal, "A multiobjective genetic programming-based ensemble for simultaneous feature selection and classification," IEEE Transactions on Cybernetics, vol. 46, no. 2, pp. 499–510, 2016.
[17] I.-F. Chung, Y.-C. Chen, and N. R. Pal, "Feature selection with controlled redundancy in a fuzzy rule based framework," IEEE Transactions on Fuzzy Systems, vol. 26, no. 2, pp. 734–748, 2018.
[18] J. W. Sammon, "A nonlinear mapping for data structure analysis," IEEE Transactions on Computers, vol. 100, no. 5, pp. 401–409, 1969.
[19] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[20] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[21] D. L. Donoho and C. Grimes, "Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data," Proceedings of the National Academy of Sciences, vol. 100, no. 10, pp. 5591–5596, 2003.
[22] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[23] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[24] L. v. d. Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[25] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. L. Roux, and M. Ouimet, "Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering," in Advances in Neural Information Processing Systems, pp. 177–184, 2004.
[26] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Transactions on Systems, Man, and Cybernetics, no. 1, pp. 116–132, 1985.
[27] S. K. Ng and H. J. Chizeck, "Fuzzy model identification for classification of gait events in paraplegics," IEEE Transactions on Fuzzy Systems, vol. 5, no. 4, pp. 536–544, 1997.
[28] J. C. Bezdek, R. Ehrlich, and W. Full, "FCM: The fuzzy c-means clustering algorithm," Computers & Geosciences, vol. 10, no. 2-3, pp. 191–203, 1984.
[29] N. R. Pal, V. K. Eluri, and G. K. Mandal, "Fuzzy logic approaches to structure preserving dimensionality reduction," IEEE Transactions on Fuzzy Systems, vol. 10, no. 3, pp. 277–286, 2002.

[30] S. Das and N. R. Pal, "An unsupervised fuzzy rule-based method for structure preserving dimensionality reduction with prediction ability," in IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 413–424, Springer, 2019.
[31] J. P. Cunningham and Z. Ghahramani, "Linear dimensionality reduction: Survey, insights, and generalizations," The Journal of Machine Learning Research, vol. 16, no. 1, pp. 2859–2900, 2015.
[32] P. Switzer and A. Green, "Min/max autocorrelation factors for multivariate spatial imagery: Technical report 6," 01 1984.
[33] R. Larsen, "Decomposition using maximum autocorrelation factors," Journal of Chemometrics: A Journal of the Chemometrics Society, vol. 16, no. 8-10, pp. 427–435, 2002.
[34] X. He, D. Cai, S. Yan, and H.-J. Zhang, "Neighborhood preserving embedding," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2, pp. 1208–1213, IEEE, 2005.
[35] P. Demartines and J. Hérault, "Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 148–154, 1997.
[36] J. A. Lee, A. Lendasse, N. Donckers, M. Verleysen, et al., "A robust non-linear projection method," in ESANN, pp. 13–20, 2000.
[37] J. M. Lee, Introduction to Smooth Manifolds. Springer, 2001.
[38] B. Schölkopf, A. Smola, and K.-R. Müller, "Kernel principal component analysis," in International Conference on Artificial Neural Networks, pp. 583–588, Springer, 1997.
[39] K. Q. Weinberger and L. K. Saul, "An introduction to nonlinear dimensionality reduction by maximum variance unfolding," in AAAI, vol. 6, pp. 1683–1686, 2006.
[40] R. R. Coifman and S. Lafon, "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, no. 1, pp. 5–30, 2006.
[41] Z. Zhang and H. Zha, "Principal manifolds and nonlinear dimensionality reduction via tangent space alignment," SIAM Journal on Scientific Computing, vol. 26, no. 1, pp. 313–338, 2004.
[42] N. R. Pal and V. K. Eluri, "Two efficient connectionist schemes for structure preserving dimensionality reduction," IEEE Transactions on Neural Networks, vol. 9, no. 6, pp. 1142–1154, 1998.
[43] A. K. Jain and J. Mao, "Artificial neural network for nonlinear projection of multivariate data," in Neural Networks, 1992. IJCNN., International Joint Conference on, vol. 3, pp. 335–340, IEEE, 1992.
[44] J. Mao and A. K. Jain, "Artificial neural networks for feature extraction and multivariate data projection," IEEE Transactions on Neural Networks, vol. 6, no. 2, pp. 296–317, 1995.
[45] D. DeMers and G. W. Cottrell, "Non-linear dimensionality reduction," in Advances in Neural Information Processing Systems, pp. 580–587, 1993.
[46] W. Wang, Y. Huang, Y. Wang, and L. Wang, "Generalized autoencoder: A neural network framework for dimensionality reduction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 490–497, 2014.
[47] E. H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," International Journal of Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.
[48] H. Qiao, P. Zhang, D. Wang, and B. Zhang, "An explicit nonlinear mapping for manifold learning," IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 51–63, 2012.
[49] L. Yang, "Sammon's nonlinear mapping using geodesic distances," in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 2, pp. 303–306, IEEE, 2004.
[50] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, p. 7, 1993.
[51] L.-X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414–1427, 1992.
[52] O. Cordón and F. Herrera, "A two-stage evolutionary process for designing TSK fuzzy rule-based systems," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 6, pp. 703–715, 1999.
[53] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: survey in soft computing framework," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 748–768, 2000.
[54] N. R. Pal and S. Saha, "Simultaneous structure identification and fuzzy rule generation for Takagi–Sugeno models," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 6, pp. 1626–1638, 2008.
[55] Y.-C. Chen, N. R. Pal, and I.-F. Chung, "An integrated mechanism for feature selection and fuzzy rule extraction for classification," IEEE Transactions on Fuzzy Systems, vol. 20, no. 4, pp. 683–698, 2012.
[56] N. Asgharbeygi and A. Maleki, "Geodesic k-means clustering," in 2008 19th International Conference on Pattern Recognition, pp. 1–4, IEEE, 2008.
[57] J. Kim, K.-H. Shim, and S. Choi, "Soft geodesic kernel k-means," in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, vol. 2, pp. II-429, IEEE, 2007.
[58] B. Feil and J. Abonyi, "Geodesic distance based fuzzy clustering," in Soft Computing in Industrial Applications, pp. 50–59, Springer, 2007.
[59] R. W. Floyd, "Algorithm 97: shortest path," Communications of the ACM, vol. 5, no. 6, p. 345, 1962.
[60] N. R. Pal and J. C. Bezdek, "Complexity reduction for "large image" processing," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 32, no. 5, pp. 598–611, 2002.
[61] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[62] S. Roweis, "Data for matlab hackers, faces, frey face," 2020.
[63] S. A. Nene, S. K. Nayar, H. Murase, et al., "Columbia object image library (COIL-20)."
[64] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.
[65] L. van der Maaten, "Matlab toolbox for dimensionality reduction," 2020.

SUPPLEMENTARY MATERIALS

Nonlinear Dimensionality Reduction for Data Visualization: An Unsupervised Fuzzy Rule-Based Approach
Suchismita Das and Nikhil R. Pal, Fellow, IEEE

Algorithm S-1: Geodesic c-means algorithm (GCM)
1  Determine the geodesic distances gd between all pairs of points in X = {x_1, x_2, ..., x_n}.
2  Let the number of clusters be n_c. Initialize the cluster centroids {v_1, v_2, ..., v_{n_c}} with n_c random input points, i.e., n_c random points from X.
3  Initialize the sets S_k = ∅; k = 1, 2, ..., n_c.
4  while iteration < MaximumIteration do
5      for i = 1 to n do
6          k* = arg min_k gd(x_i, v_k)
7          S_{k*} = S_{k*} ∪ {x_i}
8      end
9      for k = 1 to n_c do
10         v̂_k = (1 / |S_k|) Σ_{x_i ∈ S_k} x_i
11         i* = arg min_i ed(v̂_k, x_i)
12         v_k = x_{i*}
13     end
14 end
15 return {v_1, v_2, ..., v_{n_c}}
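A compact Python sketch of Algorithm S-1 is given below for readability. It approximates the geodesic distances of step 1 by shortest paths over a k-nearest-neighbor graph (the value of n_neighbors is an assumption, not fixed by the algorithm), and in steps 9-13 it snaps each cluster mean back to the Euclidean-nearest member of that cluster, which is one interpretation of line 11.

import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_cmeans(X, n_clusters, n_neighbors=10, max_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: geodesic distances approximated by shortest paths on the kNN graph
    # (assumes the graph is connected, as in the paper's single-manifold setting).
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    gd = shortest_path(graph, directed=False)
    # Step 2: centroids initialized as n_c randomly chosen input points (kept as indices).
    centroid_idx = rng.choice(n, size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Steps 5-8: assign every point to the geodesically nearest centroid.
        labels = np.argmin(gd[:, centroid_idx], axis=1)
        new_idx = centroid_idx.copy()
        # Steps 9-13: cluster mean, then snap to the Euclidean-nearest cluster member.
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the old centroid for an empty cluster
            v_hat = X[members].mean(axis=0)
            new_idx[k] = members[np.argmin(np.linalg.norm(X[members] - v_hat, axis=1))]
        if np.array_equal(new_idx, centroid_idx):
            break  # early stop; Algorithm S-1 simply runs for MaximumIteration passes
        centroid_idx = new_idx
    return X[centroid_idx]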

S-I. EXPERIMENT TO CHOOSE NUMBER OF RULES AND INITIAL SPREAD

We run our method on the Swiss Roll dataset with the number of rules (n_c) = 8, 12, 16, and so on up to 40, and repeat this experiment for three different spread initializations of the Gaussian membership functions: σ = r * (feature-specific range), for r = 0.2, 0.3, and 0.4. For each combination of n_c and σ, five runs are performed and the average of the final error values (Eqn. (5)) over the five runs is computed. Fig. S-1 shows the plot of the average final error values vs. the number of rules. We observe that for all three spread initializations, when the number of rules is 20, we get an acceptably low average error value. However, we should also consider that to model a data set with a larger number of points satisfactorily, more rules may be necessary. In the case of the Swiss Roll dataset, the number of input points is 2000. Since 20 is 1% of 2000, n_c is set to exactly or nearly 1% of the number of input instances. With r = 0.2 or 0.3, low average final error values are obtained for all values of n_c. So, along with the Swiss Roll, for all three synthetic data sets we conduct experiments with these two choices of σ. However, for the real data sets we have to choose higher values of σ to avoid many cases with zero rule-firing strength.
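The heuristic above can be summarized in a few lines; this is only a sketch of the rule-count and spread initialization described in this section (the function name is ours), not the training procedure itself. For the real data sets, larger values of r would be used, as noted above, to avoid zero rule-firing strengths.

import numpy as np

def init_rule_count_and_spread(X, r=0.3):
    # Number of rules: roughly 1% of the number of input points
    # (e.g., 20 rules for the 2000-point Swiss Roll).
    n_rules = max(1, int(round(0.01 * X.shape[0])))
    # Initial spread of each Gaussian membership function: r times the
    # per-feature range of the training data, with r in {0.2, 0.3, 0.4}.
    sigma = r * (X.max(axis=0) - X.min(axis=0))
    return n_rules, sigma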

Fig. S-1: Variation of average final error values with number of rules and initial spreads.

(a) Original Data (b) Proposed Method (c) Sammon’s Projection

(d) ISOMAP (e) LLE (f) t-SNE
Fig. S-2: Visualization of the S Curve data (a) original input space, (b) proposed method, (c) Sammon's projection, (d) ISOMAP, (e) LLE, and (f) t-SNE.

Fig. S-3: Visualization of the Frey-face data set with Sammon's Projection.

Fig. S-4: Visualization of the Frey-face data set with ISOMAP.

Fig. S-5: Visualization of the Frey-face data set with LLE.

Fig. S-6: Visualization of the Frey-face data set with t-SNE.

(a) Proposed Method (b) Sammon’s Projection (c) ISOMAP

(d) LLE (e) t-SNE
Fig. S-7: Visualization of the first object of the COIL Data Set with (a) proposed method, (b) Sammon's projection, (c) ISOMAP, (d) LLE, and (e) t-SNE.

Fig. S-8: Visualization of the USPS data set with the proposed method.

Fig. S-9: Visualization of the USPS data set with Sammon's Projection.

Fig. S-10: Visualization of the USPS data set with ISOMAP.

Fig. S-11: Visualization of the USPS data set with LLE.

Fig. S-12: Visualization of the USPS data set with t-SNE.

(a) Rule antecedents defined by cluster centers (b) Antecedents defined by random values from the smallest hyperbox containing the training data
Fig. S-13: Visualization of the Swiss Roll data using the proposed method with rule antecedents defined by (a) cluster centroids, (b) random values from the smallest hyperbox containing the training data.

Fig. S-14: Visualization of the Frey face data set using the proposed method with random rule antecedents.

(a) Original data (b) Antecedents defined by random values within zero and one (c) Antecedents defined by random values within zero and one
Fig. S-15: Visualization of the Swiss Roll data set (a) in original space; by the proposed method with rule antecedents defined by random values within zero and one: (b) best run, (c) worst run.

(a) ISOMAP output for Training Set (b) Out of sample extension of ISOMAP for Test Set Fig. S-16: For experiment 1 on validation of predictability with the Swiss Roll data : (a) ISOMAP output for Training Set, and (b) Out of sample extension of ISOMAP for Test Set.

(a) LLE output for Training Set (b) Out of sample extension of LLE for Test Set Fig. S-17: For experiment 1 on validation of predictability with the Swiss Roll data : (a) LLE output for Training Set, and (b) Out of sample extension of LLE for Test Set.

(a) Autoencoder encoded output for Training Set (b) Autoencoder encoded output for Test Set Fig. S-18: For experiment 1 on validation of predictability with the Swiss Roll data : (a) Autoencoder encoded output for Training Set, and (b) Autoencoder encoded output for Test Set.

(a) 2nd Order SNPPE output for Training Set (b) 2nd Order SNPPE output for Test Set Fig. S-19: For experiment 1 on validation of predictability with the Swiss Roll data : (a) 2nd Order SNPPE output for Training Set, (b) 2nd Order SNPPE output for Test Set.

(a) 3rd Order SNPPE output for Training Set (b) 3rd Order SNPPE output for Test Set Fig. S-20: For experiment 1 on validation of predictability with the Swiss Roll data : (a) 3rd Order SNPPE output for Training Set, (b) 3rd Order SNPPE output for Test Set.

Fig. S-21: Visualization of the training and test set (black rectangle borders) for Frey face data set using the proposed method.

Fig. S-22: Visualization of the training set using ISOMAP and test set (black rectangle borders) with its Out of sample extension for Frey face data set.

Fig. S-23: Visualization of the training set using LLE and test set (black rectangle borders) with its Out of sample extension for Frey face data set.

Fig. S-24: Visualization of the training set using Autoencoder encoded output and test set (black rectangle borders) for Frey face data set.

Fig. S-25: Visualization of the training set using 2nd Order SNPPE output and test set (black rectangle borders) for Frey face data set.

Fig. S-26: Visualization of the training set using 3rd Order SNPPE output and test set (black rectangle borders) for Frey face data set.

(a) ISOMAP output for Training Set (b) Out of sample extension of ISOMAP for Test Set Fig. S-27: For experiment 2 on validation of predictability with the Swiss Roll data : (a) ISOMAP output for Training Set, and (b) Out of sample extension of ISOMAP for Training Set and Test Set.

(a) LLE output for Training Set (b) Out of sample extension of LLE for Test Set Fig. S-28: For experiment 2 on validation of predictability with the Swiss Roll data : (a) LLE output for Training Set, and (b) Out of sample extension of LLE for Training Set and Test Set.

(a) Autoencoder encoded output for Training Set (b) Autoencoder encoded output for Test Set Fig. S-29: For experiment 2 on validation of predictability with the Swiss Roll data : (a) Autoencoder encoded output for Training Set, and (b) Autoencoder encoded output for Training Set and Test Set.

(a) 2nd Order SNPPE output for Training Set (b) 2nd Order SNPPE output for Test Set Fig. S-30: For experiment 2 on validation of predictability with the Frey face data : (a) 2nd Order SNPPE output for Training Set, and (b) 2nd Order SNPPE output for Training Set and Test Set.

(a) 3rd Order SNPPE output for Training Set (b) 3rd Order SNPPE output for Test Set Fig. S-31: For experiment 2 on validation of predictability with the Frey face data : (a) 3rd Order SNPPE output for Training Set, and (b) 3rd Order SNPPE output for Training Set and Test Set.

(a) Proposed Method Output for Set 1 (b) Proposed Method Output for Set 2

(c) Proposed Method Output for Set 3 (d) Proposed Method Output for Set 4

(e) Proposed Method Output for Set 5 (f) Proposed Method Output for Set 6 Fig. S-32: For experiment 3 on validation of predictability with leave one out sets composed of the first object of the COIL data set, proposed method outputs corresponding to : (a) Set 1, (b) Set 2, (c) Set 3, (d) Set 4, (e) Set 5, and (f) Set 6.

Fig. S-33: (a) Training Set (blue circles) and Test Set (red circles), (b) Histogram of maximum rule firings for Training Set (blue region) and Test Set (red region), and (c) Proposed Method Outputs for Training Set (blue circles) and Test Set (red circles) without exercising the rejection option.

(a) (b) Fig. S-34: Visualization of Swiss Roll data with: (a) original input space, and (b) proposed method with Sammon’s Objective.

(a) Original Data (b) Proposed Method with Sammon's Objective Fig. S-35: Visualization of S Curve data with: (a) original input space, and (b) proposed method with Sammon's objective.

(a) Original Data (b) Proposed Method with Sammon’s Objective Fig. S-36: Visualization of Helix data with: (a) original input space, and (b) proposed method with Sammon’s objective.