
Generalized Autoencoder: A Neural Network Framework for Dimensionality Reduction

Wei Wang1, Yan Huang1, Yizhou Wang2, Liang Wang1
1Center for Research on Intelligent Perception and Computing, CRIPAC, Nat'l Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2Nat'l Eng. Lab for Video Technology, Key Lab. of Machine Perception (MoE), Sch'l of EECS, Peking University, Beijing, China
{wangwei, yhuang, wangliang}@nlpr.ia.ac.cn, [email protected]

Abstract

The autoencoder algorithm and its deep version, as traditional dimensionality reduction methods, have achieved great success via the powerful representability of neural networks. However, they just use each instance to reconstruct itself and ignore to explicitly model the data relation so as to discover the underlying effective manifold structure. In this paper, we propose a dimensionality reduction method by manifold learning, which iteratively explores data relation and uses the relation to pursue the manifold structure. The method is realized by a so-called "generalized autoencoder" (GAE), which extends the traditional autoencoder in two aspects: (1) each instance x_i is used to reconstruct a set of instances {x_j} rather than itself; (2) the reconstruction error of each instance (||x_j - x'_i||^2) is weighted by a relational function of x_i and x_j defined on the learned manifold. Hence, the GAE captures the structure of the data space through minimizing the weighted distances between reconstructed instances and the original ones. The generalized autoencoder provides a general neural network framework for dimensionality reduction. In addition, we propose a multilayer architecture of the generalized autoencoder called the deep generalized autoencoder to handle highly complex datasets. Finally, to evaluate the proposed methods, we perform extensive experiments on three datasets. The experiments demonstrate that the proposed methods achieve promising performance.

Figure 1. Traditional autoencoder and the proposed generalized autoencoder. (a) In the traditional autoencoder, x_i is only used to reconstruct itself and the reconstruction error ||x_i - x'_i||^2 just measures the distance between x_i and x'_i. (b) In the generalized autoencoder, x_i is involved in the reconstruction of a set of instances {x_j, x_k, ...}. Each reconstruction error s_ij ||x_j - x'_i||^2 measures a weighted distance between x_j and x'_i.

1. Introduction

Real-world data, such as images of faces and digits, usually have a high dimensionality, which leads to the well-known curse of dimensionality in statistical pattern recognition. However, high-dimensional data usually lie in a lower-dimensional manifold, the so-called "intrinsic dimensionality space". Various methods of dimensionality reduction have been proposed to discover the underlying manifold structure, which plays an important role in many tasks, such as pattern classification [9] and information visualization [16].

Principal Component Analysis (PCA) is one of the most popular linear dimensionality reduction techniques [10][17]. It projects the original data onto its principal directions with the maximal variance, and does not consider any data relation.

Linear Discriminant Analysis (LDA) is a supervised method to find a linear subspace which is optimal for discriminating data from different classes [2]. Marginal Fisher Analysis (MFA) extends LDA by characterizing the intraclass compactness and interclass separability [19]. These two methods use class label information as a weak data relation to seek a low-dimensional separating subspace.

However, the low-dimensional manifold structure of real data is usually very complicated. Generally, just using a simple parametric model, such as PCA, it is not easy to capture such structures. Exploiting data relations has been proved to be a promising means to discover the underlying structure. For example, ISOMAP [15] learns a low-dimensional manifold by retaining the geodesic distance between pairwise data in the original space. In Locally Linear Embedding (LLE) [12], each data point is a linear combination of its neighbors, and the linear relation is preserved in the projected low-dimensional space. Laplacian Eigenmaps (LE) [1] pursues a low-dimensional manifold by minimizing the pairwise distance in the projected space weighted by the corresponding distance in the original space. These methods share a common weakness of suffering from the out-of-sample problem. Neighborhood Preserving Embedding (NPE) [3] and Locality Preserving Projection (LPP) [4] are linear approximations to LLE and LE, respectively, which handle the out-of-sample problem.

Although these methods exploit local data relations to learn the manifold, the relation is fixed and defined in the original high-dimensional space. Such a relation may not be valid on the manifold, e.g., the geodesic nearest neighbor on a manifold may not be the Euclidean nearest neighbor in the original space.

The autoencoder algorithm [13] belongs to a special family of dimensionality reduction methods implemented using artificial neural networks. It aims to learn a compressed representation for an input through minimizing its reconstruction error. Fig. 1(a) shows the architecture of an autoencoder. Recently, the autoencoder algorithm and its extensions [8][18][11] have demonstrated a promising ability to learn meaningful features from data, which could induce the "intrinsic data structure". However, these methods just consider self-reconstruction and ignore to explicitly model the data relation.

In this paper, we propose a method of dimensionality reduction by manifold learning, which extends the traditional autoencoder to iteratively explore data relation and use the relation to pursue the manifold structure. The proposed method is realized by a so-called "generalized autoencoder" (GAE). As shown in Fig. 1, it differs from the traditional autoencoder in two aspects: (i) the GAE learns a compressed representation y_i for an instance x_i, and builds the relation between x_i and other data {x_j, x_k, ...} by using y_i to reconstruct each element in the set, not just x_i itself; (ii) the GAE imposes a relational weight s_ij on the reconstruction error ||x_j - x'_i||^2. Hence, the GAE captures the structure of the data space through minimizing the weighted reconstruction error. When replacing the sigmoid function in the GAE with the identity function, we show that the linear GAE and the linear graph embedding [19] have similar objective functions. This indicates that the GAE can also be a general framework for dimensionality reduction by defining different reconstruction sets and weights. Inspired by existing dimensionality reduction ideas, we derive six implementations of the GAE including both supervised and unsupervised models. Furthermore, we propose a deep GAE (dGAE), which extends the GAE to multiple layers.

The rest of the paper is organized as follows. In Section 2, we first introduce the formulation of the generalized autoencoder and its connection to graph embedding, then propose the generalized autoencoder as a general framework for dimensionality reduction and derive its six implementations. In Section 2.3, we illustrate the deep generalized autoencoder as a multilayer network of GAEs. The experimental results on three datasets are presented in Section 3. Finally, we discuss the proposed method and conclude the paper in Section 4.

2. Dimensionality Reduction by Manifold Learning

2.1. The method

Algorithm 1 shows the proposed method. (1) We initialize the data relation by computing the data pairwise similarity/distance in the original high-dimensional space, then determine a "relation/reconstruction" set Ω_i for each data point x_i. For example, the set can be composed of the k-nearest neighbors. (2) The local manifold structure around each data point is learned from its reconstruction set using the proposed "generalized autoencoder" (GAE) introduced below. (3) On the learned manifold, the data relation is updated according to the pairwise similarity/distance defined on the hidden representation derived from the GAE. Steps (2) and (3) are iterated until convergence or until the maximum iteration number is reached.

Algorithm 1: Iterative learning procedure for the Generalized Autoencoder
Input: training set {x_i}_1^n
Parameters: Θ = (W, W')
Notation: Ω_i: reconstruction set for x_i; S_i: the set of reconstruction weights for x_i; {y_i}_1^n: hidden representations
1. Compute the reconstruction weights S_i from {x_i}_1^n and determine the reconstruction set Ω_i, e.g., by k-nearest neighbors.
2. Minimize E in Eqn. 4 using stochastic gradient descent and update Θ for t steps.
3. Compute the hidden representations {y_i}_1^n, and update S_i and Ω_i from {y_i}_1^n.
4. Repeat steps 2 and 3 until convergence.
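A minimal sketch of this iterative procedure in Python/NumPy follows. It is not code from the paper: `train_gae` and `encode` are hypothetical placeholders standing in for step 2 of Algorithm 1 and the encoder of Section 2.2, and k-nearest neighbors with heat-kernel weights are used as one possible choice of reconstruction sets and weights.

```python
import numpy as np

def knn_sets_and_weights(Z, k=10, t=1.0):
    """Reconstruction sets (k-NN indices) and heat-kernel weights from representations Z."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                           # exclude self from the neighbor list
    omega = np.argsort(d2, axis=1)[:, :k]                  # k nearest neighbors of each point
    s = np.exp(-np.take_along_axis(d2, omega, axis=1) / t) # GAE-LE style weights (Table 1)
    return omega, s

def iterative_gae(X, encode, train_gae, n_iters=20, k=10):
    """Alternate GAE training (step 2) with relation updates (step 3), as in Algorithm 1."""
    Z = X.copy()                                  # relations are initialized in the input space
    for _ in range(n_iters):
        omega, s = knn_sets_and_weights(Z, k)     # steps 1/3: reconstruction sets and weights
        params = train_gae(X, omega, s)           # step 2: minimize Eqn. 4 by SGD
        Z = encode(X, params)                     # hidden representations y_i on the manifold
    return params, Z
```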

2.2. Generalized Autoencoder

The generalized autoencoder consists of two parts, an encoder and a decoder. As shown in Fig. 1(b), the encoder maps an input x_i ∈ R^{d_x} to a reduced hidden representation y_i ∈ R^{d_y} by a function g(),

    y_i = g(W x_i)                                                   (1)

where g() is either the identity function for a linear projection or a sigmoid function 1/(1 + e^{-Wx}) for a nonlinear mapping. The parameter W is a d_y × d_x weight matrix. In this paper, we ignore the bias terms of the neural network for simple notation.

The decoder reconstructs x'_i ∈ R^{d_x} from the hidden representation y_i,

    x'_i = f(W' y_i)                                                 (2)

The parameter W' is another d_x × d_y weight matrix, which can be W^T. Similar to g(), f() is either the identity function for a linear reconstruction or a sigmoid function for a binary reconstruction.

To model the relation between data, the decoder reconstructs a set of instances indexed by Ω_i = {j, k, ...} with specific weights S_i = {s_ij, s_ik, ...} for x_i. The weighted reconstruction error is

    e_i(W, W') = \sum_{j \in \Omega_i} s_{ij} L(x_j, x'_i)           (3)

where L is the reconstruction error. Generally, the squared error L(x_j, x'_i) = ||x_j - x'_i||^2 is used for linear reconstruction, and the cross-entropy loss

    L(x_j, x'_i) = -\sum_{q=1}^{d_x} [ x_j^{(q)} \log(x_i'^{(q)}) + (1 - x_j^{(q)}) \log(1 - x_i'^{(q)}) ]

for binary reconstruction [5].

The total reconstruction error E over n samples is

    E(W, W') = \sum_{i=1}^{n} e_i(W, W') = \sum_{i=1}^{n} \sum_{j \in \Omega_i} s_{ij} L(x_j, x'_i)    (4)

The generalized autoencoder learns the parameters (W, W') via minimizing E.
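The following NumPy sketch (ours, not the paper's) computes the forward pass of Eqns. 1-2 with a sigmoid encoder and a linear decoder, and evaluates the weighted squared-error objective of Eqns. 3-4; in practice the gradients of this loss with respect to W and W' would be obtained by backpropagation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gae_loss(W, W_dec, X, omega, s):
    """Weighted reconstruction error E (Eqn. 4) with the squared-error loss.

    X     : (n, d_x) data matrix
    omega : (n, k) indices of the reconstruction set Omega_i of each x_i
    s     : (n, k) reconstruction weights s_ij
    """
    Y = sigmoid(X @ W.T)            # Eqn. 1: y_i = g(W x_i), nonlinear encoder
    X_rec = Y @ W_dec.T             # Eqn. 2: x'_i = f(W' y_i), linear decoder
    E = 0.0
    for i in range(X.shape[0]):
        diff = X[omega[i]] - X_rec[i]               # x_j - x'_i for all j in Omega_i
        E += np.sum(s[i] * np.sum(diff ** 2, 1))    # Eqn. 3: sum_j s_ij ||x_j - x'_i||^2
    return E
```

With `W_dec = W.T` this corresponds to the tied-weight case discussed below Eqn. 7.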

2.2.1 Connection to Graph Embedding

Graph embedding [19] is known to be a general framework for dimensionality reduction, in which each data point is represented as a graph node in a low-dimensional vector space and the edges preserve the similarities between data pairs. The similarities characterize certain statistical or structural properties of the data distribution. The formulation of graph embedding is

    y^* = \arg\min_{y^T B y = c} \sum_{i,j} s_{ij} \|y_i - y_j\|^2                       (5)

where y is the low-dimensional representation, s_ij is the similarity between vertices i and j (usually s_ij = s_ji), c is a constant, and B is a constraint matrix to avoid a trivial solution.[1]

[1] The original constraint i ≠ j is not used here, because i = j does not violate the objective function.

The linear graph embedding (LGE) assumes that the low-dimensional representation can be obtained by a linear projection y = X^T w, where w is the projection vector and X = [x_1, x_2, ..., x_n]. The objective function of LGE is

    w^* = \arg\min_{w^T X B X^T w = c \ \text{or}\ w^T w = c} \sum_{i,j} s_{ij} \|w^T x_i - w^T x_j\|^2    (6)

Similarly, the hidden representation of the generalized autoencoder induces a low-dimensional manifold of the data when d_y < d_x, on which the relation/similarity/distance between data pairs is defined through one reconstructing the others. From Eqns. 1, 2 and 4, the total reconstruction error E is

    (W, W')^* = \arg\min \sum_{i,j} s_{ij} \|x_j - f(W' g(W x_i))\|^2                    (7)

In the linear reconstruction case, if W' = W^T (like [11]) and assuming only one hidden node in the network, i.e. d_y = 1, W degenerates to a column vector w, and the objective function Eqn. 7 becomes

    w^* = \arg\min \sum_{i,j} s_{ij} \|x_j - w w^T x_i\|^2                               (8)

Let w^T w = c and y_i = w^T x_i. Eqn. 8 becomes

    w^* = \arg\min_{w^T w = c} \sum_{i,j} s_{ij} \left( \|w^T x_i - w^T x_j\|^2 + (c/2 - 1) y_i^2 \right)    (9)

Compared with the linear graph embedding (LGE) in Eqn. 6, we can see that the generalized autoencoder (GAE) has an additional term which controls different tuning behaviors over the hidden representation y by varying c. If c = 2, the GAE has a similar objective function to LGE. If c > 2, this term prevents y from being too large, even if the norm of w could be large. If c < 2, this term encourages y to be large enough when w is small.

The linear version of the above models just illustrates the connection between the generalized autoencoder and graph embedding. In real applications, g() in the generalized autoencoder is usually preferred to be nonlinear due to its powerful representability.
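For completeness, the intermediate expansion behind Eqns. 8-9, which the text leaves implicit, follows from the constraint w^T w = c and the substitution y_i = w^T x_i:

```latex
\|x_j - w w^\top x_i\|^2
  = \|x_j\|^2 - 2\, x_j^\top w \, w^\top x_i + x_i^\top w \,(w^\top w)\, w^\top x_i
  = \|x_j\|^2 - 2\, y_i y_j + c\, y_i^2 .
```

Since ||x_j||^2 does not depend on w, the remaining w-dependent terms (-2 y_i y_j + c y_i^2), regrouped over the symmetric weights s_ij, give the pairwise distance plus a y_i^2 penalty as in Eqn. 9.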

2.2.2 Implementation of the GAE Variants

As can be seen from the above analysis, the generalized autoencoder can also serve as a general framework for dimensionality reduction through defining different reconstruction sets and weights. In the following, we derive six implementations of the GAE inspired by previous dimensionality reduction ideas, namely PCA [10], LDA [2], ISOMAP [15], LLE [12], LE [1] and MFA [19].

Table 1. Six implementations of the generalized autoencoder inspired by PCA [10], LDA [2], ISOMAP [15], LLE [12], LE [1] and MFA [19].

Method      | Reconstruction Set                                           | Reconstruction Weight
GAE-PCA     | j = i                                                        | s_ij = 1
GAE-LDA     | j ∈ Ω_{c_i}                                                  | s_ij = 1/n_{c_i}
GAE-ISOMAP  | j : x_j ∈ X                                                  | s_ij ∈ S = -HΛH/2
GAE-LLE     | j ∈ N_k(i); j ∈ (N_k(m) ∪ m), j ≠ i if ∀m, i ∈ N_k(m)        | s_ij = (M + M^T - M^T M)_ij if i ≠ j; 0 otherwise
GAE-LE      | j ∈ N_k(i)                                                   | s_ij = exp{-||x_i - x_j||^2 / t}
GAE-MFA     | j ∈ Ω^{k_1}(c_i); j ∈ Ω^{k_2}(c̄_i)                          | s_ij = 1; s_ij = -1

GAE-PCA: an input x_i is used to reconstruct itself with unit weight, i.e. Ω_i = {i}, s_ii = 1. Given the constraint w^T w = 1, Eqn. 8 can be rewritten as

    w^* = \arg\max_{w^T w = 1} \sum_i w^T x_i x_i^T w               (10)

which is the formulation of traditional PCA with zero mean [2]. The neural network implementation of GAE-PCA is exactly the traditional autoencoder [5], which is a special case of the proposed GAE.

GAE-LDA: an input x_i is used to reconstruct all the data from the same class as x_i, i.e. j ∈ Ω_{c_i}, where Ω_{c_i} is the index set of class c_i, to which x_i belongs. The weight s_ij is inversely proportional to the sample size n_{c_i} of class c_i, s_ij = 1/n_{c_i}. The one-dimensional linear GAE-LDA follows Eqn. 9. It should be noted that GAE-LDA does not need to satisfy the following two constraints of traditional LDA: (1) the data in each class follow a Gaussian distribution; (2) the projection dimension is lower than the number of classes.

GAE-ISOMAP: an input x_i is used to reconstruct all the data X. Let D_G be the geodesic distances of all the data pairs [15]. The weight matrix is S = -HΛH/2, where H = I - (1/n) e e^T, e is the n-dimensional all-one vector, and Λ_ij = D_G^2(i, j) [19]. The one-dimensional linear GAE-ISOMAP follows Eqn. 9.

GAE-LLE: an input x_i is used to reconstruct its k-nearest neighbors and the k-nearest neighbor sets of other inputs to which x_i itself belongs, i.e. j ∈ {N_k(i), N_k(m), m} and j ≠ i if ∀m, i ∈ N_k(m), where N_k(i) is the index set of the k-nearest neighbors of x_i. Let M be the local reconstruction coefficient matrix, with \sum_{j ∈ N_k(i)} M_ij = 1 and M_ij = 0 if j ∉ N_k(i) [12]. The weight matrix is S_ij = (M + M^T - M^T M)_ij if i ≠ j; 0 otherwise [19]. The one-dimensional linear GAE-LLE follows Eqn. 9.

GAE-LE: an input x_i is used to reconstruct its k nearest neighbors, i.e. j ∈ N_k(i). The reconstruction weight is s_ij = exp{-||x_i - x_j||^2 / t}, where t is a tuning parameter [1]. Given that the objective function of LE is similar to Eqn. 5, the one-dimensional linear GAE-LE follows Eqn. 9.

GAE-MFA: an input x_i is used to reconstruct its k_1 nearest neighbors of the same class and its k_2 nearest neighbors of other classes, i.e. if j ∈ Ω^{k_1}(c_i), s_ij = 1, and if j ∈ Ω^{k_2}(c̄_i), s_ij = -1, where Ω^{k_1}(c_i) is the index set of x_i's k_1 nearest neighbors in class c_i and Ω^{k_2}(c̄_i) is the index set of x_i's k_2 nearest neighbors in all the other classes. Similar to MFA, the positive reconstruction characterizes the "intraclass compactness" while the negative reconstruction characterizes the "interclass separability".

We give a summary of the reconstruction sets and weights of the six methods in Table 1. As can be seen from this table, we can easily devise new algorithms in this framework by combining the ideas of these methods. For example, by introducing class labels to GAE-LE (or GAE-LLE) and choosing the k-nearest neighbors of the same class as x_i, we obtain a supervised GAE-LE (or supervised GAE-LLE).
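As an illustration (not code from the paper), the sketch below builds the reconstruction sets and weights of Table 1 for two of the variants, GAE-LE from a data matrix and GAE-LDA from class labels; variable names are ours.

```python
import numpy as np

def gae_le_relations(X, k=10, t=1.0):
    """GAE-LE: Omega_i = k nearest neighbors, s_ij = exp(-||x_i - x_j||^2 / t)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # a point is not its own neighbor
    omega = np.argsort(d2, axis=1)[:, :k]
    s = np.exp(-np.take_along_axis(d2, omega, axis=1) / t)
    return omega, s

def gae_lda_relations(labels):
    """GAE-LDA: Omega_i = all samples of the same class, s_ij = 1 / n_{c_i}."""
    labels = np.asarray(labels)
    omega, s = [], []
    for c in labels:
        idx = np.where(labels == c)[0]                    # every sample of class c_i
        omega.append(idx)
        s.append(np.full(len(idx), 1.0 / len(idx)))       # uniform weight 1 / n_{c_i}
    return omega, s
```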

2.3. Deep Generalized Autoencoder

To handle more complex datasets, we extend the generalized autoencoder to a multilayer network called the deep generalized autoencoder (dGAE). It contains a multilayer encoder network to transform the high-dimensional input data to a low-dimensional hidden representation, and a multilayer decoder network to reconstruct the data. The structures of the two networks are symmetric with respect to the low-dimensional hidden representation layer. For example, Fig. 2(a) shows a five-layer dGAE on the face dataset used in Section 3.3. It has a three-layer encoder network and a three-layer decoder network labeled with a box. They are symmetric with respect to the second hidden layer of 100 real-valued nodes.

In order to obtain good initial weights for the network, we adopt the layer-wise pretraining procedure introduced in [5]. When pretraining the first hidden layer of 200 binary features, the real-valued face data in the input layer are modeled by a Gaussian distribution. The first hidden layer and the input layer are modeled as a Gaussian restricted Boltzmann machine (GRBM) [5].
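A minimal sketch of the symmetric encoder/decoder stack used during fine-tuning is given below. It is only an illustration: the GRBM pretraining of [5] is not reproduced, random initialization stands in for the pretrained weights, the layer sizes follow the 157-200-100 face network of Section 3.3, and tied (transposed) decoder weights are an assumption.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_dgae(sizes=(157, 200, 100), seed=0):
    """Encoder weight matrices W_l; the decoder reuses their transposes (tied weights)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(0.0, 0.01, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]

def dgae_forward(Ws, x):
    """Encode x layer by layer, then decode symmetrically with the transposed weights."""
    h = x
    for W in Ws:                                   # multilayer encoder
        h = sigmoid(W @ h)
    y = h                                          # low-dimensional hidden representation y_i
    for l, W in enumerate(reversed(Ws)):           # symmetric multilayer decoder
        a = W.T @ h
        h = a if l == len(Ws) - 1 else sigmoid(a)  # linear output layer for real-valued data
    return y, h                                    # (code y_i, reconstruction x'_i)
```

Fine-tuning would backpropagate the weighted reconstruction error of Eqn. 4 through this stack.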

Figure 2. Training a deep generalized autoencoder (dGAE) on the face dataset used in Section 3.3. (a) The dGAE consists of a three-layer encoder network and a three-layer decoder network. A reconstructed face is in the output layer. (b) Pretraining the dGAE through learning a stack of Gaussian restricted Boltzmann machines (GRBMs). (c) Fine-tuning a deep GAE-LDA.

When pretraining the second hidden layer of 100 real-valued features, the first hidden layer is considered as the input to the second hidden layer. The first and second hidden layers are combined to form another GRBM, as shown in Fig. 2(b). These two GRBMs can be trained efficiently using the Contrastive Divergence method [5].

After pretraining, we obtain the weights {W_i}_{i=1,2} for both the encoder and the decoder. We further fine-tune them by backpropagating the derivatives of the total reconstruction error in Eqn. 4. Fig. 2(c) illustrates the fine-tuning procedure of a deep GAE-LDA (dGAE-LDA) through reconstructing the other faces in the same class.

3. Experimental Results

In this section, we discuss two applications of the GAE, one for face recognition and the other for digit classification. First, we show the effectiveness of manifold learning by the GAE and its extensions.

3.1. Data Sets

Experiments are conducted on three datasets. (1) The face dataset F1 contains 1,965 grayscale face images from the frames of a small video; it is also used in [12]. The size of each image is 20 × 28 pixels. (2) The CMU PIE face database [14] contains 68 subjects in 41,368 face images. The size of each image is 32 × 32 pixels. These face images are captured under varying poses, illumination and expressions. (3) The MNIST dataset contains 60,000 grayscale images of handwritten digits. The size of each digit image is 28 × 28 pixels. Considering that computing the reconstruction weights of some of the six GAE implementations, e.g. the geodesic distance of GAE-ISOMAP, is time-consuming when the dataset is large, we randomly select 10,000 images to use in our experiments. For each digit, 500 images are used for training and the other 500 for testing.

3.2. Manifold Learning

Generally, high-dimensional real data, such as face images and handwritten digit images, is considered to lie in a low-dimensional space. Similar to the previous methods [15][1][12], the GAE provides an effective means to discover the nonlinear manifold structure.

Firstly, we explore the ability of the GAE in local manifold structure discovery by checking the changes of the k-nearest neighbors of a data point during the iterative learning process. Fig. 3 shows the changes of a face's 18 nearest neighbors during the first 20 iterations of our unsupervised dGAE-LE learning on the CMU PIE dataset. Each row shows the 18 nearest neighbors of the input during one iteration of learning. The faces in the colored boxes belong to other persons, and the color indicates the identity of the person. From the result, it can be seen that along the iterations the nearest neighbors of the face converge to its own class, and the same person's faces under similar illumination move forward in the neighbor ranking.

Figure 3. The changes of nearest neighbors during the iterative learning. Each row shows the 18 nearest neighbors of an input data point during one iteration of learning. The faces in the colored boxes belong to other persons. The color indicates the identity of the persons. We can see that along the iterative learning, the nearest neighbors of a data point converge to its own class.

We also compute the average "impurity" change of the 18 nearest neighbors across the dataset during the learning. The impurity is defined as the ratio of the number of data from other classes to 18. Fig. 4 shows the decreasing impurity over the first 20 iterations. It indicates that our GAE is able to find more and more reasonably similar neighbors along the learning. Consequently, this may result in a more meaningful local manifold structure.

Figure 4. Impurity change during the iterative learning. Impurity is defined as the ratio between the number of data from other classes and the number of nearest neighbors.

In addition, we compare the manifold learning results of different models, i.e., LPP [4], dGAE-PCA and dGAE-LE [2], on the F1 dataset. For visualization, we map the face images into a two-dimensional manifold space learned from these models. Both of our methods consist of an encoder with layers of size (20×28)-200-2 and a symmetric decoder. The 2D manifolds are shown in Fig. 5, with representative faces shown next to the data points. As can be seen, the distributions of the data points of our two methods appear in radial patterns. Along the radial axes and the angular dimension, the facial expression and the pose change smoothly.

[2] Due to limited space, we only show the learned manifolds from dGAE-PCA and dGAE-LE; dGAE-PCA is actually the deep autoencoder [5]. We choose the unsupervised form of dGAE-LE to make a comparison with dGAE-PCA. In Fig. 6, we only show the results of three representative methods, i.e., dGAE-LE as an unsupervised method, and dGAE-MFA and dGAE-LDA as two supervised methods.

We visualize the 2D data manifolds of the 0-9 digit images of the MNIST dataset learned from two unsupervised methods (i.e., LPP [4] and dGAE-LE) and four supervised methods (i.e., MFA [19], LDA [2], dGAE-MFA and dGAE-LDA). Fig. 6 shows the results. The data points of the 10 digits are shown with different colors and symbols. Our three methods consist of an encoder with layers of size (28×28)-1000-500-250-2 and a symmetric decoder. As can be seen, the data points of different digits overlap seriously for LPP, MFA and LDA, whereas the data points form more distinctive clusters using the three proposed methods. Moreover, by employing class labels, the clusters from dGAE-MFA and dGAE-LDA are more distinguishable than those from dGAE-LE.

3.3. Application to Face Recognition

In this section, we evaluate the performance of our six GAEs on the CMU PIE dataset, and compare the results to four other methods and their kernel versions.

Similar to the experimental setting in [4], 170 face images of each individual are used in the experiments, 85 for training and the other 85 for testing. All the images are first projected to a PCA subspace for noise reduction. We retain 98% of the energy in the denoising step and use a 157-dimensional feature for each image. In the experiments, we adopt a 157-200-100 encoder network for the deep generalized autoencoders. After learning the parameters of the deep GAEs, the low-dimensional representations are computed for all the face images. Then, we apply the nearest-neighbor classifier on the learned low-dimensional space to classify the testing data and compute the error rates.

Table 2 shows the recognition results of 14 models on this dataset, namely (kernel) PCA [10], (kernel) LDA [2], (kernel) LPP [4], (kernel) MFA [19] and our six dGAEs. Due to the out-of-sample problem, we cannot give the results of ISOMAP, LLE and LE. However, considering LPP as a popular linear approximation to LE, we present the result of the supervised LPP as in [4]. Correspondingly, we use the supervised version of dGAE-LE for comparison. For the kernel methods, we present the best performance from the Gaussian kernel, the polynomial kernel and the polyplus kernel [3]. For the other four non-kernel methods, we select the manifold dimensions with the best performance, shown in parentheses in Table 2. We select 100 as the manifold dimension for all our dGAEs.

[3] The form of the polyplus kernel is K(x, y) = (x'y + 1)^d, where d is the reduced dimension.

Table 2. Performance comparison on the CMU PIE dataset. ER is short for "error rate". The reduced dimensions are in parentheses. Our models use 100 dimensions. "g" and "pp" are short for the "Gaussian" and "polyplus" kernels.

Method      | ER           | Our Model    | ER
PCA         | 20.6% (150)  | dGAE-PCA     | 3.5%
Kernel PCA  | 8.1% (g)     |              |
LDA         | 5.7% (67)    | dGAE-LDA     | 1.2%
Kernel LDA  | 1.6% (pp)    |              |
ISOMAP      | –            | dGAE-ISOMAP  | 2.5%
LLE         | –            | dGAE-LLE     | 3.6%
LPP         | 4.6% (110)   | dGAE-LE      | 1.1%
Kernel LPP  | 1.7% (pp)    |              |
MFA         | 2.6% (85)    | dGAE-MFA     | 1.1%
Kernel MFA  | 2.1% (pp)    |              |

As can be seen, our methods generally perform much better than their counterparts. Three supervised methods, dGAE-LDA, dGAE-LE and dGAE-MFA, with error rates of 1.2%, 1.1% and 1.1%, achieve the best performance. Except for dGAE-LLE, all the other four dGAEs have a lower error rate than dGAE-PCA (which is a deep autoencoder [5]). This justifies the importance of considering the data relation on the manifold during the iterative learning, which is the motivation of this work. Moreover, the low error rate of dGAE-LDA may indicate that the requirement of LDA - that the data of each class follow a Gaussian distribution - is no longer a necessity.

3.4. Application to Digit Classification

To further evaluate the performance of the proposed GAEs, we classify digits from the MNIST dataset.
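Both recognition experiments share the same protocol: train the dGAE, embed all images with the encoder, and classify each test point by its nearest training neighbor in the learned space. A minimal sketch (assuming precomputed low-dimensional codes; the function name is ours, not the paper's):

```python
import numpy as np

def nn_error_rate(Y_train, labels_train, Y_test, labels_test):
    """1-nearest-neighbor error rate in the learned low-dimensional space."""
    d2 = ((Y_test[:, None, :] - Y_train[None, :, :]) ** 2).sum(-1)  # test-to-train distances
    pred = np.asarray(labels_train)[np.argmin(d2, axis=1)]          # label of the nearest neighbor
    return float(np.mean(pred != np.asarray(labels_test)))          # classification error rate
```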

Figure 5. 2D visualization of the face image manifold on the F1 dataset. (a) LPP [4]; (b) dGAE-PCA (deep autoencoder [5]); (c) dGAE-LE.

Figure 6. 2D visualization of the learned digit image manifold. (a) LPP [4]; (b) MFA [19]; (c) LDA [2]; (d) dGAE-LE; (e) dGAE-MFA; (f) dGAE-LDA.

We use the 784-dimensional raw data as the input. In our deep GAEs, the encoder layers are of size 784-500-200-30. We adopt the same testing procedure as in the face recognition experiments, and the manifold dimension is set to 30 for all the dGAEs.

Table 3 shows the classification results of the 14 methods. The baseline uses the nearest-neighbor classifier on the raw data, and its error rate is 6.5%. As can be seen, on this dataset: (1) our methods still outperform their counterparts; (2) our methods all perform better than the baseline, but not all of the other 8 methods do, which demonstrates that the GAE may discover a more reasonable manifold structure by iteratively exploring the data relation; (3) from the error rates of kernel PCA (8.5%) and PCA (6.2%), it can be seen that kernel extensions may not always improve the original methods, and finding a suitable kernel function is often the key issue. However, due to the multilayer neural network architecture, the proposed deep GAE has the universal approximation property [6], which can capture more complicated data structures.

4. Discussion and Conclusion

It is known that the denoising autoencoder [18] is trained to reconstruct a data point from one of its corrupted versions.

Table 3. Performance comparison on the MNIST dataset. ER is short for "error rate". The reduced dimensions are in parentheses. Our models use 30 dimensions. "pp" is short for the "polyplus" kernel.

Method      | ER          | Our Model    | ER
PCA         | 6.2% (55)   | dGAE-PCA     | 5.3%
Kernel PCA  | 8.5% (pp)   |              |
LDA         | 16.1% (9)   | dGAE-LDA     | 4.4%
Kernel LDA  | 4.6% (pp)   |              |
ISOMAP      | –           | dGAE-ISOMAP  | 6.4%
LLE         | –           | dGAE-LLE     | 5.7%
LPP         | 7.9% (55)   | dGAE-LE      | 4.3%
Kernel LPP  | 4.9% (pp)   |              |
MFA         | 9.5% (45)   | dGAE-MFA     | 3.9%
Kernel MFA  | 6.8% (pp)   |              |

In the generalized autoencoder, a data point is reconstructed by those in a specific set. From the denoising autoencoder viewpoint, the GAE uses a set of "corrupted versions" for the reconstruction, such as its neighbors or the instances of the same class, instead of versions corrupted by, e.g., masking noise [18]. In the future, we will compare the generalized autoencoder with the denoising autoencoder on more tasks, such as classification, and also consider incorporating the merits of both.

As we know, the traditional autoencoder is an unsupervised method, which does not utilize class labels. The classification restricted Boltzmann machine (ClassRBM) [7] may provide a solution to explicitly modeling class labels by fusing label vectors into the visible layer. However, training a neural network with such a visible layer needs a large amount of labeled data. We argue that the generalized autoencoder is able to exploit label information more flexibly in learning via using/selecting different data for the reconstruction sets. For example, we can build a reconstruction set according to the labels, or just use nearest neighbors while ignoring the labels, or even combine both strategies.

Acknowledgments

This work is jointly supported by the National Basic Research Program of China (2012CB316300), the Hundred Talents Program of CAS, and the National Natural Science Foundation of China (61202328).

References

[1] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 2002.
[2] R. Duda, P. Hart, and D. Stork. Pattern Classification, 2nd edition. Wiley-Interscience, Hoboken, NJ, 2000.
[3] X. He, D. Cai, S. Yan, and H. Zhang. Neighborhood preserving embedding. International Conference on Computer Vision, 2005.
[4] X. He and P. Niyogi. Locality preserving projections. Advances in Neural Information Processing Systems, 2004.
[5] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 2006.
[6] K. Hornik. Multilayer feedforward networks are universal approximators. Neural Networks, vol. 2, pp. 359-366, 1989.
[7] H. Larochelle, M. Mandel, R. Pascanu, and Y. Bengio. Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, vol. 13, 2012.
[8] H. Lee, C. Ekanadham, and A. Ng. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems, 2008.
[9] K. Lee, J. Ho, M. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. IEEE Conference on Computer Vision and Pattern Recognition, 2003.
[10] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 1901.
[11] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. International Conference on Machine Learning, 2011.
[12] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.
[13] D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing, Vol. 1: Foundations. MIT Press, Cambridge, MA, 1986.
[14] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition, 2002.
[15] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.
[16] J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski. Information retrieval perspective to nonlinear dimensionality reduction for data visualization. Journal of Machine Learning Research, vol. 11, 2010.
[17] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945-1959, 2005.
[18] P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol. Extracting and composing robust features with denoising autoencoders. International Conference on Machine Learning, 2008.
[19] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
