
Nonlinear Feature Extraction using Multilayer Perceptron based Alternating Regression for Classification and Multiple-output Regression Problems

Ozde Tiryaki 1,2 and C. Okan Sakar 1
1 Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey
2 NETAS Telecommunication Company, Kurtkoy, Istanbul, Turkey

Keywords: Alternating Regression (AR), Multiple-output Regression, Neural Networks, Kernel Canonical Correlation Analysis (KCCA), Nonlinear Dimensionality Reduction.

Abstract: Canonical Correlation Analysis (CCA) is a data analysis technique used to extract correlated features between two sets of variables. An important limitation of CCA is that it is a linear technique that cannot capture nonlinear relations in complex situations. To address this limitation, Kernel CCA (KCCA) has been proposed, which is capable of identifying nonlinear relations with the use of the kernel trick. However, it has been shown that KCCA tends to overfit the training set without proper regularization. Besides, KCCA is an unsupervised technique which does not utilize class labels for feature extraction. In this paper, we propose the nonlinear version of the discriminative alternating regression (D-AR) method to address these problems. While in linear D-AR two neural networks, each with a linear bottleneck hidden layer, are combined using an alternating regression approach, the modified version of linear D-AR proposed in this study has a nonlinear activation function in the hidden layers of the alternating multilayer perceptrons (MLPs). Experimental results on a classification and a multiple-output regression problem with sigmoid and hyperbolic tangent activation functions show that features found by nonlinear D-AR from training examples achieve significantly higher accuracy on the test set than those of KCCA.

Tiryaki, O. and Sakar, C. Nonlinear Feature Extraction using Multilayer Perceptron based Alternating Regression for Classification and Multiple-output Regression Problems.
DOI: 10.5220/0006848901070117
In Proceedings of the 7th International Conference on Data Science, Technology and Applications (DATA 2018), pages 107-117
ISBN: 978-989-758-318-6
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

1 INTRODUCTION

Canonical correlation analysis (CCA) (Hotelling, 1992) is a multivariate statistical analysis technique used to explore and measure the relations between two multidimensional variables. In data analysis, under the presence of two different input representations of the same data, or of two data sources providing samples about the same underlying phenomenon, CCA is used as an unsupervised feature extraction technique. It aims at finding a pair of linear transformations such that the transformed variables in the lower dimensional space are maximally correlated.

An important limitation of CCA is that it cannot explore the complex relationships between the sets of variables because of its linearity. To address this problem, kernel CCA was proposed (Akaho, 2001; Melzer et al., 2001; Bach and Jordan, 2003), which offers an alternative solution using a method known as the kernel trick (Schölkopf, 2000). The main idea of KCCA is to map the original low-dimensional input space to a high-dimensional feature space using a nonlinear kernel function and then apply CCA in the transformed space. Kernel CCA is capable of detecting nonlinear relationships in the presence of complex situations. KCCA has been used in a broad range of disciplines like biology, neurology, content-based image retrieval and natural language processing (Huang et al., 2009; Li and Shawe-Taylor, 2006; Sun and Chen, 2007; Cai and Huang, 2017; Chen et al., 2012).

Another important limitation of CCA and KCCA is that, under the presence of class labels in supervised learning problems, they do not utilize the class labels for feature extraction but only target finding the maximally correlated covariates of both views. Therefore, covariates explored by these unsupervised methods preserve the correlated information at the expense of losing the important discriminative information which can be helpful in separating class examples from each other.

In this paper, we propose the nonlinear version of the discriminative alternating regression (D-AR) network (Sakar and Kursun, 2017), which is based on the alternating regression (AR) method (Sakar et al., 2014b). The AR approach was first described in (Wold, 1966), and its neural network adaptations have been applied in (Lai and Fyfe, 1998), (Pezeshki et al., 2003) and (Hsieh, 2000) to extract robust CCA covariates. In the previously proposed linear D-AR (Sakar and Kursun, 2017; Sakar et al., 2014b), two neural networks, each with a linear bottleneck hidden layer, are trained to learn both class labels and covariate outputs using an alternating regression approach. Having both class labels and covariate outputs in the output layer improves the discriminative power of the extracted features. Besides, feature extraction without the use of sensitive sample covariance matrices makes the network more robust to outliers (Sakar and Kursun, 2017). The nonlinear version of D-AR has a nonlinear activation function in the hidden layers of the alternating multilayer perceptrons (MLPs). Covariate outputs are alternated between the corresponding MLPs in order to maximize the correlation between the two views. In our experiments, we compare the classification and regression performance of the features extracted by the proposed nonlinear D-AR with that of linear D-AR, CCA, and KCCA on publicly available emotion recognition and residential building datasets. We use two nonlinear activation functions, sigmoid and hyperbolic tangent, in the hidden layer of nonlinear D-AR and present the results for different training set sizes and numbers of covariate outputs.

The rest of this paper is structured as follows. In Section 2, we give brief information on the datasets used: emotion recognition and residential building. Section 3 provides background on CCA, KCCA, MLP, and linear D-AR. In Section 4, we present the details of the proposed nonlinear D-AR method. Experimental results are given in Section 5. The conclusions are given in Section 6.

2 DATASET

The Cohn-Kanade (CK+) facial expression database (Lucey et al., 2010) is a commonly used benchmarking dataset in emotion recognition tasks. This dataset consists of 320 video clips recorded from 118 subjects, each categorized with an emotion label. Each video clip in this dataset belongs to one of the seven emotions, which are anger, contempt, disgust, fear, happiness, sadness, and surprise. The samples in this dataset can be represented using different feature extraction techniques. In our experimental study, the first view consists of appearance-based features (Sakar et al., 2014a; Karaali, 2012; Sakar et al., 2012), which are obtained using the difference between the first frame of the video clip (the neutral facial expression) and the corresponding last frame (the peak frame of the emotion). Each sample in this representation has 4096 (64 × 64) features (pixels). The second view consists of the geometric set of features (Sakar et al., 2014a; Ulukaya, 2011; Karaali, 2012), which are constituted by subtracting the coordinates of the landmark points of the neutral face expression from the coordinates of the landmark points of the target expression. The feature vector in the second view consists of 134 features obtained from 67 landmark points, each of which is represented with x and y coordinates.

The Residential Building dataset (Rafiei and Adeli, 2015) is one of the most recent regression datasets in the UCI Machine Learning Repository (Asuncion and Newman, 2007). The dataset consists of 372 instances with 31 features which are collected under 2 different views. While the first view, containing physical and financial values belonging to the project, has 12 features, the second view, containing general economic variables and indices, consists of 19 features. The Residential Building dataset is a multiple-output regression problem that contains two output variables, which are the construction costs and sale prices of single-family residential apartments. In this study, we construct a single nonlinear D-AR network that predicts both of these outputs during the feature extraction step.

3 METHODS

3.1 CCA

Canonical correlation analysis (CCA) (Hotelling, 1992) is a way of measuring the linear relationship between two multidimensional views that are related with each other. Given two datasets X (N × m) and Y (N × n),

    X = [x1 x2 x3 ··· xN]
    Y = [y1 y2 y3 ··· yN]                                        (1)

where N is the total number of instances, and m and n are the number of features in datasets X and Y respectively, CCA aims to find two sets of basis vectors, one for the first view X and the other for the second view Y, such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. More formally, CCA aims to maximize the correlation between the linear combinations wx^T X and wy^T Y:

    ρ = max_{wx,wy} corr(wx^T X, wy^T Y)                         (2)

    ρ = max_{wx,wy} E[(wx^T X)(wy^T Y)^T] / sqrt( E[(wx^T X)(wx^T X)^T] E[(wy^T Y)(wy^T Y)^T] )
      = max_{wx,wy} (wx^T E[XY^T] wy) / sqrt( (wx^T E[XX^T] wx)(wy^T E[YY^T] wy) )    (3)

where E denotes the expectation. The total covariance matrix C of (X, Y)

3.2 KCCA

KCCA uses the same method, known as the kernel trick, to find nonlinear correlated projections. In KCCA, before performing CCA, each view is first projected into a higher-dimensional feature space using a nonlinear kernel function, where the data can be linearly separable. In this stage, KCCA maps xi and yi to φ(xi) and φ(yi):

    x = (x1, ..., xm) ↦ Sx = (φ1(x), ..., φN(x))
    y = (y1, ..., yn) ↦ Sy = (φ1(y), ..., φN(y))                 (8)
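The maximization in Eqs. (2)-(3) has a closed-form solution: the canonical correlations are the singular values of the whitened cross-covariance matrix. The following NumPy sketch illustrates this (the function names cca and inv_sqrt are ours, not from the paper, and a small ridge term stands in for proper regularization):

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y, k=1, ridge=1e-8):
    """Linear CCA as in Eqs. (2)-(3): SVD of the whitened cross-covariance.

    X is N x m, Y is N x n (rows are samples). Returns projection
    matrices Wx (m x k), Wy (n x k) and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)   # expectations in Eq. (3) assume centered data
    Y = Y - Y.mean(axis=0)
    N = X.shape[0]
    Cxx = X.T @ X / N + ridge * np.eye(X.shape[1])   # E[X X^T]
    Cyy = Y.T @ Y / N + ridge * np.eye(Y.shape[1])   # E[Y Y^T]
    Cxy = X.T @ Y / N                                # E[X Y^T]
    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    Wx = Cxx_is @ U[:, :k]    # basis vectors wx for the first view
    Wy = Cyy_is @ Vt[:k].T    # basis vectors wy for the second view
    return Wx, Wy, s[:k]      # s[:k] are the canonical correlations
```

On two views sharing a noisy latent variable, the first canonical correlation approaches the signal-to-noise limit; for m = n = 1 it reduces to the ordinary Pearson correlation.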
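Continuing from the mapping in Eq. (8), a regularized KCCA can be sketched in the dual form: with centered Gram matrices Kx and Ky, the kernel canonical correlations are the singular values of (Kx + cI)^-1 Kx Ky (Ky + cI)^-1, where the constant c guards against the overfitting noted in the abstract. A minimal sketch with an RBF kernel (the function names rbf_kernel and kcca, and the constants reg and gamma, are our illustrative choices, not from the paper):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gram matrix of the RBF kernel between row-sample matrices A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kcca(X, Y, reg=3.0, gamma=0.5):
    """Regularized kernel CCA in the dual: canonical correlations are the
    singular values of (Kx + cI)^-1 Kx Ky (Ky + cI)^-1, Grams centered."""
    N = X.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N        # centering in feature space
    Kx = H @ rbf_kernel(X, X, gamma) @ H
    Ky = H @ rbf_kernel(Y, Y, gamma) @ H
    M = np.linalg.solve(Kx + reg * np.eye(N), Kx @ Ky)
    M = np.linalg.solve(Ky + reg * np.eye(N), M.T).T
    return np.linalg.svd(M, compute_uv=False)  # kernel canonical correlations
```

On views related only nonlinearly through a shared latent variable, this finds a high top correlation, while permuting one view's rows (breaking the pairing) drives it down; shrinking reg toward zero lets even the permuted correlation climb, which is exactly the overfitting behavior that motivates regularized KCCA.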