
A Third Order based Additional Regularization in Intrinsic Space of the Manifold

Rakesh Kumar Yadav∗, Abhishek∗, Shekhar Verma∗, S. Venkatesan∗, M. Syafrullah†, Krisna Adiyarta‡
∗Department of IT, Indian Institute of Information Technology Allahabad, Prayagraj, India 211015
†‡Program of Master of Computer Science, Universitas Budi Luhur, Indonesia
Email: ∗{pcl2014003, rsi2016006, sverma, venkat}@iiita.ac.in, †[email protected], ‡[email protected]

Abstract—Second order graph Laplacian regularization has the limitation that the solution remains biased towards a constant, which restricts its extrapolation capability. The lack of extrapolation results in poor generalization. An additional penalty factor is needed on the function to avoid its over-fitting on the seen unlabeled training instances. The third order derivative based technique identifies the sharp variations in the function and accurately penalizes them to avoid over-fitting. The resultant function leads to a more accurate and generic model that exploits the twist and curvature variations on the manifold. Extensive experiments on synthetic and real-world data sets clearly show that the additional regularization increases the accuracy and the generic nature of the model.

Index Terms—Graph Laplacian, Third order derivative, Regularization, Curvature, Manifold

I. INTRODUCTION

Manifold regularization techniques [1] aim to extract the underlying common distribution properties from the training data set [2] that can be extended to unseen data instances for the best label prediction. This prediction often suffers from model over-fitting, which prevents the model from obtaining a general characteristic. In many real world applications, labeled data is sparse while unlabeled data is present in abundance. Semi-supervised learning (SSL) based classification [3]–[5] exploits the combined bundle of labeled and unlabeled data [6], which greatly extends the model's proficiency. SSL creates a model based on one of the following predefined assumptions: smoothness, clustering, or manifold. The clustering assumption states that instances belonging to the same cluster must share the same class label. The classification boundary thus passes through the regions between clusters where data is sparsely scattered. The manifold assumption inherently assumes that the given high dimensional data actually resides on a much lower dimensional space. Noise, highly correlated attributes, and unknown transformations contribute to the artificially increased dimensions.

Manifold regularization [2], [7]–[10] under the SSL framework performs its computation on a graph that describes the manifold structure. The nodes of the graph represent the data instances, and the similarity or affinity between nodes is represented using edges. Consider an undirected graph $G = (V, W)$ where $V$ represents the $n$ labeled and unlabeled instances. The edge between data instances $x_i$ and $x_j$ is represented by $w_{ij} \in W$. The similarity metric between nodes is used to extract the intrinsic geometry of the manifold [11], [12]. The graph Laplacian is calculated as $L = D - W$, where $D_{ii} = \sum_{j=1}^{n} w_{ij}$ is a diagonal matrix, and $L$ estimates the divergence of the function gradient. Manifold regularization utilizes the abundant unlabeled data to increase the model accuracy and avoid function over-fitting when labeled data alone proves to be insufficient. It accurately penalizes the function so that its transition within similar and non-similar labeled data remains smooth.

The graph Laplacian has proved to be an accurate manifold regularization method in the past, but due to its limited extrapolation power it has been used along with Hessian regularization [13] to achieve better regularization than the graph Laplacian alone. Hessian regularization [14], [15] approximates the energy in the neighborhood through the second derivative of the function. It helps in identifying those geodesic deviating functions which are left unpenalized by graph Laplacian regularization. This higher order co-regularization [16] method successfully discards the effects of noise and high oscillation.

Given $n$ data points of dimension $d$, $X = (x_1, x_2, \ldots, x_l, x_{l+1}, \ldots, x_n)$, where, for the sake of simplicity, the first $l$ instances are considered labeled and the remaining $n - l$ are unlabeled, the labels of the respective input space data are represented by $Y = (y_1, y_2, \ldots, y_l)$. As described above, the graph Laplacian on both labeled and unlabeled data is obtained through $L = D - W$ over $G = (V, W)$.


Here, $w_{ij}$ represents the similarity between the nodes $x_i$ and $x_j$. The connectivity between the nodes can be defined on the basis of the $k$-nearest neighbor technique, where $x_i$ and $x_j$ are connected if $x_i$ is in the neighborhood of $x_j$ or vice-versa. The other method for creating the graph is to employ an $\epsilon$-radius neighborhood, where $x_i$ and $x_j$ are connected only if $\|x_i - x_j\|_d^2 < \epsilon$.

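To make the graph construction and the Laplacian $L = D - W$ concrete, the following is a minimal sketch (ours, not the authors' code) of a $k$-nearest neighbor affinity matrix with heat-kernel weights; the weighting scheme and the parameters `k` and `sigma` are illustrative assumptions, since the text only specifies kNN connectivity and, in the experiments, $k = 6$ with an RBF kernel.

```python
import numpy as np

def knn_graph_laplacian(X, k=6, sigma=1.0):
    """Build a symmetric kNN affinity matrix W and return L = D - W.

    Illustrative sketch: heat-kernel weights on kNN edges; `k` and
    `sigma` are example choices, not values fixed by the paper.
    """
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise ||xi - xj||^2
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]        # k nearest neighbors of x_i
        W[i, nbrs] = np.exp(-sq[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)       # connect if either point is a neighbor of the other
    D = np.diag(W.sum(axis=1))   # degree matrix, D_ii = sum_j w_ij
    return D - W                 # graph Laplacian
```

The intrinsic penalty introduced in equation (2) below is then simply the quadratic form $f^T L f$ over the vector of function values.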
A supervised model can be defined as

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} V(x_i, y_i, f) + \lambda \|f\|_A^2 \qquad (1)$$

where $V(x_i, y_i, f)$ represents the loss function, e.g., hinge loss (support vector machine) or squared loss (regularized least squares classification, RLSC), and $\lambda \|f\|_A^2$ is the ambient space regularization. In order to extract the intrinsic geometrical information of the manifold, one more regularization term needs to be included in the function:

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} V(x_i, y_i, f) + \lambda \|f\|_A^2 + \sum_{i,j=1}^{n} w_{ij} \big(f(x_i) - f(x_j)\big)^2 \qquad (2)$$

The last term in equation (2) penalizes the function in the intrinsic space to exploit the hidden geometrical information in the data:

$$\sum_{i,j=1}^{n} w_{ij} \big(f(x_i) - f(x_j)\big)^2 = f^T L f$$

where $L$ denotes the graph Laplacian of the manifold. Based on the Representer theorem, the optimal function can be obtained from $f = \sum_{i=1}^{n} a_i K(x_i, x)$. Here, $K$ denotes the $n \times n$ positive definite kernel Gram matrix and $a$ is the coefficient vector of the function. In this work, we propose a third order derivative based co-regularization technique to overcome the limitations of second order manifold regularization methods. The rest of this paper is organized as follows: Section II explains our proposed model, Section III presents the extensive experimental results, and Section IV draws the conclusion.

II. THIRD ORDER DERIVATIVE AND GRAPH LAPLACIAN ASSOCIATED REGULARIZATION MODEL

A. Proposed Model

The shortcomings of second order regularization methods lead to the need for a higher order co-regularization that can neutralize the bias of the solution. The third order derivative based technique handles the oscillations and twists in the curvature as well as discards the effect of noise in the data. The second order derivative estimates the change in curvature and gives accurate values on a dense manifold. However, as the neighborhood becomes sparse, the model's performance dips due to its inability to learn a smooth function. Hence, the model prediction remains accurate only as long as the learned candidate function is able to take into account all the neighborhood points at each data instance. Graph Laplacian regularization tends to fail on sparse and rapidly varying manifolds. To handle this limitation, we propose a third order co-regularization term, fused into the existing objective function, that penalizes the function more accurately.

On a smooth Riemannian manifold $\mathcal{M}$, the third order regularizer can be defined as

$$\|f\|_{III}^2 = \int_{\mathcal{M}} \|\nabla_a \nabla_b \nabla_c f\|^2_{T_x\mathcal{M} \otimes T_x\mathcal{M} \otimes T_x\mathcal{M}} \, dV(x) \qquad (3)$$

where $\nabla_a \nabla_b \nabla_c f$ is the third order derivative of $f$ and $dV$ is the volume element, which integrates over all the patches of $\mathcal{M}$. Due to the presence of large amounts of unlabeled data, $\mathcal{M}$ tends to form a single large densely connected structure. In order to extract the underlying hidden information, we need to calculate the tangent space $T_x\mathcal{M}$ at each data instance of $X$. The third derivative of the underlying curvature at each $x_i$ relies on the properties of the manifold $\mathcal{M}$, which are independent of the given coordinate representation. Thus, we need to evaluate an independent coordinate system for each $x_i$. We assume that $\mathcal{M}$ satisfies the basic assumption of local linearity, i.e., $\mathcal{M}$ follows Euclidean properties within a small neighborhood, and a geodesic is the shortest distance between any two points on the manifold. The norm of the third derivative at a point $z$ converges to the Frobenius norm of the third order derivatives of $f$ obtained in the independent coordinates:

$$\|\nabla_a \nabla_b \nabla_c f\|^2_{T_x\mathcal{M} \otimes T_x\mathcal{M} \otimes T_x\mathcal{M}} = \sum_{p,q,r=1}^{u} \left( \frac{\partial^3 f}{\partial x_p \partial x_q \partial x_r} \right)^2 \Bigg|_z \qquad (4)$$

After the tangent space $T_x\mathcal{M}$ calculation, the higher order value for the curvature can be obtained at each point as

$$\left( \frac{\partial^3 f}{\partial x_p \partial x_q \partial x_r} \right)^2 \Bigg|_z = \sum_{i,j=1}^{u} T^{(j)}_{pqr} f(x_i) \qquad (5)$$

Equation (5) is used to calculate the rate of change of the curvature of the given manifold, where $T$ relates the objective function to the third order derivative and can be computed by fitting a third order Taylor expansion of $f$ at each point $x_i$. The third order Taylor polynomial expansion of $f$ is approximated by

$$t_3(x_i) = \sum_{K=0}^{3} \frac{f^{(K)}(0)}{K!} x_i^K = f(0) + f'(0) x_i + \frac{f''(0)}{2!} x_i^2 + \frac{f'''(0)}{3!} x_i^3 \qquad (6)$$

where $t_3(x_i)$ is the third order Taylor approximation, $f(0)$ is a constant value, and $f'$, $f''$, and $f'''$ are the first, second, and third order derivatives of $f$ respectively.
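As a concrete illustration of the local fitting behind equations (5)–(7), the sketch below (ours, with an assumed neighborhood and test function) estimates the third derivative at a point in one dimension by fitting the Taylor coefficients of $t_3$ to neighborhood function values via least squares, then reading off $f'''(x_0) \approx 3!\, c_3$.

```python
import numpy as np

def third_derivative_coeff(x_nbrs, f_nbrs, x0):
    """Fit t_3(x) = c0 + c1*dx + c2*dx^2 + c3*dx^3 (dx = x - x0) by least
    squares over a neighborhood and return the estimate f'''(x0) = 3! * c3."""
    dx = x_nbrs - x0
    Phi = np.vander(dx, N=4, increasing=True)   # monomials 1, dx, dx^2, dx^3
    coeffs, *_ = np.linalg.lstsq(Phi, f_nbrs, rcond=None)
    return 6.0 * coeffs[3]                      # 3! * c3

# Example (assumed test function): f = sin, so f'''(0) = -cos(0) = -1.
x = np.linspace(-0.5, 0.5, 9)
print(third_derivative_coeff(x, np.sin(x), 0.0))  # approximately -1
```

In the multivariate case the same idea applies with the degree-three monomial design matrix $\phi$ of equation (7), whose third order coefficients yield the operator $T$ of equation (5).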


The individual derivative terms of first, second, and third order have different degrees of scope for extracting the manifold's geometrical information. Second order manifold regularization lags because the variations and twists in the manifold, which carry a large chunk of information, are missed. Thus, a third derivative based regularization term in the objective function is needed to handle this concealed information, which otherwise remains unexploited by the existing techniques. The standard least squares method is used to fit the high degree polynomial:

$$\underset{w \in \mathbb{R}^{d}}{\arg\min} \sum_{i=1}^{l} \sum_{j=1}^{k} \Big( \big(f(x_j) - f(x_i)\big) - (\phi w)_j \Big)^2 \qquad (7)$$

Here $f(x)$ contains the function values for the data samples in the neighborhood of $x_i$, and $\phi \in \mathbb{R}^{k \times c}$ is a design matrix with $c = \frac{5(d^2 - 3d + 4)}{2}$. The monomials of $\phi$ for $x_j$ in the neighborhood of $x_i$ are $[x_1, \ldots, x_m, x_1 x_1, x_1 x_2, \ldots, x_m x_m]$. The Frobenius norm of the higher order function can be obtained from

$$\|\nabla_a \nabla_b \nabla_c f\|^2 \approx \sum_{p,q,r=1}^{l} \left( \sum_{\alpha}^{k} R^{(i)}_{p,q,r,\alpha} f_\alpha \right)^2 = \sum_{\alpha,\beta,\gamma=1}^{k} f_\alpha f_\beta f_\gamma J^{(i)}_{\alpha\beta\gamma} \qquad (8)$$

where $J^{(i)}_{\alpha\beta\gamma} = \sum_{p,q,r=1}^{l} J^{(i)}_{p,q,r,\alpha} J^{(i)}_{p,q,r,\beta} J^{(i)}_{p,q,r,\gamma}$. Thus, the higher order derivative at any point $x_i$ is given by the monomials of the three degree polynomial fit within that neighborhood:

$$R(f) = \sum_{i=1}^{l} \sum_{p,q,r=1}^{u} \left( \frac{\partial^3 f}{\partial x_p \partial x_q \partial x_r} \right)^2 \Bigg|_z = \sum_{i=1}^{u} \sum_{\alpha \in N_k(x_i)} \sum_{\beta \in N_k(x_i)} \sum_{\gamma \in N_k(x_i)} f_\alpha f_\beta f_\gamma J^{(i)}_{\alpha\beta\gamma} = f^T J f \qquad (9)$$

With the higher order regularization term incorporated, the existing objective function can be modified as

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} V(x_i, y_i, f) + \lambda \|f\|_A^2 + \gamma \|f\|_I^2 + \mu (f^T J f) \qquad (10)$$

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} V(x_i, y_i, f) + \lambda \|f\|_A^2 + \gamma \|f\|_I^2 + \mu \|f\|_{III}^2 \qquad (11)$$

B. Model with SVM and RLSC classifiers

The support vector machine [17] considers the linear hinge loss for classification, but on its own it cannot appropriately penalize the labeled data points for classification. An SVM can be defined that accommodates the higher order regularization terms in the objective function:

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} \max(1 - y_i f(x_i), 0) + \lambda a^T K a + \gamma a^T K^T L K a + \mu a^T K^T J K a \qquad (12)$$

where $K_{n \times n}$ is the kernel Gram matrix and $a$ is the coefficient vector, whose values reflect both the second order and the third order co-regularization. The higher order regularization method based on the RLSC classifier uses the squared $L_2$ loss. The updated objective function is

$$f^* = \underset{f \in \mathcal{H}_k}{\arg\min} \sum_{i=1}^{l} \|y_i - f(x_i)\|^2 + \lambda a^T K a + \gamma a^T K^T L K a + \mu a^T K^T J K a \qquad (13)$$
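Since every term in equation (13) is quadratic in the coefficient vector $a$, the RLSC variant admits a closed-form solution. The following is a minimal sketch under our own notational assumptions: `E` is the diagonal 0/1 indicator of the labeled points, `K`, `L`, and a matrixized third order penalty `J` are precomputed, and the default $\lambda, \gamma, \mu$ are the values reported later for the HaLT experiment; this is illustrative, not the authors' implementation.

```python
import numpy as np

def rlsc_third_order(K, L, J, y, labeled_idx, lam=0.05, gamma=0.005, mu=0.004):
    """Closed-form coefficients for an eq. (13)-style objective:
        sum over labeled i of (y_i - (Ka)_i)^2
        + lam * a'Ka + gamma * a'K L K a + mu * a'K J K a.
    Setting the gradient to zero and cancelling one factor of K (assumed
    invertible) gives (EK + lam*I + gamma*LK + mu*JK) a = E y."""
    n = K.shape[0]
    E = np.zeros((n, n))
    E[labeled_idx, labeled_idx] = 1.0            # select labeled points
    A = E @ K + lam * np.eye(n) + gamma * (L @ K) + mu * (J @ K)
    return np.linalg.solve(A, E @ y)             # predictions: f = K @ a
```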
III. EXPERIMENTS AND RESULTS

The existing second order manifold regularization techniques cannot handle manifolds with high sparsity and a rapidly varying nature with a high volume of twist in the curvature. The proposed higher order regularization technique overcomes these limitations by accounting for the rate of change of curvature on the underlying manifold. In this section, we perform extensive experiments on both synthetic and real world data to validate our newly proposed technique, which associates the graph Laplacian with third order regularization. The performance of our method is compared with the baseline methods on the accuracy metric. The results show that it is able to outperform earlier methods by extracting the additional information that remains concealed in the manifold due to the limitations of second order methods.

A. Toy Data Set

The proposed technique is demonstrated on a toy data set that comprehensibly illustrates how well it approximates the underlying structure of a 2-d data set. This data set consists of 955 points shaped in a sinusoidal manner. Fig. 2 shows how well our proposed third order based technique generalizes over the data points. This generalization is only possible because the third order term takes into consideration the rate of change of curvature of the underlying manifold. Comparisons with the existing Graph Laplacian and Hessian regularization techniques are shown in fig. 1.
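The paper does not specify how the toy set was generated; the snippet below is a purely hypothetical sketch of a comparable two-class sinusoidal 2-d set (only the point count of 955 is taken from the text) that can be fed to the graph construction sketched in Section I.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 955                                         # size taken from the paper
t = rng.uniform(0.0, 4.0 * np.pi, size=n)
labels = rng.integers(0, 2, size=n)             # two classes (assumed)
X = np.column_stack([t, np.sin(t) + 0.6 * labels])  # offset strip per class
X += rng.normal(scale=0.05, size=X.shape)       # small jitter off the curve
```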


[Fig. 1: Laplacian and Hessian regularization techniques: (a) sinusoidal data set, (b) Laplacian, (c) Hessian]

[Fig. 2: Proposed third order based co-regularization]

B. HaLT Data Set

This is a large EEG motor imagery data set [18] which contains experimental records for five BCI paradigms, including HaLT. HaLT is an extension of the classic 3-state paradigm; it includes left leg, right leg, tongue, left hand, right hand, and passive imagery mental states. Among all the participants, the three HaLT recordings of subject A are used in this experiment. In the data collection stage, each movement was shown as an image on a computer screen for 1 second and the respective EEG readings were saved simultaneously. Each such action consists of approximately 170 frames of micro-volt data. Based on certain predefined data markers, each such 170 × 21 frame block is extracted and reshaped into a single 1 × 3570 vector. Combining all the frames, the final data set comprises a 2408 × 3570 matrix. Dimensionality reduction is performed with PCA on this high volume data set, reducing it to 100 dimensions while retaining ≈ 90% of the data variance. The training and testing data sets were generated by randomly dividing the action data into two halves. The experiment is performed 10 times, each time using 2 labeled instances, which generates a highly generic SSL model.

Our proposed higher order regularization is compared with the other state-of-the-art methods: Graph Laplacian, Hessian regularization [15], and iterative Laplacian $L^m$ [16], [19] with $m = 2$. As shown in fig. 3, the proposed higher order regularization technique outperforms its counterparts by a huge margin for both SVM and RLSC classification. The minimum and maximum mean squared error ($\varsigma$) values for Graph Laplacian, Hessian regularization, iterative Laplacian $L^m$, and the higher order regularization are (36.94, 41.32), (20.48, 38.42), (35.79, 40.99), and (8.60, 23.71) respectively. The RBF kernel is used for all the methods with a kNN value of 6. The values of the tuning parameters $\lambda$, $\gamma$, and $\mu$ are 0.05, 0.005, and 0.004 respectively, obtained by cross validation. The proposed higher order regularization better extracts the underlying geometry of the data manifold, which remains unexploited by the second order techniques, and in turn provides a highly generic function.

[Fig. 3: HaLT data set classification: (a) RLSC unlabeled, (b) RLSC test, (c) SVM unlabeled, (d) SVM test]

C. USPS Data Set

The USPS digit data set [20] is a collection of digits (0–9): digitized images of handwritten postal codes on postcards from the Buffalo, NY post office. It is a highly enriched data set in which different people have contributed the same digits in their own way (writing style). The main attributes of this data are differences in size, thickness, rotation, writing style, and instruments. In the experiment each digit is classified pairwise, i.e., 45 binary classification problems. The data has 7291 instances in total, reduced to 100 dimensions. The data set is divided into two parts, training and testing: 400 instances of each digit are fixed for training and the remaining instances of each digit are kept for testing. The experiment is repeated 20 times; on each turn, 2 labeled instances are randomly picked for model training.
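To make the evaluation protocol concrete, here is a minimal sketch of the repeated random-split loop described above; it reuses the `knn_graph_laplacian` and `rlsc_third_order` sketches from earlier, and the RBF bandwidth, the zero placeholder for the third order penalty matrix, and the helper names are our own assumptions.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def repeated_split_mse(X, y, reps=10, n_labeled=2, seed=0):
    """Average test MSE over random half/half splits with `n_labeled`
    revealed labels per run, mimicking the protocol described above."""
    rng = np.random.default_rng(seed)
    n, errs = X.shape[0], []
    for _ in range(reps):
        perm = rng.permutation(n)
        tr, te = perm[: n // 2], perm[n // 2:]
        lab = rng.choice(len(tr), size=n_labeled, replace=False)
        y_obs = np.zeros(len(tr))
        y_obs[lab] = y[tr][lab]                   # only n_labeled labels seen
        K = rbf(X[tr], X[tr])
        L = knn_graph_laplacian(X[tr], k=6)       # sketch from Section I
        J = np.zeros_like(K)                      # placeholder third order penalty
        a = rlsc_third_order(K, L, J, y_obs, lab) # sketch from Section II-B
        f_te = rbf(X[te], X[tr]) @ a              # out-of-sample prediction
        errs.append(np.mean((y[te] - f_te) ** 2))
    return float(np.mean(errs))
```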


[Fig. 4: USPS data set classification: (a) RLSC unlabeled, (b) RLSC test, (c) SVM unlabeled, (d) SVM test]

[Fig. 5: Isolet data set classification: (a) RLSC unlabeled, (b) RLSC test, (c) SVM unlabeled, (d) SVM test]

The results (fig. 4) show that our proposed higher order regularization performs much better than Hessian regularization and the iterative Laplacian, but fails to overtake the Graph Laplacian. The minimum and maximum mean squared error ($\varsigma$) values for Graph Laplacian, Hessian, iterative Laplacian, and the higher order regularization are (0.00, 13.91), (0.02, 16.20), (0.17, 22.33), and (0.00, 15.60) respectively. The RBF kernel is used for all the methods with a kNN value of 6. The values of the tuning parameters $\lambda$, $\gamma$, and $\mu$ are 0.05, 0.005, and 0.04 respectively, obtained by cross validation. The higher order technique is not able to make a significant difference here, and its average error value is very close to that of the graph Laplacian.

D. Isolet Data Set

This data set was generated with the help of 150 individuals who spoke the name of each English alphabet letter twice, giving 52 training examples from each speaker. The speakers are clustered into sets of 30 speakers each, so that five clusters are formed, termed Isolet1, ..., Isolet5. We used Isolet1 for training and Isolet5 for testing. The training data has 1560 instances with 617 dimensions, whereas the testing set has 1559 instances only. The experiment is performed 20 times, treated as 30 binary classification problems. Classification is performed with both SVM and RLSC, and our proposed higher order regularization performed better in both cases, overtaking its counterpart regularization techniques by a significant margin. Figure 5 clearly shows the validity of our work, in that it classifies better for every instance. The minimum and maximum mean squared error ($\varsigma$) values for Graph Laplacian, Hessian, iterative Laplacian, and the higher order regularization are (13.39, 23.20), (22.28, 32.22), (15.51, 25.99), and (10.74, 22.84) respectively. The RBF kernel is used for all the methods with a kNN value of 6. The values of the tuning parameters $\lambda$, $\gamma$, and $\mu$ are 0.05, 0.005, and 0.07 respectively, obtained by cross validation. Thus, the higher order regularization better extracts the underlying geometry of the manifold, which remains unobserved by the second order regularization techniques that do not take into consideration the twists and sparsity of the underlying manifold.

IV. CONCLUSION

The proposed higher order manifold co-regularization, fused into the objective function along with the graph Laplacian, overcomes the shortcomings of the existing manifold regularization methods by a significant margin in both the SVM and RLSC classification categories. The experimental results on several real world data sets illustrate that the higher order co-regularization learns a better, more generic function by exploiting the rate of change of curvature along with the amount of curvature obtained using the graph Laplacian. The higher order derivative based technique identifies the sharp variations in the function and accurately penalizes them to avoid over-fitting. Compared with the existing state-of-the-art manifold regularization techniques based on the Hessian, the graph Laplacian, and the higher order Laplacian, the proposed method outperforms them by a significant margin.

REFERENCES

[1] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.


[2] X. Zhu and A. B. Goldberg, "Introduction to semi-supervised learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 3, no. 1, pp. 1–130, 2009.
[3] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, no. 1-3, pp. 209–239, 2004.
[4] Y. Song, F. Nie, C. Zhang, and S. Xiang, "A unified framework for semi-supervised dimensionality reduction," Pattern Recognition, vol. 41, no. 9, pp. 2789–2799, 2008.
[5] O. Chapelle, B. Scholkopf, and A. Zien, "Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]," IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 542–542, 2009.
[6] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, no. Nov, pp. 2399–2434, 2006.
[7] ——, "On manifold regularization," in AISTATS, 2005, p. 1.
[8] M. Belkin, I. Matveeva, and P. Niyogi, "Regularization and semi-supervised learning on large graphs," in COLT, vol. 3120. Springer, 2004, pp. 624–638.
[9] S. Ying, Z. Wen, J. Shi, Y. Peng, J. Peng, and H. Qiao, "Manifold preserving: An intrinsic approach for semisupervised distance metric learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 7, pp. 2731–2742, 2017.
[10] H. Guo, Y. Mao, and R. Zhang, "Mixup as locally linear out-of-manifold regularization," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3714–3722.
[11] S. Sun and X. Xie, "Semisupervised support vector machines with tangent space intrinsic manifold regularization," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 9, pp. 1827–1839, 2015.
[12] S. Sun, "Tangent space intrinsic manifold regularization for data representation," in 2013 IEEE China Summit and International Conference on Signal and Information Processing. IEEE, 2013, pp. 179–183.
[13] H. Liu, W. Liu, D. Tao, and Y. Wang, "Laplacian-Hessian regularization for semi-supervised classification," in Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). IEEE, 2014, pp. 203–207.
[14] Z. Guan, J. Peng, and S. Tan, "Manifold ranking using Hessian energy," International Journal of Software and Informatics, vol. 7, no. 3, pp. 391–405, 2013.
[15] K. I. Kim, F. Steinke, and M. Hein, "Semi-supervised regression using Hessian energy with an application to semi-supervised dimensionality reduction," in Advances in Neural Information Processing Systems, pp. 979–987.
[16] X. Zhou and M. Belkin, "Semi-supervised learning by higher order regularization," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 892–900.
[17] S. Melacci and M. Belkin, "Laplacian support vector machines trained in the primal," Journal of Machine Learning Research, vol. 12, no. Mar, pp. 1149–1184, 2011.
[18] M. Kaya, M. K. Binli, E. Ozbay, H. Yanar, and Y. Mishchenko, "A large electroencephalographic motor imagery dataset for electroencephalographic brain computer interfaces," Scientific Data, vol. 5, 2018.
[19] A. Singh and S. Verma, "Graph Laplacian regularization with Procrustes analysis for sensor node localization," IEEE Sensors Journal, vol. 17, no. 16, pp. 5367–5376, 2017.
[20] J. J. Hull, "A database for handwritten text recognition research," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 550–554, 1994.
