Vector-valued Manifold Regularization

Hà Quang Minh [email protected]
Italian Institute of Technology, Via Morego 30, Genoa 16163, Italy

Vikas Sindhwani [email protected]
Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 USA

Abstract

We consider the general problem of learning an unknown functional dependency, f : X → Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in Vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005), which elegantly extend familiar scalar-valued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output inter-dependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multi-label image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.

1. Introduction

The statistical and algorithmic study of regression and binary classification problems has formed the bedrock of modern machine learning. Motivated by new applications, data characteristics, and scalability requirements, several generalizations and extensions of these canonical settings have been vigorously pursued in recent years. We point out two particularly dominant threads of research: (1) semi-supervised learning, i.e., learning from unlabeled examples by exploiting the geometric structure of the marginal probability distribution over the input space, and (2) structured multi-output prediction, i.e., learning to simultaneously predict a collection of output variables by exploiting their inter-dependencies. We point the reader to Chapelle et al. (2006) and Bakir et al. (2007) for several representative papers on semi-supervised learning and structured prediction respectively. In this paper, we consider a problem at the intersection of these threads: non-parametric estimation of a vector-valued function, f : X → Y, from labeled and unlabeled examples.

Our starting point is multivariate regression in a regularized least squares (RLS) framework (see, e.g., Brown & Zidek (1980)), which is arguably the classical precursor of much of the modern literature on structured prediction, multi-task learning, multi-label classification and related themes that attempt to exploit output structure. We adopt the formalism of Vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) to pose function estimation problems naturally in an RKHS of Y-valued functions, where Y in general can be an infinite-dimensional (Hilbert) space. We derive an abstract system of functional linear equations that gives the solution to a generalized Manifold Regularization (Belkin et al., 2006) framework for vector-valued semi-supervised learning. For multivariate problems with n output variables, the kernel K(·, ·) associated with a vector-valued RKHS is matrix-valued, i.e., for any x, z ∈ X, K(x, z) ∈ ℝ^{n×n}. We show that a natural choice for a matrix-valued kernel leads to a Sylvester Equation, whose solution can be obtained relatively efficiently using techniques in numerical linear algebra. This leads to a vector-valued Laplacian Regularized Least Squares (Laplacian RLS) model that learns not only from the geometry of unlabeled data (Belkin et al., 2006) but also from dependencies among output variables estimated using an output graph Laplacian.

We find encouraging empirical results with this approach on semi-supervised multi-label classification problems, in comparison to several recently proposed alternatives. We begin this paper with relevant background material on Manifold Regularization and multivariate RLS. Throughout the paper, we draw attention to mathematical correspondences between scalar and vector-valued settings.

2. Background

Let us recall the familiar regression and classification setting where Y = ℝ. Let k : X × X → ℝ be a standard kernel with an associated RKHS family of functions H_k. Given a collection of labeled examples, {x_i, y_i}_{i=1}^{l}, kernel-based prediction methods set up a problem,

    f* = argmin_{f ∈ H_k} (1/l) Σ_{i=1}^{l} V(y_i, f(x_i)) + γ ||f||_k^2,    (1)

where the choice V(t, y) = (t − y)^2 leads to Regularized Least Squares (RLS) while V(t, y) = max(0, 1 − yt) leads to the SVM algorithm. By the classical Representer theorem (Schölkopf & Smola, 2002), this family of algorithms reduces to estimation of finite-dimensional coefficients, a = [α_1, ..., α_l]^T, for a minimizer that can be shown to have the form f*(x) = Σ_{i=1}^{l} α_i k(x, x_i). In particular, RLS reduces to solving the linear system [G_k^l + γ l I_l] a = y, where y = [y_1 ... y_l]^T, I_l is the l × l identity matrix and G_k^l denotes the Gram matrix of the kernel over the labeled data, i.e., (G_k^l)_{ij} = k(x_i, x_j). Let us now review two extensions of this algorithm: first for semi-supervised learning, and then for multivariate problems where Y = ℝ^n.

Semi-supervised learning typically proceeds by making assumptions such as smoothness of the prediction function with respect to an underlying low-dimensional data manifold, or the presence of clusters as detected using a relatively large set of u unlabeled examples, {x_i}_{i=l+1}^{l+u}. We will use the notation N = l + u. In Manifold Regularization (Belkin et al., 2006), a nearest neighbor graph, W, is constructed, which serves as a discrete probe for the geometric structure of the data. The Laplacian L of this graph provides a natural intrinsic measure of data-dependent smoothness:

    f^T L f = (1/2) Σ_{i,j=1}^{N} W_ij (f(x_i) − f(x_j))^2,

where f = [f(x_1) ... f(x_N)]. Thus, it is natural to extend (1) as follows,

    f* = argmin_{f ∈ H_k} (1/l) Σ_{i=1}^{l} (y_i − f(x_i))^2 + γ_A ||f||_k^2 + γ_I f^T L f,    (2)

where γ_A, γ_I are referred to as ambient and intrinsic regularization parameters. By using reproducing properties of H_k, the Representer theorem can correspondingly be extended to show that the minimizer has the form f*(x) = Σ_{i=1}^{N} α_i k(x, x_i), involving both labeled and unlabeled data. The Laplacian RLS algorithm estimates a = [α_1 ... α_N]^T by solving the linear system [J_l^N G_k^N + lγ_I L G_k^N + lγ_A I_N] a = y, where G_k^N is the Gram matrix of k with respect to both labeled and unlabeled examples, I_N is the N × N identity matrix, J_l^N is an N × N diagonal matrix with the first l diagonal entries equaling 1 and the rest being 0, and y is the N × 1 label vector with y_i = 0 for i > l. Laplacian RLS and Laplacian SVM tend to give similar empirical performance (Sindhwani et al., 2005).

Consider now two natural approaches to extending Laplacian RLS to the multivariate case Y = ℝ^n. Let f = (f_1, ..., f_n) be the components of a vector-valued function where each f_j ∈ H_k. Let the j-th output label of x_i be denoted as y_ij. Then, one formulation for multivariate Laplacian RLS is to solve,

    f* = argmin_{f_j ∈ H_k, 1 ≤ j ≤ n} (1/l) Σ_{i=1}^{l} Σ_{j=1}^{n} (y_ij − f_j(x_i))^2 + γ_A Σ_{j=1}^{n} ||f_j||_k^2 + γ_I trace[F^T L F],    (3)

where F_ij = f_j(x_i), 1 ≤ i ≤ N, 1 ≤ j ≤ n. Let α be an N × n matrix of expansion coefficients, i.e., the minimizers have the form f_j(x) = Σ_{i=1}^{N} α_ij k(x_i, x). It is easily seen that the solution is given by,

    [J_l^N G_k^N + lγ_I L G_k^N + lγ_A I_N] α = Y,    (4)

where Y is the label matrix with Y_ij = 0 for i > l and all j. It is clear that this multivariate solution is equivalent to learning each output independently, ignoring prior knowledge such as the availability of a similarity graph W_out over output variables. Such prior knowledge can naturally be incorporated by adding a smoothing term to (3) which, for example, enforces f_i to be close to f_j in the RKHS norm ||·||_k if output i is similar to output j, i.e., (W_out)_ij is sufficiently large. We defer this development to later in the paper, as both these solutions are special cases of a broader vector-valued RKHS framework for Laplacian RLS in which they correspond to certain choices of a matrix-valued kernel. We first give a self-contained review of the language of vector-valued RKHS in the following section.
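As a concrete illustration, the multivariate Laplacian RLS system (4) can be assembled and solved directly with a dense linear solver. The following minimal NumPy sketch does this; the function name and input conventions are ours, not from the paper.

```python
import numpy as np

def multivariate_laplacian_rls(G, L, Y, l, gamma_A, gamma_I):
    """Solve the multivariate Laplacian RLS system (4):
    (J_l G + l*gamma_I L G + l*gamma_A I) alpha = Y,
    where G is the N x N scalar Gram matrix over labeled + unlabeled points,
    L is the N x N graph Laplacian, Y is the N x n label matrix with zero
    rows for unlabeled points, and l is the number of labeled points."""
    N = G.shape[0]
    J = np.diag([1.0] * l + [0.0] * (N - l))          # selects labeled rows
    A = J @ G + l * gamma_I * (L @ G) + l * gamma_A * np.eye(N)
    return np.linalg.solve(A, Y)                      # N x n expansion coefficients
```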

3. Vector-Valued RKHS

The study of RKHS has been extended to vector-valued functions and further developed and applied in machine learning (see Carmeli et al. (2006); Micchelli & Pontil (2005); Caponnetto et al. (2008) and references therein). In the following, denote by X a nonempty set, Y a real Hilbert space with inner product (·, ·)_Y, and L(Y) the Banach space of bounded linear operators on Y.

Let Y^X denote the vector space of all functions f : X → Y. A function K : X × X → L(Y) is said to be an operator-valued positive definite kernel if for each pair (x, z) ∈ X × X, K(x, z) ∈ L(Y) is a self-adjoint operator and

    Σ_{i,j=1}^{N} (y_i, K(x_i, x_j) y_j)_Y ≥ 0    (5)

for every finite set of points {x_i}_{i=1}^{N} in X and {y_i}_{i=1}^{N} in Y. As in the scalar case, given such a K, there exists a unique Y-valued RKHS H_K with reproducing kernel K. The construction of the space H_K proceeds as follows. For each x ∈ X and y ∈ Y, we form a function K_x y = K(·, x) y ∈ Y^X defined by

    (K_x y)(z) = K(z, x) y   for all z ∈ X.

Consider the set H_0 = span{K_x y | x ∈ X, y ∈ Y} ⊂ Y^X. For f = Σ_{i=1}^{N} K_{x_i} w_i and g = Σ_{i=1}^{N} K_{z_i} y_i in H_0, we define

    (f, g)_{H_K} = Σ_{i,j=1}^{N} (w_i, K(x_i, z_j) y_j)_Y.

Taking the closure of H_0 gives us the Hilbert space H_K. The reproducing property is

    (f(x), y)_Y = (f, K_x y)_{H_K}   for all f ∈ H_K.    (6)

As in the scalar case, applying the Cauchy-Schwarz inequality gives

    |(f(x), y)_Y| ≤ √(||K(x, x)||) ||f||_{H_K} ||y||_Y.

Thus for each x ∈ X and each y ∈ Y, the evaluation operator E_{x|y} : f → (f(x), y)_Y is bounded as a linear operator from H_K to ℝ. As in the scalar case, the converse is true by the Riesz Representation Theorem.

Let K_x : Y → H_K be the linear operator with K_x w defined as above; then

    ||K_x y||^2_{H_K} = (K(x, x) y, y)_Y ≤ ||K(x, x)|| ||y||^2_Y,

which implies that

    ||K_x : Y → H_K|| ≤ √(||K(x, x)||),

so that K_x is a bounded operator for each x ∈ X. Let K_x^* : H_K → Y be the adjoint operator of K_x; then from (6), we have

    f(x) = K_x^* f   for all x ∈ X, f ∈ H_K.    (7)

From this we deduce that for all x ∈ X and all f ∈ H_K,

    ||f(x)||_Y ≤ ||K_x^*|| ||f||_{H_K} ≤ √(||K(x, x)||) ||f||_{H_K},

that is, for each x ∈ X, the evaluation operator E_x : H_K → Y defined by E_x f = K_x^* f is a bounded linear operator. In particular, if κ = sup_{x ∈ X} √(||K(x, x)||) < ∞, then ||f||_∞ = sup_{x ∈ X} ||f(x)||_Y ≤ κ ||f||_{H_K} for all f ∈ H_K. In this paper, we will be concerned with kernels for which κ < ∞.

3.1. Vector-valued Regularized Least Squares

Let Y be a separable Hilbert space. Let z = {(x_i, y_i)}_{i=1}^{m} be the given labeled data. Consider now the vector-valued version of the regularized least squares algorithm in H_K, with γ > 0:

    f_{z,γ} = argmin_{f ∈ H_K} (1/m) Σ_{i=1}^{m} ||f(x_i) − y_i||^2_Y + γ ||f||^2_{H_K}.    (8)

From the Inverse Problems literature, recall the familiar Tikhonov Regularization form in Hilbert spaces: argmin_{x ∈ H_1} ||Ax − b||^2_{H_2} + γ ||x||^2_{H_1}, where H_1, H_2 are Hilbert spaces and A : H_1 → H_2 is a bounded linear operator. To cast (8) into this familiar form, we can consider an approach as in Caponnetto & Vito (2007). First, we set up the sampling operator S_x : H_K → Y^m defined by S_x(f) = (f(x_1), ..., f(x_m)). By definition, we have for any f ∈ H_K and y = (y_1, ..., y_m) ∈ Y^m,

    (S_x f, y)_{Y^m} = Σ_{i=1}^{m} (S_{x_i} f, y_i)_Y = Σ_{i=1}^{m} (f, K_{x_i} y_i)_{H_K} = (f, Σ_{i=1}^{m} K_{x_i} y_i)_{H_K}.

It follows that the adjoint operator S_x^* : Y^m → H_K is given by S_x^* y = S_x^*(y_1, ..., y_m) = Σ_{i=1}^{m} K_{x_i} y_i, and the operator S_x^* S_x : H_K → H_K is given by S_x^* S_x f = Σ_{i=1}^{m} K_{x_i} f(x_i).

We can now cast expression (8) into the familiar Tikhonov form,

    f_{z,γ} = argmin_{f ∈ H_K} (1/m) ||S_x f − y||^2_{Y^m} + γ ||f||^2_{H_K}.

This problem has a unique solution, given by

    f_{z,γ} = (S_x^* S_x + mγ I)^{−1} S_x^* y.    (9)

This function has the explicit form f_{z,γ} = Σ_{i=1}^{m} K_{x_i} a_i, with f_{z,γ}(x) = Σ_{i=1}^{m} K(x, x_i) a_i, where the vectors a_i ∈ Y satisfy the m linear equations

    Σ_{j=1}^{m} K(x_i, x_j) a_j + mγ a_i = y_i,    (10)

for 1 ≤ i ≤ m. This result was first reported in Micchelli & Pontil (2005). It is a special case of our Proposition 1 below.
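When Y = ℝ^n is finite dimensional, (10) is an ordinary linear system in the stacked coefficients. The sketch below (the block-stacking convention and function name are ours, added for illustration) assembles the mn × mn block Gram matrix and solves it.

```python
import numpy as np

def vector_valued_rls(K_blocks, y, gamma):
    """Solve system (10) when Y = R^n: stack the m x m grid of n x n kernel
    blocks K(x_i, x_j) into an (mn) x (mn) Gram matrix G and solve
    (G + m*gamma*I) a = y for the stacked coefficients.
    K_blocks: array of shape (m, m, n, n); y: array of shape (m, n)."""
    m, _, n, _ = K_blocks.shape
    G = K_blocks.transpose(0, 2, 1, 3).reshape(m * n, m * n)   # block Gram matrix
    a = np.linalg.solve(G + m * gamma * np.eye(m * n), y.reshape(m * n))
    return a.reshape(m, n)   # row i is the coefficient vector a_i in R^n
```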

In the scalar case Y = ℝ, it specializes to the well-known Regularized Least Squares solution reviewed in Section 2. Note that in the finite-dimensional case Y = ℝ^n, the kernel function is matrix-valued; unless it is diagonal, outputs are not treated independently.

4. Vector-valued Manifold Regularization

Let z = {(x_i, y_i)}_{i=1}^{l} ∪ {x_i}_{i=l+1}^{u+l} be the given set of labeled and unlabeled data. Let M : Y^{u+l} → Y^{u+l}, M ∈ L(Y^{u+l}), be a symmetric, positive operator, that is, (y, My)_{Y^{u+l}} ≥ 0 for all y ∈ Y^{u+l}. Here Y^{u+l} is the usual (u+l)-fold direct product of Y, with inner product

    ((y_1, ..., y_{u+l}), (w_1, ..., w_{u+l}))_{Y^{u+l}} = Σ_{i=1}^{u+l} (y_i, w_i)_Y.

For f ∈ H_K, let f = (f(x_1), ..., f(x_{u+l})) ∈ Y^{u+l}. Consider the following optimization problem

    f_{z,γ} = argmin_{f ∈ H_K} (1/l) Σ_{i=1}^{l} V(f(x_i), y_i) + γ_A ||f||^2_K + γ_I (f, Mf)_{Y^{u+l}},    (11)

where V is some convex error function, and γ_A, γ_I > 0 are regularization parameters.

4.1. Representer Theorem

Theorem 1. The minimization problem (11) has a unique solution, given by f_{z,γ} = Σ_{i=1}^{u+l} K_{x_i} a_i for some vectors a_i ∈ Y, 1 ≤ i ≤ u + l.

Proof. Denote the right-hand side of (11) by I_l(f). Then I_l(f) is coercive and strictly convex in f, and thus has a unique minimizer. Let H_{K,x} = {Σ_{i=1}^{u+l} K_{x_i} y_i : y ∈ Y^{u+l}}. For f ∈ H_{K,x}^⊥, by the reproducing property, the sampling operator S_x satisfies

    (S_x f, y)_{Y^{u+l}} = (f, Σ_{i=1}^{u+l} K_{x_i} y_i)_{H_K} = 0.

This holds true for all y ∈ Y^{u+l}, thus

    S_x f = (f(x_1), ..., f(x_{u+l})) = 0.

For an arbitrary f ∈ H_K, consider the orthogonal decomposition f = f_0 + f_1, with f_0 ∈ H_{K,x}, f_1 ∈ H_{K,x}^⊥. Then, because ||f_0 + f_1||^2_{H_K} = ||f_0||^2_{H_K} + ||f_1||^2_{H_K}, the result just obtained shows that

    I_l(f) = I_l(f_0 + f_1) ≥ I_l(f_0),

with equality if and only if ||f_1||_{H_K} = 0, that is f_1 = 0. Thus the minimizer of (11) must lie in H_{K,x}.

4.2. Vector-valued Laplacian RLS

For manifold regularization of vector-valued functions using the least squares error, the minimization problem is

    f_{z,γ} = argmin_{f ∈ H_K} (1/l) Σ_{i=1}^{l} ||f(x_i) − y_i||^2_Y + γ_A ||f||^2_K + γ_I (f, Mf)_{Y^{u+l}},    (12)

for some γ_A > 0, γ_I > 0. The operator M : Y^{u+l} → Y^{u+l} can be expressed as an operator-valued matrix M = (M_ij)_{i,j=1}^{u+l} of size (u+l) × (u+l), with each M_ij : Y → Y being a linear operator, so that (Mf)_i = Σ_{j=1}^{u+l} M_ij f_j = Σ_{j=1}^{u+l} M_ij f(x_j).

Proposition 1. The minimization problem (12) has a unique solution f_{z,γ} = Σ_{i=1}^{u+l} K_{x_i} a_i, where the vectors a_i ∈ Y satisfy the u + l linear equations:

    Σ_{j=1}^{u+l} K(x_i, x_j) a_j + lγ_I Σ_{j,k=1}^{u+l} M_ik K(x_k, x_j) a_j + lγ_A a_i = y_i,    (13)

for 1 ≤ i ≤ l, and

    γ_I Σ_{j,k=1}^{u+l} M_ik K(x_k, x_j) a_j + γ_A a_i = 0,    (14)

for l + 1 ≤ i ≤ u + l.

Remark 1. We emphasize that in general, the vectors a_i can be infinite-dimensional, being elements of Y. Thus (13) and (14) are in general linear systems of functional equations. The finite-dimensional case, where we have a system of linear equations, is discussed below.

Proof. We know that (12) has a unique solution by Theorem 1. Using the sampling operator S_x, we write

    f_{z,γ} = argmin_{f ∈ H_K} (1/l) ||S_{x,l} f − y||^2_{Y^l} + γ_A ||f||^2_K + γ_I (S_{x,u+l} f, M S_{x,u+l} f)_{Y^{u+l}}.

Differentiating gives

    f_{z,γ} = (S^*_{x,l} S_{x,l} + lγ_A I + lγ_I S^*_{x,u+l} M S_{x,u+l})^{−1} S^*_{x,l} y.

This is equivalent to

    (S^*_{x,l} S_{x,l} + lγ_A I + lγ_I S^*_{x,u+l} M S_{x,u+l}) f_{z,γ} = S^*_{x,l} y.

By definition of the sampling operators, this is

    Σ_{i=1}^{l} K_{x_i} f_{z,γ}(x_i) + lγ_A f_{z,γ} + lγ_I Σ_{i=1}^{u+l} K_{x_i} (M f_{z,γ})_i = Σ_{i=1}^{l} K_{x_i} y_i,

which we rewrite as

    f_{z,γ} = −(γ_I / γ_A) Σ_{i=1}^{u+l} K_{x_i} (M f_{z,γ})_i + Σ_{i=1}^{l} K_{x_i} (y_i − f_{z,γ}(x_i)) / (lγ_A).

This shows that there are vectors a_i in Y such that f_{z,γ} = Σ_{i=1}^{u+l} K_{x_i} a_i. We have f_{z,γ}(x_i) = Σ_{j=1}^{u+l} K(x_i, x_j) a_j, and

    (M f_{z,γ})_i = Σ_{k=1}^{u+l} M_ik Σ_{j=1}^{u+l} K(x_k, x_j) a_j.

Thus we have, for 1 ≤ i ≤ l:

    a_i = −(γ_I / γ_A) Σ_{k=1}^{u+l} M_ik Σ_{j=1}^{u+l} K(x_k, x_j) a_j + (y_i − Σ_{j=1}^{u+l} K(x_i, x_j) a_j) / (lγ_A),

which gives formula (13). Similarly, a_i = −(γ_I / γ_A) Σ_{j,k=1}^{u+l} M_ik K(x_k, x_j) a_j for l + 1 ≤ i ≤ u + l, implying (14). This completes the proof.

Example 1. For Y = ℝ, and choosing M to be the graph Laplacian, it can be easily verified that (13), (14) specialize to (2). This gives the scalar case for manifold regularization as obtained in Belkin et al. (2006).

Example 2. For Y = ℝ^n (or any n-dimensional inner product space), the kernel K(x, z) is an n × n matrix. Let G be the Nn × Nn block matrix whose (i, j) block is the n × n matrix K(x_i, x_j), where N = l + u. Let a = (a_1, ..., a_{u+l}) and y = (y_1, ..., y_{u+l}) be column vectors in ℝ^{n(u+l)}, with each a_i, y_i ∈ ℝ^n, and y_{l+1} = ... = y_{u+l} = 0. Then the system of linear equations (13), (14) becomes

    [J_nl^{Nn} G + lγ_I M G + lγ_A I_{Nn}] a = y,    (15)

where M now is a positive semidefinite matrix of size n(u+l) × n(u+l) and J_nl^{Nn} = diag(1, ..., 1, 0, ..., 0) is the Nn × Nn diagonal matrix whose first nl diagonal entries are 1 and the rest 0.

4.3. Vector-valued graph Laplacian

Let G = (V, E) be an undirected graph of size |V| = N, with symmetric, nonnegative weight matrix W. For an ℝ^n-valued function f = (f_1, ..., f_n), we define ∆f = (Lf_1, ..., Lf_n), where L is the scalar graph Laplacian. Thus one can set ∆ to be a block diagonal matrix of size nN × nN, with each block (i, i) being the scalar graph Laplacian of size N × N. For f = (f_k)_{k=1}^{n}, where f_k = (f_k(x_1), ..., f_k(x_N)), we have

    (f, ∆f)_{ℝ^{nN}} = Σ_{k=1}^{n} (f_k, L f_k)_{ℝ^N} = (1/2) Σ_{i,j=1}^{N} W_ij ||f(x_i) − f(x_j)||^2_{ℝ^n}.

It is clear that here we have ∆ = I_n ⊗ L, where ⊗ denotes the Kronecker (tensor) matrix product (compare with the examples of the last section). It is straightforward to generalize to Y-valued function spaces, where Y is any separable Hilbert space, by defining (∆f)_i = Σ_{j=1}^{N} W_ij (f(x_i) − f(x_j)), and

    (f, ∆f)_{Y^N} = (1/2) Σ_{i,j=1}^{N} W_ij ||f(x_i) − f(x_j)||^2_Y.
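As a quick numerical sanity check (not from the paper; the helper name is ours), the identity (f, ∆f) = ½ Σ_ij W_ij ||f(x_i) − f(x_j)||^2 = trace(F^T L F), with ∆ = I_n ⊗ L and F the N × n matrix of function values, can be verified as follows.

```python
import numpy as np

def laplacian_penalty(F, W):
    """Evaluate (f, Delta f) = 0.5 * sum_{i,j} W_ij ||f(x_i) - f(x_j)||^2 for
    an N x n matrix F of function values (row i is f(x_i)) and a symmetric
    N x N weight matrix W.  With L = D - W and Delta = kron(I_n, L), this
    equals trace(F' L F)."""
    L = np.diag(W.sum(axis=1)) - W                      # scalar graph Laplacian
    sq_dists = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)
    direct = 0.5 * np.sum(W * sq_dists)                 # pairwise definition
    via_trace = np.trace(F.T @ L @ F)                   # same quantity via L
    assert np.allclose(direct, via_trace)
    return via_trace
```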
Example 3. Using the notation of Section 4.2, let M = L ⊗ I_n and K(x, z) = k(x, z) I_n, where k(x, z) is a scalar kernel and I_n is the n × n identity matrix. Then (15) reduces to multivariate Laplacian RLS (3), which is equivalent to n independent scalar Laplacian RLS solutions.

5. Matrix-valued Kernels and Numerical Implementation

Proposition 1 provides general functional solutions to Vector-valued Manifold Regularization. We now focus our attention on finite-dimensional output spaces Y = ℝ^n and in particular on solving Equation (15). We address two questions: what is an appropriate choice for a matrix-valued kernel, and how can (15) be solved more efficiently than the prohibitive O(N^3 n^3) complexity of a general dense linear system solver. For a list of vector-valued kernels considered so far in the literature, see Micchelli & Pontil (2005); Caponnetto et al. (2008); Baldassarre et al. (2010). For practical applications, in this paper we consider matrix-valued kernels of the form,

    K(x_i, x_j) = k(x_i, x_j) [γ_O L_out^† + (1 − γ_O) I_n],    (16)

where k(·, ·) is a scalar-valued kernel, 0 ≤ γ_O < 1 is a real-valued parameter, I_n is the n × n identity matrix and L_out^† is the pseudo-inverse of the normalized Laplacian matrix, L_out, of a similarity graph W_out over outputs. We assume that W_out is either available as prior knowledge or, as in Section 6, estimated from nearest neighbor graphs over the available labels.
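For instance, the output factor of the kernel (16) could be assembled as in the sketch below. This is an illustrative recipe, not the paper's code; the normalized-Laplacian convention and the small guard against isolated nodes are our assumptions.

```python
import numpy as np

def output_kernel(W_out, gamma_O):
    """Build Q = gamma_O * pinv(L_out) + (1 - gamma_O) * I_n from an n x n
    output similarity matrix W_out, where L_out is the normalized graph
    Laplacian, as in the kernel choice (16)."""
    d = W_out.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))   # guard zero degrees
    L_out = np.eye(len(d)) - D_inv_sqrt @ W_out @ D_inv_sqrt    # normalized Laplacian
    return gamma_O * np.linalg.pinv(L_out) + (1 - gamma_O) * np.eye(len(d))

# The Gram matrix of the matrix-valued kernel over N inputs is then the
# Kronecker product np.kron(G_k, Q), with G_k the scalar Gram matrix of k.
```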

We note the following properties of this kernel that justify its choice in practical problems.

Universality: Let X be a Hausdorff topological space, 0 ≤ γ_O < 1, and let k(·, ·) be a universal scalar-valued kernel, e.g., the Gaussian kernel k(x, z) = exp(−||x − z||^2 / (2σ^2)). Then H_K is universal, i.e., any continuous function from a compact subset of X to Y can be uniformly approximated by functions in H_K. This observation follows from Theorem 12 of Caponnetto et al. (2008).

Regularization: Let f = (f_1, ..., f_n) be a vector-valued function such that each f_i ∈ H_k. When γ_O = 1, the norm of a vector-valued function in the RKHS induced by (16) is given by (Baldassarre et al., 2010),

    ||f||^2_K = (1/2) Σ_{i,j=1}^{n} ||f_i − f_j||^2_k (W_out)_ij / √(d_i d_j) + Σ_{i=1}^{n} ||f_i||^2_k,

where W_out is the adjacency matrix of a similarity graph over outputs, d_i = Σ_j (W_out)_ij, and ||·||_k is the norm in the scalar-valued RKHS induced by k(·, ·). When γ_O = 0, K becomes diagonal and reduces to multivariate Laplacian RLS, Eq. (3), where each output is estimated independently of the others. Thus, γ_O parameterizes the degree to which output inter-dependencies are enforced in the model.

Computational Consequences: The Gram matrix of the vector-valued kernel is the Kronecker product, G = G_k^N ⊗ Q, where G_k^N is the Gram matrix of the scalar kernel over labeled and unlabeled data and Q = γ_O L_out^† + (1 − γ_O) I_n. Using M = L ⊗ I_n as in Section 4.3, the linear system (15) becomes the following,

    [J_nl^{Nn} (G_k^N ⊗ Q) + lγ_I (L ⊗ I_n)(G_k^N ⊗ Q) + lγ_A I_{Nn}] a = y.

Applying only basic properties of Kronecker products, it can be shown that this is identical to solving the following Sylvester Equation for the matrix A, where a = vec(A^T),

    −(1/(lγ_A)) (J_l^N G_k^N + lγ_I L G_k^N) A Q − A + (1/(lγ_A)) Y = 0.    (17)

This Sylvester equation can be solved much more efficiently than directly calling a dense linear solver for (15). The number of floating point operations of a popular implementation (Golub et al., 1979) is (5/3)N^3 + 10n^3 + 5N^2 n + 2.5Nn^2. Note that this complexity is of the same order as that of (non-linear) RLS with N (or n) labeled data points. Larger-scale iterative solvers are also available for such problems. (In Matlab, X = dlyap(A, B, C) solves AXB − X + C = 0.)
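One way to exploit this structure, sketched below under the assumption that Q is symmetric positive semidefinite, is to diagonalize the n × n matrix Q and decouple (17) into n independent N × N linear systems. This is an alternative to the Hessenberg-Schur routine cited above, not the paper's implementation; the function name and argument layout are ours.

```python
import numpy as np

def vvmr_solve(G_k, L_in, Q, Y, l, gamma_A, gamma_I):
    """Solve equation (17) in the equivalent form
        (J_l G_k + l*gamma_I L G_k) A Q + l*gamma_A A = Y
    by diagonalizing Q = U diag(lam) U'.  With B = A U the equation decouples
    into n systems (lam_j * M + l*gamma_A * I) B[:, j] = (Y U)[:, j].
    G_k: N x N scalar Gram matrix; L_in: N x N input graph Laplacian;
    Q: n x n output kernel; Y: N x n label matrix (zero rows for unlabeled)."""
    N = G_k.shape[0]
    J = np.diag([1.0] * l + [0.0] * (N - l))
    M = J @ G_k + l * gamma_I * (L_in @ G_k)
    lam, U = np.linalg.eigh(Q)                       # Q symmetric PSD
    YU = Y @ U
    B = np.column_stack([
        np.linalg.solve(lam[j] * M + l * gamma_A * np.eye(N), YU[:, j])
        for j in range(Q.shape[0])
    ])
    return B @ U.T                                   # N x n expansion coefficients A
```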

6. Empirical Studies

We consider semantic scene annotation and hierarchical text categorization as empirical testbeds for Vector-valued Manifold Regularization (abbreviated VVMR), implemented by solving (17). To the best of our knowledge, these are the first applications of vector-valued RKHS methods to multi-label learning problems. We empirically explore the value of incorporating unlabeled data in conjunction with dependencies across multiple output variables. A study is conducted to evaluate the quality of out-of-sample extension to unseen test data in comparison to transductive performance on the unlabeled set, and a comparison is presented against several recent state-of-the-art multi-label learning methods. For our performance evaluation metric, we compute the mean of the area under the ROC curve over all labels, as also used in Sun et al. (2011).

Semantic Scene Annotation: The data for this problem domain is drawn from Boutell et al. (2004). The task is to annotate images of natural scenes with 6 semantic categories: Beach, Sunset, Fall foliage, Field, Mountain and Urban, given a 294-dimensional feature representation of color images. Thus each image x may be associated with a multi-label y ∈ ℝ^6, where y_i = 1 indicates presence and y_i = −1 indicates absence of the i-th semantic category. We split the 2407 images available in this dataset into a training set (labeled + unlabeled) of 1204 images and a test set of 1203 images. All results reported in this section are averaged over 10 random choices of 100 labels, treating the remaining training images as unlabeled data.

RCV1 - Hierarchical Text Categorization: The data for this problem domain is drawn from Lewis et al. (2004). The task is to place Reuters news-stories into a hierarchy of categories spanning various Corporate, Government, Markets and Economics related sub-topics. We used a subset of 6000 documents split evenly between a training (labeled + unlabeled) and a test set, available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html. We eliminated words with small document and category frequencies, resulting in a 2900-dimensional dataset with 25 categories. All results reported in this section are averaged over 10 random choices of 50 labels, treating the rest of the training set as unlabeled data.

Hyperparameters: Note that model selection in semi-supervised settings is a challenging problem in its own right, and not the focus of this paper. For scene annotation, we set γ_A = 0.0001 and used RBF scalar kernels with σ = 4.3, a default value corresponding to the median of pairwise distances amongst training datapoints. We use 5 and 2 nearest neighbor graphs to construct the input and the output normalized Graph Laplacians respectively. For RCV1, we use linear scalar kernels, which are the most popular choice for text datasets. We set γ_A = 1 and used 50 and 20 nearest neighbor graphs to construct the input (iterated to degree 5) and the output normalized Graph Laplacians respectively. The fixed choices of some of these hyperparameters are based on values reported in Sindhwani et al. (2005) for image and text datasets. No further optimization of these values was attempted.
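The nearest-neighbor graphs and normalized Laplacians mentioned above can be constructed in a few lines. The sketch below uses a symmetrized binary k-NN graph, which is a common convention we adopt for illustration and not necessarily the exact construction used in these experiments.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_graph_laplacian(X, n_neighbors):
    """Symmetrized binary k-nearest-neighbor graph and its normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2} for a data matrix X (rows = points)."""
    dists = cdist(X, X)                              # pairwise Euclidean distances
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(dists[i])[1:n_neighbors + 1] # skip the point itself
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)                           # symmetrize
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(N) - D_inv_sqrt @ W @ D_inv_sqrt
```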

[Figure 1 (plots omitted): Mean AUC on the unlabeled set as a function of γ_I ∈ {0, 1e−5, 1e−4, 1e−3, 1e−2, 1e−1}, with curves ranging from γ_O = 0.0 to γ_O = 1.0. Caption: Influence of unlabeled data (γ_I) and output dependencies (γ_O): scene (top) and text (bottom) datasets.]

Performance Enhancements with Unlabeled Data and Output Dependencies: In Figure 1, we evaluate Vector-valued Manifold Regularization in predicting the unknown multi-labels of the unlabeled set as a function of γ_I and γ_O. As the influence of unlabeled data is increased (γ_I taken steadily from 0 to 0.1), on both datasets we see a significant improvement in performance for any fixed value of γ_O. This entire performance curve shows a consistent lift as γ_O is steadily increased from 0 to 1. These results show the benefit of incorporating both the data geometry and the output dependencies in a single model. On RCV1, unlabeled data shows greater relative benefit when output dependencies are weakly enforced, suggesting that it also provides robustness against imprecise model selection. Note that the special case γ_I = 0, γ_O > 0 corresponds to vector-valued RLS, while γ_I > 0, γ_O = 0 corresponds to multivariate Laplacian RLS. In general, vector-valued manifold regularization outperforms these extremes.

[Figure 2 (plots omitted): scatter of AUCs on Test Data against AUCs on Unlabeled Data. Caption: Transductive and Inductive AUCs across outputs (colored) on scene (top) and text (bottom) datasets.]

Out-of-sample Generalization: In Figure 2, we show a scatter plot of performance on the unseen test set (induction) versus the performance on unlabeled data (transduction) for each of the different labels across the 10 random choices of labeled data. In nearly all cases, we see high-quality out-of-sample generalization from the unlabeled set to the test set. Purely transductive semi-supervised multilabel methods (Chen et al., 2008; Liu et al., 2006) do not naturally provide an out-of-sample extension.

Comparisons: We compared VVMR, i.e., solving (17), with several recently proposed supervised and semi-supervised multilabel learning methods: (1) KCCA (Sun et al., 2011): a kernelized version of Canonical Correlation Analysis applied to labeled input and output data; its ridge parameter was optimized over {0, 10^k, −3 ≤ k ≤ 3}; (2) M3L (Hariharan et al., 2010): a large-scale margin-based method for multi-label learning; its C parameter was optimized in the range 10^k, −3 ≤ k ≤ 3;

(3) SMSE2 (Chen et al., 2008): a transductive method that enforces smoothness with respect to input and output graphs; its parameters (analogous to γ_A, γ_O) were optimized over the range reported in Chen et al. (2008); and (4) CNMF (Liu et al., 2006): a constrained non-negative matrix factorization approach that also transductively exploits a given input and output similarity structure; its parameters were also optimized over the ranges reported in Liu et al. (2006). Note that these semi-supervised methods do not provide an out-of-sample extension as our method naturally can. From Table 1, we see that VVMR provides statistically significant improvements (paired t-test at p = 0.05) over each of these alternatives in predicting the labels of unlabeled data. Exactly the same choices of input-output similarity graphs and scalar kernel functions were used across the different methods in this comparison.

Table 1. Comparison with competing algorithms (bold indicates statistically significant improvements).

    Dataset   Scene        RCV1
    KCCA      84.0 (0.5)   80.0 (2.3)
    M3L       81.7 (0.8)   84.3 (1.5)
    SMSE2     55.4 (0.5)   52.5 (0.7)
    CNMF      52.7 (5.5)   50.8 (1.4)
    VVMR      84.9 (2.3)   85.2 (1.4)

7. Conclusion

We consider this paper as a preliminary foray into more widespread adoption of the vector-valued RKHS formalism for structured semi-supervised problems. Our results confirm that output dependencies and data geometry can both be exploited in a complementary fashion. Our choice of the kernel was dictated by certain natural regularization properties and computational tractability. The theoretical, algorithmic and empirical study of a wider class of operator-valued kernels offers a rich avenue for future work.

References

Bakir, G.H., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B., and Vishwanathan, S.V.N. (Eds.). Predicting Structured Data. MIT Press, 2007.

Baldassarre, L., Rosasco, L., Barla, A., and Verri, A. Vector-field learning with spectral filtering. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2010.

Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006.

Boutell, M.R., Luo, J., Shen, X., and Brown, C.M. Learning multi-label scene classification. Pattern Recognition, 2004.

Brown, P.J. and Zidek, J.V. On adaptive multivariate regression. Annals of Statistics, 1980.

Caponnetto, A. and De Vito, E. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.

Caponnetto, A., Pontil, M., Micchelli, C., and Ying, Y. Universal multi-task kernels. Journal of Machine Learning Research, 9:1615–1646, 2008.

Carmeli, C., De Vito, E., and Toigo, A. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4:377–408, 2006.

Chapelle, O., Schölkopf, B., and Zien, A. (Eds.). Semi-supervised Learning. MIT Press, 2006.

Chen, G., Song, Y., Wang, F., and Zhang, C. Semi-supervised multi-label learning by solving a Sylvester equation. In SIAM Conference on Data Mining (SDM), 2008.

Golub, G.H., Nash, S., and Van Loan, C.F. A Hessenberg-Schur method for the problem AX + XB = C. IEEE Transactions on Automatic Control, AC-24:909–913, 1979.

Hariharan, B., Zelnik-Manor, L., Vishwanathan, S.V.N., and Varma, M. Large scale max-margin multi-label classification with priors. In Proceedings of the International Conference on Machine Learning, 2010.

Lewis, D.D., Yang, Y., and Li, F. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

Liu, Y., Jin, R., and Yang, L. Semi-supervised multi-label learning by constrained non-negative matrix factorization. In The Twentieth Conference on Artificial Intelligence (AAAI), 2006.

Micchelli, C.A. and Pontil, M. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.

Schölkopf, B. and Smola, A. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, 2002.

Sindhwani, V., Niyogi, P., and Belkin, M. Beyond the point cloud: from transductive to semi-supervised learning. In Proceedings of the International Conference on Machine Learning, 2005.

Sun, L., Ji, S., and Ye, J. Canonical correlation analysis for multilabel classification: A least squares formulation, extensions and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:194–200, 2011.