
Vector-valued Manifold Regularization

Hà Quang Minh  [email protected]
Italian Institute of Technology, Via Morego 30, Genoa 16163, Italy

Vikas Sindhwani  [email protected]
Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 USA

Abstract

We consider the general problem of learning an unknown functional dependency, f : X → Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005), which elegantly extend familiar scalar-valued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output inter-dependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multi-label image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.

1. Introduction

The statistical and algorithmic study of regression and binary classification problems has formed the bedrock of modern machine learning. Motivated by new applications, data characteristics, and scalability requirements, several generalizations and extensions of these canonical settings have been vigorously pursued in recent years. We point out two particularly dominant threads of research: (1) semi-supervised learning, i.e., learning from unlabeled examples by exploiting the geometric structure of the marginal probability distribution over the input space, and (2) structured multi-output prediction, i.e., learning to simultaneously predict a collection of output variables by exploiting their inter-dependencies. We point the reader to Chapelle et al. (2006) and Bakir et al. (2007) for several representative papers on semi-supervised learning and structured prediction respectively. In this paper, we consider a problem at the intersection of these threads: non-parametric estimation of a vector-valued function, f : X → Y, from labeled and unlabeled examples.

Our starting point is multivariate regression in a regularized least squares (RLS) framework (see, e.g., Brown & Zidek (1980)), which is arguably the classical precursor of much of the modern literature on structured prediction, multi-task learning, multi-label classification and related themes that attempt to exploit output structure. We adopt the formalism of vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) to pose function estimation problems naturally in an RKHS of Y-valued functions, where Y in general can be an infinite-dimensional (Hilbert) space. We derive an abstract system of functional linear equations that gives the solution to a generalized Manifold Regularization (Belkin et al., 2006) framework for vector-valued semi-supervised learning. For multivariate problems with n output variables, the kernel K(·, ·) associated with a vector-valued RKHS is matrix-valued, i.e., for any x, z ∈ X, K(x, z) ∈ R^{n×n}. We show that a natural choice for a matrix-valued kernel leads to a Sylvester equation, whose solution can be obtained relatively efficiently using techniques in numerical linear algebra. This leads to a vector-valued Laplacian Regularized Least Squares (Laplacian RLS) model that learns not only from the geometry of unlabeled data (Belkin et al., 2006) but also from dependencies among output variables estimated using an output graph Laplacian. We find encouraging empirical results with this approach on semi-supervised multi-label classification problems, in comparison to several recently proposed alternatives. We begin this paper with relevant background material on Manifold Regularization and multivariate RLS. Throughout the paper, we draw attention to mathematical correspondences between scalar and vector-valued settings.
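As a concrete illustration of the last point, the sketch below shows how a Sylvester equation AX + XB = C can be solved with an off-the-shelf numerical routine (SciPy's solve_sylvester); the sizes and matrices are random stand-ins chosen only to demonstrate the mechanics, not the specific operators derived later in the paper.

# Minimal sketch: solving a Sylvester equation A X + X B = C numerically.
# All matrices below are illustrative stand-ins.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
N, n = 200, 5                            # e.g., N data points, n outputs
A = rng.standard_normal((N, N))          # N x N coefficient matrix
B = rng.standard_normal((n, n))          # n x n coefficient matrix
C = rng.standard_normal((N, n))          # N x n right-hand side

X = solve_sylvester(A, B, C)             # Bartels-Stewart via Schur decompositions
print(np.allclose(A @ X + X @ B, C))     # True, up to round-off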
2. Background

Let us recall the familiar regression and classification setting where Y = R. Let k : X × X → R be a standard kernel with an associated RKHS of functions H_k. Given a collection of labeled examples, {x_i, y_i}_{i=1}^l, kernel-based prediction methods set up a Tikhonov regularization problem,

f^* = \arg\min_{f \in H_k} \frac{1}{l} \sum_{i=1}^{l} V(y_i, f(x_i)) + \gamma \|f\|_k^2    (1)

where the choice V(t, y) = (t − y)^2 leads to Regularized Least Squares (RLS), while V(t, y) = max(0, 1 − yt) leads to the SVM algorithm. By the classical Representer theorem (Schölkopf & Smola, 2002), this family of algorithms reduces to estimation of finite-dimensional coefficients, a = [α_1, ..., α_l]^T, for a minimizer that can be shown to have the form f^*(x) = Σ_{i=1}^l α_i k(x, x_i). In particular, RLS reduces to solving the linear system [G_k^l + γ l I_l] a = y, where y = [y_1 ... y_l]^T, I_l is the l × l identity matrix and G_k^l denotes the Gram matrix of the kernel over the labeled data, i.e., (G_k^l)_{ij} = k(x_i, x_j). Let us now review two extensions of this algorithm: first for semi-supervised learning, and then for multivariate problems where Y = R^n.

Semi-supervised learning typically proceeds by making assumptions such as smoothness of the prediction function with respect to an underlying low-dimensional data manifold, or presence of clusters, as detected using a relatively large set of u unlabeled examples, {x_i}_{i=l+1}^{l+u}. We will use the notation N = l + u. In Manifold Regularization (Belkin et al., 2006), a nearest neighbor graph, W, is constructed, which serves as a discrete probe for the geometric structure of the data. The Laplacian L of this graph provides a natural intrinsic measure of data-dependent smoothness:

\mathbf{f}^T L \mathbf{f} = \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} (f(x_i) - f(x_j))^2

where \mathbf{f} = [f(x_1) ... f(x_N)]^T. Thus, it is natural to extend (1) as follows,

f^* = \arg\min_{f \in H_k} \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + \gamma_A \|f\|_k^2 + \gamma_I \,\mathbf{f}^T L \mathbf{f}    (2)

where γ_A, γ_I are referred to as the ambient and intrinsic regularization parameters. By using the reproducing properties of H_k, the Representer theorem can correspondingly be extended to show that the minimizer has the form f^*(x) = Σ_{i=1}^N α_i k(x, x_i), involving both labeled and unlabeled data. The Laplacian RLS algorithm estimates a = [α_1 ... α_N]^T by solving the linear system [J_l^N G_k^N + l γ_I L G_k^N + l γ_A I_N] a = y, where G_k^N is the Gram matrix of k with respect to both labeled and unlabeled examples, I_N is the N × N identity matrix, J_l^N is an N × N diagonal matrix whose first l diagonal entries equal 1 and whose remaining entries are 0, and y is the N × 1 label vector with y_i = 0 for i > l. Laplacian RLS and Laplacian SVM tend to give similar empirical performance (Sindhwani et al., 2005).
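To make the scalar construction concrete, the following is a minimal sketch of assembling and solving the Laplacian RLS system [J_l^N G_k^N + l γ_I L G_k^N + l γ_A I_N] a = y; the Gaussian kernel, the symmetric kNN graph, and the parameter values are illustrative choices rather than prescriptions from the paper.

# Minimal sketch of scalar Laplacian RLS with illustrative modeling choices.
import numpy as np

def laplacian_rls(X, y, l, gamma_A=1e-2, gamma_I=1e-2, sigma=1.0, knn=5):
    """Return expansion coefficients a and the Gram matrix G over all N points."""
    N = X.shape[0]                                    # N = l labeled + u unlabeled points
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    G = np.exp(-sq / (2.0 * sigma ** 2))              # Gaussian Gram matrix G_k^N

    # Symmetric kNN adjacency W and unnormalized graph Laplacian L = D - W.
    W = np.zeros((N, N))
    nbrs = np.argsort(sq, axis=1)[:, 1:knn + 1]       # skip self (distance 0)
    for i in range(N):
        W[i, nbrs[i]] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W

    J = np.diag((np.arange(N) < l).astype(float))     # first l diagonal entries are 1
    y_pad = np.concatenate([y[:l], np.zeros(N - l)])  # y_i = 0 for unlabeled points
    M = J @ G + l * gamma_I * (L @ G) + l * gamma_A * np.eye(N)
    a = np.linalg.solve(M, y_pad)
    return a, G

Predictions at the N training/unlabeled points are then G @ a, and at a new point x they follow from f^*(x) = Σ_{i=1}^N a_i k(x, x_i).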
Consider now two natural approaches to extending Laplacian RLS to the multivariate case Y = R^n. Let f = (f_1, ..., f_n) be the components of a vector-valued function, where each f_j ∈ H_k, and let the j-th output label of x_i be denoted by y_ij. Then one formulation of multivariate LapRLS is to solve,

f^* = \arg\min_{f_j \in H_k,\, 1 \le j \le n} \frac{1}{l} \sum_{i=1}^{l} \sum_{j=1}^{n} (y_{ij} - f_j(x_i))^2 + \gamma_A \sum_{j=1}^{n} \|f_j\|_k^2 + \gamma_I \,\mathrm{trace}[F^T L F]    (3)

where F_{ij} = f_j(x_i), 1 ≤ i ≤ N, 1 ≤ j ≤ n. Let α be an N × n matrix of expansion coefficients, i.e., the minimizers have the form f_j(x) = Σ_{i=1}^N α_{ij} k(x_i, x). It is easily seen that the solution is given by,

[J_l^N G_k^N + l \gamma_I L G_k^N + l \gamma_A I_N] \,\alpha = Y    (4)

where Y is the label matrix with Y_{ij} = 0 for i > l and all j. It is clear that this multivariate solution is equivalent to learning each output independently, ignoring prior knowledge such as the availability of a similarity graph W_out over output variables. Such prior knowledge can naturally be incorporated by adding a smoothing term to (3) which, for example, enforces f_i to be close to f_j in the RKHS norm ||·||_k if output i is similar to output j, i.e., if (W_out)_{ij} is sufficiently large. We defer this development to later in the paper, as both of these solutions are special cases of a broader vector-valued RKHS framework for Laplacian RLS in which they correspond to certain choices of a matrix-valued kernel. We first give a self-contained review of the language of vector-valued RKHS in the following section.

3. Vector-Valued RKHS

The study of RKHS has been extended to vector-valued functions and further developed and applied in machine learning (see Carmeli et al. (2006); Micchelli & Pontil (2005); Caponnetto et al. (2008) and references therein). In the following, denote by X a nonempty set, Y a real Hilbert space with inner product (·, ·)_Y, and L(Y) the Banach space of bounded linear operators on Y.

Let Y^X denote the vector space of all functions f : X → Y. A function K : X × X → L(Y) is said to be an operator-valued positive definite kernel if for each pair (x, z) ∈ X × X, K(x, z) ∈ L(Y) is a self-adjoint operator and

\sum_{i,j=1}^{N} (y_i, K(x_i, x_j) y_j)_Y \ge 0    (5)

for every finite set of points {x_i}_{i=1}^N in X and {y_i}_{i=1}^N in Y. Such a kernel induces a space of Y-valued functions in the standard way: for each x ∈ X and y ∈ Y, the function K_x y := K(·, x)y belongs to Y^X, and H_K is the completion of the linear span of {K_x y : x ∈ X, y ∈ Y} under the inner product determined by ⟨K_x y, K_z w⟩_{H_K} = (y, K(x, z)w)_Y, which yields the reproducing property

⟨f(x), y⟩_Y = ⟨f, K_x y⟩_{H_K}  for all f ∈ H_K,    (6)

and ||K_x y||_{H_K}^2 = (y, K(x, x)y)_Y ≤ ||K(x, x)|| ||y||_Y^2, so that K_x is a bounded operator for each x ∈ X. Let K_x^* : H_K → Y be the adjoint operator of K_x; then from (6), we have

f(x) = K_x^* f  for all x ∈ X, f ∈ H_K.    (7)

From this we deduce that for all x ∈ X and all f ∈ H_K,

\|f(x)\|_Y \le \|K_x^*\| \, \|f\|_{H_K} \le \sqrt{\|K(x, x)\|} \, \|f\|_{H_K},

that is, for each x ∈ X the evaluation operator E_x : H_K → Y defined by E_x f = K_x^* f is a bounded linear operator. In particular, if κ = sup_{x∈X} \sqrt{\|K(x, x)\|} < ∞, then ||f||_∞ = sup_{x∈X} ||f(x)||_Y ≤ κ ||f||_{H_K} for all f ∈ H_K. In this paper, we will be concerned with kernels for which κ < ∞.

3.1. Vector-valued Regularized Least Squares

Let Y be a separable Hilbert space.
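As a finite-dimensional illustration of condition (5), one can take Y = R^n and a separable matrix-valued kernel K(x, z) = k(x, z) A, with k a scalar Gaussian kernel and A a positive semi-definite matrix acting on outputs; the sketch below checks (5) numerically on random data. This separable form is only a convenient example, not necessarily the kernel class proposed later in this paper.

# Minimal numerical check of condition (5) for K(x, z) = k(x, z) * A with Y = R^n.
import numpy as np

rng = np.random.default_rng(0)
N, d, n = 30, 4, 3
X = rng.standard_normal((N, d))                 # inputs x_1, ..., x_N
B = rng.standard_normal((n, n))
A = B @ B.T                                     # PSD matrix acting on outputs

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-sq / 2.0)                           # scalar Gram matrix k(x_i, x_j)

Yv = rng.standard_normal((N, n))                # arbitrary y_1, ..., y_N in R^n
# Left-hand side of (5): sum_{i,j} (y_i, K(x_i, x_j) y_j)
lhs = sum(G[i, j] * Yv[i] @ (A @ Yv[j]) for i in range(N) for j in range(N))
print(lhs >= -1e-8)                             # non-negative up to round-off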