Vector-Valued Manifold Regularization

Hà Quang Minh ([email protected])
Italian Institute of Technology, Via Morego 30, Genoa 16163, Italy

Vikas Sindhwani ([email protected])
Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

Abstract

We consider the general problem of learning an unknown functional dependency, $f: \mathcal{X} \to \mathcal{Y}$, between a structured input space $\mathcal{X}$ and a structured output space $\mathcal{Y}$, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005), which elegantly extend familiar scalar-valued kernel methods to the general setting where $\mathcal{Y}$ has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output inter-dependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multi-label image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.

1. Introduction

The statistical and algorithmic study of regression and binary classification problems has formed the bedrock of modern machine learning. Motivated by new applications, data characteristics, and scalability requirements, several generalizations and extensions of these canonical settings have been vigorously pursued in recent years. We point out two particularly dominant threads of research: (1) semi-supervised learning, i.e., learning from unlabeled examples by exploiting the geometric structure of the marginal probability distribution over the input space, and (2) structured multi-output prediction, i.e., learning to simultaneously predict a collection of output variables by exploiting their inter-dependencies. We point the reader to Chapelle et al. (2006) and Bakir et al. (2007) for several representative papers on semi-supervised learning and structured prediction respectively. In this paper, we consider a problem at the intersection of these threads: non-parametric estimation of a vector-valued function, $f: \mathcal{X} \to \mathcal{Y}$, from labeled and unlabeled examples.

Our starting point is multivariate regression in a regularized least squares (RLS) framework (see, e.g., Brown & Zidek (1980)), which is arguably the classical precursor of much of the modern literature on structured prediction, multi-task learning, multi-label classification, and related themes that attempt to exploit output structure. We adopt the formalism of vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) to pose function estimation problems naturally in an RKHS of $\mathcal{Y}$-valued functions, where $\mathcal{Y}$ in general can be an infinite-dimensional (Hilbert) space. We derive an abstract system of functional linear equations that gives the solution to a generalized Manifold Regularization (Belkin et al., 2006) framework for vector-valued semi-supervised learning. For multivariate problems with $n$ output variables, the kernel $K(\cdot,\cdot)$ associated with a vector-valued RKHS is matrix-valued, i.e., for any $x, z \in \mathcal{X}$, $K(x,z) \in \mathbb{R}^{n \times n}$. We show that a natural choice for a matrix-valued kernel leads to a Sylvester equation, whose solution can be obtained relatively efficiently using techniques from numerical linear algebra. This leads to a vector-valued Laplacian Regularized Least Squares (Laplacian RLS) model that learns not only from the geometry of unlabeled data (Belkin et al., 2006) but also from dependencies among output variables estimated using an output graph Laplacian. We find encouraging empirical results with this approach on semi-supervised multi-label classification problems, in comparison to several recently proposed alternatives. We begin this paper with relevant background material on Manifold Regularization and multivariate RLS. Throughout the paper, we draw attention to mathematical correspondences between scalar and vector-valued settings.

2. Background

Let us recall the familiar regression and classification setting where $\mathcal{Y} = \mathbb{R}$. Let $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a standard kernel with an associated RKHS family of functions $\mathcal{H}_k$. Given a collection of labeled examples, $\{x_i, y_i\}_{i=1}^{l}$, kernel-based prediction methods set up a Tikhonov regularization problem,

$$f^\star = \arg\min_{f \in \mathcal{H}_k} \frac{1}{l} \sum_{i=1}^{l} V(y_i, f(x_i)) + \gamma \|f\|_k^2 \qquad (1)$$

where the choice $V(t,y) = (t-y)^2$ leads to Regularized Least Squares (RLS), while $V(t,y) = \max(0, 1-yt)$ leads to the SVM algorithm. By the classical Representer theorem (Schölkopf & Smola, 2002), this family of algorithms reduces to the estimation of finite-dimensional coefficients, $a = [\alpha_1, \ldots, \alpha_l]^T$, for a minimizer that can be shown to have the form $f^\star(x) = \sum_{i=1}^{l} \alpha_i k(x, x_i)$. In particular, RLS reduces to solving the linear system $[G_k^l + \gamma l I_l]\, a = y$, where $y = [y_1 \ldots y_l]^T$, $I_l$ is the $l \times l$ identity matrix, and $G_k^l$ denotes the Gram matrix of the kernel over the labeled data, i.e., $(G_k^l)_{ij} = k(x_i, x_j)$. Let us now review two extensions of this algorithm: first for semi-supervised learning, and then for multivariate problems where $\mathcal{Y} = \mathbb{R}^n$.
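Before reviewing those extensions, a minimal sketch may make the scalar RLS solve concrete. The Gaussian kernel choice and the helper names (rbf_kernel, rls_fit, rls_predict) are illustrative assumptions, not from the paper; only NumPy is assumed.

```python
import numpy as np

# Minimal sketch of scalar-valued RLS (Section 2):
# fit solves [G + gamma*l*I] a = y; prediction is f(x) = sum_i alpha_i k(x, x_i).

def rbf_kernel(X, Z, sigma=1.0):
    """Gram matrix with k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def rls_fit(X, y, gamma=0.1, sigma=1.0):
    l = X.shape[0]
    G = rbf_kernel(X, X, sigma)                          # (G)_ij = k(x_i, x_j)
    return np.linalg.solve(G + gamma * l * np.eye(l), y) # [G + gamma*l*I] a = y

def rls_predict(Xtrain, a, Xtest, sigma=1.0):
    return rbf_kernel(Xtest, Xtrain, sigma) @ a          # f(x) = sum_i alpha_i k(x, x_i)
```

Nothing above depends on the kernel being Gaussian, and for large $l$ an iterative or Cholesky-based solver could replace the dense solve.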
Semi-supervised learning typically proceeds by making assumptions such as smoothness of the prediction function with respect to an underlying low-dimensional data manifold, or the presence of clusters as detected using a relatively large set of $u$ unlabeled examples, $\{x_i\}_{i=l+1}^{l+u}$. We will use the notation $N = l + u$. In Manifold Regularization (Belkin et al., 2006), a nearest neighbor graph, $W$, is constructed, which serves as a discrete probe for the geometric structure of the data. The Laplacian $L$ of this graph provides a natural intrinsic measure of data-dependent smoothness:

$$\mathbf{f}^T L \mathbf{f} = \frac{1}{2} \sum_{i,j=1}^{N} W_{ij}\, (f(x_i) - f(x_j))^2$$

where $\mathbf{f} = [f(x_1) \ldots f(x_N)]^T$. Thus, it is natural to extend (1) as follows,

$$f^\star = \arg\min_{f \in \mathcal{H}_k} \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + \gamma_A \|f\|_k^2 + \gamma_I\, \mathbf{f}^T L \mathbf{f} \qquad (2)$$

where $\gamma_A, \gamma_I$ are referred to as the ambient and intrinsic regularization parameters. By using the reproducing properties of $\mathcal{H}_k$, the Representer theorem can correspondingly be extended to show that the minimizer has the form $f^\star(x) = \sum_{i=1}^{N} \alpha_i k(x, x_i)$, involving both labeled and unlabeled data. The Laplacian RLS algorithm estimates $a = [\alpha_1 \ldots \alpha_N]^T$ by solving the linear system $[J_l^N G_k^N + l\gamma_I L G_k^N + l\gamma_A I_N]\, a = y$, where $G_k^N$ is the Gram matrix of $k$ with respect to both labeled and unlabeled examples, $I_N$ is the $N \times N$ identity matrix, $J_l^N$ is an $N \times N$ diagonal matrix with the first $l$ diagonal entries equal to 1 and the rest 0, and $y$ is the $N \times 1$ label vector with $y_i = 0$ for $i > l$. Laplacian RLS and Laplacian SVM tend to give similar empirical performance (Sindhwani et al., 2005).

Consider now two natural approaches to extending Laplacian RLS to the multivariate case $\mathcal{Y} = \mathbb{R}^n$. Let $f = (f_1, \ldots, f_n)$ be the components of a vector-valued function where each $f_j \in \mathcal{H}_k$, and let the $j$-th output label of $x_i$ be denoted $y_{ij}$. Then one formulation for multivariate LapRLS is to solve,

$$f^\star = \arg\min_{\substack{f_j \in \mathcal{H}_k \\ 1 \le j \le n}} \frac{1}{l} \sum_{i=1}^{l} \sum_{j=1}^{n} (y_{ij} - f_j(x_i))^2 + \gamma_A \sum_{j=1}^{n} \|f_j\|_k^2 + \gamma_I\, \mathrm{trace}[F^T L F] \qquad (3)$$

where $F_{ij} = f_j(x_i)$, $1 \le i \le N$, $1 \le j \le n$. Let $\alpha$ be an $N \times n$ matrix of expansion coefficients, i.e., the minimizers have the form $f_j(x) = \sum_{i=1}^{N} \alpha_{ij} k(x_i, x)$. It is easily seen that the solution is given by,

$$[J_l^N G_k^N + l\gamma_I L G_k^N + l\gamma_A I_N]\, \alpha = Y \qquad (4)$$

where $Y$ is the label matrix with $Y_{ij} = 0$ for $i > l$ and all $j$. It is clear that this multivariate solution is equivalent to learning each output independently, ignoring prior knowledge such as the availability of a similarity graph $W_{\mathrm{out}}$ over the output variables. Such prior knowledge can naturally be incorporated by adding a smoothing term to (3) which, for example, enforces $f_i$ to be close to $f_j$ in the RKHS norm $\|\cdot\|_k$ when output $i$ is similar to output $j$, i.e., when $(W_{\mathrm{out}})_{ij}$ is sufficiently large. We defer this development to later in the paper, as both of these solutions are special cases of a broader vector-valued RKHS framework for Laplacian RLS, where they correspond to certain choices of a matrix-valued kernel. We first give a self-contained review of the language of vector-valued RKHS in the following section.
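Before turning to that review, the following sketch makes the multivariate solve in (4) concrete: a single linear system with an $N \times n$ right-hand side, so all outputs share one factorization. As before, the helper names (knn_laplacian, laprls_fit) and the unnormalized-Laplacian construction are illustrative assumptions; only NumPy is assumed.

```python
import numpy as np

# Sketch of multivariate Laplacian RLS, eq. (4):
# solve [J G + l*gI*L*G + l*gA*I] alpha = Y for the N x n coefficient matrix.

def knn_laplacian(X, n_neighbors=5):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # skip self (distance 0)
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                               # symmetrize
    return np.diag(W.sum(1)) - W

def laprls_fit(G, L, Y, l, gamma_A, gamma_I):
    """G: N x N Gram matrix over labeled + unlabeled points; Y: N x n labels
    with rows i > l set to zero, matching the zero-padding in eq. (4)."""
    N = G.shape[0]
    J = np.diag((np.arange(N) < l).astype(float))        # first l diagonal entries 1
    A = J @ G + l * gamma_I * L @ G + l * gamma_A * np.eye(N)
    return np.linalg.solve(A, Y)                         # N x n matrix alpha
```

Because the system matrix does not depend on the outputs, adding more output variables only widens the right-hand side; this is one way to see that (4) treats the outputs independently.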
3. Vector-Valued RKHS

The study of RKHS has been extended to vector-valued functions and further developed and applied in machine learning (see (Carmeli et al., 2006; Micchelli & Pontil, 2005; Caponnetto et al., 2008) and references therein). In the following, denote by $\mathcal{X}$ a nonempty set, $\mathcal{Y}$ a real Hilbert space with inner product $(\cdot,\cdot)_{\mathcal{Y}}$, and $\mathcal{L}(\mathcal{Y})$ the Banach space of bounded linear operators on $\mathcal{Y}$.

Let $\mathcal{Y}^{\mathcal{X}}$ denote the vector space of all functions $f: \mathcal{X} \to \mathcal{Y}$. A function $K: \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{Y})$ is said to be an operator-valued positive definite kernel if for each pair $(x,z) \in \mathcal{X} \times \mathcal{X}$, $K(x,z) \in \mathcal{L}(\mathcal{Y})$ is a self-adjoint operator and

$$\sum_{i,j=1}^{N} (y_i, K(x_i, x_j)\, y_j)_{\mathcal{Y}} \ge 0 \qquad (5)$$

for every finite set of points $\{x_i\}_{i=1}^{N}$ in $\mathcal{X}$ and $\{y_i\}_{i=1}^{N}$ in $\mathcal{Y}$. Given such a $K$, for each $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ we form the function $K_x y = K(\cdot, x)y \in \mathcal{Y}^{\mathcal{X}}$, defined by $(K_x y)(z) = K(z,x)y$. The vector-valued RKHS $\mathcal{H}_K$ is the completion of $\mathrm{span}\{K_x y \,:\, x \in \mathcal{X},\ y \in \mathcal{Y}\}$ under the inner product for which the reproducing property

$$\langle f, K_x y \rangle_{\mathcal{H}_K} = (f(x), y)_{\mathcal{Y}} \quad \text{for all } f \in \mathcal{H}_K \qquad (6)$$

holds. In particular, $\|K_x y\|_{\mathcal{H}_K}^2 = (K(x,x)y, y)_{\mathcal{Y}} \le \|K(x,x)\|\, \|y\|_{\mathcal{Y}}^2$, so that $K_x$ is a bounded operator for each $x \in \mathcal{X}$. Let $K_x^*: \mathcal{H}_K \to \mathcal{Y}$ be the adjoint operator of $K_x$; then from (6), we have

$$f(x) = K_x^* f \quad \text{for all } x \in \mathcal{X},\ f \in \mathcal{H}_K. \qquad (7)$$

From this we deduce that for all $x \in \mathcal{X}$ and all $f \in \mathcal{H}_K$,

$$\|f(x)\|_{\mathcal{Y}} \le \|K_x^*\|\, \|f\|_{\mathcal{H}_K} \le \sqrt{\|K(x,x)\|}\, \|f\|_{\mathcal{H}_K},$$

that is, for each $x \in \mathcal{X}$ the evaluation operator $E_x: \mathcal{H}_K \to \mathcal{Y}$ defined by $E_x f = K_x^* f$ is a bounded linear operator. In particular, if $\kappa = \sup_{x \in \mathcal{X}} \sqrt{\|K(x,x)\|} < \infty$, then $\|f\|_\infty = \sup_{x \in \mathcal{X}} \|f(x)\|_{\mathcal{Y}} \le \kappa \|f\|_{\mathcal{H}_K}$ for all $f \in \mathcal{H}_K$. In this paper, we will be concerned with kernels for which $\kappa < \infty$.

3.1. Vector-valued Regularized Least Squares

Let $\mathcal{Y}$ be a separable Hilbert space.
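Before the abstract development, it may help to preview the computational point made in the introduction. Suppose, purely for illustration, a separable matrix-valued kernel $K(x,z) = k(x,z)\,B$ with $B$ a fixed symmetric positive definite $n \times n$ output matrix (this is our assumption; the paper's particular kernel class is introduced later). Writing $f(x) = \sum_j k(x, x_j)\, B\, a_j$ with coefficient matrix $A$ (rows $a_j^T$), the fully supervised vector-valued RLS normal equations reduce to $G A B + \gamma l A = Y$, a Sylvester-type equation that can be solved without ever forming the $ln \times ln$ Kronecker system $G \otimes B + \gamma l I$. A hedged sketch using SciPy's solve_sylvester (the function name vvrls_fit is ours):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Sketch: vector-valued RLS with the separable kernel K(x, z) = k(x, z) * B.
# The normal equations  G A B + gamma*l*A = Y  become, for invertible B,
#   G A + A (gamma*l*inv(B)) = Y inv(B),
# which is a standard Sylvester equation  M X + X N = Q.

def vvrls_fit(G, B, Y, gamma):
    """G: l x l Gram matrix of k, B: n x n symmetric PD output matrix,
    Y: l x n label matrix. Returns the l x n coefficient matrix A."""
    l = G.shape[0]
    B_inv = np.linalg.inv(B)
    return solve_sylvester(G, gamma * l * B_inv, Y @ B_inv)
```

The Bartels-Stewart algorithm behind solve_sylvester costs on the order of $l^3 + n^3$ via Schur decompositions, rather than the $(ln)^3$ of the naive Kronecker solve, which is the kind of efficiency gain the paper attributes to its matrix-valued kernel choice.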