Simple and Efficient Multiple Kernel Learning by Group Lasso

Total Pages: 16

File Type: pdf, Size: 1020 KB

Simple and Efficient Multiple Kernel Learning by Group Lasso

Zenglin Xu [email protected]
Cluster of Excellence MMCI, Saarland University and MPI Informatics, Saarbruecken, Germany

Rong Jin [email protected]
Department of Computer Science & Engineering, Michigan State University, East Lansing, MI 48824 USA

Haiqin Yang, Irwin King, Michael R. Lyu {HQYANG, KING, LYU}@CSE.CUHK.EDU.HK
Department of Computer Science & Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

Abstract

We consider the problem of how to improve the efficiency of Multiple Kernel Learning (MKL). In the literature, MKL is often solved by an alternating approach: (1) the minimization over the kernel weights is handled by complicated techniques, such as Semi-Infinite Linear Programming, Gradient Descent, or the Level method; (2) the maximization over the SVM dual variables can be handled by standard SVM solvers. However, the minimization step in these methods usually depends on specialized solving techniques or commercial software, which limits their efficiency and applicability. In this paper, we formulate a closed-form solution for optimizing the kernel weights based on the equivalence between group lasso and MKL. Although this equivalence is not our invention, our derived variant of the equivalence not only leads to an efficient algorithm for MKL, but also generalizes to Lp-MKL (p ≥ 1, with Lp denoting the Lp-norm of the kernel weights). Our proposed algorithm therefore provides a unified solution for the entire family of Lp-MKL models. Experiments on multiple data sets show the promising performance of the proposed technique compared with other competitive methods.

Appearing in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s).

1. Introduction

Multiple kernel learning (MKL) has been an attractive topic in machine learning (Lanckriet et al., 2004b; Ong et al., 2005). It has been regarded as a promising technique for identifying the combination of multiple data sources or feature subsets, and it has been applied in a number of domains, such as genome fusion (Lanckriet et al., 2004a), splice site detection (Sonnenburg et al., 2006), image annotation (Harchaoui & Bach, 2007), and so on.

Multiple kernel learning searches for a combination of base kernel functions/matrices that maximizes a generalized performance measure. Typical measures studied for multiple kernel learning include the maximum margin classification error (Lanckriet et al., 2004b; Bach et al., 2004; Argyriou et al., 2006; Zien & Ong, 2007), kernel-target alignment (Cristianini et al., 2001), Fisher discriminative analysis (Ye et al., 2007), etc.

There are two active research directions in multiple kernel learning. One is to improve the efficiency of MKL algorithms. Following the Semi-Definite Programming (SDP) algorithm proposed in the seminal work of (Lanckriet et al., 2004b), a block-norm regularization method based on Second Order Cone Programming (SOCP) was proposed in (Bach et al., 2004) to solve medium-scale problems. Due to the high computational cost of SDP and SOCP, these methods cannot handle larger numbers of kernels or training examples. Recent studies suggest that alternating approaches (Sonnenburg et al., 2006; Rakotomamonjy et al., 2008; Xu et al., 2009a) are more efficient. These approaches alternate between the optimization of the kernel weights and the optimization of the SVM classifier. In each step, given the current solution of kernel weights, they solve a classical SVM with the combined kernel; then a specific procedure is used to update the kernel weights. The advantage of the alternating scheme is that the SVM solver can be very efficient due to recent advances in large-scale optimization (Bottou & Lin, 2007). However, the current approaches for updating the kernel weights are still time consuming and most of them depend on commercial software. For example, the well-known Shogun (www.shogun-toolbox.org/) toolbox for MKL employs CPLEX for solving the linear programming problem.
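The alternating scheme just described can be made concrete with a short sketch. This is a minimal illustration, assuming precomputed base kernel matrices and scikit-learn's SVC as the inner SVM solver; the weight update shown is the normalized-norm rule familiar from the group-lasso/SimpleMKL line of work, not necessarily the exact procedure of any one paper.

```python
import numpy as np
from sklearn.svm import SVC

def alternating_mkl(kernels, y, C=1.0, n_iters=20):
    """Alternate between an SVM solve (fixed weights) and a weight update.

    kernels: list of m precomputed (n, n) base kernel matrices K_j.
    """
    m = len(kernels)
    gamma = np.full(m, 1.0 / m)               # start from uniform weights
    for _ in range(n_iters):
        K = sum(g * Kj for g, Kj in zip(gamma, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        # dual_coef_ holds alpha_i * y_i on the support vectors,
        # so a = alpha o y in the notation used below
        a = np.zeros(len(y))
        a[svm.support_] = svm.dual_coef_.ravel()
        # ||f_j||^2 = gamma_j^2 (alpha o y)^T K_j (alpha o y)
        norms = np.array([(g ** 2) * (a @ Kj @ a)
                          for g, Kj in zip(gamma, kernels)])
        gamma = np.sqrt(norms) / np.sqrt(norms).sum()   # L1-style update
    return gamma, svm
```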
The second direction is to improve the accuracy of MKL by exploring possible ways of combining the base kernels. The L1-norm of the kernel weights, also known as the simplex constraint, is mostly used in MKL methods. The advantage of the simplex constraint is that it leads to a sparse solution, i.e., only a few base kernels among many carry significant weights. However, as argued in (Kloft et al., 2008), the simplex constraint may discard complementary information when the base kernels encode orthogonal information, leading to suboptimal performance. To improve the accuracy in this scenario, an L2-norm of the kernel weights, known as a ball constraint, is introduced in their work. A natural extension of the L2-norm is the Lp-norm, which is approximated by a second order Taylor expansion and therefore leads to a convex optimization problem (Kloft et al., 2009). Another possible extension is to explore the grouping property or mixed-norm combinations, which is helpful when there are principal components among the base kernels (Saketha Nath et al., 2009; Szafranski et al., 2008). Other researchers also study the possibility of non-linear combinations of kernels (Varma & Babu, 2009; Cortes et al., 2009b). Although the solution space is thereby enlarged, a non-linear combination usually results in a non-convex optimization problem, leading to even higher computational cost. Moreover, the solution of a nonlinear combination is difficult to interpret. It is important to note that the choice of combination usually depends on the composition of the base kernels, and it is not the case that a non-linear combination is always superior to the traditional L1-norm combination.

To this end, we first derive a variation of the equivalence between MKL and group lasso (Yuan & Lin, 2006). Based on the equivalence, we transform the related convex-concave optimization of MKL into a joint-minimization problem. We further obtain a closed-form solution for the kernel weights, which greatly improves the computational efficiency. It should be noted that although the consistency between MKL and group lasso is discussed in (Rakotomamonjy et al., 2008; Bach, 2008), our obtained variational equivalence leads to a stable optimization algorithm. On the other hand, the proposed optimization algorithm could also be instructive for the optimization of group lasso. We further show that the Lp-norm formulation of MKL is also equivalent to an optimization problem with a different group regularizer, which has not appeared in the literature. Compared to the approach in (Kloft et al., 2009), our approach directly solves the related optimization problem without a Taylor approximation. Experimental results on multiple data sets show the promising performance of the proposed optimization method.

The rest of this paper is organized as follows. Section 2 presents the related work on multiple kernel learning. Section 3 first presents the variational equivalence between MKL and group lasso, followed by the optimization method and its generalization to Lp-norm MKL. Section 4 shows the experimental results and Section 5 concludes this paper.

2. Related Work

Let X = (x_1, ..., x_n) ∈ R^{n×d} denote the collection of n training samples that lie in a d-dimensional space. Let y = (y_1, y_2, ..., y_n) ∈ {−1, +1}^n denote the binary class labels for the data points in X. Multiple kernel learning is often cast into the following optimization problem:

    min_{f ∈ H_γ}  (1/2) ‖f‖²_{H_γ} + C Σ_{i=1}^n ℓ(y_i f(x_i)),    (1)

where H_γ is a reproducing kernel Hilbert space parameterized by γ and ℓ(·) is a loss function. H_γ is endowed with the kernel function κ(·, ·; γ) = Σ_{j=1}^m γ_j κ_j(·, ·).

When the hinge loss is employed, the dual problem of MKL (Lanckriet et al., 2004b) is equivalent to:

    min_{γ ∈ ∆} max_{α ∈ Q}  1^T α − (1/2) (α ◦ y)^T ( Σ_{j=1}^m γ_j K_j ) (α ◦ y),    (2)

where ∆ is the domain of γ and Q is the domain of α, 1 is a vector of all ones, {K_j}_{j=1}^m is a group of base kernel matrices associated with the H_j, and ◦ denotes the element-wise product between two vectors. The domain Q is usually defined as:

    Q = {α ∈ R^n : α^T y = 0, 0 ≤ α ≤ C}.    (3)
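Read literally, the objective of (2) is straightforward to evaluate for a fixed pair (α, γ); a small sketch (hypothetical helper, with kernel matrices as numpy arrays):

```python
import numpy as np

def mkl_dual_objective(alpha, y, gamma, kernels):
    """1^T alpha - 0.5 (alpha o y)^T (sum_j gamma_j K_j) (alpha o y)."""
    a = alpha * y                                   # element-wise product
    K = sum(g * Kj for g, Kj in zip(gamma, kernels))
    return alpha.sum() - 0.5 * (a @ K @ a)
```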
It is interesting to discuss the domain ∆ of γ. When γ lies in a simplex, i.e., ∆ = {γ ∈ R_+^m : Σ_{j=1}^m γ_j = 1, γ_j ≥ 0}, we call it the L1-norm of kernel weights. Most MKL methods fall into this category. Correspondingly, when ∆ = {γ ∈ R_+^m : ‖γ‖_p ≤ 1, γ_j ≥ 0}, we call it the Lp-norm of kernel weights and call the resulting model Lp-MKL. A special case is L2-MKL (Kloft et al., 2008; Cortes et al., 2009a).

It is easy to verify that the overall optimization problem (2) is convex in γ and concave in α. Thus the above optimization problem can be regarded as a convex-concave problem. Based on this convex-concave nature, a number of algorithms have been proposed that alternate between the optimization of the kernel weights and the optimization of the SVM classifier. In each step, given the current solution of kernel weights, they solve a classical SVM with the combined kernel; then a specific procedure is used to update the kernel weights. For example, the Semi-Infinite Linear Programming (SILP) approach, developed in (Sonnenburg et al., 2006; Kloft et al., 2009), constructs a cutting plane model for the objective function and updates the kernel weights accordingly.

Although the equivalence between MKL and group lasso is not our invention (Bach, 2008), the variational formulation derived in this work is more effective in motivating an efficient optimization algorithm for MKL. Furthermore, the alternative formulation leads to the equivalence between Lp-MKL (p ≥ 1) and a group regularizer.

3.1. Connection between MKL and Group Lasso

For convenience of presentation, we first start from the setting of the L1-norm for kernel weights.
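The excerpt ends before the closed form is stated. For orientation only, here is the shape such an update takes under the group-lasso equivalence: given the per-kernel norms ‖f_j‖, the L1 case reduces to the familiar normalized update γ_j = ‖f_j‖ / Σ_k ‖f_k‖, and the Lp case can be normalized so that ‖γ‖_p = 1. The exponent used below is an assumption for illustration, not a quotation of the paper's result.

```python
import numpy as np

def closed_form_weights(norms, p=1.0):
    """Map per-kernel norms ||f_j|| to weights with ||gamma||_p = 1.

    Assumes a group-lasso-style update gamma_j ∝ ||f_j||^(2/(p+1));
    for p = 1 this is exactly gamma_j = ||f_j|| / sum_k ||f_k||.
    """
    g = np.asarray(norms, dtype=float) ** (2.0 / (p + 1.0))
    return g / np.linalg.norm(g, ord=p)
```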
Recommended publications
  • Potential, Challenges and Future Directions for Deep Learning in Prognostics and Health Management Applications (Engineering Applications of Artificial Intelligence)
    Engineering Applications of Artificial Intelligence 92 (2020) 103678. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/engappai

    Potential, challenges and future directions for deep learning in prognostics and health management applications

    Olga Fink a,*, Qin Wang a, Markus Svensén b, Pierre Dersin c, Wan-Jui Lee d, Melanie Ducoffe e
    a Intelligent Maintenance Systems, ETH Zurich, Switzerland; b GE Aviation Digital Group, United Kingdom; c Alstom, France; d Dutch Railways, Netherlands; e Airbus, France

    Keywords: Deep learning; Prognostics and health management; GAN; Domain adaptation; Fleet PHM; Deep reinforcement learning; Physics-induced machine learning

    Abstract: Deep learning applications have been thriving over the last decade in many different domains, including computer vision and natural language understanding. The drivers for the vibrant development of deep learning have been the availability of abundant data, breakthroughs of algorithms and the advancements in hardware. Despite the fact that complex industrial assets have been extensively monitored and large amounts of condition monitoring signals have been collected, the application of deep learning approaches for detecting, diagnosing and predicting faults of complex industrial assets has been limited. The current paper provides a thorough evaluation of the current developments, drivers, challenges, potential solutions and future research needs in the field of deep learning applied to Prognostics and Health Management (PHM) applications.

    1. Today's challenges in PHM applications

    The goal of Prognostics and Health Management (PHM) is to pro- [...] by measurement and transmission noise. In many cases, the sensors are partly redundant, having several sensors measuring the same system parameter.
  • Implicit Kernel Learning
    Implicit Kernel Learning

    Chun-Liang Li, Wei-Cheng Chang, Youssef Mroueh, Yiming Yang, Barnabás Póczos {chunlial, wchang2, yiming, bapoczos}@cs.cmu.edu [email protected] Carnegie Mellon University and IBM Research

    Abstract: Kernels are powerful and versatile tools in machine learning and statistics. Although the notion of universal kernels and characteristic kernels has been studied, kernel selection still greatly influences the empirical performance. While learning the kernel in a data-driven way has been investigated, in this paper we explore learning the spectral distribution of a kernel via implicit generative models parametrized by deep neural networks. We call our method Implicit Kernel Learning (IKL). The proposed framework is simple to train and inference is performed via sampling random Fourier features. We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning.

    Maximum mean discrepancy (MMD) (Gretton et al., 2012) is a powerful two-sample test, which is based on a statistic computed via kernel functions. Even though there has been a surge of deep learning in the past years, several successes have been shown by kernel methods and deep feature extraction. Wilson et al. (2016) demonstrate state-of-the-art performance by incorporating deep learning, kernels and Gaussian processes. Li et al. (2015); Dziugaite et al. (2015) use MMD to train deep generative models for complex datasets. In practice, however, kernel selection is always an important step. Instead of choosing by a heuristic, several works have studied kernel learning. Multiple kernel learning (MKL) (Bach et al., 2004; Lanckriet et al., 2004; Bach, 2009; Gönen and Alpaydın, 2011; Duvenaud et al., 2013) is one of the pioneering frameworks to combine predefined kernels. One recent kernel learning development is to learn kernels via learning spectral
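    For context on the MMD statistic mentioned above, a minimal sketch of the biased empirical estimator MMD²(X, Y) = mean k(X, X) + mean k(Y, Y) − 2 mean k(X, Y); the RBF kernel here is just an example, not part of IKL itself:

```python
import numpy as np

def mmd2_biased(X, Y, kernel):
    """Biased empirical MMD^2 between samples X (n, d) and Y (m, d)."""
    return (kernel(X, X).mean() + kernel(Y, Y).mean()
            - 2.0 * kernel(X, Y).mean())

def rbf(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between row sets A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```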
  • Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned
    Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned

    Shitian Shen, North Carolina State University, [email protected]; Behrooz Mostafavi, North Carolina State University, [email protected]; Tiffany Barnes, North Carolina State University, [email protected]; Min Chi, North Carolina State University, [email protected]

    An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students' behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, detecting the impact of immediate and delayed rewards on the effectiveness of the policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy executions on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students as opposed to the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All our following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy.
  • Multiple-Kernel Learning for Genomic Data Mining and Prediction
    bioRxiv preprint doi: https://doi.org/10.1101/415950; this version posted September 13, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

    Multiple-kernel learning for genomic data mining and prediction

    Christopher M. Wilson 1, Kaiqiao Li 2, Pei-Fen Kuan 2, and Xuefeng Wang 1,2,*
    1 Department of Biostatistics and Bioinformatics, Moffitt Cancer Center
    2 Department of Applied Mathematics and Statistics, Stony Brook University
    * [email protected]

    Abstract: Advances in medical technology have allowed for customized prognosis, diagnosis, and personalized treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources; however, there are currently no implementations of MKL in R. In this paper, we give some background material for support vector machines (SVM) and introduce an R package, RMKL, which provides R and C++ code to implement several MKL algorithms for classification and regression problems. The provided implementations of MKL are compared using benchmark data and TCGA ovarian cancer data. We demonstrate that combining multiple data sources can lead to a better classification scheme than simply using a single data source.

    1 Introduction

    Integrating multiple heterogeneous high-throughput data sources is an emerging topic of interest in cancer research. Making decisions based upon metabolomic, genomic, etc. data sources can lead to better prognosis or diagnosis than simply using clinical data alone.
  • Parameter-Free Incremental Co-Clustering on Optimized Features
    [Volume 6, Issue 1, Jan.-March 2019] e-ISSN 2348-1269, Print ISSN 2349-5138, http://ijrar.com/, Cosmos Impact Factor 4.236

    PARAMETER-FREE INCREMENTAL CO-CLUSTERING ON OPTIMIZED FEATURES

    S. R. Sandhan 1 & Prof. S. S. Banait 2
    1 M.E. Student, 2 Assistant Professor, Department of Computer Engineering, K. K. Wagh Institute of Engineering Education & Research, Nashik, Savitribai Phule Pune University, Maharashtra, India

    Received: December 05, 2018. Accepted: January 16, 2019.

    ABSTRACT: To improve effectiveness and efficiency when clustering dynamic multi-modal data in cyber-physical-social systems, several co-clustering approaches have been proposed with the rapid growth of Cyber-Physical-Social Systems. Large amounts of dynamic multi-modal data are being generated and collected, so clustering multi-modal dynamic data is challenging. In high-dimensional dynamic data, classic metrics fail to identify real similarities between objects. Moreover, the huge number of features makes cluster interpretation hard. In the proposed method, the single-modality similarity measure is extended to multiple modalities, and three operations, namely cluster creation, cluster merging, and instance partitioning, are defined to incrementally integrate newly arriving objects into current clustering patterns without introducing additional parameters. Moreover, an adaptive weight scheme is designed to measure the importance of feature modalities based on the intra-cluster scatters, and a feature selection method is added to reduce running time. Extensive experiments on various multi-modal dynamic datasets demonstrate the effectiveness of the proposed approach.

    Key Words: Parameter-free learning; Multi-modal data; Clustering; Feature selection.

    I. Introduction

    Clustering is a popular data mining technique that partitions data into groups (clusters) in such a way that objects inside a group are similar to each other, and objects belonging to different groups are dissimilar.
  • Fredholm Multiple Kernel Learning for Semi-Supervised Domain Adaptation
    Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

    Fredholm Multiple Kernel Learning for Semi-Supervised Domain Adaptation

    Wei Wang, 1 Hao Wang, 1,2 Chen Zhang, 1 Yang Gao 1
    1. Science and Technology on Integrated Information System Laboratory
    2. State Key Laboratory of Computer Science
    Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
    [email protected]

    Abstract: As a fundamental constituent of machine learning, domain adaptation generalizes a learning model from a source domain to a different (but related) target domain. In this paper, we focus on semi-supervised domain adaptation and explicitly extend the applied range of unlabeled target samples into the combination of distribution alignment and adaptive classifier learning. Specifically, our extension formulates the following aspects in a single optimization: 1) learning a cross-domain predictive model by developing the Fredholm integral based kernel prediction framework; 2) reducing the distribution difference between two domains; 3) exploring multiple kernels to induce an optimal learning space.

    ... support vector machine, logistic regression and statistic classifiers, to the target domain. However, they do not explicitly explore the distribution inconsistency between the two domains, which essentially limits their capability of knowledge transfer. To this end, many semi-supervised domain adaptation methods are proposed within the classifier adaptation framework (Chen, Weinberger, and Blitzer 2011; Duan, Tsang, and Xu 2012; Sun et al. 2011; Wang, Huang, and Schneider 2014). Compared with supervised ones, they take into account unlabeled target samples to cope with the inconsistency of data distributions, showing improved classification and generalization capability. Nevertheless, most
  • Study of Kernel Machines Towards Brain-Computer Interfaces Tian Xilan
    Study of kernel machines towards Brain-Computer Interfaces. Tian Xilan.

    To cite this version: Tian Xilan. Study of kernel machines towards Brain-computer Interfaces. Artificial Intelligence [cs.AI]. INSA de Rouen, 2012. English. tel-00699659. HAL Id: tel-00699659, https://tel.archives-ouvertes.fr/tel-00699659. Submitted on 21 May 2012.

    Thesis presented at the Institut National des Sciences Appliquées de Rouen for the degree of Docteur en Sciences, mention Informatique: "Apprentissage et Noyau pour les Interfaces Cerveau-machine" (Learning and Kernels for Brain-Computer Interfaces), by Xilan TIAN. Team: LITIS. Doctoral school: SPMII. Defended on 7 May 2012 before the examination committee. Reviewers: Hélène Paugam-Moisy (Université de Lyon), Valérie Louis-Dorr (INPL). Advisor: Stéphane Canu (INSA de Rouen). Co-advisor: Gilles Gasso (INSA de Rouen). Examiners: Marie-F Lucas (Ecole Centrale de Nantes), Alain Rakotomamonjy (Université de Rouen).

    Résumé: Brain-Computer Interfaces (BCIs) have been successfully applied both in the clinical domain and to improve the daily life of patients with disabilities. As an essential component, the signal processing module largely determines the performance of a BCI system.
  • Multiclass Multiple Kernel Learning
    Multiclass Multiple Kernel Learning

    Alexander Zien [email protected]
    Cheng Soon Ong [email protected]
    Max Planck Inst. for Biol. Cybernetics and Friedrich Miescher Lab., Spemannstr. 39, Tübingen, Germany.

    Abstract: In many applications it is desirable to learn from several kernels. "Multiple kernel learning" (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that ...

    ... relevant and meaningful features [2, 13, 20]. Since in many real world applications more than two classes are to be distinguished, there has been a lot of work on decomposing multiclass problems into several standard binary classification problems, or developing genuine multiclass classifiers. The latter has so far been restricted to single kernels. In this paper we develop and investigate a multiclass extension of MKL. We provide:
    • An intuitive formulation of the multiclass MKL task, with a new sparsity-promoting regularizer.
    • A proof of equivalence with (the multiclass extension of) a previous MKL formulation.
    • Several optimization approaches, with a computational comparison between three of them.
    • Experimental results on several datasets demonstrating the method's utility.
  • Bridging Deep and Kernel Methods
    Bridging deep and kernel methods

    Lluís A. Belanche 1 and Marta R. Costa-jussà 2
    1 - Computer Science School - Dept of Computer Science, Universitat Politècnica de Catalunya, Jordi Girona, 1-3, 08034 Barcelona, Catalonia, Spain
    2 - TALP Research Center - Dept of Signal Theory and Communications, Universitat Politècnica de Catalunya, Jordi Girona, 1-3, 08034 Barcelona, Catalonia, Spain

    Abstract: There has been some exciting major progress in recent years in data analysis methods, including a variety of deep learning architectures, as well as further advances in kernel-based learning methods, which have demonstrated predictive superiority. In this paper we provide a brief motivated survey of recent proposals to explicitly or implicitly combine kernel methods with the notion of deep learning networks.

    1 Introduction

    Multilayer artificial neural networks have experienced a rebirth in the data analysis field, displaying impressive empirical results, extending even to classical artificial intelligence domains, such as game playing, computer vision, multimodal recognition, natural language processing and speech processing. The versatility of such methods to perform non-linear feature extraction and deal with a diversity of complex data, such as video, audio and text, has led many-layered ("deep") parametric models to overtake well-established learning methods, like kernel machines or classical statistical techniques. Deep neural networks, for example, are very successful in machine learning applications although their number of parameters is typically orders of magnitude larger than the number of training examples. However, their training is a delicate and costly optimization problem that raises many practical challenges. Learning several representational levels to optimize a final objective is the main trademark of deep learning, from which multiple tasks and fields have benefited lately.
  • A Multiple Kernel Density Clustering Algorithm for Incomplete Datasets in Bioinformatics Longlong Liao1,2, Kenli Li3*, Keqin Li4, Canqun Yang1,2 and Qi Tian5
    Liao et al. BMC Systems Biology 2018, 12(Suppl 6):111, https://doi.org/10.1186/s12918-018-0630-6. RESEARCH, Open Access.

    A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

    Longlong Liao 1,2, Kenli Li 3*, Keqin Li 4, Canqun Yang 1,2 and Qi Tian 5
    From the IEEE International Conference on Bioinformatics and Biomedicine 2017, Kansas City, MO, USA, 13-16 November 2017.

    Abstract

    Background: While there are a large number of bioinformatics datasets for clustering, many of them are incomplete, i.e., missing attribute values in some data samples needed by clustering algorithms. A variety of clustering algorithms have been proposed in the past years, but they are usually limited to clustering on complete datasets. Besides, conventional clustering algorithms cannot obtain a trade-off between accuracy and efficiency of the clustering process, since many essential parameters are determined by the human user's experience.

    Results: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets called MKDCI. The MKDCI algorithm consists of recovering missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forest method, assigning clusters with arbitrary shape, and visualizing the results.

    Conclusions: Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on incomplete datasets in bioinformatics.
  • LMKL-Net: a Fast Localized Multiple Kernel Learning Solver Via Deep Neural Networks
    MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

    LMKL-Net: A Fast Localized Multiple Kernel Learning Solver via Deep Neural Networks

    Ziming Zhang, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139-1955, [email protected]
    TR2018-072, July 12, 2018 (arXiv)

    Abstract: In this paper we propose solving localized multiple kernel learning (LMKL) using LMKL-Net, a feedforward deep neural network. In contrast to previous works, as a learning principle we propose parameterizing both the gating function for learning kernel combination weights and the multiclass classifier in LMKL using an attentional network (AN) and a multilayer perceptron (MLP), respectively. In this way we can learn the (nonlinear) decision function in LMKL (approximately) by sequential applications of AN and MLP. Empirically, on benchmark datasets we demonstrate that overall LMKL-Net can not only outperform the state-of-the-art MKL solvers in terms of accuracy, but also be trained about two orders of magnitude faster with a much smaller memory footprint for large-scale learning.
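    To illustrate the gating idea in the abstract, here is a minimal numpy sketch of a softmax gating function producing per-sample kernel weights that combine base kernels locally, in the style of classical localized MKL; LMKL-Net's actual attentional-network and MLP parameterization is more elaborate, so treat the names and the gating form below as assumptions.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def locally_combined_kernel(kernels, X, V):
    """kernels: (m, n, n) stack of base kernel matrices.
    X: (n, d) inputs; V: (d, m) gating parameters.
    Gating weights eta(x_i) = softmax(x_i V) combine the kernels locally:
    K(x_i, x_j) = sum_k eta_k(x_i) * K_k[i, j] * eta_k(x_j).
    """
    eta = softmax(X @ V)                      # (n, m) per-sample weights
    return np.einsum('ik,kij,jk->ij', eta, kernels, eta)
```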
  • Large Scale Multiple Kernel Learning
    Journal of Machine Learning Research 7 (2006) 1531-1565. Submitted 10/05; Published 7/06.

    Large Scale Multiple Kernel Learning

    Sören Sonnenburg [email protected], Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany
    Gunnar Rätsch [email protected], Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, Tübingen, Germany
    Christin Schäfer [email protected], Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany
    Bernhard Schölkopf BERNHARD [email protected], Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany

    Editors: Kristin P. Bennett and Emilio Parrado-Hernández

    Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundreds of thousands of examples or hundreds of kernels to be combined, and helps for automatic model selection, improving the interpretability of the learning result. In a second part we discuss a general speed-up mechanism for SVMs, especially when used with sparse feature maps as appear for string kernels, allowing us to train a string kernel SVM on a 10 million real-world splice data set from computational biology.
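    The "recycling" idea above alternates between a standard SVM solve and a small linear program over the stored cutting planes. Below is a sketch of that restricted master LP, under the convention that S[t, j] is the per-kernel dual objective 1^T α_t − (1/2) (α_t ◦ y)^T K_j (α_t ◦ y) for a stored SVM solution α_t; this is an illustrative rendering consistent with equation (2) above, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linprog

def silp_master_step(S):
    """Solve: min theta  s.t.  sum_j gamma_j S[t, j] <= theta for all t,
    with gamma in the simplex.  S: (t, m) matrix of cutting-plane values."""
    t, m = S.shape
    c = np.r_[np.zeros(m), 1.0]                   # minimize theta
    A_ub = np.c_[S, -np.ones(t)]                  # S @ gamma - theta <= 0
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # sum_j gamma_j = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(t), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]                    # new gamma, new theta
```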