Graph Matching Networks for Learning the Similarity of Graph Structured Objects

Total Pages: 16

File Type: PDF, Size: 1020 KB

Graph Matching Networks for Learning the Similarity of Graph Structured Objects

Yujia Li 1, Chenjie Gu 1, Thomas Dullien 2, Oriol Vinyals 1, Pushmeet Kohli 1
1 DeepMind, 2 Google. Correspondence to: Yujia Li <yu-[email protected]>.

arXiv:1904.12787v2 [cs.LG] 12 May 2019. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions. First, we demonstrate how Graph Neural Networks (GNNs), which have emerged as an effective model for various supervised prediction problems defined on structured data, can be trained to produce embeddings of graphs in vector spaces that enable efficient similarity reasoning. Second, we propose a novel Graph Matching Network model that, given a pair of graphs as input, computes a similarity score between them by jointly reasoning on the pair through a new cross-graph attention-based matching mechanism. We demonstrate the effectiveness of our models on different domains, including the challenging problem of control-flow-graph based function similarity search, which plays an important role in the detection of vulnerabilities in software systems. The experimental analysis demonstrates that our models are not only able to exploit structure in the context of similarity learning, but can also outperform domain-specific baseline systems that have been carefully hand-engineered for these problems.

1. Introduction

Graphs are natural representations for encoding relational structures that are encountered in many domains. Expectedly, computations defined over graph structured data are employed in a wide variety of fields, from the analysis of molecules for computational biology and chemistry (Gilmer et al., 2017; Yan et al., 2005), to the analysis of knowledge graphs or graph structured parses for natural language understanding.

In the past few years graph neural networks (GNNs) have emerged as an effective class of models for learning representations of structured data and for solving various supervised prediction problems on graphs. Such models are invariant to permutations of graph elements by design and compute graph node representations through a propagation process which iteratively aggregates local structural information (Scarselli et al., 2009; Li et al., 2015; Gilmer et al., 2017). These node representations are then used directly for node classification, or pooled into a graph vector for graph classification. Problems beyond supervised classification or regression are relatively less well-studied for GNNs.

In this paper we study the problem of similarity learning for graph structured objects, which appears in many important real-world applications, in particular similarity-based retrieval in graph databases. One motivating application is the computer security problem of binary function similarity search: given a binary which may or may not contain code with known vulnerabilities, we wish to check whether any control flow graph in this binary is sufficiently similar to one in a database of known-vulnerable functions. This helps identify vulnerable statically linked libraries in closed-source software, a recurring problem (CVE, 2010; 2018) for which no good solutions are currently available. Figure 1 shows one example from this application, where the binary functions are represented as control flow graphs annotated with assembly instructions. This similarity learning problem is very challenging, as subtle differences can make two graphs semantically very different, while graphs with different structures can still be similar. A successful model for this problem should therefore (1) exploit the graph structures, and (2) be able to reason about the similarity of graphs both from the graph structures and from learned semantics.

[Figure 1: control flow graphs of binary functions, with basic blocks annotated by assembly instructions; the full block-level listings from the original figure are omitted here.]

In order to solve the graph similarity learning problem, we investigate the use of GNNs in this context, explore how they can be used to embed graphs into a vector space, and learn this embedding model to make similar graphs close in the vector space and dissimilar graphs far apart. One important property of this model is that it maps each graph independently to an embedding vector, and then all the similarity computation happens in the vector space. Therefore, the embeddings of graphs in a large database can be precomputed and indexed, which enables efficient retrieval with fast nearest neighbor search data structures. The main contributions of this paper are: (1) we demonstrate how GNNs can be used to produce graph embeddings for similarity learning; (2) we propose the new Graph Matching Networks that compute similarity through cross-graph attention-based matching; (3) empirically, we show that the proposed graph similarity learning models achieve good performance across a range of applications, outperforming structure-agnostic models and established hand-engineered baselines.

2. Related Work

Graph Neural Networks and Graph Representation Learning. The history of graph neural networks (GNNs) goes back to at least the early work by Gori et al. (2005) and Scarselli et al. (2009), who proposed to use a propagation process to learn node representations.
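The paper contrasts two model families: embedding each graph independently and comparing vectors, versus jointly reasoning over a pair with cross-graph attention. Below is a minimal NumPy sketch of both ideas under our own simplifying assumptions; the propagation update, vector sizes, and toy graphs are illustrative stand-ins, not the paper's architecture.

```python
# Sketch of (a) a graph embedding model that pools node states into one
# vector per graph, and (b) the cross-graph attention idea behind Graph
# Matching Networks. Toy update rules; not the published implementation.
import numpy as np

def propagate(h, edges):
    """One round of message passing: sum neighbor states into each node."""
    msg = np.zeros_like(h)
    for u, v in edges:                  # directed edge u -> v
        msg[v] += h[u]
    return np.tanh(h + msg)             # toy node update

def embed_graph(h, edges, rounds=3):
    """Graph embedding model: propagate, then sum-pool into one vector."""
    for _ in range(rounds):
        h = propagate(h, edges)
    return h.sum(axis=0)

def cross_graph_match(h1, h2):
    """Cross-graph attention: each node of graph 1 attends to all nodes of
    graph 2; the residual h1 - attended captures what fails to match."""
    scores = h1 @ h2.T                              # pairwise similarities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over graph 2
    return h1 - attn @ h2                           # matching residual

# Toy usage: compare two 4-node graphs with 8-d node states.
rng = np.random.default_rng(0)
h_a, h_b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
edges = [(0, 1), (1, 2), (2, 3)]
g_a, g_b = embed_graph(h_a, edges), embed_graph(h_b, edges)
print("embedding-space cosine similarity:",
      g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b)))
match_residual = cross_graph_match(h_a, h_b)
print("mean matching residual norm:",
      np.linalg.norm(match_residual, axis=1).mean())
```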
Recommended publications
  • Deep Similarity Learning for Sports Team Ranking
    Deep Similarity Learning for Sports Team Ranking
    Daniel Yazbek†, Jonathan Sandile Sibindi∗, Terence L. Van Zyl
    †∗School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa; Institute for Intelligent Systems, University of Johannesburg, Johannesburg, South Africa. †[email protected], ∗[email protected], [email protected]

    Abstract—Sports data is more readily available and consequently, there has been an increase in the amount of sports analysis, predictions and rankings in the literature. Sports are unique in their respective stochastic nature, making analysis and accurate predictions valuable to those involved in the sport. In response, we focus on Siamese Neural Networks (SNN) in unison with LightGBM and XGBoost models, to predict the importance of matches and to rank teams in Rugby and Basketball. Six models were developed and compared: a LightGBM, a XGBoost, a LightGBM (Contrastive Loss), a LightGBM (Triplet Loss), a XGBoost (Contrastive Loss), and a XGBoost (Triplet Loss). The models that utilise a Triplet loss function perform better than those using Contrastive loss. It is clear LightGBM (Triplet loss) is the most effective model in ranking the NBA, producing a state of the art (SOTA) mAP (0.867) and NDCG (0.98).

    [From the introduction:] ...feasible and economically desirable to various players in the betting industry. Further, accurate game predictions can aid sports teams in making adjustments to achieve the best possible future results for that franchise. This increase in economic upside has increased research and analysis in sports prediction [26], using a multitude of machine learning algorithms [18]. Popular machine learning algorithms are neural networks, designed to learn patterns and effectively classify data into identifiable classes. Classification tasks depend upon labelled datasets; for a neural network to learn the correlation between labels and data, humans must transfer their knowledge to a dataset.
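Since the excerpt hinges on the contrastive-vs-triplet distinction, here is a hedged sketch of the triplet objective; the embeddings and margin are placeholder values, not the paper's SNN feature extractors.

```python
# Triplet loss sketch: the anchor is pulled toward a similar ("positive")
# example and pushed at least `margin` away from a dissimilar ("negative")
# one. Purely illustrative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)   # distance to similar example
    d_neg = np.linalg.norm(anchor - negative)   # distance to dissimilar one
    return max(0.0, d_pos - d_neg + margin)

# Loss is zero once the negative sits at least `margin` beyond the positive.
print(triplet_loss(np.zeros(3), np.ones(3) * 0.1, np.ones(3) * 5.0))
```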
  • RELIEF Algorithm and Similarity Learning for K-NN
    RELIEF Algorithm and Similarity Learning for k-NN
    Ali Mustafa Qamar (Department of Computing, SEECS, National University of Sciences and Technology (NUST), Islamabad, Pakistan, [email protected]) and Eric Gaussier (Laboratoire d'Informatique de Grenoble, Université de Grenoble, France, [email protected])
    International Journal of Computer Information Systems and Industrial Management Applications, ISSN 2150-7988, Volume 4 (2012), pp. 445-458. © MIR Labs, www.mirlabs.net/ijcisim/index.html

    Abstract: In this paper, we study the links between RELIEF, a well-known feature re-weighting algorithm, and SiLA, a similarity learning algorithm. On one hand, SiLA is interested in directly reducing the leave-one-out error or 0−1 loss by reducing the number of mistakes on unseen examples. On the other hand, it has been shown that RELIEF could be seen as a distance learning algorithm in which a linear utility function with maximum margin was optimized. We first propose here a version of this algorithm for similarity learning, called RBS (for RELIEF-Based Similarity learning).

    [From the introduction:] ...has paved the way for a new research theme termed metric learning. Most of the people working in this research area are more interested in learning a distance metric (see e.g. [1, 2, 3, 4]) as compared to a similarity one. However, as argued by several researchers, similarities should be preferred over distances on some of the data sets. Similarity is usually preferred over the distance metric while dealing with text, in which case the cosine similarity has been deemed more appropriate as compared to the various distance metrics.
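For concreteness, here is a one-pass simplification of the RELIEF re-weighting idea the excerpt refers to: feature weights grow where an instance differs from its nearest miss (other class) and shrink where it differs from its nearest hit (same class). The L1 neighbor search and single pass are our simplifying assumptions; both classes are assumed present.

```python
# Simplified RELIEF feature re-weighting sketch (not the paper's RBS variant).
import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        dists = np.abs(X - X[i]).sum(axis=1)
        dists[i] = np.inf                               # exclude self
        hit = np.argmin(np.where(y == y[i], dists, np.inf))   # nearest hit
        miss = np.argmin(np.where(y != y[i], dists, np.inf))  # nearest miss
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y, n_iter=50))   # both features separate the classes
```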
  • Deep Metric Learning: a Survey
    Deep Metric Learning: A Survey (Symmetry, Review)
    Mahmut Kaya (Department of Computer Engineering, Engineering Faculty, Siirt University, Siirt 56100, Turkey; correspondence: [email protected]) and Hasan Şakir Bilge (Department of Electrical-Electronics Engineering, Engineering Faculty, Gazi University, Ankara 06570, Turkey)
    Received: 23 July 2019; Accepted: 20 August 2019; Published: 21 August 2019

    Abstract: Metric learning aims to measure the similarity among samples while using an optimal distance metric for learning tasks. Metric learning methods, which generally use a linear projection, are limited in solving real-world problems demonstrating non-linear characteristics. Kernel approaches are utilized in metric learning to address this problem. In recent years, deep metric learning, which provides a better solution for nonlinear data through activation functions, has attracted researchers' attention in many different areas. This article aims to reveal the importance of deep metric learning and the problems dealt with in this field in the light of recent studies. As far as the research conducted in this field is concerned, most existing studies are inspired by Siamese and Triplet networks, which are commonly used to correlate among samples while using shared weights in deep metric learning. The success of these networks is based on their capacity to understand the similarity relationship among samples. Moreover, sampling strategy, appropriate distance metric, and the structure of the network are the challenging factors for researchers to improve the performance of the network model. This article is considered to be important, as it is the first comprehensive study in which these factors are systematically analyzed and evaluated as a whole and supported by comparing the quantitative results of the methods.
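As a concrete anchor for the Siamese setup the survey centers on, here is a minimal sketch of the contrastive loss (our illustration, not code from the survey): similar pairs are pulled together, dissimilar pairs are pushed beyond a margin.

```python
# Contrastive loss sketch for a Siamese pair of embeddings z1, z2.
import numpy as np

def contrastive_loss(z1, z2, similar, margin=1.0):
    d = np.linalg.norm(z1 - z2)                   # distance between embeddings
    return d ** 2 if similar else max(0.0, margin - d) ** 2

print(contrastive_loss(np.zeros(4), np.ones(4) * 0.2, similar=True))
print(contrastive_loss(np.zeros(4), np.ones(4) * 0.2, similar=False))
```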
  • Cross-Domain Visual Matching Via Generalized Similarity Measure and Feature Learning
    Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning
    Liang Lin, Guangrun Wang, Wangmeng Zuo, Xiangchu Feng, and Lei Zhang
    IEEE Transactions on Pattern Analysis and Machine Intelligence

    Abstract—Cross-domain visual data matching is one of the fundamental problems in many real-world vision tasks, e.g., matching persons across ID photos and surveillance videos. Conventional approaches to this problem usually involve two steps: i) projecting samples from different domains into a common space, and ii) computing (dis-)similarity in this space based on a certain distance. In this paper, we present a novel pairwise similarity measure that advances existing models by i) expanding traditional linear projections into affine transformations and ii) fusing affine Mahalanobis distance and Cosine similarity by a data-driven combination. Moreover, we unify our similarity measure with feature representation learning via deep convolutional neural networks. Specifically, we incorporate the similarity measure matrix into the deep architecture, enabling an end-to-end way of model optimization. We extensively evaluate our generalized similarity model in several challenging cross-domain matching tasks: person re-identification under different views and face verification over different modalities (i.e., faces from still images and videos, older and younger faces, and sketch and photo portraits). The experimental results demonstrate superior performance of our model over other state-of-the-art methods.

    Index Terms—Similarity model, Cross-domain matching, Person verification, Deep learning.

    1 Introduction: Visual similarity matching is arguably considered as one of the most fundamental problems in computer vision and pattern recognition, and this problem becomes more challenging when dealing with cross-domain data.
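To make the fused measure concrete, here is a hedged sketch of a similarity of the form the abstract describes: affine projections per domain, a Mahalanobis term, and a cosine term mixed by a weight. In the paper, A, the offsets, M, and the combination are learned end-to-end; here they are placeholder values showing only the shape of the measure.

```python
# Generalized similarity sketch: affine Mahalanobis fused with cosine.
import numpy as np

def generalized_similarity(x, y, A, b, c, M, lam=0.5):
    u, v = A @ x + b, A @ y + c             # affine projection per domain
    mahal = (u - v) @ M @ (u - v)           # affine Mahalanobis distance
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return lam * cos - (1 - lam) * mahal    # higher means more similar

rng = np.random.default_rng(0)
A, M = rng.normal(size=(4, 8)), np.eye(4)   # M must be PSD
b, c = rng.normal(size=4), rng.normal(size=4)
x, y = rng.normal(size=8), rng.normal(size=8)
print(generalized_similarity(x, y, A, b, c, M))
```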
  • Low-Rank Similarity Metric Learning in High Dimensions
    Low-Rank Similarity Metric Learning in High Dimensions
    Wei Liu†, Cun Mu‡, Rongrong Ji♮, Shiqian Ma§, John R. Smith†, Shih-Fu Chang‡
    †IBM T. J. Watson Research Center, ‡Columbia University, ♮Xiamen University, §The Chinese University of Hong Kong. {weiliu,[email protected], [email protected], [email protected], [email protected], [email protected]
    Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

    Abstract: Metric learning has become a widely used tool in machine learning. To reduce the expensive costs brought in by increasing dimensionality, low-rank metric learning arises as it can be more economical in storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, and is therefore applicable to high-dimensional data domains.

    [From the introduction:] ...include supervised metric learning, which is the main focus in the literature, semi-supervised metric learning (Wu et al. 2009; Hoi, Liu, and Chang 2010; Liu et al. 2010a; 2010b; Niu et al. 2012), etc. The suite of supervised metric learning can further be divided into three categories in terms of forms of supervision: the methods supervised by instance-level labels (Goldberger, Roweis, and Salakhutdinov 2004; Globerson and Roweis 2005; Weinberger and Saul 2009), the methods supervised by pair-level labels (Xing et al. 2002; Bar-Hillel et al. 2005; Davis et al. 2007; Ying and Li 2012), and the methods supervised by triplet-level ranks (Ying, Huang, and Campbell 2009; McFee and Lanckriet 2010; ...).
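A sketch of why low rank helps, under our own illustrative assumptions: a bilinear similarity s(x, y) = xᵀMy with M = LᵀL of rank r ≪ d needs only r·d storage and per-query work, never the full d×d matrix. L is random here, standing in for a learned low-rank basis.

```python
# Low-rank bilinear similarity sketch; M = L^T L is never materialized.
import numpy as np

d, r = 10_000, 32                      # high input dimension, low rank
L = np.random.default_rng(1).normal(size=(r, d)) / np.sqrt(d)

def bilinear_similarity(x, y):
    return float((L @ x) @ (L @ y))    # O(r * d) instead of O(d^2)

x, y = np.random.default_rng(2).normal(size=(2, d))
print(bilinear_similarity(x, y))
```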
  • Learning Mahalanobis Distance Metric: Considering Instance Disturbance Helps∗
    Learning Mahalanobis Distance Metric: Considering Instance Disturbance Helps
    Han-Jia Ye, De-Chuan Zhan, Xue-Min Si and Yuan Jiang
    National Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, 210023, China. {yehj, zhandc, sixm, [email protected]
    Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

    Abstract: Mahalanobis distance metric takes feature weights and correlation into account in the distance computation, which can improve the performance of many similarity/dissimilarity based methods, such as kNN. Most existing distance metric learning methods obtain the metric based on raw features and side information but neglect their reliability. Noises or disturbances on instances will change their relationships, and thus affect the learned metric. In this paper, we claim that considering the disturbance of instances may help the metric learning approach obtain a robust metric, and propose the Distance metRIc learning Facilitated by disTurbances (DRIFT) approach.

    [From the introduction:] ...dissimilar ones are far away. Although ground-truth side information leads to a well-learned Mahalanobis metric [Verma and Branson, 2015; Cao et al., 2016], it is in fact unknown during the training process. Therefore, side information is often generated based on reliable raw features from various sources, for example, random choice [Davis et al., 2007], Euclidean nearest neighbor selection [Weinberger et al., 2006], and all-pair enumeration [Xing et al., 2003; Mao et al., 2016]. To reduce the uncertainty in side information, [Huang et al., 2010] and [Wang et al., 2012] try a selection strategy among all target neighbors. It is more reasonable to assume that there are inaccuracies in feature value collection, since feature inaccuracies or noises will damage the structure of neighbors, and consequently affect the reliability of side information.
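The object being learned above, written out as a worked sketch: a Mahalanobis metric d_M(x, y) = √((x−y)ᵀM(x−y)) with M positive semidefinite. Parameterizing M = LᵀL keeps it PSD; the L below is a toy stand-in, not the output of the DRIFT optimization.

```python
# Mahalanobis distance with M = L^T L (guaranteed nonnegative).
import numpy as np

def mahalanobis(x, y, L):
    diff = L @ (x - y)
    return float(np.sqrt(diff @ diff))

L = np.diag([2.0, 1.0, 0.5])   # pure feature re-weighting as a simple case
print(mahalanobis(np.zeros(3), np.ones(3), L))
```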
  • Unsupervised Domain Adaptation with Similarity Learning
    Unsupervised Domain Adaptation with Similarity Learning
    Pedro O. Pinheiro, Element AI, Montreal, QC, Canada. [email protected]

    Abstract: The objective of unsupervised domain adaptation is to leverage features from a labeled source domain and learn a classifier for an unlabeled target domain, with a similar but different data distribution. Most deep learning approaches to domain adaptation consist of two steps: (i) learn features that preserve a low risk on labeled samples (source domain) and (ii) make the features from both domains as indistinguishable as possible, so that a classifier trained on the source can also be applied to the target domain. In general, the classifiers in step (i) consist of fully-connected layers applied directly on the indistinguishable features learned in (ii). In this paper, we propose a different way to do the classification, using similarity learning. The proposed method learns a pairwise similarity function in which classification can be performed by computing similarity between prototype representations of each category. The domain-invariant features and the categorical prototype representations are learned jointly and in an end-to-end fashion. At inference time, images from the target domain are compared to the prototypes and the label of the most similar prototype is assigned to the test image.

    [Figure 1 caption:] Illustrative example of our similarity-based classifier. A query target-domain image representation is compared to a set of prototypes, one per category, computed from source-domain images. The label of the most similar prototype is assigned to the test image.
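A sketch of the inference step the abstract and figure describe: a target embedding f(x) is labeled with the class of its most similar source-domain prototype μ_c. In the paper the embedding and prototypes come from jointly trained networks; here they are toy arrays, and cosine is our assumed similarity.

```python
# Prototype-based classification sketch for the target domain.
import numpy as np

def classify_by_prototype(f_x, prototypes):
    sims = prototypes @ f_x / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(f_x) + 1e-12)
    return int(np.argmax(sims))        # index of the most similar prototype

prototypes = np.eye(3, 5)              # three toy class prototypes in 5-d
print(classify_by_prototype(np.array([0.9, 0.1, 0.0, 0.0, 0.0]), prototypes))
```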
  • Algorithms for Similarity Relation Learning from High Dimensional Data (PhD Dissertation)
    University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
    mgr Andrzej Janusz
    Algorithms for Similarity Relation Learning from High Dimensional Data
    PhD dissertation. Supervisor: Prof. dr hab. Nguyen Hung Son, Institute of Mathematics, University of Warsaw. October 2013

    Author's declaration: aware of legal responsibility I hereby declare that I have written this dissertation myself and all the contents of the dissertation have been obtained by legal means. October 31, 2013, mgr Andrzej Janusz

    Supervisor's declaration: the dissertation is ready to be reviewed. October 31, 2013, Prof. dr hab. Nguyen Hung Son

    Abstract: The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection and planning [7, 22, 57, 89, 153, 166]. Moreover, in domains such as information retrieval or case-based reasoning, the concept of similarity is essential as it is used at every phase of the reasoning cycle [1]. The similarity itself, however, is a very complex concept that slips out from formal definitions. The similarity of two objects can be different depending on the considered context. In many practical situations it is difficult even to evaluate the quality of similarity assessments without considering the task for which they were performed. Due to this fact the similarity should be learnt from data, specifically for the task at hand. In this dissertation a similarity model, called Rule-Based Similarity, is described and an algorithm for constructing this model from available data is proposed. The model utilizes notions from rough set theory [108, 110, 113, 114, 115] to derive a similarity function that allows to approximate the similarity relation in a given context.
  • Supervised Learning with Similarity Functions
    Supervised Learning with Similarity Functions
    Purushottam Kar (Indian Institute of Technology Kanpur, India, [email protected]) and Prateek Jain (Microsoft Research Lab, Bangalore, India, [email protected])

    Abstract: We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking, where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.

    1 Introduction: The goal of this paper is to develop an extended framework for supervised learning with similarity functions.
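The landmarking technique the abstract references can be sketched in a few lines: pick landmark points, map every example x to the vector of similarities [K(x, l₁), …, K(x, lₘ)], and fit any linear predictor on top; K need not be PSD. The random landmark choice and least-squares learner below are our assumptions, not the paper's algorithm.

```python
# Landmarking sketch: similarity features plus a linear predictor.
import numpy as np

def landmark_features(X, landmarks, K):
    return np.array([[K(x, l) for l in landmarks] for x in X])

K = lambda a, b: np.exp(-np.abs(a - b).sum())        # an indefinite similarity
rng = np.random.default_rng(2)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
landmarks = X[rng.choice(200, size=20, replace=False)]   # 20 random landmarks
Phi = landmark_features(X, landmarks, K)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # linear predictor on Phi
print("train predictions:", (Phi @ w)[:3])
```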
  • Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features
    Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features
    by Pradeep Muthukrishnan
    A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan, 2011.
    Doctoral Committee: Professor Dragomir Radkov Radev, Chair; Associate Professor Steven P. Abney; Assistant Professor Honglak Lee; Assistant Professor Qiaozhu Mei; Assistant Professor Zeeshan Syed
    © Pradeep Muthukrishnan 2011. All Rights Reserved. Dedicated to my parents.

    Acknowledgements: First and foremost, I am truly grateful to my advisor Dragomir Radev. This thesis would not have been possible without him and without the support he has given me over the past few years. His constant guidance, motivation and patience have been essential in my development as a scientist. Also, I would like to thank Qiaozhu Mei for many insightful discussions and feedback that helped shape and improve this work. I would also like to thank the other members of my dissertation committee, Steve Abney, Honglak Lee and Zeeshan Syed, for their careful criticism and insightful feedback that improved the quality of this work. I would like to thank all members of the Computational Linguistics and Information Retrieval (CLAIR) group at the University of Michigan, Vahed Qazvinian, Ahmed Hassan, Arzucan Ozgur, and Amjad Abu-Jbara, for being great colleagues and for many insightful discussions that we had together. Special thanks to my roommates Kumar Sricharan and Gowtham Bellala for the innumerable late-night discussions on my research. My heartfelt thanks to all my friends who have been family for me here in Ann Arbor, especially Avani Kamath, Tejas Jayaraman and Gandharv Kashinath.
  • Coupled Attribute Similarity Learning on Categorical Data
    Coupled Attribute Similarity Learning on Categorical Data
    Can Wang, Xiangjun Dong, Fei Zhou, Longbing Cao, Senior Member, IEEE, and Chi-Hung Chi
    IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, No. 4, April 2015, p. 781

    Abstract: Attribute independence has been taken as a major assumption in the limited research that has been conducted on similarity analysis for categorical data, especially unsupervised learning. However, in real-world data sources, attributes are more or less associated with each other in terms of certain coupling relationships. Accordingly, recent works on attribute dependency aggregation have introduced the co-occurrence of attribute values to explore attribute coupling, but they only present a local picture in analyzing categorical data similarity. This is inadequate for deep analysis, and the computational complexity grows exponentially when the data scale increases. This paper proposes an efficient data-driven similarity learning approach that generates a coupled attribute similarity measure for nominal objects with attribute couplings to capture a global picture of attribute similarity. It involves the frequency-based intra-coupled similarity within an attribute and the inter-coupled similarity upon value co-occurrences between attributes, as well ...

    Index Terms: Clustering, coupled attribute similarity, coupled object analysis, similarity analysis, unsupervised learning.

    I. Introduction: Similarity analysis has been a problem of great practical importance in several domains for decades, not least in recent work, including behavior analysis [1], document analysis [2], and image analysis [3]. A typical aspect of these applications is clustering, in which the similarity is usually defined in terms of one of the following levels: 1) between clusters; 2) between attributes; 3) between data objects; or 4) between attribute values. The similarity between clusters is often built on top of the similarity between data objects, e.g., centroid similarity.
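A small sketch of the frequency-based intra-coupled similarity mentioned in the abstract: two categorical values of the same attribute count as more similar when both occur frequently. The exact formula below follows the paper's intra-coupled measure as we understand it; treat it as illustrative rather than a reference implementation.

```python
# Intra-coupled similarity sketch for two values a, b of one attribute column.
from collections import Counter

def intra_coupled_similarity(column, a, b):
    freq = Counter(column)
    fa, fb = freq[a], freq[b]
    # Assumed form: f(a) * f(b) / (f(a) + f(b) + f(a) * f(b)).
    return (fa * fb) / (fa + fb + fa * fb)

print(intra_coupled_similarity(["red", "red", "blue", "green", "red"],
                               "red", "blue"))
```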
  • Learning Similarities for Linear Classification: Theoretical Foundations and Algorithms
    Learning Similarities for Linear Classification: Theoretical Foundations and Algorithms (Apprentissage de similarités pour la classification linéaire : fondements théoriques et algorithmes)
    Maria-Irina Nicolae. Machine Learning [cs.LG]. Université de Lyon, 2016. English. NNT: 2016LYSES062. HAL Id: tel-01975541, https://tel.archives-ouvertes.fr/tel-01975541. Submitted on 9 Jan 2019.
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    École Doctorale ED488 "Sciences, Ingénierie, Santé". Thesis prepared by Maria-Irina Nicolae to obtain the degree of Doctor of the Université Jean Monnet de Saint-Etienne. Field: Computer Science. Defended on 2 December 2016 at the Laboratoire Hubert Curien, before a jury composed of: Maria-Florina Balcan, Associate Professor, Carnegie Mellon University, Examiner; Antoine Cornuéjols, Professor, AgroParisTech