Extended Discriminative Random Walk: a Hypergraph Approach to Multi-View Multi-Relational Transductive Learning

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Sai Nageswar Satchidanand, Harini Ananthapadmanaban, Balaraman Ravindran
Indian Institute of Technology Madras, Chennai, India

Abstract

Transductive inference on graphs has been garnering increasing attention due to the connected nature of many real-life data sources, such as online social media and biological data (protein-protein interaction networks, gene networks, etc.). Typically, relational information in the data is encoded as edges in a graph, but often it is important to model multi-way interactions, such as in collaboration networks and reaction networks. In this work we model multi-way relations as hypergraphs and extend the discriminative random walk (DRW) framework, originally proposed for transductive inference on single graphs, to the case of multiple hypergraphs. We use the extended DRW framework for inference on multi-view, multi-relational data in a natural way, by representing attribute descriptions of the data also as hypergraphs. We further exploit the structure of hypergraphs to modify the random walk operator to take into account class imbalance in the data. This work is among very few approaches to explicitly address class imbalance in the in-network classification setting using random walks. We compare our approach to methods proposed for inference on hypergraphs and to methods proposed for multi-view data, and show that empirically we achieve better performance. We also compare to methods specifically tailored for class-imbalanced data and show that our approach achieves comparable performance even on non-network data.

1 Introduction

With the advent of technology for easy generation and storage, data sources have become increasingly rich in detail. Depending on the nature of the data, this poses several challenges to machine learning, and consequently several classes of solutions have emerged. Our goal in this work is to bring together different strands of ideas to develop a unified framework for various transductive learning problems on partially labeled networked data. Due to the connected nature of many real-life data sources, the problem of within network classification has become an active area of research in recent times [Zhu and Goldberg, 2009]. In this setting, the data instances are treated as nodes in a graph and the links represent relations between the nodes. Given a small labeled set, the goal is to infer labels for the other nodes in the graph. This is an instance of transductive inference, since the labeled and unlabeled data together with the graph structure are used in the inference procedure [Chakrabarti et al., 1998; Castillo et al., 2007; Domingos and Richardson, 2001].

Many of the learning approaches assume a pair-wise relation between the nodes, which translates to edges in the graph. In this work we are interested in data that have multi-way relations between the instances. For example, the co-author relation is naturally a multi-way relation. Such multi-way relations can be modeled using hypergraphs, in which the edges are subsets of nodes. The use of hypergraphs enables several extensions to the basic within network classification setting, and such extensions are the key contributions of this work.

Hypergraph based modeling for machine learning has garnered some interest recently [Yu et al., 2012; Gao et al., 2011; Sun et al., 2008]. In particular, Zhou and Schölkopf [Zhou et al., 2006] extended spectral clustering methods for graphs to hypergraphs and further developed a transductive inference setup for embedding, i.e., labelling a partially labeled hypergraph. In this approach the hyperedge being cut is considered as a clique, with the weight of the hyperedge distributed uniformly over all sub-edges of the clique. The spectral formulation then tries to minimize the total number of these sub-edges across the cut, using a normalized hypergraph cut objective that penalises unbalanced cuts.

In the case of many sources of connected data, such as online social networks and biological networks, in addition to the relational structure there is rich attribute information as well. This has led to the development of collective learning and inference approaches that work with such attributed graphs [Desrosiers and Karypis, 2009; Sen et al., 2008]. Collective classification approaches like the Iterative Classification Algorithm (ICA) [Sen et al., 2008] use an augmented description of the data where the class distribution in the neighbourhood of a node is treated as additional features. These work well in situations where there is sufficient labeled data to train a classifier well. Such methods can also be generalized beyond a transductive setting, but that is not of relevance to this work.

Another source of richness in data is the availability of multiple descriptions of the same data. For example, to classify videos on YouTube we can construct multiple views, such as attributes from the video, attributes from the speech/sound in the video, a text corpus from the text description of the video, etc., and several methods have been proposed to take advantage of the same [Xu et al., 2013; Sun, 2013]. Multi-view methods have been used extensively in a semi-supervised setting with partially labeled training data [Blum and Mitchell, 1998; Sindhwani et al., 2005]. However, handling multi-view data for transductive inference on graphs has not received much attention and there are only a few results, such as [Zhou and Burges, 2007; Shi et al., 2012; Vijayan et al., 2014]. Similarly, the same entities could have different kinds of relations between themselves. “Follows” and “retweets” on Twitter are an example of multiple relations.

One over-arching problem that spans the different settings described above, and inductive learning from data in general, is that of class imbalance. In many real settings, the different classes are seldom distributed uniformly. Different approaches have been proposed for handling class imbalance (e.g. [Cieslak et al., 2012]), but none are satisfactory in the networked data context.

In this work we propose a unified method to address the problems discussed above by extending the discriminative random walk framework (DRW) [Callut et al., 2008; Mantrach et al., 2011]. DRW is one of the most successful approaches for within network classification and is based on transit times in a limited random walk. The method works very well even when the fraction of labeled nodes in the graph is very small. In this work we extend the DRW framework in several significant ways.

• First, we extend the DRW framework to accommodate inference on hypergraphs. We introduce a new random walk operator on hypergraphs and modify the DRW procedure appropriately.

• Second, we modify the random walk operator to handle multiple relations and multiple views. This is accomplished by modeling the attribute descriptions of the data as a hypergraph.

• Third, we account for class imbalance in the network data to a limited extent by appropriately reweighting the hyperedges with a preponderance of minority class points.

[...] graphs and D-Random Walk.

Hypergraphs

Let $G = (V, E)$ be a hypergraph, where $V$ represents a finite set of objects and $E$ the set of hyperedges such that for any $e_i \in E$, $e_i \subseteq V$. Each edge is associated with a weight $w(e)$. For a vertex $v$, the degree of the vertex is $d(v) = \sum_{e \in E,\, v \in e} w(e)$. For a hyperedge $e \in E$, $\delta(e)$ represents the degree of the edge, i.e., $\delta(e) = |e|$. Let $H$ be the hypergraph incidence matrix with $h(v, e) = 1$ if vertex $v$ is in edge $e$ and $0$ otherwise. Let $W$ denote the diagonal weight matrix containing the weights of the hyperedges, $D_v$ the diagonal vertex degree matrix containing the degrees of the vertices, and $D_e$ the diagonal edge degree matrix containing the degrees of the edges. Also, let $n = |V|$ be the total number of instances.

For an attribute view, let $X$ be the $n \times d$ categorical attribute matrix of instances, where $x_i$ represents an attribute vector in the dataset, i.e., a column containing the values of this attribute for all elements of the dataset. Let $L$ be a set of labeled instances, each assigned to a category from a discrete set $Y$. The label of each instance $v \in L$ is written as $y_v$, and $L_y$ denotes the set of nodes in class $y$, with $n_y = |L_y|$.

D-Walks

As proposed in [Callut et al., 2008], bounded random D-walks are a very effective way of classification in a partially labeled graph. For a given set of states $v_0, v_1, \ldots, v_N$ and a class $y \in Y$, a D-walk is a sequence of states $v_0, v_1, \ldots, v_l$ such that $y_{v_0} = y_{v_l} = y$ and $y_{v_t} \neq y$ for all $0 < t < l$. Let $D_l^y$ denote the event of a D-walk of exactly length $l$ starting and ending on a node labeled $y$. For a given unlabeled node $v \in V$, we define $E[pt(v) \mid D_l^y]$, the expected length-limited passage time ($pt(v)$), as the number of times the random walk process reaches node $v$ in a walk of length exactly $l$, as follows:

$$E[pt(v) \mid D_l^y] = \sum_{t=1}^{l-1} P[X_t = v \mid D_l^y] = \sum_{t=1}^{l-1} \frac{P[X_t = v \wedge D_l^y]}{P[D_l^y]} \qquad (1)$$

Now, the D-walk betweenness function for a node $v$ and class $y$ and some maximum walk length $L$ is defined as:
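To make the expected passage time of Equation (1) concrete, the following sketch evaluates E[pt(v) | D_l^y] by brute-force enumeration of every walk of length l on a small ordinary graph. Several choices are assumptions made only for illustration and are not taken from the paper: the walk is assumed to start uniformly at random from the nodes labeled y, the walker is assumed to move to a uniformly chosen neighbour at each step, and the names `expected_passage_time`, `adjacency`, and `labels` are hypothetical. Enumeration is exponential in l, so this is a check of the definition on toy inputs rather than a practical procedure.

```python
from itertools import product


def expected_passage_time(adjacency, labels, target, y, length):
    """Brute-force E[pt(target) | D_l^y] as written in Equation (1).

    adjacency: dict node -> list of neighbours (ordinary graph; an assumption).
    labels: dict node -> class label, for the labeled nodes only.
    """
    nodes = list(adjacency)
    starts = [v for v in nodes if labels.get(v) == y]
    if not starts or length < 1:
        return 0.0

    numerator = 0.0    # sum over t of P[X_t = target and D_l^y]
    denominator = 0.0  # P[D_l^y]
    for walk in product(nodes, repeat=length + 1):
        # A D-walk starts and ends on a node of class y ...
        if labels.get(walk[0]) != y or labels.get(walk[-1]) != y:
            continue
        # ... and never visits a node of class y in between.
        if any(labels.get(v) == y for v in walk[1:-1]):
            continue
        # Probability of this walk: uniform start over L_y (assumed),
        # then uniform moves to neighbours (assumed).
        prob = 1.0 / len(starts)
        for a, b in zip(walk, walk[1:]):
            if b not in adjacency[a]:
                prob = 0.0
                break
            prob /= len(adjacency[a])
        if prob == 0.0:
            continue
        denominator += prob
        # Count visits to the target at the interior steps t = 1 .. l-1.
        numerator += prob * sum(1 for v in walk[1:-1] if v == target)

    return numerator / denominator if denominator > 0 else 0.0


# Toy path graph a - u - b, with a and b labeled "pos" and u unlabeled.
adjacency = {"a": ["u"], "u": ["a", "b"], "b": ["u"]}
labels = {"a": "pos", "b": "pos"}
print(expected_passage_time(adjacency, labels, "u", "pos", 2))  # prints 1.0
```

On this path graph every D-walk of length 2 passes through u exactly once at t = 1, so the conditional expectation is 1. For length 3 no D-walk exists (the two interior steps would both have to sit on u), and the sketch returns 0.0 by convention.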
