Heterogeneous Graph Neural Network

Total Page:16

File Type:pdf, Size:1020Kb

Heterogeneous Graph Neural Network Research Track Paper KDD ’19, August 4–8, 2019, Anchorage, AK, USA Heterogeneous Graph Neural Network Chuxu Zhang Dongjin Song Chao Huang University of Notre Dame NEC Laboratories America, Inc. University of Notre Dame, JD Digits [email protected] [email protected] [email protected] Ananthram Swami Nitesh V. Chawla US Army Research Laboratory University of Notre Dame [email protected] [email protected] ABSTRACT SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), Representation learning in heterogeneous graphs aims to pursue August 4–8, 2019, Anchorage, AK, USA. ACM, New York, NY, USA, 11 pages. a meaningful vector representation for each node so as to facili- https://doi.org/10.1145/3292500.3330961 tate downstream applications such as link prediction, personalized recommendation, node classification, etc. This task, however, is challenging not only because of the demand to incorporate het- 1 INTRODUCTION erogeneous structural (graph) information consisting of multiple Heterogeneous graphs (HetG) [26, 27] contain abundant informa- types of nodes and edges, but also due to the need for considering tion with structural relations (edges) among multi-typed nodes as heterogeneous attributes or contents (e:д:, text or image) associ- well as unstructured content associated with each node. For in- ated with each node. Despite a substantial amount of effort has stance, the academic graph in Fig. 1(a) denotes relations between been made to homogeneous (or heterogeneous) graph embedding, authors and papers (write), papers and papers (cite), papers and attributed graph embedding as well as graph neural networks, few venues (publish), etc. Moreover, nodes in this graph carry attributes of them can jointly consider heterogeneous structural (graph) infor- (e:д:, author id) and text (e:д:, paper abstract). Another example mation as well as heterogeneous contents information of each node illustrates user-item relations in the review graph and nodes are effectively. In this paper, we propose HetGNN, a heterogeneous associated with attributes (e:д:, user id), text (e:д:, item description) graph neural network model, to resolve this issue. Specifically, we and image (e:д:, item picture). This ubiquity of HetG has led to an first introduce a random walk with restart strategy to samplea influx of research on corresponding graph mining methods and fixed size of strongly correlated heterogeneous neighbors for each algorithms such as relation inference [2, 25, 33, 35], personalized node and group them based upon node types. Next, we design a recommendation [10, 23], node classification [36], etc. neural network architecture with two modules to aggregate feature Traditionally, a variety of these HetG tasks have relied on fea- information of those sampled neighboring nodes. The first module ture vectors derived from a manual feature engineering tasks. This encodes “deep” feature interactions of heterogeneous contents and requires specifications and computation of different statistics or generates content embedding for each node. The second module properties about the HetG as a feature vector for downstream ma- aggregates content (attribute) embeddings of different neighboring chine learning or analytic tasks. However, this can be very limiting groups (types) and further combines them by considering the im- and not generalizable. More recently, there has been an emergence pacts of different groups to obtain the ultimate node embedding. of representation learning approaches to automate the feature engi- Finally, we leverage a graph context loss and a mini-batch gradient neering tasks, which can then facilitate a multitude of downstream descent procedure to train the model in an end-to-end manner. Ex- machine learning or analytic tasks. Beginning with homogeneous tensive experiments on several datasets demonstrate that HetGNN graphs [6, 20, 29], graph representation learning has been expanded can outperform state-of-the-art baselines in various graph mining to heterogeneous graphs [1, 4], attributed graphs [15, 34] as well tasks, i:e:, link prediction, recommendation, node classification & as specific graphs [22, 28]. For instance, the “shallow” models, e:д:, clustering and inductive node classification & clustering. DeepWalk [20], were initially developed to feed a set of short ran- dom walks over the graph to the SkipGram model [19] so as to KEYWORDS approximate the node co-occurrence probability in these walks Heterogeneous graphs, Graph neural networks, Graph embedding and obtain node embeddings. Subsequently, semantic-aware ap- proaches, e:д:, metapath2vec [4], were proposed to address node ACM Reference Format: and relation heterogeneity in heterogeneous graphs. In addition, Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh content-aware approaches, e:д:, ASNE [15], leveraged both “latent” V. Chawla. 2019. Heterogeneous Graph Neural Network. In The 25th ACM features and attributes to learn node embeddings in the graph. Publication rights licensed to ACM. ACM acknowledges that this contribution was These methods learn node “latent” embeddings directly, but are authored or co-authored by an employee, contractor or affiliate of the United States limited in capturing the rich neighborhood information. The Graph government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes Neural Networks (GNNs) employ deep neural networks to aggre- only. gate feature information of neighboring nodes, which makes the KDD ’19, August 4–8, 2019, Anchorage, AK, USA aggregated embedding more powerful. In addition, the GNNs can © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6201-6/19/08...$15.00 be naturally applied to inductive tasks involving nodes that are not https://doi.org/10.1145/3292500.3330961 present in the training period. For instance, GCN [12], GraphSAGE 793 Research Track Paper KDD ’19, August 4–8, 2019, Anchorage, AK, USA Table 1: Model comparison: (1) RL - representation learning? (a) author paper venue user item (2) HG - heterogeneous graph? (3) C - content aware? (4) HC a1 p1 u1 i1 - heterogeneous contents aware? (5) I - inductive inference? v u DW MP2V ASNE SHNE GSAGE GAT a2 p2 1 2 i2 Property HetGNN [20] [4] [15] [34] [7] [31] a p u3 i3 3 3 v2 RL 3 3 3 3 3 3 3 a4 p4 u4 i4 HG 7 3 7 3 7 7 3 academic graph review graph C 7 7 3 3 3 3 3 (b) HC 7 7 3 7 3 3 3 attributes I 7 7 7 7 3 3 3 heterogeneous graph type-1 text d . c attributes same feature transformation function for all node types as their type-2 f a image contents vary from each other. Thus challenge 2 is: how to de- b e . sign node content encoder for addressing content heterogeneity of . different nodes in HetG, as indicated by C2 in Figure 1(b)? g text • (C3) Different types of neighbors contribute differently tothe type-k image node embeddings in HetG. For example, in the academic graph . of Figure 1(a), author and paper neighbors should have more influence on the embedding of author node as a venue node C1 C2 C3 contains diverse topics thus has more general embedding. Most Figure 1: (a) HetG examples: an academic graph and a review of current GNNs mainly focus on homogeneous graphs and do not graph. (b) Challenges of graph neural network for HetG: C1 consider node type impact. Thus challenge 3 is: how to aggregate - sampling heterogeneous neighbors (for node a in this case, feature information of heterogeneous neighbors by considering the node colors denote different types); C2 - encoding heteroge- impacts of different node types, as indicated by C3 in Figure 1(b). neous contents; C3 - aggregating heterogeneous neighbors. To solve these challenges, we propose HetGNN, a heterogeneous [7], and GAT [31] employ convolutional operator, LSTM architec- graph neural network model for representation learning in HetG. ture, and self-attention mechanism to aggregate feature information First, we design a random walk with restart based strategy to sample of neighboring nodes, respectively. The advances and applications fixed size strongly correlated heterogeneous neighbors ofeach of GNNs are largely concentrated on homogeneous graphs. Current node in HetG and group them according to node types. Next, we state-of-the-art GNNs have not well solved the following challenges design a heterogeneous graph neural network architecture with two faced for HetG, which we address in this paper. modules to aggregate feature information of sampled neighbors in previous step. The first module employs recurrent neural network • (C1) Many nodes in HetG may not connect to all types of neigh- to encode “deep” feature interactions of heterogeneous contents bors. In addition, the number of neighboring nodes varies from and obtains content embedding of each node. The second module node to node. For example, in Figure 1(a), any author node has utilizes another recurrent neural network to aggregate content no direct connection to a venue node. Meanwhile, in Figure 1(b), embeddings of different neighboring groups, which are further node a has 5 direct neighbors while node c only has 2. Most combined by an attention mechanism for measuring the different existing GNNs only aggregate feature information of direct (first- impacts of heterogeneous node types and obtaining the ultimate order) neighboring nodes and the feature propagation process node embedding. Finally, we leverage a graph context loss and may weaken the effect of farther neighbors. Moreover, the embed- a mini-batch gradient descent procedure to train the model. To ding generation of “hub” node is impaired by weakly correlated summarize, the main contributions of our work are: neighbors (“noise” neighbors) and the embedding of “cold-start” node is not sufficiently represented due to limited neighbor infor- • We formalize the problem of heterogeneous graph representation mation. Thus challenge 1 is: how to sample heterogeneous neighbors learning which involves both graph structure heterogeneity and that are strongly correlated to embedding generation for each node node content heterogeneity.
Recommended publications
  • Biased Edge Dropout for Enhancing Fairness in Graph Representation
    PREPRINT SUBMITTED TO A JOURNAL. 1 Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning Indro Spinelli, Graduate Student Member, IEEE, Simone Scardapane, Amir Hussain, Senior Member, IEEE and Aurelio Uncini, Member, IEEE Abstract—Graph representation learning has become a ubiqui- I. INTRODUCTION tous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applica- RAPH structured data, ranging from friendships on tions, ensuring the fairness of the node (or graph) representations G social networks to physical links in energy grids, powers with respect to some protected attributes is crucial for their many algorithms governing our digital life. Social networks correct deployment. Yet, fairness in graph deep learning remains under-explored, with few solutions available. In particular, the topologies define the stream of information we will receive, tendency of similar nodes to cluster on several real-world often influencing our opinion [1][2][3][4]. Bad actors, some- graphs (i.e., homophily) can dramatically worsen the fairness times, define these topologies ad-hoc to spread false informa- of these procedures. In this paper, we propose a biased edge tion [5]. Similarly, recommender systems [6] suggest products dropout algorithm (FairDrop) to counter-act homophily and tailored to our own experiences and history of purchases. improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, However, pursuing the highest accuracy as the only metric of adaptable, and can be combined with other fairness-inducing interest has let many of these algorithms discriminate against solutions. After describing the general algorithm, we demonstrate minorities in the past [7][8][9], despite the law prohibiting its application on two benchmark tasks, specifically, as a random unfair treatment based on sensitive traits such as race, religion, walk model for producing node embeddings, and to a graph and gender.
    [Show full text]
  • Sustainability Report 2018 NEC Sustainability Report 2018
    Sustainability Report 2018 NEC Sustainability Report 2018 Table of Contents 73 Supply Chain Management 80 Ensuring Quality and Safety 2 Message from the President Social 84 Respecting Human Rights Sustainable Management 91 Personal Information Protection and Privacy 99 Diversity and Inclusion 3 Sustainable Management 107 Creating a Diverse Work Style Environment 7 Priority Management Themes 114 Human Resources Development and Training from an ESG Perspective – Materiality 120 Health and Safety 14 Dialogue Sessions on Materiality with Experts 20 Dialogue and Co-creation with Our Stakeholders 21 Dialogue with Our Diverse Stakeholders Environment – Case Examples 26 CS (Customer Satisfaction) Initiative 126 Environmental Management Initiatives 30 Cooperation with the Local Communities 35 Innovation Management 45 External Ratings and Evaluation Reference Information 133 GRI (Global Reporting Initiative) Index 144 Global Compact 145 ISO26000 Governance 147 Third-party Assurance 148 Information Disclosure Policy 48 Corporate Governance 150 Data Collection 50 Risk Management 50 Compliance and Risk Management 55 Basic Approach on Tax Matters 56 Promoting Fair Commercial Transactions 60 Business Continuity 67 Information Security and Cyber Security NEC is a signatory to the United Nations Global Compact. 1 NEC Sustainability Report 2018 Message from the President Becoming a company that is embraced by and essential to society Since its founding in 1899, NEC has been providing products and services that are centered on IT and networks under the motto of “Better Products, Better Services” as the company contributes to customers and society. In 2014, we created the Brand Statement “Orchestrating a brighter world” and have been promoting business that originates from efforts to address important social issues.
    [Show full text]
  • Combining Collective Classification and Link Prediction
    Combining Collective Classification and Link Prediction Mustafa Bilgic, Galileo Mark Namata, Lise Getoor Dept. of Computer Science Univ. of Maryland College Park, MD 20742 {mbilgic, namatag, getoor}@cs.umd.edu Abstract however, this is rarely the case. Real world collections usu- ally have a number of missing and incorrect labels and links. The problems of object classification (labeling the nodes Other approaches construct a complex joint probabilistic of a graph) and link prediction (predicting the links in model, capable of handling missing values, but in which a graph) have been largely studied independently. Com- inference is often intractable. monly, object classification is performed assuming a com- In this paper, we take the middle road. We propose a plete set of known links and link prediction is done assum- simple yet general approach for combining object classifi- ing a fully observed set of node attributes. In most real cation and link prediction. We propose an algorithm called world domains, however,attributes and links are often miss- Iterative Collective Classification and Link Prediction (IC- ing or incorrect. Object classification is not provided with CLP) that integrates collective object classification and link all the links relevant to correct classification and link pre- prediction by effectively passing up-to-date information be- diction is not provided all the labels needed for accurate tween the algorithms. We experimentally show on many link prediction. In this paper, we propose an approach that different network types that applying ICCLP improves per- addresses these two problems by interleaving object clas- formance over running collective classification and link pre- sification and link prediction in a collective algorithm.
    [Show full text]
  • H-1B Petition Approvals for Initial Benefits by Employers FY07
    NUMBER OF H-1B PETITIONS APPROVED BY USCIS FOR INITIAL BENEFICIARIES FY 2007 Approved Employer Petitions INFOSYS TECHNOLOGIES LIMITED 4,559 WIPRO LIMITED 2,567 SATYAM COMPUTER SERVICES LTD 1,396 COGNIZANT TECH SOLUTIONS US CORP 962 MICROSOFT CORP 959 TATA CONSULTANCY SERVICES LIMITED 797 PATNI COMPUTER SYSTEMS INC 477 US TECHNOLOGY RESOURCES LLC 416 I-FLEX SOLUTIONS INC 374 INTEL CORPORATION 369 ACCENTURE LLP 331 CISCO SYSTEMS INC 324 ERNST & YOUNG LLP 302 LARSEN & TOUBRO INFOTECH LIMITED 292 DELOITTE & TOUCHE LLP 283 GOOGLE INC 248 MPHASIS CORPORATION 248 UNIVERSITY OF ILLINOIS AT CHICAGO 246 AMERICAN UNIT INC 245 JSMN INTERNATIONAL INC 245 OBJECTWIN TECHNOLOGY INC 243 DELOITTE CONSULTING LLP 242 PRINCE GEORGES COUNTY PUBLIC SCHS 238 JPMORGAN CHASE & CO 236 MOTOROLA INC 234 MARLABS INC 229 KPMG LLP 227 GOLDMAN SACHS & CO 224 TECH MAHINDRA AMERICAS INC 217 VERINON TECHNOLOGY SOLUTIONS LTD 213 THE JOHNS HOPKINS MED INSTS OIS 205 YASH TECHNOLOGIES INC 202 ADVANSOFT INTERNATIONAL INC 201 UNIVERSITY OF MARYLAND 199 BALTIMORE CITY PUBLIC SCHOOLS 196 PRICEWATERHOUSECOOPERS LLP 192 POLARIS SOFTWARE LAB INDIA LTD 191 UNIVERSITY OF MICHIGAN 191 EVEREST BUSINESS SOLUTIONS INC 190 IBM CORPORATION 184 APEX TECHNOLOGY GROUP INC 174 NEW YORK CITY PUBLIC SCHOOLS 171 SOFTWARE RESEARCH GROUP INC 167 EVEREST CONSULTING GROUP INC 165 UNIVERSITY OF PENNSYLVANIA 163 GSS AMERICA INC 160 QUALCOMM INCORPORATED 158 UNIVERSITY OF MINNESOTA 151 MASCON GLOBAL CONSULTING INC 150 MICRON TECHNOLOGY INC 149 THE OHIO STATE UNIVERSITY 147 STANFORD UNIVERSITY 146 COLUMBIA
    [Show full text]
  • Finding Experts by Link Prediction in Co-Authorship Networks
    Finding Experts by Link Prediction in Co-authorship Networks Milen Pavlov1,2,RyutaroIchise2 1 University of Waterloo, Waterloo ON N2L 3G1, Canada 2 National Institute of Informatics, Tokyo 101-8430, Japan Abstract. Research collaborations are always encouraged, as they of- ten yield good results. However, the researcher network contains massive amounts of experts in various disciplines and it is difficult for the indi- vidual researcher to decide which experts will match his own expertise best. As a result, collaboration outcomes are often uncertain and research teams are poorly organized. We propose a method for building link pre- dictors in networks, where nodes can represent researchers and links - collaborations. In this case, predictors might offer good suggestions for future collaborations. We test our method on a researcher co-authorship network and obtain link predictors of encouraging accuracy. This leads us to believe our method could be useful in building and maintaining strong research teams. It could also help with choosing vocabulary for expert de- scription, since link predictors contain implicit information about which structural attributes of the network are important with respect to the link prediction problem. 1 Introduction Collaborations between researchers often have a synergistic effect. The combined expertise of a group of researchers can often yield results far surpassing the sum of the individual researchers’ capabilities. However, creating and organizing such research teams is not a straightforward task. The individual researcher often has limited awareness of the existence of other researchers with which collaborations might prove fruitful. Further- more, even in the presence of such awareness, it is difficult to predict in advance which potential collaborations should be pursued.
    [Show full text]
  • Learning to Make Predictions on Graphs with Autoencoders
    Learning to Make Predictions on Graphs with Autoencoders Phi Vu Tran Strategic Innovation Group Booz j Allen j Hamilton San Diego, CA USA [email protected] Abstract—We examine two fundamental tasks associated networks [38], communication networks [11], cybersecurity with graph representation learning: link prediction and semi- [6], recommender systems [16], and knowledge bases such as supervised node classification. We present a novel autoencoder DBpedia and Wikidata [35]. architecture capable of learning a joint representation of both local graph structure and available node features for the multi- There are a number of technical challenges associated with task learning of link prediction and node classification. Our learning to make meaningful predictions on complex graphs: autoencoder architecture is efficiently trained end-to-end in a single learning stage to simultaneously perform link prediction • Extreme class imbalance: in link prediction, the number and node classification, whereas previous related methods re- of known present (positive) edges is often significantly quire multiple training steps that are difficult to optimize. We less than the number of known absent (negative) edges, provide a comprehensive empirical evaluation of our models making it difficult to reliably learn from rare examples; on nine benchmark graph-structured datasets and demonstrate • Learn from complex graph structures: edges may be significant improvement over related methods for graph rep- resentation learning. Reference code and data are available at directed or undirected, weighted or unweighted, highly https://github.com/vuptran/graph-representation-learning. sparse in occurrence, and/or consisting of multiple types. Index Terms—network embedding, link prediction, semi- A useful model should be versatile to address a variety supervised learning, multi-task learning of graph types, including bipartite graphs; • Incorporate side information: nodes (and maybe edges) I.
    [Show full text]
  • Predicting Disease Related Microrna Based on Similarity and Topology
    cells Article Predicting Disease Related microRNA Based on Similarity and Topology Zhihua Chen 1,†, Xinke Wang 2,†, Peng Gao 2,†, Hongju Liu 3,† and Bosheng Song 4,* 1 Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China; [email protected] 2 School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; [email protected] (X.W.); [email protected] (P.G.) 3 College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines; [email protected] 4 School of Information Science and Engineering, Hunan University, Changsha 410082, China * Correspondence: [email protected]; Tel.: +86-731-88821907 † All authors contributed equally to this work. Received: 4 July 2019; Accepted: 5 November 2019; Published: 7 November 2019 Abstract: It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease–miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms.
    [Show full text]
  • Representation Learning on Graphs
    Representation learning on graphs A/Prof Truyen Tran [email protected] @truyenoz Deakin University truyentran.github.io letdataspeak.blogspot.com goo.gl/3jJ1O0 HCMC, May 2019 11/05/2019 1 Graph representation Graph reasoning Why bother? Graph generation Embedding Graph dynamics Message passing 11/05/2019 2 Why learning of graph representation? Graphs are pervasive in many scientific disciplines. The sub-area of graph representation has reached a certain maturity, with multiple reviews, workshops and papers at top AI/ML venues. Deep learning needs to move beyond vector, fixed-size data. Learning representation as a powerful way to discover hidden patterns making learning, inference and planning easier. 11/05/2019 3 System medicine 11/05/2019 https://www.frontiersin.org/articles/10.3389/fphys.2015.00225/full 4 Biology & pharmacy Traditional techniques: Graph kernels (ML) Molecular fingerprints (Chemistry) Modern techniques Molecule as graph: atoms as nodes, chemical bonds as edges #REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X- ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90. 11/05/2019 5 Chemistry DFT = Density Functional Theory Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017). • Molecular properties • Chemical-chemical interaction • Chemical reaction • Synthesis planning 11/05/2019 6 Materials science • Crystal properties • Exploring/generating solid structures • Inverse design Xie, Tian, and Jeffrey
    [Show full text]
  • RFP Response to Region 10 ESC
    An NEC Solution for Region 10 ESC Building and School Security Products and Services RFP #EQ-111519-04 January 17, 2020 Submitted By: Submitted To: Lainey Gordon Ms. Sue Hayes Vertical Practice – State and Local Chief Financial Officer Government Region 10 ESC Enterprise Technology Services (ETS) 400 East Spring Valley Rd. NEC Corporation of America Richardson, TX 75081 Cell: 469-315-3258 Office: 214-262-3711 Email: [email protected] www.necam.com 1 DISCLAIMER NEC Corporation of America (“NEC”) appreciates the opportunity to provide our response to Education Service Center, Region 10 (“Region 10 ESC”) for Building and School Security Products and Services. While NEC realizes that, under certain circumstances, the information contained within our response may be subject to disclosure, NEC respectfully requests that all customer contact information and sales numbers provided herein be considered proprietary and confidential, and as such, not be released for public review. Please notify Lainey Gordon at 214-262-3711 promptly upon your organization’s intent to do otherwise. NEC requests the opportunity to negotiate the final terms and conditions of sale should NEC be selected as a vendor for this engagement. NEC Corporation of America 3929 W John Carpenter Freeway Irving, TX 75063 http://www.necam.com Copyright 2020 NEC is a registered trademark of NEC Corporation of America, Inc. 2 Table of Contents EXECUTIVE SUMMARY ...................................................................................................................................
    [Show full text]
  • Khanamk2021m-1A.Pdf (2.656Mb)
    Lakehead University Knowledge Commons,http://knowledgecommons.lakeheadu.ca Electronic Theses and Dissertations Electronic Theses and Dissertations from 2009 2021 Using homophily to analyze and develop link prediction models with deep learning framework Khanam, Kazi Zainab https://knowledgecommons.lakeheadu.ca/handle/2453/4829 Downloaded from Lakehead University, KnowledgeCommons Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework by Kazi Zainab Khanam A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Faculty of Science and Environmental Studies of Lakehead University, Thunder Bay Committee in charge: Dr. Vijay Mago (Principal Supervisor) Dr. Rajesh Sharma (External Examiner) Dr. Yimin Yang (Internal Examiner) Winter 2021 The thesis of Kazi Zainab Khanam, titled Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework, is approved: Chair Date Date Date Lakehead University, Thunder Bay Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework Copyright 2021 by Kazi Zainab Khanam 1 Abstract USING HOMOPHILY TO ANALYZE AND DEVELOP LINK PREDICTION MODELS WITH DEEP LEARNING FRAMEWORK Twitter is a prominent social networking platform where users’ short messages or “tweets” are often used for analysis. However, there has not been much attention paid to mining the medical professions, such as detecting users’ occupations from their biographical content. Mining such information can be useful to build recommender systems for cost-effective ad- vertisements. Conventional classifiers can be used to predict medical occupations, but they tend to perform poorly as there are a variety of occupations. As a result, the main focus of the research is to use various deep learning techniques to examine the textual properties of Twitter users’ biographic contents, network properties, and the impact of homophily of Twitter users employed in medical professional fields.
    [Show full text]
  • Neural Variational Inference for Embedding Knowledge Graphs
    University College London MSc Data Science and Machine Learning Neural Variational Inference For Embedding Knowledge Graphs Author: Supervisor: Alexander Cowen-Rivers Prof. Sebastian Riedel Disclaimer This report is submitted as part requirement for the MSc Degree in Data Science and Machine Learning at University College London. It is substantially the result of my own work except where explicitly indicated in the text. The report may be freely copied and distributed provided the source is explicitly acknowledged.The report may be freely copied and distributed provided the source is explicitly acknowledged. Machine Reading Group October 23, 2018 i \Probability is expectation founded upon partial knowledge. A perfect acquaintance with all the circumstances affecting the occurrence of an event would change expectation into certainty, and leave nether room nor demand for a theory of probabilities." George Boole ii Acknowledgements I would like to thank my supervisors Prof Sebastian Riedel, as well as the other Machine Reading group members, particularly Dr Minervini, who helped through comments and discussions during the writing of this thesis. I would also like to thank my father who nurtured my passion for mathematics. Lastly, I would like to thank Thomas Kipf from the University of Amsterdam, who cleared up some queries I had regarding some of his recent work. iii Abstract Alexander Cowen-Rivers Neural Variational Inference For Embedding Knowledge Graphs Statistical relational learning investigates the development of tools to study graph-structured data. In this thesis, we provide an introduction on how models of the world are learnt using knowledge graphs of known facts of information, later applied to infer new facts about the world.
    [Show full text]
  • A Scalable and Distributed Actor-Based Version of the Node2vec Algorithm
    Workshop "From Objects to Agents" (WOA 2019) A Scalable and Distributed Actor-Based Version of the Node2Vec Algorithm Gianfranco Lombardo Agostino Poggi Department of Engineering and Architecture Department of Engineering and Architecture University of Parma University of Parma Parma, Italy Parma, Italy [email protected] [email protected] Abstract—The analysis of systems that can be modeled as attributed networks: an interaction network and a friendship networks of interacting entities is becoming often more important network. In [3] the authors analyze the same community in in different research fields. The application of machine learning order to extract the giant component without using topology algorithms, like prediction tasks over nodes and edges, requires a manually feature extraction or a learning task to extract them information with a bio-inspired algorithm. In [4] the authors automatically (embedding techniques). Several approaches have uses a temporal network to model the USA financial market in been proposed in the last years and the most promising one order to discover correlations among the dynamics of stocks’ is represented by the Node2Vec algorithm. However, common cluster and to predict economic crises. In [5] the authors limitations of graph embedding techniques are related to memory modeled the interaction between proteins as a network, with requirements and to the time complexity. In this paper, we propose a scalable and distributed version of this algorithm called the aim of automatic predicting a correct label for each protein ActorNode2vec, developed on an actor-based architecture that describing its functionalities. In light of this, this formulation allows to overcome these kind of constraints.
    [Show full text]