Link Prediction Using Supervised Learning ∗

Total Page:16

File Type:pdf, Size:1020Kb

Link Prediction Using Supervised Learning ∗ Link Prediction using Supervised Learning ∗ Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki Rensselaer Polytechnic Institute, Troy, New York 12180 falhasan, chaojv, salems, [email protected] Abstract parameters. But, a comparatively easier problem is to Social network analysis has attracted much attention in re- understand the association between two specific nodes. cent years. Link prediction is a key research direction within Several variations of the above problems make interest- this area. In this paper, we study link prediction as a su- ing research topics. For instance, some of the interesting pervised learning task. Along the way, we identify a set of questions that can be posed are { how does the associa- features that are key to the performance under the super- tion pattern change over time, what are the factors that vised learning setup. The identified features are very easy to drive the associations, how is the association between compute, and at the same time surprisingly effective in solv- two nodes affected by other nodes. The specific prob- ing the link prediction problem. We also explain the effec- lem instance that we address in this research is to pre- tiveness of the features from their class density distribution. dict the likelihood of a future association between two Then we compare different classes of supervised learning al- nodes, knowing that there is no association between the gorithms in terms of their prediction performance using var- nodes in the current state of the graph. This problem ious performance metrics, such as accuracy, precision-recall, is commonly known as the Link Prediction problem. F-values, squared error etc. with a 5-fold cross validation. We use the coauthorship graph from scientific pub- Our results on two practical social network datasets shows lication data for our experiments. We prepare datasets that most of the well-known classification algorithms (deci- from the coauthorship graphs, where each data point sion tree, k-NN, multilayer perceptron, SVM, RBF network) corresponds to a pair of authors, who never coauthored can predict links with comparable performances, but SVM in training years. Depending on the fact whether they outperforms all of them with narrow margin in all perfor- coauthored in the testing year or not, the data point mance measures. Again, ranking of features with popular has either a positive label or a negative label. We ap- feature ranking algorithms shows that a small subset of fea- ply different types of supervised learning algorithms to tures always plays a significant role in link prediction. build binary classifier models that distinguish the set of authors who will coauthor in the testing year from the 1 Introduction and Background rest who will not coauthor. Predicting prospective links in coauthorship graph Social networks are a popular way to model the interac- is an important research direction, since it is identical, tion among the people in a group or community. They both conceptually and structurally to many practical can be visualized as graphs, where a vertex corresponds social network problems. The primary reason is that to a person in some group and an edge represents some a coauthorship network is a true example of social net- form of association between the corresponding persons. work, where the scientists in the community collaborate The associations are usually driven by mutual interests to achieve a mutual goal. Researchers [20] have shown that are intrinsic to a group. However, social networks that this graph also obeys the power-law distribution, are very dynamic objects, since new edges and vertices an important property of a typical social network. To are added to the graph over the time. Understanding name some practical problems that very closely match the dynamics that drives the evolution of social network with the one we study in this research, we consider is a complex problem due to a large number of variable the task of analyzing and monitoring terrorist networks. The objective in analyzing terrorist networks is to con- ∗This material is based upon work funded in whole or in part jecture that particular individuals are working together by the US Government and any opinions, findings, conclusions, even though their interactions cannot be identified from or recommendations expressed in this material are those of the the current information base. Intuitively, we are pre- author(s) and do not necessarily reflect the views of the US dicting hidden links in a social network formed by the Government. This work was supported in part by NSF CAREER Award IIS-0092978, DOE Career Award DE-FG02-02ER25538, group of terrorists. In general, link prediction provides and NSF grants EIA-0103708 and EMT-0432098. a measure of social proximity between two vertices in a social group, which, if known, can be used to optimize ment in the results. Moreover, we compare different an objective function over the entire group, especially machine learning algorithms for the link prediction task. in the domain of collaborative filtering [22], Knowledge Another recent work by Faloutsos et al. [10], al- Management Systems [8], etc. It can also help in mod- though does not directly perform link prediction, is eling the way a disease, a rumor, a fashion or a joke, or worth mentioning in this context. They introduced an an Internet virus propagates via a social network [13]. object, called connection subgraph, which is defined as Our research has the following contributions: a small subgraph that best captures the relationship between two nodes in a social network. They also pro- 1. We explain the procedural aspect of constructing posed efficient algorithm based on electrical circuit laws a machine learning dataset to perform link predic- to find the connection subgraph from large social net- tion. work efficiently. Connection subgraph can be used to ef- fectively compute several topological feature values for 2. We identify a short list of features for link pre- supervised link prediction problem, especially when the diction in a particular domain, specifically, in the network is very large. coauthorship domain. These features are powerful There are many other interesting recent efforts [11, enough to provide remarkable accuracy and general 3, 5] related to social network, but none of these were enough to be applicable in other social network do- targeted explicitly to solve the link prediction problem. mains. They are also very inexpensive to obtain. Nevertheless, experiences and ideas from these papers 3. We experiment with a set of learning algorithms to were helpful in many aspects of this work. Goldenberg et al. [11] used Bayesian Networks to analyze the evaluate their performance in link prediction prob- social network graphs. Baumes et al. [3] used graph lem and perform a comparative analysis among them. clustering approach to identify sub-communities in a social network. Cai et al. [5] used the concept of relation 4. We evaluate each feature; first visually, by compar- network, to project a social network graph into several ing their class density distribution and then algo- relation graphs and mine those graphs to effectively rithmically through some well known feature rank- answer user's queries. In their model, they extensively ing algorithms. used optimization algorithms to find the most optimal combination of existing relations that best match the 2 Related Work user's query. Although most of the early research in social network 3 Data and Experimental Setup has been done by social scientists and psychologists [19], numerous efforts have been made by computer scien- Consider a social network G = hV; Ei in which each edge tists recently. Most of the work has concentrated on e = hu; vi 2 E represents an interaction between u and analyzing the social network graphs [2, 9]. Few efforts v at a particular time t. In our experimental domain have been made to solve the link prediction problem, the interaction is defined as coauthoring a research specially for social network domain. The closest match article. Each article bears, at least, author information with our work is that of D. Liben, et al. [17], where and publication year. To predict a link, we partition the authors extracted several features from the network the range of publication years into two non-overlapping topology of a coauthorship network. Their experiments sub-ranges. The first sub-range is selected as training evaluated the effectiveness of these features for the link years and the later one as the testing years. Then, prediction problem. The effectiveness was judged by we prepare the classification dataset, by choosing those the factor by which the prediction accuracy was im- author pairs, that appeared in the training years, but proved over a random predictor. This work provides an did not publish any papers together in those years. excellent starting point for link prediction as the fea- Each such pair either represents a positive example or a tures they extracted can be used in a supervised learning negative example, depending on whether those author framework to perform link prediction in a more system- pairs published at least one paper in the testing years atic manner. But, they used features based on network or not. Coauthoring a paper in testing years by a pair topology only. We, on the other hand, added several of authors, establishes a link between them, which was non-topological features and found that they improve not there in the training years. Classification model of the accuracy of link prediction substantially. In prac- link prediction problem needs to predict this link by tice, such non-topological data are available (for exam- successfully distinguishing the positive classes from the ple, overlap of interest between two persons) and they dataset. Thus, link prediction problem can be posed should be exploited to achieve a substantial improve- as a binary classification problem, that can be solved by employing effective features in a supervised learning arbitrary scientists x and y from the social network.
Recommended publications
  • Discover the Golden Paths, Unique Sequences and Marvelous Associations out of Your Big Data Using Link Analysis in SAS® Enterprise Miner TM
    MWSUG 2016 – Paper AA04 Discover the golden paths, unique sequences and marvelous associations out of your big data using Link Analysis in SAS® Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, OH Candice Zhang, Alliance Data Systems, Columbus, OH ABSTRACT The need to extract useful information from large amount of data to positively influence business decisions is on the rise especially with the hyper expansion of retail data collection and storage and the advancement in computing capabilities. Many enterprises now have well established databases to capture Omni channel customer transactional behavior at the product or Store Keeping Unit (SKU) level. Crafting a robust analytical solution that utilizes these rich transactional data sources to create customized marketing incentives and product recommendations in a timely fashion to meet the expectations of the sophisticated shopper in our current generation can be daunting. Fortunately, the Link Analysis node in SAS® Enterprise Miner TM provides a simple but yet powerful analytical tool to extract, analyze, discover and visualize the relationships or associations (links) and sequences between items in a transactional data set up and develop item-cluster induced segmentation of customers as well as next-best offer recommendations. In this paper, we discuss the basic elements of Link Analysis from a statistical perspective and provide a real life example that leverages Link Analysis within SAS Enterprise Miner to discover amazing transactional paths, sequences and links. INTRODUCTION A financial fraud investigator may be interested in exploring the relationship between the financial transactions of suspicious customers, the BNI may want to analyze the social network of individuals identified as terrorists to conduct further investigation or a medical doctor may be interested in understanding the association between different medical treatments on patients and their corresponding results.
    [Show full text]
  • Biased Edge Dropout for Enhancing Fairness in Graph Representation
    PREPRINT SUBMITTED TO A JOURNAL. 1 Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning Indro Spinelli, Graduate Student Member, IEEE, Simone Scardapane, Amir Hussain, Senior Member, IEEE and Aurelio Uncini, Member, IEEE Abstract—Graph representation learning has become a ubiqui- I. INTRODUCTION tous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applica- RAPH structured data, ranging from friendships on tions, ensuring the fairness of the node (or graph) representations G social networks to physical links in energy grids, powers with respect to some protected attributes is crucial for their many algorithms governing our digital life. Social networks correct deployment. Yet, fairness in graph deep learning remains under-explored, with few solutions available. In particular, the topologies define the stream of information we will receive, tendency of similar nodes to cluster on several real-world often influencing our opinion [1][2][3][4]. Bad actors, some- graphs (i.e., homophily) can dramatically worsen the fairness times, define these topologies ad-hoc to spread false informa- of these procedures. In this paper, we propose a biased edge tion [5]. Similarly, recommender systems [6] suggest products dropout algorithm (FairDrop) to counter-act homophily and tailored to our own experiences and history of purchases. improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, However, pursuing the highest accuracy as the only metric of adaptable, and can be combined with other fairness-inducing interest has let many of these algorithms discriminate against solutions. After describing the general algorithm, we demonstrate minorities in the past [7][8][9], despite the law prohibiting its application on two benchmark tasks, specifically, as a random unfair treatment based on sensitive traits such as race, religion, walk model for producing node embeddings, and to a graph and gender.
    [Show full text]
  • Link Analysis Using SAS Enterprise Miner
    Link Analysis Using SAS® Enterprise Miner™ Ye Liu, Taiyeong Lee, Ruiwen Zhang, and Jared Dean SAS Institute Inc. ABSTRACT The newly added Link Analysis node in SAS® Enterprise MinerTM visualizes a network of items or effects by detecting the linkages among items in transactional data or the linkages among levels of different variables in training data or raw data. This node also provides multiple centrality measures and cluster information among items so that you can better understand the linkage structure. In addition to the typical linkage analysis, the node also provides segmentation that is induced by the item clusters, and uses weighted confidence statistics to provide next-best-offer lists for customers. Examples that include real data sets show how to use the SAS Enterprise Miner Link Analysis node. INTRODUCTION Link analysis is a popular network analysis technique that is used to identify and visualize relationships (links) between different objects. The following questions could be nontrivial: Which websites link to which other ones? What linkage of items can be observed from consumers’ market baskets? How is one movie related to another based on user ratings? How are different petal lengths, width, and color linked by different, but related, species of flowers? How are specific variable levels related to each other? These relationships are all visible in data, and they all contain a wealth of information that most data mining techniques cannot take direct advantage of. In today’s ever-more-connected world, understanding relationships and connections is critical. Link analysis is the data mining technique that addresses this need. In SAS Enterprise Miner, the new Link Analysis node can take two kinds of input data: transactional data and non- transactional data (training data or raw data).
    [Show full text]
  • Combining Collective Classification and Link Prediction
    Combining Collective Classification and Link Prediction Mustafa Bilgic, Galileo Mark Namata, Lise Getoor Dept. of Computer Science Univ. of Maryland College Park, MD 20742 {mbilgic, namatag, getoor}@cs.umd.edu Abstract however, this is rarely the case. Real world collections usu- ally have a number of missing and incorrect labels and links. The problems of object classification (labeling the nodes Other approaches construct a complex joint probabilistic of a graph) and link prediction (predicting the links in model, capable of handling missing values, but in which a graph) have been largely studied independently. Com- inference is often intractable. monly, object classification is performed assuming a com- In this paper, we take the middle road. We propose a plete set of known links and link prediction is done assum- simple yet general approach for combining object classifi- ing a fully observed set of node attributes. In most real cation and link prediction. We propose an algorithm called world domains, however,attributes and links are often miss- Iterative Collective Classification and Link Prediction (IC- ing or incorrect. Object classification is not provided with CLP) that integrates collective object classification and link all the links relevant to correct classification and link pre- prediction by effectively passing up-to-date information be- diction is not provided all the labels needed for accurate tween the algorithms. We experimentally show on many link prediction. In this paper, we propose an approach that different network types that applying ICCLP improves per- addresses these two problems by interleaving object clas- formance over running collective classification and link pre- sification and link prediction in a collective algorithm.
    [Show full text]
  • Evolving Networks and Social Network Analysis Methods And
    DOI: 10.5772/intechopen.79041 ProvisionalChapter chapter 7 Evolving Networks andand SocialSocial NetworkNetwork AnalysisAnalysis Methods and Techniques Mário Cordeiro, Rui P. Sarmento,Sarmento, PavelPavel BrazdilBrazdil andand João Gama Additional information isis available atat thethe endend ofof thethe chapterchapter http://dx.doi.org/10.5772/intechopen.79041 Abstract Evolving networks by definition are networks that change as a function of time. They are a natural extension of network science since almost all real-world networks evolve over time, either by adding or by removing nodes or links over time: elementary actor-level network measures like network centrality change as a function of time, popularity and influence of individuals grow or fade depending on processes, and events occur in net- works during time intervals. Other problems such as network-level statistics computation, link prediction, community detection, and visualization gain additional research impor- tance when applied to dynamic online social networks (OSNs). Due to their temporal dimension, rapid growth of users, velocity of changes in networks, and amount of data that these OSNs generate, effective and efficient methods and techniques for small static networks are now required to scale and deal with the temporal dimension in case of streaming settings. This chapter reviews the state of the art in selected aspects of evolving social networks presenting open research challenges related to OSNs. The challenges suggest that significant further research is required in evolving social networks, i.e., existent methods, techniques, and algorithms must be rethought and designed toward incremental and dynamic versions that allow the efficient analysis of evolving networks. Keywords: evolving networks, social network analysis 1.
    [Show full text]
  • Finding Experts by Link Prediction in Co-Authorship Networks
    Finding Experts by Link Prediction in Co-authorship Networks Milen Pavlov1,2,RyutaroIchise2 1 University of Waterloo, Waterloo ON N2L 3G1, Canada 2 National Institute of Informatics, Tokyo 101-8430, Japan Abstract. Research collaborations are always encouraged, as they of- ten yield good results. However, the researcher network contains massive amounts of experts in various disciplines and it is difficult for the indi- vidual researcher to decide which experts will match his own expertise best. As a result, collaboration outcomes are often uncertain and research teams are poorly organized. We propose a method for building link pre- dictors in networks, where nodes can represent researchers and links - collaborations. In this case, predictors might offer good suggestions for future collaborations. We test our method on a researcher co-authorship network and obtain link predictors of encouraging accuracy. This leads us to believe our method could be useful in building and maintaining strong research teams. It could also help with choosing vocabulary for expert de- scription, since link predictors contain implicit information about which structural attributes of the network are important with respect to the link prediction problem. 1 Introduction Collaborations between researchers often have a synergistic effect. The combined expertise of a group of researchers can often yield results far surpassing the sum of the individual researchers’ capabilities. However, creating and organizing such research teams is not a straightforward task. The individual researcher often has limited awareness of the existence of other researchers with which collaborations might prove fruitful. Further- more, even in the presence of such awareness, it is difficult to predict in advance which potential collaborations should be pursued.
    [Show full text]
  • Learning to Make Predictions on Graphs with Autoencoders
    Learning to Make Predictions on Graphs with Autoencoders Phi Vu Tran Strategic Innovation Group Booz j Allen j Hamilton San Diego, CA USA [email protected] Abstract—We examine two fundamental tasks associated networks [38], communication networks [11], cybersecurity with graph representation learning: link prediction and semi- [6], recommender systems [16], and knowledge bases such as supervised node classification. We present a novel autoencoder DBpedia and Wikidata [35]. architecture capable of learning a joint representation of both local graph structure and available node features for the multi- There are a number of technical challenges associated with task learning of link prediction and node classification. Our learning to make meaningful predictions on complex graphs: autoencoder architecture is efficiently trained end-to-end in a single learning stage to simultaneously perform link prediction • Extreme class imbalance: in link prediction, the number and node classification, whereas previous related methods re- of known present (positive) edges is often significantly quire multiple training steps that are difficult to optimize. We less than the number of known absent (negative) edges, provide a comprehensive empirical evaluation of our models making it difficult to reliably learn from rare examples; on nine benchmark graph-structured datasets and demonstrate • Learn from complex graph structures: edges may be significant improvement over related methods for graph rep- resentation learning. Reference code and data are available at directed or undirected, weighted or unweighted, highly https://github.com/vuptran/graph-representation-learning. sparse in occurrence, and/or consisting of multiple types. Index Terms—network embedding, link prediction, semi- A useful model should be versatile to address a variety supervised learning, multi-task learning of graph types, including bipartite graphs; • Incorporate side information: nodes (and maybe edges) I.
    [Show full text]
  • Predicting Disease Related Microrna Based on Similarity and Topology
    cells Article Predicting Disease Related microRNA Based on Similarity and Topology Zhihua Chen 1,†, Xinke Wang 2,†, Peng Gao 2,†, Hongju Liu 3,† and Bosheng Song 4,* 1 Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China; [email protected] 2 School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; [email protected] (X.W.); [email protected] (P.G.) 3 College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines; [email protected] 4 School of Information Science and Engineering, Hunan University, Changsha 410082, China * Correspondence: [email protected]; Tel.: +86-731-88821907 † All authors contributed equally to this work. Received: 4 July 2019; Accepted: 5 November 2019; Published: 7 November 2019 Abstract: It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease–miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms.
    [Show full text]
  • On Some Aspects of Link Analysis and Informal Network in Social Network Platform
    Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320–088X IJCSMC, Vol. 2, Issue. 7, July 2013, pg.371 – 377 RESEARCH ARTICLE On Some Aspects of Link Analysis and Informal Network in Social Network Platform Subrata Paul Department of Computer Science and Engineering, M.I.T.S. Rayagada, Odisha 765017, INDIA [email protected] Abstract— This paper presents a review on the two important aspects of Social Network, namely Link Analysis and Informal Network. Both of these characteristic plays a vital role in analysis of direction of information flow and reliability of the information passed or received. They can be easily visualized or can be studied using the concepts of graph theory. We have deliberately omitted discussing about general definitions of social network and representing the relationship between actors as a graph with nodes and edges. This paper starts with a formal definition of Informal Network and continues with some of its major aspects. Moreover its application is explained with a real life example of a college scenario. In the later part of the paper, some aspects of Informal Network have been presented. The similar kind of real life scenario is also being drawn out here to represent application of informal network. Lastly, this paper ends with a general conclusion on these two topics. Key Terms: - Social Network; Link Analysis; Information Overload; Formal Network; Informal Network I. INTRODUCTION Link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions.
    [Show full text]
  • Representation Learning on Graphs
    Representation learning on graphs A/Prof Truyen Tran [email protected] @truyenoz Deakin University truyentran.github.io letdataspeak.blogspot.com goo.gl/3jJ1O0 HCMC, May 2019 11/05/2019 1 Graph representation Graph reasoning Why bother? Graph generation Embedding Graph dynamics Message passing 11/05/2019 2 Why learning of graph representation? Graphs are pervasive in many scientific disciplines. The sub-area of graph representation has reached a certain maturity, with multiple reviews, workshops and papers at top AI/ML venues. Deep learning needs to move beyond vector, fixed-size data. Learning representation as a powerful way to discover hidden patterns making learning, inference and planning easier. 11/05/2019 3 System medicine 11/05/2019 https://www.frontiersin.org/articles/10.3389/fphys.2015.00225/full 4 Biology & pharmacy Traditional techniques: Graph kernels (ML) Molecular fingerprints (Chemistry) Modern techniques Molecule as graph: atoms as nodes, chemical bonds as edges #REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X- ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90. 11/05/2019 5 Chemistry DFT = Density Functional Theory Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017). • Molecular properties • Chemical-chemical interaction • Chemical reaction • Synthesis planning 11/05/2019 6 Materials science • Crystal properties • Exploring/generating solid structures • Inverse design Xie, Tian, and Jeffrey
    [Show full text]
  • Khanamk2021m-1A.Pdf (2.656Mb)
    Lakehead University Knowledge Commons,http://knowledgecommons.lakeheadu.ca Electronic Theses and Dissertations Electronic Theses and Dissertations from 2009 2021 Using homophily to analyze and develop link prediction models with deep learning framework Khanam, Kazi Zainab https://knowledgecommons.lakeheadu.ca/handle/2453/4829 Downloaded from Lakehead University, KnowledgeCommons Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework by Kazi Zainab Khanam A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Faculty of Science and Environmental Studies of Lakehead University, Thunder Bay Committee in charge: Dr. Vijay Mago (Principal Supervisor) Dr. Rajesh Sharma (External Examiner) Dr. Yimin Yang (Internal Examiner) Winter 2021 The thesis of Kazi Zainab Khanam, titled Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework, is approved: Chair Date Date Date Lakehead University, Thunder Bay Using Homophily to Analyze and Develop Link Prediction Models with Deep Learning Framework Copyright 2021 by Kazi Zainab Khanam 1 Abstract USING HOMOPHILY TO ANALYZE AND DEVELOP LINK PREDICTION MODELS WITH DEEP LEARNING FRAMEWORK Twitter is a prominent social networking platform where users’ short messages or “tweets” are often used for analysis. However, there has not been much attention paid to mining the medical professions, such as detecting users’ occupations from their biographical content. Mining such information can be useful to build recommender systems for cost-effective ad- vertisements. Conventional classifiers can be used to predict medical occupations, but they tend to perform poorly as there are a variety of occupations. As a result, the main focus of the research is to use various deep learning techniques to examine the textual properties of Twitter users’ biographic contents, network properties, and the impact of homophily of Twitter users employed in medical professional fields.
    [Show full text]
  • Neural Variational Inference for Embedding Knowledge Graphs
    University College London MSc Data Science and Machine Learning Neural Variational Inference For Embedding Knowledge Graphs Author: Supervisor: Alexander Cowen-Rivers Prof. Sebastian Riedel Disclaimer This report is submitted as part requirement for the MSc Degree in Data Science and Machine Learning at University College London. It is substantially the result of my own work except where explicitly indicated in the text. The report may be freely copied and distributed provided the source is explicitly acknowledged.The report may be freely copied and distributed provided the source is explicitly acknowledged. Machine Reading Group October 23, 2018 i \Probability is expectation founded upon partial knowledge. A perfect acquaintance with all the circumstances affecting the occurrence of an event would change expectation into certainty, and leave nether room nor demand for a theory of probabilities." George Boole ii Acknowledgements I would like to thank my supervisors Prof Sebastian Riedel, as well as the other Machine Reading group members, particularly Dr Minervini, who helped through comments and discussions during the writing of this thesis. I would also like to thank my father who nurtured my passion for mathematics. Lastly, I would like to thank Thomas Kipf from the University of Amsterdam, who cleared up some queries I had regarding some of his recent work. iii Abstract Alexander Cowen-Rivers Neural Variational Inference For Embedding Knowledge Graphs Statistical relational learning investigates the development of tools to study graph-structured data. In this thesis, we provide an introduction on how models of the world are learnt using knowledge graphs of known facts of information, later applied to infer new facts about the world.
    [Show full text]