Durham E-Theses
Advances in Learning and Understanding with Graphs through Machine Learning
BONNER, STEPHEN,ARTHUR,ROBERT

How to cite:
BONNER, STEPHEN,ARTHUR,ROBERT (2020) Advances in Learning and Understanding with Graphs through Machine Learning, Durham theses, Durham University. Available at Durham E-Theses Online: http://etheses.dur.ac.uk/13747/

Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in Durham E-Theses
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders. Please consult the full Durham E-Theses policy for further details.

Academic Support Office, Durham University, University Office, Old Elvet, Durham DH1 3HP
e-mail: [email protected] Tel: +44 0191 334 6107
http://etheses.dur.ac.uk

Advances in Learning and Understanding with Graphs through Machine Learning
Stephen Arthur Robert Bonner
A thesis presented for the degree of Doctor of Philosophy at Durham University
Department of Computer Science
Durham University
United Kingdom
11th October 2020

Advances in Learning and Understanding with Graphs through Machine Learning
Stephen Arthur Robert Bonner
Submitted for the degree of Doctor of Philosophy

Graphs have increasingly become a crucial way of representing large, complex and disparate datasets from a range of domains, including many scientific disciplines. Graphs are particularly useful for capturing complex relationships or interdependencies within or even between datasets, and enable unique insights which are not possible with other data formats.
Over recent years, significant improvements have been made in the ability of machine learning approaches to automatically learn from and identify patterns in datasets. However, due to the unique nature of graphs, and the data they are used to represent, employing machine learning with graphs has thus far proved challenging. A review of relevant literature has revealed that key challenges include issues arising with macro-scale graph learning, interpretability of machine learned representations and a failure to incorporate the temporal dimension present in many datasets. Thus, the work and contributions presented in this thesis primarily investigate how modern machine learning techniques can be adapted to tackle key graph mining tasks, with a particular focus on optimal macro-level representation, interpretability and incorporating temporal dynamics into the learning process. The majority of methods employed are novel approaches centred around using artificial neural networks to learn from graph datasets. Firstly, a novel graph fingerprint technique is devised and shown to apply successfully to two different tasks, namely graph comparison and classification, whilst out-performing established baselines. Secondly, it is shown that a mapping can be found between certain topological features and graph embeddings. This suggests, perhaps for the first time, that machines may be learning something analogous to human knowledge acquisition, thus bringing interpretability to the graph embedding process. Thirdly, in exploring two new models for incorporating temporal information into the graph learning process, it is found that including such information is crucial to predictive performance in certain key tasks, such as link prediction, where state-of-the-art baselines are out-performed.
The overall contribution of this work is to provide greater insight into and explanation of the ways in which machine learning with respect to graphs is emerging as a crucial set of techniques for understanding complex datasets. This is important as these techniques can potentially be applied to a broad range of scientific disciplines. The thesis concludes with an assessment of limitations and recommendations for future research.

Declaration

The work in this thesis is based on research carried out within the Innovative Computing Group at the Department of Computer Science, Durham University, UK. No part of this thesis has been submitted elsewhere for any other degree or qualification, and it is all the author's work unless referenced to the contrary below.

Note on Publications Included in this Thesis

At the time of submission, three chapters of this thesis contain content which has been published at peer-reviewed conferences or journals.

Content from Chapter 3 has been published as the following works:

Stephen Bonner, John Brennan, Ibad Kureshi, Andrew Stephen McGough, and Georgios Theodoropoulos. Efficient comparison of massive graphs through the use of 'graph fingerprints'. In KDD Workshop on Mining and Learning with Graphs (MLG), 2016

Stephen Bonner, John Brennan, Georgios Theodoropoulos, Ibad Kureshi, and Andrew Stephen McGough. Gfp-x: A parallel approach to massive graph comparison using spark. IEEE International Conference on Big Data, pages 3298–3307, 2016

Stephen Bonner, John Brennan, Georgios Theodoropoulos, Ibad Kureshi, and Andrew Stephen McGough. Deep topology classification: A new approach for massive graph classification. In IEEE International Conference on Big Data, pages 3290–3297. IEEE, 2016

Content from Chapter 4 has been published as the following works:

Stephen Bonner, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara.
Evaluating the quality of graph embeddings via topological feature reconstruction. In IEEE International Conference on Big Data, pages 2691–2700. IEEE, 2017

Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara. Exploring the semantic content of unsupervised graph embeddings: An empirical study. Data Science and Engineering, 4(3):269–289, 2019

Content from Chapter 5 has been published as the following works:

Stephen Bonner, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara. Temporal graph offset reconstruction: Towards temporally robust graph representation learning. In IEEE International Conference on Big Data, pages 3737–3746. IEEE, 2018

Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara. Temporal neighbourhood aggregation: Predicting future links in temporal graphs via recurrent variational graph convolutions. In IEEE International Conference on Big Data, 2019

These chapters are presented largely as submitted, although additional experimentation has been performed, and the referencing and notation have been altered to improve consistency.

Note on Publications Not Included in this Thesis

As well as the above papers, the following works have been published during the period of research for this thesis; however, these publications do not fit into the narrative of this thesis and have not been included in the text.

Stephen Bonner, Andrew Stephen McGough, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Laura Moss, David Corsar, and Grigoris Antoniou. Data quality assessment and anomaly detection via map/reduce and linked data: a case study in the medical domain. In IEEE International Conference on Big Data, pages 737–746. IEEE, 2015

Stephen Bonner, Ibad Kureshi, John Brennan, and Georgios Theodoropoulos.
Exploring the evolution of big data technologies. In Software Architecture for Big Data and the Cloud, pages 253–283. Elsevier, 2017

Stephen Bonner and Flavian Vasile. Causal embeddings for recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 104–112. ACM, 2018

Daniel Justus, John Brennan, Stephen Bonner, and Andrew Stephen McGough. Predicting the computational cost of deep learning models. In IEEE International Conference on Big Data, pages 3873–3882. IEEE, 2018

Nik Khadijah Nik Aznan, Stephen Bonner, Jason Connolly, Noura Al Moubayed, and Toby Breckon. On the classification of ssvep-based dry-eeg signals via convolutional neural networks. In IEEE International Conference on Systems, Man, and Cybernetics, pages 3726–3731. IEEE, 2018

David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou. Recogym: A reinforcement learning environment for the problem of product recommendation in online advertising. REVAL Workshop at the 12th ACM Conference on Recommender Systems, 2018

Philip T Jackson, Amir Atapour-Abarghouei, Stephen Bonner, Toby P Breckon, and Boguslaw Obara. Style augmentation: Data augmentation via style randomization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 83–92, 2019

Nik Khadijah Nik Aznan, Amir Atapour-Abarghouei, Stephen Bonner, Jason Connolly, Noura Al Moubayed, and Toby Breckon. Simulating brain signals: Creating synthetic eeg data via neural-based generative models for improved ssvep classification. In International Joint Conference on Neural Networks, pages 1–8, 2019

Stephen Bonner and David Rohde. Latent variable session-based recommendation. Second Symposium on Advances in Approximate Bayesian Inference, 2019

Otmane Sakhi, Stephen Bonner, David Rohde, and Flavian Vasile. Reconsidering analytical variational bounds for output layers of deep networks. Bayesian Deep Learning - Neural Information Processing Systems