University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange
Doctoral Dissertations Graduate School
8-2019
Learning Topologies of Acyclic Networks with Tree Structures
Firoozeh Sepehr University of Tennessee
Recommended Citation Sepehr, Firoozeh, "Learning Topologies of Acyclic Networks with Tree Structures. " PhD diss., University of Tennessee, 2019. https://trace.tennessee.edu/utk_graddiss/5636
This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:
I am submitting herewith a dissertation written by Firoozeh Sepehr entitled "Learning Topologies of Acyclic Networks with Tree Structures." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Computer Science.
Donatello Materassi, Major Professor
We have read this dissertation and recommend its acceptance:
Michael Langston, Seddik Djouadi, Hamparsum Bozdogan
Accepted for the Council:
Dixie L. Thompson
Vice Provost and Dean of the Graduate School
(Original signatures are on file with official student records.)

Learning Topologies of Acyclic Networks with Tree Structures
A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville
Firoozeh Sepehr
August 2019

© by Firoozeh “Dawn” Sepehr, 2019
All Rights Reserved.
Acknowledgments
I would like to give a special thanks to my advisor, Dr. Donatello Materassi, for his advice and support throughout my experience as a PhD student. I would not be where I am today without him inspiring me at every step of this incredible and challenging journey. Also, thank you to my committee, Dr. Djouadi, Dr. Langston and Dr. Bozdogan, for agreeing to review my work and for their guidance and feedback. I would also like to thank my family, friends, and colleagues who believed in me and supported me throughout all these years. My PhD research has been partially funded by the National Science Foundation (NSF CNS CAREER #1553504), and I am grateful for their support of my work.
Abstract
Network topology identification is the process of revealing the interconnections of a network in which each node represents an atomic entity of a complex system. This procedure is an important topic in the study of dynamic networks since it has broad applications spanning different scientific fields. Furthermore, the study of tree structured networks is deemed significant since a large amount of scientific work is devoted to them, and the techniques targeting trees can often be further extended to study more general structures. This dissertation considers the problem of learning the unknown structure of a network when the underlying topology is a directed tree, namely, a graph that does not contain any cycles. The first result of this dissertation is an algorithm that consistently learns a tree structure when only a subset of the nodes is observed, given that the unobserved nodes satisfy certain degree conditions. This method makes use of an additive metric and statistics of the observed data only up to the second order. As is shown, an additive metric can always be defined for networks with special dynamics, for example when the dynamics is linear. However, in the case of generic networks, additive metrics cannot always be defined. Thus, we derive a second result that solves the same problem but requires the statistics of the observed data up to the third order, as well as stronger degree conditions on the unobserved nodes. Moreover, for both cases, it is shown that the same degree conditions are also necessary for a consistent reconstruction, thus matching the fundamental limitations. The third result of this dissertation provides a technique to approximate a complex network via a simpler one when the assumption of linearity is exploited. The goal of this approximation is to highlight the most significant connections, which could potentially reveal more information about the network.
In order to show the reliability of this method, we consider high-frequency financial data and show how well the businesses are clustered together according to their sector.
Table of Contents
1 Introduction
  1.1 Techniques to Learn Network Structures
  1.2 Learning Techniques for Networks with Unmeasurable (Hidden) Nodes
  1.3 The Importance of Tree Structured Networks
  1.4 Approximation using Simpler Structures
  1.5 Contributions of this Dissertation
2 Preliminaries, Background, Assumptions and Problem Formulation
  2.1 Graphs with All Visible Nodes
  2.2 Graphs with Hidden Nodes
  2.3 Linear Dynamic Influence Models
  2.4 Problem Statement
3 Learning Linear Networks with Tree Structures
  3.1 Reconstruction of Rooted Trees with Hidden Nodes via An Additive Metric
  3.2 An Algorithm to Learn Latent Polyforest Networks
    3.2.1 Step A. Obtain the Visible Descendants of Each Root
    3.2.2 Step B. Learn the Structure of Each Rooted Tree
    3.2.3 Step C. Merge the Rooted Trees into the Polyforest Skeleton
    3.2.4 Step D. Identify the Link Orientations
    3.2.5 Putting It All Together
  3.3 Numerical Example
  3.4 Fundamental Limitations
4 Learning Non-linear Networks with Tree Structures
  4.1 An Algorithm to Learn Latent Polytree Networks
    4.1.1 Task 1. Determine the Visible Nodes of Each Rooted Subtree
    4.1.2 Task 2. Determine the Collapsed Representation of the Quasi-skeleton of Each Rooted Subtree
    4.1.3 Task 3. Merge the Hidden Clusters of the Collapsed Rooted Subtrees
    4.1.4 Task 4. Determine the Quasi-skeleton of the Latent Polytree from the Collapsed Quasi-skeleton of the Latent Polytree
    4.1.5 Task 5. Obtain the Pattern of the Latent Polytree from the Quasi-skeleton of the Latent Polytree
    4.1.6 Putting It All Together
  4.2 Additional Examples
    4.2.1 A Star Network
    4.2.2 A Polytree Network with Only Type-I Hidden Nodes
    4.2.3 A Polyforest Network
  4.3 Fundamental Limitations
5 Using Polytrees to Approximate Networks
  5.1 Approximation Algorithm
    5.1.1 Step A. Determine the Skeleton of the Approximating Polytree
    5.1.2 Step B. Assign Orientations to the Edges in the Skeleton
    5.1.3 Putting It All Together
  5.2 Real Data Application: Stock Market Analysis
6 Conclusions
Bibliography
Appendix
  A Proofs Related to Chapter 2
    A.1 Proof of Lemma 2.27
    A.2 Proof of Lemma 2.30
    A.3 Proof of Proposition 2.33
    A.4 Proof of Proposition 2.34
  B Proofs Related to Chapter 3
    B.1 Proof of Theorem 3.2
    B.2 Proof of Proposition 3.3
    B.3 Proof of Theorem 3.5
    B.4 Proof of Theorem 3.6
    B.5 Proof of Theorem 3.8
    B.6 Proof of Proposition 3.9
    B.7 Proof of Theorem 3.10
    B.8 Proof of Lemma 3.11
    B.9 Proof of Lemma 3.12
    B.10 Proof of Theorem 3.13
  C Proofs Related to Chapter 4
    C.1 Proof of Theorem 4.1
    C.2 Explanation of the Reconstruction Algorithm for Latent Rooted Trees
    C.3 Proof of Theorem 4.2
    C.4 Proof of Theorem 4.3
    C.5 Proof of Theorem 4.4
    C.6 Proof of Theorem 4.5
    C.7 Proof of Lemma 4.6
    C.8 Proof of Theorem 4.7
    C.9 Proof of Theorem 4.8
  D Proofs Related to Chapter 5
    D.1 Proof of Theorem 5.2
    D.2 Proof of Proposition 5.5
    D.3 Proof of Theorem 5.6
Vita
List of Tables
5.1 Percentage of inter-cluster links in the approximated skeleton [1]
5.2 Percentage of common edges in the approximated skeleton [1]
List of Figures
2.1 A directed graph (a), an undirected graph (b), a partially directed graph (c), and the skeleton of the graph in (a) (d) [2, 3]
2.2 A rooted tree graph (a), a polytree graph (b), and a polyforest graph (c)
2.3 A latent graph (a), a latent partially directed graph (b), and a latent partially directed tree (c)
2.4 A generic hidden polytree (a) and its collapsed hidden polytree (b), a minimal hidden polytree (c) [4]
2.5 A minimal latent polytree P⃗_ℓ containing Type-I and Type-II hidden nodes (a), the quasi-skeleton of P⃗_ℓ (b), and the collapsed quasi-skeleton of P⃗_ℓ (c) [4]
3.1 The graph associated with a sample LDPT
3.2 A polyforest with one FD-detectable hidden node, y_h2, and one RGA-detectable hidden node, y_h1 (a), the set of visible descendants of y_h2 (b), and the output of the application of RGA (c). RGA detects the RGA-detectable hidden node, y_h1, but incorrectly connects the children of the FD-detectable hidden node, y_h2, as shown in Proposition 3.3 [3]
3.3 The polytree graph associated with an LDPT (True), output of Step A is the set of lists of the visible nodes in each rooted tree (Step A), output of Step B is the skeleton of each rooted tree (Step B), output of Step C is the skeleton of the polytree (Step C), and output of Step D is the partially oriented polytree (Step D) [3]
3.4 Associated graph of an LDPF [3]
3.5 Number of samples vs. success rate in reconstruction of the graph of Figure 3.4 via PSLA [3]
3.6 Number of samples vs. probability of detecting a wrong link in logarithmic scale for the graph of Figure 3.4 via PSLA [3]
3.7 Case I, deg⁺_{F⃗_ℓ}(y_h) = 0: associated graphs of latent LDPFs F (a) and F′ (b); Case II, deg⁺_{F⃗_ℓ}(y_h) = 1: associated graphs of latent LDPFs F (c) and F′ (d); and Case III, deg⁺_{F⃗_ℓ}(y_h) = 2 and deg⁻_{F⃗_ℓ}(y_c1) = 1: associated graphs of latent LDPFs F (e) and F′ (f) [3]
4.1 The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), detection of Type-I hidden nodes (Task 4), and detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern (Task 5). Observe that the full polytree is not recovered at the end of Task 5 since the edge y_9 − y_18 is left undirected, but the pattern of the polytree is learned [4]
4.2 The closure of the hidden cluster C_A′ of the latent polytree in Figure 4.1 (True) obtained after Task 3 (Task 4a), the hidden clusters obtained after Step 23 of HCLA (Task 4b), merging of the hidden clusters as in Steps 24-27 of HCLA (Task 4c), merging of the overlapping hidden clusters as in Step 28 of HCLA (Task 4d), orienting the edges in the quasi-skeleton of the latent polytree as in Steps 1-4 of HRRA (Task 5a), and discovering Type-II hidden nodes (Task 5b) [4]
4.3 The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), detection of Type-I hidden nodes (Task 4), and detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern of the minimal latent polytree (Task 5). Observe that the full polytree is recovered at the end of Task 5 since no edge is left undirected
4.4 The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), considering the neighbors of the hidden cluster C_A′ (Task 4a), detection of one Type-I hidden node and creation of fictitious hidden clusters (Task 4b), merging of the fictitious hidden clusters (Task 4c), merging of the overlapping hidden clusters (Task 4d), detection of one Type-I hidden node (Task 4e), detection of one Type-I hidden node (Task 4f), detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern of the latent polytree (Task 5a), and no edge has conflicting orientation and therefore no Type-II hidden node is detected (Task 5b). Observe that the full polytree is recovered at the end of Task 5 since no edge is left undirected
4.5 The actual minimal latent polyforest (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), considering the neighbors of the hidden cluster C_A′ (Task 4a), detection of a Type-I hidden node and creating fictitious hidden clusters (Task 4b), merging of the fictitious hidden clusters (Task 4c), merging of the hidden clusters (Task 4d), and considering the neighbors of the hidden cluster C_A′ (Task 4e)
4.6 A hidden node y_h where deg⁻(y_h) = deg(y_h) = 1 (a), and a hidden node y_h where deg⁺(y_h) = deg(y_h) = 1 (b) [4]
4.7 A hidden node y_h which has a single hidden parent y_p and multiple children y_c1, y_c2, ..., y_cnc (a), and the case where the hidden node y_h is marginalized (b) [4]
4.8 A hidden node y_h which is a root with exactly two children y_c1 and y_c2 and at least one of its children has no other parent (without loss of generality, say y_c1) (a), and the case where the hidden node y_h is marginalized (b) [4]
5.1 Only the observations of the nodes of the actual LDIM are available and the structure is unknown (a), the output of the first step of the approximating algorithm is an undirected tree (b), and in the second step of the algorithm the edges are oriented, providing the approximating polytree structure (c) [1]
5.2 Graph associated with the LDIM in Equation (5.1) (a), graph associated with the LDIM in Equation (5.2) (b), and graph associated with the LDIM in Equation (5.2) (c) [1]
5.3 LDIM with an inverted fork in node y_2 [1]
5.4 Graph associated with an LDIM with 4 inverted forks (a), and the same graph after inferring the link orientations using exact information about the distances (b) [1]
5.5 The result of inferring edge orientations of the LDIM depicted in Figure 5.4 (a) using information containing Type I and Type II errors [1]
5.6 The result of inferring edge orientations of the LDIM depicted in Figure 5.4 (a) using information containing errors (a), and the output of LOPA applied to the graph in Figure 5.6 (a), which results in contradictory link orientations (b) [1]
5.7 Network structure for Period 1 (a), and network structure for Period 2 (b) [1]
5.8 Network structure for Period 3 (a), and network structure for Period 4 (b) [1]
A.1 Nodes y_i and y_j in an LDRT (a), and the block diagram of the dynamic relation between the nodes y_i and y_j (b) [3]
B.1 Rooted trees T⃗_ℓ (a), T⃗′_ℓ (b), and T⃗_X (c) [3]
C.1 Quasi-skeleton of the actual rooted tree with root in node y_9 (a), the list of visible nodes belonging to the rooted tree with root in node y_9 (b), node y_18 satisfies the conditions of being a terminal node and node y_9 satisfies the conditions of being the visible node linked to it (c), and node y_17 is found to be terminal but not linked to a visible node, thus a hidden node linked to y_17 is detected (d) [4]
C.2 Quasi-skeleton of a rooted tree with one undiscovered Type-II hidden node (a), the detection of a conflict on the orientation of the edge y_k − y_j (b), and discovery of a Type-II hidden node (c) [4]
C.3 Quasi-skeleton of a rooted tree with two undiscovered Type-II hidden nodes (a), the detection of a conflict on the orientation of the edges y_i − y_k and y_k − y_j (b), and discovery of two Type-II hidden nodes (c) [4]
Nomenclature
• := denotes a definition
• {x, y} unordered pair
• (x, y) ordered pair
• y node
• y_h hidden node
• y_r root node
• y_c child node
• y_p parent node
• n number of nodes
• n_r number of root nodes
• G undirected graph
• G⃗ directed graph
• Ḡ partially directed graph
• T⃗ rooted tree graph
• P⃗ polytree graph
• F⃗ polyforest graph
• N set of nodes (or vertices)
• V set of visible nodes
• L set of hidden nodes
• E set of undirected edges (or arcs)
• E⃗ set of directed edges (or arcs)
• |A| number of elements in the set A
• S subset of nodes
• C hidden cluster
• N(C) neighbors of the hidden cluster C
• deg_G(y) degree of the node y in graph G
• deg⁺_G⃗(y) outdegree of the node y in graph G⃗
• deg⁻_G⃗(y) indegree of the node y in graph G⃗
• pa_G⃗(y) parents of the node y in graph G⃗
• ch_G⃗(y) children of the node y in graph G⃗
• an_G⃗(y) ancestors of the node y in graph G⃗
• de_G⃗(y) descendants of the node y in graph G⃗
• u wide-sense stationary stochastic process
• I identity matrix
• H(z) transfer matrix
• Z(·) Z-transform of a signal
• A∗ conjugate transpose of the matrix A
• E[·] expected value
• R_xy(τ) := E[x(t) yᵀ(t + τ)] cross-covariance function
• R_x(τ) := R_xx(τ) autocovariance
• Φ_x(z) := Φ_xx(z) power spectral density of signal x
• Φ_xy(z) := Z(R_xy(τ))(z) cross-power spectral density of signals x and y
• x ⊥ y : Φ_xy(z) = 0
• I(y_i, y_k, y_j) independence statement: y_i and y_j are conditionally independent given y_k
• ¬I(y_i, y_k, y_j) negation of I(y_i, y_k, y_j)
• P probability
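As an aside illustrating the cross-covariance notation defined above, the following sketch (illustrative only; the signals and the numpy-based sample estimator are not part of the dissertation) estimates R_xy(τ) from finite data for scalar, zero-mean, wide-sense stationary signals:

```python
import numpy as np

def cross_covariance(x, y, tau):
    """Sample estimate of R_xy(tau) = E[x(t) y(t + tau)] for scalar,
    zero-mean, wide-sense stationary signals x and y."""
    n = len(x)
    if tau >= 0:
        return float(np.mean(x[:n - tau] * y[tau:]))
    return float(np.mean(x[-tau:] * y[:n + tau]))

# Hypothetical pair of signals: y is a scaled, delayed copy of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = np.concatenate(([0.0], 0.5 * x[:-1]))   # y(t) = 0.5 x(t - 1)

print(cross_covariance(x, y, 1))   # ≈ 0.5, since E[x(t) y(t+1)] = 0.5 E[x(t)^2]
print(cross_covariance(x, y, 0))   # ≈ 0, x(t) and y(t) are uncorrelated
```

Note that for tau = 0 this reduces to the ordinary sample covariance, and cross_covariance(y, x, -tau) recovers the same value by the symmetry R_xy(τ) = R_yx(−τ) for scalar signals.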
Chapter 1
Introduction
Networks of dynamic systems have become a widespread modeling tool with applications spanning fields as diverse as physics [5, 6], biology [7], chemistry [8], medicine [9], neuropsychology [10], ecology [11, 12], economics [13, 14], engineering [15] and social networks [16]. For example, in [10] a learning methodology is developed that establishes the interconnections between different brain regions. This is an important procedure since cognitive tasks recruit multiple regions of the brain, and therefore understanding how these regions affect each other helps characterize the neural basis of cognitive processes. As another example, the authors of [16] apply an algorithm to create the causal diagram of the trending topics discussed by popular Twitter handles. This causal diagram is then used to identify the trend setters, namely, the users that have influenced other users the most by starting a topic. The first step in studying a network of dynamic systems is typically to identify how its internal processes (or nodes) are connected to each other. This problem can be tackled under different scenarios. A first scenario considers situations where excitations are used to probe the network and its response is recorded in order to identify the network structure; these methods are commonly known as active reconstruction in the literature [17]. A second scenario considers situations where the inputs of the network are measurable but not adjustable; these scenarios are often known as non-invasive reconstruction [17, 18]. A third and more challenging scenario is when the inputs of the network are not measurable at all and the only observable part of the system is its outputs; these techniques are known as blind reconstruction [19, 20]. In the latter scenario, the measurements of the outputs are not the system response to known inputs and data
are acquired while the system is operating and forced by potentially unknown excitations. Since blind reconstruction methods identify a network only from observations of the outputs, they have practical applications even in large-scale networks fulfilling critical or uninterruptible functions, such as power grids [21] or logistic systems [22], and also in situations where it is impractical or too expensive to inject known probing signals into the system, such as gene or financial networks [23, 24]. Furthermore, the applications of these techniques extend to the field of medicine, including repeated drug testing [25], automatically assisted anesthesia [26], and deep brain stimulation for Parkinson's disease [3, 27]. The ultimate objective of this dissertation is to propose novel algorithms to learn the structure of a network using only the observations of the network outputs (blind reconstruction). To achieve this goal, we develop three main algorithms under different assumptions. Parts of these results have already been published in [28], [2] and [3], while [1] and [4] are currently under review for publication. In the following sections, we review some of the relevant work in the literature and then present an overview of the contributions of this dissertation.
1.1 Techniques to Learn Network Structures
Several algorithms have been developed with the goal of learning the structure of a network from observational data. These algorithms were mostly derived in the area of graphical models to describe conditional independence relations among random variables [29, 30]. Only more recently have techniques been developed in the domain of stochastic processes to describe input/output relations among dynamic systems [17, 19, 20, 31, 32, 33]. Graphical models of random variables and networks of dynamic systems have inherently different underlying semantics. However, it has been shown that many of the techniques developed for learning the structure of a network can be consistently applied to both fields [34]. This is also the objective of this dissertation: to develop algorithms that learn the network structure of the random variables of a graphical model as well as the network structure of the stochastic processes of a dynamic system [3]. In the area of graphical models, different approaches have been proposed to learn the structure of a network [35, 36, 37]. In [36], these methods are categorized into three different approaches: (i) constraint-based structure learning, where the network is viewed as a representation of
dependencies; (ii) score-based structure learning, where the network is viewed as a statistical model and the structure is learned via a model selection approach; (iii) model averaging, where an ensemble of possible structures is generated. Here, we are interested in studying approaches that fall into the first category. Among the constraint-based structure learning tools, one of the most versatile approaches is the SGS (Spirtes, Glymour, and Scheines) algorithm, developed to infer a Bayesian network of random variables from data [37]. This algorithm provides a consistent reconstruction of the topology of a Bayesian network described by a Directed Acyclic Graph (DAG). However, it cannot, in the general case, determine the orientation of all the links in the graph. Moreover, one fundamental drawback of the SGS algorithm is that it relies on several searches over subsets of the graph nodes, resulting in exponential time complexity with respect to the number of nodes. Variations of this algorithm, such as the PC (Peter, Clark) algorithm, have been developed to exploit the conditional independence relations to reduce the computational complexity. However, its worst-case scenario still runs in exponential time with respect to the highest degree of the nodes, again making it unsuitable for large networks. In contrast, a different set of approaches makes use of a priori information about the structure, leading to reconstruction algorithms with better scaling and sample complexity properties. A widely used algorithm to approximate a discrete probability distribution with a tree factorization was developed by Chow and Liu in [38]. If a distribution has a tree factorization, each factor is the conditional probability distribution of a variable given at most one other variable (namely, a product of first and second order distributions).
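For jointly Gaussian variables, the Chow–Liu approximation reduces to building a maximum-weight spanning tree over pairwise mutual information, since I(y_i; y_j) = −(1/2) log(1 − ρ_ij²). A minimal sketch under this Gaussian assumption (the function name and the test chain below are illustrative, not taken from this dissertation):

```python
import numpy as np

def chow_liu_tree(data):
    """Chow-Liu sketch for jointly Gaussian data: pairwise mutual
    information I(i; j) = -0.5 * log(1 - rho_ij^2) serves as the edge
    weight, and a maximum-weight spanning tree is grown greedily (Prim)."""
    rho = np.corrcoef(data, rowvar=False)
    n = rho.shape[0]
    mi = -0.5 * np.log(1.0 - np.clip(rho ** 2, 0.0, 1.0 - 1e-12))
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Heaviest edge crossing the cut between the tree and the rest.
        best = max(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: mi[e])
        edges.append(tuple(sorted(best)))
        in_tree.add(best[1])
    return sorted(edges)

# Hypothetical Markov chain x0 -> x1 -> x2 -> x3: the algorithm should
# recover the chain, since adjacent pairs carry the most information.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(50_000)
x1 = 0.9 * x0 + 0.4 * rng.standard_normal(50_000)
x2 = 0.9 * x1 + 0.4 * rng.standard_normal(50_000)
x3 = 0.9 * x2 + 0.4 * rng.standard_normal(50_000)
print(chow_liu_tree(np.column_stack([x0, x1, x2, x3])))
# → [(0, 1), (1, 2), (2, 3)]
```

The key property is that only pairwise (second order) statistics enter the computation, which is what makes tree approximations attractive for large networks.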
This strategy has been successfully employed in biology for the study of gene regulatory networks, approximating a complex structure with a tree topology [23]. Also, in economics, this method has been applied to identify a tree network for the analysis of a stock portfolio [14]. Other techniques, such as phylogenetic reconstruction approaches, have been developed that utilize a metric defined over pairs of nodes of a binary tree, with applications in biology [39]. Similarly, in the area of dynamic systems, there are algorithms capable of reconstructing quite large classes of networks [20, 33, 34, 40, 41, 42, 43, 44, 45]. For example, the authors of [20] propose an approach to consistently reconstruct the structure of an unknown dynamic network using spectral factorization methods for stable, minimum-phase Linear Time Invariant (LTI) systems. The reconstructed network, in this case, is unique given that the system is strictly
causal. Furthermore, the authors also propose a method to deal with non-minimum phase systems with strictly causal dynamics, though in this case the solution is not necessarily unique. Aside from the strong assumption of strict causality, these approaches rely on spectral factorizations, which do not scale well with the number of nodes. In [34], the authors show that a modification of the PC algorithm can be applied to reconstruct linear networks of dynamic systems given that the structure is a DAG. Similar to the PC algorithm, this modified version is guaranteed to be consistent, but it also suffers from the same limitations in orienting the edges of the network. Other approaches have been developed using Granger causality to learn the structure of a network from time series data [19, 46]. The result in [46] achieves this goal, with applications in econometrics, when the noise processes are assumed to be white, while the method in [19] further utilizes Wiener filtering with no assumption on the color of the noise processes. The authors of [42] develop a framework for the reconstruction of networks of stochastic processes using Compressed Sensing Theory (CST), with applications to the propagation of diseases or rumors. Computational cost is often a limiting factor for the practical implementation of techniques aiming at reconstructing generic networks such as the one in [42]. Thus, in order to keep the computational cost of the reconstruction algorithm at tractable levels, the authors limit themselves to strictly causal binary stochastic processes [42]. Other results formulate the reconstruction problem looking specifically for a sparse solution via compressed sensing tools, such as [43] and [44]. However, the main drawback of these techniques lies in the fact that it is often difficult to find guarantees of correct reconstruction when applying CST.
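As a rough illustration of the Granger-causality idea mentioned above, the sketch below compares the residual variance of predicting a signal from its own past with that of predicting it from the past of both signals. The bivariate model, lag choice, and thresholds are hypothetical, and practical tests rely on formal statistics rather than a raw variance ratio:

```python
import numpy as np

def granger_gain(x, y, lags=2):
    """Ratio of the residual variance when predicting x(t) from the past
    of both x and y over the residual variance when using the past of x
    alone.  Values well below 1 suggest that y Granger-causes x."""
    rows = range(lags, len(x))
    past_x = np.array([x[t - lags:t] for t in rows])
    past_y = np.array([y[t - lags:t] for t in rows])
    target = x[lags:]

    def resid_var(A):
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ beta)

    return resid_var(np.hstack([past_x, past_y])) / resid_var(past_x)

# Hypothetical system: x is driven by the past of y, but not vice versa.
rng = np.random.default_rng(2)
y = rng.standard_normal(20_000)
x = np.zeros_like(y)
for t in range(1, len(y)):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.standard_normal()

print(granger_gain(x, y))   # well below 1: the past of y helps predict x
print(granger_gain(y, x))   # close to 1: the past of x does not help predict y
```

The asymmetry of the two ratios is what encodes directionality, which is the property that the cited approaches exploit to orient links in a network of time series.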
In [40], the authors introduce a metric that is a function of the coherence of the pairs of signals and use this metric to develop a technique that reconstructs the skeleton of a dynamic network with a rooted tree structure similar to the approach developed in [38] for graphical models of random variables. They also provide guarantees that this method learns the correct interconnections between the graph nodes, resulting in a correct reconstruction of the structure. Moreover, it is shown that this algorithm could be utilized to provide an approximation of a complex network with a single rooted tree. Such an approximation is shown to be optimal, since it minimizes the mutual information between pairs of nodes in the original network, and also consistent, since it is
shown that if the original network is a rooted tree, then the output of the algorithm converges to the actual network structure in the limit of infinite data. The algorithms discussed so far have been developed under the assumption that all of the variables in the network are observed. However, as is often the case, there might be some variables that are not measurable even though the dynamics of the network is affected by their presence. In the next section, we provide a review of the existing methods dealing with these hidden processes and discuss their strengths and weaknesses.
1.2 Learning Techniques for Networks with Unmeasurable (Hidden) Nodes
A fundamental and interesting challenge for learning the structure of networks, both in the area of graphical models of random variables and in that of dynamic systems of stochastic processes, occurs when only part of the nodes of the network are observable. This is a relevant issue in many different fields when dealing with practical applications of learning a network structure. For example, in biological networks some of the nodes might not be measurable even though they could potentially play a relevant role in the network dynamics [47]. As an additional example, attacks in cyber-security applications are often described by modeling the intruders as hidden (or latent) nodes injecting malicious information into other nodes or stealing information from them [48]. In the study of dynamic systems or control theory, hidden nodes are typically associated only with unmeasured state components [49, 50]. There are numerous techniques to reconstruct a network treating the hidden states as the unmeasured variables, with a pronounced attention to computational efficiency. A common tool to handle hidden states in networks of stochastic processes is the Dynamic Structure Function (DSF) framework developed in [33, 51, 52]. This approach treats all state components that are not measured as hidden nodes. Thus, DSF is merely an input/output network representation of the observable state components connected by transfer functions, where the unobserved variables are marginalized and therefore do not appear in the final output of this method.
However, in the graphical models area hidden nodes have a different meaning, since they represent unmeasurable parts of the network where new information is being introduced, and are not just unmeasured states [53]. In the scope of this dissertation, even when dealing with dynamic systems, we consider hidden nodes as points where new information is added to the network. Thus, in this respect, our approaches bear more similarities with graphical model tools than with control theoretic tools. Given this definition of hidden nodes, and despite the differences in semantics between dynamic systems and graphical models mentioned in the previous section, in some cases similar learning methodologies can be applied to both domains, as in the results of this dissertation [3] or the results in [54]. The first algorithms capable of detecting the presence of hidden nodes took advantage of specific statistical tests called spectral quartet tests. Spectral quartet tests are effective only when learning tree structures [35, 55, 56] or bipartite Bayesian networks of binary variables [57]. A departure from spectral quartet tests was an algorithm developed in [58] which could learn a binary tree just from the observation of its leaves. The authors of [58] propose a different approach making use of a metric that is additive along the paths of a rooted tree. A generalization of the technique in [58] was later achieved with the Recursive Grouping Algorithm (RGA). RGA also leverages an additive metric defined over pairs of nodes to reconstruct the structure of a generic rooted tree network from observational data [53]. RGA learns the exact structure of the tree so long as the degree of each hidden node is greater than or equal to three, and it is therefore not limited to the assumption that all visible nodes are leaves of the tree.
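A metric that is additive along the paths of a tree, as exploited by RGA and the quartet tests, satisfies the classical four-point condition, which also reveals how any four nodes split in the tree. A minimal sketch on a hand-built additive metric (the tree, its edge weights, and the node names are hypothetical):

```python
import itertools

def quartet_split(d, i, j, k, l):
    """Four-point condition on an additive tree metric d: of the three
    pairings of {i, j, k, l}, the two largest pairwise-distance sums are
    equal, and the pairing with the smallest sum gives the quartet's
    topology (the pair grouped on one side of the internal edge)."""
    sums = {(i, j): d[i, j] + d[k, l],
            (i, k): d[i, k] + d[j, l],
            (i, l): d[i, l] + d[j, k]}
    top = sorted(sums.values())
    assert abs(top[1] - top[2]) < 1e-9, "metric is not additive on a tree"
    return min(sums, key=sums.get)

# Hypothetical tree: leaves a, b hang off hidden node h1; leaves c, d
# hang off hidden node h2; h1 and h2 are joined by an edge (all edge
# weights equal to 1), so d(a,b) = d(c,d) = 2 and all cross pairs are 3.
leaves = ['a', 'b', 'c', 'd']
d = {}
for p, q in itertools.combinations(leaves, 2):
    same_side = {p, q} <= {'a', 'b'} or {p, q} <= {'c', 'd'}
    d[p, q] = d[q, p] = 2.0 if same_side else 3.0

print(quartet_split(d, 'a', 'b', 'c', 'd'))   # → ('a', 'b')
```

Note that the split is a property of the metric, not of the argument order: permuting the four leaves yields the same grouping, which is why such tests can localize hidden nodes purely from pairwise distances between observed nodes.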
Similar methods that take advantage of an additive metric have been developed for discrete distributions such as the Bernoulli distribution, and they can be extended to Gaussian models as well [59]. It is worth noting that the methods developed for learning tree structured networks have polynomial computational complexity. Another algorithm, called Learning Pairwise Cluster Comparison (LPCC), proposes a solution for learning networks of discrete variables with no prior assumption on the distribution [60, 61]. However, this method makes the strong assumption that no observable node can be an ancestor of any hidden node, limiting the number of networks that can be recovered by LPCC. In the case of generic distributions, though, finding an additive metric is extremely hard, or such a metric might not even exist in general. Many algorithms have been developed to solve this problem for generic networks, including networks with cycles, in the presence of unmeasured variables [62, 63,
6 64]. One prevalent method is the use of ancestral graphs to describe the independence relations among the observed variables given that the true network is a DAG [64]. The main advantage of ancestral graphs is that they utilize only the measured variables and successfully encode all of their conditional independence relations via m-separation statements, which generalize the well- known criterion for d-separation [35, 63]. Furthermore, complete algorithms have been devised to obtain ancestral graphs from observational data [63]. However, recovering the actual structure of the original DAG considering the presence of the hidden variables is a task that ancestral graphs somehow circumvent. This means that the exact location and number of the hidden nodes would still be unknown after the recovery technique has been applied. A similar method, known as ancestral polytree, while providing efficiency in the inference process, is developed for cases when the ancestral graph has a polytree (directed tree with potentially multiple roots) structure [65]. Yet, there exist polytree networks such that their ancestral polytree graphs do not have a polytree structure. Therefore, these cases cannot be handled by the ancestral polytree method, limiting the number of different classes of networks that the algorithm can learn. A different and recent methodology to learn the location and connectivity of hidden nodes in a network of dynamic systems with a polytree structure is described in [66]. This method is based on a discrepancy measure which is a function of the mutual information between pairs of nodes and it relies on the estimation of high order statistics requiring, in general, large quantities of data. The algorithm is applicable, again, only when each link of the network is strictly causal. 
Considering strictly causal dynamics is a very limiting assumption for several reasons: (i) transfer functions with direct feedthroughs, such as proportional gains, are very common; (ii) many discretization methods for continuous systems lead to necessarily non-strictly causal operators; (iii) when delays are smaller than the sampling rate, correlations might appear as instantaneous in the collected data. In these cases, strictly causal relations would not be appropriate to describe the relations among the node processes [2,3]. Observe that many of the techniques discussed in this section study tree structured networks. In the next section, we discuss why these structures are important and why we are also interested in studying this type of network.
1.3 The Importance of Tree Structured Networks
The significance of studying acyclic structures, such as trees, is supported by the large number of scientific articles devoted to these models [21, 38, 40, 56, 66, 67, 68, 69, 70]. Even though acyclic structures are a relatively limited class of networks, there exist well established tools, such as junction tree approaches, to extend techniques developed for acyclic structures to cyclic networks [70, 71]. Thus, these results constitute, potentially, a first step towards the development of techniques applicable to more general networks. As an additional example, belief propagation was developed only for trees at first [67], but it was generalized to loopy networks afterwards [72]. Furthermore, acyclic structures are extensively studied because they can be used to approximate more complex networks. While there could be methods to consistently reconstruct more general classes of networks, these approaches tend to have higher computational and sample complexity. Thus, given a complex system, it might sometimes be preferable to approximate it with a simpler structure. Some examples of these procedures are shown in [14, 23, 40], where a whole gene network and the underlying connectivity of hundreds of financial time series are successfully approximated with a rooted tree. These examples signify the importance of developing fast and efficient algorithms for learning networks with tree structures [1,2,3,4]. Although rooted tree topologies can be satisfactory models in applications where propagations arise from a single source [21], they do not necessarily perform well in applications where information is fused from multiple sources. Examples of these scenarios are complex power systems, where it is possible to generate power at different points inside the distribution grid [73], or social networks, where multiple nodes can be the source of misinformation [74, 75].
Polyforest structures (collections of directed trees with potentially multiple roots) have the capacity to model processes that are not necessarily correlated and in fact represent a wider range of network classes modeling arbitrarily high order statistics [68]. For these reasons, in this dissertation we propose novel methodologies to learn the structure of polyforest networks [1,2,3,4]. In the next section, we consider a problem where all the nodes are observable and instead the interest lies in finding an approximation of the network using polytrees in order to capture the strongest connections.
1.4 Approximation using Simpler Structures
As mentioned in the previous sections, there are several techniques allowing the exact reconstruction of networks from blind observations under specific assumptions. While these methods can learn potentially complicated networks and provide guarantees of consistent learning, they rely on a large quantity of observations for an accurate estimation of conditional independence relations, power spectral factors, or the evaluation of many coefficients in several multivariate linear regressions [1]. On the other hand, rooted trees have proven to be good topology approximators in several application domains where the actual underlying network is definitely more complex [14, 23]. Since these approximators have tree structures, the computational cost of the approximation method is drastically lower compared to the exact methods. Another important advantage of using simpler structures as approximators is that, when multiple models satisfactorily explain the data, the simpler network is often optimal with respect to some measure, for example a distance defined over pairs of nodes in the network. Thus, this approximation usually tends to have fewer edges, following a form of Occam's razor principle. This simpler structure is in some cases preferred over recovering the actual structure because a network with fewer connections can potentially highlight the most significant connections between the nodes of the system. Indeed, in these cases a network with fewer edges and supposedly less explanatory power could paradoxically be more informative about how a system operates than a network that achieves a marginally better explanatory power by introducing a large number of weak connections. In the next section we discuss the contributions of this dissertation and also explain how tree structured networks are leveraged for this study.
1.5 Contributions of this Dissertation
As a first contribution, this dissertation considers the problem of learning the unknown structure of a linear dynamic network when the underlying topology is given by a polyforest and some nodes are not observable. No assumption is made about the strict causality of the dynamic operators, and only statistics up to the second order are used. It is shown that the proposed methodology is robust with respect to the presence of unmeasured nodes. In other words, the derived algorithm detects the exact number and location of the latent nodes if they satisfy specific degree conditions in the actual network graph. It is also shown that the required degree conditions are necessary for a consistent reconstruction. Thus, the proposed learning algorithm achieves the fundamental limitations in learning the structure of a polyforest network of linear dynamic systems in the presence of latent nodes [2,3]. This technique tackles the problem in an efficient way since the computational complexity of the derived algorithm is proven to be polynomial in the number of observed nodes. This method splits a polytree into its rooted trees and then leverages RGA to recover all the hidden nodes that have degree greater than or equal to three in each rooted tree. Furthermore, this method is capable of detecting some additional hidden nodes with degree equal to two which RGA cannot detect. We introduce an algorithm similar to the one introduced in [68] to find the orientation of some of the links, either by extracting available features from the data or by exploiting some a priori knowledge. Furthermore, it is shown that the proposed method developed for learning polyforest structures in the case of linear networks of dynamic systems can be applied to the case of Gaussian random variables, for which we can define a distance metric with the property of being additive along the paths of the rooted trees of the polyforest. The second contribution of this work is to propose a novel methodology towards the recovery of networks with signals generated by general distributions. Indeed, we provide an algorithm to learn causal diagrams with polyforest structures making no assumption on the underlying probability distribution or the linearity of the dynamics of the network processes.
These polyforest structures can represent factorizations involving conditional distributions of arbitrarily high order. The proposed technique, remarkably, uses only the statistics of the observable nodes up to the third order. It is shown that a causal diagram with a polyforest structure can be exactly recovered if and only if each hidden node in the original diagram satisfies specific degree conditions. These degree conditions are stronger than the degree conditions for learning the structure of a network with linear dynamics since the assumption of linearity is relaxed. Moreover, if the degree conditions are not satisfied, it is shown that there exists another polyforest with fewer hidden nodes which entails the same independence relations among the observed variables. Therefore, this algorithm, similar to the first proposed algorithm, achieves the fundamental limitations of solving the problem of learning the structure of a polyforest network in the presence of hidden nodes, given the aforementioned a priori assumptions [4]. The third contribution of this work is to leverage simpler networks with polytree structures as an approximating class for potentially complex networks, assuming that all the nodes are observable. This technique is focused on obtaining theoretical guarantees only for cases where the original structure is also a polytree. A basic requirement for any approximation technique is to satisfy a congruity property, which implies that if the actual structure is in the class of the approximators, then the approximating network needs to match the actual one, at least in the limit of infinite data. We show that we can utilize the same distance among the nodes that is developed for recovering the structure of networks of linear dynamic systems and is estimated from blind measurements. It is shown that the Minimum Spanning Tree (MST) computed using such a distance as edge weights consistently recovers the undirected topology of the network when it has a polytree structure. Remarkably, this approximation algorithm is the same as the one defined in [40], which, though, was proven congruous only for rooted trees. We also provide an algorithm to congruously orient some of the links in the approximated network by extracting available features from the data. We study one interesting application of this approximation method to the analysis of high frequency financial market data [1].
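The MST step of this third contribution can be illustrated with a small sketch. The pairwise distances below are hypothetical stand-ins for the distance estimated from data, and Kruskal's algorithm (with a union-find) plays the role of the MST computation:

```python
# Sketch of the MST-based skeleton approximation (hypothetical distances):
# given a symmetric matrix of pairwise distances estimated from data,
# Kruskal's algorithm returns the undirected tree of strongest connections.

def mst_skeleton(n, dist):
    """Kruskal's algorithm over the complete graph on n nodes.

    dist: dict mapping frozenset({i, j}) -> estimated distance.
    Returns the set of undirected MST edges as frozensets.
    """
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(dist, key=dist.get)  # lightest (closest) pairs first
    tree = set()
    for e in edges:
        i, j = tuple(e)
        ri, rj = find(i), find(j)
        if ri != rj:                    # adding e does not create a cycle
            parent[ri] = rj
            tree.add(e)
    return tree

# Hypothetical distances consistent with the chain 0 - 1 - 2 - 3:
d = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 1.2, frozenset({2, 3}): 1.1,
     frozenset({0, 2}): 2.2, frozenset({0, 3}): 3.3, frozenset({1, 3}): 2.3}
print(mst_skeleton(4, d))  # the three lightest non-cycle edges form the chain
```

The congruity property discussed above corresponds to the fact that, when the distance is additive along the paths of a polytree, the lightest edges selected here coincide with the true links.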
Chapter 2
Preliminaries, Background, Assumptions and Problem Formulation
In this chapter, we provide the necessary theoretical background for the problem formulation. The reader can refer to [36, 76] for most of the standard definitions in graph theory. We also state the assumptions that are made in order to formulate the problems for which we propose a solution. More specifically, we provide definitions related to graphs with only visible nodes in Section 2.1, and definitions related to graphs containing hidden nodes in Section 2.2. Then, we introduce a class of models for linear dynamic systems in Section 2.3. Finally, in Section 2.4, we provide the formal statement of the problems that we tackle in this dissertation.
2.1 Graphs with All Visible Nodes
We recall the standard definitions of directed and undirected graphs and also introduce the definition of a partially directed graph [1,3].
Definition 2.1 (Directed and undirected graphs). A directed graph G~ is a pair (N, E~ ) where N is a set of nodes (or vertices) and E~ is a set of edges (or arcs) which are ordered pairs of elements of the set N. An undirected graph G is a pair (N, E) where N is a set of nodes (or vertices) and E is a set of edges (or arcs) which are unordered pairs of elements of the set N.
Definition 2.2 (Partially directed graph). A partially directed graph G¯ is a triplet (N, E, E~) where N is a set of nodes, and E and E~ are sets of undirected and directed edges, respectively, where (yi, yj) ∈ E~ implies {yi, yj} ∉ E.
Observe that in a partially directed graph, E and E~ do not share any edges. We denote the unordered pair of two elements yi, yj ∈ N as yi − yj or {yi, yj}, and the ordered pair of yi, yj ∈ N (when yi precedes yj) as yi → yj or (yi, yj). Examples of a directed graph, an undirected graph and a partially directed graph are shown in Figures 2.1 (a)-2.1 (c), respectively. Furthermore, the restriction of a graph can be defined with respect to a subset of its nodes [3].
Definition 2.3 (Restriction of a graph). A directed graph ~A = (NA, E~ A) is the restriction of a directed graph G~ = (N, E~ ) to the nodes NA if NA ⊆ N and E~ A = {(yi, y j) ∈ E~ | yi, y j ∈ NA}.
More informally, the restriction of a graph with respect to a set of nodes A is the graph obtained by considering only the nodes in A and the edges linking pairs of nodes which are both in A. The skeleton of a directed or partially directed graph is defined as follows [1,2].
Definition 2.4 (Skeleton of a graph). Given a directed graph or a partially directed graph, its skeleton is the undirected graph obtained by removing the orientation from all the directed edges.
An example of a directed graph and its skeleton are depicted in Figure 2.1 (a) and Figure 2.1 (d), respectively. We recall the definition of degree, outdegree and indegree of a node [2].
Definition 2.5 (Degree, outdegree and indegree of a node). In a directed graph G~ = (N, E~) or an undirected graph G = (N, E), the degree of a vertex y ∈ N is defined as the number of edges directly connected (or linked) to y and is denoted by degG~(y) or degG(y), respectively. In a directed graph G~ = (N, E~), the outdegree of a vertex y ∈ N is defined as the number of edges connected to y such that (y, yi) ∈ E~ with yi ∈ N and is denoted by deg+G~(y). In a directed graph G~ = (N, E~), the indegree of a vertex y ∈ N is defined as the number of edges connected to y such that (yi, y) ∈ E~ with yi ∈ N and is denoted by deg−G~(y).

Figure 2.1: A directed graph (a), an undirected graph (b), a partially directed graph (c), and the skeleton of the graph in (a) (d) [2,3].
A root node is defined using the definition of indegree [2].
Definition 2.6 (Root node). In a directed graph G~ = (N, E~), a node y ∈ N is a root if deg−G~(y) = 0.
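These degree notions translate directly into code. The following is a minimal sketch assuming a directed graph stored as a set of ordered pairs; the small graph is hypothetical, not the one in Figure 2.1:

```python
# Sketch of Definitions 2.5-2.6 (assumed edge-set representation):
# a directed graph as a set of ordered pairs (parent, child).

def outdeg(E, y):
    return sum(1 for (a, b) in E if a == y)

def indeg(E, y):
    return sum(1 for (a, b) in E if b == y)

def deg(E, y):
    return outdeg(E, y) + indeg(E, y)

def roots(N, E):
    """Nodes with indegree zero (Definition 2.6)."""
    return {y for y in N if indeg(E, y) == 0}

# Hypothetical directed graph: 3 -> 4 -> 6 and 1 -> 2 -> 4
N = {1, 2, 3, 4, 6}
E = {(3, 4), (4, 6), (1, 2), (2, 4)}
print(roots(N, E))                           # {1, 3}
print(deg(E, 4), indeg(E, 4), outdeg(E, 4))  # 3 2 1
```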
For example, node y3 in Figure 2.1 (a) is a root node which has deg(y3) = deg+(y3) = 1 and deg−(y3) = 0. Also, for node y2 in the same figure, we have that deg(y2) = 2 and deg+(y2) = deg−(y2) = 1. The definition of chain and path in a directed graph is widely used in the rest of this dissertation. In the literature of graph theory this concept is defined in a variety of different ways that are not always equivalent. Thus, we explicitly provide the definition that we use here [2].
Definition 2.7 (Path, chain and directed cycle). Consider a directed graph G~ = (N, E~) where N = {y1, ..., yn}. A chain or path starting from yi and ending in yj is an ordered sequence of distinct edges

((yπ0, yπ1), (yπ1, yπ2), ..., (yπℓ−1, yπℓ))

with ℓ ≥ 1 where yi = yπ0, yj = yπℓ, and for all k = 0, 1, ..., ℓ − 1 we have (yπk, yπk+1) ∈ E~ for a chain, and either (yπk, yπk+1) ∈ E~ or (yπk+1, yπk) ∈ E~ for a path. A path in an undirected graph G = (N, E) is the same ordered sequence where {yπk, yπk+1} ∈ E. When there exists at most one edge connecting each pair of nodes in G~, a path can be unambiguously determined by the sequence of nodes yπ0, yπ1, yπ2, ..., yπℓ−1, yπℓ. Also, ℓ is called the length of the chain or the path. Furthermore, a directed cycle is a chain where yi = yj.
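Under the at-most-one-edge assumption, checking whether a node sequence is a chain or a path is mechanical. A minimal sketch with a hypothetical edge set chosen to be consistent with the chains and paths discussed in the text:

```python
# Sketch of Definition 2.7 (assumed node-sequence representation): given a
# directed graph with at most one edge per node pair, a node sequence
# determines a path unambiguously; a chain additionally follows directions.

def is_chain(E, seq):
    return all((a, b) in E for a, b in zip(seq, seq[1:]))

def is_path(E, seq):
    return all((a, b) in E or (b, a) in E for a, b in zip(seq, seq[1:]))

# Hypothetical edges: 1 -> 2 -> 4 -> 6, 3 -> 4, 1 -> 5, 4 -> 5
E = {(1, 2), (2, 4), (4, 6), (3, 4), (1, 5), (4, 5)}
print(is_chain(E, [1, 2, 4, 6]))  # True: every step follows an edge direction
print(is_path(E, [1, 5, 4, 3]))   # True: 1 -> 5 <- 4 <- 3
print(is_chain(E, [1, 5, 4, 3]))  # False: 5 <- 4 goes against the direction
```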
As it follows from the definition, a chain is a special case of a path. All paths (and consequently all chains) can be suggestively denoted by separating the nodes in the sequence {yπk}ℓk=0 with the arrow symbol → if (yπk, yπk+1) ∈ E~ or the symbol ← if (yπk+1, yπk) ∈ E~. For example, in Figure 2.1 (a), the path y1 → y2 → y4 → y6 is also a chain, while y1 → y5 ← y4 ← y3 is a path, but not a chain. Furthermore, we use the symbol − when we do not want to specify the orientation of the link connecting two nodes. Consequently, it follows that the edge {yi, yj} can be denoted as yi − yj (as mentioned before) [1,2,3,4]. Also, from the notion of chain, we can derive the notions of ancestors and descendants [2].
Definition 2.8 (Parent, child, ancestor and descendant). Consider a directed graph G~ = (N, E~). A vertex yi is a parent of a vertex yj in G~ if there is a directed edge from yi to yj. In such a case yj is a child of yi in G~. Also, yi is an ancestor of yj in G~ if either yi = yj or there is a chain from yi to yj. In such a case yj is a descendant of yi in G~. Given a set X ⊆ N, we define the following notation:

paG~(X) := {yi ∈ N | ∃ yj ∈ X : yi is a parent of yj in G~}
chG~(X) := {yj ∈ N | ∃ yi ∈ X : yj is a child of yi in G~}
anG~(X) := X ∪ {yi ∈ N | ∃ yj ∈ X : yi is an ancestor of yj in G~}
deG~(X) := X ∪ {yj ∈ N | ∃ yi ∈ X : yj is a descendant of yi in G~}.
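The four sets above can be computed by a simple fixed-point iteration. A minimal sketch with a hypothetical edge set:

```python
# Sketch of Definition 2.8 (hypothetical edge set): parents, children,
# ancestors and descendants computed from a set of directed edges.

def parents(E, X):
    return {a for (a, b) in E if b in X}

def children(E, X):
    return {b for (a, b) in E if a in X}

def ancestors(E, X):
    """an(X): X together with every node having a chain into X."""
    an = set(X)
    while True:
        new = parents(E, an) - an
        if not new:
            return an
        an |= new

def descendants(E, X):
    de = set(X)
    while True:
        new = children(E, de) - de
        if not new:
            return de
        de |= new

# Hypothetical chain 3 -> 4 -> 6 plus the edge 1 -> 2:
E = {(3, 4), (4, 6), (1, 2)}
print(ancestors(E, {6}))    # {3, 4, 6}: node 3 is an ancestor of 6
print(descendants(E, {3}))  # {3, 4, 6}
print(parents(E, {4}))      # {3}
```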
For example, node y6 in Figure 2.1 (a) is a child of node y4 and also a descendant of node y3, while node y3 is a parent of y4 and also an ancestor of node y6. Given a specific path, with the exception of its first and last nodes, we distinguish its nodes into forks, inverted forks (or colliders) and chain links [2].
Definition 2.9 (Fork, inverted fork and chain link). Given a path yπ0, ..., yπℓ in a directed graph G~ = (N, E~), we say that yπk, for k = 1, ..., ℓ − 1, is

• a fork if (yπk, yπk−1) ∈ E~ and (yπk, yπk+1) ∈ E~

• an inverted fork (or collider) if (yπk−1, yπk) ∈ E~ and (yπk+1, yπk) ∈ E~
• a chain link in all other cases.
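This classification can be carried out mechanically. A minimal sketch assuming an edge-set representation, with hypothetical edges chosen to mirror the path y7 → y1 → y5 ← y4 → y6 of Figure 2.1 (a):

```python
# Sketch of Definition 2.9 (hypothetical edge set): classify each interior
# node of a path as a fork, an inverted fork (collider), or a chain link.

def classify(E, seq):
    labels = {}
    for k in range(1, len(seq) - 1):
        prev, node, nxt = seq[k - 1], seq[k], seq[k + 1]
        if (node, prev) in E and (node, nxt) in E:
            labels[node] = "fork"
        elif (prev, node) in E and (nxt, node) in E:
            labels[node] = "inverted fork"
        else:
            labels[node] = "chain link"
    return labels

# Hypothetical edges mirroring the path y7 -> y1 -> y5 <- y4 -> y6:
E = {(7, 1), (1, 5), (4, 5), (4, 6)}
print(classify(E, [7, 1, 5, 4, 6]))
# {1: 'chain link', 5: 'inverted fork', 4: 'fork'}
```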
For example, in Figure 2.1 (a), the path y7 → y1 → y5 ← y4 → y6 has a chain link in node y1, an inverted fork in node y5 and a fork in node y4 [2]. Now, we can define related nodes. Informally, two nodes are related if one is a descendant of the other or if they have a common ancestor [3].
Definition 2.10 (Related nodes). Given a directed graph G~ = (N, E~ ), two nodes yi and y j are related if there is a path connecting them that contains no inverted forks.
Moreover, here we provide a formal definition of polyforests, polytrees and rooted trees for completeness [1,2,3].
Definition 2.11 (Polyforest, polytree and rooted tree). Given a directed graph G~ = (N, E~ ), G~ is a
• polyforest F,~ if for every two nodes yi, y j ∈ N there is at most one path connecting them.
• polytree ~P, if for every two nodes yi, y j ∈ N there is exactly one path connecting them.
• rooted tree T,~ if it is a polytree with a single root.
We now define the polytrees contained in a polyforest and also the rooted tree associated with each root of a polyforest [2,3].
Definition 2.12 (Polytree and rooted tree of a polyforest). Each connected subgraph of a polyforest is referred to as a polytree of the polyforest. The restriction of a polyforest (polytree) to all the descendants of a root is referred to as a rooted tree of the polyforest (polytree).
For example, the graph in Figure 2.2 (a) is a rooted tree since it has only one root, the graph in Figure 2.2 (b) is a polytree since it has more than one root, and the graph in Figure 2.2 (c) is a polyforest since it contains multiple polytrees. Note that in all of these graphs, there exists at most one path connecting any pair of nodes. Also, observe that the polytree of Figure 2.2 (b) contains three rooted trees and the polyforest of Figure 2.2 (c) contains two polytrees. The following proposition guarantees that in a rooted tree there are no paths with inverted forks [2]. In other words, all the nodes in a rooted tree are related.
Proposition 2.13. In a rooted tree, there are no paths containing inverted forks.
Proof. The proof is left to the reader.
Figure 2.2: A rooted tree graph (a), a polytree graph (b), and a polyforest graph (c).
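The distinction among the three structures of Definition 2.11 reduces to simple checks on the skeleton. A minimal sketch, assuming a directed graph stored as a set of ordered pairs (the small graphs are hypothetical):

```python
# Sketch of Definitions 2.6 and 2.11 (assumed edge-set representation):
# a directed graph is a polyforest iff its skeleton is acyclic, a polytree
# iff the skeleton is additionally connected, and a rooted tree iff the
# polytree has a single root (node of indegree zero).

def classify_structure(N, E):
    skel = {frozenset(e) for e in E}
    # Undirected acyclicity/connectivity via union-find over the skeleton.
    parent = {y: y for y in N}

    def find(y):
        while parent[y] != y:
            parent[y] = parent[parent[y]]
            y = parent[y]
        return y

    for e in skel:
        a, b = tuple(e)
        ra, rb = find(a), find(b)
        if ra == rb:
            return "has an undirected cycle"  # more than one path somewhere
        parent[ra] = rb
    if len({find(y) for y in N}) > 1:
        return "polyforest"
    roots = {y for y in N if all(b != y for (_, b) in E)}
    return "rooted tree" if len(roots) == 1 else "polytree"

print(classify_structure({1, 2, 3, 4}, {(1, 2), (1, 3), (3, 4)}))  # rooted tree
print(classify_structure({1, 2, 3}, {(1, 2), (3, 2)}))             # polytree
print(classify_structure({1, 2, 3, 4}, {(1, 2), (3, 4)}))          # polyforest
```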
2.2 Graphs with Hidden Nodes
In this section we provide the necessary background for dealing with graphs with latent (or hidden) nodes. Latent graphs are an extension of the standard graphs discussed in the previous section (see also [53] for an equivalent definition) [2,3].
Definition 2.14 (Latent graph). We define a directed latent graph G~ ` as a triplet (V, L, E~ ) such that V (the set of visible nodes) and L (the set of hidden or latent nodes) are disjoint, and G~ = (N, E~ ) is a directed graph where N := V ∪ L. Also, we say that G~ ` is a latent rooted tree, a latent polytree or a latent polyforest if G~ is respectively a rooted tree, a polytree, or a polyforest.
Observe that the notation used for latent graphs should not be confused with the notation used for partially directed graphs since they have different element sets as their triplets. As an example of latent graphs, the graph shown in Figure 2.3 (a) is a latent graph where its latent nodes, node y2 and node y5, are shown by dotted circles. We can extend the definition of a partially directed graph to latent partially directed graph considering a partition of the set of nodes into visible and hidden nodes [4].
Definition 2.15 (Latent partially directed graph). A latent (or hidden) partially directed graph G¯` is a 4-tuple (V, L, E, E~) where
• the disjoint sets V and L are named the set of visible nodes and the set of hidden nodes, • the set E is the set of undirected edges containing unordered pairs of (V ∪ L) × (V ∪ L), • the set E~ is the set of directed edges containing ordered pairs of (V ∪ L) × (V ∪ L).
Figure 2.3: A latent graph (a), a latent partially directed graph (b), and a latent partially directed tree (c).
In a latent partially directed graph the sets E and E~ do not share any edges. Namely, yi − yj ∈ E implies that both yi → yj and yj → yi are not in E~.
Figure 2.3 (b) is an example of a latent partially directed graph. A latent partially directed graph is a fully undirected latent graph when E~ = ∅, and we simplify the notation by writing
G` = (V, L, E). Similarly, when E = ∅, we have a fully directed latent graph, and we denote it
by G~` = (V, L, E~). Observe that, if we drop the distinction between visible and hidden nodes and consider V ∪ L as the set of nodes, we recover the standard notions of undirected and directed graphs. Thus, latent partially directed graphs inherit, in a natural way, all notions associated with standard graphs as discussed in the previous section (e.g., paths, degree, parents, children, etc.). Consequently, we can define the restriction of a latent partially directed graph similarly to Definition 2.3. In a similar way, when every two nodes in a latent partially directed graph are connected through exactly one path, we have a latent partially directed tree [4].
Definition 2.16 (Latent partially directed tree). A latent partially directed graph G¯` = (V, L, E, E~ ) is a latent partially directed tree when every pair of nodes yi, y j ∈ V ∪ L is connected by exactly one path.
Figure 2.3 (c) is an example of a latent partially directed tree. Trivially, latent partially directed trees generalize the notions of undirected trees and polytrees (directed trees) [68]. In a latent partially directed tree, we define a hidden cluster as a group of hidden nodes connected to each other via a path constituted exclusively of hidden nodes [4].
Definition 2.17 (Hidden cluster). A hidden cluster in a latent partially directed tree ~P` =
(V, L, E, E~ ) is a set C ⊆ L such that for each distinct pair of nodes yi, y j ∈ C the unique path connecting them contains only nodes in C and no node in C is linked to a node which is in L \C.
Figure 2.4 (a) depicts a latent partially directed tree (actually a latent polytree). In this figure, the hidden clusters C1 and C2 are highlighted by the red dotted lines. Observe that each node in a hidden cluster has neighbors which are either visible or hidden nodes of the same cluster. Therefore, we introduce the set of (visible) neighbors of a hidden cluster [4].
Definition 2.18 (Neighbors, closure and degree of a hidden cluster). In a latent partially directed tree, the set of all visible nodes linked to any of the nodes of a hidden cluster C is the set of
Figure 2.4: A generic hidden polytree (a) and its collapsed hidden polytree (b), a minimal hidden polytree (c) [4].
neighbors of C and is denoted by N(C). We define the degree of the cluster as |N(C)|, namely the number of neighbors of the cluster. We refer to the restriction of a latent polytree to a hidden cluster and its neighbors as the closure of the hidden cluster.
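Hidden clusters and their neighbor sets can be computed as connected components of the hidden part of the skeleton. A minimal sketch under an assumed undirected-adjacency representation; the small latent polytree is hypothetical:

```python
# Sketch of Definitions 2.17-2.18 (assumed representation): hidden clusters
# as connected components of the restriction of the skeleton to hidden
# nodes, and the neighbors of a cluster as its adjacent visible nodes.

def hidden_clusters(V, L, E):
    adj = {y: set() for y in V | L}
    for (a, b) in E:
        adj[a].add(b)
        adj[b].add(a)
    clusters, seen = [], set()
    for h in L:
        if h in seen:
            continue
        cluster, stack = set(), [h]
        while stack:                      # flood-fill over hidden nodes only
            y = stack.pop()
            if y in cluster:
                continue
            cluster.add(y)
            stack.extend((adj[y] & L) - cluster)
        seen |= cluster
        neighbors = set().union(*(adj[y] for y in cluster)) & V
        clusters.append((cluster, neighbors))
    return clusters

# Hypothetical latent polytree: hidden nodes 'h1' - 'h2', visible 1, 2, 3
V, L = {1, 2, 3}, {"h1", "h2"}
E = {("h1", "h2"), (1, "h1"), ("h1", 2), ("h2", 3)}
for cluster, nbrs in hidden_clusters(V, L, E):
    print(sorted(cluster), sorted(nbrs))  # ['h1', 'h2'] [1, 2, 3]
```

The degree of each cluster is simply the size of its neighbor set, and the closure is the restriction of the graph to the cluster together with those neighbors.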
Consider again the latent polytree of Figure 2.4 (a). The neighbors of the hidden cluster C1 are y1, y2, y3, y5, y7 and y8 (also highlighted with orange color). We also define the notion of root of a hidden cluster [4].
Definition 2.19 (Root of a hidden cluster in a latent polytree). Let ~P` = (V, L, E~ ) be a latent
polytree. Any root of the restriction of ~P` to one of the hidden clusters of ~P` is called a root of the hidden cluster.
In the latent polytree of Figure 2.4 (a), the node yh3 is the hidden root of C1 and the nodes yh1
and yh2 are the hidden roots of C2. Observe that a hidden cluster might have multiple hidden roots. Given a latent partially directed tree, we can define its collapsed representation by replacing each hidden cluster with a single hidden node. The formal definition is as follows [4].
Definition 2.20 (Collapsed representation). We define the collapsed representation of ~P` = (V, L, E, E~) as the latent partially directed tree ~Pc = (V, Lc, Ec, E~c). Let nc be the number of hidden clusters C1, ..., Cnc, let Lc = {C1, ..., Cnc}, and

Ec := {yi − yj ∈ E | yi, yj ∈ V} ∪ {yi − Ck | ∃ yj ∈ Ck, yi − yj ∈ E} ∪ {Ck − yj | ∃ yi ∈ Ck, yi − yj ∈ E}

E~c := {yi → yj ∈ E~ | yi, yj ∈ V} ∪ {yi → Ck | ∃ yj ∈ Ck, yi → yj ∈ E~} ∪ {Ck → yj | ∃ yi ∈ Ck, yi → yj ∈ E~}.
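The collapsing step can be sketched for the fully directed case as follows; the cluster labels and the small example are hypothetical, and the clusters are assumed to be given (e.g., obtained as in Definition 2.17):

```python
# Sketch of Definition 2.20 (fully directed case, hypothetical cluster
# labels): replace each hidden cluster with a single node and re-attach
# the edges between visible nodes and clusters.

def collapse(V, clusters, E):
    """clusters: list of sets of hidden nodes; E: directed edges."""
    label = {}                       # hidden node -> cluster index
    for k, C in enumerate(clusters):
        for h in C:
            label[h] = k
    Ec = set()
    for (a, b) in E:
        a2 = ("C", label[a]) if a in label else a
        b2 = ("C", label[b]) if b in label else b
        if a2 != b2:                 # drop edges internal to a cluster
            Ec.add((a2, b2))
    return Ec

# Hypothetical latent polytree with one cluster {h1, h2}:
E = {("h1", "h2"), (1, "h1"), ("h1", 2), ("h2", 3)}
print(collapse({1, 2, 3}, [{"h1", "h2"}], E))
# edges: 1 -> C0, C0 -> 2, C0 -> 3
```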
As an example, the collapsed representation of the latent polytree in Figure 2.4 (a) is depicted in Figure 2.4 (b). In the next chapters, we will show in which cases graphical models with polytree structures can be recovered from the independence relations involving only visible nodes. Specifically, we assume that a polytree is a perfect map (see [36, 77]) for a probabilistic model defined over the variables V ∪ L where V and L are disjoint sets. We will find conditions under which it is possible to recover information about the perfect map of the probabilistic model considering only independence relations of the form I(yi, ∅, yj) (read: yi and yj are independent) and of the form I(yi, yk, yj) (read: yi and yj are conditionally independent given yk) for all visible nodes yi, yj, yk ∈ V [4]. One of the fundamental requirements is that all hidden nodes need to satisfy certain degree conditions, summarized in the following definition [4].
Definition 2.21 (Minimal latent polytree). A latent polytree ~P` = (V, L, E~) is minimal if every hidden node yh ∈ L satisfies one of the following conditions:

• deg+~P`(yh) ≥ 2 and deg~P`(yh) ≥ 3 and, if |pa~P`(yh)| = 1, then pa~P`(yh) ⊆ V;

• deg+~P`(yh) = 2 and deg−~P`(yh) = 0 and deg−~P`(yc1), deg−~P`(yc2) ≥ 2 where ch~P`(yh) = {yc1, yc2}.
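The two degree conditions can be checked programmatically. A minimal sketch under an assumed edge-set representation; the examples are hypothetical:

```python
# Sketch of Definition 2.21 (assumed edge-set representation): check the
# degree conditions that every hidden node of a minimal latent polytree
# must satisfy.

def parents_of(E, y):
    return {a for (a, b) in E if b == y}

def children_of(E, y):
    return {b for (a, b) in E if a == y}

def is_minimal(V, L, E):
    for h in L:
        pa, ch = parents_of(E, h), children_of(E, h)
        deg = len(pa) + len(ch)
        cond1 = len(ch) >= 2 and deg >= 3 and (len(pa) != 1 or pa <= V)
        cond2 = (len(ch) == 2 and len(pa) == 0
                 and all(len(parents_of(E, c)) >= 2 for c in ch))
        if not (cond1 or cond2):
            return False
    return True

# Hypothetical example: hidden root h with two children, each child having
# a second (visible) parent -- this satisfies the second condition.
V, L = {1, 2, 3, 4}, {"h"}
E = {("h", 1), ("h", 2), (3, 1), (4, 2)}
print(is_minimal(V, L, E))                              # True
print(is_minimal({1, 2}, {"h"}, {("h", 1), ("h", 2)}))  # False: each child
# has only one parent
```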
Note that the nodes yh2 , yh4 , yh5 , yh7 in Figure 2.4 (a) do not satisfy the minimality conditions and therefore the hidden polytree is not minimal. Instead, Figure 2.4 (c) shows a minimal latent polytree. We also define two special types of hidden nodes in a latent polytree and in the next chapters we explain why we need to make this distinction [4].
Definition 2.22 (Type-I and Type-II hidden nodes). In a minimal latent polytree, we call a hidden node yh with degG~(yh) = 2 and at least one visible child a Type-II hidden node. All other hidden nodes are Type-I hidden nodes.
In the minimal latent polytree of Figure 2.5 (a), the nodes yh2 and yh3 are Type-II hidden nodes, while all the other hidden nodes are Type-I. In order to deal with Type-II hidden nodes separately, as explained in the next chapters, we define the quasi-skeleton of a minimal polytree [4].
Definition 2.23 (Quasi-skeleton of a minimal latent polytree). Let ~P` = (V, L, E~) be a minimal latent polytree. The quasi-skeleton of ~P` is the undirected graph obtained by removing the orientation of all the edges in ~P` and also removing all the Type-II hidden nodes and then linking their two children together.

Figure 2.5: A minimal latent polytree ~P` containing Type-I and Type-II hidden nodes (a), quasi-skeleton of ~P` (b), and collapsed quasi-skeleton of ~P` (c) [4].
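The elimination of Type-II hidden nodes can be sketched as follows (assumed edge-set representation; the minimal polytree below is hypothetical):

```python
# Sketch of Definitions 2.22-2.23 (assumed edge-set representation): build
# the quasi-skeleton by dropping edge orientations, eliminating every
# Type-II hidden node, and linking its two children together.

def quasi_skeleton(V, L, E):
    type2 = set()
    for h in L:
        ch = {b for (a, b) in E if a == h}
        deg = len(ch) + sum(1 for (a, b) in E if b == h)
        if deg == 2 and len(ch) == 2 and ch & V:   # Type-II hidden node
            type2.add(h)
    skel = {frozenset(e) for e in E
            if e[0] not in type2 and e[1] not in type2}
    for h in type2:                                # link the two children
        skel.add(frozenset({b for (a, b) in E if a == h}))
    return skel

# Hypothetical minimal polytree: Type-II node 'h2' with children 1 and 2
V, L = {1, 2, 3}, {"h1", "h2"}
E = {("h2", 1), ("h2", 2), ("h1", 1), ("h1", 2), ("h1", 3)}
print(quasi_skeleton(V, L, E))
# 'h2' disappears and its children 1 and 2 become linked
```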
For example, the quasi-skeleton of the polytree of Figure 2.5 (a) is depicted in Figure 2.5 (b).
Observe that in the quasi-skeleton of ~P`, Type-II hidden nodes have been eliminated and their children are linked together. Furthermore, observe that we can obtain the collapsed quasi-skeleton of a polytree by replacing the hidden clusters with individual hidden nodes in the quasi-skeleton of a polytree as depicted in Figure 2.5 (c). As it is well known in the theory of graphical models, in the general case, from a set of conditional independence statements (formally, a semi-graphoid) faithful to a DAG, it is not possible to recover the full DAG [37, 62]. What can be recovered for sure is the pattern of the
DAG, namely the skeleton and the v-structures (i.e., yi → yk ← y j or the inverted forks) of the DAG [37, 62]. In the next chapters, we will show that, similarly, in the case of a minimal latent polytree, we are able to recover the pattern of the polytree from the independence statements involving only the visible variables [4]. The following is a formal definition of the pattern of a polytree (also see [62]).
Definition 2.24 (Pattern of a polytree). Let ~P = (N, E~) be a polytree. The pattern of ~P is a partially directed graph where the orientation of all the v-structures (i.e., yi → yk ← yj) is known, and as many of the remaining undirected edges as possible are oriented in such a way that the alternative orientation would result in a new v-structure.
2.3 Linear Dynamic Influence Models
In this section we introduce the Linear Dynamic Influence Model (LDIM) and some related properties. An LDIM (see the equivalent definition of a Linear Dynamic Graph in [2, 19]) is a class of models describing a network of dynamic systems. We assume that the dynamics of the nodes in the network are represented by scalar random processes {yi}ni=1. Each process is given by the
superposition of an independent component (or input) ui and the influences coming from its parent nodes through dynamic links. The unknown input acting on each node is modeled as noise and is assumed to be uncorrelated with other inputs. Namely, we have that the power spectral density
Φuiu j (z) = 0 which we also denote as ui y u j. If a certain process directly influences another one, then a directed edge is drawn between them and as a result a directed graph is obtained. In a more informal way, an LDIM is a network of stochastic processes y1, ..., yn interconnected with each other via input/output relations defined by the transfer functions populating the transfer matrix H(z). The formal definition of an LDIM and its associated graph are as follows [1,3].
Definition 2.25 (Linear Dynamic Influence Model and its associated graph). A Linear Dynamic Influence Model (LDIM) is defined as a pair G = (H(z), u) where
• u = (u_1, ..., u_n)^T is a vector of n wide-sense stationary stochastic processes with finite variance such that Φ_u(z), the power spectral density of u, is rational and diagonal; and

• H(z) is an n × n matrix of rational, causal and stable transfer functions, with H_{ij}(z) denoting the entry (i, j) of H(z), such that H_{ii}(z) = 0 for i = 1, ..., n.
The output processes {y_i}_{i=1}^{n} of the LDIM are defined as

y_i = u_i + \sum_{j=1}^{n} H_{ij}(z) y_j    (2.1)

or, in a more compact way, as y = u + H(z)y. We define the associated graph of the LDIM as G~ = (N, E~) where N = {y_1, ..., y_n} and E~ = {(y_i, y_j) | H_{ji}(z) ≠ 0}. When the associated graph of an LDIM is a rooted tree, a polytree, or a polyforest, we call it a Linear Dynamic Rooted Tree (LDRT), a Linear Dynamic Polytree (LDPT), or a Linear Dynamic Polyforest (LDPF), respectively.
As mentioned before, we are interested in studying networks where not all of the nodes are measurable. Thus, when only a subset of the nodes of an LDIM is observable, we define the associated latent graph of the LDIM with respect to the observed nodes [3].
Definition 2.26 (Associated latent graph of an LDIM). Given that only a subset V of the nodes N of the associated graph of an LDIM is observed, the LDIM is called a latent LDIM and its associated graph is a latent graph denoted by G~` = (V, L, E~), where V is the set of visible nodes, L is the set of hidden nodes, and V and L are disjoint.
Lemma 2.27 guarantees the well-posedness of LDIMs with tree structure [3].
Lemma 2.27. Let G = (H(z), u) be an LDIM with associated graph G~ = (N, E~) and y_i, y_j ∈ N. Assume that there are no directed cycles and that ℓ < +∞ is the length of the longest chain in G~. Then, we have that

y = T(z)u := ( I + \sum_{k=1}^{\ell} H^k(z) ) u    (2.2)

with T_{ii}(z) = 1 for all i. Also, if there is no chain from y_i to y_j where y_j ≠ y_i, we have T_{ji}(z) = 0 and Φ_{y_j u_i}(z) = 0.
Proof. The proof is in Appendix A.1.
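For intuition, Equation (2.2) can be checked numerically on a toy acyclic network. The sketch below is our own illustration (not part of the original development), with static gains standing in for the transfer functions H_{ij}(z): since H is nilpotent for an acyclic graph, the truncated series coincides with (I − H)^{-1}.

```python
import numpy as np

# Toy 3-node chain y1 -> y2 -> y3 with static gains in place of H_ij(z).
# For an acyclic network, I + H + H^2 + ... terminates at the longest
# chain length and equals (I - H)^{-1}, so y = T u (Lemma 2.27).
H = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],   # y1 -> y2
              [0.0, 0.7, 0.0]])  # y2 -> y3
T = np.eye(3) + H + H @ H        # longest chain has length 2
```

Here T[2, 0] ≠ 0 reflects the chain from y_1 to y_3, while T[0, 2] = 0 reflects the absence of a chain from y_3 to y_1.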
Now that we have defined LDIMs, we define the following distance on their nodes. In the next chapters we show how we can leverage this distance for the learning of polyforest networks [2,3].
Definition 2.28 (Log-coherence distance). Given an LDIM with nodes {y1, ..., yn}, we define the log-coherence distance
d_L(y_i, y_j) = − \int_{−π}^{π} \log |C_{ij}(e^{iω})| dω    (2.3)

with C_{ij}(e^{iω}) = |Φ_{y_i y_j}(e^{iω})|^2 / ( Φ_{y_i}(e^{iω}) Φ_{y_j}(e^{iω}) ), which is commonly known as the (magnitude-squared) coherence between the signals y_i and y_j, where Φ_{y_i}(e^{iω}) and Φ_{y_j}(e^{iω}) are the power spectral densities of y_i and y_j, respectively, and Φ_{y_i y_j}(e^{iω}) is their cross-spectral density.
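In practice, the integral in Equation (2.3) has to be estimated from finite data. The following sketch is our own illustration in Python; a plain averaged-periodogram (Bartlett-style) estimator stands in for the Welch/mscohere estimator used later in Section 3.3, and the toy signals are hypothetical.

```python
import numpy as np

def welch_spectra(x, y, nseg=64):
    """Averaged (cross-)periodograms over non-overlapping segments."""
    L = len(x) // nseg
    Pxx = Pyy = Pxy = 0.0
    for k in range(nseg):
        X = np.fft.rfft(x[k * L:(k + 1) * L])
        Y = np.fft.rfft(y[k * L:(k + 1) * L])
        Pxx = Pxx + np.abs(X) ** 2
        Pyy = Pyy + np.abs(Y) ** 2
        Pxy = Pxy + X * np.conj(Y)   # average cross-spectrum, then take |.|^2
    return Pxx / nseg, Pyy / nseg, Pxy / nseg

def log_coherence_distance(x, y, nseg=64):
    """Riemann-sum approximation of d_L in Equation (2.3)."""
    Pxx, Pyy, Pxy = welch_spectra(x, y, nseg)
    C = np.abs(Pxy) ** 2 / (Pxx * Pyy)     # magnitude-squared coherence
    C = np.clip(C, 1e-12, 1.0)             # guard against log(0)
    return -2 * np.pi * np.mean(np.log(C))

# demo: a directly influenced pair versus an independent pair
rng = np.random.default_rng(0)
u = rng.standard_normal(4096)
v = 0.9 * u + 0.1 * rng.standard_normal(4096)  # related to u
w = rng.standard_normal(4096)                  # independent of u
d_related = log_coherence_distance(u, v)
d_unrelated = log_coherence_distance(u, w)
```

As expected, the related pair has a much smaller estimated distance than the independent pair, foreshadowing Proposition 2.34.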
Furthermore, we will show that the property of topological identifiability enables the reconstruction of an LDPF from data. This property means that the distance between any two linked nodes in the associated graph of an LDIM is finite and non-zero. We now formally introduce topological identifiability for an LDIM [3].
Definition 2.29 (Topological identifiability). Let G~ = (N, E~) be the associated graph of an LDIM G with a tree skeleton. G is topologically identifiable if for every edge (y_i, y_j) ∈ E~ we have that 0 < d_L(y_i, y_j) < ∞.
The following lemma shows that mild conditions are required to guarantee the topological identifiability property [3].
Lemma 2.30. Let G = (H(z), u) be an LDIM with z = e^{iω}. G is topologically identifiable if Φ_{u_i}(e^{iω}) > η > 0 for some η, for every ω and all i, and each entry of H(z) that is not identically null has no zeros on the unit circle.
Proof. The proof is in Appendix A.2.
Note that the mild conditions of Lemma 2.30 can be relaxed further. Given that contiguous nodes have finite non-zero distance, the property of additivity along the paths of a graph, defined below, allows us to extend this guarantee to all pairs of nodes in the same LDRT [3].
Definition 2.31 (Additivity of a distance along the paths). Let G~ = (N, E~) be a directed graph. The distance d(y_i, y_j) is additive along the paths in G~ if, whenever y_k lies on the path from node y_i to node y_j, we have d(y_i, y_j) = d(y_i, y_k) + d(y_k, y_j).
The notion of d-separation (see [35]) will play an important role to prove that the distance in Equation (2.3) is additive along the paths of an LDRT.
Definition 2.32 (d-separation). Let G~ = (N, E~) be a directed graph. We say that the nodes y_i, y_j ∈ N are d-separated by the set K ⊆ N in G~ if, on every path from y_i to y_j, at least one of the following conditions holds:

• there exists y_k ∈ K on the path such that y_k is a chain or a fork; or

• there exists y_k ∉ K on the path such that y_k is an inverted fork and de_{G~}(y_k) ∩ K = ∅.

We use dsep⟨y_i, K, y_j⟩_{G~} to denote the d-separation of y_i from y_j by the set K in graph G~.
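Since a polytree contains at most one path between any two nodes, the conditions of Definition 2.32 can be checked directly on that path. The sketch below is an illustrative Python implementation specialized to polytrees (our own; the three-node chain and collider examples are hypothetical and not taken from Figure 2.1).

```python
import collections

def d_separated(edges, i, j, K):
    """d-separation test specialized to polytrees (unique skeleton path).
    edges: set of directed pairs (a, b) meaning a -> b."""
    adj = collections.defaultdict(set)
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    # unique undirected path from i to j (BFS on the skeleton)
    parent = {i: None}
    q = collections.deque([i])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u; q.append(v)
    if j not in parent:
        return True                      # different trees: trivially separated
    path = [j]
    while path[-1] != i:
        path.append(parent[path[-1]])
    path.reverse()
    def desc(x):                         # directed descendants of x
        out, stack = set(), [x]
        while stack:
            u = stack.pop()
            for a, b in edges:
                if a == u and b not in out:
                    out.add(b); stack.append(b)
        return out
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in edges and (b, m) in edges
        if collider:
            if m not in K and not (desc(m) & set(K)):
                return True              # inverted fork blocks the path
        elif m in K:
            return True                  # chain/fork in K blocks the path
    return False

chain = {(1, 2), (2, 3)}        # 1 -> 2 -> 3
collider = {(1, 2), (3, 2)}     # 1 -> 2 <- 3
```

On the chain, conditioning on the middle node separates the endpoints; on the collider, it is the other way around.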
Note that d-separation is a notion defined for general DAGs; however, in graphs with polyforest structures the conditions are easier to check, since there exists at most one path between any pair of nodes. For example, it is immediate to check that in the polyforest of Figure 2.1 (a) we have dsep⟨y_2, {y_1, y_4}, y_7⟩_{F~} but not dsep⟨y_2, {y_4}, y_3⟩_{F~}. The following result states that the log-coherence distance of Equation (2.3) is additive along the paths of an LDRT [3].
Proposition 2.33. Let T = (H(z), u) be an LDRT with associated tree graph T~ = (N, E~). The log-coherence distance d_L(y_i, y_j), for all y_i, y_j ∈ N, is additive along the paths of T~.
Proof. The proof is in Appendix A.3.
Finally, we provide the following important characterization which states that in an LDPF the log-coherence distance of two related nodes is finite and non-zero [3].
Proposition 2.34. Let F = (H(z), u) be a topologically identifiable LDPF with associated graph F~ = (N, E~). Let y_i, y_j be distinct nodes in N. There exists a path from y_i to y_j with no inverted fork (i.e., y_i and y_j are related) if and only if 0 < d_L(y_i, y_j) < ∞.
Proof. The proof is in Appendix A.4.
2.4 Problem Statement
After reviewing the necessary background and stating the assumptions, we now give the formal statement of the problems tackled in this dissertation. The first contribution considers an LDPF with output processes {y_i}_{i=1}^{n}, assuming that only the (cross-)spectral densities Φ_{y_i y_j}(e^{iω}) of a subset V of the processes (signals) are known. We want to determine whether there exists an edge linking any two processes y_i and y_j, and also to find the orientation of the recovered links by extracting available features from the data or exploiting some a priori knowledge [3]. Notably, one of the important features of the method developed for this problem is that it can be applied to networks of random variables with virtually no modifications.
As another contribution of this dissertation, we propose a solution to the following problem. Assume a semi-graphoid defined over the visible and hidden variables V ∪ L, and let the latent polytree ~P` = (V, L, E~) be faithful to this semi-graphoid. The goal is to recover the pattern of ~P` only from the information obtained by observing the visible nodes. Our proposed approach makes use only of the conditional independence relations of the form I(y_i, ∅, y_j) or ¬I(y_i, ∅, y_j), and I(y_i, y_k, y_j) or ¬I(y_i, y_k, y_j), for all the visible nodes y_i, y_j, y_k ∈ V. In other words, this method makes use only of the third-order statistics of the observed nodes to recover the pattern of the latent polytree [4]. A further contribution of this dissertation is an algorithm for approximating a general network with a simpler structure, namely a polytree network, when only observations of the node processes are given. This approximation follows a form of Occam's razor principle, since the approximating polytree is optimal in the sense that it has fewer edges than the original graph. It is worth mentioning that this approximation method exploits the assumption of linearity and assumes that all the nodes in the system are observable [1].
Chapter 3
Learning Linear Networks with Tree Structures
In this chapter, we first recall an algorithm from the literature that learns the skeleton of a rooted tree when some nodes in the network are not measurable. We then show its limitations in learning the skeleton of polyforests. In the following sections, we present a new algorithm capable of learning the skeleton of polyforest networks, along with an algorithm that recovers the orientation of some of the links in the recovered skeleton. Finally, we provide the fundamental limitations for solving the problem of learning polyforest networks in the presence of hidden nodes.
3.1 Reconstruction of Rooted Trees with Hidden Nodes via An Additive Metric
Recursive Grouping Algorithm (RGA) is an enabling and computationally efficient result for identification of an undirected tree structure in the presence of unobservable nodes [53]. RGA achieves this task so long as every hidden node in the graph has degree greater than or equal to 3 and a distance is defined among the nodes such that it is additive along every path of the tree graph. The only information required by RGA is the distance between each pair of visible nodes [2,3]. Also, note that RGA can only be applied to networks of random variables.
The following Algorithm 1 is an equivalent but simplified version of RGA, reported here for completeness and illustrative purposes only.
Algorithm 1 Simplified Recursive Grouping Algorithm
Input: V and the distances d(y_i, y_j) for y_i, y_j ∈ V
Output: T = (V ∪ L, E) and the distances d(y_i, y_j) for y_i, y_j ∈ V ∪ L

1: Define d_ij := d(y_i, y_j) for y_i, y_j ∈ V
2: Initialize Y := V, L := ∅, E := ∅ and c := 0
3: Define Φ_ijk := d_ik − d_jk for distinct y_i, y_j, y_k ∈ Y
4: for each pair of distinct nodes y_i, y_j ∈ Y do
5:   if Φ_ijk = d_ij for all y_k ∈ Y \ {y_i, y_j} then
6:     y_i is a leaf and y_j is its parent
7:     set E := E ∪ {{y_i, y_j}} and Y := Y \ {y_i}
8:   end if
9:   if Φ_ijk = −d_ij for all y_k ∈ Y \ {y_i, y_j} then
10:    y_j is a leaf and y_i is its parent
11:    set E := E ∪ {{y_i, y_j}} and Y := Y \ {y_j}
12:  end if
13:  if −d_ij < Φ_ijk = Φ_ijk′ < d_ij for all y_k, y_k′ ∈ Y \ {y_i, y_j} then
14:    y_i and y_j are leaves and they are siblings
15:    compute q_k = (d_jk + d_ik − d_ij)/2
16:    if q_k ≠ 0 for all y_k ∈ Y \ {y_i, y_j} then
17:      consider y_{h_c} as a new hidden node
18:      set Y := Y \ {y_i, y_j}, L := L ∪ {y_{h_c}} and E := E ∪ {{y_i, y_{h_c}}, {y_j, y_{h_c}}}
19:      add the distances d_{h_c k} := q_k for y_k ∈ Y, d_{h_c i} := (d_ij + Φ_ijk)/2 and d_{h_c j} := (d_ij − Φ_ijk)/2
20:      set c := c + 1
21:    end if
22:    if q_k = 0 for some y_k ∈ Y \ {y_i, y_j} then
23:      y_k is the parent of the leaves y_i and y_j
24:      set Y := Y \ {y_i, y_j} and E := E ∪ {{y_i, y_k}, {y_j, y_k}}
25:    end if
26:  end if
27: end for
28: Define d(y_i, y_j) := d_ij for y_i, y_j ∈ V ∪ L
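The core of RGA is the classification of a pair (y_i, y_j) through the statistic Φ_ijk (lines 5, 9 and 13 of Algorithm 1). A minimal Python sketch of this test, applied to a hypothetical star tree with exact additive distances (our illustration; the function and tree are not part of the original algorithm statement), is the following:

```python
def classify_pair(d, i, j, others, tol=1e-9):
    """RGA's core test: classify the pair (i, j) from Phi_ijk = d(i,k) - d(j,k),
    assuming a path-additive distance d (nested dict)."""
    phis = [d[i][k] - d[j][k] for k in others]
    dij = d[i][j]
    if all(abs(p - dij) < tol for p in phis):
        return 'j_parent_of_i'      # Phi_ijk = d_ij for every witness k
    if all(abs(p + dij) < tol for p in phis):
        return 'i_parent_of_j'      # Phi_ijk = -d_ij for every witness k
    if all(abs(p - phis[0]) < tol for p in phis) and -dij < phis[0] < dij:
        return 'siblings'           # constant Phi strictly inside (-d_ij, d_ij)
    return 'neither'

# hypothetical star tree: node 0 is the parent of leaves 1, 2, 3 (unit edges)
d = {a: {b: (0.0 if a == b else (1.0 if 0 in (a, b) else 2.0))
         for b in range(4)} for a in range(4)}
```

With exact distances the leaf/parent and sibling cases are recovered; with estimated distances a tolerance is needed, as discussed in Section 3.3.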
In order to show the limitations of RGA when multiple roots are present, we differentiate between two types of identifiable hidden nodes [2,3].
Definition 3.1 (RGA-detectable and FD-detectable hidden nodes). Let F~` = (V, L, E~) be a latent polyforest. A hidden node y_h ∈ L is

• RGA-detectable if deg⁺_{F~`}(y_h) ≥ 2 and deg_{F~`}(y_h) ≥ 3,

• Finite-Distance detectable (FD-detectable) if deg⁺_{F~`}(y_h) = 2, deg⁻_{F~`}(y_h) = 0, deg⁻_{F~`}(y_{c1}) ≥ 2 and deg⁻_{F~`}(y_{c2}) ≥ 2, where ch_{F~`}(y_h) = {y_{c1}, y_{c2}}.
Observe that these two types of hidden nodes are not exhaustive; in Section 3.4 we show that it is not possible to detect any other type of hidden node under the specified assumptions [2, 3]. In the case of an LDPF F~, we have shown that the log-coherence distance d_L(y_i, y_j) is additive only along the paths of each rooted tree of F~ (see Proposition 2.33). Thus, if F~ has a unique root, namely if F~ is a rooted tree, RGA can be applied to reconstruct its skeleton. The following theorem formalizes this idea [2, 3].
Theorem 3.2. Let T~ = (V, L, E~) be the associated graph of a topologically identifiable latent LDRT. Let all hidden nodes y_h ∈ L be RGA-detectable, and let the set of visible nodes V and the distances d_L(y_i, y_j) for y_i, y_j ∈ V be the input of RGA. Then, the output of RGA is the skeleton of T~.
Proof. The proof is in Appendix B.1.
However, Proposition 2.33 does not hold for LDPTs or LDPFs, which are more general classes of networks [68]. As an example, consider the associated graph of an LDPT with only 3 nodes, as depicted in Figure 3.1. From Proposition 2.34, we have d_L(y_1, y_2) = ∞. On the other hand, if the additive property held, we would have d_L(y_1, y_2) = d_L(y_1, y_3) + d_L(y_3, y_2), which would imply d_L(y_1, y_2) < ∞ because d_L(y_1, y_3) and d_L(y_3, y_2) are positive finite values. Thus, this example illustrates that the additive property of the distance d_L(y_i, y_j) does not extend to trees with multiple roots [3]. A possible idea would be, given the visible nodes of a polyforest, to find a way to determine all the visible nodes that belong to the same rooted tree T~`. The additivity property of the distance would then be satisfied on the nodes of each rooted tree T~`, allowing RGA to be applied. However, even in this
Figure 3.1: The graph associated with a sample LDPT.
case RGA would fail to correctly reconstruct the skeleton of T~` in the presence of FD-detectable hidden nodes. More formally, the following proposition proves that when there exists at least one FD-detectable hidden node in a polyforest, RGA fails to correctly reconstruct the skeleton of its rooted trees [3].
Proposition 3.3. Let T~` = (V_T, L_T, E~_T) be the associated graph of a rooted tree of a topologically identifiable LDPF with associated graph F~`, where there exists y_h ∈ L_T that is FD-detectable and all the other hidden nodes are RGA-detectable. The output of RGA applied to the distances d_L(y_i, y_j) for all y_i, y_j ∈ V_T is the tree T_X = (V_X ∪ L_X, E_X) with V_X = V_T, L_X = L_T \ {y_h}, and

E_X = ( {{y_i, y_j} | (y_i, y_j) ∈ E~_T} ∪ {{y_{c1}, y_{c2}}} ) \ {{y_h, y_{c1}}, {y_h, y_{c2}}}

where ch_{F~`}(y_h) = {y_{c1}, y_{c2}}.
Proof. The proof is in Appendix B.2.
We illustrate this result with an example. Consider the associated graph of an LDPT system containing both FD-detectable and RGA-detectable hidden nodes, as shown in Figure 3.2 (a). Figure 3.2 (b) shows the visible descendants of the FD-detectable hidden node y_{h2}, and Figure 3.2 (c) illustrates the output of RGA when applied to these visible descendants. Note that y_{h1} is RGA-detectable and RGA correctly identifies this hidden node; however, the FD-detectable hidden node y_{h2} is not detected by RGA [3]. In the next section, we propose an algorithm capable of identifying both RGA-detectable and FD-detectable hidden nodes.
Figure 3.2: A polyforest with one FD-detectable hidden node, y_{h2}, and one RGA-detectable hidden node, y_{h1} (a); the set of visible descendants of y_{h2} (b); and the output of the application of RGA (c). RGA detects the RGA-detectable hidden node, y_{h1}, but incorrectly connects the children of the FD-detectable hidden node, y_{h2}, as shown in Proposition 3.3 [3].
30 3.2 An Algorithm to Learn Latent Polyforest Networks
The methods presented here, as opposed to RGA, will be shown to be capable of identifying the structure of a polyforest network of dynamic systems when each hidden node is either RGA-detectable or FD-detectable. This motivates the following definition [3].
Definition 3.4 (Structural identifiability). A latent polyforest F~ ` = (V, L, E~ ) is structurally
identifiable if every hidden node yh ∈ L is either RGA-detectable or FD-detectable. By extension, a latent LDPF is structurally identifiable if its associated graph is structurally identifiable.
Given the power spectral and cross-spectral densities of the visible nodes V of a latent LDPF with associated graph F~` = (V, L, E~), in order to learn the structure of F~` we follow four main steps [3]:
A. Obtain the lists of visible nodes corresponding to each rooted tree in F~ `;
B. For each list obtained at Step A, identify the skeleton of the rooted tree;
C. Merge the subgraphs obtained at Step B, considering the potential presence of overlap between rooted trees in the original polyforest;
D. Identify the link orientations of the skeleton.
Figure 3.3 (True) illustrates an example of a polytree graph associated with an LDPT, and the four steps above are shown in Figures 3.3 (Step A)-(Step D). We discuss these steps in detail in the following subsections [3].
3.2.1 Step A. Obtain the Visible Descendants of Each Root
In this section we introduce the Pairwise-Finite Distance Algorithm (PFDA), presented in Algorithm 2, which outputs the sets of visible descendants of each root in a polyforest. Hypothetically, if the structure of the polyforest to be reconstructed were known a priori (including its hidden nodes), it would be trivial to identify all the roots and their visible descendants. However, since the goal is to infer the structure of the polyforest only from the knowledge of the observable processes, we need an algorithm that requires neither knowledge of the structure nor any information
31 1 2 3 1 2 3 1 2 3 1 2 3 1 4 6 7 h1 h2 h1 h3 h5 h1 h3 h1 h3 2 5 6 7 4 h3 5 4 h2 h4 5 h6 5 4 h2 5 4 h2 5 3 5 6 7 6 7 6 7 6 7 6 7 6 7 6 7 (True) (Step A) (Step B) (Step C) (Step D)
Figure 3.3: The polytree graph associated with an LDPT (True), output of Step A is the set of lists of the visible nodes in each rooted tree (Step A), output of Step B is the skeleton of each rooted tree (Step B), output of Step C is the skeleton of the polytree (Step C), and output of Step D is the partially oriented polytree (Step D) [3].
about the hidden nodes (some of which could even be roots). PFDA takes an ordered list of visible nodes V and their distances as input. Then, for each pair of distinct nodes y_i, y_j ∈ V such that d(y_i, y_j) < ∞, it initializes an unordered list S_{i,j} with {y_i, y_j} and proceeds by adding elements to the list so long as the added element has a finite distance to all the elements already in S_{i,j}. The output of PFDA is given by all the distinct lists S_{i,j}, for i ≠ j, where each list represents the observable nodes of a rooted tree of the polyforest [3].
Algorithm 2 Pairwise-Finite Distance Algorithm
Input: the ordered set of nodes V = {y_1, ..., y_n} and the distances d(y_i, y_j) for y_i, y_j ∈ V
Output: the set of all non-eliminated lists S_{i,j}

1: for every node y_i ∈ V such that ∀y_j ∈ V \ {y_i} we have d(y_i, y_j) = ∞ do
2:   define S_{i,0} := {y_i}
3: end for
4: for each pair y_i, y_j ∈ V with i < j and d(y_i, y_j) < ∞ do
5:   define S_{i,j} := {y_i, y_j}
6:   for each y_k ∈ V \ S_{i,j} do
7:     if ∀y ∈ S_{i,j} : d(y_k, y) < ∞, then add y_k to S_{i,j}
8:   end for
9: end for
10: for each pair y_i, y_j ∈ V with i < j do
11:   if S_{i,j} = S_{k,ℓ} for some k and ℓ, then eliminate S_{k,ℓ}
12: end for
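A direct Python transcription of Algorithm 2 can be sketched as follows (our own illustration; the three-node latent polytree used in the demonstration, with two roots sharing a common child, is hypothetical):

```python
import itertools

INF = float('inf')

def pfda(nodes, d):
    """Pairwise-Finite Distance Algorithm (Algorithm 2): group the visible
    nodes into one list per root using only finiteness of the distance d."""
    lists = []
    for i in nodes:                      # isolated nodes form singleton lists
        if all(d[i][j] == INF for j in nodes if j != i):
            lists.append({i})
    for i, j in itertools.combinations(nodes, 2):
        if d[i][j] < INF:
            S = {i, j}
            for k in nodes:
                if k not in S and all(d[k][y] < INF for y in S):
                    S.add(k)
            if S not in lists:           # keep only the distinct lists
                lists.append(S)
    return lists

# demo: hypothetical latent polytree y1 -> y3 <- y2 (two roots, y1 and y2)
d = {a: {b: INF for b in (1, 2, 3)} for a in (1, 2, 3)}
for a in (1, 2, 3):
    d[a][a] = 0.0
d[1][3] = d[3][1] = 1.0
d[2][3] = d[3][2] = 1.0
groups = pfda([1, 2, 3], d)
```

The two returned lists are exactly the visible descendants of the two roots, as predicted by Theorem 3.6.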
It is straightforward to conclude that the time complexity of PFDA is upper-bounded by a quartic polynomial in the number of visible nodes in the worst case. Observe that PFDA requires an ordering of V, and thus its output, in general, might depend on that ordering. A first enabling result is that, irrespective of the ordering on V, every list returned by PFDA corresponds to the visible descendants of a root in the polyforest [3].
Theorem 3.5. Let F~` = (V, L, E~) be the associated graph of a latent LDPF and define an arbitrary ordering on V = {y_1, ..., y_n}. Let d(y_i, y_j) be a distance defined on V such that d(y_i, y_j) < ∞ if and only if y_i and y_j are related. Then, for every list S in the output of PFDA applied to V with the distance d(·, ·), there exists a root node y_r ∈ V ∪ L such that S = de_{F~`}(y_r) ∩ V.
Proof. The proof is in Appendix B.3.
The converse of Theorem 3.5 does not hold in general unless we add the assumption of structural identifiability; again, the result is irrespective of the ordering on V [3].
Theorem 3.6. Let F~` = (V, L, E~) be the associated graph of a latent LDPF and define an arbitrary ordering on V = {y_1, ..., y_n}. Let d(y_i, y_j) be a distance defined on V such that d(y_i, y_j) < ∞ if and only if y_i and y_j are related. If F~` is structurally identifiable, then PFDA applied to V with the distance d(·, ·) outputs the sets de_{F~`}(y_r) ∩ V for all distinct root nodes y_r ∈ V ∪ L.
Proof. The proof is in Appendix B.4.

Observe that, from Proposition 2.34, the log-coherence distance satisfies d_L(y_i, y_j) < ∞ if and only if y_i and y_j are related, giving the following corollary [3].
Corollary 3.7. Let F~` = (V, L, E~) be the associated graph of a latent LDPF. If F~` is structurally identifiable, then PFDA applied to V and the distances d_L(·, ·) outputs the sets de_{F~`}(y_{r_i}) ∩ V for all distinct root nodes y_{r_i} ∈ V ∪ L.
3.2.2 Step B. Learn the Structure of Each Rooted Tree
In the previous section we showed that PFDA finds the lists of nodes corresponding to the visible descendants of each root in a structurally identifiable LDPF. Since the distance in Equation (2.3) is additive along the paths of a rooted tree, the next step is to apply RGA to each of these lists. RGA reconstructs each individual rooted tree correctly if every hidden node is guaranteed to have degree greater than or equal to 3 in its rooted tree (i.e., every hidden node is RGA-detectable). However, RGA fails to identify the presence of FD-detectable hidden nodes, as shown in Figure 3.2.
Nonetheless, we propose an improvement on RGA, the Hidden Node Detection Algorithm (HNDA), presented in Algorithm 3, which also identifies FD-detectable nodes [3].
Algorithm 3 Hidden Node Detection Algorithm
Input: the distances d(y_i, y_j) for y_i, y_j ∈ V and a list of nodes V_T ⊆ V
Output: (N_T, E_T) and the distances d(y_i, y_j) for y_i, y_j ∈ N_T

1: Apply RGA to V_T and d(y_i, y_j) for y_i, y_j ∈ V_T
2: Let (N_T, E_T) and d(y_i, y_j) for y_i, y_j ∈ N_T be the output of Step 1
3: for each edge {y_i, y_j} ∈ E_T do
4:   if ∃ y_ℓ, y_k ∈ V \ N_T such that d(y_i, y_k) = ∞, d(y_j, y_ℓ) = ∞, d(y_i, y_ℓ) < ∞, d(y_j, y_k) < ∞ then
5:     set N_T := N_T ∪ {y_h}
6:     set E_T := (E_T \ {{y_i, y_j}}) ∪ {{y_i, y_h}, {y_j, y_h}}
7:     set d(y_h, y_m) := c for y_m ∈ N_T, where c < ∞ and y_h ≠ y_m, and set d(y_h, y_h) = 0
8:   end if
9: end for
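The test on line 4 is the heart of HNDA. The sketch below isolates it (our own illustration; the hypothetical polyforest h → y_1, h → y_2 with extra parents y_3 → y_1 and y_4 → y_2 makes h FD-detectable per Definition 3.1):

```python
import itertools

INF = float('inf')

def fd_hidden_between(d, i, j, outside):
    """HNDA's test (Algorithm 3, line 4): an FD-detectable hidden node lies
    between the adjacent nodes i and j if two outside nodes k, l are each
    related to exactly one of them, in opposite ways."""
    for k, l in itertools.permutations(outside, 2):
        if d[i][k] == INF and d[j][l] == INF and d[i][l] < INF and d[j][k] < INF:
            return True
    return False

# demo: hidden root h with children 1 and 2; extra parents 3 -> 1 and 4 -> 2.
# RGA would join 1 and 2 directly, although h sits between them.
nodes = (1, 2, 3, 4)
d = {a: {b: INF for b in nodes} for a in nodes}
for a in nodes:
    d[a][a] = 0.0
for a, b in [(1, 2), (1, 3), (2, 4)]:    # the only related visible pairs
    d[a][b] = d[b][a] = 1.0
```

The test fires for the edge {1, 2} (witnesses k = 4 and l = 3), so the edge is split by inserting a new hidden node, exactly as in lines 5-7 of Algorithm 3.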
It is straightforward to observe that the time complexity of HNDA is cubic in the number of visible nodes in the worst case scenario. The following theorem proves that HNDA is capable of correctly identifying all the hidden nodes in a structurally identifiable polyforest network [3].
Theorem 3.8. Let F~` = (V, L, E~) be the associated graph of a topologically and structurally identifiable LDPF. Let V_T be the set of visible nodes of a rooted tree T~ in F~`. Then, HNDA applied to V_T and the distances d_L(·, ·) outputs the skeleton of T~.
Proof. The proof is in Appendix B.5.
3.2.3 Step C. Merge the Rooted Trees into the Polyforest Skeleton
After all rooted trees have been reconstructed, the next step is to merge them into a single polyforest structure. The main challenge is that a hidden node identified by HNDA in one rooted tree might coincide with a hidden node identified in another rooted tree. For example, in Figure 3.3 (Step B), nodes y_{h2}, y_{h4} and y_{h6} are the same hidden node. The following proposition provides a full characterization to identify whether a hidden node y_{h_i} in rooted tree T~_i is the same as a hidden node y_{h_j} in rooted tree T~_j [3].
Proposition 3.9. Let F = (H(z), u) be a structurally identifiable latent LDPF with associated graph F~` = (V, L, E~) and let y_{h_i}, y_{h_j} ∈ L be in the rooted trees T~_i and T~_j, respectively. We have y_{h_i} = y_{h_j} if and only if there exist two observable nodes y_u, y_w ∈ V, both present in T~_i and T~_j, such that the unique path from y_u to y_w in T~_i is

y_u − y_{π_1}^{(i)} − ... − y_{π_{k−1}}^{(i)} − y_{h_i} − y_{π_{k+1}}^{(i)} − ... − y_{π_ℓ}^{(i)} − y_w

and the unique path from y_u to y_w in T~_j is

y_u − y_{π_1}^{(j)} − ... − y_{π_{k−1}}^{(j)} − y_{h_j} − y_{π_{k+1}}^{(j)} − ... − y_{π_ℓ}^{(j)} − y_w.
Proof. The proof is in Appendix B.6.
Observe that the two paths from y_u to y_w exist and are unique since T~_i and T~_j are trees [3]. The Polyforest Skeleton Learning Algorithm (PSLA), presented in Algorithm 4, uses the characterization of Proposition 3.9 to learn the skeleton of a structurally identifiable LDPF [3].
Algorithm 4 Polyforest Skeleton Learning Algorithm
Input: the ordered set of nodes V = {y_1, ..., y_n} and the distances d(y_i, y_j) for y_i, y_j ∈ V
Output: F = (N, E) and the distances d(y_i, y_j) for y_i, y_j ∈ N

1: Apply PFDA to V and d(y_i, y_j), and obtain the lists of nodes S_k
2: for every list S_k do
3:   apply HNDA to S_k and d(y_i, y_j) for y_i, y_j ∈ V, and obtain the tree T_k = (N_k, E_k)
4: end for
5: for every pair of distinct trees T_i and T_j do
6:   for every pair of distinct u, w ∈ N_i ∩ N_j ∩ V do
7:     compute the path from u to w in T_i and define it as p_i := (u, q_1^{(i)}, q_2^{(i)}, ..., q_m^{(i)}, w) where q_a^{(i)} ∈ N_i for a = 1, ..., m
8:     compute the path from u to w in T_j and define it as p_j := (u, q_1^{(j)}, q_2^{(j)}, ..., q_n^{(j)}, w) where q_a^{(j)} ∈ N_j for a = 1, ..., n
9:     if q_a^{(i)} = y_{h_k} and q_a^{(j)} = y_{h_ℓ} for some a, with y_{h_k} ∈ N_i \ S_i and y_{h_ℓ} ∈ N_j \ S_j then
10:      label y_{h_k} and y_{h_ℓ} identically in N_i and N_j
11:      modify the edges in E_i and E_j so that they match the new labeling of y_{h_k} and y_{h_ℓ}
12:    end if
13:  end for
14: end for
15: Define E := ∪_i E_i and N := ∪_i N_i
16: for every pair of nodes y_i, y_j ∈ N do
17:   if there is no N_k such that y_i, y_j ∈ N_k, then set d(y_i, y_j) = ∞
18: end for
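The path-matching condition of Proposition 3.9 (lines 6-9 of Algorithm 4) can be sketched as follows (our own illustration; the trees, given as adjacency maps, and node labels are hypothetical):

```python
import collections

def tree_path(adj, u, w):
    """Unique path between u and w in a tree given as an adjacency map."""
    parent = {u: None}
    q = collections.deque([u])
    while q:
        a = q.popleft()
        for b in adj[a]:
            if b not in parent:
                parent[b] = a; q.append(b)
    path = [w]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]

def same_hidden_node(adj_i, adj_j, hi, hj, visible):
    """Proposition 3.9 check: hi (in tree i) and hj (in tree j) coincide if
    some pair of shared visible nodes u, w has hi and hj in the same
    position along the unique u-w paths of the two trees."""
    shared = [v for v in visible if v in adj_i and v in adj_j]
    for a in range(len(shared)):
        for b in range(a + 1, len(shared)):
            u, w = shared[a], shared[b]
            pi, pj = tree_path(adj_i, u, w), tree_path(adj_j, u, w)
            if len(pi) == len(pj) and hi in pi and pj[pi.index(hi)] == hj:
                return True
    return False

# demo: a hidden node between visible nodes 1 and 2 in both trees
adj_i = {1: {'hi'}, 'hi': {1, 2}, 2: {'hi'}}
adj_j = {1: {'hj'}, 'hj': {1, 2}, 2: {'hj'}}
adj_k = {1: {2}, 2: {1}}    # a tree where 1 and 2 are adjacent instead
```

When the check succeeds, the two hidden nodes are relabeled identically before the union of the edge sets (lines 10-15).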
Observe that in the worst case, PSLA has quartic time complexity in the number of visible nodes in the network. The output of PSLA is a polyforest F = (N, E) where all the hidden nodes, both RGA-detectable and FD-detectable, have been identified and the set of obtained edges is undirected (i.e., the output of PSLA is the skeleton of the polyforest with all the hidden nodes identified). This result is proven in the following theorem. Furthermore, we can prove that the distances computed by PSLA distinguish between pairs of nodes at finite and at infinite distance [3].
Theorem 3.10. Let F = (H(z), u) be a topologically and structurally identifiable latent LDPF with associated graph F~` = (V, L, E~). PSLA applied to V and the distances d_L(y_i, y_j) for y_i, y_j ∈ V outputs the skeleton of F~` and identifies the pairs of nodes in the skeleton of F~` that are at infinite distance from each other.
Proof. The proof is in Appendix B.7.
3.2.4 Step D. Identify the Link Orientations
As explained in the previous steps, PSLA outputs the skeleton of the polyforest from the knowledge of the log-coherence distance. In general, some a priori knowledge about the orientations might be available (e.g., certain edges could physically admit an orientation in only one direction). We propose a result that can be used to determine the direction of links in an LDPF by extracting available features from the data or exploiting such a priori knowledge. The following lemma infers the direction of the two edges {y_i, y_k} and {y_k, y_j} in the identified skeleton when d_L(y_i, y_j) = ∞.
Lemma 3.11. Let F~ = (N, E~) be a polyforest and let y_i, y_j, y_k ∈ N. If y_k is the only node on the path from y_i to y_j, and d_L(y_i, y_j) = ∞, then the link orientation on this path can be fully identified as y_i → y_k ← y_j.
Proof. The proof is in Appendix B.8.
If, instead, the orientation of the edge (y_i, y_k) is known a priori, the following lemma infers the orientation of all edges of the form {y_k, y_j} in the identified skeleton.
Lemma 3.12. Let F~ = (N, E~) be a polyforest and let y_i, y_j, y_k ∈ N. If y_k is the only node on the path between y_i and y_j, and (y_i, y_k) ∈ E~, then d_L(y_i, y_j) < ∞ if and only if (y_k, y_j) ∈ E~.
Proof. The proof is in Appendix B.9.
Using these two lemmas, we introduce the Link Orientation Identification Algorithm (LOIA), presented in Algorithm 5, to orient the links in the skeleton of a polyforest.
Algorithm 5 Link Orientation Identification Algorithm
Input: a partially directed polyforest F̄ = (N, E, E~) and the distances d(y_i, y_j) for y_i, y_j ∈ N
Output: the partially directed polyforest F̄ = (N, E, E~)

1: Set E~_t := E~
2: for each (y_i, y_j) ∈ E~_t do
3:   for each {y_j, y_k} ∈ E do
4:     set E := E \ {{y_j, y_k}}
5:     if d(y_i, y_k) = ∞ then
6:       set E~ := E~ ∪ {(y_k, y_j)}
7:     end if
8:     if d(y_i, y_k) < ∞ then
9:       set (N_o, E_o, E~_o) as the output of LOIA applied to (N, E, {(y_j, y_k)}) and d(·, ·)
10:      set E~ := E~ ∪ {(y_j, y_k)} ∪ E~_o and E := E_o
11:    end if
12:  end for
13: end for
Every time LOIA is called recursively, it either orients one additional edge or it exits. Since the maximum number of edges that can be oriented is linear in the number of nodes for a tree, we conclude that the time complexity of LOIA is linear in the worst case scenario. Now we introduce the following theorem to show which edges are oriented after applying LOIA.
Theorem 3.13. Let F~ = (N, E~) be the associated graph of an LDPF and let F̄ = (N, E′, E~′) be a partially directed polyforest with the same skeleton as F~ and E~′ ⊆ E~. Let d_L(y_i, y_j) be the distances for y_i, y_j ∈ N. If (y_i, y_j) ∈ E~′, then all edges {y_k, y_ℓ} ∈ E′ with y_k, y_ℓ ∈ de_{F~}(y_j) ∪ pa_{F~}(de_{F~}(y_j)) will be oriented by LOIA applied to F̄ and the distances d_L(·, ·).
Proof. The proof is in Appendix B.10.
Using Theorem 3.13, if the orientation of some links is known a priori or if Lemma 3.11 can be applied, then we can initialize E~′ with these edges and propagate the direction of the links involving the nodes in de_{F~}(y_j) ∪ pa_{F~}(de_{F~}(y_j)). Note that LOIA is a generalization of the Generating Polytree (GPT) recovery algorithm presented in [68]: if there is some a priori information about the link orientations, such as knowledge of the strict causality of the transfer functions, LOIA is able to find the direction of all the edges mentioned in Theorem 3.13.
3.2.5 Putting It All Together
In this section, we combine the results developed in the previous sections and present the Polyforest Learning Algorithm (PLA) in Algorithm 6.
Algorithm 6 Polyforest Learning Algorithm
Input: a set of nodes V and the distances d(y_i, y_j) for y_i, y_j ∈ V
Output: the partially directed polyforest F̄ = (N, E, E~)

1: Set F = (N, E) to the output of PSLA applied to V and d(·, ·)
2: Initialize E~ with any a priori knowledge about the link orientations and remove the corresponding edges from E
3: Set F̄ = (N, E, E~) to the output of LOIA applied to F̄ = (N, E, E~) and d(·, ·)
Note that PLA can be applied to any type of network so long as we can find a metric that is additive along the paths of each rooted tree. For example, [53] provides two such metrics. The following metric is used in the case of Gaussian random variables
d(y_i, y_j) = − \log |ρ_{ij}|    (3.1)
where ρ_{ij} is the correlation coefficient between the two random variables y_i and y_j, and the following metric is used in the case of discrete random variables
d(y_i, y_j) = − \log ( |det J^{ij}| / \sqrt{det M^i det M^j} )    (3.2)

where J^{ij} is the joint probability matrix between y_i and y_j, and M^i is the diagonal matrix of the marginal probabilities of y_i.
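As a quick illustration of why the Gaussian metric of Equation (3.1) is additive, consider a Markov chain x → y → z: there ρ_{xz} = ρ_{xy} ρ_{yz}, so the negative log-correlations add along the path. The sketch below checks this empirically on synthetic data (our own example; the coefficients are hypothetical).

```python
import numpy as np

def corr_distance(x, y):
    """Gaussian additive metric of Equation (3.1): d = -log |rho_xy|."""
    rho = np.corrcoef(x, y)[0, 1]
    return -np.log(abs(rho))

# Markov chain x -> y -> z: rho_xz = rho_xy * rho_yz, so distances add
rng = np.random.default_rng(1)
n = 200_000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)   # rho_xy = 0.8, unit variance
z = 0.8 * y + 0.6 * rng.standard_normal(n)   # rho_yz = 0.8, unit variance
gap = abs(corr_distance(x, z) - corr_distance(x, y) - corr_distance(y, z))
```

Up to sampling error, d(x, z) equals d(x, y) + d(y, z), as required by Definition 2.31.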
38 3.3 Numerical Example
In this section we provide an example demonstrating how PSLA performs when applied to data in order to learn the skeleton of a network. We consider an LDPF with the associated graph of Figure 3.4. Observe that the hidden nodes constitute a significant fraction of the total number of nodes, making the learning process relatively challenging. These unobservable nodes (i.e., y_{h1} := y_{11}, y_{h2} := y_{12}, y_{h3} := y_{13} and y_{h4} := y_{14}) are illustrated with dotted lines. In our simulations, we randomly select transfer functions for the links of the following form
Hij(z) = c0 + c1 z^{−1} + c2 z^{−2} + c3 z^{−3}    (3.3)

where c0 = 1, ci = ai c_{i−1} for i = 1, 2, 3, and ai are independent random variables uniformly distributed in [−1/2, 1/2]. For this network, we generate time series of different lengths using independent identically distributed Gaussian random processes for the inputs ui where i = 1, ..., 14.
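A minimal sketch of how the link coefficients in (3.3) could be drawn (the function name and the seeded generator are illustrative, not from the dissertation):

```python
import random

def random_link_coefficients(rng):
    # Coefficients of H_ij(z) = c0 + c1 z^-1 + c2 z^-2 + c3 z^-3 in (3.3):
    # c0 = 1 and c_i = a_i * c_{i-1} with a_i ~ Uniform[-1/2, 1/2],
    # so the coefficient magnitudes decay by at least a factor of 2.
    c = [1.0]
    for _ in range(3):
        c.append(rng.uniform(-0.5, 0.5) * c[-1])
    return c

coeffs = random_link_coefficients(random.Random(0))
```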
The variance of ui is set to 10% of the variance of yi. For each time series length, we run 5000 Monte Carlo simulations and compute the log-coherence distance using the off-the-shelf mscohere function of MATLAB, which implements the Fast Fourier Transform (FFT) via Welch's overlapping window method. The window size is chosen as the closest power of 2 to 10% of the time series length [3]. We apply PSLA to the simulated data and, as discussed in previous sections, PSLA makes use of RGA. One step in RGA is the Sibling Grouping test (see Lemma 4 in [53]), which tests, for a pair of nodes yi and yj, whether |dik − djk| is equal to dij or less than dij, where dab := dL(ya, yb). Since the distances are estimated from data, the equality in this test is unlikely to be exactly verified. Therefore, we
Figure 3.4: Associated graph of an LDPF [3]
implement a more robust test checking whether |dik − djk| ≥ (1 − ε) dij or |dik − djk| < (1 − ε) dij, where ε is a relative tolerance. By applying this robust test for ε = 20% and providing PSLA with only the observations of the visible nodes, we obtain the solid curved line of Figure 3.5, in which the error bars represent a 99.99% confidence interval computed using the Wilson score [78]. This figure shows that the success rate for detecting edges approaches 1 when the length of the time series goes to infinity, confirming the theoretical results of this dissertation. Also, Figure 3.6 shows the probability of detecting a wrong link in logarithmic scale [3]. As an additional comparison, we provide our implementation of PSLA with the measurements of all the nodes, including the hidden ones, to test if it realizes that there are no actual hidden nodes in the network. The results of this experiment are plotted by the dashed curves in Figures 3.5 and 3.6. Again, when the number of samples approaches infinity, PSLA asymptotically learns the exact skeleton in accordance with our theoretical results [3]. In general, we expect to achieve better performance in the case where the information about all the network nodes is provided to PSLA as opposed to the case where only the information of a subset of nodes is provided. This is due to the fact that in the latter case, PSLA would have to detect the presence of unobserved nodes. However, in Figure 3.5, this happens only for longer time series. The explanation of this phenomenon is related to the tolerance ε in our implementation. In the case of longer time series, the distances are computed more accurately. Therefore, ε plays a minor role since it can be made arbitrarily small, still obtaining correct results. In the case of
Figure 3.5: Number of samples vs. success rate in reconstruction of the graph of Figure 3.4 via PSLA [3].
Figure 3.6: Number of samples vs. probability of detecting a wrong link in logarithmic scale for the graph of Figure 3.4 via PSLA [3].
shorter time series, the distances are less accurate, requiring a larger tolerance to obtain the correct results. In general, smaller values of ε lead to detecting a larger number of hidden nodes (either correctly or incorrectly). This results in an artifact in the accuracy of the reconstruction for short time series as in Figure 3.5, where the dashed curve counterintuitively has a worse performance than the solid curve. Indeed, such an artifact can be explained in the following way. Since we use the same value of ε for all lengths of the time series, for short time series the tolerance is smaller than its optimal value, pushing PSLA to detect more hidden nodes. Thus, in the case where the information about all the nodes is provided to PSLA, the algorithm is still pushed to detect hidden nodes (even though there are none), severely deteriorating its performance [3].
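The robust Sibling Grouping test described above can be sketched as follows (function name and return labels are illustrative; eps = 0.2 matches the 20% tolerance used in the simulations):

```python
def sibling_grouping_robust(d_ik, d_jk, d_ij, eps=0.2):
    # Robust Sibling Grouping test: with distances estimated from data,
    # the exact check |d_ik - d_jk| = d_ij is replaced by a
    # relative-tolerance check with tolerance eps.
    if abs(d_ik - d_jk) >= (1.0 - eps) * d_ij:
        return "equal"   # treated as |d_ik - d_jk| = d_ij
    return "less"        # treated as |d_ik - d_jk| < d_ij
```

For example, with d_ij = 2.0 the threshold is 1.6, so a gap of 1.95 is classified as "equal" while a gap of 0.5 is classified as "less".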
3.4 Fundamental Limitations
In this section, we show that PSLA attains the fundamental limitations for learning an LDPF. A latent LDPF F⃗ℓ is minimal if there is no other latent LDPF with the same processes as observable nodes and fewer hidden nodes. Thus, if F⃗ℓ is not minimal, then there is another latent LDPF F⃗ℓ^(1) with the same observable nodes and fewer hidden nodes. If F⃗ℓ^(1) is, again, not minimal, then there must exist F⃗ℓ^(2) with the same observable nodes and even fewer hidden nodes. By iterating this statement, it is straightforward to conclude that for every latent LDPF F⃗ℓ, there is always a minimal one, F⃗ℓ^(min), with the same observable nodes [3].
Theorem 3.14 proves that if a latent LDPF is not structurally identifiable, then the LDPF is not minimal. In other words, if a latent LDPF is minimal, it is necessarily structurally identifiable. Thus, previous sections have already shown that every minimal latent LDPF can be consistently reconstructed by PSLA. Instead, if F⃗ℓ is not minimal, then there exists a minimal latent LDPF F⃗ℓ^(min) with the same observable nodes. Since only the outputs of the observable nodes are accessible, there is no procedure capable of distinguishing between F⃗ℓ and F⃗ℓ^(min). Then, PSLA applied to the observable outputs of F⃗ℓ necessarily reconstructs the skeleton of F⃗ℓ^(min). This also shows that the skeletons of all minimal latent LDPFs with the same observable nodes as F⃗ℓ are identical. Theorem 3.14, which enables all of these conclusions, can now be formally proven [3].
Theorem 3.14. If a topologically identifiable latent LDPF F = (H(z), u) with the associated graph F⃗ℓ = (V, L, E⃗) is not structurally identifiable, then there is another topologically identifiable latent LDPF F′ = (H′(z), ε) with the associated graph F⃗′ℓ = (V′, L′, E⃗′) such that V′ = V and L′ ⊂ L.
Proof. If F⃗ℓ = (V, L, E⃗) is not structurally identifiable, then there is a latent node yh that does not satisfy the degree conditions of structural identifiability. Thus, we necessarily have deg⁺_{F⃗ℓ}(yh) < 3. Therefore, we distinguish the following three cases.
1. Case I - deg⁺_{F⃗ℓ}(yh) = 0: Let N = V ∪ L = {y1, y2, ..., yn} be the set of all vertices in F⃗ℓ as illustrated in Figure 3.7 (a). The dynamics of F has the form

yi = Σ_{yp_i ∈ pa_{F⃗ℓ}(yi)} Hip_i(z) yp_i + ui,    yh = Σ_{yp_h ∈ pa_{F⃗ℓ}(yh)} Hhp_h(z) yp_h + uh,    (3.4)
Figure 3.7: Case I - deg⁺_{F⃗ℓ}(yh) = 0: associated graphs of latent LDPFs F (a) and F′ (b); Case II - deg⁺_{F⃗ℓ}(yh) = 1: associated graphs of latent LDPFs F (c) and F′ (d); Case III - deg⁺_{F⃗ℓ}(yh) = 2 and deg⁻_{F⃗ℓ}(yc1) = 1: associated graphs of latent LDPFs F (e) and F′ (f) [3].
for yi ∈ Nh− := N \ {yh}. Now define a new latent LDPF F′ = (H′(z), ε) where

xi := yi,    εi := ui,    H′ij(z) := Hij(z),    (3.5)

for all yi, yj ∈ Nh− with yi ≠ yj, as illustrated in Figure 3.7 (b). Since deg⁺_{F⃗ℓ}(yh) = 0 implies that yh is not a parent of any yi ∈ Nh−, the processes xi satisfy

xi = Σ_{xp_i ∈ pa_{F⃗′ℓ}(xi)} Hip_i(z) xp_i + εi.    (3.6)

Observe that εi ⊥ εj for all yi, yj ∈ Nh− with yi ≠ yj. Therefore, the associated graph of the latent LDPF F′, namely F⃗′ℓ = (V, L \ {yh}, E⃗ \ {(yi, yh)}) with yi ∈ pa_{F⃗ℓ}(yh), is a latent polyforest which has the same observable nodes as F⃗ℓ but one fewer hidden node.
2. Case II - deg⁺_{F⃗ℓ}(yh) = 1: Let N = V ∪ L = {y1, y2, ..., yn} be the set of all vertices in F⃗ℓ as illustrated in Figure 3.7 (c). The dynamics of F has the form

yi = Σ_{yp_i ∈ pa_{F⃗ℓ}(yi)} Hip_i(z) yp_i + ui,    yh = Σ_{yp_h ∈ pa_{F⃗ℓ}(yh)} Hhp_h(z) yp_h + uh,    (3.7)

for yi ∈ Nh− := N \ {yh}. Since deg⁺_{F⃗ℓ}(yh) = 1, let yc be the unique child of yh. The process yc satisfies the following equation

yc = Hch(z) yh + uc,    (3.8)

and using Equation (3.7) we have

yc = Σ_{yp_h ∈ pa_{F⃗ℓ}(yh)} Hch(z) Hhp_h(z) yp_h + Hch(z) uh + uc.    (3.9)

Now define a new latent LDPF F′ = (H′(z), ε), as in Figure 3.7 (d), where

xi := yi,    εi := ui,    εc := Hch(z) uh + uc,
xc := Σ_{xp ∈ pa_{F⃗ℓ}(yh)} H′cp(z) xp + εc,    H′cp(z) := Hch(z) Hhp(z),    H′ij(z) := Hij(z),    (3.10)

for all yi ∈ Nh− \ {yc} and all yj ∈ Nh− with yi ≠ yj. Observe that εi ⊥ εj for all yi, yj ∈ Nh− \ {yc} with yi ≠ yj. Also, εc ⊥ εi for all yi ∈ Nh− \ {yc}. Therefore, the associated graph of the latent LDPF F′, namely F⃗′ℓ = (V, L \ {yh}, E⃗ ∪ {(yi, yc)} \ {(yi, yh), (yh, yc)}) with yi ∈ pa_{F⃗ℓ}(yh), is a latent polyforest which has the same observable nodes as F⃗ℓ but one fewer hidden node.
3. Case III - deg⁺_{F⃗ℓ}(yh) = 2: Since yh does not satisfy the structural identifiability conditions, we have that deg⁻_{F⃗ℓ}(yh) = 0. Also, if {yc1, yc2} = ch_{F⃗ℓ}(yh), then we have deg⁻_{F⃗ℓ}(yc1) = 1 or deg⁻_{F⃗ℓ}(yc2) = 1. With no loss of generality, consider deg⁻_{F⃗ℓ}(yc1) = 1 as in Figure 3.7 (e). Let N = V ∪ L = {y1, y2, ..., yn} be the set of all vertices in F⃗ℓ. The dynamics of F has the form

yh = uh,    yi = Σ_{yp_i ∈ pa_{F⃗ℓ}(yi)} Hip_i(z) yp_i + ui,
yc1 = Hc1h(z) yh + uc1,    yc2 = Σ_{yp_i ∈ pa_{F⃗ℓ}(yc2) \ {yh}} Hc2p_i(z) yp_i + Hc2h(z) yh + uc2,    (3.11)

for all yi ∈ N⁻ := N \ {yh, yc1, yc2}. Now define a new latent LDPF F′ = (H′(z), ε), as in Figure 3.7 (f), where

xi := yi,    εi := ui,    H′jk(z) := Hjk(z),    H′hc1(z) := Whc1,
xh := H′hc1(z) xc1 + εh,    εh := uh − Whc1 xc1,    xc1 := εc1,    (3.12)
εc1 := Hc1h(z) uh + uc1,    xc2 := yc2,    εc2 := uc2,

for all yi ∈ N⁻, yj ∈ N \ {yh}, yk ∈ N, where Whc1 is the Wiener filter estimating yh using yc1. Observe that εi ⊥ εj for all yi, yj ∈ N⁻ with yi ≠ yj. Also, εc1 ⊥ εc2, εc1 ⊥ εi and εc2 ⊥ εi for all yi ∈ N⁻. Using the property of the Wiener filter, we know that εh ⊥ xc1, which implies that εh ⊥ εc1. Since uh ⊥ uc2 and uc1 ⊥ uc2, we have that εh ⊥ εc2. Additionally, we know that uh ⊥ ui and uc1 ⊥ ui for yi ∈ N⁻, which implies that εh ⊥ εi. Notice that the output processes of F⃗′ℓ are the same as the output processes of F⃗ℓ and so is its skeleton. However, the node yh in F⃗′ℓ has outdegree 1. Thus, we follow the approach in Case II and from F⃗′ℓ we can find a latent LDPF F″, namely F⃗″ℓ = (V, L \ {yh}, E⃗ ∪ {(yc1, yc2)} \ {(yh, yc1), (yh, yc2)}), which is a latent polyforest that has the same observable nodes as F⃗ℓ but one fewer hidden node.
Theorems 3.10 and 3.14 together provide the necessary and sufficient conditions for reconstructing an LDPF. Indeed, Theorem 3.10 shows that if a polyforest is structurally identifiable, then it is possible to learn its skeleton from the knowledge of the distances of the visible nodes (sufficiency). Theorem 3.14 shows that if a polyforest is not structurally identifiable, then it is not possible to reconstruct its skeleton from the knowledge of the distances of the visible nodes, since there exists at least one other latent polyforest with the same visible nodes but fewer hidden nodes (necessity) [3].
Chapter 4
Learning Non-linear Networks with Tree Structures
In this chapter, we develop an algorithm for learning polytree networks for generic distributions considering the presence of hidden nodes. Furthermore, as opposed to the results of the previous chapter, this method does not exploit the assumption of linearity of the network, but it requires the statistics of the observed data up to the third order. We also provide the fundamental limitations of solving the problem of learning polytree networks under these assumptions. It is worth mentioning that the methods in this chapter are developed for polytree networks; however, we can apply the same algorithms to learn the structure of a polyforest. This extension is possible since the proposed procedure inherently splits the network into rooted subtrees and then learns the structure of the original polytree (or polyforest) while recovering the structure of each rooted subtree.
4.1 An Algorithm to Learn Latent Polytree Networks
As mentioned before, we assume that we only have access to the measurements of the observed variables and the goal is to recover the network structure (including the hidden variables). By observing the visible nodes (or variables), we can extract independence statements of the form I(yi, ∅, y j) or ¬I(yi, ∅, y j) (namely, the second order statistics) and statements of the form
I(yi, yk, yj) or ¬I(yi, yk, yj) (namely, the third order statistics), where yi, yj and yk are the observed variables. Here, we propose an algorithm that takes the second and third order statistics of the observed nodes in a minimal polytree network and learns its pattern. This algorithm consists of 5 tasks [4]:
1. From the independence statements involving the visible nodes, determine the number of rooted subtrees in the latent polytree and their respective sets of visible nodes;
2. Given all the visible nodes belonging to each rooted subtree, determine the collapsed quasi-skeleton of each rooted subtree;
3. Merge the hidden clusters in the collapsed quasi-skeletons of the rooted subtrees whenever they partially overlap to obtain the collapsed quasi-skeleton of the latent polytree;
4. Determine the quasi-skeleton of the latent polytree from the collapsed quasi-skeleton of the latent polytree (recover Type-I hidden nodes);
5. Obtain the pattern of the latent polytree from the quasi-skeleton of the latent polytree (recover some edge orientations and all Type-II hidden nodes).
Consider a minimal latent polytree as depicted in Figure 4.1 (True). A step-by-step output of the proposed algorithm is depicted in Figures 4.1 (Task 1) - (Task 5). Observe that the full polytree is not recovered at the end of Task 5 since one edge is left undirected, but the pattern of the polytree is learned. The following subsections provide the details of each step and the technical results developed to support this algorithm. We stress that the first task basically leverages the PFDA algorithm developed in Subsection 3.2.1 and [3], and the second task leverages the results developed in [79] for recovering the structure of rooted trees. The main novel results in this chapter lie in Tasks 3-5 of the algorithm [4].
4.1.1 Task 1. Determine the Visible Nodes of Each Rooted Subtree
This first task can be performed by the PFDA, presented in Subsection 3.2.1 and [3]. The main purpose of the PFDA is to recover the lists of all visible nodes in each rooted tree of a minimal latent polytree denoted by P⃗ℓ = (V, L, E⃗) [4]. As explained in Subsection 3.2.1, PFDA takes
Figure 4.1: The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), detection of Type-I hidden nodes (Task 4), and detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern (Task 5). Observe that the full polytree is not recovered at the end of Task 5 since the edge y9 − y18 is left undirected but the pattern of the polytree is learned [4].
as input the set of visible nodes of P⃗ℓ and a metric d with the property that d(yi, yj) < ∞ if and only if yi, yj are in the same rooted subtree. In this case, PFDA is proven to output sets of visible nodes with the property that each set corresponds to the visible descendants of a root of P⃗ℓ. However, here we would like to achieve the same output with a slightly different type of input, namely the independence relations of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj). Since we have ¬I(yi, ∅, yj) if and only if yi, yj are in the same rooted subtree, it is immediate to verify that the relations I(yi, ∅, yj) or ¬I(yi, ∅, yj) can replace the role of the additive metric in the algorithm. This is precisely what we need for implementing Task 1. We report an equivalent version of PFDA, presented in Algorithm 7 for completeness, which takes as input the independence relations of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj) instead of an additive metric [4].
Algorithm 7 Pairwise-Finite Distance Algorithm
Input the ordered set of nodes V = {y1, ..., yn} and the statements of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj) for yi, yj ∈ V
Output the set of all non-eliminated lists Si,j
1: for every node yi ∈ V such that ∀yj ∈ V \ {yi} we have that I(yi, ∅, yj) do
2:   define Si,0 := {yi}
3: end for
4: for each pair yi, yj ∈ V with i < j and ¬I(yi, ∅, yj) do
5:   define Si,j := {yi, yj}
6:   for each yk ∈ V \ Si,j do
7:     if ∀y ∈ Si,j : ¬I(yk, ∅, y), then add yk to Si,j
8:   end for
9: end for
10: for each pair yi, yj ∈ V with i < j do
11:   if Si,j = Sk,ℓ for some k and ℓ, then eliminate Sk,ℓ
12: end for
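A sketch of Algorithm 7 in Python, assuming an oracle dep(a, b) that returns True for the statement ¬I(a, ∅, b), i.e., a and b are marginally dependent (the function and variable names, and the toy oracle below, are illustrative):

```python
def pfda(nodes, dep):
    # Sketch of the Pairwise-Finite Distance Algorithm (Algorithm 7).
    lists = []
    # Steps 1-3: a node independent of every other node forms a singleton.
    for yi in nodes:
        if all(not dep(yi, yj) for yj in nodes if yj != yi):
            lists.append({yi})
    # Steps 4-9: grow a list S_ij from every dependent pair (yi, yj).
    for a, yi in enumerate(nodes):
        for yj in nodes[a + 1:]:
            if not dep(yi, yj):
                continue
            s = {yi, yj}
            for yk in nodes:
                if yk not in s and all(dep(yk, y) for y in s):
                    s.add(yk)
            lists.append(s)
    # Steps 10-12: eliminate duplicate lists.
    unique = []
    for s in lists:
        if s not in unique:
            unique.append(s)
    return unique

# Hypothetical oracle: two rooted-subtree visible sets {1, 2, 3} and
# {3, 4, 5} (sharing node 3) plus an isolated node 6.
A, B = {1, 2, 3}, {3, 4, 5}
dep = lambda a, b: (a in A and b in A) or (a in B and b in B)
lists = pfda([1, 2, 3, 4, 5, 6], dep)
```

On this toy input the algorithm returns the two overlapping visible-descendant sets and the singleton {6}, matching Theorem 4.1.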
The following theorem shows that the output of PFDA applied to the independence statements is the lists of visible nodes belonging to the same rooted subtree of P⃗ℓ [4].
Theorem 4.1. Consider a minimal latent polytree P⃗ℓ = (V, L, E⃗) faithful to a probabilistic model. Then PFDA applied to the independence statements I(yi, ∅, yj) or ¬I(yi, ∅, yj) of the probabilistic model for yi, yj ∈ V outputs a collection of sets such that each of them is given by all the visible descendants of a root of P⃗ℓ.
Proof. The proof is in Appendix C.1.
4.1.2 Task 2. Determine the Collapsed Representation of the Quasi-skeleton of Each Rooted Subtree
The second task can be performed by the Reconstruction Algorithm for Latent Rooted Trees proposed in [79]. We report it as Algorithm 8 here for completeness and to match the notation of this dissertation. The input of this algorithm is the set Vr of visible nodes belonging to a rooted subtree Tr of the minimal latent polytree and also the independence relations of the form I(yi, yk, yj) or ¬I(yi, yk, yj) for distinct yi, yj, yk ∈ Vr. Its output, then, is the collapsed quasi-skeleton of the rooted subtree Tr. For completeness, we have included the intuition and a detailed explanation of the Reconstruction Algorithm for Latent Rooted Trees in Appendix C.2 [4].
Algorithm 8 Reconstruction Algorithm for Latent Rooted Trees
Input the set of visible nodes in a rooted subtree Vr and the independence statements of the form I(yi, yk, yj) or ¬I(yi, yk, yj) for yi, yj, yk ∈ Vr
Output (Vr, Lr, Er), the collapsed quasi-skeleton of Tr
1: Initialize Vtemp := Vr, Lr := {}, and Er := {}
2: If |Vtemp| = 2, i.e., Vtemp = {yi, yj}, then add the edge yi − yj to Er and if n ≤ 2, stop and output the results
3: Determine a visible terminal node yk in the rooted tree by verifying the condition ¬I(yi, yk, yj) for all yi, yj ∈ Vtemp \ {yk}
4: Search for a visible node yℓ ∈ Vtemp \ {yk} linked to yk by verifying the condition I(yk, yℓ, yj) for all yj ∈ Vtemp \ {yk, yℓ}
5: if yℓ exists then
6:   add the link yℓ − yk to Er, remove yk from Vtemp, and go to Step 2
7: else
8:   create a new hidden node yh in Lr and add the link yk − yh to Er
9:   compute the set K ⊆ Vtemp such that yj ∈ K implies that ¬I(yj, yi, yk) for all yi ≠ yj, yk where yi ∈ Vtemp \ K
10:   add yh − yj to Er for yj ∈ K
11:   set (V(j), L(j), E(j)) as the output of this algorithm applied to each V(j) defined as the union of {yj} and the set of nodes in Vtemp separated from yk by yj ∈ K
12:   set Er := ∪_{yj∈K} E(j) ∪ Er and Lr := ∪_{yj∈K} L(j) ∪ Lr
13: end if
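Steps 3 and 4 of Algorithm 8 can be sketched as follows, assuming an oracle sep(i, k, j) that returns True for the statement I(yi, yk, yj) (the function names and the toy tree are illustrative):

```python
def find_terminal_node(vtemp, sep):
    # Step 3: y_k is a visible terminal node of the rooted tree when it
    # separates no pair of the remaining nodes, i.e. not I(y_i, y_k, y_j)
    # holds for all pairs y_i, y_j in Vtemp \ {y_k}.
    for yk in vtemp:
        rest = [y for y in vtemp if y != yk]
        if all(not sep(yi, yk, yj)
               for a, yi in enumerate(rest) for yj in rest[a + 1:]):
            return yk
    return None

def find_visible_neighbor(vtemp, yk, sep):
    # Step 4: y_l is linked to y_k when it separates y_k from every other
    # remaining node; if no such y_l exists, a hidden node is created
    # instead (Steps 8-12).
    for yl in vtemp:
        if yl != yk and all(sep(yk, yl, yj)
                            for yj in vtemp if yj not in (yk, yl)):
            return yl
    return None

# Toy rooted tree 1 -> 2 -> 3 -> 4: y_k separates y_i and y_j exactly
# when it lies strictly between them on the path.
pos = {1: 0, 2: 1, 3: 2, 4: 3}
sep = lambda i, k, j: min(pos[i], pos[j]) < pos[k] < max(pos[i], pos[j])
terminal = find_terminal_node([1, 2, 3, 4], sep)
neighbor = find_visible_neighbor([1, 2, 3, 4], terminal, sep)
```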
Thus, we can call this algorithm on all of the sets of visible nodes V1, ..., Vnr , where nr is the number of roots, obtained from Task 1 and find the collapsed quasi-skeletons of all the rooted subtrees of the latent polytree. This result is formalized in the following theorem [4].
Theorem 4.2. Let P⃗ℓ = (V, L, E⃗) be a minimal latent polytree. Consider a root yr of P⃗ℓ and let Vr = V ∩ de_{P⃗ℓ}(yr). The output of the Reconstruction Algorithm for Latent Rooted Trees applied to Vr and the independence relations between the nodes in Vr is the collapsed quasi-skeleton of the rooted subtree with the root yr.
Proof. The proof is in Appendix C.3.
4.1.3 Task 3. Merge the Hidden Clusters of the Collapsed Rooted Subtrees
By applying the Reconstruction Algorithm for Latent Rooted Trees to each set of visible nodes in the same rooted tree, we have, as an output, the collapsed quasi-skeletons of all rooted subtrees in the original latent polytree. In the general case, some hidden clusters in the collapsed quasi-skeletons of the rooted subtrees might overlap, namely they might share some nodes. The following theorem provides a test on the sets of visible nodes of the rooted subtrees in a minimal latent polytree to determine if two hidden clusters in two distinct collapsed quasi-skeletons of two rooted subtrees belong to the same cluster in the collapsed quasi-skeleton of the polytree [4].
Theorem 4.3. Consider a minimal latent polytree P⃗ℓ. Let C1 and C2 be two distinct hidden clusters in the collapsed quasi-skeletons of two rooted subtrees of P⃗ℓ. If the set of neighbors of C1 and the set of neighbors of C2 share at least a pair of visible nodes, i.e., |N(C1) ∩ N(C2)| ≥ 2, then the nodes in C1 and C2 belong to the same hidden cluster in the collapsed quasi-skeleton of P⃗ℓ.
Proof. The proof is in Appendix C.4.
This theorem is the enabling result for the Hidden Cluster Merging Algorithm (HCMA), presented in Algorithm 9, which merges all the collapsed quasi-skeletons associated with the individual rooted subtrees, obtained from Task 2, into the collapsed quasi-skeleton of the polytree. This algorithm starts with the collapsed quasi-skeletons of the rooted subtrees, finds pairs of clusters that overlap by testing if they share at least one pair of visible neighbors (see Theorem 4.3), and then merges the overlapping pairs. This procedure is repeated until no more clusters are merged [4].
Algorithm 9 Hidden Cluster Merging Algorithm
Input the collapsed quasi-skeletons of the rooted subtrees Ti = (Vi, Li, Ei) for i = 1, ..., nr
Output the collapsed quasi-skeleton P of the latent polytree
1: Initialize the set of clusters P with the hidden clusters of all Ti, i.e., P := {{C1}, {C2}, ..., {Ck}}
2: while there are two elements Ci, Cj ∈ P such that |N(Ci) ∩ N(Cj)| ≥ 2 do
3:   remove Ci, Cj from P and add Ci ∪ Cj to P
4:   define N(Ci ∪ Cj) := N(Ci) ∪ N(Cj)
5: end while
6: Define the polytree P = (∪i Vi, P, E) where E := {{ya, yb} | ∃ i : ya, yb ∈ Vi, ya − yb ∈ Ei} ∪ {{ya, Cb} | ∃ i, h : ya ∈ Vi, yh ∈ Li, Li ⊆ Cb, Cb ∈ P, ya − yh ∈ Ei}
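The merging loop of HCMA (Steps 2-5) can be sketched as follows (the cluster labels and neighbor sets below are hypothetical):

```python
def merge_hidden_clusters(clusters, neighbors):
    # Repeatedly merge two hidden clusters whose visible-neighbor sets
    # share at least two nodes (the test of Theorem 4.3), updating
    # N(Ci ∪ Cj) := N(Ci) ∪ N(Cj) as in Step 4 of Algorithm 9.
    clusters = list(clusters)
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = clusters[i], clusters[j]
                if len(neighbors[ci] & neighbors[cj]) >= 2:
                    union = ci | cj
                    neighbors[union] = neighbors[ci] | neighbors[cj]
                    clusters = [c for c in clusters if c not in (ci, cj)]
                    clusters.append(union)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical clusters with their sets of visible neighbors: the first
# two share neighbors {2, 3} and are merged; the third stays separate.
neighbors = {
    frozenset({"h1"}): {1, 2, 3},
    frozenset({"h2"}): {2, 3, 4},
    frozenset({"h3"}): {9, 10},
}
result = merge_hidden_clusters(list(neighbors), neighbors)
```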
The following theorem guarantees that, for a minimal latent polytree, the output of HCMA is the collapsed quasi-skeleton of the polytree [4].
Theorem 4.4. Let P⃗ℓ = (V, L, E⃗) be a minimal latent polytree and let Ti = (Vi, Li, Ei) for i = 1, ..., nr be the collapsed quasi-skeletons of the rooted subtrees of P⃗ℓ. Then HCMA outputs the collapsed quasi-skeleton of P⃗ℓ.
Proof. The proof is in Appendix C.5.
4.1.4 Task 4. Determine the Quasi-skeleton of the Latent Polytree from the Collapsed Quasi-skeleton of the Latent Polytree
After performing the HCMA, the output is the collapsed quasi-skeleton of the latent polytree; thus, the structure of the hidden nodes within each hidden cluster is not known yet. Note that the restriction of the original polytree to the closure of a hidden cluster is a smaller polytree. The goal of this task is to recover the structure of the hidden clusters by focusing on each individual closure (i.e., recover Type-I hidden nodes and their connectivities). Given the closure of a hidden cluster, the basic strategy is to detect one root of the hidden cluster along with the visible nodes (if any) linked to this root. Then, we label such a root as a visible node, add edges between this node and its visible neighbors, and subsequently apply the same strategy recursively to the descendants of such a detected root [4].
Since we focus on the closure of a specific hidden cluster, say C, we define the sets Ṽr = Vr ∩ N(C) for r = 1, ..., nr, where nr is the number of rooted subtrees in the latent polytree and Vr are the sets of visible nodes in each rooted subtree (obtained from Task 1). A fundamental result for the detection of a root of a hidden cluster is the following theorem [4].
Theorem 4.5. Let P⃗ℓ be a minimal latent polytree and let T⃗r = (Vr, Lr, E⃗r) with r = 1, ..., nr be all the rooted subtrees of P⃗ℓ. Let C be a hidden cluster in the collapsed quasi-skeleton of P⃗ℓ. Define Ṽr := Vr ∩ N(C) for r = 1, ..., nr where nr is the number of roots in P⃗ℓ. Then, Tr contains a hidden root of C if and only if Ṽr ≠ ∅ and for all Ṽr′ with r′ ≠ r we have |Ṽr \ Ṽr′| > 1 or |Ṽr′ \ Ṽr| ≤ 1.
Proof. The proof is in Appendix C.6.
To make the application of this theorem more clear, consider the latent polytree introduced in Figure 4.1 (True). After applying the first three tasks, we obtain the collapsed quasi-skeleton of the latent polytree as depicted in Figure 4.1 (Task 3). Observe that the rooted subtrees T⃗1 (with root y1) and T⃗2 (with root y2) satisfy the conditions of Theorem 4.5, indicating that they contain a root of the hidden cluster. The following lemma allows one to find the visible nodes linked to a hidden root in the closure of a hidden cluster [4].
Lemma 4.6. Let P⃗ℓ be a minimal latent polytree. Consider a hidden root yh of a hidden cluster C in the collapsed quasi-skeleton of P⃗ℓ where yh belongs to the rooted subtree Tr = (Vr, Lr, E⃗r). Define Ṽr′ := Vr′ ∩ N(C) for r′ = 1, ..., nr where nr is the number of roots in P⃗ℓ. The visible nodes linked to yh are given by the set W \ W̄ where

I := {r} ∪ {r′ such that |Ṽr \ Ṽr′| = |Ṽr′ \ Ṽr| = 1},    W := ∪_{i∈I} Ṽi,    W̄ := ∪_{i∉I} Ṽi.
Proof. The proof is in Appendix C.7.
Again, we follow the example of Figure 4.1 to show the steps of Task 4 in more detail. Without loss of generality, choose T⃗r = T⃗1. Consider the closure of CA′ obtained at the end of Task 3 and then apply Lemma 4.6 to obtain I = {1, 2}, W = {y1, y2, y10, y12, y13, y14, y15, y16, y17}, and W̄ = {y5, y6, y9, y11, y12, y13, y14, y15, y16, y17}. Thus, we have W \ W̄ = {y1, y2, y10}. Therefore, the visible nodes linked to the hidden root in T⃗1 are y1, y2 and y10.
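The computation in Lemma 4.6 can be sketched as follows; the Ṽr sets below are hypothetical but shaped like the worked example (subtrees 1 and 2 differ by exactly one node each, so I = {1, 2}), and the result matches {y1, y2, y10}:

```python
def visible_nodes_linked_to_root(v_tilde, r):
    # Lemma 4.6: v_tilde maps each rooted-subtree index to the set
    # V~_r = V_r ∩ N(C); r indexes a subtree containing a hidden root.
    # Returns W \ W_bar, the visible nodes linked to that hidden root.
    I = {r} | {rp for rp, vt in v_tilde.items()
               if rp != r
               and len(v_tilde[r] - vt) == 1 and len(vt - v_tilde[r]) == 1}
    W = set().union(*(v_tilde[i] for i in I))
    outside = [v_tilde[i] for i in v_tilde if i not in I]
    W_bar = set().union(*outside) if outside else set()
    return W - W_bar

# Hypothetical neighbor sets (node labels only, mirroring the example).
v_tilde = {1: {1, 10, 12, 13}, 2: {2, 10, 12, 13}, 3: {5, 12, 13}}
linked = visible_nodes_linked_to_root(v_tilde, 1)
```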
Now we introduce the Hidden Cluster Learning Algorithm (HCLA), presented in Algorithm 10, to learn the structure of a hidden cluster [4].
Algorithm 10 Hidden Cluster Learning Algorithm
Input the collapsed quasi-skeleton of a minimal polytree P⃗ℓ, the collapsed quasi-skeletons of the rooted subtrees Ti = (Vi, Li, Ei) for i = 1, ..., nr, and the set of the hidden clusters P = {C1, ..., CnC}
Output P and the independence relations of the form I(ya, ∅, yb) or ¬I(ya, ∅, yb) for all nodes ya, yb ∈ ∪i Vi
1: while P ≠ ∅ do
2:   Call Hidden Node Detection Procedure(C1) where C1 is the first element of P
3: end while
4: procedure Hidden Node Detection(C)
5:   Compute Ṽi = Vi ∩ N(C)
6:   Find Ṽr which satisfies |Ṽr \ Ṽr′| > 1 or |Ṽr′ \ Ṽr| ≤ 1 for all r′ ≠ r (as in Theorem 4.5)
7:   Initialize W := Ṽr, W̄ := ∅, and I := {r}
8:   for all i = 1, ..., nr with i ≠ r do
9:     if |Ṽr \ Ṽi| = 1 and |Ṽi \ Ṽr| = 1 (as in Lemma 4.6) then
10:      W := W ∪ Ṽi and I := I ∪ {i}
11:    else
12:      W̄ := W̄ ∪ Ṽi
13:    end if
14:  end for
15:  A new hidden node yh is revealed
16:  Add yh to all the rooted trees Ti with i ∈ I, namely Vi := Vi ∪ {yh}
17:  Add the independence relation ¬I(yh, ∅, y) for all y ∈ Vi with i ∈ I, and add the independence relation I(yh, ∅, y) for all other nodes y
18:  Link all nodes in W \ W̄ to yh in all Ti with i ∈ I, namely Ei := Ei ∪ {{yh, y} | y ∈ W \ W̄}
19:  for all i ∈ I do
20:    create nk = |W ∩ W̄| new clusters: C1(i), ..., Cnk(i)
21:    link yh to C1(i), ..., Cnk(i)
22:    link each cluster C1(i), ..., Cnk(i) to a distinct element in W ∩ W̄
23:  end for
24:  while ∃ ya, yb ∈ N(Cj(i)) ∪ N(Ck(i)) such that ya, yb ∈ Ṽm where m ∉ I do
25:    merge the two hidden clusters Cj(i) and Ck(i)
26:    update the structure of Ti with the new hidden clusters
27:  end while
28:  Let P = (V, P, E) be the output of HCMA applied to Ti = (Vi, Li, Ei), for i = 1, ..., nr
29: end procedure
Again, consider the closure of the hidden cluster CA′ as depicted in Figure 4.2 (Task 4a), which we obtained at the end of Task 3. Then, apply the Hidden Node Detection procedure to CA′ and observe that the output at the end of Step 23 of Algorithm 10 is in Figure 4.2 (Task 4b). The output of the
Figure 4.2: The closure of the hidden cluster CA′ of the latent polytree in Figure 4.1 (True) obtained after Task 3 (Task 4a), the hidden clusters obtained after Step 23 of HCLA (Task 4b), merging of the hidden clusters as in Steps 24-27 of HCLA (Task 4c), merging of the overlapping hidden clusters as in Step 28 of HCLA (Task 4d), orienting the edges in the quasi-skeleton of the latent polytree as in Steps 1-4 of HRRA (Task 5a), and discovering Type-II hidden nodes (Task 5b) [4].
merging in Steps 24-27 is depicted in Figure 4.2 (Task 4c), and the output of the merging in Step 28 is depicted in Figure 4.2 (Task 4d). Now, we can apply the same procedure recursively to the remaining hidden clusters to obtain the final output of Task 4, the quasi-skeleton of the polytree, as depicted in Figure 4.1 (Task 4) [4]. In the following theorem, we show that the output of HCLA is the quasi-skeleton of the latent polytree [4].
Theorem 4.7. Let P⃗ℓ = (V, L, E⃗) be a minimal latent polytree. When HCLA is applied to all hidden clusters of the collapsed quasi-skeleton of P⃗ℓ, the output P = (V, E) is the quasi-skeleton of P⃗ℓ. Furthermore, HCLA also outputs, for each pair yi, yj ∈ V, the relation I(yi, ∅, yj) if and only if the path connecting yi and yj in P⃗ℓ contains an inverted fork.
Proof. The proof is in Appendix C.8.
4.1.5 Task 5. Obtain the Pattern of the Latent Polytree from the Quasi-skeleton of the Latent Polytree
Once the quasi-skeleton of the latent polytree has been obtained, the only missing nodes to recover the full skeleton are the Type-II hidden nodes of the original polytree. Interestingly, the detection of such hidden nodes can be performed concurrently with the recovery of the edge orientations. In particular, we can apply the GPT algorithm in [68] to orient the edges of the quasi-skeleton of the polytree. In this case, if any edge receives two different orientations, we show that there exists one Type-II hidden node between the two linked nodes [4]. Thus, we introduce the Hidden Root Recovery Algorithm (HRRA), presented in Algorithm 11, which is simply an implementation of the GPT algorithm (Steps 1-4), as depicted in Figure 4.2 (Task 5a), with the additional detection of Type-II hidden nodes (Steps 6-11), as depicted in Figure 4.2 (Task 5b). Observe that Steps 1-4 of HRRA implement the GPT algorithm to find as many v-structures as possible considering all the independence statements (including the ones obtained from Task 4). These steps also orient other edges so long as no new v-structure is created. After this stage, we can simply remove the edges that have been oriented from the list of unoriented edges as in Step 5. Then, in the cases where an edge has two different orientations, a new Type-II hidden node is revealed as in Steps 6-11.
Algorithm 11 Hidden Root Recovery Algorithm
Input: P = (V, E), the quasi-skeleton of a latent polytree, and the independence relations of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj) for all nodes yi, yj ∈ V
Output: the partially directed polytree P¯ = (V, E, E~)
1: while additional edges are oriented do
2:   if yi − yk, yj − yk ∈ E and I(yi, ∅, yj), then add yi → yk and yj → yk to E~
3:   if yi → yk ∈ E~, yk − yj ∈ E and ¬I(yi, ∅, yj), then add yk → yj to E~
4: end while
5: Remove the edges that are oriented in E~ from E
6: for all yi, yj such that yi → yj, yj → yi ∈ E~ do
7:   a new hidden node of Type-II is detected which is a parent of yi and yj
8:   remove yi → yj and yj → yi from E~
9:   add a new node yh to V
10:  add yh → yj and yh → yi to E~
11: end for
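The two phases of HRRA can be sketched in a few lines of Python. This is an illustrative sketch rather than the dissertation's implementation: the encoding of the independence statements as a set of frozensets, the `hrra` function name, and the `h1, h2, ...` labels for newly created hidden nodes are assumptions made here for concreteness.

```python
from itertools import combinations, count

def hrra(V, E, indep):
    """Sketch of HRRA: orient edges GPT-style (Steps 1-4), drop oriented
    edges from E (Step 5), and replace each conflicting orientation with a
    new Type-II hidden parent (Steps 6-11).
    V: list of visible nodes; E: set of frozenset undirected edges;
    indep: set of frozensets {yi, yj} for which I(yi, 0, yj) holds."""
    D = set()                                  # directed edges (tail, head)
    changed = True
    while changed:                             # Steps 1-4
        changed = False
        for yk in V:
            nbrs = [next(iter(e - {yk})) for e in E if yk in e]
            for yi, yj in combinations(nbrs, 2):
                if frozenset({yi, yj}) in indep:       # v-structure at yk
                    for t in ((yi, yk), (yj, yk)):
                        if t not in D:
                            D.add(t)
                            changed = True
        for yi, yk in list(D):                 # propagate along yi -> yk - yj
            for e in E:
                if yk in e:
                    yj = next(iter(e - {yk}))
                    if (yj != yi and frozenset({yi, yj}) not in indep
                            and (yk, yj) not in D):
                        D.add((yk, yj))
                        changed = True
    E = {e for e in E if e not in {frozenset(d) for d in D}}   # Step 5
    fresh = count(1)
    for a, b in sorted(list(D)):               # Steps 6-11: conflicts
        if (b, a) in D and (a, b) in D:
            D -= {(a, b), (b, a)}
            h = f"h{next(fresh)}"              # hypothetical hidden-node label
            V = V + [h]
            D |= {(h, a), (h, b)}
    return V, E, D
```

For instance, on the quasi-skeleton y3 − y1 − y2 − y4 where the statements I(y2, ∅, y3), I(y1, ∅, y4) and I(y3, ∅, y4) hold, the edge y1 − y2 receives both orientations and a Type-II hidden parent of y1 and y2 is introduced.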
Moreover, we provide the result stated in Theorem 4.8 to prove that HRRA outputs the pattern of the latent polytree [4].
Theorem 4.8. Let ~P` be a minimal latent polytree. When the input is the quasi-skeleton of ~P` with the independence statements of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj) for all the pairs of nodes yi and yj, the output of HRRA is the pattern of ~P`.
Proof. The proof is in Appendix C.9.
4.1.6 Putting It All Together
In this section, we bring together all the results developed in the previous sections and present the Polyforest Learning Algorithm (PLA) in Algorithm 12. Note that this algorithm has the same name as Algorithm 6 since they both learn a polyforest structure. However, they should not be confused: they exploit different assumptions and therefore their inputs are different. Note that we can apply PLA to any type of network so long as we can find the independence statements of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj), and I(yi, yk, yj) or ¬I(yi, yk, yj) for all the observable variables.
Algorithm 12 Polyforest Learning Algorithm
Input: the ordered set of nodes V = {y1, ..., yn} and the statements of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj), and I(yi, yk, yj) or ¬I(yi, yk, yj) for yi, yj, yk ∈ V
Output: the pattern P¯ = (N, E, E~)
1: Set S to the output of PFDA applied to V and I(yi, ∅, yj) or ¬I(yi, ∅, yj) for yi, yj ∈ V
2: for each list Si in S do
3:   set Ti = (Vi, Li, Ei) to the output of the Reconstruction Algorithm for Latent Rooted Trees applied to Si, and I(ya, yc, yb) or ¬I(ya, yc, yb) for ya, yb, yc ∈ Si
4: end for
5: Set P = (V, P, E) to the output of HCMA applied to all the Ti
6: Set P to the output of HCLA applied to P, all the Ti, and P
7: Set P¯ = (N, E, E~) to the output of HRRA applied to P and the independence relations of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj)
4.2 Additional Examples
In this section we provide more examples to show how we can leverage PLA to learn the structure of different networks, given that all the independence statements of the form I(yi, ∅, yj) or ¬I(yi, ∅, yj), and I(yi, yj, yk) or ¬I(yi, yj, yk) for all the observable nodes yi, yj and yk in the network are provided.
4.2.1 A Star Network
Consider the star network depicted in Figure 4.3 (True). This network contains four rooted subtrees and one Type-I hidden node. The output of Task 1 is the set of lists of nodes that belong to the same rooted tree, as depicted in Figure 4.3 (Task 1). After applying Task 2, we get the collapsed quasi-skeletons of each rooted subtree as in Figure 4.3 (Task 2). Since all of the identified hidden clusters are connected to the pair y5 and y6, HCMA merges them together as in Figure 4.3 (Task 3). When we implement the tests in Theorem 4.5 and Lemma 4.6, we have I = {1, 2, 3, 4}, W = {y1, y2, y3, y4, y5, y6} and W̄ = ∅.
Thus, all the nodes are connected to the newly recovered hidden node yh1 . Since there are no hidden clusters in the network anymore, namely, P is empty, HCLA stops and the output of Task 4 is shown in Figure 4.3 (Task 4). Note that we also recover the independence statements at the end of Task 4 which are used in HRRA in Task 5 to recover edge orientations as in Figure 4.3 (Task 5).
Figure 4.3: The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), detection of Type-I hidden nodes (Task 4), and detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern of the minimal latent polytree (Task 5). Observe that the full polytree is recovered at the end of Task 5 since no edge is left undirected.
Note that since there are no contradictions in the orientation of any edge, no Type-II hidden node is discovered.
4.2.2 A Polytree Network with Only Type-I Hidden Nodes
Consider the 6-node polytree network in Figure 4.4 (True). This network has three rooted subtrees where one root is hidden. As expected, after applying Task 1, we obtain the lists of nodes belonging to each rooted subtree as in Figure 4.4 (Task 1). Task 2 recovers the collapsed quasi-skeleton of each rooted
subtree as in Figure 4.4 (Task 2). The pair of nodes y3 and y4 are in common in the neighbors
of CA and CC, while the pair of nodes y5 and y6 are in common in the neighbors of CB and CC. Thus, HCMA merges the three hidden clusters together as in Figure 4.4 (Task 3). Observe that at this stage we have P = {CA0} and the neighbors of CA0 are considered at the beginning of Task 4 as in Figure 4.4 (Task 4a). When we implement the tests in Theorem 4.5 and Lemma 4.6, we have I = {3} (where 3 represents the rooted subtree with root yh1), W = {y3, y4, y5, y6} and W̄ = {y1, y2, y3, y4, y5, y6}. Then we can create fictitious hidden clusters as explained in Steps 19-23 of HCLA and depicted in Figure 4.4 (Task 4b). Hidden clusters C1^(1) and C2^(1) are then merged together
Figure 4.4: The actual minimal latent polytree (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), considering the neighbors of the hidden cluster CA0 (Task 4a), detection of one Type-I hidden node and creation of fictitious hidden clusters (Task 4b), merging of the fictitious hidden clusters (Task 4c), merging of the overlapping hidden clusters (Task 4d), detection of one Type-I hidden node (Task 4e), detection of one Type-I hidden node (Task 4f), detection of Type-II hidden nodes along with orientation of the edges to obtain the pattern of the latent polytree (Task 5a), and no edge has conflicting orientation and therefore no Type-II hidden node is detected (Task 5b). Observe that the full polytree is recovered at the end of Task 5 since no edge is left undirected.
because of the pair y3 and y4, and hidden clusters C3^(1) and C4^(1) are then merged together because of
the pair y5 and y6 as explained in Steps 24-27 of HCLA and depicted in Figure 4.4 (Task 4c).
After this step, we have that P = {CA0 , CB0 } and HCLA keeps finding the hidden nodes in a similar manner. Note that the result of Task 5 exactly matches the original network structure since no edges are left unoriented.
4.2.3 A Polyforest Network
Consider the 19-node polyforest network in Figure 4.5 (True). This network has seven rooted subtrees that are contained in two different polytrees. Observe that this polyforest contains both Type-I and Type-II hidden nodes. In this example, we show that the PLA is also capable of recovering the pattern of polyforest networks. A similar step by step process is shown in Figure 4.5 as PLA progresses through the learning process. After applying Task 1, we obtain the lists of nodes belonging to each rooted subtree as in Figure 4.5 (Task 1). Task 2 recovers the collapsed quasi-skeleton of each rooted tree as in
Figure 4.5 (Task 2). Observe that the pair y10 and y11 is common in the neighbors of CA and CD, while the pair y17 and y18 is common in the neighbors of CB, CC, CE and CF. Thus, HCMA merges the overlapping hidden clusters together as in Figure 4.5 (Task 3). Observe that at this stage we
have P = {CA0 , CB0 }. Without loss of generality, HCLA considers CA0 and selects the neighbors of
CA0 at the beginning of Task 4 as in Figure 4.5 (Task 4a). When we implement the tests in Theorem 4.5 and Lemma 4.6, we have I = {2} (where 2
represents the rooted subtree with root yh2), W = {y7, y8, y10, y11, y12} and W̄ = {y3, y10, y11, y12}. Then we can create fictitious hidden clusters as explained in Steps 19-23 of HCLA and depicted in Figure 4.5 (Task 4b). Hidden clusters C1^(1), C2^(1) and C3^(1) are then merged together because of the
nodes y10, y11 and y12 as explained in Steps 24-27 of HCLA and depicted in Figure 4.5 (Task 4c). The two hidden clusters C1^(1) and CD are merged together after applying the HCMA as in
Figure 4.5 (Task 4d) since they share a pair of visible neighbors, y11 and y12.
After this step, we have that P = {CA0, CB0} and HCLA chooses, without loss of generality, the hidden cluster CA0 and its neighbors as in Figure 4.5 (Task 4e). HCLA keeps finding the hidden
Figure 4.5: The actual minimal latent polyforest (True), the lists of visible nodes for each rooted subtree (Task 1), collapsed quasi-skeletons of each rooted subtree (Task 2), merging of the overlapping hidden clusters (Task 3), considering the neighbors of the hidden cluster CA0 (Task 4a), detection of a Type-I hidden node and creating fictitious hidden clusters (Task 4b), merging of the fictitious hidden clusters (Task 4c), merging of the hidden clusters (Task 4d), and considering the neighbors of the hidden cluster CA0 (Task 4e).
nodes in a similar manner until the polyforest structure is fully learned. Observe that PLA can also recover the structure of polyforest networks because it operates on rooted trees.
4.3 Fundamental Limitations
In this section we show that if a latent polytree P` is not minimal, then there exists another latent polytree with a smaller number of hidden nodes which has the same independence relations among the visible nodes. In other words, if a latent polytree is not minimal, then there exists at least one hidden node yh that does not satisfy the minimality conditions (see the degree conditions of Definition 2.21). We prove this statement by considering the various possible scenarios for such a node [4].
1. Case I: If deg~P`(yh) = 1, then this hidden node can be immediately marginalized from the factorization of the joint probability distribution to obtain an equivalent factorization where yh is not present. Indeed, if deg−P`(yh) = 1, then let yp be the only parent of the node yh, as depicted in Figure 4.6 (a). Then the factor P(yh | yp) disappears from the factorization of the joint probability distributions by integrating over yh. Instead, if deg+P`(yh) = 1, then let yc be the only child of the node yh, as depicted in Figure 4.6 (b). Then the factor P(yc | yh) P(yh) disappears from the factorization of the joint probability distributions, again, by integrating over yh.
2. Case II: If yh has a single hidden parent yp and multiple children yc1 , yc2 , ..., ycnc , as depicted in Figure 4.7 (a), then there exists a factor in the factorization of the joint probability distribution
Figure 4.6: A hidden node yh where deg(yh) = deg−(yh) = 1 (a), and a hidden node yh where deg(yh) = deg+(yh) = 1 (b) [4].
Figure 4.7: A hidden node yh which has a single hidden parent yp and multiple children
yc1 , yc2 , ..., ycnc (a), and the case where the hidden node yh is marginalized (b) [4].
where yh can be marginalized as follows

$$\prod_{i=1}^{n_c} P\big(y_{c_i} \mid y_h, p^{(c_i)}\big)\, P(y_h \mid y_p) \prod_{j=1}^{n_p} P\big(y_{\bar{c}_j} \mid y_p\big)\, P(y_p \mid g) \;=\; \prod_{i=1}^{n_c} P\big(y_{c_i} \mid y_p, p^{(c_i)}\big) \prod_{j=1}^{n_p} P\big(y_{\bar{c}_j} \mid y_p\big)\, P(y_p \mid g) \tag{4.1}$$

where p^(ci) are the parents of yci other than yh for i = 1, ..., nc, the nodes c̄j are the children of yp other than yh for j = 1, ..., np, and g are the parents of yp, as depicted in Figure 4.7 (b).
3. Case III: If yh is a hidden root with exactly two children yc1 and yc2 and at least one of its children
has no other parent (without loss of generality say yc1 ), as depicted in Figure 4.8 (a), then in the factorization of the joint probability distribution we find a factor of the following form
$$P(y_h)\, P(y_{c_1} \mid y_h)\, P(y_{c_2} \mid y_h, p) \tag{4.2}$$
where p are the parents of yc2 other than yh, as depicted in Figure 4.8 (b). By applying Bayes’ theorem, we have
$$P(y_h)\, P(y_{c_1} \mid y_h)\, P(y_{c_2} \mid y_h, p) = P(y_h \mid y_{c_1})\, P(y_{c_1})\, P(y_{c_2} \mid y_h, p) \tag{4.3}$$
Figure 4.8: A hidden node yh which is a root with exactly two children yc1 and yc2 and at least one
of its children has no other parent (without loss of generality, say yc1 ) (a), and the case where the hidden node yh is marginalized (b) [4].
and then by marginalizing over yh we obtain the following factor of the joint probability distribution
$$P(y_{c_1})\, P(y_{c_2} \mid y_{c_1}, p). \tag{4.4}$$
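The Case III manipulation in Equations (4.2)-(4.4) can be checked numerically on discrete random variables. The sketch below, with arbitrarily chosen binary variables and random conditional probability tables (the variable sizes, table shapes and names are assumptions made here), verifies that summing the hidden root out of the factor (4.2) yields exactly the factor (4.4):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(shape):
    """Random conditional probability table, normalized over axis 0."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

# Binary variables: hidden root h with children c1, c2; c2 has another parent p.
P_h = random_cpt((2,))             # P(h)
P_c1_h = random_cpt((2, 2))        # P(c1 | h), indexed [c1, h]
P_c2_hp = random_cpt((2, 2, 2))    # P(c2 | h, p), indexed [c2, h, p]

# Factor (4.2) with the hidden root summed out: table over (c1, c2, p).
lhs = np.einsum('h,ch,dhp->cdp', P_h, P_c1_h, P_c2_hp)

# Bayes step (4.3) and marginalization (4.4): P(c1) P(c2 | c1, p).
P_c1 = np.einsum('h,ch->c', P_h, P_c1_h)                  # P(c1)
P_h_c1 = np.einsum('h,ch->hc', P_h, P_c1_h) / P_c1        # P(h | c1)
P_c2_c1p = np.einsum('hc,dhp->dcp', P_h_c1, P_c2_hp)      # P(c2 | c1, p)
rhs = np.einsum('c,dcp->cdp', P_c1, P_c2_c1p)

print(np.allclose(lhs, rhs))   # the two factorizations agree
```

The agreement holds for any choice of the tables, since the right-hand side is obtained from the left-hand side purely by Bayes' theorem and marginalization over the hidden root.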
In all of these scenarios, one hidden node has been marginalized from the factorization of the joint probability distribution of the random variables, leading to a factorization equivalent to the original one, but with fewer hidden nodes. In all other scenarios, the factorization is instead associated with a polytree which meets the definition of minimality of a latent polytree [4]. Therefore, similar to the case where we assume linear dynamics in the network (see Section 3.4), we have shown that the minimality conditions are necessary and sufficient for learning the structure of a latent polytree network.
Chapter 5
Using Polytrees to Approximate Networks
In this chapter we consider using some of the tools developed in the previous chapters to propose simple and efficient approximators for general networks. As mentioned before, approximating with a simpler network highlights the links that are most important for describing the relationships among the nodes of the potentially more complex network being approximated. The approach proposed in this chapter has polynomial time complexity, which makes it favorable for many different applications due to its low computational cost. Although some weak edges are inevitably missed during the approximation process, we show in the case of high-frequency financial data, an example of a real data application, that this approximation is meaningful.
5.1 Approximation Algorithm
In this section we describe an algorithm to approximate an LDIM G using a simpler polytree structure, given the observations of the node processes {y_i}_{i=1}^n of G. The main idea is to split the process into two steps [1]:
A. Determine the skeleton of the polytree approximating the LDIM structure,
B. Assign orientations to the links of the obtained polytree skeleton.
A schematic representation of the basic steps of this algorithm is given in Figure 5.1. In the following subsections, we explain each step in more detail.
Figure 5.1: Only the observations of the nodes of the actual LDIM are available and the structure is unknown (a), the output of the first step of the approximating algorithm is an undirected tree (b), and in the second step of the algorithm the edges are oriented providing the approximating polytree structure (c) [1].
5.1.1 Step A. Determine the Skeleton of the Approximating Polytree
The main technical tool to determine the skeleton of a polytree approximating the LDIM is the definition of a distance among the processes, as described in Section 3.2.1 and [3]. The log-coherence distance of Equation (2.3) is a function of the power spectral densities of the observed processes {y_i}_{i=1}^n; thus, under the assumption of ergodicity, it can be estimated directly from their measurements. Furthermore, in the limit of infinite data, the power spectral density estimates are guaranteed to converge to their actual values. Therefore, in the limit of infinite data, the distance of Equation (2.3) can be approximated with arbitrary precision [1].
After estimating the log-coherence distance for every pair of processes yi and y j, the skeleton of the approximating polytree is found by computing the Minimum Spanning Tree (MST) over the complete graph (see [76, 80] for definition of MST and complete graph) where the weight of each link is equal to the distance between the corresponding pair of nodes. The Skeleton Approximating Algorithm (SAA), presented in Algorithm 13, provides an algorithmic implementation of this approach for the computation of the skeleton of the approximating polytree. SAA outputs an undirected tree from the knowledge of the log-coherence distances [1].
Algorithm 13 Skeleton Approximating Algorithm
Input: a set of nodes N = {y1, ..., yn} and the distances d(yi, yj) for all yi, yj ∈ N
Output: the undirected tree graph (N, E)
1: Define the complete graph Q over the nodes in N with corresponding weights equal to the distance d(yi, yj) for all yi, yj ∈ N
2: Apply an MST algorithm to Q and obtain the set of undirected edges E
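A minimal sketch of SAA in Python, using Kruskal's algorithm with a union-find structure for the MST computation; the `saa_skeleton` function name and the dictionary-of-pairs encoding of the distances are illustrative assumptions:

```python
from itertools import combinations

def saa_skeleton(nodes, dist):
    """Sketch of SAA: Kruskal's MST over the complete graph whose edge
    weights are the pairwise distances d(yi, yj). `dist` is assumed to be
    a dict keyed by node pairs (yi, yj) with yi listed before yj."""
    parent = {v: v for v in nodes}

    def find(v):                         # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    skeleton = []
    for u, v in sorted(combinations(nodes, 2), key=lambda e: dist[e]):
        ru, rv = find(u), find(v)
        if ru != rv:                     # edge joins two components: keep it
            parent[ru] = rv
            skeleton.append((u, v))
    return skeleton
```

In the limit of infinite data the input distances converge to the true log-coherence distances, and the returned edge set is the skeleton described in Step 2 of SAA.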
Observe that the time complexity of SAA depends on the time complexity of the specific MST algorithm. Standard MST algorithms (e.g., Prim or Kruskal) have computational complexity of O(n² log n), where n is the number of nodes, but there exist other implementations which are even more efficient [1, 80]. In order to show a fundamental property of SAA, we first introduce the definitions of congruity in the skeleton and congruity in the orientations [1].
Definition 5.1 (Congruity in the skeleton and in the orientations). Consider an algorithm that maps every LDIM G into a partially directed graph G¯. We say that the algorithm is congruous in the skeleton with respect to a set of LDIMs if, for each LDIM in the set, the skeleton of G¯ matches the skeleton of the associated graph of G. We say that the algorithm is congruous in the orientations with respect to a set of LDIMs if, for each LDIM in the set, each oriented edge of G¯ is in the associated graph of G.
Now we show an interesting property of SAA in the following theorem [1].
Theorem 5.2. SAA is congruous in the skeleton with respect to the class of LDPTs.
Proof. The proof is in Appendix D.1.
5.1.2 Step B. Assign Orientations to the Edges in the Skeleton
Finding a way to assign orientations to the edges of the approximating polytree skeleton, so that the algorithm satisfies congruity in the orientations is a more challenging task. One of the main complicating factors is that when the LDIM to be approximated has a polytree structure, multiple orientations of its edges might still be compatible with the observed data as explained in the following example [1].
Example 1. Consider an LDIM with three nodes, with the dynamics and power spectral density Φu as follows

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, \qquad \Phi_u = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix}. \tag{5.1}$$

The graph associated with this LDIM is depicted in Figure 5.2 (a).
However, the very same three processes y1, y2, y3 could have been generated by the LDIM with the input signals u′ and the power spectral density Φu′ equal to

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} + \begin{bmatrix} u'_1 \\ u'_2 \\ u'_3 \end{bmatrix}, \quad u' = \begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} u, \quad \Phi_{u'} = \begin{bmatrix} \tfrac{11}{16} & \tfrac{1}{8} & 0 \\ \tfrac{1}{8} & \tfrac{3}{4} & 0 \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix} \tag{5.2}$$
with its associated graph depicted in Figure 5.2 (b). Since the input signals are not accessible, there is no way to distinguish between the dynamics of Equation (5.1) and Equation (5.2). Furthermore, the processes y1, y2, y3 could have also been generated by the LDIM with the input signals u′′ and power spectral density Φu′′ equal to

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} + \begin{bmatrix} u''_1 \\ u''_2 \\ u''_3 \end{bmatrix}, \quad u'' = \begin{bmatrix} \tfrac{3}{4} & -\tfrac{1}{2} & 0 \\ \tfrac{3}{8} & \tfrac{3}{4} & -\tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{1}{2} & 1 \end{bmatrix} u, \quad \Phi_{u''} = \begin{bmatrix} \tfrac{11}{16} & \tfrac{3}{32} & \tfrac{1}{16} \\ \tfrac{3}{32} & \tfrac{35}{64} & \tfrac{1}{32} \\ \tfrac{1}{16} & \tfrac{1}{32} & \tfrac{11}{16} \end{bmatrix} \tag{5.3}$$
with its associated graph depicted in Figure 5.2 (c).
Example 1 shows that there exist three different LDIMs with one identical skeleton, but different edge orientations, which can generate the same output processes {y_i}_{i=1}^n. Thus, by only observing the outputs, while we can have an algorithm satisfying congruity in the skeleton, in general we cannot have an algorithm capable of directing all edges which at the same time can satisfy congruity in the orientations. Therefore, as it follows from Example 1, the most we can expect from an algorithm assigning orientations to the edges which satisfies congruity in the orientations is that only some of the edges would get an orientation, while the orientation of others
Figure 5.2: Graph associated with the LDIM in Equation (5.1) (a), graph associated with the LDIM in Equation (5.2) (b), and graph associated with the LDIM in Equation (5.3) (c) [1].
would remain undecided. For this reason, an algorithm congruous in the edge orientations can only output, in the general case, a partially directed graph [1]. Another interesting observation that we can draw is that there cannot be any LDIM generating the same output processes of Example 1 with the structure of the inverted fork of Figure 5.3. Indeed, there is a special property which involves inverted fork configurations in LDPTs. As a special case, Proposition 2.34 states that, in an LDPT, we have that dL(yi, yj) = ∞ when there exists yk such that yi → yk ← yj, while, for all other possible combinations of orientations for the edges (namely, yi ← yk → yj, yi ← yk ← yj, or yi → yk → yj), we have that dL(yi, yj) ≠ ∞. This property, which helps us identify the orientation of the links, is formalized in the following corollary [1].
Corollary 5.3. Let ~P = (N, E~) be a polytree and let yi, yj, yk ∈ N. If yk is the only node on the path from yi to yj, and dL(yi, yj) = ∞, then the link orientation on this path can be fully identified as yi → yk ← yj.
Proof. The proof is a direct consequence of Proposition 2.34.
Thus, when the network to be approximated is known to be a polytree, after obtaining the skeleton, for any three-node path yi − yk − yj, in principle, we could test whether dL(yi, yj) = ∞ to determine if yk is an inverted fork, implying that the directions of the edges are necessarily yi → yk ← yj. We illustrate this idea via an example [1].
Example 2. Consider an LDIM with structure as in Figure 5.4 (a). From Proposition 2.34 it is immediate to verify that the only three-node paths yi − yk − yj such that dL(yi, yj) = ∞ are
y1 − y3 − y2, y5 − y6 − y7, y5 − y6 − y8, y7 − y6 − y8, (5.4)
while all the other three-node paths satisfy dL(yi, yj) < ∞. Given the skeleton of the network, if the distance between each pair of nodes is exactly known, we can obtain the orientation of the edges
Figure 5.3: LDIM with an inverted fork in node y2 [1].
Figure 5.4: Graph associated with an LDIM with 4 inverted forks (a), and the same graph after inferring the link orientations using exact information about the distances (b) [1].
belonging to each of the paths in Equation (5.4). This strategy provides the partially oriented
graph of Figure 5.4 (b), where only the edges y3 − y4 and y4 − y5 are left undirected.
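Assuming the exact distances are available (stored, say, in a dictionary keyed by unordered pairs of nodes, a representation chosen here only for illustration), the inverted-fork test of Corollary 5.3 used in Example 2 can be sketched as:

```python
import math
from itertools import combinations

def find_inverted_forks(edges, dist):
    """Corollary 5.3 as a procedure: for every three-node path yi - yk - yj
    of an undirected skeleton, an infinite distance d(yi, yj) reveals the
    inverted fork yi -> yk <- yj. `dist` is keyed by frozenset pairs."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    oriented = set()
    for yk, ns in nbrs.items():
        for yi, yj in combinations(sorted(ns), 2):
            if math.isinf(dist[frozenset({yi, yj})]):
                oriented |= {(yi, yk), (yj, yk)}
    return oriented
```

Applied to the skeleton of Figure 5.4 (a) with the pairs of Equation (5.4) set to infinite distance, this returns exactly the orientations of Figure 5.4 (b), leaving y3 − y4 and y4 − y5 undirected.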
Hence, Proposition 2.34 can be exploited to detect inverted forks in a polytree. However, Proposition 2.34 can be further used to orient edges that are not involved in inverted forks. Indeed, for any three-node path yi − yk − yj in the skeleton, if the direction of the edge yi − yk is known to be yi → yk and dL(yi, yj) ≠ ∞, Proposition 2.34 implies that the direction of the edge yk − yj is yk → yj. This is formalized in the following corollary [1].
Corollary 5.4. Let ~P = (N, E~) be a polytree and let yi, yj, yk ∈ N. If yk is the only node on the path between yi and yj and also (yi, yk) ∈ E~, we have that dL(yi, yj) < ∞ if and only if (yk, yj) ∈ E~.
Proof. The proof is a direct consequence of Proposition 2.34.
In the following example, we show how we can leverage these two corollaries to infer the orientation of more edges in the approximated polytree [1].
Example 3. Consider again an LDIM with structure as in Figure 5.4 (a). It was shown in
Example 2 how to obtain the partially directed graph of Figure 5.4 (b) where the edges y3 − y4 and y4 − y5 were left undirected. Since the orientation of the edge y1 → y3 is known and dL(y1, y4) < ∞, we conclude that y3 − y4 is oriented as y3 → y4. Now that the orientation of the edge y3 → y4 is known, we can use this information to infer the orientation of y4 − y5. Indeed, since dL(y3, y5) < ∞, we necessarily have y4 → y5.
Corollary 5.3 and Corollary 5.4 provide a strategy to orient edges of the skeleton of a polytree approximating an LDIM:
1. find the inverted forks using Corollary 5.3 and obtain a partially oriented polytree P¯ = (N, E, E~);
2. propagate the orientations of P¯ using Corollary 5.4.
As mentioned in Section 3.2, this approach was proposed by Rebane and Pearl in the context of graphical models in [68]. The Link Orientation Propagation Algorithm (LOPA), presented in Algorithm 14, is an implementation of the technique in [68] that takes as input a partially oriented polytree, with the direction of the v-structures known from Corollary 5.3, and then proceeds to direct the remaining unoriented edges using Corollary 5.4 [1].

Algorithm 14 Link Orientation Propagation Algorithm
Input: a partially directed polytree P¯ = (N, E, E~) and the distances d(yi, yj) for all yi, yj ∈ N
Output: the partially directed polytree P¯ = (N, E, E~)
1: while ∃ yk such that (yi, yj) ∈ E~, (yj, yk) ∉ E~, {yj, yk} ∈ E, and d(yi, yk) < ∞ do
2:   set E~ := E~ ∪ {(yj, yk)}
3: end while
4: for all {yi, yj} ∈ E such that (yi, yj) ∈ E~ or (yj, yi) ∈ E~ do
5:   set E := E \ {{yi, yj}}
6: end for
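A sketch of LOPA in Python follows; the set-based edge encoding is an assumption made here for concreteness. Starting from the v-structures of Example 2, the propagation of Corollary 5.4 orients the two remaining edges, reproducing the reasoning of Example 3:

```python
def lopa(E, D, dist):
    """Sketch of LOPA (Algorithm 14). E: set of frozenset undirected edges;
    D: set of (tail, head) oriented edges; dist: distances keyed by
    frozenset pairs. Propagates orientations via Corollary 5.4 and then
    removes the oriented edges from E."""
    changed = True
    while changed:                          # Steps 1-3
        changed = False
        for yi, yj in list(D):
            for e in list(E):
                if yj in e:
                    yk = next(iter(e - {yj}))
                    if (yk != yi and (yj, yk) not in D
                            and dist[frozenset({yi, yk})] < float("inf")):
                        D.add((yj, yk))     # Corollary 5.4: yj -> yk
                        changed = True
    E = {e for e in E if e not in {frozenset(d) for d in D}}   # Steps 4-6
    return E, D
```

With the input of Example 3 (the forks of Figure 5.4 (b) plus the unoriented edges y3 − y4 and y4 − y5, and finite distances dL(y1, y4) and dL(y3, y5)), the sketch orients y3 → y4 and then y4 → y5.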
One limitation of this approach is that, when dealing with measured data, the test dL(yi, yj) = ∞ cannot be implemented numerically. We would need to replace it with a numerical test for dL(yi, yj) ≈ ∞. This test might be realized, for example, by choosing an appropriate (i.e., sufficiently large) threshold θ and testing whether dL(yi, yj) > θ. Furthermore, irrespective of the numerical implementation of the test dL(yi, yj) ≈ ∞, its application to finite time series (also potentially affected by measurement noise) can give rise to contradictory orientations on some edges because of the presence of Type I and Type II errors (namely, false positives and false negatives). Thus, a fundamental problem with this approach is that a naive application of Corollary 5.3 might feed LOPA an input network which has contradictory orientations. The following example demonstrates this issue [1].
Example 4. Consider again an LDIM with structure as in Figure 5.4 (a). Assume that, because of numerical issues, the only three-node paths yi − yk − yj testing positive for dL(yi, yj) ≈ ∞ are
y4 − y5 − y6, y5 − y6 − y7, y5 − y6 − y8, y7 − y6 − y8,
while all the others have tested negative. The test result of the path y4 − y5 − y6 is a false positive and the test result of the path y1 − y3 − y2 is a false negative. In this scenario, the detected inverted forks create a contradictory orientation on the edge y5 − y6, as depicted in Figure 5.5.
A straightforward strategy to avoid feeding LOPA an input P¯ = (N, E, E~) containing conflicting edge orientations is to assign a significance score to the three-node paths yi − yk − yj testing positive for dL(yi, yj) ≈ ∞. For example, the significance score for the three-node path yi − yk − yj might be given by the estimated log-coherence distance dL(yi, yj), since a higher value for such an estimate might indicate a higher likelihood that actually dL(yi, yj) = ∞. Once a significance score is chosen, the orientations yi → yk ← yj can be introduced for each triplet yi − yk − yj, starting from the ones with higher significance scores, so long as they create no conflicting edge orientations, as proposed by the Initial Link Orientation Assignment Algorithm (ILOAA), presented in Algorithm 15 [1].
Algorithm 15 Initial Link Orientation Assignment Algorithm
Input: an undirected polytree P = (N, E) and the significance scores d(yi, yj) for all yi, yj ∈ N
Output: partially directed polytree P¯ = (N, E, E~)
1: A := {(yi, yk, yj) | {yi, yk}, {yj, yk} ∈ E, d(yi, yj) ≈ ∞}
2: Sort the triplets (yi, yk, yj) in A in decreasing order according to the significance score
3: while A ≠ ∅ do
4:   let a = (yi, yk, yj) be the first element of A
5:   remove a from A
6:   if (yk, yi) and (yk, yj) are not in E~ then
7:     add (yi, yk) and (yj, yk) to E~
8:     remove {yi, yk} and {yj, yk} from E
9:   end if
10: end while
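ILOAA can be sketched as follows; the threshold parameter `theta` standing in for the test d(yi, yj) ≈ ∞, the dictionary encoding of the scores, and the specific score values used below are illustrative assumptions. On a scenario like Example 4 in which the forks at y6 happen to receive the highest scores, the false-positive triplet y4 − y5 − y6 is processed last and skipped because it would conflict:

```python
from itertools import combinations

def iloaa(N, E, score, theta):
    """Sketch of ILOAA (Algorithm 15). `score` maps frozenset pairs to
    significance scores (estimated distances); a score above `theta` plays
    the role of the test d(yi, yj) ~ infinity. Triplets are processed in
    decreasing score order; a triplet is skipped if orienting it would
    conflict with previously assigned orientations."""
    nbrs = {v: set() for v in N}
    for e in E:
        a, b = sorted(e)
        nbrs[a].add(b)
        nbrs[b].add(a)
    A = [(yi, yk, yj)                        # Step 1: candidate forks
         for yk in N
         for yi, yj in combinations(sorted(nbrs[yk]), 2)
         if score.get(frozenset({yi, yj}), 0.0) > theta]
    A.sort(key=lambda t: score[frozenset({t[0], t[2]})], reverse=True)  # Step 2
    D = set()
    for yi, yk, yj in A:                     # Steps 3-10
        if (yk, yi) not in D and (yk, yj) not in D:
            D |= {(yi, yk), (yj, yk)}
    E = {e for e in E if e not in {frozenset(d) for d in D}}
    return E, D
```

Note that the output contains no pair of edges oriented both into and out of the same endpoint of an undirected edge, which is precisely the guarantee LOPA requires of its input.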
Figure 5.5: The result of inferring edge orientations of the LDIM depicted in Figure 5.4 (a) using information containing Type I and Type II errors [1].
On the other hand, guaranteeing that the input of LOPA has no conflicting orientations is not enough to guarantee that the output of LOPA will not have conflicting orientations. This is shown in the following example [1].
Example 5. Consider again an LDIM with structure as in Figure 5.4 (a). Assume that, because of numerical issues, the only three-node paths yi − yk − yj testing positive for dL(yi, yj) ≈ ∞ are
y1 − y3 − y2; y7 − y6 − y8, while all the others have tested negative. Thus, only two inverted forks are detected as shown in Figure 5.6 (a). If we apply LOPA to this scenario, some edges get oriented in both directions, as shown in Figure 5.6 (b).
As demonstrated in Example 5, when approximating an LDIM using a polytree structure, we cannot naively propagate edge orientations as in LOPA, since doing so might still result in orientation conflicts. This motivates the development of an algorithm analogous to LOPA, but capable of resolving these conflicts. Again, such conflicts can potentially be resolved using several strategies. In the context of this dissertation, we simply propose a modification of LOPA, called Conflict Resolving LOPA (CRLOPA), presented in Algorithm 16, which takes advantage of a significance score to resolve the conflicts in the orientation of the edges. Here, we propose to use the log-coherence distance of Equation (2.3) to calculate the significance score, i.e., the significance score of the triplet yi − yk − yj is equal to dL(yi, yj) [1]. Note that if the set A is empty, then the output of CRLOPA is trivially the same as the output of LOPA. The set A is empty in cases where there are no contradictions in the edge
Figure 5.6: The result of inferring edge orientations of the LDIM depicted in Figure 5.4 (a) using information containing errors (a), and the output of LOPA applied to the graph in Figure 5.6 (a), which results in contradictory link orientations (b) [1].
Algorithm 16 Conflict Resolving LOPA (CRLOPA)
Input: a partially directed polytree P¯ = (N, E, E~) and the significance scores d(yi, yj) for all yi, yj ∈ N
Output: the partially directed polytree P¯
1: Set E~perm := E~
2: Set P¯ = (N, E, E~) := LOPA(P¯, d(·, ·))
3: Define A := {(yi, yk, yj) | (yi, yk) ∈ E~ \ E~perm, (yj, yk) ∈ E~}
4: if A = ∅ then
5: output the partially directed P¯
6: end if
7: Find the triplet (yi, yk, yj) in A that has the highest significance score
8: Set E~ := E~perm ∪ {(yi, yk), (yj, yk)}
9: Go to Step 1.
orientations. This would happen, for example, when the actual LDIM has a polytree structure and the distances (i.e., the significance scores) among the nodes are exactly known. In the following proposition we show that if A is empty, then there are no orientation conflicts [1].
Proposition 5.5. Let P¯ = (N, E, E~) be the output of LOPA applied to (P¯, d(·, ·)). Define A as in Step 3 of CRLOPA. If A = ∅, then we have that ∀(yi, yj) ∈ E~: ∄(yj, yi) ∈ E~.
Proof. The proof is in Appendix D.2.
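For concreteness, the conflict set A of Step 3 can be assembled from the permanent orientations and the orientations produced by LOPA. The following is only a sketch mirroring the set-builder definition literally (the function name and the (tail, head) edge representation are ours; the trivial case yj = yi is excluded):

```python
def conflict_triplets(e_perm, e_all):
    """Set A of Step 3: triplets (yi, yk, yj) such that a newly propagated
    orientation (yi, yk) in e_all \\ e_perm meets another orientation
    (yj, yk) of e_all pointing into the same node yk.

    e_perm, e_all: sets of directed edges as (tail, head) pairs.
    """
    new = e_all - e_perm
    return {(i, k, j)
            for (i, k) in new
            for (j, kk) in e_all
            if kk == k and j != i}
```

When A is nonempty, CRLOPA selects the triplet with the highest significance score and makes its two orientations permanent before rerunning LOPA.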
5.1.3 Putting It All Together
Now that we have explained the main idea behind the process of orienting edges in the previous sections, we introduce the Polytree Approximation Algorithm (PAA), presented in Algorithm 17, which approximates an LDIM using a polytree [1].
Algorithm 17 Polytree Approximation Algorithm (PAA)
Input: a set of nodes N and the distances d(yi, yj) for yi, yj ∈ N
Output: partially directed polytree P¯ = (N, E, E~)
1: Set P = (N, E) to the output of SAA applied to the inputs (N, d(·, ·))
2: Set P¯in = (N, E, E~) to the output of ILOAA applied to (P, d(·, ·))
3: Set P¯ = (N, E, E~) to the output of CRLOPA applied to (P¯in, d(·, ·))
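For intuition, the skeleton step of PAA (SAA, in Step 1) amounts to computing a minimum spanning tree over the pairwise distance matrix; a minimal sketch using Prim's algorithm, with names of our choosing (the orientation steps ILOAA and CRLOPA are not included here):

```python
def mst_skeleton(dist):
    """Undirected MST of a dense symmetric distance matrix (Prim's algorithm).

    Returns the skeleton's edge set as sorted index pairs; this sketches
    only the spanning-tree computation underlying SAA.
    """
    n = len(dist)
    in_tree = {0}
    edges = set()
    # best[j] = (cost, i): cheapest known connection from the tree to node j
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    while len(in_tree) < n:
        j = min(best, key=lambda k: best[k][0])
        cost, i = best.pop(j)
        in_tree.add(j)
        edges.add(tuple(sorted((i, j))))
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges
```

For an exactly additive tree metric, the MST recovers the true skeleton; with estimated distances it returns the closest spanning tree under the given weights.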
In the following theorem, we show that PAA is congruous in orienting the links if the original LDIM is an LDPT and the distances are computed exactly [1].
Theorem 5.6. PAA is congruous in the orientations with respect to the class of LDPTs when the distances between the nodes are computed exactly.
Proof. The proof is in Appendix D.3.
5.2 Real Data Application: Stock Market Analysis
In this section, as a benchmark application, we apply our approximation technique to the analysis of a stock portfolio. The dynamics among different stock prices are arguably non-stationary and involve coupling relations with a topology that is most likely more complex than a tree. Precisely for these reasons, this application scenario is a challenging benchmark for testing whether tree structures can serve as adequate approximators of complex networks [1, 23, 40]. We considered the stock prices of the companies listed in the Standard & Poor's 100 (S&P 100) index during normal trading hours in the New York Stock Exchange (NYSE), which occur Monday through Friday, 9:30 a.m. to 4:00 p.m. Data have been sampled every 60 seconds, and for each day we have obtained the time series of the associated logarithmic returns. For each day, we have also computed the log-coherence distance as in Equation (2.3). Following [40], we have averaged these distances over a period of two weeks, 2019/02/25 − 2019/03/08 (Period 1), and determined the skeleton of the tree structure using an MST approach (as explained in SAA). In [40], the average was computed over four weeks, but data were sampled every 120 seconds, so the skeleton approximation method used here relies on the same quantity of data as in [40] with a higher sampling rate [1]. As a fundamental addition to [40], the results presented in this chapter allow one to determine the orientation of the links using ILOAA and CRLOPA. Since we are dealing with finite time series, in order to use these algorithms, we have defined a threshold to determine whether the distance between two nodes is close to infinity. More specifically, for a three-node path yi − yj − yk, we have considered the distance dL(yi, yk) to be infinite when the following inequality is satisfied
dL(yi, yk) > max{ dL(yi, yj), dL(yj, yk) } (1 + α).    (5.5)
Here, we have arbitrarily chosen α = 5%. The results are shown in Figure 5.7 (a), where the colors of the nodes represent the business sectors of the companies as given by the Global Industry Classification Standard (GICS). Observe that a tree structure provides a good clustering of the portfolio according to the different business sectors, reproducing the results of [40]. However, we want to go beyond the analysis of [40] and investigate whether a tree structure is indeed a good approximator for the unknown network of the portfolio [1]. Since the underlying network is unknown, such a claim is challenging to validate because no ground truth is available for comparison. However, if we assume the existence of a network underlying the portfolio, and if such a network is quasi-stationary or at least slowly varying, we should observe similar patterns/features in the approximating trees when we repeat the same analysis over different time periods. For this reason, we have repeated the same analysis for the additional two-week periods of 2019/03/11 − 2019/03/22 (Period 2), 2019/03/25 − 2019/04/05 (Period 3), and 2019/04/08 − 2019/04/18 (Period 4); the last period is actually missing one day because NYSE was closed on 2019/04/19, which was Good Friday. The results are shown in Figures 5.7 (b) and 5.8 (a)-(b), respectively [1]. Again, in all of these cases, the identified tree structure provides a good clustering of the different business sectors. To quantify the performance of this technique, we have computed the percentage of inter-cluster links (edges connecting two nodes belonging to the same sector according to GICS) and reported them in Table 5.1 [1]. Observe that such a percentage is essentially constant across all four periods. A more interesting feature is the percentage of edges in common between two consecutive periods and the percentage of edges in common among all the periods, as reported in Table 5.2 [1].
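The finite-data infinity test of Equation (5.5) reduces to a one-line comparison; a minimal sketch (the function name is ours):

```python
def looks_infinite(d_ik, d_ij, d_jk, alpha=0.05):
    """Equation (5.5): declare dL(yi, yk) 'infinite' on the path
    yi - yj - yk when it exceeds the larger of the two single-hop
    distances by more than a factor (1 + alpha)."""
    return d_ik > max(d_ij, d_jk) * (1 + alpha)
```

With alpha = 0.05, a triplet whose end-to-end distance barely exceeds the larger hop is treated as finite, while a much larger end-to-end distance flags an inverted fork.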
Observe that the two trees associated with two consecutive periods share on average 67% of the edges, and, remarkably, the four trees have 50% of the edges in common. In other words, most of the edges in common between two consecutive periods tend to be shared by all the trees, showing a very solid form of consistency in the recovered structure [1]. It is noteworthy that the edge orientations obtained by applying PAA over the four time periods also share some features, especially in terms of nodes with high indegree. By inspecting the orientations obtained after applying this approximation algorithm, we notice that a high indegree indicates a node that is highly correlated with its parents while the parents
Figure 5.7: Network structure for Period 1 (a), and network structure for Period 2 (b) [1].
Figure 5.8: Network structure for Period 3 (a), and network structure for Period 4 (b) [1].
Table 5.1: Percentage of inter-cluster links in the approximated skeleton [1]

Period 1   Period 2   Period 3   Period 4
  78%        83%        78%        80%
Table 5.2: Percentage of common edges in the approximated skeleton [1]

Period 1 & 2   Period 2 & 3   Period 3 & 4   All 4 Periods
    73%            63%            62%            50%
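The overlap statistics of Table 5.2 reduce to an intersection of undirected edge sets; a minimal sketch (the function name is ours; since both trees over n nodes have n − 1 edges, either set's size can serve as the base):

```python
def common_edge_pct(edges_a, edges_b):
    """Percentage of shared skeleton edges between two trees on the same
    node set, treating edges as unordered pairs."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return 100.0 * len(a & b) / len(a)
```

Using frozensets makes the comparison insensitive to the (tail, head) order in which skeleton edges happen to be stored.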
are not as correlated with each other. Such a node would then correspond to a company contributing to its business area in a variety of ways, so that its stock price behavior tends to correlate with a large number of competitors, each of which focuses its activities on a more specific market. While this feature is not perfectly replicated in every period, observe that the nodes MSFT (Microsoft), AMZN (Amazon), IBM (IBM) and JPM (JPMorgan Chase) tend to have consistently high indegrees. Thus, not only is the skeleton capable of displaying meaningful information from the price data in terms of correlated activities between contiguous stocks, but the edge orientation algorithm can also help identify central businesses in the portfolio network by looking for nodes with high degree and, in particular, high indegree [1].
Chapter 6
Conclusions
In this dissertation, we proposed two novel algorithms to learn the structure of a network with a polyforest topology and to infer partial information about the link orientations in the network. These methods have been developed for the case where some of the nodes might not be observable. It has been shown that if the hidden nodes satisfy some specific degree conditions, then the correct structure, including the location and the number of hidden nodes, is learned. We have also proven that such degree conditions are necessary for the learning process, thus establishing the fundamental limitations in learning polyforest networks. Moreover, the algorithms developed in this dissertation have polynomial time complexity, similar to the methods developed for learning rooted tree structures in the literature. However, the methods proposed here deal with a larger class of networks, namely polyforests, which can model potential fusion of sources of information in a network. One of the objectives of this dissertation was to develop algorithms applicable to both graphical models of random variables and dynamic networks of stochastic processes, and these two algorithms have been shown to apply to both domains. We also proposed a novel algorithm to approximate the interconnections of generic networks with simpler polytree networks when the assumption of linear dynamics is exploited. The ultimate goal of this approximation technique is to capture the strongest connections in the network using an algorithm that runs in polynomial time. This scheme is shown to be congruous in the skeleton and in the orientation of the edges when the network to be approximated also has a polytree structure. Furthermore, we have shown applications of this approximation method to financial market data. The results confirmed that this approach clusters the financial organizations according
to their business sectors, showing good agreement with the GICS (Global Industry Classification Standard) classification, and also confirmed that the inferred orientations represent a sensible interpretation of cause-and-effect relationships between these organizations. Moreover, the results of this work could be applied to studies in the social sciences. For example, we could model the questions in a survey as a network and then leverage the approaches developed in this dissertation to find the hidden nodes in this network. This would yield an analysis of the effectiveness of each question, which could then be used to design more compelling questions for future studies. Another future step would be extending the results developed for tree-structured networks to cyclic networks. One viable approach would be leveraging junction trees to develop extensions of these learning algorithms to networks containing loops.
Bibliography
[1] F. Sepehr and D. Materassi, “Non-invasive approximation of linear dynamic systems networks using polytrees,” Submitted for review at IEEE Transactions on Control of Network Systems, 2019. viii, xi, xii, 2, 8, 9, 11, 12, 13, 15, 16, 22, 26, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80
[2] F. Sepehr and D. Materassi, “Inferring the structure of polytree networks of dynamic systems with hidden nodes,” in 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4618–4623, Dec 2016. ix,2,7,8, 10, 13, 14, 15, 16, 17, 22, 23, 27, 28, 29
[3] F. Sepehr and D. Materassi, “Learning networks of linear dynamic systems with tree topologies,” IEEE Transactions on Automatic Control, 2019. doi: 10.1109/TAC.2019.2915153. ix, x, xii, 2, 6, 7, 8, 10, 12, 13, 15, 16, 17, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 39, 40, 41, 42, 45, 47, 67, 94, 99, 110
[4] F. Sepehr and D. Materassi, “An algorithm to learn polytree networks with hidden nodes,” Submitted for review at Neural Information Processing Systems (NIPS), 2019. ix, x, xi, xii, 2, 8, 11, 15, 17, 18, 19, 20, 21, 26, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 63, 64, 65, 112, 121, 122
[5] K.-P. Kim, D. Chang, S. K. Lim, S.-K. Lee, H.-K. Lyu, and D.-K. Hwang, “Thermal annealing effects on the dynamic photoresponse properties of Al-doped ZnO nanowires network,” Current Applied Physics, vol. 11, no. 6, pp. 1311–1314, 2011. 1
[6] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. U. Hwang, “Complex networks: Structure and dynamics,” Physics Reports, vol. 424, pp. 175–308, February 2006.1
[7] N. M. Luscombe, M. Madan Babu, H. Yu, M. Snyder, S. A. Teichmann, and M. Gerstein, “Genomic analysis of regulatory network dynamics reveals large topological changes,” Nature, vol. 431, pp. 308–312, Sep 2004.1
[8] J. Luo and L. Kuang, “A new method for predicting essential proteins based on dynamic network topology and complex information,” Computational Biology and Chemistry, vol. 52, pp. 34 – 42, 2014.1
[9] P. Petousis, S. X. Han, D. Aberle, and A. A. Bui, “Prediction of lung cancer incidence on the low-dose computed tomography arm of the national lung screening trial: A dynamic bayesian network,” Artificial Intelligence in Medicine, vol. 72, pp. 42 – 55, 2016.1
[10] S. Anzellotti, D. Kliemann, N. Jacoby, and R. Saxe, “Directed network discovery with dynamic network modelling,” Neuropsychologia, vol. 99, pp. 1 – 11, 2017.1
[11] C. N. Kaiser-Bunbury, J. Mougal, A. E. Whittington, T. Valentin, R. Gabriel, J. M. Olesen, and N. Blüthgen, “Ecosystem restoration strengthens pollination network resilience and function,” Nature, vol. 542, pp. 223–227, Feb 2017. Letter. 1
[12] P. Monestiez, J.-S. Bailly, P. Lagacheriec, and M. Voltz, “Geostatistical modelling of spatial processes on directed trees: Application to fluvisol extent,” Geoderma, vol. 128, pp. 179–191, 2005.1
[13] V. M. Carvalho, “From micro to macro via production networks,” The Journal of Economic Perspectives, vol. 28, no. 4, pp. 23–47, 2014.1
[14] R. Mantegna and H. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge UK: Cambridge University Press, 2000.1,3,8,9
[15] M. Nogal, A. O’Connor, B. Caulfield, and B. Martinez-Pastor, “Resilience of traffic networks: From perturbation to recovery via a dynamic restricted equilibrium model,” Reliability Engineering & System Safety, vol. 156, pp. 84 – 96, 2016.1
[16] V. Chetty, N. Woodbury, J. Brewer, K. Lee, and S. Warnick, “Applying a passive network reconstruction technique to twitter data in order to identify trend setters,” in 2017 IEEE Conference on Control Technology and Applications (CCTA), pp. 1344–1349, Aug 2017. 1
[17] A. Dankers, P. M. Van den Hof, X. Bombois, and P. S. Heuberger, “Errors-in-variables identification in dynamic networks - consistency results for an instrumental variable approach,” Automatica, vol. 62, pp. 39 – 50, 2015.1,2
[18] V. Chetty, J. Eliason, and S. Warnick, “Passive reconstruction of non-target-specific discrete-time lti systems,” in 2016 American Control Conference (ACC), pp. 66–71, July 2016. 1
[19] D. Materassi and M. V. Salapaka, “On the problem of reconstructing an unknown topology via locality properties of the wiener filter,” IEEE Transactions on Automatic Control, vol. 57, pp. 1765–1777, July 2012.1,2,4, 22, 95
[20] D. Hayden, Y. Yuan, and J. Gonçalves, “Network identifiability from intrinsic noise,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3717–3728, 2017. 1, 2, 3
[21] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, and L. Schenato, “Identification of power distribution network topology via voltage correlation analysis,” in 52nd IEEE Conference on Decision and Control, pp. 1659–1664, Dec 2013.2,8
[22] L. Geng and R. Xiao, “Outer synchronization and parameter identification approach to the resilient recovery of supply network with uncertainty,” Physica A: Statistical Mechanics and its Applications, vol. 482, pp. 407 – 421, 2017.2
[23] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, “Cluster analysis and display of genome-wide expression patterns,” Proceedings of the National Academy of Sciences, vol. 95, no. 25, pp. 14863–14868, 1998.2,3,8,9, 76
[24] C. Alexander, Market models: A guide to financial data analysis. John Wiley & Sons, 2001. 2
[25] W. R. Shadish, T. D. Cook, D. T. Campbell, et al., Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning, 2002.2
[26] K. Soltesz, K. van Heusden, G. A. Dumont, T. Hägglund, C. L. Petersen, N. West, and J. M. Ansermino, “Closed-loop anesthesia in children using a pid controller: A pilot study,” IFAC Proceedings Volumes, vol. 45, no. 3, pp. 317–322, 2012. 2nd IFAC Conference on Advances in PID Control. 2
[27] G. Deuschl, C. Schade-Brittinger, P. Krack, J. Volkmann, H. Schäfer, K. Bötzel, C. Daniels, A. Deutschländer, U. Dillmann, W. Eisner, D. Gruber, W. Hamel, J. Herzog, R. Hilker, S. Klebe, M. Kloß, M. Koy, J. Krause, A. Kupsch, D. Lorenz, S. Lorenzl, H. M. Mehdorn, J. R. Moringlane, W. Oertel, M. O. Pinsker, H. Reichmann, A. Reuß, G.-H. Schneider, A. Schnitzler, U. Steude, V. Sturm, L. Timmermann, V. Tronnier, T. Trottenberg, L. Wojtecki, E. Wolf, W. Poewe, and J. Voges, “A randomized trial of deep-brain stimulation for parkinson’s disease,” New England Journal of Medicine, vol. 355, no. 9, pp. 896–908, 2006. 2
[28] F. Sepehr and D. Materassi, “On the identification of polytrees of linear dynamic systems with hidden nodes,” in 22nd International Symposium on Mathematical Theory of Networks and Systems, pp. 414–415, July 2016.2
[29] M. Schmidt, A. Niculescu-Mizil, and K. Murphy, “Learning graphical model structure using l1-regularization paths,” in AAAI, pp. 1278–1283, AAAI Press, 2007.2
[30] D. Heckerman, D. Geiger, and D. M. Chickering, “Learning bayesian networks: The combination of knowledge and statistical data,” Machine Learning, vol. 20, pp. 197–243, Sep 1995.2
[31] D. Koller and R. Fratkina, “Using learning for approximation in stochastic processes,” in In Proceedings of the International Conference on Machine Learning (ICML), pp. 287–295, 1998.2
[32] M. Sznaier, “Compressive information extraction: A dynamical systems approach,” IFAC Proceedings Volumes, vol. 45, no. 16, pp. 1559 – 1568, 2012. 16th IFAC Symposium on System Identification.2
[33] J. Gonçalves and S. Warnick, “Necessary and sufficient conditions for dynamical structure reconstruction of lti networks,” IEEE Transactions on Automatic Control, vol. 53, no. 7, pp. 1670–1674, 2008. 2, 3, 5
[34] D. Materassi and M. V. Salapaka, “Reconstruction of directed acyclic networks of dynamical systems,” in 2013 American Control Conference, pp. 4687–4692, June 2013.2,3,4
[35] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.2,6,7, 24
[36] D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques. The MIT Press, 2009. 2, 12, 20
[37] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search. A Bradford Book, 2001.2,3, 21
[38] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees,” IEEE Transactions on Information Theory, vol. 14, pp. 462–467, May 1968.3,4, 8
[39] E. Mossel, “Distorted metrics on trees and phylogenetic forests,” IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 4, pp. 108–116, Jan. 2007.3
[40] D. Materassi and G. Innocenti, “Topological identification in networks of dynamical systems,” IEEE Transactions on Automatic Control, vol. 55, pp. 1860–1871, August 2010.3, 4,8, 11, 76, 77, 99
[41] A. Dankers, P. M. J. Van den Hof, X. Bombois, and P. S. C. Heuberger, “Identification of dynamic models in complex networks with prediction error methods: Predictor input selection,” IEEE Transactions on Automatic Control, vol. 61, pp. 937–952, April 2016.3
[42] Z. Shen, W.-X. Wang, Y. Fan, Z. Di, and Y.-C. Lai, “Reconstructing propagation networks with natural diversity and identifying hidden sources,” Nature communications, vol. 5, p. 4323, 2014.3,4
[43] S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.3,4
[44] M. Nitzan, J. Casadiego, and M. Timme, “Revealing physical interaction networks from statistics of collective dynamics,” Science Advances, vol. 3, no. 2, 2017.3,4
[45] M. Dimovska and D. Materassi, “Granger-causality meets causal inference in graphical models: Learning networks via non-invasive observations,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 5268–5273, Dec 2017.3
[46] M. Eichler, “Granger causality and path diagrams for multivariate time series,” Journal of Econometrics, vol. 137, no. 2, pp. 334–353, 2007. 4
[47] R.-Q. Su, W.-X. Wang, and Y.-C. Lai, “Detecting hidden nodes in complex networks from time series,” Physical review E, vol. 85, no. 6, p. 065201, 2012.5
[48] D. Deka, R. Baldick, and S. Vishwanath, “One breaker is enough: hidden topology attacks on power grids,” in Power & Energy Society General Meeting, 2015 IEEE, pp. 1–5, IEEE, 2015.5
[49] E. Nozari, Y. Zhao, and J. Cortés, “Network identification with latent nodes via autoregressive models,” IEEE Transactions on Control of Network Systems, vol. 5, no. 2, pp. 722–736, 2018. 5
[50] E. Kivits and P. M. Van den Hof, “On representations of linear dynamic networks,” IFAC- PapersOnLine, vol. 51, pp. 838 – 843, 6 2018.5
[51] Y. Yuan, G.-B. Stan, S. Warnick, and J. Gonçalves, “Robust dynamical network structure reconstruction,” Automatica, vol. 47, no. 6, pp. 1230–1235, 2011. Special Issue on Systems Biology. 5
[52] Z. Yue, J. Thunberg, W. Pan, L. Ljung, and J. Gonçalves, “Linear dynamic network reconstruction from heterogeneous datasets,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 10586–10591, 2017. 5
[53] M. J. Choi, V. Y. Tan, A. Anandkumar, and A. S. Willsky, “Learning latent tree graphical models,” Journal of Machine Learning Research, vol. 12, pp. 1771–1812, 2011.6, 17, 27, 38, 39, 99
[54] M. Zorzi and R. Sepulchre, “Ar identification of latent-variable graphical models,” IEEE Transactions on Automatic Control, vol. 61, pp. 2327–2340, Sept 2016.6
[55] A. Anandkumar, K. Chaudhuri, D. J. Hsu, S. M. Kakade, L. Song, and T. Zhang, “Spectral methods for learning multivariate latent tree structure,” in Advances in Neural Information Processing Systems, pp. 2025–2033, 2011.6
[56] R. Mourad, C. Sinoquet, N. L. Zhang, T. Liu, and P. Leray, “A survey on latent tree models and applications,” Journal of Artificial Intelligence Research, vol. 47, pp. 157–203, 2013. 6, 8
[57] Y. Jernite, Y. Halpern, and D. Sontag, “Discovering hidden variables in noisy-or networks using quartet tests,” in Advances in Neural Information Processing Systems 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, eds.), pp. 2355–2363, Curran Associates, Inc., 2013.6
[58] P. L. Erdős, M. A. Steel, L. Székely, and T. J. Warnow, “A few logs suffice to build (almost) all trees: Part ii,” Theoretical Computer Science, vol. 221, no. 1-2, pp. 77–118, 1999. 6
[59] A. Anandkumar and R. Valluvan, “Learning loopy graphical models with latent variables: Efficient methods and guarantees,” The Annals of Statistics, vol. 41, no. 2, pp. 401–435, 2013.6
[60] N. Asbeh and B. Lerner, “Learning latent variable models by pairwise cluster comparison: Part i - theory and overview,” Journal of Machine Learning Research, vol. 17, no. 223, pp. 1– 52, 2016.6
[61] N. Asbeh and B. Lerner, “Learning latent variable models by pairwise cluster comparison: Part ii - algorithm and evaluation,” Journal of Machine Learning Research, vol. 17, no. 230, pp. 1–45, 2016.6
[62] J. Pearl, Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd ed., 2009.6, 21
[63] J. Zhang, “On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias,” Artificial Intelligence, vol. 172, no. 16, pp. 1873 – 1896, 2008.6,7
[64] T. Richardson and P. Spirtes, “Ancestral graph markov models,” The Annals of Statistics, vol. 30, no. 4, pp. 962–1030, 2002.7
[65] G. Li, A.-M. Vandamme, and J. Ramon, “Learning ancestral polytrees,” Learning Tractable Probabilistic Models, 2014. The Workshop of Learning Tractable Probabilistic Models at The 31st International Conference on Machine Learning (ICML 2014), Beijing, China. 7
[66] J. Etesami, N. Kiyavash, and T. Coleman, “Learning minimal latent directed information polytrees,” Neural Computation, vol. 28, no. 9, pp. 1723–1768, 2016.7,8
[67] J. Pearl, “Fusion, propagation, and structuring in belief networks,” Artificial Intelligence, vol. 29, no. 3, pp. 241 – 288, 1986.8
[68] G. Rebane and J. Pearl, “The recovery of causal polytrees from statistical data,” in Proceedings of the Third Conference on Uncertainty Artificial Intelligence, pp. 222–228, 1987.8, 10, 18, 29, 38, 56, 72, 124
[69] M. Ouerd, B. Oommen, and S. Matwin, “A formal approach to using data distributions for building causal polytree structures,” Information Sciences, vol. 168, no. 1, pp. 111–132, 2004. 8
[70] U. Kjærulff, “Hugs: Combining exact inference and gibbs sampling in junction trees,” in Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pp. 368– 375, 1995.8
[71] F. R. Bach and M. I. Jordan, “Thin junction trees,” in Advances in Neural Information Processing Systems, pp. 569–576, 2002.8
[72] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, UAI’99, (San Francisco, CA, USA), pp. 467–475, Morgan Kaufmann Publishers Inc., 1999.8
[73] A. Ujile and Z. Ding, “A dynamic approach to identification of multiple harmonic sources in power distribution systems,” International Journal of Electrical Power & Energy Systems, vol. 81, pp. 175–183, 2016.8
[74] D. T. Nguyen, N. P. Nguyen, and M. T. Thai, “Sources of misinformation in online social networks: Who to suspect?,” in MILCOM 2012 - 2012 IEEE Military Communications Conference, pp. 1–6, Oct 2012. 8
[75] W. Zang, P. Zhang, C. Zhou, and L. Guo, “Discovering multiple diffusion source nodes in social networks,” Procedia Computer Science, vol. 29, pp. 443–452, 2014.8
[76] R. Diestel, Graph Theory. Springer, 5th ed., 2010. 12, 67, 99, 102, 107, 123
[77] T. Verma and J. Pearl, “Causal Networks: Semantics and Expressiveness,” in Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence, pp. 352–359, 1988. 20
[78] E. B. Wilson, “Probable inference, the law of succession, and statistical inference,” Journal of the American Statistical Association, vol. 22, no. 158, pp. 209–212, 1927. 40
[79] D. Materassi, “Reconstructing tree structures of dynamic systems with hidden nodes under nonlinear dynamics,” in 2016 24th Mediterranean Conference on Control and Automation (MED), pp. 1331–1336, June 2016. 47, 50, 110, 111, 114
[80] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 2nd ed., 2009. 67, 68
[81] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Prentice Hall, 2000. 94
[82] D. Materassi and M. V. Salapaka, “Notions of separation in graphs of dynamical systems,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 2341–2346, 2014. 95
[83] M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin, “Lowest common ancestors in trees and directed acyclic graphs,” Journal of Algorithms, vol. 57, no. 2, pp. 75–94, 2005. 100
Appendix
A Proofs Related to Chapter 2
A.1 Proof of Lemma 2.27
Proof. If Hji(z) ≠ 0, then there is a chain of length 1, namely the edge (yi, yj) ∈ E~. The entry (j, i) of H^2(z) is given by
(H^2(z))ji = Σ_{k=1}^{n} Hjk(z)Hki(z)    (1)
and is trivially equal to zero if there is no chain of length 2 from yi to yj. Iterating this argument, we find that, if there is no chain of length q from yi to yj, then (H^q(z))ji = 0. Thus, we have (H^q(z))ji = 0 for all q > ℓ and all yi, yj. Now consider the relation (I − H(z)) y = u from Equation (2.1). Since H(z) is nilpotent of order ℓ + 1, the matrix (I − H(z)) is invertible and (I − H(z))^{-1} = I + Σ_{k=1}^{ℓ} H^k(z) = T(z), which implies that y = T(z) u. In addition, since there
are no directed cycles, it implies that Tii(z) = 1. Also, if there is no chain from yi to yj, where yj ≠ yi, we immediately have Tji(z) = 0. Also, from the Wiener–Khinchin theorem (see [81]), we have Φyu(z) = T(z)Φu(z), which implies Φyjui(z) = Tji(z)Φui(z) = 0.
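The nilpotency argument above can be checked numerically on a toy example: for a DAG, only the sparsity pattern of H matters, so constant placeholder weights can stand in for the transfer functions Hji(z). A minimal sketch for the chain y1 → y2 → y3 (the helper and the numeric weights are ours):

```python
def matmul(A, B):
    """Plain dense matrix product for small integer matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# H[j][i] is nonzero iff the edge (y_{i+1}, y_{j+1}) exists; here y1 -> y2 -> y3,
# so the longest chain has length 2 and H is nilpotent of order 3.
H = [[0, 0, 0],
     [2, 0, 0],
     [0, 3, 0]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

H2 = matmul(H, H)   # only the (3, 1) entry survives: a single chain of length 2
H3 = matmul(H2, H)  # no chains of length 3, so H^3 = 0
# T = I + H + H^2 should then equal (I - H)^{-1}
T = [[I[i][j] + H[i][j] + H2[i][j] for j in range(3)] for i in range(3)]
```

Multiplying (I − H) by T reproduces the identity, mirroring the relation (I − H(z))^{-1} = I + Σ H^k(z) used in the proof.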
A.2 Proof of Lemma 2.30
Proof. Consider two directly connected nodes yi and yj and their dynamic relation in a block diagram as in Figure A.1. From the Wiener–Khinchin theorem [81], the power spectral and cross-
Figure A.1: Nodes yi and yj in an LDRT (a), and the block diagram of the dynamic relation between the nodes yi and yj (b) [3].
spectral densities are
Φyj(z) = Φyi(z)|Hji(z)|^2 + Φuj(z) + 2 Re(Φyiuj(z)Hji(z)),    (2)
Φyiyj(z) = Φyi(z)H*ji(z) + Φyiuj(z).
From Lemma 2.27 we have that Φujyi(z) = 0, which leads to
Cij(z) = |Φyi(z)|^2 |Hji(z)|^2 / [ Φyi(z) ( Φyi(z)|Hji(z)|^2 + Φuj(z) ) ] = 1 / [ 1 + Φuj(z)/(|Hji(z)|^2 Φyi(z)) ].    (3)
Again, from Lemma 2.27 and for |z| = 1, we have that
Φyi(z) = Φui(z) + Σ_{k≠i} |Tik(z)|^2 Φuk(z) ≥ Φui(z) > η.    (4)
Consequently, from the monotonicity of the logarithm, we can bound the logarithm of the coherence, for |z| = 1, as follows
log[ 1 / (1 + Φuj(z)/(|Hji(z)|^2 η)) ] < log Cij(z) < log[ 1 / (1 + η/(|Hji(z)|^2 Φyi(z))) ].    (5)
Since Φuj(z), Φyi(z) and Hji(z) are rational functions of z with no zeros on the unit circle, the integrals of both the lower and the upper bound are finite and different from zero. Therefore, we have that 0 < dL(yi, yj) < ∞.
A.3 Proof of Proposition 2.33
Proof. Since T~ is a rooted tree, two nodes yi and yj are d-separated by yk (i.e., dsep≺yi, {yk}, yj≻T~) if and only if yk is on the unique path connecting yi and yj. According to Theorem 8 in [82], dsep≺yi, {yk}, yj≻T~ implies that the nodes yi and yj are Wiener separated by the node yk. Wiener
separation is defined in Definition 21 of [82], where it is stated that two processes yi and yj are Wiener separated given yk if the Wiener filter (Wji(z), Wjk(z)) used to estimate yj from yi and yk is such that Wji(z) = 0. Lemma 26 in [19] states that yi and yj are Wiener separated given yk if and only if the entry (i, j) of the matrix Φ^{-1}(z) is equal to zero, where Φ(z) is the joint power spectral density matrix of yi, yj and yk. By inspection, the entry (i, j) of Φ^{-1}(z) is
(Φ^{-1}(z))ij = ( Φik(z)Φkj(z) − Φij(z)Φkk(z) ) / A(z),    (6)
where A(z) = Φii(z)Φjj(z)Φkk(z) + Φij(z)Φki(z)Φjk(z) + Φji(z)Φik(z)Φkj(z) − Φii(z)Φjk(z)Φkj(z) − Φik(z)Φjj(z)Φki(z) − Φij(z)Φji(z)Φkk(z). Therefore, setting the numerator of Equation (6) to zero, we get, for all ω ∈ [−π, π], Cij(e^{iω}) = Cik(e^{iω})Ckj(e^{iω}), and consequently log |Cij(e^{iω})| = log |Cik(e^{iω})| + log |Ckj(e^{iω})|, which implies dL(yi, yj) = dL(yi, yk) + dL(yk, yj) (see Equation (2.3)).
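The additivity established by this proposition has a familiar static analogue: for jointly Gaussian variables, coherence reduces to correlation, and along a Markov chain yi → yk → yj the correlations factor, so their negative logarithms add. A minimal sketch (the distance here is only a static stand-in for the log-coherence distance of Equation (2.3), and the numeric correlations are arbitrary placeholders):

```python
import math

r_ik, r_kj = 0.8, 0.5  # correlations along the chain yi -> yk -> yj
r_ij = r_ik * r_kj     # Markov-chain factorization of the end-to-end correlation

def d(r):
    """Static analogue of the log-coherence distance: -log |r|."""
    return -math.log(abs(r))

# additivity along the path through yk, mirroring
# dL(yi, yj) = dL(yi, yk) + dL(yk, yj)
additive = math.isclose(d(r_ij), d(r_ik) + d(r_kj))
```

This is exactly the property that makes the distance usable as an additive tree metric for the MST-based skeleton reconstruction.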
A.4 Proof of Proposition 2.34
We first introduce a lemma and a proposition. The lemma states that the cross-spectral density of nodes that are not related in the associated graph of a topologically identifiable LDPF is zero.
Lemma .1. Let F~ = (N, E~) be the associated graph of a topologically identifiable LDPF. If yi, yj ∈ N are not related, then Φyiyj(z) = 0.
Proof. Let T(z) be as defined in Lemma 2.27 and let yk ∈ N. First assume that yk ≠ yi, yj. Since yi and yj are not related, it is not possible that there is at the same time a chain from yk to yi and a chain from yk to yj. Then, we have that Tik(z)T*jk(z) = 0. If k = i (or k = j), then we have Tii(z)T*ji(z) = 0 (or Tij(z)T*jj(z) = 0), since one is not a descendant of the other. This implies that
    Φ_{y_i y_j}(z) = Σ_{p=1}^{n} Σ_{q=1}^{n} T_{ip}(z) (Φ_u(z))_{pq} T*_{jq}(z) = 0,    (7)

where the last equality follows from the fact that Φ_u(z) is diagonal, so that only the terms with p = q survive and each of them contains a product T_{ik}(z) T*_{jk}(z) = 0.
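A minimal static sketch of Lemma .1: in a two-root polyforest, the cross-density entry between nodes in different components vanishes once Φ_u is diagonal. The specific graph (edges y1 → y2 and y3 → y4) and the gains below are invented for illustration only.

```python
import numpy as np

# Static two-root polyforest: y2 = 0.8*y1 + u2 and y4 = -0.5*y3 + u4,
# so y2 and y4 are not related.
n = 4
H = np.zeros((n, n))
H[1, 0] = 0.8
H[3, 2] = -0.5

T = np.linalg.inv(np.eye(n) - H)          # y = T u
Sigma_u = np.diag([1.0, 2.0, 0.5, 1.5])   # diagonal: independent noises
Phi_y = T @ Sigma_u @ T.T                 # static analogue of T(z) Phi_u(z) T*(z)

assert np.isclose(Phi_y[1, 3], 0.0)       # unrelated nodes: zero cross-density
assert not np.isclose(Phi_y[0, 1], 0.0)   # related nodes: nonzero in general
```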
The following proposition shows how to obtain an LDRT from an LDPF.
Proposition .2. Let F = (H(z), u) be an LDPF with the associated graph F~ = (N, E~) and let y_r ∈ N be one of its roots. The LDIM T = (H^{(T)}(z), u^{(T)}), which has de_{F~}(y_r) as its output processes and the restriction of F~ to de_{F~}(y_r) as its associated graph, is in fact an LDRT.
Proof. Let N = {y_1, y_2, ..., y_n} and, with no loss of generality, let y_1 = y_r and de_{F~}(y_r) = {y_1, y_2, ..., y_m}. We want to show that T = (H^{(T)}(z), u^{(T)}) is an LDRT with nodes {y_1, y_2, ..., y_m}. For any i, j = 1, ..., m define u_i^{(T)} := u_i + Σ_{k=m+1}^{n} H_{ik}(z) y_k and H_{ji}^{(T)}(z) := H_{ji}(z). We first prove that Φ_{u_i^{(T)} u_j^{(T)}} = 0 for distinct i, j ≤ m, which implies that T = (H^{(T)}(z), u^{(T)}) is an LDIM.
Consider p, q > m such that H_{ip}(z) ≠ 0 and H_{jq}(z) ≠ 0, for distinct i, j ≤ m. First, we show that y_p and y_q are not related. By contradiction, assume they are related. Let y_{π_0}, ..., y_{π_ℓ} be the unique path from y_i to y_j. Observe that there is a chain from y_1 to y_i with all nodes that are descendants of y_1. Analogously, there is a chain from y_1 to y_j with all nodes that are descendants of y_1. Consequently, all nodes in the unique path from y_i to y_j are descendants of y_1. Since H_{ip}(z) ≠ 0 and H_{jq}(z) ≠ 0, we have that y_p ≠ y_q, otherwise we would have the path y_i ← y_p = y_q → y_j, which is a contradiction since y_p is not a descendant of y_1 (as p > m). Also, since H_{ip}(z) ≠ 0 and H_{jq}(z) ≠ 0, the unique path connecting y_p to y_q has the form y_p → y_{π_0} ... y_{π_ℓ} ← y_q, which implies that the two nodes are not related, which is a contradiction. Therefore, according to Lemma .1, we have that Φ_{y_p y_q}(z) = 0. Also, there is no chain from y_j to y_p and no chain from y_i to y_q. Thus, from Lemma 2.27, we have Φ_{y_p u_j}(z) = Φ_{y_q u_i}(z) = 0. Therefore, for i ≠ j, we have that Φ_{u_i^{(T)} u_j^{(T)}}(z) = 0, proving that T = (H^{(T)}(z), u^{(T)}) is an LDIM. Define y^{(T)} = u^{(T)} + H^{(T)}(z) y^{(T)} and observe by inspection that y_i^{(T)} = y_i for i ≤ m. Also, observe that the way the nodes have been chosen proves that the associated graph of T is a rooted tree.
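The graph-side content of Proposition .2 can be sketched with a toy parent map: restricting a polyforest to the descendants of one root leaves a structure in which every non-root node keeps exactly one parent inside the restriction, i.e., a rooted tree. The two-root parent map below is hypothetical and used only for illustration.

```python
# Hypothetical two-root polyforest encoded as a parent map
# (None marks a root).
parent = {1: None, 2: 1, 3: 1, 4: 3,   # component rooted at node 1
          5: None, 6: 5}               # component rooted at node 5

def descendants(root):
    """Nodes reachable from `root` via child links (root included)."""
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    out, stack = set(), [root]
    while stack:
        v = stack.pop()
        out.add(v)
        stack.extend(children.get(v, []))
    return out

de1 = descendants(1)
assert de1 == {1, 2, 3, 4}
# Restriction is a rooted tree: the root has no parent, and every other
# node has exactly one parent, itself inside the restriction.
for v in de1:
    if v == 1:
        assert parent[v] is None
    else:
        assert parent[v] in de1
```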
Now we can incorporate Lemma .1 and Proposition .2 to prove Proposition 2.34.
Proof. ⇐: We want to show that if there exists at least one inverted fork on the path from y_i to y_j, or there is no path from y_i to y_j, then d_L(y_i, y_j) = ∞. Let Φ_y(z) be the power spectral density matrix of the output y. Equation (2.2) and the Wiener-Khinchin Theorem result in

    Φ_{y_i y_j}(z) = (T(z) Φ_u(z) T*(z))_{ij} = Σ_k T_{ik}(z) (Φ_u(z))_{kk} (T*(z))_{kj}
                   = Σ_k T_{ik}(z) (Φ_u(z))_{kk} T*_{jk}(z),    (8)

where (·)_{ij} denotes the entry (i, j) of a matrix. Because of the inverted fork on the path from y_i to y_j, there cannot be any node y_k such that there is a chain from y_k to y_i and, at the same time, a chain from y_k to y_j. Therefore, Lemma 2.27 states that, for every k, (T(z))_{ik} = 0 or (T*(z))_{kj} = 0, which implies Φ_{y_i y_j}(z) = 0. Therefore, we have C_{ij}(ω) = 0 for all ω ∈ [−π, π] and, using Equation (2.3), we conclude that d_L(y_i, y_j) = ∞.
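The ⇐ direction can be illustrated by simulating the smallest inverted fork, y_i → y_m ← y_j, with constant gains: the sample correlation between y_i and y_j (the static counterpart of the coherence) concentrates near zero, so d_L = −log|C_{ij}| diverges. The gains and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

# Inverted fork y_i -> y_m <- y_j: the only path between y_i and y_j
# passes through a collider, so the two processes are uncorrelated.
rng = np.random.default_rng(0)
N = 200_000
y_i = rng.standard_normal(N)
y_j = rng.standard_normal(N)
y_m = 0.9 * y_i - 0.6 * y_j + rng.standard_normal(N)  # illustrative gains

c_ij = np.corrcoef(y_i, y_j)[0, 1]
c_im = np.corrcoef(y_i, y_m)[0, 1]

assert abs(c_ij) < 0.05   # unrelated: sample coherence near zero
assert abs(c_im) > 0.3    # each parent is related to the collider
# d_L(y_i, y_j) = -log|C_ij| blows up as |C_ij| -> 0.
```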
⇒: We want to show that if there exists a path from y_i to y_j with no inverted fork, then d_L(y_i, y_j) < ∞. Since y_i and y_j are related, they have a common root ancestor, namely y_r. Using Proposition .2, define the LDRT given by the restriction of F~ to de_{F~}(y_r). Note that y_i and y_j are in this restriction, and the distance is additive along the paths of this rooted tree according to Proposition 2.33. Therefore, using the result of the previous step (i.e., if (y_p, y_q) ∈ E~, then d_L(y_p, y_q) < ∞) and the fact that there exists exactly one path between any pair of nodes in a rooted tree, we have that 0 < d_L(y_i, y_j) < ∞ because d_L(y_i, y_j) is the sum of positive finite distances between pairs of nodes on the path from y_i to y_j.
B Proofs Related to Chapter 3
B.1 Proof of Theorem 3.2
Proof. If a hidden node in T~ is RGA-detectable, then it necessarily has degree greater than or equal to 3 in the skeleton of T~. The assertion then follows from Theorem 5 in [53].
B.2 Proof of Proposition 3.3
Proof. There exists exactly one root in a rooted tree [76] and, by definition, an FD-detectable node is a root. Thus, the only FD-detectable hidden node, namely y_h, is the root of T~_ℓ, while all the other hidden nodes are necessarily RGA-detectable. Using Lemma 16 in [40], define an LDRT T' = (H(z), u) which has the associated graph T~'_ℓ = (V_T, L_T, E~') with the same skeleton as T~_ℓ and all distances among the nodes in N_T = V_T ∪ L_T the same, but with root y_{c_1} instead of y_h, as in Figures B.1 (a)-(b). For T' we have
    y_i = Σ_{y_{p_i} ∈ pa_{T~'_ℓ}(y_i)} H_{i p_i}(z) y_{p_i} + u_i,    y_{c_1} = u_{c_1},
    y_{c_2} = H_{c_2 h}(z) y_h + u_{c_2},    y_h = H_{h c_1}(z) y_{c_1} + u_h,    (9)
for y_i ∈ N_T \ {y_h, y_{c_1}, y_{c_2}}, appropriate transfer functions H_{jk}(z) with y_j, y_k ∈ N_T, and mutually independent signals u_j with y_j ∈ N_T. Now define a new system X such that x_i := y_i, ε_i := u_i and H'_{jk}(z) := H_{jk}(z) for all y_i ∈ N_T \ {y_h, y_{c_2}} and distinct y_j, y_k ∈ N_T \ {y_h} with {y_j, y_k} ≠ {y_{c_1}, y_{c_2}}, and also
    x_{c_2} := H'_{c_2 c_1}(z) y_{c_1} + ε_{c_2},    H'_{c_2 c_1}(z) := H_{c_2 h}(z) H_{h c_1}(z),    ε_{c_2} := u_{c_2} + H_{c_2 h}(z) u_h.    (10)
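In the static (constant-gain) special case, Equation (10) is just the observation that eliminating the hidden node y_h composes the two transfer functions and folds u_h into a new noise term, leaving the visible output y_{c_2} unchanged. The following sketch checks this identity; all numerical values are illustrative assumptions.

```python
import numpy as np

# Static sketch of Equation (10): marginalize hidden node y_h out of the
# chain y_c1 -> y_h -> y_c2.
rng = np.random.default_rng(1)
H_hc1, H_c2h = 1.4, -0.5
u_c1, u_h, u_c2 = rng.standard_normal(3)

# Original system (root y_c1, hidden node y_h).
y_c1 = u_c1
y_h = H_hc1 * y_c1 + u_h
y_c2 = H_c2h * y_h + u_c2

# Marginalized system: x_c2 = H'_{c2 c1} y_c1 + eps_c2.
H_prime = H_c2h * H_hc1          # composed transfer function
eps_c2 = u_c2 + H_c2h * u_h      # folded noise term
x_c2 = H_prime * y_c1 + eps_c2

assert np.isclose(x_c2, y_c2)    # visible output is reproduced exactly
```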
Figure B.1: Rooted trees T~_ℓ (a), T~'_ℓ (b), and T~_X (c) [3].
Since Φ_{u_{c_2} u_i}(z) = 0 and Φ_{u_h u_i}(z) = 0, we have that Φ_{ε_{c_2} ε_i}(z) = 0 for y_i ∈ N_T. Therefore, the associated graph of the system X, denoted by (H'(z), ε), is