Analysis of Different Random Graph Models in the Identification of Network Motifs in Complex Networks
Dissertation approved by the Department of Computer Science (Fachbereich Informatik) of the Technische Universität Kaiserslautern for the award of the academic degree Doktor der Naturwissenschaften (Dr. rer. nat.), submitted by
Wolfgang Eugen Schlauch
Date of the scientific defense (wissenschaftliche Aussprache): 3 December 2016
Dean: Prof. Dr. Klaus Schneider
Reviewers (Berichterstatter): Prof. Dr. Katharina Anna Zweig
                              Prof. Dr. Clémence Magnien
D 386
CONTENTS
I Introduction ...... 5
1 Motivation ...... 7
   1.1 General Introduction to Network Analysis ...... 7
   1.2 Which Model to Use ...... 9
   1.3 Subgraphs ...... 9
   1.4 The Missing Ingredient ...... 11
II Definitions ...... 13
2 Graph Models ...... 17
   2.0.1 The Erdős-Rényi Model: the Random Graph ...... 18
   2.1 Fixed Degree Sequence Model ...... 18
      2.1.1 Havel-Hakimi Algorithm ...... 19
      2.1.2 Configuration Model ...... 19
      2.1.3 Switching Algorithm ...... 21
      2.1.4 Sequential Importance Sampling ...... 23
3 Mathematical Toolkit ...... 27
   3.1 Statistical Analysis of Differences in Results ...... 28
   3.2 Simple Independence Model ...... 30
   3.3 Critique on the Applicability of the sim to All Graphs ...... 31
      3.3.1 Example 1 ...... 32
      3.3.2 Example 2 ...... 33
III Comparison of the Different Null Models Based on Artificial and Real-World Undirected Graphs ...... 35
4 Comparative Analysis Based on the Models ...... 39
   4.1 Erdős-Rényi Graph ...... 41
   4.2 Forest-Fire Graph ...... 43
5 Sequential Importance Sampling: Which Probability Distribution to Use? ...... 45
6 Analysis of Undirected Real-World Graphs ...... 47
   6.1 Datasets ...... 47
7 Stable Measures ...... 51
   7.1 Diameter ...... 52
      7.1.1 Approximating the Diameter ...... 52
      7.1.2 Model Comparison ...... 53
   7.2 Distance ...... 56
      7.2.1 Approximating the Distance ...... 56
      7.2.2 Model Comparison ...... 57
   7.3 Implications ...... 60
   7.4 Multiple Edges and Self-Loops ...... 60
8 Sensitive Measures ...... 67
   8.1 Average Neighbor Degree ...... 67
      8.1.1 Approximating the Average Neighbor Degree ...... 68
      8.1.2 Model Comparison ...... 68
   8.2 Common Neighbors ...... 70
      8.2.1 Approximating the Co-Occurrence ...... 71
      8.2.2 Comparison of the Models ...... 72
      8.2.3 Local Co-Occurrences ...... 74
   8.3 Implications ...... 74
IV Comparison of the Different Null Models Based on Directed Graphs ...... 79
9 Analysis of Directed Graphs ...... 83
   9.1 A Short History of Motif Analysis ...... 83
   9.2 Data ...... 85
10 On Motifs ...... 87
11 Different Models under Investigation ...... 91
   11.1 On the Directed Configuration Model ...... 91
   11.2 On the Sequential Importance Sampling ...... 99
   11.3 A Faster Option to Calculate the Expected Number of Motifs ...... 105
   11.4 Revisiting the Bifan ...... 120
      11.4.1 Summary ...... 125
12 Node-Based Participation Estimation in Motifs ...... 127
   12.1 Constructing Position-Based Equations ...... 128
   12.2 On the Sum of Equations ...... 130
   12.3 How Do the Equations for Positions Fare? ...... 135
   12.4 Revisiting the Bifan ...... 137
13 On Predicting Co-Purchased Items ...... 141
   13.0.1 Television Series Prediction ...... 143
   13.1 The Model to Use and Open Problems ...... 152
V Summary ...... 155
14 Summary and Conclusions ...... 157
   14.1 Summary ...... 157
   14.2 Conclusions to Draw ...... 159
   14.3 Future Work ...... 159
VI Appendix ...... 169
Comparison with the Results of Itzkovitz et al. ...... 171
Directed Graphs: Tables ...... 175
   1 On the Configuration Model ...... 175
   2 On the Sequential Importance Sampling Model ...... 183
   3 On the Faster Option ...... 191
Bifan Equation Revisited: Continued ...... 195
Publications ...... 199
   1 Journal Articles ...... 199
   2 Conferences ...... 199
   3 Other ...... 199
LIST OF FIGURES
Figure 2.1 Possible two-edge swaps in an undirected graph ...... 21
Figure 2.2 “Unswappable” digraph ...... 22
Figure 2.3 Intermediate step in the graph-generating process of the sequential importance sampling ...... 24
Figure 3.1 Kolmogorov-Smirnov test ...... 29
Figure 3.2 Example 1 ...... 32
Figure 3.3 Two graphs generated from the same degree sequence with different modularity scores ...... 33
Figure 4.1 Measures in unskewed graphs ...... 42
Figure 4.2 Measures in skewed graphs ...... 44
Figure 5.1 First comparison of the models based on assortativity ...... 46
Figure 7.1 Example graph for which the cfg and the sim yield many multiple edges and self-loops ...... 61
Figure 7.2 G(n, m) edge-loss example ...... 63
Figure 7.3 Edge loss in graphs with skewed degree distributions ...... 64
Figure 7.4 Comparison of multiple edges attached to a high-degree node ...... 66
Figure 8.1 Example of a multigraph; the question is how to calculate the average neighbor degree of v ...... 67
Figure 8.2 Co-occurrence vs. multigraph ...... 71
Figure 8.3 Comparison of the number of neighbors ...... 75
Figure 11.1 Edge loss in directed graphs ...... 92
Figure 11.3 Histogram of the Feed-Forward Loop distribution in cfg, ecfg, and fdsm ...... 98
Figure 11.4 Degree distributions of graphs ...... 102
Figure 11.5 Average occurrence of edges in samples ...... 115
Figure 11.6 Average occurrence of edges in samples, contd. ...... 117
Figure 11.7 Histograms of the distribution of subgraphs in different graphs, compared with the estimated distribution via the sim ...... 119
Figure 12.1 Positions in a Feed-Forward Loop ...... 128
Figure 12.2 Relative error for Twopaths ...... 133
Figure 12.3 Relative error for Feed-Forward Loops ...... 134
Figure 12.4 Relative error based on position ...... 136
Figure 12.5 Relative error of the Bifan ...... 140
Figure 13.1 Correctly assessed edges ...... 151
LIST OF TABLES
Table 6.1 Basic network statistics for the individual networks used in the article ...... 49
Table 7.1 Average diameter of cfg samples ...... 54
Table 7.2 Average diameter of sis samples ...... 54
Table 7.3 Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the diameter of the graphs generated with the fdsm and the usis ...... 55
Table 7.4 Average distance of cfg samples ...... 57
Table 7.5 Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the average distance of the graphs generated with the fdsm and the cfg ...... 58
Table 7.6 Average distance of sis samples ...... 59
Table 7.7 Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the average distance of the graphs generated with the fdsm and the usis ...... 59
Table 7.8 Average number of self-loops and multi-edges for samples from the cfg; the table also contains their expected value calculated with Equation (7.19), resp. Equation (7.20) ...... 65
Table 8.1 Average neighbor degree in fdsm, cfg, and sim ...... 68
Table 8.2 Average neighbor degree in fdsm and sis ...... 69
Table 8.3 Two-sample z-score calculation in comparison with the fdsm to test whether results can be from the distribution indicated by the samples ...... 70
Table 8.4 Average co-occurrence in the different models ...... 73
Table 8.5 z-score calculation with the graphs from the ecfg as samples and the number of co-occurrences in the fdsm as value to test whether it can be from the distribution indicated by the samples ...... 73
Table 11.1 Number of Forks found in the respective models; the standard deviation for the cfg is due to the fact that two edges between the same nodes do not yield a Fork ...... 93
Table 11.2 Number of Fans found in the respective models ...... 94
Table 11.3 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Fork ...... 95
Table 11.4 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Fan ...... 95
Table 11.5 Number of Twopaths found in the respective models ...... 96
Table 11.6 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Twopaths ...... 96
Table 11.7 Number of Feed-Forward Loops found in the respective models ...... 97
Table 11.8 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Feed-Forward Loop ...... 98
Table 11.9 Number of Twopaths found in the respective models ...... 100
Table 11.10 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Twopath ...... 101
Table 11.11 Percentage of nodes which violate the condition that the square of their out-, in-, or combined degree should be smaller than the sum of the respective degree sequence ...... 101
Table 11.12 Number of Feed-Forward Loops found in the respective models ...... 103
Table 11.13 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Feed-Forward Loop ...... 104
Table 11.14 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Out-Fan ...... 104
Table 11.15 Results of the two-sample z-score calculation between the different models without weights for the Out-Fan motif ...... 105
Table 11.16 Number of Forks found in the respective models ...... 108
Table 11.17 Number of Fans found in the respective models ...... 110
Table 11.18 Equations developed based on the basic Equation 11.1, following the approach of Equation 11.3 and Equation 11.9 ...... 111
Table 11.19 Number of Twopaths found in the respective models ...... 112
Table 11.20 Number of Feed-Forward Loops found in the respective models ...... 114
Table 11.21 z-score of Feed-Forward Loops comparing the sim with the range of values of the fdsm, z-score of the mean of the fdsm compared to the sim, and the two-sample z-statistic of the two models, fdsm and sim ...... 114
Table 11.22 Percentage of nodes violating the condition that the square of their out-, in-, or combined degree should be smaller than the sum of the respective degree sequence ...... 116
Table 11.23 Number of Bifans found in the respective models ...... 117
Table 11.24 z-score of Bifans comparing the sim with the range of values of the fdsm, z-score of the mean of the fdsm compared to the sim, and the two-sample z-statistic of the two models ...... 120
Table 11.25 Number of Twopaths calculated with the simple equation and the slightly modified version ...... 121
Table 11.26 These graphs are counted additionally when the simple Bifan equation is used; subtracting the corresponding equations yields more reasonable results ...... 122
Table 11.27 In the first column, the estimated number of Bifans using $\frac{(\langle k^2\rangle-\langle k\rangle)^2(\langle j^2\rangle-\langle j\rangle)^2}{4\langle k\rangle^4}$ is shown; the other columns contain the estimated number of subgraphs that do not occur in simple graphs but do contribute to the estimate in the first column ...... 123
Table 11.28 Change in the Bifan equation when the principle of inclusion and exclusion is applied ...... 124
Table 11.29 In the first column, the estimated number of Feed-Forward Loops using $\frac{(\langle k^2\rangle-\langle k\rangle)\langle kj\rangle(\langle j^2\rangle-\langle j\rangle)}{\langle k\rangle^3}$ is shown; the other columns contain the estimated number of subgraphs that do not occur in simple graphs but do contribute to the estimate in the first column ...... 124
Table 12.1 Equations for the expected number of subgraphs containing node u, depending on the possible position; basic approach without consideration of the participation of the node itself in the mean ...... 128
Table 12.2 Equations for the expected number of subgraphs containing node u, depending on the possible position; corrected for node u by subtraction from the remaining average in the numerator ...... 129
Table 12.3 Relative error for the Twopath, E. coli ...... 131
Table 12.4 Relative error for the Twopath, Ythan ...... 131
Table 12.5 Relative error for the Feed-Forward Loop, E. coli ...... 132
Table 12.6 Relative error for the Feed-Forward Loop, Ythan ...... 135
Table 12.7 Equations regarding the Bifan subgraph where corrections for node u are made by subtracting its contribution from the averages ...... 137
Table 12.8 Relative error for the Bifan, E. coli ...... 138
Table 12.9 Relative error for the Bifan, Ythan ...... 138
Table 13.1 Hit rate based on the new prediction method for the same television series as in Zweig and Kaufmann [108] ...... 143
Table 13.2 Average quality of recommendations based on the simple independence model for television series; dashes in the last two columns indicate that no season of a series had all other seasons of the same series in the top 100 recommendations ...... 148
Table 13.3 Local PPVk comparison ...... 149
Table 13.4 Global PPVk comparison ...... 150
Table 1 Equations of Itzkovitz et al. [43] and own equations 11.18 together with the corresponding subgraph; observe that some do have the same equation ...... 172
Table 2 E. coli ...... 173
Table 3 S. cerevisiae ...... 173
Table 4 Little Rock ...... 174
Table 1 Number of Double-Joins found in the respective models ...... 175
Table 2 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Double-Join ...... 175
Table 3 Number of Threecycles found in the respective models ...... 176
Table 4 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Threecycles ...... 176
Table 5 Number of Complete subgraphs found in the respective models ...... 177
Table 6 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Complete subgraph ...... 177
Table 7 Number of Fourcycles found in the respective models ...... 178
Table 8 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Fourcycle ...... 178
Table 9 Number of Bifans found in the respective models ...... 179
Table 10 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Bifan ...... 179
Table 11 Number of Biparallel subgraphs found in the respective models ...... 180
Table 12 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Biparallel subgraph ...... 180
Table 13 Number of In-Fans found in the respective models ...... 181
Table 14 Results of the Kolmogorov-Smirnov two-sample test between the different models for the In-Fan ...... 181
Table 15 Number of Out-Fans found in the respective models ...... 182
Table 16 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Out-Fan ...... 182
Table 17 Number of Double-Joins found in the respective models ...... 183
Table 18 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Double-Join ...... 183
Table 19 Number of Threecycles found in the respective models ...... 184
Table 20 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Threecycle ...... 184
Table 21 Number of Complete subgraphs found in the respective models ...... 185
Table 22 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Complete subgraph ...... 185
Table 23 Number of Fourcycles found in the respective models ...... 186
Table 24 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Fourcycle ...... 186
Table 25 Number of Bifans found in the respective models ...... 187
Table 26 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Bifan ...... 187
Table 27 Number of Biparallel subgraphs found in the respective models ...... 188
Table 28 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Biparallel subgraph ...... 188
Table 29 Number of In-Fans found in the respective models ...... 189
Table 30 Results of the Kolmogorov-Smirnov two-sample test between the different models for the In-Fan ...... 189
Table 31 Number of Out-Fans found in the respective models ...... 190
Table 32 Results of the Kolmogorov-Smirnov two-sample test between the different models for the Out-Fan ...... 190
Table 33 Number of Double-Joins found in the respective models ...... 191
Table 34 Number of Threecycles found in the respective models ...... 191
Table 35 Number of Complete subgraphs found in the respective models ...... 192
Table 36 Number of Fourcycles found in the respective models ...... 192
Table 37 Number of Bifans found in the respective models ...... 193
Table 38 Number of Biparallel subgraphs found in the respective models ...... 193
Table 39 Number of In-Fans found in the respective models ...... 194
Table 40 Number of Out-Fans found in the respective models ...... 194
ABSTRACT
This thesis is concerned with different null models that are used in network analysis. Whenever it is of interest whether a real-world graph is exceptional regarding a particular measure, graphs drawn from a null model can serve as a baseline to compare the real-world graph to. By analyzing an appropriate null model, a researcher may find out whether the result of the measure on the real-world graph is exceptional or not. Deciding which null model to use is hard, and sometimes the difference between the null models is not even considered. This thesis presents several results: First, undirected graphs are analyzed based on simple global measures. The results for these measures indicate that it is not important which null model is used; thus, the fastest algorithm for a null model may be used. Next, local measures are investigated. The fastest algorithm proves to be the most complicated to analyze: the model includes multigraphs, which do not meet the conditions of all the measures, so the measures themselves have to be adapted to handle multigraphs as well. After careful consideration, the conditions are met and the analysis shows that the fastest model is not always the best. The same applies for directed graphs, as is shown in the last part. There, another, more complex measure on graphs is introduced. I continue testing the applicability of several null models; in the end, a set of equations proves to be fast and good enough as long as certain conditions regarding the degree sequence are met.
ACKNOWLEDGEMENT
I wanted to write a simple “Thank you” instead of trying to list all who helped and supported me; I am sure I will forget somebody. Nevertheless, I want to thank explicitly my advisor Katharina Anna Zweig, since she made this possible. Emöke Ágnes Horvát, since she helped me so much with the first article. Darko Obradovic, who suggested this position to me when I did not even know that there was someone working on graphs in Kaiserslautern. My parents and my sister for their constant support; I know that I do not show my appreciation half as much as I feel it. My friends, especially, but in no particular order, Lisa Lenz, Jessica Weiss, Anne Bardens, Kai Wiedel, René Becker, and Andreas Resch; you all showed confidence in me even when I was lacking trust in what I did. My colleagues and friends from university, Pia Achilles, Mareike Bockholt, Marsha Kleinbauer, Sude Tavassoli, Raphael Reitzig, Peter Treiber, and Sebastian Wild: I will miss lunch with you, I will miss coffee with you, and I will miss seeing that I am not the only one who sometimes scratches his head and wonders what he is doing. A special thank you to Professor Nebel, not only for coffee, but also for the moral support. In the end, I am sure that I forgot many people. I am sorry.
Part I
INTRODUCTION
1
MOTIVATION
1.1 general introduction to network analysis
Network analysis: what is this person talking about? That is one of the first questions that pops into someone's mind whenever they talk to somebody from the field. Even among scientists, network analysis entails so many different things that it is not uncommon for two people to talk for some time before realizing that they are speaking about different things, even though the terminology is the same. Thus, a short explanation of what I mean in this thesis is needed.

Network analysis is an ever-growing field. Some of the earliest work that can be considered (social) network analysis dates from the early twentieth century. Jacob Levy Moreno has to be mentioned as one of the most influential scientists in this field; he is considered one of its founders. He was one of the first to use sociograms, i.e., depictions of social networks, to display the group behavior of people [65]. A social network, for these sociograms, is a group of people in which some of the persons interact with each other. Even earlier than Moreno, John C. Almack used questionnaires to gather information about the social relations of children at school. He did not only investigate the social connections they made; he also correlated the IQs of the pairs of students. The result was that more relations existed between persons with similar IQs than between persons with dissimilar ones [1]. Similarly, Beth Wellman studied the social interactions of students in a high school and correlated the pairs she found on IQ, grades, and similar measurable scores, likewise identifying homophily between students [101]. These precursors of what is nowadays known as social network analysis covered only small groups, i.e., from school-class-sized groups up to 150 people. The methods they developed and the ideas they proposed are still applicable to a certain extent; some of the assumptions made still cannot be proven but are still considered important [86, 33].
Scientists in graph theory noticed the trend of using sociograms, i.e., depictions of social relations as graphs. Graph theory, a field more or less initiated by Leonhard Euler, concerns itself with problems that can either be expressed as a graph or are based on graph structures. Examples of graph-theoretic problems include the enumeration of graphs having particular properties, coloring problems (can any map in a plane be drawn using only four different colors?), and path-based problems (finding the shortest path between two nodes). Together, the two fields are powerful, so that the analysis of a sociogram, which can be expressed as a graph based on the work of Kőnig [51], should be simple enough. Note that the term sociogram is not commonly used in graph theory; when speaking of a depiction of, or some relationship between, a set of persons, network science usually uses the term (social) network.

Almost every first-world inhabitant will have had contact with one of the fields mentioned above. Most likely, social networks have touched their life in some way, since in January
2016 about 1.6 billion users were registered at Facebook¹ [23]. Other well-known social networking sites are Renren (31 million users, China²) and vk.com (100 million users, Russia³), which are popular in countries that either restrict access to Facebook or whose alphabet confines users to a more closed ecosystem. Other points of contact with graph theory are the use of route-planning software or, a decade ago, the use of Google search, which was said to be based on the PageRank algorithm. This algorithm calculated a weight for a website, based on ingoing and outgoing links, to decide which pages matching the searched term should be presented first. Many of these applications are used without knowledge that algorithms from graph theory are applied to the data.

Besides the enormous sizes social networks tend to have, what good is a value of 3.14 · 10⁻² for some measure if there is nothing to compare the result with? In social network analysis, a subfield of network analysis that concerns itself only with social networks, one usually gathers relationships between entities, calculates some measures based on the perceived relationships, and interprets the result. This interpretation is made either with the help of an expert, for example a sociologist, or with someone who knows the entities, for example the head of a research and development department [20]. But there is also another way to determine whether the gathered data is exceptional with regard to a measure. Mathematics provides a solution for this, called random graph theory. This subfield of graph theory is based on the work of Gilbert [29] and of Erdős and Rényi [24]. They gave the two most famous random graph models; in most theoretical works they are just called the random graph or, when names are used, the G(n, p) and the G(n, m) model. Random graph theory was a quiet field for a long time; only in the last decades has a rise in publications been observed.
With the (re-)discovery of the “small-world” effect, or “six degrees of separation” [61], a surge of random graph models occurred and the field was subsequently rejuvenated. These new models attempted to generate graphs that are structurally more similar to the networks observed in the real world; they attempt to model structure and behavior better, to improve results, to observe anomalies, and more. Besides the new models, many new ideas, theories, and algorithms have been proposed and are in constant development. Still, this leads to an interesting question: which graph sizes are used in graph theory and network analysis? Physics and mathematics tend to use comparatively small graphs, while social network analysis has the option to use enormous datasets such as the emails sent within a company (13k users), the players of a game (0.5bn users), or the users of Facebook (>1bn users). Most recently, scientists at Facebook discovered that users of their social network are even closer than the famous “six degrees of separation” suggests [93, 2]. They found that the average distance between 721 million active Facebook users (69 billion links) is only 4.74. The experiment was not run on the complete data set, and Edunov et al. [23] estimate the current distance at 4.57 via approximate algorithms based on 1.6 billion users. To put this distance in context: it is as if Tim, the neighbor's son who is ten years old, writes a letter, and after only four exchanges the president of the United States of America has Tim's letter. Still, it is not known whether this is unique to Facebook or whether it is usual for a social network of this order and size. For this purpose, random graph theory can be used.
¹ http://www.facebook.com
² http://www.renren.com
³ http://www.vk.com
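The two classical models just named, G(n, p) and G(n, m), are simple enough to state in a few lines of code. The sketch below is illustrative only (the function names are mine, not taken from any library): it draws one graph from each model over the node set {0, ..., n-1}.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Gilbert's G(n, p): each of the C(n, 2) possible edges is
    included independently with probability p."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}

def gnm(n, m, rng):
    """Erdős-Rényi G(n, m): exactly m distinct edges are chosen
    uniformly at random among all C(n, 2) possible edges."""
    return set(rng.sample(list(combinations(range(n), 2)), m))

rng = random.Random(42)
print(len(gnm(10, 15, rng)))  # a G(n, m) sample always has exactly m = 15 edges
```

A G(n, m) sample always has exactly m edges, while a G(n, p) sample has p · n(n-1)/2 edges only in expectation; this difference is precisely why the two models are usually introduced as a pair.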
1.2 which model to use
Of course, random graph theory is a powerful tool to analyze whether all social networks of a certain size show such small average distances. This can be done by generating a large enough sample of graphs, computing the average distance on each of them, and comparing this average to the average distance of the real-world graph. But it is not entirely clear which model to use. When the field was young, only simple models existed, and they are not able to model necessary features of graphs such as the Facebook network. When the G(n, m) model is used to generate a set of graphs similar in size to the Facebook graph, problems occur as soon as one analyzes them. A social network such as Facebook has completely different properties than the simpler G(n, m) model; the set of sampled graphs will have an average distance that most likely is not similar to that of Facebook, purely because of these properties. Does this imply that Facebook is very special? Yes and no. Of course it is special when we consider only this model. As soon as other random graph models are considered, it may be different. There are many more random graph models that are used in network analysis and graph theory; how does one choose a model, or a family of models, such that there is a basis to compare the results of different random graph models to each other? One of the best ideas is fixing a parameter that describes the graph best, where “best” is defined by the researcher. In this thesis, a very simple but descriptive parameter of graphs is used: the degree sequence. The degree sequence of a graph describes the graph by recording, for every node, how many connections it has and noting this down in a list. In Part III and Part IV, several models that keep this parameter constant are investigated with regard to some standard measures. Ordinarily, one would assume that models based on the same single fixed parameter yield the same results. This has been investigated by Schlauch, Zweig, and Horvát [81].
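The degree sequence just defined can be read off a graph in a few lines. The following sketch (the function name is mine) records, for every node, how many edges are incident to it; by the handshake lemma, the entries always sum to twice the number of edges.

```python
from collections import Counter

def degree_sequence(edges, n):
    """Return the degree of every node 0..n-1, sorted in non-increasing
    order: exactly the list that fixed-degree-sequence models keep."""
    deg = Counter()
    for u, v in edges:  # every undirected edge contributes to both endpoints
        deg[u] += 1
        deg[v] += 1
    return sorted((deg[i] for i in range(n)), reverse=True)

# A small star plus one extra edge: node 0 has degree 3, node 3 degree 1.
print(degree_sequence([(0, 1), (0, 2), (0, 3), (1, 2)], 4))  # [3, 2, 2, 1]
```

Any null model that holds this list fixed, such as the models compared in Parts III and IV, leaves exactly this invariant untouched while randomizing everything else.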
The results presented in Part III are an extension of the work that has already been done. Based on the result of the Facebook analysis, one may ask about the average distance per country. For the United States of America, the average distance is estimated at only 3.46 [23]. That leads to the next question this thesis is concerned with: parts of larger graphs, or subgraphs.
1.3 subgraphs
Typical questions in the field of graph theory are: how are subgraphs extracted, how often do subgraphs of a specific type occur, does a graph contain a certain subgraph, and more. In the context of this thesis, the question of network motif analysis [62] is of interest. A network motif is a subgraph of small size, usually 3 to 4 nodes, that occurs in a graph statistically significantly more often than one would expect. Since graphs with a fixed degree sequence are of interest, Part IV will be about which graph models are useful for network motif analysis. Do all models that are based on a fixed degree sequence yield the same average number of subgraphs? This question has not been explicitly asked before. Milo [63] introduced the notion of network motif analysis by analyzing graphs in the context of the number of subgraphs the random graphs contain; in a way, the analysis contained the question of which model is useful for the analysis, but the article was restricted to the question whether the models generate graphs uniformly at random or not. Explicitly speaking, the question was not “Do different random graph models yield the same
expected values?” but “Do different random graph models generate graphs uniformly at random?”. In other articles, only one algorithm was used [62, 64], which showed satisfying results. This algorithm is explained in more detail in Part II. Now, it remains to discuss why one should care about subgraphs of such minuscule size, since looking for combinations of 3 to 4 nodes in a graph with many nodes seems a somewhat strange idea. For example, a graph with 300 nodes has $\binom{300}{3} \approx 4.5 \cdot 10^6$ possible combinations of nodes that could be checked for their connection pattern. Of course, no one will check all possible combinations, but use smarter approaches instead. One of these approaches, while still simple, is to check only the neighborhood of a node, i.e., the nodes that a node is connected to. Why should it be important whether a graph contains 42 such combinations instead of 10? That this is relevant can be shown with examples from biology.

In a transcriptional regulation network, the interactions of a transcription factor X regulate a second transcription factor Y, such that both jointly regulate an operon Z [84]. In terms of graph theory, node X has a directed connection to Y and another to Z, while Y has a directed connection to Z. These interactions could be considered random. But since organisms had a long time to develop these patterns of interaction, they most likely are not random but exist for a purpose. Finding these patterns is challenging, but may help fight diseases. Uhlmann et al. [94] employed network-analytic methods in their research on breast cancer and were able to identify potential tumor suppressors. That was done via identification of co-upregulating and co-downregulating patterns in the graph consisting of proteins and miRNA (micro ribonucleic acid). In other words, two proteins had to be identified that had the same positive (up-regulating) or negative (down-regulating) influence on some miRNA.
Via statistical analysis based on random graph theory, it is thus possible to identify important patterns that may help fight serious diseases. Subgraphs are considered important in other fields as well. Communication networks, i.e., who talked to whom, how often, how long, and at which times, can be considered vital in some research areas. Graph theory can be applied to this information as well. Lindelauf et al. [57] used graph theory to analyze a heroin distribution network and the cell-phone network of a terrorist group, and showed that communication structures are not created at random but on purpose. Hindsight analysis is useful and sometimes able to discover other possible threats as well, but predicting who is dangerous is much harder. While some criminals are not the smartest⁴, information on evil-doers is not as readily available as sometimes portrayed in movies. Observing known criminals and gathering their contact networks for analysis is another useful tactic [53] that is used more frequently, even though the public rails against gathering data from public channels such as Facebook, and from not-so-public channels like a cell phone. It is important to note that not only the connection pattern is significant; the frequency with which the pattern occurs matters as well. The analysis of the frequency of a connection pattern usually consists of several steps:
1. Count how often a pattern of connections occurs in a graph;
2. Count in an ensemble of randomized graphs how often the pattern occurs;
3. Decide somehow whether the observed graph is different from the randomized graphs;
⁴ 8 Dumb Criminals Caught Through Facebook, http://mashable.com/2012/12/12/crime-social-media/#Lm2kCY9J6aq2
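As an illustration, the three steps can be sketched with triangles as the pattern and a simple G(n, m) ensemble as the null model (a minimal sketch; all function names and parameters are illustrative, not taken from this thesis):

```python
import itertools
import random
import statistics

def count_triangles(edges, n):
    """Step 1: count fully connected triples of nodes (triangles)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def random_gnm(n, m, rng):
    """A very simple null model: m distinct edges, chosen uniformly at random."""
    return rng.sample(list(itertools.combinations(range(n), 2)), m)

def z_score(observed, n, m, samples=200, seed=1):
    """Steps 2 and 3: compare the observed count against the random ensemble."""
    rng = random.Random(seed)
    counts = [count_triangles(random_gnm(n, m, rng), n) for _ in range(samples)]
    mean, sd = statistics.mean(counts), statistics.stdev(counts)
    return (observed - mean) / sd if sd > 0 else float("inf")
```

A large |z| then indicates that the observed frequency is unlikely under the chosen null model; choosing that ensemble is exactly the critical decision discussed next.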
This process entails many different decisions. The most significant choice is the ensemble of randomized graphs: How to choose the correct algorithm that generates this ensemble, or which ensemble shall be used for comparison? The first models that observed graphs were compared to had either a fixed structure (lattice graphs, star graphs [57]) or were based on very simple probabilities [25, 100]. The set of possible models evolved from these simple models to capture not only the size but also the behavior of large graphs. Nowadays, graph generating algorithms can model behaviors such as “the rich get richer”, also called preferential attachment, certain degree requirements, communities, and other properties.
1.4 the missing ingredient
There are many different algorithms to generate graphs. They are put forward because researchers are missing something that they need for their research, and a new algorithm can achieve what they need. Consider the example mentioned above of the G(n, m) model, which generates graphs in a very simple way; researchers needed graphs that were more similar to real-world graphs in terms of the degree sequence. What is missing from a field that keeps putting new ideas forth? A comparison of algorithms that claim to achieve the same, but with different effort. Comparisons regarding ease of understanding or estimated run-time are available, but a statistical evaluation of the different algorithms is not, to the best of my knowledge. That is done in this thesis. There are several algorithms that are often used when the requirement is that they generate graphs with a given fixed degree sequence. Some researchers do not support this, their claim being that the most interesting results are achieved when the random graphs do not have the same degree sequence [39, 40]. Others support the idea of the same degree sequence [66, 108]. However, they do not necessarily share the same idea of what the algorithm should achieve [80]. In the coming pages, there will be a discussion of the algorithms that generate graphs based on a fixed degree sequence. In Part II, the algorithms are introduced as well as the statistical toolkit that is used in the analysis of the different approaches to sample graphs. Afterwards, the analysis of the algorithms is split into two parts: In Part III, algorithms used to generate undirected graphs are analyzed based on several well-known graph theoretic measures. In Part IV, directed graphs are analyzed regarding subgraph counting, a topic that has received much interest, but close to none concerning the graph generating algorithm used. Part V will wrap the thesis up and summarize the most important findings.
Part II
DEFINITIONS
In this chapter, the necessary definitions from graph theory are introduced as well as the algorithms that are used to generate the random graphs. Furthermore, the definition of null models that this work uses is explained. Last but not least, mathematical approximations that are often used to approximate the behavior of a random graph model are introduced. This chapter is based on work from Milo et al. [63, 64], Zweig [107, 108], Newman [66], Blitzstein and Diaconis [8], and Kim et al. [49]; these researchers developed either the algorithms that are used throughout the thesis or showed a clear preference for one algorithm over the others. There are more algorithms to generate graphs, but I restrict myself to only a few. The algorithms considered were chosen because they are either a) often used, b) used in different fields, or c) the basis of other algorithms.
Alternatively, combinations of these apply. Similar arguments apply to the measures that are investigated. There is an abundance of measures that may or may not be influenced by the way graphs are generated. Again, going through all of them would take far too long, especially since the development of new measures does not stop. Thus, some measures that are expected to yield the same result independent of the generating algorithm are investigated, as well as measures that are expected to yield different results. Parts of this chapter were published in the paper “Different flavors of randomness” [80]; the experiments were devised after long discussions with my advisor Katharina Anna Zweig. This paper was written together with her and her former doctoral student, Emöke-Ágnes Horvát.
2
GRAPH MODELS
A graph G = (V, E) is defined by a set of entities V, called nodes, and a set of connections between the entities, E, called edges. Nodes are referred to as v ∈ V, the number of nodes is |V| = n, while edges are referred to as e ∈ E = {{u, v} | u, v ∈ V} (|E| = m) for graphs where the connection is undirected, like friendships. If edges are directed, like followerships on Twitter, the graph is called a directed graph or digraph, and edges are referred to as E = {(u, v) | u, v ∈ V}. The degree sequence DS = {k_v | v ∈ V, k_v ∈ N} is a simple way to write down one of the defining features of an undirected graph. For a directed graph the degree sequence is DS = {(k_v, j_v) | v ∈ V, k_v, j_v ∈ N}, with k_v being the out-degree of node v and j_v being the in-degree. The degree sequence of a graph is the minimal information needed to construct a graph. It does not tell anything about allowed and forbidden connections. There are different classes of graphs that can all be described by the degree sequence property. A graph that contains neither multiple edges between two nodes, i.e., every edge {u, v} occurs at most once, nor self-loops, i.e., ∄e ∈ E : e = {u, u}, is called a simple graph. Graphs that allow either of these properties are called multi-graphs. Graphs that contain self-loops but no multiple edges are called pseudo-graphs by some authors [36, p. 10]. These classes of graphs are just the most basic ones. There are many more, such as multiplex graphs, temporal graphs, weighted graphs, and others. These are of no further interest in this thesis. For the very basic graph classes, undirected and directed graphs, random graph theory developed many different algorithms to generate graphs. These algorithms allow one to define a statistical null model (S, P), where S is the set of possible observations and P is a probability distribution on S.
At least, this is what one would hope to have; either the number of graphs is unknown, or the probability distribution with which graphs are returned is not known, or both. Thus, the purely statistical definition of a null model does not apply; graph theory has its own definition for a null model. A null model is a family of graphs together with either: a) a probability distribution on the given family of graphs, or b) an algorithm that produces only graphs from the given family of graphs. To define a family of graphs, I focus on one property that all graphs of the family should share: the degree sequence DS. This family of graphs tends to be large; for example, Blitzstein and Diaconis [8] estimate the number of simple graphs that can be generated for a graph with 33 nodes and 71 edges as (1.533 ± 0.008) × 10^57. It would be nice if everything could be calculated and weighted according to a given probability distribution, but that is as good as impossible. Thus, different algorithms to generate graphs are used throughout the thesis. In Chapter III, it is explained in more detail why more complex algorithms to generate graphs are necessary. The explanation of the algorithms starts out with the Erdős-Rényi model, one of the oldest and best-understood graph generating algorithms. This is followed by a more general idea, which is then divided into several possible algorithms.
2.0.1 The Erdős-Rényi model - the random graph
One of the best-known graph models is the Erdős-Rényi model, commonly called the random graph [3]. When reading papers by physicists, the mention of a random graph that is not explained further refers to the Erdős-Rényi graph model. The model is based on two parameters, n and m, and is therefore sometimes named the G(n, m) model. A graph with n nodes is generated, and one edge of the (n choose 2) possible edges is chosen, all edges being equiprobable. This process is repeated with the remaining (n choose 2) − 1, (n choose 2) − 2, ..., (n choose 2) − (m − 1) possible edges, all remaining edges being kept equiprobable, until m edges are chosen. The result is one of ((n choose 2) choose m) graphs [25]. For large n, this model generates the same graphs as the G(n, p) model given by Gilbert. The G(n, p) model tests for each pair of nodes whether there should be an edge between them by drawing a number from [0, 1] uniformly at random and comparing it to the parameter p. If p is larger than the random number, an edge between the two nodes is generated. An important difference is that the G(n, m) model always generates a graph with m edges, while the G(n, p) model can generate all graphs with n nodes, i.e., the result can contain m ∈ [0, n(n−1)/2] edges. For large n and constant np, the degree distribution of both models follows a Poisson distribution

P(k_v = k) = (np)^k e^{−np} / k! . (2.1)

This is a nice property, but it is not necessarily appropriate to compare a given graph with these models. When the degree distribution of the given graph is strongly skewed (see Chapter III), many measures will show very different values than the same measures taken on random samples from these models. Therefore, the real-world graph will appear extraordinary, even if the measure is not special in graphs that have the same degree sequence as the given graph. This is shown in more detail in Chapter 4.1. That is why one cares about fixing certain parameters of a graph and tries to get a random graph model that keeps these parameters constant, but is random in all others.
For this work, the simplest parameter is kept fixed, the degree sequence DS, and graphs are generated based on this sequence.
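The two generation rules can be sketched as follows (a minimal illustration; the helper names are mine):

```python
import itertools
import random

def gnm(n, m, rng=random):
    """G(n, m): choose m of the (n choose 2) possible edges uniformly at random."""
    return rng.sample(list(itertools.combinations(range(n), 2)), m)

def gnp(n, p, rng=random):
    """G(n, p): include each possible edge independently with probability p."""
    return [e for e in itertools.combinations(range(n), 2) if rng.random() < p]
```

Note the difference discussed above: gnm always returns exactly m edges, while the number of edges returned by gnp is itself random.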
2.1 fixed degree sequence model
The fixed degree sequence model is intuitively easy to understand. Given a degree sequence DS, an attempt is made to construct a graph with |DS| = |V| = n nodes, each of which has a degree equal to exactly one of the values in DS, and to connect the nodes according to their given degree. Beyond this, several more properties can be required, for example, a fixed clustering coefficient [38], an assortativity requirement [34], or that the graph is simple [63, 72]. Next, some algorithms are presented that can generate graphs with a fixed degree sequence. Their properties, and whether each algorithm is useful for network analysis, are discussed in detail.
2.1.1 Havel-Hakimi-Algorithm
One of the oldest algorithms to generate graphs based on a fixed degree sequence is the Havel-Hakimi-Algorithm, which was developed independently by Havel [37] and Hakimi [35]. Moreover, they provided a characterization of degree sequences that are graphical; in other words, there exists at least one simple graph with this degree sequence. Definition 1 (Havel-Hakimi Theorem): A degree sequence {k_1, k_2, ..., k_n}, sorted in non-increasing order, is graphical if and only if the sequence

{k_2 − 1, k_3 − 1, ..., k_{k_1+1} − 1, k_{k_1+2}, ..., k_n}

is graphical. All degree sequences used in this thesis are graphical. The Havel-Hakimi algorithm works as follows. Repeat until all entries of DS are zero:
1. Sort the degree sequence in non-increasing order.
2. Let d = k_1, the largest remaining entry.
3. Remove k_1 from the degree sequence.
4. Subtract 1 from the first d entries of the remaining sequence.
This algorithm has O(n · log(n)) run-time per iteration (due to the repeated sorting) and, as long as the degree sequence is graphical, it always produces a simple graph. The problem with the resulting graphs is easily observable. By ordering nodes in non-increasing order and always connecting a node to the next d nodes, the generated graph will always have many edges between high-degree nodes and much fewer edges from high-degree to low-degree nodes. When nodes with the same degree are indiscernible, only one graph can be generated. The preference to connect high-degree nodes with each other will influence measures taken from these random graphs. Therefore, the results cannot be seen as totally random, and despite being one of the first algorithms to generate graphs with a prescribed degree sequence, this algorithm has to be dismissed for the analysis of real-world graphs.
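The construction can be sketched as follows (a minimal, unoptimized sketch; the function name is mine):

```python
def havel_hakimi(degrees):
    """Construct a simple graph from a graphical degree sequence.

    Returns an edge list, or raises ValueError if the sequence is not graphical.
    """
    # Keep (degree, node id) pairs so that edges refer to the original nodes.
    seq = sorted(((k, v) for v, k in enumerate(degrees)), reverse=True)
    edges = []
    while seq and seq[0][0] > 0:
        d, u = seq.pop(0)          # node with the largest residual degree
        if d > len(seq):
            raise ValueError("degree sequence is not graphical")
        for i in range(d):         # connect u to the d next-largest nodes
            k, w = seq[i]
            if k == 0:
                raise ValueError("degree sequence is not graphical")
            edges.append((u, w))
            seq[i] = (k - 1, w)
        seq.sort(reverse=True)     # re-sort before the next round
    return edges
```

Note how the d next-largest entries are always chosen, which is exactly the bias toward connecting high-degree nodes discussed above.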
2.1.2 Configuration Model
The configuration model was described by Bender and Canfield [5]. It was refined by Bollobás and Thomason [10] as well as Wormald [70]. It is one of the better-studied algorithms to generate a network from a fixed degree sequence [66]. The algorithm is as follows:
1. Generate a graph G = (V, ∅). Each node v_i has k_i stubs, k_i being the i-th value in the degree sequence.
2. Choose two stubs uniformly at random from all stubs and connect them with an edge. Repeat this process with the remaining stubs.
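The stub-matching procedure can be sketched as follows (a minimal sketch; the function name is mine). Pairing off a uniformly shuffled stub list yields a uniform random matching:

```python
import random

def configuration_model(degrees, rng=random):
    """Match stubs uniformly at random; the result may contain
    self-loops and multiple edges."""
    assert sum(degrees) % 2 == 0, "total degree must be even"
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Pairing consecutive entries of a uniformly shuffled stub list
    # is equivalent to drawing a uniform random matching of the stubs.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```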
This process seems quite simple, but a trap is hidden in this very simple description. Two stubs of the same node can be chosen, creating a self-loop, or stubs of the same pair of nodes can be chosen several times, creating multiple edges between the two. Because it creates a matching of stubs, the algorithm was also called the matching algorithm by Alon et al. [63], but they restricted their algorithm to create only simple graphs. This variant of the configuration model forces generated graphs to be simple by including a check whether the new edge is a self-loop or a multiple edge between nodes. If so, the graph generated up to this point is discarded and the process started anew. An alternative way is creating the graph with the configuration model, including multiple edges and self-loops, but discarding everything that makes the result not simple in the end. This approach is also called the erased configuration model [95], since edges that do not agree with the property of being simple are erased. Of course, this changes the degree distribution. The influences of these changes are discussed in Chapter 8. Bender and Canfield [5] showed for regular graphs that the probability to get a simple graph is

P(G is simple) ∼ exp((1 − k²)/4) for n → ∞,

with k being the degree of the nodes in a regular graph. Blitzstein and Diaconis tested how many samples one has to draw to get a simple regular graph with degree 8. About 7 million samples were necessary [8], which is a remarkably bad result. Additionally, Janson [44, 45] showed that the probability of a graph with any degree sequence being simple is

P(G(n, (k_i)_1^n) is simple) = exp(−∑_i λ_ii − ∑_{i<j} (λ_ij − log(1 + λ_ij))) + o(1).

Figure 2.1: Possible two-edge swaps in an undirected graph.

2.1.3 Switching Algorithm

A more appropriate algorithm to generate simple graphs is the switching algorithm. This algorithm was named thus by Alon et al. [63], even though its basic idea was used already by Brualdi [13] as well as Roberts and Stone [77] without giving it a name. The algorithm is based on so-called edge swaps or matrix interchanges.
In an undirected graph, the swaps depicted in Figure 2.1 are sufficient to generate all possible graphs. The problem with this approach is that the number of necessary swaps is unknown. The basic algorithm is the following:

1. If not given an initial graph, generate a graph based on the degree sequence (with the Havel-Hakimi algorithm or any other simple graph generating algorithm).
2. Repeat for a large number of steps:
   a) Draw two edges uniformly at random.
   b) Draw a number q ∈ [0, 1] uniformly at random. If q ≤ 0.5, swap the edges {u, v}, {w, x} to {u, x}, {w, v}, otherwise to {u, w}, {v, x}.
   c) Check if the swap created a self-loop or a multiple edge; if so, undo the swap.

To generate directed graphs, edge swapping can be used as well, with a few changes. First, it is not possible to apply all swaps from Fig. 2.1, which is evident when considering that the edges are directed. It is not possible to apply any other edge swap to (u, v), (w, x) than (u, x), (w, v), since it is not possible to connect out-stubs to other out-stubs. Directed graphs, in contrast to undirected graphs, can show configurations that are impossible to change by a simple two-edge swap even though it is evident that there are other realizations. An example is given by Berger and Müller-Hannemann [6] (cf. Fig. 2.2). They consider a digraph with 3n nodes, V = {v_1, v_2, ..., v_{3n}}, where the triples v_{3i}, v_{3i+1}, v_{3i+2} form three-cycles and all nodes of a single cycle are connected to every node not in this three-cycle. Any standard edge swap within a three-cycle leads to a self-loop, so this swap is undone. Any swap of edges connecting the three-cycles either leads to the previous graph or to multiple edges between two nodes, so they are undone as well. Any swap with one of the intra-three-cycle edges and inter-three-cycle edges leads to multiple edges. Thus, apparently only one graph exists, even though it is obvious that the direction inside a three-cycle could be swapped.
A possible three-edge swap, i.e., a reorientation of a three-cycle, is introduced to solve this problem. They prove that it is possible to reach all possible realizations of a degree sequence with the additional three-edge swaps and give details on how to recognize a degree sequence that allows using the simple two-swap algorithm (for more details, see [6]).

Figure 2.2: A digraph for which the standard switching algorithm (two-edge swaps) is not sufficient to find any other realization, even though there exist many more. Taken from [6].

Of significant interest for this algorithm and its variants is the large number of steps mentioned in the description of the algorithm, written as O(c · m). This constant was first introduced by Alon et al. [63]. They provided results of an experiment in which they measured a property of a network. The experiment was based on several thousand samples with a small constant c; they increased c and tested whether the average of the measure on the graphs changed, i.e., they tested when it stabilized. They found that a suitable constant would be 100, even though their experiment indicated stable values much earlier. Viger and Latapy found experimentally that for generating connected graphs O(m · log(n)) time would be sufficient [96]. The proof is still missing. This is not a constant, but it provides an estimate for each graph. Ray et al. [76] found experimentally that values of ε < 5 · 10⁻³ are sufficient if c = ln(1/ε), i.e., 1 ≤ c ≤ 15. For this, they took several thousand samples of very different graphs with different c and evaluated some measures on the samples. Afterward, they compared the distributions of the measures. They found that the measures result in the same distribution for all but very small c. Thus, they suggest a c of 5 to 10. Since the samples are drawn from a vast set of possible graphs, the stability of the results gives confidence in this approach.
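The two-edge swap procedure with rejection can be sketched as follows for undirected graphs (a minimal sketch; the constant c and all names are illustrative):

```python
import random

def switch_edges(edges, c=10, rng=random):
    """Randomize a simple undirected graph by repeated two-edge swaps.

    Performs c*m attempted swaps; swaps that would create a self-loop or a
    multiple edge are rejected, which keeps the graph simple and the degree
    sequence fixed.
    """
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    m = len(edges)
    for _ in range(c * m):
        i, j = rng.randrange(m), rng.randrange(m)
        (u, v), (w, x) = edges[i], edges[j]
        # One of the two possible swaps, chosen with probability 1/2 each.
        if rng.random() < 0.5:
            e1, e2 = tuple(sorted((u, x))), tuple(sorted((w, v)))
        else:
            e1, e2 = tuple(sorted((u, w))), tuple(sorted((v, x)))
        # Reject self-loops and multiple edges ("undo" by never applying).
        if e1[0] == e1[1] or e2[0] == e2[1]:
            continue
        if e1 in edge_set or e2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {e1, e2}
        edges[i], edges[j] = e1, e2
    return edges
```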
In a Bachelor thesis [79], it was tested whether this constant depends on the degree sequence. For this, a decreasing out-degree sequence was generated and kept constant, the in-degree sequence being once correlated, once un-correlated, and once anti-correlated. A “meta-graph” was generated in the following way: starting from a connected graph as a node, all possible edge swaps were denoted; if an edge swap resulted in the original graph, it was denoted as a self-loop, otherwise it was an edge to another graph (i.e., node in the meta-graph). For all new nodes the same procedure was performed. After a time, no new graphs were found and this process terminated. Using a PageRank-like algorithm it was possible to find an approximation of c for the original graph such that a repeated evaluation of a measure with c·m swaps yielded the same result as m·log(m) swaps. Interestingly, even though all three bi-degree sequences yielded graphs with the same number of edges, the constant needed to generate all possible connected graphs was different for all degree sequences. It was also much lower than 100, even though this would have been sufficient.

Another interesting result regarding the mixing time and the number of samples was discovered by Brugger et al. [14]. They tested on bipartite graphs whether it is possible to reduce the number of samples taken and whether it is possible to use only a subgraph instead of the whole graph to reach sufficient results. The tests performed were based on the Netflix-dataset and the MovieLens-dataset, i.e., movie-ratings of users. The assumption of these tests is that sequels have a higher number of common neighbors, i.e., their co-occurrence is higher. Brugger et al. calculate p-values denoting those pairs that have a co-occurrence at least as large as the original, i.e.,

p(x, y) = ∑_{i=1}^{s} 1_{coocc_i(x,y) ≥ coocc(x,y)}.

Based on the p-value they calculated the positive predictive value at k, PPV_k, where k indicates the number of pairs of films in the ground-truth (a verified set of edges in a graph that also contains noisy data). The PPV is the fraction of correctly identified pairs from the ground truth in the set of the k highest ranked pairs of films [107]. This is similar to the precision at k known from information retrieval [59, Ch. 8]. Their results indicate that after a few hundred generated samples the change in the PPV_k peaks and does not change significantly after this, no matter how many more samples are drawn. Additionally, they were able to reduce the number of swaps to be taken from 10⁹ to 10⁶, which is equal to a reduction from m · log(m) to m · ln(1/ε) with ε ∼ 4.5 · 10⁻⁵ ≈ 10⁻⁵, as shown by Ray [76]. This algorithm received much attention for generating graphs, statistical analysis, and tuning of the parameters that are important for its runtime. Still, there is no proof that c ≈ 10 is a large enough factor for all graphs. Therefore, another algorithm that generates only simple graphs is presented.

2.1.4 Sequential Importance Sampling

Another algorithm to generate undirected simple graphs was developed by Blitzstein and Diaconis [8]. Later on, a very similar algorithm for directed graphs was developed by Del Genio et al. [49], who mentioned an unpublished algorithm by Blitzstein and Diaconis for the same purpose with almost the same approach. The sequential importance sampling algorithm proceeds as follows, based on a degree sequence DS.

1. Let E be an empty list of edges.
2. While DS contains non-zero elements:
   a) Choose the least i with k_i a minimal positive entry.
   b) While k_i > 0:
      i. Compute a candidate list
         J = {j ≠ i | {i, j} ∉ E ∧ {k_1, k_2, ..., k_i − 1, ..., k_j − 1, ..., k_n} is graphical}. (2.2)
      ii. Pick j ∈ J with probability proportional to its degree.
      iii. Add {i, j} to E and update DS.
3. Return E.

Figure 2.3: Intermediate step in the graph generating process of the sequential importance sampling.

The test whether the remaining sequence is graphical is not used in any of the previously presented algorithms, but the Havel-Hakimi test for graphicality needs only O(n) time in an elegant implementation. Additionally, there are ∑_{i=1}^{n} k_i candidate lists to generate. Since this is done for each node as often as its degree, the overall time is O(n² ∑_{i=1}^{n} k_i) = O(m · n²). This approach has some disadvantages, which are mitigated by a further feature of the algorithm. The major disadvantage is that it does not sample uniformly at random from the family of all simple graphs. As an example, consider the graph partially given in Figure 2.3. The probability that any of the nodes in the middle is chosen is the same, and the resulting graph does not look any different independent of which one is chosen first. Moreover, more than one sequence of created edges can lead to the same graph. While the overall graph is the same, the sequence constructing it is not, which is usually not traceable. Simultaneously to choosing a neighbor, the algorithm calculates the probability of choosing this node. This probability is calculated for all edges constructed and gives the overall probability of a graph and its construction sequence. Calculating the probability to get a graph helps when estimating the expected value of a measure. According to Blitzstein and Diaconis [8] the following proposition holds, where σ(Y) is the probability that a graph Y is constructed by a given sequence, and c(Y) = ∏_{k=1}^{m} (d_{i_k} − 1)! is assumed as the number of construction sequences that lead to the same graph Y:

Proposition 1 (Blitzstein [8]). Let π be a probability distribution on G_{n,d} and G be a random graph according to π. Let Y be a sequence of edges distributed according to σ. Then

E[ (π(Y) / (c(Y) σ(Y))) f(Y) ] = E f(G).

In particular, for Y_1, ...
, Y_N, the output sequences of N independent runs of the algorithm,

μ̂ = (1/N) ∑_{i=1}^{N} (π(Y_i) / (c(Y_i) σ(Y_i))) f(Y_i)

is an unbiased estimator of E f(G). With this, it is sufficient to take fewer samples to get a good estimate of the measure f one is looking for. Computing the above equation with f being a constant function yields the approximate number of graphs with the given degree sequence, i.e.,

E[ 1 / (c(Y) σ(Y)) ] = |G_{n,d}|. (2.3)

It is important to note that Blitzstein and Diaconis [8] also consider another probability to choose neighbors instead of the degree-based choice. They suggest that a uniform choice on the set of possible neighbors may be used, but the difference between these two probabilities when choosing a neighbor has not been evaluated, to the best of my knowledge. Almost the same algorithm is used to generate directed graphs. The order of the elements in the degree sequence is a bit more complicated. For this, the definition of normal order is necessary.

Definition 2 (Normal Order [49]): A degree sequence D = {(k_1, j_1), (k_2, j_2), ..., (k_n, j_n)} is in normal order if and only if ∀u < v, u, v ∈ N : k_u > k_v ∨ (k_u = k_v ∧ j_u > j_v).

The degree sequence has to be in normal order before a node is chosen. The node that is chosen is the lowest-index node u with non-zero (residual) degree. This algorithm chooses uniformly at random from the set of possible neighbors, connects the two nodes with a directed edge, and brings the residual degree sequence into normal order again [49]. Besides that, the algorithm has the same advantages as for undirected graphs, with a worst-case runtime complexity of O(n³). These algorithms are well-known in the field of network analysis. The configuration model has several variants. They are much more elaborate and restrict the space of possible outcomes, not necessarily preventing multiple edges or self-loops.
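For the undirected case, the sampling loop can be sketched as follows (a minimal sketch with an Erdős-Gallai graphicality test standing in for the Havel-Hakimi test; all names are mine, and the probability bookkeeping for c(Y) and σ(Y) is omitted):

```python
import random

def is_graphical(degrees):
    """Erdős-Gallai test: does a simple graph with this degree sequence exist?"""
    seq = sorted(degrees, reverse=True)
    if any(d < 0 for d in seq) or sum(seq) % 2:
        return False
    n = len(seq)
    for r in range(1, n + 1):
        lhs = sum(seq[:r])
        rhs = r * (r - 1) + sum(min(d, r) for d in seq[r:])
        if lhs > rhs:
            return False
    return True

def sequential_importance_sample(degrees, rng=random):
    """Build one simple graph edge by edge, keeping the residual sequence graphical."""
    ds = list(degrees)
    edges = set()
    while any(ds):
        # least index with a minimal positive residual degree
        i = min((v for v in range(len(ds)) if ds[v] > 0), key=lambda v: ds[v])
        while ds[i] > 0:
            candidates = []
            for j in range(len(ds)):
                if j == i or ds[j] == 0 or tuple(sorted((i, j))) in edges:
                    continue
                trial = list(ds)
                trial[i] -= 1
                trial[j] -= 1
                if is_graphical(trial):
                    candidates.append(j)
            # choose a neighbor with probability proportional to its residual degree
            j = rng.choices(candidates, weights=[ds[j] for j in candidates])[0]
            edges.add(tuple(sorted((i, j))))
            ds[i] -= 1
            ds[j] -= 1
    return sorted(edges)
```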
Moreover, sometimes very similar models have been developed, or even the same model has been developed without being attributed to the configuration model (e.g., [83]).

3

MATHEMATICAL TOOLKIT

Up to this point, several null models based on algorithms have been introduced. It remains to discuss how to compare a null model to a graph and, more importantly, how to compare null models with each other. The first part is simple:

1. Calculate the measure of interest in the real-world graph;
2. Calculate the measure of interest in a large enough number of samples drawn from an appropriate null model;
3. Calculate with an appropriate test method whether the result in the real-world graph is very different from the results of the samples.

The last step is often done with a simple z-test, even though it is most of the time unknown whether the underlying distribution is normal [63, 64]. Still, in most cases this test is enough to assess statistical significance. Alternatively, Student’s t-statistic is also useful when only a sample is taken and the mean and standard deviation are only approximated [12]. Now, how to compare two null models? As a reminder, null models are families of graphs together with a probability distribution of the occurrence of each graph. An alternative way to describe a null model is by defining an algorithm that produces graphs from a set of graphs with some probability. These probability distributions are of interest. The switching algorithm uses a Monte-Carlo Markov chain that is symmetric, aperiodic, and irreducible. The states of the Markov chain are the graphs in the family of graphs, and since the properties are all accounted for [6, 88], the probability to reach any graph of the family is uniform. The configuration model is more problematic: due to the tendency to build self-loops and multiple edges [44, 45], some graphs may have higher probabilities to be generated, because they are multigraphs. Milo et al.
tested once, for a toy example, whether the modified configuration model with rejection would produce all graphs uniformly [63]. It did not produce the graphs uniformly at random. Thus, the probability distribution of the configuration model with rejection is unknown, and the standard configuration model does not fare better. For the sequential importance sampling, it is already known that it does not produce graphs uniformly at random. Luckily, the algorithm provides information on the probability to draw the resulting graph in exactly this sequence, which allows for a simpler calculation of a weighted mean of measures taken on graphs sampled with the sequential importance sampling. Recognizing that the null models either do not choose from the same family of graphs (multigraphs vs. simple graphs), or that they do not do so with the same probability distribution, leads again to the question: how to compare different null models?

3.1 statistical analysis of differences in results

A standard way to test whether two populations (families of graphs) share the same mean value in some measure is the two-sample t-statistic,

t = ((x̄_1 − x̄_2) − (μ_1 − μ_2)) / sqrt(s_1²/n_1 + s_2²/n_2),

where t is distributed according to Student’s t-distribution, x̄_1, x̄_2 are the sample means, μ_1, μ_2 are the population means, s_1, s_2 are the sample standard deviations, and n_1, n_2 are the respective sample sizes. In the limit of large numbers (n_1 > 30, n_2 > 30), it is valid to use the two-sample z-statistic

z = ((x̄_1 − x̄_2) − (μ_1 − μ_2)) / sqrt(σ_1²/n_1 + σ_2²/n_2). (3.1)

If the population variances are not known, which they practically never are for a family of graphs, sample variances are used. Since it is tested whether the null models coincide in a measure, the assumption is that μ_1 = μ_2, i.e., μ_1 − μ_2 = 0. Both of these statistics assume that the measure of interest is distributed according to a normal distribution N(μ, σ).
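Equation 3.1 under the null hypothesis μ_1 − μ_2 = 0, with sample variances substituted for the unknown population variances, can be sketched as (the function name is mine):

```python
import math
import statistics

def two_sample_z(sample1, sample2):
    """Two-sample z-statistic under the null hypothesis mu1 - mu2 = 0,
    with sample variances substituted for the population variances."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
```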
Since this is not guaranteed, the Kolmogorov-Smirnov two-sample test is applied as well. The exact explanation of this test can be found in, e.g., [12, 81]. Here a simpler but much more illustrative explanation is provided. The simplest explanation is visual: the cumulative distribution functions of the results of the measures in each null model are taken, and the largest difference between them is the test result D_{n_1,n_2}, where n_1 and n_2 are the respective sample sizes. In Fig. 3.1, this is visualized with some random numbers taken from two binomial distributions with the same mean but different standard deviations. This value is then compared to fixed values c(α); whenever c(α) · sqrt((n_1 + n_2)/(n_1 n_2)) is larger than D_{n_1,n_2}, it is assumed plausible that the two samples are from the same distribution. Naturally, it is possible to calculate a p-value

p = 2 min( Pr(D_{n_1,n_2} ≤ c(α) sqrt((n_1 + n_2)/(n_1 n_2))), Pr(D_{n_1,n_2} ≥ c(α) sqrt((n_1 + n_2)/(n_1 n_2))) ), (3.2)

where Pr(X ≤ x) = 1 − 2 ∑_{i≥1} (−1)^{i−1} exp(−2i²x²) [12]. This value allows an estimate of how likely it is that the two distributions are the same. These tests suffice for statistical analysis and will be used throughout the thesis. Another measure of interest is the so-called skewness of the degree sequence. Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. For a degree sequence, this implies that the degree distribution of the sequence is analyzed based on Pearson’s skewness coefficient,

γ_1 = E[((X − μ)/σ)³] = E[(X − μ)³] / E[(X − μ)²]^{1.5}. (3.3)

Distributions such as the Poisson distribution have very low skewness (μ^{−0.5}), while many real-world networks show higher skewness values [81].

Figure 3.1: Two cumulative distribution functions of samples are computed, and the maximum difference between these (marked with a red vertical line) is the result D_{n_1,n_2} of the Kolmogorov-Smirnov two-sample test.
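The statistic D_{n_1,n_2} can be sketched as follows (a minimal quadratic-time illustration; the function name is mine):

```python
def ks_statistic(sample1, sample2):
    """Maximum difference between the two empirical CDFs (D_{n1,n2})."""
    n1, n2 = len(sample1), len(sample2)
    points = sorted(set(sample1) | set(sample2))
    d = 0.0
    for x in points:
        # empirical CDFs of both samples, evaluated at x
        cdf1 = sum(1 for v in sample1 if v <= x) / n1
        cdf2 = sum(1 for v in sample2 if v <= x) / n2
        d = max(d, abs(cdf1 - cdf2))
    return d
```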
Besides statistically testing whether another graph model is appropriate, there is one model, not discussed above, that is based on several simple assumptions. As some of the algorithms take a rather long time to calculate measures, and the sampling of random graphs is also costly, researchers always strive to improve algorithms, parallelize them, or develop new approaches to old problems such that faster solutions can be applied. It is similar in physics, and it is not surprising that physicists developed some interesting ideas. A very prominent example based on very simple assumptions is explained in the following.

3.2 Simple Independence Model

The simple independence model (sim), as it was termed by Zweig [107], is based on a series of very simple assumptions. Consider two nodes $u, v$ in a graph, with degrees $k_u$, $k_v$ respectively. The probability that node $u$ connects to node $v$ in the configuration graph model is calculated as follows. There are $2m$ stubs in total, or $2m - 1$ after subtracting the one stub of $u$ that is looking for a stub to connect to. Of those $2m - 1$ stubs, exactly $k_v$ belong to node $v$. Since there are $k_u$ stubs coming from node $u$ and each is equally likely to connect to node $v$, the probability of a connection between $u$ and $v$ is equal to

$$p_{u,v} = \frac{k_u k_v}{2m-1}. \quad (3.4)$$

Strictly speaking, this is the sum over the single events of connecting individual stubs of $u$ to $v$, and therefore an expected value. Newman writes in [66] that for large $m$ this term converges to a probability and that it is sufficient to use

$$p_{u,v} = \frac{k_u k_v}{2m}. \quad (3.5)$$

This equation can also be found in the Chung-Lu model [17]. Based on this simple expression, several equations were developed to calculate measures that are quite common in graph theory and network analysis. It is often applied without considering the graphs Newman had in mind while developing this equation.
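Eqs. (3.4) and (3.5) differ only in the denominator; a one-line sketch (a hypothetical helper for illustration) also makes it easy to see when the "probability" exceeds 1 and must be read as an expected value:

```python
def sim_edge_probability(k_u, k_v, m, large_m=True):
    """Expected number of edges between u and v under the simple independence
    model: k_u*k_v/(2m) as in Eq. (3.5), or k_u*k_v/(2m - 1) as in Eq. (3.4)
    when the exact stub count is kept."""
    return k_u * k_v / (2 * m if large_m else 2 * m - 1)
```

For high degrees relative to $m$, the value can exceed 1, which is exactly the situation where the result is an expected value rather than a probability.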
Newman developed his equations with the random graph, i.e., the ER-model, in mind [66, p. 420]. He states that "Although the random graph is, as we have said, not an accurate model of most real-world networks, this is, nonetheless, believed to be the basic mechanism behind the small world effect in most networks [...]" [66, p. 420]. While this part was focused on the diameter, this statement can be found in other places in his work as well. Still, some researchers apply the equations and state their results without investigating whether the results are valid, i.e., whether the equations are applicable to the given graph. Chung and Lu [17] state that if $\forall v \in V: k_v^2 < \sum_{v \in V} k_v$, equation 3.5 lies between 0 and 1 for all node pairs. Whenever this condition is not met, the results of equation 3.5 have to be considered as expected values. Still, for some measures the model is applicable, as is shown in this thesis.

With the probability for an edge, some additional calculations can be made. Assume that there is an edge between nodes $u$ and $v$. Since sim assumes simple independence, there is nothing preventing a second edge between the same nodes. The probability of a single second edge can thus be calculated as

$$\frac{k_u k_v}{2m}\,\frac{(k_u-1)(k_v-1)}{2m}. \quad (3.6)$$

Summing this over all node pairs is then equal to

$$\frac{1}{2}\sum_{u,v}\frac{k_u k_v}{2m}\,\frac{(k_u-1)(k_v-1)}{2m} = \frac{1}{2(2m)^2}\sum_u k_u(k_u-1)\sum_v k_v(k_v-1) \quad (3.7)$$
$$\overset{2m=n\langle k\rangle}{=} \frac{1}{2\langle k\rangle^2 n^2}\sum_u k_u(k_u-1)\sum_v k_v(k_v-1) \quad (3.8)$$
$$= \frac{1}{2\langle k\rangle^2 n^2}\sum_u\left(k_u^2 - k_u\right)\sum_v\left(k_v^2 - k_v\right) \quad (3.9)$$
$$\overset{\frac{1}{n}\sum_{v\in V} k_v^i = \langle k^i\rangle}{=} \frac{1}{2\langle k\rangle^2 n^2}\, n^2\left(\langle k^2\rangle - \langle k\rangle\right)^2 \quad (3.10)$$
$$= \frac{1}{2}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2. \quad (3.11)$$

Moreover, this also allows a quick estimation of how many self-loops a node might have. Considering the probability of an edge from node $u$ to node $v$, it is a simple change from $v$ to $u$.
Since one of the stubs is gone, the second parameter is reduced by one, resulting in

$$p_{u,u} = \frac{k_u(k_u-1)}{2m}. \quad (3.12)$$

The expected number of self-loops for a graph can then be calculated very easily as

$$\frac{1}{2}\sum_{u\in V}\frac{k_u(k_u-1)}{2m} = \frac{\langle k^2\rangle - \langle k\rangle}{2\langle k\rangle}, \quad (3.13)$$

where the factor $\frac{1}{2}$ compensates for counting each pair of stubs of $u$ twice. In social network analysis, these equations are not of much use. Social network analysis is concerned with graphs that do not permit multiple edges, as long as the underlying graph is not multiplex. Neither should a standard social network contain any self-loops, since in most cases they do not provide any information. There may be exceptions where self-loops provide information, but those are rare. Nevertheless, the equations can tell the expected number of self-loops and multiple edges the configuration model develops. What the equations do not provide is information about where those multiple edges and self-loops are attached. Other equations are introduced when necessary, to avoid unnecessary repetition.

3.3 Critique on the Applicability of sim to All Graphs

While the simple independence model seems to allow for the calculation of expected values, the applicability of the model should be discussed. The model and the equations used therein were developed with two premises that are either not explicitly stated or not stated at all. The model that was to be approximated was, initially, the ER-model or the $G(n, m)$ model. Nowadays, almost any graph considered for research purposes is said to have a degree distribution following a power-law, being close to a power-law, or having at least a skewed degree distribution [18]. Newman used his equations on graphs with skewed degree distributions as well. For this purpose, other properties of the degree sequence should be checked before using the simple independence model, such as $\forall v \in V: k_v^2 < \sum_{v\in V} k_v$, as stated by Chung and Lu [17]. Otherwise, results may be hard to interpret or wrong (such as probabilities larger than 1). But there are other caveats as well.
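The expected numbers of multi-edges (Eq. 3.11) and self-loops (Eq. 3.13) derived above depend only on the first two moments of the degree sequence; a minimal sketch computing both from a degree list (a hypothetical helper for illustration):

```python
def expected_multiedges_and_selfloops(degrees):
    """Expected numbers of multi-edges (Eq. 3.11) and self-loops (Eq. 3.13)
    in the configuration model, from the moments of the degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n                 # <k>
    k2 = sum(k * k for k in degrees) / n  # <k^2>
    multi = 0.5 * ((k2 - k1) / k1) ** 2
    loops = (k2 - k1) / (2 * k1)
    return multi, loops
```

For a skewed degree sequence, $\langle k^2\rangle$ grows much faster than $\langle k\rangle$, so both expectations grow accordingly.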
As an example, we consider two different applications of the simple independence model that should be undertaken only with utmost care.

3.3.1 Example 1

The first example is a measure called assortativity. The idea is to have a value between $-1$ and $1$ that says something about the mixing structure of the graph. For example, assortativity can be calculated based on scalar values assigned to people and the scalar values associated with their neighbors [67]. When people choose other people with scalar attributes similar to their own, assortativity is high; when they choose otherwise, the assortativity is negative. We can also speak of a disassortative behavior of groups. This is investigated in the following based on the degree.

Figure 3.2: Example 1 (a star graph with center node c and leaves 1 to 10)

Consider the graph given in Fig. 3.2. It is easy to see that this is the only possible realization of the graph without any self-loops or multiple edges; in other words, it is the only simple graph possible. Calculating the assortativity of a graph like this is not necessary, since there are no other realizations of a degree sequence like this. Still, applying the equation for degree assortativity,

$$r = \frac{\sum_{i,j}\left(A_{i,j} - \frac{k_ik_j}{2m}\right)k_ik_j}{\sum_{i,j}\left(k_i\delta_{i,j} - \frac{k_ik_j}{2m}\right)k_ik_j},$$

to this graph yields a value of $-1$, which at first glance seems reasonable: the low degree nodes are connected to the high degree node, so the graph is as disassortative as possible. The graph is compared to the "most assortative realization" possible, i.e., the divisor in the equation symbolizes a graph in which each high degree node is connected to a node of equal degree, and each low degree node is connected to a node of equal degree. For the graph in Fig. 3.2, this would imply that node c is connected to itself, and there are five two-cliques.

Figure 3.3: Two graphs (a) and (b) generated from the same degree sequence with different modularity scores.
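The value $-1$ can be checked quickly: the degree assortativity is equivalently the Pearson correlation of the degrees at the two ends of each edge. A minimal sketch (the star mirrors Fig. 3.2, assuming ten leaves around the center):

```python
from math import sqrt

def degree_assortativity(edges):
    """Degree assortativity as the Pearson correlation of the degrees at the
    two ends of each edge, counting every undirected edge in both directions."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    var_x = sum(x * x for x in xs) / n - mx * mx
    var_y = sum(y * y for y in ys) / n - my * my
    return cov / sqrt(var_x * var_y)

# a star as in Fig. 3.2: center node "c" with ten leaves
star = [("c", i) for i in range(1, 11)]
```

Every edge of the star pairs the maximal degree with the minimal degree, so the correlation, and with it $r$, is exactly $-1$.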
But first of all, that is not a simple graph; second, the assumption that any node can connect to another node of the same degree is just not met in all networks. More likely, this assumption is not met for all nodes in any real-world graph: the probability that a graph has enough nodes of each degree such that they connect only to each other is very low. That assessment is based on recent research showing that almost all graphs used in research either show power-law-like distributions or distributions with far fewer high degree nodes than low or average degree nodes, i.e., they have skewed degree distributions [18].

3.3.2 Example 2

As another example, consider the modularity, defined as

$$Q = \frac{1}{2m}\sum_{i,j}\left(A_{i,j} - \frac{k_ik_j}{2m}\right)\delta(c_i, c_j),$$

where $c_i$, $c_j$ are the identifiers of the dense groups, called communities, that nodes $i$ and $j$ belong to. This is said to be a measure of the extent to which like is connected to like in a network [66]. It takes values between $-1$ and $1$, positive whenever there are more edges between vertices of a group than expected, negative when not. It is similar to assortativity in the sense that, by dividing by the maximum of the modularity, another assortativity coefficient can be calculated (for more detail see [66, p. 224f.]). Still, modularity is another measure that is often applied in network analysis, but rarely is it explicitly considered whether it makes sense to use the equation at all.

In Fig. 3.3 two graphs are displayed that share a very simple degree sequence: five nodes with a degree of 4 and 20 nodes with a degree of 1. In Fig. 3.3a we have a central clique surrounded by dyads, while in Fig. 3.3b five four-stars are displayed. By the claim of Newman [66], modularity should be higher for graphs in which more similar nodes are connected, i.e., more nodes of a group. But as a calculation of the modularity shows, the complete graph in Subfig.
3.3a (all components) has a modularity of 0.725, while the star-graphs in Subfig. 3.3b (all components) have a modularity of 0.8, whereas a single star-graph has a modularity of 0. Of course, when each dyad is considered as its own community $c_j$ and each star graph as its own community $c_j$, the result is more reasonable, yet still irritating when the five-clique in Fig. 3.3a is considered. In the paper by Newman and Girvan, modularity is explained as the difference between the observed number of edges inside a community and the expected number of edges within it when edges are placed at random [68]. Implicitly, the simple independence model is used here, which directly implies that self-loops as well as multiple edges are possible. There are attempts to calculate a fairer modularity score [15] that compensates for loops, or for loops and multiple edges, but we have seen no paper other than [15] using this approach.

Part III

COMPARISON OF THE DIFFERENT NULL MODELS BASED ON ARTIFICIAL AND REAL-WORLD UNDIRECTED GRAPHS

In the following, some research based on artificial and real-world graphs is presented. The analysis starts with artificial graphs, using a graph from the simple $G(n, m)$ model, and it can be observed that the null models show very similar results for unskewed degree distributions. But the more skewed the degree distribution used for the artificial graphs is, i.e., the more similar the degree sequence becomes to today's real-world graphs, the more the results of different null models differ. This effect can be observed even more clearly in the analysis of the real degree sequences. The differences sometimes seem minuscule, but when network analysis is applied in real life, these differences can be of great significance. Consider as an example the television show "Numb3rs", in which the young Professor Charles Eppes sometimes applies network analysis to spot the culprit.
In one of the episodes, it is mentioned that Professor Eppes compared the development of friendships to random Apollonian networks, which is not necessarily reasonable due to the construction process of a random Apollonian network. Something along these lines is mentioned by another character in the television show, stated as "some of the math is a little subjective". In another episode, temporal network analysis is performed, in which sequences of graphs are analyzed to find people who have moved out of the focus of a group, in order to find the culprit. But, as with most television shows, this does not work well, and they miss the evil-doer until the very end. Even though this is just a television show, some parts of network analysis have found their way into reality and are applied in useful ways. Some criminals get caught because their closer social network is observed via Facebook and people upload pictures without concern. Both the NSA and the DoD use social network analysis, though to an unknown extent [53]. Other papers concerned with terrorist networks or political networks have been published in the last decade (e.g., [74, 82, 97, 16]). Seldom are analyses more in-depth than the standard centralities applied to find important persons. But centrality scores can sometimes be artifacts caused by the density or some structural property of the graph. By analyzing random graphs with the same degree sequence, it is possible to detect artifacts like this when the random graphs show the same or similar results. Most of the work done in this part was done in collaboration with Katharina Zweig, who had the initial idea to compare measures on different null models to see whether null models yield the same range of results. This was inspired by her work on bipartite graphs, in which she discovered strong discrepancies between two different null models. The paper that originated from this is "Different flavors of randomness—which graph model to use to assess statistical significance".
This paper was written in collaboration with Katharina Zweig and Emöke-Ágnes Horvát, with great influence by Katharina Zweig, while Emöke-Ágnes Horvát provided helpful insights and restructured much of the text.

Synopsis

First, it is important to note that different null models can result in the same average values, but that they do not have to. This effect seems to be strongly dependent on the degree sequence, but that is not provable up to now. Afterward, a discussion of how the sequential importance sampling by Blitzstein and Diaconis should choose neighbors is used to lead into a discussion of the measures that were applied in "Different flavors of randomness" [81]. While some of the results are from this paper, all discussed material that references the sequential importance sampling is new. The conclusion did not change much from "Different flavors of randomness"; some null models have side effects that influence the analysis, like multiple edges between nodes or a preference for high degree partners when constructing a graph from scratch.

In the following we will use shorthand notation for the different graph models:

cfg Configuration model
ecfg Erased configuration model
fdsm Fixed degree sequence model with the swapping algorithm
sis Sequential importance sampling
dsis sis with degree-based choice from the set of possible neighbors
usis sis with uniform choice from the set of possible neighbors

4 COMPARATIVE ANALYSIS BASED ON THE MODELS

As an introduction to the topic, we use two artificially generated graphs. One is generated with the $G(n, m)$ model, the other with an algorithm provided by Leskovec et al. [55], the Forest-Fire model. While the first graph has a degree sequence that roughly follows a Poissonian distribution, the second graph's degree sequence follows a power-law distribution. We then generate random graphs based on the two artificial graphs, using the previously described algorithms.
The approach is to compare the artificial graph with samples drawn from a null model that generates graphs that are similar to a certain extent, i.e., they share the degree sequence but nothing else. In the following it is shown that the cfg is good for analyzing graphs whose degree sequence follows a Poissonian degree distribution, but for other degree distributions it is not the best approach. The measures that will be investigated are the following:

Distance and Diameter. One of the most basic questions that sparked interest in graph theory was the "Königsberger Brücken" problem, in which several parts of a city were connected by bridges over rivers. The question was "Can we devise a route that uses all of our city's seven bridges without using a bridge twice?". Nowadays, a traveler in the city of Königsberg (Kaliningrad, Russia) finds only five bridges, and it is rather more likely that the question on his mind will be "What is the shortest distance I have to go if I want to visit all bridges in Kaliningrad?" In other words, what are the distances between the bridges? To solve this, the single-source shortest-path problem is a good analogy from graph theory.

Definition 3 (Single-Source Shortest-Path problem): Given a source node $s$ in a graph $G = (V, E)$, find for every other vertex $t$ in the graph the shortest sequence of vertices $P = (s, v_1, v_2, \ldots, v_l, t) \in V^{l+2}$ such that $v_i$ is adjacent to $v_{i+1}$ for $1 \leq i < l$.

The single-source shortest-path problem is a subproblem of the all-pairs shortest-path problem, which seeks to answer the same question for all pairs of nodes. Calculating all distances between all pairs is a solved problem, but it is one of the old and very prominent problems. Already in 1959, Dijkstra solved it with the famous Dijkstra algorithm, and Bellman and Ford, Johnson, and Floyd-Warshall devised other solutions for weighted graphs, negatively weighted graphs, and others [22, 4, 47, 27]. The problem with these algorithms is the runtime.
While the single-source shortest-path problem can be solved in $O(n^2)$, the all-pairs shortest-path problem still requires $O(n^3)$ time with these algorithms. Since comparing the lengths of all shortest paths between all pairs of nodes would take a long time and would most likely yield rather inconclusive results, we compare the average distance.

Definition 4 (Average Distance): The average distance (average path length) is one of the most robust measures of graph theory. Assume that $d(v_1, v_2) = 0$ when $v_2$ cannot be reached from $v_1$; otherwise let $d(v_1, v_2)$ denote the shortest distance between $v_1$ and $v_2$. The average distance is thus

$$\frac{1}{n(n-1)}\sum_{v_i,v_j\in V,\, i\neq j} d(v_i, v_j). \quad (4.1)$$

Algorithms like Dijkstra's or Bellman-Ford's calculate the shortest distance between all pairs of nodes. If there is no viable way from node $u$ to node $v$ in a graph, the two nodes lie in different components and the distance between them is infinite per definition. For the algorithms used to calculate the distance this is not a problem, but for the diameter it is.

Definition 5 (Diameter): The diameter is the longest shortest distance between any two nodes in the graph.

Since calculating the diameter of a disconnected graph may yield ambiguous results, depending on the definition of the distance, the largest connected component is considered for the diameter.

Definition 6 (Component): A (connected) component is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the graph.

Most real-world graphs have one so-called giant component, i.e., a component that contains most nodes and most edges, while the other components tend to be so small that they do not matter much. A standard approach to find connected components is breadth-first search: label the found nodes, and continue with another label while there are other unmarked nodes.
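The component-labeling approach just described can be sketched with a plain breadth-first search (a minimal illustrative helper, not the implementation used in the thesis):

```python
from collections import deque

def connected_components(adj):
    """Label nodes component by component with breadth-first search.
    `adj` maps every node to the list of its neighbors."""
    label = {}
    current = 0
    for start in adj:
        if start in label:
            continue  # already reached from an earlier start node
        label[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = current
                    queue.append(v)
        current += 1
    return label
```

The largest component is then simply the most frequent label, and all further measures can be restricted to its nodes.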
In this work, whenever the diameter is mentioned, the diameter of the giant component is meant. Since not every algorithm that generates graphs checks whether the graph contains only one component, we compute all measures only on the largest component. This implies that neither the true diameter of a graph with more than one component is recorded ($\infty$), nor necessarily the largest diameter of all components. Consider a graph that has 98% of its nodes in one component with a diameter of 5, while the rest builds a sparse second component with a diameter of 6. Still, the diameter of 5 contains more information about the speed with which messages may be passed through the graph. A standard algorithm to calculate the diameter is the Floyd-Warshall algorithm, which solves the all-pairs shortest-path problem first; the diameter is then just the largest value. Algorithms that are faster in practice still have the same worst-case runtime [11, 19].

Average Neighbor Degree. Another classic measure is the average neighbor degree, i.e., how many neighbors the neighbors of a node have on average. For a single node this measure is calculated as

$$av(v) = \frac{1}{k_v}\sum_{u\in N(v)} k_u, \quad (4.2)$$

where $N(v)$ is the neighborhood of node $v$. This measure can be seen as an indicator of degree assortativity, i.e., when a high degree node mainly has high degree neighbors, the average neighbor degree of that node should be high. The global average neighbor degree, i.e., the average of the average neighbor degrees, is calculated as

$$AV(G) = \frac{1}{n}\sum_{v\in V} av(v) = \frac{1}{n}\sum_{v\in V}\frac{1}{k_v}\sum_{u\in N(v)} k_u. \quad (4.3)$$

According to Feld [26], the mean degree among friends is empirically at least as great as the mean among individuals, i.e., friends always seem to have more friends than the individual.

Average Clustering Coefficient. The clustering coefficient comes in two different versions: a global and a local one.
While the global clustering coefficient indicates the tendency of the nodes to build triangles, i.e., three nodes all connected to each other, the local clustering coefficient indicates the embeddedness of a single node. Embeddedness means how well the neighborhood of a node is connected, or how close the neighborhood of a node is to being a clique. The local measure is easy to calculate via

$$C_i = \frac{2\,|\{e_{j,k} \mid v_j, v_k \in N(v_i),\ e_{j,k} \in E\}|}{k_i(k_i-1)}. \quad (4.4)$$

This measure is the number of triangles that $v_i$ is a part of, divided by the number of possible triangles. We average this over all nodes in the largest component.

4.1 Erdős-Rényi Graph

The first test will show that there are graphs following a degree distribution for which it is not important which null model is chosen. The graph that is used is generated with the Erdős-Rényi graph model. The Erdős-Rényi graph model, $G(n, m)$, is similar to Gilbert's $G(n, p)$ model, especially in the context of the number of edges attached to a node, i.e., the degree sequence. In the latter, two nodes build an edge with probability $p$, which implies that there are on average $\binom{n}{2}p$ edges. Graphs from the $G(n, m)$ model are generated by choosing pairs of nodes at random and generating edges until $m$ edges exist. The graph generated with the $G(n, m)$ model used here has 7012 nodes and 78512 edges. It consists of one component, has an average degree of 22.41, shows neither assortative nor disassortative behavior, and its degree sequence follows a Poissonian degree distribution, as expected.

In Fig. 4.1 all results are shown. First, observe that the diameter is, in this case, a stable measure: it does not change based on the null model the graphs are generated with. The average distance between nodes varies only to a very small extent (comp. Fig. 4.1b).

Figure 4.1: Histograms of measures on several hundred graphs generated with the two different models, based on a graph from the $G(n, m)$ model. (a) Diameter: around 5. (b) Average distance: fdsm exactly 3.13, cfg around this value. (c) Average neighbor degree: fdsm exactly 1.096, cfg around this value. (d) Clustering coefficient: fdsm exactly 0.032, cfg around this value.

The results are very close, the only difference being that the average distance of graphs generated with the fdsm is always the same, while the average distance in graphs generated with the cfg varies to a certain but negligible degree. The other measures, i.e., the average neighbor degree and the average clustering coefficient, also show little variation in both null models. Based on this example, it is not necessary to use the fdsm, given its run-time complexity: with a run-time in $O(m \log m)$, the fdsm takes much longer than the cfg, which has a run-time in $O(m)$. The question that remains is whether this is true for all graphs, i.e., is the cfg always the algorithm to choose, or does it show problems when applied to graphs with a more skewed degree distribution?

4.2 Forest-Fire Graph

The Forest-Fire graph algorithm, developed by Leskovec et al. [55], works as follows: a new node arrives at some not further defined base graph. This new node chooses an ambassador in the existing graph that it connects to. With probability $p$, neighbors of this ambassador are added to the neighbors of the new node (i.e., they are "burnt"). This process is repeated recursively until "the fire dies", i.e., no new nodes are added to the neighborhood. Already burnt nodes cannot be burnt by the same node again. The graph generated with the Forest-Fire algorithm has 7012 nodes and 78572 edges, so in this regard it is very similar to the graph from the $G(n, m)$ model of the former section. It consists of one component, has an average degree of 22.41, has a slightly disassortative tendency ($-0.07$), and its degree sequence follows a power-law distribution with $\gamma = 1.79$.
Recall that the cfg connects stubs of nodes uniformly at random. The higher the degrees of a pair of nodes, the more likely it is that they will have at least one, and possibly several, edges between them. Thus, this power-law distribution will change the analysis based on the cfg, since there will be multiple edges and self-loops attached to the high degree nodes that do not contribute to the analysis (for more on this see Section 7.3). Several points can be observed. First, observe that the graphs generated with the fdsm have, in each panel of Fig. 4.2, an accumulation point that contains about 40% of all results. The results of the measures at this point are very close to the measures taken on the original graph, which indicates that the fdsm changed the graph only marginally or that the edge swaps lead back to the original artificial Forest-Fire graph. Second, observe that the results only overlap in Fig. 4.2a, i.e., only the diameter is similar in graphs sampled with the two different models. For all other measures, there is a rather stark contrast between the graphs generated with the cfg and the graphs generated with the fdsm. Since this experiment is based on an artificial graph, this is not explained in more detail, but it is important to stress that the models do not necessarily yield the same results. In the following section the sis is explored more in depth, since neither the description of the algorithm given by Blitzstein and Diaconis [8] nor the description by Del Genio et al. [21] investigates a rather important detail. This detail is studied more in the following.

Figure 4.2: Histograms of measures ((a) diameter, (b) average distance, (c) average neighbor degree, (d) clustering coefficient) based on several hundred graphs generated with the two different models, based on a graph from the Forest-Fire model.
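The stub-pairing step of the cfg that produces these multi-edges and self-loops can be sketched as follows (a minimal illustrative version with a hypothetical skewed degree sequence, not the implementation used in the thesis):

```python
import random
from collections import Counter

def configuration_model(degrees, seed=0):
    """Pair stubs uniformly at random; multi-edges and self-loops may occur."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    random.Random(seed).shuffle(stubs)
    return [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]

# two high degree nodes among twenty degree-one nodes
degrees = [10, 10] + [1] * 20
edges = configuration_model(degrees)
counts = Counter(edges)
self_loops = sum(c for e, c in counts.items() if e[0] == e[1])
multi_edges = sum(c - 1 for c in counts.values() if c > 1)
```

The degree sequence is preserved exactly (a self-loop contributes 2 to its node's degree), but nothing prevents the two high degree nodes from being joined by several edges, which is precisely the effect that distorts the analysis of skewed degree sequences.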
5 SEQUENTIAL IMPORTANCE SAMPLING—WHICH PROBABILITY DISTRIBUTION TO USE?

As described in Section 2.1.4, sis chooses the next neighbor based on the remaining degrees of the nodes in the set of possible neighbors. Blitzstein and Diaconis [8] additionally suggested using a uniform distribution, a discussion that was not taken up by Del Genio et al. [21]. Still, to the best of my knowledge, an analysis of the results that the different distributions yield has not been performed. Therefore, an investigation of the results based on the two different distributions is performed throughout the thesis. Here, a short preview based on the degree assortativity is given. Even though the degree-assortativity coefficient was described in Section 3.3 as a somewhat strange measure, due to the comparison with some assumed perfect graph, it indicates to a certain degree the connectivity between nodes. Since this section is only about the probability distribution used to choose the neighbor of a node, it should be sufficient to compare the degree-assortativity coefficient of graphs generated with cfg, ecfg, and the fdsm with graphs generated with the sis using the different probability distributions. The comparison is done once with the moving average

$$\mu_{av}(j) = \frac{1}{j}\sum_{i=1}^{j} \mathrm{assortativity}(G_i), \quad (5.1)$$

where $G_i$ is the $i$-th generated graph based on some null model. After a certain point there will not be much change in this average, indicating that most graphs show similar assortativity values. Since the graphs generated with the fdsm are considered as ground truth, the closer the moving average of the degree assortativity of any model is to that of the fdsm, the more likely it can be assumed that other analyses will show similar results. As a second comparison, histograms of the degree assortativity are used that show the distribution of the degree assortativity more clearly. The histograms will also show that the change in assortativity is rather small.
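Eq. (5.1) is simply a running mean over the sequence of sampled graphs; a minimal sketch over a list of precomputed assortativity values (a hypothetical helper for illustration):

```python
from itertools import accumulate

def moving_average(values):
    """mu_av(j) = (1/j) * sum of the first j values, for j = 1..len(values) (Eq. 5.1)."""
    return [s / j for j, s in enumerate(accumulate(values), start=1)]
```

Once the running mean flattens out, additional samples change the estimate only marginally.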
For this analysis a real-world graph is used that encodes protein-protein interactions; this graph has 418 nodes and 519 edges. In Fig. 5.1a the moving averages of the assortativity of graphs generated with sis with uniform choice of neighbors (usis), with degree-based choice of neighbors (dsis), and with cfg, ecfg, and fdsm are compared. The graphs generated with dsis show a disassortative behavior very similar to graphs generated with the cfg or the ecfg. Graphs generated with a uniform probability when choosing a neighbor, usis, are close to the graphs generated with the fdsm, such that the uniform sampling seems preferable. Comparing the distributions of the assortativity, see Fig. 5.1b, shows that the fdsm produces a broader range of results than usis, but they are very close to each other.

Figure 5.1: (a) The moving average of the degree-assortativity coefficient is plotted for graphs generated with five different algorithms; (b) histograms of the degree-assortativity coefficient for graphs generated with the different algorithms.

Overall, the ecfg, cfg, and dsis show similar results, while the fdsm and usis build another cluster of graphs similar regarding their assortativity. Still, the number of samples (1000) is small compared to the full space of possible graphs, such that it is hard to decide which probability distribution to support (recall that a smaller graph had about $10^{55}$ realizations). Since we consider the fdsm as ground truth, the uniform probability distribution for choosing neighbors seems to produce graphs more similar to those found in the fdsm. Still, testing both probability distributions with other measures will give insight into which of the two yields more likely results.
The graphs based on the non-uniform probability distribution show a lower assortativity, which indicates that more connections are between (high, low)-degree pairs than otherwise. This means that measures based on graphs sampled with this choice of probability distribution will show different results from graphs built with a uniform distribution. For example, the average neighbor degree of low degree nodes in graphs generated with dsis will be high; in graphs generated with usis the average neighbor degree should be more evenly distributed. This analysis is based on one example. Still, from the probability distribution used to choose a neighbor, together with the algorithmic description of how a graph is constructed, it is likely that there will be more edges between low degree and high degree nodes when a non-uniform probability distribution is chosen. This assumption is based on the description that the node with the lowest degree starts searching for neighbors, combined with choosing neighbors based on their degree. Therefore, based on the chosen distribution, the algorithm constructs either graphs that more closely resemble graphs built with cfg, which tends to connect any node preferentially to high degree nodes, or graphs that resemble graphs drawn from the fdsm.

6 ANALYSIS OF UNDIRECTED REAL-WORLD GRAPHS

Up to now, it was shown that different null models can influence the results of an analysis. The graphs used for the comparison of the cfg and the fdsm were artificial. On graphs with unskewed degree sequences, the faster cfg seems to be applicable without causing problems (for more details see [80]). However, on graphs with more skewed degree sequences, the results are quite different from each other, such that it is reasonable to test whether this holds for other graphs as well. Therefore, it is interesting to see how the null models perform in the analysis of real-world graphs. Before we start with the analysis of graphs, we describe the data used throughout this chapter.
6.1 datasets

There is an abundance of existing datasets, and the possibilities to extend them are almost limitless. The only real limits are the regulations of websites, social networks, and games, and the capabilities of the computer one has to work with. For example, the complete social network of Facebook would be a great dataset to access, and it might even be possible to crawl it. Jernigan and Mistree crawled Facebook to gather all members of the MIT subgraph [46]. They did this automatically with a self-written crawler, "Arachne". Even though they did nothing to prevent this crawling from being detected, they were not stopped, although Facebook stated that they had a monitoring system to detect "misuse". It is not clear whether this activity was observed and classified as "not misuse" or whether it simply was not detected. Edunov et al. calculated, based on some approximative measures, average distances for 1.6 billion users of Facebook [23]. However, they work for Facebook, so the problem of acquiring data is smaller. The datasets used are from different fields of science:

A The network of email contacts provided in the Enron corpus (Email-Enron) [50, 56], four Facebook networks between students of Georgetown, Princeton, Oklahoma, and Caltech from Traud et al. [92], a Facebook excerpt by McAuley and Leskovec [60] (FacebookML), and a Facebook-like social network compiled by Opsahl [73] are examples of social networks;

B Networks between arXiv authors of the Condensed Matter, High Energy Physics - Phenomenology (High Energy Physics), General Relativity, Astro-Physics, and High Energy Physics - Theory (High Energy Physics Theory) categories are examples of collaboration networks;1

1 http://www.snap.edu/data/index.html/canets

C The Saccharomyces cerevisiae (S. cerevisiae) transcriptional regulation network, the Caenorhabditis elegans2 (C. elegans) neural network, the human-disease interaction network used by Goh et al.
[30] (Diseases), and the Escherichia coli transcriptional regulation network (E. coli)3 are examples of biological networks.

Transcriptional regulation networks are in general directed, but since this analysis is only interested in the effect of graph generation algorithms on the measures, we discard the direction between nodes. Some general information about the graphs can be found in Table 6.1; additionally, the results of the measures applied in the following are displayed there as well.

2 http://www-personal.umich.edu/~mejn/netdata/
3 http://www.weizman.ac.il/mcb/UriAlon/

Table 6.1: Basic network statistics for the individual networks used in the article: n, m, skewness, diameter, average distance, average neighbor degree, and co-occurrence for the graphs Email-Enron, Georgetown, Princeton, Oklahoma, Caltech, FacebookML, Facebook-like, Condensed Matter, High Energy Physics, General Relativity, Astro Physics, High Energy Physics Theory, S. cerevisiae, C. elegans, Diseases, and E. coli.

7 STABLE MEASURES

Up to now, the results of the analysis indicate that there are two different classes of results: one class contains algorithms and models that tend to connect nodes more likely to high degree nodes, such as the cfg, the ecfg, and the dsis; the other class contains algorithms based on a uniform choice, either of neighbors (usis) or of edges in the rewiring process of edge-swap algorithms (fdsm). Because of their high degree, some nodes remain available for connections for longer (usis), or edges connected to them are chosen more often in the rewiring (fdsm); the results of the previous sections on artificial graphs already showed differences between uniform and degree-based choices. Since the fdsm has a better worst-case runtime than the usis (O(m log(m)) vs. O(n³)), it is the preferred algorithm to generate graphs from the family of graphs with a fixed degree sequence. But the examples above were artificial and do not necessarily reflect real-world graphs. Therefore, an analysis similar to the one performed with the G(n, m) model and the Forest-Fire model is carried out more in depth with real-world graphs. From each graph, between 150 and 1000 samples were generated with the respective model. Recall that the samples based on the graph from the G(n, m) model were quite close to each other with respect to the diameter and the average distance between all nodes. This is true for different types of random graph models, including the G(n, m) model [17], the Barabási-Albert graph model [9], and graph models with a degree sequence following a power law with 2 < β < 3 and ∀v ∈ V: k_v² ≤ Σ_{u∈V} k_u [17]. For all of these models, the diameters of generated graphs are very similar. The same is not necessarily true for real-world graphs, to which nodes may be added that connect to other nodes that have only few connections.
These nodes initially increase the diameter, but when they become more integrated into the graph, the diameter shrinks again. Thus, randomly generated versions of a real-world graph should intuitively have a lower diameter. This effect is similar to the one observed by Watts and Strogatz [100] in their seminal work, which analyzed the diameter and the clustering coefficient in graphs that were regular and clearly structured and were rewired with increasing probability. They discovered that the higher the probability of rewiring an edge, the smaller the diameter becomes. Similar effects are to be expected between real-world graphs and graphs generated from their fixed degree sequences. Nodes that are more on the "outskirts" of the real-world graph, nodes with few neighbors and of low importance, may in a generated graph be connected to more central nodes, and therefore the diameter may shrink. The same may happen in the cfg, with the difference that its generation process tends to connect high degree nodes to each other more than high to low degree nodes. Thus, remaining groups of low degree nodes may be left that are either on the outskirts of the giant component or build small, separate components. For the graphs under investigation this happened, but the giant component consisted, on average, of 82% of the nodes of a graph (E. coli) up to 100% of the graph (FacebookML). It may happen that the diameter of a small component is larger than the diameter of the giant component, but since the giant component is most of the time more important for analysis than the small components, the analysis of small components is omitted in this thesis. Considering the cfg, one always has to remember that multiple edges between nodes or self-loops may be created.
For the following discussion of the diameter and the average distance, this is of no importance, since the graphs are all considered as unweighted. Two edges between two nodes do not reduce or increase the number of steps a traveler on the graph can take; thus, the results of the cfg and the ecfg coincide. Therefore, results of the ecfg are omitted.

7.1 diameter

For the discussion of the diameter, the diameter is measured in all generated graphs. The results are then averaged for each graph generating null model, and the standard deviation is calculated. The model that does not generate graphs, the sim, has several options to approximate the expected diameter, as shown in the following.

7.1.1 Approximating the Diameter

To approximate the diameter, one can assume to start at any node in the graph and follow all possible paths from this node to gather all distances and to calculate the diameter based on them. When there are no actual nodes, one has to make assumptions about the number of neighbors of each hypothetical node. The simplest approximation is to assume that each node has the same number of neighbors; in fact, this assumption reduces the model to a tree. Still, this simple assumption allows calculating an approximation of the diameter. The degree used to approximate the degree of a node is the average degree of a node in the graph, as shown in Equation 8.1. As a reminder,

AV(G) = (1/n) Σ_{v∈V} av(v) = (1/n) Σ_{v∈V} (1/k_v) Σ_{u∈N(v)} k_u = ⟨k⟩.

For the equations of the sim, the notation of Newman is used, i.e., averages over node degrees are displayed as ⟨k^i⟩ = (1/n) Σ_{v∈V} k_v^i.
Under the assumption that every node has on average this number of neighbors, one can sum over the neighbors at each distance and do so until the number of nodes in the graph is reached, i.e.,

n = Σ_{i=1}^{l} ⟨k⟩^i. (7.1)

Reformulating this from a sum to an expression that is easily solvable for the expected diameter l is possible; using an index shift we get

n = Σ_{i=1}^{l} ⟨k⟩^i (7.2)
  = Σ_{i=0}^{l−1} ⟨k⟩^{i+1} (7.3)
  = ⟨k⟩ Σ_{i=0}^{l−1} ⟨k⟩^i (7.4)
  = ⟨k⟩ (⟨k⟩^{(l−1)+1} − 1) / (⟨k⟩ − 1). (7.5)

Solving this equation for l yields

l = log(n(⟨k⟩ − 1)/⟨k⟩ + 1) / log(⟨k⟩). (7.6)

An easier approximation is the maximization of the inner term,

n = ⟨k⟩^l. (7.7)

Solving this for l yields

l = log(n) / log(⟨k⟩). (7.8)

The results of both approximations are shown in this order in the following as sim1 and sim2.

7.1.2 Model comparison

In Table 7.1 the average diameter and the standard deviation of the graphs generated with the fdsm and the cfg are shown. As expected, the results are very close to each other. This result shows that even though graphs generated with the cfg do have multiple edges and sometimes connect nodes to themselves instead of to other nodes, the giant components of these graphs are similar in diameter to the graphs generated with the fdsm. The equations of the sim, on the other hand, severely underestimate the diameter in all cases. While the directly solvable estimate, Equation 7.8, is very close to the more involved Equation 7.6, the result is always too low to even be considered as the diameter for the given graphs. The approach to estimate the diameter via equations does not work: the assumptions that have been made, i.e., that each node has the same number of neighbors and that all nodes are equal to a certain extent, are implausible for real-world graphs. The equations may work for graphs following a Poissonian degree distribution or for trees, but for graphs that follow a skewed degree distribution, they do not work in general.
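Equations 7.6 and 7.8 can be evaluated directly from n and ⟨k⟩. The following is a small sketch; the function name is illustrative, not from the thesis.

```python
import math

def sim_diameter_estimates(n, avg_degree):
    """Two sim-based diameter estimates for a graph with n nodes and
    average degree <k>: the geometric-series solution (Eq. 7.6, sim1)
    and the simpler n = <k>**l approximation (Eq. 7.8, sim2)."""
    k = avg_degree
    sim1 = math.log(n * (k - 1) / k + 1) / math.log(k)  # Eq. 7.6
    sim2 = math.log(n) / math.log(k)                    # Eq. 7.8
    return sim1, sim2
```

For large n the two estimates are nearly identical, which matches the closeness of the sim1 and sim2 columns in Table 7.1.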
Graph                        fdsm          cfg           sim1  sim2
Email-Enron                  8.46 ± 1.10   8.49 ± 1.17   4.51  4.56
Georgetown                   5.07 ± 0.26   5.07 ± 0.26   2.03  2.03
Princeton                    5.06 ± 0.24   5.08 ± 0.27   1.96  1.96
Oklahoma                     5.10 ± 0.30   5.09 ± 0.29   2.11  2.11
Caltech                      4.85 ± 0.38   4.94 ± 0.34   1.76  1.76
FacebookML                   5.10 ± 0.30   5.11 ± 0.31   2.19  2.20
Facebook-like                6.46 ± 0.54   6.75 ± 0.55   2.79  2.82
Condensed Matter             9.79 ± 0.77   9.78 ± 0.79   4.75  4.81
High Energy Physics          6.92 ± 0.50   6.93 ± 0.43   3.13  3.15
General Relativity           9.99 ± 0.77   10.10 ± 0.67  4.89  5.01
Astro Physics                7.07 ± 0.41   6.97 ± 0.39   3.21  3.23
High Energy Physics Theory   10.85 ± 0.79  10.85 ± 0.75  5.41  5.54
S. cerevisiae                10.70 ± 1.18  10.95 ± 1.11  5.47  5.82
C. elegans                   5.04 ± 0.20   5.13 ± 0.34   2.11  2.13
Diseases                     11.13 ± 0.92  11.27 ± 0.97  5.11  5.37
E. coli                      10.57 ± 1.19  11.09 ± 1.20  6.07  6.64

Table 7.1: Average and standard deviation of the diameter in the ground truth model fdsm, the cfg, and the two equations of the sim.

Graph                        fdsm          usis          dsis
Email-Enron                  8.46 ± 1.10   9.43 ± 0.57   5.01 ± 0.07
Georgetown                   5.07 ± 0.26   5.47 ± 0.50   4.00 ± 0.00
Princeton                    5.06 ± 0.24   5.30 ± 0.46   4.00 ± 0.00
Oklahoma                     5.10 ± 0.30   5.71 ± 0.45   4.00 ± 0.00
Caltech                      4.85 ± 0.38   4.99 ± 0.23   4.00 ± 0.00
FacebookML                   5.10 ± 0.30   5.41 ± 0.49   4.00 ± 0.00
Facebook-like                6.46 ± 0.54   6.93 ± 0.48   4.34 ± 0.47
Condensed Matter             9.79 ± 0.77   10.43 ± 0.58  7.05 ± 0.21
High Energy Physics          6.92 ± 0.50   7.62 ± 0.55   4.33 ± 0.47
General Relativity           9.99 ± 0.77   11.03 ± 0.78  7.21 ± 0.40
Astro Physics                7.07 ± 0.41   7.73 ± 0.51   5.00 ± 0.00
High Energy Physics Theory   10.85 ± 0.79  12.20 ± 0.80  8.52 ± 0.51
S. cerevisiae                10.70 ± 1.18  11.07 ± 1.11  9.84 ± 0.95
C. elegans                   5.04 ± 0.20   5.00 ± 0.00   4.00 ± 0.00
Diseases                     11.13 ± 0.92  11.61 ± 0.73  9.82 ± 0.64
E. coli                      10.57 ± 1.19  10.60 ± 1.13  11.74 ± 1.43

Table 7.2: Average and standard deviation of the diameter in the ground truth model fdsm, the usis, and the dsis.

In Table 7.2 the average diameter and the standard deviation of graphs generated with the fdsm, the usis, and the dsis are shown. The results of graphs generated with dsis are remarkable: they are much lower than the results of the other algorithms, but still closer to the average diameter of the fdsm than the results of Equations 7.6 resp. 7.8. The algorithm starts with low degree nodes and connects them according to a probability distribution based on the remaining degree of potential neighbors. Thus, in the beginning, it is more likely that low degree nodes connect to high degree nodes. This is similar to the cfg, which connects any node to high degree nodes with higher probability. The difference is that the dsis starts with low degree nodes and connects them with higher probability to high degree nodes, whereas the cfg tends to connect high degree nodes to each other. Thus, it is plausible that some star-like structures exist that are connected via higher degree nodes, and the average distance is small. On the other hand, graphs generated with usis have an average diameter that is close to the average diameter of graphs generated with the fdsm. When applying a two-sample z-test to the averages and standard deviations of the fdsm and the usis under the assumption that there is no difference between the models, the results indicate that the difference between them does not only exist, but is in most cases rather large. The two-sample z-test assumes that both groups of results follow a normal distribution. That is not guaranteed for the diameter of graphs (or any graph measure in particular), since the family of graphs is too large to know the exact distribution. Therefore, the Kolmogorov-Smirnov two-sample test is applied, which compares the cumulative distributions of the frequencies of the results directly. The results of both tests are in Table 7.3.
Graph                        |z|    D     p
Email-Enron                  11.14  0.49  2.14 · 10^-21
Georgetown                   9.19   0.40  5.38 · 10^-10
Princeton                    5.97   0.24  7.17 · 10^-4
Oklahoma                     13.44  0.60  1.33 · 10^-21
Caltech                      3.23   0.13  2.32 · 10^-1
FacebookML                   6.75   0.31  3.57 · 10^-6
Facebook-like                7.37   0.41  2.14 · 10^-10
Condensed Matter             11.50  0.38  2.66 · 10^-17
High Energy Physics          10.98  0.52  2.73 · 10^-16
General Relativity           10.96  0.56  7.53 · 10^-19
Astro Physics                12.06  0.58  3.35 · 10^-20
High Energy Physics Theory   13.87  0.68  1.12 · 10^-27
S. cerevisiae                3.18   0.17  7.48 · 10^-3
C. elegans                   2.04   0.04  10.00 · 10^-1
Diseases                     4.49   0.24  7.17 · 10^-4
E. coli                      0.36   0.03  9.39 · 10^-1

Table 7.3: Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the diameter of the graphs generated with the fdsm and the usis.

Only for three of the graphs under investigation are the differences between the graphs generated with the fdsm and the usis acceptable as very close to each other (E. coli, C. elegans, Caltech); for all other graphs, the two algorithms give very different results. While the z-scores show all but the E. coli graph and the C. elegans graph as rather implausible (only these two have z < 3), the Kolmogorov-Smirnov test says otherwise. Recall that

D < c(α) √((n₁ + n₂)/(n₁ n₂))

indicates that it is possible that the two distributions are the same; since this is harder to interpret than a p-value, the p-values are also shown. A p-value below a certain threshold, usually 0.05, indicates that the distributions are different. Thus, the Caltech graphs, with a p-value of 0.23, also yield diameters that are from the same distribution, i.e., the algorithms could be used interchangeably. Nevertheless, the rest of the tests show different results. Therefore, neither probability distribution to choose a neighbor in the sis gives results that are reasonably close to the results of the fdsm, which are considered as ground truth.
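Both test statistics used here can be sketched in a few lines; the helpers below are illustrative, not the exact implementation used in the thesis. The z-score follows the usual two-sample formula, and D is the maximum vertical gap between the two empirical cumulative distribution functions, which would then be compared against c(α) √((n₁ + n₂)/(n₁ n₂)).

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample z-score; assumes both samples are normally distributed."""
    return abs(mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def ks_two_sample_D(xs, ys):
    """Kolmogorov-Smirnov two-sample statistic: the maximum gap between
    the two empirical cumulative distribution functions."""
    def ecdf(sample, t):
        # fraction of values in `sample` that are <= t
        return sum(1 for s in sample if s <= t) / len(sample)
    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)
```

Identical samples give D = 0, while completely separated samples give D = 1, which is why the near-constant dsis diameters in Table 7.2 lead to very large D values.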
7.2 distance

As before, in all generated graphs distances are measured in the giant component and then averaged per node and then per graph. For the sim, the equations are developed in the following.

7.2.1 Approximating the Distance

Again, the equation used to approximate the expected average distance for graphs has to be constructed based on the sim. For this, Newman defines the average number of neighbors at distance i as

c_i = (⟨k²⟩/⟨k⟩)^{i−1} ⟨k⟩. (7.9)

With Equation 7.9 it is possible to derive an equation such as

n = 1 + Σ_{i=1}^{d} c_i = 1 + Σ_{i=1}^{d} ⟨k⟩ (⟨k²⟩/⟨k⟩)^{i−1}, (7.10)

which equates the number of nodes in a graph with the number of neighbors of a single node at any distance plus this node. As before, c_i is an approximation; therefore, Equation 7.10 is also an approximation. With an index shift we derive

1 + Σ_{i=1}^{d} ⟨k⟩ (⟨k²⟩/⟨k⟩)^{i−1} = 1 + ⟨k⟩ Σ_{i=0}^{d−1} (⟨k²⟩/⟨k⟩)^i (7.11)
 = 1 + ⟨k⟩ ((⟨k²⟩/⟨k⟩)^{(d−1)+1} − 1) / (⟨k²⟩/⟨k⟩ − 1). (7.12)

This equation can be solved for d, such that

d = log((n − 1)/⟨k⟩ · (⟨k²⟩/⟨k⟩ − 1) + 1) / log(⟨k²⟩/⟨k⟩). (7.13)

Alternatively, it is possible to calculate the maximum distance at which a number of nodes as large as n is found,

n = c_d = ⟨k⟩ (⟨k²⟩/⟨k⟩)^{d−1}. (7.14)

Solving this equation for d yields

d = log(n/⟨k⟩) / log(⟨k²⟩/⟨k⟩) + 1. (7.15)

How these equations perform, after the former set of equations failed, is shown in the following.

7.2.2 Model comparison

Graph                        fdsm         cfg          sim1  sim2
Email-Enron                  1.59 ± 1.59  1.62 ± 1.62  2.66  2.66
Georgetown                   2.29 ± 0.01  2.32 ± 0.01  1.91  1.92
Princeton                    2.23 ± 0.02  2.24 ± 0.02  1.85  1.85
Oklahoma                     2.35 ± 0.01  2.36 ± 0.01  1.95  1.95
Caltech                      2.05 ± 0.06  2.10 ± 0.05  1.66  1.67
FacebookML                   2.44 ± 0.03  2.39 ± 0.03  1.97  1.97
Facebook-like                2.60 ± 0.08  2.65 ± 0.07  2.21  2.21
Condensed Matter             4.07 ± 0.03  4.03 ± 0.03  3.56  3.57
High Energy Physics          2.73 ± 0.03  2.77 ± 0.03  2.32  2.32
General Relativity           3.88 ± 0.08  3.91 ± 0.08  3.40  3.43
Astro Physics                3.05 ± 0.02  3.06 ± 0.02  2.62  2.62
High Energy Physics Theory   4.55 ± 0.09  4.56 ± 0.09  3.95  3.98
S. cerevisiae                3.25 ± 0.65  3.43 ± 0.73  3.05  3.08
C. elegans                   2.29 ± 0.10  2.41 ± 0.12  1.91  1.93
Diseases                     4.32 ± 0.38  4.42 ± 0.33  3.65  3.72
E. coli                      2.83 ± 0.43  2.78 ± 0.45  3.08  3.12

Table 7.4: Average and standard deviation of the average distance in the ground truth model fdsm, the cfg, and the two equations of the sim.

Graph                        z     D     p
Email-Enron                  0.01  0.21  2.34 · 10^-4
Georgetown                   1.23  0.64  6.66 · 10^-19
Princeton                    0.37  0.27  1.03 · 10^-3
Oklahoma                     0.63  0.36  2.85 · 10^-6
Caltech                      0.60  0.46  5.70 · 10^-10
FacebookML                   1.07  0.55  4.52 · 10^-14
Facebook-like                0.45  0.35  5.96 · 10^-6
Condensed Matter             0.93  0.55  6.63 · 10^-27
High Energy Physics          0.95  0.52  1.27 · 10^-12
General Relativity           0.31  0.24  5.04 · 10^-3
Astro Physics                0.27  0.24  5.04 · 10^-3
High Energy Physics Theory   0.04  0.10  6.77 · 10^-1
S. cerevisiae                0.18  0.21  2.05 · 10^-2
C. elegans                   0.79  0.54  1.40 · 10^-13
Diseases                     0.20  0.21  2.05 · 10^-2
E. coli                      0.08  0.12  3.67 · 10^-1

Table 7.5: Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the average distance of the graphs generated with the fdsm and the cfg.

The averages of the average distance of graphs generated with the fdsm, the cfg, and the corresponding equations of the sim are shown in Table 7.4. As with the diameter, the equations to estimate the average distance deliver results that misestimate the expected value, except for the S. cerevisiae, C. elegans, Diseases, and E. coli graphs. For these four, the z-score indicates that the equations could replace the sampling approach. The differences between the cfg and the fdsm are small; in fact, they tend to be about the size of the standard deviation. When applying the two-sample z-test, the result for almost all graphs indicates that the distributions are relatively close, i.e., they can be considered as distributions with the same mean and standard deviation, which implies that the graphs
When applying a Kolmogorov-Smirnoff two-sample test, the results look q quite different. Almost all D values are larger than the critical value c (α) n1n2 , and the n1+n2 p-values indicate that that only the E. coli and the High Energy Physics Theory graphs have samples with similar distances (see Table 7.5). The differences between the usis, dsis, and the fdsm do not merit analysis with the two-sample z-score or the Kolmogorov-Smirnov two-sample test. The standard deviations are very small, such that the mean value of the average distance should coincide to be from the same distribution, which it does not. Therefore, the z-score will not show them as very similar; the same applies for the Kolmogorov-Smirnoff two-sample test. Recall that the maximum difference between the two cumulative distributions is the value D that is compared to c (α) with some modifier specific to the sample sizes. With such small standard deviations, there is a large gap between the plots of the two cumulative distribution functions such that D will be large and the p-value small (see Table 7.7). As can be observed in Table 7.7, only half of the rows have a z-score entry small enough such that one might consider the distribution of the average distance from the usis similar to that in the fdsm. Based on this, also the graphs should not be too similar. The large D-values, all almost reaching the maximum value of 1, together with the low p-values, strengthen this assumption. Interestingly, the difference between the two versions of the sis is rather small in some cases (e.g. Email-Enron, Condensed Matter), in some cases, it is large (e.g. E. coli, Okla- homa), such that a test between these can only be considered as inconclusive. 
Graph                        fdsm         usis         dsis
Email-Enron                  1.59 ± 1.59  3.68 ± 0.00  3.65 ± 0.00
Georgetown                   2.29 ± 0.01  2.48 ± 0.00  2.35 ± 0.00
Princeton                    2.23 ± 0.02  2.44 ± 0.00  2.30 ± 0.00
Oklahoma                     2.35 ± 0.01  2.58 ± 0.00  2.42 ± 0.00
Caltech                      2.05 ± 0.06  2.25 ± 0.00  2.12 ± 0.00
FacebookML                   2.44 ± 0.03  2.60 ± 0.00  2.43 ± 0.00
Facebook-like                2.60 ± 0.08  3.05 ± 0.01  2.88 ± 0.01
Condensed Matter             4.07 ± 0.03  4.36 ± 0.00  4.32 ± 0.00
High Energy Physics          2.73 ± 0.03  3.21 ± 0.00  3.18 ± 0.01
General Relativity           3.88 ± 0.08  4.33 ± 0.01  4.25 ± 0.00
Astro Physics                3.05 ± 0.02  3.40 ± 0.00  3.33 ± 0.00
High Energy Physics Theory   4.55 ± 0.09  4.86 ± 0.01  4.86 ± 0.00
S. cerevisiae                3.25 ± 0.65  4.07 ± 0.04  4.22 ± 0.03
C. elegans                   2.29 ± 0.10  2.39 ± 0.01  2.27 ± 0.01
Diseases                     4.32 ± 0.38  4.69 ± 0.05  4.74 ± 0.02
E. coli                      2.83 ± 0.43  4.05 ± 0.09  4.63 ± 0.10

Table 7.6: Average and standard deviation of the average distance in the ground truth model fdsm, the usis, and the dsis.

Graph                        |z|    D     p
Email-Enron                  1.31   1.00  1.42 · 10^-89
Georgetown                   12.91  1.00  3.07 · 10^-60
Princeton                    12.29  1.00  3.07 · 10^-60
Oklahoma                     16.69  1.00  1.99 · 10^-52
Caltech                      3.07   1.00  3.07 · 10^-60
FacebookML                   4.47   1.00  3.07 · 10^-60
Facebook-like                5.53   1.00  1.57 · 10^-60
Condensed Matter             9.65   1.00  1.42 · 10^-89
High Energy Physics          18.02  1.00  3.07 · 10^-60
General Relativity           5.85   1.00  3.07 · 10^-60
Astro Physics                18.38  1.00  3.07 · 10^-60
High Energy Physics Theory   3.49   1.00  3.07 · 10^-60
S. cerevisiae                1.26   0.90  7.10 · 10^-49
C. elegans                   0.95   0.83  1.24 · 10^-41
Diseases                     0.96   0.84  1.25 · 10^-42
E. coli                      2.74   0.98  1.07 · 10^-92

Table 7.7: Two-sample z-test results, the Kolmogorov-Smirnov two-sample test result and its p-value for the average distance of the graphs generated with the fdsm and the usis.

7.3 implications

Diameter and distance are basic measures. Still, they contain much information about a graph and can be of vital importance, depending on the desired outcome.
If one wants to calculate centralities, many of these involve calculating distances, either directly or indirectly. The betweenness centrality of a node is the number of shortest paths between all pairs of nodes that pass through that node, divided by the number of all shortest paths between all pairs of nodes; the closeness centrality of a node is defined as the inverse of the mean of its geodesic distances; and flow-analyzing algorithms certainly need connections and are related to distances as well [85, 91]. The diameter is not used as often in algorithms, but it is still an important measure. The analyses based on real-world graphs confirm what was shown for artificial graphs. The average distance and the average diameter are very similar for most graph models; for the ecfg and the cfg they are even the same. Most surprising in this part of the research is that the results of the models containing only simple graphs and the results of the model containing multi-graphs are close together. In fact, the two-sample z-test yields values small enough for some measures that one may believe it unimportant which model graphs are sampled from. Only an analysis based on the Kolmogorov-Smirnov two-sample test shows that the values of some measures are not distributed the same. Judging from the results of the analysis so far, it appears as if it is not important which model a real-world graph is compared to when global measures are used. The results of the fdsm and the cfg are always close to each other. The values of the sis are not as satisfactory: while the diameter measured in the usis is similar to the diameter measured in the fdsm, the average distances of both variants of the sis appear to be very different from the other models. Since the worst-case run-time of the sis is the worst among the compared algorithms, it is not likely to be used in an in-depth analysis. Still, this behavior was not observed beforehand.
It may be interesting to analyze this as well. In contrast, the sim underestimates both measures severely, since its approximations assume that all nodes are, on average, the same. Thus, the cfg and the fdsm can both be used. Since the cfg is the faster algorithm of the two, it would be the algorithm of choice. At least, this is what one would have to believe if research stopped here. In the following, one interesting problem regarding the cfg is investigated.

7.4 multiple edges and self-loops

One has to be aware that the cfg does not generate multiple edges and self-loops uniformly at random. The higher the degree of a node, the more likely it is that a self-loop is attached to it. The same is valid for pairs of high degree nodes; it is more likely that two nodes of high degree form multiple edges between them than other pairs. It can even occur that more than two edges between a pair of high degree nodes are more likely to exist than edges between high and low degree nodes. A first analysis of this was given by Schlauch et al. [80].

Figure 7.1: Example graph for which the cfg and the sim yield many multiple edges and self-loops.

Figure 7.1 shows a graph with two nodes, both of degree n − 2, and n − 2 nodes, each of degree 2. There are several things to be said about this graph. First of all, there are only very few realizations when one is restricted to simple graphs. Second, a graph generated with the cfg
This is based on the simple independence assumption; the expected number of edges between two nodes u, v can be calculated with the following equation kukv (ku − 1)(kv − 1) .(7.16) 2m 2m 2 This equation, when ∀v ∈ V : kv < v∈V kv, is the probability of a first edge times the prob- ability of a second edge. Since real-worlds graphs do not necessarily follow this restriction, P this is the expected number of a first edge times the likelihood that a second edge exists. Summing this over all pairs of nodes yields 1 kukv (ku − 1)(kv − 1) 1 = k (k − 1) k (k − 1) (7.17) 2 2m 2m 2 u u v v 2 (2m) u,v 2 (2m) u v X X X 1 = k (k − 1) k (k − 1) (7.18) 2hki2n2 u u v v u v X X 2 1 hk2i − hki = .(7.19) 2 hki To calculate the number of self-loops, a similar approach is taken. The “probability” that a node u has a self-loop is given by ku (ku − 1) p = .(7.20) u,u 2m Thus, the expected number of self-loops can be calculated via summing over all nodes 2 ku (ku − 1) hk i − hki = .(7.21) 2m 2hki u X Furthermore, it is possible to estimate exactly what is contributed by nodes of which degree. For the example in Fig. 7.1 this is done explicitly in the following. 
selfloops = (⟨k²⟩ − ⟨k⟩)/(2⟨k⟩) = (n − 2)/4 (7.22)

multiple edges = (1/2) ((⟨k²⟩ − ⟨k⟩)/⟨k⟩)² = (n − 2)²/8 (7.23)

selfloops_{k=2} = (n − 2) · 2/(4m) = 2(n − 2)/(4(2(n − 2))) = 1/4 (7.24)

selfloops_{k=n−2} = 2 · (n − 2)(n − 3)/(4m) = (n − 2)(n − 3)/(2(2(n − 2))) = (n − 3)/4 (7.25)

multiple edges_{k=2} = (n − 2) · (1/2) · (2/(2m)²) Σ_i (k_i² − k_i)
 = ((n − 2)/(2(2(n − 2)))) · (⟨k²⟩ − ⟨k⟩)/⟨k⟩ = (n − 2)/8 (7.26)

multiple edges_{k=n−2} = 2 · (1/2) · ((n − 2)(n − 3)/(2m)²) Σ_i (k_i² − k_i)
 = ((n − 2)(n − 3)/(2(2(n − 2)))) · (⟨k²⟩ − ⟨k⟩)/⟨k⟩ = (n² − 5n + 6)/8 (7.27)

It can be observed that the nodes of degree 2 together contribute only 1/4 of all self-loops (each such node thus contributes 1/(4(n − 2))), while the two nodes of degree n − 2 contribute (n − 3)/4 to the estimated number of self-loops; most of the possible self-loops are therefore contributed by the two high degree nodes. The same behavior can be observed for multiple edges; here, the two high degree nodes again contribute much more than the rest of the graph's nodes. This behavior was shown by Schlauch et al. [81]. It was most likely not discovered before because the cfg and the sim were developed by physicists who were concerned with graphs that had a Poissonian degree distribution. Therefore, the first investigation is shown in the following. To investigate this behavior, for a graph generated with the cfg it is measured how many edges each node would lose if it had been created with the ecfg, i.e., we measure the change in the degree of each node. This can be written as

f(k) = (Σ_{v∈V, k_v=k} k_v^obs) / (k · |{v | k_v = k}|). (7.28)

Equation 7.28 adds up the remaining degrees of all nodes of desired degree k and divides by k times the number of nodes that are supposed to have degree k. This equation results in values between 0, i.e., none of the nodes with wanted degree k has any edge, and 1, which implies that all nodes realized all edges.
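Equation 7.28 can be computed from two degree maps. The sketch below assumes hypothetical dictionaries mapping each node to its wanted degree and to the degree it realized after multiple edges and self-loops are removed; the function name is illustrative.

```python
from collections import defaultdict

def edgeloss_per_degree(wanted, observed):
    """Realized-degree fraction f(k) of Equation 7.28.

    `wanted` maps each node to its desired degree k_v, `observed` to the
    degree it actually realized as a simple graph.  f(k) = 1 means every
    node of desired degree k realized all of its edges; the edgeloss for
    degree k is 1 - f(k).
    """
    realized = defaultdict(int)  # sum of observed degrees per wanted degree
    count = defaultdict(int)     # number of nodes per wanted degree
    for v, k in wanted.items():
        realized[k] += observed[v]
        count[k] += 1
    return {k: realized[k] / (k * count[k]) for k in count}
```

Averaging these per-degree fractions over many cfg samples gives exactly the curves plotted in the following figures.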
In Fig. 7.2 the average, minimal, and maximal edgeloss, as measured with Equation 7.28 on graphs from the G(n, m) model, are shown. From this plot it becomes clearer why the cfg and the sim have such a high appeal to anybody who deals with Poissonian degree distributions. The average edgeloss is always below 1% for each degree; the maximum edgeloss occurs on the low degree nodes but still only reaches ∼7%. An average edgeloss of less than 1% indicates that almost every node realized its wanted degree, thus almost all edges were realized and the loss of edges is rare. Thus, the cfg is a perfect candidate for analyzing graphs whose degree sequence follows a Poissonian distribution. The cfg is fast and almost exact for these graphs, with slight losses that tend to occur more often on high degree nodes (see the averages), but these variations are hardly worth the trouble of using a more elaborate model.

Figure 7.2: Results of measuring edgeloss per degree, measured on 200 graphs with Poissonian degree distribution, G(5000, 127754). The red striped line is the maximum edgeloss; the red dotted line is the minimum edgeloss, and the dots are the average edgeloss.

(a) E. coli edgeloss. (b) Email-Enron edgeloss.
Figure 7.3: Results of measuring edgeloss per degree, measured on 200 graphs. The red striped line is the maximum edgeloss; the red thin dotted line is the minimum edgeloss, and the dots are the average edgeloss. In Fig. 7.3a the generated graphs are based on the E. coli graph (skewness 9.34); in Fig. 7.3b the generated graphs are based on the Email-Enron graph (skewness 16.3).

Poissonian degree distributions are unskewed. What happens when the underlying probability distribution of a graph is more skewed? This is shown in the following on two different graphs. In Fig. 7.3a the edgeloss in graphs generated with the cfg with the degree sequence of the E. coli graph is plotted.
The underlying distribution of this degree sequence can be considered a power-law distribution k^{-γ} with γ = 3.04. It is easy to see that the edgeloss is very different from the edgeloss shown in Fig. 7.2. The highest-degree node, a node with degree 72, has an average edgeloss of about 20%, and in none of the generated graphs was its degree fully realized. The other nodes also lose edges, but it can be observed that the low-degree nodes compensate better on average for the occurring edgeloss. Compensate in the sense that, even though a node of degree 2 may lose one or both edges, the sum of the remaining degrees of nodes with wanted degree 2 tends to be closer to the sum of the wanted degrees, since there are more nodes of this degree than there are high-degree nodes. In Fig. 7.3b, the underlying degree distribution of the Email-Enron graph is a power-law distribution with exponent γ = 2.1, and the maximum degree (1383) is significantly higher than in the E. coli graph. The edgeloss is much more severe in total, even though the edgeloss for the maximum-degree node is on average only about 15%. Almost all nodes with degrees over 100 did not realize their full degree in any of the samples drawn with the cfg. In Table 7.8, the expected edgeloss through self-loops and multiple edges (equation 7.19, resp. 7.20) is shown together with the observed number of self-loops and multiple edges in all graphs. The results of the equations are very good estimates of the number of "bad" edges in the samples. To show that multiple edges occur more often between pairs of high-degree nodes than between high-low degree pairs, we used a reformulation of equation 7.19.
Network                       # self-loops          # multiple edges
                              measured  expected    measured  expected
Email-Enron                       68        70          3944      4830
Georgetown                        80        80          6274      6320
Princeton                         78        79          6111      6162
Oklahoma                         109       109         11611     11772
Caltech                           38        37          1279      1332
FacebookML                        53        53          2608      2756
Facebook-like                     28        27           678       729
Condensed Matter                  11        11           122       110
High Energy Physics               65        64          3875      4096
General Relativity                 8         8            71        56
Astro Physics                     32        32          1053      1024
High Energy Physics Theory         6         6            40        30
S. cerevisiae                      6         6            38        36
C. elegans                        13        13           144       156
Diseases                           3         3            13         9
E. coli                            5         5            22        25

Table 7.8: Average number of self-loops and multi-edges for samples from the cfg. The table also contains their expected values calculated with Equation (7.19), resp. Equation (7.20).

Instead of summing over all nodes, we compute for the highest-degree node the number of possible multiple edges, i.e.,

m(k) = \frac{1}{2}\,\frac{k_{max}\,k}{2m}\,\frac{(k_{max}-1)(k-1)}{2m} = \frac{(k_{max}^2 - k_{max})(k^2 - k)}{2\,(2m)^2}. \quad (7.29)

This changes the equation such that plugging in valid values of the degree sequence directly provides the number of multiple edges to be expected between the maximum-degree node and all other nodes of a given degree. Observing the same in samples drawn with the cfg, it is then possible to plot the two against each other. This is done for the E. coli and the Email-Enron graphs. In Fig. 7.4a the observed number of multiple edges for the E. coli degree sequence is plotted against the number of expected multiple edges calculated via m(k). Equation 7.29 is a good estimate for the number of multiple edges generated by the cfg based on this degree sequence. Moreover, it is obvious that the higher the degree, the higher the number of multiple edges built to nodes of that degree. For the Email-Enron graph's degree sequence the results are similar: the higher the node degree, the more multiple edges are to be expected. The observations confirm this for this graph as well, but only to a certain extent.
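Equation 7.29 can be evaluated directly for any degree k once the maximum degree and the edge count are known. A minimal sketch (the function name is my own illustrative choice):

```python
def multi_edges_with_max(k, k_max, m):
    """Eq. 7.29: expected number of multiple edges between the node of
    maximum degree k_max and a single node of degree k in the cfg,
    (k_max^2 - k_max)(k^2 - k) / (2 (2m)^2)."""
    return (k_max * k_max - k_max) * (k * k - k) / (2.0 * (2 * m) ** 2)

# Example with arbitrary values: k = 3, k_max = 10, m = 50 gives
# (90 * 6) / 20000 = 0.027 expected multiple edges.
print(multi_edges_with_max(3, 10, 50))
```

Because k enters quadratically (as does k_max), the estimate grows fast in the node's degree, which is exactly the dominance of high-degree nodes discussed above.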
The higher the degree of the node, i.e., the higher k is chosen, the larger the difference between the observed and the expected number of multiple edges becomes. This is most likely caused by the quadratic influence of k (as well as k_max), which results in an overestimation of the importance of high-degree nodes. Therefore, the estimate of equation 7.19 might be a good approximation in sum, but for single nodes it is strongly influenced by the node's degree. This can actually be observed for the low-degree nodes as well, even if not as strongly (since the values are averaged over the sampled graphs): for low-degree nodes the plot shows that the generated graphs actually contain slightly more multiple edges than the equations predict.

Figure 7.4: Comparative plots of the number of multiple edges between the node with maximum degree and other nodes, observed values against the values expected by m(k). (a) E. coli, (b) Email-Enron. For Fig. 7.4a, the degree sequence of the E. coli graph was used to generate sample graphs with the cfg; for Fig. 7.4b the Email-Enron graph was used.

These observations were not made before and are very important. Considering other measures that are calculated on graphs from the cfg, the number of multiple edges between high-degree nodes can influence the result. In the coming section it is shown that this influences the analysis of real-world graphs in a significant way.

8 SENSITIVE MEASURES

Before, it was shown that the analysis of real-world graphs is not much influenced by the algorithm the graphs are generated with or by the graph family they belong to. A closer look at the local level, i.e., at measures that are based on single nodes, may reveal a different picture. The first example is the average neighbor degree.

8.1 average neighbor degree

The average neighbor degree is a good example of a measure that is easy to calculate on graphs.
The average neighbor degree of a node v is defined as the sum of the degrees of its neighbors N(v), divided by its own degree, i.e.,

AV(v) = \frac{1}{k_v} \sum_{u \in N(v)} k_u. \quad (8.1)

This, summed over all nodes of the graph and divided by the number of nodes, is the average of the averages, or the global measure of the average neighbor degree:

AV(G) = \frac{1}{n} \sum_{v \in V} AV(v) = \frac{1}{n} \sum_{v \in V} \frac{1}{k_v} \sum_{u \in N(v)} k_u \quad (8.2)

This definition is fine for simple graphs. The definition of this measure for multigraphs is, to the best of my knowledge, not settled. Fig. 8.1 shows the problem that can occur in multigraphs: node v has a degree of 2, but only one neighbor u. In a simple graph, it would have two different neighbors, and the average neighbor degree of v would be simple to calculate. In a multigraph, there are at least two different options. First, the neighbors can be considered as a set, i.e., each neighbor appears at most once. For the example in Fig. 8.1 this yields either 5/2 (i.e., the neighbor's degree divided by the actual degree of v) or 5/1 (i.e., when multiple edges are erased as in the ecfg). Alternatively, each edge endpoint can be considered as a separate neighbor. For the example in Fig. 8.1 this implies AV(v) = (5+5)/2 = 5.

Figure 8.1: Example of a multigraph. The question is how to calculate the average neighbor degree of v.

We will use two of the proposed methods to measure the average neighbor degree. First, to be consistent with the earlier approach, multiple edges and self-loops are erased and the average neighbor degree is calculated on the resulting graphs from the ecfg. Second, the last presented method, which considers each edge endpoint as a separate neighbor, is used.

8.1.1 Approximating the Average Neighbor Degree

The sim can estimate the average neighbor degree as well.
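The two multigraph conventions just described can be made concrete in a short sketch. The adjacency encoding (multiplicity counters) and the function names are my own illustrative choices, not from the thesis.

```python
from collections import Counter

def degree(adj, u):
    """Multigraph degree of u: each parallel edge counts separately."""
    return sum(adj[u].values())

def av_set(adj, v):
    """Average neighbor degree treating neighbors as a set: sum the
    degrees of the distinct neighbors, divide by v's multigraph degree."""
    return sum(degree(adj, u) for u in adj[v]) / degree(adj, v)

def av_endpoint(adj, v):
    """Average neighbor degree counting each edge endpoint as a separate
    neighbor: edge multiplicities weight the sum."""
    return sum(degree(adj, u) * c for u, c in adj[v].items()) / degree(adj, v)

# The situation of Fig. 8.1: v joined to u by a double edge; u has three
# further (hypothetical) neighbors a, b, c, so deg(u) = 5 and deg(v) = 2.
adj = {
    "v": Counter({"u": 2}),
    "u": Counter({"v": 2, "a": 1, "b": 1, "c": 1}),
    "a": Counter({"u": 1}),
    "b": Counter({"u": 1}),
    "c": Counter({"u": 1}),
}
```

Here av_set gives 5/2 = 2.5 for v, while av_endpoint gives (5+5)/2 = 5, reproducing the two readings of the example.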
Newman provides an equation for this purpose [66, p. 447]:

AV_{sim}(G) = \sum_k k \, \frac{k \, p_k}{\langle k \rangle} = \frac{\langle k^2 \rangle}{\langle k \rangle}. \quad (8.3)

Here p_k is the fraction of nodes with degree k in the network, so that n p_k is the number of nodes with degree k. The probability that the endpoint of a randomly chosen edge attaches to a node of degree k is then \frac{k \, n \, p_k}{2m} = \frac{k \, p_k}{\langle k \rangle}.

8.1.2 Model comparison

Graph                         fdsm            cfg             ecfg            sim
Email-Enron                   156.10 ± 0.77   140.08 ± 0.79   128.61 ± 0.82   140.08
Georgetown                    161.41 ± 0.29   160.19 ± 0.27   155.86 ± 0.30   160.18
Princeton                     159.38 ± 0.32   158.29 ± 0.32   153.25 ± 0.29   158.30
Oklahoma                      221.39 ± 0.59   218.60 ± 0.35   212.41 ± 0.36   218.62
Caltech                        77.26 ± 0.55    74.91 ± 0.54    66.55 ± 0.45    74.93
FacebookML                    111.57 ± 0.51   106.44 ± 0.51    99.00 ± 0.47   106.57
Facebook-like                  60.56 ± 0.67    55.68 ± 0.76    49.93 ± 0.62    55.62
Condensed Matter               22.17 ± 0.09    22.10 ± 0.08    22.02 ± 0.09    22.10
High Energy Physics           137.73 ± 0.55   129.91 ± 0.64   121.36 ± 0.60   129.93
General Relativity             17.00 ± 0.13    16.85 ± 0.12    16.67 ± 0.15    16.87
Astro Physics                  65.87 ± 0.19    65.37 ± 0.20    64.59 ± 0.20    65.39
High Energy Physics Theory     12.56 ± 0.06    12.54 ± 0.06    12.50 ± 0.06    12.55
S. cerevisiae                  14.77 ± 0.35    13.42 ± 0.36    12.19 ± 0.52    13.39
C. elegans                     27.88 ± 0.62    26.04 ± 0.46    22.36 ± 0.50    26.05
Diseases                        7.71 ± 0.15     7.58 ± 0.15     7.42 ± 0.15     7.59
E. coli                        13.08 ± 0.41    11.14 ± 0.61     9.69 ± 0.76    11.19

Table 8.1: Average and standard deviation of the average neighbor degree in the ground truth model fdsm, the cfg, the ecfg, and the corresponding equation of the sim.

In Table 8.1 the means and standard deviations of the fdsm, the variants of the cfg, and the average neighbor degree estimated with equation 8.3 of the sim are shown. For almost all graphs it is evident that the results do not coincide. This implies that the samples these measures were taken on are not from the same distribution.
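Equation 8.3 needs nothing but the degree sequence, which is what makes the sim estimate so cheap. A minimal sketch (the function name is my own choice):

```python
def sim_avg_neighbor_degree(degrees):
    """Estimated mean average neighbor degree under the sim (Eq. 8.3):
    <k^2> / <k>, computed from a plain degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n          # <k>
    k2 = sum(k * k for k in degrees) / n  # <k^2>
    return k2 / k1

# For the degree sequence of a path on four nodes, (1, 2, 2, 1), the
# estimate is 2.5 / 1.5 = 5/3.
print(sim_avg_neighbor_degree([1, 2, 2, 1]))
```

Note that the estimate exceeds the plain mean degree whenever the sequence has any variance, which is the usual "your friends have more friends than you" effect.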
The ecfg underestimates the mean of the average neighbor degree for most graphs, which is not surprising, since edges are lost in the construction process. The cfg and the sim show, in all cases, values so close together that sampling with the cfg is only necessary if one is interested in the difference between a real-world graph and the simulation results, i.e., the z-score. If the goal is merely to show that the structure of the graph differs from random, the quick calculation with equation 8.3 is enough. But for almost all graphs the differences between the results of the fdsm and the sim (and therefore the cfg) are rather large. This difference can have an enormous influence on the outcome of an analysis. Consider the following: the real-world Oklahoma graph has an average neighbor degree of 185.99, and the real-world FacebookML graph has an average neighbor degree of 105.55. Considering just the averages of the cfg next to the real-world values would indicate the Oklahoma graph as exceptional, while the FacebookML graph would not appear very special. When computing z-scores, the Oklahoma graph remains exceptional independent of the set the random graphs are drawn from. The FacebookML graph, however, shows that the choice of model makes a difference. Computing the z-score based on the sampling process with the cfg yields 1.74; this score indicates that the real-world graph could be a result of the cfg's generation process. Computing the z-score with the fdsm instead yields 11.8; this score indicates that a graph like the real-world one is very unlikely, though not impossible, as a result of the fdsm's generating process. The real-world FacebookML graph would thus be considered not very special when using the cfg, but very special when using the fdsm.
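The z-scores above follow the usual standardization. Using the rounded sample values from Table 8.1 and the real-world FacebookML mean of 105.55, the sketch below (names are mine) reproduces both reported scores up to rounding:

```python
def z_score(observed, null_mean, null_std):
    """Standard score of a real-world measurement against the mean and
    standard deviation of a null-model sample."""
    return (observed - null_mean) / null_std

# FacebookML, real-world average neighbor degree 105.55:
z_cfg = z_score(105.55, 106.44, 0.51)   # cfg sample: 106.44 +- 0.51
z_fdsm = z_score(105.55, 111.57, 0.51)  # fdsm sample: 111.57 +- 0.51
# |z_cfg| is about 1.7 (plausible under the cfg), |z_fdsm| about 11.8
# (very unlikely under the fdsm), matching the discussion above.
```

The sign only records on which side of the null-model mean the observation lies; the magnitude carries the "how surprising" information.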
Graph                         fdsm            usis            dsis
Email-Enron                   156.10 ± 0.77   151.15 ± 0.92   420.15 ± 0.78
Georgetown                    161.41 ± 0.29   161.11 ± 0.32   243.04 ± 0.58
Princeton                     159.38 ± 0.32   159.30 ± 0.28   226.70 ± 0.25
Oklahoma                      221.39 ± 0.59   220.62 ± 0.39   377.39 ± 1.14
Caltech                        77.26 ± 0.55    76.99 ± 0.52   104.68 ± 0.34
FacebookML                    111.57 ± 0.51   110.06 ± 0.55   186.42 ± 0.97
Facebook-like                  60.56 ± 0.67    59.25 ± 0.74   119.33 ± 0.58
Condensed Matter               22.17 ± 0.09    22.09 ± 0.09    36.11 ± 0.10
High Energy Physics           137.73 ± 0.55   135.71 ± 0.68   292.94 ± 0.25
General Relativity             17.00 ± 0.13    16.97 ± 0.13    27.92 ± 0.12
Astro Physics                  65.87 ± 0.19    65.70 ± 0.21   127.45 ± 0.25
High Energy Physics Theory     12.56 ± 0.06    12.54 ± 0.06    18.75 ± 0.06
S. cerevisiae                  14.77 ± 0.35    14.38 ± 0.42    22.44 ± 0.40
C. elegans                     27.88 ± 0.62    27.37 ± 0.42    35.20 ± 0.46
Diseases                        7.71 ± 0.15     7.65 ± 0.16    10.42 ± 0.15
E. coli                        13.08 ± 0.41    12.72 ± 0.44    17.88 ± 0.43

Table 8.2: Average and standard deviation of the average neighbor degree in the ground truth model fdsm, the usis, and the dsis.

Considering the results in Table 8.2, the usis is close to the fdsm in almost all cases; the results are close enough that one set of graphs could have been generated by the other algorithm (see Table 8.3).

Graph           z_cfg   z_ecfg   z_usis   z_dsis
Email-Enron     14.51   24.43    4.14     240.84
Georgetown       3.09   13.42    0.69     125.69
S. cerevisiae    2.68    4.13    0.71      14.32
Diseases         0.59    1.33    0.27      12.59

Table 8.3: Two-sample z-score calculation in comparison with the fdsm to test whether results can be from the distribution indicated by the samples.

On the other hand, the graphs generated with the dsis are completely off. Even small graphs such as E. coli show large differences in the average result, leading to enormous z-statistics. Considering that the results of the cfg and ecfg were already deemed unreliable, these results are even worse. The graph generating process causes the high average neighbor degree.
Recall that the algorithm starts by assigning neighbors to low-degree nodes. For the dsis, this decision is degree-based, i.e., connections to high-degree nodes are more likely. Starting by connecting low-degree nodes to high-degree nodes with higher probability influences the result of the average neighbor degree calculation quite strongly, as can be observed from the z-scores in Table 8.3.

8.2 common neighbors

The number of common neighbors is the next measure to investigate. A global version of this measure is, for simple graphs, simple to evaluate. By counting for each pair of nodes the number of common neighbors,

cooc(u, v) = |N(u) \cap N(v)|, \quad (8.4)

and adding these numbers up, it is easy to observe that

\sum_{u,v \in V,\, u \neq v} cooc(u, v) = \sum_{v \in V} \binom{k_v}{2}. \quad (8.5)
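The identity in equation 8.5 can be checked directly on a small simple graph: every node of degree k_v is the common neighbor of exactly C(k_v, 2) pairs of its neighbors. The sketch below (illustrative names, neighborhoods encoded as sets) counts both sides:

```python
from itertools import combinations
from math import comb

def cooc_total(adj):
    """Left-hand side of Eq. 8.5: sum of |N(u) intersect N(v)| over all
    unordered pairs of distinct nodes."""
    return sum(len(adj[u] & adj[v]) for u, v in combinations(adj, 2))

def cooc_from_degrees(adj):
    """Right-hand side of Eq. 8.5: sum over nodes of C(k_v, 2)."""
    return sum(comb(len(adj[v]), 2) for v in adj)

# A small simple graph: a triangle 0-1-2 with an extra node 3 attached
# to node 0. Degrees are 3, 2, 2, 1, so the right-hand side is
# C(3,2) + C(2,2)... i.e. 3 + 1 + 1 + 0 = 5.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

Both functions return 5 on this graph, confirming the identity; the pairwise version is what one would compute in practice for the local analysis that follows.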