The Friendship Paradox for Weighted and Directed Networks
BY HONGYI JIANG

A Thesis Submitted to the Graduate Faculty of
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF ARTS
Mathematics and Statistics
August 2017
Winston-Salem, North Carolina

Approved By:
Kenneth S. Berenhaut, Ph.D., Advisor
Miaohua Jiang, Ph.D., Chair
John Gemmer, Ph.D.
Staci Hepler, Ph.D.

Acknowledgments

The work presented here could not have been achieved without a great deal of support. First, I want to express my deepest thanks to my mentor, Kenneth S. Berenhaut, who was with me throughout the research process and my entire Master's studies. I feel very fortunate to be your student and to have had those brainstorming sessions with you, which produced many brilliant ideas.

I would also like to thank Dr. Staci Hepler. I really appreciate your consistent encouragement, support and trust. I learned a lot while taking your classes and working as your TA. In addition, I benefited greatly from Dr. Erhardt Robert's challenging time series class and from discussions with Dr. John Gemmer about research. I would like to extend my thanks to Dr. Stephen Robinson and Dr. Mauricio Rivas; your analysis classes are two of the best I have ever taken, and they deepened my understanding of analysis. Thanks also to Dr. Jennifer Erway, who taught me my first optimization class, which will be very helpful in my future career. I am grateful to Mrs. Jule Connolly for her support in my work as a teaching assistant and tutor, and to Dr. Miaohua Jiang for his valuable guidance during my transition to studying and living abroad.

I appreciate the friendliness and help of all the graduate students, faculty and staff in the department. You have made me feel at home during my studies at Wake. Finally, I greatly appreciate the love of my parents and other family members for me, a child far from the whole family.
Table of Contents

Acknowledgments
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
    1.1 Terms and Definitions
    1.2 Historical Background
        1.2.1 Applications of the Friendship Paradox
    1.3 Overview of Results
    1.4 Organization of the thesis
    Bibliography
Chapter 2 The friendship paradox for weighted and directed networks
    2.1 Introduction
    2.2 Directed networks
    Bibliography
Chapter 3 Future Directions
Appendix A The degree-wise effect of a second step for a random walk on a graph
    A.1 Introduction
    A.2 Proof of Theorem A.3
    Bibliography
Appendix B A new look at clustering coefficients with generalization to weighted and multi-faction networks
    B.1 Introduction
    B.2 Weighted networks
    B.3 Clustering for node subsets of interest
        B.3.1 Two-mode networks
    B.4 Computing values of $\gamma_v$
    B.5 Applications
    B.6 Conclusion
    Bibliography
    B.7 Appendix
Curriculum Vitae

Abstract

Hongyi Jiang

This thesis studies the friendship paradox for weighted and directed networks from a probabilistic perspective. We consolidate and extend recent results of Cao and Ross, and of Kramer, Cutler and Radcliffe, to weighted networks. Friendship paradox results for directed networks are given, and connections to detailed balance are considered.

List of Tables

B.1 The values of $\gamma_v(G, S)$ (to two decimal places) for each node (employing the appropriate set $S$ for the node $v$, so that $v \in S$), in each of the two graphs in Figure B.3. The average values $\gamma_G(P)$ are 0.7158 (left) and 0.9876 (right).

B.2 Local clustering values for women in the Southern Women data set. Here $\tilde{\gamma}_v^* = (\gamma_v - \tilde{m}(k, k'))/(\tilde{M}(k, k') - \tilde{m}(k, k'))$, where $\tilde{m}(k, k')$ and $\tilde{M}(k, k')$ are as in (B.46) and (B.48), respectively.
The two-mode LCC is calculated as in [1], for comparison.

B.3 Sizes of the primary node set, secondary node sets and edge set for the two-mode networks considered in Tables B.4 and B.5. The Norwegian Directors network (see [2]) consists of 1495 directors connected to 367 companies on whose boards they served (the largest connected component consists of 818 directors). The US Supreme Court network consists of 9 justices connected to 24 cases, with an edge between a justice and a case whenever the justice voted in the minority for that case [3]. The Scotland network comprises 131 directors and 86 joint-stock companies in early 20th century Scotland [4]. The St. Louis Crime network consists of 754 suspects for 509 crimes in 1990s St. Louis, Missouri, USA [5]. The CEOs and Clubs network consists of 26 CEOs and the 15 clubs of which they were members [6], see also [7]. The Authors and Papers network is a collaboration network comprising 86 authors and 167 papers [8].

B.4 The mean clustering values for some two-mode networks. Here, for a subset $S \subseteq V$, $\gamma_G(S)$ is the mean value of $\gamma_v(G, S)$ over all $v \in S$, and for a partition $P = \{S_1, S_2\}$, $\gamma_G(P)$ is the mean value $\left(\sum_{S \in P} \sum_{v \in S} \gamma_v(S)\right)/n$.

B.5 Global clustering values for some two-mode networks. Here $\tilde{\gamma}_G^*$ is the mean value of $\tilde{\gamma}_v^*$ over all $v \in S$ (the three values in parentheses indicate values when, for nodes for which $\tilde{m}(k, k') = \tilde{M}(k, k')$, the corresponding $\tilde{\gamma}_v^*$ values are treated as zero, or one, or are excluded in computing the mean, respectively). $GCC_O$ is calculated as in [1], $GCC_R$ is calculated as in [9], and the value for Projected is the one-mode clustering coefficient as in (B.3) for the corresponding projected network.

B.6 Global and subset clustering coefficients for some multi-faction networks.

List of Figures

2.1 Two five-node directed networks. In-degrees are indicated adjacent to the corresponding nodes. The constants along edges in (a) indicate multiple edges.
A.1 Two six-node networks. Degrees are indicated adjacent to the corresponding nodes.

A.2 (a) gives configuration types leading to a positive value for $P(X_0 \in S^+, X_1 \in S^+, X_2 \in S^-)$. Nodes on the left and right are assumed to be in $S^-$ and $S^+$, respectively. (b) gives configuration types leading to a positive value for $P(X_0 \in S^-, X_1 \in S^+, X_2 \in S^-)$. The configurations corresponding to (A.16), (A.17), (A.21), (A.22) and (B.14) are given in A, B, C, D and E, respectively.

B.1 A simple five-node network.

B.2 A simple weighted graph.

B.3 A small collaboration graph (left), and its corresponding weighted projection (right), with weights as in (B.21).

B.4 A simple five-node weighted network, with weights indicated adjacent to the corresponding edge.

B.5 Two ten-node networks, each partitioned into two subsets, one of size four (white) and one of size six (grey).

B.6 Configurations in the ego network of node 1 with $\gamma_1 = \tilde{m}(4, 7) = 372/361$ (left) and $\gamma_1 = \tilde{M}(4, 7) = 16/5$ (right). Here $p = 1$ and $r = 3$ in (B.48).

B.7 The conjectured enveloping values $\tilde{m}(k, k')$ and $\tilde{M}(k, k')$ in (B.46) and (B.48), for $2 \le k \le 10$ and $2 \le k' \le 20$.

B.8 Two small two-mode networks, with primary and secondary node sets of size eight and four, respectively.

B.9 Two simple two-mode networks with three individuals and three events.

B.10 A small seven-node network with selected subset $S = \{1, 2, 3, 4\}$.

B.11 The Davis Southern Women network. The events are denoted E1 through E14.

B.12 The ego networks of women, Charlotte and Olivia, from the Davis Southern Women network.

B.13 Comparison of local clustering coefficients against (weighted) degree for four weighted networks. Where appropriate, a log-scale has been used for degree.
Clustering coefficients are computed as a function of vertex strengths, for Barrat's coefficient with arithmetic mean (◦), Barrat's coefficient with geometric mean (△), the coefficient of Zhang and Horvath [10] (+), the coefficient of Onnela et al. [11] (□) and $1 - \gamma_v^*$, as in (B.26) (∗). Note that in each case (undefined) coefficient values for nodes with only one neighbour have been excluded.

B.14 The 34-node and 78-edge two-faction karate social network of Zachary [12]. The nodes represent members of a karate club and edges are determined according to interactions outside the club. A conflict arose within the group leading to allegiances as indicated.

B.15 A network of friendship choices in a school [13]. Top left is partitioned by race; top right is partitioned by gender and bottom is partitioned by grade (7–8 and 9–12).

Chapter 1: Introduction

The friendship paradox, brought to wide attention by Feld in [6], states roughly that, in a network, one's neighbours have (on average) more neighbours than oneself. The result has recently been used to advantage in epidemic detection and in more general sampling scenarios (see [3, 2, 8, 5, 13, 9, 19]), and has received considerable attention from scientists across disciplines. For a recent discussion of societal welfare implications, see [11].
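The rough statement of the paradox can be checked numerically on a small example. The following Python is a minimal sketch, assuming an undirected, unweighted graph with no isolated nodes, stored as an adjacency dictionary; the function name `friendship_paradox_summary` and the star-graph example are illustrative and do not come from the thesis. It compares the mean degree with two common readings of "the average number of neighbours of one's neighbours".

```python
from fractions import Fraction


def friendship_paradox_summary(adjacency):
    """Compare a node's own degree with the degrees of its neighbours.

    `adjacency` maps each node to a list of its neighbours (undirected,
    unweighted, no isolated nodes; these are assumptions of this sketch).
    """
    degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
    n = len(adjacency)
    total_degree = sum(degrees.values())

    # Average number of neighbours of a uniformly chosen node.
    mean_degree = Fraction(total_degree, n)

    # Average degree of the node reached by picking a uniformly random
    # (node, neighbour) pair: each node u is reached d_u times.
    friend_mean = Fraction(sum(d * d for d in degrees.values()), total_degree)

    # Average, over nodes, of the mean degree among that node's neighbours.
    local_mean = sum(
        Fraction(sum(degrees[u] for u in neighbours), len(neighbours))
        for neighbours in adjacency.values()
    ) / n

    return mean_degree, friend_mean, local_mean


# Hypothetical example: a star with centre 0 and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
mean_degree, friend_mean, local_mean = friendship_paradox_summary(star)
print(mean_degree, friend_mean, local_mean)  # 8/5 5/2 17/5
```

In the star, both neighbour-based averages (5/2 and 17/5) exceed the mean degree 8/5, which is consistent with the statement above. Exact fractions are used so that the comparison does not depend on floating-point rounding.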