SYNTHESIZING REALISTIC SOCIAL NETWORKS

USING PERSONALITY COMPATIBILITY

by

DANIEL ANTHONY O'NEIL

A DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Modeling and Simulation Program to The School of Graduate Studies of The University of Alabama in Huntsville

HUNTSVILLE, ALABAMA

2019

In presenting this dissertation in partial fulfillment of the requirements for a doctoral degree from The University of Alabama in Huntsville, I agree that the Library of this University shall make it freely available for inspection. I further agree that permission for extensive copying for scholarly purposes may be granted by my advisor or, in his/her absence, by the Chair of the Department or the Dean of the School of Graduate Studies. It is also understood that due recognition shall be given to me and to The University of Alabama in Huntsville in any scholarly use which may be made of any material in this dissertation.

______Daniel A. O’Neil (Date)


DISSERTATION APPROVAL FORM

Submitted by Daniel Anthony O'Neil in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Modeling and Simulation and accepted on behalf of the Faculty of the School of Graduate Studies by the dissertation committee. We, the undersigned members of the Graduate Faculty of The University of Alabama in Huntsville, certify that we have advised and/or supervised the candidate on the work described in this dissertation. We further certify that we have reviewed the dissertation manuscript and approve it in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Modeling and Simulation.


ABSTRACT

The School of Graduate Studies
The University of Alabama in Huntsville

Degree: Doctor of Philosophy
Program: Modeling and Simulation
Name of Candidate: Daniel Anthony O'Neil
Title: Synthesizing Realistic Social Networks Using Personality Compatibility

Social structures and interpersonal relationships may be represented in abstract mathematical objects known as social networks. A social network consists of nodes corresponding to people and links between pairs of nodes corresponding to relationships between those people. Social networks can be constructed by examining groups of people and identifying the relationships of interest between them. There are circumstances where such empirical social networks are unavailable, or their use would be undesirable.

Consequently, methods to generate synthetic social networks that are not identical to real-world networks but have desired structural similarities to them are valuable.

A process for generating synthetic social networks based on attributing human personality types to the nodes and then stochastically adding links between nodes based on the compatibility of the nodes’ personalities was developed. Four algorithms for finding an effective assignment of personality types to nodes were developed and tested.

Using the Myers-Briggs Type Indicator as a model of personality types, a compatibility table for use by the algorithms was created. The four algorithms were evaluated for realism, as measured by the similarity of the synthetic social networks to real-world exemplar networks. Synthesized social networks were compared to 14 real-world exemplar networks using 20 standard quantitative network metrics. Custom implementations of two randomized algorithm classes, Monte Carlo and Genetic, produced more realistic networks than the classic Erdős-Rényi algorithm. Two new heuristic algorithms, Probability Search and Compatibility-Degree Matching, produced more realistic networks than the well-known and widely used Configuration Model algorithm.

To confirm that the algorithms’ effectiveness was independent of a specific personality type model, 15 Iterated Prisoners’ Dilemma strategies were treated as personality types. The strategies were implemented, an Iterated Prisoners’ Dilemma round-robin tournament was conducted, and the tournament’s results were used as a personality compatibility table. The new Compatibility-Degree Matching algorithm again produced more realistic synthetic social networks than the Configuration Model algorithm. Finally, a new randomized algorithm was developed to synthesize a sequence of revised social networks representing the evolution of a social network over time. A Turing test showed that the synthesized social network sequences were indistinguishable from real-world exemplar sequences of evolving social networks.


ACKNOWLEDGMENTS

I thank God for the people who enabled the development of this dissertation.

In particular, I am grateful to my adviser, Dr. Mikel Petty, for his guidance during the past nine years, for suggesting the topic, and for our creative collaboration on algorithms, code, and articles. Recommendations from the other members of my dissertation committee regarding the scope of work and a validation approach were greatly appreciated. The Alabama Supercomputer Authority, which is funded by the State of Alabama, granted a copious amount of much-needed processing time; that support is gratefully acknowledged.

Deploying software on a supercomputer can be daunting; thankfully, Dr. David Young and his team patiently yet swiftly answered all my questions. This research was partially funded by the 2014 RADM Fred Lewis Postgraduate I/ITSEC Scholarship, awarded in association with the Interservice/Industry Training, Simulation and Education Conference and organized by the National Training and Simulation Association. The National Aeronautics and Space Administration funded several courses through the part-time study program. My parents, Tony and Cozy O’Neil, provided additional funding. I thank these sponsors for significantly reducing the financial burden and associated emotional stress of this educational journey.

A social network of family and friends provided emotional support. One of the reasons I cherish Marie O’Neil is that she went back to college to accompany me on this journey.

Brainstorming sessions with my supervisor and friend, Wes Brown, motivated me and clarified a reason for this research. My brother Chris, his wife Laura, and his sons Christopher, Kenneth, and Steven energized me as they listened to my stories about obstacles encountered during this journey. A compadre, Dan Shultz, cheered me up when I felt down.

Finally, I thank my parents for their love and continuing encouragement.


TABLE OF CONTENTS

PAGE

List of Tables ...... xvi

List of Figures ...... xxi

List of Abbreviations ...... xxiii

Chapter 1 Introduction ...... 1

Chapter 2 Background Information ...... 5

2.1 Graph theory ...... 5

2.2 Network theory and social network analysis ...... 6

2.3 Classes of social networks ...... 7

2.4 Data structures and attributes of social networks ...... 9

2.5 Social network metrics ...... 10

2.6 Personality models ...... 15

2.7 Iterated Prisoners’ Dilemma ...... 18

Chapter 3 Motivation And Research Questions...... 22

Chapter 4 Literature Review ...... 27

4.1 Social network metrics ...... 27

4.2 Network generation methods ...... 34

4.2.1 Erdős-Rényi model ...... 35

4.2.2 The Configuration Model ...... 36

4.2.3 Exponential random graph model ...... 37

4.2.4 Stochastic block model ...... 37

4.2.5 Small world model...... 39


4.2.6 Preferential attachment model ...... 39

4.2.7 Popularity similarity model ...... 39

4.2.8 Chung-Lu graph model ...... 40

4.2.9 Degree correlation dK series ...... 40

4.2.10 Block Two-Level Erdős-Rényi model ...... 41

4.2.11 Replication of complex networks model ...... 41

Chapter 5 Real-World Exemplar Data Sets ...... 44

5.1 Why not use online social media data? ...... 44

5.2 Sources of real-world social network data sets ...... 45

Chapter 6 Developing A Personality Compatibility Table ...... 48

6.1 Motivation for developing this method ...... 48

6.2 Inferring attitudes and personality compatibility ...... 50

6.3 Keirsey's observations of the MBTI personality types...... 52

6.4 Constructing the compatibility table ...... 56

6.5 Personality compatibility table development method discussion ...... 57

Chapter 7 Agent-Based Modeling ...... 58

7.1 NetLogo simulation ...... 58

7.2 Verification of the NetLogo network extension ...... 59

7.2.1 Betweenness centrality ...... 60

7.2.2 Closeness centrality ...... 62

7.2.3 Eigenvector centrality ...... 62

7.3 Demonstrating the convergence of metrics between random and real networks ...... 65


7.4 Time series analysis ...... 67

7.4.1 Spearman rank correlation ...... 67

7.4.2 Mann-Kendall ...... 70

7.4.3 Theil-Sen ...... 72

7.5 Network formation via the Iterated Prisoners’ Dilemma ...... 75

7.6 Network formation using a personality compatibility table ...... 80

7.7 Discussion of results from the NetLogo experiments ...... 83

Chapter 8 Stochastic Block Modeling ...... 85

Chapter 9 Randomized Methods For Assignment Search Algorithms ...... 89

9.1 The personality-based network synthesis algorithm ...... 89

9.2 Monte Carlo Search for an optimum personality assignment ...... 92

9.3 Genetic algorithm search for an optimum personality assignment ...... 93

9.4 R code libraries ...... 96

9.5 Stochastic block model preference matrices based on personality compatibility ...... 97

9.6 Implementation of the MC and GA based search algorithms ...... 98

9.7 The stochastic block model function ...... 101

9.8 Execution of the personality type assignment search algorithms ...... 101

9.9 Results from the randomized methods ...... 102

9.9.1 Realism of the networks produced with randomized methods ...... 102

9.9.2 Computational efficiency ...... 108

9.10 Discussion of the randomized methods ...... 112


Chapter 10 Heuristic Methods For Assignment Search Algorithms ...... 113

10.1 Evaluating network realism ...... 113

10.2 Synthesis process overview ...... 114

10.3 Synthesizing networks from an assignment of personality types...... 116

10.4 Probability search algorithm ...... 122

10.5 Compatibility-degree matching algorithm ...... 127

10.6 Configuration Model algorithm as a basis of comparison ...... 129

10.7 Implementation and execution ...... 130

10.7.1 Implementation of the algorithms...... 130

10.7.2 Execution of the algorithms ...... 131

10.8 Results from the heuristic methods ...... 131

10.9 Discussion of the heuristic methods ...... 138

Chapter 11 Iterated Prisoners’ Dilemma Generated Compatibility Table ...... 140

11.1 Iterated Prisoners' Dilemma strategy tournament ...... 140

11.2 Simulating an ecosystem to produce an empirical distribution ...... 145

11.3 Using the IPD-generated personality compatibility table ...... 147

11.4 Discussion of the IPD-based results ...... 150

Chapter 12 Synthesizing Evolving Social Networks ...... 152

12.1 Exemplar evolving network data sets ...... 152

12.2 Pseudocode for synthesizing an evolving network ...... 154

12.2.1 Equations for calculating parameters of sequence set 핋 ...... 155

12.2.2 Pseudocode for the main function ...... 156

12.2.3 Pseudocode for the subroutines ...... 157


12.3 Visualizing social trajectories and networks ...... 158

12.4 Method for comparing real-world and synthesized social networks ...... 161

12.4.1 Results of the first Turing test ...... 163

12.5 Results from the second Turing test ...... 164

Chapter 13 Results, Conclusions, And Future Work ...... 166

13.1 Results ...... 166

13.1.1 Answer to research question 1 ...... 167

13.1.1.1 Answer to research question 1.1 ...... 168

13.1.1.2 Answer to research question 1.2 ...... 169

13.1.1.3 Answer to research question 1.3 ...... 170

13.1.2 Answer to research question 2 ...... 171

13.1.2.1 Answer to research question 2.1 ...... 171

13.1.2.2 Answer to research question 2.2 ...... 171

13.1.3 Answer to research question 3 ...... 172

13.1.3.1 Answer to question 3.1 ...... 173

13.1.3.2 Answer to research question 3.2 ...... 173

13.1.3.3 Answer to research question 3.3 ...... 173

13.1.4 Answer to research question 4 ...... 173

13.1.4.1 Answer to research question 4.1 ...... 174

13.1.4.2 Answer to research question 4.2 ...... 174

13.1.4.3 Answer to research question 4.3 ...... 175


13.2 Conclusions ...... 175

13.3 Future Work ...... 178

Appendix A Compatibility tables ...... 183

Appendix B Results from Agent-Based Modeling with Monte Carlo search ...... 185

Appendix C Custom compatibility tables for Stochastic Block Modeling ...... 192

Appendix D Randomized methods realism comparison tables ...... 206

Appendix E Heuristic methods realism comparison tables ...... 220

Appendix F Heuristic results from using IPD generated compatibility table ...... 234

Appendix G Network diagrams for the first Turing test ...... 241

Appendix H Network diagrams for the second Turing test ...... 246

Appendix I Institutional Review Board approval for the Turing test ...... 254

References ...... 257


LIST OF TABLES

TABLE PAGE

Table 2.1 Classes of social networks ...... 8

Table 2.2 Social network metrics used in this research...... 14

Table 2.3 Personality type frequencies in the U. S. population ...... 16

Table 2.4 Prisoners’ Dilemma payoff matrix ...... 19

Table 5.1 Real-world social network data sets used in this research...... 47

Table 6.1 Inferred MBTI personality types’ attitudes toward group compatibility factors. ... 55

Table 7.1 Summation of the betweenness centrality for node 1 ...... 61

Table 7.2 Manually calculated closeness metric...... 61

Table 7.3 Comparison of eigenvector centrality ...... 64

Table 7.4 Geodesics between all node pairs ...... 64

Table 7.5 Determining the Spearman rank correlation ...... 68

Table 7.6 Summary of Spearman rank correlation analysis ...... 69

Table 7.7 Results of the Mann-Kendall monotonic trend test ...... 72

Table 7.8 Summary table for ABM & MC vs. random network generation ...... 83

Table 9.1 Realism results for the Sampson Monastery social network ...... 105

Table 9.2 Realism results summary...... 106

Table 9.3 Social network metric evaluations performed...... 109

Table 10.1 Realism results for the Bernard & Killworth Technical network...... 134

Table 10.2 Realism results summary...... 135

Table 11.1 Empirical distribution of IPD Strategies ...... 146

Table 11.2 Thurman Office network results using an IPD strategy compatibility table ...... 149

Table 11.3 Score table for all the exemplar networks ...... 150

Table 12.1 Longitudinal data sets from the SIENA project website ...... 152

Table 12.2 Results from the first Turing Test ...... 162

Table 12.3 Analytical results from the second Turing test ...... 165

Table A.1 A Myers Briggs Type Indicator personality compatibility table ...... 183

Table A.2 A Myers Briggs Type Indicator compatibility table ...... 183

Table A.3 Iterated Prisoners’ Dilemma strategy compatibility table ...... 184

Table B.1 Robins Australian Bank ABM MC vs. Random ...... 185

Table B.2 Roethlisberger & Dickson Wiring Room ABM MC vs. Random ...... 185

Table B.3 Thurman Office ABM MC vs. Random ...... 186

Table B.4 Sampson Monastery ABM MC vs. Random...... 186

Table B.5 Krackhardt Managers ABM MC vs. Random...... 187

Table B.6 Krackhardt Office ABM MC vs. Random ...... 187

Table B.7 Schwimmer Taro Exchange ABM MC vs. Random ...... 188

Table B.8 Webster Accounting Firm ABM MC vs. Random...... 188

Table B.9 Zachary Karate Club ABM MC vs. Random ...... 189

Table B.10 Bernard & Killworth Technical ABM MC vs. Random ...... 189

Table B.11 Bernard & Killworth Office ABM MC vs. Random...... 190

Table B.12 Krebs IT Department (Advice) ABM MC vs. Random ...... 190

Table B.13 Lazega Law Firm ABM MC vs. Random ...... 191

Table C.1 Robins Australian Bank preference matrix for SBM ...... 192

Table C.2 Roethlisberger & Dickson Bank Wiring Room preference matrix for SBM ...... 193

Table C.3 Thurman Office preference matrix for SBM ...... 194


Table C.4 Sampson Monastery preference matrix for SBM...... 195

Table C.5 Krackhardt Office preference matrix for SBM ...... 196

Table C.6 Krackhardt High-Tech Managers preference matrix for SBM ...... 197

Table C.7 Schwimmer Taro Exchange preference matrix for SBM ...... 198

Table C.8 Webster Accounting Firm preference matrix for SBM...... 199

Table C.9 Zachary Karate Club preference matrix for SBM ...... 200

Table C.10 Bernard & Killworth Technical preference matrix ...... 201

Table C.11 Bernard & Killworth Office network preference matrix for SBM ...... 202

Table C.12 Krebs IT Department Advice network preference matrix for SBM ...... 203

Table C.13 Krebs IT Department Business network preference matrix for SBM ...... 204

Table C.14 Lazega Law Firm preference matrix for SBM ...... 205

Table D.1 Randomized methods results for the Robins Australian Bank social network .... 206

Table D.2 Randomized methods results for Roethlisberger & Dickson network ...... 207

Table D.3 Randomized methods results for Thurman Office social network ...... 208

Table D.4 Randomized methods results for Sampson Monastery social network ...... 209

Table D.5 Randomized methods results for the Krackhardt Office CSS network ...... 210

Table D.6 Randomized methods results for the Krackhardt High-Tech Managers network 211

Table D.7 Randomized methods results for the Schwimmer Taro Exchange network ...... 212

Table D.8 Randomized methods results for the Webster Accounting Firm network ...... 213

Table D.9 Randomized methods results for the Zachary Karate Club network ...... 214

Table D.10 Randomized methods results for the Bernard & Killworth Technical network 215

Table D.11 Randomized methods results for the Bernard & Killworth Office network ...... 216

Table D.12 Randomized methods results for Krebs IT Department Advice network ...... 217


Table D.13 Randomized methods results for Krebs IT Department Business network ...... 218

Table D.14 Randomized methods results for the Lazega Law Firm network ...... 219

Table E.1 Heuristic methods results for the Robins Australian Bank network...... 220

Table E.2 Heuristic methods results for Roethlisberger & Dickson network...... 221

Table E.3 Heuristic methods results for Thurman Office social network...... 222

Table E.4 Heuristic methods results for Sampson Monastery network...... 223

Table E.5 Heuristic methods results for the Krackhardt Office CSS network...... 224

Table E.6 Heuristic methods results for the Krackhardt High-Tech Managers network...... 225

Table E.7 Heuristic methods results for the Schwimmer Taro Exchange network...... 226

Table E.8 Heuristic methods results for the Webster Accounting Firm network...... 227

Table E.9 Heuristic methods results for the Zachary Karate Club network...... 228

Table E.10 Heuristic methods results for the Bernard & Killworth Technical network...... 229

Table E.11 Heuristic methods results for the Bernard & Killworth Office network...... 230

Table E.12 Heuristic methods results for Krebs IT Department Advice network...... 231

Table E.13 Heuristic methods results for Krebs IT Department Business network...... 232

Table E.14 Heuristic methods results for the Lazega Law Firm network...... 233

Table F.1 Heuristic with IPD for Robins Australian Bank ...... 234

Table F.2 Heuristic with IPD results for Roethlisberger & Dickson Wiring Room ...... 234

Table F.3 Heuristic with IPD results for Thurman Office ...... 235

Table F.4 Heuristic with IPD results for Sampson Monastery ...... 235

Table F.5 Heuristic with IPD results for Krackhardt Office CSS...... 236

Table F.6 Heuristic with IPD results for Krackhardt High-Tech Managers ...... 236

Table F.7 Heuristic with IPD results for Schwimmer Taro Exchange ...... 237


Table F.8 Heuristic with IPD results for Webster Accounting Firm ...... 237

Table F.9 Heuristic with IPD results for Zachary Karate Club ...... 238

Table F.10 Heuristic with IPD results for Bernard & Killworth Technical...... 238

Table F.11 Heuristic with IPD results for Bernard & Killworth Office ...... 239

Table F.12 Heuristic with IPD results for Krebs IT Department (Advice) ...... 239

Table F.13 Heuristic with IPD results for Krebs Fortune 500 IT Department (Business) ... 240

Table F.14 Heuristic with IPD results for Lazega Law Firm ...... 240


LIST OF FIGURES

FIGURE PAGE

Figure 2.1 An example graph with the “v” omitted from the vertex labels ...... 5

Figure 2.2 Friendship within a law firm ...... 7

Figure 2.3 Correlations between MBTI and the FFM ...... 17

Figure 3.1 Network diagram of the research ...... 26

Figure 7.1 Example network for validating the network extension ...... 59

Figure 7.2 Comparison of eigenvector centrality results from Gephi and Netlogo ...... 63

Figure 7.3 Closed loop convergence between metrics of synthetic and real networks ...... 66

Figure 7.4 Excel spreadsheets used in calculation of the Mann-Kendall test statistic ...... 71

Figure 7.5 Excel spreadsheet tables of Theil-Sen slopes ...... 73

Figure 7.6 Theil-Sen slopes ...... 74

Figure 7.7 Screenshot of the personality-based IPD strategy selection NetLogo model ...... 79

Figure 8.1 Screenshot of the preference matrix generator ...... 87

Figure 9.1 Pseudocode for random social network generation algorithm ...... 91

Figure 9.2 Pseudocode for Monte Carlo social network generation algorithm ...... 93

Figure 9.3 Pseudocode for Genetic social network generation algorithm...... 96

Figure 9.4 Implementation sequence diagrams...... 100

Figure 9.5 Comparison of the real-world and synthetic social networks...... 108

Figure 10.1 Process overview ...... 115

Figure 10.2 Flowchart for GNAC ...... 117

Figure 10.3 Pseudocode for the addlink function ...... 120

Figure 10.4 Pseudocode for the GNAC algorithm...... 121

Figure 10.5 Pseudocode for the vertex probability function ...... 125


Figure 10.6 Pseudocode for the graph formation probability function ...... 125

Figure 10.7 Pseudocode for the PS algorithm ...... 126

Figure 10.8 Pseudocode for the CDM algorithm ...... 128

Figure 10.9 Comparison of real-world social network with generated networks ...... 137

Figure 11.1 Pseudocode for generating the IPD strategy compatibility table ...... 144

Figure 11.2 Moran process-based evolutionary IPD tournament ...... 146

Figure 12.1 Pseudocode for the evolving social network generation algorithm...... 156

Figure 12.2 Pseudocode for a function to select a node to link ...... 157

Figure 12.3 Pseudocode for a function to select a link to remove ...... 158

Figure 12.4 Visualization of a social trajectory ...... 159

Figure 12.5 Pair-wise plots of a 3D social trajectory...... 160

Figure G.1 West Scotland teenage girls ...... 241

Figure G.2 Dutch school class ...... 242

Figure G.3 Sociology class Winter 1996 ...... 243

Figure G.4 Sociology cohort ...... 244

Figure G.5 University freshman ...... 245

Figure H.1 Human generated evolving network of teenage girls ...... 246

Figure H.2 Synthesized evolving network of teenage girls ...... 247

Figure H.3 Synthesized evolving network in a Dutch school class ...... 248

Figure H.4 Human generated evolving network in a Dutch school class ...... 249

Figure H.5 Synthesized evolving network of sociology freshmen students ...... 250

Figure H.6 Human generated evolving network of sociology freshmen students ...... 251

Figure H.7 Human generated evolving network of university freshmen...... 252

Figure H.8 Synthesized evolving network of university freshmen...... 253

LIST OF ABBREVIATIONS

ABM Agent-Based Modeling
CDM Compatibility-Degree Matching
CRAN Comprehensive R Archive Network
DiSC Dominance, Influence, Steadiness, Conscientiousness
E Extraversion
ERGM Exponential Random Graph Model
F Feeling
GA Genetic Algorithm
I Introversion
IPD Iterated Prisoners’ Dilemma
IT Information Technology
J Judging, in the context of MBTI
MBTI Myers Briggs Type Indicator
MC Monte Carlo
MV Metric Values
N Intuition, in the context of MBTI
NASA National Aeronautics and Space Administration
OCEAN Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
P Perceiving, in the context of MBTI
PS Probability Search
ReCon Replication of complex networks
RN Random Network
S Sensing, in the context of MBTI
SBM Stochastic Block Model
SN Synthetic Network
SNAP Stanford Network Analysis Platform
T Thinking, in the context of MBTI


CHAPTER 1

INTRODUCTION

Social network analysis is the study of social structures and relationships, and its essential analytic tools are social networks. Built on the theoretical foundation of graph theory, social networks are formal mathematical structures. In their simplest form, they consist of nodes corresponding to actors or agents, which may be individuals or identifiable groups of people, and links between pairs of nodes corresponding to relations between them, where a relation may be any type of contact or connection between the actors or agents that the nodes represent (Knoke 2008) (Scott 2000).

Interpersonal tensions adversely affect crew and team performance, which could be dangerous in isolated, confined, and extreme environments such as space exploration missions and settlements. Compatible teams are more likely to form a supportive social network that improves their chances of success and survival. By studying and simulating how personality compatibility affects the formation of relationships, or links, within social networks, we can apply social network analysis to team selection. One useful approach is to algorithmically generate synthetic social networks based on personality type compatibility. These networks could support the analysis and design of future space exploration crews.

The study and use of social networks often begin with, and depend on, empirical social networks. Empirical social networks are obtained directly from the real-world group or organization they represent: investigators identify the people in the group or organization of interest and determine whether the relationships to be represented in the network exist between them. Empirical social networks obtained by observation are valuable, but they have limitations. They can be difficult and expensive to obtain, especially if the process is manual; consequently, relatively few of them exist, and they cover less of the range of possible social networks. They may not be available with the number of nodes or links that an investigator needs. Finally, they may be vulnerable to the malicious recovery of private information using de-anonymization methods (Narayanan 2009).

Synthetic social networks, generated algorithmically rather than obtained empirically, can mitigate these issues. With effective social network synthesis methods, a user could produce a set of synthetic social networks that individually are not identical to any empirical network but collectively have specific desired structural characteristics, including size. A set of multiple social networks could be used to systematically test a network analysis or visualization tool (Staudt 2017), and would allow the deliberate introduction of deviations from the defining characteristics of a class of social networks for testing purposes (Tsevetovat 2005). In addition, synthesizing social networks is an approach to anonymization, which may protect the privacy of the individuals represented in an empirical social network (Narayanan 2009). Researchers may use synthetic social networks without privacy concerns and freely share them with other researchers to allow repeatable experiments (Zhou 2008).

However, an arbitrary or random graph is unlikely to be suitable as a synthetic social network for any application. To be useful, a synthetic social network must “approximate certain qualities or parameters found in the empirical data” (Tsevetovat 2005). In other words, a useful synthetic social network must possess the structural characteristics expected for the class of social networks it is intended to exemplify, without simply being a copy of one of those networks. For brevity, a synthetic social network with the structural characteristics of a desired class of social networks, perhaps as measured by suitable quantitative network metrics, will hereinafter be described as realistic.
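Realism, as defined above, is judged by comparing quantitative network metrics. As a minimal sketch, assuming each network is stored as a set of (i, j) node-index pairs over nodes 0..n-1, the Python below computes two standard metrics, density and transitivity (the global clustering coefficient), and combines them with a hypothetical `mean_relative_difference` score. The score function and the choice of only two metrics are illustrative assumptions, not the comparison procedure used in this research.

```python
from itertools import combinations


def adjacency(n, edges):
    """Build neighbor sets for nodes 0..n-1 from undirected (i, j) pairs."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def density(n, edges):
    """Fraction of the n(n-1)/2 possible links that are present."""
    return len(edges) / (n * (n - 1) / 2)


def transitivity(n, edges):
    """Global clustering coefficient: 3 * triangles / connected triples.

    Each connected triple is counted once at its center vertex; a triangle
    is counted three times (once per center), so closed / triples equals
    3 * triangles / triples.
    """
    adj = adjacency(n, edges)
    closed = 0
    triples = 0
    for v in range(n):
        for a, b in combinations(adj[v], 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def mean_relative_difference(synth, real):
    """Hypothetical realism score: average |synth - real| / |real| (0 = identical).

    Metrics whose real-world value is zero are skipped to avoid dividing by zero.
    """
    diffs = [abs(synth[k] - real[k]) / abs(real[k]) for k in real if real[k] != 0]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy comparison: a triangle stands in for the "real" network, a path for the synthetic one.
triangle = {(0, 1), (1, 2), (0, 2)}
path = {(0, 1), (1, 2)}
real = {"density": density(3, triangle), "transitivity": transitivity(3, triangle)}
synth = {"density": density(3, path), "transitivity": transitivity(3, path)}
print(mean_relative_difference(synth, real))
```

A relative difference is used so that metrics on very different scales (for example, density versus average path length) contribute comparably; a fuller evaluation would use many more metrics and average over many synthesized networks.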

Existing network generation algorithms focus on replicating the structural characteristics of an exemplar network. The algorithms that are described in this dissertation apply a different approach. They form links among nodes based upon personality compatibility (where the nodes represent people). Studying social networks based on personality compatibility is of significant interest to organizations that must organize teams of persons to interact and work effectively, especially in challenging circumstances.

The algorithms described in this dissertation assign a personality type to each node, and that assignment is the basis for stochastically generating links between the nodes. Whether a link is generated between a pair of nodes depends on the relative compatibility of the personalities assigned to the two nodes. Personality type compatibilities are encoded in a personality compatibility table that is an input to the generation process. Because link generation is stochastic, multiple non-identical social networks can be generated from a single personality type assignment. When an assignment produces networks whose metrics are, on average, realistic, it is suitable for generating as many networks as needed. The algorithms developed in this research have been shown to generate synthetic social networks that are significantly more realistic, in terms of their structural properties as measured by a range of standard graph metrics, than random graphs generated using the classic Erdős-Rényi G(n, p) algorithm.
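The link-generation idea above can be sketched in Python under simplifying assumptions: the type names "A" and "B", the two-type table, and its values are hypothetical; each table entry is treated directly as a link probability; and every node pair is considered independently. The dissertation's own algorithms, including the search for an effective assignment, are not reproduced here. A G(n, p) generator is included for contrast: it applies one probability p to every pair and ignores node attributes.

```python
import random

# Hypothetical symmetric compatibility table for two made-up types "A" and "B".
# COMPAT[a][b] is treated as the probability that a node of type a and a node
# of type b become linked (illustrative values, not the MBTI table).
COMPAT = {
    "A": {"A": 0.6, "B": 0.2},
    "B": {"A": 0.2, "B": 0.4},
}


def personality_network(types, compat, rng):
    """Return a set of (i, j) links, i < j, for nodes whose types are in `types`.

    Each unordered pair is linked independently with probability
    compat[types[i]][types[j]].
    """
    n = len(types)
    links = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < compat[types[i]][types[j]]:
                links.add((i, j))
    return links


def erdos_renyi(n, p, rng):
    """Classic G(n, p): every pair is linked independently with the same p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


rng = random.Random(42)              # seeded so a run is repeatable
assignment = ["A"] * 6 + ["B"] * 4   # one fixed type assignment for 10 nodes

# Several non-identical networks generated from the same assignment.
networks = [personality_network(assignment, COMPAT, rng) for _ in range(3)]
baseline = erdos_renyi(len(assignment), 0.3, rng)
for k, net in enumerate(networks):
    print(f"personality network {k}: {len(net)} links")
print(f"G(n, p) baseline: {len(baseline)} links")
```

Because links are drawn at random, repeated calls with the same assignment yield different networks. This is the property that lets one good assignment serve as a generator for many networks.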

The generation process has been demonstrated to work with multiple personality compatibility tables and is, therefore, adaptable to different personality type models.

This dissertation is structured as follows. Chapter 2 provides background information about graph theory and social network analysis. Chapter 3 explains the motivation and research questions. Chapter 4 presents a literature survey of topics discussed in this dissertation. Chapter 5 identifies the real-world exemplar networks used in this research. Chapter 6 explains the method for producing a personality compatibility table. Chapter 7 describes research related to agent-based modeling. Chapter 8 explains the method for producing preference matrices for stochastic block modeling. Chapter 9 presents randomized methods and results. Chapter 10 presents heuristic methods and results. Chapter 11 describes a method for producing a compatibility table using Iterated Prisoners’ Dilemma strategies, along with the results of that research. Chapter 12 describes a method to simulate evolving networks. Finally, Chapter 13 explains results, presents conclusions, and identifies potential future work.


CHAPTER 2

BACKGROUND INFORMATION

This chapter provides background information on graph theory and social network analysis and explains the metrics used to measure the structural similarity of networks.

It also provides an overview of the Iterated Prisoners’ Dilemma and asserts that attributes of its strategies are similar to personality characteristics.

2.1 Graph theory

Mathematical graph theory is the formal study of structure and relationships. A graph G consists of a set of vertices V and a set of edges E; each element of E is a set of exactly two distinct elements of V. Although the mathematics of graph theory normally does not refer to the visual representation of a graph (the exception is the property of planarity), graphs are conventionally drawn with the vertices in V represented as dots and each edge in E represented as a line segment connecting its two vertices.

Figure 2.1 An example graph with the “v” omitted from the vertex labels.


Figure 2.1 is an example of a conventional drawing of an arbitrary graph, where vertex set V = {v1, v2, v3, v4, v5, v6, v7, v8} and edge set E = {{v1, v2}, {v1, v3}, {v1, v8}, {v2, v4}, {v3, v4}, {v3, v8}, {v4, v5}, {v4, v8}, {v5, v7}, {v6, v7}, {v6, v8}}. From this deceptively simple notion, an important and sophisticated theory with a vast literature has been developed (Trudeau 1993). Graphs exist in diverse variations, including directed graphs (the elements of E are ordered pairs, giving each edge a direction), weighted graphs (values are assigned to the vertices or edges), hypergraphs (the edges need not have exactly two elements), and many others. The study of graphs has included their structural properties (Bang-Jensen 2001), algorithms to process them (Even 1979) (Golumbic 1980), and applications (Kropatsch 2007).
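The set-based definition above translates directly into code. The following Python sketch is illustrative only and is not part of the software developed for this research; the helper names is_simple_graph and degrees are hypothetical. It encodes the graph of Figure 2.1 with each edge as a two-element frozenset, checks that every edge satisfies the definition of E, and counts how many edges contain each vertex.

```python
# Illustrative sketch only; helper names are hypothetical.
# Figure 2.1 as G = (V, E); labels 1..8 stand for v1..v8.
V = set(range(1, 9))
E = {frozenset(pair) for pair in [
    (1, 2), (1, 3), (1, 8), (2, 4), (3, 4), (3, 8),
    (4, 5), (4, 8), (5, 7), (6, 7), (6, 8),
]}


def is_simple_graph(vertices, edges):
    """True if every edge is a set of exactly two distinct elements of V."""
    return all(len(e) == 2 and e <= vertices for e in edges)


def degrees(vertices, edges):
    """Degree of each vertex: the number of edges that contain it."""
    return {v: sum(1 for e in edges if v in e) for v in sorted(vertices)}


print(is_simple_graph(V, E))  # True
print(degrees(V, E))  # {1: 3, 2: 2, 3: 3, 4: 4, 5: 2, 6: 2, 7: 2, 8: 4}
```

Using frozensets makes {v1, v2} and {v2, v1} the same edge, mirroring the unordered pairs in the definition; a directed graph would use ordered tuples instead.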

2.2 Network theory and social network analysis

Graphs, as mathematical objects, are rather abstract. Graphs represent structure and relationship in their purest forms, uncluttered by semantics. However, identifying objects or concepts with the vertices and edges and assigning meaning to the relationships they represent produces numerous applications of graph theory. One of the fastest growing and increasingly important application areas is the study of social networks.

A parallel but different terminology from that used in graph theory has evolved in network theory. Graph theory’s graph, vertex, and edge are referred to as network, node, and link, respectively, in network theory. I will attempt to use the terms in context: graph theory terms will be used when discussing graph theory concepts, and network theory terms will be used when referring to a social network.


Figure 2.2 Friendship within a law firm (Lazega 2001).

The details vary by specific application, but in their simplest form, social network nodes may correspond to people in a group, organization, or population of interest. The presence of a link connecting two nodes represents some relationship, such as kinship, friendship, collaboration, or information exchange, between the people corresponding to those nodes. For example, social networks are used to represent social distance in (Li 2017) and information spreading in (Bouanan 2018). The study of the structural properties of these social networks can provide insight into the groups, organizations, or populations they represent. As an example, Figure 2.2 shows a real-world social network found to exist within a corporate law firm in the northeastern United States (Lazega 2001).

2.3 Classes of social networks

Not all random graphs are social networks, because an arbitrary random graph generally will not have the desired structural properties of a social network. Moreover, not all social networks have the same structural characteristics and properties. Social networks that represent communications in terrorist organizations might be expected to differ in structure and activity from those that represent collaborations in a scientific community. A set of social networks that represents instances of some well-defined category of group or organization will be termed a class. Several of the examples in Table 2.1 are based on (Easley 2010). The examples in Table 2.1 are all social networks, but intuitively they are not the same in terms of structure. In the last example in Table 2.1, the nodes of the social network correspond to organizations, not individual people.

That example is included to draw attention to this distinction. This work focuses on social networks where the nodes correspond to people. The potentially different structure of an organizational-node network, as compared to a people-node network, will be of interest later.

Table 2.1 Classes of social networks (Easley 2010).

Group or organization | Nodes | Possible link(s)
Terrorist organization | People | Communications, recruitment
High school student body | People | Romantic relationship, athletic teammates
Social club | People | Friendship, sponsorship
Employees of a corporation | People | Exchange of email, supervisory authority
Regional or national populace | People | Relatedness, transmission of infection
Financial system | Banks | Interbank loans, currency exchanges


A social network may be an element of one class, but not of another, by virtue of its structural properties. Therefore, two operations are of interest: (1) Membership: given a social network, how can it be tested for membership in a class of social networks? (2) Generation: given a description or example of a particular class of social networks, how can a synthetic social network that is a member of that class be generated? This work focuses on the second operation.

2.4 Data structures and attributes of social networks

For computational purposes, social networks may be stored using a data structure known as an adjacency matrix (Gersting 2014). In the simplest form of adjacency matrix, denoted A, there is a column and a row for each node. Individual values in an adjacency matrix, denoted A(i, j) for row i and column j, indicate whether there exists a link between nodes i and j, or equivalently, between the people or entities those nodes represent.¹

Two important attributes of social networks must be defined. First, networks may be weighted or unweighted. In an unweighted network, an adjacency matrix value A(i, j) of 0 or 1 indicates the absence or presence, respectively, of a link between nodes i and j. In a weighted network, adjacency matrix values A(i, j) may be greater than 1 and are considered to be weights, i.e., numerical values that quantify some application-specific attribute of the relationship that the link between nodes i and j represents. Adjacency matrices used in the work described in this dissertation are processed so that they represent unweighted networks exclusively. Second, social networks may be symmetric or asymmetric. In a symmetric social network, the relationships represented by the links are assumed to be mutual (two-way) and equivalent. For example, if the person represented by node i considers the person represented by node j to be a friend, then person j is assumed to feel the same way about person i, and a single undirected link between nodes i and j represents that mutual relationship. If a sociologist observes the interactions among a group of people in order to obtain a social network representing those interactions, he or she may assume that the interactions are mutual and construct a symmetric adjacency matrix. In the adjacency matrix of a symmetric social network, A(i, j) = A(j, i) for all values 1 ≤ i, j ≤ n, where n is the number of nodes. An asymmetric social network can represent relationships that may be one-way or non-equivalent. Examples of such relationships include supervisor-subordinate, which is one-way by definition, or romantic relationships where one person’s affections are not returned by the other person. In the adjacency matrix of an asymmetric social network, it is possible that A(i, j) ≠ A(j, i), e.g., because person i supervises person j and person j does not supervise person i. In the work reported here, we are solely concerned with mutual relationships and, therefore, use symmetric networks exclusively.

¹ Alternative data structures for representing networks, such as the adjacency list (Gersting 2014), have been developed, but they are not of concern in this work.
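A minimal sketch of the unweighted, symmetric adjacency matrices described in this section, assuming nodes are numbered from 0 and links are supplied as an edge list. The helpers adjacency_matrix, is_symmetric, and binarize are hypothetical illustrations rather than code from this research; in particular, binarize assumes that any nonzero weight should become a link, which may not match the preprocessing actually applied in this work.

```python
# Illustrative sketch; adjacency_matrix, is_symmetric, and binarize are
# hypothetical helpers, not functions from this research's software.


def adjacency_matrix(n, edges):
    """Unweighted, symmetric n x n adjacency matrix from 0-based (i, j) pairs."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1  # 1 marks the presence of a link
        A[j][i] = 1  # mutual relationship: A(i, j) = A(j, i)
    return A


def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


def binarize(W):
    """Assumed preprocessing: any nonzero weight becomes a link (1)."""
    return [[1 if w != 0 else 0 for w in row] for row in W]


# Figure 2.1 edges, shifted so that v1..v8 become rows/columns 0..7.
FIG_2_1 = [(1, 2), (1, 3), (1, 8), (2, 4), (3, 4), (3, 8),
           (4, 5), (4, 8), (5, 7), (6, 7), (6, 8)]
A = adjacency_matrix(8, [(i - 1, j - 1) for i, j in FIG_2_1])

print(is_symmetric(A))          # True
print([sum(row) for row in A])  # degrees: [3, 2, 3, 4, 2, 2, 2, 4]
```

In an unweighted, symmetric matrix each row sum is a node's degree, and the sum of all entries is twice the number of links.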

2.5 Social network metrics

Social network analysis involves the calculation of metrics at the node and network levels. In the context of social network analysis, metrics are numerical measurements of a network’s structure. Metrics at the node level include degree centrality, betweenness, and closeness. Metrics at the network level include diameter, density, and clustering coefficients. Analyzing the node metrics can produce a ranked list of individuals’ capabilities to control the flow of information, establish new connections, or influence people within the network. Analyzing the network-level metrics can provide insight into the potential rate of idea diffusion and the possibility of cross-pollination of ideas among various organizations. Identifying the information flow controllers, connection brokers, and change influencers can inform tactical decisions about task and responsibility assignments. Characterizing the potential idea diffusion rate and cross-pollination of the network can inform strategic decisions about organizational structure and about activities that may increase connectivity.

A wide range of metrics is available. Graph theory provides several abstract metrics, sometimes known as graph invariants, which quantify some aspect of a network’s structure without attaching any specific semantic meaning to the metric’s values. Examples include maximal degree, girth, and vertex chromatic number (Bang-Jensen 2001). Social network analysis has defined additional metrics that are intended to measure something about the network that has semantic meaning in the context of the network’s social application. These metrics include centrality (Scott 2000), reciprocity (Newman 2010), and clustering coefficient (Easley 2010). Finally, overarching, empirically derived structural properties common to categories of networks, such as scale-free and cellular, may apply to social networks (Tsevetovat 2005). All are intended to measure, in an objective and quantitative way, some aspect of a network’s structure that may be useful for an application. The intent is that realistic synthetic social networks would have metric values similar to those of the real-world social networks they were intended to mimic, without having identical structures.


Many network metrics have been defined, and clearly not all could be used in this work. From those available, 20 were selected to assess the similarity of real-world and synthetic social networks. That selection was based, in part, on the motivation of studying the social networks of future space colonies. Therefore, metrics that characterize information flow, integration of individuals into the network, level of camaraderie indicated by clustering, and the level of influence among individuals are of interest. Because this work used only undirected symmetric networks, only metrics suitable for those networks were considered.

Selected metrics include both standard metrics of graphs’ structural characteristics (nodes, links, components, degree, radius, and eccentricity) and metrics considered to be relevant to social network structure, per (Rapoport 1957) (Freeman 1978) (Bonacich 2007). In the former category, the number of nodes, links, and components, the network’s radius and eccentricity, and the nodes’ degrees fundamentally characterize a network’s structure. In the latter category, metrics found useful to study team structure and interaction were of special interest. Global clustering coefficient, average clustering coefficient, Gini coefficient, and number of communities provide some insight into tight-knit groups and the distribution of nodes among the communities. Average betweenness serves as a basis of comparison for maximum betweenness to identify information brokers or potential bottlenecks in the network. Likewise, average closeness serves as a basis of comparison for minimum closeness to identify the nodes that are at the heart of communities. Mean path length, network radius, average eccentricity, and network diameter are based on geodesic distances and can be used to estimate the rate of information flow across a network. Eigencentrality indicates the level of influence that a node may exert on other nodes. In similar applications, clustering, path length, betweenness, closeness, and diameter were used in a study of information sharing and collaboration in small groups (Manso 2010), betweenness was used in a study of interaction in programming teams (Gloor 2011), density and diameter were used in a study of authorship collaboration (Gajewar 2012), eigencentrality was used in a study of leadership in social groups (Bullington 2016), and Gini coefficients have been used as a measure of inequality of participation in digital health social networks (van Mierlo 2016).
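The cluster Gini coefficient summarizes how evenly nodes are spread across communities. Its formula is not stated in this section, so the sketch below assumes one common formulation: the mean absolute difference between all pairs of community sizes, divided by twice the mean size. The function name gini is hypothetical. A value of 0 means every community has the same size, and larger values indicate that a few communities hold most of the nodes.

```python
def gini(sizes):
    """Gini coefficient of community sizes (assumed formulation):
    G = sum_i sum_j |x_i - x_j| / (2 * k**2 * mean), with k = len(sizes)."""
    k = len(sizes)
    mean = sum(sizes) / k
    if mean == 0:
        return 0.0
    total = sum(abs(a - b) for a in sizes for b in sizes)
    return total / (2 * k * k * mean)


print(gini([5, 5, 5, 5]))   # 0.0, nodes spread evenly over four communities
print(gini([1, 3]))         # 0.25
print(gini([17, 1, 1, 1]))  # 0.6, one community holds most of the nodes
```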

Table 2.2 identifies and defines the metrics used in this research.


Table 2.2 Social network metrics used in this research.

Metric | Definition
Nodes | Number of nodes in the network; here denoted as n.
Links | Number of links in the network; here denoted as m.
Components | Number of disjoint sets of connected nodes in a network. For a connected network, the value of this metric is 1.
Network density | Number of links in the network divided by the number of possible links n · (n – 1) / 2; here denoted as p.
Average degree | Average, or mean, of the nodes’ degrees.
Standard deviation degree | Standard deviation of the nodes’ degrees.
Global clustering coefficient | Ratio of closed triples (node triples whose three links form a triangle) to the total number of node triples connected by at least two links.
Average clustering coefficient | Average of the nodes’ local clustering coefficients; the latter is the ratio of actual links among a node’s neighbors to possible links among those neighbors.
Number of communities | Number of clusters in the network.
Cluster Gini coefficient | Inequality of distribution of nodes among communities.
Mean path length | Mean of the number of links in the shortest path between each pair of nodes.
Average betweenness | Mean of the nodes’ betweenness centrality values; the latter is the number of shortest paths between pairs of nodes that pass through a node.
Maximum betweenness | Maximum of the nodes’ betweenness centrality values.
Average closeness | Mean of the nodes’ closeness centrality values; the latter is the sum of the path lengths between the node and all other nodes.
Minimum closeness | Minimum of the nodes’ closeness centrality values.
Average eigencentrality | Mean of the nodes’ eigencentrality (also known as eigenvector centrality) values; the latter scores a node in proportion to the sum of its neighbors’ eigencentralities, so links to well-connected nodes count more.
Minimum eigencentrality | Minimum of the nodes’ eigencentrality values.
Network radius | Minimum of the nodes’ eccentricities; the latter is the maximum length of the shortest paths from a node to all other nodes.
Average eccentricity | Mean of the nodes’ eccentricities.
Network diameter | Maximum of the nodes’ eccentricities.
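Several of the Table 2.2 metrics need nothing more than breadth-first search. The sketch below is a minimal, standard-library illustration for a connected, undirected network stored as an adjacency dictionary; it is not the toolchain used in this research, and table_2_2_subset is a hypothetical name. It follows Table 2.2 in defining closeness as the sum of path lengths, whereas some libraries (for example, NetworkX) report a normalized reciprocal of that sum. Taking the population standard deviation of degree is an assumption. Metrics that require community detection, betweenness, or eigenvectors are omitted.

```python
from collections import deque
from statistics import mean, pstdev


def bfs_distances(adj, source):
    """Geodesic distance (number of links) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def table_2_2_subset(adj):
    """Some Table 2.2 metrics; assumes a connected, undirected network."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each link counted twice
    deg = [len(nbrs) for nbrs in adj.values()]
    dists = [bfs_distances(adj, u) for u in adj]
    ecc = [max(d.values()) for d in dists]
    closeness = [sum(d.values()) for d in dists]  # Table 2.2: sum of path lengths
    return {
        "nodes": n,
        "links": m,
        "network density": m / (n * (n - 1) / 2),
        "average degree": mean(deg),
        "standard deviation degree": pstdev(deg),  # population s.d. (assumed)
        "mean path length": sum(closeness) / (n * (n - 1)),  # ordered pairs i != j
        "average closeness": mean(closeness),
        "minimum closeness": min(closeness),
        "network radius": min(ecc),
        "average eccentricity": mean(ecc),
        "network diameter": max(ecc),
    }


FIG_2_1 = {1: {2, 3, 8}, 2: {1, 4}, 3: {1, 4, 8}, 4: {2, 3, 5, 8},
           5: {4, 7}, 6: {7, 8}, 7: {5, 6}, 8: {1, 3, 4, 6}}
for name, value in table_2_2_subset(FIG_2_1).items():
    print(f"{name}: {value}")
```

For the graph of Figure 2.1 this gives 8 nodes, 11 links, a density of 11/28, a radius of 2, and a diameter of 3.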


2.6 Personality models

In 1923, Jung described distinct human personality types, based upon his clinical observations (Jung 1971). Building upon Jung’s ideas, in 1944 Myers and Briggs developed a structured approach to identifying personality types and published a manual describing a personality typing process that later became known as the Myers-Briggs Type Indicator (MBTI) (Smathers 2003). In the MBTI personality typing scheme, each person is categorized on four “dichotomies” or dimensions, held to correspond to different aspects of personality. Two “preferences” or values are possible on each dichotomy, yielding a total of 16 different personality types. The four dimensions and their two preferences are:

• Attitude (inward or outward focus): Extraversion (E) or Introversion (I).

• Perceiving (information gathering) function: Sensing (S) or Intuition (N).

• Judging (deciding) function: Feeling (F) or Thinking (T).

• Lifestyle preference: Perceiving (P) or Judging (J).

A detailed description of the 16 Myers-Briggs types is beyond the scope of this chapter; for details see (Keirsey 1998). The important ideas here are that each person may be categorized as having one of the 16 types and that the likely compatibility of two people may be estimated from their personality types.

Table 2.3 (a) shows the estimated proportion of the United States population who would be categorized into each preference, with each dimension considered separately (Hammer 1996). Table 2.3 (b) shows the proportion calculated for each personality type by multiplying the proportions of its four preferences, which treats the four dimensions as independent.


Table 2.3 Personality type frequencies in the U. S. population (Hammer 1996).

(a) Preference proportions

E 0.463    I 0.537
N 0.319    S 0.681
T 0.529    F 0.471
J 0.581    P 0.419

(b) Personality type proportions

ENTJ 0.045    ESTJ 0.097    INTJ 0.053    ISTJ 0.112
ENTP 0.033    ESTP 0.070    INTP 0.038    ISTP 0.081
ENFJ 0.040    ESFJ 0.086    INFJ 0.047    ISFJ 0.100
ENFP 0.029    ESFP 0.062    INFP 0.034    ISFP 0.072
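The calculation behind Table 2.3 (b) can be reproduced by multiplying one preference proportion from each dichotomy. This sketch assumes, as the table does implicitly, that the four dimensions are independent; type_proportions is a hypothetical helper name.

```python
from itertools import product

# Table 2.3 (a): proportion of the U.S. population holding each preference.
PREFERENCES = [
    {"E": 0.463, "I": 0.537},
    {"N": 0.319, "S": 0.681},
    {"T": 0.529, "F": 0.471},
    {"J": 0.581, "P": 0.419},
]


def type_proportions(prefs):
    """Proportion of each 4-letter type, assuming independent dichotomies."""
    table = {}
    for combo in product(*(d.items() for d in prefs)):
        letters = "".join(letter for letter, _ in combo)
        p = 1.0
        for _, share in combo:
            p *= share
        table[letters] = p
    return table


types = type_proportions(PREFERENCES)
print(len(types))               # 16
print(round(types["ENTJ"], 3))  # 0.045
print(round(types["ISTJ"], 3))  # 0.112
```

Rounded to three decimals, the sixteen products match Table 2.3 (b). Because the two proportions of each dichotomy sum to 1, the sixteen type proportions also sum to 1.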

While this research used the empirical distribution specified in Table 2.3 (a), it should be noted that different professions tend to attract certain personality types. Nordvig (1994) identified frequencies of MBTI personality types for a variety of professions in Norway. Critics of the MBTI personality typing scheme point to apparent problems. Metzner et al. suggested that the “rigid” dichotomies of the Jungian personality types constitute a “conceptual straight jacket” and proposed a reformulation of the dichotomies, as pairs of primary and inferior psychological functions (Metzner 1981). McCrae and Costa also commented that the MBTI lacks a neuroticism factor, perhaps because emotional instability was not part of Jung's type definitions, and it appears that Myers and Briggs believed that each personality type was positive. The lack of a negative factor may make the interpretation of MBTI results easier to accept. However, it could also allow the omission of information that would be useful to employers, coworkers, counselors, and individuals (McCrae 1989).

Nonetheless, the MBTI model is widely used in industry in the United States, including 89 of the Fortune 100 companies (Grant 2013), for applications that include: increasing self-awareness to support decision analysis (Malik 2014) (Weiler 2017), improving team performance by explaining communication styles (Choo 2014), identifying correlations between performance and personalities (Kiss 2014) (Furnham 2015a) (Furnham 2015b), and identifying correlations between professions and personalities (McCaulley 1977) (Freeman 2009) (Jafrani 2017).

Other personality models exist. Arguably among the best known is the Five Factor Model (FFM) or OCEAN model. After analyzing correlations among 35 personality traits, Tupes and Christal identified five personality factors: Surgency (Extraversion), Agreeableness, Dependability (Conscientiousness), Emotional Stability (versus Neuroticism), and Culture (Openness) (Tupes 1992) (John 1999). Goldberg referred to these factors as “The Big Five” (Goldberg 1990). McCrae and Costa interpreted the factor Culture as Openness to experience (McCrae 1987). Rushton and Irwing rearranged the first letters of the factors to form the mnemonic OCEAN (Rushton 2008).

Figure 2.3 Correlations between MBTI and the FFM


The NEO-PI, developed by McCrae and Costa, is a widely used personality instrument for characterizing personalities with the FFM traits. Using the NEO-PI and MBTI personality instruments, McCrae and Costa (1989), McDonald et al. (1994), and Furnham et al. (2003) conducted studies that identify direct and inverse correlations between the FFM and the MBTI traits. Figure 2.3 presents the correlations found by McDonald et al. These correlations enable mapping FFM to MBTI.

Initially, this research applied the MBTI personality model because of its widespread use in practical applications. However, it will be shown that the social network synthesis algorithms described later work with any personality model for which the relative compatibility between personality types can be quantified.

2.7 Iterated Prisoners’ Dilemma

A typical scene in a police drama depicts two suspects in separate interrogation rooms. The suspects have agreed to remain silent in the hope of avoiding any jail time at all. Each of them is offered a deal: the first person to confess will receive less jail time than their co-conspirator, who will receive a longer sentence. Each suspect must decide whether to cooperate with their partner or defect and take the deal. In game theory, this scenario is called the Prisoners’ Dilemma, in which two players decide whether to cooperate or defect; when the same two players play the game repeatedly, it is called the Iterated Prisoners’ Dilemma (IPD) (Axelrod 1984). Table 2.4 presents a payoff matrix, where players receive points based upon their decisions. Possible decisions are listed as row and column headings. The first value in the intersecting cell is the number of points that the row player (A) receives, and the second value is the number of points that the column player (B) receives.

Table 2.4 Prisoners’ Dilemma payoff matrix

                  B: Cooperate     B: Defect
A: Cooperate      A = 2, B = 2     A = 0, B = 3
A: Defect         A = 3, B = 0     A = 1, B = 1

If both players decide to cooperate, they both receive a reward of two points. If one player cooperates while the other player defects, then the cooperating player receives no points and the defecting player receives a tempting three points. If both players defect, then they each receive a punishment of one point.

Axelrod conducted two IPD tournaments that pitted different strategies for deciding whether to cooperate or defect against one another. In an IPD game, there are several rounds, and the players can make their next decision based upon previous decisions made by the other player. There were 14 entries in the first tournament, each a computer program written in either BASIC or FORTRAN. Entries were submitted by psychologists, economists, political scientists, mathematicians, and sociologists.

The tournament was conducted as a round robin: each entry competed with every other entry, as well as with a copy of itself and with an entry named Random. As the name implies, Random made decisions randomly. The winning entry, known as Tit For Tat (TFT), was submitted by Professor Anatol Rapoport from the University of Toronto. In the TFT strategy, a player’s first decision is to cooperate, and each subsequent decision is the same as the opposing player’s last decision (Axelrod 1984).
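The iterated game and the Tit For Tat rule can be sketched in a few lines using the Table 2.4 payoffs. This is an illustration, not Axelrod's tournament code: the 10-round default and the always_defect opponent (a strategy that defects on every move) are assumptions chosen to show how TFT behaves.

```python
# Table 2.4 payoffs: (row player's points, column player's points).
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then repeat the opponent's last decision."""
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    """Hypothetical opponent that defects on every move."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the two players' total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (20, 20)
print(play(tit_for_tat, always_defect))  # (9, 12)
```

Against itself, TFT cooperates in every round and collects the mutual-cooperation reward each time. Against a player that always defects, TFT is exploited only in the first round and then matches defection with defection.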


Results of the first tournament revealed that nice and forgiving (Axelrod’s terms) strategies performed better than strategies that did not possess these traits. According to Axelrod, a nice strategy is never the first to defect. A forgiving strategy may have a rule to defect after the opponent defects, but it also has a rule to return to cooperation if the opponent cooperates after a defection. Lessons and details about TFT were published before the second tournament. In the second tournament, there were 62 entries; 39 of them were nice and somewhat forgiving. Several entries were designed to take advantage of nice and forgiving strategies. Those entries did not score better than the nice and forgiving strategies because they drove themselves and their opponents into mutual defection (Axelrod 1984).

Additional research by Axelrod considered noise, where the tournament software occasionally implements the wrong decision, e.g., changes a player’s decision to cooperate into a decision to defect or vice versa. Traits that enable a strategy to cope with noise include generosity and contrition. A generous strategy allows some percentage of an opponent’s defections to go unpunished. A contrite strategy does not retaliate against an opponent’s defection if it determines that its own previous decision to cooperate had been implemented as a defection due to noise (Axelrod 1997).
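Noise and generosity can be added to the same kind of sketch. Here each intended decision is implemented wrongly with probability noise, and a hypothetical generous_tit_for_tat forgives a defection with a fixed probability generosity. The parameter values, the seeded random number generator, and the function names are illustrative assumptions rather than Axelrod's settings, and contrition is not modeled.

```python
import random

# Table 2.4 payoffs: (row player's points, column player's points).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def tit_for_tat(their_history, rng):
    """Cooperate first, then copy the opponent's last observed decision."""
    return their_history[-1] if their_history else "C"


def generous_tit_for_tat(their_history, rng, generosity=0.1):
    """Like TFT, but leaves a defection unpunished with probability `generosity`."""
    if their_history and their_history[-1] == "D" and rng.random() >= generosity:
        return "D"
    return "C"


def flip(move):
    return "D" if move == "C" else "C"


def play_noisy(strategy_a, strategy_b, rounds=200, noise=0.05, seed=1):
    """Each intended decision is implemented wrongly with probability `noise`;
    histories record the implemented (observed) decisions."""
    rng = random.Random(seed)
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_b, rng)
        b = strategy_b(hist_a, rng)
        if rng.random() < noise:
            a = flip(a)
        if rng.random() < noise:
            b = flip(b)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


print(play_noisy(tit_for_tat, tit_for_tat))
print(play_noisy(generous_tit_for_tat, generous_tit_for_tat))
```

Running play_noisy with several seeds, for example comparing a pair of TFT players with a pair of generous players, is one way to observe how retaliation can echo after an accidental defection; no particular outcome is claimed here.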

It is important to notice that these adjectives, nice, forgiving, generous, and contrite, sound like personality traits. Compatible exploration teams have been associated with high levels of agreeableness and low levels of neuroticism. The likelihood that an IPD strategy cooperates could be a measurement of agreeableness. Neuroticism measures moodiness and the level of trust in others. Low neuroticism indicates emotional stability, which could lead to predictable decisions about cooperation. High neuroticism could lead to random decisions about cooperation. The next chapter formulates the problem of developing an objective approach to simulating personality compatibility and synthesizing realistic social networks.


CHAPTER 3

MOTIVATION AND RESEARCH QUESTIONS

This research has two primary motivations. The first, the potential usefulness of realistic synthetic social networks in general, was described in Chapter 2. The second is the specific potential value of social network analysis in social networks based on personality type.

Schutz developed the Fundamental Interpersonal Relations Orientation (FIRO) model in 1958. The behavioral dimensions of FIRO are inclusion, control, and affection. Later, affection was replaced with openness, and the FIRO model evolved into the Human Element (Schutz 1994). Nelson researched Antarctic work groups and found that during the winter months, “the most compatible set of work groups were those with homogenous low need for prominence” (Nelson 1964). Within the context of the FIRO inclusion dimension, a low need for prominence indicates that the individual does not want much recognition. Within the context of MBTI, a low need for recognition could imply an introverted intuitive personality type. Another Antarctic workgroup study, by Gunderson and Ryman, found that “Results for the personality scales strongly suggest that reducing group heterogeneity on certain need and attitude dimensions may contribute to effective functioning of isolated work groups” (Gunderson 1967).


The National Aeronautics and Space Administration (NASA) defines Team Risk as the risk associated with a decrease in performance and behavioral health due to inadequacy of a team's cooperation, coordination, communication, and psychosocial adaptation (DeChurch 2015). “Currently, NASA has no formalized process to compose mission teams from a scientific perspective, but this is an identified need for future exploration missions” (Landon 2015). Anania et al. assert that “crew compatibility on an interpersonal level will need to be a major factor in order to ensure optimal communication and coordination within the team” (Anania 2017). Back asserted that “personality differences influence social relationships” but noted that social network research rarely considers the effects of individual personalities (Back 2015). Bradley and Herbert applied the MBTI to their study of Information Systems teams and found that a team's personality type composition is partially related to performance (Bradley 1997).

Lykourentzou et al. used the Dominance, Influence, Steadiness, and Conscientiousness (DiSC)² personality model to study crowdsourced teams; they found that balanced teams performed better than unbalanced teams. In a balanced team, no one or two personality types represented a majority of the team members (Lykourentzou 2016). The terms homophily and heterophily describe the sociological phenomena “birds of a feather flock together” and “opposites attract,” respectively. Compatibility of personalities can take the form of homophily or heterophily, and either can lead to friendships.

² The lowercase i was a printing error that was later trademarked by the predecessors of Wiley (Goodman 2002).

With the motivations of useful synthetic social network generation and the potential value to social network analysis in mind, the following research questions are posed:

1. Can a social network synthesis algorithm driven by a personality compatibility table produce realistic synthetic social networks?

1.1. Do different personality compatibility tables with different probabilities of link formation affect the realism of the generated synthetic social networks?

1.2. Are the social networks generated using personality compatibility tables produced manually and automatically similarly realistic?

1.3. Are there at least two effective algorithms for generating synthetic social networks from personality compatibility tables?

2. If effective social network synthesis algorithms driven by personality compatibility tables exist, how computationally efficient are they?

2.1. What machine-independent metric can be used to measure such algorithms’ efficiency?

2.2. How does the performance of the effective algorithms compare in terms of the machine-independent metric?

3. Can a “closed-loop” simulation, where a social network is incrementally updated in an agent-based model, lead to an increasingly realistic social network?

3.1. What actions or events in the agent-based model should modify the social network?

3.2. How should the agent-based model use a personality compatibility table?

3.3. What stopping criteria should be used to reach or preserve realism?


4. Can social network synthesis algorithms produce realistic sequences of synthetic social networks showing change or evolution of the social network over time?

4.1. What social network metrics are useful as dimensions in social trajectory analysis?

4.2. How can the realism of social trajectories be measured?

4.3. What algorithm or algorithms produce realistic social trajectories?

Figure 3.1 presents a network diagram or flowchart of the research described in this dissertation. Boxes on the left side of the flowchart represent tasks that produced algorithms, code, tables, metrics, or data needed for the experiments. Boxes in the middle column include experiments, integration, and visualization activities. Products from those activities include figures and tables that flow into the answers to the research questions presented in the ellipses on the right side of Figure 3.1.


Figure 3.1 Network diagram of the research


CHAPTER 4

LITERATURE REVIEW

This literature review covers two topics central to this research: social network metrics and network generation methods. The social network metrics section describes the metrics as defined by the researchers who developed them, the applications of those metrics, and the mathematical equations used to calculate them. The second section explains nearly a dozen network generation methods, providing a basis for asserting the novelty of the new methods described in subsequent chapters.

4.1 Social network metrics

Metrics used in social network analysis measure the number of nodes, the number of connections among nodes, the number of links in a path between nodes, and the density of links among groups of nodes. These metrics can be interpreted to identify tightly knit communities, bridges among communities, and powerful influencers. A widely cited review by Newman consolidated descriptions and formulas for the most widely used network metrics (Newman 2003). An analysis by Freeman applied centrality metrics to variations of a five-node network (Freeman 1978). Bonacich defined a metric quantifying network influence (Bonacich 1987). Borgatti (2005) analyzed centrality metrics with respect to network simulation.

The review article by Newman surveyed the history of graph theory, starting with Euler’s 1735 Seven Bridges of Königsberg problem. It explained applications of graph theory to social networks beginning in the 1930s, widely used graph metrics, and network properties, starting with the random graph. Based on the work of Rapoport, Solomonoff, Erdős, and Rényi, randomly placing links among a fixed number of nodes n, with each of the $\frac{1}{2}n(n-1)$ possible links present with probability p, produces an undirected random network in which the number of links connected to each node has a binomial distribution, which is approximately Poisson for large n (Newman 2003).
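The random-graph model described above can be generated directly. The sketch below is a standard-library version of the G(n, p) model, in which each possible link is included independently with probability p; gnp_random_graph is a hypothetical helper written for illustration (NetworkX provides a generator of the same name). The example compares the observed mean degree with the binomial expectation p(n − 1); the values n = 1000, p = 0.01, and the seed are arbitrary assumptions.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """G(n, p) sketch: each of the n(n - 1)/2 possible links is present
    independently with probability p. Returns an adjacency dict of sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


n, p = 1000, 0.01
g = gnp_random_graph(n, p, seed=42)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / n
print(mean_degree, "vs. expected", p * (n - 1))  # should be close to 9.99
```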

A few of the metrics defined in (Newman 2003) include degree, geodesic path, diameter, and component. Table 2.2 briefly defines metrics based on these terms. Diameter is a graph-level metric: the longest geodesic path between any two vertices. Newman observed that a few authors define diameter as the average geodesic distance in the graph. A component is the set of nodes that can be reached from a particular node by paths along the links. Within directed graphs, nodes can have an in-component and an out-component.

Real networks are usually nonrandom and exhibit a common set of features. The small-world effect refers to the observation that most pairs of nodes in a network seem to be connected by relatively short paths. An application-oriented implication of the small-world effect is the fast spread of information across a network if its links are interpreted as communication pathways. The small-world effect has been verified in many different networks. A geodesic distance is the number of links in the shortest path between two nodes. Equation (4.1.1) gives the mean geodesic distance ℓ, where $d_{ij}$ is the geodesic distance between nodes i and j and n is the number of nodes in the network.

$$\ell = \frac{1}{\frac{1}{2}n(n+1)} \sum_{i \ge j} d_{ij} \qquad (4.1.1)$$


A breadth-first search of a network can measure the geodesic distance.
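As a minimal sketch of Equation (4.1.1), the Python code below runs a breadth-first search from every node of a connected, undirected network stored as an adjacency dictionary (a representation assumed here for illustration) and sums $d_{ij}$ over pairs with $i \ge j$, so that self-pairs contribute zero, before dividing by $\frac{1}{2}n(n+1)$. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """Equation (4.1.1): sum of d_ij over pairs i >= j, divided by n(n + 1) / 2."""
    nodes = list(adj)
    n = len(nodes)
    index = {v: k for k, v in enumerate(nodes)}
    total = 0
    for i in nodes:
        dist = bfs_distances(adj, i)
        if len(dist) < n:
            raise ValueError("network is not connected")
        # Keep each unordered pair once; the self-pair (i, i) adds d_ii = 0.
        total += sum(d for j, d in dist.items() if index[j] <= index[i])
    return total / (n * (n + 1) / 2)


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(mean_geodesic_distance(path))  # (0 + 0 + 0 + 1 + 1 + 2) / 6
```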

Measuring geodesic distance in graphs that have multiple components is problematic because there could be one or more node pairs where there is no connecting path.

One way to avoid this issue is to define geodesic distance only for node pairs that have connecting paths. Another is to use the harmonic mean geodesic distance, which is based on the sum of the inverses of the path lengths between node pairs. Pairs of nodes in different components, including isolated nodes, have no connecting path, so their path length is treated as infinite; inverting those values makes their contribution zero, as presented in Equation (4.1.2).

$$\ell^{-1} = \frac{1}{\frac{1}{2}n(n+1)} \sum_{i \ge j} d_{ij}^{-1} \qquad (4.1.2)$$
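The harmonic mean in Equation (4.1.2) can be sketched in the same way. Unreachable pairs never appear in a breadth-first search, which is equivalent to adding $1/\infty = 0$. One assumption differs from the equation as written: because $d_{ii} = 0$ has no finite inverse, this sketch sums over distinct pairs only and normalizes by their number, $\frac{1}{2}n(n-1)$. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def harmonic_mean_geodesic(adj):
    """Harmonic mean geodesic distance; unreachable pairs contribute 1/inf = 0.

    Assumption: 1/d_ij is summed over distinct pairs and divided by n(n - 1)/2,
    because the self-distance d_ii = 0 has no finite inverse.
    """
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return float("inf")
    index = {v: k for k, v in enumerate(nodes)}
    inv_sum = 0.0
    for i in nodes:
        for j, d in bfs_distances(adj, i).items():
            if index[j] < index[i]:  # each distinct unordered pair once
                inv_sum += 1.0 / d
    inv_l = inv_sum / (n * (n - 1) / 2)
    return 1.0 / inv_l if inv_l > 0 else float("inf")


# Two linked nodes plus an isolated node: only the pair (a, b) is reachable.
pair_and_isolate = {"a": ["b"], "b": ["a"], "c": []}
print(harmonic_mean_geodesic(pair_and_isolate))  # 1 / ((1/1 + 0 + 0) / 3)
```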

Transitivity, or clustering, refers to the tendency that, if node a is connected to node b and node b is connected to node c, then node a is also likely to be connected to node c. In terms of social networks, a friend of your friend is likely to be your friend as well. Equation (4.1.3) quantifies this tendency as the clustering coefficient.

$$C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of nodes}} \qquad (4.1.3)$$

A connected triple consists of a node with links to an unordered pair of nodes.

Effectively, the clustering coefficient measures the fraction of triples where the third link completes a triangle. According to Newman (2003), the sociology literature refers to the clustering coefficient as the “fraction of transitive triples.” An alternative definition of a clustering coefficient is the ratio of the number of triangles connected to a node to the number of triples centered at that node,

$$C_i = \frac{\text{number of triangles that contain } v_i}{\text{number of triples centered on } v_i},$$

which he attributed to (Watts 1998). In sociological literature, this local clustering is known as network density (Scott 2000). This local clustering definition of network density differs from the definition in Table 2.2, which involves the actual number of links divided by the possible number of links.
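The global coefficient of Equation (4.1.3) and the local coefficient $C_i$ can both be computed by counting, at each node, how many pairs of its neighbors are themselves linked. Summed over all nodes, this count equals three times the number of triangles, because each triangle is seen once from each corner. The sketch below assumes an undirected simple graph stored as a dictionary of neighbor sets and, as a labeled assumption, sets $C_i = 0$ for nodes with fewer than two neighbors.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (global C of Equation 4.1.3, {node: local C_i}).

    adj maps each node to the set of its neighbors (undirected, no self-links).
    """
    closed_total = 0   # summed over nodes, this is 3 x the number of triangles
    triples_total = 0  # connected triples centred on each node, summed
    local = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        centred = k * (k - 1) // 2
        # Neighbor pairs that are themselves linked close a triangle through v.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed_total += closed
        triples_total += centred
        local[v] = closed / centred if centred else 0.0  # assumption for k < 2
    global_c = closed_total / triples_total if triples_total else 0.0
    return global_c, local


# A triangle (a, b, c) with a pendant node d attached to c.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
C, C_i = clustering_coefficients(g)
print(C)  # 3 * 1 triangle / 5 connected triples = 0.6
```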

The degree distribution is a histogram of the number of links connected to each node. Random graphs have a Poisson degree distribution, whereas real-world networks typically have highly right-skewed distributions. A cumulative distribution serves as another approach to representing degree data. In Equation (4.3.4), k represents a value of degree, $p_{k'}$ is the fraction of nodes with degree $k'$, and $P_k$ is the probability that a node’s degree is greater than or equal to k.

$$P_k = \sum_{k'=k}^{\infty} p_{k'} \qquad (4.3.4)$$
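A hypothetical helper that computes $P_k$ from a list of node degrees, by accumulating counts from the largest degree downward, is sketched below; only degrees that actually occur in the list are reported.

```python
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Return {k: P_k} for each observed degree k, where P_k is the fraction of nodes with degree >= k."""
    n = len(degrees)
    counts = Counter(degrees)  # counts[k] / n is p_k, the fraction with degree exactly k
    result = {}
    running = 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        result[k] = running / n
    return result


print(cumulative_degree_distribution([1, 1, 2, 3]))  # {3: 0.25, 2: 0.5, 1: 1.0}
```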

A mixing pattern analysis examines which types of nodes tend to be linked to one another. Examples include mixed-race couples, plants and herbivores, and customers and internet service providers. An assortativity coefficient quantifies mixing using the fraction of links that connect each pair of node types, that is, the number of links connecting the two types divided by the total number of links; an example of two types of nodes could be extroverts and introverts. Degree correlation considers the mixing patterns of high- and low-degree nodes. Newman’s approach involves calculating the Pearson correlation coefficient of the degrees at both ends of a link. Positive values indicate an assortative network, and negative values indicate a disassortative network. Social networks appear to be assortative, but other types of networks, for example information, technological, and biological networks, appear to be disassortative.
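Newman’s degree correlation can be sketched as a Pearson correlation computed over links. In this illustration, which assumes an undirected edge list, each link contributes its two end degrees in both orders so that the result does not depend on how the link is listed; the function name is hypothetical.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected link in both directions so the two lists are symmetric.
    xs = [degree[a] for u, v in edges for a in (u, v)]
    ys = [degree[b] for u, v in edges for b in (v, u)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # all degrees equal: the correlation is undefined
    return cov / (sx * sy)


# In a star, the hub links only to leaves: perfectly disassortative.
star = [(0, 1), (0, 2), (0, 3)]
print(round(degree_assortativity(star), 6))  # -1.0
```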


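Some networks have degree distributions with exponential tails, $p_k \sim e^{-k/\kappa}$, where $p_k$ is the fraction of nodes with degree k and κ is a constant; as Equation (4.3.5) shows, such distributions are also exponential in cumulative form. By comparison, a power-law degree distribution $p_k \sim k^{-\alpha}$ has a cumulative distribution that also follows a power law, $P_k \sim k^{-(\alpha-1)}$.

$$P_k = \sum_{k'=k}^{\infty} p_{k'} \sim \sum_{k'=k}^{\infty} e^{-k'/\kappa} \sim e^{-k/\kappa} \qquad (4.3.5)$$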

Networks with power-law degree distributions are often called scale-free networks. Hubs, or high-degree nodes, characterize a scale-free network. In growing networks, the oldest nodes have had more opportunities to gain links than the newer nodes.

Newman (2003) explained that a network’s resilience refers to the network’s reliability in transmitting messages as nodes are removed from the network. As nodes are removed, the mean path length between pairs of nodes tends to increase. Removal has different meanings in various networks. In epidemiology, removal indicates that a person received a vaccination against a disease. In the field of communication, removal could be due to a storm or sabotage. Removing high-degree nodes affects the network significantly more than removing randomly chosen nodes.

Cluster analysis considers the community structure of a social network.

Community structure refers to a set of nodes with high internal link density but lower link density with nodes outside the community. Assigning connection strengths to links and considering the cuts that could separate components enables the creation of a dendrogram depicting hierarchical clusters.


Betweenness centrality counts the number of shortest paths between pairs of other nodes that run through a particular node. The distribution of betweenness values appears to follow a power law in many networks. As a measure of resilience, betweenness centrality indicates the number of paths that will increase in length if a particular node is removed.
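Counting shortest paths through every node naively is expensive; Brandes’ algorithm, a well-known technique not described in the sources above, computes betweenness with one breadth-first search per node. A minimal sketch for unweighted, undirected networks follows; it credits each node with the fraction of shortest paths between every other pair of nodes that pass through it. The function name is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and record predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both of its end nodes, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness_centrality(star))  # {0: 3.0, 1: 0.0, 2: 0.0, 3: 0.0}
```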

Freeman reviewed nine centrality metrics and compared calculated centrality scores for every possible configuration of a five-node network (Freeman 1978). Freeman observed that many of the existing centrality metrics are only loosely related to intuitive ideas and several are so complex that it is difficult to know what they are measuring.

The nine centrality metrics reviewed by Freeman (1978) are based upon three foundational metrics: degree, betweenness, and closeness. Degree-based metrics indicate network communication activity. Betweenness-based metrics identify nodes that can control communication. Closeness-based metrics can indicate either independence or efficiency. Freeman produced a table that compared the metrics for all 34 possible configurations of a five-node network. Observations from his analysis included:

1. Geodesic distance-based metrics cannot calculate centrality for the 13 unconnected networks;

2. Star and wheel patterns produced a maximum centrality score;

3. Circle and complete graph patterns produced a minimum centrality score;

4. Between the previous two extreme patterns, metrics differed in scores and rankings;

5. Variation in scores, both for node and network centrality, was greatest for indices based on betweenness, and those metrics are “finer grained;” and

6. Variation of centrality scores was the smallest for degree-based metrics; those metrics are “coarser grained.”


Bonacich explained, “in a power hierarchy, one's power is a positive function of the powers of those one has power over” (Bonacich 1987). He defined a measure of centrality, e, in which a node's centrality “is the summed connections to others, weighted by their centralities.” In Equation (4.3.6), R is the matrix of relationships, e is an eigenvector of R, and λ is the associated eigenvalue; this metric became known as eigenvector centrality.

$$\lambda e_i = \sum_j R_{ij} e_j \qquad (4.3.6)$$

Borgatti (2005) analyzed centrality metrics and identified inherent assumptions for consideration in network simulation. As defined by Freeman (1978), a node’s closeness centrality metric is the sum of the geodesic distances from all other nodes.

Recall that a geodesic distance is the number of links in the shortest path between two nodes; therefore, a node with a low closeness metric value has short paths to the other nodes in a network. If a research and development organization’s social network has a low average closeness metric, then the developers are likely to have a lot of direct communication with one another. A potential benefit of a low average closeness could be that the organization can develop products faster than competitors. Closeness serves as an index of the expected time until arrival of something flowing in the network: the smaller the value, the shorter the time until arrival.
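Closeness in the form used here, the sum of geodesic distances from a node to all others (so lower values mean more central), can be sketched with one breadth-first search per node. Returning infinity for a node that cannot reach every other node is an assumption of this sketch, mirroring Freeman’s observation that distance-based metrics are undefined for unconnected networks. The function name is hypothetical.

```python
from collections import deque


def closeness_sum(adj):
    """Sum of geodesic distances from each node to all others (lower = more central)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Assumption: a node that cannot reach every other node gets infinity.
        result[s] = sum(dist.values()) if len(dist) == len(adj) else float("inf")
    return result


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
farness = closeness_sum(star)
print(farness)  # {0: 3, 1: 5, 2: 5, 3: 5}
print(sum(farness.values()) / len(farness))  # average closeness of the network
```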

Recall that the betweenness metric measures the number of shortest paths between pairs of nodes that pass through the same intermediate node. In simulations, betweenness can indicate potential bottlenecks in the flow of items, or information brokers in communication networks.

The eigenvector centrality metric is the principal eigenvector of the adjacency matrix that defines the network. The eigenvector centrality equation may be interpreted as indicating that a node with a high eigenvector score is connected to nodes that also have high scores. In a simulation, a high eigenvector centrality can indicate flow along multiple simultaneous paths, which implies an amorphous entity that can spread in several directions at once, such as information or infections. For example, in a contagion simulation, eigenvector centrality measures the long-term direct and indirect risk of infection. Within an adjacency matrix, a node’s degree is the sum of its row or column. In contrast to eigenvector centrality, node degree indicates the immediate transfer to, or infection of, connected nodes at time t + 1 in the contagion simulation example.
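Equation (4.3.6) can be solved approximately by power iteration. The sketch below multiplies by $R + I$ rather than $R$; this shift keeps the same principal eigenvector but avoids the oscillation that plain power iteration shows on bipartite networks such as stars. The iteration limit, tolerance, and function name are illustrative assumptions, and for disconnected networks the principal eigenvector need not be unique.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Approximate the principal eigenvector e of lambda * e_i = sum_j R_ij e_j.

    adj maps each node to its neighbors (undirected, unweighted, so R_ij is 0 or 1).
    """
    nodes = list(adj)
    e = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # One step with (R + I): e_i plus the summed scores of i's neighbors.
        new = {v: e[v] + sum(e[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - e[v]) for v in nodes) < tol:
            return new
        e = new
    return e


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
e = eigenvector_centrality(star)
degree = {v: len(nbrs) for v, nbrs in star.items()}
print({v: round(x, 4) for v, x in e.items()})  # hub 0.7071, leaves 0.4082
print(degree)  # {0: 3, 1: 1, 2: 1, 3: 1}
```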

4.2 Network generation methods

Generating synthetic social networks that are more realistic than random graphs would seem to require attention to the properties of social networks that distinguish them from random graphs in general. Hamill (2010), summarizing observations from Wong (2006) and Bruggeman (2008), asserts that social networks, where the nodes represent individual people, should:

• Have low network density
• Display high clustering
• Exhibit positive assortativity by degree of connectivity
• Include groups of people who are highly connected among themselves
• Have short paths, so that other individuals can be reached in a few steps
• Have a size limited by the types of relations being studied, and reflect homophily
• Have a right-skewed degree distribution
• Change over time


Since 1960, several social network generation models have been developed. A selection of existing models that consider or exploit various structural characteristics of networks is listed below; each is described in a subsection that follows:

• Random graph model (Erdős 1960)
• Configuration Model (Bender 1978) (Bollobás 1980) (Milo 2003) (Newman 2003)
• Exponential random graph model (Holland 1981) (Frank 1986) (Wasserman 1996)
• Stochastic block model (Holland 1983) (Nowicki 2001)
• Small world model (Watts 1998)
• Preferential attachment model (Barabási 1999)
• Popularity Similarity model (Papadopoulos 2012)
• Chung-Lu graph model (Chung 2002)
• Degree correlation dK series (Mahadevan 2006)
• Block two-level Erdős-Rényi model (Seshadhri 2012)
• Replication of complex networks model (Staudt 2017)

4.2.1 Random graph model

The simplest method to synthesize a social network is to generate a random graph.

To generate an arbitrary random graph for a given number of nodes n, a simple procedure known as the Erdős-Rényi G(n, p) algorithm (or the random graph model) suffices (Erdős 1959) (Erdős 1960):

1. Initialize vertex set V = {v1, v2, …, vn} and edge set E = ∅.

2. For each pair of vertices vi, vj ∈ V, 1 ≤ i < j ≤ n, randomly add edge {vi, vj} to E with probability p, 0 ≤ p ≤ 1.

The resulting random graph will have the desired number of nodes n. The density of each random graph is subject to stochastic variability, but graphs generated in this manner will have, on average, the desired density p (Erdős 1960).
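The two-step G(n, p) procedure above translates almost directly into code. The sketch below uses Python’s standard `random` module with an optional seed; the function names are hypothetical.

```python
import random
from itertools import combinations


def erdos_renyi_gnp(n, p, seed=None):
    """G(n, p): add each of the n(n - 1)/2 possible links independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)
    nodes = list(range(n))  # step 1: V = {v1, ..., vn}, E starts empty
    # Step 2: rng.random() is uniform on [0, 1), so each pair is kept with probability p.
    edges = [(i, j) for i, j in combinations(nodes, 2) if rng.random() < p]
    return nodes, edges


def density(n, edges):
    """Actual links divided by possible links."""
    return len(edges) / (n * (n - 1) / 2)


nodes, edges = erdos_renyi_gnp(200, 0.1, seed=42)
print(density(len(nodes), edges))  # close to 0.1, but varies with the seed
```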

Random graphs can be generated quite easily and some of their properties (notably number of nodes and density) can be controlled in the process. However, those two properties are in general insufficient to produce graphs that have the more sophisticated desired structural characteristics of any specific class of social networks.

Thus, random graph generation is not a solution to the problem of synthesizing social networks, but it can serve as a starting point and a basis of comparison when developing better social network synthesizers.

4.2.2 The Configuration Model

In random graphs, the nodes’ degrees tend to follow a Poisson distribution (Bollobás 1998). This can be unrealistic; real-world networks’ node degree distributions are more often non-Poisson and heavy-tailed (Barabási 2005). The Configuration Model extends the random graph model to address that inconsistency (Bender 1978) (Bollobás 1980) (Molloy 1995) (Molloy 1998) (Newman 2001) (Milo 2003) (Newman 2003) (Viger 2005). In the Configuration Model, network generation is initialized with both the number of nodes n and a specific degree sequence K = {k1, k2, …, kn}, where ki is the degree of node vi. The degree sequence K may be random variates drawn from a suitable distribution (checked to ensure that Σ ki is even), or more simply, the actual degree sequence of a real-world network serving as an exemplar of the class of networks to be generated. Given n nodes and a degree sequence K, each node vi receives ki link ends, and the link ends are paired uniformly at random, so that each node vi ends up with degree ki. This produces networks with a realistic degree distribution, but if a single exemplar is used for multiple synthetic networks, all the generated networks will have identical node degrees, which may not be desirable.
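A common way to realize a degree sequence is to give each node $k_i$ link ends (stubs) and pair the shuffled stubs. The minimal sketch below does exactly that and returns a multigraph; whether self-loops and repeated links are kept, discarded, or rejected varies between implementations and is left open here. The function name is hypothetical.

```python
import random


def configuration_model(degrees, seed=None):
    """Realize a degree sequence by pairing link ends ("stubs") uniformly at random.

    Returns a list of (u, v) links. The result is a multigraph: self-loops and
    repeated links can occur.
    """
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # Node v contributes degrees[v] stubs.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list become the two ends of one link.
    return list(zip(stubs[0::2], stubs[1::2]))


links = configuration_model([3, 2, 2, 1, 1, 1], seed=7)
print(len(links))  # 5 links, since the degrees sum to 10
```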

4.2.3 Exponential random graph model

The exponential random graph model (ERGM), also known as the p* or p-star model, expresses the probability of a network in terms of counts of subgraph structures, such as stars, triangles, paths, and cycle patterns (Wasserman 1996) (Snijders 2002) (Robins 2007). Holland and Leinhardt developed an exponential family of probability distributions for directed graphs, which derived from empirical observations of stars (nodes with multiple links), isolates (nodes without links), and their triad census (the sixteen possible configurations of a directed triad) (Holland 1977) (Holland 1981). Frank and Strauss developed a family of distributions for directed and undirected Markov graphs, wherein dependence exists among the links (Frank 1986). Snijders applied Markov chain Monte Carlo methods to estimate these models using network statistics such as counts of dyads, undirected and directed two-paths, and directed and undirected triangles (Snijders 2002). Hunter distinguished between ERGM and p* by associating maximum pseudo-likelihood estimation (Wasserman 1996) with p* and maximum likelihood estimation (Geyer 1992) with ERGM (Hunter 2007).

4.2.4 Stochastic block model

Among the existing methods, the stochastic block model (SBM) may be the most similar to the new methods developed in the current work; therefore, it is described in more detail. The SBM can be used to generate networks and to detect communities within large-scale networks (Holland 1983) (Anderson 1992) (Faust 1992) (Newman 2004) (Bickel 2009) (Fortunato 2010) (Decelle 2011) (Abbe 2017). The set of actors or agents involved is first partitioned into B communities or clusters known as blocks. This partitioning is often done manually, based on observation or data; frequently interacting actors are placed into the same block. A B × B matrix W contains the probabilities of link formation both within and among the blocks (Nowicki 2001). The probabilities may be provided manually or by automated analysis of the source data. The on-diagonal entries in W contain the probabilities of links forming between nodes in the same block, whereas the off-diagonal entries contain the probabilities of links forming between nodes in different blocks. If the on-diagonal probabilities are higher than the off-diagonal probabilities, then the intra-block link density will be higher than the inter-block link density; such a network is known as assortative. Conversely, if the off-diagonal probabilities are higher than the on-diagonal probabilities, then the resulting network will have a higher inter-block link density; such a network is known as disassortative. In an SBM implementation, the number of nodes in each block may be stored in an integer vector with B entries. If the blocks are assumed to be disjoint, then the sum of the vector’s entries is the total number of nodes in the network. To generate a synthetic network, pairs of nodes (dyads) are selected randomly. For each selected pair, the entry of W for the block(s) containing the two nodes is used as the probability that a link forms between them. This process is iterated until the overall network density reaches a specified value.
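The dyad-sampling procedure described above can be sketched as follows. The parameter names (`block_sizes`, `W`, `target_density`) and the `max_draws` safeguard, which prevents an endless loop when the target density cannot be reached, are assumptions of this illustration rather than part of any particular SBM implementation.

```python
import random


def stochastic_block_model(block_sizes, W, target_density, seed=None, max_draws=1_000_000):
    """Dyad-sampling SBM sketch.

    block_sizes[b]: number of nodes in block b (blocks assumed disjoint).
    W[a][b]: probability that a link forms between a node in block a and one in block b.
    """
    rng = random.Random(seed)
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(block_of)
    target_links = round(target_density * n * (n - 1) / 2)
    edges = set()
    for _ in range(max_draws):  # safeguard in case the target cannot be reached
        if len(edges) >= target_links:
            break
        i, j = rng.sample(range(n), 2)  # a random dyad of two distinct nodes
        dyad = (min(i, j), max(i, j))
        if dyad not in edges and rng.random() < W[block_of[i]][block_of[j]]:
            edges.add(dyad)
    return block_of, edges


W = [[0.9, 0.05], [0.05, 0.9]]  # high on-diagonal values: an assortative network
block_of, edges = stochastic_block_model([10, 10], W, 0.2, seed=1)
intra = sum(1 for i, j in edges if block_of[i] == block_of[j])
print(len(edges), intra)  # 38 links in total; most should fall within blocks
```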


4.2.5 Small world model

The small world model starts with a one-dimensional regular ring lattice where each node has links to its k nearest neighbors (Watts 1998) (Strogatz 2001). Several iterations of random rewiring produce a network with a desired density. For each node, rewiring involves stochastically determining whether an existing link is deleted or if a new link is formed between the current node and another randomly selected node.
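For comparison, the standard Watts-Strogatz construction is sketched below. In this formulation each lattice link is rewired with probability p by moving one of its ends to a randomly chosen node, so the number of links, and hence the density, stays at nk/2; this may differ from the rewiring variant summarized above. The parameter and function names are assumptions.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbors (k even),
    after which each lattice link is rewired with probability p."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even with 0 < k < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):  # k/2 neighbors on each side
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                # Move the far end to a node that is neither v nor already linked to v.
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj


adj = watts_strogatz(30, 4, 0.1, seed=3)
print(sum(len(nbrs) for nbrs in adj.values()) // 2)  # 60 links, the same as the lattice
```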

4.2.6 Preferential attachment model

The preferential attachment model starts with a small set of nodes and then adds nodes and links in an iterative process based upon the connectivity of the nodes (Barabási 1999). The number of nodes in the initial set determines the maximum degree for new nodes. In each iteration, or “time step,” a new node is added to the network, and then links from the new node to the existing nodes are stochastically added up to the maximum degree. The process depends upon the existing nodes’ current connectivity, which is calculated as $k_i = m\,(t / t_i)^{1/2}$, where m is the number of links a node forms when it is added, t is the current iteration (or time step), and $t_i$ is the time step when node i was added. The probability of a link being added from the new node to existing node i is $k_i / \sum_j k_j$, where $k_i$ is the connectivity of node i and $\sum_j k_j$ is the sum of the connectivities of all existing nodes. New nodes, and links from them to existing nodes, are iteratively added until the network has the desired number of nodes. This process produces a scale-free network.
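A minimal growth sketch in the spirit of the preferential attachment model follows. It assumes a fully connected seed of m + 1 nodes, so that every node starts with a nonzero degree, and keeps a list in which each node appears once per link end, so a uniform draw from the list selects node i with probability proportional to its current degree $k_i$. Drawing repeatedly until m distinct targets are found is a common simplification, not necessarily the procedure of Barabási (1999); the function name is hypothetical.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m distinct existing nodes,
    each chosen with probability proportional to its current degree."""
    if n <= m:
        raise ValueError("n must be larger than m")
    rng = random.Random(seed)
    # Assumed seed network: m + 1 fully connected nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Node i appears here once per link end, i.e. k_i times in total.
    ends = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # P(choose i) = k_i / sum_j k_j
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


edges = preferential_attachment(100, 2, seed=5)
print(len(edges))  # 3 seed links + 97 new nodes * 2 links = 197
```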

4.2.7 Popularity similarity model

The popularity similarity model bases the probability of link formation on hyperbolic distances between nodes (Papadopoulos 2012). In this model, the network grows as nodes are added at successive time steps. Older (earlier added) nodes tend to be popular because they have had more time to connect to other nodes. To model similarity, each new node is placed at a random angular position on a circle. A node’s birth time t determines its radial coordinate $r_t = \ln t$. Two nodes, with polar coordinates $(r_s, \theta_s)$ and $(r_t, \theta_t)$, have an approximate hyperbolic distance $x_{st} = r_s + r_t + \ln(\theta_{st}/2) = \ln(s\,t\,\theta_{st}/2)$, where s and t are the nodes’ respective birth times and $\theta_{st}$ is the angle between them. This hyperbolic distance serves as a convenient metric representing both radial popularity and angular similarity.
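The distance formula can be checked numerically, and the growth process can be sketched using the basic (zero-temperature) rule of the popularity similarity model, in which each new node links to the m existing nodes that are closest to it in hyperbolic distance. The parameter m, the uniform choice of angles, and the function names are assumptions of this sketch.

```python
import math
import random


def angular_distance(a, b):
    """Angle between two points on a circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def hyperbolic_distance(s, theta_s, t, theta_t):
    """x_st = r_s + r_t + ln(theta_st / 2), with radial coordinate r = ln(birth time)."""
    theta_st = max(angular_distance(theta_s, theta_t), 1e-300)  # guard against ln(0)
    return math.log(s) + math.log(t) + math.log(theta_st / 2)


def popularity_similarity(n, m, seed=None):
    """Node t (born at t = 1, 2, ..., n) gets a random angle and links to the m
    existing nodes with the smallest hyperbolic distance to it."""
    rng = random.Random(seed)
    theta, edges = {}, []
    for t in range(1, n + 1):
        theta[t] = rng.uniform(0.0, 2 * math.pi)
        closest = sorted(range(1, t),
                         key=lambda s: hyperbolic_distance(s, theta[s], t, theta[t]))
        edges.extend((t, s) for s in closest[:m])
    return theta, edges


# The two forms of the distance agree: ln(2) + ln(3) + ln(pi / 2) = ln(2 * 3 * pi / 2).
print(hyperbolic_distance(2, 0.0, 3, math.pi), math.log(3 * math.pi))
```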

4.2.8 Chung-Lu graph model

The Chung-Lu model generates a random graph with a given expected degree sequence; with power-law weights, it produces a power-law degree distribution. Given an exemplar degree sequence, weights w = (w1, w2, …, wn) are assigned to the nodes, and the probability of link formation between nodes i and j is $p_{ij} = w_i w_j / \sum_k w_k$, where the sum runs over all nodes k and $\max_i w_i^2 \le \sum_k w_k$, so that $p_{ij} \le 1$. Equations 4.2.8.1 and 4.2.8.2 express the average degree d in terms of the weights and the probability of link formation in terms of the weights and ρ, respectively (Chung 2002a) (Chung 2002b).

$$d = \frac{1}{n} \sum_{i=1}^{n} w_i \qquad (4.2.8.1)$$

$$p_{ij} = w_i w_j \rho, \quad \text{where } \rho = \frac{1}{nd} \qquad (4.2.8.2)$$
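A direct sketch of the Chung-Lu link probability follows. It checks the condition $\max_i w_i^2 \le \sum_k w_k$ and links each pair independently with probability $w_i w_j \rho$, where $\rho = 1/\sum_k w_k = 1/(nd)$. The quadratic loop over all pairs is chosen for clarity, not speed, and the function name is hypothetical.

```python
import random


def chung_lu(weights, seed=None):
    """Link each pair i < j independently with probability p_ij = w_i * w_j * rho,
    where rho = 1 / sum_k w_k = 1 / (n * d) and d is the average weight."""
    total = sum(weights)
    if max(weights) ** 2 > total:
        raise ValueError("need max(w)^2 <= sum(w) so that every p_ij <= 1")
    rho = 1.0 / total
    rng = random.Random(seed)
    n = len(weights)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < weights[i] * weights[j] * rho]


edges = chung_lu([5] * 1000, seed=3)
print(2 * len(edges) / 1000)  # average degree, close to the average weight of 5
```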

4.2.9 Degree correlation dK series

The degree correlation dK series model defines probability distributions that specify degree correlations within subnetworks of size d for a given exemplar network. Given an exemplar network, a generated 0K-graph reproduces the average node degree, $\bar{k} = 2m/n$, where m is the number of links and n is the number of nodes. A 1K-graph reproduces the degree distribution of an exemplar network. A 2K-graph reproduces the joint degree distribution, that is, the distribution of links among nodes of degree k1 and degree k2:

$$P(k_1, k_2) = \frac{m(k_1, k_2)\,\mu(k_1, k_2)}{2m},$$

where $m(k_1, k_2)$ is the number of links connecting nodes of degree $k_1$ to nodes of degree $k_2$, and $\mu(k_1, k_2)$ is two if $k_1 = k_2$ and one otherwise.

A 3K-graph reproduces the interconnectivity among triads and triangles of an exemplar network. The greater the value of d, the more complex the properties of the exemplar network that can be reproduced (Mahadevan 2006).
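The 2K statistic $P(k_1, k_2)$ of an exemplar can be computed directly from its edge list. In the sketch below, degree pairs are stored as ordered keys, so each link between nodes of different degrees is recorded under both orders; with the factor μ, the values then sum to one. Exact fractions are used only to make the arithmetic easy to check, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def joint_degree_distribution(edges):
    """P(k1, k2) = m(k1, k2) * mu(k1, k2) / (2m) over ordered degree pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # m(k1, k2): links between degree-k1 and degree-k2 nodes, stored under both orders.
    m_k = Counter()
    for u, v in edges:
        a, b = degree[u], degree[v]
        m_k[(a, b)] += 1
        if a != b:
            m_k[(b, a)] += 1
    two_m = 2 * len(edges)
    return {pair: Fraction(count * (2 if pair[0] == pair[1] else 1), two_m)
            for pair, count in m_k.items()}


star = [(0, 1), (0, 2), (0, 3)]
print(joint_degree_distribution(star))  # {(3, 1): Fraction(1, 2), (1, 3): Fraction(1, 2)}
```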

4.2.10 Block Two-Level Erdős-Rényi model

The Block Two-Level Erdős-Rényi model introduces community structures by generating a set of independent networks and then randomly linking nodes among the communities (Seshadhri 2012). Implementations of this model typically take the number of nodes and the density as input parameters and return a network whose number of links is determined by the density.

4.2.11 Replication of complex networks model

A method described in Staudt (2017), called ReCon, enables the replication of complex networks. The method generates scalable synthetic social networks based on an exemplar network. An objective of ReCon is to generate networks up to 32 times larger than the exemplar. The ReCon algorithm first detects communities in the exemplar network using the parallel Louvain method. It then generates a working graph as a disjoint union of x copies of the exemplar, where x is a scaling factor. For each detected community in the working graph, the algorithm preserves the degree distribution and rewires the intra-community links through random edge switching. After rewiring both the intra-community links and the inter-community links, the algorithm then connects the copies of the network (Staudt 2017). In that work, a realistic replica of an exemplar social network was defined as a network that has metric values similar to those of the exemplar. The metrics compared to the exemplar included sparsity (the number of links relative to the number of nodes), the Gini coefficient of the degree distribution, maximum degree, average clustering coefficient, diameter, number of connected components, and number of communities. ReCon produces replicas that are realistic under this definition because it preserves the exemplar’s community structure and node degrees. The idea of individual traits determining link formation was hinted at in (Staudt 2017), which described a potential application of synthetic social networks as showing interactions that are “determined by implicit psychological and social rules,” but those rules were not used to generate synthetic networks.

In contrast to the algorithms developed in the work reported in this dissertation, with only one exception, the existing social network generation methods do not use any actual or inferred attributes of the persons represented by the nodes to determine or influence the generation of links between the nodes. The exception is the stochastic block model, which uses a group attribute associated with each node to determine the probability of link formation with other nodes within the same group. None of the prior methods use personality type or compatibility as is done in this work to produce synthetic social networks.

The desirable features of a synthetic social network generation algorithm include parsimony (i.e., few parameters), speed of execution, and network realism. Realism is a very important characteristic of synthesized social networks. Realism in social networks has been defined in terms of network structural features, dynamics, and evolution (Staudt 2017). The similarity, or lack thereof, of metric values between a synthetic network and a real network is understood as a measure of realism (Chakrabarti 2004) (Leskovec 2010).

A quantitative assessment of realism is central to the current work.


CHAPTER 5

REAL-WORLD EXEMPLAR DATA SETS

After generating a network, how does one know whether it is realistic? The growth of the Internet and general increases in computational power have enabled considerable research on large networks. The topologies of large networks tend to exhibit similar characteristics, which can be exploited by network generation methods.

The topologies of relatively small networks can have significantly different characteristics, which can preclude the use of methods that were developed for generating large networks. Typically, repositories have data sets for large networks, but only a few include data sets for small networks. This chapter identifies the sources of the real-world exemplar networks and explains why online social media data was not used in this research.

5.1 Why not use online social media data?

Current trends in social network analysis include studying social networks developed from massive data sets captured from online social media and communities such as Facebook, Twitter, and Wikipedia. A few examples of this research can be found in (Mislove 2007), (Crandall 2008), (Kwak 2010), (Catanese 2011), (Yang 2015), and (Grandjean 2016). Common interests in careers, pastimes, politics, popular culture, and societal trends motivate people to join groups within these online communities, which suggests that personality type is only one of many factors determining how links form in real-world social networks. However, according to Krebs (2008), social networks expressed as connections in Facebook and LinkedIn can be misleading, because some site members may try to connect with as many people as possible and others may acquiesce to the creation of requested links with no real connection. “Two people might show to be connected but they really are not – one person was too embarrassed to turn down a ‘friend request’ from a total stranger. These ‘false positives’ tend to pollute the data of these social networking services” (Krebs 2008).

5.2 Sources of real-world social network data sets

Social network analysis research requires real-world social networks to use as input data. First developed in the early 1980s, UCINet is a social network analysis application that calculates a variety of network metrics (Freeman 1988). UCINet includes functions for discovering cohesive subgroups in a network (Borgatti 2014). An associated archive of social networks, represented as adjacency matrices, is maintained in the UCINet format (Freeman 2008).

The Stanford Network Analysis Platform (SNAP) is a repository with open source software for analyzing and manipulating large data sets (Sosič 2015). Leskovec and Sosič distinguish between graphs, as mathematical objects, and networks, as real-world datasets of nodes and links (Leskovec 2016). The SNAP repository provides more than 200 graph functions and approximately 80 network datasets. The functions include graph generators, manipulators, and analytics (Leskovec and Sosič 2016). The network datasets include online social networks, citation networks, communication networks, and Wikipedia article networks (Leskovec 2014).


The real-world social networks used in this research include both symmetric and asymmetric as well as both unweighted and weighted networks. The new network synthesis algorithms to be described produce symmetric unweighted networks.

Therefore, the real-world networks used as exemplars were converted to symmetric and unweighted form when necessary before being used. The conversions were done in the obvious ways: if an asymmetric network had directed link(s) in either or both directions between two nodes, then the converted network had an undirected link between those nodes. Similarly, if a weighted network had a link of any weight between two nodes, then the converted network had an unweighted link between the nodes.
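The conversion described above amounts to a single pass over an adjacency matrix. The sketch below assumes the matrix is a square list of lists in which a zero entry means no link; dropping self-links on the diagonal is an additional assumption, and the function name is hypothetical.

```python
def to_symmetric_unweighted(matrix):
    """Return a symmetric 0/1 adjacency matrix: i and j are linked if either
    matrix[i][j] or matrix[j][i] is non-zero (any weight, either direction)."""
    n = len(matrix)
    return [[1 if i != j and (matrix[i][j] != 0 or matrix[j][i] != 0) else 0
             for j in range(n)]
            for i in range(n)]


directed_weighted = [[0, 2, 0],
                     [0, 0, 0],
                     [1, 0, 0]]
print(to_symmetric_unweighted(directed_weighted))  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```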

Table 5.1 lists the real-world social networks used in this research as source data; they are from the UCINet archive. In all but one of the networks, the nodes correspond to individual people and the links to a relationship of some kind between them. The exception is the Schwimmer Taro Exchange Network, where the nodes correspond to Orokaiva households within the Papuan village of Sivepe and the links represent the mutual exchange of gifts, such as cooked taro (Schwimmer 1970; 1973).


Table 5.1 Real-world social network data sets used in this research.

Real-world social network                   Source                  Nodes  Symmetric  Weighted
Robins Australian Bank                      (Pattison 2000)         11     no         no
Roethlisberger & Dickson Bank Wiring Room   (Roethlisberger 1939)   14     yes        no
Thurman Office                              (Thurman 1979)          15     yes        no
Sampson Monastery                           (Sampson 1969)          18     no         yes
Krackhardt Office CSS                       (Krackhardt 1987)       21     no         no
Krackhardt High-Tech Managers               (Krackhardt 1987)       21     yes        no
Schwimmer Taro Exchange                     (Schwimmer 1973)        22     yes        no
Webster Accounting Firm                     (Webster 1993)          24     yes        yes
Zachary Karate Club                         (Zachary 1977)          34     no         no
Bernard & Killworth Technical               (Bernard 1982)          34     yes        yes
Bernard & Killworth Office                  (Bernard 1982)          40     yes        yes
Krebs Fortune 500 IT Department (Advice)    (Chen 2007)             56     no         yes
Krebs Fortune 500 IT Department (Business)  (Chen 2007)             56     no         yes
Lazega Law Firm                             (Lazega 2001)           71     no         no


CHAPTER 6

DEVELOPING A PERSONALITY COMPATIBILITY TABLE

The network generation algorithms described in the subsequent chapters use a personality compatibility table. This chapter explains a method for producing such a table. The method applies observations found in (Keirsey 1998) to infer attitudes about group compatibility factors. Positive or negative attitudes in common constitute potential homophily, and opposing attitudes constitute potential heterophily. This chapter presents Keirsey’s observations, explains how attitudes and compatibility were inferred from those observations, and presents the equation that generated link formation probabilities for a compatibility table.

6.1 Motivation for developing this method

To use the MBTI personality types to drive the network generation algorithms, information regarding the compatibility of those personality types was needed. A table indicating the compatibilities of all the combinations of MBTI personality types has been available on several social media sites, such as Imgur, Pinterest, Reddit, Personality Café, and Socionics forum, since at least 2014. However, none of the online posts containing that table provide a source for it. Without a credible source, the posted table could not be cited, relied upon, or used in this research.


A sustained effort was undertaken to identify the original source of the table. The people who had posted the table, the web services personnel at the UAH Salmon Library, the research department of the Myers-Briggs Foundation, and a few Socionics³ personality experts were all contacted. These inquiries received either no response or a negative response. As this is written, the source of the online MBTI personality compatibility table is still unknown.

The lack of a credible source for the online MBTI personality compatibility table meant that it could not be used in this research, even though a few experiments conducted with that table produced promising results. Consequently, the MBTI personality compatibilities needed for the network generation algorithms had to be derived directly from credible standard MBTI sources. The method for doing so and the resulting new MBTI personality compatibility table are described in this chapter.

The network generation algorithms that are the subject of this research were, as will be seen, able to use the resulting MBTI personality compatibility table to generate realistic synthetic social networks. However, it is important to note that the algorithms are not dependent on a specific personality compatibility table. Chapter 11 reports how a different personality compatibility table, generated with a completely different method and personality scheme, was also used by the algorithms to generate realistic networks.

3 Socionics was suggested because it also builds on the work of Jung and its personality scheme shares a few characteristics with MBTI.

6.2 Inferring attitudes and personality compatibility

Homophily and heterophily can be modeled as likelihoods of link formation among personality types. Table A.1 is a personality compatibility table for the MBTI personality types. Its rows and columns are the 16 MBTI personality types, and each entry is the probability that a link forms in a social network between two nodes whose associated personality types are those of the entry's row and column. The table is symmetric; that is, the entry for a pair of personality types is the same regardless of which type is on the row and which is on the column. Values on the diagonal of Table A.1 represent a level of homophily because diagonal cells are the intersections of rows and columns identifying the same personality type. Values in the off-diagonal cells represent some level of heterophily because those cells lie at the intersections of rows and columns that identify different personality types. Generating this compatibility table involved inferring attitudes about environmental factors and conducting pairwise comparisons of the MBTI personality types to determine the number of common and opposing attitudes.
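As a minimal sketch of how a symmetric compatibility table can be consulted, the Python below stores link probabilities under unordered pairs of types, so the lookup returns the same value regardless of argument order, and makes one random draw per dyad. The probability values and the names `COMPAT`, `link_probability`, and `form_links` are hypothetical illustrations; they are not entries from Table A.1 and not the network generation algorithms of this research.

```python
import random

# Hypothetical probabilities (NOT values from Table A.1). Keying each entry by a
# frozenset of types makes p(X, Y) == p(Y, X); a same-type (diagonal) entry is
# keyed by a one-element frozenset.
COMPAT = {
    frozenset({"ESTJ"}): 0.90,
    frozenset({"ISFP"}): 0.90,
    frozenset({"ESTJ", "ISFP"}): 0.60,
}


def link_probability(type_x, type_y, table=COMPAT):
    """Return the tabulated link probability for two types, in either order."""
    return table[frozenset({type_x, type_y})]


def form_links(node_types, table=COMPAT, rng=None):
    """Visit each dyad once and link it with the tabulated probability."""
    if rng is None:
        rng = random.Random()
    links = []
    for i in range(len(node_types)):
        for j in range(i + 1, len(node_types)):
            # rng.random() is uniform on [0, 1), so this succeeds with probability p.
            if rng.random() < link_probability(node_types[i], node_types[j], table):
                links.append((i, j))
    return links


if __name__ == "__main__":
    print(form_links(["ESTJ", "ISFP", "ESTJ", "ISFP"], rng=random.Random(42)))
```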

Table A.1 was constructed from (Keirsey 1998) by applying the following procedure:

1. Identify a set of environmental factors that are important in determining personality compatibility; for this work eight such factors were identified.

2. Interpret the personality model to determine each personality type's opinion regarding each of the environmental factors.

3. Perform pair-wise comparisons of the 16 MBTI personality types to determine the number of shared or consistent opinions regarding the environmental factors between each pair of personality types.

4. Scale the counts of common opinions into probabilities of link formation for the compatibility table.

Long-duration space exploration crews will be small, isolated, confined teams supported by a larger remote team. Team members will experience high-tempo workloads as well as periods of boredom (DeChurch et al. 2015). Withholding information from other teams can be dangerous. Training must establish communication norms and build trust among teams (Landon et al. 2018). Crews need to have compatible personalities, a "good command structure, a common goal, respect for each other's worth et cetera".

Interpersonal sensitivity was identified as having a positive correlation with leadership and task activity (Kanas 1971). “[L]eadership and teammate support can facilitate individual functioning and encourage psychological and physically healthy behaviors and attitudes” (Landon et al. 2015). Translating these observations into group compatibility factors, one could compare personality types to find common attitudes about the following:

• Authority: a tendency to respect or work with the chain of command.

• Communication: a tendency to value accurate and specific vernacular.

• Consideration: a tendency to respect or incorporate other people's opinions.

• Empathy: a tendency to recognize or synchronize with other people's feelings.

• Harmony: a tendency to tolerate or relieve interpersonal tensions.

• Loyalty: a tendency to value relationships and defend alliances.

• Productivity: a tendency to value efficient processes or the creation of something.

• Rules: a tendency to follow and defend documented procedures.

6.3 Keirsey's observations of the MBTI personality types

An interactive table located at the Myers-Briggs Foundation website presents sets of positive adjectives to describe the 16 personality types4; for example, adjectives for an ESTP include adventurous, pragmatic, easygoing, realistic, analytical, and spontaneous.

Though these adjectives characterize a person's personality, they do not provide insight into interpersonal perceptions. In contrast, observations in (Keirsey 1998) do indicate attitudes about authority, harmony, communication, and other aspects of interpersonal interactions.

The following quotations from (Keirsey 1998) illustrate the source content from which the environmental factors were identified and the personality types' likely opinions of them were determined. The environmental factors noted after each quotation indicate that people with the associated MBTI personality type may have positive or negative attitudes about those factors. Interpretive analysis enabled mapping each observation to the set of environmental factors listed in parentheses after the quotation.

• Promoters (ESTP) “[have a] low tolerance for anxiety and are apt to leave relationships that are filled with interpersonal tensions.” (harmony, loyalty)

• Composers (ISFP) “will put up with a lot more interpersonal tensions than other Artisans.” (harmony and loyalty)

4 Myersbriggs.org. (2019). MBTI® Basics. [online] Available at: https://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ [Accessed 9 June 2019].


• Crafters (ISTP) “can be fiercely insubordinate, seeing hierarchy and authority as unnecessary and even irksome.” (authority and rules)

• Performers (ESFP) “tolerance for anxiety is the lowest of all the types, and they will avoid worries and troubles by ignoring the unhappiness of a situation as long as possible.” (harmony and productivity)

• Supervisors (ESTJ) “may not always be responsive to points of view and emotions of others and have a tendency to jump to conclusions too quickly.” (authority and productivity)

• Providers (ESFJ) “tend to listen to acknowledged authorities on abstract matters, and often rely on officially sanctioned views as the source of their opinions and attitudes.” (authority and rules)

• Inspectors (ISTJ) “Because of [being adamant about rule compliance,] they are often misjudged as having ice in their veins, for people fail to see their good intentions and their vulnerability to criticism.” (authority and rules)

• Protectors (ISFJ) “know the value of a dollar and abhor the squandering or misuse of resources.” (productivity)

• Teachers (ENFJ) “When [they] find that their position or beliefs were not comprehended or accepted, they are surprised, puzzled, and sometimes hurt.” (communication, harmony, and consideration)

• Counselors (INFJ) “value staff harmony and want an organization to run smoothly and pleasantly, making every effort themselves to contribute to that end.” (harmony, consideration, and productivity)

• Champions (ENFP) “Sometimes [they] get impatient with their superiors; and they will occasionally side with detractors of their organization, who find in them a sympathetic ear and a natural rescuer.” (authority, communication, and empathy)

• Healers (INFP) “have difficulty thinking in conditional ‘if-then’ terms; they tend to see things as either black or white and can be impatient with contingency.” (communication, empathy, and consideration)

• Fieldmarshals (ENTJ) “For the [fieldmarshal], there must always be a reason for doing anything, and peoples’ feelings usually are not sufficient reason.” (authority, rules, and productivity)

• Masterminds (INTJ) “Colleagues may describe [masterminds] as unemotional and, at times, cold and dispassionate, when in truth they are merely taking the goals of an institution seriously, and continually striving to achieve those goals.” (productivity and rules)

• Inventors (ENTP) “If an [inventor’s] job becomes dull and repetitive, they tend to lose interest and fail to follow through -- often to the discomfort of colleagues.” (productivity)

• Architects (INTP) “It is difficult for an [architect] to listen to nonsense, even in a casual conversation, without pointing out the speaker’s error, and this makes communication with them an uncomfortable experience for many.” (communication and consideration)

Based on these quotes and other similar descriptions of the personality types in (Keirsey 1998), their likely opinions regarding the environmental factors were determined.

Table 6.1 Inferred MBTI personality types’ attitudes toward group compatibility factors.

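Category | Personality type | Communication | Productivity | Authority | Harmony | Loyalty | Rules
Artisans | Promoter ESTP | 0 | 1 | 0 | 0 | 0 | 0
Artisans | Composer ISFP | 0 | 0 | 1 | 1 | 1 | 0
Artisans | Crafter ISTP | 0 | 1 | 0 | 1 | 1 | 1
Artisans | Performer ESFP | 1 | 0 | 0 | 0 | 0 | 1
Guardians | Supervisor ESTJ | 1 | 1 | 0 | 1 | 1 | 1
Guardians | Provider ESFJ | 1 | 0 | 0 | 1 | 0 | 1
Guardians | Inspector ISTJ | 1 | 1 | 0 | 1 | 0 | 1
Guardians | Protector ISFJ | 1 | 1 | 0 | 1 | 1 | 1
Idealists | Teacher ENFJ | 1 | 1 | 1 | 0 | 0 | 0
Idealists | Counselor INFJ | 0 | 0 | 1 | 1 | 0 | 0
Idealists | Champion ENFP | 0 | 0 | 0 | 1 | 1 | 0
Idealists | Healer INFP | 0 | 0 | 1 | 0 | 0 | 1
Rationals | Field marshal ENTJ | 1 | 0 | 1 | 1 | 0 | 0
Rationals | Mastermind INTJ | 0 | 1 | 1 | 0 | 1 | 0
Rationals | Inventor ENTP | 0 | 1 | 0 | 0 | 1 | 0
Rationals | Architect INTP | 0 | 1 | 1 | 0 | 1 | 1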


6.4 Constructing the compatibility table

Interpretation of Keirsey’s observations of the MBTI personality types produced a set of attitudes about group compatibility factors. Table 6.1 summarizes those attitudes as high or low values, which enables a pair-wise comparison of the personality types to determine the number of common attitudes.

The Keirsey temperaments scheme groups the 16 possible MBTI personality types into four categories referred to as artisans, guardians, idealists, and rationals (Keirsey 1998), and Table 6.1 is organized by those categories. In Table 6.1, a 0 indicates that people of the personality type are likely to hold a low or negative opinion of the environmental factor, and a 1 indicates a relatively high or positive opinion. For each pair of personality types X and Y, the number of environmental factors on which they agreed (both had 0 or both had 1 in the table) was calculated; that value is denoted a(X, Y), with a(X, Y) ∈ {0, 1, 2, …, 6}. The pairwise comparison considered six environmental factors; hence, six was the maximum number of possible agreements.
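The agreement count a(X, Y) can be sketched in a few lines of Python. The attitude vectors below copy four rows of Table 6.1; because a(X, Y) only checks whether two types hold the same value in each position, the count does not depend on the order of the factor columns. The names `ATTITUDES` and `agreements` are illustrative, not part of the original work.

```python
from itertools import combinations

# Four rows of Table 6.1 (1 = high/positive, 0 = low/negative attitude),
# in the table's column order.
ATTITUDES = {
    "ESTP": (0, 1, 0, 0, 0, 0),
    "ISFP": (0, 0, 1, 1, 1, 0),
    "ESTJ": (1, 1, 0, 1, 1, 1),
    "ISTJ": (1, 1, 0, 1, 0, 1),
}


def agreements(x, y, attitudes=ATTITUDES):
    """a(X, Y): number of factors on which types X and Y hold the same attitude."""
    return sum(ax == ay for ax, ay in zip(attitudes[x], attitudes[y]))


if __name__ == "__main__":
    for x, y in combinations(sorted(ATTITUDES), 2):
        print(x, y, agreements(x, y))
```

For example, ESTJ and ISTJ differ only in the fifth column, so a(ESTJ, ISTJ) = 5.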

In this analysis, the maximum number of factors agreed upon by any two distinct personality types found in (Keirsey 1998) was actually five. Equation (6.3.1) estimates the probability of a link forming between personality types X and Y, where x = a(X, Y).

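$$p(X, Y) = 0.5 \cdot \left(1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma \sqrt{2}}\right)\right) \tag{6.3.1}$$

where $\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-t^{2}}\, dt$ is the Gauss error function.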

The values for μ and σ were determined empirically: several tables were produced using various values of μ and σ, and those tables were used with the algorithms described in Chapter 9. The parameter values μ ≈ 2.9747 and σ ≈ 1.8185 produced the most effective personality compatibility table. With these values, approximately 0.05 ≤ p(X, Y) ≤ 0.95 for all personality types X and Y, leaving a small but non-zero probability (about 0.05) of a link forming, and the same small probability of a link not forming, between any two personality types. The p(X, Y) values were recorded in the personality compatibility table; Table A.1 presents the resulting personality compatibility table.
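Equation (6.3.1) is the cumulative distribution function of a normal distribution with mean μ and standard deviation σ, evaluated at the agreement count. The minimal Python sketch below uses the fitted values reported above and assumes x = a(X, Y); the function name `link_probability` is illustrative.

```python
import math

MU = 2.9747     # fitted mean reported in the text
SIGMA = 1.8185  # fitted standard deviation reported in the text


def link_probability(agreement_count, mu=MU, sigma=SIGMA):
    """Equation (6.3.1): normal CDF of the agreement count a(X, Y)."""
    return 0.5 * (1.0 + math.erf((agreement_count - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    for a in range(7):  # a(X, Y) ranges over 0..6
        print(a, round(link_probability(a), 4))
```

Zero agreements map to about 0.051 and six agreements to about 0.952, which is where the approximate 0.05 and 0.95 bounds come from.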

6.5 Personality compatibility table development method discussion

The method described in this chapter involves a pair-wise comparison of attitudes towards a set of factors. It can be applied to any discrete personality model that enables inference of attitudes toward a set of group compatibility factors.

For personality models with continuous scales, it is possible to specify discrete personality types based on dominant traits; Chapter 13 will present a notional approach for discretizing continuous personality models. Of course, other methods of determining personality compatibility values are possible. Though MBTI was used in this research because of its wide application in practical settings, the social network generation algorithms developed in this work do not depend on a particular personality compatibility table or even on a particular personality model. The synthetic social network generation algorithm will operate with any reasonable and internally consistent compatibility table.


CHAPTER 7

AGENT-BASED MODELING

Agent-Based Modeling (ABM) can simulate interacting individuals or teams through rules that govern their behaviors. Prior research, conducted as part of an independent study project, involved modeling a network of teams using the NetLogo ABM integrated development environment. Building upon that experience, the research described in this chapter was conducted in NetLogo. Verification is determining whether a model is consistent with its specification, and validation is determining whether a model is an accurate representation of the system or phenomenon it models (Petty 2010). Based upon recommendations from the dissertation committee members, the research included both verification of the NetLogo network extension and validation of a synthetic network produced using NetLogo. The verification was done by comparing results calculated by Excel, Gephi, and NetLogo. The validation was performed by time series analysis of the realism of a synthetic social network generated using NetLogo; it addressed the research question of whether a closed-loop agent-based model can produce an increasingly realistic social network.

7.1 NetLogo simulation

Wilensky and his team at the Center for Connected Learning and Computer-Based Modeling at Northwestern University created the NetLogo ABM integrated development environment with the languages Scala and Java (Wilensky 1999). A worldwide community of NetLogo developers has contributed dozens of extensions and libraries as well as hundreds of models. The research described in this chapter utilized the network extension, named Nw, which was built upon the Java Universal Network/Graph Framework (JUNG) and JGraphT code libraries (Wilensky 2013).

7.2 Verification of the NetLogo network extension

Analysis of the graph shown in Figure 7.1 enabled verification of the NetLogo network extension.5 Tables in this section present manually counted values used in calculations for betweenness centrality, closeness centrality, and mean path length.

Figure 7.1 depicts a graph expressed by V(G) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} and E(G) = {(1, 2), (2, 3), (3, 4), (1, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9), (7, 10), (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 1), (13, 1)}.

Figure 7.1 Example network for validating the network extension

5 Dr. Peter J. Slater suggested the validation method and provided the graph.

The following subsections present manual calculations of the metrics and compare those metrics to results from NetLogo and Gephi. Results presented in the following sections show that the NetLogo network extension performs as expected.

7.2.1 Betweenness centrality

Calculating a node’s betweenness centrality involves counting the shortest paths, known as geodesics, between two nodes that pass through a node of interest, then dividing that count by the total number of geodesics between the two nodes. As an equation, this proportion is

$$b_{ij}(v_k) = \frac{g_{ij}(v_k)}{g_{ij}}$$

where the numerator is the number of geodesics between v_i and v_j that pass through v_k and the denominator is the total number of geodesics between v_i and v_j (Freeman 1977).6 The sum of these proportions over all pairs of nodes that exclude the node of interest is the betweenness centrality of that node. As an example, consider node 1 in Figure 7.1; Table 7.1 presents a summation of the betweenness centrality for node 1. The lower left triangle of Table 7.1 lists the number of geodesics between the nodes, and the upper right triangle identifies the proportions of those geodesics that pass through node 1. For node 1, NetLogo calculated a betweenness centrality of 28.17, which is the same as the manually calculated value at the bottom of Table 7.1.
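The manual summation can be cross-checked with a short Python sketch that runs a breadth-first search from every node, counting geodesics along the way, and then sums g_ij(v_k)/g_ij over all pairs that exclude v_k. A pair contributes only when v_k lies on one of its geodesics, and in that case g_ij(v_k) is the product of the geodesic counts from v_i to v_k and from v_k to v_j. This brute-force approach is an illustration, not the algorithm that NetLogo or Gephi uses internally; the edge list is transcribed from E(G) above.

```python
from collections import deque
from itertools import combinations

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9),
         (7, 10), (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 1), (13, 1)]


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Distance and number of geodesics from source to every node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w on a geodesic
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, k):
    """Sum of g_ij(v_k) / g_ij over unordered pairs {i, j} with i, j != k."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_k, sigma_k = info[k]
    total = 0.0
    for i, j in combinations([v for v in adj if v != k], 2):
        dist_i, sigma_i = info[i]
        if dist_i[k] + dist_k[j] == dist_i[j]:  # v_k lies on a geodesic i..j
            total += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return total


if __name__ == "__main__":
    print(round(betweenness(adjacency(EDGES), 1), 2))
```

For node 1 the exact sum is 169/6 ≈ 28.17, the value reported for NetLogo and Table 7.1.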

6 The betweenness equation in (Freeman 1977) used the variable p for points; using v for vertex reflects modern graph terminology.

Table 7.1 Summation of the betweenness centrality for node 1

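Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ∑ b_ij
1
2 | 1 |  | 0 | 0.50 | 0.50 | 0.50 | 0.67 | 0.67 | 0.67 | 0.67 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 9.17
3 | 2 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.67 | 1.00 | 1.00 | 1.00 | 1.00 | 4.67
4 | 1 | 2 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 0.50 | 1.00 | 4.50
5 | 1 | 2 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 1.00 | 1.00 | 1.00 | 3.50
6 | 1 | 2 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.67 | 1.00 | 2.17
7 | 2 | 3 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.67 | 0.67
8 | 2 | 3 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0.67 | 0.67
9 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0 | 0.67 | 0.67
10 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0 | 0.67 | 0.67
11 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0 | 0.50 | 0.50
12 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0 | 0.50 | 0.50
13 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0 | 0.50 | 0.50
14 | 2 | 2 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  | 0
15 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 1 |
Sum | 28.17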

Table 7.2 Manually calculated closeness metric

Path | Distance | Path | Distance
{7, ..., 1} | 4 | {1, ..., 2} | 1
{7, ..., 2} | 5 | {1, ..., 3} | 2
{7, ..., 3} | 4 | {1, ..., 4} | 1
{7, ..., 4} | 3 | {1, ..., 5} | 2
{7, ..., 5} | 2 | {1, ..., 6} | 3
{7, ..., 6} | 1 | {1, ..., 7} | 4
{7, ..., 8} | 1 | {1, ..., 8} | 5
{7, ..., 9} | 1 | {1, ..., 9} | 5
{7, ..., 10} | 1 | {1, ..., 10} | 5
{7, ..., 11} | 1 | {1, ..., 11} | 3
{7, ..., 12} | 2 | {1, ..., 12} | 2
{7, ..., 13} | 3 | {1, ..., 13} | 1
{7, ..., 14} | 4 | {1, ..., 14} | 2
{7, ..., 15} | 5 | {1, ..., 15} | 1
∑ d | 37 | ∑ d | 37
1/∑ d | 0.0270 | 1/∑ d | 0.0270
(n − 1)/∑ d | 0.3784 | (n − 1)/∑ d | 0.3784


7.2.2 Closeness centrality

Equation (7.2.2.1) calculates the closeness centrality of a node (Beauchamp 1965) (Freeman 1978). The summation in the denominator involves the geodesic distances between a specific node and all the other nodes; n is the number of nodes, and v represents the nodes.7

$$C_c(v_k) = \frac{n - 1}{\sum_{i=1}^{n} d(v_i, v_k)} \tag{7.2.2.1}$$

In Table 7.2, the path columns identify paths originating at nodes 7 and 1, and the distance columns show the number of links between the starting node and the destination node. The next-to-last row of Table 7.2 presents the reciprocal of the total shortest distance, 1/∑d, from each starting node to the other nodes in the network, and the last row reports the closeness centrality as defined in Equation (7.2.2.1). For nodes 1 and 7, NetLogo calculated a closeness centrality of 0.3784, which is the same as the manually calculated values in Table 7.2.
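A breadth-first search reproduces the distance sums in Table 7.2. The Python sketch below is an illustration under the assumption that the graph is connected (as Figure 7.1 is); the helper names are hypothetical, and the edge list is transcribed from E(G).

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9),
         (7, 10), (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 1), (13, 1)]


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: geodesic distance from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, k):
    """Equation (7.2.2.1): (n - 1) / sum of distances from v_k to all other nodes."""
    return (len(adj) - 1) / sum(distances_from(adj, k).values())


if __name__ == "__main__":
    adj = adjacency(EDGES)
    for node in (1, 7):
        print(node, sum(distances_from(adj, node).values()), round(closeness(adj, node), 4))
```

Both nodes have a distance sum of 37, so each closeness is 14/37 ≈ 0.3784.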

7.2.3 Eigenvector centrality

If a person is connected to people who have many connections, then that person might be influential. Eigenvector centrality measures influence by considering the connections of a node's neighbors. Bonacich expressed eigenvector centrality as $\lambda e_i = \sum_j R_{ij} e_j$, where e is an eigenvector of the matrix R and λ is the corresponding eigenvalue (Bonacich 1987). In Equation (7.2.3.1), A is an adjacency matrix, x_i is the eigenvector centrality of node v_i, and k is a constant equal to the largest eigenvalue of A (Newman 2018).

7 The equation in (Freeman 1978) used p for points; Equation (7.2.2.1) uses v for vertex, to reflect current graph terminology.

$$x_i = k^{-1} \sum_{j=1}^{n} A_{ij} x_j \tag{7.2.3.1}$$

A few graph analysis applications, such as Gephi, use the power iteration algorithm for calculating eigenvector centrality. This algorithm finds the largest eigenvalue and eigenvector pair by iteratively multiplying a matrix by a vector until a convergence condition is met (Sharma 2012). In Gephi, an analyst enters the number of iterations for estimating the eigenvector centrality (Cherven 2013).
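Power iteration for Equation (7.2.3.1) can be sketched in Python as follows. Two choices here are assumptions of the sketch, not documented Gephi or NetLogo behavior. First, the iteration multiplies by A + I instead of A: this leaves the eigenvectors unchanged but avoids the oscillation that plain power iteration exhibits on bipartite graphs, and the graph in Figure 7.1 is bipartite because all of its cycles have even length. Second, the result is scaled so that the largest score is 1, as in Table 7.3.

```python
EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9),
         (7, 10), (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 1), (13, 1)]


def eigenvector_centrality(edges, max_iterations=1000, tol=1e-12):
    """Power iteration for Equation (7.2.3.1), scaled so the largest score is 1."""
    nodes = sorted({v for edge in edges for v in edge})
    neighbours = {v: [] for v in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    x = {v: 1.0 for v in nodes}
    for _ in range(max_iterations):
        # Multiply by (A + I): same eigenvectors as A, but no oscillation on
        # bipartite graphs, where A has both +lambda and -lambda as eigenvalues.
        y = {v: x[v] + sum(x[w] for w in neighbours[v]) for v in nodes}
        peak = max(y.values())
        y = {v: value / peak for v, value in y.items()}
        if max(abs(y[v] - x[v]) for v in nodes) < tol:
            return y
        x = y
    return x


if __name__ == "__main__":
    for node, score in eigenvector_centrality(EDGES).items():
        print(node, round(score, 3))
```

Under these assumptions the sketch gives roughly 0.575 for node 2, 0.729 for node 4, and 0.447 for node 7, close to the NetLogo column of Table 7.3.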

Figure 7.2 presents a bar chart of the differences between the values presented in Table 7.3. The most significant differences include node 7, which has the highest degree, and nodes 8, 9, and 10, which have the lowest degree.

The sum of the geodesic distances between all ordered pairs of distinct nodes in a network, divided by the number of nodes multiplied by one less than the number of nodes, yields the mean path length. Equation (7.2.3.2) expresses mean path length, where n is the number of nodes and d_ij is the distance between the nodes v_i and v_j (Newman 2018).

Figure 7.2 Comparison of eigenvector centrality results from Gephi and NetLogo


Table 7.3 Comparison of eigenvector centrality

Node | Gephi | NetLogo
1 | 1.000 | 1.000
2 | 0.567 | 0.575
3 | 0.500 | 0.499
4 | 0.754 | 0.729
5 | 0.483 | 0.405
6 | 0.497 | 0.327
7 | 0.770 | 0.447
8 | 0.310 | 0.171
9 | 0.310 | 0.171
10 | 0.310 | 0.171
11 | 0.497 | 0.327
12 | 0.483 | 0.405
13 | 0.754 | 0.729
14 | 0.500 | 0.499
15 | 0.567 | 0.576

Table 7.4 Geodesics between all node pairs

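Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
1 | 0
2 | 1 | 0
3 | 2 | 1 | 0
4 | 1 | 2 | 1 | 0
5 | 2 | 3 | 2 | 1 | 0
6 | 3 | 4 | 3 | 2 | 1 | 0
7 | 4 | 5 | 4 | 3 | 2 | 1 | 0
8 | 5 | 6 | 5 | 4 | 3 | 2 | 1 | 0
9 | 5 | 6 | 5 | 4 | 3 | 2 | 1 | 2 | 0
10 | 5 | 6 | 5 | 4 | 3 | 2 | 1 | 2 | 2 | 0
11 | 3 | 4 | 5 | 4 | 3 | 2 | 1 | 2 | 2 | 2 | 0
12 | 2 | 3 | 4 | 3 | 4 | 3 | 2 | 3 | 3 | 3 | 1 | 0
13 | 1 | 2 | 3 | 2 | 3 | 4 | 3 | 4 | 4 | 4 | 2 | 1 | 0
14 | 2 | 3 | 4 | 3 | 4 | 5 | 4 | 5 | 5 | 5 | 3 | 2 | 1 | 0
15 | 1 | 2 | 3 | 2 | 3 | 4 | 5 | 6 | 6 | 6 | 4 | 3 | 2 | 1 | 0
Totals | 37 | 47 | 44 | 32 | 29 | 25 | 18 | 24 | 22 | 20 | 10 | 6 | 3 | 1 | 0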


$$\ell = \frac{1}{n(n-1)} \sum_{i \neq j} d_{ij} \tag{7.2.3.2}$$

Table 7.4 presents the geodesic distances between every pair of nodes in the network. The total distance over all pairs is 318. Dividing this total by the number of unordered node pairs, 15 × 14 / 2 = 105, gives 318/105 ≈ 3.029, which is the same value calculated by NetLogo.
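The total in Table 7.4 and the resulting mean path length can be reproduced with one breadth-first search per node. The Python sketch below averages over unordered pairs; because d_ij = d_ji in an undirected graph, this equals the ordered-pair average in Equation (7.2.3.2). The helper names are hypothetical, and the edge list is transcribed from E(G).

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (3, 4), (1, 4), (4, 5), (5, 6), (6, 7), (7, 8), (7, 9),
         (7, 10), (7, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 1), (13, 1)]


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: geodesic distance from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def mean_path_length(adj):
    """Return (total geodesic distance, mean over unordered node pairs)."""
    nodes = sorted(adj)
    total = 0
    for s in nodes:
        dist = distances_from(adj, s)
        total += sum(dist[t] for t in nodes if t > s)  # each unordered pair once
    pairs = len(nodes) * (len(nodes) - 1) // 2
    return total, total / pairs


if __name__ == "__main__":
    print(mean_path_length(adjacency(EDGES)))
```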

7.3 Demonstrating the convergence of metrics between random and real networks

The Erdős-Rényi G(n, p) algorithm generates networks for a given number of nodes n and a probability of link formation p. The network starts with no links; with each iteration, two nodes, known as a dyad, are randomly selected, and a randomly generated number between zero and one is compared to p. If the randomly generated number is less than or equal to p, then a link is formed between the dyad. As links are added, the metrics of the generated network approach those of a real-world exemplar network.
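The text describes repeatedly selecting random dyads. The Python sketch below instead visits every dyad exactly once in random order, which is the usual G(n, p) definition (each of the n(n − 1)/2 dyads is linked independently with probability p). It makes no claim about how nw:generate-random works internally. The sketch records the network density after each added link, the kind of trajectory plotted in Figure 7.3. The values n = 30 and p = 0.1 in the usage example are hypothetical, not the size and density of the Robins network.

```python
import itertools
import random


def gnp_link_sequence(n, p, rng):
    """Yield the links of a G(n, p) graph one at a time.

    Each dyad is examined exactly once, in random order. rng.random() is uniform
    on [0, 1), so `< p` links with probability exactly p; the text's `<= p`
    differs only on an event of probability zero.
    """
    dyads = list(itertools.combinations(range(n), 2))
    rng.shuffle(dyads)
    for u, v in dyads:
        if rng.random() < p:
            yield (u, v)


def density_trajectory(n, p, seed=None):
    """Network density recorded after each link is added."""
    rng = random.Random(seed)
    possible = n * (n - 1) / 2
    return [k / possible for k, _ in enumerate(gnp_link_sequence(n, p, rng), start=1)]


if __name__ == "__main__":
    print(density_trajectory(30, 0.1, seed=7)[-5:])
```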

The constant metric values in Figure 7.3 were calculated for the real-world exemplar network observed by Robins, who asked employees of a large Australian bank whom they considered a close friend. In Figure 7.3, the lines that converge toward the constant metrics were generated by a NetLogo model that called the Nw extension function nw:generate-random, which implements the G(n, p) algorithm. Parameter n was set to the number of people in the Robins Australian bank network, and p was set to that network's density. Metrics were calculated and recorded after each link was added to the network. The network density, average degree, degree standard deviation, global clustering coefficient, and mean path length of the real-world social network were compared with the same metrics from a simulator-synthesized social network. Plots in Figure 7.3 demonstrate the convergence of the synthetic network's metrics to the real-world values as links are iteratively added to the generated network.
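Four of the metrics compared in Figure 7.3 can be computed directly from an edge list, as in the Python sketch below (mean path length needs all-pairs distances and is omitted). The global clustering coefficient is computed with a common definition: closed triples divided by connected triples. Whether the degree standard deviation in this research is a population or sample statistic is not stated, so the sketch assumes the population form (`statistics.pstdev`). The function name `network_metrics` is hypothetical.

```python
import statistics
from itertools import combinations


def network_metrics(n, edges):
    """Average degree, degree standard deviation, density, and global clustering.

    Assumes nodes are numbered 0..n-1 and the edge list has no duplicates.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degrees = [len(adj[v]) for v in range(n)]
    closed = 0   # linked neighbour pairs; each triangle is counted at its 3 corners
    triples = 0  # connected triples (pairs of neighbours) centred on each node
    for v in range(n):
        triples += degrees[v] * (degrees[v] - 1) // 2
        closed += sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return {
        "average_degree": statistics.mean(degrees),
        "degree_std": statistics.pstdev(degrees),  # population form: an assumption
        "density": len(edges) / (n * (n - 1) / 2),
        "global_clustering": closed / triples if triples else 0.0,
    }


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant node 3 attached to node 2.
    print(network_metrics(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```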

Figure 7.3 Closed loop convergence between metrics of synthetic and real networks: (a) average degree, (b) standard deviation degree, (c) network density, (d) mean path length, (e) global clustering coefficient


7.4 Time series analysis

A time series analysis was conducted using the NetLogo model described in Section 7.3, in which the G(n, p) parameters n and p were set to the number of people in the Robins Australian bank exemplar network and to its density, respectively. Comparisons were conducted by applying the Spearman rank correlation, Mann-Kendall, and Theil-Sen time series statistical methods. Metrics for betweenness centrality, closeness centrality, and mean path length were calculated with an Excel spreadsheet and compared to results generated by the NetLogo network extension; eigenvector centralities were also generated and compared with results from the Gephi network analysis application.

7.4.1 Spearman rank correlation

Spearman rank correlation, also known as Spearman’s rho, involves determining the ranks of data from two data sets and generating a test statistic that indicates whether a monotone increasing or decreasing correlation exists between the two data sets. Equation (7.4.1.1) calculates the Spearman rank correlation coefficient (Brase 2015).

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \quad \text{where } d_i = R(x_i) - R(y_i) \tag{7.4.1.1}$$

If r_s = −1, the relationship between x and y is perfectly monotone decreasing.

If r_s = 0, there is no monotone relationship between x and y.

If r_s = 1, the relationship between x and y is perfectly monotone increasing.
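Equation (7.4.1.1) can be sketched in Python as follows. Tied values receive the mean of the positions they occupy, as in Table 7.5. With ties present, this formula approximates, rather than equals, the Pearson correlation of the ranks. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ordered positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_position = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_position
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Equation (7.4.1.1): r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


if __name__ == "__main__":
    print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

Feeding it the 34 network densities implied by Table 7.5 (each value repeated as many times as its ordered positions indicate) together with a constant real-world series of 0.2909 gives ∑d² = 3124.5 and r_s ≈ 0.5226, matching the table.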


Table 7.5 Determining the Spearman rank correlation

Network density | R(x_i) | Ordered position | Real-world network density | Rank of y_i | d = (x_i − y_i) | d²
0.333333 | 1 | 1 | 0.2909 | 17.5 | 16.5 | 272.25
0.357143 | 7 | 2, …, 12 | 0.2909 | 17.5 | 10.5 | 110.25
0.361111 | 16 | 13, …, 19 | 0.2909 | 17.5 | 1.5 | 2.25
0.380952 | 20.5 | 20, 21 | 0.2909 | 17.5 | -3 | 9
0.388889 | 23.5 | 22, …, 25 | 0.2909 | 17.5 | -6 | 36
0.392857 | 27 | 26, …, 28 | 0.2909 | 17.5 | -9.5 | 90.25
0.428571 | 29.5 | 29, 30 | 0.2909 | 17.5 | -12 | 144
0.666667 | 32 | 31, 32, 33 | 0.2909 | 17.5 | -14.5 | 210.25
1 | 34 | 34 | 0.2909 | 17.5 | -16.5 | 272.25
Mean rank of y_i = 17.5; ∑ d² = 3124.5
6 · ∑ d² = 18747; n · (n² − 1) = 39270; r_s = 0.522613

Table 7.5 presents the calculation of the Spearman rank correlation coefficient that compares the network density of a real-world social network with the observed network densities of synthesized social networks. The coefficient r_s = ρ = 0.523 indicates some monotone increasing correlation between the synthesized social network densities and the real-world social network density. A hypothesis test can determine whether there is a monotone increasing correlation between the data sets. The null hypothesis, H0: ρ = 0, means there is no correlation between the two data sets; the alternative hypothesis, H1: ρ > 0, means there is a monotone increasing correlation.

A look-up table provides the critical value of the test statistic for a specified significance level α. If the test statistic is greater than the critical value, then the null hypothesis can be rejected. What is the meaning of the P-value? According to Brase, the definition of P-value is “Assuming H0 is true, the probability that the test statistic will take on values as extreme as or more extreme than the observed test statistic (computed from sample data) is called the P-value of the test. The smaller the P-value computed from sample data, the stronger the evidence against H0” (Brase 2015). In other words, a small P-value indicates that a result at least as extreme as the one observed would be unlikely if H0 were true. According to Table B.19 in (Zar 2010), the critical value of r_s is 0.34 when α = 0.05 and n = 34.

Table 7.6 presents a summary of the Spearman rank correlation findings, based on values computed as in Table 7.5, where Equation (7.4.1.2) calculates ρ and n · (n² − 1) = 39,270. Considering that ρ = 0.523 > 0.34, the null hypothesis can be rejected, which indicates a monotonically increasing correlation between the synthesized social network metric and the real-world metric.

$$\rho = 1 - \frac{6 \sum d_i^2}{n \cdot (n^2 - 1)} \tag{7.4.1.2}$$

Table 7.6 Summary of Spearman rank correlation analysis

Metric | ∑ d_i² | 6 · ∑ d_i² | ρ | Null hypothesis
Network density | 3124.5 | 18747 | 0.522613 | Reject
Average degree | 3124.5 | 18747 | 0.522613 | Reject
Standard deviation degree | 3124.5 | 18747 | 0.522613 | Reject
Global clustering coefficient | 3116.5 | 18699 | 0.523835 | Reject
Mean path length | 3053 | 18318 | 0.533537 | Reject


7.4.2 Mann-Kendall

The monotonic trend test, Mann-Kendall, applies a sign function in the calculation of the test statistic S (Hollander 2014) (Gilbert 1987). Equation (7.4.2.1) defines the sign function and the calculation of the test statistic S.

$$\operatorname{sgn}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases} \tag{7.4.2.1}$$

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k)$$

Equation (7.4.2.2) estimates the variance of S for sample sizes greater than 40 (Gilbert 1987):

$$\operatorname{VAR}(S) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{i=1}^{g} t_i(t_i - 1)(2t_i + 5)\right] \tag{7.4.2.2}$$

where g is the number of groups of repeating (tied) observation values, t_i is the size of the i-th group, and n is the size of the sample. Equation (7.4.2.3) then calculates a z score:

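$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{VAR}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{VAR}(S)}} & \text{if } S < 0 \end{cases} \tag{7.4.2.3}$$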
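Equations (7.4.2.1) through (7.4.2.3) translate directly into Python, as in the sketch below. The sign function is written as the difference of two Boolean comparisons, and `statistics.NormalDist().cdf` plays the role of the Excel NORM.S.DIST lookup. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall_s(x):
    """Equation (7.4.2.1): sum of sgn(x_j - x_k) over all pairs k < j."""
    n = len(x)
    return sum((x[j] > x[k]) - (x[j] < x[k]) for k in range(n - 1) for j in range(k + 1, n))


def mann_kendall_var(x):
    """Equation (7.4.2.2), including the correction for groups of tied values."""
    n = len(x)
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1)
    return (n * (n - 1) * (2 * n + 5) - ties) / 18


def mann_kendall_z(s, var_s):
    """Equation (7.4.2.3): continuity-corrected z score."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


if __name__ == "__main__":
    z = mann_kendall_z(43, 4326)  # network density row of Table 7.7
    print(round(z, 4), round(NormalDist().cdf(z), 4))
```

With the network density row of Table 7.7 (S = 43, VAR(S) = 4326), the sketch gives Z ≈ 0.6386 and φ(Z) ≈ 0.738.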


Calculating S involves summing the results of the sign function applied to the differences of every pair of observations. The observations are the differences between the real-world metric and the simulation-synthesized social network metrics. Figure 7.4 depicts the calculation of the test statistic S.

With 34 observations, n(n − 1)(2n + 5) = 81,906. Among the 34 observations, there were eight groups of repeating values. After S and VAR(S) were calculated, a z score was computed. The Excel function NORM.S.DIST(W82,TRUE), which returns the standard normal cumulative distribution, replaced looking up the z score in a cumulative distribution function table.

Figure 7.4 Excel spreadsheets used in calculation of the Mann-Kendall test statistic


Table 7.7 Results of the Mann-Kendall monotonic trend test

Metric | S | VAR(S) | Z | φ(Z) | Null hypothesis
Average degree | 471 | 4318 | 7.152 | 1 | Reject
Standard deviation degree | 439 | 4320 | 6.663 | 1 | Reject
Network density | 43 | 4326 | 0.6386 | 0.7385 | Accept
Global clustering coefficient | 436 | 4310 | 6.626 | 1 | Reject
Mean path length | 193 | 4220 | 2.955 | 0.9984 | Reject

Table 7.7 presents the results of the Mann-Kendall monotonic trend test. As with the Spearman rank correlation, the null hypothesis was that there is no monotonic trend. Except for network density, the null hypothesis was rejected, which indicates that a monotonic trend exists.

7.4.3 Theil-Sen

The Theil-Sen nonparametric slope estimator involves the N = n(n − 1)/2 possible differences between observed values, each divided by the difference between the corresponding time series positions, as expressed for slope in Equation (7.4.3.1) (Gilbert 1987) (Hollander 2014).

$$Q = \frac{y_j - y_k}{t_j - t_k} \quad \text{for all } j > k \tag{7.4.3.1}$$

The slope estimates are then ordered from smallest to largest: Q_1 ≤ Q_2 ≤ ⋯ ≤ Q_N.

Equation (7.4.3.2) determines the median slope.

$$\hat{\beta}_1 = \begin{cases} Q_{[(N+1)/2]} & \text{if } N \text{ is odd} \\ \tfrac{1}{2}\left(Q_{[N/2]} + Q_{[(N+2)/2]}\right) & \text{if } N \text{ is even} \end{cases} \quad \text{where } N = n(n-1)/2 \tag{7.4.3.2}$$


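The upper and lower limits of a confidence interval are the slope estimates at the ordered positions specified by Equations (7.4.3.3) through (7.4.3.7).

$$U = 1 + \frac{N + C_\alpha}{2} \tag{7.4.3.3}$$

$$C_\alpha = Z_{1-\alpha/2} \sqrt{\operatorname{Var}(S)} \tag{7.4.3.4}$$

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18} \tag{7.4.3.5}$$

$$L = \frac{N - C_\alpha}{2} \tag{7.4.3.6}$$

$$LL \le \beta_1 \le UL \tag{7.4.3.7}$$

where LL and UL are the slope estimates at ordered positions L and U, respectively.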
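A minimal Python sketch of the Theil-Sen estimate and its rank-based interval follows. It assumes equally spaced time positions 1..n when none are given and omits the tie correction to Var(S). Equations (7.4.3.3) and (7.4.3.6) can produce fractional positions; the sketch rounds them to whole positions and clamps them to 1..N, which is a simplifying assumption rather than a prescribed rule. The function name `theil_sen` is hypothetical.

```python
import math
from statistics import NormalDist, median


def theil_sen(y, t=None, alpha=0.05):
    """Theil-Sen slope (Eqs. 7.4.3.1-7.4.3.2) with a rank-based confidence interval."""
    n = len(y)
    if t is None:
        t = list(range(1, n + 1))  # assume equally spaced time steps 1..n
    # Eq. 7.4.3.1: every pairwise slope Q, ordered Q_1 <= ... <= Q_N.
    slopes = sorted((y[j] - y[k]) / (t[j] - t[k])
                    for k in range(n) for j in range(k + 1, n))
    big_n = len(slopes)  # N = n(n - 1)/2
    beta = median(slopes)  # Eq. 7.4.3.2: middle value, or mean of the two middle values
    var_s = n * (n - 1) * (2 * n + 5) / 18  # Eq. 7.4.3.5 (no tie correction)
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_s)  # Eq. 7.4.3.4
    # Eqs. 7.4.3.6 and 7.4.3.3 give 1-based positions L and U; rounding them to
    # whole positions and clamping to 1..N is a simplifying assumption here.
    lower = min(max(round((big_n - c_alpha) / 2), 1), big_n)
    upper = min(max(round((big_n + c_alpha) / 2 + 1), 1), big_n)
    return beta, slopes[lower - 1], slopes[upper - 1]


if __name__ == "__main__":
    print(theil_sen([1, 3, 2, 5], t=[0, 1, 2, 3]))
```

For the four-point example the six slopes are −1, 0.5, 1, 4/3, 2, and 3, so N is even and the median slope is (1 + 4/3)/2 = 7/6.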

Figure 7.5 Excel spreadsheet tables of Theil-Sen slopes


Figure 7.5 depicts spreadsheets named Numerators, Denominators, and Slopes. The values are the differences between metrics from a synthesized social network and a real-world network. The spreadsheets calculate the values for the Theil-Sen slopes presented in Figure 7.6.

Figure 7.6 Theil-Sen slopes: (a) average degree, (b) standard deviation degree, (c) network density, (d) mean path length, (e) global clustering coefficient


The slopes presented in Figure 7.6 include a central trend line and upper and lower confidence-bound trend lines, which were selected from an ordered list of the slopes using the upper and lower bound calculations. The average degree and the global clustering coefficient have the steepest slopes, indicating that those values converged with the real-world metrics at a faster rate. The mean path length and the network density have the two shallowest slopes, indicating slower convergence.

Consistent with the Mann-Kendall test results, the network density had a zero slope, meaning that there is no monotonic trend. The network density is the number of actual links divided by the number of potential links. A complete network would have a density of one.

7.5 Network formation via the Iterated Prisoners’ Dilemma

A NetLogo model was developed to select Iterated Prisoners’ Dilemma (IPD) strategies based on MBTI and Keirsey temperaments. The implemented IPD strategies include Always Cooperate, Always Defect, Tit for Tat, Pavlov, Grudge, and Random (Axelrod 1984) (Beaufils 1997) (Mittal 2009) (Nowak 1993). The first two strategies are self-explanatory; an agent following either of them always makes the same decision.

The implementation used records of previous agent interactions to support the Tit for Tat, Pavlov, and Grudge strategies. An agent following the Tit for Tat strategy considers the decision made by a particular agent during their last interaction and makes that same decision in the current interaction. Pavlov is a win-stay, lose-shift strategy: if the decisions of the two agents matched during their last interaction, the agent makes the same decision during the current interaction; if the decisions differed, the agent makes a different decision than the one it made during the last interaction. The Grudge strategy cooperates until the other agent defects; after that, the agent never cooperates with that particular agent again.

A simplified approach to personality compatibility was based on (Varner 2007), wherein two MBTI personality types are compatible if their second letters are the same and their last two letters are different, as in this list:

• _STJ are compatible with _SFP

• _SFJ are compatible with _STP

• _NTJ are compatible with _NFP

• _NFJ are compatible with _NTP

In (Keirsey 1998), the 16 MBTI personality types are mapped to four temperaments: Artisan, Guardian, Idealist, and Rational. Applying these temperaments, the observations in (Varner 2007) could be expressed as follows: Artisans are compatible with Guardians, and Idealists are compatible with Rationals.

Research indicates a relationship between programming performance and the MBTI factors. Correlations between MBTI and performance were analyzed for sample populations of 37 and 114 introductory computer programming students at two universities. According to Katira et al., "The results revealed that sensing and judging students performed better than intuitive and perceptive students respectively on programming assignments." (Katira 2004). Research on pair programming conducted by Katira et al. and Williams et al. found empirical evidence to support the adage that opposites attract: pair programming teams composed of different personality types tended to be compatible (Katira 2004) (Williams 2006). If Judging students perform better at programming than Perceiving students, then it makes sense to pair Judging students with Perceiving students. From this perspective, the compatible personality types identified in (Varner 2007) are consistent with the compatible pair programming team evidence presented in (Katira 2004) (Williams 2006).

In the development of a NetLogo model for this research, each agent was assigned an MBTI personality type in accordance with the national statistics of the constituent factors as presented in Table 2.3. Each agent was assigned a Keirsey temperament as follows:

• Idealist if the MBTI personality type was in the set {INFJ, INFP, ENFJ, ENFP}

• Guardian if the MBTI personality type was in the set {ISTJ, ISFJ, ESTJ, ESFJ}

• Artisan if the MBTI personality type was in the set {ISTP, ISFP, ESTP, ESFP}

• Rational if the MBTI personality type was in the set {INTJ, INTP, ENTJ, ENTP}
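The four sets above follow a letter pattern (NF, NT, SJ, SP), so the mapping can be written without a lookup table. This Python sketch is illustrative; the function name is hypothetical.

```python
def keirsey_temperament(mbti: str) -> str:
    """Map an MBTI type to the Keirsey temperament sets listed above."""
    t = mbti.upper()
    if t[1] == "N":  # intuitives: NF -> Idealist, NT -> Rational
        return "Idealist" if t[2] == "F" else "Rational"
    return "Guardian" if t[3] == "J" else "Artisan"  # sensors: SJ / SP
```

Because Guardians are the SJ types and Artisans the SP types, the simplified compatibility rule (same S/N letter, different T/F and J/P letters) always pairs a Guardian with an Artisan, or an Idealist with a Rational.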

The IPD strategies were assigned to the temperaments based upon Keirsey’s descriptions of the temperaments. Idealists and Guardians want to be helpful and loyal. Rationals are logical, and loyalty may not always suit their logic. Artisans are free spirits.

This list identifies the allocation of the IPD strategies to the temperaments.

• Always Cooperate was assigned to Guardians

• Always Defect was assigned to Artisans

• Pavlov was assigned to Guardians and Idealists

• Tit for Tat was assigned to Idealists and Rationals

• Grudge was assigned to Artisans

• Random was assigned to Rationals

Note that each temperament was assigned two strategies, and a random number determined which of the two strategies each agent would use. The selection was weighted toward one of the two strategies rather than being a 50% chance; for example, a Guardian agent selected Always Cooperate with a probability of 80%. Strategy selection occurs during initialization, and agents use that same strategy in all interactions. User inputs to the NetLogo model include the number of activities, the minimum number per activity, and the number of people, which determine interactions in nested loops. The outer loop repeats for the number of activities, and the inner loop repeats a random number of times, between the specified minimum number per activity and the specified number of people.


Figure 7.7 Screenshot of the personality-based IPD strategy selection NetLogo model

Figure 7.7 depicts the user interface of the personality-based IPD strategy selection NetLogo model. Because the Grudge and Always Defect strategies prohibit link formation, a compatibility-sway slider bar was added to the user interface. During an interaction, if one of the agents defected, the code checked the compatibility of the two MBTI personality types. If the two types were compatible, the code generated a random number and compared it to the value on the compatibility-sway slider bar. If the random number was less than the compatibility-sway value, then a link formed between the two agents. Once a link had formed between two agents, it remained even if a later interaction did not produce a link. If multiple interactions resulted in link formation, the weight of the existing link was increased. Interactions among agents were conducted in a round-robin tournament, such that every agent had a turn interacting with every other agent; the simulation concluded at the end of the round-robin tournament.

7.6 Network formation using a personality compatibility table

Chapter 6 described the development of the MBTI personality compatibility table presented in Table A.1. This section explains how that table was used in NetLogo models that implemented a Monte Carlo search for optimum assignments of MBTI personality types to network nodes. Three NetLogo models were developed to determine the metrics of real-world exemplar networks, calculate averages of metrics for 30 randomly generated networks, and to calculate metrics for 30 networks formed using probabilities from a personality compatibility table and a Monte Carlo search for optimum assignments.

NetLogo includes a tool named BehaviorSpace, which provides the capability to create experiments with specified input data. Metrics for the real-world exemplar networks are stored in the BehaviorSpace experiments. Absolute differences between the averaged metrics from 30 randomly generated networks and the exemplar network metrics are also stored in the experiments; these stored values will be referred to as array R later in this section. Experiments were named after the real-world exemplar networks identified in Table 5.1. Input data included values for 17 of the metrics identified in Table 2.2.⁸ NetLogo has a headless mode, which allows it to be executed without the graphical user interface. This headless mode enables the development of shell scripts that execute BehaviorSpace experiments within a NetLogo model. A Bourne Again Shell (BASH) script was developed for each of the BehaviorSpace experiments. Within a computing cluster located at the Alabama Supercomputer Center, a resource management system schedules these BASH scripts to start jobs that run a NetLogo model using the real-world metrics and the averaged metrics from 30 randomly generated networks.

Within the NetLogo model, a set of MBTI personality types was assigned to agents representing non-connected nodes. Each MBTI personality type was generated in accordance with the empirical distribution specified by Table 2.3. After the MBTI personality types were assigned, a main loop generated 30 networks using the network density of a real-world exemplar network and the likelihoods of link formation from the personality compatibility table (Table A.1). Metrics for the 30 networks were averaged, and the absolute differences between the mean metrics and the real-world exemplar network’s metrics were calculated and stored in an array named S (for synthetic).

Values in array S are compared element by element with values in array R. If array S contains lower values than array R for more metrics than the reverse, then the MBTI personality type assignment and the associated averaged network metrics are stored in a file.
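The element-by-element comparison of arrays S and R amounts to a win/loss/draw tally. In this Python sketch, the function names are hypothetical, and reading the storage rule as "wins outnumber losses" is an interpretation of the sentence above.

```python
def compare_metrics(S, R):
    """Element-wise comparison of absolute-difference arrays S and R.

    A win is a metric where the synthetic networks are closer to the exemplar
    (smaller value in S); a loss is the reverse; equal values are draws.
    """
    wins = sum(s < r for s, r in zip(S, R))
    losses = sum(s > r for s, r in zip(S, R))
    return wins, losses, len(S) - wins - losses


def should_store(S, R):
    # Interpretation of the storage rule: store when wins outnumber losses.
    wins, losses, _ = compare_metrics(S, R)
    return wins > losses
```

The same win/loss/draw tally is what the summary columns of Table 7.8 report per exemplar network.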

The Alabama Supercomputer Center operates clusters of computers, and the Simple Linux Utility for Resource Management (SLURM) system schedules jobs on available processors within the clusters. Jobs were specified in BASH scripts, which started NetLogo in headless mode with parameters that specified a model and the BehaviorSpace experiment. The NetLogo models ran until they found an assignment of MBTI personality types that produced networks winning on most of the metrics, or until the job’s allotted time in the queue expired. In Appendix B, Tables B.1 through B.13 present the detailed results. Column T in those tables presents the metrics for a real-world exemplar network.

⁸ These NetLogo models did not include the metrics named Number of communities, Gini coefficient, and minimum eigencentrality; those metrics were included in the research described in Chapter 10.

Column Sx̄ presents the averaged metrics from 30 synthesized networks. Column |T − Sx̄| presents the absolute differences between the averaged synthesized network metrics and the metrics of the real-world exemplar network. Column Rx̄ presents the averaged metrics from 30 randomly generated networks. Column |T − Rx̄| presents the absolute differences between the averaged metrics from the 30 randomly generated networks and the metrics of the real-world exemplar network. Cells in the |T − Sx̄| column with a green background indicate that the synthesized networks performed better than the randomly generated networks. If the cell has an orange background, then the randomly generated networks performed better for that metric. Cells with a white background indicate that the synthesized and randomly generated networks performed the same.

Table 7.8 presents a summary of the results from Tables B.1 through B.13.

The three columns indicate the number of wins, losses, and draws. Wins indicate the number of metrics for which the synthesized personality-based social networks were closer to the metrics of the real-world exemplar networks than the randomly generated networks were.

Table 7.8 Summary table for ABM & MC vs. random network generation

Social Network Exemplars                Win  Loss  Draw
Robins Australian Bank                   11     4     2
Roethlisberger & Dickson Wire Room        7     9     1
Thurman Office                           11     4     2
Sampson Monastery                         5    10     2
Krackhardt Office CSS                     6    10     1
Krackhardt High-Tech Managers             8     8     1
Schwimmer Taro Exchange                   7     9     1
Webster Accounting Firm                   8     7     2
Zachary Karate Club                       9     7     1
Bernard & Killworth Technical            13     2     2
Bernard & Killworth Office network        9     5     3
Krebs Fortune 500 IT Dept. (Advice)       8     8     1
Krebs Fortune 500 IT Dept. (Business)     8     8     1
Lazega Law Firm                           9     6     2
Metrics totals                          119    97    22
Exemplar totals                           7     4     3
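The two totals rows of Table 7.8 can be checked mechanically. In this Python sketch, the counts are transcribed from the table; the rule used for the exemplar totals (an exemplar is a win when its wins exceed its losses) is an inference that reproduces the 7/4/3 row rather than a rule stated in the text.

```python
# Win/loss/draw counts per exemplar, transcribed from Table 7.8.
ROWS = {
    "Robins Australian Bank": (11, 4, 2),
    "Roethlisberger & Dickson Wire Room": (7, 9, 1),
    "Thurman Office": (11, 4, 2),
    "Sampson Monastery": (5, 10, 2),
    "Krackhardt Office CSS": (6, 10, 1),
    "Krackhardt High-Tech Managers": (8, 8, 1),
    "Schwimmer Taro Exchange": (7, 9, 1),
    "Webster Accounting Firm": (8, 7, 2),
    "Zachary Karate Club": (9, 7, 1),
    "Bernard & Killworth Technical": (13, 2, 2),
    "Bernard & Killworth Office network": (9, 5, 3),
    "Krebs Fortune 500 IT Dept. (Advice)": (8, 8, 1),
    "Krebs Fortune 500 IT Dept. (Business)": (8, 8, 1),
    "Lazega Law Firm": (9, 6, 2),
}

# Metric totals: column sums over all exemplars.
metric_totals = tuple(sum(col) for col in zip(*ROWS.values()))

# Exemplar totals: win when wins exceed losses, loss when losses exceed wins,
# draw otherwise (inferred rule).
exemplar_totals = (
    sum(w > l for w, l, _ in ROWS.values()),
    sum(w < l for w, l, _ in ROWS.values()),
    sum(w == l for w, l, _ in ROWS.values()),
)
```

Each row also sums to 17, the number of metrics compared in these NetLogo experiments, so the table covers 14 × 17 = 238 metric comparisons.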

7.7 Discussion of results from the NetLogo experiments

Figure 7.4 and Figure 7.6 present evidence that a closed-loop simulation of an incrementally updated social network implemented in an agent-based model can lead to an increasingly realistic network. Research questions 3.1 and 3.2 asked, respectively, which events should modify the social network in an ABM and how an ABM ought to use a personality compatibility table. A lesson from this research was that matching the network density goes a long way toward realism. Comparing the overall network density of a synthesized social network with the density of an exemplar network provides a stopping criterion; i.e., continue adding links until the two network densities are equal. Matching the network densities as a stopping criterion answers research question 3.3. Regarding research question 3.2, the NetLogo model used the personality compatibility table to find link formation likelihood values for pairs of personality types. Pairs of nodes were selected at random, and their assigned MBTI personality types determined a row and column in the personality compatibility table. A randomly generated number between zero and one was compared to the link formation likelihood value for the node pair’s MBTI personality types. If the random number was less than or equal to the link formation likelihood value, then a link between the two nodes was created.

Another application of the personality compatibility table could involve assigning weights to existing links in a network, which could affect interactions among nodes. In previous work, a NetLogo model was developed wherein links were determined by a Design Structure Matrix (DSM). A DSM is a square table that identifies interfaces among subsystems within a system. Agents in that NetLogo model represented interacting teams that were developing subsystems (O’Neil 2013). That model could benefit from a personality compatibility table for assigning weights to the links in the network. Those weights could be compared to randomly generated numbers to determine the outcomes of negotiations among the networked agents.

Results presented in Table 7.8 indicate that the Monte Carlo search for optimum MBTI personality type assignments, and the networks synthesized from those assignments, perform well when compared to randomly generated networks: across the 238 metric comparisons, the synthesized networks recorded 119 wins, 97 losses, and 22 draws, and they won on 7 of the 14 exemplars while losing on 4. NetLogo is well suited to visualizing interactions among agents and generating basic plots. Excel was used for the time series analysis and for producing the plots in this chapter. Experience gained from this research revealed a need to conduct further research in a statistics-oriented programming language. The following chapters build upon the lessons learned from this experience by describing development activities to implement algorithms in R.


CHAPTER 8

STOCHASTIC BLOCK MODELING

Section 4.2.4 described Stochastic Block Modeling (SBM) and Chapter 6 described the method for constructing an MBTI personality type compatibility table.

This chapter explains how to customize the compatibility table so that it can be used as a look-up table known as a preference matrix by an SBM function.

Inputs to an SBM function implemented in the R igraph package include n, the number of nodes; a preference matrix, which is a look-up table of link formation probabilities among communities, known as blocks; and block sizes, an array of numbers that specifies the number of nodes within each community (Csárdi 2006). Notice that these inputs do not include a network density or a degree distribution. The preference matrix encodes the network density by specifying the probability of link formation among the blocks. Probabilities on the main diagonal of the preference matrix represent the likelihood of a link forming within a block, and probabilities elsewhere in the preference matrix indicate the likelihood of link formation between two different blocks. For each potential link, the SBM function conducts a Bernoulli trial using the values in the preference matrix (Wasserman 1992) (Csárdi 2006).

How can the preference matrix encode the network density? When conducting Bernoulli trials for potential links within the same block, the probability is found in the cell on the main diagonal that corresponds to that block number. When conducting Bernoulli trials for potential links between two different blocks, the probability is found in the cell at the intersection of the row identified by one block number and the column identified by the other block number. For example, for a pair of nodes in blocks two and three, the probability of link formation would be found in cell (2, 3). For an undirected network, the preference matrix is symmetric, so the same probability would be found in (3, 2). Because every potential link is one Bernoulli trial with the probability from its cell, the expected number of links, and therefore the expected network density, is fixed by the block sizes together with the preference matrix.

Determining the block sizes involved the development of an R function, named makeMBTI, which generates an MBTI personality type in accordance with the empirical distribution defined by Table 2.3. This function was called n times where n was the number of nodes in a real-world exemplar network. A block size array was produced by counting each personality type, and the blocks were ordered in accordance with the columns of Table A.1.
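The block-size computation can be sketched as follows. This Python sketch is a hypothetical stand-in for makeMBTI: it draws each MBTI letter independently, and both the letter probabilities and the block order are placeholders to be replaced by the Table 2.3 values and the Table A.1 column order.

```python
import random
from collections import Counter

# Placeholder probabilities for the first letter of each dichotomy; substitute
# the Table 2.3 values. Letters are drawn independently (an assumption).
LETTER_PROBS = [("E", "I", 0.5), ("S", "N", 0.7), ("T", "F", 0.4), ("J", "P", 0.55)]

# Placeholder block order; the dissertation orders blocks by Table A.1's columns.
TYPE_ORDER = [a + b + c + d for a in "IE" for b in "SN" for c in "TF" for d in "JP"]


def make_mbti(rng):
    """Draw one MBTI type letter by letter (hypothetical stand-in for makeMBTI)."""
    return "".join(x if rng.random() < p else y for x, y, p in LETTER_PROBS)


def block_sizes(n, rng):
    """Count n sampled types into a 16-element block-size list."""
    counts = Counter(make_mbti(rng) for _ in range(n))
    return [counts[t] for t in TYPE_ORDER]
```

A `Counter` returns 0 for types that were never drawn, so every block appears in the output even when it is empty.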

Visual Basic for Applications (VBA) macros, in an Excel workbook, were developed to modify Table A.1 to produce preference matrices for the igraph sbm_game function. Preference matrices in Appendix C, presented in Tables C.1 through C.14, were produced to synthesize social networks similar to the real-world exemplar networks listed in Table 5.1. An additional column in those tables presents the block sizes that were used in the production of the preference matrices.

A VBA function named generate_estimated_degree_table wrote formulas into a spreadsheet that referenced block size values and values from the compatibility table.

Equation (8.1) defines the macro-generated formulas in the spreadsheet, where B is a column of block sizes and C is the compatibility table. The generated formulas referenced spreadsheet cells that contained the block sizes and the compatibility table; thus, the formulas produced a new table of the estimated number of links for each cell of the compatibility table.

$$
\sum_{i=1}^{16}\sum_{j=1}^{16} f(i,j), \qquad
f(i,j) =
\begin{cases}
\tfrac{1}{2}\,B_i\,(B_i - 1)\,C_{ij}, & i = j \\
\tfrac{1}{2}\,B_i\,B_j\,C_{ij}, & i \neq j
\end{cases}
\tag{8.1}
$$
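Equation (8.1) translates directly into a double loop. This Python sketch mirrors the spreadsheet formulas under the assumption that C is a symmetric nested list in the same block order as B; the function name is hypothetical.

```python
def estimated_links(B, C):
    """Equation (8.1): expected number of links for block sizes B and a
    symmetric preference matrix C (nested lists in the same block order)."""
    k = len(B)
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                # Unordered pairs inside block i, each linking with prob. C[i][i].
                total += 0.5 * B[i] * (B[i] - 1) * C[i][j]
            else:
                # Half of the B_i * B_j pairs; cell (j, i) contributes the other half.
                total += 0.5 * B[i] * B[j] * C[i][j]
    return total
```

As a sanity check, when every probability is 1 the result equals the number of node pairs, n(n − 1)/2; dividing the estimate by that quantity gives the expected network density.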

Figure 8.1 depicts the preference matrix generator spreadsheet. Spreadsheet cells A4 through A19 contain the block sizes, which provide values for the variables Bi and Bj in the macro-generated formulas. Spreadsheet cells C3 through R19 contain the generated preference matrix, which provides the values for the variable Cij in the macro-generated formulas. Spreadsheet cells C23 through R38 contain the formulas produced by the generate_estimated_degree_table macro; values in those cells were calculated by the formulas using inputs from the block sizes and preference matrix. Buttons located in spreadsheet cells T24 and T26 enable increasing or decreasing values in the preference matrix.

Figure 8.1 Screenshot of the preference matrix generator

Clicking the Increase button calls a VBA macro, which multiplies values in the cells located in the upper right triangle of the preference matrix by a value set in the spreadsheet cell U23. Formulas in the cells of the lower left triangle of the preference matrix copy the values from the upper right triangle so that the preference matrix is symmetric. A click of the Decrease button calls another VBA macro that divides the values in the upper right triangle by the value set in cell U23.

The spreadsheet cell T29 contains a formula that sums the values in cells C23 through R38 of the estimated degree table to produce an estimated number of links.

Interactively changing the value in cell U23 and clicking the buttons modifies the preference matrix, which updates the estimated number of links. When the estimated number of links matches the target number from a real-world exemplar network, the preference matrix is ready for use by the SBM function.
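The Increase/Decrease workflow can be automated. This Python sketch is an illustration, not the VBA macros: it scales every entry of the matrix (assuming the diagonal is scaled along with the upper triangle and mirrored), computes the factor directly from the target because Equation (8.1) is linear in C, and iterates only because probabilities are capped at 1. The function names are hypothetical.

```python
def estimated_links(B, C):
    # Equation (8.1) in compact form.
    return sum(
        0.5 * B[i] * ((B[i] - 1) if i == j else B[j]) * C[i][j]
        for i in range(len(B))
        for j in range(len(B))
    )


def fit_preference_matrix(B, C, target_links, tol=0.5, max_iter=50):
    """Scale every entry of C until Equation (8.1) is within tol of target.

    Probabilities are capped at 1, so a single rescale may fall short; the
    loop repeats until the estimate is close enough or max_iter is reached.
    """
    P = [list(row) for row in C]
    for _ in range(max_iter):
        current = estimated_links(B, P)
        if current == 0 or abs(current - target_links) <= tol:
            break
        factor = target_links / current
        P = [[min(1.0, p * factor) for p in row] for row in P]
    return P
```

Scaling every cell by the same factor preserves the relative compatibilities encoded in the table while matching the exemplar's link count; a target above the all-ones maximum cannot be reached, and the sketch then returns a matrix of ones.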

Network density in SBM-generated networks is inherent in the block sizes and preference matrix. Using this feature of SBM, the Excel spreadsheet described in this chapter calculates the number of links for a given set of block sizes and the compatibility table. Based on a comparison of the calculated number of links with a goal number determined from an exemplar network, Excel macros adjust the compatibility table. This technique of preference matrix generation could be applied to any reasonable, internally consistent compatibility table. The next chapter presents results from an SBM function using preference matrices produced with the method described in this chapter.


CHAPTER 9

RANDOMIZED METHODS FOR ASSIGNMENT SEARCH ALGORITHMS

This chapter introduces a new random network generator that uses an assignment of personality types and a personality compatibility table. Two sections of this chapter describe search algorithms, one based on the Monte Carlo method and one a genetic algorithm, that find an optimum assignment of personality types, which becomes an input to the personality-based network generator. Chapter 8 described a method for customizing the MBTI personality compatibility table to produce preference matrices for the Stochastic Block Model. This chapter presents realism comparison results from real-world exemplar networks, networks generated with the Erdős–Rényi random graph generator, networks generated from the personality-based network generator using personality assignments found by the search algorithms, and networks generated by the Stochastic Block Model function.

9.1 The personality-based network synthesis algorithm

A personality assignment refers to a list of personality types for a given number of nodes in a network. In a personality compatibility table, rows and columns represent personality types, and the intersecting cells contain link formation probabilities between 0 and 1. The personality-based network synthesis algorithm is a variation of the Erdős–Rényi G(n, p) algorithm and is referred to as the G(n, A, C) algorithm. Input parameters include the number of nodes n, the personality assignment A, and the personality compatibility table C.


The G(n, A, C) algorithm is as follows:

1. Initialize node set V = {v1, v2, …, vn} and link set E = ∅.

2. For each pair of nodes vi, vj ∈ V, 1 ≤ i < j ≤ n, randomly add link {vi, vj} to E with probability p(X, Y), where X and Y are the personality types assigned to nodes vi and vj, respectively, in personality assignment A, and p(X, Y) is the probability of link formation between X and Y from the personality compatibility table C.
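The two steps of the G(n, A, C) algorithm fit in a few lines of Python. This is a minimal sketch, not the R implementation: the function name is hypothetical, nodes are 0-based, and C is assumed to be a nested dictionary keyed by personality type.

```python
import random
from itertools import combinations


def g_n_a_c(A, C, rng):
    """G(n, A, C): one Bernoulli trial per node pair with probability
    C[A[i]][A[j]]; n is implied by len(A). Nodes are 0-based here."""
    return [
        (i, j)
        for i, j in combinations(range(len(A)), 2)
        if rng.random() < C[A[i]][A[j]]
    ]
```

Calling the function repeatedly with the same A and C, but different random draws, yields the family of related synthetic networks discussed next.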

Given an assignment A of personality types to nodes and a compatibility table C, as many synthetic social networks as needed can be generated using the G(n, A, C) algorithm. They will likely differ due to the randomness in the algorithm, but they will be related in that all were produced using the same assignment A and compatibility table C. The challenge addressed by this research is to find a personality type assignment A which, when used with the G(n, A, C) algorithm and personality compatibility table C, will produce realistic synthetic social networks. A personality type assignment that produces realistic synthetic social networks will be referred to in this context as effective.

Two algorithms for finding effective personality type assignments, Monte Carlo and Genetic, were developed and tested in this research. Both may be thought of as search algorithms, in that they search for an effective personality type assignment, albeit by somewhat different methods. A third, previously existing algorithm, the Stochastic Block Model (SBM), was adapted to use personality compatibility information. The algorithms are initially given a real-world social network T, which serves as an exemplar of the class of social networks to be generated.


1. MT ← network metrics for exemplar network T
2. for j ← 1 to 40
3.     Rj ← random social network generated using n, p, and G(n, p) algorithm
4.     Mj ← network metrics for Rj
5. end for
6. M̄R ← means of network metrics Mj for j = 1, 2, ..., 40

Figure 9.1. Pseudocode for random social network generation algorithm

The algorithms are compared, in terms of the realism of the synthetic social networks they generate, with randomly generated social networks and with each other.

Given an exemplar social network T, the number of nodes n, the number of links m, and the network density p = m / (n ∙ (n – 1) / 2) are calculated. These three values from the exemplar network are used to parameterize the random, Monte Carlo, and Genetic algorithms. Figure 9.1 presents the pseudocode for randomly generating social networks.

It uses the G(n, p) algorithm. In this pseudocode, as well as the pseudocode shown later for the Monte Carlo and Genetic algorithms, the various M variables are not scalars but vectors with 18 elements, one for each of 18 of the 20 metrics⁹ listed in Table 2.2. Note that the ostensibly random social networks generated by this process have some inherent realism with respect to the exemplar network T, in that the number of nodes n and network density p from T are used by the G(n, p) algorithm. This makes the generated random social networks more likely to be similar to T than a completely random network would be.
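The random baseline of Figure 9.1 can be sketched in Python as follows. This is not the R code: the two-metric `metrics` function is a hypothetical stand-in for the 18 metrics of Table 2.2, and the function names and seed are illustrative.

```python
import random
from itertools import combinations
from statistics import mean


def gnp(n, p, rng):
    """Erdős–Rényi G(n, p): each of the n(n-1)/2 pairs links with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def metrics(n, links):
    # Hypothetical two-metric stand-in for the 18 metrics of Table 2.2.
    m = len(links)
    return [m / (n * (n - 1) / 2), 2 * m / n]  # density, average degree


def random_baseline(n, m, samples=40, seed=0):
    """Figure 9.1: mean metric vector over `samples` G(n, p) networks."""
    rng = random.Random(seed)
    p = m / (n * (n - 1) / 2)  # exemplar density
    per_network = [metrics(n, gnp(n, p, rng)) for _ in range(samples)]
    return [mean(col) for col in zip(*per_network)]  # element-wise means
```

Because p comes from the exemplar, the baseline's mean density lands close to the exemplar's, which is the inherent realism noted above.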

⁹ Two metrics, the Gini coefficient and the Number of communities, were added during the research described in Chapter 10.

9.2 Monte Carlo Search for an optimum personality assignment

The Monte Carlo approach is used to search for an effective assignment of personality types for a synthetic social network in a straightforward manner. The Monte Carlo (MC) algorithm, given a real-world exemplar social network T, searches for a personality type assignment that is effective for creating synthetic social networks of the same class as T by repeatedly generating random personality type assignments and evaluating their effectiveness. For each randomly generated personality type assignment, a sample of 40 synthetic social networks is generated using the G(n, A, C) algorithm.

The realism of those 40 networks, and thus the effectiveness of the specific personality type assignment from which they were generated, is then assessed by measuring the sample networks’ similarity to the exemplar network using the network metrics in Table 2.2. The MC process (produce a personality type assignment randomly, generate a sample of 40 networks from it, and evaluate those networks’ realism) repeats until the execution time reaches a predetermined time limit. The personality type assignment that was most effective, that is, the one that produced the most realistic networks, is then returned.

Figure 9.2 presents the pseudocode for the algorithm. Line 9 in the pseudocode requires some explanation. Recall that the M variables are vectors. Thus, the operations in line 9 should be understood as vector operations. MT – M̄S denotes a vector containing the element-by-element differences between MT and M̄S, which are themselves vectors; | MT – M̄S | denotes a vector containing the absolute values of those differences; and | MT – M̄S | < | MT – M̄R | denotes the count of elements of the first vector that are less than their corresponding elements of the second vector.


1. Zmax ← –∞
2. while execution time < time limit
3.     A ← random assignment of personality types to n nodes per Table 2.3 (a)
4.     for j ← 1 to 40
5.         Sj ← synthetic social network generated using A and G(n, A, C) algorithm
6.         Mj ← network metrics for Sj
7.     end for
8.     M̄S ← means of network metrics Mj for j = 1, 2, ..., 40
9.     Z ← | MT – M̄S | < | MT – M̄R |
10.    if Z > Zmax then
11.        Zmax ← Z
12.        Amax ← A
13.    end if
14. end while
15. output Amax

Figure 9.2 Pseudocode for Monte Carlo social network generation algorithm

The result Z is the number of network metrics for which the mean value over the synthetic social networks produced by the Monte Carlo process was closer to the exemplar network T than the mean value over the random social networks. Z is the measure of performance for the two search algorithms.

The pseudocode presented in Figure 9.2 is generic enough to work with any reasonable and internally consistent personality compatibility table.
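Figure 9.2 can be rendered as a short Python loop. This sketch is not the R implementation: the two-metric `metrics` stand-in, the function names, and the small sample sizes used in the usage example are hypothetical, and `weights` plays the role of the Table 2.3 frequencies.

```python
import random
import time
from itertools import combinations
from statistics import mean


def g_n_a_c(A, C, rng):
    # G(n, A, C): one Bernoulli trial per pair with probability C[A[i]][A[j]].
    return [(i, j) for i, j in combinations(range(len(A)), 2)
            if rng.random() < C[A[i]][A[j]]]


def metrics(n, links):
    # Stand-in for the 18 metrics: density and average degree only.
    m = len(links)
    return [m / (n * (n - 1) / 2), 2 * m / n]


def count_closer(MT, MS, MR):
    """Line 9: count metrics where |MT - MS| < |MT - MR| element-wise."""
    return sum(abs(t - s) < abs(t - r) for t, s, r in zip(MT, MS, MR))


def monte_carlo_search(MT, MR, n, types, weights, C, time_limit, rng, samples=40):
    """Lines 1-15: keep the random assignment with the highest Z."""
    z_max, a_max = float("-inf"), None
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:
        A = rng.choices(types, weights=weights, k=n)  # line 3
        per_network = [metrics(n, g_n_a_c(A, C, rng)) for _ in range(samples)]
        MS = [mean(col) for col in zip(*per_network)]  # line 8
        z = count_closer(MT, MS, MR)  # line 9
        if z > z_max:  # lines 10-13
            z_max, a_max = z, A
    return a_max, z_max
```

The search only ever keeps the single best assignment seen so far, which is what distinguishes it from the genetic algorithm described next, where good assignments are recombined.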

9.3 Genetic algorithm search for an optimum personality assignment

Classic textbooks about genetic algorithms include (Holland 1975), (Davis 1991), (Koza 1994), and (Whitley 1994); these books provide an accessible introduction to the theory of why genetic algorithms work. (Konak 2006) compares several types of multi-objective genetic algorithms and provides high-level pseudocode. Briefly, genetic algorithms begin with a large population of randomly generated, encoded potential solutions to the problem of interest, known as chromosomes. Chromosomes are concatenations of individual segments of the encoded solutions, referred to as genes.

Each chromosome (each encoded solution) is evaluated against an objective function, which measures the quality of the solution it encodes.

A new generation of chromosomes is produced by selecting a portion of the current generation’s chromosomes, usually those with the best evaluations, and then exchanging genes (solution components) among pairs of the chromosomes (solutions).

In each generation, a portion of the chromosomes may mutate by randomly changing or substituting genes within them. In this research, a chromosome is a list of MBTI personality types, one per node in the network, and the personality types assigned to each node are the genes.

The genetic algorithm, given a real-world exemplar social network T, searches for a personality type assignment that is effective for creating synthetic social networks of the same class as T by using a genetic process to generate increasingly effective personality type assignments. For each genetically generated personality type assignment, a sample of 40 synthetic social networks is generated using the G(n, A, C) algorithm. The realism of the networks, that is, the effectiveness of the personality type assignment that produced the networks, is then assessed by measuring the sample networks’ similarity to the exemplar network based on the metrics in Table 2.2.

The genetic process (produce a personality type assignment using a genetic algorithm, generate a sample of 40 networks from it, and evaluate those networks’ realism) repeats until the execution time reaches a predetermined time limit. The personality type assignment that was most effective (the one that produced the most realistic networks) is then returned. In each generation, the two best assignments serve as the pair of parent chromosomes that spawn the next generation. Pseudocode for the Genetic algorithm is shown in Figure 9.3. Lines 2–10 produce the first generation of 40 assignments of personality types to nodes. In the first generation, the assignments are random.

Assignments are evaluated by generating 40 synthetic social networks from each assignment using the G(n, A, C) algorithm and calculating the metrics for those networks.

Starting from the first generation, lines 11–26 iteratively produce more effective assignments by applying conventional genetic algorithm methods (crossover and mutation) to the two most effective assignments from the previous generation. Note the similarity of line 9 in Figure 9.2 and Figure 9.3. The pseudocode in Figure 9.3 could work with any reasonable and internally consistent compatibility table.

The population size used for the GA deserves some explanation. Through experimentation, Alander found that an optimum population size Sopt for genetic algorithms, in general, is a value in the range log2 N ≤ Sopt ≤ 2 log2 N, where N is the number of valid gene combinations (Alander 1992). In the GA, each gene is an MBTI type, and the number of genes is the same as the number of nodes in the network. The smallest exemplar real-world social network has 11 nodes, and the largest has 71 nodes. Thus, for the smallest exemplar, N = 16^11 (16 possible MBTI types at each of 11 nodes), and the smallest Sopt according to Alander would be log2 16^11 = 44.

However, a population size of 20 was used to save execution time and to test whether effective personality type assignments could be found with smaller populations.


1. i ← 1
2. for j ← 1 to 40
3.     Ai,j ← random assignment of personality types to n nodes per Table 2.3 (a)
4.     for k ← 1 to 40
5.         Si,j,k ← synthetic social network generated using Ai,j and G(n, A, C) algorithm
6.         Mi,j,k ← network metrics for Si,j,k
7.     end for
8.     M̄S ← means of network metrics Mi,j,k for k = 1, 2, ..., 40
9.     Zi,j ← | MT – M̄S | < | MT – M̄R |
10. end for
11. while execution time < time limit
12.     i ← i + 1
13.     Amax ← Ai,1 ← assignment A(i – 1),x with highest realism score Z(i – 1),x
14.     Ai,2 ← assignment A(i – 1),y with second highest realism score Z(i – 1),y
15.     for j ← 3 to 40
16.         cross ← one of {single point, two point, crisscross, alternate}, selected cyclically
17.         Ai,j ← cross(Ai,1, Ai,2)   # Produce an assignment of n personality types.
18.         if random number < mutation probability then
19.             m ← value in the interval [0, n / 2], selected randomly
20.             Ai,j ← Ai,j with m personality types replaced by randomly selected personality types
21.         end if
22.     end for
23.     for j ← 1 to 40
24.         execute lines 4–9 for Ai,j
25.     end for
26. end while
27. output Amax

Figure 9.3 Pseudocode for Genetic social network generation algorithm.
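Lines 13–22 of Figure 9.3 can be sketched in Python as follows. This is not the R implementation: the function names are hypothetical, the "alternate" operator is interpreted here as taking even-indexed genes from one parent and odd-indexed genes from the other, and the crisscross operator is omitted because its definition is not given in the text.

```python
import random


def single_point(p1, p2, rng):
    cut = rng.randint(1, len(p1) - 1)  # at least one gene from each parent
    return p1[:cut] + p2[cut:]


def two_point(p1, p2, rng):
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]


def alternate(p1, p2, rng):
    # Interpretation: even-indexed genes from p1, odd-indexed genes from p2.
    # rng is unused; it keeps the signature uniform across operators.
    return [g1 if k % 2 == 0 else g2 for k, (g1, g2) in enumerate(zip(p1, p2))]


def mutate(child, types, mutation_prob, rng):
    """Lines 18-20: with probability mutation_prob, replace m genes, m in [0, n/2]."""
    child = list(child)
    if rng.random() < mutation_prob:
        m = rng.randint(0, len(child) // 2)
        for k in rng.sample(range(len(child)), m):
            child[k] = rng.choice(types)
    return child


def next_generation(best, second, types, size, mutation_prob, rng):
    """Lines 13-22: keep the two best parents, then fill the generation with
    offspring, cycling through the crossover operators."""
    operators = [single_point, two_point, alternate]
    generation = [list(best), list(second)]
    for j in range(size - 2):
        cross = operators[j % len(operators)]
        generation.append(mutate(cross(best, second, rng), types, mutation_prob, rng))
    return generation
```

Keeping both parents unchanged in the new generation (elitism) guarantees that the best assignment found so far is never lost, which is why line 27 can simply output Amax.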

9.4 R code libraries

The two synthetic social network generation algorithms (MC and GA), as well as all supporting functions, were implemented in the R language. Although R first appeared in the 1990s, its use has grown substantially in recent years with the increased interest in data analytics, for which it is well suited. R is an open-source programming language and environment with powerful and extensive features for data analysis, data visualization, and statistical computing. R also includes a full range of general-purpose programming language features, including control structures, mathematical operations, and file input/output. Statistical functions include general linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering, and smoothing (R Core Team 2001). Approximately 9,900 packages of pre-written and freely available R code are stored in the Comprehensive R Archive Network (CRAN) repository (CRAN 2017).

9.5 Stochastic block model preference matrices based on personality compatibility

The third randomized algorithm used to generate synthetic social networks, the SBM, differs from MC and GA in two ways. First, it is not a new algorithm as the others are, but an existing social network generation algorithm that was adapted in this research to use personality compatibility. Second, it does not explicitly search for an effective personality type assignment in the manner of the MC and GA algorithms. A key feature of SBM is a B × B preference matrix W (where B is the number of blocks) that specifies the probabilities of link formation between nodes both within a block (intra-block links) and between blocks (inter-block links). Chapter 8 described a preference matrix generator, which customized the MBTI compatibility table for a given network.

There were 16 blocks, one for each MBTI type. Nodes were assigned to the blocks according to their personality types, which were drawn stochastically using the U.S. population personality type frequencies given in Table 2.3. Because this process is stochastic, one or more blocks may be empty (contain zero nodes). A customized MBTI personality compatibility table was used as the preference matrix W. The on-diagonal entries in the personality compatibility table thus specify the probabilities of links forming between nodes in the same block, that is, nodes with the same personality type, and the off-diagonal entries specify the probabilities of links forming between nodes in different blocks, that is, nodes with different personality types.
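In igraph, sample_sbm takes a preference matrix and a vector of block sizes. The sketch below mirrors that idea in plain Python under stated assumptions: each node's block (personality type) is drawn by weighted sampling from frequencies passed in by the caller (the toy values are hypothetical, not those of Table 2.3), and each pair of nodes is linked with probability W[block of u][block of v]. It also shows how a type can end up with zero nodes. The function names are hypothetical.

```python
import random
from collections import Counter


def assign_blocks(n, frequencies, rng):
    """Draw a block (personality type) for each of n nodes from type frequencies."""
    types = list(frequencies)
    return rng.choices(types, weights=[frequencies[t] for t in types], k=n)


def block_sizes(assignment, types):
    """Nodes per block in a fixed type order; types never drawn give empty blocks."""
    counts = Counter(assignment)
    return [counts.get(t, 0) for t in types]


def sample_sbm(assignment, W, rng):
    """Undirected SBM: link u and v with probability W[block(u)][block(v)]."""
    n = len(assignment)
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < W[assignment[u]][assignment[v]]}


freqs = {"ISTJ": 0.7, "ENFP": 0.3}                      # hypothetical frequencies
W = {"ISTJ": {"ISTJ": 0.6, "ENFP": 0.2},                 # hypothetical preference matrix
     "ENFP": {"ISTJ": 0.2, "ENFP": 0.6}}
rng = random.Random(42)
blocks = assign_blocks(18, freqs, rng)
print(block_sizes(blocks, list(freqs)), len(sample_sbm(blocks, W, rng)))
```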

9.6 Implementation of the MC and GA based search algorithms

The implementation consists of five primary modules. Those modules and their functions are:

• Calculate metric values (MV). Given a social network (denoted R, S, or T), calculate and return the metric values (MR, MS, or MT, respectively) listed in Table 2.2 for the given network.

• Generate random network (RN). Given a number of nodes n and a network density p, generate a random network using the Erdős-Rényi G(n, p) algorithm.

• Generate synthetic network (SN). Given a number of nodes n, a personality type assignment A containing a personality type for each of the n nodes, and a compatibility table C that includes the personality types in A, generate a synthetic social network using the G(n, A, C) algorithm.

• Monte Carlo search algorithm (MC). Given an exemplar network T and network metric values MT for T, use the MC algorithm in Figure 9.2 to find a personality type assignment that is effective with respect to T.

• Genetic search algorithm (GA). Given an exemplar network T and network metric values MT for T, use the GA algorithm in Figure 9.3 to find a personality type assignment that is effective with respect to T.
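To make modules RN and MV concrete, here is a minimal Python sketch of generating G(n, p) networks and averaging their metric values over 40 samples, as in the random-network process described below. It is an illustration, not the R/igraph implementation: the three metrics shown (links, density, average degree) are only a small subset of Table 2.2, and the function names are hypothetical.

```python
import random
from statistics import mean


def gnp(n, p, rng):
    """Erdős-Rényi G(n, p): each of the n(n-1)/2 node pairs is linked with probability p."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}


def metric_values(n, edges):
    """Three illustrative metrics (a small subset of Table 2.2)."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "links": len(edges),
        "density": len(edges) / (n * (n - 1) / 2),
        "average_degree": sum(degree) / n,
    }


def mean_random_metrics(n, p, count=40, seed=None):
    """Module RN followed by module MV: mean metric values over `count` random networks."""
    rng = random.Random(seed)
    samples = [metric_values(n, gnp(n, p, rng)) for _ in range(count)]
    return {name: mean(s[name] for s in samples) for name in samples[0]}


print(mean_random_metrics(18, 0.268, seed=1))  # n and density of the Sampson exemplar
```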


Figure 9.4 (a) and (b) are Unified Modeling Language (UML) sequence diagrams that summarize the interactions between these modules. Figure 9.4 (a) shows the process of generating random social networks. The user provides an exemplar social network T.

Module MV calculates network metrics MT for T. Those metrics, in particular the number of nodes n and network density p, are used by module RN to generate 40 random networks R1, R2, …, R40 using the G(n, p) algorithm. Finally, module MV calculates the means of the metric values, denoted M̄R, from the random networks R1, R2, …, R40.

Figure 9.4 (b) shows both the MC and GA processes. The user provides an exemplar social network T. Module MV calculates network metrics MT for T. Those metrics, in particular the number of nodes n, are used by either module MC or GA to generate a personality type assignment A. Personality type assignment A is passed to module SN, which produces synthetic social networks S1, S2, …, S40. Module MV calculates the means of the metric values, denoted M̄S, from the synthetic networks S1, S2, …, S40. Those metric values are used by either module MC or GA to improve personality assignment A, as detailed in Figure 9.2 and Figure 9.3 respectively. The sequence of generating a personality type assignment A, using it to generate 40 synthetic networks, calculating those networks' mean metric values, and using mean metric values to improve A, repeats until the time allotted for the run has expired. The resulting effective personality type assignment Amax is once more passed to module SN, which generates the final synthetic networks S1, S2, …, S40. Module MV calculates the means of the metric values, denoted M̄S, from the synthetic networks S1, S2, …, S40. Note that for both the MC and GA processes, the synthetic networks generated and evaluated during the process of finding an effective personality type assignment are not retained.


(a)

(b)

Figure 9.4 Implementation sequence diagrams.


9.7 The stochastic block model function

The previous chapter explained how preference matrices were produced for an SBM function. As already mentioned, SBM is an existing algorithm for generating synthetic social networks. An implementation of SBM in the R language is available in the R igraph package, a collection of R functions for network analysis and visualization (Csárdi 2013). The SBM network generation function sample_sbm in the igraph package was used without modification. As Chapter 8 explained, the MBTI personality compatibility table was customized to produce a preference matrix for each of the real-world exemplar networks. One input parameter of the SBM function specified the preference matrix; another specified the block sizes, that is, the number of nodes of each MBTI personality type.

9.8 Execution of the personality type assignment search algorithms

Because R is an interpreted language, R programs often execute more slowly than comparable programs written in a compiled language. Also, two of the synthetic social network generation algorithms (MC and GA) involve numerous iterations of randomized processes. For the 14 real-world social networks used as exemplars, the MC based search algorithm performed a total of 17,541 iterations (each iteration consisting of a personality type assignment and 40 synthetic social networks generated from it), the GA based search algorithm performed a total of 9,964 iterations, and the SBM algorithm performed 1 iteration per exemplar (14 in total), for a total of 17,541 + 9,964 + 14 = 27,519 personality type assignments and 27,519 · 40 = 1,100,760 synthetic social networks.

Consequently, the algorithms' run times during testing and analysis were sometimes quite lengthy. To keep execution times manageable, the programs were run on computing clusters provided and supported by the Alabama Supercomputer Center.

Typical run times for the two search algorithms were highly dependent on the number of nodes in the exemplar network; for the MC algorithm, run times ranged from ~1 minute to ~240 minutes, and for the GA algorithm they ranged from ~1 minute to ~210 minutes.

Both algorithms' implementations included execution time limits, as can be seen in Figures 9.2 and 9.3. Although the algorithms' implementation code was not parallelized, scripts were used to initialize and launch multiple instances of the program to execute concurrently. Because it does not search for a personality type assignment, the SBM algorithm ran much more quickly, typically in less than a minute.

9.9 Results from the randomized methods

This section reports the results of testing and comparing the two algorithms (MC and GA), first against a random social network generator and then against each other.

The comparisons included both quantitative measures of the generated social networks’ realism and the computational complexity of the algorithms.

9.9.1 Realism of the networks produced with randomized methods

Realism is measured by the absolute difference between the mean metrics of the synthetic networks and the network metrics of the exemplar real-world social network.

The metrics used to measure realism are listed in Table 2.2. A smaller absolute difference is preferred. Absolute differences between the metrics of the exemplar real-world social network and the mean metrics of the synthetic networks were calculated for random networks, networks generated by the MC algorithm, and networks generated by the GA algorithm.


Metrics from the Sampson Monastery real-world exemplar network, from networks synthesized with assignments found by the MC and GA search algorithms, and from networks generated by the SBM function using preference matrices produced by customizing the MBTI personality compatibility table are presented in Table 9.1. The columns in Table 9.1 show:

1. Names of metrics.

2. Metric values of the exemplar network.

3. Mean metric values from 30 networks produced by G(n, p), where n is the number of nodes and p is the density of the exemplar network.

4. Absolute differences between the metric values of the exemplar network and the mean metric values of the networks produced by the G(n, p) function.

5. Mean metric values from 30 networks produced with personality type assignments found by the MC based search algorithm.

6. Absolute differences between the metric values of the exemplar network and the mean metric values of networks produced with personality type assignments found by the MC based search algorithm.

7. Mean metric values from 30 networks produced with personality type assignments found by the GA based search algorithm.

8. Absolute differences between the metric values of the exemplar network and the mean metric values of networks produced with personality type assignments found by the GA based search algorithm.

9. Mean metric values of 30 networks produced by the SBM function using a preference matrix produced by customizing the MBTI personality compatibility table.

10. Absolute differences between the metric values of the exemplar network and the mean metric values of the networks produced by the SBM function.

Green indicates that the synthetic networks' mean metric value was closer to the exemplar than the random networks' mean metric value, no color indicates that they were equally close, and red indicates that the synthetic networks' mean metric value was farther from the exemplar than the random networks' mean metric value. Table 9.1 reports the realism results for the Sampson Monastery network only. For that exemplar network, both the MC and the GA algorithms produced more realistic synthetic social networks than random generation over most of the network metrics. Similar detailed results tables for the 14 real-world networks listed in Table 5.1 can be found in Appendix D. In all tables in Appendix D, a green background marks values smaller than |MT – M̄R|, and a pink background marks values larger than |MT – M̄R|.



Table 9.1 Realism results for the Sampson Monastery social network

Metric | Exemplar | Random mean | Random diff | MC mean | MC diff | GA mean | GA diff | SBM mean | SBM diff
Nodes | 18 | 18 | 0 | 18 | 0 | 18 | 0 | 18 | 0
Gini coefficient | 0.232 | 0.243 | 0.011 | 0.231 | 0.001 | 0.236 | 0.004 | 0.228 | 0.004
Links | 41 | 39.700 | 1.300 | 41.000 | 0.000 | 41.000 | 0.000 | 42.600 | 1.600
Components | 1 | 1.100 | 0.100 | 1.100 | 0.100 | 1.050 | 0.050 | 1.050 | 0.050
Network density | 0.268 | 0.259 | 0.008 | 0.268 | 0.000 | 0.268 | 0.000 | 0.278 | 0.010
Average degree | 4.556 | 4.411 | 0.144 | 4.556 | 0.000 | 4.556 | 0.000 | 4.733 | 0.178
Standard deviation degree | 2.093 | 1.695 | 0.398 | 1.881 | 0.212 | 1.862 | 0.230 | 1.775 | 0.318
Global clustering coefficient | 0.262 | 0.252 | 0.011 | 0.268 | 0.006 | 0.263 | 0.000 | 0.271 | 0.009
Average cluster coefficient | 0.285 | 0.267 | 0.018 | 0.284 | 0.001 | 0.276 | 0.009 | 0.281 | 0.004
Mean path length | 1.967 | 2.168 | 0.200 | 2.133 | 0.166 | 2.061 | 0.094 | 2.032 | 0.065
Average betweenness | 8.222 | 8.343 | 0.121 | 8.024 | 0.199 | 8.219 | 0.003 | 7.971 | 0.251
Maximum betweenness | 37.623 | 27.044 | 10.579 | 27.809 | 9.813 | 29.560 | 8.063 | 25.411 | 12.211
Average closeness | 0.030 | 0.029 | 0.001 | 0.029 | 0.001 | 0.030 | 0.001 | 0.030 | 0.000
Minimum closeness | 0.024 | 0.021 | 0.003 | 0.021 | 0.003 | 0.021 | 0.003 | 0.022 | 0.002
Average eigencentrality | 0.481 | 0.585 | 0.105 | 0.562 | 0.081 | 0.565 | 0.084 | 0.587 | 0.107
Network radius | 2 | 2.175 | 0.175 | 2.000 | 0.000 | 2.275 | 0.275 | 2.175 | 0.175
Average eccentricity | 3.000 | 3.111 | 0.111 | 3.100 | 0.100 | 3.144 | 0.144 | 3.090 | 0.090
Network diameter | 4 | 3.750 | 0.250 | 3.900 | 0.100 | 3.875 | 0.125 | 3.725 | 0.275


Table 9.2 Realism results summary.

Exemplar real-world social network | MC vs. R: MC | R | = | GA vs. R: GA | R | = | SBM vs. R: SBM | R | =
Robins Australian Bank | 14 | 2 | 2 | 14 | 3 | 1 | 11 | 5 | 2
Roethlisberger & Dickson Bank Wiring Room | 11 | 6 | 1 | 10 | 7 | 1 | 4 | 12 | 2
Thurman Office | 12 | 3 | 3 | 11 | 3 | 4 | 6 | 9 | 3
Sampson Monastery | 13 | 1 | 4 | 13 | 2 | 3 | 9 | 7 | 2
Krackhardt Office CSS | 5 | 9 | 4 | 6 | 8 | 4 | 2 | 14 | 2
Krackhardt High-Tech Managers | 13 | 3 | 2 | 15 | 1 | 2 | 12 | 5 | 1
Schwimmer Taro Exchange | 15 | 2 | 1 | 12 | 5 | 1 | 13 | 4 | 1
Webster Accounting Firm | 14 | 0 | 4 | 13 | 1 | 4 | 9 | 5 | 4
Zachary Karate Club | 17 | 0 | 1 | 14 | 3 | 1 | 10 | 6 | 2
Bernard & Killworth Technical | 14 | 2 | 2 | 11 | 1 | 6 | 8 | 4 | 6
Bernard & Killworth Office | 10 | 5 | 3 | 11 | 3 | 4 | 5 | 6 | 7
Krebs Fortune 500 IT Department (Advice) | 13 | 2 | 3 | 14 | 1 | 3 | 12 | 3 | 3
Krebs Fortune 500 IT Department (Business) | 9 | 3 | 6 | 9 | 2 | 7 | 5 | 8 | 5
Lazega Law Firm | 9 | 3 | 6 | 11 | 2 | 5 | 10 | 3 | 5
Totals | 169 | 41 | 42 | 164 | 42 | 46 | 116 | 91 | 45

Table 9.2 summarizes the overall realism results. Three realism comparisons were made: MC versus Random, GA versus Random, and SBM versus Random; all three are reported in the table. A total of 252 metric values (14 real-world social networks × 18 metrics) were measured for each of the comparisons. For the MC versus Random comparison, 169 of the 252 values (~67%) were closer to the exemplar network for the MC algorithm. For the GA versus Random comparison, 164 of 252 values (~65%) were closer to the exemplar network for the GA algorithm. For the SBM versus Random comparison, 116 of 252 values (~46%) were closer to the exemplar network for the SBM algorithm.
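The percentages above can be rechecked from the Totals row of Table 9.2 with a few lines of Python; the function name is hypothetical. The sketch also confirms that each comparison's three counts add up to 252.

```python
def win_shares(totals, networks=14, metrics=18):
    """Percentage of metric comparisons won by each algorithm against random graphs."""
    comparisons = networks * metrics  # 14 exemplars x 18 metrics = 252
    shares = {}
    for method, (wins, losses, draws) in totals.items():
        assert wins + losses + draws == comparisons, method  # each group sums to 252
        shares[method] = round(100 * wins / comparisons)
    return shares


# Totals row of Table 9.2: (closer for algorithm, closer for random, tied)
print(win_shares({"MC": (169, 41, 42), "GA": (164, 42, 46), "SBM": (116, 91, 45)}))
# {'MC': 67, 'GA': 65, 'SBM': 46}
```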


To support the quantitative realism results at an intuitive level, Figure 9.5 presents an example visual comparison of a real-world social network with a randomly generated network and two networks that were generated using a personality compatibility table.

Figure 9.5 (a) shows the real-world Robins Australian Bank exemplar social network (Pattison 2000). Figure 9.5 (b) shows a network that was generated using the random G(n, p) algorithm; it has the same number of nodes and the same density as the exemplar real-world social network. Figure 9.5 (c) shows a synthetic social network generated by the G(n, A, C) algorithm using an assignment of personality types found by the MC based search algorithm, and Figure 9.5 (d) shows one generated using an assignment found by the GA based search algorithm. In the figure, node communities found by the walktrap.community function in the R igraph package are depicted with bounding boxes around them. A visual inspection of the networks in Figure 9.5 reveals what subjectively appear to be more realistic communities in Figures 9.5 (c) and 9.5 (d) (MC and GA, respectively) than in Figure 9.5 (b) (random). For example, note the complex structures within the pink boundaries of Figures 9.5 (c) and 9.5 (d), in contrast to the simple structure within the pink boundary in Figure 9.5 (b).


(a) Robins Australian Bank social network
(b) Randomly generated social network
(c) Synthetic social network with personality type assignment found by MC algorithm
(d) Synthetic social network with personality type assignment found by GA algorithm

Figure 9.5 Comparison of the real-world and synthetic social networks.

9.9.2 Computational efficiency

In addition to comparing the algorithms in terms of the realism of the synthetic social networks generated, they were compared in terms of computational efficiency.

Computational efficiency (or complexity) is not typically measured in terms of execution time, because execution time varies with processor speed, system workload, implementation language, and programmer skill. Instead, it is measured in terms of "basic operations," that is, fundamental units of work that the algorithm or algorithms being measured repeatedly perform (Gersting 2014). As was explained in section 9.2 and can be seen in Figure 9.4, the algorithms repeatedly call the module that calculates the metrics for a given social network. Thus, to compare the algorithms' computational efficiency, the number of calls to that module was recorded. Each call, that is, the calculation of the metrics for a single network, is referred to as an evaluation.
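Counting evaluations amounts to instrumenting the metric module. One way to do that in Python is a decorator that increments a counter on every call. This is a sketch; calculate_metrics is a hypothetical stand-in for module MV, not the dissertation's R function.

```python
import functools


def count_evaluations(func):
    """Wrap a metric module so that every call is counted as one evaluation."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.evaluations += 1
        return func(*args, **kwargs)
    wrapper.evaluations = 0
    return wrapper


@count_evaluations
def calculate_metrics(edges):
    # Hypothetical stand-in for module MV; a real version would compute Table 2.2.
    return {"links": len(edges)}


# One personality type assignment yields 40 synthetic networks, hence 40 evaluations.
for network in [set()] * 40:
    calculate_metrics(network)
print(calculate_metrics.evaluations)  # 40
```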

Table 9.3 Social network metric evaluations performed.

Social network | Nodes | Links | MC evaluations | GA evaluations | SBM evaluations
Robins Australian Bank | 11 | 16 | 640 | 2,240 | 40
Roethlisberger & Dickson Bank Wiring Room | 14 | 13 | 37,280 | 800 | 40
Thurman Office | 15 | 33 | 6,360 | 5,120 | 40
Sampson Monastery | 18 | 41 | 28,720 | 149,120 | 40
Krackhardt Office CSS | 21 | 14 | 4,400 | 800 | 40
Krackhardt High-Tech Managers | 21 | 36 | 440 | 1,520 | 40
Schwimmer Taro Exchange | 22 | 39 | 7,480 | 800 | 40
Webster Accounting Firm | 24 | 150 | 40 | 800 | 40
Zachary Karate Club | 34 | 78 | 1,280 | 1,520 | 40
Bernard & Killworth Technical | 34 | 175 | 452,480 | 800 | 40
Bernard & Killworth Office | 40 | 238 | 161,560 | 232,640 | 40
Krebs Fortune 500 IT Department (Advice) | 56 | 203 | 840 | 800 | 40
Krebs Fortune 500 IT Department (Business) | 56 | 387 | 80 | 800 | 40
Lazega Law Firm | 71 | 726 | 40 | 800 | 40
Total | | | 701,640 | 398,560 | 560
Mean | | | 50,117.14 | 28,468.57 | 40
Standard deviation | | | 123,415.79 | 70,739.33 | 0.00


Table 9.3 presents the number of evaluations performed by the MC, GA, and SBM algorithms. Note that the total and mean evaluations performed by the MC algorithm were nearly double those of the GA algorithm (about 1.76 times as many). Also, the number of evaluations performed by the MC algorithm was less consistent across the 14 social networks, as indicated by its higher standard deviation. In a few cases, the MC algorithm found effective personality type assignments very quickly by happenstance, but in other cases, it required many iterations to find an effective assignment. The SBM algorithm used only one personality type assignment per exemplar real-world network and generated 40 synthetic social networks from it; thus, there were a constant 40 evaluations per exemplar. Finally, the metrics were also evaluated once for the real-world network and once for each of the 40 random Erdős-Rényi networks used for the realism comparisons; those evaluations are not charged to any of the three algorithms and are not included in Table 9.3.
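The summary rows of Table 9.3 can be reproduced from the per-network counts. The sketch below assumes that the standard deviation row is the sample standard deviation (divisor n − 1), which is what statistics.stdev computes; under that assumption it should reproduce the reported totals, means, and standard deviations to two decimals.

```python
from statistics import mean, stdev

# Evaluation counts from Table 9.3, in the table's row order.
mc = [640, 37_280, 6_360, 28_720, 4_400, 440, 7_480,
      40, 1_280, 452_480, 161_560, 840, 80, 40]
ga = [2_240, 800, 5_120, 149_120, 800, 1_520, 800,
      800, 1_520, 800, 232_640, 800, 800, 800]

for name, counts in (("MC", mc), ("GA", ga)):
    # stdev() is the sample standard deviation (divisor n - 1).
    print(f"{name}: total={sum(counts):,} mean={mean(counts):,.2f} sd={stdev(counts):,.2f}")
print(f"MC/GA total ratio: {sum(mc) / sum(ga):.2f}")
```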

In addition to the empirical measure of computational efficiency provided by the evaluation counts just reported, we also analyze the algorithms' theoretical computational complexity. Both the MC and the GA process as many iterations (that is, personality type assignments) as possible within a user-set execution time limit. Because that time limit may vary, we analyze the theoretical computational complexity of each algorithm for a single iteration (a single personality type assignment). For each iteration, the input size n is the number of nodes in the exemplar real-world network. As is conventional, multiplicative constants are omitted in this analysis. We first consider the MC algorithm.

Referring to the line numbers in the algorithm's pseudocode (Figure 9.2), the random assignment of personality types in line 3 is O(n). The generation of each of the synthetic social networks in lines 4–7 is O(n²) because the G(n, A, C) algorithm constructs an n × n adjacency matrix. The complexity of calculating the different network metrics in line 8 varies from O(1) to O(n³), depending on the metric. Finally, the record-keeping in lines 9–13 is O(1). Thus, the Monte Carlo algorithm's overall time complexity per personality type assignment is O(n³).

The analysis of the GA is similar. Referring to the line numbers in the algorithm's pseudocode (Figure 9.3), the random assignment of personality types in line 3 is O(n). The generation of each of the synthetic social networks in lines 4–7 is O(n²), and the calculation of the network metrics for each network in line 6 is O(n³). The record-keeping in lines 13–14 is O(1), and the crossover and mutation in lines 17–20 are O(n). The generation of the synthetic social networks and the calculation of metrics in lines 23–25 are again O(n²) and O(n³), respectively. Thus, the GA's overall time complexity per personality type assignment is also O(n³).

Because the SBM algorithm is an existing algorithm and was not developed or implemented as part of this work, its complexity is not analyzed here; it is reported as O(n²) in (Csárdi 2013). Because the same network metrics were calculated for the SBM-generated networks as for the MC and GA, those metric calculations would again be O(n³). The fact that the metric calculations dominate the theoretical time complexity of the algorithms explains why we chose to count metric evaluations in the preceding empirical comparison of the algorithms' computational expense.
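To see why a metric calculation can reach O(n³), consider mean path length. One straightforward way to compute it is the Floyd-Warshall all-pairs shortest-path algorithm, whose three nested loops over the nodes make it O(n³). This is an illustrative sketch rather than igraph's method (igraph may compute distances differently, for example by breadth-first search from each node), and averaging only over connected pairs is an assumption.

```python
INF = float("inf")


def mean_path_length(n, edges):
    """Mean shortest-path length over connected ordered pairs (Floyd-Warshall, O(n^3))."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):                  # three nested loops over the n nodes
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue                # i cannot reach k, so k cannot shorten i's paths
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    finite = [dist[i][j] for i in range(n) for j in range(n)
              if i != j and dist[i][j] < INF]
    return sum(finite) / len(finite) if finite else 0.0


print(mean_path_length(3, [(0, 1), (1, 2)]))  # path 0-1-2: (1 + 2 + 1) * 2 / 6
```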


9.10 Discussion of the randomized methods

Considering the qualitative and quantitative results, it is evident that both the MC and the GA algorithms generate more realistic synthetic social networks than random graph generation. The MC and GA algorithms are approximately comparable in terms of realism. However, the GA algorithm is much more computationally efficient, requiring substantially fewer network metric evaluations. The SBM required the fewest evaluations, but it did not perform as well as the MC and GA algorithms; also, it required customization of the MBTI personality compatibility table to produce preference matrices that were specific to each network.

Both the MC and GA algorithms used the G(n, A, C) algorithm to actually synthesize social networks, i.e., in both cases, links are added to a synthetic network during the generation process stochastically based on the probabilities in a personality compatibility table C. The MC and GA algorithms differ from each other only in the method employed to find an effective personality type assignment A. With the exception of SBM, most prior algorithms do not consider the attributes of the nodes when adding links; instead, they utilize structural characteristics of the exemplar network in the synthetic networks. For example, the Configuration Model is initialized with both the number of nodes n and a specific degree sequence K = {k1, k2, …, kn}, which may be the actual degree sequence of the real-world network serving as an exemplar (Newman 2003). In contrast, the MC and GA algorithms use a general purpose personality compatibility table that determines link formation likelihood based on node attributes and apply multi-objective optimization to produce an assignment of the node attributes that generates realistic networks.


CHAPTER 10

HEURISTIC METHODS FOR ASSIGNMENT SEARCH ALGORITHMS

This chapter explains three new algorithms. The first algorithm generates synthetic social networks using personality compatibility and features triadic closure.

The second and third algorithms find effective personality type assignments using heuristic methods, in contrast to the randomized methods described in the previous chapter. Also in contrast to the MC and GA algorithms of the previous chapter, which were straightforward adaptations of known algorithms to the social network synthesis problem, the algorithms described in this chapter are new and were developed as part of this research.

10.1 Evaluating network realism

As described earlier, the realism of synthetic social networks is evaluated by calculating the selected network metrics listed in Table 2.2 and comparing those metrics' values to those of a real-world exemplar social network of the class to which the synthetic social networks are meant to belong. The realism of the synthetic social networks generated by the algorithms described in this chapter was compared to that of networks generated by the Configuration Model. The Configuration Model is an existing network generation algorithm that is widely used and considered to be very effective; therefore, it is often used as a basis for comparison (Behrendt 2014). Because of the random elements in the algorithms, the comparisons were not based on the metrics of a single network but on mean metric values calculated from sample sets of networks generated by each of the algorithms.

The network metrics used to evaluate realism are listed in Table 2.2. A function implemented in R uses the igraph package to calculate the metrics. To establish truth data, the metrics are determined for a real-world exemplar network. Using the Configuration Model, 30 networks are generated; the metrics are calculated for each of those networks and stored in a column of a matrix. Another 30 networks are generated with the new method, and the calculated metrics for those networks are stored as columns in another matrix. Row averages are calculated for the two matrices. Absolute differences between the truth data and the two sets of averaged metrics are calculated and stored in a matrix for a report. That report is saved as a Comma Separated Value (CSV) file and opened in Excel. Macros in Excel compare the columns of the report to identify the smaller absolute difference in each row. For each exemplar network, scores are calculated by counting the number of wins, losses, and draws. A win means the new algorithm had the lower absolute difference for a metric, a loss means the Configuration Model produced networks with the lower absolute difference for a metric, and a draw means both methods produced the same absolute difference.
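The per-metric scoring can be sketched in Python as follows, with a function standing in for the Excel macros. It assumes that absolute differences are rounded to three decimals (the precision shown in Table 9.1) before ties are decided; that rounding rule, like the function name, is an assumption.

```python
from statistics import mean


def realism_tally(truth, new_networks, baseline_networks, digits=3):
    """Wins, losses, and draws of a new method against a baseline, metric by metric.

    truth: exemplar metric values, one entry per metric.
    new_networks, baseline_networks: one list of metric values per generated
    network (the columns of the two matrices described in the text).
    """
    wins = losses = draws = 0
    for m, target in enumerate(truth):
        new_diff = round(abs(target - mean(net[m] for net in new_networks)), digits)
        base_diff = round(abs(target - mean(net[m] for net in baseline_networks)), digits)
        if new_diff < base_diff:
            wins += 1
        elif new_diff > base_diff:
            losses += 1
        else:
            draws += 1
    return wins, losses, draws


# Toy example with three metrics: one win, one loss, one draw.
print(realism_tally([1.0, 2.0, 3.0], [[1.0, 2.5, 3.0], [1.0, 2.5, 3.0]], [[2.0, 2.0, 3.0]]))
```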

10.2 Synthesis process overview

The process starts with a real-world social network T, which serves as an exemplar of the class of social networks to be generated. Table 5.1 lists 14 real-world social networks that were used in this research. Network T is input to three different algorithms. The two new algorithms developed in this work, Probability Search (PS) and Compatibility-Degree Matching (CDM), each construct an assignment A of personality types to the nodes of T. Both employ heuristic methods to find A, albeit in completely different ways. The resulting personality type assignment A is then input to a network generator algorithm (GNAC), which generates a set of synthetic social networks (denoted P for the PS algorithm or M for the CDM algorithm), using the personalities in A and the compatibility information in personality compatibility table C. As before, the goal is to find a personality type assignment A which, when the GNAC algorithm is used with personality compatibility table C, will produce realistic synthetic social networks. A personality type assignment that produces realistic synthetic social networks will again be referred to in this context as effective.

[Figure: real-world exemplar network T → PS or CDM → personality type assignment A → GNAC → personality-based synthetic social networks P, M; T → CM → structure-based synthetic social networks F; all sets → calculate metrics and compare to exemplar T → results]

Figure 10.1 Process overview


A real-world exemplar network T is also input to a standard network generation algorithm, the Configuration Model (CM). The CM generates a set F of synthetic social networks based only on the structure of T, without using any personality compatibility information. The three sets of synthetic networks are then input to a process that calculates the network metrics listed in Table 2.2 and compares them to the exemplar T. Figure 10.1 presents an overview of this process.

10.3 Synthesizing networks from an assignment of personality types

Synthetic social networks are generated by an algorithm that considers personality compatibility by using a personality compatibility table C (e.g., Table A.1 or Table A.2) and a personality assignment A to the nodes of the network. The network generation algorithm is denoted the G(n, A, C) (GNAC) algorithm, where n is the number of nodes, A is an assignment of personality types to the n nodes, and C is a personality compatibility table that includes the personality types in A. The GNAC algorithm described in this section is similar to, but more sophisticated than, the G(n, A, C) algorithm described in section 9.1. Given an assignment A of personality types to nodes and a compatibility table C, as many synthetic social networks as needed can be generated using the GNAC algorithm. They will likely differ due to the randomness in the algorithm, but they will be related in that all were produced using the same assignment A and compatibility table C.

The GNAC algorithm first determines the degree sequence of an exemplar network T. The degree sequence is used to initialize a link budget for each of the nodes in the synthetic network. The algorithm randomly selects two triads of nodes in the synthetic network as candidates for triangles. The personality types assigned to the triads' nodes by A and the personality compatibility table C are used to find the probability of link formation between each pair of nodes in the triads, and the probabilities for each triad are summed. The triad with the larger sum is then converted into a triangle by connecting all unlinked pairs in the triad, and the link budgets of any newly linked nodes are decremented. This procedure repeats until the number of triangles in the synthetic network is the same as the number of triangles in the exemplar network. Figure 10.2 presents a flowchart of the algorithm.

[Flowchart: Start → Set parameters → Make triangles (repeat until the triangle goal or iteration limit is reached) → Link nodes with budgets (repeat until all dyads with link budget have been linked) → Link nodes without budgets (repeat until the total link budget equals zero) → End]

Figure 10.2 Flowchart for GNAC


Set parameters is the first box in the flow chart in Figure 10.2. Using the exemplar network T, the GNAC algorithm sets parameters for a link budget, a goal for the number of triangles, and a limit on the total number of iterations. The R packages igraph and sna provide functions that determine the degree sequence and conduct a Davis and Leinhardt triad census, respectively (Butts 2016) (Csárdi 2006). The number of triangles found in the census sets the goal. The degree sequence establishes a link budget for each node. A new network consisting only of nodes (no links) is then created, and GNAC assigns the personality types specified in A to its nodes.
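The two quantities computed in this step can be sketched without R. The dissertation uses igraph for the degree sequence and the sna triad census for the triangle count; as a dependency-free stand-in (not the census itself), the hypothetical function below enumerates all triads directly and counts those whose three dyads are all linked, which for an undirected network are exactly the triangles.

```python
import itertools


def set_parameters(n, edges):
    """Link budgets (degree sequence) and triangle goal from an exemplar network."""
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    budget = [len(nbrs) for nbrs in neighbors]  # degree of each node
    # Count triads whose three dyads are all linked, i.e. triangles.
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]
    )
    return budget, triangles


print(set_parameters(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))  # ([3, 3, 3, 3], 4)
```

Enumerating all triads is itself O(n³), which is acceptable for exemplars of at most 71 nodes.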

Make triangles is the second box in the flow chart. This part of the algorithm randomly selects two triads of nodes. The two triads are compared by consulting a personality compatibility table to determine the link formation likelihood values of the three dyads within each triad. The algorithm selects the triad whose dyads are more compatible overall, then steps through each pair of nodes in that triad. If a pair is not already connected, a link is formed between those nodes, and the link budgets of both nodes are decremented. This process continues until the triangle goal is met or the iteration limit is reached.
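A single iteration of this step might look like the following Python sketch. It is an interpretation under stated assumptions: links are stored as sorted node pairs, ties between the two triads go to the first one sampled, budgets are decremented without a check for exhaustion (the text does not say how that case is handled), and the outer loop that repeats the step until the triangle goal or iteration limit is not shown. The function names are hypothetical.

```python
import itertools
import random


def triad_score(triad, types, C):
    """Sum of compatibility values over the three dyads of a triad."""
    return sum(C[types[u]][types[v]] for u, v in itertools.combinations(triad, 2))


def make_triangle(n, types, C, edges, budget, rng):
    """One 'make triangles' step: sample two triads and close the more compatible one."""
    first = sorted(rng.sample(range(n), 3))
    second = sorted(rng.sample(range(n), 3))
    best = max(first, second, key=lambda t: triad_score(t, types, C))
    for u, v in itertools.combinations(best, 2):  # u < v because best is sorted
        if (u, v) not in edges:
            edges.add((u, v))
            budget[u] -= 1
            budget[v] -= 1
    return best


types = ["ISTJ", "ISTJ", "ENFP", "ENFP", "INTP", "INTP"]
C = {a: {b: 0.8 if a == b else 0.3 for b in set(types)} for a in set(types)}  # toy table
edges, budget = set(), [3] * len(types)
print(make_triangle(len(types), types, C, edges, budget, random.Random(0)), sorted(edges))
```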

Link nodes with budgets is the third box in the flow chart. Dyads of nodes with remaining link budgets are randomly selected and linked if they are not already connected. When GNAC forms a link, link budgets for the end nodes are decremented.

After a few iterations, the potential pairs of unconnected nodes with link budgets have been exhausted, but a few nodes may still have remaining link budgets.

Link nodes without budgets is the last box in the flow chart. At this point in the algorithm, GNAC steps through the nodes that have remaining link budgets and randomly selects other nodes in the network. If a pair of nodes is not already connected, GNAC generates a random number and consults the personality compatibility table. If the random number is less than the propensity value for that pair of personalities, then GNAC forms the link and decrements the budgets. When the total link budget reaches zero, the algorithm is finished.

Producing the desired number of triangles typically does not completely deplete the link budgets of all the nodes. For the nodes with remaining link budgets, the algorithm randomly selects pairs of those nodes. If the pair is not linked, then a link is formed and the nodes’ link budgets are decremented. If a pair of non-linked nodes with remaining link budgets cannot be found, then the algorithm randomly selects nodes that have no remaining link budget. If the randomly selected node and a node needing a neighbor are not connected, the algorithm randomly adds a link between the nodes with a probability determined by the nodes’ assigned personality types and the compatibility table C. The process repeats until the sum of all nodes’ remaining link budgets is zero, at which point the synthetic social network is returned.

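Figure 10.3 presents pseudocode for the addlink function, which connects nodes u and v if they are not already connected; this function is used by the GNAC algorithm pseudocode presented in Figure 10.4, which follows the flowchart presented in Figure 10.2. In Figure 10.4, T is an exemplar network, A is a personality assignment, C is a compatibility table, S = (V, E) is a synthetic network, db holds the nodes' remaining link budgets, and u and v are nodes in the network. The comp argument of addlink determines whether the link is added unconditionally (comp = false) or with the probability given in C for the two nodes' personality types (comp = true).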


1. function addlink(S, u, v, db, C, comp)
2.   if {u, v} ∉ E
3.     i ← personality type in A for node u
4.     j ← personality type in A for node v
5.     if (comp = true and random number ≤ C[i, j]) or (comp = false)
6.       E ← E ∪ {u, v}   # Add link between u and v
7.       db[u] ← db[u] – 1   # Decrement link budget for u
8.       db[v] ← db[v] – 1   # Decrement link budget for v
9.     end if
10.  end if
11.  return S
12. end function

Figure 10.3 Pseudocode for the addlink function
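The addlink pseudocode translates almost line for line into Python. In this sketch the link set and link budgets are updated in place and the function returns whether a link was added, instead of returning S; the data layout (frozenset links, nested-dictionary compatibility table) and the rng parameter are assumptions made for illustration.

```python
import random


def addlink(edges, u, v, budgets, types, compat, comp, rng=random):
    """Python rendering of the addlink pseudocode (Figure 10.3).

    edges   -- set of frozenset({u, v}) links, updated in place
    budgets -- list or dict of remaining link budgets, updated in place
    comp    -- if True, add the link only with probability compat[type_u][type_v]
    Returns True if a link was added.
    """
    link = frozenset((u, v))
    if link in edges:  # already connected: nothing to do
        return False
    if comp and rng.random() > compat[types[u]][types[v]]:
        return False  # the pseudocode adds only when random number <= C[i, j]
    edges.add(link)
    budgets[u] -= 1
    budgets[v] -= 1
    return True
```

Returning a flag rather than the network makes it easy for callers to count successful additions, for example when tracking the triangle goal.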

How complex is the GNAC algorithm? The GNAC algorithm starts with some housekeeping that includes an O(n log n) sort of the nodes’ degree sequence (line 3). The first main loop (lines 7-25) is over the triangles of T. A network with n nodes may have as many as C(n, 3) triangles; C(n, 3) = n! / (3!(n – 3)!) ∈ O(n³). Within that loop, the do-until loop (lines 8-11) may execute an arbitrary number of times, but on average is O(1).

The set membership tests (line 19) are O(1), if the edge set is stored in a suitable data structure, such as an adjacency matrix. The function addlink does not loop over nodes or links, so its complexity is O(1). Other computations in the first main loop are also O(1).

Thus, the first main loop is O(n³). Finding all potential dyads the first time (line 27) is O(n²). Potentially, there could be C(n, 2) such dyads; C(n, 2) = n! / (2!(n – 2)!) ∈ O(n²).

The second main loop (lines 28-33) iterates once for each of the O(n²) dyads, and in each iteration it again finds all potential dyads in O(n²) time; thus the second main loop is O(n⁴). The third and final main loop iterates at most once for each node, i.e., O(n) iterations. Each iteration scans O(n) nodes to find those with remaining link budgets, so the third main loop is O(n²). Thus, the complexity of the GNAC algorithm as a whole is O(n⁴).


1. function gnac(T, A, C)
2.   ds ← degree sequence of T
3.   db ← sort ds in decreasing degree order
4.   b ← nodes with db link budget ≥ 2
5.   S ← network with n nodes and 0 links
6.   ntriangles ← 0
7.   while |b| ≥ 4 and ntriangles < number of triangles in T
8.     do
9.       u1, v1, w1 ← nodes randomly selected from b without replacement
10.      u2, v2, w2 ← nodes randomly selected from b without replacement
11.    until {u1, v1, w1} ≠ {u2, v2, w2}
12.    triad1.p ← C[A[u1], A[v1]] + C[A[v1], A[w1]] + C[A[w1], A[u1]]
13.    triad2.p ← C[A[u2], A[v2]] + C[A[v2], A[w2]] + C[A[w2], A[u2]]
14.    if triad1.p ≥ triad2.p
15.      u, v, w ← u1, v1, w1
16.    else
17.      u, v, w ← u2, v2, w2
18.    end if
19.    if {u, v} ∉ E or {v, w} ∉ E or {w, u} ∉ E   # Is at least one link missing?
20.      S ← addlink(S, u, v, db, C, false)
21.      S ← addlink(S, v, w, db, C, false)
22.      S ← addlink(S, w, u, db, C, false)
23.      ntriangles ← ntriangles + 1
24.    end if
25.  end while
26.  b ← nodes with db link budget ≥ 1
27.  potentialdyads ← all pairs of nodes (u, v) where u, v ∈ b, u ≠ v, and {u, v} ∉ E
28.  while |potentialdyads| ≥ 2
29.    {u, v} ← pair of nodes randomly selected from potentialdyads
30.    S ← addlink(S, u, v, db, C, false)
31.    b ← nodes with db link budget ≥ 1
32.    potentialdyads ← all node pairs (u, v) where u, v ∈ b, u ≠ v, and {u, v} ∉ E
33.  end while
34.  b ← nodes with db link budget ≥ 1
35.  pend ← V – b
36.  while |b| ≥ 1
37.    u ← b[1]
38.    v ← node randomly selected from pend
39.    S ← addlink(S, u, v, db, C, true)
40.    b ← nodes with db link budget ≥ 1
41.    pend ← V – b
42.  end while
43.  return S
44. end function

Figure 10.4 Pseudocode for the GNAC algorithm
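The last two phases of Figure 10.4 (lines 26-42) can be sketched in Python as follows, again assuming frozenset links, a list of budgets, and a nested-dictionary compatibility table. The function name is hypothetical, and the max_tries limit is an added safeguard that the pseudocode does not have: if a node with remaining budget is already linked to every budget-exhausted node, the third loop could otherwise run indefinitely. As in the pseudocode, a budget-exhausted partner's budget is decremented below zero in the third phase.

```python
import random
from itertools import combinations


def spend_remaining_budgets(n, edges, budgets, types, compat, rng, max_tries=100_000):
    """Sketch of GNAC's last two phases (Figure 10.4, lines 26-42).

    edges   -- set of frozenset({u, v}) links, updated in place
    budgets -- list of remaining link budgets, updated in place
    max_tries is an added safeguard, not part of the pseudocode.
    """

    def add(u, v, use_compat):
        link = frozenset((u, v))
        if link in edges:
            return
        if use_compat and rng.random() > compat[types[u]][types[v]]:
            return
        edges.add(link)
        budgets[u] -= 1
        budgets[v] -= 1

    def with_budget():
        return [v for v in range(n) if budgets[v] >= 1]

    def potential_dyads():
        return [(u, v) for u, v in combinations(with_budget(), 2)
                if frozenset((u, v)) not in edges]

    # Phase 2 (lines 26-33): link random unconnected pairs that both have budget.
    dyads = potential_dyads()
    while len(dyads) >= 2:  # same stopping rule as line 28
        u, v = rng.choice(dyads)
        add(u, v, use_compat=False)
        dyads = potential_dyads()

    # Phase 3 (lines 34-42): pair leftover nodes with budget-exhausted nodes,
    # subject to the compatibility table.
    b, tries = with_budget(), 0
    while b and tries < max_tries:
        pend = [v for v in range(n) if budgets[v] < 1]
        if not pend:
            break
        add(b[0], rng.choice(pend), use_compat=True)
        b, tries = with_budget(), tries + 1
    return edges
```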


10.4 Probability search algorithm

The Probability Search (PS) algorithm calculates the probability of the formation of a social network given a personality type assignment A and a personality compatibility table C. Depending upon whether the nodes are assumed to be distinguishable, there exist two approaches to the probability calculation. For this work, it is assumed that the nodes are uniquely identified and are thus always distinguishable from each other. This assumption is appropriate for many social network applications, where nodes correspond to specific known persons. The implication of uniquely identified nodes is that a different network with the same connection structure (i.e., isomorphic in graph theory terminology) but connecting different specific nodes would not be equivalent as a social network, because different people would be connected.

The probability of the network will be calculated using a simple extension of the Erdős-Rényi G(n, p) algorithm. In the G(n, p) algorithm, the probability of link formation p is constant for the entire network. In the PS algorithm’s probability calculation, the constant p is instead replaced, for each pair of nodes, by the probability of a link forming between those nodes, given a personality type assignment A and a personality compatibility table C. Let p(i, j) be the probability given in C of a link being present between two nodes i and j for the personality types assigned to nodes i and j by A.

The probability of a network G = (V, E) being formed is therefore given by Equation (10.4.1); we will call this the network probability.

$$P(G) = \prod_{\{i,j\} \subseteq V,\; i \neq j} \begin{cases} p(i,j) & \text{if } \{i,j\} \in E \\ 1 - p(i,j) & \text{if } \{i,j\} \notin E \end{cases} \qquad (10.4.1)$$
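Equation (10.4.1) can be evaluated directly, as in the minimal Python sketch below. It assumes the compatibility table is a nested dictionary of exact fractions and uses Python's fractions.Fraction for exact rational arithmetic, so the product cannot underflow; this is an analogue of, not the same as, the arbitrary-precision arithmetic used in the dissertation's R implementation. The function name network_probability is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def network_probability(n, edges, types, compat):
    """Equation (10.4.1): product over all unordered node pairs {i, j}."""
    links = {frozenset(e) for e in edges}
    result = Fraction(1)
    for i, j in combinations(range(n), 2):
        p = compat[types[i]][types[j]]
        result *= p if frozenset((i, j)) in links else 1 - p
    return result


compat = {
    "A": {"A": Fraction(4, 5), "B": Fraction(3, 10)},
    "B": {"A": Fraction(3, 10), "B": Fraction(1, 2)},
}
prob = network_probability(3, [(0, 1), (1, 2)], ["A", "A", "B"], compat)
# Fraction(21, 125), i.e. 0.8 * 0.7 * 0.3 = 0.168
```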


Given an exemplar network T and a compatibility table C, the network probability can be used to search for the personality type assignment A that has the highest probability P(T) of producing the exemplar network, and thus of producing networks with metrics similar to it. Once found, that personality type assignment can be used by the GNAC algorithm to generate synthetic networks that are likely to be similar to the exemplar network.

In theory, an optimal personality type assignment, i.e., the assignment that has the highest possible probability of producing the given exemplar network T, could be found by methodically generating every possible personality type assignment and calculating P(T) for each one. Unfortunately, this approach is not practical for any but the smallest networks. If a personality type scheme has k different personality types and exemplar network T has n nodes, there are kⁿ different possible type assignments. For the MBTI scheme used in this work, k = 16; thus, even for the smallest real-world exemplar network used in this research, the Robins Australian Bank network with 11 nodes, there are 16¹¹ ≈ 1.76 · 10¹³ possible personality type assignments. Calculating P(T) for that many assignments at the rate of one per millisecond would require over 500 years. Thus, an exhaustive search is impractical.
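The arithmetic behind the 500-year estimate can be checked in a few lines of Python; the only assumption is the Julian year of 365.25 days used to convert seconds to years.

```python
# Size of the exhaustive search space for k = 16 MBTI types and n = 11 nodes.
k, n = 16, 11
assignments = k ** n                    # 17,592,186,044,416, about 1.76e13
seconds = assignments / 1000            # one P(T) evaluation per millisecond
years = seconds / (365.25 * 24 * 3600)  # about 557.5 years
```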

Instead, the new Probability Search (PS) algorithm performs a heuristic search through the space of possible personality type assignments. After generating an initial personality type assignment randomly, it iteratively changes the assignment, one node at a time. To do so, it uses node probability, a quantity similar to network probability but calculated for a single node. Given a network G, a personality compatibility table C, and a personality type assignment A, the node probability of a single node i in G is given by Equation (10.4.2).

$$P(i) = \prod_{j \in V,\; j \neq i} \begin{cases} p(i,j) & \text{if } \{i,j\} \in E \\ 1 - p(i,j) & \text{if } \{i,j\} \notin E \end{cases} \qquad (10.4.2)$$
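Equation (10.4.2) relates to Equation (10.4.1) in a way that justifies adjusting one node at a time. Splitting the unordered pairs in (10.4.1) into those that contain node i and those that do not gives the factorization below. The second factor does not depend on the type that A assigns to node i, so, with the network structure and the other nodes' types held fixed and that factor nonzero, the type that maximizes P(i) also maximizes P(G).

```latex
P(G) \;=\; \underbrace{\prod_{j \in V,\; j \neq i}
  \begin{cases} p(i,j) & \text{if } \{i,j\} \in E \\ 1 - p(i,j) & \text{if } \{i,j\} \notin E \end{cases}}_{P(i)}
  \;\cdot\;
  \prod_{\{j,l\} \subseteq V \setminus \{i\},\; j \neq l}
  \begin{cases} p(j,l) & \text{if } \{j,l\} \in E \\ 1 - p(j,l) & \text{if } \{j,l\} \notin E \end{cases}
```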

At each iteration, the PS algorithm selects a node i, either the node with the smallest node probability P(i) under the current personality type assignment (with probability 0.95), or a random node (with probability 0.05). It then calculates P(i) for that node i for each of the possible personality types, holding the network structure and other nodes’ personality types fixed. The personality type that gives the highest node probability P(i) is assigned to node i. This process repeats until the overall network probability improvement achieved in an iteration is less than a threshold, subject to a required minimum number of iterations. Finally, to prevent non-productive repetitive changes to the same node’s personality type, when a node’s personality type is changed, it is added to a list of nodes excluded from adjustment in the next iteration and remains in that list for a certain number of iterations.
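The node-selection and type-update step just described can be sketched in Python as follows. This is a minimal illustration, not the dissertation's R code: it assumes frozenset links, a dictionary of node types, a nested-dictionary compatibility table of exact fractions, and a collections.deque as the exclusion list, and the names node_probability and ps_iteration are hypothetical. Because only the factors involving the target node change when its type changes, comparing node probabilities selects the same type as comparing network probabilities. Following the pseudocode in Figure 10.7, the adjusted node is added to the exclusion list whether or not its type actually changed.

```python
import random
from collections import deque
from fractions import Fraction


def node_probability(i, n, links, types, compat):
    """Equation (10.4.2): product over all other nodes j of node i."""
    result = Fraction(1)
    for j in range(n):
        if j != i:
            p = compat[types[i]][types[j]]
            result *= p if frozenset((i, j)) in links else 1 - p
    return result


def ps_iteration(n, links, types, compat, type_names, excluded, max_excluded, rng):
    """One PS iteration (a sketch): pick a node, give it its best type.

    types    -- dict node -> personality type, updated in place
    excluded -- deque of recently adjusted nodes (the exclusion list)
    """
    available = [v for v in range(n) if v not in excluded]
    if rng.random() < 0.05:  # occasionally a random node
        target = rng.choice(available)
    else:  # otherwise the node with the smallest node probability
        target = min(available, key=lambda v: node_probability(v, n, links, types, compat))
    # Try every type; listing the current type first keeps it on ties.
    current = types[target]
    options = [current] + [t for t in type_names if t != current]
    types[target] = max(
        options,
        key=lambda t: node_probability(target, n, links, {**types, target: t}, compat),
    )
    excluded.append(target)
    if len(excluded) > max_excluded:
        excluded.popleft()  # release the node that has been excluded the longest
    return target
```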

The improvement threshold, the minimum number of iterations, and the number of iterations a node remains on the excluded list are all parameters to the algorithm. For the results reported here, the values 0.0001, n · k · 1000, and n / 10, respectively, were used for those parameters; those values were found empirically. Figures 10.5, 10.6, and 10.7 present pseudocode for the PS algorithm; V is a set of nodes, E is a set of links, C is a personality compatibility table, A is a personality type assignment, n is the number of nodes, and k is the number of different personality types. In the pseudocode, two subroutines (functions) precede the main logic of the PS algorithm.


1. function vprob(V, E, C, A, i)
2.   result ← 1
3.   for j ∈ V – {i}
4.     if {i, j} ∈ E
5.       result ← result · C[A[i], A[j]]
6.     else
7.       result ← result · (1 – C[A[i], A[j]])
8.     end if
9.   end for
10.  return result
11. end function

Figure 10.5 Pseudocode for the vertex probability function

1. function gprob(V, E, C, A)
2.   result ← 1
3.   for i ← 1 to n – 1
4.     for j ← i + 1 to n
5.       if {i, j} ∈ E
6.         result ← result · C[A[i], A[j]]
7.       else
8.         result ← result · (1 – C[A[i], A[j]])
9.       end if
10.    end for
11.  end for
12.  return result
13. end function

Figure 10.6 Pseudocode for the graph formation probability function


1. function ps(T, A, C)
2.   excluded ← ∅
3.   prev ← 0
4.   improvement ← 1
5.   iterations ← 0
6.   A ← random assignment of personality types to n nodes
7.   bestA ← A
8.   cur ← gprob(V, E, C, A)
9.   bestprob ← cur
10.  while (improvement > 0.0001) or (iterations < n · k · 1000)
11.    available ← V – excluded
12.    if random number < 0.05
13.      lowv ← randomly selected node ∈ available
14.    else
15.      lowv ← available[1]
16.      lowprob ← vprob(V, E, C, A, lowv)
17.      for i ∈ available
18.        iprob ← vprob(V, E, C, A, i)
19.        if iprob < lowprob
20.          lowv ← i
21.          lowprob ← iprob
22.        end if
23.      end for
24.    end if
25.    adjustv ← lowv
26.    hight ← A[adjustv]
27.    highprob ← gprob(V, E, C, A)
28.    for j ← 1 to k
29.      A[adjustv] ← j
30.      jprob ← gprob(V, E, C, A)
31.      if jprob > highprob
32.        hight ← j
33.        highprob ← jprob
34.      end if
35.    end for
36.    A[adjustv] ← hight
37.    excluded ← excluded ∪ {adjustv}
38.    if |excluded| > n / 10
39.      excluded ← excluded – node that has been excluded the longest
40.    end if
41.    prev ← cur
42.    cur ← gprob(V, E, C, A)
43.    improvement ← cur – prev
44.    iterations ← iterations + 1
45.    if cur > bestprob
46.      bestA ← A
47.      bestprob ← cur
48.    end if
49.  end while
50.  return bestA
51. end function

Figure 10.7 Pseudocode for the PS algorithm


The overall computational complexity of the PS algorithm is O(n³). To see this, consider first the functions vprob and gprob; vprob loops once over the n elements of V (lines 3-9), and so is O(n), whereas gprob has two nested loops (lines 3-11), each over the n elements of V, and so is O(n²). The main body of PS begins with some O(1) housekeeping (lines 2-9) and an O(n²) call to gprob. The main loop (lines 10-49) executes O(n) times, since its iteration count is governed by the limit n · k · 1000, which is linear in n for fixed k. Within the main loop, the search for the lowest probability vertex (lines 15-23) begins with an O(n) call to vprob (line 16), then loops over the available nodes O(n) times; within that loop is an O(n) call to vprob (line 18), thus this portion of the main loop is O(n²).

Next, the search for the highest probability personality type (lines 25-35) calls gprob once, and then enters a for loop (lines 28-35) that iterates k times, each time calling gprob, which is O(n²). Because k is a constant and not a function of n, this portion of the loop is O(n²). The last part of the while loop includes two operations on the excluded list (lines 37 and 39), which can be accomplished in amortized O(1) time if the list is implemented as a deque, and another O(n²) call to gprob. Thus, the complexity of the main loop, and of the PS algorithm as a whole, is O(n³).

10.5 Compatibility-degree matching algorithm

The Compatibility-degree matching (CDM) algorithm first determines the degree sequence of a given exemplar network T, then it generates a personality type assignment A in accordance with an empirical distribution based on the frequency of each personality type. If the CDM algorithm is implemented with the makeMBTI function, then the empirical distribution is based on the MBTI personality types in the U.S. population (Table 2.3).


1. function cdm(T, C)
2.   ds ← degree sequence of T
3.   w ← sort V in decreasing degree order using ds
4.   for i ← 1 to k
5.     x[i] ← ∑ compatibility values in C for personality type i
6.   end for
7.   x ← sort x in decreasing order
8.   for i ← 1 to n
9.     y[i] ← random personality type generated using empirical distribution
10.  end for
11.  y ← sort y in decreasing overall compatibility order using x
12.  for i ← 1 to n
13.    A[w[i]] ← y[i]
14.  end for
15.  return A
16. end function

Figure 10.8 Pseudocode for the CDM algorithm

The column sums of personality compatibility table C provide an overall compatibility value for each personality type. The CDM algorithm orders the personality types by overall compatibility and the nodes of the exemplar network T by decreasing degree.

Using those two orderings, the CDM algorithm assigns personality types to the nodes such that, among the n types drawn from the empirical distribution, the personality types with the highest overall compatibility are assigned to the nodes with the highest degree. In the pseudocode, personality type assignment A is a vector of size n.
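A compact Python sketch of CDM is given below. It assumes the degree sequence is a list indexed by node, the compatibility table is a nested dictionary in which compat[s][t] is the propensity between types s and t, and the empirical type distribution is passed as weights; the function name and parameters are hypothetical. Overall compatibility is taken as the column sum of C, following line 5 of Figure 10.8.

```python
import random


def cdm(degrees, compat, type_names, type_weights, rng):
    """Sketch of CDM (Figure 10.8); returns a list with A[node] = type.

    degrees      -- degree of each exemplar node, indexed 0..n-1
    compat       -- compat[s][t]: link propensity between types s and t
    type_weights -- empirical frequency of each type in type_names
    """
    n = len(degrees)
    # Line 3: nodes in decreasing degree order.
    nodes = sorted(range(n), key=lambda v: degrees[v], reverse=True)
    # Lines 4-7: overall compatibility of each type = sum of its column in C.
    overall = {t: sum(compat[s][t] for s in type_names) for t in type_names}
    # Lines 8-10: draw n types from the empirical distribution.
    drawn = rng.choices(type_names, weights=type_weights, k=n)
    # Line 11: most compatible drawn types first.
    drawn.sort(key=lambda t: overall[t], reverse=True)
    # Lines 12-14: highest-degree node receives the most compatible drawn type.
    assignment = [None] * n
    for node, t in zip(nodes, drawn):
        assignment[node] = t
    return assignment
```

Because both orderings are produced by sorting, the cost is dominated by the two O(n log n) sorts, in line with the complexity analysis that follows.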

The overall computational complexity of the CDM algorithm is O(n log n). The n nodes are sorted (line 3), which is O(n log n). The summing of the compatibility values (lines 4-6) is O(k²), where k is the number of personality types, and the sort of the sums (line 7) is O(k log k); both are negligible because k is fixed and, for most networks, k ≪ n. The assignment of personality types (lines 8-10) is O(n), and the sort of the assigned types (line 11) is O(n log n). The final loop (lines 12-14) is O(n). Thus, the complexity of the CDM algorithm as a whole is O(n log n).

10.6 Configuration Model algorithm as a basis of comparison

To assess the effectiveness of the personality-based algorithms (PS and CDM), they were compared to an existing network generative model that was not personality-based. Two models were considered for the role of baseline. Because of its abstract representation of popularity, the Popularity Similarity model (Papadopoulos 2012), as implemented in the R package NetHypGeom (Alanis-Lobato, 2016), was examined.

However, perhaps because of that model’s orientation to large scale-free networks, the implementation sets certain bounds on its input parameters; in particular, the average degree must be ≥ 2 and the scaling exponent must be ≥ 2 and ≤ 3. Of the fourteen real-world networks to be used as exemplars in this work (see Table 5.1), only one (Zachary Karate Club) had values for the metrics that satisfied both of these bounds; the other thirteen had an average degree < 2, a scaling exponent either < 2 or > 3, or both. Thus, the exemplars to be used did not seem well suited to the capabilities of the Popularity Similarity model or its implementation.

On the other hand, the Configuration Model (CM), which was described earlier, produces synthetic networks based upon the degree sequence of an exemplar network and does not consider personality. Because it is based on the degree sequence, it is usable with the exemplars. Furthermore, it is considered by some to be a standard basis of comparison: “Following the works of Barabási et al., the degree distribution has become accepted as the most fundamental network characteristic… [I]t has become a standard to compare network quantities to a null-model where the degrees of the network (the degree sequence) is fixed and everything else random” (Barrenas, 2009).
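For reference, the basic stub-matching form of the configuration model can be sketched in Python as follows. This is a generic construction assumed here for illustration: it can produce self-loops and repeated links, and it is not claimed to reproduce the behavior of igraph's sample_degseq, which was the CM implementation actually used in this work. The function name is hypothetical.

```python
import random


def configuration_model(degrees, rng):
    """Basic stub-matching configuration model (a sketch).

    Each node v receives degrees[v] "stubs"; the stubs are shuffled and
    paired consecutively to form links, so every node keeps its degree.
    """
    if sum(degrees) % 2:
        raise ValueError("degree sum must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]


links = configuration_model([2, 2, 1, 1], random.Random(0))
```

Only the degree sequence is preserved; everything else about the network is left to chance, which is what makes CM a natural null model here.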

10.7 Implementation and execution

This section describes the software implementation of the algorithms and supporting functions. It also discusses their execution.

10.7.1 Implementation of the algorithms

The two new algorithms for finding effective personality type assignments (PS and CDM), as well as the network generator GNAC algorithm, were implemented in the R language. R is an open-source programming language and environment with powerful and extensive features for data analysis, data visualization, and statistical computing (R Core Team 2016). R also includes a full range of general-purpose programming language features, including control structures, mathematical operations, and file input/output. It should be noted that for medium and large networks, the network probability value P(G) computed by the PS algorithm can become quite small, as it is the product of n(n – 1) / 2 probabilities, all of which are ≤ 1.
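The following Python sketch shows how quickly such a product underflows in ordinary double-precision arithmetic. The value 0.1 for every pair is hypothetical and chosen only for illustration; the pair count 2485 = 71 · 70 / 2 corresponds to a 71-node network such as the largest exemplar. Summing logarithms, shown here as one common alternative, keeps the value representable and preserves the ordering of probabilities; it is not the approach taken in this work.

```python
import math


def log_probability(factors):
    """Natural log of a product of probabilities, computed as a sum of logs."""
    return sum(math.log(p) for p in factors)


# 71 nodes give 71 * 70 / 2 = 2485 node pairs; 0.1 per pair is illustrative only.
factors = [0.1] * (71 * 70 // 2)
naive = math.prod(factors)        # underflows to 0.0 in double precision
log_p = log_probability(factors)  # about -5722, easily representable
```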

A computer implementation of P(G) meant to handle medium and large networks must take care to avoid numeric underflow. In our implementation, we used the R gmp (GNU Multiple Precision) package for arbitrary precision arithmetic. As already mentioned, CM is an existing algorithm for generating synthetic social networks. A prior implementation of CM in the R language is available in the R igraph package, which is a collection of R functions for network analysis and visualization (Csárdi 2013). In that package, the function sample_degseq produces networks using CM. That function was used for this work without modification.

10.7.2 Execution of the algorithms

The PS algorithm was executed on computing clusters provided and supported by the Alabama Supercomputer Center. Run times for the PS algorithm were highly dependent on the number of nodes in the exemplar network, ranging from a few minutes for the smallest real-world network (Robins Australian Bank, 11 nodes) to several hours for the largest real-world network (Lazega Law Firm, 71 nodes). Although the algorithms’ implementation code was not parallelized, scripts were used to initialize and launch multiple instances of the programs to execute concurrently.

10.8 Results from the heuristic methods

This section reports the results of testing and comparing the PS and CDM algorithms with the Configuration Model. The comparison is in terms of quantitative measures of the generated social networks’ realism. Realism is measured by the absolute difference between the mean metrics of the synthetic networks and the network metrics of the exemplar real-world social network; a smaller absolute difference indicates greater realism. The metrics used to measure realism are listed in Table 2.2. These absolute differences were calculated for networks generated by the PS and CDM algorithms and compared to those for networks generated by the CM algorithm.

As an example of the results, Table 10.1 presents a comparison of the realism metrics for the assignments found by the PS and CDM algorithms for only one of the real-world exemplar networks, Bernard & Killworth Technical. For brevity, the result for only one of the exemplars is shown in this section. Results for all 14 real-world exemplars are in Appendix E; the columns of these tables are organized as follows:

1. Metrics defined in Table 2.2.

2. T = truth metrics calculated for the network.

3. $\bar{F}$ = the mean metric value of a set of networks generated with CM.

4. $|T - \bar{F}|$ = absolute difference between CM mean metric and the exemplar metric.

5. L1(F) = $\sum_{i=1}^{30} |T - F_i|$, where $F_i$ is the metric value of the i-th of the 30 CM-generated networks.

6. L2(F) = $\sqrt{\sum_{i=1}^{30} |T - F_i|^2}$.

7. $\bar{P}$ = the mean metric value of a set of networks generated with PS.

8. $|T - \bar{P}|$ = absolute difference between PS mean metric and the exemplar metric.

9. L1(P) = $\sum_{i=1}^{30} |T - P_i|$.

10. L2(P) = $\sqrt{\sum_{i=1}^{30} |T - P_i|^2}$.

11. $\bar{M}$ = the mean metric value of a set of networks generated with CDM.

12. $|T - \bar{M}|$ = absolute difference between CDM mean metric and the exemplar metric.

13. L1(M) = $\sum_{i=1}^{30} |T - M_i|$.

14. L2(M) = $\sqrt{\sum_{i=1}^{30} |T - M_i|^2}$.
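The per-metric quantities defined above (columns 3-6, and likewise 7-10 and 11-14) can be computed as in the Python sketch below; the function name is hypothetical and the sample values in the usage are made up for illustration.

```python
import math


def realism_columns(truth, samples):
    """Mean, |T - mean|, L1 and L2 for one metric.

    truth   -- metric value T of the exemplar network
    samples -- the metric's value in each synthetic network (30 in this work)
    """
    mean = sum(samples) / len(samples)
    l1 = sum(abs(truth - x) for x in samples)
    l2 = math.sqrt(sum(abs(truth - x) ** 2 for x in samples))
    return mean, abs(truth - mean), l1, l2


mean, diff, l1, l2 = realism_columns(4.0, [3.0, 4.0, 6.0])
```

Because the absolute difference of the mean lets errors of opposite sign cancel, it can be much smaller than L1 divided by the number of networks; L1 and L2 expose that spread, with L2 weighting large deviations more heavily.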

Columns 3-6 apply to the synthetic social networks generated by the CM algorithm, collectively denoted F. Columns 7-10 show the same for the synthetic networks generated by the PS algorithm, collectively denoted P, and columns 11-14 show the same for the synthetic social networks produced with the CDM algorithm, collectively denoted M. In columns 4-6, 8-10, and 12-14, the cells’ content is set in bold type to show at a glance the realism of the networks generated with the personality assignments found by the PS and CDM algorithms compared to the CM-generated networks’ realism. The MBTI personality compatibility table was used. For all tables in Appendix E, bold identifies the lowest values. If the bold numbers are in columns 7-14, that indicates that the PS or CDM networks’ mean metric value was closer to the exemplar than the CM networks’ mean metric value. As can be seen in Table 10.1, for the Bernard & Killworth Technical exemplar network, both the PS and the CDM algorithms produced more realistic synthetic social networks than the CM algorithm over the majority of the network metrics.


Table 10.1 Realism results for the Bernard & Killworth Technical network.

Metric                              T        F̄    |T−F̄|    L1(F)    L2(F)        P̄    |T−P̄|    L1(P)    L2(P)        M̄    |T−M̄|    L1(M)    L2(M)
Nodes                           34.00    34.00     0.00     0.00     0.00    34.00     0.00     0.00     0.00    34.00     0.00     0.00     0.00
Links                          175.00   143.63    31.37   941.00   173.03   175.00     0.00     0.00     0.00   175.00     0.00     0.00     0.00
Components                       1.00     1.00     0.00     0.00     0.00     1.00     0.00     0.00     0.00     1.00     0.00     0.00     0.00
Network density                  0.31     0.26     0.06     1.68     0.31     0.31     0.00     0.00     0.00     0.31     0.00     0.00     0.00
Average degree                  10.29     8.45     1.85    55.35    10.18    10.29     0.00     0.00     0.00    10.29     0.00     0.00     0.00
Standard deviation degree        4.63     3.55     1.08    32.41     5.97     5.03     0.40    12.12     2.45     5.00     0.37    11.02     2.31
Global cluster coefficient       0.48     0.30     0.17     5.17     0.95     0.44     0.03     0.98     0.20     0.45     0.03     0.77     0.16
Average cluster coefficient      0.47     0.32     0.16     4.76     0.88     0.53     0.05     1.61     0.32     0.53     0.06     1.69     0.33
Mean path length                 1.81     1.90     0.09     2.70     0.52     1.77     0.04     1.08     0.21     1.78     0.03     0.95     0.19
Communities                      4.00     6.37     2.37    75.00    16.82     7.37     3.37   103.00    21.10     6.50     2.50    77.00    16.70
Gini coefficient                 0.49     0.49     0.01     1.42     0.31     0.50     0.02     1.18     0.27     0.51     0.02     1.49     0.32
Average betweenness             13.32    14.81     1.48    44.53     8.51    12.75     0.58    17.74     3.49    12.81     0.52    15.74     3.17
Maximum betweenness             63.29    53.03    10.26   368.03    75.25   104.94    41.65  1249.56   238.61   102.52    39.23  1176.86   229.79
Average closeness                0.02     0.02     0.00     0.03     0.01     0.02     0.00     0.01     0.00     0.02     0.00     0.01     0.00
Minimum closeness                0.01     0.01     0.00     0.03     0.01     0.01     0.00     0.02     0.00     0.01     0.00     0.02     0.00
Average eigencentrality          0.53     0.59     0.06     1.82     0.38     0.50     0.03     1.32     0.27     0.49     0.04     1.38     0.30
Minimum eigencentrality          0.06     0.06     0.00     0.44     0.10     0.07     0.01     0.46     0.10     0.07     0.01     0.45     0.10
Network radius                   2.00     2.13     0.13     4.00     2.00     2.00     0.00     0.00     0.00     2.00     0.00     0.00     0.00
Average eccentricity             2.88     3.08     0.19     5.79     1.33     2.79     0.09     3.74     0.85     2.80     0.08     3.68     0.80
Network diameter                 4.00     3.97     0.03     3.00     1.73     3.47     0.53    16.00     4.00     3.47     0.53    16.00     4.00


Table 10.2 Realism results summary.

                                                PS vs. CM         CDM vs. CM
Exemplar real-world social network             PS    CM    =     CDM    CM    =
Robins Australian Bank                         15     4    1      14     5    1
Roethlisberger & Dickson Bank Wiring Room       9    10    1       9     9    2
Thurman Office                                 13     6    1      14     5    1
Sampson Monastery                              10     8    2       7     9    4
Krackhardt Office CSS                           9    10    1      10     9    1
Krackhardt High-Tech Managers                  11     8    1       9     9    2
Schwimmer Taro Exchange                         5    14    1       5    14    1
Webster Accounting Firm                         9     9    2       9     9    2
Zachary Karate Club                             9     8    3      10     8    2
Bernard & Killworth Technical                  13     5    2      13     5    2
Bernard & Killworth Office                     11     6    3      11     6    3
Krebs Fortune 500 IT Department (Advice)        9     8    3      10     7    3
Krebs Fortune 500 IT Department (Business)      7     9    4       8     7    5
Lazega Law Firm                                12     2    6      11     3    6
Total                                         142   107   31     140   105   35

Table 10.2 summarizes the overall realism results. Two realism comparisons were made: PS versus CM, and CDM versus CM. Both are reported in the table. A total of 280 metric values (14 real-world social networks × 20 metrics) were calculated for each of the comparisons. The columns labeled with an algorithm’s abbreviation (PS, CDM, CM) show the number of metrics where that algorithm’s metric values were closer to the exemplar network than those of the other algorithm in the comparison, and a column labeled “=” shows the number where the two algorithms’ metric values were equally close. In the PS versus CM comparison, the values of 142 of the 280 metrics (~50.7%) for the PS networks were closer to the values of the exemplar network than those of the CM algorithm, and another 31 values (~11.1%) were equally close; the CM networks' values were closer to the exemplar network on only 107 (~38.2%) of the metrics. In the CDM versus CM comparison, the values of 140 of the 280 metrics (50.0%) for the CDM networks were closer to the values of the exemplar network than those of the CM algorithm, and another 35 values (~12.5%) were equally close; the CM networks' values were closer to the exemplar network on only 105 (~37.5%) of the metrics.

A simple hypothesis test of proportion indicates that both PS and CDM come closer to the exemplar network than CM more often than can be expected from random chance. For PS versus CM, we treat each of the 280 metrics as a binomial trial. A closer metric value in a PS-generated network is counted as a success, a closer metric value in a CM-generated network is counted as a failure, and equal metric values are omitted from the sample. In a right-tailed test, the hypotheses are H0: p = 0.50 and H1: p > 0.50, so the null hypothesis is that PS is not better than CM. The level of significance is set to α = 0.05. The sample data are r = 142 and n = 142 + 107 = 249. The sample proportion is p̂ = 0.570281, the test statistic is z = 2.218035, and the P-value is 0.01326, which is < α; thus we reject the null hypothesis and conclude that PS outperforms CM. The same test applied to CDM versus CM has r = 140 and n = 140 + 105 = 245. The sample proportion is p̂ = 0.571429, the test statistic is z = 2.236068, and the P-value is 0.01264, which is again < α; thus we again reject the null hypothesis and conclude that CDM outperforms CM.
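The test statistics above can be reproduced with the Python sketch below, which uses the normal approximation to the binomial and the identity P(Z > z) = erfc(z / √2) / 2; the function name is hypothetical. The z values match those reported, and the computed P-values (about 0.0133 and 0.0127) are close to the reported 0.01326 and 0.01264; small differences in the last digits depend on how the normal tail is evaluated.

```python
import math


def proportion_z_test(successes, failures, p0=0.5):
    """Right-tailed one-sample z test of a proportion (ties already dropped)."""
    n = successes + failures
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), Z standard normal
    return p_hat, z, p_value


ps = proportion_z_test(142, 107)   # PS versus CM
cdm = proportion_z_test(140, 105)  # CDM versus CM
```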


[Figure 10.9 panels: (a) Real-world exemplar social network; (b) Random social network; (c) Synthetic social network with personality type assignment found by PS algorithm; (d) Synthetic social network with personality type assignment found by CDM algorithm]

Figure 10.9 Comparison of real-world social network with generated networks

To support the quantitative realism results at an intuitive level, Figure 10.9 presents an example visual comparison of a real-world social network with a randomly generated network and two networks that were generated using a personality compatibility table. Figure 10.9 (a) shows the Robins Australian Bank social network (Pattison 2000). Figure 10.9 (b) shows a network that was generated using the random G(n, p) algorithm; that network has the same number of nodes and network density as the exemplar real-world social network. Figure 10.9 (c) shows a synthetic social network generated using an assignment of personality types found by the PS algorithm. Figure 10.9 (d) shows a synthetic social network generated using an assignment of personality types found by the CDM algorithm. In the figure, node communities found by the walktrap.community function in the R igraph package are depicted with bounding boxes around them. A visual inspection of the networks in Figure 10.9 reveals what appear to be more realistic communities in Figure 10.9 (c) and 10.9 (d).

10.9 Discussion of the heuristic methods

The PS and CDM algorithms differ from most prior work on generating synthetic social networks in a significant way. Most prior algorithms do not consider the attributes of the nodes, or of the people or entities the nodes represent, when adding links; instead, they are based on retaining or replicating some of the structural characteristics of the exemplar network in the synthetic networks. For example, CM is given a degree sequence, which may be the actual degree sequence of the real-world network serving as an exemplar (Newman 2003). In contrast, the PS and CDM algorithms use the attributes of the nodes, in particular the personality types assigned to them, as the primary driver of their calculations. The algorithms presented in this chapter were designed to work with any reasonable and internally consistent personality compatibility table.

The quantitative results indicate that both the PS and the CDM algorithms, which use personality compatibility information, generate more realistic synthetic social networks than the CM algorithm, which does not. The PS and CDM algorithms are quite similar in terms of realism. However, the CDM algorithm is much more computationally efficient (O(n log n) versus O(n³) for PS), requiring substantially shorter execution times for large networks. Either PS or CDM could be used with small to medium exemplars; for exemplars with more than ~40 nodes, PS becomes impractical, at least in its current implementation.


A close examination of the results in Table 10.2 shows that PS and CDM both performed worst on the Schwimmer Taro Exchange exemplar network. It seems unlikely to be a coincidence that this is the only one of the fourteen exemplars in which the nodes correspond not to individual people but to households, which is intuitively not as good a fit with personality-based algorithms. Thus, PS and CDM, or future enhancements of them, should be considered when the nodes correspond to individual people and personality compatibility is expected to have a significant effect on whether two people have the relationship that a link represents.


CHAPTER 11

ITERATED PRISONERS’ DILEMMA GENERATED COMPATIBILITY TABLE

Chapter 6 described a method for developing an MBTI-based personality compatibility table. Chapters 9 and 10 presented results from using the MBTI-based table to generate synthetic social networks with randomized and heuristic methods. In order to test the robustness of using personality compatibility as a basis for synthesizing social networks, an alternate and unrelated personality type scheme was also used. This chapter describes the development and use of a personality compatibility table generated from an Iterated Prisoners’ Dilemma (IPD) tournament. The network generation algorithms that use the IPD-generated compatibility table are the CDM and GNAC, which were described in Chapter 10.

11.1 Iterated Prisoners' Dilemma strategy tournament

Section 2.7 provided some IPD history and indicated that the strategies have characteristics that are similar to personality characteristics. This section provides details about the implementation of an IPD tournament and the generation of a compatibility table wherein the strategies serve as proxies for personality types.

Most entries in the Axelrod-hosted IPD tournaments were named after the people who submitted them, with the notable exception of Tit for Tat. Since those tournaments, many of the strategies have been renamed to be more indicative of their decision process. For example, the strategy known as Friedman has been referred to as Spite, Grim, and Grudger (Axelrod 1984) (Beaufils 1997) (Mittal 2009). A variation of the strategy Joss has been called Naïve Peacemaker (Gaudesi 2016). Nowak and Sigmund developed Pavlov, which is based on the principle of win stay, lose shift (Nowak 1993). Sources for the names of strategies identified in this section include (Hingston 2011) (Mittal 2009) (Imhof 2005). The Machiavelli strategy is the exception; it analyzes a series of decisions by an opponent to identify that opponent's strategy, and it tailors its responses to that strategy. The philosophy of the Machiavelli strategy is similar to that of the Downing strategy in the Axelrod-hosted tournaments (Axelrod 1984).

A good reference for IPD strategies is (Knight 2018). Fourteen IPD strategies were implemented as R functions that return an action based on rules. The rules determine whether to cooperate, C = 1, or to defect, D = 2. The implemented IPD strategies include:

1. Random – randomly choose with the probability p(C) = p(D) = 0.5.

2. Always Cooperate – choose C, regardless of the opponent’s choice.

3. Always Defect – always choose D, regardless of the opponent’s choice.

4. Tit for Tat – initially choose C, thereafter choose the opponent’s last choice.

5. Tit for Tat and Random – initially choose C; thereafter, choose the opponent’s last choice with the probability p = 1 – epsilon. The pseudocode in Figure 11.1 includes epsilon.

6. Suspicious Tit for Tat – initially choose D; thereafter, choose opponent’s last choice.

7. Grudger – choose C until the opponent chooses D; thereafter, always choose D.


8. Soft Grudger – choose C until the opponent chooses D; if the opponent chooses D, retaliate with {D, D, D, D, C, C}, then resume choosing C.

9. Naïve Prober – initially choose C; thereafter, respond to the opponent’s choice of D with D, and respond to the opponent’s choice of C with either C with the probability p = 1 – epsilon or D with the probability p = epsilon.

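10. Remorseful Prober – like Naïve Prober, initially choose C; thereafter, respond to the opponent’s choice of D with D, and respond to the opponent’s choice of C with either C with the probability p = 1 – epsilon or D with the probability p = epsilon. Unlike Naïve Prober, if the opponent retaliates against one of these random defections, respond with C rather than D (remorse).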

11. Naïve Peace Maker – initially choose C; thereafter, respond to the opponent’s choice of C with C, and respond to the opponent’s choice of D with either D with the probability p = 1 – epsilon or C with the probability p = epsilon.

12. Pavlov – initially choose C; thereafter, if the last choice produced a payoff of reward = 2 or temptation = 3, repeat the last choice; if it produced a payoff of sucker = 0 or punishment = 1, choose the opposite of the last choice.

13. Gradual – initially choose C; thereafter, respond to each opponent’s D with a sequence of k consecutive Ds, where k is the total number of Ds previously chosen by the opponent.

14. Adaptive – choose {C, C, C, C, C, C, D, D, D, D, D} for the first 11 choices; thereafter, choose C or D depending on which choice has previously returned the higher average payoff.
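To make the rules above concrete, the following is a minimal Python sketch of a subset of the strategies; it is not the R implementation used in this research. Each function receives the player's own choice history, the opponent's choice history, and a random number generator, and returns C = 1 or D = 2 as in the text. The function names, this signature, and the use of EPSILON = 0.10 (the value in Figure 11.1) are assumptions of the sketch; Soft Grudger, Remorseful Prober, Naïve Peace Maker, Gradual, and Adaptive are omitted for brevity.

```python
import random

C, D = 1, 2          # cooperate / defect codes used in the text
EPSILON = 0.10       # probability of a "noisy" move (assumed from Figure 11.1)

# Payoff to the first player, keyed by (own choice, opponent choice):
# reward = 2, temptation = 3, sucker = 0, punishment = 1.
PAYOFF = {(C, C): 2, (D, C): 3, (C, D): 0, (D, D): 1}


def random_choice(mine, theirs, rng):
    return rng.choice((C, D))            # p(C) = p(D) = 0.5


def always_cooperate(mine, theirs, rng):
    return C


def always_defect(mine, theirs, rng):
    return D


def tit_for_tat(mine, theirs, rng):
    return theirs[-1] if theirs else C   # copy the opponent's last choice


def suspicious_tit_for_tat(mine, theirs, rng):
    return theirs[-1] if theirs else D   # like Tit for Tat, but opens with D


def grudger(mine, theirs, rng):
    return D if D in theirs else C       # defect forever after the first D


def naive_prober(mine, theirs, rng):
    if not theirs:
        return C
    if theirs[-1] == D:
        return D
    return D if rng.random() < EPSILON else C   # occasional probing defection


def pavlov(mine, theirs, rng):
    if not mine:
        return C
    last = PAYOFF[(mine[-1], theirs[-1])]
    if last in (2, 3):                   # reward or temptation: win-stay
        return mine[-1]
    return D if mine[-1] == C else C     # sucker or punishment: lose-shift
```

Passing the generator explicitly keeps a whole tournament reproducible from a single seed.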


In this research, these strategies were programmed in R and competed head-to-head in a round-robin tournament, with each pairing playing an extended sequence of interactions (30 trials of 100 interactions). The tournament results are stored in a matrix named results, where the cell at the intersection of row i and column j contains the total score of strategy i against strategy j, based upon the payoff matrix presented in Table 2.4.

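$$ d = |a_{ij} - a_{ji}| \tag{11.2.1} $$

$$ x = -1.64 + 1.64 \times \left(a_{ij} + a_{ji} - |d|\right) \tag{11.2.2} $$

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{11.2.3} $$

$$ c_{ij} = f(x), \quad c_{ji} = f(x) \tag{11.2.4} $$

$$ C = \sum_{i=1}^{n} \sum_{j=1}^{n} f(x) \tag{11.2.5} $$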

Equations (11.2.1) through (11.2.5) produce a compatibility table from the IPD tournament results. Equation (11.2.1) calculates the absolute payoff difference between two symmetric cells, a_ij and a_ji, of the tournament results matrix A. Equation (11.2.2) calculates a vector of quantiles from the tournament results and the payoff differences. Equation (11.2.3) is the normal probability density function, with mean μ and standard deviation σ, evaluated at the x calculated in Equation (11.2.2).

Equation (11.2.4) indicates that the cells of the compatibility table, C, contain the values of the density function; because Equations (11.2.1) and (11.2.2) are symmetric in i and j, so is C. Equation (11.2.5) produces the compatibility matrix, C, over every pair of strategies, where n is the number of strategies. Values in the resulting compatibility table represent the propensity of individuals associated with each IPD strategy to form an alliance. Within the context of social network generation, these values represent the likelihood of link formation.
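Equations (11.2.1) through (11.2.4) can be sketched in Python as follows. The sketch assumes that the tournament results a[i][j] have already been normalized to the ratio of actual to maximum possible payoffs. This section does not state the values of μ and σ used in Equation (11.2.3), so the defaults μ = 0 and σ = 1 below are placeholders only; with σ = 1 the density never exceeds about 0.40. The function names are illustrative.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density, Equation (11.2.3); mu and sigma are placeholders."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def compatibility_table(a, mu=0.0, sigma=1.0):
    """Symmetric compatibility table from normalized tournament scores a[i][j]."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(a[i][j] - a[j][i])                      # Eq. (11.2.1)
            x = -1.64 + 1.64 * (a[i][j] + a[j][i] - d)      # Eq. (11.2.2)
            c[i][j] = c[j][i] = normal_pdf(x, mu, sigma)    # Eqs. (11.2.3)-(11.2.4)
    return c
```

For example, with Always Cooperate and Always Defect normalized as a = [[2/3, 0.0], [1.0, 1/3]], the mixed pair gets x = −1.64, while two cooperators get x ≈ 0.547 and therefore a higher compatibility value.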


The results presented in Section 11.3 were produced by an IPD strategy compatibility table with probability values ranging from approximately 0.01 to 0.71. Figure 11.1 presents pseudocode for generating the IPD-based personality compatibility table.

strategies ← 14
trials ← 30
rounds ← 100
epsilon ← 0.10
payoff(C, C) ← 2; payoff(D, C) ← 3; payoff(C, D) ← 0; payoff(D, D) ← 1  # reward, temptation, sucker, punishment; payoff(own choice, opponent's choice)
results ← strategies × strategies matrix of zeros
for i ← 1 to strategies
    for j ← i to strategies  # each pairing, including self-play, once per trial
        for k ← 1 to trials
            xcs ← ()  # player x choices array
            xps ← ()  # player x payoffs array
            ycs ← ()  # player y choices array
            yps ← ()  # player y payoffs array
            for l ← 1 to rounds
                xc ← strategyselector(i, xcs, xps, ycs, yps)  # function that selects a choice based
                yc ← strategyselector(j, ycs, yps, xcs, xps)  # on strategy index, choices, and payoffs
                xp ← payoff(xc, yc)  # the selected strategy determines whether a choice is 1 (C) or 2 (D)
                yp ← payoff(yc, xc)
                append xc to xcs; append xp to xps
                append yc to ycs; append yp to yps
            end for
            results_ij ← results_ij + sum(xps)
            if i ≠ j then
                results_ji ← results_ji + sum(yps)
            end if
        end for
    end for
end for
results ← results / (trials × rounds × max(payoff))  # ratio of actual payoffs to the maximum possible
for i ← 1 to strategies
    for j ← 1 to strategies
        paydiff ← Equation (11.2.1) applied to results
        Z ← Equation (11.2.2) applied to results and paydiff
        compatibility_ij ← Equation (11.2.3) applied to Z
        compatibility_ji ← Equation (11.2.3) applied to Z
    end for
end for

Figure 11.1 Pseudocode for generating the IPD strategy compatibility table
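A compact, runnable Python analogue of the tournament loop follows. To keep it self-contained, it uses three simple strategies. It is a sketch under stated assumptions. Strategies see only the choice histories, whereas strategyselector in Figure 11.1 also receives the payoff histories. Each unordered pairing and each self-pairing is played once per trial, so every cell is normalized by the same number of games. The function and parameter names are illustrative.

```python
import random

C, D = 1, 2
PAYOFF = {(C, C): 2, (D, C): 3, (C, D): 0, (D, D): 1}  # payoff to the first player
MAX_PAYOFF = max(PAYOFF.values())                       # temptation = 3


def always_cooperate(mine, theirs, rng):
    return C


def always_defect(mine, theirs, rng):
    return D


def tit_for_tat(mine, theirs, rng):
    return theirs[-1] if theirs else C


def round_robin(strategies, trials=30, rounds=100, seed=0):
    """Return results[i][j]: strategy i's payoff against j over the maximum possible."""
    rng = random.Random(seed)
    n = len(strategies)
    results = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                  # each pairing (and self-play) once
            for _ in range(trials):
                xcs, ycs, xps, yps = [], [], [], []
                for _ in range(rounds):
                    xc = strategies[i](xcs, ycs, rng)   # both players choose
                    yc = strategies[j](ycs, xcs, rng)   # before histories update
                    xcs.append(xc)
                    ycs.append(yc)
                    xps.append(PAYOFF[(xc, yc)])
                    yps.append(PAYOFF[(yc, xc)])
                results[i][j] += sum(xps)
                if i != j:
                    results[j][i] += sum(yps)
    scale = trials * rounds * MAX_PAYOFF
    return [[cell / scale for cell in row] for row in results]
```

For example, round_robin([always_cooperate, always_defect, tit_for_tat], trials=2, rounds=10) gives 2/3 for mutual cooperation and 0.4 for Always Defect against Tit for Tat. In that pairing, Always Defect earns one temptation payoff of 3 and then nine punishment payoffs of 1, out of a maximum of 30 per trial.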


11.2 Simulating an ecosystem to produce an empirical distribution

Chapter 8 and section 10.5 mentioned a function named makeMBTI, which returns personality types based on an empirical distribution specified in Table 2.3. Development of a similar function, named makeIPD, required an empirical distribution of IPD strategies. This section describes a method to simulate an evolving ecosystem to produce an empirical distribution.

The Axelrod library is a Python package designed as an open platform for conducting reproducible IPD research (Knight 2018). The Axelrod library implements the Moran process, which enables the development of an evolutionary IPD tournament. In a Moran process, an evolving ecosystem has a fixed-size population; as the simulation runs, the proportions of its constituent types change (Kelly 1976). At the start of an evolutionary IPD tournament, each competing strategy is represented by an equal number of players. Eligibility for reproduction is determined by the players' scores in each round; players to be replaced are selected at random (Knight 2018). Over the course of several generations, the proportions of players using the various strategies rise and fall, and eventually many of the IPD strategies are eliminated. Figure 11.2 presents the results of a Moran process-based evolutionary IPD tournament.¹⁰

10 The IPD tournament described in section 11.1 was implemented before this author found the Axelrod library, which is why there are minor differences in the names of the strategies listed in Figure 11.2.

Figure 11.2 Moran process-based evolutionary IPD tournament

Table 11.1 Empirical distribution of IPD Strategies

Strategy            %
Remorseful Prober   11.11
Random              8.89
Grudger             8.89
Pavlov              8.89
Defector            6.67
Cooperator          6.67
Tit for Tat (TFT)   6.67
Adaptive            6.67
Machiavelli         6.67
Naive Prober        6.67
Random TFT          6.67
Appeaser            4.44
Soft Grudger        4.44
Suspicious TFT      4.44
Gradual             2.22


The population was iterated in rounds. In each round, every possible pair of players is matched and the cumulative payoffs are recorded. Players reproduce in proportion to their scores from the previous round. Occasionally, players of a different strategy type are randomly replaced (Knight 2018). Table 11.1 presents the resulting empirical distribution of the IPD strategies as of iteration 19; after that iteration, a few strategies disappeared from the population.
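The evolutionary tournament was run with the Axelrod library. The Python sketch below is not that library's implementation. It is a simplified, self-contained birth-death Moran process that follows the description in this section. In each step, every pair of players is matched, a player is chosen to reproduce with probability proportional to its total score, and a uniformly random player is replaced by the offspring. The strategy set, match length, seed, and function names are assumptions. The occasional random replacement mentioned above is not modeled. The population must contain at least two players.

```python
import random
from collections import Counter

C, D = 1, 2
PAYOFF = {(C, C): 2, (D, C): 3, (C, D): 0, (D, D): 1}  # payoff to the first player

# Deterministic strategies: (own history, opponent history) -> choice.
STRATEGIES = {
    "Cooperator": lambda mine, theirs: C,
    "Defector": lambda mine, theirs: D,
    "Tit for Tat": lambda mine, theirs: theirs[-1] if theirs else C,
    "Grudger": lambda mine, theirs: D if D in theirs else C,
}


def match_score(name1, name2, turns):
    """Total payoff earned by name1 in a match of `turns` rounds against name2."""
    s1, s2 = STRATEGIES[name1], STRATEGIES[name2]
    h1, h2, total = [], [], 0
    for _ in range(turns):
        a, b = s1(h1, h2), s2(h2, h1)
        h1.append(a)
        h2.append(b)
        total += PAYOFF[(a, b)]
    return total


def moran_step(population, turns, rng):
    """One birth-death step: fitness-proportional birth, uniformly random death."""
    n = len(population)
    fitness = [sum(match_score(population[i], population[j], turns)
                   for j in range(n) if j != i) for i in range(n)]
    parent = rng.choices(range(n), weights=fitness, k=1)[0]
    victim = rng.randrange(n)
    new_population = list(population)
    new_population[victim] = population[parent]
    return new_population


def evolve(population, steps, turns=10, seed=0):
    """Run `steps` Moran steps and return the percentage of each surviving strategy."""
    rng = random.Random(seed)
    for _ in range(steps):
        population = moran_step(population, turns, rng)
    counts = Counter(population)
    return {name: 100 * k / len(population) for name, k in counts.items()}
```

Stopping after a fixed number of steps, rather than at fixation, mirrors the choice of reporting the distribution at iteration 19 before more strategies disappeared.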

11.3 Using the IPD-generated personality compatibility table

Chapter 9 described MC and GA algorithms that use the G(n, A, C) algorithm described in section 9.1. Chapter 10 described heuristic methods that use the GNAC algorithm, which was explained in section 10.3, and section 10.5 explained the CDM algorithm. These algorithms can use the IPD-based personality compatibility table described in this chapter. The CDM algorithm includes a call to a function that returns an assignment, A, which is a list of personality types for the nodes of the network to be generated. Section 10.5 described an implementation of the CDM algorithm that used a makeMBTI function, which returns a personality type based on the empirical distribution listed in Table 2.3. In the research described in this chapter, the CDM algorithm was implemented with a function named makeIPD, which returns an IPD strategy in accordance with the empirical distribution listed in Table 11.1. The CDM algorithm determines the degree sequence of an exemplar network and sorts the indices of that sequence in descending order of degree. After consulting the IPD-generated personality compatibility table, the CDM algorithm sorts the assignment from most to least compatible. Then, CDM reorders the assignment by node index so that the most compatible strategies are assigned to the nodes with the highest degrees.
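The makeIPD draw and the CDM ordering step can be sketched as follows. The weights come directly from Table 11.1; everything else is an assumption of the sketch. In particular, the text does not specify how strategies are ranked “from most to least compatible,” so compat here is a hypothetical per-strategy score (for example, the mean of a strategy's row in the IPD compatibility table). The names make_ipd and cdm_assign are illustrative.

```python
import random

# Empirical distribution from Table 11.1 (percent of the evolved population).
IPD_DISTRIBUTION = {
    "Remorseful Prober": 11.11, "Random": 8.89, "Grudger": 8.89, "Pavlov": 8.89,
    "Defector": 6.67, "Cooperator": 6.67, "Tit for Tat (TFT)": 6.67,
    "Adaptive": 6.67, "Machiavelli": 6.67, "Naive Prober": 6.67,
    "Random TFT": 6.67, "Appeaser": 4.44, "Soft Grudger": 4.44,
    "Suspicious TFT": 4.44, "Gradual": 2.22,
}


def make_ipd(n, rng):
    """Draw n IPD strategies with Table 11.1 percentages as sampling weights."""
    names = list(IPD_DISTRIBUTION)
    return rng.choices(names, weights=[IPD_DISTRIBUTION[s] for s in names], k=n)


def cdm_assign(degrees, strategies, compat):
    """Give the most compatible strategies to the highest-degree nodes.

    degrees[v] is the exemplar degree of node v; compat[s] is an assumed
    single compatibility score per strategy.
    """
    by_degree = sorted(range(len(degrees)), key=lambda v: degrees[v], reverse=True)
    by_compat = sorted(strategies, key=lambda s: compat[s], reverse=True)
    assignment = [None] * len(degrees)
    for node, strategy in zip(by_degree, by_compat):
        assignment[node] = strategy
    return assignment
```

For example, with degrees [1, 3, 2] and scores a = 0.1, b = 0.9, c = 0.5, node 1 (degree 3) receives b, node 2 receives c, and node 0 receives a.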


As implemented in this research, the CDM algorithm returns an assignment of IPD strategies. The GNAC function uses that assignment, together with the IPD-generated compatibility table, to determine which nodes to link in a synthetic social network. Using the IPD-generated compatibility table, the CDM and GNAC functions were applied to each of the real-world exemplar networks identified in Table 5.1. Appendix F presents the results tables for each of the exemplar networks. Table 11.2 presents the results for one of them, the Thurman Office exemplar network. Columns of the table include:

1. Metrics defined in Table 2.2

2. T = truth metrics calculated for the exemplar network

3. $\bar{F}$ = averaged metrics from 30 networks generated by the Configuration Model

4. $|T - \bar{F}|$ = absolute difference between the averaged metrics and the truth

5. $L_1(F) = \sum_{i=1}^{30} |T - F_i|$

6. $L_2(F) = \sqrt{\sum_{i=1}^{30} |T - F_i|^2}$

7. $\bar{M}$ = averaged metrics from 30 networks generated by the GNAC function using the IPD-based compatibility table

8. $|T - \bar{M}|$ = absolute difference between the averaged metrics and the truth

9. $L_1(M) = \sum_{i=1}^{30} |T - M_i|$

10. $L_2(M) = \sqrt{\sum_{i=1}^{30} |T - M_i|^2}$
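The averaged value, absolute difference, and L1 and L2 norms defined in the list above can be computed per metric with a few lines of Python. This is an illustrative sketch; the function name summarize is hypothetical.

```python
import math


def summarize(truth, generated):
    """Mean, |T - mean|, and the L1 and L2 norms of |T - F_i| for one metric."""
    mean = sum(generated) / len(generated)
    diffs = [abs(truth - value) for value in generated]
    l1 = sum(diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))
    return mean, abs(truth - mean), l1, l2
```

For instance, suppose three of 30 generated networks had two components and the rest had one. Then summarize(1, [1] * 27 + [2] * 3) returns a mean of 1.1, a difference of 0.1, L1 = 3, and L2 = √3 ≈ 1.732. These values are consistent with the Configuration Model entries in the Components row of Table 11.2.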

In Table 11.2 and the tables in Appendix F, bold indicates the lowest values. Bold numbers in columns 7 through 10 indicate that the CDM and GNAC algorithms performed better than the Configuration Model.


Table 11.2 Thurman Office network results using an IPD strategy compatibility table

Metrics T F |푻 − 푭̅| L1(F) L2(F) M |푻 − 푴̅ | L1(M) L2(M)

Nodes 15 15 0 0 0 15 0 0 0 Links 33 26.23 6.767 203 38.092 33 0 0 0 Components 1 1.1 0.1 3 1.732 1.067 0.067 2 2 Network density 0.314 0.25 0.064 1.933 0.363 0.314 0 0 0 Average degree 4.4 3.498 0.902 27.067 5.079 4.4 0 0 0 Standard deviation degree 2.53 1.827 0.703 21.089 3.997 2.94 0.41 12.305 2.396 Global cluster coefficient 0.516 0.272 0.244 7.329 1.381 0.446 0.07 2.109 0.428 Average cluster coefficient 0.477 0.304 0.173 5.271 1.08 0.682 0.205 6.15 1.153 Mean path length 1.876 2.334 0.458 13.743 4.417 1.9 0.024 5.6 3.192 Communities 3 3.8 0.8 34 8.246 3.733 0.733 44 9.592 Gini coefficient 0.178 0.254 0.076 2.592 0.586 0.285 0.108 4.43 0.873 Average betweenness 6.133 7.66 1.527 49.667 9.981 5.462 0.671 20.267 4.713 Maximum betweenness 37.25 27.73 9.517 300.42 59.319 45.55 8.302 272.09 57.57 Average closeness 0.039 0.035 0.004 0.125 0.025 0.042 0.003 0.075 0.019 Minimum closeness 0.03 0.025 0.005 0.178 0.035 0.031 0.001 0.055 0.017 Average eigencentrality 0.528 0.543 0.015 1.026 0.212 0.506 0.022 1.093 0.213 Minimum eigencentrality 0.106 0.119 0.014 1.247 0.289 0.167 0.061 1.899 0.353 Network radius 2 2.567 0.567 17 4.123 1.967 0.033 1 1 Average eccentricity 2.8 3.398 0.598 18.6 3.829 2.571 0.229 8.333 1.829 Network diameter 3 4.3 1.3 39 7.681 3.1 0.1 5 2.236


Table 11.3 Score table for all the exemplar networks

Exemplar network                        Wins    Losses  Draws
Robins Australian Bank                  13      4       3
Roethlisberger & Dickson Wire Room      12      7       1
Thurman Office                          15      4       1
Sampson Monastery                       7       12      1
Krackhardt Office CSS                   8       11      1
Krackhardt High-Tech Managers           8       10      2
Schwimmer Taro Exchange                 6       13      1
Webster Accounting Firm                 8       9       3
Zachary Karate Club                     8       9       3
Bernard & Killworth Technical           13      4       3
Bernard & Killworth Office network      12      5       3
Krebs Fortune 500 IT Dept. (Advice)     10      9       1
Krebs Fortune 500 IT Dept. (Business)   8       7       5
Lazega Law Firm                         11      3       6
Total metrics scores                    139     107     34
Total exemplar scores                   8       6       0

Table 11.3 presents the scores for the complete set of exemplar networks. A score in the “Wins” column is the number of metrics on which the method described in this chapter performed better than the Configuration Model, that is, produced metric values closer to those of the exemplar network. A score in the “Losses” column is the number of metrics on which the Configuration Model performed better. A score in the “Draws” column is the number of metrics on which both methods produced the same values.
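The scoring can be sketched as a metric-by-metric comparison of the |T − F̄| and |T − M̄| columns. This rule is an inference rather than a quoted definition. Applied to the 20 metrics of Table 11.2, it yields 15 wins, 4 losses, and 1 draw, matching the Thurman Office row of Table 11.3. The exemplar-level tally, in which an exemplar counts as a win when metric wins exceed metric losses, is likewise an inference; it reproduces the 8 wins and 6 losses in the last row of Table 11.3. The function names are illustrative.

```python
def score_exemplar(diff_config, diff_method):
    """Count metrics where |T - M̄| is smaller (win), larger (loss), or equal (draw)."""
    wins = losses = draws = 0
    for f, m in zip(diff_config, diff_method):
        if m < f:
            wins += 1
        elif m > f:
            losses += 1
        else:
            draws += 1
    return wins, losses, draws


def exemplar_totals(metric_scores):
    """Tally exemplars from per-exemplar (wins, losses, draws) triples."""
    won = sum(1 for w, l, _ in metric_scores if w > l)
    lost = sum(1 for w, l, _ in metric_scores if l > w)
    return won, lost, len(metric_scores) - won - lost
```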

11.4 Discussion of the IPD-based results

The results presented in Table 11.2 and Table 11.3 indicate that, overall, the IPD-based personality compatibility table combined with the GNAC algorithm performs comparably to or better than the Configuration Model. Across the fourteen exemplar networks, it was closer to the truth on 139 metric comparisons and farther on 107, and it won on 8 of the 14 exemplars. Table 11.2 presented 20 metrics that provide insight into performance in synthesizing the overall network structure, communities, hubs, and influential nodes. The “L1” and “L2” norm columns enable comparison of the 30 networks generated by each of the two methods. The exemplar networks listed in Table 11.3 are arranged from the smallest, with 11 nodes, to the largest, with 71 nodes. The GNAC function performed better at the top and bottom of the list than it did in the middle.

Axelrod set a precedent of attributing personality traits to IPD strategies by using adjectives such as “nice” and “forgiving.” In section 11.1 and Table 11.1, notice how the names of the IPD strategies convey personality characteristics such as “naïve,” “suspicious,” and “remorseful.” Modeling personalities with IPD strategies provides an alternative approach to synthesizing social networks. The novel aspects of the methodology presented in this chapter are the idea of using IPD strategies as a proxy for personality types and the IPD strategy compatibility table. Algorithms were presented for compatibility degree matching, synthesizing networks with numerous triangle cliques, and conducting a tournament to produce IPD strategy compatibility tables. Previous sections explained how IPD strategies can simulate the Agreeableness and Neuroticism traits of the Five Factor Model, which were two of the more important personality traits for long-duration exploration teams.


CHAPTER 12

SYNTHESIZING EVOLVING SOCIAL NETWORKS

Previous chapters explained research related to synthesizing static social networks. In the real world, social networks are dynamic because nodes and links are added or deleted over time. This chapter identifies data sets for real-world exemplar evolving networks, presents pseudocode that synthesizes evolving social networks based on personality compatibility, and analyzes results from two Turing tests that were conducted to determine the realism of the synthesized evolving social networks.

12.1 Exemplar evolving network data sets

Data sets for evolving social networks are scarce. Fortunately, the Simulation Investigation for Empirical Network Analysis (SIENA) project provides longitudinal data sets for evolving networks. Table 12.1 identifies the longitudinal data sets that are available at the SIENA project website.

Table 12.1 Longitudinal data sets from the SIENA project website

Data set                                            Individuals  Waves  Reference
Teenage Friends and Lifestyle Study                 50           3      (West and Sweeting 1995) (Michell and Amos 1997)
Networks and actor attributes in early adolescence  26           4      (Knecht 2006)
Sociology Freshmen                                  38           5      (van Duijn et al. 2003) (van de Bunt et al. 1999)
Freshman Sociology Cohort                           34           5      (van de Bunt et al. 1999)
University Freshmen                                 32           7      (van de Bunt et al. 1999)


The column heading Waves in Table 12.1 indicates the number of times that data was collected from the students. The other columns provide the names of the data sets, the number of individuals represented, and references. The first data set represents a cohort of 50 girls who participated in a Teenage Friends and Lifestyle study conducted in the mid-1990s. A total of 160 students participated in the study; 129 were present during all three data collection waves. Friendship networks were determined by asking each student to identify their 12 best friends. Students were also asked about their behavior with respect to sports and their use of tobacco, alcohol, and marijuana (West 1995) (Michell 1997).

Row two of Table 12.1 identifies a data set that represents students in a Dutch school class between September 2003 and June 2004. There were 17 girls and 9 boys, aged 11 to 13. Students were asked to indicate up to 12 classmates whom they considered to be good friends (Knecht 2006).

Row three of Table 12.1 identifies a data set that represents a cohort of Groningen University freshmen enrolled in Sociology. During the freshman year, five of seven questionnaires requested data about relationships among students. Students were asked to categorize their relationships as best friend, friend, friendly relationship, neutral relationship, dissonant relationship, or unknown relationship. A list of students included with the questionnaire enabled the students to assign a relationship category to each of their colleagues by selecting one of six checkboxes (van Duijn 2003).

Rows four and five identify data sets that represent evolving networks of university freshmen. Data was collected in seven waves during the year. Students were asked to categorize their relationships with their colleagues using the codes 0 for people they did not know, 1 for best friend, 2 for friend, 3 for friendly relationship, 4 for neutral relationship, and 5 for troubled relationship (van de Bunt 1999). A friendly relationship was defined as a pleasant relationship with someone who might become a friend (van Duijn 2003). Processing the adjacency matrices from the freshman data sets involved changing values greater than one to one, with the exception of five (dissonant or troubled relationships), which was changed to zero.
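The recoding rule can be sketched in Python as follows. The sketch applies the rule literally: 5 becomes 0, any other value greater than one becomes 1, and 0 and 1 are kept. How other codes, such as missing-data codes, were handled is not described here, so applying the literal rule to them is an assumption. The function name binarize is illustrative.

```python
def binarize(matrix):
    """Recode freshman relationship codes into a 0/1 adjacency matrix."""
    recoded = []
    for row in matrix:
        new_row = []
        for code in row:
            if code == 5:             # dissonant or troubled relationship: no link
                new_row.append(0)
            elif code > 1:            # friend, friendly, or neutral: treated as a link
                new_row.append(1)
            else:
                new_row.append(code)  # 0 (not known) and 1 (best friend) are kept
        recoded.append(new_row)
    return recoded
```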

An R package named RSiena enables the development of Stochastic Actor-Based Models. These models are derived from social networks defined by adjacency matrices and lists of behaviors (e.g., smoking and drinking). Tables of numerical values that indicate the influence of the behaviors on the formation of the network can be derived from the models (Snijders 2010) (Ripley 2018). The following chapter includes a section about future work, which addresses potential applications of those table generation functions.

12.2 Pseudocode for synthesizing an evolving network

The equations and pseudocode described in this section characterize the changes in the nodes and links of a social network as it evolves over time. The following list defines the parameters used in the equations and pseudocode.

$\mathbb{T}$ = set of real-world social network sequences to be used as exemplars for the synthesis process; networks in the class or category of network sequences to be synthesized

$t$ = number of exemplar real-world social network sequences, $t = |\mathbb{T}|$

$T_i$ = sequence $i$ of real-world networks, $T_i \in \mathbb{T}$ for $1 \le i \le t$

$c_i$ = number of networks in sequence $T_i$ (may vary for different sequences in $\mathbb{T}$)

$T_{i,j}$ = single social network; network $j$ in sequence $T_i$, $1 \le j \le c_i$

$\mathbb{T}$, $T_i$, $T_{i,j}$ = real-world social network set of sequences, network sequence, and single network

$\mathbb{S}$, $S_i$, $S_{i,j}$ = synthetic social network set of sequences, network sequence, and single network

$V(S_{i,j})$ = node set of network $S_{i,j}$

$v, w$ = individual nodes, e.g., $v \in V(S_{i,j})$ is a node in network $S_{i,j}$

$d(v)$ = degree of node $v$

$E(S_{i,j})$ = link set of network $S_{i,j}$

$e$ = an individual link, e.g., $e \in E(S_{i,j})$ is a link in network $S_{i,j}$

12.2.1 Equations for calculating parameters of sequence set $\mathbb{T}$

(1) Rate at which nodes are added to networks in $\mathbb{T}$

$$ \lambda_{NA} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |NA_{i,j}|}{\sum_{i=1}^{t} c_i} $$

where $NA_{i,j}$ = set of nodes added from $T_{i,j-1}$ to $T_{i,j}$

(2) Rate at which nodes are removed from networks in $\mathbb{T}$

$$ \lambda_{NR} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |NR_{i,j}|}{\sum_{i=1}^{t} c_i} $$

where $NR_{i,j}$ = set of nodes removed from $T_{i,j-1}$ to $T_{i,j}$

(3) Mean degree of added nodes when they are added to networks in $\mathbb{T}$

$$ \delta_{NA} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} \sum_{n \in NA_{i,j}} d(n)}{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |NA_{i,j}|} $$

where $NA_{i,j}$ = set of nodes added from $T_{i,j-1}$ to $T_{i,j}$, and $d(n)$ is the degree of node $n$

(4) Mean degree of removed nodes when they are removed from networks in $\mathbb{T}$

$$ \delta_{NR} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} \sum_{n \in NR_{i,j}} d(n)}{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |NR_{i,j}|} $$

where $NR_{i,j}$ = set of nodes removed from $T_{i,j-1}$ to $T_{i,j}$, and $d(n)$ is the degree of node $n$

(5) Rate at which links are added to networks in $\mathbb{T}$

$$ \lambda_{LA} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |LA_{i,j}|}{\sum_{i=1}^{t} c_i} $$

where $LA_{i,j}$ = set of links added from $T_{i,j-1}$ to $T_{i,j}$; $LA_{i,j}$ does not include links added as part of adding a node

(6) Rate at which links are removed from networks in $\mathbb{T}$

$$ \lambda_{LR} = \frac{\sum_{i=1}^{t} \sum_{j=2}^{c_i} |LR_{i,j}|}{\sum_{i=1}^{t} c_i} $$

where $LR_{i,j}$ = set of links removed from $T_{i,j-1}$ to $T_{i,j}$; $LR_{i,j}$ does not include links removed as part of removing a node
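Equations (1) through (6) can be estimated from exemplar sequences with the Python sketch below. As an assumption of the sketch, each network is represented as a pair consisting of a node set and a set of two-element frozensets for links. The denominators follow the equations as written (∑c_i). As the definitions require, links incident to added nodes are excluded from LA, and links incident to removed nodes are excluded from LR. The function names are illustrative.

```python
def degree(node, links):
    """Number of links incident to node."""
    return sum(1 for link in links if node in link)


def sequence_parameters(sequences):
    """Estimate the rates and mean degrees of Equations (1)-(6).

    `sequences` is a list of sequences; each network is (set of nodes, set of
    frozenset({u, v}) links).
    """
    total_networks = sum(len(seq) for seq in sequences)          # sum of c_i
    na = nr = la = lr = deg_na = deg_nr = 0
    for seq in sequences:
        for (nodes0, links0), (nodes1, links1) in zip(seq, seq[1:]):  # j = 2..c_i
            added, removed = nodes1 - nodes0, nodes0 - nodes1
            na += len(added)
            nr += len(removed)
            deg_na += sum(degree(n, links1) for n in added)      # degree when added
            deg_nr += sum(degree(n, links0) for n in removed)    # degree when removed
            la += sum(1 for e in links1 - links0 if not e & added)
            lr += sum(1 for e in links0 - links1 if not e & removed)
    return {
        "lambda_NA": na / total_networks,
        "lambda_NR": nr / total_networks,
        "delta_NA": deg_na / na if na else 0.0,
        "delta_NR": deg_nr / nr if nr else 0.0,
        "lambda_LA": la / total_networks,
        "lambda_LR": lr / total_networks,
    }
```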

12.2.2 Pseudocode for the main function

Figure 12.1 presents the pseudocode to synthesize social network sequence $S_i$ from $T_i$, using sequence parameters calculated from $\mathbb{T}$.

S_{i,1} ← T_{i,1}  # Select an exemplar real-world sequence T_i.
for j ← 2 to c_i
    S_{i,j} ← S_{i,j−1}
    nr ← rpois(1, λ_NR)  # Generate a Poisson distributed random variate.
    for k ← 1 to nr
        node v ← selectnodetoremove(V(S_{i,j}), δ_NR)
        remove node v and its links from S_{i,j}
    end for
    na ← rpois(1, λ_NA)  # Generate a Poisson distributed random variate.
    for k ← 1 to na
        add node v to S_{i,j}
        la ← rpois(1, δ_NA)  # Generate a Poisson distributed random variate.
        for l ← 1 to la
            node w ← selectnodetolink(…)
            add link (v, w) to S_{i,j}
        end for
    end for
    lr ← rpois(1, λ_LR)  # Generate a Poisson distributed random variate.
    for k ← 1 to lr
        link e ← selectlinktoremove(…)
        remove link e from S_{i,j}
    end for
    la ← rpois(1, λ_LA)  # Generate a Poisson distributed random variate.
    for k ← 1 to la
        node v ← selectnodetolink(…)
        node w ← selectnodetolink(…)
        add link (v, w) to S_{i,j}
    end for
end for
S_i = (S_{i,1}, S_{i,2}, …, S_{i,c_i})

Figure 12.1 Pseudocode for the evolving social network generation algorithm
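A runnable Python sketch of the main loop of Figure 12.1 follows. Its helper choices are assumptions, not the author's functions. Nodes to remove are chosen uniformly at random, because the figure's selectnodetoremove also takes δ_NR but its logic is not given. New nodes receive a personality type drawn uniformly from a hypothetical list, types. Each Poisson draw makes one compatibility-tested attempt rather than retrying until it succeeds. rpois is a hand-written Knuth sampler standing in for R's rpois, because Python's random module has no Poisson generator. The personality dictionary is updated in place for new nodes, and node identifiers are assumed to be integers.

```python
import math
import random


def rpois(lam, rng):
    """One Poisson(lam) variate by Knuth's method; stands in for R's rpois(1, lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def compatible(a, b, personality, compat, rng):
    """Bernoulli trial that succeeds with probability M(personality[a], personality[b])."""
    return rng.random() <= compat[personality[a]][personality[b]]


def evolve_sequence(nodes, links, personality, compat, params, waves, types, rng):
    """Return `waves` snapshots, each a (frozenset of nodes, frozenset of links) pair."""
    nodes, links = set(nodes), set(links)
    snapshots = [(frozenset(nodes), frozenset(links))]
    next_id = max(nodes, default=0) + 1
    for _ in range(2, waves + 1):
        # Node removal (uniform here; the figure's selectnodetoremove also uses delta_NR).
        for _ in range(rpois(params["lambda_NR"], rng)):
            if nodes:
                v = rng.choice(sorted(nodes))
                nodes.discard(v)
                links = {e for e in links if v not in e}
        # Node addition: each new node gets a Poisson(delta_NA) number of link attempts.
        for _ in range(rpois(params["lambda_NA"], rng)):
            v, next_id = next_id, next_id + 1
            personality[v] = rng.choice(types)  # hypothetical: uniform type draw
            candidates = sorted(nodes)
            nodes.add(v)
            for _ in range(rpois(params["delta_NA"], rng)):
                if candidates:
                    w = rng.choice(candidates)
                    if compatible(v, w, personality, compat, rng):
                        links.add(frozenset((v, w)))
        # Link removal: a sampled link is dropped if it fails the viability test.
        for _ in range(rpois(params["lambda_LR"], rng)):
            if links:
                a, b = rng.choice(sorted(tuple(sorted(e)) for e in links))
                if not compatible(a, b, personality, compat, rng):
                    links.discard(frozenset((a, b)))
        # Link addition between two existing nodes.
        for _ in range(rpois(params["lambda_LA"], rng)):
            if len(nodes) >= 2:
                v, w = rng.sample(sorted(nodes), 2)
                if compatible(v, w, personality, compat, rng):
                    links.add(frozenset((v, w)))
        snapshots.append((frozenset(nodes), frozenset(links)))
    return snapshots
```

Sorting nodes and links before sampling keeps runs reproducible for a given seed, since set iteration order is not guaranteed.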


12.2.3 Pseudocode for the subroutines

The pseudocode in Figures 12.2 and 12.3 links two nodes or deletes a link between two nodes, respectively. These functions use a personality compatibility lookup table, M(a, b), where a is the row index that corresponds to the personality attribute of node v and b is the column index that corresponds to the personality attribute of node w. Figure 12.2 presents pseudocode for a function to select a node to link, and Figure 12.3 presents pseudocode for a function to select a link to delete. These algorithms are designed to work with any reasonable, internally consistent personality compatibility table.

function selectnodetolink(V(S_{i,j}), v, la)
    linkbudget ← la  # Initialize counter for links to add.
    n ← vcount(V(S_{i,j}))
    while linkbudget > 0
        w ← sample(V(S_{i,j}), 1)  # Randomly select a node.
        if degree(w) > 1
            star ← neighbors(V(S_{i,j}), w)  # Get the set of connected nodes.
            for k in 1:length(star)
                if linkbudget > 0
                    if runif(1) <= M(personalities(v), personalities(star[k]))
                        add link (v, star[k]) to E(S_{i,j})  # Add a link to a neighbor.
                        linkbudget ← linkbudget − 1  # Decrement number of links to add.
                    end if
                end if
            end for
        else
            if linkbudget > 0
                if runif(1) <= M(personalities(v), personalities(w))
                    add link (v, w) to E(S_{i,j})  # Add a link to the node.
                    linkbudget ← linkbudget − 1  # Decrement number of links to add.
                end if
            end if
        end if
    end while
    return E(S_{i,j})

Figure 12.2 Pseudocode for a function to select a node to link
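The linking logic of Figure 12.2 can be sketched in Python as below. Two safeguards are additions of this sketch and are not in the figure. First, a node never links to itself or to an existing neighbor. Second, the loop stops after max_tries samples, so it terminates even when every compatibility value is zero. The adjacency structure, a dictionary of neighbor sets, is updated in place; the function name is illustrative.

```python
import random


def select_links_for(v, adjacency, personality, compat, link_budget, rng, max_tries=1000):
    """Add up to link_budget links from node v, following the logic of Figure 12.2."""
    others = sorted(n for n in adjacency if n != v)
    tries = 0
    while link_budget > 0 and others and tries < max_tries:
        tries += 1
        w = rng.choice(others)                     # randomly select a node
        if len(adjacency[w]) > 1:
            targets = sorted(adjacency[w] - {v})   # try w's neighbors ("star")
        else:
            targets = [w]                          # otherwise try w itself
        for u in targets:
            if link_budget == 0:
                break
            if u not in adjacency[v] and rng.random() <= compat[personality[v]][personality[u]]:
                adjacency[v].add(u)
                adjacency[u].add(v)
                link_budget -= 1
    return adjacency
```

Preferring the neighbors of a sampled node tends to close triangles, which is one way to obtain the clustering seen in real social networks.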


function selectlinktoremove(E(S_{i,j}))
    linkremovalbudget ← 1  # Counter for the number of links to remove.
    while linkremovalbudget > 0
        e ← sample(E(S_{i,j}), 1)  # Randomly select a link from the set of links.
        if runif(1) > M(personalities[e[1]], personalities[e[2]])  # Is this link still viable?
            linkremovalbudget ← linkremovalbudget − 1
        end if
    end while
    return e

Figure 12.3 Pseudocode for a function to select a link to remove
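Figure 12.3 can likewise be sketched in Python. The max_tries cap is an addition of this sketch; without it, the loop would never end if every link had a compatibility value of 1. The function name is illustrative, and None signals that no link failed the viability test.

```python
import random


def select_link_to_remove(links, personality, compat, rng, max_tries=1000):
    """Sample links until one fails the viability test runif(1) > M(a, b)."""
    pool = sorted(tuple(sorted(e)) for e in links)
    for _ in range(max_tries):
        a, b = rng.choice(pool)                    # randomly select a link
        if rng.random() > compat[personality[a]][personality[b]]:
            return (a, b)                          # link is no longer viable
    return None
```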

12.3 Visualizing social trajectories and networks

Dias and Ramadier analyzed the social dimensions of spatial cognition and how social mobility affects cognitive configurations (Dias 2015). Within a real neighborhood, 92 residents answered social representation questionnaires. After isolating four groups with different spatial representations, Dias and Ramadier described the position of the groups within a social structure. The results of the analysis showed that spatial representations depended upon the social trajectory of individuals, which can move up, move down, or remain stable. Dias and Ramadier defined a social trajectory as a history of social relationships that is experienced and internalized differently, depending upon the individual’s initial social position. An individual typically starts from a social position P (primary socialization in the family) and moves toward a social position P' (secondary socialization in the professional environment). A few people start at P' as their primary socialization. In 1979, Bourdieu defined the concepts of social position, social space, and social trajectory (Dias 2015). According to Bourdieu, a multidimensional social space can distinguish an individual’s social position in its financial, cultural, and educational dimensions. Within a workplace, the dimensions of a social space could include influence, responsibility, and experience.

The social space provides an alternative view of society to the traditional class structure. Bourdieu conceptualized a multidimensional social space in which each dimension represents a different type of capital (Bourdieu 1987). In general society, the dimensions of a social space could be defined by economic, informational, and social capital. Social capital is defined by the benefits associated with belonging to a group. Informational capital could include knowledge, culture, and recognized expertise.

Figure 12.4 Visualization of a social trajectory


Figure 12.5 Pair-wise plots of a 3D social trajectory

Any set of socially distinguishing factors can specify a social space. An individual's background can determine an initial social position within a social space, and an individual's changing social positions over time constitute a social trajectory. A social space could potentially have associated probabilities of affinity, which pulls people together, and of aversion, which pushes them apart. A social trajectory can be plotted using the degree centrality, betweenness centrality, and eigenvector centrality network metrics. Degree centrality indicates an individual’s connections, i.e., the number of links to other individuals. Betweenness centrality indicates how many of the shortest paths between other pairs of individuals in the network pass through an individual. Eigenvector centrality, also known as eigencentrality, indicates an individual’s level of influence; an individual scores highly when connected to other high-scoring individuals. Figure 12.4 depicts a 3D plot of a social trajectory.
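The three coordinates of a social trajectory can be computed for one individual across waves with the self-contained Python sketch below. It uses Brandes' algorithm for unnormalized betweenness on an undirected, unweighted graph. For eigencentrality, it uses power iteration on A + I, scaled so that the largest value is 1; adding the identity keeps the iteration from oscillating on bipartite graphs without changing the eigenvector. These normalization choices, the adjacency-dictionary representation, and the function names are assumptions. They may differ from those of the network analysis tools used to produce Figures 12.4 and 12.5.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                                # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:          # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                                # accumulate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}        # each pair was counted twice


def eigencentrality(adj, iterations=200):
    """Power iteration on A + I, scaled so that the largest score is 1."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(y.values())
        x = {v: y[v] / top for v in adj}
    return x


def trajectory(waves, node):
    """(degree, betweenness, eigencentrality) of node in each wave, or None if absent."""
    points = []
    for adj in waves:
        if node not in adj:
            points.append(None)
            continue
        points.append((len(adj[node]), betweenness(adj)[node], eigencentrality(adj)[node]))
    return points
```

For example, a node at the center of a three-node path in one wave that becomes a leaf of a four-node star in the next moves from (2, 1.0, 1.0) to (1, 0.0, about 0.577).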


Figure 12.5 presents the same social trajectory as three scatter plots. Using the Sociology Freshmen real-world exemplar data set (see Table 12.1), a synthetic network was generated for each wave. Code based on the pseudocode presented in Section 12.2 synthesized five networks. In Figure 12.4 and Figure 12.5, the blue line depicts the social trajectory of individual number eight in the real-world exemplar networks, and the magenta line depicts the social trajectory of the same individual in the synthesized networks. The two lines differ considerably because the degree of an individual node can vary between the real-world and synthesized networks. That difference in turn affects the paths that pass through the node and the connectivity of the individual’s neighbors. A social trajectory visualization could enable an analysis of an individual’s career or social standing within a community.

12.4 Method for comparing real-world and synthesized social networks

Previous chapters presented tables that compare the metrics of real-world exemplar networks with the metrics of generated networks. This section explains the Turing test as another approach to judging the realism of synthesized social networks.

Turing proposed “The Imitation Game” to consider the question “Can machines think?” In this game, a man and a woman respond to questions from an interrogator via teletype. From those responses, the interrogator decides which individual is the woman. Once the interrogator has proven to be an expert in identifying gender from teletyped answers, the man is replaced with a computer program. During the second round, that computer program attempts to convince the interrogator that it is a woman (Turing 1950).


Variations of the Imitation Game, known as Turing tests, have become a method of evaluating the realism of computer-generated content. These Turing tests are typically restricted to a specific knowledge domain. Restricted Turing tests are the basis of the annual Loebner Prize competition for judging chatbots (Mauldin 1994). To evaluate the realism of computer-generated military tactics in a battlefield simulation, Petty conducted a Turing test in which officers judged whether a human officer would make similar decisions in the same situation (Petty 1994). A Turing test serves as a method of face validation.

Visualizing dynamic networks and the social network analysis statistics of individuals poses readability problems in a document. One approach presents multiple images of the graph and multiple trend charts for the statistics. Another approach depicts the final graph with time frame numbers on the links and nodes to indicate when they appeared (Holme 2012). In the Turing tests, the evolving networks were depicted as a series of snapshots. Two Turing tests were conducted to determine the realism of the synthesized evolving social networks.

Table 12.2 Results from the first Turing Test

Question  Correct  p̂        z score   Area     P-value  H0
Q1        24       0.4545   -0.6030   0.2732   0.7268   Do not reject
Q2        23       0.4773   -0.3015   0.3815   0.6185   Do not reject
Q3        31       0.2955   -2.7136   0.0033   0.9967   Do not reject
Q4        26       0.4091   -1.2060   0.1139   0.8861   Do not reject
Q5        24       0.4545   -0.6030   0.2732   0.7268   Do not reject


12.4.1 Results of the first Turing test

The first Turing test was conducted via an online questionnaire. Participants were invited through Facebook and LinkedIn, and none of the participants were students.

Using a free service provided by SurveyMonkey, a five-question survey was developed in which each question presented two images of evolving networks. In each question, one of the images depicted a real-world exemplar evolving network identified in Table 12.1. The other image depicted a network that was generated by an algorithm based upon the pseudocode presented in Section 12.2. Appendix G presents the images for those questions. Each question asked which image presented the real-world network. The real-world network was depicted in the top row in Figures G.1, G.4, and G.5, and in the bottom row in Figures G.2 and G.3. A total of 44 people completed the survey. Table 12.2 presents the survey results. To calculate p̂, the numbers in the column with the heading “Correct” are divided by the total number of people who completed that part of the survey.

$$
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}, \qquad
\begin{cases}
H_0: p \le p_0 \\
H_1: p > p_0
\end{cases}
\tag{12.1}
$$

Equation 12.1 calculates a z score and defines a null hypothesis and an alternate hypothesis. In the equation, p̂ is the proportion of respondents who provided a correct answer, n is the number of respondents, and p0 = 0.5 is the proportion expected from random guessing between the two images. For this analysis, the null hypothesis, H0, is that the proportion of the population that successfully identifies the real-world exemplar would be 0.5 or less, which would indicate that they performed no better than random guessing.


The alternate hypothesis, H1, is that the population proportion successfully identifying the real-world exemplar networks would be greater than 0.5, which would indicate that they performed better than random guessing. The level of significance for the test is α = 0.05, which means that the P-value must be less than α in order to reject H0. For all five synthetic social network sequences, the participants did no better than random guessing at distinguishing real-world and synthetic network sequences.
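The upper-tailed test in Equation 12.1 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this research; the function names `one_proportion_z_test` and `decision` are hypothetical, and the standard normal CDF Φ(z) (the “Area” column) is computed from `math.erf`. Applied to Q6 of the second test (46 of 77 responses correct), it reproduces the corresponding row of Table 12.3.

```python
import math

def one_proportion_z_test(correct: int, n: int, p0: float = 0.5):
    """Upper-tailed test of H0: p <= p0 against H1: p > p0 (Equation 12.1)."""
    p_hat = correct / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    area = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Phi(z): area to the left of z
    p_value = 1 - area                              # area to the right of z
    return p_hat, z, area, p_value

def decision(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 only when the P-value is below the significance level."""
    return "Reject" if p_value < alpha else "Do not reject"

# Q6 of the second Turing test: 46 of 77 responses were correct.
p_hat, z, area, p_value = one_proportion_z_test(46, 77)
print(f"{p_hat:.4f} {z:.4f} {area:.4f} {p_value:.4f} {decision(p_value)}")
```

Because H1 is one-sided (p > p0), only the right tail counts as evidence against H0, which is why the P-value is 1 − Φ(z) rather than a two-tailed value.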

12.5 Results from the second Turing test

The second Turing test was conducted by Petty in two of his classes, using the figures in Appendix H. Appendix I presents the approval letter from the Institutional Review Board that allowed the test to be conducted with students. Students were informed that participation was optional and that their decisions would not affect their grades. Captions of the figures in Appendix H indicate which images depict evolving networks that were generated from the human-collected data sets described in Section 12.4 and which images depict evolving social networks that were synthesized using the algorithms presented in Section 12.2. Section 12.4 explained the method for conducting a proportion test.

A total of 77 students responded to eight questions that asked whether images in a PowerPoint presentation depicted evolving social networks that were generated by humans or synthesized by a computer algorithm. Students recorded their decisions by circling the word “human” or “synthetic” on a prepared response sheet.

The null hypothesis, H0, was that half or less of the population would answer correctly, meaning that the responses would be no better than guessing. The number of correct responses divided by the total number of responses is presented in the column named p̂ in Table 12.3. Other columns in Table 12.3 present the analytical values and the conclusions on whether to reject the null hypothesis. For seven of the eight synthetic social network sequences, the participants did no better than random guessing at distinguishing real-world and synthetic network sequences.

Testing was performed at the conventional significance level α = 0.05, meaning that, when the null hypothesis is true, the probability of a Type I error on any one test is 5%. A Type I error is incorrectly rejecting the null hypothesis when the null hypothesis is actually true. Therefore, the probability of not making a Type I error on one test is 1 − 0.05 = 0.95. Assuming the eight tests are independent, the probability of not making a Type I error on all eight is (0.95)^8 ≈ 0.66, and the probability of making at least one Type I error is 1 − (0.95)^8 ≈ 0.34. It is possible that the Q6 result, which led to rejecting the null hypothesis, is an example of a Type I error. The results of the two Turing tests indicate that the algorithms presented in Section 12.2 produce realistic evolving social networks.
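The Type I error arithmetic above can be checked with a short Python sketch. Like the calculation in the text, it assumes that the eight tests are independent and that every null hypothesis is true; the function name is hypothetical.

```python
def prob_at_least_one_type_i_error(alpha: float, m: int) -> float:
    """Chance of at least one false rejection among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

no_error_on_all_eight = (1 - 0.05) ** 8
at_least_one_error = prob_at_least_one_type_i_error(0.05, 8)
print(round(no_error_on_all_eight, 2), round(at_least_one_error, 2))  # 0.66 0.34
```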

Table 12.3 Analytical results from the second Turing test

Question  Correct  p̂       z score  Area Φ(z)  P-value  H0
Q1        31       0.4026  -1.7094  0.0437     0.9563   Do not reject
Q2        30       0.3896  -1.9373  0.0264     0.9736   Do not reject
Q3        44       0.5714   1.2536  0.8950     0.1050   Do not reject
Q4        33       0.4286  -1.2536  0.1050     0.8950   Do not reject
Q5        31       0.4026  -1.7094  0.0437     0.9563   Do not reject
Q6        46       0.5974   1.7094  0.9563     0.0437   Reject
Q7        38       0.4935  -0.1140  0.4546     0.5454   Do not reject
Q8        45       0.5844   1.4815  0.9308     0.0692   Do not reject


CHAPTER 13

RESULTS, CONCLUSIONS, AND FUTURE WORK

The previous chapters have explained methods for manually and automatically producing compatibility tables, described randomized and heuristic algorithms for synthesizing social networks, and presented results from the application of those algorithms and compatibility tables to a set of real-world exemplar networks. Has the research described in those chapters answered the research questions posed in Chapter 3?

What are the contributions of this research? How can other researchers build upon this work? This chapter addresses these questions. A results subsection answers each of the research questions and identifies the sections that presented the evidence. A conclusions subsection discusses the contributions and potential applications of this research. A future work subsection specifies potential follow-on research and development activities.

13.1 Results

This section serves as an executive summary of the dissertation. The following subsections answer the research questions. Figure 3.1 provides a flow chart that traces how the products of the research activities lead to answers to the research questions. Boxes on the left side of the chart identify products that flow into the integration activities identified by boxes in the middle column. Answers to research questions are depicted as ellipses on the right side of the chart. Arrows flowing into those ellipses identify the figures and tables mentioned in this section.


13.1.1 Answer to research question 1

Can a social network synthesis algorithm driven by a personality compatibility table produce realistic synthetic social networks? Affirmative; this section summarizes the evidence presented in previous chapters.

The results reported in Chapters 7, 9, 10, and 12 demonstrated that realistic social networks can be synthesized using a personality compatibility table. Realism was evaluated by taking the absolute differences between the metrics of an exemplar network and the averaged metrics of synthesized networks. As explained in Chapter 7, social networks were synthesized with agent-based models implemented in Netlogo, using the personality table specified in Table A.1. Averaged metrics from randomly generated networks were used as a basis of comparison. Table 7.8 summarized the results, which were that the personality compatibility table-based synthesis method performed better than the randomly generated networks.
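The realism measure described above, the absolute difference between an exemplar metric T and the mean of that metric over a set of synthesized networks, can be sketched as follows. This is an illustrative Python sketch, not the Netlogo or R code used in this research; it assumes the metrics have already been computed for each network, and the metric names and numbers in the usage example are hypothetical.

```python
from statistics import mean

def realism_gaps(exemplar: dict[str, float],
                 synthesized: list[dict[str, float]]) -> dict[str, float]:
    """For each metric, |T - mean over synthesized networks|.
    A smaller gap means the synthesized networks are closer to the exemplar."""
    return {name: abs(t - mean(net[name] for net in synthesized))
            for name, t in exemplar.items()}

# Hypothetical usage with two synthesized networks.
exemplar = {"density": 0.291, "mean_path_length": 2.018}
synthesized = [{"density": 0.28, "mean_path_length": 2.0},
               {"density": 0.30, "mean_path_length": 1.9}]
print(realism_gaps(exemplar, synthesized))
```

Computing the same gaps for randomly generated networks and comparing the two dictionaries metric by metric gives the kind of side-by-side comparison reported in the Appendix B tables.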

Chapter 9 described two randomized methods to synthesize realistic social networks. The Monte Carlo (MC) and Genetic Algorithm (GA) methods searched for optimum configurations of personality types, and networks were synthesized using the personality table specified in Table A.1. A summary of those results was presented in Table 9.2, which demonstrated that these methods performed significantly better than the randomly generated networks. Descriptions of the randomized methods and results were published in (O’Neil 2019a).
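A Monte Carlo search of the kind summarized above can be sketched as repeated random assignment of personality types, followed by synthesis and evaluation. In this hypothetical Python sketch, each pair of nodes is linked with the probability given by the compatibility table, and, as a simplifying assumption, the objective is only the gap between the synthesized and exemplar densities; the search in Chapter 9 compared many metrics and searched over a variable-sized set of personality types. The names `synthesize`, `density`, and `monte_carlo_search` are illustrative.

```python
import random

def synthesize(types, table, rng):
    """Link each pair of nodes (i, j) with probability table[types[i]][types[j]]."""
    n = len(types)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < table[types[i]][types[j]]}

def density(n, links):
    """Fraction of the n(n - 1)/2 possible links that are present."""
    return 2 * len(links) / (n * (n - 1))

def monte_carlo_search(n, table, target_density, trials=200, seed=1):
    """Sample random personality assignments, synthesize a network for each,
    and keep the assignment whose network density is closest to the target.
    Using density alone as the objective is a simplifying assumption."""
    rng = random.Random(seed)
    type_names = list(table)
    best = None
    for _ in range(trials):
        types = [rng.choice(type_names) for _ in range(n)]
        links = synthesize(types, table, rng)
        error = abs(density(n, links) - target_density)
        if best is None or error < best[0]:
            best = (error, types, links)
    return best  # (error, personality assignment, link set)
```

For example, `monte_carlo_search(11, table, 0.291)` would search for an 11-node assignment whose synthesized network matches the density of the Robins Bank exemplar in Table B.1, given some symmetric `table` of link-formation probabilities.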

Chapter 10 presented two heuristic methods to synthesize realistic social networks. The first, the Probability Search (PS) algorithm, involved a probability-based search for a configuration of personality types that would maximize compatibility, in accordance with the personality table specified in Table A.1. The second heuristic method developed for this research was Compatibility Degree Matching (CDM), which involved matching the most compatible personality types to the nodes with the highest degrees.
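Compatibility Degree Matching can be sketched as two sorts followed by a pairing, which is consistent with the O(n log n) complexity listed for CDM in Section 13.1.2.2. This Python sketch assumes that one personality type has already been drawn for each node and that a type's overall compatibility is the mean of its row in the table; that ranking rule and the function name are assumptions, not necessarily the rule used in Chapter 10.

```python
def compatibility_degree_matching(degrees, types, table):
    """Give the most compatible personality types to the highest-degree nodes.

    degrees[v] is the degree of node v; types holds one personality type per node.
    A type's overall compatibility is assumed here to be the mean of its row."""
    def overall(t):
        row = table[t]
        return sum(row.values()) / len(row)

    # Both sorts are O(n log n); the pairing loop is O(n).
    nodes_by_degree = sorted(range(len(degrees)), key=lambda v: degrees[v], reverse=True)
    types_by_score = sorted(types, key=overall, reverse=True)
    assignment = [None] * len(degrees)
    for node, t in zip(nodes_by_degree, types_by_score):
        assignment[node] = t
    return assignment
```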

Unlike in Chapters 7 and 9, the basis of comparison used in the research described in Chapter 10 was the Configuration Model (CM), which involves generating networks in accordance with the degree distribution of a real-world exemplar network. Table 10.2 summarized the results, which were that the new PS and CDM heuristic methods performed better than the CM. Descriptions of the heuristic methods and results were published in (O’Neil 2019b).
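The Configuration Model baseline is commonly implemented by stub matching: each node receives as many half-links (“stubs”) as its degree in the exemplar, and the stubs are paired at random. The following minimal Python sketch is not the implementation used in this research; it returns a multigraph edge list that may contain self-loops and repeated links, which many implementations subsequently remove, slightly perturbing the degrees.

```python
import random

def configuration_model(degrees, seed=None):
    """Randomly pair stubs so that node v ends with degrees[v] link endpoints."""
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    random.Random(seed).shuffle(stubs)
    # Consecutive stubs in the shuffled list become the two ends of a link.
    return list(zip(stubs[::2], stubs[1::2]))
```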

Chapter 12 presented results from two Turing tests that were used to assess the realism of synthesized evolving networks. Pseudocode for the functions that generated synthetic evolving networks was presented in Figures 12.1, 12.2, and 12.3, and the implemented functions used the MBTI personality compatibility table (Table A.1).

13.1.1.1 Answer to research question 1.1

Do different personality compatibility tables with different probabilities of link formation affect the realism of the generated synthetic social networks? The network generation algorithms may affect the realism more than the compatibility table used by the algorithm. This section summarizes the findings from research that applied randomized methods using an MBTI personality compatibility table, customized versions of that compatibility table, and heuristic methods that used an IPD-generated compatibility table.


Chapter 8 explained a method for customizing the personality table specified in Table A.1 to produce preference matrices for the Stochastic Block Model (SBM). Tables C.1 through C.14 present the preference matrices, and Table 9.2 summarized the results from the SBM in comparison with the MC and GA methods, which used the personality compatibility table specified by Table A.1. Although the SBM method with the customized preference matrices performed better than the randomly generated networks, it did not perform as well as the MC- and GA-based methods.

Chapter 11 described the development of an IPD-generated compatibility table. Table 11.2 and the tables in Appendix E present results from the CDM and GNAC algorithms using the IPD-based compatibility table. The method of generating a compatibility table from an IPD tournament and the results from heuristic methods using an IPD-generated compatibility table were published in (O’Neil 2019c).

13.1.1.2 Answer to research question 1.2

Are the social networks generated using personality compatibility tables produced manually and automatically similarly realistic? Affirmative; this section identifies the chapters that described the manual and automated development of compatibility tables and the results summary tables indicating that both approaches were effective.

Chapter 6 explained the manual process that produced the MBTI personality compatibility table. Chapter 10 explained that the implementations of the CDM and GNAC algorithms in that chapter used an MBTI personality type compatibility table. A summary of the results was presented in Table 10.2.


Sections 11.1 and 11.2 described an automated process that applied the Iterated Prisoners’ Dilemma (IPD) to generate a compatibility table. Section 11.3 explained that the CDM and GNAC algorithms used an IPD strategy compatibility table specified in Table A.2. A summary of the results was presented in Table 11.3, which showed that the CDM with the IPD strategy compatibility table performed about as well as the CDM with the personality compatibility table. The MBTI personality type compatibility table was produced manually, whereas the IPD strategy table was produced automatically via an IPD tournament. Both compatibility tables produced similarly realistic social networks.

13.1.1.3 Answer to research question 1.3

Are there at least two effective algorithms for generating synthetic social networks from personality compatibility tables? Affirmative; this dissertation described five such algorithms.

Chapter 9 presented two randomized methods, MC and GA. Chapter 10 presented two heuristic methods, PS and CDM. Another algorithm, for synthesizing evolving networks, was presented in Chapter 12. These five algorithms effectively synthesized social networks, as demonstrated by the results in Tables 9.2, 10.2, and 11.4. Two Turing tests assessed the realism of networks synthesized with the algorithm described in Chapter 12; their results were presented in Tables 12.2 and 12.3. For 12 of the 13 questions across the two tests, the significance tests indicated that participants could not distinguish the synthesized evolving social networks from the real-world exemplar evolving social networks any better than random guessing.


13.1.2 Answer to research question 2

If effective social network synthesis algorithms driven by personality compatibility tables exist, how computationally efficient are they?

The GA, MC, and PS algorithms may require many hours of processor time and thousands or millions of metric calculations. Section 9.9.2 addresses the computational efficiency of the GA, MC, and SBM methods. Table 9.3 presented the number of evaluations, which indicated that the SBM was the most computationally efficient method and the MC was the least computationally efficient. The CDM and GNAC algorithms implemented in R produced results for the 14 exemplar networks within 30 minutes on a ten-year-old laptop computer.

13.1.2.1 Answer to research question 2.1

What machine-independent metric can be used to measure such algorithms’ efficiency? One machine-independent metric for measuring efficiency is the number of evaluations. Counting the evaluations, or the basic operations within them, indicates the amount of computational work performed by an algorithm independently of the hardware that runs it. Another machine-independent metric is asymptotic complexity expressed in order (big-O) notation. Sections 9.9.2, 10.3, 10.4, and 10.5 explain the order notation produced from an analysis of the pseudocode of the MC, GA, PS, CDM, and GNAC algorithms.
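Counting evaluations can be automated by wrapping the function that computes a network's metrics. In this Python sketch, `count_calls` and `evaluate` are hypothetical; `evaluate` is a stand-in that returns only the density. The count it produces is the same on any machine, which is what makes it a machine-independent measure.

```python
import functools

def count_calls(func):
    """Decorator that counts how many times the wrapped function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def evaluate(n, links):
    """Hypothetical stand-in for computing the metrics of one synthesized network."""
    return 2 * len(links) / (n * (n - 1))

# A search that evaluates 40 candidate networks performs 40 evaluations,
# regardless of how fast the computer running it is.
for _ in range(40):
    evaluate(11, [(0, 1), (1, 2)])
print(evaluate.calls)  # 40
```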

13.1.2.2 Answer to research question 2.2

How does the performance of the effective algorithms compare, in terms of the machine-independent metric? Based on the analysis presented in Sections 9.9.2, 10.3, 10.4, and 10.5, the complexities of the algorithms described in this dissertation are:

• O(n³) for the MC-based search algorithm
• O(n³) for the GA-based search algorithm
• O(n³) for the PS algorithm
• O(n log n) for the CDM algorithm
• O(n⁴) for the GNAC algorithm, which includes the CDM complexity

Table 9.3 presents the number of evaluations performed by the MC, GA, and SBM methods. The mean number of evaluations for the MC was more than 50,000. For the GA method, the mean number of evaluations was more than 28,000. The SBM required only 40 evaluations; however, it required customized compatibility tables, and it did not perform as well as the MC and GA with respect to realism. For the 14 exemplar networks, running the functions that implemented the MC, GA, and PS algorithms required resources at the Alabama Supercomputer Center, whereas the functions that implemented the CDM and GNAC algorithms ran on a 10-year-old laptop computer in ~30 minutes.

13.1.3 Answer to research question 3

Can a “closed-loop” simulation, where a social network is incrementally updated in an agent-based model, lead to an increasingly realistic social network? Affirmative; Chapter 7 presented the research related to agent-based models implemented in Netlogo.

Section 7.5 described a closed-loop agent-based model in which metrics were calculated after each link was added to a network. Figure 7.3 presented plots of several network metrics that approach realistic values, as defined by the metrics of a real-world exemplar network.


13.1.3.1 Answer to research question 3.1

What actions or events in the agent-based model should modify the social network? In an agent-based model, proximity of the agents could trigger a function to look up the compatibility of the personalities assigned to the two agents. In the Netlogo models described in Chapter 7, each possible pair of nodes was reviewed with respect to personality compatibility.

13.1.3.2 Answer to research question 3.2

How should the agent-based model use a personality compatibility table? A personality compatibility table could be used in simulations of interacting teams.

Section 7.7 discusses prior work on the application of agent-based models in organization simulation and presents the idea of applying a personality table to specify link weights that affect negotiations among the interacting teams.

13.1.3.3 Answer to research question 3.3

What stopping criteria should be used to reach or preserve realism? In the agent-based models described in Chapter 7, the stopping criterion was the overall network density. When the density of the synthesized network reached the density of the real-world exemplar network, the code stopped adding links to the network.
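The closed-loop procedure with a density stopping criterion can be sketched as follows. This Python sketch is hypothetical: it visits node pairs in random order, adds a link with the probability given by the compatibility of the two nodes' personality types, and stops as soon as the density reaches the exemplar's density. Unlike the Netlogo models, it does not record other metrics after each link, and the `max_sweeps` guard is an added assumption that prevents an endless loop when the table's probabilities are too low to reach the target. It assumes at least two nodes.

```python
import itertools
import random

def grow_until_density(types, table, target_density, max_sweeps=1000, seed=None):
    """Closed-loop growth: after each added link, re-check the stopping
    criterion (network density) and stop once it reaches the target."""
    rng = random.Random(seed)
    n = len(types)
    max_links = n * (n - 1) // 2
    pairs = list(itertools.combinations(range(n), 2))
    links = set()
    for _ in range(max_sweeps):
        if len(links) / max_links >= target_density:
            break
        rng.shuffle(pairs)
        for i, j in pairs:
            if (i, j) not in links and rng.random() < table[types[i]][types[j]]:
                links.add((i, j))
                # Stopping criterion, re-evaluated after every new link.
                if len(links) / max_links >= target_density:
                    return links
    return links
```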

13.1.4 Answer to research question 4

Can social network synthesis algorithms produce sequences of synthetic social networks? Affirmative; Chapter 12 described methods and results for synthesizing evolving social networks. Table 12.1 identified longitudinal data sets for real-world exemplars of evolving social networks. Section 12.2 presented pseudocode for an algorithm to synthesize evolving social networks.

13.1.4.1 Answer to research question 4.1

What social network metrics are useful as dimensions in social trajectory analysis?

Section 12.3 explained the concept of social trajectories and asserted that plotting an individual’s degree centrality, betweenness centrality, and eigencentrality values across a series of networks could depict a social trajectory. Figures 12.4 and 12.5 presented a 3D plot of a social trajectory and pairwise scatter plots of the three metrics.

13.1.4.2 Answer to research question 4.2

How can the realism of social trajectories be measured? Measuring the realism of an evolving social network is easier than measuring the realism of an individual social trajectory within a synthetic evolving network.

Measuring the realism of a specific social trajectory is difficult because individual nodes in a stochastically synthesized social network will have different degrees and be members of different paths than the corresponding nodes from real-world exemplar networks. Thus, the social trajectory of a particular individual in a synthesized social network can look quite different from the social trajectory of the same individual in the exemplar social network. Figure 12.4 and Figure 12.5 demonstrate the difference in the social trajectories of the same individuals from the real-world exemplar series of social networks and synthesized series of social networks.

Chapter 12 presented the results from two Turing tests that compared synthetic evolving social networks with real-world exemplar evolving social networks. The results indicated that the algorithm produced realistic evolving networks. Though each individual social trajectory within a synthetic evolving social network can differ considerably from the corresponding individual’s social trajectory in the exemplar network, the integrated set of trajectories produces a realistic evolving social network.

13.1.4.3 Answer to research question 4.3

What algorithm or algorithms produce realistic social trajectories? Any network generation method applied to producing a series of social networks can produce social trajectories. All of them will have the problem described in Section 13.1.4.2. A potential approach to ensuring that the social trajectory of an individual in a synthesized series of social networks resembles the social trajectory of the same individual in an exemplar series could be to fix the degree of that node and ensure that the number of paths that include that node matches the number of paths that include the same node in the exemplar networks.

Eigencentrality depends upon the centralities, and hence the connectivity, of neighboring nodes, which this approach would not fix. Thus, that metric could vary, but the variation may not significantly affect the overall shape of the simulated social trajectory. The future work section will expound upon this research opportunity.

13.2 Conclusions

Crew compatibility can contribute to mission success, especially when the environment is isolated, confined, and hazardous. This dissertation presented methods for developing personality compatibility tables and algorithms that apply those tables to stochastically synthesize social networks. A literature search indicated that the current trend in social network analysis is toward large-scale online communities. Data sets for relatively small social networks are scarce, and there are even fewer data sets for evolving social networks. The scarcity of this type of empirical data highlights the distinctiveness of research on relatively small social networks.

This dissertation presented five algorithms for finding optimum assignments of personality types and generating synthetic social networks based on those personality assignments and a personality compatibility table. These algorithms are designed to work with any reasonable, internally consistent personality compatibility table. Within the context of this research, reasonable means a manageable number of discrete personality types that can be understood, and internally consistent means that the table is symmetric: the link formation likelihood for a pair of types is the same regardless of which type is listed first. Three methods were described for producing compatibility tables. A manual method involved a subjective pairwise comparison of attitudes towards factors that engender harmonious, productive teams; this method could be applied to any discrete personality model. An automated method generated a compatibility table from an IPD tournament. The compatibility table generation algorithm for the IPD could be adapted to other repeated coordination games.

A semi-automated method produced custom compatibility tables, known as preference matrices for the SBM. The code for customizing the compatibility table could be adapted to work with larger or smaller tables.
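The internal-consistency requirement, a symmetric table, is easy to check before a table is used with any of the algorithms. A minimal Python sketch, assuming the table is stored as a dictionary of dictionaries keyed by type names; the function name and tolerance are hypothetical.

```python
def is_internally_consistent(table, tol=1e-9):
    """True if every type has an entry for every type (a square table) and
    table[a][b] equals table[b][a] within tol (a symmetric table)."""
    names = set(table)
    if any(set(row) != names for row in table.values()):
        return False
    return all(abs(table[a][b] - table[b][a]) <= tol for a in names for b in names)
```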

Except for the SBM, most network generators use structural metrics, such as network density and degree distribution, as input parameters. The SBM uses a preference matrix that provides probabilities of link formation among nodes of various types, known as blocks. These preference matrices can be derived from the link weights of an exemplar network, for example, the transportation of goods among companies or countries. The expected density of a network generated by the SBM is determined by the distribution of nodes among blocks and the probabilities in the preference matrix. This dissertation described a method for customizing a general-purpose personality compatibility table to produce a preference matrix for a network with a specific density.
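Because each pair of nodes is linked independently, the expected density of an SBM network follows from the block sizes n_k and the preference matrix P: the expected number of links is Σ_k P_kk n_k(n_k − 1)/2 + Σ_{k<l} P_kl n_k n_l, and dividing by n(n − 1)/2 gives the expected density. The Python sketch below computes this for an undirected SBM without self-loops; it illustrates the relationship only and is not the customization method of Chapter 8.

```python
def sbm_expected_density(block_sizes, pref):
    """Expected density of an undirected SBM without self-loops.

    block_sizes[k] is the number of nodes in block k, and pref[k][l] is the
    probability of a link between a node in block k and a node in block l."""
    n = sum(block_sizes)
    expected_links = 0.0
    for k, nk in enumerate(block_sizes):
        expected_links += pref[k][k] * nk * (nk - 1) / 2  # pairs within block k
        for l in range(k + 1, len(block_sizes)):
            expected_links += pref[k][l] * nk * block_sizes[l]  # pairs across blocks
    return expected_links / (n * (n - 1) / 2)
```

Scaling every entry of P by a common factor scales the expected density by the same factor, which is one simple way a general-purpose table could be adjusted toward a target density.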

While randomized methods such as Monte Carlo search and genetic algorithms are used to solve a wide variety of problems, applying these methods to find a variable-sized optimum set of personality types for a social network is novel. The algorithms for the heuristic methods and evolving networks are new. Novel aspects of the methods and algorithms presented in this dissertation include generality, flexibility, and adaptability.

Unlike the preference matrix of the SBM, the algorithms and personality compatibility table described in this dissertation can synthesize social networks with a variety of densities. The algorithms are flexible, because they can work with different types and sizes of compatibility tables. An MBTI personality table and an IPD strategy compatibility table were both used with the same network synthesis algorithm.

Compatibility table production methods are adaptable to various personality models, provided that those models can specify a reasonable number (~25 or fewer) of discrete types or blocks. For the MBTI personality compatibility table, a personality type generator was developed using an empirical distribution based on national statistics. To develop a similar IPD strategy generator, an evolutionary tournament was conducted to produce an empirical distribution for the selection of IPD strategies.

A novel concept that emerged from this research is the possibility of using IPD strategies as proxies for personality types. Characteristics of the OCEAN model, particularly conscientiousness, agreeableness, and neuroticism, can map to IPD strategies. For example, the level of conscientiousness can correspond to the forgiveness of a strategy. Agreeableness can map directly to the tendency to cooperate. Neuroticism can map to a random factor, i.e., the more neurotic, the more random the decisions.

Mapping IPD strategies to personality types defined by multiple personality models could unify complementary theories.

13.3 Future Work

This area of research offers many opportunities for sociology experiments, analyzing data from enterprise social networks, and applying existing code libraries to produce personality compatibility tables. This section presents a few ideas for future work that involve discretizing continuous personality models, conducting IPD tournaments to determine correlations between personality types and IPD strategies, and developing code using the Axelrod Python library and RSiena package to produce compatibility tables.

Methods for assigning personalities to nodes covered in this dissertation included MC- and GA-based searches, a probability search, and compatibility degree matching. Any method for assigning personalities to nodes should work with the G(n, A, C) algorithm to synthesize social networks. The MC, GA, and PS algorithms were computationally intensive and required the use of computing resources at the Alabama Supercomputer Center. The CDM algorithm is based upon the conjecture that people with many connections tend to have a more compatible personality than people with fewer connections. A follow-on research activity could develop fast personality assignment methods that are based upon empirical evidence or other assumptions about social network structures and personality compatibility.


The results in this dissertation demonstrated that the G(n, A, C) algorithm can work with more than one type of compatibility table. Personality models such as the Five Factor Model (a.k.a. OCEAN) and the Dominance, Influence, Steadiness, and Conscientiousness (DiSC) model are continuous models. A similar compatibility table construction process could be applied to the OCEAN or DiSC personality models, with the additional preliminary step of discretizing the continuous scale for each personality factor into a finite number of discrete values or intervals. One approach to discretizing OCEAN and DiSC would be to permute the capitalization of the letters to specify sets of personality types, where, for example, an uppercase letter denotes a high level of a factor and a lowercase letter a low level. For OCEAN, a discrete set of nine personality types could be {Ocean, oCean, oceAn, OcEan, oCEan, ocEAn, OcEaN, oCEaN, ocEAN}. For DiSC, a discrete set of nine personality types could be {Disc, dIsc, diSc, disC, DIsc, dISc, diSC, DisC, DISC}.
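The capitalization scheme suggested above can be sketched as a threshold on each factor score. In this hypothetical Python sketch, scores are assumed to be scaled to [0, 1], and a score at or above 0.5 counts as high; both the scaling and the threshold are assumptions. A full high/low split yields 2^5 = 32 OCEAN patterns and 2^4 = 16 DiSC patterns, so a set of nine types, as proposed above, would require choosing a subset of patterns or merging some of them.

```python
def discretize(scores: dict[str, float], order: str = "OCEAN",
               threshold: float = 0.5) -> str:
    """Encode continuous factor scores as a capitalization pattern:
    uppercase letter = high (score >= threshold), lowercase letter = low."""
    return "".join(f.upper() if scores[f] >= threshold else f.lower() for f in order)

print(discretize({"O": 0.2, "C": 0.7, "E": 0.9, "A": 0.4, "N": 0.1}))  # oCEan
print(discretize({"D": 0.8, "I": 0.3, "S": 0.2, "C": 0.6}, order="DISC"))  # DisC
```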

With roots often traced to the fourth-century monk Evagrius Ponticus, the Enneagram is a model with nine discrete personality types. These personality types include Challenger, Peacemaker, Perfectionist, Helper, Performer, Romantic, Investigator, Loyalist, and Enthusiast (Chron 2016). These nine personality types could be mapped to the discretized sets of nine personality types for OCEAN and DiSC.

The Axelrod Python library includes functions for a growing number of IPD strategies; at the time of writing, it had ~200 strategies. Other functions enable evolutionary tournaments for determining empirical distributions for assignments. Future work could develop a methodology for mapping IPD strategies to various personality types. The methodology described in this dissertation works for discrete personality models. The number of factors in these models could be reduced, and discrete levels, such as high and low, could enable the development of a reasonably sized compatibility table. A methodology to map IPD strategies to the personality traits in OCEAN, DiSC, and MBTI could provide a standardized approach to sociological simulation. A potential sociology experiment could use questionnaires to determine the personality types of participants and then conduct round-robin IPD tournaments among the participants to determine whether correlations exist between personality types and IPD strategies.
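A round-robin IPD tournament of the kind proposed for such an experiment can be sketched without external libraries. This Python sketch is a hand-rolled stand-in, not the Axelrod library's API; it assumes the conventional payoff values (T = 5, R = 3, P = 1, S = 0), uses three simple strategies, and reports the mean payoff per round for each ordered pair of strategies. Converting such payoffs into a compatibility table, as done in Chapter 11, is not shown.

```python
from itertools import combinations_with_replacement

# Conventional payoffs: (my move, their move) -> my payoff (T=5, R=3, P=1, S=0).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def always_cooperate(my_history, their_history):
    return "C"

def play_match(s1, s2, rounds=10):
    """Play an iterated match and return the total payoff of each player."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2

def round_robin(strategies, rounds=10):
    """Every strategy meets every strategy, itself included, once.
    Returns the mean payoff per round for each ordered (player, opponent) pair."""
    results = {}
    for a, b in combinations_with_replacement(strategies, 2):
        sa, sb = play_match(strategies[a], strategies[b], rounds)
        results[(a, b)] = sa / rounds
        results[(b, a)] = sb / rounds
    return results

scores = round_robin({"TFT": tit_for_tat, "AllD": always_defect, "AllC": always_cooperate})
```

In a human-subject version of the experiment, each participant's recorded moves would replace one of the strategy functions.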

Looking beyond personalities, values in a compatibility table could represent the degree of difficulty of integrating two different types of technologies. A design structure matrix could define the links within a system network. A function could randomly select technologies from lists and assign the technologies to nodes in the network. The technology compatibility table would be used for assigning the weights to the links among the subsystems.

According to (Aiello 2012), there has been considerable research aimed at predicting the overall evolution of social networks, but very few attempts to predict future connections of individual people within such networks. With the rise of Enterprise Social Networks (ESN), it is now possible to collect and characterize communication among employees from threads in discussion forums. Research teams that have generated social network diagrams from ESN data include (Smith et al. 2009), (Behrendt et al. 2014), and (Friedman et al. 2014). Similar to social media websites, an ESN provides a personal profile page, where employees can publish information about their skills and interests. Organizations could encourage employees to post their personality types on their profile pages. Collecting personality types along with communication links would enable the extraction of SBM preference matrices, which could be generalized into a personality compatibility table.


Functions in the RSiena package provide the capability to produce a matrix that indicates the amount of influence that behaviors have in friendship formation. If one considers personality types as behaviors, then it would be possible to produce a behavior table to accompany a real-world social network. Using RSiena, it would be possible to produce a personality compatibility table, based upon empirical evidence.


APPENDICES


COMPATIBILITY TABLES

Table A.1 A Myers Briggs Type Indicator personality compatibility table

INTP

0.506 0.506 0.714 0.296 0.506 0.139 0.296 0.506 0.506 0.296 0.296 0.714 0.139 0.867 0.714 0.250

Table A.2 A Myers Briggs Type Indicator compatibility table

0.867 0.506 0.714 0.296 0.506 0.139 0.296 0.506 0.506 0.296 0.714 0.296 0.139 0.867 0.110 0.714

ENTP

INTJ

0.714 0.714 0.506 0.139 0.296 0.051 0.139 0.296 0.714 0.506 0.506 0.506 0.296 0.030 0.867 0.867

0.296 0.714 0.139 0.506 0.296 0.714 0.506 0.296 0.714 0.867 0.506 0.506 0.110 0.296 0.139 0.139

ENTJ

INFP

0.506 0.506 0.296 0.714 0.139 0.506 0.296 0.139 0.506 0.714 0.296 0.250 0.506 0.506 0.296 0.714

0.506 0.867 0.714 0.296 0.506 0.506 0.296 0.506 0.139 0.714 0.460 0.296 0.506 0.506 0.714 0.296

ENFP

INFJ

0.506 0.867 0.296 0.296 0.139 0.506 0.296 0.139 0.506 0.680 0.714 0.714 0.867 0.506 0.296 0.296

0.714 0.296 0.139 0.506 0.296 0.296 0.506 0.296 0.840 0.506 0.139 0.506 0.714 0.714 0.506 0.506

ENFJ

ISFJ

0.296 0.296 0.867 0.506 0.952 0.714 0.867 0.940 0.296 0.139 0.506 0.139 0.296 0.296 0.506 0.506

ISTJ

0.506 0.139 0.714 0.714 0.867 0.867 0.940 0.867 0.506 0.296 0.296 0.296 0.506 0.139 0.296 0.296

ESFJ

0.296 0.296 0.506 0.867 0.714 0.840 0.867 0.714 0.296 0.506 0.506 0.506 0.714 0.051 0.139 0.139

ESTJ

0.296 0.296 0.867 0.506 0.680 0.714 0.867 0.952 0.296 0.139 0.506 0.139 0.296 0.296 0.506 0.506

0.506 0.139 0.296 0.460 0.506 0.867 0.714 0.506 0.506 0.296 0.296 0.714 0.506 0.139 0.296 0.296

ESFP

ISTP

0.506 0.506 0.259 0.296 0.867 0.506 0.714 0.867 0.139 0.296 0.714 0.296 0.139 0.506 0.714 0.714

P

ISF

0.296 0.110 0.506 0.139 0.296 0.296 0.139 0.296 0.296 0.867 0.867 0.506 0.714 0.714 0.506 0.506

0.040 0.296 0.506 0.506 0.296 0.296 0.506 0.296 0.714 0.506 0.506 0.506 0.296 0.714 0.867 0.506

ESTP

ISFJ

ISTJ

ISFP INFJ

ISTP INTJ

ESFJ

ESTJ INFP

INTP

ESFP ENFJ

ENTJ

ESTP

ENFP

ENTP


Table A.2 Iterated Prisoners’ Dilemma strategy compatibility table

Machiavelli

0.685 0.237 0.098 0.284 0.693 0.638 0.299 0.685 0.592 0.596 0.693 0.681 0.697 0.681 0.704

Adaptive

0.27

0.681 0.188 0.065 0.689 0.564 0.299 0.685 0.574 0.525 0.538 0.673 0.689 0.689 0.681

Gradual

0.697 0.211 0.708 0.288 0.708 0.527 0.708 0.708 0.355 0.403 0.708 0.708 0.708 0.689 0.697

Pavlov

0.422 0.488 0.708 0.136 0.708 0.524 0.708 0.708 0.431 0.442 0.708 0.708 0.708 0.673 0.681

Naïve Peace Maker

0.48

0.665 0.708 0.255 0.708 0.596 0.708 0.708 0.485 0.668 0.708 0.708 0.708 0.538 0.693

Remorseful

0.48

0.481 0.288 0.673 0.511 0.336 0.604 0.334 0.644 0.668 0.442 0.403 0.525 0.596

Prober 0.624

Naïve Prober

0.336 0.462 0.628 0.288 0.346 0.415 0.329 0.543 0.328 0.334 0.485 0.431 0.355 0.574 0.592

Soft Grudger

0.685 0.444 0.708 0.181 0.708 0.632 0.708 0.708 0.543 0.604 0.708 0.708 0.708 0.685 0.685

Grudger

0.296 0.145 0.708 0.288 0.708 0.364 0.708 0.708 0.329 0.336 0.708 0.708 0.708 0.299 0.299

Suspicious

0.5

0.292 0.701 0.292 0.494 0.296 0.685 0.336 0.481 0.665 0.422 0.697 0.681 0.685

Tit for Tat 0.499

Tit for Tat

0.56

0.494 0.672 0.269 0.515 0.364 0.632 0.415 0.511 0.596 0.524 0.527 0.564 0.638

and Random 0.499

Tit for Tat

0.5

0.49 0.56

0.708 0.288 0.708 0.708 0.708 0.346 0.673 0.708 0.708 0.708 0.689 0.693

Always

0.05 0.27

0.292 0.292 0.288 0.269 0.288 0.181 0.288 0.288 0.255 0.136 0.288 0.284

Defect 0.135

Always

0.05

0.701 0.708 0.708 0.672 0.708 0.708 0.628 0.624 0.708 0.708 0.708 0.065 0.098

Cooperate 0.286

Random

0.49 0.48 0.48

0.499 0.495 0.286 0.135 0.499 0.145 0.444 0.462 0.488 0.211 0.188 0.237


RESULTS FROM AGENT-BASED MODELING WITH MONTE CARLO SEARCH

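In the Appendix B tables, a green background marks a |T − Sx̄| value that is smaller than the corresponding |T − Rx̄| value, and a gold background marks a |T − Sx̄| value that is larger than |T − Rx̄|.

Table B.1 Robins Australian Bank ABM MC vs. Random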

Robins Bank                             T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  11    11.000     0.000        11         0
Links                                  16    16.000     0.000    15.667     0.333
Components                              1     1.433     0.433     1.433     0.433
Network density                     0.291     0.291     0.000     0.285     0.006
Average degree                      2.909     2.909     0.000     2.849     0.061
Standard deviation degree           1.868     1.860     0.008     1.358     0.510
Global clustering coefficient       0.375     0.305     0.070     0.284     0.091
Average cluster coefficient         0.294     0.308     0.014     0.257     0.037
Mean path length                    2.018     1.912     0.106     2.054     0.035
Average betweenness                 5.091     4.386     0.705     5.014     0.077
Maximum betweenness                25.167    17.547     7.620    16.492     8.675
Average closeness                   0.518     0.543     0.026     0.510     0.008
Minimum closeness                   0.385     0.397     0.012     0.370     0.015
Average eigencentrality             0.492     0.563     0.071     0.612     0.120
Network radius                      2.000     3.167     1.167     3.200     1.200
Average eccentricity                3.727     3.542     0.185     3.923     0.195
Network diameter                    4.000     3.700     0.300     4.133     0.133

Table B.2 Roethlisberger & Dickson Wiring Room ABM MC vs. Random

Roethlisberger & Dickson                T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  14    14.000     0.000        14         0
Links                                  13    14.000     1.000    13.033     0.033
Components                              6     3.033     2.967    3.2333     2.766
Network density                     0.143     0.154     0.011    0.1445     0.002
Average degree                      1.857     2.000     0.143    1.8762     0.019
Standard deviation degree           1.610     1.919     0.309    1.1747     0.435
Global clustering coefficient       0.643     0.168     0.475    0.1227     0.520
Average cluster coefficient         0.405     0.154     0.251    0.0882     0.316
Mean path length                    2.222     2.301     0.079    2.4272     0.205
Average betweenness                 4.889     7.202     2.313    7.3131     2.424
Maximum betweenness                16.000    33.797    17.797   23.9639     7.964
Average closeness                   0.466     0.458     0.009    0.4452     0.021
Minimum closeness                   0.333     0.341     0.008    0.3226     0.011
Average eigencentrality             0.594     0.461     0.133    0.5728     0.022
Network radius                      5.000     3.633     1.367    3.9667     1.033
Average eccentricity                5.000     4.116     0.884    4.7154     0.285
Network diameter                    5.000     4.200     0.800     4.900     0.100
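The green/gold convention described before Table B.1 reduces to a simple comparison per metric. The sketch below applies it to the Table B.1 (Robins Bank) values. It assumes that the colored cells are the |T − Sx̄| values, and it uses the reported deviation columns directly rather than recomputing them from T, Sx̄, and Rx̄, because the tabulated means are rounded (recomputing mean path length from the rounded values gives |2.018 − 2.054| = 0.036, not the listed 0.035). The names `TABLE_B1`, `classify`, and `summarize` are illustrative and not taken from the dissertation's code.

```python
# Hypothetical helper for reading Appendix B tables: label each metric by
# whether the ABM MC mean (Sx̄) or the random-network mean (Rx̄) lies closer
# to the target value T. The green/gold reading is an assumption based on
# the note before Table B.1, not code from the dissertation.

# (metric, |T - Sx̄|, |T - Rx̄|) as reported in Table B.1 (Robins Bank)
TABLE_B1 = [
    ("Nodes", 0.000, 0.0),
    ("Links", 0.000, 0.333),
    ("Components", 0.433, 0.433),
    ("Network density", 0.000, 0.006),
    ("Average degree", 0.000, 0.061),
    ("Standard deviation degree", 0.008, 0.510),
    ("Global clustering coefficient", 0.070, 0.091),
    ("Average cluster coefficient", 0.014, 0.037),
    ("Mean path length", 0.106, 0.035),
    ("Average betweenness", 0.705, 0.077),
    ("Maximum betweenness", 7.620, 8.675),
    ("Average closeness", 0.026, 0.008),
    ("Minimum closeness", 0.012, 0.015),
    ("Average eigencentrality", 0.071, 0.120),
    ("Network radius", 1.167, 1.200),
    ("Average eccentricity", 0.185, 0.195),
    ("Network diameter", 0.300, 0.133),
]


def classify(dev_s: float, dev_r: float) -> str:
    """Return 'green' if Sx̄ is closer to T, 'gold' if Rx̄ is closer, else 'tie'."""
    if dev_s < dev_r:
        return "green"
    if dev_s > dev_r:
        return "gold"
    return "tie"


def summarize(rows):
    """Count how many metrics fall into each category."""
    counts = {"green": 0, "gold": 0, "tie": 0}
    for _metric, dev_s, dev_r in rows:
        counts[classify(dev_s, dev_r)] += 1
    return counts


if __name__ == "__main__":
    for metric, dev_s, dev_r in TABLE_B1:
        print(f"{metric:<31}{classify(dev_s, dev_r)}")
    print(summarize(TABLE_B1))  # {'green': 11, 'gold': 4, 'tie': 2}
```

Ties (here Nodes and Components) are kept separate, since the note does not say how equal deviations were colored.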


Table B.3 Thurman Office ABM MC vs. Random

Thurman Office                          T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  15    15.000     0.000        15         0
Links                                  33    34.000     1.000    34.600     1.600
Components                              1     1.233     0.233    1.0667     0.067
Network density                     0.314     0.324     0.010     0.330     0.015
Average degree                      4.400     4.533     0.133     4.613     0.213
Standard deviation degree           2.530     2.574     0.045     1.665     0.865
Global clustering coefficient       0.516     0.374     0.142     0.331     0.185
Average cluster coefficient         0.414     0.426     0.013     0.345     0.069
Mean path length                    1.876     1.780     0.096     1.818     0.058
Average betweenness                 6.133     5.380     0.754     5.696     0.438
Maximum betweenness                37.248    24.347    12.900    17.536    19.711
Average closeness                   0.546     0.576     0.030     0.561     0.015
Minimum closeness                   0.424     0.429     0.005     0.440     0.016
Average eigencentrality             0.528     0.574     0.046     0.610     0.082
Network radius                      3.000     2.800     0.200     2.733     0.267
Average eccentricity                3.000     3.102     0.102     3.137     0.137
Network diameter                    3.000     3.233     0.233     3.233     0.233

Table B.4 Sampson Monastery ABM MC vs. Random

Sampson Monastery                       T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  18    18.000     0.000    18.000         0
Links                                  41    42.000     1.000    40.933    0.0667
Components                              1     1.233     0.233     1.033    0.0333
Network density                     0.268     0.275     0.007     0.266    0.0019
Average degree                      4.556     4.667     0.111     4.522    0.0334
Standard deviation degree           2.093     2.988     0.895     1.721    0.3712
Global clustering coefficient       0.262     0.351     0.089     0.266    0.0038
Average cluster coefficient         0.285     0.419     0.134     0.280    0.0054
Mean path length                    1.967     1.894     0.073     1.979    0.0119
Average betweenness                 8.222     7.500     0.722     8.310    0.0877
Maximum betweenness                37.623    38.996     1.373    26.158   11.4649
Average closeness                   0.518     0.543     0.025     0.516    0.0014
Minimum closeness                   0.405     0.394     0.011     0.393    0.0118
Average eigencentrality             0.481     0.522     0.041     0.575    0.0946
Network radius                      3.000     3.100     0.100     3.100       0.1
Average eccentricity                3.778     3.642     0.136     3.621    0.1567
Network diameter                    4.000     3.800     0.200     3.767    0.2333


Table B.5 Krackhardt Managers ABM MC vs. Random

Krackhardt Managers                     T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  21    21.000     0.000        21         0
Links                                  36    36.000     0.000    36.733     0.733
Components                              5     2.767     2.233     1.400     3.600
Network density                     0.171     0.171     0.000     0.175     0.004
Average degree                      3.429     3.429     0.000     3.498     0.070
Standard deviation degree           2.135     3.326     1.191     1.659     0.476
Global clustering coefficient       0.496     0.250     0.246     0.179     0.317
Average cluster coefficient         0.474     0.381     0.092     0.169     0.305
Mean path length                    2.434     2.023     0.411     2.406     0.028
Average betweenness                11.471     9.355     2.116    13.753     2.282
Maximum betweenness                44.667    60.496    15.829    51.362     6.696
Average closeness                   0.422     0.510     0.088     0.429     0.007
Minimum closeness                   0.296     0.386     0.089     0.314     0.018
Average eigencentrality             0.450     0.447     0.003     0.489     0.039
Network radius                      4.000     3.133     0.867     4.000     0.000
Average eccentricity                4.941     3.577     1.364     4.668     0.273
Network diameter                    5.000     3.667     1.333     4.833     0.167

Table B.6 Krackhardt Office ABM MC vs. Random

Krackhardt Office                       T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  21    21.000     0.000        21         0
Links                                  14    15.000     1.000        14         0
Components                              9     8.300     0.700     8.867    0.1333
Network density                     0.067     0.071     0.004     0.063    0.0042
Average degree                      1.333     1.429     0.096     1.251    0.0825
Standard deviation degree           1.390     2.301     0.911     1.020    0.3703
Global clustering coefficient       0.125     0.084     0.041     0.081    0.0438
Average cluster coefficient         0.060     0.086     0.026     0.031    0.0293
Mean path length                    2.382     2.108     0.273     2.409     0.027
Average betweenness                 6.909     7.059     0.150     6.388    0.5209
Maximum betweenness                22.500    56.683    34.183    20.778    1.7222
Average closeness                   0.433     0.496     0.062     0.460    0.0272
Minimum closeness                   0.303     0.391     0.088     0.335    0.0324
Average eigencentrality             0.471     0.399     0.071     0.568    0.0971
Network radius                      4.000     2.933     1.067     4.133    0.1333
Average eccentricity                4.818     3.316     1.502     4.729    0.0889
Network diameter                    5.000     3.367     1.633     4.900       0.1


Table B.7 Schwimmer Taro Exchange ABM MC vs. Random

Schwimmer Taro Ex.                      T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  22    22.000     0.000    22.000    0.0000
Links                                  39    39.000     0.000    39.750    0.7500
Components                              1     2.433     1.433     1.500    0.5000
Network density                    0.1688     0.169     0.000    0.1721    0.0033
Average degree                     3.5455     3.545     0.001    3.6152    0.0697
Standard deviation degree          0.9625     3.657     2.695    1.7021    0.7396
Global clustering coefficient      0.2752     0.238     0.038    0.1676    0.1076
Average cluster coefficient        0.3394     0.402     0.062    0.1559    0.1835
Mean path length                   2.4935     2.007     0.486    2.4319    0.0616
Average betweenness               15.6818     9.876     5.805   14.7772    0.9046
Maximum betweenness               46.3833    78.487    32.104   52.0263    5.6430
Average closeness                  0.4047     0.513     0.109    0.4257    0.0210
Minimum closeness                  0.3387     0.381     0.042    0.2947    0.0440
Average eigencentrality            0.6149     0.412     0.202     0.487    0.1277
Network radius                          3     3.100     0.100     4.267    1.2667
Average eccentricity               4.6364     3.644     0.993     4.995    0.3587
Network diameter                        5     3.767     1.233     5.233    0.2333

Table B.8 Webster Accounting Firm ABM MC vs. Random

Webster Accounting                      T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  24    24.000     0.000        24     0.000
Links                                 150   151.000     1.000   149.533     0.467
Components                              2     1.000     1.000         1         1
Network density                     0.544     0.547     0.004     0.542     0.002
Average degree                     12.500    12.583     0.083    12.461     0.039
Standard deviation degree           5.509     3.446     2.063     2.313     3.196
Global clustering coefficient       0.932     0.589     0.343     0.537     0.395
Average cluster coefficient         0.939     0.609     0.330     0.539     0.399
Mean path length                    1.617     1.454     0.163     1.458     0.158
Average betweenness                 6.783     5.215     1.568     5.271     1.512
Maximum betweenness                26.804    13.698    13.106    10.730    16.073
Average closeness                   0.642     0.695     0.054     0.689     0.048
Minimum closeness                   0.373     0.572     0.200     0.607     0.234
Average eigencentrality             0.654     0.712     0.058     0.752     0.098
Network radius                      3.000     2.000     1.000     2.033     0.967
Average eccentricity                3.696     2.111     1.585     2.033     1.662
Network diameter                    4.000     2.133     1.867     2.033     1.967


Table B.9 Zachary Karate Club ABM MC vs. Random

Zachary Karate Club                     T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  34    34.000     0.000        34         0
Links                                  78    78.000     0.000    77.233     0.767
Components                              1     2.100     1.100     1.067     0.067
Network density                     0.139     0.139     0.000     0.145     0.005
Average degree                      4.588     4.588     0.000     4.773     0.184
Standard deviation degree           3.878     5.384     1.506     1.957     1.921
Global clustering coefficient       0.256     0.208     0.047     0.142     0.113
Average cluster coefficient         0.571     0.503     0.067     0.145     0.425
Mean path length                    2.408     2.031     0.377     2.361     0.047
Average betweenness                23.235    16.461     6.774    22.407     0.829
Maximum betweenness               231.071   160.366    70.705    79.619   151.453
Average closeness                   0.427     0.504     0.077     0.431     0.004
Minimum closeness                   0.285     0.377     0.093     0.319     0.035
Average eigencentrality             0.392     0.358     0.034     0.490     0.098
Network radius                      4.000     3.100     0.900     3.733     0.267
Average eccentricity                4.794     3.637     1.157     4.540     0.255
Network diameter                    5.000     3.733     1.267     4.700     0.300

Table B.10 Bernard & Killworth Technical ABM MC vs. Random

Bernard & Killworth Tech.               T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  34    34.000     0.000        34         0
Links                                 175   175.000     0.000     174.7       0.3
Components                              1     1.000     0.000     1.000         0
Network density                     0.312     0.312     0.000     0.311     0.001
Average degree                     10.294    10.294     0.000    10.276     0.018
Standard deviation degree           4.629     6.289     1.660     2.577     2.052
Global clustering coefficient       0.476     0.423     0.053     0.311     0.165
Average cluster coefficient         0.460     0.563     0.103     0.314     0.147
Mean path length                    1.808     1.718     0.089     1.718     0.090
Average betweenness                13.324    11.848     1.475    11.846     1.477
Maximum betweenness                63.291    62.923     0.367    30.957    32.334
Average closeness                   0.564     0.592     0.028     0.585     0.021
Minimum closeness                   0.393     0.461     0.068     0.493     0.101
Average eigencentrality             0.527     0.521     0.006     0.644     0.117
Network radius                      2.000     2.500     0.500     2.300     0.300
Average eccentricity                3.765     3.018     0.747     2.944     0.821
Network diameter                    4.000     3.067     0.933     3.000     1.000


Table B.11 Bernard & Killworth Office ABM MC vs. Random

Bernard & Killworth Office              T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  40    40.000     0.000        40         0
Links                                 238   238.000     0.000     240.6     2.600
Components                              1     1.000     0.000         1         0
Network density                     0.305     0.305     0.000     0.309     0.003
Average degree                     11.900    11.900     0.000    12.030     0.130
Standard deviation degree           4.477     7.139     2.663     2.883     1.594
Global clustering coefficient       0.409     0.402     0.007     0.307     0.102
Average cluster coefficient         0.430     0.546     0.116     0.310     0.120
Mean path length                    1.764     1.713     0.051     1.708     0.056
Average betweenness                14.900    13.903     0.998    13.802     1.098
Maximum betweenness                46.128    69.012    22.884    34.327    11.801
Average closeness                   0.573     0.592     0.019     0.587     0.014
Minimum closeness                   0.406     0.478     0.071     0.512     0.106
Average eigencentrality             0.583     0.534     0.049     0.652     0.070
Network radius                      3.000     2.367     0.633     2.500     0.500
Average eccentricity                3.600     2.939     0.661     2.953     0.648
Network diameter                    4.000     3.000     1.000     3.000     1.000

Table B.12 Krebs IT Department (Advice) ABM MC vs. Random

Krebs IT Department                     T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  56    56.000     0.000    56.000     0.000
Links                                 203   203.000     0.000   200.667     2.333
Components                              2     1.333     0.667     1.033     0.967
Network density                     0.132     0.132     0.000     0.128     0.004
Average degree                      7.250     7.250     0.000     7.033     0.217
Standard deviation degree           4.179     9.219     5.040     2.414     1.765
Global clustering coefficient       0.350     0.216     0.134     0.126     0.224
Average cluster coefficient         0.409     0.653     0.243     0.127     0.282
Mean path length                    2.370     1.953     0.416     2.246     0.123
Average betweenness                36.982    26.058    10.924    34.259     2.723
Maximum betweenness               262.141   308.894    46.753   114.225   147.916
Average closeness                   0.430     0.520     0.091     0.449     0.019
Minimum closeness                   0.305     0.398     0.093     0.345     0.040
Average eigencentrality             0.316     0.314     0.002     0.507     0.191
Network radius                      4.000     2.867     1.133     3.367     0.633
Average eccentricity                4.164     3.441     0.723     4.034     0.130
Network diameter                    5.000     3.533     1.467     4.100     0.900


Table B.13 Lazega Law Firm ABM MC vs. Random

Lazega Law Firm                         T        Sx̄  |T − Sx̄|        Rx̄  |T − Rx̄|
Nodes                                  71    71.000     0.000        71         0
Links                                 726   727.000     1.000   717.867     8.133
Components                              1         1         0         1         0
Network density                     0.292     0.293     0.001     0.289     0.003
Average degree                     20.451    20.479     0.028    20.222     0.229
Standard deviation degree           8.096    13.612     5.516     3.772     4.324
Global clustering coefficient       0.444     0.410     0.034     0.287     0.157
Average cluster coefficient         0.452     0.597     0.145     0.288     0.164
Mean path length                    1.751     1.709     0.042     1.713     0.039
Average betweenness                26.296    24.822     1.474    24.944     1.352
Maximum betweenness               106.694   125.214    18.520    54.208    52.486
Average closeness                   0.576     0.594     0.018     0.585     0.009
Minimum closeness                   0.446     0.512     0.066     0.539     0.093
Average eigencentrality             0.449     0.492     0.044     0.688     0.239
Network radius                      3.000     2.100     0.900     2.033     0.967
Average eccentricity                3.000     2.730     0.270     2.740     0.260
Network diameter                    3.000     2.900     0.100     2.933     0.067


CUSTOM COMPATIBILITY TABLES FOR STOCHASTIC BLOCK MODELING

Table C.1 Robins Australian Bank preference matrix for SBM

ISFP

0.326 0.326 0.460 0.191 0.326 0.090 0.191 0.326 0.326 0.191 0.191 0.460 0.090 0.559 0.460 0.161

ISTP

0.559 0.326 0.460 0.191 0.326 0.090 0.191 0.326 0.326 0.191 0.460 0.191 0.090 0.559 0.071 0.460

0.460 0.460 0.326 0.090 0.191 0.038 0.090 0.191 0.460 0.326 0.326 0.326 0.191 0.030 0.559 0.559

ESFP

0.191 0.460 0.090 0.326 0.191 0.460 0.326 0.191 0.460 0.559 0.326 0.326 0.071 0.191 0.090 0.090

ESTP

0.326 0.326 0.191 0.460 0.090 0.326 0.191 0.090 0.326 0.460 0.191 0.161 0.326 0.326 0.191 0.460

INFP

INFJ

0.326 0.559 0.460 0.191 0.326 0.326 0.191 0.326 0.090 0.460 0.296 0.191 0.326 0.326 0.460 0.191

0.326 0.559 0.191 0.191 0.090 0.326 0.191 0.090 0.326 0.438 0.460 0.460 0.559 0.326 0.191 0.191

ENFP

0.460 0.191 0.090 0.326 0.191 0.191 0.326 0.191 0.541 0.326 0.090 0.326 0.460 0.460 0.326 0.326

ENFJ

ISFJ

0.191 0.191 0.559 0.326 0.952 0.460 0.559 0.606 0.191 0.090 0.326 0.090 0.191 0.191 0.326 0.326

ISTJ

0.326 0.090 0.460 0.460 0.559 0.559 0.606 0.559 0.326 0.191 0.191 0.191 0.326 0.090 0.191 0.191

0.191 0.191 0.326 0.559 0.460 0.541 0.559 0.460 0.191 0.326 0.326 0.326 0.460 0.038 0.090 0.090

ESFJ

0.191 0.191 0.559 0.326 0.438 0.460 0.559 0.952 0.191 0.090 0.326 0.090 0.191 0.191 0.326 0.326

ESTJ

0.326 0.090 0.191 0.296 0.326 0.559 0.460 0.326 0.326 0.191 0.191 0.460 0.326 0.090 0.191 0.191

INTP

0.326 0.326 0.167 0.191 0.559 0.326 0.460 0.559 0.090 0.191 0.460 0.191 0.090 0.326 0.460 0.460

INTJ

0.191 0.071 0.326 0.090 0.191 0.191 0.090 0.191 0.191 0.559 0.559 0.326 0.460 0.460 0.326 0.326

ENTP

0.040 0.191 0.326 0.326 0.191 0.191 0.326 0.191 0.460 0.326 0.326 0.326 0.191 0.460 0.559 0.326

ENTJ

Block sizes: 0 1 1 2 0 0 0 0 1 1 1 2 0 2 0 0
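Each preference matrix in this appendix is paired with a block-size row: one count per row of the matrix, some of them zero. The sketch below shows how such a pair could drive a generic undirected stochastic block model sampler, treating each matrix entry as a link probability between two blocks. It is an illustration, not the dissertation's implementation. The function name `sample_sbm`, the consecutive node numbering, and the toy three-block values are all assumptions. Note also that, because node pairs are visited with the lower-numbered node first, the sampler reads only entries whose row block is at or before the column block. For an asymmetric matrix, that convention is itself an assumption.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple


def sample_sbm(
    block_sizes: Sequence[int],
    prefs: Sequence[Sequence[float]],
    seed: Optional[int] = None,
) -> Tuple[List[int], Set[Tuple[int, int]]]:
    """Sample an undirected stochastic block model graph (illustrative sketch).

    block_sizes[b] is the number of nodes placed in block b (zero is allowed),
    and prefs[a][b] is treated as the probability of a link between a node in
    block a and a node in block b.
    """
    if len(prefs) != len(block_sizes):
        raise ValueError("prefs must have one row per block")
    rng = random.Random(seed)
    # Number nodes consecutively, block by block; empty blocks add no nodes.
    membership = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges: Set[Tuple[int, int]] = set()
    n = len(membership)
    for i in range(n):
        for j in range(i + 1, n):
            # i < j and membership is non-decreasing, so this reads only
            # entries with row block <= column block.
            p = prefs[membership[i]][membership[j]]
            if rng.random() < p:
                edges.add((i, j))
    return membership, edges


if __name__ == "__main__":
    # Toy three-block example; the values are illustrative, not from Table C.1.
    sizes = [2, 0, 3]
    prefs = [
        [0.5, 0.1, 0.2],
        [0.1, 0.3, 0.1],
        [0.2, 0.1, 0.6],
    ]
    membership, edges = sample_sbm(sizes, prefs, seed=42)
    print(membership)  # [0, 0, 2, 2, 2]
    print(sorted(edges))
```

Zero-sized blocks, which are common in the block-size rows above, simply contribute no nodes, so their rows and columns of the preference matrix are never read.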


Table C.2 Roethlisberger & Dickson Bank Wiring Room preference matrix for SBM

ISFP

0.146 0.146 0.207 0.086 0.146 0.040 0.086 0.146 0.146 0.086 0.086 0.207 0.040 0.251 0.207 0.072

ISTP

0.251 0.146 0.207 0.086 0.146 0.040 0.086 0.146 0.146 0.086 0.207 0.086 0.040 0.251 0.039 0.207

0.207 0.207 0.146 0.040 0.086 0.038 0.040 0.086 0.207 0.146 0.146 0.146 0.086 0.030 0.251 0.251

ESFP

0.086 0.207 0.040 0.146 0.086 0.207 0.146 0.086 0.207 0.251 0.146 0.146 0.039 0.086 0.040 0.040

ESTP

0.146 0.146 0.086 0.207 0.040 0.146 0.086 0.040 0.146 0.207 0.086 0.072 0.146 0.146 0.086 0.207

INFP

INFJ

0.146 0.251 0.207 0.086 0.146 0.146 0.086 0.146 0.040 0.207 0.133 0.086 0.146 0.146 0.207 0.086

0.146 0.251 0.086 0.086 0.040 0.146 0.086 0.040 0.146 0.197 0.207 0.207 0.251 0.146 0.086 0.086

ENFP

0.207 0.086 0.040 0.146 0.086 0.086 0.146 0.086 0.243 0.146 0.040 0.146 0.207 0.207 0.146 0.146

ENFJ

ISFJ

0.086 0.086 0.251 0.146 0.952 0.207 0.251 0.272 0.086 0.040 0.146 0.040 0.086 0.086 0.146 0.146

ISTJ

0.146 0.040 0.207 0.207 0.251 0.251 0.272 0.251 0.146 0.086 0.086 0.086 0.146 0.040 0.086 0.086

0.086 0.086 0.146 0.251 0.207 0.243 0.251 0.207 0.086 0.146 0.146 0.146 0.207 0.038 0.040 0.040

ESFJ

0.086 0.086 0.251 0.146 0.197 0.207 0.251 0.952 0.086 0.040 0.146 0.040 0.086 0.086 0.146 0.146

ESTJ

0.146 0.040 0.086 0.133 0.146 0.251 0.207 0.146 0.146 0.086 0.086 0.207 0.146 0.040 0.086 0.086

INTP

0.146 0.146 0.075 0.086 0.251 0.146 0.207 0.251 0.040 0.086 0.207 0.086 0.040 0.146 0.207 0.207

INTJ

0.086 0.039 0.146 0.040 0.086 0.086 0.040 0.086 0.086 0.251 0.251 0.146 0.207 0.207 0.146 0.146

ENTP

0.040 0.086 0.146 0.146 0.086 0.086 0.146 0.086 0.207 0.146 0.146 0.146 0.086 0.207 0.251 0.146

ENTJ

Block sizes: 1 0 0 1 0 0 1 1 2 3 0 1 2 1 1 0


Table C.3 Thurman Office preference matrix for SBM

ISFP

0.345 0.345 0.487 0.202 0.345 0.095 0.202 0.345 0.345 0.202 0.202 0.487 0.095 0.592 0.487 0.171

ISTP

0.592 0.345 0.487 0.202 0.345 0.095 0.202 0.345 0.345 0.202 0.487 0.202 0.095 0.592 0.075 0.487

0.487 0.487 0.345 0.095 0.202 0.040 0.095 0.202 0.487 0.345 0.345 0.345 0.202 0.040 0.592 0.592

ESFP

0.202 0.487 0.095 0.345 0.202 0.487 0.345 0.202 0.487 0.592 0.345 0.345 0.075 0.202 0.095 0.095

ESTP

0.345 0.345 0.202 0.487 0.095 0.345 0.202 0.095 0.345 0.487 0.202 0.171 0.345 0.345 0.202 0.487

INFP

INFJ

0.345 0.592 0.487 0.202 0.345 0.345 0.202 0.345 0.095 0.487 0.314 0.202 0.345 0.345 0.487 0.202

0.345 0.592 0.202 0.202 0.095 0.345 0.202 0.095 0.345 0.464 0.487 0.487 0.592 0.345 0.202 0.202

ENFP

0.487 0.202 0.095 0.345 0.202 0.202 0.345 0.202 0.573 0.345 0.095 0.345 0.487 0.487 0.345 0.345

ENFJ

ISFJ

0.202 0.202 0.592 0.345 0.950 0.487 0.592 0.642 0.202 0.095 0.345 0.095 0.202 0.202 0.345 0.345

ISTJ

0.345 0.095 0.487 0.487 0.592 0.592 0.642 0.592 0.345 0.202 0.202 0.202 0.345 0.095 0.202 0.202

0.202 0.202 0.345 0.592 0.487 0.573 0.592 0.487 0.202 0.345 0.345 0.345 0.487 0.040 0.095 0.095

ESFJ

0.202 0.202 0.592 0.345 0.464 0.487 0.592 0.950 0.202 0.095 0.345 0.095 0.202 0.202 0.345 0.345

ESTJ

0.345 0.095 0.202 0.314 0.345 0.592 0.487 0.345 0.345 0.202 0.202 0.487 0.345 0.095 0.202 0.202

INTP

0.345 0.345 0.177 0.202 0.592 0.345 0.487 0.592 0.095 0.202 0.487 0.202 0.095 0.345 0.487 0.487

INTJ

0.202 0.075 0.345 0.095 0.202 0.202 0.095 0.202 0.202 0.592 0.592 0.345 0.487 0.487 0.345 0.345

ENTP

0.040 0.202 0.345 0.345 0.202 0.202 0.345 0.202 0.487 0.345 0.345 0.345 0.202 0.487 0.592 0.345

ENTJ

Block sizes: 1 2 3 0 0 2 0 0 2 1 1 1 0 1 1 0


Table C.4 Sampson Monastery preference matrix for SBM

ISFP

0.277 0.277 0.390 0.162 0.277 0.076 0.162 0.277 0.277 0.162 0.162 0.390 0.076 0.474 0.390 0.137

ISTP

0.474 0.277 0.390 0.162 0.277 0.076 0.162 0.277 0.277 0.162 0.390 0.162 0.076 0.474 0.060 0.390

0.390 0.390 0.277 0.076 0.162 0.038 0.076 0.162 0.390 0.277 0.277 0.277 0.162 0.030 0.474 0.474

ESFP

0.162 0.390 0.076 0.277 0.162 0.390 0.277 0.162 0.390 0.474 0.277 0.277 0.060 0.162 0.076 0.076

ESTP

0.277 0.277 0.162 0.390 0.076 0.277 0.162 0.076 0.277 0.390 0.162 0.137 0.277 0.277 0.162 0.390

INFP

INFJ

0.277 0.474 0.390 0.162 0.277 0.277 0.162 0.277 0.076 0.390 0.251 0.162 0.277 0.277 0.390 0.162

0.277 0.474 0.162 0.162 0.076 0.277 0.162 0.076 0.277 0.372 0.390 0.390 0.474 0.277 0.162 0.162

ENFP

0.390 0.162 0.076 0.277 0.162 0.162 0.277 0.162 0.459 0.277 0.076 0.277 0.390 0.390 0.277 0.277

ENFJ

ISFJ

0.162 0.162 0.474 0.277 0.952 0.390 0.474 0.514 0.162 0.076 0.277 0.076 0.162 0.162 0.277 0.277

ISTJ

0.277 0.076 0.390 0.390 0.474 0.474 0.514 0.474 0.277 0.162 0.162 0.162 0.277 0.076 0.162 0.162

0.162 0.162 0.277 0.474 0.390 0.459 0.474 0.390 0.162 0.277 0.277 0.277 0.390 0.038 0.076 0.076

ESFJ

0.162 0.162 0.474 0.277 0.372 0.390 0.474 0.952 0.162 0.076 0.277 0.076 0.162 0.162 0.277 0.277

ESTJ

0.277 0.076 0.162 0.251 0.277 0.474 0.390 0.277 0.277 0.162 0.162 0.390 0.277 0.076 0.162 0.162

INTP

0.277 0.277 0.142 0.162 0.474 0.277 0.390 0.474 0.076 0.162 0.390 0.162 0.076 0.277 0.390 0.390

INTJ

0.162 0.060 0.277 0.076 0.162 0.162 0.076 0.162 0.162 0.474 0.474 0.277 0.390 0.390 0.277 0.277

ENTP

0.040 0.162 0.277 0.277 0.162 0.162 0.277 0.162 0.390 0.277 0.277 0.277 0.162 0.390 0.474 0.277

ENTJ

Block sizes: 2 2 0 2 0 1 0 0 2 3 1 1 0 3 0 1


Table C.5 Krackhardt Office preference matrix for SBM

ISFP

0.051 0.051 0.072 0.040 0.051 0.037 0.040 0.051 0.051 0.040 0.040 0.072 0.037 0.088 0.072 0.037

ISTP

0.088 0.051 0.072 0.040 0.051 0.037 0.040 0.051 0.051 0.040 0.072 0.040 0.037 0.088 0.039 0.072

0.072 0.072 0.051 0.037 0.040 0.038 0.037 0.040 0.072 0.051 0.051 0.051 0.040 0.030 0.088 0.088

ESFP

0.040 0.072 0.037 0.051 0.040 0.072 0.051 0.040 0.072 0.088 0.051 0.051 0.039 0.040 0.037 0.037

ESTP

0.051 0.051 0.040 0.072 0.037 0.051 0.040 0.037 0.051 0.072 0.040 0.037 0.051 0.051 0.040 0.072

INFP

INFJ

0.051 0.088 0.072 0.040 0.051 0.051 0.040 0.051 0.037 0.072 0.046 0.040 0.051 0.051 0.072 0.040

0.051 0.088 0.040 0.040 0.037 0.051 0.040 0.037 0.051 0.069 0.072 0.072 0.088 0.051 0.040 0.040

ENFP

0.072 0.040 0.037 0.051 0.040 0.040 0.051 0.040 0.085 0.051 0.037 0.051 0.072 0.072 0.051 0.051

ENFJ

ISFJ

0.040 0.040 0.088 0.051 0.952 0.072 0.088 0.095 0.040 0.037 0.051 0.037 0.040 0.040 0.051 0.051

ISTJ

0.051 0.037 0.072 0.072 0.088 0.088 0.095 0.088 0.051 0.040 0.040 0.040 0.051 0.037 0.040 0.040

0.040 0.040 0.051 0.088 0.072 0.085 0.088 0.072 0.040 0.051 0.051 0.051 0.072 0.038 0.037 0.037

ESFJ

0.040 0.040 0.088 0.051 0.069 0.072 0.088 0.952 0.040 0.037 0.051 0.037 0.040 0.040 0.051 0.051

ESTJ

0.051 0.037 0.040 0.046 0.051 0.088 0.072 0.051 0.051 0.040 0.040 0.072 0.051 0.037 0.040 0.040

INTP

0.051 0.051 0.038 0.040 0.088 0.051 0.072 0.088 0.037 0.040 0.072 0.040 0.037 0.051 0.072 0.072

INTJ

0.040 0.039 0.051 0.037 0.040 0.040 0.037 0.040 0.040 0.088 0.088 0.051 0.072 0.072 0.051 0.051

ENTP

0.040 0.040 0.051 0.051 0.040 0.040 0.051 0.040 0.072 0.051 0.051 0.051 0.040 0.072 0.088 0.051

ENTJ

Block sizes: 1 2 2 1 1 1 0 3 1 2 3 2 1 0 1 0


Table C.6 Krackhardt High-Tech Managers preference matrix for SBM

ISFP

0.173 0.173 0.244 0.101 0.173 0.048 0.101 0.173 0.173 0.101 0.101 0.244 0.048 0.297 0.244 0.086

ISTP

0.297 0.173 0.244 0.101 0.173 0.048 0.101 0.173 0.173 0.101 0.244 0.101 0.048 0.297 0.039 0.244

0.244 0.244 0.173 0.048 0.101 0.038 0.048 0.101 0.244 0.173 0.173 0.173 0.101 0.030 0.297 0.297

ESFP

0.101 0.244 0.048 0.173 0.101 0.244 0.173 0.101 0.244 0.297 0.173 0.173 0.039 0.101 0.048 0.048

ESTP

0.173 0.173 0.101 0.244 0.048 0.173 0.101 0.048 0.173 0.244 0.101 0.086 0.173 0.173 0.101 0.244

INFP

INFJ

0.173 0.297 0.244 0.101 0.173 0.173 0.101 0.173 0.048 0.244 0.157 0.101 0.173 0.173 0.244 0.101

0.173 0.297 0.101 0.101 0.048 0.173 0.101 0.048 0.173 0.233 0.244 0.244 0.297 0.173 0.101 0.101

ENFP

0.244 0.101 0.048 0.173 0.101 0.101 0.173 0.101 0.287 0.173 0.048 0.173 0.244 0.244 0.173 0.173

ENFJ

ISFJ

0.101 0.101 0.297 0.173 0.952 0.244 0.297 0.322 0.101 0.048 0.173 0.048 0.101 0.101 0.173 0.173

ISTJ

0.173 0.048 0.244 0.244 0.297 0.297 0.322 0.297 0.173 0.101 0.101 0.101 0.173 0.048 0.101 0.101

0.101 0.101 0.173 0.297 0.244 0.287 0.297 0.244 0.101 0.173 0.173 0.173 0.244 0.038 0.048 0.048

ESFJ

0.101 0.101 0.297 0.173 0.233 0.244 0.297 0.952 0.101 0.048 0.173 0.048 0.101 0.101 0.173 0.173

ESTJ

0.173 0.048 0.101 0.157 0.173 0.297 0.244 0.173 0.173 0.101 0.101 0.244 0.173 0.048 0.101 0.101

INTP

0.173 0.173 0.089 0.101 0.297 0.173 0.244 0.297 0.048 0.101 0.244 0.101 0.048 0.173 0.244 0.244

INTJ

0.101 0.039 0.173 0.048 0.101 0.101 0.048 0.101 0.101 0.297 0.297 0.173 0.244 0.244 0.173 0.173

ENTP

0.040 0.101 0.173 0.173 0.101 0.101 0.173 0.101 0.244 0.173 0.173 0.173 0.101 0.244 0.297 0.173

ENTJ

Block sizes: 2 4 1 0 3 0 1 2 2 1 0 1 2 1 1 0


Table C.7 Schwimmer Taro Exchange preference matrix for SBM

ISFP

0.177 0.177 0.249 0.103 0.177 0.048 0.103 0.177 0.177 0.103 0.103 0.249 0.048 0.303 0.249 0.087

ISTP

0.303 0.177 0.249 0.103 0.177 0.048 0.103 0.177 0.177 0.103 0.249 0.103 0.048 0.303 0.039 0.249

0.249 0.249 0.177 0.048 0.103 0.038 0.048 0.103 0.249 0.177 0.177 0.177 0.103 0.030 0.303 0.303

ESFP

0.103 0.249 0.048 0.177 0.103 0.249 0.177 0.103 0.249 0.303 0.177 0.177 0.039 0.103 0.048 0.048

ESTP

0.177 0.177 0.103 0.249 0.048 0.177 0.103 0.048 0.177 0.249 0.103 0.087 0.177 0.177 0.103 0.249

INFP

INFJ

0.177 0.303 0.249 0.103 0.177 0.177 0.103 0.177 0.048 0.249 0.161 0.103 0.177 0.177 0.249 0.103

0.177 0.303 0.103 0.103 0.048 0.177 0.103 0.048 0.177 0.237 0.249 0.249 0.303 0.177 0.103 0.103

ENFP

0.249 0.103 0.048 0.177 0.103 0.103 0.177 0.103 0.293 0.177 0.048 0.177 0.249 0.249 0.177 0.177

ENFJ

ISFJ

0.103 0.103 0.303 0.177 0.952 0.249 0.303 0.328 0.103 0.048 0.177 0.048 0.103 0.103 0.177 0.177

ISTJ

0.177 0.048 0.249 0.249 0.303 0.303 0.328 0.303 0.177 0.103 0.103 0.103 0.177 0.048 0.103 0.103

0.103 0.103 0.177 0.303 0.249 0.293 0.303 0.249 0.103 0.177 0.177 0.177 0.249 0.038 0.048 0.048

ESFJ

0.103 0.103 0.303 0.177 0.237 0.249 0.303 0.952 0.103 0.048 0.177 0.048 0.103 0.103 0.177 0.177

ESTJ

0.177 0.048 0.103 0.161 0.177 0.303 0.249 0.177 0.177 0.103 0.103 0.249 0.177 0.048 0.103 0.103

INTP

0.177 0.177 0.090 0.103 0.303 0.177 0.249 0.303 0.048 0.103 0.249 0.103 0.048 0.177 0.249 0.249

INTJ

0.103 0.039 0.177 0.048 0.103 0.103 0.048 0.103 0.103 0.303 0.303 0.177 0.249 0.249 0.177 0.177

ENTP

0.040 0.103 0.177 0.177 0.103 0.103 0.177 0.103 0.249 0.177 0.177 0.177 0.103 0.249 0.303 0.177

ENTJ

Block sizes: 0 1 1 0 1 2 0 3 3 2 0 4 0 2 1 2


Table C.8 Webster Accounting Firm preference matrix for SBM

ISFP

0.641 0.641 0.950 0.375 0.641 0.176 0.375 0.641 0.641 0.375 0.375 0.950 0.176 0.950 0.950 0.317

ISTP

0.950 0.641 0.950 0.375 0.641 0.176 0.375 0.641 0.641 0.375 0.950 0.375 0.176 0.950 0.139 0.950

0.950 0.950 0.641 0.176 0.375 0.065 0.176 0.375 0.950 0.641 0.641 0.641 0.375 0.040 0.950 0.950

ESFP

0.375 0.950 0.176 0.641 0.375 0.950 0.641 0.375 0.950 0.950 0.641 0.641 0.139 0.375 0.176 0.176

ESTP

0.641 0.641 0.375 0.950 0.176 0.641 0.375 0.176 0.641 0.950 0.375 0.317 0.641 0.641 0.375 0.950

INFP

INFJ

0.641 0.950 0.950 0.375 0.641 0.641 0.375 0.641 0.176 0.950 0.583 0.375 0.641 0.641 0.950 0.375

0.641 0.950 0.375 0.375 0.176 0.641 0.375 0.176 0.641 0.862 0.950 0.950 0.950 0.641 0.375 0.375

ENFP

0.950 0.375 0.176 0.641 0.375 0.375 0.641 0.375 0.950 0.641 0.176 0.641 0.950 0.950 0.641 0.641

ENFJ

ISFJ

0.375 0.375 0.950 0.641 0.950 0.950 0.950 0.950 0.375 0.176 0.641 0.176 0.375 0.375 0.641 0.641

ISTJ

0.641 0.176 0.950 0.950 0.950 0.950 0.950 0.950 0.641 0.375 0.375 0.375 0.641 0.176 0.375 0.375

0.375 0.375 0.641 0.950 0.950 0.950 0.950 0.950 0.375 0.641 0.641 0.641 0.950 0.065 0.176 0.176

ESFJ

0.375 0.375 0.950 0.641 0.862 0.950 0.950 0.950 0.375 0.176 0.641 0.176 0.375 0.375 0.641 0.641

ESTJ

0.641 0.176 0.375 0.583 0.641 0.950 0.950 0.641 0.641 0.375 0.375 0.950 0.641 0.176 0.375 0.375

INTP

0.641 0.641 0.328 0.375 0.950 0.641 0.950 0.950 0.176 0.375 0.950 0.375 0.176 0.641 0.950 0.950

INTJ

0.375 0.139 0.641 0.176 0.375 0.375 0.176 0.375 0.375 0.950 0.950 0.641 0.950 0.950 0.641 0.641

ENTP

0.040 0.375 0.641 0.641 0.375 0.375 0.641 0.375 0.950 0.641 0.641 0.641 0.375 0.950 0.950 0.641

ENTJ

Block sizes: 1 2 4 2 1 0 1 1 1 2 1 7 0 0 1 0


Table C.9 Zachary Karate Club preference matrix for SBM

ISFP

0.145 0.145 0.205 0.085 0.145 0.037 0.085 0.145 0.145 0.085 0.085 0.205 0.037 0.249 0.205 0.072

ISTP

0.249 0.145 0.205 0.085 0.145 0.037 0.085 0.145 0.145 0.085 0.205 0.085 0.037 0.249 0.039 0.205

0.205 0.205 0.145 0.037 0.085 0.038 0.037 0.085 0.205 0.145 0.145 0.145 0.085 0.030 0.249 0.249

ESFP

0.085 0.205 0.037 0.145 0.085 0.205 0.145 0.085 0.205 0.249 0.145 0.145 0.039 0.085 0.037 0.037

ESTP

0.145 0.145 0.085 0.205 0.037 0.145 0.085 0.037 0.145 0.205 0.085 0.072 0.145 0.145 0.085 0.205

INFP

INFJ

0.145 0.249 0.205 0.085 0.145 0.145 0.085 0.145 0.037 0.205 0.132 0.085 0.145 0.145 0.205 0.085

0.145 0.249 0.085 0.085 0.037 0.145 0.085 0.037 0.145 0.195 0.205 0.205 0.249 0.145 0.085 0.085

ENFP

0.205 0.085 0.037 0.145 0.085 0.085 0.145 0.085 0.241 0.145 0.037 0.145 0.205 0.205 0.145 0.145

ENFJ

ISFJ

0.085 0.085 0.249 0.145 0.952 0.205 0.249 0.270 0.085 0.037 0.145 0.037 0.085 0.085 0.145 0.145

ISTJ

0.145 0.037 0.205 0.205 0.249 0.249 0.270 0.249 0.145 0.085 0.085 0.085 0.145 0.037 0.085 0.085

0.085 0.085 0.145 0.249 0.205 0.241 0.249 0.205 0.085 0.145 0.145 0.145 0.205 0.038 0.037 0.037

ESFJ

0.085 0.085 0.249 0.145 0.195 0.205 0.249 0.952 0.085 0.037 0.145 0.037 0.085 0.085 0.145 0.145

ESTJ

0.145 0.037 0.085 0.132 0.145 0.249 0.205 0.145 0.145 0.085 0.085 0.205 0.145 0.037 0.085 0.085

INTP

0.145 0.145 0.074 0.085 0.249 0.145 0.205 0.249 0.037 0.085 0.205 0.085 0.037 0.145 0.205 0.205

INTJ

0.085 0.039 0.145 0.037 0.085 0.085 0.037 0.085 0.085 0.249 0.249 0.145 0.205 0.205 0.145 0.145

ENTP

0.040 0.085 0.145 0.145 0.085 0.085 0.145 0.085 0.205 0.145 0.145 0.145 0.085 0.205 0.249 0.145

ENTJ

Block sizes: 4 5 2 1 1 1 1 0 2 4 3 4 3 1 1 1


Table C.10 Bernard & Killworth Technical preference matrix for SBM

ISFP

0.333 0.333 0.470 0.195 0.333 0.091 0.195 0.333 0.333 0.195 0.195 0.470 0.091 0.570 0.470 0.164

ISTP

0.570 0.333 0.470 0.195 0.333 0.091 0.195 0.333 0.333 0.195 0.470 0.195 0.091 0.570 0.072 0.470

0.470 0.470 0.333 0.091 0.195 0.038 0.091 0.195 0.470 0.333 0.333 0.333 0.195 0.030 0.570 0.570

ESFP

0.195 0.470 0.091 0.333 0.195 0.470 0.333 0.195 0.470 0.570 0.333 0.333 0.072 0.195 0.091 0.091

ESTP

0.333 0.333 0.195 0.470 0.091 0.333 0.195 0.091 0.333 0.470 0.195 0.164 0.333 0.333 0.195 0.470

INFP

INFJ

0.333 0.570 0.470 0.195 0.333 0.333 0.195 0.333 0.091 0.470 0.303 0.195 0.333 0.333 0.470 0.195

0.333 0.570 0.195 0.195 0.091 0.333 0.195 0.091 0.333 0.447 0.470 0.470 0.570 0.333 0.195 0.195

ENFP

0.470 0.195 0.091 0.333 0.195 0.195 0.333 0.195 0.553 0.333 0.091 0.333 0.470 0.470 0.333 0.333

ENFJ

ISFJ

0.195 0.195 0.570 0.333 0.952 0.470 0.570 0.618 0.195 0.091 0.333 0.091 0.195 0.195 0.333 0.333

ISTJ

0.333 0.091 0.470 0.470 0.570 0.570 0.618 0.570 0.333 0.195 0.195 0.195 0.333 0.091 0.195 0.195

0.195 0.195 0.333 0.570 0.470 0.553 0.570 0.470 0.195 0.333 0.333 0.333 0.470 0.038 0.091 0.091

ESFJ

0.195 0.195 0.570 0.333 0.447 0.470 0.570 0.952 0.195 0.091 0.333 0.091 0.195 0.195 0.333 0.333

ESTJ

0.333 0.091 0.195 0.303 0.333 0.570 0.470 0.333 0.333 0.195 0.195 0.470 0.333 0.091 0.195 0.195

INTP

0.333 0.333 0.170 0.195 0.570 0.333 0.470 0.570 0.091 0.195 0.470 0.195 0.091 0.333 0.470 0.470

INTJ

0.195 0.072 0.333 0.091 0.195 0.195 0.091 0.195 0.195 0.570 0.570 0.333 0.470 0.470 0.333 0.333

ENTP

0.040 0.195 0.333 0.333 0.195 0.195 0.333 0.195 0.470 0.333 0.333 0.333 0.195 0.470 0.570 0.333

ENTJ

Block sizes: 2 1 1 5 1 3 3 2 3 3 0 1 2 2 2 3


Table C.11 Bernard & Killworth Office network preference matrix for SBM

      ISFP  ISTP  ESFP  ESTP  INFP  INFJ  ENFP  ENFJ  ISFJ  ISTJ  ESFJ  ESTJ  INTP  INTJ  ENTP  ENTJ
ISFP  0.338 0.338 0.476 0.197 0.338 0.093 0.197 0.338 0.338 0.197 0.197 0.476 0.093 0.578 0.476 0.167
ISTP  0.578 0.338 0.476 0.197 0.338 0.093 0.197 0.338 0.338 0.197 0.476 0.197 0.093 0.578 0.073 0.476
ESFP  0.476 0.476 0.338 0.093 0.197 0.038 0.093 0.197 0.476 0.338 0.338 0.338 0.197 0.030 0.578 0.578
ESTP  0.197 0.476 0.093 0.338 0.197 0.476 0.338 0.197 0.476 0.578 0.338 0.338 0.073 0.197 0.093 0.093
INFP  0.338 0.338 0.197 0.476 0.093 0.338 0.197 0.093 0.338 0.476 0.197 0.167 0.338 0.338 0.197 0.476
INFJ  0.338 0.578 0.476 0.197 0.338 0.338 0.197 0.338 0.093 0.476 0.307 0.197 0.338 0.338 0.476 0.197
ENFP  0.338 0.578 0.197 0.197 0.093 0.338 0.197 0.093 0.338 0.454 0.476 0.476 0.578 0.338 0.197 0.197
ENFJ  0.476 0.197 0.093 0.338 0.197 0.197 0.338 0.197 0.560 0.338 0.093 0.338 0.476 0.476 0.338 0.338
ISFJ  0.197 0.197 0.578 0.338 0.952 0.476 0.578 0.627 0.197 0.093 0.338 0.093 0.197 0.197 0.338 0.338
ISTJ  0.338 0.093 0.476 0.476 0.578 0.578 0.627 0.578 0.338 0.197 0.197 0.197 0.338 0.093 0.197 0.197
ESFJ  0.197 0.197 0.338 0.578 0.476 0.560 0.578 0.476 0.197 0.338 0.338 0.338 0.476 0.038 0.093 0.093
ESTJ  0.197 0.197 0.578 0.338 0.454 0.476 0.578 0.952 0.197 0.093 0.338 0.093 0.197 0.197 0.338 0.338
INTP  0.338 0.093 0.197 0.307 0.338 0.578 0.476 0.338 0.338 0.197 0.197 0.476 0.338 0.093 0.197 0.197
INTJ  0.338 0.338 0.173 0.197 0.578 0.338 0.476 0.578 0.093 0.197 0.476 0.197 0.093 0.338 0.476 0.476
ENTP  0.197 0.073 0.338 0.093 0.197 0.197 0.093 0.197 0.197 0.578 0.578 0.338 0.476 0.476 0.338 0.338
ENTJ  0.040 0.197 0.338 0.338 0.197 0.197 0.338 0.197 0.476 0.338 0.338 0.338 0.197 0.476 0.578 0.338
Block sizes: 3 6 3 2 1 1 1 2 3 3 1 3 2 0 5 4
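The preference matrices in this appendix are paired with block sizes, the two inputs a stochastic block model (SBM) needs. As a sketch only, and not the generator used in this dissertation, the following Python function samples one undirected graph from such inputs. The name `sample_sbm` is hypothetical, and because the matrices as tabulated are not symmetric, the sketch assumes that the entry in row a, column b with a ≤ b governs links between blocks a and b.

```python
import random
from itertools import combinations


def sample_sbm(sizes, prefs, seed=None):
    """Sample one undirected graph from a stochastic block model.

    sizes[b] is the number of nodes in block b (zero is allowed) and
    prefs[a][b] is the link probability between blocks a and b.
    Assumption: only entries with a <= b are read, so an asymmetric
    matrix is interpreted through its upper triangle.
    """
    rng = random.Random(seed)

    # Give nodes consecutive ids, block by block.
    block_of = [b for b, n in enumerate(sizes) for _ in range(n)]

    # Visit each unordered node pair once and keep it with the
    # probability assigned to its block pair.
    edges = []
    for u, v in combinations(range(len(block_of)), 2):
        a, b = sorted((block_of[u], block_of[v]))
        if rng.random() < prefs[a][b]:
            edges.append((u, v))
    return block_of, edges


if __name__ == "__main__":
    # Toy two-block example; these numbers are not taken from the tables.
    blocks, links = sample_sbm([3, 2], [[0.9, 0.1], [0.1, 0.8]], seed=42)
    print(blocks)  # [0, 0, 0, 1, 1]
    print(len(links), "links")
```

For Table C.11, the block sizes sum to 40, the number of nodes in the Bernard & Killworth Office network, so `sample_sbm` would return 40 block labels together with the sampled links.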

Table C.12 Krebs IT Department Advice network preference matrix for SBM

      ISFP  ISTP  ESFP  ESTP  INFP  INFJ  ENFP  ENFJ  ISFJ  ISTJ  ESFJ  ESTJ  INTP  INTJ  ENTP  ENTJ
ISFP  0.143 0.143 0.202 0.084 0.143 0.037 0.084 0.143 0.143 0.084 0.084 0.202 0.037 0.245 0.202 0.071
ISTP  0.245 0.143 0.202 0.084 0.143 0.037 0.084 0.143 0.143 0.084 0.202 0.084 0.037 0.245 0.039 0.202
ESFP  0.202 0.202 0.143 0.037 0.084 0.038 0.037 0.084 0.202 0.143 0.143 0.143 0.084 0.030 0.245 0.245
ESTP  0.084 0.202 0.037 0.143 0.084 0.202 0.143 0.084 0.202 0.245 0.143 0.143 0.039 0.084 0.037 0.037
INFP  0.143 0.143 0.084 0.202 0.037 0.143 0.084 0.037 0.143 0.202 0.084 0.071 0.143 0.143 0.084 0.202
INFJ  0.143 0.245 0.202 0.084 0.143 0.143 0.084 0.143 0.037 0.202 0.130 0.084 0.143 0.143 0.202 0.084
ENFP  0.143 0.245 0.084 0.084 0.037 0.143 0.084 0.037 0.143 0.192 0.202 0.202 0.245 0.143 0.084 0.084
ENFJ  0.202 0.084 0.037 0.143 0.084 0.084 0.143 0.084 0.238 0.143 0.037 0.143 0.202 0.202 0.143 0.143
ISFJ  0.084 0.084 0.245 0.143 0.952 0.202 0.245 0.266 0.084 0.037 0.143 0.037 0.084 0.084 0.143 0.143
ISTJ  0.143 0.037 0.202 0.202 0.245 0.245 0.266 0.245 0.143 0.084 0.084 0.084 0.143 0.037 0.084 0.084
ESFJ  0.084 0.084 0.143 0.245 0.202 0.238 0.245 0.202 0.084 0.143 0.143 0.143 0.202 0.038 0.037 0.037
ESTJ  0.084 0.084 0.245 0.143 0.192 0.202 0.245 0.952 0.084 0.037 0.143 0.037 0.084 0.084 0.143 0.143
INTP  0.143 0.037 0.084 0.130 0.143 0.245 0.202 0.143 0.143 0.084 0.084 0.202 0.143 0.037 0.084 0.084
INTJ  0.143 0.143 0.073 0.084 0.245 0.143 0.202 0.245 0.037 0.084 0.202 0.084 0.037 0.143 0.202 0.202
ENTP  0.084 0.039 0.143 0.037 0.084 0.084 0.037 0.084 0.084 0.245 0.245 0.143 0.202 0.202 0.143 0.143
ENTJ  0.040 0.084 0.143 0.143 0.084 0.084 0.143 0.084 0.202 0.143 0.143 0.143 0.084 0.202 0.245 0.143
Block sizes: 7 5 4 4 0 3 2 4 3 7 1 6 3 3 2 2

Table C.13 Krebs IT Department Business network preference matrix for SBM

      ISFP  ISTP  ESFP  ESTP  INFP  INFJ  ENFP  ENFJ  ISFJ  ISTJ  ESFJ  ESTJ  INTP  INTJ  ENTP  ENTJ
ISFP  0.273 0.273 0.385 0.160 0.273 0.075 0.160 0.273 0.273 0.160 0.160 0.385 0.075 0.468 0.385 0.135
ISTP  0.468 0.273 0.385 0.160 0.273 0.075 0.160 0.273 0.273 0.160 0.385 0.160 0.075 0.468 0.059 0.385
ESFP  0.385 0.385 0.273 0.075 0.160 0.038 0.075 0.160 0.385 0.273 0.273 0.273 0.160 0.030 0.468 0.468
ESTP  0.160 0.385 0.075 0.273 0.160 0.385 0.273 0.160 0.385 0.468 0.273 0.273 0.059 0.160 0.075 0.075
INFP  0.273 0.273 0.160 0.385 0.075 0.273 0.160 0.075 0.273 0.385 0.160 0.135 0.273 0.273 0.160 0.385
INFJ  0.273 0.468 0.385 0.160 0.273 0.273 0.160 0.273 0.075 0.385 0.248 0.160 0.273 0.273 0.385 0.160
ENFP  0.273 0.468 0.160 0.160 0.075 0.273 0.160 0.075 0.273 0.367 0.385 0.385 0.468 0.273 0.160 0.160
ENFJ  0.385 0.160 0.075 0.273 0.160 0.160 0.273 0.160 0.453 0.273 0.075 0.273 0.385 0.385 0.273 0.273
ISFJ  0.160 0.160 0.468 0.273 0.952 0.385 0.468 0.507 0.160 0.075 0.273 0.075 0.160 0.160 0.273 0.273
ISTJ  0.273 0.075 0.385 0.385 0.468 0.468 0.507 0.468 0.273 0.160 0.160 0.160 0.273 0.075 0.160 0.160
ESFJ  0.160 0.160 0.273 0.468 0.385 0.453 0.468 0.385 0.160 0.273 0.273 0.273 0.385 0.038 0.075 0.075
ESTJ  0.160 0.160 0.468 0.273 0.367 0.385 0.468 0.952 0.160 0.075 0.273 0.075 0.160 0.160 0.273 0.273
INTP  0.273 0.075 0.160 0.248 0.273 0.468 0.385 0.273 0.273 0.160 0.160 0.385 0.273 0.075 0.160 0.160
INTJ  0.273 0.273 0.140 0.160 0.468 0.273 0.385 0.468 0.075 0.160 0.385 0.160 0.075 0.273 0.385 0.385
ENTP  0.160 0.059 0.273 0.075 0.160 0.160 0.075 0.160 0.160 0.468 0.468 0.273 0.385 0.385 0.273 0.273
ENTJ  0.040 0.160 0.273 0.273 0.160 0.160 0.273 0.160 0.385 0.273 0.273 0.273 0.160 0.385 0.468 0.273
Block sizes: 7 5 4 4 0 3 2 4 3 7 1 6 3 3 2 2

Table C.14 Lazega Law Firm preference matrix for SBM

      ISFP  ISTP  ESFP  ESTP  INFP  INFJ  ENFP  ENFJ  ISFJ  ISTJ  ESFJ  ESTJ  INTP  INTJ  ENTP  ENTJ
ISFP  0.314 0.314 0.442 0.183 0.314 0.086 0.183 0.314 0.314 0.183 0.183 0.442 0.086 0.537 0.442 0.155
ISTP  0.537 0.314 0.442 0.183 0.314 0.086 0.183 0.314 0.314 0.183 0.442 0.183 0.086 0.537 0.039 0.442
ESFP  0.442 0.442 0.314 0.086 0.183 0.038 0.086 0.183 0.442 0.314 0.314 0.314 0.183 0.030 0.537 0.537
ESTP  0.183 0.442 0.086 0.314 0.183 0.442 0.314 0.183 0.442 0.537 0.314 0.314 0.039 0.183 0.086 0.086
INFP  0.314 0.314 0.183 0.442 0.086 0.314 0.183 0.086 0.314 0.442 0.183 0.155 0.314 0.314 0.183 0.442
INFJ  0.314 0.537 0.442 0.183 0.314 0.314 0.183 0.314 0.086 0.442 0.285 0.183 0.314 0.314 0.442 0.183
ENFP  0.314 0.537 0.183 0.183 0.086 0.314 0.183 0.086 0.314 0.421 0.442 0.442 0.537 0.314 0.183 0.183
ENFJ  0.442 0.183 0.086 0.314 0.183 0.183 0.314 0.183 0.521 0.314 0.086 0.314 0.442 0.442 0.314 0.314
ISFJ  0.183 0.183 0.537 0.314 0.952 0.442 0.537 0.583 0.183 0.086 0.314 0.086 0.183 0.183 0.314 0.314
ISTJ  0.314 0.086 0.442 0.442 0.537 0.537 0.583 0.537 0.314 0.183 0.183 0.183 0.314 0.086 0.183 0.183
ESFJ  0.183 0.183 0.314 0.537 0.442 0.521 0.537 0.442 0.183 0.314 0.314 0.314 0.442 0.038 0.086 0.086
ESTJ  0.183 0.183 0.537 0.314 0.421 0.442 0.537 0.952 0.183 0.086 0.314 0.086 0.183 0.183 0.314 0.314
INTP  0.314 0.086 0.183 0.285 0.314 0.537 0.442 0.314 0.314 0.183 0.183 0.442 0.314 0.086 0.183 0.183
INTJ  0.314 0.314 0.161 0.183 0.537 0.314 0.442 0.537 0.086 0.183 0.442 0.183 0.086 0.314 0.442 0.442
ENTP  0.183 0.039 0.314 0.086 0.183 0.183 0.086 0.183 0.183 0.537 0.537 0.314 0.442 0.442 0.314 0.314
ENTJ  0.040 0.183 0.314 0.314 0.183 0.183 0.314 0.183 0.442 0.314 0.314 0.314 0.183 0.442 0.537 0.314

9 3 9 5 0 4 1 4 4 7 6 1 2 1 3

12

Sizes

Block


RANDOMIZED METHODS REALISM COMPARISON TABLES

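For all tables in Appendix D, MT denotes the metric value of the target network, MS the mean value over the networks synthesized by a method, and MR the mean value over the randomly generated networks. A green background indicates values less than |MT-MR|; a pink background indicates values greater than |MT-MR|.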
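The shading rule in this legend compares each method's deviation from the target with the deviation of the random baseline. The following Python sketch restates it; the function names are hypothetical, and treating an exact tie as unshaded is an assumption because the legend does not cover ties. The example values come from the Links and Maximum betweenness rows of Table D.1.

```python
def deviation(target, mean):
    """|MT - MX|: distance between the target metric and a method's mean."""
    return abs(target - mean)


def shade(target, method_mean, random_mean):
    """Classify one table cell against the random baseline.

    Returns "green" when the method is closer to the target than the
    random networks, "pink" when it is farther, and "" on an exact tie
    (an assumption; the legend does not say how ties are shaded).
    """
    d_method = deviation(target, method_mean)
    d_random = deviation(target, random_mean)
    if d_method < d_random:
        return "green"
    if d_method > d_random:
        return "pink"
    return ""


if __name__ == "__main__":
    # Links, Table D.1: MT = 16, stochastic block MS = 16.400, MR = 16.425
    print(shade(16, 16.400, 16.425))      # green
    # Maximum betweenness, Table D.1: MT = 25.167, MS = 15.101, MR = 15.314
    print(shade(25.167, 15.101, 15.314))  # pink
```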

Table D.1 Randomized methods results for the Robins Australian Bank social network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         11        11        0         11        0         11        0         11        0
Links                         16        16.400    0.400     16.000    0.000     16.000    0.000     16.425    0.425
Components                    1         1.450     0.450     1.200     0.200     1.150     0.150     1.375     0.375
Network density               0.291     0.298     0.007     0.291     0.000     0.291     0.000     0.299     0.008
Average degree                2.909     2.982     0.073     2.909     0.000     2.909     0.000     2.986     0.077
Standard deviation degree     1.868     1.391     0.477     1.371     0.498     1.364     0.504     1.368     0.501
Global cluster coefficient    0.375     0.291     0.084     0.273     0.102     0.276     0.099     0.267     0.108
Average cluster coefficient   0.405     0.314     0.091     0.299     0.106     0.308     0.096     0.309     0.096
Cluster Gini coefficient      0.242     0.224     0.019     0.199     0.043     0.198     0.044     0.211     0.032
Mean path length              2.018     2.759     0.741     2.356     0.338     2.305     0.287     2.695     0.677
Average betweenness           5.091     4.432     0.659     4.986     0.105     4.982     0.109     4.361     0.730
Maximum betweenness           25.167    15.101    10.065    17.049    8.118     17.015    8.152     15.314    9.852
Average closeness             0.052     0.044     0.007     0.047     0.005     0.048     0.004     0.045     0.007
Minimum closeness             0.038     0.029     0.010     0.031     0.007     0.033     0.006     0.029     0.009
Average eigencentrality       0.492     0.571     0.079     0.586     0.094     0.574     0.082     0.584     0.092
Network radius                2         1.375     0.625     2.050     0.050     2.025     0.025     1.625     0.375
Average eccentricity          3.091     2.998     0.093     3.198     0.107     3.205     0.114     2.952     0.139
Network diameter              4         3.800     0.200     4.025     0.025     4.025     0.025     3.650     0.350

Table D.2 Randomized methods results for Roethlisberger & Dickson network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         14        14        0         14        0         14        0         14        0
Links                         13        13.250    0.250     13.000    0.000     13.000    0.000     12.850    0.150
Components                    6         3.125     2.875     2.950     3.050     3.375     2.625     3.550     2.450
Network density               0.143     0.146     0.003     0.143     0.000     0.143     0.000     0.141     0.002
Average degree                1.857     1.893     0.036     1.857     0.000     1.857     0.000     1.836     0.021
Standard deviation degree     1.61      1.212     0.398     1.186     0.424     1.296     0.314     1.135     0.476
Global cluster coefficient    0.643     0.094     0.549     0.121     0.522     0.144     0.498     0.095     0.547
Average cluster coefficient   0.708     0.111     0.597     0.141     0.568     0.170     0.538     0.111     0.598
Cluster Gini coefficient      0.376     0.292     0.084     0.275     0.101     0.276     0.100     0.297     0.079
Mean path length              9.341     6.510     2.830     6.456     2.885     6.876     2.465     7.165     2.175
Average betweenness           3.143     7.055     3.913     7.839     4.696     6.600     3.457     6.112     2.970
Maximum betweenness           16        26.494    10.494    28.004    12.004    25.903    9.903     21.915    5.915
Average closeness             0.009     0.016     0.006     0.014     0.005     0.014     0.004     0.015     0.006
Minimum closeness             0.005     0.007     0.002     0.006     0.000     0.006     0.000     0.008     0.003
Average eigencentrality       0.382     0.422     0.040     0.417     0.005     0.408     0.004     0.395     0.013
Network radius                0         0.500     0.500     0.300     0.300     0.050     0.050     0.550     0.550
Average eccentricity          2.571     3.680     1.109     3.950     1.379     3.461     0.889     3.396     0.825
Network diameter              5         5.425     0.425     5.750     0.750     5.100     0.100     5.150     0.150

Table D.3 Randomized methods results for Thurman Office social network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         15        15        0         15        0         15        0         15        0
Links                         33        34.800    1.800     34.000    1.000     34.000    1.000     34.200    1.200
Components                    1         1.075     0.075     1.025     0.025     1.000     0.000     1.075     0.075
Network density               0.314     0.331     0.017     0.324     0.010     0.324     0.010     0.326     0.011
Average degree                4.400     4.640     0.240     4.533     0.133     4.533     0.133     4.560     0.160
Standard deviation degree     2.530     1.742     0.788     1.723     0.807     1.785     0.745     1.654     0.876
Global cluster coefficient    0.516     0.349     0.167     0.326     0.190     0.347     0.169     0.307     0.209
Average cluster coefficient   0.477     0.364     0.113     0.341     0.137     0.374     0.103     0.320     0.157
Cluster Gini coefficient      0.205     0.236     0.031     0.225     0.020     0.214     0.009     0.217     0.012
Mean path length              1.876     1.997     0.121     1.880     0.004     1.850     0.026     1.962     0.086
Average betweenness           6.133     5.718     0.415     5.833     0.300     5.950     0.183     5.755     0.378
Maximum betweenness           37.248    19.534    17.714    20.032    17.216    19.728    17.519    18.502    18.746
Average closeness             0.039     0.039     0.000     0.039     0.000     0.039     0.000     0.039     0.000
Minimum closeness             0.030     0.029     0.002     0.029     0.001     0.029     0.001     0.029     0.001
Average eigencentrality       0.528     0.602     0.074     0.594     0.066     0.586     0.058     0.609     0.081
Network radius                2         1.925     0.075     2.000     0.000     2.025     0.025     2.000     0.000
Average eccentricity          2.800     2.828     0.028     2.885     0.085     2.910     0.110     2.845     0.045
Network diameter              3         3.425     0.425     3.400     0.400     3.400     0.400     3.375     0.375

Table D.4 Randomized methods results for Sampson Monastery social network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         18        18        0         18        0         18        0         18        0
Links                         41        42.600    1.600     41.000    0.000     41.000    0.000     39.700    1.300
Components                    1         1.050     0.050     1.050     0.050     1.100     0.100     1.100     0.100
Network density               0.268     0.278     0.010     0.268     0.000     0.268     0.000     0.259     0.008
Average degree                4.556     4.733     0.178     4.556     0.000     4.556     0.000     4.411     0.144
Standard deviation degree     2.093     1.775     0.318     1.862     0.230     1.881     0.212     1.695     0.398
Global cluster coefficient    0.262     0.271     0.009     0.263     0.000     0.268     0.006     0.252     0.011
Average cluster coefficient   0.285     0.281     0.004     0.276     0.009     0.284     0.001     0.267     0.018
Cluster Gini coefficient      0.232     0.228     0.004     0.236     0.004     0.231     0.001     0.243     0.011
Mean path length              1.967     2.032     0.065     2.061     0.094     2.133     0.166     2.168     0.200
Average betweenness           8.222     7.971     0.251     8.219     0.003     8.024     0.199     8.343     0.121
Maximum betweenness           37.623    25.411    12.211    29.560    8.063     27.809    9.813     27.044    10.579
Average closeness             0.030     0.030     0.000     0.030     0.001     0.029     0.001     0.029     0.001
Minimum closeness             0.024     0.022     0.002     0.021     0.003     0.021     0.003     0.021     0.003
Average eigencentrality       0.481     0.587     0.107     0.565     0.084     0.562     0.081     0.585     0.105
Network radius                2         2.175     0.175     2.275     0.275     2.000     0.000     2.175     0.175
Average eccentricity          3.000     3.090     0.090     3.144     0.144     3.100     0.100     3.111     0.111
Network diameter              4         3.725     0.275     3.875     0.125     3.900     0.100     3.750     0.250

Table D.5 Randomized methods results for the Krackhardt Office CSS network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         21        21        0         21        0         21        0         21        0
Links                         14        14.425    0.425     14.000    0.000     14.000    0.000     14.050    0.050
Components                    9         7.725     1.275     7.925     1.075     7.900     1.100     8.325     0.675
Network density               0.067     0.069     0.002     0.067     0.000     0.067     0.000     0.067     0.000
Average degree                1.333     1.374     0.040     1.333     0.000     1.333     0.000     1.338     0.005
Standard deviation degree     1.390     1.195     0.195     1.134     0.256     1.110     0.281     1.097     0.293
Global cluster coefficient    0.125     0.081     0.044     0.080     0.045     0.068     0.057     0.093     0.032
Average cluster coefficient   0.158     0.096     0.063     0.095     0.063     0.088     0.071     0.101     0.058
Cluster Gini coefficient      0.417     0.371     0.046     0.359     0.058     0.361     0.056     0.373     0.044
Mean path length              15.843    14.594    1.248     15.810    0.033     15.701    0.142     15.691    0.152
Average betweenness           3.667     7.396     3.730     4.936     1.269     5.605     1.938     5.290     1.624
Maximum betweenness           22.500    37.025    14.525    28.975    6.475     28.179    5.679     26.496    3.996
Average closeness             0.003     0.004     0.001     0.003     0.000     0.003     0.000     0.004     0.000
Minimum closeness             0.002     0.002     0.000     0.002     0.000     0.002     0.000     0.003     0.000
Average eigencentrality       0.246     0.256     0.009     0.256     0.103     0.263     0.097     0.246     0.001
Network radius                0         0.000     0.000     0.000     0.000     0.000     0.000     0.125     0.125
Average eccentricity          2.333     3.212     0.879     2.836     0.502     3.026     0.693     2.730     0.396
Network diameter              5         5.975     0.975     5.700     0.700     5.975     0.975     5.300     0.300

Table D.6 Randomized methods results for the Krackhardt High-Tech Managers network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         21        21        0         21        0         21        0         21        0
Links                         36        35.650    0.350     36.000    0.000     36.000    0.000     35.975    0.025
Components                    5         1.700     3.300     1.650     3.350     1.650     3.350     1.575     3.425
Network density               0.171     0.170     0.002     0.171     0.000     0.171     0.000     0.171     0.000
Average degree                3.429     3.395     0.033     3.429     0.000     3.429     0.000     3.426     0.002
Standard deviation degree     2.135     1.662     0.473     1.783     0.352     1.655     0.480     1.636     0.499
Global cluster coefficient    0.496     0.190     0.306     0.167     0.330     0.172     0.324     0.159     0.337
Average cluster coefficient   0.585     0.192     0.393     0.184     0.401     0.169     0.416     0.174     0.410
Cluster Gini coefficient      0.315     0.263     0.052     0.260     0.056     0.260     0.056     0.272     0.044
Mean path length              8.976     3.733     5.243     3.589     5.387     3.560     5.416     3.482     5.494
Average betweenness           9.286     13.358    4.073     12.894    3.608     12.957    3.671     13.586    4.300
Maximum betweenness           44.667    48.141    3.474     48.625    3.958     44.324    0.343     51.456    6.789
Average closeness             0.007     0.017     0.010     0.017     0.010     0.017     0.010     0.018     0.011
Minimum closeness             0.002     0.008     0.006     0.008     0.006     0.009     0.006     0.010     0.007
Average eigencentrality       0.364     0.454     0.090     0.475     0.103     0.476     0.104     0.478     0.114
Network radius                0         1.475     1.475     1.525     1.525     1.625     1.625     1.750     1.750
Average eccentricity          3.381     3.839     0.458     3.845     0.464     3.818     0.437     3.973     0.592
Network diameter              5         4.850     0.150     5.000     0.000     4.950     0.050     5.025     0.025

Table D.7 Randomized methods results for the Schwimmer Taro Exchange network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         22        22        0         22        0         22        0         22        0
Links                         39        39.275    0.275     39.000    0.000     39.000    0.000     38.325    0.675
Components                    1         1.325     0.325     1.300     0.300     1.275     0.275     1.650     0.650
Network density               0.169     0.170     0.001     0.169     0.000     0.169     0.000     0.166     0.003
Average degree                3.545     3.570     0.025     3.545     0.000     3.545     0.000     3.484     0.061
Standard deviation degree     0.963     1.650     0.688     1.641     0.678     1.603     0.640     1.676     0.714
Global cluster coefficient    0.275     0.196     0.079     0.161     0.115     0.166     0.109     0.166     0.110
Average cluster coefficient   0.339     0.207     0.133     0.173     0.166     0.175     0.164     0.175     0.165
Cluster Gini coefficient      0.292     0.269     0.023     0.262     0.030     0.269     0.022     0.272     0.020
Mean path length              2.494     3.126     0.633     3.045     0.552     2.916     0.423     3.638     1.144
Average betweenness           15.682    14.999    0.683     14.580    1.102     14.633    1.049     14.381    1.301
Maximum betweenness           46.383    56.533    10.150    51.686    5.303     50.103    3.720     50.192    3.809
Average closeness             0.019     0.018     0.001     0.018     0.001     0.018     0.001     0.016     0.003
Minimum closeness             0.016     0.011     0.005     0.010     0.006     0.011     0.005     0.008     0.008
Average eigencentrality       0.615     0.472     0.143     0.482     0.133     0.502     0.113     0.474     0.141
Network radius                3         2.350     0.650     2.250     0.750     2.275     0.725     1.550     1.450
Average eccentricity          4.091     4.049     0.042     3.994     0.097     3.991     0.100     3.890     0.201
Network diameter              5         5.150     0.150     5.050     0.050     5.050     0.050     5.025     0.025

Table D.8 Randomized methods results for the Webster Accounting Firm network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         24        24        0         24        0         24        0         24        0
Links                         150       150.725   0.725     150.000   0.000     150.000   0.000     150.300   0.300
Components                    2         1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000
Network density               0.543     0.546     0.003     0.543     0.000     0.543     0.000     0.545     0.001
Average degree                12.500    12.560    0.060     12.500    0.000     12.500    0.000     12.525    0.025
Standard deviation degree     5.509     2.480     3.029     2.536     2.973     2.480     3.029     2.247     3.262
Global cluster coefficient    0.811     0.551     0.260     0.551     0.260     0.547     0.264     0.540     0.271
Average cluster coefficient   0.778     0.557     0.221     0.556     0.222     0.551     0.226     0.545     0.233
Cluster Gini coefficient      0.160     0.172     0.013     0.194     0.034     0.189     0.030     0.193     0.034
Mean path length              3.482     1.454     2.028     1.457     2.025     1.457     2.025     1.456     2.026
Average betweenness           6.500     5.222     1.278     5.256     1.244     5.255     1.245     5.239     1.261
Maximum betweenness           26.803    11.934    14.870    11.489    15.314    11.280    15.524    10.860    15.943
Average closeness             0.016     0.030     0.014     0.030     0.014     0.030     0.014     0.030     0.014
Minimum closeness             0.002     0.026     0.024     0.026     0.024     0.026     0.024     0.027     0.025
Average eigencentrality       0.627     0.742     0.116     0.736     0.110     0.735     0.109     0.759     0.133
Network radius                0         2.000     2.000     2.000     2.000     2.000     2.000     2.000     2.000
Average eccentricity          2.792     2.004     0.787     2.011     0.780     2.009     0.782     2.002     0.790
Network diameter              4         2.050     1.950     2.125     1.875     2.100     1.900     2.025     1.975

Table D.9 Randomized methods results for the Zachary Karate Club network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         34        34        0         34        0         34        0         34        0
Links                         78        77.975    0.025     78.000    0.000     78.000    0.000     77.100    0.900
Components                    1         1.275     0.275     1.125     0.125     1.100     0.100     1.375     0.375
Network density               0.139     0.139     0.000     0.139     0.000     0.139     0.000     0.137     0.002
Average degree                4.588     4.587     0.001     4.588     0.000     4.588     0.000     4.535     0.053
Standard deviation degree     3.878     1.983     1.895     1.992     1.886     2.017     1.861     1.942     1.936
Global cluster coefficient    0.256     0.139     0.117     0.145     0.110     0.142     0.114     0.140     0.116
Average cluster coefficient   0.588     0.148     0.440     0.151     0.437     0.152     0.436     0.144     0.444
Cluster Gini coefficient      0.285     0.290     0.005     0.274     0.011     0.284     0.001     0.294     0.008
Mean path length              2.408     2.914     0.506     2.650     0.242     2.608     0.200     3.115     0.707
Average betweenness           23.235    22.775    0.460     23.220    0.015     23.324    0.089     22.935    0.301
Maximum betweenness           231.071   82.474    148.598   82.362    148.709   89.367    141.704   83.824    147.247
Average closeness             0.013     0.012     0.001     0.012     0.001     0.012     0.001     0.011     0.002
Minimum closeness             0.009     0.007     0.002     0.008     0.001     0.008     0.000     0.006     0.002
Average eigencentrality       0.392     0.475     0.083     0.471     0.079     0.455     0.063     0.462     0.070
Network radius                3         2.250     0.750     2.625     0.375     2.700     0.300     2.075     0.925
Average eccentricity          4.029     3.835     0.195     3.927     0.102     3.928     0.101     3.866     0.163
Network diameter              5         4.725     0.275     4.900     0.100     4.850     0.150     4.800     0.200

Table D.10 Randomized methods results for the Bernard & Killworth Technical network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         34        34        0         34        0         34        0         34        0
Links                         175       176.150   1.150     175.000   0.000     175.000   0.000     175.425   0.425
Components                    1         1.000     0.000     1.000     0.000     1.025     0.025     1.000     0.000
Network density               0.312     0.314     0.002     0.312     0.000     0.312     0.000     0.313     0.001
Average degree                10.294    10.362    0.068     10.294    0.000     10.294    0.000     10.319    0.025
Standard deviation degree     4.629     2.735     1.894     2.739     1.890     2.828     1.801     2.630     1.998
Global cluster coefficient    0.476     0.330     0.146     0.323     0.153     0.323     0.153     0.310     0.166
Average cluster coefficient   0.474     0.331     0.143     0.323     0.151     0.324     0.150     0.314     0.160
Cluster Gini coefficient      0.188     0.228     0.040     0.236     0.048     0.228     0.040     0.249     0.061
Mean path length              1.807     1.717     0.091     1.718     0.089     1.765     0.043     1.714     0.094
Average betweenness           13.324    11.825    1.499     11.854    1.470     11.815    1.509     11.778    1.546
Maximum betweenness           63.291    30.845    32.446    30.684    32.607    32.040    31.251    31.362    31.928
Average closeness             0.017     0.018     0.001     0.018     0.001     0.018     0.000     0.018     0.001
Minimum closeness             0.012     0.015     0.003     0.015     0.003     0.015     0.003     0.015     0.003
Average eigencentrality       0.527     0.642     0.115     0.632     0.105     0.627     0.099     0.653     0.125
Network radius                2         2.000     0.000     2.000     0.000     1.950     0.050     2.000     0.000
Average eccentricity          2.882     2.493     0.389     2.498     0.385     2.496     0.387     2.435     0.447
Network diameter              4         3.000     1.000     3.000     1.000     3.025     0.975     3.000     1.000

Table D.11 Randomized methods results for the Bernard & Killworth Office network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         40        40        0         40        0         40        0         40        0
Links                         238       239.500   1.500     239.000   1.000     239.000   1.000     238.200   0.200
Components                    1         1.000     0.000     1.000     0.000     1.025     0.025     1.000     0.000
Network density               0.305     0.307     0.002     0.306     0.001     0.306     0.001     0.305     0.000
Average degree                11.900    11.975    0.075     11.950    0.050     11.950    0.050     11.910    0.010
Standard deviation degree     4.477     2.866     1.611     3.402     1.075     3.058     1.419     2.844     1.633
Global cluster coefficient    0.409     0.309     0.100     0.314     0.095     0.312     0.097     0.304     0.105
Average cluster coefficient   0.430     0.312     0.118     0.318     0.112     0.315     0.115     0.308     0.122
Cluster Gini coefficient      0.180     0.242     0.062     0.227     0.046     0.235     0.055     0.242     0.062
Mean path length              1.764     1.710     0.054     1.716     0.048     1.760     0.004     1.711     0.053
Average betweenness           14.900    13.849    1.051     13.961    0.939     13.868    1.032     13.858    1.042
Maximum betweenness           46.128    35.481    10.647    42.695    3.433     37.135    8.993     37.132    8.996
Average closeness             0.015     0.015     0.000     0.015     0.000     0.015     0.000     0.015     0.000
Minimum closeness             0.010     0.013     0.003     0.013     0.002     0.013     0.002     0.013     0.003
Average eigencentrality       0.583     0.655     0.073     0.618     0.035     0.638     0.056     0.655     0.073
Network radius                2         2.000     0.000     2.000     0.000     1.950     0.050     2.000     0.000
Average eccentricity          2.825     2.389     0.436     2.447     0.378     2.406     0.419     2.349     0.476
Network diameter              4         3.000     1.000     3.025     0.975     3.000     1.000     3.000     1.000

Table D.12 Randomized methods results for Krebs IT Department Advice network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         56        56        0         56        0         56        0         56        0
Links                         203       203.500   0.500     203.000   0.000     203.000   0.000     201.450   1.550
Components                    2         1.050     0.950     1.025     0.975     1.050     0.950     1.000     1.000
Network density               0.132     0.132     0.000     0.132     0.000     0.132     0.000     0.131     0.001
Average degree                7.250     7.268     0.018     7.250     0.000     7.250     0.000     7.195     0.055
Standard deviation degree     4.179     2.489     1.690     2.628     1.551     2.571     1.608     2.445     1.734
Global cluster coefficient    0.350     0.133     0.217     0.136     0.214     0.139     0.211     0.129     0.221
Average cluster coefficient   0.424     0.133     0.291     0.138     0.287     0.141     0.283     0.129     0.295
Cluster Gini coefficient      0.340     0.306     0.034     0.299     0.041     0.280     0.059     0.295     0.045
Mean path length              4.285     2.309     1.976     2.270     2.015     2.316     1.969     2.223     2.063
Average betweenness           36.321    33.294    3.027     33.575    2.746     33.496    2.825     33.620    2.701
Maximum betweenness           262.141   108.008   154.134   114.992   147.149   113.802   148.339   107.300   154.841
Average closeness             0.005     0.008     0.003     0.008     0.003     0.008     0.003     0.008     0.003
Minimum closeness             0.000     0.006     0.006     0.006     0.006     0.006     0.006     0.006     0.006
Average eigencentrality       0.310     0.507     0.197     0.494     0.184     0.498     0.188     0.514     0.204
Network radius                0         2.825     2.825     2.925     2.925     2.850     2.850     3.000     3.000
Average eccentricity          3.661     3.260     0.400     3.367     0.294     3.311     0.350     3.310     0.351
Network diameter              5         4.000     1.000     4.100     0.900     4.100     0.900     4.050     0.950

Table D.13 Randomized methods results for Krebs IT Department Business network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         56        56        0         56        0         56        0         56        0
Links                         387       385.875   1.125     387.000   0.000     387.000   0.000     380.200   6.800
Components                    1         1.000     0.000     1.000     0.000     1.000     0.000     1.000     0.000
Network density               0.251     0.251     0.001     0.251     0.000     0.251     0.000     0.247     0.004
Average degree                13.821    13.781    0.040     13.821    0.000     13.821    0.000     13.579    0.243
Standard deviation degree     5.198     3.069     2.130     3.488     1.710     3.365     1.833     3.110     2.088
Global cluster coefficient    0.493     0.255     0.238     0.260     0.233     0.257     0.236     0.245     0.247
Average cluster coefficient   0.560     0.256     0.304     0.260     0.300     0.258     0.302     0.247     0.313
Cluster Gini coefficient      0.238     0.258     0.020     0.250     0.011     0.242     0.004     0.258     0.019
Mean path length              1.901     1.772     0.129     1.775     0.126     1.772     0.128     1.779     0.122
Average betweenness           24.768    21.223    3.545     21.315    3.453     21.237    3.531     21.418    3.350
Maximum betweenness           116.325   53.401    62.924    57.978    58.347    56.392    59.933    55.337    60.989
Average closeness             0.010     0.010     0.001     0.010     0.001     0.010     0.001     0.010     0.001
Minimum closeness             0.008     0.009     0.002     0.009     0.001     0.009     0.001     0.009     0.001
Average eigencentrality       0.517     0.642     0.125     0.615     0.099     0.626     0.109     0.635     0.119
Network radius                2         2.000     0.000     2.000     0.000     2.000     0.000     2.000     0.000
Average eccentricity          2.893     2.569     0.324     2.608     0.285     2.570     0.323     2.608     0.285
Network diameter              3         3.000     0.000     3.000     0.000     3.000     0.000     3.000     0.000

Table D.14 Randomized methods results for the Lazega Law Firm network

                              Target    Stochastic Block    Genetic             Monte Carlo         Random
Metric                        MT        MS        |MT-MS|   MS        |MT-MS|   MS        |MT-MS|   MR        |MT-MR|
Nodes                         71        71        0         71        0         71        0         71        0
Links                         726       726.050   0.050     726.000   0.000     726.000   0.000     722.625   3.375
Components                    1         1.000     0.000     1.000     0.000     1.000     0.000     1.000     0.000
Network density               0.292     0.292     0.000     0.292     0.000     0.292     0.000     0.291     0.001
Average degree                20.451    20.452    0.001     20.451    0.000     20.451    0.000     20.356    0.095
Standard deviation degree     8.095     3.910     4.186     3.914     4.181     3.977     4.119     3.759     4.336
Global cluster coefficient    0.441     0.298     0.143     0.297     0.144     0.298     0.143     0.291     0.150
Average cluster coefficient   0.449     0.297     0.152     0.298     0.152     0.298     0.151     0.291     0.158
Cluster Gini coefficient      0.239     0.237     0.002     0.256     0.017     0.214     0.024     0.256     0.018
Mean path length              1.751     1.709     0.042     1.710     0.042     1.710     0.042     1.711     0.040
Average betweenness           26.296    24.832    1.464     24.842    1.454     24.835    1.461     24.882    1.414
Maximum betweenness           106.694   56.355    50.339    55.046    51.648    55.236    51.457    53.214    53.479
Average closeness             0.008     0.008     0.000     0.008     0.000     0.008     0.000     0.008     0.000
Minimum closeness             0.006     0.008     0.001     0.008     0.001     0.008     0.001     0.008     0.001
Average eigencentrality       0.448     0.669     0.221     0.688     0.240     0.680     0.232     0.697     0.250
Network radius                2         2.000     0.000     2.000     0.000     2.000     0.000     2.000     0.000
Average eccentricity          2.746     2.096     0.650     2.109     0.638     2.098     0.649     2.096     0.651
Network diameter              3         2.950     0.050     3.000     0.000     2.975     0.025     2.975     0.025

HEURISTIC METHODS REALISM COMPARISON TABLES

For all tables in Appendix E, bold indicates the lowest values.

Table E.1 Heuristic methods results for the Robins Australian Bank network.

Metric                      T       M̄       |M̄-T|   L1(M)   L2(M)   P̄       |P̄-T|   L1(P)   L2(P)   F̄       |F̄-T|   L1(F)   L2(F)
Nodes                       11.00   11.00   0.00    0.00    0.00    11.00   0.00    0.00    0.00    11.00   0.00    0.00    0.00
Links                       16.00   16.00   0.00    0.00    0.00    16.00   0.00    0.00    0.00    12.83   3.17    95.00   18.41
Components                  1.00    1.47    0.47    14.00   3.74    1.07    0.07    2.00    1.41    1.13    0.13    4.00    2.00
Network density             0.29    0.29    0.00    0.00    0.00    0.29    0.00    0.00    0.00    0.23    0.06    1.73    0.34
Average degree              2.91    2.91    0.00    0.00    0.00    2.91    0.00    0.00    0.00    2.33    0.58    17.27   3.35
Standard deviation degree   1.87    1.82    0.05    7.46    1.70    1.75    0.12    8.79    1.93    1.27    0.60    17.92   3.49
Global cluster coefficient  0.38    0.45    0.07    2.76    0.61    0.38    0.01    1.27    0.32    0.14    0.24    7.29    1.48
Average cluster coefficient 0.41    0.66    0.26    7.73    1.48    0.63    0.23    6.80    1.29    0.17    0.24    7.46    1.55
Mean path length            2.02    2.75    0.74    23.69   5.83    2.13    0.11    6.55    2.27    2.73    0.71    21.20   6.28
Communities                 3.00    3.13    0.13    14.00   4.47    2.47    0.53    18.00   4.24    3.47    0.47    18.00   5.10
Gini coefficient            0.24    0.20    0.04    2.85    0.62    0.12    0.12    3.72    0.76    0.15    0.10    2.91    0.66
Average betweenness         5.09    4.52    0.57    29.00   6.08    5.06    0.04    21.09   4.75    6.44    1.35    53.64   11.35
Maximum betweenness         25.17   22.18   2.99    159.33  34.81   26.26   1.09    140.33  31.75   21.60   3.56    153.90  35.15
Average closeness           0.05    0.06    0.00    0.18    0.04    0.05    0.00    0.11    0.03    0.05    0.01    0.24    0.05
Minimum closeness           0.04    0.04    0.00    0.16    0.04    0.04    0.00    0.11    0.03    0.03    0.01    0.25    0.05
Average eigencentrality     0.49    0.56    0.07    2.11    0.44    0.55    0.06    1.96    0.43    0.54    0.05    1.91    0.46
Minimum eigencentrality     0.14    0.20    0.06    2.05    0.43    0.22    0.08    2.66    0.54    0.15    0.01    2.08    0.46
Network radius              2.00    2.00    0.00    2.00    1.41    2.10    0.10    3.00    1.73    2.67    0.67    20.00   4.69
Average eccentricity        3.09    2.86    0.23    11.55   2.56    3.01    0.09    10.73   2.32    3.78    0.69    23.73   5.26
Network diameter            4.00    3.57    0.43    13.00   3.87    3.57    0.43    17.00   4.12    4.73    0.73    26.00   6.48
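The L1 and L2 columns in the Appendix E tables summarize how far individual runs fall from the target, but this appendix does not define them. Assuming L1 is the sum of absolute per-run deviations and L2 the Euclidean norm of those deviations, the Components row for method M in Table E.1 is reproduced by 30 hypothetical runs, 14 with two components and 16 with one, against a target of 1: mean deviation 0.47, L1 = 14.00, L2 = √14 ≈ 3.74. Both the run count and the definitions are assumptions; `deviation_summary` is a hypothetical helper, not code from this dissertation.

```python
import math


def deviation_summary(runs, target):
    """Return (|mean - T|, L1, L2) for one metric over repeated runs.

    Assumed definitions: L1 = sum |x_i - T| and L2 = sqrt(sum (x_i - T)^2).
    """
    diffs = [x - target for x in runs]
    mean_dev = abs(sum(runs) / len(runs) - target)
    l1 = sum(abs(d) for d in diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))
    return mean_dev, l1, l2


if __name__ == "__main__":
    # Hypothetical runs: 14 networks with 2 components, 16 with 1; target T = 1.
    runs = [2] * 14 + [1] * 16
    mean_dev, l1, l2 = deviation_summary(runs, 1)
    print(f"{mean_dev:.2f} {l1:.2f} {l2:.2f}")  # 0.47 14.00 3.74
```

L2 penalizes a few large misses more heavily than L1 does, which is why the two columns can rank methods differently.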

Table E.2 Heuristic methods results for Roethlisberger & Dickson network.

Metric                      T       M̄       |M̄-T|   L1(M)   L2(M)   P̄       |P̄-T|   L1(P)   L2(P)   F̄       |F̄-T|   L1(F)   L2(F)
Nodes                       14.00   14.00   0.00    0.00    0.00    14.00   0.00    0.00    0.00    14.00   0.00    0.00    0.00
Links                       13.00   13.00   0.00    0.00    0.00    13.00   0.00    0.00    0.00    10.43   2.57    77.00   14.93
Components                  6.00    6.10    0.10    3.00    1.73    6.43    0.43    13.00   3.61    6.10    0.10    3.00    1.73
Network density             0.14    0.14    0.00    0.00    0.00    0.14    0.00    0.00    0.00    0.12    0.03    0.85    0.16
Average degree              1.86    1.86    0.00    0.00    0.00    1.86    0.00    0.00    0.00    1.49    0.37    11.00   2.13
Standard deviation degree   1.61    1.74    0.13    4.38    1.07    1.85    0.24    7.30    1.63    1.35    0.26    7.90    1.65
Global cluster coefficient  0.64    0.45    0.19    5.80    1.14    0.45    0.20    5.88    1.24    0.16    0.48    14.44   2.76
Average cluster coefficient 0.71    0.64    0.07    3.29    0.71    0.63    0.08    4.89    0.95    0.17    0.53    16.02   3.08
Mean path length            9.34    9.32    0.02    5.84    1.66    9.62    0.28    13.51   3.10    9.49    0.15    5.78    2.23
Communities                 7.00    7.60    0.60    18.00   4.90    7.60    0.60    18.00   4.69    7.73    0.73    22.00   5.29
Gini coefficient            0.37    0.32    0.05    1.55    0.39    0.34    0.03    0.85    0.26    0.32    0.04    1.40    0.34
Average betweenness         3.14    2.28    0.86    25.93   5.22    1.73    1.42    42.50   8.28    3.16    0.01    18.57   4.12
Maximum betweenness         16.00   15.96   0.04    117.17  25.69   13.31   2.69    159.33  35.77   12.84   3.16    101.42  25.42
Average closeness           0.06    0.07    0.01    0.33    0.07    0.08    0.02    0.64    0.13    0.06    0.00    0.20    0.05
Minimum closeness           0.04    0.05    0.01    0.30    0.06    0.06    0.02    0.65    0.13    0.04    0.00    0.17    0.05
Average eigencentrality     0.59    0.65    0.06    2.44    0.52    0.68    0.08    2.67    0.69    0.65    0.05    1.98    0.43
Minimum eigencentrality     0.20    0.30    0.10    3.21    0.73    0.34    0.14    4.15    0.93    0.21    0.01    2.22    0.54
Network radius              3.00    2.00    1.00    30.00   5.48    1.83    1.17    35.00   6.71    2.67    0.33    12.00   3.46
Average eccentricity        2.57    1.91    0.67    20.00   3.85    1.55    1.02    30.71   5.87    2.38    0.19    10.50   2.37
Network diameter            5.00    3.57    1.43    43.00   8.31    3.00    2.00    60.00   11.40   4.57    0.43    19.00   4.58

Table E.3 Heuristic methods results for Thurman Office social network.

Metric                      T       M̄       |M̄-T|   L1(M)   L2(M)   P̄       |P̄-T|   L1(P)   L2(P)   F̄       |F̄-T|   L1(F)   L2(F)
Nodes                       15.00   15.00   0.00    0.00    0.00    15.00   0.00    0.00    0.00    15.00   0.00    0.00    0.00
Links                       33.00   33.00   0.00    0.00    0.00    33.00   0.00    0.00    0.00    25.53   7.47    224.00  42.07
Components                  1.00    1.03    0.03    1.00    1.00    1.00    0.00    0.00    0.00    1.07    0.07    2.00    1.41
Network density             0.31    0.31    0.00    0.00    0.00    0.31    0.00    0.00    0.00    0.24    0.07    2.13    0.40
Average degree              4.40    4.40    0.00    0.00    0.00    4.40    0.00    0.00    0.00    3.40    1.00    29.87   5.61
Standard deviation degree   2.53    2.94    0.41    12.43   2.39    3.03    0.50    15.07   2.91    1.78    0.75    22.57   4.29
Global cluster coefficient  0.52    0.47    0.05    1.46    0.32    0.47    0.05    1.47    0.32    0.25    0.27    7.96    1.51
Average cluster coefficient 0.48    0.71    0.23    6.90    1.29    0.73    0.25    7.49    1.38    0.28    0.20    6.34    1.28
Mean path length            1.88    1.86    0.02    3.67    1.63    1.80    0.08    2.47    0.50    2.27    0.39    11.83   3.16
Communities                 3.00    3.63    0.63    35.00   8.43    3.87    0.87    48.00   10.10   3.77    0.77    31.00   7.42
Gini coefficient            0.18    0.28    0.11    3.48    0.76    0.28    0.11    4.03    0.80    0.22    0.05    2.44    0.57
Average betweenness         6.13    5.59    0.54    16.73   3.67    5.61    0.53    17.27   3.53    8.02    1.89    56.67   11.24
Maximum betweenness         37.25   47.27   10.03   307.51  65.38   47.22   9.97    312.71  64.49   28.64   8.61    282.45  57.01
Average closeness           0.04    0.04    0.00    0.06    0.01    0.04    0.00    0.06    0.01    0.03    0.01    0.14    0.03
Minimum closeness           0.03    0.03    0.00    0.04    0.01    0.03    0.00    0.06    0.01    0.02    0.01    0.18    0.04
Average eigencentrality     0.53    0.49    0.04    1.21    0.24    0.49    0.04    1.21    0.24    0.54    0.01    1.08    0.24
Minimum eigencentrality     0.11    0.15    0.04    1.68    0.32    0.15    0.04    1.46    0.29    0.12    0.01    1.39    0.29
Network radius              2.00    2.00    0.00    0.00    0.00    2.00    0.00    0.00    0.00    2.67    0.67    20.00   4.47
Average eccentricity        2.80    2.65    0.15    5.93    1.31    2.67    0.13    7.13    1.45    3.48    0.68    20.33   4.09
Network diameter            3.00    3.10    0.10    3.00    1.73    3.27    0.27    8.00    2.83    4.30    1.30    39.00   7.68

Table E.4 Heuristic methods results for Sampson Monastery network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 18.00, 18.00, 0.00, 0.00, 0.00, 18.00, 0.00, 0.00, 0.00, 18.00, 0.00, 0.00, 0.00
Links: 41.00, 34.07, 6.93, 208.00, 39.19, 41.00, 0.00, 0.00, 0.00, 41.00, 0.00, 0.00, 0.00
Components: 1.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00
Network density: 0.27, 0.22, 0.05, 1.36, 0.26, 0.27, 0.00, 0.00, 0.00, 0.27, 0.00, 0.00, 0.00
Average degree: 4.56, 3.79, 0.77, 23.11, 4.36, 4.56, 0.00, 0.00, 0.00, 4.56, 0.00, 0.00, 0.00
Standard deviation degree: 2.09, 1.55, 0.55, 16.35, 3.25, 2.84, 0.75, 22.36, 4.35, 2.99, 0.90, 26.91, 5.12
Global cluster coefficient: 0.26, 0.20, 0.07, 2.31, 0.50, 0.36, 0.10, 2.93, 0.56, 0.37, 0.11, 3.20, 0.60
Average cluster coefficient: 0.29, 0.21, 0.07, 2.75, 0.59, 0.63, 0.34, 10.22, 1.92, 0.66, 0.38, 11.36, 2.10
Mean path length: 1.97, 2.15, 0.18, 5.33, 1.06, 1.83, 0.14, 4.07, 0.77, 1.82, 0.15, 4.54, 0.87
Communities: 3.00, 3.67, 0.67, 28.00, 7.07, 3.60, 0.60, 22.00, 5.10, 3.83, 0.83, 25.00, 5.75
Gini coefficient: 0.07, 0.19, 0.12, 3.52, 0.82, 0.18, 0.10, 3.21, 0.66, 0.19, 0.11, 3.46, 0.72
Average betweenness: 8.22, 9.73, 1.51, 45.33, 9.00, 7.07, 1.15, 34.56, 6.54, 6.94, 1.29, 38.56, 7.39
Maximum betweenness: 37.62, 36.81, 0.81, 170.51, 39.48, 78.27, 40.65, 1219.35, 228.02, 82.56, 44.94, 1348.16, 250.19
Average closeness: 0.03, 0.03, 0.00, 0.08, 0.02, 0.03, 0.00, 0.07, 0.01, 0.03, 0.00, 0.08, 0.02
Minimum closeness: 0.02, 0.02, 0.00, 0.07, 0.02, 0.03, 0.00, 0.09, 0.02, 0.03, 0.00, 0.10, 0.02
Average eigencentrality: 0.48, 0.50, 0.02, 1.32, 0.31, 0.42, 0.06, 1.89, 0.36, 0.41, 0.07, 2.06, 0.38
Minimum eigencentrality: 0.17, 0.16, 0.01, 1.78, 0.40, 0.23, 0.06, 1.79, 0.37, 0.22, 0.05, 1.92, 0.40
Network radius: 2.00, 2.80, 0.80, 24.00, 4.90, 1.97, 0.03, 1.00, 1.00, 1.93, 0.07, 2.00, 1.41
Average eccentricity: 3.00, 3.40, 0.40, 12.50, 2.56, 2.65, 0.35, 10.50, 2.20, 2.60, 0.40, 12.39, 2.67
Network diameter: 4.00, 4.07, 0.07, 6.00, 2.45, 3.00, 1.00, 30.00, 5.66, 3.07, 0.93, 28.00, 5.66

Table E.5 Heuristic methods results for the Krackhardt Office CSS network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 21.00, 21.00, 0.00, 0.00, 0.00, 21.00, 0.00, 0.00, 0.00, 21.00, 0.00, 0.00, 0.00
Links: 14.00, 12.50, 1.50, 45.00, 9.75, 14.00, 0.00, 0.00, 0.00, 14.00, 0.00, 0.00, 0.00
Components: 9.00, 9.43, 0.43, 17.00, 4.58, 9.13, 0.13, 4.00, 2.00, 9.17, 0.17, 5.00, 2.24
Network density: 0.07, 0.06, 0.01, 0.21, 0.05, 0.07, 0.00, 0.00, 0.00, 0.07, 0.00, 0.00, 0.00
Average degree: 1.33, 1.19, 0.14, 4.29, 0.93, 1.33, 0.00, 0.00, 0.00, 1.33, 0.00, 0.00, 0.00
Standard deviation degree: 1.39, 1.18, 0.21, 6.23, 1.38, 2.08, 0.69, 20.55, 3.93, 2.11, 0.72, 21.70, 4.06
Global cluster coefficient: 0.13, 0.08, 0.05, 2.91, 0.60, 0.14, 0.01, 0.69, 0.14, 0.13, 0.01, 0.59, 0.13
Average cluster coefficient: 0.16, 0.11, 0.05, 3.80, 0.76, 0.68, 0.53, 15.78, 2.90, 0.70, 0.55, 16.38, 3.01
Mean path length: 15.84, 16.45, 0.61, 41.12, 8.83, 14.37, 1.48, 44.27, 8.77, 14.53, 1.31, 41.18, 8.38
Communities: 10.00, 11.33, 1.33, 40.00, 8.72, 10.47, 0.47, 14.00, 3.74, 10.50, 0.50, 17.00, 4.36
Gini coefficient: 0.40, 0.33, 0.07, 2.04, 0.43, 0.42, 0.02, 0.77, 0.17, 0.42, 0.02, 0.80, 0.18
Average betweenness: 3.67, 3.86, 0.20, 53.33, 12.11, 3.90, 0.23, 27.43, 5.73, 3.51, 0.16, 22.48, 5.23
Maximum betweenness: 22.50, 25.14, 2.64, 315.33, 71.60, 55.97, 33.47, 1004.0, 186.61, 54.70, 32.20, 966.0, 182.25
Average closeness: 0.04, 0.06, 0.01, 0.66, 0.24, 0.04, 0.00, 0.17, 0.04, 0.05, 0.00, 0.17, 0.05
Minimum closeness: 0.03, 0.04, 0.01, 0.52, 0.19, 0.04, 0.00, 0.18, 0.05, 0.04, 0.01, 0.21, 0.06
Average eigencentrality: 0.47, 0.52, 0.05, 2.33, 0.56, 0.39, 0.08, 2.36, 0.45, 0.39, 0.09, 2.54, 0.48
Minimum eigencentrality: 0.11, 0.17, 0.07, 2.62, 0.64, 0.25, 0.14, 4.30, 0.82, 0.25, 0.14, 4.19, 0.81
Network radius: 3.00, 2.87, 0.13, 14.00, 4.00, 1.83, 1.17, 35.00, 6.71, 1.83, 1.17, 35.00, 6.71
Average eccentricity: 2.33, 2.41, 0.08, 13.00, 3.07, 1.77, 0.57, 17.05, 3.68, 1.66, 0.67, 20.10, 3.99
Network diameter: 5.00, 5.20, 0.20, 26.00, 6.16, 3.20, 1.80, 54.00, 10.58, 3.00, 2.00, 60.00, 11.40

Table E.6 Heuristic methods results for the Krackhardt High-Tech Managers network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 21.00, 21.00, 0.00, 0.00, 0.00, 21.00, 0.00, 0.00, 0.00, 21.00, 0.00, 0.00, 0.00
Links: 36.00, 31.47, 4.53, 136.00, 26.42, 36.00, 0.00, 0.00, 0.00, 36.00, 0.00, 0.00, 0.00
Components: 5.00, 5.00, 0.00, 0.00, 0.00, 5.10, 0.10, 3.00, 1.73, 5.00, 0.00, 0.00, 0.00
Network density: 0.17, 0.15, 0.02, 0.65, 0.13, 0.17, 0.00, 0.00, 0.00, 0.17, 0.00, 0.00, 0.00
Average degree: 3.43, 3.00, 0.43, 12.95, 2.52, 3.43, 0.00, 0.00, 0.00, 3.43, 0.00, 0.00, 0.00
Standard deviation degree: 2.14, 1.88, 0.26, 7.82, 1.56, 2.70, 0.56, 16.83, 3.18, 2.72, 0.59, 17.60, 3.37
Global cluster coefficient: 0.50, 0.19, 0.31, 9.23, 1.70, 0.37, 0.13, 3.83, 0.72, 0.39, 0.10, 3.12, 0.60
Average cluster coefficient: 0.59, 0.20, 0.39, 11.63, 2.14, 0.60, 0.02, 1.44, 0.32, 0.63, 0.05, 1.79, 0.41
Mean path length: 8.98, 8.79, 0.18, 5.71, 1.10, 8.81, 0.17, 11.42, 2.45, 8.67, 0.30, 9.13, 1.68
Communities: 7.00, 7.73, 0.73, 24.00, 6.48, 7.63, 0.63, 23.00, 5.75, 7.53, 0.53, 20.00, 4.69
Gini coefficient: 0.44, 0.41, 0.03, 1.48, 0.34, 0.42, 0.02, 1.09, 0.27, 0.41, 0.03, 1.17, 0.26
Average betweenness: 9.29, 7.46, 1.83, 57.10, 10.97, 6.11, 3.18, 95.29, 17.67, 6.24, 3.04, 91.33, 16.82
Maximum betweenness: 44.67, 27.47, 17.20, 544.5, 104.29, 57.55, 12.88, 394.28, 87.09, 65.20, 20.54, 636.21, 127.4
Average closeness: 0.03, 0.03, 0.00, 0.10, 0.02, 0.03, 0.01, 0.20, 0.04, 0.03, 0.01, 0.19, 0.04
Minimum closeness: 0.02, 0.02, 0.00, 0.14, 0.03, 0.03, 0.01, 0.18, 0.04, 0.02, 0.01, 0.16, 0.03
Average eigencentrality: 0.45, 0.58, 0.13, 3.83, 0.76, 0.47, 0.02, 0.96, 0.23, 0.44, 0.01, 0.80, 0.16
Minimum eigencentrality: 0.04, 0.19, 0.15, 4.56, 0.91, 0.15, 0.11, 3.21, 0.63, 0.14, 0.10, 2.91, 0.58
Network radius: 3.00, 2.83, 0.17, 5.00, 2.24, 2.07, 0.93, 28.00, 5.29, 2.07, 0.93, 28.00, 5.29
Average eccentricity: 3.38, 2.79, 0.59, 19.29, 3.62, 2.45, 0.93, 27.95, 5.22, 2.44, 0.94, 28.10, 5.24
Network diameter: 5.00, 4.20, 0.80, 28.00, 5.29, 3.77, 1.23, 37.00, 7.42, 3.63, 1.37, 41.00, 8.19

Table E.7 Heuristic methods results for the Schwimmer Taro Exchange network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 22.00, 22.00, 0.00, 0.00, 0.00, 22.00, 0.00, 0.00, 0.00, 22.00, 0.00, 0.00, 0.00
Links: 39.00, 35.47, 3.53, 106.0, 20.98, 39.00, 0.00, 0.00, 0.00, 39.00, 0.00, 0.00, 0.00
Components: 1.00, 1.00, 0.00, 0.00, 0.00, 1.30, 0.30, 9.00, 3.32, 1.23, 0.23, 7.00, 2.65
Network density: 0.17, 0.15, 0.02, 0.46, 0.09, 0.17, 0.00, 0.00, 0.00, 0.17, 0.00, 0.00, 0.00
Average degree: 3.55, 3.22, 0.32, 9.64, 1.91, 3.55, 0.00, 0.00, 0.00, 3.55, 0.00, 0.00, 0.00
Standard deviation degree: 0.96, 0.96, 0.00, 2.90, 0.69, 2.65, 1.69, 50.71, 9.33, 2.69, 1.73, 51.78, 9.49
Global cluster coefficient: 0.28, 0.11, 0.17, 4.99, 0.96, 0.33, 0.06, 1.77, 0.37, 0.32, 0.05, 1.35, 0.29
Average cluster coefficient: 0.34, 0.11, 0.23, 6.88, 1.30, 0.72, 0.38, 11.43, 2.10, 0.72, 0.38, 11.29, 2.08
Mean path length: 2.49, 2.66, 0.16, 5.03, 1.22, 2.91, 0.42, 20.68, 6.67, 2.78, 0.29, 18.58, 5.91
Communities: 5.00, 4.97, 0.03, 17.00, 4.58, 5.23, 0.23, 17.00, 4.36, 5.33, 0.33, 20.00, 5.10
Gini coefficient: 0.13, 0.20, 0.07, 2.43, 0.53, 0.20, 0.08, 2.29, 0.51, 0.21, 0.08, 2.48, 0.55
Average betweenness: 15.68, 17.41, 1.73, 52.77, 12.79, 12.95, 2.73, 81.91, 17.33, 12.86, 2.82, 84.64, 17.09
Maximum betweenness: 46.38, 53.76, 7.38, 319.93, 80.14, 157.15, 110.76, 3322.93, 613.21, 157.83, 111.44, 3343.28, 615.46
Average closeness: 0.02, 0.02, 0.00, 0.03, 0.01, 0.02, 0.00, 0.09, 0.02, 0.02, 0.00, 0.09, 0.02
Minimum closeness: 0.02, 0.01, 0.00, 0.07, 0.02, 0.02, 0.00, 0.03, 0.01, 0.02, 0.00, 0.02, 0.01
Average eigencentrality: 0.62, 0.51, 0.10, 3.08, 0.64, 0.33, 0.29, 8.69, 1.59, 0.33, 0.29, 8.67, 1.58
Minimum eigencentrality: 0.32, 0.15, 0.16, 4.92, 1.01, 0.07, 0.24, 7.28, 1.34, 0.08, 0.23, 7.00, 1.28
Network radius: 3.00, 3.40, 0.40, 12.00, 3.46, 2.17, 0.83, 25.00, 5.00, 2.20, 0.80, 24.00, 4.90
Average eccentricity: 4.09, 4.40, 0.31, 11.14, 2.86, 3.32, 0.77, 23.32, 4.59, 3.32, 0.77, 23.00, 4.45
Network diameter: 5.00, 5.40, 0.40, 16.00, 4.47, 4.17, 0.83, 25.00, 5.00, 4.20, 0.80, 24.00, 4.90

Table E.8 Heuristic methods results for the Webster Accounting Firm network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 24.00, 24.00, 0.00, 0.00, 0.00, 24.00, 0.00, 0.00, 0.00, 24.00, 0.00, 0.00, 0.00
Links: 150.0, 104.7, 45.30, 1359.0, 249.42, 150.0, 0.00, 0.00, 0.00, 150.0, 0.00, 0.00, 0.00
Components: 2.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00
Network density: 0.54, 0.38, 0.16, 4.92, 0.90, 0.54, 0.00, 0.00, 0.00, 0.54, 0.00, 0.00, 0.00
Average degree: 12.50, 8.73, 3.78, 113.25, 20.79, 12.50, 0.00, 0.00, 0.00, 12.50, 0.00, 0.00, 0.00
Standard deviation degree: 5.51, 3.48, 2.03, 60.90, 11.18, 5.20, 0.31, 9.19, 1.88, 5.22, 0.29, 8.81, 1.81
Global cluster coefficient: 0.81, 0.46, 0.36, 10.67, 1.95, 0.71, 0.10, 2.96, 0.55, 0.72, 0.09, 2.84, 0.52
Average cluster coefficient: 0.78, 0.47, 0.31, 9.32, 1.72, 0.75, 0.03, 0.96, 0.20, 0.74, 0.04, 1.06, 0.21
Mean path length: 3.48, 3.48, 0.00, 0.54, 0.13, 3.30, 0.19, 5.62, 1.03, 3.29, 0.19, 5.63, 1.03
Communities: 5.00, 4.70, 0.30, 37.00, 8.78, 4.97, 0.03, 27.00, 7.14, 5.07, 0.07, 28.00, 7.07
Gini coefficient: 0.52, 0.41, 0.11, 3.36, 0.72, 0.49, 0.03, 1.29, 0.30, 0.48, 0.04, 1.71, 0.39
Average betweenness: 6.50, 6.49, 0.01, 6.25, 1.46, 4.35, 2.15, 64.63, 11.80, 4.34, 2.16, 64.79, 11.83
Maximum betweenness: 26.80, 19.42, 7.38, 228.43, 45.64, 17.36, 9.45, 286.41, 55.33, 18.18, 8.63, 260.14, 51.33
Average closeness: 0.03, 0.03, 0.00, 0.02, 0.01, 0.03, 0.00, 0.11, 0.02, 0.03, 0.00, 0.11, 0.02
Minimum closeness: 0.02, 0.02, 0.01, 0.15, 0.03, 0.02, 0.01, 0.21, 0.04, 0.02, 0.01, 0.22, 0.04
Average eigencentrality: 0.65, 0.69, 0.03, 1.09, 0.23, 0.71, 0.05, 1.58, 0.31, 0.70, 0.04, 1.31, 0.26
Minimum eigencentrality: 0.02, 0.21, 0.19, 5.69, 1.07, 0.18, 0.16, 4.80, 0.90, 0.18, 0.17, 4.98, 0.92
Network radius: 2.00, 2.00, 0.00, 0.00, 0.00, 1.93, 0.07, 2.00, 1.41, 1.90, 0.10, 3.00, 1.73
Average eccentricity: 2.79, 2.25, 0.54, 16.13, 2.98, 2.00, 0.79, 23.79, 4.36, 1.99, 0.80, 24.08, 4.41
Network diameter: 4.00, 3.00, 1.00, 30.00, 5.48, 2.63, 1.37, 41.00, 7.94, 2.67, 1.33, 40.00, 7.75

Table E.9 Heuristic methods results for the Zachary Karate Club network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 34.00, 34.00, 0.00, 0.00, 0.00, 34.00, 0.00, 0.00, 0.00, 34.00, 0.00, 0.00, 0.00
Links: 78.00, 65.63, 12.37, 371.00, 68.61, 78.00, 0.00, 0.00, 0.00, 78.00, 0.00, 0.00, 0.00
Components: 1.00, 1.13, 0.13, 4.00, 2.00, 1.13, 0.13, 4.00, 2.45, 1.20, 0.20, 6.00, 2.45
Network density: 0.14, 0.12, 0.02, 0.66, 0.12, 0.14, 0.00, 0.00, 0.00, 0.14, 0.00, 0.00, 0.00
Average degree: 4.59, 3.86, 0.73, 21.82, 4.04, 4.59, 0.00, 0.00, 0.00, 4.59, 0.00, 0.00, 0.00
Standard deviation degree: 3.88, 2.69, 1.19, 35.61, 6.63, 3.87, 0.00, 6.10, 1.38, 3.67, 0.21, 7.49, 1.63
Global cluster coefficient: 0.26, 0.16, 0.10, 2.96, 0.56, 0.31, 0.05, 1.63, 0.33, 0.33, 0.08, 2.25, 0.44
Average cluster coefficient: 0.59, 0.19, 0.40, 11.85, 2.18, 0.68, 0.09, 2.71, 0.51, 0.67, 0.08, 2.49, 0.48
Mean path length: 2.41, 2.85, 0.44, 13.27, 4.99, 3.05, 0.64, 22.98, 12.16, 3.60, 1.19, 36.32, 14.09
Communities: 5.00, 7.53, 2.53, 82.00, 18.55, 6.80, 1.80, 58.00, 12.41, 6.27, 1.27, 42.00, 9.70
Gini coefficient: 0.17, 0.31, 0.14, 4.28, 0.87, 0.33, 0.17, 5.04, 0.96, 0.31, 0.15, 4.37, 0.86
Average betweenness: 23.24, 25.26, 2.03, 68.62, 14.83, 22.13, 1.11, 72.03, 18.05, 23.14, 0.10, 72.18, 17.73
Maximum betweenness: 231.07, 163.33, 67.74, 2032.19, 409.13, 234.37, 3.30, 1271.06, 277.04, 208.98, 22.09, 1321.97, 293.17
Average closeness: 0.01, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00, 0.03, 0.01, 0.01, 0.00, 0.02, 0.01
Minimum closeness: 0.01, 0.01, 0.00, 0.02, 0.01, 0.01, 0.00, 0.03, 0.01, 0.01, 0.00, 0.03, 0.01
Average eigencentrality: 0.39, 0.33, 0.06, 1.79, 0.36, 0.30, 0.09, 2.73, 0.51, 0.30, 0.10, 2.84, 0.53
Minimum eigencentrality: 0.06, 0.04, 0.03, 0.99, 0.20, 0.03, 0.03, 1.01, 0.20, 0.03, 0.04, 1.07, 0.21
Network radius: 3.00, 3.03, 0.03, 1.00, 1.00, 2.60, 0.40, 12.00, 3.46, 2.73, 0.27, 10.00, 3.16
Average eccentricity: 4.03, 4.12, 0.09, 6.94, 1.79, 3.70, 0.33, 14.50, 3.18, 3.82, 0.21, 12.88, 2.97
Network diameter: 5.00, 5.13, 0.13, 8.00, 3.16, 4.70, 0.30, 15.00, 3.87, 4.87, 0.13, 16.00, 4.24

Table E.10 Heuristic methods results for the Bernard & Killworth Technical network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 34.00, 34.00, 0.00, 0.00, 0.00, 34.00, 0.00, 0.00, 0.00, 34.00, 0.00, 0.00, 0.00
Links: 78.00, 65.63, 12.37, 371.00, 68.61, 78.00, 0.00, 0.00, 0.00, 78.00, 0.00, 0.00, 0.00
Components: 1.00, 1.13, 0.13, 4.00, 2.00, 1.13, 0.13, 4.00, 2.45, 1.20, 0.20, 6.00, 2.45
Network density: 0.14, 0.12, 0.02, 0.66, 0.12, 0.14, 0.00, 0.00, 0.00, 0.14, 0.00, 0.00, 0.00
Average degree: 4.59, 3.86, 0.73, 21.82, 4.04, 4.59, 0.00, 0.00, 0.00, 4.59, 0.00, 0.00, 0.00
Standard deviation degree: 3.88, 2.69, 1.19, 35.61, 6.63, 3.87, 0.00, 6.10, 1.38, 3.67, 0.21, 7.49, 1.63
Global cluster coefficient: 0.26, 0.16, 0.10, 2.96, 0.56, 0.31, 0.05, 1.63, 0.33, 0.33, 0.08, 2.25, 0.44
Average cluster coefficient: 0.59, 0.19, 0.40, 11.85, 2.18, 0.68, 0.09, 2.71, 0.51, 0.67, 0.08, 2.49, 0.48
Mean path length: 2.41, 2.85, 0.44, 13.27, 4.99, 3.05, 0.64, 22.98, 12.16, 3.60, 1.19, 36.32, 14.09
Communities: 5.00, 7.53, 2.53, 82.00, 18.55, 6.80, 1.80, 58.00, 12.41, 6.27, 1.27, 42.00, 9.70
Gini coefficient: 0.17, 0.31, 0.14, 4.28, 0.87, 0.33, 0.17, 5.04, 0.96, 0.31, 0.15, 4.37, 0.86
Average betweenness: 23.24, 25.26, 2.03, 68.62, 14.83, 22.13, 1.11, 72.03, 18.05, 23.14, 0.10, 72.18, 17.73
Maximum betweenness: 231.07, 163.33, 67.74, 2032.19, 409.13, 234.37, 3.30, 1271.06, 277.04, 208.98, 22.09, 1321.97, 293.17
Average closeness: 0.01, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00, 0.03, 0.01, 0.01, 0.00, 0.02, 0.01
Minimum closeness: 0.01, 0.01, 0.00, 0.02, 0.01, 0.01, 0.00, 0.03, 0.01, 0.01, 0.00, 0.03, 0.01
Average eigencentrality: 0.39, 0.33, 0.06, 1.79, 0.36, 0.30, 0.09, 2.73, 0.51, 0.30, 0.10, 2.84, 0.53
Minimum eigencentrality: 0.06, 0.04, 0.03, 0.99, 0.20, 0.03, 0.03, 1.01, 0.20, 0.03, 0.04, 1.07, 0.21
Network radius: 3.00, 3.03, 0.03, 1.00, 1.00, 2.60, 0.40, 12.00, 3.46, 2.73, 0.27, 10.00, 3.16
Average eccentricity: 4.03, 4.12, 0.09, 6.94, 1.79, 3.70, 0.33, 14.50, 3.18, 3.82, 0.21, 12.88, 2.97
Network diameter: 5.00, 5.13, 0.13, 8.00, 3.16, 4.70, 0.30, 15.00, 3.87, 4.87, 0.13, 16.00, 4.24

Table E.11 Heuristic methods results for the Bernard & Killworth Office network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 40.00, 40.00, 0.00, 0.00, 0.00, 40.00, 0.00, 0.00, 0.00, 40.00, 0.00, 0.00, 0.00
Links: 238.00, 197.60, 40.40, 1212.00, 223.29, 238.00, 0.00, 0.00, 0.00, 238.00, 0.00, 0.00, 0.00
Components: 1.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00
Network density: 0.31, 0.25, 0.05, 1.55, 0.29, 0.31, 0.00, 0.00, 0.00, 0.31, 0.00, 0.00, 0.00
Average degree: 11.90, 9.88, 2.02, 60.60, 11.16, 11.90, 0.00, 0.00, 0.00, 11.90, 0.00, 0.00, 0.00
Standard deviation degree: 4.48, 3.39, 1.09, 32.72, 6.03, 5.11, 0.63, 19.40, 3.83, 5.05, 0.57, 17.59, 3.61
Global cluster coefficient: 0.41, 0.27, 0.14, 4.08, 0.75, 0.41, 0.00, 0.30, 0.06, 0.42, 0.01, 0.34, 0.08
Average cluster coefficient: 0.43, 0.28, 0.15, 4.58, 0.84, 0.46, 0.03, 0.96, 0.20, 0.46, 0.03, 0.87, 0.19
Mean path length: 1.76, 1.83, 0.06, 1.93, 0.36, 1.73, 0.03, 0.96, 0.19, 1.74, 0.03, 0.84, 0.17
Communities: 4.00, 5.50, 1.50, 55.00, 12.12, 4.80, 0.80, 42.00, 10.86, 5.13, 1.13, 54.00, 13.64
Gini coefficient: 0.35, 0.36, 0.01, 1.51, 0.35, 0.32, 0.03, 2.42, 0.60, 0.34, 0.01, 2.75, 0.60
Average betweenness: 14.90, 16.15, 1.25, 37.63, 7.10, 14.28, 0.62, 18.70, 3.65, 14.41, 0.49, 16.45, 3.26
Maximum betweenness: 46.13, 47.39, 1.27, 188.93, 42.15, 124.58, 78.46, 2353.65, 456.14, 118.82, 72.69, 2183.96, 439.83
Average closeness: 0.02, 0.01, 0.00, 0.02, 0.00, 0.02, 0.00, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00
Minimum closeness: 0.01, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00, 0.04, 0.01, 0.01, 0.00, 0.04, 0.01
Average eigencentrality: 0.58, 0.60, 0.02, 0.87, 0.20, 0.45, 0.13, 3.97, 0.76, 0.45, 0.13, 3.93, 0.76
Minimum eigencentrality: 0.12, 0.15, 0.03, 1.17, 0.24, 0.11, 0.01, 0.56, 0.13, 0.10, 0.02, 0.81, 0.19
Network radius: 2.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00
Average eccentricity: 2.83, 2.84, 0.02, 1.50, 0.39, 2.53, 0.29, 8.88, 1.76, 2.59, 0.24, 7.10, 1.48
Network diameter: 4.00, 3.07, 0.93, 28.00, 5.29, 3.03, 0.97, 29.00, 5.39, 3.00, 1.00, 30.00, 5.48

Table E.12 Heuristic methods results for Krebs IT Department Advice network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 56.00, 56.00, 0.00, 0.00, 0.00, 56.00, 0.00, 0.00, 0.00, 56.00, 0.00, 0.00, 0.00
Links: 203.00, 182.53, 20.47, 614.00, 113.61, 203.00, 0.00, 0.00, 0.00, 203.00, 0.00, 0.00, 0.00
Components: 2.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00
Network density: 0.13, 0.12, 0.01, 0.40, 0.07, 0.13, 0.00, 0.00, 0.00, 0.13, 0.00, 0.00, 0.00
Average degree: 7.25, 6.52, 0.73, 21.93, 4.06, 7.25, 0.00, 0.00, 0.00, 7.25, 0.00, 0.00, 0.00
Standard deviation degree: 4.18, 3.45, 0.73, 22.02, 4.12, 4.70, 0.52, 15.57, 3.04, 4.78, 0.60, 17.88, 3.34
Global cluster coefficient: 0.35, 0.15, 0.20, 5.95, 1.09, 0.28, 0.07, 2.14, 0.40, 0.29, 0.06, 1.81, 0.34
Average cluster coefficient: 0.42, 0.16, 0.27, 7.99, 1.47, 0.43, 0.00, 0.46, 0.11, 0.43, 0.00, 0.42, 0.10
Mean path length: 4.29, 4.22, 0.06, 1.87, 0.37, 4.15, 0.14, 4.12, 0.77, 4.15, 0.14, 4.20, 0.78
Communities: 8.00, 11.10, 3.10, 107.00, 22.96, 9.30, 1.30, 61.00, 13.53, 8.67, 0.67, 68.00, 15.75
Gini coefficient: 0.43, 0.46, 0.03, 1.63, 0.34, 0.48, 0.05, 1.71, 0.38, 0.45, 0.02, 1.33, 0.30
Average betweenness: 36.32, 34.61, 1.71, 51.38, 10.04, 32.55, 3.78, 113.30, 21.17, 32.47, 3.85, 115.46, 21.49
Maximum betweenness: 262.14, 169.51, 92.63, 2778.80, 524.32, 457.03, 194.89, 5846.64, 1154.39, 500.81, 238.67, 7160.13, 1355.01
Average closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00, 0.02, 0.00
Minimum closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00
Average eigencentrality: 0.32, 0.40, 0.08, 2.47, 0.48, 0.32, 0.01, 0.75, 0.17, 0.31, 0.01, 0.69, 0.15
Minimum eigencentrality: 0.02, 0.05, 0.03, 0.80, 0.17, 0.03, 0.01, 0.28, 0.06, 0.03, 0.01, 0.32, 0.07
Network radius: 3.00, 3.00, 0.00, 0.00, 0.00, 2.80, 0.20, 6.00, 2.45, 2.80, 0.20, 6.00, 2.45
Average eccentricity: 3.66, 3.56, 0.10, 4.09, 0.89, 3.45, 0.21, 6.93, 1.47, 3.48, 0.19, 6.84, 1.43
Network diameter: 5.00, 4.60, 0.40, 12.00, 3.46, 4.47, 0.53, 16.00, 4.00, 4.40, 0.60, 18.00, 4.24

Table E.13 Heuristic methods results for Krebs IT Department Business network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 56.00, 56.00, 0.00, 0.00, 0.00, 56.00, 0.00, 0.00, 0.00, 56.00, 0.00, 0.00, 0.00
Links: 387.00, 331.40, 55.60, 1668.00, 306.33, 387.00, 0.00, 0.00, 0.00, 387.00, 0.00, 0.00, 0.00
Components: 1.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00
Network density: 0.25, 0.22, 0.04, 1.08, 0.20, 0.25, 0.00, 0.00, 0.00, 0.25, 0.00, 0.00, 0.00
Average degree: 13.82, 11.84, 1.99, 59.57, 10.94, 13.82, 0.00, 0.00, 0.00, 13.82, 0.00, 0.00, 0.00
Standard deviation degree: 5.20, 3.98, 1.21, 36.43, 6.71, 5.79, 0.59, 17.62, 3.50, 5.76, 0.56, 16.88, 3.30
Global cluster coefficient: 0.49, 0.24, 0.26, 7.69, 1.41, 0.35, 0.14, 4.15, 0.76, 0.36, 0.14, 4.06, 0.74
Average cluster coefficient: 0.56, 0.24, 0.32, 9.57, 1.75, 0.38, 0.19, 5.55, 1.02, 0.38, 0.19, 5.55, 1.01
Mean path length: 1.90, 1.86, 0.04, 1.21, 0.23, 1.80, 0.10, 3.05, 0.56, 1.79, 0.11, 3.24, 0.59
Communities: 3.00, 5.03, 2.03, 67.00, 16.16, 5.60, 2.60, 78.00, 16.85, 4.80, 1.80, 60.00, 14.14
Gini coefficient: 0.10, 0.29, 0.20, 6.05, 1.27, 0.39, 0.29, 8.71, 1.63, 0.27, 0.18, 5.65, 1.18
Average betweenness: 24.77, 23.66, 1.11, 33.36, 6.42, 21.98, 2.79, 83.77, 15.37, 21.80, 2.97, 89.00, 16.31
Maximum betweenness: 116.33, 78.13, 38.20, 1145.89, 215.07, 195.97, 79.65, 2401.24, 497.39, 217.56, 101.24, 3037.07, 594.40
Average closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00, 0.02, 0.00
Minimum closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00
Average eigencentrality: 0.52, 0.56, 0.04, 1.34, 0.28, 0.40, 0.11, 3.39, 0.65, 0.38, 0.14, 4.11, 0.76
Minimum eigencentrality: 0.14, 0.17, 0.03, 1.08, 0.23, 0.09, 0.05, 1.54, 0.30, 0.09, 0.05, 1.44, 0.28
Network radius: 2.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00
Average eccentricity: 2.89, 2.86, 0.03, 1.21, 0.29, 2.75, 0.14, 4.20, 0.86, 2.71, 0.18, 5.41, 1.10
Network diameter: 3.00, 3.03, 0.03, 1.00, 1.00, 3.00, 0.00, 0.00, 0.00, 3.03, 0.03, 1.00, 1.00

Table E.14 Heuristic methods results for the Lazega Law Firm network.

Metrics: T, F̅, |T - F̅|, L1(F), L2(F), P̅, |T - P̅|, L1(P), L2(P), M̅, |T - M̅|, L1(M), L2(M)
Nodes: 71.00, 71.00, 0.00, 0.00, 0.00, 71.00, 0.00, 0.00, 0.00, 71.00, 0.00, 0.00, 0.00
Links: 726.0, 603.0, 123.0, 3690.0, 675.18, 726.0, 0.00, 0.00, 0.00, 726.0, 0.00, 0.00, 0.00
Components: 1.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00
Network density: 0.29, 0.24, 0.05, 1.49, 0.27, 0.29, 0.00, 0.00, 0.00, 0.29, 0.00, 0.00, 0.00
Average degree: 20.45, 16.99, 3.47, 103.94, 19.02, 20.45, 0.00, 0.00, 0.00, 20.45, 0.00, 0.00, 0.00
Standard deviation degree: 8.10, 5.88, 2.21, 66.40, 12.16, 8.21, 0.12, 5.76, 1.41, 8.12, 0.03, 4.85, 1.25
Global cluster coefficient: 0.44, 0.28, 0.16, 4.80, 0.88, 0.41, 0.04, 1.06, 0.20, 0.40, 0.04, 1.10, 0.20
Average cluster coefficient: 0.45, 0.29, 0.16, 4.93, 0.90, 0.41, 0.04, 1.25, 0.23, 0.41, 0.05, 1.34, 0.25
Mean path length: 1.75, 1.79, 0.04, 1.15, 0.21, 1.73, 0.02, 0.73, 0.14, 1.73, 0.02, 0.65, 0.12
Communities: 3.00, 6.30, 3.30, 99.00, 20.95, 5.03, 2.03, 65.00, 16.22, 6.27, 3.27, 102.00, 23.41
Gini coefficient: 0.11, 0.37, 0.26, 7.74, 1.49, 0.35, 0.23, 7.27, 1.47, 0.40, 0.28, 8.55, 1.67
Average betweenness: 26.30, 27.64, 1.34, 40.28, 7.45, 25.45, 0.85, 25.37, 4.73, 25.54, 0.76, 22.65, 4.22
Maximum betweenness: 106.69, 99.45, 7.25, 431.28, 86.78, 200.5, 93.80, 2814.09, 577.96, 193.7, 87.00, 2610.11, 519.36
Average closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00
Minimum closeness: 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00, 0.01, 0.00
Average eigencentrality: 0.45, 0.54, 0.10, 2.86, 0.54, 0.42, 0.03, 0.94, 0.20, 0.42, 0.03, 0.84, 0.18
Minimum eigencentrality: 0.09, 0.15, 0.06, 1.65, 0.33, 0.09, 0.00, 0.29, 0.07, 0.08, 0.01, 0.37, 0.08
Network radius: 2.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00, 2.00, 0.00, 0.00, 0.00
Average eccentricity: 2.75, 2.66, 0.09, 2.56, 0.56, 2.47, 0.27, 8.18, 1.57, 2.54, 0.21, 6.30, 1.21
Network diameter: 3.00, 3.00, 0.00, 0.00, 0.00, 3.00, 0.00, 0.00, 0.00, 3.00, 0.00, 0.00, 0.00

HEURISTIC RESULTS FROM USING IPD GENERATED COMPATIBILITY TABLE

For all tables in Appendix F: bold indicates the lowest values.

Table F.1 Heuristic with IPD for Robins Australian Bank

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 11, 11, 0, 0, 0, 11, 0, 0, 0
Links: 16, 12.57, 3.433, 103, 20.761, 16, 0, 0, 0
Components: 1, 1.267, 0.267, 8, 3.464, 1.333, 0.333, 10, 3.464
Network density: 0.291, 0.228, 0.062, 1.873, 0.377, 0.291, 0, 0, 0
Average degree: 2.909, 2.285, 0.624, 18.727, 3.775, 2.909, 0, 0, 0
Standard deviation degree: 1.868, 1.288, 0.58, 17.4, 3.464, 1.94, 0.072, 8.665, 1.932
Global cluster coefficient: 0.375, 0.158, 0.217, 6.615, 1.37, 0.407, 0.032, 1.752, 0.46
Average cluster coefficient: 0.405, 0.182, 0.223, 6.832, 1.456, 0.695, 0.29, 8.707, 1.653
Mean path length: 2.018, 2.887, 0.869, 26.291, 7.48, 2.553, 0.535, 20.436, 6.177
Communities: 3, 3.3, 0.3, 17, 4.796, 3.1, 0.1, 27, 7.141
Gini coefficient: 0.242, 0.166, 0.077, 2.805, 0.608, 0.213, 0.03, 2.728, 0.639
Average betweenness: 5.091, 6.618, 1.527, 56.545, 12.153, 4.339, 0.752, 33.636, 7.467
Maximum betweenness: 25.17, 23.55, 1.619, 155.75, 33.666, 23.87, 1.294, 159.17, 35.939
Average closeness: 0.052, 0.046, 0.006, 0.25, 0.052, 0.057, 0.006, 0.223, 0.062
Minimum closeness: 0.038, 0.032, 0.006, 0.237, 0.048, 0.044, 0.006, 0.216, 0.054
Average eigencentrality: 0.492, 0.526, 0.034, 1.498, 0.345, 0.548, 0.056, 1.757, 0.426
Minimum eigencentrality: 0.138, 0.143, 0.005, 1.975, 0.416, 0.214, 0.076, 2.504, 0.5
Network radius: 2, 2.767, 0.767, 23, 5, 2, 0, 2, 1.414
Average eccentricity: 3.091, 3.855, 0.764, 26.182, 5.592, 2.739, 0.352, 15.636, 3.298
Network diameter: 4, 4.9, 0.9, 31, 7.141, 3.333, 0.667, 22, 4.899

Table F.2 Heuristic with IPD results for Roethlisberger & Dickson Wiring Room

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 14, 14, 0, 0, 0, 14, 0, 0, 0
Links: 13, 10.73, 2.267, 68, 15.1, 13, 0, 0, 0
Components: 6, 6.233, 0.233, 7, 3, 6.167, 0.167, 5, 2.236
Network density: 0.143, 0.118, 0.025, 0.747, 0.166, 0.143, 0, 0, 0
Average degree: 1.857, 1.533, 0.324, 9.714, 2.157, 1.857, 0, 0, 0
Standard deviation degree: 1.61, 1.38, 0.231, 6.925, 1.616, 1.841, 0.23, 7.39, 1.59
Global cluster coefficient: 0.643, 0.211, 0.432, 13.185, 2.545, 0.449, 0.194, 5.864, 1.146
Average cluster coefficient: 0.708, 0.226, 0.482, 14.828, 2.841, 0.665, 0.043, 3.631, 0.718
Mean path length: 9.341, 9.714, 0.373, 13.802, 5.178, 9.362, 0.021, 7.901, 2.057
Communities: 7, 7.667, 0.667, 20, 4.899, 7.6, 0.6, 18, 4.243
Gini coefficient: 0.367, 0.326, 0.041, 1.239, 0.322, 0.323, 0.044, 1.536, 0.367
Average betweenness: 3.143, 2.657, 0.486, 26.571, 6.192, 2.04, 1.102, 33.071, 6.552
Maximum betweenness: 16, 10.54, 5.464, 168.92, 36.99, 16.26, 0.261, 132.17, 30.785
Average closeness: 0.058, 0.074, 0.016, 0.583, 0.249, 0.073, 0.015, 0.447, 0.094
Minimum closeness: 0.042, 0.055, 0.013, 0.486, 0.208, 0.055, 0.014, 0.408, 0.087
Average eigencentrality: 0.594, 0.674, 0.08, 2.796, 0.651, 0.629, 0.035, 2.38, 0.531
Minimum eigencentrality: 0.199, 0.25, 0.052, 2.934, 0.769, 0.303, 0.104, 3.188, 0.718
Network radius: 3, 2.467, 0.533, 18, 4.472, 1.967, 1.033, 31, 5.745
Average eccentricity: 2.571, 2.181, 0.39, 15.714, 3.537, 1.76, 0.812, 24.357, 4.681
Network diameter: 5, 4.267, 0.733, 30, 6.928, 3.3, 1.7, 51, 9.747
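Each row in these tables sets a target network's metric value T beside the values measured on generated networks. The columns are the generated mean (F or M), |T - F|, L1(F) and L2(F). The defining equations are not part of this appendix. A minimal sketch of how one row's columns could be computed follows, assuming that the mean column averages the generated runs and that L1 and L2 are the L1 and L2 norms of the per-run differences from T. The function name `compare_metric` is hypothetical.

```python
import math
from statistics import mean


def compare_metric(target, runs):
    """Compare one metric of a target network with its values in generated networks.

    Assumption (not taken from the dissertation's code): the mean column is the
    average over runs, and L1/L2 are the L1 and L2 norms of (target - run value).
    """
    if not runs:
        raise ValueError("runs must contain at least one generated-network value")
    diffs = [target - value for value in runs]
    avg = mean(runs)
    return {
        "mean": avg,                                 # the F (or M) column
        "abs_diff_of_mean": abs(target - avg),       # |T - F|
        "L1": sum(abs(d) for d in diffs),            # L1(F)
        "L2": math.sqrt(sum(d * d for d in diffs)),  # L2(F)
    }


if __name__ == "__main__":
    # Hypothetical example: a target diameter of 3 against four generated networks.
    print(compare_metric(3, [3, 4, 4, 2]))
```

For the example, this yields a mean of 3.25, |T - F| of 0.25, L1 of 3 and L2 of about 1.73. Under this reading, two consistency checks follow for any row with n runs:

- L1 can never be smaller than n times |T - F|.
- L2 always lies between L1 divided by the square root of n and L1 itself.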

Table F.3 Heuristic with IPD results for Thurman Office

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 15, 15, 0, 0, 0, 15, 0, 0, 0
Links: 33, 26.23, 6.767, 203, 38.092, 33, 0, 0, 0
Components: 1, 1.1, 0.1, 3, 1.732, 1.067, 0.067, 2, 2
Network density: 0.314, 0.25, 0.064, 1.933, 0.363, 0.314, 0, 0, 0
Average degree: 4.4, 3.498, 0.902, 27.067, 5.079, 4.4, 0, 0, 0
Standard deviation degree: 2.53, 1.827, 0.703, 21.089, 3.997, 2.94, 0.41, 12.305, 2.396
Global cluster coefficient: 0.516, 0.272, 0.244, 7.329, 1.381, 0.446, 0.07, 2.109, 0.428
Average cluster coefficient: 0.477, 0.304, 0.173, 5.271, 1.08, 0.682, 0.205, 6.15, 1.153
Mean path length: 1.876, 2.334, 0.458, 13.743, 4.417, 1.9, 0.024, 5.6, 3.192
Communities: 3, 3.8, 0.8, 34, 8.246, 3.733, 0.733, 44, 9.592
Gini coefficient: 0.178, 0.254, 0.076, 2.592, 0.586, 0.285, 0.108, 4.43, 0.873
Average betweenness: 6.133, 7.66, 1.527, 49.667, 9.981, 5.462, 0.671, 20.267, 4.713
Maximum betweenness: 37.25, 27.73, 9.517, 300.42, 59.319, 45.55, 8.302, 272.09, 57.57
Average closeness: 0.039, 0.035, 0.004, 0.125, 0.025, 0.042, 0.003, 0.075, 0.019
Minimum closeness: 0.03, 0.025, 0.005, 0.178, 0.035, 0.031, 0.001, 0.055, 0.017
Average eigencentrality: 0.528, 0.543, 0.015, 1.026, 0.212, 0.506, 0.022, 1.093, 0.213
Minimum eigencentrality: 0.106, 0.119, 0.014, 1.247, 0.289, 0.167, 0.061, 1.899, 0.353
Network radius: 2, 2.567, 0.567, 17, 4.123, 1.967, 0.033, 1, 1
Average eccentricity: 2.8, 3.398, 0.598, 18.6, 3.829, 2.571, 0.229, 8.333, 1.829
Network diameter: 3, 4.3, 1.3, 39, 7.681, 3.1, 0.1, 5, 2.236

Table F.4 Heuristic with IPD results for Sampson Monastery

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 18, 18, 0, 0, 0, 18, 0, 0, 0
Links: 41, 34.47, 6.533, 196, 37.256, 41, 0, 0, 0
Components: 1, 1.033, 0.033, 1, 1, 1, 0, 0, 0
Network density: 0.268, 0.225, 0.043, 1.281, 0.244, 0.268, 0, 0, 0
Average degree: 4.556, 3.83, 0.726, 21.778, 4.14, 4.556, 0, 0, 0
Standard deviation degree: 2.093, 1.532, 0.561, 16.829, 3.312, 3.182, 1.089, 32.678, 6.131
Global cluster coefficient: 0.262, 0.17, 0.093, 2.854, 0.578, 0.364, 0.102, 3.056, 0.575
Average cluster coefficient: 0.285, 0.193, 0.093, 2.955, 0.628, 0.675, 0.39, 11.687, 2.162
Mean path length: 1.967, 2.171, 0.204, 6.118, 2.157, 1.792, 0.176, 5.275, 0.987
Communities: 3, 3.833, 0.833, 29, 7.416, 3.8, 0.8, 26, 6.633
Gini coefficient: 0.074, 0.201, 0.127, 3.887, 0.818, 0.247, 0.173, 5.194, 1.029
Average betweenness: 8.222, 9.42, 1.198, 35.944, 7.418, 6.728, 1.494, 44.833, 8.388
Maximum betweenness: 37.62, 38.27, 0.645, 187.32, 40.612, 81.52, 43.9, 1317, 244.34
Average closeness: 0.03, 0.028, 0.002, 0.061, 0.012, 0.034, 0.003, 0.092, 0.017
Minimum closeness: 0.024, 0.022, 0.002, 0.072, 0.018, 0.028, 0.004, 0.124, 0.024
Average eigencentrality: 0.481, 0.526, 0.045, 1.717, 0.415, 0.416, 0.064, 1.926, 0.369
Minimum eigencentrality: 0.174, 0.189, 0.015, 2.123, 0.47, 0.197, 0.023, 0.907, 0.235
Network radius: 2, 2.5, 0.5, 15, 3.873, 1.867, 0.133, 4, 2
Average eccentricity: 3, 3.313, 0.313, 9.722, 2.287, 2.437, 0.563, 16.889, 3.366
Network diameter: 4, 4.1, 0.1, 7, 2.646, 2.867, 1.133, 34, 6.481


Table F.5 Heuristic with IPD results for Krackhardt Office CSS

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 21, 21, 0, 0, 0, 21, 0, 0, 0
Links: 14, 12.57, 1.433, 43, 9.434, 14, 0, 0, 0
Components: 9, 9.233, 0.233, 21, 4.583, 9.3, 0.3, 9, 3
Network density: 0.067, 0.06, 0.007, 0.205, 0.045, 0.067, 0, 0, 0
Average degree: 1.333, 1.197, 0.137, 4.095, 0.898, 1.333, 0, 0, 0
Standard deviation degree: 1.39, 1.202, 0.188, 5.639, 1.265, 2.118, 0.727, 21.814, 4.103
Global cluster coefficient: 0.125, 0.054, 0.071, 3.117, 0.608, 0.141, 0.016, 0.878, 0.209
Average cluster coefficient: 0.158, 0.06, 0.098, 3.948, 0.766, 0.699, 0.54, 16.214, 2.971
Mean path length: 15.84, 15.93, 0.085, 42.486, 9.299, 14.46, 1.386, 43.129, 8.506
Communities: 10, 11.4, 1.4, 42, 8.832, 10.77, 0.767, 23, 5.385
Gini coefficient: 0.395, 0.326, 0.069, 2.156, 0.441, 0.406, 0.011, 0.653, 0.155
Average betweenness: 3.667, 5.057, 1.39, 76.857, 18.69, 3.586, 0.081, 18.619, 4.244
Maximum betweenness: 22.5, 31.06, 8.556, 422.33, 97.031, 55.4, 32.9, 987, 185.06
Average closeness: 0.043, 0.046, 0.003, 0.49, 0.12, 0.045, 0.001, 0.126, 0.032
Minimum closeness: 0.03, 0.033, 0.003, 0.383, 0.092, 0.035, 0.005, 0.151, 0.038
Average eigencentrality: 0.471, 0.489, 0.018, 1.946, 0.449, 0.392, 0.079, 2.368, 0.465
Minimum eigencentrality: 0.106, 0.156, 0.049, 2.536, 0.61, 0.245, 0.138, 4.149, 0.805
Network radius: 3, 3.067, 0.067, 16, 4.243, 1.933, 1.067, 32, 6
Average eccentricity: 2.333, 2.676, 0.343, 19.714, 4.774, 1.683, 0.651, 19.524, 3.757
Network diameter: 5, 5.633, 0.633, 35, 8.426, 3.067, 1.933, 58, 10.863

Table F.6 Heuristic with IPD results for Krackhardt High-Tech Managers

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 21, 21, 0, 0, 0, 21, 0, 0, 0
Links: 36, 31.13, 4.867, 146, 27.893, 36, 0, 0, 0
Components: 5, 5.067, 0.067, 2, 1.414, 5.067, 0.067, 2, 1.414
Network density: 0.171, 0.148, 0.023, 0.695, 0.133, 0.171, 0, 0, 0
Average degree: 3.429, 2.965, 0.463, 13.905, 2.656, 3.429, 0, 0, 0
Standard deviation degree: 2.135, 1.883, 0.251, 7.54, 1.548, 2.689, 0.554, 16.629, 3.147
Global cluster coefficient: 0.496, 0.189, 0.307, 9.215, 1.722, 0.38, 0.116, 3.493, 0.665
Average cluster coefficient: 0.585, 0.217, 0.368, 11.032, 2.08, 0.604, 0.019, 1.507, 0.34
Mean path length: 8.976, 8.885, 0.092, 7.31, 1.893, 8.843, 0.133, 12.89, 3.891
Communities: 7, 7.733, 0.733, 26, 6.481, 8, 1, 34, 7.874
Gini coefficient: 0.435, 0.401, 0.034, 1.604, 0.356, 0.396, 0.04, 1.774, 0.38
Average betweenness: 9.286, 7.354, 1.932, 57.952, 11.303, 6.113, 3.173, 95.19, 17.851
Maximum betweenness: 44.67, 26.91, 17.76, 532.83, 102.08, 60.65, 15.98, 507.33, 102.32
Average closeness: 0.026, 0.03, 0.003, 0.103, 0.021, 0.033, 0.007, 0.204, 0.04
Minimum closeness: 0.019, 0.024, 0.005, 0.151, 0.03, 0.024, 0.006, 0.177, 0.036
Average eigencentrality: 0.45, 0.588, 0.138, 4.133, 0.837, 0.463, 0.013, 0.983, 0.24
Minimum eigencentrality: 0.041, 0.214, 0.173, 5.182, 1.041, 0.15, 0.108, 3.254, 0.666
Network radius: 3, 2.9, 0.1, 3, 1.732, 2.033, 0.967, 29, 5.385
Average eccentricity: 3.381, 2.744, 0.637, 19.095, 3.662, 2.444, 0.937, 28.095, 5.263
Network diameter: 5, 4.167, 0.833, 25, 5, 3.7, 1.3, 39, 7.681


Table F.7 Heuristic with IPD results for Schwimmer Taro Exchange

Metrics: T, F, |T - F|, L1(F), L2(F), M, |T - M|, L1(M), L2(M)
Nodes: 22, 22, 0, 0, 0, 22, 0, 0, 0
Links: 39, 35.73, 3.267, 98, 19.391, 39, 0, 0, 0
Components: 1, 1, 0, 0, 0, 1.733, 0.733, 22, 5.831
Network density: 0.169, 0.155, 0.014, 0.424, 0.084, 0.169, 0, 0, 0
Average degree: 3.545, 3.248, 0.297, 8.909, 1.763, 3.545, 0, 0, 0
Standard deviation degree: 0.963, 0.955, 0.007, 3.185, 0.704, 2.692, 1.73, 51.899, 9.527
Global cluster coefficient: 0.275, 0.097, 0.178, 5.344, 1.01, 0.332, 0.057, 1.706, 0.352
Average cluster coefficient: 0.339, 0.104, 0.236, 7.075, 1.329, 0.719, 0.38, 11.386, 2.088
Mean path length: 2.494, 2.594, 0.1, 3.814, 0.931, 3.536, 1.043, 36.043, 9.013
Communities: 5, 4.3, 0.7, 25, 6.083, 5.433, 0.433, 23, 5.745
Gini coefficient: 0.127, 0.166, 0.038, 1.964, 0.447, 0.249, 0.122, 3.764, 0.791
Average betweenness: 15.68, 16.74, 1.053, 40.045, 9.772, 12.12, 3.561, 107.18, 22.035
Maximum betweenness: 46.38, 48.66, 2.278, 206.43, 47.335, 148.2, 101.8, 3055, 564.75
Average closeness: 0.019, 0.019, 0.001, 0.025, 0.006, 0.023, 0.004, 0.112, 0.023
Minimum closeness: 0.016, 0.014, 0.002, 0.061, 0.013, 0.017, 0, 0.038, 0.009
Average eigencentrality: 0.615, 0.522, 0.093, 2.838, 0.594, 0.328, 0.287, 8.598, 1.572
Minimum eigencentrality: 0.315, 0.179, 0.136, 4.26, 0.866, 0.084, 0.231, 6.922, 1.271
Network radius: 3, 3.167, 0.167, 5, 2.236, 2.167, 0.833, 25, 5
Average eccentricity: 4.091, 4.238, 0.147, 8.318, 2.069, 3.253, 0.838, 25.136, 5.018
Network diameter: 5, 5.133, 0.133, 12, 3.742, 4.133, 0.867, 26, 5.292

Table F.8 Heuristic with IPD results for Webster Accounting Firm

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 24 24 0 0 0 24 0 0 0 Links 150 106.2 43.77 1313 240.46 150 0 0 0 Components 2 2 0 0 0 2 0 0 0 Network density 0.543 0.385 0.159 4.757 0.871 0.543 0 0 0 Average degree 12.5 8.853 3.647 109.42 20.039 12.5 0 0 0 Standard deviation degree 5.509 3.519 1.99 59.703 10.947 5.282 0.227 6.89 1.504 Global cluster coefficient 0.811 0.458 0.353 10.593 1.941 0.718 0.092 2.775 0.509 Average cluster coefficient 0.778 0.466 0.312 9.357 1.725 0.754 0.024 0.751 0.161 Mean path length 3.482 3.472 0.01 0.551 0.122 3.296 0.185 5.562 1.016 Communities 5 4.767 0.233 29 7.416 4.7 0.3 31 7.28 Gini coefficient 0.517 0.416 0.101 3.023 0.624 0.505 0.011 1.359 0.316 Average betweenness 6.5 6.386 0.114 6.333 1.399 4.368 2.132 63.958 11.681 Maximum betweenness 26.8 18.16 8.643 259.28 50.418 16.47 10.33 310.02 58.435 Average closeness 0.029 0.029 0.001 0.017 0.004 0.033 0.004 0.111 0.02 Minimum closeness 0.017 0.022 0.005 0.154 0.029 0.024 0.007 0.203 0.037 Average eigencentrality 0.654 0.685 0.031 1.16 0.244 0.711 0.057 1.724 0.332 Minimum eigencentrality 0.018 0.21 0.192 5.755 1.071 0.185 0.167 5.015 0.947 Network radius 2 2 0 0 0 2 0 0 0 Average eccentricity 2.792 2.228 0.564 16.917 3.129 2.028 0.764 22.917 4.198 Network diameter 4 3 1 30 5.477 2.867 1.133 34 6.481

Table F.9 Heuristic with IPD results for Zachary Karate Club

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 34 34 0 0 0 34 0 0 0 Links 78 65.93 12.07 362 67.246 78 0 0 0 Components 1 1.067 0.067 2 1.414 1.067 0.067 2 1.414 Network density 0.139 0.118 0.022 0.645 0.12 0.139 0 0 0 Average degree 4.588 3.878 0.71 21.294 3.956 4.588 0 0 0 Standard deviation degree 3.878 2.729 1.149 34.466 6.398 3.901 0.024 5.923 1.271 Global cluster coefficient 0.256 0.171 0.085 2.559 0.506 0.302 0.047 1.402 0.282 Average cluster coefficient 0.588 0.213 0.375 11.253 2.077 0.679 0.091 2.727 0.524 Mean path length 2.408 2.668 0.26 7.852 2.968 2.693 0.285 12.496 7.172 Communities 5 6.233 1.233 45 10.536 6.467 1.467 46 10.1 Gini coefficient 0.165 0.304 0.14 4.223 0.856 0.329 0.164 4.919 0.964 Average betweenness 23.24 25.39 2.15 65.5 14.042 21.91 1.322 52.941 13.105 Maximum betweenness 231.1 168.7 62.41 1872.2 365.87 244.2 13.11 1089.5 244.32 Average closeness 0.013 0.012 0.001 0.02 0.004 0.013 0.001 0.019 0.005 Minimum closeness 0.009 0.009 0 0.018 0.004 0.009 0.001 0.025 0.006 Average eigencentrality 0.392 0.335 0.057 1.727 0.341 0.296 0.096 2.871 0.531 Minimum eigencentrality 0.063 0.046 0.017 0.678 0.151 0.035 0.028 0.895 0.184 Network radius 3 3.033 0.033 3 1.732 2.667 0.333 10 3.162 Average eccentricity 4.029 4.011 0.019 6.676 1.59 3.666 0.364 11.735 2.634 Network diameter 5 5 0 10 3.162 4.633 0.367 11 3.317

Table F.10 Heuristic with IPD results for Bernard & Killworth Technical

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 34 34 0 0 0 34 0 0 0 Links 175 140.4 34.63 1039 191.12 175 0 0 0 Components 1 1 0 0 0 1 0 0 0 Network density 0.312 0.25 0.062 1.852 0.341 0.312 0 0 0 Average degree 10.29 8.257 2.037 61.118 11.242 10.29 0 0 0 Standard deviation degree 4.629 3.455 1.173 35.204 6.485 5.072 0.443 13.285 2.672 Global cluster coefficient 0.476 0.295 0.181 5.425 0.996 0.439 0.037 1.101 0.208 Average cluster coefficient 0.474 0.295 0.179 5.379 0.995 0.525 0.051 1.532 0.308 Mean path length 1.807 1.914 0.107 3.196 0.599 1.766 0.042 1.25 0.239 Communities 4 7.167 3.167 97 19.672 6.367 2.367 77 16.583 Gini coefficient 0.485 0.482 0.003 1.28 0.311 0.479 0.006 1.002 0.251 Average betweenness 13.32 15.08 1.758 52.735 9.884 12.64 0.687 20.618 3.947 Maximum betweenness 63.29 52.12 11.18 419.9 85.915 103.5 40.25 1207.5 239.34 Average closeness 0.017 0.016 0.001 0.029 0.005 0.017 0 0.011 0.002 Minimum closeness 0.012 0.011 0.001 0.029 0.006 0.013 0.001 0.02 0.004 Average eigencentrality 0.527 0.589 0.062 1.856 0.38 0.5 0.027 1.102 0.257 Minimum eigencentrality 0.057 0.064 0.007 0.547 0.125 0.075 0.018 0.556 0.108 Network radius 2 2.4 0.4 12 3.464 2 0 0 0 Average eccentricity 2.882 3.129 0.247 7.412 1.567 2.712 0.171 5.294 1.072 Network diameter 4 4 0 2 1.414 3.2 0.8 24 4.899

Table F.11 Heuristic with IPD results for Bernard & Killworth Office

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 40 40 0 0 0 40 0 0 0 Links 238 196.7 41.33 1240 227.39 238 0 0 0 Components 1 1 0 0 0 1 0 0 0 Network density 0.305 0.252 0.053 1.59 0.292 0.305 0 0 0 Average degree 11.9 9.833 2.067 62 11.369 11.9 0 0 0 Standard deviation degree 4.477 3.428 1.048 31.451 5.817 5.169 0.692 21.247 4.178 Global cluster coefficient 0.409 0.27 0.139 4.157 0.764 0.399 0.01 0.402 0.094 Average cluster coefficient 0.43 0.276 0.154 4.621 0.85 0.461 0.031 1.006 0.211 Mean path length 1.764 1.833 0.069 2.081 0.392 1.73 0.035 1.036 0.2 Communities 4 5.967 1.967 71 15.652 4.9 0.9 49 11.705 Gini coefficient 0.35 0.357 0.007 2.534 0.627 0.357 0.007 3.077 0.65 Average betweenness 14.9 16.25 1.353 40.575 7.651 14.23 0.673 20.2 3.892 Maximum betweenness 46.13 47.52 1.396 170.43 39.745 127.2 81.04 2431.3 474.86 Average closeness 0.015 0.014 0.001 0.018 0.003 0.015 0 0.008 0.002 Minimum closeness 0.01 0.011 0 0.015 0.003 0.012 0.001 0.04 0.008 Average eigencentrality 0.583 0.606 0.023 1.028 0.241 0.46 0.122 3.917 0.753 Minimum eigencentrality 0.121 0.147 0.026 1.063 0.226 0.113 0.009 0.457 0.106 Network radius 2 2.033 0.033 1 1 2 0 0 0 Average eccentricity 2.825 2.862 0.037 2.15 0.462 2.508 0.317 9.55 1.868 Network diameter 4 3.167 0.833 25 5 3 1 30 5.477

Table F.12 Heuristic with IPD results for Krebs IT Department (Advice)

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 56 56 0 0 0 56 0 0 0 Links 203 181.4 21.63 649 119.99 203 0 0 0 Components 2 2 0 0 0 2.167 0.167 5 2.236 Network density 0.132 0.118 0.014 0.421 0.078 0.132 0 0 0 Average degree 7.25 6.477 0.773 23.179 4.285 7.25 0 0 0 Standard deviation degree 4.179 3.421 0.758 22.738 4.24 4.696 0.517 15.515 2.948 Global cluster coefficient 0.35 0.154 0.196 5.87 1.077 0.266 0.084 2.509 0.461 Average cluster coefficient 0.424 0.164 0.26 7.805 1.432 0.421 0.004 0.485 0.101 Mean path length 4.285 4.236 0.049 1.464 0.303 4.433 0.148 12.665 3.914 Communities 8 9.833 1.833 69 15.264 9 1 36 8.367 Gini coefficient 0.429 0.484 0.055 1.91 0.411 0.465 0.036 1.452 0.339 Average betweenness 36.32 34.98 1.342 40.25 8.345 31.56 4.767 143 26.464 Maximum betweenness 262.1 165.8 96.32 2889.6 540.39 403.8 141.7 4371.1 878.96 Average closeness 0.008 0.008 0 0.005 0.001 0.009 0.001 0.019 0.004 Minimum closeness 0.006 0.006 0 0.008 0.002 0.006 0.001 0.018 0.004 Average eigencentrality 0.316 0.399 0.084 2.51 0.48 0.33 0.014 0.92 0.209 Minimum eigencentrality 0.022 0.046 0.024 0.735 0.162 0.045 0.023 0.7 0.153 Network radius 3 3 0 0 0 2.667 0.333 10 3.162 Average eccentricity 3.661 3.625 0.036 3.536 0.782 3.29 0.371 11.125 2.191 Network diameter 5 4.633 0.367 11 3.317 4.167 0.833 25 5

Table F.13 Heuristic with IPD results for Krebs Fortune 500 IT Department (Business)

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 56 56 0 0 0 56 0 0 0 Links 387 330.5 56.47 1694 311.34 387 0 0 0 Components 1 1 0 0 0 1 0 0 0 Network density 0.251 0.215 0.037 1.1 0.202 0.251 0 0 0 Average degree 13.82 11.81 2.017 60.5 11.119 13.82 0 0 0 Standard deviation degree 5.198 4.047 1.152 34.548 6.398 5.721 0.522 15.843 3.143 Global cluster coefficient 0.493 0.233 0.26 7.788 1.423 0.341 0.152 4.558 0.834 Average cluster coefficient 0.56 0.237 0.323 9.684 1.769 0.374 0.185 5.561 1.017 Mean path length 1.901 1.859 0.042 1.249 0.237 1.794 0.107 3.197 0.585 Communities 3 5.567 2.567 79 18.248 5.067 2.067 66 15.166 Gini coefficient 0.095 0.307 0.211 6.63 1.299 0.293 0.198 6.174 1.247 Average betweenness 24.77 23.62 1.145 34.357 6.528 21.84 2.931 87.929 16.078 Maximum betweenness 116.3 82.46 33.86 1015.9 198.01 160.4 44.11 1590 332.49 Average closeness 0.01 0.01 0 0.005 0.001 0.01 0.001 0.016 0.003 Minimum closeness 0.008 0.008 0 0.01 0.002 0.008 0 0.014 0.003 Average eigencentrality 0.517 0.551 0.034 1.167 0.261 0.437 0.08 2.553 0.503 Minimum eigencentrality 0.136 0.162 0.026 1.017 0.218 0.109 0.027 0.889 0.189 Network radius 2 2 0 0 0 2 0 0 0 Average eccentricity 2.893 2.869 0.024 1.321 0.304 2.699 0.194 5.857 1.121 Network diameter 3 3.033 0.033 1 1 3.033 0.033 1 1

Table F.14 Heuristic with IPD results for Lazega Law Firm

Metrics T F |T -F| L1(F) L2(F) M |T - M| L1(M) L2(M) Nodes 71 71 0 0 0 71 0 0 0 Links 726 608.7 117.3 3520 644.17 726 0 0 0 Components 1 1 0 0 0 1 0 0 0 Network density 0.292 0.245 0.047 1.416 0.259 0.292 0 0 0 Average degree 20.45 17.15 3.305 99.155 18.146 20.45 0 0 0 Standard deviation degree 8.095 5.961 2.134 64.023 11.744 8.324 0.229 8.632 1.829 Global cluster coefficient 0.441 0.283 0.158 4.73 0.865 0.389 0.052 1.559 0.287 Average cluster coefficient 0.449 0.286 0.163 4.892 0.895 0.413 0.037 1.101 0.206 Mean path length 1.751 1.786 0.035 1.038 0.192 1.725 0.026 0.777 0.144 Communities 3 5.733 2.733 88 19.698 5.6 2.6 78 16.248 Gini coefficient 0.113 0.368 0.255 7.824 1.582 0.409 0.296 8.952 1.748 Average betweenness 26.3 27.51 1.211 36.338 6.722 25.39 0.907 27.211 5.034 Maximum betweenness 106.7 98.22 8.475 446.94 92.919 180.3 73.58 2207.4 455.31 Average closeness 0.008 0.008 0 0.006 0.001 0.008 0 0.003 0.001 Minimum closeness 0.006 0.007 0 0.007 0.001 0.007 0 0.013 0.002 Average eigencentrality 0.448 0.535 0.087 2.615 0.506 0.442 0.005 0.801 0.189 Minimum eigencentrality 0.092 0.148 0.056 1.689 0.326 0.103 0.011 0.487 0.111 Network radius 2 2 0 0 0 2 0 0 0 Average eccentricity 2.746 2.646 0.1 3.014 0.638 2.446 0.301 9.028 1.695 Network diameter 3 3 0 0 0 3 0 0 0

NETWORK DIAGRAMS FOR THE FIRST TURING TEST

Figure G.1 West Scotland teenage girls

Figure G.2 Dutch school class

Figure G.3 Sociology class Winter 1996

Figure G.4 Sociology cohort

Figure G.5 University freshmen

NETWORK DIAGRAMS FOR THE SECOND TURING TEST

Figure H.1 Human generated evolving network of teenage girls

Figure H.2 Synthesized evolving network of teenage girls

Figure H.3 Synthesized evolving network in a Dutch school class

Figure H.4 Human generated evolving network in a Dutch school class

Figure H.5 Synthesized evolving network of sociology freshmen students

Figure H.6 Human generated evolving network of sociology freshmen students

Figure H.7 Human generated evolving network of university freshmen

Figure H.8 Synthesized evolving network of university freshmen

INSTITUTIONAL REVIEW BOARD APPROVAL FOR THE TURING TEST

April 10th 2019

Mikel Petty
Department of Computer Science
University of Alabama in Huntsville

Expedited (see pg 2); Exempted (see pg 3); Full Review

Extension of Approval

Dear Dr. Petty,

The UAH Institutional Review Board of Human Subjects Committee has reviewed your proposal, Synthesizing Realistic Social Networks Using Personality Compatibility, and found it meets the necessary criteria for approval. Your proposal seems to be in compliance with this institution's Federal Wide Assurance (FWA) 00019998 and the DHHS Regulations for the Protection of Human Subjects (45 CFR 46).

Please note that this approval is good for one year from the date on this letter. If data collection continues past this period, you are responsible for processing a renewal application a minimum of 60 days prior to the expiration date.

No changes are to be made to the approved protocol without prior review and approval from the UAH IRB. All changes (e.g., a change in procedure, number of subjects, personnel, study locations, new recruitment materials, study instruments, etc.) must be prospectively reviewed and approved by the IRB before they are implemented. You should report any unanticipated problems involving risks to the participants or others to the IRB Chair.

If you have any questions regarding the IRB’s decision, please contact me.

Sincerely,

Bruce Stallsmith
IRB Chair
Professor, Biological Sciences

Expedited

Clinical studies of drugs and medical devices only when condition (a) or (b) is met. (a) Research on drugs for which an investigational new drug application (21 CFR Part 312) is not required. (Note: Research on marketed drugs that significantly increases the risks or decreases the acceptability of the risks associated with the use of the product is not eligible for expedited review.) (b) Research on medical devices for which (i) an investigational device exemption application (21 CFR Part 812) is not required; or (ii) the medical device is cleared/approved for marketing and the medical device is being used in accordance with its cleared/approved labeling.

Collection of blood samples by finger stick, heel stick, ear stick, or venipuncture as follows: (a) from healthy, nonpregnant adults who weigh at least 110 pounds. For these subjects, the amounts drawn may not exceed 550 ml in an 8-week period and collection may not occur more frequently than 2 times per week; or (b) from other adults and children, considering the age, weight, and health of the subjects, the collection procedure, the amount of blood to be collected, and the frequency with which it will be collected. For these subjects, the amount drawn may not exceed the lesser of 50 ml or 3 ml per kg in an 8-week period and collection may not occur more frequently than 2 times per week.

Prospective collection of biological specimens for research purposes by noninvasive means. Examples: (a) hair and nail clippings in a nondisfiguring manner; (b) deciduous teeth at time of exfoliation or if routine patient care indicates a need for extraction; (c) permanent teeth if routine patient care indicates a need for extraction; (d) excreta and external secretions (including sweat); (e) uncannulated saliva collected either in an unstimulated fashion or stimulated by chewing gumbase or wax or by applying a dilute citric solution to the tongue; (f) placenta removed at delivery; (g) amniotic fluid obtained at the time of rupture of the membrane prior to or during labor; (h) supra- and subgingival dental plaque and calculus, provided the collection procedure is not more invasive than routine prophylactic scaling of the teeth and the process is accomplished in accordance with accepted prophylactic techniques; (i) mucosal and skin cells collected by buccal scraping or swab, skin swab, or mouth washings; (j) sputum collected after saline mist nebulization.

Collection of data through noninvasive procedures (not involving general anesthesia or sedation) routinely employed in clinical practice, excluding procedures involving x-rays or microwaves. Where medical devices are employed, they must be cleared/approved for marketing. (Studies intended to evaluate the safety and effectiveness of the medical device are not generally eligible for expedited review, including studies of cleared medical devices for new indications).

Research involving materials (data, documents, records, or specimens) that have been collected, or will be collected solely for nonresearch purposes (such as medical treatment or diagnosis).

Collection of data from voice, video, digital, or image recordings made for research purposes.

Research on individual or group characteristics or behavior (including, but not limited to, research on perception, cognition, motivation, identity, language, communication, cultural beliefs or practices, and social behavior) or research employing survey, interview, oral history, focus group, program evaluation, human factors evaluation, or quality assurance methodologies.

Exempt

Research conducted in established or commonly accepted educational settings, involving normal educational practices, such as (a) research on regular and special education instructional strategies, or (b) research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods. The research is not FDA regulated and does not involve prisoners as participants.

Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interviews, or observation of public behavior¹ in which information is obtained in a manner that human subjects cannot be identified directly or through identifiers linked to the subjects and any disclosure of the human subject’s responses outside the research would NOT place the subjects at risk of criminal or civil liability or be damaging to the subject’s financial standing, employability, or reputation. The research is not FDA regulated and does not involve prisoners as participants.

Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures, or observation of public behavior if (a) the human subjects are elected or appointed public officials or candidates for public office, or (b) Federal statute(s) require(s) without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter. The research is not FDA regulated and does not involve prisoners as participants.

Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects. The research is not FDA regulated and does not involve prisoners as participants.

Research and demonstration projects which are conducted by or subject to the approval of department or agency heads, and which are designed to study, evaluate, or otherwise examine: (i) public benefit or service programs; (ii) procedures for obtaining benefits or services under those programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible changes in methods or levels of payment for benefits or services under those programs. The protocol will be conducted pursuant to specific federal statutory authority; has no statutory requirement for IRB review; does not involve significant physical invasions or intrusions upon the privacy interests of the participant; has authorization or concurrence by the funding agency and does not involve prisoners as participants.

Taste and food quality evaluation and consumer acceptance studies, (i) if wholesome foods without additives are consumed or (ii) if a food is consumed that contains a food ingredient at or below the level and for a use found to be safe, or agricultural chemical or environmental contaminant at or below the level found to be safe, by the Food and Drug Administration or approved by the Environmental Protection Agency or the Food Safety and Inspection Service of the U.S. Department of Agriculture. The research does not involve prisoners as participants.

1 Surveys, interviews, or observation of public behavior involving children cannot be exempt.

REFERENCES

1 E. Abbe, "Community detection and stochastic block models: recent developments," The Journal of Machine Learning Research, 18, 6446 (2017).

2 L.M. Aiello, A. Barrat, C. Cattuto, R. Schifanella, and G. Ruffo, "Link creation and information spreading over social and communication ties in an interest-based online social network," EPJ Data Science, 1, 12 (2012).

3 J.T. Alander, "On optimal population size of genetic algorithms," 1992 Proceedings Computer Systems and Software Engineering (IEEE, The Hague, Netherlands, 1992), pp. 65–70.

4 E.C. Anania, T. Disher, K.M. Anglin, and J.P. Kring, "Selecting for long-duration space exploration: Implications of personality," 2017 IEEE Aerospace Conference (IEEE, 2017), pp. 1–8.

5 C.J. Anderson, S. Wasserman, and K. Faust, "Building stochastic blockmodels," Social Networks, 14, 137 (1992).

6 R. Axelrod, The Evolution of Cooperation (Basic Books, Inc, New York, 1984).

7 R. Axelrod, The Complexity of Cooperation (Princeton University Press, Princeton, NJ, 1997).

8 M. Back, "Opening the process black box: Mechanisms underlying the social consequences of personality," European Journal of Personality, 29, (2015).

9 L. Backstrom, C. Dwork, and J. Kleinberg, "Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography," Proceedings of the 16th International Conference on World Wide Web (Association for Computing Machinery, 2007), pp. 181–190.

10 J. Bang-Jensen and G.Z. Gutin, Digraphs: Theory, Algorithms and Applications (Springer Science & Business Media, 2008).

11 A.L. Barabási, Linked: How Everything Is Connected to Everything Else and What It Means (Basic Books, New York, NY, 2003).

12 A.L. Barabási, "The origin of bursts and heavy tails in human dynamics," Nature, 435, 207 (2005).

13 A.L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, 286, 509 (1999).

14 F. Beato, M. Conti, and B. Preneel, "Friend in the middle (FiM): Tackling de-anonymization in social networks," 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom) (IEEE, San Diego, 2013), pp. 279–284.

15 M.A. Beauchamp, "An improved index of centrality," Behavioral Science, 10, 161 (1965).

16 B. Beaufils, J.-P. Delahaye, and P. Mathieu, "Our meeting with gradual, a good strategy for the Iterated Prisoner’s Dilemma," Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems (MIT Press, Nara, Japan, 1997), pp. 202–209.

17 S. Behrendt, A. Richter, and M. Trier, "Mixed methods analysis of enterprise social networks," Computer Networks, 75, 560 (2014).

18 E.A. Bender and E.R. Canfield, "The asymptotic number of labeled graphs with given degree sequences," Journal of Combinatorial Theory, Series A, 24, 296 (1978).

19 H.R. Bernard, P.D. Killworth, and L. Sailer, "Informant accuracy in social-network data V. An experimental attempt to predict actual communication from recall data," Social Science Research, 11, 30 (1982).

20 P.J. Bickel and A. Chen, "A nonparametric view of network models and Newman–Girvan and other modularities," Proceedings of the National Academy of Sciences, 106, 21068 (2009).

21 B. Bollobás, "A probabilistic proof of an asymptotic formula for the number of labelled regular graphs," European Journal of Combinatorics, 1, 311 (1980).

22 B. Bollobás, "Random graphs," Modern Graph Theory (Springer, New York, NY, 1998), pp. 215–252.

23 B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády, "The degree sequence of a scale-free random graph process," Random Structures & Algorithms, 18, 279 (2001).

24 P. Bonacich, "Some unique properties of eigenvector centrality," Social Networks, 29, 555 (2007).

25 P. Bonacich, "Power and Centrality: A Family of Measures," American Journal of Sociology, 92, 1170 (1987).

26 S.P. Borgatti, M.G. Everett, and L.C. Freeman, Encyclopedia of Social Network Analysis and Mining (Springer, New York, NY, 2014).

27 S.P. Borgatti, "Centrality and Network Flow," Social Networks, 27, 55 (2005).

28 Y. Bouanan, G. Zacharewicz, J. Ribault, and B. Vallespir, "Discrete Event System Specification-based framework for modeling and simulation of propagation phenomena in social networks: application to the information spreading in a multi-layer social network," SIMULATION, 95, 411 (2019).

29 P. Bourdieu, "What Makes a Social Class? On the Theoretical and Practical Existence of Groups," Berkeley Journal of Sociology, 32, 1 (1987).

30 J.H. Bradley and F.J. Hebert, "The effect of personality type on team performance," Journal of Management Development, 16, 337 (1997).

31 C.H. Brase and C.P. Brase, Understandable Statistics: Concepts and Methods (Cengage Learning, Boston, MA, 2015).

32 J. Bruggeman, Social Networks: An Introduction (Routledge, New York, NY, 2008).

33 T.S. Bullington, Followers That Lead: Relating Leadership Emergence through Follower Commitment, Engagement, and Connectedness (University of Central Arkansas, Conway, AR, 2016).

34 C.T. Butts, sna: Tools for Social Network Analysis (2016).

35 L.F. Capretz, "Is there an engineering type?," World Transactions on Engineering and Technology Education, Monash, 1, 169 (2002).

36 S.A. Catanese, P. De Meo, and E. Ferrara, "Crawling Facebook for social network analysis purposes," Proceedings of the International Conference on Web Intelligence, Mining and Semantics (ACM, Sogndal, Norway, 2011), p. 52.

37 D. Chakrabarti, Y. Zhan, and C. Faloutsos, "R-MAT: A recursive model for graph mining," Proceedings of the 2004 SIAM International Conference on Data Mining (SIAM, Lake Buena Vista, Florida, 2004), pp. 442–446.

38 C. Chen, "Social networks at Sempra Energy’s IT division are key to building strategic capabilities," Global Business and Organizational Excellence, 26, 16 (2007).

39 K. Cherven, Network Graph Analysis and Visualization with Gephi (Packt Publishing Ltd, Birmingham, England, 2013).

40 S. Chester, B.M. Kapron, and G. Srivastava, "Anonymization and De-anonymization of Social Network Data," edited by R. Alhajj and J. Rokne (Springer Publishing Company, New York, NY, 2014), pp. 48–56.

41 P.K. Choo, Z.N. Lou, and B.A. Camburn, "Ideation methods: a first study on measured outcomes with personality type," ASME 2014 International Design Engineering Technical Conferences and Computers and Information in Engineering (American Society of Mechanical Engineers, Houston, TX, 2014), pp. 007–07.

42 F. Chung and L. Lu, "Connected components in random graphs with given expected degree sequences," Annals of Combinatorics, 6, 125 (2002).

43 F. Chung and L. Lu, "The average distances in random graphs with given expected degrees," Proceedings of the National Academy of Sciences (National Academy of Sciences of the United States of America, Washington, DC, 2002), pp. 15879–15882.

44 Y. Cohen, H. Ornoy, and B. Keren, "MBTI personality types of project managers and their success: A field survey," Project Management Journal, 44, 78 (2013).

45 P.T. Costa and R.R. McCrae, Revised NEO Personality Inventory and Five-Factor Inventory Professional Manual (Psychological Assessment Resources, Odessa, FL, 1985).

46 P.T. Costa and R.R. McCrae, NEO Personality Inventory-3 (Psychological Assessment Resources, Odessa, FL, 2010).

47 D. Crandall, D. Cosley, and D. Huttenlocher, "Feedback effects between similarity and social influence in online communities," Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, Las Vegas, NV, 2008), pp. 160–168.

48 I.M. Cron and S. Stabile, The Road Back to You: An Enneagram Journey to Self-Discovery (InterVarsity Press, 2016).

49 G. Csardi, T. Nepusz, et al., "The igraph software package for research," InterJournal, Complex Systems, 1695, 1 (2006).

50 L. Davis, Handbook of Genetic Algorithms (Van Nostrand Reinhold, New York, NY, 1991).

51 A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Physical Review E, 84, (2011).

52 L.A. DeChurch, J.R. Mesmer-Magnus, and J.S. Center, "Maintaining shared mental models over long-duration exploration missions," NASA Technical Memorandum NASA/TM-2015-218590 (NASA Johnson Space Center, Houston, TX, 2015).

53 P. Dias and T. Ramadier, "Social Trajectory and Socio-Spatial Representation of Urban Space: The Relation between Social and Cognitive Structures," Journal of Environmental Psychology, 41, 135 (2015).

54 D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Cambridge University Press, New York, NY, 2010).

55 R.C. Emanuel, "Do certain personality types have a particular communication style," International Journal of Social Science and Humanities, 2, 4 (2013).

56 P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hungar. Acad. Sci, 5, 17 (1960).

57 S. Even, Graph Algorithms (Cambridge University Press, New York, NY, 2011).

58 K. Faust and S. Wasserman, "Blockmodels: Interpretation and evaluation," Social Networks, 14, 5 (1992).

59 R.M. Felder and R. Brent, "Understanding student differences," Journal of Engineering Education, 94, 57 (2005).

60 R.M. Felder, G.N. Felder, and E.J. Dietz, "The effects of personality type on engineering student performance and attitudes," Journal of Engineering Education, 91, 3 (2002).

61 S. Fortunato, "Community detection in graphs," Physics Reports, 486, 75 (2010).

62 O. Frank and D. Strauss, "Markov graphs," Journal of the American Statistical Association, 81, 832 (1986).

63 B. Freeman, Personality type and medical specialty (University of Chicago Hospital, Chicago, IL, 2009).

64 L. Freeman, "Computer Programs and Social Network Analysis," Connections, 11, 26 (1988).

65 L. Freeman, "Datasets," School of Social Sciences, (2008).

66 L.C. Freeman, "A set of measures of centrality based on betweenness," Sociometry, 40, 35 (1977).

67 L.C. Freeman, "Centrality in social networks conceptual clarification," Social Networks, 1, 215 (1978).

68 B.D. Friedman, M.J. Burns, and J. Cao, "Enterprise social networking data analytics within Alcatel-Lucent," Bell Labs Technical Journal, 18, 89 (2014).

69 A. Furnham and J. Crump, "Personality and management level: Traits that differentiate leadership levels," Psychology, 6, (2015).

70 A. Furnham and J. Crump, "The Myers-Briggs Type Indicator (MBTI) and Promotion at Work," Psychology, 6, 1510 (2015).

71 A. Furnham, J. Moutafi, and J. Crump, "The relationship between the revised NEO-Personality Inventory and the Myers-Briggs type indicator," Social Behavior and Personality: an international journal, 31, 577 (2003).

72 A. Gajewar and A. Das Sarma, "Multi-skill collaborative teams based on densest subgraphs," Proceedings of the 2012 SIAM International Conference on Data Mining (SIAM, Anaheim, CA, 2012), pp. 165–176.

73 M. Gaudesi, E. Piccolo, and G. Squillero, "Exploiting evolutionary modeling to prevail in Iterated Prisoner’s Dilemma tournaments," IEEE Transactions on Computational Intelligence and AI in Games, 8, 288 (2016).

74 J.L. Gersting, Mathematical Structures for Computer Science: Discrete Mathematics and Its Applications (W. H. Freeman and Company, New York, NY, 2014).

75 C.J. Geyer and E.A. Thompson, "Constrained Monte Carlo maximum likelihood for dependent data," Journal of the Royal Statistical Society, Series B (Methodological), 657 (1992).

76 R.O. Gilbert, Statistical Methods for Environmental Pollution Monitoring (Van Nostrand Reinhold Co, New York, NY, 1987).

77 P.A. Gloor, K. Fischbach, and H. Fuehres, "Towards “Honest Signals” of Creativity – Identifying Personality Characteristics Through Microscopic Social Network Analysis," Procedia - Social and Behavioral Sciences, 26, 166 (2011).

78 L.R. Goldberg, "An Alternative Description of Personality: The Big-Five Factor Structure," Journal of Personality and Social Psychology, 59, 1216 (1990).

79 M.C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Second edition (Elsevier, Amsterdam, The Netherlands, 2004).

80 J. Goodman, "Why Is the “i” in Disc® in Lowercase?," (2002).

81 M. Grandjean, "A social network analysis of Twitter: Mapping the digital humanities community," Cogent Arts & Humanities, 3, (2016).

82 A. Grant, "Goodbye to MBTI: The fad that won’t die," Psychology Today, (2013).

83 J.L. Gross, J. Yellen, and P. Zhang, editors, Handbook of Graph Theory, Second Edition, Discrete Mathematics and Its Applications (CRC Press, Taylor & Francis Group, Boca Raton, 2014).

84 E. Gunderson and D. Ryman, Group Homogeneity, Compatibility and Accomplishment (Navy Medical Neuropsychiatric Research Unit, San Diego, CA, 1967).

85 P. Hage and F. Harary, Structural Models in Anthropology (Cambridge University Press, New York, NY, 1983).

86 L. Hamill and N. Gilbert, "Simulating large social networks in agent-based models: A social circle model," Emergence: Complexity and Organization, 12, 78 (2010).

87 A. Hammer, editor, MBTI® Applications (Consulting Psychologists Press, Palo Alto, CA, n.d.).

88 M. Hay, G. Miklau, and D. Jensen, Anonymizing social networks (Computer Science Department Faculty, Publication Series 180, 2007).

89 J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (University of Michigan Press, Ann Arbor, MI, 1975).

90 P.W. Holland, K.B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, 5, 109 (1983).

91 P.W. Holland and S. Leinhardt, "A method for detecting structure in sociometric data," Social Networks, 411 (1977).

92 P.W. Holland and S. Leinhardt, "An exponential family of probability distributions for directed graphs," Journal of the American Statistical Association, 76, 33 (1981).

93 M. Hollander, D.A. Wolfe, and E. Chicken, Nonparametric Statistical Methods, Third edition (John Wiley & Sons, Inc, Hoboken, NJ, 2014).

94 P. Holme and J. Saramäki, "Temporal Networks," Physics Reports, 519, 97 (2012).

95 D. Hunter, "Curved exponential family models for social networks," Social Networks, 29, 216 (2007).

96 L.A. Imhof, D. Fudenberg, and M.A. Nowak, "Evolutionary cycles of cooperation and defection," Proceedings of the National Academy of Sciences, 102, 10797 (2005).

97 S. Jafrani, N. Zehra, and M. Zehra, "Assessment of personality type and medical specialty choice among medical students from Karachi; using Myers-Briggs Type Indicator (MBTI) tool," Journal of Pakistan Medical Association, 67, 520 (2017).

98 S. Ji, W. Li, and P. Mittal, "SecGraph: A Uniform and Open-source Evaluation System for Graph Data Anonymization and De-anonymization," USENIX Security Symposium (USENIX, Washington, DC, 2015), pp. 303–318.

99 O.P. John and S. Srivastava, The Big Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives. Handbook of Personality: Theory and Research, Second edition (Elsevier, New York, NY, 1999).

100 C.G. Jung, Psychological Types. In: Volume 6 of The Collected Works of CG Jung (Princeton University Press, Princeton, NJ, 1971).

101 N.A. Kanas and W.E. Fedderson, Behavioral, psychiatric, and sociological problems of long-duration space missions (National Aeronautics and Space Administration, Houston, TX, 1971).

102 N. Katira, L. Williams, E. Wiebe, C. Miller, S. Balik, and E. Gehringer, "On understanding compatibility of student pair programmers," ACM SIGCSE Bulletin (ACM, 2004), pp. 7–11.

103 D. Keirsey, Please Understand Me II (Prometheus Nemesis Book Company, Del Mar, CA, 1998).

104 F. Kelly, "On stochastic population models in genetics," Journal of Applied Probability, 13, 127 (1976).

105 M. Kiss, A. Kun, A. Kapitány, and P. Erdei, "Regression Analysis of the Effect of Personality-Career Match on the Academic Performance in Business Higher Education: An Evidence from the University of Debrecen," Tudás – Tanulás – Szabadság Neveléstudományi Konferencia, 223 (2014).

106 A. Knecht, Networks and Actor Attributes in Early Adolescence [2003/04] (The Netherlands Research School ICS, Department of Sociology, Utrecht University, Utrecht, 2006).

107 V. Knight, "Axelrod Documentation," (2018).

108 D. Knoke and S. Yang, Social Network Analysis, Second edition (SAGE Publications, Thousand Oaks, CA, 2008).

109 A. Konak, D.W. Coit, and A.E. Smith, "Multi-objective optimization using genetic algorithms: A tutorial," Reliability Engineering and System Safety, 91, 992 (2006).

110 J. Koza, "Genetic programming as a means for programming computers by natural selection," Statistics and Computing (1994), pp. 87–112.

111 D. Krackhardt, "Cognitive social structures," Social Networks, 9, 109 (1987).

112 V. Krebs, "Social capital: the key to success for the 21st century organization," IHRIM Journal, 12, 38 (2008).

113 W.G. Kropatsch, Y. Haxhimusa, and A. Ion, "Multiresolution image segmentations in graph pyramids," Applied Graph Theory in Computer Vision and Pattern Recognition (Springer, New York, NY, 2007), pp. 3–41.

114 M. Kuperman and G. Abramson, "Small world effect in an epidemiological model," Physical Review Letters, 86, 2909 (2001).

115 H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?," Proceedings of the 19th International Conference on World Wide Web (ACM, New York, NY, 2010), pp. 591–600.

116 L.B. Landon, K.J. Slack, and J.D. Barrett, "Teamwork and collaboration in long-duration space missions: Going to extremes," American Psychologist, 73, 563 (2018).

117 L.B. Landon, W.B. Vessey, and J.D. Barrett, Risk of Performance and Behavioral Health Decrements Due to Inadequate Cooperation, Coordination, Communication, and Psychosocial Adaptation within a Team (National Aeronautics and Space Administration, Houston, TX, 2015).

118 E. Lazega, The Collegial Phenomenon: The Social Mechanisms of Cooperation among Peers in a Corporate Law Partnership (Oxford University Press, Oxford, England, 2001).

119 J. Leskovec, D. Chakrabarti, and J. Kleinberg, "Kronecker graphs: An approach to modeling networks," Journal of Machine Learning Research, 11, 985 (2010).

120 J. Leskovec and A. Krevl, "SNAP Datasets: Stanford Large Network Dataset Collection," Stanford University, (2014).

121 J. Leskovec and R. Sosic, "SNAP: A general-purpose network analysis and graph-mining library," ACM Transactions on Intelligent Systems and Technology (TIST), 8, (2016).

122 J. Li, P. Hingston, and G. Kendall, "Engineering design of strategies for winning Iterated Prisoner’s Dilemma competitions," IEEE Transactions on Computational Intelligence and AI in Games, 3, 348 (2011).

123 Y. Li, H. Cao, and G. Wen, "Simulation study on opinion formation models of heterogenous agents based on game theory and complex networks," Simulation, 11, (2018).

124 F. Littauer, Personality Plus (Fleming H. Revell, a Division of Baker Book House Company, Grand Rapids, MI, 1983).

125 D.A. Loffredo, S.K. Opt, and R. Harrington, "Communicator style and MBTI extraversion-introversion domains," Journal of Psychological Type, 68, 29 (2008).

126 A.J. Lotka, "The frequency distribution of scientific productivity," Journal of the Washington Academy of Sciences, 6, 317 (1926).

127 I. Lykourentzou, A. Antoniou, Y. Naudet, and S.P. Dow, "Personality matters: Balancing for personality types leads to better outcomes for crowd teams," Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (ACM, New York, NY, 2016), pp. 260–273.

128 D.A. MacDonald, P.E. Anderson, C.I. Tsagarakis, and C.J. Holland, "Examination of the relationship between the Myers-Briggs Type Indicator and the NEO Personality Inventory," Psychological Reports, 74, 339 (1994).

129 P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat, "Systematic topology analysis and generation using degree correlations," ACM SIGCOMM Computer Communication Review (ACM, 2006), pp. 135–146.

130 M. Malik and S. Zamir, "The Relationship between Myers Briggs Type Indicator (MBTI) and Emotional Intelligence among University Students," Journal of Education and Practice, 5, 35 (2014).

131 B. Manso and M. Manso, "Know the network, knit the network: applying SNA to N2C2 maturity model experiments," (Santa Monica, CA, 2010).

132 A.C. Martins, "Mobility and social network effects on extremist opinions," Physical Review E, 78, 036104 (2008).

133 M.L. Mauldin, "Chatterbots, tinymuds, and the Turing test: Entering the Loebner prize competition," AAAI, 94, 16 (1994).

134 M.H. McCaulley, Application of the Myers-Briggs Type Indicator to Medicine and Other Health Professions (Center for Applications of Psychological Type, Gainesville, Florida, 1977).

135 R.R. McCrae and P.T. Costa, "Validation of the Five-Factor Model of Personality across Instruments and Observers," Journal of Personality and Social Psychology, 52, (1987).

136 R.R. McCrae and P.T. Costa Jr, "Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality," Journal of Personality, 57, 17 (1989).

137 R.R. McCrae, P.T. Costa Jr, and T.A. Martin, "The NEO–PI–3: A more readable revised NEO personality inventory," Journal of Personality Assessment, 84, 261 (2005).

138 R. Metzner, C. Burney, and A. Mahlberg, "Towards a reformulation of the typology of functions," Journal of Analytical Psychology, 26, 33 (1981).

139 L. Michell and A. Amos, "Girls, pecking order and smoking," Social Science and Medicine, 44, 1861 (1997).

140 R. Milo, N. Kashtan, S. Itzkovitz, M.E. Newman, and U. Alon, "On the uniform generation of random graphs with prescribed degree sequences," arXiv preprint cond-mat/0312028, (2003).

141 A. Mislove, M. Marcon, and K.P. Gummadi, "Measurement and analysis of online social networks," Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (ACM, New York, NY, 2007), pp. 29–42.

142 W.D. Mitchell, "The Distribution of MBTI Types In the US by Gender and Ethnic Group," Journal of Psychological Type, 37, (1996).

143 S. Mittal and K. Deb, "Optimal strategies of the Iterated Prisoner’s Dilemma problem for multiple conflicting objectives," IEEE Transactions on Evolutionary Computation, 13, 554 (2009).

144 M. Molloy and B. Reed, "A critical point for random graphs with a given degree sequence," Random Structures & Algorithms, 6, 161 (1995).

145 M. Molloy and B. Reed, "The size of the giant component of a random graph with a given degree sequence," Combinatorics, Probability and Computing, 7, 295 (1998).

146 S.M. Morgan, De-Anonymizing Social Network Neighborhoods Using Auxiliary and Semantic Information, Master’s Thesis, State University of New York Polytechnic Institute, 2015.

147 J. Moutafi, A. Furnham, and J. Crump, "Is managerial level related to personality?," British Journal of Management, 18, 272 (2007).

148 I.B. Myers, The Myers-Briggs Type Indicator: Manual (Consulting Psychologists Press, Palo Alto, CA, 1962).

149 I.B. Myers and M.H. McCaulley, Manual: A Guide to the Development and Use of the Myers-Briggs Type Indicator (Consulting Psychologists Press, Palo Alto, CA, 1985).

150 A. Narayanan, E. Shi, and B.I. Rubinstein, "Link prediction by de-anonymization: How we won the kaggle social network challenge," The 2011 International Joint Conference on Neural Networks Conference Proceedings (IEEE Computational Intelligence Society, Piscataway, NJ, 2011), pp. 1825–1834.

151 A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," IEEE Symposium on Security and Privacy (IEEE Computer Society, Los Alamitos, CA, 2008), pp. 111–125.

152 A. Narayanan and V. Shmatikov, "De-anonymizing social networks," 30th IEEE Symposium on Security and Privacy (IEEE Computer Society Conference Publishing Services, Los Alamitos, CA, 2009), pp. 173–187.

153 P.D. Nelson, Compatibility among Work Associates in Isolated Groups (Navy Medical Neuropsychiatric Research Unit, San Diego, CA, 1964).

154 M. Newman, Networks: An Introduction (Oxford University Press, New York, 2010).

155 M.E. Newman, Who Is the Best Connected Scientist? A Study of Scientific Coauthorship Networks (Cornell University Library, Ithaca, NY, 2000).

156 M.E. Newman, "The structure and function of networks," Computer Physics Communications, 147, 40 (2002).

157 M.E. Newman, "The structure and function of complex networks," SIAM Review, 45, 167 (2003).

158 M.E. Newman, S.H. Strogatz, and D.J. Watts, "Random graphs with arbitrary degree distributions and their applications," Physical Review E, 64, 026118 (2001).

159 M. Newman, Networks, Second Edition (Oxford University Press, Oxford, UK, 2018).

160 M.E. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, 69, 026113 (2004).

161 H. Nordvik, "Type, vocation, and self-report personality variables: A validity study of a Norwegian translation of the MBTI, Form G," Journal of Psychological Type, 29, 32 (1994).

162 R. Norton, "Communicator style: Theory, applications, and measures," SAGE Series in Interpersonal Communication (SAGE Publications, Beverly Hills, CA, 1983), p. 319.

163 R.W. Norton, "Foundations of a communicator style construct," Human Communication Research, 4, 99 (1978).

164 M. Nowak and K. Sigmund, "A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game," Nature, 364, 56 (1993).

165 K. Nowicki and T.A.B. Snijders, "Estimation and prediction for stochastic blockstructures," Journal of the American Statistical Association, 96, 1077 (2001).

166 D.A. O’Neil and M.D. Petty, "Organizational simulation for model based systems engineering," Procedia Computer Science, 16, 323 (2013).

167 D.A. O’Neil and M.D. Petty, "A Monte Carlo Algorithm for Generating Synthetic Social Networks Based on Personality Compatibility," Proceedings of the 2019 IEEE SoutheastCon (IEEE, Huntsville, AL, 2019).

168 D.A. O’Neil and M.D. Petty, "Heuristic methods for synthesizing realistic social networks based on personality compatibility," Applied , 4, (2019).

169 D.A. O’Neil and M.D. Petty, "Synthesizing Social Networks with Iterated Prisoners’ Dilemma," Proceedings of the 16th International Conference on Modeling, Simulation, and Visualization Methods (Las Vegas NV, 2019).

170 F. Papadopoulos, M. Kitsak, M.Á. Serrano, M. Boguñá, and D. Krioukov, "Popularity versus similarity in growing networks," Nature, 489, 537 (2012).

171 P. Pattison, S. Wasserman, G. Robins, and A.M. Kanfer, "Statistical Evaluation of Algebraic Constraints for Social Networks," Journal of Mathematical Psychology, 44, 536 (2000).

172 M.D. Petty, "The Turing test as an evaluation criterion for computer generated forces," Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation (University of Central Florida, Orlando, FL, 1994), pp. 4–6.

173 M.D. Petty, "Verification, Validation, and Accreditation," Modeling and Simulation Fundamentals: Theoretical Underpinnings and Practical Domains, edited by J.A. Sokolowski and C.M. Banks (John Wiley & Sons, Hoboken, NJ, 2010), pp. 325–372.

174 A. Rapoport, "Contribution to the Theory of Random and Biased Nets," The Bulletin of Mathematical Biophysics, 19, 257 (1957).

175 A.J. Reagan, L. Mitchell, and D. Kiley, "The emotional arcs of stories are dominated by six basic shapes," EPJ Data Science, 5, (2016).

176 R.M. Ripley, T.A.B. Snijders, Z. Boda, A. Voros, and P. Preciado, Manual for Siena Version 4.0. R Package, version 1.2-12 (University of Oxford, Oxford, UK, 2018).

177 G. Robins, P. Pattison, Y. Kalish, and D. Lusher, "An introduction to exponential random graph (p*) models for social networks," Social Networks, 29, 173 (2007).

178 F.J. Roethlisberger and W.J. Dickson, Management and the Worker (Harvard University Press, Cambridge, MA, 1939).

179 P. Rosati, "Student retention from first-year engineering related to personality type," Frontiers in Education Conference, 1993. Twenty-Third Annual Conference. “Engineering Education: Renewing America’s Technology”, Proceedings (IEEE, Piscataway, NJ, 1993), pp. 37–39.

180 J.P. Rushton and P. Irwing, "A General Factor of Personality (GFP) from two meta-analyses of the Big Five," Personality and Individual Differences, 45, 679 (2008).

181 S. Sampson, Crisis in a Cloister. Unpublished Doctoral Dissertation (Cornell University, Ithaca, NY, 1969).

182 W. Schutz, The Human Element: Productivity, Self-Esteem, and the Bottom Line (Jossey-Bass, 1994).

183 E. Schwimmer, Exchange in the Social Structure of the Orokaiva: Traditional and Emergent Ideologies in the Northern District of Papua (Hurst and Co, London, England, 1973).

184 E. Schwimmer, "Reciprocity and Structure: A Semiotic Analysis of Some Orokaiva Exchange Data," Man, 14, 271 (1979).

185 J. Scott, Social Network Analysis: A Handbook, Second edition (SAGE Publications, Thousand Oaks, CA, 2000).

186 J. Scott and P.J. Carrington, The SAGE Handbook of Social Network Analysis (SAGE Publications, Thousand Oaks, CA, 2011).

187 C. Seshadhri, T.G. Kolda, and A. Pinar, "Community structure and scale-free collections of Erdős-Rényi graphs," Physical Review E, 85, 056109 (2012).

188 S. Sharma and G.N. Purohit, "Evaluation of Alternative Centrality Measure Algorithm for Tracking Online Community in Social Network," International Journal of Engineering Research, 1, 28 (2012).

189 M. Smith, D.L. Hansen, and E. Gleave, "Analyzing enterprise social media networks," 2009 International Conference on Computational Science and Engineering (IEEE, Vancouver, BC, 2009), pp. 705–710.

190 T.A. Snijders, "Markov chain Monte Carlo Estimation of Exponential Random Graph Models," Journal of Social Structure, 3, 1 (2002).

191 T.A. Snijders, G.G. Van de Bunt, and C.E. Steglich, "Introduction to stochastic actor-based models for network dynamics," Social Networks, 32, 44 (2010).

192 R. Sosic and J. Leskovec, "Large Scale Network Analytics with SNAP: Tutorial at the World Wide Web Conference," Proceedings of the 24th International Conference on World Wide Web (ACM, New York, NY, 2015), pp. 1537–1538.

193 M. Srivatsa and M. Hicks, "Deanonymizing mobility traces: Using social network as a side-channel," Proceedings of the ACM Conference on Computer and Communications Security (ACM, New York, NY, 2012), pp. 628–637.

194 C.L. Staudt, M. Hamann, and A. Gutfraind, "Generating realistic scaled complex networks," Applied Network Science, 2, (2017).

195 S.H. Strogatz, "Exploring complex networks," Nature, 410, 268 (2001).

196 T. Van Mierlo, D. Hyatt, and A.T. Ching, "Employing the Gini coefficient to measure participation inequality in treatment-focused Digital Health Social Networks," Network Modeling Analysis in Health Informatics and Bioinformatics, 5, (2016).

197 B. Thurman, "In the office: Networks and coalitions," Social Networks, 2, 47 (1979).

198 R.J. Trudeau, Introduction to Graph Theory (Dover, Mineola, NY, 1993).

199 M. Tsvetovat and K. Carley, Generation of Realistic Social Network Datasets for Testing of Analysis and Simulation Tools (Carnegie Mellon University, Pittsburgh, PA, 2005).

200 E.C. Tupes and R.E. Christal, "Recurrent personality factors based on trait ratings," Journal of Personality, 60, 225 (1992).

201 A.M. Turing, "Computing Machinery and Intelligence," Mind, 49, 433 (1950).

202 G.G. Van de Bunt, Friends by Choice. An Actor-Oriented Statistical Network Model for Friendship Networks through Time, Thesis Publishers, 1999.

203 G.G. Van de Bunt and M.A.J. Van Duijn, "Friendship networks through time: An actor-oriented statistical network model," Computational and Mathematical Organization Theory, 5, 167 (1999).

204 M.A. Van Duijn, E.P. Zeggelink, M. Huisman, F.N. Stokman, and F.W. Wasseur, "Evolution of sociology freshmen into a friendship network," Journal of Mathematical Sociology, 27, 153 (2003).

205 V.J. Varner, "Relationship Type Combinations," APTI Bulletin of Psychological Type, 30, (2007).

206 F. Viger and M. Latapy, "Efficient and simple generation of random simple connected graphs with prescribed degree sequence," International Computing and Combinatorics Conference (Springer, New York, NY, 2005), pp. 440–449.

207 S. Wasserman and P. Pattison, "Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*," Psychometrika, 61, 401 (1996).

208 D.J. Watts and S.H. Strogatz, "Collective dynamics of “small-world” networks," Nature, 393, (1998).

209 C.M. Webster, Task-Related and Context-Based Constraints in Observed and Reported Relational Data, PhD Thesis, University of California Irvine, 1993.

210 D.T. Weiler, The Effect of Role Assignment and Personality Subtypes in Simulation on Critical Thinking Development, Situation Awareness, and Perceived Self-Efficacy of Nursing Baccalaureate Students, Master’s Thesis, University of Louisville, 2017.

211 P. West and H. Sweeting, Background Rationale and Design of the West of Scotland 11-16 Study (MRC Medical Sociology Unit, Glasgow, UK, 1995).

212 D. Whitley, "A genetic algorithm tutorial," Statistics and Computing, 4, 65 (1994).

213 U. Wilensky, NetLogo (Northwestern University, Evanston, IL, 1999).

214 U. Wilensky, NetLogo NW-Extension (2013).

215 U. Wilensky and W. Rand, An Introduction to Agent-Based Modeling: Modeling Natural, Social, and Engineered Complex Systems with NetLogo (The MIT Press, Cambridge, MA, 2015).

216 L. Williams, L. Layman, J. Osborne, and N. Katira, "Examining the compatibility of student pair programmers," AGILE 2006 (AGILE’06) (IEEE, 2006), 10 pp.

217 L.H. Wong, P. Pattison, and G. Robins, "A spatial model for social networks," Physica A: Statistical Mechanics and its Applications, 360, 99 (2006).

218 J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, 42, 181 (2015).

219 W.W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, 33, 452 (1977).

220 J.H. Zar, Biostatistical Analysis, Fifth edition (Pearson Education, Upper Saddle River, NJ, 2010).

221 B. Zhou and J. Pei, "Preserving privacy in Social Networks Against Neighborhood Attacks," Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (IEEE, Cancun, Mexico, 2008), pp. 506–515.

222 B. Zhou, J. Pei, and W. Luk, "A brief survey on anonymization techniques for privacy preserving publishing of social network data," ACM SIGKDD Explorations Newsletter, 10, 12 (2008).
