Object-Oriented Genetic Programming for the Automatic Inference of Graph Models for Complex Networks

Michael Richard Medland

Department of Computer Science

Submitted in partial fulfillment of the requirements for the degree of

Master of Science

Faculty of Mathematics and Science, Brock University St. Catharines, Ontario

© M.R. Medland, 2014

To my children, You are the reason for my perseverance,

And

To my extraordinary wife, You have been my rock, my sounding board, and my saving grace.

Abstract

Complex networks are systems of entities that are interconnected through meaningful relationships. The relations between entities form a structure whose statistical complexity is not the product of random chance. In the study of complex networks, many graph models have been proposed to model the behaviours observed. However, constructing graph models manually is tedious and problematic, and many of the models proposed in the literature have been cited as having inaccuracies with respect to the complex networks they represent. Recently, an approach that automates the inference of graph models was proposed by Bailey [10]. The proposed methodology employs genetic programming (GP) to produce graph models that approximate various properties of an exemplary graph of a targeted complex network. However, a great deal is already known about complex networks in general, and often specific knowledge is held about the network being modelled. This knowledge, albeit incomplete, is important in constructing a graph model, yet it is difficult to incorporate using existing GP techniques. Thus, this thesis proposes a novel GP system which can incorporate incomplete expert knowledge to assist in the evolution of a graph model. Inspired by existing graph models, an abstract graph model was developed to serve as an embryo for inferring graph models of some complex networks. The GP system and abstract model were used to reproduce well-known graph models. The results indicated that the system was able to evolve models that produced networks with structural similarities to the networks generated by the respective target models.

Acknowledgements

Thank you to my supervisor, Dr. Beatrice Ombuki-Berman; you have been there for me for a number of years now and have been the compass of my work. Thank you to my peers, the faculty of the Department of Computer Science, and my other mentors; you have been a wealth of knowledge from which to draw inspiration and have been pivotal in my successes. I would especially like to acknowledge the efforts of Dr. Mario Ventresca, Alex Bailey, and Kyle Harrison. Thank you, finally, to my supervisory committee for your efforts in the process of constructing this thesis; it is undoubtedly better as a result.

M.R.M.

Contents

1 Introduction 1 1.1 Main Goal ...... 3 1.2 Challenges and Contributions ...... 3 1.2.1 Abstracting a Graph Model ...... 4 1.2.2 Function Sets ...... 4 1.2.3 Contributions ...... 4 1.3 Thesis Structure ...... 5

2 Complex Networks 6 2.1 Social Networks ...... 7 2.2 Biological Networks ...... 8 2.3 Information Networks ...... 8 2.4 Representing Complex Networks ...... 9 2.5 Measuring Complex Networks ...... 10 2.5.1 Average Geodesic Path Length ...... 11 2.5.2 Transitivity ...... 12 2.5.3 Degree Distribution ...... 12

3 Graph Models 14 3.1 Erdos-Renyi Model ...... 14 3.2 Watts-Strogatz Small World Model ...... 16 3.3 Barabasi-Albert Model ...... 18 3.4 Growing Random Graph Model ...... 20 3.5 Ageing Preferential Attachment Model ...... 20 3.6 Forest Fire Model ...... 22

4 Modelling Techniques for Graph Models 25 4.1 P* Graphs ...... 25

4.2 Kronecker Graphs ...... 26 4.3 Chung-Lu Graphs ...... 27 4.4 Evolutionary Modelling Strategies ...... 28 4.5 Discussion ...... 28

5 Genetic Programming 30 5.1 The Genetic Programming Algorithm ...... 30 5.1.1 Selection ...... 31 5.1.2 Genetic Operators ...... 31 5.2 Tree-Based Genetic Programming ...... 32 5.3 Linear Genetic Programming ...... 34 5.4 Object-Oriented Genetic Programming ...... 35

6 LinkableGP 37 6.1 Facilitation of Expert Knowledge ...... 37 6.2 Motivation ...... 38 6.3 Structure and Representation ...... 38 6.3.1 The Abstract Class ...... 39 6.3.2 The Genotype ...... 39 6.3.3 The Phenotype and the Language ...... 39 6.3.4 Mapping Genotype to Phenotype ...... 40 6.4 Operations ...... 45 6.4.1 Crossover ...... 45 6.4.2 Mutation ...... 46

7 Proposed Methodology 47 7.1 Abstract Graph Model ...... 48 7.2 GP Language ...... 49 7.3 Fitness Evaluation ...... 52

8 Experiments – Reproducing Graph Models 54 8.1 Experimental Setup ...... 55 8.2 Growing Random Model ...... 56 8.3 Barabasi-Albert Model ...... 62 8.4 Forest Fire Model ...... 66 8.5 Ageing Preferential Attachment Model ...... 70 8.6 Overall Performance ...... 74 8.7 Summary ...... 76

9 Comparison of Proposed Method 79 9.1 Incorporation of Expert Knowledge ...... 80 9.2 Performance ...... 82

10 Conclusion 84 10.1 Limitations ...... 86 10.2 Future Work ...... 87

Bibliography 89

Appendices 98

A Select Evolved Graph Models 98

B Graph Visualizations of Evolved Model vs. Real Model by Number of Vertices 103

C Empirical Cumulative Distributions of Fitness Results by Experiment and Objective 116

D GP Convergence Plots 141

List of Tables

5.1 Example of Byte to Instruction Mapping ...... 34

6.1 Initial Example Language ...... 42 6.2 Example Language after one instruction ...... 43 6.3 Example Language after two instructions ...... 43 6.4 Example Language after four instructions ...... 44

7.1 Accessible Collections of Functions and Terminals by Abstract Method 52 7.2 Example of Sum of Ranks ...... 53

8.1 LinkableGP Parameters ...... 56 8.2 Experiments ...... 57 8.3 Final evolved GR model results comparing 1000 graphs produced by the best-evolved model vs. its target graph ...... 59

8.4 Dcrit Values ...... 60 8.5 T-Values of the comparisons between GR evolved models and the growing random model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326...... 62 8.6 Final evolved BA model results comparing 1000 graphs produced by the evolved model vs. its target graph ...... 64 8.7 T-Values of the comparisons between BA evolved models and the Barabasi-Albert model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326...... 65 8.8 Final evolved FF model results comparing 1000 graphs produced by the evolved model vs. its target graph ...... 68

8.9 Results of 1000 KS-tests comparing degree distributions of the evolved models and the forest fire model. The values in this table are the proportion of tests showing that each graph pair had similar degree distributions. Values in brackets are the computed p-values of a one-sided Pearson's chi-square test which tested the null hypothesis that the evolved model passed the KS test at least as often as the target model...... 69 8.10 Final evolved APA model results comparing 1000 graphs produced by the evolved model vs. its target graph ...... 72 8.11 T-Values of the comparisons between APA evolved models and the growing random model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326...... 72 8.12 Overall Final Fitness Results of GR-100, GR-250, GR-500, and GR-1000 ...... 74 8.13 Overall Final Fitness Results of BA-100, BA-250, BA-500, and BA-1000 ...... 75 8.14 Overall Final Fitness Results of FF-100, FF-250, FF-500, and FF-1000 ...... 76

List of Figures

2.1 Small-World Complex Network ...... 9

3.1 Erdos-Renyi G(n, p) Graph ...... 16 3.2 Erdos-Renyi G(n, M) Graph ...... 17 3.3 Watts-Strogatz Small World Graph ...... 18 3.4 Barabasi-Albert Preferential-Attachment Graph ...... 19 3.5 Growing Random Graph ...... 21 3.6 Ageing Preferential-Attachment Graph ...... 22 3.7 Forest Fire Graph ...... 23

5.1 Singly-typed Tree-based Program ...... 33 5.2 Tree-based GP Crossover Operation ...... 33 5.3 Chromosome Comprised of an Array of Bytes ...... 34

6.1 Visualization of the genotype to phenotype mapping for an individual in the LinkableGP system...... 39 6.2 Example Chromosome ...... 41 6.3 Visualization of the two-phase crossover operation in the LinkableGP system...... 46

8.1 Graphs from the GR model and the GR-100 model ...... 58 8.2 Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for GR experiments (evolved) versus the growing random model (target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices ...... 61 8.3 Graphs from the BA model and the BA-250 model ...... 63

8.4 Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for BA experiments (evolved) versus the growing random model (target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices ...... 65 8.5 Graphs from the FF model and the FF-100 model ...... 67 8.6 Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for FF experiments (evolved) versus the growing random model (target) ...... 70 8.7 Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for APA experiments (evolved) versus the growing random model (target) ...... 73 8.8 Convergence Plot for FF-500 ...... 77 8.9 Convergence Plot for BA-250 ...... 77

B.1 Growing Random Model with 100 Vertices ...... 104 B.2 Evolved Growing Random Model with 100 Vertices ...... 104 B.3 Growing Random Model with 250 Vertices ...... 105 B.4 Evolved Growing Random Model with 250 Vertices ...... 105 B.5 Growing Random Model with 500 Vertices ...... 106 B.6 Evolved Growing Random Model with 500 Vertices ...... 106 B.7 Growing Random Model with 1000 Vertices ...... 107 B.8 Evolved Growing Random Model with 1000 Vertices ...... 107 B.9 Barabasi-Albert Model with 100 Vertices ...... 108 B.10 Evolved Barabasi-Albert Model with 100 Vertices ...... 108 B.11 Barabasi-Albert Model with 250 Vertices ...... 109 B.12 Evolved Barabasi-Albert Model with 250 Vertices ...... 109 B.13 Barabasi-Albert Model with 500 Vertices ...... 110 B.14 Evolved Barabasi-Albert Model with 500 Vertices ...... 110 B.15 Barabasi-Albert Model with 1000 Vertices ...... 111 B.16 Evolved Barabasi-Albert Model with 1000 Vertices ...... 111 B.17 Forest Fire Model with 100 Vertices ...... 112 B.18 Evolved Forest Fire Model with 100 Vertices ...... 112 B.19 Forest Fire Model with 250 Vertices ...... 113 B.20 Evolved Forest Fire Model with 250 Vertices ...... 113 B.21 Forest Fire Model with 500 Vertices ...... 114 B.22 Evolved Forest Fire Model with 500 Vertices ...... 114 B.23 Forest Fire Model with 1000 Vertices ...... 115 B.24 Evolved Forest Fire Model with 1000 Vertices ...... 115

C.1 Distribution of Differences in Average Geodesic Path Lengths for GR-100 ...... 117 C.2 Distribution of Differences in Clustering Coefficients for GR-100 .. 117 C.3 Distribution of Differences in In-Degree Distributions for GR-100 .. 118 C.4 Distribution of Differences in Out-Degree Distributions for GR-100 . 118 C.5 Distribution of Differences in Average Geodesic Path Lengths for GR-250 ...... 119 C.6 Distribution of Differences in Clustering Coefficients for GR-250 .. 119 C.7 Distribution of Differences in In-Degree Distributions for GR-250 .. 120 C.8 Distribution of Differences in Out-Degree Distributions for GR-250 . 120 C.9 Distribution of Differences in Average Geodesic Path Lengths for GR-500 ...... 121 C.10 Distribution of Differences in Clustering Coefficients for GR-500 .. 121 C.11 Distribution of Differences in In-Degree Distributions for GR-500 .. 122 C.12 Distribution of Differences in Out-Degree Distributions for GR-500 . 122 C.13 Distribution of Differences in Average Geodesic Path Lengths for GR-1000 ...... 123 C.14 Distribution of Differences in Clustering Coefficients for GR-1000 .. 123 C.15 Distribution of Differences in In-Degree Distributions for GR-1000 . 124 C.16 Distribution of Differences in Out-Degree Distributions for GR-1000 124 C.17 Distribution of Differences in Average Geodesic Path Lengths for BA-100 ...... 125 C.18 Distribution of Differences in Clustering Coefficients for BA-100 .. 125 C.19 Distribution of Differences in In-Degree Distributions for BA-100 .. 126 C.20 Distribution of Differences in Out-Degree Distributions for BA-100 . 126 C.21 Distribution of Differences in Average Geodesic Path Lengths for BA-250 ...... 127 C.22 Distribution of Differences in Clustering Coefficients for BA-250 .. 127 C.23 Distribution of Differences in In-Degree Distributions for BA-250 .. 128 C.24 Distribution of Differences in Out-Degree Distributions for BA-250 . 128 C.25 Distribution of Differences in Average Geodesic Path Lengths for BA-500 ...... 129 C.26 Distribution of Differences in Clustering Coefficients for BA-500 .. 129 C.27 Distribution of Differences in In-Degree Distributions for BA-500 .. 130 C.28 Distribution of Differences in Out-Degree Distributions for BA-500 . 130 C.29 Distribution of Differences in Average Geodesic Path Lengths for BA-1000 ...... 131 C.30 Distribution of Differences in Clustering Coefficients for BA-1000 .. 131 C.31 Distribution of Differences in In-Degree Distributions for BA-1000 . 132 C.32 Distribution of Differences in Out-Degree Distributions for BA-1000 132 C.33 Distribution of Differences in Average Geodesic Path Lengths for FF-100 . 133 C.34 Distribution of Differences in Clustering Coefficients for FF-100 ... 133 C.35 Distribution of Differences in In-Degree Distributions for FF-100 .. 134 C.36 Distribution of Differences in Out-Degree Distributions for FF-100 . 134 C.37 Distribution of Differences in Average Geodesic Path Lengths for FF-250 . 135 C.38 Distribution of Differences in Clustering Coefficients for FF-250 ... 135 C.39 Distribution of Differences in In-Degree Distributions for FF-250 .. 136 C.40 Distribution of Differences in Out-Degree Distributions for FF-250 . 136 C.41 Distribution of Differences in Average Geodesic Path Lengths for FF-500 . 137 C.42 Distribution of Differences in Clustering Coefficients for FF-500 ... 137 C.43 Distribution of Differences in In-Degree Distributions for FF-500 .. 138 C.44 Distribution of Differences in Out-Degree Distributions for FF-500 . 138 C.45 Distribution of Differences in Average Geodesic Path Lengths for FF-1000 ...... 139 C.46 Distribution of Differences in Clustering Coefficients for FF-1000 .. 139 C.47 Distribution of Differences in In-Degree Distributions for FF-1000 .. 140 C.48 Distribution of Differences in Out-Degree Distributions for FF-1000 140

D.1 GR-100 Convergence Plot ...... 142 D.2 GR-250 Convergence Plot ...... 143 D.3 GR-500 Convergence Plot ...... 144 D.4 GR-1000 Convergence Plot ...... 145 D.5 BA-100 Convergence Plot ...... 146 D.6 BA-250 Convergence Plot ...... 147 D.7 BA-500 Convergence Plot ...... 148 D.8 BA-1000 Convergence Plot ...... 149 D.9 FF-100 Convergence Plot ...... 150 D.10 FF-250 Convergence Plot ...... 151 D.11 FF-500 Convergence Plot ...... 152 D.12 FF-1000 Convergence Plot ...... 153 D.13 APA-100 Convergence Plot ...... 154 D.14 APA-250 Convergence Plot ...... 155 D.15 APA-500 Convergence Plot ...... 156 D.16 APA-1000 Convergence Plot ...... 157

Chapter 1

Introduction

The main contribution of this work is the introduction of a novel genetic programming (GP) technique which is useful for automating the inference of graph models for complex networks. GP is an evolutionary computational technique based on the ideas of Darwinian evolution. It begins with a population of randomly created programs and utilizes recombination and mutation operators to evolve fit programs. It has been shown to be useful in a number of applications such as regression [9, 74, 87], the development of circuits [45], evolutionary art [16], the construction of artificial agents [41], and even the automatic inference of complex networks [11]. A complex network is a system of entities that are interrelated via connections in meaningful ways [69]. The study of these systems reveals a great deal about the systems, both natural and artificial, which they represent. For example, Facebook is a kind of complex network where the members are entities and the connections are friendships [86]. Food webs are examples of natural complex networks wherein the animals, both prey and predator, are the entities and the transfers of energy from prey to predator are the connections [69]. Other complex networks include the Internet [70], protein interactions [69], citation networks [5, 13, 29], neural pathways of the brain [12], and even sexual relations between high school students [15]. Understanding complex networks requires that we understand the structure and dynamics of the networks being investigated. Only then can the behaviour of the entities and the processes acting on them be understood properly. One way in which complex networks are studied is via graph models [69]. Graph models are algorithms which, put simply, produce graphs. In the case of complex networks, they model the behaviours of the network to approximate the structure and dynamics of a complex network. There are a number of examples of graph models produced for complex networks

found in literature [68]. There exist graph models which attempt to identify prevalent structural properties found in many complex networks, such as power-law degree distributions [13], the small-world effect [90], and community structure emergence [54]. These network properties have been demonstrated in a great number of real-world examples such as biological networks [84, 88], social networks [86], and technological networks [37]. However, constructing a graph model requires large amounts of data, in-depth study of the mechanisms of the network, and a great deal of time. Furthermore, constructed models are often cited for inaccuracies [13, 27, 54, 90], and constructing them requires a deep understanding of the processes and behaviours of the network. In 2012, Bailey et al. [11] proposed a method for the automatic inference of graph models for complex networks. The method was an important step in the construction of graph models, as it helped address a circular difficulty in their development: it is necessary to understand the complex network at hand in order to develop the graph model used to understand that network [10]. In his work [10], Bailey also identified that some researchers reuse existing graph models to produce new models from other data, which can lead to inaccuracies in the models. Nevertheless, this suggests that researchers have some understanding of how the graph model of the network at hand should behave. As such, work has gone into identifying classes of complex networks, which helps in the manual construction of models (for example, see [8, 46, 47, 52]). The methods proposed by Bailey et al. [12] also employed additional algorithms outside of GP, such as community detection algorithms, to assist in producing community structures.
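To fix ideas, the generic GP cycle underlying these systems (a random initial population improved by fitness-based selection, recombination, and mutation) can be sketched as follows. This is an illustrative sketch only: the program representation, operators, and fitness function are placeholder assumptions of ours, not those of Bailey's system or of the methodology proposed in this thesis.

```python
import random

def evolve(random_program, fitness, crossover, mutate,
           pop_size=100, generations=50, tournament_k=3, mut_rate=0.1):
    """Evolve a population of candidate programs; lower fitness is better."""
    population = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(p), p) for p in population]

        def select():
            # Tournament selection: best of k randomly sampled individuals.
            return min(random.sample(scored, tournament_k),
                       key=lambda t: t[0])[1]

        offspring = []
        while len(offspring) < pop_size:
            child = crossover(select(), select())
            if random.random() < mut_rate:
                child = mutate(child)
            offspring.append(child)
        population = offspring
    return min(((fitness(p), p) for p in population), key=lambda t: t[0])[1]
```

In a graph-model setting, each individual would encode a graph-generating procedure and the fitness function would compare graphs produced by the individual against the target graph.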
These observations suggest that there is some merit in utilizing the knowledge of the researcher to assist the GP system in producing models. This work is motivated primarily by this final point. Researchers utilizing a tool for the automated inference of graph models have valuable knowledge to contribute about the complex network at hand. As such, this thesis aims to produce a methodology for evolving graph models using a GP system which

• Allows the incorporation of expert knowledge

• Provides a means of expressing incomplete knowledge that does not hinder the GP performance

Expert knowledge, in the context of modelling complex networks, is any knowledge that is certain about the processes, structure, and properties of the complex network at hand or of complex networks in general. As will be shown in Chapter 4, there is much that can be known about the structure of complex networks in general. Moreover, sometimes there are insights based on the context of the network at hand that introduce new properties of the network. For example, Dorogovtsev and Mendes [27] identify the age of research papers as an important property in citation networks. In any case, expert knowledge about a network being modelled is incomplete. Thus, a framework should allow incomplete knowledge to be expressed as easily as a fill-in-the-blanks puzzle, so that a GP system can fill in the blanks without making huge jumps in its problem space.
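As a loose illustration of the fill-in-the-blanks idea, incomplete expert knowledge can be expressed as a partially implemented class whose abstract methods are the blanks left for the GP system. The class, method names, and example model below are our own assumptions for illustration, not the interface proposed later in this thesis.

```python
from abc import ABC, abstractmethod
import random

class AbstractGraphModel(ABC):
    """Expert knowledge forms the fixed skeleton; abstract methods are
    the 'blanks' a GP system would be asked to fill in."""

    def generate(self, n):
        # Fixed growth loop supplied by the expert: add vertices one at a
        # time and connect each to targets chosen by the evolvable blank.
        edges = []
        for v in range(1, n):
            for u in self.choose_targets(v):
                edges.append((v, u))
        return edges

    @abstractmethod
    def choose_targets(self, v):
        """Blank: return the existing vertices that new vertex v joins."""

class RandomAttachment(AbstractGraphModel):
    # One possible filled-in blank: attach each new vertex to a single
    # uniformly random existing vertex (a growing random graph).
    def choose_targets(self, v):
        return [random.randrange(v)]
```

Here the expert fixes the overall growth process, while an evolved `choose_targets` could just as well implement preferential attachment or any other rule, without the GP system having to rediscover the growth loop itself.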

1.1 Main Goal

This thesis, inspired by previous work in the automatic inference of graph models, contends that a new methodology which affords the researcher the opportunity to provide expert knowledge about the network being modelled is warranted. Compared to manual efforts in graph model construction, this reduces the time and effort required to construct models, and it draws on what is already known about the network at hand. Incorporating knowledge into the automation can assist in producing not just reasonable models that replicate the structure and dynamics of the network, but also models that encode some of the semantics intended by the researcher. As a result of this goal, it is necessary to redesign traditional approaches to GP in order to allow the incorporation of such knowledge. This thesis proposes a novel GP system using the object-oriented programming paradigm to facilitate the incorporation of a partial program. Using an Object-Oriented GP (OOGP) system which accepts a partial implementation of a program facilitates the incorporation of expert knowledge. Such a facility has benefits that extend beyond this work, which provides further motivation for its development (see [60] for more details).

1.2 Challenges and Contributions

In designing a graph model, a great deal of knowledge is applied, both specifically about the complex network being modelled and about complex networks in general. Incorporating this knowledge into a GP system is a non-trivial task, especially as much is still unknown about complex networks. Each network type has its own distinctiveness and caveats, of which the researchers who study it are best aware. Therefore, this thesis only begins to identify the utility of the proposed approach. While this thesis benefits from previous work in the automatic inference of graph models (see [10, 11, 12]), here directed complex networks are explored, and therefore a new set of challenges is encountered. Directed complex networks provide additional information through the directionality of relations. They provide a sense of flow, in time or in information, that is not captured completely by undirected networks. The directionality and the information it conveys present a unique set of challenges, and producing a methodology capable of handling them required modifications to the previous approaches.

1.2.1 Abstracting a Graph Model

The OOGP methodology proposed in this thesis requires the implementation of an abstract class as a template for the programs produced. In the case of graph models, a model is required which encapsulates the general behaviour of a complex network. If a model existed that described the general behaviour of all complex networks, then OOGP would not be beneficial, as the task would simply be a question of parameter tuning. Instead, some of the network's known behaviours must be encoded into the algorithm. This thesis proposes the first such abstraction for OOGP.

1.2.2 Function Sets

A function set provides the building blocks for the development of a program in GP. In abstracting the graph model for OOGP, the function set proposed in Bailey [10] is no longer usable because an abstract class (OOGP's program) consists of several functions, each with a different purpose. Therefore, each function within the abstract class requires its own function set, relating only to the tasks relevant to the function it is mapped to. The complete set of functions used to construct the program needs to be sufficient to express a large variety of models, but not so large that the OOGP becomes bogged down finding useful configurations. This thesis proposes a collection of function sets specific to the abstract class proposed.

1.2.3 Contributions

In summary, the contributions made by this thesis are:

1. Proposes a new methodology for the automatic inference of graph models for complex networks that facilitates the incorporation of known behaviours.

2. Proposes a novel OOGP methodology which facilitates the incorporation of expert knowledge.

3. Proposes an abstraction of graph models for the automatic inference of complex networks.

4. Proposes an OOGP function set for the construction of graph models.

1.3 Thesis Structure

The remainder of this thesis is organized as follows. Chapter 2 provides a brief introduction to complex networks in order to give the reader the requisite knowledge for this thesis. In Chapter 3, graph models are discussed, and an introduction to some well-known graph models of complex networks is provided. Then, in Chapter 4, a review of some notable techniques for model construction is provided. Chapter 5 provides a background in GP sufficient to understand the proposed GP system. Chapter 6 presents the proposed OOGP system in detail, including the structure and representation of the population of candidate programs; the chapter also provides details about the implementation of genetic operators for the OOGP. Chapter 7 describes the exact methods used by the proposed methodology. Chapter 8 shows how the system was used to reproduce some known graph models, and Chapter 9 compares the proposed method with previous approaches. Finally, Chapter 10 provides some conclusions and inspiration for future work.

Chapter 2

Complex Networks

A complex network is a system of entities joined together through relations or connections. Governed by underlying processes, complex networks form a structure that is neither regular nor completely random. There is no universally accepted definition of complex networks. However, one definition that might be adopted as characteristic is that complex networks have a statistical complexity [48]. That is, there are patterns within the structure with observable properties that can be used to construct probabilistic descriptions of the structure or behaviour of the network. Complex networks are found everywhere. They are widely studied in many fields such as the social sciences [15, 22, 52], biology [57, 75, 84], computer science [76, 77], and others [69]. In biology, complex networks are often investigated as molecular systems in which protein structure or gene relationships are studied. In the social sciences, complex networks, often called social networks, describe human interactions, like friendships on Facebook. The study of complex networks reveals a great deal about the world [69]. The study of individual classes of network often leads to discoveries outside of the domain of the network and into general theories of complex networks [5, 68]. Thus, a common form of representation is useful. A popular representation of complex networks is the graph [69]. This representation uses vertices connected via edges to represent the entities (vertices) and relations (edges) in complex networks. It provides a means of mathematically analysing the structure of complex networks [30], and thus facilitates their quantitative study. Since complex networks are dynamic systems [70], it is useful to be able to study their behaviour. A means of representing the dynamic behaviours of complex networks is the graph model. In terms of modelling complex networks, graph models provide an algorithmic means of representing the processes that construct complex

networks. As a result, graph models can be employed to investigate the properties of various complex networks [69]. Many graph models have been proposed for the study of complex networks [27, 54, 90]. In this chapter, an introduction to some types of complex networks is provided in Sections 2.1, 2.2, and 2.3. Then, in Section 2.4, a brief introduction to representing complex networks is provided, highlighting two representations: graphs and graph models. With an understanding of these representations, an introduction to measuring the structural properties of complex networks is provided in Section 2.5. For readers who are interested, a much more comprehensive introduction to complex networks can be found in [69].

2.1 Social Networks

Social networks are networks that are the product of the social interactions of people. The study of social networks dates back to the 1920s with works by Almack [7] and Wellman [91]. However, Moreno and Jennings [66] are often credited as the pioneers in the study of social networks; their work with sociograms provided a framework for mapping group interactions. Sociograms are often used to limit misbehaviour in the classroom [93]. Milgram [63] is another important name in the study of social networks. His "small world" experiment [85] explored the notion of "six degrees of separation", which states that any two people are separated by no more than five intermediaries. The property was demonstrated using a letter-passing exercise in which several individuals from Nebraska passed a letter to people they knew on a first-name basis in another state, and the recipients then passed the letter on in turn. The data collected from the experiment formed acquaintance chains with an average of 5.2 intermediate steps, and a great deal of the chains overlapped at the same three people. Studies of social networks often investigate friendships [8, 22, 63] but are certainly not limited to those relations. Other examples of social networks include citation networks [25, 90], sexual relations [15], race relations [81], and on-line social networks [52], e.g. Facebook, Google+, and Twitter.

2.2 Biological Networks

Biological networks are networks that represent biological systems, including protein-protein interactions [57], neural networks [84], and food webs [56]. With increased access to biological information, the study of biological systems as complex networks is becoming more prevalent [75]. The first molecular networks were characterized by Dagley et al. [24], and with newer research tools researchers can characterize protein-protein interactions and gene regulatory systems with increasing accuracy [75]. These networks are observed to have many of the same structural properties shared outside of biological networks, such as the power-law degree distribution [88]. The study of molecular networks reveals information about the structure of the networks and provides insight into understanding biological processes [75]. Proulx et al. [75] also remark that while empirical studies of biological networks provide insight into the theory, applications of the theory must provide a predictive framework for testing hypotheses. This is an important insight about biological networks that extends to complex networks in general, and it provides a strong motivation for the efforts of this thesis and its related works [10, 11, 12].

2.3 Information Networks

Information networks are complex systems of data linked together in some fashion. It is thought that all information networks are man-made [69]. Furthermore, networks of information often have a social component [69]. Consider, for example, the social media site LinkedIn, a social network where members share affiliations. There is also a flow of information about people, observed through members endorsing one another's skills. Arguably, one of the most well-known examples of an information network is the World Wide Web (WWW). In this network of information, web pages are linked together through hyperlinks. While the WWW is a man-made collection of web pages, its growth is largely unregulated and results in an enormous directed graph, which makes its study challenging [6]. Nevertheless, the WWW has transformed the way knowledge is transmitted and has an underlying structure [37]. There are a great many other types of information networks which have drawn the attention of researchers, such as peer-to-peer networks [77], recommender networks [40], keyword indexes [76], and citation networks [5, 13, 25, 38]. Of these, the citation


Figure 2.1: Small-World Complex Network

network has been studied the longest [69]. Citation networks have been well studied and have informed theories of complex networks (not just information networks) on the structural dynamics of complex networks. Examples include the power-law distribution of degrees [13], the emergence of community structure [34], and the dynamics of ageing citations [27].

2.4 Representing Complex Networks

Having a good way to represent complex networks makes studying them much easier. One method of representation is a graph. A graph consists of points and links, referred to, respectively, as vertices and edges. Mathematically, a graph, denoted G = (V, E), is a set of vertices or nodes, V, and a set of edges, E. Figure 2.1 provides an example of a complex network in graph form. This graph represents a small-world network. An interpretation of the graph, used by Watts and Strogatz [90], is that the vertices are actors and the edges represent two actors having worked together on a movie. Investigation of graphs of small-world networks shows that the average shortest distance between any two actors is proportional to log N, where N is the number of vertices in the network [90]. Investigating complex networks through graphs alone is somewhat limiting. Without a great deal of observable data, the statistical complexity of a network's structure and behaviour might not be revealed. Instead, attention can be turned to graph models.
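As a concrete illustration (hypothetical code, not part of the thesis), a graph G = (V, E) can be represented directly as a vertex set and an edge set:

```python
# Minimal illustrative graph structure: a vertex set V and an edge set E.
# The class and method names are assumptions of this sketch.

class Graph:
    def __init__(self):
        self.V = set()   # vertices
        self.E = set()   # edges stored as (i, j) pairs

    def add_vertex(self, v):
        self.V.add(v)

    def add_edge(self, i, j):
        # For an undirected graph, store each edge once in canonical order.
        self.V.update((i, j))
        self.E.add((min(i, j), max(i, j)))

g = Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)
print(len(g.V), len(g.E))  # 3 vertices, 2 edges
```

Adjacency lists or matrices are common alternative representations when shortest-path or degree computations are needed.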

Graph models are algorithms that produce graphs. Graph models might produce regular graphs, such as the K-regular graph, or they might produce random graphs, for example, the Erdos-Renyi random graph [28]. With a graph model, the processes that act on the network can be approximated and used to gather a large amount of data from the graphs it produces. For example, the network described in Figure 2.1 is produced by Algorithm 1, the Watts-Strogatz Small World model. This algorithm produces a graph, G = (V, E), which has an average path length proportional to log N. In producing path lengths of approximately log N, the Watts-Strogatz Small World model approximates the observed properties of the small-world network.

Algorithm 1: Watts-Strogatz Small World Model
  Input: n > 0, k > 0, d > 0, 0 ≤ p ≤ 1
  Data: Watts-Strogatz Graph, G = (V, E)
  begin
      G ← lattice(n, k, d);
      E2 ← ∅;
      foreach (i, j) ∈ E do
          if random() < p then
              E2 ← E2 ∪ (i, random(|V|));
          else
              E2 ← E2 ∪ (i, j);
          end if
      end foreach
      E ← E2;
      return G
  end

2.5 Measuring Complex Networks

The title of this section, Measuring Complex Networks, is something of a misnomer. In reality, complex networks cannot be measured directly. Instead, the structure of the network is analysed. Structural analysis is accomplished using statistical measures that quantify various aspects of the network. There exist a great number of measures for the structure of complex networks, many of which are identified by Newman [69]. However, for the purposes of this thesis, a few statistical measures are selected which identify key characteristics of the structure of networks. These are statistical measures commonly used in the study of complex networks found in

literature [5, 8, 13, 14, 17, 25, 27, 63, 90]. Starting from the methods proposed by Bailey [10], immediately three common structural properties measured in complex networks are identified.

• Geodesic Path Length

• Transitivity

• Degree Distribution

These three measures formed the basis of fitness evaluation of the generated models in the work done on automating the inference of graph models [10]. In that work, these measures were used for fitness evaluation to construct graph models with a high degree of accuracy. Thus, this thesis continues this approach and leaves evaluating the sufficiency of these measures to others [36]. Additionally, the aim here was to propose a methodology for the inference of graph models for directed and unweighted complex networks, whereas Bailey [10] is concerned with undirected and unweighted complex networks. Directed networks add new information not found in undirected networks as a result of the directionality of edges. In order to capture this new information, the procedures used in measuring the above structural properties must be adapted. For computing the geodesic path lengths and transitivity of the network, this means that the computational method should take into account the directionality of the edges. For the degree distribution, two measures are now required: in- and out-degree.

2.5.1 Average Geodesic Path Length

The average geodesic path length refers to the average length of the shortest path,

d_{i,j}, in a graph, G = (V, E), between all vertices i, j ∈ V [26]. Let l be the average of all geodesic path lengths of a directed graph, such that (i, j) are vertex pairs in G and n is the number of vertices in the graph:

l = \frac{1}{n(n+1)} \sum_{i,j \in V,\, i \neq j} d_{i,j}    (2.1)

In social networks, persons with a lower mean distance to others might exert a higher influence through their opinions [69]. This measure is often found in studies of social networks to illustrate the small path lengths of these networks (see [13, 54, 90]).
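The average can be computed by breadth-first search from every vertex. The sketch below is illustrative code, not from the thesis; it respects edge direction, as required for directed networks, and averages over the reachable ordered pairs:

```python
from collections import deque

def avg_geodesic_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (i, j), i != j.

    `adj` maps each vertex to the set of vertices its out-edges point to,
    so edge direction is respected. As a simplifying assumption of this
    sketch, unreachable pairs are excluded from the average.
    """
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                          # BFS from s
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if v != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")

# Directed 3-cycle: every ordered pair is at distance 1 or 2.
cycle = {0: {1}, 1: {2}, 2: {0}}
print(avg_geodesic_path_length(cycle))  # 1.5
```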

2.5.2 Transitivity

In mathematics, given some relation ◦, transitivity is the implication a ◦ b ∧ b ◦ c → a ◦ c. In terms of networks, transitivity means that the existence of an edge (a, b) and an edge (b, c) implies that there is an edge (a, c). Transitivity is an important property of social networks [69]. A graph that is fully transitive is said to have perfect transitivity [69]. Perfect transitivity, however, is not a useful metric, as only fully connected graphs exhibit it. Instead, it is interesting to determine the amount of transitivity which occurs in a network, i.e. partial transitivity. One method for computing the partial transitivity of a network is to compute its clustering coefficient. The clustering coefficient refers to the probability that a vertex and its neighbour are connected to a common vertex. There exist a number of measures of the clustering coefficient, such as the global clustering coefficient, the local clustering coefficient, and the network average clustering coefficient. This thesis uses the network average clustering coefficient when considering clustering coefficients, as it provides a single value and utilizes the local clustering coefficient. The network average clustering coefficient is the average of the local clustering coefficient over all n vertices in the network (see Equation 2.2). To compute the local clustering coefficient for each vertex, i, Equation 2.3 is employed, where N_i is the set of vertices immediately connected to vertex v_i and k_i = |N_i|.

\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i    (2.2)

C_i = \frac{|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i - 1)}    (2.3)

This measure can be used to detect the presence of a small-world network [90]. If \bar{C} is higher in a network than the observed \bar{C} of a random network with a similar average geodesic path length, then this indicates that the network exhibits characteristics of a small-world network.
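A sketch of Equations 2.2 and 2.3 in code (illustrative, not thesis code; taking N_i as the out-neighbourhood is an assumption made here for the directed case):

```python
def local_clustering(adj, i):
    """Local clustering coefficient C_i of Equation 2.3 (directed form).

    `adj` maps each vertex to the set of targets of its out-edges.
    N_i is taken as the out-neighbourhood of i, one of several possible
    directed readings; this choice is an assumption of the sketch.
    """
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count ordered pairs (j, m) of neighbours joined by an edge j -> m.
    links = sum(1 for j in nbrs for m in nbrs if j != m and m in adj[j])
    return links / (k * (k - 1))

def network_avg_clustering(adj):
    """Network average clustering coefficient of Equation 2.2."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Vertex 0 points at 1 and 2, and 1 points at 2: one of the two
# possible edges among N_0 = {1, 2} exists, so C_0 = 1/2.
adj = {0: {1, 2}, 1: {2}, 2: set()}
print(network_avg_clustering(adj))  # C_0 = 0.5, C_1 = C_2 = 0; average 1/6
```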

2.5.3 Degree Distribution

Degree centrality is prevalent in research into complex networks [69]. Degree centrality addresses the importance of a vertex within its network. There exist a number of measures that offer indications of the importance of a vertex in the network. The simplest of these is the degree of a vertex, that is, the number of edges connected

to it. In a directed graph, there are also the in-degree and out-degree of a vertex: the number of in-bound edges (in-degree) and the number of out-bound edges (out-degree). The distribution of the degrees of vertices in a complex network can reveal important information about the behaviour of the network. In preferential-attachment networks with a growing number of vertices (i.e., new vertices are added to the network over time), like the Barabasi-Albert model [13], the distribution of the degrees follows a power-law distribution. The power-law degree distribution, also referred to as a scale-free distribution, describes a distribution where the number of vertices with small degrees is large and the distribution is heavily tailed, with only a few vertices having a high degree. More specifically, the probability, P(X), that a vertex will have a degree, X = d, where α and k are some constants, is

P(X) = \alpha X^{k}    (2.4)
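The in- and out-degree distributions discussed above can be tallied directly from an edge list. The helper below is illustrative, not code from the thesis:

```python
from collections import Counter

def degree_distributions(edges, n):
    """Tally in- and out-degree distributions of a directed graph on
    vertices 0..n-1 from its edge list.

    Returns two Counters mapping degree -> number of vertices with that
    degree. Hypothetical helper; names are assumptions of this sketch.
    """
    indeg = Counter()
    outdeg = Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    in_dist = Counter(indeg.get(v, 0) for v in range(n))
    out_dist = Counter(outdeg.get(v, 0) for v in range(n))
    return in_dist, out_dist

edges = [(0, 1), (0, 2), (1, 2)]
in_dist, out_dist = degree_distributions(edges, 3)
print(in_dist[2], out_dist[2])  # one vertex of in-degree 2, one of out-degree 2
```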

In order to compare the degree distributions of two graphs, a statistical method is required. In this work, the Kolmogorov-Smirnov test (KS test) [42] is used. In the two-sample form, the KS test is a non-parametric hypothesis test whose null hypothesis is that the two samples originate from the same distribution function. The null hypothesis of a two-sample test is rejected at level α if Equation 2.5 is satisfied, where D_{n,n'} is computed as in Equation 2.6, c(α) is a value determined by the significance level of the test, and the sample sizes of the two samples are n and n'.

In Equation 2.6, F_{1,n} and F_{2,n'} are the empirical cumulative distribution functions of samples 1 and 2, respectively, and x is a common observation point, an arbitrary point in the range of both functions.
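As a concrete sketch (hypothetical helper, not from the thesis), the two-sample statistic D_{n,n'} can be computed from two degree samples by evaluating both empirical CDFs at every observed value:

```python
def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs, checked at every observed value.
    Illustrative sketch, not the thesis implementation."""
    points = sorted(set(sample1) | set(sample2))
    n1, n2 = len(sample1), len(sample2)
    d = 0.0
    for x in points:
        f1 = sum(1 for s in sample1 if s <= x) / n1   # F_{1,n}(x)
        f2 = sum(1 for s in sample2 if s <= x) / n2   # F_{2,n'}(x)
        d = max(d, abs(f1 - f2))
    return d

# Identical degree samples give D = 0; disjoint ones give D = 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([1, 1, 1], [5, 5, 5]))  # 1.0
```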

D_{n,n'} > c(\alpha) \sqrt{\frac{n + n'}{n n'}}    (2.5)

D_{n,n'} = \max_{x} |F_{1,n}(x) - F_{2,n'}(x)|    (2.6)

For computing the empirical cumulative distribution of the degrees in a network,

F_n(x) is the proportion of vertices with degree d_i that satisfy d_i ≤ x (see Equation 2.7).

F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(d_i \le x)    (2.7)

Chapter 3

Graph Models

As briefly introduced in Section 2.4, a graph model is an algorithm that produces a graph that holds certain properties of interest. Graph models help in understanding complex network behaviours, as they model the processes that form the network. Through repeated execution, graph models produce a set of graphs that have some form of commonality. The commonality of the graphs is dependent on the design of the graph model. For example, a graph model designed to replicate transitivity will produce graphs that have similar clustering coefficients. However, the clustering coefficients of two graphs produced by such a model will differ somewhat. In other words, the graphs will not be isomorphic, but rather will exhibit certain predictable properties. A number of graph models have been proposed which are designed to model various phenomena (Barabási and Albert [13], Dorogovtsev and Mendes [27], Leskovec et al. [54], Watts and Strogatz [90]). Furthermore, the phenomena are not attributable to random chance, and the properties observed, with respect to a phenomenon, are not found in random graphs. In this chapter, the random graph model will be introduced. This model serves as a starting point for the study of graph models for complex networks and is important to the comparison of other graph models. Also in this chapter, a variety of graph models are introduced which model some structural properties found in real-world networks. Further information about graph models can be found in Amaral et al. [8], Krapivsky et al. [47], Newman [68, 69], Newman et al. [70], and Watts and Strogatz [90].

3.1 Erdos-Renyi Model

The Erdos-Renyi model is a random graph model that can be represented in two ways [33, 82]: G(n, p) and G(n, M) (illustrated later). The Erdos-Renyi model, specifically


the one introduced by Solomonoff and Rapoport [82], is a well studied model (see, for example, Boccaletti et al. [17], Diestel [26], Newman [68, 69], Newman et al. [70]). Popularized by Erdos and Renyi [28], it is a random graph model containing n vertices, in which vertices are connected with an independent probability of p.

Algorithm 2: Erdos-Renyi G(n, p) Graph Model
  Input: n > 0, 0 ≤ p ≤ 1
  Data: Erdos-Renyi Graph, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
      end for
      for i ← 1 to n do
          for j ← 1 to n do
              if random() < p then
                  E ← E ∪ (v_i, v_j);
              end if
          end for
      end for
      return G
  end
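A direct translation of Algorithm 2 might look as follows (illustrative sketch, not thesis code; like the pseudocode, it considers every ordered pair, so self-edges are possible):

```python
import random

def erdos_renyi_gnp(n, p, seed=None):
    """Sketch of the G(n, p) model: every ordered pair (i, j) receives an
    edge with independent probability p."""
    rng = random.Random(seed)
    V = list(range(1, n + 1))
    E = [(i, j) for i in V for j in V if rng.random() < p]
    return V, E

V, E = erdos_renyi_gnp(100, 0.05, seed=1)
print(len(V), len(E))  # |E| is near p * n^2 = 500 in expectation here
```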

The model introduced by Solomonoff and Rapoport [82], denoted G(n, p), is defined in Algorithm 2. This is the original random graph model popularized by Erdos and Renyi [28]. In this model, n vertices are added to a graph. Afterwards, for each potential edge (i, j), the edge is added with a probability of p. The number of edges in the graphs produced by this model follows a binomial distribution [29], where the expectation is |E| = p\binom{n}{2} in a produced graph, G = (V, E). An example graph produced by the G(n, p) model is provided in Figure 3.1. An alternative representation of the Erdos-Renyi model provided by Gilbert [33] differs from the G(n, p) model in that the edges are added randomly using a uniform distribution. Furthermore, in this representation only M edges are added to the graph. The work of Gilbert [33] and application of the alternative representation, denoted G(n, M), explores the probability that a path exists between two vertices. In the G(n, M) model, given the probability, p, that an edge between two vertices exists, the probability that two vertices are connected is P_n ∼ 1 − n(1 − p)^{n−1}, and the probability that a path exists from a vertex to all other vertices is R_n ∼ 1 − 2(1 − p)^{n−1}. Notably, the G(n, M) model tends to behave similarly to G(n, p) when M = p\binom{n}{2}. In Algorithm 3, the G(n, M) model begins by adding n vertices to the graph.


Figure 3.1: Erdos-Renyi G(n, p) Graph

Constructed where n = 100, p = 0.05, and self-edges were restricted

Then, M edges are constructed using pairs of vertices randomly selected from the graph. An example of the output from the G(n, M) model is provided in Figure 3.2.

Algorithm 3: Erdos-Renyi G(n, M) Graph Model
  Input: n > 0, M > 0
  Data: Erdos-Renyi Graph, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
      end for
      for i ← 1 to M do
          v1 ← select(V);
          v2 ← select(V);
          E ← E ∪ (v1, v2);
      end for
      return G
  end
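Algorithm 3 translates similarly (illustrative sketch, not thesis code; endpoint selection is uniform, so self-edges and duplicate edges are possible, as in the pseudocode):

```python
import random

def erdos_renyi_gnm(n, m, seed=None):
    """Sketch of the G(n, M) model: n vertices, then M edges whose
    endpoints are drawn uniformly at random from V."""
    rng = random.Random(seed)
    V = list(range(1, n + 1))
    E = [(rng.choice(V), rng.choice(V)) for _ in range(m)]
    return V, E

V, E = erdos_renyi_gnm(100, 300, seed=1)
print(len(V), len(E))  # exactly n vertices and M edges
```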

3.2 Watts-Strogatz Small World Model

The Small World model proposed by Watts and Strogatz [90] attempted to address some shortcomings of the Erdos-Renyi model in capturing real-world network properties. The model illustrates a “small world” effect [63] such that path lengths to other vertices in the graph are small. Watts and Strogatz [90] proposed that naturally occurring networks are neither completely random nor completely regular.


Figure 3.2: Erdos-Renyi G(n, M) Graph

Constructed where n = 100, M = 300, and self-edges were restricted

Networks are considered small world when the average local clustering coefficient is significantly higher than that of a random graph with the same number of vertices and edges [90]. Many networks have been shown to exhibit the small-world property, such as power grids, collaboration networks, and the neural network of the C. elegans worm [90]. However, the model proposed by Watts and Strogatz does not generate the same degree distributions as those of real-world networks [69].

Algorithm 4: Watts-Strogatz Small World Model
  Input: n > 0, k > 0, d > 0, 0 ≤ p ≤ 1
  Data: Watts-Strogatz Graph, G = (V, E)
  begin
      G ← lattice(n, k, d);
      E2 ← ∅;
      foreach (i, j) ∈ E do
          if random() < p then
              E2 ← E2 ∪ (i, random(|V|));
          else
              E2 ← E2 ∪ (i, j);
          end if
      end foreach
      E ← E2;
      return G
  end
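The rewiring loop of Algorithm 4 can be sketched for the one-dimensional case (k = 1), where the lattice is a ring and each vertex links to its d nearest neighbours on each side. This is illustrative code, not the thesis implementation:

```python
import random

def watts_strogatz(n, d, p, seed=None):
    """Sketch of the Watts-Strogatz rewiring on a 1-D ring lattice."""
    rng = random.Random(seed)
    V = list(range(n))
    # lattice(n, 1, d): each vertex i links to its next d ring neighbours.
    E = [(i, (i + j) % n) for i in V for j in range(1, d + 1)]
    E2 = []
    for (i, j) in E:
        if rng.random() < p:
            E2.append((i, rng.randrange(n)))  # rewire one endpoint
        else:
            E2.append((i, j))
    return V, E2

V, E = watts_strogatz(100, 2, 0.05, seed=1)
print(len(E))  # rewiring preserves the edge count: n * d = 200
```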

The Watts-Strogatz model recreates the local clustering and hub formation found in real-world networks [8]. As described in Algorithm 4, the network begins as a lattice. The lattice has k dimensions, each with n vertices, and each vertex has d neighbours


Figure 3.3: Watts-Strogatz Small World Graph

Generated where n = 100, k = 5, d = 1, and p = 0.05

to either side. The algorithm then proceeds to rewire each edge with probability p. Figure 3.3 provides an example of an output graph from the Watts-Strogatz model.

3.3 Barabasi-Albert Model

The Barabasi-Albert (BA) model [13] is a well known model that demonstrates preferential attachment, whereby new entities are more likely to attach to well established entities within the network [13]. Furthermore, it exhibits scale-free growth; that is, a network with t + m vertices will have mt edges [68]. In this class of networks, it is observed that the probability a vertex will have degree k is P_k ≈ ak^α + c, where α, a, and c are some constants [5]. Barabási and Albert [13] found that real-world networks with scale-free distributions required both growth and preferential attachment. Their experiments comparing the BA model with a random network exhibiting only growth showed that the degree distributions were divergent. Algorithm 5 illustrates the BA model. As new vertices are added to the graph, existing vertices are attached to them. The probability of attachment is determined by ak_j^α + c, as found in Algorithm 5. Vertices with more existing connections have a higher probability of attaching to the new vertices. Figure 3.4 provides an output graph of the BA model. In the graph, it can be observed that there exist hubs, central vertices with a higher in-degree than other vertices in the graph. These hubs are characteristic of the effect of preferential attachment.

Algorithm 5: Barabasi-Albert Graph Model
  Input: n > 0, m > 0, α > 0, a, c
  Output: A completed Barabasi-Albert Network, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
          for j ← 1 to i − 1 do
              P_j ← a k_j^α + c;
          end for
          for j ← 1 to m do
              s ← select(P);
              E ← E ∪ (v_i, v_s);
          end for
      end for
      return G
  end
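A sketch of Algorithm 5 in Python (illustrative, not thesis code; `random.choices` performs the weighted selection written as select(P), and repeat attachments to the same target are possible in this simplified version):

```python
import random

def barabasi_albert(n, m, alpha=1.0, a=1.0, c=1.0, seed=None):
    """Sketch of preferential attachment: each new vertex attaches to m
    existing vertices chosen with probability proportional to a*k^alpha + c,
    where k is the current degree of the candidate target."""
    rng = random.Random(seed)
    degree = {1: 0}
    E = []
    for i in range(2, n + 1):
        existing = list(degree)
        weights = [a * degree[j] ** alpha + c for j in existing]
        for _ in range(m):
            s = rng.choices(existing, weights=weights)[0]
            E.append((i, s))
            degree[s] += 1
        degree[i] = m
    return E

E = barabasi_albert(100, 1, seed=1)
print(len(E))  # (n - 1) * m = 99 edges
```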


Figure 3.4: Barabasi-Albert Preferential-Attachment Graph

Constructed where n = 100, m = 1, α = 1, a = 1, c = 1

3.4 Growing Random Graph Model

The Growing Random (GR) model [13] is similar to the Barabasi-Albert model except that it eliminates preferential attachment. This model, as described in Algorithm 6, was used to demonstrate that the scale-free property does not occur in growing networks without the presence of preferential attachment. Instead of the power-law distribution of degrees found in preferential-attachment networks, it is observed that the probability of degree k is exponential [13], as in Equation 3.1.

P (k) ∼ exp(−βk) (3.1)

Algorithm 6: Growing Random Graph Model
  Input: n > 0, m > 0
  Output: Growing Random Graph, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
          foreach a ∈ RandomVertices(V − v_i, m) do
              E ← E ∪ (v_i, a);
          end foreach
      end for
      return G
  end
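Algorithm 6 reduces to uniform selection of attachment targets; a sketch (illustrative code, not from the thesis):

```python
import random

def growing_random(n, m, seed=None):
    """Sketch of the Growing Random model: each new vertex attaches to m
    uniformly chosen existing vertices, with no preferential attachment."""
    rng = random.Random(seed)
    V, E = [], []
    for i in range(1, n + 1):
        V.append(i)
        if len(V) > 1:
            # RandomVertices(V - v_i, m): uniform sample of earlier vertices.
            for a in rng.sample(V[:-1], min(m, len(V) - 1)):
                E.append((i, a))
    return V, E

V, E = growing_random(100, 1, seed=1)
print(len(V), len(E))  # 100 vertices, 99 edges when m = 1
```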

Examining Figure 3.5, an output graph from the GR model, the same spanning trees observed in BA graphs (see Figure 3.4) can be seen. However, the high-degree hubs found in BA graphs are missing in GR graphs. Instead, several smaller-degree hubs are found scattered throughout the graph.

3.5 Ageing Preferential Attachment Model

The Ageing Preferential Attachment (APA) model [27] is similar in many respects to the BA model, but it introduces some important differences. Like the BA model, creation of edges is guided by preferential attachment. The APA model, however, introduces vertex age into preferential attachment. The idea is that as vertices age, their probability of receiving new attachments changes. de Solla Price [25] remarked that in scientific citation networks there exists a phenomenon he called the “immediacy factor”: as papers age, the number of new citations decreases because the paper is considered to be obsolete.


Figure 3.5: Growing Random Graph

Constructed where n = 100, m = 1

This attachment rule, however, can break the scale-free behaviour of the BA model when β ≥ 1 in Equation 3.2, where β, b, and d are constants acting on l_i, the age of the i-th vertex [27]. The Ageing Preferential Attachment model, modified from the BA model, is presented in Algorithm 7. The algorithm differs from Algorithm 5 in the probability calculation and the introduction of new variables: β (the exponent for age), b (a constant factor of age preference), and d (the zero age appeal).

P_i = a k_i^{\alpha} + c + b l_i^{\beta} + d    (3.2)

Algorithm 7: Ageing Preferential Attachment Graph Model
  Input: n > 0, m > 0, α > 0, a, c, β, b, d
  Output: A completed Ageing Preferential Attachment network, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
          for j ← 1 to i − 1 do
              P_j ← a k_j^α + c + b l_j^β + d;
          end for
          for j ← 1 to m do
              s ← select(P);
              E ← E ∪ (v_i, v_s);
          end for
      end for
      return G
  end


Figure 3.6: Ageing Preferential-Attachment Graph

Constructed where n = 100, m = 1, α = 1, a = 1, c = 1, β = −1, b = 1, d = 0

Output graphs of the Ageing Preferential Attachment model, like the one found in Figure 3.6, generate hubs similar to those found in BA graphs (see Figure 3.4). The spanning property observed in the growing networks exemplified in Figures 3.5 and 3.4 is also present in the Ageing Preferential Attachment model. However, it is important to note that the model presented by Dorogovtsev and Mendes [27] only exhibits power-law degree distributions when the age exponent, β, is less than 1.

3.6 Forest Fire Model

Leskovec et al. [54] observed that many real-world complex networks exhibit heavily-tailed in- and out-degrees, formation of communities, and increasing density due to the power-law distribution of links, along with a shrinking diameter. As such, they proposed a model called the Forest Fire model, which purports to encompass all of these properties. Algorithm 8 illustrates the Forest Fire model algorithmically. The model begins

each iteration of graph construction with the addition of a new vertex, v_i. Then, m ambassadors are selected and put into A. For each ambassador, a, an edge is created

from v_i to a. Also, two numbers, x and y, are chosen randomly from geometric distributions with means of p/(1 − p) and rp/(1 − rp), respectively. Using x and y, x successors and y predecessors of a which are not already attached to v_i are uniformly selected.

The selected vertices are appended to A to act later as ambassadors to v_i. One characteristic important to the Forest Fire model is the formation of community structures, groups of vertices formed through dense connections. In Figure

Algorithm 8: Forest Fire Model
  Input: n > 0, m > 0, 0 ≤ p ≤ 1, 0 ≤ r ≤ 1
  Output: A completed forest fire graph, G = (V, E)
  begin
      for i ← 1 to n do
          V ← V ∪ v_i;
          A ← RandomVertices(V − v_i, m);
          foreach a ∈ A do
              E ← E ∪ (v_i, a);
              x ← geoProb(p / (1 − p));
              y ← geoProb(rp / (1 − rp));
              W ← RandomVertices({b ∈ V | b ∉ A ∧ (a, b) ∈ E}, x)
                  ∪ RandomVertices({b ∈ V | b ∉ A ∧ (b, a) ∈ E}, y);
              A ← append(A, W);
          end foreach
      end for
      return G
  end
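The burning process of Algorithm 8 can be sketched as a breadth-first expansion from the ambassadors. This is illustrative code, not the thesis implementation; `geo` is a hypothetical helper drawing a geometric count with the given mean:

```python
import random

def forest_fire(n, m, p, r, seed=None):
    """Sketch of the Forest Fire model: each new vertex links to m random
    ambassadors, then 'burns' outward, linking to x successors and y
    predecessors of each burned vertex (x, y geometric with means
    p/(1-p) and r*p/(1-r*p))."""
    rng = random.Random(seed)

    def geo(mean):
        # Geometric count with the given mean: keep going with prob q.
        q = mean / (1.0 + mean)
        k = 0
        while rng.random() < q:
            k += 1
        return k

    succ = {1: set()}   # succ[i]: targets of i's out-edges
    pred = {1: set()}   # pred[i]: sources of i's in-edges
    for i in range(2, n + 1):
        succ[i], pred[i] = set(), set()
        frontier = rng.sample(range(1, i), min(m, i - 1))  # ambassadors
        visited = set(frontier) | {i}
        while frontier:
            a = frontier.pop(0)
            succ[i].add(a)
            pred[a].add(i)
            x = geo(p / (1 - p))
            y = geo(r * p / (1 - r * p))
            cand_s = [b for b in succ[a] if b not in visited]
            cand_p = [b for b in pred[a] if b not in visited]
            nxt = rng.sample(cand_s, min(x, len(cand_s))) + \
                  rng.sample(cand_p, min(y, len(cand_p)))
            visited.update(nxt)
            frontier.extend(nxt)
    return succ

succ = forest_fire(50, 1, 0.37, 0.37, seed=1)
print(sum(len(s) for s in succ.values()))  # at least 49: one edge per newcomer
```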


Figure 3.7: Forest Fire Graph

Constructed where n = 100, m = 1, p = 0.37, r = 0.37. Note: colours represent communities.

3.7, the communities of a graph derived from the Forest Fire model are illustrated by assigning a colour to each.

Chapter 4

Modelling Techniques for Graph Models of Complex Networks

A graph model provides a means of understanding the processes and behaviours that act on a specific network. It allows the exploration of the structural properties of the network and the development of predictors for the network. However, in order to construct a graph model, it is often necessary to have an intimate understanding of the structural properties and the behaviours of the network. Such a requirement means that constructing a graph model can present a significant challenge. Despite the challenges in the formation of graph models for complex networks, approaches have been proposed to construct them. Some of these methods exploit properties commonly found in graph models; others make assumptions about the processes that form edges within the network. Moreover, some automate the entire model construction, making few assumptions about the network at all. In this chapter, a brief overview of works relating to the construction of graph models is provided. Moreover, the strengths and weaknesses of the techniques proposed by the works reviewed are presented. Finally, Section 4.5 discusses previous methodologies in contrast to the proposed methodology of this thesis.

4.1 P∗ Graphs

Exponential random graphs, called p∗ graphs [78], are random graphs consisting of n nodes and m dyads that can be explained by some statistics s(y). The set of dyads

(edges) are defined in a set Y = {Y_{i,j} : i = 1 . . . n, j = 1 . . . n}, where Y_{i,j} = 1 if vertices i and j are connected via an edge, and Y_{i,j} = 0 otherwise. The main idea of modelling using the p∗ technique is that the observed values


of structural measures on some given network capture only one of a large number of possible networks. In order to determine the dyads given n vertices, a likelihood algorithm is employed to determine s(y) and construct the dyad sets. There exist a number of proposed methods for constructing the graphs employing various likelihood algorithms, including log-likelihood [39], pseudo-likelihood [39], and Markov chain [79] methods. Exponential random graphs have been demonstrated to be successful at modelling complex networks [78]. However, p∗ graphs rely heavily on local structures in networks; they are therefore very slow and do not capture the network as a whole [55].

4.2 Kronecker Graphs

The Kronecker graph was proposed by Leskovec et al. [55]. The graphs produced are said to exhibit properties of real-world networks, such as heavily tailed in- and out-degrees and small diameters that shrink over time. The model employs a matrix operation called the Kronecker product. Kronecker graphs are used to model networks by constructing a Kronecker matrix containing parameters for the graph model being constructed. The parameters are refined using a maximum likelihood algorithm which runs in linear time. Such a runtime is considerably fast, considering that prior efforts at constructing maximum likelihood algorithms produced asymptotic times that were quadratic [55]. The modelling of a network using the Kronecker graph model begins with an initial

graph K_1 containing N_1 vertices and E_1 edges. Recursively, larger graphs K_2, K_3, . . . are produced such that K_k has N_1^k vertices and E_1^k edges. This process creates the densification power law observed in real networks [53] and is accomplished using the Kronecker product. There are two types of Kronecker graph models. The first constructs a graph by successively applying the Kronecker product to the adjacency matrix of the graph

K_1. This model is known as the deterministic Kronecker graph model. The other, the stochastic Kronecker graph model, applies the Kronecker product recursively to a probability matrix. The matrix contains cells such that cell (i, j) is the probability of forming an edge between vertices i and j. The stochastic version of the Kronecker graph model has been applied to create supercomputer benchmarks for graph algorithms [72]. The model has also been shown to replicate real-world networks [51]. Furthermore, the model is simple and fast. However, it has been suggested that this model has a limited range of degree distributions

[72]. Moreover, the model relies on the assumptions of power-law distributions and the densification property.
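The Kronecker product at the heart of both variants is simple to state in code. The sketch below (illustrative, using plain lists of lists) grows a deterministic Kronecker graph from a hypothetical 2×2 initiator K1:

```python
def kronecker(A, B):
    """Kronecker product of two adjacency matrices given as lists of lists.
    Repeatedly applying it to an initiator matrix yields the deterministic
    Kronecker graphs described above. Illustrative sketch, not thesis code."""
    n, m = len(A), len(B)
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]

K1 = [[1, 1],
      [1, 0]]           # hypothetical 2-vertex initiator (self-loop on vertex 0)
K2 = kronecker(K1, K1)  # N_1^2 = 4 vertices, E_1^2 = 9 edges
print(sum(sum(row) for row in K2))  # 9
```

This illustrates the densification mentioned above: |E| is squared at each step while |V| only doubles for this initiator.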

4.3 Chung-Lu graphs

The Chung-Lu graph model [2] constructs graphs with a degree distribution given by α and β such that y vertices of degree x satisfy

log y = α − β log x (4.1)

In general, to construct a graph, the model uses the following sequence [3]:

1. Form a set L containing x distinct copies of each vertex v of degree x

2. Choose, randomly, a matching of elements from L

3. For vertices u and v, the number of edges formed between u and v is equal to the number of matched pairs in L that join copies of u and copies of v
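The three steps above can be sketched as follows (illustrative code; `degrees`, mapping each vertex to its target degree x, is a hypothetical input that the model derives from α and β):

```python
import random

def chung_lu_matching(degrees, seed=None):
    """Sketch of the matching construction: make x copies of each vertex v
    of target degree x, shuffle, and pair consecutive copies into edges.
    Self-edges can arise when two copies of the same vertex are matched;
    the thesis describes the construction only abstractly."""
    rng = random.Random(seed)
    L = [v for v, x in degrees.items() for _ in range(x)]
    rng.shuffle(L)  # a uniformly random matching of the copies
    return [(L[i], L[i + 1]) for i in range(0, len(L) - 1, 2)]

E = chung_lu_matching({1: 3, 2: 2, 3: 2, 4: 1}, seed=1)
print(len(E))  # 8 copies in L -> 4 matched pairs
```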

Aiello et al. [4] present four variations of the Chung-Lu graph model. The first model (Model A) is considered the basic model and forms the basis of the other three models. At time t, the out-weight of u is computed as 1 + δ^{out}_{u,t} and the in-weight of v as 1 + δ^{in}_{v,t}. An edge is then formed between u and v with the probability of

\alpha \frac{(1 + \delta^{out}_{u,t})(1 + \delta^{in}_{v,t})}{t^2}    (4.2)

The second model (Model B) adds controls that allow independently controlled in- and out-degrees via the use of new parameters. Model C adds new controls that allow edge creation via four different mechanisms. These are:

1. An edge from u to v, where u and v are randomly selected with probabilities based on the out-degree of u and the in-degree of v

2. A randomly selected vertex u is connected to the newly formed vertex, where u is selected probabilistically based on its out-degree.

3. A newly formed vertex is connected to a randomly selected vertex u such that its selection is probabilistically based on its in-degree.

4. A loop to the new vertex.

The final model (Model D) is a variant of Model C that constructs undirected networks. All models construct graphs with power-law degree distributions and large components, and they utilize parameters to vary the degree distributions. It is said that the range of degree distributions obtainable from these models is greater than that which can be produced by stochastic Kronecker graphs [72]. Furthermore, [72] argues that the optimization of Chung-Lu graphs is simpler than that of a Kronecker graph. Regardless, this model assumes that complex networks follow a power-law distribution.

4.4 Evolutionary Modelling Strategies

In this section, two methods of modelling are presented which utilize evolutionary strategies, namely genetic programming (GP). These strategies differ from the previous strategies and make fewer assumptions about the network being modelled [62]. While both methods employ GP, they do so in very different ways. The first method, proposed by Bailey et al. [11], uses GP to construct a complete program which, when run, constructs a graph. Generated models are evaluated based on their ability to reproduce specific network properties. These measures are the average geodesic path length, transitivity, and degree distribution of the network. A graph is provided as an example to the system and used to compare the output graphs of an evolved program. The methods proposed by Bailey et al. [11] provided the first attempts at automating graph modelling using GP. However, the method utilized priority queues [10] and community detection algorithms [12] to construct some types of networks. Another method, recently proposed in [62], used symbolic regression to construct networks. The work proposed a model such that at each time step a vertex was added to the graph, and an edge was added probabilistically from the possible edges not already constructed. The probability of an edge being formed from u to v was determined by a probability function which was evolved by the GP. Experiments were performed on this method using the Erdos-Renyi graph model and the Barabasi-Albert graph model.

4.5 Discussion

Many of the proposed methodologies for generating graph models assume that specific properties exist in complex networks. For example, Leskovec et al. [55] explain that

“Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time.” Despite numerous examples [6, 13, 23, 67] which indicate that such properties hold, there exists no means to conclude that this is the case in general. A contradiction is found in the Ageing Preferential Attachment model, in which there are situations where the power-law degree distribution does not hold [27]. Despite the specificity of the properties modelled by parametrized models of complex networks, they are important and provide a practical means of evaluating theory in complex networks. Their existence suggests that knowledge about complex networks can be used successfully to construct accurate graph models. On the other hand, GP methods for the automatic creation of graph models offer an attractive means for constructing graph models without the assumptions made by parametrized models. The evolutionary strategies by Bailey [10] do not require that a graph model have a heavily tailed degree distribution. Evidence for this is found in the ability to replicate the Erdos-Renyi model, whose degree distribution is very different from the scale-free property observed in other networks. The symbolic regression method proposed by Menezes and Roth [62] steps even further back from any assumptions about graph models. Graphs are constructed based on probability formulas constructed by the system. However, the previous evolutionary strategies do not make use of what is known about the specific complex network being modelled, particularly with respect to the processes of the network. This means that, regardless of what is known about the processes of a network being modelled, the GP is made to determine all processes in the network.
No matter the method, each models only one of many possible ways of constructing a graph model and does not take into account the semantics of the network at hand [78]. While this thesis does not directly experiment with encoding the semantics of complex networks into solutions, it does leave room for such constructions. Moreover, it addresses the lack of expert knowledge encoded into GP methodologies via the proposal of a novel GP technique. Thus, the framework offered by the proposed methodology of this thesis (see Chapter 7) attempts to build upon the strengths of all modelling techniques. The framework allows the researcher to define the algorithmic structure of the graph model and, thereby, reduces the responsibility of the GP to evolve all processes of the graph model. Furthermore, it maintains the ability to construct a very diverse set of graph models as the expert knowledge provided to the system is varied.

Chapter 5

Genetic Programming

Genetic programming (GP) is a computational intelligence technique inspired by Darwinian evolution and introduced by Koza [43]. In genetic programming, a population of programs is randomly created and evolved using a survival-of-the-fittest approach. That is, programs that more closely meet a desired criterion have better odds of passing their genetic code to the next generation. GP has been used in a number of different applications such as curve fitting [9, 74, 87], the development of circuits [45], evolutionary art [16], modelling [11, 80], and the construction of artificial agents [41], among many others [45, 74]. In its original form, as proposed by Koza [43], GP forms tree-based programs which utilize a functional programming paradigm. However, this is not the only representation for programs in GP. In Sections 5.2, 5.3, and 5.4, a brief review of three forms of GP representation will be given. In Section 5.2, the tree-based form will be discussed. Section 5.3 will provide an alternative to tree-based GP, linear GP. Finally, Section 5.4 provides a background on object-oriented GP, the paradigm adopted by this thesis.

5.1 The Genetic Programming Algorithm

While there are many approaches to GP, they all share a very general algorithm called an evolutionary algorithm. This is not to be mistaken for the related evolutionary computational technique, Genetic Algorithms. Instead, the evolutionary algorithm, in the context of this section, refers to the general algorithm that loosely describes the processes of both genetic programming and Genetic Algorithms. In the evolutionary algorithm [35], a population (or sometimes populations) of programs is stochastically transformed to build new programs. Algorithm 9 provides a loosely constructed

evolutionary algorithm which describes the processes of the Genetic Algorithm.

Algorithm 9: The Evolutionary Algorithm

begin
    randomly create a population of individuals;
    while acceptable solution not found or other stopping criteria not met do
        determine the fitness of each individual in the population;
        probabilistically select individual(s) from the population based on fitness;
        create new individual(s) using a genetic operator and selected individual(s);
    end while
    return best solution
end

Algorithm 9 begins with the random creation of a population of individuals. In GP, this is a population of randomly constructed programs [74]. The construction of programs is dependent upon the representation of the GP. In the later sections of this chapter, some representations available in GP will be discussed. Algorithm 9 also describes that at each iteration (or, in Genetic Algorithm terms, generation) the fitness of each individual must be evaluated. In GP, this is accomplished by executing each program and evaluating its result [74].
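As a concrete illustration, the generational loop of Algorithm 9 can be sketched in Python. The genome (a fixed-length list of floats), the toy fitness function, and all rates and sizes below are illustrative choices, not part of the algorithm itself.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
TARGET = 42.0

def fitness(individual):
    # Higher is better: negated distance of the genome's sum from TARGET.
    return -abs(sum(individual) - TARGET)

def random_individual(length=5):
    return [random.uniform(-10, 10) for _ in range(length)]

def evolve(pop_size=50, generations=100):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) > -1e-3:                 # acceptable solution found
            break
        next_pop = [list(ind) for ind in scored[:2]]   # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1, p2 = random.sample(scored[:10], 2)     # select among the fittest
            cut = random.randrange(1, len(p1))         # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < 0.1:                  # low-probability mutation
                child[random.randrange(len(child))] += random.gauss(0, 1)
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)

best = evolve()
```

Every line of the loop maps directly onto a line of Algorithm 9: fitness evaluation, fitness-based selection, genetic operators, and a stopping criterion.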

5.1.1 Selection

Selection of individuals, the next step in Algorithm 9, is determined by a probability based on fitness. That is to say, the probability of selecting a highly fit individual should be greater than the probability of selecting an individual who is much less fit. Two methods for selecting individuals probabilistically are tournament selection and roulette selection. In tournament selection, for each selection k individuals are randomly chosen from the population and the fittest wins and is selected. Roulette selection selects individuals by determining their proportional fitness in the population and probabilistically selecting an individual based on that proportion.
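The two schemes can be sketched as follows; the population of named individuals and their fitness values are invented for illustration, and roulette selection assumes non-negative fitnesses.

```python
import random

def tournament_select(population, fitness, k=3):
    # Pick k random contenders; the fittest among them wins.
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

def roulette_select(population, fitness):
    # Each individual's chance is its share of the total fitness.
    total = sum(fitness(ind) for ind in population)
    spin = random.uniform(0, total)
    running = 0.0
    for ind in population:
        running += fitness(ind)
        if running >= spin:
            return ind
    return population[-1]   # guard against floating-point round-off

pop = ["a", "b", "c", "d"]
fit = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
winner = tournament_select(pop, fit.get, k=2)
```

Note that tournament selection needs only an ordering on fitness, while roulette selection needs the magnitudes themselves; this is one reason tournament selection is often preferred in practice.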

5.1.2 Genetic Operators

After individuals have been selected, the operations that will be performed on a selected individual are decided by a probability of performing each genetic operator. The

probability of each operator’s use is a parameter of the Genetic Algorithm. There are two major operators which are used: crossover and mutation. Crossover is the recombination operator which, usually, takes two individuals and transforms them into one or more new individuals. The common idea of all crossover operations is that genetic information is taken from multiple individuals and recombined. Mutation is the application of random changes to an individual. This operation, usually, has a low probability of occurring, and when it does occur only small changes are made to the individual. In the case of both operations, their implementation is determined in part by the representation of the GP system at hand. As will be demonstrated in the following sections of this chapter, the genetic representations of each approach reviewed are very different and therefore require different implementations of genetic operators that can work with the representation being employed.
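For a linear (list-based) genome, minimal versions of the two operators might look like the following sketch; the genome contents, mutation rate, and alphabet are illustrative.

```python
import random

def one_point_crossover(parent1, parent2):
    # Recombine: each child takes a prefix of one parent and a suffix of the other.
    cut = random.randrange(1, min(len(parent1), len(parent2)))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

def point_mutation(genome, rate=0.05, alphabet=range(256)):
    # With a low probability per gene, replace it with a random value.
    return [random.choice(list(alphabet)) if random.random() < rate else g
            for g in genome]

c1, c2 = one_point_crossover([1, 2, 3, 4], [5, 6, 7, 8])
```

For equal-length parents, the two children together contain exactly the genes of the two parents; mutation, by contrast, can introduce genetic material present in neither parent.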

5.2 Tree-Based Genetic Programming

Genetic programming was originally popularized as a tree-based system by Koza [43]. In the vanilla GP proposed by Koza [43], functions used by a GP system were singly-typed, i.e., all elements of the program had the same type. In this form, programs are comprised of functions and terminals. Functions are elements which have arguments and terminals are elements which do not. Figure 5.1 provides an example of a program typical of tree-based GP. In the program, a function is constructed that is equivalent to the expression x × 3x + x². In a tree-based genetic program, crossover operations are used to recombine two programs to create two new programs. Crossover is, usually, performed by selecting a subtree from each parent and swapping them to form two new programs, the children. This process is illustrated in Figure 5.2. A variant of vanilla genetic programming, strongly-typed GP [65], introduces the ability to represent more sophisticated function definitions by facilitating the construction of program trees with more than one type. Instead of the set of functions and terminals all having the same type, the set contains functions and terminals which do not all have the same type. Also, the arguments of functions require specific types. Because of this, special consideration must be taken when constructing program trees. Some users of GP utilize the strongly-typed GP system to control the flow of the program structure in GP. Recognizing this representation as cumbersome, some

Figure 5.1: Singly-typed Tree-based Program

Figure 5.2: Tree-based GP Crossover Operation

researchers developed grammar-guided GP systems (GGGP) [18, 32, 92]. This form of tree-based system allowed the construction of programs that were built using context-free grammars (CFGs). Using GGGP, it is also possible to define sub-functions [94]. One of the main roles that GGGP systems serve, however, is to reduce the search space of the problem [59].
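The subtree-swapping crossover described above (Figure 5.2) can be sketched for singly-typed trees as follows. The nested-tuple representation, and the shape assumed here for the Figure 5.1 expression, are illustrative assumptions rather than the thesis's own encoding.

```python
import random

# A tree is either a terminal ("x" or a number) or a tuple ("op", child, ...).
def subtrees(tree, path=()):
    # Enumerate every (path, subtree) pair in the tree.
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    # Return a copy of tree with the subtree at path replaced by new.
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent1, parent2):
    # Swap a randomly chosen subtree of each parent to form two children.
    p1_path, p1_sub = random.choice(list(subtrees(parent1)))
    p2_path, p2_sub = random.choice(list(subtrees(parent2)))
    return replace(parent1, p1_path, p2_sub), replace(parent2, p2_path, p1_sub)

# x * 3x + x^2, a plausible shape for the Figure 5.1 expression
tree_a = ("+", ("*", "x", ("*", 3, "x")), ("*", "x", "x"))
tree_b = ("-", "x", 1)
child1, child2 = crossover(tree_a, tree_b)
```

Because any subtree may be swapped into any position, the operator can be quite disruptive, which is the destructiveness referred to later when motivating linear representations.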

5.3 Linear Genetic Programming

Linear genetic programming [74] is an alternative to tree-based GP. Linear GP, like tree-based GP, produces a population of candidate programs to be evolved. However, programs in linear GP are represented as a linear array of instructions [19], rather than a tree structure. It is important to understand that linear refers to the imperative structure of the program representation and not to a linear list of nodes [20]. In this sense, the building blocks of a linear GP system consist of registers (variables) and instructions. Many techniques have been proposed for linear GP [71], each with advantages and disadvantages. However, an important advantage of linear GP over tree-based GP is the ability to vary the destructiveness of crossover and mutation [19]. One approach to interpreting the chromosome is to use a function map. The chromosome representation in this approach is a linear array of bytes, as in Figure 5.3. The interpretation of the bytes is determined by a table of instructions assigned to the bytes, for example, Table 5.1, similar to the representation of assembly code.

Figure 5.3: Chromosome Comprised of an Array of Bytes

Byte Code    Instruction
0001         1
0110         Add
1001         9
1011         x
1100         Subtract
1111         Multiply

Table 5.1: Example of Byte to Instruction Mapping CHAPTER 5. GENETIC PROGRAMMING 35

Interpreting Figure 5.3 via Table 5.1, the chromosome’s program is + − 9 × x x 1 or, using infix notation, 9 − x² + 1. However, if care is not taken, the combination of the table used and the configuration of the chromosome can result in an infeasible program construction. The solution to this is to use a grammar that guides the construction of the chromosome. Notably, linear GP systems are said to be fast as they consist of simple, low-level instructions and are written in languages such as C and C++ [20]. However, this is not the only reason that linear GP is considered in some applications of GP. It has been shown by Langdon and Banzhaf [50] that dead code is more readily identified in linear GP than in tree-based GP.
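A minimal sketch of this function-map interpretation, using the byte assignments of Table 5.1 and assuming the decoded tokens are evaluated in prefix (Polish) order:

```python
# Byte-to-instruction table mirroring Table 5.1.
TABLE = {
    0b0001: "1", 0b0110: "+", 0b1001: "9",
    0b1011: "x", 0b1100: "-", 0b1111: "*",
}

def decode(chromosome):
    # Map each byte of the chromosome to its instruction token.
    return [TABLE[b] for b in chromosome]

def eval_prefix(tokens, x):
    # Consume one prefix expression from the front of the token list.
    tok = tokens.pop(0)
    if tok == "x":
        return x
    if tok.isdigit():
        return int(tok)
    left = eval_prefix(tokens, x)
    right = eval_prefix(tokens, x)
    return {"+": left + right, "-": left - right, "*": left * right}[tok]

# + - 9 * x x 1  ==  9 - x*x + 1, the program discussed in the text
genes = [0b0110, 0b1100, 0b1001, 0b1111, 0b1011, 0b1011, 0b0001]
program = decode(genes)
value = eval_prefix(list(program), x=2)   # 9 - 2*2 + 1 = 6
```

The pitfall noted in the text is visible here: a chromosome whose decoded tokens do not form a complete prefix expression would make `eval_prefix` run off the end of the token list, which is exactly what a guiding grammar prevents.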

5.4 Object-Oriented Genetic Programming

In the previous sections, discussion surrounded GP systems which produce programs that have a single behaviour, i.e., a single function. However, in practice, programming to solve a problem involves multiple functions and data structures. It is possible to evolve complex programs like this using other forms of GP [44, 64, 94]. However, it is also possible to construct representations of GP as objects. Object-oriented genetic programming (OOGP) provides a framework with which multiple functions and/or data structures are evolved. Object-oriented programming is a widely used programming paradigm where types are represented as entities with behaviours. In the object-oriented paradigm, instances of types, known as objects, are mutable by way of methods that are analogous to behaviours. The OOGP methodology [1] provides a framework to produce programs in such a paradigm. Past approaches with OOGP made use of a multi-tree representation, whereby a chromosome consisting of several trees, each corresponding to a single method, is evolved. The multi-tree representation used by OOGP allows the simultaneous optimization of multiple methods, leading to the evolution of more complex programs. An example of the application of OOGP can be drawn from [49]. In the work by Langdon [49], OOGP was employed to evolve data structures such as a stack. A stack has multiple functions such as pop and push. Simultaneously evolving both of these behaviours is not simple in conventional approaches to GP, as traditional tree-based and linear GP systems are designed to evolve a single function. Therefore, Langdon [49] proposed a representation that consisted of multiple trees, one for each function of the stack, for instance. Other approaches have also been proposed [1, 21, 31]. However, none of these proposed methods seem to have been widely adopted, as searches on the topic do not reveal much further use beyond the initial proposals of these systems. This is unfortunate, as these systems facilitate the incorporation of additional knowledge not easily conveyed in other approaches. OOGP allows a specification of the behaviours of an object to be easily constructed, which in turn can be used to produce data structures that have multiple behaviours. In the research of this thesis, such a structure has been adopted. A graph model is a data structure which has a set of processes that construct a graph. Therefore, a specification of the model can be constructed, and the implementation evolved from the specification. However, the OOGP approaches currently available employ tree-based representations. As explained in Section 5.2, tree-based approaches are often cumbersome as representing the problem can be difficult. Furthermore, the crossover process of tree-based GP is limiting and destructive [83]. Thus, we arrive at the motivation for this thesis: the proposal of a new OOGP system (see Chapter 6) that employs a linear-based OOGP and readily incorporates expert knowledge.

Chapter 6

LinkableGP – An Object-Oriented Linear Genetic Programming System

LinkableGP is an object-oriented genetic programming (OOGP) system that is designed to incorporate expert knowledge. It does so by utilizing an abstract class definition. The abstract class also provides a means of defining the structure of the programs evolved and allows some functions of the evolved program to have implementations a priori. In the sections to come, the system’s construction will be discussed in greater detail, describing how expert knowledge is incorporated. Furthermore, the structure and representation of individuals, as well as the genetic operators used during evolution, will be described.

6.1 Facilitation of Expert Knowledge

To facilitate expert knowledge in GP, a novel object-oriented linear GP system, LinkableGP, is proposed. LinkableGP makes use of 1) an object-oriented methodology to construct class-based programs and 2) linear GP to construct imperatively-defined methods within the class. As a result, LinkableGP allows the use of partially-implemented classes, i.e., abstract classes, whereby the user may incorporate their domain knowledge of the problem. The knowledge is embedded into the abstract class by providing implementations of methods where the desired functionality is known a priori. The remainder of the functionality is evolved using LinkableGP’s evolutionary


strategies. Finally, LinkableGP produces a .NET-based1 source file corresponding to the evolved implementation. The resulting source file is structured as expected for object-oriented, imperative style .NET source code.

6.2 Motivation

The primary motivation for allowing expert knowledge to be encoded within an evolved program stems from the difficulties experienced when controlling the flow of programs using traditional GP approaches. When a program that has several logical stages is evolved using traditional GP approaches, it is often necessary to construct a collection of rules for the set of functions and terminals [58]. Handling of the rules is typically managed either by employing a grammar to define valid program states or via the manipulation of types. However, decomposing a program into logical methods, each with a set of functions and terminals specific to the associated task, results in a more natural program structure. Furthermore, interpretation of the algorithms produced by GP is often difficult due to the unintuitive program structure evolved. That is, code produced by GP is typically dissimilar to human-written code. Therefore, it is desirable to generate programs in a paradigm that is both familiar and intuitive. Having such programs facilitates expert evaluation of the outcome and allows for refinement of the strategies employed.

6.3 Structure and Representation

Unlike a traditional GP tree structure, which represents a single function, individuals in the LinkableGP population represent types or classes. The classes are implementations of a parent type. This parent type is an abstract class which allows the user to encode expert knowledge elegantly and in a manner familiar to most programmers. The structure of individuals within a population in LinkableGP is inspired by both linear GP and OOGP representations. Furthermore, individuals are constructed using a collection of chromosomes, analogous to a DNA structure. The DNA structure of an individual defines the physical implementations of abstract methods from the parent type. That is, each chromosome corresponds to the code sequence of a single abstract method defined in the parent type. Each chromosome’s genotype is an array of integers, which directly corresponds to a code sequence that constructs a single

1Refers to Microsoft’s .NET framework

method in the resulting phenotype. A visualization of the chromosome structure and genotype to phenotype process is presented in Figure 6.1.

Figure 6.1: Visualization of the genotype to phenotype mapping for an individual in the LinkableGP system.

6.3.1 The Abstract Class

The abstract class in LinkableGP serves as an embryo for each individual. The class consists of any number of implemented or abstract methods, as is expected of an abstract class. Those methods that are marked abstract are to be implemented by the individuals and evolved by LinkableGP. That is, the genotype of each individual is defined by the number of abstract methods. The phenotype of the individual is the implementation of the abstract class as defined by the genotype.
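In Python terms, the embryo idea can be sketched with an abstract base class. The graph-model method names and the toy "evolved" bodies below are illustrative assumptions, not LinkableGP's actual interface.

```python
from abc import ABC, abstractmethod

class GraphModelEmbryo(ABC):
    def generate(self, steps):
        # Expert-supplied control flow: the overall loop is fixed a priori.
        graph = {"vertices": [], "edges": []}
        for t in range(steps):
            self.add_vertex(graph, t)
            self.add_edges(graph, t)
        return graph

    @abstractmethod
    def add_vertex(self, graph, t):
        """Left abstract: implementation to be evolved."""

    @abstractmethod
    def add_edges(self, graph, t):
        """Left abstract: implementation to be evolved."""

# An individual is a concrete subclass supplying the evolved method bodies.
class ExampleIndividual(GraphModelEmbryo):
    def add_vertex(self, graph, t):
        graph["vertices"].append(t)

    def add_edges(self, graph, t):
        if t > 0:
            graph["edges"].append((t - 1, t))   # toy rule: build a simple path

g = ExampleIndividual().generate(4)
```

The split mirrors the description above: the implemented `generate` method is the expert knowledge, while the two abstract methods define the genotype's shape, one chromosome per abstract method.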

6.3.2 The Genotype

The genotype of an individual is a collection of integer arrays, called chromosomes, one for each method marked abstract by the defining abstract class. The chromosomes are variable-length arrays, allowing programs of different lengths. Each value in a chromosome represents an element in the construction of the abstract method it is mapped to. The process that governs the conversion from chromosome to method implementation is explained in Subsection 6.3.4.

6.3.3 The Phenotype and the Language

The phenotype of an individual is the implementation of the abstract class. It is the resulting source of the program that has been generated by the individual through the mapping of the genotype to the phenotype. In order to build a method, a language (i.e., a set of functions, constants, and constant generators) must be defined. Each method utilizes its own language set, as the functions, constants, and other constructs for building one method are not necessarily useful to another. The language, however, is not static. Depending on the context built while mapping the genotype, new values may result. For example, if a function used in the method being implemented has a result, either a mutable variable is selected from the language or a new one might be created. If a new variable is created, it is added to the instance of the language that is being used by this implementation. This change in context, however, does not prevail or transfer between chromosomes.

6.3.4 Mapping Genotype to Phenotype

The mapping of a genotype to a phenotype transforms each chromosome in the genotype to a method in the phenotype. The process of mapping a chromosome to a method constructs single instructions iteratively. During the mapping of the chromosome, an index is maintained to the current position in the chromosome array. The index is advanced each time a chromosome value is used during construction. The first step in creating an instruction is to select a function from the language. The selection of a function from the language is restricted to those functions whose arguments can be immediately satisfied by the available variables in the variable bag. To determine which function is used from the subset, the next chromosome value mod the cardinality of the subset is computed. The computed value provides the index of the selected function from the subset. Given the function for the instruction, the arguments of the function must now be satisfied using variables from the variable bag which have the same type as the argument. The selection of the variable is determined by the next chromosome value mod the cardinality of the subset of variables. The selected variable is the variable at the index of the computed value in the subset. The assignment of variables to arguments continues until all arguments are satisfied. If the function in the current instruction has a result type that is not void, a variable must be selected to store the result. As with the arguments of a function, a subset is formed from the variable bag such that it contains only those variables which have the same type as the result type. However, there is an added constraint that variables in the subset must also be mutable. The only variables which are mutable in LinkableGP are those that are constructed locally during mapping. Before a variable is selected from the subset, a new variable is added to the subset with the same type as the result type of the function.
Then, as before, the next chromosome

Figure 6.2: Example Chromosome

value mod the cardinality of the subset is computed. The selected variable for the result of the function is the variable at the index equal to the computed value from the subset of variables. If the variable selected is the newly created variable, it is added to the variable bag. Since this variable was constructed locally, it is marked as mutable and, therefore, usable for future assignments in the construction of the method. The process of constructing instructions continues until the end of the chromosome is reached. If the end of a chromosome is reached during the construction of an instruction, the chromosome array is extended with random values from a uniform distribution to allow the completion of the instruction. This extension remains a part of the chromosome. If the method being constructed has a return value, one chromosome value is reserved at the end of the chromosome array to allow the selection of the return variable from the variable bag. This value is selected, as before, by constructing a subset of variables that have the same type as the return type of the method. The variable is selected by computing the final chromosome value mod the cardinality of the subset and selecting the variable at the index equal to the computed value. If the subset is empty, an instruction is constructed using the same method as described for constructing a typical instruction, with the added requirement that the selected function has the same return type as the method. In this case, the chromosome is extended sufficiently to allow the completion of the return instruction.

Example

In order to clearly understand the mapping process, the following example is provided. Assume an abstract class that contains only one method that is marked abstract and has the signature void MagicNumber(x:Integer, y:String). The set of functions for the language in the example is provided in Table 6.1. The variable bag will initially contain only the input variables x and y, which are marked as immutable. The method will be mapped given the chromosome found in Figure 6.2.

Function Set

Function Name    Arguments               Return Type
Add              a:Integer, b:Integer    Integer
Sub              a:Integer, b:Integer    Integer
ToString         a:Integer               String
Print            a:String                void
ToInteger        a:String                Integer

Variable Bag

Variable Name    Mutable    Type
x                False      Integer
y                False      String

Table 6.1: Initial Example Language

The first instruction of the phenotype will use the chromosome values at indices [0, 2], which are {29, 46, 124}.2 First, the function is selected from the function set. Given the variable bag, all arguments of all functions are satisfiable, so the subset of functions is the complete set of functions in the language. The cardinality of the set is 5 and so the index of the function selected is 29 mod 5 = 4, which is the ToInteger function. The type of the only argument for the ToInteger function is a String, so the subset of variables from the variable bag is {y}. The variable y is selected as the actual parameter of the function because 46 mod 1 = 0. This function has a result variable, so we consider the set {a} as the possible variables for assignment of the result of the ToInteger instruction. The variable a is a new variable which is not added to the variable bag unless it is selected.3 Since the subset of variables from the variable bag that are mutable and of type Integer is an empty set, no other variables are added to the set. The variable which will store the result is a because 124 mod 1 = 0 and there is only the one variable. After one instruction, the language is updated to match Table 6.2. The next instruction in mapping the chromosome utilizes the chromosome values at indices [3, 6], which are the values {90, 257, 19, 13}. The subset of functions usable for this instruction, as was the case with the first instruction, is the complete function set of the language. The index of the function for this instruction is computed as 90

2Note that at this point in the process it is not known how many chromosome values will be required for an instruction. However, for the purpose of determining where values are obtained in the chromosome, the values used for each instruction are provided before explaining the result. 3Variable names are determined as the next available letter in the alphabet which does not mask any other variable already used in the scope of the method. If z is reached, the convention is to continue with (aa, ab, ac, . . . , az, ba, . . . ).

Function Set

Function Name    Arguments               Return Type
Add              a:Integer, b:Integer    Integer
Sub              a:Integer, b:Integer    Integer
ToString         a:Integer               String
Print            a:String                void
ToInteger        a:String                Integer

Variable Bag

Variable Name    Mutable    Type
x                False      Integer
y                False      String
a                True       Integer

Table 6.2: Example Language after one instruction

mod 5 = 0; thus, the function for this instruction is Add. The Add instruction has two arguments, both of which are Integers. Therefore, the subset of the variable bag is {x, a} for both arguments. The variable selected for the first argument is a (257 mod 2 = 1) as is the second argument (19 mod 2 = 1). The set of variables considered for the result of the function, Add, is {a, b} such that b is a potential new variable. The computed value, 13 mod 2 = 1, determines that b is selected to store the result. As a result of this instruction, the language is updated to reflect the new variable b as seen in Table 6.3.

Function Set

Function Name    Arguments               Return Type
Add              a:Integer, b:Integer    Integer
Sub              a:Integer, b:Integer    Integer
ToString         a:Integer               String
Print            a:String                void
ToInteger        a:String                Integer

Variable Bag

Variable Name    Mutable    Type
x                False      Integer
y                False      String
a                True       Integer
b                True       Integer

Table 6.3: Example Language after two instructions

The next instruction uses the chromosome values from indices [7, 10] which are CHAPTER 6. LINKABLEGP 44

{51, 7, 780, 362} (see Figure 6.2). This instruction uses the function Sub, as the index of the function from the subset of functions is 51 mod 5 = 1. The arguments selected from {x, a, b} for the function’s invocation are a (7 mod 3 = 1) and x (780 mod 3 = 0). The result of the function is assigned to the variable a (362 mod 2 = 0) from the set {a, b}. After this instruction, the language as provided in Table 6.3 is unchanged. The fourth instruction in the mapping of the example’s chromosome uses the chromosome values at indices [11, 13], which are {12, 8, 177}. This results in the use of the function ToString (12 mod 5 = 2) using the argument b (8 mod 3 = 2) from the subset of the variable bag {x, a, b}. The result of the ToString is assigned to the variable c (177 mod 1 = 0) from the set {c}, which contains only a new variable as the variable bag does not contain a mutable String. The language is updated to include the new variable c and is provided in Table 6.4.

Function Set

Function Name    Arguments               Return Type
Add              a:Integer, b:Integer    Integer
Sub              a:Integer, b:Integer    Integer
ToString         a:Integer               String
Print            a:String                void
ToInteger        a:String                Integer

Variable Bag

Variable Name    Mutable    Type
x                False      Integer
y                False      String
a                True       Integer
b                True       Integer
c                True       String

Table 6.4: Example Language after four instructions

The final instruction in this example uses the last two values of the chromosome from Figure 6.2, which are {103, 9}. The selected function for this instruction is determined by the value 103 mod 5 = 3 and is Print. This function, unlike the functions previously used, has a void return type. Thus, only the single argument is required to be assigned. For the example, this is the String c, selected using the value 9 mod 2 = 1 from the set {y, c}. At this point in the example, all chromosome values have been exhausted. Thus, the mapping of the chromosome is complete. The resulting implementation of the chromosome for the method, void MagicNumber(x:Integer, y:String), is found in

Algorithm 10.

Algorithm 10: Resulting Algorithm of Example Genotype-Phenotype Mapping

Function MagicNumber(int x, String y)
    a = ToInteger(y);
    b = Add(a, a);
    a = Sub(a, x);
    c = ToString(b);
    Print(c);
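The decoding arithmetic of Subsection 6.3.4 can be replayed programmatically. The sketch below follows the stated rules for function and argument selection, using the example language of Table 6.1; return-value selection for non-void methods and automatic chromosome extension are omitted, variable naming is simplified to start at "a", and it is run on a short, made-up chromosome rather than the one from Figure 6.2.

```python
# (name, argument types, return type), ordered as in Table 6.1
FUNCTIONS = [
    ("Add",       ["Integer", "Integer"], "Integer"),
    ("Sub",       ["Integer", "Integer"], "Integer"),
    ("ToString",  ["Integer"],            "String"),
    ("Print",     ["String"],             "void"),
    ("ToInteger", ["String"],             "Integer"),
]

def map_chromosome(chromosome, inputs):
    # inputs: list of (name, type); input variables are immutable.
    bag = [(name, typ, False) for name, typ in inputs]   # (name, type, mutable)
    next_name = iter("abcdefghijklmnopqrstuvw")
    code, i = [], 0
    while i < len(chromosome):
        # 1. Restrict to functions whose arguments are all satisfiable.
        usable = [f for f in FUNCTIONS
                  if all(any(v[1] == t for v in bag) for t in f[1])]
        name, arg_types, ret = usable[chromosome[i] % len(usable)]
        i += 1
        # 2. Satisfy each argument from same-typed variables in the bag.
        args = []
        for t in arg_types:
            subset = [v for v in bag if v[1] == t]
            args.append(subset[chromosome[i] % len(subset)][0])
            i += 1
        # 3. Non-void results go to a mutable variable, possibly a new one.
        if ret != "void":
            if i >= len(chromosome):
                break   # the full system would extend the chromosome here
            subset = [v for v in bag if v[1] == ret and v[2]]
            new_var = (next(next_name), ret, True)
            subset.append(new_var)            # the potential new variable
            target = subset[chromosome[i] % len(subset)]
            i += 1
            if target is new_var:
                bag.append(new_var)
            code.append(f"{target[0]} = {name}({', '.join(args)})")
        else:
            code.append(f"{name}({', '.join(args)})")
    return code

# Illustrative chromosome: 4 -> ToInteger, 0 -> y, 0 -> new var a,
# then 3 -> Print, 0 -> y.
body = map_chromosome([4, 0, 0, 3, 0], [("x", "Integer"), ("y", "String")])
```

Tracing by hand: 4 mod 5 = 4 selects ToInteger, 0 mod 1 = 0 selects y, 0 mod 1 = 0 selects the new Integer variable a; then 3 mod 5 = 3 selects Print and 0 mod 1 = 0 selects y, matching the arithmetic style of the worked example above.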

6.4 Operations

LinkableGP employs a typical, generational genetic algorithm that performs crossover, mutation, elitism, and selection operations. However, the crossover and mutation operations are modified to work with the DNA structure. A proportion of the best-performing individuals from the previous generation is directly copied to the new population via elitism, while the remainder of the new population results from offspring of the crossover operations.

6.4.1 Crossover

Crossover, visualized in Figure 6.3, is a two-phase operation. Crossover, as performed in LinkableGP, results in the placement of a single new chromosome in the new population constructed in a generation. The first phase, referred to as mating, selects two parents using a standard tournament selection operator and constructs a bit-mask to determine the chromosomes that are to be inherited from each parent. Each bit in the randomly generated bit-mask acts as a parent discriminator: the chromosome value at index i is selected from parent 1 or parent 2 based upon whether the bit at index i is a 0 or 1. The mating phase is depicted in Figure 6.3a. For some chromosomes of an individual, this might be the only phase of the crossover performed: a random value, R1, is selected uniformly, and if the value is greater than ρ the second phase does not occur. During the second phase, called mixing, a chromosome in the child is replaced by the offspring resulting from a one-point crossover operation between the parents. A random cut-point is selected within the shortest parent chromosome. The mixing phase occurs in one of two ways, with equal probability: 1) the genetic material up to the cut-point is taken from parent 1 while the genetic material after the cut-point

is taken from parent 2, or 2) the genetic material up to the cut-point is taken from parent 2 while the genetic material after the cut-point is taken from parent 1. The selection of the mixing method is determined by a second random value, R2: if R2 < 0.5 the first method is used; otherwise the second method is used. The mixing phase is depicted in Figure 6.3b.

(a) Phase 1: Mating. A mask value of 0 denotes the corresponding chromosome is inherited from parent 1 while a mask value of 1 denotes the chromosome is inherited from parent 2.

(b) Phase 2: Mixing, with ρ = 0.5. R1 < ρ denotes the mixing phase occurs. R2 < 0.5 denotes the first portion of the genetic material is taken from parent 1, while R2 ≥ 0.5 denotes the first portion of genetic material is taken from parent 2.

Figure 6.3: Visualization of the two-phase crossover operation in the LinkableGP system.
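The two phases above can be sketched as follows. This is a minimal illustration, not the system's implementation: an individual is assumed to be a list of integer chromosomes, and the function name and use of Python's random module are assumptions.

```python
import random

def two_phase_crossover(p1, p2, rho=0.5, rng=random):
    """Sketch of the two-phase crossover. p1 and p2 are individuals,
    each a list of chromosomes (a chromosome is a list of integers)."""
    # Phase 1 (mating): bit i of a random mask picks which parent
    # donates chromosome i (0 -> parent 1, 1 -> parent 2).
    mask = [rng.randint(0, 1) for _ in p1]
    child = [p2[i][:] if bit else p1[i][:] for i, bit in enumerate(mask)]

    # Phase 2 (mixing): occurs only when R1 < rho. One chromosome of the
    # child is replaced by a one-point crossover of the parents'
    # chromosomes, the cut-point lying inside the shorter one.
    if rng.random() < rho:
        i = rng.randrange(len(child))
        shortest = min(len(p1[i]), len(p2[i]))
        if shortest > 1:
            cut = rng.randrange(1, shortest)
            if rng.random() < 0.5:       # R2 chooses the ordering
                child[i] = p1[i][:cut] + p2[i][cut:]
            else:
                child[i] = p2[i][:cut] + p1[i][cut:]
    return child
```

Because mating copies whole chromosomes and mixing recombines only one of them, a child always has the same number of chromosomes as its parents.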

6.4.2 Mutation

Mutation in LinkableGP is done either by extending/contracting the length of a chromosome or by randomly changing values within a chromosome. There is a chance of mutation for every chromosome of every individual added to the population in a generation, and any chromosome may be affected by any combination of the two mutation approaches. In the extension/contraction mutation, a chromosome either has an integer appended to it or removed from it; the operation performed is selected uniformly. The chance of any chromosome of an individual undergoing this operation is determined by the system parameter m1.
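The two mutation operators (extension/contraction with rate m1, and point mutation with rate m2) can be sketched together. The helper name, the gene value range, and the default rates are assumptions for illustration.

```python
import random

def mutate(chromosome, m1=0.1, m2=0.1, max_value=1000, rng=random):
    """Sketch of both mutation operators; either, both, or neither
    may fire on a given chromosome."""
    ch = chromosome[:]
    # Extension/contraction (rate m1): append or remove one integer,
    # the choice between the two made uniformly.
    if rng.random() < m1:
        if len(ch) == 0 or rng.random() < 0.5:
            ch.append(rng.randrange(max_value))
        else:
            ch.pop(rng.randrange(len(ch)))
    # Point mutation (rate m2): replace a uniformly chosen element
    # with a uniformly chosen new value.
    if ch and rng.random() < m2:
        ch[rng.randrange(len(ch))] = rng.randrange(max_value)
    return ch
```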

Alternatively, with probability m2, a chromosome of an individual may have a value within its array randomly changed. The element that is mutated is determined uniformly, as is its replacement value.

Chapter 7

Proposed Methodology

This thesis proposes a methodology for the automatic inference of graph models for directed complex networks using Object-Oriented Genetic Programming. This approach both facilitates and benefits from the incorporation of expert knowledge. Thus, the methods proposed in this chapter utilize the knowledge gained from the examination of existing graph models representing complex networks. In Chapter 3, several models were examined which provide inspiration for the methods described in this chapter. The examination of these models offers insights that form the basis of an abstract graph model (see Section 7.1) that will be used by LinkableGP to implement the targeted models. Also drawing on the examination of the models, a collection of function sets (the language) will be proposed for LinkableGP, which will provide the building blocks for the implementation of the abstract methods. Details of the make-up of the language are provided in Section 7.2. When generating a graph model, there is a target complex network. However, if the complex network were fully known, then modelling it would be trivial. Therefore, sample data must be relied on. A representative graph from the network being modelled, the target graph, provides such sample data. In order for LinkableGP to evaluate how well a program models the complex network at hand, graphs must be produced by the program. A graph produced by the program can then be compared, statistically, to the target graph. The comparison provides LinkableGP with an indication of the fitness of the program. In Section 7.3, the methods for evaluating the graphs produced by programs are explained in detail. Armed with an abstract class, language, and fitness function, the inference of a graph model begins with the construction of a population of programs. Then each program is evaluated against the target graph and ranked relative to the population.


The best-performing programs are immediately moved to a new population. To complete the construction of the new population, genetic operators are applied to the existing population to form new programs. The completed new population then replaces the old population, and the process begins again with evaluation and ranking. The algorithm continues this process of constructing populations, evaluating, and ranking for some prescribed number of iterations. The final result is a graph model which should model the targeted network.

7.1 Abstract Graph Model

In order for LinkableGP to utilize knowledge about the behaviour of complex networks, an abstract class definition is required as a partial representation of that knowledge. In this section, one representation will be proposed. It is stressed that this is only one of many possible representations, and the focus of this representation is its relevance to the models evolved in Chapter 8. Algorithm 11 presents the abstract class that will be used in this thesis. The first function, GenerateModel, serves as the calling function to generate a graph produced by an implementation of the class. In this function, construction of a graph begins with the creation of an initial graph; for the purposes of this thesis, this is always an empty graph. The body of the function in Algorithm 11 iterates over t time steps, each time adding a new vertex, v, to the graph. The iterations are done this way because all the complex networks experimented with in Chapter 8 follow a growing behaviour (assumed to be one vertex per iteration). After vertex addition, a collection of vertices, W, is selected to form new edges with the new vertex. However, edge creation has the possibility of being probabilistic, and thus there exists a possibility that no edge will be formed. The process of selecting vertices for edge creation is observed in all the graph models found in Chapter 3; however, it is a step that is performed differently depending on the network. Therefore, this function, named SelectVertices, is marked abstract and is to be evolved by LinkableGP. After the initial creation of W in the function GenerateModel in Algorithm 11, an iteration over the vertices of W is performed. In this iteration, the current vertex w from W and v are passed to a function AddEdge, which returns an edge. However, this edge might be an empty edge, which gives the effect of no edge being added. This function is marked abstract and is evolved by LinkableGP because the criteria by which edges are added, and the direction of the edge, are unknown and dependent on the network.

Algorithm 11: Abstract Graph Model for Complex Networks
  Function GenerateModel(int t) return Graph
    G ← InitialiseGraph();
    for i ← 1 to t do
      v ← Vertex();
      W ← SelectVertices(G);
      while |W| > 0 do
        w ← Next(W);
        E ← E ∪ AddEdge(v, w);
        SecondaryActions(W, w);
      end while
      V ← V ∪ v;
    end for
    return G
  abstract Function SelectVertices(Graph g) return Vertices
  abstract Function AddEdge(Vertex v1, Vertex v2) return Edge
  abstract Procedure SecondaryActions(Vertices W, Vertex w)

In the Erdos-Renyi model, the creation of edges is probabilistic, while in the Growing Random model edge creation given two vertices is definite. Therefore, LinkableGP should be responsible for determining how edge creation should be performed. In the same iteration where the AddEdge function is called, a procedure, SecondaryActions, is also invoked with the arguments W, the collection of vertices being iterated, and the vertex w. The purpose of this procedure is to perform any other actions that happen as a consequence of the edge creation between w and v. In the Forest Fire graph model, this is where some neighbours of w not already visited by v (i.e., not already added to W) would be appended to W. As this procedure is not apparent in all graph models found in Chapter 3 and is dependent on the network being modelled, it is marked abstract.
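Algorithm 11 can be sketched as an abstract base class. The Python names, integer vertex identifiers, and the hand-written GrowingRandom subclass below are illustrative assumptions; in the actual system, the three abstract methods are filled in by LinkableGP rather than by hand.

```python
import random
from abc import ABC, abstractmethod

class AbstractGraphModel(ABC):
    """Sketch of Algorithm 11. Vertices are integer ids; edges are
    (source, target) tuples."""

    def generate_model(self, t):
        vertices, edges = [], set()
        for i in range(t):
            v = i                         # one new vertex per time step
            W = self.select_vertices(vertices)
            while W:
                w = W.pop(0)              # Next(W)
                e = self.add_edge(v, w)
                if e is not None:         # an "empty edge" adds nothing
                    edges.add(e)
                self.secondary_actions(W, w)
            vertices.append(v)
        return vertices, edges

    @abstractmethod
    def select_vertices(self, vertices): ...

    @abstractmethod
    def add_edge(self, v, w): ...

    @abstractmethod
    def secondary_actions(self, W, w): ...

class GrowingRandom(AbstractGraphModel):
    """Hand-written stand-in for what LinkableGP would evolve:
    pick one uniform-random existing vertex and always connect."""
    def select_vertices(self, vertices):
        return [random.choice(vertices)] if vertices else []
    def add_edge(self, v, w):
        return (v, w)
    def secondary_actions(self, W, w):
        pass
```

Running GrowingRandom().generate_model(t) yields t vertices and t − 1 edges, each new vertex pointing to an older one, mirroring the growing behaviour described above.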

7.2 GP Language

The functions SelectVertices, AddEdge, and SecondaryActions that were marked as abstract in the class defined in Section 7.1 each require a function set for use in LinkableGP. In order to describe all the functions and terminals used by each abstract function, like functions and terminals are grouped, for convenience, into the following collections.

• Arithmetic functions

– Func Add(f:Func, g:Func)
– Func Sub(f:Func, g:Func)
– Func Mult(f:Func, g:Func)
– Func Constant(c:double)
– Func Pow(f:Func, g:Func)

• Vertex property evaluators

– Func InDegree()
– Func OutDegree()
– Func Degree()
– Func Age() (only in Ageing Preferential Attachment)

• Collection providers

– TabooVertexCollection GetRandomQueue(g:Graph, n:int)
– TabooVertexCollection GetRouletteQueue(g:Graph, n:int, F:Func<Vertex,double>)
– TabooVertexCollection GetRandomStack(g:Graph, n:int)
– TabooVertexCollection GetRouletteStack(g:Graph, n:int, F:Func<Vertex,double>)

• Edge creation functions

– Edge CreateEdge(v:Vertex, w:Vertex)
– Edge CreateEdgeWithProbability(v:Vertex, w:Vertex, prob:double)

• Vertex providers

– void AddPredecessor(V:TabooVertexCollection, n:int, a:Vertex)
– void AddSuccessor(V:TabooVertexCollection, n:int, a:Vertex)
– void AddNeighbour(V:TabooVertexCollection, n:int, a:Vertex)

• Random value providers

– int GetRandomValue(min:int, max:int)

– int GetGeometricValue(prob:double)
– int GetBernoulliValue(prob:double)

• Constant Generators

– Floating-Point Number Generator
– Integer Number Generator

The collection of arithmetic functions consists of operations for constructing expressions that evaluate a vertex. These operations are Add, Sub, Mult, Constant, and Pow. Add, Sub, and Mult each take two expressions and add, subtract, or multiply their results. Pow evaluates two expressions, using the first as the base, b, and the second as the exponent, e, then evaluates b^e. Constant takes in a value and returns the same value; it represents a terminal in the expression tree, allowing a value to be converted to an expression. The conversion is important so that the other arithmetic functions can use constants. The collection of vertex property evaluators serves as a collection of terminals used in co-operation with the arithmetic functions previously described. The vertex properties consist of local properties of vertices: the in- and out-degrees of the given vertex, the combined in/out degree of the vertex, and its local transitivity. When combined with the arithmetic functions, these form an expression tree by which a vertex can be evaluated to a value that can be used for roulette selection, i.e., selecting vertices probabilistically to pair with a new vertex. Roulette selection is one of the two base selection methods in the collection providers, the other being uniform random selection. The roulette selection uses a given expression to determine a value for each vertex in the graph. It then selects one or more of the vertices probabilistically, in proportion to the value resulting from the expression evaluation of each vertex. Whether roulette or random, each selection method is implemented using both a stack and a queue. It is also important to note that these collections implement a taboo: a vertex may be added at most once to an instance of a collection, even if it is later removed from the collection.
In the collection-provider functions, g is the graph and n is the number of vertices to add initially to the collection. Edge creation functions are simply those functions responsible for creating new edges. The collection consists of probabilistic edge creation, definite edge creation, and empty edge creation (handled by the system as no new edge, i.e., E ← E ∪ ∅). The vertex providers take a vertex collection and a vertex, v, and add n vertices related to v by some edge association. These associations consist of predecessor,

Abstract Method    Accessible Collections
SelectVertices     random value providers, vertex property evaluators, vertex collection providers, arithmetic functions, integer constants, and real constants
AddEdge            edge creation functions, and real constants
SecondaryActions   vertex providers, random value providers, integer constants, and real constants

Table 7.1: Accessible Collections of Functions and Terminals by Abstract Method

successor, or neighbour. Since the vertex collections are taboo collections, the n vertices selected will be distinct within the collection, thereby avoiding multi-edges. Finally, the random value providers are responsible for producing values. These providers, in contrast to generated constants, provide random values at runtime of the algorithm. Three providers are implemented, serving Integer values from one of the following: a uniform distribution, a geometric distribution, or Bernoulli trials. Further to these collections, generated constants in the form of Integer and Real (double-precision floating-point) numbers are provided to some abstract methods. Each abstract method is given access to only some of the collections that have been described; Table 7.1 outlines the collections accessible to each method.
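As an illustration of the roulette-style collection providers and their taboo property, the sketch below draws up to n distinct vertices with probability proportional to an expression's value. The function name and signature are assumptions (the real provider takes a graph rather than a vertex list), not the system's API.

```python
import random

def get_roulette_queue(vertices, n, expr, rng=random):
    """Draw up to n distinct vertices, each selected with probability
    proportional to expr(vertex). The taboo property: a vertex enters
    the resulting collection at most once."""
    pool, chosen = list(vertices), []
    while pool and len(chosen) < n:
        weights = [expr(v) for v in pool]
        total = sum(weights)
        if total <= 0:
            break                           # nothing left worth selecting
        r = rng.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                chosen.append(pool.pop(i))  # taboo: removed from the pool
                break
    return chosen                           # consumed front-to-back (queue)
```

A stack variant would simply consume the chosen list back-to-front; the uniform-random variants replace the weighted draw with a uniform one.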

7.3 Fitness Evaluation

It is not possible to directly compare a program to an unknown complex network, the target, no matter what is known of the network. Therefore, it is necessary to compare instance graphs of the program being evaluated to the target graph. An instance graph refers to a graph generated by an evolved program. The means of comparison, the fitness evaluation, is based on previous work [10] in the inference of graph models and modified for application to directed complex networks. The fitness of the program is the mean difference in average geodesic path length (Equation 7.1), network average clustering coefficient (Equation 7.2), and degree distributions (both in- and out-degree) (Equations 7.3 and 7.4) of m instance graphs

(X_i) against the target graph (T) (see Chapter 2 for details on computing each value). In Equations 7.3 and 7.4, the function F refers to the empirical cumulative distribution function of the in-degree and G to the empirical cumulative distribution function of the out-degree.

Solution Fitnesses   f1 rank   f2 rank   f3 rank   Sum of Ranks
(3, 2, 2)            3         2         2         7
(1, 3, 4)            2         3         3         8
(0, 0, 1)            1         1         1         3

Table 7.2: Example of Sum of Ranks
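The ranking shown in Table 7.2 can be computed as follows. This minimal sketch assumes minimization on every objective and ignores tie-handling, which the table's example does not require; the function name is illustrative.

```python
def sum_of_ranks(fitnesses):
    """Sum-of-Ranks scoring for a population: rank every solution on
    each objective (1 = best) and sum its ranks per solution."""
    totals = [0] * len(fitnesses)
    for j in range(len(fitnesses[0])):
        # order solution indices from best (smallest) to worst on objective j
        order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i][j])
        for rank, i in enumerate(order, start=1):
            totals[i] += rank
    return totals

# The population of Table 7.2:
print(sum_of_ranks([(3, 2, 2), (1, 3, 4), (0, 0, 1)]))  # [7, 8, 3]
```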

\[ F_1(X,T) = \frac{1}{m}\sum_{i=1}^{m} \left| l(X_i) - l(T) \right| \tag{7.1} \]

\[ F_2(X,T) = \frac{1}{m}\sum_{i=1}^{m} \left| C(X_i) - C(T) \right| \tag{7.2} \]

\[ F_3(X,T) = \frac{1}{m}\sum_{i=1}^{m} D_{n,n'}\big(F_n(X_i), F_{n'}(T)\big) \tag{7.3} \]

\[ F_4(X,T) = \frac{1}{m}\sum_{i=1}^{m} D_{n,n'}\big(G_n(X_i), G_{n'}(T)\big) \tag{7.4} \]

Since this problem is multi-objective in nature, a means of comparing solution qualities is necessary. This comparison is established using the Sum-of-Ranks approach to multi-objective optimization. The Sum-of-Ranks algorithm sums the relative rank of each solution (or, in this case, graph model) in a population over each of the established objectives. For example, Table 7.2 describes a population of three solutions with three objectives, where each objective's goal is minimization. The ranking value of each solution is presented in the columns f1 rank, f2 rank, and f3 rank, respectively. The ranks are summed and shown in the column Sum of Ranks. In this example, the 3rd solution is determined to be the best as it has the lowest summed rank value. After each new population is constructed, all programs are evaluated over the objectives F1, F2, F3, and F4. Then, each program is ranked relative to its population using the previously described Sum-of-Ranks method, making each program comparable to the other programs in its population.

Chapter 8

Experiments – Reproducing Graph Models

In order to determine the utility of the methodology proposed by this thesis, graph models were evolved from target graphs generated by known graph models. For experimentation, the models selected were the Growing Random model (GR), the Barabasi-Albert model (BA), the Forest Fire model (FF), and the Ageing Preferential Attachment model (APA) described in Chapter 3. Known models were selected because a more thorough validation can be performed: a limitless number of graphs can be generated for comparison. The GR model was selected as it is a random model with only a growing property, making it the simplest to model. The BA model uses a vertex selection process which requires LinkableGP to find a formula to represent it. The FF model is the most complex of the four models selected; it has complex patterns for the addition of edges to the graph, making it the most challenging of the models for LinkableGP. Finally, the APA model adds a new property (age) to the modelling and therefore requires the system to discover processes not seen in the other models. For each model, four target graphs were generated with 100, 250, 500, and 1000 vertices, respectively. LinkableGP was run 30 times for each target graph, resulting in a collection of evolved models. For each target size of each known model, a single evolved model was chosen from the target's respective collection of evolved models. The final evolved model for each target was selected by way of a Sum-of-Ranks test on the collection of evolved models; the evolved model ranked best was selected. A number of graphs (1000) were generated using each final evolved model with the same number of vertices as its respective target graph and compared to the target graph. These comparisons provide a means of verifying that the final fitness of the graph was not a chance occurrence.


The aim of validating the final evolved model is to determine whether it reproduces the graph model that produced the target graph. Logically, there exist many ways that validation schemes might be designed, some more rigorous than others. One such way is visual examination of the graph model algorithms produced and the graphs they produce. Visual inspection is not a very formal approach to validation, but it does reveal some information about the model and the graphs it produces; therefore, only a brief visual comparison for each set of experiments will be provided. A complete set of algorithms from the evolved models is found in Appendix A. Also, side-by-side visualizations of target graphs and graphs produced by each evolved graph model are found in Appendix B. The approach of this thesis in the validation of models is to compare, statistically, the structure of the graphs produced by an evolved model to those of the targeted model. In order to do so, each final evolved model was used to produce 1000 graphs of each size (100, 250, 500, and 1000 vertices). The same was done with the targeted graph model. The evolved graph model and the targeted graph model were then compared over each same-sized set. Comparison was performed by way of t-tests for the average geodesic path lengths and the clustering coefficients, and by way of KS tests for the in- and out-degree distributions. The sample sizes were selected as they provided a sample power of 0.971. The effect size for Welch t-tests was 0.2 with a significance level of 0.01. The effect size for chi-squared tests was determined to be 0.1 with a significance level of 0.01. Furthermore, where a t-test was employed, each sample was tested using the Shapiro-Wilk test and found to be normally distributed. In the remainder of this chapter, the results of the experiments on each model will be presented. In Section 8.1, the setup for experimentation will be described. Thereafter, Sections 8.2, 8.3, and 8.4 will present the results for experimentation in reproducing the GR, BA, and FF models. Finally, Section 8.6 will provide some discussion on the overall results of the experiments.
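The KS comparison of degree distributions rests on the two-sample D statistic, the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch (the function name is illustrative):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D statistic over two samples,
    e.g. the in-degree sequences of an instance graph and a target graph."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(s, x):
        # proportion of sample s that is <= x
        return bisect.bisect_right(s, x) / len(s)

    # the maximum gap occurs at one of the observed values
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))
```

Identical samples yield D = 0; completely disjoint samples yield D = 1.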

8.1 Experimental Setup

For all experiments performed, LinkableGP was configured with the parameters found in Table 8.1. The system parameters used in experimentation were established empirically during small-scale experiments in the earlier stages of experimentation. The resulting evolved models were named based on their target model and target graph size, as listed in Table 8.2.

Parameter                          Value
Population                         50
Generations                        30
Crossover Rate                     0.9
Mutation Rate (m1, m2)             0.1
Tournament Size                    3
Elitism                            0.1
Initial Chromosome Length Ranges
  SelectVertex                     [1, 15]
  AddEdge                          [1, 5]
  SecondaryAction                  [1, 25]

Table 8.1: LinkableGP Parameters

Each target graph was generated using igraph.¹ The settings used to construct the target graphs within a model differed only by number of vertices. Table 8.2 lists the settings used for each model.² The m values used in the settings were all 1, chosen so as to remove from LinkableGP the responsibility of determining the number of initial vertex candidates. The remaining settings were chosen based purely upon the example uses provided in the igraph documentation. Beyond variation of the graph size, no experimentation with graph model parameters was performed.

8.2 Growing Random Model

The Growing Random model constructs graphs by adding a vertex at each time step and choosing an existing vertex uniformly at random with which to form an edge. Of the models experimented with, it is the simplest. However, there are certain constraints that evolved models must follow if they are to be considered feasible given the parameters used to construct the target graphs (see Section 8.1 for parameter details).

• Each vertex must have exactly one out-bound edge except the first vertex added

• A graph produced may not contain any transitivity
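These two feasibility constraints can be checked mechanically. The sketch below is a hypothetical helper, not part of the system: edges are (source, target) pairs, vertices are numbered 0..n−1 in insertion order, and transitivity is detected as any triangle in the underlying undirected graph.

```python
def satisfies_gr_constraints(num_vertices, edges):
    """Check: every vertex except the first (vertex 0) has exactly one
    out-bound edge, and the graph contains no triangles."""
    out_deg = [0] * num_vertices
    nbrs = [set() for _ in range(num_vertices)]
    for v, w in edges:
        out_deg[v] += 1
        nbrs[v].add(w)
        nbrs[w].add(v)
    if any(out_deg[v] != 1 for v in range(1, num_vertices)):
        return False
    # no transitivity: no two neighbours of a vertex are themselves adjacent
    for v in range(num_vertices):
        ns = sorted(nbrs[v])
        for i in range(len(ns)):
            for j in range(i + 1, len(ns)):
                if ns[j] in nbrs[ns[i]]:
                    return False
    return True
```

A chain such as 1→0, 2→1, 3→2 satisfies both constraints; a graph where some vertex makes two out-bound edges does not.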

¹ igraph is a library with an implementation in R for applications with graphs. Information about the library can be found at http://www.igraph.org
² For more details on the implementation and the application of the parameters for each of these graph models, see Chapter 3.

Experiment Id   Model                                   Parameters
GR-100          Growing Random Graph Model              n = 100, m = 1
GR-250          Growing Random Graph Model              n = 250, m = 1
GR-500          Growing Random Graph Model              n = 500, m = 1
GR-1000         Growing Random Graph Model              n = 1000, m = 1
BA-100          Barabasi-Albert Model                   n = 100, m = 1, α = 1, a = 1, c = 1
BA-250          Barabasi-Albert Model                   n = 250, m = 1, α = 1, a = 1, c = 1
BA-500          Barabasi-Albert Model                   n = 500, m = 1, α = 1, a = 1, c = 1
BA-1000         Barabasi-Albert Model                   n = 1000, m = 1, α = 1, a = 1, c = 1
FF-100          Forest Fire Model                       n = 100, m = 1, p = 0.37, 0.32, r = 0.37
FF-250          Forest Fire Model                       n = 250, m = 1, p = 0.37, 0.32, r = 0.37
FF-500          Forest Fire Model                       n = 500, m = 1, p = 0.37, 0.32, r = 0.37
FF-1000         Forest Fire Model                       n = 1000, m = 1, p = 0.37, 0.32, r = 0.37
APA-100         Ageing Preferential Attachment Model    n = 100, m = 1, α = 1, a = 1, c = 1, β = −1, b = 1, d = 0
APA-250         Ageing Preferential Attachment Model    n = 250, m = 1, α = 1, a = 1, c = 1, β = −1, b = 1, d = 0
APA-500         Ageing Preferential Attachment Model    n = 500, m = 1, α = 1, a = 1, c = 1, β = −1, b = 1, d = 0
APA-1000        Ageing Preferential Attachment Model    n = 1000, m = 1, α = 1, a = 1, c = 1, β = −1, b = 1, d = 0

Table 8.2: Experiments CHAPTER 8. EXPERIMENTS – REPRODUCING GRAPH MODELS 58

One of the selected best-evolved models, GR-100, provides an initial perspective on the quality of solutions that will be seen in this section. Figure 8.1 shows the target graph and a graph produced by the GR-100 model. It is observed that both have similar types of spanning trees and a number of vertices with in-degrees of 3 to 5 that serve as a sort of hub. Examination of Algorithm 12, representing the GR-500 model, shows that a great deal of benign code was produced but that, importantly, variable b in SelectVertices generates a random selection of vertices, after which an edge is created by AddEdge, which is similar to the processes of GR.

[Figure 8.1 graph drawings omitted.]
(a) Growing Random Model with 100 Vertices
(b) Evolved Growing Random Model with 100 Vertices

Figure 8.1: Graphs from the GR model and the GR-100 model

In examining the results of GR-100, GR-250, GR-500, and GR-1000 found in Table 8.3, based on fitness objectives F2 and F4, it is found that the constraints are satisfied. If any of the models produced graphs which violated the out-bound degree constraint, F4 would show a D-value that sometimes deviated from 0. In the case of transitivity, if a model sometimes produced transitivity, there would be some deviation in the difference in clustering coefficients (F2).

Algorithm 12: GR-500
  Function GenerateModel(int t) return Graph
    G ← InitialiseGraph();
    for i ← 1 to t do
      v ← Vertex();
      W ← SelectVertices(G);
      while |W| > 0 do
        w ← Next(W);
        E ← E ∪ AddEdge(v, w);
        SecondaryActions(W, w);
      end while
      V ← V ∪ v;
    end for
    return G
  override Function SelectVertices(Graph g) return Vertices
    a ← GeometricValue(0.402696);
    b ← GetRandomQueue(g);
    c ← GetLocalTransitivity();
    return b
  override Function AddEdge(Vertex v1, Vertex v2) return Edge
    a ← CreateEdge(v1, v2);
    return a
  override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← GeometricValue(0.886134);
    a ← RandomValue(a, a);
    a ← BernoulliValue(0.93379);

             F1              F2          F3              F4
Experiment   x̄      s       x̄    s     x̄      s       x̄    s
GR-100       0.07   0.342   0    0     0.037  0.014   0    0
GR-250       0.702  0.338   0    0     0.032  0.011   0    0
GR-500       0.339  0.351   0    0     0.024  0.009   0    0
GR-1000      0.08   0.344   0    0     0.016  0.006   0    0

Table 8.3: Final evolved GR model results comparing 1000 graphs produced by each best-evolved model vs. its target graph

The results in Table 8.3, with respect to F3, present the average and standard deviation of the D-value of the in-degree distribution of 1000 graphs from each evolved model versus its respective target graph. The mean results found in Table 8.3 are found to be more than 3 standard deviations below the relevant critical values given in Table 8.4. This observation shows that the in-degree distributions produced by the evolved models are similar to those found in the target graph.

The results for F1 are difficult to interpret via comparison to the model. The values indicate only that the selected models produce average path lengths closer to the target graph than the other evolved models that were not selected. There is no means of determining from these values alone whether the differences are acceptable relative to the target graph, and therefore the target model. Instead, it is necessary to defer to the validation of the evolved models to determine how well the average geodesic path lengths compare to those observed for the GR model. In Figure 8.2, the comparison between the best-evolved model for each GR target and the target model is shown for their distributions of average geodesic path length and network average clustering coefficient. This figure shows that all evolved models produce the expected clustering coefficients. Comparing the average geodesic path lengths between the evolved models and the growing random model, they seem to be similar except for GR-1000: the path lengths observed in this evolved model exceed those observed in the actual growing random model (see Subfigure 8.2d).

n1 \ n2   100     250     500     1000
100       0.192   0.161   0.149   0.143
250       0.161   0.122   0.105   0.096
500       0.149   0.105   0.086   0.074
1000      0.143   0.096   0.074   0.061

Note: the critical values with a significance of 0.05 are computed by
\[ D_{crit} = 1.36\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \]

Table 8.4: Dcrit Values CHAPTER 8. EXPERIMENTS – REPRODUCING GRAPH MODELS 61
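The critical values in Table 8.4 follow directly from the two-sample KS formula quoted in the table note; a quick sketch (function name illustrative) reproduces them:

```python
from math import sqrt

def ks_critical_value(n1, n2, c_alpha=1.36):
    """Two-sample KS critical value D_crit = c(alpha) * sqrt((n1 + n2) / (n1 * n2));
    c(alpha) = 1.36 corresponds to a 0.05 significance level."""
    return c_alpha * sqrt((n1 + n2) / (n1 * n2))

print(round(ks_critical_value(100, 100), 3))   # 0.192
print(round(ks_critical_value(500, 1000), 3))  # 0.074
```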

[Figure 8.2 box plots omitted. Panels: (a) GR-100, (b) GR-250, (c) GR-500, (d) GR-1000; each panel plots average geodesic path length (AGP) and network average clustering coefficient (CC) for the target and evolved models.]

Figure 8.2: Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for GR experiments (evolved) versus the growing random model(target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices

A t-test comparing the samples of average geodesic path lengths from the evolved models and the growing random model was performed, and the results are found in Table 8.5. The results confirm the previous observations of Figure 8.2: with 99% confidence, the average geodesic path lengths of the GR-100, GR-250, and GR-500 evolved models are similar to those observed in the growing random model. The final properties to inspect in validating the evolved growing random models are the in- and out-degree distributions. To compare these distributions,

Model \ Size   100     250     500     1000
GR-100         1.285   0.322   0.196   0.903
GR-250         0.609   1.287   0.358   0.454
GR-500         0.366   0.125   1.622   0.955
GR-1000        8.486   9.59    7.536   9.524

Table 8.5: T-Values of the comparisons between GR evolved models and the growing random model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326.
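The t-values in Table 8.5 come from Welch's unequal-variance t-test. A minimal sketch of the statistic itself (omitting the Welch-Satterthwaite degrees-of-freedom computation; the function name is illustrative):

```python
from math import sqrt

def welch_t(sample1, sample2):
    """Welch's t statistic for two samples with possibly unequal
    variances, as used for the path-length comparisons."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    # unbiased sample variances
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)
```

A |t| below the critical value of 2.326 (0.01 significance, as in Table 8.5) fails to reject the hypothesis that the two means are equal.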

1000 graph pairs were compared at each vertex size for each model. In 100% of cases, all four evolved models were able to replicate the degree distributions observed in the growing random model for graphs with 100, 250, 500, and 1000 vertices.

8.3 Barabasi-Albert Model

The Barabasi-Albert model is a graph model of growing preferential attachment networks. A highlight of these networks is that they are scale-free, meaning that the in-degree of the network follows a power-law distribution, a common property of social networks [69]. In the experiments conducted with this model, there are a few constraints which result from the parameters used in constructing the target graphs; evolved models that do not observe these constraints can be said to be invalid. For the target graphs in this section, those constraints are:

• Each vertex must have exactly one out-bound edge except the first vertex added

• A graph produced may not contain any transitivity
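Both constraints are mechanical to check on a generated graph. A small Python sketch follows, assuming vertices are numbered 1..n in order of addition and edges are (source, target) pairs; both conventions are assumptions for illustration, not the thesis's representation.

```python
def satisfies_ba_constraints(n, edges):
    """Check the two structural constraints on a directed graph with
    vertices 1..n (vertex 1 being the first vertex added) and a list
    of (source, target) edge pairs."""
    # Constraint 1: every vertex except the first has exactly one
    # out-bound edge.
    out_deg = {v: 0 for v in range(1, n + 1)}
    for s, _ in edges:
        out_deg[s] += 1
    if out_deg[1] != 0 or any(out_deg[v] != 1 for v in range(2, n + 1)):
        return False
    # Constraint 2: no transitivity -- no triangle in the underlying
    # undirected graph (a triangle would give a non-zero clustering
    # coefficient).
    nbrs = {v: set() for v in range(1, n + 1)}
    for s, t in edges:
        nbrs[s].add(t)
        nbrs[t].add(s)
    for s, t in edges:
        if nbrs[s] & nbrs[t]:  # shared neighbour closes a triangle
            return False
    return True
```

For example, a star in which every later vertex attaches to vertex 1 satisfies both constraints, while a graph giving some vertex two out-bound edges fails the first.
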

Comparing the graphs produced by the BA model and the BA-250 model, found in Figure 8.3, shows that the graph of the evolved model produces a few high in-degree vertices like the graph of the BA model. The figure also shows that the spanning trees expected in a BA graph are also present in the evolved model's graph. The algorithm of the BA-500 model shows that the equation expected to compute the probabilities of selection is being discovered. The BA-500 pseudo-code is found in Appendix B as Algorithm 18. Examination of Table 8.6 confirms that all evolved models satisfied the constraints on the out-degree and transitivity. The evolved models did not deviate from the constraints, as the values of F2 and F4 are zero in both mean and standard deviation. The lack of deviation means that, in all 1000 tests, the models' clustering coefficients and out-degree distributions were identical to those of the target.
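The selection equation being rediscovered here is the preferential attachment rule: an existing vertex is chosen with probability proportional to its degree. A minimal Python sketch of that rule and of the growth loop it drives is given below; the function names and the +1 smoothing term (which lets degree-zero vertices be selected at all) are illustrative, not part of LinkableGP's evolved language.

```python
import random

def preferential_pick(in_degree, rng=random):
    """Roulette-wheel selection: pick a vertex with probability
    proportional to (in-degree + 1)."""
    weights = {v: d + 1 for v, d in in_degree.items()}
    total = sum(weights.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for v, w in weights.items():
        acc += w
        if r <= acc:
            return v
    return v  # guard against floating-point rounding at the boundary

def generate_ba(n, rng=random):
    """Grow a graph of n vertices: each new vertex adds one edge to a
    preferentially selected earlier vertex (out-degree 1, except the
    first vertex, matching the constraints above)."""
    in_degree = {1: 0}
    edges = []
    for v in range(2, n + 1):
        w = preferential_pick(in_degree, rng)
        edges.append((v, w))
        in_degree[w] += 1
        in_degree[v] = 0
    return edges
```
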

(a) Barabasi-Albert Model with 250 Vertices

(b) Evolved Barabasi-Albert Model with 250 Vertices


Figure 8.3: Graphs from the BA model and the BA-250 model

              F1             F2         F3             F4
Experiment    x̄      s      x̄    s    x̄      s      x̄    s
BA-100        0.046  0.338   0    0     0.08   0.025   0    0
BA-250        0.245  0.284   0    0     0.042  0.017   0    0
BA-500        0.311  0.314   0    0     0.018  0.007   0    0
BA-1000       0.314  0.326   0    0     0.014  0.005   0    0

Table 8.6: Final evolved BA model results comparing 1000 graphs produced by each evolved model vs. its target graph.

For the fitness objective F3, Table 8.6 provides the mean and standard deviation of the 1000 comparisons to the target graph. Examination of these values shows that even the mean plus three standard deviations remains below the critical D-values for the KS test (see Table 8.4). Thus, the evolved models always showed in-degree distributions similar to the target graph. As was the case in Section 8.2, comparing the average geodesic path lengths of the evolved model to the target graph reveals little about the quality of the model; validation using the target model is needed to determine the quality of the average geodesic path lengths. Comparison of the average geodesic path lengths of the actual Barabasi-Albert model and the evolved models provides a much more meaningful measure of performance. Figure 8.4 shows the observations made during validation of the evolved models targeting the Barabasi-Albert model for the average geodesic path length and the clustering coefficient. The box plots show that the distributions of the observed average geodesic path lengths of the evolved models are quite similar to those of the Barabasi-Albert model. Validation by way of the t-test reveals that the mean observations are not statistically different; Table 8.7 shows the results of this test. As was observed in the initial testing of the evolved models versus the target graph, validation shows that the constraints with respect to out-degree and clustering coefficients are respected. The clustering coefficients of all four evolved graph models were always 0, and the out-degree distributions of the evolved models were always identical to those of the graphs produced by the Barabasi-Albert model. The KS-test of the in-degree distributions of the evolved models versus those of the Barabasi-Albert model showed that 100% of the graphs tested had statistically similar distributions.
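The KS comparisons used throughout this chapter reduce to computing the maximum gap between two empirical CDFs and comparing it to a critical D-value. A pure-Python sketch follows, assuming the standard two-sample approximation D_crit = c(alpha) * sqrt((n + m) / (n * m)) with c(0.01) approximately 1.63; whether the thesis's Table 8.4 uses exactly this approximation is an assumption.

```python
import math

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov D: the maximum gap between the
    empirical CDFs of two samples (e.g. two degree sequences)."""
    values = sorted(set(xs) | set(ys))
    d = 0.0
    for v in values:
        fx = sum(1 for x in xs if x <= v) / len(xs)
        fy = sum(1 for y in ys if y <= v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def ks_critical(n, m, c_alpha=1.63):
    """Approximate critical D at significance 0.01 (c(0.01) ~ 1.63)."""
    return c_alpha * math.sqrt((n + m) / (n * m))

# Two graphs are deemed to have similar degree distributions when
# ks_statistic(deg_a, deg_b) < ks_critical(len(deg_a), len(deg_b)).
```
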

Model \ Size    100      250      500      1000
BA-100          0.507    0.538    0.245    0.038
BA-250          1.951    0.993    0.322    0.123
BA-500          1.749    0.429    1.914    1.14
BA-1000         1.898    1.617    0.653    0.029

Table 8.7: T-Values of the comparisons between BA evolved models and the Barabasi-Albert model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326.

(a) BA-100 (b) BA-250
(c) BA-500 (d) BA-1000

Figure 8.4: Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for BA experiments (evolved) versus the Barabasi-Albert model (target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices.

8.4 Forest Fire Model

The Forest Fire model presents a new set of challenges in modelling that were not experienced with the Barabasi-Albert and growing random models: a non-zero clustering coefficient and a non-regular out-degree. This model presents the greatest challenge in modelling because the processes in the FF model exhibit recursion, use floating-point constants that are very sensitive, and employ multiple processes that select vertices for attachment. Algorithm 13 shows one of the most successful and simplest solutions found. This algorithm illustrates that LinkableGP had to evolve a number of constants in order to construct a good model.

Algorithm 13: FF-500

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

override Function SelectVertices(Graph g) return vertices
    a ← BernoulliValue(0.02819);
    a ← GeometricValue(0.941932);
    a ← BernoulliValue(0.347325);
    b ← GetRandomStack(g);
    c ← GetRandomQueue(g);
    return c

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← GeometricValue(0.63182);
    a ← GeometricValue(0.681023);
    AddSuccessor(V, a, v);
    AddPredecessors(V, b, v);

(a) Forest Fire Model with 100 Vertices

(b) Evolved Forest Fire Model with 100 Vertices


Figure 8.5: Graphs from the FF model and the FF-100 model

Despite some successes in evolving models that replicated the FF model, not all solutions were successful. Figure 8.5 shows one such example. The graph produced by the FF-100 model shown in the figure has a noticeable difference from the graph produced by the FF model: it does not have the same kind of dense connection sets. Examination of the evolved models compared to the target graph reveals the increased difficulty of modelling the Forest Fire. Table 8.8 shows the results of the comparison between the evolved models and the target graph. The table shows that the models display a difference in average geodesic path lengths to the target that is similar to those observed in Tables 8.3 and 8.6, which is a good initial indicator. However, a more thorough examination of the average geodesic path lengths against those of the actual model is required. Likewise, the differences in clustering coefficient values seem good, but comparison against the target graph alone does not add much confidence in the model.

              F1             F2             F3             F4
Experiment    x̄      s      x̄      s      x̄      s      x̄      s
FF-100        0.182  0.165   0.055  0.054   0.079  0.04    0.081  0.035
FF-250        0.127  0.115   0.082  0.053   0.191  0.069   0.118  0.02
FF-500        0.123  0.123   0.004  0.02    0.045  0.018   0.047  0.02
FF-1000       0.138  0.123   0.001  0.016   0.04   0.018   0.037  0.015

Table 8.8: Final evolved FF model results comparing 1000 graphs produced by each evolved model vs. its target graph.

Fitness objectives F3 and F4 in Table 8.8 do, however, provide some valuable insights into the model. The mean value of the objectives for each model is below the relevant critical D-values for the KS test found in Table 8.4, and even one standard deviation above the mean is still below the critical D-value. However, unlike the results of the previous models, a value of three standard deviations above the mean of F3 and F4 for each model exceeds the critical D-value. This means that the evolved models can sometimes produce degree distributions that are not similar to those observed in the target graph. In order to validate the ability of the evolved models to replicate the degree distributions expected of the target model, an experiment was designed. In the experiment, a graph from an FF evolved model was compared to a graph from the FF model using the KS test. The KS test was performed 1000 times, each time with a new graph from the evolved model and a new graph from the FF model, and a count was kept of how many times the KS test found the graphs to have similar degree distributions. The same experiment was performed where both graphs in the KS test came from the FF model, to provide a baseline of how often the model generated statistically similar degree distributions. The proportion of tests that showed a similar degree distribution is found in Table 8.9. A Pearson's chi-squared test [73] was performed to determine if the number of occurrences where the evolved model had a similar degree distribution was at least as great as when both graphs came from the FF model. The Pearson's chi-squared test examines categorical variables to determine how likely it is that the difference observed between groups occurred by chance. Each one-sided Pearson's chi-squared test compared the frequency of KS tests that showed statistically similar degree distributions for an evolved model against the same frequency for the target model. The p-values shown in brackets in Table 8.9 reveal that the FF-100 and FF-250 models often reject the null hypothesis (p-value less than 0.01): their differences did not occur by chance, and they matched the degree distributions less often than expected relative to the target model.
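The Pearson statistic for such a comparison can be computed from a 2x2 contingency table of KS-test passes and failures. The sketch below is the plain two-sided statistic in Python (the thesis uses a one-sided variant, so this is illustrative rather than a reproduction); the example counts in the comment are read loosely from Table 8.9.

```python
def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-squared statistic for a 2x2 contingency table of
    KS-test passes and failures for two groups of graph pairs."""
    table = [[pass_a, fail_a], [pass_b, fail_b]]
    n = pass_a + fail_a + pass_b + fail_b
    row = [sum(r) for r in table]
    col = [pass_a + pass_b, fail_a + fail_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# e.g. FF-250 at size 250: roughly 480 of 1000 evolved pairs passed
# versus 930 of 1000 target-vs-target pairs (proportions from Table
# 8.9); the statistic is then compared against the 0.01 chi-squared
# critical value of 6.635 at one degree of freedom.
```
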

Model \ Size    100            250            500            1000

In-Degree
FF-100          0.99 (1)       0.95 (0.963)   0.8 (0)        0.48 (0)
FF-250          0.83 (0)       0.48 (0)       0.14 (0)       0 (0)
FF-500          1 (1)          0.98 (1)       0.96 (1)       0.96 (1)
FF-1000         0.99 (1)       0.95 (0.963)   0.999 (1)      0.89 (1)
Target Model    0.96           0.93           0.87           0.81

Out-Degree
FF-100          1 (1)          0.97 (0.863)   0.92 (0.004)   0.79 (0)
FF-250          0.95 (0.015)   0.69 (0)       0.27 (0)       0 (0)
FF-500          0.99 (0.999)   0.99 (1)       0.96 (0.834)   1 (1)
FF-1000         0.99 (0.999)   1 (1)          0.95 (0.5)     0.99 (1)
Target Model    0.97           0.96           0.95           0.93

Table 8.9: Results of 1000 KS-tests comparing degree distributions of the evolved models and the forest fire model. The values in this table are the proportion of tests that showed that each graph pair had similar degree distributions. Values in brackets are the computed p-values of a one-sided Pearson's chi-squared test of the null hypothesis that the evolved model passed the KS test at least as often as the target model.

However, the FF-500 and FF-1000 models showed no statistically significant differences and, slightly more often, matched the degree distributions of the FF model than the FF model matched itself. Validation of the models by way of the average geodesic path lengths and the clustering coefficients provides further evidence that the forest fire model has not been successfully modelled by the FF-100 and FF-250 evolved models. Figure 8.6 provides box plots of the average geodesic path lengths and clustering coefficients observed during the validation. The figure shows just how significant the differences in the average path lengths and clustering coefficients are for the FF-100 and FF-250. However, Figure 8.6 also shows that the FF-500 and FF-1000 evolved models have distributions for the average path lengths and clustering coefficients similar to those of the forest fire model. T-tests performed on these distributions confirm that the mean average geodesic path lengths and clustering coefficients of the FF-500 and FF-1000 evolved models are not statistically different from those of the forest fire model.

(a) FF-100 (b) FF-250
(c) FF-500 (d) FF-1000

Figure 8.6: Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for FF experiments (evolved) versus the forest fire model (target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices.

8.5 Ageing Preferential Attachment Model

The ageing preferential attachment model is closely related to the Barabasi-Albert model; however, it differs in that it considers the age of vertices. This vertex property makes the model very different from the previously examined models of this chapter. In order to successfully model the network produced by the ageing preferential attachment model, it is necessary to add new constructs to the language and the abstract class. The first addition is to the abstract class: Algorithm 11 is amended so that the graph now tracks time and each vertex is given a birth value. The change is reflected in Algorithm 14. In order for a model to utilize this information, the language must also be updated. Therefore, the set of vertex property evaluators is amended to include a function that provides the age of a vertex. The age of a vertex is computed as the difference between the current time set for the graph and the birth value of the vertex.

Algorithm 14: Modified Abstract Graph Model for Complex Networks

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        SetBirth(v, i);
        SetTime(g, i);
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

abstract Function SelectVertices(Graph g) return vertices
abstract Function AddEdge(Vertex v1, Vertex v2) return edge
abstract Procedure SecondaryActions(Vertex v, Vertices V)
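The abstract model can be read as a template method: the generation loop is fixed, while SelectVertices, AddEdge, and SecondaryActions are the hooks the GP system fills in. The following Python sketch mirrors Algorithm 14 under that reading; the class, the method names, and the toy concrete subclass are illustrative, not taken from the LinkableGP implementation.

```python
import random

class AbstractGraphModel:
    """Template method mirroring Algorithm 14: the generation loop is
    fixed; subclasses (or evolved programs) supply the growth rules."""

    def generate(self, t):
        self.time = 0
        self.birth = {}          # vertex -> birth step (SetBirth)
        self.edges = []
        self.vertices = []
        self.initialise_graph()
        for i in range(1, t + 1):
            v = i
            self.birth[v] = i    # SetBirth(v, i)
            self.time = i        # SetTime(g, i)
            for w in self.select_vertices():
                self.edges.append((v, w))
                self.secondary_actions(w)
            self.vertices.append(v)
        return self.vertices, self.edges

    def age(self, v):
        # Age = current graph time minus the vertex's birth value.
        return self.time - self.birth[v]

    def initialise_graph(self):
        pass

    # Abstract growth rules, evolved by the GP system.
    def select_vertices(self):
        raise NotImplementedError

    def secondary_actions(self, w):
        pass

class UniformModel(AbstractGraphModel):
    """Toy concrete model: attach each new vertex to one uniformly
    chosen older vertex (purely illustrative)."""
    def select_vertices(self):
        return [random.choice(self.vertices)] if self.vertices else []
```

An evolved program plays the role of `UniformModel` here: it overrides the abstract hooks while inheriting the fixed loop, which is how incomplete expert knowledge is baked into every candidate model.
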

The target graphs produced by the ageing preferential attachment model used the same settings as the Barabasi-Albert target graphs (see the introduction of this chapter) and an age preference exponent of -1. This value of the age preference exponent means that, over time, new vertices are less likely to attach to older vertices. Also, the target graphs produced have a regular out-degree and no clustering; evolved models should also exhibit these characteristics. It is immediately observed in Table 8.10 that the expected out-degree and clustering coefficients of the targeted models are reproduced. If an evolved model had produced a non-zero clustering coefficient or an out-degree greater than one,

              F1             F2         F3             F4
Experiment    x̄      s      x̄    s    x̄      s      x̄    s
APA-100       1.38   1.048   0    0     0.23   0.007   0    0
APA-250       1.701  1.198   0    0     0.022  0.008   0    0
APA-500       1.461  0.093   0    0     0.022  0.008   0    0
APA-1000      1.56   0.902   0    0     0.023  0.009   0    0

Table 8.10: Final evolved APA model results comparing 1000 graphs produced by each evolved model vs. its target graph.

Model \ Size    100      250      500      1000
APA-100         0.374    0.981    0.223    0.431
APA-250         1.538    0.454    0.341    1.035
APA-500         0.37     1.048    0.782    1.125
APA-1000        0.265    1.836    0.549    1.183

Table 8.11: T-Values of the comparisons between APA evolved models and the ageing preferential attachment model. The critical T-Statistic for a 0.01 significance test with 1000 degrees of freedom is 2.326.

there would be evidence of a mean or standard deviation different from 0 in columns F2 and F4 of Table 8.10.

Further examination of Table 8.10 shows that F3, the D-value computed for the difference in in-degree distributions, fits the target well for all evolved models: adding three standard deviations to the mean of each model is still well below the relevant critical D-value. Notably, however, the difference in average geodesic path lengths between these evolved models and the target graph is much greater than those reported for the evolved GR, BA, and FF models. In the validation, it is discovered that the distribution of average geodesic path lengths for the ageing preferential attachment model is much wider than for the previous models. Thus, examining the box plots found in Figure 8.7 reveals that the average path length differences may be acceptable. Table 8.11 shows the results of the t-tests performed on the observed average geodesic path lengths of each evolved model and the target model. The analysis confirms that there are no statistically significant differences in the average geodesic path lengths of the evolved models compared to those of the target model.

(a) APA-100 (b) APA-250
(c) APA-500 (d) APA-1000

Figure 8.7: Box plots of the average geodesic path length (AGP) and network average clustering coefficient (CC) for APA experiments (evolved) versus the ageing preferential attachment model (target). In each chart the four groups of box plots represent, in order from left to right, graphs with 100 vertices, 250 vertices, 500 vertices, and 1000 vertices.

Finally, the KS-tests performed for validation of the in-degree distributions showed no statistical difference in any of the four evolved models compared to the target model. Thus, with respect to the fitness measures employed, it is concluded that the four models are valid replications of the ageing preferential attachment model.

8.6 Overall Performance

The results discussed in this chapter have so far made no remark on the performance of the average runs produced by the methods of this thesis. In this section, the general performance of the GP system is analysed. First, the average final results of the trial runs are examined; then, the per-generation performance of the system is discussed.

         Fitness
         Objective   Mean      Std. Dev   Median   3rd Quantile   Skewness   Kurtosis
GR-100   F1          6.532     24.853     0.001    0.002          3.474      13.070
         F2          0.000     0.000      0.000    0.000          NA         NA
         F3          0.080     0.155      0.020    0.002          2.124      5.878
         F4          0.045     0.131      0.000    0.000          2.646      8.049
GR-250   F1          0.030     0.071      0.007    0.016          1.145      20.502
         F2          0.011     0.059      0.000    0.000          5.199      28.034
         F3          0.041     0.098      0.012    0.016          3.589      15.240
         F4          0.036     0.114      0.000    0.000          3.038      10.650
GR-500   F1          82.817    1888.323   0.007    0.039          1.789      4.200
         F2          0.006     0.033      0.000    0.000          5.199      28.034
         F3          0.073     0.131      0.010    0.013          1.962      5.894
         F4          0.037     0.093      0.000    0.000          3.480      15.83
GR-1000  F1          232.560   428.636    0.0167   0.720          1.261      2.590
         F2          0.000     0.001      0.000    0.000          4.967      26.305
         F3          0.071     0.104      0.007    0.133          1.108      2.502
         F4          0.034     0.077      0.000    0.007          2.762      10.818

Table 8.12: Overall Final Fitness Results of GR-100, GR-250, GR-500, and GR-1000

In order to have some indication of the average system performance, descriptive statistics of the final fitness values of each experiment were computed. The descriptive statistics are reported in Tables 8.12, 8.13, and 8.14. In many cases, the mean performance of LinkableGP at modelling was poor and had a very high standard deviation. However, the skewness of the results indicates that the performance of LinkableGP was heavily and positively skewed, and the kurtosis shows that the majority of the results were concentrated about the median. Thus, the median fitness values are a much better indication of the average performance. Detailed evidence of the skewed distributions and the concentration about the median is provided in Appendix C via charts of the empirical cumulative distributions of the results.
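Statistics of the kind reported in these tables can be reproduced with moment-based estimators. The sketch below uses population (biased) moments throughout; the thesis may well use sample (n - 1) variants, so the exact values should be treated as approximate.

```python
import math

def describe(xs):
    """Mean, SD, median, skewness, and kurtosis of a list of final
    fitness values (population moment estimators)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    srt = sorted(xs)
    median = (srt[(n - 1) // 2] + srt[n // 2]) / 2
    # Moment-based skewness and (non-excess) kurtosis. Positive skew
    # means a long right tail; low kurtosis means mass concentrated
    # about the centre, which is why the median is the better summary.
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)
    return mean, sd, median, skew, kurt
```
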

         Fitness
         Objective   Mean     Std. Dev   Median   3rd Quantile   Skewness   Kurtosis
BA-100   F1          6.609    25.047     0.004    0.044          3.474      13.071
         F2          0.000    0.000      0.000    0.000          NA         NA
         F3          0.092    0.070      0.050    0.147          1.164      3.640
         F4          0.021    0.082      0.000    0.000          3.625      14.510
BA-250   F1          16.701   63.057     0.025    0.370          3.474      13.071
         F2          0.030    0.079      0.000    0.000          2.268      6.342
         F3          0.086    0.006      0.078    0.156          0.139      1.228
         F4          0.130    0.213      0.000    0.291          1.152      2.599
BA-500   F1          55.317   153.011    0.0540   0.135          2.517      7.519
         F2          0.005    0.028      0.000    0.000          5.199      28.034
         F3          0.112    0.123      0.137    0.146          2.925      14.460
         F4          0.046    0.126      0.000    0.000          2.575      8.045
BA-1000  F1          64.879   249.191    0.506    0.898          3.545      13.569
         F2          0.092    0.195      0.000    0.011          2.182      6.96
         F3          0.109    0.065      0.143    0.153          -0.800     1.857
         F4          0.154    0.245      0.000    0.300          1.620      5.388

Table 8.13: Overall Final Fitness Results of BA-100, BA-250, BA-500, and BA-1000

Tables 8.12, 8.13, and 8.14 show that the median performance is much better. In most cases the median, and often the 3rd quantile, final fitness values were good enough to produce fit models, based on observations of the best performing evolved models. The only exceptions to this routine quality of solutions were the experiments with the Forest Fire model. The routine quality of the results shows that the system can consistently work towards the optimization of graph models given a single target graph. When working with evolutionary systems such as GP, it is interesting to examine whether the system improves over time. In order to determine how well LinkableGP improved solutions over time, convergence plots were produced. These plots examine the average fitness values of the best solutions of each trial run at each generation. Figure 8.8 provides an example of the typical convergence for an experiment. The figure illustrates that during the first ten generations, solutions improved a great deal

         Fitness
         Objective   Mean     Std. Dev   Median   3rd Quantile   Skewness   Kurtosis
FF-100   F1          0.0482   0.143      0.010    0.032          4.856      25.632
         F2          0.33     0.062      0.003    0.012          1.722      4.265
         F3          0.186    0.219      0.095    0.188          1.725      4.854
         F4          0.234    0.239      0.160    0.328          1.267      3.704
FF-250   F1          0.549    2.820      0.011    0.032          5.195      28.002
         F2          0.019    0.044      0.005    0.009          3.799      17.792
         F3          0.242    0.210      0.220    0.356          1.072      3.489
         F4          0.221    0.198      0.212    0.292          1.318      4.656
FF-500   F1          3.329    17.554     0.011    0.094          5.197      28.018
         F2          0.094    0.122      0.038    0.153          1.289      3.561
         F3          0.208    0.203      0.118    0.328          1.805      6.035
         F4          0.342    0.254      0.274    0.448          0.775      2.326
FF-1000  F1          0.192    0.358      0.075    0.197          3.755      17.332
         F2          0.146    0.159      0.089    0.261          0.732      1.922
         F3          0.248    0.198      0.236    0.315          1.144      4.217
         F4          0.355    0.252      0.340    0.471          0.821      2.869

Table 8.14: Overall Final Fitness Results of FF-100, FF-250, FF-500, and FF-1000

and then came to a slow convergence. However, sometimes the fitness objectives F1 and F4 diverged late in the run in exchange for improved values of F2 and F3; Figure 8.9 provides an example of this phenomenon. Often, individual runs that experienced the divergence of F1 and F4 resulted in infeasible models. The convergence plots for each experiment are available in Appendix D.

8.7 Summary

In this chapter, experiments were conducted with the proposed methodology to replicate several known graph models: the growing random, Barabasi-Albert, Forest Fire, and ageing preferential attachment models. In each experiment, evolved graph models were compared to a target graph generated by the relevant known graph model, with 1000 graphs from each evolved model compared to the target graph. In every case it was demonstrated, where applicable, that the evolved models matched the target graph well. However, some measures, such as the clustering coefficient and the average geodesic path length, were not so easily compared, as there was no clear threshold for determining fitness.

Figure 8.8: Convergence Plot for FF-500 (normalized fitness of objectives F1–F4 over 30 generations)

Figure 8.9: Convergence Plot for BA-250 (normalized fitness of objectives F1–F4 over 30 generations)

The objective of this chapter was to demonstrate that the proposed methodology could replicate each model using only a single target graph. Having benefited from the use of known graph models, the chapter was able to provide a validation of the evolved models against the actual models: the evolved models were compared statistically to the respective graph models. It was demonstrated that all evolved models, except the FF-100 and FF-250, had statistically similar values to their target model by way of t-tests comparing the differences in the average geodesic path lengths and the clustering coefficients. In the case of the Forest Fire model, an additional statistical test was employed to examine the fitness of the in- and out-degrees of the evolved models. It was desired to know whether the frequency at which a graph from the evolved model and a graph from the target model had similar degree distributions was statistically different. A chi-squared test was employed to determine if this frequency was less than the frequency of statistically similar degree distributions between two graphs from the target model. It was shown that the evolved models trained with more vertices were able to satisfy all tests, whereas the evolved models trained with 250 or fewer vertices failed. The flexibility of the proposed methodology and LinkableGP was shown in experimentation with the ageing preferential attachment model, which introduces a new characteristic, namely vertex age, that required modifications to the system not needed for the other known graph models.
Again, through the same statistical approaches used with the other models, it was demonstrated that the proposed methodology was able to produce statistically similar models with respect to the fitness measures employed. Finally, it was shown in Section 8.6 that the trial runs routinely produced models that might have passed the validation criteria employed with the best models, as many of the final fitness values of the trial runs were comparable to those of the best models. The descriptive statistics of the final fitness values for the trial runs demonstrated a bias towards producing models with fitness values very similar to those observed in the best model. While the results did show that some trial runs performed poorly, not only was the poor performance obvious, but more than 75% of the results were comparable to the best evolved model.

Chapter 9

Comparison of Proposed Methods to Other Modelling Approaches

In Chapter 8, the proposed methods were empirically examined by replicating existing and well-known graph models. The experiments showed that the proposed methods were able to replicate those models with respect to the fitness measures used. However, these are not the only fitness measures that can be used. Moreover, in the case of the Forest Fire model, the fitness measures did not take into account all properties of the graph model. Recently, work has been undertaken to determine a framework for deciding which network measures will provide a good fitness for evolved graph models. This work, done by Harrison [36], focusses on a large number of measures of network properties and provides insights on their prevalence in complex networks. Harrison [36] utilizes that study of network measures together with the methods proposed in this thesis. His experiments in replicating existing and well-known undirected graph models and in constructing graph models for real-world complex networks provide further evidence that the proposed methods of this thesis are capable of producing models for a wide variety of networks. Additionally, his work demonstrates that the framework proposed by this thesis is flexible, as his approach uses a different version of the abstract class than the one proposed in this thesis. Other modelling techniques, as presented in Chapter 4, have also shown promise in constructing graph models for complex networks. Techniques such as Kronecker graphs [55] have also been widely applied in modelling real-world networks. Neither this thesis nor any other work has yet compared this technique to others in terms of performance and accuracy. However, it can be said that this technique proposes a flexible framework for embedding specific knowledge about complex networks that


is not as easily reproduced by other works. Furthermore, when such knowledge is embedded into solutions, it is not as clear what its impact on the network is.

9.1 Incorporation of Expert Knowledge

When constructing a graph model manually, it is necessary to collect a great deal of knowledge about the network at hand. This knowledge arises from insights about complex networks in general and the data being explored. In Chapter 7, an abstract class was introduced that encapsulates knowledge about the complex networks that would later be experimented with. Some parts of the abstract class included general knowledge with respect to graph models, while other parts were very specific to the types of graph models that would later be the subject of experimentation. Knowledge added about graph models in general included:

• Discrete time steps

• Specific placement of vertex selection

• Specific placement of edge creation/modification/removal

Knowledge specific to the models explored included:

• Vertex growth over time

• Restriction to edge creation only

• Allowance of additional actions after edge creation

• Addition of age information (only employed in Ageing Preferential Attachment)
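The two lists above can be pictured as an abstract class in which the general knowledge is fixed and the model-specific behaviours are left open. The sketch below is an illustration in Python; the class and method names are this illustration's own inventions and do not reproduce the thesis's actual implementation.

```python
from abc import ABC, abstractmethod
import random

class GrowthGraphModel(ABC):
    """Illustrative embryo for a growing graph model: the general
    knowledge (discrete time steps, where vertices are selected and
    edges are created) is fixed here, while the model-specific parts
    are abstract and would be supplied by the evolved code."""

    def generate(self, steps):
        graph = {"vertices": [], "edges": []}
        for t in range(steps):            # discrete time steps
            v = len(graph["vertices"])    # one new vertex per step (growth)
            targets = self.select_vertices(graph)
            for w in targets:
                graph["edges"].append((v, w))   # edge creation only
                self.secondary_actions(graph, w)
            graph["vertices"].append(v)
        return graph

    @abstractmethod
    def select_vertices(self, graph):
        """Choose the existing vertices the new vertex connects to."""

    def secondary_actions(self, graph, w):
        """Optional additional actions after each edge creation."""
        return None

class RandomAttachment(GrowthGraphModel):
    """Toy concretisation: attach to one uniformly random vertex."""
    def select_vertices(self, graph):
        return [random.choice(graph["vertices"])] if graph["vertices"] else []

g = RandomAttachment().generate(10)
print(len(g["vertices"]), len(g["edges"]))  # 10 vertices, 9 edges
```

The division of labour is the point: everything in `generate` is knowledge the researcher asserts a priori, while only the abstract hooks are left for the evolutionary search.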

This thesis makes no claim that this is the only or the best way to generalize graph models for complex networks. It is simply an introduction to the concept and a demonstration of its application. The abstract form utilized is quite rigid and restrictive in order to work with the models relevant to the experiments in Chapter 8. However, modification to the algorithm is simple and sometimes necessary. In Section 8.5, a modification of the algorithm was incorporated to handle the age of a vertex. In Harrison's work [36], modifications were made to work with non-growth algorithms as well as undirected networks. The idea of incorporating expert knowledge into solutions is informed a great deal by the previous automatic inference methods proposed by Bailey [10]. His

Algorithm 15: Recreation of Algorithm 3 from Bailey [10]

begin
    SET GROW NODES;
    for each node n do
        CONNECT STUB PERSIST(FLOAT TO PROB(0.02), TRUE);
    end for
end

Algorithm 16: BA-500

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

Function SelectVertices(Graph g) return vertices
    b ← GetInDegree();
    c ← Constant(1);
    c ← Add(b, c);
    d ← GetRouletteQueue(g, c);
    return d

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)

work utilized a priority queue structure which may have been responsible for creating the preferential attachment properties in experiments at replicating the Barabasi-Albert model. Bailey [10] remarks that a function CONNECT STUB PERSIST is responsible for determining the vertices selected from a priority queue and is responsible for the preferential attachment. Bailey [10] also explains that a function SET GROW NODES is responsible for the growth process. In Algorithm 15, a recreation of one of the results from Bailey's [10] thesis is provided. The algorithm shows a great deal of knowledge being incorporated into the system, as the two instructions of the algorithm each do a great deal of work that is not a product of the evolutionary algorithm. Algorithm 16, a similar result from this thesis, shows the growth strategy (implemented a priori) and the preferential attachment (implemented as a roulette selection), also simplifying the efforts of the GP. However, Algorithm 16 provides a more transparent understanding of the processes at work that is easily modifiable and controllable. In the parametrized models discussed in Chapter 4, a great deal of expert knowledge from the general understanding of complex networks is being employed, as those models were formed from a very intimate understanding of complex networks in general. However, the techniques do not lend themselves to modification for knowledge of specific complex networks. The models rely solely on parameter tuning to determine the model for a specific complex network. The evolutionary strategy that uses symbolic regression [62] also employs a very specific algorithm that is dependent on the tuning of an equation.

9.2 Performance

Previous approaches, such as those presented in Chapter 4, have been demonstrated to be successful at modelling both directed and undirected networks, with the exception of Bailey [10], which has only been tested with the latter. Moreover, all have been demonstrated to work with real-world networks. Application to real-world networks is an important step for any modelling technique, as it is the motivation of all graph modelling techniques. In this thesis, the proposed method has only been tested with directed networks produced by well-known graph models. However, the methods of this thesis have already been applied successfully to undirected complex networks and real-world networks [36]. It cannot be said that this method outperforms or underperforms any other method in terms of the successful generation of graph

models. However, it can be said that this method has demonstrated that it works on a variety of different complex networks. A comparative study of approaches in modelling complex networks is a relevant future study. However, as an interim comparison, this thesis’ method proposes a means of easily incorporating partial information about the structure and process of a complex network that is not readily incorporated into other approaches. The abstract class representation provides a simple tool, unlike any other approach, through which very little to a great deal of a priori knowledge can be encoded into solutions. In this way, finding a solution is reduced to searching only for what is unknown about the network. It is not impossible for other techniques to incorporate short-cuts equivalent to those achieved by the abstract class, but the framework offered in this thesis provides a clear and easy means of doing so. Chapter 10

Conclusion

In this thesis, a new methodology for the inference of graph models for complex networks was proposed. In doing so, the thesis also proposed a novel approach to GP, Object-Oriented GP (OOGP). The goal of the thesis was to demonstrate a methodology that allowed expert knowledge of complex networks, and specific knowledge of the complex network being modelled, to be incorporated into the construction of a fit graph model. In the experimentation via the replication of known graph models, it was demonstrated that the methods employed were able to produce fit graph models compared to the targeted well-known graph models. The selected best-evolved models had final fitness values that did not occur by random chance, as shown by post-run comparisons to their target graphs. It was also demonstrated through validation that the best-evolved models were able to reproduce certain structural properties expected from an accurately replicated model. It was demonstrated in the graph model replication experiments that trial runs routinely produced results comparable to those of the best-evolved models. This work also demonstrated that the final models from trials often had a similar fitness to the best models. While it was not confirmed with the same rigour employed with the best-evolved models, it is likely that most of the final models would have also been shown to be accurately replicated models. An important goal of this work was to develop a methodology that offered flexibility to the researcher to embed their expert knowledge into the graph model solutions produced by the system. The development of LinkableGP formed a framework that was conducive to incorporating knowledge. The methods employed by the system allow researchers to develop a structure for a graph model, provide it with a general algorithm, and restrict the search of LinkableGP in more expressive ways. An

initial example of how LinkableGP accomplishes this was outlined in Chapter 7. The chapter introduced a generalized and abstracted form of a graph model for complex networks with growing degrees that, when tested, was shown to produce valid models. Further evidence of the power of LinkableGP’s representation of knowledge was demonstrated in the adaptations of the graph model abstraction for experiments with the ageing preferential attachment model. The model was selected as a demonstration of additional properties of a graph model. LinkableGP was demonstrated to be quickly and easily adapted to work with the new properties while still producing fit models. In previous works [10], ad-hoc additions were necessary, such as heap structures and community detection algorithms. Under the traditional GP approaches employed, these served to convolute the graph model and made it hard to clearly understand their impact on the evolved algorithm. However, this methodology presents a means to define the structure and a partial definition of the graph model. Thus, the effects of additional algorithms are more readily assessable in evolved solutions. This does not mean that the methods of this thesis are superior, nor is there evidence to suggest they are inferior. The discussion in Chapter 9, instead, suggests that they are different and that understanding the impact of expert knowledge is easier with the methods of this thesis. Moreover, it is arguable that the code produced by LinkableGP is much closer to that expected of a programmer. It has been remarked by reviewers of the publications produced from this work that the design of LinkableGP has a good focus on the generation of real code in line with the way programming is done. The code in solutions can be modularized and have a well-defined structure.
Having such code means that when additional strategies are employed that do not rely completely on evolution, it is clear and easy to understand their role in the resultant model. While this thesis did not focus on applications of LinkableGP outside the context of graph model inference, LinkableGP does have applications outside complex networks, such as the evolution of data structures [60] and multi-behaviour artificial agents. This thesis and the publications derived from the development of LinkableGP serve as evidence that it addresses some shortcomings of other GP approaches. While other GP approaches can evolve multiple functions simultaneously and incorporate expert knowledge, LinkableGP provides a more accessible means of doing so that shares the programming task with the human. LinkableGP, inspired by object-oriented programming, introduces the means to define a program structure more fluidly. In the opinion of this author, the approach, in contrast to other approaches, is more conducive to solving complex problems that were not tenable before by GP. It facilitates the incorporation of expert knowledge in a meaningful way and alleviates the pressure on the GP to evolve parts of the solution already known. It is the expectation of this author that continued study will help to further prove the powers of LinkableGP. Finally, this thesis has also served as an introduction to the inference of graph models for directed complex networks. It introduced a modified set of fitness measures from previous works (see [10]) to accommodate the differences between directed and undirected networks. It also introduced a new set of functions necessary to construct directed networks. It was demonstrated that the proposed methodology was able to work with these changes and still produce models that were accurate to the expected behaviours of the measures of each evolved model.

10.1 Limitations

In the development of a novel GP system well suited to the evolution of graph models for complex networks, there is much that must be considered. While this thesis does attempt to address the representation and structure of GP in its role in the evolution of graph models, it has placed little focus on the rigorous evaluation of the evolved models. The thesis only manages to validate models in the context of the fitness measures used to construct the model during evolution. It is important to note that the four measures of fitness do not fully represent each model. For example, the Forest Fire model is known to produce models with community structures. In validating evolved models, no attempt was made to examine whether these communities were appropriately formed. The approach of fitness evaluation and validation does not allow the claim that an evolved model is an accurate replica of its target. Instead, it is only possible to say that the methods successfully replicated the measured properties of the models. Furthermore, this thesis does not explore real-world networks. Insufficient time could be dedicated to experimentation with real-world data, and therefore no analysis of the methods could be conducted on real-world experiments. In this thesis, an abstract class that embedded some knowledge of graph models was proposed. This model relied on the assumption that the networks being modelled were growing networks that only grew one vertex at a time. In general, complex networks do not behave this way. While other abstract classes were experimented with, it was decided to use this model as a short-cut to evolution, providing it with the most knowledge that was common between all targeted models. When modelling real-world networks it is not likely that such knowledge could be assumed. For examples of other abstract classes, please see [61] and [36].

10.2 Future Work

The results of this work have not been able to address a number of open problems in the inference of graph models for complex networks. Furthermore, it has also served to open more questions, both in graph model inference and in the study of GP. Several times throughout this work, it may have been apparent to the reader that evolved graph models were only compared based on a limited set of fitness measures that covered some very general structural properties of complex networks. However, these measures are not known to be sufficient to accurately model a specific complex network, let alone complex networks in general. For example, the Forest Fire model generates graphs that are known to contain communities. The methodology for measuring the fitness of graph models used in this work does not incorporate this. Thus, two questions can be asked:

1. Are the fitness measures sufficient to evolve models that produce graphs that exhibit properties that are not measured during evolution?

2. Are there other fitness measures required to ensure a model is accurate to the complex network at hand?

Further study into the sufficiency of the fitness measures is required. Although a study has already been completed [36] which identifies the most useful measures found using well-known graph models, it is not clear if these are sufficient measures. Studies should investigate which fitness measures will most accurately capture the properties of complex networks in general, or provide a framework for determining what fitness measures are necessary for a specific complex network. It is also necessary to expand efforts in the inference of graph models of complex networks to other types of networks. So far, research has investigated unweighted networks, both undirected and directed. Studies involving complex networks which are weighted, or where vertices have traits, are important areas to investigate. In practical applications for research, complex networks are often weighted and (especially social networks) have traits. An approach that is demonstrated to model these types of networks would prove a valuable tool. With respect to GP, this work has made an important contribution, as it has introduced another technique for evolving programs. However, the scope of this work is limited to evolving complex networks and, therefore, does not fully investigate the approach presented. Further study using LinkableGP should investigate its application to other problems such as evolving data structures or other complex algorithms. Exploration into the genetic operators should also be conducted, focusing on understanding the impact of operators in the translation from genotype to phenotype. The mapping of the genotype to the phenotype is sensitive to the genetic operators, but to what extent is not clearly known. Bibliography

[1] Russ Abbott. Object-oriented genetic programming, an initial implementation. In International Conference on Machine Learning: Models, Technologies and Applications, pages 26–30, 2003.

[2] William Aiello, Fan Chung, and Linyuan Lu. A random graph model for massive graphs. In Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 171–180. ACM, 2000.

[3] William Aiello, Fan Chung, and Linyuan Lu. A random graph model for power law graphs. Experimental Mathematics, 10(1):53–66, 2001.

[4] William Aiello, Fan Chung, and Linyuan Lu. Random evolution in massive graphs. In Handbook of massive data sets, pages 97–122. Springer, 2002.

[5] Reka Albert and Albert-Laszlo Barabasi. Statistical mechanics of complex net- works. Reviews of Modern Physics, 74:47–97, 2002.

[6] Réka Albert, Hawoong Jeong, and Albert-László Barabási. Internet: Diameter of the world-wide web. Nature, 401(6749):130–131, 1999.

[7] John C Almack. The influence of intelligence on the selection of associates. School and Society, 16:529–530, 1922.

[8] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small- world networks. PNAS, 97:11149–11152, 2000.

[9] D.A. Augusto and H. J C Barbosa. Symbolic regression via genetic programming. In Neural Networks, 2000. Proceedings. Sixth Brazilian Symposium on, pages 173–178, 2000.

[10] Alexander Bailey. Automatic inference of graph models for complex networks with genetic programming. Master’s thesis, Brock University, 2013.


[11] Alexander Bailey, Mario Ventresca, and Beatrice Ombuki-Berman. Automatic generation of graph models for complex networks by genetic programming. In Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, GECCO ’12, pages 711–718, New York, NY, USA, 2012. ACM.

[12] Alexander Bailey, Beatrice Ombuki-Berman, and Mario Ventresca. Automatic inference of hierarchical graph models using genetic programming with an application to cortical networks. In Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation Conference, GECCO ’13, pages 893–900, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1963-8.

[13] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[14] M. Barthelemy and L. A. N. Amaral. Small-world networks: Evidence for a crossover picture. Physical Review Letters, 82:3180–3183, 1999.

[15] P. S. Bearman, J Moody, and K Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 110:44–91, 2004.

[16] Steven Bergen and Brian J. Ross. Evolutionary art using summed multi-objective ranks. In Rick Riolo, Trent McConaghy, and Ekaterina Vladislavleva, editors, Genetic Programming Theory and Practice VIII, volume 8 of Genetic and Evolutionary Computation, pages 227–244. Springer New York, 2011. ISBN 978-1-4419-7746-5.

[17] Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D-U Hwang. Complex networks: Structure and dynamics. Physics reports, 424(4):175–308, 2006.

[18] Walter Bohm and Andreas Geyer-Schulz. Exact uniform initialization for genetic programming. Foundations of Genetic Algorithms IV, pages 379–407, 1997.

[19] Markus Brameier and Wolfgang Banzhaf. Explicit control of diversity and effective variation distance in linear genetic programming. In James A. Foster, Evelyne Lutton, Julian Miller, Conor Ryan, and Andrea Tettamanzi, editors, Genetic Programming, volume 2278 of Lecture Notes in Computer Science, pages 37–49. Springer Berlin Heidelberg, 2002.

[20] Markus F Brameier and Wolfgang Banzhaf. Linear genetic programming. Springer, 2007.

[21] Wilker Shane Bruce. The Application of Genetic Programming to the Automatic Generation of Object-Oriented Programs. PhD thesis, School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, Florida, USA, 1995.

[22] Robert Cairns, Hongling Xie, and Man-Chi Leung. The popularity of friendship and the neglect of social networks: Toward a new balance. New Directions for Child and Adolescent Development, 1998(81):25–53, 1998.

[23] Fan Chung, Linyuan Lu, T Gregory Dewey, and David J Galas. Duplication models for biological networks. Journal of computational biology, 10(5):677–687, 2003.

[24] Stanley Dagley, Donald Elliot Nicholson, et al. An introduction to metabolic pathways. Blackwell Scientific Publications, Ltd., Oxford, 1970.

[25] Derek de Solla Price. Networks of scientific papers. Science, 149:510–515, 1965.

[26] Reinhard Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer-Verlag, Heidelberg, third edition, 2005.

[27] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks with aging of sites. Phys. Rev. E, 62:1842–1845, Aug 2000. doi: 10.1103/PhysRevE.62.1842.

[28] P. Erdős and A. Rényi. On random graphs I. Publ. Math. Debrecen, 6:290–297, 1959.

[29] Paul Erdős and Alfréd Rényi. On the strength of connectedness of a random graph. Acta Mathematica Hungarica, 12(1):261–267, 1961.

[30] Ernesto Estrada. The structure of complex networks: theory and applications. Oxford University Press, 2011.

[31] Jinfei Fan and Franz Oppacher. Object oriented genetic programming, June 1 2009. US Patent App. 12/475,819.

[32] Andreas Geyer-Schulz. Fuzzy rule-based expert systems and genetic machine learning. Physica-Verlag Heidelberg, 1997.

[33] E. N. Gilbert. Random graphs. The Annals of Mathematical Statistics, 30(4):1141–1144, 1959. doi: 10.1214/aoms/1177706098.

[34] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12): 7821–7826, 2002.

[35] David Goldberg. Genetic algorithms: search and optimization algorithms, 1989.

[36] Kyle Robert Harrison. Network similarity measures and automatic construction of graph models using genetic programming. Master’s thesis, Brock University, 2014.

[37] Bernardo A Huberman and Lada A Adamic. Internet: growth dynamics of the world-wide web. Nature, 401(6749):131–131, 1999.

[38] Norman P Hummon and Patrick Dereian. Connectivity in a citation network: The development of dna theory. Social Networks, 11(1):39–63, 1989.

[39] David R Hunter, Mark S Handcock, Carter T Butts, Steven M Goodreau, and Martina Morris. ergm: A package to fit, simulate and diagnose exponential- family models for networks. Journal of Statistical Software, 24(3):nihpa54860, 2008.

[40] Henry Kautz, Bart Selman, and Mehul Shah. Referral web: combining social networks and collaborative filtering. Communications of the ACM, 40(3):63–65, 1997.

[41] Gul Muhammad Khan, Julian Francis Miller, and David M. Halliday. Coevolution of intelligent agents using cartesian genetic programming. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07, pages 269–276, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-697-4.

[42] Andrej Nikolaevič Kolmogorov. Foundations of the theory of probability. 1950.

[43] John R Koza. Genetic Programming: vol. 1, On the programming of computers by means of natural selection, volume 1. MIT press, 1992.

[44] John R Koza. Genetic programming II: automatic discovery of reusable programs. MIT press, 1994. BIBLIOGRAPHY 93

[45] John R. Koza, Forrest H. Bennett III, and Oscar Stiffelman. Genetic programming as a darwinian invention machine. In Riccardo Poli, Peter Nordin, William B. Langdon, and Terence C. Fogarty, editors, Genetic Programming, volume 1598 of Lecture Notes in Computer Science, pages 93–108. Springer Berlin Heidelberg, 1999. ISBN 978-3-540-65899-3.

[46] Paul L Krapivsky and Sidney Redner. Organization of growing random networks. Physical Review E, 63(6):066123, 2001.

[47] Paul L Krapivsky, Sidney Redner, and Francois Leyvraz. Connectivity of growing random networks. Physical review letters, 85(21):4629, 2000.

[48] James Ladyman and James Lambert. What is a complex system? Technical report, University of Bristol, 2011. URL http://philsci-archive.pitt.edu/9044/4/LLWultimate.pdf.

[49] William B Langdon. Evolving data structures with genetic programming. In Proceedings of the 6th International Conference on Genetic Algorithms (ICGA95), pages 295–302, 1995.

[50] William B Langdon and Wolfgang Banzhaf. Repeated sequences in linear genetic programming genomes. Complex Systems, 15(4 (c)):285–306, 2005.

[51] Jure Leskovec and Christos Faloutsos. Scalable modeling of real graphs using kronecker multiplication. In Proceedings of the 24th international conference on Machine learning, pages 497–504. ACM, 2007.

[52] Jure Leskovec and Julian J Mcauley. Learning to discover social circles in ego networks. In Advances in neural information processing systems, pages 539–547, 2012.

[53] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.

[54] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1), March 2007. ISSN 1556-4681. doi: 10.1145/1217299.1217301.

[55] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985–1042, 2010.

[56] Heike K Lotze, Marta Coll, and Jennifer A Dunne. Historical changes in marine resources, food-web structure and ecosystem functioning in the Adriatic Sea, Mediterranean. Ecosystems, 14(2):198–222, 2011.

[57] AR Mashaghi, Abolfazl Ramezanpour, and V Karimipour. Investigation of a protein complex network. The European Physical Journal B-Condensed Matter and Complex Systems, 41(1):113–121, 2004.

[58] Robert I. McKay, Nguyen Xuan Hoai, Peter Alexander Whigham, Yin Shan, and Michael O’Neill. Grammar-based genetic programming: a survey. Genetic Programming and Evolvable Machines, 11(3-4):365–396, 2010.

[59] Robert I McKay, Nguyen Xuan Hoai, Peter Alexander Whigham, Yin Shan, and Michael O’Neill. Grammar-based genetic programming: a survey. Genetic Programming and Evolvable Machines, 11(3-4):365–396, 2010.

[60] Michael Richard Medland, Kyle Robert Harrison, and Beatrice Ombuki-Berman. Incorporating expert knowledge in object-oriented genetic programming. In Proceedings of the 2014 conference companion on Genetic and evolutionary computation companion, pages 145–146. ACM, 2014.

[61] Michael Richard Medland, Kyle Robert Harrison, and Beatrice Ombuki-Berman. Demonstrating the power of object-oriented genetic programming via the inference of graph models for complex networks. In Nature and Biologically Inspired Computing (NaBIC), 2014 Sixth World Congress on, NaBIC ’14, pages 305–311. IEEE, 2014.

[62] Telmo Menezes and Camille Roth. Symbolic regression of generative network models. Scientific reports, 4, 2014.

[63] Stanley Milgram. The small world problem. Psychology today, 2(1):60–67, 1967.

[64] Julian F Miller and Peter Thomson. Cartesian genetic programming. In Genetic Programming, pages 121–132. Springer, 2000.

[65] David J Montana. Strongly typed genetic programming. Evolutionary computation, 3(2):199–230, 1995.

[66] Jacob L Moreno and Helen Hall Jennings. Who shall survive? 1934.

[67] Amit A Nanavati, Siva Gurumurthy, Gautam Das, Dipanjan Chakraborty, Koustuv Dasgupta, Sougata Mukherjea, and Anupam Joshi. On the structural properties of massive telecom call graphs: findings and implications. In Proceedings of the 15th ACM international conference on Information and knowledge management, pages 435–444. ACM, 2006.

[68] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.

[69] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010.

[70] Mark Newman, Albert-Laszlo Barabasi, and Duncan J. Watts. The Structure and Dynamics of Networks. Princeton University Press, 2006.

[71] Mihai Oltean and Crina Grosan. A comparison of several linear genetic program- ming techniques. Complex Systems, 14(4):285–314, 2003.

[72] Ali Pinar, C Seshadhri, and Tamara G Kolda. The similarity between stochastic kronecker and chung-lu graph models. CoRR, abs/1110.4925, 2011.

[73] Robin L Plackett. Karl Pearson and the chi-squared test. International Statistical Review/Revue Internationale de Statistique, pages 59–72, 1983.

[74] Riccardo Poli, William B Langdon, Nicholas F McPhee, and John R Koza. A field guide to genetic programming. Lulu.com, 2008.

[75] Stephen R Proulx, Daniel EL Promislow, and Patrick C Phillips. Network thinking in ecology and evolution. Trends in Ecology & Evolution, 20(6):345–353, 2005.

[76] Patrick Reynolds and Amin Vahdat. Efficient peer-to-peer keyword searching. In Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware, pages 21–40. Springer-Verlag New York, Inc., 2003.

[77] Matei Ripeanu. Peer-to-peer architecture case study: Gnutella network. In Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 99–100. IEEE, 2001.

[78] Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher. An introduction to exponential random graph (p∗) models for social networks. Social networks, 29(2):173–191, 2007.

[79] Garry Robins, Tom Snijders, Peng Wang, Mark Handcock, and Philippa Pattison. Recent developments in exponential random graph (p∗) models for social networks. Social networks, 29(2):192–215, 2007.

[80] Kumara Sastry and David E. Goldberg. Probabilistic model building and competent genetic programming. In Rick Riolo and Bill Worzel, editors, Genetic Programming Theory and Practice, volume 6 of Genetic Programming Series, pages 205–220. Springer US, 2003.

[81] Thomas C Schelling. Dynamic models of segregation. Journal of mathematical sociology, 1(2):143–186, 1971.

[82] Ray Solomonoff and Anatol Rapoport. Connectivity of random nets. The bulletin of mathematical biophysics, 13(2):107–117, 1951.

[83] Terence Soule, James A Foster, et al. Code size and depth flows in genetic programming. Genetic Programming, pages 313–320, 1997.

[84] Olaf Sporns and Jonathan D Zwi. The small world of the cerebral cortex. Neuroinformatics, 2(2):145–162, 2004.

[85] Jeffrey Travers and Stanley Milgram. An experimental study of the small world problem. Sociometry, 32(4):425–443, 1969.

[86] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. On the evolution of user interaction in facebook. In Proceedings of the 2nd ACM SIGCOMM Workshop on Social Networks (WOSN’09), August 2009.

[87] E.J. Vladislavleva, G.F. Smits, and D. den Hertog. Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. Evolutionary Computation, IEEE Transactions on, 13(2):333–349, April 2009. ISSN 1089-778X.

[88] Andreas Wagner and David A Fell. The small world inside large metabolic networks. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1478):1803–1810, 2001.

[89] Stanley Wasserman and Katherine Faust. Social network analysis: Methods and applications, volume 8. Cambridge University Press, 1994.

[90] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.

[91] Beth Wellman. The school child's choice of companions. The Journal of Educational Research, pages 126–132, 1926.

[92] Peter A Whigham et al. Grammatically-based genetic programming. In Proceedings of the workshop on genetic programming: from theory to real-world applications, volume 16, pages 33–41, 1995.

[93] Charles H Wolfgang. Solving discipline and classroom management problems. Wiley Global Education, 2008.

[94] Man Leung Wong and Kwong Sak Leung. Applying logic grammars to induce sub-functions in genetic programming. In Evolutionary Computation, 1995., IEEE International Conference on, volume 2, pages 737–740. IEEE, 1995.

Appendix A

Select Evolved Graph Models

The algorithms presented here are selected from the experiments in Chapter 8. They are implementations of Algorithm 11 or, in the APA case only, Algorithm 14. These models were chosen because they are characteristic of the best evolved models for their respective target models.
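All four listings below share the same generative skeleton: GenerateModel grows the graph one vertex per step, attaching it to a set of selected existing vertices, while SelectVertices, AddEdge, and SecondaryActions are the evolved hooks. The following is a minimal Python sketch of that loop only; the function and variable names are hypothetical stand-ins for the pseudocode primitives, not the thesis's actual implementation.

```python
import random

def generate_model(t, select_vertices, add_edge, secondary_actions):
    """Grow a graph for t steps, delegating to the three overridable
    hooks (a rough Python rendering of the GenerateModel skeleton)."""
    vertices, edges = [], []
    for _ in range(t):
        v = len(vertices) + 1          # label for the new vertex
        targets = select_vertices(vertices)
        for w in targets:
            edges.append(add_edge(v, w))
            secondary_actions(targets, w)
        vertices.append(v)
    return vertices, edges

# A trivial instantiation: attach each new vertex to one uniformly
# chosen existing vertex (growing-random-like behaviour).
vs, es = generate_model(
    10,
    lambda existing: [random.choice(existing)] if existing else [],
    lambda a, b: (a, b),
    lambda targets, w: None,
)
```

With this instantiation the first vertex finds no attachment targets, so ten vertices yield nine edges; swapping in a different select_vertices hook is what distinguishes the evolved models from one another.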


Algorithm 17: GR-500

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

override Function SelectVertices(Graph g) return vertices
    a ← GeometricValue(0.402696);
    b ← GetRandomQueue(g);
    c ← GetLocalTransitivity();
    return b

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← GeometricValue(0.886134);
    a ← RandomValue(a, a);
    a ← BernoulliValue(0.93379);

Algorithm 18: BA-500

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

Function SelectVertices(Graph g) return vertices
    a ← BernoulliValue(0.590228);
    a ← RandomValue(10, 2);
    b ← GetInDegree();
    c ← Constant(1);
    c ← Add(b, c);
    d ← GetRouletteQueue(g, c);
    return d

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← BernoulliValue(0.350639);
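The core of BA-500's SelectVertices is GetRouletteQueue weighted by in-degree plus one, i.e. preferential attachment via roulette (fitness-proportional) selection. A rough Python sketch of such a roulette pick is given below; the helper name and the toy in-degree table are hypothetical, introduced only to illustrate the bias.

```python
import random

def roulette_queue(graph, weight, k=1):
    """Pick k existing vertices with probability proportional to
    weight(v) -- an illustrative stand-in for GetRouletteQueue.
    With weight(v) = in_degree(v) + 1 this reproduces the
    preferential-attachment bias used by Algorithm 18."""
    vertices = list(graph)
    weights = [weight(v) for v in vertices]
    return random.choices(vertices, weights=weights, k=k)

# Degree-proportional choice over a toy in-degree table:
# "a" is picked with probability 4/7, "b" with 1/7, "c" with 2/7.
in_degree = {"a": 3, "b": 0, "c": 1}
picked = roulette_queue(in_degree, lambda v: in_degree[v] + 1, k=5)
```

The "+1" term matters: without it, vertices with in-degree zero could never be selected and the graph would never gain new hubs.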

Algorithm 19: FF-500

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

override Function SelectVertices(Graph g) return vertices
    a ← BernoulliValue(0.02819);
    a ← GeometricValue(0.941932);
    a ← BernoulliValue(0.347325);
    b ← GetRandomStack(g);
    c ← GetRandomQueue(g);
    return c

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← GeometricValue(0.63182);
    a ← GeometricValue(0.681023);
    AddSuccessor(V, a, v);
    AddPredecessors(V, b, v);

Algorithm 20: APA-1000

Function GenerateModel(int t) return graph
    G ← InitialiseGraph();
    for i ← 1 to t do
        v ← Vertex();
        SetBirth(v, i);
        SetTime(G, i);
        W ← SelectVertices(G);
        while |W| > 0 do
            w ← Next(W);
            E ← E ∪ AddEdge(v, w);
            SecondaryActions(W, w);
        end while
        V ← V ∪ v;
    end for
    return G

override Function SelectVertices(Graph g) return vertices
    a ← GetAge();
    b ← GetInDegree();
    c ← Constant();
    d ← Add(b, c);
    e ← GeometricValue(0.347125);
    b ← Div(d, b);
    e ← GetRouletteQueue(g, c);
    return e

override Function AddEdge(Vertex v1, Vertex v2) return edge
    a ← CreateEdge(v1, v2);
    return a

override Procedure SecondaryActions(Vertex v, Vertices V)
    a ← GeometricValue(0.157924);
    a ← RandomValue(a, 1);
    b ← GeometricValue(0.785169);

Appendix B

Graph Visualizations of Evolved Model vs. Real Model by Number of Vertices


Figure B.1: Growing Random Model with 100 Vertices

Figure B.2: Evolved Growing Random Model with 100 Vertices

Figure B.3: Growing Random Model with 250 Vertices

Figure B.4: Evolved Growing Random Model with 250 Vertices

Figure B.5: Growing Random Model with 500 Vertices

Figure B.6: Evolved Growing Random Model with 500 Vertices

Figure B.7: Growing Random Model with 1000 Vertices

Figure B.8: Evolved Growing Random Model with 1000 Vertices

Figure B.9: Barabasi-Albert Model with 100 Vertices

Figure B.10: Evolved Barabasi-Albert Model with 100 Vertices

Figure B.11: Barabasi-Albert Model with 250 Vertices

Figure B.12: Evolved Barabasi-Albert Model with 250 Vertices

Figure B.13: Barabasi-Albert Model with 500 Vertices

Figure B.14: Evolved Barabasi-Albert Model with 500 Vertices

Figure B.15: Barabasi-Albert Model with 1000 Vertices

Figure B.16: Evolved Barabasi-Albert Model with 1000 Vertices

Figure B.17: Forest Fire Model with 100 Vertices

Figure B.18: Evolved Forest Fire Model with 100 Vertices

Figure B.19: Forest Fire Model with 250 Vertices

Figure B.20: Evolved Forest Fire Model with 250 Vertices

Figure B.21: Forest Fire Model with 500 Vertices

Figure B.22: Evolved Forest Fire Model with 500 Vertices

Figure B.23: Forest Fire Model with 1000 Vertices

Figure B.24: Evolved Forest Fire Model with 1000 Vertices

Appendix C

Empirical Cumulative Distributions of Fitness Results by Experiment and Objective

APPENDIX C. DISTRIBUTION OF FITNESS RESULTS

Figure C.1: Distribution of Differences in Average Geodesic Path Lengths for GR-100

Figure C.2: Distribution of Differences in Clustering Coefficients for GR-100

Figure C.3: Distribution of Differences in In-Degree Distributions for GR-100

Figure C.4: Distribution of Differences in Out-Degree Distributions for GR-100

Figure C.5: Distribution of Differences in Average Geodesic Path Lengths for GR-250

Figure C.6: Distribution of Differences in Clustering Coefficients for GR-250

Figure C.7: Distribution of Differences in In-Degree Distributions for GR-250

Figure C.8: Distribution of Differences in Out-Degree Distributions for GR-250

Figure C.9: Distribution of Differences in Average Geodesic Path Lengths for GR-500

Figure C.10: Distribution of Differences in Clustering Coefficients for GR-500

Figure C.11: Distribution of Differences in In-Degree Distributions for GR-500

Figure C.12: Distribution of Differences in Out-Degree Distributions for GR-500

Figure C.13: Distribution of Differences in Average Geodesic Path Lengths for GR-1000

Figure C.14: Distribution of Differences in Clustering Coefficients for GR-1000

Figure C.15: Distribution of Differences in In-Degree Distributions for GR-1000

Figure C.16: Distribution of Differences in Out-Degree Distributions for GR-1000

Figure C.17: Distribution of Differences in Average Geodesic Path Lengths for BA-100

Figure C.18: Distribution of Differences in Clustering Coefficients for BA-100

Figure C.19: Distribution of Differences in In-Degree Distributions for BA-100

Figure C.20: Distribution of Differences in Out-Degree Distributions for BA-100

Figure C.21: Distribution of Differences in Average Geodesic Path Lengths for BA-250

Figure C.22: Distribution of Differences in Clustering Coefficients for BA-250

Figure C.23: Distribution of Differences in In-Degree Distributions for BA-250

Figure C.24: Distribution of Differences in Out-Degree Distributions for BA-250

Figure C.25: Distribution of Differences in Average Geodesic Path Lengths for BA-500

Figure C.26: Distribution of Differences in Clustering Coefficients for BA-500

Figure C.27: Distribution of Differences in In-Degree Distributions for BA-500

Figure C.28: Distribution of Differences in Out-Degree Distributions for BA-500

Figure C.29: Distribution of Differences in Average Geodesic Path Lengths for BA-1000

Figure C.30: Distribution of Differences in Clustering Coefficients for BA-1000

Figure C.31: Distribution of Differences in In-Degree Distributions for BA-1000

Figure C.32: Distribution of Differences in Out-Degree Distributions for BA-1000

Figure C.33: Distribution of Differences in Average Geodesic Path Lengths for FF-100

Figure C.34: Distribution of Differences in Clustering Coefficients for FF-100

Figure C.35: Distribution of Differences in In-Degree Distributions for FF-100

Figure C.36: Distribution of Differences in Out-Degree Distributions for FF-100

Figure C.37: Distribution of Differences in Average Geodesic Path Lengths for FF-250

Figure C.38: Distribution of Differences in Clustering Coefficients for FF-250

Figure C.39: Distribution of Differences in In-Degree Distributions for FF-250

Figure C.40: Distribution of Differences in Out-Degree Distributions for FF-250

Figure C.41: Distribution of Differences in Average Geodesic Path Lengths for FF-500

Figure C.42: Distribution of Differences in Clustering Coefficients for FF-500

Figure C.43: Distribution of Differences in In-Degree Distributions for FF-500

Figure C.44: Distribution of Differences in Out-Degree Distributions for FF-500

Figure C.45: Distribution of Differences in Average Geodesic Path Lengths for FF-1000

Figure C.46: Distribution of Differences in Clustering Coefficients for FF-1000

Figure C.47: Distribution of Differences in In-Degree Distributions for FF-1000

Figure C.48: Distribution of Differences in Out-Degree Distributions for FF-1000

Appendix D

GP Convergence Plots

In this appendix, the average best fitness values by generation are presented for all runs of each experiment. The fitness values were normalized using the maximum and minimum of the per-generation averages so that all objectives could be plotted on a common scale within a single chart.
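The normalization described above is a linear min-max rescaling of each averaged series. A minimal sketch of that rescaling (the fitness values below are illustrative only, not taken from the experiments):

```python
def min_max_normalize(values):
    """Rescale a series linearly so its minimum maps to 0.0 and its maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Average best fitness per generation (illustrative values only)
avg_best = [120.0, 60.0, 30.0, 15.0, 12.0]
normalized = min_max_normalize(avg_best)  # first entry becomes 1.0, last becomes 0.0
```

Because each objective is rescaled against its own extremes, the plotted curves show relative convergence behaviour rather than absolute fitness magnitudes.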

APPENDIX D. GP CONVERGENCE PLOTS

Figure D.1: GR-100 Convergence Plot

Figure D.2: GR-250 Convergence Plot

Figure D.3: GR-500 Convergence Plot

Figure D.4: GR-1000 Convergence Plot

Figure D.5: BA-100 Convergence Plot

Figure D.6: BA-250 Convergence Plot

Figure D.7: BA-500 Convergence Plot

Figure D.8: BA-1000 Convergence Plot

Figure D.9: FF-100 Convergence Plot

Figure D.10: FF-250 Convergence Plot

Figure D.11: FF-500 Convergence Plot

Figure D.12: FF-1000 Convergence Plot

Figure D.13: APA-100 Convergence Plot

Figure D.14: APA-250 Convergence Plot

Figure D.15: APA-500 Convergence Plot

Figure D.16: APA-1000 Convergence Plot