<<

arXiv:1909.13243v2 [nlin.AO] 21 Sep 2020 hyaenwapida h rvddeape rmthe from examples provided illustrate. the which literature as in of- scientific those applied which were now from by techniques are discipline different way The they a one in feature. invented than given ten more any measure often and, to is system) complex there single a therefore, within even scientific and (and systems domains complex take different can in feature forms A wedifferent examples. article, by of this accompanied each them of quantify them part to available main means the mathematical In discuss [1]. to in together. identified and used tems field, be the can of kinds concepts different and of claims measures allow basic the clar- to bring com- ity and disagreement quantifying unnecessary to avoid to approach plexity should practitioner’s whole a any as inform phenomenon the never and complexity this respecting while article complexity This insight. quantifying to p.1]. guide [2, ideas a complexity” intuitive is by our mea- meant Murray all is different capture laureate what of to about Nobel required variety be “A would implies on, late sures This early the do noted As cannot them. Gell-Mann complexity of to all justice. number in it single found a are assigning which that phe- features of multi-faceted of all variety a a not have is systems Complexity complex the and and growing. nomenon, systems, still complex is study list to began first scientists ∗ [email protected] h eto h ok laeawy ieti article this cite always Please book. the book. with the together of to contrast rest in the knowledge, separately mathematical available requires chapter it [1]. this as complex Press make a to University decided is Yale We “What with book published our system?”, of chapter a fications, ebgnb umrsn h etrso ope sys- complex of features the summarising by begin We of aspect an only measures measure any that fact The since proposed been have complexity of measures Many etosI Io hstx r,wt ml modi- small with are, text this of XI – II Sections i o uniyn etrso opeiyars h scien the across complexity of features comple and quantifying measure 1980s for the they kit from how measures and measur -theoretic purported whether classic of explain exa selection and by a analyse application illustrate, we We and complexity. quantified, be of dev recently measures a understand use to We self-organisation. and nonlinearity, opeiyi ut-aee hnmnn novn vari a involving phenomenon, multi-faceted a is Complexity eateto hlspy nvriyo rso,U.K. Bristol, of University Philosophy, of Department colo ahmtc,Uiest fBitl U.K. Bristol, of University Mathematics, of School esrn complexity Measuring Dtd etme 2 2020) 22, September (Dated: aoieWiesner Karoline ae Ladyman James indaoe oeaeol rsn nfntoa rliv- or men- functional as in particular, present only In are are products some systems. above, all complex tioned Not the all lists in identified. present I complexity Table of ‘conditions features non-equilibrium). the feedback, interactions, are and disorder, latter elements of The (numerosity en- complexity’ the way. for to some open in are that vironment systems from in feedback in- ‘emergent’ interactions the disordered and previous the many parts many the are the of between products teractions because the arise nutshell, that com- properties a of In products and distinguish plexity. We complexity for systems. 
conditions complex only between living some systems, or list some complex functional a all systems; by by complex distilled exhibited by have are exhibited We features are that [1]. features the sciences of across social applicable and is natural which ‘complexity’ derstanding o h rttm.Frhrmteaia akrudis background mathematical Appendix. Further the in time. given first the for us complexity. of help features now measuring complex- They and measuring ity between decades. distinction the the over understand- understand our complexity played of of have development they the ing However, in role practice. tools important in not be an used are to be they measures can Hence, that as systems. real-world than to rather applied constructed experiments were thought measures ‘classic’ as These and depth. complexity, logical statistical complexity, on complexity, measure effective discussions including effective many discussed, in are from mentioned subject, complexity the 1990s, of and measures ar- classic, 1980s this now the of by section few, final com- a the of ticle, In ‘measure makes purported measures. insight any actually what this plexity’ rather ask Accepting to complexity necessary such. of it as aspects complexity com- only than of quantify measures can quantitative all plexity phenomenon, single a nrcn ok ehv eeoe rmwr o un- for framework a developed have we work, recent In hogot emnlg sepandwe ti used is it when explained is terminology Throughout, than rather features of collection a is complexity Since lpdrgru rmwr o complexity for framework rigorous eloped ces. 90.Ti okgvsterae tool a reader the gives work This 1990s. .WA SACMLXSYSTEM? COMPLEX A IS WHAT I. pe o etrso opeiycan complexity of features how mple, so opeiyta aefudwide found have that complexity of es t ffaue nldn disorder, including features of ety iy eas ics oeo the of some discuss also We xity. ∗ 2

ence, and counting is among the most basic scientific TABLE I. The features of complexity, as identified in [1] where methods. Counting is the foundation of measurement they have been grouped into ‘conditions for complexity’ and ‘products of complexity’. because quantities of everyday relevance such as length, mass and time can be counted in units such as metres, grams and seconds. Counting alone does not tell us what Conditions for complexity counts as ‘more’ in the sense of ‘more is different’, be- Numerosity of elements cause, when we consider emergent behaviour, how many Numerosity of interactions is ‘enough’ depends on the system. For some systems it is the high number of elements that is relevant for com- Disorder plexity, as in fluid dynamical systems; for others it is Non-equilibrium (openness) the high number of interactions, as in a small group of Feedback swarming animals or small insect colonies; or it is both, as in the brain and many (if not most) complex systems. Products of complexity The number of interactions is as important as the number Nonlinearity of elements in the system. Self-organisation Robustness of order III. DISORDER AND DIVERSITY Nestedness Disorder and diversity are related, and the words used Robustness of function to describe them overlap and are often not clearly de- Adaptive behaviour fined. ‘Disorder’ usually refers to randomness, which is to Modularity say lack of correlation or structure. Disorder is therefore just the lack of order. It is worth stating this explicitly Memory since it follows that any measure of order can be turned into a measure of disorder and vice versa. A disordered system is one that is random in the sense of lacking correlations between parts over space or time ing complex systems (robustness of function, adaptive or both, at least to some extent. It is worth remem- behaviour, modularity, memory). This is true by defi- bering that complex systems are never completely dis- nition of these properties. Examples of non-functional / ordered. In complex systems, disorder can exist at the non-living systems are the universe and many condensed- lower level in terms of the stochasticity in the interac- matter systems, in particular when they exhibit phase tions between the parts, as well as at the higher level, transitions. in terms of the structure which emerges from them and There is an important distinction between the order of which is never perfect. Thermal fluctuations are a form a complex systems and the order produced by a complex of disorder relevant to the dynamics of complex systems. system. An example of order produced by a complex sys- For example, thermal fluctuations are necessary for most tem is a snowflake produced by the weather and climate biochemical processes to take place. The term ‘noise’ or system. Complex systems are always dynamic, but they ‘thermal noise’ is used more frequently than ‘disorder’ in often produce static order. Another example of a com- this context. plex system that produces order is a honey bee hive; the A real or purely mathematical random system would order of the hive is the self-organised patterns of labour not be described as ‘diverse’. Instead, the term ‘diversity’ distribution for example; the (static) order produced by is often used to describe inhomogeneity in element type the hive are honey combs with their intricate hexagonal – that is types of different kinds. Measures have been structure. In short, a is a system that designed specifically to address diversity in this sense. 
exhibits all of the conditions for complexity and at least Some of these are discussed at the end of this section. one of the products emerging from the conditions. Here, Interactions can be disordered in time or in their na- we will not discuss these features much further. For de- ture. Elements can be disordered in terms of type. The tails, see [1]. structure formed by a complex system can be disordered We now go through each feature in Tab. I and give in its spatial configuration. All these kinds of disorder examples of measures that quantify them. are relevant, and all are quantifiable. Mathematically, disorder is described with the lan- guage of probability (see Appendix A for a brief introduc- II. NUMEROSITY tion to probability theory). The elements or interactions which are disordered are represented as a random vari- The most basic measure of complexity science is the able X with probability distribution P over the set of counting of entities and of interactions between them. possible events x (events are elements or interactions).X Numerosity is the oldest quantity in the history of sci- A standard measures of disorder is the variance. The 3 variance can be used for events that are numeric, such as functions of n, M, and p only. These regularities emerge the number of edges per node in a network, but not for out of the disorder in the formation process. types, such as species in a population. The variance of a The variance can be used to quantify the disorder of random variable X, the network-formation process after assigning numeric values to the events ‘edge’ and ‘no edge’ – for example 2 Var X := E[(X E[X]) ] , (1) 1 and 0, respectively. The variance of the binary prob- − ability distribution P = p, 1 p of the Erd¨os-Renyi measures the average deviation from the mean. The random graph model is Var{ X −= p(1} p), which is max- equivalent notation in the physics literature is Var X = imal for p = 1/2. The Shannon entropy− of the network- (X X )2 . The broader a distribution of possible h − h i i formation process can be computed without assigning event values is the higher, in general, the variance. A sec- numerical values to the events. The Shannon entropy ond standard measure of disorder, the Shannon entropy, of the binary probability distribution P = p, 1 p is a function from information theory (see Appendix B for of the Erd¨os-Renyi random graph model is {H(X−) =} a brief introduction to information theory). The Shan- p log p (1 p) log(1 p), which is also maximal for non entropy of a random variable X with probability −p = 1/2.− Both− measures− are zero when p = 0 and, due distribution P over events x is defined as to symmetry, when p = 1. If one were to measure the disorder in the final network structure itself, the variance H(X) := P (x) log P (x) . (2) − and the Shannon entropy should be computed from the xX∈X probability distribution over the node degrees. The re- sult would be equivalent to the previous one in the sense The Shannon entropy measures the amount of uncer- that the degree distribution is trivial for p = 0 (com- tainty in the probability distribution P . In the case of all pletely disconnected) and p = 1 (fully or nearly fully probabilities being equal, the distribution is a so-called connected), in which case both measures yield the value uniform distribution. In this case all events are equally 0. For non-trivial network structures both measures are likely, and the uncertainty, and hence the Shannon en- non-zero. 
It was remarked above that a measure of order tropy, over events is maximal. The Shannon entropy is can be used as a measure of lack of disorder and vice zero when one probability is one and the others are zero. versa. Hence, any of the existing measures of network If, for example, the events x were the possible outcomes structure, such as average path length or clustering, can of an election, then H(X) would quantify the difficulty be used to measure disorder by monitoring their change. in predicting the actual outcome. This approach to measuring disorder has been used in To illustrate these measures of disorder consider a net- the study of Alzheimer’s disease and its effect on neural work, such as the World Wide Web or a neural network. connectivity in the human brain (see [5] and references The disorder relevant to a network is structural disorder. therein). A network with many nodes and edges between every Temporal disorder in a sequence of events, such as the pair of nodes is considered a network with no disorder. The origin of a given network structure is often studied sequence of daily share prices on a stock market, is de- scribed with the language of stochastic processes. Dis- with network-formation models. For an overview of this and other network-formation models see, for example, order in a stochastic process is the lack of correlations between past and future events. A stochastic process is [3]. One of the first network-formation models is the so- called Erd¨os-Renyi random graph model (or just random defined as a sequence of random variables Xt ordered graph model, Poisson model, or Bernoulli random graph) in time (see Appendix A for more details). Disorder [4]. The Erd¨os-Renyi model is parametrised by the num- is the lack of predictability of future events when past ber of nodes n, the maximum number of edges M, and a events are known. To quantify disorder in a sequence parameter p which is the probability of an edge being cre- X1X2 ...Xn, the joint probability over two or more of the ated between two existing nodes. Initially, the network random variables is required, written as P (X1X2 ...Xn). has n nodes and no edges. In a subsequent formation pro- This is the probability of the events occurring together cess, with probability p, two nodes are connected by an (jointly). When a joint probability of two events is edge. When p = 0, the resulting network after many rep- known, then, in addition to their individual probability, etitions is a set of nodes without any edges. For p = 1, it is known how likely they are to occur together. An example is the probability of certain genetic mutations the result is a highly connected network. For p some- where in between 0 and 1, the formation process yields a being present and the probability of two mutations being present in the same genome. The joint Shannon entropy network with links between some nodes and some nodes having more links than others. In this case, the proba- H(X1X2 ...Xn) over this distribution, bilistic nature of the link formation results in a disordered H(X X ...X ) := P (x x ...x ) structure of the network. Hence, the disorder of the for- 1 2 n − 1 2 n mation process is taken as a proxy for the disorder of the xnX∈X n (3) final network structure. Several properties of the fully log P (x1x2 ...xn) , formed network, such as the average path length and the · average number of edges per node, can be expressed as captures the lack of correlations. 
A measure of average 4 temporal disorder is the so-called Shannon entropy rate, means. A distribution with a variance of 10 and a mean of 20 might be considered more diverse than a distribu- 1 hn := H(X1X2 ...Xn) . (4) tion with a variance of 10 and a mean of 1, 000. Their n coefficient of variation would reflect this difference. The Shannon entropy rate measures the uncertainty in Scott Page, in his book ‘Diversity and Complexity’ [9], the next event, X , given that all n 1 previous events distinguishes between three kinds of diversity: diversity n − X1 ...Xn−1 have been observed. The lower the entropy within a type, diversity across types, and diversity of rate, the more correlations there are between past and community composition. All three are measured by the future events and the more predictable the process is. Shannon entropy. In fact, they differ only in what consti- A fly’s brain is an example of a complex system where tutes an event in the definition of the random variable. temporal disorder has been measured experimentally. [6] Other measures of diversity are the so-called ‘distance’ recorded spike trains of a motion-sensitive neuron in the measures and ‘attribute’ measures. Distance measures of fly’s visual system. From repeated recordings of neu- diversity, such as the Weitzman Diversity, take into ac- ral spike trains, they constructed a probability distri- count not only the number of types, but also how much bution P (X1X2 ...Xk) over spike trains of some length they differ from each other [10] and therefor require a k. From this probability distribution they computed the mathematical measure of distance. Attribute-diversity joint Shannon entropy H(X1X2 ...Xk) and the entropy measures assign attributes to each type and numerically 1 rate k H(X1X2 ...Xk). They repeated the experiments weigh the importance of each attribute. For example, after exposing the fly to the controlled external stimulus to compute an attribute diversity of phenotypes more of a visual image and computed the Shannon entropy and weight is put on traits with higher relevance for survival entropy rate again. They interpreted the difference in the (see [9] for more details). entropies between the two experiments as the reduction in disorder of the neural firing signal when a stimulus is present. IV. FEEDBACK One speaks of the ‘diversity’ of species in an ecosys- tem or of diversity of stocks in an investment portfolio The interactions in complex systems are iterated so rather than ‘disorder’. In the language of diversity, the that there is feedback from previous interactions, in the elements, species or stocks, are called ‘types’. The sim- sense that the parts of the system interact with their plest measure of diversity is the number of types or the neighbours at later times depending on how they inter- logarithm of that number. A more informative measure acted with them at earlier times. And these interactions takes into account the frequency of each type, this being take place over a similar time scale to that of the dy- the number of individuals of each species in a habitat namics of the system as a whole. There is no measure of or the number of each stock in the portfolio. Treating feedback as such. Instead, the effects of feedback such as such frequencies as probabilities, a random variable X of nonlinearity or structure formation are measured. Hence, types can be constructed, and the Shannon entropy the mathematical tools that are used to measure order H(X)X is used as a measure of type diversity. 
In ecology, and nonlinearity can also be indicators of feedback. diversity is measured using the entropy power, 2H(X) (if A common way to study feedback is to construct a the entropy is computed using log base 2 or eH(X) if the mathematical model with feedback built into it. If the entropy is computed using log base e) [7]. It behaves model reproduces the observed dynamics well, this sug- similar to the entropy itself but has a useful interpreta- gests the presence of feedback in the system that is being tion: the entropy power is the number of species in a modelled. An example is the dynamics of a population hypothetical population in which all species are equally of predator and prey species such as foxes and rabbits. abundant and whose species distribution has the same The growth and decline of these species can be modelled Shannon entropy as the actual distribution. If the types by the Lotka-Volterra differential equation model. It de- are numeric, such as the size of pups in an elephant seal scribes the change over time in population size of two colony [8], diversity can be measured using the variance. species, the prey x and its predator y, using the four pa- Often a normalised form of variance, the coefficient of rameters A, B, C, and D. A and C are parameters for variation, is used: the speed of growth of the respective species. B and D quantify the predation. The change over time in popula- √VarX cv := . (5) tion size,x ˙ andy ˙, is given by the two coupled equations X h i x˙ = Ax Bxy , − (6) The coefficient of variation is the square root of the vari- y˙ = Cy + Dxy . ance (also known as the standard deviation) divided by − the mean. Its behaviour is equivalent to that of the vari- The fact that x and y appear in both equations ensures ance. Broader distributions, such as a larger range of that there is a feedback between the size of each popula- pup sizes in an elephant seal colony, result in a higher tion. If B or D are zero, there is no feedback. coefficient of variation. However, it allows the compari- For certain values of the parameters A, B, C, and son of distributions with the same variance but different D the number of individuals of each species oscillates. 5

When the overabundance of predators reduces the num- of the cell [16]. ber of prey to below the level needed to sustain the preda- The above tools for analysing feedback have in common tor population but the resulting decline in the number of that they do not assign a number to the phenomenon, as predators allows the prey to recover, a cycle of growth is done in the case of disorder or diversity. Instead, in and decline results. For such oscillations to happen the most practical applications feedback is a tunable interac- time scale of growth, captured by A and C, needs to be tion parameter of a model or an observable consequence similar to the time scale of predation, captured by B and of the interactions which are programmed into a model. D. Oscillations in predator-prey populations is a classic example of feedback. A widely used computational tool for studying feed- V. NON-EQUILIBRIUM back are so-called agent-based models. These models are computational simulations of agents undergoing repeated Complex systems are open to the environment, and interactions following simple rules. In such a simulation they are often driven by something external. Non- a usually large set of agents is equipped with a small set equilibrium physical systems are treated by the theo- of actions that each agent is allowed to execute and a ries of non-equilibrium thermodynamics [17] and stochas- small set of (usually simple) rules defining the interac- tic processes [18]. Stochastic complex systems, such as tion between the agents. In any given round of a sim- chemical reaction systems, are often studied using the ulation an agent and an action, or two agents and an statistics of Markov chains. Consider a system repre- interaction, are picked at random. If the action (interac- sented by a set of states, S, through which the system tion) is allowed, it is executed. An agent-based simula- evolves in discrete time steps. Let Pij be a matrix { } tion usually consists of many thousands of such rounds. of time-independent probabilities of transitioning from One of the first agent-based models was the sugarscape state i to state j, with Pij = 1,[19] for all i S. Let j ∈ model, pioneered by the American epidemiologist Joshua πi be the probability ofP being in state i. If there exists a Epstein and computational, social and political scientist probability distribution π∗ such that, for all j, Robert Axtell [11]. The sugarscape model is a grid of ∗ ∗ cells, some of which contain ‘sugar’; the others contain πj = Pij πi , (7) nothing. Agents ‘move’ on this landscape of cells and Xi ‘eat’ when they find a cell containing sugar. Even this it is called the invariant distribution. In a stochastic very simple setup produces emergent phenomena such as model of a system of chemical reactions, for example, the feedback effect of the-rich-get-richer. the chemical composition is represented as a probability Agent-based models are frequently used to study feed- distribution, and chemical reactions are represented as back in the coherent dynamics of animal groups [12]. [13] stochastic transitions from one reactant to another. A describe observations of army ants in Soberania National system is in chemical equilibrium if the chemical compo- Park in Panama. Army ants make an excellent study sition is time-invariant. 
Reactions are still taking place case for collective phenomena since they are able to form in chemical equilibrium, but the depletion of one reac- large-scale traffic lanes to transport food and building tant is compensated by other transformations such that material over long distances. They even form bridges out the overall concentrations remain largely unchanged. A of ants to avoid ‘traffic congestion’. These collective phe- general framework to model non-equilibrium stochastic nomena are impossible without the presence of feedback. dynamical systems is that of stochastic differential equa- The authors set up an agent-based simulation with sim- tions [20]. ple movement and interaction rules for individual ants. For systems for which a description in terms of chem- Feedback is built in as an ant’s tendency to avoid collision istry or thermal physics is unhelpful, such as the brain or with other ants and in its response to local pheromone the World Wide Web, information theory is often used concentration. The simulations reproduce the observed to describe the equivalent of a non-equilibrium state. lane formations and the minimisation of congestion. Such Based on the probability distribution over the relevant a simulation is not to be confused with the measurement state space, a measure related to the mutual information of actual feedback in a real system. (see Section VI) quantifies the distance to an equilib- There are other notions of feedback in the literature on rium state. Consider the probability distribution P over complex systems. The computational notion of feedback current state space and the corresponding equilibrium is to ‘feed back’ the output of a computation as input distribution Q. TheX amount of non-equilibrium is then into the same computation. In this way, the outcome of quantified by the so-called Kullback-Leibler divergence future computations depends on the outcome of previous (or relative entropy) D(P Q): computations. This kind of feedback is particularly im- || P (x) portant for those who view nature to be inherently com- D(P Q)= P (x) log , (8) putational [14, 15]. On this view, any loop in the compu- || Q(x) xX∈X tational representation of a natural system indicates the presence of feedback. Nobel Laureate Paul Nurse made which is zero only if P = Q, in other words only if the a similar point when presenting his computational view system is in equilibrium. Examples where this has been 6 used are the predictive brain model [21] and stochastic time sequences [22]. cov(X, Y ) corr(X, Y ) := , (10) σX σY where σ is the square root of the variance, known as the VI. SPONTANEOUS ORDER AND SELF-ORGANISATION standard deviation. A measure of correlation derived from information the- ory is the mutual information. For two random variables Perhaps the most fundamental idea in complexity sci- X and Y , the mutual information is a function of the ence is that of order in a system’s structure or behaviour Shannon entropy H (see Section III): that arises from the aggregate of a very large number of disordered and uncoordinated interactions between ele- I(X; Y )= H(X)+ H(Y ) H(X, Y ) . (11) − ments. Such self-organisation can be quantified by mea- suring the order that results – for example, the order in The mutual information measures the difference in un- some data about the system. However, measures of order certainty between the sum of the individual random vari- are not measures of self-organisation as such since they able distributions and the joint distribution. 
If there are cannot determine how the order arose. This is because any correlations between the two variables, the uncer- the order in a string of numbers is the same regardless of tainty in their joint distribution will be lower than the its source. Whether the order is produced spontaneously sum of the individual distributions. This is a mathe- as a result of uncoordinated interactions in the system matical version of the often repeated statement that ‘the or whether it is the result of external control cannot be whole is more than the sum of its parts’. If the whole inferred from measuring the order without background is different from the sum of the parts, it means that knowledge about the system. For example, the orderly there are correlations between the parts. For two com- traffic lanes to and from food sources formed by an ant pletely independent random variables, on the other hand, colony are considered the result of a self-organising pro- H(X)+ H(Y )= H(X, Y ) and the mutual information is cess since there is no mechanism which centrally controls zero. the ants’ behaviour, while the orderly checkout lines in a An example of covariance as a measure of order is the supermarket are the result of a centrally managed con- study of bird flocking by William Bialek, Andrea Cav- trol system. A high measure of order, even when self- agna, and colleagues [23]. They filmed flocks of starlings organised, is not to be confused with a high level of com- in the sky of Rome (containing thousands of starlings) plexity since order is but one aspect of complexity. How- and extracted the flight paths of the individual birds ever, the plethora of measures of order which are labelled from these videos. Each bird’s different flight directions as measures of complexity reflects the ubiquity of order over time were represented as a random variable, and the in complex systems and explains the frequent use of order random variables of all birds were used to compute their as a proxy for complexity. pairwise covariances.[24] This list of covariances was fed into a computer simulation that modelled the flock of Complex systems can produce order in their environ- birds as a condensed matter system, which is defined by ment. It is important to remember that the order pro- the interaction between close-by ‘atoms’ only. The com- duced by the system is different from the order in the puter simulation of such a very simple model with pair- system itself. For example, the order of hexagonal wax wise interactions only and no further parameters, pro- cells built by honey bees is order produced by the system, duced a self-organising system that very closely resem- while division of labour in the hive is order in the sys- bled the self-organising movement originally observed. A tem. The hexagonal honeycomb structures are a form of similar analysis was been done on network data of cul- spatial correlation which can be quantified by correlation tured cortical neurons, corroborating the idea that the measures, some of which are discussed in the following. brain is self-organising [25]. A correlation function is a means to measure depen- The order in a flock of starlings is a spatial order persis- dence between random variables; therefore, it is a sta- tent over time. Systems in which the focus is more on the tistical measure. The covariance is a standard measure temporal aspect of the order are neurons and their spik- of correlation. 
For any two numeric random variables X ing sequences, for example, or the monsoon season and its and Y , the covariance, patterns. Order in these systems is studied by represent- ing them as sequences of random variables X1X2 ...Xt cov(X, Y )= E[XY ] E[X]E[Y ] , (9) with a joint probability distribution P (X X ...X ). − 1 2 t Such sequences we encountered above in the study of is the difference between the product of the expectations disorder. Several authors, independently, introduced the and the expectation of the product. If the two random mutual information between parts of a sequence of ran- variables are uncorrelated, this difference is zero. From dom variables as a measure of order in complex sys- the covariance a dimensionless correlation measure is de- tems, under the names of effective measure complexity rived, the so-called Pearson correlation. It is the most (EMC) [26], predictive information (Ipred) [27], and ex- standard measure of correlation and defined as follows: cess entropy (E) [28]. Consider the infinite sequence of 7 random variables X−tX−t+1 ...X0X1X2 ...Xt, which is and references therein). Because 3/4 is less than 1, a also called a stochastic process. The information theo- mammal’s metabolism is more efficient the bigger the retic measure Ipred (or EMC or E) of correlation between mammal; an elephant requires less energy per unit mass the two halves of a stochastic process is defined as the than a mouse. This is a nonlinear effect since doubling mutual information between the two halves: the body size does not double the energy requirements. It is also another instance of the often repeated, but con- Ipred := lim I(X−tX−t+1 ...X−1; X0X1 ...Xt) . (12) fused, statement that, in complex systems, the whole is t→∞ more than the sum of its parts. The whole is never more There is, of course, never an infinite time course of data, than the sum of its parts when interactions are taken into and the limit t is never taken in practice. account. [29] measured→ the ∞ predictive information in retinal gan- The relation between taxpayer bracket and number of glion cells of salamanders. Ganglion cells are a type of people in this bracket is an instance of a statistical dis- neuron located near the inner surface of the retina of tribution that exhibits a power-law behaviour. Other the eye. In the lab, the salamanders were exposed, al- examples of statistical distributions with a power-law be- ternatively, to videos of natural scenes and to a video haviour are the number of metropolitan areas relative to of random flickering. While a video was showing, the their population size, the number of websites relative to researchers recorded a salamander’s neural firings. Re- the number of other websites linking to them, and the peated experiments allowed them to infer the joint prob- number of proteins relative to the number of other pro- ability distribution P (X−t ...Xt) over the ganglion cell teins that they interact with (for reviews, see [32, 33]). firing rates and to compute the predictive information Statistical distributions with a power-law behaviour Ipred contained in it. They found that Ipred was highest are defined in terms of random variables. Consider a dis- when a salamander was exposed to naturalistic videos of crete random variable X with positive events x> 0 and underwater scenes. This shows that the order in the nat- probability distribution P . The distribution P follows a ural scenes is reflected in the order of the neural spike power law if sequences. 
The authors also think that it shows the neu- −γ ral system not only responds to a visual stimulus, but P (x)= cx , (13) also makes predictions about it. for some constant γ > 1 and normalisation constant Quantifying predictability and actually predicting 1−γ c = (γ 1)/(xmin ), where xmin is the smallest of the x what a system is going to do are, of course, two dif- values.− A cumulative distribution with a power-law form ferent things. In order to make a prediction one first is also called a Pareto distribution; a discrete distribution has to have a model, for example, inferred from a set of with a power-law form is also called a Zipf distribution measured data. (for a review, see Mitzenmacher 34). Eq. 13 can be writ- ten as log P (x) = log c γ log x, which says that plotting log P (x) versus log x yields− a straight line with slope γ. VII. NONLINEARITY Therefore, the presence of a power law in real-world− dis- tributions is often determined by fitting a straight line There are several different phenomena addressed with to a log-log plot of the data. Although this is common the same label of ‘nonlinearity’. Each phenomenon re- practice, there are many problems with this method of quires its own measure. Power laws are probably the identifying a power-law distribution [35]. most prominent examples of nonlinearity in complex sys- A power-law distribution has a well-defined mean for tems. But correlations as a form of nonlinearity are γ 1 over x [1, ) and a well-defined variance for equally important, and these two are not completely sep- γ ≤2. Power-law∈ distributions∞ are members of the larger arate phenomena. family≤ of so-called fat-tailed distributions. These prob- ability distributions are distinct from the most common distributions, such as the Gaussian or normal distribu- A. Nonlinearity as Power Laws tion, in that events far away from the mean have non- negligible probability. Such rare events have obtained A power law is a relation between two variables, x the name ‘black swan’ events; they come as a surprise and y, such that y is a function of the power of x – for but have major consequences [36]. example, y = xµ. Quite a few phenomena in complex systems, such as the relation between metabolism and body mass or the number of taxpayers with a certain B. Nonlinearity versus Chaos income and the amount of this income, follow a power law to some extent. The power law of metabolism for Nonlinearity in complex systems is not to be confused mammals was first discussed by Max [30] in 1932. It is with nonlinearity in dynamical systems. Nonlinear dy- now well established that, to a surprising accuracy, the namical systems are sets of equations, often determinis- metabolic rate of mammals, R, is proportional to their tic, describing a trajectory in phase space, either contin- body mass, m, to the power of 3/4: R m3/4 (see [31] uous or discrete in time. Some of these systems exhibit ∝ 8 chaos, which is the mathematical phenomenon of extreme against perturbation in the sense of maintaining its struc- sensitivity of the trajectory on initial conditions. An ex- ture or its function upon perturbation, which some refer ample of a discrete dynamical system exhibiting chaos is to as ‘stability’. Alternatively, a system might be robust the logistic map. The logistic map, xt+1 = rxt(1 xt) in the sense that it is able to recover from a perturbation; where t indexes time, is a simple model of population− this is also called ‘resilience’. 
dynamics of a single species, as opposed to two species, Strictly speaking, robustness is the property of a discussed above in the context of feedback. This map is model, an algorithm, or an experiment that is robust now a canonical example of chaos. against the change of parameters, of input, or of assump- Actual physical systems studied by dynamical systems tions. But usually, in the context of complex systems, ro- theory, such as a chaotic pendulum, need not have any bustness refers to the stability of structure, dynamics or of the features of complex systems. Certainly, chaos and behaviour in the presence of perturbation. All order and complexity are two distinct phenomena. On the other organisation must be robust to some extent to be worth hand, the time evolution of many complex systems is de- studying. Several tools are available for studying robust- scribed by nonlinear equations. Some climate dynamics, ness; the most frequently used are tools from dynamical for example, are modelled using the deterministic Navier- systems theory and from the theory of phase transitions. Stokes equations, which are a set of nonlinear equations Brief descriptions are given outlining the role of these describing the motion of fluids. Another example of a tools in the study of complex systems. nonlinear equation used to describe many complex sys- tems is the Fisher-KPP differential equation [37]. Origi- nally introduced in the context of population dynamics, A. Stability Analysis its application ranges from plasma physics to physiology and ecology. The system of predator and prey species sharing a habitat, which was discussed above (see the Lotka- Volterra population model in Section IV and Sec- C. Nonlinearity as Correlations or Feedback tion VII), is an example of a stable dynamical system. After some time the proportion of the two species be- comes either constant or oscillates regularly, independent For some the notion of nonlinearity in complex systems of the exact proportion of species in the beginning. A is synonymous with the presence of correlations (for in- dynamical system is called ‘stable’ if it reaches the same stance, [38]). If two random variables X and Y are in- equilibrium state under different initial conditions or if it dependent, their joint probability distribution P (XY ) is returns to the same equilibrium state after a small pertur- equal to the product distribution P (X)P (Y ). When this bation. Stability analysis is prevalent in physics, nonlin- equality does not hold, then there must be correlations ear dynamics, chemistry, and ecology. A reversible chem- between X and Y . ical reaction, for example, might be stable with respect Defining ‘nonlinearity’ in terms of the presence of cor- to forced decrease or increase of a reactant, which means relations is not to be confused with linear versus nonlin- the proportion of reactants and products returns to the ear correlations. In the language of statistical science, same value as before the perturbation. Other examples of two variables X and Y are linearly correlated if one can complex systems which are represented as dynamical sys- be expressed as a scaled version of the other, X = a+cY , tems are food webs with more than two species [40, 41], for some constants a and c. The Pearson correlation coef- genetic regulatory networks [42], and neural brain regions ficient, for example, detects linear correlations only. The [43]. 
mutual information, on the other hand, detects all corre- For any given dynamical system described by a state lations, linear as well as nonlinear. vector x and a set of (possibly coupled) differential equa- To others, mainly social scientists, ‘nonlinearity’ means tions dxi/dt = fi(x), a stable point, a so-called fixed that the causal links of the system form something more point, is a solution to the equations dxi/dt = 0. Stabil- complicated than a single chain. A system with causal ity analysis classifies these fixed points into stable and loops, indicating feedback, would count as ‘nonlinear’ in unstable ones (or possibly stable in one direction and this view [39]. unstable in another). Assuming the system is at one of The different definitions of nonlinearity discussed here its fixed points, the effect of a small perturbation on the are all ubiquitous in complex systems research, so it is system’s dynamics is found by analysing the Jacobian not surprising that nonlinearity is often mentioned as es- matrix J, a linearisation of the system, which is defined sential to complex systems. as

∂fi [Jij ]= . (14) VIII. ROBUSTNESS ∂xj  If the eigenvalues of the Jacobian evaluated at a given Several phenomena are often grouped together under fixed point all have real parts that are negative, then the umbrella of ‘robustness’. A system might be robust this point is a stable fixed point and the system returns 9 to the steady state upon perturbation. If any eigenvalue at which the magnetisation diverges and the system un- has a real part that is positive, then the fixed point is dergoes a phase transition. unstable and the system will move away from the fixed Another signature of a nearby tipping point is an in- point in the direction of the corresponding eigenvector, crease in fluctuations. In general, a perturbed system usually towards another, stable, fixed point. For an in- fluctuates around a steady state before settling back troduction to fixed-point analysis of dynamical systems, down. The larger the length or time scale on which the see, for example, [44]. Any stable fixed point is embed- fluctuations are correlated, the closer the system is to a ded in a so-called basin of attraction. The size of this tipping point. basin quantifies the strength of the perturbation which Experimentally, one might expose a system to increas- the system can withstand and, therefore, is a measure of ingly strong perturbations and measure the time it takes the stability of the system at the fixed point [44]. Stabil- the system to come back to its steady state. Such mea- ity analysis is widely used in ecology [45, 46]. surements yield the response of the system as a random Viability theory combines stability analysis of deter- variable S as a function of spatial coordinate x and time ministic dynamical systems theory with control theory t. The covariance cov(S(x,t),S(x + r,t + τ)) (see Sec- [47, 48]. It extends stability analysis to more general, tion VI) between the random variable at time t and spa- non-deterministic systems and provides a mathematical tial location x and the same variable at some later time framework for predicting the effect of controlled actions t+τ and some displaced location x+r is a measure of the on the dynamics of such systems, with the aim of reg- temporal and spatial correlations in time. The equivalent ulating them. Viability theory has been applied to the measure in physics is the so-called auto-correlation func- resilience of social-ecological systems [49, 50]. tion, denoted by C(r, τ), defined, in physics notation, as A similar, though mostly qualitative, use of the ideas C(r, τ)= S(x,t)S(x + r, t + τ) . (15) of stability and viability is found in the analysis of tipping h i points in climate and ecosystems. Tipping points are the It differs from the covariance by not subtracting the prod- points of transition from one stable basin of attraction to uct of the marginal expectations, S(x,t) S(x+r, t+τ) another, instigated by external perturbations [46]. (in statistics notation, E[S(x,t)]Eh[S(x +ihr, t + τ)]), fromi the expectation of the product (cf. eq. 9). When correlations decay exponentially in time, C is B. Critical Slowing Down and Tipping Points proportional to e−kτ , where k is an inverse time. Af- ter time τ =1/k correlations have decayed to a fraction The time it takes for a system to return to a steady 1/e of the value they had at time t, and τ = 1/k is the state after a perturbation is a stability indicator comple- so-called characteristic time scale. 
Equally, when corre- mentary to the fixed-point classification and the size of lations decay exponentially with distance, the distance the attractor basin. The longer it takes the system to r at which they have decayed to a fraction 1/e of the recover after a perturbation the more fragile the system |value| at r = 0 is the characteristic length scale. Crit- is. An increase in relaxation time can indicate a critical ical slowing| | down is accompanied by fluctuations that slowing down and the vicinity to a so-called tipping point decay slower than exponentially. The signature in the or phase transition. When a system is close to a tipping auto-correlation function C is a power-law decay either point, it does not recover anymore from even very small in time or in space, C r −α or C τ −α. Theoret- perturbations and moves to a different steady state which ically, at the point of a∝ phase | | transition∝ the correlation is possibly very far away from its previous state. Finding length becomes infinite. At that point the system has measurable indicators for nearby tipping points has been correlations on all scales and no characteristic length nor of considerable interest, in particular since ecological and time scale anymore. A correlation length which captures climate systems have begun to be characterised by sta- nonlinear correlations has been based on the mutual in- bility analysis and their fragility is being recognised more formation [52]. and more [46, 51]. An example of a complex system where critical slowing Mathematically, the vicinity to a tipping point is recog- down has been measured is a population of cyanobacte- nised by the functional dependency of the recovery time ria under increasing irradiation. The bacteria require on the perturbation strength. A system which is close to light for photosynthesis, but irradiation levels that are a tipping point exhibits a recovery time that grows pro- too high are lethal. For protection against destructively portional to perturbation strength to some power. This high irradiation levels, bacteria have evolved a shielding scaling law, associated with critical slowing down, is a mechanism. Annelies Veraart and colleagues exposed cell well-known phenomenon in the statistical mechanics of cultures of cyanobacteria to varying intensities of irradi- phase transitions. The standard example of a phase tran- ation and studied the subsequent shielding process [53]. sition in physics is the magnetisation of a material as a When the irradiation was relatively weak the bacterial function of temperature. The magnetisation density m is population quickly recovered after enacting the mutual proportional to the power of the temperature difference shielding mechanism by which the bacteria protect each −α to a critical temperature, m T TC . TC is the other. The stronger the radiation, the longer it took the critical temperature, the equivalent∝ | − to a| tipping point, population to build up the necessary shielding and re- 10 cover afterwards. Veraart and her colleagues measured ent instability of systems close to a tipping point remains a critical slowing down with a power-law-like behaviour. unresolved. For a review of self-organised criticality, see Once the light stress reached a certain threshold, equiv- [63] and [64]. alent to a critical point, the population collapsed. The new steady state that the population had tipped into was that of death. There are many other complex sys- D. 
Robustness of Complex Networks tems where critical slowing down has been suspected or observed – for example, in the food web of a lake after Network structures are ubiquitous in the interactions introduction of a predator species [54], in marine ecosys- tems in the Mediterranean after experimental removal of within a complex system. It is therefore not surprising that complex networks have grown into their own subfield the algal canopy [55], and in paleoclimate data around the time of abrupt climatic shifts [56]. For a review of of complex systems research. Many examples of networks have been mentioned in this book, from protein-protein critical slowing down in ecosystems, see [51]. For many more examples of criticality in complex systems, ranging interactions and neural networks to financial networks and online social networks. A network is a collection of from geological to financial systems, see [33]. nodes connected via edges. The degree of a node is the number of edges connected to it. The nature of nodes and edges differs for each system. In protein-protein net- C. Self-Organised Criticality and Scale Invariance works the nodes are proteins; two nodes are connected by an edge if they interact, either biochemically or through Power laws are an example of nonlinearity, as discussed electrostatic forces. A path is a sequence of nodes such in Section VII. Power-law behaviour is also an example that every two consecutive nodes in the sequence are con- of instability since a power-law behaviour in the recov- nected by an edge. The path length is the number of ery time is the signature of a system being driven to- edges traversed along the sequence of a path. The aver- wards a critical point, as discussed. It is, therefore, un- age shortest path is the sum of all shortest path lengths expected that many complex systems exhibit a power-law divided by their number. The phrase ‘six degrees of sepa- behaviour without any visible driving force and that they ration’ refers to the average path length between nodes in are nevertheless relatively stable. It appears that such social networks. This goes back to a now famous experi- systems stay close to a critical point ‘by their own choice’, ment performed by Stanley Milgram and his team in the a phenomenon called self-organised criticality. When it 1960s [65]. Milgram gave letters to participants randomly was discovered in a one-dimensional lattice of coupled chosen from the population of the United States. The let- maps [57] and later observed in a computer model of ters were addressed to a person unknown to them, and avalanches [58], it sparked a whole surge of studies into they were tasked with handing their letter to a person the mechanism behind self-organised criticality. This they knew by first name and who they believed would be surge was fueled by experimental observations of power- more likely to know the recipient. This led to a chain of law-like behaviour in a range of different systems, such passings-on for each letter. Surprisingly, letters reached as the Earth’s mantle and the magnitude and timing of the addressee, on average, after only five intermediaries. earthquakes and their afterquakes, or the brain and the The stability of average path length is one proxy for the timing of neurons [59–61]. In these systems, the rele- robustness of a network. When edges or nodes are re- vant observable, magnitude or timing, was measured as moved from the network and the average path length a histogram of frequencies of events. 
The probability stays more or less the same, the network is considered distribution P (x) of events x, constructed from the data, robust (in this respect). Reka Albert, Hawoong Jeong decays approximately as a power law, P (x) = cx−γ . As and Albert-L´asl´oBarab´asi found that the Internet and remarked above, the true functional form of these decays the World Wide Web are very robust in precisely this way is still debated; it is rarely more than an approximate [66]. The shortest path is hardly affected upon the ran- power law [35]. A power law implies so-called scale in- dom removal of nodes. Albert and her colleagues studied variance, since ratios are invariant to scaling of the ar- the structure of the World Wide Web and the Internet gument: P (cx1)/P (cx2) = P (x1)/P (x2). Scale invari- by taking real-world data and artificially removing nodes ance has been observed in many natural as well as social in a computer simulation. Plotting the shortest path complex systems [33], including scale invariance of the against the fraction of nodes removed from the network statistics of population sizes of cities [62]. revealed that the path length initially stayed approxi- While a power law in the auto-correlation function in- mately constant. Only once a large fraction of the nodes dicates instability and the vicinity of a critical point, a had been removed did the length suddenly and dramat- power law in a statistical distribution may indicate self- ically increase. This sudden increase is a form of phase organised criticality which is associated with stability. transition between a well-connected phase and a discon- Three decades after the discovery of self-organised crit- nected phase. It is seen already in the simplest model icality, there still is no known mechanism for it. The of networks, the Erd¨os-Renyi random graph model dis- seeming contradiction between the robustness of a com- cussed above (see Newman 3). Other real-world networks plex system, one of its emerging features, and the inher- exhibiting this structural form of robustness are protein 11 networks [67], food webs [68], and social networks [69]. can be interpreted as the joint probabilities Pr(i, j) for Robustness is always with respect to a feature or func- the event of an edge to be attached to a node in clus- tion. Robustness with respect to one feature might not ter i and the joint event of this edge to end on a node imply robustness with respect to another. The Internet, in cluster j. If these two events are independent, the for example, is robust against random removal of nodes joint probability distribution is equal to its product dis- (servers), but it is considerably less robust to targeted tribution, Pr(i, j) = Pr(i) Pr(j). If, on the other hand, removal of the highest-degree nodes. Pr(i, j) = Pr(i) Pr(j), then· the probability Pr(i, j) is de- pendent6 on whether· i and j are the same cluster (i = j) or not. With such a dependence present, there is modular- IX. NESTED STRUCTURE AND ity in the network. This condition of a joint probability MODULARITY distribution being a non-product distribution was a con- dition for ‘nonlinearity as correlations’ (Section VII C). Nested structure and modularity are two distinct phe- Newman and Girvan use this dependency condition nomena, but they may be related. ‘Nested structure’ to define modularity Q as any deviation of the joint refers to structure on multiple scales. 
IX. NESTED STRUCTURE AND MODULARITY

Nested structure and modularity are two distinct phenomena, but they may be related. 'Nested structure' refers to structure on multiple scales. 'Modularity' is a functional division of labour, or specialisation of function among parts, or a structural modularity, and frequently all of these together.

Structural modularity is a property much discussed especially in the context of networks, where it is referred to as 'clustering'. A cluster in a network is a collection of nodes that have many edges between one another compared to only few edges to nodes in the rest of the network. A simple example is the network of online social connections such as the network of 'friends' on Facebook. This network of social connections tends to be highly clustered since two 'friends' of any given user are more likely to also be 'friends' than to be unrelated.

Finding clusters in networks has received considerable attention, and many so-called clustering algorithms have been proposed. For an introduction to clustering algorithms, see, for example, [3]. All clustering algorithms follow a similar principle. Given a network, they initially group the nodes into arbitrary communities, and, by some measure unique to each technique, they quantify the linking strength within each community and that in between communities. Information-theoretic distance is one such measure [70]. The algorithms then optimise the communities by moving nodes between them until the linking strength within each community is maximised and the linking strength in between communities is minimised. There is usually no unique solution to this optimisation problem, and the identified clusters might differ from algorithm to algorithm. The presence of clusters alone is not sufficient for modularity since the network could consist of one gigantic cluster, with every node being connected to most other nodes, and have no modularity at all.

Once a community structure of a network has been identified, the extent to which it is modular can then be quantified. One of the first measures designed to quantify structural modularity is the modularity measure by Mark Newman and Michelle Girvan [71]. It assumes that a community structure of a given network has been identified and that k clusters of nodes have been found. From these k clusters, a k × k matrix e is constructed in which the entries e_{ij} are the fraction of edges that link nodes in cluster i to nodes in cluster j. The matrix entries can be interpreted as the joint probabilities Pr(i, j) for the event of an edge to be attached to a node in cluster i and the joint event of this edge to end on a node in cluster j. If these two events are independent, the joint probability distribution is equal to its product distribution, Pr(i, j) = Pr(i) · Pr(j). If, on the other hand, Pr(i, j) ≠ Pr(i) · Pr(j), then the probability Pr(i, j) is dependent on whether i and j are the same cluster (i = j) or not. With such a dependence present, there is modularity in the network. This condition of a joint probability distribution being a non-product distribution was a condition for 'nonlinearity as correlations' (Section VII C). Newman and Girvan use this dependency condition to define modularity Q as any deviation of the joint probability distribution Pr(i, i) of edges connecting nodes within the same cluster from the product distribution Pr(i) · Pr(i). In this sense, modularity is a form of nonlinearity as correlations. In the above matrix notation, the probability Pr(i) = \sum_j e_{ij}. This can be understood as the so-called marginal probability of picking any edge in the network and for that edge to start in cluster i. Modularity is then defined as:

Q := \sum_i \Big[ e_{ii} - \Big( \sum_j e_{ij} \Big)^2 \Big] .   (16)

This measure of modularity is also taken as an optimisation function for community detection algorithms, but limitations to its effectiveness have been pointed out [72, 73].
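The following sketch evaluates eq. (16) for a given partition of a small toy network. The function and the two-triangle example are our own illustrative constructions rather than the reference implementation of [71]; an undirected, unweighted network and an already-identified community assignment are assumed.

```python
import numpy as np

def modularity(edges, cluster_of):
    """Newman-Girvan modularity Q of eq. (16) for an undirected network,
    given a list of edges and a map from node to cluster label."""
    labels = sorted(set(cluster_of.values()))
    index = {c: i for i, c in enumerate(labels)}
    e = np.zeros((len(labels), len(labels)))
    # Count each undirected edge in both directions so that e is symmetric
    # and its entries sum to one (the joint probabilities Pr(i, j)).
    for u, v in edges:
        i, j = index[cluster_of[u]], index[cluster_of[v]]
        e[i, j] += 1
        e[j, i] += 1
    e /= e.sum()
    a = e.sum(axis=1)                        # marginals Pr(i) = sum_j e_ij
    return float(np.sum(np.diag(e) - a**2))  # Q = sum_i [e_ii - (sum_j e_ij)^2]

# Two triangles joined by a single edge: a clearly modular toy network.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, clusters))           # approximately 0.36
```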
Many natural systems exhibit structure that is repeated again on a smaller scale; the structure is nested within itself. A cauliflower exhibits this particular form of spatial scale invariance in the structure of the florets consisting of smaller florets and so forth. Benoît Mandelbrot discovered the mathematics of such nested structures, for which he coined the term fractal. Fractals are mathematical objects with a perfect scale invariance, a repetition of structure at an infinite number of scales [74]. Mandelbrot's now famous book, The Fractal Geometry of Nature [75], revealed the ubiquitous presence of fractal structure in natural systems, both living and nonliving. Fractals have the mathematical property of a non-integer dimension, and therefore fractal dimension is sometimes used as an indicator of nested structure (e.g., in ecology; [76]). For example, a flat disc has dimension 2, a solid sphere has dimension 3, and the dimension of a cauliflower is estimated at 2.8 [77].

Another indicator for multiple scales is the power-law decay in a correlation function (see Section VIII). For example, the number of websites in the visible World Wide Web as a function of their degree approximately follows a power law with an exponent γ which, in 1999, was estimated at 2.1 [78]. This power-law decay is due to clusters of websites being nested within bigger clusters of websites. The World Wide Web has tens of billions of web pages, but only a few dozen domains own most of the links.[79] These central domains are linked to each other, as well as to web pages within their own domain, and they also connect to large clusters of less-well-connected domains. Each of these clusters has, again, a few highly connected domains. This structure of clusters of sites with a few highly linked domains repeats at ever smaller scales. This self-similar nesting of clusters is much studied in complex networks [80, 81]. Methods based on statistical inference for identifying nested clusters have also been developed [82].

The presence of scale invariance in the degree distribution of a network can be reproduced by a model of network growth first considered by Derek de Solla Price [83]. Starting with a small network, new nodes are added and connected by an edge to an existing node with a probability proportional to the existing node's degree. Hence, any new edge will affect the probability of future edges being added. Connecting a large number of nodes following this rule results in a network where a few nodes have a very high number of edges, and most nodes have very few. The algorithm is called the preferential attachment algorithm. It is a variant of a random graph model (see Section III), and it describes the rich-get-richer effect seen in economics. It clearly has feedback built into it. The initial degree distribution might be uniformly random, but, after many iterations, it gets locked into a very skewed distribution due to the feedback of previously formed edges on future edge formation. The preferential attachment mechanism illustrates why power laws have been, and still are, a central theme in many studies of complex systems. Power-law-like behaviour can serve as an indicator for several of the features of complex systems identified in this book: nonlinearity, (lack of) robustness, nested structure, and feedback. This also suggests that these phenomena are not isolated from each other.
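A minimal sketch of the growth rule just described, with one new edge per new node: storing every node once per incident edge end makes a uniform draw from that list a degree-proportional choice. The network size and random seed are arbitrary; the printed histogram illustrates the skewed degree distribution that the feedback produces.

```python
import random
from collections import Counter

random.seed(0)

def preferential_attachment(n_nodes):
    """Grow a network by preferential attachment: each new node attaches to an
    existing node chosen with probability proportional to its current degree."""
    targets = [0, 1]            # node i appears in this list degree(i) times
    edges = [(0, 1)]            # start from a single edge between nodes 0 and 1
    for new in range(2, n_nodes):
        old = random.choice(targets)        # degree-proportional choice
        edges.append((new, old))
        targets.extend([new, old])
    return edges

degree = Counter()
for u, v in preferential_attachment(10_000):
    degree[u] += 1
    degree[v] += 1

histogram = Counter(degree.values())
for d in sorted(histogram)[:8]:
    print(f"degree {d:3d}: {histogram[d]:5d} nodes")
print("maximum degree:", max(degree.values()))   # a few highly connected hubs
```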
X. HISTORY AND MEMORY

The various measures of complexity measure different features of complex systems, all of which arise because of their histories. Hence, other measures can be used as proxies for history. For example, a network may have a definite growth rate so that the size of the network can be used as a measure of its age. Another way to measure the history of a complex system is to measure the structure it has left behind, because the more of it there is, the longer the history required for it to spontaneously arise as a result of the complex system's internal dynamics and interaction with the environment. However, in some cases, the structure in the world is built very quickly and deliberately rather than arising spontaneously, like a beaver's dam or a ploughed field. Background knowledge is needed to know how to relate such structure to history. There are no direct measures of history used in practice, but the logical depth discussed in the next section was introduced to capture the idea that complex systems require a long history to develop. Any measure of correlations in time, including the statistical complexity discussed below, can be considered a measure of memory.

XI. COMPUTATIONAL MEASURES

Many of the growing number of measures of complexity are based on computational concepts such as algorithmic complexity and compressibility.[84] The previous section showed that complexity measures capture features of complexity but not complexity as such. This section discusses measures that consider complex systems to be computational devices with memory and computational power. All of these measures are reminiscent of thought experiments in that they are not implementable in practice or even in principle. Although these measures are now decades old (and none measures complexity as such), they are included here because they have had a considerable influence on thinking about complexity. We explain what feature of complexity each measures.

A. Thermodynamic Depth

Thermodynamic depth was introduced by Seth Lloyd and Heinz Pagels [85]. Lloyd and Pagels started out with the intuition that a complex system is neither perfectly ordered nor perfectly random and that a complex system plus a copy of it is not much more complex than one system alone. To specify the order of a complex system they consider the physical state of the system at time t_n, calling it s_n. In any stochastic setting, a given state can be preceded by more than one state. In other words, the set of states a system was in at times t_1 to t_{n-1}, a trajectory of length n − 1, is not unique. Assigning a probability to any such trajectory which leads to state s_n, Pr(s_1, s_2, ..., s_{n-1} | s_n), the thermodynamic depth of state s_n is defined as −k ln Pr(s_1, s_2, ..., s_{n-1} | s_n) averaged over all possible trajectories s_1, s_2, ..., s_{n-1},

\mathcal{D}(s_n) = -k \sum_{s_1, \dots, s_{n-1}} \Pr(s_1, s_2, \dots, s_{n-1} \mid s_n) \, \ln \Pr(s_1, s_2, \dots, s_{n-1} \mid s_n) ,   (17)

where k is the Boltzmann constant from statistical mechanics. In this view, the complexity of a system is given by the thermodynamic depth of its state. The intuition that the thermodynamic depth is intended to capture is that systems with many possible and long histories are more complex than systems which have short, and thus necessarily fewer possible, histories. What this definition leaves open, and arguably subjective, is how to find the possible histories, their lengths and what probabilities to assign to them [86]. Thus, practically, the measure is not implementable. The rate of increase of thermodynamic depth when considering histories further and further back in time is mathematically an entropy rate, which is a measure of disorder (see Section III). Thus, while the intention was for thermodynamic depth to be a measure of history, it is in fact a measure of disorder. This was pointed out in [87].
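Because the definition leaves the histories and their probabilities open, eq. (17) can only be evaluated for a model in which they are posited. The sketch below does this for a two-state Markov chain started in its stationary distribution, enumerating all trajectories of a fixed length by brute force; the transition matrix is an arbitrary choice and the Boltzmann constant is set to 1. The roughly linear growth of the result with trajectory length illustrates the point made above, that the rate of increase of thermodynamic depth is an entropy rate.

```python
from itertools import product
import numpy as np

# Toy two-state Markov chain; rows are "from", columns are "to".
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Stationary distribution: left eigenvector of T with eigenvalue 1.
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

def thermodynamic_depth(final_state, n, k=1.0):
    """Eq. (17) by brute-force enumeration of all length-(n-1) trajectories
    preceding `final_state` (illustration only; k is set to 1)."""
    depth = 0.0
    for traj in product(range(len(pi)), repeat=n - 1):
        path = traj + (final_state,)
        p_joint = pi[path[0]] * np.prod([T[a, b] for a, b in zip(path, path[1:])])
        p_cond = p_joint / pi[final_state]    # Pr(s_1, ..., s_{n-1} | s_n)
        if p_cond > 0:
            depth -= k * p_cond * np.log(p_cond)
    return depth

for n in (2, 4, 6, 8):
    print(f"n = {n}: depth = {thermodynamic_depth(final_state=0, n=n):.3f}")
```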

B. Statistical Complexity and True Measure Complexity

The quantitative theory of self-generated complexity, introduced by Peter Grassberger [26], and computational mechanics, introduced by physicists James Crutchfield and Karl Young [88, 89], are similar frameworks that go beyond providing a measure to inferring a computational representation for a complex system. The former comes with a measure called true measure complexity, the latter with a measure called statistical complexity. Since computational mechanics has been developed in more detail (see [90]), we focus on it here. The assumption of computational mechanics is that a complex system is an information-storing and -processing entity. Hence, any structured behaviour it exhibits is the result of a computation. The starting point of the inference method is a representation of the system's behaviour as a string such as, for example, a time sequence of measurements of its location.[91] The symbols in this measurement sequence generally form a discrete and finite set (for background, see Appendix A). Once a string of measurement data has been obtained, the regularities are extracted using an algorithm which is briefly explained below, and a computational representation is inferred which reproduces the statistical regularities of the string. Computational representations can, in principle, be anything from the Chomsky hierarchy of computational devices [92], but in concrete examples they usually are finite-state automata. The size of this automaton is the basis for the statistical complexity measure.

The algorithm for inferring the computational representation of a string assumes that a stationary stochastic process {X_t}_{t∈Z} has generated the string in question (for a definition of a stationary stochastic process, see Appendix A). As a next step, statistically equivalent strings are grouped together. Two strings, x and x', are statistically equivalent if they have the same conditional probability distribution over the subsequent symbol a ∈ \mathcal{X}:

P(X_0 = a \mid X_t^{-1} = x) = P(X_0 = a \mid X_{t'}^{-1} = x') , \quad \text{for all } a \in \mathcal{X} .   (18)

The two strings do not have to be of the same length. The equivalence class of a substring x is denoted by ε(x), and it contains all strings statistically equivalent to string x, including x itself. These classes are called 'causal states', a somewhat unfortunate name since no causality is implied in any strict sense. Due to the stationarity of the process, the transition probabilities between the causal states are stationary and form a stochastic matrix. Hence, the computational representation obtained by this algorithm is a stochastic finite state automaton or, equivalently, a hidden Markov model [92, 93] and is called ε-machine.[94]

The stationary probability distribution P of the ε-machine's causal states s ∈ \mathcal{S}, which is the left eigenvector of its stochastic transition matrix with eigenvalue 1, is used to define the statistical complexity, C_µ, of a process:

C_\mu := - \sum_{s \in \mathcal{S}} P(s) \log_2 P(s) ,   (19)

where \mathcal{S} is the set of causal states. C_µ is the Shannon entropy of the stationary probability distribution. This reflects the computational viewpoint of the authors since, technically, the Shannon entropy is the minimum number of bits required to encode the set \mathcal{S} with probability distribution P. Thus, the statistical complexity is a measure of the minimum amount of memory required to optimally encode the set of behaviours of the complex system. It is worth noting that, for a given string, the statistical complexity is lower bounded by the excess entropy/predictive information (see eq. 12 above), C_µ ≥ I_pred [28, 95, 96]. This mathematical fact agrees with the intuition that a system must store at least as much information as the structure it produces. The statistical complexity has been computed for the logistic map [88], for protein configurations [97, 98], atmospheric turbulence [99], and for self-organisation in cellular automata [100].

Crutchfield [89, p. 24] writes that "an ideal random process has zero statistical complexity. At the other end of the spectrum, simple periodic processes have low statistical complexity. Complex processes arise between these extremes and are an amalgam of predictable and stochastic mechanisms." This statement, though intuitive, obscures the fact that the statistical complexity increases monotonically with the order of the string. For a proof, consider the following. For a given number of causal states, the statistical complexity has a unique maximum at the uniform probability distribution over the states. This is achieved by a perfectly periodic sequence with period equal to the number of states. When deviations occur, the probability distribution will, in general, not be uniform anymore, and the Shannon entropy, and with it the statistical complexity, will decrease. On the other hand, increasing the period of the sequence requires an increased number of causal states and, thus, implies a higher statistical complexity. Hence, the statistical complexity scores higher for highly ordered strings than for strings with less order or with random bits inserted. The statistical complexity is a measure of order produced by the system, as well as a measure of memory of the system itself. The strength of the framework of computational mechanics lies in detecting order in the presence of disorder.
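The sketch below gives a crude flavour of the inference step. It generates a long string from the golden-mean process (a '0' must be followed by a '1'; after a '1' both symbols are equally likely), groups fixed-length histories by their empirical next-symbol distribution as a rough stand-in for the causal-state equivalence of eq. (18), and evaluates eq. (19) on the resulting state probabilities. The history length and the coarse-graining used to merge histories are our own choices; a proper reconstruction would use an ε-machine inference algorithm such as CSSR [108]. For this process the exact value is the entropy of the distribution (2/3, 1/3), about 0.918 bits.

```python
from collections import Counter, defaultdict
import numpy as np

rng = np.random.default_rng(2)

def golden_mean(n):
    """Binary string with no '00' substring (the golden-mean process)."""
    out, prev = [], 1
    for _ in range(n):
        prev = 1 if prev == 0 else int(rng.random() < 0.5)
        out.append(prev)
    return out

x = golden_mean(200_000)
L = 3                                   # history length (a modelling choice)

next_symbol = defaultdict(Counter)      # history -> counts of the next symbol
for i in range(L, len(x)):
    next_symbol[tuple(x[i - L:i])][x[i]] += 1

# Merge histories with (almost) the same conditional next-symbol distribution:
# a crude stand-in for grouping into causal states, eq. (18).
states = defaultdict(int)
for counts in next_symbol.values():
    total = sum(counts.values())
    states[round(counts[1] / total, 1)] += total

probs = np.array(list(states.values()), dtype=float)
probs /= probs.sum()
c_mu = -np.sum(probs * np.log2(probs))          # eq. (19)
print("approximate number of causal states:", len(probs))
print(f"approximate statistical complexity C_mu: {c_mu:.3f} bits")
```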

C. Effective Complexity

Effective complexity was introduced by physicists Murray Gell-Mann and Seth Lloyd [101]. Gell-Mann and Lloyd's starting point is common to many measures of complexity of that time: the measure should capture the property of a complex system of being neither completely ordered nor completely random. They assume the complex system can be represented as a string of bits; call it s. This string is some form of unique description of the system or of its behaviour or the order it produced. The algorithmic complexity (for a definition, see Appendix C) of this string of bits is a measure of its randomness or lack of compressibility. The more regularities a string has, the lower is its algorithmic complexity. Hence, Gell-Mann and Lloyd consider the algorithmic complexity not of the string itself, but of the ensemble (a term taken from statistical mechanics) of strings with the same regularities as the string in question. Let E be this ensemble of strings with the same regularities. The effective complexity of the string (and thus the system which it represents) is defined as the algorithmic complexity of the ensemble E in which it is embedded as a typical member. ('Typical' is a technical term here, but it captures exactly what we intuitively think 'typical' should mean.) Ensemble members s ∈ E are called typical if −log Pr(s) ≈ K_U(E), where K_U(E) is the algorithmic complexity of E (see Appendix C).[102] Assigning a probability to each string in a set is less arbitrary than it sounds. It has been shown that the probability Pr(s) of a string s is related to its algorithmic complexity as −log Pr(s) ≈ K_U(s|E), where K_U(s|E) is the algorithmic complexity of the string s given a description of the set E. The effective complexity ε(s) of a string s is defined as the algorithmic complexity of the ensemble E of which it is a typical member,

\varepsilon(s) = K_U(E) .   (20)

For example, the ensemble of a string which is perfectly random is the set of all strings of the same length. This set allows for a very short description, by giving the length of the strings only. This trick of embedding the string in a set of similar strings exactly achieves what Gell-Mann and Lloyd set out to do. A string with many regularities over many different length scales, which is how we think of a complex system, will be assigned a high effective complexity. Random systems, in their structure or behaviour, will be assigned very low effective complexity. According to [101], the effective complexity can be high only in a region intermediate between total order and complete disorder. However, replacing some of the regular bits in a string by random bits decreases its regularities and hence its effective complexity. Just like the true measure complexity and the statistical complexity, the effective complexity increases monotonically with the amount of order present. This places the effective complexity among the measures of order. Gell-Mann and Lloyd note that this measure is subjective, since what to count as a regular feature is the observer's decision. The instruction "find the ensemble (of a rain forest, for example) and determine its typical members" leaves too many things unspecified for this measure to be practicable [103].
D. Logical Depth

Computer scientist Charles Bennett introduced logical depth to measure a system's history [104]. Bennett argues that complex objects are those whose most plausible explanations involve long causal processes. This idea goes back to Herbert Simon's influential paper, 'The Architecture of Complexity' [105]. To develop a mathematical definition of causal histories of complex systems, Bennett replaces the system to be measured by a description of the system, given as a string of bits. This procedure should be very familiar by now. He equates the causal history of the system with the algorithmic complexity of the string (the length of the shortest program which outputs the string; see Appendix C). The shorter the program which outputs a system's description, the more plausible it is as its causal history. A program consisting of the string itself and the 'print' command has high algorithmic complexity, and it offers no explanation whatsoever. It is equivalent to saying 'It just happened' and so is effectively the null hypothesis. A program with instructions for computing the string from some initial conditions must contain some description of its history and thus is a more explanatory hypothesis.

In addition to considering a program's length as a measure of causal history, Bennett also takes the program's running time into account. A program which runs for a long time before outputting a result signifies that the string has a complicated order that needs unravelling. The definition of logical depth is then as follows. Let x be a finite string, and K_U(x) its algorithmic complexity. The logical depth of x at significance level s is defined as the least time T(p) required for program p to compute x and then halt, where the length of the program p, l(p), cannot differ from K_U(x) by more than s bits,

D_s(x) := \min_p \{ T(p) : l(p) - K_U(x) \le s , \; U(p) = x \} .   (21)

Logical depth combines the features of order and history into a single measure. Consider the structure of a protein, for example. One possible program prints the electron and nuclear densities verbatim, with discretised positional information, which is a very long program running very fast. Another program computes the structure ab initio by running quantum chemical calculations. This would be a much shorter program but running for a long time. The latter captures the protein's order and history. The real causal history of a protein is, of course, very long, starting with the beginning of life on Earth, or even with the beginning of the universe. The logical depth captures our intuition that complex systems have a long history. A practical problem is that the time point when the history of a system starts is not well defined. Another aspect which makes it impractical to use is that the algorithmic complexity is uncomputable in principle, although approximations exist such as the Lempel-Ziv algorithm [106].
XII. CONCLUSION

The study of coupled human and natural systems is vital to our survival. The science that is required to understand and predict phenomena such as climate change and migration is necessarily multidisciplinary. For science to progress, developing the right mathematical tools is essential, including in the social sciences. A unifying framework for complexity facilitates this collective endeavour. Furthermore, it facilitates quantitative approaches to issues such as economic growth and income inequality. However, there is confusion about the nature of complexity even within the natural sciences. The analysis in terms of complexity developed in [1] and the above analysis of mathematical measures of these features (which is part of [1]) are intended to bring clarity to the discussion of complexity across all disciplines. The examples cited from economics and neuroscience of purported complexity measures are prominent in the literature. We clarified the interpretation of some of them and incorporated them all into our account as measures of particular features of complexity. Any measure in complexity science can be interpreted as measuring an aspect of one of the features of complexity, and identifying a feature of complexity that is important for a given system makes it easier to select, modify or design an appropriate measure for it.

For further and in-depth analysis and discussion we refer the interested reader to our book, "What is a complex system?", published with Yale University Press in 2020 [1].

ACKNOWLEDGEMENT

We thank Yale University Press for kindly agreeing to us making this chapter publicly available.

[1] J. Ladyman and K. Wiesner, What is a complex system? [15] P. Davies and N. H. Gregersen, eds., Information and the (Yale University Press, 2020). Nature of Reality: From Physics to Metaphysics (Cam- [2] M. Gell-Mann, What is complexity?, Complexity 1, 16 bridge University Press, 2014). (1995). [16] P. Nurse, Life, logic and information, Nature 454, 424 [3] M. Newman, Networks: An Introduction, 1st ed. (Oxford (2008). University Press, 2010). [17] S. R. De Groot and P. Mazur, Non-Equilibrium Thermo- [4] P. Erd¨os and A. R´enyi, On the evolution of random dynamics (Courier Corporation, 2013). graphs, Publication of the Mathematical Institute of the [18] N. G. Van Kampen, Stochastic Processes in Physics and Hungarian Academy of Sciences 5, 17 (1960). Chemistry, Vol. 1 (Elsevier, 1992). [5] E. Bullmore and O. Sporns, The economy of brain net- [19] Some scientific fields use the reverse order, {Pji}. work organization, Nature Reviews Neuroscience 13, 336 [20] N. Ikeda and S. Watanabe, Stochastic Differential Equa- (2012). tions and Diffusion Processes (North-Holland, 2014). [6] R. van Steveninck et al., Reproducibility and variability [21] K. Friston, The free-energy principle: A unified brain in neural spike trains, Science 275, 1805 (1997). theory?, Nature Reviews Neuroscience 11, 127 (2010). [7] L. Jost, Entropy and diversity, Oikos 113, 363 (2006). [22] S. Still, D. A. Sivak, A. J. Bell, and G. E. Crooks, Ther- [8] A. Fabiani, F. Galimberti, S. Sanvito, and A. R. Hoelzel, modynamics of prediction, Physical Review Letters 109, Extreme polygyny among southern elephant seals on sea 120604 (2012). lion island, falkland islands, Behavioral Ecology 15, 961 [23] W. Bialek et al., Statistical mechanics for natural flocks (2004). of birds, Proceedings of the National Academy of Sci- [9] S. E. Page, Diversity and Complexity, 1st ed. (Princeton ences 109, 4786 (2012). University Press, 2010). [24] They used the convention from statistical mechanics [10] M. L. Weitzman, On diversity, The Quarterly Journal of in which the uncorrelated average product is not sub- Economics 107, 363 (1992). tracted. Thus, their covariance is the statistical mechan- [11] J. M. Epstein and R. Axtell, Growing Artificial Societies: ical correlation function E[XY ]. Social Science from the Bottom Up (Brookings Institu- [25] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, tion Press and MIT Press, 1996). Weak pairwise correlations imply strongly correlated net- [12] I. D. Couzin and J. Krause, Collective memory and spa- work states in a neural population, Nature 440, 1007 tial sorting in animal groups, Theoretical Biology 218 (2006). (2002). [26] P. Grassberger, Toward a quantitative theory of self- [13] I. D. Couzin and N. R. Franks, Self-organized lane forma- generated complexity, International Journal of Theoret- tion and optimized traffic flow in army ants, Proceedings ical Physics 25, 907 (1986). of the Royal Society of London B: Biological Sciences [27] W. Bialek, I. Nemenman, and N. Tishby, Predictability, 270, 139 (2003). complexity, and learning, Neural Computation 13, 2409 [14] S. Lloyd, Programming the Universe (Knopf, 2006). (2001). 16

[28] J. P. Crutchfield and D. P. Feldman, Regularities un- [52] A. J. Dunleavy, K. Wiesner, R. Yamamoto, and C. P. seen, randomness observed: Levels of entropy conver- Royall, Mutual information reveals multiple structural gence, Chaos: An Interdisciplinary Journal of Nonlinear relaxation mechanisms in a model glass former, Nature Science 13, 25 (2003). communications 6, 6089 (2015). [29] S. E. Palmer, O. Marre, M. J. Berry, and W. Bialek, Pre- [53] A. J. Veraart et al., Recovery rates reflect distance to a dictive information in a sensory population, Proceedings tipping point in a living system, Nature 481, 357 (2012). of the National Academy of Sciences 112, 6908 (2015). [54] S. R. Carpenter et al., Early warnings of regime shifts: A [30] M. Kleiber, Body size and metabolism, Hilgardia 6, 315 whole-ecosystem experiment, Science 332, 1079 (2011). (1932). [55] L. Benedetti-Cecchi, L. Tamburello, E. Maggi, and [31] G. B. West, J. H. Brown, and B. J. Enquist, A general F. Bulleri, Experimental perturbations modify the per- model for the origin of allometric scaling laws in biology, formance of early warning indicators of regime shift, Cur- Science 276, 122 (1997). rent Biology 25, 1867 (2015). [32] M. Newman, Power laws, Pareto distributions and Zipf’s [56] V. Dakos et al., Slowing down as an early warning signal law, Contemporary Physics 46, 323 (2005). for abrupt climate change, Proceedings of the National [33] D. Sornette, Critical Phenomena in Natural Sciences: Academy of Sciences 105, 14308 (2008). Chaos, Fractals, Selforganization and Disorder: Con- [57] J. D. Keeler and J. D. Farmer, Robust space-time inter- cepts and Tools, 2nd ed. (Springer, 2009). mittency and 1f noise, Physica D: Nonlinear Phenomena [34] M. Mitzenmacher, A brief history of generative mod- 23, 413 (1986). els for power law and lognormal distributions, Internet [58] P. Bak, C. Tang, and K. Wiesenfeld, Self-organized crit- Mathematics 1, 226 (2004). icality, Physical Review A 38, 364 (1988). [35] A. Clauset, C. R. Shalizi, and M. Newman, Power-law [59] G. Z¨oller, S. Hainzl, and J. Kurths, Observation of grow- distributions in empirical data, SIAM Review 51, 661 ing correlation length as an indicator for critical point (2009). behavior prior to large earthquakes, Journal of Geophys- [36] N. N. Taleb, The Black Swan: The Impact of the Highly ical Research: Solid Earth 106, 2167 (2001). Improbable (Random House, 2007). [60] D. Sornette, Predictability of catastrophic events: Mate- [37] R. A. Fisher, The wave of advance of advantageous genes, rial rupture, earthquakes, turbulence, financial crashes, Annals of Eugenics 7, 355 (1937). and human birth, Proceedings of the National Academy [38] R. S. MacKay, Nonlinearity in complexity science, Non- of Sciences 99, 2522 (2002). linearity 21, T273 (2008). [61] E. Bullmore and O. Sporns, Complex brain networks: [39] H. M. Blalock, ed., Causal Models in the Social Sciences Graph theoretical analysis of structural and functional (Routledge, 1985). systems, Nature Reviews Neuroscience 10, 186 (2009). [40] S. L. Pimm, Food Webs (Springer, 1982). [62] L. M. A. Bettencourt, The origins of scaling in cities, [41] N. Rooney, K. McCann, G. Gellner, and J. C. Moore, Science 340, 1438 (2013). Structural asymmetry and the stability of diverse food [63] G. Pruessner, Self-Organised Criticality: Theory, Mod- webs, Nature 442, 265 (2006). els and Characterisation (Cambridge University Press, [42] H. de Jong, Modeling and simulation of genetic regula- 2012). 
tory systems: A literature review, Journal of Computa- [64] N. W. Watkins et al., 25 years of self-organized criticality: tional Biology 9, 67 (2002). Concepts and controversies, Space Science Reviews 198, [43] K. Friston, Causal modelling and brain connectivity in 3 (2016). functional magnetic resonance imaging, PLoS Biology 7, [65] S. Milgram, The small world problem, Psychology today e1000033 (2009). 2, 60 (1967). [44] S. H. Strogatz, Nonlinear Dynamics and Chaos: With [66] R. Albert, H. Jeong, and A.-L. Barab´asi, Error and at- Applications to Physics, biology, chemistry, and engi- tack tolerance of complex networks, Nature 406, 378 neering (Westview Press, 2014). (2000). [45] C. S. Holling, Resilience and stability of ecological sys- [67] H. Jeong, S. P. Mason, A.-L. Barab´asi, and Z. N. Oltvai, tems, Annual Review of Ecology and Systematics 4, 1 Lethality and centrality in protein networks, Nature 411, (1973). 41 (2001). [46] M. Scheffer, Complex systems: Foreseeing tipping points, [68] J. A. Dunne, R. J. Williams, and N. D. Martinez, Food- Nature 467, 411 (2010). web structure and network theory: The role of con- [47] J.-P. Aubin, A survey of viability theory, SIAM Journal nectance and size, Proceedings of the National Academy on Control and Optimization 28, 749 (1990). of Sciences 99, 12917 (2002). [48] J.-P. Aubin, Viability Theory (Springer Science & Busi- [69] M. Newman, S. Forrest, and J. Balthrop, Email networks ness Media, 2009). and the spread of computer viruses, Physical Review E [49] C. B´en´eand L. Doyen, From resistance to transforma- 66, 035101 (2002). tion: A generic metric of resilience through viability, [70] M. Rosvall and C. T. Bergstrom, Maps of random walks Earth’s Future 6, 979 (2018). on complex networks reveal community structure, Pro- [50] G. Deffuant and N. Gilbert, Viability and Resilience of ceedings of the National Academy of Sciences 105, 1118 Complex Systems: Concepts, Methods and Case Studies (2008). from Ecology and Society (Springer Science & Business [71] M. Newman and M. Girvan, Finding and evaluating com- Media, 2011). munity structure in networks, Physical review E 69, [51] M. Scheffer, S. R. Carpenter, V. Dakos, and E. H. van 026113 (2004). Nes, Generic indicators of ecological resilience: Inferring [72] S. Fortunato and M. Barthelemy, Resolution limit the chance of a critical transition, Annual Review of Ecol- in community detection, Proceedings of the National ogy, Evolution, and Systematics 46, 145 (2015). Academy of Sciences 104, 36 (2007). 17

[73] U. Brandes et al., On modularity clustering, IEEE Trans- [97] C.-B. Li, H. Yang, and T. Komatsuzaki, Multiscale com- actions on Knowledge and Data Engineering 20, 172 plex network of protein conformational fluctuations in (2007). single-molecule time series, Proceedings of the National [74] K. Falconer, Fractal Geometry: Mathematical Founda- Academy of Sciences 105, 536 (2008). tions and Applications (John Wiley & Sons, 2004). [98] D. Kelly, M. Dillingham, A. Hudson, and K. Wiesner, [75] B. B. Mandelbrot, The Fractal Geometry of Nature (W. A new method for inferring hidden markov models from H. Freeman, 1983). noisy time sequences, PloS One 7, e29703 (2012), [76] G. Sugihara and R. M. May, Applications of fractals in http://www.mathworks.com/matlabcentral/ ecology, Trends in Ecology & Evolution 5, 79 (1990). fileexchange/33217. [77] S.-H. Kim, Fractal structure of a white cauliflower, arXiv [99] A. J. Palmer, C. W. Fairall, and W. A. Brewer, Complex- preprint cond-mat/0409763 (2004). ity in the atmosphere, IEEE Transactions on Geoscience [78] A.-L. Barab´asi and R. Albert, Emergence of scaling in and Remote Sensing 38, 2056 (2000). random networks, Science 286, 509 (1999). [100] C. R. Shalizi, K. L. Shalizi, and R. Haslinger, Quanti- [79] A domain is the name you need to buy or register, such fying self-organization with optimal predictors, Physical as google.com. Any web page with a url within this do- Review Letters 93, 118701 (2004). main name, such as www.google.com/maps is part of this [101] M. Gell-Mann and S. Lloyd, Information measures, ef- domain. fective complexity, and total information, Complexity 2, [80] M. Newman, The structure and function of complex net- 44 (1996). works, SIAM Review 45 (2003). [102] The idea of replacing entropy (the average of − log Pr(s) [81] E. Ravasz and A.-L. Barab´asi, Hierarchical organiza- is an entropy) by algorithmic complexity goes back to tion in complex networks, Physical Review E 67, 026112 Wojciech [110]. (2003). [103] J. W. McAllister, Effective complexity as a measure [82] A. Clauset, C. Moore, and M. Newman, Hierarchical of information content, Philosophy of Science 70, 302 structure and the prediction of missing links in networks, (2003). Nature 453, 98 (2008). [104] C. H. Bennett, Logical depth and physical complexity, in [83] D. Price, A general theory of bibliometric and other cu- The Universal Turing Machine – a Half-Century Survey, mulative advantage processes, Journal of the Association edited by R. Herken (Oxford University Press, 1991) pp. for Information Science and Technology 27, 292 (1976). 227–257. [84] [107] produced a ‘non-exhaustive list’ of over forty, and [105] H. A. Simon, The architecture of complexity, Proceed- many more measures have been defined since then. ings of the American Philosophical Society 106, 467 [85] S. Lloyd and H. Pagels, Complexity as thermodynamic (1962). depth, Annals of Physics 188, 186 (1988). [106] J. Ziv and A. Lempel, A universal algorithm for sequen- [86] J. Ladyman, J. Lambert, and K. Wiesner, What is a tial data compression, IEEE Transactions on Information complex system?, European Journal for Philosophy of Theory 23, 337 (1977). Science 3, 33 (2013). [107] S. Lloyd, Measures of complexity: A nonexhaustive list, [87] J. P. Crutchfield and C. R. Shalizi, Thermodynamic Control Systems Magazine, IEEE 21, 7 (2001). depth of causal states: Objective complexity via mini- [108] C. R. Shalizi and K. Klinkner, An algorithm for mal representations, Physical Review E 59, 275 (1999). 
building markov models from time series (2003), [88] J. P. Crutchfield and K. Young, Inferring statistical com- http://bactra.org/CSSR/. plexity, Physical Review Letters 63, 105 (1989). [109] Computational Mechanics Group, (2015), [89] J. P. Crutchfield, The calculi of emergence: Computa- http://cmpy.csc.ucdavis.edu/. tion, dynamics and induction, Physica D: Nonlinear Phe- [110] W. H. Zurek, Algorithmic randomness and physical en- nomena 75, 11 (1994). tropy, Physical Review A 40, 4731 (1989). [90] C. R. Shalizi and J. P. Crutchfield, Computational me- [111] C. E. Shannon, A Mathematical Theory of Communica- chanics: Pattern and prediction, structure and simplic- tion, Tech. Rep. (Bell Labs, 1948). ity, Journal of Statistical Physics 104, 817 (2001). [112] T. M. Cover and J. A. Thomas, Elements of Information [91] Of course, measurement is crucial to computational me- Theory, 2nd ed. (Wiley-Blackwell, 2006). chanics, and it raises many practical questions left aside [113] M. Li and P. Vit´anyi, An Introduction to Kolmogorov here. Complexity and Its Applications, 3rd ed. (Springer, [92] J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduc- 2009). tion to Automata Theory, Languages, and Computation, 2nd ed. (Addison-Wesley, 2001). [93] A. Paz, Introduction to Probabilistic Automata (Aca- demic Press, 1971). [94] The inference algorithm is available in various languages, see, for example, [98, 108, 109]. [95] K. Wiesner, M. Gu, E. Rieper, and V. Vedral, Information-theoretic lower bound on energy cost of stochastic computation, Proc. R. Soc. A 468, 4058 (2012). [96] J. P. Crutchfield, C. J. Ellison, and J. R. Mahoney, Time’s barbed arrow: Irreversibility, crypticity, and stored information, Physical Review Letters 103, 094101 (2009). 18

APPENDIX

In this Appendix, the mathematical terminology used in the main text is defined, and some more background is given to the mathematical formalism of probability theory, information theory, algorithmic complexity, and network theory.

Appendix A: Probability Theory

An alphabet \mathcal{X} is a set of symbols, numeric or symbolic, continuous or discrete, finite or infinite. Symbolic alphabets are discrete; continuous alphabets are numeric and infinite. |\mathcal{X}| denotes the size of the set \mathcal{X}. An example of a numeric, finite, discrete alphabet is the binary set {0, 1}; an example of a symbolic, finite alphabet is a set of Roman letters {a, b, c, ..., z}. An example of a numeric, infinite, discrete alphabet is that of the natural numbers N; an example of a continuous alphabet is the set of real numbers R. This book discusses only discrete alphabets.

A discrete random variable X is a discrete alphabet \mathcal{X} equipped with a probability distribution P(X) ≡ {Pr(X = x), x ∈ \mathcal{X}}. We denote the probabilities Pr(X = x) by P(x) or sometimes, to avoid confusion, by P_X(x). The uniform distribution of a set \mathcal{X} is the distribution P(x) = 1/|\mathcal{X}| for all x ∈ \mathcal{X}. For two discrete random variables, X and Y, the joint probabilities Pr(X = x, Y = y) on alphabet \mathcal{X} × \mathcal{Y} are denoted by P(xy) or sometimes, to avoid confusion, by P_XY(xy). The joint probability distribution induces a conditional probability distribution P(x|y) ≡ Pr(X = x | Y = y), which is a probability distribution on \mathcal{X} conditioned on Y taking the particular value Y = y. Any joint probability P(xy) can be written as

P(xy) = P(x \mid y) \, P(y) .   (A1)

The expectation value of a discrete numeric random variable X, denoted by ⟨X⟩, is defined as

\langle X \rangle := \sum_{x \in \mathcal{X}} P(x) \, x .   (A2)

Another common notation for the expectation value of X is EX.

The variance of a numeric random variable X is the average deviation of X from its expectation value. Denoted by Var X, it is defined as

\mathrm{Var}\, X := \langle (X - \langle X \rangle)^2 \rangle .   (A3)

The square root of the variance of a random variable X is called the standard deviation σ:

\sigma = \sqrt{\mathrm{Var}\, X} .   (A4)

The ratio of the standard deviation to the expectation value is called the coefficient of variation,

c_v := \frac{\sqrt{\mathrm{Var}\, X}}{\langle X \rangle} .   (A5)

The covariance of two numeric random variables X and Y is defined as

\mathrm{Cov}\, XY := \langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle = \langle XY \rangle - \langle X \rangle \langle Y \rangle .   (A6)

A stochastic process {X_t}_{t∈T} is a sequence of random variables X_t, defined on a joint probability space, taking values in a common set \mathcal{X}, indexed by a set T which is often N or Z and thought of as time. This book only discusses discrete-time processes. A stochastic process is called a Markov chain if X_t (sometimes called 'the future') is probabilistically independent of X_0 ... X_{t-2} ('the past'), given X_{t-1} ('the present'); in other words,

P(X_t \mid X_0 \dots X_{t-1}) = P(X_t \mid X_{t-1}) , \quad \text{for all } t \in T .   (A7)

A stochastic process is stationary if

P(X_t X_{t+1} \dots X_{t+m}) = P(X_{t'} X_{t'+1} \dots X_{t'+m}) , \quad \text{for all } t, t' \in T, \; m \in \mathbb{N} .   (A8)

A hidden Markov model {X_t, Y_t}_{t∈T} is a stationary stochastic process of two random variables X_t and Y_t which forms a Markov chain in the sense that Y_t depends only on X_t, and X_t depends only on Y_{t-1} and X_{t-1}:

P(Y_t \mid X_0 \dots X_t \, Y_0 \dots Y_{t-1}) = P(Y_t \mid X_t)   (A9)

and

P(X_t \mid X_0 \dots X_{t-1} \, Y_0 \dots Y_{t-1}) = P(X_t \mid X_{t-1} Y_{t-1}) , \quad \text{for all } t \in T .   (A10)

The graphical representation of a hidden Markov model is a directed graph where the states are the outcomes x ∈ \mathcal{X} of the random variable X_t and the state transitions are labelled by the outcomes y ∈ \mathcal{Y} of the random variable Y_t and the corresponding conditional probability P(Y_{t+1} = y, X_{t+1} = x | X_t = x).
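As an illustration of the Markov property (A7), the following sketch simulates a two-state Markov chain with an assumed transition matrix {P_ij} and checks empirically that conditioning on one additional past symbol does not change the conditional distribution of the next state.

```python
from collections import Counter, defaultdict
import numpy as np

rng = np.random.default_rng(3)

# Transition matrix {P_ij} = Pr(next = j | current = i) for states 0 and 1.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

x = [0]
for _ in range(100_000):
    x.append(int(rng.random() < P[x[-1], 1]))      # draw the next state

pair, triple = defaultdict(Counter), defaultdict(Counter)
for a, b, c in zip(x, x[1:], x[2:]):
    pair[b][c] += 1
    triple[(a, b)][c] += 1

# Eq. (A7): P(X_t = 1 | X_{t-1}) should equal P(X_t = 1 | X_{t-2} X_{t-1}).
for b in (0, 1):
    p1 = pair[b][1] / sum(pair[b].values())
    for a in (0, 1):
        p2 = triple[(a, b)][1] / sum(triple[(a, b)].values())
        print(f"P(1 | {b}) = {p1:.3f}    P(1 | {a}{b}) = {p2:.3f}")
```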

Appendix B: Shannon Information Theory

In the 1940s, the American engineer Claude Shannon, working for Bell Labs, introduced a mathematical theory of communication that is now at the heart of every digital communication protocol and technology, from mobile phones to e-mail encryption services and wireless networks [111]. Shannon was concerned with defining and measuring the amount of information communicated by a message transmitted over a noisy telegraph line. He saw a message as communicating information if the receiver of the message could not predict with certainty which message out of a set of possible ones she would receive. By setting the amount of information communicated by a message x as proportional to the logarithm of its inverse probability, log 1/P(x), Shannon axiomatically derived a measure of information, now called Shannon entropy. The Shannon entropy, a function of a probability distribution P but often written as a function of a random variable X, is defined as follows [112]:

H(X) := - \sum_{x \in \mathcal{X}} P(x) \log P(x) ,   (B1)

where the log is usually base 2 and 0 log 0 := 0. The equivalent definition

H(P) := - \sum_{i=1}^{n} p_i \log p_i ,   (B2)

where P = {p_1, p_2, ..., p_n}, makes it explicit that H is a function of the probabilities alone, independent of the alphabet \mathcal{X}. This book discusses only the entropy of finite probability distributions, but the definition of the Shannon entropy extends to infinite but discrete, as well as to continuous probability distributions. Taking the logarithm to base 2 is a convention dating back to Shannon, due to a bit being the essential unit of computation. For a given set of messages \mathcal{X}, the Shannon entropy is maximum for the uniform distribution and proportional to the logarithm of the total number of messages. This illustrates that the Shannon entropy is a measure of randomness. If one of the messages has probability 1 and the others have probability 0, then the message is perfectly predictable, and the Shannon entropy is zero. The Shannon entropy is also precisely the expectation value of the function log 1/P(x).

The joint entropy of n random variables X_1, ..., X_n with joint probability distribution P(X_1 X_2 ... X_n) is defined as

H(X_1 X_2 \dots X_n) := - \sum_{x_1 \dots x_n \in \mathcal{X}_1 \times \dots \times \mathcal{X}_n} P(x_1 x_2 \dots x_n) \log P(x_1 x_2 \dots x_n) .   (B3)

Consider two random variables X and Y and joint probability distribution P_XY. The conditional entropy of X given Y, H(X|Y), is defined as

H(X \mid Y) := - \sum_{xy \in \mathcal{X} \times \mathcal{Y}} P_{XY}(xy) \log P_{XY}(x \mid y) .   (B4)

The entropy rate of a stochastic process {X_t}_{t∈T} is defined as

h = \lim_{n \to \infty} \frac{1}{n} H(X_1 X_2 \dots X_n) .   (B5)

A different definition of entropy rate is as follows:

h' = \lim_{n \to \infty} H(X_n \mid X_1 \dots X_{n-1}) .   (B6)

For stationary stochastic processes, h = h'. The entropy rate H(X_n | X_1 ... X_{n-1}), for finite n, is denoted by h_n.

Shannon introduced the mutual information as a measure of correlation between two random variables X and Y, defined as follows:

I(X; Y) := \sum_{xy \in \mathcal{X} \times \mathcal{Y}} P_{XY}(xy) \log \frac{P_{XY}(xy)}{P_X(x) P_Y(y)} .   (B7)

The mutual information is a measure of the predictability of one random variable when the outcome of the other is known. Note that the mutual information definition is symmetric in its arguments and hence measures the amount of information 'shared' by the two variables. The mutual information is a general correlation function for two random variables, measuring both linear and nonlinear correlations. In contrast to the covariance and many other correlation measures, it is also applicable to non-numeric random variables such as the distribution of words in an English text or the distribution of amino acids in a DNA sequence. This is one reason why it is widely used in complex systems research.
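A small numerical illustration of the definitions above: for an assumed joint distribution of two correlated binary variables, the sketch computes the Shannon entropies and the mutual information, using the identity I(X;Y) = H(X) + H(Y) − H(XY), which is equivalent to eq. (B7).

```python
import numpy as np

def entropy(p):
    """Shannon entropy, eq. (B1), in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed joint distribution P_XY(x, y) of two correlated binary variables.
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

H_x, H_y, H_xy = entropy(P_x), entropy(P_y), entropy(P_xy)
print(f"H(X) = {H_x:.3f} bits, H(Y) = {H_y:.3f} bits, H(XY) = {H_xy:.3f} bits")
print(f"I(X;Y) = {H_x + H_y - H_xy:.3f} bits")
```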

Appendix C: Algorithmic Information Theory

A mathematical formalisation of randomness and information without reference to probabilities was developed independently by the Soviet mathematician Andrey Kolmogorov and the American mathematicians Ray J. Solomonoff and Gregory Chaitin in the 1960s. They considered information as a property of a single message, rather than of a set of messages and their probabilities. A message is a string of letters from an alphabet, such as the Roman alphabet or the binary characters 0 and 1. An example of a string is 'Hello, World!'. The string is composed of letters from the Roman alphabet and from a set containing the comma and space characters and the exclamation mark.

The algorithmic information content of a string is, roughly speaking, the length of the shortest computer program which outputs the string. For the string 'Hello, World!', this is probably a program of the form 'print("Hello, World!")' which has roughly the same length as the string itself. However, for a string of 10,000 zeros and ones alternating, the shortest program is shorter than the string itself, and the string is called 'compressible'. The notion of compressibility is meaningful only with longer strings. Only perfectly random strings are completely incompressible, therefore algorithmic information is a measure of randomness. It can be confusing that the term 'information' is used for randomness, but one may think of randomness as the amount of information which has to be communicated to reproduce the string 'exactly', irrespective of how interesting the string is in other respects.

The precise definition of algorithmic information is as follows [113]. Consider a string x, a computing device U and programs p of length l(p). The algorithmic information of the string, K_U(x), is the length of the shortest program p which, when fed into the machine U, produces output x, U(p) = x:

K_U(x) = \min_{p \,:\, U(p) = x} l(p) .   (C1)

The minimisation is done over all possible programs. There is a fundamental problem with carrying out the minimisation procedure: whether an arbitrary program will finish or run forever cannot be known in general. This is called the halting problem. It is one of the deepest results in computer science, and is due to the British mathematician Alan Turing. As a consequence, the algorithmic information is not computable in principle, though it can often be approximated in practice. Other names for algorithmic information are 'algorithmic complexity' or 'Kolmogorov complexity'.

The fundamental insight of Kolmogorov, Solomonoff and Chaitin is that the minimum length of a program is independent of the computing device on which it is run (up to some constant which is independent of the string). Hence, the definition of algorithmic information refers to a universal computer, which is a fundamental notion introduced by Alan Turing in the 1930s. Algorithmic information is therefore a 'universal' notion of randomness for strings because it is context- (machine-) independent. On the other hand, the Shannon entropy is context-dependent, since it may assign different amounts of information to the same string when it is embedded in different sets with different probabilities.

The length is not the only important parameter of a program; its running time is of equal importance. There are very short programs that take a long time to run, while the print program might be long but finish very quickly. This trade-off is relevant to the measures of complexity, which include the well-known logical depth.
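Although K_U(x) itself is uncomputable, compressed length is a crude, computable stand-in for it, in the spirit of the Lempel–Ziv approximation mentioned in the main text [106] (zlib's DEFLATE format builds on Lempel–Ziv compression). The sketch compares a highly ordered string with a random string of the same length; the length and seed are arbitrary choices.

```python
import random
import zlib

random.seed(4)

def compressed_length(s):
    """Length in bytes of the zlib-compressed string: a rough, computable
    proxy for the (uncomputable) algorithmic information of s."""
    return len(zlib.compress(s.encode("ascii"), 9))

n = 10_000
alternating = "01" * (n // 2)
random_bits = "".join(random.choice("01") for _ in range(n))

print("alternating string:", compressed_length(alternating), "bytes")
print("random string:     ", compressed_length(random_bits), "bytes")
print("raw length:        ", n, "characters")
```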

Appendix D: Complex Networks

A network, or a graph, is a set of nodes connected by edges; for simplicity, here there is at most one edge between any ordered pair of nodes. Nodes and edges are also called vertices and links, respectively. In a directed network each edge has a directionality, beginning at one node and ending at another. In an undirected network there is no such distinction between the start and end node of an edge. An example of an undirected network is the Internet. The servers are nodes, and edges between them are the physical wirings. An example of a directed network is an ecological food web. Two animals are linked if one of them feeds on the other, so that a predator has a directed edge to its prey.

The degree of a node in a network is the number of edges attached to it. In a directed network, one distinguishes between in-degree and out-degree. The in-degree of a node is the number of edges directed to the node, and the out-degree is the number of edges directed away from it.

Mathematically, a network of n nodes is represented by an adjacency matrix, A, which is an n × n matrix where each non-zero entry A_{ij} represents an edge from node i to node j [3]. In an unweighted network the A_{ij} are 1 if an edge exists from node i to node j and 0 otherwise. A weighted network assigns a real number to each edge, A_{ij} ∈ R. Such weights could, for example, represent the volume of data traffic between two servers. In an undirected network, A_{ij} = A_{ji} since A_{ij} and A_{ji} refer to the same object. The in-degree of a node i is the number of non-zero entries in the i-th column of A. The out-degree of a node i is the number of non-zero entries in the i-th row of A. In an undirected network, these numbers are equal.

The degree distribution of a network is the frequency distribution over node degrees. A uniform degree distribution, for example, means that nodes of degree 1 are equally likely as nodes of degree n. A path is a sequence of nodes such that every two consecutive nodes in the sequence are connected by an edge. In a directed network the nodes have to be connected by edges that all point in the forward direction. The path length is the number of edges traversed along the sequence of a path. The shortest path between two nodes is the sequence with the minimum number of traversed edges to get from one node to the other. The average shortest path is the sum of all shortest path lengths divided by their number. The diameter of a network is the longest of all shortest paths.
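The following sketch illustrates these definitions for a small assumed adjacency matrix: degrees as row sums, shortest path lengths via the Floyd–Warshall algorithm, and from these the average shortest path and the diameter. Only numpy is assumed; for large networks a dedicated library such as the one described in [3] would normally be used.

```python
import numpy as np

# Adjacency matrix of a small undirected, unweighted network:
# a cycle 0-1-2-3-0 with an extra node 4 attached to node 3.
A = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])

print("degrees:", A.sum(axis=1))            # row sums

# All shortest path lengths (Floyd-Warshall).
n = len(A)
dist = np.where(A > 0, 1.0, np.inf)
np.fill_diagonal(dist, 0.0)
for k in range(n):
    dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])

print("average shortest path:", dist[dist > 0].sum() / (n * (n - 1)))
print("diameter:", dist.max())
```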