ON THE NATURE OF NEURAL CAUSALITY IN LARGE-SCALE BRAIN

NETWORKS: FOUNDATIONS, MODELING, AND NONLINEAR

NEURODYNAMICS

by

Michael Mannino

A Dissertation Submitted to the Faculty of

Charles E. Schmidt College of Science

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Florida Atlantic University

Boca Raton, FL

December 2018

Copyright 2018 by Michael Mannino


ACKNOWLEDGEMENTS

This work would not have been possible without the many people who have helped me a great deal along the way. I am so deeply indebted to my committee for their assistance, guidance, kindness, scientific brilliance, and support. Dr. Blanks was so very kind, helpful, and thoughtful, right from the beginning. Her guidance these five years was invaluable, and easily one of the most memorable courses I ever took was her TBI course! Dr. Jirsa's scientific insight and intelligence were amazing to witness. I feel very grateful to have had him on my committee. In my first two to three years here, Dr. Barenholtz and I had many conversations which definitely changed my thinking! I remember one time he told me: "You are going to be doing so much computation, math and science, you will have forgotten the word 'philosophy'!" He was right. Finally, Dr. Bressler has been a true mentor. I am so grateful that he is so well versed in philosophy and that he was open to that part of my thinking. So many hours and hours of great conversation, learning so much about science, philosophy, life, and especially about myself. His willingness to guide me, teach me, and especially listen to me will not be forgotten. Dr. B, reality exists and is independent of our perception, at some level at least!

I've had so many great discussions with people that sometimes even changed the course of my research! Conversations with Roxana Stefanescu were amazingly illuminating; they gave me…understanding. John Griffiths discussed at length with me so many concepts necessary for my research. I thank them both! I would like to thank all of my current and previous lab mates, especially Tim Meehan, who taught me a great many things about our research and what it's like to do a PhD in neuroscience, and also Bryan Conklin and Tim West, who were very convincing at just the right times! And Keyla Thamsten, whose superb support (moral as well as administrative) was instrumental during my time here! Rhona was so kind and helpful throughout these five years! I also extend a sincere "thank you" to Dr. Kelso, who founded the Center and the field of coordination dynamics, which had an immense impact on my thinking about so many things!

All my parents (Mom, Bill, Dad and Ange) have been so supportive throughout the years. Words can never truly convey how I feel about them, their love, and how they gave me the courage to continue and try to achieve my dreams.

To my best friends, Jordi, James, and Don. I never could’ve done this without

them. Jordi’s continued and unconditional encouragement was one of the things

that really carried me through to completion! I will always remember that

conversation on the beach we had so many years ago, during one of my (many)

moments of doubt.

Last, my wife Andrea. How can I ever thank her for the five years of sacrifices she made, giving me the chance to go after my dreams? Only by promising (sung in the voice of Captain Sisko) that "The Best is Yet to Come." Without her, none of this would have been possible, or even conceivable. She has given me the strength, confidence, and support I needed to be able to do this.

ABSTRACT

Author: Michael Mannino

Title: On the Nature of Neural Causality in Large-Scale Brain Networks: Foundations, Modeling and Nonlinear Neurodynamics

Institution: Florida Atlantic University

Dissertation Advisor: Dr. Steven L. Bressler

Degree: Doctor of Philosophy

Year: 2018

We examine the nature of causality as it exists within large-scale brain networks by first providing a rigorous conceptual analysis of probabilistic causality as distinct from deterministic causality. We then use information-theoretic methods, including the linear autoregressive modeling technique of Wiener-Granger causality (WGC) and Shannonian transfer entropy (TE), to explore and recover causal relations between two neural masses. Time series data were generated by the Stefanescu-Jirsa 3D model of two coupled network nodes in The Virtual Brain

(TVB), a novel neuroinformatics platform used to model resting state large-scale

networks with neural mass models. We then extended this analysis to three

nodes to investigate the equivalence of a concept in probabilistic causality known

as ‘screening off’ with a method of statistical ablation known as conditional

Granger causality. Finally, we review some of the empirical and theoretical work on the nonlinear neurodynamics of Walter Freeman, as well as metastable coordination dynamics, and investigate what impact they have had on consciousness research.

DEDICATION

This dissertation is dedicated to my mother, Marcy. This would not have been attainable without her unwavering encouragement, support, unconditional love,

and compassion.


LIST OF TABLES

LIST OF FIGURES

LIST OF EQUATIONS

CHAPTER 1: INTRODUCTION

1.1 Neurocognitive Networks: A Paradigm Shift
1.2 Modes of Brain Connectivity
1.3 The need for computational modeling in brain research and the role of Brain Simulation
1.4 Two-Fold Approach: Nonlinear Dynamical Modeling in The Virtual Brain and Information Theoretic Causal Analysis
1.5 Overview of dissertation

CHAPTER 2: FOUNDATIONAL PERSPECTIVES ON CAUSALITY IN LARGE-SCALE BRAIN NETWORKS

2.1 Introduction
2.2 Concepts of Causality
2.3 The Ontology and Epistemology of Causality: Aristotle, Hume, Kant, and Russell
2.3 Causality in Classical and Modern Physics
2.4 The Distinction Between Deterministic and Probabilistic Causality
2.5 Causality in Complex Systems
2.6 Causality in the Brain
2.7 Causality in Large-Scale Brain Networks
2.8 Quantification of Probabilistic Causality in the Brain
2.9 Conclusions

CHAPTER 3: MEASURING CAUSALITY IN LARGE-SCALE BRAIN NETWORKS AT REST: NEURONAL POPULATION MODELING WITH THE VIRTUAL BRAIN

3.1 Introduction
3.2 Results
  SJ3D Model Output of Summed Modes
  Parametric Variation
  K12 (excitatory to inhibitory connectivity)
  K21 (inhibitory to excitatory connectivity)
  K11 (excitatory to excitatory connectivity)
  r (adaptation parameter controlling slow variables)
  Conduction Delay
  Coupling Scaling Factor (Global Coupling Strength)
3.3 Discussion
3.4 Methods and Modeling
  Modeling with The Virtual Brain (TVB)
  The Neural Mass Model, local and global parameterizations
  Causal Analysis
3.5 Conclusion

CHAPTER 4: ANALYZING CAUSAL RELATIONS BETWEEN THREE NODES USING THE VIRTUAL BRAIN

4.1 Introduction
4.2 Methods
4.3 Results
  4.3.1 Causal structure 2, the temporal case
  4.3.2 Causal structure 1, the sequential case
4.4 Discussion
4.5 Conclusions

INTERLUDE: Nonlinear Neurodynamics, Cortical Coordination Dynamics and Modeling [Nonlinear Neurodynamics, K-Sets and The Virtual Brain]

CHAPTER 5: THE WAVE PACKET IN MULTI-AREA CORTICAL MODELING: HISTORY, THEORY AND EMPIRICAL EVIDENCE

5.1 Introduction
  5.1.1 What is the Wave Packet?
5.2 The History, Motivation and Theory of the Wave Packet
  5.2.1 Freeman's Neural Pattern Formation of Meaning
5.3 Interacting Neocortical Areas in Cognition
5.4 Cortical Coordination Dynamics
5.5 Inter-areal Beta-Frequency Phase Coupling
5.6 Conclusions

CHAPTER 6: FREEMAN'S NONLINEAR BRAIN DYNAMICS AND CONSCIOUSNESS

6.1 Introduction
6.2 Cortical Coordination Dynamics: building a framework
  6.2.1 The emergence of a unified neurocognitive state
6.3 Freeman's Nonlinear Neurodynamics
6.4 Comments on the Freeman-Kelso Dialogue
6.5 Classical Cognitivism, Freeman's Pragmatism and The Action-Perception Cycle
6.6 How does Freeman's nonlinear neurodynamics contribute to understanding consciousness?
6.7 Conclusions

CHAPTER 7: CONCLUSIONS

7.1 Tying it all together: Implications and Interpretations
7.2 Future Considerations

APPENDICES

APPENDIX A

APPENDIX B

REFERENCES

LIST OF TABLES

Table 1: Parameter values used in the TVB model.

LIST OF FIGURES

Figure 3.1. Output of the SJ3D model for an uncoupled node with intrinsic dynamics determined by appropriate selection of parameter values in The Virtual Brain (TVB).
Figure 3.2. Nodal output time series of the TVB model with two unidirectionally coupled network SJ3D nodes, called IPSc and DLPFc because of their position in the structural connectivity matrix of TVB.
Figure 3.3a. Condition 1: both SJ3D nodes contain the same parameter values, thus having identical intrinsic dynamics, and are uncoupled. (left) WGC results; (right) TE results. (red) from IPSc to DLPFc; (blue) from DLPFc to IPSc.
Figure 3.3b. Condition 2: both SJ3D nodes contain the same parameter values, thus having identical intrinsic dynamics, and are unidirectionally coupled from IPSc to DLPFc.
Figure 3.3c. Condition 3: the SJ3D nodes contain unique parameter values, thus having different intrinsic dynamics, and are unidirectionally coupled from IPSc to DLPFc.
Figure 3.3d. Condition 4: both SJ3D nodes contain unique parameter values, thus having different intrinsic dynamics, and are bi-directionally coupled.
Figure 3.4. Parametric variation of K12 showing correct recovery of causal relations.
Figure 3.5. Parameterization of K11 showing stabilization of causality.
Figure 3.6. Parameterization of Δr showing recovery of causal relations.
Figure 3.7. Parameterization of Conduction Delay.
Figure 3.8. Parameterization of the coupling scaling factor's (global coupling) effect on recovery of causal relations.
Figure 3.9. Two-node neural mass model of SJ3D nodes showing the pattern of unidirectional driving (condition 2).
Figure 3.10. The three modes of the SJ3D model for the main excitatory state variable () of both nodes.
Figure 3.11. System Model of Unidirectional Causal Influence Between Two Cortical Nodes.
Figure 3.12. Graphical workflow representation of sequential steps taken to generate the simulated time series data in TVB, and the analysis that was subsequently performed in R.
Figure 4.1. Two causal structures: sequential (causal structure 1) and temporal delay (causal structure 2).
Figure 4.2. Time series for main excitatory state variable Xi of each node showing different intrinsic dynamics.
Figure 4.3. A) Correct bWGC recovery from IPSc to DLPFc, showing significance beginning at lag 10 (red line). The blue line shows no significance in the other direction, since there is no directed connectivity from DLPFc to IPSc. B) Correct bWGC recovery from IPSc to V1, showing significance beginning at lag 2. C) Spurious bWGC from V1 to DLPFc, beginning at lag 10. D) The spurious causality has been removed using cWGC.
Figure 4.4. A) Correct bWGC recovery from IPSc to DLPFc, showing significance beginning at lag 10 (red line). The blue line shows no significance in the other direction, since there is no directed connectivity from DLPFc to IPSc. B) Correct bWGC recovery from DLPFc to V1, showing significance beginning at lag 10. C) No spurious bWGC from IPSc to V1, beginning at lag 20.
Figure 4.5. Six simulations with the accompanying parameter values for each of the three nodes, showing the result of the connection from IPSc to V1 (red line) and the other direction (blue line).
Figure 4.6. Time series showing distinct dynamics: IPSc has bursting behavior and V1 has spiking behavior.
Figure 4.7. No spurious causal relation for the inter-nodal difference of dynamics, where the driver and receiver have their dynamics reversed.
Figure 4.8. A) Spurious causality from IPSc to V1 at lag 20. B) Using cWGC, the spurious causality disappears.
Figure 4.9. Gradually decreasing the coupling scaling factor shows that the spurious causality between IPSc and V1 disappears at a value of 0.015.
Figure 4.10. Increasing the global coupling (csf) produces spurious causality in both directions between IPSc and DLPFc, and between DLPFc and V1.
Figure 4.11. A meteorological example of the temporal case.
Figure 6.1. Concurrent LFP records from two olfactory bulb and two prepyriform cortex sites, demonstrating increased gamma wave amplitude in both structures in response to an olfactory stimulus.
Figure 6.2. Freeman's schematic showing the flow of neural activity in the construction of meaning, with emphasis on two main structures in the limbic system: the entorhinal cortex and the hippocampus.
Figure 6.3. Revision of Freeman's schematic diagram, illustrating the neural basis for intentionality in the mammalian action-perception cycle as four processing stages.

LIST OF EQUATIONS

Equation 2.1 Granger causality autoregressive model
Equation 2.2 General state equation for DCM
Equation 3.1 Hindmarsh-Rose Model
Equation 3.2 The Stefanescu-Jirsa Model
Equation 3.3 The general evolution equation that determines the global dynamics of the two-node neural mass model
Equation 3.4 The unrestricted and restricted models used to compute Granger causality
Equation 3.5 Transfer Entropy from Sender S to Receiver R as a function of five parameters

CHAPTER 1: INTRODUCTION

“The brain is not using a language at all…There is no code…”—Walter Freeman, addressing Von Neumann’s comment that “Whatever language the brain is using, it’s not mathematics or logic.”

Computational modeling is becoming increasingly important for studies of large-scale brain networks and its importance is likely to grow even more in the future. – Bressler and Menon, 2010, Trends in Cognitive Sciences

1.1 Neurocognitive Networks: A Paradigm Shift

The paradigm of "brain networks" as the structural and functional basis of human cognition is well underway in neuroscience (Luria, 1967; Mesulam, 1981; Bressler and Menon, 2010; Sporns, 2011; Bressler and Kelso, 2016). The largely reductionist, modular model of isolated, non-interacting cortical and subcortical regions giving rise to cognitive processes such as attention, language, perception, and memory, a view akin to phrenology, has run up against various anomalies and cannot explain the overwhelming evidence coming from many subfields within neuroscience. A Kuhnian paradigm shift has taken place, and there now exists a general scientific consensus that the key to understanding the organization and nature of the mind and its behaviors lies in the complex and non-reductionist interplay and coordination of the segregation and integration of brain regions, that is, in interregional connectivity. It has been observed that the old social aphorism "You are defined by your connections" applies to, and defines, how the brain operates as well. These connections allow complex coordinated neural activity to take place across multiple spatial scales (micro, meso, and macro) and temporal scales (from milliseconds to hours to years), as network dynamics change.

Neurocognitive networks have various kinds of structure, function, and dynamics, but ultimately, how the nodes in a network are connected, and how information flows through the system, generates their function, determines how they operate in real time, and thus defines cognitive processes.

1.2 Modes of Brain Connectivity

It has been observed that various kinds of network connectivity operate in the brain, namely structural connectivity, functional connectivity, and effective or "causal" connectivity. Networks are structured according to their nodes (themselves structural or functional) and their edges, which in turn can be directed or undirected (Bressler and Menon, 2010; Sporns, 2008). Structural networks refer to the anatomical connectivity between brain regions, including the white matter tracts of corticocortical connectivity as well as cortico-subcortical connections (Bressler, 2008). The patterns of physical connections in the brain, which have been shown to exhibit small-world network properties, serve to govern the functional attributes of neurocognitive networks such as self-organization, phase synchrony, multistability, and metastability, among others (Bressler, 2006, 2010; Tognoli and Kelso, 2009, 2014; Jirsa and Kelso, 2000; Jirsa et al., 1994). How brain regions are functionally connected is fundamentally a statistical notion: if two or more nodes in a neurocognitive network exhibit functional connectivity, then there exist statistical dependencies or correlations in the neural activity between the regions. These correlations can be undirected, and purely correlational, or directed, i.e., reflecting a causal effect.

Directed functional connectivity has also been referred to as effective connectivity, especially in the context of model-based approaches to connectivity such as dynamic causal modeling. However, in this research, the terms "directed functional connectivity (DFC)" and "causal relations" refer to the same phenomenon: the activity of one neuronal population has an effect on, drives, or causes change in, the activity of another neuronal population.
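The undirected/directed distinction can be made concrete with a toy sketch (illustrative only, not from this research; the coupling coefficients, lag, and noise level are arbitrary choices): zero-lag Pearson correlation between two coupled signals is symmetric and so cannot say which population drives which, whereas a lagged statistic is asymmetric and carries directional information.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# x drives y through both an instantaneous and a one-sample-delayed term.
x = rng.standard_normal(n)
y = np.empty(n)
y[0] = rng.standard_normal()
y[1:] = 0.5 * x[1:] + 0.5 * x[:-1] + 0.2 * rng.standard_normal(n - 1)

# Zero-lag correlation detects the dependency but is symmetric (undirected).
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]
assert np.isclose(r_xy, r_yx)

# Lagged correlations are asymmetric and hint at direction.
lead = np.corrcoef(x[:-1], y[1:])[0, 1]  # past of x vs present of y: strong
lag = np.corrcoef(y[:-1], x[1:])[0, 1]   # past of y vs present of x: near zero
print(round(lead, 2), round(lag, 2))
```

The asymmetry of the lagged statistic is precisely the intuition that methods like WGC and TE formalize.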

A plethora of "big data" within the network paradigm now exists, including information about existing nodes, edges, and their operational principles (Tognoli and Bressler, 2006); i.e., what networks exist to subserve various cognitive functions, how they are structured, and how they function and change over time.

However, a unified, complete theory of brain dynamics, akin to Maxwell’s theory of electromagnetism, or Einstein’s theory of relativity, remains elusive. And although it may be the case that one does not actually exist, research in the fields of computational and theoretical neuroscience moves forward.

1.3 The need for computational modeling in brain research and the role of Brain Simulation

Computational neuroscience has been defined as the use of computational and mathematical models and methods, and information processing theory to understand the behavior of neurons at various spatial and temporal levels of

organization (Erdi, 2014). Given the rise in computational power and efficiency

over the past several years, the modeling approach has become very useful, and

possibly necessary, for understanding the operational principles of

neurocognitive networks. As such, developing models which find a balance

between neural complexity and computational efficiency has been of paramount

importance. Biological neuron models, including the neural mass models used in

this research, provide great insight into the operational principles of brain

function. And even though they are abstract, simplified representations, they explain, describe, and predict a great deal of real-world phenomena.

Computational modeling is thus, as Bressler and Menon (2010) note, a valuable

endeavor to pursue as it can bridge the gap between various modalities of brain

data. It can also contribute to building a larger, unified theory of brain dynamics

(Jirsa, 2008).

However, an important question remains: can computer simulation in

neuroscience yield true knowledge and real insight, as veridical experiments do?

If so, in what way? A general, valid question: does this work, based on simulated data, contribute to an understanding of how the brain works? While the scope of this question is outside the purview of this work, a brief digression is warranted, since this research provides empirical knowledge based solely on simulated brain data. The term "ground truth" has been used quite extensively in neuroscience and refers to the real, underlying neurobiological phenomenon under investigation. The issue is that sometimes this phenomenon is, and remains, unknown. Thus, if a computational model with the appropriate variables and parameters is created which reproduces empirical measurements, we will say the model represents a ground truth (Wang et al., 2014). While there are many disadvantages to modeling, e.g., the simplification of real-world phenomena, there are several advantages, chiefly the ability to change and optimize parameters of the generative model to investigate their effects. In this sense, simulated data are crucial for testing hypotheses and warranting belief concerning brain function and operation.

1.4 Two-Fold Approach: Nonlinear Dynamical Modeling in The Virtual Brain and

Information Theoretic Causal Analysis

The empirical part of this research consisted of a two-fold approach: first, a neural mass model (NMM), or brain network model (BNM), represented by a complex nonlinear dynamical system, was employed to produce the oscillatory behavior of two neuronal populations in a directed network (Sanz-Leon et al., 2015);

second, information-theoretic causal analyses of Granger causality (GC) and

transfer entropy (TE) were performed to analyze causal, directed relations.

The NMM was implemented in The Virtual Brain (TVB). TVB is a neuroinformatics platform used for investigating resting state brain networks at

both the mesoscopic (neural population) and macroscopic (inter-areal interaction) levels. The core of TVB is the simulator, where generative neural mass models can be used to create time series data. The NMM can be chosen with specific goals in mind, and intrinsic network parameters can be manipulated based on what behaviors are being considered. In addition, TVB allows for manipulation of

conduction velocities, various coupling functions, as well as various forward

models implementing brain data modalities, including BOLD, LFP, EEG and

MEG (Sanz-Leon, et al., 2015). The NMM that was used in this research was

the Stefanescu-Jirsa 3-dimensional model. For a detailed explanation of this

model and why it was chosen, see Methods and Modeling section in Chapter 3.

In order to understand network dynamics, it is necessary to understand the flow

of information between network nodes, specifically, how the nodes causally affect

or drive one another. However, it has been argued that a distinction exists

between information flow and information transfer (Lizier and Prokopenko, 2010),

the latter being a measure of improvement in predictability, and the former a measure of actual causal effect. Both GC and TE use the concept of predictability to infer whether two or more time series are causally related to one another; however, it shall be argued here that this research also allows for an interpretation in terms of direct causal effects, given that the model connectivity parameters are designed a priori.
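The notion of improved predictability behind GC can be illustrated with a minimal bivariate sketch (an illustrative toy, not the SJ3D/TVB pipeline or the R analysis used in this research; the AR coefficients, model order, and noise level are arbitrary). The Granger statistic compares the residual variance of a restricted model, which predicts a target series from its own past, with that of an unrestricted model that also includes the past of the candidate driver:

```python
import numpy as np

def granger_stat(x, y, p=2):
    """Log-ratio Granger statistic for 'x Granger-causes y' at model order p.

    Compares the residual variance of the restricted model (y predicted from
    its own past) with that of the unrestricted model (y predicted from the
    past of both y and x). Values well above zero mean x's history improves
    the prediction of y."""
    n = len(y)
    target = y[p:]
    own = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    both = np.column_stack([own] + [x[p - k:n - k] for k in range(1, p + 1)])

    def resid_var(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    return np.log(resid_var(own) / resid_var(both))

# Synthetic ground truth: x evolves independently, y is driven by x's past.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

print(granger_stat(x, y))  # clearly positive: the true x -> y influence
print(granger_stat(y, x))  # near zero: no influence in the reverse direction
```

Because the driving connection is designed a priori, the recovered asymmetry can be checked against a known ground truth, exactly the logic used with the TVB simulations in Chapter 3.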

1.5 Overview of dissertation

Some of the central questions this research asks are: Can causal relations be recovered using GC and TE in a large scale brain network designed in TVB?

What is the relationship between the parameters of the generative model and the models used for causal analysis? Is there a range of parameters (what we term a causal parameter space) in which causal relations are better recovered? And can a coherent conceptual and philosophical framework of causality be provided for interpreting the results offered by methods like GC and TE?

As the title suggests, this dissertation, like the field of neuroscience itself, is quite interdisciplinary and covers many, sometimes overlapping, fields, including network neuroscience and brain connectivity, nonlinear dynamical systems, coordination dynamics and complexity science, philosophy, neural time series analysis, and biological neuron models. Following this introduction, Chapter 2, adapted from Mannino and Bressler (2015), provides a rigorous conceptual framework of neural causality. Causality has a rich philosophical history, and it is a central topic in network neuroscience; however, a survey of the literature in neuroscience reveals scant evidence of how causality is to be interpreted within the brain. That is, the question of how

functional nodes within a network affect each other is key to understanding the

function of brain networks; yet a gap exists in the understanding of causality as neuroscientists apply the word to explain their data. Part of this research

is an attempt to fill this gap in understanding. It is argued that the nature of neural

causality must be understood from a probabilistic, not a deterministic

perspective. Chapter 3 is an empirical investigation of measuring causal relations

in a two node large-scale resting state, simulated brain network. Among other

things, the relationship between intrinsic (intra-node) and extrinsic (inter-node)

parameterizations and directed functional connectivity is explored. While the

model used in this research is (quasi)deterministic, given that noise imparted to

the system is relatively small, the causal analysis performed on the data is

inherently statistical in nature, and thus, understanding causality from a

probabilistic perspective validates the use of methodological procedures like

Granger causality and transfer entropy. However, as stated above, the argument

is more than methodological: although the brain may exhibit deterministic

properties on some spatiotemporal scales, given the ubiquitous convergence and

bi-directionality of connectivity, we argue that causal relations between brain areas are probabilistic. Chapter 4 extends the approach of Chapter 3 to a trivariate network, following Ding et al. (2006), by eliminating the causal influence of a third

node, thereby removing any spurious causal interactions. As we shall see, this

project provides a more direct empirical motivation for understanding neural causality in terms of probability. We demonstrate that the notion of Reichenbach's screening off (see Chapter 4) in probabilistic causality is equivalent to the notion of statistical ablation in time series analysis, using the method of so-called conditional Granger causality.
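The equivalence just described, that conditioning on an intermediary removes an apparent influence, can be sketched with a toy three-node chain (illustrative only; plain least-squares AR fits stand in for the TVB/SJ3D simulation and the cWGC implementation used in this research, and all coefficients are arbitrary). With a chain x → y → z, a bivariate measure reports a spurious x → z influence, and conditioning on y "screens it off":

```python
import numpy as np

def resid_var(target, predictors, p=3):
    """Residual variance of an order-p least-squares fit of `target`
    on lags 1..p of every series in `predictors`."""
    n = len(target)
    cols = [s[p - k:n - k] for s in predictors for k in range(1, p + 1)]
    design = np.column_stack(cols)
    tgt = target[p:]
    beta, *_ = np.linalg.lstsq(design, tgt, rcond=None)
    return np.var(tgt - design @ beta)

def gc(src, dst, cond=(), p=3):
    """Granger influence src -> dst, optionally conditioned on other series."""
    restricted = resid_var(dst, [dst, *cond], p)
    unrestricted = resid_var(dst, [dst, src, *cond], p)
    return np.log(restricted / unrestricted)

# Causal chain x -> y -> z: x influences z only through the intermediary y.
rng = np.random.default_rng(1)
n = 4000
x = rng.standard_normal(n)
y = np.zeros(n)
z = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
    z[t] = 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

print(gc(x, z))             # bivariate: spurious x -> z influence appears
print(gc(x, z, cond=(y,)))  # conditioning on y screens it off (near zero)
```

Adding the intermediary's past to both the restricted and unrestricted models is the statistical ablation step: once y's history is available, x's history no longer improves the prediction of z, which is Reichenbach's screening-off condition expressed in regression form.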

Given that the subsequent chapters of the dissertation are indirectly related to the previous chapters, a brief interlude is provided to connect the ideas and concepts they have in common, namely, coordination dynamics, information processing in the brain, and Walter Freeman’s nonlinear neurodynamics.

Chapter 5 discusses the history, theory and evidence for the wave packet in

Walter Freeman’s work in nonlinear neurodynamics. Chapter 6 addresses what import Freeman’s nonlinear neurodynamics has for a scientific theory of consciousness and cognitive science, as well as the relationship between

Freeman's work and Scott Kelso's and Steven Bressler's metastable coordination

Finally, it is noted that chapters 2, 5, and 6 of this thesis have been published in open access journals, Physics of Life Reviews, Chaos and Complexity Letters, and Journal of Consciousness Studies, respectively, and are available online.


CHAPTER 2: FOUNDATIONAL PERSPECTIVES ON CAUSALITY IN LARGE-SCALE BRAIN NETWORKS

The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm. – Bertrand Russell (1913)

2.1 Introduction

By virtue of what is one thing or event the cause of another thing or event? The question is an old and familiar one, and has appeared in various guises throughout the history of science, mathematics, and philosophy. It is closely related to various other problems concerning the foundations of causality: what is the fundamental nature of causal relations? Is causality real, and a fortiori, ontologically independent of the mind, or is it merely an epistemic limitation? In the context of statistical analysis, are causal relations inherently deterministic or

probabilistic? Finally, and perhaps most importantly for what follows: is causality

affected by complexity? That is, is it necessary to expand our conception of

causality to cover causal influences in the human brain, which may be affected

by its properties as a complex adaptive biological system? Or, more germane to

the current discussion, must the causal influences between neuronal

populations1 in complex brain systems be described in a more comprehensive

way – with a different foundational conception – than influences in simple

physical systems?

Historical as well as modern attempts to formulate an unambiguous conceptual

description of causality are rich and plentiful; modern philosophers such as David

Hume, Immanuel Kant, Bertrand Russell, Nancy Cartwright, Patrick Suppes, and

Christopher Hitchcock, among others, have developed both ontological and

epistemological accounts of causality. Moreover, modern mathematicians,

statisticians, and economists, such as Austin Bradford Hill, Norbert Wiener and

Clive Granger have developed methodologies for measuring causality with

statistical tools from the perspective of stochastic processes. Nevertheless, the

basic nature of causality within modern conceptions remains to be clarified,

especially for complex systems such as the human brain. In this review, we

address classical and contemporary work in philosophy, cognitive neuroscience,

and statistics, and propose future research avenues of approach to a central, but

unanswered question: what is the nature of causal influences in the human brain,

specifically in large-scale brain networks? And more generally, what is the nature

of causal influences in complex systems? It is highly doubtful that Hume or Kant

considered what sorts of causality occur in organized complex systems, for example, coupled neuronal populations in the human brain; nevertheless, it is time, once again in the history of science, for the notion of causality to be conceptually expanded, this time into the study of brain networks. Given this ambitious claim, the central goal of this review is to examine, in the context of a complex biological system, whether the classical notion of causality is valid.

1 This paper discusses neural causality at the level of neuronal populations because it is the neuronal population that is thought to represent the unit of interaction in large-scale brain networks, whose operations are proposed to underlie cognition in the brain (see [56]). Neural causality may also apply to the interactions of individual neurons within a population wherever a similar connectional topology prevails at the single-neuron level.

The classical concept of causality is discussed in more detail below. Here we note that even before the advent of modern physics, Hume was well known for his skeptical elimination of the concept of necessity from causality, that is, for claiming that the only basis for a causal relation between two events is simply the mind’s perception that the events are repeatedly (or constantly) conjoined. Hume converted the objective regularity between events into a subjective

(representationalist) experience. For Hume, there is no causality apart from the perception of events that are strictly correlated in time. Kant, in a stunning foreshadowing of modern perceptual neuroscience, argued that causality is a synthetic a priori truth. According to Kant, mental representations, including causality, are not simply reflections of the world, but are categories of

understanding used to actively interpret events in the world. The implication is

that causality need not be determined by external events. Going even further,

Bertrand Russell argued that modern science has demonstrated that causality

need not be intrinsically tied to determinism. Thus, since the time of Russell, a

distinction is made between deterministic causality and probabilistic causality. As

we shall see, motivations for probabilistic theories of causality stem from

difficulties with so-called regularity theories, which originate in Hume’s idea of

constant conjunctions.

This review considers the proposition that probabilistic causality is well suited for

understanding causal influences in the brain, where bi-directional and convergent

pathways play a major role in processing. Such complex organizational features

of brain connectivity imply that interaction models based on linear transmission

from unitary senders to unitary receivers are too rigid. The first half of the review

consists of a discussion of classical notions of causality, in order to provide a

brief philosophical background for the ensuing discussion. The concept of

causality is recast in a new light to illuminate the nature of causality in

brain networks and the methods used to uncover it. The second half is a

discussion of the nature of causality in large-scale brain networks, and the

measurement of causal influences in these networks. We argue that classical

conceptions of causality do not apply to causality in the brain, which must be

conceived in a fundamentally different manner than causality in non-complex

physical systems.

2.2 Concepts of Causality2

The standard, or classical, definition of causality is given by:

(1) For C to “cause” E it is both necessary and sufficient that when C happens, E happens. That is, if C causes E, E must follow C [1].

2 The terms ‘causality’ and ‘causation’ are generally considered to be equivalent. We use ‘causality’ here, although ‘causation’ is often used to represent a causal relation in the philosophical literature.

This definition includes concepts of both the causal relation, i.e., the relationship or link between C and E, and the causal relata, the items that share this link. The basis for the causal link, and the types of things that serve as causal relata, e.g., objects, events, etc., are by no means agreed upon [2, 3, 4]. Moreover, the words “must” and “necessary” in the above definition may, as Bertrand Russell stated, be

“relics of a bygone age.” For instance, how should one make sense of the nature of causality in the following statements?

(2) The baseball caused the window to shatter.

(3) Smoking causes lung cancer.3

(4) This physical state caused that mental state.

(5) The patient’s flu was caused by her exposure to someone else with the flu.

(6) Several factors, working together, caused the hurricane to form.

(7) Everything that has a beginning of its existence must have a cause of its

existence.4

Is the causality in (2) identical to that in (3) or (7)? Indeed, do all six of these

statements refer to causality in the same way? Do they mean the same thing? If

not, can they be formally distinguished? Furthermore, does (1) serve as the basis

for (3)?

3 Austin Bradford Hill was the first to measure a correlation between smoking and lung cancer [5, 6].

4 This comes from the Kalam Cosmological Argument for God’s existence, an argument based on causality.

Classically, causality has been wedded to determinism. However, the aforementioned propositions, many empirical results from modern physics, and, as we demonstrate in this review, methods that quantify causal influence in complex biological systems like the human brain all show that determinism is inadequate to describe causality in many situations. Thus, the classical deterministic description may be inappropriate as a general formulation of causality, and so we will examine motivations for discarding it. First, however, it will be useful to briefly examine the Aristotelian conception of causality, and then concentrate on three more recent historical attempts to uncover the nature of causality: those of Hume, Kant, and Russell.

2.3 The Ontology and Epistemology of Causality: Aristotle, Hume, Kant, and

Russell

Causality in Aristotelian philosophy is much broader in scope than are modern conceptions, but Aristotle believed that causality is an objective, deterministic feature of the world [7, 8]. Although some elements of Aristotle’s philosophy are still accepted today, the ancient Aristotelian definition of causality has been reevaluated, particularly since Hume and Kant later espoused more philosophically sophisticated conceptions, and thus we believe that Aristotle’s account of causality is somewhat removed from the foundational concepts of causality in large-scale brain networks considered here.

Hume’s account of causality stemmed, like all his philosophical contributions, from his belief that human knowledge derives from subjective perceptual experience. Although he famously referred to causality as “the cement of the universe”, he was skeptical of classical non-subjective notions of causality. Hume defined a cause to be “an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second” (cited in [9]).

Thus, Hume’s is a skeptical account commonly referred to as a regularity theory of causality5 [10, 11, 12]. It is a regularity theory because the central feature of

Hume’s account concerns what he called constant conjunctions, that is, for event

A to cause event B, B is observed to always (regularly) follow A. Classical

examples of regularity abound: boiling water is observed to always follow from

application of a specific amount of heat; the motion of one billiard ball is observed

to always follow from being struck by another. Causality in Hume’s philosophy is

reduced to correlation: occurrence of the first event (A) is correlated with

occurrence of the second event (B). Hume thought that all beliefs in regular

connection need to be explained; and his explanation came directly from the

experience of human beings.

For Hume, we experience, not the causal relation itself, but rather constant

conjunctions of events from the moment we are born, and the regularity of these

experiences creates a sense of regularity in perception. According to Hume, the sense of regularity of causality comes from subjective experience, and is not a property of the objective external world. In short, causal relations are subjective phenomena that are projected onto the world [9, 13, 14]. Thus, although Hume rejected causal necessity, an objective property, he embraced causal regularity, a subjective attribute. Even though Hume’s view is that causality depends on subjective experience, and thus might allow for probabilistic effects, his emphasis on the regularity of causes makes his causality theory deterministic.

5 Various interpretations of the Humean notion of causality exist, and there seems to be no consensus on which one, if any, is correct; for the purposes of this paper, we take the most general interpretation.

A number of problems exist for Hume’s theory, and for regularity theories in general [10, 11, 15, 16]. As an example, spurious regularities may give rise to false conclusions: if a single cause is regularly followed by two effects at different

times, Hume’s notion of causality would lead one to conclude that the earlier

effect is causal to the later one, even though they are not directly related. This is

an important point, and one of the primary motivations for a probabilistic notion of

causality that we shall return to later in our discussion of Granger causality.

Although problems such as spurious regularity have given philosophers and

scientists reason to abandon regularity theories of causality, it was Kant who

famously tried to rescue the notion of causality as existing in the external world

from Hume’s skepticism. Kant is well known for providing what turned out to be

foundational and conceptual justification for many of the results of the modern

science of visual perception. He stated that, “Up to now it has been assumed that

all our cognition must conform to the objects; but… let us once try whether we do

not get farther with the problems of metaphysics by assuming that the objects must conform to our cognition” [12]. And so Kant believed that perception is not simply a passive process, but rather an active interpretation of sensory input. Like Hume, Kant believed that knowledge begins with experience, but, unlike Hume, he held that not all knowledge stems from experience: some derives from what he called concepts or categories of the understanding, of which causality is a prime example. For Kant, these categories are a priori elements of the human mind, that is, innate cognitive features that endow humans with certain perceptual capacities. These categories allow us to experience what the world is really like, and a priori knowledge exists alongside empirical knowledge [13].

Kant concluded that the principle of causality is an a priori truth, but is nonetheless determined (by both a priori and empirical knowledge). Kant’s notion of causality was the dominant viewpoint in Western philosophy until Russell provided a modern understanding that included a critique of the deterministic causality to which both Hume and Kant were dedicated.

Russell’s account is a rejection of the classical ‘law of causality’. This is important, not only for grounding notions of causality in modern physics, but also for motivating a concept of causality that is useful for understanding causality in brain networks. In his essay, On the Notion of Cause, Russell was ahead of his time when he reasonably claimed that scientific notions of causality should not be linked to determinism [18]. Russell asserted, based on conceptual analysis, and

possibly on the results of modern physics6, that not every event must have a

direct cause. Causality, for Russell, is to be “understood in terms of functional

relations” [18]. He claimed that scientific laws do not reveal the causal effects of

one event on a following one, but rather state functional relations between the

events. Russell concluded by recommending that the notion of causality itself be

removed from scientific discourse.

The modern notion of causality has undergone, in light of empirical results from a

number of scientific disciplines, particularly physics, biology, economics and

statistics, a drastic conceptual shift, with a major contribution by Russell.

Although Russell did not explicitly consider causality in biologically complex

systems such as the human brain, his arguments foreshadow that application.

Thus, Russell’s concept of causality presaged later developments in

understanding causality in complex systems and brain networks.

2.4 Causality in Classical and Modern Physics

It is doubtful that Russell could have foreseen what impact modern science,

especially physics, biology, economics and statistics would have on the concept

of causality; nevertheless, it is clear that he had an idea that a change was

coming. Twentieth (and twenty-first) century scientific concepts have both challenged and clarified foundational concepts of causality in physics and other sciences. Concepts in modern physics that involve causality include the relativity of simultaneity, time dilation, the space-time interval, nonlocality, quantum entanglement, and backward causality (e.g., the delayed choice quantum eraser experiment) [19, 20, 21]. Biological concepts include biological plausibility, evolutionary mechanisms, epigenetics, genetic regulatory networks, and artificial and biological neural networks. We next provide a brief conceptual analysis of the role of causality in modern physics.

6 It is arguable whether Russell had considered what implications “modern” physics, namely that of relativity and quantum mechanics, would have on causality when his 1913 paper was written. Both of those theories were very new, and their implications for causality were not well known.

Mechanics is a central topic in classical physics, and implicit in the classical concept of mechanics is the concept of deterministic causality. Classical physics is based on concepts such as non-quantum or classical logic, Newtonian mechanics, determinism, objectivity, and certainty. By the end of the nineteenth century, the classical Newtonian legacy had become embodied in several fundamental epistemological and metaphysical assumptions about the material world [22]. Five assumptions were:

1) All motion has a cause. If a body exhibits motion, it is possible to

determine what caused the motion.

2) If the state of a system is known at one time – say the present – it can

be determined at any other time in the future or past. Thus, as a

consequence of an earlier cause, subsequent system states are certain.

3) The properties of light are completely described by Maxwell’s

electromagnetic theory and confirmed by experiment. Two equivalent

physical models represent energy, one based on particles, the other on

waves. At every time, energy exists as either particle or wave.

4) It is possible in principle to measure to any degree of accuracy the

properties, such as temperature or speed, of a system.

5) Causality is deterministically certain.

The first assumption above derives from the metaphysical doctrine of mechanism, which holds that the universe operates as a complicated machine.

The clock has been the most common metaphor used to portray mechanisms, and the mechanistic clockwork metaphor permeated classical physics for centuries. The operation of the clock system could be explained in totality by simply explaining its parts and how they moved within the system. Deterministic

causality was ascribed to the system because every motion by one part of the

system was considered to determine other motions by other parts. Furthermore,

just as the clock was a mechanical system that could be understood in terms of

its parts and their motions, so too the universe was considered to be a

mechanical system that could be explained in terms of its parts, namely material

objects, and their motions. Moreover, the metaphysical doctrine of mechanism

presupposed regularity, and thus determinism: the regularity exhibited by the

universe could be understood in terms of deterministic laws of nature. This

doctrine not only applied to the universe as a whole, but also to all phenomena

within the universe.

The second assumption above is best exemplified by Laplace’s

Demon, which Laplace used to demonstrate causal determinism. In 1814, he stated,

We may regard the present state of the universe as the effect of its

past and the cause of its future. An intellect which at a certain

moment would know all forces that set nature in motion, and all

positions of all items of which nature is composed, if this intellect

were also vast enough to submit these data to analysis, it would

embrace in a single formula the movements of the greatest bodies

of the universe and those of the tiniest atom; for such an intellect

nothing would be uncertain and the future just like the past would be

present before its eyes [23].

This quote emphasizes the marriage of causality with determinism. However,

many subsequent developments in physics, including nonlinear dynamics, chaos

theory, and quantum physics (QP), have signaled a divorce. The advent of

quantum physics, as Clive Granger [24] notes in his essay Testing for Causality

(see discussion below), had a great impact on philosophical considerations of

causality and determinism. Several aspects of QP are particularly important in

this context, including Heisenberg’s uncertainty principle, Born’s rule, the

measurement problem, and even more modern, empirical results from

experiments involving Bell’s theorem, quantum entanglement, backward

causation and the delayed choice quantum eraser experiment [19, 20, 25-31].

We briefly discuss some of these examples to highlight the breakdown of

determinism and deterministic causality. Nonetheless, QP lends itself to a multitude of interpretations, some of which are completely deterministic [26]. Einstein, in particular, held this view, proposing that the theory of QP was incomplete and certain hidden variables were still to be found [30]. However, most commonly accepted interpretations, such as the

Copenhagen Interpretation, which Einstein famously opposed, have a strict indeterminism built in. Interestingly, the Schrödinger equation is completely deterministic, at least in providing the evolution of the wave function of a particle

[21]. However, it is here also that determinism breaks down, given Born’s rule, or statistical interpretation, which says that the wave function only gives the probability of finding a particle at a particular location at a particular time, and the probability is given mathematically by the squared modulus of the wave function (as a function of space and time) [21]. This is precisely where indeterminism enters: QP only provides information, via the Schrödinger equation, about the wave function, and does not allow definite prediction of a particle’s location. QP can only offer statistical information about possible results. However, the origin of this indeterminacy has been questioned: is it a problem with the theory, inherent in nature, or something else? In his interpretation of the uncertainty principle, Bohm

[26] states that,

…the indeterminism is inherent in the very structure of matter and

that the momentum and position cannot even exist with

simultaneously and perfectly defined values. The term “uncertainty

principle” is therefore, somewhat of a misnomer. A better term would

be “the principle of limited determinism in the structure of matter”.
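Born’s rule, as described above, can be stated compactly in its standard one-dimensional textbook form (the notation here is the conventional one, not taken from this review): the probability of finding the particle between x and x + dx at time t is the squared modulus of the wave function,

```latex
P(x \le X \le x + dx) = |\Psi(x,t)|^{2}\,dx ,
\qquad
\int_{-\infty}^{\infty} |\Psi(x,t)|^{2}\,dx = 1 .
```

The normalization condition on the right is what licenses reading |Ψ|² as a probability density: the particle must be found somewhere, but where it will be found is irreducibly probabilistic.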

Fundamental to the question of causality in physics has been the phenomenon of quantum entanglement. The question has turned on whether or not causality is always local. Quantum entanglement is the phenomenon whereby pairs of particles that have previously interacted, but are now separated in space, exhibit correlated quantum properties and must be described and treated as a whole.

Moreover, any measurement on one particle has an instantaneous effect, or causal influence, on the other particle. Quantum entanglement has been interpreted as violating causality by violation of the principle of locality, which states that any event or change of any component in a physical system must have a physical cause that can be localized to its immediate spatiotemporal neighborhood. In other words, no physical influence can propagate faster than light. Thus, in modern interpretations of quantum mechanics, probabilistic causality may be supported by the nonlocality of quantum entanglement, which possibly violates deterministic concepts of causality. However, since entangled states have correlated outcomes, another possible interpretation is that they support deterministic causal influences that are informational in nature [21]. This issue is still under intense debate, and it may turn out that causality can be both nonlocal and deterministic. What is true is that the concept of causality in quantum physics, which describes events that do not have classical deterministic causes, has been expanded to include probabilities.

We argue below that these considerations from modern physics help to motivate, as Russell possibly foreshadowed, a probabilistic notion of causality.

We now briefly consider whether the theories of special and general relativity also change our understanding of causality. Granger [24] specifies as axiomatic that, “The past and present may cause the future, but the future cannot cause the past.” We hold that Granger’s time-asymmetry axiom is indeed preserved in modern physics, and is not altered by either special or general relativity. In relativity, the notions of time and space are dependent on the observer and their inertial reference frame. Because of Einstein’s two postulates, that the laws of physics must be the same in all inertial reference frames and that the speed of light is constant, perception of the simultaneity of events by observers in two different inertial reference frames is not absolute: whether two observers perceive two events to be simultaneous or not depends on how the observers are moving relative to one another. Thus, although the principle of time asymmetry of causality is expanded by the theories of relativity to include all inertial reference frames moving relative to one another, it is basically unchanged in its nature. Furthermore, since no signal can travel faster than the speed of light in relativity theory, causality, whether deterministic or probabilistic, is not violated

[31-33]. Thus, although relativity expands the concept of causality, it does not

alter it. Quantum physics, by contrast, has more direct implications for the concept of causality.

The discussion in this section has highlighted the fact that the concept of causality has undergone drastic revision, specifically by abandonment of determinism, in the history of physics. In what follows, we will consider the modern understanding of probabilistic causality, and its implications for the conception of causal influence in large-scale brain networks.

2.5. The Distinction Between Deterministic and Probabilistic Causality

This section deals with the concept of probabilistic causality, which, we argue, is the best interpretation of causality in the context of large-scale brain networks.

We begin with the distinction between deterministic causality (DC) and probabilistic causality (PC). In the classical Humean sense of DC discussed above, two events in causal relationship are necessarily connected, with the cause determining the effect. ‘A causes B’ means, ceteris paribus, that if A happens, B always follows. In PC, however, an event does not necessarily determine another event, but rather changes the probability of occurrence of the other event. Note that, if an event lowers the probability of the other, then it is not considered a cause: for example, if a person eats ice cream, the chance of losing weight is lowered, and the act of eating ice cream does not cause the loss of weight. We thus conclude that one event causes a second just in case the

probability of the second given that the first occurs is greater than that of the second given that the first does not occur [11, 16, 34].

Among the reasons given by Hitchcock [11, 16] for abandoning regularity theories, which are based on deterministic causality, is the existence of imperfect regularities and spurious regularities. Imperfect regularities arise from the fact that effects do not invariably follow their supposed cause. An example of an imperfect regularity is the causal structure whereby exposure to the flu causes its contraction, even though some people who are exposed to the flu do not contract it. Under PC, flu exposure raises the probability of a person contracting the flu.

Probabilistic theories of causality recognize that there are occasions when effects occur without occurrence of the putative cause, and when a putative cause occurs without eliciting effects. Imperfect regularities generally arise for one of two reasons: heterogeneity of causality and the failure of physical determinism.

First, the fact that causal relations depend on many different contexts and circumstances means that effects are rarely seen to invariably follow causes.

Second, since many processes and events in the physical world are stochastic and nondeterministic, it is often difficult or impossible to assign the labels “cause” and “effect”.

Examples of spurious regularity are causal structures where multiple effects follow from a single cause. The effects may occur concurrently or with time delay.

A meteorological example given by Hitchcock is that of a drop in barometric

pressure causing a drop in the height of a column of mercury and, shortly thereafter, a storm (Figure 3). A deterministic interpretation of causality would mistakenly infer that the drop in mercury causes the storm. As discussed below, a condition may be included in the definition of PC so that it adequately addresses spurious regularities.

A long tradition exists in philosophy of analyzing causality in terms of probability raising. From the perspective of PC, A causes B if A raises the probability of B

(P(B|A) > P(B)). That is, “A causes B” means “the probability of B given A is greater than the probability of B occurring alone”. Alternatively, “A causes B” may be defined as “the probability of B given A is greater than the probability of B given that A does not occur”, or P(B|A) > P(B|~A). This PC formulation, as it stands, can resolve imperfect regularities: A may raise the probability of B even if occurrences of B do not always follow occurrences of A. However, spurious regularities remain a problem because occasions may arise where A appears to cause B but occurrences of both A and B are actually caused by a third factor, C.

For example, in Hitchcock’s meteorological example, with A as the drop in mercury and B as the storm, then P(B|A) > P(B|~A) is true, even though the drop in barometric pressure (factor C) is the actual causal factor.
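The probability-raising criterion, and the way a common cause can fool it, can be illustrated with a small simulation (a hypothetical sketch: the event names follow Hitchcock’s barometer example, but the probabilities are invented for illustration):

```python
import random

random.seed(1)

def simulate(n=100_000):
    """Simulate a common-cause structure modeled on Hitchcock's example.
    c: drop in barometric pressure (the common cause)
    a: drop in the mercury column  (first effect of c)
    b: storm                       (second effect of c)
    The probabilities are illustrative, not taken from the literature.
    """
    trials = []
    for _ in range(n):
        c = random.random() < 0.3          # pressure drops on 30% of days
        a = c and random.random() < 0.95   # mercury almost always falls when pressure drops
        b = c and random.random() < 0.80   # storms usually follow a pressure drop
        trials.append((a, b, c))
    return trials

def cond_p(trials, event, given=lambda t: True):
    """Estimate a conditional probability P(event | given) by relative frequency."""
    sub = [t for t in trials if given(t)]
    return sum(1 for t in sub if event(t)) / len(sub)

trials = simulate()
p_b_given_a     = cond_p(trials, lambda t: t[1], lambda t: t[0])
p_b_given_not_a = cond_p(trials, lambda t: t[1], lambda t: not t[0])
print(p_b_given_a > p_b_given_not_a)  # True: the mercury drop "raises the probability" of the storm
```

Here A does not cause B, yet P(B|A) far exceeds P(B|~A), because both A and B are effects of the pressure drop C; this is exactly the spurious regularity that the screening-off condition discussed next is designed to exclude.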

A probability-raising solution to the problem of spurious regularities comes from

Reichenbach, who described a probabilistic relationship called “screening off”

[16]. When included as a condition in the PC formulation, this relationship allows

PC to address spurious regularities. Reichenbach said that factor C screens A off from B if the probability of B given both A and C is equal to the probability of B given C alone, i.e., P(B|(A&C)) = P(B|C). If A and B are “spuriously correlated”, then C is the common cause that does the screening off. If screening-off causes are excluded, then spurious regularities are avoided. Hitchcock explains this by formalizing Reichenbach’s theory: if A_t and B_t′ are events that occur at times t and t′, with t′ following t, then A_t causes B_t′ if and only if two conditions are met:

1) P(B_t′ | A_t) > P(B_t′ | ~A_t)

2) No other event C_t″, occurring at a time t″ that is either earlier than or the same as t, screens B_t′ off from A_t.

The first condition is simply a statement of PC as described above. In

combination with the second condition, the possibility of spurious regularities is

removed. Thus, PC has the advantage of handling issues that DC cannot.
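Reichenbach’s screening-off condition can be checked directly on simulated common-cause data (again an illustrative sketch with invented probabilities): conditioning on the common cause C should render A irrelevant to B, i.e., P(B|A&C) ≈ P(B|C).

```python
import random

random.seed(7)

# Common-cause structure: C (pressure drop) independently produces
# A (mercury drop) and B (storm). Given C, A carries no further
# information about B, so C screens A off from B.
def trial():
    c = random.random() < 0.3
    a = c and random.random() < 0.95
    b = c and random.random() < 0.80
    return a, b, c

data = [trial() for _ in range(200_000)]

def cond_p(event, given):
    """Estimate P(event | given) by relative frequency over the sample."""
    sub = [t for t in data if given(t)]
    return sum(1 for t in sub if event(t)) / len(sub)

p_b_given_ac = cond_p(lambda t: t[1], lambda t: t[0] and t[2])  # P(B | A & C)
p_b_given_c  = cond_p(lambda t: t[1], lambda t: t[2])           # P(B | C)

# Reichenbach's condition, up to sampling noise: P(B | A & C) = P(B | C)
print(round(p_b_given_ac, 2), round(p_b_given_c, 2))
```

Because B’s occurrence is generated independently of A once C is fixed, the two estimated probabilities agree up to sampling error; excluding such screened-off relations is what removes the spurious "mercury causes storm" inference.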

Despite the theoretical advantages of PC, a probabilistic notion of causality

raises new complications as to whether causal relations are objective, mind-

independent features of the world. This issue is important for the topic of this

review since, as discussed below, some quantification methods used to measure

causal influences in large-scale brain networks have an a priori component: the

causal influences are derived from mathematical equations. Since it is an open

question whether or not mathematical equations are subjective, the question is

raised of whether probabilistic causality itself is objective or subjective. If,

contrary to Hume, causality is thought to not be inherently perceptual, but rather

to exist outside of human perception, then, for probabilistic causality to be meaningful, it must be ontologically real, and not just an epistemological notion.

Objective interpretations of physical probability have generally fallen into two camps: frequency views and propensity views. Of the two, the most likely candidate for PC would seem to be the propensity view. In this view, probabilistic causality reflects the underlying physical propensity of a given outcome [35]. The nature of propensity may be unclear: for example, it may or may not be a property of the event or object exhibiting the probability. Nonetheless, propensities “seem to be measures of ‘causal tendencies’…” [16]. In fact, there have been attempts to formulate a “probabilistic causal calculus” based on objective, physical propensities [36]. The propensity interpretation of probability appears to correspond well with the probabilistic notion of causation.

2.6. Causality in Complex Systems

Nancy Cartwright, a prominent philosopher of science and economics, has

suggested a pertinent view of causality that we consider to be relevant to the

topic of causality in complex systems:

The term ‘cause’ is highly unspecific. It commits us to

nothing about the kind of causality involved nor about how the

causes operate. Recognizing this should make us more

cautious about investing in the quest for universal methods for

causal inference [37].

Cartwright claims that a variety of different “methods of causal inference” can be used to measure causality in various scientific contexts [38]. Although each of these methods displays unique sets of strengths and weaknesses, depending on the context, we argue that applications to large-scale brain networks require a probabilistic interpretation of causality. It should be noted that these causality methods are based on general accounts of causality, and stand in juxtaposition to singular accounts, which only describe particular objects or events that exist in one causal relation, e.g., ‘The brick caused the window to shatter’. General accounts, by contrast, describe “causal relations among appropriately general relata such as event-types or properties” [16]. We further note, in anticipation of

the discussion below, that accounts of causal influence in large-scale brain

networks may be general and still be probabilistic.

We focus here on complex systems as being composed of many interacting

parts, and maintain that causality in complex systems differs from that in non-

complex systems [41, 42]. We further postulate that complex systems are well

served by a probabilistic account of causal relation between the parts. A

probabilistic account of causality is better suited than a deterministic one for

explaining causality in complex systems because: (1) two interacting components

may be causal to each other; (2) the causal influence from one interacting

component to another may only be one of many causal agents; and (3) causal

influences do not always result in observable effects [43-46]. In the next section,

we consider the structural and functional complexity of the brain, and suggest

that a probabilistic account of causality is better suited than a deterministic one for explaining neural causality in large-scale brain networks.

2.7. Causality in the Brain

The brain is structurally organized into distinct systems having specialized inputs, outputs, and internal topology. The functions of these systems are unique, and support unique forms of cognition [47-51]. For example, visual cognition relies on the visual cortex, a brain system having specialized input pathways originating in the retina, multiple output processing streams that follow distinct pathways, and a complex internal topological organization of interconnected areas. Nonetheless, visual function can be forced into other cortical areas by re-routing retinal projections [52].

The anatomical pathway between a pair of interconnected neuronal populations in different brain areas consists of axons projecting from cell bodies in the sending population, whose terminals synapse on neurons in the receiving population to which the pathway is directed. See Bressler [49] for an exposition of why interactions between brain areas are thought to occur at the level of the neuronal population, and not the single neuron. The predominant pattern of anatomical connectivity between populations is that of reciprocal (bi-directional) projection, with a return pathway from the receiving population to the sending population. Thus, pairs of neuronal populations typically both send

projections to, and receive them from, one another, although individual neurons in those areas are not necessarily reciprocally connected. Anatomical projections in the brain support functional interactions, and projection pathways support the transmission of spike (action potential) trains on all the axons from a sending to a receiving population. Reciprocal axonal pathways thus support bi-directionally transmitted spike trains. Populations in two interconnected areas interact when they actively transmit spikes to one another that affect each other’s function.

Neuronal populations in multiple areas typically interact as part of the normal function of a brain system [54, 55].

The large-scale anatomical connectivity of a brain system refers to the pattern of axonal pathways that project between the neuronal populations in its different areas. Anatomical connectivity is primarily identified by anatomical tract-tracing techniques, and by mathematical analysis of axonal tracts using structural neuroimaging. Causally directed physiological influences are exerted by neuronal populations in different brain areas on one another. They are identified by stimulation techniques applied in the system, by statistical causality analysis of neural time series from the system, and by modeling of directed interactions between neuronal populations in the system. Anatomical connectivity supports causally directed physiological influences since neurons can only influence one another when viable axonal pathways exist between them. The measurement of causal influence between neuronal populations in different brain areas from the statistical analysis of neural time series can suggest, but not prove, the existence

of causally directed physiological influences. Their existence can be more firmly established if the results of statistical causality analysis are validated by other methods, such as stimulation (electrical, magnetic, or pharmacological) or interaction modeling.

2.7 Causality in Large-Scale Brain Networks

Our view is that the brain should be treated as a complex system, and that a probabilistic definition of causality is better suited than a deterministic one for explaining causality in the brain. We take it as axiomatic that neuronal populations in different brain areas causally influence one another in large-scale

brain networks [53]. Considerable empirical research from studies of human and

non-human primates engaged in a variety of cognitive functions supports the

idea of large-scale brain networks [56-67]. Such networks may be usefully described, based on graph theory [62, 66-68], in terms of specific brain regions, called nodes, and specific connections between those regions, called edges.

Nodes and edges are identified either structurally as anatomical brain areas and connecting pathways, or functionally as interacting brain regions and functional connections (interactions). Functionally, network nodes may be revealed by various neuroimaging methods, e.g., BOLD signals in fMRI, or metabolism in

PET. Functional nodes are thus brain regions that interact during brain function

[61, 72-75]. Crucially, defining the nodes and edges as precisely as possible allows network function and dynamics to be inferred. Evolving recording

technologies and analytical techniques are contributing greatly to the modern understanding of large-scale brain networks [58-64, 74-95].

As interconnected systems with many interacting components, large-scale brain

networks certainly qualify as complex systems. They are characterized by

reciprocal causal influences between network nodes brought about by bi-

directional physiological influences. Yet, the seemingly simple phenomenon of

interaction reciprocity greatly taxes our understanding of causality. Surely physiological influences are causal, but our usual understanding of causality breaks down if one node causally influences another while that second node simultaneously causally influences the first. This phenomenon is called mutual causality [96]. In the brain, mutual causality occurs when one node influences the activity of another, and vice versa (A causes B and B causes A). Since it is logically impossible for there to be mutual determinism, we assert that mutual causality in the brain must be probabilistic, with the influence of each node on connected nodes having a certain likelihood of causing an effect, or equivalently, having a certain influence strength.

A second feature of complex brain networks that presents a challenge for the concept of causality is the convergence of projections in the large-scale anatomical connectivity of the brain [97, 98]. Unitary causal paths of influence cannot be traced in a complex system in which multiple network nodes concurrently send causal influences to a single receiving node. This topological

arrangement, which involves the convergence of multiple inputs onto a single receiving node, is common in large-scale brain networks. The causal influences between nodes must be probabilistic in such a system since the influence from a sending node to a receiving node is only one of many converging influences on the receiving node.7

A third feature of complex brain networks is the behavior of neurons as threshold

units. The sum of postsynaptic effects may be subthreshold, in which case no

observable change in the postsynaptic spike firing pattern occurs. This feature

means that causal influences from one neuronal population to another may have

no observable effect. This is not to say that the influences have no effect at all:

the postsynaptic membrane potentials of the receiving neurons may be raised

closer to threshold (in the case of excitatory synapses), or lowered away from

threshold (for inhibitory synapses). These subthreshold neuronal “priming”

influences may have subtle but important effects that are not observable by

standard methods.

Of course, probabilistic causes with probabilities that are close to one are not

different from deterministic causes. Nonetheless, we argue that the prevalence of mutual causality, projection convergence in brain connectivity, and subthreshold effects necessitates that causality be treated as probabilistic. Given that the causal influences in a large-scale brain network may have probabilities that are significantly below one, a probabilistic framework is called for.

7 It may actually prove more fruitful in the long run to abandon the language of causality altogether, and replace it by the language of pattern generation and constraint [56]. In fact, pattern constraint may be more closely related to the physiological mechanisms that support interactions in large-scale brain networks than causality. The concept of constraint thus deserves closer inspection since it may be closely aligned with the probabilistic notion of causality: both constraint and probability are graded quantities, ranging from weak to strong.

The concept of Shannon entropy is also useful for describing causal influences in complex brain networks (see discussion on transfer entropy below). It has been proposed that network nodes express information by entering different activity states, and that any node can potentially express a multiplicity of such states [99,

100]. Shannon entropy quantifies the uncertainty of the node’s state expression, with information defined as a reduction in that uncertainty. When all possible activity states of the node have approximately the same likelihood of expression, Shannon entropy is maximal because there is maximal uncertainty about what its state will be [99]. The strengthening of causal influences onto the node may cause some states to increase in likelihood and others to decrease. In that case, information expression by the node becomes more certain, and

Shannon entropy decreases. The strengthening of causal influences in the brain may come about with learning. In this scenario, learning causes certain states to be "selected" for expression to the exclusion of others, and thus the distribution of activity states to become more non-uniform. Since learning is expected to involve all the nodes in a large-scale brain network, the mean entropy level of the network is predicted to decrease, and the network to become more ordered, as a result of learning.
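This entropy-reduction scenario can be illustrated with a short calculation. The state probabilities below are invented purely for illustration; only the qualitative behavior (uniform distribution maximizes entropy, a "selected" distribution lowers it) matters.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with zero probability contribute nothing
    return float(-np.sum(p * np.log2(p)))

# A node that can express 8 activity states, all equally likely:
uniform = np.full(8, 1 / 8)
print(shannon_entropy(uniform))  # 3.0 bits: maximal uncertainty

# After learning, causal influences "select" some states over others
# (hypothetical post-learning distribution):
peaked = np.array([0.65, 0.15, 0.10, 0.04, 0.03, 0.01, 0.01, 0.01])
print(shannon_entropy(peaked))   # well below 3 bits: expression is more certain
```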

2.8 Quantification of Probabilistic Causality in the Brain

It is a well-known property of stochastic processes that the past values in their time series can have causal, but non-deterministic, effects on future values in their time series and in the time series of other related processes. Such probabilistic effects can be measured by methods that involve time series modeling [101]. When the time series are measured from the nodes of large- scale brain networks, a correspondence may be observed between model- derived causality measures and causal influences between network nodes [101-

104]. Directed functional edges in large-scale brain networks are thus commonly represented by statistically significant effects derived from time series models

(Figure 2). These edges, representing directed influences in the brain, are consistent with the directed and probabilistic manner in which network nodes affect one another. The success of these statistical measurement methods supports the probabilistic nature of the causal effects that they reveal. It has been argued that measuring causal influences in complex brain networks requires quantification methods that meet several underlying criteria [102]. These criteria include: a measurable strength of association between cause and effect; consistency of the measured causal relation; specificity of the relation; and temporality (cause preceding effect). Mathematical analysis is at the heart of a number of different methods used to uncover causal influences in the brain. Prominent among these are Wiener-Granger Causality (WGC),

Dynamic Causal Modeling (DCM), and Transfer Entropy (TE) [24, 57, 71, 106].

We will discuss all three approaches, but we focus foremost on WGC as a quantification method that infers probabilistic causality by applying mathematical methods to neural time series data, thus combining empirical data with a priori analytic techniques.

Granger [24, 107-108] used a conceptual notion provided by Wiener [109] to apply an operational definition of causality to economic time series data based on time series prediction (also called forecasting). In the economic context, inference about causal influence is made solely from time series data, “without any direct reference to background economic theory” [24]. The same holds when applying WGC to neural time series data: the inference is made from the data themselves, without direct reference to background neuroscientific principles.

Granger’s approach, for two time series, is to determine whether one time series is predictive of the other. More specifically, for time series X1 and X2, if one can

better predict X2 at one time using earlier time values of X2 and X1, than by

simply using earlier values of X2 alone, then X1 contains information useful for

predicting X2. Even if the causal mechanism is unknown, it is said that X1

Granger-causes X2. Granger used linear autoregressive time series modeling to

formalize the notion. Thus, suppose that X1(t) and X2(t) are modeled by the

summations on the right-hand sides of the equations:

$$X_1(t) = \sum_{j=1}^{p} A_{11,j}\, X_1(t-j) + \sum_{j=1}^{p} A_{12,j}\, X_2(t-j) + E_1(t)$$

$$X_2(t) = \sum_{j=1}^{p} A_{21,j}\, X_1(t-j) + \sum_{j=1}^{p} A_{22,j}\, X_2(t-j) + E_2(t)$$

Equation 2.1 Granger causality autoregressive model where X1(t) and X2(t) are the predicted values of time series X1 and X2; p is the

maximum time lag in the model (called the model order); A is the matrix

containing the coefficients of the model, X1(t-j) and X2(t-j) represent the past

values of X1 and X2, and E1(t) and E2(t) are the prediction errors of the models

(also called innovations) (adapted from [110, 111]). If the variance of the

prediction error of the model of X2 is significantly reduced by including past

values of X1, as compared to not including them, then X1 Granger-causes X2. It is

assumed here that the time series are generated by wide-sense stationary stochastic processes. Also, this technique is based on linear regression, and it is assumed that a linear model is adequate to represent the time series [110-112].

If the assumptions of stochasticity, wide-sense stationarity, and linearity are valid, then the null hypothesis of no significant Granger causal relation can be tested with an F-test [111].
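As an illustration of this formalism, the following minimal sketch (not the analysis pipeline used later in this work) fits the restricted and unrestricted autoregressive models by ordinary least squares and forms the F statistic on simulated data with a known ground-truth influence. The coefficients and noise levels are invented for illustration.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def lagmat(x, p):
    """Matrix whose columns are x(t-1), ..., x(t-p) for t = p .. n-1."""
    n = len(x)
    return np.column_stack([x[p - j: n - j] for j in range(1, p + 1)])

def granger_f(x1, x2, p):
    """F statistic for the hypothesis that x1 Granger-causes x2 at model order p."""
    y = x2[p:]
    rss_r = rss(y, lagmat(x2, p))                              # restricted: past of x2 only
    rss_u = rss(y, np.hstack([lagmat(x2, p), lagmat(x1, p)]))  # unrestricted: add past of x1
    df = len(y) - 2 * p
    return ((rss_r - rss_u) / p) / (rss_u / df)

# Simulated ground truth: x1 drives x2 with a one-sample delay; no reverse influence.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.standard_normal(n)
x2 = np.zeros(n)
for t in range(1, n):
    x2[t] = 0.5 * x2[t - 1] + 0.8 * x1[t - 1] + 0.1 * rng.standard_normal()

print(granger_f(x1, x2, p=2) > granger_f(x2, x1, p=2))  # True: the forward F is far larger
```

Comparing the F statistic against the appropriate F distribution then yields the significance test described above.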

Although the aforementioned autoregressive formulation is bivariate (X1 and X2), the technique can be generalized to a multivariate autoregressive (MVAR) model in which three or more time series are represented [98]. One use for the MVAR model is to compute the conditional WGC. If three time series are X1, X2, and X3,

then WGC can be computed from X1 to X2 conditional on X3 [109]. Interestingly,

the bivariate case is often not able to distinguish between similar causal connectivity patterns (see discussion below). For instance, if a pairwise analysis reveals a causal connectivity pattern from X1 to X2, the causal influence could be

direct, or indirectly mediated by a third time series X3. The WGC from X1 to X2,

conditional on X3, can distinguish whether the influence from X1 to X2 is direct or

mediated by the third time series X3, where X3 may be any recorded time series

[111, 114]. Hence, MVAR analysis reveals a causal influence from X1 to X2 if

information from X1 is useful for predicting X2, and, by computing conditional

WGCs, we can determine if that influence is mediated by any number of other recorded processes. Consequently, MVAR analysis may be usefully employed

for investigating the causal influences at play between the nodes of large-scale

brain networks [69-71].
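The disambiguation of direct versus mediated influence can be sketched in the same least-squares style. Here the ground truth is a chain X1 → X2 → X3 with no direct X1 → X3 pathway; the variable names, coefficients, and noise levels are illustrative.

```python
import numpy as np

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def lagmat(x, p):
    n = len(x)
    return np.column_stack([x[p - j: n - j] for j in range(1, p + 1)])

def cond_granger_f(src, dst, cond, p):
    """F statistic for src -> dst; if cond is given, the influence is
    computed conditional on cond's past (conditional WGC)."""
    y = dst[p:]
    base = [lagmat(dst, p)] + ([lagmat(cond, p)] if cond is not None else [])
    rss_r = rss(y, np.hstack(base))
    rss_u = rss(y, np.hstack(base + [lagmat(src, p)]))
    df = len(y) - (len(base) + 1) * p
    return ((rss_r - rss_u) / p) / (rss_u / df)

# Ground truth is a chain x1 -> x2 -> x3 with no direct x1 -> x3 pathway.
rng = np.random.default_rng(1)
n = 4000
x1 = rng.standard_normal(n)
x2, x3 = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x2[t] = 0.8 * x1[t - 1] + 0.1 * rng.standard_normal()
    x3[t] = 0.8 * x2[t - 1] + 0.1 * rng.standard_normal()

pairwise = cond_granger_f(x1, x3, None, p=3)   # large: x1 appears causal to x3
conditional = cond_granger_f(x1, x3, x2, p=3)  # small: the influence is mediated by x2
print(conditional < pairwise)  # True
```

The pairwise statistic wrongly flags X1 → X3; conditioning on X2 shrinks it toward the null, revealing the mediated pathway.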

These considerations lead to the question of the meaning of Granger causality: is

saying X Granger-causes Y equivalent to saying X causes Y? To recapitulate the

preceding, we have examined various views of causality from philosophy,

physics, complex systems, and neuroscience that lead us to support the idea of

probabilistic causality. We now argue that WGC is entirely consistent with that

idea, and, in a probabilistic framework, is a measure of causality. We follow

Hoover [116] in noting that the probabilistic notion of causality is a “natural

successor to Hume”, and note that Hume’s notion of causality is, as mentioned

above, fraught with conceptual difficulties, including spurious regularities. We

follow Hitchcock’s foundational spurious regularity example in which a cause (a

drop in barometric pressure) is regularly followed by two apparent effects: a drop

in mercury in a barometer, and, shortly thereafter, a storm (Figure 3). Any deterministic regularity theory of causality would have to conclude that the drop in mercury caused the storm. However, such a causal relation is spurious. Analogously,

Tang et al. [111], in considering heterogeneous functional interactions between brain regions, deal with the problem of the causal influence from one sender to two receivers (Figure 1B). As in the Hitchcock example, one receiver may be spuriously identified as being causal to the other, when, in fact, the apparent causal influence is due to the sender alone. Likewise, in Figure 1A, a driver may causally influence a receiver by acting through a third node that does not provide any additional causal influence but may be spuriously identified as a cause. By using conditional Granger causality from the multivariate expansion of WGC, these situations may be disambiguated [97, 98]. In conclusion, we agree with

Granger’s claim that the “weight-of-evidence” justifies that causality can be inferred from statistical data [24]. We further propose that conditional WGC, and other quantification methods like it, do, in fact, measure causal influences that are defined probabilistically. As such, these measures are well suited to quantifying causal influences in the brain.

Like WGC, dynamic causal modeling (DCM) is a mathematically based, testable methodology used for inferring directed influences in large-scale brain networks. In DCM, segregated brain regions that show activation in various neuroimaging techniques are represented as coupled dynamical systems that influence one another. One main difference between DCM and MVAR-based

causal inference is that the external inputs to the system that generate the neuroimaging data are specified in DCM in order to uncover hidden state variables. In his original paper on DCM, Friston states that, “The use of designed and known inputs in characterizing neuroimaging data with… DCM is a more natural way to analyze data from designed experiments … given that the vast majority of imaging neuroscience relies upon designed experiments” [117-120].

Another difference is that causal influences in DCM are considered to be deterministic: Friston further states that “The central idea behind dynamic causal modelling (DCM) is to treat the brain as a deterministic nonlinear dynamic system that is subject to inputs and produces outputs” [121].

The general framework for DCM, and its notion of causality, is motivated by a concept from control theory, namely, the input-state-output system [122]. In it, mathematical models are first constructed for the causes of brain activations in order to uncover hidden influences that cannot be observed directly. Then, those brain activations are measured in a designed experiment. Finally, DCM uses

Bayesian statistics to make an inference from the observed activations back to the model that could possibly have caused those activations. DCM employs a generative and iterative process that begins by modeling a deterministic nonlinear dynamical system as an ordinary differential equation. This equation represents the hidden state, and is called the bilinear neural state equation:

$$\dot{x} = F(x, u, \theta)$$

Equation 2.2 General state equation for DCM.

where x-dot is the instantaneous rate of change of the neuronal state vector, and equals a function of the state x, known deterministic inputs u that perturb the state vector, and a set of parameters θ of intrinsic connectivity. This model is then combined with another model, a set of differential equations that map brain activity to observed responses [118]. This combination of models, called the generative model or hypothesis, is the dynamical causal model, and it represents the possible sources of the activity in a brain network. Different versions of this model are then constructed, compared to empirical data, and the optimal version is selected using Bayesian statistics. The weight parameters connecting the nodes in the optimal model represent the effective connectivity between network nodes. The number of such weights that can realistically be estimated is low, meaning that DCM is typically applied to interacting nodes that have previously been identified by other methods.
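A minimal numerical sketch of the bilinear neural state equation alone follows, omitting the hemodynamic observation model and the Bayesian model comparison that complete DCM. All connectivity and input values here are illustrative, not taken from any fitted model.

```python
import numpy as np

def simulate_dcm_states(A, B, C, u, dt=0.01):
    """Euler integration of the bilinear neural state equation
    dx/dt = (A + sum_j u_j B_j) x + C u.
    Only the hidden neuronal states are simulated; full DCM adds an
    observation model and Bayesian inversion on top of this."""
    n_steps, n_inputs = u.shape
    x = np.zeros((n_steps, A.shape[0]))
    for t in range(1, n_steps):
        A_eff = A + sum(u[t - 1, j] * B[j] for j in range(n_inputs))
        x[t] = x[t - 1] + dt * (A_eff @ x[t - 1] + C @ u[t - 1])
    return x

# Two regions; one designed boxcar input drives region 0 and, while on,
# strengthens the 0 -> 1 connection. Parameter values are illustrative.
A = np.array([[-1.0, 0.0],
              [0.2, -1.0]])          # intrinsic coupling (stable diagonal)
B = [np.array([[0.0, 0.0],
               [0.3, 0.0]])]         # input-dependent modulation of 0 -> 1
C = np.array([[1.0],
              [0.0]])                # input enters region 0 only
u = np.zeros((2000, 1))
u[500:1500, 0] = 1.0                 # designed (deterministic) input

x = simulate_dcm_states(A, B, C, u)  # x[:, i] is the hidden state of region i
```

Note that the trajectory is fully determined by A, B, C, and u, which is precisely the deterministic character of DCM discussed next.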

A fundamental question about DCM concerns the way that it treats the concept of causality. Given that DCM considers the brain to be a deterministic system, can it accommodate a probabilistic notion of causality? The external inputs to the system in DCM are clearly deterministic, in that they are known and experimentally designed. Furthermore, the hidden states are also deterministic since they are governed by a nonlinear deterministic equation of motion. Thus, in its standard form, DCM retains the classical deterministic notion of causality.

However, we do not conclude that its inherent determinism makes DCM an invalid methodology for inferring large-scale brain networks and causal

mechanisms in the brain. Rather, we assert that, in its current form, DCM does not allow for a probabilistic interpretation of causality. This situation might be remedied by adding a stochastic term to the state equation in DCM.

As for the relation between the WGC and DCM methodologies, we agree with

Friston and Seth that they are complementary [118]. Even though the nature of causality in the two methods is fundamentally different, we believe that each approach is valid, one for data exploration and the other for subsequent detailed modeling. In fact, we envision them being used in tandem to understand complex brain systems.

Lastly, we briefly consider transfer entropy (TE), an information-theoretic measure of the transfer of information between two time series. TE has been shown to be equivalent to WGC under Gaussian conditions [106, 123]. Whereas WGC assesses causality by the reduction of prediction error in autoregressive models, TE measures it as reduced uncertainty. The transfer of entropy from time series X2 to time series X1 is the extent to which X2 reduces the uncertainty of future X1

values, beyond the information provided by X1 itself [102]. Much of the

conceptual analysis discussed above for WGC also applies: TE provides a

probabilistic inference of causality among time series from neuronal populations

interacting in a large-scale brain network.
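Under the Gaussian assumption, TE reduces to half the log-ratio of the same residual variances that WGC compares, which can be sketched as follows. The simulated data and coefficients are illustrative.

```python
import numpy as np

def lagmat(x, p):
    n = len(x)
    return np.column_stack([x[p - j: n - j] for j in range(1, p + 1)])

def resid_var(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r) / len(y)

def gaussian_te(src, dst, p):
    """Transfer entropy (in nats) from src to dst, estimated under a
    Gaussian assumption as half the log-ratio of residual variances.
    Twice this quantity is the log-ratio form of WGC, which is the sense
    in which TE and WGC are equivalent for Gaussian processes."""
    y = dst[p:]
    v_r = resid_var(y, lagmat(dst, p))                               # dst past only
    v_u = resid_var(y, np.hstack([lagmat(dst, p), lagmat(src, p)]))  # add src past
    return 0.5 * np.log(v_r / v_u)

# x1 drives x2; there is no reverse influence.
rng = np.random.default_rng(2)
n = 3000
x1 = rng.standard_normal(n)
x2 = np.zeros(n)
for t in range(1, n):
    x2[t] = 0.5 * x2[t - 1] + 0.6 * x1[t - 1] + 0.2 * rng.standard_normal()

print(gaussian_te(x1, x2, p=2) > gaussian_te(x2, x1, p=2))  # True: forward transfer dominates
```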

To summarize, methods like WGC and TE quantify the statistical likelihood that a change in activity in one node of a brain network has a causal effect in another.

Since these measures are inherently probabilistic, we believe that they are generally better suited than deterministic measures for identifying causality in the brain. Yet DCM, while rooted in determinism, may nonetheless be usefully employed to determine causal influences among a limited number of network nodes that have previously been identified by other means.

2.9 Conclusions

Granger [24] made a poignant remark concerning causality that has often been overlooked amid the now-popular use of his methodology in neuroscience:

I believe that definitions should be allowed to evolve due to debate

rather than be judged solely on a truth or not scale.

We have argued here that classical deterministic notions of causality generally fail when describing causal influences, and particularly so for causal influences in the brain. Although deterministic causality may be useful under some circumstances for describing neural influences, probabilistic causality is the more conceptually appropriate basis for describing influences in complex systems such as the brain.

The brain is a self-organizing system with an astronomically high number of interacting components, with both hierarchical and heterarchical organization, operating in both linear and nonlinear domains, and on multiple timescales.

Classical deterministic notions of causality are simply not plausible for describing

and explaining causal influences in the brain. Statements like “Brain region A has

a causal influence on brain region B” and “The baseball causes the window to

shatter” have different meanings. In the second statement, there is an intuitive

notion (that Hume would prefer) about everyday causes, whereas in the first

statement there is not. What “causal influence” means in the context of large-

scale brain networks is not the same as the classical deterministic notion of

causality. Although some thinkers have deemed causality to be unanalyzable [1], we argue that causality does exist in the human brain, and can, in fact, be measured.

We have not tried to answer the question of whether the brain is inherently a deterministic or stochastic system. This question is still being debated. The brain is clearly a high dimensional system, but it is difficult to distinguish between high- dimensional determinism and stochasticity. Nevertheless, as we have argued, statistical techniques such as WGC have proven to be empirically useful in

prediction, description, and explanation. Moreover, even if the brain does employ

deterministic influences, its high dimensionality gives the appearance of

stochasticity, and thus requires that causal brain measures foundationally rely on

a probabilistic notion of causality. A probabilistic framework for describing

causality in the brain is not only consistent with modern trends in philosophy. It is also comprehensive: because probabilistic influences have a range of strengths, it can accommodate influences of any strength, including those with a probability of one, which may also be considered deterministic.

CHAPTER 3: MEASURING CAUSALITY IN LARGE-SCALE BRAIN

NETWORKS AT REST: NEURONAL POPULATION MODELING WITH THE

VIRTUAL BRAIN

3.1 Introduction

Current understanding of the neural basis of human cognitive processes depends heavily on an understanding of the structure, function, informational exchange, and dynamics of large-scale brain networks (LSBNs; Arbib et al.,

2000). Yet, even now, as work under the brain network paradigm is well underway (Fuster 2004), the neurophysiological mechanisms underlying functional interactions in LSBNs are not well understood (Bressler and Menon,

2010; Tang et al., 2012). Prevailing functional approaches to understanding brain connectivity include the study of undirected functional connectivity, which identifies the statistical interdependency of time series generated by different brain regions, and directed functional (or causal) connectivity, which identifies the directed transfer of causal influence between brain regions. The present report deals with the latter approach.

A major challenge in modern computational cognitive neuroscience continues to be a lack of ability to link methodologies used to analyze neuroimaging data with

the brain’s neurophysiological mechanisms that produce the data. This gap in knowledge underscores the need for computational models (Bressler and

Menon, 2010), which have become essential tools for understanding the dynamics of brain networks, both active and resting. These models have become increasingly complex while maintaining computational efficiency, mimicking the spatial and temporal properties of veridical data and thus giving insight into the underlying neurophysiological mechanisms.

Although there has been a profusion of empirical research focused on inferring causality in LSBNs (Wang et al., 2014; Brovelli et al., 2004; Kaminski et al., 2001;

Friston, 1994), there has been scant conceptual exploration of the nature of neural causality itself. Previously (Mannino & Bressler, 2015), we rigorously analyzed the concept of neural causality from a foundational perspective, arguing that causal influences between cortical areas are fundamentally probabilistic, meaning that the transmission of activity from a sending neuronal population increases the probability of there being activity in a receiving one. In this study, we test whether Granger Causality (GC) and Transfer Entropy (TE), information-theoretic techniques based in probability and statistics, reliably recover a priori connectivity structures. GC is widely used to measure directed functional connectivity in brain networks (Bressler and Seth, 2011). Likewise, TE is well-established as a method for inferring directional influence in computational complex systems (Schreiber, 2000; Palus et al., 2001; Wibral et al., 2016). We demonstrate that GC and TE, shown to be equivalent under Gaussian conditions

(Barnett et al., 2009; Barnett and Bossomaier, 2013), yield equivalent measures of causal influence in LSBNs.

The overall objective of this project is to model the interactions of neuronal populations, to generate a time series from each population that simulates the local mean field potential of the population, and then to apply GC and TE to the time series to test for causal relations between the populations. We report on a systematic exploration of causality in a large-scale brain network simulation,

with the aim of better understanding causality between brain network nodes.

First, we create a two-node resting-state LSBN model with causal influence exerted between the nodes. For this, we use The Virtual Brain (TVB), a recently created large-scale neuroinformatics platform. TVB simulates biologically realistic

neuronal population time series in a resting state large-scale distributed brain

network model (Sanz-Leon et al., 2013, 2015; Woodman et al., 2014). Second,

we generate mean-field time series from the network nodes in TVB. Third, we

apply GC and TE analytic techniques, outside of TVB, to test for causal

influences in the TVB model.

The neuronal population at each node was modeled by the Stefanescu-Jirsa 3D

(SJ3D) equations in TVB. With the aim of elucidating how brain regions interact

with each other in the resting state, we first determined the model’s parameter

space where causality is recovered, which we call the causal parameter space

(CPS). For this, we parametrically varied the SJ3D parameters to determine how

intrinsic excitatory and inhibitory neuronal coupling and the level of bursting in the neuronal population affect long-range causality between network nodes. We also varied global parameters to determine how long-range causality is affected by white-noise amplitude, the long-range coupling parameter, and the temporal transmission delay between nodes.

Our results reveal that the existence of causal relations between the SJ3D nodes strongly depends on the two nodes having heterogeneous intrinsic dynamics. We further report that it depends on the intrinsic coupling between neurons within a node, the parameters that govern bursting behavior of the node, the strength of the global coupling in the system, and temporal transmission delays between the nodes. Furthermore, our results support the investigation of causal relations in clinical applications, such as in the analysis of information flow in pathological network conditions (Bressler, 2003; Bressler, 2008).

3.2 Results

SJ3D Model Output of Summed Modes

Applying the parameter values in Table 1 (see Methods) and a second-order Butterworth filter, the SJ3D node model produces a mean field time series approximation at each node (Figure 1). This time series represents 4000 milliseconds of summed mode activity of the ξ (Xi) state variable for an uncoupled network node.


Figure 3.1. Output of the SJ3D model for an uncoupled node with intrinsic dynamics determined by appropriate selection of parameter values in The Virtual Brain (TVB).

Figure 3.2. Nodal output time series of the TVB model with two unidirectionally coupled network SJ3D nodes, called IPSc and DLPFc because of their position in the structural connectivity matrix of TVB.

Causal Conditions

The results of our simulations are shown in Figure 3 for four different conditions involving two network SJ3D nodes. In each condition, there are two plots of probability: (1) from Wiener-Granger Causality (WGC) calculated from the estimation of coefficients from the unrestricted autoregressive model (see

Methods), and (2) from the Transfer Entropy (TE) calculated using Gaussian estimation of the joint probability density functions of the two time series. All plots have the negative log of the probability (p-value) on the y-axis, with the green

horizontal line representing the alpha significance level of –log10(0.05) ≈ 1.3. Thus, a plotted value below this line is not significant, and a value above it is significant (see Methods section).

In Figure 3a, both nodes have the same intrinsic dynamics, obtained by setting identical intrinsic parameter values in the SJ3D model in TVB, and the nodes are uncoupled (there is no connectivity, and no effect from the coupling function). In this condition, we correctly detect the absence of causal relationship, i.e., the causal influence does not become significant in either direction for any model order (for WGC), or for any embedding dimension (for TE).

In Figure 3.3b, again both nodes have the same intrinsic dynamics, obtained by setting identical intrinsic parameter values in the SJ3D model, and are connected by unidirectional coupling (there is a weighted connection from IPSc to DLPFc and no coupling in the opposite direction), with a linear coupling function and a csf of 0.01, from IPSc to DLPFc. The inter-node distance, conduction speed, and sampling period are set in TVB to 87.354 mm, 10 mm/ms, and 0.9765 ms/sample, respectively. Interestingly, in this case, even though the nodes have the same dynamics and are unidirectionally coupled, spurious causality is observed in both directions. We hypothesized that, since the two nodes were driven by the same dynamics, noise, and initial conditions, the spurious causality was caused by artificial synchronization between them.

We then hypothesized that introducing heterogeneity into the nodes (i.e., breaking the symmetry [Bressler and Kelso, 2016]) would allow the correct directional relations to be recovered. The results of symmetry breaking are shown in Figure 3.3c. The causal influence computed by the first method (WGC) correctly becomes significant (emerges) when the model order becomes long enough to match the conduction delay, i.e., when the emergent model order (EMO) reaches 8. That is, the EMO is the (interpolated) number of samples the model order must span for the model to be long enough to match the conduction delay. The computed causal influence remains significant for all higher model orders. Likewise, the second method (TE) shows a similar shift to significance at a similar emergent embedding dimension (EED). Thus, both methods detect the causal delay of TVB's generative model. We also show that similar results are obtained for bidirectional causality in Figure 3.3d, except that there is earlier emergence in the forward direction (IPSc → DLPFc) than in the reverse direction (DLPFc → IPSc).
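The correspondence between the emergent model order and the conduction delay can be checked directly from the simulation settings quoted above (a quick sanity-check sketch in Python):

```python
# Expected causal lag, in samples, implied by the TVB settings quoted above.
distance_mm = 87.354       # inter-node distance
speed_mm_per_ms = 10.0     # conduction speed
sampling_ms = 0.9765       # sampling period (ms per sample)

delay_ms = distance_mm / speed_mm_per_ms   # conduction delay: ~8.74 ms
delay_samples = delay_ms / sampling_ms     # ~8.9 samples
```

The implied delay of roughly 8.9 samples is consistent with significance emerging once the model order exceeds 8.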

Figure 3.3a. Condition 1: both SJ3D nodes contain the same parameter values, thus having identical intrinsic dynamics, and are uncoupled. (left) WGC results; (right) TE results. (red) from IPSc to DLPFc; (blue) from DLPFc to IPSc.

Figure 3.3b. Condition 2: both SJ3D nodes contain the same parameter values, thus having identical intrinsic dynamics, and are unidirectionally coupled from IPSc to DLPFc. Note that in this case, where the dynamics of the two nodes are identical, the results are significant (by both WGC and TE methods) in both directions between IPSc and DLPFc, whereas the TVB causal influence is only from IPSc to DLPFc. (left) WGC results; (right) TE results. (red) from IPSc to DLPFc; (blue) from DLPFc to IPSc.

Figure 3.3c. Condition 3: the SJ3D nodes contain unique parameter values, thus having different intrinsic dynamics, and are unidirectionally coupled from IPSc to DLPFc. Note that, with different dynamics for the two nodes, causal influence is significant (by both WGC and TE methods) in the correct direction (IPSc to DLPFc) and at the correct delay (m.o. > 8). (left) WGC results; (right) TE results. (red) from IPSc to DLPFc; (blue) from DLPFc to IPSc.

Figure 3.3d. Condition 4: both SJ3D nodes contain unique parameter values, thus having different intrinsic dynamics, and are bi-directionally coupled. Note that causal influence is significant (by both WGC and TE methods) in both directions and at the correct delays.

Parametric Variation

Figure 3.3 thus showed that, with sufficiently different dynamics, correct causal relations could be recovered in either or both directions between two SJ3D nodes. In the next sections, we show further parameterizations of the model, detailing intermediate steps between completely identical dynamics (producing spurious causality) and sufficiently different dynamics (producing correct recovery). Figures 3.4, 3.5, and 3.6 show the results of intrinsic parameter variation (from within the SJ3D model). Figures 3.7 and 3.8 show the results of extrinsic parameter variation, varying two parameters outside the SJ3D model: the conduction delay and the global coupling strength. In each case, we introduced different dynamics between the nodes by parametrically increasing a specific parameter differential. In so doing, the plots gradually recovered the causal relations that were set in the TVB simulator.

K12 (excitatory to inhibitory connectivity)

We first changed the intrinsic excitatory to inhibitory connectivity in each node (called K12 in TVB), yielding a differential ΔK12 between nodes. The series of plots in Figure 3.4 shows that, as ΔK12 increased, the spurious causality from DLPFc to IPSc, and the spurious lower-model-order causality (WGC) and lower-embedding-dimension causality (TE), correctly decreased to non-significance.

Figure 3.4. Parametric variation of ΔK12 showing correct recovery of causal relations.

K21 (inhibitory to excitatory connectivity)

Holding all other parameters constant, we next varied ΔK21, the inhibitory to excitatory connectivity differential, and found similar results (not shown).

K11 (excitatory to excitatory connectivity)

Similar results were found with the parameterization of ΔK11.


Figure 3.5. Parameterization of ΔK11 showing stabilization of causality.

r (adaptation parameter controlling slow variables)

The original Hindmarsh-Rose model, and thus the SJ3D model, is sensitive to the adaptation parameter r, also called the bursting parameter (REF) (see Methods). When this parameter is varied only slightly, the model exhibits different bursting behaviors, undergoing various bifurcations, including several period doublings, as well as chaotic dynamics. This sensitivity is reflected in the causal measures.

Figure 3.6. Parameterization of Δr showing recovery of causal relations.

Conduction Delay

Figure 3.7. Parameterization of conduction delay.

Finally, we show that for both GC and TE, the EMO (GC) and the peak (TE) capture the parameterization of the conduction delay set in the TVB simulator. For these results, simulations were run in which the two nodes had different dynamics and were coupled from IPSc to DLPFc, with no connectivity in the reverse direction.

Coupling Scaling Factor (Global Coupling Strength)

We used a linear coupling function to couple the two nodes, from IPSc to DLPFc, with no connectivity in the reverse direction. Interestingly, we found that the unidirectional causality was quite sensitive to the coupling scaling factor (csf): there exists a small range of values within which correct causality is recovered; below this range no causality was recovered, and above it spurious causality reappeared. Correct recovery converges at a value of 0.01.

Coupling Scaling Factor with Linear Coupling Function (csf-L)

Figure 3.8. Parameterization of the coupling scaling factor's (global coupling) effect on recovery of causal relations.

3.3 Discussion

Overall, the results of this study empirically evaluate the causal relations between two 18-dimensional Stefanescu-Jirsa 3D nodes in The Virtual Brain. One central question our study addresses is: what is the relationship between intrinsic and extrinsic parameters in a biologically realistic neural mass model and the directed information flow that can be reliably recovered from that model? What can we learn from the results in this study? First, it is important to note that directed functional connectivity revealing network structure among cortical areas is not completely understood using a biologically realistic computational model, and thus starting with two nodes is a justifiable simplification. Moreover, as we found throughout our investigation, even causal dynamics between two nodes can be complicated; that is, beyond simply recovering the a-priori directionality in TVB, we sought to explore the intricate causal dynamics of our two-node model. We have thus accomplished two goals: empirically validating TVB (especially as useful for theoretical exploration of directed functional connectivity), and, for the first time to our knowledge, discovering what effects both intrinsic and extrinsic parameterizations have on directed functional connectivity, using both GC and TE. TVB has the power to simulate biologically realistic large-scale resting-state networks, and thus our examination of the causal dynamics was performed on a resting-state, task-negative network (Bajaj et al., 2016).

In our first set of conditions, we explored the possibility of using TVB to recover directed functional connectivity, and did so in several ways: no coupling with the same dynamics, unidirectional coupling with the same dynamics, unidirectional coupling with different dynamics, and bidirectional coupling with different dynamics. We first found that with no coupling (either in the structural connectivity matrix, or a value of zero for the linear coupling function), neither WGC nor TE found any significant values. We then found that if we kept the dynamics entirely equivalent and coupled the nodes with directed connectivity in one direction, it was difficult to recover the directionality that was set, regardless of how strongly the coupling was set. That is, with the same dynamics, we recovered spurious connectivity at various lags even where there was none. We attribute this to the uniform synchronization of the nodes, and to the resulting inability of WGC and TE to distinguish directional causal relations (REF). We then began to systematically vary the intrinsic dynamics of one node, thus changing the inter-nodal dynamics. With the dynamics sufficiently distinct from one another, both WGC and TE could recover the correct causal relations between the two nodes. Throughout all of the following parameterizations, we held all other parameter values constant, and always arrived at the same final causal dynamics once the nodes had sufficiently different dynamics. We chose to change only one parameter value within the SJ3D node at a time since, given the complexity of the model, it would be difficult to parse out which parameter has more of an effect, if any, on causal relations, and since the entire causal parameter space is practically infinite.

In the case with different intrinsic dynamics, using a linear coupling function with a coupling scaling factor of 0.01 (see Methods), as we de-synchronized the nodes by introducing heterogeneity into their intrinsic dynamics, we gradually began to recover the correct network structure. We note here that in the field of metastable coordination dynamics (Bressler and Kelso, 2001; 2016), introducing a difference in the intrinsic dynamics (i.e., intrinsic frequencies) between two oscillators breaks the symmetry of the model; this has been shown to be crucially important for revealing metastability in the brain, and is a central feature of applying nonlinear dynamical systems to large-scale cortical networks. To our knowledge, however, no empirical research has explored whether metastability has an effect on the directed flow of information in large-scale brain networks. We found a direct mapping from how much we varied the dynamics to how much of the correct causal structure was recovered. That is, as we slowly pushed the dynamics farther apart by gradually increasing the difference in parameter values, the correct causal structure began to emerge, until finally the dynamics were sufficiently different. Beyond this point, correct causality was always recovered, in either direction as well as in the bidirectional case.

Moreover, we found that the emergent correct causal structure was differentially sensitive to specific intrinsic parameters of the SJ3D model, as well as to extrinsic parameters like global long-range coupling. First, we held all other values constant and varied only ΔK12. We found that as we increased this differential, correct connectivity was recovered rather quickly, until its value was sufficiently large, at 0.7. In the SJ3D model, this parameter controls excitatory to inhibitory intrinsic connectivity between neurons in the neural mass. Comparatively, when we varied ΔK11 (the excitatory to excitatory connectivity), we found the causal structure to be correctly recovered at 0.5. Finally, we explored what effect parameterizing r would have on causality. As discussed in the Methods section, r is a parameter in the bursting variable, which allows for increasing complexity of the model. The model's behavior is very sensitive to variation in r, which can vary on the order of 1e-3, and thus, as expected, we found the causality to be very sensitive as well. Nevertheless, it had the same overall effect on the causal dynamics of the model. In all of these cases, the red and blue lines, representing the direction of causation and of no causation, respectively, show p-values shifting until they "settle" into the a-priori causal structure of the model.

We then demonstrated that the causal structure depends not only on the intrinsic dynamics of the neural mass, but also on the strength of the global coupling. Coupling strength has been shown to be very important to network dynamics (Becker et al., 2015). Using the long-range linear coupling function in the TVB simulator, we found that we could only recover the causal dynamics within a small range of the coupling strength, called the coupling scaling factor (csf). The csf represents the slope of the linear, long-range coupling function. This parameter effectively rescales the incoming activity into the node; small values allow for stronger local nodal coupling, whereas for larger values the global coupling dominates (Sanz-Leon et al., 2013). At values of ~0.001 the coupling is too weak, and neither WGC nor TE can recover the directional influence. This begins to change at 0.005, and as we increase the csf to 0.01, we see correct structure. Beyond this, starting at 0.1, we begin to see spurious causality, but interestingly not without structure: spurious causality emerges in the direction where there is none (from DLPFc to IPSc). We thus find that, insofar as directed functional connectivity can be measured in TVB, it is crucial to understand the balance between the global coupling strength and the strength of the intrinsic connectivity of each SJ3D node.

Finally, we parameterized the conduction speed in the TVB simulator. Since there is a finite, constant distance between the two nodes (based on TVB connectivity datasets; see Methods), varying the speed allows us to explore the causal influence in terms of a conduction delay (CD). Conduction delay is a central feature of TVB and extremely important for understanding network dynamics. Introducing a CD into the model allows for more biological realism, as a transmission delay has significant impacts on large-scale cortical dynamics (Jirsa and Kelso, 2000; Ghosh et al., 2008; Knock et al., 2009; Jirsa et al., 2010). With different dynamics between the nodes, we found that both WGC and TE capture the causal influence at the appropriate conduction delay. Importantly, as we vary the conduction speed from 4 mm/ms to 30 mm/ms, WGC and TE reliably recover the causal influence. We also plot the EMO against the CD to clearly illustrate their linear relationship. In the WGC case, we see the EMO occur and then the line "elbows", as in our other results, whereas in TE we see a peak. This result is explained in the Methods section.

One of the important aspects of our study is the use of both WGC and TE. We employ both of these techniques for several practical and theoretical reasons: (1) WGC and TE have been shown to be theoretically equivalent under certain conditions (Barnett et al., 2009), and we empirically demonstrate that equivalence here, for the first time using The Virtual Brain; (2) both WGC and TE are well-established methods for estimating directed functional connectivity in complex systems like the brain; (3) although both share a foundational background theory, they differ in that WGC is model-dependent and (generally) parametric, whereas TE is model-free and non-parametric. In addition, WGC is linear, whereas TE captures both linear and nonlinear interactions. Nevertheless, WGC and TE share a close relationship based on Wiener's original sense of causality between time series: for two observed stochastic time series, the first is causal to the second if knowledge of the past of the first improves prediction of the second beyond what is predictable from the past of the second time series alone (Wiener, 1956; Bressler and Seth, 2011). WGC is a statistical implementation of this principle based on regression, and TE is an information-theoretic one based on Shannon entropy. We note that, although there has been some controversy and confusion, neither proposes to measure how information flows through a system, i.e., how one variable (the sender) has a causal effect on another (the receiver). Rather, they measure information transfer, also called predictive transfer, which is based on predictability (Lizier and Prokopenko, 2008).

To sum up, we have systematically explored the causal structure of a two-node model using TVB, finding, for the first time to our knowledge, that there exists a causal parameter space, dependent on various intrinsic and extrinsic parameters, in which the causal influence is more, or less, recoverable. The knowledge gained from this study can be used both theoretically, when designing models of cortical networks, and practically, when applied to veridical data.

3.4 Methods and Modeling

Modeling with The Virtual Brain (TVB)

To better understand the causal interactions between two nodes in a large-scale brain network, we applied a two-step method: (1) generate neural data using a neural mass model in The Virtual Brain (TVB), and (2) analyze the data using both the linear autoregression technique of WGC and the information-theoretic technique of TE. TVB is a novel platform (http://www.thevirtualbrain.org/tvb; Sanz-Leon et al., 2013) dedicated to better understanding the resting-state large-scale dynamics of neuronal populations, which generate their activity at the mesoscopic level of cortical columns. Although one of the central features of TVB is the ability to create personalized brain networks for analysis using an individual's functional neuroimaging data, our study focused on using TVB for a generalized theoretical exploration of the dynamics of large-scale brain networks.

As a neuroinformatics simulation platform, TVB comprises brain dynamics at multiple scales, with local neuronal models that govern intrinsic dynamics, each having different state variables and control parameters representing biophysical population features, e.g., neuronal oscillations, channel dynamics, mean-field membrane potentials, and neuronal bursting mechanisms. Globally, TVB incorporates parameters representing inter-node dynamics, including long-range coupling, various deterministic and stochastic integration schemes, and conduction speeds of the transmissions sent between nodes (Jirsa et al., 2010; Sanz-Leon et al., 2013; Ritter et al., 2013; Woodman et al., 2014).

The Neural Mass Model, local and global parameterizations

The neural mass model we used to describe the intrinsic dynamics of each node is the Stefanescu-Jirsa 3D (SJ3D) model. Historically stemming from the pioneering work of Walter Freeman, neural mass models like the SJ3D yield the kind of emergent dynamical behavior that is observed with EEG, LFP, MEG, or ECoG modalities. Neural mass modeling techniques are desirable for modeling large-scale neural interactions because they significantly reduce the computational load for large neuronal assemblies. The Stefanescu-Jirsa model has two forms: the 2-dimensional form, originating from the Fitzhugh-Nagumo model, and the 3-dimensional form, which comes from the single-neuron Hindmarsh-Rose model (Hindmarsh and Rose, 1984). Formulated with nonlinear dynamical systems theory, the original Hindmarsh-Rose (HR) model consists of three coupled differential equations, two of which are nonlinear (Equation 3.1). The three state variables represent the membrane potential of the neuron (x) and its spiking activity (y), both of which operate on fast time scales, and the bursting behavior of the neuron (z), which operates on a slower time scale. x and y capture the transport of ions through fast channels, while z, called the bursting variable, gives the inward current to the neuron through slower channels. In this model, a, b, c, d, r, and s are control parameters which represent the rates at which ion channels allow the exchange of ions. I is the main control parameter and represents the external input to the neuron; varying I gives rise to the different behaviors of the neuron mentioned above.

\dot{x} = y - ax^3 + bx^2 - z + I
\dot{y} = c - dx^2 - y
\dot{z} = r\left[s(x - x_0) - z\right]

Equation 3.1. The Hindmarsh-Rose model.
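To make the bursting dynamics concrete, here is a minimal sketch of the single-neuron HR model integrated with a simple Euler scheme. The parameter values are the commonly used ones from the modeling literature, not necessarily the TVB defaults:

```python
import numpy as np

def hindmarsh_rose(T=1000.0, dt=0.01, a=1.0, b=3.0, c=1.0, d=5.0,
                   r=0.006, s=4.0, x0=-1.6, I=3.25):
    """Euler integration of the Hindmarsh-Rose equations (Equation 3.1)."""
    n = int(T / dt)
    x, y, z = -1.0, 0.0, 2.0                   # arbitrary initial state
    traj = np.empty((n, 3))
    for i in range(n):
        dx = y - a * x**3 + b * x**2 - z + I   # fast membrane variable
        dy = c - d * x**2 - y                  # fast recovery variable
        dz = r * (s * (x - x0) - z)            # slow bursting variable
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        traj[i] = (x, y, z)
    return traj

traj = hindmarsh_rose()   # with I = 3.25 the neuron exhibits chaotic bursting
```

Varying r by amounts on the order of 1e-3 visibly changes the bursting pattern, which is the sensitivity discussed in the Results.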

Putting together a network of Hindmarsh-Rose (HR) neurons into a neurocomputational unit involves the use of mean-field approximations and, in this case, dimensional reduction techniques (Stefanescu and Jirsa, 2008; Breakspear and Jirsa, 2007). In a mixed network of excitatory and inhibitory neurons, the external input I gives rise to a parameter distribution g(I) (with mean µ and standard deviation σ in TVB). This distribution governs the dispersion of excitability of the neural population (since each neuron in the population can have a different level of excitation). The network connectivity of the mixed population is modeled by three important parameters, K11, K12, and K21: excitatory to excitatory, excitatory to inhibitory, and inhibitory to excitatory influences, respectively (these are represented in Figure 3.11 as arrows between neuronal populations). Within an individual node, the neurons are coupled linearly and synaptic transmission delays are ignored. Given this, the collective behavior, or average activity, is considered for a population of 200 neurons: 150 excitatory and 50 inhibitory. The activity of the entire network within the neural mass is given by Equation 3.2, representing three base modes (or activity regimes), each reflecting the activity of the excitatory and inhibitory populations.

Each mode tracks a specific dynamical behavior of the neuronal population, representing that behavior as a cluster in state space, and captures the distribution of the dispersion parameter. In sum, the SJ3D model is a reduced network: it reduces the 3 equations for each of 200 neurons, a total of 600 equations, to 18 neural mass equations, making it an 18-dimensional model (3 modes, each containing 6 differential equations: 3 for the excitatory population and 3 for the inhibitory population). Here, ξ, η, and τ are analogous to the state variables x, y, and z of the original HR single neuron for the excitatory subpopulation, and α, β, and γ are the corresponding state variables of the inhibitory subpopulation. They give the mean-field potentials of each state variable. Aik, Bik, and Cik are the coupling constants. This model is capable of complex dynamics, including fixed-point dynamics, spiking limit-cycle oscillations, bursting, synchronization, multi-clustering, oscillator death, as well as chaotic dynamics. Stefanescu and Jirsa have shown that this reduced, low-dimensional population model is still able to reproduce the dynamics of the full system. Although it comes closer to biological realism, the SJ3D model is particularly sensitive to parameter fluctuations, and can lose stability quite easily (Stefanescu and Jirsa, 2011). Thus, as in most mathematical modeling of neural masses, there exists a tradeoff between how close the model approaches reality and its robustness. As our research demonstrates, this tradeoff is reflected in the designed causal structure of the model system.

\dot{\xi}_i = \eta_i - a\xi_i^3 + b\xi_i^2 - \tau_i + K_{11}\left[\sum_{k=1}^{3} A_{ik}\xi_k - \xi_i\right] - K_{12}\left[\sum_{k=1}^{3} B_{ik}\alpha_k - \xi_i\right] + IE_i
\dot{\eta}_i = c_i - d\xi_i^2 - \eta_i
\dot{\tau}_i = rs\xi_i - r\tau_i - m_i
\dot{\alpha}_i = \beta_i - e\alpha_i^3 + f\alpha_i^2 - \gamma_i + K_{21}\left[\sum_{k=1}^{3} C_{ik}\xi_k - \alpha_i\right] + II_i
\dot{\beta}_i = h_i - p\alpha_i^2 - \beta_i
\dot{\gamma}_i = rs\alpha_i - r\gamma_i - n_i

Equation 3.2. The Stefanescu-Jirsa 3D model.

Globally, we can then represent the entire two-node network by the general evolution equation of TVB, which comprises the intrinsic dynamics of both nodes as well as the extrinsic dynamics governed by the global parameters. Crucially, both the intrinsic and extrinsic variability of the nodes in the model determine the global dynamics. Here, the activity of each node is governed by the SJ3D model. The main excitatory state variable ξ is governed by its own intrinsic dynamics and is coupled to the other node by a factor Γ, which is the product of the coupling scaling factor from a linear coupling function and the connectivity weighting. In addition, the interaction between nodes involves a time delay (governed by the axonal conduction speed and the inter-node distance) and noise (Becker et al., 2015).

\dot{\xi}_i(t) = f\left(\xi_i(t)\right) + \Gamma_{ij}\,\xi_j\left(t - \Delta t_{ij}\right) + \epsilon_i(t)

Equation 3.3. The general evolution equation that determines the global dynamics of the two-node neural mass model.
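The structure of this evolution equation can be illustrated with a toy Euler-Maruyama integration of two delay-coupled nodes. Here the intrinsic flow f is a stand-in leaky integrator with a sinusoidal drive rather than the SJ3D flow; only the placement of the coupling factor Γ (csf × weight, using the 0.01 × 9 values from the text), the delay Δt, and the noise term is meant to carry over:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1                    # integration step (ms)
delay_steps = 87            # ~8.7 ms conduction delay at this step size
gamma = 0.01 * 9.0          # Gamma = csf * connectivity weight (values from the text)
n = 5000

xi = np.zeros((2, n))       # node 0 (sender) drives node 1 (receiver)
for t in range(1, n):
    drive = np.sin(2 * np.pi * 0.01 * t * dt)          # stand-in intrinsic input
    noise = 0.05 * np.sqrt(dt) * rng.standard_normal(2)
    lagged = xi[0, t - 1 - delay_steps] if t > delay_steps else 0.0
    # node 0: intrinsic dynamics only; node 1: intrinsic + delayed input from node 0
    xi[0, t] = xi[0, t - 1] + dt * (-0.5 * xi[0, t - 1] + drive) + noise[0]
    xi[1, t] = xi[1, t - 1] + dt * (-0.5 * xi[1, t - 1] + gamma * lagged) + noise[1]
```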

In the model, two nodes were chosen to represent the intraparietal sulcal cortex (IPSc) and the dorsolateral prefrontal cortex (DLPFc), both known to be involved in working memory. Although not used here, the large-scale connectivity matrix in TVB is typically derived from individual diffusion tensor imaging (DTI) tractographic data combined with directional weights from the CoCoMac database (Ritter et al., 2008), and can be modified by changing the brain regions, the distances of fiber tracts between nodes, or the connection weights. Time series were produced from each node by averaging the excitatory modes of the SJ3D model at that node. Numerical integration was performed for all analyses using a stochastic integration scheme (variable-order Adams backward differentiation formula with noise), with an integration step size of 0.012 ms.

We note here that the simulated time series contained both noise and signal; it is the signal component that is reflected by the causal influence. With no noise, the time series would be deterministic and not amenable to GC or TE analysis. That is, there would be perfect correlation, and thus no variability, and when both nodes have the same dynamics, even though causal influence may exist, it cannot be gleaned from statistical analysis. Finally, each simulation length was 5000 ms. Table 1 shows the various values we used in all of our simulations.

Different large-scale connectivity matrices were utilized depending on the pattern of inter-node influence being tested. With no coupling between nodes (condition 1), the connectivity pattern had 0 weight in both directions. When the nodes were coupled in one direction only (condition 2), say from IPSc to DLPFc, the weight was 9 in the forward direction from IPSc to DLPFc and 1 in the reverse direction from DLPFc to IPSc. When the nodes were bi-directionally coupled (condition 3), the weights were 4.5 in each direction. For each connectivity pattern, we conducted multiple simulations for each parameterization of various local and global control parameters. For each simulation, the first 1000 ms of the time series from each node was eliminated due to transient behavior, allowing the signal to become stable.
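The three connectivity patterns just described amount to three 2×2 weight matrices; a sketch (the node ordering [IPSc, DLPFc] and the rows-as-sender convention are assumptions for illustration):

```python
import numpy as np

# weights[i, j]: connection strength from node i to node j, nodes = [IPSc, DLPFc]
uncoupled = np.array([[0.0, 0.0],
                      [0.0, 0.0]])       # condition 1: no coupling
unidirectional = np.array([[0.0, 9.0],
                           [1.0, 0.0]])  # condition 2: IPSc -> DLPFc weight 9, reverse 1
bidirectional = np.array([[0.0, 4.5],
                          [4.5, 0.0]])   # condition 3: 4.5 in each direction
```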


Figure 3.9. Two-node neural mass model of SJ3D nodes showing the pattern of unidirectional driving (condition 2).

Causal Analysis

Causal analysis consisted of a several-step process for each simulation and each node. We first extracted ξ, the main state variable for excitatory activity being monitored in TVB. (We did not monitor η or τ of the excitatory subpopulation, nor α, β, and γ of the inhibitory subpopulation.) We next summed ξ over the three modes, z-scored the extracted time series for the summed ξ, and plotted the time series for each node, first unfiltered, as in the model equations, and then band-pass filtered. A 2nd-order Butterworth filter revealed alpha and beta oscillatory activity in the time series from the model. All causal analysis was performed without downsampling (Barnett and Seth, 2017).
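The preprocessing steps above (summing the three modes, z-scoring, and 2nd-order Butterworth band-pass filtering) can be sketched as follows; the (8, 30) Hz band covering alpha and beta, and the sampling rate implied by the 0.9765 ms period, are assumptions for illustration:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import zscore

fs = 1000.0 / 0.9765        # sampling rate (~1024 Hz) implied by the 0.9765 ms period

def preprocess(xi_modes, band=(8.0, 30.0)):
    """Sum the three SJ3D modes, z-score, then zero-phase band-pass filter."""
    x = zscore(xi_modes.sum(axis=0))      # summed-mode, z-scored time series
    b, a = butter(2, np.array(band) / (fs / 2.0), btype="bandpass")
    return filtfilt(b, a, x)              # 2nd-order Butterworth, forward-backward

sig = preprocess(np.random.default_rng(1).standard_normal((3, 4096)))
```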

Stationarity of each time series was confirmed by a Dickey-Fuller test. The undirected functional connectivity (UFC) was computed as the inter-node cross-correlation to reveal any possible relationship between the nodes. Finally, causality was determined by performing bivariate directed functional connectivity (DFC) analysis on ξ using GC and TE.
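The stationarity check can be sketched as a plain (non-augmented) Dickey-Fuller regression: regress the first difference on the lagged level and compare the t-statistic of the slope to the 5% critical value (about −2.86 for the constant-only case). In practice a library routine with proper critical values and lag augmentation would be used; this shows only the core idea:

```python
import numpy as np

def dickey_fuller_t(x):
    """t-statistic of rho in: diff(x)_t = const + rho * x_{t-1} + e_t."""
    dx, lag = np.diff(x), x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    s2 = resid @ resid / (len(dx) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
x = np.zeros(2000)
for t in range(1, 2000):                        # stationary AR(1) test series
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
t_stat = dickey_fuller_t(x)                     # strongly negative: reject a unit root
```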

Figure 3.10. The three modes of the SJ3D model for the main excitatory state variable (ξ) of both nodes.

For GC, two linear autoregressive models, the unrestricted and restricted models, were fitted to the time series data using ordinary least squares (OLS), where p is the model order, α, β, and γ are the coefficients, and ε_t is the residual error:

Y_t = \sum_{i=1}^{p} \alpha_i Y_{t-i} + \sum_{i=1}^{p} \beta_i X_{t-i} + \epsilon_t

Y_t = \sum_{i=1}^{p} \gamma_i Y_{t-i} + \epsilon_t

Equation 3.4. The unrestricted (top) and restricted (bottom) models used to compute Granger causality.

The unrestricted model was constructed by regressing the receiver (DLPFc) time series on both its own past and the past of the sender (IPSc). We then performed a Wald test to determine whether the β coefficients in the unrestricted model were jointly significantly different from zero. If the Wald test yielded a significant p-value (p < 0.05), the X terms in the unrestricted model significantly improved the model's fit of the data, and the directed interaction was statistically significant. We performed this test for model orders (lag lengths) 1 through 20, plotting the p-value against the increasing model order to determine the model order with the most significant p-value, and visually identified the shift from non-significance to significance at the "emergent" model order (EMO). A Bonferroni family-wise error correction was applied to the series of p-values.
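The regression-and-test procedure just described can be sketched as follows. This is illustrative only: it uses an F-test (asymptotically equivalent to the Wald test used in the study) and toy AR data in place of the TVB output:

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_pvalue(x, y, p):
    """F-test p-value for 'x Granger-causes y' at model order p (Equation 3.4)."""
    n = len(y) - p
    Y = y[p:]
    lags_y = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    lags_x = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r

    rss_r = rss(np.column_stack([np.ones(n), lags_y]))          # restricted model
    rss_u = rss(np.column_stack([np.ones(n), lags_y, lags_x]))  # unrestricted model
    F = ((rss_r - rss_u) / p) / (rss_u / (n - 2 * p - 1))
    return f_dist.sf(F, p, n - 2 * p - 1)

rng = np.random.default_rng(3)
x = rng.standard_normal(3000)
y = np.zeros(3000)
for t in range(2, 3000):                        # y driven by x at a lag of 2 samples
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 2] + 0.1 * rng.standard_normal()

p_fwd = granger_pvalue(x, y, 4)                 # significant: x -> y
p_rev = granger_pvalue(y, x, 4)                 # not significant: y -> x
```

Sweeping p from 1 to 20 and plotting −log p against p reproduces the EMO-style curves shown in the Results.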

For GC, we computed the delayed model, with coefficients computed for different delay values between the modeled time point and the model itself. The autocorrelation of each signal, as well as the cross-correlation between them, was examined to check for delayed interaction. Our results demonstrated that the computed EMO matched the conduction delay between the nodes in TVB.

For TE, we used the Java Information Dynamics Toolbox (Lizier, 2014). TE was computed as a function of 3 parameters: embedding dimension, embedding delay, and source-target lag (also called TE Delay) (Wibral, 2014). A Gaussian estimation of the joint probability distributions was used. TE was computed by the following equation:

TE_{S \to R}(l, k, \tau_S, \tau_R, \mu) = H\left(R_t \mid R_{t-1}^{(k,\tau_R)}\right) - H\left(R_t \mid R_{t-1}^{(k,\tau_R)},\, S_{t-1-\mu}^{(l,\tau_S)}\right)

Equation 3.5. Transfer entropy from sender S to receiver R as a function of five parameters.

In this equation, the TE from sender to receiver is the reduction in uncertainty (entropy) about the current state of the receiver when the past states of the sender are taken into account, given the past states of the receiver (Bossomaier et al., 2016). Here, H(· | ·) is the conditional Shannon entropy: the average amount of uncertainty remaining about one variable given that another is known.
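Under the Gaussian estimator used here, the conditional entropies in Equation 3.5 reduce to log-variances of regression residuals, so TE can be computed as half the log-ratio of restricted to unrestricted residual variance (the equivalence with Granger causality noted by Barnett et al., 2009). An illustrative sketch with the embedding delay and source-target lag fixed at 1:

```python
import numpy as np

def gaussian_te(src, tgt, k=4):
    """TE (nats) from src to tgt with history length k, Gaussian estimator."""
    n = len(tgt) - k
    Y = tgt[k:]
    past_t = np.column_stack([tgt[k - i:len(tgt) - i] for i in range(1, k + 1)])
    past_s = np.column_stack([src[k - i:len(src) - i] for i in range(1, k + 1)])

    def resid_var(P):
        X = np.column_stack([np.ones(n), P])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r / n

    both = np.column_stack([past_t, past_s])
    return 0.5 * np.log(resid_var(past_t) / resid_var(both))

rng = np.random.default_rng(4)
x = rng.standard_normal(3000)
y = np.zeros(3000)
for t in range(2, 3000):                  # y driven by x at a lag of 2 samples
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 2] + 0.1 * rng.standard_normal()

te_fwd = gaussian_te(x, y)                # large: x -> y
te_rev = gaussian_te(y, x)                # near zero: y -> x
```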

In order for the TE results to be analogous to the GC results, we used surrogate data to perform analogous null-hypothesis statistical testing on TE. A null distribution was created using surrogate time series data, which retained the properties of the original data while removing causal connectivity. TE was computed for the two veridical time series, and then again using one of them with the surrogate data. As in GC, a p-value was returned under the null hypothesis of no causality. We then computed TE over a range of embedding dimensions. The embedding dimension is analogous to the model order in GC in that it represents the number of previous states (lags, in this case) to include in the computation. For conduction delay, we plotted the p-value against the source-target lag to find the lag at which the causal influence was captured.

Traditionally, the embedding dimension, embedding delay and source-target lag are fixed and TE is computed once. Here, we ran a range of computations to see where the causal influence was found. For the conduction delay plots, we observed a peak in TE as we computed it over a range of source-target lags, each computation including only that particular lag in time. Thus, we observed TE > 0 for that particular lag only.
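The lag-scan-with-surrogates procedure can be sketched as follows. This is a hypothetical minimal Python illustration of ours, not the JIDT pipeline: the surrogate here is a circular shift of the source (one common choice that preserves each series' autocorrelation while destroying source-target coupling), and the TE estimator is a compact Gaussian one:

```python
import numpy as np

def _te(source, target, k, lag):
    # Gaussian transfer entropy: half the log-ratio of residual variances of
    # the target regressed on its own past, without vs. with the lagged source.
    n, start = len(target), max(k, lag)
    y = target[start:n]
    past = [target[start - j:n - j] for j in range(1, k + 1)]

    def rvar(cols):
        X = np.column_stack([np.ones(len(y))] + cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.var(y - X @ beta)

    return 0.5 * np.log(rvar(past) / rvar(past + [source[start - lag:n - lag]]))

def te_lag_scan(source, target, lags, n_surr=100, k=2, seed=0):
    """For each candidate source-target lag, return (lag, TE, p), where the
    p-value comes from a null distribution of TE values computed against
    circularly shifted surrogates of the source."""
    rng = np.random.default_rng(seed)
    out = []
    for lag in lags:
        te = _te(source, target, k, lag)
        null = [_te(np.roll(source, rng.integers(50, len(source) - 50)),
                    target, k, lag) for _ in range(n_surr)]
        p = (1 + sum(t >= te for t in null)) / (1 + n_surr)
        out.append((lag, te, p))
    return out
```

Under this scheme the conduction delay shows up as the lag with the TE peak and a small surrogate p-value.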


Figure 3.11. System Model of Unidirectional Causal Influence Between Two Cortical Nodes: The condition depicted here represents the coupled network dynamics from IPSc to DLPFc. The equation represents the general evolution of the coupled system, which includes: the intrinsic dynamics of one node, an extrinsic coupling (capital Gamma) affecting the dynamics of each node as well as causality, the dynamics of the other node, the delay time between nodes, and noise. The Gamma term is the product of the connectivity weight set in TVB and the linear coupling scaling factor (csf). Xi is the mean field of the fast variable of the excitatory population, and represents the summed transmitted output dynamics of all three modes of the SJ3D model. The sender and receiver nodes in this case produce different temporal dynamics due to variation in parameter settings as well as extrinsic connectivity, represented in green and purple. Finally, the causality analysis is performed based on prediction of the current time point, using Granger causality and transfer entropy.

We include in this section a workflow diagram to illustrate how the signals were generated in TVB, causally connected, and then causally analyzed. The table is included to show the TVB model parameters that were used.

Figure 3.12. Graphical workflow representation of the sequential steps taken to generate the simulated time series data in TVB, and the analysis that was subsequently performed in R. 1. The connectivity matrix was created in TVB with weights and directionality selected, in this case with 2 nodes. 2. The local dynamic model was selected, in this case the Stefanescu-Jirsa 3-dimensional model (a reduced Hindmarsh-Rose model). 3. Parameter values were chosen for each node model. 4. The linear coupling function was selected with the coupling scaling factor "a". 5. The conduction speed was chosen in mm/ms. 6. In the TVB monitor, temporal averaging (of multi-unit activity) was selected, along with the sampling frequency and the simulation time in ms. 7. The 3 modes of the main excitatory state variable of the SJ3D model were summed, and then used for causal analysis. A 2nd-order Butterworth filter revealed alpha oscillations. 8. Causal analysis was performed by Granger causality and transfer entropy in R.

Local and Global Parameters | Values | Description
a, b, c, d | 1, 3, 1, 5 | Fixed parameters controlling the transport of ions through fast channels (spiking frequency), and the switch between spiking and bursting
s | 4 | Adaptation parameter, sub-threshold overshoot
r | Range: {0, 0.694} | Control parameter affecting the number of spikes per burst
X0 | −8/5 | Resting potential, fixed point of x
µ, σ | 3.1, 0.4 | Control parameters affecting the non-identical neuronal membrane input
K11, K12, and K21 | Range: {0.0–0.7; 0.0–0.5}, constant at 0.15 | Control parameters affecting the strength of excitatory-to-inhibitory connectivity within each node
Coupling Scaling Factor (linear and nonlinear) | Linear range: {0.001, 5.0} | Parameter affecting the strength of coupling between the nodes
Conduction Speed (mm/ms) | Range: {1, 20} | Represents the time delay between nodes
Monitor (Temporal Average): Sampling Period | 0.0976523 ms (1024 Hz) | Rate at which samples of the signal are taken
Integration Step Size | dt = 0.01220703125 ms | For the integration scheme
Noise | 1.0 | Introduces noise into the general evolution equation
Simulation Length | 5000 ms |

Table 1: Parameter values used in the TVB model.

3.5 Conclusion

Using a coupled two-node, resting-state, large-scale neural mass model generated by The Virtual Brain, we have systematically investigated the causal dynamics produced by the system. Our work has implications for several fields, including coordination dynamics, functional and causal connectivity of large-scale brain networks, and theoretical neuroscience. Directionality was recovered and further investigated by varying network dynamics, both intrinsic and whole-system (extrinsic). Based on a framework in which cortical causality is inherently probabilistic, we found that a causal parameter space exists where causality is more, or less, recoverable. The nature of how causality works in the brain is still not well understood, and this research aims to fill in some empirical and theoretical gaps.

CHAPTER 4: ANALYZING CAUSAL RELATIONS BETWEEN THREE NODES USING THE VIRTUAL BRAIN

4.1 Introduction

Two of the central motivations for conceptualizing causal relations (in the brain) as probabilistic rather than deterministic are the notions of so-called imperfect regularities and spurious regularities. In probabilistic causation (PC), causes do not uniquely determine, but rather raise the probability of their effects occurring.

As we have previously discussed in Mannino and Bressler (2015), deterministic or regularity theories of causality state that causes are invariably followed by their effects, always in constant succession or conjunction. In many cases, however, they are not – e.g., smoking does not always cause lung cancer. In the brain, it is not always the case that if two areas are structurally or functionally coupled, neural activity in the first will cause neural activity in the second. Likewise, for spurious regularities, if a cause is regularly followed by two effects, either mediated indirectly by a third variable, or directly but with a delay in time, then a regularity theory will infer a spurious causal connection between the two effects in the latter case, and between the cause and second effect in the former case. We call the first case "sequential": X causes Y, which in turn causes Z; and the second case "temporal": X causes both Y and Z, but with different delays (see Figure 4.1).

The information-theoretic measure of causal relations in the brain known as Wiener-Granger causality (WGC) is based in probability and statistics and, as such, we have argued, captures a probabilistic notion of causation in the brain. This model-based method used to infer causal relations (directed functional connectivity) between two neural time series is based on predictability – if the prediction of one variable is improved when the other is taken into account, a WG-causal connection is established. Crucial for WGC is the axiom that causes temporally precede their effects. However, pairwise analyses of the three variables in both the sequential and temporal causal structures may yield spurious or misleading WGC significance, that is, a spurious causal relation where there is none. For these cases, an extension of WGC was developed by Geweke (1984) to account for the effects of a third endogenous variable, called conditional Wiener-Granger causality (cWGC, also called statistical ablation). Here, if the prediction of the effect is improved by including information from the cause, while also including information from the third variable, then an indirect WG-cause can be established and the third variable is eliminated as a possible direct cause. In the sequential case, significant GC may be spuriously identified from X to Z, and in the temporal case from Y to Z; that is, a pairwise analysis alone cannot reveal whether a third causal variable is present. cWGC, however, has the ability to "condition out" the third variable.

Reichenbach (1956), one of the first to develop a full theory of PC, introduced a notion he called "screening off," which is meant to conceptually explain these two particular types of probabilistic relationships involving three variables, sequential and temporal. In the context of large-scale brain networks and neural time series analysis, Ding et al., (2006), Dhamala et al., (2008) and Tang et al., (2011) have used one or both of these examples to demonstrate the need for cWGC to rule out the possible effects of a third causal variable, and have shown that "screening off" in PC is equivalent to "conditioning out" in GC.

In this paper, we demonstrate this equivalence using the two causal structures discussed in the context of PC by Reichenbach, and in the context of large-scale brain networks (LSBNs) by Ding et al., (2006). We do this using a neural mass model (NMM) as implemented in The Virtual Brain (TVB). As in our previous two-node research, we use the Stefanescu-Jirsa 3D (SJ3D) model in TVB. Our results establish several interesting findings in terms of information flow in LSBNs: 1) pairwise analysis in both causal structures reveals spurious GC, and cWGC can remove the (non-existent) spurious causal connection; 2) the strength of the global coupling between neural masses is important for yielding correct WGC results; 3) the intrinsic dynamics of each neural mass are also important for establishing information flow. In veridical neural data, coupling between brain regions, axonal transmission times, and internal circuitry all play key roles in the ability to establish functional connectivity. These results underscore that measuring information flow in the brain with functional connectivity methods is not straightforward, and that several factors come into play when seeking ground truth.

Figure 4.1. Two causal structures: sequential (causal structure 1) and temporal delay (causal structure 2). In the sequential case, IPSc causes DLPFc, which then causes V1. There is a 10 ms delay between each connection, with a directed connection weight of 3 (indicating a strong connection) between each. In the temporal case, IPSc simultaneously causes DLPFc and V1 with longer and shorter delays, respectively 10 ms and 2 ms. In each case, the solid arrows represent actual connections, and the dotted arrows represent potential spurious connections revealed by bWGC.

4.2 Methods

Much of the methodology used in this paper is recapitulated from our previous work in Mannino and Bressler (2018), and so will not be recounted here. Concerning the generative model, we continue to use the SJ3D model in TVB, and vary the intrinsic dynamics as before. External parameters in TVB are the same, including conduction delay, integration scheme, linear coupling function, temporal average monitoring, and simulation length. We have set up the two causal structures in TVB and used the same connectivity weights (see Figure 4.1). In the sequential case, IPSc causes DLPFc, which in turn causes V1. We use these nodes as implemented in TVB. In the temporal case, IPSc simultaneously causes DLPFc with a longer delay, and V1 with a shorter delay.

Concerning the causal analysis, we again use the p-value produced by the Wald test on the unrestricted autoregressive model of specified lag length to test for significance between the time series for bivariate WGC (bWGC). We plot the negative log of the p-value against the lag length, capturing the pre-set conduction delay which governs the directed causal influence in the NMM. We have previously shown that the Wald test is not significant for lags lower than the conduction delay, for which there is no causal influence, and becomes significant for lags at or above the conduction delay, for which there is enough temporal delay for the directed causal influence to be "felt" by the receiving node. For the cWGC, we use the implementation of Guo et al., (2008) and Ding et al., (2006) (see Appendix A for the formalism). This method, originally formulated by Geweke (1984), takes into account the possible effects of all other nodes.
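The bivariate test can be illustrated with a short sketch. This is our own minimal Python illustration, not the implementation used in this work: it uses the ordinary F-form of the test on the unrestricted model (asymptotically equivalent to the Wald statistic), and the function name `bwgc_pvalue` is hypothetical:

```python
import numpy as np
from scipy import stats

def bwgc_pvalue(x, y, p):
    """Bivariate Granger causality x -> y with model order p. Fits the
    restricted model (y's own past) and the unrestricted model (y's and
    x's past) by least squares, and returns the p-value of the F-test
    that all coefficients on x's past are zero."""
    n = len(y)
    Y = y[p:]
    own = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    other = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])

    def rss(X):
        X = np.column_stack([np.ones(len(Y)), X])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return np.sum((Y - X @ beta) ** 2), X.shape[1]

    rss_r, _ = rss(own)
    rss_u, k_u = rss(np.column_stack([own, other]))
    F = ((rss_r - rss_u) / p) / (rss_u / (len(Y) - k_u))
    return stats.f.sf(F, p, len(Y) - k_u)
```

Plotting the negative log of this p-value across model orders reproduces the shift to significance at the conduction delay described above.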

4.3 Results

Our results empirically reproduce both modeling cases (sequential and temporal) from Ding et al., (2006), and demonstrate the conceptual equivalence of screening off with conditioning out. In that paper, both causal structures used a linear autoregressive model as the generative model. In our case, we use a considerably more complex nonlinear NMM as implemented in TVB, and as such we found that, in the case of the temporal structure, the results are as predicted.

In the case of the sequential structure, however, reproducing the results from that paper was not as straightforward as expected. Thus, we first present the results from the temporal case, and then the sequential case.

4.3.1 Causal structure 2, the temporal case.

In this case, we first perform bWGC between all pairs of nodes in both directions. We first show the bivariate case between IPSc and DLPFc (the longer delay), then between IPSc and V1 (the shorter delay), and finally between V1 and DLPFc, the spurious connection, as there is no directly weighted connection between these two nodes. Given that the conduction delay is pre-set at 10 ms between IPSc and DLPFc, we find correct GC for all lags. Also, since the conduction delay is pre-set at 2 ms between IPSc and V1, we find correct GC in this connection as well. Finally, spurious GC is shown between V1 and DLPFc for lags higher than the difference of the delays of the other two connections. That is, the p-values shift to significance where there is none. We then show the cWGC between V1 and DLPFc. As in Ding et al., (2006), the spurious GC significance drops to non-significance, correctly revealing no direct connection between V1 and DLPFc. In this case, each of the nodes contains different dynamics (Figure 4.2). In the case where they have the same dynamics, e.g., all bursting or all spiking, we find, as in our previous bivariate research, spurious causal relations for lags lower than the conduction delay.

Figure 4.2. Time series for the main excitatory state variable Xi of each node, showing different intrinsic dynamics. In this case, the IPSc exhibits mainly spiking behavior, meaning this node is in the dominantly inhibitory regime of the parameter space. The DLPFc and V1 both exhibit bursting behavior; however, the DLPFc contains more spikes per burst and V1 fewer spikes per burst. Both of these nodes are in the dominantly excitatory regime of the parameter space.

Figure 4.3. A) Correct bWGC recovery from IPSc to DLPFc, showing significance beginning at lag 10, the red line. The blue line shows no significance in the other direction, since there is no directed connectivity from DLPFc to IPSc. B) Correct bWGC recovery from IPSc to V1, showing significance beginning at lag 2. C) Spurious bWGC from V1 to DLPFc, beginning at lag 10. D) The spurious causality has been removed using cWGC.

4.3.2 Causal structure 1, the sequential case.

In this case, we first performed bWGC on all pairs of nodes (shown in Figure 4.4). Crucially, the global coupling strength was kept the same as in the temporal case, using the linear coupling function with a low coupling scaling factor of .01. As in our previous research, the coupling scaling factor (csf) rescales the incoming activity from the causal node before it enters the equations of the receiving node. Consistent with our research, Wang et al., (2014) have found that among a variety of causal methodologies, WGC performs better at detecting weaker links. In Figure 4.4, we see that from IPSc to DLPFc, correct causality is recovered at lag 10, and likewise from DLPFc to V1. However, interestingly, we found no spurious causality from IPSc to V1, as has been found previously in the papers mentioned above. Based on known WGC theory, we expected to find spurious causal influence at lag 20, given that the delays from IPSc to DLPFc and from DLPFc to V1 are each 10 ms.

Figure 4.4. A) Correct bWGC recovery from IPSc to DLPFc, showing significance beginning at lag 10, the red line. The blue line shows no significance in the other direction, since there is no directed connectivity from DLPFc to IPSc. B) Correct bWGC recovery from DLPFc to V1, showing significance beginning at lag 10. C) No spurious bWGC from IPSc to V1.

In order to explain what could be responsible for the non-spurious result (C in Figure 4.4), we tested three hypotheses: 1) the difference in intra-nodal intrinsic dynamics, 2) the difference in inter-nodal intrinsic dynamics, and 3) the strength of the global coupling. First, Figure 4.5 shows six other simulations varying the intra-nodal intrinsic dynamics, each of which gives the same result of no spurious causality.

Figure 4.5. Six simulations with the accompanying parameter values for each of the three nodes, showing the result of the connection IPSc to V1 (red line) and the other direction (blue line). Each simulation has a different parameter configuration, allowing each node to have its own distinct dynamics. Each simulation shows no spurious causal influence from IPSc to V1.

We then switched the inter-nodal dynamics between the driver and receiver to see if this would have an effect on the ability to show spurious causality. Figure 4.6 shows another configuration of the time series exhibiting different dynamics (as distinct from Figure 4.2). In this case, the IPSc contains bursting behavior with more spikes per burst, and V1 contains spiking behavior.

Figure 4.6. Time series showing distinct dynamics. IPSc has bursting behavior and V1 has spiking behavior.

Figure 4.7. No spurious causal relation for the inter-nodal difference of dynamics, where the driver and receiver have their dynamics reversed.

Finally, we began to significantly increase the csf to see if there was any effect on the appearance of spurious causality from IPSc to V1. We found that as we increased the csf, spurious causality appears at and past lag 20. Figure 4.8 shows the spurious result with csf = .075. When we performed cWGC, the spurious result past lag 20 is removed to non-significance.

Figure 4.8. A) Spurious causality from IPSc to V1 at lag 20. B) Using cWGC, the spurious causality disappears.

Moreover, we found two additional results. First, there is a gradual decrease in the significance past lag 20 (ultimately to non-significance) as the csf is gradually reduced to .01. This is shown in Figure 4.9. Second, raising the csf enough to see spurious significance between IPSc and V1 in turn had an effect on the causal relations between IPSc and DLPFc, and between DLPFc and V1. Specifically, producing spurious causality in the non-connection of IPSc and V1 also produced spurious significance in the lower lags of the other two connections in the network. This is shown in Figure 4.10.


Figure 4.9. Gradually decreasing the coupling scaling factor shows that the spurious causality between IPSc and V1 disappears at a value of .015.

Figure 4.10. Increasing the global coupling (csf) has the effect of producing spurious causality in both directions between IPSc and DLPFc, and between DLPFc and V1.

4.4 Discussion

Using the SJ3D model in The Virtual Brain platform, we have already explored how the intrinsic (to the neural mass) as well as extrinsic (e.g., global coupling and conduction delay) parameters affect the causal relations between two nodes.

Here, we extend the network with a third node and, with the same methodological approach, we use time-domain bivariate and conditional WGC, and attempt to empirically validate the notion of Hans Reichenbach's 'screening off' in probabilistic causation with statistical ablation (conditioning out) from Ding et al., (2006). Our results demonstrate that, given the two causal structures investigated by both Reichenbach and Ding, bWGC produces spurious causality and cWGC removes the spurious causal relation. However, we found that reproducing these results validates some concerns about the use of functional connectivity in LSBNs.

We first recapitulate some of the discussion concerning Reichenbach's screening off from our previous work, and then interpret the meaning of our results. In both causal structures, a regularity theory of causality would falsely infer a causal relationship between IPSc and V1 in case 1, and between V1 and DLPFc in case 2. However, in both cases, a probabilistic notion of causation (PC) conceptually allows for the causal relation to exist. PC has been defined as follows: the cause C raises the probability of an effect E when P(E|C) > P(E|~C); the probability of E occurring when C has occurred is greater than the probability of E occurring when C has not occurred. In this case, C is a cause of E, but not necessarily so, i.e., C does not force E to happen. However, this simple version of PC cannot by itself handle the two causal structures, given that it may be the case that both C and E are themselves caused by a third variable. Reichenbach (1956) recognized this fact and defined an interpretation of probabilistic causation (PC) called screening off, where one variable is said to be screened off by another.

In the temporal structure case, consider a simpler example in Figure 4.11 (adapted from Hitchcock, 2018): when the barometric pressure drops below a certain level (cause C), two events occur: the mercury falls in the barometer (effect E1), and, after a short delay, a storm (effect E2). Given that the storm always occurs after the drop in mercury, a regularity theory of causality would spuriously infer that the drop in mercury causes the storm.

Figure 4.11. A meteorological example of the temporal case.

Here, Reichenbach’s screening off condition is defined as:

If P(E2| [E1 & C]) = P(E2|C), then C screens off E1 from E2.

In the storm case, the drop in pressure will screen off the barometer reading from the storm, i.e., given the cause (the pressure drop), the barometer reading "makes no difference for the probability of whether the storm will occur" (Hitchcock, 2018). That is to say, if C screens off E1 from E2, then once C is known, learning about E1 yields no additional information about E2; all the information that E1 has about E2 is already contained in C. In Reichenbach's interpretation, this notion of screening off addresses, from a PC point of view, the problem of spurious regularities. While E1 raises the probability of E2, it does not when it is further conditioned on C.
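Reichenbach's condition is easy to verify on a toy joint distribution. The numbers below are illustrative assumptions of ours (not derived from any dataset): C occurs with probability 0.3, raises the probability of E1 from 0.1 to 0.9 and of E2 from 0.05 to 0.8, and E1 has no influence of its own on E2:

```python
from itertools import product

# Joint distribution over (C, E1, E2): C causes both E1 and E2, and E2
# depends only on C, so C should screen E1 off from E2.
pC = 0.3
pE1 = {True: 0.9, False: 0.1}   # P(E1 | C), P(E1 | ~C)
pE2 = {True: 0.8, False: 0.05}  # P(E2 | C), P(E2 | ~C)

joint = {}
for c, e1, e2 in product([True, False], repeat=3):
    p = (pC if c else 1 - pC) \
        * (pE1[c] if e1 else 1 - pE1[c]) \
        * (pE2[c] if e2 else 1 - pE2[c])
    joint[(c, e1, e2)] = p

def P(event):
    return sum(p for key, p in joint.items() if event(*key))

def cond(event, given):
    return P(lambda c, e1, e2: event(c, e1, e2) and given(c, e1, e2)) / P(given)

# E1 raises the probability of E2 unconditionally (the spurious regularity)...
p_e2_given_e1 = cond(lambda c, e1, e2: e2, lambda c, e1, e2: e1)
p_e2 = P(lambda c, e1, e2: e2)
# ...but C screens E1 off from E2: P(E2 | E1 & C) = P(E2 | C).
p_e2_e1c = cond(lambda c, e1, e2: e2, lambda c, e1, e2: e1 and c)
p_e2_c = cond(lambda c, e1, e2: e2, lambda c, e1, e2: c)
```

With these numbers, P(E2|E1) ≈ 0.65 against P(E2) = 0.275, yet P(E2|E1 & C) and P(E2|C) are both exactly 0.8, so knowing E1 adds nothing once C is known.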

In the sequential case, C → E1 → E2, E1 is said to be causally between C and E2. In this case, E1 screens off C from E2. Thus, in the context of our two NMMs: in the sequential case, the activity in the DLPFc screens off the IPSc from V1 – even though the activity in IPSc appears to raise the probability of activity in V1, it does not once we further condition on the activity of the DLPFc, i.e., IPSc to V1 is spurious if not conditioned out with cWGC. In the temporal case, the IPSc screens off V1 from the DLPFc – even though activity in V1 appears to raise the probability of activity in DLPFc, it does not once we further condition on the activity of IPSc, i.e., V1 to DLPFc is spurious if not conditioned out with cWGC.

We found that in both NMMs, screening off is highly dependent on the strength of the global coupling and, as in our previous research, on whether the intrinsic dynamics are sufficiently distinct between all nodes in the network. In the temporal case, low coupling was sufficient to produce the spurious causal relation. This is to be expected given that it is essentially a reproduction of our two-node investigation, and there is no mediating node changing the activity. We then found that cWGC removes the spurious causality as expected. In the sequential case, we found that low coupling was not sufficient to produce spurious causality, and that the coupling needed to be significantly increased. However, in this NMM, increasing the coupling in turn produces spurious causality in both directions in the other two connections.
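The conditioning-out step itself can be sketched in a few lines. This is our own minimal Python illustration of Geweke-style conditional GC, not the Guo et al. implementation; the function name `cgc_pvalue` is hypothetical:

```python
import numpy as np
from scipy import stats

def cgc_pvalue(x, z, cond, p):
    """Conditional Granger causality x -> z given cond, model order p.
    Restricted model: z's past and cond's past; the unrestricted model
    adds x's past. Returns the F-test p-value that x's coefficients
    are all zero."""
    n = len(z)
    Z = z[p:]
    lag = lambda s: np.column_stack([s[p - j:n - j] for j in range(1, p + 1)])

    def rss(X):
        X = np.column_stack([np.ones(len(Z)), X])
        b, *_ = np.linalg.lstsq(X, Z, rcond=None)
        return np.sum((Z - X @ b) ** 2), X.shape[1]

    rss_r, _ = rss(np.column_stack([lag(z), lag(cond)]))
    rss_u, k = rss(np.column_stack([lag(z), lag(cond), lag(x)]))
    F = ((rss_r - rss_u) / p) / (rss_u / (len(Z) - k))
    return stats.f.sf(F, p, len(Z) - k)
```

In a toy chain X → Y → Z, a pairwise test typically shows spurious X → Z significance at lags spanning both links, while conditioning on Y, as above, removes it; the direct link Y → Z remains significant even given X.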

4.5 Conclusions

In this paper, our main results are two-fold: 1) we have reproduced well-known results using bWGC and cWGC from Ding et al., (2006), using, for the first time to our knowledge, a neural mass model; and 2) we have empirically validated the notion of screening off in PC with conditioning out, using the statistically ablative method of conditional Granger causality. Thus, our results are consistent with previous work (see e.g., Wang et al., 2014) in that the optimal circumstances for establishing correct causal relations in large-scale brain networks depend strongly on the relationship between the functional connectivity method used and the model from which the data are generated.

INTERLUDE: Nonlinear Neurodynamics, Cortical Coordination Dynamics and Modeling: [Nonlinear Neurodynamics, K-Sets and The Virtual Brain]

The next chapters (5 and 6) are review chapters concerning interpretations of Walter Freeman, Steven Bressler, and Scott Kelso's work on wave packets in nonlinear neurodynamics and coordination dynamics, and the impact these theories have had on cognitive science and consciousness research. Freeman's work is related to our modeling work in several ways: 1) Freeman's K-sets were the precursors to the neural mass models used in TVB, and thus our first model in TVB is a K3-set; 2) in terms of coordination dynamics and causality, the order parameters are conceptually equivalent to Freeman's spatial amplitude modulation patterns; that is, both have downward causal effects on their individual components (Tognoli and Kelso, 2011). Finally, one question that must be addressed here concerns the relationship between some of the research in this dissertation and the following discussion: is using information-theoretic measures to infer causal relations (such as transfer entropy) consistent with understanding the brain not as an information-processing system (along with Freeman), but rather as a relational, embodied dynamical system interacting with an external environment? We argue that these are not inconsistent, and that information can be construed in several ways (Adriaans, 2012).

CHAPTER 5: THE WAVE PACKET IN MULTI-AREA CORTICAL MODELING: HISTORY, THEORY AND EMPIRICAL EVIDENCE

5.1 Introduction

The various capacities of the human brain to perceive the external world, attend to it, remember, reason, think, learn and solve problems are thought to emerge from a complex, self-organizing and coordinated interplay of multiple cortical and subcortical brain areas, each comprising populations of millions of neurons [Bressler & Kelso, 2016]. However, the underlying neurophysiological mechanisms by which these large-scale brain networks operate, both locally and globally, remain poorly understood. Although a surfeit of testable hypotheses addresses certain aspects of the problem at the mesoscopic scale, a unified mathematical theory of neural dynamics, akin to Maxwell's equations in electromagnetism, has yet to be provided [Kozma, 2007]. In 2003, one of the foremost pioneers of mathematical modeling in 20th-century neuroscience, Walter J. Freeman III, proposed a fundamental information-processing functional unit at the millimeter scale, addressing neurons acting together as populations. Called the wave packet, it provides a potential lynchpin for a unified mathematical theory of mesoscopic neural dynamics. A central feature of the wave packet is its relation to cognition, specifying how brains create meaning via acts of cognition, where meaning refers to context-dependent activity and the value of sensory stimuli for the subject. Following its discovery in the olfactory system, Freeman proposed that the wave packet is the fundamental "carrier for the components of meaning" in the brain [Freeman 2003].

In this paper, our principal goal is to advocate an extension of Freeman’s theory of wave packet formation and operation, in the context of nonlinear neurodynamics and large-scale cortical modeling. Specifically, we will begin by first detailing the historical context in which Freeman discovered the wave packet. We will then consider Freeman’s wave packet theory, and propose a notion of the wave packet that extends beyond Freeman’s original framework.

Finally, we will discuss, and provide recent empirical evidence for, a physiological mechanism by which the wave packet forms and operates: inter-areal phase coupling, or synchronization, of oscillatory brain rhythms. Synchronization is measured by spectral coherence: when a high level of spectral coherence is observed between the oscillatory activity of two different neuronal populations, it indicates that they are highly synchronized, or phase-coupled.

5.1.1 What is the Wave Packet?

The wave packet results from a population of millions of interacting cortical neurons oscillating in synchrony at a high temporal frequency. The synchrony derives from mutually excitatory synaptic interactions across the population. The oscillation, called a carrier wave, and observable in the local field potential (LFP), electrocorticogram (ECoG), intracranial electroencephalogram (EEG), scalp EEG, and magnetoencephalogram (MEG), is usually described by its temporal characteristics, such as frequency, amplitude and phase. However, the concept of the wave packet derives from the spatial extension of the oscillation across the generating population. Because the carrier-wave oscillation is spatially coherent over all the neurons of the population, the entire event is called a wave packet. The high spatial coherence of the wave packet cannot be attributed to the reference electrode or to distant-generator activity that reaches the recording electrodes by volume conduction. It also cannot be explained by driving from an external pacemaker. Although the event is spatially coherent, it shows spatial modulation of its amplitude and phase, meaning that the oscillation amplitude and phase (relative to the spatial mean) vary across space. The spatial extent of the wave packet is on the order of millimeters, meaning that very high-density multichannel electrode arrays (providing 4 x 4 mm to 6 x 6 mm spatial windows on cortical population activity at the pial surface) must be used to record it (Panagiotides et al. 2011). Although the wave packet is defined in terms of synchronized waves of neuronal population activity that are primarily of dendritic origin, it is relevant that neuronal spiking activity is high in synchronized neuronal populations such as those supporting the wave packet (Elul 1972), and thus the wave packet may be coextensive with a zone of active spiking (Roland in press).

5.2 The History, Motivation and Theory of the Wave Packet

Freeman originally observed the wave packet in local field potentials (LFPs) from the olfactory system, in structures that included the olfactory bulb, the anterior olfactory nucleus, and the prepyriform cortex. Oscillatory olfactory LFP bursts in the gamma frequency range (Fig. 1a) "ride" on the inspiratory phase of a theta-frequency respiratory rhythm. Measured with eight-by-eight electrode arrays (approximately 5 mm wide) on the pial surface, the gamma-frequency LFP reveals the wave packet as a spatially coherent event, reflecting activity distributed across the multiple channels (the 64 electrodes) of the array. The LFP oscillations recorded from different electrodes in the array (Figure 1b) are similar in frequency, but the oscillation amplitude and phase vary across the electrode locations. The similarity of gamma oscillations across the electrode array is due to "neural cooperation" that originates from excitatory synaptic interactions of the underlying neurons. The amplitude variation across the array is known as the spatial amplitude modulation (SAM) pattern, and is represented in the contour plots shown in Fig. 1b.

Interestingly, although the SAM pattern is not unique to each conditioned stimulus, it is unique to the organism receiving the stimulus, and it varies according to a number of different psychological variables. As Freeman [2003] states,

These AM patterns lack invariance with respect to the stimuli. They change with context, manner of reinforcement, conditioned response, cumulative experience with the same stimuli, and state of expectancy. In combination these contributions relate to the meaning of stimuli, which mediates the effective integration of the organism with its environment.

Bressler [2007] makes the further point that

The spatial pattern of amplitude modulation (AM) of the wave packet in sensory cortices has been found to correlate with the categorical perception of conditioned stimuli in the corresponding sensory modality. The spatial AM pattern changes reliably with behavioral context, conditioning, and the animal’s cumulative experience of the sensory environment …

The contour plots of Fig. 1b, then, graphically represent wave packets which carry context-dependent information, or behavioral saliency. Moreover, these plots resemble a movie frame, in that they do not vary randomly from one inspiration to the next, but rather change in a continuous, temporally coherent and meaningful fashion. Freeman’s conception of an orderly progression of spatial patterns is referred to as the cinematographic hypothesis of cortical dynamics [Freeman 2006]. To summarize, the wave packet is a spatially coherent pattern of high-frequency neuronal population activity, which, like a movie frame, carries spatially coherent information that may be gleaned by analysis of the LFP waveform. The question still remains: what is the physiological mechanism by which the wave packet carries the components of meaning? We discuss this question in the next section.

5.2.1 Freeman’s Neural Pattern Formation of Meaning

Here we discuss Freeman’s idea that meaning is constructed in the mammalian brain as the organism controls its movements in the environment by the action-perception cycle. Freeman postulated that wave packets arise in cortical areas as part of this process. A profusion of recent research in the science of complex systems addresses the significance of neural pattern formation, which is central to Freeman’s neural theory of meaning construction. However, most pattern formation studies assume that physical patterns in the external world bear information that is “mentally represented” by neural patterns in the brain.

Although it is certainly the case that brains process information from the environment through sensory transduction, we follow Freeman in his rejection of a passive view of sensory perception, which holds that patterns in the external world are mapped onto neural patterns in the brain, and that the brain creates neural representations of patterns in the external world [Freeman, 2007].

Alternatively, like Freeman, we hold that neural patterns are instead created in the brain by nonlinear dynamical interactions among neuronal populations, as part of the process by which the brain constructs meaning [Freeman 2007]. The creation of meaning is observer-dependent, and driven primarily by endogenous cortical activity rather than by the incoming stimulus. In this view, neural patterns are not based on representation, but instead arise naturally in the brain as it makes sense of its environment.

In predictive coding, for example, the brain creates meaningful neural patterns as predictions about the sources of external stimuli [Bressler & Richter, 2015]. In this regard, Freeman [1983] states that “… the spatial pattern of encephalographic (EEG) activity of the olfactory bulb depends less on odor than on expectation of odor (emphasis ours)”. He also says that “rabbits smell what they expect, not what they sniff” [Freeman 1983].

Freeman expressed his position in a schematic diagram illustrating the neural basis of meaning construction (Figure 2). Here, in a revised schematic based on Freeman’s original (Figure 3), we illustrate the creation of meaning in the brain via interacting brain areas. The main interactions that Freeman saw as critical for meaning creation are: the space-time loop, the context loop, the control loop, and the motor loop. These loops act in concert to create meaning as the organism evolves through time through the four processing stages of the action-perception cycle (motor output, pre-stimulus, stimulus input, and pre-motor, each with the appropriately highlighted arrows and boxes).

(1) space-time loop: the hippocampus and entorhinal cortex interact throughout the entire action-perception cycle to provide spatial and temporal context to the creation of meaning. The entorhinal cortex, serving as the main interface between the hippocampus and the sensory systems, integrates sensory information and internal neural activity, and sends integrated activity back to the sensory cortices. Recently, López-Madrona et al. [2017] have confirmed the central role of the entorhinal cortex, finding that the bidirectional information flow between the entorhinal cortex and different hippocampal regions, shown here in the space-time loop, is strongly dependent on the particular structural connectivity patterns within the layers of the entorhinal cortex.

(2) context loop: the entorhinal cortex interacts with the sensory systems mainly during the anticipation of a sensory stimulus and its processing. This loop provides sensory context for the creation of meaning, and allows the organism to anticipate, and compensate for, the effects of action on sensory inflow.

(3) control loop: in pre-motor preparation and planning for action, and during the action itself, the entorhinal cortex interacts with the motor systems. This loop imbues actions with meaning, and facilitates the control of bodily movement.

(4) motor loop: actions are performed in the environment in this loop. The motor systems send information to the effectors in the body, allowing for the execution of movement by controlling muscle movement.

Finally, in our view, the creation of meaning in the brain involves the consensual emergence of wave packets in interacting neocortical areas, and their linkage by synchronization. We emphasize that, since wave packets are composed of spatially coherent LFP amplitude modulation patterns, it follows that meaning is a spatiotemporal pattern of the brain.

5.3 Interacting Neocortical Areas in Cognition

Following upon Freeman’s work, Bressler [2007] proposed that, in cognition, wave packets emerge in circumscribed areas of the neocortex, as they do in paleocortex during odor processing. Neocortical areas interact over reciprocally directed axonal pathways [Felleman & Van Essen 2001; Bressler & Menon 2010].

This bidirectional architecture gives rise to reentrant interactions [Tononi et al. 1998] that support the emergence of a global neurocognitive state [Bressler 1999].

We propose here that each neocortical area generates a sequence of wave packets during cognition, and that their SAM patterns are mutually constrained in the process. These SAM patterns carry context-dependent information, allowing the neocortical wave packet to change reliably with changes in behavioral context.

Our theory of wave packets in the neocortex thus entails at least three elements: 1) the concurrent formation of wave packets in multiple interconnected cortical areas; 2) the interaction of connected cortical areas over bidirectional fiber tracts; and 3) the modulation of wave packet formation in each cortical area. In short, we propose that interacting neocortical areas exert mutual constraint on their wave packets. Through interaction, the information expressed by the wave packet in a cortical area is constrained [Thagard & Verbeurgt 1998] by influences transmitted from other cortical areas. In this way, it is proposed that wave packets cognitively cohere to satisfy the constraints imposed on each cortical area by influences from all interacting areas [Bressler 2007].

Mutual constraint satisfaction on neocortical SAM patterns has the following consequences: the set of all possible SAM patterns is reduced, and the extent of possible pattern activity is restricted [Bressler 2007]. Perhaps the most important result of mutual pattern constraint satisfaction is that it provides meaning: the wave packet produced by a cortical area in this process expresses meaning, an expression that depends on the relation of that area with all the other areas with which it interacts. The meaning expressed by each cortical area is thus expressed within the context of all the others with which it interacts, and that context guarantees informational consistency: the modulation of wave packet formation in each cortical area by inputs from all interacting areas constrains the meaning expressed by the wave packet to be consistent with the wave packets in those other areas. The creation of informationally consistent wave packets in multiple distributed cortical areas is equivalent to the emergence of global neurocognitive state in the neocortex. The physiological mechanism by which these interactions take place is considered below.

In the next section, we will address inter-areal oscillatory phase coupling as a proposed physiological mechanism by which neocortical areas interact to produce a global neurocognitive state. The linkage of neuronal populations in connected cortical areas is proposed to occur by inter-areal phase synchronization. This phenomenon is explained by cortical coordination dynamics using extrapolation of an equation for a coupled oscillator system to multiple coupled oscillators [Bressler & Kelso 2016].
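The extrapolation from one coupled-oscillator pair to many oscillators can be illustrated with a standard Kuramoto-type phase model. This is a generic sketch under simple assumptions (mean-field sine coupling, Gaussian spread of intrinsic frequencies), not the specific formulation of Bressler & Kelso [2016]; each abstract oscillator stands in, very loosely, for one cortical population.

```python
import numpy as np

def simulate_phases(n=32, coupling=1.5, freq_spread=0.5,
                    dt=0.01, steps=5000, seed=0):
    """Euler integration of N coupled phase oscillators (Kuramoto form):
        dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
    Returns the time course of the phase-coherence order parameter r = |z|.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, freq_spread, n)      # intrinsic frequencies differ
    theta = rng.uniform(0, 2 * np.pi, n)         # random initial phases
    r_history = []
    for _ in range(steps):
        z = np.exp(1j * theta).mean()            # complex order parameter
        r_history.append(abs(z))                 # r in [0, 1]: phase coherence
        # mean-field rewriting of the pairwise sine coupling
        theta = theta + dt * (omega
                              + coupling * abs(z) * np.sin(np.angle(z) - theta))
    return np.array(r_history)

r = simulate_phases()
```

With coupling above the critical value, coherence grows from its near-random initial level toward a fluctuating plateau; with weaker coupling or larger frequency spread, the system instead drifts with transient, intermittent coherence, a simple analogue of the coupling-decoupling alternation described in the text.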

5.4 Cortical Coordination Dynamics

Coordination dynamics describes the way that entities in self-organizing systems work together in time. Evidence suggests that coordination dynamics operates in the neocortex by the joining together of cortical areas through inter-areal phase coupling of oscillatory neuronal population activity [Bressler & Kelso 2016]. The linkage of cortical areas is observed to be intermittent, i.e., it rapidly switches back and forth in time between states of phase coupling and decoupling [Freeman 2006]. Intermittency is often seen in complex systems as a sign of “relative coordination” [von Holst 1939/1973], best described by the dynamic property of metastability [Tognoli & Kelso 2014; Bressler & Kelso 2016].

Metastability refers to the property of a system of coupled oscillators whereby it tends to settle into a stable coordination pattern but is unable to do so. Instead, the system “hovers” in unstable patterns near the stable coordination pattern. The observation by Freeman [2006] that wave packets recur intermittently at rates in the theta frequency range suggests that phase coupling of multi-areal cortical neuronal population activity alternates with phase decoupling, with the cortical system normally operating in a metastable state.

The oscillation frequency of the phase coupling that links cortical areas in large-scale networks is an important issue. It has been demonstrated [Salazar et al. 2012] that prefrontal and posterior parietal areas of the macaque monkey are linked by phase coupling in the beta frequency range during working memory storage. Moreover, the degree of beta-frequency phase coupling that was observed depended on the sample stimulus employed. The degree to which phase-coupled oscillations in all frequency ranges contribute to working memory storage in the human fronto-parietal system remains to be determined [Stam et al. 2002].

5.5 Inter-areal Beta-Frequency Phase Coupling

Neocortical beta (13-30 Hz) rhythms have been observed in the brain in relation to attention, working memory, anticipation, learning, and visual perception [Zheng et al., 2015; Howe et al., 2011; Sturman et al., 2011; Gola et al., 2013], and it has been proposed that they play a general functional role in the neocortex by signaling the “status quo” [Engel & Fries 2010]. The most highly studied beta rhythm is that in the motor system, where it is thought to play an important functional role in maintaining continuous motor output, such as occurs during maintained grasp [Babapoor-Farrokhran 2017; van Wijk et al., 2012]. In addition to the motor system, beta oscillations have also been reported in the olfactory, somatosensory, and visual systems. As we propose above, cortical areas that engage in the co-expression of wave packets are thought to be linked via long-range phase synchronization in the beta frequency range, while others that are not engaged are not linked.

It has been proposed that the observed phenomenon of top-down directed beta- frequency phase coupling provides evidence for predictive coding in the neocortex (Bressler & Richter 2015). Top-down processing occurs when high- level neuronal populations send modulatory signals to low-level populations via axonal pathways. Recent evidence indicates that high-level populations in distributed large-scale networks of neocortical association areas may deliver behavioral context to low-level sensory areas (Richter et al. submitted). Two predictions about predictive coding derive from the thesis being proposed here.

First, distributed high-level neuronal populations interact by phase coupling of their deep-layer beta-frequency LFPs, leading to the consensual expression of meaning by the superficial-layer gamma-based wave packets in the populations of this collection. Second, beta oscillatory inputs delivered to the deep layers of low-level sensory populations constrain the gamma-based wave packets in those populations to be consistent with the behavioral context being conveyed.

Evidence for inter-areal beta-frequency phase coupling originally came from coherence analysis of LFPs recorded in the macaque monkey neocortex. In one study, a large-scale cortical network was observed in the sensorimotor cortex during the foreperiod of a visual discrimination task when the monkey was pressing a response lever to indicate engagement in the task [Brovelli et al. 2004] (see figure). The network was found to be formed by the phase coupling of somatosensory and motor LFP oscillations in the beta frequency range. In a second investigation, Salazar et al. [2012] likewise observed neocortical phase coupling in the beta frequency range. In this study, phase coupling occurred between LFPs from the posterior parietal and prefrontal macaque neocortex during the delay period of a visual delayed match-to-sample task when working memory was required (see Figure 4). Thus, both studies provide evidence for large-scale phase coupling of neocortical neuronal population activity in the beta-frequency range in relation to cognition.

Subsequent work has suggested a mechanism for the interaction of beta and gamma oscillations across the layers of the neocortex. First, a recent modeling study [Lee et al. 2013] of top-down attention suggests that beta-frequency oscillations in deep layers of the neocortex induce rhythms in the gamma-frequency range in superficial layers. Although the exact laminar mechanism by which deep beta rhythms interact with superficial gamma rhythms is unknown, it appears that the linkage of the deep layers of distant neuronal populations in the neocortex by long-range beta-frequency synchronization may be responsible for mutual pattern constraint of gamma-frequency-based wave packets in the superficial layers. Furthermore, recent experimental evidence supports the finding that beta synchronization is restricted to deep cortical layers (Bastos et al. 2015).

The existence of wave packets in the human neocortex was demonstrated by a recent study [Panagiotides et al. 2011] using microgrid ECoG recording from the cortical surface of a human epileptic subject. It demonstrated local and context-dependent, high spatial-frequency patterns in the anterior inferior temporal gyrus associated with four distinct human behaviors: reading, talking on the telephone, looking at photographs, and face-to-face interaction. High spatial-frequency ECoG patterns from this region, corresponding to wave packets, distinguished the behavioral states.

5.6 Conclusions

The goal of this paper is to propose that meaning for an organism depends on the expression in the neocortex of a distributed set of wave packets that, as in the olfactory system where Freeman discovered them, consist of gamma-frequency oscillations that are coherent on a millimeter scale and spatially modulated in amplitude across their domain of cooperative neuronal activity. Extrapolating from Freeman’s original work in the olfactory system, we propose that wave packets exist throughout the entire neocortex. We further propose that the combined information content of this distributed set of wave packets, existing in superficial cortical layers, is mutually constrained by deep-layer phase synchronization in the beta frequency range. In short, the multi-area expression of informationally consistent gamma-frequency wave packets, constrained by inter-areal beta-frequency phase coupling, is a potential mechanism for producing meaning in the neocortex.

CHAPTER 6: FREEMAN’S NONLINEAR BRAIN DYNAMICS AND CONSCIOUSNESS

6.1 Introduction

Consciousness is not a good place to start a theory of brain functions, because there is no biological test to prove whether consciousness is present in a supine subject other than to ask “Are you now or were you ever awake?” – Walter Freeman in How Brains Make Up Their Minds

We know not through our intellect but through our experience. Consciousness is the very being of mind in action. – Maurice Merleau-Ponty

The past forty years have seen tremendous growth in the search for a unified theoretical framework of spatiotemporal brain dynamics; that is, a well-tested empirical and mathematical theory, capable of explanation, description and prediction of neuronal activity at all scales of inquiry. What would a unified theory of brain dynamics possibly look like, and what could it do? Ultimately, it would presume to causally link neuronal dynamics and cognitive processes, and might possibly shed light on the subjective experience of consciousness. Though these questions are still unanswered, work on such a theory began with Walter Freeman III and his study of the nonlinear neurodynamics of neuronal oscillations. Freeman provided one of the first applications of techniques and methods from physics and mathematics to the brain and its functions.

Throughout his career, Freeman brought together and employed concepts from these disciplines, including nonlinear dynamical systems theory, and related concepts of self-organization, pattern formation, emergence, and phase transitions. He also utilized sophisticated neural time series analysis techniques.

And, in a rare historical moment where empirical scientific work led to insight about deep, unresolved philosophical problems, Freeman’s neurophysiological work led him to draw powerful philosophical conclusions about the nature of meaning, intentionality, and consciousness. His scientific journey led to a number of seminal books, including Mass Action in the Nervous System (1975) and How Brains Make Up Their Minds (2000). The former explores the neural mechanisms that produce the electroencephalogram (EEG), the major neuroimaging technique of the time, and the latter is a discussion of how the empirical observations he made led him to a scientific and philosophical theory of consciousness, meaning and personal freedom.

Bressler’s theory of neurocognitive networks is a direct extension of Freeman’s nonlinear neurodynamics in the field of cognitive neuroscience. Neurocognitive networks are large-scale systems of interconnected neuronal populations organized to perform cognitive functions (Luria 1966; Bressler 2008). Neurocognitive networks operate according to the principles of nonlinear neurodynamics expounded by Freeman.

Paralleling these developments was the pioneering work by Kelso and his colleagues, establishing the field of coordination dynamics, a broad theoretical framework furnishing an understanding of coordinated behavior in and among living things. Although many of the same concepts found in Freeman’s nonlinear neurodynamics are also found in Kelso’s theory, Kelso has emphasized topics that are different from Freeman’s, including multistability, metastability, complementarity, synergetics, and relative phase (Kelso 2008).

Freeman’s and Kelso’s overlapping historical threads of theoretical work have had a major impact on our modern understanding of brain dynamics in cognitive neuroscience, and Bressler’s empirical evidence for coordination between cortical areas in large-scale brain networks by interareal oscillatory phase coupling of beta rhythms fuses many of the intertwining concepts.

What impact have these theoretical schemas had on the scientific study of consciousness? In this field of study, cognitive terms like “mind”, “thought”, “mental processes”, and “mental representations” are somewhat ambiguous, and therefore lose some of their potency. Different theories of consciousness compete for adherents, paradigm shifts occur, and a surfeit of data is readily available for analysis, but the results often lack interpretation. By emphasizing the temporal aspect of consciousness, its grounding in the brain, and the brain’s embodiment, Freeman’s neurodynamical conceptual scheme, shared to a degree by Bressler and Kelso, has made major contributions to our understanding of consciousness. In An Essay on Understanding the Mind, Kelso states that

“Minds, brains, and bodies, yours and mine, immersed as they are in their own worlds, both outside and inside, share a common underlying dynamics.” We argue here that Freeman’s emphasis on brain dynamics and embodiment has important implications for the scientific study of consciousness.

Freeman viewed materialism and cognitivism as being fundamentally incapable of explaining his observations of the spatial amplitude modulation (AM) patterns of wave packets (see below) in the olfactory system. For him, materialism identifies the mind with neural patterns of energy or matter in the external world, which are transduced by the peripheral and central nervous systems into neural flows. Cognitivism, or computationalism, proposes that the mind operates by a syntactic language of thought consisting of symbols, with mental representations providing the semantics, computed by the brain in a manner similar to a von Neumann computer. Freeman answered von Neumann’s well-known statement that whatever the language of the brain is, it is not mathematics, by saying that the brain does not have a language at all; it is not a logical device. Cognitivism treats the mind as a reflexive, stimulus-response device. Freeman’s work moved him away from this view into what he called the pragmatic view, where the brain is a dynamical system, and the mind is an emergent, self-organizing, embodied process existing in relationship with the external world. This pragmatic view partially originates from, and fits well with, the philosophical school of phenomenology, and, as we shall see, Freeman traces its philosophical line to Thomas Aquinas, John Dewey, and Maurice Merleau-Ponty.

In this essay, our aims are threefold: 1) to explore relevant concepts in Freeman’s rich theoretical framework, 2) to unify it and provide empirical support for it, and 3) to explore the import it may have for the study of consciousness. We will take as our point of departure the recent publication on cortical coordination dynamics by Bressler and Kelso (2016). We will then outline Freeman’s nonlinear neurodynamics and discuss different aspects of this work, including theoretical background, empirical findings, computational modeling, and how these are being engaged in modern research. We will then draw parallels between Freeman’s and Kelso’s work, discussing the Freeman-Kelso dialogue, while highlighting the importance of various concepts in cognitive and computational neuroscience. In our view, both Freeman’s and Kelso’s conceptual frameworks have great explanatory power, and, with contributions by Bressler, are grounded in a dynamical systems approach not only to the brain, but also to the mind, and to consciousness. Thus, we argue, along with Freeman, that a theoretical explanation of consciousness must move beyond computationalism, and employ concepts of nonlinear brain dynamics rather than mental representation. Finally, this discussion will allow us to speculate on the import Freeman’s work has for understanding consciousness, particularly his interpretation of Thomas Aquinas on intentionality and meaning, all the while underscoring an embodied neurodynamical approach to consciousness.

6.2 Cortical Coordination Dynamics: Building a Framework

Coordination dynamics began as a quantitative research program within the burgeoning field of complex systems in the 1980s, and proposes to theoretically explain the reciprocal, coordinative movements and behavior in and among living entities. While these entities can be movements of fingers (Kelso 1984), human limbs (Kelso 1986), swarms of birds (Miles 2011), people acting within economies (Oullier 2008) or social situations (Oullier 2009), cortical coordination dynamics examines neuronal coordinative structures in the cerebral cortex (including paleocortex and neocortex), and describes the way that single neurons or neuronal populations self-organize over time. There is now a plethora of evidence that the principles of coordination dynamics, such as emergence, symmetry breaking and metastability, operate in the cerebral cortex by the nonlinear coupling of cortical areas through inter-areal phase synchronization of oscillatory neuronal population activity (Bressler & Kelso 2016). Synchronization is often seen in complex systems as a sign of “relative coordination” (von Holst 1939/1973). The relative coordination of cortical areas is observed to be intermittent, rapidly switching in time between states of phase coupling and decoupling (Freeman 2006). Both the degree and strength of relative coordination change over time, as the cortical system enters and exits different coordination states. It has been observed (Freeman 2006) that wave packets recur intermittently at rates in the theta frequency range. The intermittency of cortical dynamics, as the alternation between phase coupling and decoupling of multi-areal neuronal population activity, suggests that the coordination states are best described by the dynamic property of metastability (Bressler & Kelso 2016).

Metastability is the property of a system of coupled oscillators whereby it tends to settle into a stable coordination pattern but is unable to do so. Instead, the system remains in an unstable pattern near the stable coordination pattern. Thus, cortical metastability produces the simultaneous tendency for cortical areas to remain segregated, manifesting their own intrinsic activity, and to be integrated, influencing each other by reciprocal coupling. Metastability results from the broken symmetry in an important pillar of coordination dynamics, the extended Haken-Kelso-Bunz (HKB) model. The model describes the relative phase synchronization between two nonlinearly coupled nonlinear oscillating entities. In cortical dynamics, each oscillating neuronal population has its own intrinsic frequency, and thus the difference between the intrinsic frequencies of coupled populations is non-zero. The most relevant variable that is observed, in this case the relative phase, is referred to as the order parameter, and it operates in a circular causal fashion (see below) with the individual parts of the system.
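The relative-phase dynamics described here is commonly written, in the extended HKB model, as a single equation of motion for the relative phase; we reproduce the standard deterministic form from the coordination dynamics literature for concreteness:

```latex
\dot{\phi} = \delta\omega - a\sin\phi - 2b\sin 2\phi
```

Here \(\phi\) is the relative phase of the two oscillators, \(\delta\omega\) is the difference between their intrinsic frequencies (the symmetry-breaking term), and the coefficients \(a\) and \(b\) set the relative strength of in-phase and anti-phase coupling. With \(\delta\omega = 0\) the system has stable fixed points at in-phase and anti-phase coordination; with \(\delta\omega \neq 0\) and weak coupling, the fixed points vanish and \(\phi\) drifts, dwelling intermittently near the former attractors: the metastable regime.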

The relative phase of coupled cortical populations is often measured between local field potentials (LFPs) recorded from those populations. The cortical LFP, which originates from the synchronous extracellular postsynaptic dendritic currents of a cortical population, is enormously helpful for understanding cortical coordination dynamics since cortico-cortical transmission occurs at the population level, not at the level of single neurons (Bressler 2016). Of course, the LFP itself is not transmitted between areas, but is a convenient signal for measuring relative phase in place of the population’s axonal pulse activity, which is transmitted but often cannot be measured (Bressler 2016). The relative phase of LFPs from cortical areas often serves as an order parameter that is explained by the extended HKB model, and supports the existence of metastability in cortical systems.
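As a toy illustration of relative phase as a measurable quantity, the instantaneous phase difference between two band-limited signals can be estimated from their analytic signals via the Hilbert transform. This is one common estimator, not the specific method of the cited studies, and the signal parameters below are invented for the example:

```python
import numpy as np
from scipy.signal import hilbert

def relative_phase(x, y):
    """Instantaneous relative phase between two LFP-like signals.

    Uses the analytic signal (Hilbert transform) to extract each signal's
    instantaneous phase; their difference is the candidate order parameter.
    Both signals are assumed already band-pass filtered to the rhythm of
    interest (e.g., beta), since instantaneous phase is only meaningful
    for narrowband signals.
    """
    phi_x = np.angle(hilbert(x))
    phi_y = np.angle(hilbert(y))
    # wrap the difference to (-pi, pi]
    return np.angle(np.exp(1j * (phi_x - phi_y)))

# Example: two 20 Hz "beta" signals with a fixed quarter-cycle lag.
fs = 500
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 20 * t)
y = np.sin(2 * np.pi * 20 * t - np.pi / 2)
dphi = relative_phase(x, y)
```

Away from the record edges (where the Hilbert transform is distorted), `dphi` hovers near a constant value, reflecting stable phase locking; in metastable data it would instead dwell and drift.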

Cortical LFP studies show that the cortex remains flexibly coordinative by maintaining a state of metastability during cognitive processing (Bressler & Kelso 2016). The phase coupling between cortical areas in neurocognitive networks of the macaque monkey has been demonstrated to link LFP oscillations in the beta frequency range (Brovelli et al. 2004; Bressler et al. 2007; Salazar et al. 2012; Richter et al. 2016), and a code for the content of working memory in the macaque monkey has been found in the strength of fronto-parietal beta-frequency synchronization (Salazar et al. 2012). Thus, monkey working memory storage may depend on coordinated fronto-parietal beta oscillations. It is interesting in this regard that the frontal eye fields and the anterior cingulate cortex have recently been demonstrated to be coordinated in the monkey by coupled oscillations in the beta (and theta) frequency ranges (Babapoor-Farrokhran et al. 2017), suggesting that multi-frequency inter-area cortical coordination may be a general mechanism of cognitive function. However, it still remains to be determined whether working memory (Stam et al. 2002) actually depends on the coordination of oscillations in the cortex.

In sum, metastability allows the interdependence and independence of cortical areas to be balanced as they coordinate to form neurocognitive networks. The comprehensive description of neurocognitive network dynamics by metastability is comparable to other theoretical explanations of neural flexibility, such as chaotic itinerancy [Tsuda 2001], self-organized criticality [Bak et al. 1988], and structured flows on manifolds [Woodman 2011]. It is also compatible with the global workspace theory of consciousness [Baars 2003].

6.2.1 The emergence of unified neurocognitive state

Following Freeman’s work, and in accord with the global workspace hypothesis, Bressler (2007) has proposed that Freeman’s wave packets are found in different areas of the neocortex, as they are in different areas of paleocortex, such as the olfactory cortex (Mannino & Bressler 2017). Experimental evidence on neocortical wave packets thus far is scanty, leading Bressler to observe that “since the spatial extent of coherent domains can exceed the size of the primary sensory areas, it remains an open question as to what degree the coherent domains are confined by the boundaries of cortical areas as defined anatomically” (Bressler 2007). Nonetheless, there is much evidence that neocortical areas are capable of generating spatially coherent patterns of neuronal population activity. Moreover, the overwhelmingly bidirectional architecture of axonal pathways in the brain allows for reentrant interactions between neocortical areas (Tononi et al. 1998). Reentrant processing is central to the emergence of unified neurocognitive state (Bressler 1999), and we infer that neocortical areas interact with other areas through reentrant processing to mutually constrain their wave packets (Mannino & Bressler 2017).

What does it mean for the wave packets of interacting neocortical areas to be mutually constrained? Each neocortical area is postulated to express meaning by the current state of its wave packet, and the cognitive context for that expression is provided by its interactions with other areas. The modulation of wave packet formation in each neocortical area by inputs from all interacting areas thus constrains the meaning expressed by an area’s wave packet to be consistent with that of the wave packets in all other interacting areas. The consistency that results from the formation of cognitively consistent wave packets in multiple distributed cortical areas allows the emergence of a “global neurocognitive state” in the brain that provides the neural basis for cognitive state. Reentry between interacting cortical areas unifies the meanings expressed by their wave packets, and this unified meaning may be interpreted by the organism as the meaning of its experience.

What we call the global neurocognitive state is crucially dependent on reentrant processing (Edelman & Gally 2013). Those authors note that reentry is the central mechanism in the brain that allows for its vast functionality, and we argue that it may also be the central mechanism allowing for the formation of meaning via distributed, interdependent wave packets. Interactions between cortical areas are measured by their degree of LFP phase synchronization, and a common technique for measuring phase synchronization is spectral coherence.
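As a concrete illustration of the coherence measure just mentioned, the sketch below computes spectral coherence between two simulated LFP-like signals that share a common gamma-band component. The signal parameters (40 Hz carrier, noise level, sampling rate) are illustrative assumptions, not values from the recordings discussed in this chapter.

```python
# Spectral coherence between two simulated "areas" sharing a gamma oscillation.
import numpy as np
from scipy.signal import coherence

fs = 1000.0                      # sampling rate (Hz), an illustrative choice
t = np.arange(0, 10, 1 / fs)     # 10 s of data
rng = np.random.default_rng(0)

# Both channels carry the same 40 Hz (gamma-band) component plus independent noise.
common = np.sin(2 * np.pi * 40 * t)
x = common + 0.5 * rng.standard_normal(t.size)
y = common + 0.5 * rng.standard_normal(t.size)

# Welch-based magnitude-squared coherence as a function of frequency.
f, Cxy = coherence(x, y, fs=fs, nperseg=1024)
gamma_coh = Cxy[np.argmin(np.abs(f - 40.0))]   # coherence at the 40 Hz bin
```

High coherence at 40 Hz, and low coherence elsewhere, is the signature by which phase-synchronized interaction between two areas would be detected.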

Interestingly, Thagard (1998) has advocated a cognitive constraint satisfaction explanation of cognitive coherence as being essential for the formation of meaning. By providing a neurodynamic explanation for cognitive constraint satisfaction, our formulation follows Freeman in yielding a neural explanation for meaning formation in the brain.

6.3 Freeman’s Nonlinear Neurodynamics

Freeman’s theory of nonlinear neurodynamics was gleaned from his neurophysiological research. In this section, we briefly discuss three important highlights of his work: his discovery of wave packets in the mammalian paleocortex, his use of chaos theory to explain brain activity, and his concept of K-sets for computational brain modeling.

First, the wave packet results from a population of thousands to millions of interacting cortical neurons oscillating in synchrony at a high temporal frequency. The synchrony derives from mutually excitatory synaptic interactions. The oscillation, called the carrier wave, and observable in the local field potential (LFP), electrocorticogram (ECoG), intracranial electroencephalogram (EEG), scalp EEG, and magnetoencephalogram (MEG), is described by its oscillatory characteristics: frequency, amplitude, and phase. However, the concept of the wave packet derives from the spatial extension of the oscillation across the generating population: since the carrier-wave oscillation is spatially coherent over all the neurons of the population, the entire event is called a wave packet. The high spatial coherence of the wave packet cannot be attributed to the reference electrode or to distant-generator activity that reaches the recording electrodes by volume conduction. It also cannot be explained by driving from an external pacemaker. Although the event is spatially coherent, it shows spatial modulation of its amplitude and phase, meaning that the oscillation’s amplitude and phase (relative to the spatial mean) vary across space. The spatial extent of the wave packet is on the order of millimeters, meaning that small, very high-density multichannel electrode arrays (providing at least 4 x 4 mm to 6 x 6 mm spatial windows on cortical population activity at the pial surface) must be used to record it (Panagiotides et al. 2011). Although the wave packet is defined in terms of synchronized waves of neuronal population activity that are primarily of dendritic origin, it is relevant that neuronal spiking activity is high in synchronized neuronal populations such as those supporting the wave packet (Elul 1972), and thus the wave packet may be coextensive with a zone of active spiking (Roland in press).
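The notion of amplitude varying across space relative to the spatial mean can be made concrete with a small sketch: simulate a 64-channel array carrying a shared gamma carrier with per-channel gains, then extract each channel's gamma amplitude and normalize by the spatial mean. The array geometry, 40 Hz carrier, and gain model are illustrative assumptions, not Freeman's recording parameters.

```python
# Extracting a spatial amplitude-modulation (SAM) pattern from a simulated 8 x 8 array.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0
t = np.arange(0, 1, 1 / fs)               # one 1 s "frame"
rng = np.random.default_rng(1)

n_ch = 64                                  # 8 x 8 electrode array (assumed layout)
carrier = np.sin(2 * np.pi * 40 * t)       # shared gamma carrier wave
gains = 1.0 + 0.3 * rng.standard_normal(n_ch)    # per-channel amplitude gains
lfp = gains[:, None] * carrier + 0.2 * rng.standard_normal((n_ch, t.size))

# Band-pass in the gamma range, then take the analytic amplitude envelope.
b, a = butter(4, [30, 50], btype="bandpass", fs=fs)
env = np.abs(hilbert(filtfilt(b, a, lfp, axis=1), axis=1))

amp = env.mean(axis=1)                     # mean gamma amplitude per channel
sam = amp / amp.mean()                     # amplitude relative to the spatial mean
sam_map = sam.reshape(8, 8)                # contour-plot-ready spatial pattern
```

The resulting 8 x 8 map is the kind of object that the contour plots of Fig. 1b display: a spatially coherent carrier whose amplitude is modulated across the array.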

Following its discovery in the olfactory system, Freeman proposed that the wave packet is the fundamental “carrier for the components of meaning” in the brain (Freeman 2003). This concept is closely related to Freeman’s concept of intentionality and freedom, which we address below. Freeman originally observed the wave packet in LFPs from the olfactory system, in structures that included the olfactory bulb, the anterior olfactory nucleus, and the prepyriform cortex. Oscillatory olfactory LFP waves in the gamma frequency range (Fig. 1a) “ride” on the inspiratory phase of a theta-frequency respiratory rhythm. Measured with eight-by-eight electrode arrays (approximately 5 mm wide) on the pial surface, the gamma-frequency LFP reveals the wave packet as a spatially coherent event, reflecting activity distributed across the multiple channels (64 electrodes) of the array. The LFP oscillations recorded from different electrodes in the array (Fig. 1b) have nearly identical waveform and frequency, but the oscillation amplitude and phase vary across the electrode locations. The similarity of gamma oscillations across the electrode array is due to “neural cooperation” that originates from excitatory synaptic interactions of the underlying neurons. The amplitude variation across the array is known as the spatial amplitude modulation (SAM) pattern, and is represented in the contour plots shown in Fig. 1b.

Interestingly, although the SAM pattern is not unique to each conditioned stimulus, it is unique to the organism receiving the stimulus, and it varies according to a number of different psychological variables. As Freeman (2003) has stated,

These AM patterns lack invariance with respect to the stimuli. They change with context, manner of reinforcement, conditioned response, cumulative experience with the same stimuli, and state of expectancy. In combination these contributions relate to the meaning of stimuli, which mediates the effective integration of the organism with its environment.

Likewise, Bressler (2007) has pointed out that

The spatial pattern of amplitude modulation (AM) of the wave packet in sensory cortices has been found to correlate with the categorical perception of conditioned stimuli in the corresponding sensory modality. The spatial AM pattern changes reliably with behavioral context, conditioning, and the animal’s cumulative experience of the sensory environment …

The contour plots of Fig. 1b, then, graphically represent the SAM patterns of wave packets, which are context-dependent events having behavioral saliency.

The SAM patterns resemble movie frames in that they do not vary randomly from one inspiration to the next, but rather change in a continuous, temporally coherent, and meaningful fashion. Freeman’s conception of an orderly progression of spatial patterns is referred to as the cinematographic hypothesis of cortical dynamics (Freeman 2006). To summarize, the wave packet recorded from a cortical area is a spatially coherent pattern of high-frequency neuronal population activity, which, like a movie frame, carries spatially coherent information that may be gleaned by analysis of the LFP waveforms from that area.

Brain dynamics, according to Freeman, is chaotic, and brain chaos helps to explain, in more than metaphorical terms, the process by which the brain rapidly adapts to, and organizes, incoming stimuli. For this to happen, the brain must rapidly switch patterns based on small changes in incoming stimuli. A chaotic system is mathematically defined as one that exhibits a bounded regime, determinism, aperiodicity, and sensitive dependence on initial conditions. Although it has not yet been determined whether brain dynamics adheres to all of these properties, cortical activity must have extremely high dimensionality (if each neuron is one state variable, then the state space has tens of billions of dimensions). If the cortex is chaotic, its chaos may be high dimensional, and thus the cortex may be indistinguishable from a purely stochastic system (Mannino & Bressler, 2015). For Freeman, the brain traverses an “attractor landscape” that is globally stable and locally unstable at the same time (Freeman & Skarda, 1987). He claims that this delicate chaotic balance is necessary for humans to survive in an ever-changing external environment, and that it arises naturally from nonlinear neurodynamics. According to Freeman, the brain is a nonlinear self-organizing feedback system existing “on the edge of chaos” and undergoing rapid phase or state transitions. Having chaotic dynamics allows the brain to be prepared to adapt to unpredictable changes. The spatiotemporal patterns of neuronal dendritic potentials are oscillatory, and they constantly reorganize themselves, creating, and updating, new chaotic attractor landscapes. This phenomenon has been noted before, in discussions of self-organized criticality, metastability, and chaotic itinerancy, among others (Bak, 1987; Bressler & Kelso, 2016; Tsuda, 2013; Freeman, 2003).
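Sensitive dependence on initial conditions, the defining property listed above, can be demonstrated in a few lines with the logistic map at r = 4, a textbook stand-in for chaotic dynamics (the map itself is illustrative, not a brain model):

```python
# Two trajectories of the chaotic logistic map, started a hair apart.

def logistic(x, r=4.0):
    # One iteration of the logistic map; chaotic for r = 4 on [0, 1].
    return r * x * (1.0 - x)

x, y = 0.400000, 0.400001       # nearly identical initial conditions
max_sep = abs(x - y)
for _ in range(60):
    x, y = logistic(x), logistic(y)
    max_sep = max(max_sep, abs(x - y))

# Despite being deterministic and bounded, the trajectories diverge until
# the initial 1e-6 difference has grown to order one.
```

The same exponential amplification of small perturbations is what would let a chaotic cortex switch patterns rapidly in response to small changes in incoming stimuli.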

Chaotic brain dynamics may be capable of supporting consciousness, at least in the “soft problem” sense (see discussion below). To understand this claim, we must give a brief description of Freeman’s computational modeling of neurodynamics, which dates back at least to his use of a set of deterministic nonlinear second-order differential equations (named the K-set in honor of Aharon Katchalsky) to replicate the EEG (Freeman & Skarda 1987). K-sets form a hierarchy reaching across multiple scales. Freeman modeled the K0 set as a group of neurons that have the same sign (excitatory or inhibitory) but do not interact. The KI set is a group of interacting K0 sets that can be either excitatory or inhibitory. The KII set is a mixed population of excitatory and inhibitory neurons that produces limit cycle oscillations in the gamma range via negative feedback. KIII sets, consisting of multiple KII sets, model a full sensory system. Not only do KIII sets produce chaotic dynamics, they are also capable of various functions, including learning and memory. Finally, the KIV set consists of at least three interacting KIII sets, and models an entire cortical hemisphere, exhibiting intentionality. An interesting question that we will address is whether the KIV set is conscious.
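The negative-feedback principle behind the KII set can be sketched with a linearized caricature: an excitatory and an inhibitory population coupled so that a pulse to the excitatory side rings down as a damped gamma-band oscillation. The equations and parameter values (50 /s damping, 40 Hz coupling) are illustrative assumptions, not Freeman's K-set equations, and the linear model produces only damped oscillations; the limit cycles of the full KII set require its nonlinearity.

```python
# Damped gamma oscillation from an excitatory-inhibitory negative-feedback loop.
import math

dt = 1e-4                        # 0.1 ms Euler step
decay = 50.0                     # 1/s damping of each population (assumed)
omega = 2 * math.pi * 40.0       # coupling sets a 40 Hz ring frequency (assumed)

e, i = 1.0, 0.0                  # impulse delivered to the E population
peaks = 0
prev2, prev = e, e
for _ in range(int(0.2 / dt)):   # simulate 200 ms
    de = -decay * e - omega * i  # E is damped and inhibited by I
    di = omega * e - decay * i   # I is damped and excited by E
    e, i = e + dt * de, i + dt * di
    if prev > prev2 and prev > e:        # count local maxima of E
        peaks += 1
    prev2, prev = prev, e
```

Over 200 ms the excitatory trace shows roughly eight peaks (a 40 Hz ring) while its amplitude decays toward zero, the behavior Freeman's building block (2) below attributes to damped E-I feedback.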

6.4 Comments on the Freeman-Kelso Dialogue

We comment here on some of the similarities and differences between the views of Freeman and of Kelso and colleagues on nonlinear brain dynamics. Like Freeman, Kelso and colleagues have been motivated by the mathematics of dynamical systems. Kelso’s theory of coordination dynamics begins with empirical observations of motor acts and theoretical studies of nonlinear dynamical systems. As described above, his theory has been extended to explain the interaction of coupled oscillators, and may explain the coupling of brain regions (Bressler & Kelso 2001, 2016). By contrast, Freeman’s theory originated in electrophysiology, and he came to nonlinear dynamics some time after his empirical work in the olfactory system. Although the two approaches differ in their starting points, both end up offering valuable insights into the nature of the mind. Both approaches feature the study of the nonlinear coupling of neural populations, as well as the notion of broken symmetry. However, a central difference between them concerns Kelso’s notion of the order parameter. For Kelso, the order parameter is relative phase, whereas for Freeman it is the spatial amplitude modulation pattern of the wave packet. As Kelso & Tognoli (2009) observe, “It is these amplitude patterns of aperiodic carrier waves…that constitute his (Freeman’s) order parameter.”
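For concreteness, the evolution of Kelso's relative-phase order parameter is standardly written in the extended Haken-Kelso-Bunz (HKB) form, where $\phi$ is the relative phase between two coupled components, $\delta\omega$ their natural frequency difference, and $a$, $b$ are coupling parameters:

```latex
\dot{\phi} = \delta\omega - a\sin\phi - 2b\sin 2\phi
```

Stable fixed points of this equation near $\phi = 0$ and $\phi = \pm\pi$ correspond to in-phase and anti-phase coordination, and their creation and annihilation as the parameters vary underlie the state transitions discussed next.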

Among the similarities of the two approaches is the prominence of state transitions: in the evolution of relative phase in Kelso’s viewpoint, and in the cinematographic evolution of the wave packet in Freeman’s. Another similarity is the reliance on Haken’s concept of “circular causality”, in which the individual elements of a system give rise to a collective variable – the order parameter – and the collective variable in turn exerts a causal influence on the individual components (Haken, 1991; 1996).

Both Kelso and Freeman interpret their results in light of attractor dynamics: for the former, as the relative phase between coupled oscillators in a dynamical systems model moving through dynamical regimes; and for the latter, as the series of transitions of the wave packet from one state to the next. For Freeman, the wave packet exemplifies a chaotic attractor in an attractor landscape (Freeman, 2000), and is key to understanding important cognitive properties, such as intentionality, meaning, and consciousness. Nevertheless, we argue here that in the context of consciousness both metastable coordination dynamics and nonlinear neurodynamics offer possible mechanisms for the brain’s production of conscious thought.

Finally, the nature of theoretical explanation in light of empirical evidence does not dictate that the two theories be completely distinct in all respects; after all, it is well known that Einstein’s relativistic field equations reduce to the Newtonian theory of gravity for speeds much less than the speed of light and for much weaker gravitational fields. Thus, in this case, we view the evidential overlap between Kelso’s and Freeman’s theories as a sign of theoretical convergence, i.e., evidence that nonlinear dynamics is more than simply an abstract mathematical description of brain dynamics. We believe that in this case theoretical underdetermination is a strength, not a weakness (Stanford, 2013). Indeed, according to the well-recognized Quine-Duhem thesis in the epistemology of science, hypotheses can never be tested in isolation. There will always be a set of so-called auxiliary hypotheses or background assumptions that come along with the particular hypothesis in question. Thus, in this case, we argue that there might not be a single test that would separate or favor either theory, nor any evidence that might favor one over the other. Granted, we are perhaps far from the fully unified theory of brain dynamics mentioned above as a basis for consciousness; however, we believe that both Kelso’s and Freeman’s theories serve well as scientific building blocks for the future.

6.5 Classical Cognitivism, Freeman’s Pragmatism, and the Action-Perception Cycle

Freeman’s emphasis on nonlinear neurodynamics moved the field of cognitive neuroscience away from mental representations and stimulus-response theories of the brain. He promoted an understanding of the brain as a dynamical system, and of the mind as an emergent process. Freeman took aim at two philosophies, classical materialism and cognitivism, that he considered to be dead-end products of an over-reliance on feedforward, single-cell theories in neuroscience. He objected to materialism’s identification of the mind with what the brain does, and its attribution of mental states to physical brain states. He stated that “materialists view the mind as physical flows, whether of matter, energy, or information, which have their sources in the world” (Freeman 2000). According to Freeman, a dead-end product of the materialist view is the twentieth-century psychology of behaviorism, in which mental states are expressed by reflexive behaviors in the classical stimulus-response paradigm.

Freeman also opposed cognitivism, which, although it does not reduce mental states to matter, does reduce them to symbolic representations of sensory information. For the cognitivist, the brain is an information processing system with its own language of thought (Fodor 1975, Pinker 2005). Neurons compute by encoding sensory information into trains of action potentials, which are “bits that represent qualities, aspects, or features of the stimuli” (Freeman 2000). In this view, neurons are logical, two-state elements, which combine to form networks that “bind” sensory features into mental representations, and that perform logical computations on these bits of information. At the highest level of cognition, processes like attention, memory, language and perception are carried out by the promulgation of logical sequences of representational states.

By contrast, Freeman placed his own nonlinear neurodynamic view of the mind squarely in the camp of pragmatism. Freeman’s pragmatic philosophy is centered on the idea that the mind cannot be understood without an account of how the brain controls the body to perform actions in the external world. A core feature of Freeman’s pragmatism is that perception for the organism is not a passive reception of information from the environment, but a constructive process that relies on endogenous neural activity. Freeman gave the name “action-perception cycle” to the mechanism by which the brain does this. The brain, body, and environment continually engage each other in a synergistic relationship that defines the mind. For Freeman, the mind is a dynamic structure arising from the body’s actions in its environment. In place of the stimulus-response paradigm, Freeman held that neural responses are not primarily driven by exogenous stimuli, but rather that they conform to the self-organization of wave packets that continually evolves in a high-dimensional state space (Freeman, 2007). The brain is not a logical device, but a dynamical system that forms hypotheses about what the next stimulus will be, and then tests those hypotheses against the results. Freeman’s pragmatism aligns well with other similar views, such as embodied cognition, dynamicism, and enactivism.

Freeman identified the action-perception cycle with the philosophical concept of intentionality, which he traced to Thomas Aquinas, John Dewey, and Maurice Merleau-Ponty, whom he saw as providing conceptual frameworks for explaining and interpreting his empirical work. Freeman cited Aquinas as describing the first and most relevant concept of intentionality for the neurodynamics of consciousness. For Aquinas, the intent of an organism manifests as the goal-directed actions that it executes for a future objective. Freeman clearly noted that this sense of the term “intentionality” is not the philosophical one defining the relation between a thought and that which it signifies – what is commonly called “aboutness.” Like Aquinas, Freeman used the term “intentionality” in a biological context, not a philosophical one, to mean the body’s actions outward into the environment, and the learning of the consequences of those actions.

For Freeman, the body acts into the world. Perception follows from action, and the effects of action in turn change the body and the self. As a result, the brain learns to predict what will happen in the future. Intentionality thus changes the order of events in the classical stimulus-response paradigm, in which sensory stimuli provide information from the world, the organism learns from it, and responds with action. In intentionality, the organism predicts what the world will be like, performs actions to test those predictions, receives sensory input as a consequence of those actions, and learns from those consequences (Freeman 2000; Llinas 2007). The prediction made by an animal is thus a hypothesis about how the external world will respond to its action, and this hypothesis is generated in the context of the already-formed goals of the animal. The animal performs this action to test the hypothesis. Freeman noted that this sequence is a “thrusting outward of the body” (Freeman 2000).

The action-perception cycle, repeatedly performed as the animal engages the world, is the basis of intentionality, and possibly of consciousness itself. If so, consciousness requires movement, embodiment, action, and learning. Merleau-Ponty, a pioneer of the philosophy of phenomenology, emphasized the role of the body acting into the world as primary for understanding cognition. Freeman noted that intentionality does not require consciousness, though it does require that a body perform actions in the environment in a process that creates meaning for the animal. Merleau-Ponty saw this as a phenomenological process, and called it “maximum grip” (Freeman 2000). This intentionality without representation (Dreyfus 1997) is what John Dewey called “action into the stimulus” (Freeman 2000). The concepts of action and perception in intentionality are thus antithetical to any stimulus-response theory of mind.

Freeman’s neurophysiological evidence for intentionality was the spatial amplitude modulation pattern of the cortical wave packet. Because of the confluence of inputs from all sensory systems into the limbic system, particularly the entorhinal cortex, Freeman postulated that the limbic system is essential for understanding intentionality. Freeman’s original schematic diagram illustrating the neural basis of intentionality is represented in Figure 2. In a revision of this schematic, we illustrate the creation of intentionality in the brain (Figure 3). The main interactions that Freeman saw as critical for intentionality in the action-perception cycle are: the space-time loop, the context loop, the control loop, and the motor loop. These loops act in concert to create meaning as the organism evolves in time through the four stages of the action-perception cycle (motor output, pre-stimulus, stimulus input, and pre-motor, each highlighted in Figure 3 by arrows and boxes).

The neural systems active in the action-perception cycle thus are:

(1) space-time loop: the hippocampus and entorhinal cortex interact throughout the entire action-perception cycle to provide spatial and temporal context to the creation of meaning. The entorhinal cortex, serving as the main interface between the hippocampus and the sensory systems, integrates sensory information and internal neural activity, and sends integrated activity back to the sensory cortices. Recently, López-Madrona et al. (2017) have confirmed the central role of the entorhinal cortex, finding that the bidirectional information flow between the entorhinal cortex and different hippocampal regions, shown here in the space-time loop, is strongly dependent on the structural connectivity patterns of the layers of the entorhinal cortex.

(2) context loop: the entorhinal cortex interacts with the sensory systems during the anticipation and processing of a sensory stimulus. This loop provides sensory context for the creation of meaning, and allows the organism to anticipate, and compensate for, the effects of action on sensory inflow.

(3) control loop: in pre-motor preparation and planning for action, and during the action itself, the entorhinal cortex interacts with the motor systems. This loop imbues actions with meaning, and facilitates the control of bodily movement.

(4) motor loop: actions are performed into the environment in this loop. The motor systems send information to the effectors in the body, i.e. the muscles, allowing for the execution of action.

(5) prefrontal cortex-limbic loop: the prefrontal cortex is essential for the prospective component of goal-directed behavior (Fuster 2014; Preston & Eichenbaum 2013). Thus, for devising future plans of action, the prefrontal cortex interacts with the medial temporal lobe (MTL) system.

We now address the question of the neurodynamics of consciousness in the last section of this essay.

6.6 How does Freeman’s nonlinear neurodynamics contribute to understanding consciousness?

In the philosophical literature, the problem of consciousness has been split into hard and soft problems (Chalmers, 2000). The latter is the question of the degree to which science can explain how neural activity produces cognition and cognitive processes. The former is the question of precisely how (objectively measured) neural activity gives rise to subjective experience, also termed “qualia” – the feeling of what it is like to have experiences. Does Freeman’s nonlinear neurodynamics help generate a move toward answering the hard problem of consciousness?

Does Freeman’s theory help explain either the soft or the hard problem? Can it bridge the so-called explanatory gap between the hard and soft problems? We argue that, although it is not widely appreciated in the scientific study of consciousness, Freeman reframed the relationship between consciousness and the brain in terms of dynamics. For him, the brain operates in a cycle of action and perception, making hypotheses about the external world by sensing, perceiving, predicting, and testing. This cycle is the essence of intentionality, and therefore of consciousness. His theory offers a fundamentally dynamical view of consciousness, as the dynamical interaction between the organism and its environment, which can be modelled and explained by nonlinear neurodynamics.

Freeman’s nonlinear neurodynamics is not currently among the most popular scientific theories of consciousness, which include the global workspace hypothesis, quantum theories of consciousness, and neural correlate theories. However, it offers something of great value, which others may not: an emphasis on consciousness not being a thing or phenomenon per se, but rather a cyclic dynamic process operating in time. Freeman thus refers to consciousness as a “dynamic operator” (Freeman, 2000). To understand Freeman’s view, we must return to his basic topics of study: neuronal populations; their interaction through state transitions; and their creation of meaning via the wave packet. The central entity in Freeman’s theory is the KIV set. K-sets are the essential hierarchical mesoscopic model for neural dynamics.

In building up to the dynamics of the KIV set, Freeman outlines 10 building blocks of neurodynamics (summarized in Kozma & Freeman 2009 and Freeman 2000). We briefly summarize them here: (1) the damped non-oscillatory response of the excitatory KI set; (2) negative feedback established by the emergence of damped oscillations between interacting excitatory and inhibitory KI sets in the KII set; (3) the phase transition of the KII set to a limit cycle oscillatory pattern; (4) chaotic activity formed by the interactions of three or more coupled KII set oscillators, each with its own intrinsic frequency; (5) carrier wave formation by the KIII set; (6) formation of wave packets having spatial amplitude modulation patterns; (7) modification of synaptic connectivity patterns in the KIII set with learning; (8) a decrease of sensory noise that is accompanied by the enhancement of wave packets in the sensory KIV set; (9) the multisensory convergence of transmitted sensory wave packets in the entorhinal cortex, which then, through recurrent loops (see Figure 3), strengthens the wave packets in the transmitting sensory areas, resulting in formation of a Gestalt in the sensory KIV set (Kozma 2007); and (10) “global integration of frames at the theta rates through neocortical phase transitions representing high-level cognitive activity in the KV model” (Kozma 2007). We speculate that consciousness is created in this last building block as a global neurocognitive state (Bressler 2007) that “directs the intentional state” of the entire central nervous system.

The neurophysiological underpinning of the action-perception cycle is the coordinated activity of cortical processes that leads to consensual wave packet expression across the cortex. Thus, Freeman states that “Consciousness is the process that makes a sequence of global states of awareness. It is a state variable that constrains the chaotic activities of the parts by quenching local fluctuations.” Hence, consciousness is a spatiotemporal process that facilitates meaning and intentionality. We emphasize here that, although a neurodynamics of consciousness may not solve either the hard or soft problems, it does offer a unique perspective: consciousness requires embodiment, and subjectivity requires dynamics.

We argue that, by virtue of its neural grounding, its emphasis on temporal evolution, and its embodiment, the formulation of a neurodynamics of consciousness based on Freeman’s work could offer a scientific theory that is consistent with the phenomena of subjective awareness. Freeman’s viewpoint has great explanatory power, as it includes facets of philosophy, psychology, complexity science, and neuroscience. Admittedly, the neurodynamics of consciousness still has many details to be worked out, but perhaps one day it will reach the scientific status of Newton’s theory of gravitation, Einstein’s theory of relativity, or Maxwell’s theory of electromagnetism. In favor of Freeman’s neurodynamics, an important point to re-emphasize is that it incorporates the body’s movement in the environment.

6.7 Conclusions

Francisco Varela, a prominent proponent of embodiment, stated that “Cognition depends upon having a body with various sensorimotor capacities, and second, that these individual sensorimotor capacities are themselves embedded in a more encompassing biological, psychological and cultural context” (Varela, 1991). If the same applies to consciousness, then Freeman’s theory may provide a way forward. We argue that a nonlinear neurodynamics approach has much to offer for the understanding of consciousness. Freeman’s nonlinear neurodynamics approach to cortical coordination dynamics should be considered as providing a description of neural events in the brain, embedded in the body and environment, that give rise to conscious awareness. Lastly, Stanislas Dehaene has postulated that once we move closer to solving the so-called easy problems of consciousness, the traditional hard problem will no longer be relevant. We do not go so far here as to claim that the hard problem of consciousness is irrelevant. We do propose a definition from the perspective of neurodynamics: consciousness is a dynamical process that occurs uniquely in a part of the physical world (the brain) that is comprised of neurons having intrinsic connectivity and extrinsic relations with the environment.


Figure 6.1. a. Concurrent LFP records from 2 olfactory bulb and 2 prepyriform cortex sites, demonstrating increased gamma wave amplitude in both structures in response to an olfactory stimulus. b. (left) Example of a pattern of LFP gamma waves recorded during inhalation from a very high-density 64-channel electrode array placed on the surface of the olfactory bulb. The 64 EEG traces were band-pass-filtered in the gamma frequency range and segmented to display an oscillation with inhalation. The “x” marks a bad channel that is replaced with an EEG signal from the prepyriform cortex. (right) Spatial amplitude patterns, displayed as topographic maps, showing the olfactory bulb response to filtered air (given as a control stimulus, top row) and amyl acetate (given as a conditioned stimulus, bottom row). Note that both the air and amyl spatial amplitude patterns change from the first day of training (column 1) to the second (column 2), recorded 2 weeks later. This result, showing that the pattern lacks invariance with respect to the odor stimulus, was the first indication to Freeman that the pattern relates to the meaning of the stimulus.

Figure 6.2. Freeman’s schematic showing the flow of neural activity in the construction of meaning, with emphasis on two main structures in the limbic system: the entorhinal cortex and the hippocampus.

Figure 6.3. Revision of Freeman’s schematic diagram, illustrating the neural basis for intentionality in the mammalian action-perception cycle as 4 processing stages (1 – pre-stimulus processing of expected stimuli, with expectation based on motor output; 2 – stimulus input processing, with context based on stimulus expectation; 3 – pre-motor processing of anticipated motor acts, with context based on processed stimuli; 4 – motor output processing, with context based on motor anticipation). The action-perception cycle creates meaning in the brain via interacting brain areas. In each processing stage of this revised diagram, the prefrontal cortex interacts with the limbic system to give mammalian goal-directed behavior its prospective character (Fuster 2014).

CHAPTER 7: CONCLUSIONS

7.1 Tying it all together: Implications and Interpretations

In this work, several concepts have been addressed, including: establishing causal relations between nodes in a neural-mass brain network model; the underlying conceptual nature of causality and causal relations in the human brain; and the relationship between the dynamics of brain regions and how information flows between them. The field of nonlinear neurodynamics has permeated this work throughout, both in its use in a generative model and in interpreting its relation to the subject of consciousness. One of the central questions of this work is: is there a relationship between the internal circuitry of brain regions (in this simulated case, the dynamics of a neural mass model) and how information flows between them? The answer is in the affirmative; moreover, the ability to recover causal relations between the neural masses is highly dependent on the dynamics. Some of this work echoes the claim made by Wang et al. (2014) that there may be no optimal method of functional connectivity that matches all types of underlying generative models. It has been shown that not only the dynamics, but conduction delays as well as coupling strengths, play a crucial role in establishing causal relations. The work of Ding et al. (2006) has also been faithfully reproduced using a neural mass model in The Virtual Brain, an important step towards understanding causality in the cortex. Cortical causality is inherently probabilistic, and the concept of screening off can help explain how causal relations can be inferred in large-scale brain networks. Finally, the general theory of nonlinear neurodynamics has been shown to have significant importance for understanding cognitive processes, as implemented by the unification of wave packets in large-scale neurocognitive networks, and even consciousness.

7.2 Future Considerations

Not only has this research hopefully answered some theoretical as well as empirical questions, it has generated many general and specific questions to be investigated in later work: are the results generalizable to other neural mass models? What is the exact relationship between coupling and intrinsic dynamics in terms of information flow? Generally, what can be said about the reliability of a functional connectivity method given the underlying generative model?


APPENDICES

Appendix A. Epilogue: a contextual self-reflection

Appendix B. Commentaries and Reply to Commentaries to Chapter 2: Physics of Life Reviews Paper

APPENDIX A

Epilogue: a contextual self-reflection

All truths wait in all things,
They neither hasten their own delivery nor resist it,
They do not need the obstetric forceps of the surgeon,
The insignificant is as big to me as any,
(What is less or more than a touch?)

Logic and sermons never convince,
The damp of the night drives deeper into my soul.

(Only what proves itself to every man and woman is so,
Only what nobody denies is so.)

A minute and a drop of me settle my brain,
I believe the soggy clods shall become lovers and lamps,
And a compend of compends is the meat of a man or woman,
And a summit and flower there is the feeling they have for each other,
And they are to branch boundlessly out of that lesson until it becomes omnific,
And until one and all shall delight us, and we them.

I believe a leaf of grass is no less than the journey-work of the stars…

From Song Of Myself, by Walt Whitman

Song of Myself has been my favorite poem since I was 10. I have a journal entry from around age 14 or 15, in which I wrote out the last section of the poem and described how it made me feel, how it changed my subjective experience of the world…

Now, reading it again, I wonder how this particular small passage from this long and dramatic poem can capture my life with such eloquence and precision.

Thus far, my existence has seemed to be like a non-random walk, a naturally-selected-for journey, an event horizon never approached, a search for those truths waiting in all things… It has been a long, long road until now, filled with a lot of self-doubt and confident moments, exuberance and disappointment, adventures and road-blocks, and…consciousness! I remember struggling through the last 2 years of my Bachelors in astrophysics (in which I ended up doing only mediocre work), three hours north of here, in Melbourne, FL, and immersing myself in studies of consciousness, interpretations of quantum physics, metaphysics, cosmology, pantheism and new age thought. I read all the books:

Gary Zukav’s The Dancing Wu Li Masters, Fritjof Capra’s The Tao of Physics, books by Fred Alan Wolf, Evan Harris Walker’s The Physics of Consciousness, and many, many others. In many cases, I quixotically reached out by email to some of these people, asking what it would take to study these things and contribute to them. I was youthful, and mature in many ways (as not many, nay, any, of my friends were thinking about a possible relationship between consciousness, physics, space and time), though very immature in others – that is, not that wise, and not knowing what that level of thinking would entail. I certainly lacked brilliance; I was, and still am, no Einstein, far from it. But, as he once said, at least I had interest, desire, and a deep passion for science, an intense curiosity. In fact, here is one message thread with Fred Alan Wolf.


His kind response, dated in 2000:

And my further response…


(JFKU was, and maybe still is, a graduate school with a program in science and spirituality.)

Youthful, juvenile, and not very professional, but again, my interest was evident.

Nevertheless, here is an image of a quiz I took in undergraduate quantum mechanics (deriving Heisenberg’s uncertainty relation), on which I got a perfect score… so, perhaps not a hopeless case!


Since college, now 19 years ago, I have always dreamed of getting my PhD in consciousness research. In fact, in 2000, I applied to the UCLA PhD program in neuroscience, but did not get in. And so, after 14 years, and getting rejected from numerous PhD programs in philosophy, and even one Masters program in the philosophical foundations of physics at Columbia University, I decided to first finish my Masters in philosophy and become a philosophy professor, and found, perhaps not so surprisingly, that I was good at that! But my dream never abated; the desire to earn my PhD and be a scientist somehow held very strong in some (perhaps) large-scale network in my brain. That “mindset,” in the technical sense used by my advisor Steven Bressler, remained and persevered. Thus, here I am, now 40 years old: I have learned a great deal, about so many subjects and from so many brilliant scientists here at the Center. I came in hoping to discover what consciousness is; perhaps I still do not know, but I believe I have come much closer to the truth. For example, a major paradigm shift occurred in my head. I used to believe in representationalism, but now I do not; I believe that the mind and its perception are much more constructive, relational, processual, as is some of the knowledge gained by science and scientists. In other words, I used to be a hard-core scientific realist, but have moved much closer to the anti-realist side. Much of that is due to my discussions with Dr. Bressler about people like Freeman, Kelso, Varela, Maturana, and Glasersfeld, as well as my research in complex systems, coordination dynamics, and neuroscience. I entered the program as a scientifically inclined philosopher, but I believe I can now consider myself a philosophically inclined scientist, and I am happy to be one!

Regardless of my future path, my search for truth, both philosophical and scientific, will never stop, and I am truly grateful for having been at the Center in this program, and for the knowledge gained at this institution.

APPENDIX B

Commentaries and Reply to Commentaries to Chapter 2: Physics of Life Reviews Paper

Wiener-Granger causality for effective connectivity in the hidden states:

Indication from probabilistic causality

Comment on “Foundational perspectives on causality in large-scale brain networks”

Wei Tang

Statistics and probability theory have advanced our understanding of random processes widely observed in the physical world. There is a remarkable trend in studying the brain by looking into the stochastic information processing in large-scale brain networks [1,2]. As the review by Mannino and Bressler [3] points out, the probabilistic notion of causality, with its rooted philosophical foundations, represents a revolutionary view on how different parts of the brain interact and integrate to generate function. Specifically, Probabilistic Causality (PC) asserts that a cause should increase the probability of occurrence of its effect, and PC between two brain regions entails that the probability for the activity in one region to occur increases when conditioned on the activity of the other. This definition claims inherent randomness in the causal relationship.
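The PC inequality can be made concrete with a small simulation. The sketch below is a toy two-"region" model with made-up firing probabilities (the numbers are illustrative assumptions, not anything taken from the commentary): activity in A raises the probability of activity in B, so the estimated P(B|A) exceeds the baseline P(B).

```python
import random

random.seed(0)

# Toy illustration of Probabilistic Causality (all numbers hypothetical):
# "region" A is active with probability 0.3; B is active with probability
# 0.8 when A was active and 0.2 otherwise, so A probabilistically causes B.
N = 100_000
a = [random.random() < 0.3 for _ in range(N)]
b = [random.random() < (0.8 if ai else 0.2) for ai in a]

p_b = sum(b) / N
p_b_given_a = sum(bi for ai, bi in zip(a, b) if ai) / sum(a)

# PC: conditioning on the putative cause raises the probability of the effect.
print(f"P(B) ≈ {p_b:.2f}, P(B|A) ≈ {p_b_given_a:.2f}")
```

Analytically, P(B) = 0.3·0.8 + 0.7·0.2 = 0.38, while P(B|A) = 0.8, so the inequality P(B|A) > P(B) holds both in theory and in the sampled estimate.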

As Mannino and Bressler [3] point out, an important tool to quantify PC is Wiener-Granger Causality (WGC) [4,5]. The philosophical examination in this review inspires a reconsideration of how WGC could be applied and interpreted from the perspective of PC. A long-standing belief in the brain imaging field is that WGC is designed to detect functional connectivity in the data, while Dynamic Causal Modeling (DCM) [6], as opposed to WGC, models the dynamic hidden states of neural activity that may underlie the observed connectivity [7]. However, if PC is assumed to exist in the hidden states, WGC can be aptly applied to capture it, going beyond the canonical data-driven analyses. Explicitly modeling the hidden states alleviates the dichotomy between WGC and DCM: both methods can measure effective connectivity [8] despite the different perspectives they take.

Estimating WGC for the hidden states would require state-space modeling similar to that adopted by DCM [6], but with history dependency incorporated in the state equation. The state-space framework has been proposed by Barnett and Seth [9] using a linear autoregressive state equation. However, in their paper WGC was calculated from the observation equation, which still conforms to the traditional idea of characterizing connectivity at the data level. Alternatively, one can use the same framework to analyze PC at the hidden-state level by calculating WGC from the state equation. Barnett and Seth’s methods [9] readily estimate the state parameters, and constructing WGC from the state parameters would be straightforward. Research along this line is yet to be developed, but the concept of PC undoubtedly puts forward a theoretical framework for integrating existing tools to measure effective brain connectivity.
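As a rough illustration of the time-domain WGC quantity discussed above, the following sketch fits restricted and full order-1 autoregressive models by least squares on a simulated bivariate VAR(1). The generative coefficients are invented for the example and the model order is fixed at 1 for brevity; this is a minimal sketch of the textbook definition, not the Barnett-Seth state-space estimator itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a bivariate VAR(1) in which x drives y but not the reverse.
# The coefficients (0.5, 0.4) are invented purely for illustration.
T = 5000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

def residual_variance(target, regressors):
    """Least-squares residual variance of target regressed on regressors."""
    X = np.column_stack([np.ones(len(target))] + regressors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)

def wgc(source, target):
    """Time-domain WGC (order 1): log ratio of restricted to full residual variance."""
    restricted = residual_variance(target[1:], [target[:-1]])
    full = residual_variance(target[1:], [target[:-1], source[:-1]])
    return np.log(restricted / full)

print(wgc(x, y))  # clearly positive: x's past improves prediction of y
print(wgc(y, x))  # near zero: y's past adds no predictive power for x
```

The asymmetry of the two estimates (positive for x→y, near zero for y→x) is what licenses the probabilistic causal reading: conditioning on the source's past changes the predictive distribution of the target.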

References

[1] Rolls E, Deco G. The noisy brain: stochastic dynamics as a principle of brain function. Oxford: Oxford University Press; 2010.

[2] Bullmore E, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 2009; 10: 186–198.

[3] Mannino M, Bressler SL. Foundational perspectives on causality in large-scale brain networks. Physics of Life Reviews, 2015; 4(15), this issue.

[4] Wiener N. The theory of prediction. In: Beckenbach EF, editor. Modern Mathematics for Engineers, New York: McGraw-Hill; 1956, vol. 1.

[5] Granger CW. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 1969; 37(3): 424–438.

[6] Friston K, Harrison L, Penny W. Dynamic causal modeling. Neuroimage, 2003; 19(4): 1273–1302.

[7] Friston K, Moran R, Seth AK. Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 2013; 23(2): 172–178.

[8] Friston K. Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapping, 1994; 2(1–2): 56–78.

[9] Barnett L, Seth AK. Granger causality for state-space models. Physical Review E, 2015; 91(4): 040101.

Deterministic versus probabilistic causality in the brain: to cut or not to cut

Comment on “Foundational Perspectives on Causality in Large-Scale Brain

Networks” by Mannino & Bressler

Mengsen Zhang, Craig Nordham, and J. A. Scott Kelso

In recent decades the rapid growth of new imaging technologies and measurement tools has dramatically changed how neuroscientists explore the function of the brain. A careful examination of the conceptual basis of causal inference using such methods is long overdue. Mannino and Bressler (M&B) [1] provide an informative review on the notion of causality from the perspectives of philosophy, physics, complex systems and brain sciences.

M&B assert that causality in the brain is probabilistic, not deterministic in nature.

Later on, they say that they have not tried to answer the question of whether the brain is inherently deterministic or stochastic. The two statements cannot be consistent unless “causality” in M&B’s definition speaks not to how the brain actually works but instead is equated to statistical relations between measurements. This is akin to the Born interpretation of quantum mechanics, which gives not the density of the stuff but rather the density of probability of finding the stuff [2]. Beyond general (unfortunately unsubstantiated) claims that the brain is self-organizing [cf. 6], M&B seem to be telling us not how the brain is, but how we happen to measure it.

We wish to draw a distinction between probabilistic relations (statistical relations) and probabilistic causal relations (stochastic processes). Probabilistic relations are what we can calculate statistically from some measurement. However, when we say causal relations, we are talking about whether one event truly influences a second—which is to say that if we are able to manipulate the first event, we will see corresponding changes in the second, no matter if the change is stochastic or deterministic. It is not guaranteed that all statistical relations are also causal relations, since we cannot know if we have recorded all causal events and/or for a sufficiently long time to screen off spurious (non-causal) statistical relations.

Even if a statistical relation is found to reflect a causal relation, it is not guaranteed that this underlying causal relation has to be stochastic. One can also make statistical measurements on deterministic processes. In practice, the experimental verification of the most deterministic theories still requires doing statistics. In statistical mechanics, for example, one can assume the microstates follow deterministic laws, while still studying their statistical properties [3].

With the above clarification in mind, let us consider the reasons M&B use for the abandonment of determinism in terms of inference methods. First, M&B argue that deterministic causal inference leads to spurious conclusions. They use the following example: suppose an event X (barometric pressure drop) causes both event Y (mercury rising in a column) and event Z (storm). M&B argue that the spurious regularity Y→Z can be screened off statistically by the probability of X→Z, but “a deterministic interpretation of causality would mistakenly infer that the drop in mercury causes the storm.” However, a meteorologist with a deterministic view of the system could falsify this hypothesis by heating up the mercury to see if a height increase can stop the storm. Rather than looking for more variables, she is applying systematic inductive inference by devising a crucial experiment that discriminates alternative hypotheses of interest, the path of strong inference [4]. Thus, one would not want to confound the problem in the inference methods with that of the causal assumptions.

M&B further argue that the ubiquitous mutual causality in the brain, and complex systems in general (e.g. bidirectional influence between two nodes in a neural network), renders determinism logically impossible. We do not see how this conclusion follows. Staying within the classical world, Newton’s 3rd law says if A exerts a force on B, then B must also exert a force on A: if the earth is pulling the moon, the moon is also pulling the earth, which is being mutually causal. Nonetheless, most would view the system composed of earth and moon as deterministic, meaning that given an initial state of the system, the future trajectory is uniquely determined. Without that, it would be difficult to send people to the moon on a rocket. So this is logically possible and practically useful. In complex systems, mutual causality is a key to self-organization in sociological, physical, chemical, and biological systems. Both deterministic and stochastic views of causality, with their corresponding mathematical formalisms, contribute to the understanding of the dynamics of such systems [5].

M&B also argue that due to the high dimensionality of the brain, “even if the brain does employ deterministic influences,” it would be indistinguishable from a stochastic one. However, we should point out that converging evidence shows that some interactions can lead to low-dimensional collective dynamics [5–9]. Essentially, while interacting components (say at a micro-level) give rise to the emergence of collective patterns (at meso-level), the collective patterns can in turn enslave the behavior of those very same components (at micro-level). These two opposing forces result in so-called circular causality (Figure 1). Collective patterns are temporally assembled in order to accommodate certain functional demands, and their low-dimensional dynamics is invariant to the different specific membership configurations of the components. This nonlinear property of biological complex systems is called degeneracy [9], which is essential for the system to resist micro-level fluctuations and perturbations [10]. In the meantime, fluctuations or stochasticity can assist in the switching of patterns in order to adapt to new functional needs [11]. Now maybe we can ask: what is the nature of the causality that makes a pattern persist, and what is the nature of the causality that makes a pattern change? Wouldn’t it be nice to let them complement each other with regard to the function of the brain?

Figure 1. The causal loops of coordination dynamics. Control parameters, such as λ (which can be specific or non-specific), t (time), and Q (stochastic noise), lawfully influence the micro-level. Resulting component interactions may produce collective effects at the meso-level (upward causation) that in turn may modify component behavior (downward causation). In certain situations, such as the human dynamic clamp [12], collective variables act back on control parameters, invoking circular causality.

We humbly suggest that probabilistic and deterministic notions of causality each hold half of the puzzle of the function of the brain, rather than being mutually exclusive. How then can we make inferences about them? Experimentally, we can create tasks (functional demands) and apply perturbations that carry the system through different behavioral repertoires, the dynamics of which can be observed by measures of stability [5,6,9,13,14]. For modelling, as M&B agree, Granger causality and dynamic causal modeling are complementary [15] and can be used in tandem to understand the brain. One would also want to supplement these with techniques that treat essentially nonlinear complex systems [16]. Phase space [17] and projection methods with appropriate basis functions for nonlinear stochastic systems [18] may illuminate more of the features of the system and yield better predictions than linear statistical inference.

References:

[1] Mannino, M., & Bressler, S. L. (in press). Foundational perspectives on causality in large-scale brain networks. Physics of Life Reviews. http://doi.org/10.1016/j.plrev.2015.09.002

[2] Bell, J. S. (1990). Against measurement. Physics World (August), pp. 33-40.

[3] Schrödinger, E. (1989). Statistical thermodynamics. Courier Corporation.

[4] Platt, J. R. (1964). Strong inference. Science, 146(3642), 347-353.

[5] Haken, H. (2004). Synergetics: introduction and advanced topics. Berlin; New York: Springer.

[6] Kelso, J. A. S. (1995). Dynamic patterns: The self-organization of brain and behavior. MIT Press.

[7] Kelso, J. A. S. (2009). Synergies: Atoms of brain and behavior. Advances in Experimental Medicine and Biology, 629, 83-91. [Also in D. Sternad (Ed.), A multidisciplinary approach to motor control. Springer, Heidelberg.]

[8] Weiss, P. A. (1969). The living system: determinism stratified. In A. Koestler & J. R. Smythies (Eds.), Beyond Reductionism. Beacon Press, Boston.

[9] Kelso, J. A. S. (2012). Multistability and metastability: understanding dynamic coordination in the brain. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1591), 906-918.

[10] Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. A. (1984). Functionally specific articulatory cooperation following jaw perturbations during speech: evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance, 10(6), 812-832.

[11] Braun, J., & Mattia, M. (2010). Attractors and noise: twin drivers of decisions and multistability. Neuroimage, 52(3), 740-751.

[12] Dumas, G., DeGuzman, G. C., Tognoli, E., & Kelso, J. A. S. (2014). The Human Dynamic Clamp as a paradigm for social interaction. Proceedings of the National Academy of Sciences. http://www.pnas.org/cgi/doi/10.1073/pnas.1407486111

[13] Bressler, S. L., & Kelso, J. A. S. (2001). Cortical coordination dynamics and cognition. Trends in Cognitive Sciences, 5(1), 26-36.

[14] Kelso, J. A. S. (2014). The dynamic brain in action: Coordinative structures, criticality, and coordination dynamics. In D. Plenz & E. Niebur (Eds.), Criticality in Neural Systems, Wiley, pp. 67-104.

[15] Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23(2), 172-178.

[16] Miller, J. H., & Page, S. E. (2009). Complex adaptive systems: an introduction to computational models of social life. Princeton University Press.

[17] Kantz, H., & Schreiber, T. (2004). Nonlinear time series analysis. Cambridge University Press.

[18] Wiener, N. (1966). Nonlinear problems in random theory. Cambridge, Massachusetts, USA: The MIT Press.

What is the nature of causality in the brain? - Inherently probabilistic. Comment

on “Foundational Perspectives on Causality in Large-Scale Brain Networks” by

Mannino & Bressler

Mukesh Dhamala

Understanding cause-and-effect (causal) relations from observations concerns all sciences, including neuroscience. Appropriately defining causality and its nature, though, has been a topic of active discussion for philosophers and scientists for centuries. Although brain research, particularly functional neuroimaging research, is now moving rapidly beyond the identification of brain regional activations towards uncovering causal relations between regions, the nature of causality has not been thoroughly described and resolved. In the current review article [1], Mannino and Bressler take us on a beautiful journey into the history of the work on causality and make a well-reasoned argument that causality in the brain is inherently probabilistic. This notion is consistent with brain anatomy and function, and is also inclusive of deterministic cases of inputs leading to outputs in the brain.

A living brain is a complex dynamical system with many highly interconnected, interacting and self-organizing entities (neurons). The traditional notion of brain regions as information-processing units, with an input, a local-processing capability and an output, is too rigid and is not generally applicable throughout the brain. An ultimate response of a neuronal system in the brain is determined by feedforward, feedback and modulatory influences and is not always guaranteed [2]. Incoming inputs, even in the absence of neuronal outputs, can bring about other physiological changes such as local cerebral blood flow changes [3]. In view of the likelihood of an observable neuronal response in an ongoing stream of various inputs, probabilistic causality is more appropriate for describing neural influences. Granger causality measures in both time- and frequency-domains [4, 5], and with estimation approaches based on either autoregressive modeling [6] or spectral factorization [7, 8], are consistent with the probabilistic notion of causality. These measures have been applied to study causal relations in large-scale brain networks from a variety of brain function recordings, including local field potentials (LFPs), electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) [9, 10, 11], functional near-infrared imaging [12], and intracortical EEG [13].

This review [1] also makes an excellent distinction between Granger causality and dynamic causal modeling (DCM) [14], another widely used method to study causal relations among regions of large-scale networks. The original DCM [14] relies on deterministic models of distributed (hidden) neuronal responses to external inputs to the brain due to sensory stimulation or task performance. This version of DCM treats the brain as a deterministic dynamical system, and the causal relations obtained are based on neuronal input-to-output relations; the causal relations thus obtained are deterministic in nature. A recent version, usually referred to as stochastic DCM (sDCM) [15, 16], however, uses modeling of stochastic dynamical systems to describe the hidden variables generating the observed brain data in situations with no explicit tasks or sensory stimulation. This version of DCM can allow for a probabilistic interpretation of causality.

The human brain is considered to be one of the most complex systems in the universe. There are many mysteries to be solved, including the patterns of causal relations among brain regions during perception, cognition and behavior. This review [1] comes at the right time, when a major effort like the BRAIN Initiative is underway in an attempt to map the entire human brain. The perspectives on the nature of causality in the brain offered in the review are helpful for identifying and using the appropriate network activity analysis tools for these and other similar efforts.

References

[1] Mannino M, Bressler SL. (2015). Foundational Perspectives on Causality in Large-Scale Brain Networks. Physics of Life Reviews, doi:10.1016/j.plrev.2015.09.002, this issue.

[2] Douglas RJ, Martin KA. (2004). Neuronal circuits of the neocortex. Annu. Rev. Neurosci., 27, 419-51.

[3] Iadecola C. (2004). Neurovascular regulation in the normal brain and in Alzheimer’s disease. Nat. Rev. Neurosci., 5, 347-360.

[4] Granger CW. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424-438.

[5] Geweke J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304-313.

[6] Ding M, Chen Y, Bressler SL. (2006). Granger causality: Basic theory and application to neuroscience. In Schelter S, Winterhalder N, & Timmer J (Eds.), Handbook of Time Series Analysis.

[7] Dhamala M, Rangarajan G, Ding M. (2008). Estimating Granger causality from Fourier and wavelet transforms of time series data. Phys Rev Lett, 100, 018701, 1-4.

[8] Dhamala M, Rangarajan G, Ding M. (2008). Analyzing information flow in brain networks with nonparametric Granger causality. Neuroimage, 41, 354-362.

[9] Bressler SL, Seth AK. (2011). Wiener-Granger causality: a well established methodology. Neuroimage, 58(2), 323-329.

[10] Friston K, Moran R, Seth AK. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology, 23(2), 172-178.

[11] Seth AK, Barrett AB, Barnett L. (2015). Granger causality analysis in neuroscience and neuroimaging. J. Neurosci., 35(8), 3293-3297. doi: 10.1523/JNEUROSCI.4399-14.2015.

[12] Bajaj S, Drake D, Butler AJ, Dhamala M. (2014). Oscillatory motor network activity during rest and movement: an fNIR study. Frontiers in Systems Neuroscience, 8, 1. doi: 10.3389/fnsys.2014.00013

[13] Epstein CM, Adhikari B, Gross R, Willie J, Dhamala M. (2014). Application of high frequency Granger causality to analysis of epileptic seizures and surgical decision-making. Epilepsia, 55, 2038-2047.

[14] Friston KJ, Harrison L, Penny WD. (2003). Dynamic causal modelling. NeuroImage, 19, 1273-1302.

[15] Friston KJ, Li B, Daunizeau J, Stephan KE. (2011). Network discovery with DCM. NeuroImage, 56, 1202-1221.

[16] Daunizeau J, Stephan KE, Friston KJ. (2012). Stochastic dynamic causal modeling of fMRI data: should we care about neuronal noise? Neuroimage, 62(1), 464-481.

Critical perspectives on causality and inference in brain networks:

Allusions, illusions, solutions?

Comment on: “Foundational Perspectives on Causality in Large-Scale Brain Networks”

Vaibhav A. Diwadkar

Allusions: What does causality in the brain really refer to?

The human brain is an impossibly difficult cartographic landscape to map out. Within its convoluted and labyrinthine structure is folded a million years of phylogeny, somehow expressed in the ontogeny of the specific organism; an ontogeny that conceals the idiosyncratic effects of countless genes, and then the (perhaps) countably infinite effects of processes of the organism’s lifespan, resulting in remarkable heterogeneity [1, 2]. The physical brain itself is therefore a nearly un-decodable “time machine,” motivating more questions than frameworks for answering those questions: Why has evolution endowed it with the general structure that it possesses [3]? Is there regularity in macroscopic metrics of structure across species [4]? What are the most meaningful structural units in the brain: molecules, neurons, cortical columns or cortical maps [5]?

Remarkably, understanding the intricacies of structure is perhaps not even the most difficult aspect of understanding the human brain. In fact, as recently argued, a central issue lies in resolving the dialectic between structure and function: how does dynamic function arise from static (at the time scales at which human brain function is experimentally studied) brain structures [6]? In other words, if the mind is the brain “in action”, how does it arise?

Despite the countably finite number of published studies using in vivo and other neuroimaging techniques, Mannino and Bressler articulate an uncomfortable ontological truth: understanding causal interactions between brain structures is a non-trivial problem that cannot merely be surmounted by mountains of data [7]. After all, the brain concedes signals quite readily, a technical question that Angelo Mosso, remarkably enough, began to tackle in the late 19th century [8], and which is currently being addressed using multiple imaging modalities. The central question of interest in theoretical neuroscience is the discovery of hidden brain “states” from which these emergent signals arise [9], a problem that requires us to reverse engineer ourselves towards understanding “neural” interactions [10]. Where within these hidden states might we find causality, and what form might such causality assume?


“Causal” illusions: A brief history of causality

To highlight the complexity of this question, the authors elegantly outline a comprehensive narrative arc on the relatively recent history of “causality” itself.

They first discuss the notion of deterministic causality, enshrined as an emergent property of Newtonian physics, before then ultimately transitioning to current modifications of the notion of causality in the physical world, informed by discoveries in quantum mechanics and quantum entanglements. The journey is brisk, yet illuminating. David Hume’s empiricism is considered, an early approach at addressing the ontology of causality through perception and the regularity of events resulting in the subjective experience of causality, yet in many ways not material to the fundamentals of what the term really means. The authors then turn to Immanuel Kant’s explicitly psychological theory of causality as a category emerging from a priori representations that act to process empirical knowledge.

As the authors imply, the philosophical worlds of Hume and Kant emerged from Newton’s universe: mechanistic order could be modeled by mathematics, an inherently deterministic system. This philosophical world was doomed once the universe on which it was based needed substantive modification. The authors correctly credit Bertrand Russell with insights on the bases of causality that were far ahead of his scientific world, preceding the maturation of quantum mechanics as a fundamental theory of the physical world. In his classic 1913 paper, Russell dealt with causality not in terms of its psychological, but in terms of its ontological basis, suggesting the phrase “functional relations” as a substitute for “causal relationships” between events. In fact, the concept of “functional relations” is almost ideally suited to describing relationships between brain units. When the authors do extend the concept of causality to brain interactions, they invoke the term in a probabilistic (or quantum) sense. But before considering causality in large-scale brain networks, we must consider what “neural activity” is [11].

Far from being a single, simple construct with specific functional consequences, “neural activity” is not unitary. Rather, the brain is characterized by multiple classes of neural activity within and across its spatial scales, with these classes having distinct or overlapping functional correlates [12] organized in complex hierarchies [13]. Mannino and Bressler invoke three concepts crucial to understanding functional interactions between the brain’s constituents and to seeing why deterministic causality is a logical impossibility. First is the notion of mutual causality, a characteristic of brain units wherein (at least at measurable time scales) they exert near-contemporaneous effects on each other. The second is the notion of multiple (and therefore causally indeterminate) inputs to single units within the brain, an anatomical lattice that is characteristic of the brain’s known architecture. And the third is the fact that neurons themselves function as threshold units, such that their outputs (one class of “neural activity” alluded to earlier) may or may not reflect influences from other neuronal populations, leading to indeterminacy in the causal basis of brain signals (more on this below). These three considerations alone highlight the murkiness of thinking of causality between neural units, regardless of spatial scale. The world of the brain is not the world of Newtonian mechanics. Rather, it is a world of complex entanglements best explained within a framework of probabilistic rather than deterministic effects.

“Solutions”: Do formal computational solutions for understanding causality exist?

Implicit in Mannino’s and Bressler’s thesis is the idea that divining causality among constituents in large-scale brain networks is almost impossibly challenging. For while the notion of probabilistic causality is the only viable version of the causality construct, problems abound. For one thing (and not explicitly considered by the authors), the relationship between observed brain signals and the presumed “hidden” states that give rise to them may be close to indeterminate. The limitations of functional magnetic resonance imaging (fMRI) are highly illuminating in this regard [14]. Despite being the in vivo neuroimaging tool of overwhelming choice, current fMRI is blind to aspects of neural activity that we know to be exquisitely detailed: it cannot reliably characterize excitatory-inhibitory circuits within cortical layers, parse apart top-down and bottom-up influences, or distinguish between modulatory and function-specific effects on brain activity. Certainly, given the spatial and temporal scales at which data are currently collected from the brain, and the degeneracy of cognitive states and their origins in brain structure [6], it is not straightforward to anticipate frameworks for bridging these gaps in fMRI. In some ways, plausible solutions to inferring causality are downstream of these fundamental problems.

The authors note that multivariate autoregressive models within the framework of Wiener-Granger (WG) analysis seek to statistically quantify differences in the degree of mutual effects between network nodes, thus satisfying probabilistic approaches toward causality [15]. However, WG’s limitations as a tool for “network discovery” include, among other aspects, the assessment of only limited numbers of network nodes and the limited temporal resolution of classes of brain signals such as fMRI. WG’s limitations are in some ways complemented by dynamic causal modeling (DCM) [16, 17], a framework for assessing macroscopic network interactions based on biophysical models mapping cumulative neural responses into hemodynamic signals. While the classical form of DCM implicitly retains elements of the classical deterministic notion of causality, the inferential process is itself inherently probabilistic (Bayesian) [18], acknowledging that a statistical approach is essential for discovering likely generative models of neuroimaging data. Moreover, recent efforts at adding stochastic terms (random neuronal fluctuations) to DCM state equations [19] are an important concession to the challenges of determinism in the brain.
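The statistical character of the WG approach is easiest to see in a concrete sketch. The following minimal, illustrative implementation of pairwise Wiener-Granger influence compares the residual variance of an autoregressive model of one series with and without the other series’ past; the function name, the AR order, and the toy coupling values are invented for this illustration and are not taken from the commented paper.

```python
import numpy as np

def granger_xy(x, y, p=2):
    """Pairwise Wiener-Granger influence y -> x with AR order p.

    Compares the residual variance of an AR model of x on its own past
    ("restricted") with one that also includes y's past ("full").
    A clearly positive log-variance ratio indicates that y's history
    improves the prediction of x.
    """
    n = len(x)
    lag = lambda z: np.column_stack([z[p - k - 1:n - k - 1] for k in range(p)])
    X_r = lag(x)                              # x's own past only
    X_f = np.column_stack([lag(x), lag(y)])   # plus y's past
    target = x[p:]
    res_r = target - X_r @ np.linalg.lstsq(X_r, target, rcond=None)[0]
    res_f = target - X_f @ np.linalg.lstsq(X_f, target, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))

# Toy network with a single directed influence y -> x.
rng = np.random.default_rng(0)
T = 2000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.normal()

gc_yx = granger_xy(x, y)   # substantially positive: y's past helps predict x
gc_xy = granger_xy(y, x)   # near zero: no influence in this direction
```

Note that the comparison is purely probabilistic: nothing in the fit refers to the biophysical mechanism generating the series, which is precisely the contrast with generative approaches such as DCM.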

Yet, it is unclear whether WG, DCM, or any of a myriad of analytic approaches can make a real dent in the fundamental problem: can we reliably infer causality in large-scale brain networks? Mannino and Bressler are, in my opinion, wholly accurate in framing this as a question of which analytic approaches are consistent with probabilistic causality, rather than of which approaches are correct.

Neuroimaging is awash in data, and there is little doubt that it is by treating the brain as a complex system that we will begin to unravel its complexity in generating cognitive architectures [6, 20]. Yet, as Mannino and Bressler’s review suggests, if network dynamics can only be thought of as arising from probabilistic causality, then data alone will not suffice. Rather, what may be required is a chorus of analytic approaches infused by both computational and philosophical considerations. In recent history, the discipline of Physics has been characterized by perhaps three complete conceptual revolutions (Newton’s, Einstein’s, and Bohr’s). It is likely that the discipline of Neuroscience may need several. In this timely and material review, Mannino and Bressler artfully articulate this humbling thought.

REFERENCES

1. Gould, S.J., Ontogeny & Phylogeny. 1977, Cambridge, MA: Belknap (Harvard University Press).

2. Uylings, H.B., et al., Consequences of large interindividual variability for human brain atlases: converging macroscopical imaging and microscopical neuroanatomy. Anat Embryol (Berl), 2005. 210(5-6): p. 423-31.

3. Bastir, M., et al., Evolution of the base of the brain in highly encephalized human species. Nat Commun, 2011. 2: p. 588.

4. Zhang, K. and T.J. Sejnowski, A universal scaling law between gray matter and white matter of cerebral cortex. Proc Natl Acad Sci U S A, 2000. 97(10): p. 5621-6.

5. Amunts, K., et al., BigBrain: an ultrahigh-resolution 3D human brain model. Science, 2013. 340(6139): p. 1472-5.

6. Park, H.J. and K. Friston, Structural and functional brain networks: from connections to cognition. Science, 2013. 342(6158): p. 1238411.

7. Mannino, M. and S.L. Bressler, Foundational perspectives on causality in large-scale brain networks. Physics of Life Reviews, in press.

8. Sandrone, S., et al., Weighing brain activity with the balance: Angelo Mosso's original manuscripts come to light. Brain, 2014. 137(Pt 2): p. 621-33.

9. Friston, K.J., et al., Network discovery with DCM. Neuroimage, 2012. 56(3): p. 1202-21.

10. Stephan, K.E., On the role of general system theory for functional neuroimaging. J Anat, 2004. 205: p. 443-470.

11. Singh, K.D., Which "neural activity" do you mean? fMRI, MEG, oscillations and neurotransmitters. Neuroimage, 2012. 62(2): p. 1121-30.

12. Buzsaki, G. and E.W. Schomburg, What does gamma coherence tell us about inter-regional neural communication? Nat Neurosci, 2015. 18(4): p. 484-9.

13. Buzsaki, G., N. Logothetis, and W. Singer, Scaling brain size, keeping timing: evolutionary preservation of brain rhythms. Neuron, 2013. 80(3): p. 751-64.

14. Logothetis, N.K., What we can do and what we cannot do with fMRI. Nature, 2008. 453(7197): p. 869-78.

15. Bressler, S.L. and A.K. Seth, Wiener-Granger causality: a well established methodology. Neuroimage, 2011. 58(2): p. 323-9.

16. Stephan, K.E., et al., Dynamic causal models of neural system dynamics: current state and future extensions. J Biosci, 2007. 32(1): p. 129-44.

17. Friston, K.J., L. Harrison, and W. Penny, Dynamic causal modelling. Neuroimage, 2003. 19(4): p. 1273-302.

18. Stephan, K.E., et al., Bayesian model selection for group studies. Neuroimage, 2009. 46(4): p. 1004-17.

19. Daunizeau, J., K.E. Stephan, and K.J. Friston, Stochastic dynamic causal modelling of fMRI data: should we care about neural noise? Neuroimage, 2012. 62(1): p. 464-81.

20. Petersen, S.E. and O. Sporns, Brain Networks and Cognitive Architectures. Neuron, 2015. 88(1): p. 207-19.

The many levels of causal brain network discovery. Comment on “Foundational perspectives on causality in large-scale brain networks” by Mannino & Bressler.

Pedro A. Valdes-Sosa

Unravelling the dynamically changing networks of the brain is probably the single most important current task for the neurosciences. I wish to commend the authors on this refreshing and provocative paper, which not only recapitulates some of the longstanding philosophical difficulties involved in the analysis of causality in the sciences, but also summarizes current work on statistical methods for determining causal networks in the brain. I fully concur with several of the opinions defended by the authors:

• The most fruitful level of analysis for systems neuroscience is that of neural masses, each comprising thousands of neurons. This is what is known as the mesoscopic scale.

• The brain is a complex system with many interacting parts at this scale, many of which influence each other reciprocally and in a nonlinear fashion.

• Mesoscopic brain circuit analysis can only be carried out using a concept of causal relations that is probabilistic rather than deterministic. However, at this level of organization, quantum effects are probably negligible, placing causal analysis on less problematic terrain.

• Wiener-Granger Causality is a very useful tool for determining probabilistic causality, as has been demonstrated for some time by one of the authors [1].

The foundational problems of causality analysis in large-scale brain networks can be further clarified by considering the following points [2]:

1. A causal brain network can be formalized as a directed graph. Establishing the network therefore means specifying the (directed) links between neural masses (nodes) which express causal relations [3]. Importantly, these links should be mediated by axonal connections with no intermediate stations. This is a formal definition of the concept of effective connectivity [4]. Our objective is therefore to establish a graph expressing effective brain connectivity, which can, of course, vary in time.

2. Causal brain networks are always estimated from indirect and noisy experimental data. Therefore, we must work with state-space models which stipulate not only the state evolution equation (SEE, the dynamics and interactions of neural masses) but also the observation equation. The observation equation should never be ignored. Unfortunately, this is a common error, as when identification of a causal brain network is attempted from the scalp EEG while ignoring volume conduction effects and problems with the reference. Causal analysis in neuroscience is never complete if both equations are not completely described and accounted for.

3. The SEE contains the specification of the causal brain network. This specification can be generic: the linear autoregressive model expressed in the paper, or nonlinear generalizations [5]. Alternatively, it may be biophysically motivated and expressed as neural mass or neural field models [6]. Identifying causal brain networks is therefore partly an exercise in statistical estimation of the parameters of the SEE to identify a given directed graph. See point 6 below to understand why we use the term “partly”.

4. Realistic neural SEEs must always include random elements and therefore contain stochastic or random differential or integral equations. Limited information will necessitate the use of Bayesian statistics with adequate prior distributions [7], [8].

5. The absence of a link in the graph is the absence of a probabilistic causal relation between one neural mass and another. This is best expressed as lack of Wiener-Akaike-Granger-Schweder (WAGS) influence [2], [9]. This concept encompasses continuous/discrete time, as well as linear/nonlinear and generic/biophysical SEEs, and has recently been formulated with great generality [10]. The absence of WAGS influence may be tested using statistical tests [11], or, alternatively, the strength of that influence may be assessed with measures such as [12]. As the authors point out, some of these measures are actually equivalent (WAGS tests for linear Gaussian autoregressive SEEs and Transfer Entropy).

6. Current theory for determining causal graphs [13], [14] has made it clear that the theoretical underpinnings of probability theory are not sufficient for establishing causality. In addition, the operation of intervention must be introduced (the axiom for the “do” operator of Pearl, for example). This is why we state that the statistics of WAGS influence measures only “partly” contribute to the identification of brain causal networks. There is currently much work by statisticians to augment the statistical arsenal to include these concepts [3], [15], [16]. An attempt to craft a WAGS measure in this spirit can be found in [17].
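Point 2, on the necessity of the observation equation, can be illustrated with a small simulation. The sketch below (all parameter values are invented for illustration) generates a two-node stochastic SEE with a single directed link, observes it through an instantaneous mixing matrix standing in crudely for volume conduction, and shows that estimating the coefficients from the mixed sensors while ignoring the observation equation produces a spurious reverse link.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

# State evolution equation (SEE): two neural masses, directed link 1 -> 2 only.
A = np.array([[0.5, 0.0],
              [0.4, 0.5]])          # A[i, j]: influence of mass j on mass i
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(size=2)   # stochastic SEE

# Observation equation: each sensor sees an instantaneous mixture of the
# masses (a crude stand-in for volume conduction at the scalp).
M = np.array([[1.0, 0.5],
              [0.5, 1.0]])
y = x @ M.T

def var1_coeffs(z):
    """Least-squares estimate of the VAR(1) coefficient matrix of z."""
    B, *_ = np.linalg.lstsq(z[:-1], z[1:], rcond=None)
    return B.T

A_states = var1_coeffs(x)   # recovers A: entry [0, 1] (link 2 -> 1) is ~0
A_sensors = var1_coeffs(y)  # mixing induces a spurious nonzero [0, 1] entry
```

The sensor-level coefficient matrix is M A M⁻¹ rather than A, so a link that is absent in the SEE appears present at the sensors; this is exactly the error described above for scalp EEG.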

Having stated these points, it is easy to understand that discussions about “labelled” methods for estimating brain causal networks might benefit from further specification. The authors of the present paper follow others in comparing Wiener-Granger techniques with DCM, the former posited as a probabilistic method for assessing causality, the latter as deterministic. In the first place, both techniques are based on the concept of WAGS influence. The so-called Wiener-Granger methods are statistical tests of the discrete-time multivariate linear model, but, as stated before, this is not the only flavor of WAGS influence analysis. Additionally, there are many different varieties of DCM, including those that are stochastic [18]–[20]. It might be better to discuss families of techniques on the basis of the six points enumerated above.

The review paper [7] points out several unsolved problems in determining causal brain networks. I would like to end my brief comments by emphasizing a few of them. One is that current biophysical modeling increasingly recognizes that the SEE describes the spatio-temporal evolution of neural activity distributed over an extended spatial manifold (let us denote such a manifold as Β) [21]. Thus, instead of trying to identify simple graphs depicting balls with directed arrows between them, we are faced with finding meaningful subsets of the Cartesian product Β×Β. Very little work has been done on WAGS analysis for this type of structure [22], [23]. The problem of spatially extended causation brings up a more important issue related to the concepts of “thin” and “thick” causality proposed by Nancy Cartwright [24]. In her definition, placing directed links between neural nodes is “thin” causal analysis. Actually discovering how the brain transforms maps of neural activity from one structure to the other would be the “thick” counterpart. The importance of spatial patterns is underscored by recent interest in multivoxel pattern analysis [25], which is finding its way into WAGS [26].

The conceptual problems are compounded by our increasing capacity to generate data. Brain causal analysis is probably the most complex problem in the analysis of “Big Data” [27]. An example of grappling with the challenge of “big data” for WAGS has been presented in [28]. That paper argues that linear WAGS measures are essentially 3-dimensional tensors, with emitting neural structures, receiving structures, and temporal transmission as intrinsic dimensionalities. It is to be expected that such a fusion between novel data processing techniques and causality analysis will provide further stepping stones along the path traced out by the authors of the paper which we discuss.

References

[1] S. L. Bressler and A. K. Seth, “Wiener-Granger causality: a well established methodology,” Neuroimage, vol. 58, no. 2, pp. 323–329, 2011.

[2] P. A. Valdes-Sosa, A. Roebroeck, J. Daunizeau, and K. Friston, “Effective connectivity: influence, causality and biophysical modeling,” Neuroimage, vol. 58, no. 2, pp. 339–361, 2011.

[3] M. Eichler, “A graphical approach for evaluating effective connectivity in neural systems,” Philos. Trans. R. Soc. Lond. B Biol. Sci., vol. 360, no. 1457, pp. 953–967, May 2005.

[4] K. Friston, “Functional and effective connectivity: a review,” Brain Connect., vol. 1, no. 1, 2011.

[5] W. A. Freiwald, P. Valdes, J. Bosch, R. Biscay, J. C. Jimenez, L. M. Rodriguez, V. Rodriguez, A. K. Kreiter, and W. Singer, “Testing non-linearity and directedness of interactions between neural groups in the macaque inferotemporal cortex,” J. Neurosci. Methods, vol. 94, no. 1, pp. 105–119, Dec. 1999.

[6] P. A. Valdes-Sosa, J. M. Sanchez-Bornot, R. C. Sotero, Y. Iturria-Medina, Y. Aleman-Gomez, J. Bosch-Bayard, F. Carbonell, and T. Ozaki, “Model driven EEG/fMRI fusion of brain oscillations,” Hum. Brain Mapp., vol. 30, no. 9, pp. 2701–2721, Sep. 2009.

[7] P. A. Valdés-Sosa, J. M. Sánchez-Bornot, A. Lage-Castellanos, M. Vega-Hernández, J. Bosch-Bayard, L. Melie-García, and E. Canales-Rodríguez, “Estimating brain functional connectivity with sparse multivariate autoregression,” Philos. Trans. R. Soc. Lond. B Biol. Sci., vol. 360, no. 1457, pp. 969–981, May 2005.

[8] W. Tang, S. L. Bressler, C. M. Sylvester, G. L. Shulman, and M. Corbetta, “Measuring Granger causality between cortical regions from voxelwise fMRI BOLD signals with LASSO,” PLoS Comput. Biol., vol. 8, no. 5, p. e1002513, May 2012.

[9] O. O. Aalen and A. Frigessi, “What can statistics contribute to a causal understanding?,” Scand. J. Stat., vol. 34, no. 1, pp. 155–168, Mar. 2007.

[10] A. Gégout-Petit and D. Commenges, “A general definition of influence between stochastic processes,” Lifetime Data Anal., vol. 16, no. 1, pp. 33–44, Jan. 2010.

[11] J. F. Geweke, “Measures of conditional linear dependence and feedback between time series,” J. Am. Stat. Assoc., vol. 79, no. 388, pp. 907–915, 1984.

[12] D. Y. Takahashi, L. A. Baccalá, and K. Sameshima, “Frequency domain connectivity: an information theoretic perspective,” in Proc. 2010 Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC’10), 2010, pp. 1726–1729.

[13] J. Pearl, “Causal inference in statistics: an overview,” Stat. Surv., vol. 3, pp. 96–146, 2009.

[14] P. Spirtes, “Graphical models, causal inference, and econometric models,” J. Econ. Methodol., vol. 12, pp. 3–34, 2005.

[15] M. H. Maathuis and P. Nandy, “A review of some recent advances in causal inference,” pp. 1–23, Jun. 2015.

[16] J. Peters, P. Bühlmann, and N. Meinshausen, “Causal inference using invariant prediction: identification and confidence intervals,” Jan. 2015.

[17] R. D. Pascual-Marqui, R. J. Biscay, J. Bosch-Bayard, D. Lehmann, K. Kochi, T. Kinoshita, N. Yamada, and N. Sadato, “Assessing direct paths of intracortical causal information flow of oscillatory activity with the isolated effective coherence (iCoh),” Front. Hum. Neurosci., vol. 8, p. 448, 2014.

[18] J. Daunizeau, L. Lemieux, A. E. Vaudano, K. J. Friston, and K. E. Stephan, “An electrophysiological validation of stochastic DCM for fMRI,” Front. Comput. Neurosci., vol. 6, p. 103, Jan. 2012.

[19] K. J. Friston, A. Bastos, V. Litvak, K. E. Stephan, P. Fries, and R. J. Moran, “DCM for complex-valued data: cross-spectra, coherence and phase-delays,” Neuroimage, pp. 1–17, Jul. 2011.

[20] K. J. Friston, J. Kahan, B. Biswal, and A. Razi, “A DCM for resting state fMRI,” Neuroimage, pp. 1–12, 2013.

[21] K. E. Stephan, L. Kasper, L. M. Harrison, J. Daunizeau, H. E. M. den Ouden, M. Breakspear, and K. J. Friston, “Nonlinear dynamic causal models for fMRI,” Neuroimage, vol. 42, no. 2, pp. 649–662, Aug. 2008.

[22] P. A. Valdes-Sosa, “Spatio-temporal autoregressive models defined over brain manifolds,” Neuroinformatics, vol. 2, no. 2, pp. 239–250, 2004.

[23] P. A. Valdés-Sosa, J. M. Bornot-Sánchez, M. Vega-Hernández, L. Melie-García, A. Lage-Castellanos, E. Canales-Rodríguez, and M. Valdes-Sosa, “Granger causality on spatial manifolds: applications to neuroimaging,” in Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, Wiley-VCH, 2006, pp. 461–491.

[24] N. Cartwright, Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge University Press, 2010.

[25] F. Pereira and M. Botvinick, “Information mapping with pattern classifiers: a comparative study,” Neuroimage, vol. 56, pp. 476–496, May 2011.

[26] E. Kim, D.-S. Kim, F. Ahmad, and H. Park, “Pattern-based Granger causality mapping in fMRI,” Brain Connect., vol. 3, pp. 569–577, 2013.

[27] E. Bareinboim and J. Pearl, “Causal inference from big data: theoretical foundations and the data-fusion problem,” Proc. Natl. Acad. Sci., Dec. 2015.

[28] E. Karahan, P. A. Rojas-Lopez, M. L. Bringas-Vega, P. A. Valdes-Hernandez, and P. A. Valdes-Sosa, “Tensor analysis and fusion of multimodal brain images,” Proc. IEEE, vol. 103, no. 9, pp. 1531–1559, Sep. 2015.

Comment to Physics of Life Reviews

Comment on “Foundational perspectives on causality in large-scale brain networks” by Mannino and Bressler

Stochastic causality, criticality, and non-locality in brain networks

By Robert Kozma and Sanqing Hu

For millennia, causality has served as a powerful guiding principle in our understanding of natural processes, including the functioning of our body, mind, and brain. The target paper presents an impressive vista of the field of causality in brain networks, starting from philosophical issues, expanding on neuroscience effects, and addressing broad engineering and societal aspects as well. The authors conclude that the concept of stochastic causality is better suited to characterizing the experimentally observed complex dynamical processes in large-scale brain networks than the more traditional view of deterministic causality. We strongly support this conclusion and provide two additional examples that may enhance and complement this review: (i) a generalization of Wiener-Granger Causality (WGC) that better fits the complexity of brain networks; (ii) the employment of criticality as a key concept highly relevant to interpreting causality and non-locality in large-scale brain networks.

Spurious causality is a widely recognized shortcoming of many practical algorithms for analyzing causality. This problem can be significantly mitigated by the probabilistic approach to WGC, as proposed by Mannino and Bressler. They also point out the dominance of bidirectional pathways between cortical areas and the role of mutual causality. To address such issues, a new causality measure (NC) from Y to X has been proposed, which replaces the original WGC definition, based on the backwards recursive estimation of the autoregressive model parameters, with a more comprehensive evaluation describing the proportion that Y occupies among all possible contributions to X. NC is a natural extension of WGC. NC has been shown to be less susceptible to spurious causal effects, while being very efficient in revealing biologically relevant causal influences [3]. Recent successful applications of NC include the analysis of human EEG in motor imagery for efficient brain-computer interfaces [4], and the identification of evolving mutual flows of causal effects in the auditory cortex of gerbils during category learning and strategy formation [10, 5]. We propose that NC be added to the repertoire of possible tools for causality studies.
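As a rough illustration of the “proportion of contribution” idea behind NC, the sketch below fits a joint autoregressive model and reports the share of the summed squared right-hand-side terms contributed by the other series. This is a schematic measure in the spirit of NC only; the exact definition and normalization in [3] differ in detail, and the function name and toy parameters are invented here.

```python
import numpy as np

def contribution_proportion(x, y, p=2):
    """Schematic 'proportion of contribution' of y to x, in the spirit of
    the NC measure of Hu et al. [3] (illustration only; the published
    normalization differs in detail).

    Fits x_t = sum_i a_i x_{t-i} + sum_i b_i y_{t-i} + e_t and reports
    the share of the summed squared right-hand-side terms due to y's past.
    """
    n = len(x)
    lags_x = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
    lags_y = np.column_stack([y[p - k - 1:n - k - 1] for k in range(p)])
    design = np.column_stack([lags_x, lags_y])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    a, b = coef[:p], coef[p:]
    resid = x[p:] - design @ coef
    c_own = np.sum((lags_x @ a) ** 2)    # contribution of x's own past
    c_other = np.sum((lags_y @ b) ** 2)  # contribution of y's past
    c_noise = np.sum(resid ** 2)         # unexplained (noise) part
    return c_other / (c_own + c_other + c_noise)

# Toy example: unidirectional drive y -> x.
rng = np.random.default_rng(2)
T = 4000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()
    x[t] = 0.4 * x[t - 1] + 0.7 * y[t - 1] + rng.normal()

nc_yx = contribution_proportion(x, y)  # sizable share: y drives x
nc_xy = contribution_proportion(y, x)  # near-zero share: no reverse drive
```

By construction the measure lies in [0, 1], which is what makes a proportion-of-contribution reading more directly interpretable than an unbounded variance ratio.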

Criticality is a fundamental property of networks and graphs, and it is essential for describing the interrelationship between a network’s structure and its dynamics. In the past decade, criticality has gained popularity in characterizing neurodynamics in vitro and in vivo [1]. Arguably, brains can be described as open thermodynamic systems operating as immense neural networks, which are maintained near criticality through myriads of mutual feedback pathways among excitatory and inhibitory neural masses. The mathematical tools of random graph theory (RGT) and neuropercolation provide suitable means to describe brains as critical systems [7, 8].

In the context of the present commentary, we argue that criticality has important consequences for the interpretation of causality in brain networks. Experiments using advanced brain imaging techniques, including ECoG, fMRI, MEG, and EEG, indicate the presence of frequent transitions between coherent and non-coherent phases [2, 6]. The transition from non-coherent to coherent phases has been described as condensation, and the opposite process as evaporation, which can be interpreted by invoking the concepts of Bose-Einstein condensation and entanglement in Quantum Field Theory (QFT) [1]. Here we emphasize that RGT provides a complementary approach to QFT in describing criticality and phase transitions in the cortical neuropil medium.

Non-locality is naturally embodied in the anatomy of the cortical neuropil in the following sense. Neurons are dominantly local entities, as the overwhelming majority of synaptic connections are limited to the local neighborhood (dendritic arbor) of the neurons. However, long axonal connections reach cortical areas far away from the cell body. RGT describes non-locality through the relative role of long axons as a control parameter of cortical phase transitions. Additional critical parameters of neuropercolation are the relative size of inhibitory and excitatory populations, the threshold values characterizing the sensitivity of neural populations to input effects, and the initial level of background activity [9].

Neuropercolation incorporates Freeman’s principles of neurodynamics through a hierarchy of phase transitions [2, 8], including: (i) from zero-point convergence to sustained background activity; (ii) from sustained non-zero activity to sustained narrow-band oscillations; (iii) from narrow-band oscillations to broad-band oscillations (chaos); and (iv) intermittent synchronization-desynchronization transitions as part of the intentional action-perception cycle. The action-perception cycle is embodied in the repetitive sequence of phase transitions (condensations) at theta rates, which is especially prominent in olfaction, but whose presence is hypothesized in other sensory cortices as well. It would be very interesting to see how the repetitive sequence of phase transitions in the action-perception cycle may supersede the long-discredited regularity theory of causality, replacing its rigid, deterministic periodicity assumption with a living and pulsing, stochastic/chaotic synchronization-desynchronization paradigm. Under the highly complex conditions of spatio-temporal neurodynamics in large-scale brain networks, the concept of circular causality may be more suitable than that of linear causality. Critical phase transitions can provide the inherent mechanism maintaining the circle of causality during the cognitive process via intermittent large-scale coherence events across the hemisphere, as neural correlates of higher cognition and consciousness. New experimental and theoretical results are likely to lead to further breakthroughs in studying causality in brain networks in the years ahead.

Acknowledgments: This work has been supported in part by NSF CRCNS Grant DMS-13-11165 on “US-German Collaboration on Strategy Change in Cognitive Biological and Technical Systems.”

References:

1. Capolupo, A., W.J. Freeman, G. Vitiello (2013) “Dissipation of ‘dark energy’ by cortex in knowledge retrieval,” Phys Life Rev, 10(1), 85–94.

2. Freeman, W.J., R. Quian Quiroga (2013) Imaging Brain Function with EEG: Advanced Temporal and Spatial Imaging of Electroencephalographic Signals, Springer, New York.

3. Hu, S., G. Dai, G. Worrell, Q. Dai, and H. Liang (2011) “Causality analysis of neural connectivity: critical examination of existing methods and advances of new methods,” IEEE Trans Neural Networks, 22(6), 829–844.

4. Hu, S., Wang, H., Zhang, J., Kong, W., Cao, Y., Kozma, R. (2015) “Comparison analysis: Granger causality and new causality and their applications to motor imagery,” IEEE Trans. Neur. Netw. & Learning Syst. (in press).

5. Hu, S., Sokolov, Y., Kozma, R., Ohl, F., Schultz, A., Wanger, T. (2015) “Causality flow in auditory cortex during gerbil category learning” (in progress).

6. Kozma, R., J.J.J. Davis, W.J. Freeman (2012) “Synchronization of de-synchronization events demonstrate large-scale cortical singularities as hallmarks of higher cognitive activity,” J. Neuroscience and Neuro-Engineering, 1(1), 13–23.

7. Kozma, R., M. Puljic, W.J. Freeman (2014) “Thermodynamic model of criticality in the cortex based on EEG/ECoG,” in D. Plenz, E. Niebur, Eds., Criticality in Neural Systems, Wiley-VCH, Weinheim, Germany, pp. 153–176.

8. Kozma, R., W.J. Freeman (2015) Cognitive Phase Transitions in the Cerebral Cortex - Enhancing the Neuron Doctrine by Modeling Neural Fields, Springer Verlag, Heidelberg, ISBN 978-3-319-24404-4.

9. Kozma, R., Puljic, M. (2015) “Random graph theory and neuropercolation for modeling brain oscillations at criticality,” Current Opinion in Neurobiology, 31, 181–188.

10. Ohl, F. (2015) “On the creation of meaning in the brain - cortical neurodynamics during category learning,” in R. Kozma, W.J. Freeman, Cognitive Phase Transitions in the Cerebral Cortex - Enhancing the Neuron Doctrine by Modeling Neural Fields, Springer Verlag, Heidelberg, ISBN 978-3-319-24404-4, pp. 147–159.

Causal influence in neural systems: reconciling mechanistic-reductionist and statistical perspectives. Comment on “Foundational perspectives on causality in large-scale brain networks” by Mannino & Bressler

John D. Griffiths

The modern understanding of the brain as a large, complex network of interacting elements is a natural consequence of the Neuron Doctrine [1,2], and has been bolstered in recent years by the tools and concepts of connectomics. In this abstracted, network-centric view, the essence of neural and cognitive function derives from the flows of activity and information between network elements - or, more generally, of causal influence. The appropriate characterization of causality in neural systems is therefore a question at the very heart of systems neuroscience.

Accordingly, the past two decades have seen a substantial amount of neuroscientific research concerned with the development and application of analysis methodologies for estimating causal relations in neural systems from various kinds of neurobiological recordings. These may be grouped into two broad classes: phenomenological approaches and physiological approaches1.

Phenomenological approaches, such as time- and frequency-domain Granger causality (GC; [3–5]), directed transfer function [6], partial directed coherence [7], structural equation modelling [8, 9], and Bayesian networks [10, 11], seek to describe relationships between observed neurobiological variables directly, generally using off-the-shelf algorithms developed in the fields of machine learning and applied statistics. Physiological approaches, in contrast, begin with a (typically relatively coarse-grained) biophysical generative model of the system in question, and seek to estimate parameters of that model using the measured data [12–20]. Causal influence is not so much the raison d’être of these models; rather, as noted by Friston et al. [21], it is inherent in their mathematical formulation: fluctuations in (hidden) state variables such as the mean firing rate of a neuronal subpopulation induce changes in the state variables of other brain regions such as average post-synaptic membrane potentials, as specified ultimately by the model's equations of motion. The strengths of these causal influences between neuronal subpopulations are determined by synaptic gain or coupling strength parameters, and are typically amongst the key parameters of interest that are optimized when fitting the model to experimental data.
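To make the phenomenological class concrete, time-domain GC can be computed by comparing the residual variance of an autoregressive model of the target signal with and without the putative source's past. The following is a minimal illustrative sketch (plain NumPy; the function name and the toy coupled-AR example are our own, not drawn from the commentary):

```python
import numpy as np

def granger_causality(x, y, p=2):
    """Time-domain Granger causality from y to x at model order p:
    GC(y -> x) = ln(var(restricted residuals) / var(full residuals)),
    i.e. how much the past of y improves the prediction of x."""
    n = len(x)
    target = x[p:]
    # Matrix of lagged values: column k holds lag k+1 of the signal.
    lags = lambda s: np.column_stack([s[p - k - 1:n - k - 1] for k in range(p)])

    def resid_var(design):
        design = np.column_stack([np.ones(len(target)), design])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    restricted = resid_var(lags(x))                    # past of x only
    full = resid_var(np.hstack([lags(x), lags(y)]))    # past of x and y
    return np.log(restricted / full)

# Toy example: y drives x with a one-sample delay; x does not drive y.
rng = np.random.default_rng(0)
T = 2000
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.4 * x[t - 1] + 0.5 * y[t - 1] + rng.standard_normal()

gc_y_to_x = granger_causality(x, y)   # clearly positive
gc_x_to_y = granger_causality(y, x)   # near zero
```

In the toy example the estimator recovers the simulated direction of influence: the y-to-x value is substantial while the reverse stays near zero.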

The proposal by Mannino & Bressler (this issue) to regard causal relationships in large-scale brain networks within the framework of probabilistic causality (PC) resonates well with the statistical philosophies motivating the various phenomenological approaches to causal connectivity outlined above. Causal weights - in terms of, e.g., (conditional) probabilities, regression coefficients, and derivative quantities such as GC - all lend themselves naturally (and indeed, often preferentially) to probabilistic interpretations. It is less clear, however, how and whether physiologically-based approaches to modelling causal influences in brain networks might fit into the PC schema. In these models, connectivity strengths are generally taken to represent underlying physiological quantities such as synaptic densities, which connect the source and target populations through a mechanistic and (in principle) reductionist, rather than a statistical, relationship2. Indeed, it has been argued [21] that the key advantage and difference of physiological approaches over phenomenological approaches is that the former may be used to test mechanistic hypotheses, possibly arrived at through initial exploratory phenomenological analyses.

It should be noted that, whilst the emphasis thus far has been primarily from an empirical, data-analytic perspective, the questions raised go much deeper. This is because the models that inform the physiological approaches discussed above are not only statistical data analysis tools, but also constitute some of the key building blocks in several mathematical theories of brain function [22–24]. A truly foundational concept of causal influence in neural systems should apply equally to both theoretical structures and to data analysis procedures.

1 This distinction is one that is often made in comparisons of so-called 'functional' and 'effective' connectivity [15, 26], where the latter is often equated with one specific family of techniques, dynamic causal modelling (DCM; [14]). The terminology used here is preferred as it is both more descriptive and more generic.

2 This is essentially a neuro version of the classical Russellian objection to causal interpretations in physics.

There are a number of ways in which Mannino & Bressler might address this challenge to their PC thesis. One may be to simply interpret connection strengths in physiological models as probabilistic by fiat, perhaps by appealing to the argument that the underlying phenomena are themselves inherently probabilistic, even if our necessarily over-simplified mathematical representations of them are not. This of course sidesteps the issue entirely, and would only really be adequate if the problem is not considered to be a serious one at either an applied or foundational level. More compelling would be to develop a formulation of PC that incorporated more directly the mechanistic causal relationships inherent in the physiological approach3. A third alternative could be to abstain entirely from PC interpretations of physiological model parameters, and highlight instead the difference between mechanistic-physical causal relationships and probabilistic causal relationships. The resultant plurality of causal influence ontologies may be less elegant than a single all-encompassing framework, but is not without precedent (see e.g. Cartwright [25]). Seth and colleagues [5] also argue for a distinction between the causal interpretations offered by one family of physiological models (DCM) and GC, suggesting that the two approaches “ask and answer fundamentally different questions, so that choosing one or the other (or both) depends on whether one is interested in describing the data in terms of information flow (GC) or exposing the underlying physical-causal mechanism (DCM).”

To conclude, an admirable feature of the target article is its synthesis of traditional and modern philosophical treatments of causality, together with current statistical and neurobiological theory and practice. Like most interesting ideas, the authors’ characterization of interactions in large-scale brain networks within the framework of PC raises many questions, but with further refinement promises to provide a compelling foundational perspective on the functional organization of neural systems.

References

[1] T. H. Bullock, M. V. L. Bennett, D. Johnston, R. Josephson, E. Marder, and R. D. Fields, “Neuroscience. The neuron doctrine, redux,” Science, vol. 310, no. 5749, pp. 791–793, 2005.

[2] G. M. Shepherd, Foundations of the Neuron Doctrine. New York: Oxford University Press, 1991.

[3] M. Ding, S. L. Bressler, W. Yang, and H. Liang, “Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: data preprocessing, model validation, and variability assessment,” Biol. Cybern., vol. 83, no. 1, pp. 35–45, 2000.

[4] A. Brovelli, M. Ding, A. Ledberg, Y. Chen, R. Nakamura, and S. L. Bressler, “Beta oscillations in a large-scale sensorimotor cortical network: Directional influences revealed by Granger causality,” Proc. Natl. Acad. Sci. U. S. A., vol. 101, no. 26, pp. 9849–9854, 2004.

[5] A. K. Seth, A. B. Barrett, and L. Barnett, “Granger Causality Analysis in Neuroscience and Neuroimaging,” J. Neurosci., vol. 35, no. 8, pp. 3293–3297, 2015.

[6] M. J. Kamiński and K. J. Blinowska, “A new method of the description of the information flow in the brain structures,” Biol. Cybern., vol. 65, no. 3, pp. 203–210, 1991.

[7] L. A. Baccalá and K. Sameshima, “Partial directed coherence: a new concept in neural structure determination,” Biol. Cybern., vol. 84, no. 6, pp. 463–474, 2001.

3 It may be noted in connection with this that one of Salmon's chief concerns regarding the PC models of Reichenbach, Good, and Suppes is their “attempt to carry out the construction of causal relations on the basis of probabilistic relations among discrete events, without taking account of the physical connections among them’” [27; pg. 66]

[8] A. R. McIntosh and F. Gonzalez-Lima, “Structural equation modeling and its application to network analysis in functional brain imaging,” Hum. Brain Mapp., vol. 2, no. 1–2, pp. 2–22, 1994.

[9] W. D. Penny, K. E. Stephan, A. Mechelli, and K. J. Friston, “Modelling functional integration: a comparison of structural equation and dynamic causal models,” Neuroimage, vol. 23, pp. S264–S274, 2004.

[10] S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. Nichols, J. D. Ramsey, and M. W. Woolrich, “Network modelling methods for FMRI,” Neuroimage, vol. 54, no. 2, pp. 875–891, 2011.

[11] J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack, and C. Glymour, “Six problems for causal inference from fMRI,” Neuroimage, vol. 49, no. 2, pp. 1545–1558, 2010.

[12] P. A. Robinson, C. J. Rennie, D. L. Rowe, and S. C. O’Connor, “Estimation of multiscale neurophysiologic parameters by electroencephalographic means,” Hum. Brain Mapp., vol. 23, no. 1, pp. 53–72, 2004.

[13] C. C. Kerr, C. J. Rennie, and P. A. Robinson, “Physiology-based modeling of cortical auditory evoked potentials,” Biol. Cybern., vol. 98, no. 2, pp. 171–184, 2008.

[14] K. J. Friston, L. Harrison, and W. Penny, “Dynamic causal modelling,” Neuroimage, vol. 19, no. 4, pp. 1273–1302, 2003.

[15] O. David, S. J. Kiebel, L. M. Harrison, J. Mattout, J. M. Kilner, and K. J. Friston, “Dynamic causal modeling of evoked responses in EEG and MEG,” Neuroimage, vol. 30, no. 4, pp. 1255–72, May 2006.

[16] D. A. Pinotsis, R. J. Moran, and K. J. Friston, “Dynamic causal modeling with neural fields,” Neuroimage, vol. 59, no. 2, pp. 1261–74, Jan. 2012.

[17] R. J. Moran, S. J. Kiebel, K. E. Stephan, R. B. Reilly, J. Daunizeau, and K. J. Friston, “A neural mass model of spectral responses in electrophysiology,” Neuroimage, vol. 37, no. 3, pp. 706–720, 2007.

[18] P. A. Valdes, J. C. Jimenez, J. Riera, R. Biscay, and T. Ozaki, “Nonlinear EEG analysis based on a neural mass model,” Biol. Cybern., vol. 81, no. 5–6, pp. 415–424, 1999.

[19] P. A. Valdes-Sosa, J. M. Sanchez-Bornot, R. C. Sotero, Y. Iturria-Medina, Y. Aleman-Gomez, J. Bosch-Bayard, F. Carbonell, and T. Ozaki, “Model driven EEG/fMRI fusion of brain oscillations,” Hum. Brain Mapp., vol. 30, no. 9, pp. 2701–21, Sep. 2009.

[20] P. Aram, D. R. Freestone, M. J. Cook, V. Kadirkamanathan, and D. B. Grayden, “Model-based estimation of intra-cortical connectivity using electrophysiological data,” Neuroimage, vol. 118, pp. 563–575, 2015.

[21] K. Friston, R. Moran, and A. K. Seth, “Analysing connectivity with Granger causality and dynamic causal modelling,” Curr. Opin. Neurobiol., vol. 23, no. 2, pp. 172–8, Apr. 2013.

[22] G. Deco, V. K. Jirsa, P. A. Robinson, M. Breakspear, and K. Friston, “The Dynamic Brain: From Spiking Neurons to Neural Masses and Cortical Fields,” PLoS Comput. Biol., vol. 4, no. 8, p. e1000092, 2008.

[23] P. L. Nunez, “Toward a quantitative description of large-scale neocortical dynamic function and EEG,” Behav. Brain Sci., vol. 23, pp. 371–398, 2000.

[24] P. A. Robinson, C. J. Rennie, D. L. Rowe, S. C. O’Connor, and E. Gordon, “Multiscale brain modelling,” Philos. Trans. R. Soc. B Biol. Sci., vol. 360, no. 1457, pp. 1043–1050, 2005.

[25] N. Cartwright, “Where is the theory in our ‘theories’ of causality?,” J. Philos., vol. 103, no. 2, pp. 55–66, 2006.

[26] K. Friston, “Causal Modelling and Brain Connectivity in Functional Magnetic Resonance Imaging,” PLoS Biol., vol. 7, no. 2, p. e33, 2009.

[27] W. C. Salmon, “Probabilistic Causality,” Pacific Philos. Quarterly, vol. 61, pp. 50–74, 1980.

Reply to Comments on “Foundational Perspectives on Causality in Large-Scale Brain Networks”

Michael Mannino, Steven L. Bressler

We thank all the commentators on our paper, whose expertise and insight have proved invaluable for refining our thoughts and for future considerations of the concepts discussed in our paper. Overall, the comments represent a variety of viewpoints, each coming from an individual niche of knowledge and presented in a constructive manner. Moreover, we thank the editor of this journal for allowing us the chance to respond publicly to (public) comments on our ideas.

Our source paper attempts to provide a foundational framework for causality in the brain, considered as a complex system. The term causality (or causation) is widely used in cognitive and computational neuroscience, and such a deeply-rooted philosophical term requires a full, rigorous, conceptual analysis.

As Griffiths [4] correctly states, “The appropriate characterization of causality in neural systems, therefore, is a question at the very heart of systems neuroscience.” Moreover, in terms of large-scale neurocognitive networks, given that nodes or regions may influence one another in a variety of ways, we find it appropriate to ask: what does “causal influence” mean in this context? Thus, our paper tries to define the epistemic limitations and ontological suppositions about causality in the brain. Finally, in our paper, we wrestle with an important question: is the human brain a deterministic or nondeterministic system?

First, Pessoa and Najafi [1] take an interesting, broad-minded approach to causality in the brain, considered as a complex system. We have no argument with their preferred term “complex system causality” to replace our “probabilistic causality (PC)”, as long as it is clear that complex systems causality is, in fact, probabilistic. Although they confirm our overall claim that classical conceptions of causality fail when considering the brain’s structural and functional networks, Pessoa and Najafi are largely concerned with disentangling the contributions of different systems, such as those for cognition and emotion. We agree that the issue of decomposability is essential for understanding brain function. In many complex systems, a large number of parts interact in ways that do not allow for any one part to be analyzed in isolation, as in their coupled billiard ball example.

As they state, “…simple ways of reasoning about causation are inadequate when unravelling the workings of a complex system such as the brain.” They go on to suggest that removing the focus on causation as explanation in neuroscience is the appropriate tactic in this case. In the context of dynamic brain networks, their suggestion is to offer a mathematical formalism that describes the multivariate covariance structure of brain data, of which one particular example is the Bayesian Dynamic Covariance Model. This model considers the covariance between pairs of brain regions, and includes a previously missing component, a time-varying matrix, which allows for a temporal regression. We agree that this model may give considerable insight into brain networks. However, it does not explicitly address the critical issue of causal influence in the brain. We question the reliance on the covariance (or correlation) structure of neural data, and propose that the causal influence structure of the data is more informative and neurobiologically realistic.
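For illustration only, a sliding-window estimate conveys the flavor of a time-varying covariance structure. The toy sketch below is our own crude stand-in, not the Bayesian Dynamic Covariance Model itself:

```python
import numpy as np

def sliding_cov(data, win=50):
    """One covariance matrix per non-overlapping window of a (T, n)
    multivariate recording -- a crude stand-in for a time-varying
    covariance structure (NOT the Bayesian Dynamic Covariance Model)."""
    T = data.shape[0]
    return np.stack([np.cov(data[t:t + win].T)
                     for t in range(0, T - win + 1, win)])

# Two toy "regions" whose coupling switches on halfway through.
rng = np.random.default_rng(2)
a = rng.standard_normal(400)
b = rng.standard_normal(400)
b[200:] = a[200:] + 0.3 * rng.standard_normal(200)  # coupling turns on
covs = sliding_cov(np.column_stack([a, b]))         # shape (8, 2, 2)
# Off-diagonal entries sit near 0 in early windows and rise once
# the coupling appears -- a change a single static covariance misses.
```

The point of the sketch is only that a static covariance matrix averages away exactly the kind of temporal change that the time-varying matrix in Pessoa and Najafi's proposal is designed to capture.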

Tang [2] underscores our point that Wiener-Granger Causality (WGC), and similar methods, work by quantifying PC, which offers a theoretical framework for explaining why these methods are successful at measuring functional connectivity. Furthermore, along the lines of Barnett and Seth [3], Tang suggests that state space modeling may be a useful addition to WGC for characterizing PC in hidden states. This is an interesting suggestion that nicely extends the notion of PC to hidden states.

Griffiths [4] makes some astute (and well-received) observations that justify our project, as well as a challenge to our thesis. First, Griffiths points out that our proposal “resonates well” with the probabilistic notion of causality, especially with what he calls phenomenological approaches, which rely on directly observed variables. He stresses, however, that these approaches are inherently different from others, such as DCM, which he labels physiological.

Physiological approaches are not causality-centric in that they do not infer causal relations, but rather assume them. Griffiths importantly proposes that the PC formalism be expanded to include the “mechanistic” causal relations implicit in the physiological approach. We agree that such an expansion would be an interesting development, and could be theoretically possible. In fact, we consider the field of causal modeling, particularly the work of Judea Pearl [5, 6], as providing this kind of development. Briefly, we follow Hitchcock [7] in suggesting that Pearl’s Bayesian approach to causal modeling can be used to infer underlying causal structure by “using information about probabilistic correlations.”

Although we cannot provide an in-depth analysis of the causal modeling approach in this short Reply, we note that Pearl’s structural causal model consists of a set of structural equations, where each equation represents a mechanism or “law” working in the world. Formally, it consists of an ordered triple ⟨U, V, R⟩, where U is a set of exogenous variables, V is a set of endogenous variables, and R is the set of structural equations which govern the relationships of these variables. Causal graphs or diagrams, including directed acyclic graphs and causal loop diagrams, usually accompany the model to visually represent these relationships. We believe that, in the future, Pearl’s approach may offer the appropriate expansion that Griffiths rightly asks for.

Already, White et al. [8] have linked Pearl’s work to WGC.
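To fix ideas, Pearl's triple can be written down for a hypothetical three-variable chain (the variables, probabilities, and function below are our own illustrative inventions): exogenous noise U drives endogenous variables V through structural equations R, and an intervention do(Y = v) simply replaces Y's equation:

```python
import random

def sample_scm(do_y=None, n=20000, seed=0):
    """Toy structural causal model <U, V, R> with chain X -> Y -> Z.
    Each structural equation maps exogenous noise (U) and parent values
    to an endogenous variable (V); do(Y = v) replaces Y's equation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u_x, u_y, u_z = rng.random(), rng.random(), rng.random()
        x = u_x < 0.5                        # X := f_X(u_x)
        if do_y is None:
            y = x if u_y < 0.9 else not x    # Y := f_Y(X, u_y)
        else:
            y = do_y                         # intervention: Y := v
        z = y if u_z < 0.8 else not y        # Z := f_Z(Y, u_z)
        hits += z
    return hits / n

p_obs = sample_scm()            # P(Z) by observation, about 0.5
p_do = sample_scm(do_y=True)    # P(Z | do(Y = True)), about 0.8
```

Observationally P(Z) sits near 0.5, while under do(Y = True) it moves to about 0.8: the intervention exposes causal structure that the observational distribution alone leaves underdetermined, which is precisely how Pearl's framework infers causes from probabilistic information.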

Kozma and Hu [9] take a different approach in their commentary. They state that they strongly support our claims, and offer two additional examples which give further support to our notion of causality in the brain. First, they acknowledge our claim that PC is better able to handle issues of spurious causality when analyzing neurobiological data, and they extend our introduction of PC by introducing what they refer to as New Causality (NC) [10]. They claim that this new concept of causality is “less susceptible to spurious causal effects” and has already been applied to human EEG data. We agree that NC may indeed be a practical extension of PC. Second, Kozma and Hu consider the mathematical theory of neuropercolation to support PC, because it describes the brain in terms of self-organizing criticality, in which neural networks follow phase transitions from coherent to non-coherent phases. Although neurons are overwhelmingly local entities, they can also exert non-local effects due to axonal projection. Interestingly, and partly because of these phase transitions, neuropercolation “incorporates Freeman’s principles of neurodynamics”. We agree that, in the context of complex systems, Freeman’s neurodynamical theory fits well with our notion of PC, and phase transitions are important for understanding brain network function. We speculate that neural phase transitions represent a change from one coordination state to another (Bressler & Kelso 2015), and that probabilistic causal relations are exerted as brain areas interact within a coordination state. Finally, Kozma and Hu suggest that the concept of circular causality is more suitable than linear causality in large-scale brain networks. We agree that a substitute for linear causality is called for in complex systems. However, we prefer the term mutual causality to signify that interacting neuronal populations influence one another at the same time, instead of sequentially in a cyclic fashion.

Zhang et al. [12] make several intriguing and somewhat controversial points, which we view as requiring further inquiry, but also nicely complementing our claims. First, they agree with our overall project: to provide a conceptual examination of causal influence in the brain. However, they point out that we assert two possibly contradictory claims; namely, that causality in the brain is probabilistic, and that we are undecided about whether brain function is inherently deterministic or stochastic. They distinguish between how the brain is, and how it is measured: an important point highlighting possible epistemic limitations. However, we argue that our two claims are entirely consistent because they are pitched at different levels of organization. We claim that causality in the brain is probabilistic at the macroscopic (large) scale, but we are agnostic on the question about how the brain operates at the microscopic scale.

Thus, even if it turns out that microscopic neurophysiological mechanisms are deterministic, we hold that macroscopic interactions between brain areas are probabilistic. We assert that the brain can operate simultaneously as both a deterministic and stochastic system, at different spatiotemporal scales. This is possible in complex systems because of the phenomenon of emergence [13, 14].

Consider the following argument by analogy, suggesting that systems can operate both deterministically and probabilistically at different scales: when a keystroke is made in a Word document on a von Neumann computer, say “E” is pressed, ceteris paribus, an E will appear in the document, even if it is possible for the electrons moving in the CPU to be in a superposition of spin states before measurement, and to exist in a cloud of Born probability. Moreover, concerning this quantum issue, as we point out in our target article, Born’s rule only offers the probability of measurement. David Bohm interprets quantum uncertainty as demonstrating that nondeterminism is inherent in the very structure of matter. Thus, depending on one’s interpretation of quantum mechanics, of which several exist, determinism may exist macroscopically while nondeterminism rules at the microscopic level. Likewise, depending on one’s interpretation of probability, of which there are several, the brain may be microscopically deterministic while macroscopically stochastic.

Zhang et al. also state that we invalidly draw a conclusion that mutual causality makes causal determinism impossible. They give the example of the Earth-Moon gravitational system, which is, by all accounts, a deterministic system, yet exhibits mutual causal attraction. We would argue that the problem with this example is that Zhang et al. are “staying within a classical world.” By contrast, as complexity science has shown by extending Newtonian physics to reveal system properties that cannot be treated classically, the brain, as a complex system, exhibits properties that may indeed make determinism impossible at some levels.

Finally, Zhang et al. contend that, in considering the high-dimensionality of the brain, there is evidence that some dynamics of the brain can be described by low-dimensional collective dynamics. We wholeheartedly agree with this claim. In fact, we offer the example of work by Stefanescu and Jirsa [15], who have reduced the dimensionality of a neural population model by mode decomposition techniques. Zhang et al. further contend that there can be emergence of dynamics at one level in the brain from dynamics at a lower level, and the higher-level dynamics can enslave the lower-level dynamics. Again, we concur with this point, and agree that enslaving may allow a low-dimensional reduction of high-dimensional large-scale brain network dynamics.

Dhamala [16] largely agrees with our main claims: 1) that an appropriate foundation for the use of causality in the brain as a complex system is not only timely, but necessary; and 2) that this foundation ought to be considered as probabilistic. Moreover, he rightly points out that DCM might be expanded to include stochasticity.

Diwadkar [17] eloquently writes that our paper alludes to what he calls “an uncomfortable ontological truth”: that “understanding causal interactions between brain structures is a non-trivial problem that cannot merely be surmounted by mountains of data.” We wholeheartedly agree. In our article, we proffer a solution, for which Diwadkar offers an insightful appraisal. Interestingly, he points out that one of the most important aspects of modern theoretical neuroscience is the attempt to uncover the brain’s hidden states which produce observed signals. These hidden states, we would argue, are governed by laws of causality which do not conform to classical forms. However, we admit that our conceptual solution, PC, still does not solve all the methodological problems of uncovering these hidden states. Diwadkar takes our thesis further than we do by pointing out that “…the relationship between observed brain signals and the presumed ‘hidden’ states that give rise to them may be close to indeterminate.” Part of this indeterminacy is due to limitations of current neuroimaging modalities, some of which we consider in the target article. Thus, we agree with Diwadkar that the fundamental problem of inferring causality in large-scale networks has both methodological and epistemological sources. We both believe that the best approach to understanding causality in the brain is to frame the problem in terms of those analytical methods that are consistent with PC, rather than those that may be deemed to be “correct”.

Valdes-Sosa [18] presents an informative commentary, highlighting the difficulty and the paramount importance of identifying causal brain networks. As he observes, these networks dynamically change their functional connectivity, further emphasizing the need to precisely define causality in the brain; and so it is imperative that a foundational analysis of causality be provided. Valdes-Sosa artfully clarifies the claims in the target paper by introducing six key points, to all of which we give our assent. Among these, he points out that causal brain networks can be modeled by directed graphs, comprised of links, representing axonal pathways and synaptic connections, which connect nodes, representing neural masses. Since these networks can only be “estimated from indirect and noisy experimental data”, it is necessary to construct a state-space model, which specifies both a state evolution equation (SEE), expressing the dynamical, causal interactions between the nodes, and an observation equation, expressing the way that the observed data are derived from their sources. The state-space model necessitates describing causal influence by Wiener-Akaike-Granger-Schweder (WAGS) causality. In Valdes-Sosa’s framework, probabilistic causality is thus determined by a WAGS test. Valdes-Sosa emphasizes that a major problem in determining causal brain networks is that they are spatially extended: since the SEE represents the spatial and temporal evolution of the causal network, any appropriate definition of causality must accommodate both dimensions. The problem of spatially extended causality leads Valdes-Sosa to suggest a further clarification that we see as potentially important. He applies Nancy Cartwright’s distinction between “thin” and “thick” causal analyses to brain networks. Here, a “thin” analysis simply determines the links between brain network nodes, whereas a “thick” analysis identifies the actual transformations of neural activity represented by the links. We agree with Valdes-Sosa’s implication that an understanding of the transformations that transpire between brain areas lies at the heart of the brain research enterprise.
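The state-space picture can be sketched in its simplest linear form (the matrices and noise levels below are illustrative assumptions of ours, not a fitted brain model): a state evolution equation propagates hidden node activity, an observation equation yields noisy measurements, and the directed coupling hidden in A leaves an asymmetric lagged signature in the observations:

```python
import numpy as np

# State evolution equation (SEE):  x[t+1] = A @ x[t] + state noise
# Observation equation:            y[t]   = C @ x[t] + sensor noise
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.0],     # node 1 evolves autonomously
              [0.6, 0.3]])    # node 2 is driven by node 1 (A[1, 0] != 0)
C = np.eye(2)                 # observations: noisy copies of the states
T = 5000
x = np.zeros((T, 2))
y = np.zeros((T, 2))
for t in range(T - 1):
    x[t + 1] = A @ x[t] + 0.5 * rng.standard_normal(2)        # SEE
    y[t + 1] = C @ x[t + 1] + 0.2 * rng.standard_normal(2)    # observation

# The hidden directed influence node 1 -> node 2 survives in the data
# as an asymmetric lag-1 cross-correlation between the observations.
fwd = np.corrcoef(y[:-1, 0], y[1:, 1])[0, 1]   # past of node 1 vs node 2
rev = np.corrcoef(y[:-1, 1], y[1:, 0])[0, 1]   # past of node 2 vs node 1
```

The asymmetry between fwd and rev is the raw ingredient that a WAGS-style test formalizes: the past of node 1 helps predict node 2 far more than the reverse, even though only the noisy observations, never the hidden states, are available.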

In conclusion, all the commentators largely agree that our paper fills a necessary gap in the literature on the issue of causality in theoretical and computational neuroscience, and although problems still remain, PC is a useful framework for understanding the causal dynamics of large-scale brain networks.

References

[1] Pessoa, L., & Najafi, M. (2015). Complex-system causality in large-scale brain networks: Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino and SL Bressler. Physics of Life Reviews, this issue.

[2] Tang, W. (2015). Wiener–Granger causality for effective connectivity in the hidden states: Indication from probabilistic causality: Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino and SL Bressler. Physics of Life Reviews, this issue.

[3] Barnett, L., & Seth, A.K. (2015). Granger causality for state-space models. Physical Review E, 91(4), 040101.

[4] Griffiths, J. (2015). Causal influence in neural systems: reconciling mechanistic-reductionist and statistical perspectives: Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino and SL Bressler. Physics of Life Reviews, this issue.

[5] Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge University Press.

[6] Halpern, J. Y., & Pearl, J. (2005). Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4), 843-887.

[7] Hitchcock, Christopher, "Probabilistic Causation", The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), URL =

[8] White, H., Chalak, K., & Lu, X. (2011). Linking Granger Causality and the Pearl Causal Model with Settable Systems. In NIPS Mini-Symposium on Causality in Time Series (pp. 1-29).

[9] Kozma, R., & Hu, S. (2015). Stochastic causality, criticality, and non-locality in brain networks: Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino and SL Bressler. Physics of Life Reviews, this issue.

[10] Hu, S., G. Dai, G. Worrell, Q. Dai, and H. Liang (2011) “Causality analysis of neural connectivity: Critical examination of existing methods and advances of new methods”, IEEE Trans Neural Networks, 22(6), 829–844.

[11] Bressler and Kelso

[12] Zhang, M., Nordham, C., & Kelso, J. A. (2015). Deterministic versus probabilistic causality in the brain: To cut or not to cut: Comment on “Foundational perspectives on causality in large-scale brain networks” by M. Mannino and SL Bressler. Physics of Life Reviews, this issue.

[13] List, C., & Pivato, M. (2015). Emergent chance. Philosophical Review, 124(1), 119-152.

[14] List, C., & Pivato, M. (2015). Dynamic and stochastic systems as a framework for metaphysics and the philosophy of science. arXiv preprint arXiv:1508.04195.

[15] Stefanescu, R. A., & Jirsa, V. K. (2008). A low dimensional description of globally coupled heterogeneous neural networks of excitatory and inhibitory neurons. PLoS Computational Biology, 4(11), e1000219.

[16] Dhamala, M. (2015). What is the nature of causality in the brain? - Inherently probabilistic. Comment on “Foundational Perspectives on Causality in Large-Scale Brain Networks” by M. Mannino & SL Bressler. Physics of Life Reviews, this issue.

[17] Diwadkar, Vaibhav. (2015). Critical perspectives on causality and inference in brain networks: Allusions, illusions, solutions? Comment on: “Foundational Perspectives on Causality in Large-Scale Brain Networks.” Physics of Life Reviews, this issue.

[18] Valdes-Sosa, Pedro. (2015). The many levels of causal brain network discovery. Comment on “Foundational perspectives on causality in large-scale brain networks” by Mannino & Bressler. Physics of Life Reviews, this issue.


REFERENCES

Assisi, C. G., Jirsa, V. K., & Kelso, J. S. (2005). Synchrony and clustering in heterogeneous networks with global coupling and parameter dispersion. Physical review letters, 94(1), 018106.

Baccala LA, Sameshima K (2001). Partial directed coherence: a new concept in neural structure determination. Biological Cybernetics, 84:463-74.

Bajaj, S., Adhikari, B. M., Friston, K. J., & Dhamala, M. (2016). Bridging the Gap: Dynamic Causal Modeling and Granger Causality Analysis of Resting State Functional Magnetic Resonance Imaging. Brain connectivity, 6(8), 652-661.

Bak P, Tang C, Wiesenfeld K (1988) Self-organized criticality. Physical Review A 38:364.

Barnett L, Barrett AB, Seth AK (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103:238701.


Battaglia, D., Witt, A., Wolf, F., & Geisel, T. (2012). Dynamic effective connectivity of inter-areal brain circuits. PLoS computational biology, 8(3), e1002438.

Becker, R., Knock, S., Ritter, P., & Jirsa, V. (2015). Relating alpha power and phase to population firing and hemodynamic activity using a thalamo-cortical neural mass model. PLoS computational biology, 11(9), e1004352.

Bell, J. S. (1966). On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics, 38(3), 447-452.

Bernasconi, C. & Konig, P. (1999). On the directionality of cortical interactions studied by structural analysis of electrophysiological recordings. Biological Cybernetics, 81, 199-210.

Bishop, R. C. (2003). On separating predictability and determinism. Erkenntnis, 58(2), 169-188.

Bohm, D. (2012). Quantum theory. Courier Corporation.

Bohr, N. (1948). On the notions of causality and complementarity. Dialectica, 2(3-4), 312-319.

Bossomaier, T., Barnett, L., Harré, M., & Lizier, J. T. (2016). An introduction to transfer entropy: information flow in complex systems. Springer.

Boudjellaba, H., Dufour, J. & Roy, R. (1992). Testing causality between two vectors in multivariate autoregressive moving average models. Journal of the American Statistical Association 87, 1082-90.

Breakspear, M., & Jirsa, V. K. (2007). Neuronal dynamics and brain connectivity. In Handbook of brain connectivity (pp. 3-64). Springer, Berlin, Heidelberg.

Breitung, J. & Candelon, B. (2006). Testing for short- and long-run causality: A frequency domain approach. Journal of Econometrics, 132, 363-378.

Bressler S.L. (1995). Large-scale cortical networks and cognition. Brain Research Reviews, 20:288-304.

Bressler S.L. (2002). Understanding cognition through large-scale cortical networks. Current Directions in Psychological Science, 11:58-61.

Bressler, S. L. (2003). Cortical coordination dynamics and the disorganization syndrome in schizophrenia. Neuropsychopharmacology, 28, S35-S39.

Bressler S.L. (2004). Inferential constraint sets in the organization of visual expectation. Neuroinformatics, 2:227-238.

Bressler S.L. (2015). Interareal neocortical actions by neuronal populations. In: Kozma R., Freeman W.J. (eds.), Collective Brain Dynamics: Extending Beyond the Neuron Doctrine. Berlin: Springer.

Bressler S.L., Kelso J.A. (2001). Cortical coordination dynamics and cognition. Trends in Cognitive Sciences, 5:26-36.

Bressler S.L., McIntosh A.R. (2007). The role of neural context in large-scale neurocognitive network operations. In: Jirsa VK., McIntosh A.R. (eds.), Handbook of Brain Connectivity. Berlin: Springer.

Bressler S.L., Menon V. (2010). Large-scale brain networks in cognition: emerging methods and principles. Trends in Cognitive Sciences, 14:277-290.

Bressler SL (2007) The formation of global neurocognitive state. In Neurodynamics of Cognition and Consciousness (pp. 61-72). Springer.

Bressler, S. L., & Kelso, J. A. S. (2016). Coordination dynamics in cognitive neuroscience. Frontiers in Neuroscience, 10, 397.

Bressler SL, Tognoli E (2006) Operational principles of neurocognitive networks. International Journal of Psychophysiology 60:139-148.

Bressler, S. L. (1999). The dynamic manifestation of cognitive structures in the cerebral cortex. In Understanding Representation in the Cognitive Sciences, Springer US:121-126.

Bressler, S. L., & Seth, A. K. (2011). Wiener–Granger causality: a well established methodology. Neuroimage, 58(2), 323-329.

Bressler, S. L., Tang, W., Sylvester, C. M., Shulman, G. L., & Corbetta, M. (2008). Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. The Journal of Neuroscience, 28(40), 10056- 10061.

Bressler, S.L. (2008). Neurocognitive networks. Scholarpedia, 3(2):1567

Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., & Bressler, S. L. (2004). Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proceedings of the National Academy of Sciences of the United States of America, 101(26), 9849-9854.

Cameron, R. (2002). The ontology of Aristotle's final cause. Apeiron, 35(2), 153-179.

Cartwright, N. (1979). Causal laws and effective strategies. Nous, 419-437.

Cartwright, N. (2004). Causality: One word, many things. Philosophy of Science, 71, 805-819.

Chatfield, C. (2013). The analysis of time series: an introduction. CRC press.

Chen, Y., Bressler, S. L. & Ding, M. (2006). Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. Journal of Neuroscience Methods, 150, 228-37.

Chen, Y., Rangarajan, G., Feng, J. & Ding, M. (2004). Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A, 324, 26-35.

Chicharro, D., & Ledberg, A. (2012). When two become one: The limits of causality analysis of brain dynamics. PloS ONE, 7(3), e32466.

Chornoboy, E. S., Schramm, L. P. & Karr, A. F. (1988). Maximum likelihood identification of neural point process systems. Biological Cybernetics 59, 265-75.

Mitra, P., & Bokil, H. (2007). Observed Brain Dynamics. Oxford University Press.

Damos, P. (2016). Using multivariate cross correlations, Granger causality and graphical models to quantify spatiotemporal synchronization and causality between pest populations. BMC ecology, 16(1), 33.

David, O., & Friston, K. J. (2003). A neural mass model for MEG/EEG:: coupling and neuronal dynamics. NeuroImage, 20(3), 1743-1755.

Davies, P. C. W. (2012). The epigenome and top-down causality. Interface focus, 2(1), 42-48.

De Pierris, M. "Kant and Hume on Causality", The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/fall2008/entries/kant-hume-causality/.

Dhamala, M., Rangarajan, G., & Ding, M. (2008). Analyzing information flow in brain networks with nonparametric Granger causality. Neuroimage, 41(2), 354- 362.

Ding, M., Bressler, S. L., Yang, W. & Liang, H. (2000). Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: data preprocessing, model validation, and variability assessment. Biological Cybernetics, 83, 35-45.

Ding, M., Chen, Y., & Bressler, S. L. (2006). Granger causality: basic theory and application to neuroscience. In Handbook of Time Series Analysis: Recent Theoretical Developments and Applications (pp. 437-460).

Doll, R., & Hill, A. B. (1956). Lung cancer and other causes of death in relation to smoking. British medical journal, 2(5001), 1071.

Dowe, P. "Causal Processes", The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/fall2008/entries/causality-process/.

Earman, J. (2004). Determinism: What we have learned and what we still don't know. In Freedom and Determinism (pp. 21-46).

Edelman, G. M., & Gally, J. A. (2013). Reentry: a key mechanism for integration of brain function. Frontiers in Integrative Neuroscience, 7, 63.

Einstein, A. (1939). On a stationary system with spherical symmetry consisting of many gravitating masses. Annals of Mathematics, 40(4), 922-936.

Einstein, A. (2015). Relativity: The special and the general theory. Princeton University Press.

Ellis, G. F. (2008). On the nature of causality in complex systems. Transactions of the Royal Society of South Africa, 63(1), 69-84.

Falcon, A. "Aristotle on Causality", The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/spr2014/entries/aristotle-causality/.

Falcon, M. I., Riley, J. D., Jirsa, V., McIntosh, A. R., Shereen, A. D., Chen, E. E., & Solodkin, A. (2015). The virtual brain: modeling biological correlates of recovery after chronic stroke. Frontiers in neurology, 6.

Faye, J. "Backward Causation", The Stanford Encyclopedia of Philosophy (Spring 2010 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/spr2010/entries/causation-backwards/.

Fetzer, J. H., & Nute, D. E. (1980). A probabilistic causal calculus: Conflicting conceptions. Synthese, 44(2), 241-246.

Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

Frankel, L. (1986). Mutual causation, simultaneity and event description. Philosophical Studies, 49(3), 361-372.

Freeman WJ (2000) How Brains Make Up Their Minds. Columbia University Press.

Freeman WJ (2007) Proposed cortical “shutter” mechanism in cinematographic perception. In Neurodynamics of Cognition and Consciousness. Springer.

Freeman WJ (2012) Neurodynamics: an Exploration in Mesoscopic Brain Dynamics. Springer.

Freeman, W.J., Burke, B.C., Holmes, M.D., (2003). Aperiodic phase re-setting in scalp EEG of beta–gamma oscillations by state transitions at alpha–theta rates. Human Brain Mapping, 19, 248–272.

Freeman, W.J., Rogers, L.J., (2002). Fine temporal resolution of analytic phase reveals episodic synchronization by state transitions in gamma EEGs. Journal of Neurophysiology, 87, 937–945.

Freiwald, W. A., Valdes, P., Bosch, J., Biscay, R., Jimenez, J. C., Rodriguez, L. M., Rodriguez, V., Kreiter, A. K. & Singer, W. (1999). Testing non-linearity and directedness of interactions between neural groups in the macaque inferotemporal cortex. Journal of Neuroscience Methods, 94, 105-19.

Friston, K. (2011). Dynamic causal modeling and Granger causality. Comments on: The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. Neuroimage, 58(2), 303.

Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. Neuroimage, 19(4), 1273-1302.

Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current opinion in neurobiology, 23(2), 172-178.

Friston, K.J., (1994). Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapping, 2, 56–78.

Friston, K.J., (1997). Imaging cognitive anatomy. Trends in Cognitive Sciences, 1, 21–27.

Fuster, J. M. (2014) The prefrontal cortex makes the brain a preadaptive system. Proceedings of the IEEE 102:417-426.

Fuster, J. M., & Bressler, S. L. (2012). Cognit activation: a mechanism enabling temporal integration in working memory. Trends in Cognitive Sciences, 16(4), 207-218.

Fuster, J.M. (2006). The cognit: a network model of cortical representation. Int. Journal of Psychophysiology, 60, 125–132.

Fuster, J.M., (1995). Memory in the cortex of the primate. Biological Research, 28, 59–72.

Fuster, J.M., (2003). Cortex and Mind: Unifying Cognition. Oxford University Press, New York.

Geweke, J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304-313.

Ghosh, A., Rho, Y., McIntosh, A. R., Kötter, R., & Jirsa, V. K. (2008). Cortical network dynamics with time delays reveals functional connectivity in the resting brain. Cognitive neurodynamics, 2(2), 115.

Ghosh, A., Rho, Y., McIntosh, A. R., Kötter, R., & Jirsa, V. K. (2008). Noise during rest enables the exploration of the brain's dynamic repertoire. PLoS computational biology, 4(10), e1000196.

Gourevitch, B., Bouquin-Jeannes, R. L. & Faucon, G. (2006). Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biological Cybernetics, 95, 349-369.

Gower, B. (1991). Hume on probability. British Journal for the Philosophy of Science, 1-19.

Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424- 438.

Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.

Granger, C. W. J. (2001). Essays in Econometrics: The Collected Papers of Clive W.J. Granger. Cambridge: Cambridge University Press.

Greene, B. (2011). The hidden reality: Parallel universes and the deep laws of the cosmos. Random House LLC.

Griffiths, D. J. (2005). Introduction to quantum mechanics. Pearson Education India.

Hájek, A. "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/win2012/entries/probability-interpret/.

Haken, H. (1991). Synergetic Computers and Cognition: A Top-down Approach to Neural Nets. Springer-Verlag, Berlin.

Haken, H. (1996) Principles of Brain Functioning – A Synergetic Approach to Brain Activity, Behavior and Cognition. Springer, Berlin.

Haque, A. (2013). Causality in Classical Physics. arXiv preprint arXiv:1312.2656.

Heisenberg, W. (1958). Physics and philosophy: The revolution in modern science.

Beebee, H., Hitchcock, C., & Menzies, P. (Eds.) (2009). The Oxford Handbook of Causation. Oxford University Press, pp. 185-212.

Hill, A. B. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58(5), 295.

Hindmarsh, J. L., & Rose, R. M. (1984). A model of neuronal bursting using three coupled first order differential equations. Proceedings of the royal society of London B: biological sciences, 221(1222), 87-102.

Hitchcock, C. "Probabilistic Causation", The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/win2012/entries/causality-probabilistic/.

Hitchcock, C. (2004). Do All and Only Causes Raise the Probabilities of Effects? MIT Press, 403-417.

Hlaváčková-Schindler, K., Paluš, M., Vejmelka, M., & Bhattacharya, J. (2007). Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441(1), 1-46.

Hoover, K. D. (2008). Causality in economics and econometrics. The new Palgrave dictionary of economics, 2.

Jansen, B.H. (1991). Time series analysis by means of linear modeling. In Digital Biosignal Processing, Weitkunat, R. Elsevier, Amsterdam, (ed.), 157-180.

Jirsa, V. K., & Kelso, J. S. (2000). Spatiotemporal pattern formation in neural systems with heterogeneous connection topologies. Physical Review E, 62(6), 8462.

Jirsa, V., Sporns, O., Breakspear, M., Deco, G., & McIntosh, A. R. (2010). Towards the virtual brain: network modeling of the intact and the damaged brain. Archives italiennes de biologie, 148(3), 189-205.

Lizier, J. T. (2014). JIDT: An information-theoretic toolkit for studying the dynamics of complex systems. Frontiers in Robotics and AI, 1, 11. doi:10.3389/frobt.2014.00011.

Kaminski, M.J., Blinowska, K.J. (1991). A new method of the description of the information flow in the brain structures. Biological Cybernetics, 65:203-210.

Kant, Immanuel (1965) Critique of Pure Reason, orig. 1781, trans. N. Kemp Smith. New York: Macmillan Press.

Karimi, K. (2010). A Brief Introduction to Temporality and Causality. arXiv preprint arXiv:1007.2449.

Kelso JAS (2008) An essay on understanding the mind. Ecological Psychology 20:180-208.

Kelso JAS, (2009) Coordination dynamics. In Encyclopedia of Complexity and Systems Science (pp. 1537-1565). Springer.

Kelso JAS, Scholz JP, Schöner G (1986). Nonequilibrium phase transitions in coordinated biological motion: critical fluctuations. Physics Letters A, 118:279- 284.

Kelso JAS, Tognoli E (2009) Toward a complementary neuroscience: metastable coordination dynamics of the brain. In Downward Causation and the Neurobiology of Free Will (pp. 103-124). Springer.

Kelso, J. S. (1997). Dynamic patterns: The self-organization of brain and behavior. MIT press.

Kelso, JAS (1984) Phase transitions and critical behavior in human bimanual coordination. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, 246: R1000-R1004.

Kim, Y. H., Yu, R., Kulik, S. P., Shih, Y., & Scully, M. O. (2000). Delayed “choice” quantum eraser. Physical Review Letters, 84(1).

Knock, S. A., McIntosh, A. R., Sporns, O., Kötter, R., Hagmann, P., & Jirsa, V. K. (2009). The effects of physiologically plausible connectivity structure on local and global dynamics in large scale brain models. Journal of neuroscience methods, 183(1), 86-94.

Kozma R (2007) Neurodynamics of intentional behavior generation. In Neurodynamics of Cognition and Consciousness (pp. 131-161). Springer.

Kozma R, Freeman WJ (2009) The KIV model of intentional dynamics and decision making. Neural Networks 22:277-285.

Leon, P. S., Knock, S. A., Woodman, M. M., Domide, L., Mersmann, J., McIntosh, A. R., & Jirsa, V. (2013). The Virtual Brain: a simulator of primate brain network dynamics. Frontiers in neuroinformatics, 7.

Liang, H., Ding, M., Nakamura, R., & Bressler, S. L. (2000). Causal influences in primate cerebral cortex during visual pattern discrimination. Neuroreport, 11, 2875-2880.

Lizier, J. T., & Prokopenko, M. (2010). Differentiating information transfer and causal effect. The European Physical Journal B - Condensed Matter and Complex Systems, 73(4), 605-615.

Luria AR (1966) Higher Cortical Functions in Man. Basic Books.

Mannino, M., & Bressler, S. L. (2015). Foundational perspectives on causality in large-scale brain networks. Physics of life reviews, 15, 107-123.

McIntosh, A.R., (2000). Towards a network theory of cognition. Neural Networks,13, 861–876.

McIntosh, A.R., (2004). Contexts and catalysts: a resolution of the localization and integration of function in the brain. Neuroinformatics, 2, 175–182.

Meehan, T. P., & Bressler, S. L. (2012). Neurocognitive networks: findings, models, and theory. Neuroscience & Biobehavioral Reviews, 36(10), 2232-2247.

Mesulam, M.-M., (1998). From sensation to cognition. Brain, 121, 1013–1052.

Mesulam, M., (1990). Large-scale neurocognitive networks and distributed processing for attention, language and memory. Annals of Neurology, 28, 597– 613.

Miles LK, Lumsden J, Richardson MJ, Macrae CN (2011) Do birds of a feather move together? Group membership and behavioral synchrony. Experimental Brain Research 211:495-503.

Miller, J. H., & Page, S. E. (2009). Complex adaptive systems: an introduction to computational models of social life. Princeton university press.

Morris, W. E., "David Hume", The Stanford Encyclopedia of Philosophy (Summer 2014 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/sum2014/entries/hume/.

Motz, L., & Weaver, J. H. (1989). The Story of Physics. Springer.

Mountcastle, V.B., (1978). An organizing principle for cerebral function: the unit module and the distributed system. In: Edelman, G.M., Mountcastle, V.B. (Eds.), The Mindful Brain. MIT Press, Cambridge, MA, pp. 7–50.

Orpwood, R. (2017) Information and the origin of qualia. Frontiers in Systems Neuroscience 11:22.

Oullier O, De Guzman GC, Jantzen KJ, Lagarde J, Kelso JAS (2008) Social coordination dynamics: Measuring human bonding. Social Neuroscience 3:178- 192.

Oullier O, Kelso JAS (2009) Social coordination, from the perspective of coordination dynamics. In Encyclopedia of Complexity and Systems Science (pp. 8198-8213). Springer.

Pereda, E., Quiroga, R. Q. & Bhattacharya, J. (2005). Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77, 1-37.

Perlovsky LI, Kozma R (2007) Neurodynamics of cognition and consciousness. In Neurodynamics of Cognition and Consciousness (pp. 1-8). Springer.

Pinker S (2005) So how does the mind work?. Mind & Language, 20(1), 1-24.

Pitt D (2017) "Mental Representation", The Stanford Encyclopedia of Philosophy (Spring 2017 Edition), EN Zalta (ed), https://plato.stanford.edu/archives/spr2017/entries/mental-representation.

Preston AR, Eichenbaum H (2013). Interplay of hippocampus and prefrontal cortex in memory. Current Biology 23:R764-R773.

Reichenbach, H. (1991). The direction of time (Vol. 65). Univ of California Press.

Ritter, P., Schirner, M., McIntosh, A. R., & Jirsa, V. K. (2013). The virtual brain integrates computational modeling and multimodal neuroimaging. Brain connectivity, 3(2), 121-145.

Roebroeck, A., Formisano, E., & Goebel, R. (2005). Mapping directed influence over the brain using Granger causality and fMRI. Neuroimage, 25(1), 230-242.

Roland, P. (2017). Space-time dynamics of membrane currents run the brain. Neuron.

Russell, B. (1913). On the notion of cause. In Proceedings of the Aristotelian society (Vol. 13, pp. 1-26). The Aristotelian Society; Blackwell Publishing.

Ryali, S., Supekar, K., Chen, T., & Menon, V. (2011). Multivariate dynamical systems models for estimating causal interactions in fMRI. Neuroimage, 54(2), 807-823.

Salazar, R. F., Dotson, N. M., Bressler, S. L., & Gray, C. M. (2012). Content- specific fronto-parietal synchronization during visual working memory. Science, 338(6110), 1097-1100.

Sanz-Leon, P., Knock, S. A., Spiegler, A., & Jirsa, V. K. (2015). Mathematical framework for large-scale brain network modeling in The Virtual Brain. Neuroimage, 111, 385-430.

Schaffer, J. "The Metaphysics of Causality", The Stanford Encyclopedia of Philosophy (Summer 2014 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/sum2014/entries/causation-metaphysics/.

Schaffer, J. (2001). Causes as probability raisers of processes. The Journal of Philosophy, 98(2), 75-92.

Schreiber, T. (2000). Measuring information transfer. Physical review letters, 85(2), 461.

Seth, A. K. (2005). Causal connectivity analysis of evolved neural networks during behavior. Network: Computation in Neural Systems, 16, 35-55.

Seth, A. K. & Edelman, G. M. (2007). Distinguishing causal interactions in neural populations. Neural Computation, 19(4): 910-933

Seth, A.K., (2008). Causal networks in simulated neural systems. Cognitive Neurodynamics. 2, 49–64.

Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3-55.

Sharma, J., Angelucci, A., & Sur, M. (2000). Induction of visual orientation modules in auditory cortex. Nature, 404(6780), 841-847.

Shimony, A. "Bell's Theorem", The Stanford Encyclopedia of Philosophy (Winter 2013 Edition), Edward N. Zalta (ed.), URL = http://plato.stanford.edu/archives/win2013/entries/bell-theorem/.

Laplace, P. S. (1951). A Philosophical Essay on Probabilities (F. W. Truscott, Trans.). Dover Publications.

Skarda CA, Freeman WJ (1987) How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences 10:161-173.

Sporns, O. (2011). Networks of the Brain. MIT press.

Sporns, Olaf (2007) Brain connectivity. Scholarpedia, 2(10):4695.

Stanford, Kyle (2013) "Underdetermination of Scientific Theory", The Stanford Encyclopedia of Philosophy (Spring 2016 Edition), Edward N. Zalta (ed.), URL = https://plato.stanford.edu/archives/spr2016/entries/scientific-underdetermination/.

Stefanescu, R. A., & Jirsa, V. K. (2008). A low dimensional description of globally coupled heterogeneous neural networks of excitatory and inhibitory neurons. PLoS computational biology, 4(11), e1000219.

Stephan, K. E., Penny, W. D., Moran, R. J., den Ouden, H. E., Daunizeau, J., & Friston, K. J. (2010). Ten simple rules for dynamic causal modeling. Neuroimage, 49(4), 3099-3109.

Stokes, P. A., & Purdon, P. L. (2017). A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proceedings of the National Academy of Sciences, 114(34), E7063-E7072.

Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland Publishing Company, 7-8.

Tang, W., Bressler, S. L., Sylvester, C. M., Shulman, G. L., & Corbetta, M. (2012). Measuring Granger causality between cortical regions from voxelwise fMRI BOLD signals with LASSO. PLoS computational biology, 8(5), e1002513.

Tononi, G. & Sporns, O. (2003). Measuring information integration. BMC Neuroscience, 4, 31.

Valdes-Sosa, P. A., Roebroeck, A., Daunizeau, J., & Friston, K. (2011). Effective connectivity: influence, causality and biophysical modeling. Neuroimage, 58, 339-361.

Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21, 615-628.

Wagner, A. (1999). Causality in complex systems. Biology and Philosophy, 14(1), 83-101.

Wang, H. E., Bénar, C. G., Quilichini, P. P., Friston, K. J., Jirsa, V. K., & Bernard, C. (2014). A systematic framework for functional connectivity measures. Frontiers in Neuroscience, 8, 405.

Wen X, Rangarajan G, Ding M (2013). Is Granger Causality a Viable Technique for Analyzing fMRI Data? PLoS ONE, 8(7): e67428.

Wen, X., Yao, L., Liu, Y., & Ding, M. (2012). Causal interactions in attention networks predict behavioral performance. The Journal of Neuroscience, 32(4), 1284-1292.

Werner G (2007) Metastability, criticality and phase transitions in brain and its models. Biosystems 90:496-508.

Wibral, M., Vicente, R., & Lizier, J. T. (Eds.). (2014). Directed information measures in neuroscience. Springer.

Wiener, N. (1956). The theory of prediction. In E. F. Beckenbach (Ed.), Modern Mathematics for Engineers (Vol. 1, pp. 125-139). New York: McGraw-Hill.

Williamson, J. (2006a). Dispositional versus epistemic causality. Minds and Machines, 16:259-276.

Williamson, J. (2006b). From Bayesianism to the epistemic view of mathematics. Philosophia Mathematica, (III), 14(3):365-369.

Williamson, J. (2007a). Causality. In Gabbay, D. and Guenthner, F., editors, Handbook of Philosophical Logic, Springer, 14, 89-120.

Wimsatt, W. C. (1994). The ontology of complex systems: levels of organization, perspectives, and causal thickets. Canadian Journal of Philosophy, 20(2), 207- 274.

Woodman, M., Perdikis, D., Pillai, A. S., Dodel, S., Huys, R., Bressler, S., & Jirsa, V. (2011). Building neurocognitive networks with a distributed functional architecture. In From Brains to Systems (pp. 101-109). Springer New York.

Woodman, M. M., Pezard, L., Domide, L., Knock, S. A., Sanz-Leon, P., Mersmann, J., ... & Jirsa, V. (2014). Integrating neuroinformatics tools in TheVirtualBrain. Frontiers in Neuroinformatics, 8.

Zanin, M., & Papo, D. (2013). Efficient neural codes can lead to spurious synchronization. Frontiers in Computational Neuroscience, 7.
